notes-books-langleyElementsOfMachineLearning

questions, comments, and potential errata on langley's Elements of machine learning (i haven't checked these over with anyone yet; after i look them over again, i plan to ask Langley about them):

2.6.3: why is GA more like incremental beam search than incremental hill climbing?

2.7: EGS eliminates based on positive instances, not negatives

3.1.2: did you mean "thin walls" rather than "thick walls"?

fig 3.1: i think (c) is wrong because the "thin walls"/"thick walls" dimension is not decisive. assuming you meant "thin walls" above, thin walls/two nuclei/two tails should be false and thick walls/one nucleus/two tails should be true

2.3.2: no need to normalize the heuristic function since only rank order matters

3.1.2: no need to have continuous output if you are going to threshold it immediately

3.1.2: why is the threshold for the radial basis function = sigma?

3.1.2: perhaps you meant to square d?

3.3.1: in this formulation i think the threshold should have the value -1, not 1. i think the example needs to be revised

3.3.3: define d

3.3.3: we have

\sum_i w_i x_i = V
\sum_{i \neq k} w_i x_i + w_k x_k = V
x_k (\sum_{i \neq k} w_i x_i / x_k + w_k) = V
x_k (U_k + w_k) = V

so assuming (V_j > 0 -> j is positive, V_j < 0 -> j is negative), the condition is: j is positive iff x_k (U_k + w_k) is positive iff (x_k > 0 and w_k > -U_k) or (x_k < 0 and w_k < -U_k)
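the factoring above can be checked numerically; this is a quick sanity sketch (with made-up random weights and inputs, and V, U_k, w_k as in the note):

```python
# Numerical check of the 3.3.3 derivation: for any index k with x_k != 0,
# the weighted sum V = sum_i w_i x_i factors as x_k * (U_k + w_k), where
# U_k = sum_{i != k} w_i x_i / x_k, so sign(V) matches sign(x_k * (U_k + w_k)).
# All values here are arbitrary test data.
import random

random.seed(0)
for _ in range(1000):
    n = 5
    w = [random.uniform(-2, 2) for _ in range(n)]
    x = [random.choice([-1, 1]) * random.uniform(0.1, 2) for _ in range(n)]
    k = random.randrange(n)
    V = sum(wi * xi for wi, xi in zip(w, x))
    U_k = sum(w[i] * x[i] for i in range(n) if i != k) / x[k]
    # the factored form equals the full weighted sum
    assert abs(V - x[k] * (U_k + w[k])) < 1e-9
    # and the sign condition from the note holds
    assert (V > 0) == ((x[k] > 0 and w[k] > -U_k) or (x[k] < 0 and w[k] < -U_k))
```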

table 3.5 (IWP): <= should be >=

table 3.5 (IWP): doesn't include leaving one attribute out, as indicated in the text

3.3.3: i don't understand how an IWP revision can ever decrease the score; isn't (something equivalent to) the previous value of w_j always one of the options?

5.4: i don't understand the second-to-last paragraph, the one that talks about "enclosed terms"

5.6: it says ECD is ICD minus averaging minus the sufficient match condition, but ECD is error driven and ICD is not.

5.7.2: "0 with the remaining 5" should be "0 with the remaining 4"

table 6.2: the first two appearances of "P(simploid1" should be "P(simploid2"

note: AND/OR nets are monotonic boolean circuits
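a small sanity sketch of that note: a net built only from AND and OR gates (no negation) computes a monotone boolean function, i.e. raising any input from false to true can never drop the output from true to false. the two-layer circuit below is invented for illustration:

```python
# Exhaustively verify that a negation-free AND/OR net is monotone:
# if inputs a <= b pointwise, then f(a) <= f(b).
from itertools import product

def and_or_net(x1, x2, x3, x4):
    """(x1 AND x2) OR (x3 AND x4): a small AND/OR (negation-free) net."""
    return (x1 and x2) or (x3 and x4)

for a in product([False, True], repeat=4):
    for b in product([False, True], repeat=4):
        if all(ai <= bi for ai, bi in zip(a, b)):
            assert and_or_net(*a) <= and_or_net(*b)
```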

7.6, last 2 paragraphs: i think i can see a way to make a more direct analog of a decision tree topology using a threshold network. ditto for the "harder to see" things in the last paragraph.

8.1.2: i don't think the IHC section mentioned simplicity
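one hedged sketch of the 7.6 idea: each leaf of a decision tree over boolean tests becomes a threshold unit that fires exactly when its root-to-leaf path conditions hold (an AND realized by a threshold), and an output unit ORs the positive leaves. the tree and tests below are invented for illustration:

```python
# Mirror a tiny decision tree with linear threshold units (invented example).

def threshold(inputs, weights, theta):
    """A linear threshold unit: fires iff the weighted sum reaches theta."""
    return sum(w * x for w, x in zip(weights, inputs)) >= theta

def tree_as_network(x1, x2):
    # Tree: if x1 then (if x2 then + else -) else -
    # The single positive leaf corresponds to the path x1 AND x2;
    # AND of two boolean inputs as a threshold unit: weights (1, 1), theta 2.
    leaf_pos = threshold([x1, x2], [1, 1], 2)
    # Output unit ORs the positive leaves (here just one): theta 1.
    return threshold([leaf_pos], [1], 1)

# The network agrees with the tree on all inputs:
for x1 in (0, 1):
    for x2 in (0, 1):
        assert tree_as_network(x1, x2) == bool(x1 and x2)
```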