Midterm Exam Answers
Test Form C.
· Clean data – somebody with knowledge of the data is very valuable in cleaning the data, and actually cleaning the data usually requires human action
· Data preparation – the learning method may benefit if a person derives new attributes from existing ones. Human judgment is needed to decide which new attributes would be valuable
· Determine what experiments to do – people decide which algorithms are appropriate for the current data and what to try
· Evaluate results – people must determine if the accuracy found in tests is good or not
· Evaluate results – people must determine if the info learned makes sense
· Use results – people must determine what to do with what is learned
· A total of ____9____ predictions were incorrect.
· Of the ____12____ times Cancer was predicted, this prediction was correct ___5_____ times.
· Of the ___6___ times Not Cancer was predicted, this prediction was correct ___4_____ times.
· Of the ____7____ times that Cancer occurred, the prediction was correct ____5____ times and incorrect ____2____ times.
· Of the ____11____ times that Not Cancer occurred, the prediction was correct ____4____ times and incorrect ____7____ times.
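The answer key gives only the filled-in blanks, not the confusion matrix itself. Reading the blanks back, and taking Cancer as the positive class, gives four cells: 5 true positives, 7 false positives, 2 false negatives and 4 true negatives. Every blank is a sum of these cells. A minimal Python sketch under that reading (the variable and function names are hypothetical):

```python
# Confusion-matrix cells inferred from the answers above; "positive" = Cancer.
TP = 5  # predicted Cancer, actually Cancer
FP = 7  # predicted Cancer, actually Not Cancer
FN = 2  # predicted Not Cancer, actually Cancer
TN = 4  # predicted Not Cancer, actually Not Cancer


def summarize(tp, fp, fn, tn):
    """Return each blank in the question as a sum of confusion-matrix cells."""
    return {
        "incorrect": fp + fn,
        "predicted_cancer": tp + fp,
        "predicted_cancer_correct": tp,
        "predicted_not_cancer": fn + tn,
        "predicted_not_cancer_correct": tn,
        "actual_cancer": tp + fn,
        "actual_cancer_correct": tp,
        "actual_cancer_incorrect": fn,
        "actual_not_cancer": fp + tn,
        "actual_not_cancer_correct": tn,
        "actual_not_cancer_incorrect": fp,
    }


for name, value in summarize(TP, FP, FN, TN).items():
    print(f"{name}: {value}")
```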
Probabilities with Laplace Estimator
Area | Purchase = Yes | Purchase = No
Mt Airy | 4/10 | 4/13
Germantown | 5/10 | 3/13
Manyunk | 1/10 | 6/13

Home | Purchase = Yes | Purchase = No
Own | 5/9 | 5/12
Rent | 4/9 | 7/12

Age | Purchase = Yes | Purchase = No
Young | 6/11 | 2/14
Established | 3/11 | 5/14
Middle Aged | 1/11 | 4/14
Old | 1/11 | 3/14

To Predict | Yes | No
Purchase | 8/19 | 11/19
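Each entry above follows the Laplace estimator: add 1 to every count, and add the number of possible values to the denominator. The raw training counts are not shown in this key. Working backwards from the denominators suggests 7 Yes and 10 No instances (Area has 3 values and 7 + 3 = 10, Home has 2 and 7 + 2 = 9, and so on), so the specific counts below are inferred rather than given. Here n_{v,c} is the number of class-c instances with value v, n_c the number of class-c instances, k_a the number of values of attribute a, n the total number of instances, and |C| the number of classes:

```latex
\begin{aligned}
P(a = v \mid c) &= \frac{n_{v,c} + 1}{n_c + k_a},
  &\quad P(\text{Area} = \text{Manyunk} \mid \text{Yes}) &= \frac{0 + 1}{7 + 3} = \frac{1}{10} \\
P(c) &= \frac{n_c + 1}{n + |C|},
  &\quad P(\text{Purchase} = \text{Yes}) &= \frac{7 + 1}{17 + 2} = \frac{8}{19}
\end{aligned}
```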
Test Instance: Mt Airy, Rent, Established
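Prob(Yes | Evidence) ∝ 4/10 * 4/9 * 3/11 * 8/19 = .0204
Prob(No | Evidence) ∝ 4/13 * 7/12 * 5/14 * 11/19 = .0371
Predict No, since its value is higher. (These products are unnormalized; dividing each by their sum, .0575, gives posteriors of about .355 for Yes and .645 for No.)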
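The calculation above can be reproduced in Python with exact fractions. This is a minimal sketch that copies only the table entries needed for the test instance; the dictionary layout and the function name `naive_bayes_scores` are hypothetical:

```python
from fractions import Fraction as F
from math import prod

# Laplace-smoothed conditional probabilities copied from the tables above
# (only the values that appear in the test instance).
cond = {
    "Yes": {"Mt Airy": F(4, 10), "Rent": F(4, 9), "Established": F(3, 11)},
    "No": {"Mt Airy": F(4, 13), "Rent": F(7, 12), "Established": F(5, 14)},
}
prior = {"Yes": F(8, 19), "No": F(11, 19)}


def naive_bayes_scores(evidence):
    """Unnormalized naive Bayes score per class: prior times product of likelihoods."""
    return {c: prior[c] * prod(cond[c][v] for v in evidence) for c in prior}


evidence = ["Mt Airy", "Rent", "Established"]
scores = naive_bayes_scores(evidence)
total = sum(scores.values())
for c, s in scores.items():
    # float(s) is the raw score; s / total is the normalized posterior
    print(f"{c}: score={float(s):.4f}  posterior={float(s / total):.3f}")
print("Predict:", max(scores, key=scores.get))
```

Using `Fraction` keeps the products exact until the final `float` conversion, so the printed scores match the hand calculation (.0204 and .0371).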
Sorted by Rating, showing the value of Buy. Tally Yeses and Nos until one of them reaches at least 3, then keep going until the Buy value switches; the switch ends the current category and the tally restarts.
Rating | Buy | Num Yes | Num No
21 | No | 0 | 1
25 | Yes | 1 | 1
27 | No | 1 | 2
28 | Yes | 2 | 2
29 | No | 2 | 3
30 | No | 2 | 4
30 | No | 2 | 5
33 | No | 2 | 6
35 | No | 2 | 7
38 | No | 2 | 8
40 | Yes | 1 | 0
41 | Yes | 2 | 0
41 | No | 2 | 1
45 | No | 2 | 2
48 | No | 2 | 3
49 | No | 2 | 4
51 | No | 2 | 5
52 | Yes | 1 | 0
53 | Yes | 2 | 0
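Dividing lines are placed halfway between the ratings on either side of each switch: (38 + 40)/2 = 39 and (51 + 52)/2 = 51.5.
Technically, the first two categories could be collapsed into one, since both give the same answer (No); that would leave a single dividing line at 51.5.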
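The binning procedure above can be sketched in Python. This is a sketch of the procedure as described in this answer, with a minimum of 3 for the leading class; the function names and the `min_count` parameter are hypothetical. Equal ratings (the two 41s) are not treated specially; the sketch just keeps the table's order:

```python
from collections import Counter

# (Rating, Buy) pairs copied from the table above.
DATA = [(21, "No"), (25, "Yes"), (27, "No"), (28, "Yes"), (29, "No"),
        (30, "No"), (30, "No"), (33, "No"), (35, "No"), (38, "No"),
        (40, "Yes"), (41, "Yes"), (41, "No"), (45, "No"), (48, "No"),
        (49, "No"), (51, "No"), (52, "Yes"), (53, "Yes")]


def make_bins(rows, min_count=3):
    """Tally labels until one reaches min_count, then extend the bin until the label switches."""
    rows = sorted(rows, key=lambda r: r[0])  # stable sort keeps the order of tied ratings
    bins, current, counts, majority = [], [], Counter(), None
    for value, label in rows:
        if majority is not None and label != majority:
            bins.append(current)  # label switched after the threshold: close the bin
            current, counts, majority = [], Counter(), None
        current.append((value, label))
        counts[label] += 1
        if majority is None and counts[label] >= min_count:
            majority = label
    if current:
        bins.append(current)  # leftover instances form the last bin
    return bins


def bin_label(b):
    """Majority Buy value in a bin."""
    return Counter(label for _, label in b).most_common(1)[0][0]


def cut_points(bins):
    """Dividing lines halfway between the last rating of one bin and the first of the next."""
    return [(a[-1][0] + b[0][0]) / 2 for a, b in zip(bins, bins[1:])]


def merge_same_label(bins):
    """Collapse adjacent bins that predict the same Buy value."""
    merged = [bins[0]]
    for b in bins[1:]:
        if bin_label(b) == bin_label(merged[-1]):
            merged[-1] = merged[-1] + b
        else:
            merged.append(b)
    return merged


bins = make_bins(DATA)
print([bin_label(b) for b in bins], cut_points(bins))
merged = merge_same_label(bins)
print([bin_label(b) for b in merged], cut_points(merged))
```

The first `print` shows three categories (No, No, Yes) with dividing lines at 39.0 and 51.5. The second shows the collapsed version (No, Yes) with a single line at 51.5.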