Midterm Exam Answers

Test Form C.

D
D
D
D
Greedy
Supervised
Nearest Neighbors
Prior Probability
Black box
Nominal
True
True – but an answer of False, supported by the argument that “in some tasks people accept argument via cases”, would also get full credit
False – there are a number of successful fielded applications discussed in Chapter 1, including screening borderline loan approvals, detecting oil slicks and forecasting electricity demand. Each provided significant benefit to the company that used it.
False – some methods can do numeric prediction, including regression trees, model trees and instance-based learning.
People must be involved in a number of ways (any 4 of the following earn full credit):

· Clean data – somebody with knowledge of the data is very valuable in cleaning it, and the cleaning itself usually requires human action

· Data preparation – a person may derive new attributes from existing ones that help the learning method; human intelligence is needed to decide which new attributes would be valuable

· Determine what experiments to do – people decide which algorithms are appropriate for the current data and what to try

· Evaluate results – people must determine whether the accuracy found in testing is good enough

· Evaluate results – people must determine whether the information learned makes sense

· Use results – people must determine what to do with what is learned

If the algorithm suggests making a decision based on an attribute that would be considered discriminatory (race, ethnicity, age, gender), the result (if used) is discrimination.
A total of ___9____ of ___18_____ predictions were correct.

· A total of ____9____ predictions were incorrect.

· Of the ____12____ times Cancer was predicted, this prediction was correct ___5_____ times.

· Of the ___6___ times Not Cancer was predicted, this prediction was correct ___4_____ times.

· Of the ___7_____ times that Cancer occurred, the prediction was correct ____5____ times and incorrect ___2____ times.

· Of the ___11_____ times that Not Cancer occurred, the prediction was correct ____4____ times and incorrect ___7____ times.
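All of the blanks above follow from one 2x2 confusion matrix with Cancer treated as the positive class. Reading the cells off the answers gives 5 true positives, 7 false positives, 4 true negatives and 2 false negatives. The Python sketch below (the function name `summarize` is hypothetical) shows how each blank is a sum of two of those cells:

```python
def summarize(tp: int, fp: int, tn: int, fn: int) -> dict[str, int]:
    """Derive the fill-in counts from the four cells of a 2x2 confusion matrix."""
    return {
        "total": tp + fp + tn + fn,
        "correct": tp + tn,               # predictions that matched the actual class
        "incorrect": fp + fn,             # predictions that did not match
        "predicted_cancer": tp + fp,      # every time Cancer was predicted
        "predicted_not_cancer": tn + fn,  # every time Not Cancer was predicted
        "actual_cancer": tp + fn,         # every time Cancer actually occurred
        "actual_not_cancer": fp + tn,     # every time Not Cancer actually occurred
    }

# Cell values inferred from the answers above.
counts = summarize(tp=5, fp=7, tn=4, fn=2)
for name, value in counts.items():
    print(f"{name}: {value}")
```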

Probabilities with Laplace Estimator

Area	Purchase = Yes	Purchase = No
Mt Airy	4/10	4/13
Germantown	5/10	3/13
Manyunk	1/10	6/13

Home	Purchase = Yes	Purchase = No
Own	5/9	5/12
Rent	4/9	7/12

Age	Purchase = Yes	Purchase = No
Young	6/11	2/14
Established	3/11	5/14
Middle Aged	1/11	4/14
Old	1/11	3/14

Class (prior)	Purchase = Yes	Purchase = No
Purchase	8/19	11/19
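Each fraction above is a Laplace estimate: add 1 to every value's count and add the number of possible values to the denominator. The raw counts in this Python sketch are an assumption, inferred by subtracting 1 from each numerator (the training data themselves are not shown here), and the helper name `laplace` is hypothetical.

```python
from fractions import Fraction

# ASSUMED raw counts: each Laplace numerator from the tables above minus 1.
counts = {
    "Yes": {
        "Area": {"Mt Airy": 3, "Germantown": 4, "Manyunk": 0},
        "Home": {"Own": 4, "Rent": 3},
        "Age": {"Young": 5, "Established": 2, "Middle Aged": 0, "Old": 0},
    },
    "No": {
        "Area": {"Mt Airy": 3, "Germantown": 2, "Manyunk": 5},
        "Home": {"Own": 4, "Rent": 6},
        "Age": {"Young": 1, "Established": 4, "Middle Aged": 3, "Old": 2},
    },
}
class_counts = {"Yes": 7, "No": 10}

def laplace(value_counts: dict[str, int]) -> dict[str, Fraction]:
    """Laplace estimator: add 1 to each value's count and the number of values to the total."""
    denom = sum(value_counts.values()) + len(value_counts)
    return {value: Fraction(c + 1, denom) for value, c in value_counts.items()}

tables = {cls: {attr: laplace(vc) for attr, vc in attrs.items()}
          for cls, attrs in counts.items()}
priors = laplace(class_counts)

# str() shows each Fraction in lowest terms, so 4/10 appears as 2/5.
for cls, attrs in tables.items():
    for attr, probs in attrs.items():
        print(cls, attr, {v: str(p) for v, p in probs.items()})
print("priors", {cls: str(p) for cls, p in priors.items()})
```

Because `Fraction` reduces automatically, an entry such as 4/10 is stored as 2/5 but still compares equal to `Fraction(4, 10)`.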

Test Instance: Mt Airy, Rent, Established

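Prob(Yes | Evidence) is proportional to 4/10 * 4/9 * 3/11 * 8/19 = .0204

Prob(No | Evidence) is proportional to 4/13 * 7/12 * 5/14 * 11/19 = .037

Predict No, since its value is higher. (Both products share the same normalizing constant, Prob(Evidence), so comparing them directly is enough.)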
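The scoring step can be checked with exact arithmetic. This Python sketch copies the relevant Laplace fractions from the tables, multiplies each class's prior by its conditionals, and also divides each score by their sum so the two values become probabilities; the names `score` and `likelihood_terms` are illustrative only.

```python
from fractions import Fraction

# Laplace-smoothed conditionals for the test instance (Mt Airy, Rent, Established),
# copied from the tables above.
likelihood_terms = {
    "Yes": [Fraction(4, 10), Fraction(4, 9), Fraction(3, 11)],
    "No": [Fraction(4, 13), Fraction(7, 12), Fraction(5, 14)],
}
priors = {"Yes": Fraction(8, 19), "No": Fraction(11, 19)}

def score(cls: str) -> Fraction:
    """Naive Bayes score: the class prior times the product of its conditionals."""
    result = priors[cls]
    for p in likelihood_terms[cls]:
        result *= p
    return result

scores = {cls: score(cls) for cls in priors}
total = sum(scores.values())
posteriors = {cls: s / total for cls, s in scores.items()}  # normalized to sum to 1
prediction = max(scores, key=scores.get)

for cls in scores:
    print(f"{cls}: score {float(scores[cls]):.4f}, normalized {float(posteriors[cls]):.3f}")
print("Predict:", prediction)
```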

Sorted by Rating, showing the value of Buy. Tally the Yeses and Nos until one of them reaches at least 3, then continue until the Buy value switches away from that class; the switch starts a new bucket and the tallies start over.

Rating	Buy	Num Yes	Num No
21	No	0	1
25	Yes	1	1
27	No	1	2
28	Yes	2	2
29	No	2	3
30	No	2	4
30	No	2	5
33	No	2	6
35	No	2	7
38	No	2	8
40	Yes	1	0
41	Yes	2	0
41	No	2	1
45	No	2	2
48	No	2	3
49	No	2	4
51	No	2	5
52	Yes	1	0
53	Yes	2	0

Dividing lines fall halfway between the ratings on either side of each bucket boundary: (38 + 40)/2 = 39 and (51 + 52)/2 = 51.5.
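The tallying rule and the halfway dividing lines can be sketched in Python as follows. The function names are hypothetical, `min_count=3` matches the "at least 3 of one" rule above, and the sketch does not handle a boundary that would fall between two equal ratings.

```python
from collections import Counter

# (Rating, Buy) pairs copied from the table above.
rows = [(21, "No"), (25, "Yes"), (27, "No"), (28, "Yes"), (29, "No"),
        (30, "No"), (30, "No"), (33, "No"), (35, "No"), (38, "No"),
        (40, "Yes"), (41, "Yes"), (41, "No"), (45, "No"), (48, "No"),
        (49, "No"), (51, "No"), (52, "Yes"), (53, "Yes")]

def discretize(rows: list[tuple[int, str]], min_count: int = 3) -> list[list[tuple[int, str]]]:
    """Group rows (sorted by rating) into buckets using the tally rule above."""
    rows = sorted(rows, key=lambda r: r[0])  # stable sort keeps table order for equal ratings
    buckets, current, tally = [], [], Counter()
    for i, (value, label) in enumerate(rows):
        current.append((value, label))
        tally[label] += 1
        majority, count = tally.most_common(1)[0]
        next_label = rows[i + 1][1] if i + 1 < len(rows) else None
        # Close the bucket once one class has min_count rows and the next Buy value differs.
        if count >= min_count and next_label != majority:
            buckets.append(current)
            current, tally = [], Counter()
    if current:  # a final bucket that never reached min_count
        buckets.append(current)
    return buckets

def dividing_lines(buckets: list[list[tuple[int, str]]]) -> list[float]:
    """Place each dividing line halfway between the edge ratings of adjacent buckets."""
    return [(left[-1][0] + right[0][0]) / 2 for left, right in zip(buckets, buckets[1:])]

buckets = discretize(rows)
cuts = dividing_lines(buckets)
labels = [Counter(label for _, label in b).most_common(1)[0][0] for b in buckets]
print(cuts, labels)
```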

Technically, the first two categories could be collapsed into one category, since both have the same majority answer (No); that leaves a single dividing line at 51.5.
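Collapsing adjacent buckets that share a majority answer amounts to dropping the dividing lines between them. A minimal Python sketch, taking the dividing lines (39 and 51.5) and the bucket majorities (No, No, Yes) from the table above as input; the function name is hypothetical:

```python
def merge_same_majority(cuts: list[float], labels: list[str]) -> tuple[list[float], list[str]]:
    """Drop each dividing line whose buckets on both sides predict the same class."""
    kept_cuts, kept_labels = [], [labels[0]]
    for cut, label in zip(cuts, labels[1:]):
        if label != kept_labels[-1]:  # keep only lines where the answer changes
            kept_cuts.append(cut)
            kept_labels.append(label)
    return kept_cuts, kept_labels

merged = merge_same_majority([39.0, 51.5], ["No", "No", "Yes"])
print(merged)
```

Comparing against the last kept label, rather than the original neighbor, lets a run of several same-answer buckets collapse into one.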