CSC 456 Spring 2008

Assignment 4 – Artificial Neural Networks with Weka

Due: Start of Class on 04/16/08

One area in which neural networks have been successful is pattern recognition. A file prepared for Weka, CharRecognition.arff, is on my www assignment page; it contains simulated lowercase ‘h’s and non-h’s. To see how the 1’s and 0’s in the file represent h’s and non-h’s, look at the ‘picts’ tab in the CharRecognition.xls file. It shows all of the h’s and some of the non-h’s (some of the non-h’s are just random patterns of 1’s and 0’s). Note that the bottom h’s in that tab actually belong to the later file discussed below.
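If you would rather check a pattern without the spreadsheet, the sketch below prints a flat row of 1’s and 0’s as a grid. It is only an illustration: the row width of 5 and the sample ‘h’ are hypothetical and are not taken from CharRecognition.arff, so set `width` to match the actual layout of the data.

```python
# Hypothetical layout: WIDTH = 5 is an assumption for illustration only;
# the real number of columns per row in CharRecognition.arff may differ.
WIDTH = 5


def render(bits, width=WIDTH):
    """Return a flat 0/1 pattern as text rows, '#' for 1 and '.' for 0."""
    if len(bits) % width != 0:
        raise ValueError("pattern length must be a multiple of the row width")
    rows = []
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        rows.append("".join("#" if b == 1 else "." for b in row))
    return "\n".join(rows)


# A made-up 5x5 lowercase 'h' (hypothetical, not from the data file).
sample_h = [
    1, 0, 0, 0, 0,
    1, 0, 0, 0, 0,
    1, 1, 1, 1, 0,
    1, 0, 0, 1, 0,
    1, 0, 0, 1, 0,
]

if __name__ == "__main__":
    print(render(sample_h))
```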

Task:

Initial Experiment

· Run Weka’s MultilayerPerceptron classifier (with default options) on the data file. Save the results: right-click the result in the Result list, choose Save Result Buffer, and specify the file name and location.

· Turn in this results file, and answer the following questions:

a. We frequently refer to the “chance” probability of getting a prediction correct, meaning the probability of predicting correctly purely by random guessing. For instance, the chance of correctly predicting a coin flip is 50%. What do you think the “chance” probability of getting a prediction correct is in this problem? Explain.

b. Do you think the program’s accuracy is significantly better or worse than “chance”? Explain.

c. Would you consider the neural net’s performance to be good? Explain.

d. Look at the model built by the neural net. How useful do you think it would be for explaining results to a user?
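For questions a and b, one possible way to put numbers on “chance” is sketched below, assuming you read the count of correctly classified instances and the total from Weka’s output. `majority_baseline` gives the accuracy of always guessing the most common class (the rule Weka’s ZeroR classifier uses), and `binomial_p_value` is a one-sided exact binomial test of whether that many correct answers would be surprising at a given chance rate. Treat it as a rough check only: it assumes predictions are independent, which cross-validated predictions are not strictly. The function names are hypothetical helpers, not part of Weka.

```python
from collections import Counter
from math import comb


def majority_baseline(labels):
    """Accuracy of always predicting the most common class label."""
    counts = Counter(labels)
    return max(counts.values()) / len(labels)


def binomial_p_value(correct, total, p_chance):
    """One-sided exact binomial test.

    Returns P(X >= correct) when each of `total` predictions is right
    independently with probability `p_chance`. A small value suggests
    the accuracy is better than that chance rate.
    """
    return sum(
        comb(total, k) * p_chance**k * (1 - p_chance) ** (total - k)
        for k in range(correct, total + 1)
    )
```

Usage with placeholder numbers (not results from this data): `binomial_p_value(45, 50, 0.5)` tests 45 correct out of 50 against a 50% chance rate. Whether 50% is the right chance rate for this problem is exactly what question a asks you to decide.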

Different Data

· Load the data file CharRecognitionWShifts (available on my www assignment page). In this data, some of the h’s have been shifted left or right. This should make the problem more difficult, since some h’s may have their 1’s in quite different locations from other h’s. See the bottom six pictures in the ‘picts’ tab of the xls file.
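To see why a shift matters to a network whose inputs are individual grid positions, the sketch below shifts a pattern one column to the right and counts how many 1’s still line up with the original. The 5x5 ‘h’ and the row width are hypothetical stand-ins, not the layout of CharRecognitionWShifts, and `shift_right` and `overlap` are illustrative helpers.

```python
def shift_right(bits, width, amount=1):
    """Shift every row of a flat 0/1 grid right by `amount` columns.

    Vacated positions on the left are filled with 0; bits pushed past
    the right edge are dropped.
    """
    if not 0 <= amount <= width:
        raise ValueError("amount must be between 0 and width")
    shifted = []
    for start in range(0, len(bits), width):
        row = bits[start:start + width]
        shifted.extend([0] * amount + row[:width - amount])
    return shifted


def overlap(a, b):
    """Count the positions where both patterns have a 1."""
    return sum(x & y for x, y in zip(a, b))


# Hypothetical 5x5 lowercase 'h' (not taken from the data file).
sample_h = [
    1, 0, 0, 0, 0,
    1, 0, 0, 0, 0,
    1, 1, 1, 1, 0,
    1, 0, 0, 1, 0,
    1, 0, 0, 1, 0,
]

if __name__ == "__main__":
    moved = shift_right(sample_h, width=5)
    print(sum(sample_h), overlap(sample_h, moved))
```

For this made-up ‘h’, a one-column shift keeps all 10 of its 1’s, but only 3 of them fall on positions that were 1 in the unshifted pattern, so most input units see a different value.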

· Run Weka’s MultilayerPerceptron classifier (with default options) on the new data file. Save the results.

· Turn in this results file, and answer the following questions:

a. Do you think performance (accuracy) on this task was significantly different from performance on the original data? Explain.

b. One might have expected accuracy to suffer badly on this harder dataset. Why do you think it did not? (Tie your answer to how the neural net works.)

Turn In:

· Softcopies of your results files (preferably with file names that identify both the experiment and you)

· Answers to all questions above (either hard or soft copy).

· Zip all softcopies together and submit to Blackboard.

NOTE:

I expect this can be completed in class; I plan to devote most of one class period to it, probably on 4/09. If necessary, though, Weka is freeware and can be downloaded from http://www.cs.waikato.ac.nz/~ml/weka/ . Look for the Windows version and pick the appropriate download based on whether or not you already have the Java VM.
Weka is a great tool. Anybody interested in data mining who has space on their PC should get it.