CIS 655 Summer 2005

Assignment 5 – Artificial Neural Networks with Weka

Due: Start of Class on 07/27/05

Pattern recognition is one area in which neural networks have been successful. A file prepared for Weka, CharRecognition.arff, is available on my www page; it contains simulated lower-case ‘h’s and non-h’s. To visualize how the 1’s and 0’s in the file represent h’s and non-h’s, look at the ‘picts’ tab in the CharRecognition.xls file. It shows all of the h’s and some of the non-h’s (some of the non-h’s are just random patterns of 1’s and 0’s). Note that the h’s at the bottom of that tab actually belong to the second data file, discussed below.
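If you would rather see a pattern as text than open the spreadsheet, the short Python sketch below prints a flat list of 1’s and 0’s as a grid. The 5-column by 7-row layout and the H_BITS pattern are made up for illustration; they are assumptions, not the actual layout or contents of CharRecognition.arff.

```python
def render(bits, width):
    """Return a text picture of a flat list of 0/1 values, `width` cells per row."""
    rows = [bits[i:i + width] for i in range(0, len(bits), width)]
    return "\n".join("".join("#" if b else "." for b in row) for row in rows)


# Hypothetical 5-column, 7-row pattern shaped roughly like a lower-case h.
# It is NOT copied from the course data file.
H_BITS = [
    1, 0, 0, 0, 0,
    1, 0, 0, 0, 0,
    1, 0, 0, 0, 0,
    1, 1, 1, 1, 0,
    1, 0, 0, 0, 1,
    1, 0, 0, 0, 1,
    1, 0, 0, 0, 1,
]

print(render(H_BITS, 5))
```

To try it on a real row, replace H_BITS with the attribute values of one instance and set the width to match the grid used in the ‘picts’ tab.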

Task:

Initial Experiment

· Run the Weka software MultilayerPerceptron Classifier (with default options) on the data file. Save the results (right-click on the result in the Result list, choose Save, and specify the file name).

· Turn in this results file, and answer the following questions:

a. What do you think the “chance” probability of getting a prediction correct is in this problem? Explain.

b. Do you think that the program’s accuracy is significantly better/worse than “chance”? Explain.

c. Would you consider the neural net’s performance to be good? Explain.

d. Look at the model built by the neural net. How useful do you think it is for explaining results to the user?
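For questions a and b, one possible way to put numbers on “chance” and “significantly” is sketched below in Python. It treats “always guess the more common class” as the chance baseline and uses a normal-approximation z statistic to compare an observed accuracy against it. Both choices are assumptions, as are the function names; the counts and the accuracy are invented, so substitute the figures from your own Weka output.

```python
import math


def majority_baseline(n_pos, n_neg):
    """Accuracy obtained by always predicting the more common of two classes."""
    return max(n_pos, n_neg) / (n_pos + n_neg)


def z_score(observed_acc, chance_acc, n):
    """Normal-approximation z statistic for an observed accuracy vs. a chance rate.

    n is the number of test instances the accuracy was measured on.
    """
    std_err = math.sqrt(chance_acc * (1 - chance_acc) / n)
    return (observed_acc - chance_acc) / std_err


# Hypothetical numbers, NOT taken from CharRecognition.arff or any Weka run.
baseline = majority_baseline(n_pos=30, n_neg=70)
z = z_score(observed_acc=0.85, chance_acc=baseline, n=100)
print(f"majority baseline = {baseline:.2f}, z = {z:.2f}")
```

A z value well above 2 would usually be read as better than chance. If you define chance differently (for example, a coin flip at 0.5), pass that rate as chance_acc instead.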

Follow-up Experiments

· Run Weka IBk (instance-based learning with K nearest neighbors) with several values of K (the number of neighbors).

Questions:

a. How do the results with IBk compare to the neural network?
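As a reminder of what a nearest-neighbor classifier does with 1/0 data, here is a minimal Python sketch. It counts the positions where two patterns differ (Hamming distance) and takes a majority vote among the K closest training rows. The distance choice, the function names, and the tiny TRAIN set are assumptions for illustration only; Weka’s IBk has its own default distance and options, so do not expect identical behavior.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions at which two equal-length 0/1 lists differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train, query, k):
    """Majority vote among the k training rows closest to `query`.

    train is a list of (bits, label) pairs.
    """
    nearest = sorted(train, key=lambda row: hamming(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical toy data (6 attributes per row), not the course data.
TRAIN = [
    ([1, 0, 1, 1, 0, 1], "h"),
    ([1, 0, 1, 1, 1, 1], "h"),
    ([1, 0, 1, 0, 0, 1], "h"),
    ([0, 1, 0, 0, 1, 0], "not-h"),
    ([0, 1, 1, 0, 1, 0], "not-h"),
]

query = [1, 0, 1, 1, 0, 0]
for k in (1, 3, 5):
    print(k, knn_predict(TRAIN, query, k))
```

Changing K changes how many neighbors vote, which is the setting you are asked to vary in IBk.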

Different Data

· Load the data file CharRecognitionWShifts. In this data, some of the h’s have been shifted left or right. This should make the problem more difficult, since a shifted h may have its 1’s in quite different locations from those of other h’s. See the bottom 6 pictures in the xls file.

· Run the Weka software MultilayerPerceptron Classifier (with default options) on the new data file. Save the results.

· Turn in this results file, and answer the following questions:

a. Do you think performance (accuracy) on this task was significantly worse than on the original data? Explain.

b. One might have expected accuracy to suffer badly on this harder dataset. Why do you think it did not?
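To make concrete what “shifted left or right” does to the raw attributes, the Python sketch below shifts a small pattern one column to the right and counts how many cells change. The 5-column by 4-row grid, the pattern H, and the function names are hypothetical; the real files may use a different layout.

```python
def shift_right(bits, width, by=1):
    """Shift every row of a flat 0/1 grid right by `by` columns, padding with 0."""
    out = []
    for i in range(0, len(bits), width):
        row = bits[i:i + width]
        out.extend([0] * by + row[:width - by])
    return out


def hamming(a, b):
    """Number of positions at which two equal-length 0/1 lists differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical 5-column, 4-row pattern, not taken from the course files.
H = [
    1, 0, 0, 0, 0,
    1, 1, 1, 0, 0,
    1, 0, 1, 0, 0,
    1, 0, 1, 0, 0,
]

shifted = shift_right(H, width=5)
print(hamming(H, shifted), "of", len(H), "cells differ after a one-column shift")
print(sum(H), "cells set before,", sum(shifted), "cells set after")
```

The shape is unchanged, yet many individual attribute values flip. Keep that in mind when you compare the two sets of results.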

Turn In:

· Soft copies of your results files (preferably with names that identify both the experiment and you).

· Answers to all questions above (either hard or soft copy).

NOTE:

I expect that this can be completed in class. If necessary, though, the Weka tool is freeware and can be downloaded from http://www.cs.waikato.ac.nz/~ml/weka/ . (Look for the Windows version and pick the appropriate download based on whether you already have the Java VM.)
Weka is a great tool. Anybody interested in data mining who has space on their PC should get it.