CSC 470 Data Mining Fall 2005 – Evening Section

Project: Data Mining Project (25%, plus 5% for Presentation)

 

Due: 12/07/05 AT THE START OF CLASS.

IF NOT TURNED IN AT THE START OF CLASS, YOU WILL BE CONSIDERED LATE!!!

 

Task:

The project may be done individually or in pairs. The project and presentation are related. Students will pick out some data that is of interest to them (personal or professional). They will choose a data mining goal with respect to the data. They will prepare the data, experiment with it, and determine results. Preparation may well require converting the data into ARFF file format before any other work is done.
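As a rough illustration of the ARFF conversion mentioned above, here is a minimal Python sketch that turns CSV text with a header row into ARFF text. The function names (`csv_to_arff`, `is_number`) and the sample data are made up for this example. The sketch assumes that a column is numeric if every non-missing value parses as a number and nominal otherwise, that empty cells are missing values (written as `?`), and that names and values contain no spaces, commas, or quotes (those would need quoting in a real ARFF file). Treat it as a starting point, not a required method.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (with a header row) into ARFF text.

    Assumptions of this sketch: a column is numeric if every non-missing
    value parses as a number, otherwise it is nominal; empty cells are
    missing values and are written as '?'.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]

    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        # Collect the non-missing values in this column.
        values = [row[col] for row in data if row[col] != ""]
        if values and all(is_number(v) for v in values):
            lines.append("@attribute %s numeric" % name)
        else:
            # Nominal attribute: list the distinct labels in sorted order.
            labels = sorted(set(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(labels)))

    lines.append("")
    lines.append("@data")
    for row in data:
        lines.append(",".join(cell if cell != "" else "?" for cell in row))
    return "\n".join(lines) + "\n"


# Hypothetical sample data, only to show how the function is called.
sample = "outlook,temperature,play\nsunny,85,no\novercast,,yes\nrainy,70,yes\n"
print(csv_to_arff(sample, "weather"))
```

If you convert your data this way, save both the original file and the resulting ARFF file, and note the conversion in your log.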

The presentation will discuss the task and goals, an analysis of the usefulness of the available methods for the task, a summary of results, and a conclusion. If the data is proprietary, be sure not to reveal proprietary aspects.

Start thinking about what you want to do (and about a partner, if you plan to work in a pair), and bounce ideas off me as soon as possible.

 

Hand In:

• A log of all experiments that you did. For each, include:

    o when it was done,

    o algorithm, including any non-default options,

    o data (filename, plus what preparation was done to it, e.g. whether attributes were discretized; a given filename needs explaining only once),

    o percent correct,

    o confusion matrix (for numeric prediction, give the average error instead of percent correct and the confusion matrix),

    o comments on the model generated,

    o comments on the relative success of the experiment,

    o what you plan to do next and why.

• Results files for each of the above experiments.

• All data files used in your experiments (i.e. when you change a dataset, make sure the starting and resulting datasets are both saved).

• A short paper discussing the data, what you were trying to accomplish with the data, highlights of your experiments, your conclusion, and what you would do next if continuing. 3-5 pages should be sufficient (the log will have all the details).

• Any visual aids used in your presentation.
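
To make the log measures above concrete, the following is a minimal Python sketch of how percent correct, a confusion matrix, and an average error could be computed from lists of actual and predicted values. Your data mining tool will normally report these for you. The function names and the sample values are invented for this example, and "average error" is interpreted here as mean absolute error, which is an assumption; other error measures may be equally acceptable.

```python
from collections import Counter


def confusion_matrix(actual, predicted, labels):
    """Return a matrix whose rows are actual classes and columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in labels] for a in labels]


def percent_correct(actual, predicted):
    """Return the percentage of instances whose prediction matches the actual class."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * hits / len(actual)


def mean_absolute_error(actual, predicted):
    """Return the mean absolute difference (one possible reading of 'average error')."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical classification results, only to show how the functions are called.
actual = ["yes", "yes", "no", "no", "yes"]
predicted = ["yes", "no", "no", "yes", "yes"]
print(percent_correct(actual, predicted))
print(confusion_matrix(actual, predicted, ["yes", "no"]))

# Hypothetical numeric predictions.
print(mean_absolute_error([1, 2, 3, 4], [2, 2, 3, 6]))
```

In the confusion matrix, the diagonal entries count correct predictions, so the off-diagonal entries show which classes are being confused with which.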