CSC 470 Data Mining, Spring 2004
Project
Data Mining Project: 25%, plus 5% for Presentation
Due: 04/22/04 AT THE START OF CLASS.
IF NOT TURNED IN AT THE START OF CLASS, YOU WILL BE CONSIDERED LATE!!!
Task:
The project may be done individually or in pairs. The project and presentation
are related. Students will pick some data that is of interest to them (personal
or professional). They will choose a data mining goal with respect to the data.
They will prepare the data, experiment with it, and determine results.
Preparation may well require converting the data into ARFF file format before
any other work is done.
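As a rough illustration of the conversion step, here is a minimal Python sketch that turns CSV text into ARFF text. It rests on several assumptions that are not part of the assignment: the data starts out as CSV with a header row, attribute names and values contain no spaces or commas (ARFF would need them quoted), and a column counts as numeric only if every non-missing value parses as a number. The function names and the sample data are made up for the example.

```python
import csv
import io


def is_number(text):
    """Return True if text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Convert CSV text (header row first) to ARFF text.

    A column is declared numeric if every non-missing value parses as a
    number; otherwise it becomes a nominal attribute listing its values.
    Empty cells are written as '?', the ARFF missing-value marker.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for col, name in enumerate(header):
        values = [row[col] for row in data if row[col] != ""]
        if values and all(is_number(v) for v in values):
            lines.append("@attribute %s numeric" % name)
        else:
            labels = sorted(set(values))
            lines.append("@attribute %s {%s}" % (name, ",".join(labels)))
    lines.append("")
    lines.append("@data")
    for row in data:
        lines.append(",".join(cell if cell != "" else "?" for cell in row))
    return "\n".join(lines) + "\n"


# Made-up sample data, for illustration only.
sample = "outlook,temperature,play\nsunny,85,no\novercast,83,yes\nrainy,,yes\n"
arff = csv_to_arff(sample, "weather")
print(arff)
```

Real data will usually need more care than this (dates, quoted strings, deciding which numeric codes are really nominal), so treat the sketch as a starting point and check the generated header by hand.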
The presentation will discuss the task and goals, an analysis of the usefulness
of the available methods for the task, a summary of results, and a conclusion.
If the data is proprietary, be sure not to reveal proprietary aspects.
Start thinking about what you want to do (and about a partner, if you plan to
work in a pair), and bounce ideas off me as soon as possible.
Hand In:
• A log of all experiments that you did. For each, include:
    o when done,
    o algorithm, including any (non-default) options,
    o data (filename, along with what preparation was done to it, e.g. whether
      attributes were discretized; you only need to explain a given filename
      once),
    o percent correct,
    o confusion matrix (if numeric prediction, average error instead of the
      previous two),
    o comments on the model generated,
    o comments on the relative success of the experiment,
    o what you plan to do next and why.
• Results files for each of the above experiments.
• All data files used in your experiments (i.e. when you change a dataset, make
  sure the starting and resulting datasets are both saved).
• A short paper discussing the data, what you were trying to accomplish with
  the data, highlights of your experiments, your conclusion, and what you would
  do next if continuing. 3-5 pages should be sufficient (the log will have all
  the details).
• Any visual aids used in your presentation.
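
For the experiment log described in the first item above, one possible way to keep entries consistent is sketched below in Python. This is only an illustration: the field names, the dictionary layout, and every value in the sample entry are hypothetical, and "average error" is taken here to mean mean absolute error, which is an assumption. Percent correct is computed from the confusion matrix, with rows as actual classes and columns as predicted classes.

```python
def percent_correct(confusion):
    """Percent of instances on the diagonal of a square confusion matrix.

    confusion[i][j] counts instances of actual class i predicted as class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


def mean_absolute_error(actual, predicted):
    """Average absolute difference, for numeric prediction experiments."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical log entry; every value below is made up for illustration.
entry = {
    "when": "2004-04-01",
    "algorithm": "decision tree, default options",
    "data": "weather.arff (no preparation)",
    "confusion_matrix": [[7, 2], [3, 2]],
    "model_comments": "",
    "success_comments": "",
    "next_step": "",
}
entry["percent_correct"] = percent_correct(entry["confusion_matrix"])
print(round(entry["percent_correct"], 1))

# For a numeric prediction experiment, record the average error instead.
mae = mean_absolute_error([1.0, 2.0, 4.0], [1.5, 2.0, 3.0])
print(mae)
```

A plain text file or spreadsheet with the same fields works just as well; what matters is that every experiment records all of the items listed.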