Course Expectations and Tentative Syllabus
CSC 470 Special Topics: Data Mining
Spring 2004
Olney 200 Thur 2:00-4:30pm
Professor: Dr. Michael Redmond
330 Olney Hall, (215) 951-1096
http://www.lasalle.edu/~redmond/470
Office Hours: MW 11am-12:50pm, Th 11am-12:30pm, and at other times by appointment.
Also available by phone and e-mail.
Text:
Witten, I. H., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations, Morgan Kaufmann, 2000, ISBN 1-55860-552-5
Course Description:
This course is an introduction to data mining, with an emphasis on applying machine learning techniques for data mining. Data Mining involves digging information or knowledge out of a mass of raw data. Time Magazine named “Data Miner” number 5 on its list of top 10 jobs for the new century. Some practical applications include credit risk analysis and database marketing.
Machine Learning involves the attempt to get computer programs to acquire skills that they were not specifically programmed for and/or to improve with experience. Popular methods include learning decision trees and decision tables, learning rules, and “lazy learning” (case-based reasoning). We will look in detail at several learning methods and their variations, and at how machine learning methods can be used for data mining, including which algorithms can be used productively for which tasks and which data.
Data preparation and evaluation of results will also be emphasized. The course work will involve some programming, as well as experimenting with public domain versions of famous machine learning/data mining programs to see how they work. Students will carry out a project with available data (probably using the public domain programs). This course counts for CSC and IT Elective credits.
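As a rough illustration of the “lazy learning” idea mentioned in the description, the sketch below stores cases and classifies a new one by copying the label of its nearest stored neighbor. It is a minimal sketch, not course material: the credit-risk cases, the attribute choices, and the function name `nearest_neighbor` are all hypothetical, and Euclidean distance is only one possible similarity measure.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def nearest_neighbor(cases, query):
    """Return the label of the stored case closest to `query`.

    No model is built in advance; all work happens at query time,
    which is why this style of method is called "lazy".
    """
    _, label = min(cases, key=lambda case: dist(case[0], query))
    return label


# Hypothetical cases: (income in $1000s, years at current job) -> label
cases = [
    ((25.0, 1.0), "high risk"),
    ((60.0, 8.0), "low risk"),
    ((32.0, 2.0), "high risk"),
    ((75.0, 12.0), "low risk"),
]

print(nearest_neighbor(cases, (58.0, 7.0)))
```

In practice the attributes would usually be rescaled first, since an attribute with a large numeric range (income here) otherwise dominates the distance.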
Grading:
Assignments (5) 25%
Midterm Exam 20%
Project 25%
Presentation 5%
Final Exam 25%
Final Grades:
A 92-100 A- 90-91
B+ 88-89 B 82-87 B- 80-81
C+ 78-79 C 72-77 C- 70-71
D+ 68-69 D 60-67 F < 60
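The weights and cutoffs above amount to a weighted average followed by a table lookup. The sketch below works through that arithmetic for one made-up set of scores. It assumes each component is scored out of 100 and that an unrounded total is compared against the lower bound of each range; the syllabus does not state a rounding rule, and make-up double-counting is not modeled.

```python
# Component weights in percent, as listed under Grading.
WEIGHTS = {
    "assignments": 25,
    "midterm": 20,
    "project": 25,
    "presentation": 5,
    "final": 25,
}

# Lower bound of each letter range, highest first.
CUTOFFS = [
    (92, "A"), (90, "A-"),
    (88, "B+"), (82, "B"), (80, "B-"),
    (78, "C+"), (72, "C"), (70, "C-"),
    (68, "D+"), (60, "D"),
]


def weighted_score(scores):
    """Combine component scores (each out of 100) into a course score."""
    return sum(WEIGHTS[part] * scores[part] for part in WEIGHTS) / 100


def letter_grade(score):
    """Map a course score to a letter (assumption: no rounding applied)."""
    for cutoff, letter in CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical student scores, for illustration only.
example = {"assignments": 90, "midterm": 80, "project": 95,
           "presentation": 100, "final": 85}
total = weighted_score(example)
print(total, letter_grade(total))
```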
No make-up exams unless arranged in advance. Make-ups may involve double-counting of the final exam. The final exam is cumulative, but will focus more heavily on the (previously untested) final half of the course.
The assignments will mainly involve specific tasks experimenting with some famous machine learning/data mining programs. The programs we will use will be public domain copies that are available over the Internet. Tasks will include data preparation, experimentation, and analysis of results. There will be one assignment involving writing a program, possibly implementing a well-known machine learning algorithm and applying it to some existing data. Details of assignments will be presented as the semester proceeds. Since this class only meets once a week, if you miss class make sure you find out whether any assignments were given.
The project may be done individually or in pairs. The project and presentation are related. Students will pick out some data that is of interest to them (personal or professional). They will choose a Data Mining goal with respect to the data. They will prepare the data, experiment with it, and determine results. The presentation will discuss the task and goals, an analysis of the usefulness of available methods for the task, a summary of results, and a conclusion. If the data is proprietary, be sure not to reveal proprietary aspects.
Materials:
You may need at least 2 diskettes or alternative media, such as CD-RW (the Olney 200 lab DOES NOT have zip drives). You may need access to Java and the WEKA data mining software outside of class. WEKA will be installed in Olney 200 and 200A, and may be downloaded for free from:
http://www.cs.waikato.ac.nz/ml/weka/
Course Objectives
Concepts:
1. The student should understand the types of problems being attacked by data mining, particularly through machine learning methods.
2. The student should understand the methods and techniques used to attack data mining problems, particularly machine learning methods.
3. The student should understand how the assumptions made influence the learning methods that can be used.
4. The student should understand various learning methods.
5. The student should understand the methods of evaluating data mining approaches and applications.
Applications:
1. The student should be able to prepare data for data mining – including putting data into a standard format.
2. The student should be able to identify machine learning algorithms that have the potential to address a given data mining task.
3. The student should be able to analyze the results of data mining experiments and come to conclusions.
4. The student should be able to write a program recreating an existing data mining method – demonstrating a detailed understanding of the method.
5. The student should be able to integrate concepts from the course to carry out a complete data mining project.
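To make the first application objective concrete, the sketch below writes a tiny data set in ARFF, the text format read by WEKA. The syllabus does not name the "standard format", so treating it as ARFF is an assumption; the helper `to_arff` and the weather-style data are hypothetical and cover only numeric and nominal attributes, with no quoting of values that contain spaces and no handling of missing values.

```python
def to_arff(relation, attributes, rows):
    """Render a small data set as ARFF text.

    attributes: list of (name, kind) pairs, where kind is the string
    "numeric" or a list of the allowed nominal values.
    rows: sequences of values, in the same order as `attributes`.
    """
    lines = ["@relation %s" % relation, ""]
    for name, kind in attributes:
        if isinstance(kind, str):
            # Simple type keyword such as "numeric".
            lines.append("@attribute %s %s" % (name, kind))
        else:
            # Nominal attribute: list the allowed values in braces.
            lines.append("@attribute %s {%s}" % (name, ",".join(kind)))
    lines += ["", "@data"]
    for row in rows:
        lines.append(",".join(str(value) for value in row))
    return "\n".join(lines) + "\n"


# Hypothetical data, for illustration only.
text = to_arff(
    "weather",
    [("outlook", ["sunny", "overcast", "rainy"]),
     ("temperature", "numeric"),
     ("play", ["yes", "no"])],
    [("sunny", 85, "no"), ("overcast", 83, "yes"), ("rainy", 70, "yes")],
)
print(text)
```

Saving the returned text to a file with an `.arff` extension should give something WEKA can open, provided the values in each row match the declared attributes.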
Tentative Course Plan:
Date | Material | Reading | Assignments (Tentative) |
Jan 15 | Intro to Class, Intro to Data Mining | | |
Jan 22 | Intro to Data Mining | Chapt 1 | |
Jan 29 | Concepts, Instances | Chapt 2 | |
Feb 5 | Attributes, Data Preparation | | Data Prep Assigned |
Feb 12 | Output Knowledge Representation | Chapt 3 | |
Feb 19 | MIDTERM | | |
Feb 26 | OneR | Section 4.1 | |
Mar 4 | SPRING BREAK – NO CLASS | | |
Mar 11 | Naïve Bayes | Section 4.2 | Mining 1 Assigned |
Mar 18 | Decision Trees and Decision Rules | Section 4.3, 4.4 | Mining 2 Assigned |
Mar 25 | Regression; Instance Based Learning | Section 4.6; 4.7 | Program Assigned |
Apr 1 | Evaluation | Chapt 5 | Evaluation Assigned |
Apr 8 | Evaluation (CLASSES MEET ON HOLY THURSDAY!!!, SORRY!) | | Project Assigned |
Apr 15 | Engineering input and output | Chapt 7 | |
Apr 22 | Engineering input and output / Project Presentations | | |
Apr 29 | Final Exam | | |