Course Expectations and Tentative Syllabus

 

CIS:658    Machine Learning Approaches to Data Mining    Fall 2007

Olney 201    Thursday 6:15-9:00pm

 

Professor:    Dr. Michael Redmond

330 Olney Hall, (215) 951-1096

redmond@lasalle.edu

http://www.lasalle.edu/~redmond/teach/658

 

Office Hours: Thursday 5-6pm, and at other times by appointment. Also available by phone and e-mail.

 

Text:

Witten, I. H., and Frank, E. Data Mining: Practical Machine Learning Tools and Techniques, Second Edition, Morgan Kaufmann, 2005, ISBN 0-12-088407-0

 

Course Description:

This course is an introduction to data mining, with an emphasis on applying machine learning techniques to data mining. Data mining involves digging information or knowledge out of a mass of raw data. In 1999, Time Magazine named “Data Miner” number 5 on its list of the top 10 jobs for the new century. In 2007, ComputerWorld named “Machine Learning” one of the “12 IT skills that employers can’t say no to.” Practical applications include credit risk analysis and database marketing.

Machine learning is the attempt to get computer programs to acquire skills that they were not specifically programmed for and/or to improve with experience. Popular methods include learning decision trees and decision tables, learning rules, and “lazy learning” (case-based reasoning). We will look in detail at several learning methods and their variations, and at how they can be used for data mining, including which algorithms can be used productively for which tasks and which kinds of data.

Data preparation and evaluation of results will also be emphasized. The course work will involve extensive experimentation with freely available versions of well-known machine learning/data mining programs to see how they work. Students will carry out a project with available data (probably using these freely available programs). Students will also choose between writing a small machine learning program that implements a known algorithm and writing a paper. This course counts toward CIS and ITL free elective credit.

 

Grading:

Assignments (4)        20%
Midterm Exam           20%
Program / Paper        10%
Project                20%
Presentation            5%
Final Exam             25%

 

Final Grades:

A    92-100        A-   90-91
B+   88-89         B    82-87        B-   80-81
C    60-79
F    < 60
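As a worked illustration of how the weights above combine, the sketch below computes a weighted course average and maps it to a letter grade. It assumes each component is scored out of 100 and that a fractional average falls in the band whose lower bound it meets (for example, 91.5 counts as A-); the syllabus does not say how fractional averages are rounded. The function names and sample scores are hypothetical.

```python
# Weights taken from the Grading list above (they sum to 100%).
WEIGHTS = {
    "assignments": 0.20,
    "midterm": 0.20,
    "program_or_paper": 0.10,
    "project": 0.20,
    "presentation": 0.05,
    "final_exam": 0.25,
}

# Lower bound of each letter band from the Final Grades scale, highest first.
# Assumption (not stated in the syllabus): a fractional average belongs to
# the band whose lower bound it meets, so 91.5 -> "A-" and 79.5 -> "C".
LETTER_BANDS = [
    (92, "A"),
    (90, "A-"),
    (88, "B+"),
    (82, "B"),
    (80, "B-"),
    (60, "C"),
]


def weighted_average(scores: dict[str, float]) -> float:
    """Combine component scores (each 0-100) using the syllabus weights."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(average: float) -> str:
    """Map a 0-100 average to a letter using the Final Grades scale."""
    for lower_bound, letter in LETTER_BANDS:
        if average >= lower_bound:
            return letter
    return "F"  # anything below 60


if __name__ == "__main__":
    # Hypothetical scores for one student.
    example = {
        "assignments": 95,
        "midterm": 84,
        "program_or_paper": 90,
        "project": 88,
        "presentation": 100,
        "final_exam": 86,
    }
    avg = weighted_average(example)
    print(f"{avg:.1f} -> {letter_grade(avg)}")
```

With these sample scores the average is 19 + 16.8 + 9 + 17.6 + 5 + 21.5 = 88.9, so the script should print `88.9 -> B+`.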

 

No make-up exams will be given unless arranged in advance. A make-up may take the form of double-counting the final exam. The final exam is cumulative but will focus more heavily on the second half of the course, which has not been tested previously.

The assignments will mainly involve specific experimental tasks using well-known machine learning/data mining programs. The programs we will use are freely available over the Internet. Tasks will include data preparation, experimentation, and analysis of results. Details of assignments will be presented as the semester proceeds. Since this class meets only once a week, if you miss class, make sure you find out whether any assignments were given.

There will also be a choice of writing a program or a paper. The program will probably involve implementing a well-known machine learning algorithm and applying it to some existing data. The paper will probably involve comparing and contrasting available data mining tools (multiple references are required). CIS students are strongly encouraged to choose the program option.

The project may be done individually or in pairs. The project and presentation are related. Students will select a data set that is of interest to them (personal or professional) and choose a data mining goal for that data. They will prepare the data, experiment with it, and determine results. The presentation should cover the task and goals, an analysis of the usefulness of the available methods for the task, a summary of results, and a conclusion. If the data is proprietary, be sure not to reveal proprietary details.

 

Materials: You may need access to Java and the WEKA data mining software outside of class (WEKA is written in Java). WEKA will be installed in Olney 200 and 200A, and it may be downloaded for free from:

http://www.cs.waikato.ac.nz/ml/weka/

If you choose the program option, you will need access to a programming environment such as Visual Studio or NetBeans. You will also need a way to hand in assignments that involve multiple files, probably as zipped archives created with WinZip or Windows compressed folders.

 

Course Objectives

 

Concepts:

 

1. The student should understand the types of problems being attacked by data mining, particularly through machine learning methods.

2. The student should understand the methods and techniques used to attack data mining problems, particularly machine learning methods.

3. The student should understand how the assumptions that are made influence which learning methods can be used.

4. The student should understand various learning methods in detail.

5. The student should understand the methods of evaluating data mining approaches and applications.

 

 

Applications:

 

1. The student should be able to prepare data for data mining, including putting data into a standard format.

2. The student should be able to identify machine learning algorithms that have the potential to address a given data mining task.

3. The student should be able to analyze the results of data mining experiments and draw conclusions.

4. (Option geared toward CIS) The student should be able to write a program that re-implements an existing data mining method, demonstrating a detailed understanding of the method.

5. (Option geared toward ITL) The student should be able to write a paper analyzing available data mining software in terms of capabilities, flexibility, robustness, cost, learning curve, and other issues relevant to choosing software for use in the enterprise.

6. The student should be able to integrate concepts from the course to carry out a complete data mining project.

 

 


 

Tentative Course Plan:

 

Date      Material                                              Reading             Assignments (Tentative)
Aug 30    Intro to Class, Intro to Data Mining
Sept 6    Intro to Data Mining                                  Chapter 1
Sept 13   Concepts, Instances                                   Chapter 2           Data Prep Assigned
Sept 20   Attributes, Data Preparation
Sept 27   Output: Knowledge Representation                      Chapter 3           Mining 1 Assigned
Oct 4     MIDTERM
Oct 11    OneR                                                  Section 4.1         Program/Paper Assigned
Oct 18    Naïve Bayes                                           Section 4.2         Mining 2 Assigned
Oct 25    Decision Trees and Decision Rules                     Sections 4.3, 4.4   Project Assigned
Nov 1     Regression                                            Section 4.6
Nov 8     Instance-Based Learning                               Section 4.7
Nov 15    K-Means Clustering                                    Section 4.8
Nov 22    THANKSGIVING – NO CLASS
Nov 29    Evaluation                                            Chapter 5           Evaluation Assigned
Dec 6     Engineering Input and Output / Project Presentations  Chapter 7
Dec 13    Final Exam