CIS 624               Spring 2008                  Assignment 1 – Data Modeling

 

Assigned: 03/20/08

Due: Start of Class on 03/27/08

 

Task:

Answer the following questions. Your answer should be neat and easy to read.

 

Situation:

Imagine a university that is building a data warehouse around the business process of “registering students”.  They have an existing 3NF Database (design attached) as a source of data. Brief explanation: Each Student may enroll in several sections, each of which can enroll several students. Each section has a room and a professor, while the room can be used for multiple sections, and each professor can teach multiple sections. The section is one of many for a catalog course. The catalog courses can have many prerequisites and be a prerequisite for multiple courses as well.

 

A)      Two candidate fact table grain sizes are: a registration of a single student for a single section for a given semester OR registrations for a particular course for a particular semester. What are the advantages and disadvantages of each?

 

Suppose that the grain size is chosen to be registration of a single student for a single section for a given semester, and subsequently, the attached star schema has been developed. There are dimensions: Student, Room, Course, Faculty, Date, and Time.

 

B)      Suppose the fact attributes are chosen to be: numCourses (always 1 for each record), numCredits, numCreditsThisSemester. How additive are each of the fact attributes?

 

C)      The student dimension includes the student’s major. If this attribute is modeled as a slowly changing dimension using Kimball et al’s “type 2” method – create a new record after a change, briefly explain how this method yields correct results for BOTH questions such as 1) show the number of credits enrolled by semester and student major; and 2) show the number of credits enrolled by student year in school. You should makes clear why both questions will be correctly answered using this strategy. 

 

D)      Suppose the student’s “home state” attribute is modeled as a slowly changing dimension using Kimball et al’s “type 1” method - if an already enrolled student moves to another state, this is handled by replacing the old value with the new value. Briefly explain the problem that this presents for the question: show enrolled credits by year by home state.

 

Suppose the  university decided to create a “junk dimension” containing Way Registered (possible values – web, in person, by administrator) and When Registered (possible values – early, just before, first week) and Grade Earned (possible values – A, A-, B+, B, B-, C+, C, C-, D+, D, F).

 

E)      How many records would there be in the junk dimension? Briefly Explain.

 

F)       Briefly explain why this might be preferable to putting these in separate dimensions?

 

G)      Briefly explain why putting these in separate dimensions might be preferable