CIS 624 Fall 2002 Assignment 3 – Aggregates

Due: Start of Class on 11/14/02

Task:

Answer the following questions. Your answer should be neat and easy to read.

Imagine that a McD’s is building a data mart surrounding “customer purchases” using a star schema design. The fact table contains order items (fact attributes – price, discount amount, quantity, itemsubtotal), and dimensions include: store, cashier, product, promotion, and time (grain size of seconds). Suppose McD’s has 10,000 stores, which average 30 cashiers each, 20 products, 40 known promotions, and plans on keeping data for 5 years time. Further suppose that fact table records will be 36 bytes each. Further suppose that the average cashier handles 2400 order items per week.

A) estimate the amount of disk storage needed for the fact table. If you believe you need to make further assumptions, please ask me.

B) If store dimension records are 100 bytes each, cashier records are 60 bytes each, product records are 120 bytes each, promotion records are 20 bytes each, and time records are 250 bytes each, how much disk store is needed for dimension tables?

C) explain how data sparsity makes creating the data mart practical.

D) Suppose an aggregate is being created for order item purchases by sales district (a total of 300 districts) by product, by month. What would the aggregate data look like - What tables will there be? Show hypothetical sample fact table records to illustrate.

E) For the aggregate discussed in (D), estimate the amount of disk storage needed for the aggregate fact table. Assume each record will be 28 byes. If you believe you need to make further assumptions, please ask me.

F) Theoretically, the aggregate fact table should have 33 (stores collapsed into districts) * 40 (promotions collapsed entirely) * 2,592,000 (seconds collapsed into months) fewer records than in the atomic fact table. Why is this unlikely to be true?