Task:
Answer the following questions. Your answer should be neat and easy to read.
Imagine
that a McD’s is building a data mart surrounding “customer purchases” using a
star schema design. The fact table contains order items (fact attributes –
price, discount amount, quantity, itemsubtotal), and dimensions include: store,
cashier, product, promotion, and time (grain size of seconds). Suppose McD’s
has 10,000 stores, which average 30 cashiers each, 20 products, 40 known
promotions, and plans on keeping data
for 5 years time. Further suppose that fact table records will be 36 bytes
each. Further suppose that the average cashier handles 2400 order items per week.
A) estimate the amount of disk storage
needed for the fact table. If you
believe you need to make further assumptions, please ask me.
B) If store dimension records are 100 bytes
each, cashier records are 60 bytes each, product records are 120 bytes each,
promotion records are 20 bytes each, and time records are 250 bytes each, how
much disk store is needed for dimension tables?
C) explain how data sparsity makes creating
the data mart practical.
D) Suppose an aggregate is being created for
order item purchases by sales district
(a total of 300 districts) by product, by month. What would the
aggregate data look like - What tables
will there be? Show hypothetical sample fact table records to illustrate.
E) For the aggregate discussed in (D),
estimate the amount of disk storage needed for the aggregate fact table. Assume
each record will be 28 byes. If you
believe you need to make further assumptions, please ask me.
F)
Theoretically,
the aggregate fact table should have 33
(stores collapsed into districts) * 40 (promotions collapsed entirely) *
2,592,000 (seconds collapsed into months) fewer records than in the atomic fact
table. Why
is this unlikely to be true?