Task:
Answer the following questions. Your answer should be neat and easy to read.
Imagine
that Universal Record Company (URC) is building a data mart surrounding
“digital download purchases” using a star schema design. The fact table
contains downloaded items (fact attributes – price, and discount amount), and
dimensions include: customer, location of sale (www site or store), recording,
promotion, and time (grain size of minutes). Suppose URC has agreements with
1,000 stores to sell downloads from kiosks in their store, and 10 WWW sites to
sell over the internet. Further, suppose that URC currently sells downloads of
5000 recordings, and has 100,000 known customers, 20 known promotions, and plans on keeping data for 5 years time.
Further suppose that fact table records will be 28 bytes each. Further suppose
that the average customer buys 20 recordings per year.
A) estimate the amount of disk storage
needed for the fact table. If you
believe you need to make further assumptions, please ask me.
B) If customer records are 60 bytes each,
location records are 70 bytes each, recording records are 100 bytes each,
promotion records are 20 bytes each, and time records are 250 bytes each, how
much disk store is needed for dimension tables?
C) explain how data sparsity makes creating
the data mart practical.
D) Suppose an aggregate is being created for
recording download purchases by location region (a total of 10 regions) by genre ( a total of 10), by month. What
would the aggregate data look like -
What tables will there be? Show a few hypothetical sample fact table
records to illustrate.
E) For the aggregate discussed in (D),
estimate the amount of disk storage needed for the aggregate fact table. Assume
each record will be 20 bytes. If you
believe you need to make further assumptions, please ask me.
F)
Theoretically,
the aggregate fact table should have
101 (locations collapsed into regions) * 500 (recordings collapsed in
genres) * 20 (promotions collapsed entirely) * 100,000 (customers collapsed
entirely) * 43,200 (minutes collapsed into months) fewer records than in the
atomic fact table. Why is this unlikely to be true?