CIS 624                 Fall 2003            Assignment 3 – Aggregates

 

Due: Start of Class on 11/20/03

 

Task:

               Answer the following questions. Your answer should be neat and easy to read.

 

Imagine that Universal Record Company (URC) is building a data mart surrounding “digital download purchases” using a star schema design. The fact table contains downloaded items (fact attributes – price, and discount amount), and dimensions include: customer, location of sale (www site or store), recording, promotion, and time (grain size of minutes). Suppose URC has agreements with 1,000 stores to sell downloads from kiosks in their store, and 10 WWW sites to sell over the internet. Further, suppose that URC currently sells downloads of 5000 recordings, and has 100,000 known customers, 20 known promotions,  and plans on keeping data for 5 years time. Further suppose that fact table records will be 28 bytes each. Further suppose that the average customer buys 20 recordings per year.

A)     estimate the amount of disk storage needed for the fact table.  If you believe you need to make further assumptions, please ask me.

B)     If customer records are 60 bytes each, location records are 70 bytes each, recording records are 100 bytes each, promotion records are 20 bytes each, and time records are 250 bytes each, how much disk store is needed for dimension tables?

C)     explain how data sparsity makes creating the data mart practical.

D)    Suppose an aggregate is being created for recording download purchases by location region  (a total of 10 regions) by genre ( a total of 10), by month. What would the aggregate data look like -  What tables will there be? Show a few hypothetical sample fact table records to illustrate.

E)     For the aggregate discussed in (D), estimate the amount of disk storage needed for the aggregate fact table. Assume each record will be 20 bytes.  If you believe you need to make further assumptions, please ask me.

F)     Theoretically, the aggregate fact table should have  101 (locations collapsed into regions) * 500 (recordings collapsed in genres) * 20 (promotions collapsed entirely) * 100,000 (customers collapsed entirely) * 43,200 (minutes collapsed into months) fewer records than in the atomic fact table. Why is this unlikely to be true?