CIS 624                 Fall 2004               Assignment 3 – Aggregates

 

Due: Start of Class on 11/16/04

 

Task:

               Answer the following questions. Your answer should be neat and easy to read.

 

1) Imagine that a bank is building a data mart surrounding “In branch transactions” using a star schema design. The fact table contains individual transactions (fact attribute – amount), and dimensions include: branch, customer, account type, transaction type (deposit, withdrawal, … ), teller, date, and time (grain size of seconds). Suppose the bank has 250,000 customers, 30 branches, 7 different account types, 6 different transaction types, 140 tellers, and plans on keeping data for 5 years time. Further suppose that fact table records will be 32 bytes each. If the average customer makes 3 transactions per week,

A)    estimate the amount of disk storage needed for the fact table.  If you believe you need to make further assumptions, please ask me.

B)     If customer records will be 200 bytes each, branch records will be 50 bytes each, account type records will be 30 bytes, transaction type records will be 15 bytes each, teller records will be 75 bytes each, and date records will be 55 bytes each, and second records will be 30 bytes each, how much disk space is needed for dimension tables?

C)     explain how data sparsity makes creating the data mart practical.

D)    Suppose an aggregate is being created for

·        transactions by state the branch is in (5 different states with branches), by account type, by transaction type.

What would the aggregate data look like -  What tables will there be? Show hypothetical sample fact table records to illustrate.

E)     For the aggregate discussed in (D), estimate the amount of disk storage needed for the aggregate fact table. Assume each record will be 20 bytes.  If you believe you need to make further assumptions, please ask me.

F)     Theoretically, the aggregate fact table should have :

·        6 (branches collapsed into states) * 250,000 (customers collapsed entirely) * 140 (tellers collapsed entirely) * 1825 (dates collapsed entirely) * 86,400 (seconds collapsed entirely) fewer records than in the atomic fact table.

Why is this unlikely to be true?