Midterm Answers


Test Form A

  1. C
  2. E
  3. E
  4. B
  5. E
  6. Fact
  7. Smart Keys
  8. Meta Data
  9. Householding
  10. Slowly Changing
  11. False – Fact tables are frequently normalized. Dimension tables may even be normalized even though we don’t seek to normalize them (if no snow flake tables need to be denormalized)
  12. True
  13. False – There are methods that will help – fuzzy matching algorithms, sorting records using DBMS, … there are even commercial tools to help with this important data staging step
  14. True
  15. False – there are 4 other readiness factors – the most crucial one is to have a strong business sponsor

Both Forms

  1. (at least 2)

A)    So that DW doesn’t interfere with operational system performance

B)    So that the operational system doesn’t interfere with DW performance

C)    Because the data models should be different, since operational systems need to effectively handle updates, while DW must efficiently handle retrieval

D)    Because data warehouse may need integration of information in different operational systems that themselves cannot be integrated

E)     Because data cleaning may be necessary for effective DW that politically or technically cannot be done on the operational system

F)    

  1.  By taking on just one part of the business (one business process), the project will be less complex and will be finished more quickly than if a more complete and ambitious project were attempted.  This greatly increases the chances that you can have some success / provide value to the organization before too much money is spent. If large amounts of money are spent before any value is obtained, the project may be cancelled.
  2. “In search of demand” refers to the situation in which the “business side” doesn’t know how the DW will benefit them. The DW initiative may have come from the technical side (IT). It is a problem because it might make it hard to find requirements. Also, if the business side does not see the benefit, they won’t use the DW and the money spent will be wasted
  3. Costs must be estimated – including hardware, software, project members time, training, support.

Benefits must be estimated – including benefits of better decision making, increased sales, better customer service,  reduced costs, …

Costs and benefits must be compared in a organizational standard way such as ROI that allows comparing costs and benefits of the DW project with other possible uses of the money to be invested.

  1.  

A)    This is a reasonable question. It concerns the person’s job. Presuming that the organization has a way of measuring performance, the manager should know how he is being evaluated and should be motivated to track those measures in order to improve his/her performance. The DW can provide the manager a way to track how s/he is doing. Identifying these measures will lead to identifying facts that should go in the fact table.

B)    The former is a smaller grain size – a record is revenue from a single room, whereas the latter is non-atomic – a record is summarized revenue from all rooms of a given type (for a given night).  In the latter, since the fact table record involves multiple rooms with multiple customers, one FK would have to be to a room type rather than to an individual room, and there is no chance to have a FK to a customer dimension. Thus a major advantage of the former is greater flexibility in what queries can be answered (e.g. questions about customers, and, if interesting, individual rooms). The only significant disadvantage of the former is that significantly more disk space is needed for the fact table.

It is WRONG to say that the former can not look at summarized data for different room types. The FK from the fact table will be to an individual room record in the room dimension table, but one of the attributes in the room table will be the room type. Thus you can query based on it – fact table records will be added up to get the answer (or an aggregate might even be pre-calculated)

C)    (at least 3 – with additivity specified and explained)

The following are potentially fully additive facts – each makes sense added across any dimension

·         Room price paid – adds up to revenue by any dimension attribute

·         Room rented (factless – all records with value 1) – adds up to number of rooms rented by any dimension attribute

·         Charge for movies

·         Charge for phone calls

·         Restaurant Charged meals

·         Room service meals

·         Snack/mini bar charges

·         Discounts given – adds up to total discounts given by any dimension attribute

The following could be justified more than one way with proper argument:  

·         Number of Guests (occupants) – it could be interesting to add up number of guests by dimensions – how many people were served (by room type, by hotel, by home state, …). However, it might also be more interesting to look at averages (what is the average number of people who stay in King Suites. Double, etc; what is the average number of people who stay when travel is for business, leisure, etc)

The following is non additive:

·         Length of stay – each record is for one night’s rental. So a person who stays for two nights will result in two records, each with length of stay = 2. Adding these up will result in double counting. More meaningful is averaging, maybe finding maxes.

D)    Previous stays in which they stayed as a smoking customer will now be incorrectly attributed to non-smoking – making totals for smoking too low and non-smoking too high

E)     NOTE – the question explicitly says “ignoring business requirements”, so attempts to predict business requirements are not answering the question. Hence we are left with numeric considerations; we look for sharp decreases in number of values – saving a lot of disk accesses if a query is asked. Kimball et al’s rule of thumb is to look for an order of magnitude (factor of 10) decrease. From room nights to rooms, decrease is factor of 150 – a good place (numerically) to put an aggregate is “by room”. From room to hotel, the decrease is  a factor of  200 – so another good place (numerically) to put an aggregate is “by hotel”. From hotel to state, the decrease is by a factor of 4, so benefit is not that great – getting results by state if we have results by hotel only requires adding up (on the average) 4 numbers to get each state total. From state to region, the decrease is only a by a factor of 8.3. Again the benefit is not that great. From region to the whole company, the decrease is only by a factor of 6. Again the benefit would not be that great. Hence, (numerically) the aggregates to create include “by room” and “by hotel”.

  1. Table should be as described (shown below). Answer to question: “Newtown”