Midterm Answers
Test Form A
Both Forms
A) So that DW doesn’t interfere with operational system performance
B) So that the operational system doesn’t interfere with DW performance
C) Because the data models should be different, since operational systems need to effectively handle updates, while DW must efficiently handle retrieval
D) Because data warehouse may need integration of information in different operational systems that themselves cannot be integrated
E) Because data cleaning may be necessary for effective DW that politically or technically cannot be done on the operational system
F) …
Benefits must be estimated – including benefits of better decision making, increased sales, better customer service, reduced costs, …
Costs and benefits must be compared in a organizational standard way such as ROI that allows comparing costs and benefits of the DW project with other possible uses of the money to be invested.
A) This is a reasonable question. It concerns the person’s job. Presuming that the organization has a way of measuring performance, the manager should know how he is being evaluated and should be motivated to track those measures in order to improve his/her performance. The DW can provide the manager a way to track how s/he is doing. Identifying these measures will lead to identifying facts that should go in the fact table.
B) The former is a smaller grain size – a record is revenue from a single room, whereas the latter is non-atomic – a record is summarized revenue from all rooms of a given type (for a given night). In the latter, since the fact table record involves multiple rooms with multiple customers, one FK would have to be to a room type rather than to an individual room, and there is no chance to have a FK to a customer dimension. Thus a major advantage of the former is greater flexibility in what queries can be answered (e.g. questions about customers, and, if interesting, individual rooms). The only significant disadvantage of the former is that significantly more disk space is needed for the fact table.
It is WRONG to say that the former can not look at summarized data for different room types. The FK from the fact table will be to an individual room record in the room dimension table, but one of the attributes in the room table will be the room type. Thus you can query based on it – fact table records will be added up to get the answer (or an aggregate might even be pre-calculated)
C) (at least 3 – with additivity specified and explained)
The following are potentially fully additive facts – each makes sense added across any dimension
· Room price paid – adds up to revenue by any dimension attribute
· Room rented (factless – all records with value 1) – adds up to number of rooms rented by any dimension attribute
· Charge for movies
· Charge for phone calls
· Restaurant Charged meals
· Room service meals
· Snack/mini bar charges
· Discounts given – adds up to total discounts given by any dimension attribute
The following could be justified more than one way with proper argument:
· Number of Guests (occupants) – it could be interesting to add up number of guests by dimensions – how many people were served (by room type, by hotel, by home state, …). However, it might also be more interesting to look at averages (what is the average number of people who stay in King Suites. Double, etc; what is the average number of people who stay when travel is for business, leisure, etc)
The following is non additive:
· Length of stay – each record is for one night’s rental. So a person who stays for two nights will result in two records, each with length of stay = 2. Adding these up will result in double counting. More meaningful is averaging, maybe finding maxes.
D) Previous stays in which they stayed as a smoking customer will now be incorrectly attributed to non-smoking – making totals for smoking too low and non-smoking too high
E) NOTE – the question explicitly says “ignoring business requirements”, so attempts to predict business requirements are not answering the question. Hence we are left with numeric considerations; we look for sharp decreases in number of values – saving a lot of disk accesses if a query is asked. Kimball et al’s rule of thumb is to look for an order of magnitude (factor of 10) decrease. From room nights to rooms, decrease is factor of 150 – a good place (numerically) to put an aggregate is “by room”. From room to hotel, the decrease is a factor of 200 – so another good place (numerically) to put an aggregate is “by hotel”. From hotel to state, the decrease is by a factor of 4, so benefit is not that great – getting results by state if we have results by hotel only requires adding up (on the average) 4 numbers to get each state total. From state to region, the decrease is only a by a factor of 8.3. Again the benefit is not that great. From region to the whole company, the decrease is only by a factor of 6. Again the benefit would not be that great. Hence, (numerically) the aggregates to create include “by room” and “by hotel”.