A dimensional model divides the world into measurements and context. Context intuitively divided into clumps called dimensions. Dimensions describe the "who, what, where, when, why, and how" of the facts. They can also identified as the "by" words in a business question that asks for a report.
Dimension Tables
Contain the parameters by which the fact table measures are analyzed
Provide the context to the fact table measures they describe
Contain descriptors of the business (nouns)
Utilize business terminology
Have many large columns
Contain textual and discrete data
Are usually smaller than fact tables
Have a single column surrogate primary key (called the warehouse dimension key)
Are joined to a fact table through a foreign key reference to their primary key
Can contain one or more hierarchies
The hierarchies are de-normalized into the dimension tables
Surrogate key
A system assigned primary key. When the first row is added to a dimension, the system automatically assigns a key of 1 to the row. As each additional row is added, the system automatically increments the key by 1. It's meaningless, but essential as a foreign key in fact tables.
Warehouse Dimension Keys
Single column surrogate keys
Provide key control within the data warehouse
Substantially improve performance
Enable one method of tracking attribute history
Facilitate exception references from a fact table
Surrogate keys are implemented in every dimension, even date and time dimensions
Exception Condition Dimension Table Rows
0 - the fact table row had an invalid legacy id for this dimension (Invalid)
-1 - The fact table row should reference a value for this dimension, but the value is unknown (MissingMandatory)
-2 - The fact table row is not applicable for this dimension (MissingOptional)
Dimension Table Classifications and Examples
Date based (Date Dim, Month Dim)
Time based (Time Dim)
Business Entities (Store Dim, Customer Dim)
Analytical Profiles (Customer Demographics Dim)
Date Based Dimensions
Contain attributes like Sale Date, Sales Calendar Month, Sales Calendar Quarter, Sales Calendar Year, Sales Fiscal Month, Sales Fiscal Quarter, Sales Fiscal Year, Start Date, End Date, Arrival Date, Departure Date, Order Date, Ship Date
The Date Dimension Family
Implement the family and name each dimension for its granularity - Date Dim, Week Dim, Month Dim, Quarter Dim, Year Dim
The Date Dimension Family uses at least one character column for date and puts in all attributes that simplify analysis. It also enables all date functions (add, subtract, etc).
Enterprise-Wide Events Dimension
Contains attributes like Special Event Description (World Cup final game, Significant stock market fall/rise, Interest rate raise/drop, Gasoline price increase/decrease)
Reporting Periods Dimension
Contains attributes like MTD IND (Month to Date Indicator), YTD IND (Year to Date Indicator), Current 3MTH IND, Current 12MTH IND
Simpler, less error-prone filters using Reporting Periods Dimension
SELECT sum(sales) FROM fact <join statements> WHERE current_3mth_ind = 'YES'
Role-Playing Dimensions
Entity taking on different roles or uses for the same entity. Examples: Order date, Ship date, Bill to customer, Ship to customer, Flight origination airport, Flight destination airport.
The wrong way to model role-playing dimensions is to have the same column headings for different roles, making them indistinguishable.
Dimensional Normal Form (DNF)
A creative and practical approach to design Dimension Table Families. Fact tables are highly normalized for maintainability and flexibility. Dimensions have their hierarchies de-normalized into them for usability and performance. The schema is limited to two levels - a single first level or central highly normalized table called a fact table, and multiple second level tables called dimension tables linked to the first level table in primarily one to many relationships.
Snowflaking, where normalized tables are used in the dimensional model to break dimension hierarchies into normalized tables connected by foreign key - primary key relationships, is bad because it leads to many joins which can impact performance.
Hotel Business Overview
Maitutulog Mo Kaya Hotel (MMK Hotel) is composed of 500 hotels with three property types (luxury, economy, budget) and different room types. The goal is to maximize utilization and profit.
Hotel Business Strategic Planning
1. Determine the most profitable hotel type and room type mixes
2. Can I just build more of the most profitable hotel types or can I just put in more of the most profitable room types
3. Is there a saturation point that would make utilization and profit go down
4. What is the competition doing that will cause a change in my occupancy rates
5. How do rates, discounts, promotions, campaigns, and advertising affect my profit
Daily Room Type Profitability Analysis
1. What room types have the highest/lowest profitability across the chain, by property type
2. Which hotels have room type profitability different from the norm
3. How does weekend profitability compare with weekdays
4. How does weekday profitability differ by day
The challenge is to build the Logical Data Models for Daily Room Billing and Daily Utilization and Profitability based on the Hotel Property Management System Source Data 3NF
Dimensional modeling
A data warehousing approach that organizes data into measurements (facts) and context (dimensions) for easier analysis and understanding of relationships between data elements
Measurements (facts)
Numerical values that are typically the focus of analysis, such as sales revenue, product quantity, or employee hours worked
Context (dimensions)
Additional information about measurements, describing the "who, what, where, when, why, and how" of the facts, such as date, product, region, and salesperson
dimensions
Can be identified as the "by" words in a business question that asks for a report, such as "by region" or "by product category"