chapter 3

Cards (81)

  • Business Process
    A low-level activity performed by an organization, such as taking orders, invoicing, receiving payments, handling service calls, registering students, performing a medical procedure, or processing claims
  • Grain
    Specifying exactly what an individual fact table row represents. It also provides the answer to the question, "How do you describe a single row in the fact table?"
  • Dimensions
    Dimension tables contain the descriptive attributes used by BI applications for filtering and grouping the facts. Dimensions fall out of the question, "How do business people describe the data resulting from the business process measurement events?"
  • Facts
    Facts are determined by answering the question, "What is the process measuring?". Typical facts are numeric additive figures, such as quantity ordered or dollar cost amount
  • Promotions
    Temporary price reductions, ads in newspapers and newspaper inserts, displays in the grocery store, and coupons
  • Date Dimension
    A special dimension because it is the one dimension nearly guaranteed to be in every dimensional model since virtually every business process captures a time series of performance metrics
  • Flags and Indicators as Textual Attributes
    Dimension table attributes serve as report labels and values in pull-down query filter lists, this indicator should be populated with meaningful values. Instead of using codes such as N/Y, True/False, 0/1, the use of textual attributes could make the report much more efficient and easier to understand
  • Current and Relative Date Attributes
    Most date dimension attributes are not subject to updates. However, there are attributes you can add to the basic date dimension that will change over time, including IsCurrentDay, IsCurrentMonth, IsPrior60Days, and so on
  • Time-of-Day as a Dimension or Fact
    If you want to filter or roll up time periods based on summarized day part groupings, such as activity during 15-minute intervals, hours, shifts, lunch hour, or prime time, time-of-day would be treated as a full-fledged dimension table. If there's no need to roll up or filter on time-of-day groupings, time-of-day should be handled as a simple date/time fact in the fact table
  • Product Dimension
    The product dimension describes every SKU in the grocery store. The product dimension is almost always sourced from the operational product master file. SKU or stock keeping unit allows the vendors to keep track of their inventory
  • Flatten Many-to-One Hierarchies
    The merchandise hierarchy is an important group of attributes. Typically, individual SKUs roll up to brands, brands roll up to categories, and categories roll up to departments. Each of these is a many-to-one relationship
  • Attributes with Embedded Meaning

    Often operational product codes, identified in the dimension table by the NK notation for natural key, have embedded meaning with different parts of the code representing significant characteristics of the product
  • Numeric Values as Attributes or Facts
    If the numeric value is used primarily for calculation purposes, it likely belongs in the fact table. If the stable numeric value is used predominantly for filtering and grouping, it should be treated as a product dimension attribute
  • Drilling Down on Dimension Attributes
    Drilling down in a dimensional model is nothing more than adding row header attributes from the dimension tables. This would give you a more detailed view from the summarized data
  • Store Dimension
    Store Dimension describes every store in the grocery chain. The store dimension is the primary geographic dimension. Each store can be thought of as a location. You can roll stores up to any geographic attribute, such as ZIP code, county, and state in the United States
  • Promotion Dimension
    Promotion Dimension refers to the conditions under which a product is sold, such as temporary price reductions, end aisle displays, newspaper ads, and coupons. Also known as causal dimension, it describes factors thought to cause a change in product sales
  • Null Foreign Keys, Attributes, and Facts
    No null FKs in fact tables as they violate referential integrity. Nulls as metrics in fact tables are properly handled in aggregate functions such as SUM, MIN, MAX, COUNT, and AVG which do the "right thing" with nulls. Substituting a zero instead would improperly skew these aggregated calculations
  • Degenerate Dimensions for Transaction Numbers
    No attributes other than a key, serving as a grouping key for products purchased in a single market basket transaction, used when the grain of a fact table represents a single transaction or transaction line, representing the unique identifier of the parent, refers to empty dimensions in transaction fact tables, representing operational transaction control numbers like order, invoice, and bill-of-lading numbers
  • Factless Fact Table
    A factless fact table is a table that contains only foreign keys to dimensions, but no numeric facts. It records the relationship between the keys defined by a promotion, independent of other events like actual product sales
  • Dimension Table Surrogate Keys
    Every join between dimension and fact tables in the data warehouse should be based on meaningless integer surrogate keys. Advantages include buffering the data warehouse from operational changes, integrating multiple source systems, improving performance, handling null or unknown conditions, and supporting dimension attribute change tracking
  • Dimension Natural and Durable Supernatural Keys
    Operational source systems use natural keys, also known as business, production, or operational keys, which are identified using NK notation. To ensure the preservation of natural keys, the ETL system must assign permanent durable identifiers, also known as supernatural keys
  • Degenerate Dimension Surrogate Keys
    These keys can be assigned to dimension tables, ensuring that control number dimensions are no longer degenerate
  • Date Dimension Smart Keys
    The primary key of the date dimension is typically a meaningful integer formatted as yyyymmdd, which is not intended for bypassing the date dimension and directly querying the fact table. It is useful for partitioning fact tables, allowing for easy data removal and loading without disturbing the rest of the table
  • Fact Table Surrogate Keys
    Surrogate keys are recommended for dimension tables, but not for fact tables. Fact table surrogate keys are useful for ETL processing in back room, offering benefits like immediate unique identification, backing out or resuming a bulk load, replacing updates with inserts plus deletes, and using the fact table surrogate key as a parent in a parent/child schema
  • Snowflake Schemas with Normalized Dimensions
    A form of dimensional modeling in which dimensions are stored in multiple tables, with each table representing a level in a dimensional hierarchy
  • Date Dimension Smart Keys
    • The primary key of the date dimension is typically a meaningful integer formatted as yyyymmdd, which is not intended for bypassing the date dimension and directly querying the fact table
    • It is useful for partitioning fact tables, allowing for easy data removal and loading without disturbing the rest of the table
  • Fact Table Surrogate Keys
    • Surrogate keys are recommended for dimension tables, but not for fact tables
    • Fact table surrogate keys are useful for ETL processing in back room, but offer back-room benefits such as immediate unique identification, backing out or resuming a bulk load, replacing updates with inserts plus deletes, and using the fact table surrogate key as a parent in a parent/child schema
  • Snowflake Schemas with Normalized Dimensions is a form of dimensional modeling in which dimensions are stored in multiple related dimension tables
  • Snowflaking
    Redundant attributes are removed from the flat table and placed in separate normalized tables
  • Snowflaking is a legal extension of the dimensional model, however, it is discouraged due to the negative impacts on ease of use, performance, and the ability to browse within a dimension
  • Outriggers are tables or entities that are shared by more than one dimension
  • Centipede Fact Tables with Too Many Dimensions are fact tables surrounded by many dimension tables that have been (overly) normalized to avoid "snowflaking"
  • Most business processes can be represented with less than 20 dimensions, and if 25 or more dimensions are used, correlated dimensions should be combined into a single dimension
  • Inventory Periodic Snapshot
    • Stores the current state of data at a regular interval of time
    • Faces the challenge of generating dense snapshot tables as the retailer strives to avoid out-of-stock situations
  • Inventory Transactions
    • Records every transaction that affects inventory, including receiving, inspecting, returning, picking, shipping, and removing products
  • Inventory Accumulating Snapshot
    • One row is placed in the fact table when a particular product is received at the warehouse, and the disposition of the product is tracked on this single fact row until it leaves the warehouse
  • Fact Table Types
    • Transaction
    • Periodic Snapshot
    • Accumulating Snapshot
  • Transaction Fact Tables
    Represent an event that occurred at an instantaneous point in time, with a row for each transaction event
  • Periodic Snapshot Fact Tables

    Take a picture of the activity at the end of a day, week, or month, stacking the snapshots consecutively
  • Accumulating Snapshot Fact Tables
    Represent processes that have a definite beginning and definite end together with a standard set of intermediate process steps