DW

Cards (49)

  • The Kimball Lifecycle focuses on adding business value across the enterprise and dimensionally structures the data that's delivered to the business. It uses iterations and increments in a manageable lifecycle to do this.
  • There have been two main approaches to building data warehouses with data marts. The first is an approach by Bill Inmon and the second is the approach by Ralph Kimball.
  • Bill Inmon’s approach works this way: a) The enterprise data warehouse (EDW) should be in at least 3rd normal form. b) But the data marts should be in dimensional form. c) Big Bang Approach
  • Meanwhile, here is Kimball’s approach: a) The EDW is based on dimensional model design b) Focus on user-friendliness and easy to use c) Develop EDW on a departmental basis piece by piece
  • Kimball’s approach is more practical, more interpretable, easier to implement and less costly based on industry best practices. It involves the following steps: a) Program/Project Planning and management b) Deployment c) Maintenance d) Growth
  • The technology track involves technical architectural design and product selection and installation.
  • The first stage in the Kimball Lifecycle involves planning. Planning for three streams happen simultaneously; these streams are: a) Technology Track b) Data Track c) Application Track
  • The data track involves dimensional modeling, physical design, and ETL design and development.
  • The application track involves business application design and business application development.
  • Meanwhile, business application development involves using ideally a single advanced BI tool that meets all user needs. A
  • Advanced tools provide significant productivity gains for the application development team.
  • Good BI design enables end users to modify existing reports and develop ad hoc reports quickly without going to IT.
  • Dimensional modeling is a logical design technique for structuring data so that it is intuitive for business users and delivers fast query performance.
  • Relational modeling is widely used in databases nowadays.
  • dimensional modeling has two advantages over relational modeling. These are understandability and performance. The model must be easily understood by business users while representing the complexities of the business.
  • Measurements are numeric values called facts. Examples are sales amount and count of attendance
  • Dimensions, meanwhile, describe the “who, what, where, when, why, and how” of the facts. For example, dimensions for sales amount would be sales by quarter and sales by product.
  • A dimensional model consists of a fact table containing measurements surrounded by a halo of dimension tables containing textual context
  • It is known as a star join and as a star schema when stored in a relational database.
  • Fact tables contain the descriptive attributes (numerical values) needed to perform decision analysis and query reporting in the star schema.
  • A fact is a performance measure.
  • Fact values are not known in advance. They are only known when event measurement occurs.
  • Facts are numeric.
  • The most useful facts are numeric and additive.
  • Fact tables are usually the largest tables. A single fact table can contain either detail or summarized data. They are primarily joined to dimension tables through foreign keys.
  • The business definition of the measurement event that produces the fact table is called the fact table's grain.
  • In a star schema, dimension tables contain classification and aggregation information about the values in the fact table
  • Dimension tables contain the parameters by which the fact table measures are analyzed.
  • Dimension tables provide the context to the fact table measures they describe. They also contain descriptors of the business, utilizing business terminology. They have many large columns, contain textual and discrete data, and are usually smaller than fact tables.
  • Have a single column surrogate primary key (called the warehouse dimension key) and are joined to a fact table through a foreign key reference to their primary key.
  • Dimension tables can contain one or more hierarchies. These hierarchies are de-normalized into the dimension tables
  • practical approach originated by Mike Schmitz to design Dimension Table Families. Here, fact tables are highly normalized for maintainability and flexibility.
  • The Dimensional Normal Form is a creative and practical approach originated by Mike Schmitz to design Dimension Table Families.
  • Dimensions have their hierarchies denormalized into them for usability and performance. Its schema is limited to two levels. These are a single first level or central highly normalized table called a fact table and multiple second level tables called dimension tables linked to the first level table in primarily one to many relationships.
  • ETL is mostly done by business analytics people following an information technology track. However, it is useful for managers to know what happens during ETL.
  • The objective of ETL is to get data out of the source and load it into the data warehouse. It is simply a process of copying data from one database to other.
  • Data is extracted from a database, transformed to match the data warehouse schema and loaded into the data warehouse database
  • When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation.
  • The process is usually handled using Structured Query Language (SQL) scripts, a special-purpose programming language designed for managing data held in a relational database.
  • In extraction, data is extracted from heterogeneous data sources. Each data source has its distinct set of characteristics that need to be managed and integrated into the ETL system in order to effectively extract data. This is usually done using SQL Select Statements.