The Kimball Lifecycle focuses on adding business value across the enterprise and dimensionally structures the data that's delivered to the business. It uses iterations and increments in a manageable lifecycle to do this.
There have been two main approaches to building data warehouses with data marts. The first is an approach by Bill Inmon and the second is the approach by RalphKimball.
Bill Inmon’s approach works this way: a) The enterprisedatawarehouse (EDW) should be in at least 3rdnormalform. b) But the datamarts should be in dimensionalform. c) BigBangApproach
Meanwhile, here is Kimball’sapproach: a) The EDW is based on dimensionalmodeldesign b) Focus on user-friendliness and easytouse c) Develop EDW on a departmentalbasis piece by piece
Kimball’sapproach is more practical, more interpretable, easier to implement and lesscostly based on industry best practices. It involves the following steps: a) Program/Project Planning and management b) Deployment c) Maintenance d) Growth
The technology track involves technical architectural design and product selection and installation.
The first stage in the KimballLifecycle involves planning.Planning for three streams happen simultaneously; these streams are: a) Technology Track b) DataTrack c) Application Track
The data track involves dimensional modeling, physical design, and ETL design and development.
The application track involves business application design and business application development.
Meanwhile, businessapplication development involves using ideally a single advanced BI tool that meets all user needs. A
Advanced tools provide significant productivity gains for the application development team.
Good BI design enables end users to modify existing reports and develop ad hoc reports quickly without going to IT.
Dimensional modeling is a logicaldesigntechnique for structuring data so that it is intuitive for business users and delivers fast query performance.
Relational modeling is widely used in databases nowadays.
dimensional modeling has two advantages over relational modeling. These are understandability and performance. The model must be easily understood by business users while representing the complexities of the business.
Measurements are numeric values called facts. Examples are sales amount and count of attendance
Dimensions, meanwhile, describe the “who, what, where, when, why, and how” of the facts. For example, dimensions for sales amount would be sales by quarter and sales by product.
A dimensional model consists of a fact table containing measurements surrounded by a halo of dimension tables containing textual context
It is known as a star join and as a star schema when stored in a relational database.
Fact tables contain the descriptive attributes (numerical values) needed to perform decision analysis and query reporting in the star schema.
A fact is a performance measure.
Fact values are not known in advance. They are only known when event measurement occurs.
Facts are numeric.
The most useful facts are numeric and additive.
Fact tables are usually the largest tables. A single fact table can contain either detail or summarized data. They are primarily joined to dimension tables through foreignkeys.
The business definition of the measurement event that produces the fact table is called the fact table'sgrain.
In a star schema, dimension tables contain classification and aggregationinformation about the values in the fact table
Dimension tables contain the parameters by which the fact table measures are analyzed.
Dimension tables provide the context to the fact table measures they describe. They also contain descriptors of the business, utilizing business terminology. They have manylargecolumns, contain textual and discretedata, and are usually smaller than fact tables.
Have a single column surrogate primary key (called the warehouse dimension key) and are joined to a fact table through a foreign key reference to their primary key.
Dimension tables can contain one or morehierarchies. These hierarchies are de-normalized into the dimension tables
practical approach originated by Mike Schmitz to design Dimension Table Families. Here, fact tables are highly normalized for maintainability and flexibility.
The Dimensional Normal Form is a creative and practical approach originated by Mike Schmitz to design Dimension Table Families.
Dimensions have their hierarchiesdenormalized into them for usability and performance. Its schema is limited to twolevels. These are a single first level or central highly normalized table called a fact table and multiple second level tables called dimension tables linked to the first level table in primarily one to many relationships.
ETL is mostly done by business analytics people following an information technology track. However, it is useful for managers to know what happens during ETL.
The objective of ETL is to get data out of the source and load it into the data warehouse. It is simply a process of copying data from one database to other.
Data is extracted from a database, transformed to match the data warehouse schema and loaded into the data warehouse database
When defining ETL for a data warehouse, it is important to think of ETL as a process, not a physical implementation.
The process is usually handled using Structured Query Language (SQL) scripts, a special-purpose programming language designed for managing data held in a relational database.
In extraction, data is extracted from heterogeneous data sources. Each data source has its distinct set of characteristics that need to be managed and integrated into the ETL system in order to effectively extract data. This is usually done using SQL Select Statements.