week 5

Created by

Hryn

Cards (33)

Data Warehouse 
Collection of data that helps analysts to make informed decisions in an organisation
View source
Data Warehouse 
OnLine Analytical Processing tools (OLAP)
Allows us to analyse data in a multi-dimensional space
Results in data generalisation and data mining
View source
Data Warehouse vs Database
It is a database but there is no frequent updating
It houses historical data
Usually kept separate from the organisation's operational database
Helps in the integration of a diversity of applications
Can use the data to help make strategic decisions
View source
Data Warehouse 
Subject-oriented – provides information around a subject rather than ongoing operations. Focuses on modelling and analysis of data
Integrated – constructed by integrating data from many different sources such as relational databases or flat files
Time-variant – the data collected is identified with a particular time period – e.g. last 12 months
Non-volatile – previous data is not erased when new data is added
View source
Types of data warehouses
Information processing
Analytical Processing
Data mining
View source
Information processing 
Allows processing of data stored in it. Can be processed by means of querying, basic statistical analysis, reporting using crosstabs, tables, charts or graphs
View source
Analytical Processing 
Supports analytical processing of the information stored in it by means of OLAP operations – slice and dice, drill down, drill up and pivoting
View source
Data mining 
Supports knowledge discovery by finding hidden patterns and associations, constructing analytical models, performing classification and prediction. Results can be presented using visualisation tools
View source
Enterprise Data Warehouse (EDW) 
Services entire enterprise
EDW environment may have an EDW, an Operational Data Store, Physical and Virtual Data Marts
View source
Data Marts 
Used by individual departments or groups
Use a dimensional data model to build
View source
Operational Data Store 
Hybrid form of data warehouse containing integrated information
View source
Data Warehouse (OLAP) vs Operational Database (OLTP)
Historical processing of data vs day to day processing
Analyse business vs run business
Based on Star Schema, Snowflake Schema, Fact Constellation Schema vs Based on Entity Relationship Model (commonly)
Summarised data vs highly detailed data
Number of users in 100s vs 1000s
Number of records accessed in the millions vs in 10s
Highly flexible vs high performance
Larger vs smaller
View source
Query driven approach 
Query - client side - metadata dictionary -heterogeneous sites -queries are mapped -local processor - global answer set
View source
Update driven approach 
Alternative to traditional approach
Information from multiple heterogeneous sources are integrated in advance and available for direct querying and analysis
View source
Advantages of update driven approach
High performance
Data is copied, processed, integrated, annotated, summarised and restructured in semantic data store in advance
View source
3 tier architecture 
Top tier: data mining, reporting tool, analysis/other tools
Middle tier: OLAP Server
Bottom tier: ETL - data warehouse - data marts
View source
Warehouse tools and what they do
Data extraction – gather data from many different sources
Data cleaning – finding and correcting errors in data
Data transformation – convert the data to warehouse format
Data loading – sorting, summarising, consolidating, checking integrity and building indices and patterns
Refreshing – updating data sources to warehouse
View source
Process flow in a data warehouse
1. Extract and load the data
2. Clean and transform the data
3. Backup and archive the data
4. Manage Queries and direct them to the appropriate data sources
View source
ETL - Load Manager
Controlling Process
Stored Procedures
Copy Management tools
Fast loader
View source
ETL Vs ELT 
ETL – extract data from source systems, convert the data to a structure suitable for querying and analysis and load data into data warehouse
ELT – takes data from various sources and loads it directly into the target system, then transforms the data on demand to enable analysis
View source
Warehouse Manager 
Controlling Process
Stored Procedures or C with SQL
Backup/Recovery Tool
SQL Scripts
View source
Query Management Process 
1. Query Redirection via C tool or RDBMS
2. Stored Procedures (generating views)
3. Query management tool to monitor indexes and summaries
4. Query Scheduling via C tool or RDBMS
View source
Metadata 
Data about data e.g. the index of a book is metadata for the contents of a book
Summarised data that leads us to detailed data
Roadmap to the data warehouse
Defines the warehouse objects
Acts as a directory – helps to locate the contents of a data warehouse
View source
Data Warehouse 
Use a dimensional data model
Data organised into dimension tables and fact tables using star and snowflake schemas
Traditional OLTP databases automate transactional operations – strive to eliminate data duplication (normalisation)
For a data warehouse - need denormalisation or grouping of data
Increases query performance
View source
Ways to organise a data warehouse
Entity relationship modelling and normalisation
Dimensional Modelling
View source
Normalisation 
Normalisation is a process for converting complex data structures into simple and stable data structures
Models are robust and stable, have minimal redundancy, 3NF, No repeating groups, No partial dependencies, No transitive dependencies, Avoid anomalies
View source
Dimensional Modelling 
Purpose – structure data for easy and efficient retrieval and analysis
ER modelling creates a single model of data required to support the organisation's processes
Dimensional Modelling allows for different individual models of interest so we can make decisions
Denormalised data structure – fewer tables, better performance
View source
Denormalisation 
The more normalised your data is, the more complex the queries needed to read the data because a single query combines data from many tables. This puts a huge strain on computing resources.
The data in a data warehouse does not need to be organized for quick transactions. Therefore, data warehouses normally use a denormalised data structure.
A denormalised data structure uses fewer tables because it groups data and doesn't exclude data redundancies. Denormalisation offers better performance when reading data for analytical purposes.
View source
Approaches to warehouse design
Top down - Enterprise Data Warehouse
Bottom up - Dimensional Data Warehouse
View source
Top down vs Bottom up
Top down: Provides a definite and consistent view of information as information from the data warehouse is used to create Data Marts. Time, Cost and Maintenance is high.
Bottom up: Reports can be generated easily as Data marts are created first and it is relatively easy to interact with Data Marts. Time, Cost and Maintenance are low.
View source
Components of dimensional model
Fact Tables – contain measurements of business e.g. sales, purchase orders, shipment (normalised). Can be quite large (50 billion records, 1-5 terabytes)
Dimension Tables – store the descriptions of the dimensions of the business e.g. product, customer, vendor, store. Smaller than Fact Tables (denormalised)
View source
Types of fact tables
Transaction Fact Tables – record information related to events – individual product sales
Snapshot Fact Table – record information that applies to specific moments in time – year-end accounts
Accumulating snapshot tables – record information to a running tally of data – year to year sales figures
View source
Dimensional tables and fact tables 
Dimension tables list surrogate primary key – consists of a single column integer related to the natural primary key
Wherever you list the Store ID number on the Fact Table, it will map to that specific row of store data on the dimension table
View source

See similar decks

week 5

Cards (33)

4.2.5 Strong and weak acids

3.1.12.3 Weak Acids and Bases

4.2.5 Strong and weak acids

12.3 Strong and Weak Acids and Bases

12.3.1 Degree of Ionization

Week 3

Week 5

Week 1

Week 10

Week 2

One week house spinning 1920

week 1

Week 4

Week 9

Week 7

Week 5 durkheim

brain week 3

Epithelia

happy week 1

week 10 brain

week 7 brain