IBM Chapter 5

Cards (21)

  • hierarchical : data are organized into a tree-like structure of records; nodes are related as parent and child in one-to-many relationships; it cannot support many-to-many relationships, which leads to data duplication
  • network : data are organized in a net-like structure that extends the hierarchical model; each node can have multiple parents and multiple children, so it supports many-to-many relationships
  • types of DBMS : 1) network 2) hierarchical 3) relational 4) object-oriented
  • data warehousing : collection of data from multiple databases that are integrated for data mining, analysis, reporting and decision making. A core component of Business Intelligence (BI)
  • processes of data warehousing : 1) data cleansing 2) staging 3) data integration
  • data cleansing : process of detecting and correcting corrupt or inaccurate records, and removing data that are invalid or incorrect
  • staging : storage area where the ETL (extract, transform and load) process is performed to load data from different operational systems into the data warehouse
  • data integration : process of combining data residing in different sources and providing users with a unified view of them; data are arranged into hierarchical groups or dimensions
  • traditional data warehouse architecture employs a 3-tier structure : bottom tier, middle tier and top tier
  • bottom tier : contains the database where data from different sources are cleansed and transformed
  • middle tier : contains the OLAP server, which performs the analysis and querying
  • top tier : client layer containing the tools (query, reporting, analysis, mining) and APIs used for high-level data analysis
  • 3 main types of data in a data warehouse : 1) metadata 2) summary data 3) raw data
  • operations of OLAP : 1) roll up 2) drill down 3) slicing 4) dicing 5) pivot
  • OLAP is mostly optimized for reading (select) rather than writing (insert, update, delete), in contrast to OLTP
  • roll up : consolidates (aggregates) data by reducing a dimension or moving up the concept hierarchy
  • drill down : breaks data into smaller, more detailed parts by moving down the concept hierarchy or adding a dimension; the opposite of roll up
  • slicing : fixes a single dimension to one value and creates a sub-cube from the main cube for that specific data
  • dicing : selects and views a sub-cube created by applying conditions on 2 or more dimensions
  • pivot : rotates the data axes to provide an alternative presentation of the data
  • benefits of data warehousing : integrates data from multiple sources into a single database, simplifies analysis and reporting, improves data quality, enhances operational business applications and CRM systems, allows information to be stored for a longer time period, helps provide consolidated information to users
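
Example (not from the cards): the five OLAP operations listed above (roll up, drill down, slicing, dicing, pivot) can be illustrated with a minimal Python sketch. The sales data, the dimension names (year, quarter, region, product) and all function names are hypothetical, chosen only for illustration; a real OLAP server works on precomputed cubes rather than plain Python lists.

```python
from collections import defaultdict

# Hypothetical fact table: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "East", "Laptop", 100),
    (2023, "Q1", "West", "Laptop", 80),
    (2023, "Q2", "East", "Phone", 50),
    (2023, "Q2", "West", "Phone", 70),
    (2024, "Q1", "East", "Laptop", 120),
    (2024, "Q1", "West", "Phone", 90),
]
COLUMNS = ("year", "quarter", "region", "product", "sales")


def to_rows(facts):
    """Turn the tuples into dicts keyed by column name."""
    return [dict(zip(COLUMNS, fact)) for fact in facts]


def aggregate(rows, dims):
    """Sum sales for every combination of values of the given dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dims)] += row["sales"]
    return dict(totals)


def roll_up(rows, dims, drop):
    """Roll up: aggregate again with one dimension removed."""
    return aggregate(rows, [d for d in dims if d != drop])


def drill_down(rows, dims, add):
    """Drill down: aggregate again with one more (finer) dimension."""
    return aggregate(rows, list(dims) + [add])


def slice_cube(rows, dim, value):
    """Slice: fix a single dimension to one value."""
    return [r for r in rows if r[dim] == value]


def dice(rows, conditions):
    """Dice: keep rows matching allowed values on 2 or more dimensions."""
    return [
        r for r in rows
        if all(r[d] in allowed for d, allowed in conditions.items())
    ]


def pivot(cells):
    """Pivot: swap the two axes of a 2-dimensional aggregate."""
    return {(b, a): total for (a, b), total in cells.items()}


rows = to_rows(FACTS)
by_year_region = aggregate(rows, ["year", "region"])
by_year = roll_up(rows, ["year", "region"], "region")   # {(2023,): 300, (2024,): 210}
by_year_quarter = drill_down(rows, ["year"], "quarter")
only_2023 = slice_cube(rows, "year", 2023)               # 4 rows
east_2023 = dice(rows, {"year": {2023}, "region": {"East"}})
by_region_year = pivot(by_year_region)
```

Here roll up and drill down differ only in whether a dimension is removed or added before summing, which matches the cards describing them as opposites; slicing filters on one dimension, while dicing filters on several at once.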