ADET LESSON 2

Cards (38)

  • Data Science
    • combines multiple fields, including statistics, scientific methods, and data analysis.
  • History of Data Science - In 2001, William S. Cleveland wanted to bring data mining to another level.
  • COMPUTER SCIENCE + DATA MINING = DATA SCIENCE
  • Data Mining
    • process of analysing data from different perspectives and summarising it into useful information.
  • Data mining is the analysis step of the "knowledge discovery in databases" process, or KDD.
  • Data Analysis
    • refers to human activities aimed at gaining insight into a dataset.
  • Three Types Of Data Analysis
    • Predictive (forecasting)
    • Descriptive (business intelligence and data mining)
    • Prescriptive (optimization and simulation)
  • Predictive (forecasting)
    • turns data into valuable, actionable information by estimating likely future outcomes.
  • Descriptive (business intelligence and data mining)
    • looks at data and analyzes past events for insight into how to approach the future.
  • Prescriptive (optimization and simulation)
    • automatically synthesizes big data, mathematical sciences, and business rules to suggest decision options.
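The three types of analysis can be contrasted on one small, made-up sales series. This is a minimal sketch under stated assumptions: the sales figures, the straight-line trend used for forecasting, and the profit and cost values in `recommend_stock` are hypothetical illustrations, not methods prescribed by the lesson.

```python
from statistics import mean

# Hypothetical monthly sales figures (illustrative only, not from the lesson)
monthly_sales = [120, 135, 150, 160, 178, 190]


def describe(sales):
    """Descriptive: summarize what already happened."""
    return {"total": sum(sales), "average": mean(sales), "max": max(sales)}


def forecast_next(sales):
    """Predictive: fit a least-squares straight line and extrapolate one period ahead."""
    xs = range(len(sales))
    x_bar, y_bar = mean(xs), mean(sales)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * len(sales)


def recommend_stock(forecast, options, unit_profit=5, unit_cost=2):
    """Prescriptive: pick the stock level with the highest profit if demand equals the forecast."""
    def profit(stock):
        sold = min(stock, forecast)  # cannot sell more than the expected demand
        return sold * unit_profit - stock * unit_cost
    return max(options, key=profit)


summary = describe(monthly_sales)
forecast = forecast_next(monthly_sales)
print(summary)
print(f"forecast for next month: {forecast:.1f}")
print("recommended stock:", recommend_stock(forecast, [150, 200, 250]))
```

Descriptive analysis only looks back, predictive analysis extends the pattern forward, and prescriptive analysis turns the prediction into a suggested decision.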
  • How Data Mining Can Help a Business Improve Competitiveness (4)
    • Sales Forecasting
    • Database Marketing
    • Market Segmentation
    • E-commerce Basket Analysis
  • Sales Forecasting
    • analysing when customers made past purchases to predict when they will buy again.
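A minimal sketch of this idea, assuming a simple heuristic (the average gap between past purchase dates) rather than any particular forecasting model. The function `predict_next_purchase` and the purchase history are hypothetical.

```python
from datetime import date, timedelta


def predict_next_purchase(purchase_dates):
    """Estimate the next purchase date from the average gap between past purchases.

    Illustrative heuristic only; real sales forecasting usually uses richer models.
    """
    if len(purchase_dates) < 2:
        raise ValueError("need at least two purchases to measure a gap")
    ordered = sorted(purchase_dates)
    # Days between each pair of consecutive purchases
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    average_gap = sum(gaps) / len(gaps)
    return ordered[-1] + timedelta(days=round(average_gap))


# Hypothetical purchase history for one customer
history = [date(2024, 1, 5), date(2024, 2, 4), date(2024, 3, 5)]
print(predict_next_purchase(history))  # prints 2024-04-04
```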
  • Database Marketing
    • examining customer purchasing patterns and looking at customer demographics and psychographics to build customer profiles.
  • Market Segmentation
    • a classic use of data mining: using data to break down a market into meaningful segments, such as by age or income.
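The following sketch groups hypothetical customers into segments by age band and income tier. The record layout, the cut-off ages, and the income threshold are illustrative assumptions; real segmentation often uses clustering methods instead of fixed rules.

```python
from collections import defaultdict

# Hypothetical customer records: (name, age, annual income)
customers = [
    ("Ana", 22, 18000),
    ("Ben", 35, 52000),
    ("Cara", 41, 95000),
    ("Dan", 19, 12000),
    ("Eve", 58, 70000),
]


def age_band(age):
    """Map an age to a coarse band; the cut-off points are illustrative assumptions."""
    if age < 25:
        return "18-24"
    if age < 45:
        return "25-44"
    return "45+"


def income_tier(income, threshold=60000):
    """Label income as 'high' or 'standard'; the threshold is an assumption."""
    return "high" if income >= threshold else "standard"


def segment(records):
    """Break the market into segments keyed by (age band, income tier)."""
    segments = defaultdict(list)
    for name, age, income in records:
        segments[(age_band(age), income_tier(income))].append(name)
    return dict(segments)


for key, names in segment(customers).items():
    print(key, names)
```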
  • E-commerce Basket Analysis
    • using mined data to predict future customer behavior.
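Basket analysis is often illustrated by counting items bought together and computing a rule's confidence (the share of baskets containing item A that also contain item B). A minimal sketch, assuming a handful of hypothetical baskets; `pair_counts` and `confidence` are illustrative helpers, not a library API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, one set of items per transaction
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "bread", "chips"},
]


def pair_counts(transactions):
    """Count how often each unordered pair of items is bought together."""
    counts = Counter()
    for basket in transactions:
        # Sorting makes ("bread", "milk") and ("milk", "bread") the same key
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(transactions, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in transactions if antecedent in b]
    if not with_antecedent:
        return 0.0
    both = sum(1 for b in with_antecedent if consequent in b)
    return both / len(with_antecedent)


print(pair_counts(baskets).most_common(1))
print("milk -> bread:", confidence(baskets, "milk", "bread"))
print("bread -> milk:", confidence(baskets, "bread", "milk"))
```

A high-confidence rule such as "customers who buy milk also buy bread" is the kind of pattern a store might use to predict what a customer will add to the basket next.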
  • Data Preprocessing
    • a data mining technique and part of the data preparation process; it transforms raw data into a usable format.
  • Data Collection
    • to prepare machine learning models, we need to collect data for the required purpose.
  • Ways to Collect Data (6)
    • ONLINE SURVEY
    • OBSERVATION
    • INTERVIEW
    • CASE STUDY METHODS
    • QUESTIONNAIRE
    • GOOGLE FORMS
  • Data Preprocessing Techniques (3)
    • Formatting of data
    • Cleaning of data
    • Sampling of data
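The three techniques can be sketched on a few hypothetical survey rows. The row layout, the rule that treats blank or non-numeric ages as invalid, and the fixed random seed are assumptions made for illustration.

```python
import random

# Hypothetical raw survey rows: (name, age as text)
raw_rows = [(" alice ", "34"), ("BOB", ""), ("carol", "29"), ("dave", "n/a"), ("Eve", "41")]


def format_rows(rows):
    """Formatting: normalize names and convert ages to int where possible."""
    formatted = []
    for name, age in rows:
        clean_name = name.strip().title()
        age_value = int(age) if age.strip().isdigit() else None  # None marks a bad age
        formatted.append((clean_name, age_value))
    return formatted


def clean_rows(rows):
    """Cleaning: drop rows with missing or invalid ages."""
    return [row for row in rows if row[1] is not None]


def sample_rows(rows, k, seed=42):
    """Sampling: draw a reproducible random subset of k rows without replacement."""
    rng = random.Random(seed)
    return rng.sample(rows, k)


formatted = format_rows(raw_rows)
cleaned = clean_rows(formatted)
print(formatted)
print(cleaned)
print(sample_rows(cleaned, 2))
```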
  • Data Aggregation
    • a type of data and information mining process in which data is searched, gathered, and presented in a summarized, report-based form.
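A minimal sketch of aggregation, assuming hypothetical (region, amount) transaction records: the records are searched, gathered per region, and presented as a short summary report. The record layout and region names are illustrative.

```python
from collections import defaultdict

# Hypothetical transaction records: (region, amount)
transactions = [
    ("North", 250.0),
    ("South", 120.0),
    ("North", 75.5),
    ("East", 300.0),
    ("South", 80.0),
]


def aggregate_by_region(records):
    """Gather the records per region and summarize them as a total and a count."""
    summary = defaultdict(lambda: {"total": 0.0, "count": 0})
    for region, amount in records:
        summary[region]["total"] += amount
        summary[region]["count"] += 1
    return dict(summary)


# Present the aggregated data as a simple report
for region, stats in sorted(aggregate_by_region(transactions).items()):
    print(f"{region}: total={stats['total']:.2f}, transactions={stats['count']}")
```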
  • Data Processing Cycle
    • the process of converting raw data into meaningful information.
  • Stages of the Data Processing Cycle (6)
    • Data Collection
    • Data Preparation
    • Data Input
    • Processing
    • Data Output
    • Data Storage
  • Data Collection
    • the first and most crucial stage; the quality of the data collected impacts the quality of the output.
  • Data Preparation
    • in this stage, bad, incomplete, or incorrect data is eliminated.
  • Data Input
    • cleaned data is entered into a computer for processing.
  • Processing
    • the data entered into the computer is processed to produce the required results.
  • Data Output
    • processed data is now delivered to the user.
  • Data Storage
    • last stage of the cycle, where the data is stored for future use.
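The six stages can be chained as small functions to show how data flows through the cycle. Everything here is a hypothetical sketch: the raw readings, the validation rule in `prepare`, and the in-memory `archive` list standing in for real storage are assumptions.

```python
def collect():
    """Data Collection: gather raw readings (hard-coded here for illustration)."""
    return ["12", "15", "", "abc", "9"]


def prepare(raw):
    """Data Preparation: eliminate bad, incomplete, or incorrect values."""
    return [value for value in raw if value.isdigit()]


def input_data(prepared):
    """Data Input: convert the cleaned text into a machine-usable form (integers)."""
    return [int(value) for value in prepared]


def process(numbers):
    """Processing: compute the results of interest."""
    average = sum(numbers) / len(numbers) if numbers else 0.0
    return {"count": len(numbers), "total": sum(numbers), "average": average}


def output(results):
    """Data Output: present the processed results to the user in readable form."""
    return f"{results['count']} readings, total {results['total']}, average {results['average']:.1f}"


def store(results, archive):
    """Data Storage: keep results for future use (a list stands in for a database)."""
    archive.append(results)
    return archive


archive = []
results = process(input_data(prepare(collect())))
print(output(results))  # prints 3 readings, total 36, average 12.0
store(results, archive)
```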
  • Big Data
    • data that is so large, fast, or complex that it’s difficult or impossible to process using traditional methods.
  • Three V’s:
    • Volume
    • Velocity
    • Variety
  • Volume
    • Organizations collect large amounts of data from a variety of sources, including business transactions, smart (IoT) devices, industrial equipment, videos, and social media.
  • Velocity
    • With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner.
  • Variety
    • Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, and audio.
  • Big Data with Hadoop
    • Apache Hadoop is an open-source framework of tools to store and process big data across clusters of computers.
  • Apache Hadoop Modules (3)
    • Hadoop Common
    • Hadoop Distributed File System (HDFS)
    • Hadoop YARN
  • Hadoop Common
    • consists of the common utilities that support the other Hadoop modules.
  • Hadoop Distributed File System (HDFS)
    • a distributed file system that provides high-throughput access to application data.
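HDFS stores each file as large, fixed-size blocks replicated across DataNodes, which lets many blocks be read in parallel. The toy simulation below only plans block placement and is not the HDFS API. It assumes the common defaults of a 128 MB block size and a replication factor of 3, and uses simple round-robin placement instead of HDFS's rack-aware policy; `plan_blocks` and the node names are hypothetical.

```python
import math
from itertools import cycle

BLOCK_SIZE_MB = 128  # common HDFS default block size (assumed here)
REPLICATION = 3      # common HDFS default replication factor (assumed here)


def plan_blocks(file_size_mb, datanodes, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Split a file into fixed-size blocks and assign each block to `replication` nodes.

    Round-robin placement is a simplification; real HDFS uses rack-aware placement.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    num_blocks = math.ceil(file_size_mb / block_size_mb)
    nodes = cycle(datanodes)
    placement = {}
    for block_id in range(num_blocks):
        # Consecutive picks from the cycle are distinct because replication <= len(datanodes)
        placement[block_id] = [next(nodes) for _ in range(replication)]
    return placement


for block_id, replicas in plan_blocks(300, ["dn1", "dn2", "dn3", "dn4"]).items():
    print(f"block {block_id}: {replicas}")
```

Keeping several copies of every block means a read can be served by whichever replica is available, which is part of why HDFS favors throughput and fault tolerance over low latency.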
  • Hadoop YARN
    • a framework for job scheduling and cluster resource management.