ADET LESSON 2
Created by Mavi
Data Science
combines multiple fields including statistics, scientific methods, and data analysis.
History of Data Science
In 2001, William S. Cleveland wanted to bring data mining to another level.
COMPUTER SCIENCE + DATA MINING =
DATA SCIENCE
Data Mining
process of analysing data from different perspectives and summarising it.
Data mining
is the analysis step of the "knowledge discovery in databases" process, or KDD.
Data Analysis
human activities aimed at gaining insight into a dataset.
Three Types Of Data Analysis
Predictive (forecasting)
Descriptive (business intelligence and data mining)
Prescriptive (optimization and simulation)
Predictive (forecasting)
turns data into valuable, actionable information.
Descriptive (business intelligence and data mining)
looks at data and analyzes past events for insight as to how to approach the future.
Prescriptive (optimization and simulation)
automatically synthesizes big data, mathematical sciences, business rules.
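The three types of analysis can be contrasted on one small dataset. The sketch below is only an illustration: the sales figures, the straight-line forecast, and the stock-level rule are all assumptions, not part of the lesson.

```python
from statistics import mean

# Hypothetical monthly unit sales, oldest first (illustrative values only).
monthly_sales = [100, 110, 120, 130]


def describe(history):
    """Descriptive: summarise what already happened."""
    return {"mean": mean(history), "min": min(history), "max": max(history)}


def forecast_next(history):
    """Predictive: extend a least-squares straight line by one period."""
    n = len(history)
    xs = range(n)
    x_bar, y_bar = mean(xs), mean(history)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    intercept = y_bar - slope * x_bar
    return intercept + slope * n


def recommend_stock(forecast, options):
    """Prescriptive: pick the smallest stock level that covers the forecast."""
    feasible = [level for level in options if level >= forecast]
    return min(feasible) if feasible else max(options)


summary = describe(monthly_sales)
prediction = forecast_next(monthly_sales)
decision = recommend_stock(prediction, options=[100, 150, 200])
```

Descriptive analysis only reports on the past, predictive analysis estimates the next value, and prescriptive analysis turns that estimate into a suggested action.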
How Data Mining Can Help a Business Improve Competitiveness (4)
Sales Forecasting
Database Marketing
Market Segmentation
E-commerce Basket Analysis
Sales Forecasting
analysing when customers bought to predict when they will buy again.
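One simple way to picture this card: measure the average gap between a customer's past purchases and add it to the last purchase date. This is a minimal sketch under that assumption; the function name and the dates are hypothetical, and real forecasting would use richer models.

```python
from datetime import date, timedelta


def predict_next_purchase(purchase_dates):
    """Estimate the next purchase as last purchase + mean gap between purchases."""
    ordered = sorted(purchase_dates)
    if len(ordered) < 2:
        return None  # not enough history to measure a gap
    gaps = [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return ordered[-1] + timedelta(days=round(mean_gap))


# Hypothetical purchase history for one customer.
history = [date(2024, 1, 1), date(2024, 1, 31), date(2024, 3, 1)]
next_purchase = predict_next_purchase(history)
```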
Database Marketing
examining customer purchasing patterns and looking.
Market Segmentation
a classic use of data mining, using data to break down a market.
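As a rough illustration of breaking down a market, the sketch below groups customers into age bands. The customer records and the band boundaries are assumptions chosen for the example; any attribute could serve as the segmentation key.

```python
from collections import defaultdict

# Hypothetical customer records (illustrative only).
customers = [
    {"name": "Ana", "age": 22},
    {"name": "Ben", "age": 35},
    {"name": "Cara", "age": 61},
    {"name": "Dan", "age": 28},
]


def age_band(age):
    """Map an age to a segment label (boundaries are an assumption)."""
    if age < 30:
        return "under 30"
    if age < 50:
        return "30-49"
    return "50+"


def segment(records):
    """Group customer names by age band."""
    segments = defaultdict(list)
    for record in records:
        segments[age_band(record["age"])].append(record["name"])
    return dict(segments)


segments = segment(customers)
```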
E-commerce Basket Analysis
using mined data to predict future customer behavior.
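A common way to sketch basket analysis is to count which items appear together and how often one item accompanies another. The baskets below are invented, and `pair_counts` and `confidence` are hypothetical helper names, not a library API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets (illustrative only).
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "jam"},
    {"bread", "milk", "jam"},
]


def pair_counts(all_baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in all_baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def confidence(all_baskets, antecedent, consequent):
    """Share of baskets containing `antecedent` that also contain `consequent`."""
    with_antecedent = [b for b in all_baskets if antecedent in b]
    if not with_antecedent:
        return 0.0
    return sum(consequent in b for b in with_antecedent) / len(with_antecedent)


counts = pair_counts(baskets)
bread_then_milk = confidence(baskets, "bread", "milk")
```

A high confidence for a pair suggests that a customer who adds the first item is likely to add the second, which is the kind of future behavior the card refers to.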
Data Preprocessing
part of the data preparation method and a data mining technique.
Data Collection
to prepare machine learning models, we need to collect data for the required purpose.
There are many ways to collect data, such as (6):
ONLINE SURVEY
OBSERVATION
INTERVIEW
CASE STUDY METHODS
QUESTIONNAIRE
GOOGLE FORMS
Data preprocessing techniques include (3):
Formatting of data
Cleaning of data
Sampling of data
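The three techniques listed above can be sketched on a handful of rows. Everything here is an assumption for illustration: the field names, the rule that an age must be a whole number, and the fixed random seed.

```python
import random

# Hypothetical raw records with inconsistent formatting and bad values.
raw_rows = [
    {"name": " Ana ", "age": "34"},
    {"name": "Ben", "age": ""},      # missing age
    {"name": "cara", "age": "29"},
    {"name": "Dan", "age": "abc"},   # invalid age
    {"name": "Eve", "age": "41"},
]


def format_row(row):
    """Formatting: trim whitespace and normalise capitalisation."""
    return {"name": row["name"].strip().title(), "age": row["age"].strip()}


def clean(rows):
    """Cleaning: drop rows whose age is missing or not a whole number."""
    return [
        {"name": row["name"], "age": int(row["age"])}
        for row in rows
        if row["age"].isdigit()
    ]


def sample(rows, k, seed=0):
    """Sampling: draw k rows at random without replacement."""
    return random.Random(seed).sample(rows, k)


formatted = [format_row(row) for row in raw_rows]
cleaned = clean(formatted)
subset = sample(cleaned, k=2)
```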
Data Aggregation
type of data and information mining process where data is searched.
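Aggregation can be pictured as collapsing many records into one summary per group. A minimal sketch, assuming hypothetical (region, amount) sales records:

```python
from collections import defaultdict

# Hypothetical sales records as (region, amount) pairs.
sales = [
    ("north", 120.0),
    ("south", 80.0),
    ("north", 60.0),
    ("south", 40.0),
    ("east", 50.0),
]


def aggregate(records):
    """Summarise records per region: count, total, and mean amount."""
    totals = defaultdict(lambda: {"count": 0, "total": 0.0})
    for region, amount in records:
        totals[region]["count"] += 1
        totals[region]["total"] += amount
    return {
        region: {**stats, "mean": stats["total"] / stats["count"]}
        for region, stats in totals.items()
    }


summary_by_region = aggregate(sales)
```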
Data Processing Cycle
process of changing or converting information into meaningful.
Data Processing Cycle (6)
Data Collection
Data Preparation
Data Input
Processing
Data Output
Data Storage
Data Collection
the first and the most crucial stage the quality impacts
Data Preparation
in this stage, bad, incomplete, or incorrect data is eliminated.
Data Input
cleaned data is entered into a computer for processing.
Processing
data entered on the computer is processed
Data Output
processed data is now delivered to the user.
Data Storage
last stage of the cycle, where the data is stored for future use.
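The six stages above can be chained as a small pipeline. This is a sketch only: the readings are invented, each function name is hypothetical, and an in-memory list stands in for real storage.

```python
def collect():
    """Data Collection: gather raw readings (hypothetical values)."""
    return ["12", "7", "", "oops", "21"]


def prepare(raw):
    """Data Preparation: eliminate incomplete or incorrect entries."""
    return [item for item in raw if item.isdigit()]


def to_input(prepared):
    """Data Input: convert cleaned text into a machine-usable form."""
    return [int(item) for item in prepared]


def process(values):
    """Processing: compute a result from the input."""
    return {"count": len(values), "total": sum(values)}


def output(result):
    """Data Output: present the result to the user."""
    return f"{result['count']} valid readings, total {result['total']}"


def store(result, storage):
    """Data Storage: keep the result for later reference."""
    storage.append(result)


storage = []  # stand-in for a database or file
result = process(to_input(prepare(collect())))
report = output(result)
store(result, storage)
```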
Big Data
data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods.
Three V’s:
Volume
Velocity
Variety
Volume
Organizations collect data from a variety of sources, including business transactions, smart (IoT) devices.
Velocity
With the growth in the Internet of Things, data streams in to businesses.
Variety
Data comes in all types of formats – from structured, numeric data.
Big Data with Hadoop
Tools to store and analyze data in Data Processing.
Apache Hadoop offers the modules below (3):
Hadoop Common
Hadoop Distributed File System (HDFS)
Hadoop YARN
Hadoop Common
module consists of the utilities to support other modules.
Hadoop Distributed File System (HDFS)
High-throughput access to the application data.
Hadoop YARN
cluster resource management and job scheduling are achieved.