Data come from everywhere (hospitals, weather stations, grocery markets, e-commerce sites, stock exchanges, social media)
Data
A collection of records and their attributes; an attribute is a characteristic of an object
Types of Data
Record Data, Temporal Data, Spatial & Spatial-Temporal Data, Graph Data, Unstructured Data, Semi-Structured Data
Record Data
Transactional Data
Market-Basket Dataset
Bread, Coke, Milk
Beer, Bread
Beer, Coke, Diaper, Milk
Beer, Bread, Diaper, Milk
Coke, Diaper, Milk
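As a minimal sketch, the five baskets above can be held as Python sets, and the support of each item (the fraction of baskets that contain it) can be counted directly. The names baskets and item_support are illustrative assumptions, not part of the original material.

```python
from collections import Counter

# The five market-basket transactions listed above, one set per basket.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def item_support(transactions):
    """Return the fraction of transactions that contain each item."""
    counts = Counter(item for basket in transactions for item in basket)
    return {item: n / len(transactions) for item, n in counts.items()}


support = item_support(baskets)
for item in sorted(support):
    print(f"{item}: {support[item]:.1f}")
```

Sets are used here because a basket records only whether an item is present, not how many times or in what order.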
Data Matrix
If data objects have the same fixed set of numeric attributes, the data objects can be thought of as points in a multi-dimensional space, where each dimension represents a distinct attribute
Data Matrix Example for Documents
Each document becomes a 'term' vector; each term is a component (attribute) of the vector, and the value of each component is the number of times the corresponding term occurs in the document
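The term-vector idea can be sketched in a few lines, assuming whitespace tokenization and three toy documents invented purely for illustration. The helper name term_matrix is hypothetical.

```python
from collections import Counter

# Hypothetical toy documents, invented for illustration only.
docs = [
    "team play ball score",
    "ball ball game win",
    "game win win team",
]


def term_matrix(documents):
    """Build a document-term count matrix with one row per document."""
    counts = [Counter(doc.split()) for doc in documents]
    # The vocabulary fixes the column order shared by every row.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for terms that do not occur in a document.
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


vocab, matrix = term_matrix(docs)
print(vocab)
for row in matrix:
    print(row)
```

Each row is one document and each column is one term, so every document is a point in a space with one dimension per term.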
Distance Matrix
Represents the distances between data points
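A distance matrix can be sketched as follows, assuming Euclidean distance (the source does not name a specific distance measure) and three hypothetical 2-D points.

```python
import math

# Hypothetical 2-D points, invented for illustration only.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]


def distance_matrix(pts):
    """Return the symmetric matrix of pairwise Euclidean distances."""
    return [[math.dist(p, q) for q in pts] for p in pts]


dist = distance_matrix(points)
for row in dist:
    print(row)
```

Entry (i, j) holds the distance between point i and point j, so the diagonal is zero and the matrix is symmetric.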
Temporal Data
Sequence Data, Time Series Data
Examples of Temporal Data
Patient Data, Yahoo Finance Website, Biological Sequence Data, Interval Data
Spatial & Spatial-Temporal Data
Spatial Data, Trajectory Data
Examples of Spatial & Spatial-Temporal Data
Spatial Distribution of Objects, Average Monthly Temperature, Dengue Disease Dataset, Hurricane Trajectories, User Movement Data
Graph Data
Data with graph structure: objects are nodes and the relationships between them are edges
Semi-structured Data
Data that does not obey the tabular structure of data models associated with relational databases or other forms of data tables, but nonetheless contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data
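JSON is one common semi-structured format. The sketch below uses hypothetical records, invented for illustration, to show that keys act as the tags that separate fields and nest records, while individual records need not share the same fields.

```python
import json

# Hypothetical records, invented for illustration: the keys label each
# field, but the two records do not share the same set of fields.
raw = """
[
  {"name": "Alice", "contact": {"email": "alice@example.com"}},
  {"name": "Bob", "orders": [{"item": "Milk", "qty": 2}]}
]
"""

records = json.loads(raw)
for record in records:
    # Fields vary per record, so look them up defensively.
    email = record.get("contact", {}).get("email", "n/a")
    n_orders = len(record.get("orders", []))
    print(record["name"], email, n_orders)
```

Forcing these records into one flat table would leave empty cells or require extra tables for the nested parts, which is why such data is not treated as tabular.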
Unstructured Data
Data with no predefined format or organization, making it much more difficult to collect, process, and analyze
Data can help us solve specific problems
Data Mining Tasks
Clustering
Classification
Frequent Patterns
Association Rules
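As a hedged illustration of the association-rule task, the sketch below scores one candidate rule, Diaper -> Milk, on the market-basket dataset listed earlier. It assumes the standard definitions of support and confidence, which the source does not spell out, and the function names are illustrative.

```python
# The five market-basket transactions from the Record Data section.
baskets = [
    {"Bread", "Coke", "Milk"},
    {"Beer", "Bread"},
    {"Beer", "Coke", "Diaper", "Milk"},
    {"Beer", "Bread", "Diaper", "Milk"},
    {"Coke", "Diaper", "Milk"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(itemset <= basket for basket in transactions)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


rule_support = support({"Diaper", "Milk"}, baskets)
rule_confidence = confidence({"Diaper"}, {"Milk"}, baskets)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

A frequent-pattern miner would search over many itemsets rather than score a single hand-picked rule; this sketch only shows how one rule is evaluated.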
What people do with time series data
Clustering
Classification
Query by Content
Rule Discovery
Motif Discovery
Novelty Detection
Visualization
Motif Association
What people do with trajectory data
Clustering
Motif Discovery
Visualization
Frequent Travel Patterns
Classification
Prediction
Types of Data
Transactional Data
Sequence Data
Interval Data
Time Series Data
Spatial Data
Spatio-Temporal Data
Data Set with Multiple Kinds of Data
Data Mining Methods
Frequent Pattern Discovery
Classification
Clustering
Outlier Detection
Statistical Analysis
Distinctions between statistics, machine learning, and data mining are fuzzy
Visualization facilitates human discovery and presents discovered results in a visually "nice" way
Summarization describes features of a selected group using natural language and graphics, usually in combination with deviation detection or other methods