11.2.1 Data mining techniques

Cards (35)

What is the primary goal of data mining?
Discover patterns in data
Data mining can be used to forecast future trends.
Data mining is the process of extracting patterns and insights from large datasets
Match the data mining technique with its purpose:
Association Rule Learning ↔️ Discovers relationships between variables
Classification ↔️ Assigns data to predefined categories
Clustering ↔️ Groups similar data points together
Anomaly Detection ↔️ Identifies unusual data points
Steps in Association Rule Learning
1️⃣ Discover frequent itemsets
2️⃣ Generate association rules
3️⃣ Evaluate rule metrics
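
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the transactions, the `MIN_SUPPORT` and `MIN_CONFIDENCE` thresholds, and all function names are invented here, and the brute-force enumeration stands in for a real frequent-itemset algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical toy transactions; each is a set of purchased items.
transactions = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "butter"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

# Step 1: discover frequent itemsets (brute force, fine only for tiny data).
MIN_SUPPORT = 0.4  # assumed threshold
items = sorted(set().union(*transactions))
frequent = [
    frozenset(c)
    for size in range(1, len(items) + 1)
    for c in combinations(items, size)
    if support(frozenset(c)) >= MIN_SUPPORT
]

# Step 2: generate candidate rules A -> B by splitting each frequent itemset.
rules = []
for itemset in frequent:
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            a = frozenset(antecedent)
            rules.append((a, itemset - a))

# Step 3: evaluate each rule and keep those above a confidence threshold.
MIN_CONFIDENCE = 0.7  # assumed threshold
strong_rules = []
for a, b in rules:
    confidence = support(a | b) / support(a)
    if confidence >= MIN_CONFIDENCE:
        strong_rules.append((set(a), set(b), round(confidence, 2)))

for rule in strong_rules:
    print(rule)
```

On this toy data, seven itemsets pass the support threshold and four rules pass the confidence threshold, including jam -> bread with confidence 1.0.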
Association Rule Learning identifies relationships between variables in a dataset
What type of data is required for classification?
Labeled data
Clustering is sensitive to feature selection.
What is a common weakness of anomaly detection?
High false positive rates
In Association Rule Mining, support measures the frequency of the rule's occurrence
Confidence in Association Rule Mining measures the probability that the consequent occurs given that the antecedent occurs.
What does lift measure in Association Rule Mining?
How much more often the antecedent and consequent occur together than would be expected if they were independent
Classification categorizes data into predefined classes
Classifying emails as spam or not spam is an example of regression.
False
Clustering groups similar data points together based on shared characteristics
DBSCAN is capable of finding clusters of arbitrary shapes.
K-Means partitions data into clusters based on minimizing the distance to centroids
Hierarchical clustering requires specifying the number of clusters beforehand.
False
What does DBSCAN use to identify clusters?
Density
Data mining discovers patterns, trends, and useful information from datasets
Trends in data mining refer to recurring sequences or relationships.
False
What is the purpose of decision-making in data mining?
Selecting the best action
Association Rule Learning identifies relationships between variables
Which data mining technique predicts continuous numerical values?
Regression
Match the metric in Association Rule Mining with its description:
Support ↔️ Frequency of the rule's occurrence
Confidence ↔️ Probability that the consequent occurs given the antecedent
Lift ↔️ How much more often the items occur together than expected if independent
In market basket analysis, a rule might be "if a customer buys bread, they are likely to buy butter."
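
As a worked example of the three metrics for the bread -> butter rule, here is a minimal sketch. The baskets are invented for illustration, and `rule_metrics` is a hypothetical helper, not a library function. Lift is computed as confidence divided by the consequent's overall frequency, so a value above 1 suggests the items appear together more often than chance would predict.

```python
# Hypothetical baskets, invented for illustration only.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "eggs"},
    {"bread"},
    {"eggs"},
    {"butter", "eggs"},
    {"bread", "butter"},
    {"eggs", "milk"},
    {"milk"},
]

def rule_metrics(antecedent, consequent, baskets):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(baskets)
    n_a = sum(antecedent <= b for b in baskets)       # baskets with the antecedent
    n_c = sum(consequent <= b for b in baskets)       # baskets with the consequent
    n_both = sum((antecedent | consequent) <= b for b in baskets)
    support = n_both / n              # how often the whole rule occurs
    confidence = n_both / n_a         # P(consequent | antecedent)
    lift = confidence / (n_c / n)     # confidence relative to P(consequent)
    return support, confidence, lift

support, confidence, lift = rule_metrics({"bread"}, {"butter"}, baskets)
print(support, confidence, lift)  # 0.375 0.75 1.5
```

Here bread and butter appear together in 3 of 8 baskets, butter follows bread in 3 of 4 bread baskets, and the lift of 1.5 means butter is 1.5 times as likely in a bread basket as in a basket overall.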
What is the primary difference between classification and regression?
Output type
Regression outputs discrete categories.
False
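
The difference in output type can be seen side by side in a small sketch. The data, the least-squares helper `fit_line`, and the one-nearest-neighbor helper `nearest_neighbor_label` are all assumptions chosen for illustration; any regression or classification method would show the same contrast.

```python
# Hypothetical data: hours studied, with an exam score and a pass/fail label.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 70.0, 78.0]
passed = ["fail", "fail", "pass", "pass", "pass"]

def fit_line(xs, ys):
    """Ordinary least squares for one feature: returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return slope, mean_y - slope * mean_x

def nearest_neighbor_label(x, xs, labels):
    """1-nearest-neighbor classifier: copy the label of the closest training point."""
    closest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return labels[closest]

new_hours = 4.4
slope, intercept = fit_line(hours, scores)

# Regression: the prediction is a continuous number (about 73.56 here).
predicted_score = slope * new_hours + intercept

# Classification: the prediction is one of the predefined categories.
predicted_label = nearest_neighbor_label(new_hours, hours, passed)

print(predicted_score, predicted_label)
```

Both models take the same input, but the regression returns a point on a continuous scale while the classifier can only return "pass" or "fail".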
Give an example of a classification problem.
Email spam detection
K-Means clustering partitions data into K clusters based on minimizing distance to centroids
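
A minimal sketch of the K-Means loop, alternating between assigning points to the nearest centroid and moving each centroid to the mean of its points. Assumptions made here: 2-D points, squared Euclidean distance, a fixed number of iterations, and a deliberately simplistic initialization that uses the first K points as starting centroids. Practical implementations usually initialize more carefully.

```python
def k_means(points, k, iterations=20):
    """Lloyd's algorithm on 2-D points; returns (centroids, labels)."""
    centroids = list(points[:k])  # assumed initialization: first k points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                + (p[1] - centroids[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, labels

# Hypothetical data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, labels = k_means(points, k=2)
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Even though both starting centroids sit in the same group, the update step pulls one of them toward the other group within two iterations on this data.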
Hierarchical clustering is computationally intensive for large datasets.
What is a key weakness of DBSCAN clustering?
Sensitive to density parameters (the neighborhood radius and the minimum number of points)
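
To make the density idea and the parameter sensitivity concrete, here is a simplified DBSCAN sketch. It assumes 1-D numeric values and absolute difference as the distance, and uses the conventional parameter names `eps` (neighborhood radius) and `min_pts` (minimum neighbors, counting the point itself). The data is invented for illustration.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN on 1-D values; returns a cluster id per point, -1 for noise."""

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if abs(points[i] - q) <= eps]

    labels = [None] * len(points)  # None = not visited yet
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may later become a border point
            continue
        cluster += 1  # i is a core point: start a new cluster
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:  # j is also core: keep expanding
                queue.extend(j_neighbors)
    return labels

values = [1.0, 1.1, 1.2, 2.0, 2.1, 2.2, 9.0]  # hypothetical data
tight = dbscan(values, eps=0.15, min_pts=3)
loose = dbscan(values, eps=1.0, min_pts=3)
print(tight)  # [0, 0, 0, 1, 1, 1, -1]
print(loose)  # [0, 0, 0, 0, 0, 0, -1]
```

With the same data, a small radius finds two clusters plus one noise point, a larger radius merges the two clusters into one, and a radius of 0.05 would mark every point as noise.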
Anomaly detection identifies data points that deviate significantly from the norm
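
One simple way to express "deviates significantly from the norm" is a z-score rule, sketched below. This is only one of many possible approaches. The readings, the helper name `z_score_anomalies`, and the default threshold of 2 standard deviations are assumptions made for illustration.

```python
from statistics import mean, pstdev

def z_score_anomalies(values, threshold=2.0):
    """Return the values lying more than `threshold` std devs from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical: nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one obvious outlier.
readings = [10.1, 9.8, 10.0, 10.2, 9.9, 10.0, 25.0, 10.1]
anomalies = z_score_anomalies(readings)
print(anomalies)  # [25.0]
```

The threshold is a trade-off: lowering it catches subtler deviations but also flags ordinary readings, which is where high false positive rates tend to come from.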
Machine learning models can be used for anomaly detection.