Definitions from the surrounding context of the test

Created by

Ha Thien An Nguyen

Cards (35)

Categorical Variable 
A variable that describes qualities and falls into distinct categories. The categories often have no inherent order (nominal), although some do (ordinal).
Categorical Variable 
Style of home (bungalow, ranch, colonial)
Numerical Variable 
A variable that represents quantities and can be measured on a scale.
Numerical Variable 
Number of bedrooms (can be 2, 3, 4, etc.)
Bar Chart 
Used for comparing categories of data. Each category is represented by a bar whose height or length reflects the value for that category.
Bar Chart 
Best for: Comparing frequencies of categorical data.
Pie Chart 
Used to show proportions of a whole. The entire circle represents the whole, and slices represent the proportions of categories.
Pie Chart 
Best for: Representing proportions where the categories sum to 100%.
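As a rough illustration of the bar chart and pie chart cards, the sketch below counts categories (the bar heights) and converts the counts to shares of the whole (the pie slices). The list of home styles is made-up sample data, not something taken from the cards.

```python
from collections import Counter

# Hypothetical sample of home styles (illustrative data only)
home_styles = ["ranch", "colonial", "ranch", "bungalow", "ranch", "colonial"]

# Frequencies: what a bar chart would draw as bar heights
frequencies = Counter(home_styles)

# Proportions: what a pie chart would draw as slices of the whole
total = sum(frequencies.values())
proportions = {style: count / total for style, count in frequencies.items()}

for style, count in frequencies.most_common():
    print(f"{style}: count={count}, share={proportions[style]:.0%}")
```

The proportions sum to 1 (that is, 100%), which is the condition the pie chart card asks for.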
Histogram 
Used to visualize the distribution of continuous numerical data. The data is divided into ranges (bins), and the height of each bar represents the number of data points that fall within that range.
Histogram 
Best for: Understanding the shape and spread of numerical data.
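The binning step behind a histogram can be sketched as follows. The prices, the bin width of 50, and the starting edge of 100 are all assumptions chosen for illustration; a real histogram would pick bins to suit the data.

```python
# Hypothetical home prices in thousands of dollars (illustrative data only)
prices = [120, 135, 150, 155, 160, 180, 210, 240, 250, 310]
bin_width = 50    # assumed bin width
bin_start = 100   # assumed left edge of the first bin

counts = {}
for price in prices:
    # Left edge of the bin containing this value; each bin covers [low, low + width)
    low = bin_start + ((price - bin_start) // bin_width) * bin_width
    counts[low] = counts.get(low, 0) + 1

# Text version of the histogram: one row per bin, one '#' per data point
for low in sorted(counts):
    print(f"{low}-{low + bin_width}: {'#' * counts[low]}")
```

Here most values fall in the lower bins and a few trail off to the right, which is the kind of shape a histogram makes visible.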
Scatterplot 
Used to show the relationship between two numerical variables. Each data point represents a value for one variable plotted against the other variable.
Scatterplot 
Best for: Identifying trends or associations between two variables.
Mean (Average) 
The sum of all values in a data set divided by the number of values.
Median 
The middle value when a data set is ordered from least to greatest. If there are two middle values, the median is the average of those two values.
Standard Deviation (SD) 
A measure of how spread out the data is from the mean.
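A minimal sketch of the mean, median, and standard deviation cards, using Python's standard `statistics` module. The bedroom counts are made-up data. The cards do not say whether the sample or the population standard deviation is meant, so both are shown; which one applies is an assumption the reader has to settle from context.

```python
import statistics

# Hypothetical bedroom counts for six homes (illustrative data only)
bedrooms = [2, 3, 3, 4, 5, 7]

mean = statistics.mean(bedrooms)      # sum of values / number of values = 24 / 6
median = statistics.median(bedrooms)  # even count, so average the two middle values 3 and 4

# Sample SD divides the squared deviations by n - 1; population SD divides by n
sample_sd = statistics.stdev(bedrooms)
population_sd = statistics.pstdev(bedrooms)

print(f"mean={mean}, median={median}")
print(f"sample SD={sample_sd:.2f}, population SD={population_sd:.2f}")
```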
Scatterplot 
A graphical representation that displays the relationship between two numerical variables. Each data point represents a value for one variable plotted against the other variable.
Scatterplot 
The position of the points can reveal a trend or association between the variables.
Association Between Variables 
The way in which changes in one variable are related to changes in another variable. This can be a positive association (as one increases, the other increases), negative association (as one increases, the other decreases), or no association (no clear relationship).
Histogram 
A bar graph that shows the distribution of continuous numerical data. The data is divided into ranges (bins), and the height of each bar represents the number of data points that fall within that range.
Histogram 
This helps visualize the shape (symmetrical, skewed) and spread of the data.
Symmetry 
In statistics, symmetry refers to how data is distributed around the center (mean or median). A symmetrical distribution looks like a mirror image on either side of the center. A skewed distribution is not symmetrical: its values trail off in a longer tail on one side.
Data Spread 
Data spread refers to how scattered the data points are in a data set. It describes how far the values are from the center (mean or median). Standard deviation is a common measure of data spread. A higher standard deviation indicates a larger spread of data points.
A line graph (also called a line chart or trend plot) is a visual representation of how a continuous numerical variable changes over time or in relation to another continuous numerical variable.
Line graphs are useful for:
Identifying trends (increasing, decreasing, or no clear trend)
Observing changes over time or in relation to another variable
Comparing trends between different groups or data sets (when plotted on the same graph)
Linear Association: This refers to a general trend where the points in the scatterplot show a straight-line pattern. This line can be:
Positive Linear Association: As the value of one variable increases, the value of the other variable also tends to increase. The slope of the line is positive. (Imagine points going up and to the right diagonally)
Negative Linear Association: As the value of one variable increases, the value of the other variable tends to decrease. The slope of the line is negative. (Imagine points going down and to the right diagonally)
Strength of Association: This describes how close the data points cluster around the straight line in a linear association.
Strong Association: The points form a tight cluster around a well-defined line. This indicates a clear and predictable relationship between the variables.
Weak Association: The points are more scattered around the line, with a wider range of values for one variable at a given value of the other variable. This indicates a less clear or predictable relationship.
Non-linear Association: This occurs when there is no clear straight-line pattern in the scatterplot. The points may show a curved pattern, a random scatter, or some other non-linear trend. This indicates that the two variables are not related in a simple linear way.
Positive Linear Association: As one variable increases, the other tends to increase. (Points go up and to the right diagonally.)
Negative Linear Association: As one variable increases, the other tends to decrease. (Points go down and to the right diagonally.)
Strong Association: Points cluster tightly around a well-defined line.
Weak Association: Points are more scattered around the line.
Non-linear Association: No clear straight-line pattern. Points may be curved, random, etc.
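One common way to put a number on the direction and strength of a linear association is the Pearson correlation coefficient r. The cards do not name it, so treat this as an added illustration rather than part of the deck: r near +1 suggests a strong positive linear association, r near -1 a strong negative one, and r near 0 little or no linear association. All data below is made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical data (illustrative only)
x = [1, 2, 3, 4, 5]
strong_positive = [2, 4, 6, 8, 10]   # points lie exactly on a rising line
strong_negative = [10, 8, 6, 4, 2]   # points lie exactly on a falling line
weaker_positive = [3, 1, 4, 2, 5]    # rising overall, but scattered

for label, y in [("strong positive", strong_positive),
                 ("strong negative", strong_negative),
                 ("weaker positive", weaker_positive)]:
    print(f"{label}: r = {pearson_r(x, y):.2f}")
```

Note that r only captures straight-line patterns; a clearly curved scatterplot can still give an r close to 0.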
The Interquartile Range (IQR) is a measure of statistical dispersion, which is the spread of the data points. Specifically, it measures the range within which the middle 50% of the data lies, between the first and third quartiles.
When to Use IQR
Skewed Distributions: IQR is especially useful for data that are not symmetrically distributed (i.e., skewed distributions), where the mean and standard deviation might not represent the data well.
Presence of Outliers: When there are outliers, the mean can be misleading as it gets affected by extreme values. IQR, being based on the middle 50% of the data, is more reliable in such cases.
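Calculate the IQR: IQR = Q3 − Q1, where Q1 is the first quartile (25th percentile) and Q3 is the third quartile (75th percentile).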
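The sketch below computes the IQR and shows why it holds up better than the standard deviation when an outlier is present. It uses `statistics.quantiles` with its default "exclusive" method; textbooks use several slightly different quartile conventions, so hand-calculated values may differ a little. The data is made up.

```python
import statistics

def iqr(values):
    """Interquartile range: Q3 - Q1, using the default quartile method of statistics.quantiles."""
    q1, _q2, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical data (illustrative only)
values = [10, 12, 13, 15, 16, 18, 20]
with_outlier = values + [200]  # same data plus one extreme value

print(f"IQR without outlier: {iqr(values):.2f}")
print(f"IQR with outlier:    {iqr(with_outlier):.2f}")
print(f"SD without outlier:  {statistics.stdev(values):.2f}")
print(f"SD with outlier:     {statistics.stdev(with_outlier):.2f}")
```

The single extreme value changes the IQR only slightly, while the standard deviation grows many times over.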
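Law of Total Probability:
If disjoint scenarios B1, B2, ..., Bn cover every possible outcome, then P(A) = P(A|B1)P(B1) + P(A|B2)P(B2) + ... + P(A|Bn)P(Bn). In the same spirit, when A and B are mutually exclusive, the probabilities of all possible outcomes add up to 1: P(A) + P(B) + P(not A and not B) = 1.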
2. Express the answer as a percentage:
Multiply the decimal answer by 100% to convert it to a percentage. For example, 0.26 × 100% = 26%.
The formula used to calculate the probability of the union of two events is known as the Principle of Inclusion-Exclusion. It is used to find the probability that at least one of the events occurs. The formula is:
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
Here's a breakdown of the terms:
P(A): Probability of event A occurring.
P(B): Probability of event B occurring.
P(A ∩ B): Probability of both events A and B occurring.
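As a small check of the union formula, the sketch below uses a fair six-sided die, with A = "roll is even" and B = "roll is greater than 3". The events are chosen for illustration only. It compares the probability of the union counted directly against the value given by the formula.

```python
from fractions import Fraction

outcomes = set(range(1, 7))                # faces of a fair six-sided die
A = {n for n in outcomes if n % 2 == 0}    # even roll: {2, 4, 6}
B = {n for n in outcomes if n > 3}         # roll greater than 3: {4, 5, 6}

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(outcomes))

union_direct = prob(A | B)                        # count the union directly
union_formula = prob(A) + prob(B) - prob(A & B)   # subtract the double-counted overlap

print(f"direct: {union_direct}, formula: {union_formula}")
```

Without subtracting P(A ∩ B), the outcomes 4 and 6 would be counted twice.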
Law of Total Probability:
Use when breaking down an event into disjoint scenarios.
Typically involves conditional probabilities.
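A minimal sketch of the law of total probability with conditional probabilities. The weather scenarios and all the numbers are hypothetical; the only requirement is that the scenarios are disjoint and their probabilities sum to 1.

```python
# Hypothetical scenarios: name -> (P(scenario), P(late | scenario))
scenarios = {
    "sunny": (0.6, 0.1),
    "rainy": (0.3, 0.4),
    "snowy": (0.1, 0.8),
}

# The scenarios must cover every case, so their probabilities sum to 1
total_scenario_prob = sum(p for p, _ in scenarios.values())
assert abs(total_scenario_prob - 1) < 1e-9

# P(late) = sum over scenarios of P(late | scenario) * P(scenario)
p_late = sum(p * p_late_given for p, p_late_given in scenarios.values())

# Express the answer as a percentage
print(f"P(late) = {p_late:.2f} = {p_late:.0%}")
```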
Principle of Inclusion-Exclusion:
Use when dealing with the union of overlapping events.
Adjusts for double-counting intersections of events.
confidence level