Definitions from the surrounding context of the test

Cards (35)

  • Categorical Variable

    A variable that describes qualities and sorts each observation into one of several distinct categories. The categories need not have any inherent order.
  • Categorical Variable
    • Style of home (bungalow, ranch, colonial)
  • Numerical Variable
    A variable that represents quantities and can be measured on a scale.
  • Numerical Variable

    • Number of bedrooms (can be 2, 3, 4, etc.)
  • Bar Chart

    • Used for comparing categories of data. Each category is represented by a bar whose height or length reflects the value for that category.
  • Bar Chart
    Best for: Comparing frequencies of categorical data.
  • Pie Chart
    • Used to show proportions of a whole. The entire circle represents the whole, and slices represent the proportions of categories.
  • Pie Chart
    Best for: Representing proportions where the categories sum to 100%.
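
    The frequencies a bar chart displays and the proportions a pie chart displays can both be computed from the same categorical data. The following is a minimal sketch; the list of home styles is made-up illustrative data, not taken from the cards.

```python
from collections import Counter

# Hypothetical sample: style of home for eight listings (illustrative data only).
home_styles = ["ranch", "colonial", "ranch", "bungalow",
               "ranch", "colonial", "bungalow", "ranch"]

# Frequencies: what a bar chart would draw as bar heights.
frequencies = Counter(home_styles)

# Proportions: what a pie chart would draw as slice sizes.
total = sum(frequencies.values())
proportions = {style: count / total for style, count in frequencies.items()}

for style, count in frequencies.most_common():
    print(f"{style:<10} count={count}  share={proportions[style]:.1%}")
```

    The proportions add up to 1 (100%), which is the condition the pie chart card relies on.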
  • Histogram
    • Used to visualize the distribution of continuous numerical data. The data is divided into ranges (bins), and the height of each bar represents the number of data points that fall within that range.
  • Histogram
    Best for: Understanding the shape and spread of numerical data.
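
    The binning step behind a histogram can be sketched in a few lines. The data, the bin width, and the starting edge below are all assumptions chosen for illustration; a different bin width would give a different picture of the same data.

```python
# Hypothetical sample: home sizes in square feet (illustrative data only).
sizes = [850, 920, 1100, 1150, 1200, 1350, 1400, 1480, 1900, 2600]

bin_width = 500   # assumed bin width
start = 500       # assumed left edge of the first bin
num_bins = 5      # bins cover 500 up to (but not including) 3000

# Count how many data points fall into each bin.
counts = [0] * num_bins
for value in sizes:
    index = (value - start) // bin_width   # which bin the value falls into
    counts[index] += 1

# Text rendering: one row per bin, one '#' per data point in that bin.
for i, count in enumerate(counts):
    low = start + i * bin_width
    high = low + bin_width
    print(f"{low:>4}-{high - 1:<4} | {'#' * count}")
```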
  • Scatterplot
    • Used to show the relationship between two numerical variables. Each data point represents a value for one variable plotted against the other variable.
  • Scatterplot
    Best for: Identifying trends or associations between two variables.
  • Mean (Average)

    The sum of all values in a data set divided by the number of values.
  • Median
    The middle value when a data set is ordered from least to greatest. If there are two middle values, the median is the average of those two values.
  • Standard Deviation (SD)

    A measure of how spread out the data is from the mean.
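
    The three measures above can be computed with Python's standard `statistics` module. This is a small sketch on made-up data. The cards do not say whether the standard deviation is the sample or the population version; the sample version (`stdev`) is assumed here.

```python
import statistics

# Hypothetical sample: number of bedrooms in six homes (illustrative data only).
bedrooms = [2, 3, 3, 4, 4, 5]

mean = statistics.mean(bedrooms)       # sum of the values divided by how many there are
median = statistics.median(bedrooms)   # even count, so the two middle values are averaged
sd = statistics.stdev(bedrooms)        # sample standard deviation (divides by n - 1)

print(f"mean={mean}, median={median}, sd={sd:.3f}")
```

    Use `statistics.pstdev` instead if the data set is the whole population rather than a sample.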
  • Scatterplot
    • A graphical representation that displays the relationship between two numerical variables. Each data point represents a value for one variable plotted against the other variable.
  • Scatterplot
    The position of the points can reveal a trend or association between the variables.
  • Association Between Variables
    The way in which changes in one variable are related to changes in another variable. This can be a positive association (as one increases, the other increases), negative association (as one increases, the other decreases), or no association (no clear relationship).
  • Histogram
    A bar graph that shows the distribution of continuous numerical data. The data is divided into ranges (bins), and the height of each bar represents the number of data points that fall within that range.
  • Histogram
    This helps visualize the shape (symmetrical, skewed) and spread of the data.
  • Symmetry
    In statistics, symmetry refers to the distribution of data around the center (mean or median). A symmetrical distribution has a mirror image on either side of the center. A skewed distribution leans to one side.
  • Data Spread
    Data spread refers to how scattered the data points are in a data set. It describes how far the values are from the center (mean or median). Standard deviation is a common measure of data spread. A higher standard deviation indicates a larger spread of data points.
  • A line graph (also called a line chart or trend plot) is a visual representation of how a continuous numerical variable changes over time or in relation to another continuous numerical variable.
  • Line graphs are useful for:
    • Identifying trends (increasing, decreasing, or no clear trend)
    • Observing changes over time or in relation to another variable
    • Comparing trends between different groups or data sets (when plotted on the same graph)
  • Linear Association: This refers to a general trend where the points in the scatterplot show a straight-line pattern. This line can be:
    • Positive Linear Association: As the value of one variable increases, the value of the other variable also tends to increase. The slope of the line is positive. (Imagine points going up and to the right diagonally)
    • Negative Linear Association: As the value of one variable increases, the value of the other variable tends to decrease. The slope of the line is negative. (Imagine points going down and to the right diagonally)
  • Strength of Association: This describes how close the data points cluster around the straight line in a linear association.
    • Strong Association: The points form a tight cluster around a well-defined line. This indicates a clear and predictable relationship between the variables.
    • Weak Association: The points are more scattered around the line, with a wider range of values for one variable at a given value of the other variable. This indicates a less clear or predictable relationship.
  • Non-linear Association: This occurs when the points in the scatterplot follow a pattern that is not a straight line, such as a curve. This indicates that the two variables are related, but not in a simple linear way. A purely random scatter, by contrast, suggests no association at all.
  • Positive Linear Association: As one variable increases, the other tends to increase. (Points go up and to the right diagonally.)
    Negative Linear Association: As one variable increases, the other tends to decrease. (Points go down and to the right diagonally.)
    Strong Association: Points cluster tightly around a well-defined line.
    Weak Association: Points are more scattered around the line.
    Non-linear Association: No clear straight-line pattern. Points may follow a curve or another non-linear shape.
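
    The cards describe direction and strength of a linear association in words only. One common numeric summary, assumed here for illustration and not named in the cards, is the Pearson correlation coefficient r: its sign gives the direction, and values close to 1 or -1 indicate a strong linear association. The data below are made up.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical data (illustrative only).
size = [1, 2, 3, 4, 5]
rises_with_size = [2, 4, 6, 8, 10]   # points lie exactly on an upward line
falls_with_size = [10, 8, 6, 4, 2]   # points lie exactly on a downward line

r_positive = pearson_r(size, rises_with_size)
r_negative = pearson_r(size, falls_with_size)
print(f"positive example: r = {r_positive:.2f}")
print(f"negative example: r = {r_negative:.2f}")
```

    A value of r near 0 only rules out a linear pattern; a curved (non-linear) association can still be present, so the scatterplot should be checked as well.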
  • The Interquartile Range (IQR) is a measure of statistical dispersion, that is, the spread of the data points. Specifically, it is the width of the range within which the middle 50% of the data lies, calculated as the third quartile (Q3) minus the first quartile (Q1).
  • When to Use IQR
    • Skewed Distributions: IQR is especially useful for data that are not symmetrically distributed (i.e., skewed distributions), where the mean and standard deviation might not represent the data well.
    • Presence of Outliers: When there are outliers, the mean can be misleading as it gets affected by extreme values. IQR, being based on the middle 50% of the data, is more reliable in such cases.
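
    The IQR is the difference between the third and first quartiles (Q3 − Q1). The sketch below computes it on made-up data containing one extreme value. Quartile conventions differ between textbooks and software; this example assumes the default method of Python's `statistics.quantiles`, so the exact quartile values may not match a hand calculation that uses another convention.

```python
import statistics

# Hypothetical data with one extreme value (illustrative only).
data = [1, 2, 3, 4, 5, 6, 7, 100]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# For comparison: the full range is dominated by the outlier.
full_range = max(data) - min(data)

print(f"Q1={q1}, Q3={q3}, IQR={iqr}, full range={full_range}")
```

    Because the IQR looks only at the middle 50% of the data, the single value of 100 barely affects it, while the full range is stretched to 99.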
  • Calculate the IQR: IQR=Q3−Q1
    1. Law of Total Probability:
    P(Total) = P(A) + P(B) + P(not A and not B) = 1, provided A and B cannot both occur
    2. Express the answer as a percentage:
    Multiply the decimal answer by 100% to convert it to a percentage.
  • The formula used to calculate the probability of the union of two events is known as the Principle of Inclusion-Exclusion. It is used to find the probability that at least one of the events occurs. The formula is:
    P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
    Here's a breakdown of the terms:
    • P(A): Probability of event A occurring.
    • P(B): Probability of event B occurring.
    • P(A ∩ B): Probability of both events A and B occurring.
    • Law of Total Probability:
      • Use when breaking down an event into disjoint scenarios.
      • Typically involves conditional probabilities.
    • Principle of Inclusion-Exclusion:
      • Use when dealing with the union of overlapping events.
      • Adjusts for double-counting intersections of events.
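
    Both rules can be checked by counting outcomes in a small sample space. The sketch below is a hedged illustration: the eight equally likely outcomes and the events A and B are invented for the example, and `prob` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction

# Hypothetical sample space: eight equally likely outcomes (illustrative only).
outcomes = set(range(1, 9))
A = {1, 2, 3, 4}
B = {4, 5}

def prob(event):
    """Probability of an event as (favorable outcomes) / (all outcomes)."""
    return Fraction(len(event), len(outcomes))

# Inclusion-exclusion: subtract the overlap so it is not counted twice.
p_union = prob(A) + prob(B) - prob(A & B)

# Law of total probability: split B over the disjoint scenarios "A" and "not A".
not_A = outcomes - A
p_b_given_a = Fraction(len(B & A), len(A))
p_b_given_not_a = Fraction(len(B & not_A), len(not_A))
p_b_total = p_b_given_a * prob(A) + p_b_given_not_a * prob(not_A)

print(f"P(A or B) = {p_union} = {float(p_union):.1%}")
print(f"P(B)      = {p_b_total}")
```

    The first result matches a direct count of the outcomes in A or B, and the second matches a direct count of the outcomes in B. Multiplying the decimal value by 100% gives the percentage form.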
  • confidence level