Economic Data Analysis (ECO1017)

Cards (33)

  • What is one example of where data was used in policy?
    New Jersey, 1992 - the state raised its minimum wage from $4.25 to $5.05; a study of fast-food restaurants (Card and Krueger) found that employment did not fall, and rose by about 13% relative to neighbouring Pennsylvania.
  • What is one example of data being used in business?
    Amazon's recommendation system - an estimated 35% of Amazon's total sales are attributed to it.
  • What is a variable?

    Anything that can be measured and can differ across entities or across time.
  • What is a categorical variable?

    A type of variable that can take on one of a limited, fixed number of values, which represent distinct categories or groups.
  • What is an example of a categorical variable?

    Colours
  • What is a numerical variable?

    A variable measured as a number, on which arithmetic operations (such as addition or averaging) are meaningful.
  • What is an ordinal variable?

    A type of categorical variable that has a clear ordered relationship between its categories, meaning the categories can be ranked or sorted.
  • What is an example of an ordinal variable?

    Education level
  • What is an important aspect of categories?

    They can be numbered, but they're not numerical values.
  • What is an important aspect of ordinal variables?

    Their values can be numbered but they are not usually numerical variables.
  • What is the value of the variable?

    The possible outcomes or potential values that a variable can take based on its definition.
  • What is the realised (observed) value?

    The actual value that the variable takes after the experiment or observation is made; it's the concrete outcome from a specific trial or data point.
  • What is the preferred condition to use a pie chart?

    When there are around 3-5 categories.
  • What can the shape of a histogram tell us?

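    The shape of the data's distribution: whether it is symmetric or skewed, how many peaks it has, and where the values are concentrated.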
  • What is a population?
    A well-defined collection of units to which we want to generalise a set of findings or a statistical model.
  • What is a sample?

    A much smaller collection of units drawn from a population and used to draw conclusions about that target population.
  • What are the two characteristics that make a sample good?
    • Representative - the sample reflects the key characteristics of the population being studied.
    • Random - every member of the population being studied has an equal chance of being selected for the sample, which guards against selection bias.
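    As a minimal sketch of simple random sampling, the snippet below draws 50 units from a hypothetical population of 1,000 household IDs (the population, the sample size and the seed are all assumptions made for illustration). Python's random.sample draws without replacement, so each unit has the same chance of selection.

```python
import random

# Hypothetical population: ID numbers for 1,000 households.
population = list(range(1, 1001))

random.seed(42)  # fixed seed only so the example is repeatable

# random.sample draws without replacement; every unit is equally likely to be chosen.
sample = random.sample(population, k=50)

print(len(sample), sample[:5])
```

    Random selection does not guarantee a representative sample in any single draw, but it removes systematic bias in who gets picked.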
  • What are the three common measures of centre?
    • mean - average value
    • median - middle value
    • mode - most common value
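    The three measures above can be computed with Python's standard statistics module. The data here are a made-up sample of weekly hours worked, used only for illustration.

```python
import statistics

# Hypothetical sample: weekly hours worked by nine employees.
hours = [35, 38, 40, 40, 40, 42, 45, 48, 60]

mean_hours = statistics.mean(hours)      # sum of the values divided by how many there are
median_hours = statistics.median(hours)  # middle value once the data are sorted
mode_hours = statistics.mode(hours)      # most frequently occurring value

print(mean_hours, median_hours, mode_hours)
```

    In this sample the single large value (60) pulls the mean above the median and mode, which both equal 40.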
  • What is the best use case for mode?

    For categorical or discrete data
  • What is the best use case for mean?

    For data that is symmetrically distributed without outliers.
  • What is the best use case for median?

    For skewed distributions or when outliers are present.
  • What can the spread/variability of a distribution be described using?

    The percentiles
  • What is the five-number summary?
    • Minimum value
    • Q1
    • Median
    • Q3
    • Maximum value
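    A sketch of the five-number summary on a small made-up data set. Quartiles can be defined in several slightly different ways; this example assumes the "inclusive" method of statistics.quantiles (Python 3.8+), so other software may give slightly different Q1 and Q3.

```python
import statistics

# Hypothetical data set of nine observations.
data = [2, 4, 4, 5, 7, 8, 9, 10, 12]

# n=4 splits the data into quarters and returns the three cut points Q1, median, Q3.
q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")

summary = (min(data), q1, median, q3, max(data))
print(summary)
```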
  • What are the ranges for outliers?
    • Lower outliers: any value below Q1 - 1.5 x IQR
    • Upper outliers: any value above Q3 + 1.5 x IQR
    (where IQR = Q3 - Q1)
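    A minimal sketch of the 1.5 x IQR rule: values more than 1.5 x IQR below Q1 or above Q3 are flagged as outliers. The data are made up, and the quartiles assume the "inclusive" method of statistics.quantiles; other quartile conventions can move the fences slightly.

```python
import statistics

# Hypothetical data set with one unusually large value.
data = [2, 4, 4, 5, 7, 8, 9, 10, 30]

q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the fences is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```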
  • Why may outliers appear?
    • Measurement errors
    • Data entry errors
    • Sampling errors
    • Natural variation (genuine rare events)
    • Skewed distribution
    • Unexpected external factors
  • What are the two measures of spread?
    • Standard deviation
    • Variance
  • What does the Standard Deviation show?

    The spread of observations around the mean.
  • What does the size of the value of the standard deviation and variance mean?
    • The standard deviation and variance are large if the observations are widely spread about the mean, and small if the observations are close to the mean.
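    To illustrate, the sketch below compares two made-up data sets that share the same mean but differ in spread. Both use the sample formulas (dividing by n - 1) from the standard statistics module.

```python
import statistics

# Two hypothetical data sets with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

sd_tight = statistics.stdev(tight)  # sample standard deviation
sd_wide = statistics.stdev(wide)

var_tight = statistics.variance(tight)  # sample variance = standard deviation squared
var_wide = statistics.variance(wide)

print(sd_tight, sd_wide)
print(var_tight, var_wide)
```

    The observations in the second data set lie much further from 50, so its standard deviation and variance are far larger.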
  • Why do we use n - 1 instead of n in the formula for S.D.?

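    Because the deviations from the sample mean always sum to zero, the last deviation is determined once the other n - 1 are known, so only n - 1 deviations are free to vary (n - 1 degrees of freedom). Dividing by n - 1 rather than n corrects the tendency of the sample to underestimate the population variance.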
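    A small worked sketch of this constraint, using four made-up observations: the deviations from the mean sum to zero, and the sample variance divides the sum of squared deviations by n - 1.

```python
import math
import statistics

# Hypothetical sample of four observations.
data = [4, 8, 6, 2]
n = len(data)
mean = sum(data) / n

# Deviations from the sample mean always sum to zero,
# so the last one is fixed once the other n - 1 are known.
deviations = [x - mean for x in data]
sum_sq = sum(d ** 2 for d in deviations)

sample_var = sum_sq / (n - 1)      # divides by n - 1 (degrees of freedom)
divide_by_n_var = sum_sq / n       # dividing by n gives a smaller value
sample_sd = math.sqrt(sample_var)

print(sum(deviations), sample_var, divide_by_n_var, sample_sd)
```

    The hand calculation agrees with statistics.variance(data), which also uses n - 1.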
  • What does a positive skew look like?

    Tail of the distribution stretches longer to the right (towards higher values)
  • What is the order of the measures of centre in a positive skew?

    Typically Mean > Median > Mode, as the mean is pulled up by the higher values in the tail.
  • What does a negative skew look like?

    Tail of the distribution stretches longer to the left (towards lower values)
  • What is the order of the measures of centre in a negative skew?

    Typically Mean < Median < Mode, as the mean is pulled down by the lower values in the tail.
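    The two orderings can be checked on small made-up data sets, one with a long right tail and one with a long left tail. This is only an illustration: the ordering is a common pattern, not a rule that holds for every skewed data set.

```python
import statistics

# Hypothetical positively skewed data: one very high value in the right tail.
right_tail = [20, 22, 22, 25, 27, 30, 90]
# Hypothetical negatively skewed data: one very low value in the left tail.
left_tail = [10, 70, 73, 75, 78, 78, 80]

for label, values in [("positive skew", right_tail), ("negative skew", left_tail)]:
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    print(label, round(mean, 2), median, mode)

# Store the results so the ordering can be inspected directly.
pos = (statistics.mean(right_tail), statistics.median(right_tail), statistics.mode(right_tail))
neg = (statistics.mean(left_tail), statistics.median(left_tail), statistics.mode(left_tail))
```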