Descriptive statistics

  • Statistics are all around us, every day
  • Sample statistics are estimates of population parameters
  • Samples are limited to measuring characteristics of a portion of the population
  • Inferential statistics
    • Used to make convincing statements & confident conclusions based on results
    • Statistical tests calculate the probability (p) that the observed effects/relationships happened by random chance
    • High probability (conventionally p ≥ 0.05) = effects/relationships may be due to natural variability
    • Low probability (conventionally p < 0.05) = effects/relationships strong enough to be considered real
  • Mean: Sum of all observations divided by the number of observations
  • Median: The middle measurement in an ordered set of data; less sensitive to extreme values than the mean, so a more robust measure in some cases
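
To illustrate how the mean and median respond to an extreme value, here is a minimal sketch using Python's standard `statistics` module. The data values are hypothetical, chosen only for illustration.

```python
from statistics import mean, median

# Hypothetical sample: five typical values, then the same sample plus one extreme value.
typical = [4, 5, 5, 6, 7]
with_outlier = typical + [60]

# Mean = sum of observations / number of observations.
mean_typical = mean(typical)
mean_outlier = mean(with_outlier)

# Median = middle value of the ordered data (average of the two middle
# values when the number of observations is even).
median_typical = median(typical)
median_outlier = median(with_outlier)

print(mean_typical, median_typical)
print(mean_outlier, median_outlier)
```

The single extreme value pulls the mean far more than it moves the median.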
  • Applications of statistics
    • Advertising campaigns
    • Trends
    • Evaluating crime patterns
    • GDP
  • Descriptive statistics
    • Used to summarise data
    • Measure of central tendency: Mean, median, mode
    • Measure of spread or dispersion: Standard deviation, variance, standard error, range
    • Help describe a sample from a population
  • Population parameters describe characteristics of a statistical population
  • As sample size (n) increases, sample statistics become more accurate estimates of the true population parameters
  • Mean is sensitive to extreme values in a population
  • Mode can also be used as a measure of central tendency for nominal and ordinal data
  • Sample variance (s²)
    Based on the sum of squares (SS), which is the sum of the squared deviations of each observation from the sample mean
  • Population variance (σ²) is SS divided by N, where N is the population size, i.e. the mean of the squared deviations from the population mean
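
As a rough sketch of how the sum of squares leads to the two variances, the snippet below computes SS by hand and divides by n − 1 (sample) or N (population). The data are hypothetical; `variance` and `pvariance` from Python's `statistics` module are used only as a cross-check.

```python
from statistics import mean, pvariance, variance

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

m = mean(data)

# Sum of squares (SS): squared deviation of each observation from the mean.
ss = sum((x - m) ** 2 for x in data)

# Sample variance divides SS by the degrees of freedom (n - 1).
sample_var = ss / (len(data) - 1)

# Population variance divides SS by N (treating the data as the whole population).
population_var = ss / len(data)

print(ss, sample_var, population_var)
print(variance(data), pvariance(data))  # library cross-check
```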
  • SD is the most commonly reported measure of variability for the mean when describing a sample, reported as mean ± SD
  • Median is the middle measurement in an ordered set of data, with an equal number of observations on either side, and is less sensitive to extreme values than the mean
  • Mode
    • Most common value in a dataset, not sensitive to extreme values
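
A small sketch of finding the mode, including for nominal (category) data where a mean or median would not make sense. The values are hypothetical; `mode` and `multimode` are from Python's standard `statistics` module (`multimode` needs Python 3.8 or later).

```python
from statistics import mode, multimode

# Hypothetical nominal data: categories with no numeric meaning.
colours = ["red", "blue", "red", "green", "blue", "red"]

# The mode is simply the most frequent value.
most_common_colour = mode(colours)

# When several values tie for most frequent, multimode returns all of them.
tied = multimode(["a", "b", "a", "b", "c"])

print(most_common_colour)
print(tied)
```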
  • When reporting the mean, one must also report one of these measures of variability: Sample variance (s²), Standard deviation of the sample (SD), Standard error of the sample mean (se), 95% confidence limits
  • Sample variance (s²) is SS divided by the degrees of freedom (n − 1), also known as the mean square
  • Sample standard deviation (SD) is the square root of the sample variance; it describes the typical deviation of observations from the sample mean and has the same units as the original measurements
  • Standard error (SE) calculation
    se = SD/√n
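
The formula se = SD/√n can be sketched as follows. The data and the helper name `standard_error` are hypothetical, for illustration only.

```python
from math import sqrt
from statistics import stdev

def standard_error(sd, n):
    """Standard error of the mean: se = SD / sqrt(n)."""
    return sd / sqrt(n)

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

sd = stdev(data)                     # sample SD (uses n - 1)
se = standard_error(sd, len(data))

# For the same SD, a larger sample gives a smaller standard error.
se_if_n_were_32 = standard_error(sd, 32)

print(sd, se, se_if_n_were_32)
```

Because n sits under a square root, quadrupling the sample size only halves the standard error.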
  • Coefficient of Variation (V)
    Used to compare variability (SD) between samples that have different means
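
The card does not give the formula; the sketch below assumes the usual definition, V = (SD / mean) × 100%. The two samples and the function name are hypothetical.

```python
from statistics import mean, stdev

def coefficient_of_variation(values):
    """Assumed definition: sample SD as a percentage of the sample mean."""
    return stdev(values) / mean(values) * 100

# Two hypothetical samples with very different means and SDs.
small = [9, 10, 11]          # mean 10, SD 1
large = [900, 1000, 1100]    # mean 1000, SD 100

cv_small = coefficient_of_variation(small)
cv_large = coefficient_of_variation(large)

print(cv_small, cv_large)
```

Although the SDs differ a hundredfold, the relative variability of the two samples is the same.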
  • Degrees of Freedom (d.f.) often = n-1 when the sample variance is calculated
  • Range is the difference between the smallest and largest value observed
  • SD
    Square root of the variance
  • Measures of variability for the median
    • Range
    • Interquartile range
  • Interquartile Range is the range of values in the 2nd and 3rd quartiles (middle 50% of the values)
  • Standard error (SE) is a measure of the precision or uncertainty around the estimate of the population mean
  • 95% Confidence limits of the Mean are a measure of the precision of the estimate of the mean
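
A rough sketch of 95% confidence limits as mean ± (critical value × se). It assumes a large-sample normal approximation (critical value about 1.96); for small samples a t value with n − 1 degrees of freedom would normally be used instead. The data and the function name are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_limits_95(values):
    """Approximate 95% confidence limits of the mean (normal approximation)."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    z = NormalDist().inv_cdf(0.975)  # about 1.96, leaving 2.5% in each tail
    return m - z * se, m + z * se

# Hypothetical observations, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

lower, upper = confidence_limits_95(data)
print(lower, upper)
```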
  • Interquartile range calculation

    1. Lowest 25% of the values = 1st quartile
    2. Next 25% = 2nd quartile
    3. Next 25% = 3rd quartile
    4. Last 25% = 4th quartile
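
The quartile steps above can be sketched with `statistics.quantiles` from Python's standard library, which returns the three cut points between the four quartiles. The data are hypothetical, and different software uses slightly different quartile conventions, so results may not match other tools exactly.

```python
from statistics import median, quantiles

# Hypothetical observations (unordered; quantiles() sorts them internally).
data = [7, 1, 3, 9, 5, 11, 2, 4, 6, 8, 10]

# n=4 splits the ordered data into four groups and returns three cut points:
# q1 = end of 1st quartile, q2 = median, q3 = end of 3rd quartile.
q1, q2, q3 = quantiles(data, n=4)

# Interquartile range: spread of the middle 50% of the values.
iqr = q3 - q1

print(q1, q2, q3, iqr)
```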
  • If a firm increases advertising, its demand curve shifts right, increasing the equilibrium price and quantity
  • Frequency or number of each observed value/category in a dataset is used to estimate the probability of events happening
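
A minimal sketch of turning observed frequencies into estimated probabilities (relative frequencies). The observations are hypothetical category labels.

```python
from collections import Counter

# Hypothetical observed categories, for illustration only.
observations = ["A", "B", "A", "O", "A", "B", "O", "A"]

# Frequency: how many times each category was observed.
counts = Counter(observations)
total = sum(counts.values())

# Relative frequency = count / total, used as an estimate of probability.
estimated_probability = {category: count / total for category, count in counts.items()}

print(counts)
print(estimated_probability)
```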
  • Mean, Median & Mode are similar in value, or approach the same value, in large datasets with a symmetrical (e.g. normal) distribution
  • If you add up marginal utility for each unit, you get total utility
  • Marginal utility
    Additional utility (satisfaction) gained from the consumption of an additional unit of a product
  • Median
    Middle-most value & forms the boundary between 2nd and 3rd quartiles
  • Frequency distributions of ratio/interval data often take the shape of a normal distribution, approximating to a symmetrical and bell-shaped curve
  • Data presentation: Boxplots