1. Fundamentals of statistical testing

Cards (20)

  • Mean
    Sum of all the values in a set, divided by the number of values
  • Standard deviation (SD)
    Spread of the data around the mean. Roughly the typical distance of a value from the mean (the square root of the average squared deviation)
    x = a value from the data
    x̄ (x with a bar) = the mean
    n = number of values
    Σ (capital sigma) = sum over all values (of the calculation done on each value)
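A minimal sketch of how the symbols above fit together, using made-up data. It assumes the sample version of the formula with n - 1 in the denominator, which is what R's built-in `sd()` uses; the card itself does not say whether n or n - 1 is intended.

```r
# Hypothetical data, chosen only for illustration
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

n <- length(x)                 # n = number of values
x_bar <- sum(x) / n            # mean: sum of all values divided by n

# Sample SD (assumption: n - 1 denominator, matching sd())
sd_manual <- sqrt(sum((x - x_bar)^2) / (n - 1))

print(x_bar)                   # 5
print(sd_manual)               # about 2.14
print(all.equal(sd_manual, sd(x)))
```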
  • Greek, Latin and hat symbols
    Greek = population parameters (e.g. μ, σ)
    Latin = sample statistics (e.g. x̄, s)
    Hat = estimates of population parameters (e.g. μ̂)
  • Known distributions
    Some distribution shapes are 'algebraically tractable', i.e. there is a mathematical formula that draws the curve. Y axis = density (worked out from that formula)
  • Normal distribution
    Continuous, unimodal (only one peak, i.e. one mode), symmetrical and bell-shaped. It has fixed proportions and is a function of two parameters (mean and SD). Note that not every symmetrical, bell-shaped distribution is a normal distribution
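For reference, the standard density formula for the normal distribution, written with the Greek population symbols μ (mean) and σ (SD). It is not given on the card, but it shows in what sense the curve is "a function of two parameters":

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)
```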
  • Chi-square distribution
  • t distribution
  • Beta distribution
  • Uniform distribution
  • Area below the normal curve
    approx. 68% is within ±1 SD of the mean
    approx. 95% is within ±1.96 SD of the mean
    approx. 99% is within ±2.58 SD of the mean
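These proportions can be checked with `pnorm()` on the standard normal distribution. A small sketch; `within_sd` is a helper name made up for this example:

```r
# Proportion of a normal distribution lying within +/- k SDs of the mean
# (within_sd is a hypothetical helper, not a built-in function)
within_sd <- function(k) pnorm(k) - pnorm(-k)

proportions <- within_sd(c(1, 1.96, 2.58))
print(round(proportions, 4))   # 0.6827 0.9500 0.9901
```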
  • Proportions to probability

    Proportions are always the same in a normal distribution. If we know something is normally distributed, we can work out the probability of observing a value in any given range
  • Working out proportions - standardisation
    Transforming any distribution to one with mean = 0 and SD = 1, aka transforming values into z-scores. Do this by subtracting the mean from each score and dividing by the standard deviation
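A short sketch of standardisation in R. The mean of 127 and SD of 40 are borrowed from the Charlie example elsewhere in this deck; the scores other than 57 are made up for illustration.

```r
# Scores on the original scale (57 is Charlie's score; the rest are hypothetical)
scores <- c(57, 87, 127, 167, 207)

# Assumed known population mean and SD (from the Charlie example)
pop_mean <- 127
pop_sd <- 40

# z = (score - mean) / SD
z <- (scores - pop_mean) / pop_sd
print(z)                       # -1.75 -1.00  0.00  1.00  2.00

# Standardising with the sample's own mean and SD always gives mean 0, SD 1
z_sample <- (scores - mean(scores)) / sd(scores)
print(c(mean(z_sample), sd(z_sample)))
```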
  • Working out proportions - probability of z-score
    Use a z-table or R to find the probability associated with a score, e.g. a z-score of -1.75 has a lower-tail probability of around 4%. So about 4% of scores fall below it, and about 96% fall above
  • Using R to work out proportions (e.g. Charlie's social events)
    Using original scores and distribution properties:
    charlie_events <- 57
    # lower.tail = FALSE gives the proportion ABOVE the score (about 0.96);
    # use lower.tail = TRUE for the proportion below (about 0.04)
    pnorm(charlie_events, mean = 127, sd = 40, lower.tail = FALSE)
    Using z-scores and the standard normal distribution:
    charlie_z <- -1.75
    pnorm(charlie_z, mean = 0, sd = 1, lower.tail = FALSE)
  • Critical value
    A value that cuts off a specific proportion of a distribution, e.g. the top 5%
  • Working out critical values - Charlie social events e.g.

    Work backwards: find the z-score corresponding to a cumulative probability of 0.95, then transform the z-score back to the original scale (score = mean + z × SD)
    Using R:
    qnorm(p = 0.95, mean = 127, sd = 40)
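The two-step "work backwards" route and the direct `qnorm()` call should agree. A sketch using the same mean (127) and SD (40) as the card; the variable names are made up for this example:

```r
# Step 1: z-score that cuts off the top 5% of the standard normal
z_crit <- qnorm(0.95)                      # about 1.645

# Step 2: transform back to the original scale: score = mean + z * SD
x_crit <- 127 + z_crit * 40                # about 192.8

# Direct route, as on the card
x_direct <- qnorm(p = 0.95, mean = 127, sd = 40)

print(c(z_crit, x_crit, x_direct))
print(all.equal(x_crit, x_direct))
```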
  • Sampling from distributions
    Collecting data on a variable = randomly sampling from its distribution. Many variables come from a normal distribution, but some come from other types, e.g.
    Reaction times - log-normal distribution
    Annual casualties due to horse kicks - Poisson distribution (counts only, i.e. whole numbers; can't have 0.4 of a death)
    Passes/fails on exam - binomial distribution
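Random sampling from each of these distributions can be sketched with R's built-in generators. All parameter values below (other than the mean 127 and SD 40 reused from the Charlie example) are hypothetical and chosen only so the code runs; they are not estimates from real data.

```r
set.seed(1)   # arbitrary seed so the sketch is reproducible
n <- 1000

# Normal: mean and SD reused from the Charlie example
normal_sample <- rnorm(n, mean = 127, sd = 40)

# Log-normal, e.g. reaction times (hypothetical parameters); always positive
rt_sample <- rlnorm(n, meanlog = 6, sdlog = 0.3)

# Poisson, e.g. yearly horse-kick casualties (hypothetical rate); whole numbers only
kicks_sample <- rpois(n, lambda = 0.6)

# Binomial with size = 1, e.g. pass (1) / fail (0) (hypothetical pass probability)
pass_sample <- rbinom(n, size = 1, prob = 0.7)

print(head(kicks_sample))
print(table(pass_sample))
```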
  • Sampling more people
    Samples from the same population will differ from each other. If we took many samples and calculated the mean each time, those means would have their own distribution.
  • Sampling distribution (of the mean)

    Distribution of the means of many samples of a particular size. It is approximately normal (for large enough samples) and centred on the true population mean.
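A simulation sketch of a sampling distribution of the mean. It assumes a normal population with mean 127 and SD 40 (the Charlie example) and an arbitrary sample size of 30; the number of repetitions is also arbitrary.

```r
set.seed(42)  # arbitrary seed

# Draw 10,000 samples of size 30 and keep each sample's mean
sample_means <- replicate(10000, mean(rnorm(30, mean = 127, sd = 40)))

print(mean(sample_means))   # close to the population mean, 127
print(sd(sample_means))     # close to 40 / sqrt(30), the standard error
print(40 / sqrt(30))

# hist(sample_means) would show the roughly bell-shaped distribution
```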
  • Central Limit Theorem
    As n gets larger, the sampling distribution of the mean tends towards a normal distribution centred on the population mean, with spread shrinking as σ/√n, regardless of the shape of the population distribution
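One way to see the theorem at work is to sample from a clearly non-normal population and watch the sample means become more symmetric as n grows. A sketch assuming an exponential population (a choice made for this example, not from the card); `skewness` and `means_of_size` are helper names made up here.

```r
set.seed(42)  # arbitrary seed

# Simple skewness measure: 0 for a symmetric distribution (hypothetical helper)
skewness <- function(v) mean((v - mean(v))^3) / sd(v)^3

# Means of many samples of size n from a skewed (exponential) population
means_of_size <- function(n, reps = 5000) {
  replicate(reps, mean(rexp(n, rate = 1)))
}

skew_small <- skewness(means_of_size(2))    # small n: still clearly skewed
skew_large <- skewness(means_of_size(50))   # larger n: much closer to symmetric

print(c(n_2 = skew_small, n_50 = skew_large))
```

The skewness should drop noticeably between n = 2 and n = 50, which matches the claim that the sampling distribution approaches a normal shape as n increases.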