Cards (23)

    • What is the mean?
      Sum of all the values divided by the number of observations
    • What is the median?
      The value at the centre of the dataset, equal proportions above and below.
    • What is the mode?
      The most frequently occurring value in the dataset
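      A minimal sketch of the three measures above, using Python's standard `statistics` module. The sample values are made up purely for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(values)      # sum of the values / number of observations
median = statistics.median(values)  # middle value (average of the two middle values when n is even)
mode = statistics.mode(values)      # most frequently occurring value

print(mean, median, mode)
```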
    • What does unimodal mean?
      A distribution with only one peak, i.e. a single most common value (for example, a normal distribution)
    • What does bimodal mean?
      A bimodal distribution has two peaks (modes), which suggests the data may come from two different groups
    • What is a symmetric distribution?
      This is when the left and right sides are mirror images of each other around a central point (usually the mean or median)
    • What are the key characteristics of a symmetric distribution?
      The mean, median and mode are usually the same and lie in the middle. If you fold the distribution at its centre, both halves will match perfectly
    • What is a gaussian (normal) distribution?
      A bell-shaped probability distribution that is symmetric around its mean.
      It is symmetric - the left and right sides are mirror images
      It is a bell-shaped curve - the highest point is at the mean
      Probability decreases as you move away from the mean
      Mean = median = mode - all three measures of central tendency are the same
      It describes continuous data
      Observations are independent
      Values vary from result to result in an unpredictable manner, although some values are more likely than others
    • What is a skewed distribution?
      This is when the data are not distributed symmetrically around the mean. It can be either left or right skewed, where the tail is longer on the left or the right respectively
    • Negatively skewed
      The peak is more to the right and the longer tail is on the left. The mode is the largest, followed by the median and then the mean
    • Positively skewed
      The peak is more to the left and the longer tail is on the right. The mean is the largest, followed by the median and then the mode - the opposite of the negatively skewed graph
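      The ordering of mean, median and mode in skewed data can be checked on a small example. This is only a sketch: both samples are hypothetical, and the ordering is a typical pattern rather than a guarantee for every skewed dataset.

```python
import statistics

# Hypothetical positively skewed sample: most values are small, with a long right tail.
right_skewed = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

# Mirror the sample to get a negatively skewed one (long left tail).
left_skewed = [20 - v for v in right_skewed]

def summary(data):
    """Return (mean, median, mode) for a list of numbers."""
    return statistics.mean(data), statistics.median(data), statistics.mode(data)

pos_mean, pos_median, pos_mode = summary(right_skewed)
neg_mean, neg_median, neg_mode = summary(left_skewed)

print("positive skew:", pos_mean, pos_median, pos_mode)  # mean > median > mode
print("negative skew:", neg_mean, neg_median, neg_mode)  # mode > median > mean
```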
    • How do you calculate standard deviation?
      1. Calculate the mean of the data
      2. Subtract the mean from each data point (so you know how much each point deviates from the mean)
      3. Square the output of (step 2) for each data point
      4. Add all the outputs of (step 3) together
      5. Divide the output of (step 4) by n-1, where n is the number of data points
      6. Take the square root of the output of (step 5)
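      The six steps above can be sketched directly in Python. The function name `sample_std` and the data are assumptions made for illustration. Dividing by n-1 gives the sample standard deviation, which is what `statistics.stdev` from the standard library also computes.

```python
import math
import statistics

def sample_std(data):
    """Sample standard deviation, following the six steps one by one."""
    n = len(data)
    mean = sum(data) / n                    # step 1: mean of the data
    deviations = [x - mean for x in data]   # step 2: deviation of each point from the mean
    squared = [d ** 2 for d in deviations]  # step 3: square each deviation
    total = sum(squared)                    # step 4: add the squared deviations together
    variance = total / (n - 1)              # step 5: divide by n-1
    return math.sqrt(variance)              # step 6: take the square root

# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std(data), statistics.stdev(data))
```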
    • What is an estimator?
      A rule used to calculate an estimate of a population quantity from sample data
      (e.g. the assumption of a normal distribution may be used to estimate the statistical properties of a distribution from a sample of it)
    • What makes a good estimator?
      Unbiased - it does not yield systematic errors
      Consistent - the estimates should converge to the true value as the sample gets larger
      Doesn't need to be precise
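      As an illustration of bias, the sketch below compares two estimators of the variance on simulated data: one divides by n-1, the other by n. The population (a standard normal with true variance 1), the sample size and the number of repetitions are all assumptions chosen for the demonstration.

```python
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable

SAMPLE_SIZE = 5  # small samples make the bias easy to see
REPS = 5000      # number of simulated samples

unbiased_estimates = []
biased_estimates = []
for _ in range(REPS):
    sample = [random.gauss(0.0, 1.0) for _ in range(SAMPLE_SIZE)]
    unbiased_estimates.append(statistics.variance(sample))  # divides by n-1
    biased_estimates.append(statistics.pvariance(sample))   # divides by n

avg_unbiased = statistics.fmean(unbiased_estimates)
avg_biased = statistics.fmean(biased_estimates)

# The n-1 estimator averages out close to the true variance of 1;
# the n estimator is systematically too small.
print(avg_unbiased, avg_biased)
```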
    • What is the gaussian (normal) distribution determined by?
      The mean and standard deviation
    • What is the z-score?
      The number of standard deviations an observation is above or below the mean
    • How do you convert a value to a z-score?
      1. Subtract the mean from the value
      2. Divide the result by the standard deviation
      This allows the comparison of observations from different normal distributions
    • What is the equation to calculate z score?
      z = (x - μ) / σ, where x = the value you want to convert, μ = the mean of the dataset, σ = the standard deviation of the dataset
    • How do we interpret the z scores?
      z = 0 - the value is exactly at the mean
      z > 0 - the value is above the mean
      z < 0 - the value is below the mean
      |z| > 2 - the value is far away from the mean (unusual in a normal distribution)
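      A short sketch of the conversion and its interpretation. The function name `z_score` and both sets of exam figures are hypothetical. The example shows how z-scores let you compare values from distributions measured on different scales.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hypothetical exam A: mean 70, standard deviation 10.
z_a = z_score(85, 70, 10)

# Hypothetical exam B: mean 500, standard deviation 100.
z_b = z_score(620, 500, 100)

# Both scores are above their means (z > 0); the exam A score is
# further above its mean in standard deviation units.
print(z_a, z_b)
```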
    • What is a p value?
      How likely it is to get a result like this if H0 (the null hypothesis) is true
      If the null hypothesis is true, the p value gives the probability of obtaining a test statistic at least as extreme as the one obtained
      If p is smaller, the evidence to reject H0 is stronger
    • What is a confidence interval?
      A range of values that is likely to contain the true population parameter (e.g., the mean) with a certain level of confidence (e.g., 95%)
    • What is a two tailed test?
      A two-tailed test is a type of hypothesis test where you check for differences in both directions (greater than and less than). It is used when you want to determine whether a sample mean is significantly different (either higher or lower) from a population mean, rather than just greater or smaller.
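      A sketch of a two-tailed p value for a z test statistic, assuming the statistic follows a standard normal distribution when H0 is true. The function name `two_tailed_p` is hypothetical. It uses the identity P(|Z| >= |z|) = erfc(|z| / sqrt(2)).

```python
import math

def two_tailed_p(z):
    """Probability of a standard normal value at least as extreme as z, in either direction."""
    return math.erfc(abs(z) / math.sqrt(2))

# Both tails count, so z = 1.96 and z = -1.96 give the same p value (about 0.05).
p_pos = two_tailed_p(1.96)
p_neg = two_tailed_p(-1.96)
print(p_pos, p_neg)
```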
    • What does a 95% confidence interval mean?
      A 95% confidence interval means that if you were to repeat an experiment or study many times, about 95% of the calculated confidence intervals from those repetitions would contain the true population parameter (like the true mean or proportion).
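      The repeated-sampling interpretation above can be checked with a small simulation. This is a sketch under stated assumptions: the population is normal with a known standard deviation, so each interval is the sample mean plus or minus 1.96 standard errors. The population values, sample size and number of repetitions are made up for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the simulation is repeatable

TRUE_MEAN, TRUE_SD = 50.0, 10.0  # hypothetical population parameters
N, REPS = 30, 1000               # sample size and number of repeated studies
Z_95 = 1.96                      # z value that leaves 2.5% in each tail

half_width = Z_95 * TRUE_SD / math.sqrt(N)  # same width for every interval (known SD)

covered = 0
for _ in range(REPS):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    centre = statistics.fmean(sample)
    # Count the interval if it contains the true population mean.
    if centre - half_width <= TRUE_MEAN <= centre + half_width:
        covered += 1

coverage = covered / REPS
print(coverage)  # expected to be close to 0.95
```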