Statistical Inferences

Cards (80)

  • Aim
    To be able to draw conclusions about a population mean on the basis of estimates from a sample of the population
  • This session should take you about 40 to 100 minutes to complete
  • In this session we will focus on quantitative data and how to draw inferences about a population mean from sample data
  • In session BS06 we will extend this to compare two means. In sessions BS07 and BS08 we will apply the same principles and methods to proportions
  • In session BS02 we looked at how we can summarise a quantitative variable in terms of graphical displays and summary measures
  • When using summary measures, we describe the data in terms of an average value together with a measure of how spread out the data are
  • Example
    • If we measured the height of all PH students we could describe the height of the group by calculating their average height and a measure of variation of their heights
  • Number of students: 725
  • Mean height
    169.3 cm
  • Standard deviation
    9.2 cm
  • Median height
    168 cm
  • Interquartile range
    161 cm to 176 cm
  • Range
    149 cm to 194 cm
  • The median is more appropriate than the mean when the distribution is skewed
  • In this session we will focus on methods that apply to the mean
  • Means are good summary measures for data with a symmetrical distribution
  • If the distribution is skewed, the median is the better summary measure
  • Calculating confidence intervals and performing hypothesis tests for medians (known as 'non-parametric methods') is beyond the scope of this study module
  • A positively (right-) skewed distribution of positive values can often be made more symmetrical by using a logarithmic transformation
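As a rough illustration of this point, the Python sketch below draws simulated right-skewed (lognormal) values, which are an assumption for the example rather than data from this module, and compares a simple moment-based skewness before and after taking logs. The `skewness` helper is a hypothetical function written only for this sketch.

```python
import math
import random
import statistics


def skewness(values):
    """Moment skewness: mean cubed deviation divided by the (population) SD cubed."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)


random.seed(1)
# Hypothetical positively skewed data (lognormal), NOT the module's data
data = [random.lognormvariate(0.0, 0.8) for _ in range(5000)]
# Log-transform every value (all values are positive, so log is defined)
logged = [math.log(v) for v in data]

print(f"skewness before: {skewness(data):.2f}")
print(f"skewness after log: {skewness(logged):.2f}")
```

The skewness of the raw values is clearly positive, while that of the logged values is close to zero, i.e. roughly symmetrical.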
  • When the data have a symmetrical distribution, the sampling distribution of the mean is approximately Normal even for relatively small samples, so we can apply the properties of sampling distributions to them
  • If we took many different samples of the same size from a population, each would have a different mean and a different standard deviation
  • If we plotted these sample means, we would obtain a sampling distribution of means
  • Provided that the sample size is large enough, the distribution of sample means is approximately Normal, even if the distribution of the data in the population is not Normal
  • For ALL 725 PH students, the distribution of height had mean μ = 169.3 cm and standard deviation σ = 9.2 cm
  • If we took another 999 samples of size 150 from the same population of students and plotted their means, we would see the distribution of these sample means
  • If we sampled from the population of 725 students thousands of times, we would obtain a smoother sampling distribution
  • Such a distribution would be approximately Normal
  • The mean of the sampling distribution of means is the true population mean
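The sampling-distribution ideas above can be checked by simulation. The sketch below assumes a hypothetical right-skewed (exponential) population with mean 10 and standard deviation 10, draws 2,000 independent samples of size 150, and compares the mean and standard deviation of the sample means with μ and σ/√n. The population, the constants and the `draw_sample` helper are illustrative assumptions; draws are independent, as if sampling with replacement from a very large population.

```python
import math
import random
import statistics

random.seed(2)
POP_MEAN = 10.0    # hypothetical population mean
POP_SD = 10.0      # for an exponential distribution the SD equals the mean
n = 150            # sample size, as in the height example
n_samples = 2000   # number of repeated samples


def draw_sample(size):
    """One sample from a hypothetical right-skewed (exponential) population."""
    return [random.expovariate(1 / POP_MEAN) for _ in range(size)]


# The sampling distribution of the mean: one mean per repeated sample
sample_means = [statistics.fmean(draw_sample(n)) for _ in range(n_samples)]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (population mean {POP_MEAN})")
print(f"SD of sample means:   {sd_of_means:.3f} (sigma/sqrt(n) = {POP_SD / math.sqrt(n):.3f})")
```

Even though the population is skewed, a histogram of `sample_means` would look close to Normal, centred on the population mean.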
  • Standard error of the mean
    $\mathrm{SE}(\bar{x}) = \sigma / \sqrt{n}$
  • The standard error of the sample mean is the standard deviation of the sampling distribution of the mean; when σ is unknown, we estimate it by replacing σ with the sample standard deviation s
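Applying the formula σ/√n to the height figures above (σ = 9.2 cm, n = 150) gives the standard error for samples of 150 students. A minimal Python sketch, where `standard_error` is a hypothetical helper name:

```python
import math


def standard_error(sd, n):
    """Standard error of a sample mean: sd / sqrt(n)."""
    return sd / math.sqrt(n)


# Population SD of height (9.2 cm) and sample size (150) from the example
se = standard_error(9.2, 150)
print(f"SE = {se:.2f} cm")
```

This gives an SE of roughly 0.75 cm, much smaller than the 9.2 cm spread of individual heights.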
  • In practice we take only one sample
  • We can see where the estimate from the first sample of 150 students lies within the sampling distribution
  • Using the sample mean x̄ instead of the true mean μ, and the sample standard deviation s instead of the population standard deviation σ, we can estimate the sampling distribution
  • If we repeated this exercise for many samples, we would find that the 'estimated' sampling distributions from the different samples fluctuate around the true sampling distribution
  • 95% of sample means fall within 1.96 standard errors of the population mean
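The multiplier 1.96 comes from the standard Normal distribution: 95% of its area lies between −1.96 and +1.96. Python's standard-library `statistics.NormalDist` can be used to confirm this:

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# Upper 2.5% point of the standard Normal distribution
z = std_normal.inv_cdf(0.975)
# Area between -1.96 and +1.96
coverage = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

print(f"z = {z:.4f}")
print(f"P(-1.96 < Z < 1.96) = {coverage:.4f}")
```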
  • The sampling distribution of a mean is the distribution of sample means
  • Confidence interval
    An interval around the sample mean that we can be confident, at a stated level (e.g. 95%), contains the true population mean
  • A 95% confidence interval extends 1.96 standard errors on either side of the sample mean
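A minimal sketch of this calculation, assuming a simulated sample of 150 heights drawn from a Normal distribution with the population values above (not the module's real data). The hypothetical `confidence_interval_95` helper uses the sample standard deviation s in place of σ and the large-sample multiplier 1.96:

```python
import math
import random
import statistics


def confidence_interval_95(sample):
    """Approximate 95% CI for a mean: x_bar +/- 1.96 * s / sqrt(n)."""
    n = len(sample)
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)  # estimated SE using the sample SD s
    return x_bar - 1.96 * se, x_bar + 1.96 * se


random.seed(3)
# Simulated (hypothetical) sample of 150 heights in cm, NOT the module's data
heights = [random.gauss(169.3, 9.2) for _ in range(150)]
low, high = confidence_interval_95(heights)
print(f"mean = {statistics.fmean(heights):.1f} cm, 95% CI {low:.1f} to {high:.1f} cm")
```

With s close to 9.2 cm the interval is about 3 cm wide, i.e. roughly 1.5 cm either side of the sample mean.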
  • If we took thousands of samples and, for each sample, calculated the mean and its 95% confidence interval, we would expect 95% of these confidence intervals to include the population mean
  • We took another 100 samples of 150 from the population of PH students. For each sample the mean height and a 95% confidence interval were calculated
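The repeated-sampling interpretation of a confidence interval can be imitated in code. This sketch assumes a Normal population with μ = 169.3 cm and σ = 9.2 cm and draws each sample independently (rather than from the actual 725 students), then counts how many of 2,000 intervals of the form x̄ ± 1.96 s/√n contain μ:

```python
import math
import random
import statistics

TRUE_MEAN = 169.3  # population mean height from the example (cm)
TRUE_SD = 9.2      # population SD (cm)
n = 150            # sample size
n_samples = 2000   # number of repeated samples (an assumption for the sketch)

random.seed(4)
covered = 0
for _ in range(n_samples):
    sample = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(n)]
    x_bar = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    # Does this sample's 95% CI include the population mean?
    if x_bar - 1.96 * se <= TRUE_MEAN <= x_bar + 1.96 * se:
        covered += 1

coverage = covered / n_samples
print(f"{covered} of {n_samples} intervals contain the population mean ({coverage:.1%})")
```

The proportion printed should be close to, though not exactly, 95%; it varies with the random seed and the number of samples.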