To be able to draw conclusions about a population mean on the basis of estimates from a sample of the population
This session should take you about 40 to 100 minutes to complete
In this session we will focus on quantitative data and how to draw inferences about a population mean from sample data
In session BS06 we will extend this to compare two means. In sessions BS07 and BS08 we will apply the same principles and methods to proportions
In session BS02 we looked at how we can summarise a quantitative variable in terms of graphical displays and summary measures
In using summary measures we describe the data in terms of an average value together with a measure of how spread out the data are
Example
If we measured the height of all PH students we could describe the height of the group by calculating their average height and a measure of variation of their heights
Number of students: 725
Mean height: 169.3 cm
Standard deviation: 9.2 cm
Median height: 168 cm
Interquartile range: 161 cm to 176 cm
Range: 149 cm to 194 cm
The median is more appropriate than the mean when the distribution is skewed
In this session we will focus on methods that apply to the mean
Means are good summary measures for data with a symmetrical distribution
If the distribution is skewed then a median is the better summary measure
Calculating confidence intervals and performing hypothesis tests for medians (known as 'non-parametric methods') is beyond the scope of this study module
We can often make a positively skewed distribution more symmetrical by using a logarithmic transformation
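The effect of a logarithmic transformation can be sketched with simulated data. The values below are hypothetical (drawn from a log-normal distribution, which is an assumption and not data from this session), and `skewness` is a helper written for this illustration. A skewness near 0 indicates a roughly symmetrical distribution.

```python
import math
import random
import statistics

def skewness(values):
    """Skewness: mean cubed deviation divided by the cubed standard deviation."""
    m = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * sd ** 3)

rng = random.Random(1)

# Hypothetical right-skewed data (assumption: log-normal, not session data)
raw = [rng.lognormvariate(0.0, 1.0) for _ in range(5000)]

# Apply the logarithmic transformation to every value
logged = [math.log(v) for v in raw]

skew_raw = skewness(raw)
skew_logged = skewness(logged)
print(f"skewness before: {skew_raw:.2f}, after log: {skew_logged:.2f}")
```

The transformation only works for positive values, and it helps when the long tail is on the right.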
We can apply the properties of sampling distributions to relatively small samples when the distribution is symmetrical
If we took many different samples of the same size from a population each would have a different mean and a different standard deviation
If we plotted these sample means, we would obtain a sampling distribution of means
Provided that the sample size is large enough, the distribution of sample means is approximately Normal, even if the distribution of the data in the population is not Normal
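A small simulation can illustrate this. It is a sketch under stated assumptions: the skewed population is a hypothetical exponential distribution with mean 1, not data from this session. In a skewed distribution the mean and median differ noticeably; in a roughly Normal (symmetrical) distribution they nearly coincide.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical skewed population (assumption: exponential with mean 1)
population_draws = [rng.expovariate(1.0) for _ in range(10000)]
pop_mean = statistics.fmean(population_draws)
pop_median = statistics.median(population_draws)

# Means of 2000 samples, each of size 150, from the same skewed distribution
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(150)])
    for _ in range(2000)
]
means_mean = statistics.fmean(sample_means)
means_median = statistics.median(sample_means)

print(f"population:   mean {pop_mean:.3f}, median {pop_median:.3f}")
print(f"sample means: mean {means_mean:.3f}, median {means_median:.3f}")
```

The gap between mean and median is large for the individual values but almost disappears for the sample means, which is what we expect if the sample means are approximately Normal.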
For the distribution of height for ALL 725 PH students, the mean was: μ = 169.3 cm and the standard deviation was: σ = 9.2 cm
If we took another 999 samples of size 150 from the same population of students, we would see this distribution of their means
If we sampled from the population of 725 students thousands of times we would obtain a smoother sampling distribution
Such a distribution would be approximately Normal
The mean of the sampling distribution of means is the true population mean
Standard error: SE(x̄) = σ/√n
The standard error of the sample mean is the estimated standard deviation of the sampling distribution
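As a worked example, the standard error for samples of 150 students can be calculated from σ = 9.2 cm and checked against a simulation. This is a sketch only: the simulated heights are drawn from a Normal distribution as a stand-in population (an assumption), and `standard_error` is a helper named for this illustration.

```python
import math
import random
import statistics

def standard_error(sd, n):
    """Standard error of a sample mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

MU, SIGMA, N = 169.3, 9.2, 150   # population mean, SD and sample size from the session
se = standard_error(SIGMA, N)    # 9.2 / sqrt(150), roughly 0.75 cm

# Assumption: a Normal distribution stands in for the population of heights
rng = random.Random(7)
sample_means = [
    statistics.fmean([rng.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(2000)
]
mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"standard error from the formula: {se:.2f} cm")
print(f"SD of 2000 simulated sample means: {sd_of_means:.2f} cm")
print(f"mean of the simulated sample means: {mean_of_means:.1f} cm")
```

The standard deviation of the simulated sample means should be close to the value given by the formula, and their mean should be close to the population mean.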
In practice we take only one sample
We can see where the estimate from the first sample of 150 students lies within the sampling distribution
Using the sample mean x̄ instead of the true mean μ, and the sample standard deviation s instead of the population standard deviation σ, we infer what the sampling distribution is
If we repeated this exercise for many samples we would find that the 'estimated' sampling distributions found from all the different samples fluctuate around the true sampling distribution
95% of the sample means fall within 1.96 standard errors of the population mean
The sampling distribution of a mean is the distribution of sample means
Confidence interval
An interval around the estimated mean which, with a stated level of confidence, contains the true population mean
A 95% confidence interval extends 1.96 SE either side of the sample mean
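The calculation can be sketched as a short function. The sample values used here (mean 168.9 cm, standard deviation 9.0 cm, n = 150) are hypothetical and chosen only for illustration, and `confidence_interval_95` is a name invented for this example.

```python
import math

def confidence_interval_95(sample_mean, sample_sd, n):
    """Return (lower, upper): the sample mean plus or minus 1.96 standard errors."""
    se = sample_sd / math.sqrt(n)   # estimated standard error of the mean
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin

# Hypothetical sample summary (illustrative values, not taken from the session)
lower, upper = confidence_interval_95(sample_mean=168.9, sample_sd=9.0, n=150)
print(f"95% CI: {lower:.1f} cm to {upper:.1f} cm")
```

Because the standard error has √n in the denominator, a larger sample gives a narrower interval for the same standard deviation.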
If we took thousands of samples, and for each sample calculated the mean and associated 95% confidence interval, we would expect 95% of these confidence intervals to include the population mean
We took another 100 samples of 150 from the population of PH students. For each sample the mean height and a 95% confidence interval were calculated
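An exercise of this kind can be imitated with a simulation. This is a sketch under assumptions: heights are drawn from a Normal distribution with μ = 169.3 cm and σ = 9.2 cm as a stand-in for the student population, so the count it prints will not match the session's own samples. `sample_ci` is a helper written for this illustration.

```python
import math
import random
import statistics

MU, SIGMA, N = 169.3, 9.2, 150   # population mean, SD and sample size from the session
rng = random.Random(2024)

def sample_ci(rng):
    """Draw one sample of size N and return its 95% confidence interval."""
    # Assumption: a Normal distribution stands in for the population of heights
    sample = [rng.gauss(MU, SIGMA) for _ in range(N)]
    mean = statistics.fmean(sample)
    se = statistics.stdev(sample) / math.sqrt(N)   # uses the sample SD, as in practice
    return mean - 1.96 * se, mean + 1.96 * se

# 100 samples, each with its own mean and confidence interval
intervals = [sample_ci(rng) for _ in range(100)]

# Count how many of the intervals contain the true population mean
covered = sum(lower <= MU <= upper for lower, upper in intervals)
print(f"{covered} of 100 intervals contain the population mean")
```

With only 100 samples the count will vary from run to run, but we would expect it to be close to 95.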