a census observes or measures every member of a population
a sample observes or measures a subset of a population which has been selected to represent the whole population
advantages of a census:
should give a completely accurate result
disadvantages of a census:
time consuming
expensive
cannot be used if the testing process destroys the item
hard to process large amounts of data
advantages of a sample:
less time consuming
less expensive
fewer people have to respond
easier to process smaller amounts of data
disadvantages of a sample:
result may be less accurate than a census
sample may be too small to accurately represent sub-groups of the population
the size of the sample affects the result: the larger the sample, the more accurate the result tends to be, but a larger sample requires more resources
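a rough simulation of the effect of sample size, assuming a made-up population of 10,000 values (not real data); the function name and the sample sizes 5 and 100 are only for illustration, and the spread of the sample mean over repeated samples is used as a stand-in for accuracy:

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the example is repeatable
# hypothetical population, generated purely for illustration
population = [rng.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=200):
    """Standard deviation of the sample mean over many repeated samples."""
    means = [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

small = spread_of_sample_means(5)     # means of small samples vary a lot
large = spread_of_sample_means(100)   # means of large samples vary much less
print(small, large)
```

the second number printed should be noticeably smaller than the first, which is the sense in which a larger sample is more accurate.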
a sampling frame is a list of individually named or numbered sampling units
random sampling means every member of the population has an equal chance of being selected; this helps to make the sample representative and to remove bias
simple random sampling:
requires a sampling frame; each unit is given a number, and numbers are then randomly generated by a calculator, computer, random number table or lottery
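a minimal sketch of simple random sampling in python, assuming a hypothetical sampling frame of 50 numbered units; random.sample picks without replacement, so each unit has an equal chance of selection (the function name and seed are illustrative only):

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Pick n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)  # the seed only makes the example repeatable
    return rng.sample(sampling_frame, n)

# hypothetical sampling frame: units numbered 1 to 50
frame = list(range(1, 51))
sample = simple_random_sample(frame, 5, seed=1)
print(sample)
```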
the three types of random sampling are simple random, systematic, and stratified
systematic sampling:
requires an ordered list; items are chosen at regular intervals from a random starting place
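a sketch of systematic sampling, assuming a hypothetical ordered list of 100 units and a sample of 10; the interval is taken as population size divided by sample size (rounded down), and the starting place is chosen at random from the first interval:

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Take every k-th item from a random start, where k = len(list) // n."""
    k = len(ordered_list) // n       # sampling interval (assumes n <= len(list))
    rng = random.Random(seed)
    start = rng.randrange(k)         # random starting place within the first k items
    return ordered_list[start::k][:n]

frame = list(range(1, 101))          # hypothetical ordered list of 100 units
sample = systematic_sample(frame, 10, seed=2)
print(sample)
```

every pair of neighbouring items in the printed sample is exactly 10 apart, which is why a pattern in the ordering of the list can introduce bias.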
stratified sampling:
requires the population to be divided into mutually exclusive strata; a random sample is taken from each stratum in proportion to its size
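a sketch of stratified sampling, assuming the usual rule that the number sampled from a stratum = stratum size / population size x sample size; the strata names and sizes below are made up, and rounding means the total can differ slightly from the intended sample size in less tidy cases:

```python
import random

def stratified_sample(strata, n, seed=None):
    """strata maps each stratum name to the list of its members."""
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # number sampled = stratum size / population size * sample size
        k = round(len(members) * n / population_size)
        sample[name] = rng.sample(members, k)  # simple random sample within the stratum
    return sample

# hypothetical population of 100 in three mutually exclusive strata
strata = {
    "year 12": [f"y12-{i}" for i in range(50)],
    "year 13": [f"y13-{i}" for i in range(30)],
    "staff": [f"staff-{i}" for i in range(20)],
}
sample = stratified_sample(strata, 10, seed=3)
print({name: len(chosen) for name, chosen in sample.items()})
# {'year 12': 5, 'year 13': 3, 'staff': 2}
```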
advantages of simple random sampling:
removes bias
easy and cheap for small populations and samples
each sampling unit has an equal chance of selection
disadvantages of simple random sampling:
not suitable for large populations and samples
requires a sampling frame
advantages of systematic sampling:
simple and quick to use
suitable for large populations and samples
disadvantages of systematic sampling:
requires a sampling frame
can introduce bias if sampling frame is not random
advantages of stratified sampling:
accurately reflects the population structure
proportional representation of groups
disadvantages of stratified sampling:
requires population to be divided into strata
selection within strata has the same issues as simple random sampling
the two types of non-random sampling are quota and opportunity
quota sampling:
population is divided into groups based on a characteristic; the researcher meets people, assesses which group they belong to and interviews them, until the quota for each group has been filled
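a sketch of quota sampling, assuming a hypothetical list of people in the order the researcher meets them and a made-up quota of 2 per age group; nothing here is random, the sample depends entirely on who turns up first:

```python
def quota_sample(people_met, quotas, group_of):
    """Fill each group's quota from people in the order they are met."""
    filled = {group: [] for group in quotas}
    for person in people_met:
        group = group_of(person)               # researcher assesses the group
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)       # interview this person
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break                              # every quota is full, so stop
    return filled

# hypothetical (name, age) pairs, in the order they are met
people = [("ann", 15), ("bob", 42), ("cat", 17),
          ("dan", 16), ("eve", 50), ("fay", 33)]
quotas = {"under 18": 2, "18 and over": 2}

def age_group(person):
    return "under 18" if person[1] < 18 else "18 and over"

result = quota_sample(people, quotas, age_group)
print(result)
```

dan is skipped because the under 18 quota is already full, and fay is never reached because both quotas are full after eve.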
opportunity sampling:
sample is taken from the first required number of people who are available and fit the criteria
advantages of quota sampling:
small sample can be representative
no sampling frame required
quick, easy and cheap
different groups can be easily compared
disadvantages of quota sampling:
non-random so can introduce bias
population must be divided into groups, which requires time and money
non-responses are ignored instead of recorded
advantages of opportunity sampling:
easy
inexpensive
disadvantages of opportunity sampling:
unlikely to be representative
dependent on individual researcher
quantitative data means data associated with numerical observations
qualitative data means data associated with non-numerical observations
a continuous variable is a variable which can take any value in a given range
a discrete variable is a variable which can only take specific values in a given range
the large data set has the following variables for UK weather stations:
daily mean temperature in degrees celsius
daily total rainfall in mm
daily total sunshine in hours
daily mean wind direction
daily mean wind speed in knots
daily maximum gust in knots
daily maximum relative humidity as a percentage
daily mean cloud cover in oktas
daily mean visibility in decametres
daily mean pressure in hectopascals
the large data set has the following variables for non-UK weather stations:
daily mean temperature in degrees celsius
daily total rainfall in mm
daily mean pressure in hectopascals
daily mean wind speed in knots
for daily total rainfall, the data is sometimes recorded as tr, which means trace: there was less than 0.05mm of rainfall
for daily maximum relative humidity, a value above 95% means misty and foggy conditions
the UK weather stations in the large data set are:
leuchars in scotland
leeming in yorkshire
heathrow in london
hurn in dorset
camborne in cornwall
the data in the large data set is for the months may to october of 1987, and may to october of 2015
in the large data set, n/a means not available so the data is missing
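a sketch of cleaning daily total rainfall entries before doing any calculations, assuming the values arrive as text; treating tr as 0 is an assumption made here for illustration (the true value is somewhere below 0.05mm), and the entries in the list are made up, not real readings:

```python
def parse_rainfall(value):
    """Convert a daily total rainfall entry to mm, or None if it is missing."""
    text = str(value).strip().lower()
    if text == "n/a":
        return None      # not available: the reading is missing
    if text == "tr":
        return 0.0       # trace (less than 0.05mm); using 0 is an assumption
    return float(text)

raw = ["1.2", "tr", "n/a", "0", "14.6"]   # made-up entries
cleaned = [parse_rainfall(v) for v in raw]
print(cleaned)
# [1.2, 0.0, None, 0.0, 14.6]
```

missing values are kept as None rather than 0 so that they can be left out of a mean instead of dragging it down.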
the variables which are not suitable for a normal distribution are: