yr1 chap 1 data collection

  • a census observes or measures every member of a population
  • a sample observes or measures a subset of a population which has been selected to represent the whole population
  • advantages of a census:
    • completely accurate result
  • disadvantages of a census:
    • time consuming
    • expensive
    • cannot be used if the testing process destroys the item
    • hard to process large amounts of data
  • advantages of a sample:
    • less time consuming
    • less expensive
    • fewer people have to respond
    • easier to process smaller amounts of data
  • disadvantages of a sample:
    • results may be less accurate
    • sample may be too small to accurately represent sub-groups of the population
  • the size of the sample affects the result: the larger the sample, the more accurate the result is likely to be, but a larger sample requires more resources
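A small simulation can illustrate this trade-off. Sample means from larger samples tend to sit closer to the population mean, so their spread across repeated samples is smaller. The population below is randomly generated and purely hypothetical (it is not taken from the large data set), and the function name `spread_of_sample_means` is made up for this sketch.

```python
import random
import statistics

# Hypothetical population: 10,000 randomly generated values (not real data)
rng = random.Random(1)
population = [rng.gauss(15.0, 4.0) for _ in range(10_000)]

def spread_of_sample_means(sample_size, repeats=500):
    """Standard deviation of the sample mean over repeated simple random samples."""
    means = [statistics.mean(rng.sample(population, sample_size)) for _ in range(repeats)]
    return statistics.stdev(means)

# The spread shrinks as the sample size grows
for n in (5, 50, 500):
    print(n, round(spread_of_sample_means(n), 3))
```

Larger samples give less variable (more reliable) estimates, but each one costs more to collect.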
  • a sampling frame is a list of individually named or numbered sampling units
  • random sampling means every member of the population has an equal chance of being selected; this should make the sample representative and helps to remove bias
  • simple random sampling:
    requires a sampling frame; each sampling unit is given a unique number, and the numbers to be sampled are chosen at random using a calculator, computer, random number table or lottery
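As a sketch of simple random sampling, assuming sampling without replacement: number every unit in the sampling frame, then let the computer pick the numbers. The frame of 100 units and the names `sampling_frame` and `simple_random_sample` are hypothetical.

```python
import random

# Hypothetical sampling frame: 100 units, each given a unique number from 1 to 100
sampling_frame = {i: f"unit_{i}" for i in range(1, 101)}

def simple_random_sample(frame, n, seed=None):
    """Choose n distinct numbers at random (without replacement) and return their units."""
    rng = random.Random(seed)
    chosen_numbers = rng.sample(sorted(frame), n)
    return [frame[k] for k in chosen_numbers]

print(simple_random_sample(sampling_frame, 10, seed=42))
```

Every unit has the same chance of being picked, which is what makes the method random.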
  • the three types of random sampling are simple random, systematic, and stratified
  • systematic sampling:
    requires an ordered list; items are chosen at regular intervals from a random starting point
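A minimal sketch of systematic sampling, assuming the interval is the population size divided by the sample size (rounded down). For a sample of 20 from 100 items the interval is 100 ÷ 20 = 5, so every 5th item is taken after a random start among the first 5. The function name and the numbered list are hypothetical.

```python
import random

def systematic_sample(ordered_list, n, seed=None):
    """Choose n items at regular intervals from an ordered list, from a random starting point."""
    if not 0 < n <= len(ordered_list):
        raise ValueError("sample size must be between 1 and the population size")
    k = len(ordered_list) // n          # sampling interval, e.g. 100 // 20 = 5
    rng = random.Random(seed)
    start = rng.randrange(k)            # random start among the first k items (index 0 to k-1)
    return ordered_list[start::k][:n]   # every k-th item from the start, trimmed to n

# Hypothetical ordered list of 100 items numbered 1 to 100
population = list(range(1, 101))
print(systematic_sample(population, 20, seed=3))
```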
  • stratified sampling:
    requires the population to be divided into mutually exclusive strata; a random sample is taken from each stratum, with the number sampled from each stratum proportional to the size of that stratum
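The size of each stratum's sample follows from

number sampled in stratum = (number in stratum ÷ number in population) × overall sample size

The sketch below applies this with hypothetical strata (60 Year 12 and 40 Year 13 students) and then takes a simple random sample inside each stratum. Rounding with Python's `round()` is an assumption: it rounds exact halves to the nearest even number, and rounded counts may not always add up to the required sample size, in which case they would need adjusting by hand.

```python
import random

def stratum_sizes(strata, n):
    """Proportional allocation: each stratum's share of the sample matches its share of the population."""
    total = sum(len(members) for members in strata.values())
    return {name: round(len(members) * n / total) for name, members in strata.items()}

def stratified_sample(strata, n, seed=None):
    """Simple random sample within each stratum, using the proportional sizes."""
    rng = random.Random(seed)
    sizes = stratum_sizes(strata, n)
    return {name: rng.sample(members, sizes[name]) for name, members in strata.items()}

# Hypothetical population split into two mutually exclusive strata
strata = {
    "year12": [f"Y12_{i}" for i in range(60)],
    "year13": [f"Y13_{i}" for i in range(40)],
}
sample = stratified_sample(strata, 10, seed=7)
print({name: len(chosen) for name, chosen in sample.items()})  # {'year12': 6, 'year13': 4}
```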
  • advantages of simple random sampling:
    • removes bias
    • easy and cheap for small populations and samples
    • each sampling unit has an equal chance of selection
  • disadvantages of simple random sampling:
    • not suitable for large populations and samples
    • requires a sampling frame
  • advantages of systematic sampling:
    • simple and quick to use
    • suitable for large populations and samples
  • disadvantages of systematic sampling:
    • requires a sampling frame
    • can introduce bias if sampling frame is not random
  • advantages of stratified sampling:
    • accurately reflects the population structure
    • proportional representation of groups
  • disadvantages of stratified sampling:
    • requires population to be divided into strata
    • selection within strata has the same issues as simple random sampling
  • the two types of non-random sampling are quota and opportunity
  • quota sampling:
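    population is divided into groups based on a characteristic, and the size of each group sets its quota in the sample; the researcher meets people, assesses their group and interviews them if that group's quota is not yet full, continuing until all quotas have been filled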
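The quota process can be sketched as a loop over people in the order the researcher meets them. This is a simplified model under stated assumptions: the people, their groups and the quotas are hypothetical, refusals are not modelled, and anyone whose quota is already full is simply skipped rather than recorded.

```python
def quota_sample(people, group_of, quotas):
    """Interview people in the order they are met until every group's quota is full.

    people   -- people in the order the researcher meets them (hypothetical)
    group_of -- function returning the group a person belongs to
    quotas   -- required number of people from each group
    """
    filled = {group: [] for group in quotas}
    for person in people:
        group = group_of(person)
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)   # interview and allocate to the quota
        # otherwise the person is ignored, not recorded
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break                          # stop once every quota is full
    return filled

people = [("Ann", "F"), ("Ben", "M"), ("Cal", "M"), ("Dee", "F"), ("Eve", "F")]
result = quota_sample(people, lambda p: p[1], {"F": 2, "M": 1})
print({g: [p[0] for p in chosen] for g, chosen in result.items()})  # {'F': ['Ann', 'Dee'], 'M': ['Ben']}
```

Cal is skipped because the "M" quota is already full, which shows how non-random selection can creep in.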
  • opportunity sampling:
    sample is taken from the first required number of people who are available and fit the criteria
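Opportunity sampling amounts to taking the first n available people who fit the criteria. A minimal sketch, where the passers-by and the "aged 18 or over" criterion are hypothetical:

```python
from itertools import islice

def opportunity_sample(available_people, fits_criteria, n):
    """Take the first n available people who fit the criteria, in the order they are available."""
    return list(islice((p for p in available_people if fits_criteria(p)), n))

# Hypothetical people passing a researcher, as (name, age)
passers_by = [("Ann", 17), ("Ben", 34), ("Cal", 15), ("Dee", 52), ("Eve", 29)]
adults = opportunity_sample(passers_by, lambda p: p[1] >= 18, 2)
print(adults)  # [('Ben', 34), ('Dee', 52)]
```

The result depends entirely on who happens to be available, which is why this method is unlikely to be representative.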
  • advantages of quota sampling:
    • small sample can be representative
    • no sampling frame required
    • quick, easy and cheap
    • different groups can be easily compared
  • disadvantages of quota sampling:
    • non-random so can introduce bias
    • population must be divided into groups, requires time and money
    • non-responses are ignored instead of recorded
  • advantages of opportunity sampling:
    • easy
    • inexpensive
  • disadvantages of opportunity sampling:
    • unlikely to be representative
    • dependent on individual researcher
  • quantitative data means data associated with numerical observations
  • qualitative data means data associated with non-numerical observations
  • a continuous variable is a variable which can take any value in a given range
  • a discrete variable is a variable which can only take specific values in a given range
  • the large data set has the following variables for UK weather stations:
    • daily mean temperature in degrees Celsius
    • daily total rainfall in mm
    • daily total sunshine in hours
    • daily mean wind direction
    • daily mean wind speed in knots
    • daily maximum gust in knots
    • daily maximum relative humidity as a percentage
    • daily mean cloud cover in oktas
    • daily mean visibility in decametres
    • daily mean pressure in hectopascals
  • the large data set has the following variables for non-UK weather stations:
    • daily mean temperature in degrees Celsius
    • daily total rainfall in mm
    • daily mean pressure in hectopascals
    • daily mean windspeed in knots
  • for daily total rainfall, some values are recorded as tr, meaning trace: less than 0.05 mm of rainfall fell
  • for daily maximum relative humidity, a value above 95% means misty and foggy conditions
  • the UK weather stations in the large data set are:
    • Leuchars in Scotland
    • Leeming in Yorkshire
    • Heathrow in London
    • Hurn in Dorset
    • Camborne in Cornwall
  • the data in the large data set is for the months May to October of 1987, and May to October of 2015
  • in the large data set, n/a means not available so the data is missing
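Before calculating with rainfall values, the special entries need handling. This sketch converts "tr" to a number and "n/a" to a missing value, then leaves missing values out of a mean. Treating "tr" as 0 is an assumption (a question may say otherwise), and the function name `parse_rainfall` and the raw values are hypothetical, not taken from the large data set.

```python
def parse_rainfall(value, trace_value=0.0):
    """Convert a daily total rainfall entry (mm) to a number.

    'tr' (trace, less than 0.05 mm) becomes trace_value; using 0.0 is an assumption.
    'n/a' (not available) becomes None so it is treated as missing data.
    """
    text = str(value).strip().lower()
    if text == "tr":
        return trace_value
    if text == "n/a":
        return None
    return float(text)

# Hypothetical raw entries
raw = ["0.2", "tr", "n/a", "3.4", "0"]
values = [parse_rainfall(v) for v in raw]
recorded = [v for v in values if v is not None]   # leave out missing data before calculating
mean_rainfall = sum(recorded) / len(recorded)
print(values)
print(mean_rainfall)
```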
  • the variables which are not suitable for a normal distribution are:
    • wind speed when given as Beaufort scale descriptions, as that data is qualitative
    • rainfall as the data is not symmetrical