descriptive statistics

Cards (13)

  • descriptive statistics
    provide a summary of a set of data, drawn from a sample, that applies to a whole target population
    • they include measures of central tendency and measures of dispersion
  • measures of central tendency
    used to summarise large amounts of data into averages
    there are 3 types:
    • median
    • mode
    • mean
  • median
    the median is the central score in a list of rank-ordered scores
    advantages:
    • it is not affected by extreme scores
    • it is usually easier to calculate than the mean
    • the median can be used with ordinal data, unlike the mean
    disadvantages:
    • it is not as sensitive as the mean because not all the scores are used in the calculation
    • it can be unrepresentative in a small set of data
  • mean
    the arithmetic average of a set of data, calculated by adding all the values together and dividing by the number of values
    advantages:
    • it is the most accurate measure of central tendency as it uses the interval level of measurement, where the units of measurement are of equal size
    • uses all of the data in calculation
    disadvantages:
    • it is less useful if the data are skewed, for example by a few extremely large or small scores
    • the mean score may not be one of the actual scores in the set of data
  • mode
    the most common number in a set of scores
    advantages:
    • it is less prone to distortion by extreme values
    • it sometimes makes more sense than the other measures, for example with nominal (category) data, where a mean or median cannot be calculated
    disadvantages:
    • there can be more than one mode in a set of data
    • it does not use all the scores
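As a rough illustration of the three measures of central tendency, the sketch below uses Python's standard `statistics` module. The `scores` list is hypothetical data chosen so that one extreme score (40) is present; it is not taken from the cards.

```python
from statistics import mean, median, multimode

# Hypothetical scores, invented for illustration; 40 is an extreme score
scores = [2, 3, 3, 4, 5, 6, 40]

score_mean = mean(scores)        # sum of all scores divided by the number of scores
score_median = median(scores)    # central score once the scores are rank-ordered
score_modes = multimode(scores)  # list of the most common score(s); can hold more than one

print("mean:", score_mean)
print("median:", score_median)
print("mode(s):", score_modes)
```

With this data the mean is pulled upwards by the extreme score and is not one of the actual scores, while the median and mode stay among the typical scores.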
  • measures of dispersion
    provide measures of the variability of scores.
    they include:
    • the range
    • interquartile range
    • standard deviation
  • the range
    calculated by subtracting the lowest value from the highest value in a set of data
    advantages:
    • fairly easy and quick to work out
    • takes full account of extreme values
    disadvantages:
    • it can be distorted by extreme values
    • does not show whether data are clustered or spread evenly around the mean
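A minimal sketch of the range calculation, assuming a hypothetical `scores` list with one extreme value, to show how a single score can distort the result:

```python
# Hypothetical scores, invented for illustration; 40 is an extreme value
scores = [2, 3, 3, 4, 5, 6, 40]

# Range = highest value minus lowest value
score_range = max(scores) - min(scores)

# The same data with the extreme value left out, for comparison only
without_extreme = [s for s in scores if s != 40]
range_without_extreme = max(without_extreme) - min(without_extreme)

print("range:", score_range)
print("range without the extreme value:", range_without_extreme)
```

Only the two end values enter the calculation, so the range says nothing about how the remaining scores are spread.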
  • standard deviation
    measure of the variability of a set of scores from the mean. the larger the standard deviation the larger the spread of scores will be.
    standard deviation is calculated by:
    • add all the scores together and divide by the number of scores to calculate the mean
    • subtract the mean from each individual score
    • square each of these differences
    • add all the squared differences together
    • divide the sum of the squares by the number of scores minus 1. this is the variance
    • use a calculator to work out the square root of the variance. this is the standard deviation
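The steps above can be sketched in Python as follows. The `scores` list is hypothetical, and the variable names are chosen only for this example. The result is checked against `statistics.stdev`, which also divides by the number of scores minus 1.

```python
from math import sqrt
from statistics import stdev

# Hypothetical scores, invented for illustration
scores = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(scores)

# Step 1: add all the scores together and divide by the number of scores (the mean)
mean_score = sum(scores) / n

# Step 2: subtract the mean from each individual score
differences = [s - mean_score for s in scores]

# Step 3: square each of these differences
squared_differences = [d ** 2 for d in differences]

# Step 4: add all the squared differences together
sum_of_squares = sum(squared_differences)

# Step 5: divide by the number of scores minus 1 (the variance)
variance = sum_of_squares / (n - 1)

# Step 6: take the square root of the variance (the standard deviation)
standard_deviation = sqrt(variance)

print("variance:", variance)
print("standard deviation:", standard_deviation)
print("statistics.stdev gives:", stdev(scores))
```

Dividing by the number of scores minus 1, rather than by the number of scores, is the usual choice when the scores are a sample from a larger population.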
  • standard deviation - advantages and disadvantages
    advantages:
    • it is a more sensitive dispersion measure than the range since all scores are used in its calculation
    • it allows for the interpretation of individual scores
    disadvantages:
    • it is more complicated to calculate
    • it is less meaningful if data are not normally distributed
  • presentation of quantitative data
    quantitative data can be presented in various ways, for example:
    • bar charts (not continuous)
    • histograms (continuous)
    • frequency polygon (line graph) (continuous)
    • pie charts
  • normal distribution
    for a given attribute most scores will be on or around the mean, with decreasing amounts away from the mean
    • data is symmetrical and forms a bell-shaped curve when plotted (equal number of scores above and below the mean)
    several ways to check if data is normally distributed:
    • examine visually: look at the data to see if most scores are clustered around the mean
    • calculate measures of central tendency: calculate the mean, mode and median to see if they are similar
    • plot the frequency distribution: plot the data on a histogram to see if it forms a bell-shaped curve
  • skewed distribution
    unless data is symmetrical it will be skewed
    • anomalies can cause skewed distributions
    • a positively skewed distribution occurs when there is a high extreme score
    • a negatively skewed distribution occurs when there is a low extreme score
    • a positively skewed distribution will contain more low than high scores
    • a negatively skewed distribution will contain more high than low scores
    • if the mean is lower than the other 2 (median and mode) it is negatively skewed
    • if the mean is higher than the other 2 it is positively skewed
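The comparison of the mean with the other measures can be sketched as a quick check. This is a simplification that compares the mean with the median only. The function name `skew_direction` and the example data are hypothetical, and the check is a rule of thumb rather than a formal test of skew.

```python
from statistics import mean, median

def skew_direction(scores):
    """Rule-of-thumb check: compare the mean with the median (hypothetical helper)."""
    score_mean = mean(scores)
    score_median = median(scores)
    if score_mean > score_median:
        return "positive"        # mean pulled up by high extreme scores
    if score_mean < score_median:
        return "negative"        # mean pulled down by low extreme scores
    return "none indicated"      # mean and median are equal

# Hypothetical data sets, invented for illustration
high_extreme = [1, 2, 2, 3, 3, 4, 20]       # one high extreme score
low_extreme = [1, 17, 18, 18, 19, 19, 20]   # one low extreme score
symmetrical = [1, 2, 3, 4, 5]

print(skew_direction(high_extreme))
print(skew_direction(low_extreme))
print(skew_direction(symmetrical))
```

Plotting the data on a histogram is still worthwhile, since the mean and median can be close even when the shape is not a bell curve.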
  • graphs
    bar charts:
    • categories and bars need to be separate (nominal categories) (words)
    histogram:
    • continuous data, bars together (numerical)
    frequency polygon:
    • line graph, good for comparing (numerical) (2 data sets)
    scatter graph:
    • correlation, relation, association