measures of central tendency + dispersion

    • measures of central tendency inform us about the central (or middle) values of a set of data; they are 'averages', i.e. ways of calculating a typical value for a set of data; the average can be calculated in different ways, each appropriate for a different situation
    • mean: the mean is calculated by adding up all the data items and dividing the total by the number of data items, it is properly called the arithmetic mean because it involves an arithmetic calculation, it can only be used w/ ratio and interval level data
    • median: the median is the middle value in an ordered list, all data items must be arranged in order and the central value is then the median
    • median: if there is an even number of data items there will be 2 central values, to calculate the median add the 2 central values and divide by 2, the median can be used w/ ratio, interval and ordinal data
    • mode: the mode is the value that is the most common data item, w/ nominal data it is the category that has the highest frequency count, w/ interval and ordinal data it is the data item that occurs most frequently
    • mode: to identify it, it helps to arrange the data items in order; the modal group is the group w/ the greatest frequency; if 2 categories or data items share the highest frequency the data have 2 modes, i.e. are bimodal
    • measures of dispersion: a set of data can also be described in terms of how dispersed or spread out the data items are, these descriptions are known as measures of dispersion
    • measures of central tendency include: mean, median and mode
    • measures of dispersion include: range and standard deviation
    • range: the range is the arithmetic distance between the top and bottom values in a set of data, it is customary to add 1, e.g. for a data set w/ a bottom value of 3 and a top value of 15 the range would be 15 - 3 + 1 = 13; 1 is added because the bottom number 3 could represent a value as low as 2.5 and the top number 15 could represent a value as high as 15.5
    • range: the 2 sets of numbers in the textbook have the same mean but different ranges, so the range is helpful as a further method of describing the data; if we used the mean alone the 2 sets would appear to be the same
    • standard deviation: a more precise method of expressing dispersion is the standard deviation, this is a measure of the average distance of each data item from the mean, whether the item is above or below it (i.e. ignoring plus or minus signs)
    • standard deviation: it is usually worked out using a calculator, the standard deviations for the 2 sets of numbers in the textbook are 3.69 and 4.45 respectively, you won't be asked to calculate a standard deviation in the exam
    • there are 4 levels of measurement: nominal, ordinal, interval and ratio
    • nominal: data are in separate categories such as grouping people according to their favourite football team (e.g. Liverpool, Oxford United etc.)
    • ordinal: data are ordered in some way e.g. asking people to put a list of football teams in order of liking, Liverpool might be 1st followed by Oxford United etc.
    • ordinal: the difference between each item is not the same i.e. the individual may like the first item a lot more than the second but there might only be a small difference between the items ranked second and third
    • interval: data are measured using units of equal intervals such as when counting correct answers or using any 'public' unit of measurement, many psychological studies use 'plastic interval scales' where the intervals are arbitrarily determined and we therefore can't know for certain that there are equal intervals between the numbers
    • interval: however for the purpose of analysis such data may be accepted as interval
    • ratio: there is a true 0 point as in most measures of physical quantities
    • mean S: the mean is the most sensitive measure of central tendency because it takes account of the exact distance between all the data values
    • mean L: this sensitivity means that it can be easily distorted by one (or a few) extreme values and thus end up being misrepresentative of the data as a whole
    • mean L: it can't be used w/ nominal data
    • mean L: it does not make sense to use it w/ discrete values, e.g. the average number of legs
    • therefore the mean is not always representative of the data as a whole and should always be considered alongside the standard deviation
    • median S: the median is not affected by extreme scores
    • median S: it is appropriate for ordinal (ranked) data
    • median S: it can be easier to calculate than the mean
    • median L: the median is not as 'sensitive' as the mean because the exact values are not reflected in the final calculation
    • the median therefore has strengths in that it can be used to describe a variety of data sets, including skewed data and non-normal distributions
    • mode S: the mode is also unaffected by extreme values
    • mode S: it is much more useful than the mean for discrete data
    • mode S: it is the only method that can be used when the data are in categories i.e. nominal data
    • mode L: it is not a useful way of describing data when there are several modes
    • mode L: it also tells us nothing about the other values in a distribution
    • as w/ all 3 measures of central tendency, the key is to use the mode only w/ data sets for which it is appropriate
    • range S: the range is easy to calculate
    • range L: it is affected by extreme values
    • range L: it fails to take account of the distribution of the numbers e.g. it doesn't indicate whether most numbers are closely grouped around the mean or spread out evenly
    • the range is useful for ordinal data or w/ highly skewed data or when making a quick calculation
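
The measures described in the cards above can be sketched in Python using the standard `statistics` module. This is a minimal illustration under some assumptions: the `scores` list is made-up data (not the textbook's data sets), `score_range` and `describe` are hypothetical helper names, and the population form of the standard deviation (`pstdev`) is assumed because the cards do not say which form is intended.

```python
from statistics import mean, median, multimode, pstdev

# Made-up scores for illustration only (not the textbook's data);
# the bottom value is 3 and the top value is 15, as in the range card.
scores = [3, 5, 8, 8, 9, 10, 12, 15]


def score_range(values):
    """Range as described in the cards: top value minus bottom value, plus 1."""
    return max(values) - min(values) + 1


def describe(values):
    """Collect the measures of central tendency and dispersion for one data set."""
    return {
        "mean": mean(values),        # sum of items divided by number of items
        "median": median(values),    # middle value; averages the 2 central values if even
        "modes": multimode(values),  # list of most frequent values (2 entries = bimodal)
        "range": score_range(values),
        "sd": pstdev(values),        # assumption: population standard deviation
    }


summary = describe(scores)
print(summary["mean"])    # 8.75
print(summary["median"])  # 8.5 (the 2 central values are 8 and 9)
print(summary["modes"])   # [8]
print(summary["range"])   # 13, i.e. 15 - 3 + 1

# One extreme value distorts the mean but barely moves the median.
with_outlier = scores + [100]
print(mean(with_outlier) > mean(scores))  # True
print(median(with_outlier))               # 9

# Two values sharing the highest frequency give a bimodal data set.
print(multimode([1, 1, 2, 2, 3]))         # [1, 2]
```

Adding the single extreme score of 100 roughly doubles the mean while the median only moves from 8.5 to 9, which is the distortion the 'mean L' and 'median S' cards refer to.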