Descriptive statistics

Cards (24)

  • Nominal data:
    Categorical data. To help you remember that the data are in named categories, think NOM, the French for 'name'. E.g. grouping people according to their favourite subject.
  • Ordinal data:
    The results are points on a scale. To help you remember, ORDinal data are points in ORDer along a scale. E.g. putting subjects in order of liking.
  • Interval data:
    Data are measured using units of equal intervals, but there is NO TRUE ZERO POINT. E.g. temperature.
  • Nominal Data
    • Data is allocated to mutually exclusive categories and is discrete as it can only appear in one category.
    • Data is in the form of frequencies.
    • Category labels are names so there is no order.
    • Simplest level of data, e.g. smoker or non-smoker.
    • Measure of Central Tendency = Mode
    • Presented by: Table, tallies, bar chart or pie chart
  • Ordinal Data
    • Data is ordered in some way.
    • Data is often in the form of a scale which consists of ratings/rankings.
    • Using a scale allows you to make statements about size of scores but the extent of comparison is limited because the intervals between units are not equal.
    • Better than nominal data but lacks precision because it is based on subjective opinion.
    • E.g. on a scale of 1-10, how much do you like psychology?
    • Measure of Central Tendency = Median
    • Measure of Dispersion = Range
  • Interval Data
    • Like ordinal data but based on numerical scales that include units of equal, precisely defined size.
    • Better than ordinal data because we use public scales of measurement that produce data based on accepted units of measurement.
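    • E.g. temperature, time, weight. (Strictly, time and weight have a true zero and are ratio data, but they are often grouped with interval data.)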
    • The 20-degree difference between 10-30 °C and 50-70 °C is known to be equivalent because this scale has equal, precisely defined units. As a result we can add and subtract these values, but we cannot meaningfully multiply or divide them (20 °C is not 'twice as hot' as 10 °C).
    • Measure of Central Tendency = Mean
    • Measure of Dispersion = Standard Deviation
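The point about equal intervals but no true zero can be checked numerically. This is a small illustrative sketch (the function name is hypothetical): converting Celsius to Fahrenheit keeps equal gaps equal, but the ratio between two temperatures changes with the scale, which is why multiplying or dividing interval data is not meaningful.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Convert a Celsius temperature to Fahrenheit."""
    return celsius * 9 / 5 + 32


# Equal intervals: the two 20-degree gaps stay equal after changing scale.
gap_low = celsius_to_fahrenheit(30) - celsius_to_fahrenheit(10)   # 86 - 50
gap_high = celsius_to_fahrenheit(70) - celsius_to_fahrenheit(50)  # 158 - 122
print(gap_low, gap_high)  # 36.0 36.0

# No true zero: the ratio between two temperatures depends on the scale used.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)  # 68 / 50
print(ratio_celsius, ratio_fahrenheit)  # 2.0 1.36
```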
  • Descriptive statistics
    • The information collected in any study is called data.
    • This comes in two forms: qualitative and quantitative.
    • We are going to focus on the latter, numerical data, and consider the various ways we can analyse it to draw meaningful conclusions.
    • This is known as descriptive statistics, which includes measures of central tendency, measures of dispersion and graphs.
  • Measures of central tendency
    • These measures are ‘averages’.
    • They give us the most typical values in a set of data.
    • The average can be calculated in three different ways: mean, median and mode.
  • Mean - add up all the scores and divide by the number of scores
    • Most informative measure; can only be used with interval data, which are measured on a standardised scale
    • The mean is the most sensitive measure as it includes all scores/values in the data set in its calculation. It is therefore more representative of the data as a whole.
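A minimal sketch of the mean calculation, using made-up scores. It also shows the sensitivity described above: replacing one score with an extreme value pulls the mean up sharply. (Python's standard library offers `statistics.mean`; the hand-written version just makes the steps visible.)

```python
def mean(scores: list[float]) -> float:
    """Add up all the scores and divide by the number of scores."""
    if not scores:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(scores) / len(scores)


scores = [4, 6, 7, 7, 9, 11]        # hypothetical interval-level scores
with_extreme = [4, 6, 7, 7, 9, 60]  # same data, last score replaced by an extreme value

print(round(mean(scores), 2))  # 44 / 6 -> 7.33
print(mean(with_extreme))      # 93 / 6 -> 15.5
```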
  • Median - middle value when data is ordered.
    • cannot use with nominal data but can use with ordinal and interval data
    • The median is less sensitive and therefore not distorted by extreme scores
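A sketch of finding the median with hypothetical scores. When there is an even number of scores there is no single middle value, so the usual convention (assumed here) is to take the point halfway between the two middle values. The last line shows that an extreme score leaves the median unchanged.

```python
def median(scores: list[float]) -> float:
    """Return the middle value once the scores are put in order."""
    if not scores:
        raise ValueError("the median of an empty data set is undefined")
    ordered = sorted(scores)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # odd count: a single middle value
    # even count: halfway between the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([9, 4, 7, 6, 11]))     # ordered 4, 6, 7, 9, 11 -> 7
print(median([4, 6, 7, 7, 9, 11]))  # (7 + 7) / 2 -> 7.0
print(median([4, 6, 7, 7, 9, 60]))  # extreme score, median still 7.0
```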
  • Mode - most common
    • only measure of central tendency you can use with nominal data / categorical data
    • Mode is easy to calculate but not representative of whole data set.
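Because the mode only counts how often each category occurs, it works on nominal data such as names. This sketch uses invented responses and returns every category that ties for the highest count, since a data set can have more than one mode. (`statistics.multimode` in the standard library behaves similarly.)

```python
from collections import Counter


def modes(values: list[str]) -> list[str]:
    """Return every value that occurs most often (there may be a tie)."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode of an empty data set is undefined")
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


# Hypothetical nominal data: favourite subjects
subjects = ["Psychology", "Maths", "Psychology", "Art", "Maths", "Psychology"]
print(modes(subjects))  # ['Psychology']

# Hypothetical smoker/non-smoker data with a tie (bimodal)
print(modes(["Smoker", "Non-smoker", "Smoker", "Non-smoker"]))  # ['Smoker', 'Non-smoker']
```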
  • Measures of dispersion
    • Measures of dispersion can also be used to analyse data. They are based on the spread of scores: how far scores vary and differ from one another.
    • The measures of dispersion that you need to know about are the range and the standard deviation.
  • Range
    • tells us the difference between the top and bottom values in a set of data.
    • It is customary to add 1 as it allows for the fact that raw scores are sometimes rounded up or down when they are recorded.
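A sketch of the range calculation with hypothetical scores. The `add_one` flag is an illustrative parameter for the "add 1" convention mentioned above, and the function is called `value_range` only to avoid clashing with Python's built-in `range`.

```python
def value_range(scores: list[float], add_one: bool = False) -> float:
    """Highest score minus lowest score, optionally adding 1 to allow for rounded raw scores."""
    if not scores:
        raise ValueError("the range of an empty data set is undefined")
    spread = max(scores) - min(scores)
    return spread + 1 if add_one else spread


scores = [4, 6, 7, 7, 9, 11]  # hypothetical scores
print(value_range(scores))                # 11 - 4 -> 7
print(value_range(scores, add_one=True))  # 11 - 4 + 1 -> 8
```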
  • Range:
    Strengths
    • Easy to calculate
    Limitations
    • Affected by extreme values
    • Fails to account for the distribution of the numbers, i.e. whether the scores are closely distributed around the mean or more spread out
  • Standard deviation
    • Standard deviation is a more precise measure of dispersion and gives us a single value that tells us how far scores deviate (move away) from the mean.
    • A large SD means there is a large spread of data around the mean, suggesting that not all participants were affected in the same way by the IV.
    • A small SD means the data are clustered closely around the mean, implying that all participants responded in a fairly similar way.
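A worked sketch of the standard deviation: find the mean, square each score's distance from it, average those squares, then take the square root. This assumes the sample formula (divide by n - 1), which is what `statistics.stdev` uses; calculators and textbooks sometimes divide by n instead (`statistics.pstdev`). The two data sets are invented and share a mean of 7 but differ in spread.

```python
import math
import statistics


def sample_sd(scores: list[float]) -> float:
    """Standard deviation using the n - 1 (sample) formula."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two scores")
    m = sum(scores) / n
    squared_deviations = [(x - m) ** 2 for x in scores]  # distance of each score from the mean, squared
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


clustered = [6, 7, 7, 8, 7]     # hypothetical: participants responded similarly
spread_out = [1, 4, 7, 10, 13]  # hypothetical: participants responded very differently

# Both groups have a mean of 7, but very different spreads.
print(round(sample_sd(clustered), 2))   # 0.71
print(round(sample_sd(spread_out), 2))  # 4.74

# The standard library gives the same sample SD.
print(math.isclose(sample_sd(clustered), statistics.stdev(clustered)))  # True
```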
  • Standard deviation:
    Strengths
    • Takes into account all scores.
    • More precise measure of dispersion than the range, and not difficult to work out with a calculator.
    Limitations
    • As a single value, it may hide some of the characteristics of the data set, e.g. extreme scores.
  • Ways to display quantitative data and data distribution
    • Tables (for raw data before analysis, and for summarised data such as the mean, range and SD)
    • Bar charts (for discrete or categorical data; the columns do not touch)
    • Histograms (for continuous data; each column represents a class interval and the columns touch)
    • Scattergrams (For correlational relationships)
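To see how a histogram groups continuous data, this sketch counts invented heights into class intervals of equal width. Each interval runs from its lower bound up to (but not including) the next, so there are no gaps between intervals. The function name and parameters are illustrative, and the "150-159" labels assume whole-centimetre values.

```python
def class_interval_frequencies(values, start, width, n_intervals):
    """Count the values falling in each class interval [lower, lower + width)."""
    counts = {}
    for i in range(n_intervals):
        lower = start + i * width
        upper = lower + width
        counts[(lower, upper)] = sum(1 for v in values if lower <= v < upper)
    return counts


heights_cm = [152, 158, 161, 163, 165, 167, 170, 171, 174, 181]  # hypothetical data

frequencies = class_interval_frequencies(heights_cm, start=150, width=10, n_intervals=4)
for (lower, upper), count in frequencies.items():
    # one text "column" per class interval, drawn sideways
    print(f"{lower}-{upper - 1} cm: {'#' * count}")
```

This prints 2, 4, 3 and 1 marks for the four intervals from 150 cm to 189 cm.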
  • Data distribution
    • If you measure certain variables, such as height of all the people in sixth form, the frequency of these measurements should form a bell-shaped curve. This is called a normal distribution which is symmetrical.
    • Within a normal distribution, most people (or items) are located in the middle area with few at the extreme ends. The mean, median and mode all occupy the same midpoint of the curve.
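A quick check of the claim that the mean, median and mode coincide in a symmetrical distribution, using the standard `statistics` module. The data are invented and perfectly symmetrical; real samples, such as measured heights, are only roughly so, and their three averages will be close rather than identical.

```python
import statistics

# Hypothetical, perfectly symmetrical scores centred on 5
symmetrical = [3, 4, 4, 5, 5, 5, 6, 6, 7]

print(statistics.mean(symmetrical))    # 45 / 9 = 5
print(statistics.median(symmetrical))  # middle (5th) value = 5
print(statistics.mode(symmetrical))    # most common value = 5
```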
  • Data distributions
    • Not all distributions have such a balanced symmetrical pattern.
    • Instead some may produce skewed distributions – this is when the distribution appears to lean to one side or another.
    • Positive skew is where most scores are concentrated towards the left (low values), resulting in a long tail on the right.
    • E.g. a very hard test where most students scored low marks and few scored high marks would produce a positive skew.
    • Negative skew is the opposite: most scores are concentrated towards the right, with a long tail on the left, e.g. an easy test where many scored highly and few got low marks.
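A sketch of the hard-test and easy-test examples with invented marks out of 20 (the easy-test marks are simply 20 minus each hard-test mark). The `describe_skew` helper is hypothetical and uses a rough rule of thumb, comparing the mean with the median; it is not a formal skewness statistic and can mislead for unusual distributions.

```python
import statistics


def describe_skew(scores: list[float]) -> str:
    """Rough rule of thumb: compare the mean with the median."""
    m, md = statistics.mean(scores), statistics.median(scores)
    if m > md:
        return "positive skew"
    if m < md:
        return "negative skew"
    return "roughly symmetrical"


hard_test = [2, 3, 3, 3, 4, 4, 5, 6, 9, 15]          # hypothetical marks out of 20
easy_test = [18, 17, 17, 17, 16, 16, 15, 14, 11, 5]  # the same marks mirrored (20 - mark)

for name, marks in [("hard test", hard_test), ("easy test", easy_test)]:
    print(name, statistics.mean(marks), statistics.median(marks),
          statistics.mode(marks), describe_skew(marks))
```

For the hard test this gives mean 5.4, median 4.0 and mode 3 (positive skew); for the easy test, mean 14.6, median 16.0 and mode 17 (negative skew).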
  • Which skew:
    Reading left to right along the x-axis: mean first, then median, then mode
    Negative
  • Which skew:
    Reading left to right along the x-axis: mode first, then median, then mean
    Positive
  • Characteristics of Normal Distributions
    • Mean, median and mode are all in the same midpoint
    • Distribution is symmetrical about the midpoint
    • Dispersion of scores or measurements either side of the midpoint is consistent and can be expressed in standard deviations
  • Characteristics of Skewed Distributions
    • Negative skew: Mean < median < mode
    • Positive skew: Mean > median > mode
    • The distribution is not symmetrical