Descriptive Statistics

Cards (25)

  • Primary data
    Primary data refers to original data that has been collected first-hand by the researcher, specifically for the aim of the research
    • this is data which has not been published before (conducting your own experiment, questionnaire or interview to test your hypothesis)
  • Secondary data
    Secondary data refers to data collected second-hand, by someone else, not specifically created for the purpose of the study, which has been published before (government statistics, internet, journals)
    • one form of secondary data is meta-analysis. This is a process where researchers collect and collate a wide range of previously conducted research on a specific area. The data is statistically tested to provide an overall conclusion
  • Evaluation of primary and secondary data
    S - primary data is more valid than secondary data as it has been collected specifically to test the hypothesis and the researcher has controlled how it was gathered
    W - primary data is very expensive and time-consuming to collect, in comparison with secondary data, which is quicker to collect and analyse. Because it is time-consuming, the sample may be smaller, which means the findings may not be generalisable
  • Descriptive statistics
    descriptive statistics allow us to describe and summarise quantitative data. There are two types of summaries of results in descriptive statistics
    • measures of central tendency
    • measures of dispersion
  • Measures of central tendency
    this is a mathematical way to describe a typical or average score from a data set (such as using the mode, median or mean)
  • Measures of dispersion
    this is a mathematical way to describe how spread out or consistent the scores in a data set are (such as the range and standard deviation)
    • a higher measure of dispersion (MOD) shows the scores are less consistent
    • a lower MOD shows the scores are more consistent
  • Measures of central tendency
    a measure of central tendency reduces a large amount of data (this is called the raw data) to a single value which is representative of that set of data
    there are 3 measures of central tendency:
    • mean - this is sharing out the data evenly - also known as the statistical average
    • median - this is the central value (middle number)
    • mode - this is the most frequently occurring (common) score in the data set
  • Mean
    the mean is the statistical average of a set of data. It is calculated by adding all the scores together and dividing by the number of values
    • advantages - it uses all the scores and is therefore more representative of the data than other measures
    • disadvantages - it can be distorted by extreme scores (scores which are much higher or lower compared to the others) making it unrepresentative. These scores are called outliers or anomalies
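The calculation on this card, and the way one extreme score distorts it, can be sketched in a few lines of Python. The scores are invented for illustration and `mean_by_hand` is a hypothetical helper name, not something from the cards.

```python
# Hypothetical test scores, invented purely for illustration.
scores = [4, 5, 6, 5, 5]
scores_with_outlier = scores + [35]  # one extreme score (an outlier)


def mean_by_hand(values):
    """Add all the scores together and divide by the number of values."""
    return sum(values) / len(values)


mean_without = mean_by_hand(scores)            # 25 / 5
mean_with = mean_by_hand(scores_with_outlier)  # 60 / 6

print(mean_without)  # 5.0
print(mean_with)     # 10.0
```

A single outlier doubles the mean here, even though five of the six scores sit between 4 and 6.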
  • Median
    the median is the central (middle) value of a set of data. It is calculated by first putting the scores into rank (ascending) order and then finding the middle score. If there is an even number of scores, add the two middle scores together and divide by 2
    • advantages - it is unaffected by extreme values. Therefore, if a set of data has extreme values, the median would be a more appropriate measure of central tendency. It's also easy to calculate
    • disadvantages - it only takes into account one or two scores (the middle values)
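As a minimal sketch of the steps described on this card (rank the scores, then take the middle one, or average the two middle ones), assuming made-up scores and a hypothetical helper called `median_by_hand`:

```python
def median_by_hand(values):
    """Put the scores in ascending order and return the middle value."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of scores: there is a single middle score.
        return ordered[middle]
    # Even number of scores: add the two middle scores and divide by 2.
    return (ordered[middle - 1] + ordered[middle]) / 2


odd_scores = [7, 3, 9, 5, 4]        # ranked: 3, 4, 5, 7, 9
even_scores = [7, 3, 9, 5, 4, 40]   # ranked: 3, 4, 5, 7, 9, 40

print(median_by_hand(odd_scores))   # 5
print(median_by_hand(even_scores))  # 6.0
```

Adding the extreme score 40 only moves the median from 5 to 6, which illustrates why the median is preferred when outliers are present.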
  • Mode
    the mode is the score that occurs the most often. It is calculated by a frequency count
    • advantages - it is unaffected by extreme values. Therefore, if a set of data has extreme values, the mode would be a more appropriate measure of central tendency. It's easy to calculate the mode
    • disadvantages - it is not useful in small sets of data or when there are too many modes. It doesn't take into account the other scores
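The frequency count mentioned on this card might look like the following sketch. The data and the helper name `modes` are assumptions for illustration; the helper returns a list so that data sets with more than one mode are handled.

```python
from collections import Counter


def modes(values):
    """Count how often each score occurs and return the most frequent one(s)."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(value for value, count in counts.items() if count == highest)


one_mode = [2, 3, 3, 4, 3, 5]   # 3 occurs three times
two_modes = [1, 1, 2, 2, 3]     # 1 and 2 both occur twice

print(modes(one_mode))   # [3]
print(modes(two_modes))  # [1, 2]
```

When every score occurs once, the function returns every score, which reflects the card's point that the mode is not useful for small or evenly spread data sets.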
  • Summary of measures of central tendency
    -mean - this is the average. It's used for interval data. It includes all the scores but is affected by extreme scores
    -median - this is the middle value. It's used for ordinal data. It is unaffected by extreme scores but doesn't include all the scores
    -mode - this is the most common value. It's used for nominal data. It's unaffected by extreme scores but doesn't include all the scores
  • Measures of dispersion
    this includes the range and standard deviation. Both show the spread of the data (how similar or different participants' scores are)
  • Range
    this is taking away the lowest score from the highest score
    • if the range is high, then this shows there is a wide spread of data (everyone scores differently - some scoring high, some scoring low), showing the test is not effective for some but very effective for others
    • if the range is low, it shows the results are similar so the test is equally effective
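A short sketch of the range calculation, using two invented groups of scores to contrast a low and a high range. `score_range` is a hypothetical helper name.

```python
def score_range(values):
    """Take the lowest score away from the highest score."""
    return max(values) - min(values)


# Hypothetical scores for two groups, invented for illustration.
group_a = [10, 11, 12, 11, 10]  # similar scores
group_b = [2, 20, 11, 5, 17]    # widely spread scores

print(score_range(group_a))  # 2
print(score_range(group_b))  # 18
```

Note that the range depends only on the two most extreme scores, so a single outlier can change it a great deal.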
  • Standard deviation
    this is linked to the mean and considers all the data. SD is a number which reflects the spread of scores and how far they deviate either side of the mean (how far they are scattered around the mean). It is best used when comparing the consistency of 2 sets of data
    • if the SD value is large, many of the data points are far away from the mean and participants' scores were inconsistent
    • if the SD is small, the data was tightly clustered around the mean and participants scored similarly
    • this is a more precise method of expressing dispersion as it takes every score into account
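The cards do not give a formula for SD, so the sketch below assumes the sample standard deviation (dividing by n - 1), which is what Python's `statistics.stdev` computes. The two data sets are invented and share the same mean, so only their consistency differs.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation (assumed here): divides by n - 1."""
    m = sum(values) / len(values)
    squared_deviations = [(v - m) ** 2 for v in values]
    return math.sqrt(sum(squared_deviations) / (len(values) - 1))


# Hypothetical scores, both with a mean of 10.
consistent = [9, 10, 10, 11, 10]
inconsistent = [2, 18, 10, 5, 15]

sd_consistent = sample_sd(consistent)
sd_inconsistent = sample_sd(inconsistent)

print(round(sd_consistent, 2))    # 0.71
print(round(sd_inconsistent, 2))  # 6.67

# The hand calculation agrees with the standard library.
assert math.isclose(sd_consistent, statistics.stdev(consistent))
```

The small SD shows scores tightly clustered around the mean; the large SD shows scores scattered far from it, even though the means are identical.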
  • Graphs
    graphs are used in the analysis and interpretation of quantitative data
    • graphs and charts let a reader look over the data and help illustrate patterns in it
    • correlations are displayed on scattergrams, but there are other types of graph
    the main types are:
    • bar chart
    • histogram
  • Bar charts
    these show data when the x-axis (IV) is in the form of categories that the researcher wishes to compare
    • categories are placed on the x-axis
    • the bars should be the same width and separated by spaces
  • Histograms
    histograms are used when the x-axis (IV) consists of continuous data
    • these continuous values should increase on the x-axis
    • the frequency is then shown on the y-axis
    • there are no spaces between the bars
    • the column width should be the same
    • it is the area of each bar that gives the frequency for the interval
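To illustrate how continuous data is grouped into equal-width intervals before a histogram is drawn, here is a minimal sketch. The reaction times (in milliseconds), the interval width and the helper name `histogram_counts` are all assumptions made for the example.

```python
def histogram_counts(values, start, width, n_bins):
    """Count how many values fall into each equal-width interval."""
    counts = [0] * n_bins
    for value in values:
        index = (value - start) // width
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical reaction times in milliseconds (continuous data).
reaction_times = [210, 250, 320, 350, 380, 410, 440, 470, 520, 680]
counts = histogram_counts(reaction_times, start=200, width=100, n_bins=5)

print(counts)  # [2, 3, 3, 1, 1]

# A rough text version: intervals increase down the page, bar length is the frequency.
for i, count in enumerate(counts):
    lower = 200 + i * 100
    print(f"{lower}-{lower + 99} ms: {'#' * count}")
```

Because every interval has the same width here, the height of each bar is proportional to its area, so either can be read as the frequency.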
  • Tables
    these can be either:
    • a results table, which is the main findings of the study
    • or a data table, which is the raw scores from the research study
    • a table must have a title, and the columns and rows must be labelled clearly (including any units used)
  • Distributions
    if we measure certain variables, we can see how the data is distributed across participants
    • distributions are plotted as a curve. This can be either a normal (bell-shaped) or a skewed distribution
  • Normal distributions
    a normal distribution is a bell curve where the mean, mode and median are all located at the highest peak, creating a symmetrical spread of frequency data
    (the majority got a medium score)
  • Skewed distributions
    not all distributions are balanced; some data produces skewed distributions (this is where the bell curve appears to lean to one side)
  • Positive skew
    this is where the distribution is concentrated to the left of the graph, with the tail pointing to the right
    • the mode is at the highest peak, with the median to its right and the mean furthest right, pulled towards the tail
    • usually showing the participants didn't perform well on the thing that was being measured
    • however, the presence of outliers at the high end positively skews the data
  • Adjusting positive skew
    to make a more normal distribution;
    • we need to adjust the difficulty of the test, making the test easier for participants
    • so more get a higher mark, to make it a more normal distribution so most participants get a medium score
  • Negative skew
    this is where the distribution is concentrated to the right of the graph, with the tail pointing to the left
    • the mode is at the highest peak, with the median to its left and the mean furthest left, pulled towards the tail
    • usually showing the participants performed well on the thing that is being measured as the mode is to the right of the graph
    • however, the presence of outliers at the low end negatively skews the data
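The order of the mode, median and mean in positively and negatively skewed data can be checked with a small sketch. Both data sets are invented for illustration (the second is a mirror image of the first), and `describe` is a hypothetical helper built on Python's `statistics` module.

```python
import statistics


def describe(values):
    """Return the mode, median and mean of a data set."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )


# Hypothetical scores: most are low, with a few outliers at the high end.
positive_skew = [1, 2, 2, 2, 3, 3, 4, 5, 9, 15]
# Mirror image: most are high, with a few outliers at the low end.
negative_skew = [15, 14, 14, 14, 13, 13, 12, 11, 7, 1]

pos_mode, pos_median, pos_mean = describe(positive_skew)
neg_mode, neg_median, neg_mean = describe(negative_skew)

# Positive skew: the mean is dragged up towards the tail on the right.
print(pos_mode < pos_median < pos_mean)  # True
# Negative skew: the mean is dragged down towards the tail on the left.
print(neg_mean < neg_median < neg_mode)  # True
```

In both cases the mode stays at the peak while the mean moves furthest towards the outliers, with the median in between.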
  • Adjusting negative skew
    we would need to adjust the difficulty of the test, making it more difficult
    • so fewer people get higher marks and it becomes a more normal distribution, so more participants get a middle score