7.2 Data handling and analysis

  • Quantitative data is numerical
  • Qualitative data is non-numerical
  • Content analysis is used to analyse qualitative data
  • The process for content analysis is:
    • A sample of qualitative data is collected
    • Relevant coding units are identified and operationalised
    • The data is analysed according to these coding units to produce quantitative metrics
    • Statistical analysis is carried out on this data
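The content analysis steps above can be sketched in code. This is a minimal illustration, not a standard procedure: the coding units, their keywords, the function name `code_transcript` and the sample sentence are all made up for the example.

```python
from collections import Counter

# Hypothetical coding units, each operationalised as a set of keywords.
CODING_UNITS = {
    "aggression": {"hit", "shout", "push"},
    "cooperation": {"share", "help", "thank"},
}

def code_transcript(text, coding_units=CODING_UNITS):
    """Tally how many words in the text fall under each coding unit."""
    counts = Counter({unit: 0 for unit in coding_units})
    for word in text.lower().split():
        word = word.strip(".,!?;:")  # drop surrounding punctuation
        for unit, keywords in coding_units.items():
            if word in keywords:
                counts[unit] += 1
    return dict(counts)

# Made-up qualitative data (e.g. one line from an observation transcript).
sample = "They shout and push, then share toys and help each other."
tallies = code_transcript(sample)
print(tallies)
```

The tallies are the quantitative data that statistical analysis would then be carried out on.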
  • Primary data is original data collected for a study
  • Secondary data is data that already exists because it was collected for a previous study
  • A meta-analysis is a study of studies: it takes several smaller studies within a certain research area and uses statistics to identify similarities and trends across them, effectively creating one larger study
  • Meta-analyses are often more reliable than a single study because they are based on a larger data set, which means issues with an individual study can be balanced out by the other studies
  • Mean, median and mode are measures of central tendency
  • Measures of central tendency are ways of reducing large data sets into averages
  • The mean is calculated by adding all the numbers in a set together and dividing the total by the number of numbers
  • The median is calculated by arranging all the numbers in a set from smallest to largest, and then finding the middle number in the set.
    If the total number of numbers is even, take the midpoint between the two numbers in the middle.
  • The mode is found by identifying the most frequently occurring number in a data set
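As a quick worked example of the three measures of central tendency, the sketch below uses Python's standard `statistics` module. The scores are made up, and the list has an even number of values so that the median rule for even-sized sets is exercised.

```python
import statistics

scores = [2, 3, 3, 5, 7, 10]  # made-up scores, already in ascending order

mean = statistics.mean(scores)      # (2 + 3 + 3 + 5 + 7 + 10) / 6 = 5
median = statistics.median(scores)  # even count: midpoint of 3 and 5 = 4.0
mode = statistics.mode(scores)      # 3 occurs most often

print(mean, median, mode)
```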
  • Range and standard deviation are measures of dispersion
  • Measures of dispersion quantify how much scores in a data set vary
  • The range is calculated by subtracting the smallest number in a data set from the largest number
  • The standard deviation is a measure of how much, on average, the numbers in a data set deviate from the mean (average)
  • The standard deviation is calculated as:
    1. Calculate the mean of the data set
    2. Subtract the mean from each number in the set
    3. Square each of these differences
    4. Add the squared differences together
    5. Divide the result by the number of numbers
    6. Take the square root of this number: this is the standard deviation
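The range and the standard deviation steps can be checked with a short sketch. The scores are made up. Dividing by the number of scores, as in step 5, gives what is usually called the population standard deviation; the sample version divides by one less than the number of scores and gives a slightly larger value.

```python
import math
import statistics

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up scores

# Range: largest minus smallest.
data_range = max(scores) - min(scores)             # 9 - 2 = 7

# Standard deviation, following the six steps.
mean = sum(scores) / len(scores)                   # step 1: 40 / 8 = 5.0
differences = [x - mean for x in scores]           # step 2
squared = [d ** 2 for d in differences]            # step 3
total = sum(squared)                               # step 4: 32.0
variance = total / len(scores)                     # step 5: 4.0
sd = math.sqrt(variance)                           # step 6: 2.0

# The standard library gives the same population value,
# and a larger value for the sample version (divides by n - 1).
population_sd = statistics.pstdev(scores)
sample_sd = statistics.stdev(scores)

print(data_range, sd, population_sd, sample_sd)
```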
  • To calculate percentage change:
    1. Subtract the original number from the new number
    2. Divide the difference by the original number
    3. Multiply this result by 100
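A minimal sketch of the percentage change calculation, assuming the change is measured from an original value to a new value. The function name `percentage_change` and the example numbers are illustrative only. A positive result is an increase and a negative result is a decrease.

```python
def percentage_change(original, new):
    """Return the change from original to new as a percentage of original."""
    if original == 0:
        raise ValueError("percentage change is undefined when the original is 0")
    difference = new - original      # step 1
    ratio = difference / original    # step 2
    return ratio * 100               # step 3

increase = percentage_change(40, 50)  # 10 / 40 * 100 = 25.0
decrease = percentage_change(50, 40)  # -10 / 50 * 100 = -20.0
print(increase, decrease)
```

Note that going from 40 to 50 and back again gives different percentages, because the original number being divided by is different each time.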
  • A data set that has a normal distribution will have the majority of scores on or near the mean
  • A normal distribution is symmetrical: there are as many scores above the mean as below it
  • In a normal distribution, scores become rarer the more they deviate from the mean
  • An example of a normal distribution is IQ scores
  • When plotted on a histogram, data that follows a normal distribution will form a bell-shaped curve
  • A data set that has a skewed distribution will not be symmetrical and scores are not distributed evenly either side of the mean
  • Skewed distributions are often caused by outliers (extreme scores that pull the mean towards them)
  • A positively skewed distribution means the mean is much higher than most of the scores, so most scores are below the mean
    Mean > Median > Mode
  • A negatively skewed distribution means the mean is much lower than most of the scores, so most scores are above the mean
    Mean < Median < Mode
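The ordering of mean, median and mode in skewed data can be illustrated with two small made-up data sets, each containing a single outlier. This is only a sketch: the orderings above are a rule of thumb that holds for typical skewed data like this, not a guarantee for every data set.

```python
import statistics

# Made-up scores with one very high outlier (positive skew).
positive_skew = [1, 2, 2, 2, 3, 3, 4, 20]
# Mirror image with one very low outlier (negative skew).
negative_skew = [1, 17, 18, 18, 19, 19, 19, 20]

def summarise(scores):
    """Return (mean, median, mode) for a list of scores."""
    return (
        statistics.mean(scores),
        statistics.median(scores),
        statistics.mode(scores),
    )

pos_mean, pos_median, pos_mode = summarise(positive_skew)  # 4.625, 2.5, 2
neg_mean, neg_median, neg_mode = summarise(negative_skew)  # 16.375, 18.5, 19

print(pos_mean > pos_median > pos_mode)  # positive skew ordering
print(neg_mean < neg_median < neg_mode)  # negative skew ordering
```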
  • Correlation refers to how closely related two or more things are
  • Correlations are measured mathematically using correlation coefficients
  • A correlation coefficient is a value between -1 and +1
  • A correlation of +1 means that two things are perfectly positively correlated
  • A correlation of -1 means that two things are perfectly negatively correlated
  • A correlation of 0 means that there is no linear relationship between two things: knowing one value does not help predict the other along a straight line
  • A negative correlation means that when one value goes up, the other tends to go down
  • A positive correlation means that when one value goes up, the other tends to go up as well
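To make the coefficient values concrete, here is a minimal sketch that computes a correlation coefficient by hand. It assumes Pearson's r, which is one common coefficient (the cards do not say which one is meant), and the function name `pearson_r` and all data are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # How the two variables vary together.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable varies on its own.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_variation / math.sqrt(ss_x * ss_y)

x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # goes up 2 for every 1 that x goes up
falls_with_x = [10, 8, 6, 4, 2]   # goes down as x goes up
u_shaped = [4, 1, 0, 1, 4]        # (x - 3) squared: related, but not linearly

r_pos = pearson_r(x, rises_with_x)   # +1
r_neg = pearson_r(x, falls_with_x)   # -1
r_zero = pearson_r(x, u_shaped)      # 0
print(r_pos, r_neg, r_zero)
```

The first data set shows that a perfect correlation does not require both values to change by the same amount, only that the points fall exactly on a straight line. The last data set shows that a coefficient of 0 rules out a linear relationship but not every kind of relationship.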
  • Tables are used to present raw data or summarise results
  • A scattergram plots each pair of scores on two variables as a point, showing the relationship (correlation) between the variables
  • A bar chart is used for discrete (separate) data categories for comparison
  • A histogram is used to illustrate continuous or interval data