Data handling and analysis

Cards (67)

  • Qualitative data is non-numerical data that is typically written in the form of words.
  • Strength of qualitative data - rich, descriptive and nuanced, qualitative data gives in-depth insights into people's behaviour, attitudes and beliefs, and forms the starting point for almost all investigations.
  • Limitation of qualitative data - discerning patterns and trends in qualitative data is difficult, and the conclusions are hard to test empirically.
  • Quantitative data is numerical data, in the form of counts, measurements or scores, which can be analysed with descriptive and inferential statistics.
  • Strength of quantitative data - easy to analyse with statistics and present in visual form, allowing us to detect patterns in large data sets and come to firm conclusions about the outcome of experiments.
  • Limitation of quantitative data - it lacks nuance, because complex psychological variables are reduced to numbers, which might be inappropriate.
  • Primary data is data collected by the researcher themselves, for the purpose of the current study.
  • Strength of primary data is that data collection is tailored to the hypothesis being tested, and the researcher can ensure that certain procedures are followed.
  • Limitation of primary data is that it is time-consuming to collect new data for every study.
  • Secondary data is data that was collected for some other purpose and is being analysed in a new way by the researcher. The data may have been collected in the course of other scientific studies.
  • Meta-analysis is a special type of secondary data study where findings from all existing studies on a question are combined to produce a single, more comprehensive result.
  • Strength of secondary data is that the data can be reused for new purposes, and researchers can compare findings from many different research groups and across different cultures.
  • Limitation of secondary data is that the data collection was not controlled or standardised by the current researcher, so drawing conclusions can be risky.
  • Measures of central tendency describe the midpoint of a data set, and are better known as averages. Depending on the type of data and how it is distributed, the mean, median or mode may be more informative.
  • The mean is the arithmetical average of a set of values, calculated by adding up all the values and dividing by the number of values.
  • Use the mean when you have precise measurements, e.g. interval or ratio level, and your data are normally distributed, as the mean is strongly affected by skew and outliers.
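  • The median is the middle number in the data set. If you put all the values in the set in order from smallest to largest, the middle one is the median. With an even number of values, it is the mean of the two middle values.
  • Use the median when you have ordinal (ranked) data, or when your data are skewed or contain outliers.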
  • The mode is the most frequently occurring value in the data set, the most popular choice or score.
  • Use the mode when you have categorical data and want to find the most common value in a set of data.
  • The range is the difference between the smallest and the largest value in the data set. It is calculated by subtracting the smallest value from the largest value.
  • Use the range when you want a rough and ready estimate of the extremes of the values in your sample. Be careful, as the range is strongly affected by outliers - values which may be very unrepresentative of the sample as a whole.
  • The standard deviation is a measure of the variability of a sample. It is calculated using every value in the data set, and represents roughly the average distance of the values from the mean.
  • Use the standard deviation when you would use the mean - that is, when you have precise, interval or ratio data and you want every value to contribute.
  • A positive correlation is when, as one variable rises, the other variable also tends to rise.
  • A negative correlation is when, as one variable rises, the other tends to fall.
  • A zero correlation is when the two variables have no relationship with one another; they move independently of one another.
  • The correlation coefficient is a number that tells you the strength and direction of a correlation, on a scale of -1 to 1. The sign gives the direction, values closer to -1 or 1 indicate a stronger relationship, and zero indicates a zero correlation.
  • There are four ways to represent and display quantitative data: pie charts, bar charts, histograms, and scatter graphs.
  • Pie charts are used to display percentages, allowing rapid assessment of the information. It's very useful to see the proportion of each category in a sample. Pie charts are much less useful when there are many possible responses.
  • Bar charts are used to display data from different categories, either the frequency, or a measure of central tendency or dispersion. Data from experiments are usually presented as bar charts, with the conditions of the IV forming the different categories on the x-axis while the DV measure is on the y-axis.
  • Histograms are used to display frequency counts for CONTINUOUS data. The x-axis must show a continuous variable, such as age, divided into contiguous categories. The y-axis is the count for each category.
  • Scattergrams are used to display the results of correlational analysis. Each axis represents one covariable, and it doesn't matter which goes where.
  • A frequency distribution shows you how many observations you have at each value along the x-axis. The highest point on the graph is the most common value - the mode.
  • A normal distribution curve is seen when the values in the data set have a symmetrical distribution around the midpoint, so that middling values are the most common, and higher and lower values are increasingly rare. A normal distribution can be described with just the mean and the standard deviation.
  • Positive skew is seen when the curve has a long ‘tail’ towards the higher values at the right of the graph. This is found when low values are the most common, but a few very high values pull the mean upwards, so the mode is the lowest of the three averages and the mean is the highest.
  • Negative skew is seen when the curve has a long 'tail' towards the lower values at the left of the graph. This is found when high values are the most common, but a few very low values pull the mean downwards, so the mode is the highest of the three averages and the mean is the lowest.
  • There are three types of distribution curves: normal, positively skewed and negatively skewed.
  • Types of correlation
    A) strong positive correlation
    B) no correlation
    C) weak negative correlation
  • There are three commonly taught levels of measurement: nominal, ordinal and interval. Ratio data, which has a true zero, is often treated as a fourth level.
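
Worked example for the cards on averages, dispersion and correlation. This is a minimal Python sketch, not a prescribed method: the scores, the revision hours and the helper name pearson_r are all made up for illustration, and Pearson's r is only one of several possible correlation coefficients. The first data set contains one deliberately extreme score, so the mean is pulled above the median, as expected under positive skew.

```python
import math
import statistics

# Hypothetical test scores, invented for illustration. 40 is an outlier.
scores = [2, 4, 4, 5, 7, 8, 40]

mean = statistics.mean(scores)           # add up all values, divide by the count
median = statistics.median(scores)       # middle value once the scores are ordered
mode = statistics.mode(scores)           # most frequently occurring value
value_range = max(scores) - min(scores)  # largest value minus smallest value
sd = statistics.stdev(scores)            # sample standard deviation, uses every value

print(mean, median, mode, value_range)   # 10 5 4 38
print(round(sd, 2))


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists (hypothetical helper)."""
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    # Sum of the products of each pair's deviations from its own mean.
    co_deviation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each covariable.
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return co_deviation / math.sqrt(ss_x * ss_y)


# Hypothetical covariables: hours revised and test score for five students.
hours_revised = [1, 2, 3, 4, 5]
test_score = [2, 4, 5, 4, 5]

r = pearson_r(hours_revised, test_score)
print(round(r, 2))  # positive sign means a positive correlation
```

Here the mode (4) is below the median (5), which is below the mean (10), the pattern described for positive skew. Removing the score of 40 brings the mean close to the median, which shows why the median is often the safer average when outliers are present.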