Primary data refers to original data that has been collected first-hand by the researcher, specifically for the aim of the research
this is data which has not been published before (for example, conducting your own experiment, questionnaire or interview to test your hypothesis)
Secondary data
Secondary data refers to data collected second-hand by someone else, not specifically created for the purpose of the study, which has been published before (for example, government statistics, the internet, journals)
one form of secondary data is meta-analysis. This is a process where researchers collect and collate a wide range of previously conducted research on a specific area. The data is statistically tested to provide an overall conclusion
Evaluation of primary and secondary data
S - primary data is more valid than secondary data as it has been specifically created for the purpose of the hypothesis and has been controlled
W - primary data is very expensive and time consuming to collect, in comparison with secondary data which is quicker to collect and analyse. As it is time consuming, the sample may be smaller, which means the results may be less generalisable
Descriptive statistics
descriptive statistics allow us to describe and summarise quantitative data. There are two types of summaries of results in descriptive statistics
measures of central tendency
measures of dispersion
Measures of central tendency
this is a mathematical way to describe a typical or average score from a data set (such as using the mode, median or mean)
Measures of dispersion
this is a mathematical way to describe how spread/consistent the scores in a data set are (such as the range and standard deviation)
a higher measure of dispersion shows the scores are less consistent
a lower measure of dispersion shows the scores are more consistent
Measures of central tendency
a measure of central tendency reduces a large amount of data (this is called the raw data) to a single value which is representative of that set of data
there are 3 measures of central tendency;
mean - this is sharing out the data evenly - also known as the statistical average
median - this is the central value (middle number)
mode - this is the most frequently occurring (common) score in the data set
Mean
the mean is the statistical average of a set of data. It is calculated by adding all the scores together and dividing by the number of values
advantages - it uses all the scores and is therefore more representative of the data than other measures
disadvantages - it can be distorted by extreme scores (scores which are much higher or lower compared to the others) making it unrepresentative. These scores are called outliers or anomalies
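A minimal sketch in Python of how one extreme score distorts the mean. The scores are made up for illustration and are not from any study.

```python
from statistics import mean

# Hypothetical raw scores (assumption: invented for this example)
scores = [4, 5, 5, 6, 5]
scores_with_outlier = scores + [35]   # the same scores plus one extreme score

typical_mean = mean(scores)                  # (4 + 5 + 5 + 6 + 5) / 5 = 5
distorted_mean = mean(scores_with_outlier)   # (25 + 35) / 6 = 10

print(typical_mean, distorted_mean)
```

With the outlier included, the mean doubles even though five of the six participants scored 6 or below, so it no longer represents a typical score.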
Median
the median is the central (middle) value of a set of data. It is calculated by firstly putting the scores into rank (ascending) order and then finding the middle score. If there is an even number of scores, add the two middle scores together and divide by 2
advantages - it is unaffected by extreme values. Therefore if a set of data has extreme values, the median would be a more appropriate measure of central tendency. It is also easy to calculate
disadvantages - it only takes into account one or two scores (the middle values)
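The steps for finding the median can be sketched in Python as follows. The function name `middle_score` and the scores are hypothetical, chosen only for illustration.

```python
from statistics import median

def middle_score(scores):
    """Find the median by ranking the scores and taking the middle one."""
    ordered = sorted(scores)      # step 1: put into rank (ascending) order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                # odd number of scores: one middle score
        return ordered[mid]
    # even number of scores: add the two middle scores and divide by 2
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_scores = [7, 3, 100, 5, 9]    # ranked: 3, 5, 7, 9, 100
even_scores = [7, 3, 5, 9]        # ranked: 3, 5, 7, 9

odd_median = middle_score(odd_scores)     # 7
even_median = middle_score(even_scores)   # (5 + 7) / 2 = 6.0

# The standard library gives the same answers
print(odd_median == median(odd_scores), even_median == median(even_scores))
```

The extreme score of 100 has no effect on the median of the first set, which stays at 7.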
Mode
the mode is the score that occurs the most often. It is calculated by a frequency count
advantages - it is unaffected by extreme values. Therefore if a set of data has extreme values, the mode would be a more appropriate measure of central tendency. It's easy to calculate the mode
disadvantages - it is not useful in small sets of data or when there are too many modes. It doesn't take into account the other scores
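A short Python sketch of the frequency count behind the mode, and of the problem with more than one mode. The scores are invented for illustration; `multimode` needs Python 3.8 or later.

```python
from collections import Counter
from statistics import multimode

# Hypothetical scores (assumption: invented for this example)
scores = [2, 3, 3, 4, 3, 5, 2]

counts = Counter(scores)                             # frequency count of each score
mode_score, frequency = counts.most_common(1)[0]     # score 3 occurs 3 times

# A data set with two equally common scores has two modes
two_mode_scores = [1, 1, 2, 2, 3]
modes = multimode(two_mode_scores)                   # [1, 2]

print(mode_score, frequency, modes)
```

When several scores tie for most frequent, as in the second list, the mode stops being a single typical value.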
Summary of measures of central tendency
-mean - this is the average. It's used for interval data. It includes all the scores but is affected by extreme scores
-median - this is the middle value. It's used for ordinal data. It is unaffected by extreme scores but doesn't include all the scores
-mode - this is the most common value. It's used for nominal data. It's unaffected by extreme scores but doesn't include all the scores
Measures of dispersion
this includes the range and standard deviation. These both show the spread of data (how similar or different the participants' scores are)
Range
the range is calculated by taking away the lowest score from the highest score
if the range is high, then this shows there is a wide spread of data (everyone scores differently - some scoring high, some scoring low), showing the test is not effective for some but very effective for others
if the range is low, it shows the results are similar so the test is equally effective
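A minimal Python sketch of the range for two groups. The helper name `score_range` and both sets of scores are hypothetical.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

# Hypothetical scores for two groups (assumption: invented for this example)
group_a = [10, 11, 12, 11, 10]   # everyone scores similarly
group_b = [2, 20, 11, 5, 17]     # some score high, some score low

range_a = score_range(group_a)   # 12 - 10 = 2
range_b = score_range(group_b)   # 20 - 2 = 18

print(range_a, range_b)
```

Only the highest and lowest scores are used, so a single extreme score can change the range a lot.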
Standard deviation
this is linked to the mean and considers all the data. The SD is a number which reflects the spread of scores and how far these deviate either side of the mean (how far these are scattered around the mean). It is best used when comparing the consistency of 2 sets of data
if the SD value is large, many of the data points are far away from the mean and participants' scores were inconsistent
if the SD is small, the data was tightly clustered around the mean and participants scored similarly
this is a more precise method of expressing dispersion as it takes every score into account
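A Python sketch comparing the consistency of two sets of data with the same mean. The scores are made up, and the sketch assumes the population standard deviation (`pstdev`); the notes do not say which version is intended, and `statistics.stdev` gives the sample version instead.

```python
from statistics import mean, pstdev

# Hypothetical scores for two groups (assumption: invented for this example)
group_a = [9, 10, 10, 10, 11]    # tightly clustered around the mean
group_b = [2, 6, 10, 14, 18]     # widely scattered around the mean

mean_a, mean_b = mean(group_a), mean(group_b)    # both means are 10

# Population standard deviation (assumption: population, not sample, formula)
sd_a = pstdev(group_a)   # about 0.63
sd_b = pstdev(group_b)   # about 5.66

print(mean_a, mean_b, round(sd_a, 2), round(sd_b, 2))
```

Both groups have the same mean, but the larger SD for the second group shows its participants scored much less consistently.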
Graphs
this is the analysis and interpretation of quantitative data
graphs and charts enable a reader to look over and help illustrate patterns in data
correlations are displayed using scattergrams, but there are other types of graphs
the main types are;
bar chart
histogram
Bar charts
these show data when the x-axis (IV) is in the form of categories that the researcher wishes to compare
categories are placed on the x-axis
the columns of the bars should be the same width and separated by spaces
Histograms
histograms are used when the x-axis (IV) consists of continuous data
these continuous values should increase on the x-axis
the frequency is then shown on the y-axis
there are no spaces between the bars
the column width should be the same
it is the area of each bar that gives the frequency for the interval
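A rough Python sketch of how continuous scores are grouped into equal-width intervals for a histogram. The scores and the interval width of 10 are assumptions made for illustration, and the text output is only a stand-in for a drawn graph.

```python
# Hypothetical continuous scores (assumption: invented for this example)
scores = [3, 7, 12, 15, 18, 21, 22, 25, 29, 34]
width = 10   # every interval (column) has the same width

frequencies = {}
for score in scores:
    lower = (score // width) * width           # lower bound of the interval
    frequencies[lower] = frequencies.get(lower, 0) + 1

# Intervals increase along the x-axis; frequency is shown as the bar length
for lower in sorted(frequencies):
    print(f"{lower} to under {lower + width}: {'#' * frequencies[lower]}")
```

Because every interval has the same width here, the height of each bar is proportional to its area, so either can be read as the frequency.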
Tables
these can either be;
a results table, which is the main findings of the study
or a data table, which is the raw scores from the research study
this table must have a title and the columns and rows have to be labelled clearly (including the units used)
Distributions
if we measure certain variables, we can measure how the data is distributed between participants
distributions are plotted as a curve of frequency against score. This can either be a normal (bell-shaped) or skewed distribution
Normal distributions
a normal distribution bell curve is where the mean, mode and median are all located on the highest peak, so it creates a symmetrical spread of frequency data
(the majority got a medium score)
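A small Python check of this idea, using a made-up symmetrical set of scores rather than real data.

```python
from statistics import mean, median, mode

# Hypothetical symmetrical scores (assumption: invented for this example)
symmetrical_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]

sym_mean = mean(symmetrical_scores)       # 27 / 9 = 3
sym_median = median(symmetrical_scores)   # 5th of 9 ranked scores = 3
sym_mode = mode(symmetrical_scores)       # 3 occurs most often

print(sym_mean, sym_median, sym_mode)
```

All three measures land on the same middle score, which is what places them together at the peak of the curve.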
Skewed distributions
not all distributions are balanced, some data produces skewed distributions (this is where the bell curve appears to lean to one side)
Positive skew
this is where the distribution is concentrated to the left of the graph, with the tail pointing to the right
the mode is at the highest peak, with the median to its right and the mean further right again, pulled towards the tail (mode < median < mean)
this usually shows that most participants scored low on the thing that was being measured
however, the presence of outliers at the high end positively skews the data
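A Python sketch of a positively skewed set of scores, showing the order of the three averages. The scores are invented for illustration.

```python
from statistics import mean, median, mode

# Hypothetical scores: most are low, with a few high outliers
positive_skew_scores = [1, 2, 2, 2, 3, 3, 4, 5, 9, 19]

pos_mode = mode(positive_skew_scores)       # 2 is the most common score
pos_median = median(positive_skew_scores)   # (3 + 3) / 2 = 3.0
pos_mean = mean(positive_skew_scores)       # 50 / 10 = 5

print(pos_mode, pos_median, pos_mean)
```

The high outliers (9 and 19) drag the mean to the right of the median, while the mode stays at the peak of low scores.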
Adjusting positive skew
to make a more normal distribution;
we need to adjust the difficulty of the test, making the test easier for participants
so more get a higher mark, to make it a more normal distribution so most participants get a medium score
Negative skew
this is where the distribution is concentrated to the right of the graph, with the tail pointing to the left
the mode is at the highest peak, with the median to its left and the mean further left again, pulled towards the tail (mean < median < mode)
this usually shows that most participants performed well on the thing that was being measured, as the mode is to the right of the graph
however, the presence of outliers at the low end negatively skews the data
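A Python sketch of a negatively skewed set of scores, using made-up marks out of 20.

```python
from statistics import mean, median, mode

# Hypothetical marks out of 20: most are high, with a few low outliers
negative_skew_scores = [1, 11, 15, 16, 17, 17, 18, 18, 18, 19]

neg_mean = mean(negative_skew_scores)       # 150 / 10 = 15
neg_median = median(negative_skew_scores)   # (17 + 17) / 2 = 17.0
neg_mode = mode(negative_skew_scores)       # 18 is the most common mark

print(neg_mean, neg_median, neg_mode)
```

Here the low outliers (1 and 11) pull the mean to the left of the median, and the mode sits furthest right at the peak.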
Adjusting negative skew
we would need to adjust the difficulty of the test, making it more difficult
so fewer people get higher marks and it becomes a more normal distribution, so more participants get a middle score