Descriptive statistics are used to summarise quantitative data numerically, allowing researchers to view the data as a whole and saving readers from needing to navigate through lots of results to get a basic understanding of the data.
Descriptive statistics typically include a measure of central tendency and a measure of dispersion, which are selected based on the type of data collected, and can also include percentages.
Measures of central tendency, such as the mean, tell us about the central, most typical, value in a data set and are calculated in different ways.
The mean is the most sensitive of all the measures of central tendency as it takes into consideration all values in the dataset.
The mean is calculated by adding all of the data together, and dividing the sum by how many values there are in total.
The resulting value should lie somewhere between the minimum and maximum values of the dataset.
If it does not, an error has been made in the calculation.
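The calculation of the mean, together with the check that it falls between the lowest and highest values, could be sketched as follows. The scores are hypothetical and are not the example data set referred to in the text.

```python
# Hypothetical test scores (percentages); not taken from the original data set.
scores = [45, 60, 72, 81, 92]

# Mean: add all of the values together, then divide by how many values there are.
mean = sum(scores) / len(scores)

# Sanity check: the mean must lie between the minimum and maximum values.
assert min(scores) <= mean <= max(scores), "mean outside the data range: check the arithmetic"

print(mean)  # 70.0
```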
Where a data set contains extreme values, the mean may not give a true representation of the data, so the median can be used instead.
The median is not affected by extreme scores, so is ideal when considering a data set that is heavily skewed.
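As a rough illustration of this point, the sketch below compares the mean and median of two hypothetical sets of scores that differ only in one extreme value. It uses Python's standard `statistics` module; the numbers are invented for illustration.

```python
import statistics

# Hypothetical scores: the second list swaps one typical score for an extreme low one.
typical = [68, 70, 71, 73, 75]
skewed = [5, 70, 71, 73, 75]

# The mean uses every value, so the single extreme score drags it down.
mean_typical = statistics.mean(typical)
mean_skewed = statistics.mean(skewed)

# The median only depends on the middle of the ordered data, so it does not move.
median_typical = statistics.median(typical)
median_skewed = statistics.median(skewed)

print(mean_typical, mean_skewed)
print(median_typical, median_skewed)  # 71 71
```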
The median is also easy to calculate: once the values have been placed in order, it is the middle value within the data set.
The median score for this data set is 71%, yet the mean score was 60.2%.
If there is an even number of values within the data set, there will be two values that fall directly in the middle.
In this case, the two middle scores are added together and then divided by two.
This value will then be the median score.
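The two cases (an odd or an even number of values) could be sketched as a small function. The function name `median` and the sample scores are assumptions made for illustration.

```python
def median(values):
    """Return the middle value of the data once it has been put in order."""
    if not values:
        raise ValueError("median requires at least one value")
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd number of values: a single value sits in the middle.
        return ordered[middle]
    # Even number of values: add the two middle scores and divide by two.
    return (ordered[middle - 1] + ordered[middle]) / 2


# Hypothetical scores.
print(median([71, 12, 85, 64, 90]))  # 71
print(median([12, 64, 71, 85]))      # 67.5
```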
The mode in descriptive statistics refers to the value or score that appears most frequently within the data set.
Whilst easy to calculate, the mode can give a misleading picture of the data set.
Imagine if the lowest value in the example data set (12%) appeared twice.
It wouldn't be truly representative of the whole data set; however, this would be the mode score.
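A minimal sketch of this situation is shown below, using invented scores in which the lowest value, 12, appears twice. The data are hypothetical and only the 12% figure comes from the text.

```python
import statistics

# Hypothetical scores: the lowest value (12) appears twice, every other value once.
scores = [12, 12, 55, 64, 71, 85, 90]

mode_score = statistics.mode(scores)
median_score = statistics.median(scores)

# The mode is the lowest score, even though most participants scored far higher.
print(mode_score)    # 12
print(median_score)  # 64
```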
The range is calculated by subtracting the lowest score in the data set from the highest score in the data set.
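For instance, with a hypothetical set of scores the range could be worked out like this:

```python
# Hypothetical test scores (percentages).
scores = [45, 60, 72, 81, 92]

# Range: highest score minus lowest score.
score_range = max(scores) - min(scores)

print(score_range)  # 47
```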
If the standard deviation is quite small, this suggests that the values are very concentrated around the mean, and that everyone scored relatively similarly to one another.
A much more informative measure of dispersion is the standard deviation.
Measures of dispersion are descriptive statistics that define the spread of data around a central value (mean or median).
For example, if there are two conditions comparing the effects of revision vs. no revision on test scores, a psychologist could provide the percentage of participants who performed better having revised, to give a rough idea of the findings of the study.
The standard deviation score takes into consideration all of the values within the data set, and is a very precise measurement.
If there are more than two, then the dataset is considered multi-modal.
If the standard deviation is large, this suggests that the data are widely dispersed around the mean.
The standard deviation looks at how far the scores deviate from the mean.
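The idea of measuring how far scores deviate from the mean could be sketched as below. The text does not say which form of the formula is intended, so this sketch assumes the population form (dividing by the number of values); the sample form divides by one less than the number of values. The function name and both data sets are hypothetical.

```python
import math

def population_sd(values):
    """Standard deviation, assuming the population form (divide by n)."""
    mean = sum(values) / len(values)
    # Square each score's deviation from the mean so negatives do not cancel out.
    squared_deviations = [(x - mean) ** 2 for x in values]
    variance = sum(squared_deviations) / len(values)
    return math.sqrt(variance)


# Two hypothetical groups with the same mean (70) but different spreads.
clustered = [68, 69, 70, 71, 72]
dispersed = [40, 55, 70, 85, 100]

print(round(population_sd(clustered), 2))  # small: scores sit close to the mean
print(round(population_sd(dispersed), 2))  # large: scores are spread widely
```

Python's standard library offers `statistics.pstdev` (population) and `statistics.stdev` (sample) for the same purpose.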
A strength of using the mode is that it can be used on categorical data, whilst the mean and median cannot.
The bottom number in the formula should always be the total number in question (such as total number of participants, or total possible score), with the top number being the number that meets the specific criteria (such as participants who improved, or a particular score achieved).
Providing percentages in the summary of a dataset can help the reader get a feel for the data at a glance, without needing to read all of the results.
In order to calculate a percentage, the following calculation would be used: (Number of participants who improved ÷ Total number of participants) × 100.
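A short sketch of the percentage calculation is given below. The function name `percentage` and the participant counts are assumptions made for illustration.

```python
def percentage(number_meeting_criteria, total):
    """Top number divided by bottom number, multiplied by 100."""
    if total == 0:
        raise ValueError("total must be greater than zero")
    return number_meeting_criteria / total * 100


# Hypothetical study: 18 of 24 participants improved after revising.
print(percentage(18, 24))  # 75.0
```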
However, it is also possible that there is no mode for a data set, if all of the values are different.
If participants were asked to identify the way that they travelled to work each day, and gave answers such as 'car', 'bus', or 'walk', then a mode could still be identified for this set of data, as it is simply the response that was given most often.
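Finding the mode of categorical answers like these could be sketched with `collections.Counter` from Python's standard library. The list of responses is invented for illustration.

```python
from collections import Counter

# Hypothetical responses to "How do you travel to work each day?"
responses = ["car", "bus", "walk", "car", "bus", "car"]

# Count how often each answer was given, then take the most frequent one.
counts = Counter(responses)
mode_response, frequency = counts.most_common(1)[0]

print(mode_response, frequency)  # car 3
```

No mean or median can be computed here, because the answers have no numerical value or natural order.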
However, in the same way as the mean, the fact that it takes into account every value means that it can be easily distorted by an extreme value which could, in turn, mean that it misrepresents the data.
In order to calculate a percentage increase, firstly the difference, i.e. the increase, between the two numbers being compared must be calculated.
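The text only states the first step. The sketch below assumes the conventional remainder of the calculation, which is to divide the increase by the original number and multiply by 100. The function name and scores are hypothetical.

```python
def percentage_increase(original, new):
    """Percentage increase from `original` to `new` (conventional formula, assumed here)."""
    if original == 0:
        raise ValueError("original value must not be zero")
    # Step 1: the difference (increase) between the two numbers.
    increase = new - original
    # Step 2 (assumed): express the increase relative to the original value.
    return increase / original * 100


# Hypothetical example: a score rises from 40 to 50.
print(percentage_increase(40, 50))  # 25.0
```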
If there are two, then the data set is called bi-modal.
It is possible that a data set can have more than one mode.
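The different cases (no mode, one mode, bi-modal, multi-modal) could be told apart with a small helper like the one below. The function name `describe_modes`, the label "one mode", and the sample data are assumptions made for illustration.

```python
from collections import Counter

def describe_modes(values):
    """Return the list of modes and a label describing how many there are."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        # Every value is different, so no value appears most frequently.
        return [], "no mode"
    modes = sorted(v for v, c in counts.items() if c == highest)
    if len(modes) == 1:
        label = "one mode"
    elif len(modes) == 2:
        label = "bi-modal"
    else:
        label = "multi-modal"
    return modes, label


# Hypothetical data sets.
print(describe_modes([1, 2, 2, 3]))           # ([2], 'one mode')
print(describe_modes([1, 1, 2, 2, 3]))        # ([1, 2], 'bi-modal')
print(describe_modes([1, 1, 2, 2, 3, 3, 4]))  # ([1, 2, 3], 'multi-modal')
print(describe_modes([1, 2, 3]))              # ([], 'no mode')
```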
There are two measures of dispersion: range and standard deviation.
Descriptive statistics allow the researcher to view data as a whole because they summarise data numerically
Mode -> appears most frequently
+ easy to calculate
- can be misleading and not representative
+ used on categorical data (nominal) whereas mean and median can't