Inferential tests help psychologists decide whether their results are significant: are the findings likely due to a real effect, or just due to chance?
When choosing a statistical test, we first ask: what is the research method?
Are we testing a relationship between variables? (correlation)
Or testing a difference between conditions? (experiment)
When choosing a statistical test, we next ask: what is the experimental design?
Unrelated (independent measures): Different participants in each condition.
Related (repeated measures/matched pairs): Same participants (or matched participants) in both conditions.
When choosing a statistical test, we thirdly consider: what is the level of measurement?
Nominal: Categories (e.g., male/female, yes/no).
Ordinal: Ranked data (e.g., 1st, 2nd, 3rd) or rating scales (e.g., 1–10).
Interval/Ratio: Standardised scales with equal intervals (e.g., time, temperature).
Chi-squared: Used for nominal data in an independent measures design when looking for a difference, or when testing for an association.
Sign Test: Used for nominal data in a repeated measures design when testing for a difference.
Spearman’s Rho: Used for ordinal data when looking for a correlation or relationship.
Mann-Whitney U: Used for ordinal data in independent measures experiments when looking for a difference.
Wilcoxon Test: Used for ordinal data in repeated measures experiments when looking for a difference.
Pearson’s r: Used for interval or ratio data when looking for a correlation between two variables.
Unrelated T-test: Used for interval or ratio data in independent measures experiments when looking for a difference.
Related T-test: Used for interval or ratio data in repeated measures experiments when looking for a difference.
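The three questions and the list of tests above can be sketched as a simple lookup. This is only an illustration: the function name choose_test and the labels "difference", "correlation", "unrelated", "related" and "interval" (standing for interval or ratio data) are hypothetical, and the table assumes Chi-squared covers the unrelated-design case for nominal data.

```python
# Lookup table built from the list of tests above.
# Keys are (what we are looking for, level of measurement, design).
# Assumptions: "interval" stands for interval or ratio data, and
# Chi-squared is the nominal test for an unrelated design.
TESTS = {
    ("difference", "nominal", "unrelated"): "Chi-squared",
    ("difference", "nominal", "related"): "Sign Test",
    ("difference", "ordinal", "unrelated"): "Mann-Whitney U",
    ("difference", "ordinal", "related"): "Wilcoxon Test",
    ("difference", "interval", "unrelated"): "Unrelated T-test",
    ("difference", "interval", "related"): "Related T-test",
    ("correlation", "nominal", None): "Chi-squared",
    ("correlation", "ordinal", None): "Spearman's Rho",
    ("correlation", "interval", None): "Pearson's r",
}


def choose_test(looking_for, level, design=None):
    """Return the test for a research method, level of measurement and design."""
    if looking_for == "correlation":
        design = None  # the experimental design only matters for a difference
    key = (looking_for, level, design)
    if key not in TESTS:
        raise ValueError(f"No test listed for {key}")
    return TESTS[key]


print(choose_test("difference", "ordinal", "related"))
print(choose_test("correlation", "interval"))
```

Asking the questions in the same order each time (method, design, level of measurement) always leads to exactly one test in the table.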
Probability refers to how likely it is that a result happened by chance. We usually use a significance level of p ≤ 0.05.
This means there is a 5% or smaller probability of obtaining results like these by chance alone, that is, if the null hypothesis were true. Informally, we can be 95% confident that the results are due to the effect we’re testing.
If p is less than or equal to 0.05, the result is said to be statistically significant.
If p is greater than 0.05, the result is not statistically significant, and we keep the null hypothesis.
Significance tells us whether the findings of a study are strong enough to support the idea that there is a real effect or relationship, rather than the results being due to chance.
If a result is significant (p ≤ 0.05), it means the evidence is strong enough to reject the null hypothesis.
In any study, we begin with two hypotheses:
The null hypothesis (H0): predicts no relationship or no difference between conditions.
The alternative hypothesis (H1): predicts there will be a relationship or difference between conditions.
We run a statistical test to calculate the p-value, and based on this we:
Reject the null hypothesis if p ≤ 0.05 (significant result).
Retain the null hypothesis if p > 0.05 (not significant).
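The decision rule above is short enough to write out directly. The sketch below assumes the p-value has already been calculated by a statistical test; the function name decide and the example p-values are made up for illustration.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: compare the p-value with the significance level."""
    if not 0 <= p_value <= 1:
        raise ValueError("a p-value must lie between 0 and 1")
    if p_value <= alpha:
        return "significant: reject the null hypothesis"
    return "not significant: keep the null hypothesis"


# Hypothetical p-values, not taken from any real study.
print(decide(0.03))
print(decide(0.20))
print(decide(0.03, alpha=0.01))  # the same result under a stricter level
```

Note that the same p-value can be significant at the 5% level and not significant at the 1% level, so the significance level must be chosen before the test is run.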
To use a critical value table, you need three things:
The significance level (usually 0.05)
The type of hypothesis (one-tailed or two-tailed)
The number of participants (n) or degrees of freedom (df), depending on the test
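For most tests, the result is significant if the observed value from your statistical test is greater than or equal to the critical value. For the Sign Test, Mann-Whitney U and Wilcoxon Test the rule is reversed: the observed value must be less than or equal to the critical value.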
Two-tailed tests are more conservative because they split the 5% significance level between both tails of the distribution (2.5% each).
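Comparing an observed value with a critical value can be sketched as follows. The sketch assumes the usual rule that the Sign Test, Mann-Whitney U and Wilcoxon Test are significant when the observed value is less than or equal to the critical value, while the other tests need an observed value greater than or equal to it. The function name is hypothetical and the numbers are invented for illustration; real critical values must be read from the table for the chosen significance level, type of hypothesis and n or df.

```python
# Assumption: these three tests are significant when observed <= critical.
LOWER_IS_SIGNIFICANT = {"sign test", "mann-whitney u", "wilcoxon test"}


def is_significant(observed, critical, test):
    """Compare an observed value with the critical value for the named test."""
    if test.lower() in LOWER_IS_SIGNIFICANT:
        return observed <= critical
    # For the remaining tests only the size of the observed value is compared
    # here; checking the direction of a one-tailed hypothesis is not covered.
    return abs(observed) >= critical


# Invented values, not read from a real critical value table.
print(is_significant(3, 5, "Wilcoxon Test"))
print(is_significant(0.30, 0.45, "Pearson's r"))
```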
Type I error (false positive):
This happens when we reject the null hypothesis when it was actually true and accept the alternative hypothesis.
In other words, we conclude there is a significant difference when there isn’t one.
Type II error (false negative):
This happens when we accept the null hypothesis when it was actually false and reject the alternative hypothesis.
In other words, we miss an effect that was really there.
Type I error is more likely if the significance level is too lenient (e.g., p = 0.10), while Type II error is more likely if the significance level is too strict (e.g., p = 0.01).
Psychologists favour the 5% level of significance as it best balances the risk of making a Type I or Type II error.
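The trade-off between the two errors can be checked with a small calculation. The sketch below uses a two-tailed Sign Test because its probabilities come straight from the binomial distribution. The sample size (30 pairs) and the "real effect" (a 75% chance that any pair shows an improvement) are assumptions chosen only for illustration, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k pluses out of n pairs when each has chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def sign_test_p(k, n):
    """Two-tailed Sign Test p-value for k pluses out of n non-tied pairs."""
    s = min(k, n - k)  # S is the less frequent sign
    tail = sum(binom_pmf(i, n, 0.5) for i in range(s + 1))
    return min(1.0, 2 * tail)


def rejection_rate(n, alpha, true_p):
    """Chance of rejecting the null hypothesis when the chance of a plus is true_p."""
    return sum(
        binom_pmf(k, n, true_p)
        for k in range(n + 1)
        if sign_test_p(k, n) <= alpha
    )


n = 30  # assumed number of non-tied pairs
rates = {}
for alpha in (0.10, 0.05, 0.01):
    type_1 = rejection_rate(n, alpha, 0.5)        # null hypothesis is true
    type_2 = 1 - rejection_rate(n, alpha, 0.75)   # assumed real effect is missed
    rates[alpha] = (type_1, type_2)
    print(f"alpha={alpha:.2f}  Type I={type_1:.3f}  Type II={type_2:.3f}")
```

Under these assumptions, moving from the 10% level to the 1% level lowers the chance of a Type I error and raises the chance of a Type II error, which is the trade-off described above.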