Inferential Statistics

Cards (24)

  • Inferential tests help psychologists decide if their results are significant — meaning, are the findings likely due to a real effect or just due to chance?
  • When choosing a statistical test, we first ask: what is the research method?
    • Are we testing a relationship between variables? (correlation)
    • Or testing a difference between conditions? (experiment)
  • When choosing a statistical test, we next ask: what is the experimental design?
    • Unrelated (independent measures): Different participants in each condition.
    • Related (repeated measures/matched pairs): Same participants (or matched participants) in both conditions.
  • When choosing a statistical test, we finally ask: what is the level of measurement?
    • Nominal: Categories (e.g., male/female, yes/no).
    • Ordinal: Ranked data (e.g., 1st, 2nd, 3rd) or rating scales (e.g., 1–10).
    • Interval/Ratio: Standardised scales with equal intervals (e.g., time, temperature).
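A minimal Python sketch tying the three questions together as a lookup (the function name and argument labels are invented for illustration; the mapping matches the test cards that follow):

```python
def choose_test(purpose, design, level):
    """purpose: 'difference' or 'correlation'; design: 'unrelated' or
    'related' (ignored for correlations); level: 'nominal', 'ordinal',
    or 'interval'. Returns the test named on the cards below."""
    if purpose == 'correlation':
        return {'nominal': 'Chi-squared (association)',
                'ordinal': "Spearman's rho",
                'interval': "Pearson's r"}[level]
    return {('unrelated', 'nominal'): 'Chi-squared',
            ('related', 'nominal'): 'Sign test',
            ('unrelated', 'ordinal'): 'Mann-Whitney U',
            ('related', 'ordinal'): 'Wilcoxon',
            ('unrelated', 'interval'): 'Unrelated t-test',
            ('related', 'interval'): 'Related t-test'}[(design, level)]

print(choose_test('difference', 'related', 'ordinal'))  # Wilcoxon
```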
  • Chi-squared: Used when data is nominal, in an independent (unrelated) design, and you are looking for a difference or association.
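A minimal sketch using SciPy's chi2_contingency (the counts are invented for illustration):

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table of nominal counts: two conditions x yes/no
observed = [[30, 10],
            [18, 22]]
chi2, p, dof, expected = chi2_contingency(observed)
print(chi2, p)  # significant difference/association if p <= 0.05
```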
  • Sign Test: Used for nominal data in a repeated measures design when testing for a difference.
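SciPy has no dedicated sign test function, but an exact binomial test on the sign counts gives the same result; a sketch with invented counts:

```python
from scipy.stats import binomtest

# Hypothetical repeated-measures outcome: 10 participants improved (+),
# 2 declined (-); ties (no change) are dropped before counting
plus, minus = 10, 2
result = binomtest(plus, n=plus + minus, p=0.5)  # two-sided by default
print(result.pvalue)  # ~0.039, so significant at p <= 0.05
```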
  • Spearman’s Rho: Used for ordinal data when looking for a correlation or relationship.
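A sketch with invented ordinal ratings, using SciPy's spearmanr:

```python
from scipy.stats import spearmanr

# Hypothetical ratings (ordinal) from two measures per participant
anxiety = [3, 8, 5, 9, 2, 7, 6, 4]
errors  = [2, 9, 4, 8, 1, 6, 7, 3]
rho, p = spearmanr(anxiety, errors)
print(rho, p)  # strong positive correlation in this made-up data
```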
  • Mann-Whitney U: Used for ordinal data in independent measures experiments when looking for a difference.
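A sketch with invented ratings from two separate groups, using SciPy's mannwhitneyu:

```python
from scipy.stats import mannwhitneyu

# Hypothetical ordinal ratings from two independent groups
group_a = [4, 5, 6, 7, 5, 6]
group_b = [2, 3, 4, 3, 2, 4]
u, p = mannwhitneyu(group_a, group_b, alternative='two-sided')
print(u, p)  # significant difference if p <= 0.05
```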
  • Wilcoxon Test: Used for ordinal data in repeated measures experiments when looking for a difference.
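A sketch with invented before/after ratings from the same participants, using SciPy's wilcoxon:

```python
from scipy.stats import wilcoxon

# Hypothetical ordinal ratings from the same participants, tested twice
before = [4, 6, 3, 7, 5, 6, 4, 5]
after  = [6, 7, 5, 8, 6, 8, 5, 7]
w, p = wilcoxon(before, after)
print(w, p)  # significant difference if p <= 0.05
```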
  • Pearson’s r: Used for interval or ratio data when looking for a correlation between two variables.
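A sketch with invented interval data, using SciPy's pearsonr:

```python
from scipy.stats import pearsonr

# Hypothetical interval data: hours of sleep vs reaction time (ms)
sleep    = [5.0, 6.5, 7.0, 8.0, 4.5, 7.5, 6.0, 8.5]
reaction = [310, 290, 270, 250, 330, 265, 300, 245]
r, p = pearsonr(sleep, reaction)
print(r, p)  # strong negative correlation in this made-up data
```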
  • Unrelated T-test: Used for interval or ratio data in independent measures experiments when looking for a difference.
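A sketch with invented scores from two independent groups, using SciPy's ttest_ind:

```python
from scipy.stats import ttest_ind

# Hypothetical test scores from two independent groups
caffeine    = [14, 16, 15, 18, 17, 15, 16]
no_caffeine = [12, 13, 11, 14, 12, 13, 12]
t_stat, p = ttest_ind(caffeine, no_caffeine)
print(t_stat, p)  # significant difference if p <= 0.05
```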
  • Related T-test: Used for interval or ratio data in repeated measures experiments when looking for a difference.
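A sketch with invented paired scores, using SciPy's ttest_rel:

```python
from scipy.stats import ttest_rel

# Hypothetical interval scores from the same participants in both conditions
quiet = [22, 25, 20, 24, 23, 26, 21]
noisy = [18, 21, 17, 20, 19, 22, 18]
t_stat, p = ttest_rel(quiet, noisy)
print(t_stat, p)  # significant difference if p <= 0.05
```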
  • Probability refers to how likely it is that a result happened by chance. We usually use a significance level of p ≤ 0.05.
    • This means there is a 5% or smaller probability that the results occurred by chance. In other words, we can be at least 95% confident that the results are due to the effect we’re testing.
  • If p is less than or equal to 0.05, the result is said to be statistically significant.
  • If p is greater than 0.05, the result is not statistically significant, and we retain the null hypothesis.
  • Significance tells us whether the findings of a study are strong enough to support the idea that there is a real effect or relationship, rather than the results being due to chance.
    • If a result is significant (p ≤ 0.05), it means the evidence is strong enough to reject the null hypothesis.
  • In any study, we begin with two hypotheses:
    • The null hypothesis (H0): predicts no relationship or no difference between conditions.
    • The alternative hypothesis (H1): predicts there will be a relationship or difference between conditions.
  • We run a statistical test to calculate the p-value, and based on this we:
    • Reject the null hypothesis if p ≤ 0.05 (significant result).
    • Retain (fail to reject) the null hypothesis if p > 0.05 (not significant).
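A minimal sketch of this decision rule (data invented; ttest_rel is SciPy's paired test):

```python
from scipy.stats import ttest_rel

# Hypothetical scores from the same participants before and after training
before = [5, 7, 6, 8, 9, 6, 7, 8]
after  = [7, 8, 8, 9, 10, 7, 9, 9]
t_stat, p = ttest_rel(before, after)
if p <= 0.05:
    print(f"p = {p:.4f}: significant, reject H0 in favour of H1")
else:
    print(f"p = {p:.4f}: not significant, retain H0")
```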
  • To use a critical value table, you need three things:
    • The significance level (usually 0.05)
    • The type of hypothesis (one-tailed or two-tailed)
    • The number of participants (n) or degrees of freedom (df), depending on the test
  • If the observed (calculated) value from your statistical test passes the critical value in the required direction, the result is significant: for chi-squared, Spearman’s rho, Pearson’s r and the t-tests the observed value must be greater than or equal to the critical value; for the sign test, Mann-Whitney U and Wilcoxon it must be less than or equal to it.
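A sketch of the table-lookup logic for a t-test, using SciPy's t distribution in place of a printed critical value table (numbers invented):

```python
from scipy.stats import t

observed_t = 2.30            # hypothetical calculated value from a t-test
df = 18                      # e.g. (n1 - 1) + (n2 - 1) for an unrelated t-test
critical = t.ppf(0.975, df)  # two-tailed critical value at p <= 0.05
print(critical)                     # ~2.101
print(abs(observed_t) >= critical)  # True: significant (t-tests: observed >= critical)
```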
  • Two-tailed tests are more conservative because they split the 5% significance level between both tails of the distribution (2.5% each).
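A quick illustration of why two-tailed tests are more conservative: the two-tailed critical value lies further out, so it is harder to reach (df invented):

```python
from scipy.stats import t

df = 20
print(t.ppf(0.95, df))   # one-tailed critical value, ~1.725 (all 5% in one tail)
print(t.ppf(0.975, df))  # two-tailed critical value, ~2.086 (2.5% per tail)
```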
  • Type I error (false positive):
    • This happens when we reject a null hypothesis that was actually true and wrongly accept the alternative hypothesis.
    • In other words, we conclude there is a significant difference when there isn’t one.
  • Type II error (false negative):
    • This happens when we retain a null hypothesis that was actually false and wrongly reject the alternative hypothesis.
    • In other words, we miss an effect that was really there.
  • Type I error is more likely if the significance level is too lenient (e.g., p = 0.10), while Type II error is more likely if the significance level is too strict (e.g., p = 0.01).
    • Psychologists favour the 5% level of significance as it best balances the risk of making a Type I or Type II error.
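A small simulation sketch (sample sizes and effect size invented) showing the trade-off on the last card: a lenient level inflates Type I errors when H0 is true, while a strict level inflates Type II errors when a real effect exists:

```python
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(42)
trials, n = 5000, 20
alphas = [0.10, 0.05, 0.01]
type1 = {a: 0 for a in alphas}  # H0 true: a "significant" result is a false positive
type2 = {a: 0 for a in alphas}  # H0 false (real effect): a miss is a false negative
for _ in range(trials):
    null_a, null_b = rng.normal(0, 1, n), rng.normal(0, 1, n)
    eff_a,  eff_b  = rng.normal(0, 1, n), rng.normal(0.5, 1, n)
    _, p_null = ttest_ind(null_a, null_b)
    _, p_eff  = ttest_ind(eff_a, eff_b)
    for a in alphas:
        type1[a] += p_null <= a   # wrongly rejected a true H0
        type2[a] += p_eff > a     # missed a real effect
for a in alphas:
    print(f"alpha={a}: Type I rate ~ {type1[a]/trials:.3f}, "
          f"Type II rate ~ {type2[a]/trials:.3f}")
```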