Effect size, error and power week 7

Cards (43)

  • A non-parametric test (sometimes called a “distribution free test”) does not assume anything about the underlying distribution of the data
  • Typically, we use non-parametric tests when we do not have normally distributed data
  • Examples of Non-Parametric Tests

    • Sign
    • Mann-Whitney U
    • Wilcoxon signed-rank
    • Kruskal-Wallis
    • Friedman
  • Different experiments have different numbers of conditions
  • Sign, Mann-Whitney U, and Wilcoxon signed-rank tests can only examine two conditions
  • Kruskal-Wallis and Friedman tests can examine three or more conditions, but can only tell you that at least two conditions differ (not which ones)
  • Different designs require different tests as different assumptions can be made
  • Wilcoxon signed-rank, sign, and Friedman tests make use of the fact that a repeated measures design has been used, so they can be more powerful
  • A sign test is a binomial test that is used to determine if there is a median difference between paired or matched observations
  • A sign test is similar to a t-test, but instead of mean scores, median scores are used
  • A sign test is used as an alternative to the paired-samples t-test or Wilcoxon signed-rank test when the sample distribution is neither normal nor symmetrical
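To make the "binomial test" idea concrete, here is a minimal sketch of an exact two-sided sign test using only the Python standard library. The function name `sign_test` and the example scores are made up for illustration; dropping zero differences is one common convention, assumed here.

```python
from math import comb

def sign_test(before, after):
    """Exact two-sided sign test for paired scores (illustrative sketch)."""
    diffs = [a - b for b, a in zip(before, after)]
    n_pos = sum(d > 0 for d in diffs)
    n_neg = sum(d < 0 for d in diffs)
    n = n_pos + n_neg          # assumption: zero differences are dropped
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_pos, n_neg, min(1.0, 2 * tail)

# Hypothetical paired scores for eight participants
before = [10, 12, 9, 14, 11, 13, 8, 15]
after = [12, 15, 11, 13, 14, 16, 10, 18]
n_pos, n_neg, p = sign_test(before, after)
print(n_pos, n_neg, p)
```

Only the direction of each difference is used, not its size, which is why the test makes so few assumptions about the data.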
  • A Wilcoxon signed rank test is another popular non-parametric test of a difference for matched or paired data
  • A Wilcoxon signed rank test is more powerful than a sign test as it also takes into account the magnitude of the observed difference
  • A Wilcoxon signed rank test is used as an alternative to the paired t-test when parametric assumptions have not been met
  • Wilcoxon Signed Rank Test
    • Non-parametric test of a difference for matched or paired data
    • More powerful than a sign test as it also takes into account the magnitude of the observed difference
    • Used as an alternative to the paired t-test when parametric assumptions have not been met
  • Hypotheses
    • Assumptions
  • Ranking Data
    1. Smallest score gets the smallest rank
    2. Assign temporary rank to ties, then work out an average rank
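The two ranking steps above can be sketched as follows. This is a minimal illustration; the function name `average_ranks` and the example scores are assumptions, not part of the original cards.

```python
def average_ranks(scores):
    """Rank scores from smallest to largest, giving tied scores their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Find the run of tied scores starting at this sorted position
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Temporary ranks pos+1 .. end+1 are replaced by their mean
        mean_rank = (pos + 1 + end + 1) / 2
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks

# Hypothetical scores: the two 3s share ranks 1 and 2, the two 7s share ranks 4 and 5
ranks = average_ranks([7, 3, 7, 10, 3, 5])
print(ranks)  # [4.5, 1.5, 4.5, 6.0, 1.5, 3.0]
```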
  • The significance value reported in the test output is our p value for determining if there is a significant difference between our groups
  • The Standardized Test Statistic is our z value
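For the Wilcoxon signed-rank test, one common way to obtain a standardized test statistic is the normal approximation sketched below. This is an illustration under stated assumptions: no tie or continuity correction is applied, so statistical software may report slightly different values, and the function name `wilcoxon_z` and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def wilcoxon_z(before, after):
    """Signed-rank sums and a normal-approximation z (needs at least one non-zero difference)."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # zero differences dropped
    n = len(diffs)
    abs_sorted = sorted(abs(d) for d in diffs)

    def rank_of(value):
        # Average rank of a (possibly tied) absolute difference
        first = abs_sorted.index(value) + 1
        last = first + abs_sorted.count(value) - 1
        return (first + last) / 2

    w_pos = sum(rank_of(abs(d)) for d in diffs if d > 0)
    w_neg = sum(rank_of(abs(d)) for d in diffs if d < 0)
    mean_w = n * (n + 1) / 4                      # expected rank sum under the null
    sd_w = sqrt(n * (n + 1) * (2 * n + 1) / 24)   # its standard deviation (no tie correction)
    z = (w_pos - mean_w) / sd_w
    p = 2 * (1 - NormalDist().cdf(abs(z)))        # two-sided p value
    return w_pos, w_neg, z, p

# Hypothetical paired scores for ten participants
before = [10, 12, 9, 14, 11, 13, 8, 15, 12, 10]
after = [12, 15, 10, 13, 15, 18, 14, 22, 20, 19]
w_pos, w_neg, z, p = wilcoxon_z(before, after)
print(w_pos, w_neg, round(z, 2), round(p, 3))
```

Unlike the sign test, the ranks carry information about how large each difference is, which is where the extra power comes from.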
  • Mann-Whitney U Test

    • Non-parametric test of a difference for between-groups data
    • Similar to a t-test, but instead of mean scores we are using median scores
    • Used as an alternative to the independent t-test
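A minimal sketch of how the U statistic can be computed for two independent groups by comparing every score in one group with every score in the other. The function name `mann_whitney_u` and the scores are hypothetical; counting ties as one half is the usual convention, assumed here.

```python
def mann_whitney_u(group_a, group_b):
    """Return (U for group A, U for group B, the smaller of the two)."""
    # Count the (a, b) pairs in which a is larger; a tie counts as half a win
    u_a = sum(1.0 if a > b else 0.5 if a == b else 0.0
              for a in group_a for b in group_b)
    u_b = len(group_a) * len(group_b) - u_a
    return u_a, u_b, min(u_a, u_b)

# Hypothetical scores from two separate groups of five participants
group_a = [12, 15, 11, 18, 14]
group_b = [9, 10, 13, 8, 7]
u_a, u_b, u = mann_whitney_u(group_a, group_b)
print(u_a, u_b, u)  # 23.0 2.0 2.0
```

A small U means the two groups barely overlap; the p value is then taken from tables or from a normal approximation.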
  • Significance tests are used to help us decide between the null and alternative hypotheses
  • Coolican (2004: 313): “Significance tests are used to help us decide between the null and alternative hypotheses.”
  • We are trying to find out whether the IV caused the change in the DV, or whether the results are just a fluke
  • p value
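    The probability of obtaining results at least as extreme as those observed if the null hypothesis were true; informally, a small p value means the results are unlikely to be just a fluke
  • Conventional cut-offs for reporting significance
    • 0.05 (a result this extreme would occur less than 5% of the time if the null hypothesis were true)
    • 0.01 (less than 1% of the time)
    • 0.001 (less than 0.1% of the time)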
  • Example of a Type-1 Error: The effects of metal eating on memory
  • Type-2 Error: Failing to reject (retaining) the null hypothesis when the alternative is true
  • The probability of a type-2 error is referred to as beta (β)
  • Example of a Type-2 Error: A type-2 error resulting from too few observations
  • We can guard against type-2 errors by working with large samples and taking large numbers of observations from each person
  • Statistical Power: Increase your sample size
  • Having lots of trials in an experimental task or lots of items on a questionnaire increases the number of responses from each person
  • The probability that our test will identify an effect that really exists is known as Statistical Power (1 − β)
  • Importance of Statistical Power
    • Increase sample size
    • Less susceptibility to biasing influence of extreme scores
    • Results in a 'narrower' distribution of scores
    • Reduce variance in data
    • Reduce number of extreme scores
    • Increase significance threshold (alpha)
    • Increase effect size
  • Increasing Statistical Power
    1. Increase sample size
    2. Increase significance threshold
    3. Increase effect size
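The three levers above can be illustrated with the textbook power formula for a one-sided, one-sample z test. This is a sketch under simplifying assumptions (known standard deviation, effect size expressed in standard deviation units); the function name `z_test_power` and the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def z_test_power(effect_size, n, alpha=0.05):
    """Approximate power of a one-sided, one-sample z test."""
    std_normal = NormalDist()
    critical_z = std_normal.inv_cdf(1 - alpha)   # cut-off for significance
    # Power = probability that the test statistic exceeds the cut-off
    # when the true standardized effect is effect_size
    return 1 - std_normal.cdf(critical_z - effect_size * sqrt(n))

baseline = z_test_power(effect_size=0.3, n=20, alpha=0.05)
larger_sample = z_test_power(effect_size=0.3, n=80, alpha=0.05)
higher_alpha = z_test_power(effect_size=0.3, n=20, alpha=0.10)
larger_effect = z_test_power(effect_size=0.6, n=20, alpha=0.05)

for label, power in [("baseline", baseline), ("larger sample", larger_sample),
                     ("higher alpha", higher_alpha), ("larger effect", larger_effect)]:
    print(f"{label}: {power:.2f}")
```

Each change raises power relative to the baseline, but only raising alpha does so at the cost of more type-1 errors.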
  • Increasing Statistical Power reduces the risk of a type-2 error; when power is raised by increasing the significance threshold (alpha), the risk of a type-1 error increases as well
  • Effect Size
    • Do we care about a 1% mean improvement in maths ability by forcing fish oil on children?
  • Effect sizes
    • Give additional information about the magnitude of an effect
    • Provide information about type-2 errors
    • Effect sizes are necessary to fully understand the value of results
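As an illustration of reporting magnitude alongside significance, here is a minimal sketch of one widely used effect size, Cohen's d (the standardized difference between two group means). The cards do not name a specific measure, so the choice of Cohen's d, the function name `cohens_d`, and the scores are all assumptions.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference between group means in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each group's sample variance by its degrees of freedom
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical maths scores for a treatment group and a control group
treatment = [52, 55, 49, 58, 51]
control = [50, 53, 47, 56, 49]
d = cohens_d(treatment, control)
print(round(d, 2))
```

A difference can be statistically significant with a large enough sample and still correspond to a small d, which is the point of the fish oil question above.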