Issues of Significance

Cards (61)

  • So far, you have learned about descriptive statistics, how to calculate effect sizes, confidence intervals, and significance levels
  • In this chapter, you are going to learn about the relationship between power, effect size and probability levels, the factors influencing power, and issues surrounding the use of significance levels
  • Agenda
    • 7.1 Pitfalls of NHST
    • 7.2 Criterion Significance Levels
    • 7.3 Effect Sizes
      • 7.3.1 Cohen's d
      • 7.3.2 Pearson's r
      • 7.3.3 The odds ratio
      • 7.3.4 Effect Sizes compared to NHST
    • 7.4 Meta-Analysis
    • 7.5 Bayesian Approaches
    • 7.6 Power
    • 7.7 Factors Influencing Power
    • 7.8 GPower: Calculating Power
    • 7.9 Confidence Intervals
  • Pitfalls of NHST: it offers a rule-based framework for deciding whether to believe a hypothesis, and seems to provide an easy way to disentangle the 'correct' conclusion from the 'incorrect' one
  • Meehl: '"The almost universal reliance on merely refuting the null hypothesis is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology."'
  • Misconception #1: A significant result means that the effect is important
  • Misconception #2: A nonsignificant result means that the null hypothesis is true
  • Misconception #3: A significant result means that the null hypothesis is false
  • NHST encourages all-or-nothing thinking (e.g. if p < 0.05 then an effect is significant, but if p ≥ 0.05 it is not)
  • How different is a p-value of 0.051 from one of 0.75? Should 0.049 and 0.00001 be thought of as equally significant by reporting both as p < 0.05?
  • Statistical significance should not be equated with importance
  • Statements reflecting different views of the antiSTATic data
    • The evidence is equivocal; we need more research.
    • All the mean differences show a positive effect of antiSTATic; therefore, we have consistent evidence that antiSTATic works.
    • Four of the studies show a significant result (p < 0.05), but the other six do not. Therefore, the studies are inconclusive: some suggest that antiSTATic is better than placebo, but others suggest there's no difference. The fact that more than half of the studies showed no significant effect means that antiSTATic is not (on balance) more successful in reducing anxiety than the control
  • Looking at the confidence intervals rather than focusing on significance allows us to see the consistency in the data and not a bunch of apparently conflicting results
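  • A minimal sketch of how such confidence intervals could be computed from summary statistics (the numbers and the `ci_mean_diff` helper are illustrative assumptions, not the antiSTATic data):

    ```python
    # Welch-style 95% CI for a difference between two independent means.
    import math
    from scipy import stats

    def ci_mean_diff(m1, m2, s1, s2, n1, n2, conf=0.95):
        diff = m1 - m2
        se = math.sqrt(s1**2 / n1 + s2**2 / n2)
        # Welch-Satterthwaite approximation to the degrees of freedom
        df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
            (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
        )
        t_crit = stats.t.ppf(1 - (1 - conf) / 2, df)
        return diff - t_crit * se, diff + t_crit * se

    # Two hypothetical studies with the same observed effect but different n:
    print(ci_mean_diff(8.0, 10.0, 4.0, 4.0, 20, 20))    # wide CI, crosses 0
    print(ci_mean_diff(8.0, 10.0, 4.0, 4.0, 200, 200))  # narrow CI, excludes 0
    ```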
  • The conclusions from NHST depend on what the researcher intended to do before collecting data
  • Significant findings are about seven times more likely to be published than non-significant ones, leading to publication bias
  • Researcher degrees of freedom - a scientist has many decisions to make when designing and analyzing a study, and these could be misused (e.g. excluding cases) to make a result significant
  • p-hacking
    Practices that lead to the selective reporting of significant p-values, most commonly trying multiple analyses and reporting only the one that yields significant results
  • HARKing (Hypothesizing After the Results are Known)
    Presenting a hypothesis that was made after data collection as though it were made before data collection
  • Ways to overcome the pitfalls of NHST
    • Effect Sizes
    • Meta-Analysis
    • Bayesian Estimation
    • Registration
    • Sense
  • Six Principles for Scientists Using NHST (Wasserstein & Lazar, 2016, for the American Statistical Association)
    • P-values indicate how incompatible the data are with a statistical model (e.g. the null hypothesis)
    • A p-value is not the probability that the hypothesis is true
    • Resist all-or-nothing thinking: conclusions should not depend only on whether p passes a threshold
    • Don't p-hack: proper inference requires full reporting and transparency
    • Don't confuse statistical significance with practical importance
    • By itself, a p-value is not a good measure of evidence
  • Pre-registration - the practice of making all aspects of your research process publicly available before data collection begins
  • Registered report - a submission to an academic journal that outlines an intended research protocol
  • Transparency and Openness Promotion (TOP) guidelines
    • Citation standards
    • Data transparency
    • Analytic methods (code) transparency
    • Research materials transparency
    • Design and analysis transparency
    • Pre-registration of study protocols
    • Pre-registration of analysis plans
    • Replication
  • Effect size
    An objective and (usually) standardized measure of the magnitude of an observed effect
  • Cohen's d
    A standardized effect size measure that expresses the difference between two means in terms of the pooled standard deviation
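  • In symbols, d = (M₁ − M₂) / s_pooled. A minimal sketch of the calculation (illustrative helper and made-up numbers, not the book's code):

    ```python
    # Cohen's d using the pooled standard deviation of the two groups.
    import numpy as np

    def cohens_d(x, y):
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * np.var(x, ddof=1)
                      + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
        return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

    print(cohens_d([5, 6, 7, 8], [7, 8, 9, 10]))  # -1.55: a large negative effect
    ```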
  • Pearson's r
    A standardized effect size measure that expresses the strength of the linear relationship between two variables
  • Odds ratio
    An effect size measure for categorical variables that expresses the relative odds of an outcome occurring in one group compared to another
  • Effect sizes are not affected by sample size, but sample size does affect how closely the sample effect size matches that of the population (the precision)
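  • A simulated sketch of this point (assumed population effect d = 0.5; all numbers are made up): the average d is stable across sample sizes, but its spread shrinks as n grows.

    ```python
    # Estimate d repeatedly at several sample sizes from the same populations.
    import numpy as np

    rng = np.random.default_rng(1)

    def cohens_d(x, y):
        sp = np.sqrt(((len(x) - 1) * x.var(ddof=1) + (len(y) - 1) * y.var(ddof=1))
                     / (len(x) + len(y) - 2))
        return (x.mean() - y.mean()) / sp

    for n in (20, 200, 2000):
        ds = [cohens_d(rng.normal(0.5, 1, n), rng.normal(0, 1, n))
              for _ in range(1000)]
        print(n, f"mean d = {np.mean(ds):.2f}", f"SD of d = {np.std(ds):.3f}")
    # mean d stays near 0.5 at every n; the SD of d (imprecision) shrinks with n
    ```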
  • Interpretations based on effect sizes are more informative than those based solely on p-values
  • Two virtually identical means can be deemed significantly different based on a p-value (e.g. when the samples are very large)
  • Two experiments with identical means and standard deviations yield identical conclusions when using an effect size to interpret them (both studies had d = −0.667)
  • Two virtually identical means are deemed to be not very different at all based on an effect size (d = −0.003, which is tiny)
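  • The contrast in the first and third points can be reproduced in simulation (made-up parameters): with a huge sample, a trivial mean difference comes out 'significant' by p-value while d stays near zero.

    ```python
    # Huge n, virtually identical means: small p, tiny d.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x = rng.normal(10.00, 2, 1_000_000)
    y = rng.normal(10.01, 2, 1_000_000)

    t, p = stats.ttest_ind(x, y)
    sp = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)  # pooled SD (equal n)
    d = (x.mean() - y.mean()) / sp

    print(f"p = {p:.4f}, d = {d:.3f}")  # p is typically < 0.05; d is ~ -0.005
    ```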
  • Pearson's r as an effect size measure
    • r = 0.10 (small effect): the effect explains 1% of the total variance
    • r = 0.30 (medium effect): the effect explains 9% of the total variance
    • r = 0.50 (large effect): the effect explains 25% of the total variance
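  • Those percentages are just r² (0.10² = 0.01, 0.30² = 0.09, 0.50² = 0.25). A small sketch computing r and the variance explained (simulated data, not from the chapter):

    ```python
    # Pearson's r via the correlation matrix, and r**2 as variance explained.
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=500)
    y = 0.3 * x + rng.normal(size=500)  # constructed to correlate modestly with x

    r = np.corrcoef(x, y)[0, 1]
    print(f"r = {r:.2f}, variance explained = {r**2:.1%}")
    ```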
  • Odds Ratio
    Effect size for counts (frequencies) of categorical variables (e.g. yes/no outcomes)
  • The odds of a 'yes' response were 0.4 times as large for a singer as for someone who started a conversation
  • The odds of a 'yes' response were 2.5 times as large for a talker as for someone who sang
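  • A sketch of the calculation from a 2×2 table of counts (the counts below are hypothetical, chosen only to reproduce an odds ratio of 0.4; they are not the study's data):

    ```python
    # Odds ratio: odds of 'yes' in group A divided by odds of 'yes' in group B.
    def odds_ratio(yes_a, no_a, yes_b, no_b):
        return (yes_a / no_a) / (yes_b / no_b)

    # Hypothetical: singers got 10 yes / 40 no (odds 0.25);
    # talkers got 25 yes / 40 no (odds 0.625)
    print(odds_ratio(10, 40, 25, 40))  # 0.4 -- and the inverse, 1/0.4 = 2.5
    ```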
  • Effect sizes overcome many of the problems associated with NHST
  • Effect sizes are less affected than p-values by things like early or late termination of data collection, or sampling over a time period rather than until a set sample size is reached
  • There are still some researcher degrees of freedom (unrelated to sample size) that could be used to maximize (or minimize) effect sizes, but there is less incentive to do so because effect sizes are not tied to a decision rule in which effects on either side of a threshold have qualitatively opposite interpretations