Issues of Significance

Cards (61)

  • So far, you have learned about descriptive statistics, how to calculate effect sizes, confidence intervals, and significance levels
  • In this chapter, you are going to learn about the relationship between power, effect size and probability levels, the factors influencing power, and issues surrounding the use of significance levels
  • Agenda
    • 7.1 Pitfalls of NHST
    • 7.2 Criterion Significance Levels
    • 7.3 Effect Sizes
      • 7.3.1 Cohen's d
      • 7.3.2 Pearson's r
      • 7.3.3 The odds ratio
      • 7.3.4 Effect Sizes compared to NHST
    • 7.4 Meta-Analysis
    • 7.5 Bayesian Approaches
    • 7.6 Power
    • 7.7 Factors Influencing Power
    • 7.8 GPower: Calculating Power
    • 7.9 Confidence Intervals
  • Pitfalls of NHST: it offers a rule-based framework for deciding whether to believe a hypothesis, and seems to provide an easy way to disentangle the 'correct' conclusion from the 'incorrect' one
  • Meehl: '"The almost universal reliance on merely refuting the null hypothesis is a terrible mistake, is basically unsound, poor scientific strategy, and one of the worst things that ever happened in the history of psychology."'
  • Misconception #1: A significant result means that the effect is important
  • Misconception #2: A nonsignificant result means that the null hypothesis is true
  • Misconception #3: A significant result means that the null hypothesis is false
  • NHST encourages all-or-nothing thinking (e.g. if p < 0.05 then an effect is significant, but if p ≥ 0.05 it is not)
  • How different is a p-value of 0.051 from one of 0.75? Should 0.049 and 0.00001 be thought of as equally significant by reporting both as p < 0.05?
  • Statistical significance should not be equated with importance
  • Statements reflecting different views of the antiSTATic data
    • The evidence is equivocal; we need more research.
    • All the mean differences show a positive effect of antiSTATic; therefore, we have consistent evidence that antiSTATic works.
    • Four of the studies show a significant result (p < 0.05), but the other six do not. Therefore, the studies are inconclusive: some suggest that antiSTATic is better than placebo, but others suggest there's no difference. The fact that more than half of the studies showed no significant effect means that antiSTATic is not (on balance) more successful in reducing anxiety than the control
  • Looking at the confidence intervals rather than focusing on significance allows us to see the consistency in the data and not a bunch of apparently conflicting results
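  • A minimal sketch of how such confidence intervals could be computed from summary statistics (the numbers and the `ci_mean_diff` helper are illustrative assumptions, not the antiSTATic data):

    ```python
    # Welch-style 95% CI for a difference between two independent means.
    import math
    from scipy import stats

    def ci_mean_diff(m1, m2, s1, s2, n1, n2, conf=0.95):
        diff = m1 - m2
        se = math.sqrt(s1**2 / n1 + s2**2 / n2)
        # Welch-Satterthwaite approximation to the degrees of freedom
        df = (s1**2 / n1 + s2**2 / n2) ** 2 / (
            (s1**2 / n1) ** 2 / (n1 - 1) + (s2**2 / n2) ** 2 / (n2 - 1)
        )
        t_crit = stats.t.ppf(1 - (1 - conf) / 2, df)
        return diff - t_crit * se, diff + t_crit * se

    # Two hypothetical studies with the same observed effect but different n:
    print(ci_mean_diff(8.0, 10.0, 4.0, 4.0, 20, 20))    # wide CI, crosses 0
    print(ci_mean_diff(8.0, 10.0, 4.0, 4.0, 200, 200))  # narrow CI, excludes 0
    ```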
  • The conclusions from NHST depend on what the researcher intended to do before collecting data
  • Significant findings are about seven times more likely to be published than non-significant ones, leading to publication bias
  • Researcher degrees of freedom - a scientist has many decisions to make when designing and analyzing a study, and these could be misused (e.g. excluding cases) to make a result significant
  • p-hacking
    Practices that lead to the selective reporting of significant p-values, most commonly trying multiple analyses and reporting only the one that yields significant results
  • HARKing (Hypothesizing After the Results are Known)
    Presenting a hypothesis that was made after data collection as though it were made before data collection
  • Ways to overcome the pitfalls of NHST
    • Effect Sizes
    • Meta-Analysis
    • Bayesian Estimation
    • Registration
    • Sense
  • Six Principles for Scientists Using NHST (Wasserstein & Lazar, 2016, for the American Statistical Association)
    • P-values indicate how incompatible the data are with a statistical model (e.g. the null hypothesis)
    • A p-value is not the probability that the hypothesis is true
    • Resist all-or-nothing thinking: conclusions should not depend only on whether p passes a threshold
    • Don't p-hack: proper inference requires full reporting and transparency
    • Don't confuse statistical significance with practical importance
    • By itself, a p-value is not a good measure of evidence
  • Pre-registration - the practice of making all aspects of your research process publicly available before data collection begins
  • Registered report - a submission to an academic journal that outlines an intended research protocol
  • Transparency and Openness Promotion (TOP) guidelines
    • Citation standards
    • Data transparency
    • Analytic methods (code) transparency
    • Research materials transparency
    • Design and analysis transparency
    • Pre-registration of study protocols
    • Pre-registration of analysis plans
    • Replication
  • Effect size
    An objective and (usually) standardized measure of the magnitude of an observed effect
  • Cohen's d
    A standardized effect size measure that expresses the difference between two means in terms of the pooled standard deviation
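  • In symbols, d = (M₁ − M₂) / s_pooled. A minimal sketch of the calculation (illustrative helper and made-up numbers, not the book's code):

    ```python
    # Cohen's d using the pooled standard deviation of the two groups.
    import numpy as np

    def cohens_d(x, y):
        nx, ny = len(x), len(y)
        pooled_var = ((nx - 1) * np.var(x, ddof=1)
                      + (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
        return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

    print(cohens_d([5, 6, 7, 8], [7, 8, 9, 10]))  # -1.55: a large negative effect
    ```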
  • Pearson's r
    A standardized effect size measure that expresses the strength of the linear relationship between two variables
  • Odds ratio
    An effect size measure for categorical variables that expresses the relative odds of an outcome occurring in one group compared to another
  • Effect sizes are not affected by sample size, but sample size does affect how closely the sample effect size matches that of the population (the precision)
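  • A simulated sketch of this point (assumed population effect d = 0.5; all numbers are made up): the average d is stable across sample sizes, but its spread shrinks as n grows.

    ```python
    # Estimate d repeatedly at several sample sizes from the same populations.
    import numpy as np

    rng = np.random.default_rng(1)

    def cohens_d(x, y):
        sp = np.sqrt(((len(x) - 1) * x.var(ddof=1) + (len(y) - 1) * y.var(ddof=1))
                     / (len(x) + len(y) - 2))
        return (x.mean() - y.mean()) / sp

    for n in (20, 200, 2000):
        ds = [cohens_d(rng.normal(0.5, 1, n), rng.normal(0, 1, n))
              for _ in range(1000)]
        print(n, f"mean d = {np.mean(ds):.2f}", f"SD of d = {np.std(ds):.3f}")
    # mean d stays near 0.5 at every n; the SD of d (imprecision) shrinks with n
    ```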
  • Interpretations based on effect sizes are more informative than those based solely on p-values
  • Two virtually identical means can be deemed significantly different based on a p-value (e.g. when the samples are very large)
  • Two experiments with identical means and standard deviations yield identical conclusions when using an effect size to interpret them (both studies had d = −0.667)
  • Two virtually identical means are deemed to be not very different at all based on an effect size (d = −0.003, which is tiny)
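  • The contrast in the first and third points can be reproduced in simulation (made-up parameters): with a huge sample, a trivial mean difference comes out 'significant' by p-value while d stays near zero.

    ```python
    # Huge n, virtually identical means: small p, tiny d.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    x = rng.normal(10.00, 2, 1_000_000)
    y = rng.normal(10.01, 2, 1_000_000)

    t, p = stats.ttest_ind(x, y)
    sp = np.sqrt((x.var(ddof=1) + y.var(ddof=1)) / 2)  # pooled SD (equal n)
    d = (x.mean() - y.mean()) / sp

    print(f"p = {p:.4f}, d = {d:.3f}")  # p is typically < 0.05; d is ~ -0.005
    ```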
  • Pearson's r as an effect size measure
    • r = 0.10 (small effect): the effect explains 1% of the total variance
    • r = 0.30 (medium effect): the effect explains 9% of the total variance
    • r = 0.50 (large effect): the effect explains 25% of the total variance
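  • Those percentages are just r² (0.10² = 0.01, 0.30² = 0.09, 0.50² = 0.25). A small sketch computing r and the variance explained (simulated data, not from the chapter):

    ```python
    # Pearson's r via the correlation matrix, and r**2 as variance explained.
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.normal(size=500)
    y = 0.3 * x + rng.normal(size=500)  # constructed to correlate modestly with x

    r = np.corrcoef(x, y)[0, 1]
    print(f"r = {r:.2f}, variance explained = {r**2:.1%}")
    ```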
  • Odds Ratio
    Effect size for counts (frequencies) of categorical variables (e.g. yes/no outcomes)
  • The odds of a 'yes' response were 0.4 times as large for a singer as for someone who started a conversation
  • The odds of a 'yes' response were 2.5 times as large for a talker as for someone who sang
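  • A sketch of the calculation from a 2×2 table of counts (the counts below are hypothetical, chosen only to reproduce an odds ratio of 0.4; they are not the study's data):

    ```python
    # Odds ratio: odds of 'yes' in group A divided by odds of 'yes' in group B.
    def odds_ratio(yes_a, no_a, yes_b, no_b):
        return (yes_a / no_a) / (yes_b / no_b)

    # Hypothetical: singers got 10 yes / 40 no (odds 0.25);
    # talkers got 25 yes / 40 no (odds 0.625)
    print(odds_ratio(10, 40, 25, 40))  # 0.4 -- and the inverse, 1/0.4 = 2.5
    ```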
  • Effect sizes overcome many of the problems associated with NHST
  • Effect sizes are less affected than p-values by things like early or late termination of data collection, or sampling over a time period rather than until a set sample size is reached
  • There are still some researcher degrees of freedom (unrelated to sample size) that could be used to maximize (or minimize) effect sizes, but there is less incentive to do so because effect sizes are not tied to a decision rule in which effects on either side of a threshold have qualitatively opposite interpretations