p-values are affected by sample size: large samples can make small, unimportant effects statistically significant, while small samples can leave large, important effects non-significant
Non-significant results only tell us that an effect is not big enough to be found given the sample size
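To make the sample-size point concrete, here is a minimal sketch. It assumes two equal-sized groups and uses a normal approximation to the test statistic; the function name two_sided_p and the example numbers are illustrative, not from the notes.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(d, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference d
    between two equal-sized groups (normal approximation, an assumption)."""
    z = d * sqrt(n_per_group / 2)  # the test statistic grows with sqrt(n)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Tiny effect, huge sample: falls below .05
print(two_sided_p(d=0.05, n_per_group=10_000))
# Large effect, tiny sample: stays above .05
print(two_sided_p(d=0.8, n_per_group=5))
```

The effect size is held fixed in each call; only the sample size moves the p-value across the .05 line.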
Significant test statistics are based on probabilistic reasoning
Criterion significance levels
The probability level (e.g., .05) that we are willing to accept as the likelihood that the results were due to sampling error
All-or-nothing thinking – p<.05 is merely a rule of thumb and not a threshold to decide a 1-0 situation
We counter all-or-nothing thinking by looking at confidence intervals
The intentions of the scientist
The conclusions drawn from NHST depend on what the scientist intended to do before collecting data
Success
Defined by a scientist's results being significant
Researcher degrees of freedom
Scientists may selectively report their results to focus on significant findings and exclude non-significant ones
p-hacking
Practices that lead to the selective reporting of significant p-values, such as running several analyses and reporting only the one that yields a significant result
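One way to see why reporting only the significant analysis is a problem: if several tests are run when no effect exists, the chance that at least one comes out significant grows quickly. The sketch below assumes the tests are independent; familywise_error is a hypothetical helper name.

```python
def familywise_error(alpha, k):
    """Probability of at least one false positive across k independent
    tests, each run at significance level alpha, when no effect exists."""
    return 1 - (1 - alpha) ** k


for k in (1, 5, 20):
    print(k, round(familywise_error(0.05, k), 3))
```

With 20 independent tests at .05, the chance of at least one spurious significant result is well over one half.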
HARKing
Presenting a hypothesis that was made after data collection as though it were made before data collection
EMBERS (Ways to counter the pitfalls)
Effect sizes
Meta-analysis
Bayesian estimation
Registration
Sense
p-values can indicate how incompatible the data are with a specified statistical model (e.g., H0); the smaller the p-value, the more strongly the data contradict that model
A p-value suggesting compatibility with a hypothesis does not mean the hypothesis is the sole true explanation
Open science
A movement to make the process, data, and outcomes of research freely available to everyone
Pre-registration
The practice of making all aspects of your research process publicly available before data collection
Registered report
A submission to an academic journal that sets out the intended research protocol
Peer Reviewers' Openness Initiative
An initiative in which scientists commit to the principles of open science when acting as expert reviewers
Effect size
An objective, usually standardized measure of the magnitude of an observed effect; less affected by sample size than a p-value and not attached to a decision rule; sample size affects how closely the sample effect size matches the population effect size
Standardized effect sizes
Allow effect sizes to be compared across different studies, even when the studies used different measures
Effect size guidelines
Cohen's d - .2 (small); .5 (medium); .8 (large)
Pearson's r - .10 (small); .30 (medium); .50 (large)
Odds ratio – effect size for counts (frequencies) of categorical variables, e.g., in a 2x2 contingency table
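The effect sizes above can be sketched in a few lines. This assumes the pooled-standard-deviation version of Cohen's d and a 2x2 table with cells a, b (row 1) and c, d (row 2); the function names and data are made up for illustration.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group1, group2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / sqrt(pooled_var)


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: odds of the outcome in row 1 (a/b)
    divided by the odds of the outcome in row 2 (c/d)."""
    return (a / b) / (c / d)


print(cohens_d([2, 4, 6], [1, 3, 5]))   # a medium effect by the guidelines above
print(odds_ratio(20, 10, 10, 20))
```

An odds ratio of 1 means the odds are the same in both rows; values further from 1 indicate a larger effect.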
Meta-analysis
Combines the effect sizes from multiple studies of the same question to get a more precise estimate of the effect in the population
Weighted average in meta-analysis
Each effect size is weighted by its precision
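A minimal sketch of precision weighting, assuming a fixed-effect model in which each study's weight is the inverse of its sampling variance. The function name and the two example studies are invented for illustration.

```python
from math import sqrt


def weighted_mean_effect(effects, variances):
    """Inverse-variance weighted average of effect sizes (fixed-effect model).
    Returns the pooled effect and its standard error."""
    weights = [1 / v for v in variances]  # precision = 1 / variance
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    se = sqrt(1 / sum(weights))
    return pooled, se


# Hypothetical studies: a precise one (d = 0.2) and a noisy one (d = 0.5)
pooled, se = weighted_mean_effect([0.2, 0.5], [0.01, 0.04])
print(pooled, se)
```

The pooled estimate sits closer to the more precise study than a simple average of the two effects would.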
Bayesian statistics
Using the data you collect to update your beliefs
Bayes' Theorem
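The conditional probability of a hypothesis given the data equals the inverse conditional probability (the data given the hypothesis) multiplied by the prior probability of the hypothesis, divided by the probability of the data; used to update a prior distribution, or a prior belief in a hypothesis, based on the observed data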
Prior probability
Belief in a hypothesis before considering the data
Marginal likelihood/evidence
Probability of the observed data
Likelihood
The probability that the observed data could be produced given the hypothesis/model
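The prior, likelihood, and marginal likelihood combine as posterior = likelihood x prior / marginal likelihood. The sketch below assumes only two competing hypotheses (H and not-H); the function name and the probabilities are illustrative.

```python
def posterior(prior_h, likelihood_h, likelihood_not_h):
    """Bayes' theorem for a hypothesis H versus its complement.

    prior_h          : belief in H before seeing the data
    likelihood_h     : probability of the observed data if H is true
    likelihood_not_h : probability of the observed data if H is false
    """
    # Marginal likelihood (evidence): probability of the data overall
    evidence = likelihood_h * prior_h + likelihood_not_h * (1 - prior_h)
    return likelihood_h * prior_h / evidence


# Start undecided (prior = .5); the data are four times likelier under H
print(posterior(prior_h=0.5, likelihood_h=0.8, likelihood_not_h=0.2))
```

Feeding the result back in as the next prior shows how belief is updated as more data arrive.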
A posterior distribution can be used to obtain a point estimate
Power
The ability to detect a significant effect, when it exists
The probability that a test correctly rejects a false H0
Power ranges from 0 to 1; a power of 0 means the test can never detect a difference or relationship between variables/means
0.1-0.3 are low power values; 0.8-0.9 are high power values
Factors affecting power
Size of the effect expected to be found
Criterion significance level (value of the significance level at which you are prepared to accept that results are probably not due to sampling error)
Variation in experimental scores among identically treated individuals within the same group who experienced the same experimental conditions
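How these factors move power can be sketched with a normal approximation for a two-group comparison with equal group sizes. This is an assumption-laden simplification (it ignores the negligible opposite tail); approx_power is a hypothetical name. Within-group variation enters through d, because d is the mean difference divided by the standard deviation.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group test for a true
    standardized effect d (normal approximation, equal group sizes)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # criterion significance level
    return NormalDist().cdf(d * sqrt(n_per_group / 2) - z_crit)


print(approx_power(d=0.5, n_per_group=64))              # roughly .80
print(approx_power(d=0.8, n_per_group=64))              # bigger effect, more power
print(approx_power(d=0.5, n_per_group=64, alpha=0.01))  # stricter criterion, less power
```

More within-group variation shrinks d for the same raw mean difference, which lowers power in the same way a smaller effect does.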
If a study does not have enough power, it may fail to detect an effect that really exists
If a study has an enormous number of participants and the estimated effect is still very small, the true effect may be negligible or absent
Larger samples give a test more power and also make the confidence interval narrower
Confidence interval
Statistically determined interval estimate of a population parameter
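A minimal sketch of an interval estimate for a population mean. It assumes a large-sample normal approximation (a t-based interval would be slightly wider for small samples); mean_ci and the scores are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def mean_ci(sample, level=0.95):
    """Approximate confidence interval for a population mean:
    sample mean +/- z * standard error (normal approximation)."""
    m = mean(sample)
    se = stdev(sample) / sqrt(len(sample))
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se


scores = [4, 5, 6, 5, 4, 6, 5, 5]
print(mean_ci(scores))
print(mean_ci(scores * 4))  # same spread, four times the data: narrower interval
```

The second interval is narrower because the standard error shrinks as the sample size grows.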
Independent samples t-test
Compare mean scores of two different groups of people
Paired samples t-test
Compare mean scores for the same group of people under two different conditions
Rationale for t-tests
2 samples of data are collected and the sample means calculated
If the samples come from the same population, their means are expected to be roughly equal
The difference between the sample means we collected is compared with the difference between sample means we would expect to obtain if there were no effect (i.e., if the means of the two conditions were equal)
t-value
The larger the absolute t-value, the less likely it is that the difference between groups is the result of sampling error alone. The associated p-value is the likelihood of obtaining a difference at least as large as the observed one through sampling error if there were no effect
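The rationale above can be sketched as a signal-to-noise ratio: the observed mean difference divided by its standard error. The example assumes the pooled-variance form of the independent test; the function names and scores are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev, variance


def independent_t(group1, group2):
    """t for two different groups: mean difference over its standard error,
    using the pooled variance (assumes similar variances in both groups)."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se


def paired_t(condition1, condition2):
    """t for the same people measured twice: mean of the difference scores
    over the standard error of those differences."""
    diffs = [a - b for a, b in zip(condition1, condition2)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


print(independent_t([2, 4, 6], [1, 3, 5]))
print(paired_t([11, 14, 15, 18], [10, 12, 14, 16]))
```

In the paired example the same mean difference yields a much larger t, because differences within the same people vary less than scores between different people.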