Descriptive statistics - means, confidence intervals where appropriate, medians, standard deviations, and graphical illustrations such as box and whisker plots.
Effect size - the magnitude of the difference between the conditions, Cohen's d, and an overall measure of effect, partial eta squared.
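A minimal sketch of how the descriptives and Cohen's d could be computed in Python (the scores below are invented purely for illustration):

import numpy as np

# invented scores for two conditions, purely illustrative
a = np.array([4.1, 5.0, 6.2, 5.5, 4.8])
b = np.array([6.9, 7.8, 8.1, 7.2, 7.5])

# descriptives: mean, median, sample standard deviation
print(a.mean(), np.median(a), a.std(ddof=1))
print(b.mean(), np.median(b), b.std(ddof=1))

# Cohen's d: mean difference divided by the pooled standard deviation
n1, n2 = len(a), len(b)
pooled_sd = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
print("Cohen's d:", (b.mean() - a.mean()) / pooled_sd)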
ANOVA
Serves the same purpose as the t-tests
It tests for differences in group means
ANOVA is more flexible in that it can handle any number of groups (t-tests are limited to 2 groups)
It is all about looking at the different sources of variability in a dataset.
ANOVA - Tests whether there is a significant difference between some or all of the means of the conditions by comparing them with the grand mean.
Independent ANOVA - used when participants perform in only one condition of several, i.e. an independent or between-participants design.
Related ANOVA - used when the participants perform in all conditions, i.e. a related or within-participants design.
ANOVA requires 2 or more groups to work. We refer to groups as levels.
Grouping variable - our predictor (it predicts or explains the values in the outcome variable) or, in experimental terms, our independent variable, and is made up of k groups.
Outcome variable - the variable on which people differ, and we are trying to explain or account for those differences based on group membership.
Individual group means - if we have k = 3 groups, our means will be M1, M2, and M3
Grand Mean (Mg) - single mean representing the average of all participants across all groups.
Systematic variability - variability between groups produced by the experimental manipulation.
Random error - variability within each group.
Between-groups variability - the variability arising from the differences between groups.
Between-groups variation arises from:
Treatment effects
Individual differences
Experimental error
Treatment effects - the differences that reflect the experimental manipulation when performing an experiment.
Individual differences - each participant is different therefore participants will respond differently even when faced with the same task.
Experimental error - differences due to experimental errors contribute to variability.
Within-groups variability - the variability arising from differences that occur within each group. Each individual deviates a little bit from their respective group mean. This represents our error in ANOVA.
Within-groups variation arises from:
Individual differences
Experimental error
Total Sum of Squares
SS(T) = SS(B) + SS(W)
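A sketch of this partition with invented data for k = 3 groups, checking that SS(T) = SS(B) + SS(W); eta squared (SS(B)/SS(T)) falls out for free:

import numpy as np

# invented scores for k = 3 groups
groups = [np.array([4.1, 5.0, 6.2, 5.5, 4.8]),
          np.array([6.9, 7.8, 8.1, 7.2, 7.5]),
          np.array([5.2, 6.1, 5.8, 6.4, 5.5])]

all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# between-groups SS: group means' deviations from the grand mean, weighted by group size
ss_b = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
# within-groups SS: each score's deviation from its own group mean
ss_w = sum(((g - g.mean()) ** 2).sum() for g in groups)
# total SS: each score's deviation from the grand mean
ss_t = ((all_scores - grand_mean) ** 2).sum()

assert np.isclose(ss_t, ss_b + ss_w)
print("eta squared:", ss_b / ss_t)  # in a one-way design this equals partial eta squared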
Mean square - a sum of squares divided by its degrees of freedom; an estimate of variance.
F statistic - test statistic for ANOVA, it is compared to a critical value to see whether we can reject or fail to reject a null hypothesis.
If the F statistic is less than 1, MSW is greater than MSB (more unsystematic than systematic variance: the effect of natural variation is greater than the difference brought about by the experiment)
F statistic is an omnibus test - it evaluates whether, overall, there are differences between the means; it does not provide specific information about which groups differ.
When the between-groups variance is very much larger than the within-groups variance, the F-value is large and the likelihood of such a result occurring by sampling error decreases.
The larger the between-groups variance is in relation to the within-group variance, the larger the F ratio.
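Continuing the sums-of-squares sketch above (reusing groups, all_scores, ss_b, and ss_w), the mean squares and F ratio:

from scipy import stats

k = len(groups)          # number of groups
N = len(all_scores)      # total number of scores

ms_b = ss_b / (k - 1)    # mean square between = SS(B) / df(B)
ms_w = ss_w / (N - k)    # mean square within  = SS(W) / df(W)
F = ms_b / ms_w

p = stats.f.sf(F, k - 1, N - k)             # upper tail of the F distribution
F_check, p_check = stats.f_oneway(*groups)  # cross-check with scipy's one-way ANOVA
print(F, p, F_check, p_check)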
If we use the t-test to compare 3 or more means, it increases our probability of committing a Type I error.
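For example, three separate t-tests each at alpha = .05 give a familywise error rate of 1 - (1 - .05)^3 ≈ .14, almost three times the nominal level.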
If homogeneity of variance is violated, use:
Brown-Forsythe F
Welch's F
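A sketch of both corrected F tests, assuming statsmodels' anova_oneway accepts a list of samples (use_var="unequal" gives Welch's F, use_var="bf" the Brown-Forsythe version):

import numpy as np
from statsmodels.stats.oneway import anova_oneway

# same invented data as in the earlier sketch
groups = [np.array([4.1, 5.0, 6.2, 5.5, 4.8]),
          np.array([6.9, 7.8, 8.1, 7.2, 7.5]),
          np.array([5.2, 6.1, 5.8, 6.4, 5.5])]

welch = anova_oneway(groups, use_var="unequal")  # Welch's F
bf = anova_oneway(groups, use_var="bf")          # Brown-Forsythe F
print(welch.statistic, welch.pvalue)
print(bf.statistic, bf.pvalue)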
Assumptions to be met:
continuous dependent variable
categorical independent variable (2 or more independent groups).
Independence of observations (for independent ANOVA)
No outliers
Normally distributed
Homoscedasticity
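A sketch of basic checks for the last two assumptions with scipy (invented data; Shapiro-Wilk for normality per group, Levene's test for homogeneity of variance):

import numpy as np
from scipy import stats

# same invented data as in the earlier sketch
groups = [np.array([4.1, 5.0, 6.2, 5.5, 4.8]),
          np.array([6.9, 7.8, 8.1, 7.2, 7.5]),
          np.array([5.2, 6.1, 5.8, 6.4, 5.5])]

# normality within each group (samples this small make the test mostly illustrative)
for g in groups:
    print(stats.shapiro(g))

# homogeneity of variance; center="median" is the Brown-Forsythe variant of Levene's test
print(stats.levene(*groups, center="median"))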
When assumptions are violated:
routinely check Welch's F
bootstrap
use Kruskal-Wallis (nonparametric alternative).
sensitivity analysis
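A sketch of two of these fallbacks with scipy and numpy (invented data; the bootstrap here is a simple percentile CI for one pairwise mean difference):

import numpy as np
from scipy import stats

# same invented data as in the earlier sketch
groups = [np.array([4.1, 5.0, 6.2, 5.5, 4.8]),
          np.array([6.9, 7.8, 8.1, 7.2, 7.5]),
          np.array([5.2, 6.1, 5.8, 6.4, 5.5])]

# Kruskal-Wallis: rank-based alternative to the one-way ANOVA
print(stats.kruskal(*groups))

# percentile bootstrap CI for the mean difference between groups 1 and 2
rng = np.random.default_rng(0)
boot = [rng.choice(groups[1], len(groups[1])).mean() -
        rng.choice(groups[0], len(groups[0])).mean()
        for _ in range(5000)]
print(np.percentile(boot, [2.5, 97.5]))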
Planned contrasts (contrast coding) - specific comparisons planned before the data are collected, done to test specific hypotheses
Post-hoc tests - used when no specific hypotheses were made in advance
Bonferroni correction - the significance level for each test is divided by the number of tests, so that the familywise Type I error rate is kept at the desired level
The Type I error rate and the statistical power of a test are linked. Therefore, there is always a trade-off: if a test is conservative (the probability of a Type I error is small) then it is likely to lack statistical power (the probability of a Type II error will be high).
Least Significant Difference (LSD) - makes no attempt to control the Type I error rate; requires the overall ANOVA to be significant.
Student-Newman-Keuls (SNK) - a very liberal test that lacks control over the familywise error rate
Bonferroni's and Tukey's tests - both control the Type I error rate very well but are conservative (Bonferroni has more power when the number of comparisons is small, Tukey when testing a large number of means); see the sketch at the end of this section
Scheffé - conservative; less likely to commit a Type I error, but less power to detect effects
Hochberg's GT2 and Gabriel's pairwise test procedure - used when sample sizes are different (Gabriel's can be too liberal when sample sizes are different; Hochberg's GT2 is unreliable when variances are different).
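A sketch of two common follow-up routes, assuming statsmodels (pairwise_tukeyhsd for Tukey's HSD; multipletests with invented p-values for a Bonferroni adjustment):

import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd
from statsmodels.stats.multitest import multipletests

# same invented data as in the earlier sketch
groups = [np.array([4.1, 5.0, 6.2, 5.5, 4.8]),
          np.array([6.9, 7.8, 8.1, 7.2, 7.5]),
          np.array([5.2, 6.1, 5.8, 6.4, 5.5])]
scores = np.concatenate(groups)
labels = np.repeat(["g1", "g2", "g3"], [len(g) for g in groups])

# Tukey's HSD: all pairwise comparisons with familywise error control
print(pairwise_tukeyhsd(scores, labels, alpha=0.05))

# Bonferroni: adjust p-values from separate pairwise tests (invented p-values)
reject, p_adj, _, _ = multipletests([0.010, 0.040, 0.300], alpha=0.05, method="bonferroni")
print(reject, p_adj)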