The process of testing whether our explanation or hypothesis about a behavior, event, or phenomenon is plausible, that is, reasonable and supported by evidence
Steps of hypothesis testing
1. State the research question
2. Specify the null and alternative hypothesis
3. Determine the level of significance
4. Compute the test statistic
5. Make a decision and state conclusion
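As a rough illustration, the five steps can be traced in a short Python sketch. It assumes a one-sample z-test with a known population standard deviation, which these notes do not specify; the research question, the scores, `mu_0`, and `sigma` are all made up for the example.

```python
from math import sqrt
from statistics import NormalDist, mean

# Step 1: research question (hypothetical): does the group's mean score differ from 100?
# Step 2: H0: mu = 100; H1: mu != 100 (non-directional, two-tailed)
mu_0 = 100.0
sigma = 15.0  # assumed known population SD (illustrative value)
scores = [112, 108, 95, 120, 104, 117, 99, 110,
          106, 115, 101, 109, 113, 98, 107, 111]  # made-up data

# Step 3: level of significance
alpha = 0.05

# Step 4: test statistic and two-tailed p-value
n = len(scores)
z = (mean(scores) - mu_0) / (sigma / sqrt(n))
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Step 5: decision and conclusion
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.2f}, p = {p_value:.4f}: {decision}")
```

With real data the test in Step 4 would be chosen to match the design (t-test, ANOVA, correlation, and so on); the z-test is used here only because it needs nothing beyond the standard library.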
Research question
The foundation of a systematic investigation, representing an inquiry the research aims to answer. It is written in question form and has no stated expectation about the outcome.
Types of research questions
Testing of Difference: Do men and women differ in terms of conversational memory?
Testing of Association: Is there a significant relationship between social media use and level of self-esteem among adolescents?
Hypothesis
A guess, possible state, or tentative answer to the research question, based on research and grounded in theory
Rationale of hypotheses
The logical reasoning behind the direction of the hypotheses, which may be based on past research, existing theory, or logical reasoning
Null hypothesis
A prediction of no effect of treatment, no relationship between variables, or no difference between groups. The main idea is that nothing is going on.
Alternative hypothesis
A prediction of a real effect of treatment, a relationship between variables, or a difference between groups. Can be directional (one-tailed) or non-directional (two-tailed).
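The practical difference between a directional and a non-directional hypothesis shows up in the p-value. The sketch below assumes a z statistic and a directional prediction of "greater than"; `p_values_from_z` is a hypothetical helper name, not something from these notes.

```python
from statistics import NormalDist

def p_values_from_z(z):
    """Return (one_tailed, two_tailed) p-values for a z statistic.

    The one-tailed value assumes the predicted direction is 'greater than'.
    """
    one_tailed = 1 - NormalDist().cdf(z)             # area in the upper tail only
    two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))  # area in both tails
    return one_tailed, two_tailed

one_tailed, two_tailed = p_values_from_z(1.8)
print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

For z = 1.8 the one-tailed p-value falls below .05 while the two-tailed p-value does not, which is why the direction of the hypothesis has to be fixed before looking at the data.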
Level of significance
The probability threshold (α) below which the null hypothesis is rejected and the result is taken as support for the research hypothesis; it is the maximum probability of a Type I error the researcher is willing to accept. Conventional level is α = .05, conservative level is α = .01.
Parametric tests
Assume normality (or at least sample sizes large enough that the sampling distribution is approximately normal), require interval or ratio level of data, and are generally more powerful than non-parametric tests when their assumptions are met
Non-parametric tests
Do not require a normal distribution, can be used with nominal or ordinal levels of measurement, and are utilized in conditions that are not stringent or controlled, or when the sample is small (n < 30)
Parametric tests
Independent Samples t-test
Paired Samples t-test
Analysis of Variance (ANOVA) - One Way or Two Way
Pearson Correlation
Linear Regression
Non-parametric tests
Chi-Square
Kruskal-Wallis H
Wilcoxon Test
Friedman Test
Decision-making in hypothesis testing
1. Compare the p-value with the significance level
2. If p-value < significance level, reject the null hypothesis (statistically significant)
3. If p-value ≥ significance level, fail to reject the null hypothesis (not statistically significant)
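The decision rule above can be written as a small function. This is a minimal sketch; `decide` is a hypothetical name and the default of .05 is just the conventional level.

```python
def decide(p_value, alpha=0.05):
    """Apply the decision rule: reject H0 only when p < alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0 (statistically significant)"
    return "fail to reject H0 (not statistically significant)"

print(decide(0.03))              # compared with the conventional .05 level
print(decide(0.03, alpha=0.01))  # same p-value, conservative .01 level
```

Note that a p-value exactly equal to the significance level leads to failing to reject, matching step 3.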
Conclusion in hypothesis testing
1. Describe the results of the null hypothesis (whether it was rejected or not rejected, and for what value of alpha or p-value)
2. Describe the results of the alternative hypothesis (answering the research question and stating the sample statistic collected)
Type I error (α error)
Rejecting the null hypothesis when in reality the null hypothesis is true - finding a significant effect, difference, or relationship, when in reality, there is none (false positive)
Type II error (β error)
Failing to reject the null hypothesis when in reality the null hypothesis is false - finding no significant effect, difference, or relationship when in reality, there is one (false negative)
Significance level
Regulates Type I error - conservative standards reduce Type I error, but increase the probability of Type II error
Sample size
Regulates Type II error - the larger the sample, the lower the probability of a Type II error, which helps offset the loss of power that comes with a conservative significance level
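How the significance level and sample size trade off against the two error types can be seen in a small simulation. The sketch below assumes a two-tailed z-test of H0: μ = 0 with a known standard deviation of 1; the true mean of 0.5, the sample sizes, and the helper name `rejection_rate` are illustrative choices.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def rejection_rate(true_mean, n, alpha=0.05, trials=2000, seed=0):
    """Share of simulated two-tailed z-tests (H0: mu = 0, sigma = 1) that reject H0."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        z = mean(sample) / (1.0 / sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

# H0 is true: every rejection is a Type I error, so the rate should sit near alpha.
type_1_rate = rejection_rate(true_mean=0.0, n=30)

# H0 is false: every non-rejection is a Type II error.
type_2_small = 1 - rejection_rate(true_mean=0.5, n=10)
type_2_large = 1 - rejection_rate(true_mean=0.5, n=50)
type_2_strict = 1 - rejection_rate(true_mean=0.5, n=10, alpha=0.01)

print(type_1_rate, type_2_small, type_2_large, type_2_strict)
```

The Type II rate drops when n grows from 10 to 50, and rises when α is tightened from .05 to .01 at the same sample size.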
Inferential statistics
A set of tools and procedures wherein the data are used to estimate population parameters based on sample values or to test hypotheses and draw conclusions
Non-parametric alternatives to parametric tests
Mann-Whitney U or Wilcoxon Rank Sum Test (alternative to Independent Samples t-test)
Wilcoxon Signed Rank Test (alternative to Paired Samples t-test)
Kruskal-Wallis H (alternative to One-Way ANOVA)
Friedman Test (alternative to Repeated Measures ANOVA)
Spearman's Rank Correlation (alternative to Pearson's Correlation)
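To show what "rank-based alternative" means in practice, the sketch below computes Spearman's rank correlation as a Pearson correlation on ranks. The helper names (`ranks`, `pearson`, `spearman`) and the data are made up for illustration; tied values are given their average rank.

```python
from math import sqrt
from statistics import mean

def ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    numerator = sum((a - mx) * (b - my) for a, b in zip(x, y))
    denominator = sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return numerator / denominator

def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]  # always increasing, but not in a straight line
print(pearson(x, y), spearman(x, y))
```

Because the ranks ignore how far apart the raw scores are, the extreme value 100 lowers the Pearson coefficient but leaves the Spearman coefficient at its maximum.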
Types of ANOVA designs
One-Way ANOVA - compare more than two groups/levels
Factorial ANOVA - use two or more independent variables
Repeated Measures ANOVA - measure each respondent three or more times/conditions to look for change
One-Way ANOVA
A non-directional statistical test that compares more than two means to determine whether they differ significantly
Logic of ANOVA
Variance - a measure of how spread out or dispersed the scores are from the mean
Two sources of variance: Variation Between Groups and Variation Within Groups
Comparing means using ANOVA entails analyzing whether the distributions of the groups are separate enough, that is, whether the variation between groups is large relative to the variation within groups
Assumptions of One-Way ANOVA
The dependent variable must be continuous (interval/ratio)
The independent variable should consist of two or more categorical independent groups
All observations are independent of one another
The dependent variable is approximately normally distributed within each group of the independent variable
No significant outliers
There needs to be homogeneity of variances
Concepts of ANOVA
Between Groups Sum of Squares: Deviation of each group mean from the total/combined mean
Within Groups Sum of Squares: Deviation of raw score from the group mean; Similar to variance
Total Sum of Squares: Deviation of raw score from the total/combined mean
df: degrees of freedom; k = number of groups; N = total sample size
MS: Mean Squares - a measure of variation that controls for the number of scores involved
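The quantities above fit together as follows: each sum of squares is divided by its degrees of freedom (k - 1 between groups, N - k within groups) to give a mean square, and F is the ratio of the two mean squares. The sketch below works this through by hand; `one_way_anova` is a hypothetical helper and the three groups are made-up data.

```python
from statistics import mean

def one_way_anova(groups):
    """Compute the one-way ANOVA table quantities from a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n_total = len(groups), len(all_scores)

    # Between groups: each group mean vs. the grand mean, weighted by group size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: each raw score vs. its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    # Total: each raw score vs. the grand mean.
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within, "ss_total": ss_total,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

table = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(table)
```

In this example the between-groups and within-groups sums of squares (54 and 6) add up to the total (60), and F = 27 / 1 = 27. The F value would then be compared against a critical value or converted to a p-value, which this sketch leaves out.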
Significance and effect size
A significant difference simply indicates that the observed statistic is unlikely to have occurred by chance or sampling error alone. To find out how large the difference/effect is, we compute the effect size.
Post-hoc test
Used to determine exactly where the significant difference(s) lie after a significant ANOVA result
Standard error
The quantification of sampling error: the standard deviation of the sampling distribution, measuring how much sample means vary around the population mean. It is used to estimate the population mean and to estimate differences between population means.
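For a single sample, the standard error of the mean is usually estimated as the sample standard deviation divided by the square root of the sample size. A minimal sketch, with a hypothetical helper name and made-up scores:

```python
from math import sqrt
from statistics import stdev

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return stdev(sample) / sqrt(len(sample))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
se = standard_error(scores)
print(f"standard error = {se:.3f}")
```

Because n sits in the denominator, larger samples give a smaller standard error, meaning the sample mean is expected to fall closer to the population mean.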
Effect size
The value that shows the practical significance of the statistical results, giving researchers an idea of how large, important, or meaningful a significant effect is (for Cohen's d: 0.2 - small, 0.5 - moderate, 0.8 - large)
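The benchmarks 0.2, 0.5, and 0.8 are the ones conventionally quoted for Cohen's d, the standardized difference between two group means. The sketch below assumes that measure and a pooled standard deviation; the function names, the "negligible" label for values under 0.2, and the data are illustrative additions.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent groups, using the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)

def describe(d):
    """Label an effect size using the 0.2 / 0.5 / 0.8 benchmarks."""
    size = abs(d)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; the benchmarks start at 0.2

d = cohens_d([2, 4, 6], [1, 3, 5])  # made-up scores
print(d, describe(d))
```

Here the means differ by 1 and the pooled standard deviation is 2, so d = 0.5, a moderate effect by these benchmarks. Other effect size measures (for example eta squared for ANOVA) use different scales.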
t-test table
Used to find the t-critical value to decide whether or not the null hypothesis should be rejected
Degrees of freedom
The factor that makes the sample standard deviation differ from the population standard deviation: the sample formula divides by n - 1 (the degrees of freedom) rather than by N
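The difference can be seen by computing both versions on the same scores. In this sketch, `pstdev` divides the sum of squared deviations by N and `stdev` divides by n - 1; the data are made up.

```python
from statistics import pstdev, stdev

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
n = len(scores)
df = n - 1                         # degrees of freedom for one sample

population_sd = pstdev(scores)     # sum of squares divided by N
sample_sd = stdev(scores)          # sum of squares divided by n - 1

print(f"N = {n}, df = {df}")
print(f"population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")
```

Dividing by the smaller number makes the sample standard deviation slightly larger, which compensates for a sample tending to underestimate the spread of its population.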
Confidence interval
A range of values within which the true population parameter is estimated to lie with a certain level of confidence. If the range brackets the null hypothesis value, we fail to reject the null hypothesis. If it does not, we reject the null hypothesis.
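A minimal sketch of this decision rule for a single mean follows. It assumes a normal (z) approximation, which is reasonable for large samples; with small samples a t critical value would normally replace z. The function name `mean_ci` and all the numbers are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(sample_mean, sample_sd, n, confidence=0.95):
    """Confidence interval for a mean using the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sample_sd / sqrt(n)  # critical value times the standard error
    return sample_mean - margin, sample_mean + margin

low, high = mean_ci(sample_mean=103.0, sample_sd=10.0, n=100)
null_value = 100.0                    # value stated by the null hypothesis

decision = "fail to reject H0" if low <= null_value <= high else "reject H0"
print(f"95% CI = ({low:.2f}, {high:.2f}): {decision}")
```

The interval here runs from about 101.04 to 104.96, so it does not bracket 100 and the null hypothesis is rejected.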