If we would like to make generalisable inferences, it is important to get sampling right and to make sure the set of people who participate in the experiment is representative of the population
Sampling methods
Probability sampling
Non-probability sampling
Probability sampling
A way to ensure that your sample is representative of the population
All members of the population have an equal chance of being selected in the sample
Simple random sampling
Each member has an equal and independent chance of being selected
Define a population, list all members, assign numbers
Use a table of random numbers, a "lottery" method or a computer program to select
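A minimal sketch of the "computer program" option in Python, using a hypothetical numbered sampling frame (names and sizes here are made up):

```python
import random

# Hypothetical sampling frame: every member of the defined
# population, listed and numbered
population = [f"person_{i}" for i in range(1, 1001)]  # N = 1000

# random.sample draws without replacement, so each member has an
# equal and independent chance of being selected
sample = random.sample(population, k=50)  # n = 50
print(sample[:5])
```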
Stratified random sampling
If you want to make sure the profile of the sample matches the profile of the population on some important characteristics, e.g. ethnic mix or gender
Divide population into subpopulations (strata) and randomly sample from the strata
Can reduce sampling error by ensuring ratios reflect actual population (e.g. ratio males to females)
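A minimal sketch of proportionate stratified sampling, assuming made-up strata (a 60/40 female/male population):

```python
import random

# Hypothetical population divided into strata by gender
strata = {
    "female": [f"F{i}" for i in range(600)],  # 60% of population
    "male":   [f"M{i}" for i in range(400)],  # 40% of population
}

n = 100  # total sample size
total = sum(len(members) for members in strata.values())

# Proportionate allocation: each stratum's share of the sample
# mirrors its share of the population, preserving the 60/40 ratio
sample = []
for name, members in strata.items():
    k = round(n * len(members) / total)
    sample.extend(random.sample(members, k))

print(len(sample))  # 100: 60 from the female stratum, 40 from the male
```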
Non-probability sampling
Not every member of the population has an equal chance of being part of the sample
The ideal is a probability sampling method
Probability sampling is often not workable or feasible given resources, time or the specific target population
Sampling method used should be fully explained and caveats about the likely generalisability of results made accordingly so that the reader can review your results in an informed way
Sampling error
The difference in value between the sample statistic and the population parameter (depends on sample size)
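A quick illustration, using made-up population values, of why sampling error depends on sample size (on average it shrinks as n grows):

```python
import random
import statistics

random.seed(1)
# Hypothetical population with a known parameter (the mean)
population = [random.gauss(100, 15) for _ in range(100_000)]
mu = statistics.mean(population)  # the population parameter

for n in (10, 100, 1000):
    sample = random.sample(population, n)
    error = statistics.mean(sample) - mu  # sample statistic minus parameter
    print(f"n={n:5d}  sampling error={error:+.3f}")
```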
Sampling bias
When there are systematic differences between the people you sample and don't sample
Reliability
Does our measurement instrument behave sensibly? Does it always measure the same thing in the same way?
Types of reliability
Stability of the measure (Test-retest)
Internal consistency of the measure (split-half, Cronbach's alpha)
Agreement or consistency across raters (inter-rater reliability)
Test-retest reliability
Does your test measure the same thing every time you use it?
Problems with test-retest: memory effect and practice effect
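Test-retest reliability is typically quantified as the correlation between the two administrations; a minimal sketch with made-up scores:

```python
import numpy as np

# Hypothetical scores from the same 8 people at two time points
time1 = np.array([12, 15, 9, 20, 14, 18, 11, 16])
time2 = np.array([13, 14, 10, 19, 15, 17, 12, 15])

# Test-retest reliability: the Pearson correlation between administrations
r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r:.2f}")  # closer to 1 => more stable measure
```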
Split-half reliability
Is your measure internally consistent?
Cronbach's Alpha
Equivalent to the average of all possible split-half reliabilities for that test with that sample
Allows you to assess the 'internal consistency' of all your items across the measure
Cronbach's alpha yields a coefficient (alpha) that can range from 0 to 1.00; the closer the alpha is to 1.00, the better the reliability of the measure
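As a worked illustration (a minimal sketch; the score matrix is made up), alpha can be computed from the standard formula α = k/(k−1) × (1 − Σ item variances / total-score variance):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]                         # number of items
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of the total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses: 6 people answering a 4-item scale
scores = np.array([
    [3, 4, 3, 4],
    [5, 5, 4, 5],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
    [1, 2, 1, 2],
    [4, 3, 4, 4],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")  # closer to 1.00 = better
```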
Inter-rater or inter-observer reliability
Do different raters measure the same thing?
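For categorical ratings, agreement is often summarised with Cohen's kappa (agreement beyond chance); a minimal sketch from first principles on made-up codes:

```python
import numpy as np

# Hypothetical categorical codes assigned by two raters to 10 behaviours
rater1 = np.array(["A", "B", "A", "C", "B", "A", "C", "B", "A", "B"])
rater2 = np.array(["A", "B", "A", "C", "A", "A", "C", "B", "B", "B"])

labels = np.unique(np.concatenate([rater1, rater2]))
p_o = np.mean(rater1 == rater2)  # observed agreement

# Chance agreement expected from each rater's marginal proportions
p_e = sum(np.mean(rater1 == c) * np.mean(rater2 == c) for c in labels)

kappa = (p_o - p_e) / (1 - p_e)  # Cohen's kappa: agreement beyond chance
print(f"observed agreement = {p_o:.2f}, kappa = {kappa:.2f}")
```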
Validity
Are we measuring what we think we are measuring? Is our measure credible and believable?
Types of validity
Internal validity
External validity
Face validity
Content validity
Criterion-related validity
Construct validity
Internal validity
Relates to the logical structure of the experiment itself and the control of extraneous variables
Sound operationalisation of our DV
Strong experimental design logic
Sound operationalisation of our IV(s)
Consideration and use of appropriate remedies to control for extraneous variables
External validity
How generalisable are our findings (tied in with the representativeness of the sample)? How representative of the real world is our study (tied in with how artificial it is)?
The more stringently we try to control conditions to ensure internal validity, the more artificial our study potentially becomes, hence less representative of reality, less generalisable, and so less EXTERNALLY valid
Face validity
Does it (subjectively) look like it measures what you want it to measure?
Content validity
The extent to which the measure represents a balanced, adequate sampling of the relevant dimensions
Criterion-related validity
Involves checking the performance of your measure against some external criterion
Concurrent validity: does it relate to a known criterion, for example, an alternative (gold standard) measure of the same construct?
Predictive validity: does the measure predict/relate to some criterion that you would expect it to predict?
Construct validity
Establishes validity by showing that your measure relates to other constructs in the way you would (theoretically) expect
Convergent: Measures of constructs that theoretically should be related to each other are, in fact, observed to relate to each other
Divergent: Measures of constructs that theoretically should not be related to each other are, in fact, observed not to relate to each other
Can a measure be reliable but not valid?
Can a measure be valid but not reliable?
Population
A group of people about whom one would like to draw some meaningful conclusions
Sample
A subset of the population that is actually included in your research study
Alpha
The Type 1 error rate always equals alpha (α)
Rejecting the null and saying there is a significant result when, in fact, there is no real effect and the null should not have been rejected
Beta
The Type 2 error rate is signified by beta β
A Type 2 error is when you accept (fail to reject) the null hypothesis when you should have rejected it
Power
The probability of finding a significant effect when there is one to find
Conceptualised as 1 – beta (beta=Type 2 error rate)
Ways to boost power
1. Increase sample size
2. Increase alpha level (at the cost of a higher Type 1 error rate)
3. Reduce error variance component in design
4. Reduce the impact of nuisance variables and control for things
5. Increase sensitivity of design
6. Increase treatment effect size
A common recommendation to make in a report, when there is not a significant result, is to use a bigger sample size (see the power sketch below)
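A hedged sketch of how sample size drives power, assuming statsmodels is available; the effect size (Cohen's d = 0.5) and alpha are illustrative:

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power of a two-sample t-test for a medium effect (d = 0.5) at alpha = .05,
# across increasing group sizes: power rises with n
for n in (20, 50, 100, 200):
    power = analysis.power(effect_size=0.5, nobs1=n, alpha=0.05)
    print(f"n per group = {n:3d}  power = {power:.2f}")

# Or solve for the n needed to reach the conventional 80% power
n_needed = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"n per group for 80% power: {n_needed:.0f}")  # about 64
```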