Considered intelligence as exceptional perceptual-motor skill
Battery - head size, visual acuity, RT, hand grip strength etc
9000 visitors to London Exhibition 1884
Scientists could not be distinguished from ordinary citizens
Invented the correlation coefficient
Alfred Binet (1857-1911)
1881 French law made school compulsory for all
Binet asked to devise test to detect slow learners who would not benefit from regular school curriculum
He assumed reasoning and problem solving should be measured
Binet and Simon published aptitude test in 1905
Binet's method: mental age scale
1. Considered slow child to perform like normal child of younger age
2. Devised scale of mental age
3. Used Average Mental Age (MA) score and Chronological Age (CA)
4. Bright: MA>CA; Dull: MA<CA
5. William Stern (1912) invented concept of intelligence quotient, or IQ =(MA/CA) x 100
6. Expresses IQ as ratio of MA to CA
7. Note that MA/CA ratios only make sense for children; modern tests report a 'deviation IQ' instead
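Stern's ratio formula above can be sketched in a few lines of Python. The function name and the example ages are hypothetical, chosen only for illustration. The last call shows why the ratio breaks down for adults: mental age levels off while chronological age keeps rising.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: (MA / CA) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical children (ages in years)
print(ratio_iq(10, 8))   # MA > CA: 'bright'
print(ratio_iq(8, 8))    # MA = CA: IQ of 100
print(ratio_iq(6, 8))    # MA < CA: 'dull'

# Hypothetical adult: a mental age that plateaus at 16 gives an
# implausibly low ratio IQ at a chronological age of 40
print(round(ratio_iq(16, 40), 1))
```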
Lewis Terman (1877-1956)
Stanford University School of Education
Published 'Stanford-Binet' Test (1916)
Used Stern's Intelligence Quotient, when MA=CA, IQ=100
Used to classify children with developmental disabilities
Used to classify Army recruits
Also studied gifted children
S-B test still in use today - 5th edition (Roid, 2003)
WAIS-IV (2008) Four Factor Model
Verbal Comprehension
Perceptual Reasoning
Working Memory
Processing Speed
WAIS-IV 2008 Subtests
Vocabulary
Similarities
Information
Comprehension
Digit Span
Arithmetic
Letter-Number Sequencing
Symbol Search
Coding
Visual Puzzles
Block Design
Matrix Reasoning
Picture Completion
Figure Weights
Cancellation
Standard scores (WAIS-IV 2008)
Mean: 100, SD: 15 for:
Full Scale IQ (FSIQ)
Verbal Comprehension Index (VCI)
Perceptual Reasoning Index (PRI)
Working Memory Index (WMI)
Processing Speed Index (PSI)
General Ability Index (GAI), based on VCI and PRI (6 subtests)
Characteristics of the test scores of a representative sample of people
Selection procedures: Stratified for location, residence (urban/rural), socioeconomic status, ethnic origin, gender, age
Number: Exclusions
WAIS-IV sample (Wechsler, 2008)
Normative sample of 2,200 adults aged 16-90 years
13 age bands, 200 in each band below 70 years
Stratified by: age, gender, race/ethnicity, education level, and geographic region according to 2005 US census data
UK validation study (Wechsler, 2010)
Data collection 2008-9
Validation sample: 270 people aged 16-89 years, stratified to match UK population
UK data closely reflect the US data
WAIS-IV: Use around the world
Different language versions, cultural adaptations, local norms in individual countries
Tuddenham (1948) Army Alpha Test
Scores compared from enlisted soldiers in World War I to World War II
Significantly higher in WW II era
Explanations: More familiar with testing, Increased public health and nutrition, Greater amount and higher quality of education
Seen as real increases in cognitive ability
Suggested test norms should be periodically revised
Flynn (1984)
Observed re-norming of tests e.g. Stanford-Binet, WISC, Raven's Progressive Matrices
Noticed new versions normed to higher standard than previous version (more correct answers had to be given to achieve a given standard score)
Means that people obtain higher standard scores on older versions of a test than on newer ones
Similar results seen in different countries
Increase in IQ approx. 0.3 points per year (3 per decade)
Found to be stable across all ages
'Flynn Effect'
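The practical consequence of the rate quoted above (about 0.3 points per year) can be illustrated with a small calculation. This is only a sketch: it assumes the gain is linear over time, which later findings on the negative Flynn effect call into question, and the function name and example years are hypothetical.

```python
FLYNN_POINTS_PER_YEAR = 0.3  # approximate rate quoted in the notes


def norm_inflation(norming_year: int, testing_year: int,
                   rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Expected score inflation when a test is scored against old norms.

    Assumes a constant linear gain, which is a simplification.
    """
    years_elapsed = testing_year - norming_year
    if years_elapsed < 0:
        raise ValueError("testing year precedes norming year")
    return years_elapsed * rate


# Hypothetical case: test normed in 1990, administered in 2010
inflation = norm_inflation(1990, 2010)
print(f"Expected inflation: {inflation:.1f} points")
print(f"Obtained 106 -> roughly {106 - inflation:.1f} against current norms")
```

This is why outdated norms tend to overestimate standing relative to the current population, and why periodic re-norming is recommended.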
Clark, Lawlor-Savage & Goghari (2016) The Flynn Effect: A quantitative commentary on modernity and human intelligence. Argues these are not genuine increases in intelligence, but increasing aptitude for the types of modern thinking that modern life requires and that IQ tests measure
Dutton, van der Linden & Lynn (2016) The negative Flynn effect: Evidence that the Flynn effect has gone into reverse in some countries, with overall decline in IQ score 0.38-4.30 IQ points per decade
The normal distribution
Checking for normal distribution of scores:
Distributions symmetrical, with no anomalies
Unimodal
Sufficient spread in the scores in all age groups
Raw score distribution represents error due to sampling
Qualitative descriptors of standard scores
Extremely high: 130 and above
Very high: 120-129
High average: 110-119
Average: 90-109
Low average: 80-89
Very low: 70-79
Extremely low: 69 and below
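The descriptor bands above, together with the normal distribution (mean 100, SD 15), can be expressed as a short lookup. The sketch below uses Python's standard-library NormalDist. The function names are hypothetical, and the percentile calculation assumes scores follow the normal curve exactly, which real norm tables only approximate.

```python
from statistics import NormalDist

IQ_SCALE = NormalDist(mu=100, sigma=15)  # standard-score metric from the notes

# (lower bound, label) pairs, highest band first
DESCRIPTORS = [
    (130, "Extremely high"),
    (120, "Very high"),
    (110, "High average"),
    (90, "Average"),
    (80, "Low average"),
    (70, "Very low"),
]


def descriptor(score: int) -> str:
    """Return the qualitative descriptor for a standard score."""
    for lower_bound, label in DESCRIPTORS:
        if score >= lower_bound:
            return label
    return "Extremely low"


def percentile_rank(score: float) -> float:
    """Percentage of the population expected to score at or below `score`."""
    return IQ_SCALE.cdf(score) * 100


def standard_score(z: float) -> int:
    """Convert a z-score to the mean-100, SD-15 metric (a 'deviation IQ')."""
    return round(100 + 15 * z)


for s in (69, 85, 100, 115, 130):
    print(s, descriptor(s), f"{percentile_rank(s):.1f}th percentile")
```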
Classical test theory
X = T + E
X - observed score
T - true score
E - error
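A small simulation can make X = T + E concrete. The sketch below assumes normally distributed true scores and errors with hypothetical SDs of 15 and 5, and uses the standard classical-test-theory definition of reliability as the ratio of true-score variance to observed-score variance (a definition not spelled out in these notes).

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the sketch is reproducible

N = 10_000       # simulated test-takers
TRUE_SD = 15.0   # hypothetical spread of true scores
ERROR_SD = 5.0   # hypothetical spread of measurement error

true_scores = [random.gauss(100, TRUE_SD) for _ in range(N)]
errors = [random.gauss(0, ERROR_SD) for _ in range(N)]
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Reliability = var(T) / var(X)
reliability_estimate = pvariance(true_scores) / pvariance(observed)
theoretical = TRUE_SD**2 / (TRUE_SD**2 + ERROR_SD**2)  # 225 / 250

print(f"Simulated reliability:   {reliability_estimate:.3f}")
print(f"Theoretical reliability: {theoretical:.3f}")
```

In practice true scores are never observed directly, which is why reliability has to be estimated indirectly through the methods listed next.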
Reliability & types of reliability
Test-retest
Parallel forms
Split-half
Internal consistency
Inter-rater
Standard error of measurement
Correlations
Relationship or association between two variables
Measured by a correlation coefficient
Measured on scale of -1 to +1
Sign (-/+) indicates direction of the relationship
Number indicates strength of relationship
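For reference, Pearson's correlation coefficient can be computed by hand as below. The two score lists are invented test and retest scores for five hypothetical people; they are not WAIS-IV data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of deviations from the mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people on two occasions
first_testing = [95, 100, 110, 120, 85]
second_testing = [97, 99, 112, 118, 88]

r = pearson_r(first_testing, second_testing)
print(f"r = {r:.2f}")  # positive sign: people high at time 1 stay high at time 2
```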
Test-retest reliability
298 participants (4 age bands), test-retest interval 8-82 days, mean 22 days
Pearson's correlations (all ages together): .96 for Full Scale IQ, .87-.96 for index scores, .74-.90 for subtests
Parallel forms
Establishes reliability of two tests that supposedly sample the same material
E.g. short and long forms of a test
WAIS-IV (Wechsler, 2008): the General Ability Index (GAI), a short measure based on VCI and PRI (6 subtests), correlates .97 with FSIQ
Split-half reliability
Split half method - test divided into two (odd/even), two halves correlated
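The odd/even split can be sketched as follows, using an invented matrix of right/wrong (1/0) item scores for six hypothetical people. Because each half is only half as long as the full test, the half-test correlation is usually stepped up with the Spearman-Brown formula; that correction is a standard addition and is not mentioned in these notes.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_r(item_scores):
    """Correlate totals on odd-numbered items with totals on even-numbered items."""
    odd_totals = [sum(person[0::2]) for person in item_scores]   # items 1, 3, 5...
    even_totals = [sum(person[1::2]) for person in item_scores]  # items 2, 4, 6...
    return pearson_r(odd_totals, even_totals)


def spearman_brown(r_half):
    """Estimate full-length reliability from a half-test correlation."""
    return 2 * r_half / (1 + r_half)


# Hypothetical data: rows are people, columns are six items scored 0/1
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

r_half = split_half_r(scores)
print(f"Half-test correlation: {r_half:.2f}")
print(f"Spearman-Brown corrected: {spearman_brown(r_half):.2f}")
```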
Internal consistency
Represents the consistency of scores within a test
Cronbach's alpha (or coefficient alpha) - estimates reliability from how consistently people respond across the items (or trials) of a test
In general, 0.7 would be considered acceptable
Often examined within the standardization sample - and in other special populations
WAIS-IV Internal consistency:
Reported for the normative group
Coefficients for all index scores in the .90s
Coefficient for Full Scale IQ is .98
Coefficients for all core subtests in the .80s or .90s, and within the acceptable range for all subtests
Coefficients from clinical and special group samples consistent with the above, in the .80-.90 range
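Cronbach's alpha can be computed directly from an item-score matrix. The sketch below uses the usual formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores), applied to invented 0/1 data for six hypothetical people. It is not how the WAIS-IV coefficients themselves were derived.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for a matrix with people as rows and items as columns."""
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*item_scores))               # one tuple per item
    item_variances = [variance(col) for col in item_columns]
    total_variance = variance([sum(person) for person in item_scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: six people, six items scored 0/1
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
print("acceptable" if alpha >= 0.7 else "below the usual 0.7 threshold")
```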
Inter-rater reliability
The degree of agreement between two or more examiners or raters
How much consensus exists in the ratings or scores
Intra-class correlations
Cohen's Kappa
WAIS Inter-scorer agreement:
Normative sample: all record forms double scored by two independent scorers, agreement .98-.99
Special studies for those subtests requiring more judgement: Similarities, Vocabulary, Information, Comprehension
60 cases randomly selected and scored by individuals on graduate-level clinical psychology programmes
Intra-class correlations: .91-.97
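Cohen's kappa corrects raw agreement between two raters for the agreement expected by chance. A minimal sketch follows, using invented 0/1/2 item scores from two hypothetical scorers; the data and function name are illustrative only.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categories to the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed proportion of cases on which the raters agree
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category proportions
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] / n * counts_b[c] / n for c in categories)
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical scores (0, 1 or 2 points) given to ten responses
scorer_1 = [2, 2, 1, 1, 0, 0, 2, 1, 0, 2]
scorer_2 = [2, 2, 1, 0, 0, 0, 2, 1, 1, 2]

kappa = cohens_kappa(scorer_1, scorer_2)
print(f"Raw agreement: 0.80, kappa = {kappa:.2f}")
```

Kappa comes out lower than the raw agreement because some of the matches would be expected even if the scorers were guessing.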
Standard error of measurement (SEM)
Estimates the error in a test score
Expressed in SD units; estimates the range of scores within which an obtained score might fall, given the reliability coefficient of the test
Used to calculate Confidence Intervals
Confidence interval
A range of values around the obtained test score, likely to include the true value, with a certain degree of confidence
The CI is paired with a percentage, like 68% or 95%
The percentage (95%) tells us that if we calculated a CI from 100 different samples, about 95 of them would contain the true score
WAIS-IV Confidence Intervals:
Average SEM 2.16 for Full Scale IQ
SEMs vary across age and for each index/subtest
Example: Obtained score = 90
68% CI = 90 +/- 2.16 (1 SEM), i.e. 87.84 - 92.16
95% CI = 90 +/- 4.32 (2 SEM, rounded up from 1.96 SEM), i.e. 85.68 - 94.32
With 95% confidence, the true score lies between 85.68 and 94.32
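The interval arithmetic above can be reproduced in code. The sketch below assumes the standard classical-test-theory formula SEM = SD x sqrt(1 - reliability), which is not stated in these notes, and centres the interval on the obtained score. Note that the exact 95% multiplier is 1.96, giving roughly +/- 4.23, whereas the worked example appears to use 2 x SEM.

```python
from math import sqrt
from statistics import NormalDist


def sem_from_reliability(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD x sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


def confidence_interval(obtained: float, sem: float, level: float = 0.95):
    """Symmetric CI around an obtained score, assuming normal errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sem
    return obtained - half_width, obtained + half_width


# SEM implied by SD = 15 and a reliability of .98
print(f"SEM = {sem_from_reliability(15, 0.98):.2f}")

# Intervals for an obtained score of 90 with the average SEM of 2.16
low68, high68 = confidence_interval(90, 2.16, level=0.68)
low95, high95 = confidence_interval(90, 2.16, level=0.95)
print(f"68% CI: {low68:.2f} - {high68:.2f}")
print(f"95% CI: {low95:.2f} - {high95:.2f}")
```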
Aspects of validity
Face validity
Content validity
Known-group validity
Criterion-related validity
Concurrent validity
Predictive validity
Face validity
Whether the test appears, at face value, to measure what it claims to measure
People rate the validity of a test as it appears to them, so it only reflects the judgement of the rater
Construct validity
The overall ability to measure the construct
The extent to which it captures the model or theory of the construct being measured
Different approaches and methods need to be combined
Factor analysis
Method used to determine how many factors (dimensions) are included in a construct, and which tasks best represent those factors
Confirmatory Factor Analysis
To test hypotheses about the structure of the data
Various computer programmes: LISREL, EQS, MX
Programme produces statistics showing how closely the postulated structure fits the actual data
Common practice to try out several models - choose the one which gives the best fit
'Path diagram' used to show relationships between variables, common factors and unique factors
Weiss et al (2013) Confirmatory factor analysis: Four or five factors?
WISC-V (2016) Five Factor Model
Verbal Comprehension
Visual Spatial
Fluid Reasoning
Working Memory
Processing Speed
Known-group validity
The extent to which a test can differentiate between groups where differences are expected
The groups are initially distinguished by some means, then tested
WAIS-IV Known group validity:
Intellectually gifted: 34 people aged 17-64 (current members of Mensa); composite scores significantly higher than matched controls
Mild or moderate intellectual disability: 104 people aged 16-63 with a diagnosis (73 'mild', 31 'moderate'); both groups scored significantly lower on all subtests than matched control groups