tests of intelligence


  • Historical overview
    • Measurement of intelligence
    • 'Psychometrics'
    • Galton
    • Binet
    • Terman
    • Wechsler
  • Francis Galton (1822-1911)

    • Naturalist, mathematician
    • Interested in individual differences
    • Considered intelligence as exceptional perceptual-motor skill
    • Battery - head size, visual acuity, reaction time (RT), hand grip strength, etc.
    • 9000 visitors to London Exhibition 1884
    • Scientists could not be distinguished from ordinary citizens
    • Invented the correlation coefficient
  • Alfred Binet (1857-1911)

    • 1881 French law made school compulsory for all
    • Binet asked to devise test to detect slow learners who would not benefit from regular school curriculum
    • He assumed reasoning and problem solving should be measured
    • Binet and Simon published aptitude test in 1905
  • Binet's method: mental age scale

    1. Considered slow child to perform like normal child of younger age
    2. Devised scale of mental age
    3. Used Average Mental Age (MA) score and Chronological Age (CA)
    4. Bright: MA>CA; Dull: MA<CA
    5. William Stern (1912) invented concept of intelligence quotient, or IQ =(MA/CA) x 100
    6. Expresses IQ as ratio of MA to CA
    7. Note that MA/CA measures only make sense for children - modern scores are computed differently, as a 'deviation IQ'
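As a minimal sketch of Stern's ratio formula above, the function below computes IQ = (MA/CA) x 100. The function name and the example ages are illustrative assumptions, not taken from any published test.

```python
def ratio_iq(mental_age: float, chronological_age: float) -> float:
    """Stern's ratio IQ: (MA / CA) x 100."""
    if chronological_age <= 0:
        raise ValueError("chronological age must be positive")
    return mental_age / chronological_age * 100


# Hypothetical children, all aged 8 (CA = 8)
print(ratio_iq(10, 8))  # MA > CA, so the ratio IQ is above 100
print(ratio_iq(8, 8))   # MA = CA, so the ratio IQ is 100
print(ratio_iq(6, 8))   # MA < CA, so the ratio IQ is below 100
```

Because mental age levels off in adulthood while chronological age keeps rising, the ratio drifts downwards for adults, which is one way to see why the ratio only makes sense for children.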
  • Lewis Terman (1877-1956)
    • Stanford University School of Education
    • Published 'Stanford-Binet' Test (1916)
    • Used Stern's Intelligence Quotient, when MA=CA, IQ=100
    • Used to classify children with developmental disabilities
    • Used to classify Army recruits
    • Also studied gifted children
    • S-B test still in use today - 5th edition (Roid, 2003)
  • WAIS-IV (2008) Four Factor Model
    • Verbal Comprehension
    • Perceptual Reasoning
    • Working Memory
    • Processing Speed
  • WAIS-IV 2008 Subtests
    • Vocabulary
    • Similarities
    • Information
    • Comprehension
    • Digit Span
    • Arithmetic
    • Letter-Number Sequencing
    • Symbol Search
    • Coding
    • Visual Puzzles
    • Block Design
    • Matrix Reasoning
    • Picture Completion
    • Figure Weights
    • Cancellation
  • Standard scores (WAIS-IV 2008)
    • Mean 100, SD 15 for composite scores: Full Scale IQ (FSIQ), Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory Index (WMI), Processing Speed Index (PSI), and General Ability Index (GAI; based on the 6 VCI and PRI subtests)
    • Subtest scaled scores: mean 10, SD 3 for the 10 core subtests and 5 supplemental subtests
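The two scales above are both linear rescalings of a z-score. The sketch below illustrates that idea only: published tests typically read scores from age-banded norm tables rather than applying one formula, and the raw-score mean and SD used here are made-up values.

```python
def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def to_composite(z: float) -> float:
    """Composite (index / IQ) scale: mean 100, SD 15."""
    return 100 + 15 * z


def to_scaled(z: float) -> float:
    """Subtest scaled-score scale: mean 10, SD 3."""
    return 10 + 3 * z


# Hypothetical norm group: raw mean 28, raw SD 6; examinee raw score 34
z = z_score(34, norm_mean=28, norm_sd=6)
print(z, to_composite(z), to_scaled(z))
```

A score one SD above the norm-group mean therefore lands at 115 on the composite scale and 13 on the scaled-score scale.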
  • Norms
    • Characteristics of the test scores of a representative sample of people
    • Selection procedures: Stratified for location, residence (urban/rural), socioeconomic status, ethnic origin, gender, age
    • Number of people sampled
    • Exclusions
  • WAIS-IV sample (Wechsler, 2008)
    • Normative sample of 2,200 adults aged 16-90 years
    • 13 age bands, 200 in each band below 70 years
    • Stratified by age, gender, race/ethnicity, education level, and geographic region, according to 2005 US census data
  • UK validation study (Wechsler, 2010)
    • Data collection 2008-9
    • Validation sample: 270 people aged 16-89 years, stratified to match the UK population
    • UK data closely reflect those of the US
  • WAIS-IV: Use around the world
    • Different language versions
    • Cultural adaptations
    • Local norms in countries
  • Tuddenham (1948) Army Alpha Test

    • Compared scores of enlisted soldiers in World War I with those in World War II
    • Significantly higher in WW II era
    • Explanations: More familiar with testing, Increased public health and nutrition, Greater amount and higher quality of education
    • Seen as real increases in cognitive ability
    • Suggested test norms should be periodically revised
  • Flynn (1984)
    • Observed re-norming of tests e.g. Stanford-Binet, WISC, Raven's Progressive Matrices
    • Noticed new versions normed to higher standard than previous version (more correct answers had to be given to achieve a given standard score)
    • Means that people score higher when assessed against the norms of older test versions
    • Similar results seen in different countries
    • Increase in IQ approx. 0.3 points per year (3 per decade)
    • Found to be stable across all ages
    • 'Flynn Effect'
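As rough arithmetic on the rate quoted above (about 0.3 IQ points per year), the sketch below estimates how far scores could drift upwards when a test with old norms is still in use. Treating the rate as constant and linear is a simplifying assumption, and the function name is hypothetical.

```python
FLYNN_POINTS_PER_YEAR = 0.3  # approximate rate quoted in the notes above


def expected_norm_inflation(years_since_norming: float,
                            rate: float = FLYNN_POINTS_PER_YEAR) -> float:
    """Estimated upward drift in IQ points for norms of a given age.

    Assumes a constant linear rate, which is a simplification.
    """
    if years_since_norming < 0:
        raise ValueError("years_since_norming must not be negative")
    return rate * years_since_norming


# Hypothetical case: a test normed 10 and 20 years before it is administered
for years in (10, 20):
    print(years, round(expected_norm_inflation(years), 1))
```

This is the arithmetic behind the case for periodic re-norming: after a decade, scores against the old norms would be expected to run about 3 points high.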
  • Clark, Lawlor-Savage & Goghari (2016) The Flynn Effect: A quantitative commentary on modernity and human intelligence
    • Argues that the score gains are not genuine increases in intelligence, but a growing aptitude for the kinds of thinking that modern life requires and that IQ tests measure
  • Dutton, van der Linden & Lynn (2016) The negative Flynn effect
    • Evidence that the Flynn effect has gone into reverse in some countries
    • Overall decline of 0.38-4.30 IQ points per decade
  • The normal distribution
    • Checking that scores are normally distributed:
    • Distributions symmetrical, with no anomalies
    • Unimodal
    • Sufficient spread in the scores in all age groups
    • Raw score distribution represents error due to sampling
  • Qualitative descriptors of standard scores
    • Extremely high: 130 and above
    • Very high: 120-129
    • High average: 110-119
    • Average: 90-109
    • Low Average: 80-89
    • Very low: 70-79
    • Extremely low: 69 and below
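The bands above amount to a simple lookup. A minimal sketch, assuming whole-number standard scores and using a hypothetical function name:

```python
def qualitative_descriptor(standard_score: int) -> str:
    """Map a standard score (mean 100, SD 15) to the descriptor bands above."""
    if standard_score >= 130:
        return "Extremely high"
    if standard_score >= 120:
        return "Very high"
    if standard_score >= 110:
        return "High average"
    if standard_score >= 90:
        return "Average"
    if standard_score >= 80:
        return "Low average"
    if standard_score >= 70:
        return "Very low"
    return "Extremely low"


for score in (135, 109, 90, 69):
    print(score, qualitative_descriptor(score))
```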
  • Classical test theory

    • X = T + E
    • X - observed score
    • T - true score
    • E - error
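The X = T + E model can be illustrated with a small simulation. In classical test theory, reliability is defined as the share of observed-score variance that is true-score variance, var(T)/var(X); that definition is standard but is not stated in the notes above. The SDs below (15 for true scores, 5 for error) are arbitrary assumptions chosen for illustration.

```python
import random
from statistics import pvariance

random.seed(0)  # fixed seed so the sketch is repeatable

n = 20_000
true_scores = [random.gauss(100, 15) for _ in range(n)]  # T
errors = [random.gauss(0, 5) for _ in range(n)]          # E, independent of T
observed = [t + e for t, e in zip(true_scores, errors)]  # X = T + E

# Reliability as the proportion of observed variance due to true scores
reliability = pvariance(true_scores) / pvariance(observed)
print(round(reliability, 2))
```

With these assumed SDs the theoretical value is 225 / (225 + 25) = 0.90, and the simulated estimate should land close to it.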
  • Reliability & types of reliability
    • Test-retest
    • Parallel forms
    • Split-half
    • Internal consistency
    • Inter-rater
    • Standard error of measurement
  • Correlations
    • Relationship or association between two variables
    • Measured by a correlation coefficient
    • Measured on scale of -1 to +1
    • Sign (-/+) indicates direction of the relationship
    • Number indicates strength of relationship
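A minimal sketch of Pearson's correlation coefficient, the statistic behind most of the reliability figures in these notes. The paired scores are invented for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for five people tested on two occasions
first = [95, 102, 110, 88, 120]
second = [97, 100, 112, 90, 118]
r = pearson_r(first, second)
print(round(r, 2))  # the sign gives the direction, the magnitude the strength
```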
  • Test-retest reliability
    • 298 participants (4 age bands), test-retest interval 8-82 days, mean 22 days
    • Pearson's correlations (all ages together): .96 for Full Scale IQ, .87-.96 for index scores, .74-.90 for subtests
  • Parallel forms
    • Establishes reliability of two tests that supposedly sample the same material
    • E.g. short and long forms of a test
    • WAIS-IV (Wechsler, 2008): the General Ability Index (GAI), a shorter measure based on the 6 VCI and PRI subtests, correlates .97 with FSIQ
  • Split-half reliability
    • Split-half method - the test is divided into two halves (odd/even items) and the two halves are correlated
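The odd/even split can be sketched as follows. The Spearman-Brown step-up, 2r / (1 + r), is a common follow-on that estimates full-length reliability from the half-test correlation; it is not mentioned in the notes above and is included here as an assumption. The item data are invented.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(item_scores):
    """Correlate odd-item and even-item totals (one row per person)."""
    odd = [sum(row[0::2]) for row in item_scores]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_scores]  # items 2, 4, 6, ...
    return pearson_r(odd, even)


def spearman_brown(r_half):
    """Step a half-test correlation up to full test length."""
    return 2 * r_half / (1 + r_half)


# Hypothetical data: 5 people x 4 items, scored 0/1
items = [
    [1, 1, 1, 1],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
]
r_half = split_half(items)
print(round(r_half, 3), round(spearman_brown(r_half), 3))
```

The correction matters because each half is only half as long as the real test, and shorter tests are less reliable.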
  • Internal consistency
    • Represents the consistency of scores within a test
    • Cronbach's alpha (coefficient alpha) - estimates reliability from how consistently the multiple items (or trials) within a test agree with each other
    • In general, 0.7 would be considered acceptable
    • Often examined within the standardization sample - and in other special populations
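A minimal sketch of coefficient alpha, using the usual formula alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The item scores are invented; the 0.7 threshold is the rule of thumb given above.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Coefficient alpha for a table with one row per person, one column per item."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_columns = list(zip(*scores))
    item_variance_sum = sum(pvariance(col) for col in item_columns)
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical data: 5 people x 3 items
scores = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 5],
    [1, 2, 1],
    [3, 3, 4],
]
alpha = cronbach_alpha(scores)
print(round(alpha, 2), "acceptable" if alpha >= 0.7 else "below 0.7")
```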
  • WAIS-IV Internal consistency
    • Reported for the normative group
    • Coefficients for all index scores in the .90s
    • Coefficient for Full Scale IQ is .98
    • Coefficients for all core subtests in the .80s or .90s, and within the acceptable range for all subtests
    • Coefficients from clinical and special group samples consistent with the above, in the .80-.90 range
  • Inter-rater reliability

    • The degree of agreement between two or more examiners or raters
    • How much consensus exists in the ratings or scores
    • Intra-class correlations
    • Cohen's Kappa
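Cohen's kappa corrects raw agreement between two raters for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). A minimal sketch for categorical ratings follows; the ratings are invented for illustration.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty rating lists of equal length")
    n = len(rater_a)
    # Observed proportion of cases where the raters agree
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical item scores (0, 1 or 2 points) given by two scorers
scorer_1 = [2, 1, 0, 2, 1, 1, 0, 2]
scorer_2 = [2, 1, 0, 2, 1, 0, 0, 2]
print(round(cohens_kappa(scorer_1, scorer_2), 2))
```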
  • WAIS Inter-scorer agreement
    • Normative sample: all record forms double scored by two independent scorers, agreement .98-.99
    • Special studies for subtests requiring more judgement: Similarities, Vocabulary, Information, Comprehension
    • 60 cases randomly selected and scored by individuals on graduate-level clinical psychology programmes
    • Intra-class correlations: .91-.97
  • Standard error of measurement (SEM)
    • Estimates the error in a test score
    • Expressed in SD units; estimates the range within which obtained scores would fall around the true score, given the test's reliability coefficient
    • Used to calculate Confidence Intervals
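The usual classical-test-theory formula is SEM = SD x sqrt(1 - reliability). The formula is standard but is not stated in the notes above; the sketch below applies it to the composite scale (SD 15) with a reliability of .98 as an illustrative input.

```python
from math import sqrt


def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """SEM = SD * sqrt(1 - reliability)."""
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)


sem = standard_error_of_measurement(sd=15, reliability=0.98)
print(round(sem, 2))
```

Published SEMs are usually averaged across age bands, so a figure from a test manual need not match this single calculation exactly. The formula also shows why a more reliable test gives a smaller SEM and therefore narrower confidence intervals.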
  • Confidence interval
    • A range of values around the obtained test score, likely to include the true value, with a certain degree of confidence
    • The CI is paired with a percentage, like 68% or 95%
    • The percentage (e.g. 95%) tells us that if we calculated a CI from 100 different samples, about 95 of them would contain the true score
  • WAIS-IV Confidence Intervals
    • Average SEM 2.16 for Full Scale IQ
    • SEMs vary across age and for each index/subtest
    • Example: obtained score = 90; 68% CI = 90 +/- 2.16 (1 SEM); 95% CI = 90 +/- 4.32 (approx. 2 SEM)
    • With 95% confidence, the true score lies between 85.68 and 94.32
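The worked example (obtained score 90, SEM 2.16) can be reproduced with a few lines. This sketch builds the interval as obtained score +/- z x SEM; the multipliers 1 and 2 match the example, while 1.96 is the more exact multiplier for 95%. The function name is hypothetical.

```python
def confidence_interval(obtained: float, sem: float, z: float = 1.0):
    """Return (lower, upper) bounds of obtained +/- z * SEM, rounded to 2 dp."""
    margin = z * sem
    return (round(obtained - margin, 2), round(obtained + margin, 2))


print(confidence_interval(90, 2.16, z=1))     # about 68% confidence
print(confidence_interval(90, 2.16, z=2))     # about 95%, using 2 SEM
print(confidence_interval(90, 2.16, z=1.96))  # 95%, using the exact multiplier
```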
  • Aspects of validity
    • Face validity
    • Content validity
    • Known-group validity
    • Criterion-related validity
    • Concurrent validity
    • Predictive validity
  • Face validity
    • Whether the test appears, at face value, to measure what it claims to measure
    • People rate the validity of a test as it appears to them; it reflects only the judgement of the rater
  • Construct Validity
    • The overall ability to measure the construct
    • The extent to which it captures the model or theory of the construct being measured
    • Different approaches and methods need to be combined
  • Factor analysis
    • Method used to determine how many factors (dimensions) are included in a construct, and which tasks best represent those factors
  • Confirmatory Factor Analysis
    • To test hypotheses about the structure of the data
    • Various computer programmes: LISREL, EQS, Mx
    • Programme produces statistics showing how closely the postulated structure fits the actual data
    • Common practice to try out several models - choose the one which gives the best fit
    • 'Path diagram' used to show relationships between variables, common factors and unique factors
  • Weiss et al (2013) Confirmatory factor analysis: Four or five factors?
  • WISC-V (2016) Five Factor Model
    • Verbal Comprehension
    • Visual Spatial
    • Fluid Reasoning
    • Working Memory
    • Processing Speed
  • Known-group validity

    • The extent to which a test can differentiate between groups where differences are expected
    • The groups are initially distinguished by some means, then tested
  • WAIS-IV Known group validity
    • Intellectually Gifted: 34 people aged 17-64 years (current members of Mensa); composite scores significantly higher than matched controls
    • Mild or Moderate Intellectual Disability: 104 people aged 16-63 years with a diagnosis (73 'mild', 31 'moderate'); both groups scored significantly lower on all subtests than matched control groups