Chapter 5: Reliability

Cards (29)

  • Reliability - Refers to consistency in measurement; a reliable measure produces similar results each time, not necessarily consistently good or bad results, just consistent ones.
  • Reliability coefficient - Is a statistic that quantifies reliability, ranging from 0 (not at all reliable) to 1 (perfectly reliable).
  • Measurement error - Refers to the inherent uncertainty associated with any measurement, even after care has been taken to minimize preventable mistakes.
  • True score - In the true score model, the long-run average of the observed scores a person would earn over repeated administrations of the same test; it is tied to the measurement instrument used.
  • Construct score - A person's standing on a theoretical variable, independent of any particular measurement instrument.
  • Carryover effects - Are measurement processes that alter what is measured (e.g., when taking a test the first time changes how a person performs on a later administration).
  • True - The observed score X is related to the true score T and the measurement error score E with this famous equation: X = T + E. (True or false)
  • True - If people’s observed scores are mostly determined by their true scores, the test is reliable. If people’s observed scores are mostly determined by measurement error, the test is unreliable. (True or false)
  • A statistic useful in describing sources of test score variability is the variance (σ²), the square of the standard deviation.
  • True variance - Variance from true differences
  • Error variance - Variance from irrelevant or random sources
  • True - The greater the proportion of the total variance attributed to true variance, the more reliable the test. (True or false)
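
A minimal sketch of the last few cards in Python, using entirely fabricated scores: simulate X = T + E, then check that the ratio of true variance to total variance behaves as a reliability coefficient.

```python
# True score model: observed = true + error, with fabricated numbers.
import numpy as np

rng = np.random.default_rng(0)

n = 10_000                      # simulated test takers
true = rng.normal(100, 15, n)   # true scores T
error = rng.normal(0, 5, n)     # random error E, mean zero
observed = true + error         # observed scores X = T + E

true_var = true.var()           # variance from true differences
error_var = error.var()         # variance from irrelevant/random sources
total_var = observed.var()      # approximately true_var + error_var

print(f"true variance:  {true_var:.1f}")
print(f"error variance: {error_var:.1f}")
print(f"reliability (true/total): {true_var / total_var:.3f}")
```
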
  • Random error - A measurement error that consists of unpredictable fluctuations and inconsistencies of other variables in the measurement process.
  • Systematic error - Measurement errors that do not cancel each other out because they influence test scores in a consistent direction.
  • Bias - Refers to the degree to which systematic error influences the measurement.
  • Different sources of error variance:
    1. Test construction
    2. Test administration
    3. Test scoring and interpretation
  • Item sampling - One source of error variance during test construction; refers to variation among items within a test as well as to variation among items between tests.
  • Potential sources of error variance during test administration:
    1. Test taker variables
    2. Examiner related variables
  • Test-retest reliability - An estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
  • When the interval between testing is greater than six months, the estimate of test-retest reliability is often referred to as the coefficient of stability.
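
A minimal sketch of a test-retest estimate, again with fabricated data: correlate pairs of scores from the same simulated people on two administrations. The same correlation computed between scores on two forms, rather than two occasions, would serve as a parallel- or alternate-forms estimate.

```python
# Test-retest reliability: correlate two administrations (fabricated data).
import numpy as np

rng = np.random.default_rng(1)
true = rng.normal(50, 10, 500)          # stable true scores
time1 = true + rng.normal(0, 4, 500)    # first administration
time2 = true + rng.normal(0, 4, 500)    # second administration

r = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability estimate: {r:.3f}")
```
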
  • Parallel-forms reliability - A reliability estimate of the extent to which item sampling and other errors have affected test scores on versions of the same test when, for each form of the test, the means and variances of observed test scores are equal.
  • Alternate-forms reliability - A reliability estimate of the extent to which different forms of the same test have been affected by item sampling error or other error.
  • Inter-item consistency - An estimate of internal consistency that refers to the degree of correlation among all the items on a scale.
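
One common index of inter-item consistency is coefficient alpha; a minimal sketch, computed from a fabricated person-by-item score matrix:

```python
# Coefficient alpha: k/(k-1) * (1 - sum(item variances) / total variance).
import numpy as np

rng = np.random.default_rng(2)
ability = rng.normal(0, 1, (200, 1))            # shared trait (fabricated)
items = ability + rng.normal(0, 1, (200, 8))    # 8 items tapping that trait

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1)           # variance of each item
total_var = items.sum(axis=1).var(ddof=1)       # variance of total scores

alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(f"coefficient alpha: {alpha:.3f}")
```
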
  • Split-half reliability - A reliability estimate obtained by correlating the scores on two equivalent halves of a single test that was administered once.
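
A minimal sketch of a split-half estimate, assuming an odd-even split of a fabricated item matrix; the half-test correlation is then stepped up with the Spearman-Brown formula to estimate full-length reliability:

```python
# Split-half reliability with the Spearman-Brown correction.
import numpy as np

rng = np.random.default_rng(3)
ability = rng.normal(0, 1, (300, 1))
items = ability + rng.normal(0, 1, (300, 10))   # fabricated item scores

odd_half = items[:, 0::2].sum(axis=1)           # items 1, 3, 5, ...
even_half = items[:, 1::2].sum(axis=1)          # items 2, 4, 6, ...

r_half = np.corrcoef(odd_half, even_half)[0, 1]
r_full = (2 * r_half) / (1 + r_half)            # Spearman-Brown formula
print(f"half-test r: {r_half:.3f}, corrected: {r_full:.3f}")
```
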
  • Inter-scorer reliability - Is the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
  • Coefficient of inter-scorer reliability - A coefficient of correlation between scorers' scores; calculating one is the simplest way of determining the degree of consistency among scorers in the scoring of a test.
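
A minimal sketch of that coefficient, correlating the scores two simulated raters assign to the same set of products:

```python
# Inter-scorer reliability: correlation between two raters (fabricated data).
import numpy as np

rng = np.random.default_rng(4)
quality = rng.uniform(1, 10, 100)             # hypothetical essay quality
rater_a = quality + rng.normal(0, 1, 100)     # rater A's scores
rater_b = quality + rng.normal(0, 1, 100)     # rater B's scores

r = np.corrcoef(rater_a, rater_b)[0, 1]
print(f"coefficient of inter-scorer reliability: {r:.3f}")
```
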
  • The true score model of measurement and alternatives to it:
    1. Classical test theory
    2. Domain sampling theory
    3. Generalizability theory
    4. Item response theory
  • Standard error of measurement - The tool used to estimate or infer the extent to which an observed score deviates from a true score.
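
The standard error of measurement can be computed from a test's standard deviation and its reliability coefficient as SEM = SD * sqrt(1 - r); a minimal sketch with illustrative values:

```python
# SEM = SD * sqrt(1 - r), where r is the reliability coefficient.
import math

sd = 15.0          # score standard deviation (illustrative value)
r = 0.90           # reliability coefficient (illustrative value)

sem = sd * math.sqrt(1 - r)
print(f"SEM = {sem:.2f}")

# An approximate 95% confidence band around an observed score of 100:
low, high = 100 - 1.96 * sem, 100 + 1.96 * sem
print(f"95% band: {low:.1f} to {high:.1f}")
```
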
  • Standard error of the difference - A statistical measure that can aid a test user in determining how large a difference between two scores should be before it is considered statistically significant.
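
The standard error of the difference combines the two scores' standard errors of measurement: SED = sqrt(SEM1^2 + SEM2^2), which reduces to SD * sqrt(2 - r1 - r2) when both measures share the same standard deviation. A minimal sketch with illustrative values:

```python
# SED = SD * sqrt(2 - r1 - r2) for two measures on the same scale.
import math

sd = 15.0                    # shared standard deviation (illustrative)
r1, r2 = 0.90, 0.85          # reliabilities of the two measures

sed = sd * math.sqrt(2 - r1 - r2)
print(f"SED = {sed:.2f}")

# A difference must exceed about 1.96 * SED to be significant at p < .05:
print(f"minimum significant difference: {1.96 * sed:.2f}")
```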