Reliability – dependability or consistency of the instrument or scores obtained by the same person when re-examined with the same test on different occasions, or with different sets of equivalent items.
Notes on reliability:
▪ A test may be reliable in one context but unreliable in another
▪ Reliability estimates allow us to gauge the range of possible random fluctuations that can be expected in an individual’s score
▪ A reliable test is relatively free from errors of measurement
▪ A greater number of items = higher reliability
▪ Error is minimized by using a representative sample of items to obtain the observed score
▪ The true score itself can never be found
Reliability Coefficient: index of reliability; a proportion that indicates the ratio between the true score variance on a test and the total variance.
Classical Test Theory (True Score Theory) – a score on an ability test is presumed to reflect not only the testtaker’s true score on the ability being measured but also error.
Error – the component of the observed test score that does not have to do with the testtaker’s ability.
Errors of measurement are random
X = T + E, where X is the raw (observed) score, T is the true score, and E is the error.
Factors that contribute to inconsistency – characteristics of the individual, test, or situation that have nothing to do with the attribute being measured but still affect the scores.
When you average all the observed scores obtained over a period of time, the result will be closest to the true score.
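A minimal simulation of this idea (hypothetical numbers, assuming numpy is available): repeated observed scores X = T + E scatter randomly around the true score, and their average converges toward it.

```python
import numpy as np

rng = np.random.default_rng(0)

true_score = 50                       # T: the (unobservable) true score
errors = rng.normal(0, 5, size=1000)  # E: random errors with mean 0
observed = true_score + errors        # X = T + E

# Any single observed score may be off by several points...
print(observed[:5].round(1))
# ...but the average of many observed scores sits close to the true score.
print(observed.mean().round(2))  # ~50
```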
Goals of Reliability: ✓ Estimate errors ✓ Devise techniques to improve testing and reduce errors
Variance – useful in describing sources of test score variability ▪ True Variance: variance from true differences ▪ Error Variance: variance from irrelevant random sources
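A hedged sketch of how these pieces fit together (illustrative numbers only): when errors are random and unrelated to true scores, total variance is approximately true variance plus error variance, and the reliability coefficient defined above is the ratio of true variance to total variance.

```python
import numpy as np

rng = np.random.default_rng(1)

true_scores = rng.normal(100, 10, size=5000)  # true differences between people
errors = rng.normal(0, 5, size=5000)          # irrelevant random error
observed = true_scores + errors

var_true = true_scores.var()
var_error = errors.var()
var_total = observed.var()

# Total variance ~ true variance + error variance
print(round(var_total, 1), round(var_true + var_error, 1))

# Reliability coefficient: proportion of total variance that is true variance
r = var_true / var_total
print(round(r, 2))  # ~ 10**2 / (10**2 + 5**2) = 0.80
```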
Measurement Error – all of the factors associated with the process of measuring some variable, other than the variable being measured.
Measurement error is the difference between the observed score and the true score.
Positive error can increase one’s score; negative error can decrease it.
Sources of error variance:
a. Item Sampling/Content Sampling – refers to variation among items within a test as well as to variation among items between tests; that is, the selection of items that are representative of the construct being assessed. The extent to which a testtaker’s score is affected by the content sampled on a test, and by the way the content is sampled, is a source of error variance.
b. Test Administration – the instructions given to examinees, time limits, the use of standardized procedures, the testtaker’s motivation or attention, the testing environment, etc.
c. Test Scoring and Interpretation – the accuracy and consistency with which scores are calculated and interpreted; tests may employ objective-type items amenable to computer scoring of well-documented reliability.
The reliability coefficient (r) measures the degree to which two measurements of the same thing agree with each other. As a correlation it can range from -1 to +1, but reliability coefficients are expected to fall between 0 (no consistency) and +1 (perfect consistency).
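As a concrete illustration (made-up scores), a test-retest reliability estimate is simply the correlation between two administrations of the same test:

```python
import numpy as np

# Hypothetical scores of 8 people tested twice, two weeks apart
time1 = np.array([12, 15, 9, 20, 14, 17, 11, 18])
time2 = np.array([13, 14, 10, 19, 15, 16, 12, 17])

r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(round(r_test_retest, 2))  # high positive r = stable scores over time
```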
Reliability coefficients can be classified into three broad types according to the kind of consistency they index: internal consistency among items, stability over time, and equivalence across forms.
Internal Consistency Reliability Coefficients measure the degree to which items on a single form of a test are related to each other. They include Cronbach’s alpha, the Kuder-Richardson Formula 20 (KR-20), and the Kuder-Richardson Formula 21 (KR-21).
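A sketch of Cronbach’s alpha computed directly from its standard formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores); the response data are made up for illustration.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an items matrix (rows = persons, columns = items)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of persons' total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical responses of 6 people to 4 Likert-type items
scores = np.array([
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [3, 3, 4, 3],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
])
print(round(cronbach_alpha(scores), 2))  # closer to 1 = more internally consistent
```

KR-20 is the special case of this formula for dichotomously scored (0/1) items, where each item’s variance reduces to p times q.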
Reliability can be estimated using various methods, including internal consistency, alternate forms, split-half, interscorer, test-retest, and parallel forms.
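For example, a split-half estimate can be sketched as follows (simulated data): correlate scores on the odd- and even-numbered halves of the test, then apply the Spearman-Brown formula to estimate the reliability of the full-length test.

```python
import numpy as np

# Simulated item responses driven by a common ability (rows = persons)
rng = np.random.default_rng(2)
ability = rng.normal(0, 1, size=100)
items = ability[:, None] + rng.normal(0, 1, size=(100, 10))

odd_half = items[:, 0::2].sum(axis=1)   # 1st, 3rd, 5th, ... items
even_half = items[:, 1::2].sum(axis=1)  # 2nd, 4th, 6th, ... items

r_half = np.corrcoef(odd_half, even_half)[0, 1]

# Spearman-Brown correction: reliability of the test at double the half length
r_full = 2 * r_half / (1 + r_half)
print(round(r_half, 2), round(r_full, 2))
```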
Validity – the extent to which an instrument actually measures what it purports to measure.
Face Validity – the test appears, on its surface, to assess what it purports to measure.
Content Validity – whether the content of a test accurately represents the construct being assessed; determined by judging whether the items adequately sample the domain being measured.
Construct Validity – how well a test measures a particular construct (e.g., intelligence); determined by analyzing how well the results from the test match up with what we know about the underlying concept it measures.
Criterion-Related Validity – comparing scores on one test with a criterion, such as scores on another test that has already been shown to be valid.
Concurrent Validity – a form of criterion-related validity in which the test scores and the criterion measures are obtained at about the same time.
Random Error – a source of error in measuring a targeted variable, caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (e.g., noise, temperature, weather).
Systematic Error – a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured. It has a consistent effect on the observed scores: the SD does not change, but the mean does.
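A tiny numeric check of that contrast (illustrative only): adding a constant (systematic) error shifts the mean but not the SD, while random error leaves the mean roughly unchanged and inflates the spread.

```python
import numpy as np

rng = np.random.default_rng(3)
true_scores = rng.normal(100, 10, size=1000)

systematic = true_scores + 5                            # constant bias
random_err = true_scores + rng.normal(0, 5, size=1000)  # random noise

# Systematic error: mean shifts by the bias, SD is unchanged
print(round(systematic.mean() - true_scores.mean(), 1))  # ~5.0
print(round(systematic.std() - true_scores.std(), 1))    # ~0.0

# Random error: mean is roughly unchanged, SD grows
print(round(random_err.mean() - true_scores.mean(), 1))  # ~0.0
print(round(random_err.std() - true_scores.std(), 1))    # > 0
```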
Reliability refers to the proportion of total variance attributed to true variance.
The greater the proportion of the total variance attributed to true variance, the more reliable the test.
Error variance may increase or decrease a test score by varying amounts; consequently, the consistency of the test score, and thus the reliability, can be affected.