Reliability refers to the consistency of test scores obtained by the same persons when they are re-examined with the same test on different occasions, or with different sets of equivalent items, or under varying examining conditions.
Reliability coefficient is an index of reliability, a proportion that indicates the ratio between the true score variance on a test and the total variance.
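Written out, the ratio described above can be sketched as follows. The symbols are assumptions for illustration and do not come from these notes: X is the observed score, T the true score, E the error, and r_xx the reliability coefficient. The second equality holds only under the classical test theory assumption that error is uncorrelated with the true score.

```latex
X = T + E, \qquad
r_{xx} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
```

Under this reading, a coefficient of 1 would mean that all observed variance is true score variance, and a coefficient of 0 would mean that all of it is error.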
Random Error: a source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process (i.e., noise).
Systematic Error: a source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured.
Test Administration: sources of error may stem from the testing environment and test taker variables such as emotional problems, physical discomfort, lack of sleep, and the effects of drugs or medication.
Test Scoring and Interpretation: computer testing reduces error in test scoring, but many tests still require expert interpretation (e.g., projective tests).
Test-Retest reliability: an estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
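A minimal sketch of the correlation step, assuming a Pearson correlation is the coefficient used. The function name pearson_r and the score lists are hypothetical and made up for illustration.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares around each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary on both administrations")
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same five people on two administrations.
time_1 = [10, 12, 15, 18, 20]
time_2 = [11, 12, 14, 19, 21]

test_retest_r = pearson_r(time_1, time_2)
print(round(test_retest_r, 2))
```

Each position in the two lists must belong to the same person, since the estimate depends on pairing scores across administrations.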
Alternate forms: different versions of a test that have been constructed to be parallel. They do not meet the strict requirements of parallel forms, but item content and difficulty are typically similar between the tests.
Coefficient alpha: developed by Cronbach to estimate the internal consistency of tests in which the items are not scored as 0 or 1 (wrong or right). Its value typically ranges from 0 to 1.
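As a rough sketch, coefficient alpha can be computed from the usual textbook formula: k / (k - 1) times (1 minus the sum of the item variances divided by the variance of the total scores), where k is the number of items. The function name cronbach_alpha and the ratings below are hypothetical.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Coefficient alpha for rows of scores (one row per person, one column per item)."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 people")
    # Transpose so each tuple holds every person's score on one item.
    columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("total scores must vary")
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical ratings: five people answering three items on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 1],
    [3, 3, 3],
]

alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

The sketch uses sample variances throughout. Population variances would give the same alpha as long as the same choice is applied to both the items and the totals.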
Reliability estimates in the range of .70 to .80 are good enough for most purposes in basic research. In clinical settings, where high reliability is extremely important or tests are used for life-and-death decisions, a higher standard applies (i.e., reliability of .90 to .95).
Inter-scorer reliability: the degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure. It is often used with behavioral measures and guards against biases or idiosyncrasies in scoring.
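One way to quantify agreement between two scorers is sketched below, using simple percent agreement and Cohen's kappa. Kappa is not named in these notes. It is brought in here as a common chance-corrected index for categorical ratings. The function names and the ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(rater_1, rater_2):
    """Proportion of cases on which the two raters gave the same rating."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(a == b for a, b in zip(rater_1, rater_2)) / len(rater_1)


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_1)
    p_observed = percent_agreement(rater_1, rater_2)
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(counts_1) | set(counts_2)
    # Chance agreement: product of each rater's marginal proportions per category.
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / (n * n)
    if p_expected == 1:
        raise ValueError("kappa is undefined when both raters use one category only")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical ratings of eight observed behaviors by two raters.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(agreement, kappa)  # 0.75 0.5
```

Percent agreement is easier to read, but it does not account for agreement that would occur by chance, which is why kappa comes out lower here.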
Test-retest reliability is most appropriate for variables that should be stable over time (e.g., personality) and not appropriate for variables expected to change over time (e.g., mood or other states).
The test is a speed or a power test. A power test has a time limit long enough to allow test takers to attempt all items. A speed test contains items of a uniform level of difficulty so that, when given generous time limits, all test takers should be able to complete all the test items correctly.
Domain sampling theory: for example, one’s spelling ability is evaluated with a sample of words as test items instead of every word in the dictionary.