Test Development - an umbrella term for all that goes into the process of creating a test.
Test Conceptualization - Brainstorming of ideas about what kind of test a developer wants to publish
Pilot Testing - The first time the test is administered to participants who are not part of the normative sample. This helps identify any issues or problems with the test.
Normative Sample - A group of people from which scores will be used as reference points when interpreting results.
Validity - Whether a test measures what it claims to measure.
Reliability - How consistent a test's score is over repeated administrations.
Test Construction - stage in the process that entails writing, revising, and formatting test items and setting scoring rules
scaling - process of setting rules for assigning numbers in measurement
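Age-based scale - scale used when test performance as a function of age is of critical interest
Grade-based scale - scale used when test performance as a function of grade is of critical interest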
Stanine - scale used when all raw scores on the test are to be transformed into scores that range from 1 to 9
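As a minimal sketch of the stanine transformation, the snippet below maps a percentile rank to a stanine using the conventional band widths (4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4%). The function name and the choice to send a score exactly on a boundary to the higher stanine are assumptions made for illustration, not part of the definition above.

```python
from bisect import bisect_right

# Cumulative percentage cutoffs for the conventional stanine bands
# (4%, 7%, 12%, 17%, 20%, 17%, 12%, 7%, 4%).
STANINE_CUTOFFS = [4, 11, 23, 40, 60, 77, 89, 96]


def stanine_from_percentile(percentile_rank):
    """Map a percentile rank (0-100) to a stanine from 1 to 9.

    Assumption: a percentile rank exactly on a cutoff goes to the
    higher stanine.
    """
    if not 0 <= percentile_rank <= 100:
        raise ValueError("percentile rank must be between 0 and 100")
    # Count how many cutoffs lie at or below the percentile rank.
    return bisect_right(STANINE_CUTOFFS, percentile_rank) + 1
```

For example, a test taker at the 50th percentile falls in the middle band and receives a stanine of 5.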
rating scale - grouping of words, statements, or symbols on which judgments of the strength of a particular trait are indicated by the test taker
Summative scale - final score is obtained by summing the ratings across all the items
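A summative score can be sketched as a simple sum of item ratings. The example below assumes a 1-to-5 rating range and an optional set of reverse-keyed items; both are illustrative assumptions, and the function and parameter names are hypothetical.

```python
def summative_score(ratings, reverse_keyed=(), low=1, high=5):
    """Sum the ratings across all items to obtain the final score.

    ratings: one rating per item, in item order.
    reverse_keyed: zero-based positions of items scored in reverse
        (an assumption for illustration; omit it for a plain sum).
    """
    total = 0
    for index, rating in enumerate(ratings):
        if not low <= rating <= high:
            raise ValueError(f"rating {rating} is outside {low}-{high}")
        if index in reverse_keyed:
            # Flip the rating so that 1 becomes 5, 2 becomes 4, and so on.
            rating = low + high - rating
        total += rating
    return total
```

With ratings of 4, 5, 2, and 1 and no reverse-keyed items, the final score is 12.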
Likert scale - summative rating scale used to scale attitudes; usually reliable
Thurstone scale - collection of a variety of different statements about a phenomenon which are ranked by an expert panel in order to develop the questionnaire
Guttman scale - yields ordinal level measures
Method of paired comparison - produces ordinal data by presenting test takers with pairs of stimuli, which they are asked to compare
Item pool - reservoir or well from which the items will or will not be drawn for the final version of the test
Item Format - form, plan, structure, arrangement, and layout of individual test items
Multiple choice - Has three elements: a stem or question, a correct option, and several incorrect alternatives (distractors or foils)
Matching item - test taker is presented with two columns: premises and responses
Binary choice - true or false item
Item banks - relatively large and easily accessible collection of test questions
Computerized adaptive testing - a form of computer-administered testing that varies the difficulty, level, and order of the questions asked depending on the test taker's performance within the test
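Floor and ceiling effects - diminished ability of a test to distinguish among test takers at the low end (floor) or the high end (ceiling) of the ability being measured, because scores pile up at the lowest or highest possible score
outlier - an extremely atypical score that lies far from the rest of the scores in a distribution, whether very high or very low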
Item branching - ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items.
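Item branching can be illustrated with a deliberately simple rule: move up one difficulty level after a correct response and down one level after an incorrect response. This rule, the five difficulty levels, and the function names are all hypothetical; operational adaptive tests typically use more sophisticated selection procedures.

```python
def next_difficulty(current, was_correct, lowest=1, highest=5):
    """Pick the difficulty level of the next item from the last response."""
    step = 1 if was_correct else -1
    # Stay within the available range of difficulty levels.
    return min(highest, max(lowest, current + step))


def difficulty_path(responses_correct, start=3):
    """Return the difficulty level presented at each step of the test."""
    level = start
    path = [level]
    for was_correct in responses_correct:
        level = next_difficulty(level, was_correct)
        path.append(level)
    return path
```

A test taker who starts at level 3 and answers correct, correct, incorrect, correct would see levels 3, 4, 5, 4, and then 5.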
cumulative scoring - the higher the score achieved on the test, the higher the test taker is on the ability that the test purports to measure
Class scoring/Category scoring - test taker responses earn credit toward placement in a particular class or category with other test takers whose pattern of responses is presumably similar in some way.
Ipsative scoring - comparing a test taker's score on one scale within a test to another scale within that same test
Semantic Differential rating technique - measures an individual's unique, perceived meaning of an object, a word, or an individual, usually through ratings on scales anchored by bipolar adjectives (for example, good-bad or strong-weak).
Test tryout - test should be tried out on people who are similar in critical respects to the people for whom the test was designed.
Pseudobulbar affect - neurological disorder characterized by frequent involuntary outbursts of laughing or crying that may not be appropriate.
Empirical criterion keying - approach to test development that emphasizes the selection of items that discriminate between normal individuals and members of different diagnostic groups
Item Analysis - statistical procedure used to analyze items
Item Difficulty - defined by the number of people who get a particular item correct
Item difficulty index - calculating the proportion of the total number of test takers who answered the item correctly
Item Reliability index - provides an indication of the internal consistency of a test
Item Validity Index - Designed to provide an indication of the degree to which a test measures what it purports to measure
Item-discrimination index - measure of the difference between the proportion of high scorers answering an item correctly and the proportion of low scorers answering the item correctly
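The item difficulty index and the item-discrimination index defined above can be sketched in a few lines. The snippet assumes items are scored 1 (correct) or 0 (incorrect) and forms the high- and low-scoring groups from the top and bottom 27% of total scores, a common convention; the function names and the `fraction` parameter are illustrative assumptions.

```python
def item_difficulty(item_responses):
    """Proportion of all test takers who answered the item correctly."""
    return sum(item_responses) / len(item_responses)


def item_discrimination(item_responses, total_scores, fraction=0.27):
    """Proportion correct among high scorers minus that among low scorers.

    item_responses: 1 (correct) or 0 (incorrect) per test taker.
    total_scores: each test taker's total test score, in the same order.
    fraction: share of test takers placed in each extreme group
        (0.27 is a common convention, assumed here).
    """
    n = len(total_scores)
    group_size = max(1, round(n * fraction))
    # Rank test takers from lowest to highest total score.
    order = sorted(range(n), key=lambda i: total_scores[i])
    lower = order[:group_size]
    upper = order[-group_size:]
    p_upper = sum(item_responses[i] for i in upper) / group_size
    p_lower = sum(item_responses[i] for i in lower) / group_size
    return p_upper - p_lower


item = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
totals = [95, 90, 85, 80, 70, 65, 60, 50, 40, 30]
difficulty = item_difficulty(item)
discrimination = item_discrimination(item, totals)
```

In this made-up data set half of the ten test takers answer the item correctly, so the difficulty index is 0.5. All three high scorers and none of the three low scorers answer correctly, so the discrimination index is 1.0, the maximum possible value.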