Test development
An umbrella term for all that goes into the process of creating a test
Test development process
1. Test conceptualization
2. Test construction
3. Test tryout
4. Analysis
5. Revision
Test conceptualization
An idea for a test is conceived
Test construction
1. Writing test items (or re-writing/revising existing items)
2. Formatting items
3. Setting scoring rules
4. Designing and building the test
Scaling
The process of setting rules for assigning numbers in measurement; the process by which a measuring device is designed and calibrated and by which numbers (or other indices) – scale values – are assigned to different amounts of the trait, attribute, or characteristic being measured
Types of scales
Age-based scale
Grade-based scale
Stanine scale
Unidimensional versus multidimensional scale
Comparative versus categorical scale
Examples of Scaling methods
Rating scale
Summative scale (e.g. Likert scale)
Unidimensional and multidimensional scaling
Method of paired comparisons
Comparative scaling
Categorical scaling
Guttman scale/Scalogram analysis
Writing test items
Determining the range of content to cover
Selecting the item formats to employ
Deciding how many items to write in total and for each content area
Item pool
Reservoir from which items will or will not be drawn for the final version of the test
Types of Item formats
Selected-response format (e.g. multiple-choice, matching, true-false)
Constructed-response format (e.g. completion, short-answer, essay)
Multiple-choice items
Stem
Correct alternative/option
Distractors/foils
General item development guidelines and checklists exist for multiple-choice, Likert-type, matching, true-false, short-answer, and essay items
Scoring items
Cumulative model (cumulative credit for a construct)
Class/category scoring (credit for placement in a particular class/category)
Ipsative scoring (comparing a testtaker's score on one scale to another scale within the same test)
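A minimal Python sketch (illustrative only; the scale names and responses are invented) contrasting the cumulative and ipsative models on a hypothetical two-scale inventory:

# Hypothetical item responses keyed 1 (endorsed) / 0 (not endorsed),
# grouped by the scale each item belongs to.
responses = {
    "artistic":   [1, 1, 0, 1, 1],
    "scientific": [1, 0, 0, 1, 0],
}

# Cumulative model: the more items endorsed in the keyed direction,
# the higher the raw score on that construct.
raw = {scale: sum(items) for scale, items in responses.items()}
print(raw)  # {'artistic': 4, 'scientific': 2}

# Ipsative scoring (one illustrative representation): interpret each
# scale relative to the testtaker's other scales, not relative to
# other people -- here, as a share of this person's total endorsements.
total = sum(raw.values())
ipsative = {scale: score / total for scale, score in raw.items()}
print(ipsative)  # 'artistic' dominates within this person's own profile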
Test tryout
The test is tried out on people similar to the intended testtakers, under conditions as identical as possible to those under which the standardized test will be administered
The larger the tryout sample, the better; a common rule of thumb is no fewer than five subjects per test item
Item analysis
Analysis of testtakers' performance on the test as a whole and on each item
Statistical procedures are employed to assist in making judgments about which items are good, need revision, or should be discarded
Tools to analyze and select items
Item-difficulty index
Item-reliability index
Item-validity index
Item-discrimination index
Item-difficulty index
Proportion of total testtakers who answered the item correctly; the larger the index, the easier the item
Optimal item difficulty
The midpoint between a perfect proportion correct (1.00) and the chance success proportion: optimal p = (1.00 + chance) / 2
For binary-choice items (chance = .50): .75
For four-option multiple-choice items (chance = .25): .625
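A short Python sketch of both ideas, using made-up item scores; optimal difficulty here is the midpoint between a perfect score (1.00) and chance success:

def item_difficulty(item_scores):
    # Proportion of testtakers answering the item correctly (1 = correct).
    return sum(item_scores) / len(item_scores)

def optimal_difficulty(n_options):
    # Midpoint between 1.00 and the chance success proportion (1/n_options).
    chance = 1.0 / n_options
    return (1.0 + chance) / 2

print(item_difficulty([1, 1, 0, 1, 0, 1, 1, 0]))  # 0.625
print(optimal_difficulty(2))  # 0.75  (binary choice / true-false)
print(optimal_difficulty(4))  # 0.625 (four-option multiple choice)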
Item-reliability index
Indication of the internal consistency of a test; the higher the index, the greater the test's internal consistency
Item-validity index
Indication of the degree to which a test is measuring what it purports to measure; the higher the index, the greater the test's criterion-related validity
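A Python sketch of one common formulation of these two indices (item-score standard deviation multiplied by the item-total or item-criterion correlation); the scores below are invented for illustration:

import statistics

def pearson_r(x, y):
    # Pearson correlation between two equal-length score lists.
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

item      = [1, 0, 1, 1, 0, 1]   # scores on one item (1 = correct)
total     = [9, 4, 8, 7, 5, 9]   # total test scores
criterion = [6, 3, 7, 6, 4, 8]   # scores on an external criterion

s_i = statistics.pstdev(item)    # item-score standard deviation
print(s_i * pearson_r(item, total))      # item-reliability index
print(s_i * pearson_r(item, criterion))  # item-validity index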
Item-discrimination index
Indication of how adequately an item separates or discriminates between high scorers and low scorers on an entire test; the higher the value, the more adequately the item discriminates
A negative item-discrimination index is a red flag: it means low scorers on the test as a whole answered the item correctly more often than high scorers (e.g., the item may be miskeyed or misleading)
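A small Python sketch of the classic extreme-group method for computing the discrimination index d, assuming equal-sized upper and lower groups (often the top and bottom 27% of total scorers); the data are made up:

def discrimination_index(upper_item_scores, lower_item_scores):
    # d = (U - L) / n, where U and L are the numbers of high and low
    # scorers who answered the item correctly and n is one group's size.
    # d ranges from -1 to +1; negative d means low scorers outperformed
    # high scorers on this item.
    n = len(upper_item_scores)  # assumes equal-sized groups
    return (sum(upper_item_scores) - sum(lower_item_scores)) / n

upper = [1, 1, 1, 0, 1]  # item scores of the 5 highest total scorers
lower = [0, 1, 0, 0, 0]  # item scores of the 5 lowest total scorers
print(discrimination_index(upper, lower))  # 0.6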
Qualitative item analysis
Nonstatistical procedures designed to explore how individual test items work; compares individual items to each other and to the test as a whole
Test revision
Actions taken to modify a test's content or format for the purpose of improving the test's effectiveness as a tool of measurement
When are existing tests due for revision?
Stimulus materials look dated and current testtakers cannot relate to them
Verbal content contains dated vocabulary not readily understood by current testtakers
Words/expressions perceived as inappropriate or offensive due to changes in popular culture
Test norms are no longer adequate due to group membership changes or age-related shifts in abilities
Reliability, validity, or item effectiveness can be significantly improved
The theory on which the test was based has been improved significantly
Preliminary questions in TEST CONCEPTUALIZATION:
What is the test designed to measure?
What is the objective of the test?
Is there a need for this test?
Who will use this test?
Who will take the test?
What content will the test cover?
How will the test be administered?
What is the ideal format of the test?
Should more than one form of the test be developed?
What special training will be required of the test users for administering or interpreting the test?
What types of responses will be required of testtakers?
Who benefits from an administration of this test?
Is there any potential for harm as the result of an administration of this test?
How will meaning be attributed to scores on this test?
Scaling methods
• Assignment of numbers to responses so that a test score can be calculated
Rating scale
– a grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the testtaker.
Summative scale
– summing ratings across all the items to obtain the final test score
Method of paired comparisons
– testtakers are presented with pairs of stimuli which they are asked to compare; they must select one stimulus from each pair according to some rule (e.g., the one they agree with more)
Comparative scaling
– entails judgments of a stimulus in comparison with every other stimulus on the scale
Categorical scaling
– stimuli are placed into one of two or more alternative categories that differ quantitatively with respect to some continuum
Guttman scale/Scalogram analysis
– items range sequentially from weaker to stronger expressions of the attitude, belief, or feeling being measured
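To make two of these methods concrete, here is a brief Python sketch (with invented responses) of summative Likert-type scoring and a simple check for a perfect Guttman response pattern:

# Summative (Likert-type) scale: ratings on a 1-5 agreement continuum
# are summed across items to yield the final score.
likert_ratings = [4, 5, 3, 4, 2]
print(sum(likert_ratings))  # 18

def fits_guttman_pattern(endorsements):
    # Guttman scale: items ordered weakest -> strongest (1 = endorsed).
    # A pattern fits the scalogram model if, once a testtaker rejects
    # an item, all stronger items are rejected as well.
    rejected_earlier = False
    for e in endorsements:
        if e == 0:
            rejected_earlier = True
        elif rejected_earlier:  # endorsed a stronger item after a rejection
            return False
    return True

print(fits_guttman_pattern([1, 1, 1, 0, 0]))  # True: perfect scale type
print(fits_guttman_pattern([1, 0, 1, 0, 0]))  # False: contains a reversal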
Matching item
• Testtaker is presented with two columns: premises on the left and responses on the right
• The task is to determine which response is best associated with which premise
Binary-choice item (true-false item)
• Takes the form of a sentence that requires the testtaker to indicate whether the statement is or is not a fact
*TYPES OF CONSTRUCTED-RESPONSE ITEMS
Completion item
– requires the examinee to provide a word or phrase that completes a sentence
Short-answer item
– requires a succinct response
Essay
– requires the testtaker to respond to a question by writing a composition, typically one that demonstrates recall of facts, understanding, analysis, and/or interpretation