Psychological Assessment 2

Cards (42)

  • Test
    A tool to measure a particular construct
  • Test development
    1. Test Conceptualization
    2. Test Construction
    3. Test tryout
    4. Item Analysis
    5. Test Revision
  • Test Conceptualization
    • The stage at which test developers conceive the idea of a tool to measure a particular construct
    • The stimulus for developing a test can be anything (e.g. emergence of a social phenomenon)
  • Pilot work
    • Preliminary research surrounding the creation of a prototype of the test
    • Involves the creation, revision, and deletion of test items
  • Test Construction
    1. Scaling
    2. Writing Items
    3. Item Formats
    4. Scoring Items
  • Scaling
    The process of setting rules for assigning numbers in measurement
  • Types of scales
    • Age scale
    • Grade scale
    • Stanine scale
  • Likert scale
    Used to scale attitudes; presents the test taker with five alternative responses on an agree/disagree or approve/disapprove continuum
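A minimal sketch of how Likert responses are commonly turned into numbers, assuming the usual 1-5 coding of the agree/disagree continuum; the response labels, coding, and data are illustrative and not taken from the cards.
```python
# Illustrative only: assumed 1-5 coding of a five-point agree/disagree Likert item.
LIKERT_CODES = {
    "Strongly disagree": 1,
    "Disagree": 2,
    "Neither agree nor disagree": 3,
    "Agree": 4,
    "Strongly agree": 5,
}

def likert_scale_score(responses):
    """Sum the coded responses to the items of one attitude scale."""
    return sum(LIKERT_CODES[r] for r in responses)

# Example: one test taker's answers to a three-item attitude scale (invented data).
print(likert_scale_score(["Agree", "Strongly agree", "Neither agree nor disagree"]))  # 12
```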
  • Item writing
    Considerations: range of content to cover, item formats to employ, number of items to write
  • For a standardized test, the first draft usually contains approximately twice the number of items that the final version will contain
  • Item formats
    • Selected response (multiple choice, matching, true/false)
    • Constructed response (completion, short answer, essay)
  • Scoring models
    • Cumulative model (higher score = higher ability)
    • Class model (placement in a particular class/category)
    • Ipsative scoring (comparison of a test taker's scores on different scales)
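A rough sketch of the cumulative and ipsative models named above; the item responses, scale names, and numbers are invented for illustration.
```python
# Cumulative model: credit each keyed response and add them up;
# a higher total is taken to reflect more of the measured ability or trait.
answers = [1, 0, 1, 1, 0, 1]        # 1 = keyed response, 0 = not (invented data)
cumulative_score = sum(answers)     # 4

# Ipsative scoring: compare one test taker's scores on different scales
# with each other rather than with other people (scale names are hypothetical).
profile = {"dominance": 14, "affiliation": 9, "autonomy": 11}
ranked_scales = sorted(profile, key=profile.get, reverse=True)

print(cumulative_score)   # 4
print(ranked_scales)      # ['dominance', 'autonomy', 'affiliation']
```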
  • Test Tryout
    1. The test is tried out on a sample representative of the population for which it was constructed
    2. Conditions should be as similar as possible to standardized test administration
  • Characteristics of a good test item
    • Valid and reliable
    • Discriminates among test takers (high scorers tend to get the item right; low scorers tend to get it wrong)
  • Item Analysis
    1. Employs statistical procedures to select the best items from a pool of tryout items
    2. Considers the item-difficulty, item-validity, item-reliability, and item-discrimination indexes (see the sketch after these cards)
  • Item difficulty index
    Proportion of total test takers who answered the item correctly
  • Item-reliability index
    Indication of the internal consistency of a test
  • Item-validity index
    Indication of the degree to which a test is measuring what it purports to measure
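A minimal sketch of two of the item-analysis statistics on these cards, using invented tryout data: the item-difficulty index as the proportion answering correctly, and a simple extreme-groups item-discrimination index (proportion correct among high scorers minus proportion correct among low scorers). The item-reliability and item-validity indexes, which additionally involve item-total and item-criterion correlations, are not computed here.
```python
# Invented tryout data: rows = test takers, columns = items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
]

def item_difficulty(item):
    """Item-difficulty index p: proportion of test takers answering the item correctly."""
    return sum(row[item] for row in responses) / len(responses)

def item_discrimination(item, fraction=0.5):
    """Simple discrimination index d: p in the upper-scoring group minus p in the
    lower-scoring group, with groups formed from total scores (extreme-groups approach)."""
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    upper, lower = ranked[:n], ranked[-n:]
    p_upper = sum(row[item] for row in upper) / n
    p_lower = sum(row[item] for row in lower) / n
    return p_upper - p_lower

for i in range(len(responses[0])):
    print(f"item {i + 1}: p = {item_difficulty(i):.2f}, d = {item_discrimination(i):+.2f}")
```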
  • Test Revision
    1. Eliminate and rewrite items based on item analysis
    2. Balance strengths and weaknesses across items
    3. Administer revised test under standardized conditions
  • The process of developing a test occurs in five stages: Test Conceptualization, Test Construction, Test tryout, Item Analysis, and Test Revision
  • Test Revision
    • Information gathered at the item-analysis stage
    • Some items eliminated, others rewritten
    • Characterize each item's strengths and weaknesses
    • Balance strengths and weaknesses across items
    • Administer the revised test under standardized conditions
    • Consider the test in finished form based on the item analysis
  • If many otherwise good items tend to be somewhat easy, the test developer may purposefully include some more difficult items
  • Having balanced all of these concerns, the test developer comes out of the revision stage with a test of improved quality
  • Forms of Reliability
    • Test-Retest Reliability
    • Parallel Forms Reliability
    • Inter-rater Reliability
    • Split-Half Reliability
  • Reliability
    Consistency of scores obtained by the same person when re-examined with the same test on different occasions, with different sets of equivalent items, or under other variable examining conditions
  • Test-Retest Reliability
    • Comparing scores obtained from two successive measurements of the same individuals and calculating a correlation between the two sets of scores
    • Measures the error associated with administering a test at two different times
    • Only applicable to stable traits
  • Parallel Forms Reliability
    • At least two different versions of the test yield almost the same scores
    • Compares two equivalent forms of a test that measure the same attribute
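Both test-retest and parallel-forms reliability reduce to correlating two sets of scores from the same people; a minimal sketch with invented scores (the second list could equally be scores on an alternate form rather than a retest).
```python
from statistics import correlation  # Pearson r; requires Python 3.10+

# Invented scores for the same five people tested on two occasions
# (or, for parallel-forms reliability, on form A and form B).
scores_time1 = [12, 18, 25, 9, 21]
scores_time2 = [14, 17, 24, 11, 20]

reliability = correlation(scores_time1, scores_time2)
print(f"test-retest reliability r = {reliability:.2f}")
```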
  • Inter-rater Reliability
    Degree of agreement between two observers who simultaneously record measurements of the same behavior
  • Split-Half Reliability
    Obtained by splitting the items on a questionnaire or test in half, computing a separate score for each half, and then calculating the degree of consistency between the two scores for a group of participants
  • The test can be divided according to odd- and even-numbered items (the odd-even system)
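A minimal sketch of the odd-even split described above, assuming dichotomously scored items and invented data. The Spearman-Brown step-up at the end is standard practice for estimating full-length reliability from the half-test correlation, although these cards do not mention it.
```python
from statistics import correlation  # Pearson r; requires Python 3.10+

# Invented item scores: each row is one participant's responses to a 6-item test (1/0).
data = [
    [1, 0, 1, 1, 0, 1],
    [1, 1, 1, 0, 1, 1],
    [0, 0, 1, 0, 0, 1],
    [1, 1, 0, 1, 1, 1],
    [0, 1, 0, 0, 1, 0],
]

# Odd-even system: items 1, 3, 5 form one half; items 2, 4, 6 form the other.
odd_half = [sum(row[0::2]) for row in data]
even_half = [sum(row[1::2]) for row in data]

r_half = correlation(odd_half, even_half)

# Spearman-Brown formula: estimated reliability of the full-length test.
r_full = (2 * r_half) / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected r = {r_full:.2f}")
```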
  • Validity
    Degree to which the measurement procedure measures the variable that it claims to measure; reflects the strength and usefulness of the measurement
  • Forms of Validity
    • Face Validity
    • Content Validity
    • Criterion Validity
    • Construct Validity
  • Face Validity
    Simplest and least scientific form of validity; demonstrated when a measurement appears, at face value, to measure what it is supposed to measure
  • Content Validity
    • Concerned with the extent to which the test is representative of a defined body of content consisting of topics and processes
    • Not done by statistical analysis but by the inspection of items by a panel of experts
  • Criterion Validity
    Involves the relationship or correlation between test scores and scores on another measure of the same variable (the criterion)
  • Predictive Validity
    Demonstrated when scores obtained from a measure accurately predict behavior (criterion) according to a theory
  • Concurrent Validity
    Established when the scores of a measure (predictor) are correlated with the scores of a different measure (criterion) taken at the same time
  • Construct Validity
    • Requires that the scores obtained from a measurement procedure behave exactly the same as the variable/construct itself
    • Based on many research studies that use the same measurement procedure and grows gradually as each new study contributes more evidence
  • Convergent Validity
    Involves comparing two different methods of measuring the same construct; demonstrated by a strong relationship between the scores obtained from the two methods
  • Divergent Validity
    • Refers to the demonstration of the uniqueness of a test
    • Demonstrated effectively when the test has a low correlation with measures of unrelated constructs
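A minimal sketch of the convergent/divergent logic above: a hypothetical new anxiety scale should correlate strongly with an established measure of the same construct (convergent) and weakly with a measure of an unrelated construct such as vocabulary (divergent). All measure names and scores are invented.
```python
from statistics import correlation  # Pearson r; requires Python 3.10+

# Invented scores for the same eight people on three measures.
new_anxiety_scale = [10, 14, 8, 20, 15, 7, 18, 12]
established_anxiety = [11, 15, 9, 19, 14, 8, 17, 13]   # same construct
vocabulary_test = [31, 25, 29, 27, 33, 26, 28, 30]     # unrelated construct

print(f"convergent r = {correlation(new_anxiety_scale, established_anxiety):.2f}")  # should be high
print(f"divergent  r = {correlation(new_anxiety_scale, vocabulary_test):.2f}")      # should be near zero
```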