9. Reliability and Validity

    • What is validity?
      • Refers to whether a psychological test, observation or experiment produces a legitimate result that represents what is actually 'out there' in the real world
    • Internal validity
      • A measure of whether the results obtained are solely affected by the variable being manipulated and not by other factors - the researcher has measured what they intended to
      • Major threats to internal validity include demand characteristics, which are often present in lab experiments
    • External validity
      • A measure of whether data can be generalised to other situations outside of the research environment
      • Temporal validity - the extent to which findings from a study can be applied across time
      • Ecological validity - the extent to which findings can be generalised to other settings and situations, e.g. lab environments may have low ecological validity
      • Mundane realism - the extent to which the task used to measure the DV resembles everyday life; a task that is not like everyday life has low mundane realism
    • Ways of assessing validity
      • Face validity - whether a test or scale appears, on the face of it, to measure what it's supposed to measure. Can be passed to an expert to check
      • Concurrent validity - the extent to which a psychological test or scale relates to an already established test or scale. High concurrent validity occurs when there is a close agreement between the two sets of data
    • Improving validity
      • Experiments - keep a high level of control over extraneous variables, use a comparable control group, standardise procedures to minimise participant reactivity and investigator effects, and use single-blind and double-blind procedures
      • Questionnaires - incorporate a lie scale to control for the effects of social desirability bias, guarantee anonymity, and remove leading questions
      • Qualitative research - higher ecological validity due to depth and detail of case studies, triangulation through the use of different sources as evidence
    • What is reliability?
      • How consistent a measuring device is, including psychological tests and observations which assess behaviour
      • If the same result is produced twice then the measurement is reliable
    • Ways of assessing reliability
      • Psychologists tend not to measure concrete things like length or height but are more interested in abstract concepts such as attitudes, aggression and memory
    • Test-retest
      • Administering the same test or questionnaire to the same person on different occasions
      • If the test or questionnaire is reliable then results should be similar or the same - can also be applied to interviews
      • There must be enough time between test and re-test to ensure that participants cannot recall their original answers, but not so long that their attitudes have changed
      • For questionnaires or tests, the two sets of scores are correlated; a strong positive correlation indicates that the measure is reliable
    • Inter-observer reliability
      • Observations should be conducted by at least two people to reduce subjectivity and bias - reliability is shown by agreement between two or more observers
      • This may involve a pilot study of the observation to check observers are applying behavioural categories in the same way
      • Observers need to watch the same event but record their data independently, and the data collected should be correlated to assess its reliability
      • Can also apply to content analysis
    • Measuring reliability
      • Correlational analysis - in both test-retest and inter-observer reliability, the two sets of scores are correlated
      • Correlation coefficient should exceed +0.80 for reliability
    • Improving reliability
      • Questionnaires - if test-retest reliability is low (below +0.80), amend or remove certain questions, e.g. complex or ambiguous ones. Open questions can be replaced by closed, fixed-choice ones.
      • Interviews - use the same interviewer each time; where this is not possible, all interviewers must be fully trained. Unreliability is more easily avoided in structured interviews; unstructured interviews are more likely to be unreliable
    • Improving reliability (continued)
      • Observations - make sure behavioural categories are properly operationalised and self-evident, categories should not overlap
      • Experiments - procedures must be standardised for comparison
      • Content analysis - categories used in coding must be properly operationalised
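
The correlation check described under "Measuring reliability" can be illustrated with a short sketch. It assumes Pearson's correlation coefficient (the cards do not say which coefficient is used, and a rank-based one such as Spearman's could be substituted), and the scores, function names and variable names are hypothetical, made up purely for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores does not vary")
    return cov / sqrt(var_x * var_y)


def is_reliable(xs, ys, threshold=0.80):
    """Apply the rule of thumb from the cards: the coefficient should exceed +0.80."""
    return pearson_r(xs, ys) > threshold


# Test-retest: the same five participants complete the same questionnaire twice
# (hypothetical scores).
first_test = [12, 18, 25, 31, 40]
retest = [14, 17, 27, 30, 38]
print("test-retest reliable:", is_reliable(first_test, retest))

# Inter-observer: two observers independently tally the same four behavioural
# categories while watching the same event (hypothetical tallies).
observer_a = [5, 2, 9, 0]
observer_b = [1, 8, 3, 6]
print("inter-observer reliable:", is_reliable(observer_a, observer_b))
```

With these made-up numbers the two questionnaire administrations agree closely, while the two observers' tallies do not, which would suggest the behavioural categories need to be operationalised more clearly before the observation is repeated.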