9. Reliability and Validity

Cards (12)

  • What is validity?
    • Refers to whether a psychological test, observation or experiment produces a legitimate result that represents what is actually 'out there' in the real world
  • Internal validity
    • A measure of whether the results obtained are solely affected by the variable being manipulated and not by other factors - the researcher has measured what they intended to
    • Major threats to internal validity include demand characteristics, which are often present in lab experiments
  • External validity
    • A measure of whether data can be generalised to other situations outside of the research environment
    • Temporal validity - the extent to which findings from a study can be applied across time
    • Ecological validity - the extent to which findings can be generalised to other settings and situations, e.g. lab environments may have low ecological validity
    • Mundane realism - the extent to which the task used to measure the DV resembles everyday life. Low mundane realism means the task is not like 'everyday life'
  • Ways of assessing validity
    • Face validity - whether a test or scale appears, on the face of it, to measure what it's supposed to measure. Can be passed to an expert to check
    • Concurrent validity - the extent to which a psychological test or scale relates to an already established test or scale. High concurrent validity occurs when there is a close agreement between the two sets of data
  • Improving validity
    • Experiments - keep a high level of control over extraneous variables, use a comparable control group, standardise procedures to minimise pps' reactivity and investigator effects, use single blind and double blind procedures
    • Questionnaires - incorporate a lie scale which controls effects of social desirability bias, anonymity, removal of leading questions
    • Qualitative research - higher ecological validity due to depth and detail of case studies, triangulation through the use of different sources as evidence
  • What is reliability?
    • How consistent a measuring device is, including psychological tests and observations that assess behaviour
    • If a measurement produces the same result when it is repeated, then it is reliable
  • Ways of assessing reliability
    • Psychologists tend not to measure concrete things like length or height but are more interested in abstract concepts such as attitudes, aggression and memory
  • Test-retest
    • Administering the same test or questionnaire to the same person on different occasions
    • If the test or questionnaire is reliable then results should be similar or the same - can also be applied to interviews
    • There must be enough time between test and re-test to ensure that pps cannot recall their original answers but not so long that their attitudes have changed
    • For questionnaires or tests, the two sets of scores are correlated and should show a strong positive correlation if the measure is reliable
  • Inter-observer reliability
    • Observations should be conducted by at least two observers to reduce subjectivity and bias - inter-observer reliability is the extent to which two or more observers agree
    • This may involve a pilot study of the observation to check observers are applying behavioural categories in the same way
    • Observers need to watch the same event but record their data independently, and the data collected should be correlated to assess its reliability
    • Can also apply to content analysis
  • Measuring reliability
    • Correlational analysis - in both test-retest and inter-observer reliability, the two sets of scores are correlated
    • Correlation coefficient should exceed +0.80 for reliability
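
The correlation check described on this card can be sketched in a few lines of Python. This is only an illustration: it assumes Pearson's r is the coefficient being used (the card does not name one), and the function names and all scores below are made up for the example.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one set of scores is constant")
    return cov / sqrt(var_x * var_y)


def is_reliable(x, y, threshold=0.80):
    """True if the coefficient exceeds the +0.80 guideline from the card."""
    return pearson_r(x, y) > threshold


# Test-retest: the same (hypothetical) pps complete a questionnaire twice
first_test = [12, 18, 25, 30, 22, 15]
retest = [13, 17, 26, 29, 21, 16]
print(is_reliable(first_test, retest))

# Inter-observer: hypothetical tallies per behavioural category from two observers
observer_a = [4, 9, 2, 7]
observer_b = [9, 3, 8, 2]
print(is_reliable(observer_a, observer_b))
```

In this made-up data the two questionnaire sittings agree closely, so the check passes, while the two observers' tallies disagree, so it fails. A failed check would point to the steps on the 'Improving reliability' cards.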
  • Improving reliability
    • Questionnaires - if test-retest reliability is low (below +0.80), amend or remove certain questions, e.g. complex or ambiguous ones. Open questions can be replaced by closed, fixed-choice ones.
    • Interviews - use the same interviewer each time. Where this is not possible, all interviewers must be fully trained. Inconsistency is more easily avoided in structured interviews, and unstructured interviews are more likely to be unreliable
  • Improving reliability
    • Observations - make sure behavioural categories are properly operationalised and self-evident, categories should not overlap
    • Experiments - procedures must be standardised for comparison
    • Content analysis - categories used in coding must be properly operationalised