inter-observer reliability low
- behavioural categories might not be operationalised clearly enough: they should be removed, revised or rewritten
- some observers might need more practice using the categories so they can respond more quickly
test-retest reliability low
- test items or questions might be ambiguous: they should be removed, revised or rewritten
- test items or questions might be overly complex or too broad: they should be revised and simplified
- the test might be conducted slightly differently each time: ensure all aspects of the procedure are standardised
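
The notes above do not say how "low" is judged. A common way to check either kind of reliability is to correlate the two sets of scores (two observers' tallies, or test and retest scores). The sketch below assumes Pearson's r and a rule-of-thumb cut-off of +0.80. The cut-off, the function names and the example tallies are all illustrative assumptions, not taken from the notes.

```python
from math import sqrt

# Assumed rule-of-thumb cut-off for acceptable reliability (not from the notes).
THRESHOLD = 0.80


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_x * var_y)


def is_reliable(xs, ys, threshold=THRESHOLD):
    """True if the two sets of scores correlate at or above the cut-off."""
    return pearson_r(xs, ys) >= threshold


# Made-up tallies: how often each observer recorded each behavioural category.
observer_a = [12, 7, 3, 9, 5]
observer_b = [11, 8, 2, 9, 6]

# Made-up scores for the same participants on a test and its retest.
first_test = [10, 4, 8, 6]
retest = [5, 9, 6, 8]

print("inter-observer reliable:", is_reliable(observer_a, observer_b))
print("test-retest reliable:", is_reliable(first_test, retest))
```

If a check like this comes out below the cut-off, the fixes listed above are the next step: tighten the categories or items, give observers more practice, or standardise the procedure, and then measure again.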