likert scale: developed to improve the level of measurement in social research
typically a 5-point scale expressing the degree of agreement or disagreement
respondents are requested to state their level of agreement with a series of statements
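A minimal sketch of how one respondent's Likert answers might be turned into a total score. The function name `score_likert`, the example ratings and the choice of which item is negatively worded are all illustrative assumptions, not part of the original notes.

```python
def score_likert(responses, reverse_keyed=(), points=5):
    """Sum one respondent's Likert ratings (1..points) into a total score.

    reverse_keyed holds the 0-based positions of negatively worded items,
    whose ratings are flipped (1 <-> 5 on a 5-point scale) before summing.
    """
    total = 0
    for position, rating in enumerate(responses):
        if not 1 <= rating <= points:
            raise ValueError(f"rating {rating} is outside 1..{points}")
        if position in reverse_keyed:
            rating = points + 1 - rating
        total += rating
    return total


# Hypothetical respondent: four items, the third is negatively worded.
print(score_likert([4, 5, 2, 4], reverse_keyed={2}))  # 4 + 5 + 4 + 4 = 17
```

Flipping negatively worded items first keeps a high total meaning the same thing (stronger agreement with the attitude) for every item.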
optimal responding vs non-contingent responding
optimal responding: the ideal or desired behaviour. implies thoughtful, accurate and contextually appropriate answers that avoid biases such as social desirability or acquiescence. aims to capture the true opinions, attitudes and behaviours of the respondents
non-contingent responding: tendency to give consistent answers across different questions, regardless of the content or context of each question. suggests that individuals respond in a stable and uniform manner throughout the questionnaire, not unduly influenced by question wording, order effects or other external factors. however, extreme non-contingency might indicate rigidity in responses
thurstone scale
developed by thurstone in 1928, as a means of measuring attitudes towards religion
equal-appearing intervals: each statement represents a different scale value for the attitude (from highly favourable towards religion, through neutral, to highly unfavourable), with the scale values determined by a panel of judges
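A rough sketch of equal-appearing-interval scoring, assuming the usual procedure: each statement's scale value is the median of the judges' ratings, and a respondent's score is the mean scale value of the statements they agree with. The statement labels, the judge ratings and the 11-category continuum are hypothetical.

```python
from statistics import mean, median


def scale_values(judge_ratings):
    """Scale value of each statement = median of the judges' category ratings."""
    return {s: median(ratings) for s, ratings in judge_ratings.items()}


def thurstone_score(endorsed, values):
    """Attitude score = mean scale value of the statements the respondent endorses."""
    if not endorsed:
        raise ValueError("respondent endorsed no statements")
    return mean([values[s] for s in endorsed])


# Hypothetical ratings from five judges (1 = most unfavourable, 11 = most favourable).
judge_ratings = {
    "A": [10, 11, 10, 9, 10],
    "B": [6, 5, 6, 7, 6],
    "C": [2, 1, 2, 3, 2],
}
values = scale_values(judge_ratings)
print(values)                               # medians: A = 10, B = 6, C = 2
print(thurstone_score(["A", "B"], values))  # mean of 10 and 6
```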
thurstone scale - limitations
subjectivity in statement selection: the choice of statements can impact the validity and reliability of the scale
true interval scale: achieving true interval scaling is challenging - assumes the distances between adjacent scale points are equal
limited sensitivity: limited ability to differentiate between individuals with subtle differences in attitudes
assumption of unidimensionality: in reality, attitudes and opinions may be more complex and multidimensional, which can limit the scale's ability to capture the full range of variation
scoring complexity: scoring can be complex and may involve sophisticated statistical techniques such as factor analysis
response bias: reluctance to use extreme categories
reliance on expert judgement: the creation of thurstone scales often involves expert judgement in the selection and ranking of statements. critics argue that this reliance on expert opinion may introduce biases and limit the scale's generalisability
guttman scale
unidimensional scale with cumulative property: statements are ordered so that a person who accepts a particular item will also accept all previous items
premise is that if a person agrees with an extreme indicator of the variable in question, they will also agree with a less extreme indicator
guttman scale - used to determine whether a hierarchical pattern exists among responses
1. do you drink?
2. do you smoke marijuana?
3. do you use cocaine?
if a person answers yes to 3, then they would also answer yes to 2 and 1
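One way to check whether answers follow this cumulative pattern is the coefficient of reproducibility. The sketch below assumes one common error-counting rule (compare each respondent's answers with the ideal pattern that has the same number of "yes" answers); other counting rules exist and give different values. The response data are hypothetical, and a value of about 0.90 or above is conventionally treated as acceptable.

```python
def reproducibility(responses):
    """Coefficient of reproducibility for 0/1 responses.

    Each row is one respondent; columns are items ordered from least to most
    extreme. A row's errors are its mismatches with the ideal cumulative
    pattern that has the same number of 'yes' answers.
    """
    n_items = len(responses[0])
    errors = 0
    for row in responses:
        yes_count = sum(row)
        ideal = [1] * yes_count + [0] * (n_items - yes_count)
        errors += sum(a != b for a, b in zip(row, ideal))
    return 1 - errors / (len(responses) * n_items)


# Hypothetical answers to the three questions above (1 = yes, 0 = no).
answers = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
    [0, 1, 0],  # breaks the cumulative pattern: marijuana but no drinking
]
print(round(reproducibility(answers), 3))  # 1 - 2/15 = 0.867
```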
guttman scale limitations
unrealistic assumption of unidimensionality: assumes that all items measure a single latent trait. the previous example shows that this may not hold
stringent requirements for scalability: all respondents must endorse all items below their own position on the scale
limited flexibility in item placement: the guttman scaling procedure assumes a fixed hierarchy of items. this lack of flexibility can be a limitation when researchers want to add or remove items from the scale or consider alternative items
difficulty in scale development: requires a rigorous process of item development, testing and refinement. achieving a perfect hierarchical arrangement of items can be challenging
scoring complexity: the scoring and analysis of guttman scales can be complex, especially when dealing with large datasets
limited measurement precision of the underlying trait: the scale provides ordinal data but may not offer a precise measurement of the intervals between different levels on the scale
difficulty detecting intermediate positions: guttman scales may struggle to detect intermediate positions on the latent trait, so the scale might not capture nuances or variations in respondents' attitudes or behaviours
semantic differential scale
type of rating scale designed to measure the connotative meaning of a given object, event or concept, from which the respondent's attitude towards it is derived
captures more depth in someone's attitude - encourages the participant to think about the topic more deeply
semantic differential scale limitations
relatively simplistic representation of attitudes or concepts: may not capture the full complexity and nuances of the underlying construct being measured
limited number of scale points: usually 5-7, which may not provide enough granularity to accurately reflect the subtleties of respondents' attitudes
interpretation subjectivity: different individuals may interpret the scale points differently, leading to potential variations in how respondents understand and use the scale
assumption of linearity: assumes the distance between each pair of adjacent points is equal
culture and language sensitivity: the choice of words to represent the scale endpoints may not have the same connotations or meanings in different cultural or linguistic contexts
limited contextual information: semantic differential scales provide a numerical score but may lack detailed contextual information about why respondents chose a particular position on the scale
inability to capture changes over time
difficulty in developing anchors: selecting appropriate and balanced scale anchors can be challenging
potential response bias: respondents may show a tendency to use only certain points on the scale or to avoid extreme categories
psychometric tests
tests and questionnaires are often referred to as 'psychometric' because psychological theories of human behaviour and its measurement have been used in their construction
used to measure a person's capacities, work style or values. employers need this sort of information when they want to recruit a new employee or understand the potential and development needs of an existing one
when developing a new psychometric measure (e.g. for work performance), psychologists first carefully define what it is they want to measure
this involves researching evidence on work performance to identify which personal factors are related to quality of functioning in a particular area
three different categories of psychometric tests
normative tests - data exist that tell us the range of scores expected from the population under consideration, e.g. IQ scores
criterion-referenced tests - commonly used in education, where a candidate has to meet some pre-arranged standard
idiographic tests - used in therapy to observe an individual's progress over time
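For the normative category, a raw score is usually interpreted against the norm group's mean and standard deviation. The sketch below assumes a hypothetical norm group (mean 50, SD 10) and the common IQ-style metric with mean 100 and SD 15. The function names are illustrative.

```python
def z_score(raw, norm_mean, norm_sd):
    """Distance of a raw score from the norm-group mean, in SD units."""
    return (raw - norm_mean) / norm_sd


def to_iq_metric(z, mean=100, sd=15):
    """Re-express a z-score on an IQ-style scale (assumed mean 100, SD 15)."""
    return mean + sd * z


# Hypothetical norm group: mean 50, SD 10; the candidate scored 65.
z = z_score(65, norm_mean=50, norm_sd=10)
print(z)                # 1.5
print(to_iq_metric(z))  # 122.5
```

A criterion-referenced test would instead compare the raw score with a fixed pass mark, with no reference to how other people scored.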
applications of personality traits
criminal psychologists might employ questionnaires to measure impulsivity and its relation to crime
health psychologists might measure people's optimism in relation to their response to a cancer diagnosis
occupational psychologists often employ personality tests to predict job performance and job suitability
all require standardisation - must be administered and scored the same way every time
Construction of Psychometric tests
Rigorous construction procedures including several pilot stages
Large number of questions (test items), typically >40
Item analysis:
– individual items
– combined effects of test items
– filter redundant/non-equivalent questions out
Evaluation to ensure:
– Reliability (same scores over time)
– Validity (does it measure what it is supposed to?)
– Appropriate convergent and discriminant validity (compared to other measures)
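One common item-analysis statistic is the corrected item-total correlation: each item is correlated with the sum of all the other items, and items with low values are candidates for removal. This is a sketch of that one statistic only, not the full procedure described above. The pilot data are hypothetical and the fourth item is deliberately unrelated to the rest.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def corrected_item_total(data):
    """For each item, correlate it with the sum of all the other items."""
    n_items = len(data[0])
    result = []
    for j in range(n_items):
        item = [row[j] for row in data]
        rest = [sum(row) - row[j] for row in data]
        result.append(pearson(item, rest))
    return result


# Hypothetical pilot data: rows = respondents, columns = items.
pilot = [
    [5, 4, 5, 2],
    [4, 4, 4, 5],
    [3, 3, 2, 1],
    [2, 2, 2, 4],
    [1, 1, 1, 3],
]
for number, r in enumerate(corrected_item_total(pilot), start=1):
    print(f"item {number}: r = {r:.2f}")
```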
reliability
test-retest reliability: if a person retakes the test, or takes a similar test, within a short time of the first testing, do they receive approximately the same score?
split-half method: the test is administered once and its items are divided into two halves (e.g. odd- and even-numbered items); scores on the two halves are correlated for the same participants
alternative-forms method: two equivalent versions of the test are developed and given to the same participants on two occasions
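A sketch of how these reliability estimates are typically computed: test-retest and alternative-forms reliability are both correlations between two sets of scores. One common form of the split-half method correlates odd- and even-numbered item totals from a single sitting and then applies the Spearman-Brown correction. All scores below are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation (assumes neither list is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half(data):
    """Odd-even split-half reliability with the Spearman-Brown correction."""
    odd = [sum(row[0::2]) for row in data]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in data]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)


# Test-retest: the same five people tested on two occasions (hypothetical totals).
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
print(f"test-retest r = {pearson(time1, time2):.2f}")

# Split-half: item-level answers from one sitting (rows = respondents).
items = [
    [5, 4, 5, 4],
    [4, 4, 4, 5],
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 1, 1, 2],
]
print(f"split-half reliability = {split_half(items):.2f}")
```

The correction is needed because each half is only half as long as the full test, and shorter tests tend to be less reliable.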
internal reliability
determines the internal consistency (average correlation) of the items in a questionnaire
reported as the greek letter alpha (α), i.e. cronbach's alpha
should range between 0 and 1 (if negative, check that you have reverse-scored the correct items)
an α > .70 is generally taken as acceptable
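A minimal sketch of the alpha calculation, assuming the standard formula α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The data are hypothetical and any negatively worded items are assumed to have been reverse-scored already.

```python
from statistics import variance


def cronbach_alpha(data):
    """Cronbach's alpha for rows of item scores (rows = respondents)."""
    k = len(data[0])
    item_vars = [variance([row[j] for row in data]) for j in range(k)]
    total_var = variance([sum(row) for row in data])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical questionnaire: five respondents, four items.
scores = [
    [5, 4, 5, 4],
    [4, 4, 4, 5],
    [3, 3, 2, 3],
    [2, 2, 2, 1],
    [1, 1, 1, 2],
]
alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}")
print("acceptable" if alpha > 0.70 else "below the usual .70 cut-off")
```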
validity
face validity - does the test seem valid according to common sense?
criterion validity: the extent to which a measure relates to an outcome
external validity: how well the findings of a study generalise to other situations or populations
content validity: a test should sample the full range of behaviour represented by the theoretical concept being tested (not just one part of it, e.g. difficulty concentrating, remembering details and making decisions)
construct validity: does the test truly represent the theoretical construct it was developed to assess?
ecological validity: are the results representative of the results that would be obtained from studying that behaviour in the natural environment
some threats to internal validity
mortality (attrition): affects longitudinal studies - participants drop out before the study is completed
maturation: participants change independently of your study, through factors such as tiredness, boredom and hunger
regression effect (regression to the mean): tendency of participants with extreme scores on a first measure to score closer to the mean on a second testing
this happens because extreme scores are partly due to random error, so on a second testing performance tends to be closer to the mean. you might wrongly conclude that good students did worse (less effort) and bad students did better (benefitted from the intervention)
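The regression effect can be illustrated with a small simulation in which nothing changes between the two tests except random error. Everything here is an assumption made for illustration: the ability distribution (mean 50, SD 10), the size of the error (SD 10) and the use of the top and bottom 10% as the "extreme" groups.

```python
import random


def simulate(n=10000, seed=0):
    """Two noisy tests of the same stable ability, with no intervention."""
    rng = random.Random(seed)
    ability = [rng.gauss(50, 10) for _ in range(n)]
    test1 = [a + rng.gauss(0, 10) for a in ability]
    test2 = [a + rng.gauss(0, 10) for a in ability]

    # Select the extreme groups using the first test only.
    ranked = sorted(range(n), key=lambda i: test1[i])
    tenth = n // 10
    bottom, top = ranked[:tenth], ranked[-tenth:]

    def average(group, scores):
        return sum(scores[i] for i in group) / len(group)

    return {
        "top_first": average(top, test1),
        "top_second": average(top, test2),
        "bottom_first": average(bottom, test1),
        "bottom_second": average(bottom, test2),
    }


for label, value in simulate().items():
    print(f"{label}: {value:.1f}")
```

The top group scores lower on the second test and the bottom group scores higher, even though no one's ability changed, which is why such shifts should not be attributed to effort or to an intervention without a control group.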