For those who do not know, the Psychometrics Forum began life in 1985. Back then it was the 16PF User’s Group, which I founded in an era when psychometrics was very different from today. And yet, on being asked to write this blog, I am also struck by how little has changed! Over the years I have learnt to accept, and even make sense of, such paradoxes.
The early years
When I started the 16PF User’s Group in 1985, computers were still a novelty. My career as a psychologist began at NFER Publishing Company, the main distributor of psychometrics in the UK (including the 16PF, MBTI, GPPI and FIRO, to name but a few). Learning about these instruments was a tortuous process. There were dense technical manuals to wade through, and readable interpretative guides were not even on the radar. Practitioners spent hours writing reports which, to some, were like the 13th labour of Hercules; I have even had delegates come back 15 years later asking to complete their qualification. There is no doubt that, when the 16PF User’s Group was founded, training to use any personality questionnaire was a significant undertaking. So is today’s training process any easier? To judge that, perhaps we need a recap.
The development of testing ‘best practice’ guidelines
Today’s BPS/EFPA guidelines for test use have a history. The early guidelines grew out of a dilemma facing the NFER Publishing Company, the commercial arm of the National Foundation for Educational Research (NFER). Because the NFER was a registered charity, NFER Pub Co was wholly owned by a charity. The NFER was so central to psychometrics because the most significant testing effort after 1945 was the 11+, which made it the centre of expertise in tests and measurement. When demand for personality testing in the occupational field began to grow, an important question arose: who should be allowed to buy and use tests? Perhaps because the NFER was a charity, this was seen as an ethical rather than a commercial question. So the NFER asked the British Psychological Society (BPS) for a professional view; I believe this was in the late 1960s. In turn, the BPS sought advice from the author of the most requested personality questionnaire of the time, the 16PF. And so Ray Cattell is to blame! He believed that, to understand the 16PF, people would need to understand its statistical properties, and that meant not just means, SDs, scale scores and correlations but also some understanding of factor analysis. We can debate whether Ray was right or wrong, but the effect was that early training courses included manual calculations for extracting the principal component from a correlation matrix. The result has been a whole generation of Level A delegates who were traumatised by the experience!
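For readers who never sat through those exercises, here is a minimal sketch of one iterative, pencil-and-paper-friendly way to extract the first principal component: power iteration. This is an illustration of the general technique only; it is an assumption, not a record of the exact hand method taught on those early courses, and the correlation matrix below is hypothetical. The idea is to multiply the correlation matrix by a trial vector, rescale, and repeat until the result stops changing; the eigenvector scaled by the square root of its eigenvalue gives the loadings.

```python
import math

def first_principal_component(R, max_iter=100, tol=1e-10):
    """Power iteration on a symmetric correlation matrix R (list of lists).

    Returns the largest eigenvalue and the loadings of the first
    principal component (eigenvector scaled by sqrt(eigenvalue)).
    """
    n = len(R)
    v = [1.0] * n  # trial vector: equal weight on every variable
    eigenvalue = 0.0
    for _ in range(max_iter):
        # Multiply R by the trial vector, then rescale to unit length
        w = [sum(R[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
        # Rayleigh quotient w'Rw estimates the eigenvalue for unit-length w
        Rw = [sum(R[i][j] * w[j] for j in range(n)) for i in range(n)]
        new_eigenvalue = sum(w[i] * Rw[i] for i in range(n))
        done = abs(new_eigenvalue - eigenvalue) < tol
        v, eigenvalue = w, new_eigenvalue
        if done:
            break
    # Loadings are the unit eigenvector scaled by sqrt(eigenvalue)
    loadings = [x * math.sqrt(eigenvalue) for x in v]
    return eigenvalue, loadings

# Hypothetical correlations among three test scores
R = [[1.0, 0.6, 0.5],
     [0.6, 1.0, 0.4],
     [0.5, 0.4, 1.0]]
eigenvalue, loadings = first_principal_component(R)
print(round(eigenvalue, 3), [round(l, 3) for l in loadings])
```

Each pass through the loop is a matrix-by-vector multiplication, which is exactly the kind of arithmetic that had to be done by hand, several times over, before computers were commonplace.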
Level A to Test User Ability (TUA) – why and what’s the difference?
Over time the emphasis on statistics in qualifying courses has diminished, and over the same period the length of training has also shrunk. The first ‘Level A’ course I knew about was a 7-day version run by the NIIP (National Institute of Industrial Psychology). NIIP was eventually absorbed by NFER Pub Co, and when I joined in 1977 Peter Saville was running 5-day Level A courses. Since then, Level A equivalent courses have atrophied to 3-day, then 2-day and now 1-day assessment processes. Of course, one reason is that a significant amount of the learning has been shifted online. However, it is worth asking whether competence to use psychometrics today reflects a different standard or simply a different kind of competence. Perhaps people need less expertise because computers have replaced it, or perhaps people are so much cleverer today (but please don’t quote the Flynn effect as evidence). Maybe we should call for a more serious analysis of what has been gained and what has been lost.
These comments have focused on training, learning and assessment for ability testing. But what has changed in the tests themselves? I believe that if Spearman or Cattell were here today they would certainly recognise what we are doing in ability testing (despite the technology now used to do it). However, I believe they would share my view that the psychometric contribution, which started so well, has now faltered. If psychometrics has something to offer, it should include converting important qualities into measurable quantities. It should not limit itself to what is easy to measure. And it should not use increasingly sophisticated technology and analyses to divert people’s attention from the basics. In my view the concept of intelligence is hugely interesting and complex, and I suggest that psychometrics is partly to blame for restricting its definition. Psychometric measurement of intellectual capabilities is far too narrow and formulaic: look at the narrowness of the standard verbal, numerical and abstract reasoning tests that are the mainstay of the psychometric enterprise today. In this rapidly changing world there are many more abilities that deserve the rigour that psychometrics could bring. I believe we should go even further and contribute to a redefinition of what intelligence means. One criticism of mainstream psychometrics is that it tolerates very loose claims such as ‘developed with rigour’ and ‘fully validated’. Such claims often lack specificity (how, what, where and when). Perhaps those who claim ‘rigour’ should in fact claim ‘rigor mortis’?
And what has happened to personality?
The comments above focused on the testing of abilities. However, one of the most significant changes over the last 33 years has been the explosion in the measurement of personality, almost all of it using self-report methodology. Of course, we now also have personality via Big Data (often gleaned surreptitiously from Facebook and elsewhere), but that is too big a topic for this blog. So I will focus my comments on personality as measured by self-report questionnaires, and on the qualifications that people need to use them competently, or at least to access them.
In terms of the qualification process, the old model was to achieve Level A (to learn about measurement) and then Level B (to apply those ideas to personality). This was challenged many times by coaches and facilitators who argued that it was not their job to evaluate a questionnaire; they simply wanted to use it. In their view, a questionnaire with the right credentials (perhaps established through the BPS Test Review process) could be used after more practical, less technical training. This groundswell led to a review of the BPS qualification process in conjunction with the European Federation of Psychological Associations (EFPA). The result is the Test User Ability (TUA) and Test User Personality (TUP) qualifications, which have replaced Levels A and B. How significant are these changes? It is true that some of the more esoteric Level A elements, once prerequisites for a personality qualification, can now be bypassed on the way to a TUP qualification, but you would be hard pushed to call this a major revision of the syllabus. I think there is still a debate to be had about whether the syllabus has been right all along, or whether it has not been revised enough in the light of experience and technological change.

When we turn our focus to personality testing there is, perhaps, a more fundamental question: are there different philosophies at work? At the risk of oversimplifying, I will summarise two such philosophies, as applied to self-report methodology, as follows:
1. The measurement model: psychometric personality questionnaires are a way of quantifying psychological attributes. This model takes physical measurement as its parallel, so it focuses on accuracy and asks how the numbers generated correlate with real-world outcomes. As a result, it becomes tied up with questions of permanence and change (cf. the nature/nurture debate).
2. The narrative model: psychometric personality questionnaires are a way of exploring the stories that people tell about themselves. A questionnaire takes ‘100 bits of information’ (i.e. the statements in the questionnaire) and, unlike 100 bits of information gathered in a conversation, which can be somewhat scattered, gives these stories a structure. This has the benefit of allowing comparisons, not only with other people but with oneself over time. Accordingly, this model focuses more on usefulness and on the questionnaire’s contribution to growth.
It may be obvious that these two approaches can be at odds. The traditional psychometric model is predicated on stability and so tends to minimise change: test-retest reliability assumes great importance, and the focus falls on the less dynamic, less flexible elements of people’s character. However, these assumptions or biases are not always made explicit. Consistency is seen as evidence of the permanence of personality, and all that goes with it. Yet there are alternative explanations. Consistency could be the result of what is being looked at, or of how it is being looked at. It could reflect people getting stuck in their own stereotype (encouraged, I believe, by the fact that every personality questionnaire except the TDI asks people to give the first answer that comes to mind), or having difficulty evolving their story over time. Consider the character of Casanova: he probably believed he was still a gigolo without realising that he was becoming a dirty old man. This raises some fundamental questions about what we mean by ‘personality’. How much is it about typical behaviour? How important is context? Does flexibility in role behaviour affect what we mean by personality? Is there such a thing as a ‘work personality’? Has the current trend (and explosion) in personality questionnaire development begun to address such questions?
Are we seeking to understand personality or simply to measure it?
This leads me to another change I have noticed over the last 33 years, one that makes me question whether the psychometric model has become the tail that wags the dog. When I was introduced to this area by Paul Kline in 1972 (and later, working with Ray Cattell, when I was invited to arrange and host ‘The Cattell Seminars’ in 1991, 1992 and 1993), I was struck by the spirit of enquiry. Eysenck and Cattell exchanged vitriol, but it was driven by a desire to understand the nature of personality (and probably by elements of ego as well). Measurement was a necessary part of that enquiry but not its sole purpose. Since then there have been two major trends. The first is that the search for the fundamental building blocks of personality has been replaced by the generation of lists of labels that appeal to the user (e.g. persuasiveness, leadership, empathy). In my view, such labels are complex composites of many underlying factors; they fit the idea of ‘competencies’, which are influenced by personality characteristics rather than being personality characteristics themselves. The second trend has been to use the Big Five as a catch-all justification for any questionnaire. Let’s be fair: it is easy to create scales with attractive labels that have high internal consistency. It is also very likely that many such scales, when factor analysed, will show a reasonable fit to the Big Five model of personality. The evidence supports this: there is now a plethora of questionnaires with attractive labels, all of which can be subsumed under the Big Five. But has this really increased our understanding? I believe many in the psychometric community have become complacent, using an orthodox model that is not fit for purpose (the subject of a later blog, perhaps) and taking the Big Five as an easy justification. And all of this rests on people’s self-report, which in itself raises many questions and issues.
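To illustrate why high internal consistency is easy to achieve, here is a minimal sketch using standardised Cronbach’s alpha, one common index of internal consistency (the post does not say which index it has in mind, so this choice is an assumption). In its standardised form alpha depends only on the number of items and their average inter-item correlation, so simply adding items pushes it up even when each item relates only modestly to the others. The correlation of 0.3 and the item counts are hypothetical values chosen for illustration.

```python
def standardized_alpha(k, mean_r):
    """Standardised Cronbach's alpha for k items whose average
    inter-item correlation is mean_r."""
    return k * mean_r / (1 + (k - 1) * mean_r)

# Hypothetical: items that correlate only modestly (r = 0.3) on average
for k in (5, 10, 20):
    print(k, f"{standardized_alpha(k, 0.3):.2f}")
```

With these assumed values, alpha climbs from roughly 0.68 with 5 items to about 0.90 with 20, without the items becoming any more closely related to one another; a high figure on its own says little about whether the scale captures a meaningful construct.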
Some thoughts to take away
Has the psychometric world got carried away with its own methods and models? Is the supposed ‘validation’ of instruments missing that fundamental building block of science: seeking evidence that disproves a hypothesis, rather than scanning masses of data for supportive evidence? Are psychometricians using the trappings of science without a genuine spirit of enquiry? Do we need to recognise that we are dealing with ‘soft numbers’ (i.e. self-report), and that increasingly sophisticated analyses (Item Response Theory, Factor Analysis, Cluster Analysis, Mathematical Modelling) may be interesting but the inferences they support are limited? Should the concept of personality be reconceived? Does psychometric thinking need a big and powerful stimulus to bring it back to life, rather than all that sophisticated analysis? I feel rather like a paramedic watching a patient in intensive care who is in danger of rigor mortis and who needs that short, sharp shock to stay alive.