The Flynn Effect refers to the well-established finding that average IQ scores have been rising steadily in all parts of the world for a century or more, at the rate of about 3 IQ points per decade. The Effect is named after the political scientist James R. Flynn, professor emeritus at the University of Otago (New Zealand), and currently an associate at The Psychometrics Centre at the University of Cambridge. Flynn did not discover the Effect himself. It has been named after him because he was the first to draw the attention of other researchers to it, and because his work has defined the field.
The central finding of the Effect is that all over the world people in a very wide range of cultures and societies are getting higher and higher scores on tests measuring the general factor of intelligence or g. This phenomenon has forced psychometricians to repeatedly renorm the tests, but (before Flynn) without paying enough attention to it. Renorming was found to be necessary in order to compare scores on IQ tests across generations. To do this, either recorded scores for past generations have to be revised upward, or scores for the present generation have to be revised downward. Either move generates apparently absurd results.
Finding the Effect required sleuthing due to a feature of IQ research that deserves some attention, because it is easy to overlook. An IQ score is an intelligence quotient. It designates a relation between an individual's score on a test and the scores of other individuals in his population. IQ was defined in this way from the outset, since the original purpose of IQ testing was to find a quick and dirty way of placing individuals in various job categories in the military and in business. Since the sorting and categorization function was so important, it was built into the very definition of IQ.
The mean IQ of a group is defined by convention to be an IQ score of 100. This is in no way an absolute measure of intelligence, but simply reflects an attempt to determine the standing of an individual relative to a given group. One consequence of the procedure is that, while the raw scores of students in school typically increase from year to year, the standings of the individuals vis-a-vis each other remain fairly stable after the ages of 5 or 6. This stability of relative standing (i.e., of the “quotient”) does not mean that in an absolute sense students are not getting "smarter." They are, but this fact is represented by the typically increasing raw scores on a given test as the student gets older, not by his or her IQ score.
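The norming convention just described can be illustrated with a minimal sketch. It assumes the deviation-style scoring used on modern tests, in which the norming group's mean is mapped to 100 and one standard deviation to 15 points. The figure of 15 is an assumption not stated in this essay, and the raw scores and the function name `deviation_iq` are invented purely for illustration.

```python
from statistics import mean, pstdev

def deviation_iq(raw_score, norm_group, sd_points=15.0):
    """Express raw_score relative to norm_group on a scale with mean 100.

    sd_points (15) is an assumed convention, not a figure from the essay.
    """
    mu = mean(norm_group)
    sigma = pstdev(norm_group)
    return 100.0 + sd_points * (raw_score - mu) / sigma

# Hypothetical raw scores for the same five pupils at two ages.
# Every pupil's raw score rises by 20 points between the two testings.
raw_age_8 = [20, 25, 30, 35, 40]
raw_age_12 = [40, 45, 50, 55, 60]

# Each pupil is scored against his or her own age group.
iq_age_8 = [deviation_iq(x, raw_age_8) for x in raw_age_8]
iq_age_12 = [deviation_iq(x, raw_age_12) for x in raw_age_12]

for before, after in zip(iq_age_8, iq_age_12):
    print(f"IQ at 8: {before:6.1f}   IQ at 12: {after:6.1f}")
```

In this toy case every raw score rises while every IQ score stays the same, because each IQ only records where the pupil stands within the group used for norming.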
What Flynn had to do was adjust the IQ scores for lots of intelligence tests given to individuals all over the world for the last 100 years. In effect, what he was finding were the comparable raw scores on these tests (i.e., before the renorming by the psychometricians had set the mean for any given group to the conventional 100). When he did so, he found that massive increases in the raw scores, particularly in heavily g-loaded tests like the Raven's Progressive Matrices, had occurred over the last 100 years.
This is an issue that needs to be kept constantly in mind when one is reading Murray, because he often switches, without warning, between talking about intelligence as an absolute measure (i.e., as measured by raw scores), and relative standing (i.e., how an individual compares with others in his or her own population or group). His observation that "half of the children are below average" is a point about the latter. His discussion of the social effects of raising lower ability IQs three points is, I take it, mostly a point about the former.
The points I made in part two about the NAEP and the CLA involved absolute intelligence, not relative standing. Each single cohort of students gets smarter from year to year, because the raw scores—or their NAEP equivalents—rise with age, and in particular with greater schooling (e.g., compared with matched students who drop out of school). This is not a point about relative standing or quotients or ratios. It does not involve any claim that gaps in IQ or cognitive abilities are reduced.
The Flynn Effect is well established, and it is so large that it cannot be explained away by changes in the gene pools of the populations that have been studied. (No plausible account of how it could be has ever been offered.) It therefore poses a direct challenge to those, like Charles Murray and Arthur Jensen, who believe that academic ability is largely hard-wired in the genes and that little can be done to raise it significantly.
As in the case of quantum mechanics, it is very difficult to find an interpretation or explanation for the Flynn Effect that doesn’t lead to paradox and apparent absurdities. Below is a list of some of these, taken from the literature.
· The cognitive psychologist Ulric Neisser has estimated that if American children of 1932 could take an IQ test normed in 1997, their average IQ would be only about 80. In effect, half of the children in 1932 would be classified as having borderline mental retardation or worse in 1997.
· In an analysis of cross-generational test scores on Raven's Progressive Matrices, one of the most heavily g-loaded tests, Flynn found data that spanned a complete century. (In Great Britain, IQ score results on some tests go back even further.) He concluded that someone who scored among the top 10% a hundred years ago would today be among the weakest 5%. The conclusion: someone who would have been considered bright a century ago would now be considered a moron.
· Flynn found that, compared to the previous generation, the number of people who score high enough to be classified as "genius" has increased more than 20 times.
· To make matters worse for intelligence researchers and psychometricians, renorming of the SAT has had to be done in the opposite direction, because progressively higher percentages of students have been doing worse on the SAT. This poses a serious problem for psychometrics, since within generations (i.e., synchronically), SAT and IQ scores are highly correlated.
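The arithmetic behind the first two items in the list can be checked with a rough sketch. It uses the rate of about 3 points per decade quoted at the start of this essay, and it assumes that IQ scores are normally distributed with a standard deviation of 15. Both the normal model and the value 15 are assumptions not taken from the sources cited here, and the function names are hypothetical.

```python
from statistics import NormalDist

POINTS_PER_DECADE = 3.0  # rate quoted in the essay
SD = 15.0                # assumed IQ standard deviation (not from the essay)

def mean_on_later_norms(test_year, norm_year, rate=POINTS_PER_DECADE):
    """Mean IQ of the test_year cohort when scored on norm_year norms."""
    decades = (norm_year - test_year) / 10.0
    return 100.0 - rate * decades

def share_below(cutoff, cohort_mean, sd=SD):
    """Fraction of a normally distributed cohort scoring under cutoff."""
    return NormalDist(mu=cohort_mean, sigma=sd).cdf(cutoff)

def implied_shift(top_share_then, bottom_share_now, sd=SD):
    """IQ points of drift needed to move the old top-X% cutoff
    down to today's bottom-Y% cutoff."""
    z = NormalDist()
    return sd * (z.inv_cdf(1.0 - top_share_then) - z.inv_cdf(bottom_share_now))

mean_1932 = mean_on_later_norms(1932, 1997)     # 100 - 3 * 6.5 = 80.5
share_under_80 = share_below(80.0, mean_1932)   # a little under one half
century_shift = implied_shift(0.10, 0.05)       # in IQ points

print(f"1932 cohort on 1997 norms: mean IQ of {mean_1932:.1f}")
print(f"Share of that cohort below 80: {share_under_80:.0%}")
print(f"Shift implied by the Raven's claim: {century_shift:.0f} points")
```

Under these assumptions, 65 years of drift at 3 points per decade gives a mean of 80.5, which agrees with Neisser's estimate of about 80. The top 10% to bottom 5% claim implies a shift of about 44 points over the century, a faster rate than 3 points per decade.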
One of the current controversies over the Effect is the extent to which it has been constant across populations and groups. There is some evidence, though it is disputed, that the Flynn Effect is largely confined to the lower end of the ability spectrum. If true, that would indicate a narrowing of the IQ gaps between and within populations. In his latest book, Flynn disputes this finding, arguing that at least by some measures the Effect is fairly uniform across the entire range of IQs. He does believe, however, that the Flynn Effect has stalled for some populations, while it continues for others. One effect of this trend, he argues, is that the black-white IQ gap has declined. In 2006, Flynn co-authored a paper with William T. Dickens, “Black Americans reduce the racial IQ gap,” that argued that blacks gained 5 or 6 IQ points on non-Hispanic whites between 1972 and 2002, and that these gains have been fairly uniform across the entire range of black cognitive ability. At this rate, the present 15-point IQ gap between whites and blacks will disappear in a few generations.
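The "few generations" figure follows from a simple linear extrapolation, sketched below. The sketch assumes that the 1972-2002 rate of gain simply continues unchanged, which is what "at this rate" supposes and is not a forecast. The function name is hypothetical.

```python
def years_to_close(gap_points, gain_points, period_years):
    """Years for a gap to vanish if the observed gain rate stays constant."""
    return gap_points * period_years / gain_points

PERIOD = 2002 - 1972  # 30 years, the span reported by Dickens and Flynn
GAP = 15              # present gap in IQ points, as stated in the essay

fastest = years_to_close(GAP, 6, PERIOD)  # with a 6-point gain per 30 years
slowest = years_to_close(GAP, 5, PERIOD)  # with a 5-point gain per 30 years

print(f"Gap closes in roughly {fastest:.0f} to {slowest:.0f} years")
```

This gives a range of 75 to 90 years, or about three generations.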
The central finding, however, remains that average IQ scores have been rising massively from generation to generation for almost all demographic groups for most of the last 100 years.
Flynn himself has been baffled by the effect that has been named after him, and over time has offered different explanations for it. At first he was inclined to conclude that the Effect showed that IQ tests, including heavily g-loaded ones like the Raven's Progressive Matrices, do not measure intelligence, but something else, perhaps the ability to manipulate abstract symbols, which in itself has nothing much to do with academic ability or what we think of intuitively as intelligence.
This is the least attractive alternative for those who want to identify intelligence with IQ as measured by the various tests. On their view, IQ tests measure something called general intelligence (or g), which the early IQ researcher Charles Spearman identified using factor analysis on various tests of intelligence. The Effect presents a real conundrum for these researchers, because some of the IQ tests that have shown the greatest increases in IQ scores from generation to generation, like the Raven's Progressive Matrices, are particularly heavily g-loaded. Less heavily g-loaded tests of cognitive ability have shown significantly smaller gains.
Flynn’s more recent view, as articulated in his book What Is Intelligence?: Beyond the Flynn Effect, is that there are many different aspects of intelligence, and that the scientific age in which we are living has placed a premium on certain kinds of abstract skills that are developed in response to experience. We are not hugely more intelligent than our ancestors: it is just that the skills we exercise are significantly different from the ones that defined “intelligence” for our predecessors, who had to focus their equally good minds on more concrete problems. The Effect reflects the cultural fact that the problems that our brains are required to deal with are more abstract.
Flynn insists that it is essential in analyzing the Effect to break IQ tests into subcomponents, and then to show how social reinforcement can lead to the enhancement or degeneration of some of these through feedback loops. He calls these reinforcers social multipliers. In a scientific age like ours, for example, great emphasis is placed on abstract reasoning or skills. These get rewarded as more and more jobs require abstract reasoning and skills. Flynn’s latest explanation of the Effect (called the Dickens-Flynn hypothesis, after coauthor W.T. Dickens of the Brookings Institution with whom Flynn has frequently worked) attempts to accommodate, in a modified and limited way, theories of intelligence built around the g factor, while also avoiding the absurd conclusion that people fifty years ago would be retarded by today’s standards of intelligence.
Since the Flynn Effect cannot be explained by changes in the gene pool, an appeal to environment and its ability to change the brain will almost certainly be a central feature of any serious attempt to explain it. Since the Dickens-Flynn view is based on the idea that the changes occur as a result of the brain’s response to experience, it is not defeated by the observation that the Effect cannot be explained by changes in genes. It also fits in perfectly with the view in the neurosciences that the brain is plastic, i.e., that it is constantly changing and remapping itself in response to experience. Flynn himself acknowledges that his way of explaining the Effect does involve neuroplasticity (though he does not use that term in his latest book). As he writes (p. 64), “The Dickens/Flynn model assumes that current environment has large effects on cognitive skills…”
This point has potentially important consequences for education. If environment and experience can operate to massively raise cognitive and intellectual skills diachronically (across generations), the potential obviously exists to raise such abilities within generations (synchronically). The Flynn Effect can, and apparently must, be explained by a generally more stimulating environment for all people. Greater understanding of how this happens could lead to the development of educational interventions that produce long-term gains in cognitive and intellectual skills. Seen in the light of neuroplasticity, the Flynn Effect is not only understandable, it is almost predictable.
College outcomes assessment and intelligence research
The central theme of Murray’s Real Education is that research on human intelligence demonstrates the futility of trying to significantly improve the academic ability and cognitive skills of students. This seems wrongheaded. As complicated and controversial as assessment in higher education is, it strikes this observer as a land of crystalline clarity and light compared to the current state of research on human intelligence. (One of my reasons for focusing on the Flynn Effect was to demonstrate this, though a number of other issues could have been used to make the same point.) Instead of reaching strong conclusions about what is possible in higher education from the very controversial and contested territory of intelligence research, it is more sensible to simply develop good outcome measures for higher education, and then see what those measures tell us. This is not to say that assessment issues are easy; it is just that assessment is more reliable than trying, as Murray does, to deduce policy prescriptions from highly contested theories in research on human intelligence.
We need outcomes assessment instruments like the NAEP for K-12 and the CLA for higher education in any case. Their aim is to assess outcomes on measures that lie at the very heart of the academic enterprise: critical thinking, analytic reasoning, problem solving, and written communication skills. These higher order skills overlap with abilities that intelligence researchers believe are important elements of intelligence, so research in human intelligence will have to find some way to accommodate the CLA findings (assuming, once again, that they continue to hold up), and to incorporate them somehow or other into that field’s theories and findings about general intelligence (g), learning transfer, and so on.
The implications of the CLA findings for intelligence research are of interest, because research in human intelligence is interesting and important in itself. However, the implications are not of direct interest to higher education. Higher education’s purpose is not to help resolve quandaries and paradoxes in the field of psychometrics or intelligence testing, to present new quandaries and paradoxes to it, to help it find a way to accommodate new data into existing theories, or to advance that field. The job of teachers is simply to educate. That involves, as I have been arguing at some length, making students in a very real sense smarter.
Higher education must also support the development of outcome measures of its efforts, partly in order to validate what it is doing, and partly in order to improve what it is doing. The CLA is not the only outcome measure we need, but it is an important one, and its preliminary findings support optimism (contra Murray) that important academic and cognitive skills can be and are developed in college. That is good news, both for higher education and for the wider society and culture.
Alexander, Karl L., Gary Natriello, and Aaron M. Pallas (1985). “For Whom the Bell Tolls: The Impact of Dropping Out on Academic Performance.” American Sociological Review, vol. 50, pp. 409-20.
Ceci, Stephen J. (1996). On Intelligence: A Bioecological Treatise on Intellectual Development (Expanded Edition). Harvard University Press.
Ceci, Stephen J. (1991). “How Much Does Schooling Influence General Intelligence and Its Cognitive Components? A Reassessment of the Evidence.” Developmental Psychology, Vol. 27, No. 5, pp. 703-722.
Coley, Richard J. (2003). “Growth in School Revisited: Achievement Gains from the Fourth to the Eighth Grade.” Educational Testing Service. (Online document)
Doidge, Norman (2007). The Brain That Changes Itself. Penguin.
Flynn, James R. (2007). What Is Intelligence?: Beyond the Flynn Effect. Cambridge University Press.
Natriello, Gary, Aaron Pallas, and Karl Alexander (1989). “On the Right Track?: Curriculum and Academic Achievement.” Sociology of Education, Vol. 62, pp. 109-118.
Neisser, Ulric et al. (1995). Intelligence: Knowns and Unknowns. Report of a Task Force established by the Board of Scientific Affairs of the American Psychological Association. Released August 7, 1995. A slightly edited version was published in the American Psychologist (the official journal of the APA), February 1996, Vol. 51, pp. 77-101. (Online document)
Neisser, Ulric (1997). “Rising Scores on Intelligence Tests.” American Scientist, September-October issue. (Online document)
Ralph, John, Dana Keller, and James Crouse (1994). “How Effective are American Schools?” Phi Delta Kappan, Vol. 76, No. 2, pp. 144-150.