In an unusual move, an astronomer this week removed his forthcoming publication from a preprint website, put his related book on hold and apologized in writing for the “hurt” he’d caused.
The incident has caught the attention of culture warriors, who suggest that the professor was forced to sacrifice himself at the altar of woke academic orthodoxy. Others — including many astronomers — say the paper suffered from serious flaws and never should have made it past peer review in the first place.
A Magic Formula
The scholar in question is John Kormendy, a professor emeritus of astronomy at the University of Texas at Austin. His now-withdrawn paper is an extension of his new 311-page book, Metrics of Research Impact in Astronomy. The premise of both works, according to the paper (archived in full here), is that faculty personnel committees often use “qualitative indicators and uncertain personal opinion” to make decisions about scientists that will ultimately impact science itself. They therefore “should aim to do better.”
Kormendy’s solution is a sort of formula for predicting the future research impact of early-career astronomers based on different citation counts. Judging scholars by their citation counts is hardly a new idea. What’s novel about Kormendy’s approach is that he “calibrated” his “machinery” by asking 22 eminent astronomers to rank, or “vote” for, some 500 other astronomers on a scale from one to eight, with one being effectively reserved for Nobel Prize winners.
This calibration is supposed to account for what citations alone can’t communicate — a kind of scientific je ne sais quoi — and make it easier for astronomers to judge the potential of scholars working in subfields beyond their own expertise.
Kormendy says that this work was inspired by his own interest in helping junior scholars build their careers, and he warns multiple times that his framework should not be the only criterion used to evaluate job or promotion candidates. He also addresses the growing attention to inclusivity in academe and the possibility of bias in his study. But he says the biases he could measure, including gender (three of his raters were women), were negligible to small.
The paper was accepted by Proceedings of the National Academy of Sciences, and Kormendy posted a preprint of it on the repository arXiv last week. Negative responses — including charges that the paper would only perpetuate academe’s overreliance on simple metrics, to the detriment of scientific discovery — followed. Other concerns centered on the paper’s potential to perpetuate biases against women and minorities, who are already relatively undercited in a predominantly male, white field. Additional critiques focused on methodology, including that Kormendy’s limited focus on astronomers from only the most selective departments made his study so unrepresentative of the field as a whole as to be useless.
One astronomer (who declined to be interviewed for this article) wrote on Twitter, for instance, that “if it makes you happy to create these plots, go ahead. But suggesting that we use this model in hiring decisions is horrifying. It’s bad from a modeling perspective, an equity perspective, and because it implies that we don’t want things to change.”
Pieter van Dokkum, Sol Goldman Family Professor of Astronomy at Yale University, used one of Kormendy’s older theories about the relationship between galaxy size and surface brightness to dunk on both him and the new paper on Twitter: “The Kormendy (1977) relation fell into disuse as it was realized that the correlation is largely driven by biases. In other news, to everyone who applied for our open faculty position at Yale: rest assured that we will not blindly apply citation statistics to rank candidates.”
Kormendy confirmed via email that he’d received a waiver from his institutional review board to conduct this research on human subjects, in the form of his astronomer raters. But at least one of those astronomers has publicly expressed regret at his role in the study and suggested he wouldn’t have participated had he known more about what Kormendy was doing.
“My interest at the time was actually a concern about what I view as the overuse of citations in judging academic activity,” Nobel Prize winner Brian Schmidt, vice chancellor of the Australian National University, wrote in a series of tweets about his having acted as a rater. “My mistake was thinking that my participation would provide value to that question — it has not. While I have not been involved in any other aspect of the article, I do accept a level of responsibility for [its] existence by my participation.”
Within days, Kormendy had withdrawn the paper as a preprint and journal article and published a mea culpa on his website, saying, “I apologize most humbly and sincerely for the stress that I have caused with the PNAS preprint, the PNAS paper, and my book on using metrics of research impact to help to inform decisions on career advancement.”
Adding that his goal was to “promote fairness and concreteness in judgments that now are based uncomfortably on personal opinion,” he said it “was hoped to favor inclusivity” and “especially intended to help us all to do the best science that we can.” A specific aim, he continued, “was to provide calibration of the ‘black box’ of jump-starting successful research careers. One of my aims throughout was to help young people as they start their careers and to help established scientists as they make decisions on how to lead the scientific enterprise at their institutions and beyond.”
However, Kormendy said, “intentions do not, in the end, matter. What matters is what my actions achieve. And I now see that my work has hurt people. I apologize to you all for the stress and the pain that I have caused. Nothing could be further from my hopes.”
Regarding his book, which was accepted for publication as a conference series monograph by the Astronomical Society of the Pacific, Kormendy said it was “on hold.” Kormendy had “hoped to generate healthy discussion and to provide practical aids,” but “we will see in the coming few months whether any good can be salvaged from its development.”
He added, “I fully support all efforts to promote fairness, inclusivity, and a nurturing environment for all. Only in such an environment can people and creativity thrive.”
Following the apology, Kormendy wrote another post on his website that addressed some of the more detailed concerns about his paper, but he removed it soon after.
Some see Kormendy as the victim. Writer Matthew Yglesias wrote facetiously on Twitter, for instance, that it “Seems like a healthy, sane climate that America’s college professors are operating in.”
Yves Gingras, Canada Research Chair in History and Sociology of Science at the Université du Québec à Montréal, and an expert on bibliometrics and research evaluation, said Wednesday that it’s nearly always possible to debate methodology on “technical grounds.” More concerning, he said, is that this case is an example of the “recent rise in the quote-unquote moralization of science, where conclusions are evaluated on purely moral grounds and based on subjective feelings of being hurt by conclusions or results and based on hypothetical bad consequences that flow from those works, independently of their validity.”
Gingras further said that Kormendy’s apology uses an increasingly familiar rhetorical structure, in which “he apologizes for having hurt anybody and swears he wants to promote equality diversity and inclusion, the new magic words that must be used in order not to be ejected from the well-meaning society where people have lost the capacity to rational debate. I maintain that such decisions are irrational and do not serve anybody.”
Via email, Kormendy said that he’d been advised, including by his department chair, that withdrawing the publications was the best path forward. At the same time, he said, “I believed in the work that I did for my book and for the PNAS article that summarizes and extends the book. It is a matter of record — not my personal opinion — that the book was reviewed by two readers commissioned by the publisher, the Astronomical Society of the Pacific, and by the [society’s] publications committee. The PNAS paper was refereed by 3 referees,” with favorable reports.
Kormendy continued, “I do not want controversy in the context of the National Academy of Sciences. Any controversy should be mine alone. It was recommended to me that I withdraw the paper. I agreed 100 percent.”
Kormendy said he won’t resign his position at the university but that he’s disengaging from astronomy for several months.
Kormendy’s chair, Volker Bromm, said in a separate email, “UT Astronomy is committed to free expression. I invited Prof. Kormendy to share his ideas in a department seminar on this topic last month. My own thoughts on the paper are unimportant, but as department chair, I stand by any decision he wants to make about his academic research. I did not advise him to withdraw the paper and connected with him only after he had announced his plan to do so in his posted apology.”
What Went Wrong
Papers that attempt to predict a scholar’s potential aren’t outside the realm of what’s publishable. A 2016 paper in Science (not by Kormendy), for instance, described a measure called Q, which captures how successful a particular scientist may be throughout their career, in terms of citations. Q doesn’t appear to change with time, according to the paper, and could be something to consider in personnel decisions. Q was not the primary subject of that paper, however (it focused on whether scientific creativity decreases with age), and the authors were cautious about whether Q could be used in personnel decisions.
Kormendy’s new paper doesn’t suggest that institutions base personnel decisions only on metrics. “Judgments — especially decisions about hiring and tenure — should be and are made more holistically, weighing factors that metrics do not measure,” he says. But he does suggest, in detail, that his method can be used to whittle down a long list of candidates to a “few tens” of finalists, and that “automating this — with careful checking — would save institutions a great deal of work.” He also suggests that using his metric machinery “allows institutions to provide to oversight agencies rigorous and quantitative documentation on at least some aspects of how decisions are made.”
This analysis is where Kormendy’s paper really “goes off the rails,” said van Dokkum, of Yale, adding that Kormendy isn’t so much a casualty of cancel culture as of his own “bad science.”
“This is a very odd way of approaching this complex process of hiring,” van Dokkum said. “The other thing is, what this shows is just the opinions of these 22 people, who are not a particularly representative subset of the community. But somehow these 22 people and how they subjectively judge people correlates with the number of citations that these people had and presumably will have, and that’s it.”
Essentially, van Dokkum said, Kormendy is arguing, “We should listen to our elders. We should listen to these 22 people, and here’s a proxy for what these 22 people think, that anyone can use.”
Bryan Gaensler, Canada Research Chair and professor of astronomy at the University of Toronto, said that evaluating scholars, their work and their potential is “time-consuming and difficult. There are no formulae that can provide a shortcut.”
Beyond that, Gaensler asserted that there was in fact evidence of gender bias in Kormendy’s paper. Moreover, he said, “There’s a vast amount of excellent sociological research on metrics, bibliometrics and bias, none of which was cited or drawn upon.”
For starters, “It’s well-known that women, people of color and especially women of color get rated lower in assessments, that they get cited less and that their work is considered of lower impact. The work presented didn’t provide any remedy to or safeguard against these known effects, so the presumption is that those biases are present.”
Kormendy’s apparent workaround — asking “gold-plated” astronomers to serve as raters — doesn’t cut it, Gaensler said.
He added that it was less than appropriate for Kormendy to include himself as one of those 22 raters.
The paper lists five references, two of which are Kormendy’s own publications.
“I have enormous respect for John Kormendy, for his body of work and for his commitment to careers and to young scientists,” Gaensler said. “However, I don’t think the premise that motivated this work is correct, and I don’t think the actual work done should have passed peer review.”
There is indeed significant data on the practical effects of academe’s affinity for metrics. Cassidy Sugimoto, Tom and Marie Patton School Chair in the School of Public Policy at Georgia Institute of Technology, who studies how knowledge is created and research is measured, said she hadn’t been following the Kormendy case closely, but that she had two divergent concerns, based on what she knew: that peer reviewers must be experts in the discipline they’re reviewing, especially when the author is from another field, and that “we create a dangerous precedent for withdrawing works that are controversial but perhaps not inaccurate.”
“Science is inherently dynamic,” Sugimoto said. “The record is self-correcting as science progresses. Science builds on the interaction with previous work — either reinforcing or contradicting. The accumulation of evidence allows us to progress. We cannot engage meaningfully in this dialogue if controversy is met with retraction rather than debate.”
That said, she added, “we should use retractions and withdrawals when necessary,” and “with precision, lest it become a chilling effect for empirical dialogue.”