Ed. note: I wrote this blog post in May 2021 and never posted it, perhaps because I didn't want to seem too grumpy. I'm less worried about seeming grumpy now, and I still agree with it for the most part, so it's time to finally let it see the light of day.
I just received my most recent round of paper rejections. As usual, the reviews raised a few good points, but for the most part were cursory and incorrect. A few years ago, I internalized the fact that a paper rejection was not a reflection of my self-worth, or even the quality of the paper itself. However, the same is not true of my more junior co-authors. In particular, the students in my nascent lab at UNC might feel pretty disheartened by a rejection. How did we get to the place where we can’t rely on peer review to be a useful way to determine a paper’s quality? I think the main reason is that we, as a community, are over-indexing on paper acceptance.
This year was my first time serving on a graduate admissions committee. UNC is not a “top-5” school, but we nevertheless get many outstanding applications. Many applicants were applying to a Ph.D. or Master’s having already published multiple papers at top-tier venues. As a PI, it is tempting to hire students who have this level of experience, since it indicates that the student may be able to start publishing papers very soon after they start graduate school. The fact that PIs use prior paper acceptances as a heuristic for admission has created a perverse incentive for applicants, who now believe they must have published papers before even applying to a Ph.D.
Separately, many other incentive structures in academia are built around paper acceptance. For my first few years as a research scientist at Google Brain, after each round of accept/reject decisions was released, leadership would send an email announcing “congrats everyone - we had N papers accepted at Venue X this year!”. Applicants to faculty positions are similarly judged in terms of how many papers they have published and at which venues (though here paper quality, letters of recommendation, and similarly-imperfect metrics like citation count and h-index come heavily into play). The de facto ranking of computer science departments is literally computed from paper acceptances alone - nothing else. I imagine that tenure, Ph.D. candidacy, and funding decisions use similar heuristics, though I have less experience in those settings.
Some years ago, the NeurIPS conference ran an experiment where a subset of submissions was reviewed by two independent cohorts of reviewers and area chairs. You can read more about this impressive effort here. The goal of the experiment was to see how often the cohorts disagreed on accept/reject decisions. Unsurprisingly, they often disagreed. My preferred hypothesis for this effect is that papers tend to fall into one of three groups: papers that are clearly bad and would be rejected by virtually any competent committee; papers that are clearly outstanding and would always be accepted; and papers in the “messy middle” where acceptance virtually amounts to a coin flip.
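To make that hypothesis a bit more concrete, here is a toy simulation (the group proportions below are made up for illustration; they are not the actual NeurIPS numbers): even if two committees agree perfectly on the clearly-bad and clearly-outstanding papers, a large messy middle decided by coin flips produces a high overall disagreement rate.

```python
import random

random.seed(0)

# Made-up proportions for illustration only: 20% clearly bad, 10% clearly
# outstanding, 70% "messy middle" decided by an effective coin flip.
papers = ["bad"] * 200 + ["outstanding"] * 100 + ["middle"] * 700

def committee_decision(paper):
    """One committee's accept/reject decision under the three-group hypothesis."""
    if paper == "bad":
        return False               # any competent committee rejects
    if paper == "outstanding":
        return True                # any competent committee accepts
    return random.random() < 0.5   # messy middle: effectively a coin flip

disagreements = 0
for paper in papers:
    decision_a = committee_decision(paper)
    decision_b = committee_decision(paper)  # second, independent committee
    disagreements += decision_a != decision_b

print(f"Committees disagree on {disagreements / len(papers):.0%} of submissions")
```

Under these invented proportions, the two committees disagree on roughly a third of all submissions (about 0.7 × 0.5 = 35%), despite agreeing perfectly on every clear-cut paper.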
Indeed, I think many researchers have internalized the fact that accept/reject decisions can be somewhat random. Researchers often learn this the hard way - if you work hard on a paper that you truly believe should be accepted, but ultimately receive cursory, negative, and plainly incorrect reviews, how else should you explain it to yourself?
What should a researcher do when faced with the incentive to get as many papers accepted as possible, but also the knowledge that the review process is somewhat random? Writing papers that fall into the “clearly outstanding” group is very hard, takes a lot of time, and such papers are ultimately quite rare. The optimal strategy is therefore to write as many papers that fall into the “messy middle” as possible and resubmit over and over again until the coin flip falls in your favor.
I think this strategy is one of the factors leading to the steady (linear, if not superlinear) increase in the number of submissions at virtually every conference I publish at (the popularity and excitement of the field are also clearly major contributors). A consequence of this is that reviewer load has steadily increased. Consider the perspective of the reviewer: You have many more papers to review than you have time to give due consideration; you are doing volunteer work that is barely incentivized; and you (like all your peers) have internalized the fact that peer review has gotten to a quite ugly place where many accept/reject decisions are effectively chance. So, you give papers a cursory read, you rely on bad heuristics to make accept/reject decisions (Does this paper get SoTA? Does this algorithm seem fancy enough? etc.), and you write short and uninformative reviews. Who can blame you? The net effect is that review quality, and consequently trust in the system, continues to deteriorate. This leads to more decision randomness and further incentive to write and (re)submit as many “messy-middle” papers as possible.
The irony is that as submissions increase and review quality decreases, we are indexing on an increasingly random signal. Paper acceptances increasingly mean “this paper is not awful, and the authors got lucky”, and not much else. It is worrisome that so much of our careers (graduate school acceptance, faculty applications, department rankings, etc.) depends on such a noisy signal.
How should we counteract this trend? A common approach is to plead for reviewers to work harder, and/or for conferences to create ever more onerous review forms (asking for ratings along many axes, various checklists, word count requirements, additional text boxes, multi-stage review processes with desk rejections, etc.). In my experience (both as a paper author and a many-time reviewer/area chair), none of these efforts has significantly improved review quality, and I doubt they will until there are more meaningful incentives for reviewers.
Instead, I think we should stop over-indexing on paper acceptance. I personally try not to pay much attention to whether a paper has been published, and have hired students who have no first-author publications. I also try to avoid the self-congratulatory “N papers accepted at conference X” announcements. I am considering removing all publication venue information from my website and CV, though I recognize this will probably have limited impact. If you are making a decision about a researcher, and are heavily weighing how many papers they have had accepted and at which venues, please keep in mind that acceptance counts are an incredibly noisy and ultimately unreliable measure of the researcher’s worth. If you want to judge a researcher’s quality, the only meaningful way is to read their papers and judge for yourself.