The reviewer is not a proofreader; they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document. They are also assuming good faith.
You'll find a lot of papers from, say, the '70s, with a grand total of maybe 10 references, all of them to crucial prior work. If those references don't say what the author claims they say (e.g. that the particular method employed is valid), then chances are the current paper is weaker than it seems, or even invalid, so it is extremely important to check those references.
Then the internet came along, scientists started padding their work with easily found but barely relevant references and journal editors started requiring that even "the earth is round" should be well-referenced. The result is that peer reviewers feel that asking them to check the references is akin to asking them to do a spell check. Fair enough, I agree, I usually can't be bothered to do many or any citation checks when I am asked to do peer review, but it's good to remember that this in itself is an indication of a perverted system, which we just all ignored -- at our peril -- until LLM hallucinations upset the status quo.
The paper author likely believes Foo and Bar are X, it may well be that all their co-workers, if asked, would say that Foo and Bar are X, but "Everybody I have coffee with agrees" can't be cited, so we get this sort of junk citation.
Hopefully it's not crucial to the new work that Foo and Bar are in fact X. But that's not always the case, and it's a problem that years later somebody else will cite this paper for the claim "Foo and Bar are X", which it was itself merely citing erroneously.
But this would be more powerful with an open knowledge base where all papers and citation verifications were registered, so that all the effort put into verification could be reused, and errors propagated through the citation chain.
And also of increasingly ridiculous and overly broad concepts of what plagiarism is. At some point things shifted from “don’t represent others’ work as novel” towards “give a genealogical ontology of every concept above that of an intro 101 college course on the topic.”
In the methods section, it's very common to say "We employ method barfoo [1] as implemented in library libbar [2], with the specific variant widget due to Smith et al. [3] and the gobbledygook renormalization [4,5]. The feoozbar is solved with geometric multigrid [6]. Data is analyzed using the froiznok method [7] from the boolbool library [8]." There goes 8, now you have 2 citations left for the introduction.
I've always assumed peer review is similar to diff review, where I'm willing to sign my name onto the work of others. If I approve a diff/PR and it takes down prod, it's just as much my fault, no?
> They are also assuming good faith.
I can only relate this to code review, but assuming good faith means you assume they didn't try to introduce a bug by adding this dependency. But I should still check to make sure this new dep isn't some typosquatted package. That's the rigor I'm responsible for.
Ph.D. in neuroscience here. Programmer by trade. This is not true. The less you know about most peer reviews, the better.
Even the better peer reviews are not this 'thorough', and no one expects reviewers to read or even check references. If you cite something they are familiar with and use it wrong, they will likely complain. Or if they find some unknown citation very relevant to their own work, they will read it.
I don't have a great analogy to draw here. Peer review is usually thankless and unpaid work, so there is unlikely to be any motivation for fraud detection unless it somehow affects your own work.
Checking references can be useful when you are not familiar with the topic (but must review the paper anyway). In many conference proceedings that I have reviewed for, many if not most citations were redacted so as to keep the author anonymous (citations to the author's prior work or that of their colleagues).
LLMs could be used to find prior work anyway, today.
Yes in theory you can go through every semicolon to check if it's not actually a greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.
So if you think you might have reasonably missed greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
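For what it's worth, flagging those look-alike characters is easy to automate. A minimal sketch in Python, with just a couple of illustrative examples rather than a full confusables table:

    import unicodedata

    # Characters that render like ASCII but aren't; U+037E (greek question mark)
    # looks like ';' yet makes a C/Java/JS file fail to compile.
    CONFUSABLES = {"\u037e": ";", "\u0430": "a"}  # extend as needed

    def find_confusables(source: str):
        """Yield (line, column, unicode name, look-alike) for suspicious characters."""
        for lineno, line in enumerate(source.splitlines(), start=1):
            for col, ch in enumerate(line, start=1):
                if ch in CONFUSABLES:
                    yield lineno, col, unicodedata.name(ch), CONFUSABLES[ch]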
As a PR reviewer I frequently pull down the code and run it. Especially if I'm suggesting changes because I want to make sure my suggestion is correct.
Do other PR reviewers not do this?
No, it's not. I think you're trying to make a different point, because you're using an example of a specific, deliberately malicious way to hide a token error that prevents compilation but is visually similar.
> and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.
What weird world are you living in where you don't have CI? Also, it's pretty common that I'll test code locally when reviewing something more complex or more important, if I don't have CI.
> Yes in theory you can go through every semicolon to check if it's not actually a greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.
I don't, because it won't compile. Not because I assume good faith. References and citations are similar to introducing dependencies. We're talking about completely fabricated deps, e.g. an engineer went on npm and grabbed the first package that said left-pad, but it's actually a crypto miner. We're not talking about a citation missing a page number or publication year. We're talking about something that's completely incorrect being represented as relevant.
> So if you think you might have reasonably missed greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
I would never miss this, because the important thing is code needs to compile. If it doesn't compile, it doesn't reach the master branch. Peer review of a paper doesn't have CI, I'm aware, but it's also not vulnerable to syntax errors like that. A paper with a fake semicolon isn't meaningfully different, so this analogy doesn't map to the fraud I'm commenting on.
1. A patch is self-contained and applies to a codebase you have just as much access to as the author. A paper, on the other hand, is just the tip of the iceberg of research work, especially if there is some experiment or data collection involved. The reviewer does not have access to, say, videos of how the data was collected (and even if they did, they don't have the time to review all of that material).
2. The software is also self-contained. That's "production". But a scientific paper does not necessarily aim to represent scientific consensus, but rather a finding by a particular team of researchers. If a paper's conclusions are wrong, it's expected that it will be refuted by another paper.
Given the repeatability crisis I keep reading about, maybe something should change?
> 2. The software is also self-contained. That's "production". But a scientific paper does not necessarily aim to represent scientific consensus, but rather a finding by a particular team of researchers. If a paper's conclusions are wrong, it's expected that it will be refuted by another paper.
This is a much, MUCH stronger point. I would have led with this, because the contrast between this assertion and my comparison to prod is night and day. The rules for prod are different from the rules of scientific consensus. I regret losing sight of that.
Yeah, it's insane, the workload reviewers are faced with, plus being an author who gets a review from a novice.
No.
Modern peer review is “how can I do minimum possible work so I can write ‘ICLR Reviewer 2025’ on my personal website”
I don't know, I still think this describes most of the reviews I've seen
I just hope most devs that do this know better than to admit to it.
This is systemic, and unlikely to change anytime soon. There have been remedies proposed (e.g. limits on how many papers an author can publish per year, let's say 4 to be generous), but they are unlikely to gain traction: though most would agree on the benefits, everyone involved in the system would stand to lose short term.
I guess this explains all those times over the years where I follow a citation from a paper and discover it doesn’t support what the first paper claimed.
...at least mandatory automated checking processes are probably not far off for the more reputable journals, but it still makes you wonder how much you can trust the last two years of LLM-enhanced science that is now being quoted in current publications, and whether those hallucinations can be "reverted" after having been re-quoted. A bit like Wikipedia can be abused to establish facts.
Doesn't this sound like something that could be automated?
for paper_name in citations... do a web search for it, see if there's a page in the results with that title.
That would at least give you "a paper with this name exists".
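A rough sketch of that loop in Python, querying Crossref's works API rather than a general web search (the choice of Crossref and the 0.9 similarity threshold are my own assumptions, not a vetted pipeline):

    import requests
    from difflib import SequenceMatcher

    def title_exists(cited_title: str) -> bool:
        """Return True if some indexed work has a title close to the cited one."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": cited_title, "rows": 5},
            timeout=30,
        )
        resp.raise_for_status()
        for item in resp.json()["message"]["items"]:
            for title in item.get("title", []):
                if SequenceMatcher(None, cited_title.lower(), title.lower()).ratio() > 0.9:
                    return True
        return False

    # for paper_name in citations:
    #     if not title_exists(paper_name):
    #         print("possibly fabricated:", paper_name)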
However the paper is submitted, say as a folder on a cloud drive, just have the authors include a folder with PDFs/abstracts of all the citations?
They might then fraudulently produce papers to cite, but they can't cite something that doesn't exist.
Even if you could retrieve every cited paper (which isn't always as easy as you might hope), to validate a citation you'd also have to confirm the paper says what the person citing it claims. If I say "A GPU requires 1.4kg of copper" citing [1], is that a valid citation?
That means not just reviewing one paper, but also potentially checking the 70+ papers it cites. The vast majority of paper reviewers will not check that citations actually say what they're claimed to say, unless a truly outlandish claim is made.
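If one did want to automate that last step, a minimal sketch might look like this; ask_llm() is a hypothetical placeholder for whichever model you have access to, and the prompt wording is just an illustration:

    def ask_llm(prompt: str) -> str:
        # Hypothetical helper: wire this to whatever LLM API you actually use.
        raise NotImplementedError

    def citation_supports_claim(claim: str, cited_abstract: str) -> str:
        prompt = (
            "A manuscript makes this claim:\n"
            f"{claim}\n\n"
            "Here is the abstract of the work it cites for that claim:\n"
            f"{cited_abstract}\n\n"
            "Reply SUPPORTED, CONTRADICTED, or NOT ADDRESSED, "
            "plus one sentence of justification."
        )
        return ask_llm(prompt)

    # e.g. citation_supports_claim("A GPU requires 1.4kg of copper", abstract_of_ref_1)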
At the same time, academia is strangely resistant to putting hyperlinks in citations, preferring to maintain old traditions - like citing conference papers by page number in a hypothetical book that has never been published; and having both a free and a paywalled version of a paper while considering the paywalled version the 'official' version.
But to your point, seems we need a tool that can do this
I experimented a couple of years ago with getting LLMs to check citations but stopped working on it because there's no incentive. You could run a fancy expensive pipeline burning scarce GPU hours and find a bunch of bad citations. Then what? Nobody cares. No journal is going to retract any of these papers, the academics themselves won't care or even respond to your emails, nobody is willing to pay for this stuff, least of all the universities, journals or governments themselves.
For example, there's a guy in France who runs a pre-LLM pipeline to discover bad papers using hand-coded heuristics like regexes or metadata analysis, e.g. checking if a citation has been retracted (a sketch of that kind of check is below). Many of the things it detects are plagiarism, paper mills (i.e. companies that sell fake papers to academics for a profit), or the result of joke paper generators like SciGen.
https://dbrech.irit.fr/pls/apex/f?p=9999:1::::::
Other than populating an obscure database nobody knows about, this work achieved bupkis.
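To make those hand-coded heuristics concrete, here's a minimal tortured-phrase check in Python; the phrase list is just a few widely reported examples of mangled synonyms, not that screener's actual ruleset:

    import re

    # Mangled synonyms of standard terms ("tortured phrases"), a telltale
    # sign of automated paraphrasing; these are a few widely reported examples.
    TORTURED_PHRASES = {
        "counterfeit consciousness": "artificial intelligence",
        "irregular backwoods": "random forest",
        "flag to commotion": "signal to noise",
    }

    def screen_text(text: str):
        """Return (tortured phrase, expected term) pairs found in the text."""
        return [
            (phrase, expected)
            for phrase, expected in TORTURED_PHRASES.items()
            if re.search(re.escape(phrase), text, flags=re.IGNORECASE)
        ]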
After all, their grant covers their thesis, not their thesis plus all of the theses they cite.
2. If the paper turns out to be important, people will bother.
3. There's checking for cursory correctness, and there's forensic torture.
The review should also determine how valuable the contribution is, not only whether it has mistakes.
Today's reviews determine neither value nor correctness in any meaningful way. And how could they, actually? That is why I review papers only to the extent that I understand them, and I clearly delineate my line of understanding. And I don't review papers that I am not interested in reading.

I once got a paper to review that actually pointed out a mistake in one of my previous papers, and then proposed a different solution. They correctly identified the mistake, but I could not verify whether their solution worked or not; that would have taken me several weeks to understand. I gave a report along these lines, and the person who had assigned me the review said I should say more about their solution, but I could not. So my review was not actually used. The paper was accepted, which is fine, but I am sure none of the other reviewers actually knows if it is correct.
Now, this was a case where I was an absolute expert. Which is far from the usual situation for a reviewer, even though many reviewers give themselves the highest mark for expertise when they just should not.