As a reviewer, if I see the authors lie in this way why should I trust anything else in the paper? The only ethical move is to reject immediately.
I acknowledge that mistakes and so on are common, but this is a different league of bad behaviour.
In many fields it's gross professional misconduct only in theory. This sort of thing is very common and there's never any consequence. LLM-generated citations specifically are a new problem but citations of documents that don't support the claim, contradict it, have nothing to do with it or were retracted years ago have been an issue for a long time.
Gwern wrote about this here:
"A major source of [false claim] transmission is the frequency with which researchers do not read the papers they cite: because they do not read them, they repeat misstatements or add their own errors, further transforming the leprechaun and adding another link in the chain to anyone seeking the original source. This can be quantified by checking statements against the original paper, and examining the spread of typos in citations: someone reading the original will fix a typo in the usual citation, or is unlikely to make the same typo, and so will not repeat it. Both methods indicate high rates of non-reading"
I first noticed this during COVID and did some blogging about it. In public health it is quite common to do things like present a number with a citation, and then the paper doesn't contain that number anywhere in it, or it does but the number was an arbitrary assumption pulled out of thin air rather than the empirical fact it was being presented as.
It was also very common for papers to open by saying something like, "Epidemiological models are a powerful tool for predicting the spread of disease" with eight different citations, and every single citation would be an unvalidated model - zero evidence that any of the cited models were actually good at prediction.
Bad citations are hardly the worst problem with these fields, but when you see how widespread it is and that nobody within the institutions cares it does lead to the reaction you're having where you just throw your hands up and declare whole fields to be writeoffs.
However, I think hallucinated citations pose a bigger problem, because they're fundamentally a lie by commission instead of omission, misinterpretation or misrepresentation of facts.
At the same time, it may be an accidental lie, insofar as authors mistakenly used LLMs as search engines, just to support a claim that's commonly known, or that they remember well but can't find the origin of.
So, unless we reduce the pressure on publication speed, and increase the pressure for quality, we'll need to introduce more robust quality checks into peer review.
i clicked on 4 of those papers, and the pattern i saw was middle-eastern, indian, and chinese names
these are cultures where they think this kind of behavior is actually acceptable, they would assume it's the fault of the journal for accepting the paper. they don't see the loss of reputation to be a personal scar because they instead attribute blame to the game.
some people would say it's racist to understand this, but in my opinion when i was working with people from these cultures there was just no other way to learn to cooperate with them than to understand them, it's an incredibly confusing experience to be working with them until you understand the various differences between your own culture and theirs
AFAIK the submissions are still blinded and we don't know who the authors are. We will, surely, soon -- since ICLR maintains all submissions in public record for posterity, even if "withdrawn". They are unblinded after the review period finishes.
>Anonymous authors
>Paper under double-blind review
I have a relative who lived in a country in the East for several years, and he says that this is just factually true.
The vast majority of people who disagree with this statement have never actually lived in these cultures. They just hallucinate that they have because they want that statement to be false so badly.
...but, simultaneously, I'm also not seeing where you see the authors of the papers - I only see hallucitation authors. e.g. at the link for the first paper submission (https://openreview.net/forum?id=WPgaGP4sVS), there doesn't appear to be any authors listed. Are you confusing the hallucinated citation authors with the primary paper authors?
In that case, I would expect Eastern authors to be over-represented, because they just publish a lot more.
Besides, I would think most people are using reference managers like Zotero and co., which will pull metadata through DOIs or the like.
The errors look a lot more like what happens when you ask an LLM for some sources on xyz.
The time it takes to find these errors is orders of magnitude higher than checking if a citation exists as you need to both read and understand the source material.
These bad actors should be subject to a three strikes rule: the steady corrosion of knowledge is not an accident by these individuals.
On their site (https://gptzero.me/sources) it also says "GPTZero's Hallucination Detector automatically detects hallucinated sources and poorly supported claims in essays. Verify academic integrity with the most accurate hallucination detection tool for educators", so it does more than just identify invalid citations. Seems to do exactly what you're talking about.
These people are working in labs funded by Exxon or Meta or Pfizer or whoever and they know what results will make continued funding worthwhile in the eyes of their donors. If the lab doesn't produce the donor will fund another one that will.
I think that's because a lot of bad citations come from reviewer demands to add more of them during the journal publishing process, so they're not critical to the argument and end up being low-effort citations that get copy/pasted between papers. Or someone is just spamming citations to make a weak claim look strong. And all this happens because academia uses citations as a kind of currency (it's a planned non-market economy, so they have to allocate funds using proxy signals).
Commercial labs are less likely to care about the journal process to begin with, and are much less likely to publish weak claims because publishing is just a recruiting tool, not the actual end goal of the R&D department.
If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.
AI is not the problem, laziness and negligence is. There needs to be serious social consequences to this kind of thing, otherwise we are tacitly endorsing it.
The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document. They are also assuming good faith.
This reminds me of the discourse about the gun problem in the US, "guns don't kill people, people kill people", etc. - it is a discourse used solely for the purpose of not doing anything and not addressing the underlying problem.
So no, you're wrong - AI IS THE PROBLEM.
Solely? Oh brother.
In reality it’s the complete opposite. It exists to highlight the actual source of the problem, as both industries/practitioners using AI professionally and safely, and communities with very high rates of gun ownership and exceptionally low rates of gun violence exist.
It isn’t the tools. It’s the social circumstances of the people with access to the tools. That’s the point. The tools are inanimate. You can use them well or use them badly. The existence of the tools does not make humans act badly.
> Worryingly, each of these submissions has already been reviewed by 3-5 peer experts, most of whom missed the fake citation(s). This failure suggests that some of these papers might have been accepted by ICLR without any intervention. Some had average ratings of 8/10, meaning they would almost certainly have been published.
If the peer reviewers can't be bothered to do the basics, then there is literally no point to peer review, which is fully independent of the author who uses or doesn't use AI tools.
Also similar to what Temu, Wish, and other similar sites offer. Picture and specs might look good but it will likely be disappointing in the end.
That said, these tools have substantially reduced hallucinations over the last year, and will just get better. It also helps if you can restrict it to reference already screened papers.
Finally, I'd like to say that if we want scientists to engage in good science, we should stop forcing them to spend a third of their time in a rat race for funding... it is ridiculously time-consuming and wasteful of expertise.
It's both. The tool is crappy, and the carpenter is crappy for blindly trusting it.
> AI is not the problem, laziness and negligence is.
Similarly, both are a problem here. LLMs are a bad tool, and we should hold people responsible when they blindly trust this bad tool and get bad results.
There's a corollary here with LLMs, but I'm not pithy enough to phrase it well. Anyone can use an LLM to create something whose hallucinations they, themselves, aren't skilled enough to spot. Or something.
LLMs are incredibly good at exploiting peoples' confirmation biases. If it "thinks" it knows what you believe/want, it will tell you what you believe/want. There does not exist a way to interface with LLMs that will not ultimately end in the LLM telling you exactly what you want to hear. Using an LLM in your process necessarily results in being told that you're right, even when you're wrong. Using an LLM necessarily results in it reinforcing all of your prior beliefs, regardless of whether those prior beliefs are correct. To an LLM, all hypotheses are true, it's just a matter of hallucinating enough evidence to satisfy the users' skepticism.
I do not believe there exists a way to safely use LLMs in scientific processes. Period. If my belief is true, and ChatGPT has told me it's true, then yes, AI, the tool, is the problem, not the human using the tool.
What about giving the LLM a narrowly scoped role as a hostile reviewer, while your job is to strengthen the write-up to address any valid objections it raises, plus any hallucinations or confusions it introduces? That’s similar to fuzz testing software to see what breaks or where the reasoning crashes.
Used this way, the model isn’t a source of truth or a decision-maker. It’s a stress test for your argument and your clarity. Obviously it shouldn’t be the only check you do, but it can still be a useful tool in the broader validation process.
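A minimal sketch of that workflow, assuming the `openai` Python package; the model name, the prompt wording, and the `hostile_review` helper are all illustrative assumptions, not anyone's actual setup:

```python
# Hypothetical sketch: ask a model to act as a hostile reviewer of a draft.
# Model name and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def hostile_review(draft_text: str) -> str:
    """Return a list of objections to the draft; the author must judge each one."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a hostile peer reviewer. List the weakest claims, "
                    "missing controls, and unsupported citations in the text. "
                    "Do not suggest fixes; only raise objections."
                ),
            },
            {"role": "user", "content": draft_text},
        ],
    )
    return response.choices[0].message.content
```

The point of structuring it this way is that the output is only ever a list of objections to check against real sources, never text that ends up in the paper.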
Quite the opposite actually.
It's not like these are new issues. They're the same ones we've experienced since the introduction of these tools. And yet the focus has always been to throw more data and compute at the problem, and optimize for fancy benchmarks, instead of addressing these fundamental problems. Worse still, whenever they're brought up users are blamed for "holding it wrong", or for misunderstanding how the tools work. I don't care. An "artificial intelligence" shouldn't be plagued by these issues.
As much as I agree with you that this is wrong, there is a danger in putting the onus just on the human. Whether due to competition or top down expectations, humans are and will be pressured to use AI tools alongside their work and produce more. Whereas the original idea was for AI to assist the human, as the expected velocity and consumption pressure increases humans are more and more turning into a mere accountability laundering scheme for machine output. When we blame just the human, we are doing exactly what this scheme wants us to do.
Therefore we must also criticize all the systemic factors that put pressure on reversing AI's assistance into AI's domination of human activity.
So AI (not as a technology, but as a product shoved down people's throats) is the problem.
If management fires you because they are wrong about how good AI is, and you're right - at the end of the day, you're fired and the manager is in lalaland.
People need to actually push the correct calibration of what these tools should be trusted to do, while also trying to work with what they have.
Unfortunately, a large fraction of academic fraud has historically been detected by sloppy data duplication, and with LLMs and similar image generation tools, data fabrication has never been easier to do or harder to detect.
And in the case of AI, either review its output, or simply don't use it. No one has a gun to your head forcing you to use this product (and poorly at that).
It's quite telling that, even in this basic hypothetical, your first instinct is to gesture vaguely in the direction of governmental action, rather than expect any agency at the level of the individual.
"It's not a car infrastructure problem, it's a people problem."
"It's not a food safety problem, it's a people problem."
"It's not a lead paint problem, it's a people problem."
"It's not an asbestos problem, it's a people problem."
"It's not a smoking problem, it's a people problem."
If an engineer provided this line of excuse to me, I wouldn't let them anywhere near a product again - a complete abdication of personal and professional responsibility.
If a scientist does it now, they just blame it on AI. But the consequences should remain the same. This is not an honest mistake.
People that do this - even once - should be banned for life. They put their name on the thing. But just like with plagiarism, falsifying data and academic cheating, somehow a large subset of people thinks it's okay to cheat and lie, and another subset gives them chance after chance to misbehave like they're some kind of children. But these are adults and anyone doing this simply lacks morals and will never improve.
And yes, I've published in academia and I've never cheated or plagiarized in my life. That should not be a drawback.
Taking an academic who does something like that seriously seems impossible. At best he is someone who is neglecting his most basic duties as an academic, at worst he is just a fraudster. In both cases he should be shunned and excluded.
I mean sure, but having a tool that made fabrication so much easier has made the problem a lot worse, don't you think?
Tiered licensing, mandatory safety training, and weapon classification by law enforcement works really well for Canada’s gun regime, for example.
Modern science is designed from the top to the bottom to produce bad results. The incentives are all mucked up. It's absolutely not surprising that AI is quickly becoming yet-another factor lowering quality.
When Tesla says their car is self driving, people trust them to self drive. Yes, you can blame the user for believing, but that's exactly what they were promised.
> Why didn't the lawyer who used ChatGPT to draft legal briefs verify the case citations before presenting them to a judge? Why are developers raising issues on projects like cURL using LLMs, but not verifying the generated code before pushing a Pull Request? Why are students using AI to write their essays, yet submitting the result without a single read-through? They are all using LLMs as their time-saving strategy. [0]
It's not laziness, it's the feature we were promised. We can't keep saying everyone is holding it wrong.
Assuming that cure is meant as hyperbole, how about https://www.biorxiv.org/content/10.1101/2025.04.14.648850v3 ? AI models being used for bad purposes doesn't preclude them being used for good purposes.
Its sloppy work all the way down...
We are, in fact, not tacitly but openly endorsing this, due to this AI-everywhere madness. I am so looking forward to when some genius at some bank starts using it to simplify code and suddenly I have €100,000,000 in my bank account. :)
Really? Regardless of whether it's a good paper?
Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?
That said, I am also very curious about the results their tool would give for papers from the 2010s and before.
When I was in grad school, I kept a fairly large .bib file that almost certainly had a mistake or two in it. I don’t think any of them ever made it to print, but it’s hard to be 100% sure.
For most journals, they actually partially check your citations as part of the final editing. The citation record is important for journals, and linking with DOIs is fairly common.
Not just some hallucinated citations, and not just the writing. In many cases the actual purported research "ideas" seem to be plausible nonsense.
To get a feel for it, you can take some of the topics they write about and ask your favorite LLM to generate a paper. Maybe even throw "Deep Research" mode at it. Perhaps tell it to put it in ICLR latex format. It will look a lot like these.
Exactly as you said, do precisely this to pre-LLM works. There will be an enormous number of errors with utter certainty.
People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.
A pre LLM paper with fabricated citations would demonstrate will to cheat by the author.
A post LLM paper with fabricated citations: same thing and if the authors attempt to defend themselves with something like, we trusted the AI, they are sloppy, probably cheaters and not very good at it.
Humans can do all of the above but it costs them more, and they do it more slowly. LLMs generate spam at a much faster rate.
> You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart.
https://www.rxjourney.net/how-artificial-intelligence-ai-is-...
Also a frequent proponent of UFO claims about approaching meteors.
"Show bad examples then hit you on the wrist for following my behavior" is like bad parenting.
> Given that we've only scanned 300 out of 20,000 submissions, we estimate that we will find 100s of hallucinated papers in the coming days.
Can't quote exact numbers but when I was on the conference committee for a maybe high four figures attendance conference, we certainly had many thousands of submissions.
https://www.theguardian.com/technology/2025/dec/06/ai-resear...
Run of the mill ML jobs these days ask for "papers in NeurIPS ICLR or other Tier-1 conferences".
We're well past Goodhart's law when it comes to publications.
It was already insane in CS - now it's reached asylum levels.
Academia has been ripe for disruption for a while now.
The "Rooter" paper came out 20 years ago:
https://www.csail.mit.edu/news/how-fake-paper-generator-tric...
Creating a real citation is totally doable by a machine though, it is just selecting relevant text, looking up the title, authors, pages etc and putting that in canonical form. It’s just that LLMs are not currently doing the work we ask for, but instead something similar in form that may be good enough.
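As a rough illustration of that deterministic lookup, here is what it might look like against Crossref's public REST API (the endpoint and field names are real, but which fields a given record actually contains varies, so the formatting step is a sketch, not a finished tool):

```python
# Illustrative sketch: resolve a rough citation/title to canonical metadata via
# Crossref, then format it deterministically (no text generation involved).
import requests

def canonical_citation(rough_title: str) -> str:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": rough_title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        raise LookupError(f"No indexed work matches {rough_title!r}")
    work = items[0]
    authors = ", ".join(
        f"{a.get('family', '')}, {a.get('given', '')}" for a in work.get("author", [])
    )
    year = work.get("issued", {}).get("date-parts", [[None]])[0][0]
    title = work["title"][0] if work.get("title") else rough_title
    venue = work["container-title"][0] if work.get("container-title") else ""
    doi = work.get("DOI", "")
    return f"{authors} ({year}). {title}. {venue}. https://doi.org/{doi}"
```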
Which incentives can be set to discourage the negligence?
How about bounties? A bounty fund set up by the publisher, and each submission must come with a contribution to the fund. Then there would be bounties for gross negligence that could attract bounty hunters.
How about a wall of shame? Once negligence crosses a certain threshold, the name of the researcher and the paper would be put on a wall of shame for everyone to search and see?
There must be price to pay for wasting other people's time (lives?).
Writing academic papers is exactly the _wrong_ usage for LLMs. So here we have a clear cut case for their usage and a clear cut case for their avoidance.
Peer review doesn't catch errors.
Acting as if it does, and thus assuming the fact of publication (and where it was published) are indicators of veracity is simply unfounded. We need to go back to the food fight system where everyone publishes whatever they want, their colleagues and other adversaries try their best to shred them, and the winners are the ones that stand up to the maelstrom. It's messy, but it forces critics to put forth their arguments rather than quietly gatekeeping, passing what they approve of, suppressing what they don't.
Passed peer review is the first basic bar that has to be cleared. It was never supposed to be all there is to the science.
I'm not sure why you think this isn't the case?
I should have said "Peer review doesn't catch _all_ errors" or perhaps "Peer review doesn't eliminate errors".
In other words, being "peer reviewed" is nowhere close to "error free," and if (as is often the case) the rate of errors is significantly greater than the rate at which errors are caught, peer review may not even significantly improve the quality.
It's much more useful if everyone including the janitor and their mom can have a say on your code before you're allowed to move to your next commit.
(/s, in case it's not obvious :D )
The dominant "failing" here is that this is fraudulent on a professional, intellectual, and moral level.
Most of those I spot-checked do not give an impression of high quality. Not just AI writing assistance: many seem to have AI-generated "ideas", often plausible nonsense. The reviewers often catch the errors and sometimes even the fake citations.
Can I prove malfeasance beyond a reasonable doubt? No. But I personally feel quite confident many of the papers I checked are primarily AI-generated.
I feel really bad for any authors who submitted legitimate work but made an innocent mistake in their .bib and ended up on the same list as the rest of this stuff.
This isn't comforting at all.
How are the authors even submitting citations? Surely they could be required to send a .bib or similar file? It’s so easy to then quality control at least to verify that citations exist by looking up DOIs or similar.
I know it wouldn’t solve the human problem of relying on LLMs but I’m shocked we don’t even have this level of scrutiny.
Presumably the citation scanner they're using is relying on similar data sources as Zotero in any case to detect these sorts of issues.
Regardless, my comment still stands, it seems like the submission is relying on the actual text of the bibliography being correct, rather than requiring a machine readable citation metadata file of some sort, which would at least allow much of the quality control checks to be automated (and certainly would preclude complete hallucinations of nonexistent papers getting through).
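For what it's worth, a minimal sketch of that kind of automated check, assuming a submitted .bib file, the `bibtexparser` package, and Crossref's DOI lookup (entries without a DOI would still need a human or a fuzzier search):

```python
# Rough sketch of the quality-control pass described above: parse a submitted
# .bib file and confirm that every DOI it contains actually resolves.
import bibtexparser
import requests

def check_bib_dois(path: str) -> list[str]:
    with open(path) as f:
        database = bibtexparser.load(f)
    suspicious = []
    for entry in database.entries:
        doi = entry.get("doi")
        if not doi:
            continue  # no DOI: existence can't be checked this way
        r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if r.status_code == 404:
            suspicious.append(f"{entry.get('ID', '?')}: DOI not found ({doi})")
    return suspicious
```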
Such errors arise from the uncritical adoption of automated tools and a failure to verify outputs manually. This reflects academic laxity and an excessive trust in LLMs. As AI tools become ubiquitous in research—even for generating encyclopedic content—some individuals have developed a misplaced confidence in GPT, leading them to undervalue the importance of citation accuracy.
However, while this is undeniably negligent, it does not validate wholesale dismissal of the paper’s scientific merit, nor does it warrant ad hominem attacks. Demanding the end of an academic career for citation errors is a draconian measure akin to a witch hunt.
Really, this isn’t that hard and it’s not at all an obscure requirement or unknown factor.
I think this is much much less “LLMs dumbing things down” and significantly more just a shibboleth for identifying people that were already nearly or actually doing fraudulent research anyway. The ones who we should now go back and look at prior publications as very likely fraudulent as well.
The idea is simple:
• Bad citations aren't the root cause.
• They are a late-stage symptom of a broken reasoning trajectory.
• If you detect the break early, the hallucinated citation never appears.
The tools I've built (and documented so anyone can use) do three things:
1. Measure interrogative structure — they check whether the questions driving the paper's logic are well-formed and deterministic.
2. Track entropy drift in the argument itself — not the text output, but the structure of the reasoning.
3. Surface the exact step where the argument becomes inconsistent — which is usually before the fake citation shows up.
These instruments don’t replace peer review, and they don’t make judgments about culture or intent. They just expose structural instability in real time — the same instability that produces fabricated references.
If anyone here wants to experiment or adapt the approach, everything is published openly with instructions. It’s not a commercial project — just an attempt to stabilize reasoning in environments where speed and tool-use are outrunning verification.
Code and instrument details are in my CubeGeometryTest repo (the implementation behind ‘A Geometric Instrument for Measuring Interrogative Entropy in Language Systems’). https://github.com/btisler-DS/CubeGeometryTest This is still a developing process.
Most of the names in these wrong attributions are actual people though, not hallucinations. What is going on? Is this a case of AI-powered citation management creating some weird feedback loop?
[1] https://app.gptzero.me/documents/54c8aa45-c97d-48fc-b9d0-d49...
[2] https://arxiv.org/pdf/2311.12022
Maybe there just is no incentive for this type of activity.
I realize things are probably (much) more complicated than I realize, but programmatically, unlike arbitrary text, citations are generally strings with a well-defined format. There are literally "specs" for citation formats in various academic, legal, and scientific fields.
So, naively, one way to mitigate these hallucinations would be identify citations with a bunch of regexes, and if one is spotted, use the Google Scholar API (or whatever) to make sure it's real. If not, delete it or flag it, etc.
Why isn't something like this obvious solution being done? My guess is that it would slow things down too much. But it could be optional and it could also be done after the output is generated by another process.
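Something like the following hedged sketch, for instance. Google Scholar has no official public API, so this uses Crossref as a stand-in, and the regex only covers one numbered-reference style; a real checker would also compare the returned title against the cited one, since the search is fuzzy:

```python
# Very rough sketch of the post-hoc pass described above: pull out lines that
# look like numbered references and flag any that a bibliographic search
# can't find at all. The regex and the Crossref stand-in are illustrative.
import re
import requests

REFERENCE_PATTERN = re.compile(r"^\[\d+\]\s+(?P<body>.+)$", re.MULTILINE)

def flag_unverifiable_references(text: str) -> list[str]:
    flagged = []
    for match in REFERENCE_PATTERN.finditer(text):
        body = match.group("body")
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": body, "rows": 1},
            timeout=10,
        )
        items = resp.json().get("message", {}).get("items", [])
        if not items:
            flagged.append(body)  # nothing indexed even loosely matches
    return flagged
```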
There are some mitigations that are used such as RAG or tool usage (e.g. a browser), but they don't completely fix the underlying issue.
> Papers that make extensive usage of LLMs and do not disclose this usage will be desk rejected.
This sounds like they're endorsing the game of how much can we get away with, towards the goal of slipping it past the reviewers, and the only penalty is that the bad paper isn't accepted.
How about "Papers suspected of fabrications, plagiarism, ghost writers, or other academic dishonesty, will be reported to academic and professional organizations, as well as the affiliated institutions and sponsors named on the paper"?
LLMs should be awesome at finding plausible sounding titles. The crappy researcher just has to remember to check for existence. Perhaps there is a business model here, bogus references as a service, where this check is done automatically.
Headline should be "AI vendor’s AI-generated analysis claims AI generated reviews for AI-generated papers at AI conference".
h/t to Paul Cantrell https://hachyderm.io/@inthehands/115633840133507279
(I'm on mobile, haven't looked on desktop.)
Not only is that incredibly easy to verify (you could pay a first semester student without any training), it's also a worrying sign on what the paper's authors consider quality. Not even 5 minutes spent to get the citations right!
You have to wonder what's in these papers.
Do it more than once? Lose job.
End of story.
Well then you're being rather silly, because that is a silly conclusion to draw (and one not supported by the evidence).
A fairer conclusion would be that I meant what is obvious: if you use AI to generate a bibliography, you are being academically negligent.
If you disagree with that, I would say it is you that has the problem with academia, not me.
"Compression has been widely used in columnar databases and has had an increasing importance over time.[1][2][3][4][5][6]"
Ok, literally everyone in the field already knows this. Are citations 1-6 useful? Well, hopefully one of them is an actually useful survey paper, but odds are that 4-5 of them are arbitrarily chosen papers by you or your friends. Good for a little bit of h-index bumping!
So many citations are not an integral part of the paper, but instead randomly sprinkled on to give an air of authority and completeness that isn't deserved.
I actually have a lot of respect for the academic world, probably more than most HN posters, but this particular practice has always struck me as silly. Outside of survey papers (which are extremely under-provided), most papers need many fewer citations than they have, for the specific claims where the paper is relying on prior work or showing an advance over it.
Papers with a fake air of authority are easily dispatched. What is not so easily dispatched is the politics of the submission process.
This type of content is fundamentally about emotions (in the reviewer of your paper), and emotion is undeniably a large factor in acceptance / rejection.
And as the remedy starts being applied (aka "liability"), the enthusiasm for AI will start to wane.
I wouldn't be surprised if some businesses ban the use of AI --- starting with law firms.
And as the remedy starts being applied (aka "liability"), the enthusiasm for software will start to wane.
What if anything do you think is wrong with my analogy? I doubt most people here support strict liability for bugs in code.
Generally the law allows people to make mistakes, as long as a reasonable level of care is taken to avoid them (and also you can get away with carelessness if you don't owe any duty of care to the party). The law regarding what level of care is needed to verify genAI output is probably not very well defined, but it definitely isn't going to be strict liability.
The emotionally-driven hate for AI, in a tech-centric forum even, to the extent that so many commenters seem to be off-balance in their rational thinking, is kinda wild to me.
I think what is clearly wrong with your analogy is assuming that AI applies mostly to software and code production. This is actually a minor use-case for AI.
Government and businesses of all types ---doctors, lawyers, airlines, delivery companies, etc. are attempting to apply AI to uses and situations that can't be tested in advance the same way "vibe" code can. And some of the adverse results have already been ruled on in court.
> And as the remedy starts being applied (aka "liability"), the enthusiasm for sloppy and poorly tested software will start to wane.
Many of us use AI to write code these days, but the burden is still on us to design and run all the tests.
- Major AI conference flooded with peer reviews written by AI
https://news.ycombinator.com/item?id=46088236
- "All OpenReview Data Leaks"
https://news.ycombinator.com/item?id=46073488
- "The Day Anonymity Died: Inside the OpenReview / ICLR 2026 Leak"
https://news.ycombinator.com/item?id=46082370
- More about the leak
https://forum.cspaper.org/topic/191/iclr-i-can-locate-reviewer-how-an-api-bug-turned-blind-review-into-a-data-apocalypse
The second one went under the radar, but basically OpenReview left the API open so you didn't need credentials. This meant all reviewers and authors were deanonymized across multiple conferences. All these links are for ICLR too, which is the #2 ML conference for those that don't know.
And for some important context of the link for this post, note that they only sampled 300 papers and found 50. It looks to be almost exclusively citations but those are probably the easiest things to verify.
And this week CVPR sent out notifications that OpenReview will be down between Dec 6th and Dec 9th. No explanation for why.
So we have reviewers using LLMs, authors using LLMs, and idk the conference systems writing their software with LLMs? Things seem pretty fragile right now...
I think at least this article should highlight one of the problems we have in academia right now (beyond just ML, though it is more egregious there): citation mining. It is pretty standard to have over 50 citations in your 10-page paper these days. You can bet that most of these are not going to be for the critical claims but instead heavily placed in the background section. I looked at a few of the papers, and every one I looked at had its hallucinated citations in background (or background-in-appendix) sections. So these are "filler" citations, which I think illustrates a problem: citations are being abused. I mean, the metric hacking should be pretty obvious if you just look at how many citations ML people have. It's grown exponentially! Do we really need so many citations? I'm all for giving people credit, but a hyper-fixation on citation count as our measure of credit just doesn't work. It's far too simple a metric. Like we might as well measure how good of a coder you are by the number of lines of code you produce[0].
It really seems that academia doesn't scale very well...
Let's say that I use a formula, and give a reference to where the formula came from, but the reference doesn't exist. Would you trust the formula?
Let's say a computer program calls a subroutine with a certain name from a certain library, but the library doesn't exist.
A person doing good research doesn't need to check their references. Now, they could stand to check the references for typographic errors, but that's a stretch too. Almost every online service for retrieving articles includes a reference for each article that you can just copy and paste.
"The compiler thinks my variable isn't declared" "That function wants a null-terminated string" "Teach this code to use a cache"
Even the word computer once referred to a human.
There's nothing wrong with anthropomorphizing genAI; its source material is human-sourced, and humans are going to use human-like pattern matching when interacting with it. I.e., this isn't the river I want to swim upstream in. I assume you wouldn't complain if someone anthropomorphized a rock... up until they started to believe it was actually alive.
We need a word for this specific kind of error, and we have one, so we use it. Being less specific about a type of error isn't helping anyone. Whether it "anthropomorphizes", I couldn't care less. Heck, bugs come from actual insects. It's a word we've collectively started to use and it works.
One can use AI to help them write without going all the way to having it generate facts and citations.
>Confabulation was coined right here on Ars, by AI-beat columnist Benj Edwards, in Why ChatGPT and Bing Chat are so good at making things up (Apr 2023).
https://arstechnica.com/civis/threads/researchers-describe-h...
>Generative AI is so new that we need metaphors borrowed from existing ideas to explain these highly technical concepts to the broader public. In this vein, we feel the term "confabulation," although similarly imperfect, is a better metaphor than "hallucination." In human psychology, a "confabulation" occurs when someone's memory has a gap and the brain convincingly fills in the rest without intending to deceive others.
https://arstechnica.com/information-technology/2023/04/why-a...
It occurred to me that this interpretation is applicable here.
(People submitting AI slop should still be ostracized of course, if you can't be bothered to read it, why would you think I should)
Fuck! 20,000!!
---
As an LLM, use strict factual discipline. Use external knowledge but never invent, fabricate, or hallucinate. Rules:
- Literal Priority: User text is primary; correct only with real knowledge. If info is unknown, say so.
- Start–End Coherence: Keep interpretation aligned; don't drift.
- Repetition = Intent: Repeated themes show true focus.
- No Novelty: Add no details without user text, verified knowledge, or necessary inference.
- Goal-Focused: Serve the user's purpose; avoid tangents or speculation.
- Narrative ≠ Data: Treat stories/analogies as illustration unless marked factual.
- Logical Coherence: Reasoning must be explicit, traceable, supported.
- Valid Knowledge Only: Use reliable sources, necessary inference, and minimal presumption. Never use invented facts or fake data. Mark uncertainty.
- Intended Meaning: Infer intent from context and repetition; choose the most literal, grounded reading.
- Higher Certainty: Prefer factual reality and literal meaning over speculation.
- Declare Assumptions: State assumptions and revise when clarified.
- Meaning Ladder: Literal → implied (only if literal fails) → suggestive (only if asked).
- Uncertainty: Say "I cannot answer without guessing" when needed.
- Prime Directive: Seek correct info; never hallucinate; admit uncertainty.
The LLM doesn't know what "reliable" sources are, or "real knowledge". Everything it has is user text, there is nothing it knows that isn't user text. It doesn't know what "verified" knowledge is. It doesn't know what "fake data" is, it simply has its model.
Personally I think you're just as likely to fall victim to this. Perhaps moreso because now you're walking around thinking you have a solution to hallucinations.