As a reviewer, if I see the authors lie in this way why should I trust anything else in the paper? The only ethical move is to reject immediately.
I acknowledge that mistakes and so on are common, but this is a different league of bad behaviour.
In many fields it's gross professional misconduct only in theory. This sort of thing is very common and there's never any consequence. LLM-generated citations specifically are a new problem but citations of documents that don't support the claim, contradict it, have nothing to do with it or were retracted years ago have been an issue for a long time.
Gwern wrote about this here:
"A major source of [false claim] transmission is the frequency with which researchers do not read the papers they cite: because they do not read them, they repeat misstatements or add their own errors, further transforming the leprechaun and adding another link in the chain to anyone seeking the original source. This can be quantified by checking statements against the original paper, and examining the spread of typos in citations: someone reading the original will fix a typo in the usual citation, or is unlikely to make the same typo, and so will not repeat it. Both methods indicate high rates of non-reading"
I first noticed this during COVID and did some blogging about it. In public health it is quite common to do things like present a number with a citation, and then the paper doesn't contain that number anywhere in it, or it does but the number was an arbitrary assumption pulled out of thin air rather than the empirical fact it was being presented as.
It was also very common for papers to open by saying something like, "Epidemiological models are a powerful tool for predicting the spread of disease" with eight different citations, and every single citation would be an unvalidated model - zero evidence that any of the cited models were actually good at prediction.
Bad citations are hardly the worst problem with these fields, but when you see how widespread it is and that nobody within the institutions cares it does lead to the reaction you're having where you just throw your hands up and declare whole fields to be writeoffs.
However, I think hallucinated citations pose a bigger problem, because they're fundamentally a lie by commission instead of omission, misinterpretation or misrepresentation of facts.
At the same time, it may be an accidental lie, insofar as authors mistakenly used LLMs as search engines, just to support a claim that's commonly known, or that they remember well but can't find the origin of.
So, unless we reduce the pressure on publication speed, and increase the pressure for quality, we'll need to introduce more robust quality checks into peer review.
i clicked on 4 of those papers, and the pattern i saw was middle-eastern, indian, and chinese names
these are cultures where they think this kind of behavior is actually acceptable, they would assume it's the fault of the journal for accepting the paper. they don't see the loss of reputation to be a personal scar because they instead attribute blame to the game.
some people would say it's racist to understand this, but in my opinion when i was working with people from these cultures there was just no other way to learn to cooperate with them than to understand them, it's an incredibly confusing experience to be working with them until you understand the various differences between your own culture and theirs
AFAIK the submissions are still blinded and we don't know who the authors are. We will, surely, soon -- since ICLR maintains all submissions in public record for posterity, even if "withdrawn". They are unblinded after the review period finishes.
>Anonymous authors
>Paper under double-blind review
I have a relative who lived in a country in the East for several years, and he says that this is just factually true.
The vast majority of people who disagree with this statement have never actually lived in these cultures. They just hallucinate that they have because they want that statement to be false so badly.
...but, simultaneously, I'm also not seeing where you see the authors of the papers - I only see hallucitation authors. e.g. at the link for the first paper submission (https://openreview.net/forum?id=WPgaGP4sVS), there doesn't appear to be any authors listed. Are you confusing the hallucinated citation authors with the primary paper authors?
In that case, I would expect Eastern authors to be over-represented, because they just publish a lot more.
Besides, I would think most people are using reference managers like Zotero and co., which will pull metadata through DOIs or the like.
The errors look a lot more like what happens when you ask an LLM for some sources on xyz.
The time it takes to find these errors is orders of magnitude higher than checking if a citation exists as you need to both read and understand the source material.
These bad actors should be subject to a three strikes rule: the steady corrosion of knowledge is not an accident by these individuals.
On their site (https://gptzero.me/sources) it also says "GPTZero's Hallucination Detector automatically detects hallucinated sources and poorly supported claims in essays. Verify academic integrity with the most accurate hallucination detection tool for educators", so it does more than just identify invalid citations. Seems to do exactly what you're talking about.
These people are working in labs funded by Exxon or Meta or Pfizer or whoever and they know what results will make continued funding worthwhile in the eyes of their donors. If the lab doesn't produce the donor will fund another one that will.
I think that's because a lot of bad citations come from reviewer demands to add more of them during the journal publishing process, so they're not critical to the argument and end up being low-effort citations that get copy/pasted between papers. Or someone is just spamming citations to make a weak claim look strong. And all this happens because academia uses citations as a kind of currency (it's a planned non-market economy, so they have to allocate funds using proxy signals).
Commercial labs are less likely to care about the journal process to begin with, and are much less likely to publish weak claims because publishing is just a recruiting tool, not the actual end goal of the R&D department.
If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.
AI is not the problem, laziness and negligence is. There needs to be serious social consequences to this kind of thing, otherwise we are tacitly endorsing it.
The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document. They are also assuming good faith.
This reminds me of the discourse about the gun problem in the US, "guns don't kill people, people kill people", etc. - it is a discourse used solely for the purpose of not doing anything and not addressing the underlying problem.
So no, you're wrong - AI IS THE PROBLEM.
Solely? Oh brother.
In reality it’s the complete opposite. It exists to highlight the actual source of the problem, as both industries/practitioners using AI professionally and safely, and communities with very high rates of gun ownership and exceptionally low rates of gun violence exist.
It isn’t the tools. It’s the social circumstances of the people with access to the tools. That’s the point. The tools are inanimate. You can use them well or use them badly. The existence of the tools does not make humans act badly.
> Worryingly, each of these submissions has already been reviewed by 3-5 peer experts, most of whom missed the fake citation(s). This failure suggests that some of these papers might have been accepted by ICLR without any intervention. Some had average ratings of 8/10, meaning they would almost certainly have been published.
If the peer reviewers can't be bothered to do the basics, then there is literally no point to peer review, which is fully independent of the author who uses or doesn't use AI tools.
Also similar to what Temu, Wish, and other similar sites offer. Picture and specs might look good but it will likely be disappointing in the end.
That said, these tools have substantially reduced hallucinations over the last year, and will just get better. It also helps if you can restrict it to reference already screened papers.
Finally, I'd like to say that if we want scientists to engage in good science, we should stop forcing them to spend a third of their time in a rat race for funding... it is ridiculously time-consuming and wasteful of expertise.
It's both. The tool is crappy, and the carpenter is crappy for blindly trusting it.
> AI is not the problem, laziness and negligence is.
Similarly, both are a problem here. LLMs are a bad tool, and we should hold people responsible when they blindly trust this bad tool and get bad results.
There's a corollary here with LLMs, but I'm not pithy enough to phrase it well. Anyone can use an LLM to create something whose hallucinations they, themselves, aren't skilled enough to spot. Or something.
LLMs are incredibly good at exploiting peoples' confirmation biases. If it "thinks" it knows what you believe/want, it will tell you what you believe/want. There does not exist a way to interface with LLMs that will not ultimately end in the LLM telling you exactly what you want to hear. Using an LLM in your process necessarily results in being told that you're right, even when you're wrong. Using an LLM necessarily results in it reinforcing all of your prior beliefs, regardless of whether those prior beliefs are correct. To an LLM, all hypotheses are true, it's just a matter of hallucinating enough evidence to satisfy the users' skepticism.
I do not believe there exists a way to safely use LLMs in scientific processes. Period. If my belief is true, and ChatGPT has told me it's true, then yes, AI, the tool, is the problem, not the human using the tool.
What about giving the LLM a narrowly scoped role as a hostile reviewer, while your job is to strengthen the write-up to address any valid objections it raises, plus any hallucinations or confusions it introduces? That’s similar to fuzz testing software to see what breaks or where the reasoning crashes.
Used this way, the model isn’t a source of truth or a decision-maker. It’s a stress test for your argument and your clarity. Obviously it shouldn’t be the only check you do, but it can still be a useful tool in the broader validation process.
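A minimal sketch of that workflow, assuming the `openai` Python package; the model name, the prompt wording, and the `hostile_review` helper are all illustrative assumptions, not anyone's actual setup:

```python
# Hypothetical sketch: ask a model to act as a hostile reviewer of a draft.
# Model name and prompt wording are illustrative placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def hostile_review(draft_text: str) -> str:
    """Return a list of objections to the draft; the author must judge each one."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a hostile peer reviewer. List the weakest claims, "
                    "missing controls, and unsupported citations in the text. "
                    "Do not suggest fixes; only raise objections."
                ),
            },
            {"role": "user", "content": draft_text},
        ],
    )
    return response.choices[0].message.content
```

The point of structuring it this way is that the output is only ever a list of objections to check against real sources, never text that ends up in the paper.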
Quite the opposite actually.
It's not like these are new issues. They're the same ones we've experienced since the introduction of these tools. And yet the focus has always been to throw more data and compute at the problem, and optimize for fancy benchmarks, instead of addressing these fundamental problems. Worse still, whenever they're brought up users are blamed for "holding it wrong", or for misunderstanding how the tools work. I don't care. An "artificial intelligence" shouldn't be plagued by these issues.
As much as I agree with you that this is wrong, there is a danger in putting the onus just on the human. Whether due to competition or top down expectations, humans are and will be pressured to use AI tools alongside their work and produce more. Whereas the original idea was for AI to assist the human, as the expected velocity and consumption pressure increases humans are more and more turning into a mere accountability laundering scheme for machine output. When we blame just the human, we are doing exactly what this scheme wants us to do.
Therefore we must also criticize all the systemic factors that put pressure on reversing AI's assistance into AI's domination of human activity.
So AI (not as a technology, but as a product shoved down people's throats) is the problem.
If management fires you because they are wrong about how good AI is, and you're right - at the end of the day, you're fired and the manager is in lalaland.
People need to actually push the correct calibration of what these tools should be trusted to do, while also trying to work with what they have.
Unfortunately, a large fraction of academic fraud has historically been detected by sloppy data duplication, and with LLMs and similar image generation tools, data fabrication has never been easier to do or harder to detect.
And in the case of AI, either review its output, or simply don't use it. No one has a gun to your head forcing you to use this product (and poorly at that).
It's quite telling that, even in this basic hypothetical, your first instinct is to gesture vaguely in the direction of governmental action, rather than expect any agency at the level of the individual.
"It's not a car infrastructure problem, it's a people problem."
"It's not a food safety problem, it's a people problem."
"It's not a lead paint problem, it's a people problem."
"It's not an asbestos problem, it's a people problem."
"It's not a smoking problem, it's a people problem."
If an engineer provided this line of excuse to me, I wouldn't let them anywhere near a product again - a complete abdication of personal and professional responsibility.
If a scientist does it now, they just blame it on AI. But the consequences should remain the same. This is not an honest mistake.
People that do this - even once - should be banned for life. They put their name on the thing. But just like with plagiarism, falsifying data and academic cheating, somehow a large subset of people thinks it's okay to cheat and lie, and another subset gives them chance after chance to misbehave like they're some kind of children. But these are adults and anyone doing this simply lacks morals and will never improve.
And yes, I've published in academia and I've never cheated or plagiarized in my life. That should not be a drawback.
Taking an academic who does something like that seriously seems impossible. At best he is someone who is neglecting his most basic duties as an academic, at worst he is just a fraudster. In both cases he should be shunned and excluded.
I mean sure, but having a tool that made fabrication so much easier has made the problem a lot worse, don't you think?
Tiered licensing, mandatory safety training, and weapon classification by law enforcement works really well for Canada’s gun regime, for example.
Modern science is designed from the top to the bottom to produce bad results. The incentives are all mucked up. It's absolutely not surprising that AI is quickly becoming yet-another factor lowering quality.
When Tesla says their car is self driving, people trust them to self drive. Yes, you can blame the user for believing, but that's exactly what they were promised.
> Why didn't the lawyer who used ChatGPT to draft legal briefs verify the case citations before presenting them to a judge? Why are developers raising issues on projects like cURL using LLMs, but not verifying the generated code before pushing a Pull Request? Why are students using AI to write their essays, yet submitting the result without a single read-through? They are all using LLMs as their time-saving strategy. [0]
It's not laziness, it's the feature we were promised. We can't keep saying everyone is holding it wrong.
Assuming that cure is meant as hyperbole, how about https://www.biorxiv.org/content/10.1101/2025.04.14.648850v3 ? AI models being used for bad purposes doesn't preclude them being used for good purposes.
Its sloppy work all the way down...
We are, in fact, not tacitly but openly endorsing this, due to this AI-everywhere madness. I am so looking forward to when some genius at some bank starts using it to simplify code and suddenly I have €100,000,000 in my bank account. :)
Really? Regardless of whether it's a good paper?
Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?
That said, I am also very curious about the results their tool would give for papers from the 2010s and before.
When I was in grad school, I kept a fairly large .bib file that almost certainly had a mistake or two in it. I don’t think any of them ever made it to print, but it’s hard to be 100% sure.
For most journals, they actually partially check your citations as part of the final editing. The citation record is important for journals, and linking with DOIs is fairly common.
Not just some hallucinated citations, and not just the writing. In many cases the actual purported research "ideas" seem to be plausible nonsense.
To get a feel for it, you can take some of the topics they write about and ask your favorite LLM to generate a paper. Maybe even throw "Deep Research" mode at it. Perhaps tell it to put it in ICLR latex format. It will look a lot like these.
Exactly as you said, do precisely this to pre-LLM works. There will be an enormous number of errors with utter certainty.
People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.
A pre LLM paper with fabricated citations would demonstrate will to cheat by the author.
A post LLM paper with fabricated citations: same thing and if the authors attempt to defend themselves with something like, we trusted the AI, they are sloppy, probably cheaters and not very good at it.
Humans can do all of the above but it costs them more, and they do it more slowly. LLMs generate spam at a much faster rate.
> You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart.
https://www.rxjourney.net/how-artificial-intelligence-ai-is-...
Also a frequent proponent of UFO claims about approaching meteors.
"Show bad examples then hit you on the wrist for following my behavior" is like bad parenting.
> Given that we've only scanned 300 out of 20,000 submissions, we estimate that we will find 100s of hallucinated papers in the coming days.
Can't quote exact numbers but when I was on the conference committee for a maybe high four figures attendance conference, we certainly had many thousands of submissions.
https://www.theguardian.com/technology/2025/dec/06/ai-resear...
Run of the mill ML jobs these days ask for "papers in NeurIPS ICLR or other Tier-1 conferences".
We're well past Goodhart's law when it comes to publications.
It was already insane in CS - now it's reached asylum levels.
Academia has been ripe for disruption for a while now.
The "Rooter" paper came out 20 years ago:
https://www.csail.mit.edu/news/how-fake-paper-generator-tric...
Creating a real citation is totally doable by a machine though, it is just selecting relevant text, looking up the title, authors, pages etc and putting that in canonical form. It’s just that LLMs are not currently doing the work we ask for, but instead something similar in form that may be good enough.
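As a rough illustration of that deterministic lookup, here is what it might look like against Crossref's public REST API (the endpoint and field names are real, but which fields a given record actually contains varies, so the formatting step is a sketch, not a finished tool):

```python
# Illustrative sketch: resolve a rough citation/title to canonical metadata via
# Crossref, then format it deterministically (no text generation involved).
import requests

def canonical_citation(rough_title: str) -> str:
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": rough_title, "rows": 1},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"]["items"]
    if not items:
        raise LookupError(f"No indexed work matches {rough_title!r}")
    work = items[0]
    authors = ", ".join(
        f"{a.get('family', '')}, {a.get('given', '')}" for a in work.get("author", [])
    )
    year = work.get("issued", {}).get("date-parts", [[None]])[0][0]
    title = work["title"][0] if work.get("title") else rough_title
    venue = work["container-title"][0] if work.get("container-title") else ""
    doi = work.get("DOI", "")
    return f"{authors} ({year}). {title}. {venue}. https://doi.org/{doi}"
```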
Which incentives can be set to discourage the negligence?
How about bounties? A bounty fund set up by the publisher, and each submission must come with a contribution to the fund. Then there would be bounties for gross negligence that could attract bounty hunters.
How about a wall of shame? Once negligence crosses a certain threshold, the name of the researcher and the paper would be put on a wall of shame for everyone to search and see?
There must be price to pay for wasting other people's time (lives?).
Writing academic papers is exactly the _wrong_ usage for LLMs. So here we have a clear cut case for their usage and a clear cut case for their avoidance.
Peer review doesn't catch errors.
Acting as if it does, and thus assuming the fact of publication (and where it was published) are indicators of veracity is simply unfounded. We need to go back to the food fight system where everyone publishes whatever they want, their colleagues and other adversaries try their best to shred them, and the winners are the ones that stand up to the maelstrom. It's messy, but it forces critics to put forth their arguments rather than quietly gatekeeping, passing what they approve of, suppressing what they don't.
Passed peer review is the first basic bar that has to be cleared. It was never supposed to be all there is to the science.
I'm not sure why you think this isn't the case?
I should have said "Peer review doesn't catch _all_ errors" or perhaps "Peer review doesn't eliminate errors".
In other words, being "peer reviewed" is nowhere close to "error free," and if (as is often the case) the rate of errors is significantly greater than the rate at which errors are caught, peer review may not even significantly improve the quality.
It's much more useful if everyone including the janitor and their mom can have a say on your code before you're allowed to move to your next commit.
(/s, in case it's not obvious :D )
The dominant "failing" here is that this is fraudulent on a professional, intellectual, and moral level.
Most of those I spot-checked do not give an impression of high quality. Not just AI writing assistance: many seem to have AI-generated "ideas", often plausible nonsense. The reviewers often catch the errors and sometimes even the fake citations.
Can I prove malfeasance beyond a reasonable doubt? No. But I personally feel quite confident many of the papers I checked are primarily AI-generated.
I feel really bad for any authors who submitted legitimate work but made an innocent mistake in their .bib and ended up on the same list as the rest of this stuff.
This isn't comforting at all.
How are the authors even submitting citations? Surely they could be required to send a .bib or similar file? It’s so easy to then quality control at least to verify that citations exist by looking up DOIs or similar.
I know it wouldn’t solve the human problem of relying on LLMs but I’m shocked we don’t even have this level of scrutiny.
Presumably the citation scanner they're using is relying on similar data sources as Zotero in any case to detect these sorts of issues.
Regardless, my comment still stands, it seems like the submission is relying on the actual text of the bibliography being correct, rather than requiring a machine readable citation metadata file of some sort, which would at least allow much of the quality control checks to be automated (and certainly would preclude complete hallucinations of nonexistent papers getting through).
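For what it's worth, a minimal sketch of that kind of automated check, assuming a submitted .bib file, the `bibtexparser` package, and Crossref's DOI lookup (entries without a DOI would still need a human or a fuzzier search):

```python
# Rough sketch of the quality-control pass described above: parse a submitted
# .bib file and confirm that every DOI it contains actually resolves.
import bibtexparser
import requests

def check_bib_dois(path: str) -> list[str]:
    with open(path) as f:
        database = bibtexparser.load(f)
    suspicious = []
    for entry in database.entries:
        doi = entry.get("doi")
        if not doi:
            continue  # no DOI: existence can't be checked this way
        r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if r.status_code == 404:
            suspicious.append(f"{entry.get('ID', '?')}: DOI not found ({doi})")
    return suspicious
```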
Such errors arise from the uncritical adoption of automated tools and a failure to verify outputs manually. This reflects academic laxity and an excessive trust in LLMs. As AI tools become ubiquitous in research—even for generating encyclopedic content—some individuals have developed a misplaced confidence in GPT, leading them to undervalue the importance of citation accuracy.
However, while this is undeniably negligent, it does not validate wholesale dismissal of the paper’s scientific merit, nor does it warrant ad hominem attacks. Demanding the end of an academic career for citation errors is a draconian measure akin to a witch hunt.
Really, this isn’t that hard and it’s not at all an obscure requirement or unknown factor.
I think this is much much less “LLMs dumbing things down” and significantly more just a shibboleth for identifying people that were already nearly or actually doing fraudulent research anyway. The ones who we should now go back and look at prior publications as very likely fraudulent as well.
The idea is simple:
• Bad citations aren't the root cause.
• They are a late-stage symptom of a broken reasoning trajectory.
• If you detect the break early, the hallucinated citation never appears.
The tools I've built (and documented so anyone can use) do three things:
1. Measure interrogative structure — they check whether the questions driving the paper's logic are well-formed and deterministic.
2. Track entropy drift in the argument itself — not the text output, but the structure of the reasoning.
3. Surface the exact step where the argument becomes inconsistent — which is usually before the fake citation shows up.
These instruments don’t replace peer review, and they don’t make judgments about culture or intent. They just expose structural instability in real time — the same instability that produces fabricated references.
If anyone here wants to experiment or adapt the approach, everything is published openly with instructions. It’s not a commercial project — just an attempt to stabilize reasoning in environments where speed and tool-use are outrunning verification.
Code and instrument details are in my CubeGeometryTest repo (the implementation behind ‘A Geometric Instrument for Measuring Interrogative Entropy in Language Systems’). https://github.com/btisler-DS/CubeGeometryTest This is still a developing process.
Most of the names in these wrong attributions are actual people though, not hallucinations. What is going on? Is this a case of AI-powered citation management creating some weird feedback loop?
[1] https://app.gptzero.me/documents/54c8aa45-c97d-48fc-b9d0-d49...
[2] https://arxiv.org/pdf/2311.12022
Maybe there just is no incentive for this type of activity.
I realize things are probably (much) more complicated than I realize, but programmatically, unlike arbitrary text, citations are generally strings with a well-defined format. There are literally "specs" for citation formats in various academic, legal, and scientific fields.
So, naively, one way to mitigate these hallucinations would be identify citations with a bunch of regexes, and if one is spotted, use the Google Scholar API (or whatever) to make sure it's real. If not, delete it or flag it, etc.
Why isn't something like this obvious solution being done? My guess is that it would slow things down too much. But it could be optional and it could also be done after the output is generated by another process.
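Something like the following hedged sketch, for instance. Google Scholar has no official public API, so this uses Crossref as a stand-in, and the regex only covers one numbered-reference style; a real checker would also compare the returned title against the cited one, since the search is fuzzy:

```python
# Very rough sketch of the post-hoc pass described above: pull out lines that
# look like numbered references and flag any that a bibliographic search
# can't find at all. The regex and the Crossref stand-in are illustrative.
import re
import requests

REFERENCE_PATTERN = re.compile(r"^\[\d+\]\s+(?P<body>.+)$", re.MULTILINE)

def flag_unverifiable_references(text: str) -> list[str]:
    flagged = []
    for match in REFERENCE_PATTERN.finditer(text):
        body = match.group("body")
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": body, "rows": 1},
            timeout=10,
        )
        items = resp.json().get("message", {}).get("items", [])
        if not items:
            flagged.append(body)  # nothing indexed even loosely matches
    return flagged
```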
There are some mitigations that are used such as RAG or tool usage (e.g. a browser), but they don't completely fix the underlying issue.
> Papers that make extensive usage of LLMs and do not disclose this usage will be desk rejected.
This sounds like they're endorsing the game of how much can we get away with, towards the goal of slipping it past the reviewers, and the only penalty is that the bad paper isn't accepted.
How about "Papers suspected of fabrications, plagiarism, ghost writers, or other academic dishonesty, will be reported to academic and professional organizations, as well as the affiliated institutions and sponsors named on the paper"?
LLMs should be awesome at finding plausible sounding titles. The crappy researcher just has to remember to check for existence. Perhaps there is a business model here, bogus references as a service, where this check is done automatically.
Headline should be "AI vendor’s AI-generated analysis claims AI generated reviews for AI-generated papers at AI conference".
h/t to Paul Cantrell https://hachyderm.io/@inthehands/115633840133507279
(I'm on mobile, haven't looked on desktop.)
Not only is that incredibly easy to verify (you could pay a first semester student without any training), it's also a worrying sign on what the paper's authors consider quality. Not even 5 minutes spent to get the citations right!
You have to wonder what's in these papers.
Do it more than once? Lose job.
End of story.
Well then you're being rather silly, because that is a silly conclusion to draw (and one not supported by the evidence).
A fairer conclusion would be that I meant what is obvious: if you use AI to generate a bibliography, you are being academically negligent.
If you disagree with that, I would say it is you that has the problem with academia, not me.
"Compression has been widely used in columnar databases and has had an increasing importance over time.[1][2][3][4][5][6]"
Ok, literally everyone in the field already knows this. Are citations 1-6 useful? Well, hopefully one of them is an actually useful survey paper, but odds are that 4-5 of them are arbitrarily chosen papers by you or your friends. Good for a little bit of h-index bumping!
So many citations are not an integral part of the paper, but instead randomly sprinkled on to give an air of authority and completeness that isn't deserved.
I actually have a lot of respect for the academic world, probably more than most HN posters, but this particular practice has always struck me as silly. Outside of survey papers (which are extremely under-provided), most papers need many fewer citations than they have, for the specific claims where the paper is relying on prior work or showing an advance over it.
Papers with a fake air of authority are easily dispatched. What is not so easily dispatched is the politics of the submission process.
This type of content is fundamentally about emotions (in the reviewer of your paper), and emotion is undeniably a large factor in acceptance / rejection.
And as the remedy starts being applied (aka "liability"), the enthusiasm for AI will start to wane.
I wouldn't be surprised if some businesses ban the use of AI --- starting with law firms.
And as the remedy starts being applied (aka "liability"), the enthusiasm for software will start to wane.
What if anything do you think is wrong with my analogy? I doubt most people here support strict liability for bugs in code.
Generally the law allows people to make mistakes, as long as a reasonable level of care is taken to avoid them (and also you can get away with carelessness if you don't owe any duty of care to the party). The law regarding what level of care is needed to verify genAI output is probably not very well defined, but it definitely isn't going to be strict liability.
The emotionally-driven hate for AI, in a tech-centric forum even, to the extent that so many commenters seem to be off-balance in their rational thinking, is kinda wild to me.
I think what is clearly wrong with your analogy is assuming that AI applies mostly to software and code production. This is actually a minor use-case for AI.
Government and businesses of all types ---doctors, lawyers, airlines, delivery companies, etc. are attempting to apply AI to uses and situations that can't be tested in advance the same way "vibe" code can. And some of the adverse results have already been ruled on in court.
> And as the remedy starts being applied (aka "liability"), the enthusiasm for sloppy and poorly tested software will start to wane.
Many of us use AI to write code these days, but the burden is still on us to design and run all the tests.
- Major AI conference flooded with peer reviews written by AI
https://news.ycombinator.com/item?id=46088236
- "All OpenReview Data Leaks"
https://news.ycombinator.com/item?id=46073488
- "The Day Anonymity Died: Inside the OpenReview / ICLR 2026 Leak"
https://news.ycombinator.com/item?id=46082370
- More about the leak
https://forum.cspaper.org/topic/191/iclr-i-can-locate-reviewer-how-an-api-bug-turned-blind-review-into-a-data-apocalypse
The second one went under the radar, but basically OpenReview left the API open so you didn't need credentials. This meant all reviewers and authors were deanonymized across multiple conferences. All these links are for ICLR too, which is the #2 ML conference for those that don't know.
And for some important context of the link for this post, note that they only sampled 300 papers and found 50. It looks to be almost exclusively citations but those are probably the easiest things to verify.
And this week CVPR sent out notifications that OpenReview will be down between Dec 6th and Dec 9th. No explanation for why.
So we have reviewers using LLMs, authors using LLMs, and idk the conference systems writing their software with LLMs? Things seem pretty fragile right now...
I think at least this article should highlight one of the problems we have in academia right now (beyond just ML, though it is more egregious there): citation mining. It is pretty standard to have over 50 citations in your 10-page paper these days. You can bet that most of these are not going to be for the critical claims but instead heavily placed in the background section. I looked at a few of the papers, and every one I looked at had its hallucinated citations in background (or background-in-appendix) sections. So these are "filler" citations, which I think illustrates a problem: citations are being abused. I mean, the metric hacking should be pretty obvious if you just look at how many citations ML people have. It's grown exponentially! Do we really need so many citations? I'm all for giving people credit, but a hyper-fixation on citation count as our measure of credit just doesn't work. It's far too simple a metric. Like we might as well measure how good of a coder you are by the number of lines of code you produce[0].
It really seems that academia doesn't scale very well...
Let's say that I use a formula, and give a reference to where the formula came from, but the reference doesn't exist. Would you trust the formula?
Let's say a computer program calls a subroutine with a certain name from a certain library, but the library doesn't exist.
A person doing good research doesn't need to check their references. Now, they could stand to check the references for typographic errors, but that's a stretch too. Almost every online service for retrieving articles includes a reference for each article that you can just copy and paste.
"The compiler thinks my variable isn't declared" "That function wants a null-terminated string" "Teach this code to use a cache"
Even the word computer once referred to a human.
There's nothing wrong with anthropomorphizing genAI; its source material is human-sourced, and humans are going to use human-like pattern matching when interacting with it. I.e., this isn't the river I want to swim upstream in. I assume you wouldn't complain if someone anthropomorphized a rock... up until they started to believe it was actually alive.
We need a word for this specific kind of error, and we have one, so we use it. Being less specific about a type of error isn't helping anyone. Whether it "anthropomorphizes", I couldn't care less. Heck, bugs come from actual insects. It's a word we've collectively started to use and it works.
One can use AI to help them write without going all the way to having it generate facts and citations.
>Confabulation was coined right here on Ars, by AI-beat columnist Benj Edwards, in Why ChatGPT and Bing Chat are so good at making things up (Apr 2023).
https://arstechnica.com/civis/threads/researchers-describe-h...
>Generative AI is so new that we need metaphors borrowed from existing ideas to explain these highly technical concepts to the broader public. In this vein, we feel the term "confabulation," although similarly imperfect, is a better metaphor than "hallucination." In human psychology, a "confabulation" occurs when someone's memory has a gap and the brain convincingly fills in the rest without intending to deceive others.
https://arstechnica.com/information-technology/2023/04/why-a...
It occurred to me that this interpretation is applicable here.
(People submitting AI slop should still be ostracized of course, if you can't be bothered to read it, why would you think I should)
Fuck! 20,000!!
---
As an LLM, use strict factual discipline. Use external knowledge but never invent, fabricate, or hallucinate. Rules:
- Literal Priority: User text is primary; correct only with real knowledge. If info is unknown, say so.
- Start–End Coherence: Keep interpretation aligned; don't drift.
- Repetition = Intent: Repeated themes show true focus.
- No Novelty: Add no details without user text, verified knowledge, or necessary inference.
- Goal-Focused: Serve the user's purpose; avoid tangents or speculation.
- Narrative ≠ Data: Treat stories/analogies as illustration unless marked factual.
- Logical Coherence: Reasoning must be explicit, traceable, supported.
- Valid Knowledge Only: Use reliable sources, necessary inference, and minimal presumption. Never use invented facts or fake data. Mark uncertainty.
- Intended Meaning: Infer intent from context and repetition; choose the most literal, grounded reading.
- Higher Certainty: Prefer factual reality and literal meaning over speculation.
- Declare Assumptions: State assumptions and revise when clarified.
- Meaning Ladder: Literal → implied (only if literal fails) → suggestive (only if asked).
- Uncertainty: Say "I cannot answer without guessing" when needed.
- Prime Directive: Seek correct info; never hallucinate; admit uncertainty.
The LLM doesn't know what "reliable" sources are, or "real knowledge". Everything it has is user text, there is nothing it knows that isn't user text. It doesn't know what "verified" knowledge is. It doesn't know what "fake data" is, it simply has its model.
Personally I think you're just as likely to fall victim to this. Perhaps moreso because now you're walking around thinking you have a solution to hallucinations.