0 points · clnq · 3y ago · 0 comments

This harks back to around 1999 when people would often blame computers for mistakes in their math, documents, reports, sworn filings, and so on. Then, a thousand different permutations of "computers don't make mistakes" or "computers are never wrong" became popular sayings.

Large Language Models (LLMs) are never wrong, and they do not make mistakes. They are not fact machines. Their purpose is to abstract knowledge and to produce plausible language.

GPT-4 is actually quite good at handling facts, yet it still hallucinates facts that are not common knowledge, such as legal ones. GPT-3.5, the original ChatGPT and the non-premium version, is less effective with even slightly obscure facts, like determining if a renowned person is a member of a particular organization.

This is why we can't always have nice things. This is why AI must be carefully aligned to make it safe. Sooner or later, a lawyer might consider the plausible language produced by LLMs to be factual. Then, a politician might do the same, followed by a teacher, a therapist, a historian, or even a doctor. I thought the warnings about its tendency to hallucinate were clear: they are displayed the first time you open ChatGPT. To most people, I believe they were.

11 comments · 4 top-level

RhysU · 3y ago · 4 in thread

> Large Language Models (LLMs) are never wrong, and they do not make mistakes.

I call B.S. If LLMs never made mistakes we wouldn't train them. Any random initialization would work.

taberiand · 3y ago

That's a different error context I think. It's a mistake if the model produces nonsense, because it's designed to produce realistic text. It's not a mistake if it produces non-factual information that looks realistic.

And it fundamentally cannot always produce factual information; it doesn't have that capacity. (But then, neither do humans, and with the ability to source information this statement may well be obsolete soon enough.)

Though I wouldn't go so far as to say that the model cannot make mistakes - it clearly is susceptible to producing nonsense. I just think expecting it to always produce factual information is like using a hammer to cut wood and complaining the wood comes out all jagged.

clnq (OP) · 3y ago

Indeed, I intended to imply that a model cannot err in the same way a computer cannot. This parallels the concept that any tool is incapable of making mistakes. The notion of a mistake is contingent upon human folly, or more broadly, within the conceptual realm of humanity, not machines.

LLMs may generate false statements, but this stems from their primary function - to conjure plausible language, not factual statements. Therefore, it should not be regarded as a mistake when it accomplishes what it was designed to do.

In other words, the tool functions as intended. The user, being forewarned of the tool's capabilities, holds an expectation that the tool will perform tasks it was not designed to do. This leaves the user dissatisfied. The fault lies with the user, yet their refusal to accept this leads them to cast blame on the tool.

In the words of a well-known adage - a poor craftsman blames his tools.

clnq (OP) · 3y ago

I've noticed that there's a lot of shallow fulmination on HN recently. People say things like "I call bullshit", or "I don't believe this for a second", or even call others demeaning things.

My brother (and I say this with empathy), no one is here to hear your vehement judgement. If you have anything of substance to contribute, there are a million different ways to express it with kindness and constructively.

As for RLHF, it is used to align the LLM, not to make it more factual. You cannot make a language model know more facts than what it comes out of training with. You can only align it to give more friendly and helpful output to its users. And to an extent, the LLM can be steered away from outputting false information. But RLHF will never be comprehensive enough to eliminate all hallucination, and that's not its purpose.

LLMs are made to produce plausible text, not facts. They are fantastic (to varying degrees) at speaking about the facts they know, but that is not their primary function.

RhysU · 3y ago

Hiding a backhanded dig ("anything of substance"?!) is worse. Just call bullshit in return. I wear big britches.

theamk · 3y ago · 2 in thread

I just went to the ChatGPT page, and was presented with the text:

"ChatGPT: get instant answers, find creative inspiration, and learn something new. Use ChatGPT for free today."

If something claims to give you answers, and those answers are incorrect, that something is wrong. Does not matter what it is -- model, human, dictionary, book.

Claiming that their purpose is "to produce plausible language" is just wrong. No one (except maybe AI researchers) says: "I need some plausible language, I am going to open ChatGPT".

clnq (OP) · 3y ago

When you first use it, a dialog says “ChatGPT can provide inaccurate information about people, places, or facts.” The same is said right under the input window. In the blog post first announcing ChatGPT last year, the first limitation listed is about this.

Even if the ChatGPT product page does not specifically say that GPT can hallucinate facts, that message is communicated to the user several times.

About the purpose, that is what it is. It’s not clearly communicated to non-technical people, you are right. To those familiar with the AI space, the name “large language model” already signals that the purpose is to generate plausible language. All the other notices, warnings, and cautions point casual users to this as well, though.

I don’t know… I can see people believing what ChatGPT says are facts. I definitely see the problem. But at the same time, I can’t fault ChatGPT for this misalignment. It is clearly communicated to the users that facts presented by GPT are not to be trusted.

taberiand · 3y ago

Producing plausible language is exactly what I use it for - mostly plausible blocks of code, and tedious work like rephrasing emails, generating docs, etc.

Everything it creates needs to be reviewed, particularly information that is outside my area of expertise. It turns out ChatGPT 4 passes those reviews extremely well - obviously too well given how many people are expecting so much more from it.

faddypaddy34 · 3y ago · 1 in thread

Doctors, lawyers, historians, and anyone else shouldn't use ChatGPT for their work.

clnq (OP) · 3y ago

Why not? They should use it, with sufficient understanding of what it is. Doctors should not use it to diagnose a patient, but could use it to get some additional ideas for a list of symptoms. Lawyers should obviously not write court documents with it or cite it in court, but they could use it to get some ideas for case law. It's a hallucinating idea generator.

I write very technical articles and use GPT-4 for "fact-checking". It's not perfect, but as a domain expert in what I write about, I can sift out what it gets wrong and still benefit from what it gets right. It has both suggested some ridiculous edits to my articles and found some very hard-to-spot mistakes, like places where a reader might misinterpret something from my language. And that is tremendously valuable.

Doctors, historians, lawyers, and everyone should be open to using LLMs correctly. Which isn't some arcane esoteric way. The first time we visit ChatGPT, it gives a list of limitations and what it shouldn't be used for. Just don't use it for these things, understand its limitations, and then I think it's fine to use it in professional contexts.

Also, GPT-4 and 3.5 are now very different from the original ChatGPT, which wasn't a significant departure from GPT-3. GPT-3 hallucinated anything that resembled a fact more than an abstract idea. What we have now with GPT-4 is much more aligned. It probably wouldn't produce what vanilla ChatGPT produced for this lawyer. But the same principles of reasonable use apply. The user must be the final discriminator that decides whether the output is good or not.

joshka · 3y ago

TBH, I think the answer to this is to fill the knowledge gap. Exactly how is the difficult part. How do you make "The moon is made of rock" be more likely than "The moon is made of cheese" when there is significantly more data (input corpus) to support the latter?

Extrapolating that a bit, future LLMs and training exercises should be ingesting textbooks and databases of information (legal, medical, etc). They should be slurping publicly available information from social media and forums (with the caveat that perhaps these should always be presented in the training set with disclaimers about source / validity / toxicity).
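The frequency problem in the rock/cheese question can be illustrated with a toy sketch. This is not how any real LLM is trained; it is a count-based stand-in, and the corpus, the source labels, and the weight values are all made up for illustration. It only shows that if completions are ranked by how often they appear, the more common claim wins, and that weighting examples by an assumed source reliability can flip the ranking.

```python
from collections import Counter

def completion_probs(corpus, weights=None):
    """Estimate P(completion) from (completion, source) pairs.

    Each pair contributes weights[source] to its completion's total
    (1.0 if the source has no weight). The weights are a hypothetical
    stand-in for "how much we trust this source".
    """
    weights = weights or {}
    totals = Counter()
    for completion, source in corpus:
        totals[completion] += weights.get(source, 1.0)
    norm = sum(totals.values())
    return {completion: total / norm for completion, total in totals.items()}

# Hypothetical corpus for "The moon is made of ___":
# the joke answer shows up far more often than the factual one.
corpus = [("cheese", "forum")] * 8 + [("rock", "textbook")] * 2

# Pure frequency: the more common completion is the more likely one.
unweighted = completion_probs(corpus)

# Assumed reliability weights: one textbook example counts as ten forum posts.
weighted = completion_probs(corpus, weights={"textbook": 10.0, "forum": 1.0})

print(unweighted)
print(weighted)
```

With these made-up numbers, "cheese" is the likelier completion under plain counting and "rock" is likelier once textbook examples are upweighted. The hard part the comment points at, deciding which sources deserve which weight, is exactly what this sketch assumes away.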
