0 points · red75prime · 4mo ago · 0 comments

> there are significant limitations

Where can we read about those significant limitations?

4 comments · 1 top-level

grey-area · 4mo ago · 3 in thread

Well, here are some:

Confabulation/Hallucination - https://github.com/lechmazur/confabulations

Failure to read context - https://georggrab.net/content/opus46retrieval.html

Deleting tests to make them pass - https://www.linkedin.com/posts/jasongorman_and-after-it-did-...

Going rogue and deleting data - https://x.com/jasonlk/status/1946069562723897802

Agent security nightmares because they are not in fact intelligent assistants - https://x.com/theonejvo/status/2015401219746128322

Failure to read or generate structured data - https://support.google.com/gemini/thread/390981629/llm-ignor...

There are many, many examples, mostly caused by people thinking LLMs are intelligent and can reason, and then giving them too much power (e.g. treating them as agents, not text generators). I'm sure they're all fixed in whatever new version came out this week though.

red75prime (OP) · 4mo ago

Your sarcasm is misplaced. Without principled limitations that establish a lower bound on the error rate and show that errors are correlated across invocations and models (so that adding more layers of supervision can't drive the error rate down), you can't exclude the possibility that "they're all fixed in the new version" (for practical purposes).
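
To make the point about correlation concrete, here is a minimal sketch of the arithmetic, not a model of any real system. It assumes a toy "common-mode" setup that is not from the thread: `n` checkers vote, each has the same marginal error rate `p`, and with probability `c` a shared failure makes all of them wrong at once. The function names and the parameter `c` are hypothetical.

```python
from math import comb


def majority_error(n: int, p: float) -> float:
    """Probability that a strict majority of n independent checkers err.

    Each checker errs with probability p; n is assumed to be odd.
    """
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


def majority_error_common_mode(n: int, p: float, c: float) -> float:
    """Majority error when a shared failure hits every checker at once.

    Assumed toy model: with probability c all checkers err together;
    otherwise they err independently with rate q, chosen so that each
    checker's marginal error rate is still p (p = c + (1 - c) * q).
    """
    if not 0.0 <= c <= p < 1.0:
        raise ValueError("need 0 <= c <= p < 1")
    q = (p - c) / (1 - c)
    return c + (1 - c) * majority_error(n, q)


if __name__ == "__main__":
    p = 0.10
    for n in (1, 3, 9, 27):
        independent = majority_error(n, p)
        correlated = majority_error_common_mode(n, p, c=0.05)
        print(f"n={n:2d}  independent={independent:.6f}  correlated={correlated:.6f}")
```

Under these assumptions the independent case shrinks toward zero as `n` grows, while the correlated case never drops below `c`. That floor is the kind of lower bound the comment is asking for: without evidence of one, stacking more reviewers or more model calls could in principle keep pushing the error rate down.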

joquarky · 4mo ago

I've seen all of these from human teammates in my 30+ years in tech.

grey-area · 4mo ago

Sure, but now everyone can do them all the time at 10x speed!
