Conversely, I sometimes present it with some existing code and ask it what it does. If it gets it wrong, that's a good sign my API is confusing, and how.
These are ways to harness what neural networks are best at: not providing accurate information but making shit up that is highly plausible, "hallucination". Creativity, not logic.
(The best thing about this is that I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code, which often takes longer than just writing the code the usual way.)
There are multiple ways that an interface can be bad, and being unintuitive is the only one that this will fix. It could also be inherently inefficient or unreliable, for example, or lack composability. The AI won't help with those. But it can make sure your API is guessable and understandable, and that's very valuable.
Unfortunately, this only works with APIs that aren't already super popular.
IMO this has always been the killer use case for AI—from Google Maps to Grammarly.
I discovered Grammarly at the very last phase of writing my book. I accepted maybe 1/3 of its suggestions, which is pretty damn good considering my book had already been edited by me dozens of times AND professionally copy-edited.
But if I'd have accepted all of Grammarly's changes, the book would have been much worse. Grammarly is great for sniffing out extra words and passive voice. But it doesn't get writing for humorous effect, context, deliberate repetition, etc.
The problem is executives want to completely remove humans from the loop, which almost universally leads to disastrous results.
Thanks for your words of wisdom, which touch on a very important other point I want to raise: often, we (i.e., developers, researchers) construct a technology that would be helpful and "net benign" if deployed as a tool for humans to use, instead of deploying it in order to replace humans. But then along comes a greedy business manager who recklessly reckons that using said technology not as a tool but in full-automation mode will make results 5% worse yet save 15% of staff costs; and they decide that that is a fantastic trade-off for the company - yet employees may lose and customers may lose.
The big problem is that developers/researchers lose control of what they develop, usually once the project is completed - if they ever had control in the first place. What can we do? Perhaps write open source licenses that are less liberal?
The problem is that people who are laid off often experience significant life disruption. And people who work in a field that is largely or entirely replaced by technology often experience permanent disruption.
However, there's no reason it has to be this way - the fact that people whose jobs are replaced by technology are completely screwed over is a result of the society we have all created together; it's not a rule of nature.
> But then along comes a greedy business manager who reckons recklessly
Thanks for this. :)
You make something, but because you don’t own it—others caused and directed the effort—you don’t control it. But the people who control things can’t make things.
Should only the people who can make things decide how they are used though? I think that’s also folly. What about the rest of society affected by those things?
It’s ultimately a societal decision-making problem: who has power, and why, and how does the use of power affect who has power (accountability).
if these developers/researchers are being paid by someone else, why should that same someone else be giving up the control that they paid for?
If these developers/researchers are paying the research themselves (e.g., a startup of their own founding), then why would they ever lose control, unless they sell it?
As the comment above said, we need a human in the loop for better results. But firstly, it also depends on which human.
A senior can be way more productive in the loop than a junior.
So everybody has just stopped hiring juniors, because juniors cost money and someone else can deal with the AI almost-slop later.
Now the current seniors will one day retire, but we won't have a new generation of seniors, because nobody is giving juniors a chance - or that's what I've heard about the job market being brutal.
Stock your underground bunkers with enough food and water for the rest of your life and work hard to persuade the AI that you're not a threat. If possible, upload your consciousness to a starwisp and accelerate it out of the Solar System as close to lightspeed as you can possibly get it.
Those measures might work. (Or they might be impossible, or insufficient.) Changing your license won't.
That's how you get economics of scale.
Google couldn't have a human in the loop to review every page of search results before handing them out in response to queries.
What benefit might human review have? Maybe they could make sure the SERP list entries actually have the keywords you're looking for. Even better, they could make sure the prices in the shopping section are correct! Maybe even make sure they relate to the product you actually searched for... I might actually pay money for that.
That’s like getting rid of all languages and accents and switching to a single language.
Examples:
* Active - concise, complete info: The manager approved the proposal.
* Passive - wordy, awkward: The proposal was approved by the manager.
* Passive - missing info: The proposal was approved. [by who?]
Most experienced writers will use active unless they have a specific reason not to, e.g., to emphasize another element of the sentence, as the third bullet's sentence emphasizes approval.
-
edited for clarity, detail
The problem is that many people have only a poor ability to recognize the passive voice in the first place. This results in the examples being clunky, wordy messes that are bad because they're, well, clunky and wordy, and not because they're passive--indeed, you've often got only a fifty-fifty chance of the example passive voice actually being passive in the first place.
I'll point out that the commenter you're replying to used the passive voice, as did the one they responded to, and I suspect that such uses went unnoticed. Hell, I just rewrote the previous sentence to use the passive voice, and I wonder how many people even recognized that in the first place, let alone thought it worse for being so written.
Language log has been writing about this for so long it's not even funny: https://languagelog.ldc.upenn.edu/nll/?cat=54
Although the best place to start is probably here: https://languagelog.ldc.upenn.edu/nll/?p=2922
- Active: The user presses the Enter key.
- Passive: The Enter key is to be pressed.
- Imperative (aka command): Press the Enter key.
The imperative mood is concise and doesn't dance around questions about who's doing what. The reader is expected to do it.
It also taught me to be more careful about checkpointing my work in git before letting an agent go wild on my codebase. It left a mess trying to fix its problems.
That's closer to simply observing the mean. For an analogy, it's like waiting to pave a path until people tread the grass in a specific pattern. (Some courtyard designers used to do just that. Wait to see where people were walking first.)
Making things easy for Chat GPT means making things close to ordinary, average, or mainstream. Not creative, but can still be valuable.
On the bright side, a lot of work is just finding the mean solution anyway.
I've found that LLMs can be kind of dumb about understanding things, and are particularly bad at reading between the lines for anything subtle. In this aspect, I find they make good proxies for inattentive anonymous reviewers, and so will try to revise my text until even the LLM can grasp the key points that I'm trying to make.
In both cases, you might get extra bonus usability if the reviewers or the API users actually give your output to the same LLM you used to improve the draft. Or maybe a more harshly quantized version of the same model, so it makes more mistakes.
Many many python image-processing libraries have an `imread()` function. I didn't know about this when designing our own bespoke image-lib at work, and went with an esoteric `image_get()` that I never bothered to refactor.
When I ask ChatGPT for help writing one-off scripts using the internal library I often forget to give it more context than just `import mylib` at the top, and it almost always defaults to `mylib.imread()`.
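If anyone hits the same thing, a minimal sketch of the cheap fix (the Pillow-based loader body and the `mylib` module are illustrative stand-ins, not our actual code): just expose the name LLMs keep guessing as an alias.

```python
# mylib.py -- hypothetical stand-in for the bespoke image library.
import numpy as np
from PIL import Image

def image_get(path):
    """Original, esoterically named loader."""
    return np.asarray(Image.open(path))

# Alias matching the imread() convention shared by imageio, OpenCV, scipy, etc.,
# so LLM-generated scripts that guess `mylib.imread(...)` just work.
imread = image_get
```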
(Unless, on the gripping hand, your image_get function is subtly different from Matlab's imread, for example by not returning an array, in which case a different name might be better.)
That's also how I'm approaching it. If all the condensed common wisdom poured into the model's parameters says that this is how my API is supposed to work to be intuitive, how on earth do I think it should work differently? There needs to be a good reason (like composability, for example). I break expectations otherwise.
“Sometimes” being a very important qualifier to that statement.
Claude 4 naturally doesn’t write code with any kind of long-term maintenance in mind, especially if it’s trying to make things look like what the less experienced developers wrote in the same repo.
Please don’t assume just because it looks smart that it is. That will bite you hard.
Even with well-intentioned rules, terrible things happen. It took me weeks to see some of it.
> I don't have to spend my time carefully tracking down the bugs GPT-4 has cunningly concealed in its code
If anyone is stuck in this situation, give me a holler. My Gmail username is the same as my HN username. I've always been the one to hunt down my coworkers' bugs, and I think I'm the only person on the planet who finds it enjoyable to track down ChatGPT's oversights and sometimes seemingly malicious intent. I'll charge you, don't get me wrong, but I'll save you time, money, and frustration. And future bug reports and security issues.
Having an LLM demo your tool, then taking what it does wrong or uses incorrectly and adjusting the API works very very well. Updating the docs to instruct the LLM on how to use your tool does not work well.
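A rough sketch of what that loop can look like, assuming the OpenAI Python client (the model name, prompt, and regex heuristic are placeholders): ask the model to write usage code given only `import mylib`, then diff the names it guesses against what actually exists.

```python
import re
import mylib  # placeholder: your actual library
from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{
        "role": "user",
        "content": "Write a short Python script that loads an image and "
                   "resizes it, using only `import mylib`.",
    }],
)
draft = resp.choices[0].message.content

# Which attributes did the model expect mylib to have, and which are missing?
guessed = set(re.findall(r"mylib\.(\w+)", draft))
missing = sorted(guessed - set(dir(mylib)))
print("API the model expected but we don't provide:", missing)
```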
This is also similar to which areas TD-Gammon excelled at in Backgammon.
Which is all pretty amusing, if you compare it to how people usually tended to characterise computers and AI, especially in fiction.
> Any person who has used a computer in the past ten years knows that doing meaningless tasks is just part of the experience. Millions of people create accounts, confirm emails, dismiss notifications, solve captchas, reject cookies, and accept terms and conditions—not because they particularly want to or even need to. They do it because that’s what the computer told them to do. Like it or not, we are already serving the machines. (...)
> You might’ve heard a story of Soundslice [adding a feature because ChatGPT kept telling people it exists](https://www.holovaty.com/writing/chatgpt-fake-feature/). We see the same at Instant: for example, we used `tx.update` for both inserting and updating entities, but LLMs kept writing `tx.create` instead. Guess what: we now have `tx.create`, too.
> Is it good or is it bad? It definitely feels strange. In a sense, it’s helpful: LLMs here have seen millions of other APIs and are suggesting the most obvious thing, something every developer would think of first, too.
> It’s also a unique testing device: if developers use your API wrong, they blame themselves, read the documentation, and fix their code. In the end, you might never learn that they even had the problem. But with ChatGPT, you yourself can experience “newbie’s POV” at any time.
Sometimes I can just say, "How do I use the <made-up name> API in Python to do <task>?" Unfortunately the safeguards against hallucinations in more recent models can make this more difficult, because it's more likely to tell me it's never heard of it. You can usually coax it into suspension of disbelief, but I think the results aren't as good.
AI is going to get the hang of coding to fill in the spaces (i.e. the part you’re doing) long before it’s able to intelligently design an API. Correct API design requires a lot of contextual information and forward planning for things that don’t exist today.
Right now it’s throwing spaghetti at the wall and you’re drawing around it.
Even if your API is for something that's never been done before, it can usually still take advantage of its training data to suggest a sensible shape once you describe the new nouns and verbs to it.
I agree that it's also not currently capable of judging those creative ideas, so I have to do that.
It's not creative at all, any more than taking the sum of text on a topic, and throwing a dart at it. It's a mild, short step beyond a weighted random, and certainly not capable of any real creativity.
Myriads of HN enthusiasts often chime in here with "Are humans any more creative?" and other blather. Well, that's whataboutism, and it doesn't detract from the fact that creativity does not exist in the AI sphere.
I agree that you have to judge its output.
Also, sorry for hanging my comment here. Might seem over the top, but anytime I see 'creative' and 'AI', I have all sorts of dark thoughts. Dark, brooding thoughts with a sense of deep foreboding.
Insanity driven development: altering your api to accept 7 levels of "broken and different" structures so as to bend to the will of the llms
If you automatically assume that what the LLM spits out is what the API ought to be then I agree that that’s bad engineering. But if you’re using it to brainstorm what an intuitive interface would look like, that seems pretty reasonable.
Of course when it suggests a bad interface you shouldn't implement it.
> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.
— https://www.threads.com/@jimdabell/post/DLek0rbSmEM
I guess it’s true for product features as well.
> Maybe hallucinations of vibe coders are just a suggestion those API calls should have existed in the first place.
> Hallucination-driven-development is in.
https://x.com/pwnies/status/1922759748014772488?s=46&t=bwJTI...
No, what actually happened is that OP developed a type of ChatGPT integration, and a shitty one at that. ChatGPT could have just directed the user to the site and told them to upload that image to OP's site. But it felt it needed to do something with the image, so it did.
There's no new value add here, at least not yet - maybe there would be if users started requesting changes to the sheet, I guess, but that's not what's going on.
This doesn’t seem likely. The utility is pretty obvious.
> chatgpt could have just directed the user to the site and told them to upload that image to OP's site.
What image? Did you think the first image shown is what is being entered into ChatGPT? It’s not. That’s what the site expects to be uploaded to them. There’s no indication that the ChatGPT users are scanning tabs. ChatGPT is producing ASCII tabs, but we aren’t shown what input it is in response to.
>> Hallucinations can sometimes serve the same role as TDD. If an LLM hallucinates a method that doesn’t exist, sometimes that’s because it makes sense to have a method like that and you should implement it.
A detailed counterargument to this position can be found here[0]. In short, what is colloquially described as "LLM hallucinations" do not serve any plausible role in software design other than to introduce an opportunity for software engineers to stop and think about the problem being solved.
See also Clarke's third law[1].
0 - https://addxorrol.blogspot.com/2025/07/a-non-anthropomorphiz...
I also don’t see the relevance of Clarke’s third law.
The users are different, the music that is notated is different, and for the most part if you are on one side, you don't feel the need to cross over. Multiple efforts have been made (MusicXML, etc.) to unify these two worlds into a superset of information. But the camps are still different.
So what ChatGPT did is actually very interesting. It hallucinated a world in which tab readers would want to use Soundslice. But, largely, my guess is they probably don't....today. In a future world, they might? Especially if Soundslice then enables additional features that make tab readers get more out of the result.
It’s not that they added a new feature because there was demand.
They added a new feature because technology hallucinated a feature that didn’t exist.
The savior of tech, generative AI, was telling folks a feature existed that didn’t exist.
That’s what the headline is, and in a sane world the folks that run ChatGPT would be falling over themselves to be sure it didn’t happen again, because next time it might not be so benign as it was this time.
This would be a world without generative AI available to the public, at the moment. Requiring perfection would either mean guardrails that would make it useless for most cases, or no LLM access until AGI exists, which are both completely irrational, since many people are finding practical value in its current imperfect state.
The current state of LLM is useful for what it's useful for, warnings of hallucinations are present on every official public interface, and its limitations are quickly understood with any real use.
Nearly everyone in AI research is working on this problem, directly or indirectly.
Really!?
If “don’t hallucinate” is too much to ask then ethics flew out the window long ago.
> If “don’t hallucinate” is too much to ask then ethics flew out the window long ago.
Those sentences aren't compatible.
> but hallucination is a major issue
Again, every official public AI interface has warnings/disclaimers for this issue. It's well known. It's not some secret. Every AI researcher is directly or indirectly working on this.
> is in the opposite direction of the “goal” of AGI
This isn't a logical statement, so it's difficult to respond to. Hallucination isn't a direction that's being headed towards, it's being actively, with intent and $$$, headed away from.
What?? What does AGI have to do with this? (If this was some kind of hyperbolic joke, sorry, i didn't get it.)
But, more importantly, the GP only said that in a sane world, the ChatGPT creators should be the ones trying to fix this mistake on ChatGPT. After all, it's obviously a mistake on ChatGPT's part, right?
That was the main point of the GP post. It was not about "requiring perfection" or something like that. So please let's not attack a straw man.
Their requirement is no hallucinations [1], also stated as "be sure it didn't happen again" in the original comment. If you define a hallucination as something that wasn't in the training data, directly or indirectly (indirectly being something like an "obvious" abstract concept), then you've placed a profound constraint on the system, requiring determinism. That requirement fundamentally, by the non-deterministic statistics that these run on, means you cannot use an LLM, as they exist today. They're not "truth" machines - use a database instead.
Saying "I don't know", with determinism is only slightly different than saying "I know" with determinism, since it requires being fully aware of what you do know, not at a fact level, but at a conceptual/abstract level. Once you have a system that fully reasons about concepts, is self aware of its own knowledge, and can find the fundamental "truth" to answer a question with determinism, you have something indistinguishable from AGI.
Of course, there's a terrible hell that lives between those two, in the form of: "Error: Question outside of known questions." I think a better alternative to this hell would be a breakthrough that allowed "confidence" to be quantified. So, accept that hallucinations will exist, but present uncertainty to the user.
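For what it's worth, you can already approximate that today by surfacing token log-probabilities. A sketch, assuming the OpenAI Python client (the threshold and the "mean token probability = confidence" heuristic are mine, and not a calibrated measure):

```python
import math
from openai import OpenAI

client = OpenAI()

def answer_with_confidence(question: str):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": question}],
        logprobs=True,
    )
    choice = resp.choices[0]
    # Average per-token probability as a rough, uncalibrated confidence proxy.
    probs = [math.exp(t.logprob) for t in choice.logprobs.content]
    return choice.message.content, sum(probs) / len(probs)

answer, confidence = answer_with_confidence("Does this API have an imread() function?")
if confidence < 0.8:
    answer += "\n\n(Low confidence -- please verify against the docs.)"
print(answer)
```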
Meanwhile, sensible people have concluded that, even though it isn’t perfect, Wikipedia is still very, very useful – despite the possibility of being misled occasionally.
There is a chasm of difference between being misled occasionally (Wikipedia) and frequently (LLMs). I don’t think you understand how much effort goes on behind the scenes at Wikipedia. No, not everyone can edit every Wikipedia page willy-nilly. Pages for major political figures often can only be edited with an account. IPs like those of iCloud Private Relay are banned and can’t anonymously edit the most basic of pages.
Furthermore, Wikipedia was always honest about what it is from the start. They managed expectations, underpromised and overdelivered. The bozos releasing LLMs talk about them as if they created the embryo of god, and giving money to their religion will solve all your problems.
I understand Wikipedia puts effort in, but it’s irrelevant. As a user, you can never be sure that what you are reading on Wikipedia is the truth. There are good reasons to assume that certain topics are more safe and certain topics are less safe, but there are no guarantees. The same is true of AI.
> Wikipedia was always honest about what it is from the start.
Every mainstream AI chatbot includes wording like “ChatGPT can make mistakes. Check important info.”
A technology can be extremely useful despite not being perfect. Failure cases can be taken into consideration rationally without turning it into a moral panic.
You have no ability to edit Wikipedia to stop it from lying. Somebody can come along and re-add the lie a millisecond later.
And in this case, OP didn't have to take ChatGPT's word for the existence of the pattern, it showed up on their (digital) doorstep in the form of people taking action based on ChatGPT's incorrect information.
So, pattern noticed and surfaced by an LLM as a hallucination, people take action on the "info", nonzero market demand validated, vendor adds feature.
Unless the phantom feature is very costly to implement, seems like the right response.
I would go on to say that this interaction between ‘holes’ exposed by LLM expectations _and_ demonstrated userbase interest _and_ expert input (by the devs’ decision to implement changes) is an ideal outcome that would not have occurred if each of the pieces were not in place to facilitate these interactions, and there’s probably something here to learn from and expand on in the age of LLMs altering user experiences.
Some people express concerns about AGI creating swarms of robots to conquer the earth and make humans do its bidding. I think market forces are a much more straightforward tool that AI systems will use to shape the world.
One of the most dangerous systems an AI can reach and exploit is a human being.
I know nothing about this. I imagine people are already working on it, wonder what they've figured out.
(Alternatively, in the future can I pay OpenAI to get ChatGPT to be more likely to recommend my product than my competitors?)
So winning AI SEO is not so different than regular SEO.
Would be pretty trivial to have a model which recognises product recommendations in the output and inserts branded equivalents - or nudges the output towards branded equivalents.
Clearly the users are already using ChatGPT for generating some guitar practice, as it is basically infinite free personalized lessons. For practicing, they do want to be able to hear it, to play along at variable speed, maybe create slight variations, etc.
Soundslice is a service that does exactly that. Except that before, people used sheet music as the source. I know way back when I had guitar aspirations, people exchanged binders of photocopied sheet music.
Now they could have asked ChatGPT to output an svg of the thing as sheet music (it does, I tested). Soundslice could have done this behind the scenes as a half hour quick and dirty fix while developing a better and more cost effective alternative.
Look, if at the turn of the century you were a blacksmith living off changing horseshoes, and you suddenly have people mistakenly showing up for a tire change on their car, are you going to blame the villagers that keep sending them your way, or open a tire change service? We know who came out on top.
Your solution is the equivalent of asking Google to completely delist you because one page you don't want ended up in Google's search results.
I've been wanting them to do this for questions like "what is your context length?" for ages - it frustrates me how badly ChatGPT handles questions about its own abilities; it feels like that would be worth them using some kind of special case or RAG mechanism to support.
Let alone that dynamically modifying the base system prompt would likely break their entire caching mechanism given that caching is based on longest prefix, and I can't imagine that the model's system prompt is somehow excluded from this.
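To make the prefix-caching point concrete, a toy sketch (the prompt contents and the "context window" fact are made up): anything per-request has to go after the static block, or every call misses the cache.

```python
# Prefix caches key on the longest identical run of leading tokens, so the
# static instructions must come first and any dynamic facts get appended.
STATIC_SYSTEM_PROMPT = "You are a helpful assistant.\n<thousands of tokens of fixed instructions>"

def build_messages(dynamic_facts: str, user_message: str) -> list[dict]:
    # Anti-pattern: f"{dynamic_facts}\n{STATIC_SYSTEM_PROMPT}" changes the very
    # first tokens, invalidating the cached prefix on every request.
    return [
        {"role": "system", "content": STATIC_SYSTEM_PROMPT + "\n" + dynamic_facts},
        {"role": "user", "content": user_message},
    ]

msgs = build_messages("This model's context window is 128k tokens.",
                      "What is your context length?")
```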
This seems similar, and like a decent indicator that most people (aka the average developer) would expect X to exist in your API.
I find it interesting that any user would attribute this issue to Soundslice. As a user, I would be annoyed that GPT is lying and wouldn't think twice about Soundslice looking bad in the process
That kind of thinking is how you never get new customers and eventually fail as a business.
OTOH it's free(?) advertising, as long as that first impression isn't too negative.
I also went back to just sleeping on those flights and using connected models for most of my code generation needs.
What surprised me initially was just how confidently wrong Llama was... Now I'm used to confident wrongness from smaller models. It's almost like working with real people...
We get ~50% of traffic from ChatGPT now; unfortunately, a large number of the features it says we have are made up.
I really don't want to get into a state of ChatGPT-Driven-Development as I imagine that will be never ending!
Example:
https://llama-cpp-agent.readthedocs.io/en/latest/structured-...
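The gist of it, as a generic sketch (this is not the llama-cpp-agent API itself; the schema, the stubbed model call, and the sample JSON are made up): define a schema, show the model its JSON Schema, and validate what comes back.

```python
from pydantic import BaseModel

class BugReport(BaseModel):
    title: str
    severity: int
    steps: list[str]

def call_llm(prompt: str) -> str:
    # Stand-in for a real backend (llama-cpp, OpenAI, ...); returns canned JSON.
    return ('{"title": "Crash on ASCII tab import", "severity": 2, '
            '"steps": ["Open importer", "Paste tab"]}')

prompt = (
    "Summarise the complaint as JSON matching this schema:\n"
    f"{BugReport.model_json_schema()}\n\n"
    "Complaint: the app crashes when I paste an ASCII tab."
)
report = BugReport.model_validate_json(call_llm(prompt))  # raises if output drifts
print(report.title, report.severity)
```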
I already strongly suspect that LLMs are just going to magnify the dominance of python as LLMs can remove the most friction from its use. Then will come the second order effects where libraries are explicitly written to be LLM friendly, further removing friction.
LLMs write code best in python -> python gets used more -> python gets optimized for LLMs -> LLMs write code best in python
We don't live in a nice world, so you'll probably end up right.
Some users might share it. ChatGPT has so many users it's somewhat mind boggling
If the super-intelligent AI understands human incentives and is in control of a very popular service, it can subtly influence people to its agenda by using the power of mass usage. Like how a search engine can influence a population's view of an issue by changing the rankings of news sources that it prefers.
1. I might consider a thing like that like any other feature request. If not already added to the feature request tracker, it could be done. It might be accepted or rejected, or more discussion may be wanted, and/or other changes made, etc, like any other feature request.
2. I might add a FAQ entry to specify that it does not have such a feature, and that ChatGPT is wrong. This does not necessarily mean that it will not be added in future, if there is a good reason to do so. If there is a good reason to not include it, this will be mentioned, too. It might also be mentioned other programs that can be used instead if this one doesn't work.
Also note that in the article, the second ChatGPT screenshot has a note on the bottom saying that ChatGPT can make mistakes (which, in this case, it does). Their program might also be made to detect ChatGPT screenshots and to display a special error message in that case.
> Correct feature almost exists
> Creator profile: analytical, perceptive, responsive;
> Feature within product scope, creator ability
> Induce demand
> await "That doesn't work" => "Thanks!"
> update memory
Figuring out the paths that users (or LLMs) actually want to take—not based on your original design or model of what paths they should want, but based on the paths that they actually do want and do tread down. Aka, meeting demand.
The user is not going to understand this. The user may not even need that feature at all to accomplish whatever it is they're doing. Alternatives may exist. The consequences will be severe if companies don't take this seriously.
It really likes to cogitate on code from several versions ago. And it often insists repeatedly on edits unrelated to the current task.
I feel like I'm spending more time educating the LLM. If I can resist the urge to lean on the LLM beyond its capabilities, I think I can be productive with it. If I'm going to stop teaching the thing, the least it can do is monitor my changes and not try to make suggestions from the first draft of code from five days ago, alas ...
1 - e.g. a 500-line text file representing values that will be converted to enums, with varying adherence to some naming scheme - I start typing, and after correcting the first two, it suggests the next few. I accept its suggestions until it makes a mistake because the data changed, start manual edits again ... I repeated this process for about 30 lines and it successfully learned how I wanted the remainder of the file edited.
On the other hand, adding a feature because you believe it is a feature your product should have, a feature that fits your vision and strategy, is a pretty sound approach that works regardless of what made you think of that feature in the first place.
I recall that early on a coworker was saying that ChatGPT hallucinated a simpler API than the one we offered, albeit with some easy to fix errors and extra assumptions that could've been nicer defaults in the API. I'm not sure if this ever got implemented though, as he was from a different team.
what a wonderful incident / bug report my god.
totally sorry for the trouble and amazing find and fix honestly.
sorry i am more amazed than sorry :D. thanks for sharing this !!
so i am happy you implemented this, and will now look at using your service. thx chatgpt, and you.
Maybe I'll turn it into a feature request then ...
Had something similar happen to us with our dev-tools saas. Non devs started coming to the product because gpt told them about it. Had to change parts of the onboarding and integration to accommodate it for non-devs who were having a harder time reading the documentation and understanding what to do.
It'll all be fine in a few years. :-;
If that's too scary, the failed tool call could trigger another AI to go draft up a PR with that proposed tool, since hey, it's cheap and might be useful.
Dynamic, on-the-fly generation & execution is definitely fascinating to watch in a sandbox, but it is far too scary (from a compliance/security/sanity perspective) without spending a lot more time on guardrails.
We do however take note of hallucinated tool calls and have had it suggest an implementation we start with and have several such tools in production now.
It's also useful to spin up any completed agents and interrogate them about what tools they might have found useful during execution (or really any number of other post-process questionnaire you can think of).
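A minimal sketch of the "take note of hallucinated tool calls" part (the tool registry, logger, and example call are illustrative, not our production code): unknown tool names get logged as feature ideas instead of hard failures.

```python
import json
import logging

logger = logging.getLogger("hallucinated_tools")
logging.basicConfig(level=logging.WARNING)

TOOLS = {
    "get_weather": lambda args: {"temp_c": 21},
}

def dispatch(tool_call: dict):
    name, args = tool_call["name"], tool_call.get("arguments", {})
    if name not in TOOLS:
        # Record what the model wished existed -- a cheap backlog of tool ideas.
        logger.warning("hallucinated tool: %s args=%s", name, json.dumps(args))
        return {"error": f"tool '{name}' does not exist; the request was logged"}
    return TOOLS[name](args)

print(dispatch({"name": "get_forecast", "arguments": {"city": "Berlin"}}))
```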
Would love love love to hear more on what you are doing here? This seems super fascinating (and scary). :)
It was a plausible answer, and the core of what these models do is generate plausible responses to (or continuations of) the prompt they’re given. They’re not databases or oracles.
With errors like this, if you ask a followup question it’ll typically agree that the feature isn’t supported, because the text of that question combined with its training essentially prompts it to reach that conclusion.
Re the follow-up question, it’s almost certainly the direction that advertising in general is going to take.
Also, I’m not suggesting an LLM is actually thinking. We’ve been using “thinking” in a computing context for a long time.
The example in the OP is a common one: ask a model how to do something with a tool, and if there’s no easy way to perform that operation they’ll commonly make up a plausible answer.
https://www.soundslice.com/help/en/player/advanced/17/expand...
That's available for any music in Soundslice, not just music that was created via our scanning feature.
"Would you still have added this feature if ChatGPT hadn't bullied you into it?" Absolutely not.
I feel like this resolves several longstanding time travel paradox tropes.
Creating the feature means it's no longer misinformation.
The bigger issue isn't that ChatGPT produces misinformation - it's that it takes less effort to update reality to match ChatGPT than it takes to update ChatGPT to match reality. Expect to see even more of this as we march toward accepting ChatGPT's reality over other sources.
If a feature has enough customers to pay for itself, develop it.
Neat
> My feelings on this are conflicted
Doubt
In our new post truth, anti-realism reality, pounding one's head against a brick wall is often instructive in the way the brain damage actually produces great results!
ChatGPT routinely hallucinates API calls, flat-out making them up from whole cloth. "Apple Intelligence" creates variants of existing API calls, usually by adding nonexistent arguments.
Both of them will hallucinate API calls that are frequently added by programmers through extensions.
Amateur musicians often lack just one or two features in the program they use, and the devs won't respond to their pleas.
Adding support for guitar tabs has made OP's product almost certainly more versatile and useful for a larger set of people. Which, IMHO, is a good thing.
But I also get the resentment of "a darn stupid robot made me do it". We don't take kindly to being bossed around by robots.
Over the last year, on average, I've had much more luck logically reasoning with AIs than with humans.
I really don't see any good reason against replacing some product managers with AIs that actually talk to individual users all the time and synthesize their requests and feedback. You should still probably have a top-level CPO to set strategy, but for the day-to-day discovery and specification, I would argue that AIs already have benefits over humans.
This is generally how you work with LLMs.
We’ve never supported ASCII tab; ChatGPT was outright lying to people. And making us look bad in the process, setting false expectations about our service.... We ended up deciding: what the heck, we might as well meet the market demand.
[...]
My feelings on this are conflicted. I’m happy to add a tool that helps people. But I feel like our hand was forced in a weird way. Should we really be developing features in response to misinformation?
The feature seems pretty useless for practicing guitar, since ASCII tablature usually doesn't include the rhythm: it is a bit shady to present the music as faithfully representing the tab, especially since only beginner guitarists would ask ChatGPT for help - they might not realize the rhythm is wrong. If ChatGPT didn't "force their hand", I doubt they would have included a misleading and useless feature. But "we’ve got a steady stream of new users", and it seems like a simple feature to implement.
This is the exact chaos AI brings that's wonderful. Forcing us to evolve in ways we didn't think of.
I can think of a dozen reasons why this might be bad, but I see no reason why they have more weight than the positive here.
Take the positive side of this unknown and run with it.
We have decades more of AI coming up, Debbie Downers will be left behind in the ditch.
No, because you'll be held responsible for the misinformation being accurate: users will say it is YOUR fault when they learn stuff wrong.
- Do you keep bolting on new updates to match these hallucinations, potentially breaking existing behavior?
- Or do you resign yourself to following whatever spec the AI gods invent next?
- And what if different LLMs hallucinate conflicting behavior for the same endpoint?
I don’t have a great solution, but a few options come to mind:
1. Implement the hallucinated endpoint and return a 200 OK or 202 Accepted, but include an X-Warning header like "X-Warning: The endpoint you used was built in response to ChatGPT hallucinations. Always double-check an LLM's advice on building against 3rd-party APIs with the API docs themselves. Refer to https://api.example.com/docs for our docs. We reserve the right to change our approach to building against LLM hallucinations in the future." Most consumers won’t notice the header, but it’s a low-friction way to correct false assumptions while still supporting the request.
2. Fail loudly: Respond with 404 Not Found or 501 Not Implemented, and include a JSON body explaining that the endpoint never existed and may have been incorrectly inferred by an LLM. This is less friendly but more likely to get the developer’s attention.
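A rough sketch of option 1 above, assuming FastAPI (the framework choice, route, and URLs are placeholders, not anyone's real API):

```python
from fastapi import FastAPI
from fastapi.responses import JSONResponse

app = FastAPI()

@app.post("/v1/tabs/ascii")  # the endpoint LLMs keep inventing
async def hallucinated_endpoint_stub():
    return JSONResponse(
        status_code=202,
        content={"status": "accepted", "docs": "https://api.example.com/docs"},
        headers={
            "X-Warning": (
                "This endpoint was built in response to LLM hallucinations; "
                "always double-check generated client code against "
                "https://api.example.com/docs."
            )
        },
    )
```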
Normally I'd say that good API versioning would prevent this, but it feels like that all goes out the window unless an LLM user thinks to double-check what the LLM tells them against actual docs. And if that had happened, it seems like they wouldn't have built against a hallucinated endpoint in the first place.
It’s frustrating that teams now have to reshape their product roadmap around misinformation from language models. It feels like there’s real potential here for long-term erosion of product boundaries and spec integrity.
EDIT: for the down-voters, if you've got actual qualms with the technical aspects of the above, I'd love to hear them and am open to learning if / how I'm wrong. I want to be a better engineer!
Also, it's not like ChatGPT or users are directly querying their API. They're submitting images through the Soundslice website. The images just aren't of the sort that was previously expected.
this is my general philosophy and, in my case, this is why I deploy things on blockchains
so many people keep wondering about whether there will ever be some mythical unfalsifiable to define “mainstream” use case, and ignoring that crypto natives just … exist. and have problems they will pay (a lot) to solve.
to the author’s burning question about whether any other company has done this. I would say yes. I’ve discovered services recommended by ChatGPT and other LLMs that didnt do what was described of them, and they subsequently tweaked it once they figured out there was new demand
We detached this subthread from https://news.ycombinator.com/item?id=44492212 and marked it off topic.
I don't care if they used an LLM provided they put their best effort in to confirm that it's clearly communicating the message they are intending to communicate.
Requests containing elements of hostility, shame, or injury frequently serve dual purposes: (1) the ostensible aim of eliciting an action and (2) the underlying objective of inflicting some form of harm (here, shame) as a means of compelling compliance through emotional leverage. Even if the respondent doesn't honor the request, the secondary purpose still occurs.
Having wrangled some open-source work that’s the kind of genius that only its mother could love… there’s a place for idiosyncratic interface design (UI-wise and API-wise), but there’s also a whole group of people who are great at that design sensibility. That category of people doesn’t always overlap with people who are great at the underlying engineering. Similarly, as academic writing tends to demonstrate, people with interesting and important ideas aren’t always people with a tremendous facility for writing to be read.
(And then there are people like me who have neither—I agree that you should roll your eyes at anything I ask an LLM to squirt out! :)
But GP’s technique, like TFA’s, sounds to me like something closer to that of a person with something meaningful to say, who now has a patient close-reader alongside them while they hone drafts. It’s not like you’d take half of your test reader’s suggestions, but some of them might be good in a way that didn’t occur to you in the moment, right?