How so? What the LLM says is whatever is most likely given the context. It has no relation to the underlying reality whatsoever.
For example, here is a simple query:
> I put my bike in the bike stand, I removed my bike from the bike stand and biked to the beach, then I biked home, where is my bike. Answer only with the answer and no explanation
> ChatGPT: At home
> Why did you answer that?
> I answered that your bike is at home because the last action you described was biking home, which implies you took the bike with you and ended your journey there. Therefore, the bike would logically be at home now.
Do you doubt that the answer would change if I changed the query so the final destination was "the park" instead of "home"? If you don't doubt that, what do you mean when you say the answer doesn't correspond to the underlying reality? In reality the answer depends on the final destination mentioned, and that's also the explanation given by the LLM; clearly the reality and the answers are related.
Then there is the question of what you would do with its response. It’s not like code where you can go in and update the logic. There are billions of floating point numbers. If you actually wanted to update the weights, you’d quickly find yourself fine-tuning the monstrosity, which is orders of magnitude more work than updating an “if” statement.
> Then there is the question of what you would do with its response.
Sure, but that's a separate question. I'd say the first course of action would be to edit the prompt. If you have to resort to fine-tuning, I'd say the approach has failed and the tool was insufficient for the task.
For LLMs, interpretability is one problem. The ability to effectively apply fixes is another. If we are talking about business logic, have the LLM write code for it and don’t tie yourself in knots begging the LLM to do things correctly.
There is a grey area though, which is where code sucks and statistical models shine. If your task was to differentiate between a cat and a dog visually, good luck writing code for that. But neural nets do that for breakfast. It’s all about using the right tool for the job.
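To make the "code sucks, statistical models shine" point concrete, here is a hedged sketch of a learned classifier. The "features" and cluster centers are entirely invented stand-ins for real image features; the point is only that the decision rule is a set of learned weights rather than hand-written if-statements:

```python
import math
import random

random.seed(0)

# Invented toy "image features" (say, ear pointiness and snout length),
# not real pixels: cats cluster around (0.8, 0.2), dogs around (0.2, 0.8).
def sample(label, n=100):
    cx, cy = (0.8, 0.2) if label == 1 else (0.2, 0.8)
    return [(cx + random.gauss(0, 0.1), cy + random.gauss(0, 0.1), label)
            for _ in range(n)]

data = sample(1) + sample(0)  # 1 = cat, 0 = dog

# Logistic regression trained by stochastic gradient descent: the "rules"
# are the weights w1, w2, b, and they are fitted to data, not authored.
w1 = w2 = b = 0.0
for _ in range(200):
    for x1, x2, y in data:
        p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))
        g = p - y                 # gradient of the log-loss w.r.t. the logit
        w1 -= 0.1 * g * x1
        w2 -= 0.1 * g * x2
        b -= 0.1 * g

def predict(x1, x2):
    # Positive logit means "cat", negative means "dog".
    return "cat" if w1 * x1 + w2 * x2 + b > 0 else "dog"

print(predict(0.85, 0.15))  # a point near the cat cluster
print(predict(0.15, 0.85))  # a point near the dog cluster
```

Nobody wrote down what makes a cat a cat; the model absorbed it from examples, which is exactly the kind of task where hand-written logic falls apart.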
No it isn't. You misunderstand how LLMs work. They're giant Mad Libs machines: given these surrounding words, fill in this blank with whatever statistically is most likely. LLMs don't model reality in any way.
> They're giant Mad Libs machines: given these surrounding words, fill in this blank with whatever statistically is most likely. LLMs don't model reality in any way.
Not sure why you think this is incompatible with the statement you disagreed with.
Yes, I do. An LLM replies with the most likely string of tokens, which may or may not correspond with the correct or reasonable string of tokens, depending on how the stars align. In this case the statistically most likely explanation the LLM replied with just happened to correspond with the correct one.
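The "most likely string of tokens" mechanism both sides are describing can be shown with a toy next-token model. This is a sketch, not a real LLM: the contexts, candidate tokens, and logit values are all made up for illustration.

```python
import math

# Toy "language model": for each context, raw scores (logits) over a few
# candidate next tokens. All numbers here are invented for illustration.
toy_logits = {
    "...then I biked home, where is my bike? ->":
        {"home": 4.0, "the beach": 1.0, "the bike stand": 0.5},
    "...then I biked to the park, where is my bike? ->":
        {"the park": 4.0, "home": 1.0, "the beach": 0.5},
}

def softmax(scores):
    # Turn raw scores into a probability distribution over tokens.
    m = max(scores.values())
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def most_likely(context):
    # Greedy decoding: emit the single most probable next token.
    probs = softmax(toy_logits[context])
    return max(probs, key=probs.get)

print(most_likely("...then I biked home, where is my bike? ->"))
print(most_likely("...then I biked to the park, where is my bike? ->"))
```

Note it cuts both ways: change the context and the most likely completion changes with it, so the output does track the prompt, yet at no point does the model maintain any explicit representation of where the bike "really" is.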