My understanding of machine learning in general is that this is not how it works: the model is a neural network, and a lot of what it does isn't merely picking the closest arbitrary text in its training set, though I don't know the details of NLP specifically. I am aware that some tasks do involve ranking texts by similarity and picking the closest matches (that's how embeddings get used to provide context for prompts), but I didn't think that's how the underlying models operate when, say, asked the question I asked.
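To be concrete about the similarity-ranking step I mean, here is a rough sketch. The `embed` function is a toy stand-in I made up (plain word counts); a real system would use learned embedding vectors, and `top_matches` is just an illustrative name.

```python
import math
from collections import Counter


def embed(text):
    # Toy stand-in for a learned embedding: a bag-of-words count vector.
    # This is an assumption for illustration, not how real embeddings work.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_matches(query, documents, k=2):
    # Rank stored texts by similarity to the query and keep the k closest.
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


docs = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "a cat chased the mouse",
]
context = top_matches("where is the cat", docs, k=2)
```

The selected texts would then be pasted into the prompt as context; the ranking only chooses what the model gets to see, it is not the model itself.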
Regarding the rest of your remarks, the aim isn't necessarily perfection. You listed some other possible outputs, but the fact is that it didn't offer those; it offered the correct one. It seems that a system that uses ChatGPT + something else can provide an improvement over ChatGPT on its own (with the downside being that it requires more resources to provide those answers).
Is the other "completely valid and possible output" you provided actually likely? Even a human could possibly make the mistakes you list, but possible doesn't mean probable, and being possible doesn't take away from the results of such a hybrid system being an improvement over ChatGPT on its own.
I don't think guarantees are the game here. Getting closer to what humans can do (and even some humans struggle with logic and mathematics), and even exceeding it in some places, would be of immense value.
> ChatGPT cannot do anything at all without making a guess, because "guess" is everything that ChatGPT is
Is this different to what humans do? There are accounts of the brain in which it takes sensory input, updates an internal model, and makes predictions (guesses) about what inputs it expects to receive next. Getting better at guessing might be the right game to play. That is, it's not a criticism to say it's guessing, because guessing is what these models are trying to do. If it can guess reasonably well when a question is mathematical or logical, and can guess reasonably well how to structure a mathematical or logical equation or statement from that text, then that structure could be fed into another system to produce a more accurate answer.
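As a rough sketch of the kind of hybrid I have in mind: one part guesses whether the question contains arithmetic and what the expression is, and a separate exact evaluator computes the result. Everything here is an assumption for illustration. `guess_expression` is a hypothetical regex stand-in for the guess the language model would make, and `evaluate` only handles basic arithmetic.

```python
import ast
import operator
import re

# Operators the exact evaluator is willing to compute.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(expr):
    # Deterministic evaluator: walks the parsed expression, no guessing.
    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expr, mode="eval").body)


def guess_expression(question):
    # Hypothetical stand-in for the model's guess: pull out something that
    # looks like arithmetic. A real system would ask the model to do this.
    match = re.search(r"[\d(][\d\s.+\-*/()]*[\d)]", question)
    if match and any(op in match.group() for op in "+-*/"):
        return match.group()
    return None


def answer(question):
    # Route to the exact evaluator when the guess says "this is arithmetic";
    # return None otherwise, meaning "let the language model answer".
    expr = guess_expression(question)
    if expr is None:
        return None
    try:
        return evaluate(expr)
    except (ValueError, SyntaxError, ZeroDivisionError):
        return None


result = answer("What is 12 * (3 + 4)?")
```

The guess can still be wrong, but once the expression is extracted the arithmetic itself is exact, which is the improvement I'm pointing at.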