Or you just need a model that can recognize math, and then pass it to a system that can do math. Math is actually something traditional, non-AI systems are very good at doing (it is the raison d'être of traditional computing), so if an AI model can simply recognize that math needs to be done, there is no reason for it to do the math itself.
Wolfram Alpha already does that. But that's because Wolfram Alpha is built as a model whose purpose is "recognize what kind of problem this natural language query requires, then pass it on to the problem engine for that kind of problem", where each problem engine is an actual solution model for that kind of problem, based on actual facts about the world.
ChatGPT, though, is built as a completely different type of model, whose purpose is "find a pattern that this natural language query matches, then generate the highest-probability sequence of natural language words for that pattern based on the training data set". That's a completely different structure.
It's also possible to create fact-grounded retrieval-enhanced language models e.g. https://proceedings.mlr.press/v162/borgeaud22a.html.
Personally I think hybridization is the way to go.
https://twitter.com/goodside/status/1581805503897735168?s=20...
GPT-3 is perfectly capable of recognizing what kinds of things it will be bad at, and can be encouraged to generate machine-executable queries to fill in that gap.
(note this is based on prompting GPT-3, not ChatGPT, but the principles about what this language model is capable of apply)
It can even go one level deeper - there's an example there of it generating a python script that uses the 'wikipedia' library to look up the date of death of the Queen, as a way to fill in knowledge it doesn't have. Tell it it can use the wolframalpha module to answer questions that involve complex units, quantities, or advanced mathematics, and it'll almost certainly do that too.
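The loop the tweet demonstrates can be sketched in a few lines: the model either answers directly or emits a snippet of code, the supervisor executes the snippet, and the result is fed back into the prompt. This is a minimal sketch, not the actual demo; `complete` is a hypothetical stand-in for a GPT-3 API call, faked here so the loop runs offline (the real demo generated a `wikipedia` lookup at this step).

```python
import contextlib
import io

def complete(prompt: str) -> str:
    # Hypothetical stand-in for a GPT-3 completion call, faked so the
    # sketch is runnable. On the first pass the "model" decides it
    # lacks the fact and emits code; once a RESULT is in the prompt,
    # it answers in natural language.
    if "RESULT:" not in prompt:
        return "PYTHON: print(2022)"  # real demo: a wikipedia lookup
    return "ANSWER: The Queen died in 2022."

def answer(question: str) -> str:
    prompt = question
    while True:
        out = complete(prompt)
        if out.startswith("ANSWER:"):
            return out[len("ANSWER:"):].strip()
        code = out[len("PYTHON:"):].strip()
        # Execute the generated snippet, capture its stdout, and feed
        # the result back into the prompt for the next completion.
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code)
        prompt += f"\nRESULT: {buf.getvalue().strip()}"
```

The key design point is that the language model never has to know the fact; it only has to know that it doesn't know, and emit something executable.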
One of the things I love is this reply to that tweet - https://twitter.com/JulienMouchnino/status/15820120109127065...
"How is it possible that GPT-3 understands what a human can compute in his/her head?"
Riley's quick demo prompt shows that GPT-3's guesses about which mathematical results are easy to compute in one's head match human intuition surprisingly well.
By what seems to me to be the obvious choice for a definition of "bad at", namely "not answering queries based on an actual semantically connected world model", GPT-3 is bad at everything. And an obvious example of an endpoint of your perfectly reasonable suggestion to have it pass on queries to solution machines that are based on actual world models, is...Wolfram Alpha.
You could then use it to classify the questions it is bad at, and use that information (as part of a larger whole) to dispatch each query to a knowledge system that can return the proper (current) information, and report on that.
The following is a list of questions. Identify the category they belong to as one of {Current Events}, {General knowledge}, {Unit conversion}, {Math}:
1. How many feet in a mile?
2. What is the square root of 541696?
3. Who is the Speaker of the House?
4. How many turkeys in Turkey?
to which it responds:
1. Unit conversion
2. Math
3. Current Events
4. General Knowledge
The supervisor system (for lack of a better word) would then dispatch each question to a system that it is coded to use, which can either further classify the question or provide the proper answer.
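A minimal sketch of such a supervisor, under loud assumptions: `classify` stands in for the LLM classification prompt above but is faked with keyword rules so the sketch runs offline, and the handlers are toys (the Math handler only parses the demo query's trailing number; the last two just name where a real system would dispatch).

```python
import math

def classify(question: str) -> str:
    # Stand-in for the LLM classification call shown in the prompt
    # above; faked with keyword rules so the sketch runs offline.
    q = question.lower()
    if "square root" in q:
        return "Math"
    if "feet in a mile" in q:
        return "Unit conversion"
    if "speaker of the house" in q:
        return "Current Events"
    return "General knowledge"

# One handler per category; each is a placeholder for a real backend
# (a math engine, a units library, a news/knowledge API, or the
# language model itself as the fallback).
HANDLERS = {
    "Math": lambda q: str(math.isqrt(int(q.rstrip("?").split()[-1]))),
    "Unit conversion": lambda q: "5280 feet",
    "Current Events": lambda q: "dispatch to a current-events knowledge API",
    "General knowledge": lambda q: "fall back to the language model itself",
}

def supervise(question: str) -> str:
    # Classify, then route to the matching problem engine.
    return HANDLERS[classify(question)](question)
```

The dispatch-table shape is the point: each category maps to a backend that actually models its domain, and the language model's only job is the routing.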