Later research showed that models know when they don't know certain pieces of information, but fine-tuning them to always provide an answer gave them no way to express that they didn't know.
Asking the model questions with known answers can produce a correct/incorrect map: a sample of facts that the model does and does not know. Fine-tuning the model to say "I don't know" in response to the questions it got wrong can allow it to generalise that response to its internal concept of unknown.
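A rough sketch of that procedure, to make it concrete. Everything here is an assumption for illustration: `ask_model` is a hypothetical callable standing in for whatever model you are probing, the exact-match grading is much cruder than a real evaluation would be, and the prompt/completion dictionaries are just a plausible shape for fine-tuning data, not any particular vendor's format.

```python
IDK = "I don't know"

def normalise(text):
    # Crude normalisation so "Paris " and "paris" compare equal.
    return " ".join(text.lower().split())

def build_knowledge_map(qa_pairs, ask_model):
    # Map each question to True (model answered correctly) or False.
    # ask_model is a hypothetical callable: question string -> answer string.
    return {q: normalise(ask_model(q)) == normalise(a) for q, a in qa_pairs}

def build_finetune_examples(qa_pairs, knowledge_map):
    # Keep the true answer where the model was right,
    # and substitute "I don't know" where it was wrong.
    examples = []
    for q, a in qa_pairs:
        completion = a if knowledge_map[q] else IDK
        examples.append({"prompt": q, "completion": completion})
    return examples

def toy_model(question):
    # Toy stand-in for a real model: right about one fact, wrong about another.
    canned = {
        "What is the capital of France?": "Paris",
        "Who wrote Hamlet?": "Christopher Marlowe",
    }
    return canned.get(question, "Some confident guess")

qa_pairs = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "William Shakespeare"),
]

knowledge_map = build_knowledge_map(qa_pairs, toy_model)
examples = build_finetune_examples(qa_pairs, knowledge_map)
```

The point of keeping the correct answers in the training set alongside the "I don't know" examples is that the model should learn to refuse only where its own answer was wrong, not to refuse across the board.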
It is good to keep in mind that the models we have been playing with are just the first ones to appear. GPT 3.5 is like the Atari 2600. You can get it to provide a limited version of the experience you want, and it's cool that you can do it at all, but it is fundamentally limited and far from an ideal solution. I see the current proliferation of models as being like the Cambrian explosion of early 8-bit home computers. Exciting and interesting technology which can be used for real-world purposes, but you still have to operate with the limitations at the forefront of your mind and tailor tasks so the models perform the bits they are good at.
I have no idea of the timeframe, but there is plenty more to come. There have been a lot of advances revealed in papers. A huge number of those advances have not yet coalesced into shipping models. When models cost millions to train, you want to be using a set of enhancements that play nicely together. Some features will be mutually exclusive. By the time you have analysed the options to find an optimal combination, a whole lot of new papers will be suggesting more options.
We have not yet got the thing for AI that Unix was for computers. We are only now exposing people to the problems that drive the need to create such a thing.