I think the main problem is that it doesn't actually have a concept of truth or falsehood; it's just very good at knowing what sounds correct. So, to GPT-3, a subtle error is almost as good as being totally right, whereas in practice there's a huge gulf between correct and incorrect. That's a categorical problem, not something that can be patched.