While I lack specific data, my intuition is based on trends I have observed in AI model development. I believe some other models that claimed such numbers excelled on benchmarks but fell short in real-world applications. Further research could confirm or refute this claim, and I welcome a balanced discussion.
It does seem incredible that ChatGPT has so much expertise in literally everything. Does this mean you could beat ChatGPT by creating smaller "experts" and routing each question to the most suitable one?
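
To make the routing idea concrete, here is a rough sketch of what it might look like. Everything in it is an assumption for illustration: the expert functions are placeholders rather than real models, and the keyword lists and the names `route` and `answer` are hypothetical. A real system would likely use a learned classifier instead of keyword matching.

```python
from typing import Callable, Dict, List

# Hypothetical "experts": placeholder functions standing in for small models.
def math_expert(question: str) -> str:
    return "math expert: " + question

def code_expert(question: str) -> str:
    return "code expert: " + question

def general_expert(question: str) -> str:
    return "general expert: " + question

# Hypothetical keyword lists, chosen only for this illustration.
KEYWORDS: Dict[str, List[str]] = {
    "math": ["integral", "equation", "prove"],
    "code": ["python", "bug", "compile"],
}

EXPERTS: Dict[str, Callable[[str], str]] = {
    "math": math_expert,
    "code": code_expert,
}

def route(question: str) -> str:
    """Pick the expert whose keywords appear most often, else 'general'."""
    text = question.lower()
    # Count how many of each expert's keywords occur as substrings.
    scores = {name: sum(kw in text for kw in kws) for name, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "general"

def answer(question: str) -> str:
    """Send the question to the chosen expert, falling back to the generalist."""
    expert = EXPERTS.get(route(question), general_expert)
    return expert(question)
```

Whether this could beat a single large model depends mostly on how good the router and the individual experts are; the sketch only shows the dispatch step.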