In that vein, perhaps the delta between o3 @ 87.5% and Human @ 85% represents a deficit in the ability of text to communicate human reasoning.
In other words, it's possible humans can reason better than o3, but cannot articulate that reasoning as well through text - only in our heads, or through some alternative medium.
Yeah, this is sort of meaningless without some idea of cost or consequences of a wrong answer. One of the nice things about working with a competent human is being able to tell them "all of our jobs are on the line" and knowing with certainty that they'll come to a good answer.
It's possible humans reason better through text than not through text, so these models, having been trained on text, should be able to out-reason any person who's not currently sitting down to write.