> Heaviest blue whale ever recorded
I don't think estimation errors regarding things outside of someone's area of familiarity say much.
You could ask a much "easier" question from the same topic area and still get terrible answers: "What percentage of blue whales are blue?" Or just "Are blue whales blue?"
Estimating something often encountered but uncounted seems like a better test, such as how many cars pass in front of my house every day. I could apply arithmetic, soft logic, and intuition to that. But it would be a difficult question to grade, given that it has no universal answer.