Don't look at the absolute numbers; think in terms of relative improvement instead.
DocVQA is a benchmark with a very strong SOTA. GPT-4 achieves 88.4, Gemini 90.9. That's only a 2.5-point increase, but the error drops from 11.6 to 9.1, a ~22% reduction, which is massive for real-life use cases where the error tolerance is low.
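
To make the arithmetic explicit, here is a minimal Python sketch. The function name `relative_error_reduction` is made up for illustration, and it assumes the scores are on a 0-100 scale where 100 means no errors:

```python
def relative_error_reduction(baseline_score, new_score, max_score=100.0):
    """Fraction of the baseline's remaining error that the new score removes.

    Hypothetical helper; assumes higher is better and max_score is a perfect score.
    """
    baseline_err = max_score - baseline_score
    new_err = max_score - new_score
    if baseline_err <= 0:
        raise ValueError("baseline has no error left to reduce")
    return (baseline_err - new_err) / baseline_err


# Scores quoted above: GPT-4 at 88.4, Gemini at 90.9 on DocVQA.
reduction = relative_error_reduction(88.4, 90.9)
print(f"absolute gain: {90.9 - 88.4:.1f} points")
print(f"relative error reduction: {reduction:.1%}")
```

The same 2.5-point gain would be a much smaller relative reduction on a benchmark where the baseline sits at, say, 50, which is why the relative view matters more near the top of the scale.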