Better HN

0 points · belval · 2y ago · 0 comments

Don't look at the absolute numbers; instead, think in terms of relative improvement.

DocVQA is a benchmark with a very strong SOTA: GPT-4 scores 88.4 and Gemini 90.9. That's only a 2.5-point increase, but it cuts the remaining error from 11.6 to 9.1, a ~22% error reduction. That is massive for real-life use cases, where error tolerance is low.
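To make the arithmetic concrete, here's a quick sketch. It assumes the scores are on a 0 to 100 scale, so the "error" is just 100 minus the score; `relative_error_reduction` is a hypothetical helper written for this example, not anything from the DocVQA tooling.

```python
def relative_error_reduction(old_score: float, new_score: float, max_score: float = 100.0) -> float:
    """Fraction of the remaining error eliminated when going from old_score to new_score.

    Assumes scores are on a 0..max_score scale (hypothetical helper for illustration).
    """
    old_error = max_score - old_score  # 100 - 88.4 = 11.6
    new_error = max_score - new_score  # 100 - 90.9 = 9.1
    return (old_error - new_error) / old_error


gpt4, gemini = 88.4, 90.9

abs_gain = gemini - gpt4                              # absolute gain, in points
rel_gain = abs_gain / gpt4                            # relative increase in the score itself
err_red = relative_error_reduction(gpt4, gemini)      # share of remaining errors removed

print(f"absolute gain: {abs_gain:.1f} points")        # absolute gain: 2.5 points
print(f"relative score increase: {rel_gain:.1%}")     # relative score increase: 2.8%
print(f"error reduction: {err_red:.1%}")              # error reduction: 21.6%
```

The same 2.5 points look like a ~3% bump when measured against the score, but like a ~22% improvement when measured against the errors that are left, which is what matters when every wrong answer costs you.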


machiaweliczny · 2y ago

This, plus some benchmarks are shitty, so a rational model should be allowed to decline to answer them and ask clarifying questions instead.
