Better HN

0 points · themanmaran · 1y ago · 0 comments

Update: Just ran our benchmark on the Mistral OCR model, and the results are... surprisingly bad?

Mistral OCR:

- 72.2% accuracy

- $1/1000 pages

- 5.42s / page
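A rough back-of-the-envelope sketch of what the price and latency figures imply for a batch job. It assumes pricing is strictly linear and pages are processed one at a time with no parallel requests; the `estimate_batch` helper is hypothetical and not part of the benchmark repo.

```python
def estimate_batch(
    pages: int,
    usd_per_1000_pages: float = 1.0,  # quoted price: $1 / 1000 pages
    seconds_per_page: float = 5.42,   # quoted latency: 5.42 s / page
) -> tuple[float, float]:
    """Return (cost in USD, wall-clock hours) for a batch of pages.

    Assumes linear pricing and strictly sequential processing.
    """
    cost_usd = pages * usd_per_1000_pages / 1000
    hours = pages * seconds_per_page / 3600
    return cost_usd, hours


cost, hours = estimate_batch(10_000)
print(f"${cost:.2f}, {hours:.1f} h")  # $10.00, 15.1 h
```

In practice you would parallelize requests, so the hours figure is an upper bound under these assumptions, while the cost stays the same.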

That's a pretty far cry from the 95% accuracy they advertised based on their private benchmark. The biggest thing I noticed is that it skips anything it classifies as an image/figure, so charts, infographics, some tables, etc. all get lifted out and returned as [image](image_002). The other VLMs, by contrast, are able to interpret those images into a text representation.
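A minimal sketch for spotting that behavior in OCR output: scan the returned markdown for figure placeholders instead of transcribed content. The placeholder pattern is an assumption based on the `[image](image_002)` example above (optionally with a leading `!`, as in standard markdown image syntax); the model's real output format may differ, and `find_dropped_figures` is a hypothetical helper.

```python
import re

# Assumed placeholder shape: an optional "!" (markdown image syntax),
# bracketed alt text, then a parenthesised target whose name starts
# with "image" or "img", e.g. [image](image_002) or ![x](img-0.jpeg).
PLACEHOLDER = re.compile(r"!?\[[^\]]*\]\(((?:image|img)[^)]*)\)")


def find_dropped_figures(markdown: str) -> list[str]:
    """Return the targets of figure placeholders found in OCR markdown."""
    return PLACEHOLDER.findall(markdown)


page = "Some extracted text.\n\n[image](image_002)\n\nSee ![chart](img-0.jpeg) here."
print(find_dropped_figures(page))  # ['image_002', 'img-0.jpeg']
```

Counting these per page gives a quick sense of how much content a model skipped rather than transcribed.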

https://github.com/getomni-ai/benchmark

https://huggingface.co/datasets/getomni-ai/ocr-benchmark

https://getomni.ai/ocr-benchmark


2 comments · 1 top-level

Thaxll · 1y ago · 1 in thread

Are you benchmarking the right thing, though? It seems to focus a lot on images, charts, etc.

The 95% comes from their benchmark: "we evaluate them on our internal “text-only” test-set containing various publication papers, and PDFs from the web; below:"

Text only.

themanmaran (OP) · 1y ago

Our goal is to benchmark on real-world data, which is often more complex than plain text. If we had to make the benchmark data easier just so the model performs better, it wouldn't be an honest assessment of reality.
