- Every document has ground truth text, a JSON schema, and the ground truth JSON.
- Run OCR on each document and pass the result to GPT-4o along with the JSON schema.
- Compare the predicted JSON against the ground truth JSON for accuracy.
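
The comparison step could be scored in several ways; the list above does not say which one the benchmark uses. As a minimal sketch, assume accuracy is the fraction of ground truth leaf fields that the predicted JSON reproduces exactly. The helper names `flatten` and `field_accuracy`, and the sample invoice data, are hypothetical and only for illustration.

```python
from typing import Any


def flatten(value: Any, prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts/lists into {"a.b[0]": leaf} pairs."""
    if isinstance(value, dict) and value:
        items = [(f"{prefix}.{k}" if prefix else str(k), v) for k, v in value.items()]
    elif isinstance(value, list) and value:
        items = [(f"{prefix}[{i}]", v) for i, v in enumerate(value)]
    else:
        # Scalars and empty containers are treated as leaf values.
        return {prefix: value}
    flat: dict[str, Any] = {}
    for path, child in items:
        flat.update(flatten(child, path))
    return flat


def field_accuracy(predicted: Any, truth: Any) -> float:
    """Fraction of ground truth leaf fields matched exactly (assumed metric)."""
    truth_flat = flatten(truth)
    pred_flat = flatten(predicted)
    missing = object()  # sentinel so a missing field never equals a real value
    correct = sum(
        1
        for path, expected in truth_flat.items()
        if pred_flat.get(path, missing) == expected
    )
    return correct / len(truth_flat)


# Illustrative data: one OCR-style error ("0" read as "O") in the invoice number.
truth = {
    "invoice": {"number": "INV-001", "total": 1250.0},
    "items": [{"sku": "A1", "qty": 2}],
}
predicted = {
    "invoice": {"number": "INV-0O1", "total": 1250.0},
    "items": [{"sku": "A1", "qty": 2}],
}

print(field_accuracy(predicted, truth))  # 3 of 4 leaf fields match -> 0.75
```

Exact matching per leaf field is deliberately strict: a single misread character counts the whole field as wrong, which is the kind of error an OCR benchmark wants to surface.
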

In our benchmark, passing the ground truth text to GPT-4o yielded 99.7%+ accuracy, meaning that whenever GPT-4o was given the correct text, it extracted the structured JSON values nearly 100% of the time. So if we pass in the OCR text from Mistral and it scores 70%, the lost accuracy can be attributed to OCR errors rather than to the extraction step.
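
The arithmetic behind that attribution can be written out as a small sketch. It uses only the two figures quoted above (99.7% for ground truth text, 70% as the example OCR score) and assumes accuracy is expressed in percentage points; the function name `ocr_attributable_gap` is hypothetical.

```python
def ocr_attributable_gap(baseline_accuracy: float, ocr_accuracy: float) -> float:
    """Percentage points lost when OCR text replaces ground truth text."""
    return round(baseline_accuracy - ocr_accuracy, 1)


BASELINE = 99.7   # ground truth text -> GPT-4o (reported as 99.7%+)
OCR_SCORE = 70.0  # example score for OCR text -> GPT-4o

gap = ocr_attributable_gap(BASELINE, OCR_SCORE)
# At most this many points can come from the extraction step itself.
extraction_ceiling = round(100.0 - BASELINE, 1)

print(f"Lost to OCR: ~{gap} points; extraction step: at most {extraction_ceiling} points")
```

Because the baseline is a lower bound ("99.7%+"), the extraction step's share of the error is a ceiling, and nearly all of the remaining gap comes from the OCR output.
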