I wanted to abstract away the PDF form by building my own HTML form on top of a data model that can later be used to fill the PDF programmatically.
Since I had hundreds of PDFs, I wanted an OCR + LLM pipeline to build a data model for each one. Unfortunately, OCR + LLM works only ~90% of the time; sometimes fields are missed or mislabeled in the data model.
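A minimal sketch of the kind of check that would surface those misses, assuming the PDFs are fillable forms whose field names can be read out by some PDF library. The function name `diff_fields` and all field names below are hypothetical, for illustration only:

```python
def diff_fields(pdf_fields, model_fields):
    """Compare field names read from the PDF with those in the generated data model."""
    pdf, model = set(pdf_fields), set(model_fields)
    return {
        # Fields that exist in the PDF but that the pipeline never produced
        "missing_from_model": sorted(pdf - model),
        # Fields in the model with no matching PDF field (likely mislabeled)
        "unknown_in_model": sorted(model - pdf),
    }


# Hypothetical inputs: names from the PDF form vs. names the OCR + LLM step produced
pdf_fields = ["first_name", "last_name", "dob", "signature_date"]
model_fields = ["first_name", "last_name", "date_of_birth"]

report = diff_fields(pdf_fields, model_fields)
print(report)
```

Anything in either list would go to a manual review step rather than being filled blindly. This only catches name mismatches; a field with a valid name mapped to the wrong meaning would still slip through.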
Does this sometimes get it wrong during programmatic filling? How do you deal with that?