I want to build a system that can keep a running tab of my purchases by item, price, and store. I need to find a library that can effectively scan a receipt, recognize the store (usually name, number, address and logo at the top), and differentiate each item label and its price. I plan to manually tag each item label from a store's receipt with the item's barcode the first time it is seen.
I have been sporadically googling the past 6 months but am still unsure which OCR library(s) I should invest my time in. Or how low level I should start. Should I grab a library like tesseract and do my own feature extraction or libs that spit out semi-structured objects with text and hope it returns something similar enough across store receipts to make sense of consistently?
I'm ok with this being an extended project, but I would like some input on choosing a solid library with accurate OCR and advice on how to approach training/parsing from someone with more experience.
Other solutions and advice are also welcome++