mehrdadn
6y ago
Can I ask how you parse PDFs? I'm curious both in terms of reading the PDF data (Python library?) and parsing it (regex?)... and do you have to deal with OCR as well?
haberman
6y ago
I use "pdftotext -layout" and then parse that. Here is some more info from people who have tried this approach:
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
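A minimal sketch of the "pdftotext -layout, then parse" approach in Python. It assumes Poppler's `pdftotext` is installed and on PATH. Passing `-` as the output file sends the text to stdout. The function names and the "two or more spaces separate columns" heuristic are hypothetical illustrations, not haberman's actual code. `-layout` keeps the page's column alignment as runs of spaces, so a regex split is often enough to recover table-like rows:

```python
import re
import subprocess


def pdf_to_layout_text(pdf_path: str) -> str:
    """Run Poppler's pdftotext with -layout and return the extracted text.

    Requires the `pdftotext` binary on PATH; "-" as the output file
    makes pdftotext write to stdout instead of a .txt file.
    """
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Hypothetical heuristic: in -layout output, two or more spaces usually
# mark a column boundary, while single spaces stay inside a field.
COLUMN_GAP = re.compile(r" {2,}")


def split_columns(text: str) -> list[list[str]]:
    """Split each non-blank line of layout text into fields."""
    rows = []
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines (and page-break form feeds)
        rows.append(COLUMN_GAP.split(stripped))
    return rows


if __name__ == "__main__":
    # Made-up sample standing in for real pdftotext -layout output.
    sample = "Name      Qty   Price\n\nWidget A    3     9.99\n"
    for row in split_columns(sample):
        print(row)
```

With a real file you would call `split_columns(pdf_to_layout_text("statement.pdf"))` (the filename is just an example). Scanned PDFs contain no text layer, so this approach does not cover the OCR case.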
mehrdadn
OP
6y ago
Thanks!