What's the easiest way to convert a PDF book to plain text? Even a simple text file with proper line breaks and paragraphs would take this problem a long way.
Assume a PDF file with margins, a common font, the name of the book or chapter in the top or bottom margin (a running header or footer), and page numbers. Parsing the table of contents would be great, but it is not necessary.
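One way to handle the cleanup step, sketched in Python under some assumptions: the PDF has already been extracted to one text string per page; a running header or footer appears as the first or last line on at least half the pages (digits are ignored when comparing, so "Chapter 2 14" and "Chapter 2 15" still match); and paragraphs are separated by blank lines. The function `clean_pages` and its `min_share` parameter are hypothetical names for illustration, not part of any library.

```python
import re
from collections import Counter

# A line consisting only of an Arabic page number, e.g. "12" (Roman numerals are not handled).
PAGE_NUMBER = re.compile(r"^\d+$")


def _key(line):
    # Normalise digits so headers that differ only by page number compare equal.
    return re.sub(r"\d+", "#", line.strip())


def _trim(lines):
    # Drop leading and trailing blank lines of a page.
    while lines and not lines[0].strip():
        lines = lines[1:]
    while lines and not lines[-1].strip():
        lines = lines[:-1]
    return lines


def clean_pages(pages, min_share=0.5):
    """Remove running headers/footers and bare page numbers, then reflow paragraphs.

    pages: list of strings, one per PDF page (already extracted to text).
    min_share: fraction of pages on which an edge line must recur to count as a header/footer.
    """
    split = [_trim(p.splitlines()) for p in pages]

    # Count the first and last line of every page (after digit normalisation).
    edges = Counter()
    for lines in split:
        if lines:
            edges.update({_key(lines[0]), _key(lines[-1])})
    threshold = max(2, int(len(pages) * min_share))
    running = {k for k, n in edges.items() if n >= threshold}

    body = []
    for lines in split:
        # Strip running header/footer lines and page numbers, but only at the page edges.
        while lines and (_key(lines[0]) in running or PAGE_NUMBER.match(lines[0].strip())):
            lines = _trim(lines[1:])
        while lines and (_key(lines[-1]) in running or PAGE_NUMBER.match(lines[-1].strip())):
            lines = _trim(lines[:-1])
        body.extend(lines)  # no break between pages, so paragraphs can span pages

    # Reflow: blank lines separate paragraphs; other line breaks become spaces.
    paragraphs, current = [], ""
    for line in body:
        text = line.strip()
        if not text:
            if current:
                paragraphs.append(current)
            current = ""
        elif current.endswith("-"):
            # Re-join a word hyphenated across a line break (may also merge real compounds).
            current = current[:-1] + text
        else:
            current = f"{current} {text}" if current else text
    if current:
        paragraphs.append(current)
    return "\n\n".join(paragraphs)
```

If the text comes from a tool such as poppler's `pdftotext`, which by default separates pages with a form-feed character, something like `clean_pages(text.split("\f"))` would be a plausible starting point; a heuristic this simple will still need tuning per book.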