mehrdadn 7y ago

Ooh, I've been looking for something like this. Could I ask how you parse your bills? Did you just hack together your own script or did you find a parser online? I tried doing this several times but it seems rather painful to get right since the formats can differ and since copying text from PDFs can sometimes ignore the table layouts and whatnot.

petters 7y ago

I have just written my own parser. Took some work when my bank switched from txt to PDF.

Then I match keywords in the strings to categorize the expenses. These days, this would be called AI :)
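A minimal sketch of that kind of keyword matching, in Python. The category names, the keywords, and the `categorize` helper are all made up for illustration; they are not the actual script described above.

```python
# Hypothetical keyword table: categories and keywords are illustrative only.
CATEGORY_KEYWORDS = {
    "groceries": ["supermarket", "grocery"],
    "transport": ["metro", "fuel", "taxi"],
    "utilities": ["electric", "water", "internet"],
}


def categorize(description, keywords=CATEGORY_KEYWORDS):
    """Return the first category with a keyword found in the description."""
    text = description.lower()  # match case-insensitively
    for category, words in keywords.items():
        if any(word in text for word in words):
            return category
    return "uncategorized"  # nothing matched; review these by hand
```

The first matching category wins, so the order of the table matters when a description could fit more than one.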

Edit: I use pdftotext, which has a mode that keeps the spatial structure of tables. Works for my bank.
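For anyone trying the same thing, here is a rough Python sketch. It assumes the mode in question is pdftotext's `-layout` option and that columns end up separated by at least two spaces, which will not hold for every statement. The function names are hypothetical.

```python
import re
import subprocess


def pdf_to_layout_text(pdf_path):
    """Run pdftotext in layout mode and return the extracted text."""
    result = subprocess.run(
        ["pdftotext", "-layout", pdf_path, "-"],  # "-" sends text to stdout
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def split_columns(line):
    """Split one layout-preserved line on runs of two or more spaces.

    Assumption: single spaces occur only inside a cell, never between cells.
    """
    return [cell for cell in re.split(r"\s{2,}", line.strip()) if cell]
```

Calling `split_columns` on each line of `pdf_to_layout_text("statement.pdf")` gives one list of cells per row; header and footer lines still have to be filtered out per bank.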

mkl 7y ago

There are no general solutions, unfortunately. Every PDF-generating system does things differently, so parsing has to be customized for each type. Generated PDFs are usually very systematic, though, and the PDF drawing commands are plain text, just usually stored compressed. You can uncompress them with pdftk:

  pdftk input.pdf output output.pdf uncompress

Then try grepping or using whatever tools you like (there will be binary parts of the file still, like embedded fonts and bitmaps).
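As one example of "whatever tools you like", here is a small Python sketch that pulls literal strings out of the uncompressed file. It assumes the text is drawn with the `Tj` operator using parenthesized strings in a readable encoding; many generators use `TJ` arrays, hex strings, or custom font encodings instead, and this will miss those. The `find_shown_strings` name is made up.

```python
import re

# A literal string "( ... )" followed by the Tj (show text) operator.
# Backslash escapes such as \( and \) are allowed inside the string;
# unescaped nested parentheses are not handled by this sketch.
TJ_PATTERN = re.compile(rb"\(((?:\\.|[^\\()])*)\)\s*Tj")


def find_shown_strings(pdf_bytes):
    """Return the raw contents of every '(...) Tj' string in the data."""
    return [m.group(1).decode("latin-1") for m in TJ_PATTERN.finditer(pdf_bytes)]
```

Feed it the bytes of the uncompressed file, e.g. `find_shown_strings(open("output.pdf", "rb").read())`, and look at whether the strings that come back resemble your statement rows.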

geerlingguy 7y ago

The poor person’s solution would be mint.com... though the security model there is dubious.
