
mattbillenstein · 6y ago

You're parsing PDFs? I'd imagine most places offer some sort of data export, like CSV. That's how I get the data out of Chase and BofA.


haberman · 6y ago

Yes, in my experience CSV/XML/etc export is spotty. Some institutions don't have it at all, and even when they do the time range can be limited or hard to time-window reliably.

For example, I can't get a CSV of my pay stub and all the places the money flows (taxes, insurance, etc). So I use PDF as the best way to get all the data.

Parsing PDF is a huge PITA, since PDF is really designed only for layout and not to encode semantic document structure. But if you want the greatest amount of visibility, nothing is as authoritative as the actual statements.

mattbillenstein (OP) · 6y ago

Yeah, I guess I don't really look at paystubs often, and my banks support good CSV export going back 2+ years, so the data is always there when I need it to refresh those sheets. I keep the sheets year-to-date so they don't get huge.

Regarding taxes, I do a tax-year calculation using my W-2s and returns to compute an effective tax rate and figure out ways I can improve my tax situation. My budgeting basically only starts once the net money hits a bank account, though, which has worked pretty well.
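As a rough illustration of the effective-rate arithmetic mentioned above (not the commenter's actual spreadsheet): the sketch below assumes the rate is total tax paid divided by gross income. The function name and the figures are hypothetical.

```python
def effective_tax_rate(total_tax: float, gross_income: float) -> float:
    """Return total tax as a fraction of gross income.

    Assumption: both numbers are read by hand from a W-2 and the
    filed return; nothing here is parsed from a document.
    """
    if gross_income <= 0:
        raise ValueError("gross_income must be positive")
    return total_tax / gross_income


# Illustrative figures only.
rate = effective_tax_rate(total_tax=22_000, gross_income=100_000)
print(f"{rate:.1%}")
```

Comparing this number year over year is one simple way to see whether a change (for example, higher pre-tax contributions) actually moved the needle.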

dataflow · 6y ago

There are a few problems with exporting data... here's what comes to my mind at the moment:

- It's an additional step and unnecessary when you're already downloading PDF statements, which you probably should be doing regardless

- Data is often unavailable for export after a few months or a year or two (depending on the bank), far earlier than PDFs

- If you have PDF scans (especially from earlier days) then you need to parse the OCR output anyway

- If you're using official APIs (instead of writing a scraper or downloading by hand) then you may need to pay extra

mattbillenstein (OP) · 6y ago

To each their own, csv for me has been the easiest thing moving forward every month or three.

I did do a historical net worth calculation and the only data I had going back far enough was PDFs, so through a mix of bash scripts and pdftotext, I was able to get a number for each month back about 10 years. But I ended up just putting that monthly balance for each account in a google sheet so I could sum and plot it there. Now I just stick the month-end balance for each account in a sheet to keep this updated.
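The original scripts were bash and are not shown, so the following is only a minimal Python sketch of the same idea: run `pdftotext` on each statement, pull out one balance figure, and sum per month. The "Ending Balance" label, the regex, and the function names are assumptions. Real statements word and lay this out differently per bank, so the pattern would need adjusting for each one.

```python
import re
import subprocess
from decimal import Decimal
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical label; adjust per bank and statement layout.
BALANCE_RE = re.compile(r"Ending Balance\s+\$?(-?[\d,]+\.\d{2})", re.IGNORECASE)


def pdf_to_text(path: str) -> str:
    """Run pdftotext on a statement; '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def extract_ending_balance(text: str) -> Optional[Decimal]:
    """Return the first balance matching the assumed label, or None."""
    match = BALANCE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))


def net_worth_by_month(
    balances: Iterable[Tuple[str, str, Decimal]]
) -> Dict[str, Decimal]:
    """Sum (month, account, balance) rows into one total per month."""
    totals: Dict[str, Decimal] = {}
    for month, _account, amount in balances:
        totals[month] = totals.get(month, Decimal("0")) + amount
    return totals
```

A call such as `extract_ending_balance(pdf_to_text("statement.pdf"))` (the file name is a placeholder) would yield one number per statement, which can then go into a sheet or into `net_worth_by_month` for plotting.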
