For example, I can't get a CSV of my pay stub and all the places the money flows (taxes, insurance, etc.), so I use the PDF as the best way to get all the data.
Parsing PDF is a huge PITA, since PDF is really designed only for layout and not to encode semantic document structure. But if you want the greatest amount of visibility, nothing is as authoritative as the actual statements.
Regarding taxes, I do a tax-year calculation using my W-2s and returns to compute an effective tax rate and figure out ways to improve my tax situation, but my budgeting basically only starts once the net money hits a bank account, which has worked pretty well.
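To make the effective-rate arithmetic concrete, here's a rough sketch in Python. It assumes "effective tax rate" means total tax paid divided by gross income; which line items count as "tax" and "income" is a judgment call, and the function name and numbers are made up for illustration.

```python
from decimal import Decimal

def effective_tax_rate(total_tax, gross_income):
    """Return total tax as a fraction of gross income.

    Assumption: both figures are copied by hand from W-2s and returns.
    """
    if gross_income <= 0:
        raise ValueError("gross income must be positive")
    return total_tax / gross_income

# Made-up figures, not real data.
rate = effective_tax_rate(Decimal("18000"), Decimal("100000"))
print(f"{rate:.1%}")
```

Using Decimal instead of float keeps the cents exact, which matters more once you start summing a lot of small line items.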
- It's an additional step and unnecessary when you're already downloading PDF statements, which you probably should be doing regardless
- Data is often unavailable for export after a few months or a year or two (depending on the bank), far sooner than the PDF statements disappear
- If you have PDF scans (especially from earlier days) then you need to parse the OCR output anyway
- If you're using official APIs (instead of writing a scraper or downloading by hand) then you may need to pay extra
I did do a historical net worth calculation, and the only data I had going back far enough was PDFs, so through a mix of bash scripts and pdftotext I was able to get a number for each month going back about 10 years. But I ended up just putting that monthly balance for each account in a Google Sheet so I could sum and plot it there. Now I just stick the month-end balance for each account in a sheet to keep this updated.
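For anyone wanting to try the same thing, here's a minimal sketch of that kind of extraction in Python rather than bash. It is not my actual script. It assumes pdftotext (from poppler-utils) is on your PATH and that the statement text contains a line like "Ending Balance $1,234.56"; that label and all the function names are hypothetical, and every bank words and lays this out differently.

```python
import re
import subprocess
from decimal import Decimal

# Hypothetical label; adjust the pattern per bank and statement layout.
BALANCE_RE = re.compile(
    r"Ending Balance[:\s]*\$?\s*(-?[\d,]+\.\d{2})", re.IGNORECASE
)

def pdf_to_text(path):
    """Run pdftotext and return the extracted text ('-' writes to stdout)."""
    result = subprocess.run(
        ["pdftotext", "-layout", path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def ending_balance(text):
    """Return the first ending balance found in the text, or None."""
    match = BALANCE_RE.search(text)
    if match is None:
        return None
    return Decimal(match.group(1).replace(",", ""))

def net_worth(balances_by_account):
    """Sum the month-end balances of all accounts for one month."""
    return sum(balances_by_account.values(), Decimal("0"))
```

You'd call something like ending_balance(pdf_to_text("statement.pdf")) per statement, then feed one month's balances into net_worth. Returning None on a miss (instead of guessing) makes it obvious which statements need a different pattern or a manual look.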