shekhar101 on Hacker News

I have a set of PDFs that are bank statements. The formats differ by bank, but there is a limited set of them (<15). What's the current best approach to extract tabular data from PDFs? I tried writing custom logic based on pdfplumber and similar libraries, but it is very fragile and full of ad-hoc logic, so the maintenance burden is pretty high. Are there small models, preferably ones that can run on CPUs alone, that I could fine-tune for this task? Any guides or pointers? I see a lot of available models, but as someone with no ML background, I find them difficult to navigate.

4 points by shekhar101 2y ago | 2 comments

9. Tesla Fleet Telemetry (github.com)

214 points by shekhar101 2y ago | 130 comments

10. Why Tesla removed radar and ultrasonic sensors [video] (youtube.com)

241 points by shekhar101 3y ago | 471 comments

11. Microscopic View of an Intel I486 (youtube.com, video)

2 points by shekhar101 3y ago | 0 comments

12. Ask HN: How do I properly secure a windows server machine using FOSS tools?

I manage a Windows server machine for a small business. I do it part time for my brother's business. I've enabled the usual protections suggested by Microsoft, such as regular scans, backups, and write protection on certain folders with a whitelist. Are there FOSS tools that I can use to manage, monitor, and secure Windows machines? His office also has 7-8 Windows laptops connecting to this "server", which is just a powerful AMD desktop-class machine with good storage and memory. How can I manage and secure this small fleet as well?

2 points by shekhar101 3y ago | 1 comment

13. Novi Is Shutting Down (novi.com)

3 points by shekhar101 3y ago | 1 comment

14. Shimano Orders Hammerhead to Remove Di2 Features (youtube.com, video)

6 points by shekhar101 4y ago | 0 comments

15. Samsung Freestyle [Video] (youtube.com)

2 points by shekhar101 4y ago | 0 comments

shekhar101

Recent submissions

Magnifier on Mac [video]

Terence Tao on how we measure the cosmos [video]

The Most Useful Thing AI Has Ever Done [video]

Steroids Are Awesome [video]

Jeremy Howard: CUDA for Python Programmers [video]

Coqui Is Shutting Down

Anthropic Claude for Google Sheets

Ask HN: What's the current best way to extract tables from PDFs?

Tesla Fleet Telemetry

Why Tesla removed radar and ultrasonic sensors [video]

Microscopic View of an Intel I486

Ask HN: How do I properly secure a windows server machine using FOSS tools?

Novi Is Shutting Down

Shimano Orders Hammerhead to Remove Di2 Features

Samsung Freestyle [Video]
