Putting Andrew Ng's OCR models to the test

(runpulse.com)

124 points · ritvikpandey21 · 1y ago · 61 comments

53 comments · 21 top-level

ritvikpandey21OP1y ago· 8 in thread

Today, Andrew Ng, one of the legends of the AI world, released a new document extraction service that went viral on X:

https://x.com/AndrewYNg/status/1895183929977843970

At Pulse, we put the models to the test with complex financial statements and nested tables – the results were underwhelming to say the least, and suffer from many of the same issues we see when simply dumping documents into GPT or Claude.

panny1y ago

It seems like you missed the point. Andrew Ng is not there to give you production grade models. He exists to deliver a proof of concept that needs refinements.

>Here's an idea that could use some polish, but I think as an esteemed AI researcher that it could improve your models. -- Andrew Ng

>OH MY GOSH! IT ISN'T PRODUCTION READY OUT OF THE BOX, LOOK AT HOW DUMB THIS STUFFED SHIRT HAPPENS TO BE!!! -- You

Nobody appreciates a grandstander. You're really treading on thin ice by attacking someone who has given so much to the AI community and asked for so little in return. Andrew Ng clearly does this because he enjoys it. You are here to self-promote and it looks bad on you.

yorwba1y ago

This is not about some paper Ng published with a new idea that needs some polishing before being useful in the real world.

It's a product released by a company Ng cofounded. So expecting production-readiness isn't asking for too much in my opinion.

ritvikpandey21OP1y ago

we respect andrew a lot, as we mentioned in our blog! he's an absolute legend in the field, founded google brain, coursera, worked heavily on baidu ai. this is more to inform everyone not to blindly trust new document extraction tools without really giving them challenges!

moralestapia1y ago

That's the standard tier of competence you expect from Ng. Academia is always close but no cigar.

teruakohatu1y ago

> That's the standard tier of competence you expect from Ng. Academia is always close but no cigar.

Academics do research. You should not expect an academic paper to be turned into a business or production overnight.

The Mark I Perceptron, one of the first neural network machines, was built in the late 1950s for image recognition. It took nearly 70 years of mostly non-commercial research to bring us to the very useful multimodal LLMs of today.

ritvikpandey21OP1y ago

don't be mistaken, andrew's a legend! he's done some incredible work -- google brain, coursera, baidu ai, etc.

igleria1y ago

He might not have business chops, but this seems a bit harsh :/

tpoacher1y ago

And on the other side, there's companies like Theranos, where you think the world will never be the same again, until you actually try the thing they're selling. Full cigar promised, but not even close.

Not saying this is the case with the OP company, but if you're ready to make sweeping generalizations about cigars like that on the basis of a commercial blog selling a product, you might as well invoke some healthy skepticism, and consider how the generalization works on both sides of the spectrum.

The whole corporation-glorifying, academia-bashing gaslighting narrative is getting very tiring lately.

lukaslalinsky1y ago· 5 in thread

That's the problem with the current deep learning models, they don't seem to know when they are wrong.

There was so much hype about AlphaGo years ago, which seemed to be very good at reasoning about what's good and what's not, that I thought some form of "AI" was really going to come relatively soon. The reality we have these days is that statistical models seem to be running without any constraints, making rules up as they go.

I'm really thankful for the AI-assisted coding, code reviews and many other things that came from that, but the fact is, these really are just assistants that will make very bad mistakes and you need to watch them carefully.

spiderfarmer1y ago

Most people don’t realize when they’re wrong either. It’s fascinating that, just like with humans, reasoning appears to reduce hallucinations.

At least an AI will respond politely when you point out its mistakes.

chad1n1y ago

I don't think that's the case. When a model is reasoning, it sometimes starts gaslighting itself and "solving" completely different problems than the one you've shown. Reasoning can help "in general", but very frequently, reasoning also makes it more "nondeterministic". Without reasoning, it usually ends up just writing some code from its training data, but with reasoning, it can end up hallucinating hard. Yesterday, I asked Claude thinking to solve a problem for me in C++ and it showed the result in Python.

greatgib1y ago

If I just came out of a cave after a year and someone asks me who was on TV last night, I will not invent a name and insist convincingly that it is the truth. Or invent that you are a famous author of books about cooking chicken because it sounds good.

So AI hallucinations have nothing to do with human confusion or honest mistakes.

dingnuts1y ago

they gaslight you in "polite" Corporate Voice, you mean. It's one of the things I hate most about conversational agents. I always tell them to stop using the first person and respond in short declarative sentences and to stop pretending to have emotions and it makes it a lot more tolerable.

Fuck polite. It's a machine. Machines can't be polite because they don't have the capacity for empathy. What you are calling polite is a vacuous and flowery waste of expensive tokens in a patronizing tone.

My favorite is when it politely gets it wrong again. And again.

vrighter1y ago

Ah but I (usually) know when I will probably be wrong if I do give an answer, when I know I'm not familiar enough with the subject. Or if I do I will explicitly say this is an educated guess, at best. What I will not do is just spout bullshit with the confidence of an orange-musk-puppet

cheema331y ago· 3 in thread

Am I the only one seeing a conflict of interest issue with this blog post?

"We ran our OCR offering against competition. We find ours to be better. Sign up today."

It feels like an ad masquerading as a news story.

edude031y ago

+1, and worse, other than pointing out where it was wrong, there weren't any clear test criteria, process, side-by-side comparison, details about either model, etc.

mikevin1y ago

Of course there's a conflict of interest. It's their right to promote their solution but I would never take their word for it.

oliwarner1y ago

A conflict? It's their blog. They can post what they like, including adverts to it.

The news is they appear to be better than this other model. Their methodology might not be trustworthy but deliberately tanking the Ng model wouldn't be smart either.

helloguillecl1y ago· 3 in thread

OCR, VLM or LLM for such important use cases seems like a problem we should not have in 2025.

The real solution would be to have machine readable data embedded in those PDFs, and have the table be built around that data.

We could then have actual machine-readable financial statements or reports, much like our passports.

bayindirh1y ago

The problem is, you're coming from paper for these PDFs, and this is the step where you add that data.

While the world has become much more digitized (for example, for any sale, I get a PDF and an XML version of my receipt, which is great), not everything comes from computers; much of it is made for humans.

We have handwritten notes, printed documents, etc., and OCR has to solve this. On the other hand, desktop OCR applications like Prizmo and the latest versions of macOS already have much better output quality when compared to these models. Also, there are specialized free applications to extract tables from PDF files (PDF files are a bunch of fonts and pixels; they have no information about layout, tables, etc.).

We have these tools, and they work well. There's even the venerable Tesseract, built to OCR scanned papers, which has had a neural network layer for years. Yet we still try to throw LLMs at everything, cheer like 5-year-olds when they do 20% of what these systems do, and act as if this technology hasn't existed for two decades.

helloguillecl1y ago

The funny thing is that sometimes we need to machine-read documents produced by humans on machines, but the actual source is almost always machine-readable data.

Agree on the hand-written part.

advisedwang1y ago

A lot of times you are OCRing documents from people who do not care about how easy it is for the reader to extract data. A common example is regulatory filings - the goal is to comply with the law, not help people read your data. Or perhaps it's from a source that sells the data or has copyright and doesn't want to make it easy for other people to use in ways besides their intention. etc.

serjester1y ago· 2 in thread

Personally I find it frustrating they called it "agentic" parsing when there's nothing agentic about it. Not surprised the quality is lackluster.

ritvikpandey21OP1y ago

we're not the biggest believers in 'agentic' parsing! we definitely do believe there's a specific role for LLMs in the data ingestion pipeline, but it matters more when converting bar graphs/charts/figures -> structured markdown.

we're messing around with some agentic zooming around documents internally, will make our findings public!

pierre1y ago

If you want to try agentic parsing we added support for sonnet-3.7 agentic parse and gemini 2.0 in llamaParse. cloud.llamaindex.ai/parse (select advanced options / parse with agent then a model)

However, this comes at a high cost in tokens and latency, but results in way better parse quality. Hopefully this can be improved with new models.

sinuhe691y ago· 2 in thread

I still don’t understand why companies don’t release a machine-readable version of their financial statements. They are read by machines anyway! Exporting that data from their software is a simple task.

nabla91y ago

In the EU, the European Securities and Markets Authority (ESMA) has mandated a machine-readable standard since 2020. In the US, the Financial Data Transparency Act of 2022 (FDTA) made a similar push, and the SEC is working towards it.

mistrial91y ago

maybe related to why large banks can reply to requests for written records with low-quality photocopies

j7ake1y ago· 2 in thread

Honestly he’s famous for pedagogy and research papers, not real world products.

Not surprised it’s underwhelming

deepsun1y ago

What about Coursera? It's a real world product.

porridgeraisin1y ago

> Pedagogy

Ishirv1y ago· 2 in thread

good read, saw your recent raise in BI - congrats!

sidmanchkanti211y ago

appreciate it!

ritvikpandey21OP1y ago

thanks man!

noitanec1y ago· 1 in thread

I took the screenshot of the bill in their article and ran it through the tool at https://va.landing.ai/demo/doc-extraction. The tool doesn't hallucinate any of the values as reported in the article. In fact, the value for Profit/loss for continuing operations is 1654 in their extraction, which is the ground truth, yet they've still drawn a red bbox around it.

ritvikpandey21OP1y ago

good catch on the 1654, will edit that on our blog! try it multiple times, we've noticed that especially for tabular data it's fairly nondeterministic. we ran it more than 10 times on many financial CIMs and observed this phenomenon.
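One way to check the run-to-run variation described above is to extract the same page several times and measure how often each field agrees with its most common value. The following is only a minimal sketch: the `field_stability` helper, the field names, and the revenue figures are hypothetical, and it assumes each run has already been reduced to a flat dict of field to value. Only the 1654 value comes from the thread.

```python
from collections import Counter


def field_stability(runs):
    """Per field: the most common value and the share of runs that agree with it."""
    fields = set().union(*(r.keys() for r in runs))
    report = {}
    for field in sorted(fields):
        # A field missing from a run is recorded as None, so it counts as disagreement.
        values = [r.get(field) for r in runs]
        top_value, top_count = Counter(values).most_common(1)[0]
        report[field] = (top_value, top_count / len(runs))
    return report


# Hypothetical outputs of three extraction runs on the same page.
runs = [
    {"revenue": "12,400", "profit_continuing_ops": "1654"},
    {"revenue": "12,400", "profit_continuing_ops": "1654"},
    {"revenue": "12,900", "profit_continuing_ops": "1654"},
]

# Fields whose value changed between runs deserve a manual look.
unstable = {f: s for f, s in field_stability(runs).items() if s[1] < 1.0}
```

Agreement across runs does not prove a value is correct, since a model can be consistently wrong, but disagreement is a cheap signal that a field should not be trusted without review.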

xrd1y ago· 1 in thread

Will we start to see a type of "SLA" from AI model providers? If I rent a server, I can pay for more 9s, but can I pay for a guarantee of accuracy from the models?

IanCal1y ago

You could contact an insurance firm about this. Lots of SLAs are really simple forms of this, where you aren't buying reliability; you're getting payouts if it falls below some level.

krashidov1y ago· 1 in thread

How does pulse compare to reducto and gemini? Claude is actually pretty good at PDFs (much better than GPT)

ritvikpandey21OP1y ago

claude is definitely better than gpt -- but both have their flaws! they pretty much fall flat on their face with nested entries, low-fidelity images, etc. (we detailed this heavily in our blog post here [1])

other ocr providers are doing a great job - we personally believe we have the highest accuracy tool on the market. we're not here to dunk on anyone, just to provide unbiased feedback when putting new document extraction tools through a challenge.

[1]: https://www.runpulse.com/blog/why-llms-suck-at-ocr

bzmrgonz1y ago· 1 in thread

Why isn't there a pixel comparison step after the extraction? I think that would have identified some errors. Essentially, read, extract, recreate, pixel compare.

advisedwang1y ago

Recreating is not going to be close enough for a pixel comparison. First of all, that requires also detecting and recreating font, font variation, style and exact positioning. But more importantly, the entire reason why OCR is challenging is that there's a lot of variation that happens between rasterization in the source system and scanning at the end. Especially if you are doing OCR on actual printed documents, which is really the only unsolved challenge from conventional OCR.

what1y ago· 1 in thread

> - Over 50% hallucinated values in complex financial tables

> - Completely fabricated numbers in several instances

Why are these different bullet points? Which one is the correct number of wrong values?

ritvikpandey21OP1y ago

to not make the read extra long, we only included one example. we tried over 50 docs and found a couple with pie charts/bar graphs that weren't parsed at all. there were also a few instances with entire column entries incorrect due to mismatching.

codelion1y ago

I think there's a valid point about the production-readiness aspect. It's one thing to release a research paper, and another to market something as a service. The expectation levels are just different, and it's fair to scrutinize accordingly.

rahimnathwani1y ago

Has anyone compared this with the stuff Allen AI recently released?

https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...

jgalt2121y ago

I can't believe there's market demand for non-deterministic OCR, but what I really suspect is that almost no one scans the same document twice, so most people probably don't even realize this is a possibility.

infecto1y ago

I think a lot of OCR workflows are going the way of multimodal models, but I still find the cloud OCR tools to be vastly superior to most of these other startups in the space, like the ad piece here from pulse.

veerdoshi1y ago

Interested to see how OCR evals play a role in deciding the best model. Great read

sreekanth8501y ago

What do agents have to do with document parsing? Is it just extracting the text and using an LLM to analyze the extracted data?

_giorgio_1y ago

https://x.com/svpino/status/1592140348905517056

""" In 2017, a team led by Andrew Ng published a paper showing off a Deep Learning model to detect pneumonia.

[...]

But there was a big problem with their results:

[...]

A random split would have sent images from the same patient to the train and validation sets.

This creates a leaky validation strategy.

"""

He's not infallible.
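The leak described in the quoted tweet comes from splitting at the image level. A common remedy is to split at the patient level instead, so that every image from a given patient lands on one side only. The following is a minimal sketch of that idea, not the method used in the paper or its later revision; the `split_by_patient` helper, the record layout, and the file names are all hypothetical.

```python
import random


def split_by_patient(records, val_fraction=0.2, seed=0):
    """Split (patient_id, image) records so no patient appears on both sides."""
    patients = sorted({pid for pid, _ in records})
    rng = random.Random(seed)
    rng.shuffle(patients)
    # Hold out whole patients, not individual images.
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])
    train = [r for r in records if r[0] not in val_patients]
    val = [r for r in records if r[0] in val_patients]
    return train, val


# Hypothetical dataset: 5 patients with 4 images each.
records = [(f"patient_{i % 5}", f"xray_{i}.png") for i in range(20)]
train, val = split_by_patient(records)
```

With a plain random split over the 20 records, images from the same patient would almost certainly end up in both sets, which is exactly the leak the tweet points at.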

kneegerman1y ago

>grifter grifts diggity
