https://x.com/AndrewYNg/status/1895183929977843970
At Pulse, we put the models to the test with complex financial statements and nested tables – the results were underwhelming to say the least, and suffer from many of the same issues we see when simply dumping documents into GPT or Claude.
>Here's an idea that could use some polish, but I think as an esteemed AI researcher that it could improve your models. -- Andrew Ng
>OH MY GOSH! IT ISN'T PRODUCTION READY OUT OF THE BOX, LOOK AT HOW DUMB THIS STUFFED SHIRT HAPPENS TO BE!!! -- You
Nobody appreciates a grandstander. You're really treading on thin ice by attacking someone who has given so much to the AI community and asked for so little in return. Andrew Ng clearly does this because he enjoys it. You are here to self-promote and it looks bad on you.
It's a product released by a company Ng cofounded. So expecting production-readiness isn't asking for too much in my opinion.
Academics do research. You should not expect an academic paper to be turned into a business or production overnight.
The Mark I Perceptron, one of the first neural network machines, was built in the late 1950s for image recognition. It took roughly 65 years of mostly non-commercial research to bring us to the very useful multimodal LLMs of today.
Not saying this is the case with the OP company, but if you're ready to make sweeping generalizations about cigars like that on the basis of a commercial blog selling a product, you might as well invoke some healthy skepticism, and consider how the generalization works on both sides of the spectrum.
The whole corporation-glorifying, academia-bashing gaslighting narrative is getting very tiring lately.
There was so much hype about AlphaGo years ago, which seemed to be very good at reasoning about what's good and what's not, that I thought some form of "AI" was really going to come relatively soon. The reality we have these days is that statistical models seem to be running without any constraints, making rules up as they go.
I'm really thankful for the AI-assisted coding, code reviews and many other things that came from that, but the fact is, these really are just assistants that will make very bad mistakes and you need to watch them carefully.
At least an AI will respond politely when you point out its mistakes.
So AI hallucinations are nothing like human confusion or honest mistakes.
Fuck polite. It's a machine. Machines can't be polite because they don't have the capacity for empathy. What you are calling polite is a vacuous and flowery waste of expensive tokens in a patronizing tone.
My favorite is when it politely gets it wrong again. And again.
"We ran our OCR offering against competition. We find ours to be better. Sign up today."
It feels like an ad masquerading as a news story.
The news is they appear to be better than this other model. Their methodology might not be trustworthy but deliberately tanking the Ng model wouldn't be smart either.
The real solution would be to have machine readable data embedded in those PDFs, and have the table be built around that data.
We could then have actual machine-readable financial statements or reports, much like our passports.
The world has become much more digitized (for example, for any sale, I get a PDF and an XML version of my receipt, which is great), but not everything originates on a computer, and much of it is made for humans.
We have handwritten notes, printed documents, etc., and OCR has to solve this. On the other hand, desktop OCR applications like Prizmo and the latest versions of macOS already have much better output quality than these models. There are also specialized free applications to extract tables from PDF files (PDF files are a bunch of fonts and pixels; they carry no information about layout, tables, etc.).
We have these tools, and they work well. There's even the venerable Tesseract, built to OCR scanned paper, which has had a neural network layer for years. Yet we still try to throw LLMs at everything, cheer like 5-year-olds when they do 20% of what these systems do, and act as if this technology hasn't existed for two decades.
Agree on the hand-written part.
we're messing around with some agentic zooming around documents internally, will make our findings public!
However, this comes at a high cost in tokens and latency, but results in much better parse quality. Hopefully this can be improved with newer models.
Not surprised it’s underwhelming
other OCR providers are doing a great job - we personally believe we have the highest accuracy tool on the market. we're not here to dunk on anyone, just to provide unbiased feedback when putting new document extraction tools through a challenge.
> - Completely fabricated numbers in several instances
Why are these different bullet points? Which one is the correct number of wrong values?
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
""" In 2017, a team led by Andrew Ng published a paper showing off a Deep Learning model to detect pneumonia.
[...]
But there was a big problem with their results:
[...]
A random split would have sent images from the same patient to the train and validation sets.
This creates a leaky validation strategy.
"""
He's not infallible.
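For anyone wondering what the non-leaky version looks like: the usual fix for the problem in that quote is to split by patient rather than by image, so that no patient ends up on both sides. A minimal sketch, assuming each record is a hypothetical `(patient_id, image_id)` pair; the function name and data layout are illustrative and not taken from the paper or the quoted post:

```python
import random
from collections import defaultdict


def split_by_patient(records, val_fraction=0.2, seed=0):
    """Split (patient_id, image_id) records so no patient spans both sets.

    Hypothetical helper for illustration only.
    """
    # Group images under the patient they belong to.
    by_patient = defaultdict(list)
    for patient_id, image_id in records:
        by_patient[patient_id].append(image_id)

    # Shuffle patients (not images) and hold out a fraction of them.
    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)
    n_val = max(1, round(len(patients) * val_fraction))
    val_patients = set(patients[:n_val])

    train = [(p, i) for p, i in records if p not in val_patients]
    val = [(p, i) for p, i in records if p in val_patients]
    return train, val


# Toy data: 10 made-up patients with 3 images each.
records = [(f"patient{p}", f"img{p}_{k}") for p in range(10) for k in range(3)]
train, val = split_by_patient(records)
```

A random split over `records` directly would scatter each patient's three images across both sets, which is the leak being described. Splitting on the patient ID keeps all of a patient's images together.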