undefined | Better HN

0 pointsCaptainOfCoit8mo ago0 comments

> Lot's of OCR/ LLM's (Even Gemini Pro 2.5) still struggle converting complex tables to markdown or HTML:

But that's something else, that's no longer just OCR ("Optical Character Recognition"). If the goal suddenly changes from "Can take letters in images and make into digital text" to "Can replicate anything seen on a screen", the problem-space gets too big.

For those images you have, I'd use something like Magistral + Structured Outputs instead, first pass figure out what's the right structure to parse into, second pass to actually fetch and structure the data.

0 comments

4 comments · 3 top-level

kmacdough8mo ago· 1 in thread

> But that's something else, that's no longer just OCR ("Optical Character Recognition").

Lines often blur for technologies under such rapid evolution. Not sure it's helpful to nitpick the verbal semantics.

It is a fair question whether the OCR-inspired approach is the correct approach for more complex structured documents where wider context may be important. But saying it's "not OCR" doesn't seem meaningful from a technical perspective. It's an extension of the same goal to convert images of documents into the most accurate and useful digitized form with the least manual intervention.

CaptainOfCoitOP8mo ago

Personally I think it's a meaningful distinction between "Can extract text" VS "Can extract text and structure". It is true that some OCR systems can handle trying to replicate the structure, but still today I think that's the exception, not the norm.

Not to mention it's helpful to separate the two because there is such a big difference in the difficulty of the tasks.

kmacdough8mo ago

> But that's something else, that's no longer just OCR ("Optical Character Recognition").

Lines often blur for technologies under such rapid evolution. Not sure it's helpful to nitpick the verbal semantics.

It is a fair question whether the OCR-inspired approach is the correct approach for more complex structured documents. But saying it's "not OCR" do doesn't seem meaningful from a technical perspective.

eeixlk8mo ago

htceaad t nofdnsy lyruuieo sieerrr t owcope?

j / k navigate · click thread line to collapse

0 comments

4 comments · 3 top-level

kmacdough8mo ago· 1 in thread

> But that's something else, that's no longer just OCR ("Optical Character Recognition").

Lines often blur for technologies under such rapid evolution. Not sure it's helpful to nitpick the verbal semantics.

CaptainOfCoitOP8mo ago

Not to mention it's helpful to separate the two because there is such a big difference in the difficulty of the tasks.

kmacdough8mo ago

> But that's something else, that's no longer just OCR ("Optical Character Recognition").

Lines often blur for technologies under such rapid evolution. Not sure it's helpful to nitpick the verbal semantics.

eeixlk8mo ago

htceaad t nofdnsy lyruuieo sieerrr t owcope?

j / k navigate · click thread line to collapse