0 points · sushid · 1y ago · 0 comments

Isn't that just traditional OCR layered on top of an LLM?

2 comments · 2 top-level

It's possible they have a software layer that does that. But I was assuming they don't, because the open-source multimodal models don't.

No, it’s not. It’s a multimodal transformer model.
