
coredog64 · 8mo ago · 0 points

Something in my personal backlog is fine-tuning TrOCR for purse seine observer workbooks. The default TrOCR expects English words, so the FAO species codes used in the workbooks result in terrible accuracy. LLMs do poorly in this space because you'll commonly see repeats (e.g. 100 out of 120 samples all have the same species code), which then leads to hallucination.
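To make the closed-vocabulary point concrete, here is a rough sketch (not the fine-tuning itself, and not anything TrOCR provides): because FAO species codes come from a fixed list, raw OCR output can be snapped to the nearest valid code, or flagged when nothing is close. The technique swapped in here is plain fuzzy matching with Python's standard `difflib`. The function name `snap_to_code`, the three-code list, and the 0.6 cutoff are all assumptions for illustration; a real workbook pipeline would load the full code list.

```python
import difflib

# Illustrative subset only (assumption): three tuna codes that might
# appear in a purse seine workbook. A real pipeline would load the full list.
VALID_CODES = ["SKJ", "YFT", "BET"]


def snap_to_code(raw, valid_codes=VALID_CODES, cutoff=0.6):
    """Map a raw OCR string to the closest valid code, or None.

    Hypothetical helper: normalises case and whitespace, accepts exact
    matches directly, and otherwise falls back to difflib's similarity ratio.
    """
    text = raw.strip().upper()
    if text in valid_codes:
        return text
    # get_close_matches returns the best candidates at or above the cutoff.
    matches = difflib.get_close_matches(text, valid_codes, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    for raw in ["SKI", " yft ", "8ET", "XQZ"]:
        print(repr(raw), "->", snap_to_code(raw))
```

Returning `None` instead of guessing keeps unreadable cells visible for manual review, which matters when most rows repeat the same code and a silent guess would look plausible.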


1 comment · 1 top-level

CaptainOfCoit · 8mo ago

You might enjoy this :) https://news.ycombinator.com/item?id=45640594 (DeepSeek OCR)
