Something in my personal backlog is fine-tuning TrOCR for purse seine observer workbooks. The default TrOCR expects English words, so the FAO species codes used in the workbooks result in terrible accuracy. LLMs also do poorly in this space because you'll commonly see repeats (e.g. 100 out of 120 samples all have the same species code), which then leads to hallucination.
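
To make the repeat problem concrete, here's a rough sketch of how I'd check for it. It is only an illustration: the column below is synthetic (the 100-of-120 split from the example above, filled with the FAO codes SKJ, YFT and BET), and the helper names `accuracy` and `minority_accuracy` are made up for this sketch, not part of TrOCR or any library. The idea is that a reader which just repeats the dominant code can still look decent on overall accuracy, so the rows that don't carry the most common code need to be scored separately.

```python
from collections import Counter


def accuracy(truth, predicted):
    """Fraction of rows where the predicted code equals the true code."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must be the same length")
    if not truth:
        return 0.0
    return sum(t == p for t, p in zip(truth, predicted)) / len(truth)


def minority_accuracy(truth, predicted):
    """Accuracy on rows whose true code is NOT the most common one."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must be the same length")
    if not truth:
        return 0.0
    majority, _ = Counter(truth).most_common(1)[0]
    pairs = [(t, p) for t, p in zip(truth, predicted) if t != majority]
    if not pairs:
        return 0.0
    return sum(t == p for t, p in pairs) / len(pairs)


# Synthetic column (assumption): 100 of 120 samples share one species code.
truth = ["SKJ"] * 100 + ["YFT"] * 12 + ["BET"] * 8

# A hypothetical reader that simply repeats the dominant code on every row.
repeated = ["SKJ"] * len(truth)

overall = accuracy(truth, repeated)
minority = minority_accuracy(truth, repeated)
print(f"overall={overall:.3f} minority={minority:.3f}")
# prints "overall=0.833 minority=0.000"
```

So the headline number looks tolerable while every one of the 20 non-SKJ rows is wrong, which is why I'd report the minority score alongside overall accuracy when comparing a fine-tuned TrOCR against an LLM.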