I dunno. I totally missed this and will check it out.
- Hugging Face is less likely to "cheat" by training on tests than other orgs, I think.
- Some finetunes are really good at a particular test (like XWin). This isn't necessarily a bad thing if they're good at a specific niche.