Is there interest in benchmarking the proprietary LLMs for translation? Curious as I often use Gemini 3 Flash, but I have no idea how good it is for my language family. I prefer open models (in fact the smaller the better for offline), but it'd be useful to know how well the Big Three do.
We did some benchmarking of them internally, but not sure if we'll publish the detailed results.
Just in case, keep an eye on https://huggingface.co/spaces/facebook/bouquet: if we release the evaluation results, they will be there.