There are a couple of reasons: 1) Covering multiple languages with good BLEU scores is too much to ask of a model that size (even the large variant). 2) Encoder-decoder models don't tend to be trained on translation as heavily as, e.g., GPT-style models whose training data includes large amounts of translated text across many languages (with exceptions such as T5's translation task).
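For reference, here's a minimal sketch of invoking T5's built-in translation task via the Hugging Face transformers API; the t5-small checkpoint and the English-to-German direction are just illustrative choices:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# t5-small is an example checkpoint; the larger variants follow the same API.
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# T5 was pretrained with task prefixes, including a handful of
# supervised translation directions such as English -> German.
inputs = tokenizer(
    "translate English to German: The house is wonderful.",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note that T5 only saw a few language pairs this way during pretraining, which is why it's the exception rather than the rule here.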