Until this paper (https://arxiv.org/abs/2305.14705) indicated that they apparently benefit far more from instruction tuning than dense models do, it was mostly a "good on paper" kind of thing.
In the paper, you can see the underperformance I'm talking about.
Flan-Moe-32b (259b total) scores 25.5% on MMLU before instruction tuning and 65.4% after.
Flan 62b scores 55% before instruction tuning and 59% after.
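
To put the gap in numbers, here is a quick sketch of the arithmetic. It only uses the four MMLU figures quoted above (I haven't re-checked them against the paper's tables), and the dictionary labels, including calling Flan 62b the dense model, are just my own shorthand for this comparison.

```python
# MMLU scores in percent, as quoted above: (before tuning, after tuning).
# These are the figures from this comment, not re-verified against the paper.
scores = {
    "Flan-Moe-32b (MoE)": (25.5, 65.4),
    "Flan 62b (dense)": (55.0, 59.0),
}


def gain(before, after):
    """Absolute improvement in percentage points, rounded to one decimal."""
    return round(after - before, 1)


gains = {name: gain(before, after) for name, (before, after) in scores.items()}

for name, points in gains.items():
    print(f"{name}: +{points} points from instruction tuning")
```

So the MoE model gains roughly 40 points while the dense one gains about 4, which is the difference I mean.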