None of these can do text well. There's a model that does do text and composition well, but the name escapes me. Its general quality is much lower overall, though, so it's a pretty heavy tradeoff.
I believe this is at least one solution, and one that the folks at Stability themselves were pushing hard as a next step forward in the development of LLMs.