> The medium is superfluous.
No it's not. Generating an SVG is asking the model to write text that can be rendered as an image.
Generating an image directly skips that intermediate step and outputs the image itself, so the accuracy is in a completely different league. The models people ask for SVGs typically cannot do this; only specially post-trained variants can.
(An LLM will do the SVG of a pelican on a bike much more accurately, btw.)