It's hard to "imagine" what an SVG looks like once it's rendered, for both humans and LLMs, so iterating on the text alone isn't as useful a test. But if you show the model the rendered image, it might notice the bad-looking bicycle and be able to fix the SVG text that way.
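A rough sketch of that render-and-revise loop, in case it helps make the idea concrete. Everything here is hypothetical: `refine_svg`, `render`, and `revise` are names made up for illustration, and the two callables stand in for whatever rasterizer and vision-capable model you'd actually plug in.

```python
from typing import Callable, Optional


def refine_svg(
    svg: str,
    render: Callable[[str], bytes],
    revise: Callable[[str, bytes], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Render the SVG, show the result to the model, and take its revision.

    `render` (hypothetical) turns SVG text into image bytes.
    `revise` (hypothetical) gets the SVG text plus the rendered image and
    returns corrected SVG text, or None if it sees nothing to fix.
    """
    for _ in range(max_rounds):
        image = render(svg)           # what the SVG actually looks like
        revised = revise(svg, image)  # model sees the picture, not just the text
        if revised is None or revised == svg:
            break                     # nothing left to change
        svg = revised
    return svg
```

The `max_rounds` cap is just a guard so the loop stops even if the model keeps changing things without converging.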