Better HN

0 points · losvedir · 1mo ago · 0 comments

The question was whether you were giving it the rendered image and using the model's vision capability, or feeding the textual SVG back in.

It's hard to "imagine" what the rendered SVG looks like, for both humans and LLMs, so just iterating on the text won't really be as useful a test. But if you show it what it rendered, it might notice the bad-looking bicycle and be able to fix the text that way.
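
To make the distinction concrete, here is a minimal sketch of the "show it what it rendered" loop. Everything in it is an assumption for illustration: `refine_svg`, `render`, and `critique_and_fix` are hypothetical names, not any real library's API. `render` stands in for whatever turns SVG text into image bytes, and `critique_and_fix` stands in for a call to a vision-capable model that sees both the source and the image and returns revised SVG, or `None` if it sees nothing to fix.

```python
from typing import Callable, Optional


def refine_svg(
    svg_text: str,
    render: Callable[[str], bytes],
    critique_and_fix: Callable[[str, bytes], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Iterate on SVG source using feedback from the rendered image.

    `render` and `critique_and_fix` are hypothetical callables supplied
    by the caller; this function only wires them into a loop.
    """
    for _ in range(max_rounds):
        # Rasterize the current source so the model can look at it.
        image = render(svg_text)
        # Ask the model for revised SVG text given the source and the image.
        revised = critique_and_fix(svg_text, image)
        if revised is None or revised == svg_text:
            break  # the model reported no further changes
        svg_text = revised
    return svg_text
```

The text-only variant would be the same loop with the `image` argument dropped, which is the comparison being asked about. Whether the visual variant actually helps depends on the model spotting the flaws in the image in the first place.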


1 comment · 1 top-level

irthomasthomas · 1mo ago

"I've even experimented with feeding the broken pelican svgs to an image model to look for flaws, and they still fail to spot the broken elements."
