Source? I haven't seen anything like that for ARC-AGI performance.
Also, if it makes that big of a difference, then build a renderer for your agent that looks like the web page, have it solve the tasks in the graphical interface, and funnel the results to the API. I guarantee you won't get better performance, because the AGI is going to have to "understand" that the raw data can be represented as a 2D matrix regardless of whether it gets a 2D matrix of pixels or a 2D matrix of enumerated values in JSON. If anything, the graphical version makes it a more difficult problem for an AI system that "speaks" in tokens.
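
To make that concrete, here's a minimal sketch of what I mean. The grid, the `render`/`recover` helpers, and the scale factor are all made up for illustration; this is not the actual ARC-AGI web renderer or API. It only assumes a task grid is a JSON list of rows of small integers. Rendering blows each cell up into a block of "pixels", and recovering the grid just samples one pixel per block, so both forms carry the same 2D matrix.

```python
import json

# Hypothetical ARC-style grid: a JSON list of rows, each cell a color index.
raw = "[[0, 1, 0], [1, 2, 1], [0, 1, 0]]"
grid = json.loads(raw)

def render(grid, scale=4):
    """Upscale each cell into a scale x scale block of 'pixels' (color indices)."""
    pixels = []
    for row in grid:
        # Repeat every cell horizontally, then repeat the whole row vertically.
        pixel_row = [cell for cell in row for _ in range(scale)]
        pixels.extend(list(pixel_row) for _ in range(scale))
    return pixels

def recover(pixels, scale=4):
    """Sample the top-left pixel of each block to get the original grid back."""
    return [row[::scale] for row in pixels[::scale]]

pixels = render(grid)
assert recover(pixels) == grid      # same information either way
print(len(pixels), len(pixels[0]))  # 12 12
```

The pixel version is the same matrix with every cell repeated `scale * scale` times, so a token-based model gets far more input to chew through and no extra information.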