I kind of don't get why, if you want to display something in a web browser, you'd generate anything other than HTML.
The only exposure the back-end has to HTML is streaming the static files to the browser, which can be done in small chunks.
If your back-end is rendering HTML with every request, it has to do a lot more work. It has to load HTML templates into memory and insert strings into them.
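For a sense of what "load templates and insert strings into them" amounts to per request, here is a minimal sketch in Python using only the standard library. The template text and the `render_page` helper are made up for illustration; a real app would more likely use a proper template engine and load its templates from disk once at startup.

```python
import html
from string import Template

# Hypothetical templates, kept in memory for the life of the process.
ROW_TEMPLATE = Template("<li>$name: $price</li>")
PAGE_TEMPLATE = Template("<ul>\n$rows\n</ul>")


def render_page(items):
    """Render (name, price) pairs into an HTML fragment.

    Every value is escaped before insertion so user-supplied text
    cannot inject markup.
    """
    rows = "\n".join(
        ROW_TEMPLATE.substitute(
            name=html.escape(str(name)),
            price=html.escape(str(price)),
        )
        for name, price in items
    )
    return PAGE_TEMPLATE.substitute(rows=rows)
```

The per-request work is one escape and one substitution per value, plus a string join.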
Just raw structs of data? Or do you turn that back into HTML?
Now you've got two sets of templates to cope with...
Why would I care about how much effort it is for the server to generate? It's already generating HTML from templates, and it's more-or-less infinitely capable of doing so.
In practice, I doubt this is much slower than serializing JSON. Keeping a couple of kilobytes of HTML templates in memory is nothing. Conversely, running a whole virtual DOM on the frontend (typically more resource-constrained than the server) is a much bigger performance issue.
So how do we still get a fancy SPA website? Build it all down to a simple zip bundle; the ARM can serve those static files just fine. The SPA talks to the ARM via a few JSON APIs. Very nice, clean boundary.
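A rough sketch of that boundary, assuming a Python standard-library server stands in for whatever actually runs on the device. The `/api/status` path, the `read_status` function, and its placeholder values are all hypothetical; the point is only that everything except the JSON endpoints is plain static file serving.

```python
import json
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def read_status():
    # Hypothetical device state; placeholder values for illustration only.
    return {"uptime_s": 0, "ok": True}


class Handler(SimpleHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/status":
            # JSON API: the only place the back-end produces dynamic output.
            body = json.dumps(read_status()).encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            # Everything else is the unpacked SPA bundle, served as-is.
            super().do_GET()


def make_server(static_dir, port=0):
    """Build a server that serves static_dir plus the JSON endpoint.

    port=0 lets the OS pick a free port.
    """
    handler = partial(Handler, directory=static_dir)
    return ThreadingHTTPServer(("127.0.0.1", port), handler)
```

Calling `make_server("dist", 8080).serve_forever()` would serve a bundle unpacked into a `dist` directory (the directory name is also an assumption). The back-end never touches HTML beyond copying bytes.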
Why are you then offloading rendering HTML from JSON to a painfully slow scripting language on the client?