Seems like the complexity of the client-side code comes from rendering the DOM tree based on the CSS and constraints of the drawing area, hardware, etc. So yeah, just manipulating the tree would seem to be pretty fast.
Of course, it's being compared to spitting out an arbitrary stream of characters to the client without the need to parse HTML on the server side, which is probably faster. Years of experience with the string rendering approach tell us it's definitely more error-prone. For example, it's impossible to tell whether the character stream is well-formed HTML without traversing all the code paths.
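To make that last point concrete, here's a rough sketch in Python. Everything in it is made up for illustration (the function names, the `highlight` flag, the list markup), and it uses XML well-formedness via the standard library's `xml.etree.ElementTree` as a stand-in for "well-formed HTML", which is a simplification since HTML isn't XML. The string version has a bug on only one branch, so you don't see it unless that branch runs; the tree version can't emit an unclosed tag on any branch.

```python
import xml.etree.ElementTree as ET


def render_strings(items, highlight=False):
    """String approach: markup is emitted piece by piece (hypothetical example)."""
    out = ["<ul>"]
    for item in items:
        if highlight:
            # Bug on this branch only: the <b> tag is never closed.
            out.append("<li><b>" + item + "</li>")
        else:
            out.append("<li>" + item + "</li>")
    out.append("</ul>")
    return "".join(out)


def render_tree(items, highlight=False):
    """Tree approach: build nodes, then serialize once at the end."""
    ul = ET.Element("ul")
    for item in items:
        li = ET.SubElement(ul, "li")
        # Either branch yields a properly nested element.
        target = ET.SubElement(li, "b") if highlight else li
        target.text = item  # text is escaped during serialization
    return ET.tostring(ul, encoding="unicode")


def is_well_formed(markup):
    """Return True if the markup parses as well-formed XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    for flag in (False, True):
        print(flag, is_well_formed(render_strings(["a", "b"], flag)),
              is_well_formed(render_tree(["a", "b"], flag)))
```

The string renderer looks fine until someone passes `highlight=True`, or an item containing a `<`. With the tree, nesting and escaping are handled by the serializer, so there's no code path that can produce a dangling tag.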