The syntax shouldn't matter (you may not even be using a plain-text syntax, or any syntax at all); you could treat an image or whatever as a single "special" character. Or just assign a linearly increasing ID to each node, increasing in the order the text, images, etc. flow.
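A minimal sketch of that idea, assuming a made-up tree with `Text`, `Image` and `Element` nodes (these names are hypothetical, not from any real editor). It numbers the leaves in document order and counts an image as one character, so any (leaf, offset) position maps to a single offset in the flattened sequence:

```python
from dataclasses import dataclass, field
from typing import List, Union

@dataclass(eq=False)
class Text:
    text: str
    order: int = -1  # linear ID, assigned in document order

@dataclass(eq=False)
class Image:
    src: str
    order: int = -1

@dataclass(eq=False)
class Element:
    children: List["Node"] = field(default_factory=list)

Node = Union[Text, Image, Element]

def leaf_length(leaf) -> int:
    # An image (or any other embedded object) counts as one "special" character.
    return len(leaf.text) if isinstance(leaf, Text) else 1

def number_leaves(root) -> list:
    """Assign increasing order IDs to the leaves in document order and return them."""
    leaves = []
    def walk(node):
        if isinstance(node, Element):
            for child in node.children:
                walk(child)
        else:
            node.order = len(leaves)
            leaves.append(node)
    walk(root)
    return leaves

def linear_offset(leaves, leaf, offset: int) -> int:
    """Map a (leaf, offset) position to an offset in the flattened sequence."""
    return sum(leaf_length(l) for l in leaves[:leaf.order]) + offset
```

For `Element([Text("ab"), Image("x.png"), Element([Text("cd")])])`, offset 1 inside `"cd"` lands at flat position 4 ("ab", then the image, then "c").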
Though that is basically another way to represent what I wrote above: a pair of node pointers and a subrange (well, an index actually; the other end of the subrange is implicit if the node pointers are different). This is basically what the old HTML editing control Microsoft had back in the 90s used, and it worked with the DOM tree (it's also what I used in a test editor I wrote some time ago). And yeah, it isn't simple.
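One way to read the "implicit other end" part, sketched under the assumption that leaves are identified by their document-order index and `lengths[i]` is the length of leaf i (the `Pos` and `selected_ranges` names are made up). When both endpoints sit in the same node, both ends of the subrange are explicit. Otherwise the start node is selected from its index to its end, the end node from its beginning to its index, and everything in between is fully selected:

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True, order=True)
class Pos:
    node: int    # hypothetical linear ID of the leaf, in document order
    offset: int  # index inside that leaf

def selected_ranges(lengths: List[int], a: Pos, b: Pos) -> List[Tuple[int, int, int]]:
    """Return (node, start, end) for every leaf touched by the selection a..b."""
    start, end = sorted((a, b))  # compares (node, offset), so anchor/focus order doesn't matter
    if start.node == end.node:
        # Same node: both ends of the subrange are explicit.
        return [(start.node, start.offset, end.offset)]
    # Start node: the end of its subrange is implicitly the end of the node.
    ranges = [(start.node, start.offset, lengths[start.node])]
    # Nodes strictly between the endpoints are selected entirely.
    for n in range(start.node + 1, end.node):
        ranges.append((n, 0, lengths[n]))
    # End node: the start of its subrange is implicitly the beginning of the node.
    ranges.append((end.node, 0, end.offset))
    return ranges
```

With leaf lengths `[2, 1, 2]` (text, image, text), a selection from the middle of the last leaf back to the middle of the first gives `[(0, 1, 2), (1, 0, 1), (2, 0, 1)]`.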