The DOM allows an element to contain multiple text nodes.
When you update the React tree with a different or longer paragraph, React replaces the paragraph's text rather than extending it, and the selection state is blown away. For non-text elements this is avoided by a simple diffing heuristic: if an element has the same tag as before (<a ...> matches <a ...> but not <img .../>), React patches the existing DOM node in place instead of unmounting it and mounting a new one. Lists rendered with .map are matched by key rather than by position, which is why .map without keys triggers a warning telling you to add them.
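To make the heuristic concrete, here is a toy model of it in TypeScript. It is a simplification written for this explanation, not React's actual reconciler, and `VNode`, `Op`, `diff`, and `diffChildren` are hypothetical names. A node with the same tag is patched in place, a different tag is replaced outright, and children are matched by `key` when one is given, otherwise by position.

```typescript
// Toy model of React's element-diffing heuristic (hypothetical, simplified).
type VNode = { tag: string; key?: string; children?: VNode[] };

type Op =
  | { kind: "update"; tag: string } // same tag: keep the DOM node, patch it in place
  | { kind: "replace"; from: string; to: string } // different tag: unmount and remount
  | { kind: "insert"; tag: string } // new child with no match in the old tree
  | { kind: "remove"; tag: string }; // old child with no match in the new tree

// Compare an old node and a new node that occupy the same slot.
function diff(oldNode: VNode, newNode: VNode): Op[] {
  if (oldNode.tag !== newNode.tag) {
    return [{ kind: "replace", from: oldNode.tag, to: newNode.tag }];
  }
  return [
    { kind: "update", tag: newNode.tag },
    ...diffChildren(oldNode.children ?? [], newNode.children ?? []),
  ];
}

// Match children by key when present, otherwise by index
// (which is what happens when a .map() has no keys).
function diffChildren(oldKids: VNode[], newKids: VNode[]): Op[] {
  const idOf = (n: VNode, i: number): string => n.key ?? `#${i}`;
  const oldById = new Map<string, VNode>(
    oldKids.map((n, i): [string, VNode] => [idOf(n, i), n]),
  );
  const matched = new Set<string>();
  const ops: Op[] = [];
  newKids.forEach((n, i) => {
    const id = idOf(n, i);
    const old = oldById.get(id);
    if (old) {
      matched.add(id);
      ops.push(...diff(old, n));
    } else {
      ops.push({ kind: "insert", tag: n.tag });
    }
  });
  oldKids.forEach((n, i) => {
    if (!matched.has(idOf(n, i))) ops.push({ kind: "remove", tag: n.tag });
  });
  return ops;
}
```

In this model, prepending an item to a keyed list yields one insert followed by updates for the untouched items. Without keys, the same change patches every existing item with its neighbour's content and inserts a new node at the end.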
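Text nodes work differently in the DOM: replacing a node's text, rather than appending to it, resets the selection. You can see the same effect outside React. Open devtools and edit a paragraph while it's selected in the browser, and the selection goes away. Doing the manipulation with JavaScript does the same thing (document.querySelector(a_selector).innerText += 'sdfsdf'), because += rewrites the whole text rather than appending to it.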
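One way to see why, assuming the rules in the WHATWG DOM Standard: setting innerText or textContent removes the old text node entirely, so a selection inside it is moved out to the parent element. Even when the same text node is kept and only its data is overwritten (which is what assigning nodeValue does), the standard's "replace data" algorithm snaps a caret inside the replaced span back to the span's start. Appending at the end leaves an existing caret alone. The sketch below models only that "replace data" offset rule in plain TypeScript; `adjustCaret`, `afterSet`, and `afterAppend` are hypothetical helpers, not browser APIs.

```typescript
// Models how one selection boundary (a caret offset) inside a text node is
// adjusted by the DOM Standard's "replace data" steps. Illustration only.
function adjustCaret(
  caret: number, // caret offset inside the text node
  length: number, // current length of the node's text
  offset: number, // where the replaced span starts
  count: number, // how many characters are replaced
  dataLength: number, // length of the inserted text
): number {
  if (offset > length) throw new RangeError("IndexSizeError"); // stand-in for the DOM error
  const n = Math.min(count, length - offset); // clamp count to the node's length
  if (caret > offset && caret <= offset + n) return offset; // inside the span: snap to its start
  if (caret > offset + n) return caret + dataLength - n; // after the span: shift by the size change
  return caret; // before the span: untouched
}

// Assigning node.nodeValue / node.data replaces the whole string
// (offset 0, count = current length).
const afterSet = (caret: number, oldText: string, newText: string): number =>
  adjustCaret(caret, oldText.length, 0, oldText.length, newText.length);

// node.appendData(s) is "replace data" at offset = length with count 0.
const afterAppend = (caret: number, oldText: string, added: string): number =>
  adjustCaret(caret, oldText.length, oldText.length, 0, added.length);
```

With a caret at offset 5 in "Hello world", overwriting the text with a longer string moves the caret to 0, while appending ", more" leaves it at 5.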
Using Fragments to render each streamed chunk as its own child makes React append a new text node to the same parent for each chunk. The existing text nodes are left untouched instead of being replaced, so the selection state survives.
This works because browsers render adjacent text nodes as one continuous run of display text.
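The state behind this approach can be sketched as an array of chunks that only ever grows. This is a minimal sketch under an assumption not stated above: that each stream update delivers the full text so far, so the new tail has to be derived. `appendDelta` is a hypothetical helper, and the React render appears only as a comment.

```typescript
// Hypothetical helper: turn "full text so far" updates into appended chunks.
// Returns the same array when nothing changed, so a React setState with it
// can bail out of re-rendering.
function appendDelta(chunks: readonly string[], fullText: string): readonly string[] {
  const current = chunks.join("");
  if (!fullText.startsWith(current)) {
    // The text was rewritten rather than extended: fall back to a single fresh
    // chunk (this replaces the old text nodes, so the selection is lost).
    return [fullText];
  }
  const tail = fullText.slice(current.length);
  return tail === "" ? chunks : [...chunks, tail];
}

// In a React component (TSX), each chunk would get its own keyed Fragment, so
// earlier text nodes are kept and only new ones are appended:
//   <p>{chunks.map((c, i) => <Fragment key={i}>{c}</Fragment>)}</p>
// Index keys are stable here because chunks are only ever appended.
```

Joining the chunks gives back the display text, mirroring how the browser renders the separate text nodes as one string.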
Unfortunately, this approach does seem to break Apple's built-in screen reader (VoiceOver). But when I then tried the screen reader on ChatGPT, I found it was already failing completely there too. I'm surprised OpenAI hasn't addressed this accessibility issue; perhaps streaming text is just hard to make work well with screen readers in general.