Text input and manipulation is so incredibly ugly and hard that basically nobody cares enough to do it really well.
Unicode, to start, is complicated. Then you have things like bidirectional text.
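To give one small taste of the Unicode part, here's a minimal Python sketch using the standard `unicodedata` module. The helper name `same_text` is made up for illustration. It shows that the 'same' character can be stored as one code point or two, so naive length checks and comparisons disagree with what the user sees:

```python
import unicodedata

# 'é' as a single precomposed code point (U+00E9)
composed = "\u00e9"
# The same visible character as 'e' plus a combining acute accent (U+0301)
decomposed = unicodedata.normalize("NFD", composed)

def same_text(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(len(composed), len(decomposed))    # code point counts differ
print(composed == decomposed)            # raw comparison fails
print(same_text(composed, decomposed))   # normalized comparison succeeds
```

And that's only normalization. Cursor movement, deletion and selection all have to agree on what 'one character' means too.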
Line breaking is complicated.
There are no accepted practices for a lot of use cases.
Hyperlinks. Embedded content like images.
Copying/inserting html and other formats.
Styling content.
Rendering 100 000 lines of styled content without screwing up user input.
Line-heights. Tabs. Font definitions that are ambiguous and inconsistent (a lot of font tables today still don't provide enough space for accents on capital letters, think Swedish - so you have broken-looking text or have to 'fake it'). Screwed-up kerning tables.
No established standard for the actual pixel-level rendering - zoom in and you'll see fonts can be rendered a variety of ways.
Emojis, fat-finger cursor navigation on mobile devices, input managers.
It's hard to describe how messy it gets because none of it is academic, it's not some scary-hard algorithm - it's just an incredibly ugly pile of cross-cutting code with a zillion little bits and pieces of corner cases.
And a big one for gaming engines: they don't render text 'natively'. You basically have to create textures. Every bit of text, its own texture. How big is the texture for 100 000 lines of code? Big.
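A back-of-the-envelope sketch of 'big', in Python. Every number except the 100 000 lines is an assumption picked for illustration: one uncompressed RGBA texture per line, 800 px wide and 16 px tall. Real engines use glyph atlases, compression and so on, so treat this as a naive worst case, not a measurement:

```python
def texture_bytes(lines: int, width_px: int, height_px: int,
                  bytes_per_pixel: int = 4) -> int:
    """Memory for one uncompressed texture per line of text.

    bytes_per_pixel=4 assumes 8-bit RGBA; all sizes are illustrative guesses.
    """
    return lines * width_px * height_px * bytes_per_pixel

total = texture_bytes(lines=100_000, width_px=800, height_px=16)
print(f"{total:,} bytes, about {total / 2**30:.2f} GiB")
```

With those assumed sizes you land in the gigabytes before you've drawn a single frame.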
This is actually one area where Qt falls flat. In the 'new and improved' Qt, they do everything on the GPU; if you try to load 100 000 lines of something, the app will use up 1 GB of memory and puke. So then you have to magically code your way around it.
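One common way to code around it (not something Qt does for you, just the usual trick) is to only build render data for the lines that are actually on screen. A minimal Python sketch, where `visible_range` and its parameters are made up for illustration, and which assumes every line has the same fixed height, which real styled text does not:

```python
def visible_range(scroll_px: int, viewport_px: int, line_height_px: int,
                  total_lines: int, overscan: int = 2) -> range:
    """Indices of the lines worth rendering for the current scroll position.

    Assumes a fixed line height (a big simplification). `overscan` keeps a
    few extra lines above and below the viewport so scrolling doesn't flicker.
    """
    first = max(0, scroll_px // line_height_px - overscan)
    last = min(total_lines,
               (scroll_px + viewport_px) // line_height_px + 1 + overscan)
    return range(first, last)

# A 600 px tall viewport over a 100 000 line document, 20 px per line
lines_to_draw = visible_range(scroll_px=1_000_000, viewport_px=600,
                              line_height_px=20, total_lines=100_000)
print(len(lines_to_draw), "lines instead of 100000")
```

The moment line heights vary (wrapping, embedded images, mixed fonts) you need a height index instead of a division, which is exactly the kind of cross-cutting mess described above.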
All for what? Why would you want to render text in a gaming engine in the first place ...
Then you discover that 'apps' and 'games' are really different things, and they use different tech for good reason. It only gets ugly when they really have to cross paths.