Your particular angle on it might be novel, I’m not certain, but only because it’s an even worse idea. Formerly, there was a simple rule: remember to escape data when emitting HTML. You’ve replaced that with something complex: some things will be entity-encoded, and others won’t; and often you won’t want things entity-encoded. (You’re inappropriately tying API to HTML, by the looks of it.) This is just a bad abstraction that makes errors certain, and those errors will mean mangled text in every case, and security bugs in some.
It’s very similar to how the consensus is now well-established (though it’ll take plenty more time to be fully applied) in memory safety, a related security field: avoid languages like C, they’re too dangerous, use memory-safe languages instead.
Another well-established principle: parse at input, let your system deal with data, and serialise at output. Your approach instead serialises at input—and that only most of the time—and hopes you never need to parse.
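To make that concrete, here is a minimal sketch in JavaScript. The names escapeHtml, parseComment and renderComment are hypothetical, invented for illustration and not taken from any framework; the only point is where the encoding happens, and what goes wrong when it happens at input instead.

```javascript
// Hypothetical helper: encode text for HTML data state or a quoted attribute value.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Parse at input: validate and normalise, but keep the value as plain text.
function parseComment(input) {
  if (typeof input.body !== 'string') {
    throw new TypeError('body must be a string');
  }
  return { body: input.body.trim() };
}

// Serialise at output: only the HTML renderer knows it is producing HTML.
function renderComment(comment) {
  return '<p>' + escapeHtml(comment.body) + '</p>';
}

const comment = parseComment({ body: ' Fish & chips <3 ' });

// HTML output is escaped exactly once, at the edge.
const html = renderComment(comment); // <p>Fish &amp; chips &lt;3</p>

// A JSON API response carries the text untouched; HTML is not its concern.
const json = JSON.stringify(comment); // {"body":"Fish & chips <3"}

// Serialising at input instead: the renderer then escapes already-escaped text.
const storedEscaped = escapeHtml(comment.body);
const doubled = renderComment({ body: storedEscaped }); // <p>Fish &amp;amp; chips &amp;lt;3</p>
```

The last two lines are the double-escaping failure: once encoded text is stored, every consumer has to know whether a given value has already been encoded, and nothing in the value itself tells them.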
I’m certain that I’ve upset you, and I’m sorry about that, but the approach really, truly is that bad, and I did just want to appeal to you to think it over. Look, if you can find someone experienced in web security and frontend framework design and things like that, preferably who’s been doing this for fifteen or twenty years, present the whole design to them, and see what they say. I genuinely expect horror.
As for your changes to escape_html, I’ll just ask: why? You’ve compounded the error. There is no plausible purpose for all the additions you’ve made. By encoding, you declare intent to feed the value to an HTML or XML parser in data state, single-quoted attribute state or double-quoted attribute state. In none of those states do any of these additions achieve anything. And if you’re making them for some other purpose, then you don’t want the HTML escaping in the first place.
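For reference, a sketch of what those three states actually require. In data state only & and < are significant; in a double-quoted attribute value, & and "; in a single-quoted one, & and '. Encoding > as well is conventional. The function name escapeHtmlMinimal and the sample value are hypothetical; this is not the escape_html under discussion, just an illustration of the conventional five-character set.

```javascript
// Hypothetical minimal escaper covering data state and both quoted attribute states.
const ENTITIES = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
};

function escapeHtmlMinimal(text) {
  // Single pass, so an ampersand produced by one replacement is never re-encoded.
  return String(text).replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

const value = `Tom's "5 < 6" & (parens), slashes/too`;
const escaped = escapeHtmlMinimal(value);
// Tom&#39;s &quot;5 &lt; 6&quot; &amp; (parens), slashes/too

// The same encoded value is safe in each of the three contexts.
const asText = '<p>' + escaped + '</p>';
const asDoubleQuoted = '<input value="' + escaped + '">';
const asSingleQuoted = "<input value='" + escaped + "'>";
```

Parentheses, commas and slashes pass through untouched, and the parser has no way to misread them in any of those three places. Encoding them too changes nothing about how the value is parsed.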
Fifteen years ago, I would encounter both double-escaped HTML and entity-encoding in non-HTML contexts, far too often. Now, I feel like it’s years since I’ve seen such a thing, outside of text/plain parts of emails (but developers and marketers are frequently awful about text/plain parts).
The only advantage I can see in entity-encoding more characters, especially all normal punctuation, is that bad escaping is going to be caught earlier. But I say, if you reckon those worth escaping, why not go the whole hog and .replace(/./sug, s => '&#' + s.codePointAt(0) + ';')?
I’m not sure what your experience is, but the impression I’m getting is that you’ve been working largely in just JavaScript. A lot of the problems you’re producing were best-known through PHP, and better type systems are a large part of the best solutions to these things. If I’m sounding you out correctly, I’d recommend that you look into and learn some strongly, statically typed language, preferably one with algebraic data types (specifically sum types). Rust has been my own preference for the last decade, but there are plenty of others that would guide you in similar lessons, like (in no particular order) C♯, Swift, Elixir, Kotlin, Scala, Haskell. (I don’t include TypeScript in this list, although it could definitely help with some of the lessons, because it’s too compromised by JavaScript limitations.) If you haven’t had any of this sort of experience, getting it will help you to write better JavaScript, and make better systems.
—⁂—
Now, for a specific matter:
> You hypothetically could store escaped HTML, but you'd have to show me where I'm encouraging it
<https://docs.cheatcode.co/joystick/node/escape_html#:~:text=...>
(And if you unconditionally entity-encode query string parameters… I don’t even want to think about the implications of that.)
I would also point out that https://docs.cheatcode.co/joystick/node/app/api/sanitization says:
> Sanitization is the process of scrubbing unexpected or unwanted HTML tags from the values returned by your API.
But this is not what you’ve implemented. You don’t actually scrub, you entity-encode.