undefined | Better HN

0 pointschrismorgan1y ago0 comments

Sorry, but it’s like the reaction to https://news.ycombinator.com/item?id=42359067: “this is neither a new idea nor a good one”. Ad-hoc escaping for serialisation is just fundamentally a really bad approach, and although it used to be how almost everything did it, nothing serious has tried working that way for at least a decade, probably closer to two by this point, because as an industry we’ve actually learned things.

Your particular angle on it might be novel, I’m not certain, but only because it’s an even worse idea. Formerly, there was a simple rule: remember to escape data when emitting HTML. You’ve replaced that with something complex: some things will be entity-encoded, and others won’t; and often you won’t want things entity-encoded. (You’re inappropriately tying API to HTML, by the looks of it.) This is just a bad abstraction that makes errors sure, and errors will lead to some mangled text in all cases, and security bugs in some cases.

It’s very similar to how the consensus is now well-established (though it’ll take plenty more time to be fully applied) in memory safety, a related security field: avoid languages like C, they’re too dangerous, use memory-safe languages instead.

Another well-established principle: parse at input, let your system deal with data, and serialise at output. Your approach instead serialises at input—and that only most of the time—and hopes you never need to parse.

I’m certain that I’ve upset you, and I’m sorry about that, but the approach really, truly is that bad, and I did just want to appeal to you to think it over. Look, if you can find someone experienced in web security and frontend framework design and things like that, preferably who’s been doing this for fifteen or twenty years, present the whole design to them, and see what they say. I genuinely expect horror.

As for your changes to escape_html, I’ll just ask why? because you’ve compounded the error. There is no plausible purpose for all the additions you’ve made. By encoding, you declare intent to feed the value to an HTML or XML parser in data state, single-quoted attribute state or double-quoted attribute state. In none of those states do any of these additions achieve anything. And if you’re making them for some other purpose, then you don’t want the HTML escaping in the first place.

Fifteen years ago, I would encounter both double-escaped HTML and entity-encoding in non-HTML contexts, far too often. Now, I feel like it’s years since I’ve seen such a thing, outside of text/plain parts of emails (but developers and marketers are frequently awful about text/plain parts).

The only advantage I can see in entity-encoding more characters, especially all normal punctuation, is that bad escaping is going to be caught earlier. But I say, if you reckon those worth escaping, why not go the whole hog and .replace(/./ug, s => '&#' + s.codePointAt(0) + ';')?

I’m not sure what your experience is, but the impression I’m getting is that you’ve been largely just JavaScript. A lot of the problems you’re producing were best-known through PHP, and better type systems are a large part of the best solutions to these things. If I’m sounding you out correctly, I’d recommend that you look into and learn some strong statically-typed language, preferably one with algebraic data types (also called sum types). Rust has been my own preference for the last decade, but there are plenty of others that would guide you in similar lessons, like (in no particular order) C♯, Swift, Elixir, Kotlin, Scala, Haskell. (I don’t include TypeScript in this list, although it could definitely help with some of the lessons, because it’s too compromised by JavaScript limitations.) If you haven’t had any of this sort of experience, getting it will help you to write better JavaScript, and make better systems.

—⁂—

Now, for a specific matter:

> You hypothetically could store escaped HTML, but you'd have to show me where I'm encouraging it

<https://docs.cheatcode.co/joystick/node/escape_html#:~:text=...>

(And if you unconditionally entity-encode query string parameters… I don’t even want to think about the implications of that.)

I would also point out that https://docs.cheatcode.co/joystick/node/app/api/sanitization says:

> Sanitization is the process of scrubbing unexpected or unwanted HTML tags from the values returned by your API.

But this is not what you’ve implemented. You don’t actually scrub, you entity-encode.

0 comments

2 comments · 1 top-level

rglover1y ago· 1 in thread

After all of this, it's not terribly clear what I'm doing (aside from making escaping only through a one-off function) that's "bad." You mention "a lot of the problems you're producing" but then fail to give a what or why those things are problems.

If there's something specific that you can point me toward, I'll take a look and work on improvements. But honestly, the majority of the above reads like you just want me to feel bad and quit (and again, doesn't provide a clear answer to "what to do instead"). If you can succinctly explain why my approach is bad and offer examples of fixes (in relation to the current design of the component system), I'm all ears.

rglover1y ago

After doing a bit of digging and back-and-forth with an LLM, I think I understand the problem now.

By doing escaping solely on the server, it creates two problems:

1. You're leaving escaping up to good behavior which will inevitably lead to some value that needs escaping being missed.

2. You create a double-escape problem on the client as you may need to render the raw output (e.g., rich text) and a server-escaped value would require an "unwinding" on the client.

Beyond this, I also corrected what you pointed out in the docs as that was the opposite of what I intended for that function.

I think the approach will be to do some automatic escaping on the client and then pass some special functions to the render() method on components to allow rendering raw, unescaped user data (i.e., a dangerouslySetInnerHTML equivalent).

If there's anything else specific that you'd add, let me know.

---

You seem like you have a lot of knowledge, but in the future, if you want someone to listen and comprehend what you're saying, avoid the fatalistic, theatrical language and emphasis.

Your point is incredibly helpful and showed me something I wasn't thinking about, but the way you communicated it almost made me ignore it.

If the above isn't correct (or misguided), it'd be best if you just plainly explain why and what should be done instead (and not a wholesale "this could never work, burn it down" response).

j / k navigate · click thread line to collapse