layer8 · 1y ago

It depends on the use case. If, for example, you want to hash a username–password pair, I find

    write(toUtf8(username));
    write((byte) 0xFF);  // never occurs in UTF-8, hence unambiguous separator
    write(toUtf8(password));

to be the most straightforward and parsimonious option, and the assumption it relies on (0xFF never appears in the UTF-8 output) is maximally local.
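
For concreteness, here is a minimal Go sketch of the separator idea. The helper name `hashPair` and the choice of SHA-256 are assumptions for illustration; the comment above names neither a hash function nor a language.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashPair is a hypothetical helper: SHA-256 over username, a 0xFF
// separator byte, and password. Go strings are already byte sequences,
// so the "encode to UTF-8" step is just a []byte conversion here.
func hashPair(username, password string) [32]byte {
	h := sha256.New()
	h.Write([]byte(username))
	h.Write([]byte{0xFF}) // never occurs in valid UTF-8
	h.Write([]byte(password))
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	// Without a separator, both pairs would concatenate to "abc".
	fmt.Println(hashPair("ab", "c") == hashPair("a", "bc")) // false
}
```

The separator only disambiguates if both inputs really are valid UTF-8, which this sketch does not check.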


oconnor663 · 1y ago

I agree that this approach works in the use case you're describing, but there are a couple of reasons I wouldn't want to teach it broadly. Mainly, a caveat like "remember to only use this with UTF-8 strings and not with arbitrary bytes" is exactly the sort of thing applications routinely get wrong in the wild. Also, focusing on wacky edge cases, do we really know that our UTF-8 strings are valid UTF-8? Maybe we're comfortable staking our security on that in a language like Rust, where invalid strings are literally undefined behavior anyway. But what about Go, for example, where this snippet writes the byte 0xff to stdout with no warnings whatsoever?

    package main

    import "fmt"

    func main() {
        s := string([]byte{0xff}) // not valid UTF-8, yet a legal Go string
        fmt.Println(s)
    }

layer8 (OP) · 1y ago

> do we really know that our UTF-8 strings are valid UTF-8?

I’m explicitly talking about cases where you UTF-8-encode right where you’re hashing.

With the length approach, you also have to take care that you’re using the correct byte length, for example, and that you don’t forget the final length (i.e. it’s a suffix and not just a separator). And for user input like passwords, you’d probably want to NFC-canonicalize before hashing. There are always things you have to pay attention to.
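
A rough Go sketch of a length-based encoding, to make those pitfalls concrete. Everything here is an assumption for illustration: the helper name `hashFields`, SHA-256, the 8-byte big-endian width, and writing each length before its field rather than after it. The points that matter are that every field, including the last, carries its length, and that the length counts bytes rather than characters. NFC normalization is left out because it is not in Go's standard library.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// hashFields is a hypothetical helper: each field is preceded by its
// byte length, written as a fixed-width 8-byte big-endian integer.
func hashFields(fields ...[]byte) [32]byte {
	h := sha256.New()
	var n [8]byte
	for _, f := range fields {
		binary.BigEndian.PutUint64(n[:], uint64(len(f))) // bytes, not characters
		h.Write(n[:])
		h.Write(f)
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

func main() {
	a := hashFields([]byte("ab"), []byte("c"))
	b := hashFields([]byte("a"), []byte("bc"))
	fmt.Println(a == b) // false

	// Arbitrary bytes, including 0xFF, stay unambiguous as well.
	c := hashFields([]byte{'a', 0xFF}, []byte{'b'})
	d := hashFields([]byte{'a'}, []byte{0xFF, 'b'})
	fmt.Println(c == d) // false

	// Byte length differs from character count for non-ASCII text.
	fmt.Println(len("\u00e9")) // 2
}
```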

You can encapsulate it in a function that hashes a list of character strings passed as its (typed) argument. Then you have a safe and reusable function.
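
One way such a function might look in Go, as a sketch under stated assumptions: `hashStrings` and `errInvalidUTF8` are hypothetical names, SHA-256 is an arbitrary choice, and 0xFF is written after every string as a terminator (rather than only between strings) so that an empty list and a list holding one empty string hash differently. Because Go's `string` type does not guarantee valid UTF-8, the sketch checks validity explicitly instead of relying on the type.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
	"unicode/utf8"
)

var errInvalidUTF8 = errors.New("field is not valid UTF-8")

// hashStrings is a hypothetical helper: it rejects any string that is
// not valid UTF-8, then hashes each string followed by a 0xFF byte.
func hashStrings(fields ...string) ([32]byte, error) {
	h := sha256.New()
	for _, f := range fields {
		if !utf8.ValidString(f) {
			return [32]byte{}, errInvalidUTF8
		}
		h.Write([]byte(f))
		h.Write([]byte{0xFF}) // terminator; cannot appear in valid UTF-8
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out, nil
}

func main() {
	_, err := hashStrings("alice", "hunter2")
	fmt.Println(err) // <nil>

	_, err = hashStrings("alice", string([]byte{0xFF}))
	fmt.Println(err) // field is not valid UTF-8
}
```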

oconnor663 · 1y ago

> I’m explicitly talking about cases where you UTF-8-encode right where you’re hashing.

Totally, I get that. I think what you're pointing out is that in a language like Python for example, the scenario I'm trying to describe is meaningless. You can't make an "invalid string" in Python (as far as I know, without resorting to FFI), because it checks things like that during string decoding, and it'll just crash.

But languages like C/C++/Rust/Go work differently. As these languages are commonly used, the string -> UTF-8 step is actually a no-op, because the assumption is that strings are already UTF-8 in memory. (In C or C++ this is usually in the programmer's head rather than in the types, but it's a common choice.) In these languages it's possible for the result of that no-op "encoding" to be invalid, if the input string was invalid somehow. This is a pretty weird edge case and almost certainly a bug that the application needs to fix anyway, but if we're noodling about cryptography best practices, it might be nice to limit the "blast radius" of a bug like that.
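
To illustrate that blast radius, here is a small Go sketch in which two different pairs collide once a stray 0xFF byte gets into a string. The helper `naiveHash` and the use of SHA-256 are assumptions for illustration; the function applies the 0xFF separator with no validity check.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// naiveHash is a hypothetical helper that trusts its inputs to be valid
// UTF-8. String concatenation in Go operates on raw bytes.
func naiveHash(username, password string) [32]byte {
	return sha256.Sum256([]byte(username + "\xff" + password))
}

func main() {
	// Both pairs serialize to the same four bytes: 'a', 0xFF, 0xFF, 'b'.
	x := naiveHash("a\xff", "b")
	y := naiveHash("a", "\xffb")
	fmt.Println(x == y) // true
}
```

A length-based encoding would keep these two inputs distinct, since it does not depend on the contents of the fields.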


jgalt212 · 1y ago

Why are you hashing "username–password"? Isn't the username a key index for looking up hashed(password), or do you select all hashed("username–password") values and see if any match, and if so, authenticate?
