“YOLO” is not a valid hash construction (opens in new tab)

(blog.trailofbits.com)

140 pointsssklash1y ago73 comments

73 comments

50 comments · 10 top-level

layer81y ago· 8 in thread

> The issue we run into here is ambiguous encoding.

What I have done in the past for this is to encode the messages as UTF-8 and separate them by 0xFF, since that byte value never occurs in UTF-8 encoding [0]. If the messages to be hashed are character strings, you have to decide on some encoding anyway in order to hash them.

[0] UTF-8 bytes always contain at least one zero bit: https://en.wikipedia.org/wiki/UTF-8#Encoding. Incidentally, if one wanted to create the UTF-8 equivalent of zero-terminated strings without reserving a character value (like NUL) as the sentinel value, one could use 0xFF for that.

oconnor6631y ago

I usually recommend an 8-byte length suffix for each piece instead of a separator.* That way you don't need to make any assumptions about what bytes are legal. Putting the length in suffix position means you can still stream input into your hash state without knowing how long it's going to be in advance, and 2^64 bytes is enough for any practical application. (The maximum input size of SHA-256 is less than this.)

I'm less sanguine than the authors about using Protobuf or CBOR for "canonical" serialization like this. I think it tends to "work until it doesn't". It's not what these formats were designed for, and you have to ask awkward questions like "is the order of struct fields guaranteed?" and "do integers always use the smallest possible representation?". This is more obvious for JSON, as the post points out, but every common serialization format I'm aware of has problems like this. I think we need a dedicated standard to do a good job of this, but I'm not aware of anything widespread. It's a surprisingly hard problem.

* Technically you only need the suffix for each variable-length piece, and you can omit the first one. But it's more complicated if the number of pieces is variable. (If you have two adjacent fixed-length pieces, and then you combine them, does the hash change?) This sort of penny-pinching is interesting to think about in a design that's going to come with a giant set of test vectors, but it's asking for trouble in an application doing something custom. This is another reason I'd like to have a standard here.

thequux1y ago

It must be a length prefix or you're still vulnerable to length extension attacks. Suppose you had encoded "foo3"; the attacker could easily extend that to "foo3bar7".

2 more replies

layer81y ago

It depends on the use case. If, for example, you want to hash a username–password pair, I find

    write(toUtf8(username));
    write((byte) 0xFF);  // never occurs in UTF-8, hence unambiguous separator
    write(toUtf8(password));

to be most straightforward and parsimonious, and the assumption is maximally local.

2 more replies

hamandcheese1y ago

Does that mean the random suffix is effectively a secret? Otherwise an attacker could construct inputs with the separator, achieving the same thing as a simple separator, right?

1 more reply

kragen1y ago

bencode was designed to solve the canonicality problem. so is asn.1 cer. but it is sufficient to have an unambiguous encoding to prevent the vulnerabilities in this post; nonunique encodings will only produce interoperability problems, not security holes

akira25011y ago

If your data can't contain a NUL byte then you could just XOR all the bytes of the message by the separator character byte itself.

layer81y ago

If your data can't contain a NUL byte you can just use NUL as the separator.

dfox1y ago

That actually makes more sense than the usual opposite approach of using over-long encoding for NULs in the data.

krackers1y ago· 8 in thread

A cryptography course should be mandated in the curriculum of most universities, just so people gain some intuition about the types of attacks that are possible. Just yelling "don't roll your own crypto" isn't practical advice when most issues come from misusing primitives or combining primitives in a "weak" manner.

axoltl1y ago

Speaking as someone that has both broken crypto systems and designed ones that got broken (internally, in review, before they made it out), it takes a lot of practice to become proficient at cryptography. Sometimes a little bit of knowledge is worse than none at all.

gopher_space1y ago

The problem with repeating "don't roll your own" is that you're also going to cock up implementing someone else's solution.

viraptor1y ago

I would disagree. I learned the basics and done things like cryptopals and that was enough to be aware of everything in this post. At the same time, I understand the threshold of things I shouldn't touch much better. With the right curriculum, you can focus on the non-dangerous ideas, rather than spending time on the math behind RSA like my uni course did. (And ended up practically useless)

You don't need to be proficient at cryptography to be aware of the common attack classes and reasons we use prepackaged things like NaCl before going low level.

nick2381y ago

Software engineers should be made aware of such pitfalls, but I don't think a whole course is necessary or useful. It's very easy to build encryption that you can't crack, especially because the "types of attacks" is a truly endless font.

It's probably more useful to have a module within a course to discuss the current state of the art and learning some history about how the methods were chosen (e.g. NIST's AES, SHA2/3, and PQC open processes. I think making it very obvious that there are extremely good, quality, free tools out there would reduce the likelihood of someone DIYing some crap.

That said, I once spec'd using Ed25519 asymmetric signatures for webhooks sent out to customers, and later on one of our Elixir developers was complaining that the throughput was garbage. I was confused because https://ed25519.cr.yp.to/ boasts signing rates of ~27k/sec/core on very old hardware. Turns out they were using some "pure Elixir" library which had shit (over 1000x worse) performance. There wasn't any real surface area for attacks here, but there are plenty of devs who will blindly search package-manager-of-choice for an otherwise good encryption and get screwed. Not sure who blame in that scenario.

tialaramex1y ago

Do you have any hard evidence that somehow mandating a "cryptography course" fixes this?

My guess is that it doesn't help and might even make things worse because now they'll think "Don't roll your own" was an instruction to plebs who didn't take that one semester cryptography course at school.

plorkyeran1y ago

My primary takeaway from the cryptography course I took was a good understanding of why I shouldn't try to roll my own crypto, but a lot of that came down to the course design. Mine spent significantly more time on covering how various cryptography schemes were broken than on how to implement things, and a course which was the other way around could easily inspire false confidence.

reubenmorais1y ago

The cryptography course in my bachelor's was good enough to paint a picture of the complexity involved. By then I had already heard the "don't roll your own crypto" mantra, so maybe that primed me, but the semester course helped to get an appreciation for the subtle ways information leaks when you try to contain it. It also gave me some more confidence to push back if for example a colleague tried to convince me that the mantra doesn't apply to them/us.

Obviously you can't mandate a high quality course into existence, but I definitely good value out of having it in the required curriculum.

withinboredom1y ago

I've seen crypto papers (and cited!) that are worse than this advice. Not sure what you are trying to say here.

tptacek1y ago· 7 in thread

This is a good post, but there's a little too much ritual in here about password-based KDFs for my taste. Put all the mainstream KDFs on a dartboard, yes including PBKDF2, and throw a dart. I think you'll be fine. Bcrypt, the most popular password hash, has held up surprisingly well, and scrypt might still be one of the best options.

Aachen1y ago

Still a difference between "you'll be fine" and "if you're newly choosing, might as well not pick PBKDF2 where anyone with a regular GPU gets a 100x speedup compared to your server CPU". Bcrypt wasn't even designed for memory hardness but is so much more annoying to crack because it seems to stress (per my understanding) the GPU's memory bus just enough with the iirc 4k state table that's being randomly accessed

layer81y ago

In the browser, PBKDF2 can still make sense, because all modern browsers support it via SubtleCrypto.deriveKey(), and you may want to avoid the overhead of a JS-implemented alternative. Just select the maximum iteration count that is still acceptable for your use case and targeted devices (mobile devices tend to be the slowest).

1 more reply

Vecr1y ago

There's no point intentionally making your implementation weaker than it has to be.

delusional1y ago

The complement is also true. There's no point in making it stronger than it has to be.

If I have this box that says Bcrypt, but scrypt might be better. I can either spend a bunch of time re-implementing scrypt, or I can shove Bcrypt into there and move on. If Bcrypt is sufficient, then I don't really care if scrypt would be "better"

1 more reply

vintermann1y ago

Yes, it was conspicuous to me that for the two other examples, they mentioned "this has been used in a real attack!", but for key derivation functions, they didn't say that.

Sure, you should still use Argon2.

FreakLegion1y ago

I'd draw a line at the post's NIST-adjacent advice.

With the caveat that I may be up to two years out of date here: Last I checked Balloon was a bad choice (only a research implementation) and Argon2 didn't meet the requirements (unapproved primitive in BLAKE2).

With enough rigamarole can you get these into a government office? Probably. But scrypt and yescrypt (the default on most Linux systems) already fit the bill, so just use one of those.

tptacek1y ago

I have a really hard time taking the attack research on password KDFs all that seriously. I don't think there are many common threat models where "weaknesses" in password hashes are more than marginal issues.

Not using a real password KDF is a big issue. Using the wrong one, not so much.

1 more reply

trainofbit1y ago· 6 in thread

The RFC recommendation for 1G RAM or 64MB Argon PKDF is insane. Don't follow this advice. In a real world server, any API endpoint using this advice will quickly become a DOS vector. A saner value is 1MB for Argon. It stills blocks major GPU attacks, which is the whole point.

galdor1y ago

OWASP recommendations for Argon2id are 19MiB memory, iterations 2, parallelism 1. And following OWASP is not only a good idea for security but also makes it easy to justify with IT security, compliance, etc.

tptacek1y ago

It's been a little while since I've looked carefully but I would not take OWASP especially seriously on matters of cryptography. It helps to understand that OWASP is more of an affinity group than a carefully structured authority, and some of its official recommendations are more akin to wiki pages than real standards.

Dylan168071y ago

You have to be careful to toss around gigabytes, but what's unreasonable about 64MB? You should only be running about one per core, right?.

lmz1y ago

Some people run more threads than cores and if they do the KDF in the same thread you can see how that will end poorly.

1 more reply

whizzter1y ago

That's insanely bad advice, yes GPU memories was a tad anemic only a few years back so it could've been an option, BUT the introduction of raytracing (RTX) means that the GPU's needs to have a full world in memory so memory sizes has started to increase quickly, this is entirely disregarding AI workloads that might have had an even bigger impact on memory sizes.

A highend consumer RTX 4090 has 16000 CUDA cores and 24GB of memory, that's 1.5mb of memory per core.

cdelsolar1y ago

Yeah i had a server that did this and someone took it down by trying a few single-char passwords fast. My mitigation was just to rate limit the password guessing rate to once per second _globally_. Obviously not a huge fan of that idea, but what else am I supposed to do? I also thought that recommendation was bizarre if it could allow this so easily.

playingalong1y ago· 4 in thread

The article is not that different than "don't invent your own cryptography".

It's hard to understand for non-crypto specialists. It uses notions which are unknown to most programmers like MAC or other *MACs.

So not sure who is the target audience for this.

inopinatus1y ago

The problem with the general assertion, "don't roll your own crypto", is the potential for implicit-misconstruction-by-contrast of professionally rolled cryptography being a pluggable, install-and-forget black box solution.

This is of course bunk, because the boundary layer and level of abstraction matters, and the apparent target audience for this content marketing piece is any developer that might fall into the trap of assuming otherwise. The selection, integration, and configuration of cryptographic elements into an application carries as much significance for the strength of the resulting cryptosystem as the cryptographic qualities of the elements themselves, especially when considered by an attacker that seeks to drive a wedge into any gap available.

The article is obviously far from a comprehensive survey on the topic but does zero in on a few of the practical cases for hash functions, although you're not obliged to necessarily draw the same conclusions since (as the comments in these threads reveal) there are more alternatives than those directly discussed.

olliej1y ago

It is essentially “don’t roll your own crypto”, but instead being screeds of complexity it just highlights a few very basic and simple things that seem “obviously fine” to folk that are unfamiliar (eg doesn’t touch on any complex reasons for it being bad, and doesn’t just say “you’re stupid”). It then gives examples of these basic problems causing real world failures, and it then just says “this is what you should use instead”.

Eg it’s very to-the-point, doesn’t spend all its time talking about how the professionals are awesome and better than you, and gives actionable recommendations. Most “don’t roll your own crypto” articles don’t do that and just come off as being elitist, and don’t actually _help_ the reader.

tptacek1y ago

Trail runs a cryptography practice that does assurance work for cryptography engineers. They're not writing to encourage randomly-selected HN readers to design their own cryptosystems.

yarg1y ago

It was simple enough for me as a non-crypto guy - in fact, it seemed mostly obvious.

blauditore1y ago· 2 in thread

I'm no expert and a bit tired, but: Is the problem around hashing password + salt for a key just about the fact it can be brute-forced with enough recources, or did I miss something?

lilyball1y ago

Basically, yeah. It sounds like the problem is the difficulty of cracking it scales linearly, and a ton of work has gone into figuring out how to efficiently crack it. Modern KDFs are memory-hard, which is a resource that's a lot harder/more expensive to scale than computing power.

tialaramex1y ago

But as Thomas points out, this is almost certainly not a problem you should actually care about.

If the user of your new Shiny Goat service used the password "ShinyGoat" then all the memory hard KDF shenanigans in the world won't help, attackers will guess "ShinyGoat", and that's correct, they're in.

If another user chose a 32 random alphanumerics then it doesn't matter if you just dropped in PBKDF2 with whatever default settings because the attackers couldn't guess 32 random alphanumerics no matter what.

The KDF comes into the picture only for users who've chosen aggressively mediocre passwords. Not so easy attackers will definitely guess them, not so hard that it's impossible. Users who insist their "password" must be a single English word, or who insist on memorizing their passwords and so nothing longer than six characters is acceptable. That sort of thing. The attackers can guess these passwords, but they need a lot of guesses so the KDF can make it impractical.

That's just not a plausible scenario for a real world attack and therefore it should not be a focus for your attention. You should use a real KDF, but PBKDF2 is fine for this purpose, any time you spend arguing about which KDF to use or implementing a different KDF, rather than solving actual defects in your system's security is a bad trade.

1 more reply

hiddencost1y ago· 2 in thread

Oh fun. They actually give some erroneous advice.

Proto bufs don't guarantee consistent serialization.

conradludgate1y ago

> Note that, while many serialization methods will create unambiguous encodings, they don’t all necessarily produce unique encodings. For instance, JSON is largely insensitive to changes in whitespace and element ordering, so using JSON serializations produced by different libraries could lead to different hashes. Be careful!

JoachimSchipper1y ago

It's not ideal, but it's not a cryptographic risk.

Using an encoding that (like Protobuf) has multiple representations for a message may cause you problems if you switch implementations - sha256(encode(msg)) might yield different hashes on different implementations of encode().

But the main risk is an encoding that has multiple interpretations of a single encoding (e.g. sha256(encode("admin", "true")) == sha256(encode("admint", "rue"))), and Protobuf (can be unserialized, and thus) doesn't have that problem.

theamk1y ago· 2 in thread

It's interesting that using JSON encoding for all messages eliminates many of the problems.

ambiguous encoding? Nothing ambiguous about JSON, you don't even need any separator. Or merge them into json array.

length-extension attacks? appending non-whitespace to json makes it invalid (for sane decoders at least)

skybrian1y ago

Yes, but there are lots of ways to get different encodings (and therefore hashes) for the same data: whitespace, JSON object key order, and the different ways of escaping characters.

If you want a unique hash (like, for a hash table lookup) then you'll need to sort the keys of every object and use a particular implementation of JSON.stringify.

(Also, what do you mean by "you don't even need any separator?")

magicalhippo1y ago

XML has a complex solution for this, canonicalization[1], to enable signed XMLs[2]. Seems some folks are trying the same with JSON[3].

[1]: https://en.wikipedia.org/wiki/Canonical_XML

[2]: https://en.wikipedia.org/wiki/XML_Signature

[3]: https://www.rfc-editor.org/rfc/rfc8785

1 more reply

Dylan168071y ago· 1 in thread

> While Keccak doesn’t suffer from the length-extension attacks that HMAC is meant to address, the phrase “simply prepending the message with the key” carries a lot of assumptions about key length and key formatting with it.

A lot of assumptions, or just that it's fixed length?

adgjlsfhk11y ago

also that the key is never used for anything else (which could allow for unintentional collisions)

jgalt2121y ago

At one point do I need to be concerned about this? i.e. have someone review hash generating code sites.

- 100s of hashes?

- 1000s of hashes?

- 1,000,000s of hashes?

j / k navigate · click thread line to collapse

73 comments

50 comments · 10 top-level

layer81y ago· 8 in thread

> The issue we run into here is ambiguous encoding.

oconnor6631y ago

thequux1y ago

It must be a length prefix or you're still vulnerable to length extension attacks. Suppose you had encoded "foo3"; the attacker could easily extend that to "foo3bar7".

2 more replies

layer81y ago

It depends on the use case. If, for example, you want to hash a username–password pair, I find

    write(toUtf8(username));
    write((byte) 0xFF);  // never occurs in UTF-8, hence unambiguous separator
    write(toUtf8(password));

to be most straightforward and parsimonious, and the assumption is maximally local.

2 more replies

hamandcheese1y ago

Does that mean the random suffix is effectively a secret? Otherwise an attacker could construct inputs with the separator, achieving the same thing as a simple separator, right?

1 more reply

kragen1y ago

akira25011y ago

If your data can't contain a NUL byte then you could just XOR all the bytes of the message by the separator character byte itself.

layer81y ago

If your data can't contain a NUL byte you can just use NUL as the separator.

dfox1y ago

That actually makes more sense than the usual opposite approach of using over-long encoding for NULs in the data.

krackers1y ago· 8 in thread

axoltl1y ago

gopher_space1y ago

The problem with repeating "don't roll your own" is that you're also going to cock up implementing someone else's solution.

viraptor1y ago

You don't need to be proficient at cryptography to be aware of the common attack classes and reasons we use prepackaged things like NaCl before going low level.

nick2381y ago

tialaramex1y ago

Do you have any hard evidence that somehow mandating a "cryptography course" fixes this?

plorkyeran1y ago

reubenmorais1y ago

Obviously you can't mandate a high quality course into existence, but I definitely good value out of having it in the required curriculum.

withinboredom1y ago

I've seen crypto papers (and cited!) that are worse than this advice. Not sure what you are trying to say here.

tptacek1y ago· 7 in thread

Aachen1y ago

layer81y ago

1 more reply

Vecr1y ago

There's no point intentionally making your implementation weaker than it has to be.

delusional1y ago

The complement is also true. There's no point in making it stronger than it has to be.

1 more reply

vintermann1y ago

Yes, it was conspicuous to me that for the two other examples, they mentioned "this has been used in a real attack!", but for key derivation functions, they didn't say that.

Sure, you should still use Argon2.

FreakLegion1y ago

I'd draw a line at the post's NIST-adjacent advice.

With enough rigamarole can you get these into a government office? Probably. But scrypt and yescrypt (the default on most Linux systems) already fit the bill, so just use one of those.

tptacek1y ago

Not using a real password KDF is a big issue. Using the wrong one, not so much.

1 more reply

trainofbit1y ago· 6 in thread

galdor1y ago

tptacek1y ago

Dylan168071y ago

You have to be careful to toss around gigabytes, but what's unreasonable about 64MB? You should only be running about one per core, right?.

lmz1y ago

Some people run more threads than cores and if they do the KDF in the same thread you can see how that will end poorly.

1 more reply

whizzter1y ago

A highend consumer RTX 4090 has 16000 CUDA cores and 24GB of memory, that's 1.5mb of memory per core.

cdelsolar1y ago

playingalong1y ago· 4 in thread

The article is not that different than "don't invent your own cryptography".

It's hard to understand for non-crypto specialists. It uses notions which are unknown to most programmers like MAC or other *MACs.

So not sure who is the target audience for this.

inopinatus1y ago

olliej1y ago

tptacek1y ago

Trail runs a cryptography practice that does assurance work for cryptography engineers. They're not writing to encourage randomly-selected HN readers to design their own cryptosystems.

yarg1y ago

It was simple enough for me as a non-crypto guy - in fact, it seemed mostly obvious.

blauditore1y ago· 2 in thread

I'm no expert and a bit tired, but: Is the problem around hashing password + salt for a key just about the fact it can be brute-forced with enough recources, or did I miss something?

lilyball1y ago

tialaramex1y ago

But as Thomas points out, this is almost certainly not a problem you should actually care about.

1 more reply

hiddencost1y ago· 2 in thread

Oh fun. They actually give some erroneous advice.

Proto bufs don't guarantee consistent serialization.

conradludgate1y ago

JoachimSchipper1y ago

It's not ideal, but it's not a cryptographic risk.

theamk1y ago· 2 in thread

It's interesting that using JSON encoding for all messages eliminates many of the problems.

ambiguous encoding? Nothing ambiguous about JSON, you don't even need any separator. Or merge them into json array.

length-extension attacks? appending non-whitespace to json makes it invalid (for sane decoders at least)

skybrian1y ago

Yes, but there are lots of ways to get different encodings (and therefore hashes) for the same data: whitespace, JSON object key order, and the different ways of escaping characters.

If you want a unique hash (like, for a hash table lookup) then you'll need to sort the keys of every object and use a particular implementation of JSON.stringify.

(Also, what do you mean by "you don't even need any separator?")

magicalhippo1y ago

XML has a complex solution for this, canonicalization[1], to enable signed XMLs[2]. Seems some folks are trying the same with JSON[3].

[1]: https://en.wikipedia.org/wiki/Canonical_XML

[2]: https://en.wikipedia.org/wiki/XML_Signature

[3]: https://www.rfc-editor.org/rfc/rfc8785

1 more reply

Dylan168071y ago· 1 in thread

A lot of assumptions, or just that it's fixed length?

adgjlsfhk11y ago

also that the key is never used for anything else (which could allow for unintentional collisions)

jgalt2121y ago

At one point do I need to be concerned about this? i.e. have someone review hash generating code sites.

- 100s of hashes?

- 1000s of hashes?

- 1,000,000s of hashes?

j / k navigate · click thread line to collapse