What I have done in the past for this is to encode the messages as UTF-8 and separate them by 0xFF, since that byte value never occurs in UTF-8 encoding [0]. If the messages to be hashed are character strings, you have to decide on some encoding anyway in order to hash them.
[0] UTF-8 bytes always contain at least one zero bit: https://en.wikipedia.org/wiki/UTF-8#Encoding. Incidentally, if one wanted to create the UTF-8 equivalent of zero-terminated strings without reserving a character value (like NUL) as the sentinel value, one could use 0xFF for that.
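A minimal Python sketch of that separator trick, for anyone who wants to try it. The function name `hash_fields` and the choice of SHA-256 are my assumptions, and it assumes the number of fields is fixed by the caller (zero fields and one empty field would hash the same otherwise).

```python
import hashlib

SEPARATOR = b"\xff"  # 0xFF never appears in well-formed UTF-8


def hash_fields(*fields: str) -> str:
    """Hash text fields joined by 0xFF after encoding each one as UTF-8."""
    encoded = [f.encode("utf-8") for f in fields]
    # Sanity check: strict UTF-8 encoding never emits the separator byte.
    assert all(SEPARATOR not in e for e in encoded)
    return hashlib.sha256(SEPARATOR.join(encoded)).hexdigest()


# Moving a character across the field boundary changes the hash.
print(hash_fields("admin", "true") != hash_fields("admint", "rue"))
```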
I'm less sanguine than the authors about using Protobuf or CBOR for "canonical" serialization like this. I think it tends to "work until it doesn't". It's not what these formats were designed for, and you have to ask awkward questions like "is the order of struct fields guaranteed?" and "do integers always use the smallest possible representation?". This is more obvious for JSON, as the post points out, but every common serialization format I'm aware of has problems like this. I think we need a dedicated standard to do a good job of this, but I'm not aware of anything widespread. It's a surprisingly hard problem.
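To illustrate the "smallest possible representation" question, here is a rough sketch of a lenient base-128 varint decoder (the integer wire encoding Protobuf uses). `decode_varint` is hand-written for this example and is not any library's API; whether a particular Protobuf implementation accepts the padded form is something you would have to check.

```python
import hashlib


def decode_varint(data: bytes) -> int:
    """Decode a base-128 varint, least significant 7-bit group first."""
    value = 0
    for index, byte in enumerate(data):
        value |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:  # a clear high bit marks the final byte
            return value
    raise ValueError("truncated varint")


minimal = b"\x01"      # the shortest encoding of 1
padded = b"\x81\x00"   # a longer encoding that this decoder also reads as 1

same_value = decode_varint(minimal) == decode_varint(padded)
same_hash = hashlib.sha256(minimal).digest() == hashlib.sha256(padded).digest()
print(same_value, same_hash)
```

Two byte strings that a lenient decoder treats as the same integer hash differently, which is exactly the kind of thing a "canonical" format has to rule out.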
* Technically you only need the suffix for each variable-length piece, and you can omit the first one. But it's more complicated if the number of pieces is variable. (If you have two adjacent fixed-length pieces, and then you combine them, does the hash change?) This sort of penny-pinching is interesting to think about in a design that's going to come with a giant set of test vectors, but it's asking for trouble in an application doing something custom. This is another reason I'd like to have a standard here.
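For comparison, a sketch of the un-optimized version: every piece gets a length, and the number of pieces is hashed too. The 8-byte big-endian length format, the piece count, and the name `hash_length_prefixed` are assumptions for illustration, not a standard.

```python
import hashlib
import struct


def hash_length_prefixed(*pieces: bytes) -> str:
    """Hash pieces unambiguously by prefixing a count and each length."""
    h = hashlib.sha256()
    h.update(struct.pack(">Q", len(pieces)))  # how many pieces follow
    for piece in pieces:
        h.update(struct.pack(">Q", len(piece)))  # 8-byte big-endian length
        h.update(piece)
    return h.hexdigest()


# Splitting or merging pieces changes the hash, even for the same bytes overall.
print(hash_length_prefixed(b"ab") != hash_length_prefixed(b"a", b"b"))
```

This spends a few extra bytes per piece, but it sidesteps the question of what happens when two adjacent pieces are combined.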
write(toUtf8(username));
write((byte) 0xFF); // never occurs in UTF-8, hence unambiguous separator
write(toUtf8(password));
to be most straightforward and parsimonious, and the assumption is maximally local.
You don't need to be proficient at cryptography to be aware of the common attack classes and reasons we use prepackaged things like NaCl before going low level.
It's probably more useful to have a module within a course that discusses the current state of the art and teaches some history about how the methods were chosen (e.g. NIST's AES, SHA2/3, and PQC open processes). I think making it very obvious that there are extremely good, free tools out there would reduce the likelihood of someone DIYing some crap.
That said, I once spec'd using Ed25519 asymmetric signatures for webhooks sent out to customers, and later on one of our Elixir developers was complaining that the throughput was garbage. I was confused because https://ed25519.cr.yp.to/ boasts signing rates of ~27k/sec/core on very old hardware. Turns out they were using some "pure Elixir" library which had shit (over 1000x worse) performance. There wasn't any real surface area for attacks here, but there are plenty of devs who will blindly search package-manager-of-choice for an otherwise good encryption algorithm and get screwed. Not sure who to blame in that scenario.
My guess is that it doesn't help and might even make things worse because now they'll think "Don't roll your own" was an instruction to plebs who didn't take that one semester cryptography course at school.
Obviously you can't mandate a high quality course into existence, but I definitely got good value out of having it in the required curriculum.
Say I have this box that says Bcrypt, but scrypt might be better. I can either spend a bunch of time re-implementing scrypt, or I can shove Bcrypt in there and move on. If Bcrypt is sufficient, then I don't really care if scrypt would be "better".
Sure, you should still use Argon2.
With the caveat that I may be up to two years out of date here: Last I checked Balloon was a bad choice (only a research implementation) and Argon2 didn't meet the requirements (unapproved primitive in BLAKE2).
With enough rigamarole can you get these into a government office? Probably. But scrypt and yescrypt (the default on most Linux systems) already fit the bill, so just use one of those.
Not using a real password KDF is a big issue. Using the wrong one, not so much.
A high-end consumer RTX 4090 has roughly 16,000 CUDA cores and 24 GB of memory; that's about 1.5 MB of memory per core.
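Spelling out that arithmetic as a quick sketch: 16,384 is the published CUDA core count for the RTX 4090 (the comment above rounds it), and the 64 MiB per-guess memory cost is a hypothetical KDF setting I picked for illustration, not a recommendation.

```python
cuda_cores = 16_384              # published RTX 4090 core count
vram_bytes = 24 * 1024**3        # 24 GB of memory, treated here as GiB

# Memory available to each core if all cores work at once.
per_core_mib = vram_bytes / cuda_cores / 1024**2

# Hypothetical memory-hard KDF setting: 64 MiB needed per password guess.
kdf_memory_bytes = 64 * 1024**2
concurrent_guesses = vram_bytes // kdf_memory_bytes

print(per_core_mib, concurrent_guesses)
```

Under those assumptions only a few hundred guesses fit in memory at once, far fewer than the number of cores, which is the point of making a KDF memory-hard.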
It's hard to understand for non-crypto specialists. It uses notions which are unknown to most programmers like MAC or other *MACs.
So not sure who is the target audience for this.
This is of course bunk, because the boundary layer and level of abstraction matter, and the apparent target audience for this content marketing piece is any developer that might fall into the trap of assuming otherwise. The selection, integration, and configuration of cryptographic elements into an application carries as much significance for the strength of the resulting cryptosystem as the cryptographic qualities of the elements themselves, especially when considered by an attacker that seeks to drive a wedge into any gap available.
The article is obviously far from a comprehensive survey on the topic but does zero in on a few of the practical cases for hash functions, although you're not obliged to necessarily draw the same conclusions since (as the comments in these threads reveal) there are more alternatives than those directly discussed.
Eg it’s very to-the-point, doesn’t spend all its time talking about how the professionals are awesome and better than you, and gives actionable recommendations. Most “don’t roll your own crypto” articles don’t do that and just come off as being elitist, and don’t actually _help_ the reader.
If the user of your new Shiny Goat service used the password "ShinyGoat" then all the memory hard KDF shenanigans in the world won't help, attackers will guess "ShinyGoat", and that's correct, they're in.
If another user chose 32 random alphanumerics then it doesn't matter if you just dropped in PBKDF2 with whatever default settings because the attackers couldn't guess 32 random alphanumerics no matter what.
The KDF comes into the picture only for users who've chosen aggressively mediocre passwords. Not so easy that attackers will definitely guess them, not so hard that guessing them is impossible. Users who insist their "password" must be a single English word, or who insist on memorizing their passwords and so nothing longer than six characters is acceptable. That sort of thing. The attackers can guess these passwords, but they need a lot of guesses so the KDF can make it impractical.
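A back-of-the-envelope sketch of those three cases. The helper names are mine, and the guess rates used at the bottom (10 per second against a slow KDF, 10^12 per second against a fast hash) are hypothetical round numbers, not measurements.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def entropy_bits(alphabet_size: int, length: int) -> float:
    """Bits of entropy in a uniformly random string over the alphabet."""
    return length * math.log2(alphabet_size)


def years_to_exhaust(bits: float, guesses_per_second: float) -> float:
    """Years needed to try every candidate at the given guess rate."""
    return 2**bits / guesses_per_second / SECONDS_PER_YEAR


strong = entropy_bits(62, 32)    # 32 random alphanumerics
mediocre = entropy_bits(26, 6)   # six random lowercase letters

# Hypothetical rates: slow KDF vs. fast unsalted hash.
print(years_to_exhaust(mediocre, 10))
print(years_to_exhaust(strong, 1e12))
```

The strong password is out of reach at any rate, and a password like "ShinyGoat" falls on the first few guesses regardless; only the middle case changes with the cost per guess.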
That's just not a plausible scenario for a real world attack and therefore it should not be a focus for your attention. You should use a real KDF, but PBKDF2 is fine for this purpose. Any time you spend arguing about which KDF to use or implementing a different KDF, rather than fixing actual defects in your system's security, is a bad trade.
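For what "just drop in PBKDF2" can look like, here is a sketch using only the Python standard library. The iteration count, the 16-byte salt, and the function names are assumptions for illustration; tune the work factor for your own hardware.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 600_000  # assumed work factor, not a recommendation


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt per password
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest and compare it in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)
```

Store the salt and digest together; `verify_password` then needs nothing else besides the candidate password.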
Protobufs don't guarantee consistent serialization.
Using an encoding that (like Protobuf) has multiple representations for a message may cause you problems if you switch implementations - sha256(encode(msg)) might yield different hashes on different implementations of encode().
But the main risk is an encoding where a single encoded byte string has multiple interpretations (e.g. sha256(encode("admin", "true")) == sha256(encode("admint", "rue"))). Protobuf output can be unambiguously deserialized, and thus doesn't have that problem.
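To make the "admin"/"true" example concrete, here is a minimal sketch where encode() is assumed to be plain string concatenation (my assumption about the worst case; `naive_hash` is a made-up name).

```python
import hashlib


def naive_hash(*fields: str) -> str:
    """Hash fields by concatenating them with no separator or lengths."""
    joined = "".join(fields)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# Both calls hash the same bytes, b"admintrue", so the hashes collide.
collision = naive_hash("admin", "true") == naive_hash("admint", "rue")
print(collision)
```

Any encoding that can be parsed back into exactly one list of fields avoids this, whatever its other problems.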
Ambiguous encoding? Nothing ambiguous about JSON; you don't even need any separator. Or merge them into a JSON array.
Length-extension attacks? Appending non-whitespace to JSON makes it invalid (for sane decoders at least).
If you want a unique hash (like, for a hash table lookup) then you'll need to sort the keys of every object and use a particular implementation of JSON.stringify.
(Also, what do you mean by "you don't even need any separator?")
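A rough sketch of the "sort the keys and pin the serializer" approach, using Python's `json` module in place of JSON.stringify. `canonical_json_hash` is a made-up name, and this is not a full canonicalization: floats, duplicate keys, and Unicode normalization are all left unhandled.

```python
import hashlib
import json


def canonical_json_hash(obj) -> str:
    """Hash a JSON-compatible value with sorted keys and fixed separators."""
    text = json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


a = {"user": "admin", "flag": "true"}
b = {"flag": "true", "user": "admin"}  # same object, different key order

# Default serialization preserves insertion order, so the hashes differ.
plain_differs = (
    hashlib.sha256(json.dumps(a).encode("utf-8")).digest()
    != hashlib.sha256(json.dumps(b).encode("utf-8")).digest()
)
canonical_matches = canonical_json_hash(a) == canonical_json_hash(b)
print(plain_differs, canonical_matches)
```

Putting the fields in a JSON array, as suggested above, also keeps the "admin"/"true" boundary intact, since the quotes and commas are part of what gets hashed.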
[1]: https://en.wikipedia.org/wiki/Canonical_XML
A lot of assumptions, or just that it's fixed length?
- 100s of hashes?
- 1000s of hashes?
- 1,000,000s of hashes?