It's straightforward and effective when password hashing happens client-side (like local encryption for a bitcoin wallet), but using scrypt or bcrypt server-side at effective difficulty levels will either destroy your L1/L2 cache or exceed your CPU budget.
In practice, from what I've seen, this means scrypt is never used, and bcrypt is used with a work factor that's tuned low. I think this is just a fundamental asymmetry, and that at the end of the day people use bcrypt so that they can say "we used bcrypt!" and get a nod when their entire user database is compromised, even though it provides very little value the way it's deployed.
Actually the Password Hashing Competition has shown this asymmetry can be exploited to the server's advantage: one of the ideas being studied is called server relief.
For example instead of storing the scrypt hash on the server you store the SHA256 hash of the scrypt hash (and the salt alongside), and ask the client to send the scrypt hash to authenticate. This way the per authentication cost to the client is expensive (one scrypt hash) but the cost to the server is cheap (only one SHA256 hash).
Attackers cannot take shortcuts. They cannot send random scrypt hashes to the server while skipping the scrypt computation step, because the space of possible scrypt hashes is just too large to brute force.
The only downside of server relief is that you need custom code on the client-side to do the computation. So it works with javascript in browsers, but not with traditional clients like an IMAP or POP3 mail app.
Edit: @__david__: yes you could SHA256() only the hash part of the scrypt hash, as opposed to the hash & salt parts. It is an implementation detail --not really important. And regardless of how you do it, if the DB get stolen the SHA256 hashes are useless to authenticate.
Edit: @espadrine: no, breaking a SHA256(scrypt()) hash is no easier than breaking a scrypt() hash. Server relief with SHA256 introduces no weaknesses. It is strictly more secure than standard scrypt hashes.
Edit: @plantbased: I recommend against MD5(scrypt()), because it is likely preimage attacks will be found on MD5 in the near future, so someone stealing your database of MD5(scrypt()) hashes will be capable of computing an arbitrary scrypt hash that authenticates to a given user. But you could certainly do MD5-crypt(scrypt()) with MD5-crypt being the salted & iterated version by PHK.
Edit: Thinking about it more, I'm guessing it is for the case where your DB gets stolen. With the salted hashes, attackers can't just send the direct scrypt hash down to the server to get authenticated.
You may be able to detect this with testing, but not necessarily. And yes, it'll probably happen sooner or later, given the nature of JS compilers.
I don't know if there's a method that avoids this problem that still has the advantages that you mention.
IMHO While MD5 has a fraction of the keyspace of SHA2, it's still a very hard problem to reverse it and intuitively it seems this might provide a huge improvement over salted and stretched (multiple rounds) of md5 on the server.
Thanks again.
That, and the major reason we do hashes even over TLS: if the server gets cracked and the attacker has access to the stored hashes, breaking a SHA256 is trivial (ie, brute-forcing b when knowing s = SHA256(b)). Then, the attacker can assume the identity of any individual whose SHA256 s got broken, by sending the bcrypt b directly from a crafted client, without having to find p where b = bcrypt(p).
2. Even with a lower work factor, bcrypt is drastically better than salted SHA.
3. Companies with auth loads so high that they really need to tune bcrypt should also be using dedicated auth servers, which (a) improve security in other ways and (b) allow for a lot of flexibility in handling work factors.
Isn't that a security risk of it's own though? Wouldn't creating a rainbow table for (non-salted) passwords hashed with the default bcrypt settings then make sense again?
From the article, even 74kh/s is going to tear through a lot of passwords, but with a secure element in the mix, offline attacks go nowhere.
Facebook has 900M monthly users, 30M per day and probably most of those are cookied in. Even if not that's only 350 authentications/sec...
You need to make sure your login system can handle malicious clients, brute-force attempts from individual IPs, or against particular accounts, dropping back to CAPCTHAs or some other means of resource control when things look bad. This isn't a particularly easy problem to solve, and if you do it incorrectly you open yourself and your users up to a denial-of-service.
IMHO, some of the new fancy password-based authentication protocols, like AugPAKE, which do hashing client-side, might ease this problem... but they're still stateful (SYN floods anyone?) and therefore still depend upon good DoS protection.
Edit: Oh poop, you're djb.
The systems involved actually doesn't tend to handle this many in reality (the ISP in question restrict it to some 20-30k/s to our system). These systems are starting to be better at supporting keeping state through restarts, but there can still be pretty impressive spikes of authentication traffic if the wrong system is rebooted (or crashes!).
To the larger point, I do agree that some of this starts to look to be a bit of security theater. If a user picks a laughably weak password, and the server is completely compromised, it doesn't much matter what password storage scheme you used, and on the other hand if the user picks a great password, especially if he never reuses it, the consequences of losing that hash is limited.
And using scrypt with a decent work factor isn't 100 times harder, its tens or hundreds of thousands of times harder.
Additionally, I think one of the benefits of scrypt is that it is specfically designed to deter GPU or FPGA attacks because it is memory intensive instead of just CPU.
Hardware is cheaper than trust, or litigation.
(And there's always rate-limiting)
In the time it takes for your users to give up on the hourglass, assume their login request is hung, and start double-clicking submit, you have now DDoS'd yourself. That's about 750ms I think...
Also, try this sometime -- clear your cookies and try to login to Google, or Twitter, or Facebook. Time how long it takes before you see the first byte of a session cookie. Hint: Not Long Enough.
Password hashing is embarrassingly parallel. The botnets don't cost the attackers anything to hash your password on 100,000 CPUs all at once, after they've decided to target you. Moore's law? Also not helping.
We need something better. I'm working on just that, I'm sure it will have more than it's share of naysayers, but I'm giving it my best.
I can probably count on one hand the number of times I've seen an application use bcrypt or scrypt and define their work factors. The common case seems to be sticking to the defaults that their chosen library ships with. While these may be somewhat less than ideal, it's universally true that these schemes are stronger than salted passwords fed to a fast cryptographic hash.
Perhaps this is the real issue: lack of sufficient budget for CPUs that can reliably get the job done is a reflection on the priority organizations place on information security.
Comparatively, if they hit with 100 req/s for normal front-end data, caches can easily handle that. So really, adding these types of login checks to any publicly available login page ends up creating an on-demand DDOS target.
From the password cracking side, the asymmetry comes from the fact that an organization has to have X amount of server resources to service Y number of requests, but an attacker can utilize the same X amount of server resources to service just one thing, breaking the exact password they are looking for. This is further exaggerated by the fact that generally for an organization those X server resources also need to more than just service logins (like actually serve content or API data).
Few downsides that immediately come to mind:
- Some people run NoScript or have niche devices that mangle JavaScript (no login for you!)
- The client now has to generate the salt. So you'll want the client to be using a modern browser which supports window.crypto.getRandomValues(); (so IE 11 or newer)
- Low power (either electrically or performance wise) might struggle with the work factor and that might reduce the user experience (the browser might become unresponsive)
- Changing the crypto function is now quite annoying/work intensive. Effectively the client will now need to generate the old hash and the new hash, and send both. You'll need to check the old, check the new, and store the new if either are correct.
- There are some cross-site-scripting and forged request concerns. But those exist for all login pages frankly.
- You'll still need to do SOME work on the server to convert the raw client hash into a storable version (in case of compromise) so they cannot just re-send what is in the database for login (essentially the hash would be a plain text password for all intents and purposes). You could reduce the work factor, but not remove it entirely.
Essentially trusting a browser is a bad bet because too many browsers suck. I'd prefer to put my trust in a more known quantity, the web-server, even if it costs me a few cent in usage than do the extra dev', extra testing, and extra tech' support required to get browsers to do it.
Unfortunately there are still a lot of IE 6-8, Android 2.xx, and so on users who drag everyone down.
Moxie seemed to hint at this problem by mentioning local bitcoin wallets--client-side hashing only makes sense if you're not sending the hash anywhere.
To prevent this, you still need some transformation on the server between the login form and the database, which means you're back where you started.
What ever work factor downtuning you need to do to conserve server resources may well be dwarfed by the downtuning you need to do to get bcrypt to run in a reasonable amount of time on a mid range smartphone.
I think salting offers pretty clear benefits against other attacks if you have more than one user.
Edit: The point I'm trying to make here is that it is N_USERS times better to have a unique salt per user, regardless of the hash. One should use bcrypt/scrypt/pbkdf2 with unique per-user salts.
If you knew that a hash was used many times over in a system, then you'd try cracking that password first to get access to the most accounts.
With salting you don't get to prioritize, or compare to previous information trivially.
Your argument is about protecting trivial passwords. It's quite common for people to independently come up with the same crappy passwords - the adobe "crossword puzzle" leak was a particularly interesting example of that.
My argument is about protecting moderately good passwords. If you use, for example, PBKDF2 with a fixed salt and have a million users, an attacker can try close to a million times as many passwords vs random salts.
You can still prioritize cracking with salted passwords in many cases. Say you had a dump of hashes form hackernews, and they were bcrypted with random salts. You'd go after the admin accounts first, then the well known accounts with a lot of influence, then everybody else.
Of course salting is better than no salting, but its orders of magnitude less important than choosing a good hashing mechanism. Luckily, the good hashing mechanisms do the salting for you so fixating on salting really just serves as a proxy for letting us know that you don't understand the attack vector.
Yes, having a good hash in the first place is more important, but salting is right up there.
Even if you're using a "hard" hashing algorithm, you're significantly reducing the amount of computations an attacker would have to go through by forgoing salts. Especially considering that salting has very, very little overhead.
How can it "salt for you" without you providing the salt?
This question is operating on the presumption of a unique salt per user.
This is absolute drivel.
Salts prevent checking an attempt against all the hashes in parallel.
That is to say if the attacker has a dump of 1 million salted hashes, they have 1 million times the work to do compared to 1 million unsalted hashes.
Just from the added password length alone it has to be "way more" secure, right? Would love some professional crypto opinion on this.
The salt is generally known to the attacker. The point of the salt is that it’s unique for each password hash, making it impossible to run crack attempts on every password at the same time; for a password hash to be checked, the salt has to be retrieved. This means they’re usually stored together and therefore also compromised together; keeping them apart is rarely useful (kind of like peppers, which is what you might be thinking of; they’re per-application rather than per-hash, and their purpose is to be unknown to the attacker, which is equivalent to encrypting things with a per-application key).
If N = 1000000 and it's just SHA("somesalt65634734" + password) then the attacker is just about guaranteed to be rolling in tons of cracked passwords in no time. Whereas with bcrypt or a unique salt per password you don't get that massive speedup by a factor of N.
Lets assume the hash function being used is SHA2-256.
The input block size there is 512 bits (64 bytes).
So if adding the salt pushes the total length of the hash input past 512 bits then it doubles the runtime.
I do not see any particular advantage there over simply running the hash function twice.
Term people are using for a secret is pepper.
H(pepper | salt | password) type of thing.
Imagine a service which just stores hashes of every password that it can find from everywhere. Any site that wants can check whether a given password is in use..somewhere..and reject it if it is. Any site that wants can add more passwords to the list.
Internally the service would work from a bloom table designed for a fixed false positive rate. (You can actually layer bloom tables in front of bloom tables with parameters to make this possible.) So even if someone compromises that site, there is no direct record of the actual hashes for passwords.
Basically let sites say, "You have to choose a unique password, and do not reuse a password that anyone has used elsewhere!"
If the service and your site were both compromised, then it would be possible to match up incoming hashed passwords to users by timestamp. But even so, the fact that no passwords were reused will hopefully make cracking the hashes harder.
So..probably a bad idea. :-(
Some subset of your users will inevitably use the same terrible password as another user. Or one user will create multiple accounts using the same strong password. These passwords should appear unique in storage.
You still need salt, and that salt must be unique for each password. That's the peril of PBKDF2 password storage: I've seen applications completely botch salting by using one for all users. Bcrypt and scrypt avoid this common error by salting internally.
At least that's my understanding... I am not a security expert! :)
In the "user with multiple accounts but one strong password" scenario, for instance, grouping by password would highlight which users are inclined to use the same strong password with multiple accounts, perhaps on other web sites. Multiple accounts would give an adversary more identifiers to use when cross-referencing other leaked databases or the web.
For instance in the first case where hash = md5('deliciously-salty-' + password)
I'm not sure how easy it is for an attacker to reverse the hash if they don't know the salt. (I understand you need to assume the attacker has the salt, but just posing a constrained hypothetical here)
Attacking md5 is not really that hard, nowadays - the article mentions 180 billion tries/second.
Yes, it requires a rig with 25 GPUs, but this is almost the perfect use for the "cloud" - you can rent those for a few bucks just for a specific attack. There are even dedicated services nowadays, with no programming knowledge required - you just submit the hashes, and out come the found passwords.
If all you have is a hash and know the password you still dont know if it's
md5sum('salt:' + password) or
md5sum(password + '-salt') or
md5sum('salt+' + password + '=salt')
etc
I suppose you'd need to brute force a large key space and look for any results which contain your known password. Does that sound right?
Consider an alternative: randomly generate salts with a constant password. It's the same amount of information, and it is obvious that it does no good to ask for a password if there is only one password that everyone shares. All you need is the dynamic part, and you have access. The dynamic part is the security and the data, and everything else is infrastructure that can be known simply by probing.
This is just how information processing works. You have a static 'language' and a dynamic 'meaning', and if you don't know the language, you can't possibly get the meaning -- but this means you can't even use the site at all (even with a password.) So obviously the 'language' has to be public so it can be known by the people who wish to communicate with the site legitimately.
He's simply saying that a salt isn't enough to keep your passwords secure - you should also be using a slow hashing / memory intensive hashing function.
This is dangerously incomplete reasoning. Salts real value is that if multiple password hashes are Leaked simultaneously then each must be cracked individually. In the Adobe leak password hints were included, which allowed for very easy guessing of passwords, and because there were no salts even if you had no hint if someone else had the same password and an obvious hint you were cracked.
If a database is popped and people know John123's password, and his hash is the same as Worklogin456's hash, then they know my password.
Iterative hashing with per-user salts has been a known pattern. Practical Cryptography covered it in the first edition. The relevant parts should be derivable for most engineers. There's no reason there should be any confusion or crazy blog posts.
- Rolled my own password library (linked elsewhere in this thread)
- Rewritten an existing AES-CBC + HMAC-SHA-256 library to use a different
underlying driver (mcrypt is basically a mummy at this point)
- Did both of these things in PHP, which a lot of people assume
is inherently incompatible with security
However, everything I've done in this vein has been passed before people who are far smarter than myself, to which I received no complaints. And I constantly seek out criticism from people who understand crypto too. (Aside: Most crypto people are dicks. But they mean well.)However many people have reviewed your code, far more people have reviewed libraries like bcrypt!
For example, I'm writing an app that requires a password setup in Python so I was investigating this issue:
1. Passlib - https://pythonhosted.org/passlib/ - "implementations of over 30 password hashing algorithms" - What? I'm not qualified to judge what's secure so I don't even care, I just want something secure!
2. python bcrypt - https://pypi.python.org/pypi/bcrypt/1.0.1 - OK standard bcrypt, supposedly that's secure - "Author: Donald Stufft" - who is Donald Stufft? Is the code good? I don't know C and I'm not qualified to judge!
3. python scrypt - https://pypi.python.org/pypi/scrypt/ - OK it's Colin Percival's code, it's probably good, "Author: Magnus Hallin" - did Magnus Hallin screw it up though? "Burstaholic on Bitbucket provided the necessary changes to make the library build on Windows." Eh. Did "Burstaholic" ruin the crypto? Who knows.
Also, in my case I needed something cross-platform, primarily for Windows. bcrypt worked out of the box. The scrypt library required building on Windows with OpenSSL development headers or whatever and all that - I just said "pass".
Nacl is fantastic and simple to use, it can't get much easier than this: https://libnacl.readthedocs.org/en/latest/topics/public.html
Something similar should exist for passwords - I just want to call compare(password, storedhash) or whatever and be done with it, I'm not qualified to judge the crypto. And it should be cross-platform.
Correct me if this is mentioned in the article, but based on several mentions of how salts were effectively useless, I think this was overlooked.
The user group really benefiting from strong password hashing schemes is the one between the two mentioned extremes. So if you really want to protect all your users and use simple password authentication you have to enforce strong passwords policies which is itself not trivial and will definitely badly annoy some users.
Sure. This is something I touched on very briefly near the end of the post. Password re-use is a huge problem and could easily be an blogpost of its own.
> On the other hand strong passwords will withstand an attack even if they are hashed with plain MD5 or SHA1 - at least right now, there are already some serious collision attacks.
For some value of 'strong' sure. If you're password is long enough, random enough, and of a broad character set then yes, an md5() of that may well be beyond the reach of most attackers. But that's not the sort of password most people, even security conscious people, choose. A preimage against MD5 would quickly change this too, and that's a risky gamble.
> The user group really benefiting from strong password hashing schemes is the one between the two mentioned extremes.
The best we can do is make attacker costs a function of the password that the user has chosen. If someone makes their password 'password' there's no degree of magic that's going to help. Mostly, the benefit comes if/when disclosure occurs and there's an active incident response effort.
> So if you really want to protect all your users and use simple password authentication you have to enforce strong passwords policies which is itself not trivial and will definitely badly annoy some users.
It's not a zero sum game. When a breach occurs, the rough points in time are something like:
1) Attacker compromises digests (let's just assume all are obtained in an instant, rather than exfiltrated over time) 2) Attacker recovers 1 password 3) Attacker recovers n passwords 4) Attacker recovers all passwords
Our goal isn't to eliminate these steps except hopefully the extreme of 4. Rather, it's to increase the time between 2 and 4, and make sure 3 is closer to 2 than 4.
On the other side of the coin, there's a few points in time:
a) Victim org. realizes digests have been compromised. b) Some credentials valid at 1 are made invalid. c) All credentials valid at 1 are made invalid.
If you accept my previous point that at some extreme there's nothing that can be done to prevent brute force, then you should also see that storing passwords in this fashion isn't meant to be a perfect defense. By minimizing the number of compromised accounts at a given point in time, we can maximize the benefits of our response. We can notify users and force resets, invalidating some set of credentials before they're recovered. We want b and c to be closer in time to 2 than 4, and minimize the value of n when we get there.
You don't need SQL Injection if you can directly read the database file, or can operate as the web server user, therefore if you're doing SQL Injection you don't have those privileges.
If you can HMAC or encrypt passwords (prior to hashing them) in the database with a key only on the web server, then that's an extra level of protection. You can't get that key with SQL Injection alone.
https://blog.mozilla.org/webdev/2012/06/08/lets-talk-about-p...
password_hash(base64_encode(hash_hmac("sha512", $password, $key, true)), PASSWORD_BCRYPT)
No we can't.
var hash = new System.Security.Cryptography.Rfc2898DeriveBytes(password,
salt, workfactor).GetBytes(32);
Using a NIST-approved function that's continuously supported by a large player seems like a good choice.Wound up having to implement my own version of PBKDF2 that lets you specify the HMAC algorithm.
Also, there's a large industry trend of, "let's 'pepper' our hashes too!" whereby they have a sitewide salt. This is a mistake. If you want to add a site-specific layer of security to your passwords, encrypting them makes far better sense than adding to the salt.
https://github.com/lizardhq/php_bcrypt_aes
http://blog.ircmaxell.com/2015/03/security-issue-combining-b...
[1] http://world.std.com/~reinhold/diceware.html
[2] https://firstlook.org/theintercept/2015/03/26/passphrases-ca...
[3] https://github.com/freedomofpress/securedrop/issues/180#issu...
ps. for those that use bcrypt and want to ensure a constant user experience on their server see https://www.npmjs.com/package/bcrypt-speed
Beside it makes rotation easier, and not having to remember multiple passwords, or having a password manager, etc. a thing.
Oh wait ;) Google and others are actually attempting to make this work with U2F:
https://support.google.com/accounts/answer/6103523?hl=en (yes it works with gmail. Right. Now.)
While it recommends using bcrypt, if you augment that with "or scrypt or PBKDF2" the same principles apply.
https://www.tinfoilsecurity.com/blog/store-data-securely-enc...
For anybody who has critical security needs, just use an HSM already. Sophos/Utimaco makes a nice one, and it is a nice foundation for password security.
Rolling your own login system, dealing with hash functions and checking passwords yourself is far too low level and dangerous for most apps (as this thread demonstrates).
Or should we still include salts in addition to scrypt?
If yes, only one global or many user specific or both?