Why you should never use hash functions for message authentication (opens in new tab)

(blog.jcoglan.com)

171 pointsbakkdoor14y ago101 comments

101 comments

68 comments · 19 top-level

cbsmith14y ago· 7 in thread

This is a great essay on why you should never use a hash function for message authentication.

Except not for the reason the author thinks.

There are several problems here.

First, with SHA-1 for example, you have 64 bytes per chunk. That means you basically get a free ride on this problem for anything < 64 bytes. A lot of "application state" fits pretty well in 64 bytes.

Secondly, unless a message ends right on the 64 byte boundary, it is not nearly that simple. You have a bit of a problem, because the hash is padded, and when you add extra characters to your original string, that padding gets replaced with those values. So, it's no longer simple to just "keep going" from where you stopped.

Still, you can see how that leaves a distinct subset of cases where you'd be exposed. SHA-1, along with most secure hash functions, appends the length of the message the end of the source text before performing the hash function. That means that if you add even one byte to the string, you have now changed the last 8 bytes that were fed in to the "original" hash function. Oh, and your extra byte goes in before those bytes, so not only did you change those 8 bytes, but you shifted them down a byte.

So, no, it isn't nearly that easy to crack a SHA-1 based authentication, and yet, it is easy enough that you should totally NOT use them for authentication and instead use HMAC ; they are vulnerable to extension attacks, it's just not nearly as easy as this article suggests, and conclusions one might draw from this article (like you can solve this problem by feeding all source text in to the hash algorithm backwards) are likely ill founded.

It just turns out that cryptography is way more complicated, and even in terms of understanding weaknesses that arise from doing things wrong, you are going to get it wrong. Trust the experts, when they say it is a bad idea, but don't assume why it is a bad idea can be explained in a short blog article like this.

UPDATED: Added an explanation as to why it might be dangerous to just take this article at its word.

tptacek14y ago

In practice, attackers just guess the bit length of the message by writing the Ruby script that generates the 100 candidate messages at different expected bit lengths and feeding them to their fuzzer. The bit length issue is not a real one in practice.

In practice, length-extendable SHA-1 MAC functions are mechanically exploitable; they'll be discovered quickly by competent attackers who don't even have access to your source code, because attempts to break them require very little code and even less time.

It is a very, very bad idea to put off fixing a problem like this because you've been led to believe by someone on a message board that MD padding is going to be an obstacle to attackers. It isn't. The Flickr MAC break required MD padding, too; Thai and Juliano broke it just the same.

cbsmith14y ago

There seems to be some kind of impedance. I'm not suggesting you can avoid using an HMAC and get away with just a hash.

Let's be clear about what an HMAC is. It's pretty much what this article describes, but executed with a specific protocol that avoids exploitable weaknesses in naive approaches.

It is literally:

    first_pad = a_constant_block ^ key
    second_pad = another_constant_block ^ key
    hash(second_pad + hash(first_pad + message))

It's that similar. The biggest non-security consequence is that you have to call your hash function twice. If that is a problem, you have bigger problems. If you are using a secure hash to authenticate a message, you should always use HMAC.

> In practice, attackers just guess the bit length of the message by writing the Ruby script that generates the 100 candidate messages at different expected bit lengths and feeding them to their fuzzer. The bit length issue is not a real one in practice.

You're closer to addressing the real complexity involved in cracking it than the original article. I agree that in principle it isn't hard to break a naive hash-based digest authentication. I'm merely pointing out that it isn't simple.

Also... not sure why they'd have to use Ruby or why that is relevant...

> It is a very, very bad idea to put off fixing a problem like this because you've been led to believe by someone on a message board that MD padding is going to be an obstacle to attackers.

No. It's a very, very bad idea to ever go down this road in the first place. Start with a MAC. If you think you can be more clever than the collective expertise of the field, study cryptography and prove it, but until you've finished studying it, do what they say. History is littered with people who thought they knew just enough about cryptography that they could do things their own way without consequences (WEP anyone?).

In general I'm all for encouraging amateurs to compete with the pros, but cryptography is definitely one of those fields where it can appear deceptively simple to achieve your goals, and it totally, totally, totally is not.

1 more reply

pbsd14y ago

"Never use a hash function for message authentication" is such a simplistic view. The author takes a common hash function design (Merkle-Damgard), and somehow extrapolates that hash functions should be simply ruled out for authentication.

First, this may send the wrong message to the less-focused reader: "what, I should use block ciphers instead?". Luckily, HMAC is eventually brought up, which is a fine solution.

HMAC requires 2 calls to H, our favorite hash function. Certain applications may find the overhead to be prohibitively high. With non-broken hash functions (e.g., any of the SHA-3 finalists), we can use the so-called envelope authenticator: A = H(K||M||K), with some padding to separate K from M to keep security proofs happy. This is significantly faster for short messages, and short is the most common size out there.

cbsmith14y ago

> Certain applications may find the overhead to be prohibitively high. With non-broken hash functions (e.g., any of the SHA-3 finalists), we can use the so-called envelope authenticator: A = H(K||M||K), with some padding to separate K from M to keep security proofs happy.

I think it is safe to say that you are suggesting a very narrow context. If invoking a hash twice is prohibitive and you are looking at choosing amongst SHA-3 finalists as an alternative (particularly given that many of them are slower than SHA-2, and I believe they are all slower than SHA-1 on low cost CPU's where you might expect this constraint), you are already talking about a very narrow performance window (literally one iteration of Moore's Law covers the gap). You then throw in that people are making refined selection of hash algorithms taking in to account various security nuances, and I think you are dealing not only with a narrow context, but a degree of cryptographic expertise that you don't have to worry that anyone solving it would be consulting this blog or Hacker News in general. ;-)

threedaymonk14y ago

First, this may send the wrong message to the less-focused reader: "what, I should use block ciphers instead?". Luckily, HMAC is eventually brought up, which is a fine solution.

If the reader can't be bothered to read the article to the end, I hardly think it reflects on the author. Whilst it might indeed be a more concise article if it just said "don't use a hash function for message authentication, use HMAC", it would still miss the important final point about timing attacks, not to mention the journey of explanation about why you shouldn't just use a hash function.

2 more replies

jcoglan14y ago

Your point about padding and length bytes is spot-on. I actually left this out of the explanation, although it's interesting, because I felt it a useless diversion: something that's safe except in an easily describable subset of cases is still not fit for purpose when specially designed tools without these problems exist.

Point being, although hash functions might work most of the time, their general construction does not make them safe for this purpose. You can't say something is 'mostly secure' because it works 'most of the time'. The times is doesn't work will affect someone, and for them the system is not secure at all.

Plus, implementations with weird edge cases make for horrible debugging, especially when it comes to security.

cbsmith14y ago

> Your point about padding and length bytes is spot-on. I actually left this out of the explanation, although it's interesting, because I felt it a useless diversion: something that's safe except in an easily describable subset of cases is still not fit for purpose when specially designed tools without these problems exist.

I'm with you except for: "a useless diversion".

From the description in your article, I might think that if I have really short messages, I'm okay. I might think that if I reversed my strings, so any additional state is prepended instead of appended, I'd be okay. I might think that if I truncate my hashes (say, used the first 256 bits of SHA-512), I'd be okay. I'd be really, really wrong.

> Point being, although hash functions might work most of the time, their general construction does not make them safe for this purpose. You can't say something is 'mostly secure' because it works 'most of the time'. The times is doesn't work will affect someone, and for them the system is not secure at all.

That's the wrong point though. It's not that hash functions "might work most of the time". Uncompromised secure hash functions work all of the time for their intended purpose. For this other purpose, they work exactly none of the time.

The aspects I described that might make you think SHA-1 actually has this covered except for a few corner cases are trivial challenges for an attacker to overcome, as are all of the above "remedies" I mentioned (all of which resemble idiot moves that have in fact been made by naive developers who thought they could be clever and save themselves... I don't know.. maybe some CPU time?).

I do think the post was some great writing that illustrated an idea quite clearly. I think it could be quite informative for a lot of people. I just think what you wrote would benefit from being framed appropriately. I'd suggest a preface... something like, "HMAC's were invented to address the fact that you can't just use a hash for authentication. Somehow people miss this point, possibly because HMAC's seem like a trivial layer on top of a secure hash. In practice, there are a surprising variety of issues. To give you an idea of the kinds of problems you might run in to, and hopefully a sense that you can't just naively tweak your use of a hash to address them, I'm going to simplify a common problem with using secure hashes for authentication that has probably never occurred to you. I assure you that this problem is actually more complex, and there are many more problems."

1 more reply

bemmu14y ago· 7 in thread

For the string comparison, could you really use that in a timing attack. Wouldn't the difference between comparison taking one char longer be measured in nanoseconds, while the overall network lag would be milliseconds?

cjg14y ago

Absolutely it is possible, the smaller the difference in timing, the more measurements you need, but statistics is a powerful tool.

jcoglan14y ago

You'd think that, but it is possible. Google "timing attacks over the internet".

yuliyp14y ago

I don't see string comparisons on tiny strings as feasible from either the Crosby/Wallach (suggests you need hundreds of nanoseconds of difference in execution time to be detectable) paper nor the Brumley/Boneh paper (which talks about a specific vulnerability in modular exponentiation which ends up revealing key details, not string comparison).

doomslice14y ago

This is just a guess on how you would do it... but it would involve making many many many requests until you could establish a mean against a given confidence level. Then you could compare that mean for each starting character (once again, many many many requests for each character). If the mean of any of them (or heck, even the distribution of the timings) differ than the others, then you know you you hit the right mark.

This assumes that 1: you can reliably measure at that small of a time difference and 2: you can submit enough requests without being detected.

lobster_johnson14y ago

Yes, I don't see that being detectable for things like web-based APIs. For these string lengths (eg., HMAC with SHA-256), the network lag would add sufficient randomness that you would not be able to measure any difference in timings in the string comparison.

tptacek14y ago

It turns out that it is possible, over a WAN, on some frameworks. The rule of thumb is, if the comparison is effectively being done by libc's memcmp(), the timing attack is very difficult to do even across a single switch; however, some platforms don't drop to memcmp and instead compare each byte explicitly. These are timeable.

If you're wondering, "how do I detect nanosecond differences over a network when my measurement will be swamped by other things happening on the target, the network, and my host", the answers boil down to:

* You're going to move your measurement code as close to the drivers as possible, and fix interrupt handling so that interrupts don't confound your measurements.

* You're going to get yourself on the same hosting provider as your target; for instance, a good chunk of all target apps can be attacked via Amazon EC2 for not very much money.

* You're going to take lots and lots and lots and lots of measurements and then use high school statistics to process the results.

jcoglan14y ago

I realize it seems impossible, but it's been done, repeatedly. https://www.google.co.uk/search?q=timing+attacks+over+the+in...

exit14y ago· 6 in thread

what's wrong with hashing (message + secret) instead?

tptacek14y ago

The high-level answer is that construction is vulnerable to practical attacks based on hash function weaknesses, but that it's significantly harder to attack than secret-prefix MAC functions. But once you understand how secret-prefix MACs are broken, you quickly see the logic behind HMAC.

more_original14y ago

For example, if you already happen to know a collision hash(m1)=hash(m2), where m1 and m2 have full block size, then you also get a collision hash(m1|key)=hash(m2|key), just as explained in the article. So, one could forge messages, which should clearly be counted as a weakness, even if it assumes knowledge of a collision.

tomp14y ago

Not according to http://news.ycombinator.com/item?id=4089076 (SHA1 appends the length of the original message to the message).

1 more reply

jcoglan14y ago

Relying on the way you happen to combine data, instead of using a function that's designed for authentication and has baked-in a safe way to combine the inputs, is a bad idea. "What if $EDGE_CASE_OF_INAPPROPRIATE_CRYPTO_FUNCTION" is never a good question to ask. Just use the right tools in the first place.

exit14y ago

> "What if $EDGE_CASE_OF_INAPPROPRIATE_CRYPTO_FUNCTION" is never a good question to ask. Just use the right tools in the first place.

that's not an intelligent attitude.

understanding where an edge case breaks down is still illuminating, regardless of whether i use hmac in the end.

rlpb14y ago

> Just use the right tools in the first place.

There are two reasons to ask the question.

The first reason is because the questioner is looking for an excuse to use something different. In this case, your answer is the right answer.

The second reason is because the questioner wants to learn more about how cryptography works, for educational value. In this case, your answer is not helpful.

spicyj14y ago· 5 in thread

> The easiest way to defeat this attack is, instead of directly comparing two strings, compare their mappings under a collision-resistant hash function.

Is this really the best way to compare strings without giving away timing info?

tptacek14y ago

No. It's A way to do it, but not the fastest or the simplest. An easier, faster way to do it is what Rails does (after Nate Lawson via Coda Hale told them how): accumulate the XORs of each offset of both strings and then verify that they add up to zero.

blutonium14y ago

Yeah, the Rails code is pretty easy to follow too: https://github.com/rails/rails/blob/f3e1b21ca91afbd97f33d1e5...

marshray14y ago

> accumulate the XORs of each offset of both strings and then verify that they add up to zero

If you do that in a language with an optimizing compiler or JIT runtime you'd better look at the generated code and/or do your own timing measurements. A sufficiently-smart optimizer could recognize that the first difference encountered allows for early loop termination.

reitzensteinm14y ago

What's wrong with simply doing a traditional compare, but not exiting early when it's found the two strings don't match?

I see how the xor method works, but not why it's superior.

Also, it would seem that you'd be leaking information about the size of the other string with either method (whether you exit early or not when you run past the length of one of the strings) - not a problem when comparing hashes presumably, but is it never a concern?

I don't think you'd be leaking that information if you hashed both strings and compared the hash, because it wouldn't stop getting faster when the shorter string gets shorter.

1 more reply

jcoglan14y ago

Not the best, but most string comparison methods are designed to fail as soon as possible, which is not appropriate for this use case. Compare the entire input, or use some misdirection to destroy the relationship between correctness and time.

This is just one of the easier ways. It's effective and easy to code.

klodolph14y ago· 4 in thread

> Finally, you should make sure your application does not exit early if the tag is invalid. You should do all the data processing you would normally do, just short of modifying the database, and check the tag last. If you return early you risk another timing attack.

What kind of timing attack is that? In order for there to be a timing attack, there has to be a difference in the timings.

1. You can either process the data, check the authentication code, then commit.

2. Or you can check the authentication code, process the data, then commit.

I don't see any attacks on #2 that couldn't also work on #1.

jcoglan14y ago

The timing attack is if you check the tag, that fails, and then you don't do any further request processing. This shortens the request time. It depends quite a lot on what you're actually doing with the message, but in general you want to leak as little info as possible about what's happening during any crypto-related process.

tomp14y ago

If your authentication and processing steps are distinct and independent, then you're not doing any good by not returning early (immediately after your authentication process). The only thing that the attacker can learn from the timing of the response is whether authentication was successful or not, what any useful API should convey anyhow.

The only good thing that "authenticating last" does is that it prevents the attacker from issuing lots of requests sequentially, thus brute-forcing your authentication, but this should be solved in another way, without slowing down legitimate users and overloading your servers.

majormajor14y ago

But it only tells them that it failed, not that they got closer, like with the string-comparison based one. What does that gain an attacker if you already provide other feedback about the request being invalid? And not doing so would result in a bad user experience if you have users run into a bug somewhere and it fails silently so they don't know that nothing was actually done.

2 more replies

klodolph14y ago

You can compare success versus failure.

Success: check authentication, process data, commit, return 2xx status.

Failure: check authentication, return 4xx status.

The reason that you don't get any information from the difference in timing between the two branches is because you already get that information from the status code.

Now on the other hand, if you use naive string comparison, different failure branches will take different amounts of time. That is a security hole, and it's not what we're talking about here.

Mithrandir14y ago· 4 in thread

Number one thing I learned from the Coursera class: don't build your own crypto.

cjg14y ago

I've heard this said many times before and I agree. However, using crypto libraries does not solve the problem of vulnerabilities through cryptography misuse. For example, keys must be stored correctly, algorithms often need initialising in the correct mode for your specific application, IVs must not be repeated, and, as shown in this article, hashes should be used in specific ways to work correctly.

There are many ways to fail with cryptography and avoiding them all takes considerable expertise. Using crypto libraries does not solve this problem.

tptacek14y ago

This is the entire point of high-level crypto libraries, like Guttman's libcrypt and Google's Keyczar. So, yeah, don't use OpenSSL or javax::crypto or whatever .NET calls it; but, do consider using something like Keyczar, or, better yet, just use PGP/GPG to store data at rest, TLS for data in motion, and be done with it.

2 more replies

skibrah14y ago

Agreed- you can take an intro course in crypto and still find many ways of getting it wrong. My intro to crypto professor reiterated this fact almost ever class and noted that even seemingly small changes to a protocol can render it vulnerable to attack.

waterlesscloud14y ago

Just finished the class today myself, and yep, that's the most repeated advice.

The class repeats starting Monday, if anyone is interested in learning the basics of how all this stuff works.

https://www.coursera.org/course/crypto

theunixbeard14y ago· 4 in thread

The title is sort of linkbait, as in fact what it should be is "Never use hash functions vulnerable to extension attacks"... (And most common ones are) With that said, this stuff is pretty cool and after reading that the author learned all this in the Coursera Cryptography class I decided to sign up for it. (Starts June 11th)

klodolph14y ago

Although really, the maxim in the title is true. You should never use hash functions for authentication, you should use authentication codes.

As an analogy, you should never use a hammer to put in screws. That's not linkbait just because you have a tool that's a hammer on one end and a screwdriver on the other end.

cobbal14y ago

I think the title is fairly accurate if you read it as "never use a hash when you need a MAC". MACs have nice provable security properties that you get for free, so you don't get surprises like extension attacks.

quotemstr14y ago

You can learn all that and more just by reading Applied Cryptography.

tptacek14y ago

People that build crypto after reading Applied Cryptography are doing a fine job paying for my kids college education, so I agree with you and encourage everyone to do likewise.

If you don't happen to like my kids, well, first, screw you, and secondly: buy _Practical Cryptography_ or _Cryptography Engineering_ (really the same book) and burn your copy of "Applied".

1 more reply

jebblue14y ago· 3 in thread

>> This fact means that an attacker can determine the first correct character of the tag by submitting requests to a signed URL with a different first character in the tag each time, and stopping when the request takes a little longer than usual. After guessing the first character they can move onto the second, and so on until they’ve guessed the whole correct tag.

Wouldn't this attack be eliminated by using iptables rate limiting to reduce the attack window of opportunity?

tptacek14y ago

Wouldn't this attack be better eliminated by fixing the timing leak that is potentially allowing people to guess valid MACs on packets?

The reality is that you probably can dick around with things in your deployment and your app to make timing attacks prohibitively expensive/annoying; if you understand that you're not eliminating the timing leak, but rather masking it, you can take advantage of the additional measurements required to unmask the leak to give your MAC enough of a buffer to last for its whole useful lifetime.

But when you do this, you're really playing on the razor's edge of what we currently know about side channel attacks on crypto, and you're probably going to end up putting more effort into your workaround than you would in just fixing the underlying bug.

CamperBob214y ago

I guess the idea of just blacklisting the client's IP after the first 1,048,576 failed attempts is too boring, or has some other drawback.

2 more replies

jcoglan14y ago

Possibly, although it depends on the lifetime of the link and the size of the tag (e.g. SHA-1 is 40 chars, each case requires mean of 8 guesses, so whole thing only requires 320 requests).

Better to have resistance as baked into your crypto as you can, rather than relying on a firewall further up the stack. If the data itself is resilient, no implementation will be able to defeat it efficiently.

Ralz14y ago· 2 in thread

I e-mailed visa about something similar with their new upcoming V.me service. They suggest that you use md5 to generate the tag which is known to be weaker than SHA-1. I was a little surprised that a company like Visa would mess up on crypto and not know to use HMAC instead of just a simple hash. Never heard a response from them either.

Here's what their documentation says:

Language Standard Syntax for Generating MD5 Hash Java import org.apache.commons.codec.digest.*; hash = DigestUtils.md5Hex(string1+string2+string3...); PHP $hash = md5($string1.$string2.$string3...); Ruby require 'digest/md5' hash = Digest::MD5.hexdigest(string1+string2+string3...) Python import md5 hash = md5.new(string1+string2+string3...)

simondlr14y ago

The link: https://developer.v.me/docs/get_credentials

jcoglan14y ago

Dear Christ just kill me already.

terangdom14y ago· 2 in thread

In order for an extension attack, wouldn't the blocks have to align perfectly? Like suppose I hash [abcd][efgh][k]

How would you extend that?

tptacek14y ago

You guess the length of the last block (or iterate over possible lengths with trials of the attack). When you know it, you can easily predict the MD padding at the end of the hash to fill in the block. That fake padding (our code calls it "glue padding") actually ends up in the forged message; for instance, if you're signing URLs, you'll see it as gibberish in the middle of the URL. In practice, most code does not care about the gibberish "glue" bytes.

mcpherrinm14y ago

The compression function requires blocks be of the correct size. Your blocks will align.

Your hash construction will pad out your data to be multiples of the right size. That padding can be verified in some way - often it is all nulls, or encodes the length of the message. http://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_con...

ma2rten14y ago· 2 in thread

I was wondering about that timing attack. Is that really possible? How many requests would you have to make until you can get reliable statistics over the timing of a string comparison, when you have network delays, other requests and all kinds of stuff that influence timing?

tptacek14y ago

It's very difficult, but possible. It's a plausible enough threat, especially if you're cloud hosted now or ever might be, that you should take steps to avoid it.

Jitter and confounding are problems that can be addressed simply by repeated measurements.

The rule-of-thumb from Crosby & Wallach's paper on remote timing attacks is, assume tens-to-hundreds of nanoseconds precision if you can colocate the attacker at the same provider, and tens of microseconds if you have to do it over the Internet.

ma2rten14y ago

Thanks for your reply. I was totally oblivious to this type of timing attack before I read the article.

einhverfr14y ago· 2 in thread

This seems to me like a variant of "do not trust the client." Good info though. I have learned a lot more about how hash algorythms work. I do wonder though if fixed-param hashes are relatively safe due to the inability to add suffixes.

tptacek14y ago

"Never trust the client" is, like "always validate input", one of those timeless bits of security strategy that is worth pretty much nothing in the real world. It's about as useful as "pretty much you should always make sure you're secure, pretty much."

einhverfr14y ago

I disagree with you here. One should always be sure input is validated somewhere before anything important is done with it. And not trusting the client is one element of what I call a "push security back" strategy (that strategy is, basically, don't do any security enforcement in your application you can't make a component further back do just as well. The reasoning here is that components like operating systems and web apps will always have more review and more eyes than your web app, so if they can work without trusting your app then all the better. Work from least privilege and design so that components further back can do things like authentication, authorization, and the like, to the extent this is practical. This creates a narrower security perimeter, and greater depth in defence. Of course, as in all things, security is a matter of perpetual tradeoffs.

loeg14y ago· 1 in thread

Tl;dr: Use HMAC for Hash-based Message Authentication Codes and hash functions for hash functions. Don't use them the other way around.

PS, maybe more developers should take an intro course on crypto.

ch0wn14y ago

+ don't compare secret strings in a manner that makes it possible to draw conclusions about the position of the inequality.

ajdecon14y ago

Stupid question: Whenever I've used GPG to sign an email, it includes a line saying "Hash: SHA1". Does this imply PGP-signed messages are vulnerable to this, or does PGP/GPG do something different/smarter?

jiggy201114y ago

One thing I'm slightly confused about here.

The article says:

"This sequence is then folded using a compression function h(). The details of the h() depend on the hashing function, but the only thing that concerns us here is that the compression function takes two message blocks and returns another block of the same size."

So if the chunks in a SHA-1 hash are 512 bits each then surely the output of the hash function would be 512 bits rather than the 160 bit digest?

Edit: the IV is 160 bits , so "another block of the same size" means each derivative block is the size of the IV not of the actual data.

more_original14y ago

RFC 2104 specifies how you should do it, see e.g. http://de.wikipedia.org/wiki/Keyed-Hash_Message_Authenticati...

The Handbook of Applied Cryptography, Chapter 9 (free online: http://cacr.uwaterloo.ca/hac/) nicely explains the reasons.

seats14y ago

tptacek or others with domain knowledge-

Is the timing attack hardening suggested in the blog post a standard approach?

If I was trying to attack a system and knew loosely that they did what he suggested (hashing then comparing vs comparing with timing exposed) , my untrained instinct would be that this is the weakest part. In other words I think this just makes the timing attack a little more difficult, but still possible, by producing specific hashes that carry out the timing attack.

When I've needed to harden comparisons against timing attacks, I've always just used constant time comparison functions, such as these ->

http://codahale.com/a-lesson-in-timing-attacks/ http://rdist.root.org/2010/01/07/timing-independent-array-co...

quotemstr14y ago

Also, HMAC is just one MAC (message authentication code). OMAC and is another good one; it has the interesting property of being built on top of a block cipher instead of a hash function, which can reduce the number of "moving parts" in a system if you're using a block cipher of some sort anyway.

X-Istence14y ago

If I remember correctly this was the same issue that Flickr had with their API calls at one point in time!

j / k navigate · click thread line to collapse

101 comments

68 comments · 19 top-level

cbsmith14y ago· 7 in thread

This is a great essay on why you should never use a hash function for message authentication.

Except not for the reason the author thinks.

There are several problems here.

UPDATED: Added an explanation as to why it might be dangerous to just take this article at its word.

tptacek14y ago

cbsmith14y ago

There seems to be some kind of impedance. I'm not suggesting you can avoid using an HMAC and get away with just a hash.

Let's be clear about what an HMAC is. It's pretty much what this article describes, but executed with a specific protocol that avoids exploitable weaknesses in naive approaches.

It is literally:

    first_pad = a_constant_block ^ key
    second_pad = another_constant_block ^ key
    hash(second_pad + hash(first_pad + message))

Also... not sure why they'd have to use Ruby or why that is relevant...

> It is a very, very bad idea to put off fixing a problem like this because you've been led to believe by someone on a message board that MD padding is going to be an obstacle to attackers.

1 more reply

pbsd14y ago

First, this may send the wrong message to the less-focused reader: "what, I should use block ciphers instead?". Luckily, HMAC is eventually brought up, which is a fine solution.

cbsmith14y ago

threedaymonk14y ago

First, this may send the wrong message to the less-focused reader: "what, I should use block ciphers instead?". Luckily, HMAC is eventually brought up, which is a fine solution.

2 more replies

jcoglan14y ago

Plus, implementations with weird edge cases make for horrible debugging, especially when it comes to security.

cbsmith14y ago

I'm with you except for: "a useless diversion".

1 more reply

bemmu14y ago· 7 in thread

cjg14y ago

Absolutely it is possible, the smaller the difference in timing, the more measurements you need, but statistics is a powerful tool.

jcoglan14y ago

You'd think that, but it is possible. Google "timing attacks over the internet".

yuliyp14y ago

doomslice14y ago

This assumes that 1: you can reliably measure at that small of a time difference and 2: you can submit enough requests without being detected.

lobster_johnson14y ago

tptacek14y ago

* You're going to move your measurement code as close to the drivers as possible, and fix interrupt handling so that interrupts don't confound your measurements.

* You're going to get yourself on the same hosting provider as your target; for instance, a good chunk of all target apps can be attacked via Amazon EC2 for not very much money.

* You're going to take lots and lots and lots and lots of measurements and then use high school statistics to process the results.

jcoglan14y ago

I realize it seems impossible, but it's been done, repeatedly. https://www.google.co.uk/search?q=timing+attacks+over+the+in...

exit14y ago· 6 in thread

what's wrong with hashing (message + secret) instead?

tptacek14y ago

more_original14y ago

tomp14y ago

Not according to http://news.ycombinator.com/item?id=4089076 (SHA1 appends the length of the original message to the message).

1 more reply

jcoglan14y ago

exit14y ago

> "What if $EDGE_CASE_OF_INAPPROPRIATE_CRYPTO_FUNCTION" is never a good question to ask. Just use the right tools in the first place.

that's not an intelligent attitude.

understanding where an edge case breaks down is still illuminating, regardless of whether i use hmac in the end.

rlpb14y ago

> Just use the right tools in the first place.

There are two reasons to ask the question.

The first reason is because the questioner is looking for an excuse to use something different. In this case, your answer is the right answer.

The second reason is because the questioner wants to learn more about how cryptography works, for educational value. In this case, your answer is not helpful.

spicyj14y ago· 5 in thread

> The easiest way to defeat this attack is, instead of directly comparing two strings, compare their mappings under a collision-resistant hash function.

Is this really the best way to compare strings without giving away timing info?

tptacek14y ago

blutonium14y ago

Yeah, the Rails code is pretty easy to follow too: https://github.com/rails/rails/blob/f3e1b21ca91afbd97f33d1e5...

marshray14y ago

> accumulate the XORs of each offset of both strings and then verify that they add up to zero

reitzensteinm14y ago

What's wrong with simply doing a traditional compare, but not exiting early when it's found the two strings don't match?

I see how the xor method works, but not why it's superior.

I don't think you'd be leaking that information if you hashed both strings and compared the hash, because it wouldn't stop getting faster when the shorter string gets shorter.

1 more reply

jcoglan14y ago

This is just one of the easier ways. It's effective and easy to code.

klodolph14y ago· 4 in thread

What kind of timing attack is that? In order for there to be a timing attack, there has to be a difference in the timings.

1. You can either process the data, check the authentication code, then commit.

2. Or you can check the authentication code, process the data, then commit.

I don't see any attacks on #2 that couldn't also work on #1.

jcoglan14y ago

tomp14y ago

majormajor14y ago

2 more replies

klodolph14y ago

You can compare success versus failure.

Success: check authentication, process data, commit, return 2xx status.

Failure: check authentication, return 4xx status.

The reason that you don't get any information from the difference in timing between the two branches is because you already get that information from the status code.

Now on the other hand, if you use naive string comparison, different failure branches will take different amounts of time. That is a security hole, and it's not what we're talking about here.

Mithrandir14y ago· 4 in thread

Number one thing I learned from the Coursera class: don't build your own crypto.

cjg14y ago

There are many ways to fail with cryptography and avoiding them all takes considerable expertise. Using crypto libraries does not solve this problem.

tptacek14y ago

2 more replies

skibrah14y ago

waterlesscloud14y ago

Just finished the class today myself, and yep, that's the most repeated advice.

The class repeats starting Monday, if anyone is interested in learning the basics of how all this stuff works.

https://www.coursera.org/course/crypto

theunixbeard14y ago· 4 in thread

klodolph14y ago

Although really, the maxim in the title is true. You should never use hash functions for authentication, you should use authentication codes.

As an analogy, you should never use a hammer to put in screws. That's not linkbait just because you have a tool that's a hammer on one end and a screwdriver on the other end.

cobbal14y ago

quotemstr14y ago

You can learn all that and more just by reading Applied Cryptography.

tptacek14y ago

People that build crypto after reading Applied Cryptography are doing a fine job paying for my kids college education, so I agree with you and encourage everyone to do likewise.

If you don't happen to like my kids, well, first, screw you, and secondly: buy _Practical Cryptography_ or _Cryptography Engineering_ (really the same book) and burn your copy of "Applied".

1 more reply

jebblue14y ago· 3 in thread

Wouldn't this attack be eliminated by using iptables rate limiting to reduce the attack window of opportunity?

tptacek14y ago

Wouldn't this attack be better eliminated by fixing the timing leak that is potentially allowing people to guess valid MACs on packets?

CamperBob214y ago

I guess the idea of just blacklisting the client's IP after the first 1,048,576 failed attempts is too boring, or has some other drawback.

2 more replies

jcoglan14y ago

Possibly, although it depends on the lifetime of the link and the size of the tag (e.g. SHA-1 is 40 chars, each case requires mean of 8 guesses, so whole thing only requires 320 requests).

Ralz14y ago· 2 in thread

Here's what their documentation says:

simondlr14y ago

The link: https://developer.v.me/docs/get_credentials

jcoglan14y ago

Dear Christ just kill me already.

terangdom14y ago· 2 in thread

In order for an extension attack, wouldn't the blocks have to align perfectly? Like suppose I hash [abcd][efgh][k]

How would you extend that?

tptacek14y ago

mcpherrinm14y ago

The compression function requires blocks be of the correct size. Your blocks will align.

ma2rten14y ago· 2 in thread

tptacek14y ago

It's very difficult, but possible. It's a plausible enough threat, especially if you're cloud hosted now or ever might be, that you should take steps to avoid it.

Jitter and confounding are problems that can be addressed simply by repeated measurements.

ma2rten14y ago

Thanks for your reply. I was totally oblivious to this type of timing attack before I read the article.

einhverfr14y ago· 2 in thread

tptacek14y ago

einhverfr14y ago

loeg14y ago· 1 in thread

Tl;dr: Use HMAC for Hash-based Message Authentication Codes and hash functions for hash functions. Don't use them the other way around.

PS, maybe more developers should take an intro course on crypto.

ch0wn14y ago

+ don't compare secret strings in a manner that makes it possible to draw conclusions about the position of the inequality.

ajdecon14y ago

jiggy201114y ago

One thing I'm slightly confused about here.

The article says:

So if the chunks in a SHA-1 hash are 512 bits each then surely the output of the hash function would be 512 bits rather than the 160 bit digest?

Edit: the IV is 160 bits , so "another block of the same size" means each derivative block is the size of the IV not of the actual data.

more_original14y ago

RFC 2104 specifies how you should do it, see e.g. http://de.wikipedia.org/wiki/Keyed-Hash_Message_Authenticati...

The Handbook of Applied Cryptography, Chapter 9 (free online: http://cacr.uwaterloo.ca/hac/) nicely explains the reasons.

seats14y ago

tptacek or others with domain knowledge-

Is the timing attack hardening suggested in the blog post a standard approach?

When I've needed to harden comparisons against timing attacks, I've always just used constant time comparison functions, such as these ->

http://codahale.com/a-lesson-in-timing-attacks/ http://rdist.root.org/2010/01/07/timing-independent-array-co...

quotemstr14y ago

X-Istence14y ago

If I remember correctly this was the same issue that Flickr had with their API calls at one point in time!

j / k navigate · click thread line to collapse