urandom and concurrency

(drsnyder.us)

90 points · drsnyder · 12y ago · 78 comments

Glyptodon · 12y ago

So why is there a lock for reads from urandom? I suppose if there weren't a lock concurrent reads would all get the same random values?

bodyfour · 12y ago

Yeah basically. That could be a disaster for, say, nonce generation.

The solution would be to have multiple independent entropy pools and either bind them to cores(/sets of cores) or pick a non-busy one in a contention case.

acqq · 12y ago

Yes, if there is no urandom generator per core, it would be convenient in some extreme cases to introduce one. The question is whether it's worth the effort and the resulting "bloat" in kernel code and memory usage. Linux runs on some very small devices too, and even there decent user-space programmers can easily do their own per-thread generation in their programs. Normal uses of crypto look like this: you initialize your own crypto once, then produce a lot of data in your own space.

If urandom is really "one for all cores" somebody should be able to demonstrate the speed drop by just writing some bash script? Volunteers?
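
Not a bash script, but a minimal C++ sketch of such a demonstration, assuming a POSIX system where /dev/urandom exists. The helper names `read_urandom` and `time_readers` are made up for illustration; the idea is to time the same per-thread workload with 1, 2, 4, ... threads and see whether wall-clock time grows with the thread count.

```cpp
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

#include <fcntl.h>
#include <unistd.h>

// Read `reads` chunks of `chunk` bytes from /dev/urandom.
// Returns the total number of bytes actually read (0 if open fails).
std::size_t read_urandom(int reads, std::size_t chunk) {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) return 0;
    std::vector<unsigned char> buf(chunk);
    std::size_t total = 0;
    for (int i = 0; i < reads; ++i) {
        ssize_t n = read(fd, buf.data(), buf.size());
        if (n <= 0) break;  // stop on error or unexpected EOF
        total += static_cast<std::size_t>(n);
    }
    close(fd);
    return total;
}

// Run `threads` concurrent readers, each doing the same amount of work,
// and return the elapsed wall-clock time in seconds.
double time_readers(int threads, int reads, std::size_t chunk) {
    auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([=] { read_urandom(reads, chunk); });
    for (auto& th : pool) th.join();
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Calling, say, `time_readers(n, 10000, 4096)` for increasing `n` on a multi-core machine should show roughly flat times if the reads scale, and growing times if they serialize on a shared lock. The actual numbers depend heavily on the kernel version.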

drsnyder (OP) · 12y ago

Good question. The only reference to it that I could find was here http://lkml.iu.edu//hypermail/linux/kernel/0412.1/0181.html but he doesn't explain why it's necessary.

gizmo686 · 12y ago

From the mail:

>This patch solves a problem where simultaneous reads to /dev/urandom can cause two processes on different processors to get the same value. We're not using a spinlock around the random generation loop because this will be a huge hit to preempt latency. So instead we just use a mutex around random_read and urandom_read. Yeah, it's not as efficient in the case of contention, if an application is calling /dev/urandom a huge amount, it's there's something really misdesigned with it, and we don't want to optimize for stupid applications.

Mister_Snuggles · 12y ago

A more important question would be "Why does asynchronous DNS resolution require random data in the first place?"

mike-cardwell · 12y ago

So you can randomise the ID in the request packet to help protect against cache poisoning. And also so you can apply 0x20 bit (x) encoding to the qname for further protection.

(x) http://courses.isi.jhu.edu/netsec/papers/increased_dns_resis...
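
A rough sketch of the 0x20 idea in C++, for illustration only: each ASCII letter of the query name gets a random case, and the resolver later checks that the response echoes the name with exactly the same case pattern. The function names `apply_0x20` and `to_lower` are hypothetical, not taken from any DNS library.

```cpp
#include <cctype>
#include <random>
#include <string>

// Randomly choose upper or lower case for every ASCII letter in a DNS
// query name. Digits, dots, and hyphens are left untouched.
template <typename Rng>
std::string apply_0x20(std::string qname, Rng& rng) {
    std::bernoulli_distribution coin(0.5);
    for (char& c : qname) {
        unsigned char u = static_cast<unsigned char>(c);
        if (std::isalpha(u)) {
            c = static_cast<char>(coin(rng) ? std::toupper(u)
                                            : std::tolower(u));
        }
    }
    return qname;
}

// Helper: lower-case a name, e.g. for cache lookups, where DNS names
// compare case-insensitively.
std::string to_lower(std::string s) {
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}
```

The extra protection comes from the byte-for-byte comparison of the echoed name: an off-path attacker has to guess the case of every letter in addition to the 16-bit ID.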

bch · 12y ago

Hard to say w/o seeing the data in question, but based on that, perhaps nscd or re-using curl handles could mitigate their frustration w/ runtime.

TazeTSchnitzel · 12y ago

IDs of requests?

shachar · 12y ago

choosing random UDP source port

mike-cardwell · 12y ago

Interestingly enough, I have actually been working on writing a DNS client library in C++ with Boost ASIO this very afternoon. I was going to get my source of random data using the following C++11 standard library code. I would really appreciate any comments from people here if there is anything wrong with what I'm doing:

  #include <cstdint>
  #include <random>
  std::uniform_int_distribution<uint32_t> dist;

  // Seed a Mersenne twister PRNG with random data:
  std::mt19937 eng;
  std::random_device rd;
  eng.seed(dist(rd));

  // Now to generate random numbers, simply:
  uint32_t random_number = dist(eng);

aidenn0 · 12y ago

I don't know what DNS uses the randomness for, but if a malicious attacker can gain from guessing the randomness, don't use MT, as the state can be extracted from MT by observing a relatively small number of outputs.

mike-cardwell · 12y ago

Ah. You appear to be right. I'm glad I asked now.

[edit] I'm going to skip using the Mersenne twister engine and just use std::random_device for all random data, instead of as a seed. It seems on Linux at least that random_device is basically /dev/urandom. I assume the source will be sane on other OS's too.
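
For what it's worth, a minimal sketch of what "just use std::random_device" could look like for the two 16-bit values a DNS client needs. The function names and the choice of the port range 1024 to 65535 are assumptions made for the example, and the quality of `std::random_device` is implementation-defined, so it is worth checking what the target platform's standard library actually does.

```cpp
#include <cstdint>
#include <random>

// Draw a 16-bit DNS transaction ID directly from std::random_device,
// with no intermediate PRNG whose state could be reconstructed.
std::uint16_t random_query_id(std::random_device& rd) {
    std::uniform_int_distribution<std::uint16_t> dist(0, 65535);
    return dist(rd);
}

// Draw a UDP source port from the unprivileged range [1024, 65535].
// uniform_int_distribution avoids the bias of a plain modulo mapping.
std::uint16_t random_source_port(std::random_device& rd) {
    std::uniform_int_distribution<std::uint16_t> dist(1024, 65535);
    return dist(rd);
}
```

Each call may hit the underlying entropy source, so this trades throughput for not having to protect any generator state in the process.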

akira2501 · 12y ago

I believe you would use it to determine a random outgoing port to use to contact the DNS server; this prevents spoofing. However, the port space is only 16 bits, so how you map the outputs of the MT into that space would have the biggest impact -- but you're right that it's probably best to avoid it entirely.

en4bz · 12y ago

    std::random_device rd;
    std::mt19937 rng(rd()); // Construct with a random seed.
    std::uniform_int_distribution<uint32_t> dist;
    uint32_t random_number = dist(rng);

Since only the seed value comes from `rd`, you should be fine if you suspected the results from the article would affect you. What was most likely happening in the article was constant use of `rd` without a PRNG.

aidenn0 · 12y ago

Seed a secure userspace PRNG from urandom, perhaps?

hosay123 · 12y ago

Adding to aidenn0's comment, if you trust /dev/urandom to produce 4 KB of random data, it follows that you trust it to produce 256 bits.

256 bits (32 bytes) is sufficient to initialize a PRNG into any one of 2^256 = 115792089237316195423570985008687907853269984665640564039457584007913129639936 states (that's a 78-digit number). Consequently, hitting the kernel constantly for so much data is utterly inefficient in the first instance, and totally unnecessary in the second.

The blog author could improve his design's efficiency by more than 128x (a 4096-byte read is 128 times the size of a 32-byte one) just by seeding a PRNG with a single 32-byte read at the start of the subprocess.
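
A hedged sketch of only the "single 32-byte read" step, assuming a POSIX system; the name `read_seed` is made up. The generator that would consume the seed is deliberately not shown: the C++ standard library has no cryptographically secure PRNG, so it would have to come from a vetted crypto library, and other replies in this thread caution against rolling that part yourself.

```cpp
#include <array>
#include <cstddef>

#include <fcntl.h>
#include <unistd.h>

// Read exactly 32 bytes (256 bits) from /dev/urandom, once, at startup.
// Returns false if the device cannot be opened or the read fails or
// comes up short; callers should treat that as a fatal error.
bool read_seed(std::array<unsigned char, 32>& seed) {
    int fd = open("/dev/urandom", O_RDONLY);
    if (fd < 0) return false;
    std::size_t got = 0;
    while (got < seed.size()) {
        ssize_t n = read(fd, seed.data() + got, seed.size() - got);
        if (n <= 0) break;  // error or EOF: give up rather than retry
        got += static_cast<std::size_t>(n);
    }
    close(fd);
    return got == seed.size();
}
```

The point of the sketch is the access pattern: one small read per process instead of a 4 KB read per handle.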

mcpherrinm · 12y ago

Userland PRNGs are one of the easiest ways to introduce security vulnerabilities into your programs. I would recommend being very, VERY careful before trying to do this, like the traditional "Don't roll your own crypto" advice.

tptacek · 12y ago

If you care about security, avoid this approach; it creates an additional single point of failure, which historically has also tended to be a very likely point of failure (see: Debian randomness, Android Java SecureRandom, &c).

rafekett · 12y ago

overreliance on /dev/urandom in the presence of little entropy is a well known performance problem on servers. that's why http://en.wikipedia.org/wiki/Hardware_random_number_generato... exist

claudius · 12y ago

If I understand that problem correctly, it has nothing to do with the amount of entropy available but is a simple synchronisation/locking issue. Were reads from, say, /dev/zero ‘protected’ by spinlocks in the same way, the same issue would arise. Conversely, I don’t see how adding a hardware RNG to the system could alleviate the locking issue.

kevingadd · 12y ago

A hardware RNG isn't going to do anything to address the scalability problems inherent in having a single shared lock around /dev/urandom.

jerf · 12y ago

/dev/urandom is not /dev/random.

ape4 · 12y ago

Why does he need so much pseudorandomness? And why use /dev/urandom directly? Maybe using the random library from the programming environment would make more sense.

frankfarmer · 12y ago

Simply initializing a curl handle causes the /dev/urandom read -- so a large number of parallel curl requests easily triggers this issue.

ape4 · 12y ago

Thanks for the reply.

sebcat · 12y ago

As a user of libcares (which is awesome for bulk DNS lookups btw) I'll add that I've only ever needed one ares_channel per process. Having one ares_channel for every CURL-handle seems a bit excessive. This is probably the main problem here, not the kernel spinlock.

Edit: Come to think about it, why isn't the CURL-handle reused? Sounds like a new CURL-handle is inited for every request, which I don't recall being necessary.

drsnyder (OP) · 12y ago

The curl handle should be re-used if possible, so that's also part of the problem.

rcoh · 12y ago

In many versions of Linux, the stdlib rand() function has a global lock around it. As such, if rand() is called in performance-critical parallel code, performance will tank as each thread attempts to acquire this lock. Where there is no such lock, you still have a race condition on the state of the random number generator and may produce bad (non-random) randomness.

Use rand_r(unsigned int *state) instead in parallel and concurrent applications.

Sources: man 3 rand [unix command] http://unixhelp.ed.ac.uk/CGI/man-cgi?rand+3
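
A small sketch of the per-thread pattern, assuming a POSIX libc that provides `rand_r`; the helper names `generate` and `generate_parallel` are illustrative. Each thread owns its state, so there is neither a lock nor a data race. Note that `rand_r` is a weak generator and not suitable for anything security-related.

```cpp
#include <cstddef>
#include <thread>
#include <vector>

#include <stdlib.h>  // rand_r (POSIX, not part of standard C++)

// Generate `count` values from a caller-owned state. No hidden global
// state is touched, so concurrent callers do not interfere.
std::vector<int> generate(unsigned int seed, int count) {
    unsigned int state = seed;
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) out.push_back(rand_r(&state));
    return out;
}

// Run one generator per thread, each with its own seed and state.
// Thread t writes only to results[t], so no synchronization is needed.
std::vector<std::vector<int>> generate_parallel(
        const std::vector<unsigned int>& seeds, int count) {
    std::vector<std::vector<int>> results(seeds.size());
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < seeds.size(); ++t)
        pool.emplace_back([&results, &seeds, t, count] {
            results[t] = generate(seeds[t], count);
        });
    for (auto& th : pool) th.join();
    return results;
}
```

Because every sequence depends only on its own seed, the parallel run produces the same values as running each seed serially, which is a quick way to check that no state is being shared.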

ekimekim · 12y ago

The problem you're describing is similar but not the same as the one in the article. What you describe is part of the libc implementation of rand(3), whereas the article is talking about reads from /dev/urandom, which has a lock inside the kernel code (for the same reasons as libc).

X-Istence · 12y ago

I love how Theodore Ts'o suggests using a user space PRNG that is seeded from /dev/urandom. OpenBSD are ripping out all of the user space PRNG stuff from OpenSSL in favour of arc4random_buf()...

clarry · 12y ago

arc4random_buf() operates in userspace (in this case; it also exists in the kernel). It is seeded from the kernel, using a sysctl.

kijin · 12y ago

If your program needs to read 4K from /dev/urandom multiple times per second, you're doing it wrong. There is little benefit in reading anything over 32 bytes at a time.

According to the man page for /dev/random and /dev/urandom:

> no cryptographic primitive available today can hope to promise more than 256 bits of security, so if any program reads more than 256 bits (32 bytes) from the kernel random pool per invocation, or per reasonable reseed interval (not less than one minute), that should be taken as a sign that its cryptography is not skillfully implemented.

bcl · 12y ago

The code he pointed to is for kernel 2.6.18, which at this point could be considered ancient history. If you look at current master (https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux....), it looks like it has been refactored somewhat, although the lock is still in there.
