The solution would be to have multiple independent entropy pools and either bind them to cores (or sets of cores) or pick a non-busy one under contention.
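To make the "pick a non-busy one" idea concrete, here is a rough user-space analogy in C++, not kernel code. Everything in it is an assumption for illustration: the `ShardedRng` class, the shard count of 8, and the use of `std::mt19937_64` as a stand-in for a real entropy pool (it is not a cryptographic generator).

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <random>
#include <thread>

// Hypothetical sketch: several independent generators, each behind its own
// lock, so callers only contend when they land on the same shard.
class ShardedRng {
public:
    static constexpr std::size_t kShards = 8;  // assumed; e.g. one per core

    ShardedRng() {
        std::random_device rd;
        for (auto& s : shards_) {
            s.engine.seed(rd());  // every shard gets its own seed
        }
    }

    std::uint64_t next() {
        // "Home" shard derived from the calling thread's id.
        const std::size_t home =
            std::hash<std::thread::id>{}(std::this_thread::get_id()) % kShards;

        // Walk the shards starting at home and take the first non-busy one.
        for (std::size_t i = 0; i < kShards; ++i) {
            Shard& s = shards_[(home + i) % kShards];
            std::unique_lock<std::mutex> lock(s.mutex, std::try_to_lock);
            if (lock.owns_lock()) {
                return s.engine();
            }
        }

        // Every shard was busy: fall back to waiting on the home shard.
        std::lock_guard<std::mutex> lock(shards_[home].mutex);
        return shards_[home].engine();
    }

private:
    struct Shard {
        std::mutex mutex;
        std::mt19937_64 engine;
    };
    std::array<Shard, kShards> shards_;
};
```

Binding shards to cores instead would mean picking the index from the current CPU rather than probing with `try_lock`. The probing variant is shown because it needs nothing beyond the standard library.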
If urandom really is "one for all cores", somebody should be able to demonstrate the slowdown with a simple bash script. Volunteers?
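Not bash, but here is a rough sketch of such a test in C++ using POSIX `open`/`read`. The function names `read_in_parallel` and `seconds_for`, and whatever chunk sizes you pass in, are made up for illustration. It only measures wall-clock time and says nothing about where in the kernel the time goes.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Each of `threads` workers opens `path` and issues `reads` read() calls of
// `chunk` bytes. Returns the total number of bytes actually read.
std::size_t read_in_parallel(const char* path, unsigned threads,
                             std::size_t reads, std::size_t chunk) {
    std::atomic<std::size_t> total{0};
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        workers.emplace_back([&total, path, reads, chunk] {
            const int fd = ::open(path, O_RDONLY);
            if (fd < 0) return;  // device missing: this worker adds nothing
            std::vector<char> buf(chunk);
            std::size_t got = 0;
            for (std::size_t i = 0; i < reads; ++i) {
                const ssize_t n = ::read(fd, buf.data(), buf.size());
                if (n <= 0) break;
                got += static_cast<std::size_t>(n);
            }
            ::close(fd);
            total += got;
        });
    }
    for (auto& w : workers) w.join();
    return total.load();
}

// Wall-clock seconds for one run with the given thread count.
double seconds_for(const char* path, unsigned threads,
                   std::size_t reads, std::size_t chunk) {
    const auto start = std::chrono::steady_clock::now();
    read_in_parallel(path, threads, reads, chunk);
    const std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count();
}
```

Call `seconds_for("/dev/urandom", n, reads, chunk)` for n = 1, 2, 4, ... with the same per-thread workload. If the reads really serialize on one lock, the elapsed time should grow roughly with n instead of staying flat.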
>This patch solves a problem where simultaneous reads to /dev/urandom can cause two processes on different processors to get the same value. We're not using a spinlock around the random generation loop because this will be a huge hit to preempt latency. So instead we just use a mutex around random_read and urandom_read. Yeah, it's not as efficient in the case of contention, if an application is calling /dev/urandom a huge amount, there's something really misdesigned with it, and we don't want to optimize for stupid applications.
(x) http://courses.isi.jhu.edu/netsec/papers/increased_dns_resis...
#include <cstdint>
#include <random>
std::uniform_int_distribution<uint32_t> dist;
// Seed a Mersenne twister PRNG with random data:
std::mt19937 eng;
std::random_device rd;
eng.seed(dist(rd));
// Now to generate random numbers, simply:
uint32_t random_number = dist(eng);
[edit] I'm going to skip using the Mersenne twister engine and just use std::random_device for all random data, instead of as a seed. It seems on Linux at least that random_device is basically /dev/urandom. I assume the source will be sane on other OSes too.
std::random_device rd;
std::mt19937 rng(rd()); //Construct with random seed.
uint32_t random_number = dist(rng);
Since only the seed value comes from `rd`, you should be fine if you suspected the results from the article would affect you. What was most likely happening in the article was constant use of `rd` without a PRNG.
256 bits (32 bytes) is sufficient to initialize a PRNG into any one of 115792089237316195423570985008687907853269984665640564039457584007913129639936 states (that's 2^256, a 78-digit number). Consequently, hitting the kernel constantly for so much data is utterly inefficient in the first instance, and totally unnecessary in the second.
Blog author could improve his design's efficiency >128x just by seeding a PRNG with a single 32-byte read at the start of the subprocess.
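A minimal sketch of that "one 32-byte read, then a PRNG" approach, assuming a POSIX system. `SeedWords`, `read_seed` and `make_engine` are hypothetical names, and `std::mt19937_64` is just the engine used elsewhere in this thread. It is not a cryptographic generator, so this only fits uses that don't need unpredictability.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <random>

#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

using SeedWords = std::array<std::uint32_t, 8>;  // 8 x 32 bits = 32 bytes

// One read() of exactly 32 bytes from `path`; std::nullopt if the device
// cannot be opened or returns fewer bytes than requested.
std::optional<SeedWords> read_seed(const char* path = "/dev/urandom") {
    const int fd = ::open(path, O_RDONLY);
    if (fd < 0) return std::nullopt;
    SeedWords words{};
    const std::size_t wanted = words.size() * sizeof(std::uint32_t);
    const ssize_t n = ::read(fd, words.data(), wanted);
    ::close(fd);
    if (n != static_cast<ssize_t>(wanted)) return std::nullopt;
    return words;
}

// Spread the 256 seed bits over the engine's state via std::seed_seq.
std::mt19937_64 make_engine(const SeedWords& words) {
    std::seed_seq seq(words.begin(), words.end());
    return std::mt19937_64(seq);
}
```

Call `read_seed()` once at the start of the subprocess, build the engine with `make_engine(*seed)`, and draw every later number from the engine. The kernel is then touched once per process rather than once per value.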
Edit: Come to think about it, why isn't the CURL-handle reused? Sounds like a new CURL-handle is inited for every request, which I don't recall being necessary.
Use rand_r(unsigned int *state) instead in parallel and concurrent applications.
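For illustration, a sketch of what per-thread `rand_r` state can look like in C++ on a POSIX system. The helper names `draw_with_private_state` and `draw_in_threads` are made up here. Note that `rand_r` has only an `unsigned int` of state and POSIX.1-2008 marks it obsolete, so treat it as a weak generator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>  // rand_r is declared here on POSIX systems
#include <thread>
#include <vector>

// Draws `count` values using a state variable that lives on this call's
// stack, so nothing is shared with any other thread.
std::vector<int> draw_with_private_state(unsigned int seed, int count) {
    unsigned int state = seed;
    std::vector<int> out;
    out.reserve(static_cast<std::size_t>(count));
    for (int i = 0; i < count; ++i) {
        out.push_back(rand_r(&state));
    }
    return out;
}

// Runs one worker thread per seed; each worker writes only its own slot.
std::vector<std::vector<int>> draw_in_threads(
    const std::vector<unsigned int>& seeds, int count) {
    std::vector<std::vector<int>> results(seeds.size());
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < seeds.size(); ++t) {
        workers.emplace_back([&results, &seeds, t, count] {
            results[t] = draw_with_private_state(seeds[t], count);
        });
    }
    for (auto& w : workers) w.join();
    return results;
}
```

Because the state is passed in explicitly, two threads given the same seed produce the same sequence and two threads given different seeds never interfere with each other. Plain `rand()` keeps its state hidden and shared.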
Sources: man 3 rand [unix command] http://unixhelp.ed.ac.uk/CGI/man-cgi?rand+3
According to the man page for /dev/random and /dev/urandom:
> no cryptographic primitive available today can hope to promise more than 256 bits of security, so if any program reads more than 256 bits (32 bytes) from the kernel random pool per invocation, or per reasonable reseed interval (not less than one minute), that should be taken as a sign that its cryptography is not skillfully implemented.
It looks like it has been refactored somewhat, although the lock is still in there.