In fact, there is! The idea that your kernel has some finite amount of entropy that can be "used up" is a persistent myth. See https://www.2uo.de/myths-about-urandom/
Okay, it's technically not infinite, but 128 bits of "true entropy" is enough to seed a CSPRNG that will generate as many random numbers as you will ever need.
> Almost nothing needs CSPRNGs
This is how we end up with, e.g., hashmap implementations that use non-cryptographic hash functions, and then an attacker DoSes your server with 10,000 keys that all end up in the same bucket. Or "random" automated tests that always use the same PRNG with the same seed, meaning certain code paths will never be taken.
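To make the hash-flooding point concrete, here's a toy sketch (the `weak_hash` function and bucket count are invented for illustration): an order-independent, trivially predictable hash lets an attacker pick keys that all collide.

```python
import itertools
from collections import defaultdict

def weak_hash(s: str) -> int:
    # Sum of character codes: order-independent and trivially
    # predictable, so every anagram collides.
    return sum(map(ord, s)) % 64  # 64 buckets

buckets = defaultdict(list)
# Attacker-chosen keys: all 120 permutations of "abcde" hash identically.
for key in ("".join(p) for p in itertools.permutations("abcde")):
    buckets[weak_hash(key)].append(key)

# Every key landed in one bucket: lookups degrade from O(1) to O(n).
print(len(buckets), max(len(b) for b in buckets.values()))  # 1 120
```

This is exactly why languages like Python moved to a keyed (SipHash-based) string hash with a per-process random key.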
We used to live in a world where developers frequently had to choose between safety and performance. Except for a few niche applications, that is no longer the case: we can have both. I certainly agree that there are situations where maximum performance is required, and security is pointless -- but 99% of the time, developers should not even be asking that question. They should just be using the default RNG provided by their language, which should be a CSPRNG, not a PRNG.
Thomas' Digital Garden blog is not really the place to find good advice on this; crypto researchers are. The quality of how Linux handles this is quite open to debate, and researchers routinely question the choices made. Here's [1] one of many. Use Google Scholar, search for urandom, and limit the results to recent years to see more.
The way urandom does stretching does not create entropy (which follows from the second law of thermodynamics). And yes, there is a finite amount. Developers have decided to conflate actual entropy with "hard to compute," which is simply not the same thing.
Yes, you can get 128 bits of true entropy and then use a stream cipher to reuse it over and over, but that's not adding or creating more entropy. It's simply that the stream cipher is not (yet?) broken; when it is, you'll realize there is not enough entropy.
By the entropy-stretching argument you could claim you need only 10, or 2, or 1 bit of entropy, then use a stream cipher, and have unlimited entropy. Since that would be broken quickly, exposing the lie, they choose 128 or 256 bits or so and hope the cipher method doesn't break.
And yes, getting more and more from that stream weakens the system, and eventually, like all crypto, the method will break.
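The stretching mechanism itself is easy to sketch with a stdlib XOF standing in for the kernel's stream cipher (Linux's urandom actually uses a ChaCha20-based construction; SHAKE-256 here is only an illustrative stand-in): the expansion is fully deterministic, so the output carries no more entropy than the seed.

```python
import hashlib

def stretch(seed: bytes, n: int) -> bytes:
    # Deterministically expand a short seed into n bytes with the
    # SHAKE-256 XOF. (Illustrative stand-in only: Linux's urandom
    # actually uses a ChaCha20-based construction.)
    return hashlib.shake_256(seed).digest(n)

seed = bytes(16)              # 128 bits of (here: fake) "true entropy"
out = stretch(seed, 1 << 20)  # a megabyte of output from 16 bytes
# No new entropy was created: anyone who learns the 16-byte seed
# can reproduce the entire stream.
assert stretch(seed, 1 << 20) == out
```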
>This is how we end up with, e.g., hashmap implementations that use non-cryptographic hash functions, and then an attacker DoS's your server with 10,000 keys
Conversely, if you use a hash that takes 100x the time, the attacker can simply DoS your server by sending it any large batch of things to hash. For many hash-table workloads the hash function itself is a significant fraction of the time spent. This is why there is no one-size-fits-all method.
I have worked in crypto (and high speed computing and numerics) for decades, and have patents in crypto stuff, and have even written articles about PRNGs (and given talks on how to break them via SAT solvers and Z3 stuff), so I am well aware of the uses of all these pieces.
If some area needs crypto, assuming an amateur will simply throw in a proper function call and now magically have security is a terrible way to design or implement systems. If an area needs crypto, have a senior person decide what to do, and walk the person who would otherwise be left guessing through how it works.
Your advice leads people to think (incorrectly) that since they called getentropy to seed, they are now secure, when there are a zillion other attacks they are still open to (timing attacks, misusing block ciphers, choosing the wrong hash-map type in the case of hash maps, not handling salt correctly, not using memory buffers properly, not ensuring contents don't leak, not ensuring things running in the cloud don't leak, and on and on... forgetting any of a ton of things needed to make the thing secure).
If you're worried about DDoS, paying 100x time per PRNG call can lead to similar system failures in gaming, rendering, simulation, and scientific code.
So the same advice: use the one most suited to the job. I always recommend defaulting to the faster one, since someone who doesn't know about crypto should never be implementing things that accidentally need crypto at such a tiny scale. And the slowdowns hit all software.
>Except for a few niche applications
The majority of running code in the world is not facing attackers - it's stuff like embedded systems, medical devices, tools and toys, programs on my PC (only a tiny few face the world), data processing systems (finance, billing, logging, tracking, inventory...), and so on. The largest use of PRNG calls is simulation, by far, which most certainly does not want to pay the 100x performance hit.
The niche applications are those needing a CSPRNG, which is truly the smaller fraction of pieces of code written.
>Or "random" automated tests that always use the same PRNG with the same seed, meaning certain code paths will never be taken
Repeatable tests are by far the easiest to fix. I've seen tons of people do exactly what you describe and get an error they never see again. The correct answer here is not to rely on randomness executing a path in your code for testing. Make sure the code is tested deliberately, not by luck. Use random values to get more coverage, or to stress test for load testing, but expecting a lucky number pick to happen on test 1 of a trillion runs is not very solid advice compared to simply running the test, repeatably, with enough runs to provide the level of stress you desire. At least then you can redo the test, find the bug, and fix it in reasonable time.
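A common middle ground is randomized-but-replayable tests: draw a fresh seed per run, log it, and accept an explicit seed for replay. A hypothetical sketch (function and names are illustrative, not any particular framework's API):

```python
import os
import random

def run_randomized_test(seed=None):
    # Fresh entropy per run unless a seed is supplied to replay
    # a previously logged failure.
    if seed is None:
        seed = int.from_bytes(os.urandom(8), "big")
    print(f"test seed: {seed}")   # log it so any failure is replayable
    rng = random.Random(seed)     # local PRNG, deterministic given the seed
    data = [rng.randint(0, 100) for _ in range(10)]
    assert min(data) >= 0         # the property under test (placeholder)
    return seed, data

# Replaying a recorded seed regenerates the identical inputs.
seed, data = run_randomized_test()
assert run_randomized_test(seed) == (seed, data)
```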
>that is no longer the case: we can have both
No, we do not. I regularly work on crypto code and often get called in to go over stuff with others. I also write lots of high performance code (mostly scientific, simulation, and the occasional rendering needs) that needs speed. There is no PRNG that meets both needs by a long shot.
> They should just be using the default RNG provided by their language,
Agreed.
> which should be a CSPRNG, not a PRNG
And that is not true for any language I am aware of for the reasons I mentioned.
Python: Mersenne Twister
C#: Xoshiro
JavaScript: xorshift128+
Go: Additive Lagged Fibonacci
C/C++: large range, from really bad to mostly bad
Java: LCG
And so on......

[1] "An Empirical Study on the Quality of Entropy Sources in Linux Random Number Generator", https://ieeexplore.ieee.org/abstract/document/9839285
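Python illustrates the split directly: the default module is the seedable Mersenne Twister, while the OS-backed CSPRNG is a separate, opt-in import.

```python
import random
import secrets

# The default generator (Mersenne Twister) is fully deterministic
# given a seed: same seed, same sequence, forever.
a = random.Random(1234)
b = random.Random(1234)
assert [a.random() for _ in range(5)] == [b.random() for _ in range(5)]

# The CSPRNG is opt-in via secrets / random.SystemRandom, which reads
# from the OS and (per the docs) ignores any attempt to seed it.
token = secrets.token_hex(16)
assert len(token) == 32
```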
I prefer DJB's blog on this: https://blog.cr.yp.to/20140205-entropy.html
>> The Linux /dev/urandom manual page claims that without new entropy the user is "theoretically vulnerable to a cryptographic attack", but (as I've mentioned in various venues) this is a ludicrous argument—how can anyone simultaneously believe that
>> - we can't figure out how to deterministically expand one 256-bit secret into an endless stream of unpredictable keys (this is what we need from urandom), but
>> - we can figure out how to use a single key to safely encrypt many messages (this is what we need from SSL, PGP, etc.)?
That's how we got here.....
True, they are different, but there is no meaningful distinction between a value that is "truly random" and a value that could only be predicted using a computer larger than the universe.
> eventually, like all crypto, the method will break
It's disheartening to see this claim being made by someone who regularly works on crypto code. When it comes to symmetric encryption, the war between cryptographers and cryptanalysts is over -- and the cryptographers have won. The security margin provided by modern ciphers like ChaCha20 is so high, and attacks on them so pitiful, that there are now calls for reducing the strength of ciphers in order to increase performance without sacrificing a meaningful amount of security: https://eprint.iacr.org/2019/1492
ChaCha20 will not be broken in our lifetime; probably it will never be broken, in the sense that an attacker will be able to observe any subset of the keystream and predict the next block (which is what we need from a CSPRNG).
Anyway, given these premises (which AFAIK we both agree on):
1) There is no "one-size-fits-all" RNG
2) Using a PRNG instead of a CSPRNG may lead to security vulnerabilities
3) Using a CSPRNG instead of a PRNG may lead to performance degradation
Which type of RNG should be the default (e.g. the one you get if you type 'import rng'), and which type should the programmer have to ask for specifically? That's the question at hand here.

Sloppy thinking and conflating different ideas are not a good way to think about computer security.
>ChaCha20 will not be broken in our lifetime; probably it will never be broken
As was said of the zillion currently broken cryptosystems, hashes, and all manner of security schemes......
> the cryptographers have won.
Is this why NIST routinely is asking for better crypto systems? Because crypto is solved?
> there are now calls for reducing the strength https://eprint.iacr.org/2019/1492
Yet followup papers often invent new methods of attack https://eprint.iacr.org/2022/695. It's almost as if theoretical advances can change the unproven-yet-assumed strength of previous methods.
>Which type of RNG should be the default (e.g. the one you get if you type 'import rng')
I already demonstrated that the answer is a PRNG for pretty much all widely used languages, which I agree with. There's simply no CSPRNG that can be ported across the widespread systems these languages run on, so it's silly to keep arguing that they should default to a CSPRNG. CSPRNGs are not used by default, never have been, and there is no trend toward them that I can find, all for the reasons I gave in my very first reply in this thread.
The JavaScript one is decent (only because it generates floats with it).
The MT in C++ and Python isn't all that bad, but it's huge and it fails statistical tests, albeit pretty late.
The others are slow and low quality (if I remember my tests correctly).
Agreed - this is mostly due to age. Mersenne Twister appearing in so many places is terrible, since it has bad mixing properties (especially around 0, but the same thing shows up elsewhere), is error-prone to seed as a result, and is huge and slow.
I doubt default C++ rand() ever uses MT - it's far too slow. MT appeared in Boost, then in the std (and, like much C++ stuff, a decade after it was obsolete, unfortunately).
The xo* style ones are decent, but PCG seems to outperform them at just about everything one needs from a fast PRNG with good statistical properties.
>The JavaScript one decent (only because it generates floats using it).
I can almost guarantee they make the usual errors trying to convert to float. :)
Converting PRNG output to float is a notoriously error-prone minefield, so if a language hands you floats directly, you can almost certainly assume the values are not good RNG output.
The first problem many make is that the underlying source had better be uniform as integers, and I've run across many language implementations that were not. PRNGs with period 2^N-1 are a common source of error compared to those with period 2^N. The PRNG should also have other nice properties, such as k-equidistribution for high enough k; many fail that too.
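The 2^N-1 issue is easy to see with a toy maximal-length LFSR: the all-zeros state is a fixed point, so the generator cycles through every nonzero value but can never emit 0, and anything downstream that assumes a uniform 2^N range inherits the bias. A minimal 4-bit sketch:

```python
def lfsr4_step(state: int) -> int:
    # 4-bit Galois LFSR with taps 0b1100 (the primitive polynomial
    # x^4 + x^3 + 1), giving the maximal period 2^4 - 1 = 15.
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0b1100
    return state

seen, s = set(), 1
for _ in range(15):
    s = lfsr4_step(s)
    seen.add(s)

# The generator visits all 15 nonzero states -- but never 0, so its
# output is not uniform over the full 4-bit range.
assert seen == set(range(1, 16))
```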
EDIT: ha ha - called it. Here's [1] JavaScript's (V8) RNG functions. They at least admit the double in [0,1) is not uniform :) They start off just as I claimed would happen: using a 2^N-1 period, pushing this non-uniform value into the mantissa as raw bits (guaranteeing non-uniformity for anything downstream), then elsewhere in the code taking this at-most-53-bit mantissa, multiplying by a 64-bit value, and using the result as if it were truly 64 bits. What a mess...... This is the state of most libraries when I inspect them. This type of crap shows up when you run large numerical simulations (weather, nuke testing, giant finance, physics/planetary sims...) and the underlying bias craps out your results. It's hard to test for events occurring once in trillions when the underlying code is so flaky.
Then, and here is the great part: they quite often simply divide by 2^N as a float, which means lots of possible floating-point values are never produced, and certainly not with the frequency one needs. Representable floats are not evenly spaced: there are as many in [1/4,1/2) as in [1/2,1), packed into half the width, and so on toward zero, so the division leaves most of them unreachable. A float has a 24-bit significand, a double 53. Then a person often multiplies this [0,1) float back into an integer range, and suddenly you've lost tons of the properties you wanted: uniform distribution, all values equally likely, etc.
So the float version is nearly always a bad choice.
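For reference, the conventional fix (assuming the underlying source really is uniform over 64-bit integers) is to keep only the top 53 bits, a double's significand width, and scale by 2^-53, so every output is one of 2^53 equally spaced, equally likely values in [0,1):

```python
import random

def u64_to_unit_double(x: int) -> float:
    # Keep the top 53 bits (a double's significand width) and scale:
    # every output is one of 2^53 equally spaced values in [0, 1),
    # each with equal probability -- no rounding bias.
    return (x >> 11) * (2.0 ** -53)

rng = random.Random(0)  # stand-in for any uniform 64-bit source
xs = [u64_to_unit_double(rng.getrandbits(64)) for _ in range(10_000)]
assert all(0.0 <= v < 1.0 for v in xs)
```

Note it still can't represent the denser floats near zero; that requires a more elaborate scheme, but unlike naive division it is at least exactly uniform.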
Whenever I have to provide sources of (P)RNG for a library, I always include a few that do the common tasks so people don't roll their own: uniform 32- and 64-bit (if needed), uniform(M) for [0,M), uniform for [A,B), a proper float for [A,B) (since getting [0,1) and then scaling loses values), a proper float for [0,1) and for [0,1] (which are different things), etc., in the hope that people looking for random numbers use these. They save a lot of issues.
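A sketch of what such helpers can look like (all names here are hypothetical), using rejection sampling so the uniform(M)-style call avoids modulo bias:

```python
import random

_rng = random.Random(0)  # stand-in source; any uniform 64-bit generator works

def uniform_u64() -> int:
    return _rng.getrandbits(64)

def uniform_below(m: int) -> int:
    # Rejection sampling: throw away draws from the incomplete top
    # chunk so every residue 0..m-1 is exactly equally likely
    # (plain `uniform_u64() % m` would over-represent small values).
    limit = (1 << 64) - ((1 << 64) % m)
    while True:
        x = uniform_u64()
        if x < limit:
            return x % m

def uniform_int(a: int, b: int) -> int:
    # [a, b), built on the unbiased primitive above.
    return a + uniform_below(b - a)

rolls = [uniform_int(1, 7) for _ in range(1000)]
assert set(rolls) == {1, 2, 3, 4, 5, 6}
```

The rejection loop almost never iterates more than once in practice, since the discarded region is at most m-1 values out of 2^64.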
[1] https://github.com/v8/v8/blob/main/src/base/utils/random-num...