Brief funny story: my dad built the 5th computer in the UK, the ICCE [0], and one of the tests they wanted to run was statistical analysis over random number fields.
They approached the General Post Office (GPO), which had an RNG called "ERNIE" [1] that ran the postal investment bond lottery. This is a true RNG, based on avalanche-style electronic noise (actually from neon tubes). It was dressed up to look like a computer, but it was basically a detector device plus an A-to-D converter. They asked for a million truly random numbers to run some tests over. Interestingly, ERNIE was made by somebody who worked on Colossus at Bletchley.
The GPO refused to share a feed of numbers: they were concerned the team would discover some predictable pattern in the number field, and either destroy the post office bond scheme by revealing it, or use it to make millions.
[0] https://en.wikipedia.org/wiki/Imperial_College_Computing_Eng...
Going off on a tangent: An oft-neglected issue is that, even when the random source (like avalanched diodes) is actually sufficiently random, any apparatus that captures that randomness for use inherently causes a bias in the observations.
Even if everything else is perfect (it usually isn't), in terms of signal processing, any observation window (e.g. a finite length of time of measurement) is an aperture which ends up getting convolved with the signal source being observed.
It sometimes helps to convert the skew into white noise with a "whitening" post-pass algorithm.
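As a concrete example of such a post-pass, here is a minimal sketch of von Neumann debiasing (the function name and the one-bit-per-int representation are my own, purely for illustration). It removes a constant bias from independent bits, though it does nothing for correlation introduced by the aperture:

```c
#include <stddef.h>

/* Von Neumann debiasing, a simple whitening pass: consume raw bits in
 * non-overlapping pairs, emit the first bit of each differing pair
 * (01 -> 0, 10 -> 1), and discard 00 and 11 pairs. This removes a
 * constant bias at the cost of throwing away at least half the input.
 * Bits are held one per int purely for clarity of the sketch. */
size_t von_neumann_whiten(const int *raw, size_t nbits, int *out)
{
    size_t m = 0;
    for (size_t i = 0; i + 1 < nbits; i += 2) {
        if (raw[i] != raw[i + 1])
            out[m++] = raw[i];   /* 01 -> 0, 10 -> 1 */
        /* 00 and 11 are discarded */
    }
    return m;
}
```

Note the throughput cost: for a source with bias p, only 2p(1-p) of the pairs produce an output bit, which is one reason real designs often prefer cryptographic extractors.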
Using real life randomness is still a good thing to do, of course; it's just that there are always real-world issues with anything and everything.
From your ERNIE link:
The designers were Tommy Flowers and Harry Fensom and it derives from Colossus, one of the world's first digital computers ...
https://en.wikipedia.org/wiki/Tommy_Flowers
https://en.wikipedia.org/wiki/Harry_Fensom
( I travelled to Canada from Australia in the 1980s and interviewed William Tutte about codebreaking and the war. I missed out on an opportunity to talk to Tommy Flowers the following year and didn't return to the UK until after his death. )
Guess which WW2 activity used a very large number of relays, was made by the GPO (who used relays heavily in telephony), and was subsequently discontinued by the GPO at war's end as things to do with encoding and decoding scaled back, thus flooding the electronics market in the UK with relays...
Tommy Thomas (head of the ERCC in Edinburgh, where my dad wound up and I therefore lived) worked at Manchester on the Mark 1. He never talked about it and, to my subsequent regret, I never asked. We both wound up in Australia at the CSIRO, where Radiophysics had been started by people from the radar space, and that bled into their interest in computing. It's a small world. He was the head of the IT division and I was a lowly researcher, so our paths didn't cross much. I wish I'd talked to him more, about this stuff and computer history.
I ended up mainly doing geophysical field work with a lot of acquisition, processing, and interpretation coding work .. but I dabbled in symbolic computer algebra systems (in Australia) for a while and loitered a little in the technical history of the borderline classified - I interviewed "for the record" Leonard Beadell, Jack Wong Sue, Mark Oliphant, people associated with the Mungalalu Truscott Airbase, etc.
I seem to recall a fair bit was done here with early over-the-horizon radar work, but I didn't go far in that direction, ending up more in radiometrics and resource/energy tracking.
Also, development didn't stop with PCG. Nowadays, modern PRNGs explore the use of chaotic PRNGs (without a fixed period), which are often faster than non-chaotic ones. Notable examples are the Romu family [0] of PRNGs, sfc [1], and tylov's sfc derivative [2].
Another thing that would be nice to mention is that we have come full circle: the good old middle-square method, already used by von Neumann, has been found to work very well if you scale it up and add a Weyl sequence. [3]
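For reference, the scaled middle-square with a Weyl sequence from [3] is only a few lines. The odd constant is the one Widynski gives; zero-initializing x and w is my own minimal choice here (the paper seeds the state from the Weyl constant):

```c
#include <stdint.h>

/* Middle-square Weyl sequence (msws), after Widynski, arXiv:1704.00358.
 * x is the squared state, w the Weyl sequence, s an odd constant. */
uint64_t msws_x = 0, msws_w = 0;
const uint64_t msws_s = 0xb5ad4eceda1ce2a9ULL;

uint32_t msws(void)
{
    msws_x *= msws_x;                            /* square the state    */
    msws_x += (msws_w += msws_s);                /* add the Weyl sequence,
                                                    which breaks the short
                                                    cycles of plain
                                                    middle-square        */
    msws_x = (msws_x >> 32) | (msws_x << 32);    /* take the "middle" by
                                                    swapping halves      */
    return (uint32_t)msws_x;
}
```

The Weyl sequence w guarantees a long period; the squaring and half-swap do the mixing.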
Edit: And how could I forget, there has also been a lot of effort in using SIMD, e.g. by SHISHUA. [4]
Another thing to consider is how to efficiently map the generated numbers onto a given distribution. I'm not aware of any recent improvements in that regard, other than some approximations that have probably been reinvented a bunch of times.
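For the simplest case - a uniform integer in [0, n) - the standard unbiased building block is rejection of the "overhang" that makes plain r % n biased (this is essentially what arc4random_uniform does; the names and the throwaway xorshift32 source here are my own):

```c
#include <stdint.h>

/* A throwaway xorshift32 source, just to have something to draw from. */
uint32_t xs32_state = 0x12345678u;
uint32_t xorshift32(void)
{
    uint32_t v = xs32_state;
    v ^= v << 13;
    v ^= v >> 17;
    v ^= v << 5;
    return xs32_state = v;
}

/* Unbiased integer in [0, n): reject draws below 2^32 mod n, so the
 * accepted range [threshold, 2^32) has a size that is an exact
 * multiple of n, making r % n uniform. 2^32 mod n is computed as
 * (0 - n) % n in unsigned 32-bit arithmetic. */
uint32_t bounded(uint32_t (*next)(void), uint32_t n)
{
    uint32_t threshold = (uint32_t)(0u - n) % n;
    for (;;) {
        uint32_t r = next();
        if (r >= threshold)
            return r % n;
    }
}
```

On average the loop runs fewer than two iterations, and for power-of-two n the threshold is zero so nothing is ever rejected.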
[0] https://www.romu-random.org/
[1] https://numpy.org/devdocs/reference/random/bit_generators/sf...
[2] https://github.com/tylov/STC/blob/master/docs/crandom_api.md
[3] https://arxiv.org/abs/1704.00358
[4] https://espadrine.github.io/blog/posts/shishua-the-fastest-p...
Edit: I had a few names mixed up
A new programmer reading this article would come away with the impression that, if they need random numbers, they should use xorshift or PCG, when in reality they should be calling getentropy(), or, if a syscall is too expensive, using a CSPRNG (e.g. ChaCha or BLAKE3) seeded with getentropy(). We now have RNGs that are both secure and really, really fast -- multiple GB/s fast -- so there are very few circumstances where a PRNG is truly necessary.
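Seeding from the kernel really is a one-liner with getentropy() (available on Linux with glibc 2.25+, the BSDs, and macOS; the wrapper name here is my own):

```c
#include <unistd.h>   /* getentropy(); on some systems it lives in
                         <sys/random.h> instead */
#include <stdint.h>

/* Fill a 64-bit seed from the kernel's entropy pool. getentropy()
 * returns 0 on success and is limited to 256 bytes per call, far
 * more than any seed needs. */
int seed_from_kernel(uint64_t *seed)
{
    return getentropy(seed, sizeof *seed);
}
```

A CSPRNG seeded this way never needs reseeding for statistical purposes; the syscall cost is paid once.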
If one needs fast PRNGs, say for simulation, Monte Carlo stuff, etc., then CSPRNGs are a terrible idea. They're literally orders of magnitude slower than fast PRNGs. Almost nothing needs CSPRNGs (only things needing crypto-level security, which is a tiny fraction of the uses for PRNGs).
In short, use the right tool for the job.
In fact, there is! The idea that your kernel has some finite amount of entropy that can be "used up" is a persistent myth. See https://www.2uo.de/myths-about-urandom/
Okay, it's technically not infinite, but 128 bits of "true entropy" is enough to seed a CSPRNG that will generate as many random numbers as you will ever need.
> Almost nothing needs CSPRNGs
This is how we end up with, e.g., hashmap implementations that use non-cryptographic hash functions, and then an attacker DoS's your server with 10,000 keys that all end up in the same bucket. Or "random" automated tests that always use the same PRNG with the same seed, meaning certain code paths will never be taken.
We used to live in a world where developers frequently had to choose between safety and performance. Except for a few niche applications, that is no longer the case: we can have both. I certainly agree that there are situations where maximum performance is required, and security is pointless -- but 99% of the time, developers should not even be asking that question. They should just be using the default RNG provided by their language, which should be a CSPRNG, not a PRNG.
Thomas' Digital Garden blog is not really the place to find good advice on this; crypto researchers are. The quality of how Linux handles this is quite open to debate, and researchers routinely question the choices made. Here's [1] one of many. Use Google Scholar, enter urandom, and limit the search to recent years to see more.
The way urandom does stretching does not provide entropy (which follows from the second law of thermo). And yes, there is a finite amount. Developers have decided to conflate actual entropy with "hard to compute," which is simply not true.
Yes, you can get 128 bits of true entropy and then use a stream cipher to reuse it over and over, but that isn't adding or making more entropy. It's simply that the stream cipher is not (yet?) broken; when it is, you'll realize there was not enough entropy.
By the argument of entropy stretching, you could claim you only need 10, or 2, or 1 bit of entropy, and then use a stream cipher, and have unlimited entropy. Since that would be broken quickly, showing the lie, they choose 128 or 256 or so, and hope the cipher method doesn't break.
And yes, getting more and more from that stream weakens the system, and eventually, like all crypto, the method will break.
>This is how we end up with, e.g., hashmap implementations that use non-cryptographic hash functions, and then an attacker DoS's your server with 10,000 keys
Conversely, if you use an algorithm that takes 100x the time to hash, then they simply DoS your server by sending it any large batch of things to hash. For many hashes the PRNG is a significant fraction of the time used. This is why there is no one-size-fits-all method.
I have worked in crypto (and high speed computing and numerics) for decades, and have patents in crypto stuff, and have even written articles about PRNGs (and given talks on how to break them via SAT solvers and Z3 stuff), so I am well aware of the uses of all these pieces.
If some area needs crypto, assuming an amateur will simply throw in a proper function call and now magically have security is a terrible way to design or implement systems. If an area needs crypto, have a senior person decide what to do, and walk the person who would otherwise be left guessing through how it works.
Your advice leads to people thinking (incorrectly) that since they called getentropy to seed, they are now secure, when there are a zillion other attacks they are still open to (timing, misusing block ciphers, choosing the wrong hash map type in the case of hash maps, not handling salt correctly, not using memory buffers properly, not ensuring contents don't leak, not ensuring things running in the cloud don't leak, and on and on... forgetting any of a ton of things needed to make the thing secure).
If you're worried about DDoS, paying 100x time for a PRNG call can lead to similar system failures - in gaming, rendering, simulation, science stuff.
So the same advice: use the one that is most suited. I always recommend defaulting to the faster one, since someone who doesn't know about crypto shouldn't ever be implementing things that accidentally need crypto - and that case is a tiny fraction. And the slowdowns hit all software.
>Except for a few niche applications
The majority of running code in the world is not facing attackers - it's stuff like embedded systems, medical devices, tools and toys, programs on my PC (only a tiny few face the world), data processing systems (finance, billing, logging, tracking, inventory...), and so on. The largest use of PRNG calls are simulation by far, which most certainly do not want to pay the 100x performance hit.
The niche applications are those needing a CSPRNG, which is truly the smaller fraction of pieces of code written.
>Or "random" automated tests that always use the same PRNG with the same seed, meaning certain code paths will never be taken
Repeatable tests are by far the easiest to fix. I've seen tons of people do exactly what you say and get an error they never see again. The correct answer here is not to rely on randomness executing a path in your code for testing: make sure the code is tested properly, not randomly. Relying on the luck of random number generators is simply bad advice. Use them to get more values, or to stress test for load testing, but expecting a lucky number pick to happen on test 1 of a trillion runs is not solid advice compared to simply running the test, repeatably, with enough runs to provide the level of stress you desire. At least then you can redo the test, find the bug, and fix it in reasonable time.
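A sketch of the usual middle ground - randomized but replayable tests. The TEST_SEED environment variable and the default value are my invention; the point is only that the seed is always logged, so any failing run can be reproduced exactly:

```c
#include <stdio.h>
#include <stdlib.h>

/* Take the seed from the environment if given, otherwise use a fixed
 * default, and always print it so a failing run can be replayed with
 * TEST_SEED=<value>. Returns the seed after feeding it to srand(). */
unsigned test_seed(void)
{
    const char *s = getenv("TEST_SEED");
    unsigned seed = s ? (unsigned)strtoul(s, NULL, 10) : 12345u;
    fprintf(stderr, "TEST_SEED=%u\n", seed);
    srand(seed);
    return seed;
}
```

With this, a "random" test that fails once is no longer a ghost: rerun it with the logged seed and the same path executes again.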
>that is no longer the case: we can have both
No, we do not. I regularly work on crypto code and often get called in to go over stuff with others. I also write lots of high performance code (mostly scientific, simulation, and the occasional rendering needs) that needs speed. There is no PRNG that meets both needs by a long shot.
> They should just be using the default RNG provided by their language,
Agreed.
> which should be a CSPRNG, not a PRNG
And that is not true for any language I am aware of for the reasons I mentioned.
Python: Mersenne Twister
C#: Xoshiro
JavaScript: xorshift128+
Go: Additive Lagged Fibonacci
C/C++: large range, from really bad to mostly bad
Java: LCG
And so on......

[1] "An Empirical Study on the Quality of Entropy Sources in Linux Random Number Generator", https://ieeexplore.ieee.org/abstract/document/9839285
A new programmer shouldn't be meddling in cryptography, so they probably need neither cryptographically-secure pseudo-random numbers nor true random numbers. True random numbers are tricky.
Yuh. I'm not sure what "strong entropy" means, in this context; entropy's usually reported as some number of bits of entropy. So perhaps "a lot of entropy" is clearer.
At any rate, by default a (CS)PRNG doesn't have any entropy that isn't present in its seed. According to some, at least, that entropy is diminished every time you read from the RNG, so it depletes to nothing after a finite number of reads.
I've finally come to the conclusion that entropy, whatever that means, is orthogonal to RNGs. Instead, RNGs should be classified by their unpredictability. A CSPRNG is one with high unpredictability. And I've given up on trying to build a DIY HWRNG. It was a misbegotten project.
The more popular rand library uses ChaCha and getentropy as you described.
I guess that's the best we had for students before the widespread adoption of computers?
https://en.wikipedia.org/wiki/A_Million_Random_Digits_with_1...
And there's a news article about it here: https://www.wsj.com/articles/rand-million-random-digits-numb...
You have to understand what properties you actually care about and choose a (P)RNG that has those properties.
Perhaps this is because true randomness doesn't always appear to be random enough. I would argue that this property is what makes it real. Sometimes true randomness might be six dice all showing the face of six.
For many use cases in statistics / sampling / monte carlo simulations, you often need millions/billions of well-distributed random numbers with very low generation overhead.
Even things like game AIs care about performance with regards to the RNGs they use.
(A few notes on the second link: I wouldn’t recommend xoshiro256+x8 since it is very weak statistically, same for xoshiro256 IMO. Also, disclaimer, I wrote SHISHUA.)
Say I have a function

    X f(X u);

on which I want to iterate occasionally to get, in turn, f(u0), f(f(u0)), ...

The "smart" way to do this is to keep one value of lookahead:

    X u = f(u0); // initialize once
    ...
    X smart_f() {
        X w = u;
        u = f(w);  // start computing the next value now
        return w;  // this line is not stalled by the previous one,
                   // so a superscalar processor might be able to hide
                   // the latency of computing f(w)
    }

I doubt any compiler will be able to figure this out, and surely not if f has side effects.

Also, I learned to use a slide-rule in the sixties; I never touched one again until I inherited an antique a few years ago. Nobody was using slide-rules in the seventies, surely.