The author writes "Meanwhile, at least one influential researcher (whose work I respect) had harsh words publicly for her result", and then quotes some of these words:
Note that (smartly enough) the PCG author avoids
carefully to compare with xorshift128+ or xorshift1024*.
However, the author fails to note that said "influential researcher", Sebastiano Vigna, is the author of xorshift128+ and related PRNGs. In the linked test [2] by John D. Cook (who uses PractRand, a test suite in the same vein as the (obsolete) DIEHARD), xorshift128+ and xoroshiro128+ fail within 3 seconds, while PCG ran for 16 hours, producing 2 TB of pseudo-random numbers without any suspicious p-value detected.
On the other hand, Vigna claims that the xoroshiro family does "pass" PractRand.
I submitted an answer on StackOverflow a while ago [1] recommending xoroshiro and PCG, so I'd be concerned if PCG turns out to be flawed. It's actually quite hard to get academics in the field to give an authoritative recommendation (I've tried) - their response is typically along the lines of "It's complicated"...
[1] https://stackoverflow.com/questions/4720822/best-pseudo-rand...
[2] https://www.johndcook.com/blog/2017/08/14/testing-rngs-with-...
Edit: remove italics due to asterisk in PRNG name, & add link to John D. Cook's test.
O'Neill has instructions on how to test with PractRand and with TestU01 on her blog (http://www.pcg-random.org/blog/). I had a go with TestU01 on Vigna's generators, and when you test the low 32 bits reversed (for 64-bit PRNGs, you have to test the high 32, the low 32, both forwards and reversed), I found that all Vigna's generators fail.
Given the PractRand results it makes sense, I guess, but I had read that Vigna's generators were supposed to pass TestU01.
Does anyone else want to have a go at testing so I can know whether I screwed up somehow?
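For anyone wanting to reproduce this, the reversed streams can be extracted with a standard 32-bit bit reversal. A minimal sketch (my own helper functions, not code from O'Neill's or Vigna's repositories):

```cpp
#include <cstdint>

// Classic O(log n) bit reversal: swap adjacent bits, then bit pairs,
// then nibbles, bytes, and finally the two half-words.
uint32_t reverse32(uint32_t x) {
    x = ((x & 0x55555555u) << 1) | ((x >> 1) & 0x55555555u);
    x = ((x & 0x33333333u) << 2) | ((x >> 2) & 0x33333333u);
    x = ((x & 0x0F0F0F0Fu) << 4) | ((x >> 4) & 0x0F0F0F0Fu);
    x = ((x & 0x00FF00FFu) << 8) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

// The four 32-bit streams to feed TestU01 from one 64-bit output:
// high half, low half, and each of those bit-reversed.
uint32_t high32(uint64_t v)     { return (uint32_t)(v >> 32); }
uint32_t low32(uint64_t v)      { return (uint32_t)v; }
uint32_t high32_rev(uint64_t v) { return reverse32(high32(v)); }
uint32_t low32_rev(uint64_t v)  { return reverse32(low32(v)); }
```

Each of these four streams can then be wrapped in a TestU01 `unif01_Gen` and run through the usual batteries.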
The code does explain exactly what the issue is, i.e. that the last bit isn't random:
This generator passes the PractRand test suite
up to (and included) 16TB, with the exception of binary rank tests,
which fail due to the lowest bit being an LFSR; all other bits pass all
tests. We suggest to use a sign test to extract a random Boolean value.
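The sign test the quote recommends amounts to reading the top bit, which is not part of the weak low-bit LFSR; a one-line sketch:

```cpp
#include <cstdint>

// Extract a Boolean from a 64-bit output via the sign bit (bit 63),
// avoiding the LFSR-structured lowest bit mentioned in the quote.
bool random_bool(uint64_t x) {
    return (int64_t)x < 0;   // true iff the high bit is set
}
```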
But I'm tempted to agree this isn't a desirable property for a generic RNG. How many users of JavaScript know about this property? (It's the default RNG for most browser engines.) Or does it not matter because they return 53-bit floats?
Most of the analysis is about the LCG or the final output. The suggested mixer is just
output = rotate64(uint64_t(state ^ (state >> 64)), state >> 122);
That's simple, and the insight in this paper is that something that simple helps a lot. I would have thought that you'd want a mixer where changing one bit of the input changes, on average, half the bits of the output. The mixer above won't do that. DES as a mixer would probably be better, but it's slower. The new result here is that something this simple passes many statistical tests. This isn't crypto-grade; both that mixer and an LCG are reversible with enough work.
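For readers who want to see the whole generator rather than just the mixer line, here is a sketch of the structure: a 128-bit LCG step followed by that xor-fold-and-rotate output. The multiplier and increment below are illustrative stand-ins, not necessarily the canonical PCG constants, and `__int128` is a GCC/Clang extension:

```cpp
#include <cstdint>

// Sketch: 128-bit LCG state, XSL-RR-style 64-bit output.
// Constants are illustrative, not the official PCG parameters.
// unsigned __int128 is a GCC/Clang extension.
struct Lcg128XslRr {
    unsigned __int128 state;

    explicit Lcg128XslRr(unsigned __int128 seed) : state(seed) {}

    static uint64_t rotr64(uint64_t x, unsigned r) {
        return (x >> (r & 63)) | (x << ((64 - r) & 63));
    }

    uint64_t next() {
        // LCG step (mod 2^128): state = state * mult + inc.
        const unsigned __int128 mult =
            ((unsigned __int128)6364136223846793005ULL << 64) | 1442695040888963407ULL;
        state = state * mult + 0x14057B7EF767814FULL;
        // XSL-RR: xor-fold the two halves, rotate by the top 6 bits.
        uint64_t folded = (uint64_t)(state >> 64) ^ (uint64_t)state;
        return rotr64(folded, (unsigned)(state >> 122));
    }
};
```

Note how cheap the output step is: one xor, one shift, one rotate. That cheapness relative to something like a block-cipher round is exactly the simplicity point above.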
Relevant quotes from the paper:
"But if you began reading the section with the belief that “linear congruential generators are bad” (a fairly widely-held belief amongst people who know a little about random number generation), you may have been surprised by how well they performed. We’ve seen that they are fast, fairly space efficient, and at larger sizes even make it through statistical tests that take down other purportedly better generators. And that’s without an improving step."
and
"Despite their flaws, LCGs have endured as one of the most widely used random-number generation schemes, with good reason. They are fast, easy to implement, and fairly space efficient. As we saw in Section 3.3, despite poor performance at small bit sizes, they continue to improve as we add bits to their state, and at larger bit sizes, they pass stringent statistical tests (provided that we discard the low-order bits), actually outperforming many more-complex generators. And in a surprise upset, they can even rival the Mersenne Twister at its principal claims to fame, long period and equidistribution."
"Nevertheless, there is much room for improvement. From the empirical evidence we saw in Section 3.3 (and the much more thorough treatment of L’Ecuyer & Simard [28], who observe that LCGs are only free of birthday-test issues if n < 16 p^(1/3), where n is the number of numbers used and p is the period), we can surmise that we may observe statistical flaws in a 128-bit LCG after reading fewer than 2^47 numbers (which is more than BigCrush consumes but nevertheless isn’t that many—an algorithm could plausibly use one number per nanosecond and 2^47 nanoseconds is less than two days)."
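The closing parenthetical is easy to sanity-check: 2^47 nanoseconds is about 140,737 seconds, or roughly 1.6 days. A quick helper (mine, just for the arithmetic):

```cpp
#include <cstdint>

// 2^exp nanoseconds expressed in days: (2^exp / 1e9) seconds / 86400.
double ns_pow2_to_days(unsigned exp) {
    double seconds = (double)(1ULL << exp) / 1e9;
    return seconds / 86400.0;
}
```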
But I'll confess to not really understanding what all the fuss is about insecure generators.
That's the paper, basically.
Based on things she's said on her site and in comments on John D. Cook's blog, it's all about algorithmic complexity attacks on randomized algorithms.
In other words, if you're doing quicksort on external input with a random pivot, and someone knows the PRNG state, they can make a pathological input that'll trigger quadratic behavior.
I don't know how likely this is to happen, but I know there were similar attacks on hash tables a few years ago.
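The core enabler is that a non-cryptographic PRNG seeded with a known value is completely reproducible, so an attacker who learns the seed (or reconstructs the state) can replay every pivot choice in advance. A minimal illustration of the victim/attacker symmetry, using a hypothetical `std::mt19937`-seeded quicksort rather than PCG:

```cpp
#include <cstdint>
#include <random>
#include <vector>

// Whatever pivot indices a "victim" quicksort would draw from a seeded
// mt19937, an "attacker" with the same seed reproduces exactly, and can
// then arrange the input so each pivot is a worst-case choice.
std::vector<uint32_t> pivot_draws(uint32_t seed, int n) {
    std::mt19937 gen(seed);
    std::vector<uint32_t> draws;
    draws.reserve(n);
    for (int i = 0; i < n; ++i) draws.push_back(gen());
    return draws;
}
```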
As a tenured professor I want to say two things about this piece:
1. I think academic publishing will be forced to change. I'm not sure what it's going to look like in the end, but traditional journals are starting to seem really quaint and outdated now.
2. As far as I can tell from what she's written on the PCG page, the submission to TOMS is a poor example, because no one I know expects to be done with one submission. That is, no one I know submits a paper to one journal, even one reputable journal, and is done. They submit and it gets rejected and revise it and resubmit it, maybe three or even four times. After the fourth or fifth time, you might give up, but not necessarily even then.
I have mixed feelings about the PCG paper as an example, because in some ways it's great: an example of how something very influential has superseded traditional academic publishing. In other ways, though, it's horrible, because it's misleading about the typical academic publishing experience. Yes, academic publishing is full of random nonsense, and corruption, but yes, you can also get past it (usually) with just a little persistence. In still other ways, it's a good example of what we might see increasingly, which is a researcher having a lower threshold for the typical bullshit out there.
I think there are two topics here. One is whether academic research and work is becoming less relevant to practice. The other is whether the formalisms of academic-style publishing are becoming less relevant in a modern world with more and more venues for publishing, rating, and discovering work.
On the former, I believe that academic work is as relevant as ever. There are some areas (like systems) where I'm doubtful about relevance from the point of view of a practitioner, but other areas (like hardware and ML) where work remains extremely relevant. I haven't noticed a trend there over the last decade, except in some areas of systems where industrial practice tends to happen on cluster sizes that are often not approachable for academia.
On the latter, academic publication does indeed seem to be getting less relevant. There are other (often better) ways to discover work. There are other ways to tell whether a piece of work is relevant, or credible. There are other, definitely better, ways to publish and distribute work. In some sense I think this is a pity: as an academic-turned-practitioner I like academic-style publications. Still, I think they are going to either change substantially or die.
This article raises another very good point: sometimes the formalism of academic publication makes the work harder to understand, less approachable, or less valuable. That's clear harm, and it seems like this professor was right to avoid that.
- PCG is not crypto, everybody should understand that. It's for simulation and rendering.
- PCG mainly replaces the Mersenne Twister, which is in C++11. The Twister has a LOT more state and is a LOT slower for less randomness.
- In rendering and simulation speed really matters, and PCG excels there.
- Xorshift is another algorithm in the same class. I would really like to see an objective comparison. In my cursory engineering look PCG seemed better.
- Fast PRNG is almost a new field again: It's not crypto, but immensely useful. How did the Twister get into C++11 while it is so much worse than PCG or Xorshift? Nobody cared!
- Maybe PCG should have been a paper at SigGraph.
- For the style of the paper, I think one contribution is rethinking PRNG outside crypto. That deserves and requires a lot of exposition.
I think that the whole point of the prediction difficulty stuff is that a library (e.g., C++11's) with general purpose PRNGs can't know how they'll be used. Maybe some idiot writes code for a gambling machine in C++ and uses whatever PRNG is to hand. There was a story in the news the other week about people going around casinos predicting slot machines, so maybe this has already happened! PCG is trying to make your simulation and rendering code fast while offering at least some defense against egregious misuse.
Basically PCG is trying to be a good all rounder. As you say, it's meant as a replacement for the Mersenne Twister.
And yes, PCG is harder to exploit in this way than the Twister, but you still really should not bet money on it!
That running into a wall with the paper doesn't bother her, because she's publishing openly anyway, is even better.
If you want to make your research more accessible, there are ways to do that without assuming that your reader is coming in from a dead start on the field.
An easy way to make research accessible is to write a monograph!
That said, even if multi-culti math means that top-line researchers are going to be spending time with song-and-dance introductions, she should still have put a grad student onto the task of making the short paper that experts will actually read.
If the whole thing is a matter of style and not of obfuscation, this would have given a grad student an easy, cool first publication.
[1] https://ee.stanford.edu/event/seminar/ee380-computer-systems...
Because the reviewers took over 10 months to respond with a rejection mainly citing the length of the paper. And more importantly, "By that point, everyone who might have wanted to read it had almost certainly found it here and done so, so I saw little merit in drastically shortening the paper."[1]
She has updated the blog post which discusses all the nuanced details of the whole affair last month (2017-07-25)[2].
[1] http://www.pcg-random.org/paper.html [2] http://www.pcg-random.org/posts/history-of-the-pcg-paper.htm...
[EDIT] The actual paper is here: http://www.pcg-random.org/pdf/hmc-cs-2014-0905.pdf
On the other hand, maybe spending more than a line explaining what the birthday paradox is should be cut out and put in a backgrounder paper or appendix so that the paper can focus on the actual novel ideas.
That was my annoyance with the paper as well. Add to that explanations that amount to, "What even is determinism?" or "What's a seed?" and I'm unsurprised it's nearly 60 pages.
It worked as far as I can tell. But I don't trust the statistical tests. Who is to say there isn't a very obvious pattern in the numbers that I didn't test for or notice? How do you prove a random number generator is good?
You can't; that's the nature of randomness. You can prove they're bad, though.
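"Proving bad" usually means exhibiting a concrete pattern. A classic one, related to the low-bit weakness discussed elsewhere in this thread: in an LCG mod 2^64 with odd multiplier and odd increment, the lowest output bit simply alternates 0,1,0,1,... A small sketch with illustrative constants:

```cpp
#include <cstdint>

// An LCG mod 2^64 that outputs its full state. With an odd multiplier,
// a*s has the same low bit as s, and adding an odd increment flips it,
// so the lowest bit alternates every step -- a trivially detectable flaw.
uint64_t lcg_next(uint64_t& state) {
    state = state * 6364136223846793005ULL + 1442695040888963407ULL;
    return state;
}
```

This is exactly why practitioners discard the low-order bits of such generators or, as PCG does, run the state through a mixing output function first.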
Compress its output.
1. The paper itself[1] is extremely readable by the standards of most cryptography research. On one hand, this is great because I was able to follow the whole thing in essentially one pass. On the other hand, the paper is very long for its result (58 pages!), and it could easily do without passages like this one:
Yet because the algorithms that we are concerned with are deterministic, their behavior is governed by their inputs, thus they will produce the same stream of “random” numbers from the same initial conditions—we might therefore say that they are only random to an observer unaware of those initial conditions or unaware of how the algorithm has iterated its state since that point. This deterministic behavior is valuable in a number of fields, as it makes experiments reproducible. As a result, the parameters that set the initial state of the generator are usually known as the seed. If we want reproducible results we should pick an arbitrary seed and remember it to reproduce the same random sequence later, whereas if we want results that cannot be easily reproduced, we should select the seed in some inscrutable (and, ideally, nondeterministic) way, and keep it secret. Knowing the seed, we can predict the output, but for many generators even without the seed it is possible to infer the current state of the generator from its output. This property is trivially true for any generator where its output is its entire internal state—a strategy used by a number of simple random number generators. For some other generators, such as the Mersenne Twister [35], we have to go to a little more trouble and invert its tempering function (which is a bijection; see Section 5), but nevertheless after only 624 outputs, we will have captured its entire internal state.
That's a lot of setup for what is frankly a very basic idea. A cryptographer being verbose in their writing might briefly remind the reader of these properties with the first sentence, but they'd still likely do that with much more brevity than this. I understand wanting to make your research accessible, but for people who understand the field this detracts from getting to the "meat." A denser paper might be harder to get through, but a 10-30 page result is preferable to a nearly 60-page one that assumes I know nearly nothing about the field. If I don't know these details very well, how can I properly assess the author's results?
2. The author's tone in her writing is something I take issue with. For example, passages like this one...
Suppose that, excited by the idea of permutation functions, you decide to always improve the random number generators you use with a multiplicative step. You turn to L’Ecuyer’s excellent paper [25], and without reading it closely (who has time to read papers these days!), you grab the last 32-bit constant he lists, 204209821. You are then surprised to discover that your “improvement” makes things worse! The problem is that you were using XorShift 32/32, a generator that already includes multiplication by 747796405 as an improving step. Unfortunately, 204209821 is the multiplicative inverse of 747796405 (mod 2^32), so you have just turned it back into the far-worse–performing XorShift generator! Oops.
...go a bit beyond levity. If you're trying to establish rigorous definitions and use cases to distinguish between generators, functions and permutations, this isn't the way to do it. This isn't appropriate because it doesn't go far enough to formalize the point. It makes it intuitive, sure, and that's a great educational tool! But it's a poor scenario to use as the basis for a problem statement - research is not motivated by the failure of an engineer to properly read and understand existing primitives, it's motivated by novel results that exhibit superior qualities over existing primitives.
3. The biggest grievance I have with this paper is the way in which it analyzes its primitives for cryptographic security. For example, this passage under 6.2.2 Security Considerations:
In addition, most of the PCG variations presented in the next section have an output function that returns only half as many bits as there are in the generator state. But the mere use of a 2^(b/2)-to-1 function does not guarantee that an adversary cannot reconstruct generator state from the output. For example, Frieze et al. [12] showed that if we simply drop the low-order bits, it is possible for an adversary to discover what they are. Our output functions are much more complex than mere bit dropping, however, with each adding at least some element of additional challenge. In addition, one of the generators, PCG-XSL-RR (described in Section 6.3.3), is explicitly designed to make any attempt at state reconstruction especially difficult, using xor folding to minimize the amount of information about internal state that leaks out. It should be used when a fast general-purpose generator is needed but enhanced security would also be desirable. It is also the default generator for 64-bit output.
That's not a rigorous analysis of a primitive's security. It is an informal explanation of why the primitive may be secure, but it is so high level that there is no proof based on a significant hardness assumption. Compare this with Dan Boneh's recent paper, "Constrained Keys for Invertible Pseudorandom Functions"[2]. Appendices A and B after the list of references occupy nearly 20 pages of theorems used to analyze and prove the security of primitives explored in the paper under various assumptions.
Novel research exploring functions with (pseudo)random properties is inherently mathematical; it's absolutely insufficient to use a bunch of statistical tests, then informally assess the security of a primitive based on the abbreviated references to one or two papers.
_________
1. She purports to introduce a novel result that bridges "medium-grade" performance characteristics and security characteristics in one primitive. In fact, if you look at the PCG Random website (pcg-random.org), she very clearly compares and emphasizes both performance and security characteristics with functions like xorshift and ChaCha.
2. We see cryptography papers submitted to all manner of theoretical CS conferences and journals, for example Symposium on the Theory of Computing, which are not uniformly crypto-focused.
3. She acknowledges herself that she found it hard to categorize her paper (it could be relevant for simulation, it could be relevant for stream ciphers, etc.) in a blog post about how she chose the venue: http://www.pcg-random.org/posts/history-of-the-pcg-paper.htm...
As a meta point, I read the whole thing, and I actually think it would be a nice publishable result if it were, say, 10-20 pages. But 60 is wild! It took me longer to get through this "accessible" paper than it did for me to get through any of Boneh's papers on constrained and puncturable pseudorandom functions!
It's definitely interesting, and sure, why not explore "medium-grade security" that makes explicit tradeoffs with performance and security. But the presentation seems like it was written by someone writing for a non-academic audience, and the content of 6.2.2 "Security Considerations" is really light on provable security.
Off-topic
> And it is not even entirely clear what “really random” would mean. It is not clear that we live in a randomized universe…
At the quantum level it really is clear that we live in a really random universe. What's the meaning of really random? The outcome of a quantum process.
On-topic. Yeah, you have to know your audience. As OP mentions, just because the paper wasn't published doesn't prevent anyone from thinking about it and even building on it. On the other hand, these scientific publications have styles and target audiences, and maybe she got rejected not for lack of relevance or rigor, but because the paper didn't match the venue's non-scientific criteria.
https://en.wikipedia.org/wiki/Bell%27s_theorem
> Bell's theorem states that any physical theory that incorporates local realism cannot reproduce all the predictions of quantum mechanical theory. Because numerous experiments agree with the predictions of quantum mechanical theory, and show differences between correlations that could not be explained by local hidden variables, the experimental results have been taken by many as refuting the concept of local realism as an explanation of the physical phenomena under test. For a hidden variable theory, if Bell's conditions are correct, the results that agree with quantum mechanical theory appear to indicate superluminal effects, in contradiction to the principle of locality.