Zero Tolerance for Bias

(queue.acm.org)

190 points · Harmohit · 2y ago · 89 comments

orlp · 2y ago · 16 in thread

Here is a trivial shuffle algorithm that is completely unbiased and only requires an unbiased coin (or random number generator giving bits):

1. Randomly assign each element to list A or list B.
2. Recursively shuffle lists A and B.
3. Concatenate lists A and B.

To see that it's correct, note that assigning an independent uniform random real number in [0, 1) to each element and sorting by that number is an unbiased shuffle. The algorithm above does exactly that: consider the fractional base-2 expansion of the random numbers, and observe that the algorithm is a base-2 radix sort of them, most significant digit first. We can sort these random real numbers even though they have infinitely many random bits, because we can stop expanding digits as soon as a prefix is unique, which corresponds to a list being down to a single element.
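The three steps and the radix-sort argument above can be sketched in a few lines of Python. This is only an out-of-place illustration, not orlp's implementation: the function name `radix_shuffle` and the `coin` parameter are made up for the example, and `random.getrandbits(1)` stands in for the unbiased coin.

```python
import random

def radix_shuffle(items, coin=lambda: random.getrandbits(1)):
    """Return a new list with `items` in random order.

    `coin` is assumed to return an unbiased 0 or 1 on every call.
    """
    # A list of zero or one elements is already shuffled.
    if len(items) <= 1:
        return list(items)
    a, b = [], []
    # Step 1: one coin flip per element decides its list.
    for x in items:
        (a if coin() else b).append(x)
    # Steps 2 and 3: shuffle both lists recursively, then concatenate.
    # If every element lands in the same list, the recursion simply
    # flips a fresh round of coins for the same elements.
    return radix_shuffle(a, coin) + radix_shuffle(b, coin)

print(radix_shuffle(list("abcdef")))
```

Each level of recursion consumes one bit per element, which is the "next digit" of that element's imaginary random real number.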

I call the above algorithm RadixShuffle. You can do it in base 2, but also in other bases. For base 2 you can make it in-place, similar to how the partition step of Quicksort is implemented in-place; for other bases you either have to do it out-of-place or in two passes (the first pass only counts how many elements go in each bucket, to compute offsets).
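One way the in-place base-2 variant might look, assuming a Lomuto-style partition where a coin flip plays the role of the pivot comparison. The name `radix_shuffle_inplace` and its parameters are hypothetical, and this is a sketch rather than a tuned implementation.

```python
import random

def radix_shuffle_inplace(xs, lo=0, hi=None, coin=lambda: random.getrandbits(1)):
    """Shuffle xs[lo:hi] in place using only unbiased coin flips."""
    if hi is None:
        hi = len(xs)
    while hi - lo > 1:
        # Partition pass: elements whose coin comes up 1 are swapped
        # into the front region xs[lo:mid], the rest stay in xs[mid:hi].
        mid = lo
        for i in range(lo, hi):
            if coin():
                xs[i], xs[mid] = xs[mid], xs[i]
                mid += 1
        # Recurse into the smaller side and loop on the larger one,
        # which keeps the recursion depth at O(log n).
        if mid - lo < hi - mid:
            radix_shuffle_inplace(xs, lo, mid, coin)
            lo = mid
        else:
            radix_shuffle_inplace(xs, mid, hi, coin)
            hi = mid

data = list(range(10))
radix_shuffle_inplace(data)
print(data)
```

The swaps reorder elements inside each side, but that does not introduce bias, because each side is shuffled again without ever looking at the element values.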

The above can be combined with a fallback algorithm such as Fisher-Yates for small N. I believe that even though the above is O(N log N), it can be faster than Fisher-Yates for larger N because it is exceptionally cache-efficient as well as RNG-efficient, whereas Fisher-Yates requires a call to the RNG and incurs an expected cache miss for each element.

---

Another fun fact: you can turn any biased memoryless coin into an unbiased one with a simple trick. Throw the coin twice. If it gives HH or TT, discard both tosses and try again; if it gives HT or TH, use the first toss as your unbiased coin.

This works because if p is the probability that heads comes up we have:

    HH: p^2
    HT: p(1-p)
    TH: (1-p)p
    TT: (1-p)^2

Naturally, p(1-p) and (1-p)p are equal, so HT and TH are equiprobable. Thus, if we reject the other outcomes, we have distilled an unbiased coin out of our biased coin.
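The rejection trick above (usually credited to von Neumann) is short enough to sketch directly. The helper names `biased_coin` and `fair_bit` are invented for the example, and the biased coin is simulated with `random.random()` purely for demonstration.

```python
import random

def biased_coin(p):
    """Return a coin function yielding 1 (heads) with probability p."""
    return lambda: 1 if random.random() < p else 0

def fair_bit(coin):
    """Extract one unbiased bit from a memoryless, possibly biased coin."""
    while True:
        first, second = coin(), coin()
        # HH and TT are rejected; HT and TH are equally likely,
        # so the first toss of a mixed pair is an unbiased bit.
        if first != second:
            return first

coin = biased_coin(0.9)
bits = [fair_bit(coin) for _ in range(10_000)]
print(sum(bits) / len(bits))  # should land close to 0.5
```

The price is extra tosses: each pair is accepted with probability 2p(1-p), so one output bit costs 1/(p(1-p)) tosses on average, about 11 for p = 0.9. The loop never ends if p is exactly 0 or 1.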

mlochbaum · 2y ago

I was able to make a variant of the higher-base version that runs in a single pass, by stopping when one partition fills up and using a different method for the remaining (asymptotically few) elements. I described the idea, which is based on another effort called MergeShuffle, here: https://mlochbaum.github.io/BQN/implementation/primitive/ran...

And it is better when N gets large. My implementation set the cutoff at 2^19 elements, although the effect isn't too big for a few more powers of two. Here's the main radix loop: https://github.com/dzaima/CBQN/blob/v0.7.0/src/builtins/sysf...

orlp · 2y ago

I found another in-place approach that also supports higher bases, described here: https://arxiv.org/pdf/2302.03317, with an open source implementation: https://crates.io/crates/rip_shuffle. You might want to compare it with your version.