Also worth reading up on are sum-heaps. Alias tables are O(1) to sample from but O(n) to build/modify. Sum-heaps let you modify in O(log(n)) at the cost of sampling in O(log(n)) as well. A good writeup is here: https://timvieira.github.io/blog/post/2016/11/21/heaps-for-i...
Here's a quick demo class that shows the technique. It's amazingly simple.
class Demo EXAMPLE = { "75%" => 0.75, "15%" => 0.15, "9%" => 0.09, "1%" => 0.01 }
def self.sample(choices = EXAMPLE)
choices.max_by { |_, weight| rand ** (1.0 / weight) }.first
end
def self.show(samples: 10000)
items = samples.times.map {sample}
items.each_with_object(Hash.new(0)) do |item, counts|
counts[item]+=1
end.sort_by(&:last).reverse.to_ h
end
end# Demo.show # => {"75%"=>7480, "15%"=>1525, "9%"=>901, "1%"=>94}
Edit: Sorry for the bad formatting. I added a gist:
https://gist.github.com/jasonl99/25d3c922d73f10a75fe228c2de3...
(For choosing from a small number of things this code has the advantage that it's not doing a lot of complicated stuff in Ruby, and in any case the speed probably doesn't matter too much. In fact, even when choosing from a large enough number of things to make this much slower than it need be, the speed may well not matter at all. But it's worth being aware.)