1. Truncate your lat longs to some arbitrary decimal place (this is very very stupid, you end up with grid lines [1])
2. The above method ^^, but everyone basically tries random angle + random length along that angle, which doesn't generate uniform results in a circle, as the article mentions [2]. So then you try generating points in a box that encloses your circle and rejecting anything outside the circle, but that smells bad. So you do some googling and find method 6 listed in the article (good! and fast!)
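For reference, the "good and fast" method boils down to taking the square root of a uniform variate for the radius; a minimal numpy sketch (function name mine):

```python
import numpy as np

def uniform_disk(n_points, radius=1.0, rng=None):
    """Sample points uniformly inside a disk of the given radius.

    Using r = radius * sqrt(u) compensates for the area of a thin ring
    growing linearly with r; using r = radius * u instead clusters
    points toward the center.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi, n_points)
    r = radius * np.sqrt(rng.uniform(0.0, 1.0, n_points))
    return np.column_stack((r * np.cos(theta), r * np.sin(theta)))
```

Scatter-plotting ~100 points with and without the `sqrt` makes the center clustering of the naive version obvious.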
3. Realize that fuzzing points is stupid, what you really want is to view points in aggregate anyways, so you try heat maps
[1]: Ok you always end up with grid lines, but truncating to like 1-6 decimal places produces very obvious grid lines to the human eye
[2]: Try this in pyplot! You'll see right away with ~100 points that it's not uniform in the way you expect
https://extremelearning.com.au/how-to-evenly-distribute-poin...
It might fail for higher dimensions, but lots of programs only run on a 3D sphere of a planet, haha!
I don't actually know of any useful algorithms with worse asymptotic behaviour.
Fortunately, I never used it for anything, because I made the classic naive mistake of simply choosing a random theta in 0,2pi and phi in -pi,pi, which ends up with points biased towards the poles.
Somehow *12 years later* my subconscious flagged it up and I woke up in the middle of the night realizing the issue. Even though I'd never revisited it since then!
https://github.com/dmd/thesis/commit/bff319690188a62a79821aa...
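(For anyone who hit the same bug: a standard fix, not necessarily what the linked commit does, is to sample the *cosine* of the polar angle uniformly rather than the angle itself.)

```python
import numpy as np

def uniform_sphere(n_points, rng=None):
    """Points uniformly distributed on the surface of the unit sphere.

    Sampling z = cos(polar angle) uniformly on [-1, 1] removes the pole
    bias: by Archimedes' hat-box theorem, equal-height slabs of the
    sphere have equal surface area.
    """
    rng = np.random.default_rng() if rng is None else rng
    theta = rng.uniform(0.0, 2.0 * np.pi, n_points)  # azimuth
    z = rng.uniform(-1.0, 1.0, n_points)             # height
    s = np.sqrt(1.0 - z * z)                         # radius of the horizontal slice
    return np.column_stack((s * np.cos(theta), s * np.sin(theta), z))
```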
Am I right in thinking that it’s because circles of constant theta on the sphere contain the same ‘expected’ number of points (since phi is uniformly distributed), and these circles get smaller (and in the limit, shrink to a point) as one travels towards the poles — hence, higher density of points…?
…makes me realise that spherical polar coordinates, as natural as they seem, aren’t in some way homogeneous in the sense that whilst the mapping (theta, phi) —> (x, y, z) is continuous, it isn’t uniformly so… or something like that. I don’t know enough (possibly differential) geometry to articulate it properly, but it feels like the problem of getting genuinely uniform sampling from a sphere (or indeed any manifold) is somewhat equivalent to having a ‘nice’ coordinate system (one that respects the symmetry of the sphere). Basically, polar coordinates are prejudiced and treat points differently depending on how close to the poles they are.
By definition, our sampling in the domain space (two intervals) is uniform; the problem comes when we project through a coordinate system that doesn’t respect this.
Which better solution did you use? I’m having trouble reading your code on my current device.
The term is "measure preserving". The mapping is continuous but it doesn't preserve the length of intervals projected through it.
I didn't, since I haven't touched or needed this code since 2003. I just put a warning saying don't use it.
I don't think it's a good idea to introduce over 20 different methods before talking about the correct one, which works for any number of dimensions, say n, and whose correctness is easy to see:
* Generate a random vector by sampling n standard normal distributions: `vector = np.random.randn(n)`
* Key step: show that the vector has a uniform direction.
The proof is as follows: the probability density function of a standard normal is `p(x) = 1/sqrt(2pi) * exp(-x^2/2)`, and the density of the vector is the product of the densities of its individual dimensions. That product is `p(x1)p(x2)...p(xn) = (1/sqrt(2pi))^n * exp(-(x1^2+x2^2+...+xn^2)/2)`, since `exp(a)exp(b) = exp(a+b)`.
Now it's easy to see that the density depends only on the vector's length, i.e. only on `x1^2+x2^2+...+xn^2`, not on its direction. Whatever rotation you apply after sampling this vector, rotation preserves `x1^2+x2^2+...+xn^2` by definition, so you get the exact same probability density and therefore the same distribution of vectors; hence the direction is uniform.
* Now that the direction of the vector is uniformly sampled, decide the radius separately: for the n-sphere the radius is just 1, and for the n-ball the enclosed volume is proportional to the nth power of the radius, so you sample a uniform number from [0,1] as the volume fraction and take the nth root as the radius: `radius = np.random.uniform(0, 1)**(1/n)`
* Normalize the vector to the radius: `vector = vector / np.sqrt((vector**2).sum()) * radius`
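Putting the steps above together as one self-contained function (names mine; note the exponents, `**(1/n)` for the radius and `vector**2` in the norm, which are easy to mistype):

```python
import numpy as np

def uniform_ball(n_dim, n_points, rng=None):
    """Sample points uniformly from the unit n-ball.

    Direction: normalize a standard normal vector; its density depends
    only on the norm, so the direction is uniform on the sphere.
    Radius: take U**(1/n_dim) so that the enclosed volume, which grows
    like r**n_dim, is uniformly distributed.
    """
    rng = np.random.default_rng() if rng is None else rng
    v = rng.standard_normal((n_points, n_dim))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = rng.uniform(0.0, 1.0, (n_points, 1)) ** (1.0 / n_dim)
    return v * r
```

Dropping the radius step (keeping `r = 1`) gives uniform points on the n-sphere instead.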
I think Section 2 of this book provides a much better perspective on the problem of generating uniform random points, since it also provides intuition behind the geometry of high dimensions and properties of the unit ball, etc.
Tao, T., Vu, V., and Krishnapur, M. (2010). Random matrices: universality of ESDs and the circular law. The Annals of Probability, 38(5), 2023–2065.
If the circle represents the area in which a simulated projectile will hit, you probably don't want a uniformly random distribution of points but instead a bias toward the middle of the circle. A real gun fired multiple times at a fixed target will probably (assuming perfect aim but some variation on every shot) land more shots in the middle of the pattern than at the edges.
Some early Valve shot code actually had a purely random distribution, but at some point an alternate version got written and shared at https://developer.valvesoftware.com/wiki/CShotManipulator
Ironically the biased version is based on a pretty simple method that in fact people sometimes get wrong when they want a truly random point distribution in a circle. Just doing a random radius and theta will lead to a biased distribution. Wolfram Mathworld has a good writeup on it at https://mathworld.wolfram.com/DiskPointPicking.html
> purely random distribution
Nitpick: you don’t mean random; you mean uniform.
I think the best illustration of "reasonable choices of what random values should be used leading to biased results" is Bertrand's paradox which I was introduced to via numberphile/3blue1brown: https://www.youtube.com/watch?v=mZBwsm6B280 and am just glad that nothing I have ever needed random sampling for has ever been important :D
[1] please don't use this blindly, I'm really just going off very old recollection, if you need it google random sphere sampling :D
Currently, I am porting my codebase to Rust and did that part over the weekend, so if anyone is interested in this exact implementation, I'd be willing to share it (as a crate if necessary).
If I roll a 12-sided die, that's random. If I roll two 6-sided dice and add the result, that's also random, but it has a different distribution of values.
In the two-dice version, there's one way to get a result of 2, but _several_ ways to get 7. You'll get 7 way more often than you'll get 2.
In the one-die version, each outcome is equally likely. You're exactly as likely to get 2 as you are to get 7 or any other value in the range of possibilities.
The one-die version is a uniform distribution. The two-dice version is not uniform.
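The two-dice counts can be enumerated exhaustively:

```python
from collections import Counter
from itertools import product

# All 36 ordered outcomes of rolling two 6-sided dice.
two_dice = Counter(a + b for a, b in product(range(1, 7), repeat=2))

print(two_dice[2], two_dice[7])  # 1 way to roll 2, 6 ways to roll 7
print(two_dice[7] / 36)          # P(7) = 1/6, vs 1/12 for any face of a d12
```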
https://extremelearning.com.au/how-to-evenly-distribute-poin...
https://numpy.org/doc/stable/reference/random/generated/nump...
Or in god's own words (TAOCP section 3.4):
Applications of random numbers often call for other kinds of distributions, however; for example, if we want to make a random choice from among k alternatives, we want a random integer between 1 and k. If some simulation process calls for a random waiting time between occurrences of independent events, a random number with the exponential distribution is desired. Sometimes we don't even want random numbers — we want a random permutation (a random arrangement of n objects) or a random combination (a random choice of k objects from a collection of n).
In principle, any of these other random quantities can be obtained from the uniform deviates U0, U1, U2, ...; people have devised a number of important "random tricks" for the efficient transformation of uniform deviates. A study of these techniques also gives us insight into the proper use of random numbers in any Monte Carlo application.
The easiest example most people are aware of in practice is throwing two 6-sided dice vs one 12-sided die. The outcome of both is random, but people seem to know fairly intuitively that the two-dice case is more likely to produce middle values than extreme ones, while the single 12-sided die doesn't have that behavior.
My reasoning is: if you are counting cards in a game, you are giving yourself an advantage, but what really happens is that from your point of view cards are drawn less and less uniformly at random. Or, put another way, if you know the distribution is normal, you can bet on the result being near the center and come out on top, but if it's a uniformly random distribution, all bets are off.
But what if the domain isn't finite? e.g. the entire real line? Then we can't define a uniform distribution, because the total probability can never sum to one. We can still define a maximum entropy distribution, but only by making some more assumptions. The normal distribution is the maximum entropy distribution on the real line, given a particular mean and variance.
Other examples: on the positive real line, the exponential distribution is maximum-entropy for a given mean. On the non-negative integers, the geometric distribution plays the same role (Poisson, despite its reputation, is not the maximum-entropy distribution for a fixed mean there).
The problem is if you are doing something that requires uniform random values, and your random source is not uniform, then things will go wrong.
Rolling two dice gives you a number which is random in [2, 12], but not uniform -- 7 is far more likely than 2 or 12.
Finite support somehow seems less random because you can still say something interesting about its mean and bounds in the same way you can say something about the mean and variance of a normal distribution.
That’s just because the payout distribution and the real distribution are different, and so there’s somewhere the payout is mispriced.
If you bet on a uniform distribution of 2 through 12 that paid out according to a normal distribution, you’d make a killing betting on 2 or 12 (the “outliers”). If you roll 2d6 (normal-ish) and get paid according to a uniform distribution, you’ll make a killing betting on 7.