> If you ask them to add a million floats in order, you get the same result every time.
There are a bunch of ways to add a million floats on a GPU, but they won’t all give you the same result, because floating-point addition is not associative:
* split the million floats into ‘n’ chunks, each chunk is summed, then you sum the ‘n’ results.
* if you sum results as they are gathered (you don’t need to block) you will get a non-deterministic result, as the threads finishing (outside of a warp) is non-deterministic in order.
* if you change ‘n’, your result will change.
* if you sort after gathering, your result will change.
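To make the list above concrete, here is a minimal CPU-side sketch in Python. It is not GPU code: `naive_sum`, `chunked_sum`, and the `finish_order` parameter are made-up names for illustration, and `finish_order` only simulates the order in which threads happen to finish. The four input values are hand-picked so the rounding differences are visible.

```python
def naive_sum(xs):
    """Plain left-to-right float addition (no compensated summation)."""
    total = 0.0
    for x in xs:
        total += x
    return total


def chunked_sum(values, n, finish_order=None):
    """Sum `values` in n contiguous chunks, then add up the partial sums.

    `finish_order` (hypothetical) is a list of chunk indices that simulates
    the order in which partial results arrive; None means index order.
    """
    size = -(-len(values) // n)  # ceiling division
    partials = [naive_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    if finish_order is not None:
        partials = [partials[i] for i in finish_order]
    return naive_sum(partials)


# Hand-picked values: 1.0 is lost when added to +/-1e16 in double precision.
values = [1e16, 1.0, -1e16, 1.0]

print(naive_sum(values))                     # strictly in order
print(chunked_sum(values, 2))                # n = 2 chunks
print(chunked_sum(values, 4))                # n = 4 chunks
print(chunked_sum(values, 4, [0, 2, 1, 3]))  # partials arrive in another order
print(naive_sum(sorted(values)))             # sort, then sum
```

Combining the partials in a fixed index order (`finish_order=None`) gives the same answer on every run for a given ‘n’, but on real hardware that means waiting for every chunk before reducing, which is where the performance cost comes from.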
TL;DR: parallel race conditions are non-deterministic. Map-reduce has an underlying race condition (the order in which partial results get combined) that you can prevent, but it costs performance. Sometimes you don’t care about the non-determinism enough to pay that penalty to fix it.
[0] https://www.microsoft.com/en-us/research/wp-content/uploads/...