It is technically possible to make it deterministic.
The main reason you don't get deterministic outputs today is that CUDA/GPU optimizations make the calculations run much faster if you let them be nondeterministic.
The internal GPU scheduler will then process things in the order it thinks is fastest.
Since floating-point addition is not associative, the order matters: you can get different results for (a + (b + c)) and ((a + b) + c).
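To make the non-associativity concrete, here is a small CPU-only sketch in plain Python. It does not touch a GPU; the helper `sum_in_order` is a hypothetical stand-in for a scheduler that picks its own accumulation order, and the values are chosen only for illustration.

```python
def sum_in_order(values, order):
    """Accumulate values left to right in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Same three numbers, two groupings.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

# A more dramatic case: the small term is lost when it is added
# to the large one before the large terms cancel.
values = [1e16, 1.0, -1e16]
forward = sum_in_order(values, [0, 1, 2])    # (1e16 + 1.0) + -1e16
reordered = sum_in_order(values, [0, 2, 1])  # (1e16 + -1e16) + 1.0
print(forward, reordered)  # 0.0 1.0
```

A parallel reduction on a GPU is effectively free to pick either order from run to run, which is where the run-to-run differences come from.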