It literally says that the GPU is deterministic and the NVIDIA libraries on top are deterministic, but that it is TensorFlow that introduces variability (errors!) for “performance”.
My argument is that it is the AI/ML code that is introducing non-determinism, usually by sacrificing repeatability to gain performance.
That's precisely what's happening here. TensorFlow introduced a "harmless"[1] data race to improve performance, rather than use a deterministic but slower algorithm.
The individual floating-point computations are deterministic; it's the multi-threaded design on top that introduces the variability in the output.
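To make that concrete, here's a minimal sketch in plain Python. It is not TensorFlow's actual code; `sum_in_order` and the values are made up for illustration, and the two index lists just stand in for two possible orders in which racing threads might add their partial results. Each addition is perfectly repeatable, but floating-point addition isn't associative, so the total depends on the order:

```python
def sum_in_order(values, order):
    """Accumulate values in the given index order (hypothetical helper)."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Illustrative values chosen so that rounding is easy to see in float64.
values = [1e16, 1.0, -1e16, 1.0]

# Same inputs, two different accumulation orders, standing in for
# two different thread interleavings.
forward = sum_in_order(values, [0, 1, 2, 3])
reordered = sum_in_order(values, [0, 2, 1, 3])

print(forward, reordered)                               # the totals differ
print(forward == sum_in_order(values, [0, 1, 2, 3]))    # same order: same result
```

Fix the order and you get the same bits every run. Let a race pick the order and the output can change from run to run, even though no individual operation is random.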
[1] Used to be harmless, but cutting corners like this will make it nigh impossible to repeatably validate the safety of future models like GPT5. That seems pretty dangerous...