so, if their experiment was designed such that reproduction is inherently difficult, they should have designed it in a better way, and they should've used a toolset that wouldn't run into that problem.
a non-reproducible experiment isn't necessarily completely without value, but it's a thing that everyone should look askance at till it proves its worth.
(apologies if my comments don't apply to this experiment and if it is reproducible -- i didn't have time to read through the OP, but i thought this reply was still a worthwhile response to its specific parent comment)
That being said the paper in question doesn't seem to reference open source code anyway so I guess my point was kind of moot, apologies.
There are specific CUDA operations which are not guaranteed to be reproducible though, as well as some CuDNN operations which are non-determanistic without performance sacrifice, and this does cause real problems.
See https://pytorch.org/docs/stable/notes/randomness.html for some reasonable docs on this.