Wow, cool, you actually tested it! And an effective test too.
Stepping back and forth through the frames (using mpv), the flash clearly reveals several spots of localized brightness where the contrast pops into clear relief.
The effect is clearest at the very bottom of the image, which goes from "shadow blob" to "adequately discernible", but I think the area just above that (the third vertical quarter of the image) is the most interesting: the detail visible in frames 24-29 (immediately before 00:00:01 at 30.030 fps) is excellent, and that's with the flash LED at peak brightness.
Flash synchronization would be effectively impossible to achieve (the camera would need to stream LED status information inside each frame), and even with "LED is on" information available it might provide no net gain: the exact point at which the hardware says "LED is off" won't necessarily correspond to the exact moment the light decays to zero (at 1/960 s = ~1.042 milliseconds per frame, the video suggests it takes about 2 frames, or ~2.08 milliseconds, for the light to decay), and that decay will never be the same twice, since the flash sends light out into arbitrarily different environments. I can't help but wonder if calibration references for everything from Vantablack to mirrors would be needed... for each camera sensor... and then there would be the problem of figuring out which reference(s?) to select.
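To write the arithmetic down explicitly (a back-of-envelope check; the 2-frame decay figure is an eyeball estimate from stepping through the video, not a measurement):

```python
# Timing sketch, assuming 960 fps capture and the ~2-frame light decay
# eyeballed from the video.
frame_ms = 1000 / 960     # ~1.0417 ms per frame
decay_ms = 2 * frame_ms   # ~2.08 ms for the flash light to fade out
print(f"{frame_ms:.4f} ms/frame, ~{decay_ms:.2f} ms decay")
```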
Staring at the video frames some more, two ideas come to mind: (1) analyze all the frames to identify areas of significant difference in brightness, then (2) for each (perhaps nonrectangular) region of difference, figure out the "best" source reference for that specific region. As an example, I'd generally use frame 13 for most of the image, and frame 44 or so (out of many, many possible candidates) for the bits that, as you say, become float64 1.00 :). Obviously a nontrivial amount of normalization would then be needed.
I'm not aware of how you'd do either of these neurally :) but the idea for (1) came from https://en.wikipedia.org/wiki/Seam_carving (although basic edge detection may be more appropriate for this scenario), while the idea for (2) came from https://github.com/google/butteraugli, which "estimates the psychovisual similarity of two images"; perhaps there's something out there that can identify "best contrast"? I'm not sure.
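For what it's worth, ideas (1) and (2) can be sketched without anything neural. Here's a toy numpy version where "areas of significant difference" are just pixels with high variance across frames, and "best contrast" is RMS contrast over that mask; both are crude stand-in heuristics I picked for illustration, the single mask is a placeholder for real region segmentation, and the normalization mentioned above is skipped entirely:

```python
import numpy as np

def best_reference_per_region(frames, var_thresh=0.01):
    """Toy sketch of the two-step idea.

    frames: list of 2D float arrays (grayscale, values in [0, 1]).
    Step 1: per-pixel variance across time flags flash-affected pixels.
    Step 2: over that mask, score each frame by RMS contrast (std dev)
    and return the index of the highest-scoring frame.
    """
    stack = np.stack(frames).astype(np.float64)   # shape (T, H, W)
    mask = stack.var(axis=0) > var_thresh          # "region of difference"
    scores = [frame[mask].std() for frame in stack]
    return mask, int(np.argmax(scores))
```

A real version would want connected-component labeling (one best frame per region, not one global mask) and something perceptual in place of plain standard deviation, but it shows the shape of the pipeline.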
Trivial aside: I wondered why mpv kept saying "Inserting rotation filter." and also why the frame numbers appeared sideways. Then I realized the video has rotation metadata in it, presumably so the device doesn't need to do landscape-to-portrait frame buffering at 960fps (heh). I then realized the left-to-right rolling shutter effect I was seeing was actually a bottom-to-top rolling shutter. I... think that's unusual? I'm curious - after Googling then reading (or, more accurately, digging signal out of) https://www.androidauthority.com/real-960fps-super-slow-moti... - was the device an Xperia 1?
(And just to write it down for future reference: --vf 'drawtext=fontcolor=white:fontsize=100:text="%{n}"' adds frame numbers to mpv. Yay.)