Wow, cool, you actually tested it! And an effective test too.
Stepping back and forth through the frames (using mpv), the flash clearly reveals several spots of localized brightness where the contrast pops into clear relief.
The effect is clearest at the very bottom of the image, which goes from "shadow blob" to "adequately discernible", but I think the area just above that (the third vertical quarter of the image) is the most interesting: the detail visible in frames 24-29 (immediately before 00:00:01 at 30.030 fps) is excellent, and that's with the flash LED at peak brightness.
Flash synchronization would be effectively impossible to achieve (the camera would need to stream LED status information inside each frame), and even with "LED is on" information available it might provide no net gain: the exact point at which the hardware says "LED is off" won't necessarily correspond to the exact moment the light decays to zero (at 1/960 s = ~1.042 milliseconds per frame, the video suggests it takes about 2 frames, or ~2.08 milliseconds, for the light to decay), and that decay will never be the same twice, since the flash sends light out into arbitrarily different environments. I can't help but wonder if calibration references for everything from Vantablack to mirrors would be needed... for each camera sensor... and then there would be the problem of figuring out which reference(s?) to select.
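To write the arithmetic down explicitly (a back-of-envelope check; the 2-frame decay figure is an eyeball estimate from stepping through the video, not a measurement):

```python
# Timing sketch, assuming 960 fps capture and the ~2-frame light decay
# eyeballed from the video.
frame_ms = 1000 / 960     # ~1.0417 ms per frame
decay_ms = 2 * frame_ms   # ~2.08 ms for the flash light to fade out
print(f"{frame_ms:.4f} ms/frame, ~{decay_ms:.2f} ms decay")
```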
Staring at the video frames some more, two ideas come to mind: (1) analyze all the frames to identify areas of significant difference in brightness, then (2) for each (perhaps nonrectangular) region of difference, figure out the "best" source reference for that specific region. As an example, I'd generally use frame 13 for most of the image, and frame 44 or so (out of many, many possible candidates) for the bits that, as you say, become float64 1.00 :). Obviously a nontrivial amount of normalization would then be needed.
I'm not aware of how you'd do either of these neurally :) but the idea for (1) came from https://en.wikipedia.org/wiki/Seam_carving (although basic edge detection may be more appropriate for this scenario), while the idea for (2) came from https://github.com/google/butteraugli, which "estimates the psychovisual similarity of two images"; perhaps there's something out there that can identify "best contrast"? I'm not sure.
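For what it's worth, ideas (1) and (2) can be sketched without anything neural. Here's a toy numpy version where "areas of significant difference" are just pixels with high variance across frames, and "best contrast" is RMS contrast over that mask; both are crude stand-in heuristics I picked for illustration, the single mask is a placeholder for real region segmentation, and the normalization mentioned above is skipped entirely:

```python
import numpy as np

def best_reference_per_region(frames, var_thresh=0.01):
    """Toy sketch of the two-step idea.

    frames: list of 2D float arrays (grayscale, values in [0, 1]).
    Step 1: per-pixel variance across time flags flash-affected pixels.
    Step 2: over that mask, score each frame by RMS contrast (std dev)
    and return the index of the highest-scoring frame.
    """
    stack = np.stack(frames).astype(np.float64)   # shape (T, H, W)
    mask = stack.var(axis=0) > var_thresh          # "region of difference"
    scores = [frame[mask].std() for frame in stack]
    return mask, int(np.argmax(scores))
```

A real version would want connected-component labeling (one best frame per region, not one global mask) and something perceptual in place of plain standard deviation, but it shows the shape of the pipeline.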
Trivial aside: I wondered why mpv kept saying "Inserting rotation filter." and also why the frame numbers appeared sideways. Then I realized the video has rotation metadata in it, presumably so the device doesn't need to do landscape-to-portrait frame buffering at 960fps (heh). I then realized the left-to-right rolling shutter effect I was seeing was actually a bottom-to-top rolling shutter. I... think that's unusual? I'm curious - after Googling then reading (or, more accurately, digging signal out of) https://www.androidauthority.com/real-960fps-super-slow-moti... - was the device an Xperia 1?
(And just to write it down for future reference: --vf 'drawtext=fontcolor=white:fontsize=100:text="%{n}"' adds frame numbers to mpv. Yay.)