Just another entry on the "things that are supposed to be impossible that convolutional nets can do now."
but man, it can make your internet pics look smooth! :) thanks for the comment!
(Created by the super talented duncanrobson)
A good content-unaware upscaler would be nice too (as one of the default Photoshop algorithms).
I also wonder what they used for the downscaling. I see 4x4 pixel blocks, but also some with 3px or 7px lengths.
This looks pixely and is supposed to be a source file?: https://raw.githubusercontent.com/Tetrachrome/subpixel/d2e28...
https://arxiv.org/abs/1609.04802
The pic with the boat on page 13 is interesting. In the SRGAN version I would take the shore for some sort of cliff, while the original shows separated boulders.
I'm not familiar enough with the field to understand how the "neural net" part feeds in, other than to do parallel computation on the x-pos, y-pos, (RGB) color-type-intensity tensor interpolated/weighted into a larger/finer tensor.
(linear algebra speak for upscaling my old DVD to HD, that sort of thing)
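To make that "tensor interpolated/weighted into a larger/finer tensor" picture concrete, here is a minimal sketch of what a fixed (non-learned) bilinear upscaler looks like. `bilinear_upscale` is a hypothetical helper, not from the article; every output pixel is just a position-dependent weighted average of its four nearest input pixels, with no learning anywhere:

```python
import numpy as np

def bilinear_upscale(img, scale):
    """Fixed bilinear upscaling of an (H, W, C) array by an integer factor.

    The weights (1 - wx), wx, (1 - wy), wy are a fixed function of pixel
    position -- this is the "just a calculation" baseline, nothing learned.
    """
    h, w, c = img.shape
    hh, ww = h * scale, w * scale
    # Sample positions in input coordinates (align pixel centers).
    ys = (np.arange(hh) + 0.5) / scale - 0.5
    xs = (np.arange(ww) + 0.5) / scale - 0.5
    y0 = np.clip(np.floor(ys).astype(int), 0, h - 1)
    x0 = np.clip(np.floor(xs).astype(int), 0, w - 1)
    y1 = np.clip(y0 + 1, 0, h - 1)
    x1 = np.clip(x0 + 1, 0, w - 1)
    wy = np.clip(ys - y0, 0, 1)[:, None, None]
    wx = np.clip(xs - x0, 0, 1)[None, :, None]
    # Weighted average of the 4 nearest input pixels.
    top = (1 - wx) * img[y0][:, x0] + wx * img[y0][:, x1]
    bot = (1 - wx) * img[y1][:, x0] + wx * img[y1][:, x1]
    return (1 - wy) * top + wy * bot
```

The neural-net approaches replace those fixed weights with filters fitted to a training set of image pairs, which is where the "AI" part actually enters.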
At the risk of exposing my ignorance, this has nothing to do with "AI", right? It's "just" parallel computation?
this may make you feel disappointed now, but in the write-up we are also pitching this same module to be used in generative networks and other models that do build an understanding of the scene. Let's see what the community (and we ourselves) can do next...
Approaches in the past used heuristics (like finding edges and upsampling them, etc). Those were fragile systems. In this approach, the system learns what's appropriate on its own.
I missed the part, though, where there was some "learning"/"adjusted prediction" in the interpolation function(s), rather than just a fixed calculation such as a literal linear interpolation.
I was happy just to be able to tease apart the big equation before the Python code sample, but was too lazy to drill down into what the "delta-x"/"delta-y" factor-functions were.
Still, this was a good presentation: somebody with little to no knowledge of the field, but some math, could get the gist of it. Kudos to the author.
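For anyone else who stopped at the big equation: the learned part lives in the convolution filters, and the final "upscaling" step is just a fixed rearrangement of the r*r feature maps those filters produce (the periodic-shuffling / pixel-shuffle operator). A minimal sketch of that rearrangement, assuming channels-last layout and one particular interleaving convention:

```python
import numpy as np

def pixel_shuffle(x, r):
    """Rearrange an (H, W, C*r*r) tensor into (H*r, W*r, C).

    In sub-pixel convolution, a learned conv layer emits r*r values per
    low-res pixel per output channel; this op just interleaves them into
    an r-times-larger grid. The shuffle itself contains no parameters --
    all the "learning" happened in the conv filters that produced x.
    """
    h, w, crr = x.shape
    c = crr // (r * r)
    # Split the channel axis into (r, r, C) sub-pixel offsets, then
    # interleave those offsets with the spatial axes.
    out = x.reshape(h, w, r, r, c)
    out = out.transpose(0, 2, 1, 3, 4)  # (h, r, w, r, c)
    return out.reshape(h * r, w * r, c)
```

So a fixed bilinear kernel is replaced by whatever filters training settles on, which is what lets the system hallucinate plausible texture instead of only smoothing.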
Unless we are telling it to "be intelligent" whatever that means.
How were the input images prepared?
edit:
example: https://youtu.be/yZyIYUEfT3U?t=71
Also, masks themselves can be motion blurred, and if motion blur approximation is close enough to the footage, then it's good https://www.youtube.com/watch?v=biginQL6NIo
And, what it looks like pulling a matte with state-of-the-art tools https://www.youtube.com/watch?v=8oQqr6Lfmag Still a pain.
It's still useful though, browsers, for instance, could use it for displaying downscaled images.
You're right though, and that's why chroma hinting for subpixel AA has fallen out of favor. It also doesn't work on mobile where the screen can be rotated from RGB-horz to RGB-vert at a moment's notice. This was changed for ClearType in Windows 8 (DirectWrite never did chroma hinting).