Image-to-Image Translation with Conditional Adversarial Nets

(phillipi.github.io)

317 points · cruisestacy · 9y ago · 56 comments

39 comments · 15 top-level

verytrivial · 9y ago · 8 in thread

Does anyone else have the feeling that with the current trajectory, something exactly like this, but with perhaps a million times the amount of feedback and data, thought will just emerge? Yes, this is all 2D and abstract/selective training sets etc, but what if AI is the ultimate fake-it-until-you-make-it?

7373737373 · 9y ago

Networks with selective attention already exist, but what if they can learn about themselves? Right now they cannot create any notion of self or "body" (defined as the boundary between the environment that can be predicted vs that which cannot), because their outputs have no causal effect on their inputs. There are no differences within the network that make a difference to itself, there is no intrinsic perspective.

Could this change, if, for example

- inputs are augmented with the network state (or derived version thereof)

- previous outputs of the network / external memory are fed back?

This seems to be the kind of self reference self awareness requires.

Also, do asynchronous networks have fundamental advantages over synchronous networks? What about static vs dynamic networks?
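Feeding previous outputs back in as inputs is roughly what recurrent networks do. As a toy illustration only (a single unit with made-up weights; `step`, `run`, `w_in` and `w_fb` are hypothetical names, not anything from the paper), here is how feedback lets an earlier input keep influencing later outputs:

```python
import math

def step(x, prev_out, w_in=0.5, w_fb=0.9):
    """One update: mix the external input with the unit's own previous output."""
    return math.tanh(w_in * x + w_fb * prev_out)

def run(inputs, w_fb=0.9):
    """Process a sequence, feeding each output back in at the next step."""
    out, outputs = 0.0, []
    for x in inputs:
        out = step(x, out, w_fb=w_fb)
        outputs.append(out)
    return outputs

# A single input pulse followed by silence (assumed toy data).
pulse = [1.0, 0.0, 0.0, 0.0]
with_feedback = run(pulse, w_fb=0.9)     # the pulse keeps echoing through later outputs
without_feedback = run(pulse, w_fb=0.0)  # output drops to zero as soon as the input does
```

With `w_fb=0.0` the unit has no memory of the pulse at all; with feedback, its own past output becomes part of what it sees. Whether that is enough for anything like self-reference is the open question the comment raises.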

gallerdude · 9y ago

I don't see this happening. What I do see happening, is it figuring us out. Somewhere out there, there's a function which explains how exactly our society is completely organized in every way.

From that, the AI could generate books, movies, and do a lot of things.

rm_-rf_slash · 9y ago

Reminds me of the novel-rewriting-apparatus from 1984, except with more friggin' superheroes and remakes.

bertiewhykovich · 9y ago

Hoo boy, buddy, do I got some news for you: https://www.marxists.org/archive/marx/works/1867-c1/

greendestiny · 9y ago

I don't think it's going to emerge without significant effort to make it happen. I think most of the 'intelligence' we desire will be attainable without sentience. Sentience itself will require a lot of specific research directed at the goal. It's certainly a risk though.

unlikelymordant · 9y ago

Didn't it just emerge with humans? I don't see why it couldn't happen again. There may be a specific structure or wiring that facilitates thought, but I suspect any large enough net with enough training data can do it.

ionforce · 9y ago

If you can define thought and it can be implemented, sure.

marxidad · 9y ago

Biological nature didn't define thought before implementing it.

sebleon · 9y ago · 4 in thread

This is awesome!

Makes me wonder how this can apply to image and video compression. You could send over the semantic segmentation version of an image or video, and a system on the other end would use these techniques to reconstruct the original.
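For a rough sense of the numbers, here is a back-of-the-envelope sketch comparing raw RGB with an uncompressed per-pixel label map. The image size and class count are assumptions picked for illustration, and the function names are hypothetical:

```python
import math

def raw_rgb_bits(width, height):
    """Uncompressed 8-bit RGB: 24 bits per pixel."""
    return width * height * 24

def label_map_bits(width, height, num_classes):
    """Uncompressed label map: enough bits per pixel to index every class."""
    bits_per_pixel = math.ceil(math.log2(num_classes))
    return width * height * bits_per_pixel

# Assumed values: a 256x256 image and 32 semantic classes.
w, h, classes = 256, 256, 32
rgb = raw_rgb_bits(w, h)                  # 1572864 bits
labels = label_map_bits(w, h, classes)    # 327680 bits
ratio = rgb / labels                      # 4.8
```

This only counts raw bits. Label maps are mostly large flat regions, so they should also compress much better with ordinary lossless coding than this fixed per-pixel count suggests, but the reconstruction would be a plausible image rather than the original one.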

espadrine · 9y ago

You can perform extremely good compression this way, but the computational and energy cost would be prohibitive.

There are even more traditional tricks that don't make it into things like H.265 because they are too costly.

iamaaditya · 9y ago

Here is my work, where I use semantic information to achieve compression (or rather, to improve JPEG). This is not end-to-end compression like Google's work; it just incorporates semantic knowledge into compression. I am still trying to clean up the code before I make an arXiv/GitHub submission, but since you are interested, here is the link: http://gpgpu.cs-i.brandeis.edu/semantic_jpeg.pdf

nl · 9y ago

https://research.googleblog.com/2016/09/image-compression-wi...

discordianfish · 9y ago

I understood that to be the tech behind "Silicon Valley".

bflesch · 9y ago · 4 in thread

I feel this can potentially revolutionize creative processes, for example in the clothing industry. You just draw up a purse or a shoe, let the machines generate dozens of variants (with pictures), and then you only have to filter and rank them.

You can pipe these product sketches directly into focus groups who tell you which product is most likely to sell. You don't need massive staff to come up with product variants any more.

nathancahill · 9y ago

I feel like we would end up here: http://www.gianlucagimini.it/prototypes/velocipedia.html

zelpa · 9y ago

I wonder if you were to average the design of the bicycles whether it would actually produce something that works?

ragebol · 9y ago

It has the potential to redefine what we think of as 'creativity', as happened with what we consider intelligence and what we think of as "AI Hard" problems.

Perhaps what these networks are generating can be labeled better as "Guided/constrained imitation" rather than real creativity.

visarga · 9y ago

> real creativity

What is real creativity? Creativity is just random noise converted into patterns. Is the computer variety of creativity not real enough?

jawns · 9y ago · 3 in thread

The "sketches to handbags" example, which is buried toward the bottom, is really cool. It's basically an extension of the "edges to handbags" example, but with hand-drawn sketches.

Even though the sketches are fairly crude, with no shading and a low level of detail, many of the generated images look like they could, in fact, be real handbags. They still have the mark of a generated image (e.g. weird mottling) but they're totally recognizable as the thing they're meant to be.

The "sketches to shoes" example, on the other hand, reveals some of the limitations. Most of the sketches use poor perspective, so they wouldn't match up well with edges detected from an actual image of a shoe. Our brains can "get the gist" of the sketches and perform some perspective translation, but the algorithm doesn't appear to perform any translation of the input (e.g. "here's a sketch that appears to represent a shoe, here's what a shoe is actually shaped like, let's fit to that shape before going any further"), so you end up with images where a shoe-like texture is applied to something that doesn't look convincingly like a real shoe.

ape4 · 9y ago

This could be a popular shopping website. Sketch your perfect handbag. See an image of the product. Click to buy.

daveguy · 9y ago

"Sketch your perfect handbag" may be a bit much to ask of most people.

dougabug · 9y ago

There was a paper at CVPR 2016 called "Sketch Me That Shoe," which basically converted hand sketches to images using tied embedding networks. https://www.eecs.qmul.ac.uk/~qian/Project_cvpr16.html

hanoz · 9y ago · 2 in thread

I'm interested in having a play. As an out and out ML newbie, is there such a thing as an AWS image I could run on a GPU instance and then just git clone and go?

gregn610 · 9y ago

Try one of the Bitfusion AMIs on a g2.2xlarge instance.

hanoz · 9y ago

Thanks very much. If anyone else is interested, I can confirm that the Bitfusion Boost Ubuntu 14 Torch 7 AMI on a g2.2xlarge instance does offer a relatively painless way to get going with this, although I couldn't get the Python image combiner to work, so I had to prepare those separately. Have just trained my first neural net, most exciting!
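For anyone stuck at the same step: assuming the combiner's job is to put each input image and its target side by side in one image (that is my understanding of the paired training format, so treat it as an assumption), the operation itself is small. This is a dependency-free sketch on images stored as lists of pixel rows, not the repository's actual script; `combine_side_by_side` is a hypothetical name:

```python
def combine_side_by_side(image_a, image_b):
    """Concatenate two images horizontally, row by row.

    Each image is a list of rows; each row is a list of (r, g, b) tuples.
    """
    if len(image_a) != len(image_b):
        raise ValueError("images must have the same height")
    return [row_a + row_b for row_a, row_b in zip(image_a, image_b)]

# Two tiny 2x2 "images" standing in for an input and its target.
a = [[(0, 0, 0), (1, 1, 1)],
     [(2, 2, 2), (3, 3, 3)]]
b = [[(9, 9, 9), (8, 8, 8)],
     [(7, 7, 7), (6, 6, 6)]]
pair = combine_side_by_side(a, b)  # 2 rows, 4 pixels wide
```

With real files you would load and save through an image library, but the pairing logic stays the same.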

ragebol · 9y ago · 1 in thread

Interesting.

What I like about the "Day to Night" example is that it clearly demonstrates that these sorts of networks lack common sense. It expects light where there are clearly (to humans with common sense, at least) no things that can produce light, e.g. in the middle of a roof or in a tree. Of course, there can be, but it's fairly uncommon.

And the opposite as well: no lights where a human would totally expect a light, e.g. in front of buildings or on top of, well, lighting poles.

drcode · 9y ago

I'd guess the problem is that the daytime pictures allow for easy feature detection (tree, building, etc.) but the nighttime pictures are washed out. We humans look at the daytime picture first, then say "that nighttime picture must have a tree there", which involves feature detection across both pictures (in the training phase).

I suspect a neural network better specialized for this task (i.e. that has the data interlaced for both day and nighttime during training) would have no problem feature detecting trees and leaving them unlit.

amelius · 9y ago · 1 in thread

I wonder how well this scales to a larger domain of interest. So, e.g., if the neural net needs to know not only about cars and nature, but about more topics such as people, faces, computers, gastronomy, Santa Claus, Halloween, etc., how does the neural net scale? And how should its topology be extended under such scaling?

visarga · 9y ago

It's being researched with great interest. Building models from text and images, describing internal structure and relations between objects, building rich prior knowledge about the world in order to do inference and guide behavior.

I see lots of papers that go in this direction, of creating a rich, semantic, predictive representation of images, video and text and then using it as the basis for reinforcement learning. Learning to understand the world and to act based on that understanding.

romaniv · 9y ago · 1 in thread

Kudos for providing proper examples of the network doing its thing, both good and bad. This is what all researchers ought to do. Too many papers these days handpick a couple of the coolest-looking results and stop at that.

...

I get a feeling this could be used in game design to do some really cool stuff with map and texture generation.

31reasons · 9y ago

It could reduce the game size way down if it can generate textures on-the-fly.

aexaey · 9y ago

Truly impressive overall. Unfortunately, it looks like the training set was way too small. Look, for example, at the reconstruction of #13 here:

https://phillipi.github.io/pix2pix/images/index_facades2_los...

Notice the white triangles (image crop artifacts) present in the original image, yet completely absent from the net's input image. They reappear in the output of 3 (4 even?) out of 5 nets despite the lack of a corresponding cue in the input image. It looks like the network cheated a bit here, i.e. took advantage of the small set size and memorized the training image as a whole, then recognized and recalled this very image (already seen during training) rather than actually reconstructing it purely from the input.

Same (but less prominent) for other images where "ground truth" image was cropped.

mshenfield · 9y ago

Just want to throw out that none of these applications are new. What is novel about their approach is that, instead of learning a mapping function using a hand-picked function to quantify accuracy for each problem, they also have a mechanism for choosing the function that quantifies accuracy. Haven't grokked the paper to see how they do it, but that is pretty neat IMO.
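For reference, the mechanism is the discriminator: it is trained alongside the generator and acts as a learned loss. As I read the paper, the objective combines a conditional GAN term with an L1 term, where x is the input image, y the target, z the noise, and λ a weighting hyperparameter:

```latex
\mathcal{L}_{cGAN}(G, D) = \mathbb{E}_{x,y}\big[\log D(x, y)\big]
                         + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big]

\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_1\big]

G^{*} = \arg\min_{G} \max_{D} \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)
```

The L1 term is still a hand-picked loss; the adversarial term is the part that is learned, which is what lets the same setup be reused across tasks without designing a new accuracy measure each time.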

willcodeforfoo · 9y ago

The Aerial-to-Map example looks like this may be useful for automatic map/satellite rectification/georeferencing, but not sure how efficient it'd be if it has to compare against a large area.

Does anyone have any experience in this area?

iraphael · 9y ago

Besides being a cool new application of GANs, I don't see how this architecture is much different from normal GANs. Anyone else have thoughts?

rosstex · 9y ago

I'm enrolled in Efros' computational photography course this semester, and Tinghui and Jun-Yan are the GSIs. It's fantastic to experience the bridge between teaching and cutting-edge research!

mmastrac · 9y ago

This is an absolutely incredible result. All of this stuff would have been considered insanely advanced AI ten years ago, but now we look at it and say "this is just stuff computers can do".

We've got the pieces of visual processing and imagination here and the pieces of language input/output as part of Google's work. It feels like we just need to make some progress on an "AI executive" before we can get a real, interactive, human-like machine.

oluckyman · 9y ago

Neural nets! Is there anything they can't do?
