Could this change if, for example,
- inputs are augmented with the network state (or a derived version thereof)
- previous outputs of the network / external memory are fed back?
This seems to be the kind of self-reference that self-awareness requires.
Also, do asynchronous networks have fundamental advantages over synchronous networks? What about static vs dynamic networks?
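To make the feedback idea from the bullet points concrete, here is a minimal sketch in plain Python. It is not taken from any particular paper or library: the layer sizes, the random weights, and the names step and run are all made up for illustration. At each step the external input is concatenated with the previous internal state and the previous output before being fed through the network.

```python
import math
import random


def make_weights(n_rows, n_cols, rng):
    # Hypothetical random weights; a real network would learn these.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_cols)] for _ in range(n_rows)]


def matvec(w, v):
    # Plain matrix-vector product on nested lists.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def step(x, state, prev_out, w_state, w_out):
    # Augmented input: external input + previous state + previous output.
    augmented = x + state + prev_out
    new_state = [math.tanh(a) for a in matvec(w_state, augmented)]
    out = [math.tanh(a) for a in matvec(w_out, new_state)]
    return new_state, out


def run(inputs, n_state=4, n_out=2, seed=0):
    rng = random.Random(seed)
    n_in = len(inputs[0])
    w_state = make_weights(n_state, n_in + n_state + n_out, rng)
    w_out = make_weights(n_out, n_state, rng)
    state = [0.0] * n_state
    out = [0.0] * n_out
    outputs = []
    for x in inputs:
        state, out = step(x, state, out, w_state, w_out)
        outputs.append(out)
    return outputs


# The same external input is presented three times in a row.
outputs = run([[1.0, 0.0]] * 3)
```

Because the state and the previous output are part of the input, the network can respond differently to the same external input at different steps. Whether that kind of loop is enough for anything like self-awareness is exactly the open question.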
From that, the AI could generate books, movies, and do a lot of things.
Makes me wonder how this can apply to image and video compression. You could send over the semantic segmentation version of an image or video, and the system on the other end would use this technique to reconstruct the original.
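A rough back-of-the-envelope sketch of why a label map is so much cheaper to send than pixels. The toy 8x8 "image", the two classes, and the 3-bytes-per-run cost are assumptions made up for illustration, not numbers from any codec. Note that the receiver could only generate a plausible image from the labels, not recover the exact original pixels.

```python
def run_length_encode(labels):
    """Encode a flat list of class labels as [label, run_length] pairs."""
    runs = []
    for label in labels:
        if runs and runs[-1][0] == label:
            runs[-1][1] += 1
        else:
            runs.append([label, 1])
    return runs


def run_length_decode(runs):
    """Expand [label, run_length] pairs back into a flat list of labels."""
    out = []
    for label, count in runs:
        out.extend([label] * count)
    return out


# Toy 8x8 segmentation: top half "sky" (0), bottom half "building" (1).
width, height = 8, 8
label_map = [0] * (width * 4) + [1] * (width * 4)

runs = run_length_encode(label_map)

# Raw RGB costs 3 bytes per pixel.
raw_rgb_bytes = width * height * 3
# Assumed cost per run: 1 byte for the label, 2 bytes for the run length.
encoded_bytes = len(runs) * 3
```

Real segmentation maps have far more runs than this toy one, so the ratio here is optimistic, but the label map stays much smaller than the pixels as long as regions are large and contiguous.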
There are even more traditional tricks that don't make it into things like H.265 because they are too costly.
You can pipe these product sketches directly into focus groups who tell you which product is most likely to sell. You don't need a massive staff to come up with product variants any more.
Perhaps what these networks are generating would be better labeled "guided/constrained imitation" rather than real creativity.
What is real creativity? Creativity is just random noise converted into patterns. Is the computer variety of creativity not real enough?
Even though the sketches are fairly crude, with no shading and a low level of detail, many of the generated images look like they could, in fact, be real handbags. They still have the mark of a generated image (e.g. weird mottling) but they're totally recognizable as the thing they're meant to be.
The "sketches to shoes" example, on the other hand, reveals some of the limitations. Most of the sketches use poor perspective, so they wouldn't match up well with edges detected from an actual image of a shoe. Our brains can "get the gist" of the sketches and perform some perspective translation, but the algorithm doesn't appear to perform any translation of the input (e.g. "here's a sketch that appears to represent a shoe, here's what a shoe is actually shaped like, let's fit to that shape before going any further"), so you end up with images where a shoe-like texture is applied to something that doesn't look convincingly like a real shoe.
What I like about the "Day to Night" example is that it clearly demonstrates that these sorts of networks lack common sense. The network puts light where there is clearly (to humans with common sense, at least) nothing that can produce light, e.g. in the middle of a roof or in a tree. Of course, there can be a light there, but it's fairly uncommon.
And the opposite as well: no lights where a human would totally expect a light, e.g. at the front of buildings or on top of, well, lighting poles.
I suspect a neural network better specialized for this task (i.e. that has the data interlaced for both day and nighttime during training) would have no problem feature detecting trees and leaving them unlit.
I see lots of papers that go in this direction, of creating a rich, semantic, predictive representation of images, video and text and then using it as the basis for reinforcement learning. Learning to understand the world and to act based on that understanding.
...
I get a feeling this could be used in game design to do some really cool stuff with map and texture generation.
https://phillipi.github.io/pix2pix/images/index_facades2_los...
Notice the white triangles (image crop artifacts) present in the original image, yet completely absent from the net input image. They reappear in the output of 3 (4 even?) out of 5 nets despite the lack of a corresponding cue in the input image. Looks like the network cheated a bit here, i.e. took advantage of the small training set and memorized the input image as a whole. It then recognized and recalled this very image (already seen during training) rather than actually reconstructing it purely from the input.
Same (but less prominent) for other images where the "ground truth" image was cropped.
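One crude way to probe for this kind of memorization is to look up the nearest training image for each generated output and see whether the match is suspiciously close. The sketch below is only an illustration under stated assumptions: the images are tiny made-up lists of grayscale values, and the helper names are hypothetical, not part of the pix2pix code.

```python
def squared_distance(a, b):
    # Sum of squared per-pixel differences between two flattened images.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_training_image(output, training_images):
    """Return (index, distance) of the training image closest to output."""
    best_index, best_dist = None, float("inf")
    for i, img in enumerate(training_images):
        d = squared_distance(output, img)
        if d < best_dist:
            best_index, best_dist = i, d
    return best_index, best_dist


# Made-up flattened 2x2 grayscale "training targets".
training_images = [
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]

# A made-up network output that is nearly identical to training image 1.
suspicious_output = [0.98, 0.02, 0.97, 0.01]

index, dist = nearest_training_image(suspicious_output, training_images)
```

A very small distance is a hint, not proof. Pixel-space distance is a blunt instrument, and details like the white triangles, which the input gives no cue for, are a stronger tell than any single number.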
Does anyone have any experience in this area?
We've got the pieces of visual processing and imagination here and the pieces of language input/output as part of Google's work. It feels like we just need to make some progress on an "AI executive" before we can get a real, interactive, human-like machine.