https://www.wired.com/2016/04/can-draw-bikes-memory-definite...
[1] https://github.com/CompVis/latent-diffusion.git [2] https://imgur.com/a/Sl8YVD5
gwern can maybe comment here.
An actually scary thing is that AIs are getting okay at reproducing people’s voices.
Music, I'm afraid, appears stuck in the doldrums of small one-offs doing stuff like MIDI. Nothing with the breadth & quality of Jukebox has come out since, even though it's super-obvious that there is a big overhang there and applying diffusion & other new methods would give you something much like DALL-E 2 / Imagen for general music.
In practice, my guess is that even though DALL-E-level performance in music generation would be stunning and incredible, it would also be tiresome and predictable to consume on any extended basis. I mean, that's my reaction to DALL-E: I find the images astonishing and magical but can only look at them for limited periods of time. At these early stages in this new world, the outputs of real individual brains are still more interesting.
But having tools like this to facilitate creation and inspiration by those brains would be so, so cool.
> We show that scaling the pretrained text encoder size is more important than scaling the diffusion model size.
There seems to be an unexpected level of synergy between text and vision models. Can't wait to see what video and audio modalities will add to the mix.
Particularly as you approach the point where the image quality itself is superb and people increasingly turn to attacking the semantics & control of the prompt to degrade the quality ("...The donkey is holding a rope on one end, the octopus is holding onto the other. The donkey holds the rope in its mouth. A cat is jumping over the rope..."). For that sort of thing, it's hard to see how simply beefing up the raw pixel-generating part will help much: if the input seed is incorrect and doesn't correctly encode a thumbnail sketch of how all these animals ought to be engaging in outdoors sports, there's nothing some low-level pixel-munging neurons can do to help much.
I wouldn't be surprised if, given the lack of video and 3D understanding in the image training data, the model fails to grasp things like the fear of heights, and the concept of gravity ends up being learned in the text-processing weights.
https://twitter.com/joeyliaw/status/1528856081476116480?s=21...
That is what I feel personally.
For example, what kind of source images are used for the snake made of corn[0]? It's baffling to me how the corn is mapped to the snake body.
[0] https://gweb-research-imagen.appspot.com/main_gallery_images...
So text -> text representation -> most likely noised image space -> iteratively reduce noise N times -> upsample result
Something like that, please correct anything I'm missing.
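For what it's worth, a minimal PyTorch-style sketch of that cascade (all names here, text_encoder, base_unet, the sr_* models, and denoise_step, are placeholders, not the actual Imagen API):

    import torch

    # Hypothetical stand-ins for the paper's components: a frozen text
    # encoder, a 64x64 base diffusion U-Net, and two text-conditioned
    # super-resolution diffusion models. denoise_step is one DDPM update.
    def generate(prompt, text_encoder, base_unet, sr_64_to_256, sr_256_to_1024,
                 denoise_step, steps=1000):
        emb = text_encoder(prompt)            # text -> text representation
        x = torch.randn(1, 3, 64, 64)         # start from pure noise
        for t in reversed(range(steps)):      # iteratively reduce noise N times
            eps = base_unet(x, t, emb)        # predict the noise at step t
            x = denoise_step(x, eps, t)
        x = sr_64_to_256(x, emb)              # upsample result (64 -> 256)
        x = sr_256_to_1024(x, emb)            # and again (256 -> 1024)
        return x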
Re: the snake corn question, it is mapping the "concept" of corn to the concept of a body as represented by intermediary learned vector representations.
I usually consider myself fairly intelligent, but I know that when I read an AI research paper I'm going to feel dumb real quick. All I managed to extract from the paper was a) there isn't a clear explanation of how it's done that was written for lay people and b) they are concerned about the quality and biases in the training sets.
Having thought about the problem of "building" an artificial means to visualize from thought, I have a very high level (dumb) view of this. Some human minds are capable of generating synthetic images from certain terms. If I say "visualize a GREEN apple sitting on a picnic table with a checkerboard table cloth", many people will create an image that approximately matches the query. They probably also see a red and white checkerboard cloth because that's what most people have trained their models on in the past. By leaving that part out of the query we can "see" biases "in the wild".
Of course there are people that don't do generative in-mind imagery, but almost all of us do build some type of model in real time from our sensor inputs. That visual model is being continuously updated and is what is perceived by the mind "as being seen". Or, as the Gorillaz put it:
… For me I say God, y'all can see me now
'Cos you don't see with your eye
You perceive with your mind
That's the end of it…
To generatively produce strongly accurate imagery from text, a system needs enough reference material in the document collection. It needs to have sampled a lot of images of corn and snakes. It needs to be able to do image segmentation and probably perspective estimation. It needs a lot of semantic representations (optimized queries of words) of what is being seen in a given image, across multiple "viewing models", even from humans (who also created/curated the collections). It needs to be able to "know" what corn looks like, even from the perspective of another model. It needs to know what "shape" a snake model takes and how combining the bitmask of the corn will affect perspective and framing of the final image. All of this information ends up inside the model's network.
Miika Aittala at Nvidia Research has done several presentations on taking a model (imagined as a wireframe) and then mapping a bitmapped image onto it with a convolutional neural network. They have shown generative abilities for making brick walls that look real, for example, from images of a bunch of brick walls and running those on various wireframes.
Maybe Imagen is an example of the next step in this, using diffusion models instead of the CNN for the generator and adding in semantic text mappings while varying the language model's weights (i.e. allowing the language model to more broadly use related semantics when processing what is seen in a generated image). I'm probably wrong about half of that.
Here's my cut on how I saw this working from a few years ago: https://storage.googleapis.com/mitta-public/generate.PNG
Regardless of how it works, it's AMAZING that we are here now. Very exciting!
I mean, from my perspective, the skill in these (and DALL-E's) image reproductions is truly astonishing. Just looking for more information about how the software actually works, even if there are big chunks of it that are "this is beyond your understanding without taking some in-depth courses".
There is a Google Colab workbook that you can try and run for free :)
This is the image-text pair dataset behind it: https://laion.ai/laion-400-open-dataset/
A basic part of it is that neural networks combine learning and memorizing fluidly inside them, and these networks are really really big, so they can memorize stuff good.
So when you see it reproduce a Shiba Inu well, don’t think of it as “the model understands Shiba Inus”. Think of it as making a collage out of some Shiba Inu clip art it found on the internet. You’d do the same if someone asked you to make this image.
It’s certainly impressive that the lighting and blending are as good as they are though.
People tend to really underestimate just how big these models are. Of course these models aren't simply "really really big" MLPs, but the cleverness of the techniques used to build them is only useful at insanely large scale.
I do find these models impressive as examples of "here's what the limit of insane amounts of data, insane amounts of compute can achieve with some matrix multiplication". But at the same time, that's all they are.
What saddens me about the rise of deep neural networks is that it really is the end of the era of true hackers. You can't reproduce this at home. You can't afford to reproduce this one in the cloud with any reasonable amount of funding. If you want to build this stuff, your best bet is to go to a top-tier school, make the right connections, and get hired by a mega-corp.
But the real tragedy here is that the output of this is honestly only interesting if it's the work of some hacker fiddling around in their spare time. A couple of friends hacking in their garage making images of raccoons painting is pretty cool. One of the most powerful, well-funded owners of likely the most compute resources on the planet doing this as their crowning achievement in AI... is depressing.
What I don't understand is how they do the composition. E.g. for "A giant cobra snake on a farm. The snake is made out of corn." I think I could understand how it could reproduce the "A giant cobra snake on a farm" part. What I don't understand is how it accurately pictured "The snake is made out of corn." part, when I'm guessing it has never seen images of snakes made out of corn, and the way it combined "snake" with "made out of corn", in a way that is pretty much how I imagined it would look, is the part I'm baffled by.
Each box you see there has a section in the paper explaining it in more detail.
Some of the reasoning:
>Preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes. Finally, even when we focus generations away from people, our preliminary analysis indicates Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects. We aim to make progress on several of these open challenges and limitations in future work.
Really sad that breakthrough technologies are going to be withheld due to our inability to cope with the results.
We certainly don't want to perpetuate harmful stereotypes. But is it a flaw that the model encodes the world as it really is, statistically, rather than as we would like it to be? By this I mean that there are more light-skinned people in the west than dark, and there are more women nurses than men, which is reflected in the model's training data. If the model only generates images of female nurses, is that a problem to fix, or a correct assessment of the data?
If some particular demographic shows up in 51% of the data but 100% of the model's output shows that one demographic, that does seem like a statistics problem that the model could correct by just picking less likely "next token" predictions.
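As a toy illustration of picking less likely predictions: plain temperature sampling over a categorical output (the two-class head and the 75/25 split below are made up for the example):

    import numpy as np

    def sample(logits, temperature=1.0):
        # temperature > 1 flattens the distribution, so lower-probability
        # options get drawn more often than the raw statistics suggest
        probs = np.exp(logits / temperature)
        probs /= probs.sum()
        return np.random.choice(len(probs), p=probs)

    # Toy "nurse gender" head that is 75/25 in the training data:
    logits = np.log(np.array([0.75, 0.25]))
    draws = [sample(logits, temperature=2.0) for _ in range(10_000)]
    # at temperature 2.0 the split moves toward roughly 63/37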
Also, is it wrong to have localized models? For example, should a model for use in Japan conform to the demographics of Japan, or to that of the world?
If you want the model to understand what a "nurse" actually is, then it shouldn't be associated with female.
If you want the model to understand how the word "nurse" is usually used, without regard for what a "nurse" actually is, then associating it with female is fine.
The issue with a correlative model is that it can easily be self-reinforcing.
For a one-shot generative algorithm you must accept the artist’s biases.
Does a bias towards lighter skin represent reality? I was under the impression that Caucasians are a minority globally.
I read the disclaimer as "the model does NOT represent reality".
Also, getting a random sample of any demographic would be really hard, so no machine learning project is going to do that. Instead you've got a random sample of some arbitrary dataset that's not directly relevant to any particular purpose.
This is, in essence, a design or artistic problem: the Google researchers have some idea of what they want the statistical properties of their image generator to look like. What it does isn't it. So, artistically, the result doesn't meet their standards, and they're going to fix it.
There is no objective, universal, scientifically correct answer about which fictional images to generate. That doesn't mean all art is equally good, or that you should just ship anything without looking at quality along various axes.
I want to be clear here, bias can be introduced at many different points. There's dataset bias, model bias, and training bias. Every model is biased. Every dataset is biased.
Yes, the real world is also biased. But I want to make sure that there are ways to resolve this issue. It is terribly difficult, especially in a DL framework (even more so in a generative model), but it is possible to significantly reduce the real world bias.
Yeah, but you get that same effect on every axis, not just the one you're trying to correct. You might get male nurses, but they have green hair and six fingers, because you're sampling from the tail on all axes.
So even if we managed to create a perfect model of representation and inclusion, people could still use it to generate extremely offensive images with little effort. I think people see that as profoundly dangerous. Restricting the ability to be creative seems to be a new frontier of censorship.
If the model only generated images of female nurses, then it is not representative of the real world, because male nurses exist and they deserve not to be erased. The training data is the proximate cause here, but one wonders what process distorted "most nurses are female" into "nearly all nurse photos are of female nurses": something amplified a real-world imbalance into a dataset that exhibits more bias than the real world, and then training the AI bakes that bias into an algorithm (which may further reinforce the bias in the real world, depending on the use-cases).
>While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes
Tossing that stuff when it comes up in a research environment is one thing, but Google clearly wants to implement this as a product, used all over the world by a huge range of people. If the dataset has problems, and why wouldn't it, it is perfectly rational to want to wait and re-implement it with a better one. DALL-E 2 was trained on a curated dataset so it couldn't generate sex or gore. Others are sanitizing their inputs too and have done for a long time. It is the only thing that makes sense for a company looking to commercialize a research project.
This has nothing to do with "inability to cope" and the implied woke mob yelling about some minor flaw. It's about building a tool that doesn't bake in serious and avoidable problems.
Maybe that's a nice thing; I wouldn't say their values are wrong, but let's call a spade a spade.
For example, Google's image search results pre-tweaking had some interesting thoughts on what constitutes a professional hairstyle, and that searches for "men" and "women" should only return light-skinned people: https://www.theguardian.com/technology/2016/apr/08/does-goog...
Does that reflect reality? No.
(I suspect there are also mostly unstated but very real concerns about these being used as child pornography, revenge porn, "show my ex brutally murdered" etc. generators.)
Chitwan Saharia, William Chan, Saurabh Saxena†, Lala Li†, Jay Whang†, Emily Denton, Seyed Kamyar Seyed Ghasemipour, Burcu Karagol Ayan, S. Sara Mahdavi, Raphael Gontijo Lopes, Tim Salimans, Jonathan Ho†, David Fleet†, Mohammad Norouzi
One example would be if Imagen draws a group of mostly white people when you say "draw a group of people". This doesn't reflect actual reality. Another would be if Imagen draws a group of men when you say "draw a group of doctors".
In these cases where iconographic reality differs from actual reality, hand-tuning could be used to bring it closer to the real world, not just the world as we might wish it to be!
I agree there's a problem here. But I'd state it more as "new technologies are being held to a vastly higher standard than existing ones." Imagine TV studios issuing a moratorium on any new shows that made being white (or rich) seem more normal than it was! The public might rightly expect studios to turn the dials away from the blatant biases of the past, but even if this would be beneficial the progressive and activist public is generations away from expecting a TV studio to not release shows until they're confirmed to be bias-free.
That said, Google's decision to not publish is probably less about the inequities in AI's representation of reality and more about the AI sometimes spitting out drawings that are offensive in the US, like racist caricatures.
Translation: we need to hand-tune this to not reflect reality
Is it reflecting reality, though? Seems to me that (as with any ML stuff, right?) it's reflecting the training corpus.
Furthermore, is it this thing's job to reflect reality?
> the world as we (Caucasian/Asian male American woke upper-middle class San Francisco engineers) wish it to be
Snarky answer: Ah, yes, let's make sure that things like "A giant cobra snake on a farm. The snake is made out of corn" reflect reality.

Heartfelt answer: Yes, there is some of that wishful thinking or editorializing. I don't consider it to be erasing or denying reality. This is a tool that synthesizes unreality. I don't think that such a tool should, say, refuse to synthesize an image of a female POTUS because one hasn't existed yet. This is art, not a reporting tool... and keep in mind that art not only imitates life but also influences it.
At what point is statistical significance considered ok and unbiased?
Presumably when you're significantly predictive of the preferred dogma, rather than reality. There's no small bit of irony in machines inadvertently creating cognitive dissonance of this sort; second order reality check.
I'm fairly sure this never actually played out well in history (bourgeois pseudoscience, Deutsche Physik, etc.), so expect some Chinese research bureau to forge ahead in this particular direction.
T5-XXL looks on par with CLIP so we may not see an open source version of T5 for a bit (LAION is working on reproducing CLIP), but this is all progress.
It is also available via Hugging Face transformers.
However, the paper mentions T5-XXL is 4.6B, which doesn't fit any of the checkpoints above, so I'm confused.
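One plausible explanation (my assumption, not from the paper): the 11B count is the full encoder-decoder, and ~4.6B is the encoder alone, which is the only part Imagen uses. If so, loading just the encoder from Hugging Face would look roughly like:

    from transformers import AutoTokenizer, T5EncoderModel

    tok = AutoTokenizer.from_pretrained("google/t5-v1_1-xxl")
    enc = T5EncoderModel.from_pretrained("google/t5-v1_1-xxl")  # encoder only

    ids = tok("A giant cobra snake on a farm.", return_tensors="pt").input_ids
    emb = enc(input_ids=ids).last_hidden_state  # per-token embeddings for conditioning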
I mean a good example of this is the Pulse[0][1] paper. You may remember it as the white Obama. This became a huge debate and it was pretty easily shown that the largest factor was the dataset bias. This outrage did lead to fixing FFHQ but it also sparked a huge debate with LeCun (data centric bias) and Timnit (model centric bias) at the center. Though Pulse is still remembered for this bias, not for how they responded to it. I should also note that there is human bias in this case as we have a priori knowledge of what the upsampled image should look like (humans are pretty good at this when the small image is already recognizable but this is a difficult metric to mathematically calculate).
It is fairly easy to find adversarial examples, where generative models produce biased results. It is FAR harder to fix these. Since this is known by the community but not by the public (and some community members focus on finding these holes but not fixing them) it creates outrage. Probably best for them to limit their release.
[0] https://arxiv.org/abs/2003.03808
[1] https://cdn.vox-cdn.com/thumbor/MXX-mZqWLQZW8Fdx1ilcFEHR8Wk=...
Very difficult to replicate results.
After that we'll make them sit through Legal's approved D&I video series, then it's off to the races.
Other STEM adjacent communities feel similarly but I don’t get it from actual in person engineers much.
Moreover, the model doing things like exclusively producing white people when asked to create images of people home brewing beer is "biased" but it's a bias that presumably reflects reality (or at least the internet), if not the reality we'd prefer. Bias means more than "spam and crap", in the ML community bias can also simply mean _accurately_ modeling the underlying distribution when reality falls short of the author's hopes.
For example, if you're interested in learning about what home brewing is, the fact that it only shows white people would be at least a little unfortunate, since there is nothing inherently white about it and some home brewers aren't white. But if, instead, you wanted to just generate typical home brewing images, doing anything but would generate conspicuously unrepresentative images.
But even ignoring the part of the biases which are debatable or of application-specific impact, saying something is unfortunate and saying people should be denied access are entirely different things.
I'll happily delete this comment if you can bring to my attention a single person who has suggested that we lose access to the internet because of spam and crap who has also argued that the release of an internet-biased ML model shouldn't be withheld.
Google knows this will be an unlimited money generator so they're keeping a lid on it.
There are two possible ways of interpreting "gender stereotypes in professions".
biased or correct
https://www.abc.net.au/news/2018-05-21/the-most-gendered-top...
https://www.statista.com/statistics/1019841/female-physician...
>Eschew flamebait. Avoid unrelated controversies and generic tangents.
They provided a pretty thorough overview (nearly 500 words) of the multiple reasons why they are showing caution. You picked out the one that happened to bother you the most and have posted a misleading claim that the tech is being withheld entirely because of it.
Genuinely, isn't it a prime example of people actually stopping to think if they should, instead of being preoccupied with whether or not they could?
Indeed it is. Consider this an early, toy version of the political struggle related to ownership of AI-scientists and AI-engineers of the near future. That is, generally capable models.
I do think the public should have access to this technology, given so much is at stake. Or at least the scientists should be completely, 24/7, open about their R&D. Every prompt that goes into these models should be visible to everyone.
It's often not worth it to decentralize the computation of the trained model, but it's not hard to get donated cycles and groups are working on it. Don't fret because Google isn't releasing the API/code. They released the paper and that's all you need.
Dall-E had an entire news cycle (on tech-minded publications, that is) that showcased just how amazing it was.
Millions* of people became aware that technology like Dall-E exists, before anyone could get their hands on it and abuse it. (*a guesstimate, but surely a close one)
One day soon, inevitably, everyone will have access to something 10x better than Imagen and Dall-E. So at least the public is slowly getting acclimated to it before the inevitable "theater-goers running from a projected image of a train approaching the camera" moment
AI was expected to grow like a child: somehow blurting out things that would show some increasing understanding on a deep level, but with poor syntax.
In fact we get the exact opposite. AI is creating texts that are syntactically correct and very decently articulated, and pictures that are insanely good.
And these texts and images are created from a text prompt?! There is no way to interface with the model other than by freeform text. That is so weird to me.
Yet it doesn’t feel intelligent at all at first. You can’t ask it to draw “a chess game with a puzzle where white mates in 4 moves”.
Yet sometimes GPT makes very surprising inferences. And it starts to feel like there is something going on a deeper level.
DeepMind’s AlphaXxx models are more in line with how I expected things to go. Software that gets good at expert tasks that we as humans are too limited to handle.
Where it’s headed, we don’t know. But I bet it’s going to be difficult to tell the “intelligence” from the “varnish”
Meanwhile, Nvidia sees no problem with yeeting StyleGAN and models that allow real humans to be realistically turned into animated puppets in 3D space. The inevitable end result of these scientific achievements will be orders of magnitude worse than deepfakes.
Oh, or a panda wearing sunglasses, in the desert, digital art.
It’s an old fear for sure but it seems to be getting closer and closer every day, and yet most of the discussion around these things seems to be variations of “isn’t this cool?”
Without a fairly deep grounding in this stuff it’s hard to appreciate how far ahead Brain and DM are.
Neither OpenAI nor FAIR ever has the top score on anything unless Google delays publication. And short of FAIR? D2 lacrosse. There are exceptions to such a brash generalization, NVIDIA's group comes to mind, but it's a very good rule of thumb. Or bet your whole face on it the next time you are tempted to doze behind the wheel of a Tesla.
There are two big reasons for this:
- the talent wants to work with the other talent, and through a combination of foresight and deep pockets Google got that exponent on their side right around the time NVIDIA cards started breaking ImageNet. Winning the Hinton bidding war clinched it.
- the current approach of “how many Falcon Heavy launches worth of TPU can I throw at the same basic masked attention with residual feedback and a cute Fourier coloring” inherently favors deep pockets, and obviously MSFT, sorry OpenAI has that, but deep pockets also non-linearly scale outcomes when you’ve got in-house hardware for multiply-mixed precision.
Now clearly we’re nowhere close to Maxwell’s Demon on this stuff, and sooner or later some bright spark is going to break the logjam of needing 10-100MM in compute to squeeze a few points out of a language benchmark. But the incentives are weird here: who, exactly, does it serve for us plebs to be able to train these things from scratch?
Google clearly demonstrates their unrivaled capability to leverage massive quantities of data and compute, but it’s premature to declare that they’ve secured victory in the AI Wars.
And I don't think whatever iteration of PaLM was cooking at the time GPT-3 started getting press would have looked too shabby.
I think Google crushed OpenAI on both GPT and DALL-E in short order because OpenAI published twice and someone had had enough.
This is ... very incorrect. I am very certain (95%+) that Google had nothing even close to GPT-3 at the time of its release. It's been 2 full years since GPT-3 was released, and even longer since OpenAI actually trained it.
That's not to talk about any of the other things OpenAI/FAIR has released that were SOTA at the time of release (Dall-E 1, JukeBox, Poker, Diplomacy, Codex).
Google Brain and Deepmind have done a lot of great work, but to imply that they essentially have a monopoly on SOTA results and all SOTA results other labs have achieved are just due to Google delaying publication is ridiculous.
I did a bit of disclaimer on my original post but not enough to withstand detailed scrutiny. This is sort of the trouble with trying to talk about cutting-edge research in what amounts to a tweet: what’s the right amount of oversimplified, emphatic statement to add legitimate insight but not overstep into being just full of shit.
I obviously don’t know that publication schedules at heavy-duty learning shops are deliberate and factor-in other publications. The only one I know anything concretely about is FAIR and even that’s badly dated knowledge.
I was trying to squeeze into a few hundred characters my very strong belief that Brain and DM haven’t let themselves be scooped since ResNet, based on my even stronger belief that no one has the muscle to do it.
To the extent that my oversimplification detracted from the conversation I regret that.
But if you’re interested I’m happy to (attempt) answers to anything that was jargon: by virtue of HN my answers will be peer-reviewed in real time, and with only modest luck, a true expert might chime in.
It's Boltzmann and Szilard who did the original "kT" stuff around the underlying thermodynamics governing energy dissipation in these scenarios, and Rolf Landauer who did the really interesting work on applying that thermo to lower bounds on energy expenditure in a given computation.
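For concreteness, the Landauer bound on erasing a single bit at temperature T, with the standard constants plugged in at room temperature:

    E_{\min} = k_B T \ln 2 \approx (1.38 \times 10^{-23}\ \mathrm{J/K})(300\ \mathrm{K})(\ln 2) \approx 2.9 \times 10^{-21}\ \mathrm{J}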
I said Maxwell’s Demon because it’s the best known example of a deep connection between useful work and computation. But it was sloppy.
But in general it is likely due more to the fact that it's going to happen anyway: if we can share our approaches and research findings, we'll just achieve it sooner.
I’ve got no interest in moralizing on this, but if any of the big actors wanted to they could put a meaningful if not overwhelming subset of the corpus on S3, put the source code on GitHub, and you could on a modest budget see an epoch or 3.
I’m not holding my breath.
I'm not sure it matters. The history of computing shows that within the decade we will all have the ability to train and use these models.
For example: the high-frequency trading industry is estimated to have made somewhere between 2-3 billion dollars in all of 2020, profit/earnings. That’s a good weekend at Google.
HFT shops pay well, but not much different to top performers at FAANG.
People work in HFT because without taking a pay cut they can play real ball: they want to try themselves against the best.
Heavy learning people are no different in wanting both a competitive TC but maybe even more to be where the action is.
That’s currently Blade Runner Industries Ltd, but that could change.
I can see the future as being devoid of any humanity.
I guess the concern would be: If one of these recipe websites _was_ generated by an AI, the ingredients _look_ correct to an AI but are otherwise wrong - then what do you do? Baking soda swapped with baking powder. Tablespoons instead of teaspoons. Add 2tbsp of flour to the caramel macchiato. Whoops! Meant sugar.
As AI advances, a lot of people will seek out experiences outside the digital world.
Even digital communication will not be trustworthy anymore with deepfakes and everything else, so people will want to get together more often.
Edit: for the lazy ones, yeah, digital will be a sad and heartless environment...
Considering how many of the readers of said blog will be scrapers and bots, who will use the results to generate more spammy "content", I think you are right.
I can see a past where this already happened, to paraphrase Douglas Adams ;)
Unless you assume there are bad actors who will crop out the tags. Not many people now have access to Dall-E2 or will have access to Imagen.
As someone working in Vision, I am also thinking about whether to include such images deliberately. Using image augmentation techniques is ubiquitous in the field. Thus we introduce many examples for training the model that are not in the distribution over input images. They improve model generality by huge margins. Whether generated images improve generality of future models is a thing to try.
Damn I just got an idea for a paper writing this comment.
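(For context, by "image augmentation" I mean the standard stack below, with torchvision as a stand-in; every epoch it feeds the model perturbed variants that never existed in the raw data.)

    from torchvision import transforms

    # Standard augmentation pipeline: crops, flips, and color shifts
    # produce training images that are out of the raw data distribution.
    augment = transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
        transforms.ToTensor(),
    ])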
The irony is that if you had a great discriminator to separate the wheat from the chaff, that it would probably make its way into the next model and would no longer be useful.
My only recommendation is that OpenAI et al should be tagging metadata for all generated images as synthetic. That would be a really interesting tag for media file formats (would be much better native than metadata though) and probably useful across a lot of domains.
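A low-tech sketch of what that tagging could look like today, using PNG text chunks via Pillow (the tag names are made up; a real standard would need to survive re-encoding and cropping):

    from PIL import Image
    from PIL.PngImagePlugin import PngInfo

    img = Image.open("generated.png")
    meta = PngInfo()
    meta.add_text("synthetic", "true")            # hypothetical tag name
    meta.add_text("generator", "some-model-v1")   # hypothetical identifier
    img.save("generated_tagged.png", pnginfo=meta)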
Neal Stephenson covered this briefly in "Fall; or, Dodge in Hell." So much 'net content was garbage, AI-generated, and/or spam that it could only be consumed via "editors" (either AI or AI+human, depending on your income level) that separated the interesting sliver of content from...everything else.
A bit far out there in terms of plot but the notion of authenticating based on a multitude of factors and fingerprints is not that strange. We've already started doing that. It's just that we currently still consume a lot of unsigned content from all sorts of unreliable/untrustworthy sources.
Fake news stops being a thing as soon as you stop doing that. Having people sign off on and vouch for content needs to start becoming a thing. I might see Joe Biden saying stuff in a video on Youtube. But how do I know if that's real or not?
With deep fakes already happening, that's no longer an academic question. The answer is that you can't know. Unless people sign the content. Like Joe Biden, any journalists involved, etc. You might still not know 100% it is real but you can know whether relevant people signed off on it or not and then simply ignore any unsigned content from non reputable sources. Reputations are something we can track using signatures, blockchains, and other solutions.
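A bare-bones sketch of that sign-and-verify flow (Ed25519 via the "cryptography" package; key distribution and identity, the genuinely hard parts, are skipped here):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

    key = Ed25519PrivateKey.generate()
    pub = key.public_key()

    video = open("speech.mp4", "rb").read()
    signature = key.sign(video)        # publisher signs the content

    try:
        pub.verify(signature, video)   # anyone can check against the public key
        print("content matches the signer")
    except InvalidSignature:
        print("content altered or not signed by this key")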
Interesting with Neal Stephenson that he presents a problem and a possible solution in that book.
If the AI models can't consume it, it can't be commoditised and, well, ruined.
I think you’re right, and it’s unlikely that we (society) will convince people to label their AI content as such so that scraping is still feasible.
It’s far more likely that companies will be formed to provide “pristine training sets of human-created content”, and quite likely they will be subscription based.
well, we do have organic/farmed/handcrafted/etc. food. One can imagine information nutrition label - "contains 70% AI generated content, triggers 25% of the daily dopamine release target".
I think this will introduce unavoidable background noise that will be super hard to fully eliminate in future large-scale data sets scraped from the web. There are always going to be more and more photorealistic pictures of "cats", "chairs", etc. in the data that are close to looking real but not quite, and we can never really go back to a world where there's only "real" pictures, or "authentic human art", on the internet.
Less common opinion: this is also how you end up with models that understand the concept of themselves, which has high economic value.
Even less common opinion: that's really dangerous.
[0] https://creativecloud.adobe.com/discover/article/how-to-use-...
Cheap books, cheap TV and cheap music will be generated.
Good lord we are screwed. And yet somehow I bet even this isn't going to kill off the "they're just statistical interpolators" meme.
[1] https://www.deepmind.com/blog/tackling-multiple-tasks-with-a...
I think it’s in everyone’s benefit if we start planning for a world where a significant portion of the experts are stubbornly wrong about AGI. As a technology, generally intelligent ML has the potential to change so many aspects of our world. The dangers of dismissing the possibility of AGI emerging in the next 5-10 years are huge.
Again, I think we should consider "The Human Alignment Problem" more in this context. The transformers in question are large, heavy and not really prone to "recursive self-improvement".
If the ML-AGI works out in a few years, who gets to enter the prompts?
They’re all fundamentally anthropocentric: people argue until they are blue in the face about what “intelligent” means but it’s always implicit that what they really mean is “how much like me is this other thing”.
Language models, even more so than the vision models that got them funded, have empirically demonstrated that knowing the probability of two things being adjacent in some latent space is at the boundary indistinguishable from creating and understanding language.
I think the burden is on the bright hominids with both a reflexive language model and a sex drive to explain their pre-Copernican, unique place in the theory of computation rather than vice versa.
A lot of these problems just aren’t problems anymore if performance on tasks supersedes “consciousness” as the thing we’re studying.
All of these models seem to require a human to evaluate and edit the results. Even Co-Pilot. In theory this will reduce the number of human hours required to write text or create images. But I haven't seen anyone doing that successfully at scale or solving the associated problems yet.
I'm pessimistic about the current state of AI research. It seems like it's been more of the same for many years now.
For image generation, it's obviously all fiction. Which is fine and mostly harmless if you know what you're getting. It's going to leak out onto the Internet, though, and there will be photos that get passed around as real.
For text, it's all fiction too, but this isn't obvious to everyone because sometimes it's based on true facts. There's often not going to be an obvious place where the facts stop and the fiction starts.
The raw Internet is going to turn into a mountain of this stuff. Authenticating information is going to become a lot more important.
Still amazing that we're at a point where that's the case, they're both incredible developments.
I believe this type of content generation will be the next big thing or at least one of them. But people will want some customization to make their pictures “unique” and fix AI’s lack of creativity and other various shortcomings. Plus edit out the remaining lapses in logic/object separation (which there are some even in the given examples).
Still, being able to create arbitrary stock photos is really useful and I bet these will flood small / low-budget projects.
Don't like any of the results from the real web? Well how about these we created just for you.
If Getty et al aren't already spending money on that possibility, they probably should be.
(Consumer demand and boredom both being infinite is another thing working against it.)
I would expect AI development to follow a similar path to digital media generally, as its following the increasing difficulty and space requirements of digitally representing said media: text < basic sounds < images < advanced audio < video.
What’s more impressive to me is how far ahead text-to-speech is, but I think the explanation is straightforward (the accessibility value has motivated us to work on that for a lot longer).
"A photo of a Shiba Inu dog Wearing a (sic) sunglasses And black leather jacket Playing guitar In a garden"
The Shiba Inu is not playing a guitar.
They have an example “horse riding an astronaut” that no model produces a correct image for. It’d be interesting if models could explain themselves or print the caption they understand you as saying.
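One indirect way to get at "the caption the model understood" is to score the generated image against both readings with CLIP (sketch below using the public Hugging Face checkpoint; this measures alignment after the fact, it doesn't expose the generator's internal parse):

    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    captions = ["a horse riding an astronaut", "an astronaut riding a horse"]
    inputs = proc(text=captions, images=Image.open("sample.png"),
                  return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    # if most of the probability mass lands on the second caption, the
    # generator effectively parsed the prompt with the roles swapped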
“In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access.”
I work for a big org myself, and I’ve wondered what it is exactly that makes people in big orgs so bad at saying things.
You can tell me those pictures are generated by an AI and I might believe it, but until real people can actually test it... it's easy enough to fake. This page isn't even the remotest bit legit by the URL. It looks nicely put together and that's about it. Could have easily put this together with a graphic designer to fake it.
Let's be clear, I'm not actually saying it's fake. Just that all of these new "cool" things are more or less theoretical if nothing is getting released.
For example, corporate graphics design, logos, brand photography, etc.
I really do think inference time is a red herring for the first generation of these models.
Sure, the more transformative use-cases, like real-time content generation to replace movies/games, will demand fast inference, but there is a lot of value to be created prior to that point.
What I see is a semi-poverty mindset among very smart people who appear to be treated in a way such that the winners get promoted and everyone else is fired. This sort of analysis with ML is useful for massive data sets at scale, where 90% is a lot of accuracy, but not at all for the small sets of real-world, human-scale problems where each result may matter a lot. The years of training that these researchers had to go through to participate in this apparently ruthless environment are certainly like a lottery ticket, if you are in fact in a game where everyone but the winner has to find a new line of work. I think their masters live in Redmond, if I recall.. not looking it up at the moment.
Nothing in a Transformer's perplexity in predicting the next token tells you that at some point it suddenly starts being able to write flawless literary style parodies, and this is why the computer art people become virtuosos of CLIP variants and are excited by new ones, because each one attacks concepts in slightly different ways and a 'small' benchmark increase may unlock some awesome new visual flourish that the model didn't get before.
Sure, it's only 2%, but if it's on a problem where everyone else has been trying to make that improvement for a long time, and that improvement means big economic or social gains, then it's worth it.
> The potential risks of misuse raise concerns regarding responsible open-sourcing of code and demos. At this time we have decided not to release code or a public demo. In future work we will explore a framework for responsible externalization that balances the value of external auditing with the risks of unrestricted open-access.
I can see the argument here. It would be super fun to test this model's ability to generate arbitrary images, but "arbitrary" also contains space for a lot of distasteful stuff. Add in this point:
> While a subset of our training data was filtered to removed noise and undesirable content, such as pornographic imagery and toxic language, we also utilized LAION-400M dataset which is known to contain a wide range of inappropriate content including pornographic imagery, racist slurs, and harmful social stereotypes. Imagen relies on text encoders trained on uncurated web-scale data, and thus inherits the social biases and limitations of large language models. As such, there is a risk that Imagen has encoded harmful stereotypes and representations, which guides our decision to not release Imagen for public use without further safeguards in place.
That said, I hope they're serious about the "framework for responsible externalization" part, both because it would be really fun to play with this model and because it would be interesting to test it outside of their hand-picked examples.
Oh wait.
Google: "it's too dangerous to release to the public"
OpenAI: "we are committed to open source AGI but this model is too dangerous to release to the public"
I don't think they would host this for fun then.
An impressive advance would be a small model that’s capable of working from an external memory rather than memorizing it.
Oh well.
Lying about ethics or misattributing their actions to some misguided sense of "social" responsibility puts google in a far worse light in my eyes. I can't help but wonder how many skilled employees were driven off from accepting a position at google because of lies like these.
That said, you can download Dream by Wombo from the app store and it is one of the top smartphone apps, even though it is a few generations behind state of the art.
Actually, I think they made InstructGPT even better at erotica because it’s trained to be “helpful and friendly”, so in other words they made it a sub.
Google is not a hobby project anymore: "don't be evil" or whatever they wittered on about back in the day.
There's mountains of ai-generated inauthentic content that companies (including Google) have to filter out of their services. This content is used for spam, click farms, scamming, and even state propaganda operations. GPT-2 made this problem orders of magnitude worse than it used to be, and each iteration makes it harder to filter.
The industry term is (generally) "Coordinated Inauthentic Behavior" (though this includes uses of actual human content). I think Smarter Every Day did a good video (series?) on the topic, and there are plenty of articles if you prefer that.
“Oh our tech is so dangerous and amazing it could turn the world upside down” yet we hand it to random Bluechecks on Twitter.
It’s just marketing
Someone tried to say there were ethics committees etc the other day...what a bad joke. Who checks the ethics committee is making ethical decisions?
I was told I "didn't know what" I was talking about; an excuse from some over-important know-it-all who didn't know what ethics was, i.e. they don't know what they are talking about.
Hooray! Non-cherry-picked samples should be the norm.
I would love it.
Of course, working in a golden lab at Google may twist your views on society.
Their slider with examples at the top showed a prompt along the lines of "a chrome plated duck with a golden beak confronting a turtle in a forest" and the resulting image was perfect - except the turtle had a golden shell.
Almost there, the Apple Laserwriter nailed it at 300 dpi.
Sometimes sneaked an issue of the "SF-Lovers Digest" in between code printouts.
The kind of early-2010s, over-the-top description of something that's ridiculous.
To the extent that they get used for making bored ape images or whatever meme du jour, it says much more about the kind of pictures people want to see.
I personally find the weird deep dreaming dogs with spikes coming out of their heads more mathematically interesting, but I can understand why that doesn’t sell as well.
Print me a raccoon in a leather jacket riding a skateboard.
Unrelated to the main topic, but this is exactly why I think cryptocurrencies will only be used for illegal activities, or things you may want to hide, and nothing else. Because that's where it has found its use case: in porn.
You gave an example of a still image, but it's going to end up with an AI generating a full video according to a detailed text prompt. The porn industry is going to be utterly destroyed.
But I have not tried making generative models with out-of-distribution data before, i.e. distributions other than the main training data.
There are several indie attempts that I am aware of. Mentioning them to the reply of this comment. (In case the comment gets deleted)
The first layers should be general. But the later layers likely won't behave well on porn images, as they are more specialist layers learning distribution-specific visual patterns.
Transfer learning is possible.
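A minimal sketch of that idea, freezing the general early layers and retraining the specialist later ones (torchvision ResNet as a stand-in backbone, 10 classes as a made-up target):

    import torch.nn as nn
    from torchvision import models

    model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)

    # freeze the early, general-purpose layers...
    for p in model.parameters():
        p.requires_grad = False

    # ...then retrain only the later, distribution-specific ones
    for p in model.layer4.parameters():
        p.requires_grad = True
    model.fc = nn.Linear(model.fc.in_features, 10)  # new head for the new data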
https://www.metaculus.com/questions/3479/date-weakly-general...
This will result in mass social unrest.
Should ML/AI deliver on the wildest promises, it will be like a SpaceX Starship for the mind.
It's still an unruly 7-year-old at best. Results need to be verified. Prompt engineering and a sense of creativity are core competencies.
- If you made that picture with actors or in MS Paint, politics boomers on Facebook wouldn’t care either way. They’d just start claiming it’s real if they like the message.
# whois appspot.com
[Querying whois.verisign-grs.com]
[Redirected to whois.markmonitor.com]
[Querying whois.markmonitor.com]
[whois.markmonitor.com]
Domain Name: appspot.com
Registry Domain ID: 145702338_DOMAIN_COM-VRSN
Registrar WHOIS Server: whois.markmonitor.com
Registrar URL: http://www.markmonitor.com
Updated Date: 2022-02-06T09:29:56+0000
Creation Date: 2005-03-10T02:27:55+0000
Registrar Registration Expiration Date: 2023-03-10T00:00:00+0000
Registrar: MarkMonitor, Inc.
Registrar IANA ID: 292
Registrar Abuse Contact Email: abusecomplaints@markmonitor.com
Registrar Abuse Contact Phone: +1.2086851750
Domain Status: clientUpdateProhibited (https://www.icann.org/epp#clientUpdateProhibited)
Domain Status: clientTransferProhibited (https://www.icann.org/epp#clientTransferProhibited)
Domain Status: clientDeleteProhibited (https://www.icann.org/epp#clientDeleteProhibited)
Domain Status: serverUpdateProhibited (https://www.icann.org/epp#serverUpdateProhibited)
Domain Status: serverTransferProhibited (https://www.icann.org/epp#serverTransferProhibited)
Domain Status: serverDeleteProhibited (https://www.icann.org/epp#serverDeleteProhibited)
Registrant Organization: Google LLC
Registrant State/Province: CA
Registrant Country: US
Registrant Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
Admin Organization: Google LLC
Admin State/Province: CA
Admin Country: US
Admin Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
Tech Organization: Google LLC
Tech State/Province: CA
Tech Country: US
Tech Email: Select Request Email Form at https://domains.markmonitor.com/whois/appspot.com
Name Server: ns4.google.com
Name Server: ns3.google.com
Name Server: ns2.google.com
Name Server: ns1.google.com