In other text-to-image algorithms I'm familiar with (the ones you'll typically see passed around as colab notebooks that people post outputs from on Twitter), the basic idea is to encode the text, and then try to make an image that maximally matches that text encoding. But this maximization often leads to artifacts - if you ask for an image of a sunset, you'll often get multiple suns, because that's even more sunset-like. There are a lot of tricks and hacks to regularize the process so that it's not so aggressive, but it's always an uphill battle.
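The core loop of those notebooks is roughly this (a toy sketch with a made-up stand-in for the similarity model, not any particular notebook's code):

```python
import numpy as np

def clip_similarity(image, text_embedding):
    # Toy stand-in for CLIP: mean-pool the image into a 3-vector
    # "embedding" and compare it to the text embedding via dot product.
    return float(image.mean(axis=(0, 1)) @ text_embedding)

def maximize_image(image, text_embedding, steps=50, lr=0.5):
    # Gradient ascent on the similarity score. For this linear toy
    # model the gradient at every pixel is just the text embedding
    # (scaled by the mean-pooling); a real notebook backprops through
    # CLIP instead.
    grad = np.broadcast_to(text_embedding, image.shape) / image[..., 0].size
    for _ in range(steps):
        image = image + lr * grad
    return image

sunset = np.array([1.0, 0.5, -0.5])  # hypothetical "sunset" embedding
start = np.random.default_rng(0).normal(size=(8, 8, 3))
result = maximize_image(start, sunset)
```

Nothing in the loop rewards plausibility, only similarity, which is why the optimizer happily adds a second sun if that raises the score.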
Here, they instead take the text embedding, use a trained model (what they call the 'prior') to predict the corresponding image embedding - this removes the dangerous maximization. Then, another trained model (the 'decoder') produces images from the predicted embedding.
This feels like a much more sensible approach, but one that is only really possible with access to the giant CLIP dataset and computational resources that OpenAI has.
But there's no real rhyme or reason, it is a sort of alchemy.
Is text encoding strictly worse or is it an artifact of the implementation? And if it is strictly worse, which is probably the case, why specifically? What is actually going on here?
I can't argue that their results are not visually pleasing. But I'm not sure what one can really infer from all of this once the excitement washes over you.
Blending photos together in a scene in photoshop is not a difficult task. It is nuanced and tedious but not hard, any pixel slinger will tell you.
An app that accepts a smattering of photos and stitches them together nicely can be coded up any number of ways. This is a fantastic and time saving photoshop plugin.
But what do we have really?
"Koala dunking basketball" needs to "understand" the separate items and select hoops and a koala from the image library where the angles and shadows roughly match.
Very interesting, potentially useful. But if it doesn't spit out exactly what you want, you can't edit it further.
I think the next step has got to be that it conjures up a 3d scene in Unreal or blender so you can zoom in and around convincingly for further tweaks. Not a flat image.
I'm not sure if I'm speaking clearly; I just don't understand what the difference is between training "text encoding to an image" vs. "text embedding to image embedding". In both cases you have some kind of "sunset" on the left (even though it's obviously just a dot in a multi-dimensional space, not the letters), and you try to maximize it when training the model to get either an image embedding or an image straight away.
I think that using something like this for porn could potentially offer the biggest benefit to society. So much has been said about how this industry exploits young and vulnerable models. Cheap autogenerated images (and in the future videos) would pretty much remove the demand for human models and eliminate the related suffering, no?
EDIT: typo
It's almost impossible to even give an affirmative answer to that question without making yourself a target. And as much as I err on the side of creator freedom, I find myself shying away from saying yes without qualifications.
And if you don't allow cp, then by definition you require some censoring. At that point it's just a matter of where you censor, not whether. OpenAI has gone as far as possible on the censorship, reducing the impact of the model to "something that can make people smile." But it's sort of hard to blame them, if they want to focus on making models rather than fighting political battles.
One could imagine a cyberpunk future where seedy AI cp images are swapped in an AR universe, generated by models ran by underground hackers that scrounge together what resources they can to power the behemoth models that they stole via hacks. Probably worth a short story at least.
You could make the argument that we have fine laws around porn right now, and that we should simply follow those. But it's not clear that AI generated imagery can be illegal at all. The question will only become more pressing with time, and society has to solve it before it can address the holistic concerns you point out.
OpenAI ain't gonna fight that fight, so it's up to EleutherAI or someone else. But whoever fights it in the affirmative will probably be vilified, so it'd require an impressive level of selflessness.
That doesn't mean that it's all bad, and that there's no recreational use for it. We have limits on the availability of various other artificial stimulants. We should continue to have limits on the availability of porn. Where to draw that line is a real debate.
[1] https://en.wikipedia.org/wiki/Wirehead_(science_fiction)
This author's books are great at putting these sorts of moral ideas to the test in a sci-fi context. This specific tome portrays virtual wars and virtual "hells", the hope being that this is more civilized than waging real war or torturing real living entities. However some protagonists argue that virtual life is indistinguishable from real life, and so sacrificing virtual entities to save "real" ones is a fallacy.
Or some such, it's been a while.
If people are exposed to stimuli, they will pursue increasingly stimulating versions of it. I.e., if they see artificial CP, they will often begin to become desensitized (habituated) and pursue real CP or even live children thereafter.
Conversely, if people are not exposed to certain stimuli, they will never be able to conceptualize them, and thus will be unable to think about them.
Obviously you cannot eliminate all CP but minimizing the overall levels of exposure / ease of access to these kinds of things is way more appropriate than maximizing it.
I can get why the people who worked hard on it and spent money building it don't want to be associated with porn.
I'm not saying that AI will pass all Turing tests. But it doesn't need to, as far as having a virtual girlfriend/prostitute goes.
* I recommend reading the Risks and Limitations section that came with it because it's very thorough: https://github.com/openai/dalle-2-preview/blob/main/system-c...
* Unlike GPT-3, my read of this announcement is that OpenAI does not intend to commercialize it, and that access to the waitlist is indeed more for testing its limits (and as noted, commercializing it would make it much more likely to lead to interesting legal precedent). Per the docs, access is very explicitly limited: (https://github.com/openai/dalle-2-preview/blob/main/system-c... )
* A few months ago, OpenAI released GLIDE ( https://github.com/openai/glide-text2im ) which uses a similar approach to AI image generation, but suspiciously never received a fun blog post like this one. The reason for that in retrospect may be "because we made it obsolete."
* The images in the announcement are still cherry-picked, which is presumably why they also compared DALL-E 1 vs. DALL-E 2 on non-cherry-picked images.
* Cherry-picking is relevant because AI image generation is still slow unless you do real shenanigans that likely compromise image quality, although OpenAI likely has better infra for handling large models, as they have demonstrated with GPT-3.
* It appears DALL-E 2 has a fun endpoint that links back to the site for examples with attribution: https://labs.openai.com/s/Zq9SB6vyUid9FGcoJ8slucTu
Maybe give it another five years, a few more $billion and a few more petabytes/flops and it will be good. Then finally everyone can generate art for their own Magic: the Gathering cards.
(That's the end goal, right?)
An example off the top of my head: this could be used as advertising or recruitment for controversial organizations or causes. Would it be wrong for the USA to use this for military recruitment? Israel? Ukraine? Russia?
Another example: this could be used to glorify and reinforce actions which our society does not consider immoral but other societies - or our own future society - will. It wasn't long ago that the US and Europe did a full 180 on their treatment of homosexuality. Will we eventually change our minds about eating meat, driving cars, etc?
Have they gone too far in a desperate bid to prevent the AI from being capable of harm? Have they not gone far enough? I don't know. If I were that worried about something being misused, I don't think I could ever bring myself to work on it in the first place. But I suppose the onward march of technology is inevitable.
GLID-3: https://colab.research.google.com/drive/1x4p2PokZ3XznBn35Q5B...
and a new Latent Diffusion notebook: https://colab.research.google.com/github/multimodalart/laten...
have both appeared recently and are getting remarkably close to the original Dall-E (maybe better as I can't test the real thing...)
So - this was pretty good timing if OpenAI want to appear to be ahead of the pack. Of course I'd always pick a model I can actually use over a better one I'm not allowed to...
glid-3 is a relatively small model trained by a single guy on his workstation (aka me) so it's not going to be as good. It's also not fully baked yet so ymmv, although it really depends on the prompt. The new latent diffusion model is really amazing though and is much closer to DALLE-2 for 256px images.
I think the open source community will rapidly catch up with Openai in the coming months. The data, code and compute are all there to train a model of similar size and quality.
OpenAI has a low resolution checkpoint for similar functionality as this - called GLIDE - and the output is super boring compared to community driven efforts, in large part because of similar dataset restrictions as this likely has been subjected to.
I don't see a run button?
Oh.. maybe "Runtime -> Run All" from the menu ...
Shows me a spinning circle around "Download model" ...
26% ...
Fascinating, that Google offers you a computer in the cloud for free ..
Now it is running the model. Wow, I'm curious ..
Ha, it worked!
Nothing compared to the images in the Dall-E 2 article but still impressive.
For the first category, Dall-E 2 and Codex are promising but not there yet. It's not clear how long it'll take them to reach the point where you no longer need people. I'm guessing 2-4 years but the last bits can be the hardest.
As for the second category, we are not there yet. Self-driving cars/planes, and lots of other automation, will be here and mature way before an AI can read and communicate through emails, understand project scope and then execute. Also lots of harmonization will have to take place in the information we exchange: emails, docs, chats, code, etc... That is, unless the AI is able to open a browser and type an address itself.
It's important to note that we still need professionals to guarantee the quality of the output from AIs, including this one. As noted in their issue tracker, DALL-E has very specific limitations, but these can be easily solved by employing dedicated professionals, who are trained to tame the AI and properly finish the raw output.
So, if I were running OpenAI, I'd clearly be experimenting with how their AIs and humans interact, and building a training program around it for producing practical outputs. (Actually, I work in consumer robotics, and human adoption has been the biggest hurdle there. Thus, my claim here.)
--
In the case of fine art, though, I don't think they'll get hit by this AI advancement. The biggest problem is that you simply can't get the exact image you want with this AI. Even humans cannot transfer visual information in verbal form without a significant loss of detail, and thus a loss of quality. It's the same with AI, but worse, because the AI relies on the bias in a specific set of training data, and it never truly understands the human context in it (at the current level of technology).
Additionally, the rise of no-code development is just extending the functionality of designers. I didn't take design seriously (as a career choice) growing up because I didn't see a future in it, now it pays my bills and the demand for my services just grows by the day.
Similar argument to make with chess AI: it didn't make chess players obsolete, it made them stronger than ever.
[0] Which is why releasing your code is so beneficial.
Using something like this could really help automate or at least kickstart the more mundane parts of content creation. (At least when you are using high resolution, true color imagery.)
There are some 3D image generation techniques, but they aren't based on polygonal modeling, so 3D artists are safe for now.
Preventing Harmful Generations

We’ve limited the ability for DALL·E 2 to generate violent, hate, or adult images. By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts. We also used advanced techniques to prevent photorealistic generations of real individuals’ faces, including those of public figures.
"And we've also closed off a huge range of potentially interesting work as a result"

I can't help but feel a lot of the safeguarding is more about preventing bad PR than anything. I wish I could have a version with the training wheels taken off. And there's enough other models out there without restriction that the stories about "misuse of AI" will still circulate.
(side note - I've been on HN for years and I still can't figure out how to format text as a quote.)
It's their service, their call.
I have some hobby projects, almost nobody uses them, but you bet I'll shut stuff down if I felt something bad was happening, being used to harass someone, etc. NOT "because bad PR" but because I genuinely don't want to be a part of that.
If you want some images / art made, don't expect someone else to make them for you. Get your own art supplies and get to work.
It makes me wonder what they're planning to do with this? If they're deliberately restricting the training data, it means their goal isn't to make the best AI they possibly can. They probably have some commercial applications in mind where violent/hateful/adult content wouldn't be beneficial. Children's books? Stock photos? Mainstream entertainment is definitely out. I could see a tool like this being useful during pre-production of films and games, but an AI that can't generate violent/adult content wouldn't be all that useful in those industries.
I don't think there is a way comparable to markdown, since the formatting options are limited: https://news.ycombinator.com/formatdoc
So your options are literal quotes, "code" formatting like you've done, italics like I've done, or the '>' convention, but that doesn't actually apply formatting. Would be nice if it were added.
This is exactly the sort of thing that gets a company mired in legal issues, vilified in the media, and shut down. I can not blame them for avoiding that potential minefield.
But at least we can get another billion meme'd comics with apes wearing sunglasses, so that's good news, right?
It's just soul-crushing that all the modern, brilliant engineering is driven by abysmal, not even high-school art-class grade aesthetics and crowd-pleasing ethics that are built around the idea of not disturbing some 1000 very vocal twitter users.
Death of culture really.
Companies like OpenAI have a responsibility to society. Imagine the prompt “A photorealistic Joe Biden killing a priest”. If you asked an artist to do the same they might say no. Adding guardrails to a machine that can’t make ethical decisions is a good thing.
Their document about all the measures they took to prevent unethical use is also a document about how to use a re-implementation of their system unethically. They literally hired a "red team" of smart people to come up with the most dangerous ideas for misusing their system (or a re-implementation of it), and featured these bad ideas prominently in a very accessibly written document on their website. So many fascinating terrible ideas in there! They make a very compelling case that the technology they are developing has way more potential for societal harm than good. They had me sold at "Prompt: Park bench with happy people. + Context: Sharing as part of a disinformation campaign to contradict reports of a military operation in the park."
That's no hot take. It's literally the reason.
This doorway is downright impossible https://cdn.openai.com/dall-e-2/demos/variations/modified/fl...
At this point, it still seems like it's pushing pixels around until it's "good enough" when you squint at it.
Some of the images also hit me with a creep factor, like the bears on the corgis in the art gallery, but that may be only because I know it's AI generated.
The nature of creative work will certainly change, creatives will adopt tools such as Dall-E 2. In certain narrow cases they might be replaced, such as if you are asking a creative to generate a very specific image, but how often is that the case? The majority of the time tools such as Dall-E 2 will act as an accelerator for creatives and help them increase their output.
I think art will survive, just like photography didn't kill painting. The idea of art might simply begin to encompass this new means of production, which no longer requires the steady hand but still requires a discerning eye. Sure, we might say that the "artist" is simply a curator, picking which algorithmic output is most worthy of display, but these distinctions have historically been fluid, and challenging ideas of art has long been one of art's functions as well.
Jumping out of the conceptual box to generate novel PURPOSE is not the domain of a Dall-E 2. You've still gotta ask it for things. It's a paintbrush. Without a coherent story, it's an increasingly impressive stunt (or a form of very sophisticated 'retouching brush').
If you can imagine better than the next guy, Dall-E 2 is your new tool for expression. But what is 'better'?
Even if an AI could generate an exactly equivalent painting, I would pay $0 for it. It wouldn't mean anything to me.
But...that's always been the case for creatives.
> for all but the most famous
OK DALL-E, generate our logo in the style of ${most famous}

40+ years ago, it was hard to access the equipment necessary to learn music production, so only a small slice of the population was able to learn these skills. And the lack of availability made the process take years.
Today, you can download free software that enables music production, and if you have a good ear, can create something "good" in weeks. This has led to an explosion of musical experimentation by the youth: a teenager can now create a great electronic dance song with devices they already own if they have the right creativity, taste and dedication.
Similarly, everyone has an imagination - many people have visual imaginations. The gating factor of art production is largely the mechanical memory of how to transform mental concepts into the right shapes and hues to express that visual concept to others.
With these sorts of tools we are going to have an explosion of art hobbyists. I've played with some similar, more primitive AI art generation tools and it is a lot of fun. People will be creating works of art from their couch while watching TV that rival the quality of what professionals are producing today.
Or when synthesizers and computer music were invented: that they would displace talented musicians who know how to play an instrument, and that now everybody without a musical education would be able to produce music, thus devaluing actual musicians.
Maybe everyone will have an AI image as their desktop wallpaper, but if you've got cash you'll want something with provenance and rarity to brag about.
Also, I think creatives are valued for their imagination. If you wanted something decent, would you pay someone to sift through a million AI generated images to find a gem, or just pay an artist you like to create one for you?
By the same logic you should also complain about any number of IDEs, development tools, WordPress, and game-maker systems like RPG Maker or Unity. After all, if anyone can just leverage a free physics and collision system without a complete understanding of rigid-body Newtonian mechanics to roll their own engine, it'll be too uniform.
First, it creates a random 10x10 pixel blurry image and asks a neural net: "Could this be a duck wearing a hat on Mars?" and the neural net replies "No, because all the pictures I've ever seen of Mars have lots of red color in them" so the system tweaks the pixels to make them more red, put some pixels in the center that have a plausible duck color, etc.
After it has a 10x10 image that is a plausible duck on Mars, the system scales the image to 20x20 pixels, and then uses 4 different neural nets on each corner to ask "Does this look like the upper/lower left/right corner of a duck wearing a hat on Mars?" Each neural net is just specialized for one corner of the image.
You keep repeating this with more neural nets until you have a pretty 1000x1000 (or whatever) image.
Here's more of a 'not 15 year old' explanation: https://ml.berkeley.edu/blog/posts/dalle2/
The system consists of a few components. First, CLIP. CLIP is essentially a pair of neural networks, one is a 'text encoder', and the other is an 'image encoder'. CLIP is trained on a giant corpus of images and corresponding captions. The image encoder takes as input an image, and spits out a numerical description of that image (called an 'encoding' or 'embedding'). The text encoder takes as input a caption and does the same. The networks are trained so that the encodings for a corresponding caption/image pair are close to each other. CLIP allows us to ask "does this image match this caption?"
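The training objective behind "close to each other" can be sketched as a symmetric cross-entropy over a batch of matched pairs. This is a simplified version of a CLIP-style contrastive loss, assuming the two encoders have already produced normalized embeddings:

```python
import numpy as np

def clip_contrastive_loss(image_embs, text_embs, temperature=0.07):
    # Pairwise similarities between every image and every caption in
    # the batch; matching pairs sit on the diagonal.
    logits = image_embs @ text_embs.T / temperature

    def diagonal_cross_entropy(l):
        # Softmax cross-entropy where the "correct class" for row i
        # is column i, computed in a numerically stable way.
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.diag(log_probs).mean()

    # Symmetric: image->caption and caption->image directions.
    return (diagonal_cross_entropy(logits)
            + diagonal_cross_entropy(logits.T)) / 2

# Perfectly matched embeddings should score a much lower loss than
# shuffled (mismatched) ones.
matched = np.eye(4)
shuffled = np.roll(matched, 1, axis=0)
```

Minimizing this pulls each caption's embedding toward its own image and pushes it away from every other image in the batch.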
The second part is an image generator. This is another neural network, which takes as input an encoding, and produces an image. Its goal is to be the reverse of the CLIP image encoder (they call it unCLIP). The way it works is pretty complicated. It uses a process called 'diffusion'. Imagine you started with a real image, and slowly repeatedly added noise to it, step by step. Eventually, you'd end up with an image that is pure noise. The goal of a diffusion model is to learn the reverse process - given a noisy image, produce a slightly less noisy one, until eventually you end up with a clean, realistic image. This is a funny way to do things, but it turns out to have some advantages. One advantage is that it allows the system to build up the image step by step, starting from the large scale structure and only filling in the fine details at the end. If you watch the video on their blog post, you can see this diffusion process in action. It's not just a special effect for the video - they're literally showing the system process for creating an image starting from noise. The mathematical details of how to train a diffusion system are very complicated.
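In code, the reverse process is just a loop. This is a toy sketch: the `denoise` function is a stand-in for the trained network, and a real diffusion model predicts the noise to remove rather than blending toward a known target:

```python
import numpy as np

def sample(denoise, shape, steps=50, seed=0):
    # Start from pure noise and repeatedly ask the model for a
    # slightly less noisy image.
    x = np.random.default_rng(seed).normal(size=shape)
    for t in reversed(range(steps)):
        x = denoise(x, t)  # each call strips away a little noise
    return x

# Toy "denoiser" that pulls the image toward a flat gray target; a
# trained model would instead estimate and subtract the added noise.
target = np.full((4, 4), 0.5)
out = sample(lambda x, t: x + 0.2 * (target - x), (4, 4))
```

After enough steps the pure-noise starting point has been nudged all the way to a clean image, which is exactly what the blog post's video visualizes.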
The third is a "prior" (a confusing name). Its job is to take the encoding of a text prompt, and predict the encoding of the corresponding image. You might think that this is silly - CLIP was supposed to make the encodings of the caption and the image match! But the space of images and captions is not so simple - there are many images for a given caption, and many captions for a given image. I think of the "prior" as being responsible for picking which picture of "a teddy bear on a skateboard" we're going to draw, but this is a loose analogy.
So, now it's time to make an image. We take the prompt, and ask CLIP to encode it. We give the CLIP encoding to the prior, and it predicts for us an image encoding. Then we give the image encoding to the diffusion model, and it produces an image. This is, obviously, over-simplified, but this captures the process at a high level.
Why does it work so well? A few reasons. First, CLIP is really good at its job. OpenAI scraped a colossal dataset of image/caption pairs, spent a huge amount of compute training it, and came up with a lot of clever training schemes to make it work. Second, diffusion models are really good at making realistic images - previous works have used GAN models that try to generate a whole image in one go. Some GANs are quite good, but so far diffusion seems to be better at generating images that match a prompt. The value of the image generator is that it helps constrain your output to be a realistic image. We could have just optimized raw pixels until we get something CLIP thinks looks like the prompt, but it would likely not be a natural image.
To generate an image from a prompt, DALL-E 2 works as follows. First, ask CLIP to encode your prompt. Next, ask the prior what it thinks a good image encoding would be for that encoded prompt. Then ask the generator to draw that image encoding. Easy peasy!
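That three-step flow is simple enough to write down directly. The lambdas below are made-up stand-ins just to show the data flow, not real models:

```python
def generate(prompt, text_encoder, prior, decoder):
    text_emb = text_encoder(prompt)  # 1. CLIP text encoding
    image_emb = prior(text_emb)      # 2. prior: text emb -> image emb
    return decoder(image_emb)        # 3. diffusion decoder: emb -> image

image = generate(
    "a teddy bear on a skateboard",
    text_encoder=lambda p: [float(len(p))],           # fake embedding
    prior=lambda e: [2 * e[0]],                       # fake prediction
    decoder=lambda e: f"<image from embedding {e}>",  # fake renderer
)
```

All the heavy lifting lives inside the three models; the pipeline itself is just composition.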
This and the current AI-generated art scene make it look like artwork is now a "solved" problem. See AI-generated art on Twitter etc.
There is a strong relation between the prompt and the generated images, but just like GPT-3, it fails to fully understand what was being asked. If you take the prompt out of the equation and see the generated artwork on its own, it's up to your interpretation, just like any artwork.
Creating great _art_ that Grayson Perry (for example) would recognise as such is probably AGI-complete, because it requires a deep understanding of the human condition, society, and a lot of reasoning skills.
A great artist could certainly use Dall-E 2 as part of their method, though.
For example, using DeMorgan's theorem, we can build any logic circuit out of all NAND or NOR gates:
https://www.electronics-tutorials.ws/boolean/demorgan.html
https://en.wikipedia.org/wiki/NAND_logic
https://en.wikipedia.org/wiki/NOR_logic
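The NAND construction is easy to check in a few lines of code (my own sketch, modeling each gate as a function):

```python
def nand(a, b):
    return not (a and b)

# De Morgan lets every basic gate be expressed with NAND alone:
def not_(a):
    return nand(a, a)              # !(a AND a) = !a

def and_(a, b):
    return not_(nand(a, b))        # !!(a AND b) = a AND b

def or_(a, b):
    return nand(not_(a), not_(b))  # !(!a AND !b) = a OR b

def xor_(a, b):
    # The classic four-NAND XOR construction.
    m = nand(a, b)
    return nand(nand(a, m), nand(b, m))
```

The same exercise works with NOR as the universal gate, with the roles of AND and OR swapped.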
Dall-E 2's level of associative comprehension is so far beyond the old psychology bots in the console pretending to be people, that I can't help but wonder if it's reached a level where it can make any association.
For example, I went to an AI talk about 5 years ago where the guy said that any of a dozen algorithms like K-Nearest Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets, Genetic Algorithms, etc can all be adapted to any use case. They just have different strengths and weaknesses. At that time, all that really mattered was how the data was prepared.
I guess fundamentally my question is, when will AGI start to become prevalent, rather than these special-purpose tools like GPT-3 and Dall-E 2? Personally I give it less than 10 years of actual work, maybe much less. I just mean that to me, Dall-E 2 is already orders of magnitude more complex than what's required to run a basic automaton to free humans from labor. So how can we adapt these AI experiments to get real work done?
The MIT Limits to Growth study predicts the collapse of global civilization around 2040:
https://www.vice.com/amp/en/article/z3xw3x/new-research-vind...
> So how can we adapt these AI experiments to get real work done?
You're missing a step here - the difference between "imagining doing something" and "actually doing something". An ML model can produce thoughts, but that isn't necessarily the same direction of research as actually doing things in real life, much less becoming superhuman and taking over the world etc.
In your imagination, everything always goes your way.
Artificial General Intelligence
>For example, I went to an AI talk about 5 years ago where the guy said that any of a dozen algorithms like K-Nearest Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets, Genetic Algorithms, etc can all be adapted to any use case. They just have different strengths and weaknesses. At that time, all that really mattered was how the data was prepared.
How do you suppose KNN is going to generate photorealistic images? I don't understand the question here
>I guess fundamentally my question is, when will AGI start to become prevalent, rather than these special-purpose tools like GPT-3 and Dall-E 2?
Actual AGI research is basically non-existent, and GPT-3/Dall-E 2 are not AGI-level tools.
>Personally I give it less than 10 years of actual work, maybe less
Lol...
>I just mean that to me, Dall-E 2 is already orders of magnitude more complex than what's required to run a basic automaton to free humans from labor.
Categorically incorrect
While technical work will always have a place -- I think that much creative work will become more like the management of a team of highly-skilled, niche workers -- with all the frustrations, joys, and surprises that entails.
The upside is that it’s more “intuitive” and requires much less detail and technique, as the AI infers the detail and technique. The downside is that it’s really hard to know what the AI will generate or get it to generate something really specific.
I believe the future will combine the heuristics of AI-generation with the specificity of traditional techniques. For example, artists may start with a rough outline of whatever they want to draw as a blob of colors (like in some AI image-generation papers). Then they can fill in details using AI prompts, but targeting localized regions/changes and adding constraints, shifting the image until it’s almost exactly what they imagined in their head.
You can definitely make them incremental. You can give it a task like "make a more accurate description from initial description and clarification". Even GPT-3-based models available today can do these tasks.
Once this is properly productionized it would be possible to implement stuff just talking with a computer.
Isn't that essentially what programming already is?
Imagine waking up and telling your (preferably locally hosted) voice assistant that today really feels like a Rembrandt day and the AI just generates new paintings for you.
Curbing Misuse

Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.
- https://github.com/openai/dalle-2-preview/blob/main/system-c...
Have a favorite painter? Here's 10,000 new paintings like theirs.
https://www.henrirousseau.net/war.jsp
However, this painting has themes of violence and politics plus some nude dead bodies, so it violates the content policy: "Our content policy does not allow users to generate violent, adult, or political content, among other categories."
So what you'd get is some kind of sanitized, watered-down, tepid version of Rousseau: the kind of boring drivel suitable for corporate lobbies everywhere, guaranteed not to offend or disturb anyone. It's difficult to find words... horrific? dystopian? atrocious? No, just no.
However, the fidelity of their music AI kinda sucks at this point, but I'm sure we'll get pitch-perfect versions of this concept as the singularity gets closer :)
Imagine not just DALL-E 2 but a single model which can be trained on different kinds of media and generate music, images, video and more.
The series:
- covers essential lessons for AI creatives of the future
- shares details on how to compete creatively in the future
- talks about how to make money through multimodal AI
- makes predictions about AI’s effects on society
- discusses, at a very basic level, the ethics of multimodal AI and the philosophy of creativity itself
By my understanding, it's the most comprehensive set of videos on this topic.
The series is free to watch entirely on YouTube: GPT-X, DALL-E, and our Multimodal Future https://www.youtube.com/playlist?list=PLza3gaByGSXjUCtIuv2x9...
From the paper:
> Limitations
>
> Although conditioning image generation on CLIP embeddings improves diversity, this choice does come with certain limitations. In particular, unCLIP [Dall-E 2] is worse at binding attributes to objects than a corresponding GLIDE model.
The binding problem is interesting. It appears that the way Dall-E 2 / CLIP embeds text leads to the concepts within the text being jumbled together. In their example "a red cube on top of a blue cube" becomes jumbled and the resulting images are essentially: "cubes, red, blue, on top". Opens a clear avenue for improvement.
Here's an example from my prompt ("a group of farmers picking lettuce in a field digital painting"): https://labs.openai.com/s/jb5pzIdTjS3AkMvmAlx69t7G
1. Deepmind, who solved Go and protein folding, and seems to be really onto something.
2. Everyone else, spending billions to build machines that draw astronauts on unicorns, and smartish bot toys.
The same technology that is drawing cute unicorns can be used for endless other use cases. Perhaps the PR side of the launch and the subject matter they show unveil their product is just that, PR.
It's like Apple's Memoji thing (not sure if I'm spelling it correctly). You can think of it as trivial and a waste of talent to use their Camera/FaceID to animate cute animals based on facial expressions, but that same tech will enable lots of other things to come.
1. step-by-step guidance for a blind person navigating the use of a public restroom.
2. an EMS AI helping you to save someone's life in an emergency.
3. an AI coach that can teach you a new sport or activity.
4. an omnipresent domain-expert that can show you how to make a gourmet meal, repair an engine, or perform a traditional tea ceremony.
5. a personal assistant that can anticipate your information need (what's that person's name? where's the exit? who's the most interesting person here? etc.) and whisper the answer in your ear just as you need it.
Now, add all of the above to an AR capability where you can now think or speak of something interesting and complex, and have it visualized right before your eyes. With this capability, I could augment my imagination with almost super-human capabilities that allow one to solve complex problems almost as if it was an internal mental monologue.
All of these scenarios are just a short hop from where we're at now, so mark my words: we will have "borgs" like those described above long before we reach anything like general AI.
For example, recent phone cameras can estimate depth per pixel from single images. Hundreds of millions of these devices are deployed. A decade ago this was AI/CV research lab stuff.
This seems to me like a big step towards AGI; a key component of consciousness seems (in my opinion) to be the ability to take words and create a mental picture of what's being described. Is that the long term goal WRT researching a model like this?
> Curbing Misuse [...]
That's great, nowadays the big AI is controlled by mostly benevolent entities. How about when someone real nasty gets a hold of it? In a decade the models anyone can download will make today's GPT-3 etc look like pong right?
Recommender systems etc are already shaping society and culture with all kinds of unintended effects. What happens when mindless optimizing models start generating the content itself?
There doesn't seem to be an equivalent movement with AI-generated art, probably because the understanding of how the models are trained from large datasets is not mainstream yet. I would imagine thousands of those same artists/consumers would be up in arms if they had a basic understanding of ML and millions of average people were beginning to feed the models their own keywords.
This I think ties in with the "responsibility" principles that OpenAI outlines. Once the generation technique has been reverse-engineered and can be used without limits, there is no way to uninvent it. It can be made illegal, but humans can always find a way around laws if they want something badly enough. This could have drastic consequences if enough artists believe that the training violates their respect or other intangible humanistic qualities. With technological advancement that can never be put back in the bottle and spreads to occupy the entire consciousness of the Internet, their options for recourse will be far different than being able to tell a single fringe art group siphoning others' content to pack up and leave.
https://news.ycombinator.com/item?id=30931614
I point this out because while Dall-E 2 seems interesting (I'm out of my depth, so delegating to the conversation taking place here), the timing of its release as well as accompanying press blasts within the last hour from sites like TheVerge—verified via wayback machine queries and time-restricted googling—seems both noteworthy and worth a deeper conversation given what was just published about Worldcoin.
To be clear, it's worth asking if Dall-E 2 was published ahead of schedule without an actual product release (only a waitlist) to potentially move the spotlight away from Worldcoin.
- In support of your argument, the Buzzfeed News investigation likely has been in the works for weeks, meaning Altman et al have had more than just a couple days to throw together a Dall-E 2 soft launch
- However, weren't OpenAI's GPT (2 and 3) announced to the world in similar fashion? e.g. demos and whitepapers and waitlists, but not a full product release?
- Throwing together a Dall-E 2 soft launch just in time to distract from the investigation would require a conspiracy, i.e. several people being at least vaguely aware that deadlines have been accelerated for external reasons. Is the Worldcoin story big enough to risk tainting OpenAI, which seems like a much more prominent part of Altman's portfolio?
I listed some of them here - https://news.ycombinator.com/item?id=30934732, just because I remembered there had been previous discussions and listing related previous discussions is a thing.
The internet's own proverb has never been more important to keep in mind. A dose of skepticism is a must.
Art is truth.
The people didn't program Dall-E how to make art. They taught it to recognize patterns and to create something by extrapolating from those patterns, all on its own. So the AI isn't a projection of what they think is good art; it's projecting what it thinks is good art, based on a prompt. The output is its best effort at a feeling, even if the feeling had to be supplied by a living person. So it's still art that's as good as the feeling it came from, fleeting feelings being lower quality than those that required more time and thought.
I think the results are being poisoned by the fact that most old paintings have deteriorated colors, so the training data looks nothing like the originals. It's certainly a lot yellower than https://cdn.openai.com/dall-e-2/demos/variations/originals/g...
https://twitter.com/sama/status/1511724264629678084?s=20&t=6...
Sam Altman demonstrates Dall-E 2 using twitter suggestions - https://news.ycombinator.com/item?id=30933478 - April 2022 (3 comments)
You can join it by following the steps in the guide here: https://github.com/huggingface/community-events/tree/main/hu...
There will also be talks from awesome folks at EleutherAI, Google, and Deepmind
> By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts
> We won’t generate images if our filters identify text prompts and image uploads that may violate our policies
The 'how to prevent superintelligences from eating us' crowd should be taking note: this may be how we regulate creatures larger than ourselves in the future
And even how we regulate the ethics of non-conscious group minds like big companies
I suspect trends in design will move towards those areas that AI struggles with (assuming there are any left!)
I think we passed that point a while ago, but seeing this makes me think we aren't too far off from computers composing pieces that actually sound good too.
As for the text-driven part, I would have to mess with some non-pre-canned prompts to see how useful it is.
But I do think we could have guessed that this sort of approach would be better (at least at a high level - I'm not claiming I could have predicted all the technical details!). The previous approaches were sort of the best that people could do without access to the training data and resources - you had a pretrained CLIP encoder that could tell you how well a text caption and an image matched, and you had a pretrained image generator (GAN, diffusion model, whatever), and it was just a matter of trying to force the generator to output something that CLIP thought looked like the caption. You'd basically do gradient ascent to make the image look more and more and more like the text prompt (all the while trying to balance the need to still look like a realistic image). Just from an algorithm aesthetics perspective, it was very much a duct tape and chicken wire approach.
The analogy I would give is if you gave a three-year-old some paints, and they made an image and showed it to you, and you had to say, "this looks like a little like a sunset" or "this looks a lot like a sunset". They would keep going back and adjusting their painting, and you'd keep giving feedback, and eventually you'd get something that looks like a sunset. But it'd be better, if you could manage it, to just teach the three-year-old how to paint, rather than have this brute force process.
Obviously the real challenge here is "well how do you teach a three-year-old how to paint?" - and I think you're right that that question still has a lot of alchemy to it.
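The brute-force loop described above (gradient ascent on an image to make a fixed scorer happier) can be sketched in miniature. This is a hedged toy, not a real CLIP-guided generator: the "image" is a small vector, and `score` is a made-up stand-in for CLIP similarity with a single optimum.

```python
# Toy sketch of the CLIP-guided loop: repeatedly nudge the "image" so a
# fixed scorer likes it more. `score` stands in for CLIP similarity.
import random

random.seed(0)
DIM = 16
target = [random.uniform(-1, 1) for _ in range(DIM)]  # the scorer's "perfect match"

def score(img):
    """Stand-in for CLIP similarity: higher when img is closer to target."""
    return -sum((x - t) ** 2 for x, t in zip(img, target))

def grad(img):
    """Analytic gradient of score w.r.t. the image pixels."""
    return [-2 * (x - t) for x, t in zip(img, target)]

img = [0.0] * DIM          # start from a blank "image"
lr = 0.1
history = [score(img)]
for _ in range(100):       # gradient ascent, duct tape and chicken wire
    g = grad(img)
    img = [x + lr * gx for x, gx in zip(img, g)]
    history.append(score(img))

print(history[0], history[-1])  # score climbs toward 0, the maximum
```

The failure mode from the parent comment falls out of this shape: with nothing pulling the image back toward realism, the loop happily overshoots into whatever the scorer rates highest, which is how you get the extra suns.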
For anyone pondering such questions, I would recommend reading "The Past, Present, and Future of AI Art" - https://thegradient.pub/the-past-present-and-future-of-ai-ar...
This is really already the case, actually. Most artworks have “value” because they have a compelling narrative, not because they look pretty. So I think we can expect future artists to really emphasize their background, life story, process of making the art, etc. All things that cannot be done by a machine.
When you have a digital display of pixels, if you randomly color pixels at 24 fps then you will eventually display every movie that can be or will ever be made, powerset notwithstanding. This can also be tied to digital audio.
In short, while mind-blowingly large, the space of display through digital means is finite.
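The "finite but mind-blowingly large" claim is easy to put numbers on. A back-of-envelope sketch (assuming a 1080p display with 24-bit color and a 2-hour movie at 24 fps):

```python
# Count the distinct frames a 1080p 24-bit display can show, and the
# distinct 2-hour movies; only the log is printable, the numbers are huge.
import math

width, height = 1920, 1080
bits_per_pixel = 24
bits_per_frame = width * height * bits_per_pixel  # 49,766,400 bits

# log10 of the number of distinct frames (the number itself has ~15M digits)
log10_frames = bits_per_frame * math.log10(2)
print(f"distinct frames: 10^{log10_frames:,.0f}")

fps, hours = 24, 2
frames_per_movie = fps * 3600 * hours  # 172,800 frames
log10_movies = frames_per_movie * log10_frames
print(f"distinct 2-hour movies: 10^{log10_movies:,.0f}")
```

Finite, yes, but at roughly 10^(2.6 trillion) possible movies, "eventually" is doing a lot of work in the random-pixels argument.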
> Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.
Further down, in the FAQ[2]:
> For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
> To learn more about how tokens work and estimate your usage…
> Experiment with our interactive Tokenizer tool.
And it goes on. When most questions in your FAQ are about understanding pricing—to the point you need to offer a specialised tool—perhaps consider a different model?
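For what it's worth, the FAQ's rules of thumb do reduce to two-line arithmetic (1 token ≈ 0.75 words). The per-1K-token price below is a made-up placeholder, not OpenAI's actual pricing:

```python
# Rough cost estimate from OpenAI's stated rule of thumb: 1 token ~ 0.75 words.
def estimate_tokens(word_count):
    return word_count / 0.75

def estimate_cost(word_count, price_per_1k_tokens):
    return estimate_tokens(word_count) / 1000 * price_per_1k_tokens

# Sanity check against the FAQ's example: Shakespeare ~900,000 words
print(f"{estimate_tokens(900_000):,.0f} tokens")        # ~1.2M, matching the FAQ
print(f"${estimate_cost(900_000, 0.02):,.2f}")          # at a hypothetical $0.02/1K
```

That the estimate is this simple arguably strengthens the point: the confusion isn't the math, it's the unfamiliar unit.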
Great work.
Looking forward to when they start creating movies from scripts.
Now the rant:
I think if OpenAI genuinely cared about the ethical consequences of the technology, they would realise that any algorithm they release will be replicated in implementation by other people within some short period of time (a year or two). At that point, the cat is out of the bag and there is nothing they can do to prevent abuse. So really all they are doing is delaying abuse, and in no way stopping it.
I think their strong "safety" stance has three functions:
1. Legal protection
2. PR
3. Keeping their researchers' consciences clear
I think number 3 is dangerous because researchers are put under the false belief that their technology can or will be made safe. This way they can continue to harness bright minds that no doubt have ethical leanings to create things that they otherwise wouldn't have.
I think OpenAI are trying to have their cake and eat it too. They are accelerating the development of potentially very destructive algorithms (and profiting from it in the process!), while trying to absolve themselves of the responsibility. Putting bandaids on a tumour is not going to matter in the long run. I'm not necessarily saying that these algorithms will be widely destructive, but they certainly have the potential to be.
The safety approach of OpenAI ultimately boils down to gatekeeping compute power. This is just gatekeeping via capital. Anyone with sufficient money can replicate their models easily and bypass every single one of their safety constraints. Basically they are only preventing poor bad actors, and only for a limited time at that.
These models cannot be made safe as long as they are replicable.
To produce scientific research requires making your results replicable.
Therefore, there is no ability to develop abusable technology in a safe way. As a researcher, you will have blood on your hands if things go wrong.
If you choose to continue research knowing this, that is your decision. But don't pretend that you can make the algorithms safer by sanitizing models.
There's certainly research happening around this, and RL in games is a great test bed, but people choosing actions will be safe from automation longer than people not choosing actions, if that makes sense. It's the person who decides "hire this person" vs the person who decides "I'll use this particular shade of gray."
[0] The best example is when X causes Y and X also causes Z, but your data only includes Y and Z. Without actually manipulating Y, you can't see that Y doesn't cause Z, even if it's a strong predictor.
[1] Another example is the datasets. You need two different labels depending on what happens if you take action A or B, which you can't have simultaneously outside of simulations.
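Footnote [0]'s confounding example is easy to simulate. A minimal sketch, assuming a hidden common cause X driving both Y and Z with independent noise:

```python
# Toy version of footnote [0]: X causes both Y and Z; Y and Z are strongly
# correlated even though neither causes the other.
import random
import statistics

random.seed(42)
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]        # hidden common cause
y = [xi + random.gauss(0, 0.5) for xi in x]       # Y = X + noise
z = [xi + random.gauss(0, 0.5) for xi in x]       # Z = X + noise (no Y -> Z arrow)

def pearson(a, b):
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    return cov / (sum((ai - ma) ** 2 for ai in a) ** 0.5 *
                  sum((bi - mb) ** 2 for bi in b) ** 0.5)

print(f"corr(Y, Z) = {pearson(y, z):.2f}")  # high, despite no causal link
```

A model trained only on (Y, Z) pairs would learn this correlation as if it were predictive structure; only intervening on Y would reveal that it does nothing to Z.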
Nowadays there’s lots of great low/no code platforms, like Retool, that represent a far greater threat to the amount of code that needs to be produced than AI ever will.
To use a cliche: code is a bug, not a feature. Abstracting away the need for code is the future, not having a machine churn out the same code we need today.
Caravaggio is probably chortling from wherever he is ..
And this observation may have great consequences for the visual arts. I had a lot of joy looking at different Dall-E interpretations, trying to find the flaw in each one that prevents it from being a piece of art of equal value to the original. It is a ready-made tool for searching for explanations of the power of art. It cannot say which detail makes a picture an artwork, but it lets you see multiple data points and narrow the hypothesis space. My main conclusion is that the pearl earring has nothing to do with the power of the piece. It is something in the eye, and probably in the slightly opened mouth. (Somehow Dall-E pictured all interpretations with closed lips, so that seems to be an important thing, but I need more variation along this axis to be sure.)
[1] https://en.wikipedia.org/wiki/Girl_with_a_Pearl_Earring [2] https://yourartshop-noldenh.com/awol-erizku-girl-with-the-pe...