Gibberish has to map _somewhere_ in the model's concept space.
Whether it maps onto anything we'd recognise as consistent is another matter; as other people have noted, the gibberish breaks down when you move it into another context. But who's to say that DALL-E 2 isn't remaining consistent to some concept it understands that isn't immediately recognisable to us?
The interesting part is whether you can trick it into spitting out gibberish in targeted areas of that concept space using crafted queries.
I agree that this shows a focus on the appearance of the words rather than their meaning.
https://astralcodexten.substack.com/p/a-guide-to-asking-robo...
A counterpoint I'd raise is I wonder how aggressive Dall-E 2 is in making assumptions about words it hasn't seen before.
Hard to do, given that it's read essentially the entire internet; however, someone could make up some Latin-esque words whose meaning people would be able to guess.
If the model is as good as people at guessing the meaning of such made-up words, it stands to reason that, if it were aggressive enough about this, it might be doing the same thing with gibberish, ending up with its own interpretation of the word, which would land it back in a more targeted concept space.
I'd love to see someone craft some words that most people could guess the meaning of, and see how DALL-E 2 fares.
That the model would have a consistent form of some kind of gibberish would be a given. Even humans have it: https://en.wikipedia.org/wiki/Bouba/kiki_effect And I'm sure if you asked native English speakers, "Hey, we know this isn't a word, but if it was a word, what would it be? 'Apoploe vesrreaitars'" you would get something very far from a uniformly random distribution of all nameable concepts.
In hindsight, sure. Given enough time someone might have predicted the phenomenon. But I don't think most of us did.
What's more fascinating to me is how often this has happened in this space in just the last few years.
1. Some phenomenon is discovered
2. I'm surprised
3. It makes sense in hindsight
Why? It could just go to noise images, or vaguely real-looking objects that don't look like anything in particular.
The machine is always trying to associate words with other words that are semantically close together. E.g. strong_man, strng_man, and srong_man as inputs all mean the same thing, because that combination of letters is usually used with the word "man", and there is no competitor word other than "strong" to replace "srong".
Now, why that should be considered a secret language is beyond me. The input language for the machine is a natural human language, which is a very poorly defined language for the machine to recognize. That is always going to produce a lot of gibberish.
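The typo-robustness intuition above is easy to sketch with a toy fuzzy matcher (illustrative only; the real model works on learned token embeddings, not edit distance, but the effect is similar: "srong" has no plausible neighbour except "strong"):

```python
import difflib

# Toy vocabulary standing in for words the model knows (made up for
# illustration; a real system learns these associations from data).
vocab = ["strong", "string", "man", "bird", "cheese"]

def nearest_word(typo, cutoff=0.6):
    """Return the closest known word, or None if nothing is close enough."""
    matches = difflib.get_close_matches(typo, vocab, n=1, cutoff=cutoff)
    return matches[0] if matches else None

print(nearest_word("srong"))   # -> strong
print(nearest_word("qqqqq"))   # -> None (no plausible neighbour)
```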
Not really. It's a stochastic model, so after a bunch of random denoising steps it could easily just be mapping every bit of gibberish to a random image, with it being vanishingly unlikely for any of them to be similar or for the relationship to run in reverse.
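The "every input lands somewhere, but nowhere meaningful" possibility can be sketched with a toy hash-based stand-in (purely hypothetical; a real diffusion model samples from learned latents, but this captures the point that near-identical gibberish prompts could land in completely unrelated places):

```python
import hashlib
import random

def arbitrary_embedding(prompt, dim=8):
    """Deterministically map a prompt to a vector via a hash-seeded RNG.
    Every input lands somewhere, but neighbouring inputs land in
    unrelated places and the map is not meaningfully reversible."""
    seed = int.from_bytes(hashlib.sha256(prompt.encode()).digest()[:8], "big")
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

# Two near-identical gibberish prompts land in unrelated regions:
a = arbitrary_embedding("Apoploe vesrreaitais")
b = arbitrary_embedding("Apoploe vesrreaitars")
```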
No, it doesn't. The model in use maps all input to some output, but that isn't a necessary feature of the problem at all. It's actually a terrible idea.
https://twitter.com/Thomas_Woodside/status/15317102510150819...
E.g. "Apoploe vesrreaitais" could refer to something along the lines of a "fan/wedge" or "wing-like".
If you look at the examples of cheese, compared with the "birds and cheese" ones, the cheese tends to be laid out in a fan-like pattern and shaped into sharp-angled wedges.
I'm unconvinced by the rebuttal as well, not to say I am convinced we have a fully formal language going on here, but there's definitely some shared concepts with the generated text.
I wonder what Imagen would come up with, or whether its 'language' is more correlated with real language.
"feathered" maybe?
A language should have syntax and meaning. We can see these phrases (tokens?) have meaning.
It is unclear what the syntax is. But DALL-E 2's idea of what the syntax of English is isn't how most people understand it either (as can be seen by how many rephrasing attempts people make to get what they want).
It's entirely possible (probable?) there is syntax here but we don't know it yet.
How many French people speak Breton?
Found this answer:
https://twitter.com/BarneyFlames/status/1531736708903051265?...
Serious question: what else do you think language is? How else would your brain associate the word "bird" with the concept?
This is an example of an application where uncertainty modelling would help greatly. Any and every input will lead to an output. That doesn’t mean that all regions of latent/embedding space are equally valid.
I’m in the camp that large/modern ML models are nearing human intelligence, in some aspects. What’s currently missing is the universal ability to estimate uncertainty and identify inputs that are out of distribution. Many groups are working on this and perhaps we already have the solution but are not combining the right uncertainty estimation approach with the right foundational model.
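A minimal sketch of the kind of out-of-distribution check described above (a toy z-score version with made-up data; real approaches use e.g. Mahalanobis distance, deep ensembles, or density models over the embedding space):

```python
import statistics

def ood_score(x, train_data):
    """Crude out-of-distribution score: the largest per-dimension z-score
    of x against the training set. A toy stand-in for proper uncertainty
    estimation."""
    score = 0.0
    for dim, xi in zip(zip(*train_data), x):
        mu = statistics.fmean(dim)
        sigma = statistics.stdev(dim) or 1e-9
        score = max(score, abs(xi - mu) / sigma)
    return score

# Hypothetical 2-D embeddings of in-distribution training inputs:
train = [(0.0, 1.0), (0.2, 0.9), (-0.1, 1.1), (0.1, 1.0)]
in_dist = ood_score((0.05, 1.0), train)    # near the training data
out_dist = ood_score((5.0, -3.0), train)   # far outside it
```

With a score like this, a generator could flag "Apoploe vesrreaitais" as far from anything it was trained on instead of silently producing birds.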
https://giannisdaras.github.io/publications/Discovering_the_...
[1] https://twitter.com/barneyflames/status/1531736708903051265?...
And for the record, they use BPE dropout for DALLE-1, see https://arxiv.org/pdf/2102.12092.pdf
> While the idea of AI agents inventing their own language may sound alarming/unexpected to people outside the field, it is a well-established sub-field of AI, with publications dating back decades.
> Simply put, agents in environments attempting to solve a task will often find unintuitive ways to maximize reward.
But I think it is an interesting discovery because I don’t think anyone could have predicted this.
One of my favorite examples is the classification model that will identify an apple with a sticker on it that says “pear” as a pear—it makes sense, but is still surprising when you first see it.
That classification model (CLIP) is the first stage of this image generator (DALLE) - and actually this shows that it doesn't think they're exactly the same thing, or at least that's not the full story, because DALL-E doesn't confuse the two.
However, other CLIP guided image generation models do like to start writing the prompt as text into the image if you push them too hard.
It'd be cool if this was true, but it looks like it mostly isn't.
I thought DALL-E's language model was tokenized, so it doesn't understand that, e.g., "car" is made up of the letters 'c', 'a', and 'r'.
So how could the generated pictures contain letters that form words that are tokenized into DALL-E's internal "language"? Shouldn't we expect that feeding those words to the model would give the same result as feeding it random invented words?
Actually, now that I think about it, how does DALL-E react when given words made of completely random letters?
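A toy greedy subword tokenizer (a much-simplified stand-in for BPE; the vocabulary below is invented for illustration, real merges are learned from data) shows how a nonsense word still decomposes into familiar pieces rather than failing:

```python
def greedy_tokenize(word, vocab):
    """Greedy longest-match subword tokenization."""
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            # Fall back to a single character if no longer piece matches.
            if word[i:j] in vocab or j == i + 1:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

vocab = {"car", "ap", "op", "loe", "vesr", "rea", "it", "ais"}
print(greedy_tokenize("car", vocab))      # -> ['car'] (a single known token)
print(greedy_tokenize("apoploe", vocab))  # -> ['ap', 'op', 'loe']
```

So random invented words don't hit an "unknown word" path; they just become unusual token sequences, which the model maps somewhere regardless.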
It's one thing if DALL-E 2 was trying to map words in the prompt to their letter sequences and failing because of BPEs; that shows an impressive amount of compositionality but it's still image-model territory. It's another if DALL-E 2 was trying to map the prompt to semantically meaningful content and then failing to finish converting that content to language because it's too small and diffusion is a poor fit for language generation. That makes for worse images but it says terrifying things about how much DALL-E 2 has understood the semantic structure of dialog in images, and how this is likely to change with scale. Normally I'd expect the physical representation to precede semantic understanding, not follow it!
That said I reiterate that a degree of skepticism seems warranted at this point.
https://en.wikipedia.org/wiki/Simlish
https://web.archive.org/web/20040722043906/http://thesims.ea...
https://web.archive.org/web/20121102012431/http://bbs.thesim...
The reason "Apoploe vesrreaitais" is detected as Greek is that the first "word" is phonetically similar to the word απόπλους, which means sailing/shipping and is rooted in ancient Greek. If we were to write απόπλους using Roman characters, we would write "apoplous" or "apoploi" (plural; in Greek, αποπλοΐ). So I think the model understands that the "oe" suffix is used to represent the Greek suffix "οι" used for plurals. The rest of the word is phonetically rather close, so there is some model that maps phonetic representations to the correct word.
The other phrase seems to be combined of words classified as Portuguese, Spanish, Lithuanian, and Luxembourgish.
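The phonetic-closeness claim can be loosely sanity-checked with a plain surface-similarity measure (a crude proxy; the model presumably works on subword statistics rather than edit distance):

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Simple surface-form similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

close = similarity("apoploe", "apoplous")  # vs. the Greek romanisation
far = similarity("apoploe", "bird")        # vs. an unrelated word
```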
My hypothesis here is more that these models are trained more on Western languages than others, and thus the latent representation of "language" is going to look like Latin gibberish, due to a combination of how those languages evolved and human bias. ("It's all Greek to me")
"My first reaction to this was, "It probably has to do with tokenization. If there's a 'language' buried in here, its native alphabet is GPT-3 tokens, and the text we see is a concatenation of how it thinks those tokens map to Unicode text." Most randomly concatenated pairs of tokens simply do not occur in any training text, because their translation to Unicode doesn't correspond to any real word. There are also combinations that do correspond to real words ("pres" + "ident" + "ial") but still never occur in training because some other tokenization is preferred to represent the same string ("president" + "ial").
Maybe DALL-E 2 is assigning some sort of isolated (as in, no bound morphemes) meaning to tokens — e.g., combinations of letters that are statistically likely to mean "bird" in some language when more letters are revealed. When a group of such tokens are combined, you get a word that's more "birdlike" than the word "bird" could ever be, because it's composed exclusively of tokens that mean "bird": tokens that, unlike "bird" itself, never describe non-birds (e.g., a Pontiac Firebird). The exact tokens it uses to achieve this aren't directly accessible to us, because all we get is poorly rendered roman text."
I wonder if this is why the term for "bird" seemed to be in faux binomial nomenclature, the scientific names for animals. I assume that in the training set there were images of birds/insects with their scientific name. An image labeled with the scientific name would always be an image of an animal, unlike images with the word bird in them which could be of a birdhouse, Pontiac Firebird, or someone playing golf. That would mean that in the latent space when DALLE wants to represent a bird as accurately as possible, it will use the scientific name, or a gibberish/tokenized version of the scientific name-- like someone trying to make up a name that sounds regal might say "Sir Reginard Swellington III". Even though it's not a real name it encodes into the latent space of royal-sounding names.
I wonder if this could be extended to other things with very specific naming conventions. For example aircraft names: "Gruoeing B-26 Froovet" might encode into military aircraft latent space.
Seems like a useful enhancement would be to invert the text and image prior stages, so it'd be able to explain what it thinks your prompt meant along with making images of it.
[1] https://astralcodexten.substack.com/p/a-guide-to-asking-robo...
i.e. anything can be completely described in a more succinct manner than any current spoken language.
Or maybe some kind of universal language that naturally occurs, which any semi-intelligent life can understand.
Fun stuff!
However, optimality of encoding is entirely relative to the decoding scheme used and your purposes. Obviously a matrix of numbers representing a summary of a paragraph can be in some sense "more compressed" than the English equivalent, but it's useless if you don't speak matrices. Similarly, you could invent an encoding scheme with Latin characters that is more compressed than English, but it's again useless if you don't know it or want to take the time to learn it. If we wanted we could make English more regular and easier to learn/compress, but we don't, for a whole bunch of practical/real life reasons. There's no free lunch in information theory. You always have to keep the decoder/reader in mind.
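The "no free lunch" point is easy to demonstrate with a general-purpose coder: redundant English text shrinks dramatically, while structure-free noise barely shrinks at all (toy demo with made-up data):

```python
import random
import zlib

rng = random.Random(0)
english = b"on the contrary this rugged mountain range trails off " * 20
noise = bytes(rng.randrange(256) for _ in range(len(english)))

compressed_en = zlib.compress(english)
compressed_noise = zlib.compress(noise)
# Redundant text compresses to a small fraction of its size;
# incompressible noise stays about as large (it can even grow
# by a few header bytes).
```

The catch is exactly the one above: the compact form is only "better" if both sides share the decoder.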
Meaningful phrases or sentences can usually be expressed in Ithkuil with fewer linguistic units than natural languages.[2] For example, the two-word Ithkuil sentence "Tram-mļöi hhâsmařpţuktôx" can be translated into English as "On the contrary, I think it may turn out that this rugged mountain range trails off at some point."[2]
All human languages are about the same efficiency when spoken, but of course this mainly depends on having short enough words for the most common concepts in the specific thing you’re talking about.
https://www.science.org/content/article/human-speech-may-hav...
And there can’t be a universal language because the symbols (words) used are completely arbitrary even if the grammar has universal concepts.
I’ve been wondering if there is a way to do psychological experiments on these large language models that we couldn’t do with a person.
This one melts my brain a bit, I’m not going to lie. Whales talking about food, with subtitles. “Translate” the subtitles and you get food that whales would actually eat.
How does getting access work, do you need a referral?
This may act as a counter balance to the trends of the last few years of all major research becoming concentrated in a few tech companies.
Conclusion: the gibberish is the expression for birds eating things in DALL-E's secret language.
But, wait. Why is the same gibberish in the first image, that has the two men and the cabbages(?), but no birds?
Explanation: the two men are clearly talking about birds:
>> We then feed the words: "Apoploe vesrreaitars" and we get birds. It seems that the farmers are talking about birds, messing with their vegetables!
With apologies to my two compatriots, but that is circular thinking to make my head spin. I'm reminded of nothing so much as the scene in Monty Python and the Holy Grail where the wise Sir Bedivere explains why witches are made of wood:
All the cool images that DALL-E spits out are fun to look at, but this sort of thing is an even more interesting experiment in my book. I've been patiently sitting on the waitlist for access, but I can't wait to play around with it.
It will be fun to see people experimenting with extracting text prompts from generated images. I'd try something like "An open children book about animals" or "Random thought written on a paper". Maybe do a feedback loop of extracted prompts :)
I think there will be multiple words for the same thing. Also, unlike 'bird', the word 'Apoploe vesrreaitais' might actually mean a specific kind of bird in a specific setting.
No, DALL-E doesn’t have a secret language - https://news.ycombinator.com/item?id=31587316 - June 2022 (7 comments)