For example, take a syntactically plausible yet meaningless concept such as "the temperature of sorrowful liquid car parkings"[1]. I'd be prepared to guess that it has nothing near it in embedding space. When you embed any corpus of text, this phrase is going to drop into a big hole in the semantic space: its components each have some sort of meaning along your semantic dimensions, but there isn't anything similar to the actual concept, because there isn't any actual meaning there for something else to be similar to.
You need the spaces because there are so many different facets we are trying to capture when we talk about meaning, but only a subset of those facets applies to the meaning of any one concept. So the dimensions in the embedding space are not independent or really orthogonal, and semantic concepts end up clustered in bunches with big gaps between them.
That's my intuition about it. When I get some time it's definitely something I want to study more.
[1] Off the top of my head but you can come up with an infinite number of similar examples
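One rough way to make the "big hole" intuition concrete is to score a point by how far it sits from its nearest neighbour in the corpus. The sketch below uses made-up 3-d vectors rather than real embeddings, and `isolation` is a name invented here for illustration, so treat it as a toy and not a measurement of any actual model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def isolation(query, corpus):
    """1 minus the best cosine similarity to anything in the corpus.

    Near 0: the query sits inside a cluster. Larger: it sits in a gap.
    """
    return 1.0 - max(cosine(query, v) for v in corpus)

# Hypothetical toy "embeddings": two tight clusters with a gap between them.
corpus = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.9, 0.1],
]
ordinary = [0.95, 0.05, 0.0]  # lands inside the first cluster
nonsense = [0.3, 0.3, 0.9]    # has a bit of every facet, close to nothing

print("ordinary:", isolation(ordinary, corpus))
print("nonsense:", isolation(nonsense, corpus))
```

With real embeddings you would swap the toy lists for vectors from a model and compare the score of the odd phrase against the scores of ordinary phrases from the same corpus.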
This is quite a beautiful, strange (estranging?) clause - at least in the sense that we (or I) constantly struggle to find meaning and patterns in what might simply be plain noise (apophenic beauty?). It’s a similar form of intrigue that I and I think others often experience when reading the outputs of LLMs operating in the high-temperature regime, though of course we are just talking about embedding/embedding inversion here.
On a human level though, it makes me wonder why you picked that phrase. Did you roll dice in front of a dictionary? Play madlibs? Were they the first words that came to your mind? Or perhaps you went through several iterations to come up with the perfectly meaningless combination? Or perhaps you simply spilled your hot chocolate on your favorite pair of pants or dress while getting out of the car this morning (or perhaps as a child) and the memory has stuck with you… who knows! Only you!
In any case, my original point was simply that these interstitial points in embedding spaces can become ways of referring to or communicating ideas that we simply do not have the words for but which are nonetheless potentially useful in a communication between two entities that both have the ability to come to some roughly shared understanding of what is being referred to or expressed by that point in the embedding space. Regular languages of course invent new words all the time, and yet the points those new words map to in the embedding space always existed (eh, not a great example, because the shape of the embedding space might change as new words/tokens are introduced to the lexicon, but I think the idea holds). Perhaps new words or phrases will come about to bring some point back into textual space; or perhaps that point will remain solely in the shared lexicon of the algorithmic systems using the latent space to communicate ideas. Again, who knows!
For instance, consider the midpoint of a segment connecting two ideas, or the centroid of any simplex in the embedding space… if we assume that there is some sort of well-defined semantic structure in the space, is it necessarily the case that the centroid must refer to something which equally represents all of the nodes, a kind of lowest-common semantic denominator? Obviously if the semantic structure only holds over local regions but breaks down globally this is not the case, but if all the points are within a region of relatively sound semantic structure, that seems plausible. We know what happens when you do a latent space traversal for a VAE which generates images, and it can be quite beautiful and strange (or boring and familiar by 2024, depending on your perspective), but some similarly weird process might be possible with embedding space traversals, if only we could somehow phenomenologically, if not linguistically, decode those interpolating points.
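To make the midpoint/centroid idea tangible, here is a minimal sketch that averages a few vectors and then asks which known item is closest to the result. The vocabulary and its 3-d vectors are invented for illustration, and the toy is rigged so the midpoint happens to land on an existing word; in a real embedding space the centroid may well fall in one of the gaps with no word nearby.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest(query, vocab):
    """Name of the vocabulary entry most similar to the query vector."""
    return max(vocab, key=lambda name: cosine(query, vocab[name]))

# Hypothetical toy vocabulary; these are not real model embeddings.
vocab = {
    "river": [1.0, 0.0, 0.0],
    "bank": [0.7, 0.7, 0.0],
    "money": [0.0, 1.0, 0.0],
    "cloud": [0.0, 0.0, 1.0],
}

midpoint = centroid([vocab["river"], vocab["money"]])
print("midpoint:", midpoint)
print("closest known word:", nearest(midpoint, vocab))
```

The nearest-neighbour lookup is only the crudest possible "decoder" for an interpolated point. It tells you which existing word is closest, not what the point itself might mean.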
> concepts which have essentially no meaning
This is a pretty strange idea to try to wrap your head around.
Edit: but I might exchange the word useful for something else… maybe not…
For a couple of years now, I've had this half-articulated sense that the uncanny ability of sufficiently-advanced language models to back into convincing simulations of conscious thought entirely via predicting language tokens means something profound about the nature of language itself.
I'm sure there are much smarter people than I thinking about this (and probably quite a bit of background reading that would help; Chomsky, perhaps McLuhan?) but it feels like, in parallel to everything going on in the development of LLMs, there's also something big about us waiting there under the surface.
> there's also something big about us waiting there under the surface.
I don't believe so. In "The Origins of Knowledge and Imagination", Jacob Bronowski argues that human language has four unique characteristics:
- We can separate information (data about what is being described) from emotional content (how we're supposed to react). There's no longer a bijection between communication and action.
- We can extend the time reference of the communication content. We talk about the past, we plan for the future.
- We can refer to ourselves. So we examine what we've done and iterate over it until we fix the errors. We can see ourselves doing the action without actually doing it.
- We can rearrange units of language to have different meanings. The same words can have different meanings based on their order. So meaning depends not only on the words, but on their sequence. And that goes from words to phrases to sequences of dialogue.
The fourth point is the most important. LLMs, by predicting language tokens, can give us the most common order for a particular context. And because we don't have that many words, their orders can be extracted from books and other written content. But then they fail at the higher levels, mostly because that's where everything gets unique.
As for the third point, by observing ourselves, our communication is constantly being based on reality, which grounds it in truth. And because we can extend the reference it's based on, that leads us to observe changes and model laws. The first point allows us to separate what things are from what we should do or feel based on their existence and absence.
Instead of the LLMs fooling us, it's more us fooling ourselves, because by recognizing meaning in sentences, we try to extract meanings for longer sequences of text where there aren't any. Why? Because there is no "I" that has done the job of extracting information and using language to transmit it (while still cognizant of the imperfection of natural languages). LLMs are lossy compressions of ideas. Only the smallest ideas survive, and then the model generates many more false ones.
Skinner would marvel at today’s LLMs. They are the most elegant proof that intelligence is not just shaped by external contingencies, but that it is identical with those contingencies.
> simulations of conscious thought entirely via predicting language tokens
Jaynes goes so far as to assert that language generates consciousness, which is characterized by (amongst other features) its narrative structure, as well as its production of a metaphor of our selves that can inhabit a spatiotemporal mental space that serves as an analog for the physical world; the mental space where we imagine potential actions, play with ideas, predict future outcomes, and analyze concepts prior to taking action in the "real, actual" world.
The generation of metaphors is inextricably linked to the psychotechnology (to pull a word from vocabulary discussed by John Vervaeke in his "Awakening from the Meaning Crisis" series) of language, which is the means by which one object can be described and elaborated by its similarity to another. As an etymological example: the Sanskrit word "bhu" which means "to grow" forms the basis of the modern English verb "to be," but predates lofty abstract notions such as that of "being," "ontology," or "existence." It's from the known and the familiar (plant or animal growth) that we can reach out into the unknown and the unfamiliar (the concept of being), using (psycho-)technologies such as language to extend our cognition in the same way a hammer or a bicycle extends our body.
There is something here about language being the substrate of thought, and perhaps even consciousness in general as Jaynes would seem to assert in Book I of his 1976 work, where he spends a considerable amount of time discussing metaphor and language in connection to his definition of "consciousness."
There are also questions of "intentionality" and whether or not computers and their internal representations can actually be "about" something in the way that our language and our ideas can be "about" something in the physical (or even ideal) world that we want to discuss. Searle and the "Chinese room" argument come to mind.
Turing famously dodged this question in his paper "Computing Machinery and Intelligence" by substituting what is now called the "Turing test" in lieu of answering the question of whether or not "machines" can "think" (whatever those two words actually mean).
The recent discussion of Helen Keller[1] and her description of learning the meaning of "I" strongly backs this assertion, in my opinion.
I read her words as implying that you can't have consciousness without self identity.
Partially it's because we're still wrapping our heads around what kind of experience this might enable. The tools still feel ahead of the medium. I think we're closer to Niépce than Muybridge.
In photography terms, we've just figured out how to capture photons on paper — and artists haven't figured out how to use that to make something interesting.
Or maybe it's that we instinctively feel that writing should still be linear writing, if reading is still going to be linear reading.
Personally I think the "photoshop for text" analogy shows just how misguided it is to expect people to tolerate words that were calculated, not crafted.
Literacy is too important to mess with like this.
https://github.com/Hellisotherpeople/Constrained-Text-Genera...
I'm obsessed with this idea of a proper LLM desktop class prosumer front-end. Something feeling like it was made by Adobe in a world where they didn't go to shit in the early 2010s. Blender, but for LLMs. Oobabooga, but actually good and not janky. It would ideally implement all forms of "representation engineering" and hacking or playing with the embedding/latent spaces, along with every other LLM feature folks would love to have but often don't know exist (e.g. constrained generation).
If you're a VC type reading this and believe in this idea, I really want to talk to you right about now.
Also, if you are an expert in DearPyGUI or DearImGUI, I want to talk to you right now.
https://en.wikipedia.org/wiki/Eadweard_Muybridge
(the article doesn’t bother to mention any of this until near the end in the tl;dr section, which since it’s tl and you dr, you never got to).
Example: Sometimes it's a symptom of a small business that already wanted a reason to pivot to a new venture, and they keep the old thing going to profit from some old whales while in transition.
The full quote is more psychedelic, in the context of his experience with so-called ‘jeweled self-dribbling basketballs’ he would encounter on DMT trips, who he said were made of a kind of language, or ‘syntax binding light’:
“You wonder what to make of it. I’ve thought about this for years and years and years, and I don’t know why there should be an invisible syntactical intelligence giving language lessons in hyperspace. That certainly, consistently seems to be what is happening.
I’ve thought a lot about language as a result of that. First of all, it is the most remarkable thing we do.
Chomsky showed the deep structure of language is under genetic control, but that’s like the assembly language level. Local expressions of language are epigenetic.
It seems to me that language is some kind of enterprise of human beings that is not finished.
We have now left the grunts and the digs of the elbow somewhat in the dust. But the most articulate, brilliantly pronounced and projected English or French or German or Chinese is still a poor carrier of our intent. A very limited bandwidth for the intense compression of data that we are trying to put across to each other. Intense compression.
It occurs to me, the ratios of the senses, the ratio between the eye and the ear, and so forth, this also is not genetically fixed. There are ear cultures and there are eye cultures. Print cultures and electronic cultures. So, it may be that our perfection and our completion lies in the perfection and completion of the word.
Again, this curious theme of the word and its effort to concretize itself. A language that you can see is far less ambiguous than a language that you hear. If I read the paragraph of Proust, then we could spend the rest of the afternoon discussing, what did he mean? But if we look at a piece of sculpture by Henry Moore, we can discuss, what did he mean, but at a certain level, there is a kind of shared bedrock that isn’t in the Proust passage. We each stop at a different level with the textual passage. With the three-dimensional object, we all sort of start from the same place and then work out our interpretations. Is it a nude, is it an animal? Is it bronze, is it wood? Is it poignant, is it comical? So forth and so on.”
This post feels like the beginning of that concretization.
I would include this all the way up to higher intelligence itself, language is but the force carrier for intelligence. We've been developing muscles and balance for hundreds of millions of years, but our intelligence that communicates in advanced language is pretty much brand new.
I've always been highly articulate, and also frustrated by the limitations of spoken language. This is a common (maybe even the dominant?) theme in 20th century theatrical writing. People like Ibsen, Chekhov, Pinter, Genet, and Churchill all struggle with it in their own ways. People like Beckett and LePage and Sarah Kane ultimately kind of abandon language altogether.
Or, though poetry's not as much my field as theatre, you could go back to TS Eliot:
... Words strain, Crack, and sometimes break, under the burden, Under the tension, slip, slide, perish, Decay with imprecision, will not stay in place, Will not stay still.
My own speculation, along your lines, is that it's because sound is transient, hearing imperfect, and memory fallible. Even apart from ambiguity, two people will never quite agree on what was said. (Most of my arguments with my wife begin this way!) Even court transcripts, intended to eliminate this limitation, don't capture non-verbal cues.
As someone who's been marinated in the written and spoken word for all my life, research like this is fascinating, and slightly creepy: will all of the ghosts in the machine be exorcised? If those are blown away, and the bare mechanism of language exposed, what comes next?
Why do I suspect the offence will always be ahead of the defence in these areas?
I'd earlier suggested that everyone, in elementary school, ought to watch Ancient Aliens and attempt to note the moment where each episode jumps the shark. I take it we could attempt this with LLMs, now?
because destroying is easier than creating/entropy increases over time?
The only solution I can see is working on turning bad actors into good actors, or, put another way: positive reinforcement cycles.
No idea what that would look like with regard to LLMs though.
In nature we typically don't see something 'win' and that's the end of the story. I mean yes things do go extinct, but the winner always has something new to deal with. Could be a more advanced predator eating all its food sources. Could be a bacterium that it's not resistant to. Simply put, when there's entropy on the table, something is going to evolve to take it with the least amount of work possible.
Once we have the 1000-dim vector embeddings I can make the rest work. Not sure how to go from 20-word span to a 1000-dim vector embedding.
all-MiniLM-L6-v2 is a really (if not the most) popular one (albeit not SotA), with 384 dimensions: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...
Edit: A more modern and robust suite of models comes from Nomic, and can generate embeddings with 64 to 768 dimensions (https://huggingface.co/nomic-ai/nomic-embed-text-v1.5).
When the author talks about thousands of dimensions, they're probably talking about the OpenAI embedding models.
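For the "20-word span to vector" step, a minimal sketch might look like the following. It assumes the `sentence-transformers` package and the all-MiniLM-L6-v2 model mentioned above; `split_into_spans`, `embed_spans`, and `load_default_encoder` are names made up for this example. Note that this model gives 384 dimensions, not 1000, so anything downstream that expects a specific size would need a different model or an adjustment.

```python
def split_into_spans(text, span_words=20):
    """Chop text into consecutive spans of at most `span_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + span_words])
        for i in range(0, len(words), span_words)
    ]

def embed_spans(spans, encoder):
    """Return one plain-float vector per span.

    `encoder` is any object with an .encode(list_of_strings) method,
    which is the interface SentenceTransformer models expose.
    """
    return [[float(x) for x in vec] for vec in encoder.encode(spans)]

def load_default_encoder():
    """Load the MiniLM model named in the thread.

    Assumes `pip install sentence-transformers` and network access for
    the first download; it is not called automatically here.
    """
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Usage, once the package and model are available:
#   encoder = load_default_encoder()
#   vectors = embed_spans(split_into_spans(some_long_text), encoder)
#   len(vectors[0]) is the model's embedding size
```

Keeping the encoder as a parameter means you can swap in a different model, such as the Nomic one or a wrapper around a hosted API, without touching the span-splitting code.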
This seems plausible, and amazing or terrible depending on the application.
An amazing application would be textbooks that adapt to use examples, analogies, pacing, etc. that enhance the reader’s engagement and understanding.
An unfortunate application would be mapping which features are persuasive to individual users for hyper-targeted advertising and propaganda.
A terrible application would be tracking latent political dissent to punish people for thought-crime.
[alas, HN scrubs venus and mars symbols, and I shall spare you all the ancient egyptian hieroglyphs and O'Keeffean mathematical symbols, so `f` and `m` they are]
Given how long these have been pored over by existing hyperconnected nanomachine networks (i.e. brains) it may be that we'll mostly unearth qualities humans can already detect, even if only subconsciously.
When it comes to separating truth and lies, perhaps the real trick the computer will bring is removing context, e.g. scoring text without confirmation bias towards its conclusion.
Great essay, but this small comment toward the end of the essay confused me. Is he saying that dogs never gallop?
I'm still not sure about the answer breed-by-breed, but searching for it led me to this interesting page illustrating different dog gaits: https://vanat.ahc.umn.edu/gaits/index.html
In particular, it seems to say that at least some dogs do the same "transverse gallop" that horses use: https://vanat.ahc.umn.edu/gaits/transGallop.html
And that greyhounds at least also do a "rotary gallop": https://vanat.ahc.umn.edu/gaits/rotGallop.html
I have a Vizsla (one of several breeds in the running for second fastest breed after greyhounds) and my guess is that she at times does both gallops. I can't find a reference to confirm this, though.
Site is struggling
We discover innovative ideas in companies and help them protect their IP.