Here comes the Muybridge camera moment but for text

(interconnected.org)

249 points · RA2lover · 2y ago · 65 comments

49 comments · 16 top-level

szvsw · 2y ago · 10 in thread

One thing I always find interesting but not discussed all that much, at least in things I’ve read, is: what happens in the spaces between the data? Obviously this is an incredibly high-dimensional space which is only sparsely populated by the entirety of the English language, all tokens, etc. If the space is truly structured well enough, then there is a huge amount of interesting, implicit, almost platonic meaning occurring in the spaces between the data - synthetic? Dialectic? Idk. Anyways, I think those areas are a space that algorithmic intelligence will be able to develop its own notions of semantics and creativity in expression. Things that might typically be ineffable may find easy expression somewhere in embedding space. Heidegger’s thisness might be easily located somewhere in a latent representation… this is probably some linguistics 101 stuff but it’s still fascinating imo.

seanhunter · 2y ago

My intuition is that the voids in an embedding space are concepts which have essentially no meaning, so you will never find text that embeds into those spaces, and therefore they are not reachable.

For example take a syntactically plausible yet meaningless concept such as "the temperature of sorrowful liquid car parkings"[1]. That has nothing near it in embedding space I'd be prepared to guess. When you embed any corpus of text this phrase is going to drop into a big hole in the semantic space because while it has components which have some sort of meaning in each of your semantic dimensions, there isn't anything similar to the actual concept- there isn't any actual meaning there for something else to be similar to.

You need the spaces because there are so many possible different facets we are trying to capture when we talk about meaning but only a subset of those facets are applicable to the meaning of any one concept. So the dimensions in the embedding space are not independent or really orthogonal, and semantic concepts end up clustered in bunches with big gaps between them.

That's my intuition about it. When I get some time it's definitely something I want to study more.

[1] Off the top of my head but you can come up with an infinite number of similar examples

szvsw · 2y ago

> the temperature of sorrowful liquid car parkings

This is quite a beautiful, strange (estranging?) clause - at least in the sense that we (or I) constantly struggle to find meaning and patterns in what might simply be plain noise (apophenic beauty?). It’s a similar form of intrigue that I and I think others often experience when reading the outputs of LLMs operating in the high-temperature regime, though of course we are just talking about embedding/embedding inversion here.

On a human level though, it makes me wonder why you picked that phrase. Did you roll dice in front of a dictionary? Play madlibs? Were they the first words that came to your mind? Or perhaps you went through several iterations to come up with the perfectly meaningless combination? Or perhaps you simply spilled your hot chocolate on your favorite pair of pants or dress while getting out of the car this morning (or perhaps as a child) and the memory has stuck with you… who knows! Only you!

In any case, my original point was simply that these interstitial points in embedding spaces can become ways of referring to or communicating ideas that we simply do not have the words for but which are none-the-less potentially useful in a communication between two entities that both have the ability to come to some roughly shared understanding of what is being referred to or expressed by that point in the embedding space. Regular languages of course invent new words all the time, and yet the points those new words map to in the embedding space always existed (eh not a great example because the shape of the embedding space might change as new words/tokens are introduced to the lexicon but I think the idea holds). Perhaps new words or phrases will come about to bring some point back into textual space; or perhaps that point will remain solely in the shared lexicon of the algorithmic systems using the latent space to communicate ideas. Again, who knows!

For instance, consider the midpoint of a segment connecting two ideas, or the centroid of any simplex in the embedding space… if we assume that there is some sort of well-defined semantic structure in the space, is it necessarily the case that the centroid must refer to something which equally represents all of the nodes, a kind of lowest-common semantic denominator? Obviously if the semantic structure only holds over local regions but breaks down globally this is not the case, but if all the points are within a region of relatively sound semantic structure, that seems plausible. We know what happens when you do a latent space traversal for a VAE which generates images, and it can be quite beautiful and strange (or boring and familiar by 2024, depending on your perspective), but some similarly weird process might be possible with embedding space traversals, if only we could somehow phenomenologically, if not linguistically, decode those interpolating points.

> concepts which have essentially no meaning

This is a pretty strange idea to try to wrap your head around.
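To make the midpoint and centroid idea concrete, here is a minimal sketch in plain Python. The three-dimensional vectors are made up for illustration and are not real embeddings, and `centroid` and `lerp` are hypothetical helper names. The sketch only shows the geometry; whether the resulting point means anything is exactly the open question in this thread.

```python
from typing import List, Sequence

Vector = List[float]


def centroid(points: Sequence[Sequence[float]]) -> Vector:
    """Arithmetic mean of the points, coordinate by coordinate."""
    if not points:
        raise ValueError("need at least one point")
    dim = len(points[0])
    if any(len(p) != dim for p in points):
        raise ValueError("all points must have the same dimension")
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> Vector:
    """Point a fraction t of the way from a to b (t = 0.5 is the midpoint)."""
    return [(1 - t) * x + t * y for x, y in zip(a, b)]


# Made-up 3-dimensional stand-ins for the embeddings of three "ideas".
idea_a = [1.0, 0.0, 2.0]
idea_b = [3.0, 4.0, 0.0]
idea_c = [2.0, 2.0, 4.0]

midpoint = lerp(idea_a, idea_b, 0.5)          # halfway between two ideas
center = centroid([idea_a, idea_b, idea_c])   # centroid of the triangle (a 2-simplex)

print(midpoint)  # [2.0, 2.0, 1.0]
print(center)    # [2.0, 2.0, 2.0]
```

Sweeping `t` from 0 to 1 in `lerp` gives the embedding-space analogue of a latent space traversal; decoding those intermediate points back into text is the hard part and is not attempted here.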

skydhash · 2y ago

I strongly believe there's nothing there other than gibberish. Piping /dev/random to a word selector will probably enumerate everything inside that set. There's a reason we can translate between every language on earth. That's because it's the same earth and reality. So there's a common set of concepts that gives us the foundational rules of languages. Which is the data that you're speaking about.

Buttons840 · 2y ago

I think a concrete application of what you're wondering is: What is the most useful word that doesn't exist?

szvsw · 2y ago

This sums up what I wrote above (as well as in a longer reply to a reply) much more elegantly and clearly than I ever could. Thank you!

Edit: but I might exchange the word useful for something else… maybe not…

mortenjorck · 2y ago

Now this is a fun idea. If you think of embeddings as a sort of quantization of latent space, what would happen if you “turned off” that quantization? It would obviously make no sense to us, as we can only understand the output of vectors that map to tokens in languages we speak, but you could imagine a language model writing something in a sort of platonic, infinitely precise language that another model with the same latent space could then interpret.

Der_Einzige · 2y ago

Ya I'm having my return to plato moment. It really feels like we are the dēmiurgós right now with AI systems. The nature of interpolation vs extrapolation and the exploration of latent spaces will answer a lot of philosophical questions that we didn't expect to be answered so quickly, and by computers of all things.

Kiro · 2y ago

That reminds me of the crazy output you get when raising the temperature and letting the model deviate from regular language. E.g. https://news.ycombinator.com/item?id=38779818
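For anyone unfamiliar with what "raising the temperature" does, here is a minimal sketch of temperature-scaled softmax in plain Python. The logits are made-up scores for three candidate tokens and `softmax_with_temperature` is a hypothetical helper name; real models do this over a vocabulary of tens of thousands of tokens.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Turn raw scores into probabilities, dividing by the temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next tokens.
logits = [4.0, 2.0, 0.0]

cool = softmax_with_temperature(logits, 0.5)     # sharper: the top token dominates
neutral = softmax_with_temperature(logits, 1.0)  # the unmodified distribution
hot = softmax_with_temperature(logits, 5.0)      # flatter: unlikely tokens gain mass

for name, probs in (("T=0.5", cool), ("T=1.0", neutral), ("T=5.0", hot)):
    print(name, [round(p, 3) for p in probs])
```

At high temperature the least likely token gets sampled far more often, which is where the drift away from regular language comes from.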

Cacti · 2y ago

The space is an uncountable set, at the limit. Mostly it’s noise. See: curse of dimensionality.

fortzi · 2y ago

If I’m not mistaken, the coordinates in any given latent space (in this context) are countable, as there is a finite number of dimensions. You can even only consider the space enveloped by the already explored coordinates (e.g. English words), to get a finite space which can be fully enumerated.

mortenjorck · 2y ago · 7 in thread

Yes, yes, more explorations in this direction.

For a couple of years now, I've had this half-articulated sense that the uncanny ability of sufficiently-advanced language models to back into convincing simulations of conscious thought entirely via predicting language tokens means something profound about the nature of language itself.

I'm sure there are much smarter people than I thinking about this (and probably quite a bit of background reading that would help; Chomsky, perhaps McLuhan?) but it feels like, in parallel to everything going on in the development of LLMs, there's also something big about us waiting there under the surface.

skydhash · 2y ago

> convincing simulations of conscious thought entirely via predicting language tokens means something profound about the nature of language itself.

> there's also something big about us waiting there under the surface.

I don't believe so. In "The Origins of Knowledge and Imagination" by Jacob Bronowski, he argues that human language has four unique characteristics:

- We can separate information (data about what is being described) from emotional content (how we're supposed to react). There's no longer a bijection between communication and action.

- We can extend the time reference of the communication content. We talk about the past, we plan for the future.

- We can refer to ourselves. So we examine what we've done and iterate over it until we fix the errors. We can see ourselves doing the action without actually doing it.

- We can rearrange units of language to have different meanings. The same words can have different meanings based on their order. So meaning depends not only on the words, but on their sequence. And that goes from words to phrases to sequences of dialogs.

The fourth point is the most important. LLMs, by predicting language tokens, can give us the most common order for a particular context. And because we don't have that many words, their orders can be extracted from books and other written content. But then they fail for the higher levels, mostly because that's when everything gets unique.

As for the third point, by observing ourselves, our communication is constantly being based on reality, which grounds it in truth. And because we can extend the reference it's based on, that leads us to observe changes and model laws. The first point allows us to separate what things are from what we should do or feel based on their existence and absence.

Instead of the LLMs fooling us, it's more us fooling ourselves, because by recognizing meaning in sentences, we try to extract meanings from longer sequences of text where there aren't any. Why? Because there is no "I" that has done the job of extracting information and using language to transmit it (while still cognizant of the imperfection of natural languages). LLMs are lossy compressions of ideas. Only the smallest survive, and then the model generates many more false ones.

justinjlynn · 2y ago

Are you certain that you're not playing with words to arrive at a predetermined conclusion? What is this "I" to which you're referring and how can you demonstrate that "I" does not or cannot exist within systems such as these? Further, if you are to find something which qualifies as an "I" elsewhere, what makes that elsewhere fundamentally different and therefore capable of supporting and being an "I" and is that elsewhere such simply by definition or in and of itself? Further, if the language usage is indistinguishable from the language usage of an "I", is the difference of source meaningful? If so, why?

brianush1 · 2y ago

Why does there need to be an "I" that uses language to transmit information? Language itself encodes information. I can read a piece of text and gain something from it. Where the text came from is irrelevant.

leobg · 2y ago

Chomsky, of all people? Chomsky rose to fame by attacking BF Skinner’s book “Verbal Behavior”. Which is the book that made exactly the case you’re making now, only some 60 years ago.

Skinner would marvel at today’s LLMs. They are the most elegant proof that intelligence is not just shaped by external contingencies, but that it is identical with those contingencies.

ryandv · 2y ago

To this list I would absolutely add Julian Jaynes' "The Origin of Consciousness in the Breakdown of the Bicameral Mind."

> simulations of conscious thought entirely via prediction language tokens

Jaynes goes so far as to assert that language generates consciousness, which is characterized by (amongst other features) its narrative structure, as well as its production of a metaphor of our selves that can inhabit a spatiotemporal mental space that serves as an analog for the physical world; the mental space where we imagine potential actions, play with ideas, predict future outcomes, and analyze concepts prior to taking action in the "real, actual" world.

The generation of metaphors is inextricably linked to the psychotechnology (to pull a word from vocabulary discussed by John Vervaeke in his "Awakening from the Meaning Crisis" series) of language, which is the means by which one object can be described and elaborated by its similarity to another. As an etymological example: the Sanskrit word "bhu" which means "to grow" forms the basis of the modern English verb "to be," but predates lofty abstract notions such as that of "being," "ontology," or "existence." It's from the known and the familiar (plant or animal growth) that we can reach out into the unknown and the unfamiliar (the concept of being), using (psycho-)technologies such as language to extend our cognition in the same way a hammer or a bicycle extends our body.

There is something here about language being the substrate of thought, and perhaps even consciousness in general as Jaynes would seem to assert in Book I of his 1976 work, where he spends a considerable amount of time discussing metaphor and language in connection to his definition of "consciousness."

There are also questions of "intentionality" and whether or not computers and their internal representations can actually be "about" something in the way that our language and our ideas can be "about" something in the physical (or even ideal) world that we want to discuss. Searle and the "Chinese room" argument come to mind.

Turing famously dodged this question in his paper "Computing Machinery and Intelligence" by substituting what is now called the "Turing test" in lieu of answering the question of whether or not "machines" can "think" (whatever those two words actually mean).

mikewarot · 2y ago

> Jaynes goes so far as to assert that language generates consciousness

The recent discussion of Helen Keller[1] and her description of learning the meaning of "I" strongly backs this assertion, in my opinion.

I read her words as implying that you can't have consciousness without self identity.

[1] https://news.ycombinator.com/item?id=40466814

furstenheim · 2y ago

100%, maybe intelligence is not as mysterious and extraordinary as we thought

kepano · 2y ago · 3 in thread

The repercussions of what the author summarizes as "could you colour-grade a book?" still feel wildly unknown to me, even after a couple years of thinking about it (see Photoshop for text [1][2]).

Partially it's because we're still wrapping our heads around what kind of experience this might enable. The tools still feel ahead of the medium. I think we're closer to Niépce than Muybridge.

In photography terms, we've just figured out how to capture photons on paper — and artists haven't figured out how to use that to make something interesting.

[1] https://news.ycombinator.com/item?id=33253606

[2] https://stephango.com/photoshop-for-text

throw46365 · 2y ago

> The tools still feel ahead of the medium.

Or maybe it's that we instinctively feel that writing should still be linear writing, if reading is still going to be linear reading.

Personally I think the "photoshop for text" analogy shows just how misguided it is to expect people to tolerate words that were calculated, not crafted.

Literacy is too important to mess with like this.

kepano · 2y ago

Genuine question — do you think synthetic images pose less of a problem than synthetic text? If yes, why?

Der_Einzige · 2y ago

I have proof from my commit history on the readme to CTGS[1] that my usage of the term "Photoshop for Creative Writing" (What I tried to market it as) predates all of this by... years now.

https://github.com/Hellisotherpeople/Constrained-Text-Genera...

I'm obsessed with this idea of a proper LLM desktop class prosumer front-end. Something feeling like it was made by Adobe in a world where they didn't go to shit in the early 2010s. Blender, but for LLMs. Oobabooga, but actually good and not janky. It would ideally implement all forms of "representation engineering" and hacking or playing with the embedding/latent spaces, along with every other LLM feature folks would love to have but often don't know exist (i.e. constrained generation)

If you're a VC type reading this and believe in this idea, I really want to talk to you right about now.

Also, if you are an expert in DearPyGUI or DearImGUI, I want to talk to you right now.

dhosek · 2y ago · 3 in thread

For those perplexed by the headline, the Muybridge camera moment refers to Eadweard Muybridge who managed via camera photos taken in rapid succession to prove that when a horse runs it at times has all four legs above the ground.

https://en.wikipedia.org/wiki/Eadweard_Muybridge

(the article doesn’t bother to mention any of this until near the end in the tl;dr section, which since it’s tl and you dr, you never got to).

Animats · 2y ago

(On an irrelevant note, the Stanford Barn, where those pictures were taken, has gradually been closed off to the world. It was open to the public until COVID. It's still there, and there's a Stanford equestrian team, but road access has been cut and all mentions of the barn removed from directional signs.)

gausswho · 2y ago

There are so many of these places I've encountered that used to be publicly available pre-COVID and no longer are. The reasons/excuses vary.

Example: Sometimes it's a symptom of a small business that already wanted a reason to pivot to a new venture, and they keep the old thing going to profit from some old whales while in transition.

stavros · 2y ago

Not only that, but the tldr basically only talks about that, so it's not much of a summary at all. I read the tldr and I have no idea what the article is about.

sebmellen · 2y ago · 2 in thread

Terence McKenna phrased this wonderfully, by saying “It seems to me that language is some kind of enterprise of human beings that is not finished.”

The full quote is more psychedelic, in the context of his experience with so-called ‘jeweled self-dribbling basketballs’ he would encounter on DMT trips, who he said were made of a kind of language, or ‘syntax binding light’:

“You wonder what to make of it. I’ve thought about this for years and years and years, and I don’t know why there should be an invisible syntactical intelligence giving language lessons in hyperspace. That certainly, consistently seems to be what is happening.

I’ve thought a lot about language as a result of that. First of all, it is the most remarkable thing we do.

Chomsky showed the deep structure of language is under genetic control, but that’s like the assembly language level. Local expressions of language are epigenetic.

It seems to me that language is some kind of enterprise of human beings that is not finished.

We have now left the grunts and the digs of the elbow somewhat in the dust. But the most articulate, brilliantly pronounced and projected English or French or German or Chinese is still a poor carrier of our intent. A very limited bandwidth for the intense compression of data that we are trying to put across to each other. Intense compression.

It occurs to me, the ratios of the senses, the ratio between the eye and the ear, and so forth, this also is not genetically fixed. There are ear cultures and there are eye cultures. Print cultures and electronic cultures. So, it may be that our perfection and our completion lies in the perfection and completion of the word.

Again, this curious theme of the word and its effort to concretize itself. A language that you can see is far less ambiguous than a language that you hear. If I read the paragraph of Proust, then we could spend the rest of the afternoon discussing, what did he mean? But if we look at a piece of sculpture by Henry Moore, we can discuss, what did he mean, but at a certain level, there is a kind of shared bedrock that isn’t in the Proust passage. We each stop at a different level with the textual passage. With the three-dimensional object, we all sort of start from the same place and then work out our interpretations. Is it a nude, is it an animal? Is it bronze, is it wood? Is it poignant, is it comical? So forth and so on.”

This post feels like the beginning of that concretization.

pixl97 · 2y ago

> “It seems to me that language is some kind of enterprise of human beings that is not finished.”

I would include this all the way up to higher intelligence itself, language is but the force carrier for intelligence. We've been developing muscles and balance for hundreds of millions of years, but our intelligence that communicates in advanced language is pretty much brand new.

eszed · 2y ago

Fascinating comment, that articulates the point of TFA better than TFA did.

I've always been highly articulate, and also frustrated by the limitations of spoken language. This is a common (maybe even the dominant?) theme in 20th century theatrical writing. People like Ibsen, Chekhov, Pinter, Genet, and Churchill all struggle with it in their own ways. People like Beckett and LePage and Sarah Kane ultimately kind of abandon language altogether.

Or, though poetry's not as much my field as theatre, you could go back to TS Eliot:

... Words strain, Crack, and sometimes break, under the burden, Under the tension, slip, slide, perish, Decay with imprecision, will not stay in place, Will not stay still.

My own speculation, along your lines, is that it's because sound is transient, hearing imperfect, and memory fallible. Even apart from ambiguity, two people will never quite agree on what was said. (Most of my arguments with my wife begin this way!) Even court transcripts, intended to eliminate this limitation, don't capture non-verbal cues.

As someone who's been marinated in the written and spoken word for all my life, research like this is fascinating, and slightly creepy: will all of the ghosts in the machine be exorcised? If those are blown away, and the bare mechanism of language exposed, what comes next?

082349872349872 · 2y ago · 2 in thread

> What would it mean to listen to a politician speak on TV, and in real-time see a rhetorical manoeuvre that masks a persuasive bait and switch?

Why do I suspect the offence will always be ahead of the defence in these areas?

I'd earlier suggested that everyone, in elementary school, ought to watch Ancient Aliens and attempt to note the moment where each episode jumps the shark. I take it we could attempt this with LLMs, now?

rablackburn · 2y ago

> Why do I suspect the offence will always be ahead of the defence in these areas?

because destroying is easier than creating/entropy increases over time?

The only solution I can see is working on turning bad actors into good actors, or another way: positive reinforcement cycles.

No idea what that would look like with regard to LLMs though.

pixl97 · 2y ago

At the end of the day there is no permanent solution.

In nature we typically don't see something 'win' and that's the end of the story. I mean yes, things do go extinct, but the winner always has something new to deal with. Could be a more advanced predator eating all its food sources. Could be a bacterium that it's not resistant to. Simply put, when there's entropy on the table, something is going to evolve to take it with the least amount of work possible.

kaycebasques · 2y ago · 2 in thread

> Looking at this plot by @oca.computer, I feel like I’m peering into the world’s first microscope and spying bacteria, or through a blurry, early telescope, and spotting invisible dots that turn out to be the previously unknown moons of Jupiter… There is something there! New information to be interpreted!

1024core · 2y ago

Any tools to replicate @oca.computer's work?

Once we have the 1000-dim vector embeddings I can make the rest work. Not sure how to go from 20-word span to a 1000-dim vector embedding.

10c8 · 2y ago

Generating embeddings is relatively simple with a model and Python code. There's plenty of them on HuggingFace, along with code examples.

all-MiniLM-L6-v2 is a really (if not the most) popular one (albeit not SotA), with 384 dimensions: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...

Edit: A more modern and robust suite of models comes from Nomic, and can generate embeddings with 64 to 768 dimensions (https://huggingface.co/nomic-ai/nomic-embed-text-v1.5).

When the author talks about thousands of dimensions, they're probably talking about the OpenAI embedding models.
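As a rough sketch of the "20-word span to vector" step asked about above: the snippet below splits text into spans in plain Python, then hands them to the all-MiniLM-L6-v2 model through the sentence-transformers package. `word_spans` and `embed_spans` are hypothetical helper names, the span size and stride are assumptions, and this is not necessarily how @oca.computer produced the original plot. The embedding step assumes `pip install sentence-transformers` and a model download on first use, so it is kept in its own function.

```python
from typing import List


def word_spans(text: str, size: int = 20, stride: int = 20) -> List[str]:
    """Split text into spans of `size` words, advancing `stride` words each time."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), stride)]


def embed_spans(spans: List[str],
                model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    """Embed each span; returns an array of shape (len(spans), 384) for this model."""
    # Imported lazily so the span splitting above works without the package installed.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    return model.encode(spans)


# Placeholder text of 45 made-up "words" to show the splitting behaviour.
sample = " ".join(f"w{i}" for i in range(45))
spans = word_spans(sample)
print(len(spans))  # 3 spans: 20 words, 20 words, and a 5-word remainder
```

Calling `embed_spans(spans)` then gives one vector per span. Setting `stride` smaller than `size` produces overlapping windows, which should give a smoother trace through the text at the cost of more embeddings to compute.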

zharknado · 2y ago · 1 in thread

> Could you dynamically change the register or tone of text depending on audience, or the reading age, or dial up the formality or subjective examples or mentions of wildlife, depending on the psychological fingerprint of the reader or listener?

This seems plausible, and amazing or terrible depending on the application.

An amazing application would be textbooks that adapt to use examples, analogies, pacing, etc. that enhance the reader’s engagement and understanding.

An unfortunate application would be mapping which features are persuasive to individual users for hyper-targeted advertising and propaganda.

A terrible application would be tracking latent political dissent to punish people for thought-crime.

lsaferite · 2y ago

I'm sure it comes up frequently, but the adapting textbook thought reminds me of the "Young Lady's Illustrated Primer" from Diamond Age.

Animats · 2y ago · 1 in thread

So embedding space itself is interesting. It's more than a step to an LLM. That's been known for a while, back to that early result where "King" - "Man" + "Woman" -> "Queen". This article, though, suggests more uses for embedding spaces. This could be interesting. It's a step beyond viewing them as a black box.
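The "King" - "Man" + "Woman" result can be illustrated with a toy sketch in plain Python. The three-dimensional vectors below are made up so that the arithmetic works out; real word embeddings have hundreds of dimensions, and the analogy only holds approximately there. `cosine` and `analogy` are hypothetical helper names.

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def analogy(a: str, b: str, c: str, vocab: Dict[str, List[float]]) -> str:
    """Word whose vector is closest to vocab[a] - vocab[b] + vocab[c],
    excluding the three input words themselves."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = [w for w in vocab if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vocab[w], target))


# Made-up vectors; the axes loosely stand for (royalty, male, female).
toy: Dict[str, List[float]] = {
    "king":  [0.9, 0.9, 0.1],
    "queen": [0.9, 0.1, 0.9],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.5],
}

result = analogy("king", "man", "woman", toy)
print(result)  # queen
```

Excluding the input words from the candidates matters: with real embeddings the nearest vector to the target is often "king" itself.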

082349872349872 · 2y ago

Is ♔ - m + f = ♕ specific to embeddings, or does it also work in https://en.wikipedia.org/wiki/Formal_concept_analysis#Exampl... ? (either as ♔ ⊕ f ⊕ m = ♕ or as ♔ ⋀ not(m) ⋁ f = ♕?)

[alas, HN scrubs venus and mars symbols, and I shall spare you all the ancient egyptian hieroglyphs and O'Keeffean mathematical symbols, so `f` and `m` they are]

Terr_ · 2y ago · 1 in thread

> What if the difference between statements that are simply speculative and statement that mislead are as obvious as, I don’t know, the difference between a photo and a hand-drawn sketch?

Given how long these have been pored over by existing hyperconnected nanomachine networks (i.e. brains) it may be that we'll mostly unearth qualities humans can already detect, even if only subconsciously.

When it comes to separating truth and lies, perhaps the real trick the computer will bring is removing context, e.g. scoring text without confirmation bias towards its conclusion.

TeMPOraL · 2y ago

LLMs seem to do more of what brains do unconsciously, rather than consciously. Which means brains may be better at rating e.g. trustworthiness of some text, but they don't surface specific ratings to the conscious level. Meanwhile, language models seem to be able to expose those features as knobs, allowing you to boost or attenuate them. So you get to drag e.g. the "excited" slider down to minimum, and get a text that may be easier to process at a conscious level. Having a slider to remove rhetoric from text would be a really useful development.

nkurz · 2y ago · 1 in thread

> "Even in 1821, horses were wrongly depicted running like dogs."

Great essay, but this small comment toward the end of the essay confused me. Is he saying that dogs never gallop?

I'm still not sure about the answer breed-by-breed, but searching for it led me to this interesting page illustrating different dog gaits: https://vanat.ahc.umn.edu/gaits/index.html

In particular, it seems to say that at least some dogs do the same "transverse gallop" that horses use: https://vanat.ahc.umn.edu/gaits/transGallop.html

And that greyhounds at least also do a "rotary gallop": https://vanat.ahc.umn.edu/gaits/rotGallop.html

I have a Vizsla (one of several breeds in the running for second fastest breed after greyhounds) and my guess is that she at times does both gallops. I can't find a reference to confirm this, though.

Maken · 2y ago

In the linked article (https://www.amusingplanet.com/2019/06/the-galloping-horse-pr...) there are some examples of "wrong" galloping horses. The first two examples look like the "rotary gallop", which is how a dog or a cat, not a horse, would run. The third example is plainly wrong, because the horses are mid-air but seemingly ready to land on one leg.

failrate · 2y ago

For a game based on semantic vectors: https://semantle.com/

qup · 2y ago

https://archive.is/EcQfE

Site is struggling

nickreese · 2y ago

I thoroughly enjoyed reading this style of loosely connected thoughts.

anigbrowl · 2y ago

Zardoz predicted this ~50 years ago

lettergram · 2y ago

Quite literally what my company does - https://ipcopilot.ai/

We discover innovative ideas in companies and help them protect their IP.

j / k navigate · click thread line to collapse

65 comments

49 comments · 16 top-level

szvsw2y ago· 10 in thread

seanhunter2y ago

My intuition is that the voids in an embedding space are concepts which have essentially no meaning, so you will never find text that embeds into those spaces, and therefore they are not reachable.

That's my intuition about it. When I get some time it's definitely something I want to study more.

[1] Off the top of my head but you can come up with an infinite number of similar examples

szvsw2y ago

> the temperature of sorrowful liquid car parkings

> concepts which have essentially no meaning

This is a pretty strange idea to try to wrap your head around.

1 more reply

skydhash2y ago

Buttons8402y ago

I think a concrete application of what your wondering is: What is the most useful word that doesn't exist?

szvsw2y ago

This sums up what I wrote above (as well as in a longer reply to a reply) much more elegantly and clearly than I ever could. Thank you!

Edit: but I might exchange the word useful for something else… maybe not…

mortenjorck2y ago

1 more reply

Der_Einzige2y ago

Kiro2y ago

That reminds me of the crazy output you get when raising the temperature and letting the model deviate from regular language. E.g. https://news.ycombinator.com/item?id=38779818

Cacti2y ago

The space is an uncountable set, at the limit. Mostly it’s noise. See: curse of dimensionality.

fortzi2y ago

mortenjorck2y ago· 7 in thread

Yes, yes, more explorations in this direction.

skydhash2y ago

> convincing simulations of conscious thought entirely via predicting language tokens means something profound about the nature of language itself.

> there's also something big about us waiting there under the surface.

I don't believe so. In "The Origins of Knowledge and Imagination" by Jacob Brownoski, he argues that human language have four unique characteristics:

- We can separate information (data of what being described) from emotional content (how we're supposed to react). There's no longer a bijection between communication and action.

- We can extend the time reference of the communication content. We talk about the past, we plan for the future.

- We can refer to ourselves. So we examine what we've done and iterate over it until we fix the errors. We can see ourselves doing the action without actually doing it.

justinjlynn2y ago

1 more reply

brianush12y ago

1 more reply

leobg2y ago

Chomsky, of all people? Chomsky rose to fame by attacking BF Skinner’s book “Verbal Behavior”. Which is the book that made exactly the case you’re making now, only some 60 years ago.

Skinner would marvel at today’s LLMs. They are the most elegant proof that intelligence is not just shaped by external contingencies, but that it is identical with those contingencies.

ryandv2y ago

To this list I would absolutely add Julian Jaynes' "The Origin of Consciousness in the Breakdown of the Bicameral Mind."

> simulations of conscious thought entirely via predicting language tokens

mikewarot2y ago

>Jaynes goes so far as to assert that language generates consciousness

The recent discussion of Helen Keller[1] and her description of learning the meaning of "I" strongly backs this assertion, in my opinion.

I read her words as implying that you can't have consciousness without self identity.

[1] https://news.ycombinator.com/item?id=40466814

furstenheim2y ago

100%, maybe intelligence is not as mysterious and extraordinary as we thought

kepano2y ago· 3 in thread

The repercussions of what the author summarizes as "could you colour-grade a book?" still feel wildly unknown to me, even after a couple years of thinking about it (see Photoshop for text [1][2]).

Partially it's because we're still wrapping our heads around what kind of experience this might enable. The tools still feel ahead of the medium. I think we're closer to Niépce than Muybridge.

In photography terms, we've just figured out how to capture photons on paper — and artists haven't figured out how to use that to make something interesting.

[1] https://news.ycombinator.com/item?id=33253606

[2] https://stephango.com/photoshop-for-text

throw463652y ago

> The tools still feel ahead of the medium.

Or maybe it's that we instinctively feel that writing should still be linear writing, if reading is still going to be linear reading.

Personally I think the "photoshop for text" analogy shows just how misguided it is to expect people to tolerate words that were calculated, not crafted.

Literacy is too important to mess with like this.

kepano2y ago

Genuine question — do you think synthetic images pose less of a problem than synthetic text? If yes, why?

Der_Einzige2y ago

I have proof from my commit history on the readme to CTGS[1] that my usage of the term "Photoshop for Creative Writing" (What I tried to market it as) predates all of this by... years now.

[1] https://github.com/Hellisotherpeople/Constrained-Text-Genera...

If you're a VC type reading this and believe in this idea, I really want to talk to you right about now.

Also, if you are an expert in DearPyGUI or DearImGUI, I want to talk to you right now.

dhosek2y ago· 3 in thread

https://en.wikipedia.org/wiki/Eadweard_Muybridge

(the article doesn’t bother to mention any of this until near the end in the tl;dr section, which since it’s tl and you dr, you never got to).

Animats2y ago

gausswho2y ago

There are so many of these places I've encountered that used to be publicly available pre-COVID and no longer are. The reasons/excuses vary.

Example: Sometimes it's a symptom of a small business that already wanted a reason to pivot to a new venture and keeps the old thing going to profit from some old whales while in transition.

stavros2y ago

Not only that, but the tldr basically only talks about that, so it's not much of a summary at all. I read the tldr and I have no idea what the article is about.

sebmellen2y ago· 2 in thread

Terence McKenna phrased this wonderfully, by saying “It seems to me that language is some kind of enterprise of human beings that is not finished.”

I’ve thought a lot about language as a result of that. First of all, it is the most remarkable thing we do.

Chomsky showed the deep structure of language is under genetic control, but that’s like the assembly language level. Local expressions of language are epigenetic.

It seems to me that language is some kind of enterprise of human beings that is not finished.

This post feels like the beginning of that concretization.

pixl972y ago

> “It seems to me that language is some kind of enterprise of human beings that is not finished.”

eszed2y ago

Fascinating comment, that articulates the point of TFA better than TFA did.

Or, though poetry's not as much my field as theatre, you could go back to TS Eliot:

... Words strain,
Crack, and sometimes break, under the burden,
Under the tension, slip, slide, perish,
Decay with imprecision, will not stay in place,
Will not stay still.

0823498723498722y ago· 2 in thread

> What would it mean to listen to a politician speak on TV, and in real-time see a rhetorical manoeuvre that masks a persuasive bait and switch?

Why do I suspect the offence will always be ahead of the defence in these areas?

rablackburn2y ago

> Why do I suspect the offence will always be ahead of the defence in these areas?

because destroying is easier than creating/entropy increases over time?

The only solution I can see is working on turning bad actors into good actors, or, put another way: positive reinforcement cycles.

No idea what that would look like with regard to LLMs though.

pixl972y ago

At the end of the day there is no permanent solution.

kaycebasques2y ago· 2 in thread

1024core2y ago

Any tools to replicate @oca.computer's work?

Once we have the 1000-dim vector embeddings I can make the rest work. Not sure how to go from a 20-word span to a 1000-dim vector embedding.

10c82y ago

Generating embeddings is relatively simple with a model and Python code. There's plenty of them on HuggingFace, along with code examples.

all-MiniLM-L6-v2 is a really (if not the most) popular one (albeit not SotA), with 384 dimensions: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...

Edit: A more modern and robust suite of models comes from Nomic, and can generate embeddings with 64 to 768 dimensions (https://huggingface.co/nomic-ai/nomic-embed-text-v1.5).

When the author talks about thousands of dimensions, they're probably talking about the OpenAI embedding models.
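To make the "20-word span to vector" step concrete, here is a rough sketch using the sentence-transformers library with the all-MiniLM-L6-v2 model mentioned above. The helper names are mine, and chopping the text into non-overlapping 20-word spans is an assumption; I don't know how the original author actually windowed the text.

```python
def split_into_spans(text, span_words=20):
    """Chop text into consecutive spans of at most `span_words` words."""
    words = text.split()
    return [
        " ".join(words[i:i + span_words])
        for i in range(0, len(words), span_words)
    ]


def embed_spans(spans, model_name="sentence-transformers/all-MiniLM-L6-v2"):
    """Return one embedding per span (384 dimensions for this model).

    Requires `pip install sentence-transformers`; the import is kept inside
    the function so the span-splitting helper works without it.
    """
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    # encode() accepts a list of strings and returns a 2-D NumPy array,
    # one row per input string.
    return model.encode(spans, normalize_embeddings=True)
```

Calling `embed_spans(split_into_spans(text))` should give you an array with one 384-dim row per span; the first call downloads the model weights. Passing a different model name changes the dimension count, though some models need extra arguments, so check the model card first.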

zharknado2y ago· 1 in thread

This seems plausible, and amazing or terrible depending on the application.

An amazing application would be textbooks that adapt to use examples, analogies, pacing, etc. that enhance the reader’s engagement and understanding.

An unfortunate application would be mapping which features are persuasive to individual users for hyper-targeted advertising and propaganda.

A terrible application would be tracking latent political dissent to punish people for thought-crime.

lsaferite2y ago

I'm sure it comes up frequently, but the adapting textbook thought reminds me of the "Young Lady's Illustrated Primer" from Diamond Age.

Animats2y ago· 1 in thread

0823498723498722y ago

[alas, HN scrubs venus and mars symbols, and I shall spare you all the ancient egyptian hieroglyphs and O'Keeffean mathematical symbols, so `f` and `m` they are]

Terr_2y ago· 1 in thread

> What if the difference between statements that are simply speculative and statements that mislead are as obvious as, I don’t know, the difference between a photo and a hand-drawn sketch?

When it comes to separating truth and lies, perhaps the real trick the computer will bring is removing context, e.g. scoring text without confirmation bias towards its conclusion.

TeMPOraL2y ago

nkurz2y ago· 1 in thread

> "Even in 1821, horses were wrongly depicted running like dogs."

Great essay, but this small comment toward the end of the essay confused me. Is he saying that dogs never gallop?

I'm still not sure about the answer breed-by-breed, but searching for it led me to this interesting page illustrating different dog gaits: https://vanat.ahc.umn.edu/gaits/index.html

In particular, it seems to say that at least some dogs do the same "transverse gallop" that horses use: https://vanat.ahc.umn.edu/gaits/transGallop.html

And that greyhounds at least also do a "rotary gallop": https://vanat.ahc.umn.edu/gaits/rotGallop.html

I have a Vizsla (one of several breeds in the running for second fastest breed after greyhounds) and my guess is that she at times does both gallops. I can't find a reference to confirm this, though.

Maken2y ago

failrate2y ago

For a game based on semantic vectors: https://semantle.com/

qup2y ago

https://archive.is/EcQfE

Site is struggling

nickreese2y ago

I thoroughly enjoyed reading this style of loosely connected thoughts.

anigbrowl2y ago

Zardoz predicted this ~50 years ago

lettergram2y ago

Quite literally what my company does - https://ipcopilot.ai/

We discover innovative ideas in companies and help them protect their IP.
