Giving GPT “Infinite” Knowledge

(sudoapps.substack.com)

121 points · sudoapps · 3y ago · 86 comments

62 comments · 15 top-level

ftxbro · 3y ago · 11 in thread

> "Once these models achieve a high level of comprehension, training larger models with more data may not offer significant improvements (not to be mistaken with reinforcement learning through human feedback). Instead, providing LLMs with real-time, relevant data for interpretation and understanding can make them more valuable."

To me this viewpoint looks totally alien. Imagine you have been training this model to predict the next token. At first it can barely interleave vowels and consonants. Then it can start making words, then whole sentences. Then it starts unlocking every cognitive ability one by one. It begins to pass nearly every human test and certification exam and psychological test of theory of mind.

Now imagine thinking at this point "training larger models with more data may not offer significant improvements" and deciding that's why you stop scaling it. That makes absolutely no sense to me unless 1) you have no imagination or 2) you want to stop because you are scared to make superhuman intelligence or 3) you are lying to throw off competitors or regulators or other people.

spacephysics · 3y ago

I don’t think we’re close to super human intelligence in the colloquial sense.

ChatGPT scrapes all the information given, then predicts the next token. It has no ability to understand what is truthful or correct. It’s as good as the data being fed to it.

To me, this is a step closer to AGI but we’re still far off. There’s a difference between “what’s statistically likely to be the next word” vs “despite this being the most likely next word, it’s actually wrong and here’s why”

If we say, “well, we’ll tell chatgpt what the correct sources of information are” that’s no better really. It’s not reasoning, it’s just a neutered data set.

I imagine they need to add something like chatgpt 4 has with live internet models or something else to get the next meaningful bump

I don’t recall who said it, but a similar thread had a researcher in the field express that we have squeezed far more juice than expected from these transformer models. Not that new progress in this direction can’t be made, but it seems like we’re approaching diminishing returns.

I believe the next step that’s close is to have these train on less and less horsepower. If we can have these models run on a phone locally, oh boy that’s gonna be something

famouswaffles · 3y ago

GPTs already forgo the surface-level, statistically most likely next word for words that are more context appropriate. That's one of the biggest reasons they are so useful.

The truth is that functionally/technically, there's plenty left to squeeze. The bigger issue is that we're hitting a wall economically.

1 more reply

firecall · 3y ago

> ChatGPT scrapes all the information given, then predicts the next token. It has no ability to understand what is truthful or correct. It’s as good as the data being fed to it.

That is precisely true of Humans as well though! :-)

1 more reply

muskmusk · 3y ago

I agree with your general premise, but I think you left a couple of points off your list at the end:

it is obscenely expensive to keep training + there are other more low hanging fruit + you expect hardware to get better over time.

I don't think Altman is trying to fool anyone. Even if he were it wouldn't work. The competition is not that stupid and he knows that :)

It's just that hardware tends to get better at a rate that resembles Moore's law so in 18 months the cost of training a 100 mill dollar model is 50 mill dollar. You certainly can just throw money at the problem, but it's expensive and there are other options that are just as effective for now. Why spend money on things that are half as valuable in 18 months when you can spend money on things that don't devalue as fast like producing more/better data?

All that being said you can bet your ass there will be a gpt5 :)
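The hardware argument in this comment can be put in numbers. This is only a sketch of the comment's own assumption (cost halves every 18 months); the function name and the 18-month default are illustrative, not a measured trend.

```python
def projected_cost(cost_now: float, months: float, halving_months: float = 18.0) -> float:
    """Cost of the same training run after `months`, assuming the cost
    halves every `halving_months` (the comment's Moore's-law-like premise)."""
    return cost_now * 0.5 ** (months / halving_months)


if __name__ == "__main__":
    # A 100 million dollar run today, re-priced at 0, 18 and 36 months.
    for months in (0, 18, 36):
        print(f"{months:>2} months: ${projected_cost(100e6, months):,.0f}")
```

Under that assumption, waiting 18 months halves the bill, which is the trade-off the comment weighs against spending on data quality now.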

tyre · 3y ago

It's possible that training with more data has diminishing gains. For example, we know that current LLMs have a problem with hallucination, so maybe a more valuable next area of research/development is to fix that.

Or work on consistency within a scope. For example, it can't write a novel because it doesn't have object consistency. A character will be 15 years old then 28 years old three sentences later.

Or allow it database/API access so it can interpolate canonical information into its responses.

None of these have to do with scale of data (as far as I understand.) All of them are, in my opinion, higher ROI areas for development for LLM => AGI.

HarHarVeryFunny · 3y ago

These LLMs are trained to model humans - they are going to be penalized, not rewarded, if they generate outputs that disagree with the training data, whether due to being too dumb OR too smart.

Best you can hope for is that they combine the expertise of all authors in the training data, which would be very impressive, but more top-tier human than super-human. However, achieving this level of performance may well be beyond what a transformer of any size can do. It may take a better architecture.

I suspect that there is also probably a dumbing-down effect from training the model on material from people who are themselves on a spectrum of different abilities. Simply put, the model is rewarded during training for being correct as often as possible (i.e. on average), so if it saw the same subject matter in the training set 10 times, once by an expert and 9 times by mid-wits, then it's going to be rewarded for mid-wit performance.

sudoapps (OP) · 3y ago

This wasn't meant to say that all training would stop. I think, to some extent, the model won't need additional recent data (that is already similar in structure to what it has) to better understand language and interpret the next set of characters. I could be completely wrong, but I still think techniques like transformers, RLHF and of course others will still exist and evolve to eventually get to some higher intelligence level.

nomel · 3y ago

This assumes that current neural network topologies can "solve" intelligence. "Gains" could be a problem of missing subsystems, rather than missing data.

For a squishy example of a known conscious system, if you scoop out certain small, relatively fixed, regions of our brains, you can make consciousness, memory, and learning mostly cease. This suggests it's partly due to special subsystems, rather than total connection count.

vidarh · 3y ago

I think it's more a question of diminishing returns and the cost of scaling it up, which is getting to a point where looking for ways of maximizing the impact of what is there makes sense. I'm sure we'll see models trained on more data, but maybe after efficiency improvements make it cheaper both to train and run large models.

joshspankit · 3y ago

My takeaway from his statements is that if you sum up all of human knowledge then add every unique bit of knowledge that humans could uncover in the next 20 years, there’s a plateau and that plateau is probably lower than our dreams of what LLMs can do.

woah · 3y ago

Maybe it gets twice as good each time you spend 10x more training it. In this case, you might indeed hit a wall at some point.
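To see why that hypothetical scaling rule hits a wall, it helps to write it down. If quality doubles for every 10x of spend, then quality grows as spend raised to log10(2), roughly 0.3, so each further doubling costs ten times more than the last. The sketch below only encodes the comment's "maybe"; it is not an empirical scaling law, and both function names are made up for illustration.

```python
import math


def relative_quality(spend_multiplier: float) -> float:
    """Quality multiplier if quality doubles per 10x spend (hypothetical rule)."""
    return 2.0 ** math.log10(spend_multiplier)


def spend_needed(quality_multiplier: float) -> float:
    """Inverse of the rule above: spend multiplier for a target quality multiplier."""
    return 10.0 ** math.log2(quality_multiplier)


if __name__ == "__main__":
    for target in (2, 4, 8, 16):
        print(f"{target}x better needs {spend_needed(target):,.0f}x the spend")
```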

furyofantares · 3y ago · 10 in thread

Embeddings-based search is a nice improvement on search, but it's still search. Relative to ChatGPT answering on its training data, I find embeddings-based search to be severely lacking. The right comparison is to traditional search, where it becomes favorable.

It has the same advantages search has over ChatGPT (being able to cite sources, being quite unlikely to hallucinate) and it has some of the advantages ChatGPT has over search (not needing exact query) - but in my experience it's not really in the new category of information discovery that ChatGPT introduced us to.

Maybe with more context I'll change my tune, but it's very much at the whim of the context retrieval finding everything you need to answer the query. That's easy for stuff that search is already good at, and so provides a better interface for search. But it's hard for stuff that search isn't good at, because, well: it's search.

sudoapps (OP) · 3y ago

Agreed, GPT answering based on its own training data has been the best experience by far (aside from hallucinations) and comparing against that is difficult. Embeddings might not even be the long term solution. I think it's still early to really know for certain but models are already getting better at interpreting with less overall training data so there are bound to be some new ideas.

b33j0r · 3y ago

I’m sure many of you have tried generating epic conversations from history. With work and luck, I’ve read stuff way better than college.

But 90% of the time, it’s two barely distinct personalities chatting back and forth:

Me: Hey brian, what do you think of AI?

Brian: It’s great!

Me: I’m so glad we agree.

Brian: Great, this increases the training weight of Brian agreeing with Brian to a much more accurate level!

Me: Agree!

b33j0r · 3y ago

Many points stated well. Agree. Now, I’m not certain of this, but I’m starting to get an intuition that duct-taping databases to an agent isn’t going to be the answer (I still kinda feel like hundreds of agents might be).

But these optimizations are applications of technology stacks we already know about. Sometimes, this era of AI research reminds me of all the whacky contraptions from the era before building airplanes became an engineering discipline.

I would likely have tried building a backyard ornithopter powered by mining explosives, if I had been alive during that period of experimentation.

Prediction: the best interfaces for this will be the ones we use for everything else as humans. I am trying to approach it more like that, and less like APIs and “document vs relational vs vector storage”.

chartpath · 3y ago

I can understand why that framing would be attractive, but there is no real fundamental difference when considering JSONB/HSTORE in PostgreSQL, and now we have things like pgvector https://github.com/pgvector/pgvector to store and search over embeddings (including k-nn).

1 more reply

sebzim4500 · 3y ago

My intuition is that it would work much better if the model could choose what to search for with something like langchain. The problem is that we don't know how to train such a system properly, we mainly do supervised finetuning on human examples of using the tools but this is fundamentally a reinforcement learning problem (RL is just hard).

1 more reply

kordlessagain · 3y ago

Vector search with move tos and move aways based on feedback is much more than attaching a database…

fzliu · 3y ago

Encoder-decoder (attention) architectures still have a tough time with long-range dependencies, so even with longer context lengths, you'll still need a retrieval solution.

I agree that there's probably a better solution than pure embedding-based or mixed embedding/keyword search, but the "better" solution will still be based around semantics... aka embeddings.

mlyle · 3y ago

> It has the same advantages search has over ChatGPT (being able to cite sources, being quite unlikely to hallucinate) and it has some of the advantages ChatGPT has over search (not needing exact query) - but in my experience it's not really in the new category of information discovery that ChatGPT introduced us to.

I think the two could be paired up effectively. Context windows are getting bigger, but are still limited in the amount of information ChatGPT can sift through. This in turn limits the utility of current plugin based approaches.

Letting ChatGPT ask for relevant information, and sift through it based on its internal knowledge, seems valuable. If nothing else, it allows "learning" from recent development and effectively would augment its reasoning capability by having more information in working memory.

stavros · 3y ago

Is there any way to fine-tune GPT to make documentation a part of its training set, so you won't need embeddings? OpenAI lets you fine-tune GPT-3, but I don't know how well that works.

sudoapps (OP) · 3y ago

OpenAI doesn't let you fine-tune GPT-4 or GPT-3.5 yet (https://platform.openai.com/docs/guides/fine-tuning). Fine-tuning a model on a set of documents is still an option, but it isn't really scalable if you want to keep feeding it more relevant information over time. I guess it could depend on the base model you are using and its size.

orasis · 3y ago · 10 in thread

One caveat about embedding-based retrieval is that there is no guarantee that the embedded documents will look like the query.

One trick is to have an LLM hallucinate a document based on the query, and then embed that hallucinated document. Unfortunately this increases the latency since it incurs another round trip to the LLM.

taberiand · 3y ago

Is that something easily handed off to a faster/cheaper LLM? I'm imagining something like running the main process through GPT-4 and handing off the hallucinations to GPT 3 turbo.

If you could spot the need for it while streaming a response you could possibly even have it ready ahead of time

d4rkp4ttern · 3y ago

Some people packaged this rather intuitive idea, named it HyDE (Hypothetical Document Embeddings) and wrote a paper about it:

https://arxiv.org/abs/2212.10496

Summary —

HyDE is a new method for creating effective zero-shot dense retrieval systems that generates hypothetical documents based on queries and encodes them using an unsupervised contrastively learned encoder to identify relevant documents. It outperforms state-of-the-art unsupervised dense retrievers and performs strongly compared to fine-tuned retrievers across various tasks and languages.

wasabi991011 · 3y ago

> One caveat about embedding-based retrieval is that there is no guarantee that the embedded documents will look like the query.

Aleph Alpha provides an asymmetric embedding model which I believe is an attempt to resolve this issue (haven't looked into it much, just saw the entry in langchain's documentation)

rco8786 · 3y ago

> One trick is to have an LLM hallucinate a document based on the query

I'm not following why you would want to do this? At that point, just asking the LLM without any additional context would/should produce the same (inaccurate) results.

BoorishBears · 3y ago

You're not having the LLM answer from the hallucination, you're looking for the document that looks most similar to the hallucination and having it answer on that instead.
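A toy sketch of the flow described in this sub-thread: embed a hallucinated answer rather than the raw query, then return the real document closest to it. Everything here is a stand-in chosen so the example runs offline: `embed` is a bag-of-words counter rather than a learned embedding model, `hypothetical_answer` is a canned stub for the extra LLM round trip, and `DOCS` is made-up data. None of it is the HyDE paper's actual implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hypothetical_answer(query: str) -> str:
    """Stub for the LLM call. A real system would ask a (cheap) model to
    write a plausible answer; it may be wrong, it is only used for matching."""
    canned = {
        "how do i reset my password?": (
            "To reset your password, open account settings and choose "
            "the reset password option."
        ),
    }
    return canned.get(query.lower(), query)


def hyde_search(query: str, docs: list[str]) -> str:
    """Return the stored document most similar to the hallucinated answer."""
    fake_doc_vec = embed(hypothetical_answer(query))
    return max(docs, key=lambda d: cosine(fake_doc_vec, embed(d)))


DOCS = [
    "Open account settings and choose the reset password option to set a new password.",
    "Our office is closed on public holidays.",
]

if __name__ == "__main__":
    print(hyde_search("How do I reset my password?", DOCS))
```

The final answer would then be generated from the retrieved document, not from the hallucination, which is the point being made above.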

redskyluan · 3y ago

I do this the opposite way: I generate questions from the doc chunks and embed the questions. It works perfectly!

ck_one · 3y ago

How do you generate the questions and how do you make sure to not lose information?

E.g. Today I woke up at 9 a.m., had a light breakfast and then went on a run in Golden Gate Park.

What questions do you generate from this sentence?

1 more reply

selfhoster11 · 3y ago

This sounds like a fantastic approach. I will try this with my own LLM/search-and-retrieval projects.

orasis · 3y ago

Nice! Do you generate N questions so N embeddings per document or just one?

williamcotton · 3y ago

“We’re gonna need a bigger boat.”

Beltiras · 3y ago · 5 in thread

I'm working on something where I need to basically add on the order of 150,000 tokens into the knowledge base of an LLM. Finding out slowly I need to delve into training a whole ass LLM to do it. Sigh.

v3ss0n · 3y ago

https://deepai.org/publication/scaling-transformer-to-1m-tok...

Can this be implemented in current opensource models?

akvadrako · 3y ago

Can't you use fine-tuning for this?

Another option is to ask GPT to compress your tokens into a shorter prompt for itself.

RhodesianHunter · 3y ago

Or, at this rate, just wait 6 months.

Zetice · 3y ago

I don't think this rate is sustainable. [0]

[0] https://www.theverge.com/2023/4/14/23683084/openai-gpt-5-rum...

Beltiras · 3y ago

When I would have had to add another 2 batches of ~150,000 tokens.....

Der_Einzige · 3y ago · 3 in thread

I get annoyed by articles like this. Yes, it's cool to educate readers who aren't aware of embeddings/embeddings stores/vectorDB technologies that this is possible.

What these articles don't touch on is what to do once you've got the most relevant documents. Do you use the whole document as context directly? Do you summarize the documents first using the LLM (now the risk of hallucination in this step is added)? What about that trick where you shrink a whole document of context down to the embedding space of a single token (which is how ChatGPT is remembering the previous conversations)? Doing that will be useful but still lossy.

What about simply asking the LLM to craft its own search prompt to the DB given the user input, rather than returning articles that semantically match the query the closest? This would also make hybrid search (keyword or BM25 + embeddings) more viable in the context of combining it with an LLM.

Figuring out which of these choices to make, along with an awful lot more choices I'm likely not even thinking about right now, is what will separate the useful from the useless LLM + extractive knowledge systems.
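One common way to combine a keyword ranking with an embedding ranking is reciprocal rank fusion (RRF). That technique is not named in the comment; it is swapped in here as one plausible way to do the "hybrid search" step. The document IDs and both input rankings are made up, and `k=60` is just a conventional smoothing constant.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of document IDs into one.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_ranking = ["doc_a", "doc_b", "doc_c"]    # e.g. from BM25
    embedding_ranking = ["doc_b", "doc_c", "doc_a"]  # e.g. from a vector store
    print(reciprocal_rank_fusion([keyword_ranking, embedding_ranking]))
```

Because RRF only looks at ranks, it sidesteps the problem that keyword scores and cosine similarities live on different scales.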

EForEndeavour · 3y ago

> What about that trick where you shrink a whole document of context down to the embedding space of a single token (which is how ChatGPT is remembering the previous conversations)

This is news to me. Where could I read about this trick?

1 more reply

sudoapps (OP) · 3y ago

The article is definitely still high level and meant to provide enough understanding of what the capabilities are today. Some of what you are mentioning goes deeper into how you take these learnings/tools and come up with any number of solutions to fit the problem you are solving.

> "Do you use the whole document as context directly? Do you summarize the documents first using the LLM (now the risk of hallucination in this step is added)?"

In my opinion the best approach is to take a large document and break it down into chunks before storing as embeddings and only querying back the relevant passages (chunks).
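A minimal sketch of the chunking step, assuming word-based chunks with a small overlap so that a sentence cut at a boundary still appears whole in one chunk. The sizes are arbitrary illustrative defaults; real pipelines often chunk by tokens, sentences or headings instead, and each chunk would then be embedded and stored.

```python
def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split `text` into chunks of up to `size` words, where consecutive
    chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(sample, size=4, overlap=1):
        print(chunk)
```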

> "What about that trick where you shrink a whole document of context down to the embedding space of a single token (which is how ChatGPT is remembering the previous conversations)"

Not sure I follow here but seems interesting if possible, do you have any references?

> "What about simply asking the LLM to craft its own search prompt to the DB given the user input, rather than returning articles that semantically match the query the closest? This would also make hybrid search (keyword or bm25 + embeddings) more viable in the context of combining it with an LLM"

This is definitely doable but just adds to the overall processing/latency (if that is a concern).

gaogao · 3y ago

> What about simply asking the LLM to craft its own search prompt to the DB given the user input, rather than returning articles that semantically match the query the closest?

I played with that approach in this post - https://friend.computer/jekyll/update/2023/04/30/wikidata-ll.... "Craft a query" is nice as it gives you a very declarative intermediate state for debugging.

jeffchuber · 3y ago · 2 in thread

hi everyone, this is jeff from Chroma (mentioned in the article) - happy to answer any questions.

hartator · 3y ago

Is Chroma already trained or only trained in the supplied documents?

I can try to make a Ruby client.

jeffchuber · 3y ago

Chroma is not an LLM, it is "just" a database that you pass vectors into to search.

A Ruby client would be great. Our FastAPI spec makes this pretty easy - it's at localhost:8000/openapi.json when the docker backend is running.

pbhjpbhj · 3y ago · 2 in thread

>There is an important part of this prompt that is partially cut off from the image:

>> “If you don't know the answer, just say that you don't know, don't try to make up an answer”

It seems silly to make this part of the prompt rather than a separate parameter, surely we could design the response to be close to factual. Then run a checker to ascertain a score for the factuality of the output?
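For context, this is roughly how such an instruction ends up inside a retrieval prompt. Only the quoted "don't know" sentence comes from the article; the surrounding template, the function name and the passage numbering are assumptions made for illustration.

```python
# The instruction quoted above, verbatim from the article's prompt.
DONT_KNOW_RULE = (
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer."
)


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a retrieval-augmented prompt (hypothetical layout)."""
    context = "\n\n".join(
        f"[{i}] {passage}" for i, passage in enumerate(passages, start=1)
    )
    return (
        "Use the following pieces of context to answer the question at the end. "
        f"{DONT_KNOW_RULE}\n\n"
        f"{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


if __name__ == "__main__":
    print(build_prompt("What is Chroma?", ["Chroma is a vector database."]))
```

A separate factuality checker, as the comment suggests, would run on the model's output after this step rather than replace the instruction.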

sudoapps (OP) · 3y ago

A lot of what prompting has turned into seems silly to me too, but it has been shown to be effective (at least with GPT-4).

TeMPOraL · 3y ago

Only a month or two ago I found this ridiculous, but then my mental model of GPTs shifted and I don't think it's so stupid anymore.

Technobabble explanation: such "silly" additions are a natural way to emphasize certain dimensions of the latent space more than others, focusing the proximity search GPTs are doing.

Working model I've been getting some good mileage off: GPT-4 is like a 4 year old kid, that somehow managed to read half of the Internet. Sure, it kinda remembers and possibly understands a lot, but it still thinks like a 4 year old, has about as much attention span, and you need to treat it like a kid that age.

2 more replies

nico · 3y ago · 2 in thread

Can we build a model based purely on search?

The model searches until it finds an answer, including distance and resolution

Search is performed by a DB, the query then sub-queries LLMs on a tree of embeddings

Each coordinate of an embedding vector is a pair of coordinate and LLM

Like a dynamic dictionary, in which the definition for the word is an LLM trained on the word

Indexes become shortcuts to meanings that we can choose based on case and context

Does this exist already?

fzliu · 3y ago

Not sure what you mean by dynamic dictionary, but the embedding tree you mention is already freely available in Milvus via the Annoy index.

nico · 3y ago

An entry in a dictionary is static text, ex:

per·snick·et·y: placing too much emphasis on trivial or minor details; fussy. "she's very persnickety about her food"

A dynamic entry could instead be an LLM that will answer things related to the word, ex:

What is the definition of persnickety?

How can I use it in a sentence?

What are some notable documents that include it?

Any famous quotes?

…

So each entry is an LLM trained mostly only on that keyword/concept definition

There are some that believe in smaller models: https://twitter.com/chai_research/status/1655649081035980802...

A_D_E_P_T · 3y ago · 1 in thread

"Infinite" is a technical term with a highly specific meaning.

In this case, it can't possibly be approached. It certainly can't be attained.

Borges' Library of Babel, which represents all possible combinations of letters that can fit into a 410-page book, only contains some 25^1312000 books. And the overwhelming majority of its books are full of gibberish. The amount of "knowledge" that an LLM can learn or describe is VERY strictly bounded and strictly finite. (This is perhaps its defining characteristic.)

I know this is pedantic, but I am a philosopher of mathematics and this is a matter that's rather important to me.
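For the curious, the 25^1312000 figure can be checked in a few lines. The exponent corresponds to the book format in Borges' story: 410 pages of 40 lines of about 80 characters, drawn from 25 symbols. The snippet counts the decimal digits of the total with a logarithm instead of building the number itself.

```python
import math

PAGES, LINES_PER_PAGE, CHARS_PER_LINE, SYMBOLS = 410, 40, 80, 25

# Characters in one book: this is the exponent in 25^1312000.
chars_per_book = PAGES * LINES_PER_PAGE * CHARS_PER_LINE

# Decimal digits of SYMBOLS ** chars_per_book, via log10.
digits = math.floor(chars_per_book * math.log10(SYMBOLS)) + 1

if __name__ == "__main__":
    print(f"characters per book: {chars_per_book:,}")
    print(f"number of books has {digits:,} decimal digits")
```

Huge, but finite, which is the comment's point.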

hartator · 3y ago

> I know this is pedantic, but I am a philosopher of mathematics and this is a matter that's rather important to me.

I don’t think this is pedantic. Words carry a specific meaning or what’s the point of words otherwise.

nadermx · 3y ago · 1 in thread

I think someone did this https://github.com/pashpashpash/vault-ai

xtracto · 3y ago

This looks pretty promising, will check out later. Thanks for sharing

chartpath · 3y ago

Search query expansion: https://en.wikipedia.org/wiki/Query_expansion

We've done this in NLP and search forever. I guess even SQL query planners and other things that automatically rewrite queries might count.

It's just that now the parameters seem squishier with a prompt interface. It's almost like we need some kind of symbolic structure again.

sudoapps (OP) · 3y ago

If you are wondering what the latest is on giving LLMs access to large amounts of data, I think this article is a good start. This seems like a space where there will be a ton of innovation, so I'm interested to learn what else is coming.

iot_devs · 3y ago

A similar idea is being developed in: https://github.com/pieroit/cheshire-cat

m3kw9 · 3y ago

This is like asking GPT to summarize what it found on Google; it's basically what Bing does when you try to find stuff like hotels and other recent subjects. Not the revolution we are all expecting.

flukeshott · 3y ago

I wonder how effectively compressed LLMs are going to become...

j / k navigate · click thread line to collapse

86 comments

62 comments · 15 top-level

ftxbro3y ago· 11 in thread

spacephysics3y ago

I don’t think we’re close to super human intelligence in the colloquial sense.

ChatGPT scrapes all the information given, then predicts the next token. It has no ability to understand what is truthful or correct. It’s as good as the data being fed to it.

If we say, “well, we’ll tell chatgpt what the correct sources of information are” that’s no better really. It’s not reasoning, it’s just a neutered data set.

I imagine they need to add something like chatgpt 4 has with live internet models or something else to get the next meaningful bump

I believe the next step that’s close is to have these train on less and less horsepower. If we can have these models run on a phone locally, oh boy that’s gonna be something

famouswaffles3y ago

GPT's already forgo the surface level statistically most likely next word for words that are more context appropriate. That's one of the biggest reasons they are so useful.

The truth is that functionally/technically, there's plenty left to squeeze. The bigger issue is that we're hitting a wall economically.

1 more reply

firecall3y ago

> ChatGPT scrapes all the information given, then predicts the next token. It has no ability to understand what is truthful or correct. It’s as good as the data being fed to it.

That is precisely true of Humans as well though! :-)

1 more reply

muskmusk3y ago

I agree with your general premise, but I think you left a couple of points of your list at the end:

it is obscenely expensive to keep training + there are other more low hanging fruit + you expect hardware to get better over time.

I don't think Altman is trying to fool anyone. Even if he were it wouldn't work. The competition is not that stupid and he knows that :)

All that being said you can bet your ass there will be a gpt5 :)

tyre3y ago

Or work on consistency within a scope. For example, it can't write a novel because it doesn't have object consistency. A character will be 15 years old then 28 years old three sentences later.

Or allow it database/API access so it can interpolate canonical information into its responses.

None of these have to do with scale of data (as far as I understand.) All of them are, in my opinion, higher ROI areas for development for LLM => AGI.

HarHarVeryFunny3y ago

These LLMs are trained to model humans - they are going to be penalized, not rewarded, if they generate outputs that disagree with the training data, whether due to being too dumb OR too smart.

sudoappsOP3y ago

nomel3y ago

This assumes that current neural networks topologies can "solve" intelligence. "Gains" could be a problem of missing subsystems, rather than missing data.

vidarh3y ago

joshspankit3y ago

woah3y ago

Maybe it gets twice as good each time you spend 10x more training it. In this case, you might indeed hit a wall at some point.

furyofantares3y ago· 10 in thread

sudoappsOP3y ago

b33j0r3y ago

I’m sure many of you have tried generating epic conversations from history. With work and luck, I’ve read stuff way better than college.

But 90% of the time, it’s two barely distinct personalities chatting back and forth:

Me: Hey brian, what do you think of AI?

Brian: It’s great!

Me: I’m so glad we agree.

Brian: Great, this increases the training weight of Brian agreeing with Brian to a much more accurate level!

Me: Agree!

b33j0r3y ago

I would likely have tried building a backyard ornithopter powered by mining explosives, if I had been alive during that period of experimentation.

chartpath3y ago

1 more reply

sebzim45003y ago

1 more reply

kordlessagain3y ago

Vector search with move tos and move aways based on feedback is much more than attaching a database…

fzliu3y ago

Encoder-decoder (attention) architectures still have a tough time with long-range dependencies, so even with longer context lengths, you'll still need a retrieval solution.

I agree that there's probably a better solution than pure embedding-based or mixed embedding/keyword search, but the "better" solution will still be based around semantics... aka embeddings.

mlyle3y ago

stavros3y ago

Is there any way to fine-tune GPT to make documentation a part of its training set, so you won't need embeddings? OpenAI lets you fine-tune GPT-3, but I don't know how well that works.

sudoappsOP3y ago

orasis3y ago· 10 in thread

One caveat about about embedding based retrieval is that there is no guarantee that the embedded documents will look like the query.

taberiand3y ago

Is that something easily handed off to a faster/cheaper LLM? I'm imagining something like running the main process through GPT-4 and hand of the hallucinations to GPT 3 turbo.

If you could spot the need for it while streaming a response you could possibly even have it ready ahead of time

d4rkp4ttern3y ago

Some people packaged this rather intuitive idea, named it Hyde (Hypothetical Document Embeddings) and wrote a paper about it —

https://arxiv.org/abs/2212.10496

Summary —

wasabi9910113y ago

>One caveat about about embedding based retrieval is that there is no guarantee that the embedded documents will look like the query.

Aleph Alpha provides an asymmetric embedding model which I believe is an attempt to resolve this issue (haven't looked into it much, just saw the entry in langchain's documentation)

rco87863y ago

> One trick is to have a LLM hallucinate a document based on the query

I'm not following why you would want to do this? At that point, just asking the LLM without any additional context would/should produce the same (inaccurate) results.

BoorishBears3y ago

You're not having the LLM answer from the hallucination, you're looking for the document that looks most similar to the hallucination and having it answer on that instead.

redskyluan3y ago

i have an opposite way on doing this. Tried to generate questions based on doc chunks and embedding on questions. It works perfect!

ck_one3y ago

How do you generate the questions and how do you make sure to not lose information?

E.g. Today I woke up at 9.am, had a light breakfast and then went on a run in Golden Gate Park.

What questions do you generate from this sentence?

1 more reply

selfhoster113y ago

This sounds like a fantastic approach. I will try this with my own LLM/search-and-retrieval projects.

orasis3y ago

Nice! Do you generate N questions so N embeddings per document or just one?

williamcotton3y ago

“We’re gonna need a bigger boat.”

Beltiras3y ago· 5 in thread

v3ss0n3y ago

https://deepai.org/publication/scaling-transformer-to-1m-tok...

Can this be implemented in current opensource models?

akvadrako3y ago

Can't you use fine-tuning for this?

A other option is to ask GPT to compress your tokens into a shorter prompt for itself.

RhodesianHunter3y ago

Or, at this rate, just wait 6 months.

Zetice3y ago

I don't think this rate is sustainable. [0]

[0] https://www.theverge.com/2023/4/14/23683084/openai-gpt-5-rum...

Beltiras3y ago

When I would have had to add another 2 batches of ~150,000 tokens.....

Der_Einzige3y ago· 3 in thread

I get annoyed by articles like this. Yes, it's cool to educate readers who aren't aware of embeddings/embeddings stores/vectorDB technologies that this is possible.

EForEndeavour3y ago

> What about that trick where you shrink a whole document of context down to the embedding space of a single token (which is how ChatGPT is remembering the previous conversations)

This is news to me. Where could I read about this trick?

1 more reply

sudoappsOP3y ago

> "Do you use the whole document as context directly? Do you summarize the documents first using the LLM (now the risk of hallucination in this step is added)?"

In my opinion the best approach is to take a large document and break it down into chunks before storing as embeddings and only querying back the relevant passages (chunks).
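
A rough sketch of that chunk-then-retrieve flow, under stated assumptions: the chunk size and overlap values are arbitrary, and `similarity` is a toy Jaccard word overlap standing in for a real embedding comparison. All function names here are hypothetical.

```python
import re


def chunk_words(text: str, size: int = 50, overlap: int = 10) -> list[str]:
    """Split text into windows of `size` words that share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of word sets; a toy stand-in for embedding similarity."""
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    return sorted(chunks, key=lambda c: similarity(query, c), reverse=True)[:k]
```

Only the `top_k` passages would then be placed in the prompt, which keeps the context small no matter how large the source document is.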

> "What about that trick where you shrink a whole document of context down to the embedding space of a single token (which is how ChatGPT is remembering the previous conversations)"

Not sure I follow here but seems interesting if possible, do you have any references?

This is definitely doable but just adds to the overall processing/latency (if that is a concern).

gaogao3y ago

> What about simply asking the LLM to craft its own search prompt to the DB given the user input, rather than returning articles that semantically match the query the closest?

jeffchuber3y ago· 2 in thread

hi everyone, this is jeff from Chroma (mentioned in the article) - happy to answer any questions.

hartator3y ago

Is Chroma already trained or only trained in the supplied documents?

I can try to make a Ruby client.

jeffchuber3y ago

Chroma is not an LLM, it is "just" a database that you pass vectors into to search.

A Ruby client would be great. Our FastAPI spec makes this pretty easy - it's at localhost:8000/openapi.json when the docker backend is running.
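
For anyone starting on a client, here is a small Python sketch (Python rather than Ruby, purely for illustration) that lists the operations in an OpenAPI document. The `http://` scheme on the URL is an assumption, and the paths in the sample spec are made up, not Chroma's actual endpoints.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "put", "post", "delete", "patch", "options", "head", "trace"}


def fetch_spec(url: str = "http://localhost:8000/openapi.json") -> dict:
    """Download and parse the OpenAPI document (needs the backend running)."""
    with urlopen(url) as response:
        return json.load(response)


def list_operations(spec: dict) -> list[tuple[str, str]]:
    """Return sorted (METHOD, path) pairs from the spec's `paths` object."""
    operations = []
    for path, item in spec.get("paths", {}).items():
        for method in item:
            # Path items may also carry non-method keys such as "parameters".
            if method.lower() in HTTP_METHODS:
                operations.append((method.upper(), path))
    return sorted(operations)


# Made-up spec fragment so the example runs without a server.
sample_spec = {
    "paths": {
        "/items": {"get": {}, "post": {}, "parameters": []},
        "/items/{id}": {"delete": {}},
    }
}
for method, path in list_operations(sample_spec):
    print(method, path)
```

With the backend up, `list_operations(fetch_spec())` would give the list of calls a client needs to wrap.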

pbhjpbhj3y ago· 2 in thread

>There is an important part of this prompt that is partially cut off from the image:

>> “If you don't know the answer, just say that you don't know, don't try to make up an answer”
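
As a rough illustration of where that sentence sits in a retrieval prompt: only the quoted fallback instruction comes from the article, while the surrounding template wording and the `build_prompt` helper are assumptions.

```python
# The fallback sentence quoted above; everything else in the template is assumed.
FALLBACK_INSTRUCTION = (
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer."
)


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt from retrieved passages plus the fallback instruction."""
    context = "\n\n".join(passages)
    return (
        "Answer the question using only the context below. "
        f"{FALLBACK_INSTRUCTION}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


print(build_prompt("What is Chroma?", ["Chroma is a database for vectors."]))
```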

sudoappsOP3y ago

A lot of what prompting has turned into seems silly to me too, but it has been shown to be effective (at least with GPT-4).

TeMPOraL3y ago

Only a month or two ago I found this ridiculous, but then my mental model of GPTs shifted and I don't think it's so stupid anymore.

Technobabble explanation: such "silly" additions are a natural way to emphasize certain dimensions of the latent space more than others, focusing the proximity search GPTs are doing.

nico3y ago· 2 in thread

Can we build a model based purely on search?

The model searches until it finds an answer, including distance and resolution

Search is performed by a DB, the query then sub-queries LLMs on a tree of embeddings

Each coordinate of an embedding vector is a pair of coordinate and LLM

Like a dynamic dictionary, in which the definition for the word is an LLM trained on the word

Indexes become shortcuts to meanings that we can choose based on case and context

Does this exist already?

fzliu3y ago

Not sure what you mean by dynamic dictionary, but the embedding tree you mention is already freely available in Milvus via the Annoy index.

nico3y ago

An entry in a dictionary is static text, ex:

per·snick·et·y: placing too much emphasis on trivial or minor details; fussy. "she's very persnickety about her food"

A dynamic entry could instead be an LLM that will answer things related to the word, ex:

What is the definition of persnickety?

How can I use it in a sentence?

What are some notable documents that include it?

Any famous quotes?

…

So each entry is an LLM trained mostly only on that keyword/concept definition

There are some that believe in smaller models: https://twitter.com/chai_research/status/1655649081035980802...

A_D_E_P_T3y ago· 1 in thread

"Infinite" is a technical term with a highly specific meaning.

In this case, it can't possibly be approached. It certainly can't be attained.

I know this is pedantic, but I am a philosopher of mathematics and this is a matter that's rather important to me.

hartator3y ago

> I know this is pedantic, but I am a philosopher of mathematics and this is a matter that's rather important to me.

I don’t think this is pedantic. Words carry a specific meaning or what’s the point of words otherwise.

nadermx3y ago· 1 in thread

I think someone did this https://github.com/pashpashpash/vault-ai

xtracto3y ago

This looks pretty promising, will check out later. Thanks for sharing

chartpath3y ago

Search query expansion: https://en.wikipedia.org/wiki/Query_expansion

We've done this in NLP and search forever. I guess even SQL query planners and other things that automatically rewrite queries might count.

It's just that now the parameters seem squishier with a prompt interface. It's almost like we need some kind of symbolic structure again.
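
For reference, the classic symbolic version of query expansion looks roughly like the sketch below. The synonym table is hand-written and hypothetical; a real system would use a thesaurus, query logs, or (now) an LLM to propose alternatives.

```python
def expand_query(query: str, synonyms: dict[str, list[str]]) -> list[str]:
    """Return the original query plus variants with one term swapped for a synonym."""
    terms = query.lower().split()
    variants = [" ".join(terms)]
    for i, term in enumerate(terms):
        for alt in synonyms.get(term, []):
            variant = " ".join(terms[:i] + [alt] + terms[i + 1:])
            if variant not in variants:
                variants.append(variant)
    return variants


# Hypothetical synonym table for illustration only.
synonyms = {"fast": ["quick"], "car": ["automobile", "vehicle"]}
print(expand_query("fast car", synonyms))
```

Each variant is searched separately and the results merged, which is the same shape as asking an LLM to rewrite the query, just with an explicit table instead of a prompt.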

sudoappsOP3y ago

iot_devs3y ago

A similar idea is being developed in: https://github.com/pieroit/cheshire-cat

m3kw93y ago

flukeshott3y ago

I wonder how effectively compressed LLMs are going to become...
