It's like, imagine there's a complex machine with large panels full of buttons and levers - and then, someone covered the panels with tapestry. Beautiful tapestry, showing artistic interpretations of things mundane and holy, trivialities of everyday life next to impossible dreams. And then, people were told the machine is to be operated by touching that tapestry, and that the artworks are the guide to understanding it and using it effectively. And then a whole religion formed around studying patterns in the tapestry. To me, prompt engineering is that religion.
There's an actual interface to the machine hidden under all the clever wordplay. A precise, formalized one. An interface that eats tokens and spits out probabilities. I just don't get why most talk about LLMs, even seemingly specialist talk, ignores it entirely and focuses instead on the tapestry, which only obscures the nature of the model and makes everything more difficult.
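To make "eats tokens and spits out probabilities" concrete, here is a minimal sketch of the last step of that interface: turning per-token scores (logits) into a probability distribution with softmax. The three-word vocabulary and the logit values are made up for illustration; a real model has tens of thousands of tokens and computes the logits itself.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical vocabulary and logits, invented for this example.
vocab = ["Queen", "King", "tapestry"]
logits = [2.0, 1.0, 0.1]

probs = softmax(logits)

# Greedy decoding: pick the most probable next token.
next_token = vocab[probs.index(max(probs))]
print(dict(zip(vocab, probs)), next_token)
```

Everything a prompt does, it does by shifting these numbers.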
I was on a call this morning and heard someone refer to two of their team members as "Prompt Engineers" as if that were an actual role.
You then put those vectors in a vector database (e.g. Pinecone, pgvector, Chroma).
To run searches, you generate an embedding of the search term (could be the raw user search, could be something a model like ChatGPT was asked to transform the user's search into), then query the vector database for the n closest vectors. The trick is getting a model that generates good vectors for search (and transforming the user's query into text whose vector(s) would be useful to search against). If you're feeding the results into an LLM's context, the next step is making sure that you get your prompt right and don't overload the model with unrelated information (i.e. bad search results).
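As a rough sketch of the "query for the n closest vectors" step, here is a brute-force version in plain Python. The 3-dimensional vectors and chunk labels are invented for illustration; real embeddings come from a model and have hundreds or thousands of dimensions, and a vector database would replace the linear scan with an index.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_n(query_vec, index, n=2):
    """Return the n (score, text) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:n]

# Made-up toy "embeddings"; a real system would get these from a model.
index = [
    ("late fee clause", [0.9, 0.1, 0.0]),
    ("party restrictions clause", [0.0, 0.2, 0.9]),
    ("rent due date clause", [0.7, 0.6, 0.1]),
]
query_vec = [1.0, 0.2, 0.0]  # pretend embedding of the user's question

results = top_n(query_vec, index, n=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

Only the top results would go into the LLM context; picking n (or a minimum score) is where the "don't overload the model" trade-off shows up.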
The key is that the vector representation embeds language concepts in how close vectors are to one another. An easy way to gain a feel for this is to look at single-word embeddings. Computerphile have a great episode on it[1]. You can take a vector for 'King', subtract the vector for 'Man' and add the vector for 'Woman' and the closest vector in that search will likely be 'Queen'. Scale up this idea to whole paragraphs (and larger vectors as a result).
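Here is a toy version of the King/Man/Woman arithmetic. The vectors are hand-made, with dimensions I chose to mean (royalty, male, female), so the result is guaranteed by construction; learned embeddings only approximate this. Excluding the three input words from the candidates is the usual convention for this kind of analogy search.

```python
import math

def cosine(a, b):
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made toy vectors, NOT real embeddings: (royalty, male, female).
vectors = {
    "King": [1.0, 1.0, 0.0],
    "Queen": [1.0, 0.0, 1.0],
    "Man": [0.0, 1.0, 0.0],
    "Woman": [0.0, 0.0, 1.0],
    "Apple": [0.1, 0.1, 0.1],
}

def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("King", "Man", "Woman")
print(result)  # Queen
```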
LangChain has an example of searching a database of facts[2] (although I find their documentation pretty inaccessible - they explain their library, but don't step back from inside the weeds of what they're doing to really explain why / what's going on). Many of the features LangChain implements are distilling (or sometimes simply lifting and providing a toolkit to directly apply) LLM papers.
1: Computerphile Word Embeddings https://www.youtube.com/watch?v=gQddtTdmG_8
2: https://langchain.readthedocs.io/en/latest/use_cases/questio...
https://www.konjer.xyz/the-alchemist (disclaimer: built by me)
What specifically is missing from the answers in your opinion?
I want it to understand a complete fiction book and tell me about how a character grows throughout their journey from chapter 1 to chapter 12 over 350 pages.
It also seems to be one of the most important limitations of ChatGPT, and a lot of people/teams are looking for solutions.
In my testing, the biggest challenges with using, for example, OpenAI embeddings with cosine similarity are A) figuring out the section breaks or the right chunk size so that information stays in context, and B) retrieving enough chunks to get the correct hit for a query without including so much extraneous information that it confuses the model.
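For challenge A, one common baseline (not necessarily what anyone in this thread uses) is fixed-size chunks with some overlap, so that text cut at a boundary still appears whole in a neighbouring chunk. A minimal sketch, splitting on whitespace; `chunk_size` and `overlap` are illustrative parameters counted in words, whereas a real pipeline would more likely count tokens and respect section breaks.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into word-based chunks; consecutive chunks share `overlap` words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reached the end of the text
    return chunks

sample = " ".join(f"w{i}" for i in range(10))
chunks = chunk_text(sample, chunk_size=4, overlap=1)
for c in chunks:
    print(c)
```

Larger chunks keep more context together but make challenge B worse, since each retrieved chunk drags in more unrelated text.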
I think it's hard to make a parser that optimally slices up arbitrary documents.
Since you have some larger documents preloaded I assume for those you have the embeddings search. But for user uploads you are skipping that now and just feeding all of the text extracted from the PDF into the prompt along with the query.
For compiling information or getting an immediate yes/no it's likely correct, but I found ctrl+f generally gets me there faster, albeit with slightly more reading.
At least, that's the case for this lease agreement, which is well organized and already uses carefully chosen keywords.
Consider the example question of "I won't be able to pay until the 9th of this month, will I get a fee?" - are you going to search for "fee"? There are 66 occurrences.
Modify the question to "If I pay on the 4th of the month, will there be any late fee?" and you get the correct answer too.
For the question "What restrictions are there on parties?" it appears to get that correctly answered. If you search for "party" you'll get 19 results that appear to be legal entity parties rather than the possibly noisy type.
How could one ever trust the output of ChatGPT?
This feels to me a bit like non-L5 autonomous driving: If I have to assist at all, it'd be easier to do it myself. In the same vein, for this project (and ChatGPT generally): Can I actually trust that the output from ChatGPT in answering my question about the document is factually correct?
E.g., if I hand it a home rental agreement and ask "What is the late move-out penalty if I am 10 minutes late in dropping off the keys?", it may give the correct answer. Or it may generate a plausible-sounding answer using the words in the document that is completely (or perhaps even just slightly) incorrect.
How could I possibly know without reading it myself?
Yours is not a good example though because "10 minutes late" is never going to be in a document like that.
This seems similar to ChatPDF.com (though with a 200-page limit instead of the 5-page limit you seem to have). I suppose we'll see a lot more competitors in this space as the ChatGPT API expands.
It's BYOK (bring your own key). Keys are not persisted. You can choose chat-gpt-turbo or text-davinci.
Limit is 2.4M tokens per call, working to get higher too.
Have you thought of even larger knowledge bases, like entire legal systems?
Anyway, amazingly executed, nice work!