Show HN: ChatGPT and Document Parser = Ghost

(ghostextension.com)

57 points · Ostatnigrosh · 3y ago · 50 comments

I've always wanted to just upload a whole book to ChatGPT and ask questions. Obviously with the char limit that's impossible... So some buddies and I built Ghost. We have it limited to 5 pages for uploads for now, but plan on expanding the limit soon. Let me know what you guys think!

45 comments · 21 top-level

swalsh · 3y ago · 7 in thread

A better way to do this might be to use the embedding API. That lets you upload a text corpus and get vectors back. You can then calculate the cosine similarity between a search string's vector and those vectors to get relevant clusters of text from the uploaded corpus.
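
A minimal sketch of that idea, assuming hand-written 3-dimensional toy vectors in place of real embedding API output. The function names (`cosine_similarity`, `top_k`) and the sample sentences are made up for illustration and are not from Ghost or any library:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, corpus, k=2):
    """Return the k (text, score) pairs most similar to query_vec."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy vectors standing in for real embeddings (assumption).
corpus = [
    ("Rent is due on the 1st of each month.", [0.9, 0.1, 0.0]),
    ("Pets require a written approval.", [0.0, 0.2, 0.9]),
    ("A late fee applies after the 5th.", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]  # pretend embedding of "when do I pay rent?"
results = top_k(query, corpus, k=2)
for text, score in results:
    print(f"{score:.3f}  {text}")
```

With real embeddings the vectors would have hundreds or thousands of dimensions, but the ranking step stays the same.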

TeMPOraL · 3y ago

I don't get why people bother with chat interface and textual prompts. The whole concept of "prompt engineering" sounds to me like a practical joke that got out of hand.

It's like, imagine there's a complex machine with large panels full of buttons and levers - and then, someone covered the panels with tapestry. Beautiful tapestry, showing artistic interpretations of things mundane and holy, trivialities of everyday life next to impossible dreams. And then, people were told the machine is to be operated by touching that tapestry, and that the artworks are the guide to understanding it and using it effectively. And then a whole religion formed around studying patterns in the tapestry. To me, prompt engineering is that religion.

There's an actual interface to the machine hidden under all the clever wordplay. A precise, formalized one. An interface that eats tokens and spits out probabilities. I just don't get why most talk - even seemingly specialist talk - about LLMs is ignoring it entirely, and focuses on the tapestry that's just obscuring the nature of the model, effectively making everything more difficult.

DebtDeflation · 3y ago

>The whole concept of "prompt engineering" sounds to me like a practical joke that got out of hand.

I was on a call this morning and heard someone refer to two of their team members as "Prompt Engineers" as if that were an actual role.

risyachka · 3y ago

Because everyone can use a text interface without knowing how to configure the low-level one.

bilsbie · 3y ago

Would you mind explaining this and maybe dumbing it down? Sounds useful

garblegarble · 3y ago

You can use models (OpenAI have some, there are other open-source self-hostable ones that are better if I recall correctly) that will take a sentence or a paragraph and spit out a vector. These vectors are called 'embeddings'

You then put those vectors in a vector database (e.g. pinecone, pgvector, chroma).

To run searches, you generate an embedding of the search term (could be the raw user search, could be something a model like ChatGPT was asked to transform the user's search into), then query the vector database for the n closest vectors. The trick is getting a model that generates good vectors for search (and transforming the user's query into some text that'd be useful vector(s) to search against). If feeding that into an LLM context, the next step is making sure that you get your prompt right, and don't overload the model with unrelated information (i.e. bad search results).

The key is that the vector representation embeds language concepts in how close vectors are to one another. An easy way to gain a feel for this is to look at single-word embeddings. Computerphile have a great episode on it[1]. You can take a vector for 'King', subtract the vector for 'Man' and add the vector for 'Woman' and the closest vector in that search will likely be 'Queen'. Scale up this idea to whole paragraphs (and larger vectors as a result).
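
A toy sketch of the 'King' - 'Man' + 'Woman' arithmetic. The 2-dimensional vectors below are invented so the result is easy to follow (axis 0 roughly "royalty", axis 1 roughly "gender"); real word embeddings are learned and far larger, and `analogy` is a hypothetical helper:

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Made-up 2-d "embeddings" for illustration only.
words = {
    "king":  [1.0,  1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0,  1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.0],
}

def analogy(a, b, c):
    """Nearest word to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = [x - y + z for x, y, z in zip(words[a], words[b], words[c])]
    candidates = [w for w in words if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, words[w]))

result = analogy("king", "man", "woman")
print(result)
```

Excluding the input words from the candidates matters in practice, since the nearest vector to the target is often one of the inputs themselves.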

LangChain has an example of searching a database of facts[2] (although I find their documentation pretty inaccessible - they explain their library, but don't step back from inside the weeds of what they're doing to really explain why / what's going on). Many of the features LangChain implements are distilling (or sometimes simply lifting and providing a toolkit to directly apply) LLM papers.

1: Computerphile Word Embeddings https://www.youtube.com/watch?v=gQddtTdmG_8

2: https://langchain.readthedocs.io/en/latest/use_cases/questio...

nico · 3y ago

+1 to this. Maybe even some basic code to share on how to use embeddings to query ChatGPT with bigger data sets. Like thousands of phone call transcriptions, hundreds of documents or millions of user reviews? Thank you!

ploppyploppy · 3y ago

https://platform.openai.com/docs/guides/embeddings

gigel82 · 3y ago · 6 in thread

Projects like these (using embeddings) are great, but what I'm looking for is something that can ingest an entire book (let's say a fiction book) then answer questions about the entire content (and not just by effectively doing a text search over your input, but actually "understanding" the entire contents of the book); I presume such a thing is not possible with ChatGPT (without fine-tuning), correct?

mnkm · 3y ago

What do you think about the responses generated by this:

https://www.konjer.xyz/the-alchemist (disclaimer: built by me)

What specifically is missing from the answers in your opinion?

gigel82 · 3y ago

That's pretty interesting but ideally, I'd be able to upload my own book (txt, pdf, epub) and interact with it. It's lacking implementation details so not sure if you use embeddings, fine tuning or a novel approach.

ElFitz · 3y ago

Could using GPT3 (davinci-003) to generate embeddings, then searching your vector database for relevant excerpts, then providing the results as context for the prompt lead to something close enough?

gigel82 · 3y ago

No. That works for documentation where you do text search and extract paragraphs around the results for "context".

I want it to understand a complete fiction book and tell me about how a character grows throughout their journey from chapter 1 to chapter 12 over 350 pages.

nico · 3y ago

Right now that’s not a use case supported out of the box by ChatGPT.

It also seems to be one of the most important limitations of ChatGPT, and a lot of people/teams are looking for solutions.

DebtDeflation · 3y ago

I work in consulting and this is literally the use case that every single client wants right now - the ability to ingest a corpus of documents into ChatGPT or similar and then have it generate responses based on natural language questions. Right now most people are faking it by running the search using some other tool like Solr/ES and then taking the snippets that are returned and assembling them into a prompt that gets passed to ChatGPT.
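
A rough sketch of the "assemble snippets into a prompt" step, assuming the snippets have already come back ranked from some search tool. The function name `build_prompt`, the instruction wording, the character budget, and the sample lease text are all illustrative assumptions:

```python
def build_prompt(question, snippets, max_chars=2000):
    """Pack retrieved snippets, in ranked order, into a prompt until the budget is hit."""
    header = "Answer using only the context below. Say so if the answer is not there.\n\n"
    footer = f"\nQuestion: {question}\nAnswer:"
    budget = max_chars - len(header) - len(footer)
    kept = []
    for i, snippet in enumerate(snippets, start=1):
        entry = f"[{i}] {snippet}\n"
        if len(entry) > budget:
            break  # stop at the first snippet that does not fit
        kept.append(entry)
        budget -= len(entry)
    return header + "".join(kept) + footer

# Made-up snippets standing in for search results.
snippets = [
    "Rent is due on the 1st. A late fee of $50 applies after the 5th.",
    "Tenant may terminate early with 60 days written notice.",
]
prompt = build_prompt("Will I be charged if I pay on the 9th?", snippets)
print(prompt)
```

A character budget is only a stand-in here; a real system would count tokens with the model's tokenizer.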

ilaksh · 3y ago · 5 in thread

5 pages fits in the context window. How exactly do you plan on expanding the limit? Without explanation we have to assume you haven't completely solved your core technical challenges.

In my testing the biggest challenges with using for example OpenAI embeddings with cosine similarity or something are A) figuring out the section breaks or right chunk size so that information stays in context and B) retrieving enough chunks to get the correct hit for a query without having too much extraneous information that confuses it.

I think that it's hard to make a parser that most optimally slices up arbitrary documents.

Since you have some larger documents preloaded I assume for those you have the embeddings search. But for user uploads you are skipping that now and just feeding all of the text extracted from the PDF into the prompt along with the query.

nathanwh · 3y ago

This explains why "What if I move out early?" for the sample document doesn't mention any of the information in the lease break section, which is definitely the most important section for moving out early. Whatever space they're projecting the question into doesn't capture that "lease break" and "moving out early" are synonyms.

ilaksh · 3y ago

It may only be retrieving the top N results with most similar embedding. If that answer is in the 3rd most similar chunk and it only fed 2 along with the query in the prompt, then GPT never got the information relevant to the question.
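
To make the cutoff effect concrete, here is a small sketch with invented similarity scores. The section names, the scores, and the helper names are assumptions for illustration, not Ghost's actual data:

```python
def select_chunks(scored_chunks, n):
    """Keep the n highest-scoring (chunk, similarity) pairs."""
    return sorted(scored_chunks, key=lambda pair: pair[1], reverse=True)[:n]

# Similarity scores are invented for illustration.
scored = [
    ("Section 4: Rent and late fees", 0.83),
    ("Section 9: Notice periods", 0.81),
    ("Section 12: Lease break", 0.78),  # the chunk that actually answers the question
    ("Section 2: Parties", 0.40),
]

def contains_answer(selected):
    """True if the lease-break chunk made it into the selection."""
    return any("Lease break" in chunk for chunk, _ in selected)

hit_with_2 = contains_answer(select_chunks(scored, 2))
hit_with_3 = contains_answer(select_chunks(scored, 3))
print(hit_with_2, hit_with_3)
```

Raising N improves the chance of a hit but also puts more unrelated text into the prompt, which is the trade-off described above.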

pablo24602 · 3y ago

From the website, it seems as though they are retrieving five chunks. Also looks like they split documents by paragraph sections, unless the paragraphs are small enough- then they put a couple of them together.
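
One way such a splitter could look, as a sketch only. This is a guess at the general approach, not Ghost's code, and `chunk_paragraphs` and the `min_chars` threshold are hypothetical:

```python
def chunk_paragraphs(text, min_chars=200):
    """Split on blank lines, then merge consecutive short paragraphs into one chunk."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, buffer = [], ""
    for para in paragraphs:
        buffer = f"{buffer}\n\n{para}" if buffer else para
        if len(buffer) >= min_chars:
            chunks.append(buffer)
            buffer = ""
    if buffer:
        chunks.append(buffer)  # trailing short paragraphs form a final chunk
    return chunks

doc = "Short one.\n\nShort two.\n\n" + "A long paragraph. " * 20
chunks = chunk_paragraphs(doc, min_chars=20)
for chunk in chunks:
    print(len(chunk), repr(chunk[:30]))
```

This variant only merges forward and never splits an oversized paragraph, so a very long paragraph still becomes one large chunk.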

btbuildem · 3y ago

Same, the devil lies in the details. You basically need a good semantic search in front of GPT to feed it the best context given the question.

nico · 3y ago

Any code or pseudo-code you could share that does something like that?

thewataccount · 3y ago · 2 in thread

This is not meant to be a critique, just an open question to everyone trying it - does anyone find this to be more useful than just ctrl+f?

For compiling information or getting an immediate yes/no it's likely correct - but I found ctrl+f generally gets me there faster, albeit with slightly more reading.

At least in the context of this lease agreement which does have everything well organized and uses carefully chosen keywords already.

shagie · 3y ago

It picks up some context questions that aren't there.

Consider the example question of "I won't be able to pay until the 9th of this month, will I get a fee?" - are you going to search for "fee"? There are 66 occurrences.

Modify the question to "If I pay on the 4th of the month, will there be any late fee?" and you get the correct answer too.

For the question "What restrictions are there on parties?" it appears to get that correctly answered. If you search for "party" you'll get 19 results that appear to be legal entity parties rather than the possibly noisy type.

rnk · 3y ago

I asked if I could use the rental as a foreign embassy location. It gave the reasonable answer, quoting the agreement that you could only use it as a private residence and you couldn't use it for other purposes.

aver4geredditor · 3y ago · 2 in thread

I didn't do any extensive testing, but it seems to be really useful. However, where can we see the privacy policy? People are probably going to upload some important and confidential documents, so it's good to know how this data is being handled. The only thing I see is an asterisk after the 24 hour notice. Also, the bot answer window may have 5 pages even if 1 page is enough for the answer; this may confuse your users, since they may think there's something else on the other pages.

Ostatnigrosh (OP) · 3y ago

Hey! This is a great point. So we delete the documents within 24 hours of upload, and have a limitation to 5 pages to cut our own costs as this is just a concept.

aver4geredditor · 3y ago

Could you please add a more detailed policy to your site? For example, who can see, use or access the uploaded documents in any other way and whether the documents are used to gather some data, analyze or sell it?

JadoJodo · 3y ago · 1 in thread

(This critique is unrelated to this project. It works as expected, OP, and looks good.)

How could one ever trust the output of ChatGPT?

This feels to me a bit like non-L5 autonomous driving: If I have to assist at all, it'd be easier to do it myself. In the same vein, for this project (and ChatGPT generally): Can I actually trust that the output from ChatGPT in answering my question about the document is factually correct?

e.g., If I hand it a home rental agreement legal document and ask "What is the late move out penalty if I am 10-minutes late in dropping off the keys?", it may give the correct answer. Or it may generate a plausible-sounding answer using the words in the document that is completely (or perhaps even just slightly) incorrect.

How could I possibly know without reading it myself?

ilaksh · 3y ago

If it's using a search then it is possible to identify the paragraphs with a number in the database along with the embeddings. Then once the similar chunks are retrieved, part of the prompt could be to return the paragraph, line numbers or exact quote(s) used to answer the question.
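
A sketch of how that check could work once the model returns a quote and a paragraph number. The `[P1]` labelling scheme, the helper names, and the sample text are assumptions; the point is only that a returned quote can be verified mechanically against the retrieved chunk:

```python
def number_chunks(chunks):
    """Prefix each retrieved chunk with an id the model can cite."""
    return "\n".join(f"[P{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))

def normalize(text):
    """Collapse whitespace and case so trivial formatting differences do not matter."""
    return " ".join(text.lower().split())

def verify_quote(quote, cited_id, chunks):
    """True only if the quote appears (ignoring case and whitespace) in the cited chunk."""
    if not quote.strip() or not 1 <= cited_id <= len(chunks):
        return False
    return normalize(quote) in normalize(chunks[cited_id - 1])

# Made-up chunks standing in for retrieved paragraphs.
chunks = [
    "Rent is due on the 1st of each month.",
    "A late fee applies to payments received after the 5th.",
]
context = number_chunks(chunks)  # goes into the prompt
ok = verify_quote("late fee applies to payments received after the 5th", 2, chunks)
bad = verify_quote("late fee applies after the 10th", 2, chunks)
print(ok, bad)
```

This only confirms that the quoted text exists in the source; it does not confirm that the answer built on the quote is a correct reading of it.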

Yours is not a good example though because "10 minutes late" is never going to be in a document like that.

satvikpendem · 3y ago · 1 in thread

Ghost is a well known blogging platform so you might want to change the name.

This seems similar to ChatPDF.com (with a 200 page limit though, instead of the 5 page limit that you have, it seems) which I suppose we'll see a lot more competitors for as the ChatGPT API expands.

smashed · 3y ago

There is also Ghostscript, which is a PostScript/PDF library, and since the site is operating on PDF content, my initial thought was that they were somehow related.

Ostatnigrosh (OP) · 3y ago

Wow, thanks so much everyone for checking out Ghost! We are currently crashing because of all the traffic. Should be up and running in 30 minutes :)

aicharades · 3y ago

Here's a free notebook for map reduce summarization I created: https://www.wrotescan.com

It's byok. Keys are not persisted. You can choose chat-gpt-turbo or text-davinci.

Limit is 2.4M tokens per call, working to get higher too.
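
For readers unfamiliar with the term, a bare-bones sketch of map-reduce summarization. This is not the notebook's code: `first_sentence` is a placeholder standing in for an LLM call, the chunking is by characters rather than tokens, and the sketch assumes each summary is shorter than its input (with `max_rounds` as a safety stop if it is not):

```python
def split_into_chunks(text, chunk_chars):
    """Cut text into consecutive pieces of at most chunk_chars characters."""
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]

def map_reduce_summarize(text, summarize, chunk_chars=1000, max_rounds=5):
    """Summarize each chunk (map), join the summaries, repeat until one chunk remains (reduce)."""
    for _ in range(max_rounds):
        chunks = split_into_chunks(text, chunk_chars)
        if len(chunks) <= 1:
            break
        text = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize(text)

def first_sentence(text):
    """Placeholder for an LLM call: keeps only the first sentence."""
    return text.split(".")[0].strip() + "."

document = "Rent is due monthly. Details follow. " * 100
summary = map_reduce_summarize(document, first_sentence, chunk_chars=500)
print(summary)
```

Because each chunk is summarized independently, information that only makes sense across chunk boundaries can be lost, which is one reason chunk size matters.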

Luuucas · 3y ago

It would be great to have a little summary of the document you hit against the API, like this extension: https://chrome.google.com/webstore/detail/chatgpt-suite-summ... (simply grab the prompts ;))

gitgud · 3y ago

This is exactly the application I was thinking of when I first used ChatGPT. Using AI to summarize complex legal documents, and be able to ask questions about the document.

Have you thought of even larger knowledge-bases? like entire legal systems etc...

Anyway, amazingly executed, nice work!

MollyRealized · 3y ago

Just so you're aware of its existence: https://www.wordtune.com/read

MetaCosm · 3y ago

What is the basic mechanic that is going on here? Searching the document then using it with one shot or multi-shot prompting?

user- · 3y ago

I wanted to try it, but the pdf i tried just got an error with no info so ¯\_(ツ)_/¯

simlevesque · 3y ago

I wish I could try it, it would help me at this very moment.

KerryJones · 3y ago

Love this idea -- also curious on privacy policy

tmaly · 3y ago

Got a 502 error, could not see the product

biblical_Arc · 3y ago

Self host possibility ?

MatthewB · 3y ago

Very nice. Very useful.

MWil · 3y ago

error uploading 5 page upload

pgmonstereater · 3y ago

Doesn't work
