ArxivGPT: Chrome extension that summarizes arxived research papers using ChatGPT

(github.com)

154 points · excerionsforte · 3y ago · 104 comments

herbst 3y ago · 10 in thread

I've used ChatGPT to make short factual videos for YouTube, and honestly it's a bit worrying with supposed 'facts'.

I would not advise anyone to use ChatGPT outputs for actual knowledge at this point.

aflag 3y ago

You can use it for that, but you have to test its claims. It's especially helpful with questions that would take a lot of research to answer but whose answers are easy to test. That could be a good thing overall. If people get used to always testing everything anyone tells them, people will be a lot better informed.

visarga 3y ago

I think LLM checking is going to be the most important research direction this year. Generative models are worthless without verification. It relates to deciding truth and dealing with fake information, synthetic text and spam. Google hasn't been able to solve it in the last decade, but some people have an idea:

> "Discovering Latent Knowledge in Language Models Without Supervision" Existing techniques for training language models can be misaligned with the truth: if we train models with imitation learning, they may reproduce errors that humans make; if we train them to generate text that humans rate highly, they may output errors that human evaluators can't detect. We propose circumventing this issue by directly finding latent knowledge inside the internal activations of a language model in a purely unsupervised way.

https://arxiv.org/abs/2212.03827

In other words the model already tries to predict the truth because it is useful in next token prediction, but we need to find a way to detect the 'truth alignment' in its activations.

julius 3y ago

I asked it how to do something with an AWS CLI tool. ChatGPT invented a new parameter that looked like something a programmer would come up with, and it would have done exactly what I was looking for (I assumed many people had had my problem before).

aws cloudfront update-distribution --id <distribution-id> --distribution-config <new-config> --no-reset-origin-access-identity

Took me a while to figure out that the parameter --no-reset-origin-access-identity was not just not working: it had never existed in any version of the CLI tool.
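
One cheap way to catch an invented flag like this is to look for it in the tool's own help output before running anything. Below is a minimal sketch in Python, assuming the help text has already been captured as a string (for example from `aws cloudfront update-distribution help`). The `flag_is_documented` helper and the abbreviated sample help text are hypothetical and exist only for illustration.

```python
import re

def flag_is_documented(help_text: str, flag: str) -> bool:
    """Return True if `flag` appears as a whole option name in the help text."""
    # Match the flag only when it is not part of a longer option name.
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None

# Hypothetical, abbreviated help text; real output would come from the CLI itself.
sample_help = """
--id (string)
--distribution-config (structure)
--if-match (string)
"""

documented = flag_is_documented(sample_help, "--id")
invented = flag_is_documented(sample_help, "--no-reset-origin-access-identity")
```

A check like this only shows whether the flag is documented in the installed version; it says nothing about whether the command does what the model claimed.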

mtmail 3y ago

Same here. ChatGPT invented a feature of our API and wrote an example script. User then complained to us that it's not working.

belter 3y ago

I asked it to create a COBOL program to connect to AWS and create an S3 Bucket...

https://news.ycombinator.com/item?id=33991767

te_chris 3y ago

Did you not think to check the docs? (Srs question)

2 more replies

williamcotton 3y ago

ChatGPT works best with analytic prompts that respond with translations and not synthesis.

Meaning, if all the facts exist in the prompt then the likelihood of synthesizing fiction is diminished.

There are a number of ways to use the principle of analytic augmentation to add most, if not all, of the facts required for a truthful response, ranging from simple “prompt engineering” to evaluating code to document embedding in latent space.

For example, if you use prompt engineering to k-shot a task to turn math word problems into executable JavaScript, meaning LLMs are only translators and the computations are done by a software interpreter, then the results are much more likely to be truthful.

Sampling from a number of variations on a prompt can lead to a more accurate outcome if, say, 1 in 10 times the translation attempt gives a different answer.
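
The two ideas above (let the model only translate, and vote across several samples) can be sketched roughly as follows. This is an illustration in Python rather than the JavaScript the comment mentions, and the candidate expressions are hard-coded stand-ins for model output, not real ChatGPT responses. The `majority_answer` helper is a hypothetical name.

```python
from collections import Counter

def majority_answer(answers):
    """Return the most common answer and the fraction of samples that agree with it."""
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Stand-ins for expressions a model might emit for the same word problem
# ("3 boxes of 12 apples, 4 eaten"); one of the ten is deliberately wrong.
candidate_expressions = ["3 * 12 - 4"] * 9 + ["3 + 12 - 4"]

# The arithmetic is done by the interpreter, not by the model.
# eval() is acceptable here only because the inputs are fixed strings;
# untrusted model output would need a proper sandbox.
results = [eval(expr, {"__builtins__": {}}) for expr in candidate_expressions]

answer, agreement = majority_answer(results)
```

A low agreement value is a signal to distrust the result; high agreement is not proof of correctness, since all samples can share the same mistake.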

Baeocystin 3y ago

I think getting into the habit of fact-checking everything LLMs come up with when used to generate is probably an excellent idea. That being said, I haven't really seen any confabulation when using it to summarize a supplied text.

magicalhippo 3y ago

I've not used ChatGPT yet (almost feel like a luddite), what happens when you tell it that it's wrong? Will it get it right after a retry or two?

mepian 3y ago

Sometimes by accident it will, but most of the time it keeps making things up with no regard to factual and logical correctness. It has no notion of truth.

m3kw9 3y ago · 9 in thread

This is dangerous because people that has no knowledge of the science would blindly trust whatever it summarizes. There is no way to verify it. For example, if you ask it to summarize a book on a subject you understand, at least you can sense some BS or open up the book and verify a few points. Here you would be at GPT-3's mercy.

newswasboring 3y ago

I'm so tired of all discussions on LLMs starting with "this is dangerous" or forms of this. First because at some point this discourages people from even attempting cool things with LLMs and second because it really stalls the discussion. We are all aware that there is a hallucination issue in LLMs, so what do we do about this? What's your proposal? If it's just "don't do it", I don't think that's useful. I think if we were following the true spirit of HN, we would be giving suggestions on what to change. Just a suggestion to add disclaimers everywhere would be better than these "it's dangerous" comments. Not everything needs to be perfect, for a hobby project not everything even needs to be working. These comments are just discouraging for no real reason.

Edit: [*] 'we' in my comment here is indicating the HN community, not entirety of humanity.

almost 3y ago

I feel the same way about “chainsaw juggling for babies” classes at daycare centres. People are so quick to jump to “that’s dangerous” or “ouch you cut off my arm” rather than engaging with the subject and suggesting ways the babies can be better taught to not decapitate their carers.

1 more reply

swatcoder 3y ago

Some of us respond to uses that align with the technology’s strengths with excitement and encouragement, and to uses that rely on its weaknesses with criticism and warning.

That seems useful, prudent, and completely in line with the spirit of a community like HN.

There’s no more reason that every critique should come with a “proposal” than that every cheer should come with some kind of admonition. As a community, multiple points of view are expressed and developed simultaneously.

Of course, some points of view might personally frustrate you or leave you feeling like you don’t know how to respond to them. But is that so bad? Does it need to be squelched just because you don’t enjoy it?

1 more reply

rat9988 3y ago

While I agree with the sentiment, I don't think there is a "true" spirit of HN. His sentiment is as much in its spirit as yours.

1 more reply

m3kw9 3y ago

If I were to suggest something it would be to wait for OpenAI to get their stuff together before creating a summarizing bot which uses a model it doesn’t own.

bilsbie 3y ago

This already happens when the popular press reports on papers. Can’t be any worse.

dekken_ 3y ago

> This is dangerous because people that has no knowledge of the science would blindly trust whatever it summarizes.

This says more about you than the hypothetical "people" you are talking about.

kurisufag 3y ago

I don't think so. The average person already thinks (Chat)GPT is an all-knowing AGI homework-solver, and the problem only worsens if you add the airs of "science" to the situation.

1 more reply

m3kw9 3y ago

Really? The “You are projecting” replies. That’s cute

1 more reply

goldemerald 3y ago · 7 in thread

The example here is a bit worrying for the peer review process. I am not looking forward to my "peers" reviewing my paper by putting it through LLMs and blindly copy pasting the output. I can already imagine emailing the Area Chair and saying "While reviewer 2 is detailed, the questions show a severe lack of basic understanding. We believe the contents are AI generated."

Then again, perhaps LLMs could simply be incorporated into the peer-review process, where after submitting your paper, you'd have to answer the AI's basic questions. As a reviewer, I could imagine a structured AI report for a paper being helpful in guiding discussion: "The paper compares to recent approaches X, Y, and Z. And the work is indeed novel."

asabla 3y ago

I like the way you flipped it around.

I'm a big believer in using all these new AI services/programs as tools to enhance my workflows, not to replace them.

tmalsburg2 3y ago

"The best model was truthful on 58% of questions, while human performance was 94%."

https://arxiv.org/abs/2109.07958

4 more replies

sorokod 3y ago

How would an inaccurate summary enhance your workflow?

Or, the same question in a different way: what sort of workflow would be enhanced by an inaccurate summary?

blackbear_ 3y ago

I've seen so many s*ht reviews already. Perhaps, just perhaps, an LLM would make better ones.

ramraj07 3y ago

If you generate a question using an LLM, what’s to stop me from answering it using an LLM? And who verifies if the answer is correct? An LLM?

Maken 3y ago

Finally, full automation of the review process is here.

ModernMech 3y ago

In the future, no one knows how to do anything anymore, and it’s LLMs all the way down.

aix1 3y ago · 4 in thread

Isn't this what abstracts are for?

stevenrj 3y ago

Not only that, it appears to only feed the abstract to ChatGPT anyway.

shusaku 3y ago

As far as I can tell, it just rephrases the abstracts...

muzani 3y ago

One of my first serious uses for GPT is summarizing the abstracts too.

elashri 3y ago

I don't think it is serious. Abstracts usually tend to be short summaries. In many cases an abstract is a short paragraph. I was skimming arXiv before I opened HN just now and saw a couple of abstracts that are 4-5 sentences. Is ChatGPT going to write a one-sentence summary, or will the summary of the summary be comparable or longer?

2 more replies

meghan_rain 3y ago · 4 in thread

What if the paper is longer than what fits in a prompt?

voxelghost 3y ago

You can ask ChatGPT about arxiv.org papers if you have the DOI and the full title, provided the paper is from 2021 or earlier.

alexanderdavide 3y ago

This renders the extension pretty useless for older papers, doesn't it? There doesn't seem to be a fallback to prompt with older papers as a whole, because they exceed the prompt limit.

1 more reply

fassssst 3y ago

bing.com/new doesn’t have a length restriction, and the Edge sidebar version can do it like this.

Disclosure: I work at MSFT but not on Bing

meghan_rain 3y ago

It must have some length restriction though, no? A GPT always has a max context window if I understand correctly. It's likely Bing has access to models with a much larger context window than what is accessible through the OpenAI API.
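
When a paper does not fit in one prompt, a common workaround is to split it into pieces that each fit and process them separately. The sketch below is a minimal illustration in Python, not a description of how this extension or Bing handles it. It uses a word count as a crude proxy for tokens (real limits are measured in tokens), and `split_into_chunks` is a hypothetical helper name.

```python
def split_into_chunks(text: str, max_words: int) -> list[str]:
    """Split text into consecutive chunks of at most `max_words` words."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]

paper_text = "word " * 25  # placeholder for the full text of a paper
chunks = split_into_chunks(paper_text, max_words=10)
```

Each chunk could then be summarized on its own and the partial summaries combined, at the cost of losing context that spans chunk boundaries.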

juliangmp 3y ago · 3 in thread

Man if only the authors of such papers would write small summaries about the content and put them in the papers themselves or something

throwaway37495 3y ago

If only the abstracts they write were more often than not accurate and not attention-grabbing problem-ignoring pieces of crap that do not reflect the content of the actual paper, hoping investors/newspapers won't actually read the paper.

This is a conflict of incentives. Whereas ArxivGPT has no reason not to tell the problems first.

Al-Khwarizmi 3y ago

You're right that abstracts often have a positive bias that can sometimes border on dishonesty, but for most papers it's not a matter of investors or newspapers, but rather of getting the paper accepted to the conference or journal, especially if it's a highly selective one. It happens even in fields where there is no direct industry interest.

1 more reply

krona 3y ago

Wouldn't that be breaking academic kayfabe?

ggm 3y ago · 3 in thread

Don't trust the output to reflect what's in the paper.

tmalsburg2 3y ago

In my experience, it's almost safe to assume that the LLM summary will misrepresent the content in some way.

Maken 3y ago

Don't expect the paper to reflect its output either.

amelius 3y ago

Especially if the paper is about LLMs.

alexanderdavide 3y ago · 2 in thread

I'm wondering about two things related to development of customer-facing programs that use paid APIs in the background:

Are you using your own API key and paying for the usage? How can you justify operating programs that incur high costs but produce no income? Isn't the API publicly exposed on the client side and possibly subject to theft and abuse?

m4lvin 3y ago

The extension here does not include an API key. You either need to log in using an account or provide your own API key in its settings.

Abecid 3y ago

It asks the user to sign in to chat.openai.com

irthomasthomas 3y ago · 1 in thread

Caution. Language models do not know what is salient to a human. They also have a strong bias toward information that they have seen frequently. Research will contain a larger amount of new information and it's that new information which is most valuable to us, but least relevant according to the models.

FartyMcFarter 3y ago

They're also known to be unreliable at simple logical inference: https://news.ycombinator.com/item?id=33859482

iamflimflam1 3y ago · 1 in thread

Would love something that would translate papers from academic speech into something I can enjoy reading.

voxelghost 3y ago

Then https://www.explainpaper.com/ is probably much more to your liking.

wonderfuly 3y ago

Great work! Glad to see innovations based on my work https://github.com/wong2/chatgpt-google-extension That's why I open sourced the code!

voxelghost 3y ago

For my paper, it just straight up invented most of the "relevant references" section

JangoSteve 3y ago

I think my favorite part of this prompt is that it starts with, "Please..."

With this new class of products based on crafting prompts that best exploit a GPT's algorithm and training data, are we going to start seeing pull requests that tweak individual parts or words of the prompt? I'm also curious how the test suite for projects like this would look for specific facts or phrases to be contained in the responses for specific inputs.
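
One shape such a test could take is a plain phrase check against a recorded response for a fixed input. The following is only a sketch in Python under that assumption: `missing_phrases` is a hypothetical helper, and the recorded response is made up for illustration rather than taken from the extension.

```python
def missing_phrases(response: str, required: list[str]) -> list[str]:
    """Return the required phrases that do not occur in the response (case-insensitive)."""
    lowered = response.lower()
    return [phrase for phrase in required if phrase.lower() not in lowered]

# Hypothetical recorded response for one fixed input abstract.
recorded_response = "Key insights: the paper proposes an unsupervised probe for latent knowledge."
required = ["key insights", "unsupervised"]

# The test passes when nothing required is missing.
assert missing_phrases(recorded_response, required) == []
```

Because model output varies between runs, a check like this would probably have to run against recorded responses or tolerate some failure rate; it also cannot tell whether the phrases appear in a truthful sentence.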

newswasboring 3y ago

To make it work in Brave we need to turn off language-based fingerprinting [1]. I wonder how that's related.

Edit: btw, congratulations on the release. This is the kind of stuff I think should be explored more using LLMs. Great choice on making a Chrome extension, it's great UI for this kind of thing.

[1] https://github.com/hunkimForks/chatgpt-arxiv-extension#how-t...

inductive_magic 3y ago

This isn't viable because of bias and blatant lies in LLM-outputs.

https://huggingface.co/ml6team/keyphrase-extraction-kbir-ins... is a decent tool to explore the constant stream of publications. The last mile still is left to the human.

f_allwein 3y ago

I don’t understand. Don’t research papers usually have a summary, provided by the authors?

seaucre 3y ago

The related references don't appear to be real.

2Gkashmiri 3y ago

firefox extension dude

amitport 3y ago

Also works on Edge

dankle 3y ago

Is it good?
