0 points · SOLAR_FIELDS · 2y ago · 0 comments

I read this all the time and yet no one can seem to come up with even a few questions from several months ago that ChatGPT has become “worse” at. You would think if this is happening it would be very easy to produce such evidence since chat history of all conversations is stored by default.

14 comments · 7 top-level

edgyquant · 2y ago · 2 in thread

Every time it’s mentioned, someone says this and other users provide examples. Maybe you just don’t care about those examples.

SOLAR_FIELDS (OP) · 2y ago

Care to share these examples, in a scientific (n > 30) manner that can’t just be attributed to model nondeterminism? I don’t follow these threads religiously, but in the ones I’ve seen no one has been able to provide any sort of convincing evidence. I’m not some sort of OpenAI apologist, so if there is good, provable evidence here I will readily change my mind about it.

sebzim4500 · 2y ago

I don't see how anyone could provide what you are asking for. I can go through my chat history and find a prompt that got a better answer 3 months ago than I get now, but you can always just say it's nondeterminism.

Without access to the old model, I can't collect samples with n > 1.

bondarchuk · 2y ago · 2 in thread

Here's a specific example: https://news.ycombinator.com/item?id=37533417

SOLAR_FIELDS (OP) · 2y ago

Pointing out a specific bug with functionality is not the same as saying “in general the quality of GPT answers has decreased over X months”, especially when that bug is in a realm that LLMs have already been shown to be bad at.

bondarchuk · 2y ago

You're moving the goalposts.

HenryBemis · 2y ago · 1 in thread

Here is one. I ask it to write some code, 4-5 pages long. With some back & forth it does. Then I ask "change lines 50-65 from blue to red", and it does (change #1). I ask it to show me the full code. Then I ask "change lines 100-120 from yellow to green". Aaaaand it makes change #2 and reverts change #1. Oh, the number of times this has happened! So now when I ask it to make a change, I do it by 'paragraph' and I copy & paste the new paragraph. It's annoying, but it still makes things faster.

phkahler · 2y ago

I haven't used it, but can't you just say "OK, use that as the new baseline from here on", or something similar?

dmm · 2y ago · 1 in thread

OpenAI regularly changes the model and they admit the new models are more restricted, in the sense that they prevent tricky prompts from producing naughty words, etc.

It should be their responsibility to prove that it's just as capable.

SOLAR_FIELDS (OP) · 2y ago

Whoever makes the claim bears the burden of proof. Did OpenAI claim that their models didn’t regress while putting these new safeguards into place? If not, it feels like the burden of proof lies on whoever said that they did.

To be specific, the claim we are talking about here is “ChatGPT gives generally worse answers to the exact same questions than ChatGPT gave X months ago”. Perhaps that is easily provable for the subset of the knowledge space you mention, where updates were pushed, but I’m more interested in the general case.

In other words, you can pretty easily make the claim that ChatGPT is worse at telling me how to make a weapon than it was 3 months ago. I could pretty easily believe that and also accept that it was probably intentional. While we can debate whether it was a good idea or not, I’m more interested in the claim that ChatGPT got worse at summarizing some famous novel or helping write a presentation than it was 3 months ago.

tessierashpool · 2y ago · 1 in thread

> I read this all the time and yet no one can seem to come up with even a few questions from several months ago that ChatGPT has become “worse” at

this could just mean that people do not have time to argue with strangers

SOLAR_FIELDS (OP) · 2y ago

Well, sure, but shouldn’t some pedant have the time to dig up their ChatGPT history from 4 months ago to prove the claim? It seems like it would be pretty easy to do, and there are plenty of pedants on the internet, but I don’t see the blogosphere awash with side-by-side comparisons showing how much worse it got.

birracerveza · 2y ago

It's probably just subjective bias. Once the novelty wears off, you learn not to rely on it as much, because sometimes it's very difficult to get what you specifically want. In my personal experience I ended up using it less and less to avoid butting heads with it, to the point that I cancelled my subscription altogether. YMMV of course.

xfz · 2y ago

One example: it now refuses to summarise books that it was trained on. Soon after trying GPT-4 I could get it to summarise Evans' DDD chapter by chapter. Not anymore.

Not a surprise, but a change nonetheless.
