Anyone else seeing any evidence of this?
It's full of memes and people complaining it's not as "good" as it was yesterday when it fails to complete their homework.
I would take anything said there with a big grain of salt, and stick to benchmarks.
I tried Claude because of the larger context size, but I've been disappointed so far. I find Claude much more likely to just compliment my writing, whereas GPT will identify strengths and areas that could be improved.
Does it really make sense to play with this kind of power?
The shape of my career might change, but I doubt I'll be unable to find a job.
In the past, new automation technologies often opened up new possibilities in production, in turn creating new jobs, specifically jobs that had not been automated yet.
AI, though, promises to be universal automation, i.e. it could do any job. Thus, even if new jobs show up, they will be taken over by AI too.
Then what?
> The shape of my career might change, but I doubt I'll be unable to find a job.
The question you should ask is why anyone would hire you when they can get AI to do the same job.
I think it's critical to be thinking about how to make sure wealth isn't simply funneled to a few capitalists who own everything by virtue of being first, because that seems to be the future we're heading for if we aren't careful.
I think you, and I, and everyone on HN will be fine (more or less...) but I am worried about a wide swath of people who will get "left behind."
Better to be excited and learn the tool as it develops than to stick your head in the sand.
A world where AI has put literally everyone “out of a job,” meanwhile, is still so far from our current reality that IMO, it’s not worth making practical day-to-day decisions on unless you are directly involved in the development or regulation of AI.
Two years ago, something like ChatGPT (as limited as it is) was “far from our current reality”.
I think it’s worthwhile to think ahead.
Somehow that didn't happen, though.
If you "have enough compute" available (which OpenAI definitely does), the best current technique is to use mixed precision with post-quantisation fine-tuning to restore performance. That's most probably how all of the "turbo" models work. Take a model that was initially 16 or 32 bits per parameter during training, quantise it down to a mixture of 4, 8, and 16 bits, and then fix it up with an additional training pass that uses the original full-fat model's predictions as the training target. With access to the raw parameters, it's possible to do this training against the full probability distribution over every output token instead of just the top word. Third parties fine-tuning against GPT4 chats can't do this, even with the collected samples, because they only have the individual selected tokens/words instead of the full probability distribution.
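A minimal sketch of the two ingredients described above, in plain Python with toy numbers: symmetric round-to-nearest quantisation of a weight vector to a chosen bit width, and a comparison between a distillation loss over the teacher's full probability distribution (KL divergence) and a loss that only sees the teacher's top token. The function names, the one-scale-per-tensor scheme, and the toy logits are illustrative assumptions; nothing here is confirmed to be how any "turbo" model was actually built.

```python
import math


def quantize_symmetric(weights, bits):
    """Round-to-nearest symmetric quantisation with one scale per tensor.

    Returns the dequantised weights so the rounding error can be inspected.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Integer codes clamped to the signed range [-qmax, qmax].
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return [c * scale for c in codes]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q): penalises the student q for mismatching the teacher p on every token."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def top_token_loss(teacher_probs, student_probs):
    """Cross-entropy on the teacher's single most likely token only,
    i.e. roughly what a third party can train on from collected chat text."""
    top = max(range(len(teacher_probs)), key=teacher_probs.__getitem__)
    return -math.log(student_probs[top])


weights = [0.82, -0.41, 0.05, -0.97, 0.33]
for bits in (4, 8):
    deq = quantize_symmetric(weights, bits)
    err = max(abs(a - b) for a, b in zip(weights, deq))
    print(f"{bits}-bit max abs error: {err:.4f}")

# Toy "full-precision" teacher and a "quantised" student whose 2nd and 3rd
# tokens are swapped; the top token and its probability are unchanged.
teacher = softmax([2.0, 1.5, 0.2, -1.0])
student = softmax([2.0, 0.2, 1.5, -1.0])
print("top-token loss, student:", round(top_token_loss(teacher, student), 4))
print("top-token loss, teacher:", round(top_token_loss(teacher, teacher), 4))
print("full-distribution KL:   ", round(kl_divergence(teacher, student), 4))
```

The top-token loss is identical for the student and for the teacher itself, so training on selected tokens alone gives no signal about the swapped ranking, while the KL term over the full distribution does.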
Well, that's how it was for GPT-3.5 anyway. The "turbo" flavor was faster and cheaper, but seemed to have slightly worse output (again, this is all going by subjective measurement; it could entirely be the imagination of AI bros).
It's like the AI equivalent of "the rest is left as an exercise to the reader" you'd find in old textbooks.
GPT4 should really be called DescartesGPT with this BS.
"I benchmarked on SAT reading, which is a nice human reference for reasoning ability. Took 3 sections (67 questions) from an official 2008-2009 test (2400 scale) and got the following results, here a SAT-like test:
- GPT3.5 - 690 (10 wrong)
- GPT4 - 770 (3 wrong)
- GPT4-turbo (one section at a time) - 740 (5 wrong)
- GPT4-turbo (3 sections at once, 9K tokens) - 730 (6 wrong)"
Source: https://twitter.com/wangzjeff/status/1721934560919994823?t=P...
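For a sense of scale, the wrong-answer counts in the quoted results can be turned into raw accuracy over the 67 questions. This small arithmetic sketch uses only the counts quoted above; it does not attempt the conversion to the 2400-scale scores, which depends on the official scoring table for that specific test.

```python
TOTAL_QUESTIONS = 67  # 3 SAT reading sections, per the quoted results

# Wrong-answer counts exactly as quoted.
wrong = {
    "GPT3.5": 10,
    "GPT4": 3,
    "GPT4-turbo (one section at a time)": 5,
    "GPT4-turbo (3 sections at once)": 6,
}


def accuracy(n_wrong, total=TOTAL_QUESTIONS):
    """Fraction of questions answered correctly."""
    return (total - n_wrong) / total


for model, n in wrong.items():
    correct = TOTAL_QUESTIONS - n
    print(f"{model}: {correct}/{TOTAL_QUESTIONS} = {accuracy(n):.1%}")
```

That works out to roughly 85.1%, 95.5%, 92.5%, and 91.0%; with 67 questions, each extra wrong answer costs about 1.5 percentage points, so the turbo-vs-GPT4 gap here is two or three questions.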
You seem to be suggesting it got a bit worse, and the aider article seems to suggest GPT4 got a bit worse (although much faster at being a bit worse), while GPT3.5 got worse, then better, while also getting faster.
However, in my opinion the first-attempt score is more important, and Turbo does genuinely seem to lead there. There's still a possibility that the updated training data has tainted the results.
Are there any other programming assistant packages that use the ChatGPT API like this?
Regarding rate limits, it might be a good idea to build configurable delays into the testing code to avoid hitting the limits.
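A minimal sketch of what configurable delays could look like: a throttle that enforces a minimum interval between requests and retries with exponential backoff when rate-limited. It assumes the benchmark wraps each API request in a plain function, and `RateLimitError` is a hypothetical stand-in for whatever the real client raises on an HTTP 429; neither name comes from aider or the OpenAI client library.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error an API client raises on HTTP 429."""


class Throttle:
    """Enforces a minimum gap between calls and retries on rate-limit errors."""

    def __init__(self, min_interval=1.0, max_retries=5, base_backoff=2.0,
                 sleep=time.sleep, clock=time.monotonic):
        self.min_interval = min_interval  # seconds between consecutive requests
        self.max_retries = max_retries    # retries after the first attempt
        self.base_backoff = base_backoff  # first retry waits this long, then doubles
        self._sleep = sleep
        self._clock = clock
        self._last_call = None

    def _wait_for_slot(self):
        # Sleep just long enough to respect min_interval since the last request.
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            if elapsed < self.min_interval:
                self._sleep(self.min_interval - elapsed)
        self._last_call = self._clock()

    def call(self, fn, *args, **kwargs):
        for attempt in range(self.max_retries + 1):
            self._wait_for_slot()
            try:
                return fn(*args, **kwargs)
            except RateLimitError:
                if attempt == self.max_retries:
                    raise
                # Exponential backoff: base, 2*base, 4*base, ...
                self._sleep(self.base_backoff * (2 ** attempt))
```

Injecting `sleep` and `clock` keeps the delays configurable from the test harness and lets the throttle itself be tested without actually waiting.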