famouswaffles · 3y ago

GPTs already forgo the surface-level, statistically most likely next word in favor of words that are more context-appropriate. That's one of the biggest reasons they are so useful.

The truth is that functionally/technically, there's plenty left to squeeze. The bigger issue is that we're hitting a wall economically.

EGreg · 3y ago

How do they do that? No one seems to have a real explanation of what OpenAI actually did to train it.

famouswaffles (OP) · 3y ago

It's pretty much just scale, via either dataset size or parameter count. Before GPT-4, the general SOTA model was in fact not from OpenAI; it was Flan-PaLM from Google.

The attention in GPT-4 is a little different (probably some kind of flash attention), so memory requirements for longer contexts are no longer quadratic. But there's nothing to suggest the intellectual gains from 4 aren't just from bigger scale.
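To make the memory point concrete, here is a back-of-envelope sketch in Python. It is not a description of GPT-4, whose attention implementation is not public. It only compares materializing a full score matrix for one head with a tiled approach that keeps one block of scores plus per-row running statistics. The block size of 128, the 2-byte scores, and the function names are all assumptions chosen for illustration.

```python
def full_scores_bytes(seq_len, bytes_per_elem=2):
    """Bytes needed to hold one head's full seq_len x seq_len score matrix."""
    return seq_len * seq_len * bytes_per_elem


def tiled_scores_bytes(seq_len, block=128, bytes_per_elem=2):
    """Bytes for one block x block tile of scores plus per-row running stats.

    The running stats are a max and a sum per query row (assumed fp32),
    which is what an online-softmax style kernel has to carry between tiles.
    """
    tile = block * block * bytes_per_elem
    running_stats = 2 * seq_len * 4
    return tile + running_stats


if __name__ == "__main__":
    for n in (2048, 8192, 32768):
        print(n, full_scores_bytes(n), tiled_scores_bytes(n))
```

Doubling the context length quadruples the first number but only adds a linear term to the second. Compute is still quadratic in both cases; only the extra memory changes.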

Google could have made a GPT-4 equivalent, I'm sure. It's not like there wasn't a road to take. We already knew GPT-3 was severely undertrained, even from a compute-optimal perspective. And then of course, you can just train on even more tokens to make models even better.
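A rough sketch of the "undertrained" arithmetic, using figures that are not from this thread: GPT-3's published size of about 175B parameters trained on about 300B tokens, and the commonly quoted Chinchilla rule of thumb of roughly 20 training tokens per parameter. The rule of thumb is an approximation, and the function name is made up for illustration.

```python
# Rough rule of thumb from the Chinchilla scaling results (an assumption here).
TOKENS_PER_PARAM = 20


def compute_optimal_tokens(params):
    """Approximate compute-optimal training tokens for a given parameter count."""
    return params * TOKENS_PER_PARAM


gpt3_params = 175e9   # published GPT-3 parameter count
gpt3_tokens = 300e9   # approximate published GPT-3 training tokens

actual_ratio = gpt3_tokens / gpt3_params
optimal_tokens = compute_optimal_tokens(gpt3_params)
shortfall = optimal_tokens / gpt3_tokens

if __name__ == "__main__":
    print(f"tokens per parameter actually used: {actual_ratio:.2f}")
    print(f"rule-of-thumb token budget: {optimal_tokens:.2e}")
    print(f"shortfall factor: {shortfall:.1f}x")
```

Under these assumptions GPT-3 saw fewer than 2 tokens per parameter, about an order of magnitude short of the rule-of-thumb budget.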

EGreg · 3y ago

How do you know it’s pretty much just scale? “Open”AI has been pretty tight-lipped about the details of its training and merely “claims” it was scale. It hired a ton of humans to train the model in little ways. If that’s the “scale” you’re talking about then it’s humans all the way down:

https://www.forbes.com/sites/kenrickcai/2023/04/11/how-alexa...

mindwok · 3y ago

Information on how they trained it notwithstanding, there's clearly more going on than just statistically appropriate words, because you can ask it to create completely new words based on rules you define and it will happily do it.

feanaro · 3y ago

Well yes -- it's not words, it's tokens, which are smaller than words.
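A toy illustration of why sub-word tokens make new words possible. This is a greedy longest-match splitter over a made-up vocabulary, not the byte-pair encoding that GPT models actually use; the `tokenize` function and `VOCAB` are hypothetical and only show the idea that an unseen word can still be built from known pieces.

```python
def tokenize(word, vocab):
    """Split `word` greedily into the longest pieces found in `vocab`.

    Any character that starts no known piece is emitted on its own,
    so every input can be tokenized.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking down to one character.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab or j == i + 1:
                pieces.append(word[i:j])
                i = j
                break
    return pieces


# Hypothetical sub-word vocabulary, for illustration only.
VOCAB = {"un", "break", "able", "glorp", "ify", "ing"}

if __name__ == "__main__":
    print(tokenize("unbreakable", VOCAB))   # ['un', 'break', 'able']
    print(tokenize("glorpifying", VOCAB))   # ['glorp', 'ify', 'ing']
```

"glorpifying" is not a word anyone has seen, but each piece is in the vocabulary, so a model predicting pieces can still produce it.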

fnordpiglet · 3y ago

attention is all you need

(Well and crap tons of GPUs and training data)