
0 points · az226 · 3y ago · 0 comments

Compared to GPT2 it’s on par. Compared to GPT3, 3.5, or 4, it’s a toy. GPT2 is 4 years old, and in terms of LLMs, that’s several lifetimes ago. In 5-10 years, GPT3 will be viewed as a toy. Note that “progress” is unlikely to be as fast going forward as it has been so far.


2 comments · 2 top-level

tbalsam · 3y ago

GPT-2's largest model was 1.5B params. LLaMA-65B was similar to the largest GPT-3 in benchmark performance, but that GPT-3 model was expensive in the API, so a number of people would use the cheaper one(s) instead, IIRC.

So this is similar to a mid tier GPT3 class model.

Basically, there's not much reason to pooh-pooh it. It may not perform quite as well, but I find it to be useful for the things it's useful for.

Oranguru · 3y ago

"Compared to GPT2 it’s on par": any benchmarks or evidence to support this claim? If you try to find them, official benchmarks will tell you that this is not true. Even the smallest LLaMA model (7B) is far ahead of GPT-2, like an order of magnitude better in perplexity.
