That’s to prevent overfitting on their own dataset; it does nothing to prevent overfitting on the benchmark test data, which has likely leaked into their dataset anyway.
You basically cannot beat GPT-4 on the broad reasoning tasks these benchmarks are designed to cover without some of the test material leaking into the training dataset. There simply aren’t enough parameters, nor enough training, to make that possible otherwise.