
0 points · GaggiX · 2y ago

No they don't.


5 comments · 1 top-level

ctoth · 2y ago · 4 in thread

Yes, actually, they do. See the heading "Blessings of Scale" in [0].

[0]: https://gwern.net/scaling-hypothesis

GaggiX (OP) · 2y ago

I have trained models to add numbers with a fixed number of digits, just to prove to a friend that a neural network can learn to do addition. I held many combinations of numbers out of the dataset, but the model was still able to sum them correctly, so it learned to perform addition on numbers with a fixed number of digits. So no, it's not memorizing; it's something I have tested myself.
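For anyone who wants to reproduce the held-out check described above, here is a minimal sketch of how such a split could be built. The function name `make_addition_split`, the `a+b=c` text format, and the 20% holdout fraction are illustrative assumptions, not the setup actually used in the experiment.

```python
import random

def make_addition_split(num_digits=2, holdout_fraction=0.2, seed=0):
    """Build train/test lists of 'a+b=c' strings with disjoint operand pairs.

    Hypothetical helper: the format and the holdout fraction are assumptions.
    """
    # Operands with exactly num_digits digits (no leading zeros).
    low, high = 10 ** (num_digits - 1), 10 ** num_digits
    pairs = [(a, b) for a in range(low, high) for b in range(low, high)]

    # Shuffle deterministically, then hide a fraction of the pairs from training.
    rng = random.Random(seed)
    rng.shuffle(pairs)
    cut = int(len(pairs) * holdout_fraction)
    held_out, seen = pairs[:cut], pairs[cut:]

    def fmt(a, b):
        return f"{a}+{b}={a + b}"

    train = [fmt(a, b) for a, b in seen]
    test = [fmt(a, b) for a, b in held_out]
    return train, test

train, test = make_addition_split()
# No test example appears in the training data, so correct answers on
# `test` cannot come from memorizing training strings.
assert not set(train) & set(test)
```

A model that scores well on `test` after training only on `train` has generalized to unseen operand pairs of that length; the split says nothing about longer numbers.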

Der_Einzige · 2y ago

Fine-tuning the model to force it to understand the insane tokenization it's forced to use for its vocabulary, and to perform accurate addition, sounds great until you start trying very large numbers or messing with whatever decoding settings you're using (and let me guess, it was beam search).

Or you can simply ask it to use a tool, like a calculator. This is more reliable than fine-tuning a BPE-tokenized model built with current techniques will ever be.
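As a rough illustration of the calculator route, the sketch below shows the kind of function a model could hand an expression to instead of producing the digits itself. The name `calculator` and the set of supported operators are assumptions made for this example; how the model actually emits and receives the call depends on the framework and is not shown.

```python
import ast
import operator

# Operators the hypothetical tool accepts; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.FloorDiv: operator.floordiv,
}

def calculator(expression: str) -> int:
    """Evaluate an integer arithmetic expression without using eval()."""
    def ev(node):
        # Plain integer literal (bool is excluded on purpose).
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        # Binary operation with a whitelisted operator.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        # Unary minus, e.g. "-5".
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)

# Python integers are arbitrary precision, so operand length is not an issue.
a, b = 10 ** 40 + 7, 10 ** 40 + 9
assert calculator(f"{a}+{b}") == a + b
```

The result is exact regardless of how the numbers were tokenized or which decoding settings the model uses, since the model only has to copy the operands into the call.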


jjtheblunt · 2y ago

Interesting. Thanks for the comment. What if you ask it to add numbers with more digits?


jjtheblunt · 2y ago

super cool and interesting link. thanks.
