0 points · OutOfHere · 2y ago · 0 comments

A well-trained 8B model will already be over-saturated with information from the start. It will therefore easily forget much of what it already knows when it is fine-tuned on new material. It just doesn't have the capacity to take in much more information.

Don't get me wrong. I think a 70B or larger model would be worth fine-tuning, especially if it can be grown further with more layers.

2 comments · 1 top-level

solidasparagus · 2y ago · 1 in thread

> A well-trained 8B model will already be over-saturated with information from the start

Is there any evidence of that I can look at? This doesn't match what I've seen, nor have I heard it from the world-class researchers I have worked with. I'd be interested to learn more.

OutOfHere (OP) · 2y ago

Upon further thought, if fine-tuning involves adding layers, then the initial saturation should not matter. Say an 8B model adds 0.8B * 2 = 1.6B parameters of new layers for fine-tuning; then, with some assumptions, a ballpark is that this could be good for about 16 million articles.
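
For anyone who wants to poke at that ballpark, here is a rough sketch of the arithmetic in Python. The comment does not spell out its assumptions, so the code only derives the ratios the stated figures imply; the function names are made up for illustration, and nothing here says whether roughly 100 parameters per article is a realistic capacity estimate.

```python
# Back-of-envelope check of the figures quoted in the comment.
# Only the three constants below come from the thread; everything
# derived from them is an implied ratio, not a measured capacity.

BASE_PARAMS = 8_000_000_000    # the 8B base model
NEW_PARAMS = 1_600_000_000     # 0.8B * 2 of added layers, per the comment
ARTICLES = 16_000_000          # the claimed fine-tuning budget in articles


def implied_params_per_article(new_params: int, articles: int) -> float:
    """Parameters of added capacity available per article."""
    return new_params / articles


def growth_fraction(base_params: int, new_params: int) -> float:
    """How much the model grows relative to its original size."""
    return new_params / base_params


if __name__ == "__main__":
    per_article = implied_params_per_article(NEW_PARAMS, ARTICLES)
    growth = growth_fraction(BASE_PARAMS, NEW_PARAMS)
    print(f"implied parameters per article: {per_article:.0f}")
    print(f"model growth from new layers: {growth:.0%}")
```

So the ballpark amounts to budgeting about 100 new parameters per article while growing the model by 20%. Changing `ARTICLES` or `NEW_PARAMS` shows how sensitive the estimate is to whatever per-article capacity one assumes.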
