> and blowing up parameter count to make up for it
Based on an (admittedly rapid and indulgent) reading of the paper, it seems like they're not increasing the parameter count. Do you mind pointing out where the blowup is occurring?
They're saying that models of comparable size will likely perform worse (the paper claims they perform as well).
But since they are more memory efficient (up to 8-10x if the ternary weights are packed tighter than 2 bits each; in practice it seems more like 3-5x once the other, larger structures needed in memory are accounted for), the largest models can be that much larger.
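For context on where sub-2-bit packing comes from: a naive 2-bit encoding wastes one of its four codes on ternary weights {-1, 0, +1}, but grouping trits recovers most of that, e.g. five trits per byte gives 8/5 = 1.6 bits per weight (the theoretical floor is log2(3) ≈ 1.585). A minimal sketch of that scheme; the function names are my own illustration, not from the paper:

```python
# Sketch: pack ternary weights {-1, 0, +1} at 1.6 bits each by
# encoding 5 trits per byte (3^5 = 243 <= 256), vs. 2 bits naively.

def pack_trits(weights):
    """Pack a list of ternary weights into bytes, 5 trits per byte."""
    out = bytearray()
    for i in range(0, len(weights), 5):
        group = weights[i:i + 5]
        val = 0
        for w in reversed(group):
            val = val * 3 + (w + 1)  # map {-1,0,1} -> base-3 digit {0,1,2}
        out.append(val)
    return bytes(out)

def unpack_trits(data, n):
    """Recover the first n ternary weights from packed bytes."""
    weights = []
    for byte in data:
        for _ in range(5):
            weights.append(byte % 3 - 1)  # digit {0,1,2} -> weight {-1,0,1}
            byte //= 3
    return weights[:n]
```

So 7 weights fit in 2 bytes instead of the 14 bytes an fp16 layout would take, which is where the roughly 8-10x headline figure comes from before activations, KV cache, and other runtime structures eat into it.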