No Train No Gain: Revisiting Efficient Training Algorithms for Transformer-Based Language Models

(arxiv.org)

11 points · froster · 2y ago · 1 comment

A recent paper highlights how difficult it is to create a new optimizer that works as a drop-in replacement. Sophia and Lion were recently proposed as superior alternatives to Adam, but they performed worse in an independent evaluation.
