2. The Practitioner's Guide to the Maximal Update Parameterization (blog.eleuther.ai) | 1 | tipsytoad | 1y ago | 0
3. DenseFormer: Enhancing Information Flow in Transformers (arxiv.org) | arXiv | 123 | tipsytoad | 2y ago | 33