Retentive Network: A Successor to Transformer for Large Language Models (arxiv.org) | arXiv | 112 | sangel | 2y ago | 19