China has trained a 10 trillion parameter language model (twitter.com) · 4 points · MrUssek · 4y ago · 0 comments
Enigma: GPT-2 trained on 10K Nature Papers: Can you spot the difference? (stefanzukin.com) · 183 points · MrUssek · 4y ago · 105 comments
GShard: Scaling giant models with conditional computation and automatic sharding (arxiv.org) · 112 points · MrUssek · 5y ago · 35 comments