1. Writing an LLM from scratch, part 32g – Interventions: weight tying (gilesthomas.com) | 2 | gpjt | 1d ago | 0
2. Writing an LLM from scratch, part 32f – Interventions: weight decay (gilesthomas.com) | 6 | gpjt | 2d ago | 0
3. Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com) | 3 | gpjt | 15d ago | 0
4. Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com) | 6 | gpjt | 1mo ago | 0
5. Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com) | 1 | gpjt | 1mo ago | 0
6. Writing an LLM from scratch, part 32b – Interventions: gradient clipping (gilesthomas.com) | 2 | gpjt | 1mo ago | 0
7. Writing an LLM from scratch, part 32a – Interventions: training a baseline model (gilesthomas.com) | 1 | gpjt | 1mo ago | 0
8. Getting a Custom PyTorch LLM onto the Hugging Face Hub (gilesthomas.com) | 1 | gpjt | 1mo ago | 0
9. Writing an LLM from scratch, part 31 – the models are now on Hugging Face (gilesthomas.com) | 2 | gpjt | 2mo ago | 0
10. Writing an LLM from scratch, part 30 – digging into the LLM-as-a-judge results (gilesthomas.com) | 1 | gpjt | 2mo ago | 0
11. LLM from scratch, part 29 – using DDP to train a base model in the cloud (gilesthomas.com) | 2 | gpjt | 2mo ago | 0
12. LLM from scratch, part 28 – training a base model from scratch on an RTX 3090 (gilesthomas.com) | 540 | gpjt | 3mo ago | 121
13. Writing an LLM from scratch, part 27 – what's left, and what's next? (gilesthomas.com) | 1 | gpjt | 4mo ago | 0
14. Writing an LLM from scratch, part 26 – evaluating the fine-tuned model (gilesthomas.com) | 4 | gpjt | 4mo ago | 0
15. Writing an LLM from scratch, part 25 – instruction fine-tuning (gilesthomas.com) | 2 | gpjt | 4mo ago | 0