1. 10Gb/s Ethernet: what I did to get it working in my home (gilesthomas.com) | 232 points | gpjt | 11d ago | 176 comments
3. LLM from scratch, part 33 – what I learned from the appendices (gilesthomas.com) | 5 points | gpjt | 18d ago | 0 comments
4. LLM from scratch (32l) – Interventions: updated instruction fine-tuning results (gilesthomas.com) | 1 point | gpjt | 20d ago | 0 comments
6. LLM from scratch, part 32k – Interventions: gradient accumulation (gilesthomas.com) | 2 points | gpjt | 25d ago | 0 comments
8. LLM from scratch, part 32j – trying to train a better model in the cloud (gilesthomas.com) | 2 points | gpjt | 1mo ago | 0 comments
9. Writing an LLM from scratch, part 32i – Interventions: what is in the noise? (gilesthomas.com) | 1 point | gpjt | 1mo ago | 0 comments
10. Writing an LLM from scratch, part 32h – Interventions: full fat float32 (gilesthomas.com) | 7 points | gpjt | 1mo ago | 0 comments
11. Writing an LLM from scratch, part 32g – Interventions: weight tying (gilesthomas.com) | 2 points | gpjt | 1mo ago | 0 comments
12. Writing an LLM from scratch, part 32f – Interventions: weight decay (gilesthomas.com) | 6 points | gpjt | 1mo ago | 0 comments
13. Writing an LLM from scratch, part 32e – Interventions: the learning rate (gilesthomas.com) | 3 points | gpjt | 2mo ago | 0 comments
14. Writing an LLM from scratch, part 32d – Interventions: adding attention bias (gilesthomas.com) | 6 points | gpjt | 3mo ago | 0 comments
15. Writing an LLM from scratch, part 32c – Interventions: removing dropout (gilesthomas.com) | 1 point | gpjt | 3mo ago | 0 comments