One of the reasons people build one, though, is to learn. Most smart folks are well aware that pre-training a real LLM is going to involve some banging their heads against the wall (i.e., things don't go as smoothly as in a "building an LLM from scratch" book), and they want to go through the process.
> Modify one thing at a time
> Change only one variable per ablation while keeping everything else constant. If you change multiple things and performance improves, you won’t know what caused it. Test modifications individually, then combine successful ones and reassess.
This is an unintentional microcosm of what is flawed with the document.
And even then. If you’re an IC and your boss is saying, “incrementalism at the level of planning experiments,” and the goal is research, quit, because you will fail.
Tumblr speak has a bunch of wacky things, notably "chimkin nuggers."