Train a LLM from Scratch

(github.com)

3 points · linhns · 1mo ago · 1 comment

Curious — how did you handle training stability early on? Was convergence an issue without heavy tuning?