The Smol Training Playbook: The Secrets to Building World-Class LLMs

(huggingface.co)

265 points · kashifr · 7mo ago · 19 comments

17 comments · 4 top-level

lewtun · 7mo ago · 5 in thread

Hi, Lewis here (one of the co-authors). Happy to answer any questions people have about the book :)

troelsSteegin · 7mo ago

This was a good read. I was struck by the quantity of nuanced, applied know-how it took to build SmolLM3. I am curious about the rough cost to engineer and train SmolLM3: ~400 GPUs for at least a month and, based on the set of book co-authors, 12 engineers for at least three months. Is $3-5M a fair ballpark number? The complement is how much experience, on average, the team members had doing ML and LLM training at scale before SmolLM3. The book is "up" on recent research, so I am surmising a PhD-centric team, each member with multiple systems built. This is not a commodity skill. What the book suggests to me is that an LLM applications startup would do best to focus on understanding the scope and know-how for starting from post-training.
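
A rough sketch of the arithmetic behind that kind of ballpark, in Python. Only the GPU count, the one-month run, and the 12 engineers for three months come from the comment above; the hourly GPU price and the monthly cost per engineer are hypothetical placeholders, not figures from the book or from Hugging Face.

```python
# Back-of-envelope cost sketch. The four constants are the figures quoted in
# the comment; every rate passed to estimate_cost() is a made-up assumption.
GPUS = 400
TRAINING_DAYS = 30          # "at least a month"
ENGINEERS = 12
ENGINEERING_MONTHS = 3      # "at least three months"


def estimate_cost(gpu_hour_usd, engineer_month_usd):
    """Return (compute_usd, people_usd, total_usd) for the given rates."""
    gpu_hours = GPUS * TRAINING_DAYS * 24
    compute = gpu_hours * gpu_hour_usd
    people = ENGINEERS * ENGINEERING_MONTHS * engineer_month_usd
    return compute, people, compute + people


# Hypothetical rate pairs, chosen only to show how the total moves.
for gpu_rate, eng_rate in [(2.0, 20_000), (4.0, 40_000)]:
    compute, people, total = estimate_cost(gpu_rate, eng_rate)
    print(f"GPU ${gpu_rate}/h, engineer ${eng_rate}/mo: "
          f"compute ${compute:,.0f} + people ${people:,.0f} = ${total:,.0f}")
```

Since the inputs are all "at least" figures, the result is a floor under whatever rates you plug in; it leaves out anything the comment does not mention.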

danielmarkbruce · 7mo ago

I'm a little ways through this and it's great so far, nice job.

One of the reasons people build one, though, is to learn. Most smart folks are quite aware that pre-training a real LLM is going to involve some banging of heads against the wall (i.e., things don't go as smoothly as in a "building an LLM from scratch" book), and they want to go through the process.

matusp · 7mo ago

Really impressive writeup. In your opinion, how long will this stay up to date? The field is constantly evolving; do you plan to keep updating this document?

lewtun · 7mo ago

Thanks! I expect the book will remain relevant as long as the Transformer architecture does. That’s why we mostly focus on topics we think will stand the test of time, but let’s see how that plays out :)

danielmarkbruce · 7mo ago

Finished. Great write up.

tsenturk · 7mo ago · 3 in thread

Hugging Face is not just an AI information-sharing website; it’s also a great learning platform for all AI learners. This documentation is one of the most impressive hands-on resources I’ve ever read.

abossy · 7mo ago

What others would you recommend that are comparable in quality?

pixelmelt · 7mo ago

Been reading a book by u/fpham, "The Cranky Man's Guide to LoRA and QLoRA", and it's pretty great. The writing quality isn't all there, but the content is valuable for learning to make good finetunes.

donkeyboy · 7mo ago

The documentation for common AI packages is pretty good too. For example, the PyTorch docs, PEFT docs, and timm docs.

doctorpangloss · 7mo ago · 3 in thread

I really like the Hugging Face guys, but...

> Modify one thing at a time

> Change only one variable per ablation while keeping everything else constant. If you change multiple things and performance improves, you won’t know what caused it. Test modifications individually, then combine successful ones and reassess.

This is an unintentional microcosm of what is flawed with the document.

CamperBob2 · 7mo ago

What's wrong with it? That's good advice in almost any optimization or troubleshooting context where variables may interact.

yorwba · 7mo ago

One problem with testing one change at a time is that if you can only run a small number of experiments because each one requires many GPU hours to get results, you can also only test a small number of changes. If you can come up with and implement new changes much more easily than you can test them, it would be more efficient to test multiple changes at a time and use some form of Bayesian optimization to find the best combination of changes with as few experiments as possible.
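
To make "several changes per run" concrete, here is a minimal sketch in Python. Note that it is not Bayesian optimization; it swaps in a simpler technique, a two-level screening design built from a Hadamard matrix, where every run toggles about half of the candidate changes and each change's effect is read off from all runs at once. The `run_experiment` function, the baseline score of 50, and the effect sizes are hypothetical stand-ins for real training runs, and the estimate is only valid under the assumption that changes act additively, which is exactly the interaction concern raised above.

```python
import random


def hadamard(n):
    """Sylvester construction of an n x n matrix of +1/-1 (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def run_experiment(toggles, effects, noise_sd, rng):
    """Hypothetical stand-in for a training run: additive effects plus noise."""
    base = 50.0  # made-up baseline score
    gain = sum(e for on, e in zip(toggles, effects) if on)
    return base + gain + rng.gauss(0, noise_sd)


def screen(effects, noise_sd=0.0, seed=0):
    """Estimate each change's effect from runs that toggle many changes at once."""
    rng = random.Random(seed)
    k = len(effects)
    n = 1
    while n < k + 1:          # smallest power of two with at least k + 1 runs
        n *= 2
    # Drop the all-ones first column; the remaining columns are balanced
    # (half +1, half -1) and mutually orthogonal.
    design = [row[1:k + 1] for row in hadamard(n)]
    scores = [
        run_experiment([x == 1 for x in row], effects, noise_sd, rng)
        for row in design
    ]
    # Effect of change j = mean score with it on minus mean score with it off.
    return [
        2 * sum(row[j] * y for row, y in zip(design, scores)) / n
        for j in range(k)
    ]


true_effects = [0.5, -0.2, 0.0, 1.0, 0.3, -0.4, 0.1]  # hypothetical
print([round(e, 3) for e in screen(true_effects, noise_sd=0.2)])
```

With seven candidate changes this uses eight runs, the same count as a baseline plus seven one-at-a-time runs, but each estimate averages over all eight runs instead of comparing two, so it is less sensitive to run-to-run noise. If changes interact, the additive assumption breaks and the estimates can mislead.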

doctorpangloss · 7mo ago

It’s advice for being an individual contributor, not a researcher.

And even then. If you’re an IC and your boss is saying, “incrementalism at the level of planning experiments,” and the goal is research, quit, because you will fail.

forgingahead · 7mo ago · 2 in thread

Where does "Smol" come from? It's supposed to mean "Small", right? If so, what's the etymology, and why is it so widely used?

potsandpans · 7mo ago

It's just internet speak from the days of Tumblr. It usually has cutesy connotations.

Tumblr speak has a bunch of wacky things, notably "chimkin nuggers."

lewtun · 7mo ago

In the specific case of SmolLM, it originates from the meme in this dataset: https://huggingface.co/datasets/bigcode/the-stack-smol
