What free resources are available and recommended in the "from scratch vein"?
then https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-k... for adding a KV cache implementation
It's much more accessible to regular developers, and doesn't make assumptions about any kind of mathematics background. It's a good starting point, after which other similar resources start to make more sense.
Plenty of other people have this understanding of these topics, and you know what they chose to do with that knowledge? Keep it to themselves and go work at OpenAI to make far more money keeping that knowledge private.
If you want to live in a world where this knowledge is open, at the very least refrain from publicly complaining about a book that cost roughly the same as a decent dinner.
I would have expected the main target audience to be people NOT working in the AI space, that don’t have any prior knowledge (“from scratch”), just curious to learn how an LLM works.
import torch
From the first code sample, not quite from scratch :-)
Nobody working in this space is hand calculating derivatives for these models. Thinking in terms of differentiable programming is a given and I think certainly counts as "from scratch" in this case.
Any time I see someone post a comment like this, I suspect they don't really understand what's happening under the hood or how contemporary machine learning works.
I have to disagree on that being an obvious assumption for the meaning of "from scratch", especially given that the book description says that readers only need to know Python. It feels like if I read "Crafting Interpreters" only to find that step one is to download Lex and Yacc because everyone working in the space already knows how parsers work.
> I suspect they don't really understand what's happening under the hood or how contemporary machine learning works.
Everyone has to start somewhere. I thought I would be interested in a book like this precisely because I don't already fully understand what's happening under the hood, but it sounds like it might not actually be a good starting point for my idea of "from scratch."
Going from PyTorch to LLMs has a lot to show even without the Python-to-PyTorch part. It reminds me of "Neural Networks: Zero to Hero" by Andrej Karpathy https://m.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9... Prerequisites: solid programming (Python), intro-level math (e.g. derivative, gaussian). https://karpathy.ai/zero-to-hero.html
Do you (or others) have good resources explaining what they are and how they work at a high level?
import universe
first. from transformers import
My goal is to have something learn to land, like a lunar lander. Simple, start at 100 feet, thrust in one direction, keep trying until you stop making craters.
Then start adding variables, such as now it's moving horizontally, adding a horizontal thruster.
Next, remove the horizontal thruster and let the lander pivot.
Etc.
I just have no idea how to start with this, but this seems "mainstream" ML, curious if this book would help with that.
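One way to get started on the simplest version (start at 100 feet, one thruster, keep trying) is sketched below. This is plain random search over a single policy parameter, not a real RL algorithm like the ones covered in the resources mentioned in this thread. The physics constants, the `brake_speed` policy, and the function names are all made up for illustration.

```python
import random

GRAVITY = -1.6   # assumed downward acceleration (units per second^2)
THRUST = 3.0     # assumed upward acceleration while the engine fires
DT = 0.1         # simulation time step in seconds


def run_episode(brake_speed, start_height=100.0, max_steps=5000):
    """Drop the lander once and return minus the touchdown speed (higher is better)."""
    height, velocity = start_height, 0.0
    for _ in range(max_steps):
        # Hypothetical one-parameter policy: fire only when falling faster than brake_speed.
        thrust_on = velocity < -brake_speed
        accel = GRAVITY + (THRUST if thrust_on else 0.0)
        velocity += accel * DT
        height += velocity * DT
        if height <= 0.0:
            return -abs(velocity)  # a big impact speed means a crater
    return -1000.0  # never touched down: hovering forever counts as failure


def random_search(trials=50, seed=0):
    """Try random policy parameters and keep the one with the softest landing."""
    rng = random.Random(seed)
    best_param, best_score = None, float("-inf")
    for _ in range(trials):
        candidate = rng.uniform(0.0, 20.0)
        score = run_episode(candidate)
        if score > best_score:
            best_param, best_score = candidate, score
    return best_param, best_score


free_fall_score = run_episode(brake_speed=float("inf"))  # engine never fires
best_param, best_score = random_search()
print("free fall:", free_fall_score, "best found:", best_param, best_score)
```

Adding the horizontal thruster or the pivot would mean more state variables and more policy parameters, which is roughly the point where a proper RL method starts to pay off over blind search.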
[0]: https://www.manning.com/books/grokking-deep-reinforcement-le...
[1]: https://gymnasium.farama.org/environments/box2d/
[2]: https://github.com/DevJac/learn-pytorch/blob/main/SAC.ipynb
Sutton and Barto's Reinforcement Learning: An Introduction is widely considered the definitive intro to the topic.
If I had to recommend a curriculum to a friend I would say:
(1) Spend a few hours on Spinning Up.
(2) If the mathematical notation is intimidating, read Grokking Deep Reinforcement Learning (from Manning), which is slower paced and spends a lot of time explaining the notation itself, rather than just assuming the mathematical notation is self-explanatory as is so often the case. This book has good theoretical explanations and will get you some running code.
(3) Spend a few hours with Spinning Up again. By this point you should be a little comfortable with a few different RL algorithms.
(4) Read Sutton's book, which is "the bible" of reinforcement learning. It's quite approachable, but it would be a bit dry and abstract without some hands-on experience with RL I think.
To learn more about RL, most people would advise the Sutton and Barto book, available at: http://incompleteideas.net/book/the-book-2nd.html
- it implements a real word-level LLM instead of a character-level LLM
- after pretraining also shows how to load pretrained weights
- instruction-finetune that LLM after pretraining
- code the alignment process for the instruction-finetuned LLM
- also show how to finetune the LLM for classification tasks
- the book overall has a lot of figures. Chapter 3 alone has 26 figures :)
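To make the word-level vs. character-level distinction in the first bullet concrete, here is a tiny sketch. It uses a naive regex split into words and punctuation, which is an assumption for illustration only and not necessarily the tokenizer the book uses.

```python
import re

text = "The lander landed. The lander rested."

# Character-level: every distinct character is a token.
char_vocab = {ch: i for i, ch in enumerate(sorted(set(text)))}
char_ids = [char_vocab[ch] for ch in text]

# Word-level (naive): split into runs of word characters and single punctuation marks.
words = re.findall(r"\w+|[^\w\s]", text)
word_vocab = {w: i for i, w in enumerate(sorted(set(words)))}
word_ids = [word_vocab[w] for w in words]

print(len(char_ids), "character tokens, vocab size", len(char_vocab))
print(len(word_ids), "word tokens, vocab size", len(word_vocab))
print(word_ids)  # → [1, 3, 2, 0, 1, 3, 4, 0]
```

The word-level sequence is much shorter for the same text, at the cost of a vocabulary that grows with every new word.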
The video looks awesome though. I think it's probably a great complementary resource to get a good solid intro because it's just 2 hours. I think reading the book will probably be more like 10 times that time investment.
I've watched it many times to understand most of it well.
And obviously you must already know pytorch really well, including the matrix multiplication, backpropagation etc. He speaks very fast too...
In my opinion he covers everything needed to understand his lectures. Even broadcasting and multidimensional indexing with numpy.
Also in the first lecture you will implement your own python class for building expressions including backprop with an API modeled after PyTorch.
IMHO it is the second lecture series I can recommend without hesitation. The other is Gilbert Strang on linear algebra.
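For anyone wondering what "your own python class for building expressions including backprop" looks like, here is a minimal sketch in that spirit. It is not the code from the lecture; the class name `Value` and its attributes are chosen for illustration, and only `+` and `*` are supported.

```python
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # leaf nodes have nothing to propagate

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Visit nodes in reverse topological order so each node's gradient
        # is complete before it is pushed to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()


x = Value(3.0)
w = Value(2.0)
y = x * w + x      # y = 3*2 + 3
y.backward()
print(y.data, x.grad, w.grad)  # → 9.0 3.0 3.0
```

Here dy/dx = w + 1 = 3 and dy/dw = x = 3, which is what the chain rule gives by hand. PyTorch's autograd does the same bookkeeping on tensors instead of scalars.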
Now, the secondary goal is, of course, also to help people with building their own LLMs if they need to. The book will code the whole pipeline, including pretraining and finetuning, but I will also show how to load pretrained weights because I don't think pretraining an LLM yourself is financially feasible. We are coding everything from scratch in this book using a GPT-2-like LLM (so that we can load the weights for models ranging from the 124M one that runs on a laptop to the 1558M one that runs on a small GPU). In practice, you probably want to use a framework like HF transformers or axolotl, but I hope this from-scratch approach will demystify the process so that these frameworks are less of a black box.
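As a rough sanity check on the 124M and 1558M figures, here is a sketch that counts parameters for a GPT-2-style model. The hyperparameters (vocabulary of 50257, context length 1024, widths 768 and 1600, 12 and 48 layers) are taken from the publicly released GPT-2 configurations rather than from this thread, the output head is assumed to share weights with the token embedding, and `gpt2_param_count` is a made-up helper name.

```python
def gpt2_param_count(d_model, n_layers, vocab_size=50257, context_len=1024):
    """Count weights and biases in a GPT-2-style decoder with a tied output head."""
    token_emb = vocab_size * d_model
    pos_emb = context_len * d_model

    # Attention: fused query/key/value projection plus the output projection.
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    # Feed-forward: expand to 4 * d_model, then project back.
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    # Two layer norms per block, each with a scale and a shift vector.
    layer_norms = 2 * 2 * d_model
    final_norm = 2 * d_model

    return token_emb + pos_emb + n_layers * (attn + mlp + layer_norms) + final_norm


small = gpt2_param_count(d_model=768, n_layers=12)
xl = gpt2_param_count(d_model=1600, n_layers=48)
print(small)  # → 124439808
print(xl)     # → 1557611200
```

At 4 bytes per float32 parameter that is roughly 0.5 GB for the small model and a bit over 6 GB for the largest one, which lines up with "laptop" versus "small GPU".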
>Introducing MPT-7B, the first entry in our MosaicML Foundation Series. MPT-7B is a transformer trained from scratch on 1T tokens of text and code. It is open source, available for commercial use, and matches the quality of LLaMA-7B. MPT-7B was trained on the MosaicML platform in 9.5 days with zero human intervention at a cost of ~$200k
Just wondering, are you going to include a specific section or chapter on RAG in your LLM book? I think it would be a very welcome addition for the build-your-own-LLM crowd.
The fine tuning guide is the best resource so far https://ravinkumar.com/GenAiGuidebook/language_models/finetu...
Is there a way for readers to give feedback on the book as you write it?
Do you have an ETA for the completion of the book?
In the meantime, do you know any other free/paid resource that comes close to what you are trying to achieve with this book?
I'm not interested in language models specifically, but there are techniques involved with language models I would like to understand better and use elsewhere. For example, I know "attention" is used in a variety of models, and I know transformers are used in more than just language models. Will this book help me understand attention and transformers well enough that I can use them outside of language models?
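On the question of using attention outside language models: the core operation is not tied to text at all. Below is a minimal sketch of scaled dot-product attention in plain Python. It leaves out the learned query/key/value projections, multiple heads, and masking, so it is an illustration of the mechanism rather than what the book implements; the function names are chosen for this example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# The rows could be word embeddings, image patches, sensor readings, etc.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(x, x, x)  # self-attention: queries = keys = values
for row in weights:
    print([round(w, 3) for w in row])
```

Each output row is a weighted mix of all input rows, with the weights computed from the data itself. That is the piece that carries over to vision, audio, or time series models.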
(*Chapter 3 was submitted last week and should be online in the MEAP soon; in the meantime, the code along with the notes is also available here: https://github.com/rasbt/LLMs-from-scratch/blob/main/ch03/01...)