Those suggestions they make for a B200 start at $4.99 an hour.
Is that really required for starting out? I've been tinkering with my own from-scratch LLM, but in the early phases I don't need anything more than a 4090 on Vast.ai.
We were lucky enough to get Blackwell GPUs for Stanford students this year, which is why the writeups are written mostly around them.
- the hardware you need for a production use-case is relatively small, because production {models, bitstreams} have been heavily size-optimized, stripping out everything not needed to get a good result for the target use-cases
- but the hardware you need when tinkering/learning how to design {compute kernels, IP blocks} in the first place, must be quite a bit more powerful / higher-capacity, because your experiments will intentionally be the opposite of optimized: they'll be built for legibility / introspectability / debuggability at every level, which massively inflates and de-optimizes the resulting {model, bitstream}.
(And, to be clear here, "running someone else's finished model, which was designed and optimized to be used on something like a 4090, against your own prompt" is a kind of experimenting, which is cheap, in the same way that "deploying someone else's pre-baked FPGA bitstream, that was designed and synthesized for a $20 target FPGA, onto your own instance of that $20 FPGA, and then feeding your own input signals to it" is cheap. But that's not the kind of experimenting you'd be doing in this course while learning to design your own models!)
Coming back to the course, kudos to the course staff, including professors and TAs. They obviously put a ton of thought into designing the course, putting together slides that cover the latest updates in the field, and preparing the wonderful assignments. You get to create a real LM and explore other important parts of the LLM pipeline from small building blocks, validate each step, and see for yourself how everything comes together. You can really feel a sense of achievement after completing the assignments.
That said, while the staff obviously put a lot of effort into making this accessible to everyone, I wish they had made a bit more effort to clarify the environment requirements. Their harness works best on a Linux environment with an NVIDIA GPU, which researchers may take for granted but which is rare in a home computer setup. Their setup also expects specific CUDA versions and/or architectures. For following at home, the next best setup is Windows with WSL2 + an NVIDIA GPU, plus leased GPUs on various platforms, none of which is exactly trivial (or cheap, for that matter). It would be nice if the staff could put together a bit more guidance in that area, especially on how someone without any compatible GPU can make the most of the course. (One thing I learned is that if you use macOS and are not careful about memory usage, your Python code can freeze the machine and force a reboot.)
IMO the cost of renting GPUs is a bit overstated in these comments. Generally, almost all of the development can be done locally and then run for a short period of time on on-demand GPUs. For assignment 1, you can run everything on your local machine, even if you don't have a GPU. For A1 and A2, you can do (most of) the tasks with only a few hours of renting. If you use rental GPUs throughout without being too careful, you'll spend around $200 on compute, but you can easily get this under $50 if you're willing to scale down many of the problems. I think we could work on making this clear and charting what these changes are.
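To put those budget figures in terms of rental time, here is a rough back-of-the-envelope sketch. It assumes a flat hourly rate and uses the $4.99/hour B200 starting price quoted earlier in the thread; the function name `rental_hours` is made up for illustration, and real bills also include storage, idle time, and setup overhead that this ignores.

```python
def rental_hours(budget_usd: float, hourly_rate_usd: float) -> float:
    """Hours of on-demand GPU time a budget buys at a flat hourly rate."""
    if hourly_rate_usd <= 0:
        raise ValueError("hourly rate must be positive")
    return budget_usd / hourly_rate_usd


# Assumption: the B200 starting price mentioned in this thread (USD per hour).
B200_RATE = 4.99

if __name__ == "__main__":
    # The two budgets discussed above: "not too careful" vs. "scaled down".
    for budget in (200, 50):
        hours = rental_hours(budget, B200_RATE)
        print(f"${budget} at ${B200_RATE}/h buys about {hours:.1f} GPU-hours")
```

At that rate, the $200 figure corresponds to roughly 40 GPU-hours and the $50 figure to roughly 10; a cheaper card changes the hours proportionally.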
If you have further feedback or encounter problems, feel free to open issues in the repos so we can resolve them! It's hard for us to fix issues we're not aware of.
GPU cost: most of us will spend at least a few hours troubleshooting to get started on a leased GPU, including but not limited to figuring out how much storage is needed, whether the CUDA version works, etc. Going without a GPU is definitely possible but difficult. Plus, one issue might be that most of us just don't have enough experience working with them, so we spend more time figuring things out.
Github issues -- noted, will create any issue that I can think of.
I have a 5080 16GB. Does this course really need more than that?
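One way to get a feel for whether 16 GB is enough is a rough lower bound on training memory. The sketch below assumes plain fp32 training with AdamW, which keeps about 16 bytes per parameter (4 for the weight, 4 for the gradient, 8 for the two optimizer moment estimates). It ignores activations, which depend on batch size and sequence length and can dominate. The parameter counts are hypothetical examples, not the sizes the course assignments actually use.

```python
def training_state_gib(num_params: int, bytes_per_param: int = 16) -> float:
    """Lower bound on memory for weights, gradients, and AdamW state, in GiB.

    The default of 16 bytes/param assumes fp32 weights (4), fp32 gradients (4),
    and two fp32 AdamW moment estimates (8). Activations are NOT included.
    """
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    # Hypothetical model sizes, chosen only to show the scaling.
    for name, n in [("100M", 100_000_000), ("1B", 1_000_000_000)]:
        print(f"{name} params: about {training_state_gib(n):.1f} GiB before activations")
```

By this estimate a model in the hundred-million-parameter range leaves plenty of headroom on a 16 GB card, while one around a billion parameters nearly fills it before any activations are stored.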
Would be great to have a community to discuss the material - even if folks can't commit to the full course.
I want something like a casual, LessWrong-style, from-the-ground-up explanation.
Gives you the basics on LLM internals in about 90 minutes and includes an already built model in JavaScript that you can step through in browser devtools to get as detailed as you want.
Assignment 1 (basics) has the most hours of preparation invested in it, and only minor modernization/bug fixes were necessary this year.
> Machine Learning (e.g. CS221, CS229, CS230, CS124, CS224N) You should be comfortable with the basics of machine learning and deep learning.
Anyone have a good implementation-heavy self-study resource for those topics, or experience with the recorded lectures for those Stanford courses?
Course: https://web.stanford.edu/class/archive/cs/cs224n/cs224n.1246...
Lecture videos: https://www.youtube.com/playlist?list=PLoROMvodv4rOaMFbaqxPD...
Textbook: https://web.stanford.edu/~jurafsky/slp3/
The one issue I had with the CS336 course was the delivery of the RL components. I liked Lectures 5 & 6 from CME 295 better.
https://cme295.stanford.edu/syllabus/
I’ve heard good things about the diffusion models class as well - CME 296. Seems like a good next step.
I was able to reproduce the results of the original GPT-1 paper with my gaming PC. I don't even have a lot of VRAM. My NVIDIA GeForce RTX 2060 SUPER was able to reproduce most of the results with just 1 hour of training. I would totally recommend doing the same if you are interested in pre-training LLMs.
The code is here: https://github.com/epoyraz/modded-gpt-1 But, you can also just ask Claude 4.8 or Codex 5.5
Started with Word2Vec, built an RNN, then LSTM and am halfway through building transformer architecture.
AI Agent Guidelines for CS336 at Stanford https://github.com/stanford-cs336/assignment1-basics/blob/ma... (https://news.ycombinator.com/item?id=48359232)