Small: runs on an average laptop that isn't optimized for LLM inference, e.g. Gemma 3 4B.
Medium: runs on a very high-spec computer that people can buy for less than 5k. 30B or 70B dense models, or larger MoEs.
Large: Models that big LLM providers sell as "mini", "flash", ...
Extra Large / SOTA: Gemini 2.5 Pro, Claude Opus 4, OpenAI o3, ...
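As a rough way to place a model in one of these tiers, here is a back-of-the-envelope sketch of the memory needed for the weights alone: parameter count times bits per parameter. The function name and the precisions shown are illustrative assumptions, and the estimate ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights only, in GB (10^9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_param / 8
    return total_bytes / 1e9


# Sizes mentioned in the tiers above; 16-bit is unquantized half precision,
# 4-bit is a common quantization level (chosen here only for illustration).
for name, size_b in [("4B", 4), ("30B", 30), ("70B", 70)]:
    for bits in (16, 4):
        print(f"{name} @ {bits}-bit: ~{weight_memory_gb(size_b, bits):.0f} GB")
```

By this estimate a 4B model quantized to 4 bits needs only a couple of GB for weights, which is why it fits on an ordinary laptop, while a 70B dense model needs tens of GB even when quantized.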
These are typically small and performant both in compute and accuracy/utility from what I've seen.
I think that, with all the hype at the moment, AI/ML has sometimes become too synonymous with LLMs.
How is that a "language model"?
Working with time series data would work in that case.
This is the problem I have with the general discourse of "AI" even on Hacker News, of all places. Nothing you listed is an example of a *language model*.
All of those can be implemented as a simple "if", a decision tree, a decision table, or, in the case of cameras and time series prediction, actual ML.
Using an LLM is not just ridiculous here but totally the wrong fit and a waste of resources.
Time and labor are resources too. There's a whole host of problems where "good enough" is tremendously valuable.
I'll add one more: an LLM small enough that it can be trained from scratch on one A100 in 24 hours. Is it really small if it takes $10,000 to train? Or should we reserve that term for $200 models?
Back to your definitions, there are sub-1B models people are using. I think I saw one in the 400-600M range for audio. Another person posted here a 100M-200M model for extracting data from web pages. We told them to just use a rules-based approach where possible but they believed the SLM worked better.
Then, there's projects like BabyLM that can be useful at 10M:
Maybe resources needed for fine-tuning would be nice to see.
For those reasons, users might want to train a new model from scratch.
Researchers of training methods have a different problem. They need to see whether a new technique, like an optimization algorithm, gets better results. They can try techniques more quickly and for less money if they have small training runs that are representative of what larger models do. If BabyLM-10M were representative, they could test each technique at the FLOPS/$ of a 10M model instead of a 1B model.
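To put a rough number on that gap: a common rule of thumb (an approximation, not something stated in this thread) estimates training compute as about 6 × parameters × training tokens FLOPs. A minimal sketch under that assumption, holding the token budget fixed; the 1B-token budget is an arbitrary illustrative value:

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb training compute: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


tokens = 1e9  # hypothetical token budget, same for both runs

small = training_flops(10e6, tokens)  # 10M-parameter model
large = training_flops(1e9, tokens)   # 1B-parameter model

print(f"10M model: {small:.1e} FLOPs")
print(f"1B model:  {large:.1e} FLOPs")
print(f"ratio:     {large / small:.0f}x")
```

With the same data, the compute ratio is just the parameter ratio, so each experiment on the 10M model costs roughly a hundredth of the 1B run. Whether results at that scale actually transfer is the open question the comment raises.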
So, both researchers and users might want new models trained from scratch. The cheaper to train, the better.
Could you post a link to this comment or thread? I can't seem to find this model by searching, but would love to try it out.
100%. It has enough technical details that maybe a human did something. But who knows.
“tiny” can run on a microcontroller, “compact” on a Raspberry Pi, “small” on a phone, “medium” on a single-GPU machine, “large” on AI-class workstation hardware, and “huge” on a data center cluster.
Does this mean without a dedicated electric power plant?
I wanted to say "Right, big-sized. Do you want fries with that?", but I couldn't figure out how to work that in, so I won't say it.