Also, there are better models than the one suggested: Mistral at the 7B-parameter scale, or Yi if you want to go larger and happen to have 32 GB of memory. Mixtral (MoE) is the best, but it currently requires too much memory for most users.
> TinyChatEngine provides an off-line open-source large language model (LLM) that has been reduced in size.
But then they download the models from Hugging Face. I don't understand how these are smaller. Or do they modify them locally?
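The usual answer is weight quantization: store the weights in fewer bits (int8/int4) plus a scale factor, instead of fp32/fp16. I don't know TinyChatEngine's exact scheme, but a minimal sketch of symmetric int8 quantization shows where the ~4x shrink versus fp32 comes from:

```python
import numpy as np

# Illustrative only: generic symmetric int8 quantization,
# not TinyChatEngine's actual format.
rng = np.random.default_rng(0)
weights = rng.standard_normal(1024).astype(np.float32)  # one fp32 weight row

# Store int8 values plus a single fp32 scale per row.
scale = np.abs(weights).max() / 127.0
q = np.round(weights / scale).astype(np.int8)

# Dequantize on the fly at inference time.
deq = q.astype(np.float32) * scale

print(weights.nbytes, q.nbytes)  # 4096 vs 1024 bytes: ~4x smaller
print(float(np.abs(weights - deq).max()))  # small reconstruction error
```

So the files on disk can genuinely be smaller than the originals on Hugging Face, whether the conversion happens upstream or locally.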
Turns out the original source is actually somewhat informative, including telling you how much hardware you need. This blog post looks like the typical note you leave for yourself to annotate a bit of your shell history.
Of course the main problem is that I don't know enough about the subject to reason on it on my own.
Their optimized models are not downloaded from HF but from Dropbox. I have no idea why.
Performance on my relatively old i5-8600 (6 cores at 3.10 GHz) with 32 GB of memory is about 150-250 ms per token on the default model, which is perfectly usable.
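For anyone who thinks in throughput rather than latency, converting those reported figures is just arithmetic:

```python
# Convert the reported latency range to tokens per second.
for ms_per_token in (150, 250):
    print(f"{ms_per_token} ms/token = {1000 / ms_per_token:.1f} tokens/s")
# 150 ms/token = 6.7 tokens/s
# 250 ms/token = 4.0 tokens/s
```

Roughly 4-7 tokens/s, which is around a comfortable reading speed, hence "perfectly usable".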