Sample output:
Meta's Llama 3.1 models can output multilingual text, which is awesome. Here are some example outputs from the 8-bit quantized 8B model, limited to 100 tokens (-n 100)... Quantization does cause some brain damage.
English
My cat is funny. "Funny cat," I say, walking up to it. "What are you up to?" It sits up straight and looks at me with a tilted head, as if to say, "What's wrong with you?" Sometimes I just have to laugh at how funny a cat can be. So I say, "Okay, you're funny. I'll give you some treats." It stretches out a little and I give it some treats. It eats them up quickly and starts
German
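Besitzen Sie einen Amiga 500? Wenn nicht, werden Sie wissen, dass dies ein Computer war, der im späten 1980er und frühen 1990er Jahren für Spiele verfügbar war, die für Personen mit bestimmten Körperverletzungen gedacht waren. Manchmal müssen wir uns an frühere Zeiten erinnern, die, wie wir jetzt wissen, schwierig waren. Hier ist ein Link, der meine Geschichte bespre
(English: Do you own an Amiga 500? If not, you will know that this was a computer that was available in the late 1980s and early 1990s for games intended for people with certain physical injuries. Sometimes we have to remember earlier times which, as we now know, were difficult. Here is a link that discuss...)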
French
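Le vin français est, à bien des égards, un vin des origines, car il a joué un rôle important dans l'histoire de la France". La réputation des vins de France repose principalement sur leurs qualités gustatives et la gestion des vignobles contrôlée, ce qui rend le vin français un "produit d'exception". La France est donc leader mondial de la production de vin, avec 25 % des exportations mon
(English: French wine is, in many respects, a wine of origins, because it has played an important role in the history of France". The reputation of French wines rests mainly on their taste qualities and controlled vineyard management, which makes French wine an "exceptional product". France is therefore the world leader in wine production, with 25% of world exp...)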
Thai
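แมวของฉันตลกชอบเล่นบนม้วนกระดาษ และฉันก็ไม่แน่ใจว่าควรจะยินยอมที่จะให้เล่นหรือไม่
เมื่อเวลาผ่านไป ฉันเห็นว่าแมวของฉันเล่นม้วนกระดาษเป็นระยะ ๆ ฉันจึงตัดสินใจที่จะลองปรับเปลี่ยนเกมให้สนุกขึ้น
(English: My cat is funny; it likes to play on the paper roll, and I'm not sure whether I should let it play or not. As time went on, I noticed my cat playing with the paper roll from time to time, so I decided to try changing the game to make it more fun.)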
Hindi
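मेरी बिल्ली बहुत मज़ाया है और वह हमेशा अपनी शारीरिक गतिविधियों से मुझे मजाक करती है। वास्तव में, जब वह अपनी खिलौनों की चपपेट में आती है तो वह विशेष रूप से क्लासिक बन जाती है। इसके अलावा, वह एक छोटी सी च
(English: My cat is very funny and she always jokes with me through her physical activities. In fact, when she gets caught up in her toys she becomes especially classic. Besides that, she is a tiny...)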
This is kind of like 3rd-grade English. What would be required to go beyond that?
This is also how you'd communicate with a cat or dog. We [still] can't use long, complex sentences when communicating with them, and the LLM seems to nicely reproduce that style and mindset: the complete but bite-sized sentences you fall into when talking to your Good Boy or your Funny Cat.
Shut up and take my money!
Love the wording.
Can open a PR if people want :) [Edit: Just opened a PR! Apologies, my C is very rusty! https://github.com/trholding/llama2.c/pull/14]
The code at https://github.com/trholding/llama2.c/blob/master/runq.c#L65... needs to be scaled with some weird formula, like the one in https://github.com/unslothai/unsloth/blob/main/unsloth/model...
It is a program that, given a model file, a tokenizer file and a prompt, continues generating text from that prompt.
To get it to work, you need to clone and build this: https://github.com/trholding/llama2.c
So the steps are like this:
First, you'll need to obtain approval from Meta to download the Llama 3.1 models on Hugging Face.
Go to https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct, fill in the form, and then check your acceptance status at https://huggingface.co/settings/gated-repos. Once accepted, run the following to download the model, export it, and run it.
# Log in with a Hugging Face access token first (needed for gated repos)
huggingface-cli login
huggingface-cli download meta-llama/Meta-Llama-3.1-8B-Instruct --include "original/*" --local-dir Meta-Llama-3.1-8B-Instruct
git clone https://github.com/trholding/llama2.c.git
cd llama2.c/
# Export Quantized 8bit
python3 export.py ../llama3.1_8b_instruct_q8.bin --version 2 --meta-llama ../Meta-Llama-3.1-8B-Instruct/original/
# Fastest Quantized Inference build
make runq_cc_openmp
# Test Llama 3.1 inference, it should generate sensible text
./run ../llama3.1_8b_instruct_q8.bin -z tokenizer_l3.bin -l 3 -i " My cat"