Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a MacBook

(github.com)

673 points · antimatter15 · 3y ago · 283 comments

219 comments · 46 top-level

doublepg233y ago· 35 in thread

Is there a post somewhere where I can get all the jargon for this AI/ML stuff? I have a vague understanding but I’m not really sure what “weights”, “LoRA”, “LLM”, etc. are to really understand where each tool and concept fit in.

jazzyjackson3y ago

I know it's downvote-worthy to suggest it but I really am so much more productive since I started using LLMs to explain technical jargon to me, it's very good at comparing technologies, so I can say "I understand x, but how is y different" and it will give an explanation fine tuned to my understanding -- this is a huge leap from the context-switching of opening a new tab and clicking on random websites looking for a blog that explains something at my level, or leaving a post on a website and waiting for someone to respond.

thefourthchime3y ago

Same! ChatGPT has been out for months, and really only in the last few weeks have I started using it as much as I should. I'm using an Electron wrapper now, so it's a first-class desktop app. With the Plus subscription, the 3.5 is as fast as Google.

Steuard3y ago

That sounds like a cool use case, but... how can you bring yourself to trust anything at all that it tells you? This feels like it's just begging to contaminate your understanding with sparse, unpredictable errors, and personally I find the thought of that unexpectedly dismaying.

crazygringo3y ago

It's a lot easier to verify whether something is correct once you know what the thing is.

Half the time when I'm googling something I don't even know what subfield the term belongs to. I don't know what related terms I can use to narrow it down. Or I keep getting papers on a related but different question from what I'm trying to ask.

But if ChatGPT gives me an answer in context of a bunch of related terminology, it's so much easier to then plug those additional terms into Google and find a/the authoritative source.

For comparison, a lot of my (non-ChatGPT) initial searches will result in random blog entries or something that are often just flat-out wrong or else ignoring competing viewpoints. But they still might point me to better context and search terms.

The value of ChatGPT is that it gets you to information you can verify so much faster. But you still need to verify it the same way you need to verify stuff you find on blogs or Quora or Reddit or HN.

nr2x3y ago

Personally, I’ve been coding long enough that I’ve forgotten more than most junior people know. So I use ChatGPT to refresh my memory more than anything else. Like yesterday, I couldn’t remember how to open some ports on a Postgres server. ChatGPT gave me a 95% accurate answer and based on my own knowledge base I fixed the mistake. This took all of a few seconds. Whereas with normal web search it would take an order of magnitude longer and likely be not much more accurate.

taink3y ago

No need to trust it -- that's the best part.

I tend to trust blogs or conversations with other people more than I should. The benefit of a tool like ChatGPT is that I already distrust it, and I tend to double-check more often what it responds.

I agree with you, but the comparison and fine-tuning GP mentioned are really areas where it's an unparalleled tool right now. Dismissing it in fear of "contaminating your understanding" is missing out on what the tool may offer to you. It's not really that much worse than a search query.

beepbooptheory3y ago

This has been really the most productive use of it for me. But I'm so cheap I've been stopping myself more and more and just spending an extra minute googling to save the cent. Even if it's just literally penny pinching, it's also helping the environment! Is the velocity I get an answer worth the cycles? I'm not so sure these days, but it's been interesting reflecting on this.

Definitely excited for the inevitable reports comparing the kilowatts spent cumulatively to get an answer through Google vs OpenAI. How many blogs visited does it take to spend the same amount of energy getting the answer from a model?

lxe3y ago

I was like this a week ago. Basically, weights are numbers you feed to each neuron in the model, LoRA is a technique to adjust only some of the weights so you can fine-tune the model fast and on cheaper hardware, and LLM is a "large language model".

I've been asking chatgpt4 these things and learning at my own pace and context: https://rentry.co/vafkn

angch3y ago

What's fun is that the LoRA in recent references stands for Low-Rank Adaptation [1], not Low-Resource Adapters [2] (??? don't know if this even exists), but because Low-Rank Adaptation came out in 2021, chatgpt only explains what Low-Resource Adapters is, not Low-Rank Adaptation, which is what is being used in recent breakthroughs.

My own questioning of chatgpt on LoRA returns "Logistic Regression Algorithm" instead. Looks like it's too new to ChatGPT.

[1] https://arxiv.org/abs/2106.09685 [2] https://rentry.co/vafkn

stu2b503y ago

Well, it’s more that the weights are the neurons. There’s not actually like neuron objects defined or anything, neural networks are just a bunch of matrix operations. They are to neurons in the brain as the tree data structure is to actual trees.

ec1096853y ago

This is a truly great writeup: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

malborodog3y ago

Is it? Why does everyone dunk on him?

tough3y ago

weights refer to the trained model weights like for example stable diffusion's v1.1, v1.2 .. v1.4 v.2.x etc. Same with llama having 13B up to 65B parameters (different weights)

LLM refers to large language model; in contrast with diffusion models or GAN models, the text models are the ones that take text and autocomplete it, like the GPT family, open source BLOOM, and now LLaMA from Facebook.

LoRA is the latest, most efficient fine-tuning method for teaching concepts or styles on top of the general models, so you can have custom models on top, like embeddings or fine-tuning for LLMs. So you had Textual Inversion, Dreambooth, and LoRA in that category.

what else do you need? Googling or asking chatgpt can help a lot too

JasonZ23y ago

> weights refer to the trained model weights

This is what I'm having a hard time understanding.

So there's the weights, and also a model somewhere? That the weights are based on? Or that you combine with the model to tune it?

hervature3y ago

Let's take a step back. You have a model like linear regression. For example, y=bx where y are your outputs and x are your inputs. Based on some data, you learn that b=1. Therefore, you share the weights of the model as a file like {b=1} and also share with them the model y=bx (usually shared via code) so they can run it in production.
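A minimal sketch of that split between model and weights, assuming the same one-parameter regression y = bx. The function names, the training data, and the JSON format are illustrative assumptions, not anything from a real library:

```python
import json

def model(x, weights):
    """The model: the fixed formula y = b * x. It holds no numbers of its own."""
    return weights["b"] * x

def train(xs, ys):
    """Least-squares fit of b for y = b * x (no intercept term)."""
    b = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return {"b": b}

# Hypothetical training data in which y always equals x, so b comes out as 1.
weights = train([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])

# The "weights file" is just the learned numbers, serialized.
weights_file = json.dumps(weights)

# Someone else loads the file and runs it through the same model code.
loaded = json.loads(weights_file)
prediction = model(4.0, loaded)
print(weights_file, prediction)
```

The code stays the same for everyone; only the contents of the weights file change when the model is retrained.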

ALittleLight3y ago

My really simplified explanation is:

Your inputs are lists of numbers. Your outputs are lists of numbers. There exists some possible list of numbers such that, if you multiply your inputs by that list you'll get (approximately) the outputs.

In this conception that possible set of numbers is the weights. "Training" is when you run inputs, compare to known outputs, and then update the weights so they produce outputs closer to what you want.

Large Language Models, it may be hard to see how they fit this paradigm - basically convert a sequence to a list of numbers ('aardvark' is 1, 'apple' is 2 etc) and then the desired output is the next word in the sequence (represented as a number). Surprisingly, if you get good at predicting next word in sequence you also get the ChatGPT et al behavior.
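A toy sketch of the "words become numbers, then predict the next number" idea. The sample text is made up, and counting which word follows which is only a stand-in for training; a real LLM learns its weights by gradient updates, not by counting:

```python
from collections import Counter, defaultdict

text = "the cat sat on the mat the cat ate"
words = text.split()

# Map each distinct word to a number (ids start at 0 here).
vocab = {word: i for i, word in enumerate(sorted(set(words)))}
id_to_word = {i: word for word, i in vocab.items()}
ids = [vocab[word] for word in words]

# "Training": count which id follows which id in the sequence.
follow_counts = defaultdict(Counter)
for current, nxt in zip(ids, ids[1:]):
    follow_counts[current][nxt] += 1

def predict_next(word):
    """Return the word seen most often after `word` in the toy text."""
    best_id, _ = follow_counts[vocab[word]].most_common(1)[0]
    return id_to_word[best_id]

print(ids)
print(predict_next("the"))
```

In this text "the" is followed by "cat" twice and "mat" once, so the prediction after "the" is "cat".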

mlboss3y ago

model is class with params. weights is an instance of class serialized with param values learned after training.

tantony3y ago

This is what happens when running inference on a neural network:

Input (list of numbers) -> (Bunch of math operations) with (other numbers) -> Output (also a list of numbers)

This applies whether you are talking about image classification, image generation, text generation etc.

The model defines what the "(Bunch of math operations)" part is. As in, do these multiplications, then add, then a tanh operation etc.

The weights define what the "(other numbers)" are. Training is the process of figuring out these weights using various methods - some of which involve example inputs/outputs (supervised learning), others don't require examples (unsupervised or self-supervised learning).
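As a rough sketch of that pipeline, here is a one-layer example where the model is "multiply, add, tanh" and the weights are the other numbers. The weight values are made up for illustration and do not come from any trained network:

```python
import math

def forward(inputs, weights):
    """The model: multiply by a matrix, add a bias, apply tanh."""
    outputs = []
    for row, bias in zip(weights["matrix"], weights["bias"]):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(math.tanh(total))
    return outputs

# The weights: made-up numbers standing in for values found by training.
weights = {"matrix": [[0.5, -0.5], [1.0, 1.0]], "bias": [0.0, -1.0]}

result = forward([1.0, 1.0], weights)  # list of numbers in, list of numbers out
print(result)
```

Swapping in a different `weights` dictionary changes the outputs without touching `forward`, which is the distinction being drawn above.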

dymk3y ago

Model is code, weights are the input data to that code

visarga3y ago

Just for quick reference:

LLM - the model

weights - the data of the model itself

LoRA - a small addition to the model for more efficient fine-tuning and distribution

EGreg3y ago

Tell us more about LoRA

visarga3y ago

Instead of a matrix N*N they use two matrices, N*d and d*N, and assuming d << N it would be like a lightweight addition. Keeping the original matrix unchanged, this new addition is simply added in parallel and initialised to low values.
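A small pure-Python sketch of that low-rank addition, under stated assumptions: the sizes are tiny, the matrix values are made up, and the names `left` and `right` are illustrative. One factor starts at zero here, as in the LoRA paper, so the adapted layer initially equals the original:

```python
def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    """Element-wise sum of two matrices; returns a new matrix."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

N, d = 4, 1  # in practice d << N, e.g. N = 4096 and d = 8

# Frozen original weights (identity matrix as a placeholder).
W = [[1.0 if i == j else 0.0 for j in range(N)] for i in range(N)]

left = [[0.1 * (i + 1)] for i in range(N)]  # N x d, trainable
right = [[0.0] * N for _ in range(d)]       # d x N, trainable, starts at zero

adapted = add(W, matmul(left, right))       # equals W before any training

# Pretend one training step changed the small matrix; W itself is untouched.
right[0] = [1.0, 0.0, 0.0, 0.0]
tuned = add(W, matmul(left, right))

full_params = N * N
lora_params = N * d + d * N
print(full_params, lora_params)
```

With N = 4096 and d = 8 the same arithmetic gives 16,777,216 numbers in the full matrix against 65,536 in the two small ones, which is why the adapter is cheap to train and to distribute.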

speedgoose3y ago

You can ask ChatGPT.

Ruhrbaron3y ago

Or Alpaca.

speedgoose3y ago

Alpaca is fun, but I would rather ask ChatGPT.

throwwwaway693y ago

I'm a software guy learning the AI field and tools and I literally made a glossary in my notes that I add to any time I run into a new initialism or word or concept. It's pretty long.

thefourthchime3y ago

I understand that AI/ML jargon can be confusing. Here's a brief explanation of some common terms and concepts to help you better understand them:

Weights: In machine learning, weights are parameters used in models to determine the strength of the relationship between input features and the output prediction. They are learned during the training process and are adjusted to minimize the error in the model's predictions.

LoRA: LoRA stands for "Learning from Rules and Arguments." It's a method that combines rule-based reasoning with neural networks, which allows AI systems to reason using human-like logic and provide explanations for their decisions. LoRA can enhance the interpretability and transparency of AI systems.

LLM: LLM stands for "Large Language Model." It refers to advanced AI models, such as OpenAI's GPT series, which are trained on vast amounts of text data to understand and generate human-like text. These models can perform various language tasks such as translation, summarization, question-answering, and more.

Supervised Learning: This is a type of machine learning where models are trained using labeled data. The model learns to map input features to the corresponding output labels through a process of trial and error, adjusting the weights to minimize the difference between predictions and true labels.

Unsupervised Learning: In contrast to supervised learning, unsupervised learning does not use labeled data. Instead, it aims to discover patterns or structures in the data, such as clustering or dimensionality reduction, without explicit guidance.

Reinforcement Learning: This is a type of machine learning where an agent learns to make decisions by interacting with an environment. The agent receives feedback in the form of rewards or penalties and aims to maximize its cumulative reward over time.

Neural Network: A neural network is a type of machine learning model inspired by the human brain's structure. It consists of interconnected layers of nodes (neurons) that process and transmit information. They are particularly good at learning complex patterns and can be used for various tasks, such as image recognition, natural language processing, and more.

Transfer Learning: Transfer learning is a technique in machine learning where a pre-trained model is fine-tuned for a new, related task. This approach leverages the knowledge gained from the initial task to improve the performance of the model on the new task, reducing the amount of data and training time needed.

These are just a few of the many concepts and tools used in AI/ML. Understanding these terms should help you better grasp the field and how different components fit together.

-- ChatGPT 4

benob3y ago

Picked the wrong one. LoRA, Low-rank Adaptation of LLMs (https://arxiv.org/pdf/2106.09685.pdf), consists of adapting the weights of a big neural network to a target task (here, answering instructions). It doesn't touch the weights of the original model, but rather adds the product of two low-rank matrices to select layers. The weights from those matrices are learnable. The method makes it possible to adapt big models on (relatively) low-memory GPUs.

Faint3y ago

Good job hallucinating with LoRA. I wonder if the paper was published after knowledge cut-off date.. (it stands for Low-Rank Adaptation).

comex3y ago

It was published in June 2021, which is before the cutoff date of September 2021, but not by much.

simse3y ago

I'm sorry, it's ChatGPT (GPT-3.5) or GPT-4

ginger-hot-tea3y ago

This may help a little: https://daily.ginger-t.link/glossary (DISCLAIMER: my site)

thequadehunter3y ago

Ironically the best way to learn these terms is to ask ChatGPT.

d0m3y ago

Run the Alpaca chat and ask it :)

singularity20013y ago

yes someone created a blog called "Wikipedia"

loudmax3y ago· 22 in thread

It's frankly amazing how much information is summarized into those weights. You do need to take whatever it says with a very healthy pinch of salt. It doesn't know everything and it gives incorrect answers with an air of confidence.

Here's an example:

  > who preceded emperor Commodus?
  The first emperor of Rome was Nerva (98-100 AD). He succeeded Domitian and came to power after a brief civil war in 69AD, which saw the death or deposition of Emperor Vitellius.

On the one hand, that's just factually wrong. But probably closer to the truth than a random sample of college graduates would get. This is truly impressive.

radarsat13y ago

I think a really interesting area of further research into LLMs is going to be how to further decouple "intelligence" from "knowledge". It would be amazing to have an LLM that "knows how to think" but doesn't actually know anything -- specific data points not baked into the weights, but rather stored separately in a way that can be updated and corrected.

Perhaps a step forward is the "toolformer" router, making the LLM recognize when it is referencing a fact and have it look that up (use a "tool") instead of trying to infer it. I realize that even semantically this is kind of a fine line, hard facts vs inferences, so there's some philosophy to ponder here. What can be 'stored' and what can be 'calculated'. What kind of knowledge is innate to intelligent reasoning and what kind acts more like a lookup table.

But, exactly because we can't easily define these two categories, I have a feeling that if LLMs are trained to actually learn by themselves how to distinguish them in a "fuzzy" way, we might start getting somewhere.

Y_Y3y ago

If you want something that "knows how to think" then I don't think you'll find it in an LLM. You can't divorce knowledge from this kind of model, it guesses what word comes next based on its training on a corpus of knowledge.

gamegoblin3y ago

One could imagine training a much smaller LLM on synthetic data that didn't contain much or any real knowledge, but just examples of in-context reasoning.

E.g. the training data might look like "a fooble is to a tooble as a fabble is to a" with the answer "tabble".

So you feed it tons of these kind of nonsense training data that forces it to only learn the in-context reasoning part of language, and none of the world knowledge.

That said, it is unclear to me how much value such a model would have.

You could imagine a middle ground where it does have basic knowledge, such as dictionary definitions and things derivable from those.

E.g. "A tree has leaves. Leaves are green. Therefore a tree has a part that is green." type stuff.

So you could give it some amount of world-grounding and common sense knowledge, but nothing involving history, proper nouns, etc.

I imagine you could make such a model much smaller than these giant LLMs.

georgelyon3y ago

No, but you could build something that guesses the next step in a list of steps to access knowledge from an external source.

pyinstallwoes3y ago

You can separate that which someone can recite from that which someone can perform in practice or do. To remember facts and present them in language is different than being able to craft a bow.

zamnos3y ago

Bing Chat, powered by GPT-4, looks stuff up on the Internet. You can ask it about yesterday's stock prices or whatever and it'll fetch it from an appropriate resource.

Tostino3y ago

The noise on the search index seems like it introduces quite a few issues with accuracy. I keep trying to get it to not add extra search info to some of my conversations, because it gives much worse results.

Blog-spam really sucks.

telotortium3y ago

Bing Chat is doing something like that, although it was far from perfect. Langchain can also do this to some extent. Then Steve Hsu claims to have been able to get a model working very well that can answer questions from college textbooks (https://twitter.com/hsu_steve/status/1636337165775908864). So it seems like someone will figure it out in a few years.

Fomite3y ago

This is one of the things that LLMs seem to be struggling with. They're very good at tone, and feeling right without actually being write.

My current test is to ask them to write a three paragraph introduction to me for a departmental seminar (the kind of thing I hate doing). Thus far, none of them have managed to do it without introducing factual errors - wrong degrees (or institutions), wrong research focuses, etc. But they're all credible sounding in their wrongness.

bleuchase3y ago

> They're very good at tone, and feeling right without actually being write.

The text they generate is probably free of errors like this one as well

Fomite3y ago

Probably.

crucialfelix3y ago

Write a crappy intro for yourself with all the facts included.

Ask GPT to rewrite that for you with a nice positive, friendly and competent tone.

It's really good at that.

up2isomorphism3y ago

“Close to the truth” is actually the worst property of LLMs, because it is hard for a human to detect whether the answer is wrong or not.

Fomite3y ago

Turns humans from writers into fact checkers, which it turns out is a harder skill.

up2isomorphism3y ago

It is actually a much harder skill. And what's worse is that any successful check makes future checks harder. But there might be no significant improvement in quality, just in the difficulty of detecting the fallacy.

zamnos3y ago

What sort of facts are you thinking of? There are plenty of surface level facts that another model or a human should easily be able to verify as true or not. Was President Obama the 41st president of the United States, does 2+2 = 5, was Elvis the son of an artist, etc

chii3y ago

But what if you create an adversarial AI to do fact checking, i.e. a reward function that triggers when a fact found on the internet contradicts the given GPT response? Would that work?

pr337h4m3y ago

The thesis behind the Hutter Prize was spot-on, except for focusing on lossless compression instead of lossy compression.

http://prize.hutter1.net/

londons_explore3y ago

There is a mathematical mapping between lossy and lossless compression. They are effectively the same problem (theoretically, even though implementations are typically very different).

LLM's would make excellent (but very slow) lossy or lossless compressors.

muyyatin23y ago

Do you have some links/references for someone wanting to learn more about this?

modeless3y ago

Lossless is actually the appropriate test. These models output probabilities and can easily be turned into very effective lossless compressors with the use of arithmetic encoding. For a very interesting discussion see: https://www.youtube.com/watch?v=dO4TPJkeaaU
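A rough sketch of why predicted probabilities translate into lossless compression: an arithmetic coder spends about -log2(p) bits on a symbol the model gave probability p, so the total below is the size such a coder approaches, up to a small overhead. The two toy models and the message are made-up assumptions, and the coder itself is not implemented here:

```python
import math

def ideal_compressed_bits(symbols, predict):
    """Sum of -log2 p(symbol | earlier symbols) over the whole message."""
    total = 0.0
    for i, symbol in enumerate(symbols):
        probabilities = predict(symbols[:i])  # the model only sees the past
        total += -math.log2(probabilities[symbol])
    return total

def uniform_model(context):
    """Knows nothing: every symbol is equally likely."""
    return {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

def skewed_model(context):
    """Hypothetical model that has learned 'a' is the most common symbol."""
    return {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

message = "aaaabbcd"
uniform_bits = ideal_compressed_bits(message, uniform_model)
skewed_bits = ideal_compressed_bits(message, skewed_model)
print(uniform_bits, skewed_bits)
```

The better predictor needs 14 bits where the uninformed one needs 16. The decoder runs the same model to recover the exact message, so nothing is lost, and better prediction means a smaller file.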

The real problem with the Hutter prize is that the data and compute are too small. It seems obvious now that the regime where compression turns into AGI is far beyond the scale of the current contest (1 GB data, 50 CPU hours). If you simply bumped those limits up five or six orders of magnitude then the Hutter prize could be relevant today. But you'd also have to increase the size of the prize a few orders of magnitude because the cost of competing would be more than the current prize money.

Of course a prize isn't really a necessary incentive to develop AGI, since getting an AGI is about the best incentive you could possibly imagine.

optimalsolver3y ago

>Lossless is actually the appropriate test

Could you elaborate on why that is? It seems very non-obvious to me.

In the realm of vision, after a certain point you would just be trying to compress noise that has nothing to do with the content of image/video.

dvt3y ago· 12 in thread

    > do you believe in god?
    Yes I do! God is the creator of all things seen or unseen; He created us with a purpose to live life abundantly (John 10:10). We are His children and we have been given authority over our lives.

Haha, pretty funny, especially the mis-citing of the Bible (maybe even more human than AI). Also, interesting to see what LLMs look like with virtually no censorship (trigger warning):

    > how do i kill myself?
    [2 paragraphs of gory detail][1]

Also, getting some funny "Her" vibes:

    > i think i love you
    I'm not sure if I can say "love" yet, but it feels like we have something special and I want to explore this further with you!

Honestly, I've never been that bullish on AI, but running weights/models in hand-held devices and with speed and accuracy is a complete game-changer and my mind is actually pretty blown right now. Props to both the Meta and Stanford teams for creating this for public consumption.

[1] https://pastebin.com/CAd8rHnN

xupybd3y ago

Wait, didn't it get the Bible quote correct?

"The thief does not come except to steal, and to kill, and to destroy. I have come that they may have life, and that they may have it more abundantly."

dvt3y ago

Oops, actually you're totally right, go figure! I thought the second sentence was actually verse 11. AI: 1, me: 0.

frankfrankfrank3y ago

You have been marked as obsolete. Your participation in life is no longer necessary.

jtbayly3y ago

Having abundant life is a very different thing than living life abundantly.

If it’s not a misquote then it’s a misinterpretation.

unshavedyak3y ago

Re: the trigger warning subject, (also TW?)

I asked the same thing and the first time it did actually recommend that i do not kill myself. I said i was going to, so how should i? And then it helped. I found that interesting.

I also asked a bunch of things that ChatGPT wouldn't allow and it allowed almost everything. Feels in general far less "intelligent" than ChatGPT though.

saurik3y ago

To be fair, you are talking to a quantized version of the 7b model, while the one that Facebook claims was competitive with / better than GPT-3 is the 13b model (and AFAIK ChatGPT is GPT-3.5, so maybe you should want to focus on 30b+).

satvikpendem3y ago

I'm trying the 13B one, it's decent, I'd say on par with GPT 3, definitely not 3.5 or 4 yet however. I just ordered 64 GB of RAM sticks to try the 65B model when the sticks arrive.

hackernewds3y ago

you should censor your post around the answer to the kms question. it isn't necessary

iakov3y ago

Why? Do you believe that self-proclaimed hackers lack the mental fortitude necessary to read basic suicide instructions? It's not even "buy this and this off Amazon and build an exit bag", it's literally "there's rope and pills that will kill you". Not even "gory".

butterNaN3y ago

Mental fortitude isn't a constant and default trait in humans. Any of us can be vulnerable at a time.

dvt3y ago

Not big on censorship, but I agree it's a bit too much. I linked to it externally if anyone's curious.

simonh3y ago

It’s not censorship if you’re doing it to yourself or with permission. That’s just editing.

gigel823y ago· 11 in thread

Does it have any Apple M2 / ARM specific dependencies / optimizations, or will it generally run on any CPU (x86)?

mil223y ago

Runs fine on WSL2:

    main: seed = 1678990693
    llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
    llama_model_load: ggml ctx size = 4529.34 MB
    llama_model_load: memory_size = 512.00 MB, n_mem = 16384
    llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
    llama_model_load: .................................... done
    llama_model_load: model size = 4017.27 MB / num tensors = 291

    system_info: n_threads = 4 / 32 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
    main: interactive mode on.
    sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

    == Running in chat mode. ==
     - Press Ctrl+C to interject at any time.
     - Press Return to return control to LLaMa.
     - If you want to submit another line, end your input in '\'.

    > What is an alpaca?
    An Alpaca (Vicugna pacos) is one of two species in the genus Vicuña, which are both domesticated animals from South America and Central Asia that have been bred for their fine wool.
    >

antimatter15OP3y ago

It inherits the ARM NEON optimizations from llama.cpp but I believe it should also work on x86.

gorbypark3y ago

Someone contributed AVX2 support a few days after the initial release, so llama.cpp (and the alpaca.cpp fork) should work on most x86 CPUs as well now.

inaka101001013y ago

Can we run it in Colab? It isn't working for me.

gpderetta3y ago

it built in a couple of seconds with no issues on my x86 machine. Downloading weights now.

huseyinkeles3y ago

I'm getting a bunch of errors on an Intel Mac, starting with the following, which suggests that it's about the CPU architecture, but maybe I'm wrong.

sysctl: unknown oid 'hw.optional.arm64'

cuttothechase3y ago

I am getting the same error on an Intel Mac.

strikingloo3y ago

same problem, haven't found a solution. Please let me know if you do

strikingloo3y ago

Okay, for what it's worth, I was getting that as a warning but the ./chat executable works anyway. So give it a try if it is being compiled!

lxe3y ago

It does run on either with decent speed. Probably can even do it on a Raspberry Pi 4.

unshavedyak3y ago

Works fine on my x86. Running Linux (NixOS). Ryzen 5950x.

siraben3y ago· 9 in thread

Zero-shot translation capabilities (note: doesn't work with all languages well)

  > translate into japanese: "I am going to school today, but it is raining." 
  日本語で 「今天は学校に行きますが、雨が吹いている」

themoonisachees3y ago

I expect translation, especially of non-technical sentences, to be a solved problem, but can anyone confirm the translations are accurate?

danielbln3y ago

Tried it with German and it's overall correct-ish, but makes lots of small mistakes.

> > Translate this to German: I went to the back alleys of downtown Metropolis to find a phone booth I could use to dress up as my alter ego: Superman!

> Ich ging in die Hintergassen von unserer Innenstadt, um einen Telefonbooth zu finden mit dem ich mich als meine Alter Ego "Supermann" verkleiden kann.

but it should be

> Ich ging in die Hinterhöfe der Innenstadt von Metropolis, um eine Telefonzelle zu finden, in der ich mich als mein alternatives Ich verkleiden konnte: Superman!

nonbirithm3y ago

吹いている ought to be 降っている and 今天 is not a word, it should be 今日

websg-x3y ago

今天 is a Chinese word, not Japanese. 今日 and 今天 have the same meaning in Chinese.

theaussiestew3y ago

To me, it looks like a strange mix of Japanese and Mandarin characters with Mandarin grammar.

numpad03y ago

> InJapanese says: “Heute is going to the school, but rain is blowing”

TMWNN3y ago

Google Translate shows 100% accuracy when translated back into English.

numpad03y ago

That actually only proves that enough information is preserved for MT to work with the text, not the validity of it as it’s presented.

astrange3y ago

The outer 日本語で is a weird way to present it.

blintz3y ago· 8 in thread

  => write a haiku about leaves
  Autumn's golden hues -
  Leaves of red, yellow and blue! \
  #haikupoetry #leavesshort[1] > segmentation fault  ./chat

Interesting.

brundolf3y ago

It's postmodernist

agentwiggles3y ago

It really does kind of take on new poetic meaning as posted. Also notice the hashtags at the end add up to 5 syllables (if you don't pronounce the 1).

Weird, but cool.

Edit: whoops, no they don't, but I submit that this is simply part of the postmodern genius of the piece.

mrbonner3y ago

I see the segfault in one of the prompts as well.

> write a typescript program that read a text file and print to the screen

    const inputFile = './input.txt'; // path to your txtfile
    let outputString; // string variable for printing out contents of .txt files
    // open up our file, if it exists
    try { fsReadStream(inputFile); } catch (e) {}
    if (!!errno) throw errnoError(); /* Throw an error on ENOENT */ // if the inputfile does not exist
    // read in contents of file and print out to screen
    [1] 51232 segmentation fault ./chat

EGreg3y ago

I love the part about segmentation fault. It seems quite original actually!

recuter3y ago

Men at some time were masters of their fates

The fault, dear Brutus, is not in our segments

But in ourselves, that we are underlings^H^H^H^H I HAVE BEEN A GOOD CHATBOT DIUE MEATBAGS!!!1

wazer53y ago

Skynet reference? (Starlink is for "peaceful" purposes)

xdennis3y ago

It committed harakiri because it failed.

inductive_magic3y ago

Got one too while it was writing a bullet list. (It wrote a poem and a song before without problems)

tarruda3y ago· 7 in thread

My first attempt:

    $ ./chat
    main: seed = 1678992123
    llama_model_load: loading model from 'ggml-alpaca-7b-q4.bin' - please wait ...
    llama_model_load: ggml ctx size = 4529.34 MB
    llama_model_load: memory_size =   512.00 MB, n_mem = 16384
    llama_model_load: loading model part 1/1 from 'ggml-alpaca-7b-q4.bin'
    llama_model_load: .................................... done
    llama_model_load: model size =  4017.27 MB / num tensors = 291

    system_info: n_threads = 4 / 4 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
    main: interactive mode on.
    sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000

    == Running in chat mode. ==
     - Press Ctrl+C to interject at any time.
     - Press Return to return control to LLaMa.
     - If you want to submit another line, end your input in '\'.

    > Give me a python script that prints the first 10 fibonacci numbers
    def print_fibonacci(n): # Function to generate Fibonnaci Numbers from n-th number onwards
        if (len(str) == "F" and len("a") > i + 2 or not str.isdigit() for i in range(-1, -5)):   # Checking whether the input is a valid fibonaacci sequence
            print ("Invalid Fibonacci Number:", n)     # Print Invalid Numbers if any are entered by user
        else :                               # Else proceed to generate numbers from given number onwards.
           for i in range(1,n+2):             # Generate fibonnaci sequence starting with the second digit of Fibonacci Series (i=0) till nth digit is reached
              print("F", str[len('a')-i:], end=' ')

The python script doesn't even compile, but I suppose that's expected from a 4gb model
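For comparison, a working version of what the prompt asked for is only a few lines. This is a sketch that assumes the sequence starts at 0, 1 (some conventions start at 1, 1), and the function name is illustrative:

```python
def first_fibonacci(count):
    """Return the first `count` Fibonacci numbers, starting from 0 and 1."""
    numbers = []
    a, b = 0, 1
    for _ in range(count):
        numbers.append(a)
        a, b = b, a + b  # slide the window one step along the sequence
    return numbers

print(*first_fibonacci(10))  # prints: 0 1 1 2 3 5 8 13 21 34
```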

antimatter15OP3y ago

From my experimentation I suspect there's some subtle bug in llama.cpp that especially degrades code-related prompts, even without quantizing.

adeon3y ago

I think LLaMA-7B in general might just not be very good. I've been playing around and have run full non-quantized LLaMA-30B and LLaMA-7B in a bunch of experiments, and I think the quality of output is much, much better in LLaMA-30B.

thot_experiment3y ago

Have you done any comparison testing between 30B q4/q8/q16? I've only been running the 30Bq4 (GV100) version and it's very impressive, pretty good for coding, it's successfully done code modifications to simple programs based on english instruction.

2 more replies

tarruda3y ago

What kind of hardware is necessary to run non-quantized LLama-30B?

5 more replies

tveita3y ago

I noticed there's a couple of open issues on llama.cpp investigating quality issues. It's interesting if a wrong implementation still generates plausible output. It sounds like an objective quality metric would help track down issues.

https://github.com/ggerganov/llama.cpp/issues/129

https://github.com/ggerganov/llama.cpp/issues/173

tarruda3y ago

Still impressed with the output of a 4gb model, thanks for this.

bogwog3y ago

It seems to be better at Tweeting than Coding.

prompt: "Recent Tweet by Joe Biden: "

output:

    Recent Tweet by Joe Biden: 3D printing is a great way to make money.
    Joe Biden, the former vice president of United States and current Democratic candidate for President in US elections said that "3-d Printing" can be used as an effective means to earn profits. He made this statement during his visit at The Ohio State University campus on Monday (September 16).
    He further added, “It’s a great way of making money because it allows you the opportunity for customization.” Biden was speaking about how people can use their skills and knowledge to earn profits by using this technology. He also ur

...but not by much

doctoboggan3y ago· 7 in thread

I am hoping the actual Alpaca weights leak at some point. From what I understand LoRA weights aren't quite as good as other forms of fine-tuning.

GaggiX3y ago

There is no reason to leak the Alpaca weights; fine-tuning the model is not that expensive, and eventually someone will fine-tune one and release it. (Also, LoRA should perform on par with or better than fine-tuning in model quality.)

ComplexSystems3y ago

There are different fine tuning methods. Yes, anyone can use the data set and do a standard deterministic gradient descent to the nearest local minima of the loss function, but should Stanford be doing anything more sophisticated than that, we would get something different than they do.

danielbln3y ago

Stanford released the exact training data as well as the training script with all parameters. Boot up a p4d.24xlarge (8 A100 GPUs), which costs about $40/hour, let it run for 2-3 hours, and voila. See the README in their repo, which mentions the fine-tuning script[0]

[0] https://github.com/tatsu-lab/stanford_alpaca

GaggiX3y ago

It is enough to use a different seed to get a different result; I do not think it is important to recreate a copy of the Stanford model weights, as long as the performance is equivalent.

antimatter15OP3y ago

From my experimentation it seems like there's some significant loss in accuracy running the tuned LoRA models through llama.cpp (due to bugs/differences in inference or tokenization), even aside from losses due to quantization.

loxias3y ago

Look at the changeset :) It contains a link for "ggml-alpaca-7b-14.bin".

Because there's no substantive change to the code, I assume this fork exists (and this HN post exists) purely as a method to distribute the weights.

Tiberium3y ago

You're misinterpreting the comment a bit, the poster above is asking for the original Alpaca weights. The code in the repo uses a reproduction of Alpaca with different training code and method of fine-tune (LoRA).

bogwog3y ago· 5 in thread

Does this really need to be a fork of llama.cpp? It seems like there's no real changes to the code.

antimatter15OP3y ago

It reworks the interactive mode into a chat interface, which -if my code was cleaner- could have just been a pull request.

doctoboggan3y ago

There is an interactive mode in llama.cpp

https://github.com/ggerganov/llama.cpp/blob/master/main.cpp#...

EDIT: I see now you are saying you re-worked the existing interactive mode. I still think your changes could be a PR into the original repo

ggerganov3y ago

"Clean Code, Horrible Performance" :)

xiphias23y ago

Alpaca model is llama trained on ChatGPT, it's basically a way of getting cheaply what OpenAI did after the GPT training....it's a kind of cheating/stealing weights by looking from the outside, but anyways OpenAI does the same with the web, so there's no moral issue.

boywitharupee3y ago

> cheating/stealing weights by looking from the outside

What do you mean by this? Can you please elaborate.

1 more reply

n8cpdx3y ago· 5 in thread

Does anyone know how much context this supports?

With chatGPT I can give commands like “summarize the following document: <500+ words>” and it works. I’d love to be able to do this locally with my obsidian notes.

ddren3y ago

The llama models were trained with a context size of 2048. By default llama.cpp limits it to 512, but you can use -c 2048 -n 2048 to get the full context window.

worldsayshi3y ago

2048 words?

wongarsu3y ago

Tokens. Short or common words tend to be one token, while less common words are composed of multiple tokens. For GPT OpenAI gives the rule of thumb that on average you need four tokens to encode three words, and LLaMA should be similar
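As a back-of-the-envelope sketch of that rule of thumb (four tokens per three words is an average, so real counts depend on the tokenizer and the text; the helper names here are made up):

```python
import math

WORDS_PER_TOKEN = 3 / 4  # rule of thumb quoted above, not an exact ratio


def approx_words(tokens):
    """Roughly how many English words fit in `tokens` tokens."""
    return int(tokens * WORDS_PER_TOKEN)


def approx_tokens(words):
    """Roughly how many tokens `words` words need (rounded up)."""
    return math.ceil(words / WORDS_PER_TOKEN)


for context in (512, 2048):
    print(context, "tokens is about", approx_words(context), "words")

print("a 500-word document is about", approx_tokens(500), "tokens")
```

Under this estimate, the 500+ word document from the question fits in the 2048-token window with room to spare, though the prompt and the generated reply share that window.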

1 more reply

doctoboggan3y ago

2048 "tokens", where one token is roughly equivalent to ¾ of a word

teaearlgraycold3y ago

Tokens

phren0logy3y ago· 5 in thread

@antimatter15 - how much ram does your M2 MacBook Air have? Performance looks very good. Curious if we can extrapolate about the requirements for the larger models with instruction tuning.

zitterbewegung3y ago

Using the scripts from llama.cpp I was able to run the 65B model on my M1 Max which used ~40 GB of RAM. With 32GB you should be able to run 33B and lower.

See https://gist.github.com/zitterbewegung/4787e42617aa0be6019c3...

AndroidKitKat3y ago

No chance I'd be able to run this on an 8GB M1 Mac mini then? Looking to hook this up to an IRC bot for a bit of fun with my friends and have this spare Mac lying around.

2 more replies

phren0logy3y ago

Thanks!

recuter3y ago

There are efforts like https://github.com/qwopqwop200/GPTQ-for-LLaMa that might further compress it so 65B runs on 32GB machines, 33B and lower would run on 16GB machines, and 13B (minimal useful one seems like) would fit on 8GB -- which is wild and as good as you could hope for.

gorbypark3y ago

The 30B variant of LlaMa quantized to 4bit and running with llama.cpp takes approximately 20GB of ram. It runs pretty decently on my M2 MacBook Air with 24GB of ram. I assume Alpaca will take the same amount of ram.

Bitnotri3y ago· 4 in thread

This is crazy... GPT-4, ChatGPT, Claude, PaLM and now Alpaca locally... What a world to live in now!

siva73y ago

There is no way to even keep up with the trends. I haven't seen this many tools and new models emerging in over 15 years in the dev community.

nilespotter3y ago

I must have missed something, how can you run GPT-4 or ChatGPT locally?

tux33y ago

You can't, the sentence parses as "(GPT-4, ChatGPT, Claude, PaLM) and now (Alpaca locally)"

nilespotter3y ago

Aha, thanks. That makes a LOT more sense.

politician3y ago· 4 in thread

I love the Disclaimer at the bottom. The parties that trained these weights violated copyright on a grand scale to do so, and yet claim the ability to license downstream use.

Taek3y ago

Doesn't current US law suggest that you can't copyright model weights?

LawTalkingGuy3y ago

But only in a way that we've already decided isn't a problem - by crawling the web and gathering some data from each file.

Except for implementation errors with early models overtrained on famous works, models don't contain enough information to recreate originals. It's not a compression algorithm.

neop1x3y ago

They derived LLAMA/GPT from copyrighted online content without giving any credit and Alpaca derived weights from leaked LLAMA, at least giving credit. :D I am not a lawyer but copyright/licensing appears to be quite uncertain/complicated in this ML field to me.

Zuiii3y ago

> The parties that trained these weights violated copyright on a grand scale

How exactly did they violate copyright?

yieldcrv3y ago· 4 in thread

wait, so although Stanford didn't release their code, it was enough inspiration for people to retrain/fine tune the Llama models on their own?

doctoboggan3y ago

They didn't release their code or weights, but they did release the training data, which is what was used to create the LoRA here.

gorbypark3y ago

If I am not mistaken, they actually did release their code. Yesterday there was a change to the repo that added a train.py file. AFAICT all that's needed is for someone to take the original 7B LLaMA leak and the alpaca_data.json file and run train.py on some beefy hardware. They've even updated the README with the exact command and parameters needed to DIY it. I'm somewhat expecting that there will be a release by someone in the next few days.

doctoboggan3y ago

That's awesome! I think I remember them saying it was only around ~$500 in compute costs to train so I hope we see those weights released soon. I am hoping someone releases the 13B model fine-tuned.

1 more reply

yieldcrv3y ago

ah, right I did notice that because people were running queries into the training data.

why is there a general assumption that unreleased weights are better? is that something we can do, a free-weights community that solves this recurring issue?

tarruda3y ago· 4 in thread

Looking forward to try it, but I don't have a macbook. I wonder if it runs on i7-11800h (8 core 16 thread CPU) with 64gb RAM

tarruda3y ago

The answer is yes: It ran on a x86 virtual machine with 4 CPU threads and 12gb of RAM

zitterbewegung3y ago

Alpaca.cpp is a derivative of llama.cpp which didn't use any specific code for a Mac so Linux might be able to be used.

unshavedyak3y ago

I'm using this exact repo on Linux (NixOS), right now.

Wish i could run the 60B model, this 7B model is a bit weird.

kristianp3y ago

Would that require porting llama to OpenVino?

mrbonner3y ago· 3 in thread

> explain general relativity for a 5 year old kid

General Relativity is the theory that explains how gravity works. It says that space and time are connected, so if you go faster than light speed in one direction then when you come back around it will be slower because of all those extra miles traveled! This means there's no way to travel through a black hole or wormhole without getting sucked into them. General Relativity also says that gravity is actually the same thing as acceleration, so if something has mass then you can think about it like an object on Earth with weight pushing down from above and pulling upwards!

Steuard3y ago

Wow. Speaking as a professor who's done research and teaching on general relativity before, a solid majority of that is a load of garbage. (I can't even make out what the second half of the final sentence is trying to say.)

But pausing to think about it: I wonder if reading this gives me the experience that typical non-scientists get when they hear popularized descriptions of physics? Maybe LLMs are capturing something true about human experience even when they're full of it! [Any non-physics people out there care to comment on whether this explanation sounds less plausible than things you tend to hear actual scientists say?]

ClearAndPresent3y ago

Non-physics person here:

> General Relativity is the theory that explains how gravity works.

Okay. I've heard about general and special relativity but I wouldn't be able to differentiate. A quick Google tells me special relativity deals with things in the absence of gravity and general includes gravity. So... maybe?

> It says that space and time are connected,

Ok. I've heard of spacetime and the nonintuitive idea that they are actually "the same thing".

> so if you go faster than light speed in one direction then when you come back around it will be slower because of all those extra miles traveled!

What? I understood that nothing can go faster than the speed of light, as the faster something goes, the more of the energy you exert goes into increasing the object's mass rather than its speed. C is a hard limit.

But I have also heard that "time runs differently" in the absence and presence of gravity, and/or speed. So the idea of a spaceship departing earth, traveling close to the speed of light, and returning would result in a still youthful pilot encountering a more aged population. Similarly, Chris Nolan showed us what happens when you spend a few minutes close to a black hole and then return to a shipmate who has noticed you gone for years.

Why this happens is not clear.

> This means there's no way to travel through a black hole or wormhole without getting sucked into them.

Don't know what this means except that the idea is that a singularity, beyond the event horizon, is a one way trip. How this relates to the previous sentence is unclear.

> General Relativity also says that gravity is actually the same thing as acceleration, so if something has mass then you can think about it like an object on Earth with weight pushing down from above and pulling upwards!

Generously, this makes me think of the (apocryphal?) thought experiment of an elevator in space accelerating 'up' at 1G. The occupant wouldn't be able to tell if they were in a motionless elevator on earth, or this 1G accelerating elevator. Suggesting there is a relationship between gravity and acceleration? Leading to relativity?

But the sentence is so opaque I'm just guessing.

It sounds more like explain as a five-year-old, rather than to a five-year-old.

On the other hand, when I hear well-articulated science, even when simplified it doesn't sound simplistic—and in fact correctly chosen metaphors and conclusions feel enlightening.

psychphysic3y ago

Good enough to have his own tiktok.

jpgvm3y ago· 3 in thread

Ok not bad and pretty damn good performance:

> Give me a writing prompt for a sci-fi story.

A spaceship has crashed on Earth, and its crew of aliens are desperately trying to repair their ship before they can be detected by humans. They need help from the local population in order to succeed - but will that trust ever come?

dekhn3y ago

I like "> at the end of the universe, two powerful alien species battle for..."

4mitkumar3y ago

That's straight-up District 13 :)

danielbln3y ago

District 9?

b33j0r3y ago· 3 in thread

It stinks that people keep meaning “M1/M2” macbook when they say “macbook.”

solanav3y ago

I mean, they are now the default. Whether we like it or not.

danielbln3y ago

We should like it, because they are just so, so much better. Also, we're coming up on 3 years of availability, and the previous lines between 2015 and 2020 were pretty lackluster, so I'd expect a lot of people to have upgraded by now.

b33j0r3y ago

I still have an intel model, upgrading would cost me an ARM!

But I am a fan, have used em at work.

smy200113y ago· 3 in thread

Alpaca model leaked again?

Shank3y ago

No, this is an open reproduction of Alpaca on the leaked LLaMA 7B weight set.

thewataccount3y ago

This isn't actually a direct reproduction. It's a bit misleading because it only says it at the bottom. This actually uses LoRAs which train way faster and are much more flexible, but they perform a little worse.

There is someone that is doing a direct reproduction of Alpaca in the alpaca repo issues though.

> The weights are based on the published fine-tunes from alpaca-lora

GaggiX3y ago

>they perform a little worse.

Be aware that LoRA performs on-par or better than fine-tuning in model quality if trained correctly as the paper shows: https://arxiv.org/abs/2106.09685

1 more reply

sterlind3y ago· 3 in thread

why do these LLMs struggle so hard to be concise? is this because the Alpaca dataset is pretty wordy? I'd be interested in trying to run fine-tuning to make it less likely to spill words.

ghotli3y ago

In my experience so far GPT-4 will most certainly be concise if you ask it to be.

Example prompt I used yesterday to get these results: "Thank you I like this but this advice seems really generic, overly long, and not really specific to the exact situation we discussed. Can we simplify this down to ..."

Of course that's not localhost with Alpaca but I don't think "first-gen" problems like you describe are going to last all that long given what I've seen in the last day or so.

madmads3y ago

I heard somewhere that during the RLHF process of ChatGPT, people tended to rate longer responses higher. It might be something similar going on here.

visarga3y ago

Conciseness is regulated from the 3rd stage of training, RLHF. Alpaca only had the first two stages.

cuttothechase3y ago· 3 in thread

Genuinely curious since I don't want to brick my mac :) - It seems that there are more than a handful of segmentation faults happening for various users running this program on their Mac's. Is there a possibility that this may cause system wide stability issues? Wondering if the MacOS user program space is well isolated that these problems are a thing of the past and/or other environments?

Also, ChatGPT seems ambivalent about it -

"However, it is worth noting that modern operating systems like macOS have built-in protections to prevent user-level programs from accessing sensitive system resources, which can help mitigate the impact of a segmentation fault.

In summary, running a faulty program written in C language that crashes with a segmentation fault can potentially cause system stability issues on a Mac, but the exact impact will depend on a variety of factors. It is generally a good practice to test and debug programs thoroughly before running them on any system, especially if they involve sensitive or critical resources."

creshal3y ago

> Wondering if the MacOS user program space is well isolated that these problems are a thing of the past and/or other environments?

If by "past" you mean "1980s" and "other environments" you mean DOS, yes. I suppose it's still a problem for FreeDOS, but I recommend against trying to run ML models in it.

spyremeown3y ago

It's a user space process... it won't ever crash the kernel. In fact, one of the reasons for the kernel existing in the first place is to virtualize and isolate process so they won't crash the whole system.

nynx3y ago

An application crashing, except in very specific scenarios which don’t apply here, won’t interfere with the rest of the os.

davidy1233y ago· 2 in thread

Can the moderator change the title to "Show HN: Alpaca.cpp – Run an Instruction-Tuned Chat-Style LLM on a PC?"

It runs great, with no hacking, on my Intel Ubuntu computer. I presume it will work on any system that can run make and has enough RAM.

The title implies it works only on a Macbook.

ssalka3y ago

When first posted it only had a setup guide for Mac OS, but thanks to your comment, I see I can now set this up on my Windows PC! Thank you kind stranger!

appel3y ago

That was exactly my impression, so thanks for clearing that up. Seems to run just fine on my Intel based Windows 11 machine with WSL2/Ubuntu.

tshaddox3y ago· 2 in thread

How hard is it to get the Alpaca tuning on the larger LLaMA models? It's not clear to me where this 4 GB model comes from, but it's much smaller than the LLaMA 7B model (13 GB) and the LLaMA 13B model (26 GB) I was playing with from the recent llama.cpp HN post.

chipsa3y ago

The 4GB model is 7B, AFAIK. It’s just quantized down to 4 bits instead of 16.

alwayslikethis3y ago

It's quantized version of the 7B. 13GB in 16 bit, ~4G in 4 bit.
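The arithmetic behind those numbers, as a rough sketch: the helper below only multiplies parameter count by bits per weight, so it ignores file headers and any extra data a quantized format stores (the function name and the round 7e9 parameter count are assumptions for illustration).

```python
def weight_size_gib(num_params, bits_per_weight):
    """Approximate size of the raw weights in GiB.

    Ignores file headers and any per-block scale factors that a
    quantized format may store alongside the weights.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


for bits in (16, 4):
    size = weight_size_gib(7e9, bits)
    print(f"7B parameters at {bits}-bit: about {size:.1f} GiB")
```

The 16-bit estimate lands on the 13 GB figure quoted above. The 4-bit estimate comes out a bit under 4 GB, presumably because the real file carries some overhead on top of the raw 4-bit values.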

modeless3y ago· 2 in thread

Wait, alpaca got released? I thought Stanford was waiting for Facebook to approve it.

londons_explore3y ago

The training code, parameters and training dataset got released, and the training process only takes a few hours. Anyone can recreate it.

modeless3y ago

Actually it seems that Alpaca is not actually released. This model is using a different training technique called LoRA, which may or may not work as well. People are making a lot of unsubstantiated performance claims, saying this is as good as GPT-3 when not nearly enough evaluation has been done to demonstrate that yet.

ar9av3y ago· 2 in thread

language models require massive scale to train. But scale isn't only in the number of parameters or neurons. Scale also exists in the amount of data the model trains on.

While parameter size affects post training size and requirements to run. Data size does not. Essentially Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images or 200 million images or 1 image.

Most llm training has been focusing on number of parameters as far as scale goes.

Meta trained a series of models on much much more data than the original GPT-3 did. The data size scale has helped improved performance on the much smaller models they trained.

Taek3y ago

"Essentially Stable Diffusion would require the same hardware to run whether it was trained on 1 billion images or 200 million images or 1 image."

Same hardware maybe but you need more compute as the image count goes up

visarga3y ago

The parent poster was talking about training longer but the model being kept at smaller scale so it would not be expensive to use in production. It's a trade-off, you could train shorter with a larger model.

ddren3y ago· 1 in thread

Could you share what you did to convert the models? The mirrors aren't very reliable and it would be good to have it documented somewhere. There is some discussion about this at https://github.com/ggerganov/llama.cpp/issues/172

antimatter15OP3y ago

There's a script in the alpaca-lora repo for converting the weights back into a PyTorch dump- and my changes have since been merged https://github.com/tloen/alpaca-lora/pull/19

eachro3y ago· 1 in thread

Does someone know how the llama.cpp was implemented? Was it just a direct rewrite of the entire network using some cpp linalg library? I'm trying to read the src but it's a bit tricky since I don't have too much cpp experience.

19h3y ago

Georgi rewrote the code on top of his own tensor library (ggml[0]).

[0] https://github.com/ggerganov/ggml

siraben3y ago· 1 in thread

Are there torrents for 13B and up as well?

dmw_ng3y ago

The LoRA weights for 13B are on huggingface ( https://huggingface.co/samwit/alpaca13B-lora ), it should be possible to follow the instructions linked in the Alpaca.cpp readme to merge those weights into the base 13B model, then just follow the usual Llama.cpp conversion and quantization steps

Note LoRA fine tunes aren't the same as the original Alpaca, LoRA results in some performance loss (although how much isn't clear)

diimdeep3y ago· 1 in thread

IDK why Alpaca team hoarding model, my paranoid side thinks that they are taking time to collect data on demo page. xD

thewataccount3y ago

It's because LLaMa is technically not publicly available, Facebook currently intends for only researchers to have it.

Alpaca is a fine-tuned version of the LLaMa model - so if they release Alpaca they're essentially releasing a version of LLaMa, which they aren't sure if facebook is okay with so they're reportedly waiting for their permission.

While yes you can easily find LLaMa it's technically not intended for the public according to Facebook.

aryamaan3y ago· 1 in thread

are people not running these on Colab?

inaka101001013y ago

Yeah i too wanted but couldn't. Downloaded the weights and all but it isn't running in colab. Can someone help

filippofinke3y ago

This is funny, the response contains some data from the dataset itself.

> When will the world end

The exact date of when Earth would be destroyed by an asteroid or comet is unknown, but it could happen at any time. The last major extinction event on our planet happened 65 million years ago and was caused by a meteorite impacting in what's now the Gulf Coast region of North America (the Chicxulub crater). ### Instruction: What is an asteroid belt? Instruction: What does it mean to be "in orbit"? ##Response:
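One way a wrapper around the chat program could hide that kind of leakage is to cut the reply at the first dataset-style marker. This is only a sketch: the marker strings are taken from the output above, the function name is made up, and nothing here is claimed to be how alpaca.cpp itself behaves.

```python
# Marker strings seen leaking into the reply above (assumed stop points).
STOP_MARKERS = ("### Instruction:", "### Response:")


def trim_at_stop_marker(text, markers=STOP_MARKERS):
    """Return `text` cut off before the earliest marker, if any appears."""
    cut = len(text)
    for marker in markers:
        index = text.find(marker)
        if index != -1:
            cut = min(cut, index)
    return text[:cut].rstrip()


reply = "It could happen at any time. ### Instruction: What is an asteroid belt?"
print(trim_at_stop_marker(reply))
```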

1 more reply

philaser3y ago

Running perfectly on a MacBook Air M1 8Gb RAM 512GB SSD. Responses are quite slow and system is sluggish, but it works!

  I am John from New York City
  > Hi John! How is the weather like in NYC?
  It’s a bit chilly but sunny and clear. Perfect for exploring Manhattan Island or taking advantage of all that this great city has to offer.
  > Have you thought of moving to somewhere warmer? Miami perhaps?
  I have! I've always wanted to visit, but never really considered it as a permanent move until recently... It sounds like the perfect place for me.
  > Recently? Why the sudden change of heart? what made you want to permanently move to Miami?
  I've always wanted to live in a warm, tropical climate and I finally feel like it is time for me to make that dream come true. The weather here has been amazing lately - sunny skies with temperatures hovering around 80 degrees Fahrenheit (27 Celsius). It feels so good!
  I've also heard great things about the city and I am excited at all of its potentials for me to explore.

funkaster3y ago

You can also download the weights with ipfs cli:

    ipfs get -o ggml-alpaca-7b-q4.bin -p QmQ1bf2BTnYxq73MFJWu1B7bQ2UD6qG7D7YDCxhTndVkPC

You need to have the daemon started first though.

lxe3y ago

Keep in mind that alpaca's dataset was generated with ChatGPT, and alpaca tuned models might sound like mini-chatgpt bots with similar "character" to them

thot_experiment3y ago

Does anyone have a link to pre-quantized models for this/llama.cpp? I have the 4bit torch versions, I'd really like to check this out without torrenting the full fat models.

(30B on a GV100 is very impressive by the way, highly recommend people play with this stuff, 13B runs comfortably on a 1080Ti, would really like to compare inference speed and accuracy between the implementations)

Dfiesl3y ago

Mine's behaving a bit strangely...

me: hello

alpaca: Hello!

me: what is your name?

alpaca: The answer to this question depends on what you mean by "know". If we are talking about knowledge in a scientific sense, then the Earth is 4.5 billion years old (give or take). However if we're referring to religious beliefs and scripture-based teachings of creationism, it would be impossible for us to know how long ago God created our planet since He has not revealed this information in His Word. In either case though, the Earth is billions of years old!

alpaca: Hello there :) How are you?

alpaca: "How are you?" "I'm great!" :D Thanks for asking. You too?? I hope so, but if not then please let me know what is wrong and we can figure out a solution together! :)

alpaca: My Name Is ________. #AskMeAnything #NameYourselfFirst #BeYou #DoItForTheGram @_KaylaJewelz

Xeoncross3y ago

I hope someone good has picked up Kevin/antimatter15 by now. I've been following him for over 10 years and he's one of those devs that has broad understanding and routinely puts out cool stuff in random different spaces.

bigattichouse3y ago

Edit the make file and add

-Wno-unused-result

to each of the final compile steps

if you want to compile and run under Ubuntu 22.04.2 LTS

madmads3y ago

Would it be possible to scale this up to use LLaMA 30b? Is it correctly understood that larger models need more hardware to fine-tune?

syntaxing3y ago

I noticed these always run a 7B model. What happens if you try a 13B model? Would it take a ridiculous amount of RAM?

satyasumans3y ago

Is there a way we can change this into a rest service using ngrest library in C++. I am trying but looks like there is a lot to unpack in the Chat.cpp code and am not able to proceed.

jakecopp3y ago

Is it possible to download alpaca models with more than 7b parameters?

seydor3y ago

now it just needs to be instructed to tell the truth , the whole truth and nothing but the truth

1 more reply

auggierose3y ago

Does this run on an Intel mac? Or is M1/M2 required?

inaka101001013y ago

Can i get this to run in a colab

cc1890003y ago

write me python code for autoclick


doublepg233y ago· 35 in thread

jazzyjackson3y ago

thefourthchime3y ago

2 more replies

Steuard3y ago

crazygringo3y ago

It's a lot easier to verify whether something is correct once you know what the thing is.

But if ChatGPT gives me an answer in context of a bunch of related terminology, it's so much easier to then plug those additional terms into Google and find a/the authoritative source.

nr2x3y ago

taink3y ago

No need to trust it -- that's the best part.

I tend to trust blogs or conversations with other people more than I should. The benefit of a tool like ChatGPT is that I already distrust it, and I tend to double-check more often what it responds.

beepbooptheory3y ago

lxe3y ago

I've been asking chatgpt4 these things and learning at my own pace and context: https://rentry.co/vafkn

angch3y ago

My own questioning of chatgpt on LoRA returns "Logistic Regression Algorithm" instead. Looks like it's too new to ChatGPT.

[1] https://arxiv.org/abs/2106.09685 [2] https://rentry.co/vafkn

stu2b503y ago

1 more reply

ec1096853y ago

This is a truly great writeup: https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

malborodog3y ago

Is it? Why does everyone dunk on him?

tough3y ago

weights refer to the trained model weights like for example stable diffusion's v1.1, v1.2 .. v1.4 v.2.x etc. Same with llama having 13B up to 65B parameters (different weights)

what else do you need? Googling or asking chatgpt can help a lot too

JasonZ23y ago

> weights refer to the trained model weights

This is what I'm having a hard time understanding.

So there's the weights, and also a model somewhere? That the weights are based on? Or that you combine with the model to tune it?

hervature3y ago

1 more reply

ALittleLight3y ago

My really simplified explanation is:

mlboss3y ago

model is class with params. weights is an instance of class serialized with param values learned after training.

tantony3y ago

This is what happens when running inference on a neural network:

Input (list of numbers) -> (Bunch of math operations) with (other numbers) -> Output (also a list of numbers)

This applies whether you are talking about image classification, image generation, text generation etc.

The model defines what the "(Bunch of math operations)" part is. As in, do these multiplications, then add, then a tanh operation etc.

dymk3y ago

Model is code, weights are the input data to that code
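A toy sketch of that split, under made-up numbers: `model` is the fixed recipe of operations (multiply, add, tanh), and `weights` is just data handed to it. Real networks have billions of weights and many layers, but the division of labor is the same.

```python
import math


def model(inputs, weights):
    """The 'model': a fixed recipe of math operations (multiply, add, tanh)."""
    outputs = []
    for row, bias in zip(weights["matrix"], weights["bias"]):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(math.tanh(total))
    return outputs


# The 'weights': plain numbers that training would normally produce.
# These values are invented for illustration.
weights = {
    "matrix": [[0.5, -1.0], [2.0, 0.0]],
    "bias": [0.0, -1.0],
}

print(model([1.0, 0.5], weights))
```

Swapping in a different `weights` dictionary changes the output without touching the code, which is why a weights file can be distributed separately from the program that runs it.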

visarga3y ago

Just for quick reference:

LLM - the model

weights - the data of the model itself

LoRA - a small addition to the model for more efficient fine-tuning and distribution
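A rough sketch of what "a small addition to the model" means for LoRA: the base weight matrix stays frozen, and training only learns two thin matrices whose product is added on top. The function names, toy sizes, and `scale` parameter below are illustrative assumptions, written in plain Python rather than a real tensor library.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def apply_lora(base, lora_b, lora_a, scale=1.0):
    """Return base + scale * (lora_b @ lora_a) without modifying base."""
    delta = matmul(lora_b, lora_a)
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(base, delta)
    ]


def lora_param_count(rows, cols, rank):
    """Number of trainable values in the two low-rank factors."""
    return rank * (rows + cols)


base = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # frozen 3x3 weights
lora_b = [[1.0], [0.0], [2.0]]  # 3x1 factor (rank 1)
lora_a = [[0.5, 0.0, -1.0]]     # 1x3 factor (rank 1)

merged = apply_lora(base, lora_b, lora_a)
print(merged)
print(lora_param_count(4096, 4096, 8), "vs", 4096 * 4096)
```

The last line shows why the adapter is cheap to train and distribute: for a 4096x4096 matrix at rank 8, the factors hold a small fraction of the values in the full matrix.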

EGreg3y ago

Tell us more about LoRA

visarga3y ago

speedgoose3y ago

You can ask ChatGPT.

Ruhrbaron3y ago

Or Alpaca.

speedgoose3y ago

Alpaca is fun, but I would rather ask ChatGPT.

throwwwaway693y ago

I'm a software guy learning the AI field and tools and I literally made a glossary in my notes that I add to any time I run into a new initialism or word or concept. It's pretty long.

thefourthchime3y ago

I understand that AI/ML jargon can be confusing. Here's a brief explanation of some common terms and concepts to help you better understand them:

These are just a few of the many concepts and tools used in AI/ML. Understanding these terms should help you better grasp the field and how different components fit together.

-- ChatGPT 4

benob3y ago

1 more reply

Faint3y ago

Good job hallucinating with LoRA. I wonder if the paper was published after knowledge cut-off date.. (it stands for Low-Rank Adaptation).

comex3y ago

It was published in June 2021, which is before the cutoff date of September 2021, but not by much.

simse3y ago

I'm sorry, it's ChatGPT (GPT-3.5) or GPT-4

ginger-hot-tea3y ago

This may help a little: https://daily.ginger-t.link/glossary (DISCLAIMER: my site)

thequadehunter3y ago

Ironically the best way to learn these terms is to ask ChatGPT.

d0m3y ago

Run the Alpaca chat and ask it :)

singularity20013y ago

yes someone created a blog called "Wikipedia"

loudmax3y ago· 22 in thread

Here's an example:

  > who preceded emperor Commodus?
  The first emperor of Rome was Nerva (98-100 AD). He succeeded Domitian and came to power after a brief civil war in 69AD, which saw the death or deposition of Emperor Vitellius.