For now my IRC bots run Alpaca 7B 4-bit. 13B was not a significant improvement for twice the computation time. But it's best to learn them now, because as soon as OpenAI gets sued for the first time, all the Turing-test-passing older models without the legal butt-covering bolted on will be removed.
Whoa. I want to read this! Duel - what a great film. Twain - amazing writer. Animorphs - published after my teen years but sounds like a great story!
> For a more creative chat, use: temp 0.72, rep pen 1.1, top_k 0, and top_p 0.73
> For a more precise chat, use temp 0.7, repetition_penalty 1.1764705882352942 (1/0.85), top_k 40, and top_p 0.1
https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...
https://old.reddit.com/r/singularity/comments/11vsvro/in_cas...
https://twitter.com/theshawwn/status/1632569215348531201
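For anyone wondering what the four knobs in those quoted settings actually do, here is a minimal C++ sketch of one sampling step. It assumes two common conventions: top_k 0 disables the top-k cut, and the repetition penalty divides positive logits and multiplies negative ones (the convention Hugging Face transformers uses, and as far as I know llama.cpp too). The struct and function names are hypothetical, and the order of the steps is one reasonable choice rather than any particular tool's exact pipeline.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <numeric>
#include <random>
#include <unordered_set>
#include <vector>

// Hypothetical bundle of the four knobs from the quoted settings.
struct SamplingParams {
    float temp = 0.7f;
    float repeat_penalty = 1.1f;
    int top_k = 40;      // assumption: 0 disables the top-k cut
    float top_p = 0.1f;  // nucleus: smallest set reaching this probability mass
};

// Picks the next token id from raw logits; names and step order are illustrative.
int sample_token(std::vector<float> logits, const std::vector<int>& recent_tokens,
                 const SamplingParams& p, std::mt19937& rng) {
    assert(!logits.empty() && p.temp > 0.0f);

    // Repetition penalty: push recently generated tokens toward lower scores
    // (divide positive logits, multiply negative ones).
    const std::unordered_set<int> seen(recent_tokens.begin(), recent_tokens.end());
    for (int id : seen) {
        float& l = logits[static_cast<std::size_t>(id)];
        l = (l > 0.0f) ? l / p.repeat_penalty : l * p.repeat_penalty;
    }

    // Candidate token ids, best logit first.
    std::vector<int> ids(logits.size());
    std::iota(ids.begin(), ids.end(), 0);
    std::sort(ids.begin(), ids.end(),
              [&](int a, int b) { return logits[a] > logits[b]; });

    // Top-k: keep only the k highest-scoring candidates.
    if (p.top_k > 0 && static_cast<std::size_t>(p.top_k) < ids.size()) {
        ids.resize(static_cast<std::size_t>(p.top_k));
    }

    // Temperature-scaled softmax (subtracting the max for numerical stability).
    const float max_logit = logits[ids[0]];
    std::vector<double> probs(ids.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < ids.size(); ++i) {
        probs[i] = std::exp((logits[ids[i]] - max_logit) / p.temp);
        sum += probs[i];
    }
    for (double& x : probs) x /= sum;

    // Top-p: keep the shortest prefix whose cumulative probability reaches top_p.
    double cumulative = 0.0;
    std::size_t keep = probs.size();
    for (std::size_t i = 0; i < probs.size(); ++i) {
        cumulative += probs[i];
        if (cumulative >= p.top_p) {
            keep = i + 1;
            break;
        }
    }
    ids.resize(keep);
    probs.resize(keep);

    // discrete_distribution renormalizes the surviving weights before drawing.
    std::discrete_distribution<int> pick(probs.begin(), probs.end());
    return ids[static_cast<std::size_t>(pick(rng))];
}
```

With the "precise" settings, top_p 0.1 usually leaves only the single most likely token, so generation is close to greedy; the "creative" settings (top_k 0, top_p 0.73) keep many more candidates in play.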
---
That being said, I found the OpenAssistant model much better: https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...
It's also completely OSS, Apache 2.0, unlike LLaMA and Alpaca which are non-commercial.
“No, OpenAI does not have an API for dogs. They do, however, have an API for other animals, such as cats. To retrieve an image of a cat, you can use the OpenAI API for Dogs API and select the cat breed or type.”
This is the wild card here, though, isn't it? OpenAI's ChatGPT likely uses more than 4 bits for its parameters. IIRC the original LLaMA params were 16-bit floats and they were quantized down to 4 bits; considering that large amount of compression, they still do pretty OK, but not as well as ChatGPT. I wonder how the Alpaca/LLaMA models would do with 16-bit floating-point params (as they were originally trained)? What if they had gone with 8 bits for the params as a compromise?
EDIT: Come to think of it, unless you're using vectorized ops on a CPU, 4 bit and 8 bit math is going to run at the same speed (for most popular CPUs), is it not? So why did they go all the way down to 4 bits instead of stopping at 8 bits (other than to make the param files 1/2 the size)?
EDIT2: Looking through the alpaca.cpp code, there is mention of AVX, AVX2, AVX512 (and NEON on ARM), so it probably is taking advantage of vectorized ops where that's possible.
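To make the 4-bit vs. 8-bit trade-off concrete, here is a simplified C++ sketch of block-wise 4-bit quantization in the spirit of ggml's Q4 formats: each block of 32 weights stores one float scale plus 32 signed 4-bit codes packed two per byte. The block size, rounding, and layout are assumptions for illustration, not the exact on-disk format alpaca.cpp uses, and the names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kBlock = 32;  // weights per block (assumed)

struct BlockQ4 {
    float scale;                              // one scale per block
    std::array<std::uint8_t, kBlock / 2> qs;  // two 4-bit codes per byte
};

// Quantize one block of kBlock floats to signed 4-bit codes in [-8, 7].
BlockQ4 quantize_block(const float* x) {
    assert(x != nullptr);
    float amax = 0.0f;
    for (std::size_t i = 0; i < kBlock; ++i) amax = std::max(amax, std::fabs(x[i]));

    BlockQ4 b{};
    b.scale = amax / 7.0f;  // largest magnitude maps to code +/-7
    const float inv = b.scale != 0.0f ? 1.0f / b.scale : 0.0f;

    // Round to nearest, clamp, then shift by 8 so the code fits an unsigned nibble.
    auto code = [&](float v) {
        int q = static_cast<int>(std::lround(v * inv));
        q = std::clamp(q, -8, 7);
        return static_cast<std::uint8_t>(q + 8);
    };
    for (std::size_t i = 0; i < kBlock; i += 2) {
        // Low nibble holds the even weight, high nibble the odd one.
        b.qs[i / 2] = static_cast<std::uint8_t>(code(x[i]) | (code(x[i + 1]) << 4));
    }
    return b;
}

// Recover approximate floats from a block.
void dequantize_block(const BlockQ4& b, float* out) {
    for (std::size_t i = 0; i < kBlock; i += 2) {
        const std::uint8_t byte = b.qs[i / 2];
        out[i] = static_cast<float>(static_cast<int>(byte & 0x0F) - 8) * b.scale;
        out[i + 1] = static_cast<float>(static_cast<int>(byte >> 4) - 8) * b.scale;
    }
}
```

Each 32-weight block takes 20 bytes here, about 5 bits per weight, versus 64 bytes in fp16; the same scheme with 8-bit codes would need 36 bytes per block. Smaller weights also mean less data to stream from RAM for every generated token, which often matters more than raw arithmetic speed for this kind of inference.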
And now that there are a few competitors in the same league, 3.5 quality is suddenly garbage and only 4.0 is good enough.
Was it good enough before or wasn't it?
When Jurassic Park first came out, or even something like Star Trek: The Next Generation, it looked AMAZING. So, so realistic. But then… as time went on, new things showed us what realistic could be.
I think we actually got better at seeing.
Same thing here. The more time you spend with it, the more you notice things that don’t quite work. And then the new thing solves those problems, but we’ll find more wrongness.
Now we’ve all gotten familiar with 3.5, and we’ve come to understand its limitations, so the public knows it’s not a “godlike” AI.
Luckily there’s a fresh new model, not technically different from the earlier one but it cost more money to build. The hype group can start again, citing the publicly known limitations of 3.5. But in 6 months we’ll understand what’s wrong with it, and the public will be talking about the limitations, just in time for 4.5.
They are neat, they are useful, but they can do so much more.
I am playing around with GPT-4 this week though. Let’s see how that goes.
GPT's performance in non-trivial translation tasks is unbelievable. All those articles listing jobs that are going to be replaced fail to mention that translators are probably going to be the first.
"Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline."
This would be 175 billion 3 bit weights instead of 175 billion 16 (or 32!) bit weights. It massively reduces the size of the model. It makes loading it in ram on consumer computers feasible. The number of parameters stays the same.
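The size arithmetic is simple enough to sketch. The helper below (a hypothetical name) counts only raw weight storage, parameters times bits divided by 8, and ignores per-block scales, activations, and any other runtime memory.

```cpp
#include <cassert>

// Raw weight storage in gigabytes (1e9 bytes): parameters x bits / 8.
// Ignores per-block scales, activations, and any other runtime memory.
double weights_gb(double n_params, double bits_per_weight) {
    assert(n_params > 0.0 && bits_per_weight > 0.0);
    return n_params * bits_per_weight / 8.0 / 1e9;
}
```

For 175 billion parameters this gives 700 GB at 32 bits, 350 GB at 16 bits, 87.5 GB at 4 bits and about 65.6 GB at 3 bits. For a 7B model at 4 bits it gives 3.5 GB, before the extra per-block scales.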
In chat.cpp (or main.cpp), change
n_parts = LLAMA_N_PARTS.at(hparams.n_embd);
to hard-code it so it treats this single-file model properly:
n_parts = 1;
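If you'd rather not lose the multi-part behaviour entirely, a middle ground is to keep the table lookup but allow an explicit override. This sketch assumes LLAMA_N_PARTS is a std::map<int, int> from n_embd to part count, with the values as I recall them from llama.cpp at the time (4096→1, 5120→2, 6656→4, 8192→8); resolve_n_parts and its override parameter are hypothetical. Note that a single-file 13B model still has n_embd 5120, which the table maps to 2 parts, so the table alone can't distinguish the two layouts; that's why the override is needed. Also note that std::map::at throws std::out_of_range for an unknown key, whereas find lets you fall back gracefully.

```cpp
#include <cassert>
#include <map>

// Assumed table: number of weight-file parts keyed by embedding width (n_embd).
const std::map<int, int> LLAMA_N_PARTS = {
    {4096, 1}, {5120, 2}, {6656, 4}, {8192, 8},
};

// Hypothetical helper: an explicit override (e.g. 1 for a merged single-file
// model) wins; otherwise use the table, and fall back to 1 for unknown widths
// instead of throwing like LLAMA_N_PARTS.at() would.
int resolve_n_parts(int n_embd, int n_parts_override = -1) {
    assert(n_embd > 0);
    if (n_parts_override > 0) return n_parts_override;
    const auto it = LLAMA_N_PARTS.find(n_embd);
    return it != LLAMA_N_PARTS.end() ? it->second : 1;
}
```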
Or rewrite the parameter-config subroutine to recognize and handle a non-standard weights file.
magnet: magnet:?xt=urn:btih:053b3d54d2e77ff020ebddf51dad681f2a651071&dn=ggml-alpaca-13b-q4.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.com%3A2810%2Fannounce
torrent: https://btcache.me/torrent/053B3D54D2E77FF020EBDDF51DAD681F2...
torrent: https://torrage.info/torrent.php?h=053b3d54d2e77ff020ebddf51...
This sounded like a really cool idea, but I asked ChatGPT to do this for the plot of the movie The Shawshank Redemption, and there is no way I would ever have been able to guess that movie from the emojis it gave me. Perhaps GPT-4 does a better job at it.
These weights are shit, man; they have been quanted w/o being retrained against the original. I already have this torrent; I want uncut originals. And no watermarks this time, the last model wouldn't shut up about investing in tulips.
It’s USD 600 for fine-tuning. Maybe USD 4-5k for a computer that can run it.
ChatGPT Plus is $20/month. $5k would be 250 months (20+ years) of paid access.
Not sure pirating it now adds up.
The reason to pirate it would be to fine-tune the model on your private internal source code repository. Assuming you already have a large existing body of work, you could train it on that and offer a SelfHostedCoPilot for your bespoke, internal-only DSL, which ChatGPT and Copilot have no way of having seen and would undoubtedly hallucinate about by a considerable margin.
They don't charge per interaction, but per token. The chat models range from a fifth of a cent per 1000 tokens to 12 cents per thousand tokens (depending on whether it's gpt-3.5, or the 8k limit gpt-4, or the 32k limit gpt-4, and, for gpt-4 models, also prompt v. response tokens.)
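A small C++ sketch of that per-token billing. The prices are the March 2023 list prices as I remember them (gpt-3.5-turbo $0.002 per 1K tokens; gpt-4 with 8K context $0.03 prompt and $0.06 completion; gpt-4-32k $0.06 and $0.12), consistent with the range above but worth checking against OpenAI's current pricing page. The table and function names are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>

struct Price {
    double prompt_per_1k;      // USD per 1,000 prompt tokens
    double completion_per_1k;  // USD per 1,000 completion tokens
};

// Assumed March 2023 list prices; verify against current pricing.
const std::map<std::string, Price> kPrices = {
    {"gpt-3.5-turbo", {0.002, 0.002}},
    {"gpt-4", {0.03, 0.06}},      // 8K context
    {"gpt-4-32k", {0.06, 0.12}},  // 32K context
};

// Cost of one request; throws std::out_of_range for an unknown model name.
double cost_usd(const std::string& model, long prompt_tokens, long completion_tokens) {
    assert(prompt_tokens >= 0 && completion_tokens >= 0);
    const Price& p = kPrices.at(model);
    return prompt_tokens / 1000.0 * p.prompt_per_1k +
           completion_tokens / 1000.0 * p.completion_per_1k;
}
```

For example, a gpt-4 (8K) exchange with 1,000 prompt tokens and 500 completion tokens comes to 0.03 + 0.03 = $0.06, while the same exchange on gpt-3.5-turbo costs $0.003.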
"godlike"? Really? I'm not religious, but this seems like an overreaction for something that has no agency.
Considering that we don’t know how the brain works so well, and we don’t understand why LLMs work so well, simply on the basis of their output I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.
> How do you know agency is not simply the output of a large language model encoded in neurons?
I'm not sure what you mean here. Is agency an emergent effect of a large digital or biological neural network? Maybe! Is it an emergent effect of a large language model? If it is, then it should be clear, or demonstrable, that the model (1) has goals and (2) takes concrete steps to achieve those goals.
> What is the difference between neuronal and digital weights?
Brain chemistry works orders of magnitude more slowly, since we're talking about periodically building and releasing an ionic differential between the inside and outside of a cell wall. Moreover, we have a massive number of neurons and a stupidly massive number of interneuronal connections, with billions of years of training over billions of lineages. Digital weights, in contrast, are a stripped-down model of this system that throws out a whole class of complexities like hormones and metabolism.
> I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.
I think this is an overly generous assumption.
I don't disagree that digital systems with neural architectures could have agency in principle, but agency generally is definitely not the output of a large language model. Animals without language have agency, in that they take actions to fulfill their desires. Current LLMs may have some degree of intelligence, but they don't even appear to have any consistent wishes or desires. You can get them to talk longingly about x... until you give another prompt and suddenly x doesn't matter to them at all.
Doesn't 7B indicate that it was trained on 7 billion tokens? Or am I misunderstanding the nomenclature?
No, 7B means it has 7 billion parameters.
The genie really is out of the bottle now.
This is a lot like pharmaceuticals. The initial investment in a new medication is enormous. The price of each pill is trivial, to the extent that every drugstore chain is able to supply a generic in-house brand.
The other aspect is that fine-tuning an existing model is way cheaper than creating a competing model from scratch, so a company could offer CompetitorGPT/CompetitorCoPilot competitive with GPT-3.5, and offer fine-tuning of that model trained on the source code repository of the purchaser company's codebase, possibly on-prem or at least inside their AWS VPC/Azure/GCP equivalent.
The other thing to note is that OpenAI is hosting ChatGPT as a public resource available to anyone with an account, akin to Google being open to the public from day one (although Google doesn't require an account; maybe Gmail is a better comparison). I can't say for certain, only OpenAI would know, but I'm willing to bet that inference for ChatGPT is the vast majority of their costs (which are anything but trivial). Any private, internal-only instance of an OpenChatGPT (using the unlicensed leaked LLaMA model, a legal copy, or someone else's model) could be paying (relatively) minuscule training costs, and far lower inference costs if it's internal-use only. Whether that cost can be borne by a small SaaS company's existing AWS budget is up in the air. So ultimately you're right that ChatGPT would be difficult without Microsoft's support via a huge Azure grant, but it's less clear whether a self-hosted, internal-only OpenChatGPT, not from OpenAI, would be feasible for hobbyist self-hosters with a prosumer GPU cluster (say, with last-generation K80s instead of business-priced A100s), or for a company that wants to give its developers a Copilot-like productivity-multiplier internal tool without sending private source code to OpenAI in the absence of a privacy agreement with them.
Reading through the README and issues on the llama.cpp project, there is some speculation that there is a bug in the quantization, or possibly a bug in the inference (less likely I think).
I hope this is true and that, once it's fixed, the models can perform up to or past the ChatGPT level. If it's not true and these models are performing correctly, then either the metrics used to compare them to GPT are garbage and don't capture real-world uses, or the instruction tuning done by the Stanford team is not up to par.
Who writes this shit?
Everyone will soon have the equivalent of online nuclear weapons: bot swarms that infiltrate every forum, including this one.
Note this was in 2020: https://www.technologyreview.com/2020/10/08/1009845/a-gpt-3-...
And here's 4chan bot: https://www.youtube.com/watch?v=efPrtcLdcdM
I can tell you that HN is probably already being infiltrated as well.
SPAM can't gang up on you in a forum and downvote you and turn your friends against you and destroy your reputation within 1 hour online. But soon, it will. The web as we know it is soon going to be over.
Close enough.
I gave it the prompt “cats aren’t always fuzzy” and it wrote a lengthy LiveJournal-esque rambling journal entry about a woman and her husband having money issues. It was funny, but light-years away from ChatGPT.
It does sometimes create some really funny hallucinations though, like inventing prefectures in Japan that don’t exist etc.
Personally, I think that the RLHF does make a big difference but maybe it's a bug in the quantization code as suggested up thread.
I've found OA to be better than Alpaca but I'll wait until the 65B 3-bit quantization efforts for Alpaca are underway to compare them.
Only if you agreed to the ToS or believe that the weights are copyrightable (precedents set by the copyright office and the courts strongly suggest that they aren't). I personally see no issue in using these models for commercial purposes.
I don't remember many books where this ends particularly well. Perhaps the Culture universe could be a survivable outcome. Hopefully we don't get Berserkers first.
It does not matter what current capabilities of open source models are, because this opens the door to tremendous democratization of the ability to train and self-deploy these models.
In less than 6 months we will have open source models with gpt3-like capabilities, running locally on laptops, and potentially in phones and web browsers.
This is what I think comparing these bots is like. You can argue that they’re very close. But the delta makes a very big difference for any practical purposes because we’re looking for nuanced capability.
https://waitbutwhy.com/2015/01/artificial-intelligence-revol...
https://waitbutwhy.com/2015/01/artificial-intelligence-revol...
Basically, at the point where we have "almost human" level AI, it won't take much to get AI that's beyond human capabilities.
create a short poem about my wife helping me get over the stuffed nose and throat that i've had for days now. the cold was all around him, but he didn’t notice it at first- so much to do in this new place! finally though when they were sitting by themselves on their porch swing his sweetheart leaned into her man saying “we could take some of these leaves from the garden and make tea.” He looked up with a quizzical look and saw that she was serious. I've been under weather since last Tuesday. Today is day 6...and still going strong! Had to cancel two nights on stage, missed my son
hmm I wonder if this is essentially a probe[1] technique + relies on chatgpt already having been extensively trained
like did they basically exfiltrate the weights
1. probing per https://arxiv.org/abs/2102.12452
Stanford Alpaca web demo suspended “until further notice” - https://news.ycombinator.com/item?id=35200557 - March 2023 (77 comments)
Stanford Alpaca, and the acceleration of on-device LLM development - https://news.ycombinator.com/item?id=35141531 - March 2023 (66 comments)
Alpaca: An Instruct Tuned LLaMA 7B – Responses on par with txt-DaVinci-3 - https://news.ycombinator.com/item?id=35139450 - March 2023 (11 comments)
Alpaca: A strong open-source instruction-following model - https://news.ycombinator.com/item?id=35136624 - March 2023 (296 comments)
I think you can train LLaMA 7B (the model underlying Alpaca) for around $82,000, based on the Meta Research paper about it. Then you can fine-tune it à la Alpaca for a few hundred dollars more.
My wilder speculation is that, if you can shrink the model down to 4GB with llama.cpp 4-bit quantization, it may be possible to run it entirely in the browser (à la Stable Diffusion from the other day).
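The $82,000 training estimate is consistent with GPU-hours times an hourly rate: the LLaMA paper reports roughly 82,432 A100-80GB GPU-hours for the 7B model, so $82,000 corresponds to an assumed rental price of about $1 per GPU-hour. A minimal sketch of that estimate, with hypothetical function names and the price and cluster size as inputs you'd have to supply yourself:

```cpp
#include <cassert>

// Rental cost estimate: total GPU-hours times an assumed hourly price per GPU.
double training_cost_usd(double gpu_hours, double usd_per_gpu_hour) {
    assert(gpu_hours >= 0.0 && usd_per_gpu_hour >= 0.0);
    return gpu_hours * usd_per_gpu_hour;
}

// Wall-clock days if the GPU-hours are spread evenly over n_gpus
// (ignores failures, restarts, and imperfect scaling).
double wall_clock_days(double gpu_hours, int n_gpus) {
    assert(gpu_hours >= 0.0 && n_gpus > 0);
    return gpu_hours / n_gpus / 24.0;
}
```

At $1 per GPU-hour that's $82,432; spread over a hypothetical 2,048-GPU cluster it would be about 40 hours (roughly 1.7 days) of wall-clock time.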
We've got big names like OpenAI, Google, Apple, Meta, Baidu, and Amazon putting in serious time and money to ensure their language models are safe and ethical. However, now that we know it's possible to build powerful AI models on a budget, it's crucial to think about what this means for the future of AI regulation and safety.
This Alpaca AI project is a stark reminder that we need to have a serious conversation about the possible repercussions of AI proliferation. We can't just sit back and assume the big companies will take care of everything. The genie is out of the bottle, and it's time for everyone in the tech community to face the music and take responsibility for the AI revolution.
OpenAI represents an initial brute-force approach, eventually to be made obsolete, that will be abstracted over and over into a simpler code implementation with rules to recreate the old state.
kkrieger is a simple example of a tiny data model that can be deterministically rehydrated. It’s not unrealistic for AI models to become a seed value for a normalized code base to deterministically unpack into the necessary electron state.
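To illustrate the "deterministically rehydrated" idea itself (not its application to AI models, which is speculative), here is a C++ sketch that expands a 32-bit seed into an arbitrarily long buffer. It relies on the C++ standard fully specifying std::mt19937's raw output sequence, so the same seed produces the same words on any conforming compiler; the standard distributions such as std::uniform_int_distribution are not specified that tightly, so they are avoided here. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Expand a 32-bit seed into n pseudo-random words. std::mt19937's raw output is
// fixed by the C++ standard, so the same seed gives the same buffer everywhere.
std::vector<std::uint32_t> rehydrate(std::uint32_t seed, std::size_t n) {
    std::mt19937 gen(seed);
    std::vector<std::uint32_t> out(n);
    for (auto& word : out) word = static_cast<std::uint32_t>(gen());
    return out;
}
```

kkrieger-style demos build on the same principle: store a small generator plus its inputs instead of the generated data.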