The genie escapes: Stanford copies the ChatGPT AI for less than $600

(newatlas.com)

207 points · Freddie111 · 3y ago · 166 comments

97 comments · 23 top-level

superkuh3y ago· 26 in thread

Hardly. I've played a lot with the 7,13, and 30B llamas as well as the 7 and 13B alpacas fine tuned by Stanford. They do not have emergent abilities like being able to generate rhymes or, say, represent a movie plot as emoji. Even openai's old text-davinci-003 (gpt3.5, but text completion, not the chat ones) far outperforms them. That said, I have hopes for a 65B 3-bit quantized alpaca-fine tuned. We'll see when someone spends the money to do the (more costly) 65B training. The alpacas are also much more likely to go off rails and start regurgitating their fine-tuning inputs. Either that or openai is doing a lot of post processing on their end to hide the same problems in their LLM.

For now my IRC bots run the alpaca 7B 4-bit. 13B was not a significant improvement for twice the computational time. But it's best to learn them now because as soon as openai gets sued for the first time all the turing test passing older models without the legal-butt-covering bolted on will be removed.

crooked-v3y ago

For me the easiest comparison between models is to give it an absurd but entirely possible request, like "Write me a fanfic where the Animorphs battle the truck from Duel, but in the style of Mark Twain". So far nothing else I've tried has done even as well as GPT 3.5 yet, let alone GPT 4.

gumby3y ago

I couldn't do it either as I have no idea what Animorphs or Duel are.

pram3y ago

How exactly do you get it to keep going? Every time I try a prompt like this in the playground it spits out a couple paragraphs and then refuses to generate anything further, even with tokens maxed out.

starik363y ago

That is my experience as well. I've tried various models but nothing comes even close to the current ChatGPT implementation (when it manages to stay up).

TedDoesntTalk3y ago

> Write me a fanfic where the Animorphs battle the truck from Duel, but in the style of Mark Twain

Whoa. I want to read this! Duel - what a great film. Twain - amazing writer. Animorphs - published after my teen years but sounds like a great story!

satvikpendem3y ago

You might need to fix your parameters. From the text-generation-gui guide:

> For a more creative chat, use: temp 0.72, rep pen 1.1, top_k 0, and top_p 0.73

> For a more precise chat, use temp 0.7, repetition_penalty 1.1764705882352942 (1/0.85), top_k 40, and top_p 0.1

https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...

https://old.reddit.com/r/singularity/comments/11vsvro/in_cas...

https://twitter.com/theshawwn/status/1632569215348531201
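For readers unsure what the four settings quoted above actually do, here is a minimal sketch of one sampling step in plain Python. It is not the code of any particular GUI or of llama.cpp; the function name `sample_token` and the way the repetition penalty is applied (divide positive logits, multiply negative ones) are assumptions based on one common formulation.

```python
import math
import random

def sample_token(logits, prev_tokens, temperature=0.7, top_k=40, top_p=0.1,
                 repetition_penalty=1.0 / 0.85, rng=random):
    """Pick one token id from raw logits (hypothetical helper, illustration only)."""
    logits = list(logits)
    # Repetition penalty: push down tokens that were already generated.
    for t in set(prev_tokens):
        if logits[t] > 0:
            logits[t] /= repetition_penalty
        else:
            logits[t] *= repetition_penalty
    # Temperature: values below 1 sharpen the distribution, above 1 flatten it.
    scaled = [x / temperature for x in logits]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    ranked = sorted(((e / total, i) for i, e in enumerate(exps)), reverse=True)
    # top_k: keep only the k most likely tokens (0 disables the filter).
    if top_k > 0:
        ranked = ranked[:top_k]
    # top_p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for p, i in ranked:
        kept.append((p, i))
        cumulative += p
        if cumulative >= top_p:
            break
    ids = [i for _, i in kept]
    weights = [p for p, _ in kept]
    # Sample from what is left, proportionally to the remaining probabilities.
    return rng.choices(ids, weights=weights, k=1)[0]
```

With the "precise" settings (top_p 0.1) the filter usually leaves a single candidate, so the output is close to greedy decoding; the "creative" settings (top_k 0, top_p 0.73) leave more candidates to choose from.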

---

That being said, I found the OpenAssistant model much better: https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...

It's also completely OSS, Apache 2.0, unlike LLaMA and Alpaca which are non-commercial.

circuit103y ago

I’m impressed by how ChatGPT-like it is but also it’s saying things like

“No, OpenAI does not have an API for dogs. They do, however, have an API for other animals, such as cats. To retrieve an image of a cat, you can use the OpenAI API for Dogs API and select the cat breed or type.”

superkuh3y ago

I've tried all sorts of parameters including those exact ones. As for the huggingface stuff, it's not exactly clear how to use it without going down the python dependency rabbit hole. I am not confident I could get the correct python packages all together on Debian 11 to support running it. The llama.cpp stuff is very simple to compile and run comparatively.

UncleOxidant3y ago

> the alpaca 7B _4-bit_ [and presumably also 4bit for the 13B, 30B and larger parameter sets]

This is the wild card here, though, isn't it? OpenAI's chatGPT likely uses more than 4 bits for its parameters. IIRC the original LLaMA params were 16bit floats and they were quantized down to 4bit - considering that large amount of compression, they still do pretty OK, but not as good as chatGPT. I wonder how the alpaca/LLaMA models would do with 16bit floating point params (as they were originally trained)? What if they would have gone with 8 bits for the params as a compromise?

EDIT: Come to think of it, unless you're using vectorized ops on a CPU, 4 bit and 8 bit math is going to run at the same speed (for most popular CPUs), is it not? So why did they go all the way down to 4 bits instead of stopping at 8 bits (other than to make the param files 1/2 the size)?

EDIT2: looking through the alpaca.cpp code, there is mention of AVX, AVX2, AVX512 (and NEON on ARM), so it probably is taking advantage of vectorized ops where that's possible.
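To make the 4-bit versus 8-bit trade-off concrete, here is a rough sketch of symmetric round-to-nearest quantization of one block of weights. This is an assumption-laden illustration, not the actual llama.cpp/alpaca.cpp format; the helper names and the sample weights are made up.

```python
def quantize_block(values, bits=4):
    """Symmetric round-to-nearest quantization of one block (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1              # 7 for 4 bits, 127 for 8 bits
    amax = max(abs(v) for v in values)
    scale = amax / qmax if amax > 0 else 1.0
    # Each weight becomes a small integer in [-qmax, qmax].
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return scale, q

def dequantize_block(scale, q):
    """Recover approximate floats from the integers and the shared scale."""
    return [scale * x for x in q]

def max_error(values, bits):
    """Largest absolute difference between original and restored weights."""
    scale, q = quantize_block(values, bits)
    restored = dequantize_block(scale, q)
    return max(abs(a - b) for a, b in zip(values, restored))

# Made-up example weights, just to compare the two bit widths.
weights = [0.12, -0.53, 0.31, 0.02, -0.07, 0.44, -0.29, 0.18]
print("4-bit max error:", max_error(weights, 4))
print("8-bit max error:", max_error(weights, 8))
```

The rounding error is bounded by half the scale, so halving the bit width from 8 to 4 makes the worst-case error per weight roughly 18 times larger (127/7) while halving the file size.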

leodriesch3y ago

Not an expert on the matter so take this with a grain of salt, but I’d say the compression is also about VRAM/RAM, which seems to be the more limiting factor over inference speed.

onlyrealcuzzo3y ago

It's interesting that when ChatGPT 3.5 came out - everyone said, this is it! It's ready for primetime.

And now that there's a few competitors in the same league - 3.5 quality is suddenly garbage and only 4.0 is good enough.

Was it good enough before or wasn't it?

jonplackett3y ago

I think it’s going to be like movie special effects.

When Jurassic park first came out, or even something like Star Trek next gen. It looked AMAZING. So so realistic. But then…. As time goes on new things showed us what realistic could be.

I think we actually got better at seeing.

Same thing here. The more time you spend with it the more you notice things that don’t quite work. And then the new thing solves those problems, but we’ll find more wrongness

coeneedell3y ago

The problem is that you’ve identified two distinct and non-overlapping sets of people as “everyone”. Everyone who was applauding 3.5 when it came out were industry hype people. Even the critical voices were industry hype people, paid to assume the AI is powerful and write about the possible negative consequences of that assumption.

Now we’ve all gotten familiar with 3.5, and we’ve come to understand its limitations, so the public knows it’s not a “godlike” AI.

Luckily there’s a fresh new model, not technically different from the earlier one but it cost more money to build. The hype group can start again, citing the publicly known limitations of 3.5. But in 6 months we’ll understand what’s wrong with it, and the public will be talking about the limitations, just in time for 4.5.

bioemerl3y ago

It's really not good enough yet, it's impressive for what it is in our current time. But we're looking at the 1980s computers.

They are neat, they are useful, but they can do so much more.

LASR3y ago

In my personal testing, I throw some sophisticated use cases at LLMs - particularly chain of thought reasoning. None of the models out there are able to do this well, except for the OG GPT-3 Davinci-003. Even the newer turbo models are not as good.

I am playing around with GPT-4 this week though. Let’s see how that goes.

stavros3y ago

The newer turbo models are the ChatGPT models, and are worse than text-davinci-003, in my experience. The gpt-4 model is also not as good as the GPT-4 chat version, which is very odd.

genericacct3y ago

fwiw 7B is totally useless for the subset of non-English languages i've used, 13B a bit less so, but nowhere near as good as gpt.

GPT's performance in non-trivial translation tasks is unbelievable. all those articles mentioning jobs that are going to be replaced fail to mention translators are probably going to be the first.

user_named3y ago

You need the translators to QA the output from GPT. It's less work but not much less, and more types of translation work becomes feasible when leveraging GPT. I'm guessing the job market for translators will grow, not decline.

thomasahle3y ago

3 bits? Is that for all weights in the network?

superkuh3y ago

As far as I know, yes. https://arxiv.org/abs/2210.17323

"Specifically, GPTQ can quantize GPT models with 175 billion parameters in approximately four GPU hours, reducing the bitwidth down to 3 or 4 bits per weight, with negligible accuracy degradation relative to the uncompressed baseline."

This would be 175 billion 3 bit weights instead of 175 billion 16 (or 32!) bit weights. It massively reduces the size of the model. It makes loading it in ram on consumer computers feasible. The number of parameters stays the same.
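The size reduction is simple arithmetic. A quick sketch, counting only the weights themselves and ignoring any per-block scale factors or other overhead a real quantized file would carry (the function name is made up):

```python
def weights_size_gb(n_params, bits_per_weight):
    """Approximate storage for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# 175 billion parameters at several bit widths.
for bits in (32, 16, 4, 3):
    print(f"175B parameters at {bits:>2} bits: {weights_size_gb(175e9, bits):7.1f} GB")
```

That works out to roughly 700 GB at 32 bits, 350 GB at 16 bits, 87.5 GB at 4 bits, and about 65.6 GB at 3 bits.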

DrJosiah3y ago

It might have been a typo, as the current llama.cpp / alpaca.cpp included quantizers default to 4 bits.

throwaway18513y ago

Hm. I haven’t tried the local installs yet. However, when the Alpaca web demo was live, I did find it to be comparable (though not quite as capable) to davinci-003. It answered arbitrary factual questions about pop culture references, law, medicine, and programming. It generated rhymes and poems. (I didn’t try asking for the emoji thing, so can’t say anything about that.) It performed natural language tasks such as information extraction and summarization. And it did all of it coherently.

nickthegreek3y ago

Where does one find the 13B alpaca model?

superkuh3y ago

Be aware this file is a single ~8GB 4-bit model (ggml-alpaca-13b-q4.bin) instead of the 2x ~4GB models (ggml-model-q4_0.bin, ggml-model-q4_0.bin.1) that most llama.cpp style inference running programs expect. You'll probably have to edit the line,

    n_parts = LLAMA_N_PARTS.at(hparams.n_embd);

in chat.cpp (or main.cpp) to hard code it to treat this 1 file model properly like,

    n_parts = 1;

Or re-write the parameter config subroutine to recognize and handle non-standard weights file.
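The idea behind "recognize and handle" can be sketched as counting the part files that actually exist rather than looking the count up from the model size. The snippet below is a Python illustration of that logic only, not a patch for chat.cpp; `count_model_parts` is a hypothetical helper, and it assumes the `name`, `name.1`, `name.2`, ... naming shown above.

```python
import os

def count_model_parts(path):
    """Count weight files named path, path.1, path.2, ... that exist on disk."""
    if not os.path.exists(path):
        return 0
    n = 1
    # Keep going while the next numbered part is present.
    while os.path.exists(f"{path}.{n}"):
        n += 1
    return n

# A single-file model such as ggml-alpaca-13b-q4.bin would give 1 here,
# while a split model with a ".1" companion file would give 2.
print(count_model_parts("ggml-alpaca-13b-q4.bin"))
```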

magnet: magnet:?xt=urn:btih:053b3d54d2e77ff020ebddf51dad681f2a651071&dn=ggml-alpaca-13b-q4.bin&tr=udp%3A%2F%2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=udp%3A%2F%2Fopentracker.i2p.rocks%3A6969%2Fannounce&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A6969%2Fannounce&tr=udp%3A%2F%2F9.rarbg.com%3A2810%2Fannounce

torrent: https://btcache.me/torrent/053B3D54D2E77FF020EBDDF51DAD681F2...

torrent: https://torrage.info/torrent.php?h=053b3d54d2e77ff020ebddf51...

via: https://github.com/antimatter15/alpaca.cpp

codetrotter3y ago

> represent a movie plot as emoji

This sounded like a really cool idea but I asked ChatGPT to do this for the plot of the movie The Shawshank Redemption and there is no way that I would ever have been able to guess that movie from the emojis it gave me. Perhaps GPT-4 does a better job at it.

ryoshu3y ago

So what you're saying is it's a matter of time?

xwdv3y ago· 10 in thread

Given the high prices of OpenAI offerings it seems it’s better to pirate an AI model before resorting to paying for anything.

nerpderp823y ago

The world is becoming more cyberpunk everyday, people making back alley deals for data or models.

These weights are shit man, they have been quanted w/o being retrained against the original. I already have this torrent, I want uncut originals. And no water marks this time, the last model wouldn't shutup about investing in tulips.

oezi3y ago

Shh, I got some unnerfed midjourney v8 for your nsfw needs, my friend...

ronsor3y ago

ChatGPT API is surprisingly cheap, but GPT-4 is many times more expensive to the point where I can't see it being worth it most of the time.

stavros3y ago

I'm not convinced that the GPT-4 API actually works? It's been giving me very different answers than the chat interface. For example, the chat interface says it's GPT-4 if you ask it, but the API says it's GPT-3 (and bills as GPT-4).

nico3y ago

How much does it cost to privately fine-tune and run Llama?

It’s USD 600 for fine-tuning. Maybe USD 4-5k for a computer that can run it.

ChatGPT pro is $20/month. 5k would be 250 months (10+ years) of paid access.

Not sure pirating it now adds up.
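The break-even arithmetic can be checked in a couple of lines. The `seats` parameter is a hypothetical addition for the multi-user case; the dollar figures are the ones from the comment above.

```python
def breakeven_months(hardware_cost, monthly_fee, seats=1):
    """Months of subscription fees that add up to the one-off hardware cost."""
    return hardware_cost / (monthly_fee * seats)

print(breakeven_months(5000, 20))      # single subscriber -> 250.0
print(breakeven_months(5000, 20, 25))  # hypothetical 25-seat team -> 10.0
```

This ignores electricity, the $600 fine-tuning cost, and any difference in model quality.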

zamnos3y ago

Correct me if I'm wrong, but that's per-user, unless you all just share an account, or you build out a bespoke API integration. Which is to say if you have 25 developers, you'd spend $5k in 10 months.

The reason to pirate it would be to be able to fine-tune the model on your private internal source code repository, assuming you already have an existing large body of work you want to train it on, and offer SelfHostedCoPilot on your bespoke internal-only DSL that ChatGPT and Copilot have no way of having seen and would undoubtedly hallucinate about by a considerable margin.

mromanuk3y ago

Most people don’t fine tune the models (llama or openAI). A MacBook M1 can run those models ($1000) and in many cases the user already has one. You also need a computer to access openAI, so the comparison boils down to $20/m vs $0. At this point in time, LLMs are a curiosity for most people.

Firmwarrior3y ago

Isn't openAI only charging something like a 20th of a penny per interaction right now? Definitely not the kind of thing you want to incorporate into a widespread free app just yet, but it seems pretty affordable for a lot of use cases

dragonwriter3y ago

> Isn't openAI only charging something like a 20th of a penny per interaction right now?

They don't charge per interaction, but per token. The chat models range from a fifth of a cent per 1000 tokens to 12 cents per thousand tokens (depending on whether it's gpt-3.5, or the 8k limit gpt-4, or the 32k limit gpt-4, and, for gpt-4 models, also prompt v. response tokens.)

sp3323y ago

That's the price per "token". A token is a word or part of a word - rule of thumb is four tokens per three words.
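Putting the two numbers together (price per 1000 tokens, and roughly four tokens per three words) gives a back-of-the-envelope cost estimate. The function name is made up, and the prices are the two ends of the range quoted in this thread, not a current price list.

```python
def estimate_cost_usd(words, usd_per_1k_tokens, tokens_per_word=4 / 3):
    """Rough cost of processing `words` words at a given per-1000-token price."""
    tokens = words * tokens_per_word
    return tokens / 1000 * usd_per_1k_tokens

# A 750-word exchange is about 1000 tokens.
print(estimate_cost_usd(750, 0.002))  # cheapest chat model in the quoted range
print(estimate_cost_usd(750, 0.12))   # most expensive rate in the quoted range
```

So the same 750-word exchange costs about a fifth of a cent at the low end and about 12 cents at the high end.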

cjohnson3183y ago· 6 in thread

> It seems these godlike AIs are already frighteningly cheap and easy to replicate.

"godlike"? Really? I'm not religious, but this seems like an overreaction for something that has no agency.

crazygringo3y ago

If it's a shorthand for omniscience then I can see how it makes sense. A bit hyperbolic though for sure.

doctor_eval3y ago

How do you know agency is not simply the output of a large language model encoded in neurons? What is the difference between neuronal and digital weights?

Considering that we don’t know how the brain works so well, and we don’t understand why LLMs work so well, simply on the basis of their output I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.

cjohnson3183y ago

I'm using agency in this context to mean: (1) a strong desire to achieve one or more clear goals, beyond survival, and (2) taking concrete steps to achieve those goals.

> How do you know agency is not simply the output of a large language model encoded in neurons?

I'm not sure what you mean here. Is agency an emergent effect of a large digital or biological neural network? Maybe! Is it an emergent effect of a large language model? If it is, then it should be clear, or demonstrable, that the model (1) has goals (2) takes concrete steps to achieve those goals.

> What is the difference between neuronal and digital weights?

Brain chemistry works at orders of magnitude less speed, since we're talking about periodically building and releasing an ionic differential between the inside and outside of a cell wall. Moreover, we have a massive number of neurons and a stupidly massive amount of interneuronal connections, with billions of years of training over billions of lineages. Digital weights, in contrast, are a stripped down model of this system that throws out a whole class of complexities like hormones and metabolism.

> I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.

I think this is an overly generous assumption.

ComposedPattern3y ago

Gods don't have persistent bodies or brains. They're spirits.

I don't disagree that digital systems with neural architectures could have agency in principle, but agency generally is definitely not the output of a large language model. Animals without language have agency, in that they take actions to fulfill their desires. Current LLMs may have some degree of intelligence, but they don't even appear to have any consistent wishes or desires. You can get them to talk longingly about x... until you give another prompt and suddenly x doesn't matter to them at all.

Jcowell3y ago

What if creation was a result of a lucky happen-by-chance hallucination ?

cjohnson3183y ago

Sure! Why not? Quantum foam imagining quantum foam. I love it. I still wouldn't consider an LLM god-like. I mean, if it eats its own son, then maybe. (Titan of a reference there.)

starik363y ago· 6 in thread

From the article: Pre-trained on a trillion "tokens"...

Doesn't 7B indicates that it was trained on 7 billion tokens? Or am I misunderstanding the nomenclature?

dragonwriter3y ago

> Doesn't 7B indicates that it was trained on 7 billion tokens?

No, 7B means it has 7 billion parameters.

starik363y ago

And what does a parameter mean in this context?

superkuh3y ago

The emerging consensus for larger LLM is you want to train them with at least 2-4x the tokens of the number of parameters (weights between neurons in the layers). A trillion (100x) surprises me.

sitic3y ago

The LLaMA paper contradicts this view: "[...] Although Hoffmann et al. (2022) recommends training a 10B model on 200B tokens, we find that the performance of a 7B model continues to improve even after 1T tokens." https://arxiv.org/pdf/2302.13971.pdf
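For scale, the tokens-per-parameter ratios implied by the figures in that quote can be computed directly (the function name is just for illustration):

```python
def tokens_per_parameter(n_tokens, n_params):
    """Training tokens seen per model parameter."""
    return n_tokens / n_params

# Hoffmann et al. example from the quote: 10B parameters on 200B tokens.
print(tokens_per_parameter(200e9, 10e9))       # 20.0
# LLaMA 7B trained on 1T tokens.
print(round(tokens_per_parameter(1e12, 7e9)))  # 143
```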

sebzim45003y ago

They probably put most of the effort into the 65B model, the 7B model was just trained so they could get an idea of the scaling behaviour. It makes sense to use the same amount of training steps, then.

instance3y ago

7B is the number of parameters of the model.

twblalock3y ago· 6 in thread

This is why it's not possible to slow down or "stop" AI: once the problems are solved the solutions turn out to be trivial to replicate. All it takes is compute.

jeron3y ago

you say "all it takes is compute" like that is trivial - chatGPT would have a hard time without Microsoft's support via Azure

twblalock3y ago

There are lots of places to get compute, including Chinese cloud providers...

The genie really is out of the bottle now.

This is a lot like pharmaceuticals. The initial investment in a new medication is enormous. The price of each pill is trivial, to the extent that every drugstore chain is able to supply a generic in-house brand.

zamnos3y ago

Maybe. Certainly in the past, before the world was aware LLMs on the level of ChatGPT were possible with today's technology. OpenAI has chosen not to release any real details about GPT-4, so we don't actually know what it would take to train a model of equivalent quality, especially considering training isn't a one-shot. Multiple training runs easily add up training costs. So training a 12-figure parameter size model (175B) is assumed to be very expensive. But there has been great progress made on optimized models which are smaller by more than an order of magnitude - 7B, for a debatable drop in quality (7B alpaca is in no way competitive with ChatGPT, but it's still very much not a markov chain from during the AI winter). So one possibility is that OpenAI chose not to release salient GPT-4 details because it is much smaller than GPT-3's 175B model size, and they're hiding the details because of how much that cuts down on training costs. (Which I should note is unsubstantiated conjecture but not outside the realm of possibility.)

The other aspect is that fine-tuning an existing model is way cheaper than creating a competing model from scratch, so a company could offer CompetitorGPT/CompetitorCoPilot competitive with GPT-3.5, and offer fine-tuning of that model trained on the source code repository of the purchaser company's codebase, possibly on-prem or at least inside their AWS VPC/Azure/GCP equivalent.

The other thing to note is that OpenAI is hosting ChatGPT as a public resource available to anyone with an account, akin to Google being open to the public from day one (although that is without an account. Maybe Gmail is a better comparison). I can't say for certain, only OpenAI would know for sure, but I'm willing to bet that inference for ChatGPT is the vast majority of their costs (which is anything but trivial). Any private internal-only instance of OpenChatGPT (using the unlicensed leaked LLaMA model or a legal copy or someone else's) could be paying (relatively) minuscule training costs, and way lower inference costs if it's internal-use only. Whether that cost can be borne by a small SaaS company's existing AWS budget is up in the air, which is to say ultimately that you're right - ChatGPT would be difficult without the support of Microsoft via a huge Azure grant. It's less obvious that a self hosted internal-only OpenChatGPT, not from OpenAI, would be possible by hobbyist self-hosters with a prosumer GPU cluster (say with last generation K80s instead of business-priced A100s), or by a company wanting to leverage LLMs for private use by that company that wants to provide a Copilot-like productivity multiplier internal tool to their developers, without sending private source code to OpenAI in lieu of a privacy agreement with them.

crooked-v3y ago

While that's true, it's basically inevitable now that at some point personal hardware will be powerful enough for enthusiasts to run home bots comparable to GPT-3, and even that by itself would drastically change a lot of things.

mLuby3y ago

Governments have experience limiting the spread of digital content. For now at least, AI proliferation is not immune to those same tactics.

twblalock3y ago

Governments are really bad at limiting the spread of digital content.

doctoboggan3y ago· 3 in thread

I've used both the 7B and 13B instruction tuned llama weights (quantized using the llama.cpp scripts). Either I am doing something wrong, or these two models are nowhere near the level of ChatGPT. Many times they return something totally irrelevant to my question, stop responding, use a different language, or otherwise return the wrong answer. ChatGPT does none of this (other than the wrong answer due to hallucinating sometimes...).

Reading through the README and issues on the llama.cpp project, there is some speculation that there is a bug in the quantization, or possibly a bug in the inference (less likely I think).

I hope this is true and once fixed the models can perform up to or past the ChatGPT level. If it's not true and these models are performing correctly, then either the metrics used to compare it to GPT are garbage and don't capture the real world uses, or the instruction tuning done by the Stanford team is not up to par.

f_devd3y ago

LLama hasn't been fine-tuned with RLHF, so it requires additional prompting, check out the open-assistant[0] project for an open-source ChatGPT equivalent (WIP).

[0]: https://github.com/LAION-AI/Open-Assistant

satvikpendem3y ago

Use it here: https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...

simonw3y ago

This is why Alpaca is a big deal: it shows what LLaMA can do after it's been fine-tuned to follow instructions like ChatGPT has.

raydiatian3y ago· 3 in thread

> It seems these godlike AIs are already frighteningly cheap and easy to replicate.

Who writes this shit?

meh88813y ago

Irreplaceable humans

B1FF_PSUVM3y ago

Hard to say ...

dr_kiszonka3y ago

Thanks for a genuinely funny comment!

EGreg3y ago· 3 in thread

I warned about this for years. Finally an article gets it right.

Everyone will soon have the equivalent of online nuclear weapons: bot swarms that infiltrate every forum, including this one.

tantalor3y ago

Spam has existed on the internet for a long time.

EGreg3y ago

This is different. It can act just like humans do, and most people who skim comments won't be able to tell the difference.

Note this was in 2020: https://www.technologyreview.com/2020/10/08/1009845/a-gpt-3-...

And here's 4chan bot: https://www.youtube.com/watch?v=efPrtcLdcdM

I can tell you that HN is probably already being infiltrated as well.

SPAM can't gang up on you in a forum and downvote you and turn your friends against you and destroy your reputation within 1 hour online. But soon, it will. The web as we know it is soon going to be over.

B1FF_PSUVM3y ago

My pet theory was that AI would come out of spam bots.

Close enough.

braingenious3y ago· 2 in thread

I have not found alpaca to be comparable to chatgpt, but it could be because of bugs in the version I installed through dalai. I might try reinstalling it because I suspect there might be some sort of file corruption issue or whatever.

I gave it the prompt “cats aren’t always fuzzy” and it wrote a lengthy livejournal-esque rambling journal entry about a woman and her husband having money issues. It was funny, but lightyears away from chatgpt.

It does sometimes create some really funny hallucinations though, like inventing prefectures in Japan that don’t exist etc.

DustinBrett3y ago

I also got that text about the married couple and their money issues. Alpaca didn't impress me at all so far.

disgruntledphd23y ago

Alpaca wasn't great. The 13b and 30b models are much better, but just for sentence completion.

Personally, I think that the RLHF does make a big difference but maybe it's a bug in the quantization code as suggested up thread.

satvikpendem3y ago· 2 in thread

Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for commercial purposes. With that in mind, OpenAssistant is an Apache 2.0 licensed fully open source alternative that's pretty good (the model is OpenAssistant/oasst-sft-1-pythia-12b): https://huggingface.co/spaces/olivierdehaene/chat-llm-stream....

I've found OA to be better than Alpaca but I'll wait until the 65B 3-bit quantization efforts for Alpaca are underway to compare them.

Zuiii3y ago

> Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for commercial purposes.

Only if you agreed to the ToS or believe that the weights are copyrightable (precedents set by the copyright office and the courts strongly suggest that they aren't). I personally see no issue in using these models for commercial purposes.

satvikpendem3y ago

You might not but a company will think twice. It's the same reason why companies could theoretically use pirated Windows and Adobe products and get away with it, but most don't because the risk is not worth the reward.

jakedata3y ago· 2 in thread

AI bootstrapping AI is a sci-fi trope that goes back decades. I first encountered it in The Cybernetic Samurai while in high school. While the details differ, the reality is that AI is a catalyst for more of itself.

I don't remember many books where this ends particularly well. Perhaps the Culture universe could be a survivable outcome. Hopefully we don't get Berzerkers first.

nuclearsugar3y ago

Stable Diffusion trains StyleGAN2 - https://www.jasonfletcher.info/vjloops/

doctor_eval3y ago

It’s the beginning of the AI singularity. It’s not that it’s bad, we just can’t see anything beyond the event horizon.

freediver3y ago· 1 in thread

The incredible contribution of Alpaca is showing the world how to efficiently train an LLM on instructions. The fact that it did so on 52k instructions generated by GPT is poetic.

It does not matter what current capabilities of open source models are, because this opens the door to tremendous democratization of the ability to train and self-deploy these models.

In less than 6 months we will have open source models with gpt3-like capabilities, running locally on laptops, and potentially in phones and web browsers.

permo-w3y ago

if we’re all still alive by then

Waterluvian3y ago· 1 in thread

If you use consciousness as a baseline, the intellectual difference between a grade schooler and a PhD is tiny.

This is what I think comparing these bots is like. You can argue that they’re very close. But the delta makes a very big difference for any practical purposes because we’re looking for nuanced capability.

est313y ago

There is a blog post that drives this point home with very good illustrations:

https://waitbutwhy.com/2015/01/artificial-intelligence-revol...

Basically, at the point where we have "almost human" level AI, it won't take much to get AI that's beyond human capabilities.

gaogao3y ago· 1 in thread

Has anyone tried this yet on the 65B version? I'm curious if it knows how to rhyme and other emergent behavior, as alpaca-7B does not.

starik363y ago

I tried because I was curious. It's not very good. First I "quantized" the files. Took about 3-4 hours. Then I ran a prompt: "create a short poem about my wife helping me get over the stuffed nose". After 3 hours of 99% CPU usage and using all 32 GB of RAM, I got the following result:

create a short poem about my wife helping me get over the stuffed nose and throat that i've had for days now. the cold was all around him, but he didn’t notice it at first- so much to do in this new place! finally though when they were sitting by themselves on their porch swing his sweetheart leaned into her man saying “we could take some of these leaves from the garden and make tea.” He looked up with a quizzical look and saw that she was serious. I've been under weather since last Tuesday. Today is day 6...and still going strong! Had to cancel two nights on stage, missed my son

https://i.imgur.com/Nl3xLEg.png

earthboundkid3y ago· 1 in thread

I bet you could “exfiltrate” an LLM relatively cheaply by using LLM A to generate training data for LLM B.

oezi3y ago

No way. The cost for generating the tokens is way too high.

UncleOxidant3y ago· 1 in thread

Is it accurate to say they were trained for less than $600? Wouldn't that just be the finetuning that was done to the already existing LLaMA parameters which likely cost way more than $600 to train?

simonw3y ago

Yeah, exactly. LLaMA 7B itself cost $80,000+ to train (82,432 GPU hours). Stanford spent $100 on fine-tuning compute and $500 on OpenAI credits to generate their 52,000 sample instruction training set.
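Those figures imply a rough price per GPU hour and show how small the fine-tuning share is. A quick sketch using only the numbers above, treating "$80,000+" as exactly $80,000 (so the real rate is at least this much); the function names are made up:

```python
def usd_per_gpu_hour(total_cost, gpu_hours):
    """Implied hourly rate for the base-model training run."""
    return total_cost / gpu_hours

def finetune_share_percent(base_cost, finetune_cost):
    """Fine-tuning cost as a percentage of base training plus fine-tuning."""
    return finetune_cost / (base_cost + finetune_cost) * 100

base_cost = 80_000          # lower bound from "$80,000+"
gpu_hours = 82_432
finetune_cost = 100 + 500   # compute plus OpenAI credits

print(round(usd_per_gpu_hour(base_cost, gpu_hours), 2))            # 0.97
print(round(finetune_share_percent(base_cost, finetune_cost), 2))  # 0.74
```

In other words, the headline $600 is under 1% of what the whole pipeline cost when the base model is counted.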

awinter-py3y ago

> asked GPT to take 175 human-written instruction/output pairs, and start generating more in the same style and format ... through one of OpenAI's helpfully provided APIs, and ... the team had some 52,000 sample conversations to use in post-training the LLaMA model

hmm I wonder if this is essentially a probe[1] technique + relies on chatgpt already having been extensively trained

like did they basically exfiltrate the weights

1. probing per https://arxiv.org/abs/2102.12452

dang3y ago

Recent and related:

Stanford Alpaca web demo suspended “until further notice” - https://news.ycombinator.com/item?id=35200557 - March 2023 (77 comments)

Stanford Alpaca, and the acceleration of on-device LLM development - https://news.ycombinator.com/item?id=35141531 - March 2023 (66 comments)

Alpaca: An Instruct Tuned LLaMA 7B – Responses on par with txt-DaVinci-3 - https://news.ycombinator.com/item?id=35139450 - March 2023 (11 comments)

Alpaca: A strong open-source instruction-following model - https://news.ycombinator.com/item?id=35136624 - March 2023 (296 comments)

simonw3y ago

Related, my post "Could you train a ChatGPT-beating model for $85,000 and run it in a browser?" https://simonwillison.net/2023/Mar/17/beat-chatgpt-in-a-brow...

I think you can train LLaMA 7B (the model underlying Alpaca) for around $82,000, based on the Meta Research paper about it. Then you can fine-tune it ala Alpaca for a few hundred dollars more.

My wilder speculation is that, if you can shrink the model down to 4GB with llama.cpp 4bit quantization, it may be possible to run it entirely in the browser (ala Stable Diffusion from the other day).

neilellis3y ago

Wow, Stanford's Alpaca AI project is a real game-changer. The fact that it performs on par with ChatGPT but costs less than $600 to build is both exciting and terrifying. Sure, it's great to see AI becoming more accessible, but it's also a massive wakeup call for the potential misuse of these technologies.

We've got big names like OpenAI, Google, Apple, Meta, Baidu, and Amazon putting in serious time and money to ensure their language models are safe and ethical. However, now that we know it's possible to build powerful AI models on a budget, it's crucial to think about what this means for the future of AI regulation and safety.

This Alpaca AI project is a stark reminder that we need to have a serious conversation about the possible repercussions of AI proliferation. We can't just sit back and assume the big companies will take care of everything. The genie is out of the bottle, and it's time for everyone in the tech community to face the music and take responsibility for the AI revolution.

welly34h3y ago

Code can be abstracted into a simpler code model that deterministically recreates the old code model.

OpenAI's approach is an initial brute-force one that will eventually be obsoleted: it will be abstracted over and over into a simpler code implementation with rules to recreate the old state.

kkrieger is a simple example of a tiny data model that can be deterministically rehydrated. It's not unrealistic for AI models to become a seed value for a normalized code base to deterministically unpack into the necessary electron state.

amrb3y ago

Anything open will need training and attention to be an openai competitor, tho I'm happy to see the function of this one: https://huggingface.co/spaces/togethercomputer/OpenChatKit

alecco3y ago

https://archive.ph/xIKIN

TedDoesntTalk3y ago

> Write me a fanfic where the Animorphs battle the truck from Duel, but in the style of Mark Twain

Whoa. I want to read this! Duel - what a great film. Twain - amazing writer. Animorphs - published after my teen years but sounds like a great story!

1 more reply

satvikpendem3y ago

You might need to fix your parameters. From the text-generation-gui guide:

> For a more creative chat, use: temp 0.72, rep pen 1.1, top_k 0, and top_p 0.73

> For a more precise chat, use temp 0.7, repetition_penalty 1.1764705882352942 (1/0.85), top_k 40, and top_p 0.1

https://old.reddit.com/r/LocalLLaMA/comments/11o6o3f/how_to_...

https://old.reddit.com/r/singularity/comments/11vsvro/in_cas...

https://twitter.com/theshawwn/status/1632569215348531201
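
To make the quoted settings concrete, here is a small pure-Python sketch of what temperature, top_k, top_p and a repetition penalty do to a model's raw scores. It is an illustration under assumptions, not the code of any particular UI: the function names are made up, `top_k == 0` is treated as "disabled", and the penalty follows the common divide-if-positive, multiply-if-negative convention.

```python
import math


def apply_repetition_penalty(logits, seen_ids, penalty):
    """Make tokens that already appeared less likely."""
    out = list(logits)
    for i in set(seen_ids):
        out[i] = out[i] / penalty if out[i] > 0 else out[i] * penalty
    return out


def filter_probs(logits, temperature=0.7, top_k=40, top_p=0.1):
    """Return {token_index: probability} after temperature, top-k and top-p."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Rank token indices from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k > 0:  # assumption: 0 means "no top-k limit"
        ranked = ranked[:top_k]

    # Nucleus (top-p): keep the smallest prefix whose mass reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break

    # Renormalise over the surviving tokens.
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


logits = [2.0, 1.0, 0.0]
precise = filter_probs(logits, temperature=0.7, top_k=40, top_p=0.1)
looser = filter_probs(logits, temperature=0.72, top_k=0, top_p=0.9)
print(sorted(precise), sorted(looser))
```

With a top_p as low as 0.1, only the single most likely token usually survives, which is why the "precise" preset behaves almost deterministically.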

---

That being said, I found the OpenAssistant model much better: https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...

It's also completely OSS, Apache 2.0, unlike LLaMA and Alpaca which are non-commercial.

circuit103y ago

I’m impressed by how ChatGPT-like it is but also it’s saying things like

superkuh3y ago

UncleOxidant3y ago

> the alpaca 7B _4-bit_ [and presumably also 4bit for the 13B, 30B and larger parameter sets]

EDIT2: looking through the alpaca.cpp code and there is mention of AVX, AVX2, AVX512 (and NEON on ARM) so it probably is taking advantage of vectorized ops where that's possible.

leodriesch3y ago

Not an expert on the matter so take this with a grain of salt, but I’d say the compression is also about VRAM/RAM, which seems to be the more limiting factor over inference speed.

onlyrealcuzzo3y ago

It's interesting that when ChatGPT 3.5 came out, everyone said: this is it! It's ready for primetime.

And now that there are a few competitors in the same league, 3.5 quality is suddenly garbage and only 4.0 is good enough.

Was it good enough before or wasn't it?

jonplackett3y ago

I think it’s going to be like movie special effects.

When Jurassic Park first came out, or even something like Star Trek: The Next Generation, it looked AMAZING. So, so realistic. But then, as time went on, new things showed us what realistic could be.

I think we actually got better at seeing.

Same thing here. The more time you spend with it, the more you notice things that don't quite work. And then the new thing solves those problems, but we'll find more wrongness.

coeneedell3y ago

Now we’ve all gotten familiar with 3.5, and we’ve come to understand its limitations, so the public knows it’s not a “godlike” AI.

bioemerl3y ago

It's really not good enough yet. It's impressive for what it is in our current time, but we're looking at the 1980s computers.

They are neat, they are useful, but they can do so much more.

LASR3y ago

I am playing around with GPT-4 this week though. Let’s see how that goes.

stavros3y ago

The newer turbo models are the ChatGPT models, and are worse than text-davinci-003, in my experience. The gpt-4 model is also not as good as the GPT-4 chat version, which is very odd.

1 more reply

genericacct3y ago

FWIW, 7B is totally useless for the subset of non-English languages I've used, 13B a bit less so, but nowhere near as good as GPT.

GPT's performance in non-trivial translation tasks is unbelievable. All those articles mentioning jobs that are going to be replaced fail to mention that translators are probably going to be the first.

user_named3y ago

thomasahle3y ago

3 bits? Is that for all weights in the network?

superkuh3y ago

As far as I know, yes. https://arxiv.org/abs/2210.17323

4 more replies

DrJosiah3y ago

It might have been a typo, as the current llama.cpp / alpaca.cpp included quantizers default to 4 bits.
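
As a rough illustration of what "N-bit weights" means, here is a simplified round-to-nearest quantizer for one block of weights. It is a sketch under assumptions: this is neither the GPTQ method from the linked paper nor llama.cpp's actual file format, and the function names are invented.

```python
def quantize_block(weights, bits=4):
    """Symmetric round-to-nearest quantization of one block of floats.

    Returns a float scale plus small signed integers in [-2**(bits-1), 2**(bits-1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4 bits, 3 for 3 bits
    amax = max(abs(w) for w in weights)
    scale = amax / qmax if amax > 0 else 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return scale, q


def dequantize_block(scale, q):
    """Recover approximate floats from the stored scale and integers."""
    return [scale * v for v in q]


block = [1.75, -0.5, 0.0, 0.25]
scale, q = quantize_block(block, bits=4)
print(scale, q, dequantize_block(scale, q))
```

Every weight in the block shares one scale, so dropping from 4 to 3 bits halves the number of representable levels and roughly doubles the rounding error.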

throwaway18513y ago

nickthegreek3y ago

Where does one find the 13B alpaca model?

superkuh3y ago

    n_parts = LLAMA_N_PARTS.at(hparams.n_embd);

in chat.cpp (or main.cpp) to hard code it to treat this 1 file model properly like,

    n_parts = 1;

Or rewrite the parameter config subroutine to recognize and handle a non-standard weights file.

torrent: https://btcache.me/torrent/053B3D54D2E77FF020EBDDF51DAD681F2...

torrent: https://torrage.info/torrent.php?h=053b3d54d2e77ff020ebddf51...

via: https://github.com/antimatter15/alpaca.cpp

codetrotter3y ago

> represent a movie plot as emoji

ryoshu3y ago

So what you're saying is it's a matter of time?

xwdv3y ago· 10 in thread

Given the high prices of OpenAI offerings it seems it’s better to pirate an AI model before resorting to paying for anything.

nerpderp823y ago

The world is becoming more cyberpunk every day, with people making back-alley deals for data or models.

oezi3y ago

Shh, I got some unnerfed midjourney v8 for your nsfw needs, my friend...

1 more reply

ronsor3y ago

ChatGPT API is surprisingly cheap, but GPT-4 is many times more expensive to the point where I can't see it being worth it most of the time.

stavros3y ago

3 more replies

nico3y ago

How much does it cost to privately fine-tune and run Llama?

It’s USD 600 for fine-tuning. Maybe USD 4-5k for a computer that can run it.

ChatGPT Plus is $20/month. 5k would be 250 months (20+ years) of paid access.

Not sure pirating it now adds up.

zamnos3y ago

Correct me if I'm wrong, but that's per-user, unless you all just share an account, or you build out a bespoke API integration. Which is to say if you have 25 developers, you'd spend $5k in 10 months.
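
The break-even point in this sub-thread depends heavily on the number of seats. A quick sketch using the $5k hardware estimate and the $20/month subscription price from the comments above (the function name is illustrative, and electricity and maintenance are ignored):

```python
def break_even_months(hardware_usd: float, seats: int, usd_per_seat_month: float) -> float:
    """Months of subscription fees that add up to the one-off hardware cost."""
    return hardware_usd / (seats * usd_per_seat_month)


for seats in (1, 25):
    months = break_even_months(5_000, seats, 20)
    print(f"{seats} seat(s): {months:.0f} months ({months / 12:.1f} years)")
```

For a single user the hardware takes about two decades to pay for itself; for a 25-person team it takes under a year.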

mromanuk3y ago

Firmwarrior3y ago

dragonwriter3y ago

> Isn't openAI only charging something like a 20th of a penny per interaction right now?

sp3323y ago

That's the price per "token". A token is a word or part of a word - rule of thumb is four tokens per three words.
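
That rule of thumb is easy to turn into a rough estimator. A sketch only: real tokenizers vary by language and content, and the function name is made up.

```python
def estimate_tokens(word_count: int) -> int:
    """Estimate token count from word count using the 4-tokens-per-3-words rule."""
    # Integer ceiling division, so partial tokens round up.
    return -(-word_count * 4 // 3)


print(estimate_tokens(750))
```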

cjohnson3183y ago· 6 in thread

> It seems these godlike AIs are already frighteningly cheap and easy to replicate.

"godlike"? Really? I'm not religious, but this seems like an overreaction for something that has no agency.

crazygringo3y ago

If it's a shorthand for omniscience then I can see how it makes sense. A bit hyperbolic though for sure.

doctor_eval3y ago

How do you know agency is not simply the output of a large language model encoded in neurons? What is the difference between neuronal and digital weights?

cjohnson3183y ago

I'm using agency in this context to mean: (1) a strong desire to achieve one or more clear goals, beyond survival, and (2) taking concrete steps to achieve those goals.

> How do you know agency is not simply the output of a large language model encoded in neurons?

> What is the difference between neuronal and digital weights?

> I think the safest assumption is that these models do indeed have agency, or at least the capability of agency.

I think this is an overly generous assumption.

1 more reply

ComposedPattern3y ago

Gods don't have persistent bodies or brains. They're spirits.

Jcowell3y ago

What if creation was a result of a lucky happen-by-chance hallucination ?

cjohnson3183y ago

Sure! Why not? Quantum foam imagining quantum foam. I love it. I still wouldn't consider an LLM god-like. I mean, if it eats its own son, then maybe. (Titan of a reference there.)

starik363y ago· 6 in thread

From the article: Pre-trained on a trillion "tokens"...

Doesn't 7B indicate that it was trained on 7 billion tokens? Or am I misunderstanding the nomenclature?

dragonwriter3y ago

> Doesn't 7B indicate that it was trained on 7 billion tokens?

No, 7B means it has 7 billion parameters.
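
To show where a number like that comes from, here is a sketch that counts the weights of a LLaMA-style transformer from its layer sizes. The dimensions below are assumptions taken from the published LLaMA 7B configuration as I understand it (they are not stated anywhere in this thread), and the function is illustrative.

```python
def count_params(vocab: int, dim: int, n_layers: int, ffn_hidden: int) -> int:
    """Count weights in a LLaMA-style decoder (no biases, untied embeddings)."""
    embeddings = vocab * dim          # token embedding table
    output_head = vocab * dim         # final projection back to the vocabulary
    attention = 4 * dim * dim         # query, key, value and output projections
    feed_forward = 3 * dim * ffn_hidden  # gate, up and down projections
    norms = 2 * dim                   # two normalization weight vectors per layer
    per_layer = attention + feed_forward + norms
    return embeddings + output_head + n_layers * per_layer + dim  # + final norm


# Assumed 7B configuration: 32,000-token vocabulary, width 4096, 32 layers.
total = count_params(vocab=32_000, dim=4096, n_layers=32, ffn_hidden=11_008)
print(f"{total:,} parameters (~{total / 1e9:.2f}B)")
```

Each of those numbers is a learned weight, so "7B" describes the size of the model itself, not the amount of text it was trained on.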

starik363y ago

And what does a parameter mean in this context?

1 more reply

superkuh3y ago

The emerging consensus for larger LLMs is that you want to train them with at least 2-4x as many tokens as parameters (weights between neurons in the layers). A trillion (over 100x) surprises me.
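
The ratio is a one-line calculation, taking "a trillion tokens" and "7B parameters" at face value as round numbers:

```python
def tokens_per_parameter(train_tokens: float, n_params: float) -> float:
    """How many training tokens the model saw per weight."""
    return train_tokens / n_params


ratio = tokens_per_parameter(1e12, 7e9)
print(f"{ratio:.0f} tokens per parameter")
```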

sitic3y ago

sebzim45003y ago

instance3y ago

7B is the number of parameters of the model.

twblalock3y ago· 6 in thread

This is why it's not possible to slow down or "stop" AI: once the problems are solved the solutions turn out to be trivial to replicate. All it takes is compute.

jeron3y ago

You say "all it takes is compute" as if that were trivial. ChatGPT would have a hard time without Microsoft's support via Azure.

twblalock3y ago

There are lots of places to get compute, including Chinese cloud providers...

The genie really is out of the bottle now.

zamnos3y ago

2 more replies

crooked-v3y ago

2 more replies

mLuby3y ago

Governments have experience limiting the spread of digital content. For now at least, AI proliferation is not immune to those same tactics.

twblalock3y ago

Governments are really bad at limiting the spread of digital content.

1 more reply

doctoboggan3y ago· 3 in thread

Reading through the README and issues on the llama.cpp project, there is some speculation that there is a bug in the quantization, or possibly a bug in the inference (less likely I think).

f_devd3y ago

LLaMA hasn't been fine-tuned with RLHF, so it requires additional prompting. Check out the Open-Assistant[0] project for an open-source ChatGPT equivalent (WIP).

[0]: https://github.com/LAION-AI/Open-Assistant

satvikpendem3y ago

Use it here: https://huggingface.co/spaces/olivierdehaene/chat-llm-stream...

1 more reply

simonw3y ago

This is why Alpaca is a big deal: it shows what LLaMA can do after it's been fine-tuned to follow instructions like ChatGPT has.

1 more reply

raydiatian3y ago· 3 in thread

> It seems these godlike AIs are already frighteningly cheap and easy to replicate.

Who writes this shit?

meh88813y ago

Irreplaceable humans

B1FF_PSUVM3y ago

Hard to say ...

dr_kiszonka3y ago

Thanks for a genuinely funny comment!

EGreg3y ago· 3 in thread

I warned about this for years. Finally an article gets it right.

Everyone will soon have the equivalent of online nuclear weapons: bot swarms that infiltrate every forum, including this one.

tantalor3y ago

Spam has existed on the internet for a long time.

EGreg3y ago

This is different. It can act just like humans do; most people who skim comments won't be able to tell the difference.

Note this was in 2020: https://www.technologyreview.com/2020/10/08/1009845/a-gpt-3-...

And here's 4chan bot: https://www.youtube.com/watch?v=efPrtcLdcdM

I can tell you that HN is probably already being infiltrated as well.

1 more reply

B1FF_PSUVM3y ago

My pet theory was that AI would come out of spam bots.

Close enough.

braingenious3y ago· 2 in thread

It does sometimes create some really funny hallucinations though, like inventing prefectures in Japan that don’t exist etc.

DustinBrett3y ago

I also got that text about the married couple and their money issues. Alpaca didn't impress me at all so far.

disgruntledphd23y ago

Alpaca wasn't great. The 13b and 30b models are much better, but just for sentence completion.

Personally, I think that the RLHF does make a big difference but maybe it's a bug in the quantization code as suggested up thread.

1 more reply

satvikpendem3y ago· 2 in thread

I've found OA to be better than Alpaca but I'll wait until the 65B 3-bit quantization efforts for Alpaca are underway to compare them.

Zuiii3y ago

> Alpaca is cool but it's also not technically allowed by OpenAI's TOS, and LLaMA is certainly not allowed to be used for non-commercial purposes.

satvikpendem3y ago

1 more reply

jakedata3y ago· 2 in thread

I don't remember many books where this ends particularly well. Perhaps the Culture universe could be a survivable outcome. Hopefully we don't get Berserkers first.

nuclearsugar3y ago

Stable Diffusion trains StyleGAN2 - https://www.jasonfletcher.info/vjloops/

doctor_eval3y ago

It’s the beginning of the AI singularity. It’s not that it’s bad, we just can’t see anything beyond the event horizon.

freediver3y ago· 1 in thread

The incredible contribution of Alpaca is showing the world how to efficiently train an LLM on instructions. The fact that it did so on 52k instructions generated by GPT is poetic.

It does not matter what current capabilities of open source models are, because this opens the door to tremendous democratization of the ability to train and self-deploy these models.

In less than 6 months we will have open source models with gpt3-like capabilities, running locally on laptops, and potentially in phones and web browsers.

permo-w3y ago

if we’re all still alive by then

Waterluvian3y ago· 1 in thread

If you use consciousness as a baseline, the intellectual difference between a grade schooler and a PhD is tiny.

est313y ago

There is a blog post that drives this point home with very good illustrations:

https://waitbutwhy.com/2015/01/artificial-intelligence-revol...

Basically, at the point where we have "almost human" level AI, it won't take much to get AI that's beyond human capabilities.

gaogao3y ago· 1 in thread

Has anyone tried this yet on the 65B version? I'm curious if it knows how to rhyme and shows other emergent behavior, as alpaca-7B does not.

starik363y ago

https://i.imgur.com/Nl3xLEg.png

earthboundkid3y ago· 1 in thread

I bet you could “exfiltrate” an LLM relatively cheaply by using LLM A to generate training data for LLM B.

oezi3y ago

No way. The cost for generating the tokens is way too high.

UncleOxidant3y ago· 1 in thread

Is it accurate to say they were trained for less than $600? Wouldn't that just be the finetuning that was done to the already existing LLaMA parameters which likely cost way more than $600 to train?

simonw3y ago
