Stable Code 3B: Coding on the Edge

(stability.ai)

315 points · egnehots · 2y ago · 140 comments

JCM9 · 2y ago · 17 in thread

Don’t entirely understand Stability’s business model. They’ve been putting out a lot of models recently and Stable Diffusion was novel at the time, but now their models consistently seem to be somewhat second rate compared to other things out there. For example Midjourney now seems to have far surpassed them in the image generation front. After raising a ton of funding Stability seems to just be throwing a bunch of stuff out there that’s OK but no longer ground breaking. What am I missing?

Many other startups in the space will likely face similar issues given the rapid commoditization of these models and the underlying tech. It’s very easy to spend a fortune building a model that offers a short-lived incremental improvement at best before one can just quickly swap it out for something else someone else paid to train.

emadm · 2y ago

Business model is bundling so you have a one stop shop for good quality models of every modality and cultural variants of them.

These go on bedrock, on chip, on prem etc and our consulting partners take them to the end user.

On the innovation side stable diffusion turbo does like 100 cats with hats per second and the video model outperforms runway, pika etc on blind tests.

Stable Audio was one of the TIME innovation of the year winners on music and we released a SOTA 3D model.

Stable LM zephyr is the best 3b chat model and works great on a MacBook Air.

Most of the pixels in the world will be generated, so fast, high-quality image/video models are the core and these other models are there to support them.

It’s really hard to build good solid models and we are the only company that can build a model of any type for anyone.

refulgentis · 2y ago

Vouch, I finally tried Stable LM 3b zephyr today and I'm stunned this slipped by. It's the only model I've tried that's not Mistral 7B that can do RAG. And it can run on ~any consumer-grade hardware released in the last 3 years. I'm literally stunned it's been sitting out since December 8th. I've heard 10x more about Phi-2 than it, and I'm not sure why.

(Official ONNX version, please!! Then you get Transformers.js / web / I can deploy on every platform from Web to iOS to Windows)

re: art, Dalle-3 costs significantly more. XL costs are 1/5th of what they were at launch, 0.0002/image versus Dalle-3's 0.04. And you'd be surprised how often people are happy with XL -- Dalle-3's marginal advantage is mostly text, especially with the excessive filtering of stylistic stuff, and forced prompt rewrites
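
For scale, the two per-image prices quoted above differ by a factor of about 200. A quick sketch of the arithmetic, assuming both figures are US dollars per image (the comment gives no unit) and using a hypothetical batch of 10,000 images:

```python
# Per-image prices as quoted in the comment; the USD unit is an assumption.
SDXL_PRICE = 0.0002
DALLE3_PRICE = 0.04


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per image."""
    return expensive / cheap


def batch_cost(price_per_image: float, n_images: int) -> float:
    """Total cost of generating n_images at a flat per-image price."""
    return price_per_image * n_images


ratio = cost_ratio(DALLE3_PRICE, SDXL_PRICE)    # roughly 200x
sdxl_batch = batch_cost(SDXL_PRICE, 10_000)     # hypothetical batch size
dalle_batch = batch_cost(DALLE3_PRICE, 10_000)
print(f"ratio ~{ratio:.0f}x, SDXL ~${sdxl_batch:.2f}, Dalle-3 ~${dalle_batch:.2f}")
```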

doctorpangloss · 2y ago

I use Stable Diffusion family models for innovative art products.

On a small scale, you have to professionalize ComfyUI’s development. My PR to make it installable and to make a plugin ecosystem that makes sense should not be sitting unmerged (https://github.com/comfyanonymous/ComfyUI/pull/298).

On a medium scale, CLIP is holding you back. I would eagerly buy a 48GB card to accommodate a batch size 1, gradient checkpointed LoRA-trainable model with T5 for conditioning. I want PixArt-a or DeepFloyd/IF with the SDXL dataset and training. I get I can achieve so much with SDXL on 24GB, including just barely a fine tuning, I understand the engineering decisions here, but it’s too weak on prompts.

On a large scale, I’m willing to spend a little money up front. In those conditions you can be far more innovative, you don’t have to make everything for $0. Shane Carruth didn’t make Primer for $0. I’m sure you’ve seen this movie, you get how astoundingly good it is. But he still spent something. He spent only slightly more than an RTX 6000 Ada.

Innovators have budgets. It’s still worth releasing the most powerful possible model for expensive hardware, this is why everyone is talking about Mixtral, but it’s especially true of visual art.

tlrobinson · 2y ago

(parent commenter is founder/CEO of Stability AI, Emad Mostaque, I assume)

DreamGen · 2y ago

> Stable LM zephyr is the best 3b chat model

By what measure? Phi 2 seems better as far as I can tell from benchmarks and usage and has much more permissive license.

rsynnott · 2y ago

> On the innovation side stable diffusion turbo does like 100 cats with hats per second

2028: Energy use on hat-cat generation exceeds energy use on bitcoin.

amne · 2y ago

"100 cats with hats per second"

AI has peaked

washadjeffmad · 2y ago

Midjourney is decidedly underwhelming if you've spent any time using the expansive tooling and control nets of Stable Diffusion. Yes, it's easy to get impressive first gens with MJ, but all of the coolest work and integration happening is using SD.

capybara_2020 · 2y ago

It depends. For great looking pics that you need to get out quickly MJ does a great job. Especially with its image + text feature. Dalle is also an interesting choice.

SDXL and controlnet is odd a lot of the time. 1.5 + controlnet still seems to give quicker and better results.

Basically SD at least seems to be for when you want unique content. MJ/Dalle for everything else.

satvikpendem · 2y ago

> For example Midjourney now seems to have far surpassed them in the image generation front

Nope. Stable Diffusion with alternative models offers far more customization and control than Midjourney. Midjourney is good for beginners but sucks for experts.

teaearlgraycold · 2y ago

We use SD at work because we need more control over the image generation pipeline (and to a lesser extent don’t want extra latency from web APIs).

Believe it or not, generating a full image from a prompt is a small slice of the image generation pie. Highly tuned in-painting is key to a number of budding startups.

smusamashah · 2y ago

Midjourney has better quality but does not offer any control. The community has done and is still doing a lot with SD models because they can be played and tinkered with in any way anyone wants to.

liuliu · 2y ago

> one can just quickly swap it out for something else someone else paid to train.

That doesn't seem to be the case. There are very limited open-source models outside of the small-LLM bubble.

JCM9 · 2y ago

The open space on small models is a whole other developing angle, but I was referring to the general commoditization of a lot of these models. With rare exception, after launch it seems the lifespan of any of these models is rather limited. From a business standpoint that sort of scenario is generally very unattractive, and thus I was trying to understand if they have some other angle they’re trying to play here to make a viable business out of this. Or the business model can just be to get acquired before that matters and let that be someone else’s problem to figure out.

whywhywhywhy · 2y ago

>Stable Diffusion was novel at the time, but now their models consistently seem to be somewhat second rate compared to other things out there. For example Midjourney now seems to have far surpassed them in the image generation front.

This isn’t entirely true: after the fumble that was SD2 they shipped SDXL and SDXL Turbo, which are both excellent. And in real-world results Midjourney doesn’t just straight-up outperform them; it’s a lot more complex, and ultimately SDXL is the more powerful tool.

Definitely found the LLMs underwhelming and Stable Audio’s launch was poor, but I don’t think Midjourney has outright surpassed them on image gen.

wincy · 2y ago

For what I’d call “art” or at least artsy works, or anything I want to iterate on (using inpainting and redraws), I use Stable Diffusion, but if I just want to send some dumb silly picture to my friend I’ve found myself using DALL-E almost every single day. It’s just so easy and in 4 images it’ll almost always get pretty close to what I’m describing. I’m constantly sending my friends dumb pictures because it’s really funny and gets a laugh out of people.

That said it was super cool the time I trained a model on my friend’s selfies and made her into her D&D character. She was super excited about it, made me feel like a real life wizard.

hospitalJail · 2y ago

>Midjourney now seems to have far surpassed them in the image generation front.

What? Have you actually used either? MJ is just an ultra-fine-tuned model with a few layers to prevent stuff from looking bad. Stable Diffusion has their own 'single shot' version, maybe someone remembers it, I played with it for 1-2 hours. Everything looks great, but I want hyper specific stuff in my art and I'm never getting that with 1 shots.

Heck, I did a few flyers and used some icons I made with img2img + inpainting + controlnet. The work is completely stunning and scalable. That is never happening even at an individual level with MJ.

tarruda · 2y ago · 13 in thread

Note that they don't compare with deepseek coder 6.7b, which is vastly superior to much bigger coding models. Surpassing codellama 7b is not that big of a deal today.

The most impressive thing about these results is how good the 1.3B deepseek coder is.

jyap · 2y ago

Deepseek Coder Instruct 6.7b has been my local LLM (M1 series MBP) for a while now and that was my first thought… They selectively chose benchmark results to look impressive (which is typical).

I tested out StableLM Zephyr 3B when that came out and it was extremely underwhelming/unusable.

Based on this, Stable Code 3B doesn’t look to be worth trying out. Guessing if they could put out a 7B model which beat Deepseek Coder 6.7B they would have.

a_wild_dandan · 2y ago

Do you know how Deepseek 33b compares to 6.7b? I'm trying 33b on my (96GB) MacBook just because I have plenty of spare (V)RAM. But I'll run the smaller model if the benefits are marginal in other people's experience.

zwarag · 2y ago

Do you use it inside vscode or how do you integrate an LLM into your IDE?

SubiculumCode · 2y ago

How do you make use of it? Do you have it integrated directly into an ide?

sroussey · 2y ago

What do you use it for?

danielbln · 2y ago

Deepseek-coder-6.7B really is a quite surprisingly capable model. It's easy to give it a spin with ollama via `ollama run deepseek-coder:6.7b`.
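
Beyond the interactive `ollama run` session, a locally running Ollama server can also be called from a script. The sketch below assumes Ollama is serving on its default local port (11434) and that the `deepseek-coder:6.7b` model has already been pulled; the helper names `build_request` and `generate` are made up for illustration.

```python
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server listens elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "deepseek-coder:6.7b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "deepseek-coder:6.7b") -> str:
    """Send the prompt and return the generated text (requires a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

With the server running, `generate("Write a Python function that reverses a string.")` should return the model's answer as a plain string.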

triyambakam · 2y ago

Thanks for the tip with ollama

eyegor · 2y ago

The 1.3b model is amazing for real time code complete, it's fast enough to be a better intellisense.

Another model you should try is magicoder 6.7b ds (based on deepseek coder). After playing with it for a couple weeks, I think it gives slightly better results than the equivalent deepseek model.

Repo https://github.com/ise-uiuc/magicoder

Models https://huggingface.co/models?search=Magicoder-s-ds

hskalin · 2y ago

How do you use these models with your editor? (E.g. VS Code or Emacs etc.)

varjag · 2y ago

Not sure what you guys are doing with it but even at 33B it's laughably dumb compared to Mixtral.

a_wild_dandan · 2y ago

This is phenomenal. And runs fast! The 33b version might be my MacBook's new coding daily driver.

tarruda · 2y ago

4-bit quantized 33b runs great on a MacBook Pro with an M3 Max chip

unshavedyak · 2y ago

How are you using it? I need to find some sane way to use this stuff from Helix/terminal..

knicholes · 2y ago · 8 in thread

I've got a machine with 4 3090s-- Anyone know which model would perform the best for programming? It's great this can run on a machine w/out a graphics card and is only 3B params, but I have the hardware. Might as well use it.

mistercheph · 2y ago

Try Mixtral 8x7b, which some human evals place above gpt-3.5. You also have enough VRAM and compute to make training a LoRA, either on your own dataset or on one of the freely available datasets on huggingface, worthwhile, or at least interesting.

tarruda · 2y ago

AFAIK deepseek coder family are the best open coding models.

I haven't tested, but I think deepseek coder 33b can run in a single RTX 3090 when 4-bit quantized. In your case you might be able to run the non quantized version
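
A back-of-the-envelope check of that estimate, as a rough sketch: it counts only the bytes needed to hold the weights and ignores the KV cache, activations and runtime overhead, so real usage will be higher. The 24 GB figure is the RTX 3090's VRAM; the function names are made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


def fits(weights_gb: float, vram_gb: float) -> bool:
    """True if the weights alone fit; leaves no margin for cache or activations."""
    return weights_gb <= vram_gb


RTX_3090_VRAM_GB = 24

q4 = weight_memory_gb(33, 4)      # 16.5 GB for a 4-bit 33B model
fp16 = weight_memory_gb(33, 16)   # 66.0 GB for the same model in 16-bit

print(q4, fits(q4, RTX_3090_VRAM_GB))            # weights fit on one card
print(fp16, fits(fp16, 4 * RTX_3090_VRAM_GB))    # weights fit across four cards
```

By the same rough measure, a 16-bit 33B model does not fit on a single 24 GB card, which is consistent with suggesting the unquantized version only for the four-card machine.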

Havoc · 2y ago

The coding models are all small because speed is crucial. If you need to wait 2 seconds for an autocomplete it becomes near useless.

SushiHippie · 2y ago

Here is a leader board of some models

https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...

Don't know how biased this leaderboard is, but I guess you could just give some of them a try and see for yourself.

SparkyMcUnicorn · 2y ago

This is a much better leaderboard: https://evalplus.github.io/leaderboard.html

I've seen the CanAiCode leaderboard several times (and used many of the models listed), but I wouldn't use it to pick a model. It's not a bad list, but the benchmark is too limited. The results are not accurately ranked from best to worst.

For example the deepseek 33b model is ranked 5 spots lower than the 6.7b model, but the 33b model is definitely better. WizardCoder 15b is near the top while WizardCoder 33b is ranked 26 spots lower, which is a wildly inaccurate ranking.

It's worth noting that those 33b models score in the 70s for HumanEval and HumanEval+ while the 15b model scores in the 50s.

mlboss · 2y ago

Did you build a machine with 4x 3090? I'm looking for a way to build such a machine for ML training.

knicholes · 2y ago

I did! I started by going to vast.ai. I was able to look at the specs of the top-scoring machines. I started with the motherboard (as I knew it could support my 3090s, because some PCIe busses can't handle all that data). Then of course I copied everything else that I could. I ended up using PCIe extenders and zip-tieing (plastic, I should use metal zip ties instead) the cards to a rack I got from Lowes. I'm not too pleased with how it looks, but it works!

BTW, depending on where you're at in your ML journey, Jeremy Howard from FastAI says you should focus more on using hosted instances like paperspace until you really need to get your own machine. Unless, of course, you enjoy linux sysadmin tasks. :) It can get really annoying trying to match the right version of CUDA with the version of Pytorch you're trying to get running for the newest model you're trying.

mesmertech · 2y ago

Wondering the same here

keyle · 2y ago · 6 in thread

That is fantastic. I'm building a small macOS SwiftUI client with llama cpp built in, no server-client model, and it's already so useful with models like openhermes chat 7B, and fast.

If this opens it to smaller laptops, wow!

We truly live in crazy times. The rate of improvement in this field is off the walls.

joshmarlow · 2y ago

Not sure if this is where your head is, but I think there's a lot of value in integrating LLMs directly into complex software. Jira, Salesforce, maybe K8s - should all have an integrated LLM that can walk you through how to perform a nuanced task in the software.

manmal · 2y ago

Imagine good error messages, with hints for mitigation and maybe smart retry w/ mitigations applied.

dpacmittal · 2y ago

Why would the LLM walk you through and not just do the nuanced task on its own?

debarshri · 2y ago

A walkthrough is generally performed once, or not very frequently. It would be a bad investment if you used it for just this use case.

emadm · 2y ago

3b is good for 8gb MacBook Air etc. 7b is slightly too big.

Sure these will continue to improve, phi2 is a good base as well

turnsout · 2y ago

That sounds awesome! Can you share any details about how you're working with llama cpp? Is it just via the Swift <> C bridge? I've toyed with the idea of doing this, and wonder if you have any pointers before I get started.

lfkdev · 2y ago · 6 in thread

How is this compared to the current GitHub Copilot?

brianjking · 2y ago

A 3B tiny model is not going to compare to GitHub copilot. However, there are plenty of nice 7B models that are excellent at code and I encourage you to try them out.

londons_explore · 2y ago

If you just want to get stuff done, use the best tools like a Milwaukee Drill - and right now, that's copilot/gpt-4.

If you don't want to be tied to a company and like opensource, feel free to connect a toy motor to an AA battery to drill your holes... Or to use Llama/Stable Code 3B.

irthomasthomas · 2y ago

OpenAI just invisibly dropped my API requests to a lower model with a 4k context limit. And my commit scripts started failing for being over the context limit. It's buried in the docs somewhere that low tier api users will be served on lower models during peak times.

So, I guess they're like a Milwaukee Drill that will sometimes refuse to work unless you buy more drill credits.

bearjaws · 2y ago

You clearly have never used these other tools. Mixtral / Deepseek perform very well on coding challenges. I've used them against local code without issues, sometimes they are a bit optimistic and produce too much, but that's far better than producing too little (like GPT4 does).

yreg · 2y ago

A self-hosted solution is a common requirement for security reasons.

mistercheph · 2y ago

it’s going to be real hard to pry the carburetors out of this guy’s cold dead hands!

akulbe · 2y ago · 5 in thread

I just tried this model with Koboldcpp on my LLM box. I got gibberish back.

My prompt - "please show me how to write a web scraper in Python"

The response?

> I've written my first ever python script about 5 months ago and I really don't remember anything except for the fact that I used Selenium in order to scrape websites (in this case, Google). So you can probably just copy/paste all of these lines from your own Python code which contains logic to determine what value should be returned when called by another piece of software or program.

SushiHippie · 2y ago

It's very likely a "completion model" and not instruct/chat fine-tuned.

So you'd need to prompt it through comments or by starting with a function name, basically the same as one would prompt GitHub copilot.

e.g.

  # the following code implements a webscraper in python
  class WebScraper:

(I didn't try this, and I'm not good at prompting, but something along the lines of this example should yield better results)
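
The same idea as a small helper, offered as an untested-against-the-model sketch: it turns a natural-language task into the start of a source file (a comment plus a code stub) that a completion model can continue. The function name `completion_prompt` is made up for illustration, and nothing here is specific to Stable Code 3B.

```python
def completion_prompt(task: str, stub: str) -> str:
    """Frame a task as the beginning of a source file for a base (non-instruct) model."""
    # Each line of the task description becomes a Python comment.
    comment = "\n".join(f"# {line}" for line in task.strip().splitlines())
    # The stub (a class or function header) is what the model should continue from.
    return f"{comment}\n{stub.rstrip()}\n"


prompt = completion_prompt(
    "the following code implements a webscraper in python",
    "class WebScraper:",
)
print(prompt)
```

The resulting string is what would be sent as the raw prompt, in place of a chat-style request such as "please show me how to write a web scraper in Python".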

Tiberium · 2y ago

But it's a code completion model, not a chat/instruct one.

MrNeon · 2y ago

It is weird that it is not mentioned in the model card but I'm pretty sure it is a completion model, not tuned as an instruct model.

edit: the webpage does call it "Stable Code Completion"

endofreach · 2y ago

This doesn't seem like gibberish though?

connorgutman · 2y ago

Same thing with Ollama.

kleiba · 2y ago · 5 in thread

It's quite amazing - I often find that I read quite positive comments towards LLM tools for coding. Yet, an "Ask HN" I posted a while ago (and which admittedly didn't gain much traction) seemed to mirror mostly negative/pessimistic responses.

https://news.ycombinator.com/item?id=38803836

Was it just that my submission didn't find enough / more balanced commenters?

yorwba · 2y ago

You got two positive and two negative responses. You replied only to the negative responses. Now you think that the responses were mostly negative. I blame salience bias.

Anyways, there's also a difference between "are you excited about this new thing becoming available" and "now that you've used it, do you like the experience". The former is more likely to feature rosy expectations and the latter bitter disappointment. (Though it could also be the other way around, with people dismissing it at first and then discovering that it's kind of nice actually.)

weebull · 2y ago

If somebody can show me a coding task that LLMs have successfully done that isn't an interview question or a documentation snippet, I might start to value it.

Spending a huge amount of resources to be a bit better at autocompleting code doesn't have value to me. I want it to solve significant problems, and it's looking like it can't do it and scaling it to be able to is totally impractical.

> In aggregate, training all 9 Code Llama models required 400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W).

That is:

* 45⅔ GPU years
* 160 MWh

or...

* 45 average UK homes' annual electricity consumption
* 18 average US homes
* 64 average drivers' annual mileage in an EV.

...and that's just the GPUs. Add on all the rest of the system(s).
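
The first two figures can be reproduced from the quoted numbers; a minimal sketch, assuming the upper TDP of 400 W and a 365-day year. The per-home values are only what the comment's own comparisons imply, not independently sourced averages.

```python
GPU_HOURS = 400_000        # from the quoted Code Llama figure
TDP_WATTS = 400            # upper end of the quoted 350-400 W range
HOURS_PER_YEAR = 24 * 365

gpu_years = GPU_HOURS / HOURS_PER_YEAR       # about 45.7 GPU years
energy_mwh = GPU_HOURS * TDP_WATTS / 1e6     # watt-hours -> megawatt-hours

# Annual consumption per home implied by "45 UK homes" and "18 US homes".
implied_uk_home_mwh = energy_mwh / 45
implied_us_home_mwh = energy_mwh / 18

print(f"{gpu_years:.2f} GPU years, {energy_mwh:.0f} MWh")
print(f"implied per home: UK {implied_uk_home_mwh:.2f} MWh, US {implied_us_home_mwh:.2f} MWh")
```

At the lower 350 W bound the same calculation gives 140 MWh, so the 160 MWh figure is the high end of the range.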

regularfry · 2y ago

In the grand scheme of things it's ancient history, but https://code-as-policies.github.io/ works by generating code then executing it. That's worth running at. The code generation in that paper was done on code-davinci-002, which is (or rather was - it's deprecated) a 15B GPT-3 model. I've not done it yet, but I'd expect the open source 7B code completion models to be able to replicate it by now.

Havoc · 2y ago

The precise wording matters.

"How has it changed your work life" leads people down the rabbit hole of "will coding jobs be safe".

This one is a lot more neutral/technical.

simonw · 2y ago

You only got comments from six people so yeah, definitely not representative.

herval · 2y ago · 4 in thread

Can anyone explain what’s Stability’s business model (or plan for one)?

I get why Meta releases tons of models, but still can’t quite understand what stability is trying to achieve

sangnoir · 2y ago

Seems like the standard open-core playbook:

> This model is included in our new Stability AI Membership. Visit our Membership page to take advantage of our commercial Core Model offerings, including SDXL Turbo & Stable Video Diffusion.

A hypothetical Stable Code 13B/70B could be hosted only, with more languages or specialized use-cases (Stable Code 3B iOS-Swift-Turbo)

emadm · 2y ago

Membership with upsell to support, custom models and more

Plus licensed variant models like stable audio and on chip installation like arm for specialist models eg Japanese law or Indonesian accounting

seydor · 2y ago

to be bought by meta

bogwog · 2y ago

This is all an elaborate mating ritual

swyx · 2y ago · 3 in thread

> License: Other

> Commercial Applications

> This model is included in our new Stability AI Membership. Visit our Membership page to take advantage of our commercial Core Model offerings, including SDXL Turbo & Stable Video Diffusion.

what exactly is the license lol. can people use this or is this "see dont touch"

neurostimulant · 2y ago

It's free for noncommercial use. If you use it in your company, your company should pay the membership fee. afaik most openai competitors also use similar usage restriction (e.g. free for noncommercial or research use, contact us for commercial license).

blagie · 2y ago

This basically means "Get sued."

There is no clear legal definition of "noncommercial," and courts have gone all sorts of different ways on what constitutes commercial use.

This is where CC NC licenses imploded. A lot of places (hello, MIT!) intentionally use CC NC licenses to make things appear more open than they are.

jillesvangurp · 2y ago

There are a growing number of open source options out there. I was playing with Simon Willison's excellent llm cli tool this morning and tried out some models from the gpt4all project. Some of the better ones come from companies like Mistral, which release their models under the Apache license.

Gpt4all has a UI as well that you can use with models running locally on your laptop.

jjtheblunt · 2y ago · 3 in thread

Jargon naivete question: doesn't "on the edge" normally imply the server side, with minimal router hops to the client, rather than the client side?

devindotcom · 2y ago

afaik "edge" nearly always means taking place on the device a user is interacting with. no server involved except perhaps as authentication etc. but there is probably some other situation where "edge" could mean local infra or caching.

WatchDog · 2y ago

I think the etymology of “edge computing” is derived from “network edge”, ie the outer shell of some network/autonomous system.

The closest point within your control that interfaces with devices outside of your control.

Seeing the term get used to describe client devices themselves kinda muddies the terminology.

xer0x · 2y ago

+1 yes, for a service using network caching like using Cloudflare. I would've referred to their CDN as the Edge of our network.

alwinaugustin · 2y ago · 3 in thread

I've been experimenting with code-llama extensively on my laptop, and from my experience, it seems that these models are still in their early stages. I primarily utilize them through a Web UI, where they can successfully refactor code given an existing snippet. However, it's worth noting that they cannot currently analyze entire codebases or packages, refining them based on the most suitable solutions using the most appropriate algorithms. While these models offer assistance to some extent, there is room for improvement in their ability to handle more complex and comprehensive coding scenarios.

danielmarkbruce · 2y ago

I think there is a decent chance SourceGraph will figure this all out. The most important thing at this point is figuring what context to feed. They can build up a nice graph of a codebase and I expect from there they can put in the best context and then boom.

They might also be able to train a model more intelligently by generating training data from said graphs.
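
A toy illustration of choosing context from a code graph. This is not Sourcegraph's implementation; it is a minimal sketch that assumes a Python codebase held in a hypothetical `codebase` dict (module name to source text) and treats "modules the target imports directly" as the context worth feeding to a model.

```python
import ast


def imported_modules(source: str) -> set[str]:
    """Return the top-level module names imported by a Python source string."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found


def context_for(target: str, codebase: dict[str, str]) -> list[str]:
    """Pick the in-repo modules that `target` imports directly, as candidate context."""
    deps = imported_modules(codebase[target])
    return sorted(name for name in deps if name in codebase and name != target)


# Hypothetical three-module codebase, used only for illustration.
codebase = {
    "app": "import db\nfrom utils import slugify\nimport json\n",
    "db": "import sqlite3\n",
    "utils": "def slugify(s):\n    return s.lower()\n",
}

print(context_for("app", codebase))  # in-repo dependencies only; stdlib imports are dropped
```

A real system would presumably follow call and reference edges as well as imports, and rank candidates to fit the model's context window.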

jdorfman · 2y ago

> I've been experimenting with code-llama extensively on my laptop

You can use/try code-llama with Cody https://sourcegraph.com/blog/cody-vscode-1.1.0-release#:~:te...

weebull · 2y ago

I'm honestly failing to see the utility for LLMs, because the context for any given problem is far too small, and we're already at 33B parameter models. They just don't seem to be a technology that scales to an interesting problem size.

artninja1988 · 2y ago · 3 in thread

Given the complete failure of the first stable lm, I'm interested to try this one out. Haven't really seen a small language model, except Mistral 7b, that's really useful for much.

I also hope stability comes out with a competitor to the new midjourney and dalle models! That's what put them on the map in the first place

emadm · 2y ago

We released a competitor to runway recently that beat it on blind tests, plus way faster image in sdxl turbo

We have been working on ComfyUI for the next step and new image models

Midjourney and others are pipelines versus models so we have a higher bar to jump but the og stable diffusion team are working hard!

tarruda · 2y ago

Deepseek coder 6.7B is very useful for coding and can run in consumer GPUs.

I use the 6bit GGUF quantized version on a laptop RTX 3070

brianjking · 2y ago

All of the Mistral versions have been excellent, including the OpenHermes versions. I encourage you to check out Phi-2 as well, it's the only 3b model I've found really quite interesting outside of Replit's code model built into Replit Core.

connorgutman · 2y ago · 2 in thread

FYI: This model is already available on Ollama.

mistrial9 · 2y ago

how do you check that ?

coder543 · 2y ago

https://ollama.ai/library?sort=newest

rahimnathwani · 2y ago · 1 in thread

How are people using codellama and this in their workflows?

I found one option: https://github.com/xNul/code-llama-for-vscode

But I'm guessing there are others, and they might differ in how they provide context to the model.

weebull · 2y ago

Ellama for Emacs looks promising, but I only tried to install it this morning.

mchiang · 2y ago · 1 in thread

It's amazing to see more smaller models being released. This creates opportunities for more developers to run it on their local computers, and makes it easier to fine-tune for specific needs.

brcmthrowaway · 2y ago

Has anyone tried starting with a smaller model, then RLing until it improves to the level of the bigger model?

sytelus · 2y ago · 1 in thread

Why did the authors not compare with Phi-2?

hcarlens · 2y ago

Agreed, and not only do they not compare their model to Phi-2 directly, the benchmarks they report don't overlap with the ones in the Phi-2 post[1], making it hard for a third party to compare without running benchmarks themselves.

(In turn, in the Phi-2 post they compare Phi-2 to Llama-2 instead of CodeLlama, making it even harder)

[1]: https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...

photon_collider · 2y ago · 1 in thread

How reliable are these benchmarks?

ilaksh · 2y ago

I think the trick is that they are just comparing to other tiny models.

None of the little models, including this one, are comparable to the performance of the larger models for any significant coding problem.

I think what these are useful for is mostly giving people hints inside of a code editor. Occasionally filling in the blank.

outcoldman · 2y ago

I was able to run this model in http://lmstudio.ai as well. Just remove Compatibility Guess in Filters, so you can see all the models. LM Studio can load it and run requests against it.

hospitalJail · 2y ago

Seems like they caught the Apple Marketing bug and are chasing things no one cares about. Great 3B model, everyone is already running 7B models over here.

Maybe one day when I need to do offline coding on my cellphone, it will be really useful.

alastairr · 2y ago

does anyone have recommendations for addins to integrate these 'smaller' llms into an IDE like VSCode? I'm pretty embedded with GH copilot, but curious to explore other options.

ihaag · 2y ago

Terrible model

j / k navigate · click thread line to collapse

140 comments

103 comments · 21 top-level

JCM92y ago· 17 in thread

emadm2y ago

Business model is bundling so you have a one stop shop for good quality models of every modality and cultural variants of them.

These go on bedrock, on chip, on prem etc and our consulting partners take them to the end user.

On the innovation side stable diffusion turbo does like 100 cats with hats per second and the video model outperforms runway, pika etc on blind tests.

Stable audio was one of the time innovation of the year winners on music and we released a sota 3d model.

Stable LM zephyr is the best 3b chat model works great on a MacBook Air.

Most of the pixels in the world will be generated so fast high quality image/video are the core and these other models are to support them.

It’s really hard to build good solid models and we are the only company that can build a model of any type for anyone.

refulgentis2y ago

(Official ONNX version, please!! Then you get Transformers.js / web / I can deploy on every platform from Web to iOS to Windows)

doctorpangloss2y ago

I use Stable Diffusion family models for innovative art products.

Innovators have budgets. It’s still worth releasing the most powerful possible model for expensive hardware, this is why everyone is talking about Mixtral, but it’s especially true of visual art.

1 more reply

tlrobinson2y ago

(parent commenter is founder/CEO of Stability AI, Emad Mostaque, I assume)

DreamGen2y ago

> Stable LM zephyr is the best 3b chat model

By what measure? Phi 2 seems better as far as I can tell from benchmarks and usage and has much more permissive license.

1 more reply

rsynnott2y ago

> On the innovation side stable diffusion turbo does like 100 cats with hats per second

2028: Energy use on hat-cat generation exceeds energy use on bitcoin.

1 more reply

amne2y ago

"100 cats with hats per second"

AI has peaked

washadjeffmad2y ago

capybara_20202y ago

It depends. For great looking pics that you need to get out quickly MJ does a great job. Especially with its image + text feature. Dalle is also an interesting choice.

SDXL and controlnet is odd a lot of the time. 1.5 + controlnet still seem to give quicker and better results.

Basically SD atleast seems to be for when you want unique content. MJ/Dalle for everything else.

1 more reply

satvikpendem2y ago

> For example Midjourney now seems to have far surpassed them in the image generation front

Nope. Stable Diffusion with alternative models offers far more customization and control than Midjourney. Midjourney is good for beginners but sucks for experts.

teaearlgraycold2y ago

We use SD at work because we need more control over the image generation pipeline (and to a lesser extent don’t want extra latency from web APIs).

Believe it or not, generating a full image from a prompt is a small slice of the image generation pie. Highly tuned in-painting is key to a number of budding startups.

smusamashah2y ago

Midjourney has better quality but does not offer any control. The community has done, and is still doing, a lot with SD models because they can be played and tinkered with in any way anyone wants.

liuliu2y ago

> one can just quickly swap it out for something else someone else paid to train.

That doesn't seem to be the case. There are very limited open-source models outside of the small-LLM bubble.

JCM92y ago

whywhywhywhy2y ago

Definitely found the LLMs underwhelming and Stable Audio’s launch was poor, but I don’t think Midjourney has outright surpassed them on image gen.

wincy2y ago

That said, it was super cool the time I trained a model on my friend's selfies and made her into her D&D character. She was super excited about it, and it made me feel like a real-life wizard.

hospitalJail2y ago

>Midjourney now seems to have far surpassed them in the image generation front.

Heck, I did a few flyers and used some icons I made with img2img + inpainting + controlnet. The work is completely stunning and scalable. That is never happening even at an individual level with MJ.

tarruda2y ago· 13 in thread

Note that they don't compare with deepseek coder 6.7b, which is vastly superior to much bigger coding models. Surpassing codellama 7b is not that big of a deal today.

The most impressive thing about these results is how good the 1.3B deepseek coder is.

jyap2y ago

Deepseek Coder Instruct 6.7b has been my local LLM (M1 series MBP) for a while now and that was my first thought… They selectively chose benchmark results to look impressive (which is typical).

I tested out StableLM Zephyr 3B when that came out and it was extremely underwhelming/unusable.

Based on this, Stable Code 3B doesn’t look to be worth trying out. Guessing if they could put out a 7B model which beat Deepseek Coder 6.7B they would have.

a_wild_dandan2y ago

zwarag2y ago

Do you use it inside vscode or how do you integrate an LLM into your IDE?

SubiculumCode2y ago

How do you make use of it? Do you have it integrated directly into an ide?

sroussey2y ago

What do you use it for?

danielbln2y ago

Deepseek-coder-6.7B really is a surprisingly capable model. It's easy to give it a spin with ollama via `ollama run deepseek-coder:6.7b`.
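
For anyone who wants to script this rather than use the interactive prompt, here is a minimal sketch. It assumes an Ollama server is running locally on its default port and exposes the `/api/generate` endpoint. The helper names `build_request` and `complete` are made up for illustration, and the model tag is simply the one from the command above.

```python
import json
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "deepseek-coder:6.7b") -> urllib.request.Request:
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str, model: str = "deepseek-coder:6.7b") -> str:
    """Send the prompt and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Calling `complete("def fib(n):")` should return the model's continuation as a string, provided the model has already been pulled. An editor plugin would do roughly the same thing with the text before the cursor as the prompt.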

triyambakam2y ago

Thanks for the tip with ollama

eyegor2y ago

The 1.3b model is amazing for real time code complete, it's fast enough to be a better intellisense.

Another model you should try is magicoder 6.7b ds (based on deepseek coder). After playing with it for a couple weeks, I think it gives slightly better results than the equivalent deepseek model.

Repo https://github.com/ise-uiuc/magicoder

Models https://huggingface.co/models?search=Magicoder-s-ds

hskalin2y ago

How do you use these models with your editor? (E.g. VS Code, Emacs, etc.)

varjag2y ago

Not sure what you guys are doing with it but even at 33B it's laughably dumb compared to Mixtral.

a_wild_dandan2y ago

This is phenomenal. And runs fast! The 33b version might be my MacBook's new coding daily driver.

tarruda2y ago

4-bit quantized 33b runs great on a MacBook Pro with an M3 Max chip

unshavedyak2y ago

How are you using it? I need to find some sane way to use this stuff from Helix/terminal..

knicholes2y ago· 8 in thread

mistercheph2y ago

tarruda2y ago

AFAIK deepseek coder family are the best open coding models.

I haven't tested, but I think deepseek coder 33b can run on a single RTX 3090 when 4-bit quantized. In your case you might be able to run the non-quantized version.
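
A rough back-of-the-envelope check of that claim, as a sketch only: it counts the weights alone and ignores the KV cache, activations, and runtime overhead, so real usage will be higher. The 24 GB figure is the RTX 3090's VRAM, and 16 bits per weight is assumed for the non-quantized case.

```python
def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9


RTX_3090_VRAM_GB = 24  # VRAM of a single RTX 3090

for bits in (4, 16):
    size = weights_gb(33, bits)
    fits = size < RTX_3090_VRAM_GB
    print(f"33B at {bits}-bit: ~{size:.1f} GB, fits on one 3090: {fits}")
# 33B at 4-bit: ~16.5 GB, fits on one 3090: True
# 33B at 16-bit: ~66.0 GB, fits on one 3090: False
```

So the 4-bit version leaves some headroom on one card, while the 16-bit weights would need to be split across several GPUs.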

Havoc2y ago

The coding models are all small because speed is crucial. If you need to wait 2 seconds for an autocomplete it becomes near useless.

SushiHippie2y ago

Here is a leader board of some models

https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...

Don't know how biased this leaderboard is, but I guess you could just give some of them a try and see for yourself.

SparkyMcUnicorn2y ago

This is a much better leaderboard: https://evalplus.github.io/leaderboard.html

It's worth noting that those 33b models score in the 70s for HumanEval and HumanEval+ while the 15b model scores in the 50s.

mlboss2y ago

Did you build a machine with 4x 3090? I'm looking for a way to build such a machine for ML training.

knicholes2y ago

mesmertech2y ago

Wondering the same here

keyle2y ago· 6 in thread

That is fantastic. I'm building a small macOS SwiftUI client with llama cpp built in, no server-client model, and it's already so useful with models like openhermes chat 7B, and fast.

If this opens it to smaller laptops, wow!

We truly live in crazy times. The rate of improvement in this field is off the charts.

joshmarlow2y ago

manmal2y ago

Imagine good error messages, with hints for mitigation and maybe smart retry w/ mitigations applied.

dpacmittal2y ago

Why would the LLM walk you through and not just do the nuanced task on its own?

debarshri2y ago

A walkthrough is generally performed once, or only infrequently. It would be a bad investment if you used it for just this one use case.

emadm2y ago

3b is good for 8gb MacBook Air etc. 7b is slightly too big.

Sure these will continue to improve, phi2 is a good base as well

turnsout2y ago

lfkdev2y ago· 6 in thread

How is this compared to the current GitHub Copilot?

brianjking2y ago

A 3B tiny model is not going to compare to GitHub copilot. However, there are plenty of nice 7B models that are excellent at code and I encourage you to try them out.

londons_explore2y ago

If you just want to get stuff done, use the best tools, like a Milwaukee Drill, and right now that's copilot/gpt-4.

If you don't want to be tied to a company and like opensource, feel free to connect a toy motor to an AA battery to drill your holes... Or to use Llama/Stable Code 3B.

irthomasthomas2y ago

So, I guess they're like a Milwaukee Drill that will sometimes refuse to work unless you buy more drill credits.

bearjaws2y ago

yreg2y ago

A self-hosted solution is a common requirement for security reasons.

mistercheph2y ago

it’s going to be real hard to pry the carburetors out of this guy’s cold dead hands!

akulbe2y ago· 5 in thread

I just tried this model with Koboldcpp on my LLM box. I got gibberish back.

My prompt - "please show me how to write a web scraper in Python"

The response?

SushiHippie2y ago

It's very likely a "completion model" and not instruct/chat fine-tuned.

So you'd need to prompt it through comments or by starting with a function name, basically the same as one would prompt GitHub copilot.

e.g.

  # the following code implements a webscraper in python
  class WebScraper:

(I didn't try this, and I'm not good at prompting, but something along the lines of this example should yield better results)
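
The same idea as a small helper, purely as a sketch: the function name `completion_prompt` is made up for illustration, and it only reshapes text without calling any model. It turns a chat-style request into a code prefix that a completion model can continue.

```python
def completion_prompt(task: str, opener: str) -> str:
    """Turn a chat-style request into a code prefix for a completion model.

    The task becomes a leading comment, and `opener` is the first line of
    code the model is expected to continue.
    """
    comment = "\n".join(f"# {line}" for line in task.strip().splitlines())
    return f"{comment}\n{opener.rstrip()}\n"


prompt = completion_prompt(
    "the following code implements a webscraper in python",
    "class WebScraper:",
)
print(prompt)
```

The resulting string ends right where the class body would begin, which is the kind of context a code completion model is trained to extend.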

Tiberium2y ago

But it's a code completion model, not a chat/instruct one.

MrNeon2y ago

It is weird that it is not mentioned in the model card but I'm pretty sure it is a completion model, not tuned as an instruct model.

edit: the webpage does call it "Stable Code Completion"

endofreach2y ago

This doesn't seem like gibberish though?

connorgutman2y ago

Same thing with Ollama.

kleiba2y ago· 5 in thread

https://news.ycombinator.com/item?id=38803836

Was it just that my submission didn't find enough / more balanced commenters?

yorwba2y ago

You got two positive and two negative responses. You replied only to the negative responses. Now you think that the responses were mostly negative. I blame salience bias.

weebull2y ago

If somebody can show me a coding task that LLMs have successfully done that isn't an interview question or a documentation snippet, I might start to value it.

> In aggregate, training all 9 Code Llama models required 400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W).

That is:

* 45⅔ GPU years
* 160 MWh

or...

* 45 average UK homes' annual electricity consumption
* 18 average US homes
* 64 average drivers' annual mileage in an EV.

...and that's just the GPUs. Add on all the rest of the system(s).
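
The first two figures can be reproduced from the quoted numbers. This is a sketch that assumes every GPU draws the upper-end 400 W TDP for the whole run. The last three values are only what the comparisons above imply per home or driver, not independently sourced statistics.

```python
GPU_HOURS = 400_000        # from the quoted Code Llama figure
TDP_KW = 0.4               # upper end of the quoted 350-400 W TDP
HOURS_PER_YEAR = 24 * 365

gpu_years = GPU_HOURS / HOURS_PER_YEAR
energy_mwh = GPU_HOURS * TDP_KW / 1000

print(f"{gpu_years:.2f} GPU years")  # 45.66 GPU years
print(f"{energy_mwh:.0f} MWh")       # 160 MWh

# Per-unit annual consumption implied by the comparisons (MWh per year).
for label, count in (("UK home", 45), ("US home", 18), ("EV driver", 64)):
    print(f"implied {label}: {energy_mwh / count:.1f} MWh/year")
```

At the lower 350 W bound the total would be 140 MWh instead.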

regularfry2y ago

Havoc2y ago

The precise wording matters.

"How has it changed your work life" leads people down the rabbit hole of "will coding jobs be safe".

This one is a lot more neutral/technical.

simonw2y ago

You only got comments from six people so yeah, definitely not representative.

herval2y ago· 4 in thread

Can anyone explain what’s Stability’s business model (or plan for one)?

I get why Meta releases tons of models, but still can’t quite understand what stability is trying to achieve

sangnoir2y ago

Seems like the standard open-core playbook:

> This model is included in our new Stability AI Membership. Visit our Membership page to take advantage of our commercial Core Model offerings, including SDXL Turbo & Stable Video Diffusion.

A hypothetical Stable Code 13B/70B could be hosted only, with more languages or specialized use-cases (Stable Code 3B iOS-Swift-Turbo)

emadm2y ago

Membership with upsell to support, custom models and more

Plus licensed variant models like stable audio and on chip installation like arm for specialist models eg Japanese law or Indonesian accounting

seydor2y ago

to be bought by meta

bogwog2y ago

This is all an elaborate mating ritual

swyx2y ago· 3 in thread

> License: Other

> Commercial Applications

> This model is included in our new Stability AI Membership. Visit our Membership page to take advantage of our commercial Core Model offerings, including SDXL Turbo & Stable Video Diffusion.

what exactly is the license lol. can people use this or is this "see dont touch"

neurostimulant2y ago

blagie2y ago

This basically means "Get sued."

There is no clear legal definition of "noncommercial," and courts have gone all sorts of different ways on what constitutes commercial use.

This is where CC NC licenses imploded. A lot of places (hello, MIT!) intentionally use CC NC licenses to make things appear more open than they are.

jillesvangurp2y ago

Gpt4all has a UI as well that you can use with models running locally on your laptop.

jjtheblunt2y ago· 3 in thread

Jargon naivete question: doesn't "on the edge" normally imply the server side with minimal router hops to the client, not the client side?

devindotcom2y ago

WatchDog2y ago

I think the etymology of “edge computing” is derived from “network edge”, ie the outer shell of some network/autonomous system.

The closest point within your control that interfaces with devices outside of your control.

Seeing the term get used to describe client devices themselves kinda muddies the terminology.

xer0x2y ago

+1 yes, for a service using network caching like using Cloudflare. I would've referred to their CDN as the Edge of our network.

alwinaugustin2y ago· 3 in thread

danielmarkbruce2y ago

They might also be able to train a model more intelligently by generating training data from said graphs.

jdorfman2y ago

> I've been experimenting with code-llama extensively on my laptop

You can use/try code-llama with Cody https://sourcegraph.com/blog/cody-vscode-1.1.0-release#:~:te...

weebull2y ago

artninja19882y ago· 3 in thread

Given the complete failure of the first stable lm, I'm interested to try this one out. Haven't really seen a small language model, except Mistral 7B, that's really useful for much.

I also hope stability comes out with a competitor to the new midjourney and dalle models! That's what put them on the map in the first place

emadm2y ago

We released a competitor to runway recently that beat it on blind tests, plus way faster image in sdxl turbo

We have been working on ComfyUI for the next step and new image models

Midjourney and others are pipelines versus models so we have a higher bar to jump but the og stable diffusion team are working hard!

tarruda2y ago

Deepseek coder 6.7B is very useful for coding and can run in consumer GPUs.

I use the 6bit GGUF quantized version on a laptop RTX 3070

brianjking2y ago

connorgutman2y ago· 2 in thread

FYI: This model is already available on Ollama.

mistrial92y ago

how do you check that ?

coder5432y ago

https://ollama.ai/library?sort=newest

rahimnathwani2y ago· 1 in thread

How are people using codellama and this in their workflows?

I found one option: https://github.com/xNul/code-llama-for-vscode

But I'm guessing there are others, and they might differ in how they provide context to the model.

weebull2y ago

Ellama for Emacs looks promising, but I only tried to install it this morning.

mchiang2y ago· 1 in thread

It's amazing to see more smaller models being released. This creates opportunities for more developers to run it on their local computers, and makes it easier to fine-tune for specific needs.

brcmthrowaway2y ago

Has anyone tried starting with a smaller model, then RLing until it improves to the level of the bigger model?

sytelus2y ago· 1 in thread

Why did the authors not compare with Phi-2?

hcarlens2y ago

(In turn, in the Phi-2 post they compare Phi-2 to Llama-2 instead of CodeLlama, making it even harder)

[1]: https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...

photon_collider2y ago· 1 in thread

How reliable are these benchmarks?

ilaksh2y ago

I think the trick is that they are just comparing to other tiny models.

None of the little models, including this one, are comparable to the performance of the larger models for any significant coding problem.

I think what these are useful for is mostly giving people hints inside of a code editor. Occasionally filling in the blank.

outcoldman2y ago

I was able to run this model in http://lmstudio.ai as well. Just remove Compatibility Guess in Filters, so you can see all the models. LM Studio can load it and run requests against it.

hospitalJail2y ago

Seems like they caught the Apple Marketing bug and are chasing things no one cares about. Great 3B model, everyone is already running 7B models over here.

Maybe one day when I need to do offline coding on my cellphone, it will be really useful.

alastairr2y ago

Does anyone have recommendations for add-ins to integrate these 'smaller' LLMs into an IDE like VSCode? I'm pretty embedded with GH copilot, but curious to explore other options.

ihaag2y ago

Terrible model
