StableCode (opens in new tab)

(stability.ai)

305 pointskpozin2y ago107 comments

107 comments

80 comments · 18 top-level

Uptrenda2y ago· 22 in thread

> People of every background will soon be able to create code to solve their everyday problems and improve their lives using AI, and we’d like to help make this happen

Yeah, this is not going to happen. Anyone who has ever tried to gather requirements for software knows that users don't know what they want (clients especially lmao.) The language they use won't be detailed enough to create anything meaningful. Do you know what language would be? Code... Unironically, the best language for software isn't English. It's code. Should you specify what you want in enough detail for it to be meaningful suddenly you're doing something quite peculiar. Where have I heard it before? Oh yeah, you're writing code.

These tools are all annoying AF. Developers don't need half-baked hints to write basic statements and regular people don't have the skills to hobble together whatever permutations these things spit out. Which rather begs the question: who the hell is the audience for this?

spott2y ago

The reason that people need to gather precise requirements that are precise is because the specifications -> product loop is long. Imprecision results in lots of wasted effort.

If that loop is shortened drastically, then trying, checking and tweaking is suddenly a much more viable design method. That doesn’t require a precise set of requirements.

chii2y ago

> That doesn’t require a precise set of requirements.

this exactly.

If the AI could make something that semi works, and you check the output, and repeat until you find the output satisfying, then it will be one of the biggest improvements to software development. Sure, you wouldn't use it to write mission critical software such as aviation, etc. But you'd use it to automate the sorting of your email, or write a quick auto-reply and auto mail merge, or bang out a quick site.

1 more reply

dragonwriter2y ago

> If that loop is shortened drastically, then trying, checking and tweaking is suddenly a much more viable design method.

No, you still need the skill of gathering precise requirements, otherwise you end up in endless churn of implementing the wrong requirements and then implementing the wrong corrections when you get bad corrections.

(Maybe we didn’t know before the general adoption of notionally-Agile development methods, which didn’t have this as there premise but were focused on other benefits of a shortened spec->product loop, we certainly know it after the widespread adoption of those methods.)

Shortened development loop does mean that you are more likely to have the whole market/domain shift under you between the time the requirements are defined and when the system is implemented, though, a frequently-realized risk with big upfront design that renders even precise and accurate (at the time gathered) requirements incorrect when implemented.

Overdone2y ago

A better tool to improve production would be an expert system that would gather requirements. But the people who want software to do their job either can't specify what they do or don't want to invest any of their time in what they see as someone else's job.

abi2y ago

You're missing the point: Natural language can be much a higher layer of abstraction than the programming languages we currently have. It's much faster to say "Add a button to download the output as a PDF" than write JS directly.

You'd be surprised by what regular people can build when you give them the power to create software. Here are a bunch of apps created using my tool/GPT-4: https://showcase.picoapps.xyz Most of our users have never coded before, and are able to build small tools to make their and their customers' lives better.

pphysch2y ago

Sure, they can "build" it, but can they maintain it? Can they only add more layers of mud? How do you refactor the statement "Add a button to download the output as a PDF"?

That's not a replacement for software engineering.

2 more replies

vGPU2y ago

Hence, the usefulness of tools like UI.Vision RPA

chrisnight2y ago

Additional idea for why this won't be used as much as they hope: Creating the software is just one part of the entire process of utilizing software. Small scripts actually could be possible to be written by AI, but actually using them could turn up challenging for normal users.

Anyone who has setup a coding project knows that actually creating the project structure, setting up dependencies, build scripts, making the code compile/be interpreted are all problems that can have extremely obscure, frustrating errors, and they happen before you even start coding.

Then, not to mention, deploying the software. Even if you give someone code, they won't immediately know how to run it. End users get worried at the idea of opening a terminal and running a command in it, no matter how easy it is. Not to mention setting up the software to do so. (Is the right Python version even installed?)

As such, even if an AI could write a perfect script in code from standard text to, say, lowercase all of the words in a document, it would still be hard for non-developers to use because of the surrounding knowledge barrier, outside of the code itself. Although, yeah, it would be easier.

vineyardmike2y ago

> These tools are all annoying AF. Developers don't need half-baked hints to write basic statements and regular people don't have the skills to hobble together whatever permutations these things spit out. Which rather begs the question: who the hell is the audience for this?

On the contrary: developers are exactly the people capable of handling those complex requirements you speak of. As a developer, getting a computer to handle basic statements is great and frees you to handle the big stuff.

Being able to write “// Retrieves second value from path” and have the computer spit out some string parsing method is great. All those little helper methods that showily fill up projects are great candidates for an AI. Especially if it helps you break up code into smaller, more composable (and disposable) chunks. If an AI writes the code, and can easily do it again, maybe people would be more willing to delete stuff that isn’t needed.

quadrature2y ago

This is already happening.

It’s true that they won’t know how to exactly specify their needs. But they can share input and output examples and iterate on the solution.

I know folks without any programming background using ChatGPT to write code for them.

The code doesn’t work right off the bat but by iterating with the agent they can either get a solution or solve a portion of the problem.

mavili2y ago

if you mean things like "write me Python code to read a file and count all occurrences of a word" then all cool. Good luck with something like "write me some Java code for Android to send a message to an app user"

1 more reply

pylua2y ago

It’s about giving the domain experts who understand what the requirements should be a way to build something without having to have the domain knowledge of code.

smoldesu2y ago

I think the problem with that is the same reason why "no code" platforms struggle to succeed. Writing software without understanding control flow or libraries or APIs is practically impossible. Instead of being a liberating experience like it should be, it becomes a confining and frustrating one where you don't understand what is and isn't possible.

LLMs will work really well when developers know what they want and how to ask for it, same with many no-code platforms. If you don't understand programming though, you can't even know if your request is possible.

2 more replies

dragonwriter2y ago

Yes, the problem that had been “solved” a couple times a decade since the 1970s, at least, and every time the new tools that do that end up mostly used by professional software developers, not domain experts. Honestly, the only thing that has come close to actually even minimally addressing this is spreadsheet software.

It hasn't been solved any better this time, either.

Overdone2y ago

Sorry, but in my experience fish think about water more than domain experts think about requirements.

mavili2y ago

I agree the claim isn't going to happen.

> Which rather begs the question: who the hell is the audience for this?

In my opinion audience for code-generation AI are developers, not the general public. It's immensely useful to be assisted by AI to autocomplete and suggest my code. Whether that is because I'm not familiar with a language syntax or just don't have all the language API in my head.

The general public isn't going to have a clue how to put things together, and until AI can generate reliable and fully functioning code I doubt this is ever going to be for the general public. AI right now is essentially the combination of Google+StackOverflow for me but in a much faster pace. Instead of browsing through tens of SO questions and Google links to get to the exact situation I'm in I can just prompt the AI with all the details and get one response that has the answer to my problem, usually!

cyanydeez2y ago

Shit, even if you give someone all the Lego blocks in the world and a infinity accurate picture of the minimally complex final product, less than 1% would figure it out.

I bootstrapped dev learning by collecting all the necessary pieces of code but at the end of the data I feel like I'm just writing a huge semitechnical novel and the problems I encounter have nothing to do with the basic building blocks, it's entirely about code flow, data flow, entry points, race conditions and things you encounter after you hit 99% of test cases.

This stuff seems like new age "low code" environments.

Overdone2y ago

The audience for this is management. They'll spend lots of their budget on it. They'll use this to put something together that does something, just not what they want. Then they'll show it to you and tell you to make it work the way they think they want. After all, if a manager can "do it" in 90 minutes with no training a developer should be able to make it perfect in a few days. And they'll make you use the new tool so you learn it and so they can justify the expense.

gabereiser2y ago

The point is the nuance of writing code is now no longer an elitist act that strikes the egos of those who understand the intricacies, it’s now democratized and in the hands of anyone. It’s not “good code” but it can be. It’s a kin to hiring your nephew that says he can code but can’t really other than stdio stuff but at least has the right attitude and asks the right questions.

I do believe there will be a day where we communicate what we need and software is written on the fly to facilitate that need. For better or worse.

dragonwriter2y ago

> The point is the nuance of writing code is now no longer an elitist act that strikes the egos of those who understand the intricacies, it’s now democratized and in the hands of anyone.

Insofar as anything like that was ever true, it still is.

Not that writing code has ever been the hard part of software development.

elisharobinson2y ago

Assuming that the requirement quality is a constant . And humans as system have the ability to compile this high level instruction to low level code. now imagine there exists systems which A. Augments human ability making it more efficient. or B. replace humans completely. The only reason this is valuable is that it might POTENTIALLY reduces the $/hr cost of the system.

yieldcrv2y ago

Copium

More people will be able to express themselves, it doesn’t matter that your uncle won’t

sebzim45002y ago· 9 in thread

Hard to believe it can work that well when it only has 3B parameters, but I'd love to be proven wrong.

thewataccount2y ago

I was impressed enough by replit's 2.7B model that I'm convinced it's doable. I have a 4090 and consider that the "max expected card for a consumer to own".

Also exllama doesn't support non-llama models and the creator doesn't seem interested in adding support for wizardcoder/etc. Because of this, using the alternatives are prohibitively slow to use a quantized 16B model on a 4090 (if the exllama author reads this _please_ add support for other model types!).

3B models with refact are pretty snappy with Refact, about as fast as github copilot. The other benefit is more context space, which will be a limiting factor for 16B models.

tl;dr - I think we need ~3B models if we want any chance of consumer hardware to reasonably run coding models akin to github copilot with decent context length. And I think it's doable.

eyegor2y ago

I'm fairly confident a coding specific model should be a lot smaller - 3b should be plenty if not 1b or less. As it stands, there are quite a few 7-13b model sizes that can predict natural language quite well. Code seems at its surface a much simpler language, strict grammars, etc so I wouldn't think it needs to be anywhere near as large as the nlp models. Right now people are retraining nlp models to work with code, but I think the best code helper models in the future will be trained primarily on code and maybe fine tuned on some language. I'm thinking less of a chat bot api and more of a giant leap in "intellisense" services.

2 more replies

thorum2y ago

replit’s model is surprisingly good at generating code, even at following complex instructions that I was sure would confuse it. I have found it’s a bit weak on code analysis, for open-ended questions like ‘is there a bug anywhere in this code?’ that GPT-4 can answer.

brucethemoose22y ago

exLlama is not the only viable quantized backend. TVM (as use by mlc-llm) and GGML (which is used by llama.cpp) are very strong contenders.

~7B-13B will work in 16GB RAM with pretty much any dGPU for help, and context extending tricks.

TBH I suspect Stability released a 3B model because its cheap and quick to train. If they really wanted a good model on modest devices, they would have re used a supported architecture (like Falcon, MPT, Llama, Starcoder...) or contributed support to a good backend.

*Also, I think any PyTorch based model is not really viable for consumer use. Its just too finicky to install and too narrow with hardware support.

nwoli2y ago

Reminder that GPT-2 was considered “too dangerous” to be released at just 1.5B weights

ben_w2y ago

My memory may be imperfect, but I thought it was more "we aren't sure and we want to promote a culture of safety" rather than "this is definitely unsafe… oh wait never mind"?

1 more reply

capableweb2y ago

I had that thought at first too, but then the scope is really small (programming) compared to other models (everything) so might not be that bad.

csjh2y ago

phi-1[0] is only 1.3 billion parameters and performs very well in coding tasks - small models have a massive amount of potential

[0] - https://arxiv.org/abs/2306.11644

politelemon2y ago

But it does mean, hopefully, it is easier to run on small hardware. Making it much more accessible.

runako2y ago· 8 in thread

Is this a "product" that one could install and use or a model that one should expect an OEM to integrate into a product before programmers can use it? I'm asking because I don't see any links that would help me figure out how to try it out.

yohannparis2y ago

To be honest, you’d better buy GitHub co-pilot and enjoy the productivity boost at a cheap price. Trying to download/install/setup/use StableCode is worth it only if you want to learn all those steps as well. If what you care is the final result, just buy an existing service.

hmottestad2y ago

I have bought into co-pilot, but I can’t say it’s that much of a productivity boost. More often than not it recommends something completely wrong. I guess it might be more useful if I did more spring boot or maybe hibernate.

I’ve found chat gpt to be more helpful in general. I can paste some code in and have a discussion about what I want it to fix for me.

arcanemachiner2y ago

I may put all my open source stuff on GitHub, but hell will freeze over before I willingly let Microsoft get a whiff of my private data, no matter how irrelevant it may be.

GitHub Copilot sounds pretty neat though, I will admit that.

fruit20202y ago

I didn’t have a good experience with copilot. It was ok for some auto completions, but I found it very distracting to correct it many times when it didn’t do well. It’s like it interrupted my flow. Maybe there is a shortcut to enable it on demand rather than always on, but it wasn’t obvious in the Jetbrains plugin

selfhoster112y ago

Yeah, why not upload all my employer's proprietary code to Microsoft? What could possibly go wrong?

I get that consuming an API is far easier than setting up your own inference backend, but there are legitimate issues to consider before going in that direction.

nwoli2y ago

Ctrl-F for “Code for using StableCode Instruct to generate a response to a given instruction.” and you’ll see a super straightforward piece of code to copy to test it out for generating code

runako2y ago

Thanks! The verbiage at the beginning of the announcement seems to go out of its way to not call StableCode a "model," which was confusing. By contrast, the recent release of SDXL 1.0 is described as a "model" in its announcement.

carom2y ago

Yes, the model is available. However, it just released so no one has wrapped it in a plugin yet. I would expect that within the month there will be a nicely runnable local version, similar to llama2's wrappers.

3rd32y ago· 7 in thread

How does it compare to GitHub Copilot?

miohtama2y ago

The model, source, etc. are available under permissive terms

https://huggingface.co/stabilityai/stablecode-instruct-alpha...

You can “run it locally”. Very handy if you do not trust automatically sending all your code to someone in the United States.

lolinder2y ago

> to reproduce, distribute, and create derivative works of the Software Products solely for your non-commercial research purposes

I wouldn't call these terms permissive. It's in line with the recent trend in released AI models, but fairly restrictive in what you're actually allowed to do with it.

1 more reply

UncleOxidant2y ago

Hmmm... so on that hugging face page there's a text box where you enter input then you click the 'compute' button.

So I asked it to "Write a python function that computes the square of the input number."

And it responds with:

     def square(x):

Which seems quite underwhelming.

1 more reply

jstummbillig2y ago

When they don't voluntarily answer the question, you know the answer.

sebzim45002y ago

It's not easy to compare them, to be fair.

I guess you could come up with a thousand example prompts and pay some students to pick which output is better, but I can also see why you wouldn't bother. It probably depends on language, type of prompt, etc.

2 more replies

karmasimida2y ago

On HumanEval, Copilot is 40+ on pass@1 comparing to 26 for stable code 3b.

HumanEval is abused but this model is only good for its size, it is no match for Copilot … yet

UncleOxidant2y ago

> On HumanEval, Copilot is 40+ on pass@1 comparing to 26 for stable code 3b.

Can you put those numbers into context for those who haven't done HumanEval? Are those percentages so that 40+ means 40+% and 26 is 26%? If so does that imply both would be failing scores?

rvz2y ago· 5 in thread

Either way, the race to zero has been further accelerated.

Stability AI, Apple, Meta, etc are clearly at the finish line putting pressure on cloud only AI models and cannot raise prices or compete with free.

_pdp_2y ago

Lots of folks out there would rather skip the hassle of running their own models, and that's totally understandable. Similarly, you've got plenty of folks who'd rather pay for managed hosting services instead of dealing with the nitty-gritty of setting up everything themselves using free tools. This opens up exciting opportunities for successful companies to offer some real perks – think convenience, a smoother user experience, and lightning-fast speeds, just to name a few! All of these things save time and are worth paying for.

thewataccount2y ago

> Stability AI, Apple, Meta, etc are clearly at the finish line

I'm very optimistic and expect them to catch up. I've used the open models a lot, to be clear they are starting to compare to GPT3.5Turbo right now, they can't compete with GPT4 at all. GPT4 is almost a year old from when it finished training I think?

I expect open source models to stay ~1.5 years behind. That said they will eventually be "good enough".

Keep in mind too though that using and scaling GPUs is not free. You have to run the models somewhere. Most businesses will still prefer a simple api to call instead of managing the infrastructure. On top of this many business (medium and smaller) will likely find models like GPT4 to be sufficient for their workload, and will appreciate the built in "rails" for their specific usecases.

tl;dr - open models don't even compare to GPT4 yet (I use them all daily), they aren't free to run, and a API option is still preferably to a massive if not most companies.

nwoli2y ago

> Keep in mind too though that using and scaling GPUs is not free. You have to run the models somewhere.

Long or medium term these will probably be dirt cheap to just run in the background though. It might be within 3-5 years since parallel compute is still growing and isn’t as bounded by moores law stagnation

1 more reply

empath-nirvana2y ago

Open Source doesn't mean free. It costs a lot of money to run models and keep models up to date, and maybe a "good enough" model runs relatively cheaply, but there's always going to be a "state of the art" that people are willing to pay for.

brucethemoose22y ago

Hardware is still a limiting factor.

Cloud AI providers get a big advantage from batching/pipelining and fancy ASICs. The question is how much they are willing to lower the tax.

cutler2y ago· 4 in thread

Yet another site whose data privacy policy amounts to nothing more than an Accept button. Refuse to use such sites.

capableweb2y ago

It's a model you download and run yourself, on your own hardware. No privacy policy needed.

barrotes2y ago

He's referring to the actual website, which doesn't give you the option of reject profilation cookies (mandatory in Europe). I commented about another website posted here few days ago. It gets me mad too

smcleod2y ago

Use uBlock Origin and then you won't have to see them ;)

cutler2y ago

But won't that just default you to agreeing to whatever.

1 more reply

RomanPushkin2y ago· 4 in thread

Is it good at algos?

From interviews:

Implement queue that supports three methods:

* push

* pop

* peek(i)

peek returns element by its index. All three methods should have O(1) complexity [write code in Ruby].

ChatGPT wasn't able to solve that last time I tried https://twitter.com/romanpushkin/status/1617037136364199938

anotherpaulg2y ago

I tried using aider to work with GPT-4 on this problem. Initially it went for a solution based on `shift`. But when challenged, it realized that shift was O(n) and was able to come up with a dual stack solution. It considers this solution O(1) when amortized over many operations. I don't know ruby well, so I can't verify that.

https://aider.chat/share/?mdurl=https://gist.github.com/paul...

redox992y ago

GPT4 made a mistake on its first try, but after asking what the complexity of pop is, it figured out its mistake and fixed it.

https://chat.openai.com/share/d527f65f-8a6d-4602-acab-4d80ed...

voxl2y ago

in what world is a hashtable lookup worst case O(1)? Your own solution doesn't match your requirements.

If you want amortized complexity then a simple vector suffices.

RomanPushkin2y ago

1. I like toxic comments like that saying something is simple without actually solving the problem, you're the best.

2. The average complexity to search, insert, and delete data in a hash table is O(1), for interviews it works 99% of the time.

3. There is alternative O(1) solution you're looking for, I'll leave this exercise to you, bro. As well as the other exercise of being less toxic and a bit more respectful to people you don't know online lol.

2 more replies

thewataccount2y ago· 2 in thread

I can't seem to find a demo, if anyone has a chance to test it, how does it compare to replit and wizardcoder?

james2doyle2y ago

Looks like there is one on the Hugging Face page: https://huggingface.co/stabilityai/stablecode-instruct-alpha...

Not very promising based on this lame test

politelemon2y ago

I ran it locally and it seemed to do better. I switched Python to Bash and it also gave a good answer (nproc).

smcleod2y ago· 1 in thread

If I’m reading this correctly this could be an open source model that may compete with the likes of copilot?

That is something I’d be very interested in if they can get the compute requirements down to those of say a standard 13B model. Then I could fine tune (correct term?) it on my offline data and hook it into something like fauxpilot and my IDE.

I had a look at some of the recent code models (wizardcoder,strider etc) but it seemed that you need a really large model to be any good and quite a few of them were trying specifically for python.

smcleod2y ago

Trained specifically for Python*

gaogao2y ago

Its metrics on HumanEval seem not particularly good (26.89 Pass@1 for it vs. 61.64 for PanGu-Coder2 15b). Is it targeting a very specific latency for responses? I'd think a 15b quantization should run fast enough for most use cases? Even phi-1 1.3B has better performance at 50.6.

dragonwriter2y ago

> People of every background will soon be able to create code to solve their everyday problems and improve their lives using AI, and we’d like to help make this happen

Just like everytime people hyping a technology have said this with something else where “AI” is but otherwise an identical claim, no, it didn’t happen last time, its not happening this time, and there’s a pretty good chance its not happening next time, either.

MyAccountYo2y ago

I have thought about how these tools can be useful quite a lot. I have a prompt I can feed chat gpt and it will create whole feature "skeletons" with my naming rules and architecture quirks. Taking a lot of time from getting started when building something new. But with chat it is still too inconvenient, having something like this integrated in the ide via a script would he more convenient but still a very specific use case.

I think what I want is this idea of "code completion" but not for writing the methods, which is the easy part. Instead the tool should structure classes and packages and modules and naming and suggest better ways to write certain things.

jaimani_langoo2y ago

AI Cannot magically read minds. Having said that It would be nicer to have complete solutions rather than code hints. Imagine having to write a detailed prompt rather than choosing a prediction. Something like : "Write a React/Node JS app that has authentication and a home page" and the AI model give you a complete project as the output. It would be great if it generates deterministic output for the Prompt. AI can really help increase the productivity of Programmers.

whimsicalism2y ago

> ~120,000 code instruction/response pairs in Alpaca format were trained on the base model to achieve this result.

Very curious where they are getting this data from. In other open source papers, usually this comes from a GPT-4 output, but presumably Stability would not do that?

ethereal_ai2y ago

As a user who cares more about the product, how does it compare to the gpt-4 code capability? gpt-4 is good enough for me, if it works better than gpt-4 I would love to try it!

nwoli2y ago

I love stability AI

monlockandkey2y ago

Any performance metrics?

eduardocrs2y ago

"People will never ..."

Ai: "Hold my beer".

j / k navigate · click thread line to collapse

107 comments

80 comments · 18 top-level

Uptrenda2y ago· 22 in thread

> People of every background will soon be able to create code to solve their everyday problems and improve their lives using AI, and we’d like to help make this happen

spott2y ago

The reason that people need to gather precise requirements that are precise is because the specifications -> product loop is long. Imprecision results in lots of wasted effort.

If that loop is shortened drastically, then trying, checking and tweaking is suddenly a much more viable design method. That doesn’t require a precise set of requirements.

chii2y ago

> That doesn’t require a precise set of requirements.

this exactly.

1 more reply

dragonwriter2y ago

> If that loop is shortened drastically, then trying, checking and tweaking is suddenly a much more viable design method.

Overdone2y ago

abi2y ago

pphysch2y ago

Sure, they can "build" it, but can they maintain it? Can they only add more layers of mud? How do you refactor the statement "Add a button to download the output as a PDF"?

That's not a replacement for software engineering.

2 more replies

vGPU2y ago

Hence, the usefulness of tools like UI.Vision RPA

chrisnight2y ago

vineyardmike2y ago

quadrature2y ago

This is already happening.

It’s true that they won’t know how to exactly specify their needs. But they can share input and output examples and iterate on the solution.

I know folks without any programming background using ChatGPT to write code for them.

The code doesn’t work right off the bat but by iterating with the agent they can either get a solution or solve a portion of the problem.

mavili2y ago

1 more reply

pylua2y ago

It’s about giving the domain experts who understand what the requirements should be a way to build something without having to have the domain knowledge of code.

smoldesu2y ago

2 more replies

dragonwriter2y ago

It hasn't been solved any better this time, either.

Overdone2y ago

Sorry, but in my experience fish think about water more than domain experts think about requirements.

mavili2y ago

I agree the claim isn't going to happen.

> Which rather begs the question: who the hell is the audience for this?

cyanydeez2y ago

Shit, even if you give someone all the Lego blocks in the world and a infinity accurate picture of the minimally complex final product, less than 1% would figure it out.

This stuff seems like new age "low code" environments.

Overdone2y ago

gabereiser2y ago

I do believe there will be a day where we communicate what we need and software is written on the fly to facilitate that need. For better or worse.

dragonwriter2y ago

> The point is the nuance of writing code is now no longer an elitist act that strikes the egos of those who understand the intricacies, it’s now democratized and in the hands of anyone.

Insofar as anything like that was ever true, it still is.

Not that writing code has ever been the hard part of software development.

elisharobinson2y ago

yieldcrv2y ago

Copium

More people will be able to express themselves, it doesn’t matter that your uncle won’t

sebzim45002y ago· 9 in thread

Hard to believe it can work that well when it only has 3B parameters, but I'd love to be proven wrong.

thewataccount2y ago

I was impressed enough by replit's 2.7B model that I'm convinced it's doable. I have a 4090 and consider that the "max expected card for a consumer to own".

3B models with refact are pretty snappy with Refact, about as fast as github copilot. The other benefit is more context space, which will be a limiting factor for 16B models.

tl;dr - I think we need ~3B models if we want any chance of consumer hardware to reasonably run coding models akin to github copilot with decent context length. And I think it's doable.

eyegor2y ago

2 more replies

thorum2y ago

brucethemoose22y ago

exLlama is not the only viable quantized backend. TVM (as use by mlc-llm) and GGML (which is used by llama.cpp) are very strong contenders.

~7B-13B will work in 16GB RAM with pretty much any dGPU for help, and context extending tricks.

*Also, I think any PyTorch based model is not really viable for consumer use. Its just too finicky to install and too narrow with hardware support.

nwoli2y ago

Reminder that GPT-2 was considered “too dangerous” to be released at just 1.5B weights

ben_w2y ago

My memory may be imperfect, but I thought it was more "we aren't sure and we want to promote a culture of safety" rather than "this is definitely unsafe… oh wait never mind"?

1 more reply

capableweb2y ago

I had that thought at first too, but then the scope is really small (programming) compared to other models (everything) so might not be that bad.

csjh2y ago

phi-1[0] is only 1.3 billion parameters and performs very well in coding tasks - small models have a massive amount of potential

[0] - https://arxiv.org/abs/2306.11644

politelemon2y ago

But it does mean, hopefully, it is easier to run on small hardware. Making it much more accessible.

runako2y ago· 8 in thread

yohannparis2y ago

hmottestad2y ago

I’ve found chat gpt to be more helpful in general. I can paste some code in and have a discussion about what I want it to fix for me.

arcanemachiner2y ago

I may put all my open source stuff on GitHub, but hell will freeze over before I willingly let Microsoft get a whiff of my private data, no matter how irrelevant it may be.

GitHub Copilot sounds pretty neat though, I will admit that.

fruit20202y ago

selfhoster112y ago

Yeah, why not upload all my employer's proprietary code to Microsoft? What could possibly go wrong?

I get that consuming an API is far easier than setting up your own inference backend, but there are legitimate issues to consider before going in that direction.

nwoli2y ago

Ctrl-F for “Code for using StableCode Instruct to generate a response to a given instruction.” and you’ll see a super straightforward piece of code to copy to test it out for generating code

runako2y ago

carom2y ago

3rd32y ago· 7 in thread

How does it compare to GitHub Copilot?

miohtama2y ago

The model, source, etc. are available under permissive terms

https://huggingface.co/stabilityai/stablecode-instruct-alpha...

You can “run it locally”. Very handy if you do not trust automatically sending all your code to someone in the United States.

lolinder2y ago

> to reproduce, distribute, and create derivative works of the Software Products solely for your non-commercial research purposes

I wouldn't call these terms permissive. It's in line with the recent trend in released AI models, but fairly restrictive in what you're actually allowed to do with it.

1 more reply

UncleOxidant2y ago

Hmmm... so on that hugging face page there's a text box where you enter input then you click the 'compute' button.

So I asked it to "Write a python function that computes the square of the input number."

And it responds with:

     def square(x):

Which seems quite underwhelming.

1 more reply

jstummbillig2y ago

When they don't voluntarily answer the question, you know the answer.

sebzim45002y ago

It's not easy to compare them, to be fair.

2 more replies

karmasimida2y ago

On HumanEval, Copilot is 40+ on pass@1 comparing to 26 for stable code 3b.

HumanEval is abused but this model is only good for its size, it is no match for Copilot … yet

UncleOxidant2y ago

> On HumanEval, Copilot is 40+ on pass@1 comparing to 26 for stable code 3b.

Can you put those numbers into context for those who haven't done HumanEval? Are those percentages so that 40+ means 40+% and 26 is 26%? If so does that imply both would be failing scores?

rvz2y ago· 5 in thread

Either way, the race to zero has been further accelerated.

Stability AI, Apple, Meta, etc are clearly at the finish line putting pressure on cloud only AI models and cannot raise prices or compete with free.

_pdp_2y ago

thewataccount2y ago

> Stability AI, Apple, Meta, etc are clearly at the finish line

I expect open source models to stay ~1.5 years behind. That said they will eventually be "good enough".

tl;dr - open models don't even compare to GPT4 yet (I use them all daily), they aren't free to run, and a API option is still preferably to a massive if not most companies.

nwoli2y ago

> Keep in mind too though that using and scaling GPUs is not free. You have to run the models somewhere.

1 more reply

empath-nirvana2y ago

brucethemoose22y ago

Hardware is still a limiting factor.

Cloud AI providers get a big advantage from batching/pipelining and fancy ASICs. The question is how much they are willing to lower the tax.

cutler2y ago· 4 in thread

Yet another site whose data privacy policy amounts to nothing more than an Accept button. Refuse to use such sites.

capableweb2y ago

It's a model you download and run yourself, on your own hardware. No privacy policy needed.

barrotes2y ago

smcleod2y ago

Use uBlock Origin and then you won't have to see them ;)

cutler2y ago

But won't that just default you to agreeing to whatever.

1 more reply

RomanPushkin2y ago· 4 in thread

Is it good at algos?

From interviews:

Implement queue that supports three methods:

* push

* pop

* peek(i)

peek returns element by its index. All three methods should have O(1) complexity [write code in Ruby].

ChatGPT wasn't able to solve that last time I tried https://twitter.com/romanpushkin/status/1617037136364199938

anotherpaulg2y ago

https://aider.chat/share/?mdurl=https://gist.github.com/paul...

redox992y ago

GPT4 made a mistake on its first try, but after asking what the complexity of pop is, it figured out its mistake and fixed it.

https://chat.openai.com/share/d527f65f-8a6d-4602-acab-4d80ed...

voxl2y ago

in what world is a hashtable lookup worst case O(1)? Your own solution doesn't match your requirements.

If you want amortized complexity then a simple vector suffices.

RomanPushkin2y ago

1. I like toxic comments like that saying something is simple without actually solving the problem, you're the best.

2. The average complexity to search, insert, and delete data in a hash table is O(1), for interviews it works 99% of the time.

2 more replies

thewataccount2y ago· 2 in thread

I can't seem to find a demo, if anyone has a chance to test it, how does it compare to replit and wizardcoder?

james2doyle2y ago

Looks like there is one on the Hugging Face page: https://huggingface.co/stabilityai/stablecode-instruct-alpha...

Not very promising based on this lame test

politelemon2y ago

I ran it locally and it seemed to do better. I switched Python to Bash and it also gave a good answer (nproc).

smcleod2y ago· 1 in thread

If I’m reading this correctly this could be an open source model that may compete with the likes of copilot?

I had a look at some of the recent code models (wizardcoder,strider etc) but it seemed that you need a really large model to be any good and quite a few of them were trying specifically for python.

smcleod2y ago

Trained specifically for Python*

gaogao2y ago

dragonwriter2y ago

> People of every background will soon be able to create code to solve their everyday problems and improve their lives using AI, and we’d like to help make this happen

MyAccountYo2y ago

jaimani_langoo2y ago

whimsicalism2y ago

> ~120,000 code instruction/response pairs in Alpaca format were trained on the base model to achieve this result.

Very curious where they are getting this data from. In other open source papers, usually this comes from a GPT-4 output, but presumably Stability would not do that?

ethereal_ai2y ago

As a user who cares more about the product, how does it compare to the gpt-4 code capability? gpt-4 is good enough for me, if it works better than gpt-4 I would love to try it!

nwoli2y ago

I love stability AI

monlockandkey2y ago

Any performance metrics?

eduardocrs2y ago

"People will never ..."

Ai: "Hold my beer".

j / k navigate · click thread line to collapse