Many other startups in the space will likely face similar issues given the rapid commoditization of these models and the underlying tech. It’s very easy to spend a fortune building a model that offers, at best, a short-lived incremental improvement before it can be swapped out for something someone else paid to train.
These go on Bedrock, on-chip, on-prem, etc., and our consulting partners take them to the end user.
On the innovation side, Stable Diffusion Turbo generates something like 100 cats with hats per second, and the video model outperforms Runway, Pika, etc. in blind tests.
Stable Audio was one of TIME's innovation-of-the-year winners in music, and we released a SOTA 3D model.
StableLM Zephyr is the best 3B chat model, and it works great on a MacBook Air.
Most of the pixels in the world will be generated, so fast, high-quality image and video models are the core, and these other models exist to support them.
It’s really hard to build good, solid models, and we are the only company that can build a model of any type for anyone.
(An official ONNX version, please!! Then you get Transformers.js and the web, and I can deploy on every platform from the web to iOS to Windows.)
Re: art, DALL-E 3 costs significantly more. SDXL costs are 1/5th of what they were at launch: 0.0002/image versus DALL-E 3's 0.04. And you'd be surprised how often people are happy with SDXL. DALL-E 3's marginal advantage is mostly text, especially given the excessive filtering of stylistic content and the forced prompt rewrites.
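To put the per-image prices above in perspective, here is a quick bit of arithmetic. It's a sketch that takes the two quoted prices at face value and assumes a flat per-image price with no volume discounts; the currency isn't stated in the comment, so treating the figures as US dollars is an assumption.

```python
# Per-image prices quoted above. Currency is not stated in the source;
# US dollars is assumed here. Flat pricing (no volume tiers) is also assumed.
SDXL_PRICE_PER_IMAGE = 0.0002
DALLE3_PRICE_PER_IMAGE = 0.04


def batch_cost(num_images: int, price_per_image: float) -> float:
    """Total cost of generating num_images at a flat per-image price."""
    return num_images * price_per_image


def price_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs per image."""
    return expensive / cheap


if __name__ == "__main__":
    n = 10_000
    print(f"SDXL, {n} images:     {batch_cost(n, SDXL_PRICE_PER_IMAGE):.2f}")
    print(f"DALL-E 3, {n} images: {batch_cost(n, DALLE3_PRICE_PER_IMAGE):.2f}")
    print(f"ratio: {price_ratio(DALLE3_PRICE_PER_IMAGE, SDXL_PRICE_PER_IMAGE):.0f}x")
```

At those quoted prices DALL-E 3 works out to roughly 200x the per-image cost, so 10,000 images would be about 2 vs. about 400.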
On a small scale, you have to professionalize ComfyUI’s development. My PR to make it installable and to make a plugin ecosystem that makes sense should not be sitting unmerged (https://github.com/comfyanonymous/ComfyUI/pull/298).
On a medium scale, CLIP is holding you back. I would eagerly buy a 48GB card to accommodate a batch-size-1, gradient-checkpointed, LoRA-trainable model with T5 for conditioning. I want PixArt-α or DeepFloyd IF with the SDXL dataset and training. I get that I can achieve a lot with SDXL on 24GB, including (just barely) a fine-tune, and I understand the engineering decisions here, but it’s too weak at following prompts.
On a large scale, I’m willing to spend a little money up front. Under those conditions you can be far more innovative; you don’t have to make everything for $0. Shane Carruth didn’t make Primer for $0. I’m sure you’ve seen this movie, so you know how astoundingly good it is. But he still spent something: only slightly more than the price of an RTX 6000 Ada.
Innovators have budgets. It’s still worth releasing the most powerful possible model for expensive hardware (this is why everyone is talking about Mixtral), and that’s especially true of visual art.
By what measure? Phi-2 seems better as far as I can tell from benchmarks and usage, and it has a much more permissive license.
2028: Energy use on hat-cat generation exceeds energy use on bitcoin.
AI has peaked
SDXL with ControlNet is odd a lot of the time; 1.5 + ControlNet still seems to give quicker and better results.
Basically, SD at least seems to be for when you want unique content; MJ/DALL-E for everything else.
Nope. Stable Diffusion with alternative models offers far more customization and control than Midjourney. Midjourney is good for beginners but sucks for experts.
Believe it or not, generating a full image from a prompt is a small slice of the image generation pie. Highly tuned in-painting is key to a number of budding startups.
That doesn't seem to be the case. There are very limited open-source models outside of the small-LLM bubble.
This isn’t entirely true. After the fumble that was SD2, they shipped SDXL and SDXL Turbo, which are both excellent. And in real-world results, Midjourney doesn’t simply outperform them outright; it’s a lot more complex than that, and ultimately SDXL is the more powerful tool.
I definitely found the LLMs underwhelming, and Stable Audio’s launch was poor, but I don’t think Midjourney has outright surpassed them on image gen.
That said, it was super cool the time I trained a model on my friend’s selfies and made her into her D&D character. She was super excited about it; it made me feel like a real-life wizard.
What? Have you actually used either? MJ is just an ultra-fine-tuned model with a few layers to prevent stuff from looking bad. Stable Diffusion has its own 'single-shot' version (maybe someone remembers it); I played with it for 1-2 hours. Everything looks great, but I want hyper-specific stuff in my art, and I'm never getting that with one-shots.
Heck, I did a few flyers and used some icons I made with img2img + inpainting + controlnet. The work is completely stunning and scalable. That is never happening even at an individual level with MJ.
The most impressive thing about these results is how good the 1.3B deepseek coder is.
I tested out StableLM Zephyr 3B when that came out and it was extremely underwhelming/unusable.
Based on this, Stable Code 3B doesn’t look worth trying out. I’m guessing that if they could have put out a 7B model that beat DeepSeek Coder 6.7B, they would have.
Another model you should try is Magicoder 6.7B DS (based on DeepSeek Coder). After playing with it for a couple of weeks, I think it gives slightly better results than the equivalent DeepSeek model.
I haven't tested it, but I think DeepSeek Coder 33B can run on a single RTX 3090 when 4-bit quantized. In your case, you might be able to run the non-quantized version.
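A rough way to sanity-check the 3090 claim is to count bytes for the weights alone. This sketch assumes the parameter count from the model name (33B), uses 1 GB = 1e9 bytes, and adds a hypothetical fixed 2 GB of headroom in place of a real estimate for activations, KV cache, and runtime overhead. Real "4-bit" formats (GGUF's Q4 variants, GPTQ, etc.) also store a bit more than 4 bits per weight, so treat the result as a lower bound.

```python
# Back-of-the-envelope VRAM estimate for model weights only.
# Assumptions: params taken from the model name, 1 GB = 1e9 bytes, and a
# hypothetical fixed headroom standing in for activations/KV cache/overhead.

def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed to hold the weights, in gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def fits(num_params: float, bits_per_weight: float, vram_gb: float,
         headroom_gb: float = 2.0) -> bool:
    """Crude check: do the weights plus a fixed headroom fit in vram_gb?"""
    return weight_memory_gb(num_params, bits_per_weight) + headroom_gb <= vram_gb


RTX_3090_VRAM_GB = 24  # the RTX 3090 has 24 GB of VRAM
PARAMS_33B = 33e9

if __name__ == "__main__":
    for label, bits in [("fp16", 16), ("8-bit", 8), ("4-bit", 4)]:
        gb = weight_memory_gb(PARAMS_33B, bits)
        ok = fits(PARAMS_33B, bits, RTX_3090_VRAM_GB)
        print(f"{label:>5}: ~{gb:.1f} GB of weights, fits on a 3090: {ok}")
```

Under these assumptions 4-bit comes to about 16.5 GB of weights (fits in 24 GB), 8-bit to about 33 GB, and fp16 to about 66 GB, which is why the unquantized version needs far more VRAM than a single 3090.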
https://huggingface.co/spaces/mike-ravkine/can-ai-code-resul...
Don't know how biased this leaderboard is, but I guess you could just give some of them a try and see for yourself.
I've seen the CanAiCode leaderboard several times (and used many of the models listed), but I wouldn't use it to pick a model. It's not a bad list, but the benchmark is too limited. The results are not accurately ranked from best to worst.
For example the deepseek 33b model is ranked 5 spots lower than the 6.7b model, but the 33b model is definitely better. WizardCoder 15b is near the top while WizardCoder 33b is ranked 26 spots lower, which is a wildly inaccurate ranking.
It's worth noting that those 33b models score in the 70s for HumanEval and HumanEval+ while the 15b model scores in the 50s.
BTW, depending on where you're at in your ML journey, Jeremy Howard from fast.ai says you should focus on using hosted instances like Paperspace until you really need your own machine. Unless, of course, you enjoy Linux sysadmin tasks. :) It can get really annoying trying to match the right CUDA version with the PyTorch version needed for the newest model you're trying.
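If you do run your own machine, a small check like the one below at least tells you which CUDA build of PyTorch you actually have installed. It's a minimal sketch: it only uses `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`, and it reports `None` instead of crashing when PyTorch isn't installed.

```python
# Report which PyTorch build is installed and whether it can see a GPU.
# If PyTorch is missing, report that instead of raising ImportError.
import importlib.util


def torch_cuda_report() -> dict:
    if importlib.util.find_spec("torch") is None:
        return {"torch": None, "cuda_build": None, "cuda_available": False}
    import torch

    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in torch_cuda_report().items():
        print(f"{key}: {value}")
```

A common mismatch is `cuda_build` being newer than what the installed driver supports (the header of `nvidia-smi` shows the driver's CUDA version), in which case `cuda_available` comes back `False`.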
If this opens it to smaller laptops, wow!
We truly live in crazy times. The rate of improvement in this field is off the charts.
Sure, these will continue to improve; Phi-2 is a good base as well.
If you don't want to be tied to a company and like opensource, feel free to connect a toy motor to an AA battery to drill your holes... Or to use Llama/Stable Code 3B.
So, I guess they're like a Milwaukee drill that will sometimes refuse to work unless you buy more drill credits.
My prompt - "please show me how to write a web scraper in Python"
The response?
<blockquote> I've written my first ever python script about 5 months ago and I really don't remember anything except for the fact that I used Selenium in order to scrape websites (in this case, Google). So you can probably just copy/paste all of these lines from your own Python code which contains logic to determine what value should be returned when called by another piece of software or program. </blockquote>
So you'd need to prompt it through comments or by starting with a function name, basically the same way one would prompt GitHub Copilot.
e.g.
# the following code implements a webscraper in python
class WebScraper:
(I didn't try this, and I'm not good at prompting, but something along the lines of this example should yield better results.)
Edit: the webpage does call it "Stable Code Completion".
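The same idea can be wrapped in a tiny helper that turns a chat-style request into a completion-style prefix: a leading comment describing the task, followed by an unfinished definition for the model to continue. `build_completion_prompt` is a hypothetical helper written for illustration, not part of any library; the returned string is what you would pass as the raw prompt to whatever completion backend you use.

```python
# Hypothetical helper: phrase a request the way a code-completion model
# expects, i.e. a descriptive comment followed by the start of a definition.

def build_completion_prompt(task: str, signature: str, language: str = "python") -> str:
    """Return a raw completion prompt: comment line + unfinished definition."""
    # Pick a line-comment marker for the target language (defaults to '#').
    comment_prefix = {"python": "#", "javascript": "//", "rust": "//"}.get(language, "#")
    lines = [
        f"{comment_prefix} The following code implements {task} in {language}.",
        signature,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    prompt = build_completion_prompt(
        "a web scraper",
        'class WebScraper:\n    """Fetch a page and extract its links."""',
    )
    print(prompt)
```

The docstring in the usage example is just an extra hint; the more of the intended structure the prefix pins down, the less room the model has to drift into prose like the response quoted above.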
https://news.ycombinator.com/item?id=38803836
Was it just that my submission didn't find enough / more balanced commenters?
Anyways, there's also a difference between "are you excited about this new thing becoming available" and "now that you've used it, do you like the experience". The former is more likely to feature rosy expectations and the latter bitter disappointment. (Though it could also be the other way around, with people dismissing it at first and then discovering that it's kind of nice actually.)
Spending huge amounts of resources to be a bit better at autocompleting code doesn't have value to me. I want it to solve significant problems, and it's looking like it can't, and scaling it up until it can is totally impractical.
> In aggregate, training all 9 Code Llama models required 400K GPU hours of computation on hardware of type A100-80GB (TDP of 350-400W).
That is:
* 45⅔ GPU years
* 160 MWh

or...

* 45 average UK homes' annual electricity consumption
* 18 average US homes' annual electricity consumption
* 64 average drivers' annual mileage in an EV

...and that's just the GPUs. Add on all the rest of the system(s).
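The first two figures follow directly from the quoted numbers. This sketch assumes every GPU drew its full TDP for every hour of training, so it's an upper-bound-style estimate for the GPU boards only (as noted, CPUs, networking, and cooling are not included). The household and EV comparisons depend on per-home and per-driver averages that aren't given here, so they're left out.

```python
# Reproduce the GPU-years and energy figures from Meta's quoted numbers:
# 400K A100-80GB GPU hours at a TDP of 350-400 W.
# Assumes full TDP for every GPU hour; GPU board power only.
HOURS_PER_YEAR = 365 * 24  # 8,760; ignores leap years


def gpu_years(gpu_hours: float) -> float:
    """Convert GPU hours into GPU years."""
    return gpu_hours / HOURS_PER_YEAR


def energy_mwh(gpu_hours: float, watts: float) -> float:
    """Energy if each GPU drew `watts` the whole time (Wh -> MWh)."""
    return gpu_hours * watts / 1e6


GPU_HOURS = 400_000

if __name__ == "__main__":
    print(f"GPU years: {gpu_years(GPU_HOURS):.2f}")
    for tdp in (350, 400):
        print(f"at {tdp} W: {energy_mwh(GPU_HOURS, tdp):.0f} MWh")
```

That gives about 45.66 GPU years and 140-160 MWh depending on which end of the TDP range you use; the 160 MWh above corresponds to the 400 W upper bound.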
Asking "How has it changed your work life?" leads people down the rabbit hole of "Will coding jobs be safe?"
This one is a lot more neutral/technical.
I get why Meta releases tons of models, but I still can’t quite understand what Stability is trying to achieve.
> This model is included in our new Stability AI Membership. Visit our Membership page to take advantage of our commercial Core Model offerings, including SDXL Turbo & Stable Video Diffusion.
A hypothetical Stable Code 13B/70B could be hosted-only, with more languages or specialized use cases (Stable Code 3B iOS-Swift-Turbo).
Plus licensed variant models like Stable Audio, and on-chip installation (e.g. Arm) for specialist models such as Japanese law or Indonesian accounting.
> Commercial Applications
> This model is included in our new Stability AI Membership. Visit our Membership page to take advantage of our commercial Core Model offerings, including SDXL Turbo & Stable Video Diffusion.
What exactly is the license, lol? Can people use this, or is it "see, don't touch"?
There is no clear legal definition of "noncommercial," and courts have gone all sorts of different ways on what constitutes commercial use.
This is where CC NC licenses imploded. A lot of places (hello, MIT!) intentionally use CC NC licenses to make things appear more open than they are.
GPT4All has a UI as well that you can use with models running locally on your laptop.
The closest point within your control that interfaces with devices outside of your control.
Seeing the term get used to describe client devices themselves kinda muddies the terminology.
They might also be able to train a model more intelligently by generating training data from said graphs.
You can use/try Code Llama with Cody: https://sourcegraph.com/blog/cody-vscode-1.1.0-release#:~:te...
I also hope Stability comes out with a competitor to the new Midjourney and DALL-E models! That's what put them on the map in the first place.
We have been working on ComfyUI for the next step and new image models
Midjourney and others are pipelines rather than models, so we have a higher bar to clear, but the OG Stable Diffusion team is working hard!
I use the 6bit GGUF quantized version on a laptop RTX 3070
I found one option: https://github.com/xNul/code-llama-for-vscode
But I'm guessing there are others, and they might differ in how they provide context to the model.
(In turn, in the Phi-2 post they compare Phi-2 to Llama-2 instead of CodeLlama, making it even harder)
[1]: https://www.microsoft.com/en-us/research/blog/phi-2-the-surp...
None of the little models, including this one, are comparable to the performance of the larger models for any significant coding problem.
I think what these are useful for is mostly giving people hints inside of a code editor. Occasionally filling in the blank.
Maybe one day when I need to do offline coding on my cellphone, it will be really useful.