It's almost a mathematical certainty that people who invested in OpenAI will need to reincarnate in multiple universes to ever see that money again. But no bother: many are probably NVIDIA stockholders too, which evens out the damage.
OpenAI can release GPT-4.5 or 5 and push out the boundary in the direction of "correctness" and "multimodality".
Either way, we win as customers while the level of competition remains this hot.
I personally want a smart AI much more than a cheap or fast one. Your mileage may vary.
I don’t know whether/when we’ll get there, and whether it will be improvements in models, or underlying model technology, or GPU/TPUs with larger memory at a consumer price point, or something else, that will deliver it.
They are no doubt sitting on ultra polished stuff. When you are the tip of the arrow though and the cutting edge itself it might not be as efficient but does it ever show you things you can’t unsee.
OpenAI can launch a video thing a day after because it's ready to go. I am less and less skeptical every time they ship, because the quality of the first version isn't sliding backwards, even in different areas like video.
Maybe releasing it is strategic, or releasing it also requires supporting it infrastructure-wise and then some. That might be a challenge.
My feeling is the next model, or one in between, may have massive efficiency and performance improvements without having to go quantum with brute forcing it.
Meanwhile others who are following what OpenAI has done seem to be able to optimize it and make it more efficient whether it’s open source or otherwise.
Both are doing important work and I'm not sure I want to see it as a winner-take-all game.
The way AI vendors are responding suddenly to another’s launch feels like they are always ready to launch and continue to add functionality to it that could also ship.
It reminds me of when Microsoft spent a billion dollars advertising that Bing had a billion pages indexed. Google stayed quiet. Then, once Microsoft had spent the money, Google simply added a zero or two to the count on their search page, back when they used to list how many pages they had indexed. They were just sitting on it, already done, and announced it when it was to their benefit.
The newest version of GPT-4 is probably still overall the best model currently, but it is only a few months old, and the picture depends a lot on what benchmarks you are looking at.
E.g. for what we are doing at our company (document processing, etc.) Claude-3 Opus and Gemini-1.5 Pro are currently the better models. The newest GPT-4 even performed worse than a previous version.
So to me it def. seems like the gap is getting smaller. Of course, OpenAI could be coming out with GPT-5 next week and it could be vastly better than all other current models.
Everyone could say anything about open source models, but they're comparing themselves to what OpenAI released a year ago. They haven't shown all of their cards yet and they have a decent moat already in place; some say they have no moat, I disagree, they have one of the best moats possible which is brand awareness.
Sora on its own could bring in billions in revenue; an open-source Sora will take at least another year, if not two, to come out. Then more time until it can run on commodity hardware. An open source model that only runs on a dedicated H100 is actually less useful than a closed model behind an API call. That's not to detract from open source; I think it's the way to go, but I'm just being pragmatic and realistic. There's a reason why MS Office is still the top productivity app in the world, even though dozens of open source alternatives exist.
Do they though?
If you talk to "regular people", everybody knows ChatGPT, but nobody knows or cares about OpenAI. And most of them don't even really know that name. They call it ChatUuuuhm, ChatThingy, Chad Gippity, or similar.
I think they will just switch, when something better comes along.
Azure, while significant, has no similar monopoly to support OpenAI. Do you really see a structural advantage to OpenAI beyond the Microsoft products integrating it?
a) A year after GPT-4 set the bar, it's still the best model, despite everyone else not having to do it first. Just copy, and just software. And that's not for lack of trying by every other viable prime player on the planet with unprecedented acceleration.
Imagine any other piece of software where the incumbent has a mere 2-3 year head start, during which they had to work out the entire product, and where everyone else, despite just having to copy and with the pedal pressed through the floor, is still struggling to catch up.
b) The current models, including GPT-4, are so bad. The few billions can be made just by continuing to play this game of improvements for a few years and getting better each year. I think people are wildly confused about how big this market is going to be when that happens. They are not squeezing hosting or compute. They are squeezing intelligence. Intelligence is the entire economy. The notion that there would ever not be room for multiple things here, maybe through size or specialisation or cost (as with all other intelligence), and that a few billion dollars are a big deal, is so strange to me.
c) The game will, at some point, be mostly about infra and optimization. People come to the conclusion that's a problem for the incumbents, when our entire industry is mostly about infra and optimization. AWS is infra and optimization. I think even the average HN tinkerer understands that therein lies a proposition that's not exactly equivalent to "just rent a few servers and do it yourself".
Debatable. Many people find Claude Opus superior, and I know I've found it consistently better for challenging coding questions. More importantly, the delta between GPT-4 and everything else is getting smaller and smaller. Llama 3 is basically interchangeable with GPT-4 for a huge number of tasks, despite its smaller size.
This reasoning can mostly be applied here. If you want to learn about the LLM and pull it apart, perhaps fine-tune and tinker, then 100% go ahead and run it locally. However, you won't be able to scale this up easily for a consumer base, and the electricity use and heat output start to become a problem.
At some point it's more beneficial to pay the provider for inference. That includes upkeep, latest models, faster generation, stability, hosting etc.
Pros and cons! Choice is important and Meta is doing the right thing by the AI community and tech community in general by being realistic with these programs. The ecosystem is giving back by being able to access these high quality models.
I also hope that this doesn't change if it becomes more palatable not to be open.
For video games, being locked to a cloud service means the feature will disappear when the servers are shut down.
It ends up being somewhat faster than regular speculative decoding in the normal setting (GPU only). If you are doing CPU offloading, it's massively faster.
Edit: typo
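For anyone unfamiliar with the baseline being compared against, here is a toy sketch of plain greedy speculative decoding. It is not Sequoia's tree-based method, and everything in it is made up for illustration: the "models" are hypothetical functions over integer tokens, and the names `speculative_decode`, `greedy_decode`, `target_model` and `draft_model` are assumptions, not anyone's real API. The idea is that a cheap draft model guesses `k` tokens ahead and the expensive target model only has to verify them.

```python
from typing import Callable, List

# A "model" here is just a function from a token prefix to the next token.
Model = Callable[[List[int]], int]


def greedy_decode(target: Model, prompt: List[int], max_new: int) -> List[int]:
    """Reference: one target call per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(target(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       k: int, max_new: int) -> List[int]:
    """Greedy speculative decoding: the draft proposes k tokens, the target verifies."""
    tokens = list(prompt)
    goal = len(prompt) + max_new
    while len(tokens) < goal:
        # 1. The draft model cheaply proposes k tokens.
        proposal: List[int] = []
        for _ in range(k):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed token. A real system does this
        #    in one batched forward pass; here it is a loop for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(tokens + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # keep the target's token, drop the rest
                break
        else:
            # Every draft token matched, so the target's next token comes for free.
            accepted.append(target(tokens + accepted))
        tokens.extend(accepted)
    return tokens[:goal]


# Hypothetical toy models: the target counts upward mod 10,
# the draft agrees except that it guesses wrong right after a 4.
def target_model(prefix: List[int]) -> int:
    return (prefix[-1] + 1) % 10


def draft_model(prefix: List[int]) -> int:
    return 0 if prefix[-1] == 4 else (prefix[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_decode(target_model, draft_model, [0], k=3, max_new=12))
```

The output is identical to plain greedy decoding with the target model; the speedup comes only from needing fewer sequential target passes when the draft guesses well.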
It is in the TODO section of their repo: https://github.com/Infini-AI-Lab/Sequoia/tree/main
Will this work, or do I need a Tesla P40 or two?
Even with an RTX 4090, 2 tokens per second is very slow and likely not ideal for most tasks. It is impressive (much faster than previous solutions), but still very slow for real-time use.
If you want to run Llama 3 70B, it might be better to purchase a Mac Studio with 64GB of RAM (more for longer contexts) and run with 4-bit quantization.
My 2 cents: For most common tasks Llama 3 8B will be more than enough, and you can run that with full precision on a single RTX 3090. At a much lower cost, you can also run Llama 3 8B with 8-bit quantization on a single RTX 3060, if it has 12GB of VRAM.
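The sizing claims above follow from simple arithmetic: weight memory is roughly parameter count times bits per weight, divided by 8. A back-of-the-envelope sketch, with some loud assumptions: parameter counts are the nominal 8 and 70 billion, "full precision" is taken to mean 16-bit weights (at 32-bit the 8B model would not fit in 24GB), and the KV cache, activations and runtime overhead are ignored, so real usage is higher. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    # (label, nominal parameters in billions, bits per weight, device memory in GB)
    configs = [
        ("Llama 3 8B, 16-bit, RTX 3090", 8, 16, 24),
        ("Llama 3 8B, 8-bit, RTX 3060", 8, 8, 12),
        ("Llama 3 70B, 4-bit, Mac Studio", 70, 4, 64),
    ]
    for label, params, bits, device_gb in configs:
        need = weight_memory_gb(params, bits)
        print(f"{label}: ~{need:.0f} GB of weights vs {device_gb} GB available")
```

Whatever is left after the weights is the headroom for context, which is why longer contexts call for more memory.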
You will also be getting about 720GB/s of memory bandwidth with 2x RTX 3060, instead of about 1TB/s with the 4090, so expect lower performance.
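To see why bandwidth matters so much: single-stream token generation is usually memory-bound, since each new token needs roughly one full read of the weights. That gives a crude ceiling of bandwidth divided by weight size. The sketch below is an upper bound under that assumption only; it takes the 720GB/s figure at face value (whether two cards actually deliver their summed bandwidth depends on how the model is split across them) and assumes 8GB of weights for an 8-bit 8B model. The function name `max_tokens_per_second` is hypothetical.

```python
def max_tokens_per_second(bandwidth_gb_s: float, weights_gb: float) -> float:
    """Crude ceiling for memory-bound decoding: one full weight read per token."""
    return bandwidth_gb_s / weights_gb


if __name__ == "__main__":
    weights_gb = 8.0  # assumed: 8B parameters at 8 bits per weight
    for label, bandwidth in [("2x RTX 3060", 720.0), ("RTX 4090", 1000.0)]:
        ceiling = max_tokens_per_second(bandwidth, weights_gb)
        print(f"{label}: at most ~{ceiling:.0f} tokens/s")
```

Real throughput lands below these ceilings, but the ratio between the two setups is a reasonable first guess at the relative slowdown.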
I get that there's proprietary technology, but if so, can we please not put this on arXiv and pretend it's a scientific contribution?
[1] https://github.com/Infini-AI-Lab/Sequoia/tree/main/Engine