I'm running a 70B model now that's okay, but it's still fairly tight. And I've got 16GB more VRAM than the red v2.
I'm also confused why this is 12U. My whole rig is 4U.
The green v2 has better GPUs. But for $65k, I'd expect a much better CPU and 256GB of RAM. It's not like a Threadripper 7000 is going to break the bank.
I'm glad this exists but it's... honestly pretty perplexing
The config that's less useful is the 64GB VRAM / 128GB system RAM one: even the large MoE models only need on the order of 20GB of VRAM for the router and shared layers, so the rest of the VRAM is essentially wasted (splitting experts between VRAM and system RAM has basically no performance benefit).
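For a rough sense of the sizes involved, here's a back-of-envelope sketch. All the numbers are illustrative assumptions, not measurements of any particular model:

```python
# Back-of-envelope: VRAM needed for the "hot" path of a large MoE model.
# All figures below are assumptions for illustration only.
active_params_b = 20   # assumed billions of params active per token (router/shared layers)
bytes_per_param = 1    # 8-bit quantization
kv_overhead_gb = 8     # assumed headroom for KV cache and activations

hot_vram_gb = active_params_b * bytes_per_param + kv_overhead_gb
print(f"~{hot_vram_gb} GB of VRAM for the hot path")  # ~28 GB
```

If numbers like those hold, a 64GB card leaves a lot of headroom that the CPU-resident experts can't actually use.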
I imagine that's because they are buying a single SKU for the shell/case. Their answer to your question would presumably be: in order to keep prices low and quality high, we don't offer any customization to the server dimensions.
But yeah, 4x Blackwell 6000s are ~32-36k, not sure where the other $30k is going.
I don't see the 120B claim on the page itself. Unless the page has been edited, I think it's something the submitter added.
I agree, though. The only way you're running 120B models on that device is either extreme quantization or by offloading layers to the CPU. Neither will be a good experience.
These aren't a good value buy unless you compare them to fully supported offerings from the big players.
It's going to be hard to target a market where most people know they can put together the exact same system for thousands of dollars less and have it assembled in an afternoon. RTX 6000 96GB cards are in stock at Newegg for $9000 right now which leaves almost $30,000 for the rest of the system. Even with today's RAM prices it's not hard to do better than that CPU and 256GB of RAM when you have a $30,000 budget.
Can't you offload KV to system RAM, or even storage? It would make it possible to run with longer contexts, even with some overhead. AIUI, local AI frameworks include support for caching some of the KV in VRAM, using an LRU policy, so the overhead would be tolerable.
With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCIe bus with high-bandwidth data like the KV cache, you lose a lot of the performance benefit of having fast memory near the GPU die.
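To put rough numbers on that bandwidth gap (both figures are approximate public spec values, used here as assumptions):

```python
# Why spilling the KV cache over PCIe hurts: the bandwidth gap.
# Both bandwidth figures are rough spec numbers used as assumptions.
vram_bw_gbs = 1790     # assumed on-card memory bandwidth, ~1.8 TB/s
pcie5_x16_gbs = 64     # PCIe 5.0 x16, ~64 GB/s per direction

kv_gb = 20             # assumed KV cache size for a long context
print(f"read from VRAM: {kv_gb / vram_bw_gbs * 1000:.1f} ms")   # ~11.2 ms
print(f"read over PCIe: {kv_gb / pcie5_x16_gbs * 1000:.1f} ms") # 312.5 ms
```

Roughly a 30x penalty every time the hot data has to cross the bus, which is why mixed placement tends not to help.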
For me, Qwen didn't cut it. You're not fine-tuning a 120B-parameter model with 80GB. You're probably not going to be able to abliterate it either, because it's MoE. Other options use more VRAM, and where you'd have a fair amount of buffer with Qwen, you're pressed with other big models.
Still, this is a great idea, and one I hope takes off. I think there's a good argument that the future of AI is in locally-trained models for everyone, rather than relying on a big company's own model.
One thought: the ability to conveniently get this onto a 240V circuit would be nice. Having to find two different 120V circuits to plug this into will be a pain for many folks.
* this section written by me typing on keyboard *
* this section produced by AI *
And usually both exist in documents and lengthy communications. The human-written part gets across exactly what I intended, and then I can attach an AI-generated appendix 10x the length with helpful indexing and references.
Fundamentally, it looks like they are shipping consumer off-the-shelf hardware in a custom box.
Maybe their volume is low enough that well-intentioned but poor-quality PRs can be politely (or otherwise, depending on the culture) disregarded, and the method of generation isn't important.
Then you could focus fire, like the script kiddies did with DDoS in the old days, on fixing whatever issues you prefer.
Wouldn't there be a massive duplication of effort in that case? It'll be interesting to see how the costs play out. There are security benefits to think about as well in keeping things local-first.
No need for separate circuits, just use a double adapter.
In Europe, you could plug the two power supplies into an appropriately sized 240V circuit.
In an apartment you can't rewire, you could set it up in your kitchen, which under modern US code should have two separate 20A circuits. You will need to put it to sleep while you use appliances.
But this is re: European 240/250V, which is 240V between line and neutral.
I'd say don't energize either system's ground plane, but, really, don't do this in the EU.
So basically you need a brand-new circuit run if you don't have two 120V circuits next to each other. But if you're spending $65k on a single machine, an extra grand for an electrician to run conduit should be peanuts. While you're at it, I would definitely add a whole-home GFCI, a lightning/EMI arrestor, and a UPS at the outlet, so one big shock doesn't send $65k down the toilet.
Can confirm.
Oh wait, I get it, it's bike shedding.
I have no idea who would buy this. Maybe if you think Vera Rubin is three years out? But NV ships, man, they are shipping.
Maybe the target profile for exabox looks like a smaller/younger Cursor? If you're a small team with some seed funding and expertise, this kind of compute in a single box you can set up in your office feels like it could be a great fit.
Can it run Crysis?
Higher management figures out what to do at the strategic level, in brief, and pushes it onto the "soldiers", who push it through in the least time possible (cheapest of the cheap, for the sake of the quarterlies) EXACTLY the way management dictated. Because they have to: their job is to make the given company objectives happen, the way they were given. Pushing out crap in the shape of the thing expected.
Larger organizations could use these kinds of things the most, even if they don't.
If you compare tokens/kWh efficiency, then by my math the Mac Studio is about 1.5x more efficient.
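For what it's worth, here's the shape of that math with made-up numbers. The tok/s and wattage values are assumptions for illustration, not benchmarks of either machine:

```python
# Sketch of a tokens-per-kWh comparison. The tok/s and wattage values
# are assumptions for illustration, not measurements.
def tokens_per_kwh(tok_per_s, watts):
    # tokens per hour divided by kWh drawn in that hour
    return tok_per_s * 3600 / (watts / 1000)

mac_studio = tokens_per_kwh(tok_per_s=25, watts=120)
gpu_box = tokens_per_kwh(tok_per_s=60, watts=450)
print(f"ratio: {mac_studio / gpu_box:.2f}x")  # ratio: 1.56x
```

With those assumed figures the Mac comes out about 1.5x ahead, matching the ballpark above; real numbers obviously depend on the model and batch size.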
Has this guy never worked on a B2B product before? Nobody is going to order a $10 million piece of infrastructure through your website's order form. And they are definitely going to want to negotiate something, even if it's just a warranty. And you'll do it because they're waving a $10 million check in your face.
The tone of this website is arrogant to the point of being almost hostile. The guy behind this seems to think that his name carries enough weight to dictate terms like this, among other things like requiring candidates to have already contributed to his product to even be considered for a job. I would be extremely surprised if anyone except him thinks he's that important.
Besides a lot of self-congratulatory pats on the back for how elegant it is. Honestly, when I read it, it looked as confusing as all the other ML libraries. Not actually simple like Karpathy's stuff.
All that to say, I do really want it to succeed. They should probably hire some practical engineers, not just guys and gals congratulating themselves on how elegant and awesome they are.
> Can you fill out this supplier onboarding form?
That's very important context, as anyone who has been asked to fill out a supplier onboarding form (hi) will attest.
> we don't offer any customization to the box or ordering process
Every B2B deal of that size that I've ever seen requires at least weeks of meetings between the customer and vendor, in which every detail is at least discussed if not negotiated. That would certainly constitute a "customization" to this guy's prescribed ordering process, which is to "Buy it now" [1] through the website at the stated price like you're ordering a jar of peanuts on Amazon. This is not "framing", it's what the guy said. If it isn't what he meant then he needs to fix his copy.
[1] Yes, there is an actual "Buy it now" button for a $65,000 business purchase that takes you to a page that looks just like a Stripe form. There isn't even a textbox for delivery instructions. Wild.
First encounter with geohot eh?
> 20,000 lbs
> concrete slab
Huge-scale IT systems are typically delivered in one or more 42/44u cabinets, and are designed to be installed on raised floors.
I mean I'm sure lots of companies do this in practice because tickets for higher-paying customers naturally get prioritized, but directly stating your intention to do it on your home page is hilarious.
The YouTube rap video of geohot telling the Sony lawyers suing him to blow him is still up.
His style of dealing with corporate matters is certainly unconventional
Edit: found a third party referencing the claim but it doesn't belong in the title here I think:
Meet the World’s Smallest ‘Supercomputer’ from Tiiny AI; A Machine Bold Enough to Run 120B AI Models Right in the Palm of Your Hand
https://wccftech.com/meet-the-worlds-smallest-supercomputer-...
Now I'm wondering if the HN title was submitted by some AI bot that couldn't tell the difference.
I'm almost sure it's possible to custom-build a machine as powerful as their red v2 within a $9k budget. And have a lot of fun along the way.
I think Tinygrad should think about recycling. Are they planning ahead in this regard? Is anyone? My thought is that if there were a central database of who owns what and where, then at least when the recycling tech becomes available, people will know where to source their specific trash (and even pay for it). Having a database like that in the first place could even fuel the industry.
$12,000, $65,000, $10,000,000.
the town near my hometown has 650 – 800 houses (according to chatgpt).
crazy.
A typical home just consumes rather little energy, now that LED lighting and heat-pump cooling/heating have become the norm.
Do you live in a deprived rural village in a very poor country? Because you can't even run a heater and the oven with 3kW.
I'm currently shopping for offline hardware and it is very hard to estimate the performance I will get before dropping $12K, and would love to have a baseline that I can at least always get e.g. 40 tok/s running GPT-OSS-120B using Ollama on Ubuntu out of the box.
With 6 GPUs you have to deal with risers, PCIe retimers, dual PSUs, and a custom case, so the value proposition there was much better IMO.
Not revolutionary in any way, but nice. Unless I'm missing something here?
It's funny though... we're using deepseek now for features in our service and based on our customer-type we thought that they would be completely against sending their data to a third-party. We thought we'd have to do everything locally. But they seem ok with deepseek which is practically free. And the few customers that still worry about privacy may not justify such a high price point.
If private inference is actually non-negotiable, then sure, put GPUs in your colo and enjoy the infra pain, vendor weirdness, and the meeting where finance learns what those power numbers meant.
"likely" doesn't inspire much confidence. Surely they have those numbers, and if the comparison were favorable, they'd publicize it.
Can they/someone else give more details on which workloads PyTorch is more than 2x slower on than what the hardware provides? Most papers use standard components, and I assume PyTorch already implements them at 50+% of the extractable performance on typical GPUs.
If they mean more esoteric stuff that requires writing custom kernels to get good performance out of the chips, then that's a different issue.
I'm not sure what tinygrad is but I assume the markup is because the customer is making a conscious choice to support the tinygrad project. But what's unusual is there is apparently no reason whatsoever to buy this hardware, even if you plan on using tinygrad exclusively for your project. At least with System76 hardware I get (in theory) first class support for Pop!_OS.
* RAM - $1500 - Crucial Pro 128GB Kit (2x64GB) DDR5 RAM, 5600MHz CP2K64G56C46U5, up to 4 sticks for 128GB or 256GB, Amazon
* GPU - $4700 - RTX Pro 5000 48GB, Microcenter
* CPU/Mobo bundle - $1100 - AMD Ryzen 7 9800X3D, MSI X870E-P Pro, ditch the 32GB RAM, Microcenter
* Case - $220, Hyte Y70, Microcenter
* Cooler - $155, Arctic Cooling Liquid Freezer III Pro, top-mount it, Microcenter
* PSU - $180, RM1000x, Microcenter
* SSD - $400 - Samsung 990 Pro 2TB Gen 4 NVMe M.2
* Fans - $100 - 6x 120mm fans, 1x 140mm fan, of your choice
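Summing those quoted prices as a quick sanity check (using the figures above as-is):

```python
# Quick total of the parts list, prices exactly as quoted above.
parts = {
    "RAM": 1500, "GPU": 4700, "CPU/Mobo": 1100, "Case": 220,
    "Cooler": 155, "PSU": 180, "SSD": 400, "Fans": 100,
}
total = sum(parts.values())
print(f"total: ${total:,}")  # total: $8,355
```

So the whole build lands well under the $12k box even before shopping around.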
Look into models like Qwen 3.5
I knew prices went up, but that's wild. I bought 64GB (2x32) of RAM a year ago for $90.
This is certainly not the most effective use of $7k for running local LLMs.
The answer is a 16" M5 Max 128GB for $5k. You can run much bigger models than your setup while being an awesome portable machine for everything else.
https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...
A small joke at this week's GTC was that the "BOGOD" discount was to sell them at $4K each...
I’m pretty curious to see any benchmarks on inference on VRAM vs UM.
For $5K one can get a desktop PC with an RTX 5090, which has 3x more compute but 4x less VRAM, so depending on the workload it may be a better option.
Machines with the 4xx chips are coming next month so maybe wait a week or two.
It's soldered LPDDR5X with AMD Strix Halo ... sglang and llama.cpp can handle that pretty well these days. And it's, you know, half the price, and you're not locked into the Nvidia ecosystem.
Mac Studio or Mac Mini, depending on which gives you the highest amount of unified memory for ~$5k.
Doubt local inference is the target use case nearly as much as post-training. I could totally see something like this being super appealing for a startup looking to do some fine-tuning/distillation to tune a small open-weight model for a narrow use case.
I could swear I filed a GitHub issue asking about the plans for that but I don't see it. Anyway I think he mentioned it when explaining tinygrad at one point and I have wondered why that hasn't got more attention.
As far as boxes, I wish that there were more MI355X available for normal hourly rental. Or any.
Obviously any Turing machine can run any size of model, so the "120B" claim doesn't mean much - what actually matters is speed, and I just don't believe this can be speedy enough on models that my $5000 5090-based PC is too slow for and lacks enough VRAM for.
120B could run, but I wouldn't want to be the person who had to use it for anything.
To be fair, the 120B claim doesn't appear on the webpage. I don't know where it came from, other than the person who submitted this to HN
A single box with those specs without having to build/configure (the red and green) - I could see being useful if you had $ and not time to build/configure/etc yourself.
How do you test/generate these numbers?
720x RDNA5 AT0 XL 25,920 GB VRAM 23,040 GB System RAM
~ $10 Million
Who is the target market here?
the boxes look cool but how good are they really? the cheapest box seems pricey at $12k for what is essentially a few gaming GPUs. I don't see why you couldn't make that half the price. You could do a PC/server build that's much, much faster for way less. Size doesn't matter if it's more than twice the price, I think...
the more expensive box at least has real processing GPUs, but AFAIK also not very popular ones; this one seems maybe more fairly priced (there seems to be a big difference in bang for buck between these???).
the third one suggested looks like a joke.
don't get me wrong, this seems like a really cool idea. But I don't see it taking off, as the prices are corporate but the product seems more home-use.
maybe in time they will find a better balance. I do respect the fact that the component market now is sour as hell and making good products with stable prices is pretty much impossible.
I'd love one of these machines someday, maybe when I am less poor, or when they are xD.
(love the styling of everything; this is the most critical I could be from a dumb consumer perspective, which I totally am btw.)
He's an interesting guy. Seems to be one who does things the way he thinks is right, regardless of corporate profits.
the latest AMD GPUs are RX 9070 XT w/32GB each
“Not your weights, not your brain.”
There's a lot there that makes sense and I think needs to be considered. But a lot just seems to come out of the blue, included without connection, in my view. It feels like maybe these are in-group messages that I don't understand. How this is framed as against democracy is unclear to me, and revolting. I think we must grapple with the world as it is, and this post is strongly in that territory, but letting fear be the dominant ruling emotion is one of the main definitions of conservatism, and its use here to scare us sounds bad.
And his politics are a derivative of Great Man Theory, and his positions on things like democracy follow from that. This idea, espoused by some of the VC/tech elite like Peter Thiel, is that singular hardworking genius individuals can change the world on their own, and everyone not in this top 0.1% is a borderline NPC.
They do this both because of their genius/hard work, and also because they are willing to break the rules set forth by the bottom 99.9%.
I'm starting to call this ideology Authoritarian techno-Libertarianism. It's a deliberately oxymoronic name, because these "Great Men" are definitely trying to change the world, i.e., they are trying to impose their goals and values on the world without getting the buy-in of other people.
That's the "authoritarian" part. And the "libertarian" part is that they are going about this imposition of their will on the world by doing it all themselves, through their own hard work.
Think "person invents a world-changing technology that some people think is bad, and just releases it open source for anyone to use". AI models are a great example, in fact. Once that technology is out there, the genie cannot be put back into the bottle, and a ton of people are going to lose their jobs, etc.
A disdain for democracy follows directly from things like this. You don't wait for people to vote to allow you to change the world by inventing something new. You just do it and watch the results.
Did he take down the video because of embarrassment or did he fear negative impact on his sales?
But let's be real, $12k is kinda pushing it - what kind of people are gonna spend $65k or even $10M (lmao WTAF) on a boutique thing like this? I don't think these kinds of things go in datacenters (happy to be corrected), and they are way too expensive (and probably way too HOT) to just go in a home or even an office "closet".
Since when did our perception of "tiny" blow out of proportion in tech? Is it the influence of "hello world" Electron apps consuming 100MB of memory while idle setting a new standard? Anyway, being an AI bro seems like an expensive hobby...
Sorry, what? Is this just a scam?
Literally the line above that