I'm running a 70B model now that's okay, but it's still fairly tight. And I've got 16GB more VRAM than the red v2.
I'm also confused why this is 12U. My whole rig is 4U.
The green v2 has better GPUs. But for $65k, I'd expect a much better CPU and 256GB of RAM. It's not like a Threadripper 7000 is going to break the bank.
I'm glad this exists but it's... honestly pretty perplexing
The config that's less useful is the 64GB VRAM / 128GB system RAM one: even the large MoE models only need on the order of 20GB of VRAM for the router and shared layers, so the rest of the VRAM is essentially wasted (splitting experts between VRAM and system RAM has basically no performance benefit).
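For a rough sense of the sizes involved, here's a back-of-envelope sketch. All the numbers are illustrative assumptions, not measurements of any particular model:

```python
# Back-of-envelope: VRAM needed for the "hot" path of a large MoE model.
# All figures below are assumptions for illustration only.
active_params_b = 20   # assumed billions of params active per token (router/shared layers)
bytes_per_param = 1    # 8-bit quantization
kv_overhead_gb = 8     # assumed headroom for KV cache and activations

hot_vram_gb = active_params_b * bytes_per_param + kv_overhead_gb
print(f"~{hot_vram_gb} GB of VRAM for the hot path")  # ~28 GB
```

If numbers like those hold, a 64GB card leaves a lot of headroom that the CPU-resident experts can't actually use.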
I imagine that's because they are buying a single SKU for the shell/case. Their answer to your question would presumably be: in order to keep prices low and quality high, we don't offer any customization to the server dimensions.
But yeah, 4x Blackwell 6000s are ~32-36k, not sure where the other $30k is going.
I don't see the 120B claim on the page itself. Unless the page has been edited, I think it's something the submitter added.
I agree, though. The only way you're running 120B models on that device is either extreme quantization or by offloading layers to the CPU. Neither will be a good experience.
These aren't a good value buy unless you compare them to fully supported offerings from the big players.
It's going to be hard to target a market where most people know they can put together the exact same system for thousands of dollars less and have it assembled in an afternoon. RTX 6000 96GB cards are in stock at Newegg for $9000 right now which leaves almost $30,000 for the rest of the system. Even with today's RAM prices it's not hard to do better than that CPU and 256GB of RAM when you have a $30,000 budget.
Can't you offload KV to system RAM, or even storage? It would make it possible to run with longer contexts, even with some overhead. AIUI, local AI frameworks include support for caching some of the KV in VRAM, using an LRU policy, so the overhead would be tolerable.
With that said, people are trying to extend VRAM into system RAM or even NVMe storage, but as soon as you hit the PCIe bus with high-bandwidth data like the KV cache, you lose a lot of the performance benefit of having fast memory near the GPU die.
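To put rough numbers on that bandwidth gap (both figures are approximate public spec values, used here as assumptions):

```python
# Why spilling the KV cache over PCIe hurts: the bandwidth gap.
# Both bandwidth figures are rough spec numbers used as assumptions.
vram_bw_gbs = 1790     # assumed on-card memory bandwidth, ~1.8 TB/s
pcie5_x16_gbs = 64     # PCIe 5.0 x16, ~64 GB/s per direction

kv_gb = 20             # assumed KV cache size for a long context
print(f"read from VRAM: {kv_gb / vram_bw_gbs * 1000:.1f} ms")   # ~11.2 ms
print(f"read over PCIe: {kv_gb / pcie5_x16_gbs * 1000:.1f} ms") # 312.5 ms
```

Roughly a 30x penalty every time the hot data has to cross the bus, which is why mixed placement tends not to help.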
For me, Qwen didn't cut it. You're not fine-tuning a 120B-parameter model with 80GB. You're probably not going to be able to abliterate it either, because it's MoE. Other options use more VRAM, and where you'd have a fair amount of buffer with Qwen, you're pressed with other big models.
Still, this is a great idea, and one I hope takes off. I think there's a good argument that the future of AI is in locally-trained models for everyone, rather than relying on a big company's own model.
One thought: the ability to conveniently get this onto a 240V circuit would be nice. Having to find two different 120V circuits to plug this into will be a pain for many folks.
* this section written by me typing on keyboard *
* this section produced by AI *
And usually both exist in documents and lengthy communications. The human-written part gets across exactly what I intended, and then I can attach an AI-generated appendix 10x the length with helpful indexing and references.
Fundamentally, it looks like they are shipping consumer off-the-shelf hardware in a custom box.
Maybe their volume is low enough that well-intentioned but poor-quality PRs can be politely (or otherwise, depending on the culture) disregarded, and the method of generation isn't important.
Then you could focus fire, like the script kiddies did with DDoS in the old days, on fixing whatever issues you prefer.
Wouldn't there be a massive duplication of effort in that case? It'll be interesting to see how the costs play out. There are security benefits to think about as well in keeping things local-first.
No need for separate circuits, just use a double adapter.
In Europe, you could plug the two power supplies into an appropriately sized 240V circuit.
In an apartment you can't rewire, you could set it up in your kitchen, which under modern US code should have two separate 20A circuits. You will need to put it to sleep while you use appliances.
But this is re: European 240/250V, which is 240V between line and neutral.
I'd say don't energize either system's ground plane, but, really, don't do this in the EU.
So basically you need a brand-new circuit run if you don't have two 120V circuits next to each other. But if you're spending $65k on a single machine, an extra grand for an electrician to run conduit should be peanuts. While you're at it, I would definitely add a whole-home GFCI, a lightning/EMI arrestor, and a UPS at the outlet, so one big shock doesn't send $65k down the toilet.
Can confirm.
Oh wait, I get it, it's bike shedding.
I have no idea who would buy this. Maybe if you think Vera Rubin is three years out? But NV ships, man, they are shipping.
Maybe the target profile for exabox looks like a smaller/younger Cursor? If you're a small team with some seed funding and expertise, this kind of compute in a single box you can set up in your office feels like it could be a great fit.
Can it run Crysis?
Higher management figures out what to do at the strategic level, in brief, and pushes it onto the "soldiers", who push it through in the least time possible (cheapest of the cheap, for the sake of the quarterlies) EXACTLY the way management dictated. Because they have to: their job is to make the given company objectives happen, the way they were given. Pushing out crap in the shape of the thing expected.
Larger organizations could use these kinds of things the most, even if they don't.
If you compare tokens/kWh efficiency, then by my math the Mac Studio is about 1.5x more efficient.
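For what it's worth, here's the shape of that math with made-up numbers. The tok/s and wattage values are assumptions for illustration, not benchmarks of either machine:

```python
# Sketch of a tokens-per-kWh comparison. The tok/s and wattage values
# are assumptions for illustration, not measurements.
def tokens_per_kwh(tok_per_s, watts):
    # tokens per hour divided by kWh drawn in that hour
    return tok_per_s * 3600 / (watts / 1000)

mac_studio = tokens_per_kwh(tok_per_s=25, watts=120)
gpu_box = tokens_per_kwh(tok_per_s=60, watts=450)
print(f"ratio: {mac_studio / gpu_box:.2f}x")  # ratio: 1.56x
```

With those assumed figures the Mac comes out about 1.5x ahead, matching the ballpark above; real numbers obviously depend on the model and batch size.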
Has this guy never worked on a B2B product before? Nobody is going to order a $10 million piece of infrastructure through your website's order form. And they are definitely going to want to negotiate something, even if it's just a warranty. And you'll do it because they're waving a $10 million check in your face.
The tone of this website is arrogant to the point of being almost hostile. The guy behind this seems to think that his name carries enough weight to dictate terms like this, among other things like requiring candidates to have already contributed to his product to even be considered for a job. I would be extremely surprised if anyone except him thinks he's that important.
Besides a lot of self-congratulatory pats on the back for how elegant it is. Honestly, when I read it, it looked as confusing as all the other ML libraries. Not actually simple like Karpathy's stuff.
All that to say, I do really want it to succeed. They should probably hire some practical engineers, not just guys and gals congratulating themselves on how elegant and awesome they are.
> Can you fill out this supplier onboarding form?
That's very important context, as anyone who has been asked to fill out a supplier onboarding form (hi) will attest.
> we don't offer any customization to the box or ordering process
Every B2B deal of that size that I've ever seen requires at least weeks of meetings between the customer and vendor, in which every detail is at least discussed if not negotiated. That would certainly constitute a "customization" to this guy's prescribed ordering process, which is to "Buy it now" [1] through the website at the stated price like you're ordering a jar of peanuts on Amazon. This is not "framing", it's what the guy said. If it isn't what he meant then he needs to fix his copy.
[1] Yes, there is an actual "Buy it now" button for a $65,000 business purchase that takes you to a page that looks just like a Stripe form. There isn't even a textbox for delivery instructions. Wild.
First encounter with geohot eh?
> 20,000 lbs
> concrete slab
Huge-scale IT systems are typically delivered in one or more 42/44u cabinets, and are designed to be installed on raised floors.
I mean I'm sure lots of companies do this in practice because tickets for higher-paying customers naturally get prioritized, but directly stating your intention to do it on your home page is hilarious.
The YouTube rap video of geohot telling the Sony lawyers suing him to blow him is still up.
His style of dealing with corporate matters is certainly unconventional
Edit: found a third party referencing the claim but it doesn't belong in the title here I think:
Meet the World’s Smallest ‘Supercomputer’ from Tiiny AI; A Machine Bold Enough to Run 120B AI Models Right in the Palm of Your Hand
https://wccftech.com/meet-the-worlds-smallest-supercomputer-...
Now I'm wondering if the HN title was submitted by some AI bot that couldn't tell the difference.
I'm almost sure it's possible to custom-build a machine as powerful as their red v2 within a $9k budget. And have a lot of fun along the way.
I think Tinygrad should think about recycling. Are they planning ahead in this regard? Is anyone? My thought is that if there were a central database of who owns what and where, then at least when the recycling tech becomes available, people will know where to source their specific trash (and even pay for it). Having a database like that in the first place could even fuel the industry.
$12,000, $65,000, $10,000,000.
the town near my hometown has 650 – 800 houses (according to chatgpt).
crazy.
A typical home just consumes rather little energy, now that LED lighting and heat-pump cooling/heating have become the norm.
Do you live in a deprived rural village in a very poor country? Because you can't even run a heater and the oven with 3kW.
I'm currently shopping for offline hardware and it is very hard to estimate the performance I will get before dropping $12K, and would love to have a baseline that I can at least always get e.g. 40 tok/s running GPT-OSS-120B using Ollama on Ubuntu out of the box.
With 6 GPUs you have to deal with risers, PCIe retimers, dual PSUs, and a custom case, so the value proposition there was much better IMO.
Not revolutionary in any way, but nice. Unless I'm missing something here?
It's funny though... we're using deepseek now for features in our service and based on our customer-type we thought that they would be completely against sending their data to a third-party. We thought we'd have to do everything locally. But they seem ok with deepseek which is practically free. And the few customers that still worry about privacy may not justify such a high price point.
If private inference is actually non-negotiable, then sure, put GPUs in your colo and enjoy the infra pain, vendor weirdness, and the meeting where finance learns what those power numbers meant.
"likely" doesn't inspire much confidence. Surely they have those numbers, and if the comparison were favorable, they'd publicize it.
Can they/someone else give more details on which workloads PyTorch is more than 2x slower on than what the hardware provides? Most papers use standard components, and I assume PyTorch already implements them at 50+% of the extractable performance on typical GPUs.
If they mean more esoteric stuff that requires writing custom kernels to get good performance out of the chips, then that's a different issue.
I'm not sure what tinygrad is but I assume the markup is because the customer is making a conscious choice to support the tinygrad project. But what's unusual is there is apparently no reason whatsoever to buy this hardware, even if you plan on using tinygrad exclusively for your project. At least with System76 hardware I get (in theory) first class support for Pop!_OS.
* RAM - $1500 - Crucial Pro 128GB Kit (2x64GB) DDR5 RAM, 5600MHz CP2K64G56C46U5, up to 4 sticks for 128GB or 256GB, Amazon
* GPU - $4700 - RTX Pro 5000 48GB, Microcenter
* CPU/Mobo bundle - $1100 - AMD Ryzen 7 9800X3D, MSI X870E-P Pro, ditch the 32GB RAM, Microcenter
* Case - $220, Hyte Y70, Microcenter
* Cooler - $155, Arctic Cooling Liquid Freezer III Pro, top-mount it, Microcenter
* PSU - $180, RM1000x, Microcenter
* SSD - $400 - Samsung 990 Pro 2TB Gen 4 NVMe M.2
* Fans - $100 - 6x 120mm fans, 1x 140mm fan, of your choice
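Summing those quoted prices as a quick sanity check (using the figures above as-is):

```python
# Quick total of the parts list, prices exactly as quoted above.
parts = {
    "RAM": 1500, "GPU": 4700, "CPU/Mobo": 1100, "Case": 220,
    "Cooler": 155, "PSU": 180, "SSD": 400, "Fans": 100,
}
total = sum(parts.values())
print(f"total: ${total:,}")  # total: $8,355
```

So the whole build lands well under the $12k box even before shopping around.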
Look into models like Qwen 3.5
I knew prices went up, but that's wild. I bought 64GB (2x32) of RAM a year ago for $90.
This is certainly not the most effective use of $7k for running local LLMs.
The answer is a 16" M5 Max 128GB for $5k. You can run much bigger models than your setup while being an awesome portable machine for everything else.
https://marketplace.nvidia.com/en-us/enterprise/personal-ai-...
A small joke at this week's GTC was that the "BOGOD" discount was to sell them at $4K each...
I’m pretty curious to see any benchmarks on inference on VRAM vs UM.
For $5K one can get a desktop PC with an RTX 5090, which has 3x more compute but 4x less VRAM, so depending on the workload it may be a better option.
Machines with the 4xx chips are coming next month so maybe wait a week or two.
It's soldered LPDDR5X with AMD Strix Halo ... sglang and llama.cpp can handle that pretty well these days. And it's, you know, half the price, and you're not locked into the Nvidia ecosystem.
Mac Studio or Mac Mini, depending on which gives you the highest amount of unified memory for ~$5k.
Doubt local inference is the target use case nearly as much as post-training. I could totally see something like this being super appealing for a startup looking to do some fine-tuning/distillation to tune a small open-weight model for a narrow use case.
I could swear I filed a GitHub issue asking about the plans for that but I don't see it. Anyway I think he mentioned it when explaining tinygrad at one point and I have wondered why that hasn't got more attention.
As far as boxes, I wish that there were more MI355X available for normal hourly rental. Or any.
Obviously any Turing machine can run any size of model, so the "120B" claim doesn't mean much - what actually matters is speed, and I just don't believe this can be speedy enough on models that my $5000 5090-based PC is too slow for and lacks enough VRAM for.
120B could run, but I wouldn't want to be the person who had to use it for anything.
To be fair, the 120B claim doesn't appear on the webpage. I don't know where it came from, other than the person who submitted this to HN
A single box with those specs without having to build/configure (the red and green) - I could see being useful if you had $ and not time to build/configure/etc yourself.
How do you test/generate these numbers?
720x RDNA5 AT0 XL 25,920 GB VRAM 23,040 GB System RAM
~ $10 Million
Who is the target market here?
the boxes look cool but how good are they really? the cheapest box seems pricey at $12k for what is essentially a few gaming GPUs. I don't see why you couldn't make that half the price. You could do a PC/server build that's much, much faster for way less. Size doesn't matter if it's more than twice the price, I think...
the more expensive box at least has real processing GPUs, but AFAIK also not very popular ones; this one seems maybe more fairly priced (there seems to be a big difference in bang for buck between these???).
the third one suggested looks like a joke.
don't get me wrong, this seems like a really cool idea. But I don't see it taking off, as the prices are corporate but the product seems more home-use.
maybe in time they will find a better balance. I do respect the fact that the component market now is sour as hell and making good products with stable prices is pretty much impossible.
I'd love one of these machines someday, maybe when I am less poor, or when they are xD.
(love the styling of everything; this is the most critical I could be from a dumb consumer perspective, which I totally am btw.)
He's an interesting guy. Seems to be one who does things the way he thinks is right, regardless of corporate profits.
the latest AMD GPUs are RX 9070 XT w/32GB each
“Not your weights, not your brain.”
There's a lot there that makes sense and I think needs to be considered. But a lot just seems to come out of the blue, included without connection, in my view. It feels like maybe these are in-group messages that I don't understand. How this is framed as against democracy is unclear to me, and revolting. I think we must grapple with the world as it is, and this post is strongly in that territory, but letting fear be the dominant ruling emotion is one of the main definitions of conservatism, and its use here to scare us sounds bad.
And his politics are a derivative of Great Man Theory, and his positions on things like democracy follow from that. This idea, espoused by some of the VC/tech elite like Peter Thiel, is that singular hardworking genius individuals can change the world on their own, and everyone not in this top 0.1% is a borderline NPC.
They do this both because of their genius/hard work, and also because they are willing to break the rules set forth by the bottom 99.9%.
I'm starting to call this ideology Authoritarian techno-Libertarianism. It's a deliberately oxymoronic name, because these "Great Men" are definitely trying to change the world, i.e., they are trying to impose their goals and values on the world without getting the buy-in of other people.
That's the "authoritarian" part. And the "libertarian" part is that they are going about this imposition of their will on the world by doing it all themselves, through their own hard work.
Think "person invents a world-changing technology that some people think is bad, and just releases it open source for anyone to use". AI models are a great example, in fact. Once that technology is out there, the genie cannot be put back into the bottle, and a ton of people are going to lose their jobs, etc.
A disdain for democracy follows directly from things like this. You don't wait for people to vote to allow you to change the world by inventing something new. You just do it and watch the results.
Did he take down the video because of embarrassment or did he fear negative impact on his sales?
But let's be real, $12k is kinda pushing it - what kind of people are gonna spend $65k or even $10M (lmao WTAF) on a boutique thing like this? I don't think these kinds of things go in datacenters (happy to be corrected), and they are way too expensive (and probably way too HOT) to just go in a home or even an office "closet".
Since when did our perception of "tiny" blow out of proportion in tech? Is it the influence of "hello world" Electron apps consuming 100MB of memory while idle setting a new standard? Anyway, being an AI bro seems like an expensive hobby...
Sorry, what? Is this just a scam?
Literally the line above that