edit: For what it's worth, if you can't see that this language is rude or think it is somehow acceptable for people of a certain caliber to talk this way - you're also probably toxic.
As someone in my early 30s who grew up on message boards/gaming, the language he is using is fairly mild. I think we just have very different social norms.
That’s an extremely low bar. Nobody would bat an eye about a random person speaking like this on a gaming Discord or a message board.
Posting public messages as a public figure with an audience to a company is not the same as Call of Duty voice chat.
Personally, I prefer direct language if it gets to the root of the problem quicker. It's more pragmatic. You just have to pick your audience, because some people get offended by it. But the most productive discussions I've had have been arguments where you can both quickly find the holes in each other's positions, and then move forwards from there. As long as no one is taking it personally, this is very effective.
OTOH, I've been in many meetings where people talk around a problem for an hour, never reaching the conflict about what their disagreement actually is. To me, that is much more frustrating than someone risking offending someone by being direct. But it really depends upon the people you work with and the team you have.
> If you want a dataflow graph compiler, build a dataflow graph compiler.
> This is not 6 layers of abstraction, it's 3 (and only 2 you have to build).
This is direct and pragmatic. It states the writer's justified true beliefs and opinions as plainly as possible.
> Plz bro one more stack this stack will be good i promise
> bro bro bro plz one more make it all back one trade type beat
This is just toxic. The writer is making assumptions about other people's position that he does not (and probably could not) substantiate.
Ironically, sprinkling in toxic comments and back-handed insults in any piece has the effect of making said piece less direct and pragmatic.
Who are the people of a certain caliber? Are there people who can talk like this, and others who are better people and shouldn't? What a weird thing to say.
Yucky!
Georgie hotsth u r yucky and tocthic!
I wish there were a thousand more geohots than all the mediocre middle-managers at AMD or Tenstorrent; or people who have never done anything beyond posting snarky comments in online forums.
Sadly, I think geohot is an example of someone who earned some cred for impressive accomplishments in the past and then tried to cash in that cred over and over again in unrelated future domains.
His brief and very public flame out at Twitter after mysteriously abandoning another project and the bold claims about his AMD work that never really translated to anything have really detracted from whatever past “cred” he built up. I really hope he can find a new niche and succeed, but until then it might be time to lie low on social media and avoid throwing more mud.
One of the lessons I wish I'd learned earlier is that indulging bombastic behavior neither benefits them nor you.
Strong opinions, weakly held, are good. Well-reasoned opinions to the contrary are even encouraged. Provided, of course, that they're smart enough to know when to persist and when to disagree and commit.
If you just indulge it, they end up with the engineering equivalent of Nobelitis. And if they're on your team, you end up with more burden than asset no matter how brilliant they are.
With AMD the experience is so poor that you have to save the company from itself if you want to make progress.
The ventures he has started (I can think of tinygrad and comma ai) all seem like half finished tech demos.
His AMD rants were a valuable warning about the quality of their hardware. I wish he'd done that maybe 10 years ago when I was buying AMD cards thinking that they might work with pytorch in a year or so. I knew they had problems but if I'd realised how bad the situation was I'd have held my nose and gone with Nvidia.
The rants weren’t breaking news to anyone who was familiar with PyTorch or adjacent communities. He seized upon a weak moment for AMD to try to launch his own company. Unfortunately he launched his effort with an attack on the company he was effectively trying to partner with, making the entire venture DOA.
It’s too bad, too, because it would have been interesting to see if anything could have been accomplished with a more friendly offer of cooperation. He’s obviously talented as a developer, but going on the attack against the company that forms the foundation of the business you’re trying to build is obviously not going to end well.
He definitely writes at a below-HS level.
Edit: you edited your comment after I told you he made comma.ai.
Bit dishonest, but whatever. I wouldn't describe comma.ai as a "half finished tech demo" but you're entitled to your own opinion about it.
> You aren't going to get better deals on tapeouts/IP than NVIDIA/AMD. You need some advantage.
> If you want a dataflow graph compiler, build a dataflow graph compiler.
Now explain why. Clearly Tenstorrent is happy to build Yet Another Abstraction Layer, so instead of bullying them over it you should at least attempt to actively humiliate them for the approach. You know, produce some manner of evidence that vindicates your position instead of relying on your authority alone. Jim Keller has no reason to take this seriously, even if you're right.
Without any numbers this feels like one cult of personality trying to bait another into a shit-flinging contest as a marketing scheme. We've seen this happen several times before on Hacker News, and it doesn't end up with either side making an Nvidia-killer. This is not a model for productive discourse.
Geohot is abrasive to say the least, and, no, this is not a model for productive discourse (I'll try not to bring in some of his hot takes on the stream because giving them a stage is probably also not productive). But I do think he has good taste in SW and he might be right about the number of layers of abstraction.
For context, geohot wrote this live on a twitch stream.
Pretty sure comma is profitable? Not hugely so, but for a hardware startup, selling multiple iterations and not getting wrecked is a sound start.
What I'm saying is, Tenstorrent couldn't find a more excitable third-party developer if they grew one in a lab. And you know what? I can't make heads or tails out of all their various abstractions. I've tried! I've read the docs, I've read the examples, I've gone to meetups. I think OP is right that "one more abstraction bro" probably doesn't solve the problem.
At a guess, the problem isn't a technical one, it is an organizational one. They don't have anybody to stand in for me, or devs like me (e.g. dumb people). There is no product leadership on the API design. Just a lot of really brilliant engineers obsessively tuning for their own use cases, unwilling to ever trade off a hit in performance or expressivity for readability or writeability.
I don't think anyone is seriously training an NN on TT hardware at the moment and I think that's an issue. I think tinygrad works not only because geohot is one hell of an engineer but also because comma dogfoods it. TT's engineers are absolutely brilliant (from reading their commits) but I think they are stretched too thin. Bounties are not gonna work - you can't expect an outsider with no internal access/bandwidth/knowledge to suddenly make e.g. Mixtral work, as the issue spans at least tt-xla and tt-mlir. And to agree with ^, training is a kind of artifact where good CX can only be derived from strong leadership and a leaner view of the stack. NVIDIA accumulated that over the decades and the rest are trying to catch up by aggressive hiring (not to say that hiring is necessary). E.g. Annapurna had a presence on the CMU campus when I was there and has the Anthropic team to test it out.
I'm an incredibly excited third-party developer as I think the pitch appeals a lot to grad students (who do model research) who need to run small experiments within the 13B range and reasonably scale them up to draw the first half of the scaling curve.
I lose too much productivity to abstractions and incomplete e2e support in TT's current shape. I'd love to give it another go in 6 months.
There are clear reasons why a hardware company would use a graph compiler -- they think such an approach is higher performance, and makes Tenstorrent look better on performance per dollar when compared to competitors (read: nvda).
There is some legitimate criticism of TT here. Their hardware is composed of simple blocks that compose into a complex system (5 separate CPUs being programmed per tensix tile, many tiles per chip), and that complexity has to be wrangled in the software stack -- paying that complexity in hardware so there is less of a VLIW model in software might remove a few abstractions.
So much pearl clutching over the “tone”. Oh dear.
I thought this polemic was amusing and am sure it comes from a place of genuine concern.
I’m a dev working on torch.compile at Meta (previously I worked on ML-focused FPGAs) and the approach I would use is to build a static graph compiler, use torch.compile (and probably JAX) as graph extraction front-ends, and call it a day. I feel like hardware companies don’t know how to handle the flexibility of PyTorch and as a result develop their own APIs, which is mistake #1. It makes it virtually impossible to get any market penetration once you head down that path, because nobody will ever rewrite their models for your hardware when they don’t even know what perf they will get; the risk is just too high. As a result, hardware companies offer inference APIs which hide all of this behind a REST API to basically paper over the lack of generality of the software/hardware interface. This is convenient because then nobody actually knows the perf/$ and they can burn VC money for as long as they want. Whether this is a viable business model or not, we will have to wait until they go public to actually see what their true inference costs are.
To sum it up, start from PyTorch and work your way down to your hardware, this is the only general way if you want to actually sell chips and not just constantly port the model of the day to your hardware.
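For anyone who hasn't seen what "torch.compile as a graph extraction front-end" looks like in practice, here is a rough sketch using the custom backend hook. `toy_backend`, `captured_ops` and `model` are made-up names for illustration, and the "compiler" here only records the ops it is handed; a real vendor backend would lower the FX graph to its hardware at that point. This assumes a PyTorch 2.x install where torch.compile is available.

```python
from typing import Callable, List

import torch

# Filled in by the backend the first time a graph is captured.
captured_ops: List[str] = []


def toy_backend(gm: torch.fx.GraphModule,
                example_inputs: List[torch.Tensor]) -> Callable:
    """Hypothetical stand-in for a vendor's static graph compiler."""
    for node in gm.graph.nodes:
        # Skip placeholders and outputs; keep only the actual operations.
        if node.op in ("call_function", "call_method", "call_module"):
            captured_ops.append(str(node.target))
    # A real backend would compile the graph for its hardware here.
    # This sketch just returns the captured graph to run unchanged.
    return gm.forward


def model(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Ordinary PyTorch code; the user does not rewrite anything.
    return torch.relu(x @ y + 1.0)


compiled = torch.compile(model, backend=toy_backend)

x, y = torch.randn(4, 8), torch.randn(8, 2)
out = compiled(x, y)  # first call triggers capture and invokes toy_backend
print(captured_ops)
```

The point of the design is that the model stays plain PyTorch and the hardware-specific work lives entirely behind the `backend=` argument.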