I'm a compiler engineer at a GPU company, and while tinygrad's kernels might be made more performant by the JIT compiler underlying every GPU chip's stack, a much bigger picture is often needed to properly use all of the chip's resources. The direction that companies like NVIDIA are going in involves whole-model optimization, so I really don't see how tinygrad can be competitive here. I see it as most useful in embedded, but Hotz is trying to make it a thing for training. Good luck.
> There are many efforts that are ultimately going in this direction, from Google's Tensorflow to the community project Aesara/PyTensor (née Theano) to the MLIR intermediate representation from the LLVM folks.
The various GPU companies (AMD, NVIDIA, Intel) are among the largest contributors to MLIR, so saying that they're going in the direction of standardization is not wholly true. They're using MLIR as a way to share optimizations (really, to stay at the cutting edge), but, unlike tinygrad, MLIR gives a much higher-level view of the whole computation, so each company's backend can optimize across the entire model rather than one kernel at a time.
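To make "whole model" concrete, here's a minimal sketch using JAX/XLA (an analogous whole-graph compiler, not MLIR itself; the toy MLP and parameter names are made up for illustration):

    import jax
    import jax.numpy as jnp

    def mlp(params, x):
        # Two dense layers with a GELU in between, written as separate ops.
        h = jax.nn.gelu(x @ params["w1"] + params["b1"])
        return h @ params["w2"] + params["b2"]

    # jax.jit hands the *whole* traced graph to XLA, which can fuse the
    # matmul/bias/activation chain and plan memory across the full model,
    # instead of dispatching each op as an isolated pre-built kernel.
    mlp_jit = jax.jit(mlp)

    key = jax.random.PRNGKey(0)
    params = {
        "w1": jax.random.normal(key, (256, 512)),
        "b1": jnp.zeros(512),
        "w2": jax.random.normal(key, (512, 128)),
        "b2": jnp.zeros(128),
    }
    x = jax.random.normal(key, (32, 256))
    out = mlp_jit(params, x)  # compiled once, optimized end to end

The point is that the compiler sees the full computation graph, not just individual kernels, which is where the cross-op fusion and scheduling wins come from.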
If tinygrad were focused on MLIR's ecosystem, I'd say it had a fighting chance of reaching NVIDIA-like performance, but they're off doing their own thing.