PyTorch was able to draw on the best API ideas from the other frameworks (including higher-level ones like Keras), and it executed them well. These core principles of easy debuggability are indeed very important for winning over developers. Clean code, understandable code, flexibility: these are all closely related to that, or mostly the same thing.
It's easy for a successful framework to become bloated and complicated, though. I wonder how PyTorch will look in a few years. I also remember the first TensorFlow releases, where the whole source code was quite easy to understand. Then TensorFlow added more and more features and many different kinds of APIs, started deprecating earlier ones, and so on. PyTorch's internal code is also already much more complex than it was initially.
One reason JAX is now popular is that it too started with a fresh API. It also helps that it's built on a new kind of idea, code transformations, which seems elegant and powerful.
Looking at these developments, I really wonder what the future will look like. It's good to have new ideas and new or improved APIs. It's also good to adapt things to new kinds of hardware (GPUs, TPUs, maybe neuromorphic hardware later).
Why? I sort of became disillusioned with Torch after they abandoned Lua.
Our main focus is usability, and one of our secondary focuses is to not look like clowns in the performance department.
So we try to make decisions that trade off performance for usability more often than the reverse.
Installing PyTorch with Poetry is next to impossible. Flux got this right by bundling the GPU drivers. Its installation is also standardized and does not require the weird pip -f flag for CPU-only installations.
One question: one of the advantages of having a clean design is that performance is easier to optimize, since the 80/20 rule of performance becomes much more obvious. How true was this in your experience? Were there any major performance-related design changes, or was performance optimization a matter of tuning a few selected functions?
This paragraph sort of surprises me. In my experience if you want to do anything other than calling out to numeric libraries, you can do it in Lua and it will work, or you can do it in Python and suddenly your machine learning pipeline will spend 95% of its time running Python while your GPU idles. So the need to be able to drop down to C is much more severe in Python, and the difficulty of calling out to C is much greater.
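The "Python loop while your GPU idles" point is easy to demonstrate in miniature: a toy comparison (absolute timings are machine-dependent; this is a sketch, not a benchmark) of a pure-Python loop against a single NumPy call over the same data:

```python
import time
import numpy as np

N = 1_000_000
xs = [float(i) for i in range(N)]
arr = np.arange(N, dtype=np.float64)

# Pure-Python loop: every multiply and add goes through the interpreter.
t0 = time.perf_counter()
total = 0.0
for x in xs:
    total += x * 2.0
py_time = time.perf_counter() - t0

# Single NumPy expression: the loop runs inside compiled C code.
t0 = time.perf_counter()
total_np = float((arr * 2.0).sum())
np_time = time.perf_counter() - t0

print(f"python loop: {py_time:.4f}s  numpy: {np_time:.4f}s")
```

The gap is typically one to two orders of magnitude, which is exactly why Python ML code must keep the hot path inside numeric libraries rather than interpreter loops.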
While we were based on top of LuaJIT, we couldn't use the JIT for anything, because we always had to call into the C library for GPU kernels (and LuaJIT can't JIT through an opaque C call, in case that C call changes the interpreter stack).
Where Python really helps is with its ecosystem. The entire data science and ML ecosystem is in Python.
The difficulty of calling out to C is not much greater in Python, things like PyBind11 make it pretty natural.
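pybind11 itself needs a C++ build step, so as a rough illustration of the same "call out to C" path using only the standard library, here is a minimal sketch with ctypes calling libm's sqrt directly; the library-name lookup assumes a typical Linux or macOS setup:

```python
import ctypes
import ctypes.util

# Locate the C math library; the explicit fallback name assumes glibc Linux.
libm = ctypes.CDLL(ctypes.util.find_library("m") or "libm.so.6")

# Declare the C signature: double sqrt(double).
libm.sqrt.restype = ctypes.c_double
libm.sqrt.argtypes = [ctypes.c_double]

print(libm.sqrt(2.0))  # calls the native sqrt, not Python's
```

With pybind11 the binding goes the other way (C++ code exports functions to Python), but the end result is the same: the heavy lifting runs in native code while Python only orchestrates.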