It had. It's now at rough parity with PyTorch.
And no, it wasn't about a usability tradeoff.
It was about being more general: a more general compiler, more general code, more composable code.
Since then, the team has been optimizing that and folding compiler optimizations into the language that benefit all code. ML-style code stresses the compiler in a particular way; PyTorch handles ML's array-heavy workloads as a special case.
Julia will do the same, but it's laying the groundwork for domain-specific optimizations to live in package and user space. A different sort of philosophy.
It was about being more ambitious: laying the groundwork for a more powerful tool in general, at some short-term cost.
They could have just written a framework that baked in fp32/64/16 with CUDA kernels, tracing, and operator-overloading computational graphs, and gotten more speedup over PyTorch with better usability (in fact, Avalon.jl takes that approach).
But they didn't, and now there's a burgeoning ecosystem that does things no other framework can. It's not quite as beneficial for today's vanilla ML, which is stuck in a local optimum, but I think that's going to change: https://www.stochasticlifestyle.com/useful-algorithms-that-a...
In the meantime, places like MIT, Moderna, and NASA are reaping the benefits.