> I think the blame here is mostly on the science community which isn't paying much attention to ML tooling, best practices and research
While the science community doesn't have a great track record for quality software engineering, that's an awfully arrogant position.
Most ML tooling sucked, and it's only just getting better in terms of usability. But even then it's very software engineer-y in the worst kind of way, e.g. "Coming soon: PyTorch 1.0 ready for research and production" [0]. Great.
If you're doing research in one field, you don't really want to spend the time becoming an expert in another one just to do some analysis. What you want is tools you can reliably (ab)use, like maths. But there often isn't a straightforward way of getting the uncertainties on values output from many ML constructs.
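To make the point concrete: the workaround scientists usually reach for is resampling. Here's a minimal sketch of bootstrapping an uncertainty on a fitted model's prediction — the data, the `predict_at` helper, and the toy linear fit are all made up for illustration; real ML models rarely hand you anything this direct.

```python
# Bootstrap sketch: refit on resampled data to get an error bar on a prediction.
# Everything here (data, helper names) is illustrative, not any library's API.
import numpy as np

rng = np.random.default_rng(0)

# toy data: y = 2x + noise
x = np.linspace(0, 1, 50)
y = 2 * x + rng.normal(0, 0.1, size=x.size)

def predict_at(x_new, xs, ys):
    # fit a straight line and evaluate it at x_new
    slope, intercept = np.polyfit(xs, ys, 1)
    return slope * x_new + intercept

# bootstrap: resample with replacement, refit, collect predictions
preds = []
for _ in range(500):
    idx = rng.integers(0, x.size, size=x.size)
    preds.append(predict_at(0.5, x[idx], y[idx]))

mean = np.mean(preds)
err = np.std(preds)
print(f"prediction at x=0.5: {mean:.3f} +/- {err:.3f}")
```

The point isn't that this is hard — it's that you have to bolt it on yourself, whereas a physicist expects the error bar to come out of the tool alongside the value.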
Yes, the term "statistical learning" has been around since at least 2001. But it isn't widely known, talked about, or understood, and most trendy ML "tutorials" gloss over it completely. Maybe that's unfair criticism: most ML applications in software don't need the stricter treatment, and why should somebody playing around with ML be burdened with that rigor? At the same time, it's easy to come away from ML thinking "I don't understand this at all, it's a black box, it doesn't do what I need it to".
And we haven't even talked about what a pain reproducibility is in ML.
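Even the easy part of reproducibility — getting two runs to agree at all — takes deliberate effort. A trivial illustration (NumPy standing in for an ML framework; the `train_step` helper is made up): seeded runs agree exactly, while the default global state gives you something different every time. And in real frameworks that's only step one, since GPU kernels can stay nondeterministic even with every seed pinned.

```python
# Illustration only: explicit seeding is the bare minimum for reproducibility.
# train_step is a hypothetical stand-in for a full training run.
import numpy as np

def train_step(seed):
    # "training" reduced to: random init, then sum the weights
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=10)
    return weights.sum()

run_a = train_step(42)
run_b = train_step(42)
print(run_a == run_b)  # seeded runs agree exactly
```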
> instead keeps reinventing the wheel, over and over again.
If people keep reinventing it, maybe the problem isn't the people... yeah, physicists don't write great code (guilty), but ML tooling is full of hype and currently feels a bit JavaScript-y.
[0] https://pytorch.org/2018/05/02/road-to-1.0.html