This post is about the high-level motivations and long-term vision for https://arch.dev; please let me know if any of it resonates or if you think I’m totally off the mark :) We’ve also got a post that goes into more detail on the Arch product itself: https://www.arch.dev/blog/announcing-arch-the-data-backend-f...
In practice, people who succeed at this get it working end-to-end with whatever compromises that entailed. They might tell anyone that they never would have done it that way had they known how it would turn out; still, they have a bird in the hand.
There's a problem of scale mismatch.
My RSS reader needs the toolbox of
https://scikit-learn.org/stable/model_selection.html#model-s...
to make reliable scripts that can rebuild a model when the data changes. The version built into scikit-learn has the features I need; other ones don't. scikit-learn is great for problems of a certain size, ones that take, say, 10 minutes to run.
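As a rough illustration of what such a rebuild script might look like, here is a minimal sketch, assuming feed items are labelled 1 ("interesting") or 0 ("not interesting"). `Pipeline`, `GridSearchCV`, `StratifiedKFold`, `TfidfVectorizer` and `LogisticRegression` are real scikit-learn classes; the parameter grid and the helpers `data_fingerprint`, `rebuild_model` and `maybe_rebuild` are hypothetical, not the commenter's actual setup.

```python
import hashlib
import json
from pathlib import Path

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline


def data_fingerprint(texts, labels):
    """Hash the training data (hypothetical helper) so the script can tell when it changed."""
    h = hashlib.sha256()
    for text, label in zip(texts, labels):
        # Each (text, label) pair is JSON-encoded so boundaries stay unambiguous.
        h.update(json.dumps([text, label]).encode("utf-8"))
    return h.hexdigest()


def rebuild_model(texts, labels, seed=0):
    """Run model selection from scratch and return the fitted search object."""
    pipeline = Pipeline([
        ("tfidf", TfidfVectorizer()),
        ("clf", LogisticRegression(max_iter=1000)),
    ])
    # Illustrative grid; a real one would be tuned to the data.
    param_grid = {
        "tfidf__ngram_range": [(1, 1), (1, 2)],
        "clf__C": [0.1, 1.0, 10.0],
    }
    # Fixed random_state keeps the cross-validation splits reproducible between runs.
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    search = GridSearchCV(pipeline, param_grid, cv=cv, scoring="accuracy")
    search.fit(texts, labels)  # refit=True (default) retrains the best model on all data
    return search


def maybe_rebuild(texts, labels, stamp_path):
    """Rebuild only if the data fingerprint differs from the one stored on the last run."""
    stamp = Path(stamp_path)
    fp = data_fingerprint(texts, labels)
    if stamp.exists() and stamp.read_text() == fp:
        return None  # data unchanged; keep using the existing model
    search = rebuild_model(texts, labels)
    stamp.write_text(fp)
    return search
```

The point of wrapping the whole search in one function is that "rebuild" means rerunning model selection, not just refitting fixed hyperparameters; at the 10-minute scale that is cheap enough to do on every data change.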
That scale turns out to be appropriate for very fast prototyping of systems that need about a human-week of judgements to light up, that can be updated daily, etc.
Someone is going to insist on using slower models that take two hours to train (wrapped up in a model selection process), at which point you worry the machine might crash and you have to take a "distributed systems" approach that adds terrible overhead for jobs that don't need it. If I liked the model selection story I could probably live with that, but so far I don't.