What does bother me here, as with many other ML orchestration frameworks, is that the examples and even the docs don't tell me how you handle serious situations in terms of scale.
In fact, what you describe as "a real ML pipeline" is, in my view, not a good example of one because, for instance, it doesn't tell me how I'd scale out multi-node training when the data doesn't fit in a neat, standard map-style PyTorch Dataset that loads some CSV from the web.
I mean, maybe it's because I am dumb, and maybe the majority of people do in fact train single-node models on MNIST data, but I'd appreciate some more information on how you deal with more diverse data sources in your pipeline. Will I have to squeeze cloud provider X's data solutions (which, say, the client obliges me to use) into submission until they fit your examples? Because these days I get the feeling that claims of "easily orchestrating your ML pipeline" often amount to that. I see you started on some of these topics in the "integrations" part of the docs. However, these pages do not seem to exist yet (for me). Furthermore, the "roadmap" link goes to a 404.
For me, these are the important topics. I can easily get a nice, simple ML pipeline on Azure, AWS or Databricks if I am willing to conform to whatever they are already doing. It seems you are in a position to tackle more challenging problems, so it would be nice to see that shown.
Cool product, and good luck!