Good Spark support would be a good answer for batch/stream processing, though I am a little wary of what "support" means. Apache Beam supports something like five runners (Flink, Spark, Dataflow, etc.), but the quality of runner support is extremely inconsistent. I've also noticed that even Flink's Python API sometimes has very useful operations available only in Java, with no Python wrapper. That said, having data pipelines in one language and downstream users of the data in a different language works pretty well in my experience, so mixing data pipeline languages is somewhat OK.
What's your workflow orchestration choice? That's the main one you didn't touch. I work on an ML training platform, and a lot of my work can be described as wrapper work on Kubeflow so that dozens of other ML engineers can manage experiments/workflows. For Python the main choices are Kubeflow and Airflow. Ray kind of counts, but Ray Workflows is still quite new and missing a lot of useful features. I need some system that can run hundreds of ML workflows per day (one workflow being something like 5-10 tasks, some short, some long) and manage their tasks well.
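To make "one workflow being 5-10 tasks" concrete, here is a minimal sketch of a workflow as a dependency graph of tasks, using only the Python standard library. The task names (`load`, `train`, `evaluate`) and the `run_workflow` helper are made up for illustration. This is not how Kubeflow or Airflow is implemented, and it leaves out everything that makes a real orchestrator useful (retries, scheduling, isolation, a UI).

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, deps):
    """Run each task after all of its dependencies have finished.

    tasks: maps a task name to a callable that takes a dict of upstream results.
    deps:  maps a task name to the set of task names it depends on.
    Returns the execution order and the per-task results.
    """
    order = []
    results = {}
    # static_order() yields every task after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        upstream = {dep: results[dep] for dep in deps.get(name, ())}
        results[name] = tasks[name](upstream)
        order.append(name)
    return order, results


# Hypothetical three-task workflow: load data, "train", then "evaluate".
tasks = {
    "load": lambda upstream: [1, 2, 3],
    "train": lambda upstream: sum(upstream["load"]),
    "evaluate": lambda upstream: upstream["train"] * 2,
}
deps = {"load": set(), "train": {"load"}, "evaluate": {"train"}}

order, results = run_workflow(tasks, deps)
print(order, results["evaluate"])
```

Running hundreds of these per day is where the real systems earn their keep: the graph part is easy, while tracking state, retrying failed tasks, and cleaning up after long-running ones is the hard part.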
The broader area also includes libraries like Weights & Biases, BentoML, etc. (experiment management libraries).
In theory you can have the workflow manager in one language and the workflow code in a different language. The main downside is that it makes debugging workflows locally harder (breakpoints are a little sad across most language boundaries), but it is doable, and we debated migrating to Temporal (a Java workflow system) before.
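A rough sketch of what that split can look like, assuming the manager launches each task as a separate process and exchanges JSON over stdin/stdout. The `run_external_task` helper and the `WORKER` command are hypothetical; the worker here is a Python one-liner only so the example runs anywhere, and in practice it would be a binary or script in whatever language the workflow code uses.

```python
import json
import subprocess
import sys


def run_external_task(command, payload):
    """Run a task in a separate process, passing JSON in and reading JSON out."""
    proc = subprocess.run(
        command,
        input=json.dumps(payload),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the task exits non-zero
    )
    return json.loads(proc.stdout)


# Stand-in for a task written in another language (assumption: it reads
# JSON from stdin and writes JSON to stdout).
WORKER = [
    sys.executable,
    "-c",
    "import json, sys; d = json.load(sys.stdin); "
    "print(json.dumps({'total': sum(d['values'])}))",
]

result = run_external_task(WORKER, {"values": [1, 2, 3]})
print(result)
```

The process boundary is exactly why local debugging gets awkward: a breakpoint set in the manager stops at `subprocess.run`, and stepping into the task means attaching a second debugger to the child process.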