> Snowflake has support for custom Javascript UDFs and a lot of built in features (you can do absurd things with window functions). I also found it much faster than Spark.
UDF support isn't really the same, to be honest. You're still a prisoner of the SELECT ... FROM pattern. Don't get me wrong, SQL is wonderful where it works, but it doesn't work for everything that I need.
I completely agree that it's faster than Spark, but it's also super-expensive and more limited. I suspect it would be cheaper to run a managed Spark cluster than Snowflake and just eat the performance hit by scaling up.
> Tensorflow, PyTorch (not sure if Ray is needed) and Mxnet all support distributed training across CPUs/GPUs in a single machine or multiple machines. So does XGBoost if you don't want deep learning.
I forgot about XGBoost, but I'm a big fan of unsupervised methods (mostly as input to supervised methods), and Spark has a bunch of these. I haven't ever tried to do it, but based on my experience of running deep learning frameworks and distributed ML, I suspect the combination of the two would be exponentially more annoying ;) (And I deal mostly with structured data, so it doesn't buy me as much.)
> You can then run them with KubeFlow or on whatever platform your SaaS provider has (GCP AI Platform, AWS Sagemaker, etc.).
Do people really find these tools useful? Again, I'm not really sure what SageMaker (for example) buys me on AWS, and their pricing structure is so opaque that I'm hesitant to even invest time in it.