But performance is not the only concern - there is also the ability to debug issues. For that you still need to dig into the Apache Spark core, which is written in Scala.
This implementation in .NET would be a "gateway drug" for moving your production code to Scala/the JVM.
It happened to me with PySpark - the majority of tasks at hand can be solved with PySpark. But digging into issues and stack traces brought me to the Scala internals of Apache Spark. As a result, in cases where Python-specific libraries are not needed and high performance is, I would write Spark programs in Scala from the beginning.
FFI, extra debugging layers, and weaker tooling integration aren't worth a couple of language-feature bullet points.
https://databricks.com/blog/2017/10/30/introducing-vectorize...
It's mainly when you start writing custom UDFs (IOW, fabricating your own lego blocks) that platform interop and the performance of your language of choice become a big deal.
- Mobius is .NET Framework / Mono based and its cross-platform story isn't great; .NET for Apache Spark is .NET Core / .NET Standard and built with cross-platform support as a primary concern
- Mobius only targets up to Spark 2.0, while the Spark LTS line is at 2.4 now
- .NET for Apache Spark is built to take advantage of .NET Core performance improvements, showing big advantages over the Python and R bindings, especially when user-defined functions are a major factor
- .NET for Apache Spark is driven by lessons learned and customer demand, including major big data users inside and outside Microsoft
Disclaimer: I know people who worked on this and helped from the .NET Foundation side, but the above is my non-official summary from readmes and such.
>> Mobius: C# and F# language binding and extensions to Apache Spark, a pre-cursor project to .NET for Apache Spark from the same Microsoft group.