
0 points · adgjlsfhk1 · 4y ago · 0 comments

One of the really nice things with Julia is that some of the ecosystem needs disappear. Python needs a lot of ecosystem because none of the packages work together, and the language is slow, so you have to make sure you are doing as much work as possible outside the language itself. To answer your question more specifically:

Numpy -> Array + broadcasting (both in Julia Base)

pytorch/tf -> Flux.jl (package)

batch/stream processing -> you don't need it as much, but things like OnlineStats exist. Julia also ships with multithreading and distributed computing built in. Spark in particular is a tool that lets you use a cluster of 100 computers to be as fast as 1 computer running good code.

pyarrow -> Arrow.jl (there are also really good packages for JSON, CSV, HDF5 and a bunch of others)
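
To make the batch/stream point above a bit more concrete: "online" statistics means updating a summary one observation at a time, so the full dataset never has to sit in memory. Below is a rough Python sketch of that idea using Welford's single-pass mean/variance update. It is only an illustration and is not the OnlineStats API; the `RunningStats` class and its `fit` method are names made up for this example.

```python
class RunningStats:
    """Single-pass mean and sample variance (Welford's algorithm).

    Hypothetical helper for illustration only; not a library API.
    """

    def __init__(self):
        self.n = 0        # number of observations seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def fit(self, x):
        """Fold one new observation into the running summary."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        # Uses the *updated* mean, which keeps the update numerically stable.
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance (n - 1 denominator); 0.0 with fewer than two points."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Feed values one at a time, as if they arrived from a stream.
stats = RunningStats()
for value in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.fit(value)

print(stats.n, stats.mean, stats.variance)
```

Each update is constant time and constant memory, which is why this style of computation often removes the need for a separate batch job.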

Let me know if you have any other questions. Always glad to answer!


3 comments · 1 top-level

Mehdi2277 · 4y ago · 2 in thread

Good Spark support would be a good answer for batch/stream processing. I am a little wary of the definition of "support", though. Apache Beam supports something like 5 runners (Flink, Spark, Dataflow, etc.), but the quality of runner support is extremely inconsistent. I’ve also noticed that even with Python, Flink sometimes has very useful operations only in Java with no wrapper. Although honestly, having data pipelines in one language and downstream users of the data in a different language works pretty well in my experience, so mixing data pipeline languages is somewhat OK.

What’s workflow orchestration choice? That’s the main one you didn’t touch. My work area is an ML training platform, and a lot of my work can be described as wrapper work on Kubeflow to allow dozens of other ML engineers to manage experiments/workflows. For Python the main choices are Kubeflow/Airflow. Ray kind of counts, but Ray workflows are still quite new and missing a lot of useful features. I need some system to run hundreds of ML workflows per day (one workflow being like 5-10 tasks, some short, some long) and manage their tasks well.

The broader area also includes libraries like Weights & Biases, BentoML, etc. (experiment management libraries).

In theory you can have the workflow manager in one language and the workflow code in a different language. The main downside is that it makes debugging workflows locally harder (breakpoints are a little sad across most language boundaries), but it is doable, and we debated migrating to Temporal (Java workflow system) before.

FridgeSeal · 4y ago

> What’s workflow orchestration choice?

We’ve moved away from language-integrated orchestration entirely at my work: we use Argo Workflows on Kubernetes, so we’re just orchestrating containers and aren’t beholden to language-specific requirements anymore. You can use whatever language/tool you want, provided it packs into a container and accepts/returns what the rest of the workflow expects.
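
As a rough sketch of what "accepts/returns what the rest of the workflow expects" can look like from inside a container, here is a tiny Python step that reads a JSON payload from an input stream and writes a JSON result to an output stream. The JSON-over-stdin/stdout contract, the `records` and `count` fields, and the `run_step` and `main` names are all assumptions for illustration; none of this is Argo's API, and a real workflow might pass parameters or files instead.

```python
import io
import json


def run_step(payload: dict) -> dict:
    """Hypothetical step logic: pass the records through and attach a count."""
    records = payload.get("records", [])
    return {"records": records, "count": len(records)}


def main(stdin, stdout) -> None:
    """Read one JSON document from stdin, write one JSON document to stdout."""
    payload = json.load(stdin)
    json.dump(run_step(payload), stdout)
    stdout.write("\n")


# Local dry run with in-memory streams standing in for the container's stdio.
fake_in = io.StringIO('{"records": [1, 2, 3]}')
fake_out = io.StringIO()
main(fake_in, fake_out)
print(fake_out.getvalue(), end="")
```

In an actual container the entrypoint would call `main(sys.stdin, sys.stdout)`. The orchestrator only sees the input/output contract, so the same step could be rewritten in Julia, Java, or anything else without the rest of the workflow noticing.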

adgjlsfhk1 (OP) · 4y ago

I don't use much in terms of orchestration, so I'm probably not the right person to ask there.

One of the really big potential benefits of Julia is that it lets you remove language barriers, which is especially nice if you are doing ML research (or playing around with new model types, etc.). Since the ML stack is Julia all the way down to CUDA/BLAS/Julia loops, you can really easily inspect or modify everything in your stack.
