Rust has been eating C++'s lunch. Same story of a rapidly rising ecosystem.
Instead of forcing Python to be a language it isn't, it might be more efficient, and ultimately the "right choice", to invest the time in Julia.
Julia is great for numerical computing; it needs faster time to plot and more hands in the ecosystem. The former will be solved and the latter seems inevitable to me. Pitch in!
Languages are fun to think about, but you don't always need to be concerned with every vocal minority of programmers that like to talk about how their language is better than yours. Sometimes that replacement is better and sometimes those people are wrong. But even when they're right, being marginally better isn't that big of a deal, or nearly enough to justify a rewrite or a change of language.
That said I'd like it if it develops a robust and large ecosystem because I personally like coding in it. It has built-in matrix ops, parallel ops, dynamic dispatch etc that are really nice to work with in the numerical space. Like Matlab but well rounded and fast.
So I admit my comment is less argument and more cheerleading. "Hey folks let's make this the case so us numerical people can have a slightly improved experience".
In the grand scheme of things this is as noble or ignoble as any.
Python is more like #100 in terms of speed, #100 in terms of correctness, #100 in terms of sound abstractions, #10 in terms of readability for large programs. Its real strengths are quick hacks and a decent C-API.
The community is smug, conceited, does not value correctness and in general is intoxicated by Python's undeserved success. Many posers and incompetent people.
I personally don't like Python that much because every library does things differently and sometimes it feels like learning a completely new (sub)language. E.g. pandas DataFrames let you do the same thing in multiple ways (e.g. adding an index column, or removing a column). Often when I need to look up how to do a particular thing I end up finding many solutions that simply don't work with the version I am using. Sometimes even old code of mine doesn't work any more, and I either have to use an older version of the library or relearn how to do things.
That being said, a friend of mine has been quite fond of Julia lately. Which put Julia on the top of my list of programming languages to do a deep dive.
Talk related - https://youtu.be/5mUGu4RlwKE
https://github.com/zero-one-group/geni-performance-benchmark
Time to plot is much improved in 1.6 and should continue to improve in 1.7. It's definitely being addressed.
I'm not sure what advantage Julia has over Python. Yeah, it has some typing and can be faster, but it's too similar. Still single threaded.
# Threaded inner loop, each thread has no dependence upon others
Threads.@threads for t = 1:Nthr
    inner_gen_cpu1!(psum, ms, me, cls, 2)
end
That's all you need. You don't need pthread create/join, you don't need installable language extensions, you don't need to appeal to external tools/libraries to enable threading. It's built into Julia. And it is trivial to use. And no, Julia is not too similar to Python. Julia has multiple dispatch, Python does not.
While I'm really not a fan of 1-based indexing, Julia's multiple dispatch is not something easy to match in Python.
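To make the multiple dispatch point concrete, here is a rough sketch of the closest thing in Python's standard library, `functools.singledispatch`. It picks an implementation from the type of the first argument only, so the second argument still has to be checked by hand. The `Asteroid`, `Ship` and `collide` names are made up for illustration.

```python
from functools import singledispatch


class Asteroid:
    pass


class Ship:
    pass


@singledispatch
def collide(a, b):
    # Fallback used when no implementation is registered for type(a).
    return "generic collision"


@collide.register(Asteroid)
def _(a, b):
    # Selected by the type of `a` only; `b` needs a manual isinstance check.
    if isinstance(b, Ship):
        return "asteroid hits ship"
    return "asteroid hits something"


@collide.register(Ship)
def _(a, b):
    if isinstance(b, Asteroid):
        return "ship hits asteroid"
    return "ship hits something"


print(collide(Asteroid(), Ship()))
print(collide(Ship(), Asteroid()))
print(collide(1, 2))
```

In a language with multiple dispatch each (first, second) type pair would be its own method, and a third party could add a new pair without touching these functions.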
[EDIT]: one thing that's still not solved in Julia is code startup time.
Many people will sell you some sort of workflow that works around the problem, but it's the same old tired arguments people would use to defend traditional compiled languages, and I'm not buying.
I really wish they would find a way to truly solve this.
Really? I thought that was a ~2017 thing.
It won't stop either, because the road between JS client dev and JS server dev is so smooth. Path of least resistance type thing.
I guess in some startup scene, it has been Java and .NET over here and no signs of changing, despite the occasional junior projects that eventually get rewritten back into Java/.NET stacks when they move on.
Yall really just write procedural code for everything?
Another non-functional application of tail call elimination is finite state machines. Writing them as functions calling the next state in tail call position is very elegant, legible and efficient.
Were it not for Firefox and Edge teams who torpedoed that feature, it would be a part of the major language of today.
Maybe it still will be.
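For anyone who hasn't seen the state-machine style mentioned above, here is a small sketch in Python. Python does not eliminate tail calls either, so this swaps in a trampoline: each state returns the next state instead of calling it, and a loop does the "call". The state names and the tiny signed-integer grammar are invented for the example.

```python
def run(state, text):
    """Trampoline: keep invoking states until one returns a bool verdict."""
    pos = 0
    while callable(state):
        state, pos = state(text, pos)
    return state


def start(text, pos):
    # Optional leading sign, then at least one digit is required.
    if pos < len(text) and text[pos] in "+-":
        return digits_required, pos + 1
    return digits_required, pos


def digits_required(text, pos):
    if pos < len(text) and text[pos].isdigit():
        return digits, pos + 1
    return False, pos  # reject: no digit where one was needed


def digits(text, pos):
    if pos == len(text):
        return True, pos  # accept: consumed the whole input
    if text[pos].isdigit():
        return digits, pos + 1
    return False, pos


for sample in ["-123", "42", "12a", "+", ""]:
    print(repr(sample), run(start, sample))
```

With real tail call elimination the `return digits, pos + 1` lines could simply be `return digits(text, pos + 1)` without growing the stack, which is the elegance the comment is talking about.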
I use Python plenty, just not in large enough doses that I have to actually make peace with it.
Julia code might also use a lot of in-place operations, which would be hard for a compiler to infer as safe.
Well, for example at the very least in Common Lisp you'll have much more joy with higher-order functions than with loops. The simple reason for that is the existence of compiler macros (http://clhs.lisp.se/Body/03_bba.htm) which can replace function compositions with arbitrary code. And it's much easier to figure out what the function composition does than to write a loop vectorizer.
No, I AM going to write procedural code, and it WILL be faster than your "high IQ" 1 line recursive solution. Also funny to see how little recursion gets used in CUDA/Pytorch/GPU programming - which is what we are seeing to be more and more important over time.
It’d be interesting to see how much of the Python ecosystem is actually necessary to move PyTorch to a better language.
I’m afraid we’re stuck with Python for the next 20 years. That makes me very, very sad.
It's important to remember that most of the Python ecosystem isn't written in Python. The functions are often thin wrappers/objects around the real computation, which is often written in a faster language: C/C++/Fortran.
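A quick way to see the "thin wrapper" effect without installing anything, assuming CPython: the built-in `sum` is implemented in C, while the hand-written loop below runs in the bytecode interpreter. The `python_sum` helper and the data size are arbitrary choices for the sketch, and the timings will differ from machine to machine.

```python
import timeit


def python_sum(values):
    # Same computation as the built-in, but every step runs as Python bytecode.
    total = 0
    for v in values:
        total += v
    return total


data = list(range(200_000))

loop_time = timeit.timeit(lambda: python_sum(data), number=5)
builtin_time = timeit.timeit(lambda: sum(data), number=5)

print(f"Python-level loop: {loop_time:.4f}s")
print(f"built-in sum:      {builtin_time:.4f}s")
```

Both produce the same answer; only where the loop executes changes.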
Julia excels in composability, performance, ease of development. You don't need to recode your algorithms in another language due to the performance of Julia, as is needed in Python's case.
Generally speaking, I see Julia's biggest fault, time to first plot, being addressed far sooner than Python being redesigned to have Julia's capabilities. For the record, I use Python daily in the day job. And I use Julia there for analytics, often with Jupyter notebooks. Easy for users to consume and interact with.
For me I think the packaging ecosystem is bad, we need one package management tool like poetry built in. We need a built in typing system like typescript. Lastly we need to remove the GIL.
I’m pretty sure all of these are currently being addressed by the community.
I switch languages a lot and things like functools, itertools, dunder methods, list comprehensions, dict comprehensions are things I sorely miss especially in typescript. In particular list and dict comprehensions when used with care are a great deal easier to work with and reason about when transforming data.
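A small sketch of the kind of data transformation meant here, using a list comprehension, a dict comprehension and `itertools.groupby`. The `records` data is made up for the example.

```python
from itertools import groupby
from operator import itemgetter

# Made-up sample data for illustration.
records = [
    {"lang": "python", "stars": 120},
    {"lang": "julia", "stars": 45},
    {"lang": "python", "stars": 30},
    {"lang": "rust", "stars": 80},
]

# List comprehension: filter and project in one expression.
popular = [r["lang"] for r in records if r["stars"] >= 50]

# Dict comprehension over groupby: total stars per language.
# groupby only merges adjacent items, so the input is sorted by the key first.
by_lang = sorted(records, key=itemgetter("lang"))
totals = {
    lang: sum(r["stars"] for r in group)
    for lang, group in groupby(by_lang, key=itemgetter("lang"))
}

print(popular)
print(totals)
```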
I like to think that containers only exist because deploying a Python application is so %^#(&*# complicated that the easiest way to do is to deploy an entire runtime image. It's an absolute nightmare and travesty. So bad. So very very bad. https://xkcd.com/1987/
I'm not optimistic on TypeScript for Python. That'd be great if such a thing existed! I'm not optimistic on packaging or deployment. There is recent progress on GIL removal which is exciting! There is hope, but I'm proceeding with very cautious optimism.
Comprehensions are kinda great, but also hideous and backwards. Rust iterators are a significant improvement imho. The fact that no recent languages have chosen to copy Python's syntax for comprehensions is telling!
Oh, and I think the standard library API design is pretty poor. Filesystem has caused me immense pain and suffering. Runtime errors are the worst.
Python is popular because of the ML revolution. If ML didn't take off neither would Python's popularity. Is ML successful because of Python or despite Python? Well, the world is probably further along with Python than if it merely didn't exist. But if a different language that sucked less existed we would, imho, be further along than we are.
I'm not annoyed Python exists. I'm annoyed that its inertia is so monumental it's inhibiting further progress. We're at a local maximum and the cost to take the next step is really really really high. That's not Python's fault mind you, just the way things work.
I was recently writing code using Reactor/RxJava in Java 11 w/ Lombok. I don't think I've ever been so productive, or led a team as productive, as when we were going ham on functional/reactive Java. Now that I'm back in Python land, I am frustrated on a daily basis with both the language and the runtime at every turn. Even the asyncio we are working with feels like the absolute minimum viable product compared to the Java, Node, or Rust I have done.
There are some fantastic Python enhancements that bridge some of the gaps, like PEP 484 (Type Hints) and PEP 517 (holy crap, actual build systems that are not dogcrap), but it feels like the Python community does not care.
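For readers who haven't used PEP 484 hints, a minimal sketch follows. The `Package` class and `find` function are hypothetical. The annotations are not enforced at runtime; an external checker such as mypy reads them.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Package:
    name: str
    version: str


def find(packages: Sequence[Package], name: str) -> Optional[Package]:
    """Return the first package with a matching name, or None."""
    for pkg in packages:
        if pkg.name == name:
            return pkg
    return None


installed = [Package("requests", "2.31.0"), Package("numpy", "1.26.4")]
hit = find(installed, "numpy")

# A type checker would flag `hit.version` without this None check,
# because the declared return type is Optional[Package].
if hit is not None:
    print(hit.version)
```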
I wrote a somewhat tongue-in-cheek rant blog post. https://www.forrestthewoods.com/blog/things-i-like-about-pyt...
Python is one of the few languages that has a balance of ease of use, ecosystem, ubiquity, and useable type system. It's a fantastic glue language and it's extremely flexible.
> A language must compile to efficient code, and we will add restrictions to the language (type stability) to make sure this is possible.
> A language must allow post facto extensibility (multiple dispatch), and we will organize the ecosystem around JIT compilation to make this possible.
> The combination of these two features gives you a system that has dynamic language level flexibility (because you have extensibility) but static language level performance (because you have efficient code)
Given those constraints, the first language that comes to mind is Java. Why is Java basically not a player in the scientific-computing game?
There’s also some unfortunate choices Java made like standardizing one specific semantics for reproducible floating point code. That’s unfortunate because adjusting for native SIMD widths sacrifices reproducibility but improves both accuracy and speed. The only choice if you want perfect reproducibility on all hardware that Java supports is the worst performance model with the worst accuracy.
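The reason SIMD width and reproducibility pull against each other is that floating point addition is not associative, so changing how a sum is grouped can change the last bits of the result. A rough Python sketch, where `lane_sum` is a made-up stand-in that only imitates how a vectorized loop would accumulate into several lanes:

```python
import math


def sequential_sum(xs):
    # Strict left-to-right accumulation: one fixed, reproducible order.
    total = 0.0
    for x in xs:
        total += x
    return total


def lane_sum(xs, lanes):
    # Imitates a SIMD reduction: lane i accumulates every `lanes`-th element,
    # then the per-lane partial sums are combined at the end.
    partial = [0.0] * lanes
    for i, x in enumerate(xs):
        partial[i % lanes] += x
    return sequential_sum(partial)


# Same three numbers, two groupings, two different doubles.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False

xs = [0.1] * 1000
print(sequential_sum(xs), lane_sum(xs, 4), lane_sum(xs, 8), math.fsum(xs))
```

Fixing one evaluation order everywhere gives bit-identical results, at the cost of not letting the hardware pick the grouping that suits its vector width.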
There’s also the fact that Java’s default integer type, `int`, is 32-bit and arrays are indexed by `int`, so a single array is capped at roughly 2^31 elements. That was reasonable when Java was designed, but it is pretty limiting for modern numerical computing.
I also think that the JVM object model is quite limiting for numerical computing. They still don’t support value types, but value types are precisely what you want to represent efficient numerical values like complex numbers, or quaternions or rationals, and so on. Java forces all user-defined types to be heap-allocated reference types. Julia solves this by defaulting to immutable structures, which is exactly what you want for numerical values: the semantics are still referential (if you can’t mutate you can’t distinguish value semantics from reference semantics), you just can’t change values, which is exactly how you want numbers to behave (you don’t want to be able to change the value of 2).
Lack of value types in Java also makes memory management unnecessarily challenging. You can’t make a user-defined type with an efficient C-compatible array layout in Java: the objects are references, so the array is an array of pointers to individual heap-allocated objects. The ability to subtype classes forces that, but even with final classes, the ability to mutate objects also forces it, since pulling an object reference out of an array and modifying it is required to modify the object in the array (reference semantics), and that’s incompatible with the inline array layout.
And finally, this same lack of value types puts a LOT of pressure on the garbage collector.
This is mostly true, but the primitives are value types and you can get some things done with them. (Not enough to make Java good for these use cases, no.) I.e. write float[] instead of Float[] and you have a contiguously allocated region of memory that can be efficiently accessed.
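The boxed-versus-inline distinction is easy to poke at from Python's standard library, used here purely as an analogy and not as a statement about JVM internals. `array.array` stores raw doubles back to back (roughly the `float[]` case), and a `ctypes` array of structures gives the C-compatible inline layout for a user-defined type. The `Complex` struct is made up for the sketch.

```python
import array
import ctypes
import sys

# Primitive case: raw 8-byte doubles stored contiguously.
floats = array.array("d", [1.0, 2.0, 3.0, 4.0])
print("bytes per element in array('d'):", floats.itemsize)

# Boxed case: a list holds pointers to separately allocated float objects.
print("bytes for one boxed float object:", sys.getsizeof(1.0))


class Complex(ctypes.Structure):
    # Hypothetical user-defined numeric type with a C-compatible layout.
    _fields_ = [("re", ctypes.c_double), ("im", ctypes.c_double)]


# Four structs laid out inline in one block, no per-element pointer.
zs = (Complex * 4)()
zs[1].re = 2.5  # writes straight into the shared buffer

print("bytes per struct:", ctypes.sizeof(Complex))
print("bytes for the whole array:", ctypes.sizeof(zs))
print("zs[1].re =", zs[1].re)
```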
Python was written with people like scientists in mind. Professionals write fast C libraries and then people who know just enough to get by use python to glue it all together.
But you can just write a simple 20 line Python script to do some data mangling, no project with 30 IDE files required.
Visual J++, Visual Cafe and JBuilder were the main ones but not everyone was eager to buy them, while the JDK was free beer.
Python doesn’t compile to machine instructions either and there’s nothing that prevents GPU access from Java. In fact I’d bet in many cases pure Java beats Python + C library though it obviously depends on how much has to be written in pure python.
See this talk for examples: https://www.youtube.com/watch?v=kc9HwsxE1OY
It’s PyTorch. If they said “the next version of PyTorch will be in Julia”, the ecosystem would shift accordingly.
They’re practically saying “this language has every feature we need and want, most of them already existing, but we’re going to continue re-inventing them in this objectively less suitable language because we clearly wish to make life harder for ourselves”
I have used MATLAB, R, Python and Julia extensively for doing all sorts of data related things during the last 20 years. Julia is incredibly easy to work with, very elegant and really efficient.
R and Python have always felt clumsy in some ways, and it is hard to write really performant code in them, even if you are more proficient in Python! As a seasoned Lisper and MLer, even with a lot of Python experience under my belt, Julia felt much easier to work with from the very beginning.
Furthermore, most Julia libraries are written in pure Julia which simplifies things a lot and enhances composability. While there are great libraries around, the DL landscape is a bit lacking. Flux is great, but I would not use it to build e.g. transformers as it changes too often and has too few maintainers behind it. Hence a potential migration of Torch to Julia would be fantastic.
PyTorch is not only easy, but is a joy to work with.
Among researchers, TensorFlow is rapidly losing ground to PyTorch, and, I think, will keep losing ground until it becomes a niche and only used by Googlers and some others.
Soumith Chintala had a keynote talk in juliacon where he focused on these points; https://www.youtube.com/watch?v=6V6jk_OdH-w
Another issue is pytorch/tf in python are very dominant in research/projects. Often we clone relevant recent projects and try experimenting with them to see if they help. Swapping to Julia would hurt a ton in that area.
edit: Also, while I'm fond of Python, I'd be very open to seeing another language win. There are language design choices I dislike in Python, but I like enough of the language, and its ecosystem has been too strong, for most other languages to be worth pondering. If Julia grows enough that my coworkers start asking for Julia support, I'd be happy to explore it. My pet preferred language is Crystal (Ruby-like readability + types + good performance) but ecosystem-wise it's tiny.
Numpy -> Array + broadcasting (both in Julia Base)
pytorch/tf -> Flux.jl (package)
batch/stream processing -> you don't need it as much, but things like OnlineStats exist. Also Base has multithreaded and distributed computing. Spark in particular is one where it lets you use a cluster of 100 computers to be as fast as 1 computer running good code.
pyarrow -> Arrow.jl (there are also really good packages for JSON, CSV, HDF5 and a bunch of others)
Let me know if you have any other questions. Always glad to answer!
PyTorch is a small part of the python ecosystem. The python ecosystem is not going to change at all if PyTorch moves to Julia.
But all the bullet points are about things that are easily done right now with libtorch (PyTorch's underlying C++ core), and the hassle is... Python.
Well, the rational conclusion would be: just do everything in C++, and bind to Python. Make C++ the first-class citizen here, since in all cases it'll be needed for performance, forever.
In other words, python binary wheels are harder to maintain than source-only python packages. And pytorch uses more than a few. I can't imagine Julia makes the problem much simpler. The main pain point is probably the lack of standard, multi-environment packaging solutions for natively compiled code.
I don't know what it would take for this sort of pain point to improve significantly. Some standards around how C, C++, and Fortran projects are packaged would help. This would allow projects to build on top of existing natively compiled tech a lot better. Maybe the biggest reason those languages don't have the same "ecosystem" as python is utter lack of packaging standardization.
Are you talking about something like BinaryBuilder.jl[1], which provides native binaries as julia-callable wrappers?
--
IMHO, one of the biggest advantages of Julia _is_ arrays.
There is the use of @ (but to signal macros), but otherwise, the syntax is much closer to a cross between Python and matlab except nicer for doing math.
I tried writing a few programs in Julia and got sucked in by how effective it is. The real surprise is that just a few weeks in instead of pulling up R to do a quick calculation my fingers decided they wanted Julia.
for example, check out parts of the stdlib: https://github.com/JuliaLang/julia/blob/master/base/operator...
but in the end, julia really is a lisp: https://www.youtube.com/watch?v=dK3zRXhrFZY
But, then what?
I could not use it anywhere I worked. The ecosystem was lacking.
Julia is good, but for what exactly?
People involved with Julia are always big with words, but when will I see it in use somewhere?
I have not seen good results from differential equation solvers in Go.
Glad to see functorch[3] as PyTorch is the library I have the most experience with.
[1] https://docs.hylang.org/en/alpha/
Some portions of the ecosystem are rock solid, especially the parts where JuliaComputing makes money from consulting (not all but some). Other parts are beds of sand/permanent research projects. The median experience is usually that someone points you to a package and it doesn't really do what you hoped it would, so you end up adapting it and rolling your own solution to a problem. Maybe you try to make a PR and it gets rejected because of "not invented here"/academia mindsets; either way you made a fix and your code works for you.
What makes this barrier hard to overcome for adoption is: trust, and blind spots. People who aren't experts in a casual work area (maybe computer vision) realize they can't use a tool to do something `basic` and run away to easier ecosystems (R/Python). People who are experts in other areas check the credentials of packages, see that an Ivy League lead researcher made it, and assume it's great and usable for a general audience. So you'll get a lot of "there's a package for that", but when you go to use it you might find the package is barren for common and anticipatable use cases in industry (or even hobbies).
This makes Julia best positioned as a research tool, or as a teaching tool. Unfortunately, where Julia actually shines is as a practical tool for accomplishing tasks very quickly and cleanly. So there's this uncomfortable mismatch between what Julia could be and what it's being used for today. (Yes, Julia can do both; not arguing against it.) The focus on getting headlines far surpasses the focus on stable, useful stuff. In fact, very often after a paper gets published using Julia, a package's syntax will completely change, so no one really benefits except for the person who made the package.
Interestingly, 1 person(with some help of course) fleshed out the majority of the ecosystems need for interchange format support(JSON), database connections, etc. It's not like that person is jobless spending all their days doing it - it was a manageable task for a single smart person to kick off and work hard to accomplish. Why? Because Julia is amazing for quickly developing world class software. That is also kind of its detriment right now.
Because it's so easy to create these amazing packages, you'll find that a lot of packages have become deprecated or are undocumented. Some researcher just needed a one-off really quickly to graduate, and maybe the base language (or other parts of the ecosystem) changed many times since its release. Furthermore, if you try to revitalize one of these packages you'll sometimes find a rat's nest of brilliance. The code is written very intelligently, but unpacking the design decisions to maintain world class performance can be prickly at best.
One of Julia's strengths is that it's easy/clean to write fast enough code. One of its downsides is that this attracts people who focus on shaving nanoseconds from a runtime (sometimes needlessly) at the expense of (sometimes) intense code complexity. Performance is important, but stable and correct features/capabilities mean more to the average person. After all, this is why people use, pay for, and hire for Matlab, Python and R in the first place, right?
Most people don't want to have to figure out which ANOVA package they should use. Or find out in a bad way some weird bug in one of them and be forced to switch. Meanwhile in R: aov(...).
Do I blame Torch for not using Julia? No. Should they consider using it? Yes, absolutely. Does Julia's cultural issue need attention before risking Python(or anything else) reinventing a flavor of Julia that's more widely used for stability reasons alone - in my opinion, yes (see numba, pyjion, etc). Still love the language, because technologically it's sound, but there are blemishes. I'd chalk it up to growing pains.
(To be fair, Postgres has an extremely similar issue with JSON data types and it's doing fine.)
The state of tabular data formats is similar, but instead of 2 libraries there are 20, and some of them are effectively deprecated, but they're not marked as deprecated, so the only way to find out that you shouldn't be using them is, again, to ask a question about them in Discourse or Slack. You can check the commit history, but sometimes they'll have had minor commits recently, plus (to Julia's immense credit) there are some libraries that are actively maintained and work fine but haven't had any commits for 3 years because they don't need them. I assume this will get worse before it gets better as the community tries to decide between wrapping Polars and sticking to DataFrames.jl, hopefully without chopping the baby in half.
I feel like the "not invented here" mindset contributes a lot to that fragmentation. It's easy to write your own methods for types from other Julia libraries because of multiple dispatch, which seems to have resulted in a community expectation that if you want some functionality that a core package doesn't have, you should implement it yourself and release your own package if you want to. So we have packages like DataFramesMeta.jl and SplitApplyCombine.jl, not to mention at least 3 different, independent packages that try (unsuccessfully IMO) to make piping data frames through functions as ergonomic as it is in R's dplyr.
Despite all of this, I still like the language a lot and enjoy using it, and I'm bullish on its future. Maybe the biggest takeaway is how impactful Guido was in steering Python away from many of these issues. (The people at the helm of Julia development are probably every bit as capable, but by design they're far less, um, dictatorial.)
Again, completely agree with the sometimes confusing state of the ecosystem. Sometimes I wish a bit of democracy existed, but people are people. I proposed some solutions to that problem a while ago but that's a story for another year.
Academia does create a very different kind of reward system that is often counter to community progress. IE: get there first, publish, obfuscate to thwart competition, abandon for new funding. Tends to reward people the highest for not giving credit, or sharing progress.
Meanwhile, people relying on alternatives to julia are more like: load in trusty xyz, use it in trusty way, I'll upgrade when it makes sense, and check the docs not the code when I am unsure of something.
Not to say industry is much better(I keep saying `academia`), but industry projects do tend to appreciate/honor free labor a little more kindly. That or they close the OSS gate and you get what you get.
Novelty is a driving force, but too much entropy and not playing well with each other can destroy a meaningful future quickly. It'll work itself out, one way or another but only because the technology is good :D.
With how quickly these frameworks change, it's overwhelming to keep pace! Anyone have advice for solid frameworks that can reasonably leverage GPUs without too much heavy lifting?
What has not been accounted for is that the huge community / network effect of the python ecosystem is very far from exhausting itself. If anything, it is just starting as the exponential growth has mostly been the last few years (tautology, he he)
A major investment to eliminate python technical debt would make more sense if things were stagnant and the re-engineering would open up entirely new domains.