Correctness and composability bugs in the Julia ecosystem

(yuri.is)

699 points · benjojo12 · 4y ago · 407 comments

Everything has correctness issues somewhere. Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods. It has its own implementations of things like software-side FMA because the FMA implementation on Windows is incorrect: https://github.com/JuliaLang/julia/pull/43530 . Core Julia devs are now the maintainers of things like libuv because of how much had to be fixed there. Taken together, those three points show that there are plenty of cases where Python, R, etc. code is incorrect and Julia's isn't.

I think what's interesting about Julia is that because the code is all Julia, it's really easy to dig in there and find potential bugs. The standard library functions can be accessed with @edit sum(1:5) and there you go, hack away. The easier it is to look at the code, the easier it is to find issues with it. This is why Julia has such a high developer-to-user ratio. That has its pros and cons of course. It democratizes the development process, which means that people who don't have a ton of development experience (plus Fortran or C knowledge) are not excluded from contributing. Is that good or bad? Personally I believe it's good in the long run, but it can have its bumps.

As an aside, the author highlights "for i in 1:length(A)". I agree, code should never do that. It should be `eachindex(A)`. In general, code should use iterators that are designed for arbitrary indexing. This is true in any language, though you'll always have some newcomers who write code (and documentation) this way. Even experienced people who don't tend to use arrays beyond Array tend to do this. It's an interesting issue because coding style issues perpetuate themselves: explicitly using 1-based indexing wasn't an issue before GPUs and OffsetArrays, but then loop code like that trains the next generation, and so more people use it. In the end the people who really know how to handle these cases are the people who tend to use these cases, just like how people who write in styles that are ARM-safe tend to be people who use ARM. Someone should just run a bot that opens a PR for every occurrence of this (especially in Base), as that would then change the source that everyone learns from and completely flip the style.

Strilanc · 4y ago

> Everything has correctness issues somewhere.

This is the fallacy of gray. The blog post isn't complaining that there are non-zero bugs, it's complaining that when you use the language you hit a lot of correctness bugs. More bugs than you'd hit using e.g. Python.

Also, to the extent that Julia uses LLVM, a correctness bug in LLVM is also a correctness bug in Julia. So arguing "LLVM has lots of correctness bugs" is not helping the case...

> because the code is all Julia, it's really easy to dig in there and find potential bugs.

The blog post is about bugs hit while running code, not bugs found while reading code. The fact the issue can be understood and pointed at is great, but it's the number of issues being hit that's the problem.

suavesito · 4y ago

> So arguing "LLVM has lots of correctness bugs" is not helping the case

It does not help the case about the correctness of Julia, but it does help the case about Julia having more bugs than other software (negatively for the other projects). Every library built with LLVM that touches those code paths will have those bugs.

Another thing to have in mind is that Julia ships patches for some of these, that are not used upstream yet. So Julia does not suffer from some bugs on LLVM that other projects might.

ChrisRackauckas · 4y ago

It shows that Julia's tests are systematically finding (and leading to fixes for) numerical bugs that are pervasive throughout the rest of the LLVM ecosystem. And since Julia's LLVM is patched to solve these while other variants of LLVM are not, Julia is more correct in these aspects than other languages which rely on the base build of LLVM. Of course Julia doesn't solve "all bugs", but some of them (like the correctness of certain math library implementations) really make you question how hard other languages' tests are hammering those for correctness (Julia has a lot of numerical tests checking the precision of such methods against MPFR bigfloats at higher precision to ensure ~X ulp correctness, for example). Julia definitely spends a lot more time testing numerical correctness than it does testing something like a web server. It's just a prioritization thing.

mbauman · 4y ago

I do think there's a particularly unique challenge to Julia in that so many packages can theoretically coexist and interoperate. While it quadratically increases the power of Julia, it also quadratically increases the surface area for potential issues. That — to me — is the most interesting part of the blog post. How can we help folks find the "happy" paths so they don't get lost in the weeds by trying to differentiate a distributed SVD routine of an Offset BlockArray filled with Unitful Quaternions? And — as someone who worked with and valued Yuri's reported issues and fixes — how can I more quickly identify that they're not someone who gets joy out of making such a thing work?

p33p · 4y ago

Good comments, Chris. I think the author has a little bit of nuance in that Julia isn't correct in the specific use cases he needs it to be. Your point is also well taken that Julia is correct in cases where other languages aren't.

I'm a little unfamiliar with the versioning in the package ecosystem, but would you say most packages follow or enforce SemVer? Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?

ChrisRackauckas · 4y ago

> but would you say most packages follow or enforce SemVer?

The package ecosystem pretty much requires SemVer. If you just say `PackageX = "1"` inside of a Project.toml [compat], then it will assume SemVer, i.e. any version 1.x is non-breaking and thus allowed, but not version 2. Some (but very few) packages do `PackageX = ">=1"`, so you could say Julia doesn't force SemVer (because a package can say that it explicitly believes it's compatible with all future versions), but of course that's nonsense and there will always be some bad actors around. So then:

> Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?

That's not the issue. As above, the dependency graphs are very strict. The issue is always at the periphery (for any package ecosystem really). In Julia, one thing that can amplify it is the fact that Requires.jl, the hacky conditional dependency system that is very not recommended for many reasons, cannot specify version requirements on conditional dependencies. I find this to be the root cause of most issues in the "flow" of the package development ecosystem. Most packages are okay, but then oh, I don't want to depend on CUDA for this feature, so a little bit of Requires.jl here, and oh let me do a small hack for OffsetArrays. And now these little hacky features on the edge are both less tested and not well versioned.

Thankfully there's a better way to do it by using multi-package repositories with subpackages. For example, https://github.com/SciML/GalacticOptim.jl is a global interface for lots of different optimization libraries, and you can see all of the different subpackages here https://github.com/SciML/GalacticOptim.jl/tree/master/lib. This lets there be a GalacticOptim and then a GalacticBBO package, each with versioning, but with tests being different while allowing easy co-development of the parts. Very few packages in the Julia ecosystem actually use this (I only know of one other package in Julia making use of this) because the tooling only recently was able to support it, but this is how a lot of packages should be going.

The upside too is that Requires.jl optional dependency handling is by far and away the main source of loading time issues in Julia (because it blocks precompilation in many ways). So it's really killing two birds with one stone: decreasing package load times by about 99% (that's not even a joke, it's the huge majority of the time for most packages which are not StaticArrays.jl) while making version dependencies stricter. And now you know what I'm doing this week and what the next blog post will be on haha. Everyone should join in on the fun of eliminating Requires.jl.

kazinator · 4y ago

> Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods

Sounds like the banana ships with the gorilla which requires the entire jungle, and we're too busy fixing the gorilla to give the banana our undivided attention.

nightpool · 4y ago

I'll be honest, based on my experience with Julia, this makes me more worried about using e.g. libuv in production systems now, not less. I understand your opinion that "The easier it is to look at the code, the easier it is to find issues with it", but I don't think that has anything to do with the fact that `prod((Int8(100), Int8(100)))` and `prod([Int8(100), Int8(100)])` disagree, because someone decided to special-case tuple multiplication. And to make it even worse, this bug was even documented(!) in the comments by whoever committed the original code:

   # TODO: this is inconsistent with the regular prod in cases where the arguments
   # require size promotion to system size.

How did this pass code review? Why would it be okay for a standard library function to be "inconsistent" in this way?

(EDIT: Since writing this comment, I've realized that (100 * 100) % 256 is in fact 16, so the results are a little less inexplicable to me. I think having the types annotated in the REPL would have made it clearer what was going on, and it's still a very difficult inconsistency to debug, especially as an end user)
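The arithmetic in that edit can be checked directly. This is a minimal sketch, assuming a reasonably recent Julia release; the variable names are illustrative only, and the tuple `prod` result is deliberately not shown because it depends on the Julia version.

```julia
a = Int8(100)

# Int8 * Int8 stays in Int8 and wraps around modulo 2^8.
wrapped = a * a

# Widening to the system Int first gives the mathematically expected product.
widened = Int(a) * Int(a)

# The wrapped value is the true product reduced modulo 256.
@assert widened == 10_000
@assert mod(widened, 256) == 16
@assert wrapped == 16 && wrapped isa Int8

# prod over an array of small integers widens to Int before multiplying.
@assert prod([a, a]) == 10_000
```

So neither result is random: one path keeps the element type and wraps, the other promotes to the system integer size first.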

I also think your argument that "[...] you'll always have some newcomers write code (and documentation)" that is broken is completely incorrect, and it shifts the blame from providing a safe and easy-to-use system from the language authors onto the users. The OP goes to pains to point out that this was not just an issue of "some newcomers"—it was a fundamental issue across the entire community, including what seem to be some of the most heavily-used packages in Julia's ecosystem, including Distributions.jl and StatsBase.jl. It's deeply misleading to blame issues like that simply on "people who don't have a ton of development experience" and "newcomers writing documentation", and it indicates a lack of responsibility and humility from Julia's proponents.

P.S: You're correct that the documentation about @inbounds was written by someone who was new to the language (https://github.com/JuliaLang/julia/pull/19726). But in fact the example itself was copied over entirely as-is from devdocs, where it was written by the author of the boundschecking feature(!) https://github.com/JuliaLang/julia/pull/14474. And it was only fixed last year. And the entire docs PR was reviewed thoroughly by two core team members, with lots of changes and suggestions—but nobody noticed the index issue. So I don't think you can blame this one on newcomers.

cmcaine · 4y ago

Julia released experimental support for arrays whose indexes don't start at 1 in Julia 0.5, October 2016.

The boundschecking feature was added in 2015, so at the time they wrote their code and examples, they were correct.

The documentation and review happened in December and January 2016/2017 when the non 1-based indexing was still experimental and very new, so I don't think this is as big a fail as you've made out either.

Yes, the documentation should have been updated when non-standard indexing was made non-experimental, and the reviewers should maybe have noted the new and experimental array indexing stuff, but it's only natural to miss some things.

leephillips · 4y ago

Just to clarify, the prod() bug you mention was fixed about a year ago.

nalimilan · 4y ago

The problem in this case (as with most issues regarding `@inbounds`) is that this text was written before arrays with non-standard indices existed in Julia. So the example was correct at the time it was written, just like the StatsBase code was correct. Old code needs careful checking to fix all these occurrences.

nightpool · 4y ago

Discussed in a sibling thread: https://news.ycombinator.com/item?id=31401155.

kazinator · 4y ago

> I agree, code should never do that. It should be `eachindex(A)`

Will that generate the same code as "i in 1:length(A)"?

Maybe whoever wrote that didn't believe so at least, or perhaps didn't find it so at the time.

The reason @inbounds would have been used is performance, so that's likely why the for loop header was written that way?

mbauman · 4y ago

`eachindex` is — in quite a few situations — faster than `1:n`.

We've also been trying to promote a culture of not blindly putting `@inbounds` notations on things as the compiler gets smarter. `@inbounds` is a hack around a dumb compiler, especially when the loop is as simple as many of these examples. It's not needed there anymore (but was 5 years ago).

bjourne · 4y ago

Perhaps that is part of the point of the article? If you accept things like @inbounds, which is a horrible hack and was a horrible hack five years ago, then perhaps the culture is a little too tolerant towards horrible hacks. Because many of the bugs the author enumerates are of the "fixes the problem for now, let's deal with the consequences later" type.

kazinator · 4y ago

The question is: is it at least as fast in all situations? Was it always that way?

The 1 to length loop just has to initialize a local variable and step it; it cannot do anything else. It doesn't worry about the kinds of array that A may be, with its particular configuration of indexing, right?

You may promote a culture of not doing certain things, but that by itself won't make those things disappear from existing code.

Say you're trying to ship some product and you receive a bulletin from the language mailing list encouraging you, "try not to use @inbounds, it's a hack around a dumb compiler". You know you have that in numerous places; but you're not going to stop what you're doing and start removing @inbounds from the code base. If you're remarkably conscientious, you might open a ticket for that, which someone will look into in another season.

cbkeller · 4y ago

I think it should be fine for performance AFAIU to use `eachindex` instead; at least I know `eachindex` plays nicely with LoopVectorization.jl with no performance costs there.

That said, I think you're exactly right that people may wonder just this and use the seemingly "lower-level" form out of concern with or without testing it.

rcthompson · 4y ago

Is "for i in 1:length(A)" ever correct? Should Julia just emit a warning any time it encounters that pattern? Or maybe something slightly more complicated, such as that pattern followed by usage of i to index into A inside the loop?

TimTheTinker · 4y ago

> Is "for i in 1:length(A)" ever correct?

In some rare cases, it very well might be exactly what the code's author intended and needed.

I tend to lean towards what Martin Fowler calls an "enabling attitude"[0] (as opposed to a "directing attitude") -- that is, when faced with a choice about how to design the primitives of an interface, I lean more often towards providing flexibility, and I try to avoid choosing ahead of time what users aren't allowed to do. It's better to document what's usually the wrong way to do something than to enforce it in the design. You can never guess what amazing things people will create when they are given flexible, unrestricted primitives.

So for cases like this, I think it's better to rely on a flexible linting tool (if available) than warnings or errors.

[0] https://martinfowler.com/bliki/SoftwareDevelopmentAttitude.h...

dan-robertson · 4y ago

Why not have a feature to allow you to turn off the warning? E.g. have something recognise 1:length(x) and complain unless you write e.g. @nowarn eachindex before it.

rashidrafeek · 4y ago

It is correct if `A` is of type `Array` as normal Array in julia has 1-based indexing. It is incorrect if `A` is of some other type which subtypes `AbstractArray` as these may not follow 1-based indexing. But this case errors normally due to bounds checking. The OP talks about the case where even bounds checking is turned off using `@inbounds` for speed and thus silently gives wrong answers without giving an error.
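The difference between `Array` and other `AbstractArray` subtypes described here can be sketched without any packages. `ZeroBased`, `sum_naive` and `sum_generic` below are hypothetical names made up for illustration (real code would reach for OffsetArrays.jl), and the sketch assumes a recent Julia 1.x release.

```julia
# Hypothetical minimal vector type whose valid indices are 0:n-1.
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(A::ZeroBased) = size(A.data)
Base.axes(A::ZeroBased) = (0:length(A.data)-1,)
Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
Base.getindex(A::ZeroBased, i::Int) = A.data[i + 1]

# Assumes the indices of A are 1:length(A).
function sum_naive(A)
    s = zero(eltype(A))
    for i in 1:length(A)
        s += A[i]
    end
    return s
end

# Asks the array for its own indices.
function sum_generic(A)
    s = zero(eltype(A))
    for i in eachindex(A)
        s += A[i]
    end
    return s
end

v = [10, 20, 30]
A = ZeroBased(v)

@assert sum_naive(v) == sum_generic(v) == 60   # plain Array: both are fine
@assert sum_generic(A) == 60                   # visits indices 0, 1, 2

# The naive loop skips index 0 and then reads index 3, which is out of
# range, so bounds checking turns the mistake into an error.
threw = try
    sum_naive(A)
    false
catch err
    err isa BoundsError
end
@assert threw
```

With bounds checking on, the mistake is loud. If the loop in `sum_naive` were wrapped in `@inbounds`, the out-of-range read would no longer be guaranteed to throw, which is the silent-wrong-answer case the post describes.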

An issue was created sometime ago in StaticLint.jl to fix this: https://github.com/julia-vscode/StaticLint.jl/issues/337

cmcaine · 4y ago

It's correct if you want to do something `length(A)` times and want an iteration counter, but it's never better than `for idx in eachindex(A)` if what you actually want are indexes into A (which is of course the much more common case).

Julia did not initially support arrays that aren't indexed from 1 (experimental support added in Julia 0.5, I don't know when it was finalised), and at that time I'm not even sure we had something like eachindex, certainly there would be no reason why someone would use it for an array.

a1369209993 · 4y ago

> Is "for i in 1:length(A)" ever correct?

Yes, actually. While I have approximately zero knowledge of Julia specifically, a language-independent example might be:

  B = OneBasedArray(length(A))
  A_ = iter(A)
  for i in 1:length(A) { B[i] = pop(A_) }
  assert(iter_isdone(A_))

And if that looks contrived... yes; it is contrived.

> that pattern followed by usage of i to index into A inside the loop?

I can't think of any legitimate uses for that, but there probably are some; make sure to allow:

  len = length(A)
  for i in 1:len ...

as a `if( (x = foo()) )`-style workaround.

poulpy123 · 4y ago

Why allow iterating with 1:length(A) if it's not the right way?

cmcaine · 4y ago

I don't think there's any clean way to stop that at a language level (some languages prevent this by disallowing random access to arrays, but that's a non-starter for a performance-oriented language), and also it would be a massively breaking change.

adgjlsfhk1 · 4y ago

you can't disallow it at a language level since either way, you are just indexing with Ints. That said, we can add better linting rules to catch stuff like this.

grumpyprole · 4y ago

> Everything has correctness issues somewhere.

Yes but Julia is (yet another) dynamic language, presumably for "ease of use". A language with static types would have made it easier to build correct software (scientific code in e.g. OCaml and F# can look pretty good). Julia chose a path to maximize adoption at the expense of building a reliable ecosystem. Not all languages choose to make this trade-off.

pron · 4y ago

> A language with static types would have made it easier to build correct software

This claim is repeated often, but numerous attempts have failed to demonstrate that this is generally the case in practice (there have been a couple of studies showing an effect in very specific circumstances). Static types might indeed assist with correctness, but they are not the only thing that does, and in some situations they could come at the expense of others. I.e., even if types were shown to significantly help with correctness, it does not follow that if you want correctness your best course would be to add types.

Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one (if it were big, detecting it would have been easy).

Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia. That's not to say that Julia doesn't suffer from too many correctness issues (I have no knowledge on the matter), but even if it does, there is little support for the claim that typing is the most effective solution.

ThenAsNow · 4y ago

We can trade anecdotes on this topic, but I've written numerical code in OCaml and also Julia. The strictness of OCaml's type system is painful in a numerical context but for virtually all other things it is awesome to pass code into the interpreter/compiler and catch structural problems at compile-time rather than maybe at runtime.

OCaml's type system is almost certainly not the right model for Julia but the ad-hoc typing/interface system Julia currently employs is at strong odds with compile-time correctness. There's almost certainly some middle ground to be discovered which might be unsound in a strict sense but pragmatically constrains code statically so there is high likelihood of having to go out of your way to pull the footgun trigger.

You can see how little type annotations are used in practice in major Julia libraries. It should be integral to best practice in the language to specify some traits/constraints that arguments must satisfy to be semantically valid, but what you often see instead is a (potentially inscrutable) runtime error.

grumpyprole · 4y ago

> Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one.

Which use cases, languages and static type systems are you referring to? The context is very important, especially when seeking to draw general conclusions from empirical studies.

As someone who has previously posted extolling the merits of static analysis, I'm very surprised at your position regarding static types. Static types help to constrain a language and enable reasoning, either by additional static analysis or otherwise.

It is precisely the flexibility of dynamic languages that makes them difficult to reason about and difficult to build correct software in. This is why the use of dynamic languages is mostly banned in the defense industry.

Static types clearly help with composition (one of the complaints with Julia), especially at scale. How many academic empirical studies considered multimillion-line code bases? I submit for evidence a lot of expensive type-retrofitting projects such as Facebook Hack, Microsoft Typescript or Python types, which demonstrate that many companies have or had real problems with dynamic languages at any kind of scale.

jolux · 4y ago

> Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia.

You always make this argument when discussing PL features and I find it irksome. People get along fine without this feature, therefore there’s no sense in implementing it. But it cuts the other way, or we’d all still be using assembly. How many Matlab users know things could be better? Was the superiority of structured programming and avoiding GOTO ever empirically proven, or did we all just collectively realize it was a good idea?

StefanKarpinski · 4y ago

In particular, not a single issue mentioned in this article would have been prevented by static type checking.

guenthert · 4y ago

Julia allows you to specify the type of a datum if you feel the need (not unlike Common Lisp). Are any of the bugs the author mentioned related to the type system?

mattkrause · 4y ago

I'm surprised at this critique, as I thought Julia's type system was often considered to be one of its strongest features.

ThenAsNow · 4y ago

So, I really respect what you've done (for those who don't know, Chris is the original developer and lead of DifferentialEquations.jl) and use your work heavily. However, understanding and writing idiomatic Julia, especially with these large packages, is severely hampered by the documentation culture.

A prior comment I made, all of which seems unaddressed to me three years later: https://news.ycombinator.com/item?id=20589167

To be fair, I've only submitted a small documentation patch for a package and haven't significantly "put my money where my mouth is" on this topic. But I hope the next time there are thoughts among the core team about what is the next capability to add to the language, addressing this deficiency is prioritized.

ChrisRackauckas · 4y ago

FWIW, I posted the other month that I'm looking for any devs who can help with building multi-package documentation for SciML, since I don't think the "separate docs for all packages" approach ends up being helpful when the usage is intertwined. SciML is looking for anyone willing to help out there (and there's a tiny bit of funding, though "open source sized" funding). In the meantime, we're having a big push for more comprehensive docstrings, and will be planning a Cambridge area hackathon around this (follow https://www.meetup.com/julia-cajun/ if you are interested in joining in).

As for high level changes, there's a few not too difficult things I think that can be done: https://github.com/JuliaLang/julia/issues/36517 and https://github.com/JuliaLang/julia/issues/45086 are two I feel strongly about. I think limiting the type information and decreasing the stack size with earlier error checking on broadcast would make a lot of error messages a lot more sane.

KKKKkkkk1 · 4y ago

FMA can't be broken on Windows because FMA is implemented in hardware by Intel. What's broken is the compiler that Julia uses on Windows.

ChrisRackauckas · 4y ago

When FMA isn't in the hardware (due to using some chip where it doesn't exist) it has a fallback to a software-based emulation. That is incorrectly implemented in Windows. Julia ends up calling that in this case because that's what LLVM ends up calling, and so any LLVM-based language will see this issue.
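For context on why an incorrect software fallback matters: a fused multiply-add rounds only once, so it can return low-order bits that a separate multiply and add lose. The following is a minimal sketch, assuming a platform where Julia's `fma` is correctly rounded; the specific values are chosen purely for illustration.

```julia
x = 1.0 + ldexp(1.0, -30)   # 1 + 2^-30, exactly representable in Float64
y = 1.0 + ldexp(1.0, -29)   # 1 + 2^-29, also exactly representable

# Exact product: x*x = 1 + 2^-29 + 2^-60. The 2^-60 term is smaller than
# half the spacing of Float64 values near 1.0 (2^-53), so the separately
# rounded product is exactly y and the subtraction gives zero.
separate = x * x - y

# fma evaluates x*x - y with a single rounding, so the 2^-60 term survives.
fused = fma(x, x, -y)

@assert separate == 0.0
@assert fused == ldexp(1.0, -60)
```

A software emulation that internally rounds twice would return `0.0` for `fused` as well, which is the kind of wrong answer a broken fallback can produce.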

celrod · 4y ago

Even when FMA is implemented in hardware, LLVM will generally use the software version when the arguments are known at compile time.

stephencanon · 4y ago

FMA is only implemented in hardware on Haswell and later uArches. If you’re running on (or compiling for) IVB or earlier, you’ll get a libcall instead, and MSVC’s has been broken since forever.

Diggsey · 4y ago

Is this actually broken in MSVC, or is it broken because Julia is using mingw and linking to an ancient version of libc on windows (which is intentionally left as-is for back-compat)?

(I genuinely don't know, but the linked issue mentioned mingw specifically)

adgjlsfhk1 · 4y ago

the problem is that LLVM will happily miscompile fma instructions by turning them into incorrect constants due to windows having a broken libm. This is a bug in C/C++, and I'm currently unaware of a language that has fma and a good compiler which gives correct fma results on Windows.

Const-me · 4y ago

CPUs have supported these instructions for 9 years now. When ignoring these old CPUs, most languages and compilers are usually doing a good job. Example in C which does not depend on any library functions:

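    #include <immintrin.h>  /* FMA3 intrinsics; build with -mfma (GCC/Clang) or /arch:AVX2 (MSVC) */

    /* Scalar fused multiply-add a*b + c using the hardware instruction directly. */
    double fma( double a, double b, double c )
    {
        __m128d av = _mm_set_sd( a );
        __m128d bv = _mm_set_sd( b );
        __m128d cv = _mm_set_sd( c );
        return _mm_cvtsd_f64( _mm_fmadd_sd( av, bv, cv ) );
    }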

dekhn · 4y ago

Wait, are those examples real?

I remember complaining about 1-based indexing only to be told "julia is great! we have offsetindex". If it's a source of bugs, that ... greatly reduces my future interest in adopting the language.

orbifold · 4y ago

I was bitten while trying to figure out how to combine units of measurement with other numerical computations. Ultimately a lot of the features look great on paper, but once I started using them, I was only ever able to produce an ungodly mess instead of what I could accomplish in Python in roughly the same time. Everything that goes beyond what Matlab does sometimes looks great on paper but is not very pleasant to use, and is sometimes badly broken, unfortunately. That being said, I work in an area of scientific research where Julia, or more specifically DifferentialEquations.jl, would seem to walk away with the win, but I find myself searching for alternatives implemented in JAX.

I would still think most of this is my failings, but it is also extraordinarily hard to figure out what is going wrong.

ChrisRackauckas · 4y ago

Anything other than units? I'd be curious to know. Unitful.jl is something which I think is completely the wrong architecture (it violates many standard assumptions about arrays when used in arrays) so that's a somewhat special case (and I plan to create a new units library to completely remove uses of Unitful).

orbifold · 4y ago

I greatly appreciate your work, by the way, I just have not been able to make use of it effectively. Some of the problems revolve around the question of how best to proceed in a situation where you have a high-dimensional state space that is naturally partitioned in some way. There are several solutions for this (SubArray etc.), but the burden is entirely put on the user. I had the impression that there was a tension between what I would have preferred to write and what I could easily write as soon as I attempted to generalize from the examples that I could find. With JAX the corresponding libraries operate on Pytrees, and as a user you can specify equations pretty naturally and easily without much fuss. If you want to use XLA, MPI, or CUDA in Julia, it is typically the same: in theory it should be possible to make things work, but in practice I have struggled tremendously to do anything productive.

When I last looked at the adjoint event handling code, I couldn’t figure out from the implementation whether the general case was handled correctly, especially since parts of it still seemed in flux. Writing similar code in JAX leaves close to no room for doubt about whether the code is correct. I am sure most of it is down to familiarity. But since ultimately I want to do ML-related things, right now JAX and related libraries tie together the things that are there much more neatly, even though overall SciML implements a more comprehensive set of techniques. I am still closely following the work around it, especially in the area I am interested in, and have some prototypes written in it, but it just hasn’t clicked yet.

wnoise · 4y ago

I highly recommend taking a look at George Hart's work on linear algebra: http://georgehart.com/research/multanal.html , although I do think he misses the point (or at least insufficiently emphasizes) that you almost never want to work with linear maps that cannot be described by a set of units on each axis that are multiplied together to get the entries.

JanisErdmanis · 4y ago

I am curious: what are those many standard assumptions about arrays that Unitful violates? There does not seem to be any room for alternatives to representing each element with a struct:

    struct Element{T, Unit} <: Number
        value::T
    end

which is placed in the array.

patrickkidger · 4y ago

You may already know of it, but if you want differential-equations-in-JAX then allow me to quickly advertise Diffrax: https://github.com/patrick-kidger/diffrax (of which I am the author, disclaimer).

orbifold · 4y ago

Yes I am aware :) it is missing a few things but I might end up contributing.

forgotpwd16 · 4y ago

>If it's a source of bugs, that ... greatly reduces my future interest in adopting the language.

It can be a source of bugs because some/many packages incorrectly assume that what you pass is 1-based indexed.

karmakaze · 4y ago

I was wondering if the 1-based arrays (and option to change index base) would factor into this.

> OffsetArrays in particular proved to be a strong source of correctness bugs. The package provides an array type that leverages Julia’s flexible custom indices feature to create arrays whose indices don’t have to start at zero or one.

Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

coldtea · 4y ago

>Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Because starting with 0 is neither math nor array indexing in general.

It's just how the base address of an array pointer memory block was referenced in C (and it spread from there).

Which is why all math focused languages use 1-based (fortran, apl, matlab, r, mathematica, etc.)

Rayhem4y ago

> Because starting with 0 is neither math nor array indexing in general.

It very, very much is. Polynomials all start at a zero "index", as does just about every expansion I can think of (Fourier, Bessel, Legendre, Chebyshev, Spherical Harmonic, etc.) Combinatorics, too, make lots of use of zero indices and zero-sized sets. As for arrays, I'll leave it to Dijkstra[1] to explain why zero indexing is most natural. Zero indexing overwhelmingly makes the most sense in both math and computers because indexing is a different operation than counting.

[1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...
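A minimal sketch of the polynomial point, written in Python purely for illustration (the function name `eval_poly` is made up for this example): when coefficients are stored zero-based, the entry at index k is the coefficient of x^k, so the index and the exponent coincide with no shifting.

```python
def eval_poly(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... where coeffs[k] multiplies x**k."""
    return sum(c * x**k for k, c in enumerate(coeffs))


# 2 + 0*x + 3*x**2 evaluated at x = 2
value = eval_poly([2, 0, 3], 2)
```

With one-based storage the same loop would need `x**(k - 1)` or an equivalent shift somewhere.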

jacobolus4y ago

Math (usually) uses 1-based indexes because those parts of math started before the concept of zero as a number, and then the convention persisted, even down to Matlab.

There are many similar path-dependent conventions in human culture. E.g. percentages originated before the concept of decimal fractions, base-sixty time units come from ancient Mesopotamia, and conventions about multi-dimensional array memory layout are based on the convention for drawing matrices on paper.

Most common mathematical sequences and series work better (more naturally/clearly) when zero indexing is used instead, and off-by-1 errors are a problem in mathematics just like computing (but less of a problem, because notation errors get silently corrected in readers’ heads, and don’t actually have to be interpreted strictly).

lapinot4y ago

Math traditionally has had some bad notation from a formal point of view, because humans are good at coping with bad notations (or going back and forth between variants), unlike machines and formal systems. Computer science being a (more) formal science (vs math which is overwhelmingly not done formally), it has criticized some traditional math notations that are ad hoc and not nicely formalizable (and put forward variants that are actually better behaved in terms of mathematical structures).

For indices: indices are about referencing elements of finite ordered sets, say of size N. Hence the 'abstract' indexing set for N elements is the ordinal N. The most canonical way to represent it is to take the length-N prefix of the natural numbers (i.e. 0-based indexing, von Neumann ordinals), which happens to have all sorts of additional structure (e.g. mod-N arithmetic). This is also consistent with the offset view (the i-th element is at offset i). The fact that people tend to start ordinal numbers at 1 doesn't change the fact that mathematicians working with ordinal numbers take them to start at 0, for the same reason we start naturals at 0.

See also: notation for higher derivatives https://arxiv.org/abs/1801.09553; a bit further but in the same vein: notations for free variables in programs as de Bruijn indices (or some variant thereof) (it's further because it's practical for doing proofs, but not for writing concrete terms). There are probably other instances.

_vdpp4y ago

It makes iteration less error-prone too when the index of the last element is equal to the length of the array. In C it’s pretty easy to iterate past the end of an array if you use <= by mistake in a for loop, or forget a “length - 1” somewhere.

xdavidliu4y ago

> It's just how the base address of an array pointer memory block was referenced in C (and it spread from there).

There was also the famous 1-pager by Dijkstra: "Why numbering should start at zero"

https://news.ycombinator.com/item?id=777580

SeanLuke4y ago

> C (and it spread from there).

BCPL.

karmakaze4y ago

I don't read much about users of modern languages with 0-based indexing requesting 1-based options/alternatives.

davisoneee4y ago

I'll start by saying that I greatly prefer 0-based, and have used both 0- and 1-based indexing, but the choice is largely arbitrary.

0 makes sense as the '0-th offset' when thinking from a pointer perspective, but I often find when teaching, that 1-based comes more naturally for many students (the 'first' item).

You mention mathematical or scientific work...but I often/mainly see enumerations (such as weights x_1, x_2, ... x_n or SUM 1 to N) start with 1, so for these 1-based can be a more natural/direct translation of mathematical notation to code.

rocqua4y ago

My experience is that 0-based offsets (and use of < or even != for upper bounds) mean that I should almost never have to write something like idx - 1 or idx + 1.

I came to 0-based offsets later in my career, having started with Matlab. So I have some real experience with 1-based offsets. Experience that was 'untainted' by being used to a different option. I much prefer 0-based.

Especially because I now sort of have a linter rule in my head: 'if I am writing i - 1 then I am making a mistake or doing something the wrong way'. Which has been quite successful.

jjgreen4y ago

Not for polynomial coefficient indices :-)

lahvak4y ago

As other posters noted, in mathematics both 0 based and 1 based indexing is used.

When dealing with matrices and vectors (including data tables and data columns), there is a strong preference for 1 based indexing: first row, first column, first entry, etc. Most matrix and vector based algorithms in the literature use 1 based indexing. Programming these in a language with 0 based indexing is a mess, and a common source of errors.

When dealing with sequences, especially recursively defined ones, there is usually an initial value (indexed with 0) and then the n-th value is obtained by n applications of the recursive step, so 0 based indexing makes more sense, but in literature there is no fixed convention, and you can find examples with 0 based and with 1-based indexing. Another example of 0 based indexing in math are polynomials (and in extension, power series) where the index is the degree of the term, or in general any functional series where the 0-th term is the constant term.

There are also negative indices.

sdfhdhjdw34y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Because that's how maths work? Literally everywhere in maths you count from 1, except in software engineering. That's why. I hope that clarified your confusion.

planede4y ago

My hot take is that 1-based indexing is often a mistake in math too. It's also not universal, even within math. And linear algebra doesn't need 1-based indexing either, and some operations are even more easily expressed with 0-based indexing.

wnoise4y ago

Starting with 0 is quite common in series, e.g. Taylor, Fourier, Chebyshev expansions, etc.

temp89644y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

From a data-analytic point of view, indexing should start with 1. When we analyze a data table, we always call the first row the 1st row, or row #1, not row #0. It would be very strange to label rows as 0, 1, 2, 3, .... It may be fine for people with a Computer Science background. But it would create so much confusion for almost everyone else...

hprotagonist4y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

So, no FORTRAN, huh?

CRConrad4y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Counting things is such a core thing to humans that when we have a bunch of N things we think of them as thing #1 to thing #N. We start counting from 1, not 0.

Indexing from 0 in computing is adapting the human mind to the computer, purely for performance reasons that may have been relevant in the 50s or 60s but were beginning to be obsolete by the 70s. It was done so you could access elements of an array by the simplest possible calculation of your offset into heap memory. When your first element is stored at Starting_address, you need i for that first element to be = 0, just so you don't need to have the compiler add another constant term for each element to "Element is at Starting_address + i * sizeof(element)".

Would have been trivial, even then (as Wirth showed) to add that constant term calculation to compilers, but it was done without in C because that eliminated one whole integer operation from each (set of?) array access(es).

Instead, we got the mental gymnastics of

   for (i = 0; i <= N-1; i++) {...}

and its many variations (instead of just for i := 1 to N...), which surely have caused orders of magnitude more headaches in off-by-one bugs over the years than they saved on performance.
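The address arithmetic described in this comment can be sketched as follows. This is an illustrative Python model, not how any particular compiler emits code; the function names and the example addresses are invented for the sketch.

```python
def offset_zero_based(start_address, i, element_size):
    # Element i lives i whole elements past the start: no correction term.
    return start_address + i * element_size


def offset_one_based(start_address, i, element_size):
    # The first element has i == 1, so the index must be shifted by one.
    return start_address + (i - 1) * element_size


# Both conventions reach the same third element of an array of 8-byte items.
third_zero_based = offset_zero_based(1000, 2, 8)
third_one_based = offset_one_based(1000, 3, 8)
```

The only difference is the extra `- 1`, which is the constant term the comment says a compiler could fold in.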

DNF24y ago

There are good arguments for using either 0- or 1-based indices. As you should be aware, there are many languages on each side.

While preferring one over the other is perfectly fine, I question the intellectual honesty of anyone claiming incredulity about the opposite choice.

dklend1224y ago

If packages use generic indexing functions like eachindex, there would be no correctness issue with that specific example

sdfhdhjdw34y ago

The problem isn't that 1-base indexing can be "fixed" in Julia. The problem is that you see 1-based indexing as a flaw.

dekhn4y ago

I didn't say I see 1-based indexing as a flaw. I said I complained about it, and then learned they supported multiple types of offsets (which ostensibly resolved the issue for me), only to learn that the stats library was "written before offsetindex" and still has bugs related to it.

IshKebab4y ago

It is a flaw. Computers don't work that way fundamentally, and it introduces lots of awkward translation.

dash24y ago

But humans don't work 0-based. Try explaining to a bunch of scientists why for rows 2-5 of the DataFrame they have to write df[1:5].
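For readers who have not hit this, a small sketch with a plain Python list standing in for the table (the variable names are illustrative, and a real DataFrame library may offer other selection methods): with zero-based, end-exclusive slicing, the 2nd through 5th rows are selected with the bounds 1 and 5.

```python
rows = ["row1", "row2", "row3", "row4", "row5", "row6"]

# Zero-based, end-exclusive: positions 1, 2, 3, 4 are the 2nd through 5th rows.
selected = rows[1:5]
```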

KenoFischer4y ago· 29 in thread

So this one is a tough one for me, because Yuri has certainly spent significant time with Julia and I think he's a very competent programmer, so his criticism is certainly to be taken seriously and I'm sad to hear he ended up with a sour opinion.

There's a lot of different issues mentioned in the post, so I'm not really sure what angle to best go at it from, but let me give it a shot anyway. I think there's a couple of different threads of complaints here. There's certainly one category of issues that are "just bugs" (I'm thinking of things like the HTTP, JSON, etc. issues mentioned). I guess the claim is that this happens more in Julia than in other systems. I don't really know how to judge this. Not that I think that the julia ecosystem has few bugs, just that in my experience, I basically see 2-3 critical issues whenever I try a new piece of software independent of what language it's written in.

I think the other thread is "It's hard to know what's expected to work". I think that's a fair criticism and I agree with Yuri that there's some fundamental design decisions that are contributing here. Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other. That's a critical feature that makes Julia as powerful as it is, but of course you can easily end up with situations where one or the other package is making implicit assumptions that are not documented (because the author didn't think the assumptions were important in the context of their own package) and you end up with correctness issues. This one is a bit of a tricky design problem. Certainly adding more language support for interfaces and verification thereof could be helpful, but not all implicit assumptions are easily capturable in interfaces. Perhaps there needs to be more explicit documentation around what combinations of packages are "supported". Usually the best way to tell right now is to see what downstream tests are done on CI and if there are any integration tests for the two packages. If there are, they're probably supposed to work together.

To be honest, I'm a bit pained by the list of issues in the blog post. I think the bugs linked here will get fixed relatively quickly by the broader community (posts like this tend to have that effect), but as I said I do agree with Yuri that we should be thinking about some more fundamental improvements to the language to help out. Unfortunately, I can't really say that that is high priority at the moment. The way that most Julia development has worked for the last two-ish years is that there are a number of "flagship" applications that are really pushing the boundary of what Julia can do, but at the same time also need a disproportionate amount of attention. I think it's overall a good development, because these applications are justifying many people's full time attention on improving Julia, but at the same time, the issues that these applications face (e.g. "LLVM is too slow", better observability tooling, GC latency issues) are quite different from the issues that your average open source julia developer encounters. Pre 1.0 (i.e. in 2018) there was a good 1-2 year period where all we did was think through and overhaul the generic interfaces in the language. I think we could use another one of those efforts now, but at least at this precise moment, I don't think we have the bandwidth for it. Hopefully in the future, once things settle down a bit, we'll be able to do that, which would presumably be what becomes Julia 2.0.

Lastly, some nitpicking on the HN editorialization of the title. Only one of the issues linked (https://github.com/JuliaLang/julia/issues/41096) is actually a bug in the language - the rest are various ecosystem issues. Now, I don't want to disclaim responsibility there, because a lot of those packages are also co-maintained by core julia developers and we certainly feel responsibility to make those work well, but if you're gonna call my baby ugly, at least point at the right baby ;)

patrickkidger4y ago

FWIW my take is not that Yuri is expressing "there are too many bugs" so much as he's expressing a problem in the culture surrounding Julia itself:

> But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem.

Concisely:

1. The ecosystem is poorly put together. (It's been produced by academics rather than professional software developers.)

2. The language provides few tools to guarantee correctness. (No static typing; no interfaces.)

Personally, what I'd love to see is one of the big tech companies come on board and just write their own ecosystem. The Julia language is amazing. The ecosystem needs to be rewritten.

ChrisRackauckas4y ago

Lots of things are being rewritten. Remember we just released a new neural network library the other day, SimpleChains.jl, and showed that it gave about a 10x speed improvement on modern CPUs with multithreading enabled vs Jax Equinox (and 22x when AVX-512 is enabled) for smaller neural network and matrix-vector types of cases (https://julialang.org/blog/2022/04/simple-chains/). Then there's Lux.jl fixing some major issues of Flux.jl (https://github.com/avik-pal/Lux.jl). Pretty much everything is switching to Enzyme which improves performance quite a bit over Zygote and allows for full mutation support (https://github.com/EnzymeAD/Enzyme.jl). So an entire machine learning stack is already seeing parts release.

Right now we're in a bit of an uncomfortable spot where we have to use Zygote for a few things and then Enzyme for everything else, but the custom rules system is rather close and that's the piece that's needed to make the full transition.

btilly4y ago

The fact that things are being rewritten and the primary criterion being looked at is speed IS culturally a big part of the problem. If you don't prioritize provable correctness first, then I guarantee that the code is not correct. And as the complaint explains, incorrect code costs people months and leads them to not trust the result.

Don't believe me? Re-read the blog post about how a major source of bugs is people turning assumptions into silent errors by removing bounds checks. Simply being able to re-run the same code in a slow mode with the bounds checks turned back on would undoubtedly catch bugs.

nextos4y ago

> The Julia language is amazing. The ecosystem needs to be rewritten.

I think this is pretty unfair. Julia has many libraries that have allowed me to build things that would have taken orders of magnitude more effort to produce in other languages with the same conciseness and efficiency.

Composability and efficiency are hard. Are things better elsewhere? Python has excellent libraries. But these are big monoliths that not only do not compose well, but are also hard to understand deeply as they are essentially a thin layer over C, C++, Fortran, etc.

Julia simply needs more maintenance and more tests. There is no big corporate backing, and things depend on individual efforts. In my opinion, most packages are already polished and easy to understand.

IMHO, the biggest problem is that there is no reliable library to build huge transformers.

josephg4y ago

As a user, I’d prefer “correct but lacking composability” over “composable but sometimes my results will be silently wrong”.

What is Julia’s composability useful for if it leaves me unable to trust my results?

nl4y ago

> Are things better elsewhere? Python has excellent libraries. But these are big monoliths that not only do not compose well, but are also hard to understand deeply as they are essentially a thin layer over C, C++, Fortran, etc.

I dunno.

Things like the use of scipy.spatial.distance metrics[1] in sklearn clustering[2] seem a great example of composability that is easy to learn and very efficient.

And the sklearn side isn't a "thin layer over C, C++, Fortran" even if scikit is (sort of) this.

[1] https://docs.scipy.org/doc/scipy/reference/spatial.distance....

[2] https://scikit-learn.org/stable/modules/generated/sklearn.me...

Tarrosion4y ago

How has Python - almost surely the most successful and widely adopted scientific programming ecosystem - avoided the problems of #2? E.g. Python doesn't have static typing.

Is it just that Python is so widely used there's institutional support for incredible linting and type check tools despite the lack of static typing? Or that much of the science/data ecosystem of Python is written in lower level statically typed languages?

(sadly possibly necessary edit/clarification: I'm not trying to be That Guy who answers every complaint about Julia with a matching complaint about Python. I'm legitimately curious about how Python got where it is without static typing, and what that implies about paths to a better ecosystem for Julia.)

rudellius4y ago

Julia heavily makes use of multiple dispatch along with other convenient type-related features that are much more complex than Python's, to a point where they are often abused and sometimes have uncaught edge cases. It makes the language very powerful but has its downsides.

And to be fair to Python, static analysis has come a very long way and the CPython interpreter makes far fewer complex assumptions than the Julia compiler. It’s also fairly strongly typed as well, so I’ve found that challenges with the type system cause more issues with packaging and maintenance than it does correctness.

adgjlsfhk14y ago

I don't think it needs a rewrite as much as careful maintenance from people who have time to dedicate to software quality. Most of the APIs are good, it's just that a lot of the code is under-tested and doesn't receive enough love. Having more big companies using Julia would help a lot with that.

amkkma4y ago

Hi Keno,

Thanks for the honest assessment. Do you have any thoughts about correctness/ composability of compiler transforms like AD, reliability of GPU acceleration and predictability of optimizations? (basically what you've discussed in some of your compiler talks).

How is that going to be possible in an imperative language? Right now we have lux.jl, which is a pure by convention DL framework, but that ends up being jax without the TPUs, kernel fusion, branching (Lux relies on generated functions) and copy elision (though this last part is being worked on IIUC).

A bunch of folks in the ML, Probprog and fancy array space have been grappling with things like generated functions, type level programming and such, and were wondering about future directions in this space: https://julialang.zulipchat.com/#narrow/stream/256674-compil... there among other discussions

Edit: re : bandwidth issue Jan Vitek's group is thinking a lot about the verification vs flexibility tradeoff and some people are working on a trait/ static typing system. Maybe something can be done to help them along?

KenoFischer4y ago

> Thanks for the honest assessment. What about correctness/ composability of compiler transforms like AD, reliability of GPU acceleration and predictability of optimizations? (basically what you've discussed in some of your compiler talks).

I don't think we really have a good answer yet, but it's actively being worked on. That said, I don't think we can be faulted for that one, because I don't think anybody really has a good answer to this particular design problem. There's a lot of new ground being broken, so some experimentation will be required.

> TPUs, kernel fusion, branching (Lux relies on generated functions) and copy elision (though this last part is being worked on IIUC).

We have demonstrated that we can target TPUs. Kernel fusion is a bit of an interesting case, because julia doesn't really use "kernels" in the same way that the big C++ packages do. If you broadcast something, we'll just compile the "fused" kernel on the GPU, no magic required. There is still something remaining, which is that when you're working on the array level, you want to be able to do array-level optimization, which we currently don't really do (though again, the TPU work showed that we could), but is broadly being planned.

> Edit: re : bandwidth issue Jan Vitek's group is thinking a lot about the verification vs flexibility tradeoff and some people are working on a trait/ static typing system. Maybe something can be done to help them along?

We work closely with them of course, so I think there'll be some discussions there, but it's a very tough design problem.

amkkma4y ago

Glad to hear it's being worked on!

> That said, I don't think we can be faulted for that one, because I don't think anybody really has a good answer to this particular design problem.

Agreed! To be clear, if there's any implication of "fault" it was certainly not in a moral sense or even anything around making poor design decisions. Julia's compiler is being asked to do many new things with semantics that necessarily predated many advances in PL.

Re Kernel fusion, there's another piece here, which you may or may not have included in "array-level optimizations". Julia's "just write loops" ethos is awesome, until you get to accelerators...now we're back to an "optimizer defined sub language" as TKF puts it. People like loops and flexibility; Dex, Floops.jl, Tullio, Loopvec and KA.jl show that it's possible to retain structure and emit accelerator-able loopy code. But none of those, except for Dex, has a solution for fusing kernels that rely on loops. I'm still using the concept of Kernels, because there's still a bit of a separation between low level CUDA.jl code/these various DSLs and higher level array code, even if not as stark as python or C++.

Would be really cool if, like Dex, there's a plan to fuse these sorts of structured loops as well. Dex does it by having type level indexing and loop effects (they're actually moving to a user defined parallel effect handler system: https://arxiv.org/abs/2110.07493). The latter can tell the compiler when it's safe to parallelize and fuse+beta reduce loops. But that relies on structured semantics/effects and a higher level IR than exists in Julia.

Not sure what a Julian solution would look like, if possible. But given the usability wins, it would be great to have in Julia as well.

amkkma4y ago

to add, as you know, this is part of a more general problem about type level programming vs write your own compiler vs the non composability of DSLs, where Julia folks in various other non ML domains like PPLs and fancy arrays have been wondering about how to do things that get compiled away, without relying on compiler heuristics or generated function blowups: https://julialang.zulipchat.com/#narrow/stream/256674-compil...

Another non ML example I discussed with some Probprog folks is that there was an arxiv review of PPLs and Julian ones that heavily rely on macros don't compose well within and across packages. The same mechanism for composability which Dex uses for parallelism and AD (effect handlers) is what new gen PPLs in jax and Haskell are using for composable transformable semantics, so maybe that's worth looking into.

We've been having some discussions about how to bring that to Julia, but stalled on engineering time and PL knowledge. Eventually wanted to talk to the core team about it with proposal in hand, but never got there. Let me know if you'd like to talk to some of those folks who have been involved in the discussions as you design the new compiler plugin infra.

https://julialang.zulipchat.com/#narrow/stream/256674-compil...

StefanKarpinski4y ago

The big language design problem that I think this post highlights is that the flip side of Julia's composability is that composing generic code with types that implement abstractions can easily expose bugs when the caller and the callee don't agree on exactly what the abstraction is.

Several of the bugs that Yuri reported are a very specific case of this: there's a lot of generic code that assumes that array indexing always starts at one, but that's not always the case since OffsetArrays allow indexing to start anywhere. The older code in the stats ecosystem is particularly badly hit by this because it often predates the existence of OffsetArrays and the APIs that were developed to allow writing efficient generic code that works with arrays that don't start at the typical index (or which might even want to be iterated in a different order).

Fixing these specific OffsetArray bugs is a fairly straightforward matter of searching for `1:length(a)` and replacing it with `eachindex(a)`. But there's a bigger issue that this general problem raises: How does one, in general, check whether an implementation of an abstraction is correct? And how can one test if generic code for an abstraction uses the abstraction correctly?
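As a rough analogue of that `1:length(a)` versus `eachindex(a)` fix, here is a Python sketch. Everything in it is hypothetical: `OffsetList` is a toy class invented for illustration (it is not OffsetArrays), and its `indices()` method merely plays the role that `eachindex` plays in Julia.

```python
class OffsetList:
    """Toy sequence whose valid indices are first, first+1, ..., first+len-1."""

    def __init__(self, values, first):
        self._values = list(values)
        self._first = first

    def __len__(self):
        return len(self._values)

    def __getitem__(self, i):
        pos = i - self._first
        if not 0 <= pos < len(self._values):
            raise IndexError(f"index {i} out of range")
        return self._values[pos]

    def indices(self):
        # Plays the role of eachindex(a): the container reports its own indices.
        return range(self._first, self._first + len(self._values))


def total_assuming_one_based(a):
    # Analogue of `for i in 1:length(a)`: silently assumes indices 1..len(a).
    return sum(a[i] for i in range(1, len(a) + 1))


def total_generic(a):
    # Analogue of `for i in eachindex(a)`: asks the container for its indices.
    return sum(a[i] for i in a.indices())


a = OffsetList([10, 20, 30], first=-1)  # valid indices: -1, 0, 1
generic_total = total_generic(a)

try:
    total_assuming_one_based(a)
    one_based_failed = False
except IndexError:
    one_based_failed = True
```

In this toy the mistaken loop fails loudly because every access is bounds-checked. The harder case discussed in this thread is when bounds checks have been removed, so the same mistake produces a wrong answer instead of an error.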

Many people have mentioned interfaces and seem to believe that they would solve this problem. I don't believe that they do, although they do help. Why not? Consider the OffsetArray example: nothing about `for i in 1:length(a)` violates anything about a hypothetical interface for AbstractArrays. Yes, an interface can tell you what methods you're supposed to implement. There's a couple of issues with that: 1) you might not actually need to implement all of them—some code doesn't actually use all of an interface; 2) you can find out what methods you need to implement just by running the code that uses the implementation and see what fails. What the interface would guarantee is that if you've implemented these methods, then no user of your implementation will hit a missing method error. But all that tells you is that you've implemented the entire surface area of the abstraction, not that you've implemented the abstraction at all correctly. And I think that covering the entire surface area of an abstraction when implementing it is the least hard part.

What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested. I think that Clojure's spec is much closer to what's needed than statically checked interfaces. The idea is that when someone implements an abstraction, they can automatically get tests that their implementation implements the abstraction correctly and fully, including the way it behaves. If you've implemented an AbstractArray, one of the tests might be that if you index the array with each index value returned by `eachindex(a)` that it works and doesn't produce a bounds error.

On the other end, you also want some way of generating mock instances of an abstraction for testing generic code. We do a bit of this in Julia's test suite: there are GenericString and GenericSet types, which implement the minimal string/set abstraction, and use these to test generic code to verify that it doesn't assume more than it should about the string and set abstractions. For a GenericArray type, you'd want it to start at an arbitrary index and do other weird stuff that exotic array types are technically allowed to do, so that any generic code that makes invalid assumptions will get caught. You could call this type AdversarialArray or something like that.
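Both halves of that idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `AdversarialList`, `check_indexing_contract` and `mean` are names made up here, the "contract" is a single behavior (every reported index must be readable), and none of this is Julia's actual test infrastructure.

```python
import random


class AdversarialList:
    """Test double: a sequence whose first valid index is chosen at random."""

    def __init__(self, values, rng):
        self._values = list(values)
        self._first = rng.randint(-5, 5)

    def __len__(self):
        return len(self._values)

    def indices(self):
        return range(self._first, self._first + len(self._values))

    def __getitem__(self, i):
        if i not in self.indices():
            raise IndexError(f"index {i} out of range")
        return self._values[i - self._first]


def check_indexing_contract(a):
    """Behavioral check: every reported index is readable and the count matches len."""
    idx = list(a.indices())
    assert len(idx) == len(a)
    for i in idx:
        a[i]  # must not raise


def mean(a):
    # Generic code under test: it only uses len() and indices().
    return sum(a[i] for i in a.indices()) / len(a)


rng = random.Random(0)
for _ in range(20):
    arr = AdversarialList([1.0, 2.0, 3.0], rng)
    check_indexing_contract(arr)  # tests the implementation of the abstraction
    assert mean(arr) == 2.0       # tests the generic code that uses it
```

The first check is the kind of test an implementer would get for free; the second is the kind of test an author of generic code would run against a deliberately awkward instance.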

I've personally thought quite a bit about these issues, but as Keno has said, there hasn't been time to tackle these problems in the last couple of years. But they certainly are important and worth solving.

On a personal note, Yuri, thanks for all the code and I'm sorry to see you go.

mfsch4y ago

It seems to me that much of the difficulty with interfaces, whether they are made explicit or kept implicit, lies in defining the semantics that the functions are supposed to have.

As we expand the types our generic code can handle, we have to refine the semantics it relies on. For a long time, Base.length(::AbstractArray) could mean “the largest one-based index of the array”, but then we started using the same code that handles regular Arrays for OffsetArrays and this interpretation was no longer valid. I guess the alternative would have been to leave length(::OffsetArray) unimplemented and block the valid use of OffsetArrays for all generic code that understands Base.length as “the number of values”.

It can still be difficult to tell what a function like Base.length should mean if I implement it for my types. For example, should it return the number of local values or the global length for an array that is distributed between multiple processes (e.g. in an MPI program)? Perhaps some generic code will use it to allocate a buffer for intermediate values, in which case it should be the local length. Or some generic code computes an average by dividing the (global) sum by the global length.
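A toy Python sketch of that ambiguity follows. It does not use MPI; `DistributedChunk` and its fields are hypothetical names, and the "all-reduce" is simulated with an ordinary sum over both chunks.

```python
class DistributedChunk:
    """Toy stand-in for one process's slice of a distributed array."""

    def __init__(self, local_values, global_length):
        self.local_values = list(local_values)
        self.global_length = global_length

    def local_length(self):
        return len(self.local_values)


# Two "processes" together hold six values.
chunks = [DistributedChunk([1, 2], 6), DistributedChunk([3, 4, 5, 6], 6)]

# Stand-in for an all-reduce: every process ends up with the global sum.
global_sum = sum(sum(c.local_values) for c in chunks)

rank0 = chunks[0]
buffer = [0] * rank0.local_length()              # buffer sizing wants the local length
correct_mean = global_sum / rank0.global_length  # averaging wants the global length
wrong_mean = global_sum / rank0.local_length()   # mixing the two up gives nonsense
```

Whichever of the two a single `length` returns, one of these generic callers gets the wrong number.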

It seems impossible to come up with a precise definition of all the semantics your generic code assumes a priori, so we can either restrict our usage of generics to a small number of concrete types that were considered when the code was written, or we have to accept that we occasionally run into these sorts of issues while we refine the semantics.

Anecdotally, it has been my experience that packages that have been made to work in many generic contexts (such as the ODE packages) are likely to work flawlessly with my custom types, while packages that have seen less such effort (e.g. iterative solvers) are more likely to cause issues. This makes me hopeful that it is possible to converge towards very general generic implementations.

It is also worth mentioning that it is very possible to use Julia without ambitious use of cross-package generic functionality, and use it “merely” as a better Fortran or Matlab.

Sukera4y ago

To expand on the "interfaces are not enough" part: Defining an interface on an abstract type only guarantees that an implementation exists, not that it is correct, i.e. that the specific implementation for a subtype upholds the properties the interface specifies.

On top of this, you really want to be alerted to when you expect more of an interface than the interface guarantees - this is what happened in the case of `1:length(A)` being assumed to give the indices into `A`, when the `AbstractArray` interface really only guarantees that a given set of methods exists.
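To make the `1:length(A)` assumption concrete, here is a minimal sketch. `ShiftedVector` is a hypothetical stand-in for an OffsetArrays.jl vector, written only for this illustration (it is not that package's implementation), and `sum_one_based` / `sum_generic` are made-up names. Bounds checks are left on, so the wrong assumption shows up as a `BoundsError` instead of an out-of-bounds read.

```julia
# Hypothetical stand-in for an offset array: a vector whose first index is `start`.
struct ShiftedVector{T} <: AbstractVector{T}
    data::Vector{T}
    start::Int
end

Base.size(v::ShiftedVector) = size(v.data)
Base.axes(v::ShiftedVector) = (v.start:(v.start + length(v.data) - 1),)
Base.IndexStyle(::Type{<:ShiftedVector}) = IndexLinear()
function Base.getindex(v::ShiftedVector, i::Int)
    @boundscheck checkbounds(v, i)
    return v.data[i - v.start + 1]
end

# Assumes more than the interface promises: that the indices are 1:length(A).
function sum_one_based(A::AbstractVector)
    total = zero(eltype(A))
    for i in 1:length(A)
        total += A[i]
    end
    return total
end

# Asks the array which indices it actually has.
function sum_generic(A::AbstractVector)
    total = zero(eltype(A))
    for i in eachindex(A)
        total += A[i]
    end
    return total
end

A = ShiftedVector([10, 20, 30], 0)   # valid indices are 0, 1, 2

println(sum_generic(A))              # 60

try
    sum_one_based(A)                 # reads A[1], A[2], then fails on A[3]
catch err
    println(typeof(err))             # BoundsError
end
```

Both functions agree on a plain `Vector`, which is why the assumption can go unnoticed for a long time.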

I feel like these sorts of issues more or less require more formal models being provided & checked by the compiler. Luckily for us, nothing in this space has been implemented or attempted in & for Julia, while there are a lot of experiments with formal methods and proof systems being researched right now (TLA+, Coq, ...). There are of course a lot of footguns[1], but the space is moving fast and I'd love to see something that makes use of this integrated into Julia at some point.

[1]: Why specifications don't compose - https://hillelwayne.com/post/spec-composition/

tialaramex4y ago

> Defining an interface on an abstract type only gives you that a implementation exists, not that it is correct

Pretty far off topic for Julia, but the definition of Rust's Traits over semantics rather than syntax (even though of course the compiler will only really check your syntax) gives me a lot of this.

The fact that this Bunch<Doodad> claims to be IntoIterator<Item=Doodad> tells me that the person who implemented that explicitly intends that I can iterate over the Doodads. They can't accidentally be IntoIterator<Item=Doodad>; the author has to literally write the implementation naming the Trait to be implemented.

But that comes at a heavy price of course: if the author of Bunch never expected me to iterate over it, the best I can do is wrap it in a newtype MyBunch and implement IntoIterator using whatever ingredients are provided on the surface of Bunch. This raises the price of composition considerably :/

> you really want to be alerted to when you expect more of an interface than the interface guarantees

In the case alluded to (AbstractArray) I feel like the correct thing was not to implement the existing interface. That might have been disruptive at the time, but people adopting a new interface which explicitly warns them not to 1:length(A) are not likely to screw this up, and by now perhaps everything still popular would have upgraded.

Re-purposing existing interfaces is probably always a bad idea, even if you can persuade yourself it never specifically said it was OK to use it the way you suspect everybody was in practice using it, Hyrum's Law very much applies. That interface is frozen in place, make a new one.

jerome-jh4y ago

I think the OP specifically complains about the use of @inbounds and that the documentation was advocating an invalid use of it. Some libraries may not have been updated to handle AbstractArray: that's normal SW rot. But the out of bound access being unreported is the actual grief of the OP.

clhodapp4y ago

> What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested.

The pure FP ecosystems in Scala often accomplish this in the form of "laws", which are essentially bundles of pre-made unit tests that they ship alongside their core abstraction libraries.

iamed24y ago

Invenia's approach to interface testing ("Development with Interface Packages" on our blog) does some of the things you suggest as a standard of practice, by providing tools to check correctness that implementers can use as part of package tests. ChainRulesTestUtils.jl is a decent example (although this one doesn't come with fake test types). I think this is typically good enough, and only struggles with interface implementations with significant side effects.

One little win could be publishing interface tests like these for Base interfaces in the Test stdlib. I appreciate that the Generic* types are already exposed in the Test stdlib!

blindseer4y ago

> 2) you can find out what methods you need to implement just by running the code that uses the implementation and see what fails.

For large codebases this is SO painful to do. I just don't understand how anyone gets anything done when this is how they have to develop code.

StefanKarpinski4y ago

That's why interfaces are useful—they save you from that. But they don't actually solve the problem of checking that an abstraction has been implemented correctly, just that you've implemented the entire API surface area, possibly incorrectly. Note, however, that if you have a way of automatically testing the behavioral correctness of an implementation, then those tests presumably cover the entire API, so automatic testing would subsume the benefit that static interface checking provides—just run the automatic tests and it tells you what you haven't implemented as well as what you may have implemented incorrectly.
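A rough sketch of what such automatic behavioral tests could look like for one-dimensional arrays. The function `obeys_vector_laws` and the four "laws" it checks are assumptions made up for this illustration, not an official specification of `AbstractArray`; `BrokenVector` is a deliberately wrong type that still defines every required method.

```julia
# Hypothetical "law" bundle for one-dimensional array types. Returns true only if all hold.
function obeys_vector_laws(A::AbstractVector)
    inds = eachindex(A)
    # Law 1: there are exactly length(A) valid indices.
    length(inds) == length(A) || return false
    # Law 2: firstindex/lastindex agree with the index range.
    if !isempty(inds)
        (firstindex(A) == first(inds) && lastindex(A) == last(inds)) || return false
    end
    # Law 3: every advertised index passes the bounds check.
    all(i -> checkbounds(Bool, A, i), inds) || return false
    # Law 4: iteration and indexing visit the same elements in the same order.
    return collect(A) == [A[i] for i in inds]
end

# Deliberately broken: all required methods exist, but `axes` advertises
# one more index than `size` accounts for.
struct BrokenVector <: AbstractVector{Int}
    data::Vector{Int}
end
Base.size(v::BrokenVector) = size(v.data)
Base.axes(v::BrokenVector) = (0:length(v.data),)
Base.IndexStyle(::Type{BrokenVector}) = IndexLinear()
Base.getindex(v::BrokenVector, i::Int) = v.data[i + 1]

println(obeys_vector_laws([1, 2, 3]))                # true
println(obeys_vector_laws(1:3))                      # true
println(obeys_vector_laws(BrokenVector([1, 2, 3])))  # false (violates Law 1)
```

A check that only asks whether `size` and `getindex` are defined would accept `BrokenVector`; the behavioral check rejects it.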

kaba04y ago

Interfaces provide correctness guarantees because implementing them is a conscious decision. If your array implements GenericArray, you know about that interface, and presumably what it is used for. Its methods can also contain documentation.

The point is a common point of… trust may be the word? Two developers that don’t even know each other can use each other’s code correctly by programming against a third, hypothetical implementation that they both agree on. Here OffsetArray would simply not implement the GenericArray interface if the latter expects 1-based indexing.

In this specific case the solution would be to move the indexing question into the interface itself - it is not only an implementation detail. Make the UltraGenericArray interface have an offset() method as well and perhaps make [] do 1-based indexing always (with auto-offsetting for indexed arrays), and a separate index-aware get() method, so that downstream usage must explicitly opt in to different indexing.

renox4y ago

I remember reading a long time ago about the 1-based array and the offset-array 'kludge'.

My first thought was that they should have replicated Ada's design instead; my second was that I hope they have a good linter, because putting an arbitrary-offset implementation in a library is a minefield.

I don't claim to be especially smart: this is/was obvious. Unfortunately, what isn't obvious is how to fix this issue, and especially how to fix the culture which produces this kind of issue.

StefanKarpinski4y ago

Offset arrays aren't a kludge and the package would exist regardless of whether zero or one had been chosen as a default base for indexing. Having arbitrary index ranges in different dimensions is extremely useful in many application domains. When working with FFTs, for example, it's natural for the indices to be symmetrical around zero. Or when doing coordinate transforms like in this example from the excellent Images.jl package: https://juliaimages.org/stable/tutorials/indexing/.

HelloNurse4y ago

> there are a number of "flagship" applications that are really pushing the boundary of what Julia can do, but at the same time also need a disproportionate amount of attention.

Disproportionate effort is an obvious sign that hacks to keep such flagships seaworthy are prioritized over a good language and a good library.

> Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other.

Typically, programming languages and libraries don't need to "try very hard" because they are designed to be safe and correct, at the cost of curbing ambitious features.

> not all implicit assumptions are easily capturable in interfaces. Perhaps there needs to be more explicit documentation around what combinations of packages are "supported".

Supporting useful "combinations of packages" isn't a desirable approach to language and library evolution. Implicit assumptions must disappear, either by becoming explicit or by becoming unnecessary; both ways represent genuine progress, not fruitless firefighting.

JustFinishedBSG4y ago

> Disproportionate effort is an obvious sign that hacks to keep such flagships seaworthy are prioritized over a good language and a good library.

I do not think this is true; from my limited Julia experience the reason the flagship features need disproportionate effort is precisely because they are research projects and the developers make sure they are not hacks.

dang4y ago

Re the title: ok, we've replaced the submitted title ("The Julia language has a number of correctness flaws") with a representative phrase from the OP which uses the word 'ecosystem'.

HN's title rule calls for using the original title unless it is misleading or linkbait (https://news.ycombinator.com/newsguidelines.html) and "Why I no longer recommend Julia" is generic enough to be a sort of unintentional linkbait - I think it would lead to a less specific and therefore less substantive discussion. In that sense the submitter was probably right to change the title, and for the same reason I haven't reverted it.

I'm going to autocollapse this comment so we don't get a big thread about titles.

KenoFischer4y ago

Thanks. Appreciate your thoughtful moderation as always :).

RcouF1uZ4gsC4y ago· 9 in thread

> If you pass it an array with an unusual index range, it will access out-of-bounds memory: the array access was annotated with @inbounds, which removed the bounds check.

I think making indexes configurable is a huge mistake. Even if they are not ideal for the situation, having a single way to do indexes makes a huge source of confusion and potential bugs just go away. And this is orthogonal to whether you pick 0 or 1 as your starting point, as long as the whole language embraces that.

For example, with C/C++/Rust you know it is zero-based indexing. Even if it is not perfectly ideal for your formulas, the mental math of translating to zero-based is worth not constantly having to worry about whether a library is one-based or zero-based and what happens if you compose them.

mattkrause4y ago

There's a parallel idea, that you should avoid--insofar as is possible--numerical indexing. In other words, instead of iterating over `0:length(X) - 1` or `1:length(X)`, you use something like `for element in array` or

    indices = CartesianIndices(multidimensional_X)
    for index in indices
        multidimensional_X[index] = 0  # whatever
    end

If you do that, you don't need to keep track of whether it's zero-based, one-based, or anything else. In fact, you may not even need to keep track of the number of dimensions, as in this example, https://julialang.org/blog/2016/02/iteration/

mike_hock4y ago

Works great for trivial cases where there's no interdependency between array elements. As soon as you need to access, for example, adjacent elements, you want to be able to just iterate over 1:length(X) - 1 and access a[i-1] and a[i]. This is the most direct way and thus easiest to get right. Abstractions only make it more error prone.

mattkrause4y ago

Is `for i in eachindex(X)` really any worse?

You can still do math on i, it avoids issues with OffsetArrays, and it might even be clearer why you're iterating. It requires that the array type support linear indexing, but so does doing anything sensible with X[i] and X[i-1].
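A small sketch of the adjacent-element case with `eachindex`. The name `adjacent_differences` is made up for this illustration, and it assumes `X` uses consecutive integer indices, which `X[i - 1]` needs anyway.

```julia
# Differences between neighbouring elements, without hard-coding the first index.
# Assumes X has consecutive integer (linear) indices, as X[i - 1] requires.
function adjacent_differences(X::AbstractVector)
    # Skip the first valid index, whatever it is, so that X[i - 1] always exists.
    return [X[i] - X[i - 1] for i in Iterators.drop(eachindex(X), 1)]
end

println(adjacent_differences([1, 4, 9, 16]))   # [3, 5, 7]
```

The loop body still does arithmetic on `i`; only the starting point comes from the array instead of a literal `1` or `0`.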

runevault4y ago

I'm only skimming this post and I'm not familiar with Julia, so maybe I'm missing it, but does it have a way to get an item AND its index? I think there's `enumerate()` in Rust, which gives you a tuple with both the item and its index in cases where you need both.

leephillips4y ago

    julia> pairs("François") |> collect
    8-element Vector{Pair{Int64, Char}}:
     1 => 'F'
     2 => 'r'
     3 => 'a'
     4 => 'n'
     5 => 'ç'
     7 => 'o'
     8 => 'i'
     9 => 's'

Notice the missing index 6, because ç takes two bytes.

In contrast, enumerate() gets you the iteration number:

    julia> enumerate("François") |> collect
    8-element Vector{Tuple{Int64, Char}}:
     (1, 'F')
     (2, 'r')
     (3, 'a')
     (4, 'n')
     (5, 'ç')
     (6, 'o')
     (7, 'i')
     (8, 's')

This can trip you up.
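A short sketch of how the two differ once you index back into the string. The string is written with an explicit `\u00e7` escape so that 'ç' is certain to be the two-byte precomposed character, as in the output above.

```julia
s = "Fran\u00e7ois"   # 'ç' written as U+00E7, which takes two bytes in UTF-8

println(length(s))         # 8 characters
println(ncodeunits(s))     # 9 bytes
println(collect(keys(s)))  # [1, 2, 3, 4, 5, 7, 8, 9]

# pairs() yields indices that are valid for s[i] ...
for (i, c) in pairs(s)
    @assert s[i] == c
end

# ... while the counter from enumerate() is not always a valid index.
println(isvalid(s, 6))     # false: byte 6 is the second byte of 'ç'
println(s[7])              # o  (enumerate() reported this character as number 6)
```

Evaluating `s[6]` throws a `StringIndexError`, so using the `enumerate` counter as an index fails for any string containing multi-byte characters.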

lmiq4y ago

for (i,val) in pairs(array)

adgjlsfhk14y ago

OffsetArrays can be really nice for things like convolutions. For example, it ends up being really natural to have a matrix that is indexed on [-2:2, -2:2] to implement a Gaussian blur. It definitely is a potential bug source though.

kllrnohj4y ago

Indexes being configurable makes a ton of sense. It's why so many languages end up with a slice type (or array_view or span or whatever you want to call it). Why shouldn't the base array type just itself be the slice type?

DNF24y ago

What do you mean by 'mistake'? How are the Julia devs going to stop someone from defining arrays with configurable indices?

Are you suggesting that the core language should somehow make this impossible? How?

NeutralForest4y ago· 8 in thread

I mean, these look like good potential targets for improving the language moving forward; it's healthy to not be in awe of your tools and to push to make them better. I don't see this as "bad", honestly.

s_Hogg4y ago

It seems like the point of the article is that that push is insubstantial, if it even exists. Given the language has been around this long it's a bit worrying that stuff like that is the potential target for moving a language forward.

Julia has always had a reputation in my mind at least of being "by academics, for academics" and there's unfortunately a dark side to that in terms of reliability and maintainability. The concept and goals are great, which is annoying. If this language had stayed focussed on the basics, it would be extremely handy for someone like me who trains and deploys models in an edge computing environment. No way I'm doing that with stuff like this going on.

fluidcruft4y ago

For what it's worth many people feel similarly about R. R is great for people actively working in statistics research (I assume because that's what I'm always told). But for a lot of us who just want to do some analysis, it's constantly breaking and we've learned to default to just starting from scratch when we need to revisit something we did a few years ago. Or we figure out how to buy a commercial system.

tylermw4y ago

R is not constantly breaking. R Core does a remarkable job ensuring backwards compatibility. There are only a few prominent examples of significant "breaking" behavior across decades of the language existing, and those can often be reverted by setting an option (e.g. `options(stringsAsFactors = TRUE)`). But backwards compatibility is the primary concern with any update to the R language or the packages maintained by R Core.

Now, if you're thinking about changes introduced by a specific user-contributed package breaking your analysis, that can indeed be a problem. But that can't be blamed on the R language. And the main user-contributed R statistics packages that have been around for decades (such as lme4 or survival) are mature and stable.

NeutralForest4y ago

I suppose we'll see? Honestly, this is maybe an opportunity to adjust some goals of the language if this is the feeling people are having now. Outreach to purely CS and SE people will probably be needed, but seeing the presence it has at MIT, I don't see it being a problem.

CoastalCoder4y ago

I think the real test will be whether or not Julia's custodians / developers start putting a greater focus on semantics and correctness.

When a language's raison d'être is to try out certain ideas, it probably makes sense for a while to ignore corner cases and rigor. But as the author points out, they eventually become gating factors for wider adoption.

markkitti4y ago

The question here is are these merely just bugs or is there something about the language that makes Julia error prone?

There is potential in using Julia's type inference engine to check for correctness. For example see JET.jl. "JET.jl employs Julia's type inference to detect potential bugs."

https://github.com/aviatesk/JET.jl https://www.youtube.com/watch?v=7eOiGc8wfE0

The video brings up some potential difficulties with Julia's metaprogramming facilities for static or lexical analysis, but also shows that these issues are also addressable.

The type inference system could be exploited for further effect. For example, the type system could be extended to check for shape information within the type as demonstrated in this prototype: https://twitter.com/KenoFischer/status/1407810981338796035

Julia has guard rails (e.g. default bounds checking), but also provides facilities to work outside them (`@inbounds`, `unsafe_*` methods, `ccall`, in-place methods with a `!` suffix). Typically these provide features that trade safety for performance or access to features. Used judiciously one can achieve a balance between performance and safety. Julia is not a language that restricts its users to a sandbox in the name of safety, but it does provide bounds of where the sandbox is and is not.

Another take away from the original blog post is that much Julia development is happening in the open on Github. These issues and their fixes just require a Github account to contribute to. Is this a feature?

freemint4y ago

JET.jl is far from a solution. Sooner or later, JuliaComputing (or someone else) will have to pay people full time to develop such tools if it wants to see larger adoption. Nobody expects Julia to be a systems language (a language you write an OS in). The later those tools come, the more code will need to be fixed up.

NeutralForest4y ago

It's still at version 1.x, maybe an explicit roadmap could help tackling those issues?

ur-whale4y ago· 8 in thread

The examples provided feel more like bugs in various libraries than an actual problem intrinsic to Julia the language.

wodenokoto4y ago

According to the article the problem is in the ecosystem, and partly the standard lib.

Basically, it doesn't matter if Julia the language is fine if all the stats packages make wrong calculations. Then what is the point of Julia, if you have to rewrite everything? You might as well use another language where you trust the results of the ecosystem, since it is the ecosystem you need in order to produce results.

trenchgun4y ago

All bugs mentioned had been quickly fixed: https://news.ycombinator.com/item?id=31397425

wnoise4y ago

That comment doesn't say all bugs have been fixed, or even quickly fixed. When I check on the posted links, many are in fact still open, e.g.

https://github.com/JuliaStats/Distributions.jl/issues/1253

https://github.com/JuliaStats/StatsBase.jl/issues/642

https://github.com/JuliaStats/StatsBase.jl/issues/616

https://github.com/JuliaLang/julia/issues/39385

exyi4y ago

Yea, all are just bugs, not some intrinsic flaws in the language.

Given Julia's goals (performance, abstractions, accessible to science people), it's understandable if they had slightly higher bug concentration than other (similarly sized) ecosystems.

kllrnohj4y ago

The author's argument is that the bugs all share a pattern, and thus there is an intrinsic flaw. That doesn't necessarily mean the community wants to fix the intrinsic flaw, just like nobody is really interested in fixing the intrinsic memory safety flaws of C. But they shouldn't be denied as real risks, either, or a tradeoff of some kind.

Q6T46nT668w6i3m4y ago

@inbounds is a Base feature.

markkitti4y ago

Yes, and it is a perfectly fine feature when applied correctly. It would be incorrect to assume that an `AbstractArray` starts at `1` or `0` which is why the updated example now correctly uses `eachindex`: https://docs.julialang.org/en/v1/devdocs/boundscheck/#Elidin...

If you want to assume that an array starts at `1` one needs to require an `Array` rather than an `AbstractArray`.
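The two options can be sketched side by side. `sum_vector` and `sum_any` are illustrative names and this is not the code from the linked documentation; it only shows where `@inbounds` is justified in each case.

```julia
# Option 1: accept only the concrete `Vector` type, whose indices are always 1:length(A),
# so skipping the bounds check on 1:length(A) is justified.
function sum_vector(A::Vector)
    total = zero(eltype(A))
    @inbounds for i in 1:length(A)
        total += A[i]
    end
    return total
end

# Option 2: accept any AbstractVector and take the indices from the array itself,
# so @inbounds stays justified whatever the first index is.
function sum_any(A::AbstractVector)
    total = zero(eltype(A))
    @inbounds for i in eachindex(A)
        total += A[i]
    end
    return total
end

println(sum_vector([1, 2, 3]))           # 6
println(sum_any(1:3))                    # 6
println(applicable(sum_vector, 1:3))     # false: a range is not a Vector
```

With option 1, an array type with unusual indices is rejected with a `MethodError` at the call site instead of being read out of bounds.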

arksingrad4y ago

@inbounds isn't the problem, it's incorrect usage of it. The poor docstring is absolutely a problem though, you should be iterating over eachindex(A), not 1:length(A).

xt004y ago· 7 in thread

If you look at the history of lots of packages in matlab they fixed tons of bugs that sound similar to this stuff over the years. It requires consistent hard work by a core group of people who understand the issues to get everything right. I have no idea who maintains Julia and these packages but the author of the article mentions this as language problems — aren’t these just bugs? Like if gcc was incorrectly multiplying some constant by the wrong value, that doesn’t sound like a bug with C but a bug with gcc right?

kazinator4y ago

The author's point seems to be something like: not only are there these bugs, but there are a lot of them that people are running into regularly, and the project isn't headed in a direction where the situation as such will improve (as in even if these are fixed, by the time that happens, there will be even more).

Hard to prove or disprove.

dandanua4y ago

Julia has more than 18k closed issues on its GitHub. No wonder such an active user encountered a lot of them. It's not a problem with the language, though. Yes, it allows using OffsetArrays and @inbounds together, but C can read out-of-memory locations too, so what?

Edit: Julia is better than C in this regard, since the usage of @inbounds is explicit, i.e. everyone can see that the code is potentially unsafe.

rob744y ago

I think the point he was trying to make was that the example for @inbounds from the official documentation could cause out-of-bounds accesses, while it was clearly stated that you should only use @inbounds if you are sure that no out-of-bounds accesses are possible.

jakobnissen4y ago

The issue is that there is no way to verify if OOB access is possible given an abstract type, unless you know how that type behaves, i.e. how it's indexed.

And Julia provides no way of specifying the behaviour of abstract types.

masklinn4y ago

> but C can read out-of-memory locations too, so what?

So it's widely considered a plague upon the field, suffered because of the lack of alternative?

Gwypaas4y ago

> but C can read out-of-memory locations too, so what?

Simply decades of exploitable security issues.

krastanov4y ago

The Julia example is closer to Rust's `unsafe`. Pretty much every language lets you skip bounds checks; in Julia (like other modern languages) it is elective. The author was complaining about a library that decided to skip the bounds check in a clumsy way (there happens to be a "correct" way to skip the bounds check). It is not really about the language.

one-more-minute4y ago· 6 in thread

It might be useful to separate the issues that are "just" bugs from the problems that come with Julia's unusual level of composability. I have no idea if Julia has more bog-standard, local bugs – things like data structure problems or compiler faults – than other languages of comparable maturity and resources, but clearly the OP has bumped into several, which is frustrating.

The composition bugs – as in offsetarrays or AD – are a bit of a special case. In most languages package A will only work with package B if it's specifically designed to, and the combination will be explicitly developed and tested. That A and B can work together by default in Julia is really cool, but it also means that as you add new types and packages, you have a quadratically growing set of untested edges.

The canonical solution is strict interfaces. But Julia is laissez faire about those too (with some good reasons). Together this means that if A doesn't work with B as expected, it's not always easy even to assign fault, and both might be reluctant to effectively special-case the other. Program transformations (autodiff) compound this problem, because the default is that you promise to support the universe, and it's not easy to opt out of the weird cases.

I think it's absolutely right to celebrate Julia's approach to composition. I also hope new research (in Julia or elsewhere) will help us figure out how to tame it a bit.

Sebb7674y ago

> That A and B can work together by default in Julia is really cool, but it also means that as you add new types and packages, you have a quadratically growing set of untested edges.

But as the author's example showed, they clearly can't work together - they just fail at runtime instead of at compile time.

Other languages have generics and interfaces to make stuff like this dynamically exchangeable. Sure, your code needs to be designed to support this, but it also means that the author explicitly thought about what they expect from their data structures. If they don't, you might suddenly find yourself violating implicit assumptions like arrays starting at 1.

SemanticStrengh4y ago

Any tutorial/blog on what makes Julia's composability special vs. other languages? Is there a relation with multiple dispatch or delegation?

chalst4y ago

Julia has a very nice type system, the nicest of any dynamically typed language I am familiar with. This is something to do with multiple dispatch, but it's more to do with trying to have a type system that allows the JIT to unbox all the things that have to be unboxed for high performance without sacrificing the freedom of dynamic typing.

IIUC, Common Lisp is the giant on whose shoulders Julia built in this respect.

rashidrafeek4y ago

Yes. It's a side effect of multiple dispatch being the core paradigm of the language. See Stefan Karpinski's talk about it: https://www.youtube.com/watch?v=kc9HwsxE1OY

FabHK4y ago

The title of Stefan's talk is great: The Unreasonable Effectiveness of Multiple Dispatch. He gives a nice example of composability: how you can throw a new type into an existing algorithm and it just works.

jpeloquin4y ago

The "Unreasonable Effectiveness of Multiple Dispatch" talk is a good example of how multiple dispatch is special in a good way, in that everything (should) work together as new types and functions are added to the ecosystem. However, this also means the scope of potential integration bugs encompasses the entire ecosystem. The Julia manual has a small section about special composability pitfalls arising from multiple dispatch: https://docs.julialang.org/en/v1/manual/methods/#man-method-...

As best as I can summarize it: Multiple dispatch is supposed to dispatch a function call to the implementation with the most "specific" call signature. This means that you must design your functions with an eye to what everyone else has implemented or might implement so whatever function gets called does the "right" thing, and also that your implementation doesn't block someone else from writing their own implementation specialized to other types. This requires some coordination across packages, as shown in one of the manual's examples.
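A toy sketch of the coordination problem. The function `combine` is hypothetical; imagine each of its methods being defined in a different package.

```julia
# Hypothetical function `combine`; imagine each method living in a different package.
combine(x::Number, y::Number) = "generic fallback"
combine(x::Integer, y::Number) = "integer on the left"
combine(x::Number, y::Integer) = "integer on the right"

println(combine(1.5, 2.5))   # generic fallback
println(combine(1, 2.5))     # integer on the left
println(combine(1.5, 2))     # integer on the right

# combine(1, 2) matches the last two methods equally well, so Julia refuses to guess.
try
    combine(1, 2)
catch err
    println(err isa MethodError)   # true
end

# The fix: somebody defines the method at the intersection of the two signatures.
combine(x::Integer, y::Integer) = "both integers"
println(combine(1, 2))       # both integers
```

Neither of the two specialized methods is wrong on its own; the ambiguity only appears once both are loaded, which is why resolving it takes some agreement about who owns the intersection method.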

The rules defining type specificity (subtyping) are complicated, and I think not in the manual. They have been inferred by observation: http://janvitek.org/pubs/oopsla18a.pdf. To quote from that paper, "In many systems answering the question whether t1 <: t2 is an easy part of the development. It was certainly not our expectation, approaching Julia, that reverse engineering and formalizing the subtype relation would prove to be the challenge on which we would spend our time and energy. As we kept uncovering layers of complexity, the question whether all of this was warranted kept us looking for ways to simplify the subtype relation. We did not find any major feature that could be dropped." Julia's multiple dispatch allows a high degree of composability, but this does create new complexity and new problems.

rpmuller4y ago· 6 in thread

I've been a part of many language communities, and the Julia team is the very best in terms of the professionalism of the language and the key modules.

Maybe the best response to this is to view it as a call to action for us Julia fanboys/girls to stop cheering and fix some bugs ;-).

CJefferson4y ago

I've had a couple of conversations on twitter with Viral B Shah (co creator of Julia) which I found unprofessional, so I stopped learning Julia. Unless he was just having a very bad day, in my opinion he takes badly to minor criticism of Julia (although others might disagree).

Edit, here is one thread I could find quickly: <EDIT2: edited out link which most people seem to think is actually fine, just people getting slightly annoyed on Twitter. I deleted the link as people were going and interacting with people in the old thread>

The comments aren't particularly bad, but they do feel to me like making a bad faith interpretation of someone's comment, then digging in. I don't feel that's a good way to talk to users, and ethos comes from the top.

cbkeller4y ago

I don't see anything problematic in what Viral said here; I think it would be fair to say your initial take ("Julia has been the future of machine learning for 10 years and will stay as the future of machine learning for the next 10 years") is likely to be perceived as at least somewhat inflammatory, a defensive response is natural enough in that context.

saghm4y ago

Yeah, I fully expected based on the description of the twitter interaction to see something really terrible, and from actually looking at it, it seems pretty mild. If anything, it seems like they went out of their way to try to bait the Julia creator and he had a fairly reasonable response to it. I'm not sure what could be considered "inflammatory" about any of that.

CJefferson4y ago

What part of the conversation justifies "If you truly believe that nobody will ever adopt anything new, we would all have been programming in Fortran or assembly!"? To me that is a stupid escalation -- no one was suggesting not to do new things, Python (the discussed AI alternative) is of course newer than Fortran and assembly for a start!

That just seemed like a bizarre overreaction to me.

chrsig4y ago

That thread is just rife with bad communication across the board. It's pretty clear that none of you understand what each other is saying, but are very willing to infer.

Maybe try not communicating on twitter.

Sukera4y ago

Do you have an example? I'd like to know more about this - it must have been quite egregious if it makes you stop learning a language.

QuackingTheQ4y ago· 5 in thread

I've spent a lot of time developing large computational codebases in Julia, and I think the most insidious of these issues is a product of no formal way of enforcing interfaces. Using one of the common packages to build a trait system and add some sort of guarantee that all the right methods are implemented for a given trait simplifies maintenance dramatically.

This doesn't catch mathematical bugs, but those crop up everywhere. Instead, knowing what the interfaces must specify, so you can trust your implementation, is crucial, and being able to know when that is invalidated is invaluable.

I've had a few awful bugs involving some of the larger projects in this language, but a proper interface/trait system would simplify things exponentially. There are some coding style things that need to be changed to address this, like using `eachindex` instead of `1:length(A)` for array iteration as the example in the article points out. However, these should be one-off lessons to learn, and a good code linter should be able to catch potential errors like this.

Between a good code linter (or some static analysis, I'm pulling for JET.jl) and a formal interface spec, I really think most of Julia's development-side issues could be quelled.
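
To make the `eachindex` versus `1:length(A)` point concrete for readers who don't write Julia, here is a rough Python analogue. `OffsetList` is a hypothetical toy class invented for this sketch (it is not OffsetArrays.jl or any real library). The loop that hard-codes the index range breaks on a container with a different first index, while the loop that asks the container for its indices keeps working.

```python
class OffsetList:
    """Toy container whose first valid index is `first` (hypothetical, for illustration)."""

    def __init__(self, data, first):
        self._data = list(data)
        self.first = first

    def __len__(self):
        return len(self._data)

    def indices(self):
        # Analogue of Julia's eachindex: the container reports its own valid indices.
        return range(self.first, self.first + len(self._data))

    def __getitem__(self, i):
        # Bounds-checked access, like Julia without @inbounds.
        if i not in self.indices():
            raise IndexError(f"index {i} outside {self.indices()}")
        return self._data[i - self.first]


def total_assuming_zero_based(a):
    # Hard-codes the index range, like `for i in 1:length(A)` in Julia.
    return sum(a[i] for i in range(len(a)))


def total_generic(a):
    # Asks the container which indices are valid.
    return sum(a[i] for i in a.indices())


a = OffsetList([10, 20, 30], first=5)
print(total_generic(a))
try:
    total_assuming_zero_based(a)
except IndexError as err:
    print("hard-coded range failed:", err)
```

In Julia the hard-coded loop only turns into a silent wrong answer when bounds checks are disabled (for example under `@inbounds`); with checks on it errors, as it does here. A linter rule for this pattern would only need to spot the hard-coded range.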

ThenAsNow4y ago

I agree with the kernel of your point here, but also with the author of the article when he says "But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem. They accept the existence of individual isolated issues, but not the pattern that those issues imply."

My impression is that the Julia core devs are more focused on functionality and being able to construct new, more powerful, faster capabilities than on reflecting on how the foundations could or should be made more rigorous. For this, I think the devs have to philosophically agree that soundness in the large should be a first-tier guiding principle, and that the language should have mechanisms whereby correctness-by-construction can be encouraged, if not enforced. Presently, notions of soundness seem to only be considered in the small, such as the behavior of specific floating point ops. Basically, I don't think the core devs are as concerned with soundness, rigor, and consistency as they are with being able to build more impressive capabilities.

I don't want this to sound like I'm ungrateful for the awesomeness that Julia and its ecosystem do bring to the table. For numerical computing, I don't see any alternatives whose tradeoffs are more favorable. But it is disappointing that it doesn't seem to learn the lessons about rigorous language design and the language-level implications for engineering vs. craftsmanship appropriate for a twenty-first century language.

FabHK4y ago

Sounds like Julia needs a Snow Leopard/Mountain Lion/High Sierra release - no new features, just cleaning things up...

fluidcruft4y ago

Could some of the need for interfaces be addressed by providing an extensive test battery for types of object? It seems like if something claims to be an implementation of a floating point number it should be possible to smash that type into every error ever found to uncover implementation errors.
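
One way to picture the test-battery idea: whoever owns an interface ships a reusable conformance check, and every implementer runs its type through it. A minimal Python sketch follows; `check_number_like` and `Lopsided` are names made up here, and the three properties are only illustrative, not a real specification of what a number is.

```python
from fractions import Fraction


def check_number_like(samples):
    """Return descriptions of every property that fails for the sample values."""
    failures = []
    for x in samples:
        zero = x - x
        if x + zero != x:
            failures.append(f"additive identity fails for {x!r}")
        for y in samples:
            if x + y != y + x:
                failures.append(f"addition does not commute for {x!r}, {y!r}")
            if x * y != y * x:
                failures.append(f"multiplication does not commute for {x!r}, {y!r}")
    return failures


class Lopsided:
    """Hypothetical buggy numeric type: addition counts the left operand twice."""

    def __init__(self, v):
        self.v = v

    def __eq__(self, other):
        return isinstance(other, Lopsided) and self.v == other.v

    def __repr__(self):
        return f"Lopsided({self.v})"

    def __add__(self, other):
        return Lopsided(2 * self.v + other.v)  # the bug

    def __sub__(self, other):
        return Lopsided(self.v - other.v)

    def __mul__(self, other):
        return Lopsided(self.v * other.v)


print(check_number_like([1, 2, -3]))
print(check_number_like([Fraction(1, 3), Fraction(2, 5)]))
print(len(check_number_like([Lopsided(1), Lopsided(2)])) > 0)
```

A battery like this only catches what its samples and properties exercise, so it complements a declared interface rather than replacing one.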

QuackingTheQ4y ago

It's possible to hack interface verification into place at test-time, but that has a couple of problems:

1. Running the whole testing framework to determine if you implemented an interface is a high overhead when you're developing

2. You have a lot of tests to write to really check every error. Perhaps a package which defines an interface could provide a tester for this purpose

3. Interfaces should be attached to the types, and that should be sufficient for verifying the interface

I would settle for something like checking for the implementation of methods a la BinaryTraits.jl over what we have now, which is nothing. A huge step would be documentation and automated testing that proper interface methods are implemented, not even verifying if they're "correct". This drastically reduces the surface area you need to write and check to confirm compatibility with outside code.

This simple interface specification does produce design issues of its own, but correctness is much easier to handle if you know what needs to be correct in the first place.
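
The "just check that the methods exist" level can be sketched in a few lines. This is Python rather than Julia, and `missing_methods`, `ARRAY_LIKE` and the two classes are hypothetical names for illustration. It is not how BinaryTraits.jl is implemented, only the same idea of declaring the required methods once and checking a type against that declaration.

```python
# The interface, written down once: names every array-like type must provide.
ARRAY_LIKE = ("__len__", "__getitem__", "indices")


def missing_methods(cls, required):
    """Names in `required` that `cls` does not provide as callables."""
    return [name for name in required if not callable(getattr(cls, name, None))]


class Complete:
    def __len__(self):
        return 0

    def __getitem__(self, i):
        raise IndexError(i)

    def indices(self):
        return range(0)


class Partial:
    # Forgot __getitem__ and indices.
    def __len__(self):
        return 0


print(missing_methods(Complete, ARRAY_LIKE))
print(missing_methods(Partial, ARRAY_LIKE))
```

A check like this is cheap enough to run on every save, and it says nothing about whether the methods behave correctly, which is exactly the split described above.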

mcabbott4y ago

Yes, although that seems like the easy half of this, making sure `struct NewNum <: AbstractFloat` defines everything. There aren't yet tools for this but they are easy to imagine. And missing methods do give errors.

The hard half seems to be correctness of functions which accept quite generic objects. For example writing `f(x::Number)` in order to allow units, means you also allow quaternions, but many functions doing that will incorrectly assume numbers commute. (And not caring is, for 99% of these, the intention. But it's not encoded anywhere.) Less obviously, we can differentiate many things by passing dual numbers through `f(x::Real)`, but this tends to find edge cases nobody thought of. Right now if your algorithm branches on `if det(X) == 0` (or say a check that X is upper triangular) then it will sometimes give wrong answers. This one should be fixed soon, but I am sure there are other subtleties.
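
The branch problem is easy to reproduce with a toy forward-mode dual number. This is a hand-rolled Python sketch, not any real AD library; `Dual`, `f_plain`, `f_branchy` and `derivative` are made-up names, and the assumption is that `==` compares only the primal value. Both functions agree on every real input, yet the one with the special case reports the wrong derivative at exactly the branch point.

```python
class Dual:
    """Minimal dual number val + der*eps with eps**2 == 0."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule for the derivative part.
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__  # scalar * Dual; multiplication commutes for this toy type

    def __eq__(self, other):
        # Assumption of this sketch: equality looks at the primal value only.
        other_val = other.val if isinstance(other, Dual) else other
        return self.val == other_val


def f_plain(x):
    return 3.0 * x


def f_branchy(x):
    # Harmless-looking shortcut: same value as f_plain for every real x.
    if x == 0:
        return 0.0
    return 3.0 * x


def derivative(f, at):
    out = f(Dual(at, 1.0))
    # A plain float result means the derivative information was dropped.
    return out.der if isinstance(out, Dual) else 0.0


print(derivative(f_plain, 0.0), derivative(f_branchy, 0.0))
print(derivative(f_plain, 2.0), derivative(f_branchy, 2.0))
```

Nothing in the signature of `f_branchy` says it is unsafe to differentiate, which is why this half is hard to catch with interface checks alone.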

catchclose89194y ago· 5 in thread

Only thing "interesting" to me there would be the automatic differentiation bugs ...but is there any argument as to them being the fault of the language, instead of just poor engineering from the library developers' part?

I mean, one can't expect all algorithms to work correctly with all datatypes just because the compiler allows that code to run ...you write tests and guarantee numerical stability for a small subset of types you can actually do it for, and then it's the code's consumers' job to ensure it works with types it's not documented to work with and such, no? ...Julia is quite a dynamic language, JITed or what not, its semantics are closer to Python and Lisp than to Rust or Haskell ...maybe don't expect guarantees that aren't there and just code more defensively when making libraries others depend on?

Probably the Python + C(++) ecosystems work better because their devs know they are working in loose, dynamic and weakly typed shoot-your-foot-off type languages and just take action and code defensively and test things properly, whereas Julia devs expect the language to give them guarantees that aren't there.

Q6T46nT668w6i3m4y ago

I think the author addresses this. It’s a Catch-22. If you restrict use to a small subset of types you’re undermining one of Julia’s best features.

As someone who has been writing a lot of numerical analysis code recently, I would absolutely love a type system that could describe and enforce numerical stability traits.

one-more-minute4y ago

Right. It's important to remember that tools like JAX and PyTorch have total control over the numerical libraries they are differentiating, and have freedom to impose whatever semantics, rules and restrictions are convenient (immutability and referential transparency in JAX, for example). Seemingly small decisions in an existing language and library can have a big impact on the feasibility and practicality of AD.

dklend1224y ago

That's exactly where Dex might improve over Julia, with language level control over mutability and effect handlers and array access safety ... time will tell.

So packages just use those features

Maybe it will hit the right trade off, or maybe Julia will adopt similar language level tools, but adjusted for dynamic semantics. Is that even possible?

catchclose89194y ago

> a type system that could describe and enforce numerical stability traits

Wow, that sounds cool! Have you researched whether anyone has done anything in this area? How would you even start to approach the problem?

Do you think it has any chance of being done without massive sacrifices to performance?

jstrong4y ago

in rust code, I like using `debug_assert!` to represent numerical expectations/assumptions of the implementation. later if I have a problem, I can turn on debug assertions and I will get a bunch of additional checks. but I can also turn them off and not pay for them all the time.
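
For readers outside Rust, a similar pattern exists in Python: plain `assert` statements are skipped when the interpreter runs with `-O`. A small sketch, where `normalize` is a hypothetical function and the checks are just examples of the kind of numerical expectation meant here.

```python
import math


def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    # Expectations about the input; skipped entirely under `python -O`.
    assert len(weights) > 0, "need at least one weight"
    assert all(math.isfinite(w) and w >= 0 for w in weights), \
        "weights must be finite and non-negative"

    total = math.fsum(weights)
    assert total > 0, "weights must not all be zero"

    result = [w / total for w in weights]
    # Expectation about the output: sums to 1 up to rounding.
    assert math.isclose(math.fsum(result), 1.0, rel_tol=1e-12)
    return result


print(normalize([1.0, 3.0]))
```

One difference from `debug_assert!` is the default: Python runs the checks unless you opt out with `-O`, so the "free in production" part only holds if production actually uses that flag.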

Sporktacular4y ago· 5 in thread

This is a pity. It seems like a great language and I'd be keen to dive in more, but it seems fair to expect a math/numerical analysis-oriented language to be especially dependable wrt correctness.

I remember a claim made by Mathworks about MATLAB and wondering if it wasn't far fetched, but if true I appreciate it: "A team of MathWorks engineers continuously verifies quality by running millions of tests on the MATLAB code base every day." https://www.mathworks.com/products/matlab/why-matlab.html#re...

cbkeller4y ago

I actually wouldn't be surprised if the total number of tests run in the Julia ecosystem wasn't too different (thousands of packages with typically hundreds to thousands of unit tests, run on every commit and PR) -- virtually every Julia package has CI set up (at least standalone unit tests, though many packages could use more integration tests). Of course, in neither Matlab nor Julia do tests guarantee correctness.

Sporktacular4y ago

Is that tests for the purpose of verifying correctness or tests of applications that will flag problems incidentally? I'm not too familiar, but like the idea of dedicating resources to that specifically.

Guarantees aside, does MATLAB have an issue with this to the same extent as Julia?

cbkeller4y ago

Personally I'd probably categorize most unit tests as verifying correctness (but only for the scenarios tested); integration tests may be more useful for finding incidental issues that you wouldn't have thought to test for directly. I'm for sure on board with dedicating more resources to testing -- and in my case as an academic, this is something I only have really been exposed to as a result of interacting with the Julia community.

Matlab is pretty mature at this point, but I'm sure it's had its share of bugs over the years as well (especially if you also counted the file exchange, which is probably the closest thing they have to an open source package ecosystem); it would be interesting to compare the two at a similar level of maturity / development person-hours if quantitative data could be found.

adgjlsfhk14y ago

sample size of 1, but I've run 1 billion tests today in Julia (floating point power for Float16, Float32 and Float64)

Sporktacular4y ago

For correctness? What was the result?

Sukera4y ago· 4 in thread

Most of these seem to be about packages in the ecosystem (which, after clicking through all links, actually almost all got fixed in a very timely manner, sometimes already in a newer version of the packages than the author was using), not about the language itself. Other than that, the message of this seems to be "newer software has bugs", which yes is a thing..?

For example, the majority of issues referenced are specific to a single package, StatsBase.jl - which apparently was written before OffsetArrays.jl was a thing and thus is known to be incompatible:

> Yes, lots of JuliaStats packages have been written before offset axes existed. Feel free to make a PR adding checks.

https://github.com/JuliaStats/StatsBase.jl/issues/646#issuec...

EDIT: Since this comment seems to gain some traction - title is editorialized, original is "Why I no longer recommend Julia".

snicker74y ago

"known to be incompatible"

Known to whom? People who regularly participate in the Julia forum/chat? Julia's composability relies on people agreeing on unwritten rules and standards.

In other languages, such incompatibilities are caught by the compiler. Even in other dynamic languages like Python or Javascript, it is now considered best practice by many to annotate types whenever you can. Like Julia, Haskell is also composable. Unlike Julia, it does not need to sacrifice correctness.

DNF24y ago

Agreed, one cannot just expect this to be known.

Do type annotations in Python actually catch type errors? I thought they were mainly for documentation.

nickm124y ago

Absolutely yes, but you have to use a typechecker like mypy (and generally make it part of your release builds). I've found typechecking my Python code makes my development iterations much faster than writing tests. My biggest issue is that if you are using a legacy codebase or 3P library without type annotations then the "Any" type becomes pervasive and removes much of the value you get from type annotations. You can run mypy in a mode that flags when this is happening, but it's not like you're going to go type annotate the world just to push your code change.
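
A small illustration of both halves of that. The annotations below are ordinary Python and the interpreter ignores them at runtime, so the bad call only fails once it reaches an unsupported operation, whereas a checker such as mypy can reject it before the code runs. `mean` and `load_untyped` are hypothetical names for this sketch.

```python
from typing import Any


def mean(values: list[float]) -> float:
    return sum(values) / len(values)


def load_untyped() -> Any:
    # Stand-in for a third-party function that has no useful annotations.
    return "1,2,3"


print(mean([1.0, 2.0, 3.0]))

# A type checker reports this call statically (str is not list[float]);
# plain Python only notices at runtime, inside sum().
try:
    mean("1,2,3")
except TypeError as err:
    print("runtime failure:", err)

# Because load_untyped() returns Any, a checker accepts this call,
# and the same runtime failure slips through unflagged.
try:
    mean(load_untyped())
except TypeError as err:
    print("runtime failure:", err)
```

The second call is the pervasive-`Any` problem in miniature: the annotation on `mean` is correct, but it cannot help once an untyped value flows into it.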

snicker74y ago

Yes, if you use tooling (mypy). It definitely helped me a few times.

p33p4y ago· 4 in thread

Viral frequents HN so I will be curious to see if he engages this directly in a productive manner.

There are many great qualities of Julia, and I've wanted to love it and use it in production. However, coming from the tooling and correctness of Rust leaves me thinking something is just missing in Julia. One of the links in the post references "cowboy" culture. While I don't think this is the correct nomenclature, there is a sense with looking at the package ecosystem and even Julia itself that makes me think of the pressure in academia to publish constantly. I'm not sure what to make of that, and it's simply a feeling.

ViralBShah4y ago

I think Keno's comment above pretty much articulates my thoughts as well. I have met Yuri on several occasions and have been thrilled to see his contributions. I find the post constructive and it will certainly help make Julia better, and hope Yuri will be back at a later date.

Some of the issues linked are JuliaStats issues, and there's a lot happening to improve it, which should become more visible over the next few months. Example: https://discourse.julialang.org/t/pushing-julia-statistics-d...

Julia really pushes on language and compiler design in ways many statically typed languages do not. There is real work to be done at the frontiers, and also investment in tooling built on top of that. It is all happening. The package ecosystem takes time to mature - Julia has a deliberate release process, the key packages have adopted a more deliberate release process, but stuff out in the long tail naturally tends to move fast - as it should.

derbOac4y ago

I've been a user of Julia for some time (at least since beta versions). I love the language and feel like the author of the blog post is maybe exaggerating or generalizing a bit too much. On the other hand, based on my personal experiences with Julia, I can definitely empathize and feel like there's a lot about the blog post that rings true.

I share your sense that "something is just missing in Julia" but I maybe disagree with the author in that I see it as potentially changeable or something, as not hopeless.

Julia has grown tremendously in a short period of time, both in the language, its implementation, and the size of the community. So in that sense I see it as inevitable there's going to be a lot of bugs and chaos for a bit.

On the other hand, I've always felt a bit of unease that a numerical language was being developed from the ground up as that, without it being an offshoot of a more general-purpose language. It's not that I think there's something inherently wrong with it, but I do think that having a greater variety of perspectives looking at it is more likely to catch things early.

I don't think in this regard it's a function of academia -- although it certainly could be -- it's more a function of having a very narrow community looking at the language. Regardless of how smart they all are, I think having a broader range of perspectives might catch things earlier.

In this regard, I might have preferred the Julia fervor and effort be put into some numerical Nim libraries, or a numerical "abstracted subset of Rust" or something. It's not so much I dislike Julia as much as it is I'd feel safer with a more generalist perspective on basic language design.

But who knows. To me it's a bit ironic the author focuses on Python as an alternative, because it's not like that is free from problems, and Python has been around for a lot longer. They might be different problems, but they're not absent. Python is a bit ironic too in that it has been sort of kludged together over time into what it is today, for better or worse. I guess it feels like to me all the major numerical programming platforms have this kind of kludgy feeling in different ways; Julia feels/felt a bit like an opportunity for a clean break, if nothing else.

jbezanson4y ago

I don't think there is anything "numerical" about the core language design of julia; it is just a general generic-function-based OO language. In fact I think we made many decisions in line with trends in the broader language world, e.g. emphasizing immutable objects, having no concrete inheritance, using tasks and channels for concurrency, and deliberately avoiding "matlab-y" features like implicit array resizing. Of course many in the "general purpose" crowd don't like 1-based indexing, but surely that is not the source of all of our problems :)

derbOac4y ago

Fair enough, I might have to eat my words a bit. Julia does have a lot to offer in terms of language characteristics, that's true, and I think part of the appeal. It has been a breath of fresh air, and feels well-thought out from basic principles. But along the lines of the original linked article, it's maybe worth thinking about why Julia hasn't seen more widespread adoption in say, web servers or systems programming, etc and so forth. I don't mean that as a criticism, just that I do think it's been marketed (or received) as a numerical computing language, and that's the community that it's primarily developed around for one reason or another, with its concomitant specific blind spots (as do all language communities).

For what it's worth, I prefer 1-based indexing.

My guess is a lot of what's in the post is probably tied to growing pains and maybe butterfly effects of novel language features on bigger-picture patterns. It would be interesting to see where things were at at a similar stage in other languages.

IshKebab4y ago· 4 in thread

I tried Julia but the compilation time for interactive use was just too insane.

I ended up paying £125 for MATLAB. Nothing else really remotely compares to MATLAB's plotting facilities.

forgotpwd164y ago

Did you try Octave, GNU's numerical package that is compatible with MATLAB?

IshKebab4y ago

Of course! The language implementation is decent and the GUI is promising, except for the most important feature of the GUI - the plot viewer, which is completely awful. Forget about the same league, it's not even playing the same game as MATLAB.

DNF24y ago

I use Matlab daily, and the plotting is indeed excellent.

But the language itself is a horrible kludgy mess. Most of the development time is spent on input parsing and contorting your code into a vectorized shape.

IshKebab4y ago

Yeah I agree the language is not great, especially for non-matrix things. But Julia isn't exactly great either (unsurprisingly since it is pretty much a MATLAB derivative).

But I only use it for prototyping. I would absolutely not recommend it for production code. If there's some gnarly input processing to be done I'll do that in another language and just have it output CSV or similar.

fgh4y ago· 4 in thread

It would be interesting to know which language the author currently uses.

rendall4y ago

The author mentions that he was stuck on a problem for weeks using Julia, but solved it with Python within hours

tgv4y ago

That was someone else: Patrick Kidger is mentioned in the article. If I look at the author's github, it's go and javascript.

rendall4y ago

You're right. I misread.

ninjin4y ago

Pretty sure it was Go last time I talked to Yuri, he is very much a stand-up guy.

cs7024y ago· 3 in thread

A more appropriate title for the OP would have been:

"A new language that makes it easy to write and use generic algorithms on a growing number of custom types developed by others is bound to experience growing pains as difficult-to-foresee correctness bugs have to be discovered and fixed over time."

In my humble opinion, this kind of universal composability, which Julia makes easy via multiple dispatch and naming conventions, is the underlying root cause of all the correctness bugs that have surfaced as the language has evolved. But the bugs are being fixed, one at a time, and ultimately the result should be both beautiful and powerful. We will all be thankful for it!

mbauman4y ago

The most tragic thing here to me is that we're losing Yuri — who has been an invaluable contributor and bug-reporter for issues like these — and that Yuri got burned out instead of feeling empowered.

cs7024y ago

Yeah, good point. Sometimes I wonder if the fact that so many of the folks developing and using Julia are both highly educated (e.g., in math) and insanely smart (evidently) is a barrier to mass adoption. That is, I wonder if the broader mass of developers out there -- many of whom are less knowledgeable -- find it difficult to benefit from and contribute to the Julia ecosystem.

lostmsu4y ago

I am not sure I know of any statically typed languages with generics, that experienced the same kind of problems on multiple occasions. The only one I am aware of is C# and array variance, which is kept for compatibility purposes.

jrochkind14y ago· 3 in thread

> Given Julia’s extreme generality it is not obvious to me that the correctness problems can be solved. Julia has no formal notion of interfaces, generic functions tend to leave their semantics unspecified in edge cases, and the nature of many common implicit interfaces has not been made precise (for example, there is no agreement in the Julia community on what a number is).

Does all that apply to Python? I think so? Yet apparently similar problems don't exist in python, and even one of the examples in OP had the reporter moving to python to have no problems getting the same thing to work that was problematic in Julia.

In a language intended for math, I do understand the desire to have something with more formal properties suited for guarantees and such. But Python seems to be doing just fine in that domain without those features, so, I'm not sure what we should conclude here.

adgjlsfhk14y ago

The main difference between Julia and python is that most of the "core" python ecosystem has had a lot more dev time put into it. Google, Facebook, and Microsoft all have hundreds of full time developers on major python packages.

jrochkind14y ago

Makes sense. I guess the author's contention is that if Julia had those formal features the author wants, it would need very significantly less dev time to reach python's levels of reliability?

It's of course plausible, that's what those sorts of features are intended for, but I'm not absolutely confident. At any rate, python demonstrates it is not the only path, as the author seems to be suggesting ("it is not obvious to me the problem can be solved" without these features, says the author. But it's not obvious to me that those features are necessary to solve the problem, or sufficient to solve the problem...)

adgjlsfhk14y ago

Python's reliability here comes because it is a much less flexible language in some ways. If you write your own array type in python, and pass it into tensorflow, you would expect it to error. If you do the same thing in Julia, you would expect it to work.

asdfman1234y ago· 3 in thread

> In my experience, Julia and its packages have the highest rate of serious correctness bugs of any programming system I’ve used, and I started programming with Visual Basic 6 in the mid-2000s.

Oh God, is this what qualifies you as "old" now

CRConrad4y ago

Kids these days, eh? Lawn, etc.

asdfman1234y ago

Nah, it's not like that. In my mind an "older" programmer is like from the 90s.

But I did that stuff in the mid-2000s too: am I now an "old"?

Terrifying!

CRConrad4y ago

Relax: No, you're not. You've got another fifteen years.

cleandreams4y ago· 2 in thread

Unfortunately this is not a feature but a bug, and the worst kind, a bug at the language design level:

Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other. That's a critical feature that makes Julia as powerful as it is, but of course you can easily end up with situations where one or the other package is making implicit assumptions that are not documented (because the author didn't think the assumptions were important in the context of their own package) and you end up with correctness issues.

JonChesterfield4y ago

It's an interesting point in the language design space. Composing unrelated projects gives a rapidly increasing state space of interactions which no one is directly responsible for. I can't decide if that's brilliant or broken by design.

adgjlsfhk14y ago

why not both?

jbezanson4y ago· 1 in thread

I'm not sure what to make of this. Yuri is great and I'll certainly miss having him in the Julia community. Yes, of course there are bugs. We work on fixing them all the time. If there are just too many for you, or we are too slow at fixing them for you, then OK I understand you might walk away.

With these kinds of posts (and the reactions to them) lots of issues tend to get conflated. For example there are issues with OffsetArrays because some people write code assuming indexes start at 1. Starting at 0 wouldn't fix that. A static type system wouldn't fix that; most static type systems don't check array bounds. Are we supposed to un-register the OffsetArrays package? Should we disallow overloading indexing? Personally I have told people not to use `@inbounds` many times. We could remove it, but those who want the last drop of performance would not be too happy. The only path I see is to fix the bugs.

> They accept the existence of individual isolated issues, but not the pattern that those issues imply.

I admit, I do not see the pattern allegedly formed by these issues. Of course, static types do remove a whole category of issues, but "switch to static types" is not really a practical request. There are other things you can do, like testing, but we do a LOT of testing. I really do not mean to downplay Yuri's experience here, I am just not sure what to take away other than that we should work even harder on bugs and quality.

ThenAsNow4y ago

I've worked on large engineering projects in physical disciplines. When I am the customer, I often bring in a group of independent experts to review the design products. Often these experts provide inputs that are not 100% usable in the form they're provided. One may have to disentangle their conflation of related-but-not-the-same issues, or ignore the specific solutions they propose, etc.

That being said, I have learned the hard way not to ignore or trivialize these review inputs, even if they are not immediately actionable as-provided. Users and reviewers are really good at figuring out weak areas or flaws even if they can't articulate the solutions, fully unentangle related issues, or do all the generalization or abstraction that would make those issues easier to address. There is usually some truth underlying the negative feedback.

The article looks to potentially be an example of an expert review in the above vein. If you are able to take a step back, you might find the HN discussion on this submission to provide further inputs to help figure out how any of this should be channeled into language, practice, and ecosystem improvements. Certainly there is more to work with here than little "to take away other than that we should work even harder on bugs and quality."

nohat4y ago· 1 in thread

The power of allowing everyone to make foundational types and functions that work together is indeed dangerous. I'm not sure you are better off in the even more dangerous waters of c/c++/fortran, except that they are older and more established with many times the man-hours sunk into them. Is there a good way to control the interaction of these many different libraries without losing the generality and composability of Julia?

I will say that as a matter of language design 1 based indexing is perfectly fine, 0 based indexing is perfectly fine. Choose your own indexing is a hilarious foot gun, so no surprise it went off sometimes. Fortunately using it seems to be quite rare.

DNF24y ago

But it's not a matter of language design. The 'choose your own indexing' is something you do entirely in libraries.

You can create your own indexing in python too, it will just be slow. The 'sin' of Julia is that it will be fast...

sharikous4y ago· 1 in thread

My opinion is that Julia was too ambitious from day one. Reimplementing the whole scientific computing stack AND a new modern language with an innovative type system and introspection AND perfecting tooling is just too big an effort.

The priority for correctness has been drowned out by too many other issues and we are here with a 10-year-old language with a very perfectionist and ambitious mindset that is still a raw fruit in basically everything. It's not some rough edges, it's just too many edges, most of them rough.

I cannot help thinking that if the same amount of people focused on a much smaller goal we could have something much more usable today. As it is now I know Julia won't be production ready for at least 10 years. And that's in the lucky case that it doesn't become irrelevant in the meantime.

DNF24y ago

If Julia followed your recommendation, it would be irrelevant before it ever started.

There is no way for a new language to be useful or relevant unless it brings significant improvements.

teddyh4y ago· 1 in thread

Actual title: “Why I no longer recommend Julia”.

mbauman4y ago

@Dang could we get the title corrected?

blindseer4y ago· 1 in thread

Correctness in Julia feels like it'll never happen, because interfaces seem like they'll never happen.

Correctness guarantees / interfaces and slow startup are both my biggest pain points in Julia.

I often think what would happen if every Julia dev just dropped the language and used Rust instead. A scientific ecosystem in Rust would be amazing.

jakobnissen4y ago

As someone who really likes both Rust and Julia, there is absolutely no way Julia's scientific users would switch to a static language. Rust is slow to write, verbose, also suffers from long compile times, has no REPL or garbage collector... It is deeply unsuitable for scientific coding.

ModernMech4y ago· 1 in thread

I always thought this sounded like a bad idea. I remember one time I was working with a C++ guy on a Matlab project, and he handed me some Matlab code with 0 based indexing assumed. I said "Did you even run this code?", and he assured me he had. But of course he had not, because if he did it would have complained about the 0-based indices. But the point is that it did complain when I ran it, and I was able to match it to my code. I imagine in Julia he would have used 0-based indices, and I would have used 1-based, and our programs would have silently failed.

cbkeller4y ago

For it to silently fail of course though, he would have had to explicitly use the OffsetArrays package and explicitly switch all `Array`s to `OffsetArray`s (which hopefully you would notice) -- and then you would have to go ahead and use those OffsetArrays in a package which doesn't support them; if you just go ahead and use 0 as an index in plain Julia code it will error as you would expect.

chubot4y ago· 1 in thread

Oof, accessing out of bounds memory is pretty surprising to me for a dynamic language ... But I guess it's not surprising if your goal is to compile to fast native code (e.g. omit bounds checks).

I don't know that much about how Julia works, but I feel like once you go there, you need to have very high test coverage, and also run your tests in a mode that catches all bound errors at runtime. (they don't have this?)

Basically it's negligent not to use ASAN/Valgrind with C/C++ these days. You can shake dozens or hundreds of bugs out of any real codebase that doesn't use them, guaranteed.

Similarly if people are just writing "fast" Julia code without good tests (which I'm not sure about but this article seems to imply), then I'd say that's similarly negligent.

-----

I've also learned the hard way that composability and correctness are very difficult aspects of language design. There is an interesting tradeoff here between code reuse with multiple dispatch / implicit interfaces and correctness. I would say they are solving O(M x N) problems, but that is very difficult, similar to how the design of the C++ STL is very difficult and doesn't compose in certain ways.

(copy of lobste.rs comment)

mbauman4y ago

You can also use `julia --check-bounds=yes` — and our testing frameworks automatically do so.

adolph4y ago· 1 in thread

Oftentimes people describe languages as "Turing complete" but how often do they talk about languages being "Gödel incomplete?" Another way of stating maybe is "Are what some call flaws what others call features?"

https://stackoverflow.com/questions/7284/what-is-turing-comp...

https://plato.stanford.edu/entries/goedel-incompleteness/

mattkrause4y ago

Even fairly simple arithmetic is incomplete, so unless the language is heavily restricted, allowing only multiplication of positive integers (x)or addition of natural numbers, they're all going to be incomplete.

ninjin4y ago

I am a long-time member of the Julia community and had a discussion with the author about these issues a long time ago – but did not give feedback on the post. Let me first state that Yuri is a great person and was a valuable member of the community. He pushed the boundaries of the language and produced some very nice packages in his time. His concerns are genuine and should be respected and discussed in that context.

Also, let me say that encountering these kinds of bugs is not something I have had experience with. But, I tend to be very conservative with my usage of libraries and fancy composition.

If I had more experience with programming language theory and implementation, perhaps I would have a better name to describe the source of the issues described. My attempt is to call it “type anarchy”. The way I see it, there is not a clear way to assign responsibility for correctness. In the case of the array used in the post, is it the fault of the implementer of the `sum` function (without a type signature, as it should be) or the implementer of the data structure? I am honestly not sure. But as Julia breaks new ground with its type system and multiple dispatch, this could very much be an open question.

cwp4y ago

I wonder how much of this is just that Julia is more composable than most people are used to, and the community hasn't yet developed the patterns and culture that are needed to avoid these kinds of problems.

I'm thinking, for example, of the way that Smalltalkers often create parameters with type-evocative names, such as "aString". Or Objective-C with two-letter prefixes to work around lack of namespaces. Or even the Java "EntityAdaptorFactoryFactory" design aesthetic. (Some of you will shudder, and I'm with you, but it did solve real problems that the Java world was facing.)

Julia is still a pretty young language, and it's probably only recently that the ecosystem has gotten big enough to hit these problems.

Edit: come to think of it, one of the issues that the Java folks were dealing with was lack of composability. :-/

freemint4y ago

As a huge fan of Julia, I have to fully agree. Although I would probably not say "no longer recommend Julia" but rather "give huge caveats when mentioning Julia". Organisations (including those that maintain programming languages) have values; Bryan Cantrill has an excellent talk on this, https://youtu.be/2wZ1pCpJUIM, and I have to agree with the author that correctness (especially correctness under arbitrary composability) is not a value that Julia teaches and instills in its users. Some Julia users care about this, and some core maintainers do too (as Pkg3 demonstrates). However, there are many invocations (SafeTestSets vs Test) and stumbling blocks. I am aware of no efforts to do formal verification on Julia code. There are no good ways to turn certain run-time errors into compile-time errors. Correctness is not a value of the Julia language. Here is the good thing though: as Bryan demonstrates in his talk, you can hire for values.

dmos624y ago

In extreme composability, it might be hard to determine where the origin of a bug is. Worse yet, when libraries start adhering and relying on the brokenness of other libraries, fixing the once minor bug isn't enough anymore. How do you address technical debt in such situations?

In my mind Julia broke new ground in terms of what happens when you create an environment where such composability is possible. The author's finishing thought is apt:

> Ten years ago, Julia was introduced to the world with inspiring and ambitious set of goals. I still believe that they can, one day, be achieved—but not without revisiting and revising the patterns that brought the project to the state it is in today.

hprotagonist4y ago

Ouch. That sounds all the more damning for the author's studious care to calmly describe instead of angrily rant.

I’ve spent too much time in research working on codebases that feel like quicksand — you never know what changing something might do!— to want to worry about that for stdlib or major package ecosystems, too.

pankgeorg4y ago

I feel this post is a bit unfair and quite outdated (it seems to have been written 9-12 months ago), and I interpret his issue as a prioritization issue, not a language one. If your priorities mandate a more mature ecosystem, you should use one. The Julia ecosystem is much smaller, both in terms of people and development invested, than Python, Java or JavaScript, and still overperforms in many aspects of computing. If those aspects, where Julia is first-in-class, are not your priorities, and your fault tolerance is very low, maybe another tool is better for you.

Also, as with every ecosystem, the Julia ecosystem will naturally see some packages come and go. JSON3 is the third approach to reading JSON (and it's terrific). HTTP.jl is the reference HTTP implementation - Julia hasn't had its `requests.py` moment. Web frameworks have also been immature; Python had `Django`, `pyramid`, `flask` and so many others before `FastAPI` (along with new language features) came and dominated. Some people need to put effort into attempts that will naturally hit a dead end before we have a super polished and neat FastAPI.jl, and the same goes for everything.

Also, https://github.com/JuliaLang/julia/issues/41096 is referenced with a wrong name that involves the issue's author's misunderstanding, can you update please and, if possible, add a note about the edit?

isaacimagine4y ago

So it seems Julia's multiple dispatch (dynamic dispatch for any function based on argument types) has a flaw: namely, if the types used do not match assumptions present in the implementation of the function (e.g. arrays start at 1), the results may be silently incorrect. Julia's multiple dispatch is really cool, but I'm not sure how this issue can be prevented in practice (without a lot of added verbosity). It'd be a pity to have to restrict yourself to a small set of types you know work with the functions you're using, because multiple dispatch is one of Julia's killer features.
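
A minimal sketch of this failure mode, written in Python rather than Julia so it runs without any packages. Everything here is hypothetical: `OffsetList` is a made-up stand-in for an offset-indexed array, and Python's negative-index wrap-around stands in for an unchecked out-of-bounds read. It is an analogy for the class of bug, not a reproduction of any specific Julia issue.

```python
class OffsetList:
    """Hypothetical array-like whose first valid index is `first` (not necessarily 1)."""

    def __init__(self, data, first):
        self.data = list(data)
        self.first = first

    def __len__(self):
        return len(self.data)

    def __getitem__(self, i):
        # Deliberately no bounds check: a negative storage position wraps
        # around in a Python list instead of raising an error.
        return self.data[i - self.first]


def span_assuming_one_based(a):
    # Generic code with a hidden assumption: valid indices are 1..len(a).
    return a[len(a)] - a[1]


one_based = OffsetList([10, 20, 30], first=1)
shifted = OffsetList([10, 20, 30], first=2)  # valid indices are 2..4

print(span_assuming_one_based(one_based))  # 20: last minus first, as intended
print(span_assuming_one_based(shifted))    # -10: wrong answer, and no error raised
```

The function is fine for the type its author had in mind and quietly wrong for the other one, which is the part that makes this hard to catch without tests that exercise unusual index ranges.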

randyzwitch4y ago

This is not specific to the examples in the article, but I think some of the things people perceive as "bugs" other people see as features or an opportunity to correct past mistakes.

I can remember an example where I suggested automatic treatment of missing values in a stats library, and the library maintainer disagreed. Meaning, my lobbying for Julia to do what R/Python did was seen as "Yes, but that's wrong and we shouldn't promote that sort of treatment". As a business user, I didn't care that it was theoretically wrong, the maintainer as an academic did.

That ends up becoming open-source prerogative. I could do it wrong "on my own time" in my own code...doesn't make either a bug, but a different choice based on perspective.

ble4y ago

This article contains no instances of the word "test", which seems surprising but entirely in keeping with the author's observations.

> Julia has no formal notion of interfaces, generic functions tend to leave their semantics unspecified in edge cases, and the nature of many common implicit interfaces has not been made precise (for example, there is no agreement in the Julia community on what a number is).

> The Julia community is full of capable and talented people who are generous with their time, work, and expertise. But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem. They accept the existence of individual isolated issues, but not the pattern that those issues imply.

It sounds like the cultural standard for writing libraries is, "works good enough for users like me" which should be good if you are using things the same way as the authors. Writing good tests for numerics is hard and grueling; testing numerics or numerics-like code is not nearly as fun or productive-feeling as using numerics to get shit done, so it all makes sense to me.

j7ake4y ago

In terms of saving human time, I have found R to be the fastest for iterative prototyping, exploring, and visualising data.

R still has the best statistical package ecosystem, although Python is catching up.

cbkeller4y ago

This seems hard to evaluate without a quantitative comparison to the abundance of bugs in the package ecosystems of other languages at the same age. So, for instance, how many correctness bugs existed (or, alternatively, had been found and fixed) in the Python ecosystem when Python was ten years old? The author makes a subjective claim, but from the few other languages they mention it seems they are comparing primarily to older and more stable ecosystems.

snicker74y ago

A lot of these issues can be fixed. Adding robust type constraints (e.g. traits) and accompanying "static analysis" tooling would help a lot. Julia can learn a lot from ML-family languages (e.g. OCaml, Haskell) in that regard. And there are efforts in the Julia community to add these features via third-party libraries. However, I don't see things improving unless such features are baked into the language and used more ubiquitously in open source modules.

garrison4y ago

Similar correctness issues are a big part of the reason that, several years ago, I submitted a series of pull requests to Julia so that its entire test suite would run without memory errors under Valgrind, save for a few that either (i) we understood and wrote suppressions for, or (ii) we did not understand and had open issues for. Unfortunately, no one ever integrated Valgrind into the CI system, so the test suite no longer fully runs under it, last time I checked. (The test suite took nearly a day to run under Valgrind on a fast desktop machine when it worked, so is infeasible for every pull request, but could be done periodically, e.g. once every few days.)

Even a revived effort on getting core Julia tests to pass under Valgrind would not do much to help catch correctness bugs due to composing different packages in the ecosystem. For that, running in testing with `--check-bounds=yes` is probably a better solution, and much quicker to execute as well. (see e.g. https://github.com/JuliaArrays/OffsetArrays.jl/issues/282)

patrec4y ago

I tend to be a bit wary of dynamic languages with sophisticated, performant implementations of complex abstractions, especially if they have somewhat niche appeal. In my experience this is a combination that makes for running into a lot of implementation bugs. For example, I've run into many more nasty compiler bugs with lisps (and julia at least qualifies as an almost-lisp) than with more simple-minded dynamic languages like python or erlang[1] or fairly sophisticated but niche statically typed languages.

I think watching Julia over the next few years will be quite interesting: it's the only dynamically typed language that has both sophisticated abstractions and a sophisticated implementation[2] that has enough pull to have a chance to become entrenched in certain domains. I wonder to what extent they will be able to get this problem under control.

[1] BEAM, unlike cpython, is actually a marvel of engineering and making very deliberate trade-offs. But it's not very complex.

[2] Javascript is of course the one pervasive dynamically typed programming language that has sophisticated implementations, but of mostly ill-conceived constructs.

cancandan4y ago

I wonder why not much is done to bring high performance scientific computing to common lisp. There are some interesting projects I was able to find like https://github.com/clasp-developers/clasp and https://github.com/marcoheisig/Petalisp and https://github.com/takagi/avm. But I guess it would be good to have a coordinated effort in this area.

uwuemu4y ago

A devastating article for Julia. I was thinking about trying Julia out... but not after reading this.

vasili1114y ago

Does Python have similar issues?

hzhou3214y ago

Think about programming layers: A->B->C->D->...->Compiler->binary->output, where A is the end programmer, and B, C, D are the libraries and modules. I think what the article describes is not much different from issues in any complicated software system, as quite a few comments also pointed out. However, as the language becomes more expressive and the compiler becomes more clever, more of the issues will be rooted in the compiler->binary link. I think this is inevitable with the current model of how software works, which I can simplify as: A -> [super compiler] -> output

The middle part is the concatenation of all the middle links and handles the complexity necessary to translate from language to output. As we try to make A less complex, the middle [super compiler] will get more complex, and more buggy because of the complexity.

I believe the fundamental issue with this model is the lack of feedback. A gets feedback on the output and makes changes (in A) until the output is correct. With the big, complex, and opaque middle, for one, we can't get full feedback on the output -- that is the correctness issue. The more complex the middle gets, the less coverage the testing can achieve. For two, even with clear feedback -- a bug -- A cannot easily fix it. The logic from A to output is no longer understandable.

I believe the solution is to abandon the pursuit of the magic solution of A -> [super compiler] -> output and instead focus on how to get feedback from every link in A->B->C->D->...->compiler->binary->output

For one, this gives A a path to approach and handle complexity. A can choose to check on B or C or ... directly on output, depending on A's understanding and experience. At the least, A can point fingers correctly.

For two, this provides a path to evolve the design. The initial design of which layer handles which or how much complexity is no longer crucial. Each link, from A, to B, to C, ... to the compiler, can adjust and shift the complexity up and down, and eventually settle down to a system that fits the problem and team.

I believe this is how natural language works. Initially A tells B to "get an apple" and gives feedback directly on the end result, i.e. which apple B gets; A may alter its own layer by expanding into more details until it gets the right result. Then some of the details will be handled by B, and A can give feedback on B's intermediate responses. As the world gets more complex, the complexity at layer A stays finite but we add middle layers. Usually, A only needs feedback on its immediate link (B) and the final output, but B needs to be able to give feedback on its next immediate link, and if A is capable, A may choose to cut out one of the middlemen.

jahewson4y ago

When Julia was first released, I tried it out and decided I'd write a syntax highlighter for it, so I asked for a grammar. There wasn't one. I was told to refer to the parser source code, which was written in a custom dialect of LISP. That was a red flag for me and I never returned.

ChrisRackauckas4y ago· 43 in thread

Strilanc4y ago

> Everything has correctness issues somewhere.

Also, to the extent that Julia uses LLVM, a correctness bug in LLVM is also a correctness bug in Julia. So arguing "LLVM has lots of correctness bugs" is not helping the case...

> because the code is all Julia, it's really easy to dig in there and find potential bugs.

suavesito4y ago

> So arguing "LLVM has lots of correctness bugs" is not helping the case

Another thing to have in mind is that Julia ships patches for some of these, that are not used upstream yet. So Julia does not suffer from some bugs on LLVM that other projects might.

ChrisRackauckas4y ago

mbauman4y ago

p33p4y ago

ChrisRackauckas4y ago

> but would you say most packages follow or enforce SemVer?

> Would enforcing a stricter dependency graph fix some of the foot guns of using packages or would that limit composability of packages too much?

kazinator4y ago

> Julia ships an entire patched version of LLVM to fix correctness bugs in numerical methods

Sounds like the banana ships with the gorilla which requires the entire jungle, and we're too busy fixing the gorilla to give the banana our undivided attention.

nightpool4y ago

   # TODO: this is inconsistent with the regular prod in cases where the arguments
   # require size promotion to system size.

How did this pass code review? Why would it be okay for a standard library function to be "inconsistent" in this way?

cmcaine4y ago

Julia released experimental support for arrays whose indexes don't start at 1 in Julia 0.5, October 2016.

The bounds-checking feature was added in 2015, so at the time they wrote their code and examples, they were correct.

leephillips4y ago

Just to clarify, the prod() bug you mention was fixed about a year ago.

nalimilan4y ago

nightpool4y ago

Discussed in a sibling thread: https://news.ycombinator.com/item?id=31401155.

kazinator4y ago

> I agree, code should never do that. It should be `eachindex(A)`

Will that generate the same code as "i in 1:length(A)"?

Maybe whoever wrote that didn't believe so at least, or perhaps didn't find it so at the time.

The reason @inbounds would have been used is performance, so that's likely why the for loop header was written that way?

mbauman4y ago

`eachindex` is — in quite a few situations — faster than `1:n`.

bjourne4y ago

kazinator4y ago

The question is: is it at least as fast in all situations? Was it always that way?

You may promote a culture of not doing certain things, but that by itself won't make those things disappear from existing code.

cbkeller4y ago

I think it should be fine for performance AFAIU to use `eachindex` instead; at least I know `eachindex` plays nicely with LoopVectorization.jl with no performance costs there.

That said, I think you're exactly right that people may wonder just this and use the seemingly "lower-level" form out of concern with or without testing it.

rcthompson4y ago

TimTheTinker4y ago

> Is "for i in 1:length(A)" ever correct?

In some rare cases, it very well might be exactly what the code's author intended and needed.

So for cases like this, I think it's better to rely on a flexible linting tool (if available) than warnings or errors.

[0] https://martinfowler.com/bliki/SoftwareDevelopmentAttitude.h...

dan-robertson4y ago

Why not have a feature to allow you to turn off the warning? E.g. have something recognise 1:length(x) and complain unless you write e.g. @nowarn eachindex before it.

rashidrafeek4y ago

An issue was created sometime ago in StaticLint.jl to fix this: https://github.com/julia-vscode/StaticLint.jl/issues/337

cmcaine4y ago

a13692099934y ago

> Is "for i in 1:length(A)" ever correct?

Yes, actually. While I have approximately zero knowledge of Julia specifically, a language-independent example might be:

  B = OneBasedArray(length(A))
  A_ = iter(A)
  for i in 1:length(A) { B[i] = pop(A_) }
  assert(iter_isdone(A_))

And if that looks contrived... yes; it is contrived.

> that pattern followed by usage of i to index into A inside the loop?

I can't think of any legitimate uses for that, but there probably are some; make sure to allow:

  len = length(A)
  for i in 1:len ...

as a `if( (x = foo()) )`-style workaround.

poulpy1234y ago

Why allow iterating with 1:length(A) if it's not the right way?

cmcaine4y ago

adgjlsfhk14y ago

you can't disallow it at a language level since either way, you are just indexing with Ints. That said, we can add better linting rules to catch stuff like this.
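
A rough sketch of what such a lint rule could look like, written in Python as a standalone text scan. This is not how StaticLint.jl or any real Julia linter works; the regex, the function name, and the opt-out marker are all made up for illustration, and a real rule would work on the syntax tree rather than on raw lines.

```python
import re

# Hypothetical rule: flag `1:length(x)` ranges unless the line opts out.
PATTERN = re.compile(r"\b1\s*:\s*length\(")
OPT_OUT = "# lint: allow-1-length"  # made-up marker, not a real tool's syntax


def find_one_based_ranges(source):
    """Return 1-based line numbers containing `1:length(` without the opt-out marker."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if PATTERN.search(line) and OPT_OUT not in line:
            hits.append(lineno)
    return hits


julia_source = """\
for i in 1:length(A)
    s += A[i]
end
for i in eachindex(A)
    s += A[i]
end
for i in 1:length(B)  # lint: allow-1-length
    s += B[i]
end
"""

print(find_one_based_ranges(julia_source))  # [1]
```

An explicit opt-out keeps the rare intentional use possible while making the default path the safe one.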

grumpyprole4y ago

> Everything has correctness issues somewhere.

pron4y ago

> A language with static types would have made it easier to build correct software

Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one (if it were big, detecting it would have been easy).

ThenAsNow4y ago

grumpyprole4y ago

> Given empirical studies, the current working hypothesis should be that if static types do have a positive effect on correctness, it is a small one.

Which use cases, languages and static type systems are you referring to? The context is very important, especially when seeking to draw general conclusions from empirical studies.

jolux4y ago

> Note that Matlab, the workhorse of scientific computing for a few decades now, is even less typed than Julia.

StefanKarpinski4y ago

In particular, not a single issue mentioned in this article would have been prevented by static type checking.

guenthert4y ago

Julia allows you to specify the type of a datum if you feel the need (not unlike Common Lisp). Are any of the bugs the author mentioned related to the type system?

mattkrause4y ago

I'm surprised at this critique, as I thought Julia's type system was often considered to be one of its strongest features.

ThenAsNow4y ago

A prior comment I made, all of which seems unaddressed to me three years later: https://news.ycombinator.com/item?id=20589167

ChrisRackauckas4y ago

KKKKkkkk14y ago

FMA can't be broken on Windows because FMA is implemented in hardware by Intel. What's broken is the compiler that Julia uses on Windows.

ChrisRackauckas4y ago

celrod4y ago

Even when FMA is implemented in hardware, LLVM will generally use the software version when the arguments are known at compile time.

stephencanon4y ago

FMA is only implemented in hardware on Haswell and later uArches. If you’re running on (or compiling for) IVB or earlier, you’ll get a libcall instead, and MSVC’s has been broken since forever.
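
For anyone wondering what a correct fallback has to guarantee: a fused multiply-add rounds once, after the exact product and sum, rather than once after the multiply and again after the add. Below is a small Python sketch of that property using exact rational arithmetic as a slow reference. The function name `fma_reference` is made up, and this is not how Julia's or any libm's software FMA is actually implemented.

```python
from fractions import Fraction


def fma_reference(a, b, c):
    """Compute a*b + c exactly in rational arithmetic, then round once to a float."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# Exact product is 1 - 2**-60, which rounds to 1.0 as a double,
# so the separate multiply-then-add loses the answer entirely.
unfused = a * b + c
fused = fma_reference(a, b, c)

print(unfused)                 # 0.0
print(fused == -(2.0**-60))    # True
```

A fallback that just computes `a * b + c` in double precision gives the first result, which is exactly the kind of quiet difference that numerical code relying on FMA semantics cannot tolerate.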

Diggsey4y ago

Is this actually broken in MSVC, or is it broken because Julia is using mingw and linking to an ancient version of libc on windows (which is intentionally left as-is for back-compat)?

(I genuinely don't know, but the linked issue mentioned mingw specifically)

adgjlsfhk14y ago

Const-me4y ago

CPUs have supported these instructions for 9 years now. When ignoring these old CPUs, most languages and compilers are usually doing a good job. Example in C which does not depend on any library functions:

    #include <immintrin.h>  // FMA intrinsics; build with FMA enabled (e.g. -mfma on GCC/Clang)

    // Scalar fused multiply-add: computes a * b + c with a single rounding.
    double fma( double a, double b, double c )
    {
        __m128d av = _mm_set_sd( a );
        __m128d bv = _mm_set_sd( b );
        __m128d cv = _mm_set_sd( c );
        return _mm_cvtsd_f64( _mm_fmadd_sd( av, bv, cv ) );
    }

dekhn4y ago· 33 in thread

Wait, are those examples real?

I remember complaining about 1-based indexing only to be told "julia is great! we have offsetindex". If it's a source of bugs, that ... greatly reduces my future interest in adopting the language.

orbifold4y ago

I would still think most of this is my failings, but it is also extraordinarily hard to figure out what is going wrong.

ChrisRackauckas4y ago

orbifold4y ago

wnoise4y ago

JanisErdmanis4y ago

I am curious: what are those many standard assumptions about arrays that Unitful violates? There does not seem to be any space left for alternatives to representing each element with a struct:

> struct Element{T, Unit} <: Number
>     value::T
> end

which is placed in the array.

patrickkidger4y ago

You may already know of it, but if you want differential-equations-in-JAX then allow me to quickly advertise Diffrax: https://github.com/patrick-kidger/diffrax (of which I am the author, disclaimer).

orbifold4y ago

Yes I am aware :) it is missing a few things but I might end up contributing.

forgotpwd164y ago

>If it's a source of bugs, that ... greatly reduces my future interest in adopting the language.

It can be a source of bugs because some/many packages incorrectly assume that what you pass is 1-based indexed.

karmakaze4y ago

I was wondering if the 1-based arrays (and option to change index base) would factor into this.

Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

coldtea4y ago

>Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Because starting with 0 is neither math nor array indexing in general.

It's just how the base address of an array pointer memory block was referenced in C (and it spread from there).

Which is why all math focused languages use 1-based (fortran, apl, matlab, r, mathematica, etc.)

Rayhem4y ago

> Because starting with 0 is neither math nor array indexing in general.

[1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD08xx/E...

jacobolus4y ago

Math (usually) uses 1-based indexes because those parts of math started before the concept of zero as a number, and then the convention persisted, even down to Matlab.

lapinot4y ago

_vdpp4y ago

xdavidliu4y ago

> It's just how the base address of an array pointer memory block was referenced in C (and it spread from there).

There was also the famous 1-pager by Dijkstra: "Why numbering should start at zero"

https://news.ycombinator.com/item?id=777580

SeanLuke4y ago

> C (and it spread from there).

BCPL.

karmakaze4y ago

I don't read much about users of modern languages with 0-based indexing requesting 1-based options/alternatives.

davisoneee4y ago

I'll start by saying that I greatly prefer 0-based, and have used both 0- and 1-based indexing, but the choice is largely arbitrary.

0 makes sense as the '0-th offset' when thinking from a pointer perspective, but I often find when teaching, that 1-based comes more naturally for many students (the 'first' item).

rocqua4y ago

My experience is that 0-based offsets (and use of < or even != for upper bounds) mean that I should almost never have to write something like idx - 1 or idx + 1.
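
A small Python sketch of that point, assuming the usual 0-based, half-open `[start, stop)` convention. The helper name `chunks` is just for illustration. Note that no `- 1` or `+ 1` appears anywhere, and the end of one chunk is the start of the next.

```python
def chunks(items, size):
    """Split items into consecutive chunks using 0-based, half-open [start, stop) ranges."""
    return [items[start:start + size] for start in range(0, len(items), size)]


data = list(range(10))
print(chunks(data, 4))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

With 1-based closed ranges the same split needs `(k - 1) * size + 1` and `k * size` style bounds, which is where the off-by-one adjustments tend to creep in.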

Especially because I now sort of have a linter rule in my head: 'if I am writing i - 1 then I am making a mistake or doing something the wrong way'. Which has been quite successful.

jjgreen4y ago

Not for polynomial coefficient indices :-)

lahvak4y ago

As other posters noted, in mathematics both 0 based and 1 based indexing is used.

There are also negative indices.

sdfhdhjdw34y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Because that's how maths work? Literally everywhere in maths you count from 1, except in software engineering. That's why. I hope that clarified your confusion.

planede4y ago

wnoise4y ago

Starting with 0 is quite common in series, e.g. Taylor, Fourier, Chebyshev expansions, etc.
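
To make the series case concrete, here is a short Python sketch where the index of each coefficient is the power of x it multiplies, so the natural first index is 0. The function name `eval_series` is made up; the evaluation uses Horner's rule.

```python
import math


def eval_series(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule; index k is the power of x."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Taylor coefficients of exp(x) about 0: the k-th coefficient is 1/k!, starting at k = 0.
exp_coeffs = [1.0 / math.factorial(k) for k in range(15)]
approx = eval_series(exp_coeffs, 1.0)

print(abs(approx - math.e) < 1e-10)  # True
```

With 1-based storage the constant term lives at index 1 and every formula carries a `k - 1`, which is doable but is the kind of translation the 0-based convention avoids here.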

temp89644y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

hprotagonist4y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

So, no FORTRAN, huh?

CRConrad4y ago

> Array indexing is such a core thing and I don't understand why anything mathematical or scientific would start with 1.

Counting things is such a core thing to humans that when we have a bunch of N things we think of them as thing #1 to thing #N. We start counting from 1, not 0.

Instead, we got the mental gymnastics of

   for(i=0; i<=N-1; i++) {...}

and its many variations (instead of just for i := 1 to N...), which surely have caused orders of magnitude more headaches in off-by-one bugs over the years than they saved on performance.

DNF24y ago

There are good arguments for using either 0- or 1-based indices. As you should be aware, there are many languages on each side.

While preferring one over the other is perfectly fine, I question the intellectual honesty of anyone claiming incredulity about the opposite choice.

dklend1224y ago

If packages use generic indexing functions like eachindex, there would be no correctness issue with that specific example
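
A sketch of the idea in Python, for readers who don't have Julia at hand. The `OffsetList` class and its `indices()` method are hypothetical; `indices()` plays the role that `eachindex` plays in Julia, in that the container rather than the caller decides which indices are valid.

```python
class OffsetList:
    """Hypothetical array-like with an arbitrary first index."""

    def __init__(self, data, first):
        self.data = list(data)
        self.first = first

    def indices(self):
        # Analogue of Julia's eachindex: the container reports its own valid indices.
        return range(self.first, self.first + len(self.data))

    def __getitem__(self, i):
        if i not in self.indices():
            raise IndexError(f"index {i} outside {self.indices()}")
        return self.data[i - self.first]


def total(a):
    # Generic code: makes no assumption about where indexing starts.
    return sum(a[i] for i in a.indices())


print(total(OffsetList([10, 20, 30], first=1)))   # 60
print(total(OffsetList([10, 20, 30], first=0)))   # 60
print(total(OffsetList([10, 20, 30], first=-5)))  # 60
```

The same `total` works for every starting index, because nothing in it hard-codes 1 or `len(a)`.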

sdfhdhjdw34y ago

The problem isn't that 1-base indexing can be "fixed" in Julia. The problem is that you see 1-based indexing as a flaw.

dekhn4y ago

IshKebab4y ago

It is a flaw. Computers don't work that way fundamentally, and it introduces lots of awkward translation.

dash24y ago

But humans don't work 0-based. Try explaining to a bunch of scientists why for rows 2-5 of the DataFrame they have to write df[1:5].

KenoFischer4y ago· 29 in thread

patrickkidger4y ago

FWIW my take is not that Yuri is expressing "there are too many bugs" so much as he's expressing a problem in the culture surrounding Julia itself:

> But systemic problems like this can rarely be solved from the bottom up, and my sense is that the project leadership does not agree that there is a serious correctness problem.

Concisely:

1. The ecosystem is poorly put together. (It's been produced by academics rather than professional software developers.)

2. The language provides few tools to guarantee correctness. (No static typing; no interfaces.)

Personally, what I'd love to see is one of the big tech companies come on board and just write their own ecosystem. The Julia language is amazing. The ecosystem needs to be rewritten.

ChrisRackauckas4y ago

btilly4y ago

nextos4y ago

> The Julia language is amazing. The ecosystem needs to be rewritten.

IMHO, the biggest problem is that there is no reliable library to build huge transformers.

josephg4y ago

As a user, I’d prefer “correct but lacking composability” over “composable but sometimes my results will be silently wrong”.

What is Julia’s composability useful for if it leaves me unable to trust my results?

nl4y ago

I dunno.

Things like the use of scipy.spatial.distance metrics[1] by sklearn clustering[2] seem a great example of composability that is easy to learn and very efficient.

And the sklearn side isn't a "thin layer over C, C++, Fortran" even if scikit is (sort of) this.

[1] https://docs.scipy.org/doc/scipy/reference/spatial.distance....

[2] https://scikit-learn.org/stable/modules/generated/sklearn.me...

Tarrosion4y ago

How has Python - almost surely the most successful and widely adopted scientific programming ecosystem - avoided the problems of #2? E.g. Python doesn't have static typing.

rudellius4y ago

adgjlsfhk14y ago

amkkma4y ago

Hi Keno,

KenoFischer4y ago

> TPUs, kernel fusion, branching (Lux relies on generated functions) and copy elision (though this last part is being worked on IIUC).

We work closely with them of course, so I think there'll be some discussions there, but it's a very tough design problem.

amkkma4y ago

Glad to hear it's being worked on!

> That said, I don't think we can be faulted for that one, because I don't think anybody really has a good answer to this particular design problem.

Not sure what a Julian solution would look like, if possible. But given the usability wins, it would be great to have in Julia as well.

amkkma4y ago

https://julialang.zulipchat.com/#narrow/stream/256674-compil...

StefanKarpinski4y ago

On a personal note, Yuri, thanks for all the code and I'm sorry to see you go.

mfsch4y ago

It seems to me that much of the difficulty with interfaces, whether they are made explicit or kept implicit, lies in defining the semantics that the functions are supposed to have.

It is also worth mentioning that it is very possible to use Julia without ambitious use of cross-package generic functionality, and use it “merely” as a better Fortran or Matlab.

Sukera4y ago

[1]: Why specifications don't compose - https://hillelwayne.com/post/spec-composition/

tialaramex4y ago

> Defining an interface on an abstract type only gives you that an implementation exists, not that it is correct

> you really want to be alerted to when you expect more of an interface than the interface guarantees

jerome-jh4y ago

clhodapp4y ago

> What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested.

The pure FP ecosystems in Scala often accomplish this in the form of "laws", which are essentially bundles of pre-made unit tests that they ship alongside their core abstraction libraries.

iamed24y ago

One little win could be publishing interface tests like these for Base interfaces in the Test stdlib. I appreciate that the Generic* types are already exposed in the Test stdlib!
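As a rough illustration of what a published interface test could look like, here is a minimal sketch using the Test stdlib. The function name `test_vector_laws` and the particular set of checks are assumptions made up for this example; nothing like it is claimed to ship in Base or Test.

```julia
using Test

# Hypothetical "law" bundle: properties that any AbstractVector
# implementation would be expected to satisfy.
function test_vector_laws(A::AbstractVector)
    @testset "AbstractVector laws for $(typeof(A))" begin
        # The number of indices matches the reported length.
        @test length(A) == length(eachindex(A))
        # Indices run contiguously from firstindex to lastindex.
        @test collect(eachindex(A)) == collect(firstindex(A):lastindex(A))
        # Indexing over eachindex visits the same elements as iteration.
        @test [A[i] for i in eachindex(A)] == collect(A)
        if !isempty(A)
            @test first(A) == A[firstindex(A)]
            @test last(A) == A[lastindex(A)]
        end
    end
end

# A package author would call this on instances of their own type.
test_vector_laws([1, 2, 3])
test_vector_laws(1:5)
test_vector_laws(view([10, 20, 30, 40], 2:3))
```

A package defining a new array type could run such a bundle in its own test suite, which only checks the behaviours the bundle happens to encode, not correctness in general.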

blindseer4y ago

> 2) you can find out what methods you need to implement just by running the code that uses the implementation and see what fails.

For large codebases this is SO painful to do. I just don't understand how anyone gets anything done when this is how they have to develop code.

StefanKarpinski4y ago

kaba04y ago

renox4y ago

I remember reading a long time ago about the 1-based array and the offset-array 'kludge'.

I don't claim to be especially smart: this is/was obvious.. Unfortunately what isn't obvious is how to fix this issue and especially how to fix the culture which produces this kind of issue..

StefanKarpinski4y ago

HelloNurse4y ago

> there are a number of "flagship" applications that are really pushing the boundary of what Julia can do, but at the same time also need a disproportionate amount of attention.

Disproportionate effort is an obvious sign that hacks to keep such flagships seaworthy are prioritized over a good language and a good library.

> Basically, Julia tries very hard to make composability work, even if the authors of the packages that you're composing don't know anything about each other.

Typically, programming languages and libraries don't need to "try very hard" because they are designed to be safe and correct, at the cost of curbing ambitious features.

> not all implicit assumptions are easily capturable in interfaces. Perhaps there needs to be more explicit documentation around what combinations of packages are "supported".

JustFinishedBSG4y ago

> Disproportionate effort is an obvious sign that hacks to keep such flagships seaworthy are prioritized over a good language and a good library.

dang4y ago

Re the title: ok, we've replaced the submitted title ("The Julia language has a number of correctness flaws") with a representative phrase from the OP which uses the word 'ecosystem'.

I'm going to autocollapse this comment so we don't get a big thread about titles.

KenoFischer4y ago

Thanks. Appreciate your thoughtful moderation as always :).

RcouF1uZ4gsC4y ago· 9 in thread

> If you pass it an array with an unusual index range, it will access out-of-bounds memory: the array access was annotated with @inbounds, which removed the bounds check.

mattkrause4y ago

    indices = CartesianIndices(multidimensional_X)
    for index in indices
        multidimensional_X[index] = 0  # whatever
    end

mike_hock4y ago

mattkrause4y ago

Is `for i in eachindex(X)` really any worse?

runevault4y ago

leephillips4y ago

    julia> pairs("François") |> collect
    8-element Vector{Pair{Int64, Char}}:
     1 => 'F'
     2 => 'r'
     3 => 'a'
     4 => 'n'
     5 => 'ç'
     7 => 'o'
     8 => 'i'
     9 => 's'

Notice the missing index 6, because ç takes two bytes.

In contrast, enumerate() gets you the iteration number:

    julia> enumerate("François") |> collect
    8-element Vector{Tuple{Int64, Char}}:
     (1, 'F')
     (2, 'r')
     (3, 'a')
     (4, 'n')
     (5, 'ç')
     (6, 'o')
     (7, 'i')
     (8, 's')

This can trip you up.
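A small sketch of how the two differ in practice, using only Base functions (the variable names are just for illustration):

```julia
s = "François"

# Valid indices into the string: the byte offset where each character starts.
valid = collect(eachindex(s))              # [1, 2, 3, 4, 5, 7, 8, 9]

# Iteration counts from enumerate: positions in the sequence, not indices.
counts = [n for (n, _) in enumerate(s)]    # [1, 2, 3, 4, 5, 6, 7, 8]

isvalid(s, 6)       # false: byte 6 is the second byte of 'ç'
s[7]                # 'o'
s[nextind(s, 5)]    # 'o': step from 'ç' to the next valid index
```

Indexing with a count from enumerate works only while every earlier character is one byte wide; `s[6]` here would throw a StringIndexError.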

lmiq4y ago

for (i,val) in pairs(array)

adgjlsfhk14y ago

kllrnohj4y ago

DNF24y ago

What do you mean by 'mistake'? How are the Julia devs going to stop someone from defining arrays with configurable indices?

Are you suggesting that the core language should somehow make this impossible? How?

NeutralForest4y ago· 8 in thread

I mean, these look like good potential targets to improve the language moving forward. It's healthy to not be in awe of your tools and to push to make them better. I don't see this as "bad" honestly.

s_Hogg4y ago

fluidcruft4y ago

tylermw4y ago

NeutralForest4y ago

CoastalCoder4y ago

I think the real test will be whether or not Julia's custodians / developers start putting a greater focus on semantics and correctness.

markkitti4y ago

The question here is whether these are merely bugs or whether there is something about the language that makes Julia error-prone.

There is potential in using Julia's type inference engine to check for correctness. For example see JET.jl. "JET.jl employs Julia's type inference to detect potential bugs."

https://github.com/aviatesk/JET.jl https://www.youtube.com/watch?v=7eOiGc8wfE0

The video brings up some potential difficulties with Julia's metaprogramming facilities for static or lexical analysis, but also shows that these issues are also addressable.

freemint4y ago

NeutralForest4y ago

It's still at version 1.x, maybe an explicit roadmap could help tackling those issues?

ur-whale4y ago· 8 in thread

The examples provided feel more like bugs in various libraries than an actual problem intrinsic to Julia the language.

wodenokoto4y ago

According to the article the problem is in the ecosystem, and partly the standard lib.

trenchgun4y ago

All bugs mentioned had been quickly fixed: https://news.ycombinator.com/item?id=31397425

wnoise4y ago

That comment doesn't say all bugs have been fixed, or even quickly fixed. When I check on the posted links, many are in fact still open, e.g.

https://github.com/JuliaStats/Distributions.jl/issues/1253

https://github.com/JuliaStats/StatsBase.jl/issues/642

https://github.com/JuliaStats/StatsBase.jl/issues/616

https://github.com/JuliaLang/julia/issues/39385

exyi4y ago

Yea, all are just bugs, not some intrinsic flaws in the language.

Given Julia's goals (performance, abstractions, accessible to science people), it's understandable if they had slightly higher bug concentration than other (similarly sized) ecosystems.

kllrnohj4y ago

Q6T46nT668w6i3m4y ago

@inbounds is a Base feature.

markkitti4y ago

If you want to assume that an array starts at `1`, you need to require an `Array` rather than an `AbstractArray`.
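A minimal sketch of the two options, with hypothetical function names. `Base.require_one_based_indexing` is an unexported Base helper, so relying on it is an assumption about internals rather than a documented guarantee.

```julia
# Accepts only the built-in Array type, which is always 1-based.
first_strict(A::Array) = A[1]

# Accepts any AbstractVector, but rejects offset axes with an explicit error.
function first_checked(A::AbstractVector)
    Base.require_one_based_indexing(A)   # throws ArgumentError on offset axes
    return A[1]
end

first_strict([3, 4, 5])                  # 3
first_checked(view([3, 4, 5], 2:3))      # 4: views are still 1-based
first_checked(2:6)                       # 2: so are ranges

# The Array-only signature turns down inputs that would have been fine.
strict_rejects_view = try
    first_strict(view([3, 4, 5], 2:3))
    false
catch err
    err isa MethodError
end
```

The trade-off is visible in the last lines: the `Array` signature is safe but also excludes views and ranges, while the explicit check keeps the generic signature and fails loudly on offset axes.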

arksingrad4y ago

@inbounds isn't the problem, it's incorrect usage of it. The poor docstring is absolutely a problem though, you should be iterating over eachindex(A), not 1:length(A).
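To make the difference concrete, here is a self-contained sketch. `ShiftedVector` is a hypothetical stand-in for an offset array written only for this example (it is not the OffsetArrays.jl implementation), and `naive_sum` / `generic_sum` are illustrative names.

```julia
# Hypothetical minimal vector whose first index is `first_index` instead of 1.
struct ShiftedVector{T} <: AbstractVector{T}
    data::Vector{T}
    first_index::Int
end

Base.size(v::ShiftedVector) = size(v.data)
# Report the custom index range; IdentityUnitRange keeps axes(axes(v, 1)) consistent.
Base.axes(v::ShiftedVector) =
    (Base.IdentityUnitRange(v.first_index:(v.first_index + length(v.data) - 1)),)
Base.getindex(v::ShiftedVector, i::Int) = v.data[i - v.first_index + 1]

# Assumes indices run from 1 to length(A): wrong for ShiftedVector.
function naive_sum(A::AbstractVector)
    total = zero(eltype(A))
    for i in 1:length(A)
        total += A[i]
    end
    return total
end

# Asks the array for its own indices: works for any AbstractVector.
function generic_sum(A::AbstractVector)
    total = zero(eltype(A))
    for i in eachindex(A)
        total += A[i]
    end
    return total
end

v = ShiftedVector([10, 20, 30, 40, 50], -2)   # valid indices are -2:2

generic_sum(v)   # 150

# Without @inbounds the naive loop at least fails loudly on index 3.
naive_failed = try
    naive_sum(v)
    false
catch err
    err isa BoundsError
end
```

In this sketch the bounds check of the underlying Vector catches the mistake. With `@inbounds` on the access that check can be elided, which is the situation the article describes.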

xt004y ago· 7 in thread

kazinator4y ago

Hard to prove or disprove.

dandanua4y ago

Edit: Julia is better than C in this regard, since the usage of @inbounds is explicit, i.e. everyone can see that the code is potentially unsafe.

rob744y ago

jakobnissen4y ago

The issue is that there is no way to verify if OOB access is possible given an abstract type, unless you know how that type behaves, i.e. how it's indexed.

And Julia provides no way of specifying the behaviour of abstract types.

masklinn4y ago

> but C can read out-of-memory locations too, so what?

So it's widely considered a plague upon the field, suffered because of the lack of alternative?

Gwypaas4y ago

> but C can read out-of-memory locations too, so what?

Simply decades of exploitable security issues.

krastanov4y ago

one-more-minute4y ago· 6 in thread

I think it's absolutely right to celebrate Julia's approach to composition. I also hope new research (in Julia or elsewhere) will help us figure out how to tame it a bit.

Sebb7674y ago

> That A and B can work together by default in Julia is really cool, but it also means that as you add new types and packages, you have a quadratically growing set of untested edges.

But as the author's example showed, they clearly can't work together - they just fail at runtime instead of at compile time.

SemanticStrengh4y ago

Any tutorial/blog on what makes Julia's composability special vs other languages? Is there a relation with multiple dispatch or delegation?

chalst4y ago

IIUC, Common Lisp is the giant on whose shoulders Julia built in this respect.

rashidrafeek4y ago

Yes. It's a side effect of multiple dispatch being the core paradigm of the language. See Stefan Karpinski's talk about it: https://www.youtube.com/watch?v=kc9HwsxE1OY
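A tiny sketch of the idea, not taken from the talk: one generic function (`polyval` is a name made up here) written with no type annotations, which works with several number types because `+` and `*` dispatch on the types of both arguments.

```julia
# Evaluate c[1] + c[2]*x + c[3]*x^2 + ... by Horner's rule.
# Nothing here mentions a concrete number type.
polyval(coeffs, x) = foldr((c, acc) -> c + x * acc, coeffs)

polyval([1, 2, 3], 2)        # 17
polyval([1, 2, 3], 1//2)     # 11//4, exact Rational arithmetic
polyval([1, 2, 3], im)       # -2 + 2im, Complex arithmetic
polyval([1, 2, 3], big(2))   # 17 as a BigInt
```

The same mechanism is what lets types from unrelated packages flow through generic code, and it is also why combinations nobody tested can end up being exercised.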

FabHK4y ago

jpeloquin4y ago

rpmuller4y ago· 6 in thread

I've been a part of many language communities, and the Julia team is the very best in terms of the professionalism of the language and the key modules.

Maybe the best response to this is to view it as a call to action for us Julia fanboys/girls to stop cheering and fix some bugs ;-).

CJefferson4y ago

cbkeller4y ago

saghm4y ago

CJefferson4y ago

That just seemed like a bizarre overreaction to me.

chrsig4y ago

That thread is just rife with bad communication across the board. It's pretty clear that none of you understand what each other is saying, but are very willing to infer.

Maybe try not communicating on twitter.

Sukera4y ago

Do you have an example? I'd like to know more about this - it must have been quite egregious if it makes you stop learning a language.

QuackingTheQ4y ago· 5 in thread

Between a good code linter (or some static analysis, I'm pulling for JET.jl) and a formal interface spec, I really think most of Julia's development-side issues could be quelled.

ThenAsNow4y ago

FabHK4y ago

Sounds like Julia needs a Snow Leopard/Mountain Lion/High Sierra release - no new features, just cleaning things up...

fluidcruft4y ago

QuackingTheQ4y ago

It's possible to hack interface verification into place at test-time, but that has a couple of problems:

1. Running the whole testing framework to determine if you implemented an interface is a high overhead when you're developing

2. You have a lot of tests to write to really check every error. Perhaps a package which defines an interface could provide a tester for this purpose

3. Interfaces should be attached to the types, and that should be sufficient for verifying the interface

This simple interface specification does produce design issues of its own, but correctness is much easier to handle if you know what needs to be correct in the first place.

mcabbott4y ago

catchclose89194y ago· 5 in thread

Q6T46nT668w6i3m4y ago

I think the author addresses this. It’s a Catch-22. If you restrict use to a small subset of types you’re undermining one of Julia’s best features.

As someone who has been writing a lot of numerical analysis code recently, I would absolutely love a type system that could describe and enforce numerical stability traits.

one-more-minute4y ago

dklend1224y ago

That's exactly where Dex might improve over Julia, with language level control over mutability and effect handlers and array access safety ... time will tell.

So packages just use those features

Maybe it will hit the right trade off, or maybe Julia will adopt similar language level tools, but adjusted for dynamic semantics. Is that even possible?

catchclose89194y ago

> a type system that could describe and enforce numerical stability traits

Wow, that sounds cool! Have you researched whether anyone has done anything in this area? How would you even start to approach the problem?

Do you think it has any chance of being done without massive sacrifices to performance?

jstrong4y ago

Sporktacular4y ago· 5 in thread

This is a pity. It seems like a great language and I'd be keen to dive in more, but it seems fair to expect a math/numerical analysis-oriented language to be especially dependable wrt correctness.

cbkeller4y ago

Sporktacular4y ago

Guarantees aside, does MATLAB have an issue with this to the same extent as Julia?

cbkeller4y ago

adgjlsfhk14y ago

sample size of 1, but I've run 1 billion tests today in Julia (floating point power for Float16, Float32 and Float64)

Sporktacular4y ago

For correctness? What was the result?

Sukera4y ago· 4 in thread

For example, the majority of issues referenced are specific to a single package, StatsBase.jl - which apparently was written before OffsetArrays.jl was a thing and thus is known to be incompatible:

> Yes, lots of JuliaStats packages have been written before offset axes existed. Feel free to make a PR adding checks.

https://github.com/JuliaStats/StatsBase.jl/issues/646#issuec...

EDIT: Since this comment seems to gain some traction - title is editorialized, original is "Why I no longer recommend Julia".

snicker74y ago

"known to be incompatible"

Known to whom? People who regularly participate in the Julia forum/chat? Julia's composability relies on people agreeing on unwritten rules and standards.

DNF24y ago

Agreed, one cannot just expect this to be known.

Do type annotations in Python actually catch type errors? I thought they were mainly for documentation.

nickm124y ago

snicker74y ago

Yes, if you use tooling (mypy). It definitely helped me a few times.

p33p4y ago· 4 in thread

Viral frequents HN so I will be curious to see if he engages this directly in a productive manner.

ViralBShah4y ago

derbOac4y ago

I share your sense that "something is just missing in Julia" but I maybe disagree with the author in that I see it as potentially changeable or something, as not hopeless.

jbezanson4y ago

derbOac4y ago

For what it's worth, I prefer 1-based indexing.

IshKebab4y ago· 4 in thread

I tried Julia but the compilation time for interactive use was just too insane.

I ended up paying £125 for MATLAB. Nothing else really remotely compares to MATLAB's plotting facilities.

forgotpwd164y ago

Did you try Octave, GNU's numerical package that is compatible with MATLAB?

IshKebab4y ago

DNF24y ago

I use Matlab daily, and the plotting is indeed excellent.

But the language itself is a horrible kludgy mess. Most of the development time is spent on input parsing and contorting your code into a vectorized shape.

IshKebab4y ago

Yeah I agree the language is not great, especially for non-matrix things. But Julia isn't exactly great either (unsurprisingly since it is pretty much a MATLAB derivative).

fgh4y ago· 4 in thread

It would be interesting to know which language the author currently uses.

rendall4y ago

The author mentions that he was stuck on a problem for weeks using Julia, but solved it with Python within hours

tgv4y ago

That was someone else: Patrick Kidger is mentioned in the article. If I look at the author's GitHub, it's Go and JavaScript.

rendall4y ago

You're right. I misread.

ninjin4y ago

Pretty sure it was Go last time I talked to Yuri, he is very much a stand-up guy.

cs7024y ago· 3 in thread

A more appropriate title for the OP would have been:

mbauman4y ago

cs7024y ago

lostmsu4y ago

jrochkind14y ago· 3 in thread

adgjlsfhk14y ago

jrochkind14y ago

Makes sense. I guess the author's contention is that if Julia had those formal features the author wants, it would need very significantly less dev time to reach python's levels of reliability?

adgjlsfhk14y ago

asdfman1234y ago· 3 in thread

> In my experience, Julia and its packages have the highest rate of serious correctness bugs of any programming system I’ve used, and I started programming with Visual Basic 6 in the mid-2000s.

Oh God, is this what qualifies you as "old" now

CRConrad4y ago

Kids these days, eh? Lawn, etc.

asdfman1234y ago

Nah, it's not like that. In my mind an "older" programmer is like from the 90s.

But I did that stuff in the mid-2000s too: am I now an "old"?

Terrifying!

CRConrad4y ago

Relax: No, you're not. You've got another fifteen years.

cleandreams4y ago· 2 in thread

Unfortunately this is not a feature but a bug, and the worst kind, a bug at the language design level:

JonChesterfield4y ago

adgjlsfhk14y ago

why not both?

jbezanson4y ago· 1 in thread

> They accept the existence of individual isolated issues, but not the pattern that those issues imply.

ThenAsNow4y ago

nohat4y ago· 1 in thread

DNF24y ago

But it's not a matter of language design. The 'choose your own indexing' is something you do entirely in libraries.

You can create your own indexing in python too, it will just be slow. The 'sin' of Julia is that it will be fast...

sharikous4y ago· 1 in thread

DNF24y ago

If Julia followed your recommendation, it would be irrelevant before it ever started.

There is no way for a new language to be useful or relevant unless it brings significant improvements.

teddyh4y ago· 1 in thread

Actual title: “Why I no longer recommend Julia”.

mbauman4y ago

@Dang could we get the title corrected?

blindseer4y ago· 1 in thread

Correctness in Julia feels like it'll never happen, because interfaces seem like they'll never happen.

Correctness guarantees / interfaces and slow startup are both my biggest pain points in Julia.

I often think what would happen if every Julia dev just dropped the language and used Rust instead. A scientific ecosystem in Rust would be amazing.

jakobnissen4y ago

ModernMech4y ago· 1 in thread

cbkeller4y ago

chubot4y ago· 1 in thread

Oof, accessing out of bounds memory is pretty surprising to me for a dynamic language ... But I guess it's not surprising if your goal is to compile to fast native code (e.g. omit bounds checks).

Basically it's negligent not to use ASAN/Valgrind with C/C++ these days. You can shake dozens or hundreds of bugs out of any real codebase that doesn't use them, guaranteed.

Similarly if people are just writing "fast" Julia code without good tests (which I'm not sure about but this article seems to imply), then I'd say that's similarly negligent.

-----

(copy of lobste.rs comment)

mbauman4y ago

You can also use `julia --check-bounds=yes` — and our testing frameworks automatically do so.

adolph4y ago· 1 in thread

https://stackoverflow.com/questions/7284/what-is-turing-comp...

https://plato.stanford.edu/entries/goedel-incompleteness/

mattkrause4y ago

ninjin4y ago

Also, let me say that encountering these kinds of bugs is not something I have had experience with. But, I tend to be very conservative with my usage of libraries and fancy composition.

cwp4y ago

Julia is still a pretty young language, and it's probably only recently that the ecosystem has gotten big enough to hit these problems.

Edit: come to think of it, one of the issues that the Java folks were dealing with was lack of composability. :-/

freemint4y ago

dmos624y ago

In my mind Julia broke new ground in terms of what happens when you create an environment where such composability is possible. Author's finishing thought is apt:

hprotagonist4y ago

Ouch. That sounds all the more damning for the author's studious care to calmly describe instead of angrily rant.

pankgeorg4y ago

isaacimagine4y ago

randyzwitch4y ago

Not specific to specific examples in the article, I think some of the things people perceive as "bugs" other people see as features or an opportunity to correct past mistakes.

That ends up becoming open-source prerogative. I could do it wrong "on my own time" in my own code...doesn't make either a bug, but a different choice based on perspective.

ble4y ago

This article contains no instances of the word "test", which seems surprising but entirely in keeping with the author's observations.

j7ake4y ago

In terms of saving human time I have found R to be fastest (in human time) for iterative prototyping, exploring, and visualising data

R still has the best statistical package ecosystem, although python is catching up.

cbkeller4y ago

snicker74y ago

garrison4y ago

patrec4y ago

[1] BEAM, unlike cpython, is actually a marvel of engineering and making very deliberate trade-offs. But it's not very complex.

[2] Javascript is of course the one pervasive dynamically typed programming language that has sophisticated implementations, but of mostly ill-conceived constructs.

cancandan4y ago

uwuemu4y ago

A devastating article for Julia. I was thinking about trying Julia out... but not after reading this.

vasili1114y ago

Does Python have similar issues?

hzhou3214y ago

I believe the solution is to abandon the pursuit of a magic solution of A -> [super compiler] -> output and instead focus on how to get feedback from every link in A->B->C->D->...->compiler->binary->output

jahewson4y ago
