This was true before, but the performance of for loops has improved a lot in recent years, and while vectorization is still faster, for loops are no longer a no-no.
See https://www.r-bloggers.com/2022/02/avoid-loops-in-r-really/
For those who want the speed but still prefer a loop: create a vector beforehand with the final length and fill it in. If you don't know the final length, create a vector with a good guess for the length, double its length whenever it gets full, and crop off the unused tail when you're done.
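A minimal R sketch of both patterns, on toy data (the squaring loop is just a stand-in for real work):

    # Preallocate when the final length is known
    n <- 10
    out <- numeric(n)
    for (i in seq_len(n)) out[i] <- i^2

    # Unknown final length: start with a guess, double on overflow, crop the tail
    res <- numeric(4)                # initial guess at the length
    k <- 0
    for (val in 1:10) {              # pretend we can't predict how many values arrive
      k <- k + 1
      if (k > length(res)) res <- c(res, numeric(length(res)))  # double the capacity
      res[k] <- val^2
    }
    res <- res[seq_len(k)]           # crop the unused tail

Both loops end up with the same vector, but neither reallocates on every iteration the way `res <- c(res, val^2)` would.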
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
And that's usually not even vectorizing anything; it just hides the for loop that is buried somewhere in the apply code...
My takeaway here is that, sure, for one operation the speed is not that critical, but there is always the case where that one operation gets run close to a million times in one analysis, and then it all adds up. On top of that, if it's implemented in C, then the invocation from R to C and back happens that many times, which adds to the slowness.
You might like writing slow programs, but that doesn't mean people like using them.
[1] Vectorization is more than ten times faster than the naive loop.

    oddsratio <- function(x, ...) UseMethod("oddsratio", x)
    oddsratio.integer <- function(a, b, c, d) (a * d) / (b * c)
    oddsratio.numeric <- function(p1, p0) ((p1) / (1 - p1)) / ((p0) / (1 - p0))
    oddsratio.matrix  <- function(x) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])

Then:

    oddsratio(12L, 6L, 2L, 29L)            # 29
    oddsratio(12/(12+2), 6/(6+29))         # 29
    oddsratio(matrix(c(12, 6, 2, 29), 2))  # 29

A more compelling example for Julia would have to have two modes of operation where the first argument has the same type in both modes, but later arguments have different types.
    setMethod(myfun, signature = c("integer", "character"), ...)
    setMethod(myfun, signature = c("list", "data.frame", "logical"), ...)
    setMethod(myfun, signature = "foo", ...)

This takes into account as many arguments as you wish.

    genericfun.type1 <- function(x, sub = 1) x - sub
    genericfun.type2 <- function(x, y) sum(x, y)

The single case can be differentiated:

    genericfun(x)

But which function are we calling with:

    genericfun(x, y)
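To make the ambiguity concrete, here is a minimal R sketch (the classes type1/type2 and the values are made up): S3 dispatch looks only at the class of the first argument, so the second argument never influences which method runs.

    genericfun <- function(x, ...) UseMethod("genericfun")
    genericfun.type1 <- function(x, sub = 1) unclass(x) - unclass(sub)
    genericfun.type2 <- function(x, y) sum(unclass(x), unclass(y))

    x <- structure(5, class = "type1")
    y <- structure(3, class = "type2")

    genericfun(x)     # dispatches on class(x): genericfun.type1 -> 5 - 1 = 4
    genericfun(x, y)  # still dispatches on class(x) only: 5 - 3 = 2;
                      # genericfun.type2 is never considered
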
I don't know about Julia and how it solves this. Maybe by not allowing nameless optional arguments to be passed.

https://cran.r-project.org/web/packages/available_packages_b...
To go from R to Julia, as an example, one would have to give up on a hundred or so high-quality packages potentially related to their activities.
Of course R has been here longer. Eleven years after its creation, R had fewer than 500 packages. Julia was released in 2012 and today has over 7,000 packages.
R
Cited in: 8,589 Publications
7,353 [Citing Publications in] Statistics (62-XX)
https://zbmath.org/software/771
Julia
Cited in: 442 Publications
64 [Citing Publications in] Statistics (62-XX)
https://zbmath.org/software/13986

From my conversations with him, the programming language was not the bottleneck for him; it was the integration with GIS software and spatial mapping packages that caused him problems. A lot of programming languages did not mesh very well with the spatial mapping tools available to him at the time.
The first language I ever saw him use was Turbo Pascal; later he used QBasic and GWBasic, and eventually Visual Basic.
Towards the end of his career I believe he looked at other languages like Python and Java, but I don't believe he found them very compelling. Python, I believe, has better spatial tools available now, but it was relatively early in the language's life when my Dad was looking at it, and those packages probably did not exist.
Someone asked to install Julia on the compute cluster just last week, so we'll see how many others start using it.
For example, plotting in R always baffled me.
plot(x, y, col=..., col.name=...)
In this case, col.name is literally just a symbol. But in another context col.name is the data with index 'name' stored in col. Or something, it's been a while.
R seems to have a lot of these 'special contexts' that A. make understanding and writing code much quicker and B. reward familiarity over intuition. One line in R can be 100 in Julia, and both compile to 80 machine instructions, for example.
I'd say if you can agree with others on what R code does and you're comfortable with R, then use R. If you need to build something performant with many domains, then Julia is a great language for that sort of thing.
> plot(x, y, col=..., col.name=...)
> In this case, col.name is literally just a symbol.
In this case, col.name is literally… made up?

Julia is the new R for me. Unless R re-invents itself.
It's 25 years old!
There are a few recordings of the "Unreasonable Effectiveness of Multiple Dispatch" talk that explain this: https://youtu.be/QTCKsqIK6nE
It's a breeze doing such analyses with Stata, and with a bunch of weird syntax, some libraries and more lines you can get it done in R as well.
But I tried assisting my SO with setting up their statistical methods in Python and it was so much more work than Stata (or R).
I think the example in the post shows a way Julia's generic functions can be annoying: the function seems to take a matrix, but secretly it only wants a 2x2 matrix. If such a function gets called with the wrong value deep in some other computation, and especially if it silently doesn't complain, you may end up with some pretty annoying bugs. This kind of bug can happen in R too (functions may dispatch on the type of their first arg, and many are written to be somewhat generic by inspecting types at runtime). I think it's a little less likely only because data structures are more limited. A related example that trips me up in R is min vs pmin.
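For anyone who hasn't hit it: min collapses all of its arguments to one scalar, while pmin is the element-wise (parallel) version, and substituting one for the other fails silently:

    min(c(1, 5), c(2, 0))    # 0       : everything reduced to a single value
    pmin(c(1, 5), c(2, 0))   # c(1, 0) : element-wise minimum
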
The biggest issue I had in practice is that for either language, I wanted to input some data, fiddle with it, draw some graphs, maybe fit some models, and suchlike. R seems to have better libraries for doing the latter but maybe I just didn’t find the right Julia libraries.
- I feel like I had more difficulties reading CSVs with Julia. But then when I was using Julia, I wanted to read a bunch of ns-precision timestamps, which the language didn't really like, and with R I didn't happen to need this. I found neither language had amazing datetime support (partly things like precision; partly things like wanting to group by week/day/whatever; partly things like wanting sensible graphs for a time axis).
- R has a bigger standard library of functions that are useful to me, e.g. approx or nlm or cut. I think it’s a reasonable philosophy for Julia to want a small stdlib but it is less fun trying to find the right libraries all the time. Presumably if I knew the canonical libraries I would have been happier.
- R seems to have better libraries for stats.
- I found manipulating dataframes in Julia to be less ergonomic than dplyr, but maybe I just wasn't using the Julia equivalent. In particular, instead of e.g. mutate(x = cumsum(y * filter)), I would have to write something like transform(df, [:y, :filter] => ((y, f) -> cumsum(y .* f)) => :x). I didn't like it, even though it's clearly more explicit about scoping, which I find desirable in a less interactive language.
- I much preferred ggplot2 to the options in Julia. It seems the standard thing is Plots.jl, but I never had a great time with that. Gadfly seemed to have a better interface, but it had similar issues to manipulating data frames, and I found myself hitting many annoying bugs with it. Ggplot is quite slow, however.
- Pluto crashed a lot on me, which wasn't super fun. In general, I felt like Julia was more buggy. Though I also get an annoying bug with R where it starts printing new prompts every second or so, and sometimes just crashes after that. Pluto also doesn't work with Julia's parallelism features (but maybe it does now?).
- The thing that most frustrated me with Pluto/Gadfly was that I would want to take a bunch of data, draw it nice and big, and have a good look at it. Ggplot (probably because of bad hidpi support) does this well by throwing up the plot with a tiny font size in a nice 4k window and, with appropriate options, not doing a ton of X draw calls for partial results (downside: it is still quite slow with a lot of points). Gadfly in Pluto wants to generate an SVG with massive font size and thick borders on chonky scatter-plot shapes, and crams it into a tiny rectangle in Pluto. Maybe this is more aesthetic or something, but generally I plot things because I want to look at the data, and this is not an easy way to look at it. The option to hide the thick borders in Gadfly is hilariously obscure. I never bothered learning how to not generate the SVG in the notebook; I would just suffer terrible performance while I zoomed in to get a higher-resolution screenshot (before deleting the SVG in the dev console) or generate a PNG file.
That said, there are still things I don’t know how to do with either plotting system, like reversing a datetime scale, or having a scale where the output coordinate goes as -pseudolog(1-y) to see the tail of an ecdf, or having a scale where the labels come from one source but positions come from some weight, e.g. time on the x axis weighted by cpu-hours so that an equal x distance between points corresponds to equal cpu-hours rather than equal wall-time. Maybe I will learn how to do it someday with ggplot.
julia> df = DataFrame(y = rand(10^6), filter=randn(10^6));
julia> @transform!(df, :x = cumsum(:y .* :filter))
1000000×3 DataFrame
Row │ y filter x
│ Float64 Float64 Float64
─────────┼────────────────────────────────────
1 │ 0.0726663 1.7213 0.125081
2 │ 0.183898 -0.392131 0.0529686
3 │ 0.150274 1.08083 0.21539
⋮ │ ⋮ ⋮ ⋮
It's particularly nice in conjunction with @chain [1].

[1] https://juliadata.github.io/DataFramesMeta.jl/dev/#Chaining-...
Separate from that, I use python when I'm writing (undemanding) system-level work. I see it as a great replacement for the shell. (Python took over from perl, and once I got to 20% proficiency with python I had a sigh of relief, knowing that I would never really need to write in perl again.)
And, yes, I also use Julia. This is mainly for writing small numerical models. It is a lovely language. I would never start to write a small model in fortran anymore. But that doesn't mean I can leave fortran behind because it is still the language used for large numerical models. (These models involve many tens of person-years of effort by world experts. This is not just a coding thing.)
I suspect that quite a lot of people have language limits more like mine than the 1.5 stated by the author. For such people, Julia is definitely an arrow that ought to be in the quiver. It is elegant. It is fast. It is modern. Parts of it are simply delightful. But there are downsides.
1. The startup is slow enough to be annoying for folks (like me) who like to use makefiles to coordinate a lot of steps in an analysis, as opposed to staying in a language environment all day long. (Note, though, that Julia is getting faster. In particular, the time-to-first-plot has been decreasing from an annoying minute or so down to perhaps half a minute.)

2. The error messages often emanate from a low level, making it hard to understand what is wrong. In this, R and Python and even C/C++ are much superior.

3. The language is still in rapid development, so quite often the advice you find on the web will not be the best advice.

4. There are several graphics systems, and they work differently. This wild-west approach is confusing to users. Which one to choose? If I run into problems with one and see advice to switch to another, what new roadblocks will I run into?

5. The graphical output is fairly crude compared with R.

6. It has some great libraries, but in sheer number, depth, and published documentation, it cannot really hold a candle to R. Nearly every statistics PhD involves R code, and I think quite a lot of packages come from that crucible. This environment ought not to be underestimated.
The bottom line? It only takes an hour or so to see that Julia is a wonderful open-source replacement for matlab, and for small tasks that might otherwise be done in Fortran. Anyone with a language capacity of 2 or 3 or more (and I suspect this is many folks on HN) will find Julia to be a great tool to learn, for certain tasks.
He then speaks a bit about multiple dispatch and how it’s useful when it’s suitable.
Personally I saw nothing here that might actually convince someone to switch. R + Tidyverse + Rcpp + CRAN is formidable.
Rcpp is the worst thing that ever happened to humanity. Crazy build system, impossible magic words and macros, poisons an entire C or C++ project with new headers, etc., and is extremely to downright impossibly hard to compile without R-specific compiler tools. There are two different build systems for whatever reason in sourceCpp, the compiler just includes arbitrary files, and the maintainer is, ahem, extremely condescending to any Q&A questions on SO and GH and doesn't understand why crazy long errors aren't just obvious.
is not technically true, because that implies a massive slow-down. Instead, it's more accurate to say that behavior-wise it's always equivalent to a dynamic dispatch, but because of Julia's just-ahead-of-time compilation, you can often eliminate the dynamic dispatch at run time.
I understand that Julia 2.0 is slated to have some sort of concrete interface mechanism, so that’s good. Thus far, I’ve seen some pretty low quality results. There’s just no way to have intuition about what method is going to be called in Julia. In python, I know it’s either going to be somewhere in dir(some-obj) or it’s gonna be some funky meta class stuff. Either way, pycharm can literally just hyperlink me.
Until Julia has the same capability, it just won’t be suitable for general purpose code. I know there will be some Julia fan in the replies about how I can approximate the behavior, and how Julia is the future and blah blah blah.
Just fix interfaces. It’s not that hard. They’ve got MIT grads for crying out loud!
I’m a little appalled there are PhDs doing computer science work with public money who can’t wrap their heads around Python. That’s a failed curriculum, imo.
Better static analysis tools (or traits/interfaces in the type system) would of course be welcome. But in my experience that’s more to catch silly mistakes and typos than to aid in healthy modularity or easy discoverability (which to me are remarkably good already).
This is an utterly deranged take. Do you mean to say that adherence to OOP and code organization are the same thing?
I checked the usual places and did not find any information on 2.0 and an interface mechanism. Do you have a pointer?