This was true before, but the performance of for loops has improved a lot in recent years, and while vectorization is still faster, for loops are no longer a no-no.
See https://www.r-bloggers.com/2022/02/avoid-loops-in-r-really/
For those who want the speed but still prefer a loop: create a vector beforehand with the final length and fill it in. If you don't know the final length, create a vector with a good guess for the length, double its length whenever it gets full, and crop off the unused tail when you're done.
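A minimal R sketch of both patterns, on toy data (the squaring loop is just a stand-in for real work):

    # Preallocate when the final length is known
    n <- 10
    out <- numeric(n)
    for (i in seq_len(n)) out[i] <- i^2

    # Unknown final length: start with a guess, double on overflow, crop the tail
    res <- numeric(4)                # initial guess at the length
    k <- 0
    for (val in 1:10) {              # pretend we can't predict how many values arrive
      k <- k + 1
      if (k > length(res)) res <- c(res, numeric(length(res)))  # double the capacity
      res[k] <- val^2
    }
    res <- res[seq_len(k)]           # crop the unused tail

Both loops end up with the same vector, but neither reallocates on every iteration the way `res <- c(res, val^2)` would.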
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you've proven that's where the bottleneck is.
Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
And that's usually not even vectorizing anything; it just hides the for loop that is buried somewhere in the apply code...
My takeaway here is that, sure, for one operation the speed is not that critical, but there is always the case where that one operation gets run close to a million times in one analysis, and then it all adds up. On top of that, if it's implemented in C, then the invocation from R to C and back happens that many times, which adds to the slowness.
You might like writing slow programs, but that doesn't mean people like using them.
[1] Vectorization is more than ten times faster than the naive loop.

    oddsratio <- function(x, ...) UseMethod("oddsratio", x)
    oddsratio.integer <- function(a, b, c, d) (a * d) / (b * c)
    oddsratio.numeric <- function(p1, p0) ((p1) / (1 - p1)) / ((p0) / (1 - p0))
    oddsratio.matrix  <- function(x) (x[1, 1] * x[2, 2]) / (x[1, 2] * x[2, 1])

Then:

    oddsratio(12L, 6L, 2L, 29L)            # 29
    oddsratio(12/(12+2), 6/(6+29))         # 29
    oddsratio(matrix(c(12, 6, 2, 29), 2))  # 29

A more compelling example for Julia would have to have two modes of operation where the first argument has the same type in both modes, but later arguments have different types.
    setMethod(myfun, signature = c("integer", "character"), ...)
    setMethod(myfun, signature = c("list", "data.frame", "logical"), ...)
    setMethod(myfun, signature = "foo", ...)

This takes into account as many arguments as you wish.

    genericfun.type1 <- function(x, sub = 1) x - sub
    genericfun.type2 <- function(x, y) sum(x, y)

The single case can be differentiated:

    genericfun(x)

But which function are we calling with:

    genericfun(x, y)
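To make the ambiguity concrete, here is a minimal R sketch (the classes type1/type2 and the values are made up): S3 dispatch looks only at the class of the first argument, so the second argument never influences which method runs.

    genericfun <- function(x, ...) UseMethod("genericfun")
    genericfun.type1 <- function(x, sub = 1) unclass(x) - unclass(sub)
    genericfun.type2 <- function(x, y) sum(unclass(x), unclass(y))

    x <- structure(5, class = "type1")
    y <- structure(3, class = "type2")

    genericfun(x)     # dispatches on class(x): genericfun.type1 -> 5 - 1 = 4
    genericfun(x, y)  # still dispatches on class(x) only: 5 - 3 = 2;
                      # genericfun.type2 is never considered
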
I don't know about Julia and how it solves this. Maybe by not allowing nameless optional arguments to be passed.

https://cran.r-project.org/web/packages/available_packages_b...
To go from R to Julia, as an example, one would have to give up on a hundred or so high-quality packages potentially related to their activities.
Of course R has been here longer. Eleven years after its creation, R had fewer than 500 packages. Julia was released in 2012 and today has over 7,000 packages.
R
Cited in: 8,589 Publications
7,353 [Citing Publications in] Statistics (62-XX)
https://zbmath.org/software/771
Julia
Cited in: 442 Publications
64 [Citing Publications in] Statistics (62-XX)
https://zbmath.org/software/13986

From my conversations with him, the programming language was not the bottleneck for him; it was the integration with GIS software and spatial mapping packages that caused him problems. A lot of programming languages did not mesh very well with the spatial mapping tools available to him at the time.
The first language I ever saw him use was Turbo Pascal; later he used QBasic and GWBasic, and eventually Visual Basic.
Towards the end of his career I believe he looked at other languages like Python and Java, but I don't believe he found them very compelling. Python, I believe, has better spatial tools available now, but it was relatively early in the language's life when my Dad was looking at it, and those packages probably did not exist.
Someone asked to install Julia on the compute cluster just last week, so we'll see how many others start using it.
For example, plotting in R always baffled me.
plot(x, y, col=..., col.name=...)
In this case, col.name is literally just a symbol. But in another context col.name is the data with index 'name' stored in col. Or something, it's been a while.
R seems to have a lot of these 'special contexts' that A. make understanding and writing code much quicker and B. reward familiarity over intuition. One line in R can be 100 in Julia, and both compile to 80 machine instructions, for example.
I'd say if you can agree with others on what R code does and you're comfortable with R, then use R. If you need to build something performant with many domains, then Julia is a great language for that sort of thing.
> plot(x, y, col=..., col.name=...)
> In this case, col.name is literally just a symbol.
In this case, col.name is literally… made up?

Julia is the new R for me. Unless R re-invents itself.
It's 25 years old!
There are a few recordings of the "Unreasonable Effectiveness of Multiple Dispatch" talk that explain this: https://youtu.be/QTCKsqIK6nE
It's a breeze doing such analyses with Stata, and with a bunch of weird syntax, some libraries and more lines you can get it done in R as well.
But I tried assisting my SO with setting up their statistical methods in Python and it was so much more work than Stata (or R).
I think the example in the post shows a way Julia's generic functions can be annoying: the function seems to take a matrix, but secretly it only wants a 2x2 matrix. If such a function gets called with the wrong value deep in some other computation, and especially if it silently doesn't complain, you may end up with some pretty annoying bugs. This kind of bug can happen in R too (functions may dispatch on the type of their first arg, and many are written to be somewhat generic by inspecting types at runtime). I think it's a little less likely only because data structures are more limited. A related example that trips me up in R is min vs pmin.
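For anyone who hasn't hit it: min collapses all of its arguments to one scalar, while pmin is the element-wise (parallel) version, and substituting one for the other fails silently:

    min(c(1, 5), c(2, 0))    # 0       : everything reduced to a single value
    pmin(c(1, 5), c(2, 0))   # c(1, 0) : element-wise minimum
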
The biggest issue I had in practice is that for either language, I wanted to input some data, fiddle with it, draw some graphs, maybe fit some models, and suchlike. R seems to have better libraries for doing the latter but maybe I just didn’t find the right Julia libraries.
- I feel like I had more difficulties reading CSVs with Julia. But then when I was using Julia, I wanted to read a bunch of ns-precision timestamps, which the language didn't really like, and with R I didn't happen to need this. I found neither language had amazing datetime support (partly things like precision; partly things like wanting to group by week/day/whatever; partly things like wanting sensible graphs for a time axis).
- R has a bigger standard library of functions that are useful to me, e.g. approx or nlm or cut. I think it’s a reasonable philosophy for Julia to want a small stdlib but it is less fun trying to find the right libraries all the time. Presumably if I knew the canonical libraries I would have been happier.
- R seems to have better libraries for stats.
- I found manipulating dataframes in Julia to be less ergonomic than dplyr, but maybe I just wasn't using the Julia equivalent. In particular, instead of e.g. mutate(x = cumsum(y * filter)), I would have to write something like transform(df, [:y, :filter] => ((y, f) -> cumsum(y .* f)) => :x). I didn't like it, even though it's clearly more explicit about scoping, which I find desirable in a less interactive language.
- I much preferred ggplot2 to the options in Julia. It seems the standard thing is Plots.jl, but I never had a great time with that. Gadfly seemed to have a better interface, but it had similar issues to manipulating data frames, and I found myself hitting many annoying bugs with it. Ggplot is quite slow, however.
- Pluto crashed a lot on me, which wasn't super fun. In general, I felt like Julia was more buggy. Though I also get an annoying bug with R where it starts printing new prompts every second or so, and sometimes just crashes after that. Pluto also doesn't work with Julia's parallelism features (but maybe it does now?).
- The thing that most frustrated me with Pluto/Gadfly was that I would want to take a bunch of data, draw it nice and big, and have a good look at it. Ggplot (probably because of bad hidpi support) does this well by throwing up the plot with a tiny font size in a nice 4k window and, with appropriate options, not doing a ton of X draw calls for partial results (downside: it is still quite slow with a lot of points). Gadfly in Pluto wants to generate an SVG with massive font size and thick borders on chonky scatter-plot shapes, and crams it into a tiny rectangle in Pluto. Maybe this is more aesthetic or something, but generally I plot things because I want to look at the data, and this is not an easy way to look at it. The option to hide the thick borders in Gadfly is hilariously obscure. I never bothered learning how to not generate the SVG in the notebook; I would just suffer terrible performance while I zoomed in to get a higher-resolution screenshot (before deleting the SVG in the dev console) or generate a PNG file.
That said, there are still things I don’t know how to do with either plotting system, like reversing a datetime scale, or having a scale where the output coordinate goes as -pseudolog(1-y) to see the tail of an ecdf, or having a scale where the labels come from one source but positions come from some weight, e.g. time on the x axis weighted by cpu-hours so that an equal x distance between points corresponds to equal cpu-hours rather than equal wall-time. Maybe I will learn how to do it someday with ggplot.
julia> df = DataFrame(y = rand(10^6), filter=randn(10^6));
julia> @transform!(df, :x = cumsum(:y .* :filter))
1000000×3 DataFrame
Row │ y filter x
│ Float64 Float64 Float64
─────────┼────────────────────────────────────
1 │ 0.0726663 1.7213 0.125081
2 │ 0.183898 -0.392131 0.0529686
3 │ 0.150274 1.08083 0.21539
⋮ │ ⋮ ⋮ ⋮
It's particularly nice in conjunction with @chain [1].

[1] https://juliadata.github.io/DataFramesMeta.jl/dev/#Chaining-...
Separate from that, I use python when I'm writing (undemanding) system-level work. I see it as a great replacement for the shell. (Python took over from perl, and once I got to 20% proficiency with python I had a sigh of relief, knowing that I would never really need to write in perl again.)
And, yes, I also use Julia. This is mainly for writing small numerical models. It is a lovely language. I would never start to write a small model in fortran anymore. But that doesn't mean I can leave fortran behind because it is still the language used for large numerical models. (These models involve many tens of person-years of effort by world experts. This is not just a coding thing.)
I suspect that quite a lot of people have language limits more like mine than the 1.5 stated by the author. For such people, Julia is definitely an arrow that ought to be in the quiver. It is elegant. It is fast. It is modern. Parts of it are simply delightful. But there are downsides.
1. The startup is slow enough to be annoying for folks (like me) who like to use makefiles to coordinate a lot of steps in an analysis, as opposed to staying in a language environment all day long. (Note, though, that Julia is getting faster. In particular, the time-to-first-plot has been decreasing from an annoying minute or so down to perhaps half a minute.)

2. The error messages often emanate from a low level, making it hard to understand what is wrong. In this, R and Python and even C/C++ are much superior.

3. The language is still in rapid development, so quite often the advice you find on the web will not be the best advice.

4. There are several graphics systems, and they work differently. This wild-west approach is confusing to users. Which one to choose? If I run into problems with one and see advice to switch to another, what new roadblocks will I run into?

5. The graphical output is fairly crude compared with R.

6. It has some great libraries, but in sheer number, depth, and published documentation, it cannot really hold a candle to R. Nearly every statistics PhD involves R code, and I think quite a lot of packages come from that crucible. This environment ought not to be underestimated.
The bottom line? It only takes an hour or so to see that Julia is a wonderful open-source replacement for matlab, and for small tasks that might otherwise be done in Fortran. Anyone with a language capacity of 2 or 3 or more (and I suspect this is many folks on HN) will find Julia to be a great tool to learn, for certain tasks.
He then speaks a bit about multiple dispatch and how it’s useful when it’s suitable.
Personally I saw nothing here that might actually convince someone to switch. R + Tidyverse + Rcpp + CRAN is formidable.
Rcpp is the worst thing that ever happened to humanity. Crazy build system, impossible magic words and macros, poisons an entire C or C++ project with new headers, etc., and is extremely to downright impossibly hard to compile without R-specific compiler tools. There are two different build systems for whatever reason in sourceCpp, the compiler just includes arbitrary files, and the maintainer is, ahem, extremely condescending to any Q&A questions on SO and GH and doesn't understand why crazy long errors aren't just obvious.
is not technically true, because that implies a massive slow-down. Instead, it's more accurate to say that behavior-wise it's always equivalent to a dynamic dispatch, but because of Julia's just-ahead-of-time compilation, you can often eliminate the dynamic dispatch at run time.
I understand that Julia 2.0 is slated to have some sort of concrete interface mechanism, so that’s good. Thus far, I’ve seen some pretty low quality results. There’s just no way to have intuition about what method is going to be called in Julia. In python, I know it’s either going to be somewhere in dir(some-obj) or it’s gonna be some funky meta class stuff. Either way, pycharm can literally just hyperlink me.
Until Julia has the same capability, it just won’t be suitable for general purpose code. I know there will be some Julia fan in the replies about how I can approximate the behavior, and how Julia is the future and blah blah blah.
Just fix interfaces. It’s not that hard. They’ve got MIT grads for crying out loud!
I’m a little appalled there are PhDs doing computer science work with public money who can’t wrap their heads around Python. That’s a failed curriculum, imo.
Better static analysis tools (or traits/interfaces in the type system) would of course be welcome. But in my experience that’s more to catch silly mistakes and typos than to aid in healthy modularity or easy discoverability (which to me are remarkably good already).
This is an utterly deranged take. Do you mean to say that adherence to OOP and code organization are the same thing?
I checked the usual places and did not find any information on 2.0 and an interface mechanism. Do you have a pointer?