In one of my previous jobs I basically turned into an in-house R consultant for a department in a pharmaceutical company, and while investigating some other issue I kept catching bugs that meant the results people were reporting were completely wrong. A really common one is multiplying two vectors of unequal length where broadcasting shouldn't be possible, and R just recycles the shorter vector. But hey, it ran without error and there's an output, so many researchers don't notice.
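To make the recycling issue concrete, here is a small sketch in base R. The helper `strict_multiply` is a made-up name for illustration, not something from a package.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 20)

# Length 4 is a multiple of length 2, so y is recycled silently.
silent <- x * y   # 10 40 30 80, no warning at all

# When the lengths are not multiples, R still computes a result,
# but at least it emits a warning. Capture it instead of printing it.
z <- c(1, 2, 3)
warned <- tryCatch(
  { z * y; FALSE },
  warning = function(w) TRUE
)

# Hypothetical guard: refuse to multiply unless lengths match exactly
# (or one side is a scalar).
strict_multiply <- function(a, b) {
  if (length(a) != length(b) && length(a) != 1 && length(b) != 1) {
    stop("length mismatch: ", length(a), " vs ", length(b))
  }
  a * b
}

guarded <- tryCatch(strict_multiply(x, y), error = function(e) "refused")
```

The silent case is the dangerous one: nothing tells you that `y` was reused.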
Not to mention that handling errors is pretty miserable. If you want to catch a specific error you have to match the error string, and unfortunately the error message changes depending on the locale the R session is running in.
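For errors raised by your own code there is a way around string matching: signal a condition with a custom class and catch it by class. A minimal sketch follows, where `parse_dose` and the class name `invalid_dose` are invented for the example. As far as I know this doesn't help much with errors coming out of base R itself, since many of those are plain `simpleError` objects.

```r
# Hypothetical function that signals a classed error condition.
parse_dose <- function(x) {
  if (!is.numeric(x)) {
    cond <- structure(
      list(message = "dose must be numeric", call = NULL),
      class = c("invalid_dose", "error", "condition")
    )
    stop(cond)
  }
  x
}

# Catch by class rather than by matching the (locale-dependent) message.
result <- tryCatch(
  parse_dose("ten"),
  invalid_dose = function(e) NA_real_
)

ok <- parse_dose(5)
```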
Use RStudio
Include tidyverse
Turn warnings into errors
Most researchers are not programmers and don't care about programming. It's a tool to get the job done and I think you'd run into similar problems with other languages.
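On the "turn warnings into errors" tip, a minimal sketch, assuming base R's `warn` option is what was meant. Setting it to 2 makes a warning stop execution instead of scrolling past.

```r
# Remember the previous setting so it can be restored afterwards.
old <- options(warn = 2)

# The unequal-length multiplication normally only warns;
# with warn = 2 it is converted into an error.
res <- tryCatch(
  c(1, 2, 3) * c(10, 20),
  error = function(e) "stopped"
)

# Restore the previous warning level.
options(old)
restored <- identical(getOption("warn"), old$warn)
```

Putting `options(warn = 2)` at the top of an analysis script is one way to apply it for the whole run.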
For repl driven development or academic code or exercises it is excellent.
On a more serious note, I agree that R being too charitable in interpreting things (seemingly without warning) seems to be a problem. You'll have to do some debugging to make sure it actually does what you intended it to do. I've only dabbled in it a bit though.
In the real world we start counting from 1. CS people cannot stop complaining about it but it makes sense in languages used for mathematics and statistics. Zero-indexing is not very relevant if you don’t care about memory layout.
It's more natural. You never count from zero with real life objects.
In my experience, the specific things R does well, Python does in a clunkier way.
Statistical software written by statisticians in academia, bioconductor, and quick prototyping is still much faster in R than in python.
My use case is to prototype in R, then move to python if things become more production rather than exploratory.
Which is an interesting detail in R that should be mentioned anyway: the difference between NA and NaN. Anyone used to languages which just have NaN may confuse NA for that non-value.
https://www.r-bloggers.com/2012/08/difference-between-na-and...
https://cran.r-project.org/doc/manuals/r-release/R-lang.html... (lots of details how NA and NaN are handled)
Except - 1^NaN also is 1... now that IMO is wrong. But you can try the same in your browser's JS console and you will get 1 as a result too, so R is not the only one.
There are several NA values in R - NA_integer_, NA_real_, NA_complex_ and NA_character_, and the results will be different if you use some of them. NA_character_ and NA_complex_ will produce errors (different ones).
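A few lines of base R that illustrate the NA / NaN points in this subthread. Variable names are just for the example.

```r
# NaN counts as missing, but NA is not "not a number".
na_checks <- c(is.na(NA), is.na(NaN), is.nan(NA), is.nan(NaN))
# TRUE TRUE FALSE TRUE

# Cases where the result is known no matter what the missing value is.
p1 <- 1^NA        # 1
p2 <- NA^0        # 1
p3 <- 1^NaN       # 1
l1 <- NA & FALSE  # FALSE
l2 <- NA | TRUE   # TRUE

# The typed NA constants have different underlying types.
na_types <- c(typeof(NA), typeof(NA_integer_),
              typeof(NA_real_), typeof(NA_character_))
# "logical" "integer" "double" "character"

# Arithmetic on the character NA is an error rather than NA.
char_fails <- inherits(
  tryCatch(1^NA_character_, error = function(e) e),
  "error"
)
```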
Also R's cleverness with NA's is not so consistent. For example:
median(c(1,1,1,NA))
Should return 1, since no matter what value is behind NA the median is still 1. But it returns NA.
"its value is substituted, unless env is .GlobalEnv in which case the symbol is left unchanged."
dim and dims :-)
R could do more here. I really like R.
I tried dims but: Error in dims(iris) : could not find function "dims"
I do find the occasional oddity. I've noticed more very useful messages/warnings (particularly in common tidyverse functions) recently, so I think they help.
To be fair, these quirks are generally very uncommon in day to day use.
https://rdatatable.gitlab.io/data.table/reference/substitute...
2^0 = 1, 2^1 = 2, 2^-1 = 1/2
When 1 raised to any power equals 1, does the power matter at all? Even if it's unknown, the answer is 1.
Unless the thing that makes the language difficult is your expectations. In that case, offering you an alternative mental model that helps you make better decisions when using the language does get you closer to solving your problem.
Yes, sure, as long as you recognize that as a very subjective determination.
From the statistician's non-programmer POV the syntax of R or some other language is similarly opaque. Learning one vs. another will present similar investments in time. From their perspective, R does not make things more difficult, and the fact that it's more of the lingua franca within the field has its own benefits.
The people I see complain about R are usually people that learned a different general purpose language first and find that when work requires data analysis they much prefer the GPL for working through the non-analytical portions of their work. (Especially with Python, where pandas and numpy have made less specialized tasks much easier.)
As beautiful as it is to use interactively, it really takes a lot of practice to write reliable code that doesn't abort with some error now and then.
Command line arguments are available as:
args <- commandArgs(trailingOnly=TRUE)
And there are three getopt()-like packages: getopt, optparse, and argparse.
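If you don't want a dependency, a rough base R sketch for `--key=value` style arguments could look like this. `parse_kv` is a hypothetical helper, not part of any of the packages above, and it ignores anything that isn't in that form.

```r
# Hypothetical helper: turn c("--input=a.csv", "--n=10") into a named list.
parse_kv <- function(args) {
  kv   <- grep("^--[^=]+=", args, value = TRUE)  # keep only --key=value items
  keys <- sub("^--([^=]+)=.*$", "\\1", kv)       # text between "--" and "="
  vals <- sub("^--[^=]+=", "", kv)               # text after the first "="
  as.list(stats::setNames(vals, keys))
}

# In a real script you would pass the actual command line:
#   opts <- parse_kv(commandArgs(trailingOnly = TRUE))
# Here a fixed vector stands in for it.
opts <- parse_kv(c("--input=data.csv", "--n=10", "extra"))
n <- as.integer(opts$n)
```

All values come back as strings, so numeric options need an explicit conversion as in the last line.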
They are just incredibly intuitive and easy to use. ggplot2 has fundamentally influenced how I think about plotting.
With my limited experience, I have never seen anything like it.
EDIT: reference https://news.ycombinator.com/item?id=15869039
The amount of consideration and careful design behind tidyverse APIs (tidyr, ggplot, dplyr) really astounds me. I've never felt the need to actually memorize any of them but they come to me so naturally whenever I type "library(tidyverse)". Very few DSLs, libraries or APIs have ever made me feel this way, and certainly NOT Python and the mess that pandas/matplotlib/scikit is. Even more impressive that he managed to build such a consistent layer atop the hack that is base R.
Note that I've nothing against base R. It really appeals to the hacker in me and it certainly has a ton of cool features (a condition system, multiple function evaluation forms - in what other language are `if`, `while`, `repeat` and even parentheses `(` and the BLOCK STATEMENT `{` all implemented as functions?) but damn if it isn't a mess of corner cases and gotchas.
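For anyone who hasn't seen it, the "everything is a function" part can be checked directly by calling the syntax elements with backticks:

```r
# `if` called like an ordinary function: condition, then-branch, else-branch.
r_if <- `if`(TRUE, "yes", "no")      # "yes"

# Parentheses are a function that returns its argument.
r_paren <- `(`(1 + 2)                # 3

# The block statement evaluates its arguments and returns the last one.
r_block <- `{`(1, 2, 3)              # 3

# Operators are functions too, so they can be passed around.
r_plus <- sapply(1:3, `+`, 10)       # 11 12 13

all_functions <- all(
  is.function(`if`), is.function(`while`), is.function(`repeat`),
  is.function(`(`), is.function(`{`)
)
```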
1. It's not zero-indexed (even though most numerical languages aren't)
2. Loops are slow (though if you're looping in R you're probably doing it wrong)
3. It's inconsistent
4. The syntax is weird.
But people don't talk about the somewhat beautiful functional ability of the language to wrangle data almost magically. Its basis in lisp allows for the tidyverse and data.table to exist[1], and ggplot is a formidable analysis/plotting platform that Python doesn't come close to.
Once I grokked that, R became my default language for anything analytics.
Anyhow, explaining the difference at that part of the tutorial is not easy, so I chose to omit it for now. But might introduce it later, along with "<<-" and "->>", probably after describing closures.
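As a rough idea of what "<<-" looks like next to closures, here is a minimal sketch (`make_counter` is an invented name for illustration):

```r
make_counter <- function() {
  count <- 0
  function() {
    # "<<-" assigns to `count` in the enclosing function's environment,
    # where a plain "<-" would create a new local variable instead.
    count <<- count + 1
    count
  }
}

counter <- make_counter()
first  <- counter()   # 1
second <- counter()   # 2

# A second counter has its own independent state.
other <- make_counter()
other_first <- other()  # 1
```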
[1]: https://github.com/cran/diptest/blob/master/R/dipTest.R#L37
As someone who mostly writes not-R, my own R irritation comes from a handful of things:
- The dot character "." has no semantic meaning in identifiers. It's just a valid character for names. Looking at function names like "is.numeric" really messes with my reading comprehension.
- Ambiguously, "." also separates identifiers of objects in one of R's type systems from method calls. In some cases, `foo(bar)` and `bar.foo()` are equivalent. But only in some cases.
- Even better, a popular R library defines a function `.()` (i.e., its name is just a single period character), whose job is to expose a surprising quote/unquote expression evaluation semantics.
- This is not to mention the special meaning of "." in formula literals, which are fairly ubiquitous in R.
- Different authors use different naming conventions. Base prefers "as.numeric," Tidyverse might have "to_factor," another library might prefer camel case.
- Finally, R has a surprisingly extensive syntax, exercised by different libraries to different extents, and a correspondingly rich semantics, with "types," "modes," multiple class systems, "expression" objects, immediate and lazy evaluation, expression quoting and unquoting, metaprogramming, and homoiconicity. It is a zoo of a language.
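A small sketch of two of the dot meanings above, using base R only. In S3 a method is, to my understanding, just a function named `generic.class`, which is where the ambiguity with names like `is.numeric` comes from. The generic `describe` and the class `dose` are made up for the example.

```r
# An S3 generic plus two methods; the dot separates generic from class.
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "something else"
describe.dose <- function(x, ...) paste("dose of", unclass(x), "mg")

d <- structure(10, class = "dose")
via_generic <- describe(d)       # dispatches to describe.dose
direct      <- describe.dose(d)  # the method is also an ordinary function
fallback    <- describe("text")  # no method for character, so default is used

# is.numeric has a dot in its name but is called like any plain function.
plain <- is.numeric(1)

# In a formula, "." means "all other columns of the data".
fit <- lm(mpg ~ ., data = mtcars)
n_coef <- length(coef(fit))  # intercept plus one per remaining column
```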
If you're like me R is a godsend. You'll also love the tonnes of free packages. You can't go wrong with R if you appreciate simplicity and intuitiveness.
https://github.com/karoliskoncevicius/tutorial_r_introductio...
Gotta say this is very elegant.
Takes a weekend to work through the book and you get a statistics refresher as a bonus.