Kudos to the R community and supporters for providing a great and useful platform!
RStudio is the perfect IDE: REPL/command line + scripts + plots. I could not be happier using it, and I wish I could get VS Code to be half as good. Julia for VS Code is pretty good, but the Python science tooling goes 100% towards notebook environments, which I'm not a huge fan of, so the Python science experience in VS Code is subpar.
The time needed to rewrite a script in another language, often using different packages, seems more than made up for by the ease of use of RStudio.
I'm puzzled by this and wonder if you can provide some examples. The scientists I know tend to have incredibly disorganized R code, with a bunch of hard-coded paths and a single global environment in their home directory that all their R packages get installed to. Even stuff that seems critically important, like reproducible science, can be much harder than you'd expect in a lot of fields, because questions like "what version of the libraries did you use" have to be answered (if they can be answered at all) by looking at the references in the paper.
Whereas in Python, I don't know how things could be any simpler. Creating an individualized environment for your project is one command. Installing packages that only live inside that environment is one `pip install` away. Most scientific work is not "distributed" in the sense of having users, but if you do ship a product to users, Python gives you the option of either relying on distribution-provided packages (my preferred approach most of the time) or shipping a single binary created with something like PyInstaller.
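For anyone who hasn't seen it, here is a minimal sketch of that per-project environment setup using only the standard library's `venv` module. The `.venv` directory name and the helper names `create_project_env` and `env_python` are my own choices for illustration, not anything Python prescribes.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_project_env(project_dir: Path, with_pip: bool = True) -> Path:
    """Create an isolated environment in <project_dir>/.venv and return its path."""
    env_dir = project_dir / ".venv"  # hypothetical name, though a common convention
    venv.create(env_dir, with_pip=with_pip)
    return env_dir


def env_python(env_dir: Path) -> Path:
    """Return the path of the interpreter that lives inside the environment."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


if __name__ == "__main__":
    # Demo in a throwaway directory; with_pip=False keeps it fast and offline.
    with tempfile.TemporaryDirectory() as tmp:
        env_dir = create_project_env(Path(tmp), with_pip=False)
        print("environment created at", env_dir)
        print("interpreter:", env_python(env_dir))
```

From a shell the equivalent is `python -m venv .venv`. Packages installed by running that environment's interpreter with `-m pip install` stay inside the environment.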
Packrat[1] — an RStudio package — can be used to easily avoid the library versioning issues you describe. The problem isn't that the tooling isn't there or that it isn't easy to use. It's that some folks simply don't use it and are perhaps oblivious as to /why/ they should even use it, anyway.
[0] https://shotwell.ca/posts/2019-12-30-why-i-use-r/
[1] https://rstudio.github.io/packrat/
While there could be more effort in getting things like library versions out there, a lot of journals don't care, so there's no pressure on scientists to provide them.
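Recording the versions does not take much, for what it's worth. Below is a rough sketch in Python that dumps the interpreter version and every installed package version as JSON, using only the standard library. The function names `installed_versions` and `environment_report` are made up for this example, and the report layout is just one possible choice.

```python
import json
import platform
from importlib import metadata
from typing import Dict


def installed_versions() -> Dict[str, str]:
    """Map each installed distribution name to its version string."""
    versions = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions whose metadata has no name
            versions[name] = dist.version
    # Sort case-insensitively so the report is stable between runs.
    return dict(sorted(versions.items(), key=lambda item: item[0].lower()))


def environment_report() -> str:
    """Return a JSON document describing the current Python environment."""
    report = {
        "python": platform.python_version(),
        "packages": installed_versions(),
    }
    return json.dumps(report, indent=2)


if __name__ == "__main__":
    print(environment_report())
```

Saving that output next to the analysis scripts would at least answer the "what version did you use" question later.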
One factor that isn't helping generations younger than mine (mid-50s) is the continual evolution of tools that remove the user from all the underlying parts. I recently worked with someone who told me they "only know Databricks on Azure" and "don't know python." Their self-assessment was accurate, and the utility of that individual was essentially zero.
The problem with Python is that people like myself (non-engineers, and mostly end users of software) spend an inordinate amount of time dealing with mismatched library dependencies, deprecated features, rolling back Python versions to get a working kernel, and so on.
The fact that the business model of at least two companies (Enthought and Anaconda) is predicated on the difficulty of getting a functioning Python environment to work in this day and age speaks volumes about the problem.
If we can't get past "which pip?," how can we expect the other stuff to "just work?"
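One common way to sidestep the "which pip?" question is to never call a bare `pip` and instead run pip as a module of the interpreter you are actually using. A small sketch, with `pip_command` and `describe_interpreter` as hypothetical helper names:

```python
import shutil
import sys
from typing import Any, Dict, List


def pip_command() -> List[str]:
    """Build a pip invocation tied to the interpreter running this script."""
    return [sys.executable, "-m", "pip"]


def describe_interpreter() -> Dict[str, Any]:
    """Collect the facts that usually explain a pip/python mismatch."""
    return {
        "python": sys.executable,
        # True when running inside a venv-style virtual environment.
        "in_virtual_env": sys.prefix != sys.base_prefix,
        # The first `pip` found on PATH, which may belong to another Python.
        "pip_on_path": shutil.which("pip"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
    print("use:", " ".join(pip_command() + ["install", "<package>"]))
```

If `pip_on_path` points somewhere unrelated to `python`, that is usually the source of the confusion.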
Here's the thing: programming is a skill. If people treat it as the "not important thing" and care only about the result (I've seen this often in some of my previous positions), you're going to get disasters, yeah.
As for package management in R, you can use either renv or conda. I've been coding R for a decade and have always pinned down packages, and you could do so well before tooling made it simple as pie.
I don’t have a solution for the points above, and I understand that, once a promising approach has been found, the code starts to matter much more, because Ops will require it to be automated and executed in a reliable way. For now, what I do is to do the research in a very loose way, not caring about good SW practices. When I find something good, I start refactoring the code to meet the Ops expectations. But I’m a CS major with decades of experience in coding and ML - it’s not reasonable to expect the entire DS community to develop the same skills, it takes too long.
Any ideas out there?
(Disclosure, I am a Python programmer who has suffered through the trash packaging situation since forever)
Since when has R been in a position to cast shade on the reproducible environment of another language? Anytime I dip my toe into the R ecosystem, it feels like using renv or an equivalent to vendor dependencies is anathema to its development practices. It's an enormous pain to try to get old R code running again.
Kudos to RStudio (Posit) for delivering a great product over the last decade+ and growing a kind, helpful community!
Some RStudio products change their name - the “enterprise” offering.
Another RStudio product doesn’t - the open-source IDE.
VS Code feels more refreshing than RStudio. I love the extensions in VS Code, which give it more flexibility than RStudio, as well as the ability to view hex codes as colors in the editor itself. Plus, the ability to sync settings using GitHub is so convenient when using multiple computers. On the flip side, RStudio is more convenient for beginners, and being very R-focused helps you focus on the "statistics and data munging".
As for RStudio as a company, they have supported Python in the past, but with Quarto they have extended beyond that. I feel Quarto is still a work in progress and has a more ambitious outlook than RMarkdown. RStudio Cloud is a good option when one has to use a specific version of R, and it alleviates the "reproducibility" issue to some extent, especially when someone does not want to deal with Docker or a similar platform. I think RStudio Cloud is one of my favorite offerings from the company.
[1]
I really couldn't care less about statistics, which, as with many other topics and courses, makes it incredibly hard for me to concentrate on it and actually learn anything. I could force the knowledge into my brain so that I could recite it and use it in practice over and over again, but the moment the exams came around it was all gone from my head. That certainly made university very problematic.
[1] Edit to add: I forgot to say that using RStudio was the only remotely pleasant part of that Stats course and in later courses where some stats work was needed.
Error reading R script (), system error 2 (No such file or directory); Unable to find libR.dylib in expected locations within R Home directory /Library/Frameworks/R.framework/Resources