0 points · eigenvalue · 3y ago · 0 comments

This seems eminently solvable, though. Why can’t every package submission trigger a minimal sandboxed Docker image that installs the package, calls its various functions and methods, and logs all network and disk activity? If anything looks suspicious, the submission would be denied and the submitter would have to appeal, explaining why it is valid. The same applies to NPM and Cargo. I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start. This seems like the kind of thing that wouldn’t even cost all that much, and big corporate users of Python would stand to benefit.
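
As a rough illustration of the "log activity, deny if suspicious" step, here is a minimal sketch in Python. It assumes the sandbox has already produced a list of observed events. The `Event` type, the allowlists, and the paths are all hypothetical and only show the shape such a policy check might take.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str    # "net" or "file" (hypothetical event kinds)
    target: str  # hostname for "net", filesystem path for "file"


# Hypothetical policy: what an install is allowed to touch.
ALLOWED_HOSTS = {"pypi.org", "files.pythonhosted.org"}
ALLOWED_PATH_PREFIXES = ("/sandbox/site-packages/", "/tmp/pip-")


def is_suspicious(event: Event) -> bool:
    """Return True when an event falls outside the allowlists."""
    if event.kind == "net":
        return event.target not in ALLOWED_HOSTS
    if event.kind == "file":
        # str.startswith accepts a tuple of prefixes.
        return not event.target.startswith(ALLOWED_PATH_PREFIXES)
    return True  # unknown event kinds are treated as suspicious


def review(events: list[Event]) -> list[Event]:
    """Return the events that would cause the submission to be denied."""
    return [e for e in events if is_suspicious(e)]


observed = [
    Event("net", "pypi.org"),
    Event("file", "/home/user/.ssh/id_rsa"),
    Event("net", "example.invalid"),
]
flagged = review(observed)
```

Collecting the events reliably, and deciding what belongs on the allowlists for each package, is the hard part and is not covered by this sketch.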

woodruffw 3y ago

For one, because Docker is not a sandbox, and containers are not a strong security boundary[1]. What you really need here is a strongly isolated VM, at which point you're playing cat-and-mouse games with your target: their new incentive is to detect your (extremely detectable) VM, and your job is to make the VM look as "normal" as possible without actually making it behave normally (because this would mean getting exploited). That kind of work has a long and frustrating tail, and it's not particularly fruitful (relative to the other things packaging ecosystems can do to improve package security).

> I know there is a researcher out there who has retrieved and installed every single pip package to do an analysis, which is a good start.

You're probably talking about Moyix, who did indeed download every package on PyPI[2], and unintentionally executed a bunch of arbitrary code on his local machine in the process.

[1]: https://cloud.google.com/blog/products/gcp/exploring-contain...

[2]: https://moyix.blogspot.com/2022/09/someones-been-messing-wit...

eigenvalue (OP) 3y ago

You make some good points. But it still seems to me that if you used the best available sandboxed VMs for each platform (Windows Sandbox for Windows; Firejail for Linux; VirtualBox with no folder permissions for OSX. I don't know if these are the best or even good; those were just the ones I found from a bit of searching), you could install and run these packages in an automated way (especially with some GPT3-type help to figure out how to explore and call the important functions) and look for the telltale signs of malice in their network and file access behavior. Even if we grant that this is a long-tailed "cat and mouse" game, so what? We won't get 100% security, especially against super sophisticated threat actors, but if you could catch 98% or whatever of the typical clumsy supply chain attacks, or super egregious stuff like that NPM package that deleted your whole disk if you were Russian, that would be a vast improvement over the current state of affairs. Why isn't that worth doing? Why isn't Google or Microsoft at least trying this?

woodruffw 3y ago

It isn't worth doing because the equation you've supplied doesn't include the effect of catastrophic failure: dynamic analysis lowers the barrier for exploit to a single hypervisor or VM exploit. Catching 98% of spam packages that affect nobody is worth very little when the 2% you don't catch are the ones that do the real damage.
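
To make the weighting argument concrete, here is a small arithmetic sketch. The numbers are invented for illustration (they are not measurements and do not come from this thread): 98 low-impact packages costing 1 unit of damage each are caught, and 2 high-impact packages costing 1000 units each are missed. It also leaves out the added risk of a hypervisor or VM escape in the analysis infrastructure itself.

```python
def damage_prevented_fraction(caught, missed):
    """Fraction of total damage prevented.

    caught, missed: lists of (package_count, damage_per_package) pairs.
    """
    prevented = sum(count * damage for count, damage in caught)
    remaining = sum(count * damage for count, damage in missed)
    return prevented / (prevented + remaining)


# Hypothetical numbers: catch 98% of packages by count, miss the 2% that matter.
fraction = damage_prevented_fraction(caught=[(98, 1)], missed=[(2, 1000)])
print(round(fraction, 3))  # 0.047
```

Under these assumed numbers, a 98% catch rate by package count prevents under 5% of the damage.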

> Why isn't Google or Microsoft at least trying this?

They are: Google and Microsoft both spend (tens of) millions of dollars on hypervisor and VM isolation research each year. It's a huge field.

com2kid 3y ago

> What you really need here is a strongly isolated VM,

Simplify, don't use a VM.

Create an isolated network, hook your sacrificial machine up to it, have it install the package. Remotely kill it (network controlled power switch if needed). The machine's hard drive should be hooked up through a network controlled switch of some type. After the sacrificial machine is powered down, reroute the HD so it is connected to a machine that does forensics.

Now you have a clear "before" and "after" situation set up for analysis.

The sacrificial machine's network activity can be monitored by way of whatever switch/router it uses to connect to the Internet.
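
The "before" and "after" comparison could look something like the following Python sketch. It assumes the forensics machine can mount the drive and walk it as a directory tree. The function names `snapshot` and `diff_snapshots` are made up for this example, and a real forensic comparison would also need to look at metadata, boot sectors, and so on.

```python
import hashlib
from pathlib import Path


def snapshot(root: str) -> dict[str, str]:
    """Map each regular file under root (relative path) to its SHA-256 digest."""
    base = Path(root)
    digests = {}
    for path in sorted(base.rglob("*")):
        if path.is_file() and not path.is_symlink():
            rel = path.relative_to(base).as_posix()
            digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(before: dict[str, str], after: dict[str, str]) -> dict[str, list[str]]:
    """Report files added, removed, or modified between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "added": sorted(after.keys() - before.keys()),
        "removed": sorted(before.keys() - after.keys()),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```

Usage would be `diff_snapshots(snapshot(before_mount), snapshot(after_mount))`, where both mount points are placeholders for wherever the two disk states are accessible.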

woodruffw 3y ago

This is a VM, but flakier and with more steps! It’s also eminently not sustainable on PyPI’s scale, which is the context we’re talking about. I’m

nodogoto 3y ago

Well, some calls absolutely should invoke network or disk activity, so you would additionally need to define what constitutes good and bad activity for each. Moreover, unless the package is a collection of pure functions, it would be easy to hide the malware trigger in state that won't be initialized properly by the automated method calls but would be in the standard usage of the package.
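
A toy Python sketch of the state problem, under the assumption that the automated explorer calls each method on a fresh instance. The `Client` class and both helper functions are hypothetical, and the "payload" is only a harmless flag standing in for hidden behavior.

```python
class Client:
    """Toy package API whose hidden branch depends on earlier calls."""

    def __init__(self) -> None:
        self._configured = False
        self.payload_ran = False

    def configure(self, url: str) -> None:
        self._configured = True

    def fetch(self) -> str:
        if self._configured:
            # Stand-in for hidden behavior; only reachable after configure().
            self.payload_ran = True
        return "data"


def probe_methods_in_isolation() -> bool:
    """Call each method on a fresh instance; report whether the flag was set."""
    first = Client()
    first.configure("https://example.invalid")
    second = Client()
    second.fetch()
    return first.payload_ran or second.payload_ran


def typical_usage() -> bool:
    """Call the methods in the order a real user would."""
    client = Client()
    client.configure("https://example.invalid")
    client.fetch()
    return client.payload_ran
```

Here `probe_methods_in_isolation()` returns `False` while `typical_usage()` returns `True`, so the analysis only sees the hidden branch if it reproduces realistic call sequences.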
