Typosquatting programming language package managers

(incolumitas.com)

486 points · xrstf · 10y ago · 143 comments

airless_bar10y ago· 11 in thread

This only seems to be an issue for languages where packages reside in a global namespace, like Python, Rust etc.

I think most languages these days are a bit smarter and avoid this beginner mistake (for various reasons).

tatterdemalion10y ago

This is obviously not true. If `serde` resided at `erickt/serde` (as the counterproposal for Rust would've had it), I could create `erict/serde` or `erick-t/serde` or any other variations of erickt's handle.

The only way this is 'solved' is if some third party authority hands out top level names and refuses to register names that are similar to other names for some definition of similar. The number of levels between top level and package name is irrelevant.

zardeh10y ago

Well, you could also solve it by saying that the post slash names are unique. ie. There can't exist zardeh/serde if erickt/serde already exists. Then the author-name works as a logical checksum, and you aren't any worse off than you were with a global namespace.
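
A minimal sketch of that rule, assuming a hypothetical in-memory registry. The `Registry` class and its method names are invented for illustration and do not correspond to any real package index.

```python
class NameTakenError(Exception):
    """Raised when a package name is already claimed by another owner."""


class Registry:
    """Hypothetical registry: names look like owner/package, but the
    package part must be unique across all owners."""

    def __init__(self):
        self._owner_of = {}  # package name -> owner who registered it

    def register(self, owner, package):
        current = self._owner_of.get(package)
        if current is not None and current != owner:
            raise NameTakenError(f"{package!r} already belongs to {current!r}")
        self._owner_of[package] = owner
        return f"{owner}/{package}"

    def resolve(self, owner, package):
        # The owner part acts as a check: the lookup only succeeds when
        # the package exists AND was registered by exactly that owner.
        return self._owner_of.get(package) == owner
```

With this, a squatter can still register `evil/sedre`, but a user who mistypes `erickt/sedre` gets a failed lookup instead of the squatter's package.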

kzrdude10y ago

The name is just one part of the problem.

There's another solution (like debian does), auditing what the package itself does, so that you don't allow malicious code into the repository.

airless_bar10y ago

You are obviously wrong.

While attacking a single package would be possible, covering any interesting amount of "typo"-space would require registering huge amounts of namespaces.

If package manager developers are smart, the allocation of namespaces is also handled externally and associated with some cost (e.g. domain names).

Therefore these kinds of attacks become impractical.

abiox10y ago

this is yet another reason why i really wished rust had gone for namespaced packaging on crates.io. i like so many of the decisions the rust team made, but not this one.

kibwen10y ago

This is incorrect. Package repositories with namespacing are just as vulnerable to these attacks.

airless_bar10y ago

Wrong.

vemv10y ago

Ruby and JS package managers are un-namespaced as well.

SixSigma10y ago

Julia too, though there is a central list. I haven't done any tests for this kind of thing.

anentropic10y ago

couldn't you register a typo namespace?

Klathmon10y ago

Yes, but then you'd need to also register a ton of packages under that namespace.

That's something that can be flagged for manual review before it gets too far.

wbond10y ago· 9 in thread

We've gotten flak from package developers submitting new packages to Package Control [0] because all additions to the default channel are hand reviewed. Part of this process is to prevent accidentally close package names, to try and encourage collaboration and to encourage developers to actually explain what their package does and how to use it.

My hope is to be automating a large amount of the review in the next few months, however I think this is a good argument for never having it be fully automatic. Having a human sanity check submissions isn't a terrible idea if we can keep the workload down.

Certainly this doesn't prevent a malicious author from posting a legitimate package and then changing the contents to be malicious, but that can be somewhat solved by turning off automatic updates.

[0] https://packagecontrol.io

SCdF10y ago

Hey Will,

Thanks for keeping Package Control high quality, I know it's highly appreciated :-)

notduncansmith10y ago

Another grateful Package Control user here.

bcg110y ago

Sonatype has a manual review process as well before allowing new projects to deploy to Maven Central. [1][2]

One step to mitigate things like this as well would be to have some sort of "crowd-sourcing" command in the package manager program... like "npm flag coffe-script" or something like that to alert repository maintainers of possible issues.

[1]: http://central.sonatype.org/pages/ossrh-guide.html
[2]: http://central.sonatype.org/articles/2014/Feb/27/why-the-wai...

yxhuvud10y ago

Typosquatting can be flagged automatically (for later review by a human) using Levenshtein distance.
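
A rough sketch of what that could look like, assuming the index can list its existing names. The `flag_for_review` helper and the threshold of 2 are illustrative choices, not something any particular package index is known to use.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def flag_for_review(new_name, existing_names, max_distance=2):
    """Return existing names close enough to new_name to warrant a human look."""
    return sorted(
        name for name in existing_names
        if name != new_name and levenshtein(new_name, name) <= max_distance
    )
```

Note that plain Levenshtein counts a swap of two neighbouring letters as two edits, so a threshold of 1 would miss transposition typos, while a threshold of 2 will also flag plenty of legitimate short names.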

patates10y ago

For Postgres users, it is built-in: https://www.postgresql.org/docs/9.5/static/fuzzystrmatch.htm...

eudox10y ago

Keep fighting the good fight.

mercer10y ago

> Certainly this doesn't prevent a malicious author from posting a legitimate package and then changing the contents to be malicious, but that can be somewhat solved by turning off automatic updates.

Perhaps you could make this safer by adding an automatic check for how much the package has changed since the last version? And at least warn the user when they want to update?

cortesoft10y ago

I don't know how much checking for how much the package has changed would help. You wouldn't need to change much to exploit - one line that downloads and executes code from somewhere would do it.

hoodoof10y ago

Perhaps list all new packages to the community and require/request validation or flagging by the community, along with listing similar package names.

eudox10y ago· 9 in thread

I'm a fan of the approach of personally submitting projects to the repository maintainer (e.g. through GitHub issues), and having the maintainer personally approve them.

It does raise the barrier to entry, but it would prevent typosquatting and regular namesquatting.

EDIT: Does any major package manager provide a "did you mean" functionality, offering a list of actual package names similar to what you typed?

akavel10y ago

then the maintianer must have perfcet sigth and never ovrelook even one tpyo :]

and then also have perfect memory of all packages and notice that similarly named package is too (for some value of "too") similarly named to some already existing one... even if e.g. both are a correct dictionary word.

burkaman10y ago

APT does and others probably do too, but it obviously only gives suggestions when the package you entered doesn't exist.

eudox10y ago

Right, it's only useful if you've prevented typosquatting.

Which Debian has, because submitting a new package is a much more involved process than sudo apt-get publish.

philjackson10y ago

That's a massive burden on the poor person who has to ok the package - especially at NPM's scale, for example.

seldo10y ago

We believe npm's scale is a direct result of having the lowest ceremony to publish a package. Turning the dial in the direction we did has pros and cons.

yoo1I10y ago

Well, ideally you'd set up some sort of system where multiple people work on managing a repository, similar to maybe how linux distributions package applications and libraries.

eudox10y ago

NPM's scale is the exception, rather than the rule.

tonyedgecombe10y ago

Or someone needs to approve suspiciously named packages.

eudox10y ago

How do you determine what is a suspicious package without reviewing every new package by hand?

Mizza10y ago· 9 in thread

This seems like pretty unethical research to me.

Also, doesn't point out that the bigger threat is that this is wormable.

tantalor10y ago

The doc (http://incolumitas.com/data/thesis.pdf) does have a short section on ethics, but IMHO it completely misses the point of the ethical concerns in running unauthorized, non-sandboxed code on devices you don't own. Instead it justifies the research by saying the threat cannot be shown unless the vulnerability is exploited, which is true, but that fact does not justify the research.

The acknowledgements mention 2 of the university advisers and a PyPi admin consented to the "notification program".

Still, people with good intentions have been prosecuted and convicted for less. I would be very concerned for this student.

throwawaysocks10y ago

There was no actual intrusion, so this feels like fair game to me. Especially since mitigating a very possible attack vector is a direct result of running the experiment. Still, hopefully the researchers got an IRB to sign off on the experiment setup...

kbenson10y ago

Well, there was a small intrusion. It reports back a filtered command history (including just package install commands), the hardware info, and the list of installed modules (along with regular info, like system type, if there are admin privileges, etc). That's not nothing, but it is fairly benign. I was worried about the command history until I saw it was filtered, and that mostly allayed my misgivings.

placeybordeaux10y ago

The research got computers to execute code on them without authorization and extracted information from them.

That is a crime under the CFAA in the USA. Not sure what it is in Germany/EU.

SolarNet10y ago

Yea, this would never get past my university's ethics department. I'm actually surprised he was allowed to do this. Maybe it's partially due to the fact our ethics department is also worried about liability.

markbnj10y ago

Perhaps this could have been made cleaner by relying on the package manager for download counts only, and then demonstrating the code execution scenario on research machines only. If you wanted to avoid actually downloading anything to the user's machine (after all, they expect a 404 in this case, not a package even be it a harmless one) you'd perhaps need the cooperation of the repo admins to a greater extent.

Anyway, this is all part of why I always try to build inside a container, or at least in a virtualenv where I don't need to sudo the install.

placeybordeaux10y ago

Yeah I wouldn't want to find myself in court hearing

>17000 computers were forced to execute [unauthorized] arbitrary code

Certainly a crime in the US, not sure about Germany.

Nice execution though!

twinkletwinkle10y ago

I'm not so sure - were they forced? Could you take the maintainer of `requests` to court too? If someone types `pip install reqeusts` and gets something they maybe didn't expect, did you really force them?

2110y ago

I wonder about the legality. It looks to me like he isn't technically responsible, since he didn't access any computer without authorization himself.

If I intentionally leave an infected USB drive on the ground, and someone picks it up and sticks it into their computer, am I liable?

Seems like it could go either way.

pmontra10y ago· 6 in thread

Probably the maintainers of the package managers know which typos their users do, because of the 404s in the logs or equivalent errors. A preventive action could be starting to blacklist any name resolving to 404. If somebody eventually tries to upload a package in the blacklist, a maintainer should check the code and whitelist the name. Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.

utexaspunk10y ago

Might it work to mandate that the name of an uploaded package have a minimum levenshtein distance (or similar calculation) from the names of all the existing packages? Then you wouldn't have to worry about maintaining a blacklist.

wycats10y ago

That would mean that, for example on crates.io, you couldn't create a `libm`, because `libc` is already very popular. I don't think that works.

pmontra10y ago

It seems a good idea.

I used the Ruby code at the beginning of http://stackoverflow.com/questions/16323571/measure-the-dist... to calculate the distance between the package names at page 60 of the thesis and their typos. The maximum is 2.

I checked some similar package names from a Gemfile.lock of a project of mine. Unfortunately the two gems hike and hirb are also at distance 2. Probably many short names are close with this metric.

A combination of the two approaches could be ok: knowing that a name was blacklisted should be an indicator that's not a good name, despite the distance with any other name, plus an approval of the maintainers for distance 2.

But a blacklist could generate another type of squatting, with people trying to pre-blacklist perfectly legit names. Only one thing is sure: there is more work to do for the maintainers and this extra friction is not good.

Edit: the distance suffers from the same problem.

hughes10y ago

Surely some troll would deploy a fleet of machines that flood package indexes with requests to available names, effectively blacklisting entire dictionaries and eventually all short names.

pmontra10y ago

Yeah, this is what I came to think too. I mentioned it in another comment. Somebody suggested to use a distance indicator, but trolls could attack that too.

epalmer10y ago

> Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.

I see what you did.

bennofs10y ago· 5 in thread

Did anyone else find it surprising that the total number of requests (45334) is so much higher than the number of unique requests (17289)? It is more than twice the number of unique requests!

Possible explanations:

* Perhaps many of those are automated build systems, which would also explain the high number of systems with admin access (for example, if you use travis without docker, every build runs in a clean vm with admin access).

* People download one package and install it multiple times? Seems unlikely

Any other ideas?

Guillaume8610y ago

I think he forgot to define a baseline (could be wrong, I didn't read the paper). He should have generated a few packages with a completely innocent name (and maybe some packages with just a GUID as a name) to see how many downloads / installs they get too.

lighttower10y ago

The person who ran the line,

sudo pip install lumpy (instead of numpy)

Ran it again because it 'didn't work'

cderwin10y ago

In the case of python (not sure about the other package managers) if a valid package requires the hacked package, each project that requires that valid package will download and install the hacked package separately if you're using virtual environments. Also if you're using docker you reinstall everything when your requirements file changes.

caseysoftware10y ago

Numerous developers and/or building multiple servers behind a single IP address aka NAT. It's pretty common.

joepvd10y ago

Automated testing, continuous integration/delivery, et cetera download and install packages pretty often. If the typo is made in the requirements.txt or package.json or what have you, the error can be repeated very often up to and including production.

zeveb10y ago· 4 in thread

Reminds me of the quote, 'there are only two hard things in computer science: naming things, cache invalidation and off-by-one errors.'

I think that this clearly falls under the heading 'naming issue.' People know what they want, but do not enter it properly.

I can't think of a 100% solution off-hand, which isn't surprising, because it's a hard problem.

pmontra's suggestion to use typo blacklisting ain't a bad idea. Maybe some sort of reputation-per-name could help?

a_t4810y ago

Sure it's not an off-by-one[-key] error? :)

blowski10y ago

Banks have a similar problem when people write cheques or set up standing orders. You have to put a name and the account number.

I wonder if you could do something similar here - enter the name of the package and a code of some sort. I haven't thought this through in a lot of detail.

PeterisP10y ago

Banks generally solve the issue with simple classic checksumming methods that guarantee that any number with a typo or swapped neighbouring characters will always result in an invalid number.

That doesn't work with arbitrary names because they are, well, arbitrary.
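
As one concrete example of such a scheme, here is a sketch of the Luhn check used on payment card numbers. It is chosen purely for illustration; the comment above does not say which scheme banks use. Luhn catches every single-digit typo and most swaps of neighbouring digits, though not all of them (it cannot tell 09 from 90).

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for position, char in enumerate(reversed(number)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

The check works because the number is generated to satisfy it. A package name picked freely by its author carries no such redundancy, which is the point made above.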

Klathmon10y ago

Or just refer to packages by 2 names.

    Maintainer/PackageName

It solves so many problems, this included.

szx10y ago· 4 in thread

When you think about it, how different is the destructive potential of an npm/pip install from curl | bash that (some) people tend to froth at the mouth about?

It's pretty mind-blowing how big of a blind spot package installers are. I guess running everything inside e.g. a Docker container/VM would be a partial interim solution for the paranoid?

lmm10y ago

> When you think about it, how different is the destructive potential of an npm/pip install from curl | bash that (some) people tend to froth at the mouth about?

It's a bit better - there is only one possible source of compromise rather than everyone on the network path. Given that npm/pip likely keep archives of all packages uploaded, it would be much harder (perhaps impossible) to attack someone secretly this way, at least in the long term.

Good package managers require signing of uploads (e.g. maven central requires every package to have a GPG signature; Debian goes further, and requires your key to be signed by an existing member of the organization). If the client checks the signatures you end up with a system that's perhaps actually secure.

szx10y ago

Signing is definitely part of the answer but there's still the question of trust.

A signed package doesn't really tell you that much. In the best case scenario it tells you the package you're installing in fact came from developer X and contains code Y (which you kinda already know since you have the source code). This works as long as you know and trust developer X, or did your due diligence reading through the code (which you can already do today).

I can't think of an end solution that wouldn't have to rely on network effects and social proof, which strikes me as rather fragile. Maybe formal verification and AI can help, but that's a long way off (?)

raesene1010y ago

For me they're very similar. I actually did a talk last year for OWASP AppsecEU where I started with the curl|bash bit and pointed out where rubygems/npm etc aren't really a lot better in some ways

https://www.youtube.com/watch?v=Wn190b4EJWk

szx10y ago

Nice talk! Sounds like there's no silver bullet...

I'm curious to hear your opinion about a combination of digital signing with e.g. keybase/blockchain + reputation system, a sandboxed development environment (mitigates the "short con" risk) and a sandboxed production environment, with the minimum set of permissions required to operate (as well as auditing of course).

Call me pessimistic but I don't see developers taking on the extra friction given the status quo. Though a major data breach or two might change things, as I'm sure we'll find out sooner or later.

ysavir10y ago· 4 in thread

Instead of blacklisting, why not respond with a "You requested package ABD, but we think you might mean package ABC. Enter 'yes' to continue or anything else to start over."

That way authors can continue to use any name they want, and the emphasis is on letting installers know that they might be installing the wrong package.

VLM10y ago

"You requested package ABD, but we think you might mean package ABC. Enter 'yes' to continue or anything else to start over."

That'll be fun to automate around in puppet or ansible.

voltagex_10y ago

I hope you're using a local package cache for puppet or ansible or even specifying via hash (think git commit)

sheac10y ago

But if ABD and ABC are both package names in the system, then in order to present that warning we have to do some sort of resolution process to determine whether one is typosquatting.

Now that there's a strategy for finding fakers: 1) You have an attacker-defender arms race. The attacker will always be one step ahead of the defender. 2) You have the extra burden of keeping up in this race, otherwise your security feature is a facade. At best, this is useless. At worst, it lulls your users into a false sense of security.

zardeh10y ago

I feel like "pick the more popular package" is a good enough solution in this case.

ryanmarsh10y ago· 3 in thread

So last week my client discovered there's a gem named bunlder... sigh

pmontra10y ago

There is a gem called bundle which doesn't do anything except prevent a typosquat

https://rubygems.org/gems/bundle Total downloads 1,800,600

Source (empty) at https://github.com/will/bundle and interesting README.

https://rubygems.org/gems/bundler Total downloads 92,116,090

It's almost 2%.

rspeer10y ago

I think the authors here missed an opportunity for even more effective squatting like that: cases where the name you import, name you type at the command line, or name you commonly call the package by is different from the name in the repository.

In Python, "pytables" (should be "tables") and "skimage" (should be "scikit-image") come to mind.

willlll10y ago

My gem has a good download/LOC ratio.

nichochar10y ago· 2 in thread

Wow, this is a very good study and explanation of what typosquatting is, and I really liked how he proved its effectiveness.

I wonder what kind of steps we can take to prevent this risk.

trungaczne10y ago

I think we will have to rely on crypto hash in some form. Similar to download checksum. It won't be convenient, but it will be safe(r).
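
A sketch of what checksum pinning could look like, assuming the expected SHA-256 digest was recorded earlier from a copy of the artifact you trust. The function name is hypothetical.

```python
import hashlib
import hmac


def matches_pinned_digest(artifact_bytes, pinned_sha256_hex):
    """Compare the SHA-256 of a downloaded artifact against a pinned digest."""
    actual = hashlib.sha256(artifact_bytes).hexdigest()
    # compare_digest avoids leaking where the two strings differ.
    return hmac.compare_digest(actual, pinned_sha256_hex.lower())
```

pip offers something along these lines through its hash-checking mode (`--require-hashes`, with `--hash=sha256:...` entries in the requirements file). Either way, the digest only tells you the artifact has not changed since it was pinned.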

bpicolo10y ago

That doesn't really save you from typos

mirekrusin10y ago· 2 in thread

with npm there should be at least an option which prompts for Y/N/A when a package has a preinstall hook.

but even this just tries to sweep the problem under the carpet. you could still, for example, have a requests package which just installs the request package, works as expected, and just sends the request/response to your own server from time to time, e.g. only when http basic auth is used.

seldo10y ago

It is possible to disable install hooks at install time by running npm install with --ignore-scripts.

You can also make this the default, with npm config set ignore-scripts true (and then --ignore-scripts false at install time if you wish to run them).

zanchey10y ago

Solaris did (does?) this - "this package contains installation scripts which run as superuser" or words to that effect. Unfortunately I never found a way to inspect the scripts directly so it wasn't all that helpful.

mbroshi10y ago· 2 in thread

Maybe this is overly naive, but when I make a typo in the Google search bar, it doesn't even search for my typo-ed term (even if it would have gotten some hits), it searches for what I actually meant to type. Can't package managers have a similar feature?

abstractbeliefs10y ago

The main problem is when you really did mean to search for the typo term. There's no inherent problem in two packages having similar names.

Consider the following:

requests - a python package for making HTTP requests.

requestr - a python package for a fictional startup that allows you to send requests to your nearest and dearest.

Given they both could be typos of each other:

1) How do we determine which one to use? What if someone accidentally also tries "requestd", somewhere between the two ?

2) How do we apply the principle of least surprise - I asked to install requests, and everything installed just fine, but now I can't import it?!

ekimekim10y ago

    $ pip install requestr

    Package "requestr": did you mean "requests"? [Y/n]
    (reason for this warning: similar spelling and requests is much more popular)

    Pass --no-spell-warnings to disable this feature.
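
The check behind a prompt like that could be sketched as follows, using the standard library's `difflib` for the spelling comparison. The download counts, the 100x popularity ratio and the 0.8 similarity cutoff are all made-up illustrative values, and `did_you_mean` is not a real pip function.

```python
import difflib


def did_you_mean(requested, download_counts, popularity_ratio=100, cutoff=0.8):
    """Suggest a similarly spelled, far more popular package, or None.

    download_counts maps package name -> total downloads (illustrative data).
    """
    requested_downloads = download_counts.get(requested, 0)
    candidates = difflib.get_close_matches(
        requested, list(download_counts), n=5, cutoff=cutoff
    )
    for name in candidates:
        if name == requested:
            continue
        # Only warn when the lookalike is overwhelmingly more popular.
        if download_counts[name] >= popularity_ratio * max(requested_downloads, 1):
            return name
    return None
```

Because the warning only fires in one direction (from the obscure name towards the popular one), installing `requests` itself never triggers a prompt.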

optimuspaul10y ago· 2 in thread

I'm confused.. is it 17 computers or 17000 computers? inconsistent use of decimals in this article.

cialowicz10y ago

17000. In Europe a common decimal format is #.###,##. See here: https://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use

pidg10y ago

17,000. The author is from a country that uses . to delimit thousands.

Mahn10y ago· 1 in thread

> In the thesis itself, several powerful methods to defend against typo squatting attacks are discussed. Therefore they are not included in this blog post.

http://incolumitas.com/data/thesis.pdf section 5 "Practical implications". Just wanted to point out that in case you skipped it it's worth a read, some interesting proposals there that are worth discussing with package manager maintainers.

I particularly like the preemptive approach of auto-blacklisting common typos by simply monitoring the number of times a specific nonexistent package is requested over time (5.10). So if a lot of people regularly attempt to install the nonexistent package "reqeusts", it could signal that it's a common typo and should be blacklisted to prevent malicious use in the future. False positives could always be sorted out manually by communicating with the package manager maintainers.
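
A minimal sketch of that idea, assuming the index can see every install request. The `MissTracker` class and the threshold are hypothetical and not taken from the thesis.

```python
from collections import Counter


class MissTracker:
    """Count install requests for names that do not exist and reserve
    the ones requested often enough to look like a common typo."""

    def __init__(self, existing_names, threshold=100):
        self.existing = set(existing_names)
        self.threshold = threshold  # illustrative value, not from the thesis
        self.misses = Counter()
        self.reserved = set()

    def record_request(self, name):
        """Return True if the package exists; otherwise log the miss."""
        if name in self.existing:
            return True
        self.misses[name] += 1
        if self.misses[name] >= self.threshold:
            self.reserved.add(name)
        return False

    def may_register(self, name):
        """New uploads are blocked for taken or reserved names."""
        return name not in self.existing and name not in self.reserved
```

A real deployment would need to count per source or rate-limit, since otherwise anyone can reserve arbitrary names just by flooding the index with requests for them.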

nailer10y ago

You'd Bayesian that.

- The package name is something a lot of people regularly attempt to install, but it doesn't exist (per above)
- The package name is 1-2 chars off from the name of another package which has more than X downloads
- The package is frequently installed then uninstalled in a short time

PeterisP10y ago· 1 in thread

Part of the problem is the many packages that require sudo permissions to install - IMHO that should be an exceptional case, but it isn't.

nneonneo10y ago

Packages often require sudo in order to install to the global interpreter - it's a security hazard otherwise. Imagine a Python package which overrides the sys module. If it didn't require sudo, anyone could install it and compromise Python for everyone else (or, for instance, compromise setuid programs).

The two solutions here are user-local packages (pip --user, for example) and virtual environments.
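
For the virtual environment route, a minimal sketch using the standard library's `venv` module. The temp directory and the `make_throwaway_env` helper are placeholders chosen for illustration.

```python
import tempfile
import venv
from pathlib import Path


def make_throwaway_env():
    """Create an isolated environment in a temp directory and return its path."""
    env_dir = Path(tempfile.mkdtemp(prefix="sandbox-env-"))
    # with_pip=False keeps creation fast and offline; pass True to bootstrap pip.
    venv.EnvBuilder(with_pip=False).create(env_dir)
    return env_dir
```

This removes the need for sudo, but it is not a sandbox: anything that runs during an install still has the full privileges of the user who ran it.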

baby10y ago

After watching this awesome Defcon talk https://www.youtube.com/watch?v=YqxaKGA9Lnc I wondered if there were any use cases for bit/typo squatting in crypto. This is a pretty cool one! Not crypto but interesting nonetheless :)

cormacrelf10y ago

And 'npmjs.org' is misspelled as 'npmsjs.org' in the introduction. Nice.

zmanian10y ago

We need operating system vendors to give us a mechanism for easily creating and managing sandboxed dev environments.

One's dev environment should be a place where remote code execution is a high probability and we need better tools to partition that from high value data.

jogjayr10y ago

I thank my stars every time I get a "Package not found" error due to a typo, because I'm reminded that it could have been much worse.

jwilk10y ago

Trying to parse the title made my head hurt. It should be "Typosquatting software package names" or something.

tbrock10y ago

The homebrew model where packages and changes to packages are reviewed takes care of this problem quite nicely.

andrewstuart10y ago

Ouch. This really hurts. So hard to protect against human error.

sheerun10y ago

Glad to hear bower is stated to be safe in this regard :)

irremediable10y ago

Really cool applied research. If I get the time, I'll check out the author's thesis.

j / k navigate · click thread line to collapse

143 comments

99 comments · 25 top-level

airless_bar10y ago· 11 in thread

This only seems to be an issue for languages where packages reside in a global namespace, like Python, Rust etc.

I think most languages these days are a bit smarter and avoid this beginner mistake (for various reasons).

tatterdemalion10y ago

zardeh10y ago

2 more replies

kzrdude10y ago

The name is just one part of the problem.

There's another solution (like debian does), auditing what the package itself does, so that you don't allow malicious code into the repository.

airless_bar10y ago

You are obviously wrong.

While attacking a single package would be possible, covering any interesting amount of "typo"-space would require registering huge amounts of namespaces.

If package manager developers are smart, the allocation of namespaces is also handled externally and associated with some cost (e. g. domain names).

Therefore these kinds of attacks become impractical.

1 more reply

abiox10y ago

this is yet another reason why i really wished rust had went for namespaced packaging on crates.io. i like so many of the decisions the rust team made, but not this one.

kibwen10y ago

This is incorrect. Package repositories with namespacing are just as vulnerable to these attacks.

airless_bar10y ago

Wrong.

1 more reply

vemv10y ago

Ruby and JS package managers are un-namespaced as well.

SixSigma10y ago

Julia too, though there is a central list. I haven't done any tests for this kind of thing.

anentropic10y ago

couldn't you register a typo namespace?

Klathmon10y ago

Yes, but then you'd need to also register a ton of packages under that namespace.

That's something that can be flagged for manual review before it gets too far.

1 more reply

wbond10y ago· 9 in thread

Certainly this doesn't prevent a malicious author from posting a legitimate package and then changing the contents to be malicious, but that can be somewhat solved by turning off automatic updates.

[0] https://packagecontrol.io

SCdF10y ago

Hey Will,

Thanks for keeping Package Control high quality, I know it's highly appreciated :-)

notduncansmith10y ago

Another grateful Package Control user here.

bcg110y ago

Sonatype has a manual review process as well before allowing new projects to deploy to Maven Central. [1][2]

[1]: http://central.sonatype.org/pages/ossrh-guide.html [2]: http://central.sonatype.org/articles/2014/Feb/27/why-the-wai...

yxhuvud10y ago

Typosquatting can be flagged automatically (for reviewal of a human later) using Levenshtein distance.

patates10y ago

For Postgres users, it is built-in: https://www.postgresql.org/docs/9.5/static/fuzzystrmatch.htm...

eudox10y ago

Keep fighting the good fight.

mercer10y ago

> Certainly this doesn't prevent a malicious author from posting a legitimate package and then changing the contents to be malicious, but that can be somewhat solved by turning off automatic updates.

Perhaps you could make this safer by adding an automatic check for how much the package has changed since the last version? And at least warn the user when they want to update?

cortesoft10y ago

I don't know how much checking for how much the package has change would help. You wouldn't need to change much to exploit - one line that downloads and executes code from somewhere would do it.

hoodoof10y ago

Perhaps list all new packages to the community and require/request validation or flagging by the community, along with listing similar package names.

eudox10y ago· 9 in thread

I'm a fan of the approach of personally submitting projects to the repository maintainer (e.g. through GitHub issues), and having the maintainer personally approve them.

It does raise the barrier to entry, but it would prevent typosquatting and regular namesquatting.

EDIT: Does any major package manager provide a "did you mean" functionality, offering a list of actual package names similar to what you typed?

akavel10y ago

then the maintianer must have perfcet sigth and never ovrelook even one tpyo :]

burkaman10y ago

APT does and others probably do too, but it obviously only gives suggestions when the package you entered doesn't exist.

eudox10y ago

Right, it's only useful if you've prevented typosquatting.

Which Debian has, because submitting a new package is a much more involved processes than sudo apt-get publish.

philjackson10y ago

That's a massive burden on the poor person who has to ok the package - especially at NPM's scale, for example.

seldo10y ago

We believe npm's scale is a direct result of having the lowest ceremony to publish a package. Turning the dial in the direction we did has pros and cons.

yoo1I10y ago

Well, ideally you'd set up some sort of system where multiple people work on managing a repository, similar to maybe how linux distributions package applications and libraries.

eudox10y ago

NPM's scale is the exception, rather than the rule.

tonyedgecombe10y ago

Or someone needs to approve suspiciously named packages.

eudox10y ago

How do you determine what is a suspicious package without reviewing every new package by hand?

1 more reply

Mizza10y ago· 9 in thread

This seems like pretty unethical research to me.

Also, doesn't point out that the bigger threat is that this is wormable.

tantalor10y ago

The acknowledgements mention that 2 of the university advisers and a PyPI admin consented to the "notification program".

Still, people with good intentions have been prosecuted and convicted for less. I would be very concerned for this student.

throwawaysocks10y ago

kbenson10y ago

placeybordeaux10y ago

The research got computers to execute code on them without authorization and extracted information from them.

That is a crime under the CFAA in the USA. Not sure what it is in Germany/EU.

SolarNet10y ago

markbnj10y ago

Anyway, this is all part of why I always try to build inside a container, or at least in a virtualenv where I don't need to sudo the install.

placeybordeaux10y ago

Yeah I wouldn't want to find myself in court hearing

>17000 computers were forced to execute [unauthorized] arbitrary code

Certainly a crime in the US, not sure about Germany.

Nice execution though!

twinkletwinkle10y ago

2110y ago

I wonder about the legality. It looks to me like he isn't technically responsible, since he didn't access any computer without authorization himself.

If I intentionally leave an infected USB drive on the ground, and someone picks it up and sticks it into their computer, am I liable?

Seems like it could go either way.

pmontra10y ago· 6 in thread

utexaspunk10y ago

wycats10y ago

That would mean that, for example on crates.io, you couldn't create a `libm`, because `libc` is already very popular. I don't think that works.

pmontra10y ago

It seems a good idea.

I checked some similar package names from a Gemfile.lock of a project of mine. Unfortunately the two gems hike and hirb are also at distance 2. Probably many short names are close with this metric.

Edit: the distance suffers from the same problem.
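For anyone who wants to try this on their own lockfile, here is a minimal sketch of an edit-distance check. It assumes the metric under discussion is plain Levenshtein distance (the thread does not say which one was meant), and the function name is just for illustration.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

print(levenshtein("hike", "hirb"))  # 2
print(levenshtein("libc", "libm"))  # 1
```

Both results show why a fixed distance threshold is awkward for short names: unrelated, legitimate packages land within one or two edits of each other.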

hughes10y ago

Surely some troll would deploy a fleet of machines that flood package indexes with requests to available names, effectively blacklisting entire dictionaries and eventually all short names.

pmontra10y ago

Yeah, this is what I came to think too. I mentioned it in another comment. Somebody suggested to use a distance indicator, but trolls could attack that too.

epalmer10y ago

> Obviously people can be very crative with typos and with squattinq and there is no real protection against mistakes.

I see what you did.

bennofs10y ago· 5 in thread

Did anyone else find it surprising that the number of total requests (45334) is so much higher than the number of unique total requests (17289)? It is more than twice the number of unique requests!

Possible explanations:

* People download one package and install it multiple times? Seems unlikely

Any other ideas?

Guillaume8610y ago

lighttower10y ago

The person who ran the line,

sudo pip install lumpy (instead of numpy)

Ran it again because it 'didn't work'

cderwin10y ago

caseysoftware10y ago

Numerous developers and/or building multiple servers behind a single IP address aka NAT. It's pretty common.

joepvd10y ago

zeveb10y ago· 4 in thread

Reminds me of the quote, 'there are only two hard things in computer science: naming things, cache invalidation and off-by-one errors.'

I think that this clearly falls under the heading 'naming issue.' People know what they want, but do not enter it properly.

I can't think of a 100% solution off-hand, which isn't surprising, because it's a hard problem.

pmontra's suggestion to use typo blacklisting ain't a bad idea. Maybe some sort of reputation-per-name could help?

a_t4810y ago

Sure it's not an off-by-one[-key] error? :)

blowski10y ago

Banks have a similar problem when people write cheques or set up standing orders. You have to put a name and the account number.

I wonder if you could do something similar here - enter the name of the package and a code of some sort. I haven't thought this through in a lot of detail.

PeterisP10y ago

Banks generally solve the issue with simple classic checksumming methods that guarantee that any number with a typo or swapped neighbouring characters will always result in an invalid number.

That doesn't work with arbitrary names because they are, well, arbitrary.
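As an illustration of the kind of check digit meant here, below is a sketch of the Luhn algorithm, which is used for payment card numbers. It is an assumption that this is the sort of scheme being referred to; bank account formats vary. Luhn catches every single-digit typo and most (not all) swaps of neighbouring digits.

```python
def luhn_valid(number):
    """Return True if the digit string passes the Luhn check."""
    digits = [int(ch) for ch in number]
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for pos, d in enumerate(reversed(digits)):
        if pos % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

print(luhn_valid("79927398713"))  # True  (well-known valid test number)
print(luhn_valid("79927398710"))  # False (last digit mistyped)
print(luhn_valid("79927398731"))  # False (last two digits swapped)
```

The trick depends on the issuer choosing the final digit, which is exactly what is missing when authors pick free-form package names.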

Klathmon10y ago

Or just refer to packages by 2 names.

    Maintainer/PackageName

It solves so many problems, this included.

szx10y ago· 4 in thread

When you think about it, how different is the destructive potential of an npm/pip install from curl | bash that (some) people tend to froth at the mouth about?

It's pretty mind blowing how big of a blindspot package installers are. I guess running everything inside, e.g., a Docker container/VM would be a partial interim solution for the paranoid?

lmm10y ago

> When you think about it, how different is the destructive potential of an npm/pip install from curl | bash that (some) people tend to froth at the mouth about?

szx10y ago

Signing is definitely part of the answer but there's still the question of trust.

raesene1010y ago

For me they're very similar. I actually did a talk last year for OWASP AppsecEU where I started with the curl|bash bit and pointed out where rubygems/npm etc aren't really a lot better in some ways

https://www.youtube.com/watch?v=Wn190b4EJWk

szx10y ago

Nice talk! Sounds like there's no silver bullet...

Call me pessimistic but I don't see developers taking on the extra friction given the status quo. Though a major data breach or two might change things, as I'm sure we'll find out sooner or later.

ysavir10y ago· 4 in thread

Instead of blacklisting, why not respond with a "You requested package ABD, but we think you might mean package ABC. Enter 'yes' to continue or anything else to start over."

That way authors can continue to use any name they want, and the emphasis is on letting installers know that they might be installing the wrong package.

VLM10y ago

"You requested package ABD, but we think you might mean package ABC. Enter 'yes' to continue or anything else to start over."

That'll be fun to automate around in puppet or ansible.

voltagex_10y ago

I hope you're using a local package cache for puppet or ansible or even specifying via hash (think git commit)

sheac10y ago

But if ABD and ABC are both package names in the system, then in order to present that warning we have to do some sort of resolution process to determine whether one is typosquatting.

zardeh10y ago

I feel like "pick the more popular package" is a good enough solution in this case.

ryanmarsh10y ago· 3 in thread

So last week my client discovered there's a gem named bunlder... sigh

pmontra10y ago

There is a gem called bundle which doesn't do anything but prevent a typosquat

https://rubygems.org/gems/bundle Total downloads 1,800,600

Source (empty) at https://github.com/will/bundle and interesting README.

https://rubygems.org/gems/bundler Total downloads 92,116,090

It's almost 2%.

rspeer10y ago

In Python, "pytables" (should be "tables") and "skimage" (should be "scikit-image") come to mind.

willlll10y ago

My gem has a good download/loc ratio.

nichochar10y ago· 2 in thread

Wow, this is a very good study and explanation of what typo squatting is, and I really liked how he proved its effectiveness.

I wonder what kind of steps we can take to prevent this risk.

trungaczne10y ago

I think we will have to rely on crypto hash in some form. Similar to download checksum. It won't be convenient, but it will be safe(r).
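A minimal sketch of the checksum idea: pin a SHA-256 digest when a package is first vetted, then refuse any later download that does not match. The function name and the sample bytes are hypothetical; pip offers a comparable mechanism through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
import hmac

def matches_pinned_hash(data, pinned_sha256):
    """Check downloaded bytes against a SHA-256 hex digest pinned in advance."""
    actual = hashlib.sha256(data).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, pinned_sha256.lower())

# Stand-in for a vetted package archive (assumption for this sketch).
trusted_archive = b"contents of the vetted release"
pinned = hashlib.sha256(trusted_archive).hexdigest()

print(matches_pinned_hash(trusted_archive, pinned))        # True
print(matches_pinned_hash(b"tampered contents", pinned))   # False
```

Pinning protects repeat installs and automated builds; it does not help the first time someone types a name by hand.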

bpicolo10y ago

That doesn't really save you from typos

mirekrusin10y ago· 2 in thread

with npm there should be at least an option which prompts for Y/N/A when package has preinstall hook.

seldo10y ago

It is possible to disable install hooks at install time by running npm install with --ignore-scripts.

You can also make this the default, with npm config set ignore-scripts true (and then --ignore-scripts false at install time if you wish to run them).

zanchey10y ago

mbroshi10y ago· 2 in thread

abstractbeliefs10y ago

The main problem is when you really did mean to search for the typo term. There's no inherent problem in two packages having similar names.

Consider the following:

requests - a python package for making HTTP requests.

requestr - a python package for a fictional startup that allows you to send requests to your nearest and dearest.

Given they both could be typos of each other:

1) How do we determine which one to use? What if someone accidentally also tries "requestd", somewhere between the two ?

2) How do we apply the principle of least surprise - I asked to install requests, and everything installed just fine, but now I can't import it?!

ekimekim10y ago

    $ pip install requestr

    Package "requestr": did you mean "requests"? [Y/n]
    (reason for this warning: similar spelling and requests is much more popular)

    Pass --no-spell-warnings to disable this feature.
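One way the decision behind such a warning might be sketched: warn only when another package is both spelled similarly and far more popular. The download counts, thresholds, and the name `spelling_warning` are invented for illustration and are not part of pip.

```python
import difflib

# Hypothetical download counts (assumption); a real client would ask the index.
DOWNLOADS = {"requests": 5_000_000, "requestr": 120, "numpy": 3_000_000}

def spelling_warning(name, downloads=DOWNLOADS, min_similarity=0.8,
                     popularity_factor=100):
    """Return a far more popular look-alike of `name`, or None if none exists."""
    own = downloads.get(name, 0)
    best = None
    for other, count in downloads.items():
        if other == name:
            continue
        similarity = difflib.SequenceMatcher(None, name, other).ratio()
        # Warn only when the look-alike is both similar and far more popular.
        if similarity >= min_similarity and count >= popularity_factor * max(own, 1):
            if best is None or count > downloads[best]:
                best = other
    return best

print(spelling_warning("requestr"))  # requests
print(spelling_warning("requests"))  # None
```

Because the popular package never triggers a warning about the obscure one, the prompt stays out of the way for the common case while still letting the rarer name be installed on confirmation.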

optimuspaul10y ago· 2 in thread

I'm confused.. is it 17 computers or 17000 computers? inconsistent use of decimals in this article.

cialowicz10y ago

17000. In Europe a common decimal format is #.###,##. See here: https://en.wikipedia.org/wiki/Decimal_mark#Examples_of_use

pidg10y ago

17,000. The author is from a country that uses . to delimit thousands.

Mahn10y ago· 1 in thread

> In the thesis itself, several powerful methods to defend against typo squatting attacks are discussed. Therefore they are not included in this blog post.

nailer10y ago

You'd Bayesian that.

PeterisP10y ago· 1 in thread

Part of the problem is the many packages that require sudo permissions to install - IMHO that should be an exceptional case, but it isn't.

nneonneo10y ago

The two solutions here are user-local packages (pip --user, for example) and virtual environments.
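A small sketch of a guard an install wrapper could run first, assuming the goal is simply "never install as root outside a virtual environment". The function names are hypothetical; the `sys.prefix` comparison is the usual way to detect a venv.

```python
import os
import sys

def in_virtualenv():
    """True when running inside a venv or virtualenv."""
    # Older virtualenv releases set sys.real_prefix; venv sets sys.base_prefix.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return base != sys.prefix

def running_as_root():
    """True when the effective user is root (always False where geteuid is missing)."""
    geteuid = getattr(os, "geteuid", None)  # not available on Windows
    return geteuid is not None and geteuid() == 0

def safe_to_install():
    """Allow installs inside a virtualenv, or system-wide only when not root."""
    return in_virtualenv() or not running_as_root()

if not safe_to_install():
    print("Refusing to install as root outside a virtualenv; try --user or a venv.")
```

This limits what a malicious setup script can touch, although it can still read anything the unprivileged user can.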

baby10y ago

cormacrelf10y ago

And 'npmjs.org' is misspelled as 'npmsjs.org' in the introduction. Nice.

zmanian10y ago

We need operating system vendors to give us a mechanism for easily creating and managing sandboxed dev environments.

One's dev environment should be a place where remote code execution is a high probability and we need better tools to partition that from high value data.

jogjayr10y ago

I thank my stars every time I get a "Package not found" error due to a typo, because I'm reminded that it could have been much worse.

jwilk10y ago

Trying to parse the title made my head hurt. It should be "Typosquatting software package names" or something.

tbrock10y ago

The homebrew model where packages and changes to packages are reviewed takes care of this problem quite nicely.

andrewstuart10y ago

Ouch. This really hurts. So hard to protect against human error.

sheerun10y ago

Glad to hear bower is stated to be safe in this regard :)

irremediable10y ago

Really cool applied research. If I get the time, I'll check out the author's thesis.
