Yarn Plug'n'Play: Getting rid of node_modules (opens in new tab)

(github.com)

362 pointsCouto7y ago99 comments

99 comments

Thing is, node_modules isn't just thirdparty modules.

It's just what the Node path resolution standard uses, which is also how Local Modules (the "/src/node_modules" structure) allows you to import other files with clean paths, without having to add a config in every tool of the toolchain, post-install symlinks, or any other non-crossplatform tricks. It just works because it's literally what Node uses to resolve paths, and all build tools are based on Node, so when you stick to the standard resolution, you can add new tools to the toolchain without needing a bunch of configs for them to find your files. For example, it's now also the default resolution in Typescript as well.

The only time /src/node_modules doesn't work is when tool goes out of its way to break it, and wrongly assumes that node_modules can only ever be used for external thirdparty code (e.g. Jest).

So best of luck to make Node + NPM + Yarn to agree on a new path resolution syntax, but I hope we won't end up with another tool-specific resolution that only works in Yarn.

cprecioso7y ago

This doesn't break that, it specifically says it will fall back to Node's module resolution algorithm when the module you're looking isn't in the static resolutions table. That means you can keep using that technique as you have bee.

As an aside, you can also use lerna[1], yarn workspaces[2] or pnpm workspaces[3] to achieve the same effect, depending on your package manager of choice. You might get additional boosts to code organization/productivity, it's explained in the links.

[1]: https://lernajs.io [2]: https://yarnpkg.com/lang/en/docs/workspaces/ [3]: https://pnpm.js.org/docs/en/workspace.html

arcatek7y ago

> when the module you're looking isn't in the static resolutions table

The fallback will kick in when the package that makes the request isn't in the static resolution table. Since those packages aren't part of the dependency tree we assume their dependencies have been installed independently, hence the fallback.

That said, I think the use case described by the parent post is neatly solved by the `link:` protocol, which basically 'aliases' a module to a specific name.

danShumway7y ago

I think that approach is really good. If a package doesn't fit into the static resolution table it could cause a lot of problems to point its dependencies in that direction, especially since node has no issue allowing you to override local modules and reuse the same names when you're deeper.

On that note though, it seems at first glance like there'd be no way for Yarn to tell if a require is overriding another, since (maybe I'm reading the paper wrong) the require speed improvements come from skipping the native module resolution.

Let's say I have a package and my source is set up with the following files:

----

/-->package.json

/-->src--->node_modules--->leftpad.js

/-->src--->main.js

----

And let's say I then install leftpad through the proposed system.

If I call ``require('leftpad');`` from main.js, what will get returned? Will Yarn know to use the local version of leftpad instead of the installed one?

If I'm understanding you correctly, this would break, but I'd use the Yarn-specific "link" system to fix it?

Touche7y ago

Since we're doing an aside, what advantages do those "monorepo" things have over git submodules?

Benjamin_Dobell7y ago

I don't maintain any monorepos, I've always used Git sub-modules; not just for Node.js, but for all sorts of stuff.

However, I'm increasingly finding that sub-modules are a bit of a pain. If I patch a sub-module, I have to move into every single project that depends on the sub-module and pull the latest version. Additionally, if the sub-module is large it's a real waste of storage on my computer.

That said, monorepos have annoyances of their own e.g. many modern package managers will happily check-out dependencies directly from Git. However, that's simply not going to work unless there's some standardisation in monorepo structure that the package manager is able to interpret.

dfabulich7y ago

Here's a good article. https://danluu.com/monorepo/ It cites simplified organization, simplified dependencies, superior tooling, and easier cross-project changes.

https://blog.ffwll.ch/2017/08/github-why-cant-host-the-kerne... is another good one. You can build issue/PR tools treating a monotree as containing multiple repositories; that's how Google and Facebook do their work (including checked-in OWNERS/MAINTAINERS files).

mnutt7y ago

Monorepos are excellent if you have a group of related projects, especially if they depend on each other. Instead of opening a series of PRs for each project in a specific dependent order, you can open a single PR and move everything forward in lock-step, ensuring that every commit / tag has all projects in the monorepo working together.

The biggest downsides are really around tooling: we've had tons of issues with our continuous integration environment (Travis CI) and build times trying to build only specific sub-projects using Lerna. The Github Pull Reviews and Issues pages can get pretty messy too, though we have a process to automatically add Github tags based on changes via Lerna.

I suspect this is why bigger companies seem to be more successful with monorepos than smaller ones, because they have the resources to invest in building their own tooling.

nailer7y ago

Excellent question! I use both regularly so are qualified to answer:

- monorepo. Versioning over entire repo. node modules are used for granularity, separation of concerns, etc.

- git submodules. Versioning over individual node modules. node modules are used for sharing code.

Nothing stopping you from using both.

Also, silly semi-technical point: I have never, ever, had a bad experience using git submodules, but many others have. I like submodules, but often work in teams where otherwise technically good people really hate them due to bad prior experiences.

nailer7y ago

I've been using Yarn workspaces for a year now and love it.

I use the 'monorepo' pattern.

- /packages is my own modules - it's entirely full of interesting project-relevant things.

- /node_modules is third party modules installed by yarn, and symlinks added to your own modules by yarn. I leave it collapsed.

It works well.

arcatek7y ago

I'm the author behind the proposal, feel free to ask me any question you might have!

As a personal note, I'm super excited and so grateful to have had the chance to work on this project - node_modules have been a longstanding thorn in the side of the Javascript ecosystem, and to finally have a chance to try something completely new is amazing.

shados7y ago

This is awesome. I didn't read the full proposal yet, but at first glance it's very close to how we do things where I work (because we needed something long before Node even existed, and never replaced it).

The biggest hurdle we constantly face is tools that make assumptions about the environment (eg: Flow, TypeScript, Webstorm, etc), so changing anything to module resolution breaks everything. We have to spend a lot of time hacking it in every single time. Sometimes the hooks just aren't there (TypeScript has a hook for overriding module resolution, but editors often don't give you an easy way to hook into TypeScript's own hooks).

Any thoughts on how things would work for tools if this was to go forward? Would they all need to be updated to honor this new way to resolve modules?

arcatek7y ago

Flow is already 100% compatible with PnP, through the use of the `module.resolver` configuration settings. We even saw some slight perf increases after switching in.

I guess the same could be done for other tools: the .pnp.js file can be easily interfaced with anything you throw at it without them having to care about the way the dependencies are resolved under the hood. Even non-js tools can simply communicate with it through the cli interface, which returns JSON data ready to use.

afarrell7y ago

This is a bit of a tangent, but do you know of good tutorials, blog posts, videos, or books which help build a mental framework of how javascript packaging works internally?

My goal is to be better at debugging development environment issues.

To be more specific, I am asking this from the perspective of a user of the node ecosystem who has repeatedly had the experience of running into problems with my development environment or packages and not really knowing what to do to get myself un-stuck. I can totally search google/stack overflow/github for an answer and sometimes there is a clear one. Sometimes I fond a command to run, sometimes I don't. In either case, I come away with a strong feeling that I've not actually learned anything. If the same thing happened again, I might remember what to google for next time, but I wouldn't be able to reason clearly about what I'm working with. In the past when I run into these issues, I've just thrown time at them until something sticks. That kinda sucks.

I'd like to find a way to put in some work and come away with a solid understanding. When you were starting this project, how did you go about building up your knowledge of the underlying abstractions that npm/yarn/webpack/etc. use?

When you see a misconfiguration or get a bug report with this project, how do you go about investigating it?

Have you ever seen a really good explanation of a bug/misconfiguration which helps the reader build a solid mental model?

russellbeattie7y ago

Why is .pnp.js file and the .pnp directory hidden? Most projects are moving away from hidden magic files/dirs I think.

jonny_eh7y ago

Seems weird that yarn.lock is not hidden but pnp.js is.

eliseumds7y ago

I quickly read the paper and I wonder if there's anything in place to deal with packages that generate side effects via postinstall. Can we at least manually create a list of them in package.json or something so that Yarn copies them into node_modules like before?

arcatek7y ago

Postinstall scripts are the main issue, yep. Right now the current implementation doesn't do anything special with them, meaning that they are installed inside the cache (except when they're disabled altogether, which often works well enough since there isn't that many packages that require postinstall scripts).

This is obviously wrong, so we'll soon go to a model where we "unplug" the packages and put them into a specific directory (a bit like the node_modules folder, but entirely flat). The feature itself is already there (`yarn unplug` in the rfc), but we need to wire it to the install process.

Ideally, I think it would be beneficial for the ecosystem to move away as much as possible from postinstall scripts - WebAssembly became a very good solution for native libraries, and as the added benefit that it works everywhere, including the web.

_urga7y ago

How would you implement https://github.com/ronomon/direct-io with WebAssembly?

Native modules exist because people sometimes need to drop down to work with the platform directly.

1 more reply

marijn7y ago

This seems to be solving a problem quite similar to the one of getting browsers to resolve non-relative imports (without file system access, they also need a precise map from module names to paths). Unfortunately, I can't remember who or what was working on that and I'm not sure how much progress it is making, but it'd be wonderful if this could somehow be framed in a way where it also helps that issue along. Are you taking this into consideration in the design?

arcatek7y ago

I think you're referring to the package-name-maps[1] proposal. We've been made aware of it during the middle of the development, and while we chose to continue using the static data tables approach, they aren't incompatible.

The reason we abstracted the static tables behind a JS api is precisely to make it easier for everyone to experiment with various implementations - some might use the internally embedded data as we do, some could consume a package-name-maps file, and some could do something entirely different. As long as the contract described in the document is fulfilled, everything is possible.

[1] https://github.com/domenic/package-name-maps

audiolion7y ago

Hey, I just saw a tweet about this, it looks super exciting. A question I have is how does this work with yarn workspaces? I have a monorepo setup and would love to try this, but I use both yarn workspaces and lerna to run commands across packages. Will nohoist be respected?

arcatek7y ago

Workspaces are working just fine: instead of creating the symlinks we simply register them into the static tables, and resolve them from their actual location on the disk.

Nohoist was a bit of a hack from the beginning (precisely because the hoisting was anything but guaranteed), so some incompatibility might happen for packages that cannot work without.

That said, I'm not too worried: we've tried a bunch of the most popular open-source packages, and they all seemed to work fine - the one issue was on an interaction between create-react-app and eslint, but there's discussions going on to fix that in a clean way.

mhenr187y ago

As a data point, my company is using yarn workspaces where one of the projects is an Electron app, and another has a Node server with native dependencies.

Our nohoist section currently looks like this in order to get things to behave:

  "nohoist": [
    "**/electron/**",
    "**/electron",
    "**/electron*/**",
    "**/electron*",
    "**/canvas-prebuilt/**",
    "**/canvas-prebuilt"
  ]

These are likely going to require auto-ejection due to being native and having postinstall scripts.

Would it make sense to not bother with hoisting of ejected modules when using YPnP, given that YPnP solves the same problems as hoisting in a more global way? We'd be able to get rid of the nohoist section in that case.

audiolion7y ago

The reason I use nohoist is because my monorepo has a react native package and web package inside it, I found that if I hoist react native stuff it doesn't resolve properly with the metro bundler, so I have to nohoist every react native package.

marijn7y ago

Another question—does this also resolve from directory paths to index.js, add missing extensions, and so on, or does it only map package names to directories? As in, does this get rid of all 'superfluous' file system access during resolution?

arcatek7y ago

It works in two steps: the first step resolve to "unqualified paths" (which are just the paths within the cache without the index.js/extensions resolution). This step is entirely static, no filesystem involved here.

The second step converts the "unqualified paths" into "qualified paths", and is basically the index.js/extensions resolution. We currently access the filesystem in order to resolve them (just like Node), because we didn't want to store the whole list of files within our static tables (fearing that it would become too large and would slow down the startup time because of the parsing cost).

So to answer your question: we get rid of the node_modules folder-by-folder traversal, but decided that the extension check was an acceptable tradeoff. We might improve that a bit by storing the "main" entries within the tables, though, which would be a nice fast path for most package requires.

marijn7y ago

Makes sense. But I guess precise paths (ending with .js) are recognized and cause no actual file system searching?

1 more reply

suchabag7y ago

Looks great! Can we start using it now? Do you have a realistic timeframe for an official integration in Yarn?

arcatek7y ago

We'll see what the community response is, and once we're confident this is what everyone wants we'll merge it into master (PR is already up[1]).

While there isn't an official build at the moment, the code PR includes a prebuilt version of the current branch, and we have a playground repository to experiment with it.

[1] https://github.com/yarnpkg/yarn/pull/6382

[2] https://github.com/yarnpkg/pnp-sample-app

k__7y ago

Is this transparent for bundlers?

arcatek7y ago

Since the resolution is new, most of them requires plugins. I already wrote those for the most common projects:

https://github.com/yarnpkg/pnp-sample-app/tree/master/script...

They'll be published as separate packages once we merge the PR.

_urga7y ago

Removing `node_modules` would be fantastic, but not at the expense of native modules:

From the PDF:

"On the long term we believe that post-install scripts are a problem by themselves (they still need an install step, and can lead to security issues), and we would suggest relying less on this feature in the future, possibly even deprecating it in the long term. While native modules have their usefulness, WebAssembly is becoming a more and more serious candidate for a portable bytecode as the months pass."

WebAssembly is not "more and more a serious candidate" to replace native modules.

The issue with post-install scripts needs a better long-term solution, but simply deprecating native modules is not it.

arcatek7y ago

I mentioned it in another answer, but we'll eject the packages that require postinstall scripts - so they will work as before, but will take extra time to be installed.

As for wasm, I'm curious to hear what you think isn't good enough. I think the two main issues are garbage collection and dynamic linking, and there's ongoing work on them to fix them.

mhenr187y ago

Wasm is designed to deal with the VM sandbox, which if you're writing a server is fantastic compared to actual native code as there's better security properties.

It's not fantastic if you're working on an Electron app where there might be one specific feature that requires a native module in order to hit an OS API that you can't do from a browser sandbox (and that Electron itself doesn't expose).

Rather than pursue total deprecation/eventual removal of postinstall/native modules, I think a package.json "allowNativeDependencies": "never" | "devOnly" | "always" option that defaults to "devOnly" if not specified is the way to go.

woodrowbarlow7y ago

i would say make it a boolean flag, default to false, with a comment that says "this must be enabled for packages which use the local filesystem or OS APIs". if wasm delivers on its potential, OS-level interactions would really be the only valid use case (and legacy code, which should be considered a little bit inherently unsafe anyway).

perhaps yarn could even develop official packages which include native code that wraps all the most relevant OS APIs, and the default setting of the flag would allow only those native packages.

this is all pretty far down the road anyway.

_urga7y ago

"I mentioned it in another answer, but we'll eject the packages that require postinstall scripts"

Sure, but you also mentioned in the PDF that the other solution was just to deprecate native modules in the long term, and that's not acceptable.

"As a data point, we encountered no problem at Facebook with adding --ignore-scripts to all of our Yarn invocations. The two main projects we’re aware of that still use postinstall scripts are fsevents (which is optional, and whose absence didn’t cause us any harm), and node-sass (which is currently working on a WebAssembly port)."

I think you're too quick to sacrifice native modules because you're not really using them.

arcatek7y ago

I did work on the asmjs/wasm bindings for Yoga, Text-Buffer, and a few other projects, so I'm biased :)

Anyway, regardless of my own long term feelings, rest assured that postinstall scripts will be supported as well as they are now.

1 more reply

electroly7y ago

Native code has two benefits over JavaScript: it's faster, and it can call the OS. WASM is faster, but it can't call the OS. It cannot replace all uses of native modules.

zeroname7y ago

Calling the OS doesn't require native modules, it should be supported via a native cffi in node, like in Python.

1 more reply

mixedCase7y ago

Wasm on the browser can't call the OS, but I don't assume that to be the case on Node, just like it isn't for JS.

1 more reply

jpambrun7y ago

I don't think WAM achieve native performance yet. You need to copy data in and out of it instead of zero-copy. WASM can't link to native system libraries. WASM can't access most hardware.

It's very neat, but I can't see it fully replacing native modules soon.

kabes7y ago

Almost all the native modules I come across are about accessing system calls not available from within the sandbox. Only a few are there for speed, and since native calls add quite some overhead, they are only useful in specific use cases.

skrebbel7y ago

I didn't dive deeply enough, but I'd assume they could just fallback to classic node_modules behavior for the native packages, right? You'd not actually delete the node_modules directory then but still get most of the speed benefits.

EDIT: just saw a comment by the author that says roughly this: https://news.ycombinator.com/item?id=17977986

So, native packages will keep on working.

vorpalhex7y ago

Has WASM become the new bittorrent/cryptocurrency/ML that's going to solve any and all problems?

ohitsdom7y ago

npm announced crux yesterday, seems focused on the same problems. Approach seems slightly different, although there aren't as much technical details available for crux (unless someone has a better link):

https://blog.npmjs.org/post/178027064160/next-generation-pac...

edit: found the repo, seems a bit behind yarn's effort and not yet beta status. https://github.com/npm/crux

munificent7y ago

This is effectively how pub, the package manager for Dart works. At version resolution time, we generate a manifest file that contains the local paths to every depended-upon package. Then, at compile/load time, tools just use that file to locate packages.

gitgud7y ago

Very cool, does this mean you can check in the dependency file to version control? Just like yarn.lock?

As much as I hate node_modules, there are times when I want to see how a library is implemented. Is there a way to have some libraries in node_modules? Say only the ones in listed in the package.json file

ndarilek7y ago

Second this. I'd ask that any dependencies in node_modules/ override whatever is cached.

For a work project, I'm updating a dependency written against an undocumented specification. Much fun is being had doing this, I assure you. Being able to crack open the sources in node_modules/, instrument things with logging, and fix fix fix until I get something working is very helpful. I'm sure I can track things down in a cache directory and do the same, but it's nice knowing that my changes are limited to just this particular project, and that a checkout/reinstall of the module in another project will get the original broken sources for comparison.

arcatek7y ago

The .pnp.js file can be checked-in, but needs the cache to work. Right now I'd advise not to check-it in.

`yarn unplug` will eject a dependency from the cache and put it into the project folder. That's a quick and easy way to inspect and debug libraries you use.

w4tson7y ago

Just curious, why not?

arcatek7y ago

It doesn't have a lot of advantages right now. That said, it won't stay true forever. PnP is but the first step of a long-term goal I have. Check the Section 6.B from the whitepaper[1] for more details.

[1] https://github.com/yarnpkg/rfcs/blob/65b36475c04b1149eb51a81...

jannes7y ago

I wish there was something I could do about reducing the contents of the node_modules folder instead of hiding the files somewhere else on disk.

It angers me how many dependencies very simple projects amass.

zkochan7y ago

This change will reduce the size node_modules uses up (10 or more times).

pnpm saves one version of a package only ever once on a disk and node_modules consume drastically less space

zkochan7y ago

here's an article about how much less disk space is used by pnpm. This concept will probably come with the same disk space savings http://www.andrewconnell.com/blog/npm-yarn-pnpm-which-packag...

but this concept is fresh, pnpm already works;)

the_duke7y ago

It's actually relatively easy, I've recently written a non-trivial project with exactly 2 dependencies.

You just need to forgo all the conveniences the ecosystem provides and write stuff yourself instead.

Node.js or the browser have everything you need.

protonfish7y ago

I've been trying to clean up my excess node modules and found that when building a web-socket app, you don't need express (native http works fine) and you don't even need socket.io (unless your project needs to work on IE 9 or less) It's hard to find examples so I wrote one: https://github.com/chrisbroski/bare-socket

codingdave7y ago

Bringing packages into your work is your own choice. You should always be doing an evaluation of the benefit given by a package vs. the bloat it brings to your project. For many packages, it is worth it. But if you are in the habit of installing a package just to avoid writing a widget in your UI or writing some helper functions, then... yes, your projects will amass dependencies quickly.

jannes7y ago

My main complaint is probably about webpack and its dependencies. Unfortunately, it seems to be the accepted standard bundler. The benefits of hot module replacement are too nice to forego easily.

mschuetz7y ago

The ridiculous amount of dependencies in webpack is one of the major reasons I went for rollup. I refuse to install anything that depends on is-odd.

1 more reply

lucisferre7y ago

All I really want is a consistent solution to the `../../../..` problem with local module resolutions.

danShumway7y ago

Put your content inside of a nested node_modules folder and you won't have this problem anymore.

I don't think node_modules is perfect, and I get why it gets hate, but IMHO the algorithm is actually kinda nice for nesting packages.

If you set up your folders like so:

----

src--->node_modules--->utils--->helper.js

src--->main.js

----

You can require your helper in main.js with a simple ``require('utils/helper.js');``

What's nice is that you can't do the reverse. So your helper doesn't get the same access to your main.js. I use this a lot for testing - it means that my tests can require my components using the same syntax that I use in normal code, but those components don't have special access to the tests.

A big "aha" moment for me with node_modules was figuring out that it's an entirely separate system from npm. It's not trying to build a flat package structure; it's trying to make it easy to on-the-fly set up packages from anywhere.

Edit: example - https://gitlab.com/dormouse/dormouse/tree/master/src

I've also gotten into the habit of checking my installed packages into source for projects that I don't expect users to rebuild (ie. websites, games, etc...). That's a much longer conversation, but it mitigates a large number of issues with node_modules on long-lived projects.

udp7y ago

> A big "aha" moment for me with node_modules was figuring out that it's an entirely separate system from npm.

I wish this was true. My workflow from back in the early days of node has always been to `npm install` external dependencies, then `npm link` (symlink) the dependencies I'm working on. But npm >= v5 removes such symlinks if I `npm install` anything afterwards. I spend a significant amount of my time re-linking what npm install unlinked.

Usually when I hit a problem like this it's because I'm things have moved on and I'm doing something wrong. But when an npm developer says "consider npm as it is to be broken" and closes the issue [1], I'm not so sure.

[1] https://github.com/npm/npm/issues/17287#issuecomment-4008339...

danShumway7y ago

To repeat: a big "aha" moment for me with node_modules was figuring out that it's an entirely separate system[0] from npm.

Node's module resolution has nothing to do with npm. Npm is a package repository and installer built on top of node_modules. The reason your system broke is because npm cleared out your node_modules folder as part of its install. And from the sound of things on your linked issue, the devs are entirely aware of the problems this behavior causes and are planning to fix it.

An interesting exercise that I highly encourage people to do if they're finding this weird is to take a weekend and build their own version of npm just to demystify what's going on with it. It's not that hard to do, Node gives you all the tools you need - in its simplest form you need to curl whatever packages you want to install, and stick them in a node_modules folder. Then you need some way to track which packages you've downloaded. Node handles all the rest.

That npm occasionally breaks Node behavior is bad - I've been on the wrong end of those regressions as well[1] and it was super frustrating. But that doesn't really have anything to do with Node, it just means the npm team needs to test their releases more.

[0]: https://nodejs.org/api/modules.html#modules_loading_from_nod...

[1]: https://github.com/npm/npm/issues/18942

lucisferre7y ago

That’s actually a pretty good idea and something I had not realized was possible.

shados7y ago

while it would be great to have a standard way to do absolute imports, I almost like how it force people into more sensible project structures and splitting large projects into multiple packages. The "smell" of the ../../../ encourages better practices, so it's not all bad IMO.

MrEfficiency7y ago

Wow, I thought I was alone and did something unconventional. Good(?) to hear that this isnt me.

i3867y ago

Congratulations to the JavaScript community in reinventing the Maven local repository.

Hopefully yarn does a better job at validating dependencies than early Maven 1 and 2 did.

acemarke7y ago

I just saw two comments comparing Yarn's RFC to Maven - yours and one over on Lobsters.

Having never used Maven, any chance you could point to a document explaining how Maven's cache approach works, and maybe expand on the similarities between that and Yarn's RFC?

i3867y ago

Sure, here you go.

https://maven.apache.org/guides/introduction/introduction-to...

olingern7y ago

I just came here to say this is great work, and congrats to the Yarn team for continually making package management more sane.

mstade7y ago

Nice work – what's being done to make this the default behavior in node and deprecate node_modules proper, or is this forever destined to live in the awkward place of userland-but-not-really kind of territory?

People have already mentioned native modules. Install scripts are a nuisance, but exist for reasons. If you remove support for them – provided this project takes off, which I suppose it will because bandwagons – you risk people performing install-type tasks on first module load, potentially creating an even worse situation. Has this been considered as a risk?

arcatek7y ago

> what's being done to make this the default behavior in node and deprecate node_modules proper

My hope is that PnP proves that this approach can work and raises the stakes for Node to implement APIs that would allow for a better integration. It's a long term plan and we have yet to prove ourselves on the long run, but I'm quite confident :)

> If you remove support for them

I don't think we'll ever remove support for them. I'll personally advocate for them not to be used because of their subpar developer experience, but apart from that it won't cost us much to support them.

lxe7y ago

From what I understand: instead of cp -rf from offline cache, or ln -sf (which doesn't work for a large percentage of packages due to node being node), they propose to use a custom resolver to tap into the cache directly.

This will also break for various packages due to fs.readFile* dependencies, gyp, and other things. If your dependency tree is 4k+ node modules, the "vanilla" yarn or npm resolution and reconciliation techniques are already so brittle, that changing them will undoubtedly break things.

cryptozeus7y ago

Currently I use npm for angular projects. Before writing even hello world , I have to perform npm install on the project and install so many module to serve up small web app. This seems like we are going in the right direction, cant wait to try yarn as package manager.

ericintheloft27y ago

Novice question from someone from ruby land: would this work similar to bundler and Gemfile?

aij7y ago

How does yarn pnp compare to nix? (node2nix)

I've been thinking about unifying our current ivy+yarn+bower setup, but haven't yet gotten much past thinking about it...

specialist7y ago

This is a compelling reason to switch:

From https://pnpm.js.org

"Efficient: One version of a package is saved only ever once on a disk. So you save dozens of gigabytes of disk space!"

Without digging, I imagine this will be more like Maven's cache.

NPM's design decision to flatten the version hierarchy baffled me. And has occasionally tripped me up.

Rockslide7y ago

For everyone running a Docker setup, this is not an option... Docker doesn't support symlinking files outside the build context.

colemickens7y ago

This is soon to be solvable by mounting the cache directory during the build, thankfully: https://github.com/moby/buildkit/pull/442 ("dockerfile: add run --mount support #442 ")

striking7y ago

Docker also has a cache, making dependency installs not usually an issue.

Rockslide7y ago

Which works on a totally different level and doesn't help at all in terms of sharing dependency snapshots between different projects.

1 more reply

sergiotapia7y ago

The problem in javascript seems more an ideological one where programmers instantly reach for a package that has multiple dependencies of which those dependencies have even more dependencies.

There's this culture of not caring about bloat it seems in the vast majority of javascript projects. left_pad comes to the mind as the poster boy for this stuff.

MrEfficiency7y ago

Isnt this the point of OOP?

Does any other language have a solution to this?

munchbunny7y ago

Yes and no. It's more a question of degrees and developer culture. I think JS just has a stronger "glue stuff together" mentality combined with the lack of a thorough standard API.

In my experience, C# libraries tend to be more averse by default to taking on extra dependencies, but that's in part because .NET already does so much work for you. Python is a bit less averse, but certainly not to the level of JS where you can easily end up with hundreds of nodes in the dependency graph. But then Python isn't used much for client UI code.

nojvek7y ago

Part of the problem comes from using a popular package. That package could be importing 100’s of other things.

Typescript is one of the very few node modules that is very self contained. You install babel, webpack and eslint and that’s easily over 1k packages.

So yes, js ecosystem is a nightmare for security folks since anyone of those thousands of packages could access filesystem, network and create Backdoors.

Our express site got hacked because one of the sub dependencies was compromised.

Seriously stay away from using nodejs to serve production traffic for serious projects using glued packages. If you want to do it, use extremely thin, well vetted packages and be very mindful of upgrades.

1 more reply

nicoburns7y ago

It's also because node had a package manager before it got popular. So adding dependencies was much easier, right from the start.

jensvdh7y ago

Can we get rid of Yarn instead?

acemarke7y ago

Why in the world would you want to do that? As this example shows, the competition is good for the ecosystem.

j / k navigate · click thread line to collapse

99 comments

wildpeaks7y ago

Thing is, node_modules isn't just thirdparty modules.

The only time /src/node_modules doesn't work is when tool goes out of its way to break it, and wrongly assumes that node_modules can only ever be used for external thirdparty code (e.g. Jest).

So best of luck to make Node + NPM + Yarn to agree on a new path resolution syntax, but I hope we won't end up with another tool-specific resolution that only works in Yarn.

cprecioso7y ago

[1]: https://lernajs.io [2]: https://yarnpkg.com/lang/en/docs/workspaces/ [3]: https://pnpm.js.org/docs/en/workspace.html

arcatek7y ago

> when the module you're looking isn't in the static resolutions table

That said, I think the use case described by the parent post is neatly solved by the `link:` protocol, which basically 'aliases' a module to a specific name.

danShumway7y ago

Let's say I have a package and my source is set up with the following files:

----

/-->package.json

/-->src--->node_modules--->leftpad.js

/-->src--->main.js

----

And let's say I then install leftpad through the proposed system.

If I call ``require('leftpad');`` from main.js, what will get returned? Will Yarn know to use the local version of leftpad instead of the installed one?

If I'm understanding you correctly, this would break, but I'd use the Yarn-specific "link" system to fix it?

Touche7y ago

Since we're doing an aside, what advantages do those "monorepo" things have over git submodules?

Benjamin_Dobell7y ago

I don't maintain any monorepos, I've always used Git sub-modules; not just for Node.js, but for all sorts of stuff.

dfabulich7y ago

Here's a good article. https://danluu.com/monorepo/ It cites simplified organization, simplified dependencies, superior tooling, and easier cross-project changes.

mnutt7y ago

I suspect this is why bigger companies seem to be more successful with monorepos than smaller ones, because they have the resources to invest in building their own tooling.

nailer7y ago

Excellent question! I use both regularly so are qualified to answer:

- monorepo. Versioning over entire repo. node modules are used for granularity, separation of concerns, etc.

- git submodules. Versioning over individual node modules. node modules are used for sharing code.

Nothing stopping you from using both.

nailer7y ago

I've been using Yarn workspaces for a year now and love it.

I use the 'monorepo' pattern.

- /packages is my own modules - it's entirely full of interesting project-relevant things.

- /node_modules is third party modules installed by yarn, and symlinks added to your own modules by yarn. I leave it collapsed.

It works well.

arcatek7y ago

I'm the author behind the proposal, feel free to ask me any question you might have!

shados7y ago

Any thoughts on how things would work for tools if this was to go forward? Would they all need to be updated to honor this new way to resolve modules?

arcatek7y ago

Flow is already 100% compatible with PnP, through the use of the `module.resolver` configuration settings. We even saw some slight perf increases after switching in.

afarrell7y ago

This is a bit of a tangent, but do you know of good tutorials, blog posts, videos, or books which help build a mental framework of how javascript packaging works internally?

My goal is to be better at debugging development environment issues.

When you see a misconfiguration or get a bug report with this project, how do you go about investigating it?

Have you ever seen a really good explanation of a bug/misconfiguration which helps the reader build a solid mental model?

russellbeattie7y ago

Why is .pnp.js file and the .pnp directory hidden? Most projects are moving away from hidden magic files/dirs I think.

jonny_eh7y ago

Seems weird that yarn.lock is not hidden but pnp.js is.

eliseumds7y ago

arcatek7y ago

_urga7y ago

How would you implement https://github.com/ronomon/direct-io with WebAssembly?

Native modules exist because people sometimes need to drop down to work with the platform directly.

1 more reply

marijn7y ago

arcatek7y ago

[1] https://github.com/domenic/package-name-maps

audiolion7y ago

arcatek7y ago

Workspaces are working just fine: instead of creating the symlinks we simply register them into the static tables, and resolve them from their actual location on the disk.

Nohoist was a bit of a hack from the beginning (precisely because the hoisting was anything but guaranteed), so some incompatibility might happen for packages that cannot work without.

mhenr187y ago

As a data point, my company is using yarn workspaces where one of the projects is an Electron app, and another has a Node server with native dependencies.

Our nohoist section currently looks like this in order to get things to behave:

  "nohoist": [
    "**/electron/**",
    "**/electron",
    "**/electron*/**",
    "**/electron*",
    "**/canvas-prebuilt/**",
    "**/canvas-prebuilt"
  ]

These are likely going to require auto-ejection due to being native and having postinstall scripts.

audiolion7y ago

marijn7y ago

arcatek7y ago

marijn7y ago

Makes sense. But I guess precise paths (ending with .js) are recognized and cause no actual file system searching?

1 more reply

suchabag7y ago

Looks great! Can we start using it now? Do you have a realistic timeframe for an official integration in Yarn?

arcatek7y ago

We'll see what the community response is, and once we're confident this is what everyone wants we'll merge it into master (PR is already up[1]).

While there isn't an official build at the moment, the code PR includes a prebuilt version of the current branch, and we have a playground repository to experiment with it.

[1] https://github.com/yarnpkg/yarn/pull/6382

[2] https://github.com/yarnpkg/pnp-sample-app

k__7y ago

Is this transparent for bundlers?

arcatek7y ago

Since the resolution is new, most of them requires plugins. I already wrote those for the most common projects:

https://github.com/yarnpkg/pnp-sample-app/tree/master/script...

They'll be published as separate packages once we merge the PR.

_urga7y ago

Removing `node_modules` would be fantastic, but not at the expense of native modules:

From the PDF:

WebAssembly is not "more and more a serious candidate" to replace native modules.

The issue with post-install scripts needs a better long-term solution, but simply deprecating native modules is not it.

arcatek7y ago

I mentioned it in another answer, but we'll eject the packages that require postinstall scripts - so they will work as before, but will take extra time to be installed.

As for wasm, I'm curious to hear what you think isn't good enough. I think the two main issues are garbage collection and dynamic linking, and there's ongoing work on them to fix them.

mhenr187y ago

Wasm is designed to deal with the VM sandbox, which if you're writing a server is fantastic compared to actual native code as there's better security properties.

woodrowbarlow7y ago

perhaps yarn could even develop official packages which include native code that wraps all the most relevant OS APIs, and the default setting of the flag would allow only those native packages.

this is all pretty far down the road anyway.

_urga7y ago

"I mentioned it in another answer, but we'll eject the packages that require postinstall scripts"

Sure, but you also mentioned in the PDF that the other solution was just to deprecate native modules in the long term, and that's not acceptable.

I think you're too quick to sacrifice native modules because you're not really using them.

arcatek7y ago

I did work on the asmjs/wasm bindings for Yoga, Text-Buffer, and a few other projects, so I'm biased :)

Anyway, regardless of my own long term feelings, rest assured that postinstall scripts will be supported as well as they are now.

1 more reply

electroly7y ago

Native code has two benefits over JavaScript: it's faster, and it can call the OS. WASM is faster, but it can't call the OS. It cannot replace all uses of native modules.

zeroname7y ago

Calling the OS doesn't require native modules, it should be supported via a native cffi in node, like in Python.

1 more reply

mixedCase7y ago

Wasm on the browser can't call the OS, but I don't assume that to be the case on Node, just like it isn't for JS.

1 more reply

jpambrun7y ago

I don't think WAM achieve native performance yet. You need to copy data in and out of it instead of zero-copy. WASM can't link to native system libraries. WASM can't access most hardware.

It's very neat, but I can't see it fully replacing native modules soon.

kabes7y ago

skrebbel7y ago

EDIT: just saw a comment by the author that says roughly this: https://news.ycombinator.com/item?id=17977986

So, native packages will keep on working.

vorpalhex7y ago

Has WASM become the new bittorrent/cryptocurrency/ML that's going to solve any and all problems?

ohitsdom7y ago

https://blog.npmjs.org/post/178027064160/next-generation-pac...

edit: found the repo, seems a bit behind yarn's effort and not yet beta status. https://github.com/npm/crux

munificent7y ago

gitgud7y ago

Very cool, does this mean you can check in the dependency file to version control? Just like yarn.lock?

ndarilek7y ago

Second this. I'd ask that any dependencies in node_modules/ override whatever is cached.

arcatek7y ago

The .pnp.js file can be checked-in, but needs the cache to work. Right now I'd advise not to check-it in.

`yarn unplug` will eject a dependency from the cache and put it into the project folder. That's a quick and easy way to inspect and debug libraries you use.

w4tson7y ago

Just curious, why not?

arcatek7y ago

[1] https://github.com/yarnpkg/rfcs/blob/65b36475c04b1149eb51a81...

jannes7y ago

I wish there was something I could do about reducing the contents of the node_modules folder instead of hiding the files somewhere else on disk.

It angers me how many dependencies very simple projects amass.

zkochan7y ago

This change will reduce the size node_modules uses up (10 or more times).

pnpm saves one version of a package only ever once on a disk and node_modules consume drastically less space

zkochan7y ago

here's an article about how much less disk space is used by pnpm. This concept will probably come with the same disk space savings http://www.andrewconnell.com/blog/npm-yarn-pnpm-which-packag...

but this concept is fresh, pnpm already works;)

the_duke7y ago

It's actually relatively easy, I've recently written a non-trivial project with exactly 2 dependencies.

You just need to forgo all the conveniences the ecosystem provides and write stuff yourself instead.

Node.js or the browser have everything you need.

protonfish7y ago

codingdave7y ago

jannes7y ago

My main complaint is probably about webpack and its dependencies. Unfortunately, it seems to be the accepted standard bundler. The benefits of hot module replacement are too nice to forego easily.

mschuetz7y ago

The ridiculous amount of dependencies in webpack is one of the major reasons I went for rollup. I refuse to install anything that depends on is-odd.

1 more reply

lucisferre7y ago

All I really want is a consistent solution to the `../../../..` problem with local module resolutions.

danShumway7y ago

Put your content inside of a nested node_modules folder and you won't have this problem anymore.

I don't think node_modules is perfect, and I get why it gets hate, but IMHO the algorithm is actually kinda nice for nesting packages.

If you set up your folders like so:

----

src--->node_modules--->utils--->helper.js

src--->main.js

----

You can require your helper in main.js with a simple ``require('utils/helper.js');``

Edit: example - https://gitlab.com/dormouse/dormouse/tree/master/src

udp7y ago

> A big "aha" moment for me with node_modules was figuring out that it's an entirely separate system from npm.

[1] https://github.com/npm/npm/issues/17287#issuecomment-4008339...

danShumway7y ago

To repeat: a big "aha" moment for me with node_modules was figuring out that it's an entirely separate system[0] from npm.

[0]: https://nodejs.org/api/modules.html#modules_loading_from_nod...

[1]: https://github.com/npm/npm/issues/18942

lucisferre7y ago

That’s actually a pretty good idea and something I had not realized was possible.

shados7y ago

MrEfficiency7y ago

Wow, I thought I was alone and did something unconventional. Good(?) to hear that this isnt me.

i3867y ago

Congratulations to the JavaScript community in reinventing the Maven local repository.

Hopefully yarn does a better job at validating dependencies than early Maven 1 and 2 did.

acemarke7y ago

I just saw two comments comparing Yarn's RFC to Maven - yours and one over on Lobsters.

Having never used Maven, any chance you could point to a document explaining how Maven's cache approach works, and maybe expand on the similarities between that and Yarn's RFC?

i3867y ago

Sure, here you go.

https://maven.apache.org/guides/introduction/introduction-to...

olingern7y ago

I just came here to say this is great work, and congrats to the Yarn team for continually making package management more sane.

mstade7y ago

arcatek7y ago

> what's being done to make this the default behavior in node and deprecate node_modules proper

> If you remove support for them

lxe7y ago

cryptozeus7y ago

ericintheloft27y ago

Novice question from someone from ruby land: would this work similar to bundler and Gemfile?

aij7y ago

How does yarn pnp compare to nix? (node2nix)

I've been thinking about unifying our current ivy+yarn+bower setup, but haven't yet gotten much past thinking about it...

specialist7y ago

This is a compelling reason to switch:

From https://pnpm.js.org

"Efficient: One version of a package is saved only ever once on a disk. So you save dozens of gigabytes of disk space!"

Without digging, I imagine this will be more like Maven's cache.

NPM's design decision to flatten the version hierarchy baffled me. And has occasionally tripped me up.

Rockslide7y ago

For everyone running a Docker setup, this is not an option... Docker doesn't support symlinking files outside the build context.

colemickens7y ago

This is soon to be solvable by mounting the cache directory during the build, thankfully: https://github.com/moby/buildkit/pull/442 ("dockerfile: add run --mount support #442 ")

striking7y ago

Docker also has a cache, making dependency installs not usually an issue.

Rockslide7y ago

Which works on a totally different level and doesn't help at all in terms of sharing dependency snapshots between different projects.

1 more reply

sergiotapia7y ago

The problem in javascript seems more an ideological one where programmers instantly reach for a package that has multiple dependencies of which those dependencies have even more dependencies.

There's this culture of not caring about bloat it seems in the vast majority of javascript projects. left_pad comes to the mind as the poster boy for this stuff.

MrEfficiency7y ago

Isnt this the point of OOP?

Does any other language have a solution to this?

munchbunny7y ago

Yes and no. It's more a question of degrees and developer culture. I think JS just has a stronger "glue stuff together" mentality combined with the lack of a thorough standard API.

nojvek7y ago

Part of the problem comes from using a popular package. That package could be importing 100’s of other things.

Typescript is one of the very few node modules that is very self contained. You install babel, webpack and eslint and that’s easily over 1k packages.

So yes, js ecosystem is a nightmare for security folks since anyone of those thousands of packages could access filesystem, network and create Backdoors.

Our express site got hacked because one of the sub dependencies was compromised.

1 more reply

nicoburns7y ago

It's also because node had a package manager before it got popular. So adding dependencies was much easier, right from the start.

jensvdh7y ago

Can we get rid of Yarn instead?

acemarke7y ago

Why in the world would you want to do that? As this example shows, the competition is good for the ecosystem.

j / k navigate · click thread line to collapse