I can't believe the article doesn't mention it.
I've been using NixOS as my OS for development and desktop, and we're in the middle of transitioning to using it for production deployments too.
Nix (the package manager not the distribution) solves so many of the discussed problems. And NixOS (the linux distribution) ties it all together so cleanly.
I keep my own fork of the Nixpkgs repository (which includes everything required to build the entire OS and every package). This is like having your own personal linux distribution, but with the simplest possible way of merging changes from, or contributing to, upstream.
I use it like I'd use virtualenv. I use it like I'd use chef. I use it like I'd use apt. I use it like I'd use Docker.
In addition to Nix, there is also a newer project: GNU Guix. Guix is built on top of Nix but replaces the custom package configuration language with Scheme, among other differences. https://gnu.org/software/guix/
When package management is solved at the system level, our deployment situation becomes a whole lot better. I used to do a lot of Ruby programming. Wrestling with RVM and bundler was a real pain, especially since bundler was incapable of helping me with the non-Ruby software that I needed as well like libmysqlclient, imagemagick, etc. Using Nix/Guix, you can throw out hacky RVM (that overrides bash built-ins like cd!) and simply use a profile that has the right Ruby version.
Bye pip, bundler, composer, CPAN, puppet, ansible, vagrant, ..., and hello Nix/Guix!
Personally, I'm rather more keen on Nix; the language is pretty much designed for writing JSON-style configuration except as a formal programming language, which is what the vast majority of Nix code is (both package definitions and system configurations).
Additionally, with Nix, you can be close to certain that if you build something twice, you'll get the same result, because it can't access impure resources.
Finally, because Guix is a GNU project, the official repositories are going to go nowhere near non-free software. Nixpkgs contains non-free software, although disabled from installation by default. You might be a little less likely to have people help you get non-free software working on the GNU Guix mailing lists, if you happen to use any.
I worked with a lot of platforms, such as PHP, Perl, Ruby, Python, Node.js and .NET. I felt the pain of pip, easy_install, setup-tools, virtualenv, bundler, gems, cpan, pear, rvm, rbenv, npm, bower, apt-get or whatever else I used at some point or another. And I swear, in spite of all the criticism that Java or Maven get and in spite of all warts, in terms of packaging and deployment for me it's been by far the sanest. I mean, it's not without warts, and heaven forbid you end up with classpath issues due to transitive dependencies, but at the very least it is tolerable.
I've been hearing a lot about this project, but I always thought it was just an academic experiment. I'm in the process of packaging and maintaining a Python+Javascript+Redis+PostgreSQL application and Nix certainly is something I should learn more about.
Nix doesn't require you to specify the entire dependency tree; each dependency specifies its own dependencies, and those are resolved during the build process.
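To make that concrete, here is a minimal Python sketch (not Nix code, and not how Nix itself is implemented). Each package lists only its direct dependencies, and a depth-first walk derives the full build order. The package names and the DIRECT_DEPS table are invented for illustration.

```python
# Hypothetical package set: each entry lists only its *direct* dependencies.
DIRECT_DEPS = {
    "myapp": ["python", "postgresql"],
    "python": ["openssl", "zlib"],
    "postgresql": ["openssl", "zlib"],
    "openssl": ["zlib"],
    "zlib": [],
}


def build_order(package, deps=DIRECT_DEPS):
    """Return every package needed by `package`, dependencies first."""
    order = []
    visiting = set()  # packages on the current path, used to detect cycles

    def visit(name):
        if name in order:
            return  # already scheduled via another dependent
        if name in visiting:
            raise ValueError(f"dependency cycle involving {name}")
        visiting.add(name)
        for dep in deps[name]:
            visit(dep)
        visiting.discard(name)
        order.append(name)

    visit(package)
    return order


if __name__ == "__main__":
    print(build_order("myapp"))
```

Nobody had to write down that "myapp" needs zlib; that falls out of what python, postgresql and openssl declare for themselves.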
(For the record, at least Haskell, Python, and node.js packages are pulled into the nixpkgs tree from their respective package repositories regularly, albeit many missing native dependencies; there's a separate file that you can edit and send a pull request for packages which have native dependencies.)
Also, "Java" is mentioned twice in the article but I can't find mention of Ocaml Functors. I thought they solved most package problems even before Nix was around?
[...]
The basic idea is
- do away with modules
- all functions have unique distinct names
- all functions have (lots of) meta data
- all functions go into a global (searchable) Key-value database
- we need letrec
- contribution to open source can be as simple as contributing a single function
- there are no "open source projects" - only "the open source Key-Value database of all functions"
- Content is peer reviewed
These are discussed in no particular order below:
[...]
Full thread: http://thread.gmane.org/gmane.comp.lang.erlang.general/53472
This is the crux of the problem. Before too long, the amount of metadata dwarfs the thing it describes and it's easier to rewrite the function than it is to find or describe it.
Modules and sub-module hierarchy offers a greater, simpler organizational methodology.
Seriously? Is that his solution to package management?
Keep in mind that Joe Armstrong is talking about Erlang here, which is a functional language - most of the functions in libraries are sort-of kind-of independent from each other; they especially don't share state.
There seems to be a fundamental disconnect between the people making languages and how people actually use their languages. I don't have time to follow your Twitter feed, because I'm working on a lot of different things. I know it's important to you, the Language Developer, and so you think it should be important to me, the Language User. But I have dozens of things to keep track of, and all of them imagine that they're the most important thing in my world.
It's like the old office culture mocked in "Office Space" where the guy has 7 different bosses, each imagining their own kingdom is the most important.
Package management itself is not a solved problem, so you can't very well expect programming languages to be any different. The existing systems work quite well and make total sense: your package manager is tailored to your specific use case. Centralized/decentralized is a red herring. First figure out how to package every single thing for every single system and use case in the world, and then come back to me about organizational systems.
Every programming language has its own package manager because it's written in that language. No language maintainer is going to say something like, "Hey, want to use Ruby? Just install Perl first so you can install some Ruby packages!"
Likewise, every OS-level package manager assumes an OS. I'm sure apt-get, yum and Nix are great. I'm also sure their greatness isn't very helpful to Windows users.
There's also the dependency between those two. An OS-level package manager can't easily be written in a high-level language, because one of its core jobs is to install high level languages. A language-level package manager doesn't want to re-invent the OS stack.
Bootstrapping is hard. Package managers sit very very low on the software stack where any dependencies are very difficult to manage and where consolidation is nigh impossible.
"Stable" distributions have an additional downside he doesn't mention: when you upgrade every package all at once it's a LOT more effort than if you had upgraded them slowly over time. Dealing with multiple library changes at once is an order of magnitude more difficult than dealing with them one-at-a-time.
And also, to some extent, if all the libraries you are using have a long term stable API, then it doesn't actually matter which one you pick - anything is painless.
Curious... I have exactly the opposite experience. I find that a certain amount of time is required to carefully regression-test my application code after upgrading a library. Doing this 23 times for my 23 different dependencies that need to be upgraded can be quite costly. If I, instead, upgrade all of the libraries at once and perform my extensive regression testing just once, I save a great deal of effort.
That's if everything goes smoothly. If something does NOT go smoothly and I encounter an error, then I need to determine which upgrade caused the problem. Most of the time (85% perhaps?), that turns out to be easy and obvious just by looking at the error that presents itself. In the remaining cases, I simply roll back half of the package upgrades and start binary-searching to identify the culprit (or culprits in the case of a conflict between libraries).
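A rough Python sketch of that binary search, assuming exactly one upgrade is at fault and that upgrades do not interact (the conflict-between-libraries case needs more than this). The `passes` callback stands in for "apply only these upgrades and run the regression tests"; it and the library names are hypothetical.

```python
def find_culprit(upgrades, passes):
    """Binary-search for the single upgrade that breaks the application.

    `upgrades` is an ordered list of upgrade names. `passes(applied)` is a
    caller-supplied check returning True when the application works with
    only the upgrades in `applied` installed.
    """
    candidates = list(upgrades)
    if not candidates:
        raise ValueError("no upgrades to search")
    while len(candidates) > 1:
        middle = len(candidates) // 2
        first_half = candidates[:middle]
        if passes(first_half):
            # The first half is clean, so the culprit is in the second half.
            candidates = candidates[middle:]
        else:
            candidates = first_half
    return candidates[0]


if __name__ == "__main__":
    upgrades = ["libfoo", "libbar", "libbaz", "libqux", "libquux"]
    broken = "libqux"  # hypothetical culprit, unknown to the search itself
    print(find_culprit(upgrades, lambda applied: broken not in applied))
```

With 23 upgrades this takes about five test runs instead of 23, which is where the saving comes from.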
From my experience exactly the opposite is true. Compare upgrading Slackware to keeping an Arch Linux system running. With Slackware, I have to sit down for an hour, do the upgrade, read the notices that come along with it, maybe see if it will break any of my custom packages. This happens once or twice a year (security upgrades are completely painless as they don't break things). With Arch Linux I need to do that every day. If I don't have time to do it for a month, the system is basically broken beyond recognition...
I'm fine with having the odd out of date version of something, I'm just saying: be incremental about keeping your stuff up to date.
If you're very lucky, the packaging in question will not conflict horribly with apt or yum. So you probably won't be lucky.
Maybe this time we can talk about how to meaningfully solve these problems instead of just fighting pointlessly about whether the old tools are so great that they should be used for everything.
Decentralized package management huh?
How would that work?
A way of specifying an ABI for a package instead of a version number? A way to bundle all your dependencies into a local package to depend on and push changes from that dependency tree automatically to builds off of it, but only manually update the dependency list?
I'm all for it. Someone go build one.
http://0install.net/ does this (sad to see it wasn't mentioned in the article). Basically:
1. Use URIs rather than short names to identify packages.
2. Scope dependencies so different applications can see different versions of the same library where necessary.
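Those two ideas can be sketched in a few lines of Python. This is only an illustration of the concept, not 0install's actual feed format or API; the URIs, application names and version lists are made up.

```python
# Hypothetical feeds, keyed by full URI rather than a short name, so two
# unrelated projects that both call their package "json" cannot collide.
FEEDS = {
    "http://example.com/feeds/libjson.xml": ["1.0", "2.0"],
    "http://example.org/other/json.xml": ["0.9"],
}

# Each application declares its own pins; nothing is installed globally.
APP_REQUIREMENTS = {
    "legacy-app": {"http://example.com/feeds/libjson.xml": "1.0"},
    "new-app": {"http://example.com/feeds/libjson.xml": "2.0"},
}


def resolve(app):
    """Return the {uri: version} mapping visible to one application only."""
    selection = {}
    for uri, version in APP_REQUIREMENTS[app].items():
        if version not in FEEDS.get(uri, []):
            raise LookupError(f"{uri} does not offer version {version}")
        selection[uri] = version
    return selection


if __name__ == "__main__":
    print(resolve("legacy-app"))
    print(resolve("new-app"))
```

Both applications run side by side, each seeing a different version of the same library, because the selection is scoped to the application rather than to the machine.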
Here's an OSNews article from 2007 about such things:
http://www.osnews.com/story/16956/Decentralised-Installation...
Technically impossible for many languages (have fun figuring out what it would look like in Perl...). And even when it's possible, it's not a guarantee: you can have a semantic change without an ABI change. Cargo, Rust's newfangled package manager, supposes semantic versioning, and I think it's a sane attitude.
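For anyone unfamiliar with the idea, here is a simplified Python reading of semantic versioning. It is not Cargo's actual resolution logic: pre-release tags and the special treatment of 0.x versions are deliberately left out, and the function names are my own.

```python
def parse(version):
    """Split 'MAJOR.MINOR.PATCH' into a tuple of three integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(required, candidate):
    """True if `candidate` can stand in for `required` under semver.

    Simplified rule: same major version, and not older than what was
    required. A major bump signals a breaking change, so it never matches.
    """
    req, cand = parse(required), parse(candidate)
    return cand[0] == req[0] and cand >= req


if __name__ == "__main__":
    print(is_compatible("1.2.3", "1.4.0"))  # newer minor, same major
    print(is_compatible("1.2.3", "2.0.0"))  # major bump
```

The version number here is a promise made by the author, not something the tool can verify, which is exactly the caveat about semantic changes without ABI changes.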
We've taken a pretty good shot at this in the OCaml ecosystem via the OPAM package manager (https://opam.ocaml.org).
* OPAM composes its package universe from a collection of remotes, which can be fetched either via HTTP(S), Git, Hg or Darcs. The resulting package sets are combined locally into one view, but can be separated easily. For instance, getting a view into the latest XenAPI development trees just requires "opam remote add xapi-dev git://github.com/xapi-project/opam-repo-dev".
* The same feature applies to pinning packages ("opam pin add cohttp git://github.com/avsm/ocaml-cohttp#v0.6"). This supports local trees and remote Git/Hg/Darcs remotes (including branches).
* OCaml, like Haskell, is statically typed, and so recompiles all the upstream dependencies of a package once it's updated. This lets me work on core OCaml libraries that are widely used, and just do an "opam update -u" to recompile all dependencies to check for any upstream breakage. We did not go for the very pure NixOS model due to the amount of time it takes to compile distinct packages everywhere. This is a design choice to balance composability vs responsiveness, and Nix or 0install are fine choices if you want truly isolated namespaces.
* By far the most important feature in OPAM is the package solver core, which resolves version constraints into a sensible user-facing solution. Rather than reinvent the (rather NP-hard) solver from scratch, OPAM provides a built-in simple version and also a CUDF-compatible interface to plug into external tools like aspcud, which are used by other huge repositories such as Debian to handle their constraints.
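To give a feel for what "resolving version constraints" means, here is a toy backtracking solver in Python. It is emphatically not OPAM's solver, nor CUDF, nor aspcud; the package universe is invented and the search is the naive exponential kind that real solvers exist to avoid.

```python
# Hypothetical package universe: package -> version -> dependency constraints.
# Each constraint maps a dependency name to the set of versions it accepts.
UNIVERSE = {
    "app": {1: {"http": {1, 2}, "json": {1}}},
    "http": {1: {"json": {1}}, 2: {"json": {2}}},
    "json": {1: {}, 2: {}},
}


def solve(wanted, chosen=None):
    """Pick one version per package so that every constraint holds.

    `wanted` is a list of (package, allowed_versions) pairs still to satisfy.
    Returns a {package: version} dict, or None if no assignment exists.
    """
    chosen = dict(chosen or {})
    if not wanted:
        return chosen
    (name, allowed), rest = wanted[0], wanted[1:]
    if name in chosen:
        # Already decided: the earlier choice must satisfy this constraint too.
        return solve(rest, chosen) if chosen[name] in allowed else None
    # Try newest versions first, backtracking when a choice leads to a conflict.
    for version in sorted(UNIVERSE[name], reverse=True):
        if version not in allowed:
            continue
        deps = list(UNIVERSE[name][version].items())
        result = solve(rest + deps, {**chosen, name: version})
        if result is not None:
            return result
    return None


if __name__ == "__main__":
    print(solve([("app", {1})]))
```

In this universe the solver first tries http 2, finds that it needs json 2 while app insists on json 1, and backs off to http 1. Plugging into an external CUDF solver replaces exactly this kind of search with something far smarter.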
This use of CUDF leads to some cool knobs and utilities, such as the OPAM weather service to test for coinstallability conflicts: http://ows.irill.org/ and the solver preferences that provide apt-like preferences: https://opam.ocaml.org/doc/Specifying_Solver_Preferences.htm...
* Testing in a decentralized system is really, really easy by using Git as a workflow engine. We use Travis to test all incoming pull requests to OPAM, much like Homebrew does, and can also grab a snapshot of a bunch of remotes and do bulk builds, whose logs are then pushed into a GitHub repo for further analysis: https://github.com/ocaml/opam-bulk-logs (we install external dependencies for bulk builds by using Docker for Linux, and Xen for *BSD: https://github.com/avsm/docker-opam).
All in all, I'm very pleased with how OPAM is coming along. We use it extensively for the Mirage OS unikernel that's written in OCaml (after all, it makes sense for a library operating system to demand top-notch package management).
If anyone's curious and wants to give OPAM a spin, we'd love feedback on the 1.2beta that's due out in a couple of weeks: http://opam.ocaml.org/blog/opam-1-2-0-beta4/
Also, you can pick which version of the compiler to run, and have it manage switching everything.
It seemed like it was years ahead of cabal, but that might just be because I only used it a little, I don't know. But there are some things to learn from OPAM.
Do you have a blog post like this, or something I could post to the Haskell subreddit?
http://opam.ocaml.org/blog/opam-1-2-pin/
(How to pin a development is central to the day-to-day development workflow of OCaml/OPAM users and quite annoying to change after-the-fact, so we're eager for feedback on this iteration before we bake it into the 1.2.0 release).
The OPAM blog is only about 2 weeks old, so there are quite a few more posts coming up as our developers discover there's quite a lot to write about :)
The real problem is that it's so powerful and hard to ramp up on... The docs aren't sufficient for its overall complexity. That all aside, if the will were there, it could be the git of package managers.
* Quality and Trust mechanisms. If there are 14 different postgres clients, which do I choose?
* Package Metadata management. Where can I send bug reports? Who is the maintainer? How can I contact someone? Is there an IRC channel?
* Documentation and Function/Class Metadata. Why should I go to the Github README for one package, and to a random domain for another package?
* Linking compile and runtime error messages to documentation or bug reports. Why is google still the best way to track down the cause of an obscure error message?
* Source data linking and code reviews. I should be able to type in a module/namespace qualified function name and view the source without having to scour a git repository. I should also be able to comment directly on that source in a way that is publicly visible or privately visible.
I want to illustrate this with a detailed example of something I did just the other day, when I set up the structure for a new single page web application. Bear with me, this is leading up to the point at the end of this post.
To build the front-end, I wanted to use these four tools:
- jQuery (a JavaScript library)
- Knockout (another JavaScript library)
- SASS (a preprocessor to generate CSS)
- Jasmine (a JavaScript library/test framework)
Notice that each of these directly affects how I write my code. You can install any of them quite happily on its own, with no dependencies on any other tool or library. They are all actively maintained, but if what you’ve got works and does what you need then generally there is no need to update them to newer versions all the time either. In short, they are excellent tools: they do a useful job so I don’t have to reinvent the wheel, and they are stable and dependable.
In contrast, I’m pretty cynical about a lot of the bloated tools and frameworks and dependencies in today’s web development industry, but after watching a video[1] by Steven Sanderson (the creator of Knockout) where he set up all kinds of goodies for a large single page application in just a few minutes, I wondered if I was getting left behind and thought I’d force myself to do things the trendy way.
About five hours later, I had installed or reinstalled:
- 2 programming languages (Node and Ruby)
- 3 package managers (npm with Node, gem with Ruby, and Bower)
- 1 scaffolding tool (Yeoman) and various “generator” packages
- 2 tools that exist only to run other software (Gulp to run the development tasks, Karma to run the test suite) and numerous additional packages for each of these so they know how to interact with everything else
- 3 different copies of the same library (RequireJS) within my single project’s source tree, one installed via npm and two more via Bower, just to use something resembling modular design in JavaScript.
And this lot in turn made some undeclared assumptions about other things that would be installed on my system, such as an entire Microsoft Visual C++ compiler set-up. (Did I mention I’m running on Windows?)
I discovered a number of complete failures along the way. Perhaps the worst was what caused me to completely uninstall my existing copy of Node and npm — which I’d only installed about three months earlier — because the scaffolding tool whose only purpose is to automate the hassle of installing lots of packages and templates completely failed to install numerous packages and templates using my previous version of Node and npm, and npm itself whose only purpose is to install and update software couldn’t update Node and npm themselves on a Windows system.
Then I uninstalled and reinstalled Node/npm again, because it turns out that using 64-bit software on a 64-bit Windows system is silly, and using 32-bit Node/npm is much more widely compatible when its packages start borrowing your Visual C++ compiler to rebuild some dependencies for you. Once you’ve found the correct environment variable to set so it knows which version of VC++ you’ve actually got, that is.
I have absolutely no idea how this constitutes progress. It’s clear that many of these modern tools are only effective/efficient/useful at all on Linux platforms. It’s not clear that they would save significant time even then, compared to just downloading the latest release of the tools I actually wanted (there were only four of those, remember, or five if you count one instance of RequireJS).
And here’s the big irony of the whole situation. The only useful things these tools actually did, when all was said and done, were:
- Install a given package within the local directory tree for my project, with certain version constraints.
- Recursively install any dependent packages the same way.
That’s it. There is no more.
The only things we need to solve the current mess are standardised, cross-platform ways to:
- find authoritative package repositories and determine which packages they offer
- determine which platforms/operating systems are supported by each package
- determine the available version(s) of each package on each platform, which versions are compatible for client code, and what the breaking changes are between any given pair of versions
- indicate the package/version dependencies for a given package on each platform it supports
- install and update packages, either locally in a particular “virtual world” or (optionally!) globally to provide a default for the whole host system.
This requires each platform/operating system to support the concept of the virtual world, each platform/operating system to have a single package management tool for installing/updating/uninstalling, and each package’s project and each package repository to provide information about versions, compatibility and dependencies in a standard format.
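As a rough idea of how little such a standard format would need to carry, here is a Python sketch. The schema, field names, package names and constraint strings are all hypothetical; no existing repository uses this format.

```python
from dataclasses import dataclass, field


@dataclass
class Release:
    """One published version of a package on one platform (made-up schema)."""
    name: str
    version: str
    platform: str
    # Dependency name -> version constraint string, as declared by the package.
    dependencies: dict = field(default_factory=dict)


# A tiny stand-in for what a repository would publish.
REPOSITORY = [
    Release("imagelib", "1.0.0", "linux", {"zlib": ">=1.2"}),
    Release("imagelib", "1.0.0", "windows", {"zlib": ">=1.2", "vcruntime": ">=14"}),
    Release("imagelib", "1.1.0", "linux", {"zlib": ">=1.2"}),
]


def versions_for(name, platform, repository=REPOSITORY):
    """List the versions of `name` that are published for `platform`."""
    return [r.version for r in repository
            if r.name == name and r.platform == platform]


def dependencies_of(name, version, platform, repository=REPOSITORY):
    """Return the declared dependencies for one release, or None if absent."""
    for r in repository:
        if (r.name, r.version, r.platform) == (name, version, platform):
            return r.dependencies
    return None


if __name__ == "__main__":
    print(versions_for("imagelib", "windows"))
    print(dependencies_of("imagelib", "1.0.0", "windows"))
```

Note that the Windows release declares its compiler runtime dependency explicitly, which is the kind of undeclared assumption that cost me hours above.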
As far as I can see, exactly none of this is harder than problems we are already solving numerous different ways. The only difference is that in my ideal world, the people who make the operating systems consider lightweight virtualisation to be a standard feature and provide a corresponding universal package manager as a standard part of the OS user interface, and everyone talks to each other and consolidates/standardises instead of always pushing to be first to reinvent another spoke in one of the wheels.
We built the Internet, the greatest communication and education tool in the history of the human race. Surely we can solve package management.
[1] http://blog.stevensanderson.com/2014/06/11/architecting-larg...
So now that we know what to do, the big question is: who's going to spend the next 5-10 years of their life on that project?
But this is my point: We are already solving all of those problems, and doing almost all of the work I suggested.
All of the main package managers recognise versions and dependencies in some form. Of course the model might not be perfect, but within the scope of each set of packages, it is demonstrably useful, because many of us are using it every day.
All of the people contributing packages to centralised package repositories for use with npm and gem and pip and friends are already using version control and they are already adding files to their projects to specify the dependencies for the package manager used to install their project — or in many cases, for multiple package managers, so the project can be installed multiple different ways, which is effectively just duplicated effort for no real benefit.
All major operating systems already come with some form of package management, though to me this is the biggest weak point at the moment. There are varying degrees of openness to third parties, and there is essentially no common ground across platforms except where a few related *nix distributions can use the same package format.
All major operating systems also support virtualisation to varying degrees, though again there is plenty of scope for improvement. I’ve suggested before that it would be in the interests of those building operating systems to make this kind of isolation routine for other reasons as well. However, even if full virtual machine level isolation is too heavyweight for convenient use today, usually it suffices to install the contents of packages locally within a given location in the file system and to set up any environment accordingly, and again numerous package managers already do these things in their own ways.
There is no need for multi-year ISO standardisation processes, and there is no need to have everything in the universe work the same way. We’re talking about tools that walk a simple graph structure, download some files, and put them somewhere on a disk, a process I could have done manually for the project I described before in about 10 minutes. A simple, consolidated version of the best tools we have today would already be sufficient to solve many real world problems, and it would provide a much better foundation for solving any harder problems later, and it would be in the interests of just about everyone to move to such a consolidated, standardised model.
It's not something you can make generic like a file/folder based version control tool. It's like asking for the Git of unit testing/continuous integration or whatever, not going to happen.
It needs to do this because each application is sandboxed. For most uses a generic packager is fine though. After all, most languages also have RPM and Deb packages, etc.