> duplicating the proper development environment in the end-user's computer would take a lot of space
Most distros offer precompiled binaries; relatively few use source-based distribution and expect the user to have all the necessary compilers installed.
> would be complicated by the enormous variety of programming languages and environments
That problem isn't effectively addressed by a universal IR. You can't have a single IR that works well for all languages, precisely because of the variety of languages.
> Linux distros manage this with a lot of effort.
Hopefully that will improve if the trend toward languages like Safe Rust continues. C and C++ are infamously full of footguns.
> I'm imagining something like a universal intermediate representation that can be compiled quickly (because a lot of the early language-specific part of compilation will have already been done by whoever you get your packages from) and in a uniform way because there's a common intermediate representation format that all the compiled languages use.
Again, this can't be done effectively. There are good technical reasons why Java, Haskell, and JavaScript don't generally use LLVM as their backend. The differences between languages aren't just skin deep; they extend right through the compiler stack.
To be more precise: it could be done, but there would be an unacceptable performance cost. After all, you could start distributing binaries for the SuperH SH-4, and just use emulation everywhere. The question is whether it could be done effectively.
I mentioned before that LLVM IR is not intended to be used this way, although the Google Native Client project (in its portable variant, PNaCl) took LLVM and turned it into what you're suggesting.
C and C++ are quite different from Java. The size of the int type varies between platforms, for instance. They also have a preprocessor which allows the programmer to conditionally compile platform-specific code, e.g. intrinsics, fragments of assembly code, or workarounds. The program might use system-specific macros that expand before compilation.
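A minimal sketch of what I mean, assuming a GCC-, Clang- or MSVC-style compiler. The function names are made up for illustration; the macros tested are ones those compilers predefine. The preprocessor picks one branch and discards the others before any IR exists, so the platform decision is already baked in by the time a backend sees the code.

```c
#include <limits.h>
#include <stddef.h>

/* Returns the name of the code path the preprocessor selected.
   simd_path is a hypothetical name; the macros are compiler-predefined. */
const char *simd_path(void)
{
#if defined(__SSE2__) || defined(_M_X64)
    return "sse2";      /* an x86 build would put SSE2 intrinsics here */
#elif defined(__ARM_NEON)
    return "neon";      /* an ARM build would put NEON intrinsics here */
#else
    return "scalar";    /* portable fallback */
#endif
}

/* Storage size of int in bits on the target this was compiled for.
   The C standard only guarantees a range that needs at least 16 bits. */
size_t int_bits(void)
{
    return sizeof(int) * CHAR_BIT;
}
```

Only one of the three `return` statements survives preprocessing, so an IR generated from this file has no record of the other two.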
Languages like Haskell are very different from the sorts of languages that LLVM is built for. Even Java prefers to use its own backend, with tight integration with its GC.
There's also a package-management question, although this issue wouldn't be as significant. The C/C++ way is to have the build system (Autotools or CMake or whatever) detect what libraries are available on the system. If an optional library is missing, the C/C++ code is automatically adjusted by the build system, prior to compilation. It would be unusual to detect availability of libraries at runtime. This approach doesn't play nicely with a universal IR. This might not be an issue if the IR is treated as a surrogate for the native-code binary, but the IR wouldn't be a good surrogate for the source.
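To illustrate the build-time detection pattern, here is a rough sketch. `HAVE_ZLIB` is a hypothetical macro that a configure step would define if it found zlib, and `pack_buffer` and `built_with_zlib` are made-up names. The `compress` call is zlib's real one-shot API; everything else is an assumption for the sake of the example.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAVE_ZLIB            /* hypothetical macro set by the configure step */
#include <zlib.h>
#endif

/* Packs src into dst (capacity dst_cap). Returns the number of bytes
   written, or 0 on failure (an empty input also yields 0).
   pack_buffer is a hypothetical name. */
size_t pack_buffer(unsigned char *dst, size_t dst_cap,
                   const unsigned char *src, size_t src_len)
{
#ifdef HAVE_ZLIB
    uLongf out_len = (uLongf)dst_cap;
    if (compress(dst, &out_len, src, (uLong)src_len) != Z_OK)
        return 0;
    return (size_t)out_len;
#else
    /* zlib was not found at configure time: store the bytes uncompressed. */
    if (src_len > dst_cap)
        return 0;
    memcpy(dst, src, src_len);
    return src_len;
#endif
}

/* Reports which variant ended up in this binary. */
int built_with_zlib(void)
{
#ifdef HAVE_ZLIB
    return 1;
#else
    return 0;
#endif
}
```

Whichever branch is compiled, the other one is gone. An IR produced on a machine without zlib can't grow compression support later on a machine that has it.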
The C/C++ philosophy is to accommodate platform variations, in contrast with the JVM approach of mandating compliance to a virtual machine. With the JVM approach you forbid the sorts of variations that C and C++ permit (everything from int_fast32_t varying between platforms, to hand-written SIMD assembly).
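A small sketch of the `int_fast32_t` point (the struct and function names are hypothetical). The standard only promises at least 32 bits, and as far as I know some C libraries make it 64 bits wide on 64-bit targets while others keep it at 32, so the same source gives different struct layouts on different platforms.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record type whose layout depends on the platform's
   choice of int_fast32_t. */
struct record {
    int_fast32_t id;
    char tag;
};

/* Storage size of int_fast32_t in bits on this target (at least 32). */
size_t fast32_bits(void)
{
    return sizeof(int_fast32_t) * CHAR_BIT;
}

/* Size of the record in bytes, including any padding the ABI adds. */
size_t record_size(void)
{
    return sizeof(struct record);
}
```

An IR has to commit to one answer for both functions, which is exactly the kind of variation the JVM approach rules out up front.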
Others have already mentioned WASM and Google Native Client, both of which are stable, but neither of which is going to become a mainstream way of distributing Unix application code.
This topic has turned up on HN before, but frustratingly I wasn't able to find the thread.
> Universal binaries might also be acceptable for commercial, closed-source applications where source distribution would not.
True, but I think modern Unix OSs do a pretty good job on ABI stability. If they want portability without releasing source (something I don't think GNU/Linux should aim to accommodate, incidentally) they already have other options, like Java. JavaFX doesn't get much attention but it pretty much 'just works' for portable GUI applications.
Edit: skissane has an interesting comment on ANDF, a solution I hadn't heard of before.