It looks like a very clever way of packaging an x86 fat binary for multiple platforms, without actually duplicating the code. To support ARM I assume it’ll need to be an actual fat binary, with both x86 and ARM code. At that point, unless I actually test the code myself on both architectures, how can I be confident it’s going to work properly?
If you’re using very simple C constructs and not doing anything fancy, it should work, but it’s not clear to me that this approach is preferable to e.g. a Python script. If you’re doing fancy stuff, it’s a bit more chancy as C isn’t memory-safe and has tons of undefined behavior and platform-specific weirdness.
Java is “run anywhere” because they’ve specified the JVM in massive detail and tried to ensure it actually works the same on all platforms. I don’t see how you can have that same confidence if you’re running machine code everywhere.
I guess I just don’t see the use case where this is compelling. If I write a handy Unix utility in C, I’ll just keep the source code around, and compile it as needed.
It might be handy if you need to move such a utility quickly from Unix to Windows, if you don’t have any dev tools set up. But I can’t think of a situation when I’ve needed that.
I’ll be using this feature in my day-to-day work because I interact with short-lived VMs running the software I’m developing. Being able to run lnav on my devbox and have it live-tailing logs on the VMs running the changes will be an improvement over the current workflow where I manually copy the files onto the devbox.
Can't wait to try this feature.
One use case is where the source and target are different operating systems, and either the target does not have a compiler installed or the source does not have a cross-compiler installed.
I run into this often. I do not wish to install Python or Java on every computer. I have small form factor computers where I need to conserve space. Also, I prefer the size and speed of C over Java and Python.
I think there may be a tad too much "marketing" and trying to be cute in the way this is presented but this work certainly has potential utility, whether or not it is used as intended. Even just reading through the hacks is educational.
Programming does not always have to be commercially-oriented to be useful or interesting.
Agreed!
Thanks for the response, that does sound like an area where this could be useful.
Although (speaking as somebody who doesn’t do a lot of embedded programming these days, so take this with a pinch of salt) I would have thought that if cross-compilation could be made much faster and easier, that might be a better solution overall. I really like what’s being done with Zig around cross-compilation.
Do you mean different OSes? AFAIK, Cosmopolitan isn't cross-architecture.
Isn't this exactly why you don't see the use case? You're willing to compile.
As someone working on a cross-platform, cross-language packaging tool (https://github.com/spack/spack), it's very appealing to not have to build for every OS and Linux distro. Currently we build binaries per-OS/per-distro. This would eliminate a couple dimensions from our combinatorial builds.
We still care a lot about non-x86_64 architectures, so that's still an issue, but the work here is great for distributors of binaries. It seems like it has the potential to replace more cumbersome techniques like manylinux (https://github.com/pypa/manylinux).
What varies between Linux distros, for your purposes? Different libc? I would naively assume it’s all just ELF so it shouldn’t be that big a deal to make a portable Linux binary.
That seems separate from swizzling ELF/Mach-O/PE, specializing the binary on first run(!), etc, which is all super cool but something I’d be wary of relying on as a solid platform. Maybe I’m being too cautious, though!
Reading the Cosmopolitan docs, it has some really clever optimizations, and I think I’d be more excited about simply a small and fast libc over the flashier APE parts.
Let's see if I can get some inspiration from this repo for how things can be built.
`o/$(MODE)/depend`, a makefile-looking file with the full dependency tree. It is compiled using tool/build/mkdeps.c. So we have a Makefile generator (Justine, you mentioned elsewhere in this thread you didn't want to invent a build system? :)) The Makefile generator is very specific to this project: parses C files and creates that tree.
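To make the idea of a dependency generator concrete, here is a rough sketch of scanning quoted `#include` lines and emitting one make rule. It is not how tool/build/mkdeps.c works; the names `collect_deps`, `make_rule` and `demo`, the `o/` object prefix, and the assumption that includes are written relative to the repo root are all mine, for illustration only.

```python
import re
import tempfile
from pathlib import Path, PurePosixPath

# Matches lines such as: #include "libc/a.h" (angle-bracket includes are ignored).
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*"([^"]+)"', re.MULTILINE)

def collect_deps(source, root, seen=None):
    """Follow quoted includes transitively, resolving each one against root."""
    seen = [] if seen is None else seen
    text = (Path(root) / source).read_text()
    for header in INCLUDE_RE.findall(text):
        # The 'seen' list also protects against include cycles.
        if header not in seen and (Path(root) / header).exists():
            seen.append(header)
            collect_deps(header, root, seen)
    return seen

def make_rule(source, root, objdir="o/"):
    """Format a rule: the object file depends on its source and its headers."""
    target = objdir + str(PurePosixPath(source).with_suffix(".o"))
    prereqs = [source] + collect_deps(source, root)
    return target + ": \\\n    " + " \\\n    ".join(prereqs)

def demo():
    """Build a tiny throwaway source tree and generate a rule for it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        files = {
            "tool/hello.c": '#include "libc/a.h"\nint main(void) { return 0; }\n',
            "libc/a.h": '#include "libc/b.h"\n',
            "libc/b.h": "#define B 1\n",
        }
        for name, body in files.items():
            (root / name).parent.mkdir(parents=True, exist_ok=True)
            (root / name).write_text(body)
        return make_rule("tool/hello.c", root)

print(demo())
```

A real generator would also have to deal with conditional includes, assembly sources and generated headers, which this sketch ignores.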
It is damn fast and, so far, beautifully documented (at least the build parts I looked at). Hell, even documentation lines in file preambles are 72 characters wide, and justified. Crazy. Do you manually justify those?
You are also vendoring statically-built GCC, and the folder with executables is <10MB. LLVM C/C++ toolchain is hundreds of megs, compressed.
I am certainly taking inspiration from the tight control of the compiler toolchain and the beautiful documentation. Not sure I will write my own Makefile generator, since my project is also not that big.
Thanks. Cosmopolitan is giving me much more to look at than an αcτµαlly pδrταblε εxεcµταblε.
gcc has a family of command-line options starting with -M that can generate dependencies as a byproduct during compilation. The generated files are in Makefile format, and only need to be included in your main Makefile.
https://make.mad-scientist.net/papers/advanced-auto-dependen...
Each and every build tool should be version-pinned so that two build runs, one today and one 4 years ago, produce the same (hex neutral) executable.
Otherwise debugging issues becomes a nightmare...
Years ago I had a customer who wanted to freeze a toolset and wanted to be sure that any bug fix they requested changed only the lines of code relevant to the bug (they diffed the binaries and traced each change back to our source change to be sure). They paid an enormous premium for this capability.
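For what it's worth, checking that two builds came out identical, or finding where they first diverge, takes very little code. This is only a sketch: `sha256_of` and `first_difference` are names I made up, and the two byte strings stand in for real build outputs.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Hex digest of a build artifact's contents."""
    return hashlib.sha256(data).hexdigest()

def first_difference(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One artifact is a strict prefix of the other.
        return min(len(a), len(b))
    return None

# Stand-ins for the same program built at two different times.
build_old = b"\x7fELF" + bytes(12) + b"code-v1"
build_new = b"\x7fELF" + bytes(12) + b"code-v1"
build_patched = b"\x7fELF" + bytes(12) + b"code-v2"

print(sha256_of(build_old) == sha256_of(build_new))
print(first_difference(build_old, build_patched))
```

Comparing digests answers "did anything change?"; the offset is where you would start tracing a change back to a source line, as that customer did.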
There are 132,059 lines of Makefile code generated into o/$(MODE)/depend, e.g.
  o//libc/stubs/gcov.o: \
      libc/stubs/gcov.S \
      libc/macros.internal.h \
      libc/macros.internal.inc \
      libc/macros-cpp.internal.inc \
      ape/relocations.h
That much code can't be written by hand, and if you don't write it, your build targets won't be invalidated correctly. You'll end up with a non-deterministic, unreliable build, which is much worse than generating some unfancy make that causes the make process to bootstrap itself. The goal is to get those hex-perfect reproducible binaries with minimal toil.
Also, when you write build configs, do you depend on system-provided tools and libraries? Such as some .so file or the python interpreter? The cosmopolitan mono repo doesn't do that. It currently only requires the make, sh, zip, mv, rm, touch and gzip commands. I'd ideally like to make it more hermetic but so far that hasn't been an issue, since the above tools are so stable.
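To illustrate why those generated rules matter for invalidation, here is a small sketch that reads the gcov.o rule quoted above and applies the usual make-style staleness test. `parse_rule` and `is_stale` are hypothetical helpers, the timestamps are invented, and real make has many more cases than this.

```python
# The rule quoted in the comment, kept verbatim (raw string keeps the backslashes).
RULE = r"""o//libc/stubs/gcov.o: \
libc/stubs/gcov.S \
libc/macros.internal.h \
libc/macros.internal.inc \
libc/macros-cpp.internal.inc \
ape/relocations.h"""

def parse_rule(text):
    """Split one rule with backslash continuations into (target, prerequisites)."""
    joined = text.replace("\\\n", " ")
    target, _, rest = joined.partition(":")
    return target.strip(), rest.split()

def is_stale(target_mtime, prereq_mtimes):
    """A target needs rebuilding if it is missing or older than any prerequisite."""
    return target_mtime is None or any(m > target_mtime for m in prereq_mtimes)

target, prereqs = parse_rule(RULE)
print(target, len(prereqs))

# Invented modification times: one header was touched after the object was built.
print(is_stale(100, [90, 95, 120, 80, 70]))
```

If ape/relocations.h were missing from the list, editing it would never trigger a rebuild of gcov.o, which is the incorrect invalidation being described.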
It simply represents different tradeoffs of convenience vs. correctness. With Bazel, you get correctness but you pay a complexity price.
> Cosmopolitan makes C a build-once run-anywhere language, similar to Java, except it doesn't require interpreters or virtual machines be installed beforehand. Cosmo provides the same portability benefits as high-level languages like Go and Rust, but it doesn't invent a new language and you won't need to configure a CI system to build separate binaries for each operating system. What Cosmopolitan focuses on is fixing C by decoupling it from platforms, so it can be pleasant to use for writing small unix programs that are easily distributed to a much broader audience.
... and it only supports x86 (without binary translation), right? It's great to see progress like this, but it's poor form to suggest it's build-once run-anywhere in the same sense that Java is. As far as I can tell, it's not trivial to run these binaries on a RPi.
There is actually a discussion on that on this page, I'll embed part of the relevant discussion here:
> It'll be nice to know that any normal PC program we write will "just work" on Raspberry Pi and Apple ARM. All we have to do is embed an ARM build of the emulator above within our x86 executables, and have them morph and re-exec appropriately, similar to how Cosmopolitan is already doing with qemu-x86_64, except that this wouldn't need to be installed beforehand. The tradeoff is that, if we do this, binaries will only be 10x smaller than Go's Hello World, instead of 100x smaller. The other tradeoff is the GCC Runtime Exception forbids code morphing, but I already took care of that for you, by rewriting the GNU runtimes.
Not exactly what you meant, perhaps, but in the same ball-park
It's extremely impressive, and dare I say useful for several cases, but it isn't really like Java unless you squint really hard and pretend virtualization/emulators like QEMU are akin to a JVM, which is the way I understand that claim is supposed to be taken.
I mean, the analogy works, and I respect why the author believes this is more practically useful than Java, but it's like saying (IMO) that Linux binaries are a universal standard because you can just virtualize or emulate a Linux kernel on the cheap, a la Docker for the Desktop or whatever it's called.
- https://news.ycombinator.com/item?id=26271117
Is my understanding correct that the binary changes itself when first run on the target platform? That sounds like it will raise a lot of red flags with automated defensive mechanisms like antivirus.
What I'd like is 2 mechanisms:
First, the ability to disable this optimisation altogether.
Second, a means to run this optimisation as a distinct step, without executing the rest of the binary. For example, if the "--optimize" flag is used, do the optimisation and then exit.
> All you need to do is download the redbean.com program below, change the filename to .zip, add your content in a zip editing tool, and then change the extension back to .com.
> That performance is thanks to zip and gzip using the same compression format, which enables kernelspace copies.
Oh my. Having a web server executable which is a zip archive at the same time is a lovely idea. Have there been any other attempts similar to this?
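The general trick of a file that is both a program and a zip archive works because zip readers locate the archive from the end of the file. Here is a sketch using only Python's standard `zipfile` module; it is not redbean's implementation, `append_assets` is a made-up helper, and the 64-byte stub merely stands in for a real executable.

```python
import io
import zipfile

def append_assets(program: bytes, assets: dict) -> bytes:
    """Append a zip archive holding the assets to the end of a program image."""
    buf = io.BytesIO(program)
    # Mode "a" on a non-zip file appends a fresh archive after the existing bytes.
    with zipfile.ZipFile(buf, mode="a", compression=zipfile.ZIP_DEFLATED) as zf:
        for name, body in assets.items():
            zf.writestr(name, body)
    return buf.getvalue()

# Stand-in for an executable: an ELF-looking magic followed by padding.
stub = b"\x7fELF" + bytes(60)
blob = append_assets(stub, {"index.html": b"<h1>hi</h1>"})

# The leading bytes are untouched, so the program part is still intact...
print(blob.startswith(stub))

# ...and a zip reader still finds the archive after the prepended data.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    print(zf.namelist(), zf.read("index.html"))
```

Self-extracting archives rely on the same property.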
From the latest issue:
Technical Note: The electronic edition of this magazine is valid as both PDF and ZIP. The PDF has been cryptographically signed with a factored private key for the TI 83+ graphing calculator.
No way they'd agree on a common binary format.
It says in the license:
Copyright 2020 Justine Alexandra Roberts Tunney
ISC License (same as MIT or BSD with unnecessary text removed)
I suppose this level of portability is more a feature if you're shipping to PCs anyway, though. If you're deploying to servers, you know the arch and OS ahead of time and there is no obvious downside I can think of to just targeting it directly.
> Please note that your APE binary will assimilate itself as a conventional resident of your platform after the first run, so it can be fast and efficient for subsequent executions.
I understand there may be no real way out, but this defeats part of the promise/purpose of APE: assume I use an APE binary, know it's APE, and implicitly share it or copy it to a different machine. But once I have copied it from ~/bin/ it's not APE any more, it's optimized for my OS/arch!
Assuming the first-run optimization step is necessary, would it make sense to provide a `--deoptimize` or `--apeize` flag to the same binary, so it's easy to return to the original, without recompiling it from source? Does it lose information when it optimizes? Can that information be tucked away (with an optional flag or env variable) for this step?
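Until something like that exists, one way to tell whether a copy has already been converted would be to look at its first few bytes. A hedged sketch: `sniff_format` is a hypothetical helper, the ELF, Mach-O and MZ magics are the standard ones, and the `MZqFpD` prefix for an unconverted APE file is my assumption from the APE write-up, not something I have checked against every release.

```python
def sniff_format(header: bytes) -> str:
    """Guess an executable's container format from its leading magic bytes."""
    if header.startswith(b"\x7fELF"):
        return "ELF"
    if header.startswith(b"\xcf\xfa\xed\xfe"):
        # 0xfeedfacf stored little-endian: 64-bit Mach-O.
        return "Mach-O (64-bit)"
    if header.startswith(b"MZqFpD"):
        # Assumed APE prefix; must be tested before the plain "MZ" case.
        return "APE (assumed magic)"
    if header.startswith(b"MZ"):
        return "PE/DOS"
    return "unknown"

# Made-up headers standing in for the first bytes of real files.
for sample in (b"MZqFpD='\n", b"\x7fELF\x02\x01\x01", b"MZ\x90\x00", b"#!/bin/sh\n"):
    print(sniff_format(sample))
```

In practice you would read the first 8 bytes or so of the file in binary mode and pass them in.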
What happens if the binary is read-only?
.. Add Fabrice Bellard's JavaScript engine to third party
.. Add SQLite to third party
* Java is slowing our tools down
* amalgamation sqlite style is showing the benefits of tightly written, standalone good old C.
* By extension: monorepos are introducing complexity, since they depend on JVM for blaze/bazel.
* x86 is pervasive in our industry
People might also want to consider the cost/benefit trade-off for binary vs source compatibility. If you can code in a programming language that works across platforms and can be readily transpiled to one of the supported statically typed languages with a robust, small and fast toolchain, you have any number of packagers who can quickly make binaries for your platform of interest, which makes it convenient to install. You get the benefit of better static analysis vs good old C.
> Please note that your APE binary will assimilate itself as a conventional resident of your platform after the first run...
That's funny.