Is this actually smaller than TCC? It's kind of hard to tell since TCC is split into files and I don't know what's actually necessary. And that question includes earlier, less featureful versions of TCC. Note that the more-or-less-C compiler it's based off of is absolutely miniscule: https://bellard.org/otcc/
Also if you add up the binaries used via dynamic linking you are looking at which in the case of /lib/x86_64-linux-gnu/libc.so.6 is 16MB in size?
Did I mention it works on all x86+amd64 POSIX systems too?
It uses calloc, fopen, and fgetc, which can trivially forwarded to the same syscalls as stage0 with only a few bytes. It uses isalnum, isspace, isdigit, strcpy, and strstr, which are all trivial. Add all that up and maybe you need to penalize it by a kilobyte. I'm not sure exactly how it's using dlsym but it seems to also be worth few bytes. It's not using the other 16MB of libc.
Which makes for a pertinent question: can you compile GCC or Clang with this?
OTCC is <4KB. It's admittedly using tiny names to squeeze under a size limit, but even when you correct for that it's a lot smaller.
(Edit: At the time of submission the link went to https://github.com/oriansj/stage0/blob/master/Linux%20Bootst... with a description of "World's Smallest C Compiler")
It's even smaller than l1 cache. (In case you wanted to bruteforce every possible text file you could feed this binary, and wanted to do it all on cache in the CPU).
Maybe it's the size of payload you can put into a usb C cable or something.
I mean it's just so freaking small. Any ideas what limits this is "small enough" to fit under?
The part you grab on a typical usb C cable is just about the size of a microsd card. You might have to use a slightly narrower chip, but you can also go many times thicker. So if you're sneaking storage into a usb C cable think "terabyte".
(Unless you mean hijacking an existing chip, which might have 0 writable storage or might have a megabyte, who knows.)
It could comfortably fit inside the Apollo lander computer's RAM three times over.
You can easily build a ROM this big in Minecraft. https://www.youtube.com/watch?v=e4TXjhZLHpw
You could encode it into a relatively short Twitter thread https://qntm.org/twitcodings
But perhaps it's the smallest if we count transitive dependencies! (by this I mean count the lines of code of the program, and every program you need to build the program and everything you need to build them and so on)
I believe the "size" refers to the binary size of the compiler.
How is this measured/qualified? These days, I would doubt that 70% of people whose primary job is to write code have any knowledge of assembly whatsoever, so a naive reading of the above paragraph seems unlikely to succeed.
If pressed, I suspect most of the programmers in the world could read assembly. They might hate it, but they could do it, if given sufficient motivation. Simplified assembly used to be written as a game.
https://www.corewars.org/index.html
so a naive reading of the above paragraph seems unlikely to succeed.
How naive are you going here? Turn it into a contest, where the versions of the code contain backdoors, and contestants are ranked by how quickly and accurately they can identify them. Arrange for cash prizes, and you'd have your determination.
I strongly disagree. I work with a lot of very smart people. These people are Phd researchers at the top of their field, working on cutting edge algorithmic development. Their work is primarily mathematical, but they write enough MATLAB to prove that what they're developing really works. They write enough code that I think most people would consider them "programmers", yet they absolutely do not understand C++, much less assembly. As I said, these are very smart people; they could certainly learn, but they have no motivation to do so. It's not their job to understand all of the details of how a computer works (it's mine, more or less).
>How naive are you going here? Turn it into a contest, where the versions of the code contain backdoors, and contestants are ranked by how quickly and accurately they can identify them. Arrange for cash prizes, and you'd have your determination.
I didn't really mean naive in the sense of the people, but in the intent of the author. When I read "70% of the population of programmers", I think of myself and my ~6 coworkers who spend ~50+% of their time manipulating code. That's the simplest (i.e. naive) definition of "programmer" that I can come up with. If the author intended a different definition (like "people who claim to understand any assembly language"), then that definition might exclude my coworkers, making the goal a lot more achievable.
For the naive definition, only 14% (1/7) of my group have any chance of understanding this project. I think you could find a lot of front-end web focused groups where the percentage was much lower than that, and at this point I think those groups far outnumber the embedded systems groups where the number would be much closer to 100%.
Interpretability is another part of the problem: even if we recovered an executable copy of Ivan Sutherland's historically groundbreaking program SKETCHPAD, for example, we wouldn't be able to run it because we don't know the instruction set for the computer it was built for. Remember that the entire body of knowledge about Ancient Egyptian culture was lost in the 5th Century, when the Christian Dark Age closed the temples, and not regained for almost 1400 years — and then only due to the great good fortune of the Rosetta Stone.
A bootstrappable computing stack is a crucial part of the "Rosetta Stone" that will be needed to preserve 21st-century knowledge. One of the few papers tackling the interpretability problem in this form is http://www.vpri.org/pdf/tr2015004_cuneiform.pdf, "The Cuneiform Tablets of 2015", by Long Tien Nguyen and Alan Kay.
There's a more immediate necessity, though. As recent events make clear — Chrome's extension API kneecapping ad-blockers, the increasing effectiveness of Chinese censorship, and the shocking US$12 million award to Nintendo last November against ROM site operators for preserving classic video games, for example — the current political and economic system cannot be trusted to preserve our access to our cultural heritage, even during our lifetimes. That means that we need an autonomously-bootstrappable trustworthy free-software infrastructure that is viable without the massive economies of scale that fund mainstream platforms like Linux, Android, MacOS, Chrome, and even Firefox. If your personal archive of the Tank Man photo, the Arab Spring tweets, or the video of the murder of Philando Castile runs afoul of future malicious-content filters integrated into your operating system, there is no guarantee that it, or you, will survive.
So we're doing our best to get some green shoots established before the situation has any opportunity to get worse.
* Egg of the Phoenix (Blog post) - http://canonical.org/~kragen/eotf/
* The Cuniform Tablets of 2015 (Blue-sky academic research) - http://www.vpri.org/pdf/tr2015004_cuneiform.pdf
* Preventing The Collapse of Civilization (Video) - https://www.youtube.com/watch?v=pW-SOdj4Kkk
* Coding Machines (scifi story about trusting-trust attack) - https://www.teamten.com/lawrence/writings/coding-machines/
It sounds stupid, I know, but when I investigated possible paths for getting from zero to GCC, flex looked like the biggest potential obstacle.
There are also C files in GCC/binutils that are generated by complex shell scripts. That was perhaps the second biggest obstacle.
(If I recall correctly, bison's not a problem: old versions of bison don't use bison.)
Bison is my next target, but I haven't been able to work much on it lately. It's true that it should be possible to bootstrap it by history, although I really hope that one does not need to many steps to get to the latest release.
I hope Github improves Org mode support at some point.
https://github.com/wallyqs/org-ruby/issues/64
I have been looking at AsciiDoc:
https://asciidoctor.org/docs/asciidoc-syntax-quick-reference
I know. That's why I said "I hope Github improves Org mode support at some point." :)
> I have been looking at AsciiDoc
If you like Org mode, you can overcome the Github/etc. Org mode rendering limitations by hosting small static sites for your projects. You may choose to do so by simply exporting Org to HTML using ox-html[1], or even exporting Org to markdown for Hugo (disclaimer: my package -> ox-hugo)[2].
I mean, if you like using Org mode, and Github et al don't see the value in improving support for that, don't use them to render Org mode docs :)
What a wonderfully mad, cool goal.
stage0 is one of mes' inspirations, so I'd say there's a level of connection there.
Apparently mes is also working towards being able to be compiled by M2-Planet by the same author [0], at which point it might be possible to eventually build mes from stage0 as it can now bootstrap M2-Planet.
The main contributing points seem to be github and gitlab, rather than savannah.
If not, then the smallest binary to bootstrap a C compiler is actually a single jump to a C compiler in memory with a README containing the memory dump in to be typed in :)
Seriously though, it reminds me of the Toaster Project [0] where an RCA student attempted to build a modern toaster without using the modern supply chain.
On a related note, one can write arbitrary bytes to a file (on a Unix-like system) using GNU 'echo' or the 'printf' utilities. Chris Wellons' post "A Magnetized Needle and a Steady Hand" describes how to write a basic utility in this way.