The Journey Before main()

(amit.prasad.me)

316 points · amitprasad · 8mo ago · 143 comments

54 comments · 14 top-level

vbezhenar · 8mo ago · 14 in thread

I wonder how many C projects prefer to avoid the standard library and just invoke Linux syscalls directly. Much more fun to write software this way, IMO.

electroly · 8mo ago

Not exactly the same, but on Windows if you use entirely Win32 calls you can avoid linking any C runtime library. Win32 is below the C standard library on Windows and the C runtime is optional.

okanat · 8mo ago

This is one of the cornerstones that guarantee Windows can easily upgrade the C runtime and make performance and security upgrades. Win32 APIs have a different function calling ABI too.

So the only part that gets "bloated" is the Win32 API itself (which is spread across multiple DLLs and doesn't actually bloat RAM usage). Most of the time even those functions and structures are carefully designed to have some future-proofness, but it is usual to see APIs like CreateFile, CreateFile2, CreateFile3. Internally the earlier versions are upgraded to call the latest version. So not so much bloating there either.

When the C runtime and the OS system calls are combined into a single binary, as on POSIX systems, it creates the ABI hell we're in with the modern Unix-likes. Either the OSes have to regularly break C ABI compatibility for updates or we have to live with terrible implementations.

The GNU libc and Linux combo is particularly bad. On GNU/Linux (or with any other current libc replacement), dynamic loading is also provided by the C library. This makes "forever" binary file compatibility particularly tricky to achieve. Glibc broke certain games / Steam by removing some parts of their ELF implementation: https://sourceware.org/bugzilla/show_bug.cgi?id=32653 . They backed down due to huge backlash from the community.

If "the year of the Linux desktop" is ever to happen, they need to either do an Android and change the definition of what a software package is, or split Glibc into 3 parts: syscalls, dynamic loader and the actual C library.

PS: There is actually a catch to your "C runtime is optional" argument. Microsoft still intentionally holds back the ability to compile native ABI Windows programs without Visual Studio.

The structured exception handlers (the Windows equivalent of SIGILL, SIGBUS, etc., though not of SIGINT or SIGTERM) are populated by the object files from the C runtime libraries (called VCRuntime/VCStartup). So it is actually not possible to have official Windows binaries without MSVC or another C runtime like Mingw-64 that provides those symbols. It looks like some developers at Microsoft wanted to open-source VCRuntime / VCStartup but it was ~vetoed~ not fully approved by some people: https://github.com/microsoft/STL/issues/4560#issuecomment-23... , https://www.reddit.com/r/cpp/comments/1l8mqlv/is_msvc_ever_g...

4 more replies

codedokode · 8mo ago

I think using syscalls directly is a worse idea than loading shared libraries, and new kernel features like ALSA (audio playback), DRM (graphics rendering) and others use libraries instead of documenting syscalls and ioctls. This is better because it allows intercepting and subverting the calls, adding support for features even if the kernel doesn't support them, makes it easier to port code to other OSes, supports different architectures (32-bit code on a 64-bit kernel), and allows changing the kernel interface without breaking anything. So the Windows-style approach with system libraries is better in every aspect.

matheusmoreira · 8mo ago

I once wrote a liblinux project just for this!! It was indeed extremely fun. Details in my other comment:

https://news.ycombinator.com/item?id=45709141

I abandoned it because Linux itself now has a rich set of nolibc headers.

Now I'm working on a whole programming language based around this concept. A freestanding lisp interpreter targeting Linux directly with builtin system call support. The idea is to complete the interpreter and then write the standard library and Linux user space in lisp using the system calls.

It's been an amazing journey. It's incredible how far one can take this.

1718627440 · 8mo ago

I generally try to stay portable, but file descriptors are just too nice not to use.

Retr0id · 8mo ago

File descriptors are part of the Linux syscall API, not libc. Are you thinking of FILE?

2 more replies

jjmarr · 8mo ago

Tons of driver code does this.

forrestthewoods · 8mo ago

You had me with “avoid C standard library” but lost me at “invoking Linux syscalls directly”.

Windows support is a requirement, and no, WSL2 doesn’t count.

C standard library is pretty bad and it’d be great if not using it was a little easier and more common.

WJW · 8mo ago

Obviously only a requirement if you intend your software to run under Windows. But if you don't, why bother? Not all software is intended to be distributed to users far and wide. Some of it is just for yourself, and some of it will only ever run on Linux servers.

1 more reply

rfl890 · 8mo ago

You can make CRT-free Win32 programs, read this guide[1] and you're all set. I've written a couple CLI utilities which are completely CRT-free and weigh just under a few kilobytes.

[1]: https://nullprogram.com/blog/2023/02/15/

2 more replies

antihero · 8mo ago

> Windows support is a requirement

Why, exactly?

AnimalMuppet · 8mo ago

> Windows support is a requirement...

For what?

There is some software for which Windows support is required. There is other software for which it is not, and never will be. (And for an article about running ELF files on RISC-V with a Linux OS, the "Windows support" complaint seems a bit odd...)

throwawaysoxjje · 8mo ago

A requirement from whom? To do what?

pmc00 · 8mo ago

You can do this in Windows too, useful if you want tiny executables that use minimum resources.

I wrote this little systemwide mute utility for Windows that way, annoying to be missing some parts of the CRT but not bad, code here: https://github.com/pablocastro/minimute

1 more reply

fweimer · 8mo ago · 7 in thread

> The ELF file contains a dynamic section which tells the kernel which shared libraries to load, and another section which tells the kernel to dynamically “relocate” pointers to those functions, so everything checks out.

This is not how dynamic linking works on GNU/Linux. The kernel processes the program headers for the main program (mapping the PT_LOAD segments, without relocating them) and notices the PT_INTERP program interpreter (the path to the dynamic linker) among the program headers. The kernel then loads the dynamic linker in much the same way as the main program (again without relocation) and transfers control to its entry point. It's up to the dynamic linker to self-relocate, load the referenced shared objects (this time using plain mmap and mprotect; the kernel ELF loader is not used for that), relocate them and the main program, and then transfer control to the main program.

The scheme is not that dissimilar to the #! shebang lines, with the dynamic linker taking the role of the script interpreter, except that ELF is a binary format.
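To make the PT_INTERP step concrete, here is a minimal sketch of walking the program header table and pulling out the interpreter path. It is an illustration, not the kernel's code: `find_interp` is a hypothetical helper name, and the sketch assumes a 64-bit little-endian ELF image with field offsets taken from the ELF64 layout.

```python
import struct

PT_INTERP = 3  # program header type holding the interpreter path (elf.h)

def find_interp(elf: bytes):
    """Return the PT_INTERP path of an ELF64 little-endian image, or None."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    # ELF64 header: e_phoff at offset 32, e_phentsize at 54, e_phnum at 56.
    (e_phoff,) = struct.unpack_from("<Q", elf, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 54)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        # ELF64 program header: p_type, p_flags, p_offset, then p_filesz at +32.
        p_type, _p_flags, p_offset = struct.unpack_from("<IIQ", elf, base)
        (p_filesz,) = struct.unpack_from("<Q", elf, base + 32)
        if p_type == PT_INTERP:
            # The segment content is a NUL-terminated path string.
            return elf[p_offset:p_offset + p_filesz].rstrip(b"\0").decode()
    return None  # no interpreter requested, e.g. a statically linked binary
```

On a Linux machine, something like `find_interp(open("/bin/ls", "rb").read())` should return the dynamic linker path for a dynamically linked binary; the exact path depends on the distribution and architecture.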

matheusmoreira · 8mo ago

Yeah it turns out the kernel doesn't care about sections at all. It only ever cares about the PT_LOAD segments in the program header table, which is essentially a table of arguments for the mmap system call. Sections are just dynamic linker metadata and are never covered by PT_LOAD segments.

This seems to be a common misconception. I too suffered from it once... Tried to embed arbitrary files into ELF files using objcopy. The tool could easily create new sections with the file contents just fine, but the kernel wouldn't load them into memory. It was really confusing at first.

https://stackoverflow.com/q/77468641

There were no tools for patching the program header table, I ended up making them! The mold linker even added a feature just to make this patching easy!

https://www.matheusmoreira.com/articles/self-contained-lone-...
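As a rough illustration of the "PT_LOAD is a table of mmap arguments" point above, the permission bits of a segment translate almost one-to-one into mmap protection flags. This is a sketch covering only the permission translation (a real loader also handles page alignment and zero-filling the part of a segment beyond its file size); `prot_for_segment` is a hypothetical name.

```python
import mmap

# Segment permission bits from elf.h.
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def prot_for_segment(p_flags: int) -> int:
    """Translate ELF p_flags into the PROT_* bits an mmap call would take."""
    prot = 0
    if p_flags & PF_R:
        prot |= mmap.PROT_READ
    if p_flags & PF_W:
        prot |= mmap.PROT_WRITE
    if p_flags & PF_X:
        prot |= mmap.PROT_EXEC
    return prot
```

A typical text segment carries `PF_R | PF_X`, which becomes `PROT_READ | PROT_EXEC`, matching the `r-x` mappings that show up in /proc/<pid>/maps.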

mkoubaa · 8mo ago

I've always wondered why there weren't more popular loaders to choose from given that on Linux loaders are user-space

fweimer · 8mo ago

With containers, you usually get incompatible dynamic loaders in the containers (see mananaysiempre's comment; the glibc dynamic linker sees rather active development in some LTS distributions). This wouldn't be possible if the loader were part of the kernel.

Non-ELF loaders are fairly common, too. It's how Wine works, and how Microsoft reuses PE/COFF SQL Server binaries on Linux.

ksherlock · 8mo ago

There's also binfmt support, which can check a supposedly executable file against some magic and auto-launch an interpreter (like wine or java or dosemu). I looked into it for something once but in my case the magic wasn't good enough.

https://www.kernel.org/doc/html/latest/admin-guide/binfmt-mi...

1 more reply

mananaysiempre · 8mo ago

Part of it is the Glibc loader’s carnal knowledge of Glibc proper; there’s essentially no module boundary there. (That’s not completely unjustified, but Glibc is especially hostile there, like in its many other architectural choices.) Musl outright merges the two into a single binary. So if you want to do a loader then you’re also doing a libc.

Part of it for desktop Linux specifically is that a lot of the graphics stack is very unfriendly to alternative libcs or loaders. For example, Wayland is nominally a protocol admitting multiple implementations, but if you want to not be dumb[1] and do GPU-accelerated graphics, then the ABI ties you to libwayland.so specifically (event-loop opinions and all) in order to load vendor-specific userspace drivers, which entails your distro’s preferred libc (probably Glibc).

[1] There can of course be good engineering reasons to be dumb.

1 more reply

BobbyTables2 · 8mo ago

I suspect it is because they get really hairy.

Loading ELFs and processing relocations is actually not too bad. It’s fun after the initial learning curve.

Then one has to worry about handling of “dlopen” and the loader creating the data structures it cares about. Yuck!!!

It’s kinda a shame because the glibc loader is a bit bloated with all the audit and preload handling. Great for flexibility, not for security.

1 more reply

amitprasad (OP) · 8mo ago

You’re right, and I knew this back in February when I wrote most of this post. I must have revised it down incorrectly before posting; will correct. Bit of a facepalm from my side.

bignerd_95 · 8mo ago · 7 in thread

As someone who teaches this stuff at university, I see students getting confused every single year by how textbooks draw memory. The problem is mostly visual, not conceptual.

Most diagrams in books and slides use an old hardware-centric convention: they draw higher addresses at the top of the page and lower addresses at the bottom. People sometimes justify this with an analogy like “floors in a building go up,” so address 0x7fffffffe000 is drawn “higher” than 0x400000.

But this is backwards from how humans read almost everything today. When you look at code in VS Code or any other IDE, line 1 is at the top, then line 2 is below it, then 3, 4, etc. Numbers go up as you go down. Your brain learns: “down = bigger index.”

Memory in a real Linux process actually matches the VS Code model much more closely than the textbook diagrams suggest.

You can see it yourself with:

cat /proc/$$/maps

(pick any PID instead of $$).

    ...
    [0x00000000] lower addresses
    ...
    [0x00620000] HEAP start
    [0x00643000] HEAP extended ↓ (more allocations => higher addresses)
    ...
    [0x7ffd8c3f7000] STACK top (<- stack pointer)
                      ↑ the stack pointer starts here and moves upward
                      (toward lower addresses) when you push
    [0x7ffd8c418000] STACK start
    ...
    [0xffffffffff600000] higher addresses
    ...

The output is printed from low addresses to high addresses. At the top of the output you'll usually see the binary, shared libs, heap, etc. Those all live at lower virtual addresses. Farther down in the output you'll eventually see the stack, which lives at a higher virtual address. In other words: as you scroll down, the addresses get bigger. Exactly like scrolling down in an editor gives you bigger line numbers.

The phrases “the heap grows up” and “the stack grows down” aren't wrong. They're just describing what happens to the numeric addresses: the heap expands toward higher addresses, and the stack moves into lower addresses.

The real problem is how we draw it. We label “up” on the page as “higher address,” which is the opposite of how people read code or even how /proc/<pid>/maps is printed. So students have to mentally flip the diagram before they can even think about what the stack and heap are doing.

If we just drew memory like an editor (low addresses at the top, high addresses further down) it would click instantly. Scroll down, addresses go up, and the stack sits at the bottom. At that point it’s no longer “the stack grows down”: it’s just the stack pointer being decremented, moving to lower addresses (which, in the diagram, means moving upward).

krackers · 8mo ago

The stack does grow down though no matter what, in the sense that the pushing decrements the stack pointer. You can represent this as "up" in your diagram, but I don't think this makes it any easier conceptually because by analogy to a simple push/pop on an array, you'd naively expect higher addresses to contain more recent stack contents.

The core of the issue is that the direction of stack growth differs from "usual" memory access patterns, which usually allocate from lower to higher addresses (consider array access, or how strings are laid out in memory; and little-endian systems are the majority).

But if we're going with visualization options I prefer to visualize it horizontally, with lower addresses on left. This has a natural correspondence with how you access an array or lay out strings in memory.

bignerd_95 · 8mo ago

Please try to draw, step by step, a process where lower addresses are at the top and higher addresses are at the bottom. You’ll see that this makes everything much easier to understand.

Do not confuse this with push and pop on an abstract stack data structure. That is not the same as the process stack. On a real process stack, newer data is stored at LOWER addresses. In fact, every push decrements the stack pointer (the address is decreased).

If you want an example, think about how a string is placed and accessed on the stack. First, the stack pointer is decremented to reserve space (so in my diagram this “moves up” visually). Then the string can be read byte by byte by incrementing an index from the lower address toward the higher address. This is exactly like reading a book: left to right, top to bottom. If you flip memory upside down, everything becomes unnatural to understand: you would have to read the string from the bottom to the top.

Try decompiling a program with Ghidra. Open the disassembly view and look at the addresses on the left. Lower addresses are shown at the top. Higher addresses are shown at the bottom. In my diagram this matches perfectly. Everything is consistent and you never have to mentally flip the memory layout.

Years of practice led me to this, not just theory.
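A toy model may help with the string example above. This is only a simulation under simplifying assumptions (a 32-byte `bytearray` stands in for memory, and `push_bytes` is a made-up helper), not how a real CPU or ABI manages the stack, but it shows both directions at once: reserving space decrements the stack pointer, while reading the stored string increments from its base address.

```python
# Toy model: 32 bytes of "memory"; the stack starts at the highest address.
memory = bytearray(32)
sp = len(memory)  # stack pointer starts just past the top of the region

def push_bytes(data: bytes) -> int:
    """Reserve space by decrementing sp, then copy data at ascending addresses."""
    global sp
    sp -= len(data)
    memory[sp:sp + len(data)] = data
    return sp

first = push_bytes(b"old")     # pushed earlier
second = push_bytes(b"hello")  # pushed later

# Newer data lives at the LOWER address...
assert second < first

# ...but the string itself is read by incrementing from its base address.
text = bytes(memory[second + i] for i in range(5)).decode()
print(first, second, text)
```

Drawn with low addresses at the top of the page, the second push lands above the first, and the string still reads top to bottom.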

amitprasad (OP) · 8mo ago

I think I got stuck in the same rut that I learned address space in whilst writing that diagram. I would tend to agree with you that your model makes much more sense to the student.

Related: In notation, one thing that I used to struggle with is how addresses (e.g. 0xAB_CD) actually have the byte representation of [0xCD, 0xAB]. Wonder if there's a common way to address that?

bignerd_95 · 8mo ago

If you're referring to little-endianness, it means the CPU stores multi-byte values in memory with the least significant byte first (at the lowest address).

This convention started on early Intel chips and was kept for backward compatibility. It also has a practical benefit: it makes basic arithmetic and type widening cheaper in hardware. The "low" part of the value is always at the base address, so the CPU can load 8 bits, then 16 bits, then 32 bits, etc. starting from the same address without extra offset math.

So when you say an address like 0xABCD shows up in memory as [0xCD, 0xAB] byte-by-byte, that's not the address being "reversed". That's just the little-endian in-memory layout of that numeric value.

There are also big-endian architectures, where the most significant byte is stored at the lowest address. That matches how humans usually write numbers (0xABCD in memory as [0xAB, 0xCD]). But most mainstream desktop/server CPUs today are little-endian, so you mostly see the little-endian view.
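The two byte orders are easy to see by packing the same 16-bit value both ways. A short sketch using Python's `struct` module, where `<` selects little-endian and `>` selects big-endian:

```python
import struct

value = 0xABCD

little = struct.pack("<H", value)  # least significant byte first
big = struct.pack(">H", value)     # most significant byte first

print(little.hex(" "))  # cd ab
print(big.hex(" "))     # ab cd

# Widening on little-endian: the low byte sits at the same base address
# whether the value is stored as 16 bits or as 32 bits.
assert struct.pack("<I", value)[0] == struct.pack("<H", value)[0] == 0xCD
```

The number itself is the same in both cases; only the order of its bytes in memory differs.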

1 more reply

bignerd_95 · 8mo ago

Yes, I reached the same conclusions the hard way while exploiting memory corruption bugs. Once I understood how misleading these representations can be, everything finally became clear.

About the address notation you're describing, I'm not sure I fully get the problem. Can you spell out the question with a concrete example?

This is what the address space of a real bash process looks like on my machine:

    $ cat /proc/$(pidof bash)/maps
    5e6e8fd0f000-5e6e8fd3f000 r--p 00000000 fc:00 3539412 /usr/bin/bash
    5e6e8fd3f000-5e6e8fe2e000 r-xp 00030000 fc:00 3539412 /usr/bin/bash
    5e6e8fe2e000-5e6e8fe63000 r--p 0011f000 fc:00 3539412 /usr/bin/bash
    5e6e8fe63000-5e6e8fe67000 r--p 00154000 fc:00 3539412 /usr/bin/bash
    5e6e8fe67000-5e6e8fe70000 rw-p 00158000 fc:00 3539412 /usr/bin/bash
    5e6e8fe70000-5e6e8fe7b000 rw-p 00000000 00:00 0
    5e6e94891000-5e6e94a1e000 rw-p 00000000 00:00 0 [heap]
    7ec3d1400000-7ec3d16eb000 r--p 00000000 fc:00 3550901 /usr/lib/locale/locale-archive
    7ec3d1800000-7ec3d1828000 r--p 00000000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d1828000-7ec3d19b0000 r-xp 00028000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d19b0000-7ec3d19ff000 r--p 001b0000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d19ff000-7ec3d1a03000 r--p 001fe000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d1a03000-7ec3d1a05000 rw-p 00202000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
    7ec3d1a05000-7ec3d1a12000 rw-p 00000000 00:00 0
    7ec3d1a2b000-7ec3d1a84000 r--p 00000000 fc:00 3549063 /usr/lib/locale/C.utf8/LC_CTYPE
    7ec3d1a84000-7ec3d1a85000 r--p 00000000 fc:00 3549069 /usr/lib/locale/C.utf8/LC_NUMERIC
    7ec3d1a85000-7ec3d1a86000 r--p 00000000 fc:00 3549072 /usr/lib/locale/C.utf8/LC_TIME
    7ec3d1a86000-7ec3d1a87000 r--p 00000000 fc:00 3549062 /usr/lib/locale/C.utf8/LC_COLLATE
    7ec3d1a87000-7ec3d1a88000 r--p 00000000 fc:00 3549067 /usr/lib/locale/C.utf8/LC_MONETARY
    7ec3d1a88000-7ec3d1a89000 r--p 00000000 fc:00 3549066 /usr/lib/locale/C.utf8/LC_MESSAGES/SYS_LC_MESSAGES
    7ec3d1a89000-7ec3d1a8a000 r--p 00000000 fc:00 3549070 /usr/lib/locale/C.utf8/LC_PAPER
    7ec3d1a8a000-7ec3d1a8b000 r--p 00000000 fc:00 3549068 /usr/lib/locale/C.utf8/LC_NAME
    7ec3d1a8b000-7ec3d1a8c000 r--p 00000000 fc:00 3549061 /usr/lib/locale/C.utf8/LC_ADDRESS
    7ec3d1a8c000-7ec3d1a8d000 r--p 00000000 fc:00 3549071 /usr/lib/locale/C.utf8/LC_TELEPHONE
    7ec3d1a8d000-7ec3d1a90000 rw-p 00000000 00:00 0
    7ec3d1a90000-7ec3d1a9e000 r--p 00000000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1a9e000-7ec3d1ab1000 r-xp 0000e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1ab1000-7ec3d1abf000 r--p 00021000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1abf000-7ec3d1ac3000 r--p 0002e000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1ac3000-7ec3d1ac4000 rw-p 00032000 fc:00 3551411 /usr/lib/x86_64-linux-gnu/libtinfo.so.6.4
    7ec3d1ac4000-7ec3d1ac5000 r--p 00000000 fc:00 3549065 /usr/lib/locale/C.utf8/LC_MEASUREMENT
    7ec3d1ac5000-7ec3d1ac6000 r--p 00000000 fc:00 3549064 /usr/lib/locale/C.utf8/LC_IDENTIFICATION
    7ec3d1ac6000-7ec3d1acd000 r--s 00000000 fc:00 3548984 /usr/lib/x86_64-linux-gnu/gconv/gconv-modules.cache
    7ec3d1acd000-7ec3d1acf000 rw-p 00000000 00:00 0
    7ec3d1acf000-7ec3d1ad0000 r--p 00000000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1ad0000-7ec3d1afb000 r-xp 00001000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1afb000-7ec3d1b05000 r--p 0002c000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1b05000-7ec3d1b07000 r--p 00036000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ec3d1b07000-7ec3d1b09000 rw-p 00038000 fc:00 3548992 /usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2
    7ffd266f8000-7ffd26719000 rw-p 00000000 00:00 0 [stack]
    7ffd2678a000-7ffd2678e000 r--p 00000000 00:00 0 [vvar]
    7ffd2678e000-7ffd26790000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 --xp 00000000 00:00 0 [vsyscall]

___

Each line is a memory mapping. The first field is the start address. The second field is the end address. So an entry like

7ffd266f8000-7ffd26719000

means "this mapping covers virtual addresses from 0x7ffd266f8000 up to 0x7ffd26719000."

The addresses are always increasing:

- left to right: within a single line you go from lower address to higher address

- top to bottom: as you go down the list you also go to higher and higher addresses

Exactly like reading a book: left to right and then top to bottom.
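The "always increasing" property is easy to check mechanically. Below is a small sketch that parses maps-style text into tuples and verifies that the start addresses are sorted; `parse_maps` is a hypothetical helper, and the sample lines are a few entries copied from the bash listing above.

```python
def parse_maps(text: str):
    """Parse /proc/<pid>/maps text into (start, end, perms, name) tuples."""
    regions = []
    for line in text.splitlines():
        # Fields: address range, perms, offset, dev, inode, optional pathname.
        fields = line.split(maxsplit=5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        start, end = (int(x, 16) for x in fields[0].split("-"))
        name = fields[5] if len(fields) > 5 else ""
        regions.append((start, end, fields[1], name))
    return regions

sample = """\
5e6e94891000-5e6e94a1e000 rw-p 00000000 00:00 0 [heap]
7ec3d1800000-7ec3d1828000 r--p 00000000 fc:00 3548995 /usr/lib/x86_64-linux-gnu/libc.so.6
7ffd266f8000-7ffd26719000 rw-p 00000000 00:00 0 [stack]
"""

regions = parse_maps(sample)
starts = [start for start, _end, _perms, _name in regions]
assert starts == sorted(starts)  # top to bottom means low to high
```

On Linux the same function can be pointed at a live process with `parse_maps(open("/proc/self/maps").read())`.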

1 more reply

1718627440 · 8mo ago

That's how stacks on my desk grow and how everything grows in reality. I wouldn't number stacked things on my desk from the top, since this constantly changes. You also wouldn't name the first branch of a tree (the plant) to be the top-most one.

In your example, "the stack grows down" seems to be wrong in the image.

bignerd_95 · 8mo ago

Thanks! I tried to rewrite the final sentence

1 more reply

mmsc · 8mo ago · 4 in thread

It's also possible to pack a whole codebase into "before main()" - or with no main() at all. I was recently experimenting with doing this, as well as with a whole codebase that only uses main() and calls itself over and over. Good fun: https://joshua.hu/packing-codebase-into-single-function-disr...

1718627440 · 8mo ago

That is a really fun read and honestly doesn't even seem to be complicated and brittle. Just rename every function to main(100+n, ...).

thatxliner · 8mo ago

Just wondering, how did you get that domain name? I’ve been looking for registrars offering .hu

slater · 8mo ago

https://nic.hu/index_en.html ?

hashstring · 8mo ago

whois data points to https://www.domain.hu.

khaledh · 8mo ago · 3 in thread

> A note on interpreters: If the executable file starts with a shebang (#!), the kernel will use the shebang-specified interpreter to run the program. For example, #!/usr/bin/python3 will run the program using the Python interpreter, #!/bin/bash will run the program using the Bash shell, etc.

This caused me a lot of pain while trying to debug a 3rd party Java application that was trying to launch an executable script, and throwing an IO error "java.io.IOException: error=2, No such file or directory." I was puzzled because I knew the script was right there (using its full path) and it had the executable bit set. It turns out that the shebang in the script was wrong, so the OS was complaining (actual error from a shell would be "The file specified the interpreter '/foo/bar', which is not an executable command."), but the Java error was completely misleading :|

Note: If you wonder why I didn't see this error by running the script myself: I did, and it ran fine locally. But the application was running on a remote host that had a different path for the interpreter.

1718627440 · 8mo ago

Note that this is not a Java-specific problem; it can occur with other programs as well. "No such file or directory" is just the nice description for ENOENT, which can occur in a lot of syscalls. I typically just run the program through strace, then you will quickly see what the program did.
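The same misleading error is easy to reproduce outside Java. The sketch below (assuming a Linux or other Unix-like system where the temporary directory allows executing files; `exec_errno` and the interpreter path are made up for the demo) writes an executable script whose shebang points at a nonexistent interpreter and reports the errno from the failed launch.

```python
import errno
import os
import stat
import subprocess
import tempfile

def exec_errno(script_text: str):
    """Write an executable script, try to run it, and return the errno (or None)."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "script")
        with open(path, "w") as f:
            f.write(script_text)
        os.chmod(path, stat.S_IRWXU)  # the script itself exists and is executable
        try:
            subprocess.run([path], check=False)
        except OSError as e:
            return e.errno
    return None

# The script file is fine; only the interpreter named in the shebang is missing.
err = exec_errno("#!/no/such/interpreter\necho hi\n")
print(err, os.strerror(err) if err is not None else "ran without error")
```

The reported error names the script even though the missing file is the interpreter, which is the same confusion as in the Java case.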

gjf · 8mo ago

For those interested, I did a breakdown of the hashbang: https://blog.foletta.net/post/2021-04-19-what-the/

mscdex · 8mo ago

Also be aware that kernel support for shebangs depends on CONFIG_BINFMT_SCRIPT=y being in the kernel config.

Animats · 8mo ago · 2 in thread

From the title, I thought this was going to be about the parts of a program that run before the main function is entered. Static objects have to be constructed. Quite a bit of code can run. Order of initialization can be a problem. What happens if you try to do I/O from a static constructor? Does that even work?

amitprasad (OP) · 8mo ago

This is heavily language runtime dependent — there’s nothing that fundamentally stops you from doing anything during the phase between jumping to an entry point and the main()

abnercoimbre · 8mo ago

Indeed the craziest among us occasionally abuse this fact, so long as the compiler implementation lets us.

1 more reply

archmaster · 8mo ago · 1 in thread

This is awesome! To anyone interested in learning more about this, I wrote https://cpu.land/ a couple years ago. It doesn't go as in-depth into e.g. memory layout as OP does but does cover multitasking and how the code is loaded in the first place.

fuzzy_biscuit · 8mo ago

I love cpu.land! Thanks for creating such a fun resource.

turbert · 8mo ago · 1 in thread

It's been a while since I've touched this stuff but my recollection is the ELF interpreter (ldso, not the kernel) is responsible for everything after mapping the initial ELF's segments.

iirc execve maps the PT_LOAD segments from the program header, populates the aux vector on the stack, and jumps straight to the ELF interpreter's entry point. Any linked objects are loaded in userspace by the ELF interpreter. The kernel has no knowledge of the PLT/GOT.

matheusmoreira · 8mo ago

That's right!

https://lwn.net/Articles/631631/

https://github.com/torvalds/linux/blob/master/fs/binfmt_elf....

Especially relevant for dynamic linkers are the AT_PHDR and AT_BASE auxiliary vector entries, which provide the address of the executable's program header table and the address of the interpreter, respectively.

https://lwn.net/Articles/519085/
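For anyone who wants to look at these entries, the auxiliary vector is a flat array of (key, value) pairs terminated by AT_NULL. A minimal decoding sketch, assuming a 64-bit little-endian layout (two 8-byte words per entry); `parse_auxv` is a hypothetical helper, and the key numbers are the ones defined in elf.h:

```python
import struct

# Auxiliary vector keys from elf.h.
AT_NULL, AT_PHDR, AT_BASE, AT_ENTRY = 0, 3, 7, 9

def parse_auxv(raw: bytes) -> dict:
    """Decode a 64-bit little-endian auxiliary vector into {key: value}."""
    entries = {}
    for key, value in struct.iter_unpack("<QQ", raw):
        if key == AT_NULL:
            break  # AT_NULL terminates the vector
        entries[key] = value
    return entries
```

On 64-bit little-endian Linux, `parse_auxv(open("/proc/self/auxv", "rb").read())` should decode the current process's vector; `AT_PHDR` and `AT_BASE` can then be compared against the addresses in /proc/self/maps.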

hagbard_c · 8mo ago · 1 in thread

On the subject of symbols:

> Yeah, that’s it. Now, 2308 may be slightly bloated because we link against musl instead of glibc, but the point still stands: There’s a lot of stuff going on behind the scenes here.

Slightly bloated is a slight understatement. The same program linked to glibc tops at 36 symbols in .symtab:

    $ readelf -a hello|grep "'.symtab'"
    Symbol table '.symtab' contains 36 entries:

amitprasad (OP) · 8mo ago

Ah, I should have taken the time to verify; it might also have something to do with the way I was compiling / cross-compiling for RISC-V!

More generally, I'm not surprised at the symtab bloat from static linking, given the absolute size increase of the binary.

nneonneo · 8mo ago

For a fun example of a crash that can occur before main() even starts: https://stackoverflow.com/questions/12570374/floating-point-...

The poster was receiving a SIGFPE (floating point exception) on a C program that is simply “int main() { return 0; }”. A fun little mystery to dive into!

itopaloglu83 · 8mo ago

I like doing this with old microcontrollers like the PIC16 series. You get to see how the stack pointer, timers, variables, etc. are all configured.

yawpitch · 8mo ago

You’ve got a broken link in your markdown, round about the phrase “lang_start function (defined here)”.

ramanvarma · 8mo ago

did you see the relocations for the main binary applied before or after the linker resolves its own symbols? the ordering always feels like black magic when you step through it in a debugger

matheusmoreira · 8mo ago

Hacking this stuff is so fun!!

> Depending on your program, _start may be the only thing between the entrypoint and your main function

I once developed a liblinux project entirely built around this idea.

I wanted to get rid of libc and all of its initialization, complexity and global state. The C library is so complex it has a primitive form of package management built into it:

https://blogs.oracle.com/solaris/post/init-and-fini-processi...

So I made _start functions which did nothing but pass argc, argv, envp and auxv to the actual main function:

https://github.com/matheusmoreira/liblinux/blob/master/start...

You can get surprisingly far with just this, and it's actually possible to understand what's going on. Biggest pain point was the lack of C library utility functions like number/string conversion. I simply wrote my own.

https://github.com/matheusmoreira/liblinux/tree/master/examp...

Linux is the only operating system that lets us do this. In other systems, the C library is part of the kernel interface. Bypassing it like this can and does break things. Go developers once discovered this the hard way.

https://www.matheusmoreira.com/articles/linux-system-calls

The kernel has its own nolibc infrastructure now, no doubt much better than my project.

https://github.com/torvalds/linux/tree/master/tools/include/...

I encourage everyone to use it.

Note also that _start is an arbitrary symbol. The name is not special at all. It's just some linker default. The ELF header contains a pointer to the entry point, not a symbol. Feel free to choose a nice name!
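The "pointer, not a symbol" point can be checked by reading the entry address straight out of the ELF header, with no symbol table involved. A small sketch, assuming a 64-bit little-endian ELF file (`entry_point` is a hypothetical helper; the `e_entry` field sits at byte offset 24 in the ELF64 header):

```python
import struct

def entry_point(elf: bytes) -> int:
    """Return e_entry from an ELF64 little-endian header."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("this sketch only handles ELF64 little-endian")
    (e_entry,) = struct.unpack_from("<Q", elf, 24)  # e_entry is at offset 24
    return e_entry
```

Calling `hex(entry_point(open(path, "rb").read()))` on a binary gives the address that `readelf -h` reports as the entry point, whatever the symbol at that address happens to be called.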
