- ELF files are used not only for executables, but also for object files, shared libraries, and core dumps. Different parts of the ELF format serve different purposes, although there is a lot of overlap.
- The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.
- Dynamic linking doesn't require section headers. The dynamic loader (ld.so) parses the program headers for a PT_DYNAMIC entry, which refers to the .dynamic section (which in turn refers to .dynsym, .dynstr, .rela.dyn, .init_array, etc.).
- Relocation sections (what is a relocation symbol?) are required for static linking, where every section with relocations gets its own relocation section, so .text gets .rela.text. Also, in object files, sections must use relocations to refer to other sections. Executables don't need to have relocations.
- The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.
- The first section table entry must be of type SHT_NULL. The magic value SHN_UNDEF (=0) is used to refer to undefined symbols, so referring to the first section in, e.g., the symbol table, is not possible.
Although not required for a minimal file, any "modern" ELF executable should have a PT_GNU_STACK program header with flags read+write; otherwise the stack may get mapped as an executable memory region (depending on the architecture and kernel version), creating a large and often avoidable attack vector.
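The points about program headers and PT_GNU_STACK can be checked mechanically. Here is a minimal Python sketch, assuming a little-endian ELF64 file; the helper names `program_headers` and `stack_flags` are made up for illustration. It walks the program header table without touching any section headers and reports the PT_GNU_STACK flags, if the header is present.

```python
import struct

PT_GNU_STACK = 0x6474E551
PF_X, PF_W, PF_R = 1, 2, 4

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr, 56 bytes


def program_headers(image):
    """Return the program headers of a little-endian ELF64 image as tuples
    (p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align)."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("not a little-endian ELF64 image")
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    return [PHDR.unpack_from(image, e_phoff + i * e_phentsize)
            for i in range(e_phnum)]


def stack_flags(image):
    """Return PT_GNU_STACK's flags as a string such as 'rw-', or None if absent."""
    for p_type, p_flags, *_ in program_headers(image):
        if p_type == PT_GNU_STACK:
            return "".join(c if p_flags & bit else "-"
                           for c, bit in (("r", PF_R), ("w", PF_W), ("x", PF_X)))
    return None  # no PT_GNU_STACK: the kernel's default policy applies
```

Something like `stack_flags(open("main", "rb").read())` would then return `'rw-'` for a hardened binary and `None` for a minimal file that has no such header.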
> The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.
Specifically, the PT_LOAD segment does that. Other segments are used for other purposes. Linkers generally don't generate ELFs with PT_LOAD segments covering the section header table but one could patch the ELF so that the last PT_LOAD segment covers the table or even the entire file. That way the location of the .text section becomes reachable to the running program via the section header table.
There's also the surprisingly useful PT_NULL segment type. These segments are essentially just placeholders whose program header contents are undefined, which makes them excellent targets for patching. Scripting the linker to output these segments proved to be quite difficult, so I just asked for a linker command line option instead. LLVM and GNU ld weren't interested but mold quickly added this feature.
A PT_NULL segment allows patching in a PT_LOAD segment for any data or metadata the programmer needs. It's also possible to create custom segments just like GNU did since there's a truly massive numeric range reserved just for that. These two facts enable some really cool stuff:
https://www.matheusmoreira.com/articles/self-contained-lone-...
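To make the patching idea concrete, here is a rough Python sketch, assuming a little-endian ELF64 image held in a `bytearray`. The function name `patch_null_into_load` and its parameters are hypothetical, not taken from the linked article. It overwrites the first PT_NULL placeholder with a PT_LOAD entry describing data that is already present in the file.

```python
import struct

PT_NULL, PT_LOAD = 0, 1
PF_R = 4

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr


def patch_null_into_load(image, offset, vaddr, size, flags=PF_R, align=0x1000):
    """Turn the first PT_NULL program header of `image` (a bytearray) into a
    PT_LOAD entry mapping `size` bytes at file `offset` to address `vaddr`.
    Returns the index of the patched program header."""
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    for i in range(e_phnum):
        pos = e_phoff + i * e_phentsize
        if PHDR.unpack_from(image, pos)[0] == PT_NULL:
            # Field order: p_type, p_flags, p_offset, p_vaddr, p_paddr,
            # p_filesz, p_memsz, p_align
            PHDR.pack_into(image, pos, PT_LOAD, flags, offset,
                           vaddr, vaddr, size, size, align)
            return i
    raise LookupError("no PT_NULL placeholder to patch")
```

The sketch only rewrites the table entry. Picking an `offset` and `vaddr` that satisfy the loader's alignment and ordering rules is left to the caller.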
> The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.
In addition to that, they must also be sorted! For some weird reason, PT_LOAD segments cannot be in arbitrary order even if they don't overlap.
Violating these requirements causes some truly excruciating crashes. The executable would somehow segfault before a single instruction executed. This uber segfault brought the likes of GDB to its knees and I was reduced to pasting readelf output on stackoverflow.
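A pre-flight check can catch both problems before the kernel does. This is a sketch under stated assumptions: a little-endian ELF64 image, a 4096-byte page size, and a made-up helper name `check_load_segments`. It tests the alignment rule quoted above, the usual ELF rule that `p_vaddr` and `p_offset` agree modulo the page size, and the ascending `p_vaddr` order.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr


def check_load_segments(image, page_size=4096):
    """Return a list of human-readable problems found in the PT_LOAD entries."""
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    problems = []
    prev_vaddr = None
    for i in range(e_phnum):
        p_type, _, p_offset, p_vaddr, _, _, _, p_align = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        if p_align < page_size:
            problems.append(f"phdr {i}: p_align {p_align:#x} is below the page size")
        if (p_vaddr - p_offset) % page_size:
            problems.append(f"phdr {i}: p_vaddr and p_offset differ modulo the page size")
        if prev_vaddr is not None and p_vaddr < prev_vaddr:
            problems.append(f"phdr {i}: p_vaddr {p_vaddr:#x} is out of order")
        prev_vaddr = p_vaddr
    return problems
```

An empty list means none of these particular checks fired; it does not prove the file will load.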
Not sure why you would want to find .text in the program, but if you do, the linker (at least ld.bfd and ld.lld do) adds the symbols __executable_start and _etext, which surround the program code. Using linker-resolved symbols is much more reliable than parsing section headers.
> In addition to that, they must also be sorted! For some weird reason
For efficiency and simplicity when loading.
Re your article:
> Unless I can figure out a way to move the program header table to the end of the file without breaking everything
This should be doable, but you need to make sure that the program headers are mapped to memory (i.e., completely covered by a PT_LOAD segment).
I do wonder why you use program headers, though: depending on your exact use case, it might be easier to link against an object file containing only data (e.g., from objcopy) and use symbols instead.
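The "completely covered by a PT_LOAD segment" condition is easy to test before moving the table around. A small Python sketch, assuming a little-endian ELF64 image; `phdr_table_is_mapped` is a hypothetical helper name:

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr


def phdr_table_is_mapped(image):
    """Return True if a single PT_LOAD segment's file range covers the whole
    program header table."""
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    table_start = e_phoff
    table_end = e_phoff + e_phnum * e_phentsize
    for i in range(e_phnum):
        p_type, _, p_offset, _, _, p_filesz, _, _ = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize)
        if (p_type == PT_LOAD and p_offset <= table_start
                and table_end <= p_offset + p_filesz):
            return True
    return False
```

For a file whose only PT_LOAD segment covers just the code bytes, this returns `False`.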
> Technically, the section header table and the section header string table are not needed, but having them sets the stage for adding more sections in the future, namely dynamic linking and debug symbol sections.
Even dynamically linked executables do not need section tables!!
It just so happens that the runtime symbol table is defined to have the same format as the normal symbol table. Although it shows up as .dynsym, the runtime linker doesn't find it that way!
Dynamic linking is a process that takes place with loadable ELF files. A loadable ELF file needs no sections at all.
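To illustrate, the dynamic table can be read using nothing but the program headers. This is a sketch, assuming a little-endian ELF64 image; the helper name `dynamic_entries` and the small `DT_NAMES` table are illustrative choices, and only a handful of tags are named.

```python
import struct

PT_DYNAMIC = 2
DT_NULL = 0
DT_NAMES = {1: "DT_NEEDED", 5: "DT_STRTAB", 6: "DT_SYMTAB", 7: "DT_RELA"}

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr
DYN = struct.Struct("<qQ")                 # Elf64_Dyn: d_tag, d_un


def dynamic_entries(image):
    """Return [(tag_name, value), ...] from the PT_DYNAMIC segment, or None if
    there is no such segment. Section headers are never consulted."""
    ehdr = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = ehdr[5], ehdr[9], ehdr[10]
    for i in range(e_phnum):
        p_type, _, p_offset, _, _, p_filesz, _, _ = PHDR.unpack_from(
            image, e_phoff + i * e_phentsize)
        if p_type != PT_DYNAMIC:
            continue
        entries = []
        last = p_offset + p_filesz - DYN.size
        for pos in range(p_offset, last + 1, DYN.size):
            tag, value = DYN.unpack_from(image, pos)
            if tag == DT_NULL:  # DT_NULL terminates the table
                break
            entries.append((DT_NAMES.get(tag, hex(tag)), value))
        return entries
    return None
```

Note that values such as DT_STRTAB and DT_SYMTAB are virtual addresses, so following them in a file on disk would additionally require translating them through the PT_LOAD segments.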
echo 'f0VMRgIBAQAAAAAAAAAAAAIAPgABAAAA+ABAAAAAAAAAAQAAAAAAAEAAAAAAAAAAAAAAAEAAOAABAEAAAgABAAEAAAABAAAABgAAAAAAAAD4AEAAAAAAAPgAAAAAAAAADgAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAHAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAGAQAAAAAAABEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAUAAAD4AAAAAAAAAPgAQAAAAAAA+ABAAAAAAAAOAAAAAAAAAA4AAAAAAAAAABAAAAAAAABIx8A8AAAAvyoAAAAPBQAudGV4dAAuc2hzdHJ0YWIA' | base64 -d > main
chmod +x main
./main
# bash: ./main: cannot execute binary file: Exec format error
Am I doing it incorrectly? I hope I copied the bytes correctly since copy & pasting from the website is a bit challenging.
echo 'f0VMRgIBAQAAAAAAAAAAAAIAPgABAAAA+ABAAAAAAADAAAAAAAAAAEAAAAAAAAAAAAAAAEAAOAABAEAAAgABAAEAAAABAAAABgAAAAAAAAD4AEAAAAAAAPgAAAAAAAAADgAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAHAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAGAQAAAAAAABEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAUAAAD4AAAAAAAAAPgAQAAAAAAA+ABAAAAAAAAOAAAAAAAAAA4AAAAAAAAAABAAAAAAAABIx8A8AAAAvyoAAAAPBQAudGV4dAAuc2hzdHJ0YWIA' | base64 -d > main
./main
echo $? # should print 42
The Really Teensy Linux ELF Executables Essays
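When two pastes of supposedly the same tiny ELF behave differently, comparing their headers field by field is one way to narrow it down. A minimal Python sketch, assuming both inputs are little-endian ELF64 images (for instance obtained with `base64.b64decode`); the name `header_diff` is made up for illustration:

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr, 64 bytes
FIELDS = ("e_type", "e_machine", "e_version", "e_entry", "e_phoff", "e_shoff",
          "e_flags", "e_ehsize", "e_phentsize", "e_phnum", "e_shentsize",
          "e_shnum", "e_shstrndx")


def header_diff(a, b):
    """Return {field_name: (value_in_a, value_in_b)} for every ELF64 header
    field after e_ident that differs between the two images."""
    fields_a = EHDR.unpack_from(a, 0)[1:]  # drop e_ident
    fields_b = EHDR.unpack_from(b, 0)[1:]
    return {name: (x, y)
            for name, x, y in zip(FIELDS, fields_a, fields_b)
            if x != y}
```

An empty result means the headers agree and any difference lies further into the file.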