A minimal, complete, and correct ELF file

(scratchpad.avikdas.com)

72 points · febeling · 2y ago · 16 comments

aengelke 2y ago

Nice visualization of the ELF headers. However, the article has a few inaccuracies:

- ELF files are used not only for executables but also for object files, shared libraries, and core dumps. Different parts of the ELF format serve different purposes, although there is a lot of overlap.

- The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.

- Dynamic linking doesn't require section headers. The dynamic loader (ld.so) parses the program headers for a PT_DYNAMIC entry, which refers to the .dynamic section (which in turn refers to .dynsym, .dynstr, .rela.dyn, .init_array, etc.).

- Relocation sections (what is a relocation symbol?) are required for static linking, where every section with relocations gets its own relocation section, so .text gets .rela.text. Also, in object files, sections must use relocations to refer to other sections. Executables don't need to have relocations.

- The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.

- The first section table entry must be of type SHT_NULL. The magic value SHN_UNDEF (=0) is used to refer to undefined symbols, so referring to the first section in, e.g., the symbol table, is not possible.
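
As a rough illustration of the alignment point in the list above, here is a Python sketch that checks PT_LOAD entries against the rule as the comment states it. The helper name `load_segment_problems`, the 4 KiB page size, and the sample values are assumptions for illustration; the sample entry is meant to mirror the single PT_LOAD header of the binary posted in this thread.

```python
import struct

PT_LOAD = 1
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, typical for x86-64 Linux

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")


def load_segment_problems(phdr_table, page_size=PAGE_SIZE):
    """Hypothetical helper: list alignment problems for each PT_LOAD entry."""
    problems = []
    for index in range(len(phdr_table) // PHDR.size):
        p_type, _, p_offset, p_vaddr, _, _, _, p_align = PHDR.unpack_from(
            phdr_table, index * PHDR.size
        )
        if p_type != PT_LOAD:
            continue
        if p_align < page_size:
            problems.append(f"segment {index}: p_align {p_align:#x} is below the page size")
        elif p_offset % p_align != p_vaddr % p_align:
            # mmap can only honour a file offset that is congruent to the address
            problems.append(f"segment {index}: p_offset and p_vaddr differ modulo p_align")
    return problems


# Sample entries (illustrative values): same segment, different p_align.
good_entry = PHDR.pack(PT_LOAD, 5, 0xF8, 0x4000F8, 0x4000F8, 0xE, 0xE, 0x1000)
bad_entry = PHDR.pack(PT_LOAD, 5, 0xF8, 0x4000F8, 0x4000F8, 0xE, 0xE, 0x8)

print(load_segment_problems(good_entry))
print(load_segment_problems(bad_entry))
```

The check is deliberately strict and follows the comment's wording; what a given kernel actually tolerates may differ.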

Although not required for a minimal file, any "modern" ELF executable should have a PT_GNU_STACK program header with read+write flags; otherwise the stack is mapped as an executable memory region, creating a large and often avoidable attack vector.
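
A small Python sketch of the PT_GNU_STACK check described above. The function name `stack_is_executable` is hypothetical, and treating a missing header as "executable" simply follows the comment's claim; the real default can depend on architecture and kernel.

```python
import struct

PT_LOAD = 1
PT_GNU_STACK = 0x6474E551
PF_X, PF_W, PF_R = 1, 2, 4

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")


def stack_is_executable(phdr_table):
    """Hypothetical helper: True if PT_GNU_STACK is absent or carries PF_X."""
    for index in range(len(phdr_table) // PHDR.size):
        p_type, p_flags = struct.unpack_from("<II", phdr_table, index * PHDR.size)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    # Assumption taken from the comment: no header means an executable stack.
    return True


load_only = PHDR.pack(PT_LOAD, PF_R | PF_X, 0xF8, 0x4000F8, 0x4000F8, 0xE, 0xE, 0x1000)
with_stack = load_only + PHDR.pack(PT_GNU_STACK, PF_R | PF_W, 0, 0, 0, 0, 0, 0x10)

print(stack_is_executable(load_only))
print(stack_is_executable(with_stack))
```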

matheusmoreira 2y ago

I'd like to add to your post.

> The program headers don't state the location of .text, but indicate the area of the file that should be mapped into memory.

Specifically, PT_LOAD segments do that. Other segments are used for other purposes. Linkers generally don't generate ELFs with PT_LOAD segments covering the section header table, but one could patch the ELF so that the last PT_LOAD segment covers the table or even the entire file. That way the location of the .text section becomes reachable to the running program via the section header table.

There's also this surprisingly useful PT_NULL segment type. They're essentially just placeholders with undefined program header structure contents. Excellent targets for patching. Scripting the linker to output these segments proved to be quite difficult so I just asked for a linker command line option instead. LLVM and GNU ld weren't interested but mold quickly added this feature.

A PT_NULL segment allows patching in a PT_LOAD segment for any data or metadata the programmer needs. It's also possible to create custom segments just like GNU did since there's a truly massive numeric range reserved just for that. These two facts enable some really cool stuff:

https://www.matheusmoreira.com/articles/self-contained-lone-...
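
To give a feel for the PT_NULL patching idea, here is a minimal Python sketch that overwrites the first PT_NULL entry in a program header table with a read-only PT_LOAD. The helper `patch_first_null` and all offsets and addresses are hypothetical; this is not the code from the linked article.

```python
import struct

PT_NULL, PT_LOAD = 0, 1
PF_R = 4

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")


def patch_first_null(phdr_table, offset, vaddr, size, align=0x1000):
    """Hypothetical helper: turn the first PT_NULL entry into a read-only PT_LOAD.

    Returns the index of the patched entry, or -1 if no placeholder exists.
    """
    for index in range(len(phdr_table) // PHDR.size):
        (p_type,) = struct.unpack_from("<I", phdr_table, index * PHDR.size)
        if p_type == PT_NULL:
            PHDR.pack_into(
                phdr_table, index * PHDR.size,
                PT_LOAD, PF_R, offset, vaddr, vaddr, size, size, align,
            )
            return index
    return -1


# One real PT_LOAD followed by one all-zero (PT_NULL) placeholder entry.
table = bytearray(
    PHDR.pack(PT_LOAD, 5, 0, 0x400000, 0x400000, 0x100, 0x100, 0x1000)
    + bytes(PHDR.size)
)

# Illustrative values: data appended at file offset 0x1000, mapped at 0x401000.
patched_index = patch_first_null(table, offset=0x1000, vaddr=0x401000, size=0x20)
print(patched_index)
```

The new segment's address is chosen above the existing PT_LOAD so that the table stays in ascending address order.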

> The alignment of PT_LOAD segments must be at least the page size: the kernel or loader will use mmap to map the file, so alignments smaller than the page size won't work.

In addition to that, they must also be sorted! For some weird reason, PT_LOAD segments cannot be in arbitrary order even if they don't overlap.

Violating these requirements causes some truly excruciating crashes. The executable would somehow segfault before a single instruction executed. This uber segfault brought the likes of GDB to its knees and I was reduced to pasting readelf output on stackoverflow.
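
The ordering requirement can be checked before running a patched binary. A Python sketch, assuming the ordering in question is ascending `p_vaddr` (which is how the ELF specification words it); `load_segments_sorted` and `entry` are hypothetical helpers.

```python
import struct

PT_LOAD = 1

# Elf64_Phdr: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz, p_memsz, p_align
PHDR = struct.Struct("<IIQQQQQQ")


def load_segments_sorted(phdr_table):
    """Hypothetical helper: True if PT_LOAD entries appear in ascending p_vaddr order."""
    vaddrs = []
    for index in range(len(phdr_table) // PHDR.size):
        p_type, _, _, p_vaddr, _, _, _, _ = PHDR.unpack_from(phdr_table, index * PHDR.size)
        if p_type == PT_LOAD:
            vaddrs.append(p_vaddr)
    return all(a <= b for a, b in zip(vaddrs, vaddrs[1:]))


def entry(p_vaddr):
    """Build an illustrative read-only PT_LOAD entry at the given address."""
    return PHDR.pack(PT_LOAD, 4, p_vaddr - 0x400000, p_vaddr, p_vaddr, 0x10, 0x10, 0x1000)


ascending = entry(0x400000) + entry(0x401000)
swapped = entry(0x401000) + entry(0x400000)

print(load_segments_sorted(ascending))
print(load_segments_sorted(swapped))
```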

aengelke 2y ago

> That way the location of the .text section becomes reachable to the running program via the section header table.

Not sure why you would want to find .text in the program, but if you do, the linker (at least ld.bfd and ld.lld do) adds the symbols __executable_start and _etext, which surround the program code. Using linker-resolved symbols is much more reliable than parsing section headers.

> In addition to that, they must also be sorted! For some weird reason

For efficiency and simplicity when loading.

Re your article:

> Unless I can figure out a way to move the program header table to the end of the file without breaking everything

This should be doable, but you need to make sure that the program headers are mapped to memory (i.e., completely covered by a PT_LOAD segment).
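
A sketch of that coverage condition in Python: the program header table counts as mapped if its whole file range lies inside the file range of some PT_LOAD segment. The helpers `phdr_table_is_mapped` and `make_image` and the tiny test images are hypothetical.

```python
import struct

PT_LOAD = 1
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr


def phdr_table_is_mapped(image):
    """Hypothetical helper: is the program header table fully inside a PT_LOAD?"""
    fields = EHDR.unpack_from(image, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    table_end = e_phoff + e_phentsize * e_phnum
    for index in range(e_phnum):
        p_type, _, p_offset, _, _, p_filesz, _, _ = PHDR.unpack_from(
            image, e_phoff + index * e_phentsize
        )
        if p_type == PT_LOAD and p_offset <= e_phoff and table_end <= p_offset + p_filesz:
            return True
    return False


def make_image(load_offset, load_filesz):
    """Build an illustrative image: ELF header, one PT_LOAD entry, 14 payload bytes."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    ehdr = EHDR.pack(ident, 2, 0x3E, 1, 0x400078, 64, 0, 0, 64, 56, 1, 64, 0, 0)
    vaddr = 0x400000 + load_offset
    phdr = PHDR.pack(PT_LOAD, 5, load_offset, vaddr, vaddr, load_filesz, load_filesz, 0x1000)
    return ehdr + phdr + bytes(14)


whole_file = make_image(0, 134)    # segment spans headers and payload
payload_only = make_image(120, 14)  # segment spans only the payload

print(phdr_table_is_mapped(whole_file))
print(phdr_table_is_mapped(payload_only))
```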

I do wonder why you use program headers, though: depending on your exact use case, it might be easier to link against an object file containing only data (e.g., from objcopy) and use symbols instead.

matheusmoreira 2y ago

> Not sure why you would want to find .text in the program

Not sure either, I just used the example I read in your post. Now I'm curious about why someone would want to do that. Maybe to make the section writable and patch the code at runtime?

> __executable_start and _etext, which surround the program code.

You're right! I did see those symbols when I dumped ld's default linker script. Completely forgot about them.

> This should be doable, but you need to make sure that the program headers are mapped to memory (i.e., completely covered by a PT_LOAD segment).

I'll keep this in mind when I try it again. At the time I got pretty frustrated because it was pretty hard to debug and figure out why it was failing. The mold solution was like a light at the end of the tunnel for me.

> I do wonder why you use program headers, though: depending on your exact use case, it might be easier to link against an object file containing only data (e.g., from objcopy) and use symbols instead.

Objcopy was the first thing I tried! Even asked a question about it on Stack Overflow.

https://stackoverflow.com/q/77468641

Long story short, by default the sections aren't covered by a PT_LOAD segment and so they are unreachable. I wanted the program to work even if the symbols were not defined which is why I tried to find it in the table at runtime.

bregma 2y ago

This ELF file is not minimal. A loadable ELF file needs no sections at all. The section header table can be empty and the .shstrtab section does not need to be present. The .text entry is just a pointer into one of the LOAD segments and can also be eliminated.
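
To make the "no sections at all" point concrete, here is a Python sketch that assembles an image consisting only of an ELF header, one PT_LOAD program header, and code. The layout, the load address, and the helper name `build_sectionless_elf` are assumptions; the 14 code bytes are intended to match the exit(42) sequence in the binary posted elsewhere in this thread. The sketch only builds the bytes, and running the result is untested here.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr
BASE = 0x400000  # assumption: conventional load address for a non-PIE x86-64 binary

# mov rax, 60 ; mov edi, 42 ; syscall  (exit(42) on x86-64 Linux)
CODE = bytes.fromhex("48c7c03c000000bf2a0000000f05")


def build_sectionless_elf():
    """Hypothetical helper: ELF header + one PT_LOAD + code, with no section table."""
    code_offset = EHDR.size + PHDR.size
    total = code_offset + len(CODE)
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)  # 64-bit, little-endian, version 1
    ehdr = EHDR.pack(
        ident, 2, 0x3E, 1,       # ET_EXEC, EM_X86_64, EV_CURRENT
        BASE + code_offset,      # e_entry: first code byte
        EHDR.size,               # e_phoff: table directly follows the header
        0, 0,                    # e_shoff = 0 (no section headers), e_flags
        EHDR.size, PHDR.size, 1,  # e_ehsize, e_phentsize, e_phnum
        0, 0, 0,                 # e_shentsize, e_shnum, e_shstrndx all zero
    )
    # A single read+execute segment mapping the whole file from offset 0.
    phdr = PHDR.pack(1, 5, 0, BASE, BASE, total, total, 0x1000)
    return ehdr + phdr + CODE


image = build_sectionless_elf()
print(len(image))
```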

boricj 2y ago

This is acknowledged by the author:

> Technically, the section header table and the section header string table are not needed, but having them sets the stage for adding more sections in the future, namely dynamic linking and debug symbol sections.

BobbyTables2 2y ago

The author sounds horribly mistaken.

Even dynamically linked executables do not need sections!!

It just so happens to be that the runtime symbol table is defined to be the same format as the normal symbol table. Although it shows up as .dynsym, the runtime linker doesn’t find it that way!

aengelke 2y ago

Dynamic linking doesn't require section headers, either, just a PT_DYNAMIC program header to refer to the .dynamic section (which in turn refers to .dynsym, .dynstr, .rela.dyn, .init_array, etc.).
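
A rough Python sketch of how a loader-style walk over the dynamic array finds those tables without section headers: it reads (tag, value) pairs until DT_NULL. The helper `walk_dynamic` and the sample addresses are made up for illustration; only a handful of tags are named.

```python
import struct

# A few standard dynamic tags; anything else is shown as a hex number.
DT_NAMES = {
    0: "DT_NULL",
    5: "DT_STRTAB",       # address of .dynstr
    6: "DT_SYMTAB",       # address of .dynsym
    7: "DT_RELA",         # address of .rela.dyn
    25: "DT_INIT_ARRAY",  # address of .init_array
}

DYN = struct.Struct("<qQ")  # Elf64_Dyn: d_tag (signed), d_un (value or address)


def walk_dynamic(blob):
    """Hypothetical helper: map tag names to values, stopping at DT_NULL."""
    found = {}
    for offset in range(0, len(blob) - DYN.size + 1, DYN.size):
        d_tag, d_val = DYN.unpack_from(blob, offset)
        if d_tag == 0:  # DT_NULL terminates the array
            break
        found[DT_NAMES.get(d_tag, hex(d_tag))] = d_val
    return found


# Synthetic .dynamic contents with illustrative addresses.
dynamic = (
    DYN.pack(5, 0x400200)
    + DYN.pack(6, 0x400100)
    + DYN.pack(7, 0x400300)
    + DYN.pack(25, 0x403E00)
    + DYN.pack(0, 0)
    + DYN.pack(1, 0x1234)  # after DT_NULL, so it must be ignored
)

for name, value in walk_dynamic(dynamic).items():
    print(f"{name}: {value:#x}")
```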

bregma 2y ago

> A loadable ELF file needs no sections at all.

Dynamic linking is a process that takes place with loadable ELF files. A loadable ELF file needs no sections at all.

johndough 2y ago

When trying to run this binary on a 64-bit Debian system, I get an error:

    echo 'f0VMRgIBAQAAAAAAAAAAAAIAPgABAAAA+ABAAAAAAAAAAQAAAAAAAEAAAAAAAAAAAAAAAEAAOAABAEAAAgABAAEAAAABAAAABgAAAAAAAAD4AEAAAAAAAPgAAAAAAAAADgAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAHAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAGAQAAAAAAABEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAUAAAD4AAAAAAAAAPgAQAAAAAAA+ABAAAAAAAAOAAAAAAAAAA4AAAAAAAAAABAAAAAAAABIx8A8AAAAvyoAAAAPBQAudGV4dAAuc2hzdHJ0YWIA' | base64 -d > main
    chmod +x main
    ./main

    # bash: ./main: cannot execute binary file: Exec format error

Am I doing it incorrectly? I hope I copied the bytes correctly, since copying and pasting from the website is a bit challenging.

johndough 2y ago

Found the mistake. The offset for the program header table should be c0 00 instead of 00 01 (third row on the website, starting at byte 0020). Here is the correct code.

    echo 'f0VMRgIBAQAAAAAAAAAAAAIAPgABAAAA+ABAAAAAAADAAAAAAAAAAEAAAAAAAAAAAAAAAEAAOAABAEAAAgABAAEAAAABAAAABgAAAAAAAAD4AEAAAAAAAPgAAAAAAAAADgAAAAAAAAAAAAAAAAAAAAAQAAAAAAAAAAAAAAAAAAAHAAAAAwAAAAAAAAAAAAAAAAAAAAAAAAAGAQAAAAAAABEAAAAAAAAAAAAAAAAAAAABAAAAAAAAAAAAAAAAAAAAAQAAAAUAAAD4AAAAAAAAAPgAQAAAAAAA+ABAAAAAAAAOAAAAAAAAAA4AAAAAAAAAABAAAAAAAABIx8A8AAAAvyoAAAAPBQAudGV4dAAuc2hzdHJ0YWIA' | base64 -d > main
    chmod +x main
    ./main
    echo $? # should print 42
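
One plausible reading of why the original offset fails, sketched in Python: with `e_phoff` set to 0x100, the 56-byte program header would run past the end of the file. The file size of 279 bytes is an assumption derived from the section header values in the binary above, and the helper names are hypothetical.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr
FILE_SIZE = 279  # assumption: 0x106 (string table offset) + 0x11 (its size)


def phdr_table_fits(header, file_size):
    """Hypothetical helper: does the program header table end within the file?"""
    fields = EHDR.unpack_from(header, 0)
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    return e_phoff + e_phentsize * e_phnum <= file_size


def header_with_phoff(e_phoff):
    """Rebuild the 64-byte ELF header with the given e_phoff (other fields as decoded)."""
    ident = b"\x7fELF" + bytes([2, 1, 1]) + bytes(9)
    return EHDR.pack(ident, 2, 0x3E, 1, 0x4000F8, e_phoff, 0x40, 0, 64, 56, 1, 64, 2, 1)


original = header_with_phoff(0x100)  # bytes "00 01" at offset 0x20
corrected = header_with_phoff(0xC0)  # bytes "c0 00" at offset 0x20

print(phdr_table_fits(original, FILE_SIZE))
print(phdr_table_fits(corrected, FILE_SIZE))
```

With 0xc0 the table sits right after the two 64-byte section headers that start at 0x40 and ends exactly where the code begins at 0xf8.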

matheusmoreira 2y ago

The Really Teensy Linux ELF Executables Essays

https://www.muppetlabs.com/~breadbox/software/tiny/

llvnux 2y ago

This is an amazing animation!

cpach 2y ago

Animation? Did I miss something?

echoangle 2y ago

You can click the byte values on the left and get an explanation in the black area on the right.

a99c43f2d565504 2y ago

Very cool!
