story

x86 Disassembly (opens in new tab)

en.wikibooks.org

102 pointsinfinity11y ago28 comments

28 comments

It's funny to see this link come up on HN, because I wrote this book (or much of it) years ago as a college student. I was taking some courses on microprocessor architectures and assembly code, and was in the middle of reading Reversing by Eldad Eilam (which had just come out at the time). I had a mantra back then that the best way to learn a subject was to try and teach it, so I wrote this and several other wikibooks as sort of a study aide for my classes. This also explains why the material appears to cover barely a semester's worth of material and why my name ("Whiteknight") doesn't appear in the edit history after graduation in 2008.

Despite the relatively thin and incomplete coverage of the material, I've heard from several people over the years who appreciated the work as a nice introduction to the topic and even once received a job offer because of it (which didn't work out, for a variety of reasons). All things considered, if I had to change anything it would be the title to make it a little more focused. It's not really a book about disassembly so much as it is an introduction to what high-level language features look like when translated into non-optimized x86 assembly code. Find me a short, catchy title that accurately describes that, and you win some kind of prize.

I doubt I'll ever get back to this book either. I haven't worked with this material at all since school, and don't feel like I have up-to-date knowledge of the subject. Unless somebody else wants to jump in and fill it out, it will probably stay the way you see it now.

I'm glad to see that this book is still around and I'm glad that people are benefiting from it in some small way. I know it doesn't cover nearly what would be needed for a real book on the subject (I do still recommend Eilam's Reversing for book-o-philes) but I think it should be a decent stepping stone to pique interest and get people moving on towards more in-depth treatments.

tptacek11y ago

Write a disassembler at least once. It's much easier than you think it is (even with X86) --- it's essentially a file format parser --- and very illuminating.

userbinator11y ago

It's also much easier to decode x86 instructions when you look at them in octal instead of the hexadecimal that most tables use, since both the main opcode map and ModRM/SIB are organised in a 2-3-3 layout:

http://reocities.com/SiliconValley/heights/7052/opcode.txt

The 8080/8085/Z80 instruction sets also look much better in octal:

http://www.z80.info/decoding.htm

10191411y ago

This comment validates all the time I have "wasted" reading HN over the years.

It does not suprise me that something so simple would be so well overlooked (or, at least, "forgotten"). I wonder if I ever would have figured this out from my own readings and experiments. Doubtful.

Great tip!

1 more reply

tptacek11y ago

I had no idea! This is very cool.

1 more reply

danieljh11y ago

As someone looking more into x86_64 assembly, instruction encoding, syscalls and ELF files recently, not only the lack of good starting points but also the amount of work required to get into it is a pity.

I'm currently using [0] as a helping hand among other resources which is quite good; my maybe not-so-interesting results are at [1].

For example, to get a good overview of instruction encoding you have to 0/ read through and ditch horrible blog posts 1/ find the correct Intel manual 2/ search and read through thousands of PDF pages until you find something interesting 3/ understand the environment and facts that are either implicitly given or in the documents but not easy to find.

For the handful lines of actual code I wrote yesterday [1] I still have around 25 tabs open. Complexity and no end in sight.

Do you have any recommendations and hints as where to start with this in the year 2015?

[0] http://0xax.blogspot.se/p/assembly-x8664-programming-for-lin...

[1] https://github.com/daniel-j-h/x86-64-asm

tibiapejagala11y ago

Seconding this. I've implemented a part of x86 instruction encoding and you either find resources:

* comprehensible, but far from complete (some blogs)

* complete, but hard to understand and requiring some implicit knowledge (Intel manual or [1])

Rather than disassembler I recommend writing some simple JIT compiler, with [2] as a starting point. You skip some problems this way.

[1] http://ref.x86asm.net/ this seems pretty cool as a reference, but I can't wrap my head around it

[2] http://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introd...

2 more replies

Mahn11y ago

And then you realize you could actually take it a step further and before you know it you are hunting for documentation on the amount of cycles each intruction and data access takes and other obscure clocking info, nevermind there are a thousand and one emulators already in existence that are several orders of magnitude better than the half assed attempt you are trying to come up with. Fun times :)

tptacek11y ago

That's such a huge problem with exploratory programming: the demotivating effect of knowing that there are better versions of almost anything you're building, because other people have been working on the problem for longer.

You have to get over it and keep going anyways or you'll never become one of those people yourself! (Or even be able to make an informed decision about whether you want to).

Not coincidentally: part of the point of the company we just started. :)

goalieca11y ago

Cycle accurate emulation of modern x86 would be impossible. Intel simply won't release the secret sauce on branch prediction, cache prediction, out of order execution, internal instruction engine, etc.

1 more reply

mschuster9111y ago

I'm currently writing one in JavaScript of all languages, just to prove it works. also, it's true multi-platform ;)

ianopolous11y ago

You may find this disassembler port to Java I did useful. [JayD](http://github.com/ianopolous/JayD) it's more accurate than both objdump and its source, udis86, after I used it as the base for the [JPC emulator](http://github.com/ianopolous/JPC).

xvilka11y ago

This book is missing very actively developed reverse engineering framework radare2 [1], which is supporting not only x86/x86_64 but also arm, mips, ppc, avr, arc, 8051, TI tms320c55x family and even more!

[1] http://rada.re/

awhitworth11y ago

The title of the book is "X86 Disassembly". It was only ever intended to cover x86. x86_64, mips, ppc, etc are all expressly out of scope.

indutny11y ago

And here is a JS x86 disassembler: https://github.com/indutny/disasm . It is incomplete, but shows how the basics works.

calibraxis11y ago

Why does `push eax` perform "much faster" than the following?

  sub esp, 4
  mov DWORD PTR SS:[esp], eax

A brief skim of Intel's "Software Developer’s Manual" (particularly ch. 6 on stacks), didn't seem to find an answer.

While hitting the ALU just for `sub` might be an extra step, doesn't hitting RAM make that a drop in the bucket? (`sub` may account for less than 1%?) Or is there some caching going on, so RAM may be updated in the background?

(I'm not an assembly programmer; very ignorant of what's happening.)

userbinator11y ago

It's a lot smaller (1 byte vs 6), which means less space spent in the cache and decoder, reducing cache misses and decode bandwidth. The x86 also has a dedicated "stack engine" since the Pentium M (but not suprisingly, absent in NetBurst), which contains an adder and copy of the stack pointer to handle push/pop operations. This is faster than using the general-purpose ALUs and memory read/write ports, and also frees those up for use by other non-stack instructions. On the other hand, it means reading/writing the stack pointer explicitly between implicit stack operations incurs a little extra latency to get the values between the stack engine and "real" ESP register synchronised.

Memory reads/writes do take a few more cycles to complete, but since this is a write, the CPU can continue on with other non-dependent instructions following it. All the above information assumes a CPU based on P6 and its successors (Core, Nehalem, Sandy Bridge, Ivy Bridge, Haswell, etc.); NetBurst and Atom are very different.

Linus also has some interesting things to say about using the dedicated stack instructions: http://yarchive.net/comp/linux/pop_instruction_speed.html

Somewhat amusingly, GCC was well known to generate the explicit sub/mov instructions by default, while most other x86 C compilers I knew of, including MSVC and ICC, would always use push.

calibraxis11y ago

Thanks to you and awhitworth! Very interesting stuff, kept me reading. (And soon searching to understand some of the ideas. And thinking of Linus's puzzle about `call` being faster than a `push` before a jump. Seems to be one of those cases where higher-level abstractions can be optimized better than lower-level ones. I suppose because lower-level ones are too general-purpose, while higher-level ones are constrained.)

awhitworth11y ago

I don't know exactly why, but I have some theories. First, the two instructions you list there aren't parallelizable, because you have a write to ESP and then a read from it, so the first instruction needs to complete first (or, get far enough in the pipeline for the partial result to be usable). This is almost certainly going to cause some stalls.

Beyond that, the best explanations I've seen are that the push/pop instructions are "optimized" in some way. I don't know if that means they are more optimized than just making the inc/read pair on the ESP atomic so it doesn't stall or if there is more to it.

kryptiskt11y ago

It is quite hard to envision a scenario where the current stack frame wasn't in any cache level.

calibraxis11y ago

Ah ok, it has something to do with caching. So the cost of `sub` can be noticeable. Thanks!

peterfirefly11y ago

It's about dependency chains (and fewer bytes, easier decoding, and fewer µops). The OOO core can do more if it has more (but shorter) dependency chains to work with.

http://www.agner.org/optimize/microarchitecture.pdf

§7.7.

j / k navigate · click thread line to collapse

28 comments

awhitworth11y ago

tptacek11y ago

Write a disassembler at least once. It's much easier than you think it is (even with X86) --- it's essentially a file format parser --- and very illuminating.

userbinator11y ago

http://reocities.com/SiliconValley/heights/7052/opcode.txt

The 8080/8085/Z80 instruction sets also look much better in octal:

http://www.z80.info/decoding.htm

10191411y ago

This comment validates all the time I have "wasted" reading HN over the years.

It does not suprise me that something so simple would be so well overlooked (or, at least, "forgotten"). I wonder if I ever would have figured this out from my own readings and experiments. Doubtful.

Great tip!

1 more reply

tptacek11y ago

I had no idea! This is very cool.

1 more reply

danieljh11y ago

I'm currently using [0] as a helping hand among other resources which is quite good; my maybe not-so-interesting results are at [1].

For the handful lines of actual code I wrote yesterday [1] I still have around 25 tabs open. Complexity and no end in sight.

Do you have any recommendations and hints as where to start with this in the year 2015?

[0] http://0xax.blogspot.se/p/assembly-x8664-programming-for-lin...

[1] https://github.com/daniel-j-h/x86-64-asm

tibiapejagala11y ago

Seconding this. I've implemented a part of x86 instruction encoding and you either find resources:

* comprehensible, but far from complete (some blogs)

* complete, but hard to understand and requiring some implicit knowledge (Intel manual or [1])

Rather than disassembler I recommend writing some simple JIT compiler, with [2] as a starting point. You skip some problems this way.

[1] http://ref.x86asm.net/ this seems pretty cool as a reference, but I can't wrap my head around it

[2] http://eli.thegreenplace.net/2013/11/05/how-to-jit-an-introd...

2 more replies

Mahn11y ago

tptacek11y ago

You have to get over it and keep going anyways or you'll never become one of those people yourself! (Or even be able to make an informed decision about whether you want to).

Not coincidentally: part of the point of the company we just started. :)

goalieca11y ago

1 more reply

mschuster9111y ago

I'm currently writing one in JavaScript of all languages, just to prove it works. also, it's true multi-platform ;)

ianopolous11y ago

xvilka11y ago

[1] http://rada.re/

awhitworth11y ago

The title of the book is "X86 Disassembly". It was only ever intended to cover x86. x86_64, mips, ppc, etc are all expressly out of scope.

indutny11y ago

And here is a JS x86 disassembler: https://github.com/indutny/disasm . It is incomplete, but shows how the basics works.

calibraxis11y ago

Why does `push eax` perform "much faster" than the following?

  sub esp, 4
  mov DWORD PTR SS:[esp], eax

A brief skim of Intel's "Software Developer’s Manual" (particularly ch. 6 on stacks), didn't seem to find an answer.

(I'm not an assembly programmer; very ignorant of what's happening.)

userbinator11y ago

Linus also has some interesting things to say about using the dedicated stack instructions: http://yarchive.net/comp/linux/pop_instruction_speed.html

Somewhat amusingly, GCC was well known to generate the explicit sub/mov instructions by default, while most other x86 C compilers I knew of, including MSVC and ICC, would always use push.

calibraxis11y ago

awhitworth11y ago

kryptiskt11y ago

It is quite hard to envision a scenario where the current stack frame wasn't in any cache level.

calibraxis11y ago

Ah ok, it has something to do with caching. So the cost of `sub` can be noticeable. Thanks!

peterfirefly11y ago

It's about dependency chains (and fewer bytes, easier decoding, and fewer µops). The OOO core can do more if it has more (but shorter) dependency chains to work with.

http://www.agner.org/optimize/microarchitecture.pdf

§7.7.

j / k navigate · click thread line to collapse