Why do Windows functions all begin with a pointless MOV EDI, EDI instruction?

(blogs.msdn.com)

320 points · cabacon · 14y ago · 43 comments

jswinghammer · 14y ago

I have basically no background writing applications on Windows outside of .NET, but I love reading posts by Raymond Chen. I always enjoy learning about things that seem crazy from the outside but have a real purpose that you're just missing the information to understand. That's pretty much what looking at someone else's code is often like, so it's helpful to remember that even seemingly crazy things have a purpose.

I feel like I've learned a lot from reading his blog over the years. I even bought his book years ago because I felt like I was getting a lot of value from the blog.

It's really too bad Microsoft doesn't seem to value backwards compatibility as much as they did during the times Chen often writes about. It seems like an interesting challenge that they've pretty much given up on. I can't even count how many conversations I've been in where people complained on one hand that Microsoft focused on that backwards compatibility too much and on the other that their driver from 2001 doesn't work right in Windows 7. Often these statements happen moments apart.

bdonlan · 14y ago

Microsoft puts a LOT of effort into application compatibility. They're very serious about it, and the fact that they fail at times is proof of how hard it is to maintain app compatibility when application developers are doing their best to sabotage you by using internal APIs and corrupting the heap.

Driver compatibility is less of an issue; older hardware becomes less common (and thus less of an issue) over time, and strictly adhering to kernel compatibility limits your ability to add new features to the kernel significantly.

cousin_it · 14y ago

That's true, but sadly they don't put as much effort into compatibility of development tools. Old code has to be ported to new versions of Visual Studio. Even just sticking with C++, I ran into issues with winioctl.h and Unicode fstreams requiring different code for different versions of VC++. I imagine other people may have a longer list.

cabacon (OP) · 14y ago

I have never written anything that runs on Windows, regardless of language or API, and I still subscribe to his blog. I feel badly that he gets snarked at so much that he has started injecting pre-emptive snark responses; I think he might write more, or more candidly, if it weren't for the highly vocal peanut gallery he has attracted.

This one in particular seemed worthy of sharing because of a couple of things. Firstly, if you just saw the artifact and not the reasoning, you would probably see a lot of WTF in having 7 bytes of NOP-type instructions, with five outside and two inside each function; it's a great reveal when you find out what it is for. Then my reaction was "Wow - I don't really design good instrumentation/logging/debugging points in my code, do I? I wonder if there's any part of this idea I can fruitfully rip off for my own code?" It was a nice one-two punch.

jbeda · 14y ago

I can guarantee that the snark is intrinsic to Raymond. He is just that type of guy. He loves a good story, is very sarcastic and likes to troll co-workers.

(I could tell quite a few Raymond stories from when we worked together but it is better to leave it to him. It is a shame that he sticks to technical topics mostly on his blog.)

syaz1 · 14y ago

I thought I was the only one doing this. Raymond Chen is worth subscribing to just for his knowledge alone, regardless of what platform you program for.

rsynnott · 14y ago

I haven't used Windows more than occasionally for a decade, and I still read Chen religiously; he's very entertaining.

> It's really too bad Microsoft doesn't seem to value backwards compatibility as much as they did during the times Chen often writes about.

This, I'd disagree on. At a certain point, one either has to move on, or to spend 90% of the time working on backward compatibility.

> on the other that their driver from 2001 doesn't work right in Windows 7

Drivers were never really part of the backward compatibility obsession. They changed utterly between 9x and NT, of course, and also fairly dramatically between DOS/Win3.x and 9x, between NT4 and 2000, and between XP and Vista.

dramaticus3 · 14y ago

They have been trying to cover the various markets with one product but instead of offering Windows Home, Home Pro, Pro, Super Pro or whatever they call them, they should split into Windows Classic, Windows Nouveau and Windows Server. Now they are trying to do it on a monolith with Win8.

tptacek · 14y ago

If you've never had a chance to play with it, Detours, the more complex alternative to the hot-patch strategy Chen is talking about, is really slick.

What you do in Detours is, freeze the process, disassemble the first several instructions of the function you want to hook, copy out enough of them to make room for a full jump instruction, copy in your hook function somewhere in memory, followed by the instructions you stole to make room for the jump, followed by a jump back to the original function. Then you patch in a jump to that location and unfreeze the process.

The example programs for Detours do this, for instance, on every libc function to implement library tracing.

That this "just works" with Microsoft's Detours package is kind of mindboggling.

This is a great project to tackle if you want to write programmable debuggers. We've done it for Win32 (you need a full build environment to use Detours; we have the whole thing in Ruby), OS X, and Linux. It's crazy useful.
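The jump arithmetic behind the steps described above can be sketched in a few lines of Python. This is only a byte-level model under stated assumptions, not the Detours API: the addresses, the stolen prologue bytes, and the helper names `encode_jmp_rel32` and `build_trampoline` are all hypothetical, and the stolen instructions are assumed to contain no relative operands (real hooking code has to fix those up).

```python
import struct

JMP_REL32_OPCODE = 0xE9  # x86 near jump with a 32-bit relative displacement
JMP_REL32_SIZE = 5       # 1 opcode byte + 4 displacement bytes


def encode_jmp_rel32(jump_addr: int, target_addr: int) -> bytes:
    """Encode a near JMP located at jump_addr that lands on target_addr."""
    # The displacement is counted from the end of the jump instruction.
    displacement = target_addr - (jump_addr + JMP_REL32_SIZE)
    if not -(2**31) <= displacement < 2**31:
        raise ValueError("target is out of range for a rel32 jump")
    return struct.pack("<Bi", JMP_REL32_OPCODE, displacement)


def build_trampoline(stolen: bytes, trampoline_addr: int, function_addr: int) -> bytes:
    """Stolen prologue bytes followed by a jump back into the original function."""
    if len(stolen) < JMP_REL32_SIZE:
        raise ValueError("need at least 5 whole-instruction bytes to make room")
    resume_addr = function_addr + len(stolen)      # first untouched instruction
    jump_back_addr = trampoline_addr + len(stolen)  # where the jump back lives
    return stolen + encode_jmp_rel32(jump_back_addr, resume_addr)


# Hypothetical addresses and prologue bytes, for illustration only.
FUNCTION_ADDR, HOOK_ADDR, TRAMPOLINE_ADDR = 0x1000, 0x5000, 0x6000
STOLEN = bytes.fromhex("558bec83ec10")  # push ebp; mov ebp, esp; sub esp, 0x10

# Bytes written over the start of the function. The sixth stolen byte is left
# behind but never executed, because the trampoline resumes after it.
patch = encode_jmp_rel32(FUNCTION_ADDR, HOOK_ADDR)
trampoline = build_trampoline(STOLEN, TRAMPOLINE_ADDR, FUNCTION_ADDR)
print(patch.hex(), trampoline.hex())
```

The sketch stops at computing bytes; freezing threads, changing page protections, and disassembling to find instruction boundaries are the parts that make the real thing hard.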

profquail · 14y ago

Detours is really cool. Last I checked (last year?) it was free for 32-bit code, but you had to pay for or license the 64-bit version. There's an open-source (but not 100% functionally-equivalent) alternative called EasyHook: http://easyhook.codeplex.com/

An anecdote: I've got a Sony VAIO Z series laptop, one of the 2010 models with "Switchable Hybrid" graphics -- that is, there's a switch marked "Auto"/"Speed"/"Stamina" which you can use to switch between the embedded Intel GPU and the discrete nVidia GPU. The laptop itself is great -- probably the best developer's laptop I've ever seen/used -- but drivers have always been a real pain. Anyway, as it turns out, I was updating the drivers last week and just happened to notice the Detours DLL within the driver installer files; so it seems that the graphics driver actually just checks the position of the switch and uses Detours to direct any calls to the "real" driver for whatever hardware is selected.

Someone · 14y ago

Does that handle double detours, where two different programs each patch the same function?

That was the most entertaining/frustrating (depending on your world view; it made the journey way more interesting, but also a lot longer) part of hacking classic Mac OS.

You would have a zillion extensions (Apple and third party) each patch tens of OS calls, both at startup and, in cases where the Finder reset patches, after the Finder launched, or even after every application launched (to get your code running at such times, you would have to patch another OS call). On PowerPC machines you would have the added fun of patching PPC code with 68k code and vice versa.

That that _sometimes_ worked was really mind boggling. Relative to that, patching your own libc seems easy.

ryanmolden · 14y ago

> Does that handle double detours, where two different programs each patch the same function?

I believe Detours patches the import table inside a single process; it does not patch the call on a system-wide basis, so there really is no 'two different programs each patching the same function'. In theory you could have two different pieces of code running in the same process doing that (i.e. each patching a given function), but Detours gives you a 'trampoline' function to invoke the original thing you are patching. So I believe the second to patch, when invoking its trampoline, would simply invoke the first patch, which in turn would invoke the original through its own trampoline. I haven't tried that, though, so it may not work that way :)
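The chaining guessed at above can be modelled with plain Python callables. This is a toy sketch of the idea only, not how Detours is implemented: the `functions` table, `install_hook`, and both hooks are hypothetical names, and each hook simply receives whatever was installed before it as its trampoline.

```python
from typing import Callable, Dict, List

Func = Callable[[str], str]

# Hypothetical stand-in for the patchable entry points of one process.
functions: Dict[str, Func] = {"greet": lambda name: f"hello {name}"}

calls: List[str] = []  # records the order in which hooks run


def install_hook(name: str, make_hook: Callable[[Func], Func]) -> None:
    """Install a hook; its trampoline is whatever was installed before it."""
    trampoline = functions[name]
    functions[name] = make_hook(trampoline)


def first_hook(trampoline: Func) -> Func:
    def hooked(name: str) -> str:
        calls.append("first")
        return trampoline(name)  # reaches the original function
    return hooked


def second_hook(trampoline: Func) -> Func:
    def hooked(name: str) -> str:
        calls.append("second")
        return trampoline(name)  # reaches first_hook's wrapper, not the original
    return hooked


install_hook("greet", first_hook)
install_hook("greet", second_hook)
result = functions["greet"]("world")
print(calls, result)
```

In this model the most recently installed hook runs first and the original runs last, which is the behaviour the comment expects; whether a real hooking library behaves this way would need to be checked against that library.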

VS uses this mechanism to allow you to run as non-admin even though there is TONS of VS and third party code that expects to do things like write to HKLM, which is a no-no if you are not an admin.

braindead_in · 14y ago

I used Detours years ago to hook into the Wave I/O API and DirectSound to capture the audio I/O. I was blown away by the power of API Hooking. Nothing concrete came out of it but it was a lot of fun.

ecaradec · 14y ago

If all you need is to trap a few functions, Intel published a paper on how to intercept API calls. It can be very useful if you need to fix some function inside a DLL you load, modify its behavior or log calls for debugging. There are a lot of potential uses, and it's a fun technique that teaches you some things about cache control and assembly.

http://software.intel.com/en-us/articles/intercepting-system...

kmm · 14y ago

That's coincidental. I once made a Linux kernel function hijacking module and I had the exact same idea. Well, without the freezing of other (kernel) threads, I didn't realize the problems yet.

Now that I think about it I'm even more surprised it worked in the first place. I thought Linux had w^x protection?

jrockway · 14y ago

It does, but you have to explicitly turn it on for the pages you care about.

cousin_it · 14y ago

Okay I have two questions that might be very clueless but I don't know the answer to them so I will ask them anyway.

1) In the comments Raymond says, "Hot-patching is not an application feature. It's an OS internal feature for servicing." Then why does the compiler put hot-patch points in my code? Why not use a special compiler flag when building Windows DLLs?

2) Why do we need a special hot-patch point at all? What's wrong with just overwriting the first few bytes of the function you want to hot-patch?

to3m · 14y ago

Having a hot-patch point avoids race conditions when overwriting the code. The patch target needs to be 1 instruction, so that any OS threads executing that code either see the old instruction (and run the unpatched code), or the new instruction (and run patched code), and never some mishmash of pre-patch and post-patch instructions.

(It also needs to be possible to overwrite the patch target in a single write, which isn't possible with a full near JMP, since its 32-bit relative encoding is 5 bytes long.)
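A byte-level model of the two-step patch may make the ordering clearer. This is a sketch under assumptions, not Windows servicing code: the image contents, addresses, and the `hot_patch` helper are hypothetical, the padding is taken to be five NOP bytes as described in the thread, and a Python slice assignment only stands in for the single 2-byte write that real code would need.

```python
import struct

MOV_EDI_EDI = bytes([0x8B, 0xFF])       # the 2-byte do-nothing instruction at function entry
SHORT_JMP_BACK_7 = bytes([0xEB, 0xF9])  # jmp short -7: from entry+2 back to entry-5
PAD_SIZE = 5                            # room for one JMP rel32 just before the function


def hot_patch(image: bytearray, entry: int, image_base: int, replacement_addr: int) -> None:
    """Redirect the function at offset `entry` to replacement_addr, in two steps."""
    if image[entry:entry + 2] != MOV_EDI_EDI:
        raise ValueError("function has no hot-patch point")
    pad = entry - PAD_SIZE
    # Step 1: fill the padding with a full 5-byte jump. No thread executes
    # these bytes yet, so it does not matter that this write is not atomic.
    displacement = replacement_addr - (image_base + pad + 5)
    image[pad:entry] = struct.pack("<Bi", 0xE9, displacement)
    # Step 2: swap the 2-byte instruction for a short jump into the padding.
    # A thread sees either the old instruction or the new one, never a mix.
    image[entry:entry + 2] = SHORT_JMP_BACK_7


# Hypothetical image: 5 NOPs of padding, mov edi, edi, then a lone ret.
image = bytearray(b"\x90" * PAD_SIZE + MOV_EDI_EDI + b"\xc3")
hot_patch(image, entry=PAD_SIZE, image_base=0x1000, replacement_addr=0x2000)
print(image.hex())
```

The order matters: the long jump is fully in place before anything can reach it, and only the final 2-byte swap is visible to running threads.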

cabacon (OP) · 14y ago

1) I didn't see anything that suggested that all DLL functions have this hot-patch point. I think from his perspective "Windows DLL" means "a DLL that is part of the Windows operating system", not "a DLL used by an application executing on Windows".

2) I think he addressed this - someone might be executing the function while you are trying to patch it. Having a 2-byte, one clock cycle NOP at the front means that you can replace it "atomically" from the perspective that nobody can walk into the middle of you updating the memory.

cousin_it · 14y ago

Thanks! Re 1), it does seem to be a compiler switch /hotpatch, not the default behavior.

tptacek · 14y ago

For one thing, you can't just scoop out 2-5 bytes, replace them with a jump, and assume that things will work. Detours, the alternative to wired-in hot patch points, includes a small disassembler that ensures it's working on instruction boundaries. Detours is significantly more complex than the patching strategy Chen outlined.

sp_ · 14y ago

About 2: You would have to copy the overwritten bytes to another place in memory to execute them later. As the length of x86 instructions is not fixed you would need a whole disassembler to find out what bytes belong to what instruction. Easier to have just two bytes you can overwrite at will. Saves the hassle of calculating how many bytes you need to copy.

tptacek · 14y ago

I actually don't know how Windows hot-patching works, but I'd assume they'd just replace the whole function. You wouldn't need to execute the replaced bytes (like you do in Detours, which is usually hooking functions, not replacing them).

giardini · 14y ago

Whatever happened to the old idea of separating program and data spaces and write-protecting the program space?

xpaulbettsx · 14y ago

Instead we do the opposite today, we mark the data pages with the "No Execute" bit, which is a far better way to ensure malicious code doesn't execute.

DrJokepu · 14y ago

I'm by no means an expert in this domain but my understanding is that that's normally not happening on x86/x64, not even on Linux or OS X. Otherwise, how could just-in-time compilers like Java, .NET or V8 work? Please correct me however if I'm wrong.

rsynnott · 14y ago

Modern x86 does have the NX bit (and other major archs like ARM have equivalents), which allows areas of memory to be marked executable or not, and most modern operating systems do use it. JIT VMs will explicitly set it on produced code.

This is part of the reason that JIT is not allowed, and will not even _work_, on iOS (with the exception of Mobile Safari's Javascript engine, which has special privileges). Applications aren't allowed to set the ARM NX equivalent.

haberman · 14y ago

JIT compilers allocate executable memory by using mmap() (not malloc(), since malloc()-allocated memory is not guaranteed (or even likely) to be executable). When you map memory with mmap(), you can decide the protection bits (PROT_READ, PROT_WRITE, PROT_EXEC, and they can be OR'd together in any combination).

For example, from the V8 sources: http://www.google.com/codesearch#W9JxUuHYyMg/trunk/src/platf...
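The same mmap() call is reachable from Python's standard `mmap` module, which makes for a short sketch of the protection flags. This assumes a Unix-like system that permits mappings which are writable and executable at once (hardened configurations may refuse); `alloc_code_buffer` is a hypothetical helper name, and the sketch only allocates and writes, it never jumps into the buffer.

```python
import mmap


def alloc_code_buffer(size: int = mmap.PAGESIZE) -> mmap.mmap:
    """Map anonymous memory that is readable, writable and executable."""
    return mmap.mmap(
        -1,  # no backing file: anonymous memory
        size,
        flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
        prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC,
    )


buf = alloc_code_buffer()
buf[0:1] = b"\xc3"  # a single x86 ret, standing in for JIT output
print(len(buf), hex(buf[0]))
```

A stricter design maps the pages writable first and flips them to executable once the code is emitted, so no page is ever both at once; Python's `mmap` module has no call for changing protections afterwards, so that variant is left out here.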

alnayyir · 14y ago

These are von Neumann machines, not Harvard, at all levels of abstraction save for some newer security measures.

ajross · 14y ago

NOOP sequences in x86 are a fun subject. There's an interesting section in Intel's optimization guide somewhere (I'm too lazy to find it) that details "best practice" noop instructions of 1, 2, ... up to something like 9 bytes. These are used for alignment purposes too, where you need a few bytes of padding to make a loop-back target cache-line aligned or whatnot.
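For reference, here is a small Python sketch that builds padding out of multi-byte NOPs. The byte sequences are the 1 to 9 byte forms as I recall them from the NOP entry in Intel's instruction set reference, so treat the table as an assumption to check against the current manual; `INTEL_NOPS` and `nop_padding` are hypothetical names.

```python
# Recommended multi-byte NOP encodings, keyed by length in bytes.
INTEL_NOPS = {
    1: bytes.fromhex("90"),                  # nop
    2: bytes.fromhex("6690"),                # 66 nop
    3: bytes.fromhex("0f1f00"),              # nop dword ptr [eax]
    4: bytes.fromhex("0f1f4000"),            # nop dword ptr [eax+0x0]
    5: bytes.fromhex("0f1f440000"),          # nop dword ptr [eax+eax*1+0x0]
    6: bytes.fromhex("660f1f440000"),        # 66 nop word ptr [eax+eax*1+0x0]
    7: bytes.fromhex("0f1f8000000000"),      # nop dword ptr [eax+0x00000000]
    8: bytes.fromhex("0f1f840000000000"),    # nop dword ptr [eax+eax*1+0x00000000]
    9: bytes.fromhex("660f1f840000000000"),  # 66 nop word ptr [eax+eax*1+0x00000000]
}


def nop_padding(length: int) -> bytes:
    """Fill `length` bytes using as few NOP instructions as possible."""
    if length < 0:
        raise ValueError("length must be non-negative")
    out = bytearray()
    while length > 0:
        chunk = min(length, max(INTEL_NOPS))  # take the longest NOP that fits
        out += INTEL_NOPS[chunk]
        length -= chunk
    return bytes(out)


print(nop_padding(11).hex())
```

Fewer, longer NOPs are preferred over a run of single-byte 0x90s because the processor has fewer instructions to decode when execution falls through the padding.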

haberman · 14y ago

Check out the "smartalign" package in NASM: it contains 1-8 byte nop instructions for several different x86 variants: http://repo.or.cz/w/nasm.git/blob/a2c78555770990ed966c414da9...

You can use xxd and objdump to see what each of these translates into. For example, here's an 8-byte nop for x86-64:

  $ echo '0x0f,0x1f,0x84,0x00,0x00,0x00,0x00,0x00' | xxd -r > /tmp/bincode
  $ objdump -M intel -D -b binary -mi386 -Mx86-64 /tmp/bincode
  
  /tmp/bincode:     file format binary
  
  
  Disassembly of section .data:
  
  00000000 <.data>:
     0:   0f 1f 84 00 00 00 00    nop    DWORD PTR [rax+rax*1+0x0]
     7:   00

alexwestholm · 14y ago

Wow, awesome explanation. About 6 years ago, while hacking on gtk+ and Mozilla, I used those instructions to hook into the main event loop to get gtk+ embedding gecko 1.7, and had no idea that what I perceived as hacks were actually a somewhat valid method for doing what I needed to do: modify how window events from gecko were propagated to the gtk+ event loop and vice versa. I think my bug report is probably still open, and it might even be worth revisiting if anyone is still interested in gtk+ with Mozilla embedded, though it would likely need lots of changes... Latest gecko is 1.9?? Anyway, awesome explanation.

sid0 · 14y ago

Latest Gecko is 9.0 to coincide with Firefox 9. :)

rwmj · 14y ago

For those that are interested, the Linux kernel does almost the same thing (if compiled that way):

https://lwn.net/Articles/264029/

The mcount feature piggybacks on the profiling instruction added into every function when you use the gcc -pg option.

Edit: better link is probably this one: http://www.mjmwired.net/kernel/Documentation/trace/ftrace.tx...

wwwww · 14y ago

Then why do I need to restart the computer after I install anything?
