What's happening here is that they're getting computation without executing any instructions, simply through the process of using the MMU hardware to "resolve addresses". The page directory system has been set up in such a way that address resolution effects a virtual machine that they can code to.
This works because when you attempt to resolve an invalid address, the CPU generates a trap (#PF), and the handling of that trap pushes information on the "stack". Each time you push data to the stack, you decrement the stack pointer. Eventually, the stack pointer underflows; when that happens, a different trap (#DF) fires. This mechanism put together gives you:
if x < 4 { goto b } else { x = x - 4 ; goto a }
also known as "subtract and branch if less than or equal to zero", also known as "an instruction adequate to construct a one-instruction computer".The virtual machine "runs" by generating an unending series of traps: in the "goto a" case, the result of translation is another address generating a trap. And so on.
The details of how this computer has "memory" and addresses instructions is even headachier. They're using the x86 TSS as "memory" and for technical reasons they get 16 slots (and thus instructions) to work with, but they have a compiler that builds arbitrary programs into 16-colored graphs to use those slots to express generic programs. Every emulator they could find crashes when they abuse the hardware task switching system this way.
Here's it running Conway's Life:
http://youtubedoubler.com/?video1=E2VCwBzGdPM&start1=0...
Here's their talk for a few months back:
http://www.youtube.com/watch?v=NGXvJ1GKBKM
The talk is great, but if you're not super interested in X86/X64 memory corruption countermeasures, you might want to skip the first 30 minutes.
Probably by patching the emulator.
By running it on a physical machine. Unless there is a requirement that the processor not be multitasking that I am missing.
Actually is it even true that they are not executing any x86 instructions on the CPU? From my understanding, the handling of page faults needs the CPU to execute some instructions. Maybe I'm wrong, if so could you enlighten me? Thanks.
I suppose if you setup everything very carefully you can make it fault over and over again without giving it the time to execute any instruction.
Without looking into the specifics, I think it's very possible that the CPU is not actually executing any instructions, just waiting for the MMU to get a hold of itself. After all, in order to simply load the instructions you need the MMU to be responsive (or deactivated I suppose, if there's such a thing as no-MMU x86).
The easiest way however would be to use the TrapCC mechanism to transfer control between bits of normal assembler code (perhaps repurposed from other functions already in your kernel), doing something similar to ROP. Of course, for additional fun, feel free to throw in BX's Brainfuck interpreter in ELF and James Oakley's DWARF exception handler. We might drop a demo of this soon, i.e. implementing a self-decrypting binary via page faults.
I'm wondering what method PFLA uses to read/write non-code addresses. Only one address per page can be addressed? I'll take a look at the compiler.
By simply expanding the addressing capability, a very tiny program could emulate an instruction stream from memory, overcoming the limited code space (at the cost of execution speed).
Cheers!
This is basically the canonical instruction for OISCs (one instruction set computers). Wikipedia describes it pretty well: https://en.wikipedia.org/wiki/One_instruction_set_computer#S....
https://events.ccc.de/congress/2012/Fahrplan/events/5265.en....
video:
Would that slow things down incredibly?
Is there a way the virtual machine might spawn another virtual machine child of its own?
http://www.cs.dartmouth.edu/~bx/elf-bf-tools/slides/ELF-berl...