Is this assumption false?
I mean, besides this crazy-like-a-fox twin-trampoline system, or for maliciously compromising binaries at runtime. The days of overlays are long gone. :)
I haven't built one myself, but I know that the Seaside framework comes with a built-in memory profiler.
Intercession of messages sent to objects in Smalltalk isn't trivial, but it isn't voodoo either. I whipped up a quick method-replacement class that swaps out the behavior of a database 'save' method for testing. However, I imagine that trying to replace VM primitive calls would lead to some not-so-nice side effects (maybe? Or would it just make the image slower?).
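The comment describes a Smalltalk class; as a rough analog only, here is a minimal Python sketch of the same idea: temporarily swap a class's `save` method for a recording stub during a test, then restore the original. `Database`, `save`, and `MethodReplacement` are hypothetical names, not the commenter's code.

```python
class Database:
    """Hypothetical class whose save() should not run during tests."""
    def save(self, record):
        raise RuntimeError("would hit the real database")


class MethodReplacement:
    """Temporarily replaces cls.<name> with `replacement`; restores it on exit."""
    def __init__(self, cls, name, replacement):
        self.cls = cls
        self.name = name
        self.replacement = replacement
        self.original = None

    def __enter__(self):
        # Read from the class's own __dict__ so we restore exactly what was there
        # (this sketch assumes the method is defined directly on cls).
        self.original = self.cls.__dict__[self.name]
        setattr(self.cls, self.name, self.replacement)
        return self

    def __exit__(self, exc_type, exc, tb):
        setattr(self.cls, self.name, self.original)
        return False  # do not swallow exceptions


saved = []

def fake_save(self, record):
    # Record the call instead of touching storage
    saved.append(record)
    return True


with MethodReplacement(Database, "save", fake_save):
    result = Database().save({"id": 1})

print(result, saved)  # True [{'id': 1}]
```

Using a context manager guarantees the original method comes back even if the test body raises, which is the main thing that goes wrong with hand-rolled swaps.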
I like Ruby, and I like Smalltalk, but I've been liking Smalltalk much better lately because of its "turtles all the way down" nature, which makes things like a memory profiler much less "voodoo-y".
The general Detours approach:
* Disassemble the first N bytes of a target location
* Scoop N bytes' worth of opcodes out of the target and re-host them somewhere heap-allocated
* Replace the N bytes with an absolute jump to a heap-allocated trampoline
* Bounce to your code
* Execute the scooped-up N bytes' worth of opcodes
* Jump back to the target, just past the overwritten bytes
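The steps above can be sketched as a toy simulation in Python. Everything here is made up for illustration: the `Insn` type, the 5-byte jump size, the addresses, and the tiny interpreter. A real implementation patches machine code in executable memory, needs a real length disassembler, and must relocate any PC-relative operands in the stolen instructions (this toy has none).

```python
from dataclasses import dataclass

JMP_SIZE = 5  # toy assumption: a jump is 5 bytes, like x86 "jmp rel32"

@dataclass(frozen=True)
class Insn:
    op: str      # "add", "mul", "jmp", "log", "nop", "ret"
    arg: object  # operand; an absolute address for "jmp"
    size: int    # encoded length in bytes

def run(mem, pc, log):
    """Interpret the toy machine from pc; return the accumulator at "ret"."""
    acc = 0
    while True:
        insn = mem[pc]
        if insn.op == "ret":
            return acc
        if insn.op == "jmp":
            pc = insn.arg
            continue
        if insn.op == "add":
            acc += insn.arg
        elif insn.op == "mul":
            acc *= insn.arg
        elif insn.op == "log":
            log.append(insn.arg)
        pc += insn.size

def steal(mem, target, n):
    """'Disassemble' whole instructions until at least n bytes are covered."""
    stolen, addr = [], target
    while addr - target < n:
        stolen.append(mem[addr])
        addr += mem[addr].size
    return stolen, addr - target

def install_detour(mem, target, hook, tramp_addr):
    stolen, length = steal(mem, target, JMP_SIZE)
    # Heap trampoline: bounce to the hook code, run the stolen instructions,
    # then jump back to just past the overwritten bytes.
    addr = tramp_addr
    for insn in hook + stolen:
        mem[addr] = insn
        addr += insn.size
    mem[addr] = Insn("jmp", target + length, JMP_SIZE)
    # Overwrite the target's first bytes with a jump to the trampoline,
    # padding leftover stolen bytes with nops.
    addr = target
    for insn in stolen:
        del mem[addr]
        addr += insn.size
    mem[target] = Insn("jmp", tramp_addr, JMP_SIZE)
    for pad in range(target + JMP_SIZE, target + length):
        mem[pad] = Insn("nop", None, 1)

mem = {
    0x100: Insn("add", 1, 2),
    0x102: Insn("mul", 3, 4),
    0x106: Insn("add", 5, 3),
    0x109: Insn("ret", None, 1),
}
before = run(mem, 0x100, [])
install_detour(mem, 0x100, [Insn("log", "hooked", 2)], tramp_addr=0x1000)
log = []
after = run(mem, 0x100, log)
print(before, after, log)  # 8 8 ['hooked']
```

Because the stolen span must consist of whole instructions, six bytes are moved here even though the jump only needs five; the leftover byte becomes padding that nothing should ever execute.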
This approach (what the post calls "caller-side trampolines") works well when your targets are function prologues, and less well when the target is an arbitrary basic block.
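One reason arbitrary basic blocks are harder (not necessarily the only one): some other branch may land inside the bytes that get overwritten, and after patching it would hit the middle of the jump or the padding. Prologues are normally entered only at their first byte. A minimal Python sketch of that check, assuming a hypothetical map of instruction lengths and a known list of branch targets (real tools often cannot know every target, e.g. with indirect jumps):

```python
def stolen_span(insn_lengths, target, n):
    """Return (start, end) of the whole instructions covering >= n bytes.

    insn_lengths maps instruction address -> length in bytes (hypothetical input).
    """
    addr = target
    while addr - target < n:
        addr += insn_lengths[addr]
    return target, addr

def safe_to_patch(insn_lengths, target, n, branch_targets):
    """Unsafe if any known branch lands strictly inside the stolen span."""
    start, end = stolen_span(insn_lengths, target, n)
    return not any(start < t < end for t in branch_targets)

lengths = {0x100: 2, 0x102: 4, 0x106: 3}
# Prologue-like: callers only ever enter at 0x100.
print(safe_to_patch(lengths, 0x100, 5, [0x100]))  # True
# Loop head at 0x102 sits inside the 6 bytes we would overwrite.
print(safe_to_patch(lengths, 0x100, 5, [0x102]))  # False
```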
We have an implementation in pure Ruby, complete with a pure-Ruby ia32 assembler (one of the most useful little pieces of code I've written), in Timur's Ragweed repository (search for "Ruby ragweed github").
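To give a flavor of what such an assembler has to emit for the patch step, here is a small Python sketch (not Ragweed's API, which is Ruby) encoding two common ia32 jump forms: the 5-byte relative `jmp rel32` (opcode E9, displacement measured from the end of the instruction) and the 6-byte `push imm32; ret` (68 imm32 C3), which transfers to an absolute address regardless of where it is placed. The addresses below are arbitrary examples.

```python
import struct

def jmp_rel32(src, dst):
    """Encode 'jmp rel32' placed at address src, jumping to dst (ia32)."""
    # Displacement is relative to the address right after the 5-byte instruction.
    rel = (dst - (src + 5)) & 0xFFFFFFFF
    return b"\xE9" + struct.pack("<I", rel)

def push_ret_abs32(dst):
    """Encode 'push imm32; ret': a 6-byte absolute transfer to dst (ia32)."""
    return b"\x68" + struct.pack("<I", dst & 0xFFFFFFFF) + b"\xC3"

print(jmp_rel32(0x401000, 0x402000).hex())  # e9fb0f0000
print(jmp_rel32(0x402000, 0x401000).hex())  # e9fbefffff
print(push_ret_abs32(0x12345678).hex())     # 6878563412c3
```

The relative form is smaller, so it steals fewer bytes from the target; the push/ret form costs one more byte but does not depend on the distance between the patch site and the trampoline.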
It was once used extensively to add non-Apple-sanctioned functionality to Mac OS X. Hopefully it isn't used much these days.