There is a slight bit of stack overhead: Option<ForkResult> is at least {tag: u8, {tag: u8, pid: i32}}, and due to alignment constraints it's actually {tag: u32, {tag: u32, pid: i32}}. A nonzero wrapper[0] would allow folding either ForkResult or Option into a 0-valued pid_t and remove one level of tagging: http://is.gd/yxStW1
Beyond that, you'd need generalised enum folding in order to fold both tags into the underlying value (you'd denote that pid_t is nonzero and nonnegative, for instance).
[0] which is unstable, so not really an option
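To make the folding idea concrete, here is a rough sketch. It assumes a newer toolchain where std::num::NonZeroI32 is available on stable, which was not the case when the footnote above was written. ForkResult, ForkResultNz, from_raw and sizes are hypothetical names made up for illustration, not the API of any real crate, and error returns from fork() are deliberately ignored. The only size the language actually guarantees here is that Option<NonZeroI32> is as small as a bare i32; the other two numbers are whatever your compiler happens to pick.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

/// Hypothetical stand-in for the ForkResult discussed above, with a plain i32 pid.
#[allow(dead_code)]
pub enum ForkResult {
    Parent { child: i32 },
    Child,
}

/// Same shape, but the pid is declared nonzero, so the value 0 is free
/// for the compiler to use as the encoding of `Child`.
#[allow(dead_code)]
pub enum ForkResultNz {
    Parent { child: NonZeroI32 },
    Child,
}

/// Maps a raw fork()-style return value (0 in the child) onto the nonzero variant.
/// Assumption: error handling (negative return values) is out of scope here.
pub fn from_raw(raw: i32) -> ForkResultNz {
    match NonZeroI32::new(raw) {
        Some(child) => ForkResultNz::Parent { child },
        None => ForkResultNz::Child,
    }
}

/// Sizes in bytes of (Option<ForkResult>, ForkResultNz, Option<NonZeroI32>).
/// Only the last one is guaranteed to equal size_of::<i32>().
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<Option<ForkResult>>(),
        size_of::<ForkResultNz>(),
        size_of::<Option<NonZeroI32>>(),
    )
}
```

Printing sizes() on your own toolchain shows how many of the tags it manages to fold; the layout of the two user-defined enums is unspecified, so treat those numbers as observations rather than promises.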
Compiler optimizations aside, C does a pretty good job at this. It's way more efficient than writing assembly, but you're still basically just moving memory around while doing some arithmetic. Easy to understand in "machine" terms.
Of course, this is only relevant when you're doing low-level stuff, like kernel or drivers programming. For the userland, Rust really looks like a nice language (I've played with it just a bit), and I'd be really happy if it pushes C++ away ;-)
We live in a world of many cores, and multiple CPUs all over the place - in your GPUs, your hard drives, motherboard controllers - and the intrinsic language support for multithreading literally does not exist as part of the C99 standard? One has to reach out to a mixture of POSIX, and the compiler extensions the POSIX implementation uses to annotate memory barriers so the optimizer won't break things, and intrinsics that introduce atomic operations, and... gah!
C and C++ do such a terrible job of this I have to resort to disassembly to debug program behavior far too frequently. These are the only languages I'm forced to do this with. If C or C++ were really "close to what the machine will really do", I'd expect the opposite result.
Even simple things like class and structure layouts and type sizes are controlled by a mess of compiler- and architecture-specific rules, plus extensions to control how those rules apply to padding, alignment, etc., which I get to debug. Ever had to debug differences in class layout between MSVC and Clang due to differently handling EBCO in a multiple inheritance environment? What about handling alignment of 8-byte types on 32-bit architectures differently? At least you've replaced all uses of "long" because of the mixture of LP64 and LLP64 compilers out there...? And what about when two incompatible versions of the standard library with different type layouts get linked in by a coworker? These are the symptoms of a language that doesn't control what the machine is really doing very well at all.
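The 8-byte alignment question is easy to poke at directly, and not only from C or C++. Below is a minimal sketch in Rust, assuming a #[repr(C)] struct so that the platform's C layout rules apply. Record, expected_value_offset and actual_value_offset are hypothetical names for illustration. On typical x86-64 targets a u64 is 8-byte aligned, while on 32-bit x86 Linux it is usually only 4-byte aligned, so the same declaration can come out with a different size depending on the target.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical record; `repr(C)` asks for C-compatible field order and padding.
#[repr(C)]
pub struct Record {
    pub flag: u8,
    pub value: u64,
}

/// Offset of `value` predicted by the C layout rule: the size of the
/// preceding field, rounded up to the alignment of `u64` on this target.
pub fn expected_value_offset() -> usize {
    let align = align_of::<u64>();
    (size_of::<u8>() + align - 1) / align * align
}

/// Offset of `value` as actually laid out, measured with pointer arithmetic.
pub fn actual_value_offset() -> usize {
    let record = Record { flag: 0, value: 0 };
    let base = &record as *const Record as usize;
    let field = &record.value as *const u64 as usize;
    field - base
}
```

Comparing actual_value_offset() and size_of::<Record>() across the targets you ship is a cheap way to catch this class of mismatch before it turns into a disassembly session.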
When I really need tight control over what the machine will do at a low level, my tools are actual (dis)assembly, intrinsics, an understanding of the underlying hardware itself, and simple code that eschews features requiring significant runtime support or underpinnings. None of those are C or C++ specific. The last one requires some knowledge of how a language's features are implemented - C and C++ might be broken enough that you're forced to wrestle with that topic, when it's more optional in other languages, but... that still doesn't make it C or C++ specific.
</rant>