I used to work on a library that solved a certain type of mathematical problem, written mainly in C++. It was built separately for many target platforms in production: different operating systems, processor architectures, compilers. Each supported combination of these might require special provisions in the code due to compiler bugs, hardware limitations, etc. Then you’d have diagnostic code that was only included in the debug builds used during development. There are more than enough preprocessor shenanigans to go around in that kind of environment already, and another layer for test-related flags and checks isn’t going to break any records for most readable code.
I think this is just the internalized pain of working with C++ (I suffered a lot with that 20 years ago). For systems-level stuff, I'm getting very excited about Zig. I imagine it will be basically "equally easy" to instrument these sorts of things at compile time in Zig, since types are first-class values at compile time. Zig is also super-composable: I have an example prime sieve that can run 60+ settings (combinations of single-threaded, multithreaded, bool array, bitmap, hyperthread awareness) by using a compile-time for loop across a series of instrumented settings. So long as I stay within the standard library, the language itself already takes care of the polyfill for platform stuff, and I have access to those flags if, say, I wanted to hyper-optimize for things like SSE. I don't see any mocking libraries yet, but I doubt that instrumenting a mocking namespace in tests will be any more difficult than it would be in, say, Elixir.
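For anyone who hasn't seen the pattern, here is a rough sketch of the "compile-time loop over a series of settings" idea. It is written in C++17 rather than Zig, and it is not the actual sieve mentioned above: the `Storage` enum, `count_primes`, and `run_all` are hypothetical names made up for illustration, and only two storage settings are shown instead of 60+.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical settings; illustrative only, not taken from the original sieve.
enum class Storage { BoolArray, Bitmap };

// One sieve of Eratosthenes, specialised at compile time by storage strategy.
// `if constexpr` discards the branch that does not match the setting.
template <Storage S>
std::size_t count_primes(std::size_t limit) {
    if (limit < 2) return 0;
    std::size_t count = 0;
    if constexpr (S == Storage::BoolArray) {
        std::vector<char> composite(limit + 1, 0);  // one byte per number
        for (std::size_t i = 2; i <= limit; ++i) {
            if (composite[i]) continue;
            ++count;
            for (std::size_t j = i * i; j <= limit; j += i) composite[j] = 1;
        }
    } else {
        std::vector<std::uint64_t> bits((limit >> 6) + 1, 0);  // one bit per number
        for (std::size_t i = 2; i <= limit; ++i) {
            if (bits[i >> 6] & (std::uint64_t{1} << (i & 63))) continue;
            ++count;
            for (std::size_t j = i * i; j <= limit; j += i)
                bits[j >> 6] |= std::uint64_t{1} << (j & 63);
        }
    }
    return count;
}

// The "compile-time for loop": a fold expression instantiates and calls the
// sieve once per setting in the pack, in order.
template <Storage... Configs>
std::vector<std::size_t> run_all(std::size_t limit) {
    std::vector<std::size_t> results;
    (results.push_back(count_primes<Configs>(limit)), ...);
    return results;
}
```

Calling `run_all<Storage::BoolArray, Storage::Bitmap>(1000)` runs both variants and returns one prime count per setting. The Zig version would presumably be shorter, since the settings can be ordinary compile-time values iterated with an inline loop, but the shape is the same: one generic body, a list of configurations, and the compiler stamping out a specialised copy for each.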