Adding a conditional ("do I answer or do I proxy?") on every DNS query -- and there are many -- is going to introduce enough latency to be noticed unless you throw a lot of gear at it. And you're still going to introduce latency by inserting another hop. That's my point, though I do agree with you.
Welcome to the world of recursive name servers, there is a lot of software out there that does exactly what you just mentioned, I fail to see what would be hard about making this change.
Hm, why? Any modern CPU is blazingly fast. Writing it in Ruby probably wouldn't be smart, but Python + PyPI or Lua + LuaJIT would easily get within a factor of 10x of C.
This isn't a "could it be done?" exercise, it's more of a "could it be done without detection?" exercise. For this specific case, it's a pretty big risk.
As for detection, that was the reason I brought up CPU power. Modern CPUs are so fast that that it seems like this redirector would hardly generate a blip in any chart (such as top).
I don't care about proving anybody wrong. I care about filling my knowledge gaps. I.e. it's interesting to try to figure out how something like this would be detected in practice.