Also, it's not a high-overhead runtime. The runtime itself is compiled to highly optimized machine code (it can even compile, say, the derivative of f(x) = 5x+3 down to the machine immediate "5" at compile time).
Getting that compilation framework into place takes a lot of heavy lifting, though, so there is a load-time overhead.
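To make the 5x+3 example concrete, here is a rough sketch of forward-mode automatic differentiation in plain Python. This is not the runtime being discussed and says nothing about how it is implemented; `Dual`, `_lift`, and `derivative` are hypothetical names made up for illustration. It only shows why the derivative comes out as the same constant for every input, which is what lets an optimizing compiler fold it down to a literal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its derivative (hypothetical helper, forward-mode AD)."""
    val: float
    der: float

    def __add__(self, other):
        other = _lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.der * other.val + self.val * other.der)

    __rmul__ = __mul__


def _lift(x):
    """Treat plain numbers as constants, i.e. derivative 0."""
    return x if isinstance(x, Dual) else Dual(float(x), 0.0)


def f(x):
    return 5 * x + 3


def derivative(fn, x):
    """Evaluate fn'(x) by seeding the input's derivative with 1."""
    return fn(Dual(float(x), 1.0)).der
```

Calling `derivative(f, x)` gives 5.0 no matter which finite `x` you pass in. Python still does that arithmetic at run time; a compiler that can inline these operations and see through the constants can replace the whole call with the literal, which is the effect described above.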