Much of that 5M are hardware register definitions expanded into C headers. I am not sure how you'd consolidate that but it's not like that's all bespoke C code.
I sympathize, but the reality is that except for very specialized cases, (hyper) optimizing for CPU performance is unnecessary, even in the embedded space. A Cortex-M0 has roughly the same performance as a 486, and is cheap and power efficient enough to be bundled in disposable test kits, vapes, etc.