I know this probably won't help with your current project, but you should think of your compiler as an exotic virtual machine: your source code is the input program, and the executable is the output. Just like with a "real" CPU, there are fast ways to write a program and slow ways.
To continue the analogy: if you have to sort a list, use `qsort()`, not a bubble sort.
So, for C/C++ we can order the "cost" of various language features, from most expensive to least expensive:
1. Deeply nested header-only (templated/inline) "libraries";
2. Function overloading (especially with templates);
3. Classes;
4. Functions & type definitions; and,
5. Macros & data.
That means that if you looked at my code base, you'd see lots and lots of "table-driven" code, where I've encoded huge swathes of business logic as structured arrays of integers, and even more as macros that generate such tables. This code compiles at ~100 kloc/s.
We don't use function overloading: in one place, removing it cut compile time from 70 hours to 20 seconds. Overloading requires the compiler to walk a list of candidate functions, perform argument-dependent lookup (ADL), and then decide which candidate is best. Functions that are "just C-like" only require a hash lookup. The difference is about a factor of 10000 in speed. You can do "pretend" function overloading by using a template plus a switch statement, and letting template instantiation sort things out for you.
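
To make those two ideas concrete, here is a minimal sketch of what such code could look like. It is not lifted from a real code base: the rule names, the values, and the `combine` helper are all made up for illustration, and the template-plus-switch part is only one possible reading of the "pretend overloading" trick (a single function template selected by an enum tag, so the compiler never has an overload set to rank).

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical rule list: one X(...) row per tier (illustrative values only).
#define DISCOUNT_RULES(X) \
    X(Bronze,   0,  0)    \
    X(Silver, 100,  5)    \
    X(Gold,   500, 10)

// First expansion of the list: an enum of tier ids.
enum Tier {
#define X(name, min_spend, percent) name,
    DISCOUNT_RULES(X)
#undef X
    TierCount
};

// Second expansion of the same list: a plain array of integers.
struct DiscountRule {
    int tier;
    int min_spend;
    int percent;
};

static const DiscountRule kRules[] = {
#define X(name, min_spend, percent) { name, min_spend, percent },
    DISCOUNT_RULES(X)
#undef X
};

static_assert(sizeof(kRules) / sizeof(kRules[0]) ==
                  static_cast<std::size_t>(TierCount),
              "one table row per tier");

// The "business logic" is a loop over the table. Rows are sorted by
// min_spend, so the last matching row wins.
int discount_percent(int spend) {
    int best = 0;
    for (std::size_t i = 0; i < sizeof(kRules) / sizeof(kRules[0]); ++i) {
        if (spend >= kRules[i].min_spend) best = kRules[i].percent;
    }
    return best;
}

// "Pretend" overloading (assumed interpretation): one template, one name,
// and a switch on the compile-time tag instead of several overloads.
enum class Op { Add, Mul };

template <Op O>
int combine(int a, int b) {
    switch (O) {
        case Op::Add: return a + b;
        case Op::Mul: return a * b;
    }
    return 0;  // not reached for the two Op values above
}

// Tiny self-check so the sketch can be exercised from a test or main().
void self_check() {
    assert(discount_percent(250) == 5);
    assert(combine<Op::Mul>(3, 4) == 12);
}
```

Adding a new rule means adding one `X(...)` row; the enum and the table stay in sync without any extra functions or classes for the compiler to chew through.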
Finally, we pretty much never allow "project" header files to include each other. More importantly, each templated type must be explicitly instantiated once, in a single C++ source file, and declared `extern` everywhere else. This gives you all the benefit of a template (write once, reuse) with none of the holy-crap-we're-parsing-this-again issues.
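
A rough sketch of the instantiate-once pattern, squeezed into one listing so it compiles on its own. The `SmallStack` type and the file names in the comments are hypothetical, and keeping the member definitions out of the header is my assumption about how to get the "don't parse it again" effect; the `extern template` declaration and the explicit instantiation are standard C++11.

```cpp
#include <cassert>
#include <cstddef>

// ---- would live in the header (hypothetical "small_stack.h") ----
template <typename T>
class SmallStack {
public:
    void push(T value);
    T pop();
    std::size_t size() const;
private:
    T items_[16] = {};       // fixed capacity, no bounds checks: sketch only
    std::size_t size_ = 0;
};

// Explicit instantiation declaration: files that include the header
// must not instantiate SmallStack<int> themselves.
extern template class SmallStack<int>;

// ---- would live in exactly one source file (hypothetical "small_stack.cpp") ----
template <typename T>
void SmallStack<T>::push(T value) { items_[size_++] = value; }

template <typename T>
T SmallStack<T>::pop() { return items_[--size_]; }

template <typename T>
std::size_t SmallStack<T>::size() const { return size_; }

// Explicit instantiation definition: the one place the code is generated.
template class SmallStack<int>;

// ---- any other source file just uses the type ----
int sum_of_two(int a, int b) {
    SmallStack<int> s;
    s.push(a);
    s.push(b);
    return s.pop() + s.pop();
}

// Tiny self-check so the sketch can be exercised from a test or main().
void self_check() {
    assert(sum_of_two(3, 4) == 7);
}
```

The trade-off is that only the types you explicitly instantiate are usable: `SmallStack<double>` would fail to link until someone adds a matching pair of lines for it.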