It's a combination of factors. Optimization certainly takes a lot of time, but it's not the only significant factor: the modular, multi-pass architecture also takes a toll. If you look at fast C compilers like TCC, you can see that they get their speed primarily from doing fewer passes over the source code. TCC does only a single pass, which severely limits the kinds of optimizations it can do, but if you compare the compile speed of TCC to that of GCC or Clang with optimizations disabled (-O0), TCC is still significantly faster. So the architecture used to build large, extensible optimizing compilers like GCC also inherently costs them some speed.
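To make the single-pass idea concrete, here is a minimal sketch in C. It is not TCC's actual code: the `OnePass` struct, the `translate` function and the `push`/`mul`/`add` stack-machine text are all made up for illustration. It parses an arithmetic expression and emits code as it goes, without ever building a tree that later passes could walk.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical single-pass translator state: no AST, just a cursor
   into the source and a buffer of already-emitted code. */
typedef struct {
    const char *src;  /* remaining input */
    char out[256];    /* emitted stack-machine text */
    size_t len;       /* bytes used in out */
} OnePass;

/* Append text to the output buffer (silently drops it if full). */
static void emit(OnePass *c, const char *text) {
    size_t n = strlen(text);
    if (c->len + n < sizeof c->out) {
        memcpy(c->out + c->len, text, n + 1);
        c->len += n;
    }
}

/* factor := digits ; code is emitted the moment the number is read. */
static void factor(OnePass *c) {
    char buf[32];
    long v = 0;
    while (isdigit((unsigned char)*c->src))
        v = v * 10 + (*c->src++ - '0');
    snprintf(buf, sizeof buf, "push %ld;", v);
    emit(c, buf);
}

/* term := factor ('*' factor)* */
static void term(OnePass *c) {
    factor(c);
    while (*c->src == '*') { c->src++; factor(c); emit(c, "mul;"); }
}

/* expr := term ('+' term)* */
static void expr(OnePass *c) {
    term(c);
    while (*c->src == '+') { c->src++; term(c); emit(c, "add;"); }
}

/* Translate src in one left-to-right sweep; returns the emitted code. */
const char *translate(OnePass *c, const char *src) {
    c->src = src;
    c->len = 0;
    c->out[0] = '\0';
    expr(c);
    return c->out;
}
```

Calling `translate(&c, "1+2*3")` yields `push 1;push 2;push 3;mul;add;`. The source is read exactly once, which is where the speed comes from, but by the time a later token is seen the earlier code has already been emitted, so anything that needs a wider view has to be bolted on with extra bookkeeping.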
I believe it is theoretically possible to implement many of the optimizations that larger compilers have in fewer passes, but that comes at the cost of added complexity and makes it harder to debug the different combinations of optimizations.
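As a rough illustration of that trade-off, here is a sketch that fuses one optimization (constant folding) into a single-pass translator. Everything in it is an assumption for the example: the `Fused` and `Operand` types, the `translate_fused` function, single-letter variables, and the `load`/`push`/`add`/`mul` stack-machine text. It is not how any particular compiler does it.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

typedef struct { const char *src; char out[256]; size_t len; } Fused;

/* An operand is either a constant that has NOT been emitted yet,
   or a value whose code is already emitted (it sits on the stack). */
typedef struct { int is_const; long value; } Operand;

static void emit_text(Fused *c, const char *text) {
    size_t n = strlen(text);
    if (c->len + n < sizeof c->out) {
        memcpy(c->out + c->len, text, n + 1);
        c->len += n;
    }
}

/* Force a pending constant onto the stack. */
static void materialize(Fused *c, Operand *o) {
    if (o->is_const) {
        char buf[32];
        snprintf(buf, sizeof buf, "push %ld;", o->value);
        emit_text(c, buf);
        o->is_const = 0;
    }
}

/* Fold if both sides are constants (no overflow checks), else emit.
   A pending left constant is pushed AFTER the right operand's code,
   which is only valid because + and * are commutative. */
static Operand combine(Fused *c, Operand l, Operand r, char op) {
    Operand on_stack = {0, 0};
    if (l.is_const && r.is_const) {
        l.value = (op == '+') ? l.value + r.value : l.value * r.value;
        return l;
    }
    materialize(c, &l);
    materialize(c, &r);
    emit_text(c, op == '+' ? "add;" : "mul;");
    return on_stack;
}

/* factor := letter | digits */
static Operand fold_factor(Fused *c) {
    Operand o = {1, 0};
    if (isalpha((unsigned char)*c->src)) {
        char buf[16];
        snprintf(buf, sizeof buf, "load %c;", *c->src++);
        emit_text(c, buf);
        o.is_const = 0;
        return o;
    }
    while (isdigit((unsigned char)*c->src))
        o.value = o.value * 10 + (*c->src++ - '0');
    return o;
}

static Operand fold_term(Fused *c) {
    Operand l = fold_factor(c);
    while (*c->src == '*') { c->src++; l = combine(c, l, fold_factor(c), '*'); }
    return l;
}

static Operand fold_expr(Fused *c) {
    Operand l = fold_term(c);
    while (*c->src == '+') { c->src++; l = combine(c, l, fold_term(c), '+'); }
    return l;
}

const char *translate_fused(Fused *c, const char *src) {
    Operand r;
    c->src = src;
    c->len = 0;
    c->out[0] = '\0';
    r = fold_expr(c);
    materialize(c, &r);
    return c->out;
}
```

With this, `1+2*3` becomes `push 7;` and `2*3+x` becomes `load x;push 6;add;`, but `x+2+3` still comes out as `load x;push 2;add;push 3;add;` because the constants are never adjacent when the parser sees them. Catching that would mean more state threaded through the same parser, and the commutativity trick in `combine` already shows how the optimization logic and the code emission start to depend on each other.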