Very good point. This is even more critical in an FPGA, as you are working with a fairly constrained resource.
There were a number of people in my first hardware design course who had a great idea right up until they found it didn't fit on the Xilinx chips we had in the lab.
Starting from the proposed flat design, adding pipelining would probably cost 10-20% more area and double or triple the speed. Definitely worth exploring (and it would indeed make for a great follow-up post).
Only if it's a 3-stage pipeline without hazard detection. Otherwise the area would at least double. But, yes, I'd also like to see a pipelined core in this new HDL.