And here's the deal: over my years in this industry, I've found that the best performance almost always comes from a simple architecture.
I've also found that performance optimization done prematurely often achieves the exact opposite result. Meanwhile, optimization done out of necessity rarely has the same problem.