What I sadly don't get from the pictures is the hierarchy of your routers and switches, the lack of QSFP+ utilization, and the amount of copper. What's the reason behind the copper SFP transceivers in the ASR-1001 [0]?
[0]https://nickcraver.com/blog/content/SO-Hardware-Network-NewY...
But a better answer is: that price keeps changing. We now have enough AWS and on-prem experience in-house to write a great post. We'll be doing a lot of research and proper comparisons as a huge part of that upcoming post: https://trello.com/c/4e6TOnA7/87-on-prem-vs-aws-azure-etc-wh...
Will we run Stack Overflow on .NET Core? Probably not for a while. I was asked exactly this in a recent On .NET interview at the 24:22 mark; you can listen to the reasoning here: https://youtu.be/DJn8-Psznsw?t=24m22s
In the Stack Exchange cluster, the RAID 0 NVMe array contains all databases except for a large log database (which I called Careers.BigStuff, because it seemed like a good idea at the time). This larger log database is accessed much more rarely and lives on the 10K HDD RAID 10 array.
We run backups on the primary for several reasons, but they are all sent off-box. We have 2 primary on-site backup servers, and from there the backups go offsite and to tape. We take transaction log (T-Log) backups every 15 minutes and full backups nightly. We also run copy-only backups in the DR data center nightly as an additional backup measure.
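To make that schedule a bit more concrete, here's a rough sketch of which backups a point-in-time restore would need under a nightly-full plus 15-minute T-Log scheme. The midnight start time and the `restore_chain` helper are assumptions for illustration only, not our actual tooling, and it ignores the copy-only backups in DR.

```python
from datetime import datetime, timedelta

# From the post: T-Log backups run every 15 minutes.
LOG_INTERVAL = timedelta(minutes=15)
# Assumption (not from the post): the nightly full backup runs at midnight.
FULL_BACKUP_HOUR = 0


def restore_chain(target: datetime) -> tuple[datetime, list[datetime]]:
    """Return the full backup time and the ordered log backup times
    needed to restore to `target` (hypothetical helper)."""
    # Most recent nightly full backup at or before the target.
    full = target.replace(hour=FULL_BACKUP_HOUR, minute=0, second=0, microsecond=0)
    if full > target:
        full -= timedelta(days=1)

    # Every log backup after the full, up to and including the first one
    # that covers the target moment. The last one is where you would stop
    # the restore at the exact target time.
    logs = []
    t = full + LOG_INTERVAL
    while t - LOG_INTERVAL < target:
        logs.append(t)
        t += LOG_INTERVAL
    return full, logs


if __name__ == "__main__":
    full, logs = restore_chain(datetime(2016, 3, 1, 1, 10))
    print("full backup:", full)
    print("log backups:", [ts.strftime("%H:%M") for ts in logs])
```

The takeaway is that, with this cadence, the worst case for a restore is one full backup plus up to a day's worth of 15-minute log backups applied in order.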