I don't run on-prem clusters or clouds, but I know a couple of people who do, and at large enough scale it is a constant "fuck-shit-stack on top of itself" (to quote Reggie Watts). There is almost always something wrong and someone upset about it.
The promise of a fully integrated system (compute HW, network HW, and all firmware/drivers written by experts, in Rust wherever possible) that pays attention to optimizing all your OpEx metrics is a big deal.
It may take Oxide a couple more years to really break into the market in a big way, but if they can stick it out, they will do very well.