This is like saying "Hitachi Storage hard drives broke" when you actually mean "we didn't run RAID".
There is a reason why Amazon and Google take EBS/Persistent Disk failures very seriously: these disks are not supposed to be unavailable for several hours unless the whole datacenter is unable to operate (flood, fire, etc.), which is not the case here.
If your RAID fails, and you have a support contract which guarantees restoration within 1 hour, and it's not restored within 1 hour, then I think you can legitimately say something was wrong at your provider. It's not pointing fingers. Everyone makes mistakes. It's taking responsibility.
That said, I agree they should have run in multiple zones, as recommended by Google, if they need/want to avoid that kind of downtime.
But I maintain Google Compute Engine Persistent Disks are not supposed to fail in such a way, and I'm quite sure Google will do whatever they can to avoid this in the future, instead of saying "don't point finger at us, it's supposed to happen".
[1] https://cloud.google.com/compute/sla
All that said, people choose SSDs because they are faster and have higher throughput, so SSDs not being fast is obviously a real problem for applications relying on this, and rest assured we are indeed doing whatever we can to avoid this in the future.
Disclaimer: I work in Google Cloud Support.
If you don't follow your vendor's recommendations for how to use their product, how can you blame them when that exact recommendation would have saved you?
> Google Compute Engine Persistent Disk are not supposed to fail in such a way, and I'm quite sure Google will do whatever they can to avoid this in the future
Sure. And the power to my office is not supposed to go out (and I've certainly worked in places where there has never been an unplanned power outage in decades), but if my business relies on it I need a UPS.
> instead of saying "don't point finger at us, it's supposed to happen".
It's not, and they shouldn't. Also unless you know something I don't, they didn't.
> If your RAID fails, and you have a support contract which guarantees restoration within 1 hour,
But as another commenter pointed out: Google did not violate the SLA during this, apparently. So…
> It's not, and they shouldn't. Also unless you know something I don't, they didn't.
Sorry, my comment was confusing. Google of course never said or wrote such a thing.
> Google did not violate the SLA during this, apparently.
I agree.