I believe Google's design is one big pool of machines, perhaps spread across a few buildings, with the hope that any failure affects only a few racks.
They arrange or move workloads so that any one customer sees an outage in only one 'zone'.
Clearly that didn't work here,