It's almost tradition to have a rack with UPSes in the bottom and the rest of the space filled with servers or drive arrays.
We wouldn't ever think of putting a tiny backup generator in the bottom of every rack, so why do we put a battery storage system there? Also, with advances in battery chemistry that improve reliability and density, it's only a matter of time until lithium-chemistry batteries are the norm in rack UPSes, and that also increases the risk of fire.
Is there any reason not to move backup power to another room, or even to a separate structure, the way backup generators are put on a pad outside the building?
https://www.cnet.com/news/google-uncloaks-once-secret-server...
See http://apec.dev.itswebs.com/Portals/0/APEC%202017%20Files/Pl... page 6.
Yes: longer cables.
See Figure 1 in the Schneider-APC white paper, where they have "Electrical Space", "Mechanical Space" (HVAC), and IT Space:
* https://download.schneider-electric.com/files?p_File_Name=VA...
Power is generated hundreds of kilometres from where it is used, so having your UPS room a few dozen metres from your actual DC room isn't a big deal. I-squared-R losses aren't going to be that huge.
Europe uses 400Y/230 for nominal low-voltage distribution (see Table 1 in the white paper above), so stringing some extra 400 V copper to the PDUs, which then have 230 V at the plugs, isn't a big deal.
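To put some rough numbers on that (these are figures I'm making up purely for illustration, not anything from the white paper):

```python
# Back-of-the-envelope sketch: resistive loss in a feeder from a remote
# UPS room to the PDUs. All the input numbers below are assumptions.

import math

RHO_CU = 1.68e-8     # resistivity of copper, ohm*m
length_m = 50        # one-way cable run, UPS room -> IT space (assumed)
area_mm2 = 95        # conductor cross-section (assumed)
current_a = 200      # load current per conductor (assumed)

# Resistance of the out-and-back copper path
r_ohm = RHO_CU * (2 * length_m) / (area_mm2 * 1e-6)

loss_w = current_a ** 2 * r_ohm          # I^2 * R
print(f"cable resistance: {r_ohm * 1000:.1f} mOhm")
print(f"resistive loss:   {loss_w:.0f} W per feeder")

# 400Y/230: 400 V line-to-line on a wye gives 400 / sqrt(3) ~= 230 V
# line-to-neutral at the plugs.
print(f"line-to-neutral:  {400 / math.sqrt(3):.0f} V")
```

Under those assumed numbers the loss works out to well under 1% of the power being delivered, which is why a few dozen extra metres between the battery room and the IT space really doesn't matter.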
Not all Li-ion chemistries are equal. In particular, the increasingly popular LiFePO4 (LFP) technology has much higher energy density, longer lifespan, improved environmental characteristics, and similar if not better safety characteristics compared to lead-acid.
(Besides sharing lead-acid's very low risk of fire, LiFePO4 also contains no corrosive acids which could damage and short equipment if a leak were to occur.)
Isn't lead-acid prone to releasing hydrogen gas?
When jump-starting a car, it is commonly recommended to connect the ground (black) cable to the chassis, not to the battery's negative terminal, to avoid a spark igniting the hydrogen and causing an explosion.
I've seen people use UPSes so they can rearrange wiring live. I've also seen that fail, of course, when the UPS they were relying on gave out.
If you wire the entire room with 2-3 separate electrical systems, each powered off a separate remote UPS, you can do whatever you want, but it's harder to change your mind or build out incrementally that way.
Battery rooms were traditionally separate, used lead acid batteries, were surrounded by thicker walls, and equipped with FM200, just like main datacenter floors. They were typically placed near the transfer and PDU switchgear. I wouldn't put anything more flammable than LiFePO4 in a battery room, much less anywhere near a server.
It's people who decide to throw away conventions, common sense, and building codes because "they know better" who get into trouble.
I suspect this datacenter company could be sued into oblivion.
All racks and servers were connected to dual power feeds, so even if one of the feeds goes down, the servers should keep running.
The lack of fire suppression is also very worrying.
How close the 3 data centers are in SBG: https://cdn.baxtel.com/data-center/ovh-strasbourg-campus/pho...
How hot that fire was. I'm pretty sure the orange spots are holes melted in the walls, which are made from metal shipping containers: https://pbs.twimg.com/media/EwGqV17XMAMF_wa?format=jpg&name=...
Second question: is such a system required for this kind of operation? Maybe?
That should be mandatory. Otherwise, it'd be very hard to contain. Especially when all your servers have RAID controllers with Li-Ion batteries or supercapacitors or other extremely trigger-happy components.
Oh, and cooling systems: in the early stages of a fire, they just feed it more air.
No, they are a major risk to the employees working there.
I would much rather have a data center destroyed by fire every twenty years without victims than mandate the use of oxygen-displacing fire suppressants.
They are hosting at least 400,000 servers. They surely have multiple servers catching fire every single day, and yet this is the first time it has ended in a catastrophic fire.
The fire suppression system either catastrophically failed, or something out of design happened with one of their inverters, as suggested in the video.
Maybe one day, if I have a basement and some kind of concrete compartment to put the battery in, I'll feel a bit better about it... but even then, there's not much you can do about gas leakage if that's a possibility with more recent UPSes.
Or get a LiFePO4 battery and put it in a metal tub.
They're building 2500 servers per week.
For the buildings that are offline but not destroyed, they have to rebuild the electrical distribution and networking. It was not clear whether they are also physically moving servers.
Back when I worked in hosting we'd get an email from this particular DC's NOC about either "UPS Maintenance" or generator testing. Our hearts would sink because, during one particular eighteen month period, there was a 50/50 chance our suite would go dark afterwards.
Also, in the first phase of a lithium battery fire, dropping water on it is quite explosive. It will eventually quench the fire, but in the short run it makes things worse, filling the room with explosive hydrogen and caustic lithium hydroxide. So when your water sprinklers engage over your UPS, you'd better be sure there's nobody around: https://www.youtube.com/watch?v=cTJh_bzI0QQ
Most high-end UPSes have a relay input where you can run an active-high or active-low emergency power off (EPO) signal. The EPO can be triggered manually by a button pressed by staff, automatically by the fire suppression system, or both.
Schneider-APC white paper (PDF)
* https://download.schneider-electric.com/files?p_File_Name=AS...
The EPO can also cut off the HVAC so oxygen is no longer fed into the area and smoke isn't (re-)circulated.
In the US, this is probably covered in NFPA 75, "Standard for the Fire Protection of Information Technology Equipment":
* https://www.nfpa.org/codes-and-standards/all-codes-and-stand...
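Re the EPO signal above, here's a toy sketch of what handling the two polarities could look like on the monitoring side. All names are hypothetical; in a real UPS the EPO input drops the output directly in hardware, this is only the alerting/shutdown-hook side:

```python
# Toy sketch of EPO contact polling. The helper names and wiring are
# hypothetical, just to illustrate active-high vs active-low EPO inputs.

import time

ACTIVE_HIGH = True   # set False if the EPO loop is wired active-low

def read_epo_contact() -> bool:
    """Hypothetical helper: returns the raw logic level of the EPO input."""
    raise NotImplementedError("replace with your monitoring hardware's read call")

def epo_asserted(level: bool) -> bool:
    # Active-high: a high level means "kill power".
    # Active-low: EPO is asserted when the signal drops low (button press
    # or fire-panel relay); depending on the wiring this can also make a
    # broken loop fail safe.
    return level if ACTIVE_HIGH else not level

def monitor(poll_s: float = 0.1) -> None:
    while True:
        if epo_asserted(read_epo_contact()):
            # The UPS itself already drops its output; this is just where
            # alerting and orderly-shutdown hooks would go.
            print("EPO asserted: output dropped, HVAC stopped")
            return
        time.sleep(poll_s)
```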
https://lafibre.info/ovh-datacenter/ovh-et-la-protection-inc... (posted in previous threads)
edit: That comment was snide; my heart goes out to the OVH team. The message in the video was good, forthright and honest. I hope it will be well received by their customers - just a shame it's a bit difficult to listen to!
With AWS/Google/Azure, if this happened, there should only be a limited outage affecting a small fraction of customers. As a matter of fact, Google had such an incident before, and literally no customers (internal or external) noticed.
If you were using AWS, Google, or Azure, running a single machine (or several) inside a single AZ with no backups, and had opted out of snapshots, you would face the exact same situation.
I can definitely say I see people complaining about how everything they have is down on AWS when us-east-1 goes down periodically, while large players that deploy sanely, like Netflix, fail over to another region seamlessly.
This [running only a single machine at all] is what most of the customers whinging the loudest were doing. People with actual sane production workloads on AWS or GCP are not going to be running 100% of their workload on a single EC2 instance with no backups.
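For what it's worth, "not piling everything into one AZ" is pretty cheap at the API level. A minimal boto3 sketch (the AMI ID and instance type are placeholders, and this ignores load balancing and data replication entirely):

```python
# Minimal sketch: launch one identical instance in every available AZ of a
# region instead of putting the whole workload on one box in one AZ.

import boto3

REGION = "us-east-1"
AMI_ID = "ami-xxxxxxxxxxxxxxxxx"   # placeholder, use your own image
INSTANCE_TYPE = "t3.small"         # placeholder

ec2 = boto3.client("ec2", region_name=REGION)

zones = [
    z["ZoneName"]
    for z in ec2.describe_availability_zones(
        Filters=[{"Name": "state", "Values": ["available"]}]
    )["AvailabilityZones"]
]

for az in zones:
    ec2.run_instances(
        ImageId=AMI_ID,
        InstanceType=INSTANCE_TYPE,
        MinCount=1,
        MaxCount=1,
        Placement={"AvailabilityZone": az},
    )
    print(f"launched one instance in {az}")
```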
People running on OVH are often running things like game servers that monopolise 100% of a physical machine and don't support horizontal scaling. You quite literally cannot force a srcds/hlds server to "load balance" dynamically and fail over on heartbeat.
Often they are kids or students too, and the $30/m for a machine with 32-64GB of RAM is all they can afford (though that doesn't excuse skipping the extra $1-2/m for offsite backups elsewhere).
You can provision more physical machines with the OVH API and have them up in a different city in a minute or two. You get line-speed bandwidth between OVH DCs. It's up to you to use it.
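Roughly, with the "ovh" Python wrapper it looks something like this. Ordering brand-new servers goes through the more involved order/cart endpoints, so as a simpler illustration this just lists where your existing boxes live and kicks off a reinstall on a spare in another DC. Endpoint paths and the template name are from memory; verify against the API console before relying on them:

```python
# Rough sketch with the "ovh" Python wrapper (pip install ovh).
# Hostname and templateName below are placeholders.

import ovh

client = ovh.Client(
    endpoint="ovh-eu",
    application_key="...",      # your credentials
    application_secret="...",
    consumer_key="...",
)

# List the dedicated servers on the account and which datacenter they're in.
for name in client.get("/dedicated/server"):
    info = client.get(f"/dedicated/server/{name}")
    print(name, info.get("datacenter"))

# Kick off a reinstall on a spare box in another DC so it can take over.
client.post(
    "/dedicated/server/ns1234.ip-1-2-3-4.eu/install/start",
    templateName="debian11_64",
)
```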
And well, I guess this is one of the reasons.
When you buy a storage API, sure, failure rates go up and latency increases 100x, but after a few hours it's probably back to normal.
Of course, with the increased abstraction, you get more problems. "Availability zones" are useless when most cloud outages are caused by configuration or systemic issues that tend to bring the whole thing down no matter which AZ you are in. But apparently it's now considered "good enough" to just go "oh, we are down because AWS is down".
Yet somehow, at smaller providers and dedicated hosters, bandwidth is usually included as a too-cheap-to-meter feature. Gotta love cloud innovation.
OVH has a lot of dedicated servers as well though, so if you're using one of those then it can't be moved very easily to avoid downtime.
Knowing nothing about OVH, I just typed "ovh datacenters" into Google and the first hit was this: https://www.ovh.com/world/us/about-us/datacenters.xml with the first sentence being "27 data centers around the world, including 2 of the largest ones".
The list is broken down a bit more on this page.
The buildings are very close together: https://cdn.baxtel.com/data-center/ovh-strasbourg-campus/pho...
And the fire looked really hot...like melting steel containers hot: https://pbs.twimg.com/media/EwGqV17XMAMF_wa?format=jpg&name=...