On March 10, 2021, the largest hosting/cloud provider in Europe, OVH, suffered an unfortunate disaster when one of its hosting facilities burned down. Thankfully, no one was injured.
OVH staff went into action and sent the instructions below to their clients, advising them to initiate their disaster recovery plans because the whole site had to be isolated:
We are currently facing a major incident in our DataCenter of Strasbourg with a fire declared in the building SBG2.
Firefighters were immediately on the scene but could not control the fire in SBG2.
The whole site has been isolated, which impacts all our services on SBG1, SBG2, SBG3 and SBG4.
If your production is in Strasbourg, we recommend to activate your Disaster Recovery Plan.
On Twitter, an affected OVH customer tried to lift the spirits of others hit by the bad news of the outage.
However, media reports suggest that some OVH customers had no backups of their data, let alone a tested, execution-ready disaster recovery plan.
Lessons to Learn
The cloud is often described as “your data in someone else’s data center,” and data center fires are a risk that exists in both the cloud and in on-premises data centers.
In the cloud, major vendors like AWS and Azure give users tools and APIs to back up and recover their infrastructure to other data centers and even other regions. However, just because those tools exist doesn’t mean that whoever is building on the cloud has put them in place and tested them.
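To make the point about testing concrete, here is a minimal sketch of a restore drill: take a backup, restore it somewhere else, and confirm that the restored files match the originals. It is an illustration only, not any vendor's tooling. The names `checksums`, `restore_drill`, and `backup_site` are hypothetical, and a local directory stands in for a second data center or region.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def restore_drill(source: Path, backup_site: Path) -> bool:
    """Back up `source`, restore it to a scratch directory, and compare.

    `backup_site` is an assumed stand-in for storage in another data
    center or region; here it is just another local directory.
    """
    expected = checksums(source)

    # Step 1: take the backup.
    backup_copy = backup_site / "backup"
    shutil.copytree(source, backup_copy, dirs_exist_ok=True)

    # Step 2: restore from the backup into a fresh location.
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / "restored"
        shutil.copytree(backup_copy, restored)

        # Step 3: the drill passes only if every restored file matches.
        return checksums(restored) == expected


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as primary, \
            tempfile.TemporaryDirectory() as secondary:
        (Path(primary) / "results.dat").write_text("simulation output")
        print("restore verified:", restore_drill(Path(primary), Path(secondary)))
```

A real drill would swap the local copy for the provider's own backup and restore tools and would run on a schedule, but the principle is the same: a backup only counts once a restore from it has been verified.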
TotalCAE plans for, tests, and maintains the ability to fail over our hosted cloud platform to a different data center or an entire region should a similar disaster strike our partner cloud data centers.
For clients whose HPC clusters we host in their corporate data centers, we offer HPC cloud as a low-cost disaster recovery strategy in case a data center becomes unavailable due to fire, flood, earthquake, or any other threat.
So, the lesson here is that while the cloud is a great technology for agility and flexibility, don’t forget: the same rules and lessons of on-premises still apply. Don’t neglect disaster recovery and backups in your HPC cloud plans.