AWS Disaster Rendered Non-Event by RavenDB Distributed Database
A trillion-dollar cloud platform fails, bringing down countless instances of applications and their component parts while not a single RavenDB client experiences a service disruption.
June 10, Frankfurt, Germany. An AWS Availability Zone in Germany went down due to a climate-control malfunction in one of its server farms.
Once the German cloud sector went down, red lights started flashing throughout the RavenDB cloud systems.
The RavenDB response team immediately checked whether the nature of the outage left room for workarounds that would keep the affected nodes running.
The disaster was diagnosed as an overheat in one of the cloud server farms. Nothing could be done except wait for the people at Amazon Web Services to fix it.
Then, the team checked every database cluster that had a node impacted by the outage. Because RavenDB is a distributed database, when one node goes down the other nodes keep functioning. They even pick up the slack, redistributing the downed node's background tasks to other database instances throughout the cluster.
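To make the idea of picking up the slack concrete, here is a minimal Python sketch of how a cluster might hand a downed node's background tasks to the surviving nodes. It is a toy model, not RavenDB's actual implementation or API: the `Node` class, the `redistribute_tasks` function, the least-loaded placement rule, and the task names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical stand-in for a cluster node; not a RavenDB type.
    name: str
    up: bool = True
    tasks: list[str] = field(default_factory=list)


def redistribute_tasks(nodes: list[Node]) -> None:
    """Move background tasks off downed nodes onto the least-loaded live node."""
    live = [n for n in nodes if n.up]
    if not live:
        raise RuntimeError("no live nodes left in the cluster")
    for node in nodes:
        if node.up:
            continue
        while node.tasks:
            task = node.tasks.pop(0)
            # Assumed placement rule: pick the live node with the fewest tasks.
            target = min(live, key=lambda n: len(n.tasks))
            target.tasks.append(task)


# Three-node cluster; task names are illustrative only.
cluster = [
    Node("A", tasks=["backup"]),
    Node("B", tasks=["etl"]),
    Node("C", tasks=["subscription", "cleanup"]),
]

cluster[2].up = False          # node C's availability zone goes dark
redistribute_tasks(cluster)

for n in cluster:
    print(n.name, "up" if n.up else "down", n.tasks)
```

In this sketch, node C's two tasks end up split between A and B, so the work continues while C is offline. A real cluster would also need to agree on who owns each task, which this toy model skips.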
Every cluster with a node in this specific availability zone kept running. None of our clients experienced a disruption in service.
Once the AWS servers were back, the working nodes updated the downed node on everything that went on while it was offline, and the node returned to work.
Clients received notifications of the event, but there was no need for further action.
In the end, everybody went back to bed.
NoSQL Distributed Database High-Availability on Two Levels
The primary advantage of a distributed database is high availability in the face of any type of external outage.
The other main advantage of a distributed system is the ability to read and write on every node. Had the database cluster in question been a relational monolith or a document database with a single-writer node, an outage of the one node that accepts writes could have disabled the entire system.
Applications running on such databases likely received more than a notification, and their developers probably lost some sleep over it.
RavenDB employs a multi-writer architecture where every node in the database cluster can read and write.
Even if an on-premises database node is cut off from the rest of the cluster, the system still works. Take the main database of a hospital: if the internet goes down, the local node keeps operating as if nothing happened. Once the network comes back, the “separated” node updates, and is updated by, the rest of the cluster.
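The hospital scenario can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not RavenDB's replication protocol: the `Node` class, the `sync` function, and the logical timestamps are hypothetical, and the "newest timestamp wins" rule is a stand-in for whatever conflict handling a real multi-writer database applies.

```python
class Node:
    """Hypothetical multi-writer node that accepts reads and writes locally."""

    def __init__(self, name):
        self.name = name
        self.docs = {}  # doc id -> (logical timestamp, value)

    def write(self, doc_id, value, ts):
        self.docs[doc_id] = (ts, value)

    def read(self, doc_id):
        return self.docs[doc_id][1]


def sync(a, b):
    """Two-way exchange: both sides keep the newest version of every document."""
    for doc_id in a.docs.keys() | b.docs.keys():
        newest = max(
            (n.docs[doc_id] for n in (a, b) if doc_id in n.docs),
            key=lambda version: version[0],  # compare by logical timestamp
        )
        a.docs[doc_id] = newest
        b.docs[doc_id] = newest


hospital = Node("on-prem")
cloud = Node("cloud")

# Network is down: each side keeps accepting writes on its own.
hospital.write("patients/1", "admitted", ts=1)
cloud.write("invoices/7", "paid", ts=2)
hospital.write("patients/1", "discharged", ts=3)

# Network is back: the nodes exchange what they missed.
sync(hospital, cloud)

print(cloud.read("patients/1"))     # discharged
print(hospital.read("invoices/7"))  # paid
```

The point of the sketch is that neither side had to stop working during the partition: each accepted writes locally, and a two-way exchange afterward brought both to the same state.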
Even trillion-dollar cloud platforms with 99.999% availability still go down. Using a distributed database with the right features ensures that you don’t go down with them.
See for yourself!