The Requirements of a Distributed Database
To support a distributed application, you need distribution at the database layer. Multiple databases on a specific node must be able to replicate to other nodes throughout your cluster.
You can also have a cluster that replicates itself to other clusters throughout your entire database system. A node in a data cluster in the eastern United States needs to replicate itself to a node in Central Europe or South Asia. This gives you high availability and the ability to minimize latency to every user connecting to you from everywhere in the world.
Gossip Protocol is how your distributed database manages the traffic of several nodes taking in data all at once. When several nodes are taking data all at the same time, the right protocol answers who is the one managing the traffic? What happens when that node goes offline? What happens when it goes offline and suddenly comes back?
All of these are important issues that must be resolved at every moment to keep your cluster running at all times.
A Master-Master setup is vital to a distributed network. A master-slave cluster is too “monolithic”, because in the end all transactions have to go through one master node for final persistence. A master-master setup means you can read and write data locally, minimizing your latency and reaping the true gains a distributed system is designed to offer.
Another must have for a master-master cluster is that your database can run, even when nodes go offline. If you have a critical system like an application used in a hospital, your systems have to keep running even if you lose connectivity. RavenDB’s master-master cluster keeps working even if it is offline. A node will continue to accept data at its local location, updating the rest of the cluster once connectivity is restored.
Assignment failover. When a node goes down, its backup tasks are redistributed to nodes still in operation. Backups are one of the tasks that will keep going continually. Maintain equal distribution of tasks among all nodes keeps performance robust.
Clusterwide ACID is another feature that is part of an effective distributed database system. Not only do you have ACID across multiple documents, you can also extend that data integrity to all the nodes throughout your database cluster.
Conflict resolution. The good news is that data conflicts are literally one in a million transactions. But at current CPU speeds, this one in a million happens 300 times per second. Your distributed database must be able to handle these conflicts automatically so you don’t have to manually settle 18,000 issues every minute.
External Replication. ETL lets you ferry your own data from nodes in New York to nodes in London and Tokyo. You can share data over multiple clusters to reduce latency and boost performance for everyone.