As your database grows, you will inevitably reach a point in time when a dataset you operate on exceeds the capacity of a single machine. Database Sharding is one possible solution in this situation – splitting the database into several pieces and placing each one of those pieces on a separate machine.
There are many subtle but important details you need to consider, even though this sounds like an easy solution to an unpleasant challenge.First, you are introducing change at the lowest level of your application, and now you may need to propagate these changes all the way up to the Data Layer of your application; this often requires a complete redesign of the data access code.
Second, careful planning is needed – not all workloads are created equal, so you must carefully analyze and plan the sizing and positioning of the shards you will create. You do not want to overwhelm some of the servers while having others sitting idle most of the time.
Finally, your new sharded setup must be not only an appropriate setup to support further growth in data volume and traffic – it also must not get harder to operate over time. With so many fine details and every project being unique, it is easy to make mistakes that would turn out to be hugely expensive in the future.
When designing and implementing Sharding in RavenDB v6, we considered all these challenges. As a result, RavenDB can scale infinitely without sacrificing performance. Furthermore, we concentrated on the challenging part – developer experience and operational ease of use.
Transitioning from your existing setup to a sharded one is as seamless as it can be – once you create a sharded database, aside from the fact that your data is split across multiple machines, you will notice no change in behavior. You can move your existing RavenDB-based system to a sharded environment without a single change. We made an extreme effort to keep Sharding implementation self-contained, preventing leakage of any implementation details into your data access layer.
However, if you need more control, RavenDB also supports you by introducing Sharding by Prefix. This feature allows you to group related documents by common prefixes, giving you greater control over data placement and improving performance by minimizing roundtrips between shards.