Questions to answer when sizing a RavenDB node
A common question that is raised by customers is how to determine what kind of hardware you need to run RavenDB on. I’m sorry, but the answer is it’s depend, because there are a lot of variables to juggle, in this post, I”m going to try to give some insights about what sort of things you should consider when sizing your instances.
In general, you have three axis that you can work with. CPU, Memory and I/O. In terms of the best bang for the buck, optimizing I/O is usually the way to go and will return the most dividends. This is because most of the time, RavenDB will be bottlenecked on the I/O. This is especially true when you are running on the cloud, where 500 IOPS is a fairly common default (that is basically zilch to a database).
To give more a concrete answer we’ll need more details. Let’s say that you have an application with a database per customer (common for multi tenant scenarios). The structure of the database is the same, but the databases contain data that is separated from each customer. The database has 20 indexes in total, 15 map / full text search as well as 5 for map-reduce / facets operations. There is also an few ETL tasks and a couple of subscriptions for background work.
Let’s breakdown the load on single server in this mode, shall we?
- 100 databases (meaning 100 tx merger threads for I/O).
- 2,000 indexes – 20 indexes x 100 databases (meaning 2,000 indexing threads).
Across the cluster, we also have:
- 500 ETL tasks – 5 per database x 100
- 200 subscriptions – you get the drill
The latest items are spread fairly among all the nodes that you have, but the first two are present in all nodes in the cluster.
What does this mean? We have 2,100 threads active at any given point in time? Well, that is where things gets a bit complex. We need to know more than just the raw numbers, we need to understand usage.
How many of those databases are active at any given point in time? In a multi tenant system, it is common to have many customers using the system sporadically, which can allow you to pack a lot more instances into the same hardware resources.
Of more interest, however, is usually the rate of writes. Here we need to ask ourselves what is the write write as well. In general, for reads RavenDB will load all the relevant items into memory and serve directly from there. For writes, given it’s durable nature, RavenDB must hit the disk. And the question now becomes how many database are active at the same time?
This is important, because 10 writes per second to a single database are far better than 10 writes / second across 10 databases. This is because RavenDB is able to batch I/O for a single database, but not across databases. Let’s consider the scenario where we have writes that would impact 5 indexes in the database, what is going to happen when we have 10 writes / sec in a single database?
- 1 – 5 writes to the disk for the actual documents writes (depends on a lot of factors, and assuming that we are talking about concurrent requests here).
- 5 – 10 index updates: 1 –2 index updates x 5 relevant indexes (in most cases, we are able to batch indexes even better than documents writes).
Total number of writes to disk: 6 – 15 writes.
However, if we take the same scenario, but now run it across 10 databases, each having a single write? There is no way for us to batch updates, so we’ll have:
- 10 databases x (1 document writes + 5 index updates) = 50 writes to disk.
If the number of relevant indexes is high, or if there are more databases involved, it is easy to hit the limits of I/O, especially on the cloud.
I’m actually painting somewhat bleak picture, in most cases you don’t have to worry too much about those details, RavenDB will take care of that for you. However, when you need to consider the sizing, you want to be aware of the possible load that you’ll have. Ironically enough, if you have enough load, RavenDB is able to really optimize things, it is when you have sporadic operations, spread across many locations that we start putting a lot of load on the underlying system.
So far, I was talking about I/O only, but there are other factors as well. Let’s assume that you are running 100 databases with 20 indexes each on a system with 4 cores. How is RavenDB going to split the load across the system?
The first priority is going to be given to processing requests, and then we’ll start on running indexes. That is actually by design, to ensure that we won’t overwhelmed the underlying system by issuing too much work all at once. That means that we’ll round robin the work across all the indexes that want to run, while keeping enough capacity to process user requests. In this case, more cores will allow us higher degree of parallelism, but if you have an unbalanced system (a lot of CPU but slow I/O), you’re going to see stalls because we’ll wait a lot for I/O.
In short, you need to have a fair idea about how your system is going to be used. If you don’t have at least a good guess on the topic, you are probably better off getting more I/O bandwidth than anything else. RavenDB continuously monitor itself and will alert you if there are resource issues. You are then able to shore up anything that is lacking to get the best system performance.