Production postmortem: The server ate all my memory
A customer reported a scenario where RavenDB was using stupendous amounts of memory: on the order of tens of GB, on a system that didn’t have that much load.
Our first suspicion was that this was an issue with how the metrics were being read, since RavenDB will try to keep as much of the data in memory as it can, which sometimes leads users to worry. I spoke about this at length in the past.
That turned out not to be the explanation here. We were able to drill down into the exact cause of the memory usage, and we found that RavenDB really was using an abnormally high amount of memory. The question was why, exactly.
We looked into the common operations on the server and found a suspicious query. It looked something like this:
from index 'Sales/Actions'
where endsWith(WorkflowStage, '/Final')
The endsWith query was suspicious, so we looked into it further. In general, endsWith requires us to scan all the unique terms for a particular field, but in most cases there aren’t that many unique values for a field. This database was different.
In total, there were about 250 million sales in the database, each one of them with a unique WorkflowStage value.
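What does this mean in terms of RavenDB query execution? Well, the field is indexed, but to answer the query we effectively have to walk every unique term of the field and check whether it ends with the requested suffix.
This is a simplification of the actual code, but it shows what is going on.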
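To make the shape of that work concrete, here is a minimal Python sketch of such a scan. It is not RavenDB code: the `ends_with_scan` helper, the byte-encoded term list, and the sample values are all assumptions made up for illustration. The point is only that every unique term has to be turned into a string and tested, so the cost grows with the number of unique terms rather than with the number of matches.

```python
from typing import Iterable, List


def ends_with_scan(unique_terms: Iterable[bytes], suffix: str) -> List[str]:
    """Return every term that ends with `suffix`.

    Hypothetical helper that only models the work an endsWith query implies.
    """
    matches = []
    for raw in unique_terms:           # one iteration per unique term of the field
        term = raw.decode("utf-8")     # each term is materialized as a string first
        if term.endswith(suffix):      # only then can the suffix be tested
            matches.append(term)
    return matches


# Made-up sample values; the real WorkflowStage values are not reproduced here.
terms = [b"sales/1/Draft", b"sales/1/Final", b"sales/2/Review", b"sales/2/Final"]
final_terms = ends_with_scan(terms, "/Final")
```

With four terms this is trivial. With hundreds of millions of unique terms, the same loop allocates a string for each one of them, whether it matches or not.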
In other words, in order to process this query, we have to scan (and materialize) all 250 million unique terms for this field. Obviously that is going to consume a lot of memory.
But what is the solution to that? Instead of doing an expensive endsWith query, we can move the computation from the query time to the index time.
In other words, instead of indexing the WorkflowStage field as is, we’ll extract the information we want from it. The index would have one of these fields:
IsFinalWorkFlowStage = doc.WorkflowStage.EndsWith("/Final"),
WorkflowStagePostfix = doc.WorkflowStage.Split('/').Last()
The first one will check whether the value is final or not, while the second extracts just the postfix, which will hopefully be one of only a few distinct values. We can then query using equality instead of endsWith, leading to far better performance and greatly reduced memory usage, since we don’t need to materialize any values during the query.
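The difference can be sketched in a few lines of Python. Again, this is an illustration under stated assumptions and not how RavenDB stores its indexes: `build_index`, the dictionary-based index, and the sample documents are hypothetical. The suffix is computed once per document when the index is built, and a query becomes a single lookup by key.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def build_index(docs: Dict[str, dict]) -> Tuple[Dict[bool, Set[str]], Dict[str, Set[str]]]:
    """Compute both derived fields at indexing time (hypothetical model)."""
    is_final = defaultdict(set)    # models IsFinalWorkFlowStage -> document ids
    by_postfix = defaultdict(set)  # models WorkflowStagePostfix -> document ids
    for doc_id, doc in docs.items():
        stage = doc["WorkflowStage"]
        is_final[stage.endswith("/Final")].add(doc_id)
        by_postfix[stage.split("/")[-1]].add(doc_id)
    return dict(is_final), dict(by_postfix)


# Made-up documents for illustration only.
docs = {
    "sales/1": {"WorkflowStage": "sales/1/Draft"},
    "sales/2": {"WorkflowStage": "sales/2/Final"},
    "sales/3": {"WorkflowStage": "sales/3/Final"},
}
is_final, by_postfix = build_index(docs)

# Equality queries: one lookup each, no scan over unique WorkflowStage values.
final_ids = is_final.get(True, set())
draft_ids = by_postfix.get("Draft", set())
```

Note that the number of keys in either index depends on how many distinct postfixes exist, not on how many sales there are. That is what keeps the query cheap even when every WorkflowStage value is unique.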