Not all O(1) operations are considered equal
At some point in any performance optimization sprint, you are going to run into a super annoying problem: the dictionary.
The reasoning is quite simple. One of the most powerful optimization techniques is to use a cache, which is usually implemented as a dictionary. Today’s tale is about a dictionary, but surprisingly enough, not about a cache.
Let’s set up the background: I’m looking at optimizing a big indexing batch deep inside RavenDB, and here is my current focus:
You can see that RecordTermsForEntries takes 4% of the overall indexing time. That is… a lot, as you can imagine.
What is more interesting here is why. The simplified version of the code looks like this:
Basically, we are registering, for each entry, all the terms that belong to it. This is complicated by the fact that we are doing the process in stages:
- Create the entries
- Process the terms for the entries
- Write the terms to persistent storage (giving them the recorded term id)
- Update the entries to record the term ids that they belong to
The part of the code that we are looking at now is the last stage: we have already written the terms to persistent storage, and we need to update the entries so that when we read them later, we’ll be able to find the relevant terms.
At any rate, you can see that this method’s cost is absolutely dominated by the dictionary call. In fact, we are already using an optimized method here to avoid doing a TryGetValue() followed by an Add() when the value is not yet in the dictionary.
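The post doesn’t reproduce the exact code, but the single-lookup pattern it describes — find the slot for a key, adding it if missing, in one hash operation — is what .NET exposes via CollectionsMarshal.GetValueRefOrAddDefault. A minimal sketch under that assumption (names like termContainerId are illustrative, not RavenDB’s actual identifiers):

```csharp
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Illustrative: map each term's container id to its persisted term id
// with a single hash lookup, instead of TryGetValue() then Add().
var termIdByContainer = new Dictionary<long, long>();

long GetOrAddTermId(long termContainerId, long persistedTermId)
{
    // One lookup: returns a ref to the existing slot, or to a freshly
    // added default-valued slot if the key was missing.
    ref long slot = ref CollectionsMarshal.GetValueRefOrAddDefault(
        termIdByContainer, termContainerId, out bool exists);
    if (!exists)
        slot = persistedTermId; // record the id assigned during the write stage
    return slot;
}
```

Even with this optimization, every call still pays for hashing the key and probing the dictionary’s buckets.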
If we look at the metrics, this is actually kind of awesome. We are calling the dictionary almost 400 million times, and it is able to do the work in under 200 nanoseconds per call.
That is pretty awesome, but that still means that we have over 2% of our total indexing time spent doing lookups. Can we do better?
In this case, absolutely. Here is how this works: instead of doing a dictionary lookup, we store the values in a list, and each entry records the index of its item in that list. Here is what this looks like:
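The original snippet isn’t shown here, but the shape of the idea is simple enough to sketch. All names below are made up for illustration:

```csharp
using System.Collections.Generic;

// Instead of Dictionary<long, long>, keep the term ids in a flat list.
// Each entry stores the *index* into this list, assigned when the term
// is recorded, so the later update stage is a plain array access.
var termIds = new List<long>();

// Stage: writing terms to storage. Returns the index the entry keeps.
int RecordTerm(long persistedTermId)
{
    termIds.Add(persistedTermId);
    return termIds.Count - 1;
}

// Stage: updating entries. No hashing, no bucket probing - just indexing.
long ResolveTermId(int index) => termIds[index];
```

The lookup collapses from a hash computation plus bucket probe into a single bounds-checked array access.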
There isn’t much to this process, I admit. I was lucky that in this case we were able to reorder things in such a way that skipping the dictionary lookup is a viable approach.
In other cases, we would need to record the index at the creation of the entry (effectively reserving the position) and then use that later.
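That reserve-then-fill variant can be sketched as well; again, the names are hypothetical:

```csharp
using System.Collections.Generic;

var termIds = new List<long>();

// At entry creation: reserve a slot before the real term id is known.
int ReserveTermSlot()
{
    termIds.Add(-1); // placeholder until the term is written to storage
    return termIds.Count - 1;
}

// After the terms are persisted: fill in the reserved position.
void SetTermId(int reservedIndex, long persistedTermId)
    => termIds[reservedIndex] = persistedTermId;
```

The entry holds the reserved index from the start, so no lookup is ever needed to connect an entry to its term id.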
And the result is…
That is pretty good, even if I say so myself. The cost went down from 3.6 microseconds per call to 1.3 microseconds, almost a threefold improvement.