Challenge: The code review bug that gives me nightmares–the issue
In my previous post, I discussed a bug that brought up in code review, that bug made me go into near panic mode. Here is the issue:
In order to understand this bug, you have to take into account multiple concurrent threads at the same time. Look at the ComputeHashAndPutInCache() method, where we register an eviction callback for the item in the cache. When we evict the item, we return the buffer to the buffer pool.
We want to avoid allocating memory, so that is certainly something desirable, no? However, consider what happens if I have a thread in ComputeHash(), getting a value from the cache. Before I have a chance to look at the value, however, the cache will decide to evict it. At which point the eviction callback will run.
We’ll return the buffer back to the buffer pool, where it may be used again by something else. I am also using this buffer to do other things from the caller of the ComputeHash() call. This is a somewhat convoluted use after free issue, basically.
And I find is super scary bug because of its affects:
- Randomly, and rarely, some buffer will contain the wrong data, leading to wrong results (but hard to track it down).
- Trying to find such a bug after the fact, however, is nearly impossible.
- Most of the debugging techniques (repeating the operation for a particular value) will make it go away (the cache will keep the value and not evict it).
In short, this is a recipe for a really nasty debug session and an impossible to resolve bug. All from code that looks very much like an innocent bystander.
Now, I can obviously fix it by not using the array pool, but that may cause me to allocate more memory than I should. How would you approach fixing this issue?