Extendible hash table–Deletion II

The naïve overflow handling I wrote previously kept me up at night. I really don’t like it. I finally figured out what I could do to handle this in an elegant fashion.

The idea is to:

Find the furthest non overflow piece from the current one.
Read its keys and try to assign them to its natural location.
If successfully moved all non native keys, mark the previous piece as non overlapping.
Go back to the previous piece and do it all over again.

Maybe it will be better to look at it in code?

There is quite a lot that is going on here, to be frank. We call this method after we deleted a value and go a piece to be completely empty. At this point, we scan the next pieces to see how far we have to go to find the overflow chain. We then proceed from the end of the chain backward. We try to move all the keys in the piece that aren’t native to the piece to their proper place. If we are successful, we mark the previous piece as non overflowing, and then go back one piece and continue working.

I intentionally scan more pieces than the usual 16 limit we use for put, because I want to reduce overflows as much as possible (to improve lookup times). To reduce the search costs, we only search within the current chain, and I know that the worst case scenario for that is 29 in truly random cases.

This should do amortize the cost of fixing the overflows on deletes to a high degree, I hope.

Next, we need to figure out what to do about compaction. Given that we are already doing some deletion book keeping when we clear a piece, I’m going to also do compaction only when a piece is emptied. For that matter, I think it make sense to only do a page level compaction attempt when the piece we just cleared is still empty after an overflow merge attempt. Here is the logic:

Page compaction is done by finding a page’s sibling and seeing if we can merge them together. A sibling page is the page that share the same key prefix with the current page except a single bit. We need to check that we can actually do the compaction, which means that there is enough leaf pages, that the sizes of the two pages are small enough, etc. There are a lot of scenarios we are handling in this code. We verify that even if we have enough space theoretically, the keys distribution may cause us to avoid doing this merge.

Finally, we need to handle the most complex parts. We re-assign the buckets in the hash, then we see if we can reduce the number of buckets and eventually the amount of memory that the directory takes. The code isn’t trivial, but it isn’t really complex, just doing a lot of things:

With this, I think that I tackled the most complex pieces of this data structure. I wrote the code in C because it is fun to get out and do things in another environment. I’m pretty sure that there are bugs galore in the implementation, but that is a good enough proof of concept to do everything that I wanted it to do.

However, writing this in C, there is one thing that I didn’t handle, actually destroying the hash table. As it turns out, this is actually tricky, I’ll handle that in my next post.

RavenDB

RavenDB Cloud

Try

Experience interactive demos and playground server

RavenDB Docs

RavenDB Cloud Docs

Documentation Guide

Download

Features

Performance

Comparison

What’s New

Demo

Bootcamp

Webinars

Workshops

Inside RavenDB Book

GitHub

StackOverflow

Articles

Whitepapers

Events

Promotional Materials

Unlock your business potential

Use Cases

Articles

Whitepapers

Press Releases

Industry Reports

Performance

Comparison

Proof of Concept Program

Academic Program

Events

What’s New

Roadmap

On-premise Pricing

Cloud Pricing

Support

Proof of Concept Program

Academic Program

Extendible hash table–Deletion II

Woah, already finished? 🤯

Related Articles

CollabTalk Podcast | Episode 123 with Oren Eini–Building a business with Open Source foundations

RavenDB’s storage engine: Voron–unlocking the secret

Certificates from the Ground Up

Watch Live Demo

RavenDB

RavenDB Cloud

Try

RavenDB Docs

RavenDB Cloud Docs

Documentation Guide

Download

Features

Performance

Comparison

What’s New

Demo

Bootcamp

Webinars

Workshops

Inside RavenDB Book

GitHub

StackOverflow

Articles

Whitepapers

Events

Promotional Materials

Use Cases

Articles

Whitepapers

Press Releases

Industry Reports

Performance

Comparison

Proof of Concept Program

Academic Program

Events