Disk space management inside Voron

ayende Blog

Voron is RavenDB’s storage engine. It is how we store data, keep transactions and in generally get a lot of our abilities. In this post, I want to talk about the way RavenDB manages disk space internally.  Voron uses a single data file to do its work, the data file is divided into 8KB pages, like so:

Voron uses eager disk allocations to reserve disk space from the operating system. Each time the space inside the file runs out, Voron will double the size of the file. That last until the file size reaches 2GB, after which RavenDB will grow by 1GB at a time. This behavior ensures that Voron gives the underlying file system enough information to provide the database with a continuous range of disk space. In other words, we grab disk space in large chunks to avoid fragmentation of the data file.  

What happens when you delete data, however? Voron mark the free space in its free list and will use that space before it will allocate more disk space from the operating system.

Why aren’t we releasing the disk space back to the operating system? The simplest reason is that it isn’t an just the data at the end of the file that is freed. In fact, like in the image above, free and busy segments are interwoven in the file. We can’t just truncate the file.

Internal references inside of RavenDB make use of the position data inside the file, so just moving the data won’t help. Instead, you have to compact the data. That forces us to re-write the entire database layout from scratch and fixes those references.  That is an offline operation, however.

For the most part, it doesn’t actually matter. RavenDB will use the internal free space as needed, so it isn’t like it is actually lost.

One feature that we are considering for version 6.0 of RavenDB is hole punching. That means that we’ll make use of advanced file system API to free the disk space allocated to RavenDB even mid file.

On Linux, that means using FALLOC_FL_PUNCH_HOLE. On Windows, that means using FSCTL_SET_ZERO_DATA.

That will have the advantage of freeing disk space back to the operating system without needing user intervention. In particular, that is going to make it so a user that delete data to free disk space see the free space reflected in the OS metrics.

There are problems with this approach, however. First, the size of the file remains the same, which leads to interesting questions. Consider:

image

Second, this defeats the purpose of wanting to optimize disk allocations. If we free disk space in this manner, when we get it back, it may no longer be continuous on the disk. That said, it is not that big a problem in the days of SSDs and NVMes as it was at the time of the rotational hard disk.

Then, you may get into a very bad situation in which Voron tries to use disk space that it had allocated mid file (but was already freed) but it can’t, because the disk is full. Right now, this is simply an impossible error, with hole punching, we need to consider how to deal with this.

NoSQL Database Demo

Watch
Live Demo

A customized
presentation of RavenDB