Implementing a file pager in Zig: Using mmap December 24, 2021 Author: Oren Eini, CEO RavenDB
Read the orginal blog post

Implementing a file pager in Zig: Using mmap

by Oren Eini, CEO RavenDB December 24, 2021

Now that we know what we want to implement, let’s dig a bit deeper and see how to do it. An interesting way to implement a file pager is to… not do that. Instead, we can rely on the OS’ memory mapping to do most of the heavy lifting. Let’s see how we can do that.

The first thing that we need to manage is the setup and teardown of the pager, which you can see here:

There isn’t much here, we simply call mmap() and that is about… it. Let’s see how we can implement the actual pager behavior. We’ll start with the easy pieces here getting and releasing the memory from the pager:

You’ll notice that we don’t actually have anything here? Even the act of checking that the page is within the bound of the mapped memory is done by slicing the ptr directly. What about the blocking part? How do we actually move the data to memory? The answer is that we aren’t. When you access the pointer we return from the get(), we’ll just get a page fault and the OS will read the data from the disk.  The release() function also doesn’t need to do much, all the behavior is inside the mmap() implementation, after all.

A bit more complex is the part where we try to get the pages from the disk, here is the tryGet() implementation:

That is quite a bit of code for not much in practice. We create a temporary array and then call mincore() on the range of memory that we’ll return. If the entire range is not already in memory, we’ll call madvice() to load it in the background and return null. If the range is already in memory, just return it.

This isn’t 100% safe to do, by the way, there may be race conditions that would cause us to think that the data is in memory just as it is swapped to disk, but that is good enough for our needs. Especially because the whole thing is quite simple overall.

The next stage is to handle writes and syncing to disk. This is simplicity itself, in this model.

Since we handed out a buffer from the memory map itself, we don’t need to do any copying, we already modified that range of memory. And when we sync to the file, we can do that by a single msync() call. There are a few things to note here, though:

  • Because we are writing directly to the memory mapped file, it is possible that our changes will show up in the file before write and sync are called.
  • The msync() will sync the entire range, if we have smaller changes that we made, we can try to reduce the amount of memory that is synced by remembering what parts we have written to, but it ends up being quite a chore. And since the OS is already doing that for us, we can shell that to it directly.

And that is pretty much it. The whole pager is under 100 lines of code.

There are some things that I don’t handle, such as what happens if we want to extend the size of the file. That requires us to re-wire the mapping, if we are going by the strict reading of the API. But in both Linux & Windows, you can define a memory mapping that is greater than the file and that will automatically adjust as you grow the file. That is quite a nice feature for us and can save us a lot of management overhead internally.

With that out of the way, we can start implementing higher level functions in a storage system. But notice how we moved pretty much everything to the OS? What would it look like if we wanted to build that ourselves?

Woah, already finished? 🤯

If you found the article interesting, don’t miss a chance to try our database solution – totally for free!

Try now

Share your thoughts or ask us a question

Categories: