Databases will typically divide their data files into pages, and manage all data in pages. As you can imagine, the notion of pages is pretty important to a database. Up to this point, we worked with pages, but not really. A page was just a buffer that we could read and write to. It took me 13 posts to manage to read and write pages from disk, but that is just the beginning. We need to add some structure for the pages. The usual way to do that is to have a page header, which contains some metadata about the page. That leads to an interesting question, how do we manage the page metadata?
One way to do that is to reserve a part of the page for this purpose, like so:
This is probably the simplest option (and what we use in Voron), but I don’t like it very much. It makes certain operations harder to deal with, because the usable size of the page is no longer a power of 2. Assuming we have 8KB pages and a page header of 64 bytes, that means that we can use just 8,128 bytes in the page. A lot of data structures are far easier to use if we can assume that the size is a power of 2.
Another thing to remember is that the underlying pager that we use operates on the level of 2 MB chunks. In other words, reading a value from the same 2MB range is as cheap as reading a value from the same 8KB page. For that reason, we can utilize the following approach, we’ll reserve the first section of the chunk for all the page headers. Here is what this will look like:
A chunk is 2MB in size, and pages are 8KB. That gives us 256 pages in a chunk. If we reserve the first page for page headers, that gives us 32 bytes for each page header. That is quite a lot, to be honest, which is great, because having more space on the header gives us a lot of options down the road.
An important consideration to pay attention to is that we read from the disk at 2MB chunks, but when we write, the writes happen at the page boundary. A change on Page #3, for example, means that we have to change both the header page and then page #3 as separate operations. If we were using an embedded page header, we’ll have just a single write to deal with. In practice, I don’t think this is a major issue. We are doing asynchronous writes and write coalescing anyway. For that matter, writes are actually done lazily (and the critical path goes to the Write Ahead Log anyway) so I don’t think this aspect matters so much.
At any rate, let’s get some idea about what the page header will look like:
I should note that this is merely a skeleton of the page header, just showing its structure. Over time, we’ll add a lot more details, but this is sufficient to understand the direction I’m going. A page header is denoted by a tag, which determines what type of page this is. Based on the type of the page, we can set various fields inside the header. You can also see the getPageCount(), which is a good demonstration of how we work with the page header. I need to have fine-grained control over the bytes in the header, so I’m setting them up in this manner.
Zig currently has a number of bugs related to packed / extern structs and unions, which make this a bit harder than expected, sadly. I’m waiting for the self hosted compiler, which will hopefully fix those issues.
With the PageHeader struct in place, we can start making use of this, like so:
There isn’t much going on here, which is great, but it does deserve some explanation. We first get the first page in the chunk for the page that was requested. For example, if we wanted to get page # 4, we get page #0, if we wanted to get page # 300, we’ll get page #256. That page is the metadata page for the chunk, the only thing that it contains is the headers for all the pages in the chunk in question (including itself, of course).
The reason that this matters is that the header for the page is used to compute the number of pages that we need to return. As you’ll note, the API we have here does not provide a way for the caller to say how many pages should be returned. We need to figure it out on our own. To do that, however, we need to store that information somewhere. That is where the page header comes into place.
We get the page header of the page we care about from the metadata page of the chunk and compute the number of pages to get from the pager. Then we return to the caller the page header and the page buffer as a single unit.
Of course, this presupposes that the values got to the pages somehow. That leads us to other directions, however, dealing with page allocations. You can already see some hints of that in the code above, where we use freeSpaceBitmap. I’ll dedicate my next post to this topic.