Inside RavenDB 4.0

Encrypting your data

There are three states of data: data in transit (when it flows through the network), data in use (when it's actively being read and modified) and data at rest (when it's in stable storage). In the previous chapter, we discussed securing your data in transit using TLS 1.2 and strong encryption. In this chapter, we will focus on securing your data at rest.

Some data is intrinsically public, such as a press release. In contrast, some data is very private, such as healthcare, financial and personally identifiable information. Part of any security consideration is the notion of "defense in depth": Even if your servers are protected (physically and virtually), you must still consider the case that someone will be able to get their hands on your data.

RavenDB supports strong encryption (XChaCha20Poly1305 with a 256 bit key) to allow for full and transparent protection of your entire database. This feature ensures that nothing unencrypted is ever written to the disk, and that even in memory, outside of a running transaction, everything is encrypted.

In many industries, data encryption is a regulatory requirement. (PCI and HIPAA come to mind.) Even if an application doesn't require it, encryption is a fairly routine request that can provide you with the benefit of additional safety. Encrypting data at rest doesn't replace other security measures (such as limiting access to your database, encrypting the communication lines, protecting your access credentials, etc.) but it does complement them.

The major advantage of having encryption at the database level is that you don't need to change anything in your applications or clients. RavenDB will take care of this encryption behind the scenes — with no external changes for you to deal with. A user with access to this encrypted database can simply log in, query documents, modify them, etc., while RavenDB handles encrypting and decrypting the data as needed. RavenDB requires that any access to an encrypted database will use HTTPS, which takes care of the data in transit portion as well.

What doesn't database encryption protect you from?

Encrypting the database means that if you open your database file, the data inside it will appear indistinguishable from random noise unless you have the key. This means that if the hard disk is lost or stolen, you can be confident that the data on it will be inaccessible to others. But that's just one threat vector.

Encrypting the database will not protect you from breaches through authorized credentials; if a user has permissions to your database, RavenDB will decrypt the information from the disk and hand it over to this authorized user.

Such encryption will also place only a few hurdles in the path of someone who can execute code on the database machine (as the database user or as root) because he or she will be able to connect to RavenDB using the rvn admin-channel and register a certificate, then just access the data normally.

RavenDB goes to great lengths to ensure that on disk, and even in memory, your data is encrypted. In fact, the data is only decrypted when there is an active transaction — and even then, only the pieces that are touched by that transaction are left unprotected. Once the transaction is complete, RavenDB will zero the memory to erase the sensitive data.

Database encryption should be deployed as part of a comprehensive security strategy, including controlled access to the machines, a secure backup strategy (addressing security concerns with your high-availability and off-site deployment), management of your keys, and the appropriate audit and monitoring tools. This is a much wider topic than can be covered in this book, so I'll focus here only on the details of data encryption in RavenDB.

Before we get into the details, I want to be sure to mention that encryption has a cost. In terms of database performance, this cost is usually around 15% to 20%, depending on the exact load. In most cases, that's a reasonable price to pay for the additional security afforded. But there are also additional costs in managing encrypted databases: key management and backup, secure backups, being able to get the encryption key when you need to restore the database, etc.

Together, these can add up to a significant operational overhead (as much as RavenDB strives to reduce it). I suggest confirming that you actually need the benefits of database encryption before using this feature in RavenDB, rather than just saying "encryption is good" and pressing forward needlessly.

To get started, we'll review how to define an encrypted database through the Studio. Then, we'll dive into what goes on behind the scenes and how RavenDB actively protects your data.

Setting up an encrypted database

The first step of creating an encrypted database is to verify that your cluster is running in a secured mode. There's no point in securing the data on disk if anyone on the network can see the data going in and out. In the previous chapter we reviewed the steps required for such a setup, so we can skip the details here.

A database in RavenDB can be marked as encrypted when it's first created. You can see how this looks in the Studio in Figure 14.1.

Figure 14.1 Creating an encrypted database


An encrypted database uses a 256 bit key to protect all its data. Without this key, the data only appears as random noise. (So, as you can imagine, your key is important. More on that later.) RavenDB's design also offers no way for a person to obtain the encryption key of an existing database. You'd have to know the key in advance. The properties of the key are also important. The key is a 256 bit value, generated using a cryptographically strong random number generator. While you can provide your own key, in general, there is little reason to bother.
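If you do decide to supply your own key, the one property that matters is that all 256 bits come from a cryptographically strong generator. The following is a minimal Python sketch of that idea. The function name is hypothetical, and the Base64 text form is only an assumption made here for easy copying; use whatever format your RavenDB version asks for.

```python
import base64
import secrets

KEY_SIZE_BYTES = 32  # 256 bits


def generate_database_key() -> bytes:
    """Return 32 bytes taken from the operating system's CSPRNG."""
    return secrets.token_bytes(KEY_SIZE_BYTES)


key = generate_database_key()
# Base64 is just a convenient text form for storing or printing the key.
encoded_key = base64.b64encode(key).decode("ascii")
print(encoded_key)
```

Avoid general-purpose generators such as Python's random module for this; they are predictable and unsuitable for key material.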

Regardless, you should be sure to keep a copy of the encryption key somewhere safe, so that you can access it and the database later on if needed. RavenDB makes such a decision explicit, as shown in Figure 14.2.

Figure 14.2 The encryption configuration includes just the key and requires that you store a copy of it.


In RavenDB, you must confirm that you have a copy of the key saved somewhere. For convenience, RavenDB even lets you print the key and its QR code. The idea is that you can print this page and then file it away in a locked cabinet somewhere — as even the most sophisticated computer attack will have a hard time reaching information stored on paper! Plus, a hard-copy backup is beyond easy as a protection step to implement.

Of course, many organizations already have policies for encryption key usage, storage and backup. Sometimes it's with hardware security modules. Sometimes it's using "vault" services such as Microsoft Azure Key Vault, Keywhiz or HashiCorp Vault. Whether part of a formal policy or not, you as the admin should ensure that there's a copy of the encryption key.

What can an admin do with the encryption key?

Given the emphasis I have just placed on the admin holding a copy of the encryption key, you might think you'll be using it often. But that couldn't be further from the truth. During normal operations, there are no cases where the key is needed. Instead, the key is needed when you're restoring the database from a snapshot backup (for example, if the machine is lost and you need to move the data elsewhere) or if you're adding a node to the encrypted database group and want all the nodes to use the same key (which is suggested, but not required).

Another requirement for encrypted databases is that the admin must select which nodes to include in the encrypted database group. Usually, RavenDB can select these nodes based on its own preferences. But for encrypted databases, an admin must specify them directly. This distinction is designed to handle the cases where servers might have different security zones in the same cluster.

Storing the encryption keys

Only the nodes participating in the database group will have the encryption key for that database. When you first create the encrypted database, RavenDB will contact each node hosting the database and tell it which encryption key to use (over an encrypted HTTPS request, of course).

This is why admins must manually select the nodes that will participate. These nodes must be up and available when the database is created so that they can accept the new key.

Once the database is created, what I've described here is pretty much it — at least as far as deviations from the standard usage and configuration of databases.

Full database encryption

Caesar, about 21 centuries ago, used a cipher to send messages securely, by shifting the letters of his messages by three characters. When most of the population was illiterate, this approach was probably sufficient by way of security. Today, the science of cryptography and cryptanalysis is a bit more sophisticated.

The goal of encryption is to take your input and a key, and then generate a random-looking pattern of bytes from it, with no way of going back to the original input without the key. But that's only part of what a good encryption scheme must deal with today. Modern cryptography should handle other things, such as timing attacks (and other side channels), forward secrecy, authenticated encryption and many other details that are crucial for the security of the system but tend to be rather abstruse, onerous and obscure to non-experts.

RavenDB uses Daniel J. Bernstein's XChaCha20Poly1305 algorithm to encrypt your databases, as implemented by the libsodium library. Both the algorithm and the library have been analyzed and audited by cryptographic experts. Both passed with flying colors.

I'm not going to discuss the actual encryption algorithm here, which would be quite out of scope for this book. You can find the gory details elsewhere. But I am going to focus on the way RavenDB uses encryption to protect your data. You can safely skip this section if you'd like, as it has very little impact on using and operating RavenDB.

Internally, RavenDB holds your data inside a data file (usually called Raven.voron) that's memory-mapped to the RavenDB process. We also use temporary files[1], which are typically found in the Temp directory and have names such as: Temp/scratch.0000000000.buffers and Temp/compression.0000000000.buffers. There are also the write-ahead journals, which are the key to RavenDB's transactional nature and ACID capabilities. These are stored in the Journals directory with names such as Journals/0000000000000000001.journal and Journals/0000000000000000002.journal.

All these files contain some portion of your document data. As such, they need to be encrypted. Let's see how we deal with encrypting each of them in turn in the next section.

Encrypting the write-ahead journal

The write-ahead journal is a set of files (Journals/0000000000000000001.journal, Journals/0000000000000000002.journal, etc.) that RavenDB uses to maintain its ACID guarantees. Each journal file is allocated in advance (typically 256 MB at a time), and a new transaction is written to the file whenever it's committed. Indeed, a transaction cannot be considered committed unless it's been successfully written to the journal file.[2]

A journal file is a set of consecutive transactions. When RavenDB opens a database, it will read the journal file, find all the transactions that haven't yet been synced to disk and apply them to the data file. In this way, we can be certain, even after a crash, that no data has been lost. Without encryption, a transaction is protected using a non-cryptographic hash (XXHash64) to ensure that the full transaction has been written to disk. This lets us verify whether a transaction was committed or not.
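To make the recovery idea concrete, here is a minimal Python sketch of a journal whose entries carry a checksum. It is not RavenDB's on-disk format: the header layout and function names are invented for illustration, and zlib.crc32 stands in for XXHash64, which is not part of the Python standard library.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # payload length, CRC32 of the payload


def append_transaction(journal: bytearray, payload: bytes) -> None:
    """Append one committed transaction: a small header, then the payload."""
    journal += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(journal: bytes) -> list:
    """Return every fully written transaction, stopping at the first bad one."""
    transactions, offset = [], 0
    while offset + HEADER.size <= len(journal):
        length, checksum = HEADER.unpack_from(journal, offset)
        if length == 0:
            break  # unused (pre-allocated) space
        start = offset + HEADER.size
        payload = journal[start:start + length]
        if len(payload) < length or zlib.crc32(payload) != checksum:
            break  # torn or corrupted write: this transaction never committed
        transactions.append(bytes(payload))
        offset = start + length
    return transactions


journal = bytearray()
append_transaction(journal, b"put users/1")
append_transaction(journal, b"put users/2")
torn = bytes(journal[:-3])  # simulate a crash in the middle of the last write
print(recover(torn))
```

Only the first transaction survives recovery from the torn journal, because the second one fails its checksum and is treated as never having been committed.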

Authenticated encryption

If I took the following text {"User": "Oren", "Admin": "N" } and "encrypted" it using the Caesar cipher, I would get the following output text: {"Xvhu": "Ruhq", "Dgplq": "Q" }. Figure 14.3 shows the encryption key for this cipher.

Figure 14.3 The encryption key for the Caesar cipher


I'm using the Caesar cipher here because it makes it easier to talk about encryption, while staying simple enough that we don't need to delve into complex mathematics to discuss the details.

A common use pattern for encrypted data is to hand it to an untrusted party, then accept it back from that party later on. A good example of this is the cookies in your browser. The text above would be used as the session cookie to remember the user among different HTTP requests — but obviously with a better encryption algorithm.

Now, imagine that somewhere in your code you have a line such as isAdmin = GetSessionCookieData().Admin != 'N'. We give the cookie to the browser, and the user is free to modify it. What happens if we change the encrypted text to {"Xvhu": "Ruhq", "Dgplq": "R"}? The only change we made was to flip the "Q" at the end to "R". When decrypted, the output will be {"User": "Oren", "Admin": "O"}, and suddenly the user is considered an admin.

In other words, just because the encrypted text was decrypted successfully doesn't mean that its original value remains — this text might have been tampered with. There have been real attacks using this angle.
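The tampering scenario is easy to reproduce. The short Python sketch below implements the three-letter Caesar shift used in this section (the helper name is my own) and shows that an attacker can change the decrypted value without ever learning the key.

```python
def caesar(text: str, shift: int) -> str:
    """Shift ASCII letters by `shift` positions, leaving everything else alone."""
    out = []
    for ch in text:
        if ch.isascii() and ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + shift) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)


cookie = '{"User": "Oren", "Admin": "N" }'
encrypted = caesar(cookie, 3)
print(encrypted)  # {"Xvhu": "Ruhq", "Dgplq": "Q" }

# The attacker flips one ciphertext letter without knowing the key.
tampered = encrypted.replace('"Q"', '"R"')
print(caesar(tampered, -3))  # {"User": "Oren", "Admin": "O" }
```

Decryption succeeds in both cases, which is exactly the problem: nothing in the scheme tells the server that the second cookie is not the one it issued.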

Because of this risk, modern encryption schemes use a mode called Authenticated Encryption with Associated Data, which is usually shortened to AEAD. In this mode, the algorithm not only encrypts the data but also computes a keyed cryptographic tag (a message authentication code) over it, and potentially over additional unencrypted data as well. This tag acts as a signature on the message.

Similarly, during decryption, the signature is checked first and the decryption fails if the signature doesn't match. Any tampering with the data will be caught this way. RavenDB uses only AEAD encryption to protect your data, so any attempt to modify it will be immediately detected. Such modification can be done maliciously or as a result of a hardware failure (bit flipping in storage, for example).
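The check-before-decrypt behavior can be sketched with nothing but the Python standard library. To be clear, this is a toy encrypt-then-MAC construction (a SHA-256 based keystream plus HMAC-SHA256), not XChaCha20Poly1305 and not suitable for real use; the function names are hypothetical. It only illustrates how an authentication tag turns silent tampering into a hard failure.

```python
import hashlib
import hmac
import secrets


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 over key, nonce and a block counter."""
    blocks = [
        hashlib.sha256(key + nonce + counter.to_bytes(8, "little")).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def _tag(mac_key: bytes, nonce: bytes, associated: bytes, ciphertext: bytes) -> bytes:
    """HMAC over nonce, length-prefixed associated data and ciphertext."""
    framed = nonce + len(associated).to_bytes(8, "little") + associated + ciphertext
    return hmac.new(mac_key, framed, hashlib.sha256).digest()


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes, associated: bytes = b"") -> bytes:
    """Encrypt first, then authenticate the result."""
    nonce = secrets.token_bytes(24)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return nonce + _tag(mac_key, nonce, associated, ciphertext) + ciphertext


def open_sealed(enc_key: bytes, mac_key: bytes, sealed: bytes, associated: bytes = b"") -> bytes:
    """Verify the tag first; decrypt only when it matches."""
    nonce, tag, ciphertext = sealed[:24], sealed[24:56], sealed[56:]
    if not hmac.compare_digest(tag, _tag(mac_key, nonce, associated, ciphertext)):
        raise ValueError("authentication failed: data was modified")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
message = b'{"User": "Oren", "Admin": "N"}'
sealed = seal(enc_key, mac_key, message)
tampered = sealed[:-1] + bytes([sealed[-1] ^ 0x01])  # flip a single bit

print(open_sealed(enc_key, mac_key, sealed))
try:
    open_sealed(enc_key, mac_key, tampered)
except ValueError as error:
    print("rejected:", error)
```

The same single-character change that fooled the Caesar cipher is rejected here before any plaintext is produced.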

During encryption, we store only the transaction header unencrypted in the journal file. The transaction data itself is encrypted using XChaCha20Poly1305. This is an authenticated encryption algorithm, giving us cryptographic assurance that if the decryption was successful, the data we got matches the data we encrypted. Since we're already verifying the integrity of the data, we don't bother to also use XXHash64 on the transaction when using encryption. Each transaction is encrypted with a different key, derived from the master key, the transaction ID and a 192-bit random nonce.

Encrypting the main data file

The main data file (Raven.voron) contains all the data in your database. Unlike the journals, which are written to consecutively, one transaction at a time, the data file is written to and read from using random I/O. To handle this mode of operation, the data file is split into 8 KB pages[3] that can be treated as independent of one another. If a value is over 8 KB in size, it will use as many consecutive pages as needed, and those pages will be treated as a single page, as only the first of them will have a page header.

Each page is encrypted independently. This means that when we need to read a page, we can go directly to that page, decrypt it and read its content without having to touch anything else in the database. This process grants us the ability to do random reads and writes through the database. The structure of a page is shown in Figure 14.4.
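Because pages are fixed in size and independent, locating one is simple arithmetic. The Python sketch below assumes that page N starts at byte offset N times the page size, and it leaves the decryption step out entirely; the function names and the in-memory stand-in for the data file are illustrative only.

```python
import io

PAGE_SIZE = 8 * 1024  # 8 KB pages, as described above


def page_offset(page_number: int) -> int:
    """Byte offset of a page inside the data file (assumed layout)."""
    return page_number * PAGE_SIZE


def read_page(data_file, page_number: int) -> bytes:
    """Seek straight to one page and read it; no other page is touched."""
    data_file.seek(page_offset(page_number))
    page = data_file.read(PAGE_SIZE)
    if len(page) != PAGE_SIZE:
        raise ValueError(f"page {page_number} is past the end of the file")
    return page


# Stand-in for the data file: four pages, each filled with its own page number.
fake_file = io.BytesIO(b"".join(bytes([n]) * PAGE_SIZE for n in range(4)))
page = read_page(fake_file, 2)
print(page_offset(2), page[:4])
```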

Figure 14.4 The internal structure of an encrypted page in RavenDB


As you can see in Figure 14.4, the page is composed of a header, nonce, MAC and the data itself. You're already familiar with the nonce. But what is the MAC field for? This is the message authentication code, which is used to verify that the page hasn't been modified (see the section on authenticated encryption earlier in this chapter). Another interesting tidbit is that the space we have for the nonce is only 128 bits (16 bytes), but we know that the XChaCha20Poly1305 algorithm uses a 192-bit (24-byte) nonce. Listing 14.1 shows what's actually going on.

Listing 14.1 Internal structure of the page header and the full nonce usage


+--------+
|Page    |
|Header  |
|        |
|32 bytes| <-------+
+--------+         |
|Nonce   |      Actual nonce
|16 bytes|         |24 bytes
+--------+ <-------+
|MAC     |
|16 bytes|
+--------+

When RavenDB needs to encrypt a page, it will generate a 128-bit random value and store it in the nonce portion of the page header. However, when we need to pass a nonce to XChaCha20Poly1305, we will pass a value that is 24 bytes in size, starting 8 bytes before the nonce. In other words, the nonce also contains 8 bytes from the page header. In practice, this means that the nonce is using 128 bits of randomness with an additional 64 bits that will change as RavenDB sees fit. Each page is encrypted using a dedicated key, derived from the master key and the page number. The page header is stored unencrypted, of course, but the page's contents are encrypted.
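The overlap between the header and the nonce is easier to see as byte offsets. The Python sketch below takes the sizes from Listing 14.1 and slices out the 24 bytes that would be handed to the cipher. Which header field occupies the last 8 bytes is not stated here, so the value written into them is purely hypothetical.

```python
import secrets

HEADER_SIZE = 32      # page header, per Listing 14.1
NONCE_SIZE = 16       # random value stored right after the header
MAC_SIZE = 16         # follows the nonce in the listing; not used below
FULL_NONCE_SIZE = 24  # what XChaCha20Poly1305 expects

NONCE_OFFSET = HEADER_SIZE
# The full nonce starts 8 bytes before the stored nonce.
FULL_NONCE_OFFSET = NONCE_OFFSET - (FULL_NONCE_SIZE - NONCE_SIZE)


def full_nonce(page: bytes) -> bytes:
    """Last 8 header bytes followed by the 16 random bytes."""
    return bytes(page[FULL_NONCE_OFFSET:NONCE_OFFSET + NONCE_SIZE])


page = bytearray(8 * 1024)
page[24:32] = (7).to_bytes(8, "little")  # hypothetical header field
page[NONCE_OFFSET:NONCE_OFFSET + NONCE_SIZE] = secrets.token_bytes(NONCE_SIZE)

nonce = full_nonce(page)
print(FULL_NONCE_OFFSET, len(nonce))
```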

How does RavenDB access encrypted data?

RavenDB keeps all the data in the database encrypted at all times, both on disk and in memory. Whenever a transaction requires access to a particular page, that page is decrypted into memory owned by that transaction. For the duration of the transaction, the unencrypted values touched by this transaction will be kept in memory. When the transaction is over, that memory will be securely wiped.
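The lifecycle described above (decrypt on demand, keep the plaintext only for the duration of the transaction, then wipe it) can be sketched as a Python context manager. This only illustrates the pattern: Python cannot offer the guarantees of a native implementation, since intermediate copies may linger, and both the helper names and the XOR "decryption" are placeholders.

```python
from contextlib import contextmanager


@contextmanager
def decrypted_page(ciphertext: bytes, decrypt):
    """Yield a writable plaintext buffer and zero it when the block exits."""
    buffer = bytearray(decrypt(ciphertext))
    try:
        yield buffer
    finally:
        # Overwrite in place so the plaintext does not linger in this buffer.
        for i in range(len(buffer)):
            buffer[i] = 0


def toy_decrypt(data: bytes) -> bytes:
    """Placeholder for real decryption; XOR with a constant is not encryption."""
    return bytes(b ^ 0x5A for b in data)


stored = bytes(b ^ 0x5A for b in b"users/1: Oren")
with decrypted_page(stored, toy_decrypt) as plain:
    inside = bytes(plain)  # copy kept only so the example can show before/after
    reference = plain      # the same buffer object, observed after the block

print(inside)
print(reference)
```

After the with block ends, the buffer that held the plaintext contains only zero bytes, even though a reference to it is still alive.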

To further protect your data, RavenDB will attempt to lock the unencrypted data in memory, so that it will not be written to a page file and will not be visible in core dumps. This is done by calling mlock or VirtualLock, depending on the system in question.

Locking memory into physical RAM is subject to certain limitations and may require you to change the system configuration to let RavenDB lock enough memory to handle routine operations. If your threat model doesn't include worrying about attackers digging into the page file, you can tell RavenDB that failing to lock memory is fine by using the following configuration option: Security.DoNotConsiderMemoryLockFailureAsCatastrophicError.

This might be a valid choice if the system doesn't have a swap or page file defined, for example, or if you're using encrypted swap already and don't need to worry about data leaks from there.

RavenDB also uses a few temporary files (Temp/scratch.0000000000.buffers and Temp/compression.0000000000.buffers, for example). In terms of encryption, there are two file types that we care about. First are the scratch files. This is the place where RavenDB writes your data until it's written to the data file. These files are encrypted in the exact same way as the data file itself. Whenever you need to access data from one of these files, they're decrypted on temporary storage during the transaction and then wiped after it's completed.

The other set of files are used as temporary buffers and are wiped immediately after use. The compression set of files, for example, is used as part of writing to the journal. We write the transaction data to the memory-mapped compression file and then compress, encrypt and write it to the disk. Once that's done, we securely wipe the compression file to remove all traces of your data from memory.

So, what's encrypted, you ask? Everything stored in the database file. This includes:

  • Documents
  • Revisions
  • Conflicts
  • Attachments
  • Tombstones

What's not encrypted? Values that are stored at the cluster level. These are:

  • Identities
  • Compare exchange values
  • The database record

Identities aren't generally considered sensitive information. But compare exchange values most certainly can contain data you'll want to keep private. Most important, the database record might contain connection strings to other databases. This is relevant only if you're using ETL SQL and providing the password in the connection string. In that case, the full connection string is stored at the cluster level and is not affected by your database's encryption mode. To enable encryption at the cluster level, you'll need to take additional steps, as we'll see now.

Notice the logs output

In an encrypted database — likely storing high-value data — be sure to pay attention to the output of the log. RavenDB doesn't generally log documents' data to the log file, even on the most verbose mode, but it can certainly write document IDs in certain cases. If document IDs themselves are sensitive data, you should either ensure that the logs directory is encrypted or disable logging entirely.

Encrypting the cluster information

In addition to storing your database-level data, RavenDB also stores data at the cluster level. Data stored at the cluster level is usually referred to as the server store and is managed independently by each node in the cluster. The data stored there includes all the databases' records, identities, compare exchange values, etc. These are stored on all the nodes in the cluster, including for databases that don't reside on a particular node.

You can also encrypt the information in the server store, although doing so is a bit more involved than the process of encrypting a database, and you must repeat this operation on all the nodes in the cluster. Here's how:

  1. Shut down the RavenDB node
  2. Run rvn offline-operation encrypt /path/to/system-db
  3. Restart the RavenDB node

The key here is in the second step. This action loads the existing server store (which typically resides in the System directory), generates a new key (see the section on key management later in this chapter) and then encrypts the server store using this key.

This process should be done on all the nodes in the cluster, and it will typically result in a different key being generated for each node. Note that RavenDB does not enforce a server store encryption on every node. This is to allow for a rolling migration of encrypting the server store (taking one node at a time, encrypting it and restarting it). If you do decide to encrypt your server store, make sure to involve all nodes in the cluster — including when you are adding new nodes. You can also run the rvn offline-operation encrypt command before adding any new nodes to the cluster so that it won't ever write unencrypted data to the disk.

Encrypting indexes

In addition to the main data file, there are indexes to consider. Each index has a separate Raven.voron file, its own scratch and compression files, etc. And just like the main data file, indexes are encrypted on all levels, using the exact same techniques we just discussed.

Key derivation and additional security

You might have noticed that pages and transactions aren't encrypted using the master key. Instead, each time you need to encrypt a value, RavenDB generates a derived key for that specific purpose. The idea is that even if — due to some unforeseeable error — an attacker were able to figure out the key for a particular page or transaction, all your other data would remain protected.

The key derivation function we use ensures that attackers can't go back to the master key from which a derived key was generated. This way, even full key exposure for a particular part of the data won't expose your entire database.
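As an illustration of per-purpose subkeys, the Python sketch below derives 256-bit keys from a master key using keyed BLAKE2b with a context label and a numeric ID. The text does not name the derivation function RavenDB uses, so treat this construction, the context strings and the function name as assumptions chosen for the example.

```python
import hashlib


def derive_key(master_key: bytes, context: bytes, subkey_id: int) -> bytes:
    """Derive a 256-bit subkey with keyed BLAKE2b.

    `context` (at most 16 bytes) names the purpose; `subkey_id` is the page
    number or transaction ID. Both names are illustrative.
    """
    return hashlib.blake2b(
        digest_size=32,
        key=master_key,
        salt=subkey_id.to_bytes(8, "little"),
        person=context,
    ).digest()


master_key = bytes(range(32))  # fixed demo key; never hard-code a real one
page_7 = derive_key(master_key, b"Pages", 7)
page_8 = derive_key(master_key, b"Pages", 8)
txn_7 = derive_key(master_key, b"Txns", 7)
print(page_7.hex())
```

The derivation is deterministic, so the same page always gets the same key, while different pages and different purposes get unrelated keys. Because the function is one-way, learning page_7 reveals neither the master key nor page_8.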

During queries, the indexing transaction decrypts the relevant pages so that you can perform searches normally. It then wipes the data from the memory when the query is completed. There's one exception to this rule: the in-memory caches that the indexing engine uses for optimization purposes.

These caches contain the indexed terms as memory arrays and are kept outside the transaction boundary. This is because creating them can be quite expensive, in terms of time and number of allocations. Because of that, they are created once and retained for as long as they are useful. They're never written to the disk, but they might be written to the page file.

If you're concerned about the safety of this type of data, either make sure your page file or swap is encrypted or don't index any sensitive information. (There's rarely a need to run a query using a full credit card number, for example; the last four digits will usually suffice.) Document data that hasn't been indexed isn't included in the cache!

During indexing, we also write temporary files to the indexing directory, containing the indexed data. These files are also encrypted using XChaCha20Poly1305, with a random key RavenDB generates.

Even index definitions are encrypted. So, you can rest assured that with RavenDB everything going to a persistent medium is encrypted and safe.

Now, what about what goes on the network?

Encrypted data on the wire

Different nodes in the cluster may use different keys to encrypt the database. That means that we can't just send raw encrypted data from one node to another. Indeed, whenever RavenDB sends data over the network — whether as a response to a client's query or to replicate data for another node — we must first decrypt the data before sending it.

This may sound worrisome, but remember that an encrypted database can only reside on a node that's running in a secured mode, and all communication uses HTTPS with TLS 1.2, which is both strongly encrypted and authenticated. Let's explore a few of the ways this is put into practice.

Aside from a client querying the database, there are a few other ways to get data from RavenDB. Replication, external replication and ETL are the most common ones. Backups should also be considered; these are handled later in this chapter.

Replication is done between different database instances in the same database group — all of them will be encrypted (usually, but not always, with the same key). See the key management section later in this chapter for more details. External replication lets us copy data to a different database, either in the same or a different cluster. While RavenDB requires that any external replication from an encrypted database go to a secure server, it does not require the destination to be an encrypted database.

The lost (encrypted) laptop

Why doesn't RavenDB require that external replication from an encrypted database also goes to an encrypted database? Because the other side is the one that controls the server. We wouldn't gain anything by creating such a requirement, and there are several desirable scenarios where we wouldn't want or need it.

Consider the case of a salesperson who travels to pitch his product to customers. He needs to carry data on his laptop (to be able to fill new orders, etc.) but this data might be sensitive in nature. So, we set up the database on his laptop as an encrypted database. We also set up external replication to the master cluster in our data center.

We assume that our data center is locked down, so we don't need to encrypt the data in the master cluster. Thus, encrypting the database on the salesperson's laptop only protects us from a loss or theft of this laptop. That way, data that goes out and travels is encrypted, while the data we store in a secure location is not. This distinction allows for better performance and makes specific operational tasks easier (see the discussion on backups later in this chapter).

In addition to external replication, it's common to use ETL processes to extract data out of a RavenDB database. A good example is a system that stores payment information. Given PCI[4] compliance issues, this data must be stored in an encrypted way. But imagine we set up a RavenDB ETL process to a separate database to let us work with the data more easily. In doing so, we remove the sensitive payment details and are effectively left with just the order history. Since this (non-sensitive) information isn't required to be encrypted, it can be left unencrypted there. In the same way, we can also use ETL SQL to transfer some of the (non-sensitive) data to a reporting database for later analysis.

One thing to note at this point is that, while RavenDB will insist that the ETL process use a secure mode (HTTPS), we don't always have a good way to detect whether your SQL ETL connection string is itself encrypted. It's therefore the admin's responsibility to ensure that this communication channel is safe from eavesdropping. For that matter, regardless of whether your communication channel is encrypted, admins should be aware of the data flow inside the system, making sure that sensitive information is not sent to an unencrypted store, potentially exposing data that shouldn't be stored in plain text.[5]

Key management

This section is important. In fact, it's probably the most important aspect for an admin to read. This is because, to understand the security of the system, you ultimately need to be aware of where your keys are. You can encrypt your data using the strongest encryption methods, using post-quantum algorithms, for example. But in the end, everything still hinges on the security of your keys. If the key leaks in any way, it's game over for you as far as your system's security is concerned.

As you can imagine, we put a lot of thought into key management at RavenDB. And yet we recognize competing concerns here: The more secure your system is, the harder it is to use. As you've already seen earlier in this chapter, not much is required to set up an encrypted database using RavenDB. Our software generates a key for you automatically. So, aside from writing this key down, you can just sit back and let everything run normally.

But this leads to an interesting question: Where is the key stored on our end, and how?

In fact, there isn't a key in the singular sense; there might be more than one. RavenDB uses the following two keys[6]:

  • Database encryption key - a per-database key, used as the master encryption key for the entire database and stored on the server store
  • Server master key - a server-wide value used for encrypting the database encryption keys

By default, the server store as a whole is not encrypted. To prevent an attacker from simply reading the database encryption keys from the file system and then accessing your data, we encrypt the database encryption keys themselves a second time, using a server master key.

Of course, as a smart reader, you know that this just moves the attack vector. So, you may be wondering: How is the server master key being protected?

While RavenDB could encrypt the server master key as well, doing so would just lead to a need for a third encryption key, and then a fourth. And if we encrypt that, then we'll discover that it's turtles all the way down.

This isn't a problem that's unique or new to RavenDB. Your organization is also likely to have policies in place for protecting the encryption key: through safe storage. We'll discuss how RavenDB can fit into those policies later on. But for now, let's see how RavenDB stores the server master key by default.

On Windows, a data protection API (DPAPI) lets RavenDB piggyback the encryption of the key on top of a Windows password. (Conceptually, Windows uses a value derived from the logged-in user's password to encrypt or decrypt values.) This means that RavenDB doesn't even need a server master key; we can rely on DPAPI to manage things for us.

So, whenever we need to store the database encryption keys for Windows, we'll call DPAPI, which will encrypt each key and store its encrypted value. Whenever we open an encrypted database, we'll hand the encrypted value back to DPAPI and get the key in return, which we can then use to open the encrypted database.

This process has the advantage of being pretty seamless and (usually) good enough as a security measure. The disadvantage is that this security is tied to your Windows password.[7] For example, an admin resetting a password will cause DPAPI to fail at decrypting any values encrypted with the old password. (Note that changing your password, which requires entering the current password, is safe in this regard. Only a password reset will lose us access to previously encrypted DPAPI values.)

On Linux, the situation is a lot more complex. There isn't a single solution like DPAPI on Linux. Instead, there are many solutions that can be used: libsecret, Gnome Keyring, KDE Wallet, etc.

Because there's no universally accepted approach — and to avoid dependencies that might not exist for all deployments — RavenDB doesn't use any of these solutions for Linux (see the next section for how you can customize that). Instead, we use the operating system permissions to securely hold the key. This key is stored in the ~/.ravendb/secret.key file with permissions set to only allow the RavenDB user access to it.
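To make the idea of protecting a key file with operating system permissions concrete, here is a minimal Python sketch. It is an illustration only, not RavenDB's own code: the helper names are made up for this example, the file is written to a temporary folder rather than ~/.ravendb, and it assumes a Linux (POSIX) system.

```python
import os
import secrets
import stat
import tempfile


def write_secret_key(path: str, key: bytes) -> None:
    """Create `path` so that only its owner can read or write it, then store the key."""
    # O_EXCL refuses to overwrite an existing key file; 0o600 = owner read/write only.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "wb") as handle:
        # Set the mode explicitly, in case the process umask removed owner bits.
        os.fchmod(handle.fileno(), 0o600)
        handle.write(key)


def is_owner_only(path: str) -> bool:
    """Return True when neither the group nor other users have any access bits."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        key_path = os.path.join(folder, "secret.key")  # stand-in for the real key file
        write_secret_key(key_path, secrets.token_bytes(32))  # a random 256-bit key
        print("owner-only permissions:", is_owner_only(key_path))
```

A check like is_owner_only can be handy in a deployment script, to verify that nobody has loosened the permissions on the key file by accident.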

These defaults have limits. On Linux, if a single hard disk stores both the secret.key file and the encrypted database, anyone who gets hold of that disk can plug it into a separate system where they have root privileges and skip any permission checks at the file-system level. (On Windows, there are tools such as DPAPick that can decrypt DPAPI values, given offline access to a machine.)

So, by default, RavenDB uses these operating-system level mechanisms to secure the server master key. But recognizing that doing so gives you only up to a certain level of security, we let you customize the way in which RavenDB gets the encryption key.

Customizing key management in RavenDB

Your master encryption key is the holy grail of your database security. RavenDB has reasonable defaults to store it — using DPAPI or file system permissions, depending on which operating system you're running on. But there's a limit to how much these methods can protect your data. In many organizations, there are strict security policies around key management, and RavenDB lets you follow them easily.

In much the same way that you can customize how RavenDB gets the X509 certificate to ensure that your communication is safe, we also let you specify an executable that will fetch the key from some secret store. This process is controlled using the Security.MasterKey.Exec configuration value.

Listing 14.2 shows an example of a PowerShell script that can be invoked to fetch the encryption key from Azure Key Vault.

Listing 14.2 Getting the encryption key from Azure Key Vault and sending it to RavenDB


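# Fetch the secret from the vault (its value is the Base64-encoded key)
$secret = Get-AzureKeyVaultSecret -VaultName 'AllMySecrets' -Name 'RavenMasterEncKey'
# Decode the Base64 text into the raw key bytes
$key = [System.Convert]::FromBase64String($secret.SecretValueText)
# Write the raw bytes to standard output, where RavenDB reads them
$stdout = [System.Console]::OpenStandardOutput()
$stdout.Write($key, 0, $key.Length)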

RavenDB will invoke this script, read the key from the standard output and use it as the server master key. In this way, you retain complete control over key storage, access control, etc.
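Any executable that writes the raw key bytes to its standard output can play this role. As a rough sketch of the same contract in Python, the code below decodes a Base64-encoded key and writes the bytes to a binary stream. The function name and the RAVEN_MASTER_KEY_B64 environment variable are hypothetical, chosen only for this example, and the 32-byte length check assumes a 256-bit key. In practice, you would fetch the value from your own secret store.

```python
import base64
import os
import sys
from typing import BinaryIO

KEY_SIZE = 32  # assumption: a 256-bit key


def emit_master_key(encoded: str, out: BinaryIO) -> None:
    """Decode a Base64-encoded key and write its raw bytes to `out`."""
    # validate=True rejects stray characters instead of silently skipping them.
    key = base64.b64decode(encoded, validate=True)
    if len(key) != KEY_SIZE:
        raise ValueError(f"expected {KEY_SIZE} key bytes, got {len(key)}")
    # Write bytes, not text: the reader expects the raw key, with no trailing newline.
    out.write(key)
    out.flush()


def main() -> None:
    # Hypothetical source of the secret; replace with a call to your vault of choice.
    emit_master_key(os.environ["RAVEN_MASTER_KEY_B64"], sys.stdout.buffer)
```

Saved as a script that ends with a call to main(), this could be wired in through Security.MasterKey.Exec in the same way as Listing 14.2. Failing loudly on a malformed key is deliberate: a wrong key would only surface later, as a failure to load the encrypted databases.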

Changing the key

You can change the server master key, but it takes a bit of work. Effectively, you have to decrypt and re-encrypt the server store with the new key. This is actually reasonable to do because the server store is usually fairly small. Changing the database encryption key, on the other hand, isn't really possible without a full export/import, which can take a lot of time for a large database.

Since different servers can use different keys, you might want to create the new key on a new server, then tell RavenDB to move the databases over. That way, you have an online process and won't need to take the system down while it's all happening. But in practice, changing the key is rare, and isn't usually needed.

It is important to note that failing to retrieve the server master key - or getting the wrong key - will cause a failure when loading any encrypted database. And if the server store is encrypted, starting RavenDB will result in a failure as well.

Managing the database encryption key

So far we've talked primarily about the server master key. But the database encryption key also deserves some attention. Earlier in this chapter, we walked through creating an encrypted database, and as part of that, we also got the encryption key to safely store away (Figure 14.2).

Encryption keys are not part of the global cluster state, nor are they usually sent over the wire. Instead, at database creation time, the server generates a key and then contacts each of the nodes configured for hosting this database and tells them the key for this database. Only then is the actual database created.

If you created an encrypted database on Node A, and later on you want to expand the database group to also reside on Node C, how does that work? After all, the encryption key isn't available on Node C. So, just trying to expand the database group at this point will result in an error, as setting up a key for an encrypted database is a separate action from setting up the database.

The database creation wizard makes this process seamless. But it's important to understand what's going on beneath the surface.

Getting the encryption keys from RavenDB

As an administrator, you can get the server master encryption key by using rvn offline-operation get-key and providing the path to the server store folder. This is typically used if you need to move the database between machines.

To get the database encryption key, you would go to Manage Server and then to Admin JS Console. Choose Database as the type, and select the database you want to get the key for. Now you can run the following command to get the key: return database.DocumentsStorage.Options.MasterKey;.

You can see an example of this process in Figure 14.5.

Figure 14.5 Getting the encryption key from an active database using an admin script


This technique uses the admin scripting functionality, which is only available to the cluster administrator and lets you execute arbitrary scripts in the context of RavenDB itself. The same functionality is also exposed through the rvn tool — using rvn admin-channel and then the script command.

Note that getting the server master encryption key is an offline operation, while the only way to get the database encryption key is when the database server is up and running.

Because different nodes don't necessarily have the same encryption key for the same database — and because the encryption key is important — we require an administrator action to create the key on a node before RavenDB loads an encrypted database. This can be done through a REST call, as you can see in Listing 14.3.

Listing 14.3 Pushing a database encryption key to a node


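# Node that will receive the key, and the database the key belongs to
$baseUrl = "https://c.raven.development.run"
$dbName = "Northwind"

# Force TLS 1.2 for the connection
$spm = [System.Net.ServicePointManager]
$spm::SecurityProtocol = [System.Net.SecurityProtocolType]::Tls12
# Generate a random 256-bit (32-byte) key and encode it as Base64
$rng = [System.Security.Cryptography.RNGCryptoServiceProvider]
$key = New-Object byte[] 32
$rng::Create().GetBytes($key)
$payload = [System.Convert]::ToBase64String($key)
# Load a client certificate with cluster administrator privileges
$cert = Get-PfxCertificate -FilePath admin.cert.pfx

# POST the Base64-encoded key to the node
Invoke-WebRequest "$baseUrl/admin/secrets?name=$dbName" `
    -Certificate $cert -Method POST -Body $payload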

There is actually a lot going on in Listing 14.3. First, we define which node we'll push the key to and which database the key is for. Then, we ask PowerShell to use TLS 1.2 to talk to the server. (Sometimes it defaults to TLS 1.0, which isn't supported by RavenDB for security reasons.) RNGCryptoServiceProvider is then used to generate a cryptographically secure random 256-bit key. We convert it to Base64 and then use a certificate with cluster administrator privileges to send it to the node.

Once this is done, we can now expand the Northwind database from Node A to Node C as well. The database will be created on the new node, and the previously pushed encryption key will be in use.
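Listing 14.3 pushes a freshly generated key. If you would rather push a key you already hold (for example, the one you stored away when the database was created), the request looks the same and only the body changes. The Python sketch below builds such a request against the same /admin/secrets endpoint. The function names are made up for this example. Python's ssl module expects the client certificate in PEM form rather than PFX, so the certificate and key file paths are assumptions as well.

```python
import base64
import ssl
import urllib.parse
import urllib.request


def build_push_key_request(base_url: str, db_name: str, key_b64: str) -> urllib.request.Request:
    """Build the POST request that hands an existing Base64-encoded key to a node."""
    # Check the key before sending it: it must be valid Base64 for 32 bytes (256 bits).
    key = base64.b64decode(key_b64, validate=True)
    if len(key) != 32:
        raise ValueError(f"expected a 32-byte key, got {len(key)} bytes")
    query = urllib.parse.urlencode({"name": db_name})
    url = f"{base_url.rstrip('/')}/admin/secrets?{query}"
    return urllib.request.Request(url, data=key_b64.encode("ascii"), method="POST")


def push_key(request: urllib.request.Request, cert_pem: str, key_pem: str) -> int:
    """Send the request using a client certificate; returns the HTTP status code."""
    # Assumption: the admin certificate was exported to PEM files beforehand.
    context = ssl.create_default_context()
    context.load_cert_chain(certfile=cert_pem, keyfile=key_pem)
    with urllib.request.urlopen(request, context=context) as response:
        return response.status
```

For example, build_push_key_request("https://c.raven.development.run", "Northwind", saved_key) followed by push_key with the PEM paths would mirror what Listing 14.3 does, except that the key comes from your own records instead of a random number generator.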

An admin can decide whether to use the same key on all nodes — which may simplify some operations, such as restoring from backup — or use different keys. We'll speak more about backup later in this book. But there are some things about backing up an encrypted database that require special attention here. Let's go over them.

Backing up encrypted databases

Just because a database is encrypted doesn't mean that it doesn't need all the usual care and maintenance you give to other data. In particular, I'm thinking about backups, restores and high-availability considerations.

We'll discuss backups in full in the next part of this book, so I won't get too deep into the details now of how to work with them. Instead, I'll just discuss the details that are important to remember when dealing with encrypted databases.

RavenDB supports the following forms of backups:

  • Snapshot - a compressed binary view of the database files at a given point in time, as well as additional data from the cluster level that belongs to the database (identities, compare exchange values, etc.)
  • Full backup - a compressed JSON of all data in the database, as well as all the cluster-level data for this database
  • Incremental backup - a compressed JSON of all data that changed in the database since the last full backup, as well as all the cluster-level data for this database

For an encrypted database, it's important to consider what parts of the backup are encrypted, and in what manner. With a Snapshot backup, the entire snapshot is encrypted using the database encryption key. Conceptually, you're getting the raw file from the disk, as-is.[8]

As a result, you must have the appropriate encryption key if you ever want to restore the snapshot. Without this key, there's no way to restore the database, access the data or really do anything at all.

Alongside the snapshot data, there's also the cluster-level data. This set of data is typically much smaller, but it is not encrypted in the case of a snapshot. While it is compressed, this data is available to anyone who can read the backup media.

Full and incremental backups are always completely unencrypted, so you should be aware of your backup strategy and where you'll back up your encrypted databases. RavenDB is equipped to push backups to a local or shared directory, to an FTP/SFTP site, to Azure Blob storage, to Amazon S3 and to Amazon Glacier.

In any of these cases, if you have an encrypted database, you need to consider where you will store the data. You can back up to an encrypted folder, or you can enable data-at-rest encryption settings when uploading to the cloud (exactly how depends on which system you're using, but all have some level of support for automatic encryption of uploaded content).

We'll discuss backup management at length in Chapter 17, later in this book. But I want to emphasize that, for encrypted databases, in addition to backing up the data itself, it's important to have a copy of the encryption key. Not only is this step important for restoring snapshot data, but it also can be very relevant if an admin has ever reset a password for a user, resulting in DPAPI failure to decrypt the database encryption key at startup.

This error can happen pretty far down the line. If the admin resets the password on Monday, but RavenDB had already gotten the encryption key in memory, then no issue will appear until the moment when RavenDB unloads the database and needs to reload it, potentially several days or weeks later. At that point, being able to quickly grab the encryption key from a locked drawer and provide it to RavenDB for loading the encrypted database is far preferable to a forced restore of everything.

Summary

This chapter might have been hard to decipher, but I hope you got the right keys out of it. In a more serious tone, we've gone over a lot of information about how RavenDB is using high-end encryption to safely protect your data. RavenDB uses the XChaCha20Poly1305 algorithm to encrypt any and all data on disk. Decrypting information is done only during an active transaction. The memory holding the decrypted data is locked into memory, so it won’t be written to a page file or a swap partition. RavenDB will immediately wipe the decrypted contents in memory upon transaction closure, reducing the time that sensitive information is available.

We went over the details of how RavenDB encrypts every part of the system, from how transactions are encrypted as they're written to the disk to how each individual page in the database is encrypted with its own unique key. We saw what's encrypted on RavenDB (documents, attachments, revisions, indexes) and what isn't, even if the database itself is encrypted (identities and compare exchange values). We then saw how we can encrypt cluster-level data by encrypting the server store, and we met the rvn tool for the first time.

After that, we looked at what's probably the most important topic for you in this chapter: how the encryption keys are being managed by RavenDB. By default, RavenDB encrypts the database encryption keys using the server master key. This master key is then encrypted using DPAPI on Windows or protected using file system permissions on Linux. You also have the option of telling RavenDB how to fetch the master key from a hardware security module, a vault or any other method that fits your security policies using the Security.MasterKey.Exec option.

Finally, we discussed backup concerns for encrypted databases, and in particular, the safekeeping of your encryption keys. I find it ironic that the most secure backup method is probably just printing a hard copy of the encryption key and storing it offline in a locked cupboard at your offices. Nonetheless, I've found this approach to be one of the most efficient ways to handle the issue of a lost or stolen key. Having the key tucked away like this means that you won't have to think about it too much — but if you do have a need, the key is available to you.

You might have noticed some emphasis on my part on the topic of keeping the encryption key safe. This is because the key is important. Without it, you have zero access to the database. You may think this emphasis is obvious as a basic property of an encrypted database. And yet, we've gotten support calls with some variant of "we lost the key, how do we get the data back?". This scenario can occur due to a password reset invalidating a DPAPI-encrypted value, losing the main hard disk of the machine while still having the database drive up, or many other reasons.

Regardless of the cause, the key was lost. If there's no key, then there's no way to access your data. That is the point of encryption! So, please remember that when you create an encrypted database, keep your encryption keys somewhere safe, just on the off chance that you'll need them.

This chapter also closes our discussion on security inside RavenDB. The next topic on the table is an in-depth dive into operations, deployments and monitoring your production clusters.


  1. These files are also memory-mapped, but they aren't persistent, as far as RavenDB is concerned. Instead, we are using memory-mapped files to avoid using too much private memory and to give the operating system a well-known backing store for this temporary memory. It also allows RavenDB to have fine-grained control over the memory usage required by the storage requirements.

  2. RavenDB uses direct and unbuffered I/O to write to the disk, ensuring that writes are persistent, skipping the caches in the middle.

  3. There are a lot of other reasons why the data is divided into pages, and this is how RavenDB works without encryption as well. It just happens that it also plays very nicely into the requirements for using the data while keeping it encrypted.

  4. Payment Card Industry.

  5. You can force RavenDB to accept ETL tasks that use non-encrypted channels using the AllowEtlOnNonEncryptedChannel option.

  6. Both the database keys and the server master key are local to a node, not shared across the cluster.

  7. In this case, the relevant password is for the user account that is running the RavenDB process. That will usually not be a normal user, but a service account.

  8. Note that this is just a conceptual model. You can't just copy the file out of the way while RavenDB is running and consider this a backup.