Looking into convergent encryption
I’ll start by saying that this is not something that is planned in any capacity. I ran into this topic recently and decided to dig a little deeper. This post is mostly about the results of my research.
If you run a file sharing system, you are going to run into a problem very early on. Quite a lot of the files that people are storing are shared, and that is a good thing. Instead of storing the same file multiple times, you can store it once, and just keep a reference counter. That is how RavenDB internally deals with attachments, for example. Two documents that have the same attachment (content, not the file name, obviously) will only have a single entry inside of the RavenDB database.
When you run a file sharing system, one of the features you really want to offer is the ability to save space in this manner. Another one is not being able to read the users’ files. For many reasons, that is a highly desirable property. For the users, because they can be assured that you aren’t reading their private files. For the file sharing system, because it simplifies operations significantly (if you can’t look at the files’ contents, there is a lot that you don’t need to worry about).
In other words, we want to store the users’ files, but we want to do that in an encrypted manner. It may sound surprising that the file sharing system doesn’t want to know what it is storing, but it actually does simplify things. For example, if you can look into the data, you may be asked (or compelled) to do so. I’m not talking only about something like a government agency doing that, but even about feature requests such as “do a virus scan on my files”. If you literally cannot do that, and it is something that you present as an advantage to the user, that is much nicer to have.
The problem is that if you encrypt the files, you cannot know if they are duplicated. And then you cannot use the very important storage optimization technique of de-duplication. What can you do then?
This is where convergent encryption comes into play. The idea is that we’ll use an encryption system that will give us the same output for the same input even when using different keys. To start with, that sounds like a tall order, no?
But it turns out that it is quite easy to do. Consider the following symmetric key operations:
One of the key aspects of modern cryptography is the use of a nonce, which ensures that for the same message and key, we’ll always get a different ciphertext. In this case, we need to go the other way around and ensure that for different keys, the same content will give us the same output. Here is how we can utilize the above primitives for this purpose:
In other words, we have a two-step process. First, we compute the cryptographic hash of the message and use that as the key to encrypt the data. We use a static nonce for this part, more on that later. We then take the hash of the file and encrypt that normally, with the secret key of the user and a random nonce. We can then push both the ciphertext and the nonce + encrypted key to the file sharing service. Two users, using different keys, will always generate the same ciphertext for the same file in this model (only the small encrypted key differs). How is that?
Both users will compute the same cryptographic hash for the same file, of course. Using a static nonce completes the cycle and ensures that we’re basically running the same operation. We can then push both the encrypted file and our encrypted key for that to the file sharing system. The file sharing system can then de-duplicate the encrypted blob directly. Since this is the identical operation, we can safely do that. We do need to keep, for each user, the key to open that file, encrypted with the user’s key. But that is a very small value, so likely not an issue.
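The two-step process described above can be sketched roughly as follows. This is a minimal illustration in Python, not a production design: `toy_encrypt` is a hypothetical stand-in cipher that I built from a SHAKE-256 keystream so the example runs with only the standard library, and it provides no authentication. A real system would use a vetted authenticated cipher instead. The names `convergent_encrypt`, `convergent_decrypt`, and `STATIC_NONCE`, as well as the 24-byte nonce size, are assumptions made for this sketch.

```python
import hashlib
import os

# All-zero nonce; acceptable here only because each content key
# encrypts exactly one message (the content it was derived from).
STATIC_NONCE = bytes(24)


def toy_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """Toy stream cipher (assumption for this sketch): XOR with a SHAKE-256 keystream."""
    keystream = hashlib.shake_256(key + nonce).digest(len(message))
    return bytes(m ^ k for m, k in zip(message, keystream))


# An XOR stream cipher decrypts with the very same operation.
toy_decrypt = toy_encrypt


def convergent_encrypt(user_key: bytes, content: bytes):
    # Step 1: the hash of the content is the key for the content itself.
    content_key = hashlib.sha256(content).digest()
    blob = toy_encrypt(content_key, STATIC_NONCE, content)
    # Step 2: wrap the content key with the user's own key and a random nonce.
    nonce = os.urandom(24)
    wrapped_key = nonce + toy_encrypt(user_key, nonce, content_key)
    return blob, wrapped_key


def convergent_decrypt(user_key: bytes, blob: bytes, wrapped_key: bytes) -> bytes:
    nonce, encrypted_key = wrapped_key[:24], wrapped_key[24:]
    content_key = toy_decrypt(user_key, nonce, encrypted_key)
    return toy_decrypt(content_key, STATIC_NONCE, blob)


alice_key, bob_key = os.urandom(32), os.urandom(32)
content = b"the same meme, shared by everyone"
blob_a, wrapped_a = convergent_encrypt(alice_key, content)
blob_b, wrapped_b = convergent_encrypt(bob_key, content)
```

Here `blob_a` and `blob_b` are byte-for-byte identical, so the server can store a single copy, while `wrapped_a` and `wrapped_b` differ and can only be opened with the matching user key.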
Now, what about this static nonce? The whole point of a nonce is that it is a value that you use once. How can we use a static value here safely? The problem that a nonce is meant to solve is that with most ciphers, if two messages were encrypted using the same key and nonce, XORing the two ciphertexts gives you the XOR of their plaintexts. That can have a catastrophic impact on the security of the system. For that to happen, though, you need to encrypt two different messages with the same key and nonce.
In this case, however, that cannot happen. Since we use the cryptographic hash of the content as the key, we know that any change in the message will ensure that we have a different key. That means that we never reuse the key, so there is no real point in using a nonce at all. Given that the cryptographic operation requires it, we can just pass a zeroed nonce and not think about it further.
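To make the nonce argument concrete, here is a small sketch, again using a hypothetical SHAKE-256 based toy stream cipher (an assumption for illustration, not a real cipher you should deploy). It shows that reusing a key and nonce leaks the XOR of the plaintexts, and that deriving the key from the content avoids the reuse even with a zeroed nonce.

```python
import hashlib


def toy_encrypt(key: bytes, nonce: bytes, message: bytes) -> bytes:
    """Toy stream cipher (assumption for this sketch): XOR with a SHAKE-256 keystream."""
    keystream = hashlib.shake_256(key + nonce).digest(len(message))
    return bytes(m ^ k for m, k in zip(message, keystream))


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


nonce = bytes(24)  # zeroed nonce, as in the scheme above
first = b"attack at dawn!!"
second = b"retreat at dusk!"

# Case 1: one fixed key, same nonce, two messages.
# The identical keystream cancels out when the ciphertexts are XORed.
fixed_key = hashlib.sha256(b"one fixed key").digest()
leaked = xor(toy_encrypt(fixed_key, nonce, first),
             toy_encrypt(fixed_key, nonce, second))
reuse_leaks = leaked == xor(first, second)

# Case 2: content-derived keys. Different content means a different key,
# so the keystreams differ and nothing cancels.
key_1 = hashlib.sha256(first).digest()
key_2 = hashlib.sha256(second).digest()
mixed = xor(toy_encrypt(key_1, nonce, first),
            toy_encrypt(key_2, nonce, second))
derived_leaks = mixed == xor(first, second)

print(reuse_leaks, derived_leaks)
```

With a fixed key, `reuse_leaks` is true: the adversary learns the XOR of the two plaintexts without knowing the key. With content-derived keys the same trick yields noise.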
This is a very attractive proposition, since it can amount to massive space savings. File sharing isn’t the only scenario where this is attractive. Consider the case of a messaging application, where you want to forward messages from one user to another. Memes and other viral content are a common scenario here. You can avoid having to re-upload the file many times, and even in an end-to-end encryption model, we can still avoid sharing the file contents with the storage host.
However, this leads to one of the serious issues that we have to cover for convergent encryption. For the same input, you’ll get the same output. That means that if an adversary knows the source file, it can tell if a user has that file. In the context of a messaging application, it can spell trouble. Consider the following image, which is banned by the Big Meat industry:
Even with end to end encryption, if you use convergent encryption for media files, you can tell that a particular piece of content is accessed by a user. If this is a unique file, the server can’t really tell what is inside it. However, if this is a file that has a known source, we can detect that the user is looking at a picture of salads and immediately snitch to Big Meat.
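The check that the storage host (or anyone who can see which blobs a user touches) could run is sketched below. Everything here is hypothetical and for illustration only: the `stored_by_user` table, the idea that blobs are identified by the SHA-256 of their ciphertext, and the SHAKE-256 based toy cipher standing in for a real one.

```python
import hashlib

STATIC_NONCE = bytes(24)


def convergent_blob(content: bytes) -> bytes:
    """Deterministic ciphertext: the key is the hash of the content (toy cipher, an assumption)."""
    key = hashlib.sha256(content).digest()
    keystream = hashlib.shake_256(key + STATIC_NONCE).digest(len(content))
    return bytes(c ^ k for c, k in zip(content, keystream))


def blob_id(blob: bytes) -> str:
    return hashlib.sha256(blob).hexdigest()


# What the server sees: which encrypted blobs each user references.
# It holds no user keys and cannot decrypt any of them.
stored_by_user = {
    "alice": {blob_id(convergent_blob(b"<bytes of the salad picture>"))},
    "bob": {blob_id(convergent_blob(b"<bytes of bob's private notes>"))},
}


def users_holding(known_file: bytes) -> list:
    # Anyone with a copy of the file can compute the same ciphertext
    # and simply look for it.
    target = blob_id(convergent_blob(known_file))
    return sorted(user for user, blobs in stored_by_user.items() if target in blobs)


print(users_holding(b"<bytes of the salad picture>"))
```

No key is ever broken here. The adversary only needs its own copy of the file, which is exactly why unique, private files stay safe while widely known ones do not.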
This is called a confirmation of a file attack, and it is the most obvious problem for this scenario. The other issue is that by using convergent encryption, you may allow an adversary to guess values when the file has a known structure.
Let’s consider the following scenario: I have a service where users upload their data using convergent encryption. Given that many users may share the same file, we allow any user to download a file using:
Now, let’s assume that I know what typical files are stored in the service. For example, something like this:
Looking at this form, there are quite a few variables that you can plug in here, right? However, we can generate options for all of those quite easily, and encryption is cheap. So we can speculate on the possible values. We can then run the convergent encryption on each possibility, then try to fetch that from the server. If we have a match, we have figured out the remaining information.
Consider this another form of password cracking, one that runs against a set of algorithms that are quite explicitly meant to be highly efficient, so they lend themselves to offline work.
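A rough sketch of that guessing loop is below. The template, the victim's name, the four-digit PIN, and the `server_has` lookup are all invented for illustration, as is the assumption that the service addresses blobs by the SHA-256 of their ciphertext. The cipher is the same hypothetical SHAKE-256 based toy used as a stand-in for a real one. This kind of attack is sometimes referred to as a learn-the-remaining-information attack.

```python
import hashlib

STATIC_NONCE = bytes(24)

# Hypothetical known structure; only the PIN is unknown to the adversary.
TEMPLATE = "Dear {name}, your new PIN code is {pin}."


def convergent_blob(content: bytes) -> bytes:
    """Deterministic ciphertext: the key is the hash of the content (toy cipher, an assumption)."""
    key = hashlib.sha256(content).digest()
    keystream = hashlib.shake_256(key + STATIC_NONCE).digest(len(content))
    return bytes(c ^ k for c, k in zip(content, keystream))


def blob_id(content: bytes) -> str:
    return hashlib.sha256(convergent_blob(content)).hexdigest()


# Simulated server state: one victim uploaded a filled-in form.
server_blobs = {blob_id(TEMPLATE.format(name="Jane Doe", pin="4821").encode())}


def server_has(identifier: str) -> bool:
    """Stand-in for asking the service whether a blob exists."""
    return identifier in server_blobs


def guess_pin(name: str):
    # Only 10,000 candidates, and each one costs two hashes and an XOR.
    for candidate in range(10_000):
        pin = f"{candidate:04d}"
        guess = TEMPLATE.format(name=name, pin=pin).encode()
        if server_has(blob_id(guess)):
            return pin
    return None


print(guess_pin("Jane Doe"))
```

The search space here is tiny on purpose. The more low-entropy fields a file has, the more practical this becomes, and nothing in the scheme slows the adversary down the way a password hashing function would.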
That is enough on the topic, I believe. I don’t have any plans to do anything with it, but it was interesting to figure out. I actually started this by looking at this feature in WhatsApp:
It seems that this is a client-side enforced policy, rather than something that is handled via the protocol itself. I initially thought that this is done via convergent encryption, but it looks like it is just a counter in the message, and the client side shows the warning and applies the limits.