With the massive amounts of data that modern applications are expected to handle, it’s important to be able to compress that data to save storage space. In most databases, however, this presents a tradeoff: the more efficiently data is compressed, the slower it becomes to store, retrieve, or modify. RavenDB 5.0 introduces a new way of compressing documents more efficiently without sacrificing the performance you’ve come to expect.
In many use cases, the data stored in RavenDB consists of many documents with similar structures or other commonalities. A lot of space could be saved by compressing a collection of similar documents as a single unit. But then loading a single document would require decompressing the entire collection, and adding, deleting, or changing a single document would require recompressing the entire collection. Past versions of RavenDB offered ways to compress the data within individual documents, but we found that compressing multiple documents together simply wasn’t worth the performance cost.
In RavenDB 5.0 we have solved this dilemma by integrating the Zstandard (Zstd) compression algorithm, originally developed at Facebook. Zstd can ‘train’ on a batch of documents to learn the commonalities between them, and then use what it learned to compress new documents individually, which means they can also be decompressed individually.
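To make the idea concrete, here is a minimal Python sketch of compressing documents one at a time against a shared dictionary. It is not RavenDB’s code and not Zstd: to stay dependency-free it swaps in the standard library’s zlib preset-dictionary support (the `zdict` parameter), and it “trains” naively by concatenating a few sample documents. The documents and the helper names `compress_doc` and `decompress_doc` are made up for illustration. What it shows is the shape of the technique: every document gets its own compressed blob, yet each blob benefits from what the documents have in common.

```python
import json
import zlib

# Hypothetical documents from a single collection: identical structure, varying values.
docs = [
    {"Name": f"Customer {i}", "Company": "companies/1-A",
     "Address": {"City": "London", "Country": "UK"}, "OrderIds": [i, i + 1]}
    for i in range(100)
]
encoded = [json.dumps(d).encode("utf-8") for d in docs]

# Naive stand-in for a trained dictionary: a few sample documents concatenated.
# Zstd's real dictionary trainer selects common fragments far more cleverly.
dictionary = b"".join(encoded[:5])


def compress_doc(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress one document on its own, optionally primed with a shared dictionary."""
    c = zlib.compressobj(level=9, zdict=zdict) if zdict else zlib.compressobj(level=9)
    return c.compress(data) + c.flush()


def decompress_doc(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress one document without touching any other document."""
    d = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return d.decompress(blob) + d.flush()


# Compress every document individually, with and without the shared dictionary.
with_dict = [compress_doc(e, dictionary) for e in encoded]
without_dict = [compress_doc(e) for e in encoded]

print("raw bytes:        ", sum(map(len, encoded)))
print("no dictionary:    ", sum(map(len, without_dict)))
print("shared dictionary:", sum(map(len, with_dict)))

# Any single document can be read back on its own.
assert decompress_doc(with_dict[42], dictionary) == encoded[42]
```

Because each blob is independent, reading document 42 never touches the other 99; the only shared state is the dictionary, which the compressing and decompressing sides must both hold.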
The new Documents Compression feature can be enabled per collection, as well as for document revisions. By training on the first few documents in a collection, Zstd generates a dictionary that is then applied to every new document. The compression ratio achieved for each document is monitored, and if it is not satisfactory, the algorithm re-trains on the most recently modified documents. If the retrained dictionary compresses the new document more efficiently, the dictionary is updated. In this way, compression continuously adapts to the data you feed it.
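The retraining loop described above could look roughly like the following Python sketch. Everything in it is an assumption for illustration, not RavenDB’s implementation: the hypothetical `AdaptiveCompressor` class, the 0.5 ratio threshold, the 20-document history, and the choice to keep every past dictionary so that documents compressed earlier remain decodable. As before, zlib preset dictionaries stand in for Zstd, and “training” is simple concatenation of recent documents.

```python
import json
import zlib
from collections import deque


class AdaptiveCompressor:
    """Hypothetical sketch of ratio-driven dictionary retraining.

    zlib preset dictionaries stand in for Zstd dictionaries, and "training"
    just concatenates recent documents; this is not RavenDB's implementation.
    """

    WINDOW = 32 * 1024  # zlib only uses the last 32 KiB of a preset dictionary

    def __init__(self, samples, max_ratio=0.5, history=20):
        if not samples:
            raise ValueError("need at least one sample document")
        self.max_ratio = max_ratio                    # assumed "satisfactory" threshold
        self.recent = deque(samples, maxlen=history)  # most recently modified documents
        # Old dictionaries are kept so documents compressed with them stay readable.
        self.dictionaries = [self._train(self.recent)]

    def _train(self, docs):
        return b"".join(docs)[-self.WINDOW:]

    @staticmethod
    def _deflate(data, zdict):
        c = zlib.compressobj(level=9, zdict=zdict)
        return c.compress(data) + c.flush()

    def compress(self, data):
        """Return (dictionary id, compressed bytes) for one document."""
        dict_id = len(self.dictionaries) - 1
        blob = self._deflate(data, self.dictionaries[dict_id])
        if len(blob) / len(data) > self.max_ratio:
            # Ratio not satisfactory: retrain on the most recently modified documents
            # and switch only if the new dictionary actually compresses this one better.
            candidate = self._train(self.recent)
            candidate_blob = self._deflate(data, candidate)
            if len(candidate_blob) < len(blob):
                self.dictionaries.append(candidate)
                dict_id, blob = len(self.dictionaries) - 1, candidate_blob
        self.recent.append(data)
        return dict_id, blob

    def decompress(self, dict_id, blob):
        d = zlib.decompressobj(zdict=self.dictionaries[dict_id])
        return d.decompress(blob) + d.flush()


def enc(doc):
    return json.dumps(doc).encode("utf-8")


# Made-up data: the collection starts with customers, then the shape changes to orders.
customers = [enc({"Name": f"Customer {i}", "Company": "companies/1-A",
                  "City": "London", "Country": "UK"}) for i in range(10)]
orders = [enc({"Id": f"orders/{i}-A", "Product": f"products/{i % 7}-A",
               "Quantity": i % 5 + 1, "ShipTo": {"City": "Paris", "Zip": "75001"}})
          for i in range(10)]

compressor = AdaptiveCompressor(samples=customers[:5])
stored = [compressor.compress(doc) for doc in customers[5:] + orders]

print("dictionaries trained:", len(compressor.dictionaries))
for doc, (dict_id, blob) in zip(customers[5:] + orders, stored):
    assert compressor.decompress(dict_id, blob) == doc
```

In this run, the first orders compress poorly against a dictionary built only from customers, which triggers retraining; as soon as an order is among the recent documents, the retrained dictionary wins and later orders compress well. Storing the dictionary id next to each blob is what lets old and new dictionaries coexist.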
Using this new feature, we often see compression ratios of over 50% with negligible performance cost. In fact, because of the reduced I/O, the system often processes operations faster overall.
Read more
- You can read more about Documents Compression here.