Dealing with large documents (100+ MB)

RavenDB can handle large documents. There is a limit to the size of a document in RavenDB, but at 2 GB it isn’t a practical one. I actually had to go and check whether the limit was 2 GB or 4 GB, because in practice it doesn’t matter.

That said, having large documents is something that you should be wary of. They work, but they have very high costs.

I ran some benchmarks on this a while ago, and the results are interesting. Let’s consider a 100 MB document. Parsing time for that should be around 4 to 5 seconds. That ignores the memory costs. For example, a JSON document can be parsed into an in-memory representation that is 50(!) times the size of the raw text. That is 5 GB of memory to handle a single 100 MB document. And that is just the parsing cost; there are others. Reading 100 MB from most disks will take about a second, assuming that the data is sequential. Assuming you have a 1 Gbit/s network, all of which is dedicated to this single document, you can push it over the network in 800 ms or so.
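The arithmetic behind those numbers is easy to reproduce. The sketch below is only a back-of-the-envelope calculator, not a benchmark: the `estimate_cost` function and its parameter names are made up for illustration, the 50x expansion factor is the worst case mentioned above, and the 100 MB/s disk throughput is an assumption implied by "about a second" for 100 MB.

```python
from dataclasses import dataclass

MB = 1_000_000  # decimal megabytes, matching the round numbers in the text


@dataclass
class DocumentCost:
    memory_bytes: int
    disk_read_seconds: float
    network_seconds: float


def estimate_cost(
    doc_bytes: int,
    expansion_factor: int = 50,                   # assumed worst-case parse blow-up
    disk_bytes_per_sec: int = 100 * MB,           # assumed sequential read speed
    network_bits_per_sec: int = 1_000_000_000,    # 1 Gbit/s, fully dedicated
) -> DocumentCost:
    """Rough cost of handling one document of doc_bytes."""
    return DocumentCost(
        memory_bytes=doc_bytes * expansion_factor,
        disk_read_seconds=doc_bytes / disk_bytes_per_sec,
        network_seconds=doc_bytes * 8 / network_bits_per_sec,
    )


cost = estimate_cost(100 * MB)
print(f"memory:  {cost.memory_bytes / 1_000_000_000:.1f} GB")
print(f"disk:    {cost.disk_read_seconds:.1f} s")
print(f"network: {cost.network_seconds * 1000:.0f} ms")
```

Plugging in your own disk and network figures gives a quick sense of what a single large document costs before any real work is done on it.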

Dealing with such documents is hard and awkward. If you accidentally issue a query on a bunch of those documents and get a page of 25 of them, you just got a query result that is 2.5 GB in size. With documents of this size, you are also likely to want to modify multiple pieces at the same time, so you’ll need to be very careful about concurrency control as well.

In general, at those sizes, you stop treating this as a simple document and move to a streaming approach, because anything else is too costly to make sense.
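To make "streaming" concrete, here is a minimal sketch that walks a large JSON array one item at a time instead of materializing the whole thing. It uses only Python's standard `json` module and is not RavenDB's API. The `iter_objects` helper is hypothetical, and it assumes the payload is a top-level array whose items are all JSON objects.

```python
import io
import json


def iter_objects(stream, chunk_size=64 * 1024):
    """Yield each object of a top-level JSON array, reading the stream in chunks."""
    decoder = json.JSONDecoder()
    buf = ""
    started = False
    while True:
        buf = buf.lstrip()
        if not started:
            if buf:
                if buf[0] != "[":
                    raise ValueError("expected a top-level JSON array")
                buf = buf[1:]
                started = True
                continue
        else:
            if buf.startswith(","):
                buf = buf[1:].lstrip()
            if buf.startswith("]"):
                return
            if buf.startswith("{"):
                try:
                    item, end = decoder.raw_decode(buf)
                except json.JSONDecodeError:
                    pass  # object is still incomplete, read more data below
                else:
                    yield item
                    buf = buf[end:]  # drop the consumed text, keep memory bounded
                    continue
            elif buf:
                raise ValueError("this sketch only handles arrays of objects")
        chunk = stream.read(chunk_size)
        if not chunk:
            raise ValueError("unexpected end of input")
        buf += chunk


data = '[{"id": 1, "text": "a]b}"}, {"id": 2}, {"id": 3, "tags": ["x", "y"]}]'
ids = [obj["id"] for obj in iter_objects(io.StringIO(data), chunk_size=7)]
print(ids)
```

Only one item and one partially read chunk are held in memory at a time. The sketch re-parses the buffer whenever an object spans several chunks, so a production version would want a proper incremental parser.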

A better alternative is to split this document up into its component parts. You can then interact with each one of them independently.
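As a rough illustration of what such a split can look like, the sketch below takes one big document and turns a large embedded collection into separate small documents, leaving a root that only holds their IDs. Everything here is hypothetical: the `split_document` helper, the `project`/`tasks` shape, and the ID scheme are invented for the example, and plain dictionaries stand in for stored documents.

```python
def split_document(doc_id, doc, collection_field):
    """Split one large document into a small root plus one document per item.

    Assumes doc[collection_field] is a list of dicts.
    """
    parts = {}
    part_ids = []
    for index, item in enumerate(doc[collection_field]):
        part_id = f"{doc_id}/{collection_field}/{index}"  # hypothetical ID scheme
        parts[part_id] = {"parent": doc_id, **item}
        part_ids.append(part_id)

    # The root keeps everything except the big collection, plus references to the parts.
    root = {key: value for key, value in doc.items() if key != collection_field}
    root[f"{collection_field}_ids"] = part_ids
    return root, parts


project = {
    "name": "Website redesign",
    "tasks": [
        {"title": "Wireframes", "done": True},
        {"title": "Copywriting", "done": False},
    ],
}

root, parts = split_document("projects/1", project, "tasks")
print(root)
print(sorted(parts))
```

Each part can now be loaded, modified, and saved on its own, so changing one task no longer means reading and rewriting the entire project.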

It is the difference between driving an 18-wheeler and driving a family car. You can pack a whole lot more into the 18-wheeler, but it gets pretty poor mileage and it is very awkward to park. You aren’t going to want to use that for going to the grocery store.
