Importing the Stack Overflow dataset into RavenDB

6 minutes
Around 2017 we needed to test RavenDB with realistic datasets. That was the time that we were working hard on the 4.0 release, and we wanted to have some common dataset that was production quality (for all the benefits and complications that this brings) to play with. A serious issue was that we needed that […]

Architectural optimizations vs the profiler

12 minutes
For the past couple of years, we had a stealth project going on inside of RavenDB. That project is meant to re-architect the internals of how RavenDB handles queries. The goal is to have a major performance improvement for RavenDB indexing and queries. We spent a lot of time thinking about architecting this. Design discussions […]

Tracking down RavenDB I/O usage in Linux

7 minutes
Today I had to look into a customer whose RavenDB instance was burning through a lot of I/O. The process is somewhat ingrained in me by this point, but I thought that it would make for a good blog post so I’ll recall that next time. Here is what this looks like from the […]

Production postmortem: The allocating query

7 minutes
A customer was experiencing large memory spikes in some cases, and we were looking into the allocation patterns of some of the queries that were involved. One of the things that popped up was a query that allocated just under 30GB of managed memory during its processing. Let me repeat that, because it bears repeating. […]

Webinar Recording: RavenDB & Messaging Transactions

1 minute
In RavenDB 5.4, we’re introducing new ETL features for Kafka and RabbitMQ. Now, instead of your documents just sitting there in your database, you can involve them in your messaging transactions. In this webinar, RavenDB CEO Oren Eini explains how these ETL tasks open up a whole new world of architectural patterns, and how they […]

Recording: Build your own database at Cloud Lunch & Learn

1 minute
I spoke at Cloud Lunch & Learn about the basics of building a database from scratch. We took a storage engine and created a simple database within the span of an hour. Covered in the talk are the details of how you can build the database, using indexes to speed up queries and the manner […]

Production postmortem: Efficiency all the way to Out of Memory error

7 minutes
RavenDB is written in C#, and as such, uses managed memory. As a database, however, we need granular control of our memory, so we also do manual memory management. One of the key optimizations that we utilize to reduce the amount of overhead we have on managing our memory is using an arena allocator. That is […]

Benchmarking: Slow is fast, fast is slow

6 minutes
I’m trying to compare indexing speed of Corax vs. Lucene. Here is an interesting result: We have two copies of the same index, running in parallel on the same data. And we can clearly see that Lucene is faster. Not by a lot, but enough to warrant investigation. Here is the core of the work […]

When debugging, assume an unreliable narrator

8 minutes
When we are handling a support call, we are often working with partial information about the state of the software at the customer site. Sometimes that is an unavoidable part of the job. When troubleshooting a system with patients’ records, I can’t just ask the customer to schlep the data to my laptop so I […]