Merging documents in RavenDB with Map/Reduce

We got an interesting question on the mailing list. Given the following document structures:

We want to be able to merge these into the following output:

If we had just a single skill in the professions document, that would have been easy. It would also be easy if we had the professions recorded in the skills document. But we have to merge multiple separate skills, without knowing which professions they belong to. RavenDB doesn’t support doing this directly, so we have to do a bit of work to get there.

We can easily merge documents in RavenDB if we have the document id of the relevant document. But in this case, the external id of the skill isn’t part of the document id, which complicates things.

The very first thing we need to do is to allow ourselves to reference a skill by its external id. This is done by creating a map/reduce index that projects the value out, like so:

Note that we specify a pattern for the collection references, based on the actual data from the document. The index itself doesn’t really do much, to be fair; it just gives us the document id we wanted. I’ll cover this feature in more detail in the next post. For now, I’m just using it to generate the results I want. Here is the generated document:

Because there may be multiple documents with the same value, we don’t end up with the actual document, but with a middleman that points to all the matches.
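To make the middleman idea concrete, here is a minimal Python sketch of what such an index computes. It is not RavenDB code and not the index from this post: the document ids, the field names (`ExternalId`, `Matches`) and the `skill-refs/` id pattern are all assumptions made up for illustration. The only point is that grouping by external id produces one predictable id per value, which lists every matching skill document.

```python
from collections import defaultdict

# Hypothetical skill documents; ids and field names are illustrative only.
skills = {
    "skills/1-A": {"ExternalId": "ext-17", "Name": "Welding"},
    "skills/2-A": {"ExternalId": "ext-42", "Name": "Plumbing"},
    "skills/3-A": {"ExternalId": "ext-17", "Name": "Welding (advanced)"},
}


def map_skill(doc_id, doc):
    """Map step: project the external id together with the owning document id."""
    return {"ExternalId": doc["ExternalId"], "DocIds": [doc_id]}


def reduce_by_external_id(entries):
    """Reduce step: group by external id and collect the matching document ids."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["ExternalId"]].extend(entry["DocIds"])
    return grouped


def build_references(skill_docs):
    """Emit one middleman document per external id, under a predictable id.

    The "skill-refs/<external id>" pattern is an assumption for this sketch.
    """
    mapped = [map_skill(doc_id, doc) for doc_id, doc in skill_docs.items()]
    return {
        f"skill-refs/{external_id}": {"Matches": sorted(doc_ids)}
        for external_id, doc_ids in reduce_by_external_id(mapped).items()
    }


references = build_references(skills)
print(references["skill-refs/ext-17"])
```

Two skills share `ext-17` in this made-up data, so the middleman for that value lists both of them rather than pointing at a single document.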

And here we have the reference to the original document, so we can now start working. We need another index, to bring it all together:

There seems to be a lot going on here, but we are simply walking the chain of documents to find all the documents that we need. And here is the final result:

The nice thing about all this work is that it happens at indexing time, which means that queries on this data are really fast.
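As a rough illustration of that walk, the Python sketch below follows a profession to the middleman for each external id, and from there to the skill documents themselves. Everything in it is hypothetical: the sample data, the `SkillExternalIds` field, and the `skill-refs/` id pattern are assumptions for this sketch, not the structures or the index from the post. The merged results are computed once up front, which mirrors doing the work at indexing time so that a query is only a lookup.

```python
# Hypothetical documents; ids and field names are illustrative only.
skills = {
    "skills/1-A": {"ExternalId": "ext-17", "Name": "Welding"},
    "skills/2-A": {"ExternalId": "ext-42", "Name": "Plumbing"},
}

# Middleman documents: one per external id, listing the matching skill ids.
references = {
    "skill-refs/ext-17": {"Matches": ["skills/1-A"]},
    "skill-refs/ext-42": {"Matches": ["skills/2-A"]},
}

professions = {
    "professions/1-A": {
        "Name": "Pipefitter",
        "SkillExternalIds": ["ext-17", "ext-42", "ext-99"],
    },
}


def merge_profession(profession, reference_docs, skill_docs):
    """Walk profession -> middleman -> skill documents and merge the result."""
    merged_skills = []
    for external_id in profession["SkillExternalIds"]:
        reference = reference_docs.get(f"skill-refs/{external_id}")
        if reference is None:
            continue  # no skill document carries this external id
        for skill_id in reference["Matches"]:
            merged_skills.append(
                {"Id": skill_id, "Name": skill_docs[skill_id]["Name"]}
            )
    return {"Name": profession["Name"], "Skills": merged_skills}


# Precompute every merged document once; later "queries" are plain lookups.
merged = {
    doc_id: merge_profession(doc, references, skills)
    for doc_id, doc in professions.items()
}

print(merged["professions/1-A"])
```

The unknown `ext-99` is skipped here. Whether a missing skill should be dropped or reported is a design choice that this sketch does not try to settle.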
