Merging documents in RavenDB with Map/Reduce December 18, 2019 Author: Oren Eini, CEO RavenDB


We got an interesting question on the mailing list. Given the following document structures:

image

We want to be able to merge these into the following output:

image

If we had just a single skill in the professions document, that would have been easy. It would also be easy if we had the professions recorded in the skills document. But we have to merge multiple separate skills, without knowing what professions they belong to. RavenDB doesn’t support doing this directly, so we have to do a bit of work to get there.

We can easily merge documents in RavenDB if we have the document id of the relevant document. But in this case, the external id of the skill isn’t part of the document id, which complicates things.

The very first thing we need to do is to allow ourselves to reference a skill by its external id. This is done by creating a map/reduce index that projects the value out, like so:

image

Note that we specify a pattern for the collection references, based on the actual data from the document. To be fair, the index itself doesn’t really do much; it just gives us the document id we wanted. I’ll cover this feature in more detail in the next post. For now, I’m just using it to generate the results I want. Here is the generated document:

image

Because there may be multiple documents with the same value, we don’t end up with the actual document, but with a middle man that points to all the matches.

image
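To make the "middle man" idea concrete, here is a minimal sketch in plain Python. It is not RavenDB index code and does not use the RavenDB API; it only simulates the grouping in memory. The field names (`ExternalId`, `Name`, `Matches`), the sample documents, and the `skills/by-external-id/...` id pattern are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical skill documents keyed by document id (made-up sample data).
skills = {
    "skills/1-A": {"ExternalId": "sk-10", "Name": "Welding"},
    "skills/2-A": {"ExternalId": "sk-20", "Name": "Soldering"},
    "skills/3-A": {"ExternalId": "sk-10", "Name": "Arc welding"},
}


def build_reference_docs(skill_docs):
    """Emit one 'middle man' document per external id.

    Each reference document is stored under an id derived from the
    external id itself, and lists every skill document that carries it.
    """
    grouped = defaultdict(list)
    # Map step: project the external id out of each skill document.
    for doc_id, doc in sorted(skill_docs.items()):
        grouped[doc["ExternalId"]].append(doc_id)

    # Reduce step: one output per external id, under a predictable id
    # (the assumed pattern), so it can later be loaded directly by id.
    return {
        f"skills/by-external-id/{external_id}": {"Matches": doc_ids}
        for external_id, doc_ids in grouped.items()
    }


references = build_reference_docs(skills)
```

In this sketch, two skills share the external id `sk-10`, so the reference document for that value lists both of them rather than pointing at a single document.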

And here we have the reference to the original document, so we can now start working. We need another index, to bring it all together:

image

There seems to be a lot going on here, but we are simply walking the chain of documents to find all the documents that we need. And here is the final result:

image
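The walk itself can be sketched in a few lines of plain Python. Again, this is only an in-memory simulation under stated assumptions, not the index definition: the document shapes, the `SkillExternalIds` and `Matches` fields, and the `skills/by-external-id/...` id pattern are hypothetical names chosen for the example.

```python
# Hypothetical sample data; all field names and ids are assumptions.
skills = {
    "skills/1-A": {"ExternalId": "sk-10", "Name": "Welding"},
    "skills/2-A": {"ExternalId": "sk-20", "Name": "Soldering"},
}
# 'Middle man' documents: external id -> ids of the matching skill documents.
references = {
    "skills/by-external-id/sk-10": {"Matches": ["skills/1-A"]},
    "skills/by-external-id/sk-20": {"Matches": ["skills/2-A"]},
}
professions = {
    "professions/1-A": {
        "Name": "Metalworker",
        "SkillExternalIds": ["sk-10", "sk-20", "sk-99"],
    },
}


def merge_profession(profession, reference_docs, skill_docs):
    """Follow profession -> reference document -> skill documents."""
    merged_skills = []
    for external_id in profession["SkillExternalIds"]:
        # The reference id is computable from the external id alone.
        reference = reference_docs.get(f"skills/by-external-id/{external_id}")
        if reference is None:
            continue  # no skill document carries this external id
        for skill_id in reference["Matches"]:
            skill = skill_docs[skill_id]
            merged_skills.append(
                {"ExternalId": external_id, "Name": skill["Name"]}
            )
    return {"Name": profession["Name"], "Skills": merged_skills}


# Computed once, up front, the way an index does its work at indexing time.
merged = {
    doc_id: merge_profession(doc, references, skills)
    for doc_id, doc in professions.items()
}
```

Once `merged` has been built, reading a merged profession is a single dictionary lookup, which mirrors why doing the work ahead of time keeps reads cheap. The unknown external id `sk-99` is simply skipped in this sketch; a real setup might prefer to surface it instead.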

The nice thing about all this work is that it happens at indexing time, which means that queries on this data are really fast.