RavenDB version 2.5.

Document Structure Design Considerations

While Raven is a schema-free data store, that doesn't mean you shouldn't take some time to consider how to design your documents, so that you can access all the data you need to serve user requests efficiently, reliably and with as little maintenance cost as possible.
The most typical error people make when designing a data model on top of a document database is to model it the same way they would on top of a relational database.
Raven is a non-relational data store. Trying to hammer a relational model on top of it will produce sub-optimal results. But you can get fantastic results by taking advantage of the document-oriented nature of Raven.

Documents are not flat

Documents, unlike a row in an RDBMS, are not flat. You are not limited to just storing keys and values. Instead, you can store complex object graphs as a single document. That includes arrays, dictionaries and trees. Unlike a relational database, where a row can only contain simple values and more complex data structures need to be stored as relations, you don't need to work hard to get your data into Raven.
Let us take the following page as an example:

Figure 1: Document Structure

In a relational database, we would have to touch no fewer than four tables to show the data in this single page (Posts, Comments, Tags, RelatedPosts).
Using Raven, we can store all the data that we need to work with as a single document with the following format:

Figure 2: Document Structure

This format allows us to get everything that we need to display the page shown above in a single request.
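
As a rough sketch of the shape such a document could take, the snippet below writes a blog post as a Python dictionary that mirrors the stored JSON. The field names (Title, Body, Tags, Comments, RelatedPosts), the ids and all values are illustrative assumptions, not the exact contents of the figure, and render_page is a hypothetical helper.

```python
import json

# Hypothetical blog post document; field names and values are illustrative.
post = {
    "Title": "Designing documents",
    "Body": "Documents are not flat...",
    "Tags": ["raven", "modeling"],
    "Comments": [
        {"Author": "alice", "Text": "Nice post."},
        {"Author": "bob", "Text": "Very helpful."},
    ],
    "RelatedPosts": [
        {"Id": "posts/2", "Title": "Entities and aggregates"},
    ],
}


def render_page(doc):
    """Build the page text from one document, with no further lookups."""
    lines = [doc["Title"], doc["Body"], "Tags: " + ", ".join(doc["Tags"])]
    lines += [f'{c["Author"]}: {c["Text"]}' for c in doc["Comments"]]
    lines += [f'Related: {r["Title"]}' for r in doc["RelatedPosts"]]
    return "\n".join(lines)


# The whole object graph survives a JSON round trip, as it would when stored.
stored = json.dumps(post)
page = render_page(json.loads(stored))
print(page)
```

Everything the page needs (tags, comments, related post titles) travels with the post itself, which is what makes the single request possible.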

Raven is not relational

People starting out with Raven often run into problems when they try to apply relational concepts. The major issue with that is, of course, that Raven is non-relational. However, it's actually more than that; there is a reason why Raven is non-relational.
Raven treats each document as an independent entity. By doing so, it is able to optimize the way documents are stored and managed. Moreover, one of the sweet spots that we see for Raven is for storing large amounts of data (too much data to store on a single machine).
Raven supports sharding out of the box, so there is no need to store a group of related documents together. Each document is independent and can be stored on any shard in the system.
Another aspect of the non-relational nature of Raven is that documents are expected to be meaningful on their own. You can certainly store references to other documents, but if you need to refer to another document to understand what the current document means, you are probably using Raven incorrectly.
With Raven, you are encouraged to include all of the information you need in a single document. Take a look at the post example above. In a relational database, we would have a link table for RelatedPosts, which would contain just the ids of the linked posts. If we wanted to get the titles of the related posts, we would need to join to the Posts table again. You can do that in Raven, but that isn't the recommended approach. Instead, as shown in the example above, you should include all of the details that you need inside the document. Using this approach, you can display the page with just a single request, leading to much better overall performance.
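
To make the difference concrete, here is a minimal sketch that counts how many document loads each approach needs. CountingStore is a toy in-memory stand-in written for this example, not a RavenDB API, and the ids and field names (RelatedPostIds, RelatedPosts) are assumptions.

```python
class CountingStore:
    """Toy in-memory store that counts document loads (not a RavenDB API)."""

    def __init__(self, docs):
        self._docs = docs
        self.loads = 0

    def load(self, doc_id):
        self.loads += 1
        return self._docs[doc_id]


docs = {
    "posts/1": {
        "Title": "Designing documents",
        # Relational-style link: ids only.
        "RelatedPostIds": ["posts/2", "posts/3"],
        # Denormalized: id plus the title needed for display.
        "RelatedPosts": [
            {"Id": "posts/2", "Title": "Entities and aggregates"},
            {"Id": "posts/3", "Title": "Associations"},
        ],
    },
    "posts/2": {"Title": "Entities and aggregates"},
    "posts/3": {"Title": "Associations"},
}


def titles_by_reference(store, post_id):
    """Follow each id to its own document to read the title."""
    post = store.load(post_id)
    return [store.load(rid)["Title"] for rid in post["RelatedPostIds"]]


def titles_embedded(store, post_id):
    """Read the titles that were copied into the post document."""
    post = store.load(post_id)
    return [r["Title"] for r in post["RelatedPosts"]]


by_ref_store = CountingStore(docs)
embedded_store = CountingStore(docs)
by_ref_titles = titles_by_reference(by_ref_store, "posts/1")
embedded_titles = titles_embedded(embedded_store, "posts/1")
print(by_ref_store.loads, embedded_store.loads)  # 3 loads versus 1
```

The price of the embedded form is that a copied title has to be updated if the related post is renamed, so it suits data that changes rarely or whose staleness is acceptable.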

Entities and Aggregates

When thinking about using Raven to persist entities, we need to consider the two previous points. The suggested approach is to follow the Aggregate pattern from the Domain-Driven Design book. An Aggregate Root contains several entities and value objects and controls all access to the objects contained in its boundaries. External references may only refer to the Aggregate Root, never to one of its child objects.
When you apply this sort of thinking to a document database, there is a natural and easy-to-follow correlation between an Aggregate Root (in DDD terms) and a document in Raven. An Aggregate Root, together with all the objects that it holds, is a document in Raven.
This also neatly resolves a common problem with Aggregates: traversing the path through the Aggregate to the object we need for a specific operation is very expensive in terms of number of database calls. In Raven, loading the entire Aggregate is just a single call and hydrating a document to the full Aggregate Root object graph is a very cheap operation.
Changes to the Aggregate are also easier to control. When using an RDBMS, it can be hard to ensure that concurrent requests won't violate business rules. The problem is that two separate requests may touch two different parts of the Aggregate; each of them is valid on its own, but together they result in an invalid state. This has led to the usage of coarse-grained locks, which are hard to implement when using typical OR/Ms.
Since Raven treats the entire Aggregate as a single document, the problem simply doesn't exist. You can utilize Raven's optimistic concurrency support to determine if the Aggregate or any of its children has changed. You can then reload the modified Aggregate and retry the transaction.
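
The reload-and-retry cycle can be sketched as follows. VersionedStore is a toy in-memory store invented for this example; its integer version counter only stands in for whatever change marker the real store uses, and add_line, max_lines and the document fields are hypothetical.

```python
import copy


class ConcurrencyError(Exception):
    """Raised when a document changed since it was loaded."""


class VersionedStore:
    """Toy store with a per-document version counter (not a RavenDB API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def load(self, doc_id):
        version, doc = self._docs[doc_id]
        return version, copy.deepcopy(doc)

    def save(self, doc_id, doc, expected_version):
        current_version, _ = self._docs.get(doc_id, (0, None))
        if current_version != expected_version:
            raise ConcurrencyError(doc_id)
        self._docs[doc_id] = (current_version + 1, copy.deepcopy(doc))


def add_line(store, order_id, line, max_lines=3, retries=5):
    """Reload, re-check the business rule, and retry on conflict."""
    for _ in range(retries):
        version, order = store.load(order_id)
        if len(order["Lines"]) >= max_lines:
            return False  # rule checked against the whole aggregate
        order["Lines"].append(line)
        try:
            store.save(order_id, order, expected_version=version)
            return True
        except ConcurrencyError:
            continue  # someone else changed the order; try again
    raise ConcurrencyError(order_id)


store = VersionedStore()
store.save("orders/1", {"Lines": []}, expected_version=0)

# Two requests load the same version; the second save is rejected.
v1, a = store.load("orders/1")
v2, b = store.load("orders/1")
a["Lines"].append({"ProductId": "products/1"})
store.save("orders/1", a, expected_version=v1)
try:
    b["Lines"].append({"ProductId": "products/2"})
    store.save("orders/1", b, expected_version=v2)
    conflict = False
except ConcurrencyError:
    conflict = True
```

Because the version covers the whole document, a change to any child of the Aggregate invalidates a stale save, and the business rule is always re-evaluated against the current state before the retry.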

Associations Management

Aggregate Roots may contain all of their children, but even Aggregates do not live in isolation. For example:

Figure 3: Document Structure

The Aggregate Root for an Order will contain Order Lines, but an Order Line will not contain a Product. Instead, it contains a reference to the product id.

The Raven Client API will not try to resolve such associations. This is intentional and by design. Instead, the expected usage is to hold the value of the associated document key and explicitly load the association if it is needed.
The reasoning behind this is simple: we want to make it just a tad harder to reference data in other documents. It is very common when using an OR/M to do something like: orderLine.Product.Name, which will load the Product entity. That makes sense when you are living in a relational world, but Raven is not relational. This deliberate omission from the Raven Client API is intended to remind users that they should model their Aggregates and Entities in a format that follows the recommended practice for Raven.
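
A minimal sketch of that usage, with a plain dictionary standing in for the database: the order line holds only the product key, and the caller loads the product explicitly when it is needed. The load function, the ids and the field names are assumptions made for illustration.

```python
# Toy in-memory documents keyed by id; names and values are illustrative.
documents = {
    "products/1": {"Name": "Keyboard", "Price": 40.0},
    "orders/1": {
        "Customer": "customers/7",
        "Lines": [
            # The line stores the product key, not the product itself.
            {"ProductId": "products/1", "Quantity": 2},
        ],
    },
}


def load(doc_id):
    """Stand-in for an explicit load by document key."""
    return documents[doc_id]


order = load("orders/1")
line = order["Lines"][0]

# Nothing resolves line["ProductId"] implicitly; the caller asks for it.
product = load(line["ProductId"])
print(product["Name"], line["Quantity"])
```

The second load is visible in the calling code, which keeps the cost of crossing an Aggregate boundary obvious to whoever writes it.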

Comments

Posted by Bassem Mohsen

In the blog post example above, what if I want to display the list of most recent comments? The list will have comments from different posts. How can I get this list? Will I have to get all posts from the database?

Posted by Ayende Rahien

You can project them out to the index. See how we do this in Raccoon Blog:

https://github.com/ayende/RaccoonBlog/blob/master/RaccoonBlog.Web/Infrastructure/Indexes/PostComments_CreationDate.cs

Posted by Rob

The "associations management" section gives a common example for related data. You say you deliberately make this hard in order to encourage recommended practice for Raven. What then would be the correct way to model the order line example using Raven? Am I to copy product data and store it in the order line document, or store document keys and manually load the related document to extract what I want?

Thanks.

Posted by Ayende Rahien

For order lines, you won't use a separate document. Order lines are part of the order and should be embedded inside it.

Posted by Steven Archibald

Actually, the product/order line/order example is an extremely good example of where a document database might be a better choice. Typically, once an order is placed (in the real world, not in programming examples), the order must be preserved in the exact state it was in at the time of the order. In a relational model, since prices and/or discounts might change a year from now, it is necessary to preserve a 'history' version of the order. This is typically done by having duplicated tables, or ancillary datetime columns to indicate time periods during which prices/discounts apply. This type of solution is unnecessary in a document database. The order is the full order. There is no assembly required, and it is fully persisted in its exact state at the time of creation and henceforth. It actually solves many other relational problems. For instance, if a product is discontinued, I must keep it in the database in a different state because it is used in previous orders that I must keep around for audit purposes. Now I have to maintain the state of the product so it isn't used in new orders, and is only available to be seen in old orders. None of those concerns apply with a document database.

Posted by Gio

@Steven you are right, but usually, to have a really functional order (printable, for example) you need the photo of the product or any other data related to it (and it can be really a lot). This will lead to too much data being stored (and duplicated) for each order only to preserve the initial state... don't you think?

Posted by Phil

@Gio I was thinking the same thing. How would something like order history be handled in Raven? Seems like the documents can get huge, quickly.

Posted by Ayende Rahien

Order history is just the snapshot of the order documents at the time they were processed, nothing more. You usually don't keep binary data such as images in the doc itself; you store them as references. Other times, you can hold the essential data in the actual doc, and if you care about additional data, you point to a versioned copy of the data at the time of the order being processed.

Posted by KCs

I'd like to ask, is the domain model tied to a given database? I mean is it possible to use a database in different applications without having the types stored in the database or, as I suppose, it's only possible by using JSON objects?

Posted by Fitzchak Yitzchaki

You store just JSON objects in RavenDB. RavenDB doesn't have access to the classes that represent those objects. Those classes are tied to your application.

Posted by Todd

In the Post example where you are storing the Title of each RelatedPost, you state that with a relational model you'd need to join the Posts table again to get that Title. In theory, you certainly could store the Title in RelatedPosts in a relational model, but you probably wouldn't, because if the title changed you'd have to update it in multiple places. Wouldn't you have the same problem in Raven or any other document database?

Posted by Ayende Rahien

Todd, that is correct. In RavenDB, you would either use LoadDocument in the index to get the associated data from the related item, or you accept the costs of denormalizing it.

Posted by Daniel Schilling

FYI: The "Aggregate pattern from the Domain Driven Design book" link in the Entities and Aggregates section is dead. 404 not found.

Posted by Ayende Rahien

It appears it was removed from the site, so I removed the link.

Posted by Christian

Really cool stuff. One thing: suppose that after an increment the aggregate root "changes". I mean, an entity that was part of an aggregate becomes an aggregate root after talking to the domain expert and modeling new features. How do you handle this "migration"? Thank you.

Posted by Ayende Rahien

You can handle that using a migration, usually a script that generates new documents from the existing ones.

Posted by Andy

In the above example the children don't seem to have any reference to their parent. Is there any way of supporting upwards traversal, e.g. "OrderLine.Order"? I might want to pass around an OrderLine, but there may be places that also need to access properties of its parent Order. What's the recommended approach to achieve something like this?

Posted by B - Ry

Thinking your aggregate boundaries should be mirrored in persistence makes your models responsible for too much IMO. RavenDB != DDD.

Posted by Maggie Longshore

In the examples here, the RelatedPosts contains a link to a post and the title. You state that this is so we can display the data without fetching the related post. Why would you not treat the OrderLine the same way and store the Product Name along with the reference to the product? It seems like you will need to load all of the product details in order to just show a single order. The text above seems to contradict itself.

How do you determine which approach to take - the normalized one or the de-normalized one?
