How to Handle Document Relationships

One of the design principles that RavenDB adheres to is the idea that documents are independent, meaning all data required to process a document is stored within the document itself. However, this doesn't mean there should not be relations between objects.

There are valid scenarios where we need to define relationships between objects. By doing so, we expose ourselves to one major problem: whenever we load the containing entity, we are going to need to load data from the referenced entities as well (unless we are not interested in them). While the alternative of storing the whole entity in every object graph it is referenced in seems cheaper at first, this proves to be quite costly in terms of database resources and network traffic.

RavenDB offers three elegant approaches to solve this problem. Each scenario will need to use one or more of them. When applied correctly, they can drastically improve performance, reduce network bandwidth, and speed up development.

Denormalization

The easiest solution is to denormalize the data within the containing entity, forcing it to contain the actual value of the referenced entity in addition to (or instead of) the foreign key.

Take this JSON document for example:

// Order document with ID: orders/1-A
{
    "Customer": {
        "Name": "Itamar",
        "Id": "customers/1-A"
    },
    "Items": [
        {
            "Product": {
                "Id": "products/1-A",
                "Name": "Milk",
                "Cost": 2.3
            },
            "Quantity": 3
        }
    ]
}

As you can see, the Order document now contains denormalized data from both the Customer and the Product documents, which are saved elsewhere in full. Note that we have not copied all the customer fields into the order; instead we clone only the ones we care about when displaying or processing an order. This approach is called a denormalized reference.

The denormalization approach avoids many cross document lookups and results in only the necessary data being transmitted over the network, but it makes other scenarios more difficult. For example, consider the following entity structure as our start point:

public class Order {
    private String customerId;
    private String[] supplierIds;
    private Referral referral;
    private LineItem[] lineItems;
    private double totalPrice;

    public String getCustomerId() {
        return customerId;
    }

    public void setCustomerId(String customerId) {
        this.customerId = customerId;
    }

    public String[] getSupplierIds() {
        return supplierIds;
    }

    public void setSupplierIds(String[] supplierIds) {
        this.supplierIds = supplierIds;
    }

    public Referral getReferral() {
        return referral;
    }

    public void setReferral(Referral referral) {
        this.referral = referral;
    }

    public LineItem[] getLineItems() {
        return lineItems;
    }

    public void setLineItems(LineItem[] lineItems) {
        this.lineItems = lineItems;
    }

    public double getTotalPrice() {
        return totalPrice;
    }

    public void setTotalPrice(double totalPrice) {
        this.totalPrice = totalPrice;
    }
}

public class Customer {
    private String id;
    private String name;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }
}

If we know that whenever we load an Order from the database we will need to know the customer's name and address, we could decide to create a denormalized Order.Customer field and store those details directly in the Order object. Obviously, the password and other irrelevant details will not be denormalized:

public class DenormalizedCustomer {
    private String id;
    private String name;
    private String address;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getAddress() {
        return address;
    }

    public void setAddress(String address) {
        this.address = address;
    }
}

There wouldn't be a direct reference between the Order and the Customer. Instead, Order holds a DenormalizedCustomer, which contains the interesting bits from Customer that we need whenever we process Order objects.

But what happens when the customer's address changes? We will have to perform a bulk operation to update every order this customer has made. What if the customer has a lot of orders or changes their address frequently? Keeping these details in sync could become very demanding on the server. What if another process that works with orders needs a different set of customer fields? The DenormalizedCustomer will need to be expanded, possibly to the point that the majority of the customer record is cloned.
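The cost described above can be illustrated with a minimal in-memory sketch. The classes CustomerCopy, OrderDoc and DenormalizationSync are hypothetical stand-ins invented for this illustration; they do not use the RavenDB client and only show that one address change fans out into one write per order.

```java
import java.util.List;

// Hypothetical in-memory stand-ins; no RavenDB client calls are used here.
final class CustomerCopy {
    final String id;
    String name;
    String address;

    CustomerCopy(String id, String name, String address) {
        this.id = id;
        this.name = name;
        this.address = address;
    }
}

final class OrderDoc {
    final String id;
    final CustomerCopy customer; // denormalized copy stored inside the order

    OrderDoc(String id, CustomerCopy customer) {
        this.id = id;
        this.customer = customer;
    }
}

final class DenormalizationSync {
    // Rewrites the address copy in every order of the given customer
    // and returns how many order documents had to be modified.
    static int updateAddress(List<OrderDoc> orders, String customerId, String newAddress) {
        int modified = 0;
        for (OrderDoc order : orders) {
            if (order.customer.id.equals(customerId)) {
                order.customer.address = newAddress;
                modified++;
            }
        }
        return modified;
    }
}
```

The return value grows with the number of orders the customer has placed, which is why this approach suits data that rarely changes.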

Tip

Denormalization is a viable solution for rarely changing data or for data that must remain the same despite the underlying referenced data changing over time.

Includes

The Includes feature addresses the limitations of denormalization. Instead of one object containing copies of the fields from another object, it is only necessary to hold a reference to the second object. Then the server can be instructed to pre-load the referenced document at the same time that the root object is retrieved. We do this using:

Order order = session
    .include("CustomerId")
    .load(Order.class, "orders/1-A");

// this will not require querying the server!
Customer customer = session.load(Customer.class, order.getCustomerId());

Above we are asking RavenDB to retrieve the Order orders/1-A and, at the same time, "include" the Customer referenced by the Order.CustomerId field. The second call to load() is resolved completely client side (i.e. without a second request to the RavenDB server) because the relevant Customer object has already been retrieved (this is the full Customer object, not a denormalized version).

It is also possible to load multiple documents at once:

Map<String, Order> orders = session
    .include("CustomerId")
    .load(Order.class, "orders/1-A", "orders/2-A");

for (Order order : orders.values()) {
    Customer customer = session.load(Customer.class, order.getCustomerId());
}

You can also use Includes with queries:

List<Order> orders = session
    .query(Order.class)
    .include("CustomerId")
    .whereGreaterThan("TotalPrice", 100)
    .toList();

for (Order order : orders) {
    // this will not require querying the server!
    Customer customer = session
        .load(Customer.class, order.getCustomerId());
}

The same query with the fluent builder syntax, which here also includes a counter:

List<Order> orders = session
    .query(Order.class)
    .include(i -> i.
        includeDocuments("CustomerId").
        includeCounter("OrderUpdateCount"))
    .whereGreaterThan("TotalPrice", 100)
    .toList();

for (Order order : orders) {
    // this will not require querying the server!
    Customer customer = session
        .load(Customer.class, order.getCustomerId());
}

The equivalent RQL for the first query:

from Orders
where TotalPrice > 100
include CustomerId

And for the builder variant with the counter:

from Orders as o
where TotalPrice > 100
include CustomerId,counters(o,'OrderUpdateCount')

This works because RavenDB has two channels through which it can return information in response to a load request. The first is the Results channel, through which the root object retrieved by the load() method call is returned. The second is the Includes channel, through which any included documents are sent back to the client. Client side, those included documents are not returned from the load() method call, but they are added to the session unit of work, and subsequent requests to load them are served directly from the session cache, without requiring any additional queries to the server.
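The caching behavior described above can be modeled with a toy session. This is a minimal sketch under stated assumptions: ToySession, loadWithInclude and the Map-based "server" are hypothetical names invented for illustration and are not the RavenDB client API. The sketch only shows why a later load() of an included document adds no request.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy model of a session; it is not the RavenDB client API.
final class ToySession {
    // Stands in for the database.
    private final Map<String, Map<String, Object>> server;
    // Stands in for the session's unit-of-work cache.
    private final Map<String, Map<String, Object>> cache = new HashMap<>();
    private int requests = 0;

    ToySession(Map<String, Map<String, Object>> server) {
        this.server = server;
    }

    // Loads a document; a server round trip happens only on a cache miss.
    Map<String, Object> load(String id) {
        if (!cache.containsKey(id)) {
            requests++;
            cache.put(id, server.get(id));
        }
        return cache.get(id);
    }

    // Loads a document and, in the same round trip, the document whose ID is
    // stored in the given top-level field (the "Includes channel").
    Map<String, Object> loadWithInclude(String id, String referenceField) {
        requests++;
        Map<String, Object> root = server.get(id);
        cache.put(id, root);
        Object referencedId = root == null ? null : root.get(referenceField);
        if (referencedId instanceof String) {
            // The included document goes into the cache, not into the result.
            cache.put((String) referencedId, server.get(referencedId));
        }
        return root;
    }

    int getNumberOfRequests() {
        return requests;
    }
}
```

In this model, calling loadWithInclude("orders/1-A", "CustomerId") followed by load("customers/1-A") counts one request, while two plain load() calls count two.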

Note

The embedded and builder variants of the include clause are essentially syntactic sugar and are equivalent on the server side.

Streaming query results does not support the includes feature.
Learn more in How to Stream Query Results.

One to many includes

Include can be used with a one to many relationship. In the above classes, an Order has a field SupplierIds which contains an array of references to Supplier documents. The following code will cause the suppliers to be pre-loaded:

Order order = session
    .include("SupplierIds")
    .load(Order.class, "orders/1-A");

for (String supplierId : order.getSupplierIds()) {
    // this will not require querying the server!
    Supplier supplier = session.load(Supplier.class, supplierId);
}

Alternatively, it is possible to use the fluent builder syntax.

Order order = session.load(Order.class, "orders/1-A",
    i -> i.includeDocuments("SupplierIds"));

for (String supplierId : order.getSupplierIds()) {
    // this will not require querying the server!
    Supplier supplier = session.load(Supplier.class, supplierId);
}

The calls to load() within the for loop will not require a call to the server, as the Supplier objects will already be loaded into the session cache.

Multi-loads are also possible:

Map<String, Order> orders = session
    .include("SupplierIds")
    .load(Order.class, "orders/1-A", "orders/2-A");

for (Order order : orders.values()) {
    for (String supplierId : order.getSupplierIds()) {
        // this will not require querying the server!
        Supplier supplier = session.load(Supplier.class, supplierId);
    }
}

Secondary level includes

An Include does not need to work only on the value of a top level field within a document. It can be used to load a value from a secondary level. In the classes above, the Order contains a Referral field which is of the type:

public class Referral {
    private String customerId;
    private double commissionPercentage;

    public String getCustomerId() {
        return customerId;
    }

    public void setCustomerId(String customerId) {
        this.customerId = customerId;
    }

    public double getCommissionPercentage() {
        return commissionPercentage;
    }

    public void setCommissionPercentage(double commissionPercentage) {
        this.commissionPercentage = commissionPercentage;
    }
}

This class contains an identifier for a Customer. The following code will include the document referenced by that secondary level identifier:

Order order = session
    .include("Referral.CustomerId")
    .load(Order.class, "orders/1-A");

// this will not require querying the server!
Customer customer = session.load(Customer.class, order.getReferral().getCustomerId());

It is possible to execute the same code with the fluent builder syntax:

Order order = session
    .load(Order.class, "orders/1-A",
        i -> i.includeDocuments("Referral.CustomerId"));

// this will not require querying the server!
Customer customer = session.load(Customer.class, order.getReferral().getCustomerId());

This secondary level include will also work with collections. The Order.LineItems field holds a collection of LineItem objects which each contain a reference to a Product:

public class LineItem {
    private String productId;
    private String name;
    private int quantity;

    public String getProductId() {
        return productId;
    }

    public void setProductId(String productId) {
        this.productId = productId;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public int getQuantity() {
        return quantity;
    }

    public void setQuantity(int quantity) {
        this.quantity = quantity;
    }
}

The Product documents can be included using the following syntax:

Order order = session
    .include("LineItems[].ProductId")
    .load(Order.class, "orders/1-A");

for (LineItem lineItem : order.getLineItems()) {
    // this will not require querying the server!
    Product product = session.load(Product.class, lineItem.getProductId());
}

The fluent builder syntax works here too.

Order order = session.load(Order.class, "orders/1-A",
    i -> i.includeDocuments("LineItems[].ProductId"));

for (LineItem lineItem : order.getLineItems()) {
    // this will not require querying the server!
    Product product = session.load(Product.class, lineItem.getProductId());
}

The [] in the include path tells RavenDB that LineItems is a collection, and that the ProductId field of each element should be used as a reference.

String path conventions

When using string-based includes like:

Order order = session
    .include("Referral.CustomerId")
    .load(Order.class, "orders/1-A");

// this will not require querying the server!
Customer customer = session.load(Customer.class, order.getReferral().getCustomerId());

you must follow these rules for the string path:

  1. Dots separate fields. For example, "Referral.CustomerId" in the example above means that our Order contains a field Referral, and that field contains another field called CustomerId.

  2. The indexer operator [] indicates that a field is a collection. So if our Order has a list of LineItems and each LineItem contains a ProductId field, we can write the string path as "LineItems[].ProductId".

  3. Prefixes indicate the prefix of the identifier of the document that is going to be included. This can be useful when working with custom or semantic identifiers. For example, if you have a customer stored under customers/login@domain.com, you can include it using "Referral.CustomerEmail(customers/)" (customers/ is the prefix here).

Knowing the string path rules is also useful when you query the database through the HTTP API:

curl -X GET "http://localhost:8080/databases/Northwind/docs?id=orders/1-A&include=lines[].product"
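To make the three path rules concrete, here is a minimal sketch that applies them to plain Map and List data. IncludePathSketch and its resolve method are hypothetical names invented for this illustration; the code is not how the server evaluates paths, and it covers only lists for the [] rule, not dictionaries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mimics the path rules on plain Map/List data.
// It is an illustration only and not how the server is implemented.
final class IncludePathSketch {

    // Returns the document IDs that the given include path points at.
    static List<String> resolve(Map<String, Object> document, String path) {
        String prefix = "";
        int open = path.indexOf('(');
        if (open >= 0 && path.endsWith(")")) {             // rule 3: "Field(prefix)"
            prefix = path.substring(open + 1, path.length() - 1);
            path = path.substring(0, open);
        }

        List<Object> current = new ArrayList<>();
        current.add(document);
        for (String segment : path.split("\\.")) {         // rule 1: dots separate fields
            boolean isCollection = segment.endsWith("[]"); // rule 2: "[]" marks a collection
            String field = isCollection
                    ? segment.substring(0, segment.length() - 2)
                    : segment;
            List<Object> next = new ArrayList<>();
            for (Object node : current) {
                if (!(node instanceof Map)) {
                    continue;                              // nothing to descend into
                }
                Object value = ((Map<?, ?>) node).get(field);
                if (isCollection && value instanceof List) {
                    next.addAll((List<?>) value);          // fan out over the elements
                } else if (value != null) {
                    next.add(value);
                }
            }
            current = next;
        }

        List<String> ids = new ArrayList<>();
        for (Object value : current) {
            if (value instanceof String) {
                ids.add(prefix + value);                   // prepend the prefix, if any
            }
        }
        return ids;
    }
}
```

For an order whose Referral.CustomerEmail field holds login@domain.com, resolving "Referral.CustomerEmail(customers/)" in this sketch yields the ID customers/login@domain.com.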

Dictionary includes

Dictionary keys and values can also be used when doing includes. Consider the following scenario:

public class Person {
    private String id;
    private String name;
    private Map<String, String> attributes;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Map<String, String> getAttributes() {
        return attributes;
    }

    public void setAttributes(Map<String, String> attributes) {
        this.attributes = attributes;
    }
}

HashMap<String, String> attributes1 = new HashMap<>();
attributes1.put("Mother", "people/2");
attributes1.put("Father", "people/3");

Person person1 = new Person();
person1.setId("people/1-A");
person1.setName("John Doe");
person1.setAttributes(attributes1);

session.store(person1);

Person person2 = new Person();
person2.setId("people/2");
person2.setName("Helen Doe");
person2.setAttributes(Collections.emptyMap());

session.store(person2);

Person person3 = new Person();
person3.setId("people/3");
person3.setName("George Doe");
person3.setAttributes(Collections.emptyMap());

session.store(person3);

Now we want to include all the documents referenced by the dictionary values:

Person person = session.include("Attributes.Values")
    .load(Person.class, "people/1-A");

Person mother = session
    .load(Person.class, person.getAttributes().get("Mother"));

Person father = session
    .load(Person.class, person.getAttributes().get("Father"));

Assert.assertEquals(1, session.advanced().getNumberOfRequests());

The code above can also be rewritten with the fluent builder syntax:

Person person = session.load(Person.class, "people/1-A",
    i -> i.includeDocuments("Attributes.Values"));

Person mother = session
    .load(Person.class, person.getAttributes().get("Mother"));

Person father = session
    .load(Person.class, person.getAttributes().get("Father"));

Assert.assertEquals(1, session.advanced().getNumberOfRequests());

You can also include documents referenced by the dictionary keys:

Person person = session
    .include("Attributes.Keys")
    .load(Person.class, "people/1-A");

Here, as well, this can be written with fluent builder syntax:

Person person = session
    .load(Person.class, "people/1-A",
        i -> i.includeDocuments("Attributes.Keys"));
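The difference between the two dictionary paths comes down to which strings are treated as document IDs. The sketch below assumes, as the text above suggests, that a Values path reads the IDs from the dictionary values and a Keys path reads them from the dictionary keys. DictionaryIncludeSketch is a hypothetical helper invented for this illustration and is not part of the RavenDB client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper; it only illustrates which strings are treated as IDs.
final class DictionaryIncludeSketch {

    // "Attributes.Values": every dictionary value is a candidate document ID.
    static List<String> idsFromValues(Map<String, String> attributes) {
        return new ArrayList<>(attributes.values());
    }

    // "Attributes.Keys": every dictionary key is a candidate document ID.
    static List<String> idsFromKeys(Map<String, String> attributes) {
        return new ArrayList<>(attributes.keySet());
    }
}
```

With the sample data above, the values ("people/2", "people/3") are document IDs, while the keys ("Mother", "Father") are not, so under this assumption a Keys include is only useful when the keys themselves hold document IDs.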

Complex types

If the values in the dictionary are more complex, e.g.:

public class PersonWithAttribute {
    private String id;
    private String name;
    private Map<String, Attribute> attributes;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public Map<String, Attribute> getAttributes() {
        return attributes;
    }

    public void setAttributes(Map<String, Attribute> attributes) {
        this.attributes = attributes;
    }
}

public class Attribute {
    private String ref;

    public Attribute() {
    }

    public Attribute(String ref) {
        this.ref = ref;
    }

    public String getRef() {
        return ref;
    }

    public void setRef(String ref) {
        this.ref = ref;
    }
}

HashMap<String, Attribute> attributes = new HashMap<>();
attributes.put("Mother", new Attribute("people/2"));
attributes.put("Father", new Attribute("people/3"));

PersonWithAttribute person1 = new PersonWithAttribute();
person1.setId("people/1-A");
person1.setName("John Doe");
person1.setAttributes(attributes);

session.store(person1);

Person person2 = new Person();
person2.setId("people/2");
person2.setName("Helen Doe");
person2.setAttributes(Collections.emptyMap());

session.store(person2);

Person person3 = new Person();
person3.setId("people/3");
person3.setName("George Doe");
person3.setAttributes(Collections.emptyMap());

session.store(person3);

We can then do includes on a specific field of the dictionary values:

PersonWithAttribute person = session
    .include("Attributes[].Ref")
    .load(PersonWithAttribute.class, "people/1-A");

Person mother = session
    .load(Person.class, person.getAttributes().get("Mother").getRef());

Person father = session
    .load(Person.class, person.getAttributes().get("Father").getRef());

Assert.assertEquals(1, session.advanced().getNumberOfRequests());

Combining approaches

It is possible to combine the above techniques.
Using the DenormalizedCustomer from above, we can create an order class that uses it:

public class Order3 {
    private DenormalizedCustomer customer;
    private String[] supplierIds;
    private Referral referral;
    private LineItem[] lineItems;
    private double totalPrice;

    public DenormalizedCustomer getCustomer() {
        return customer;
    }

    public void setCustomer(DenormalizedCustomer customer) {
        this.customer = customer;
    }

    public String[] getSupplierIds() {
        return supplierIds;
    }

    public void setSupplierIds(String[] supplierIds) {
        this.supplierIds = supplierIds;
    }

    public Referral getReferral() {
        return referral;
    }

    public void setReferral(Referral referral) {
        this.referral = referral;
    }

    public LineItem[] getLineItems() {
        return lineItems;
    }

    public void setLineItems(LineItem[] lineItems) {
        this.lineItems = lineItems;
    }

    public double getTotalPrice() {
        return totalPrice;
    }

    public void setTotalPrice(double totalPrice) {
        this.totalPrice = totalPrice;
    }
}

We have the advantages of a denormalization, a quick and simple load of an Order, and the fairly static Customer details that are required for most processing. But we also have the ability to easily and efficiently load the full Customer object when necessary using:

Order3 order = session
    .include("Customer.Id")
    .load(Order3.class, "orders/1-A");

// this will not require querying the server!
Customer customer = session.load(Customer.class, order.getCustomer().getId());

This combining of denormalization and Includes could also be used with a list of denormalized objects.

It is also possible to use Include on a query that returns a projection. Includes are evaluated after the projection has been evaluated. This opens up the possibility of implementing Tertiary Includes (i.e. retrieving documents that are referenced by documents that are referenced by the root document).

RavenDB can support Tertiary Includes, but before resorting to them you should re-evaluate your document model. Needing Tertiary Includes can be an indication that you are designing your documents along "Relational" lines.

Summary

There are no strict rules as to when to use which approach, but the general idea is to give it a lot of thought and consider the implications each approach has.

As an example, in an e-commerce application it might be better to denormalize product names and prices into an order line object, since you want to make sure the customer sees the same price and product title in the order history. But the customer name and addresses should probably be referenced rather than denormalized into the order entity.
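The e-commerce example can be sketched as follows. CatalogProduct and OrderLineSnapshot are hypothetical classes invented for this illustration (they are not the LineItem class shown earlier). The order line keeps the product ID as a reference, while the name and price are copied at purchase time and never updated.

```java
import java.math.BigDecimal;

// Hypothetical classes for illustration only.
final class CatalogProduct {
    final String id;
    String name;
    BigDecimal price;

    CatalogProduct(String id, String name, BigDecimal price) {
        this.id = id;
        this.name = name;
        this.price = price;
    }
}

final class OrderLineSnapshot {
    final String productId;     // reference, still usable with an include
    final String productName;   // denormalized copy, frozen at purchase time
    final BigDecimal unitPrice; // denormalized copy, frozen at purchase time
    final int quantity;

    private OrderLineSnapshot(String productId, String productName,
                              BigDecimal unitPrice, int quantity) {
        this.productId = productId;
        this.productName = productName;
        this.unitPrice = unitPrice;
        this.quantity = quantity;
    }

    // Copies the fields the order history must keep stable.
    static OrderLineSnapshot of(CatalogProduct product, int quantity) {
        return new OrderLineSnapshot(product.id, product.name, product.price, quantity);
    }

    BigDecimal total() {
        return unitPrice.multiply(BigDecimal.valueOf(quantity));
    }
}
```

Later changes to the catalog product's name or price leave the snapshot untouched, which is the behavior wanted for order history.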

For most cases where denormalization is not an option, Includes are probably the answer.