Dynamic fields

Introduction

RavenDB, as a schemaless database, gives you flexibility in how you persist data, allowing the system to evolve freely over time, and this flexibility is not restricted to documents. Static indexes (written in C# or JavaScript) let you produce specialized data structures for efficient querying. At first glance, it might appear that static indexes don’t offer the same flexibility in defining fields for querying, as they require static, compile-time binding of each index field to one or more document properties. Adding a field means changing the definition and rebuilding the index, which can be time-consuming for large datasets.

Fortunately, it only seems that way.

Examples of use cases

To make our examples more practical and understandable, let’s assume we are developing an e-commerce site.

Product attributes

Your e-commerce site has a product catalog. Some attributes, such as `Name` or `Price`, are common to all products, but a product can also have distinguishing properties like `Color`. It makes sense to filter T-shirts by color, but does it make sense to filter software by color? When building a catalog, it is hard to predict what we will want to filter on in the future, and we do not want to edit the index definition each time something new appears.

Normally, the definition of the ProductSearch index would probably look like this:

from doc in docs.Products
  select new
  {
      Name = doc.Name,
      Price = doc.Price,
  }

Alternatively, in JavaScript:

map("Products", (product) => {
      return {
          Name: product.Name,
          Price: product.Price
      };
  })

When a new property appears, we have to add the field to the definition explicitly, which isn’t very flexible. With a little help from the dynamic fields feature, we can make our index flexible.

Depending on the document structure, we have a couple of options. Let’s assume we have a document in a format like this:

{
      "Name": "Unique T-Shirt",
      "Price": 20,
      "Attributes": {
          "Color": "Black"
      }
}

All product-specific properties are grouped under `Attributes`, which is the most convenient case, so let’s change our index definition:

from doc in docs.Products
  select new
  {
      Name = doc.Name,
      Price = doc.Price,
      _ = doc.Attributes.Select(attribute => CreateField(attribute.Key, attribute.Value))
  }

Alternatively, in JavaScript:

map("Products", (product) => {
      return {
          Name: product.Name,
          Price: product.Price,
          _: Object.entries(product.Attributes).map(([key, value]) => createField(key, value, {
              indexing: 'Default',
              storage: false,
              termVector: null
          }))
      };
  })

The `CreateField` method adds a field named `Color` to the document’s index entry and writes `Black` as its value.

When we look at the raw index entry (instructions for retrieving raw entries are available in the docs), our document has the following fields:

{
      "Color": "black",
      "Name": "unique t-shirt",
      "Price": "20",
      "id()": "products/1-a"
}

This gives the possibility to query by color even though we do not have any explicit `Color` field in our index definition:

from index 'ProductSearch' where Color == 'Black'

Alternatively, with the C# client API:

  session.Query<ProductIndexResult, ProductSearch>()
  .Where(x => x.Color == "Black")
  .ProjectInto<Product>()
  .ToList();

Remember, no term is written under the `Color` field for documents that don’t have a `Color` attribute.
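
To make the effect of `CreateField` more tangible, here is a small, self-contained JavaScript sketch that can be run in Node.js. It is not RavenDB code: `createField`, `mapProduct`, and `toIndexEntry` are hypothetical stand-ins that only model how dynamic fields end up as ordinary fields in an index entry, and lower-casing every term is a simplification meant to mirror the raw entry shown above. The second product is invented for illustration.

```javascript
// Hypothetical stand-in for RavenDB's createField: it only records a name/value pair.
function createField(name, value, options) {
    return { $name: name, $value: value, $options: options };
}

// The map function from the article, written as a plain function.
function mapProduct(product) {
    return {
        Name: product.Name,
        Price: product.Price,
        _: Object.entries(product.Attributes || {}).map(([key, value]) =>
            createField(key, value, { indexing: 'Default', storage: false, termVector: null }))
    };
}

// Simplified model of an index entry: dynamic fields listed under "_" become
// top-level fields, and every value is kept as a lower-cased string term.
function toIndexEntry(mapped) {
    const entry = {};
    for (const [field, value] of Object.entries(mapped)) {
        if (field === '_') {
            for (const dynamic of value) {
                entry[dynamic.$name] = String(dynamic.$value).toLowerCase();
            }
        } else {
            entry[field] = String(value).toLowerCase();
        }
    }
    return entry;
}

const products = [
    { Name: 'Unique T-Shirt', Price: 20, Attributes: { Color: 'Black' } },
    { Name: 'Photo Editor', Price: 50, Attributes: { License: 'Perpetual' } } // no Color
];

const entries = products.map(p => toIndexEntry(mapProduct(p)));

// Equivalent in spirit to: from index 'ProductSearch' where Color == 'Black'
const matches = entries.filter(e => e.Color === 'black');

console.log(entries);
console.log(matches.map(e => e.Name));
```

In this model, the entry for the second product has a `License` field but no `Color` field at all, so a filter on `Color` can never match it.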

Your documents can also be structured quite differently: for example, the properties may be stored at the root of the document (like `Name`) rather than grouped in a dictionary. What can you do in such a case? You can always index all fields of the document, e.g.:

from doc in docs.Products
  select new 
  {
      _ = AsJson(doc).Select(x => CreateField(x.Key, x.Value))
  }

Alternatively, in JavaScript:

map('Products', function(product) {
      return {
          _: Object.keys(product).map(key => createField(key, product[key], {
              indexing: 'Default',
              storage: false,
              termVector: null
          }))
      };
  })

Let’s analyze the example above. RavenDB offers the `AsJson()` indexing method, which returns a document as a dictionary object. It gives a lot of flexibility in creating unbounded indexes, but such usage should be carefully considered. The great advantage of this approach is a schemaless index: new fields added to documents become queryable automatically. On the other hand, you will probably index fields you don’t need, which increases disk usage and degrades indexing performance when your documents are big.

Secondly, when a property holds a complex value (e.g., a nested object), the indexed term will be its JSON representation, which is hard to query because you would have to match the JSON text.
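
The following Node.js sketch illustrates why matching a JSON term is awkward. It is only a model: `toTerm` is a hypothetical helper that uses `JSON.stringify` as a stand-in for whatever serialization the index applies, and the `Dimensions` property is invented for the example.

```javascript
// Hypothetical model: a complex value is turned into a single JSON string term.
function toTerm(value) {
    return (value !== null && typeof value === 'object')
        ? JSON.stringify(value)
        : String(value);
}

const shirt = {
    Name: 'Unique T-Shirt',
    Dimensions: { Width: 50, Height: 70 } // hypothetical complex property
};

// One term per root-level property, as in the "index everything" example.
const terms = Object.fromEntries(
    Object.keys(shirt).map(key => [key, toTerm(shirt[key])])
);

console.log(terms.Dimensions); // {"Width":50,"Height":70}

// The same data written with a different key order produces a different string,
// so an equality match on the term fails.
const sameDataDifferentOrder = toTerm({ Height: 70, Width: 50 });
console.log(terms.Dimensions === sameDataDifferentOrder); // false
```

A query would have to reproduce the serialized text exactly, which is rarely what you want when filtering by a single nested value.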

In such scenarios, it is simply better to keep, in the document itself, a field listing the names of the properties that should be additionally indexed:

{
      "Name": "Unique T-Shirt",
      "Price": 20,
      "Color": "Black",
      "ForSearch": ["Color"]
}

The index definition then becomes:

from doc in docs.Products
  select new 
  {
      Name = doc.Name,
      Price = doc.Price,
      _ = AsJson(doc).Where(x => doc.ForSearch?.Contains(x.Key) ?? false)
          .Select(field => CreateField(field.Key, field.Value))
  }

Alternatively, in JavaScript:

map("Products", (product) => {
      return {
          Name: product.Name,
          Price: product.Price,
          // ForSearch lists root-level property names, so the values are read from the document root
          _: !Array.isArray(product.ForSearch)
              ? null
              : product.ForSearch.filter(key => product.hasOwnProperty(key)).map(key => createField(key, product[key], {
                  indexing: 'Default',
                  storage: false,
                  termVector: null
              }))
      };
  })

In such a scenario you control what you index per document and avoid unwanted fields.
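
The selection logic can be tried outside the database. The sketch below is a plain Node.js function, not an index definition: `searchableFields` is a hypothetical helper that returns the name/value pairs that would be handed to `createField`, and the `Size` and `Weight` names are invented for the example.

```javascript
// Returns the [name, value] pairs that would be passed on to createField.
function searchableFields(product) {
    if (!Array.isArray(product.ForSearch)) {
        return []; // nothing is whitelisted for this document
    }
    return product.ForSearch
        // Skip names that are listed but not present on the document.
        .filter(key => Object.prototype.hasOwnProperty.call(product, key))
        .map(key => [key, product[key]]);
}

const listed = searchableFields({
    Name: 'Unique T-Shirt',
    Price: 20,
    Color: 'Black',
    Size: 'L',                       // present, but not listed in ForSearch
    ForSearch: ['Color', 'Weight']   // Weight is listed, but missing
});

const unlisted = searchableFields({ Name: 'Photo Editor', Price: 50 });

console.log(listed);   // [ [ 'Color', 'Black' ] ]
console.log(unlisted); // []
```

Only `Color` is selected: `Size` is never indexed because the document does not ask for it, and `Weight` is ignored because the document has no such property.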

Configure field options

The `CreateField` method also lets you configure field options (indexing, storage, and term vector, as in the examples above). Learn more in the docs.

Dynamic fields: things to be aware of

Dynamic fields make our index flexible, but we should be careful about what and how we index. Let’s discuss some scenarios you might run into.

Duplicated names

Let’s take an index definition like this:

from doc in docs.Products
  select new
  {
      CreationDate = doc.CreationDate,
      _ = doc.Attributes.Select(x => CreateField(x.Key, x.Value))
  }

Alternatively, in JavaScript:

map("Products", (product) => {
      return {
          CreationDate: product.CreationDate,
          _: Object.entries(product.Attributes).map(([key, value]) => createField(key, value, {
              indexing: 'Default',
              storage: false,
              termVector: null
          }))
      };
  })

Our document has two properties with the same name, one of them nested:

{
      [...]
      "CreationDate": "2022-02-01",
      "Attributes": {
          "CreationDate": "2023-01-01"
      }
      [...]
}

Situations like this can happen. For example, the `CreationDate` at the root of the document can mean the date the product was added to our catalog, while the `CreationDate` in `Attributes` can mean the date the product itself was created.

After indexing, we can query our document like this:

from index 'Index' where CreationDate == "2022-02-01"
  or
  from index 'Index' where CreationDate == "2023-01-01"

Both queries match our document, which can lead to incorrect results, because the two terms have different meanings.

Please be careful when creating dynamic fields without a unique naming convention (e.g., a prefix or suffix), as this may inadvertently push additional terms into fields where we do not want them.
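
The following Node.js sketch models the collision and one way to avoid it. It is an illustration only: `addTerm` and `buildEntry` are hypothetical helpers that mimic how several terms can end up under one field name, and the `attr_` prefix is just an example of a naming convention, not something RavenDB prescribes.

```javascript
// Collects terms per field name; one field can hold several terms.
function addTerm(entry, field, value) {
    (entry[field] = entry[field] || []).push(String(value));
}

// Models the index from this section: one explicit field plus one
// dynamic field per attribute, optionally with a name prefix.
function buildEntry(product, prefix) {
    const entry = {};
    addTerm(entry, 'CreationDate', product.CreationDate);
    for (const [key, value] of Object.entries(product.Attributes)) {
        addTerm(entry, prefix + key, value);
    }
    return entry;
}

const catalogItem = {
    CreationDate: '2022-02-01',
    Attributes: { CreationDate: '2023-01-01' }
};

const colliding = buildEntry(catalogItem, '');
const prefixed = buildEntry(catalogItem, 'attr_');

console.log(colliding); // { CreationDate: [ '2022-02-01', '2023-01-01' ] }
console.log(prefixed);  // { CreationDate: [ '2022-02-01' ], attr_CreationDate: [ '2023-01-01' ] }
```

With a prefix in place, a query on `CreationDate` would only see the catalog date, and the attribute would have to be queried explicitly, for example with `where attr_CreationDate == "2023-01-01"`.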

Creating a different projection

The example above shows how dangerous duplicated names can be, but there is another scenario. RavenDB lets us store data in an index, either for performance reasons or to keep data transformed by the index for projection purposes. When performing a projection, RavenDB normally has to retrieve documents from disk storage. However, when we know that a document is big and that we need only some of its properties, we can store those properties in the index and avoid loading whole documents from storage, which can significantly reduce query time. Learn more about storing fields in the documentation.

With a dynamic field, we can play a trick and store a different value than the one we index.

Let’s take a document like this:

{
      "Name": "ProductOne",
      "OrderedAt": "2023-03-01T12:00:00.0000000"
}

We have a page where we show our salespeople all orders with their dates. Of course, we would not show the date as a raw string but nicely formatted.

We could do this in our application, or we could have all the data already prepared and just stream the model from our index.

from order in docs.Orders
  select new
  {
      Name = order.Name,
      OrderedAt = order.OrderedAt, // this would index our date as ticks
      // Same field name, but stored only (not indexed): the formatted date used by projections
      _ = CreateField("OrderedAt",
          order.OrderedAt.ToString("D", CultureInfo.GetCultureInfo("pl-PL").DateTimeFormat),
          new CreateFieldOptions
          {
              Indexing = FieldIndexing.No,
              Storage = FieldStorage.Yes,
              TermVector = FieldTermVector.No
          })
  }

We indexed our date as ticks with the explicit field (`OrderedAt`) and inserted a custom projection (as a stored field) via a dynamic field.

Now we could query like this:

from index 'Index' where OrderedAt between "2023-02-01T12:00:00.0000000" and "2023-04-01T12:00:00.0000000"
      select Name, OrderedAt

and it will return JSON like this:

{
      "Name": "ProductOne",
      "OrderedAt": "środa, 1 marca 2023"
}

Please be aware that this technique is not recommended in most cases; it is shown here to raise awareness of the implications of reusing a field name.
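
To see why the same name can behave differently for filtering and for projection, consider this toy model in Node.js. It is not how RavenDB is implemented: `buildOrderEntry`, `project`, and `parseTimestamp` are hypothetical helpers, the split into `terms` and `stored` is an assumption made for illustration, and the exact formatted text depends on the locale data available to the runtime.

```javascript
// Keep millisecond precision only, so the text is a date format JavaScript parses reliably.
function parseTimestamp(text) {
    return new Date(text.slice(0, 23)).getTime();
}

// Toy index entry: "terms" are used for filtering, "stored" is used for projections.
function buildOrderEntry(order) {
    const formatted = new Date(parseTimestamp(order.OrderedAt)).toLocaleDateString('pl-PL', {
        weekday: 'long', day: 'numeric', month: 'long', year: 'numeric'
    });
    return {
        terms: { OrderedAt: parseTimestamp(order.OrderedAt) }, // explicit field
        stored: { OrderedAt: formatted },                      // dynamic, stored-only field
        source: order
    };
}

// A projection prefers the stored value and falls back to the document.
function project(entry, fields) {
    const result = {};
    for (const field of fields) {
        result[field] = field in entry.stored ? entry.stored[field] : entry.source[field];
    }
    return result;
}

const orderEntries = [
    buildOrderEntry({ Name: 'ProductOne', OrderedAt: '2023-03-01T12:00:00.0000000' })
];

const from = parseTimestamp('2023-02-01T12:00:00.0000000');
const to = parseTimestamp('2023-04-01T12:00:00.0000000');

// Filtering uses the indexed term; the result shows the stored value.
const hits = orderEntries.filter(e => e.terms.OrderedAt >= from && e.terms.OrderedAt <= to);
const projected = hits.map(e => project(e, ['Name', 'OrderedAt']));

console.log(projected);
```

The range filter works on the numeric term, while the projected `OrderedAt` is the formatted string, so the value a client receives cannot be used as-is in the next query.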

 

Summary

This article presented some non-trivial uses of dynamic fields, which can speed up development and reduce the number of index definition changes needed as your data evolves. Our documentation page on dynamic fields is a great source of further knowledge about what they can do.
