Yet Another Bug Tracker: Article #3 Read more articles in this series ›

Dynamic Fields for Indexing

by Alex Klaus

The power of dynamic fields for indexing dictionaries and nested collections in RavenDB

 
 

The power of dynamic fields for indexing dictionaries and nested collections in RavenDB

Yet Another Bug Tracker (YABT) we are building in this series needs to be flexible and accommodate custom fields and structures that can be added by the end-user. The custom fields is a widely-adopted concept in bug-trackers to manage additional fields on the ticket per project, team, or ticket type. It exists in Jira, Azure DevOps, Bugzilla and others.

Let's follow the trend and add to the Backlog Item (aka "ticket") two related features:

  1. A container for custom fields – additional properties of various types.
  2. A list of all modifications to keep track of who/when modified the ticket.

Here is a mock-up of the Backlog Item screen:

Backlog Item screen
  • Custom fields are displayed on the right-hand side. Each of them is formatted according to its type.
  • The "Created" and "Last Updated" fields (below the custom fields) get resolved from the history of the ticket's modifications (with the rest of the history is available under the "View All"). The name is a link for navigation to the user's profile page.

1. NoSQL solution


At first glance, the task is trivial. Flexibility of the NoSQL does not require defining the structure upfront, so any fields can be easily added ad hoc. Though, the developers still need to be aware of the structure for two reasons:

  • To build the presentation layer (layout and style of the fields).
  • To filter and search on dynamically added fields (especially in a strongly-typed language).

1.1. DB design overview

We described the Data model in the previous article. Here is the relevant part of the diagram including CustomField, BacklogItem and User aggregates:

NoSQL Diagram

In JSON format a Backlog Item record would look like:

    
    {
        "Title": "Malfunction at the Springfield Nuclear Power Plant",
        "Description": "Some terrible details",
        "ModifiedBy": [{
                "Timestamp": "2020-01-01T00:00:00",
                "Summary": "Created",
                "ActionedBy": {
                    "Id": "users/1-A",
                    "Name": "Homer Simpson"
                }
            },
            {
                "Timestamp": "2020-01-02T00:00:00",
                "Summary": "Modified description",
                "ActionedBy": {
                    "Id": "users/2-A",
                    "Name": "Waylon Smithers"
                }
            }
        ],
        "CustomFields": {
            "CustomFields/1-A": "Mr. Burns",
            "CustomFields/2-A": 10000000000
        }
    }
    

Where sample records of custom fields would be:

    
    {
        "Id": "CustomFields/1-A",
        "Name": "Affected Persona",
        "Type": "Text"
    },
    {
        "Id": "CustomFields/2-A",
        "Name": "Potential Loss",
        "Type": "Currency"
    }
    

We are going to focus on two dynamic BacklogItem properties:

  • ModifiedBy – change history, a collection of items with a predefined structure;
  • CustomFields – container for custom fields, a more complex structure represented by a dictionary, where each value has variable structure (BTW, any JSON structure can be presented as a dictionary).

Taking it one level up from JSON to C# we get RavenDB models for persisting in the database:

    
    // The Backlog Item aggregate persisted in the DB
    public class BacklogItem
    {
        public string Id    { get; private set; }
        public string Title { get; set; }

        // List of all users who/when modified the ticket.
        public IList<ChangedByUserReference> ModifiedBy { get; } = new List<ChangedByUserReference>();

        // Resolve Who/when created & updated the ticket, no need to persist it in the DB
        [JsonIgnore]
        public ChangedByUserReference Created     => ModifiedBy?.OrderBy(m => m.Timestamp).FirstOrDefault();
        [JsonIgnore]
        public ChangedByUserReference LastUpdated => ModifiedBy?.OrderBy(m => m.Timestamp).LastOrDefault();

        // Custom properties of various data types. Stored as { custom field ID, value }
        public IDictionary<string, object> CustomFields { get; set; }
    }

    public class ChangedByUserReference
    {
        // Timestamp of the change
        public DateTime Timestamp { get; set; }
        // Brief summary of the change
        public string Summary { get; set; }
        // The user who made the change
        public UserReference ActionedBy { get; set; }
    }

    public class UserReference
    {
        string Id	{ get; set; }
        string Name	{ get; set; }
    }
    

and a separate collection of Custom Fields would be:

    
    // The Custom Field entity persisted in the DB
    public class CustomField
    {
        public string Id { get; set; }
        public string Name { get; set; }
        // Type of the custom field determines how to process the associated value
        public CustomFieldType FieldType { get; set; }

        public enum CustomFieldType { Text, Date, Currency }
    }
    

1.2. Design justification

Design of a de-normalised database always depends on the main use-cases of your application. Therefore, the structures provided above ought to be justified.

References to other entities

While both ModifiedBy and CustomFields refer to other entities, they do it differently.

Instead of just referencing user IDs, ActionedBy property has a bit richer structure – UserReference class. It's done to keep often used user's properties handy and avoid excessive JOINs with the User collection for presenting auxiliary data requested along with the backlog items. For example, a list of backlog items will show names of users who created/edited them:

Backlog Item list

But convenience of receiving user names along with backlog items comes with some maintenance responsibility – keeping user names in references in sync with corresponding records in the Users collection. It is a trade-off that will be discussed in a separate article.

Alternatively, we can go with a traditional approach of using the ID of the referred record and resolving other properties in runtime when querying. The CustomFields property is a good fit for that as custom fields are unlikely to be queried along with multiple backlog items (as shown on the mock-up above), so resolving them via a separate request or a JOIN would not put too much stress on the DB.

ModifiedBy: IList vs IDictionary

ModifiedBy could have been presented in C# as a dictionary with a timestamp as the key (e.g. IDictionary<DateTime, ChangedByUserReference>). If you are confident that 2+ events will never occur at the exact same time, then go for it. Note, that the user ID could not be used as the key, as one user may make multiple changes and the list should reflect the whole history. Overall, I would rather have ModifiedBy as a list.

CustomFields on the other side is the perfect candidate for a dictionary if we decide not to use one custom field twice in the ticket.

OK, it was a diversion. I hope, getting values from the Custom Fields along with the created/updated is straightforward (see YABT source code for examples). So, we resolved the first issue and can build the presentation layer for a known data structure. But what about filtering on those fields?

2. Filtering on sub-attributes... and sub- sub- attributes


There are two interesting cases requiring filtering the Backlog items on ModifiedBy and CustomFields properties:

  1. Getting "My recent tickets", tickets that were recently edited by the current user.
  2. Filtering on certain values of various custom fields. E.g. for text fields where the "Affected Persona" contains word "Burns" or the "Potential Loss" is more than $1,000.

2.1. Filter for "My recent tickets"

To filter a collection we need an index and the Dynamic fields feature comes in handy. It creates key-value pairs in the index terms where the keys are resolved runtime when updating the index.

A generic example of Dynamic fields in index would look like:

    
    Map = tickets =>
        from ticket in tickets
        select new
        {
            _ = ticket.ModifiedBy.Select(x => CreateField(x.ActionedBy.Id, x.Timestamp))
        };
    

It will create index terms for the above example reflecting changes by each user:

  • 'users/1-A': { 2020-01-01T00:00:00 }
  • 'users/2-A': { 2020-01-02T00:00:00 }

Looks good, but not good enough as:

  1. One user can modify one ticket multiple times, but to get the timestamp for the very last modification we need to group by ActionedBy.Id.
  2. Filtering on the generated terms from a strongly-typed language requires a generic structure for using in the queries.

Battling the first problem is easy but figuring out a solution for the second one would require deep knowledge of underlining structures.

If you need details, the discussion is here. The gist – forming the keys for CreateField() in a special format will allow querying on the terms in C# as it was a dictionary. Bear with me.

For index created with syntax CreateField("Bla_" + k.Key, k.Value) and k.Key='Key' you would query:

  • in RQL as from index 'Y' where Bla_Key = 4 (looks quite obvious),
  • in C# as s.Query<X,Y>().Where(p => p.Bla["Key"].Equals(4)) (here is your dictionary).

RavenDB below v5 had two constraints on the Dynamic field key:

  • the key cannot start with a digit, e.g. 1-A (issue #15234);
  • the key cannot contain the slash (/) symbol (issue #15235).

Fortunately, both have been fixed in RavenDB v5.0.

If you are on an older version, then to work around those constraints, we set the key as CreateField("Bla_" + k.Key!.Replace("/",""), k.Value). Then for k.Key == 'users/1-A' you would query s.Query<X,Y>().Where(p => p.Bla["users1-A"].Equals(4)).

Here is a full example:

    
    public class BacklogItems_ForList : AbstractIndexCreationTask<BacklogItem>
    {
        public class Result
        {
            public IDictionary<string, DateTime> ModifiedByUser { get; set; }
        }
        public BacklogItems_ForList()
        {
            Map = tickets =>
                from ticket in tickets
                select new
                {
                    _ = ticket.ModifiedBy.GroupBy(m => m.ActionedBy.Id)		// Grouping by user
                                        .Select(x => CreateField($"{nameof(Result.ModifiedByUser)}_{x.Key!.Replace("/","").ToLower()}",
                                                                x.Max(o => o.Timestamp)
                                                                )
                                                )
                };
        }
    }
    

Now we can get all the tickets modified by a user ID and sort in descending order by the timestamp of the last change by the current user via:

    
    var userKey = userId.Replace("/", "").ToLower();  // For 'users/1-A' get 'users1-A'
    s.Query<Result, BacklogItems_ForList>()
     .Where(t => t.ModifiedByUser[userKey] > DateTime.MinValue)
     .OrderByDescending(t => t.ModifiedByUser[userKey])
    

Note: here we take string concatenation for the key outside of the query, it'd fail otherwise.

2.2. Filter by custom fields

We have already covered all the gotchas so it should be a smooth ride for filtering on custom fields. A simple case looks pretty much the same as for ModifiedBy:

    
    __ = ticket.CustomFields.Select(x => CreateField($"{nameof(Result.CustomFields)}_{x.Key.Replace("/","").ToLower()}", x.Value))
    

But here it all boils down to the supported data types of the custom fields. And we can knock ourselves out:

  • can search in text fields;
  • check for equality in numeric fields;
  • filter on sub-attributes for complex structures (e.g. on ID for UserReference when referencing to other users).

So the index would require resolving FieldType of the custom field and running many if/else conditions for each type:

    
    public class BacklogItems_ForList : AbstractIndexCreationTask<BacklogItem>
    {
        public class Result
        {
            public IDictionary<string, string> CustomFields { get; set; }
        }
        public BacklogItems_ForList()
        {
            Map = tickets =>
                from ticket in tickets
                select new
                {
                    __ = from x in ticket.CustomFields
                            let fieldType = LoadDocument<CustomField.CustomField>(x.Key).FieldType
                            let key = $"{nameof(Result.CustomFields)}_{x.Key.Replace("/", "").ToLower()}"
                            select 
                                (fieldType == CustomFieldType.Text)
                                    // search in text Custom Fields
                                    ? CreateField(key, x.Value, false, true)	
                                    : (fieldType == CustomFieldType.UserReference)
                                        // Exact match of User ID
                                        ? CreateField(key, x.Value.Id)
                                        // Other Custom Fields (e.g. numbers, dates) can use a '≥' comparison
                                        : CreateField(key, x.Value)
                };
        }
    }
    

And here how we can query against that index:

  1. When the Custom Field in question has type Text
            
        "CustomFields": {
            "CustomFields/1-A": "Mr. Burns"
        }
        
    we can search text:
            
        var fieldKey = customFieldId.Replace("/", "").ToLower();  // For 'CustomFields/1-A' get 'customfields1-A'
        s.Query<Result, BacklogItems_ForList>()
        .Search(t => t.CustomFields[fieldKey], "Burns")
        
  2. When the Custom Field in question is a Number:
            
        "CustomFields": {
            "CustomFields/2-A": 10000000000
        }
        
    we can use the '>' operator:
            
        var fieldKey = customFieldId.Replace("/", "").ToLower();  // For 'CustomFields/2-A' get 'customfields2-A'
        s.Query<Result, BacklogItems_ForList>()
        .Where(t => t.CustomFields[fieldKey] > 1_000_000)
        
    A similar approach would be used for querying on user IDs.

3. RavenDB Studio Tools


Dynamic fields lack of transparency. To have a look under the hood (e.g. see the Index Terms) use the RavenDB Studio.

If you open the Index Terms for an index when running one of the YABT tests covering the scenarios described above (e.g. this one in the YABT repo), you would see terms like:

RavenDB Studio Index Terms

That screenshot shows values for Custom Fields and user's modifications you can query on.

That's it. Happy filtering.

Check out the full source code at our repository on GitHub - github.com/ravendb/samples-yabt and let us know what you think. Stay tuned for the next articles in the YABT series.

Read more articles in this series


Categories:

NoSQL Database Demo

Watch
Live Demo

A customized
presentation of RavenDB