Indexing Hierarchical Data



Hierarchical data

One significant advantage of document databases is that they impose few limits on how data is structured. Hierarchical data structures illustrate this well. Consider, for example, the familiar comment thread, implemented using objects such as:

class BlogPost {
    constructor(title, author, text, comments) {
        this.title = title;
        this.author = author;
        this.text = text;

        // Blog post readers can leave comments
        this.comments = comments;
    }
}

class BlogPostComment {
    constructor(author, text, comments) {
        this.author = author;
        this.text = text;

        // Allow nested comments, enabling replies to existing comments
        this.comments = comments;
    }
}

Readers of a post created using the above BlogPost structure can add BlogPostComment entries to the post's comments field, and readers of these comments can reply with comments of their own, creating a recursive hierarchical structure.

For example, the following document, BlogPosts/1-A, represents a blog post by John that contains multiple layers of comments from various authors.

BlogPosts/1-A:

{
    "author": "John",
    "title": "Post title..",
    "text": "Post text..",
    "comments": [
        {
            "author": "Moon",
            "text": "Comment text..",
            "comments": [
                {
                    "author": "Bob",
                    "text": "Comment text.."
                },
                {
                    "author": "Adel",
                    "text": "Comment text..",
                    "comments": [
                        {
                            "author": "Moon",
                            "text": "Comment text.."
                        }
                    ]
                }
            ]
        }
    ],
    "@metadata": {
        "@collection": "BlogPosts"
    }
}
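
As a rough illustration, a thread like the one above could be assembled in application code by nesting BlogPostComment instances. The sketch repeats the two classes so that it runs on its own; the default empty arrays for comments are an assumption added here, and storing the object through a RavenDB session is left out.

```javascript
class BlogPost {
    constructor(title, author, text, comments = []) {
        this.title = title;
        this.author = author;
        this.text = text;
        this.comments = comments;
    }
}

class BlogPostComment {
    constructor(author, text, comments = []) {
        this.author = author;
        this.text = text;
        this.comments = comments;
    }
}

// Build the thread from the innermost reply outward
const moonReply = new BlogPostComment("Moon", "Comment text..");
const adelComment = new BlogPostComment("Adel", "Comment text..", [moonReply]);
const bobComment = new BlogPostComment("Bob", "Comment text..");
const moonComment = new BlogPostComment("Moon", "Comment text..", [bobComment, adelComment]);

const post = new BlogPost("Post title..", "John", "Post text..", [moonComment]);

// Print the nested structure as JSON
console.log(JSON.stringify(post, null, 4));
```

Each reply is simply another BlogPostComment placed in its parent's comments array, so the nesting can go as deep as the conversation does.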

Index hierarchical data

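To index the elements of a hierarchical structure like the one above, use RavenDB's Recurse method. Recurse follows the property you specify through every nesting level and returns the items it visits as a flat collection that the index can map over.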

The sample index below shows how to use Recurse to traverse the comments in the post thread and index them by their authors. We can then query the index for all blog posts that contain comments by specific authors.

class BlogPosts_ByCommentAuthor extends AbstractCsharpIndexCreationTask {
    constructor() {
        super();

        this.map = `
            docs.BlogPosts.Select(post => new { 
                authors = this.Recurse(post, x => x.comments).Select(x0 => x0.author)
            })`;
    }
}

The same index can also be defined with an IndexDefinition and deployed with PutIndexesOperation:

const indexDefinition = new IndexDefinition();

indexDefinition.name = "BlogPosts/ByCommentAuthor";
indexDefinition.maps = new Set([
    `from blogpost in docs.BlogPosts
     let authors = Recurse(blogpost, (Func<dynamic, dynamic>)(x => x.comments))
     let authorNames = authors.Select(x => x.author)
     select new
     {
         authors = authorNames
     }`
]);

await store.maintenance.send(new PutIndexesOperation(indexDefinition));
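
To make the traversal easier to picture, the following plain JavaScript sketch approximates what the map computes for a single document. It is an illustration only, not RavenDB's implementation: collectCommentAuthors is a hypothetical helper, and details of the server-side Recurse (such as ordering, or whether the root document itself is included) may differ.

```javascript
// Hypothetical helper, not part of the RavenDB client API.
// Walks the nested 'comments' property depth-first and returns
// the author of every comment found below the given node.
function collectCommentAuthors(node) {
    // Accept a missing value, a single object, or an array of comments
    const children = node.comments == null ? [] : [].concat(node.comments);

    const authors = [];
    for (const child of children) {
        authors.push(child.author);
        authors.push(...collectCommentAuthors(child));
    }
    return authors;
}

// Trimmed-down version of the sample document BlogPosts/1-A
const samplePost = {
    author: "John",
    comments: [
        {
            author: "Moon",
            comments: [
                { author: "Bob" },
                { author: "Adel", comments: [{ author: "Moon" }] }
            ]
        }
    ]
};

const commentAuthors = collectCommentAuthors(samplePost);
console.log(commentAuthors); // [ 'Moon', 'Bob', 'Adel', 'Moon' ]

// A post matches a query for "Moon" when any collected value equals "Moon"
console.log(commentAuthors.includes("Moon")); // true
```

The point of the flattening is that every comment author, at any depth, ends up as a value of the single authors index field, which is what lets one simple equality query find the post.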

Query the index

The index can be queried for all blog posts that contain comments made by specific authors.

Query the index using code:

const results = await session
    .query({ indexName: "BlogPosts/ByCommentAuthor" })
    // Query for all blog posts that contain comments by 'Moon':
    .whereEquals("authors", "Moon")
    .all();

The equivalent RQL:

from index "BlogPosts/ByCommentAuthor"
where authors == "Moon"

Query the index using the Studio:

  • Query the index from the Studio's List of Indexes view:

    [Image: List of Indexes view]

  • View the query results in the Query view:

    [Image: Query view]

  • View the list of terms indexed by the Recurse method:

    [Image: Click to view index terms]

    [Image: Index terms]