Indexing Hierarchical Data



Hierarchical data

One significant advantage of document databases is that they impose few limits on how data is structured. Hierarchical data structures illustrate this well. Consider, for example, the commonly used comment thread, implemented using objects such as:

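# Postponed annotation evaluation lets BlogPost refer to BlogPostComment before it is defined
from __future__ import annotations

from typing import List


class BlogPost: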
    def __init__(self, author: str = None, title: str = None, text: str = None, comments: List[BlogPostComment] = None):
        self.author = author
        self.title = title
        self.text = text
        
        # Blog post readers can leave comments
        self.comments = comments


class BlogPostComment:
    def __init__(self, author: str = None, text: str = None, comments: List[BlogPostComment] = None):
        self.author = author
        self.text = text

        # Allow nested comments, enabling replies to existing comments
        self.comments = comments

Readers of a post created using the above BlogPost structure can add BlogPostComment entries to the post's comments field, and readers of these comments can reply with comments of their own, creating a recursive hierarchical structure.
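As a minimal sketch of how such a thread takes shape, the snippet below builds a post with one comment and one reply. It uses simplified dataclass stand-ins (`Post` and `Comment`, hypothetical names chosen for illustration) rather than the classes above, so that it runs on its own without a database.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List


@dataclass
class Comment:
    # Simplified stand-in for BlogPostComment (hypothetical, for illustration)
    author: str
    text: str
    comments: List[Comment] = field(default_factory=list)


@dataclass
class Post:
    # Simplified stand-in for BlogPost (hypothetical, for illustration)
    author: str
    title: str
    text: str
    comments: List[Comment] = field(default_factory=list)


post = Post(author="John", title="Post title..", text="Post text..")

# A reader comments on the post
moon = Comment(author="Moon", text="Comment text..")
post.comments.append(moon)

# Another reader replies to that comment, adding a second level
moon.comments.append(Comment(author="Bob", text="Comment text.."))

print(post.comments[0].comments[0].author)  # Bob
```

Each reply is stored inside the comment it answers, so the depth of the structure grows with the depth of the conversation.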

For example, the following document, BlogPosts/1-A, represents a blog post by John that contains multiple layers of comments from various authors.

BlogPosts/1-A:

{
    "Author": "John",
    "Title": "Post title..",
    "Text": "Post text..",
    "Comments": [
        {
            "Author": "Moon",
            "Text": "Comment text..",
            "Comments": [
                {
                    "Author": "Bob",
                    "Text": "Comment text.."
                },
                {
                    "Author": "Adel",
                    "Text": "Comment text..",
                    "Comments": [
                        {
                            "Author": "Moon",
                            "Text": "Comment text.."
                        }
                    ]
                }
            ]
        }
    ],
    "@metadata": {
        "@collection": "BlogPosts"
    }
}

Index hierarchical data

To index the elements of a hierarchical structure like the one above, use RavenDB's Recurse method.

The sample index below shows how to use Recurse to traverse the comments in the post thread and index them by their authors. We can then query the index for all blog posts that contain comments by specific authors.

Define the index using a class:

from typing import List


class BlogPosts_ByCommentAuthor(AbstractIndexCreationTask):
    class Result:
        def __init__(self, authors: List[str] = None):
            self.authors = authors

    def __init__(self):
        super().__init__()
        # Recurse walks the nested comments; the lowercase field names
        # match the attributes of the Python classes above
        self.map = (
            "from blogpost in docs.BlogPosts "
            "let authors = Recurse(blogpost, x => x.comments) "
            "select new { authors = authors.Select(x => x.author) }"
        )

Define the index using an operation:

store.maintenance.send(
    PutIndexesOperation(
        IndexDefinition(
            name="BlogPosts/ByCommentAuthor",
            maps={
                """from blogpost in docs.BlogPosts
from comment in Recurse(blogpost, (Func<dynamic, dynamic>)(x => x.comments))
select new
{
    authors = comment.author
}"""
            },
        )
    )
)
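To make the traversal concrete, here is a plain-Python sketch of the kind of walk the index performs over the nested comments. It is an illustration only, not RavenDB's implementation: the helper `iter_comments` is a hypothetical name, it runs on an in-memory copy of BlogPosts/1-A, and it assumes that nested replies are always stored as a list.

```python
from typing import Any, Dict, Iterator


def iter_comments(node: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield every comment nested under node, depth-first (hypothetical helper)."""
    for comment in node.get("Comments") or []:
        yield comment
        # Descend into the replies to this comment
        yield from iter_comments(comment)


# In-memory copy of the BlogPosts/1-A document
post = {
    "Author": "John",
    "Title": "Post title..",
    "Text": "Post text..",
    "Comments": [
        {
            "Author": "Moon",
            "Text": "Comment text..",
            "Comments": [
                {"Author": "Bob", "Text": "Comment text.."},
                {
                    "Author": "Adel",
                    "Text": "Comment text..",
                    "Comments": [{"Author": "Moon", "Text": "Comment text.."}],
                },
            ],
        }
    ],
}

authors = [comment["Author"] for comment in iter_comments(post)]
print(authors)  # ['Moon', 'Bob', 'Adel', 'Moon']
```

The sketch collects comment authors at every depth, which is the information the sample index exposes for querying.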

Query the index

The index can be queried for all blog posts that contain comments made by specific authors.

Query the index using code:

results = list(
    session.query_index_type(BlogPosts_ByCommentAuthor, BlogPosts_ByCommentAuthor.Result).where_equals(
        "authors", "Moon"
    )
)

Query the index using RQL:

from index "BlogPosts/ByCommentAuthor"
where authors == "Moon"
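Conceptually, an index lets this query be answered with a lookup rather than a scan of every thread. The sketch below models that idea with an in-memory dictionary from comment author to document IDs. The names `collect_authors` and `build_author_index`, the second document BlogPosts/2-A, and the dictionary itself are all assumptions made for illustration; RavenDB's actual index storage is more involved.

```python
from collections import defaultdict
from typing import Any, Dict, List, Set


def collect_authors(node: Dict[str, Any]) -> List[str]:
    """Return the authors of all comments nested under node (hypothetical helper)."""
    authors: List[str] = []
    for comment in node.get("Comments") or []:
        authors.append(comment["Author"])
        authors.extend(collect_authors(comment))
    return authors


def build_author_index(docs: Dict[str, Dict[str, Any]]) -> Dict[str, Set[str]]:
    """Map each comment author to the IDs of the posts they commented on."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for author in collect_authors(doc):
            index[author].add(doc_id)
    return index


docs = {
    "BlogPosts/1-A": {
        "Author": "John",
        "Comments": [
            {
                "Author": "Moon",
                "Text": "Comment text..",
                "Comments": [{"Author": "Bob", "Text": "Comment text.."}],
            }
        ],
    },
    # Hypothetical second post that has no comments by Moon
    "BlogPosts/2-A": {
        "Author": "Adel",
        "Comments": [{"Author": "Bob", "Text": "Comment text.."}],
    },
}

index = build_author_index(docs)
print(sorted(index["Moon"]))  # ['BlogPosts/1-A']
print(sorted(index["Bob"]))   # ['BlogPosts/1-A', 'BlogPosts/2-A']
```

A post appears in the result as soon as any comment in its thread, at any depth, was written by the requested author.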

Query the index using the Studio:

  • Query the index from the Studio's List of Indexes view.

    [Figure: List of Indexes view]

  • View the query results in the Query view.

    [Figure: Query view]

  • View the list of terms indexed by the Recurse method.

    [Figure: Click to view index terms]

    [Figure: Index terms]