Indexing Hierarchical Data



Hierarchical data

One significant advantage of document databases is that they impose few limits on how data is structured. Hierarchical data structures demonstrate this well. Consider, for example, the common comment thread, implemented using objects such as:

class BlogPost
{
    private ?string $author = null;
    private ?string $title = null;
    private ?string $text = null;

    // Blog post readers can leave comments
    private ?BlogPostCommentList $comments = null;

    public function getAuthor(): ?string
    {
        return $this->author;
    }

    public function setAuthor(?string $author): void
    {
        $this->author = $author;
    }

    public function getTitle(): ?string
    {
        return $this->title;
    }

    public function setTitle(?string $title): void
    {
        $this->title = $title;
    }

    public function getText(): ?string
    {
        return $this->text;
    }

    public function setText(?string $text): void
    {
        $this->text = $text;
    }

    public function getComments(): ?BlogPostCommentList
    {
        return $this->comments;
    }

    public function setComments(?BlogPostCommentList $comments): void
    {
        $this->comments = $comments;
    }
}

class BlogPostComment
{
    private ?string $author = null;
    private ?string $text = null;

    // Comments can be left recursively
    private ?BlogPostCommentList $comments = null;

    public function getAuthor(): ?string
    {
        return $this->author;
    }

    public function setAuthor(?string $author): void
    {
        $this->author = $author;
    }

    public function getText(): ?string
    {
        return $this->text;
    }

    public function setText(?string $text): void
    {
        $this->text = $text;
    }

    public function getComments(): ?BlogPostCommentList
    {
        return $this->comments;
    }

    public function setComments(?BlogPostCommentList $comments): void
    {
        $this->comments = $comments;
    }
}

class BlogPostCommentList extends TypedList
{
    public function __construct()
    {
        // The list holds comments, so it is typed to BlogPostComment
        parent::__construct(BlogPostComment::class);
    }
}

Readers of a post created using the above BlogPost structure can add BlogPostComment entries to the post's comments field, and readers of these comments can reply with comments of their own, creating a recursive hierarchical structure.

For example, the following document, BlogPosts/1-A, represents a blog post by John that contains multiple layers of comments from various authors.

BlogPosts/1-A:

{
    "Author": "John",
    "Title": "Post title..",
    "Text": "Post text..",
    "Comments": [
        {
            "Author": "Moon",
            "Text": "Comment text..",
            "Comments": [
                {
                    "Author": "Bob",
                    "Text": "Comment text.."
                },
                {
                    "Author": "Adel",
                    "Text": "Comment text..",
                    "Comments": [
                        {
                            "Author": "Moon",
                            "Text": "Comment text.."
                        }
                    ]
                }
            ]
        }
    ],
    "@metadata": {
        "@collection": "BlogPosts"
    }
}

Index hierarchical data

To index the elements of a hierarchical structure like the one above, use RavenDB's Recurse method.

The sample index below shows how to use Recurse to traverse the comments in the post thread and index them by their authors. We can then query the index for all blog posts that contain comments by specific authors.

class BlogPosts_ByCommentAuthor_Result
{
    private ?StringArray $authors = null;

    public function getAuthors(): ?StringArray
    {
        return $this->authors;
    }

    public function setAuthors(?StringArray $authors): void
    {
        $this->authors = $authors;
    }
}

class BlogPosts_ByCommentAuthor extends AbstractIndexCreationTask
{
    public function __construct()
    {
        parent::__construct();

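        // Recurse walks the nested Comments of each post; the authors of all
        // comments found are indexed in the "authors" field.
        $this->map = "from blogpost in docs.BlogPosts
            let authors = Recurse(blogpost, x => x.Comments)
            select new
            {
                authors = authors.Select(x => x.Author)
            }";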
    }
}

Alternatively, define the index with an IndexDefinition and deploy it using PutIndexesOperation:

$indexDefinition = new IndexDefinition();
$indexDefinition->setName("BlogPosts/ByCommentAuthor");
$indexDefinition->setMaps([
    "from blogpost in docs.BlogPosts
    from comment in Recurse(blogpost, (Func<dynamic, dynamic>)(x => x.Comments))
    select new
    {
        Author = comment.Author
    }"
]);

$store->maintenance()->send(new PutIndexesOperation($indexDefinition));
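
To build intuition for the traversal that Recurse performs, here is a minimal plain-PHP sketch that flattens a nested comment thread into a single list and extracts the authors. It is an illustration only, not RavenDB's implementation: collectComments is a hypothetical helper, the data is a trimmed copy of the sample document, and the sketch collects only the nested comments without making any claim about how RavenDB treats the root document.

```php
<?php
declare(strict_types=1);

/**
 * Illustrative stand-in for the traversal used in the index:
 * walk every nested "Comments" list and return all comments as one flat list.
 * collectComments() is a hypothetical helper, not part of the RavenDB client.
 */
function collectComments(array $node): array
{
    $flat = [];
    foreach ($node['Comments'] ?? [] as $comment) {
        $flat[] = $comment;
        // Descend into the replies to this comment, depth first.
        foreach (collectComments($comment) as $nested) {
            $flat[] = $nested;
        }
    }
    return $flat;
}

// Trimmed copy of the sample document's shape (Text fields omitted).
$post = [
    'Author' => 'John',
    'Comments' => [
        [
            'Author' => 'Moon',
            'Comments' => [
                ['Author' => 'Bob'],
                ['Author' => 'Adel', 'Comments' => [['Author' => 'Moon']]],
            ],
        ],
    ],
];

// Pull the Author value out of every comment found at any depth.
$authors = array_column(collectComments($post), 'Author');
echo implode(', ', $authors), PHP_EOL; // prints "Moon, Bob, Adel, Moon"
```

Note that John, the author of the post itself, does not appear in the result, because only the comments are traversed in this sketch.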

Query the index

The index can be queried for all blog posts that contain comments made by specific authors.

Query the index using code:

/** @var array<BlogPost> $results */
$results = $session
    ->query(BlogPosts_ByCommentAuthor_Result::class, BlogPosts_ByCommentAuthor::class)
    ->whereEquals("authors", "john")
    ->ofType(BlogPost::class)
    ->toList();

Or, using documentQuery:

/** @var array<BlogPost> $results */
$results = $session
        ->advanced()
        ->documentQuery(BlogPost::class, BlogPosts_ByCommentAuthor::class)
        ->whereEquals("authors", "John")
        ->toList();

Or, using RQL:

from index "BlogPosts/ByCommentAuthor"
where Authors == "Moon"
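
The matching that such a query performs can be pictured with a small plain-PHP sketch. It rests on two assumptions that are not stated above: that the index uses an analyzer that lowercases terms (so "john" and "John" would match the same entries), and that each post contributes the set of its comment authors as terms. The function findPostsByCommentAuthor and the second document ID are hypothetical and exist only for this illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical index entries: document ID => comment authors found in that post.
$entries = [
    'BlogPosts/1-A' => ['Moon', 'Bob', 'Adel', 'Moon'],
    'BlogPosts/2-A' => ['Dana'], // made-up second post, for contrast
];

/**
 * Return the IDs of the entries whose author terms contain $author.
 * Terms and the search value are lowercased first (assumed analyzer behavior).
 */
function findPostsByCommentAuthor(array $entries, string $author): array
{
    $needle = strtolower($author);
    $matches = [];
    foreach ($entries as $id => $authors) {
        // Each distinct author becomes one lowercased term for the post.
        $terms = array_unique(array_map('strtolower', $authors));
        if (in_array($needle, $terms, true)) {
            $matches[] = $id;
        }
    }
    return $matches;
}

print_r(findPostsByCommentAuthor($entries, 'moon')); // matches BlogPosts/1-A
print_r(findPostsByCommentAuthor($entries, 'John')); // matches nothing
```

Under these assumptions, searching for John returns nothing for the sample data, since John wrote the post but none of its comments.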

Query the index using Studio:

  • Query the index from Studio's List of Indexes view:

    [Figure: List of Indexes view]

  • View the query results in the Query view:

    [Figure: Query view]

  • View the list of terms indexed by the Recurse method:

    [Figure: Click to view index terms]

    [Figure: Index terms]