Indexes: Indexing Hierarchical Data

One of the greatest advantages of a document database is that it places very few limits on how we structure our data. Hierarchical data structures are a very common case, and the simplest of them is the comment thread:

public static class BlogPost {
    private String author;
    private String title;
    private String text;
    private List<BlogPostComment> comments;

    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    public List<BlogPostComment> getComments() {
        return comments;
    }

    public void setComments(List<BlogPostComment> comments) {
        this.comments = comments;
    }
}

public static class BlogPostComment {
    private String author;
    private String text;
    private List<BlogPostComment> comments;


    public String getAuthor() {
        return author;
    }

    public void setAuthor(String author) {
        this.author = author;
    }

    public String getText() {
        return text;
    }

    public void setText(String text) {
        this.text = text;
    }

    public List<BlogPostComment> getComments() {
        return comments;
    }

    public void setComments(List<BlogPostComment> comments) {
        this.comments = comments;
    }
}

While such a structure is very easy to work with, it does raise an interesting question: how can we search for all blog posts that were commented on by a specified author?

RavenDB has built-in support for indexing hierarchies: the Recurse method lets an index walk a nested structure to any depth. Such an index can be defined in any of the following ways:

public static class BlogPosts_ByCommentAuthor extends AbstractIndexCreationTask {
    public BlogPosts_ByCommentAuthor() {
        map = "docs.BlogPosts.Select(post => new { " +
            "    authors = this.Recurse(post, x => x.comments).Select(x0 => x0.author) " +
            "})";
    }
}

A comparable index can also be defined through an IndexDefinition and deployed with a PutIndexesOperation:

IndexDefinition indexDefinition = new IndexDefinition();
indexDefinition.setName("BlogPosts/ByCommentAuthor");
indexDefinition.setMaps(Collections.singleton(
    "from post in docs.BlogPosts" +
        "  from comment in Recurse(post, (Func<dynamic, dynamic>)(x => x.comments)) " +
        "  select new " +
        "  { " +
        "      author = comment.author " +
        "  }"
));
store.maintenance().send(new PutIndexesOperation(indexDefinition));

The index can also be written as a JavaScript index:

public static class BlogPosts_ByCommentAuthor extends AbstractJavaScriptIndexCreationTask {
    public BlogPosts_ByCommentAuthor() {
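        // Map function: collect the author of every comment, at any depth, into one array.
        // The "|| []" guards treat a missing comments list as empty.
        setMaps(Sets.newHashSet("map('BlogPosts', function(b){\n" +
            "            var names = [];\n" +
            "            (b.comments || []).forEach(x => getNames(x, names));\n" +
            "            return {\n" +
            "                authors : names\n" +
            "            };" +
            "        })"));

        // Helper script made available to the map function: recursive walk over nested comments.
        java.util.Map<String, String> additionalSources = new HashMap<>();
        additionalSources.put("The Script", "function getNames(x, names){\n" +
            "        names.push(x.author);\n" +
            "        (x.comments || []).forEach(x => getNames(x, names));\n" +
            "    }");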

        setAdditionalSources(additionalSources);
    }
}

This indexes every comment in the thread, regardless of its depth in the hierarchy. The index can then be queried for posts that a given author has commented on:

List<BlogPost> results = session
    .query(BlogPost.class, BlogPosts_ByCommentAuthor.class)
    .whereEquals("authors", "Ayende Rahien")
    .toList();
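
To make "regardless of their location in the hierarchy" concrete, here is a minimal plain-Java sketch of the kind of depth-first flattening such an index relies on. It is not RavenDB's implementation of Recurse and does not model whether the root document itself is included. The names RecurseSketch, Comment, and collectAuthors are hypothetical and exist only for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: RecurseSketch, Comment and collectAuthors are hypothetical names,
// not part of the RavenDB client API.
final class RecurseSketch {

    // Minimal stand-in for BlogPostComment: an author plus nested replies.
    static final class Comment {
        final String author;
        final List<Comment> comments;

        Comment(String author, Comment... replies) {
            this.author = author;
            this.comments = new ArrayList<>(Arrays.asList(replies));
        }
    }

    // Depth-first walk that gathers the author of every comment at any depth.
    static List<String> collectAuthors(List<Comment> comments) {
        List<String> authors = new ArrayList<>();
        if (comments == null) {
            return authors; // a missing list is treated as "no replies"
        }
        for (Comment comment : comments) {
            authors.add(comment.author);
            authors.addAll(collectAuthors(comment.comments));
        }
        return authors;
    }

    public static void main(String[] args) {
        List<Comment> thread = Arrays.asList(
            new Comment("Alice", new Comment("Bob", new Comment("Carol"))),
            new Comment("Dave"));
        // prints [Alice, Bob, Carol, Dave]
        System.out.println(collectAuthors(thread));
    }
}
```

Because every author ends up in one flat list per post, a simple equality match on that list is enough to find a post by any commenter, however deeply nested the comment is.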