
Querying: MoreLikeThis

MoreLikeThis returns a list of documents that are similar to a given document. A typical use case is a news site that shows a list of related articles at the bottom of the article the user is viewing. To accomplish this, RavenDB's MoreLikeThis uses the MoreLikeThis implementation from the Lucene contrib project. To find out more about the algorithm, please read Aaron Johnson's excellent blog post on the subject.
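At its core, Lucene's MoreLikeThis picks the most "interesting" terms of the source document and runs them as an OR query against the index. The sketch below is a simplified, standalone illustration of that term-scoring step, not RavenDB's or Lucene's actual code. It assumes term and document frequencies are already available as maps and scores each term as tf * idf, using the classic Lucene idf formula 1 + ln(numDocs / (docFreq + 1)). The class and method names are hypothetical.

```java
import java.util.*;

// Hypothetical, simplified sketch of MoreLikeThis-style term selection (not RavenDB/Lucene code).
final class InterestingTerms {

    // Classic Lucene-style inverse document frequency: rarer terms get a higher value.
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Scores every term of the source document by tf * idf and returns
    // at most maxQueryTerms of them, best first (ties broken alphabetically).
    static List<String> topTerms(Map<String, Integer> termFreqs,
                                 Map<String, Integer> docFreqs,
                                 int numDocs, int maxQueryTerms) {
        Map<String, Double> scores = new HashMap<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int docFreq = docFreqs.getOrDefault(e.getKey(), 0);
            scores.put(e.getKey(), e.getValue() * idf(docFreq, numDocs));
        }
        List<String> terms = new ArrayList<>(scores.keySet());
        terms.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a)); // descending score
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return new ArrayList<>(terms.subList(0, Math.min(maxQueryTerms, terms.size())));
    }
}
```

For example, with term frequencies {raven=3, database=2, the=2}, document frequencies {raven=1, database=4, the=10}, 10 documents in total, and a limit of 2 terms, the result is [raven, database]: "the" appears in every document, so its idf is the lowest and it drops out.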

Setup

In order to work, MoreLikeThis requires access to the text in the index. Therefore, the index being queried needs to store the fields or store the term vectors for those fields.

public class Article {
    private String id;
    private String name;
    private String articleBody;

    public String getId() {
        return id;
    }

    public void setId(String id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    public void setName(String name) {
        this.name = name;
    }

    public String getArticleBody() {
        return articleBody;
    }

    public void setArticleBody(String articleBody) {
        this.articleBody = articleBody;
    }
}

public class Articles_ByArticleBody extends AbstractIndexCreationTask {
    public Articles_ByArticleBody() {
        map = "from doc in docs.articles " +
            "select new {" +
            "   doc.articleBody " +
            "}";

        store("articleBody", FieldStorage.YES);
        analyze("articleBody", "StandardAnalyzer");
    }
}

Basic Usage

MoreLikeThis comes with sensible defaults, and the simplest form of the query satisfies most usage scenarios.

Java

List<Article> articles = session
    .query(Article.class, Articles_ByArticleBody.class)
    .moreLikeThis(builder -> builder.usingDocument(x -> x.whereEquals("id()", "articles/1")))
    .toList();

RQL

from index 'Articles/ByArticleBody' 
where morelikethis(id() = 'articles/1')

By default, MoreLikeThis uses all the fields defined in the index. To use only specific fields, pass their names in the MoreLikeThisOptions.Fields property (setFields in the Java client).

Java

MoreLikeThisOptions options = new MoreLikeThisOptions();
options.setFields(new String[]{ "articleBody" });
List<Article> articles = session
    .query(Article.class, Articles_ByArticleBody.class)
    .moreLikeThis(builder -> builder
        .usingDocument(x -> x.whereEquals("id()", "articles/1"))
        .withOptions(options))
    .toList();

RQL

from index 'Articles/ByArticleBody' 
where morelikethis(id() = 'articles/1', '{ "Fields" : [ "articleBody" ] }')
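To picture what restricting Fields changes, the hypothetical sketch below collects term frequencies from a document's fields, using either every field or only the requested ones. It assumes a naive lowercase, whitespace-based tokenizer in place of a real analyzer, and it merges counts for the same word across fields, which is a simplification. All names are illustrative only.

```java
import java.util.*;

// Hypothetical sketch: source-document term frequencies drawn from all fields or only selected ones.
final class FieldSelection {

    // A null or empty field list means "use every field", like the default behavior.
    static Map<String, Integer> termFrequencies(Map<String, String> document, String[] fields) {
        Collection<String> selected = (fields == null || fields.length == 0)
                ? document.keySet()
                : Arrays.asList(fields);
        Map<String, Integer> freqs = new TreeMap<>();
        for (String field : selected) {
            String text = document.get(field);
            if (text == null) {
                continue; // the requested field does not exist on this document
            }
            // Naive stand-in for an analyzer: lowercase and split on whitespace.
            for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!token.isEmpty()) {
                    freqs.merge(token, 1, Integer::sum);
                }
            }
        }
        return freqs;
    }
}
```

With a document {name="Raven news", articleBody="raven database news news"}, using all fields counts "news" three times, while restricting to articleBody counts it twice and ignores the name field entirely.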

Options

To change the default parameters, set the corresponding MoreLikeThisOptions properties and pass the options to MoreLikeThis (withOptions in the Java client).

Option	Type	Description
MinimumTermFrequency	Integer	Ignores terms that occur fewer than this many times in the source document
MaximumQueryTerms	Integer	Returns a query with no more than this many terms
MaximumNumberOfTokensParsed	Integer	Maximum number of tokens to parse in each field of the source document that is not stored with term vectors
MinimumWordLength	Integer	Ignores words shorter than this length; 0 disables the check
MaximumWordLength	Integer	Ignores words longer than this length; 0 disables the check
MinimumDocumentFrequency	Integer	Ignores words that do not occur in at least this many documents
MaximumDocumentFrequency	Integer	Ignores words that occur in more than this many documents
MaximumDocumentFrequencyPercentage	Integer	Ignores words that occur in more than this percentage of documents
Boost	Boolean	Boosts terms in the query based on their score
BoostFactor	Float	Boost factor applied when boosting based on score
StopWordsDocumentId	String	ID of the document containing custom stop words
Fields	String[]	Fields to compare
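The standalone sketch below shows how the length and frequency options in the table could prune candidate terms, and how Boost and BoostFactor could weight the terms that survive. It is modeled loosely on Lucene's MoreLikeThis and is not RavenDB's code. The initial values (2, 5, and 0) are assumed Lucene defaults rather than values stated on this page, and all names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of how MoreLikeThisOptions-style settings could prune and weight terms.
final class TermFilter {
    int minimumTermFrequency = 2;                      // assumed default
    int minimumDocumentFrequency = 5;                  // assumed default
    int maximumDocumentFrequency = Integer.MAX_VALUE;  // no upper limit
    int minimumWordLength = 0;                         // 0 = no limit
    int maximumWordLength = 0;                         // 0 = no limit

    // termFreq: occurrences in the source document; docFreq: indexed documents containing the term.
    boolean keep(String term, int termFreq, int docFreq) {
        if (termFreq < minimumTermFrequency) return false;
        if (docFreq < minimumDocumentFrequency) return false;
        if (docFreq > maximumDocumentFrequency) return false;
        if (minimumWordLength > 0 && term.length() < minimumWordLength) return false;
        if (maximumWordLength > 0 && term.length() > maximumWordLength) return false;
        return true;
    }

    // MaximumDocumentFrequencyPercentage expressed as an absolute document count.
    static int maxDocFreqFromPercentage(int percentage, int numDocs) {
        return percentage * numDocs / 100;
    }

    // With Boost enabled, each term is weighted relative to the best-scoring term.
    static float termBoost(float score, float bestScore, float boostFactor) {
        return boostFactor * score / bestScore;
    }
}
```

For example, with MaximumDocumentFrequencyPercentage set to 20 and 50 documents in the index, terms found in more than 10 documents would be ignored.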

Stop Words

Some Lucene analyzers have a built-in list of common English words that are usually not useful for searching (such as "a", "as", and "the"). These words are called stop words; they are considered uninteresting and are ignored. If the analyzer you use does not support stop words, or you need to override them, you can specify your own set. To do so, store a MoreLikeThisStopWords document in RavenDB:

MoreLikeThisStopWords stopWords = new MoreLikeThisStopWords();
stopWords.setStopWords(Arrays.asList("I", "A", "Be"));
session.store(stopWords, "Config/Stopwords");

Then set the ID of that document ("Config/Stopwords" in this example) in the MoreLikeThisOptions.StopWordsDocumentId property.
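Conceptually, stop words are removed before term frequencies are counted, so they never become query terms. The hypothetical sketch below illustrates this with the stop words stored above. It stands in for an analyzer by lowercasing the text and splitting on anything that is not a letter or digit, and it compares stop words case-insensitively; whether RavenDB matches stop words case-insensitively is an assumption here.

```java
import java.util.*;

// Hypothetical sketch: counting source-document terms while skipping custom stop words.
final class StopWordFilter {

    static Map<String, Integer> termFrequencies(String text, Collection<String> stopWords) {
        // Assumption: stop words are matched case-insensitively.
        Set<String> stops = new HashSet<>();
        for (String word : stopWords) {
            stops.add(word.toLowerCase(Locale.ROOT));
        }
        Map<String, Integer> freqs = new TreeMap<>();
        // Naive analyzer stand-in: lowercase, split on non-letter/non-digit runs.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty() || stops.contains(token)) {
                continue; // empty split artifact or stop word
            }
            freqs.merge(token, 1, Integer::sum);
        }
        return freqs;
    }
}
```

With the stop words "I", "A", and "Be", the text "To be, or not to be" yields {not=1, or=1, to=2}; "be" is dropped and cannot influence which documents are considered similar.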

Remarks

Information

Note that the default values of settings such as MinimumDocumentFrequency, MinimumTermFrequency, and MinimumWordLength may filter out related articles, especially when the data set is small (e.g. during development).
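The effect is easy to reproduce with a tiny data set. The hypothetical sketch below finds the terms of a source document that pass both a minimum term frequency and a minimum document frequency. With only three documents, a minimum document frequency of 5 (assumed here to match the Lucene default) rejects every term, while lower thresholds let candidates through. Tokenization is a naive lowercase split on non-word characters.

```java
import java.util.*;

// Hypothetical sketch: which source-document terms survive the frequency thresholds.
final class SmallDatasetCheck {

    private static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Terms of source that occur at least minTermFreq times in it
    // and appear in at least minDocFreq documents of the corpus.
    static Set<String> candidateTerms(String source, List<String> corpus,
                                      int minTermFreq, int minDocFreq) {
        Map<String, Integer> termFreqs = new HashMap<>();
        for (String t : tokens(source)) termFreqs.merge(t, 1, Integer::sum);

        Set<String> result = new TreeSet<>();
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            int docFreq = 0;
            for (String doc : corpus) {
                if (tokens(doc).contains(e.getKey())) docFreq++;
            }
            if (e.getValue() >= minTermFreq && docFreq >= minDocFreq) result.add(e.getKey());
        }
        return result;
    }
}
```

For a three-document corpus such as "ravendb stores documents ravendb indexes documents", "ravendb indexes json", and "documents about ravendb", thresholds of 2 and 5 leave no candidate terms for the first document, so no related articles could be found; thresholds of 1 and 2 keep "documents", "indexes", and "ravendb".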
