Querying: moreLikeThis

MoreLikeThis returns a list of documents that are related to a given document.
This feature can be used, for example, to show a list of related articles at the bottom of the article a reader is currently viewing, as many news sites do.
To accomplish this, RavenDB uses the MoreLikeThis feature from the Lucene contrib project.
On this page:
- Setup
- Basic Usage
- Options
- Stop Words
- Remarks

Setup

MoreLikeThis needs access to the indexed text.
The queried index must therefore either store the relevant fields or store term vectors for them.

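const { AbstractIndexCreationTask } = require("ravendb");

// Document class used by the examples on this page.
class Article {
    constructor(id, name, articleBody) {
        this.id = id;
        this.name = name;
        this.articleBody = articleBody;
    }
}

// Index over the article body; the index name resolves to "Articles/ByArticleBody".
class Articles_ByArticleBody extends AbstractIndexCreationTask {
    constructor() {
        super();

        this.map = `from doc in docs.Articles select new { 
            doc.articleBody 
        }`;

        // Store the field so that MoreLikeThis can read its text from the index.
        this.store("articleBody", "Yes");
        this.analyze("articleBody", "StandardAnalyzer");
    }
}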

Basic Usage

Most MoreLikeThis options have default values.
The simplest form of the query will therefore satisfy most usage scenarios.

Node.js

const articles = await session
    .query({ indexName: "Articles/ByArticleBody" })
    .moreLikeThis(builder => 
        builder.usingDocument(x => 
            x.whereEquals("id()", "articles/1")))
    .all();

RQL

from index 'Articles/ByArticleBody' 
where morelikethis(id() = 'articles/1')

By default, MoreLikeThis uses all the fields defined in the index.
To compare only specific fields, list them in the MoreLikeThisOptions.fields property.

Node.js

const options = {
    fields: [ "articleBody" ]
};
const articles = await session
    .query({ indexName: "Articles/ByArticleBody" })
    .moreLikeThis(builder => builder
        .usingDocument(x => x.whereEquals("id()", "articles/1"))
        .withOptions(options))
    .all();

RQL

from index 'Articles/ByArticleBody' 
where morelikethis(id() = 'articles/1', '{ "Fields" : [ "articleBody" ] }')
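
When the RQL form is assembled in application code, the options argument is a JSON string embedded in the query text. The sketch below shows one way to build that text. `buildMoreLikeThisRql` is a hypothetical helper, not part of the RavenDB client, and it only handles the `Fields` option that appears in the RQL above. Quote escaping is deliberately left out, so the helper rejects values that contain quotes.

```javascript
// Hypothetical helper (not part of the RavenDB client): builds the RQL text
// shown above for a given index, document ID and optional list of fields.
function buildMoreLikeThisRql(indexName, documentId, fields = []) {
    for (const value of [indexName, documentId, ...fields]) {
        // RQL quote escaping is out of scope for this sketch, so reject quotes.
        if (/['"]/.test(value)) {
            throw new Error(`Unsupported quote character in: ${value}`);
        }
    }

    const source = `from index '${indexName}'`;
    if (fields.length === 0) {
        return `${source} where morelikethis(id() = '${documentId}')`;
    }

    // The options are passed as a JSON string in the second argument.
    const options = JSON.stringify({ Fields: fields });
    return `${source} where morelikethis(id() = '${documentId}', '${options}')`;
}

const rql = buildMoreLikeThisRql(
    "Articles/ByArticleBody", "articles/1", ["articleBody"]);
console.log(rql);
// from index 'Articles/ByArticleBody' where morelikethis(id() = 'articles/1', '{"Fields":["articleBody"]}')
```

The resulting string is intended for a raw query call such as session.advanced.rawQuery(rql); check the client version in use before relying on that call.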

Options

To change the defaults, set the relevant MoreLikeThisOptions properties and pass the options to MoreLikeThis.

Option	Type	Description
minimumTermFrequency	`number`	Ignores terms that occur fewer than this many times in the source document
maximumQueryTerms	`number`	Returns a query with no more than this many terms
maximumNumberOfTokensParsed	`number`	The maximum number of tokens to parse in each example document field that is not stored with TermVector support
minimumWordLength	`number`	Ignores words shorter than this length; 0 disables the check
maximumWordLength	`number`	Ignores words longer than this length; 0 disables the check
minimumDocumentFrequency	`number`	Ignores words that do not occur in at least this many documents
maximumDocumentFrequency	`number`	Ignores words that occur in more than this many documents
maximumDocumentFrequencyPercentage	`number`	Ignores words that occur in more than this percentage of documents
boost	`boolean`	Boosts terms in the query based on their score
boostFactor	`number`	The boost factor to use when boosting based on score
stopWordsDocumentId	`string`	ID of the document containing custom stop words
fields	`string[]`	Fields to compare
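
To make the effect of the term-related options more concrete, here is a simplified, self-contained sketch of how such filters narrow down the terms taken from a source document. It is not RavenDB's or Lucene's implementation: `selectInterestingTerms` and its inline `stopWords` array are hypothetical, the default values are chosen only for the demo, and terms are ranked by raw count, whereas the real feature also takes document frequency into account.

```javascript
// Conceptual sketch only: this is NOT RavenDB's or Lucene's implementation.
// It imitates how a few MoreLikeThisOptions narrow down candidate terms.
function selectInterestingTerms(text, options = {}) {
    const {
        minimumTermFrequency = 1,  // demo default, not the RavenDB default
        minimumWordLength = 0,     // 0 disables the check
        maximumWordLength = 0,     // 0 disables the check
        maximumQueryTerms = 25,    // demo default, not the RavenDB default
        stopWords = []             // hypothetical inline list for this sketch
    } = options;

    const ignored = new Set(stopWords.map(w => w.toLowerCase()));

    // Count how often each lower-cased word occurs in the source text.
    const frequencies = new Map();
    for (const word of text.toLowerCase().match(/[a-z0-9]+/g) || []) {
        frequencies.set(word, (frequencies.get(word) || 0) + 1);
    }

    return [...frequencies.entries()]
        .filter(([word, count]) =>
            count >= minimumTermFrequency &&
            (minimumWordLength === 0 || word.length >= minimumWordLength) &&
            (maximumWordLength === 0 || word.length <= maximumWordLength) &&
            !ignored.has(word))
        // Rank by count (ties alphabetically), then keep the top terms.
        .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
        .slice(0, maximumQueryTerms)
        .map(([word]) => word);
}

const terms = selectInterestingTerms(
    "The raven flew. The raven sang. A crow watched the raven.",
    { minimumTermFrequency: 2, minimumWordLength: 4, stopWords: ["the", "a"] });
console.log(terms); // [ 'raven' ]
```

In this sample only "raven" survives: every other word either occurs once, is shorter than four characters, or is a stop word.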

Stop Words

Some Lucene analyzers have a built-in list of common English words that are usually not useful for searching, like "a", "as", "the", etc.
These words, called stop words, are considered uninteresting and are ignored.
If the analyzer in use does not support stop words, or you need to override its list, you can specify your own set of stop words.
To do so, store a MoreLikeThisStopWords document containing the list in RavenDB:

const stopWords = new MoreLikeThisStopWords();
stopWords.stopWords = [ "I", "A", "Be" ];
await session.store(stopWords, "Config/Stopwords");
await session.saveChanges(); // persist the stop words document

Then set this document's ID in the MoreLikeThisOptions.stopWordsDocumentId property.
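
As a minimal sketch of that last step, the options below point at the "Config/Stopwords" document stored above and are passed to the query in the same way as in the Basic Usage examples. `findRelatedArticles` is a hypothetical wrapper, and `session` is assumed to be an open RavenDB session supplied by the caller.

```javascript
// Options that reference the stop words document stored above.
const options = {
    stopWordsDocumentId: "Config/Stopwords",
    fields: ["articleBody"]
};

// Hypothetical wrapper: `session` is assumed to be an open RavenDB session.
async function findRelatedArticles(session, documentId) {
    return session
        .query({ indexName: "Articles/ByArticleBody" })
        .moreLikeThis(builder => builder
            .usingDocument(x => x.whereEquals("id()", documentId))
            .withOptions(options))
        .all();
}
```

A call would then look like: const related = await findRelatedArticles(session, "articles/1");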

Remarks

Note that the default values of settings such as minimumDocumentFrequency, minimumTermFrequency, and minimumWordLength may filter out related articles, especially with a small data set (e.g. during development).
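
One way to work around this during development is to lower those thresholds explicitly. The sketch below is only an illustration: `relaxForSmallDataSets` is a hypothetical helper, and the values are assumptions picked to let a handful of test documents match, not recommended production settings.

```javascript
// Hypothetical helper: returns a copy of the given options with the
// frequency and length filters loosened for small development data sets.
function relaxForSmallDataSets(baseOptions = {}) {
    return {
        ...baseOptions,
        minimumTermFrequency: 1,      // a term may occur just once in the source document
        minimumDocumentFrequency: 1,  // a term may occur in a single document
        minimumWordLength: 0          // 0 means the length check has no effect
    };
}

const developmentOptions = relaxForSmallDataSets({ fields: ["articleBody"] });
console.log(developmentOptions);
```

The resulting object can be passed to withOptions in the same way as the options in the Basic Usage examples.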
