RavenDB Compared to Other NoSQL Databases with Built-in Full-Text Search
The main differences and what makes RavenDB hard to beat.
RavenDB is a NoSQL database with built-in full-text search functionality, both in the cloud and on-premises, at no extra cost. That sounds great, but how does it compare to other database systems on the market? We need some context. Let’s start with a quick recap of how full-text search works.
Full-Text Search in a Nutshell
In order to achieve the lightning-fast search speeds we’ve come to expect from modern search engines, we need to use indexes. Indexes in databases are similar to those in the back of books: ordered lists of terms from the text with references to each occurrence of those terms.
What appears in an index is determined by the analyzer used to create it. An analyzer is a piece of code that breaks text strings into tokens, and those tokens become the terms listed in the index. The exact process depends on the analyzer used, and different analyzers will be more appropriate for different types of data.
For a large amount of data, it’s much quicker to search an index for a term than to scan through the entire data set. There are other advantages to having terms organized in an index, such as being able to see the number of occurrences of a term or find similar terms nearby. This information can be used to implement features like relevancy ranking and suggestions.
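To make the idea concrete, here is a minimal Python sketch of an analyzer and the index it produces. It is a toy illustration rather than how Lucene or RavenDB implement indexing: the `analyze`, `build_index`, and `search` functions, the stop-word list, and the sample documents are all assumptions made up for this example.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; real analyzers ship with their own.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in"}


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(documents):
    """Map each term to {doc_id: [positions of the term in that document]}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for position, term in enumerate(analyze(text)):
            index[term].setdefault(doc_id, []).append(position)
    return index


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = analyze(query)  # the query goes through the same analyzer
    if not terms:
        return []
    matches = set(index.get(terms[0], {}))
    for term in terms[1:]:
        matches &= set(index.get(term, {}))
    return sorted(matches)


documents = {
    "doc1": "The raven and the writing desk",
    "doc2": "A raven is a large black bird",
    "doc3": "Writing an index for the back of a book",
}
index = build_index(documents)

print(search(index, "Raven"))          # documents that mention "raven"
print(len(index["raven"]))             # how many documents contain the term
```

A lookup touches only the entries for the query terms instead of scanning every document, and the stored positions and counts are the raw material for features like ranking and suggestions.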
The Small World of Database Search Engines
Database systems, and the search engines they use, generally don’t reinvent the wheel when it comes to full-text search. Instead, they build on an existing library like Lucene.
Lucene is a “library providing powerful indexing and search features, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.”
It’s fast, efficient, and feature-rich. It’s also open-source and free to download.
Lucene powers search for big-name companies like Twitter and LinkedIn, and search engines like Elasticsearch and Solr. There are alternatives to Lucene, but it’s currently the industry standard for full-text search libraries, and the most commonly used.
Full-Text Search Implementation in Databases
When it comes to implementing full-text search in databases, there are a few different approaches:
The simplest is to not do it at all. Many simpler database systems offer only basic search functionality: no full-text indexing, just the ability to scan text for a string value. These searches are slow, limited, and do nothing to ensure results are relevant.
Another approach is to connect the database to a search engine, usually via a plug-in or add-on. Examples include DynamoDB, Firebase, and CouchDB. This approach seems reasonable, but it requires sending data to third-party software for analysis. This can mean integration and security issues, less-than-optimal performance, and additional fees.
The ideal approach (if done well) is to build the search engine right into the database. Searches run alongside the data instead of going through an external service, so there are no integration issues and no setup is required. This is the approach RavenDB takes: an internal search engine custom-built around Lucene.
An Excuse to Charge You for Full-text Search
Now that we have the context, we can look at the costs.
In RavenDB, features like full-text search are simply part of the package. There’s no money down, no associated fees, no catch.
Unfortunately, this isn’t a common approach.
Having full-text search as an “optional” external service gives companies a great excuse to charge you for it.
Some, like Azure Cosmos DB, directly charge you to use their search engine.
Some, like DynamoDB, offer you a “free, open-source” search engine like Elasticsearch. Sounds good, right? It is built on a free, open-source library, after all. But if you read a bit further you’ll find that unless you want to set up and manage everything yourself, you’ll have to use their paid “Search Service”.
Now you’re paying to search a database that you’re already paying to use.
Other companies have more subtle ways of making you pay for full-text searches.
MongoDB, like RavenDB, has its own built-in Lucene-powered search engine. Unfortunately, they’ve only integrated it into their cloud service, which they’d really prefer you to use. If you want or need an on-premises solution, you’re out of luck. You’re left with a choice: get your wallet out and change to their preferred (more expensive) service, or make do with extremely basic search functionality.
So What Can You Do With the Built-in Search In RavenDB?
Because RavenDB’s full-text search is built around Lucene, it has all of Lucene’s great functionality and analyzers built in. (You can also create your own analyzers if you wish.)
Results are automatically ranked according to relevancy, by a system that can be customized with boosting. Queries can be extensively customized: you can specify search operators, use prefixes and wildcards, get suggestions and similar terms, show text snippets with terms highlighted, and more.
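As a rough illustration of what relevancy ranking with boosting means, the Python sketch below scores documents with a simple TF-IDF style formula and lets a per-field boost change the order of results. This is not Lucene's or RavenDB's actual scoring formula or API; the `score` and `rank` functions, the field names, and the sample documents are assumptions invented for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on non-alphanumerics (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, fields, docs, boosts):
    """Sum of boost * term frequency * inverse document frequency per field."""
    total = 0.0
    for term in tokenize(query):
        # Number of documents that contain the term in any field.
        df = sum(
            1 for d in docs.values()
            if any(term in tokenize(text) for text in d.values())
        )
        if df == 0:
            continue
        idf = math.log(1 + len(docs) / df)
        for field, text in fields.items():
            tf = Counter(tokenize(text))[term]
            total += boosts.get(field, 1.0) * tf * idf
    return total


def rank(query, docs, boosts=None):
    """Return ids of matching documents, most relevant first."""
    boosts = boosts or {}
    scored = [(score(query, fields, docs, boosts), doc_id)
              for doc_id, fields in docs.items()]
    scored.sort(key=lambda pair: -pair[0])
    return [doc_id for s, doc_id in scored if s > 0]


# Hypothetical documents with two fields each.
docs = {
    "a": {"title": "Raven facts", "body": "Ravens are clever birds."},
    "b": {"title": "Bird watching",
          "body": "We saw a raven and a crow; the raven flew away."},
    "c": {"title": "Databases", "body": "Indexes speed up search."},
}

print(rank("raven", docs))                  # unboosted ranking
print(rank("raven", docs, {"title": 3.0}))  # title matches count three times as much
```

Without a boost, the document that mentions the term most often ranks first; boosting the title field moves the document with the term in its title to the top.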
RavenDB’s approach to indexing in general offers additional benefits.
Indexes are automatically updated whenever data is added or changed, meaning the server can respond quickly to searches even after large additions or changes. You can specify what data to index, but if you don’t, RavenDB will automatically create and manage indexes for you.
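The principle of keeping an index in step with the data can be sketched in a few lines of Python. The `IndexedStore` class below is hypothetical and updates its index synchronously on every write purely for simplicity; it says nothing about how RavenDB schedules or performs index updates internally.

```python
import re
from collections import defaultdict


class IndexedStore:
    """Toy document store whose term index is maintained on every write."""

    def __init__(self):
        self.docs = {}                 # doc_id -> text
        self.index = defaultdict(set)  # term -> set of doc_ids

    @staticmethod
    def _terms(text):
        return set(re.findall(r"[a-z0-9]+", text.lower()))

    def put(self, doc_id, text):
        # Remove index entries left over from the previous version, if any.
        if doc_id in self.docs:
            for term in self._terms(self.docs[doc_id]):
                self.index[term].discard(doc_id)
        self.docs[doc_id] = text
        for term in self._terms(text):
            self.index[term].add(doc_id)

    def search(self, term):
        return sorted(self.index.get(term.lower(), set()))


store = IndexedStore()
store.put("1", "A black raven")
store.put("2", "A white dove")
print(store.search("raven"))  # finds document 1

store.put("1", "A grey heron")  # changing the document updates the index
print(store.search("raven"))    # no longer matches
```

Because only the terms of the changed document are touched, the cost of an update depends on the size of that document rather than on the size of the whole data set.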
Most of the major players in database systems have similar full-text search functionality, especially those built on the same code libraries. The main differences are in the integration and the cost, and unless you can find a database system that pays you to search, RavenDB is going to be hard to beat.