Understanding query processing and wildcards in RavenDB
I recently got an interesting question about how RavenDB processes certain types of queries. The customer in question had a document with a custom analyzer and couldn’t figure out why certain queries didn’t work.
For the purpose of the discussion, let’s consider the following analyzer:
In other words, when using this analyzer, we’ll have the following transformations:
- “Singing avocadoes” – will be: “sing”, “avocadoes”
- “Sterling silver” – will be: “ster”, “silver”
- “Singularity Trailer” – will be: “singularity”, “trailer”
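As a rough illustration of this kind of transformation, here is a toy analyzer in Python. It is an assumption made for illustration only, not the customer's analyzer and not RavenDB or Lucene code: the `analyze` function is hypothetical, and it covers just lowercasing, whitespace splitting, and stripping a trailing “ing”. A real analyzer may apply further rules.

```python
def analyze(text: str) -> list[str]:
    """Toy analyzer (hypothetical): lowercase, split on whitespace,
    and strip a trailing 'ing' from each token."""
    terms = []
    for token in text.lower().split():
        if token.endswith("ing"):
            token = token[:-3]  # "singing" -> "sing"
        if token:               # skip tokens that became empty
            terms.append(token)
    return terms


print(analyze("Singing avocadoes"))    # ['sing', 'avocadoes']
print(analyze("Singularity Trailer"))  # ['singularity', 'trailer']
```

Note that this toy version turns a bare “sing” into “s”, which matters once prefix queries enter the picture.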
As a reminder, this is used in an inverted index, which gives us the ability to look up a term and find all the documents containing that term.
An analyzer is applied to the text that is being indexed, but also to the queries. In other words, because during indexing I changed “singing” to “sing”, I also need to do the same for the query. Otherwise, a query for “singing voice” will have no results, even if the “singing” term was in the original data.
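To make the point about analyzing both sides concrete, here is a minimal sketch of a term-to-documents index in Python. Everything in it is hypothetical (the toy `analyze` function, the document IDs, the `use_analyzer` switch); it only shows why skipping the analyzer at query time misses documents that were indexed through it.

```python
from collections import defaultdict


def analyze(text: str) -> list[str]:
    """Toy analyzer (hypothetical): lowercase, split, strip trailing 'ing'."""
    tokens = (t[:-3] if t.endswith("ing") else t for t in text.lower().split())
    return [t for t in tokens if t]


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each analyzed term to the IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query: str, use_analyzer: bool = True) -> set[str]:
    """Return documents matching any query term (simple OR semantics)."""
    terms = analyze(query) if use_analyzer else query.lower().split()
    hits = set()
    for term in terms:
        hits |= index.get(term, set())
    return hits


# Document IDs are made up for the example.
docs = {"songs/1": "Singing avocadoes", "movies/1": "Singularity Trailer"}
index = build_index(docs)

print(search(index, "singing voice"))                      # {'songs/1'}
print(search(index, "singing voice", use_analyzer=False))  # set()
```

With the analyzer applied to the query, “singing” becomes “sing” and matches the indexed term; without it, the raw term “singing” does not exist in the index at all.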
The rules change when we do a prefix search, though. Consider the following query:
What should we be searching on here? Remember, this is using an analyzer, but we are also doing a prefix search. Let’s consider our options. If we pass this through an analyzer, the query will change its meaning. Instead of searching for terms starting with “sing”, we’ll search for terms starting with “s”.
That will give us results for “Sterling Silver”, which is probably not expected. In this case, by the way, I’m actually looking for the term “singularity”, and processing the term further would prevent that.
For that reason, RavenDB assumes that queries using wildcard searches are not subject to an analyzer and will not process them using one. The reasoning is simple: by using a wildcard, you are quite explicitly stating that this is not a real term.
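The difference between the two options can be sketched in a few lines of Python. This is an illustrative model, not RavenDB's implementation: the index below is hand-built from the term list given earlier in this post, and `strip_ing` and `prefix_search` are hypothetical helpers standing in for the analyzer and the prefix lookup.

```python
# Term -> documents, using the analyzed terms listed earlier in the post.
index = {
    "sing": {"Singing avocadoes"},
    "avocadoes": {"Singing avocadoes"},
    "ster": {"Sterling silver"},
    "silver": {"Sterling silver"},
    "singularity": {"Singularity Trailer"},
    "trailer": {"Singularity Trailer"},
}


def strip_ing(term: str) -> str:
    """Hypothetical stand-in for the analyzer's effect on a single term."""
    term = term.lower()
    return term[:-3] if term.endswith("ing") else term


def prefix_search(index: dict[str, set[str]], prefix: str) -> set[str]:
    """Return every document that has at least one term starting with prefix."""
    hits = set()
    for term, docs in index.items():
        if term.startswith(prefix):
            hits |= docs
    return hits


# Option 1: analyze the prefix first. "sing" collapses to "s" and matches too much.
print(sorted(prefix_search(index, strip_ing("sing"))))
# ['Singing avocadoes', 'Singularity Trailer', 'Sterling silver']

# Option 2: use the prefix as typed, without the analyzer.
print(sorted(prefix_search(index, "sing")))
# ['Singing avocadoes', 'Singularity Trailer']
```

Using the prefix as typed keeps “Sterling silver” out of the results and still finds “singularity”, which matches the behavior described above.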