Query for distinct results



Sample query with Distinct

// Get a sorted list without duplicates:
// =====================================

List<string> countries = session
    .Query<Order>()
    .OrderBy(x => x.ShipTo.Country)
    .Select(x => x.ShipTo.Country)
     // Call 'Distinct' to remove duplicates from results
     // Items wil be compared based on field 'Country' that is specified in the above 'Select' 
    .Distinct()
    .ToList();

// Running this on the Northwind sample data
// will result in a sorted list of 21 countries w/o duplicates.
// Get a sorted list without duplicates:
// =====================================

List<string> countries = await asyncSession
    .Query<Order>()
    .OrderBy(x => x.ShipTo.Country)
    .Select(x => x.ShipTo.Country)
     // Call 'Distinct' to remove duplicates from results
     // Items wil be compared based on field 'Country' that is specified in the above 'Select' 
    .Distinct()
    .ToListAsync();

// Running this on the Northwind sample data
// will result in a sorted list of 21 countries w/o duplicates.
// Get a sorted list without duplicates:
// =====================================

IList<string> countries = session
    .Advanced
    .DocumentQuery<Order>()
    .OrderBy(x => x.ShipTo.Country)
    .SelectFields<string>("ShipTo.Country")
     // Call 'Distinct' to remove duplicates from results
     // Items wil be compared based on field 'Country' that is specified in the above 'SelectFields' 
    .Distinct()
    .ToList();

// Running this on the Northwind sample data
// will result in a sorted list of 21 countries w/o duplicates.
from "Orders"
order by ShipTo.Country
select distinct ShipTo.Country

Paging with Distinct

A special approach must be used when calling Distinct() while paging.
Please read the dedicated article about paging through tampered results.

Count with Distinct

Use Count() in combination with Distinct() to get the number of unique items.
Similar to ToList(), Count() triggers query execution on the server-side.

// Count the number of unique countries:
// =====================================

var numberOfCountries = session
    .Query<Order>()
    .Select(x => x.ShipTo.Country)
    .Distinct()
    .Count();

// Running this on the Northwind sample data,
// will result in 21, which is the number of unique countries.
// Count the number of unique countries:
// =====================================

var numberOfCountries = await asyncSession
    .Query<Order>()
    .Select(x => x.ShipTo.Country)
    .Distinct()
    .CountAsync();

// Running this on the Northwind sample data,
// will result in 21, which is the number of unique countries.
// Count the number of unique countries:
// =====================================

var numberOfCountries = session
    .Advanced
    .DocumentQuery<Order>()
    .SelectFields<string>("ShipTo.Country")
    .Distinct()
    .Count();

// Running this on the Northwind sample data,
// will result in 21, which is the number of unique countries.
// This RQL is intended for use when issuing a Raw Query from the client API.
// Running directly from the Studio will not display the number of results.
// ========================================================================

from "Orders"
select distinct ShipTo.Country
limit 0, 0

Performance cost and alternative approaches

  • Please keep in mind that using Count() with Distinct() might not be efficient for large sets of data
    as it requires scanning all index results to find unique values.

  • Getting the distinct items' count can also be achieved by creating a Map-Reduce index that will aggregate data by the field for which distinct count results are needed.

  • Using a Map-Reduce index is more efficient since computations are done during indexing time and not at query time. The entire dataset is indexed once, whereafter the aggregated value is always kept up to date as indexing will occur only for new/modified data.

Map-Reduce index example:

Index definition:

public class Orders_ByShipToCountry : AbstractIndexCreationTask<Order, Orders_ByShipToCountry.IndexEntry>
{
    public class IndexEntry
    {
        public string Country { get; set; }
        public int CountryCount { get; set; }
    }

    public Orders_ByShipToCountry()
    {
        // The Map phase indexes the country listed in each order document
        // CountryCount is assigned with 1, which will be aggregated in the Reduce phase
        Map = orders => from order in orders
            select new IndexEntry
            {
                Country = order.ShipTo.Country,
                CountryCount = 1
            };

        // The Reduce phase will group the country results and aggregate the CountryCount
        Reduce = results => from result in results
            group result by result.Country
            into g
            select new IndexEntry
            {
                Country = g.Key,
                CountryCount = g.Sum(x => x.CountryCount)
            };
    }
}

Query the index:

// Query the map-reduce index defined above
var queryResult = session
    .Query<Orders_ByShipToCountry.IndexEntry, Orders_ByShipToCountry>()
    .ToList();

// The resulting list contains all index-entry items where each entry represents a country. 
// The size of the list corresponds to the number of unique countries.
var numberOfUniqueCountries = queryResult.Count;
// Query the map-reduce index defined above
var queryResult = await asyncSession
    .Query<Orders_ByShipToCountry.IndexEntry, Orders_ByShipToCountry>()
    .ToListAsync();

// The resulting list contains all index-entry items where each entry represents a country. 
// The size of the list corresponds to the number of unique countries.
var numberOfUniqueCountries = queryResult.Count;
// Query the map-reduce index defined above
var queryResult = session.Advanced
    .DocumentQuery<Orders_ByShipToCountry.IndexEntry, Orders_ByShipToCountry>()
    .ToList();

// The resulting list contains all index-entry items where each entry represents a country. 
// The size of the list corresponds to the number of unique countries.
var numberOfUniqueCountries = queryResult.Count;
from index "Orders/ByShipToCountry"

Combining faceted queries with Map-Reduce:

Faceted queries can be used together with a map-reduce index as another alternative approach.
See the article "Implementing a count(distinct) query in RavenDB" for an example.