Querying: Distinct
The Distinct method allows you to remove duplicates from the result. Items are compared based on the fields listed in the select section of the query.
// returns sorted list of countries w/o duplicates
IList<string> countries = session
    .Query<Order>()
    .OrderBy(x => x.ShipTo.Country)
    .Select(x => x.ShipTo.Country)
    .Distinct()
    .ToList();
// returns sorted list of countries w/o duplicates
IList<string> countries = session
    .Advanced
    .DocumentQuery<Order>()
    .OrderBy(x => x.ShipTo.Country)
    .SelectFields<string>("ShipTo.Country")
    .Distinct()
    .ToList();
from Orders
select distinct ShipTo.Country
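To illustrate how the selected fields determine what counts as a duplicate, here is a minimal in-memory sketch that uses LINQ to Objects rather than a RavenDB session. The Order and Address records, the DistinctSketch class, and the sample data are hypothetical stand-ins for the documents queried above. The sketch only shows the comparison semantics, not how the server evaluates the query.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the Order documents used above.
public record Address(string Country, string City);
public record Order(string Id, Address ShipTo);

public static class DistinctSketch
{
    // Hypothetical sample data: three French orders and one German order.
    public static readonly List<Order> Orders = new()
    {
        new Order("orders/1", new Address("France", "Paris")),
        new Order("orders/2", new Address("France", "Lyon")),
        new Order("orders/3", new Address("France", "Paris")),
        new Order("orders/4", new Address("Germany", "Berlin")),
    };

    // Only Country is selected, so two orders are duplicates
    // whenever their countries match.
    public static List<string> Countries() =>
        Orders
            .Select(x => x.ShipTo.Country)
            .Distinct()
            .OrderBy(country => country)
            .ToList();

    // Country and City are both selected, so both values must match
    // for two orders to be treated as duplicates.
    public static List<(string Country, string City)> CountriesAndCities() =>
        Orders
            .Select(x => (x.ShipTo.Country, x.ShipTo.City))
            .Distinct()
            .ToList();
}
```

With this sample data, Countries() yields France and Germany, while CountriesAndCities() yields three entries, because only the two orders shipped to Paris collapse into one.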
Paging
Please read the dedicated article about paging through tampered results. This kind of paging is required when using the distinct keyword.
Count
RavenDB supports returning counts when the distinct operation is used.
var numberOfCountries = session
    .Query<Order>()
    .Select(x => x.ShipTo.Country)
    .Distinct()
    .Count();
var numberOfCountries = session
    .Advanced
    .DocumentQuery<Order>()
    .SelectFields<string>("ShipTo.Country")
    .Distinct()
    .Count();
Performance
Please keep in mind that this operation might not be efficient for large sets of data, because all of the index results must be scanned in order to find the unique values.
The same result can be achieved by creating a Map-Reduce index that groups the data by the field whose distinct values you want. For example:
public class Order_Countries : AbstractIndexCreationTask<Order, Order_Countries.Result>
{
    public class Result
    {
        public string Country { get; set; }
    }

    public Order_Countries()
    {
        Map = orders => from o in orders
                        select new Result
                        {
                            Country = o.ShipTo.Country
                        };

        Reduce = results => from r in results
                            group r by r.Country into g
                            select new Result
                            {
                                Country = g.Key
                            };
    }
}
var numberOfCountries = session
    .Query<Order_Countries.Result, Order_Countries>()
    .Count();
var numberOfCountries = session
    .Advanced
    .DocumentQuery<Order_Countries.Result, Order_Countries>()
    .Count();
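The equivalence between the two approaches can be sketched in memory with LINQ to Objects. This is only an illustration of the grouping logic, assuming a small hypothetical list of country values and a hypothetical DistinctVersusGroupingSketch class. In RavenDB, the Map-Reduce index performs the grouping at indexing time, so the count query reads the already reduced entries instead of searching for unique values.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DistinctVersusGroupingSketch
{
    // Hypothetical ShipTo.Country values, one per order.
    public static readonly List<string> ShipToCountries = new()
    {
        "France", "Germany", "France", "Brazil", "Germany", "France",
    };

    // Mirrors Select(...).Distinct().Count(): every value is visited
    // at query time to find the unique ones.
    public static int CountWithDistinct() =>
        ShipToCountries.Distinct().Count();

    // Mirrors the Reduce step of Order_Countries: one entry per group key,
    // so counting the entries gives the number of unique countries.
    public static int CountWithGrouping() =>
        ShipToCountries
            .GroupBy(country => country)
            .Select(g => g.Key)
            .Count();
}
```

Both methods return the same number for the same input. The difference in RavenDB is when the work is done, not what the result is.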