
Graph Querying: Overview


RavenDB's experimental graph support allows you to query your database as if it had a graph structure, making it efficient to recognize relations between data elements and organize them into searchable patterns. Intricate relationships that would render a relational database useless become the asset they are meant to be.


Introduction to graph modelling

  • In The Beginning..
    One of the best known founding moments of graph theory is Leonhard Euler's attempt at solving the Königsberg Bridges riddle, eventually tackling the problem by representing the scenery and its elements in a graph.
    Euler's search for an optimal path is a great illustration of the practicality of graph theory, leading all the way to its present-day effectiveness in managing large and complex data volumes.
  • ..and large data volumes
    Large, complex data volumes represent an important step in the evolution of data management, and are evidently here to stay and develop.
    As relational databases and the data model they enable are inefficient in (and often incapable of) searching and managing big-data volumes with intricate relations, various applications take part in complementing or replacing them.
    Databases capable of running graph queries are a major contribution in this regard. They are not limited to the management of large data volumes, and are often just as comfortable and efficient in handling smaller ones.
  • ..and Multi Model
    It is common to find graph querying as one of the features of a multi-model database, based upon or cooperating with other database features.
    RavenDB's graph capabilities are founded upon a capable document store, and data already deposited in the store can participate in graph querying with no prior arrangements, easing user administration and improving internal logic and data management.

Enabling Graph Querying

Graph Querying is an Experimental feature, still under development. We are happy to provide it, and would be grateful to hear from you regarding your experiences with it.
Like other experimental features, it is disabled by default. You can enable it by following this simple procedure:

  • Open the RavenDB server folder, e.g. C:\Users\Dave\Downloads\RavenDB-4.1.1-windows-x64\Server
  • Open settings.json for editing
  • Enable the Experimental Features.
    • Make sure the JSON file contains the following line (add it if it is missing):
      "Features.Availability": "Experimental"
    • Save settings.json and restart RavenDB Server.
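
The same edit can be scripted. The following is a minimal Python sketch, not an official RavenDB tool: the helper name enable_experimental_features is hypothetical, and it assumes that settings.json is plain JSON (no comments) and that you pass the path to the file in your own server folder.

```python
import json
from pathlib import Path


def enable_experimental_features(settings_path):
    """Set "Features.Availability" to "Experimental" in a settings.json file.

    Hypothetical helper. Returns True if the file was changed,
    False if the setting was already in place.
    """
    path = Path(settings_path)
    settings = json.loads(path.read_text(encoding="utf-8"))

    if settings.get("Features.Availability") == "Experimental":
        return False  # already enabled, leave the file untouched

    settings["Features.Availability"] = "Experimental"
    path.write_text(json.dumps(settings, indent=4), encoding="utf-8")
    return True
```

Call it with the path to your settings.json, then restart RavenDB Server so that the new setting takes effect.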

Designing Graph Queries

Graph representations

Graph querying enhances RQL with simple vocabulary and syntax that allow you to approach your existing data as if it had been designed graphically. Here's a basic query that shows relations between employees, using documents taken from the Northwind database RavenDB lets you install as sample data.

  • Graph query:
    match(employees as employee)-[ReportsTo as reportsTo]->(employees as incharge)
  • Query results are provided by the Studio both graphically and textually.

Illustrative graph results
Textual graph results


Basic Terms, Syntax and Vocabulary

  • Graph Elements
    Data elements ("Nodes") and their relations ("Edges") are represented in a graph as equally important.

    • Data Nodes*
      A "data node" can be a documents collection, or a subset of selected documents.
      * We use the term "data nodes" to make it easier for you to distinguish between the data elements we talk about here, and servers of a cluster (that are also called "nodes").
    • Edges
      An "edge" is a link between nodes, that joins them in a relation of some sort.
      A RavenDB edge is simply a string field within a document, that holds the unique identifier (ID) of another document.
      RavenDB's edges are always directional, pointing from one data node to another.
  • Graph results
    Here are the results of a very simple query.
    Figure 1. Simple Relation

    1. The first data node is Dogs/Ruffus, a document named Ruffus in the Dogs collection.
      This is what the document may look like:

      Document: Dogs/Ruffus
          {
             "Owner": "Owners/John",
             "Name": "...",
             "@metadata": {
             "@collection": "Dogs",
             "@id": "Dogs/Ruffus"
             }
          }

    2. The arrow titled ownedBy is the edge, indicating that Ruffus belongs to John.
      You can find its definition in the Ruffus document as the "Owner" field containing John's ID.

    3. The second data node is Owners/John, a document named John in the Owners collection.

      Document: Owners/John
          {
             "Name": "...",
             "@metadata": {
             "@collection": "Owners",
             "@id": "Owners/John"
             }
          }

  • Graph query
    Here's a query that could have produced the results shown above:
    match(Dogs)-[Owner as ownedBy]->(Owners)
    Let's go through its parts and syntax.

    • The match keyword instructs the retrieval of documents that match specified conditions.
      In this case, the conditions specify documents of one collection, connected by ownership to documents of another collection.
    • The data nodes are indicated by surrounding parentheses: (Dogs) and (Owners).
      In this example, they are the Dogs and Owners document collections.
    • The edge is placed within brackets: [Owner as ownedBy].
      It has a specific direction, pointing from Ruffus to John.
      A hyphen connects it to the node it emerges from: -
      An "arrow" composed of a hyphen and a greater-than symbol connects it to the node it points at: ->
    • You can tag graph elements (nodes and edges) with whatever alias you choose.
      Use the as keyword to do so, as in -
      match(Dogs as dogs)-[Owner as ownedBy]->(Owners as owner)
      Aliases are optional when elements are defined implicitly, and required when elements are defined explicitly.
      Even where they are optional, aliases are often recommended and sometimes essential.
      • In our sample query, the Owner relation between a dog and its owner is given the alias "ownedBy", because we wanted to emphasize this aspect of the relation. Another query may emphasize a different aspect by using the alias "occupant", "patient" or something else.
      • The same node or edge may appear multiple times in a query, sometimes in very different roles.
        Using different aliases may be technically needed in such cases.
      • To eliminate an entity from the results, use a sequence of _ symbols as an alias (i.e. _, __, ___..)
      • Each alias needs to be unique.
        Note that this is true for _ aliases as well: use each _ sequence (_, __, ___..) only once.
  • Graph Queries Flow
    • Lucene indexing
      When a graph query is executed, the first thing RavenDB does is index each node clause using Lucene.
      The result is a group of indexed tables that the graph engine can work with efficiently.
    • Handling relations
      If the query comprises edges, the graph engine uses them now, going through the tables prepared during the first phase and resolving the relations between table elements.
      Be aware that this part of a query is performed in memory and is not indexed, so each rerun of the query repeats this work in full.
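
To make the two phases more concrete, here is a rough in-memory Python sketch of the idea behind match(Dogs)-[Owner as ownedBy]->(Owners). It is an illustration only and not RavenDB's implementation: select_nodes and match are hypothetical helpers, a plain scan stands in for the Lucene-indexed first phase, and the Dogs/Stray document is an invented example of a node with no edge.

```python
# In-memory stand-ins for the sample documents shown earlier.
# "Dogs/Stray" is a hypothetical extra document that has no Owner field.
documents = {
    "Dogs/Ruffus": {"Owner": "Owners/John", "@metadata": {"@collection": "Dogs"}},
    "Dogs/Stray": {"@metadata": {"@collection": "Dogs"}},
    "Owners/John": {"@metadata": {"@collection": "Owners"}},
}


def select_nodes(docs, collection):
    """Phase 1: pick the documents that belong to one node clause.

    RavenDB uses Lucene indexing for this step; a plain scan is used here.
    """
    return {
        doc_id: doc
        for doc_id, doc in docs.items()
        if doc["@metadata"]["@collection"] == collection
    }


def match(docs, source_collection, edge_field, target_collection):
    """Phase 2: follow the edge field from each source node, in memory."""
    sources = select_nodes(docs, source_collection)
    targets = select_nodes(docs, target_collection)
    results = []
    for source_id, source in sources.items():
        target_id = source.get(edge_field)  # the edge is just a stored ID
        if target_id in targets:
            results.append((source_id, edge_field, target_id))
    return results


pairs = match(documents, "Dogs", "Owner", "Owners")
print(pairs)  # prints [('Dogs/Ruffus', 'Owner', 'Owners/John')]
```

Dogs/Stray is selected as a data node in the first phase, but drops out in the second phase because it has no Owner edge pointing at a document in Owners.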

FAQ

Q: When should or shouldn't I use graph queries?

A: There are configurations and situations for which graph querying is an optimal solution, and other circumstances that require different approaches. You may find this list helpful in determining whether to give it a go.

  • Use graph querying when -

    • Relations between data elements are a concern.
      If your data continuously grows in quantity and intricacy, queries become increasingly complicated, and results take longer and longer to arrive, graph queries are likely to be the solution you're looking for.
    • You look for optimized paths.
      As their history suggests, graph queries excel at finding optimal paths between related nodes. Graph-using applications may find the fastest way to a suitable host, the quickest publicity route to a destination audience, or the cheapest way to get a specified product.
    • You want to collect data from a web of relations.
      You can dynamically build user profiles, product pages, vendor data sheets and so on, using graph queries that collect data related to them in the first degree, second degree, third degree and so on.
  • Graph querying may not be an ideal solution for you if -

    • Your documents are isolated from each other by structure or preference.
    • Your data is pre-arranged and pre-indexed, requiring no ongoing relation queries to refresh its contents.
    • A different model has a clear advantage, e.g. key/value store for key/value customer lists, relational database for fixed tables, etc.
    • Your queries start with a broad search.
      Graph queries work best when the search starts with a definite starting point and lays out a path from there on.
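
As an illustration of starting from a definite point and collecting related data by degree, here is a small Python sketch. It shows the general idea as a breadth-first walk over in-memory documents and is not RavenDB's graph engine: the related_by_degree helper, the document IDs and the sample data are all hypothetical, and the edges are modelled on the ReportsTo field from the employees query shown earlier.

```python
from collections import deque

# Hypothetical documents: each employee's ReportsTo field holds a manager's ID.
employees = {
    "employees/1": {"ReportsTo": "employees/2"},
    "employees/2": {"ReportsTo": "employees/3"},
    "employees/3": {},
}


def related_by_degree(docs, start_id, edge_fields, max_degree):
    """Walk outward from start_id along the given edge fields.

    Returns {doc_id: degree} for every document reached within max_degree
    steps, excluding the starting document itself.
    """
    degrees = {start_id: 0}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        if degrees[current] == max_degree:
            continue  # do not expand beyond the requested degree
        for field in edge_fields:
            target = docs.get(current, {}).get(field)
            if target in docs and target not in degrees:
                degrees[target] = degrees[current] + 1
                queue.append(target)
    del degrees[start_id]
    return degrees


print(related_by_degree(employees, "employees/1", ["ReportsTo"], 2))
# prints {'employees/2': 1, 'employees/3': 2}
```

Because the walk begins at a single known document, the amount of work is bounded by the relations actually reachable from it, which is why a definite starting point suits this kind of query better than a broad search.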