Ongoing Tasks: RavenDB ETL


  • RavenDB ETL Task creates an ETL process for a given database when the destination is another RavenDB database.

  • It can be defined in code or using the Studio by creating a RavenDB ETL task under Settings -> Manage Ongoing Tasks.

  • One RavenDB ETL task can have multiple transformation scripts.

  • Each script can be defined on a single collection, multiple selected collections or be applied to all documents regardless of the associated collection.

  • The script is executed per document once the document is created or modified.

  • In this page:

Figure 1. Configure RavenDB ETL task

RavenDB ETL in Studio

Transformation Script Options

Loading Documents

  • In order to load data to the destination database you must call the loadTo<CollectionName>() method and pass a JS object.

  • Indicating the collection name in the load method is a convention.

  • The objects passed to the loadTo method will be sent to the target database in the last Load stage.

  • All results created in a single ETL run will be sent in a single batch and processed transactionally on the destination.

  • For example, if you want to write data to Employees collection you need to call the following method in the script body:

loadToEmployees({ ... });
  • The method parameter must be a JS object. You can create it as follows:

loadToEmployees({
    Name: this.FirstName + " " + this.LastName
});
  • Or simply transform the current document object and pass it:

this.Name = this.FirstName + " " + this.LastName;

delete this.Address;
delete this.FirstName;
delete this.LastName;

loadToEmployees(this);

Alternative Syntax

The target collection name can be passed to the loadTo command separately, as a string argument, using this syntax: loadTo('Target', obj)

  • Example:
    The following two calls to loadTo are equivalent.
    loadToEmployees(this);
    loadTo('Employees', this);

  • The target name 'Employees' in this syntax is not a variable and cannot be used as one: it is simply a string literal of the target's name.
  • Separating the target name from the loadTo command makes it possible to include symbols like - and . in target names. This is not possible when the standard loadToEmployees syntax is used because including special characters in the name of a JS function turns it invalid.

Documents Identifiers

  • The documents generated in the destination database are given an id according to the collection name specified in the loadTo method.

  • If the specified collection is the same as the original one then the document is loaded to the same collection and the original identifier is preserved.

  • For example, the following ETL script defined on Employees collection will keep the same idenifiers in the target database:

// original identifier will be preserved
loadToEmployees({ ... });
  • If the 'loadTo' method indicates a different target collection, e.g. People,
    then the employee documents will get new identifiers that combine the original ID and the new collection name in the destination database.

// new identifier will be generated
loadToPeople({ ... });
  • In addition, ETL appends the symbol / to the requested id so that the target database will generate identifiers on its side.
    As a result, documents in the People collection in the target database will have identifiers such as: employees/1-A/people/0000000000000000001-A.

Filtering

  • Documents can be filtered from the ETL by calling the loadTo method only for documents that match some condition:

if (this.Active) {
    // load only active users
    loadToEmployees({ ... });
}

Loading Data from Other Documents

  • The load method loads a document with a specified ID during script execution.

// this.ReportsTo has some document ID
var manager = load(this.ReportsTo);

Accessing Metadata

  • The metadata can be accessed in the following way:

var value = this['@metadata']['custom-metadata-key'];

Creating Multiple Documents from a Single Document

  • The loadTo method can be called multiple times in a single script.
    That allows you to split a single source document into multiple documents on the destination database:

// documents will be creaed in `Addresses` collection
loadToAddresses({
    City: this.Address.City ,
    Country: this.Address.Country ,
    Address: this.Address.Line1
});

delete this.Address;

// documents will be created in the `Employees` collection
loadToEmployees(this);

Empty Script

  • An ETL task can be created with an empty script.
  • The documents will be transferred without any modifications to the same collection as the source document.

Attachments & Revisions

  • Attachments are sent automatically only when you send a full collection to the destination using an empty script.
    If you do use a script, there is currently no way to indicate that attachments should also be sent.

  • Revisions are not sent by the ETL process.
    But, if revisions are configured on the destination database, then when the target document is overwritten by the ETL process a revision will be created as expected.

Deletions

  • As described above, the identifiers created for the sent documents can be different from the original source documents identifiers.
    The source isn't aware of the new IDs created so documents deletion requires a special approach.

  • In order to remove the matching documents on the destination side, the deletion of a document sends a command to remove documents that have an ID with a well-known prefix.

  • For example:

    1. Document employees/1-A is processed by ETL and put into People collection with ID:
      employees/1-A/people/0000000000000000001-A.
    2. Deletion of the employees/1-A document on the source side triggers sending a command that deletes documents having the following prefix in their ID: employees/1-A/people/.
  • If you output multiple documents from a single document then multiple delete commands will be sent, one for each prefix containing the destination collection name.

  • When documents are sent to the same collection and IDs don't change then deletion on the source results in sending a single delete command for a given ID.

Example

  • The following is an example of a RavenDB ETL script processing documents from Employees collection:

var managerName = null;

if (this.ReportsTo !== null)
{
    var manager = load(this.ReportsTo);
    managerName = manager.FirstName + " " + manager.LastName;
}

// load documents to `Employees` collection in the destination
loadToEmployees({
    // the loaded documents will have these fields:
    Name: this.FirstName + " " + this.LastName,
    Title: this.Title,
    BornOn: new Date(this.Birthday).getFullYear(),
    Manager: managerName
});