Ongoing Tasks: SQL ETL



Supported relational databases

  • RavenDB supports ETL processes to the following relational databases:

    • Microsoft SQL Server
    • PostgreSQL
    • MySQL
    • Oracle
  • You must specify the provider type for the target relational database when setting up the
    SQL connection string.

  • Before starting with SQL ETL, you need to create tables in the target relational database.
    These tables will serve as the destinations for records generated by the ETL scripts.
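
For example, a destination table matching the orderData object used in the sample scripts later in
this article could be created as follows. This is only an illustrative sketch; define whatever columns
your transformation scripts actually load:

CREATE TABLE [dbo].[Orders]
(
    [Id] [nvarchar](50) NOT NULL,
    [OrderLinesCount] [int] NULL,
    [TotalCost] [decimal](18,2) NULL
)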

Creating the SQL ETL task

To create an SQL ETL task using the Client API, see Creating the SQL ETL task from the Client API.
To create an SQL ETL task from the Studio, open Tasks -> Ongoing Tasks.

Figure: Configure SQL ETL task

Figure: Create the SQL ETL Task

Configuring the SQL tables

Define the target tables where the SQL ETL task will load data.

Figure: Define SQL tables

Figure: Define the SQL Tables

Document ID Column

  • For each table, you must specify a column that will store the document ID. RavenDB will populate this column with the source document ID, enabling it to handle document updates and deletions.

  • Note that the specified column does not need to be the primary key of the table.

  • For performance reasons, you should define indexes on the destination tables
    (on the relational database side), at least on the column used to store the document ID.
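
For example, continuing the illustrative Orders table sketched above, where the Id column stores
the document IDs:

CREATE INDEX [IX_Orders_Id] ON [dbo].[Orders] ([Id])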

Insert only

  • The SQL ETL process updates documents in the relational database using DELETE and INSERT statements.

  • If your system is append-only, you can enable the "Insert Only" toggle to instruct RavenDB to insert data without executing DELETE statements beforehand. This can provide a significant performance boost for systems of this kind.
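
When the task is defined from the Client API (see the full example below), this toggle corresponds to
the InsertOnlyMode property of the relevant SqlEtlTable. For example:

new SqlEtlTable
{
    TableName = "OrderLines",
    DocumentIdColumn = "OrderId",

    // Insert rows without issuing DELETE statements first
    InsertOnlyMode = true
}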

Transformation scripts

The basic characteristics of an SQL ETL script are similar to those of other ETL types.
The script defines what data to extract from the source document, how to transform this data,
and which SQL table to load it to.

A single SQL ETL task can have multiple transformation scripts.
The script is defined per collection, and it cannot be empty.
The script is executed per document from the source collection once the document is created or modified.
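
For example, a minimal script for the Orders collection could extract two fields from each document
and load them into a target table named Orders (the loadTo syntax used here is explained in the
next section):

// A minimal transformation script for the Orders collection
loadToOrders({
    Id: id(this),                        // the source document ID
    OrderLinesCount: this.Lines.length   // a value computed from the document
});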


The loadTo method

To specify which SQL table to load the data into, use either of the following methods in your script.
The two methods are equivalent, offering alternative syntax:

  • loadTo<TableName>(obj)

    • Here the target table is specified as part of the function name.
    • The target <TableName> in this syntax is not a variable and cannot be used as one;
      it is simply the literal name of the target table, embedded in the function name.
  • loadTo('TableName', obj)

    • Here the target table is passed as an argument to the method.
    • Separating the table name from the loadTo command makes it possible to include symbols like '-' and '.' in table names. This is not possible when the loadTo<TableName> syntax is used because including special characters in the name of a JavaScript function makes it invalid.
Parameter   Type     Description
TableName   string   The name of the target SQL table
obj         object   The object to transfer

For example, the following two calls, which load data to "OrdersTable", are equivalent:

  • loadToOrdersTable(obj)
  • loadTo('OrdersTable', obj)

Loading to multiple tables

The loadTo method can be called multiple times in a single script. This allows you, for example, to split a single Order document, together with its Lines collection, across two tables and insert multiple rows:

The following is a sample script that processes documents from the Orders collection:

// Create an orderData object
// ==========================
var orderData = {
    Id: id(this),
    OrderLinesCount: this.Lines.length,
    TotalCost: 0
};

// Update the orderData's TotalCost field
// ======================================
for (var i = 0; i < this.Lines.length; i++) {
    var line = this.Lines[i];    
    var cost = (line.Quantity * line.PricePerUnit) * ( 1 - line.Discount);
    orderData.TotalCost += cost;

    // Load the object to SQL table 'OrderLines'
    // ==========================================
    loadToOrderLines({
        OrderId: id(this),
        Qty: line.Quantity,
        Product: line.Product,
        Cost: line.PricePerUnit
    });
}

orderData.TotalCost = Math.round(orderData.TotalCost * 100) / 100;

// Load to SQL table 'Orders'
// ==========================
loadToOrders(orderData);

Loading related documents

Use the load method to load a related document with the specified ID during script execution:

var company = load(this.Company);
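
The loaded document can then be used like any other value in the script. Below is a minimal sketch,
assuming the related Company document has a Name field and the target Orders table has a matching
CompanyName column:

// Load the related Company document and include one of its fields in the row
var company = load(this.Company);

loadToOrders({
    Id: id(this),
    CompanyName: company.Name
});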

Loading Attachments

You can load binary data that is stored as attachments on RavenDB documents using the loadAttachment() method.
For example, if you have the following Attachments table:

CREATE TABLE [dbo].[Attachments]
(
    [Id] int identity primary key,
    [OrderId] [nvarchar](50) NOT NULL,
    [AttachmentName] [nvarchar](50) NULL,
    [Data] [varbinary](max) NULL
)

then you can define the following script that loads the document's attachments:

var attachments = this['@metadata']['@attachments'];

for (var i = 0; i < attachments.length; i++) {
    var attachment = {
        OrderId: id(this),
        AttachmentName: attachments[i].Name,
        Data: loadAttachment(attachments[i].Name)
    };
    loadToAttachments(attachment);
}
  • Attachments can also be accessed using the getAttachments() helper function
    (instead of reading them from the document metadata).
  • The existence of an attachment can be checked with the hasAttachment(name) function.
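
For example, the script above can be sketched with these helpers instead of reading '@metadata'
directly, assuming the entries returned by getAttachments() expose the same Name property as the
metadata entries:

// Iterate over the document's attachments using the helper function
var attachments = getAttachments();

for (var i = 0; i < attachments.length; i++) {
    loadToAttachments({
        OrderId: id(this),
        AttachmentName: attachments[i].Name,
        Data: loadAttachment(attachments[i].Name)
    });
}

// hasAttachment(name) can be used to check for a specific attachment, e.g.:
// if (hasAttachment('invoice.pdf')) { ... }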

Loading to VARCHAR and NVARCHAR columns

Two additional functions are designed specifically for working with VARCHAR and NVARCHAR types:

  • varchar(value, size = 50) - Defines the parameter type as VARCHAR, with the option to specify
    its size (default is 50 if not provided).
  • nvarchar(value, size = 50) - Defines the parameter type as NVARCHAR, with the option to specify
    its size (default is 50 if not specified).

var names = this.Name.split(' ');

loadToUsers({
    FirstName: varchar(names[0], 30),
    LastName: nvarchar(names[1])
});

Loading to specific column types

The SQL type of the target column can be explicitly specified in the SQL ETL script.
This is done by defining the Type and the Value properties for the data being loaded.

  • Type:
    The type specifies the SQL column type the value is loaded to.
    The type should correspond to the data types used in the target relational database.

    Supported enums for Type include:

    Some databases allow combining enum values using |.
    For example, using Array | Double for the Type is valid for PostgreSQL.

    If no type is specified, the column type will be detected automatically.

  • Value:
    The value contains the actual data to be loaded into the column.

var orderData = {
    Id: id(this),
    OrderLinesCount: this.OrderLines.length,
    Quantities: {
        // Specify the Type and Value for 'Quantities':
        // ============================================ 
        Type: 'Array | Double',
        Value: this.OrderLines.map(function(l) {return l.Quantity;})
    },
    Products: {
        // Specify the Type and Value for 'Products':
        // ==========================================
        Type: 'Array | Text',
        Value: this.OrderLines.map(function(l) {return l.Product;})
    },
};

// Load the data into the 'Orders' table
loadToOrders(orderData);

Filtering

To filter some documents out from the ETL, simply omit the loadTo call:

if (this.ShipTo.Country === 'USA') {
    // Load only orders shipped to USA
    loadToOrders({ ... });
}
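
For example, a complete version of the script above, reusing fields from the earlier examples:

if (this.ShipTo.Country === 'USA') {
    // Only orders shipped to the USA produce a row; all other documents are skipped
    loadToOrders({
        Id: id(this),
        OrderLinesCount: this.Lines.length
    });
}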

Accessing the metadata

You can access the metadata in the following way:

var value = this['@metadata']['custom-metadata-key'];
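
A metadata value can be loaded into a column like any other field. The sketch below assumes the
target Orders table has a LastModified column; '@last-modified' is part of RavenDB's built-in
document metadata:

loadToOrders({
    Id: id(this),
    LastModified: this['@metadata']['@last-modified']
});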

Document extensions

The SQL ETL task does not support sending Counters, Time series, or Revisions.

Advanced options

  • Command timeout
    The number of seconds after which the SQL command will time out.
    It overrides the value defined in the ETL.SQL.CommandTimeoutInSec configuration key.
    Default: null (use provider default).
  • Parameterized deletes
    Control whether DELETE statements generated during the ETL process use parameterized SQL queries,
    rather than embedding values directly in the query.
    Default: true.
  • Table quotation
    Control whether table names in the generated SQL statements are enclosed in quotation marks.
    Default: true.
  • Force recompile query
    Control whether to force SQL Server to recompile the query statement using OPTION(RECOMPILE).
    Default: false.
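
When the task is defined from the Client API, these options map to properties of SqlEtlConfiguration
(see the property table below). A sketch with illustrative values:

var configuration = new SqlEtlConfiguration()
{
    Name = "mySqlEtlTaskName",
    ConnectionStringName = "sql-connection-string-name",

    CommandTimeout = 90,          // seconds; null (the default) falls back to the provider default
    ParameterizeDeletes = true,   // default
    QuoteTables = true,           // default
    ForceQueryRecompile = false   // default

    // SqlTables and Transforms must be defined as well - see the full example below
};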

Transaction processing

All records created in a single ETL run, one for each loadTo call, are sent in a single batch and processed within the same transaction.

Creating the SQL ETL task from the Client API

// Define a connection string to a SQL database destination
// ========================================================
var sqlConStr = new SqlConnectionString
{
    Name = "sql-connection-string-name",

    // Define destination factory name
    FactoryName = "MySql.Data.MySqlClient",

    // Define the destination database
    // May also need to define authentication and encryption parameters
    // By default, encrypted databases are sent over encrypted channels
    ConnectionString = "host=127.0.0.1;user=root;database=Northwind"
};
               
// Deploy (send) the connection string to the server via the PutConnectionStringOperation
// ======================================================================================
var PutConnectionStringOp = new PutConnectionStringOperation<SqlConnectionString>(sqlConStr);
PutConnectionStringResult connectionStringResult = store.Maintenance.Send(PutConnectionStringOp);

// Define the SQL ETL task configuration
// =====================================
var sqlConfiguration = new SqlEtlConfiguration()
{
    Name = "mySqlEtlTaskName",
    ConnectionStringName = "sql-connection-string-name",
    
    SqlTables =
    {
        new SqlEtlTable
        {
            TableName = "Orders", DocumentIdColumn = "Id", InsertOnlyMode = false
        },
        new SqlEtlTable
        {
            TableName = "OrderLines", DocumentIdColumn = "OrderId", InsertOnlyMode = false
        },
    },
    
    Transforms =
    {
        new Transformation()
        {
            Name = "scriptName",
            Collections = { "Orders" },
            
            Script = @"
                var orderData = {
                    Id: id(this),
                    OrderLinesCount: this.Lines.length,
                    TotalCost: 0
                };

                for (var i = 0; i < this.Lines.length; i++) {
                    var line = this.Lines[i];
                    var cost = (line.Quantity * line.PricePerUnit) * ( 1 - line.Discount);
                    orderData.TotalCost += cost;

                    loadToOrderLines({
                        OrderId: id(this),
                        Qty: line.Quantity,
                        Product: line.Product,
                        Cost: line.PricePerUnit
                    });
                }

                orderData.TotalCost = Math.round(orderData.TotalCost * 100) / 100;
                loadToOrders(orderData);
            ",
            
            ApplyToAllDocuments = false
        }
    }
};

// Deploy the SQL ETL task to the server
// =====================================
var addSqlEtlOperation = new AddEtlOperation<SqlConnectionString>(sqlConfiguration);
store.Maintenance.Send(addSqlEtlOperation);

SqlEtlConfiguration:

Property               Type                   Description
Name                   string                 The SQL ETL task name.
ConnectionStringName   string                 The registered connection string name.
SqlTables              List<SqlEtlTable>      A list of SQL tables that the scripts will load data to.
Transforms             List<Transformation>   Your transformation scripts.
QuoteTables            bool                   Control whether table names in the generated SQL statements
                                              are enclosed in quotation marks. Default is true.
ParameterizeDeletes    bool                   Control whether DELETE statements generated during the ETL
                                              process use parameterized SQL queries. Default is true.
ForceQueryRecompile    bool                   Set to true to force SQL Server to recompile the query
                                              statement using OPTION(RECOMPILE). Default is false.
CommandTimeout         int?                   Number of seconds after which the SQL command will time out.

SqlEtlTable:

Property           Type     Description
TableName          string   The table name your script will load data to.
DocumentIdColumn   string   The column in the destination table that will store the document IDs.
InsertOnlyMode     bool     When set to true, RavenDB will insert data directly without executing
                            DELETE statements beforehand. Default is false.