Ongoing Tasks: SQL ETL
-
SQL ETL is a task that creates an ETL process where data from a RavenDB database is extracted, transformed, and loaded into a relational database as the destination.
-
In this page:
Supported relational databases
-
RavenDB supports ETL processes to the following relational databases:
- Microsoft SQL Server
- PostgreSQL
- MySQL
- Oracle
-
You must specify the provider type for the target relational database when setting up the
SQL connection string. -
Before starting with SQL ETL, you need to create tables in the target relational database.
These tables will serve as the destinations for records generated by the ETL scripts.
Creating the SQL ETL task
To create an SQL ETL task using the Client API, see Creating the SQL ETL task from the Client API.
To create an SQL ETL task from the Studio open Tasks -> Ongoing Tasks
.
Create the SQL ETL Task
Configuring the SQL tables
Define the target tables where the SQL ETL task will load data.
Define the SQL Tables
Document ID Column
-
For each table, you must specify a column that will store the document ID column. RavenDB will populate this column with the source document ID, enabling the handling of document updates and deletions.
-
Note that the specified column does not need to be the primary key of the table.
-
For performance reasons, you should define indexes on the SQL tables on the relational database side,
at least on the column used to store the document ID.
Insert only
-
The SQL ETL process updates documents in the relational database using DELETE and INSERT statements.
-
If your system is append-only, you can enable the "Insert Only" toggle to instruct RavenDB to insert data without executing DELETE statements beforehand. This can provide a significant performance boost for systems of this kind.
Transformation scripts
The basic characteristics of an SQL ETL script are similar to those of other ETL types.
The script defines what data to extract from the source document, how to transform this data,
and which SQL table to load it to.
A single SQL ETL task can have multiple transformation scripts.
The script is defined per collection, and it cannot be empty.
The script is executed per document from the source collection once the document is created or modified.
The loadTo
method
To specify which SQL table to load the data into, use either of the following methods in your script.
The two methods are equivalent, offering alternative syntax:
-
loadTo<TableName>(obj)
- Here the target table is specified as part of the function name.
- The target <TableName> in this syntax is Not a variable and cannot be used as one,
it is simply a string literal of the target's name.
-
loadTo('TableName', obj)
- Here the target table is passed as an argument to the method.
- Separating the table name from the
loadTo
command makes it possible to include symbols like'-'
and'.'
in table names. This is not possible when theloadTo<TableName>
syntax is used because including special characters in the name of a JavaScript function makes it invalid.
Parameter | Type | Description |
---|---|---|
TableName | string | The name of the target SQL table |
obj | object | The object to transfer |
For example, the following two calls, which load data to "OrdersTable", are equivalent:
loadToOrdersTable(obj)
loadTo('OrdersTable', obj)
Loading to multiple tables
The loadTo
method can be called multiple times in a single script.
That allows you to split a single Order
document having Lines
collection into two tables and insert multiple rows:
The following is a sample script that processes documents from the Orders collection:
// Create an orderData object
// ==========================
var orderData = {
Id: id(this),
OrderLinesCount: this.Lines.length,
TotalCost: 0
};
// Update the orderData's TotalCost field
// ======================================
for (var i = 0; i < this.Lines.length; i++) {
var line = this.Lines[i];
var cost = (line.Quantity * line.PricePerUnit) * ( 1 - line.Discount);
orderData.TotalCost += cost;
// Load the object to SQL table 'OrdersTable'
// ==========================================
loadToOrderLines({
OrderId: id(this),
Qty: line.Quantity,
Product: line.Product,
Cost: line.PricePerUnit
});
}
orderData.TotalCost = Math.round(orderData.TotalCost * 100) / 100;
// Load to SQL table 'Orders'
// ==========================
loadToOrders(orderData);
Loading related documents
Use the load
method to load a related document with the specified ID during script execution.
var company = load(this.Company);
Loading Attachments
You can store binary data that is kept as attachments in RavenDB using the loadAttachment()
method.
For example, if you have the following Attachments table:
CREATE TABLE [dbo].[Attachments]
(
[Id] int identity primary key,
[OrderId] [nvarchar](50) NOT NULL,
[AttachmentName] [nvarchar](50) NULL,
[Data] [varbinary](max) NULL
)
then you can define the following script that loads the document's attachments:
var attachments = this['@metadata']['@attachments'];
for (var i = 0; i < attachments.length; i++) {
var attachment = {
OrderId: id(this),
AttachmentName: attachments[i].Name,
Data: loadAttachment(attachments[i].Name)
};
loadToAttachments(attachment);
}
- Attachments can be also accessed using the
getAttachments()
helper function
(instead of grabbing them from metadata). - The existence of an attachment can be checked by the
hasAttachment(name)
function.
Loading to VARCHAR and NVARCHAR columns
Two additional functions are designed specifically for working with VARCHAR and NVARCHAR types:
varchar(value, size = 50) |
Defines the parameter type as VARCHAR, with the option to specify its size (default is 50 if not provided). |
nvarchar(value, size = 50) |
Defines the parameter type as NVARCHAR, with the option to specify its size (default is 50 if not specified). |
var names = this.Name.split(' ');
loadToUsers(
{
FirstName: varchar(names[0], 30),
LastName: nvarchar(names[1]),
});
Loading to specific column types
The SQL type of the target column can be explicitly specified in the SQL ETL script.
This is done by defining the Type
and the Value
properties for the data being loaded.
-
Type:
The type specifies the SQL column type the value is loaded to.
The type should correspond to the data types used in the target relational database.Supported enums for
Type
include:- SqlDbType - see Microsoft SQL Server
- NpgsqlDbType - see PostgreSQL
- MySqlDbType - see MySQL Data Types
- OracleDbType - see Oracle Data Types
Some databases allow combining enum values using
|
.
For example, usingArray | Double
for the Type is valid for PostgreSQL.If no type is specified, the column type will be detected automatically.
-
Value:
The value contains the actual data to be loaded into the column.
var orderData = {
Id: id(this),
OrderLinesCount: this.OrderLines.length,
Quantities: {
// Specify the Type and Value for 'Quantities':
// ============================================
Type: 'Array | Double',
Value: this.OrderLines.map(function(l) {return l.Quantity;})
},
Products: {
// Specify the Type and Value for 'Products':
// ==========================================
Type: 'Array | Text',
Value: this.OrderLines.map(function(l) {return l.Product;})
},
};
// Load the data into the 'Orders' table
loadToOrders(orderData);
Filtering
To filter some documents out from the ETL, simply omit the loadTo
call:
if (this.ShipTo.Country === 'USA') {
// Load only orders shipped to USA
loadToOrders({ ... });
}
Accessing the metadata
You can access the metadata in the following way:
var value = this['@metadata']['custom-metadata-key'];
Document extensions
The SQL ETL task does not support sending Counters, Time series, or Revisions.
Advanced options
- Command timeout
Number of seconds after which SQL command will timeout.
It overrides the value defined in the ETL.SQL.CommandTimeoutInSec configuration key.
Default:null
(use provider default). - Parameterized deletes
Control whether DELETE statements generated during the ETL process use parameterized SQL queries,
rather than embedding values directly in the query.
Default:true
. - Table quotation
Control whether table names in the generated SQL statements are enclosed in quotation marks.
Default:true
. - Force recompile query
Control whether to force the SQL Server to recompile the query statement using (OPTION(RECOMPILE)
).
Default:false
.
Transaction processing
All records created in a single ETL run, one for each loadTo
call, are sent in a single batch and processed within the same transaction.
Creating the SQL ETL task from the Client API
// Define a connection string to a SQL database destination
// ========================================================
var sqlConStr = new SqlConnectionString
{
Name = "sql-connection-string-name",
// Define destination factory name
FactoryName = "MySql.Data.MySqlClient",
// Define the destination database
// May also need to define authentication and encryption parameters
// By default, encrypted databases are sent over encrypted channels
ConnectionString = "host=127.0.0.1;user=root;database=Northwind"
};
// Deploy (send) the connection string to the server via the PutConnectionStringOperation
// ======================================================================================
var PutConnectionStringOp = new PutConnectionStringOperation<SqlConnectionString>(sqlConStr);
PutConnectionStringResult connectionStringResult = store.Maintenance.Send(PutConnectionStringOp);
// Define the SQL ETL task configuration
// =====================================
var sqlConfiguration = new SqlEtlConfiguration()
{
Name = "mySqlEtlTaskName",
ConnectionStringName = "sql-connection-string-name",
SqlTables =
{
new SqlEtlTable
{
TableName = "Orders", DocumentIdColumn = "Id", InsertOnlyMode = false
},
new SqlEtlTable
{
TableName = "OrderLines", DocumentIdColumn = "OrderId", InsertOnlyMode = false
},
},
Transforms =
{
new Transformation()
{
Name = "scriptName",
Collections = { "Orders" },
Script = @"
var orderData = {
Id: id(this),
OrderLinesCount: this.Lines.length,
TotalCost: 0
};
for (var i = 0; i < this.Lines.length; i++) {
var line = this.Lines[i];
var cost = (line.Quantity * line.PricePerUnit) * ( 1 - line.Discount);
orderData.TotalCost += cost;
loadToOrderLines({
OrderId: id(this),
Qty: line.Quantity,
Product: line.Product,
Cost: line.PricePerUnit
});
}
orderData.TotalCost = Math.round(orderData.TotalCost * 100) / 100;
loadToOrders(orderData);
",
ApplyToAllDocuments = false
}
}
};
// Deploy the SQL ETL task to the server
// =====================================
var addSqlEtlOperation = new AddEtlOperation<SqlConnectionString>(sqlConfiguration);
store.Maintenance.Send(addSqlEtlOperation);
SqlEtlConfiguration
:
Property | Type | Description |
---|---|---|
Name | string |
The SQL ETL task name. |
ConnectionStringName | string |
The registered connection string name. |
SqlTables | List<SqlEtlTable> |
A list of SQL tables that the scripts will load data to. |
Transforms | List<Transformation> |
Your transformation scripts. |
QuoteTables | bool |
Control whether table names in the generated SQL statements are enclosed in quotation marks. Default is true . |
ParameterizeDeletes | bool |
Control whether DELETE statements generated during the ETL process use parameterized SQL queries. Default is true . |
ForceQueryRecompile | bool |
Set to true to force the SQL Server to recompile the query statement using (OPTION(RECOMPILE) ).Default is false . |
CommandTimeout | int? |
Number of seconds after which the SQL command will timeout. |
SqlEtlTable
:
Property | Type | Description |
---|---|---|
TableName | string |
The table name your script will load data to. |
DocumentIdColumn | string |
The column in the destination table that will store the document IDs. |
InsertOnlyMode | bool |
When set to true , RavenDB will insert data directly without executing DELETE statements beforehand.Default is false . |