ETL's three stages are:
- Extract - Extract the documents from the database
- Transform - Transform & filter the documents data according to the supplied script (optional)
- Load - Load (write) the transformed data into the target destination
The ETL process starts with retrieving the documents from the database.
You can choose which documents will be processed by next two stages (Transform and Load).
The possible options are:
- Documents from a single collection
- Documents from multiple collections
- All documents (RavenDB ETL & OLAP ETL only)
This stage transforms and filters the extracted documents according to a provided script.
Any transformation can be done so that only relevant data is shared.
A task can be provided with multiple transformation scripts.
Different scripts run in separate processes, allowing multiple scripts to run in parallel.
In addition to the ECMAScript 5.1 API, RavenDB introduces the following functions and members:
||The current document (with metadata)
||Returns the document ID
||Load another document.
This will increase the maximum number of allowed steps in a script.
Note: Changes made to the other loaded document will not trigger the ETL process.
Specific ETL functions:
||Load an object to a specified target.
There are several possible flavors to the syntax of this command,
find the general details below and more in the documentation for each ETL type.
Note: An object will be sent to the destination only if the
loadTo method was called.
||Load an attachment of the current document
||Check if an attachment with a given name exists for the current document
||Get a collection of attachments details for the current document. Each item has the following properties
||Add an attachment to a transformed document that will be sent to a target (
<doc>). Specific for Raven ETL only, see here for details
Documents are extracted and transformed by the ETL process in a batch manner.
The number of documents processed depends on the following configuration limits:
Loading the results to the target destination is the last stage.
An object can generally be loaded to a specified target using one of the below templates:
- The target is specified as a part of the
The target is specified as an argument of the
- The target name in this syntax is not a variable and cannot be used as one:
it is simply a string literal of the target's name.
- Separating the target name from the
loadTo command makes it possible to include symbols like
. in target names. This is not possible when the standard
loadToOrders syntax is
used because special characters are invalid in the name of a JS function.
- Note that the general syntax specified above,
loadTo('Target', obj), changes for some
ETL types. Find the accurate syntax for each ETL type in the type's specific documentation.
The target must be:
- RavenDB ETL: a collection name
- SQL ETL: a table name
- OLAP ETL: a Folder name
- Elasticsearch ETL: an Index name
- Kafka ETL: a topic name
Updates are implemented by executing consecutive DELETEs and INSERTs.
When a document is modified, the delete command is sent before the new data is inserted and both are processed under the same transaction on the destination side.
This applies to both ETL types.
There are two exceptions to this behavior:
- In RavenDB ETL, when documents are loaded to the same collection there is no need to send DELETE because the document on the other side has the same identifier and will just update it.
- in SQL ETL you can configure to use inserts only, which is a viable option for append-only systems.
In contrast to Replication, ETL is a push-only process that writes data to the destination whenever documents from the relevant collections were changed. Existing entries on the target will always be overwritten.
Loading data from encrypted database
If a database is encrypted then you must not send data in ETL process using a non encrypted channel by default. It means that a connection to a target must be secured:
- In Raven ETL, a URL of a destination server has to use HTTPS (a server certificate of the source server needs to be registered as a client certificate on the destination server)
- in SQL ETL, a connection string to an SQL database must specify encrypted connection (specific per SQL engine provided)
This validation can be turned off by selecting the Allow ETL on a non-encrypted communication channel option in the Studio (or setting
AllowEtlOnNonEncryptedChannel if a task is defined using the client API).
Please note that your data encrypted at rest won't be protected in transit then.