ETL's three stages are:
- Extract - Extract the documents from the database
- Transform - Transform & filter the documents data according to the supplied script (optional)
- Load - Load (write) the transformed data into the target destination
The ETL process starts with retrieving the documents from the database.
You can choose which documents will be processed by next two stages (Transform and Load).
The possible options are:
- Documents from a single collection
- Documents from multiple collections
- All documents (RavenDB ETL only)
This stage transforms and filters the extracted documents according to a provided script.
Any transformation can be done so that only relevant data is shared.
In addition to ECMAScript 5.1 API, RavenDB introduces the following functions and members:
||The current document (with metadata)
||Returns the document ID
||Load another document.
This will increase the maximum number of allowed steps in a script.
Note: Changes made to the other loaded document will not trigger the ETL process.
Specific ETL functions:
||Load an object to a specified
The target must be either a collection name (RavenDB ETL) or a table name (SQL ETL).
An object will be sent to the destination only if the
loadTo method was called.
||Load an attachment (SQL ETL only)
Documents are extracted and transformed by the ETL process in a batch manner.
The number of documents processed depends on the following configuration limits:
Loading the results to the target destination is the last stage.
Updates are implemented by executing consecutive DELETEs and INSERTs.
When a document is modified, the delete command is sent before the new data is inserted, and both are processed under the same transaction on the destination side.
This applies to both ETL types.
There are two exceptions to this behavior:
- In RavenDB ETL, when documents are loaded to the same collection there is no need to send DELETE because the document on the other side has the same identifier and it will just update it.
- in SQL ETL you can configure to use inserts only, which is a viable option for append-only systems.
In contrast to Replication, ETL is a push-only process that writes data to the destination
whenever documents from the relevant collections were changed. Existing entries on the target will always be overwritten.
Loading data from encrypted database
If a database is encrypted then you must not send data in ETL process using a non encrypted channel by default. It means that a connection to a target must be secured:
- in Raven ETL an URL of a destination server has to use HTTPS (a server certificate of the source server needs to be registered as a client certificate on the destination server)
- in SQL ETL a connection string to SQL database must specify encrypted connection (specific per SQL engine provided)
This validation can be turned off by selecting Allow ETL on a non-encrypted communication channel option in the Studio (or setting
AllowEtlOnNonEncryptedChannel if a task is defined using the client API).
Please note that your data encrypted at rest won't be protected in transit then.