RavenDB ETL Task
A RavenDB ETL Task is an ETL (Extract, Transform, Load) process
that transfers data from the current database to another RavenDB database instance
outside of the Database Group.
The data sent can be filtered and modified by transformation scripts.
Learn more about the benefits of using ETL in Why use ETL.
ETL is different from data replication. See RavenDB ETL Task -vs- Replication Task.
In this page:
RavenDB ETL Task - Definition
Create New RavenDB ETL Task
Task Name (Optional)
- Enter a name of your choice
- If no name is given, the RavenDB server will create one for you based on the defined connection string
Preferred Node (Optional)
- Select a preferred mentor node from the Database Group to be the responsible node for this RavenDB ETL Task
- If not selected, then the cluster will assign a responsible node (see Members Duties)
Connection String
- Select an existing connection string from the list or create a new one
- The connection string defines the destination database and the URLs of its Database Group server nodes
RavenDB ETL Task - Transform Scripts
Click to add a new script
Edit or Delete an existing script
Enter the script to use.
In the above example, each source document from the 'Products' collection will be sent to the 'ProductsInfo' collection in the destination database db3
(which is external to the cluster).
Each new document will have 2 fields: 'ProductName' & 'SupplierName'.
For detailed script options see Transformation Script Options.
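As a rough illustration, the script described above might look like the sketch below. Inside RavenDB, `this` is the source document and the `loadTo<CollectionName>()` and `load()` functions are supplied by the ETL runtime. Here they are replaced by small stand-ins so the snippet runs on its own. The source field names `Name` and `Supplier`, and the sample data, are assumptions made for this example.

```javascript
// Stand-ins for what the ETL runtime provides (assumptions for this sketch only).
const suppliers = { "suppliers/1-A": { Name: "Exotic Liquids" } }; // made-up sample data
const sent = []; // collects the documents "loaded" to the destination

function load(id) { return suppliers[id] || null; }   // stub: fetch a related document by id
function loadToProductsInfo(doc) { sent.push(doc); }  // stub: send a document to 'ProductsInfo'

// The transformation logic. In the Studio, only the body of this function would be entered.
function script() {
    const supplier = load(this.Supplier);
    loadToProductsInfo({
        ProductName: this.Name,
        SupplierName: supplier ? supplier.Name : null
    });
}

// Run the script against one sample 'Products' document.
script.call({ Name: "Chai", Supplier: "suppliers/1-A" });
console.log(sent);
```

Each object passed to `loadToProductsInfo` becomes one document in the destination collection, carrying only the two fields built by the script.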
By default, updates to the ETL script are not applied to documents that were already sent.
When this option is checked, RavenDB restarts the ETL process for this script from scratch ("beginning of time"),
rather than applying the update only to new or updated documents.
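The difference between the two modes can be pictured with a simplified model. This is not RavenDB's internal implementation: the `change` counter, the `checkpoint` variable and the `pending` helper are hypothetical names used only to show which documents a script run would pick up.

```javascript
// Simplified model: every document has a change counter, and the task remembers
// the highest counter it has already processed for this script.
const docs = [
    { id: "products/1", change: 1 },
    { id: "products/2", change: 2 },
    { id: "products/3", change: 3 }
];

// Returns the ids that a script run would process, given its checkpoint.
function pending(allDocs, checkpoint) {
    return allDocs.filter(d => d.change > checkpoint).map(d => d.id);
}

let checkpoint = 2; // products/1 and products/2 were already sent

// Default behaviour: after the script is edited, only newer changes are processed.
const withoutReset = pending(docs, checkpoint);

// Option checked: the checkpoint returns to zero, so every document is processed again.
checkpoint = 0;
const withReset = pending(docs, checkpoint);

console.log(withoutReset, withReset);
```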
Select the collections for the ETL task, or apply the script to all collections
RavenDB ETL Task - Details in Tasks List View
Tasks List View Details
RavenDB ETL Task Details:
- Task Status - Active / Not Active / Not on Node / Reconnect
- Connection String - The connection string used
- Destination Database - The destination database to which the data is being sent
- Actual Destination URL - The server URL to which the data is actually being sent,
the one that is currently used out of the available Topology Discovery URLs
- Topology Discovery URLs - List of the available destination Database Group servers URLs
Graph view of the responsible node for the RavenDB ETL Task
RavenDB ETL Task - Offline Behaviour
When the source cluster is down (and there is no leader):
Creating a new Ongoing Task is a cluster-wide operation,
so a new Ongoing RavenDB ETL Task cannot be scheduled.
If a RavenDB ETL Task was already defined and active when the cluster went down,
the task will not be active and no data will be ETL'ed.
When the node responsible for the ETL task is down:
Another node from the Database Group will take ownership of the task,
so that the ETL process continues executing.
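The takeover can be sketched as follows. The cluster uses its own assignment logic, so taking the first live member as the fallback is an assumption of this example, and `pickResponsibleNode` and the node tags are hypothetical.

```javascript
// Hypothetical helper: decide which Database Group member runs the task.
function pickResponsibleNode(members, preferredTag) {
    const live = members.filter(m => m.up);
    if (live.length === 0) return null;                 // no member can run the task
    const mentor = live.find(m => m.tag === preferredTag);
    return (mentor || live[0]).tag;                     // otherwise another live member takes over
}

const healthy = [
    { tag: "A", up: true },
    { tag: "B", up: true },
    { tag: "C", up: true }
];
const mentorDown = [
    { tag: "A", up: false },
    { tag: "B", up: true },
    { tag: "C", up: true }
];

const whenHealthy = pickResponsibleNode(healthy, "A");       // preferred node is used
const whenMentorDown = pickResponsibleNode(mentorDown, "A"); // ownership moves to a live node

console.log(whenHealthy, whenMentorDown);
```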
When the destination node is down:
The ETL process will wait until the destination is reachable again and proceed from where it left off.
If the destination is a cluster, and the URLs of the destination Database Group nodes are listed in the connection string,
then when the destination node is down, RavenDB ETL will start transferring data to one of the other listed nodes.
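A minimal sketch of this failover, assuming the task simply tries the listed URLs in order. The shape of the `connectionString` object, its property names, the example URLs and the `pickDestinationUrl` helper are all illustrative and do not reflect the actual RavenDB API.

```javascript
// Hypothetical connection string shape; property names and URLs are illustrative only.
const connectionString = {
    database: "db3",
    topologyDiscoveryUrls: [
        "http://node-a.example:8080",
        "http://node-b.example:8080",
        "http://node-c.example:8080"
    ]
};

// Returns the first reachable URL, or null when every destination node is down.
function pickDestinationUrl(urls, isReachable) {
    for (const url of urls) {
        if (isReachable(url)) return url;
    }
    return null; // nothing reachable: the task waits and resumes where it left off
}

// Simulate the first destination node being down.
const down = new Set(["http://node-a.example:8080"]);
const actualUrl = pickDestinationUrl(
    connectionString.topologyDiscoveryUrls,
    url => !down.has(url)
);

console.log(actualUrl);
```

The returned value plays the role of the "Actual Destination URL" shown in the Tasks List View, chosen from the available Topology Discovery URLs.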
RavenDB ETL Task -vs- Replication Task
When a RavenDB node performs ETL to another node, it is not replicating the data; it is writing it.
In other words, ETL always overwrites whatever exists on the other side, and there is no conflict handling.
The source database for the ETL process is the owner of the data.
This means that any modifications made to the ETL'ed data on the destination database are lost when the next overwrite occurs.
If you need to modify the ETL'ed data on the destination side, create a companion document in the destination database instead of modifying the ETL'ed data directly.
The rule is: for ETL'ed data, you can look but not touch.
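The effect of overwriting, and why a companion document is safer, can be shown with a toy model in which a `Map` stands in for the destination database. The document ids, the `Note` field and the `etlWrite` helper are hypothetical.

```javascript
// Toy destination database: document id -> document.
const destination = new Map();

// ETL write: always replaces whatever is stored under the id, with no conflict check.
function etlWrite(id, doc) {
    destination.set(id, doc);
}

// First ETL pass.
etlWrite("productsInfo/1", { ProductName: "Chai" });

// Someone edits the ETL'ed document directly on the destination...
destination.get("productsInfo/1").Note = "reviewed";

// ...and also stores the same information in a companion document with its own id.
destination.set("productsInfo/1/notes", { Note: "reviewed" });

// The source document changes, so ETL sends it again and overwrites the destination copy.
etlWrite("productsInfo/1", { ProductName: "Chai Tea" });

console.log(destination.get("productsInfo/1"));       // the direct edit is gone
console.log(destination.get("productsInfo/1/notes")); // the companion document is untouched
```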
On the other hand, data that is replicated with RavenDB's External Replication Task does not overwrite existing documents.
Conflicts are created and handled according to the policy defined on the destination database.
This means that you can change the replicated data on the destination database, and any resulting conflicts will be resolved.
With a Replication Task, all documents in the database are replicated to the destination database without any content modification.
In ETL, by contrast, the document content sent can be filtered and modified by the supplied transformation script.
In addition, partial data can be sent, since specific collections can be selected.