Elasticsearch ETL Task
- An Elasticsearch ETL Task creates an ETL process that transfers data
  from selected RavenDB collections to an Elasticsearch destination.
  The transferred data can be filtered and modified by transformation scripts.
- An Elasticsearch ETL task transfers documents only.
  Document extensions like attachments, counters, or time series will not be transferred.
- This page explains how to create an Elasticsearch ETL task using the Studio.
  Learn more about Elasticsearch ETL tasks and how to create one using the client API in Ongoing Tasks: Elasticsearch ETL.
Navigate to the Elasticsearch ETL Task View
To begin creating your Elasticsearch ETL task:
Ongoing tasks view
Task selection view
Define the Elasticsearch ETL Task
Define Elasticsearch ETL task
- Task Name (Optional)
  - Enter a name for your task.
  - If no name is provided, the server will create a name based on the defined connection string,
    e.g. ElasticSearch ETL to ElasticConStr
- Task State
  Select the task state:
  - Enabled - The task runs in the background, transforming and sending documents as defined in this view.
  - Disabled - No documents are transformed and sent.
- Responsible Node (Optional)
  - Select a node from the Database Group to be responsible for this task.
  - If no node is selected, the cluster will assign a responsible node (see Members Duties).
- Connection String
  - Select an existing connection string from the list or create a new one.
  - The connection string defines the destination Elasticsearch URLs and the
    authentication method.
    - a. Name - Enter a name for the connection string.
    - b. Nodes URLs - Provide the URL(s) of the destination Elasticsearch node(s).
    - c. Authentication - Select the authentication method relevant for the Elasticsearch node(s).
Elasticsearch Indexes
Elasticsearch uses Indexes to store, access, and delete documents.
Use the task's Elasticsearch Indexes settings to determine which Elasticsearch
Indexes the ETL task will access.
Define Elasticsearch Index
1. Add Index
- Click to add an Elasticsearch index to the list.
2. Index Name
- Enter the Elasticsearch index name.
  The index name must match the index name provided in the loadTo<indexName> command in the transformation script.
  E.g., using the command loadToOrders in the transformation script requires you to define here an index by the name of orders.
- The index name entered here must be all lower case, as required by Elasticsearch.
  E.g. orders
- The index name used in the transformation script command can be either lower
  or upper case.
  E.g. both loadToOrders and loadToorders are permitted.
3. Document ID Property Name
- Enter the name of the transformation script property that contains id(this).
  This property will be created in each generated Elasticsearch document, and allows the ETL task to recognize the documents by their original RavenDB IDs when they are hosted in Elasticsearch.
- E.g.
  - The transformation script property: OrderId: id(this)
  - The property name you enter as Document ID Property Name: OrderId
  - In each document that the transformation script creates in Elasticsearch,
    it will create a property named OrderId that contains the original RavenDB document ID.
4. Insert Only
- By default, the ETL task will:
  - Delete from the Elasticsearch destination all documents that
    match the provided document ID field.
    To do that, the ETL task will send the Elasticsearch destination a _delete_by_query command.
  - Append new documents.
    To do that, the ETL task will send the Elasticsearch destination a _bulk command.
- Check Insert Only to skip the first step and append new documents without deleting existing ones first.
  With Insert Only enabled, new document versions accumulate on your Elasticsearch destination and are never removed.
5. Confirm
- Click to add this index to the list.
6. Cancel
- Click to cancel the operation without adding the index to the list.
7. An index that was added
- Can be edited or deleted.
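The index naming rules above can be illustrated with a short sketch. The function names and the settings object are hypothetical and exist only for this illustration; they are not part of RavenDB or Elasticsearch. The sketch assumes the task matches the script command to the index name without regard to case, as the rules above suggest.

```javascript
// Hypothetical helpers, for illustration only.

// Elasticsearch requires index names to be lower case.
function isValidIndexName(name) {
    return name.length > 0 && name === name.toLowerCase();
}

// Find the defined index that a loadTo<indexName> command targets.
// Assumption: the comparison ignores the case used in the script command.
function indexForCommand(command, definedIndexes) {
    const prefix = "loadTo";
    if (!command.startsWith(prefix)) {
        return null;
    }
    const wanted = command.slice(prefix.length).toLowerCase();
    return definedIndexes.find((index) => index.name === wanted) || null;
}

// One entry per index defined in the task (hypothetical shape).
const indexes = [
    { name: "orders", documentIdProperty: "OrderId", insertOnly: false }
];

console.log(isValidIndexName("Orders"));                      // false
console.log(indexForCommand("loadToOrders", indexes).name);   // orders
console.log(indexForCommand("loadToorders", indexes).name);   // orders
```

A command such as loadToInvoices would resolve to no index here, which corresponds to a script that refers to an index not defined in the task.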
Transformation Script
The transformation script defines the JSON document that will be sent to the Elasticsearch destination per RavenDB document from the selected collections.
Add Transformation Script
List of transformation Scripts
- Add Transformation Script
  Click to add a new transformation script to the list.
- Existing Script
  a. Script name and the collections on which it is defined (informative).
  b. Edit - Click to edit the script.
  c. Remove - Click to remove the script from the list.
Edit Transformation Script
- Script Name
  Enter a name for the script (Optional).
  A default name will be generated if no name is entered, e.g. Script_1
- Script
  Edit the transformation script.
  - Define a document object whose contents will be extracted from
    each RavenDB document processed by the ETL task and appended as
    a document to the Elasticsearch destination.
    E.g., var orderData in the above example.
  - Make sure that one of the properties of the document object
    is given the value id(this).
    The ETL task will use this property to identify documents that reside on the Elasticsearch destination by their source RavenDB document ID.
  - Use the loadTo<indexName> method to pass the document object to the Elasticsearch destination.
- Syntax
  Click for a transformation script Syntax Sample.
- Collections
  - Select (or enter) a collection
    Type or select the names of the collections your script is using.
  - Collections Selected
    A list of collections that were already selected.
- Apply script to documents from beginning of time (Reset)
  - When this option is enabled:
    The script will be executed over all existing documents in the specified collections the first time the task runs.
  - When this option is disabled:
    The script will be executed only over new and modified documents.
  - If Insert Only is enabled:
    RavenDB documents will be appended to Elasticsearch without deleting documents from Elasticsearch first.
  - If Insert Only is disabled:
    Documents will be deleted from Elasticsearch first, and then appended to it from RavenDB.
- Add/Update
  Click to add a new script or update the task with changes made in an existing script.
- Cancel
  Click to cancel your changes.
- Test Script
  Click to test the transformation script (read more about this option below).
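The three requirements listed under Script (a document object, a property set to id(this), and a loadTo<indexName> call) can be seen together in the following minimal sketch. Inside the task, id() and loadToOrders() are supplied by RavenDB; the stand-ins defined here are assumptions that exist only so the sketch can run outside the server. The field names Company and Lines and the sample document are hypothetical.

```javascript
// Stand-ins for functions RavenDB supplies inside a transformation script.
// They are defined here only so the sketch runs outside the server.
const sent = [];
function id(doc) {
    return doc["@metadata"]["@id"]; // assumption: sample keeps its ID in metadata
}
function loadToOrders(obj) {
    sent.push({ index: "orders", document: obj });
}

// The script body. Inside the task, `this` is the RavenDB document
// currently being processed.
function transformationScript() {
    var orderData = {
        OrderId: id(this),            // the Document ID Property Name is OrderId
        Company: this.Company,        // hypothetical field
        LineCount: this.Lines.length  // hypothetical field
    };
    loadToOrders(orderData);          // targets the index defined as "orders"
}

// Hypothetical sample document, used in place of a real RavenDB document.
const sampleOrder = {
    Company: "companies/1-A",
    Lines: [{ Product: "products/1-A" }, { Product: "products/2-A" }],
    "@metadata": { "@id": "orders/1-A", "@collection": "Orders" }
};

transformationScript.call(sampleOrder);
console.log(JSON.stringify(sent, null, 2));
```

In the Studio, only the contents of transformationScript would be entered as the script; the stand-ins and the sample document are not needed there.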
Test Transformation Script
Transformation script
- Document ID
  Type or select the ID of the document you want to test the script with.
- Test
  Click to run the test.
  The test will display the commands that would be sent to Elasticsearch, without actually sending them.
- Close Test Area
  Close this view.
Test Results
The test results view displays a preview of the tested document, and the commands the task would send to Elasticsearch.
- Document Preview Tab
- Test Results Tab
  Displays the commands that the task would send to Elasticsearch.
- Elasticsearch Index
  The index the commands and data are sent to.
- _delete_by_query Segment
  The delete POST command with a list of IDs by which Elasticsearch would locate and remove existing documents.
  Deleting existing document versions is optional; enable Insert Only to prevent the task from sending _delete_by_query commands.
- _bulk Segment
  The bulk POST command with a list of document objects, each with data extracted from RavenDB and an ID that Elasticsearch stores it by.
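To make the two segments more concrete, the sketch below builds request bodies in the general format the Elasticsearch REST API accepts for _delete_by_query (a JSON query) and _bulk (newline-delimited JSON). It is an approximation under stated assumptions, not the exact payload the task generates: the function names are hypothetical, the paths omit any query-string parameters, and the delete query assumes the ID property is mapped so that exact-match terms queries work.

```javascript
// Hypothetical helpers; the payloads RavenDB actually sends may differ.

// Body for POST /<index>/_delete_by_query: match every document whose
// ID property equals one of the given RavenDB document IDs.
function deleteByQueryBody(idProperty, ids) {
    return JSON.stringify({ query: { terms: { [idProperty]: ids } } });
}

// Body for POST /<index>/_bulk: one action line and one source line per
// document, each terminated by a newline (the format requires the final one).
function bulkBody(docs) {
    return docs
        .map((doc) => JSON.stringify({ index: {} }) + "\n" + JSON.stringify(doc) + "\n")
        .join("");
}

// List the commands for one batch, honoring the Insert Only setting.
function previewCommands(indexName, idProperty, docs, insertOnly) {
    const commands = [];
    if (!insertOnly) {
        const ids = docs.map((doc) => doc[idProperty]);
        commands.push({
            method: "POST",
            path: `/${indexName}/_delete_by_query`,
            body: deleteByQueryBody(idProperty, ids)
        });
    }
    commands.push({
        method: "POST",
        path: `/${indexName}/_bulk`,
        body: bulkBody(docs)
    });
    return commands;
}

// Hypothetical documents produced by a transformation script.
const docs = [
    { OrderId: "orders/1-A", Company: "companies/1-A" },
    { OrderId: "orders/2-A", Company: "companies/2-A" }
];

for (const command of previewCommands("orders", "OrderId", docs, false)) {
    console.log(command.method, command.path);
    console.log(command.body);
}
```

Calling previewCommands with insertOnly set to true returns only the _bulk command, which mirrors the effect of the Insert Only option described above.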