Elasticsearch ETL Task

To begin creating your Elasticsearch ETL task:

Ongoing tasks view

Task selection view

Define the Elasticsearch ETL Task

Define Elasticsearch ETL task

  1. Task Name (Optional)

    • Choose a name for your task
    • If no name is provided, the cluster will create a name based on the defined connection string (e.g., ElasticSearch ETL to ElasticConStr).
  2. Task State
    The task state can be:
    Enabled - The task runs in the background, transforming and sending documents as defined in this view.
    Disabled - The task does not transform and send documents.

  3. Responsible Node (Optional)

    • Select a preferred mentor node from the Database Group to be responsible for this task.
    • If no node is selected, the cluster will assign a responsible node (see Members Duties).
  4. Connection String

    • The connection string defines the destination Elasticsearch URLs.
    • If you already created connection strings, you can select one from the list.
    • You can create a new connection string:

    Create Connection String

    a. Name - The connection string name
    b. Nodes URLs - The URL(s) of the Elasticsearch destination node(s).
    c. Authentication - The authentication method used by the Elasticsearch destination node(s).

    • Available authentication methods:

    Authentication Methods
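
The name and node URLs of a connection string can be sanity-checked before they are saved. The following is a minimal sketch only: `validateConnectionString`, the `name` and `nodeUrls` field names, and the rule that a name is required are assumptions made for illustration, not part of the Studio or of any RavenDB API.

```javascript
// Hypothetical helper: checks the Name and Nodes URLs fields described above.
function validateConnectionString(conn) {
  const errors = [];
  if (typeof conn.name !== "string" || conn.name.trim() === "") {
    errors.push("Name is required");
  }
  const urls = Array.isArray(conn.nodeUrls) ? conn.nodeUrls : [];
  if (urls.length === 0) {
    errors.push("At least one node URL is required");
  }
  for (const url of urls) {
    try {
      // The global URL constructor throws on text it cannot parse.
      const parsed = new URL(url);
      if (parsed.protocol !== "http:" && parsed.protocol !== "https:") {
        errors.push(`Unsupported scheme in ${url}`);
      }
    } catch {
      errors.push(`Malformed URL: ${url}`);
    }
  }
  return errors;
}

// An empty array means no problems were found.
const example = { name: "ElasticConStr", nodeUrls: ["http://localhost:9200"] };
console.log(validateConnectionString(example));
```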

Elasticsearch Indexes

Elasticsearch uses Indexes to store, access, and delete documents.
Use the task Elasticsearch Indexes settings to choose the indexes the task will access.

Define Elasticsearch Index

  1. Add Index (Optional)

    • Click to add an Elasticsearch index to the list.
  2. Index Name
    Provide an Elasticsearch Index name, as defined by the transformation script's loadTo<Target>(obj) command (where Target is the index name and obj is the object to be passed to Elasticsearch).
    E.g., a transformation script's loadToOrders(orderData) command requires you to define an Elasticsearch orders Index.

    • Elasticsearch requires an all-lowercase index name (e.g. orders).
    • The transformation script allows both lowercase and uppercase characters (e.g. both loadToOrders and loadToorders are permitted).
  3. Document ID Property Name
    Provide the name of a property passed by the transformation script to Elasticsearch, as an ID.
    Elasticsearch will store your documents by this ID, and you will be able to delete and modify them by it.
    E.g., if one of the properties of the object passed by your transformation script to Elasticsearch is "DocID", you can use DocID as the index's ID Property.

  4. Insert Only
    By default, the ETL task appends a new document only after deleting its existing version using _delete_by_query.
    Enabling Insert Only prevents the task from sending _delete_by_query messages, allowing you to append documents without removing their existing version first.

    With Insert Only enabled, new document versions accumulate on Elasticsearch and are never removed.

  5. Confirm
    Click to add this index to the list.

  6. Cancel
    Click to cancel the operation without adding the index to the list.
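
The relationship between the loadTo<Target> command, the lowercase index name, and the Document ID Property can be sketched as follows. This is an illustration under stated assumptions: `indexNameFor`, the `sent` array, the stand-in `loadToOrders` and `id` functions, and the sample document are all hypothetical and exist only so the script body can run outside the ETL task. In a real task the script is just the body of `transform`.

```javascript
// Derives the Elasticsearch index name a loadTo<Target> command refers to.
function indexNameFor(loadCommand) {
  return loadCommand.replace(/^loadTo/, "").toLowerCase();
}

// Stand-ins for what the ETL task provides at run time (assumptions).
const sent = []; // objects "passed to Elasticsearch" in this sketch
function loadToOrders(obj) {
  sent.push({ index: indexNameFor("loadToOrders"), doc: obj });
}
function id(doc) {
  return doc["@metadata"]["@id"];
}

// The transformation script body; `this` is the RavenDB document.
function transform() {
  const orderData = {
    DocID: id(this),            // the property chosen as Document ID Property Name
    Company: this.Company,
    LineCount: this.Lines.length
  };
  loadToOrders(orderData);      // targets the "orders" index
}

// Hypothetical sample document.
const sampleOrder = {
  Company: "companies/1-A",
  Lines: [{ Product: "products/1-A", Quantity: 2 }, { Product: "products/2-A", Quantity: 1 }],
  "@metadata": { "@id": "orders/1-A" }
};
transform.call(sampleOrder);
console.log(sent);
```

With this script, the index list would need an index named orders whose Document ID Property Name is DocID.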

Indexes List

  1. Defined Index
    An Elasticsearch index that has been added.
  2. Index
    Elasticsearch index name.
  3. Document ID Property
    The RavenDB document property that is used as an Elasticsearch ID.
  4. Edit Index
    Click to edit index properties.
  5. Remove Index
    Click to remove this index from the list.

Transformation Script

  • A transformation script sends Elasticsearch:
    • an optional _delete_by_query command, to delete existing document versions before appending new ones.
      You can omit _delete_by_query commands from the script using the task's Insert Only option.
    • a _bulk command to append RavenDB documents to Elasticsearch.
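
To make the two commands concrete, here is a rough sketch of how their request bodies could be assembled. `buildCommands` and its parameters are hypothetical. The payload shapes follow the public Elasticsearch `_delete_by_query` and `_bulk` APIs, but the exact requests the ETL task produces may differ (for example in query parameters or in how the ID is matched).

```javascript
// Builds the requests for one batch of transformed documents.
// insertOnly = true skips the _delete_by_query request.
function buildCommands(indexName, idProperty, docs, insertOnly) {
  const commands = [];
  if (!insertOnly) {
    // Delete existing versions of the documents by their ID property.
    const ids = docs.map(d => d[idProperty]);
    commands.push({
      method: "POST",
      path: `/${indexName}/_delete_by_query`,
      body: JSON.stringify({ query: { terms: { [idProperty]: ids } } })
    });
  }
  // _bulk expects newline-delimited JSON: an action line, then the document,
  // with a newline after the last line.
  const lines = [];
  for (const doc of docs) {
    lines.push(JSON.stringify({ index: {} }));
    lines.push(JSON.stringify(doc));
  }
  commands.push({
    method: "POST",
    path: `/${indexName}/_bulk`,
    body: lines.join("\n") + "\n"
  });
  return commands;
}

const batch = [{ DocID: "orders/1-A", Total: 10 }];
for (const cmd of buildCommands("orders", "DocID", batch, false)) {
  console.log(`${cmd.method} ${cmd.path}\n${cmd.body}`);
}
```

Calling `buildCommands` with `insertOnly` set to `true` returns only the `_bulk` request, which mirrors the task's Insert Only option.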

Add Transformation Script

List of transformation Scripts

  1. Transform Scripts
    List of existing transformation scripts.

  2. Add Transformation Script
    Click to add a new transformation script to the list.

  3. Existing Script
    a. Edit - Click to edit the script.
    b. Remove - Click to remove the script from the list.

Edit Transformation Script

Edit Transformation Script

  1. Script Name (Optional)
    The script is named automatically.
    Optionally, give it a name of your choice.

  2. Script

    • Add or edit the transformation script.
    • Add all the Elasticsearch indexes your script uses, to the indexes list.
  3. Syntax
    Click for a transformation script Syntax Sample.

  4. Collections

    • Select (or enter) a collection
      Type or select the names of the collections your script is using.
    • Collections Selected
      A list of collections that were already selected.
  5. Apply script to documents from beginning of time (Reset)

    • When this option is enabled, the script will be executed over all existing documents in the specified collections the first time the task runs.
    • When this option is disabled, the script will be executed only over new and modified documents.
    • If Insert Only is enabled,
      RavenDB documents will be appended to Elasticsearch without deleting documents from Elasticsearch first.
    • If Insert Only is disabled, documents will be deleted from Elasticsearch first, and then appended to it from RavenDB.
  6. Add/Update
    Click to add a new script or update the task with changes made in an existing script.

  7. Cancel
    Click to cancel your changes.

  8. Test Script
    Click to test the transformation script (read more about this option below).
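
The difference between running with Insert Only enabled and disabled can be illustrated with a small in-memory simulation. This is only a sketch: `applyBatch` is a hypothetical function, and a plain array stands in for the Elasticsearch index.

```javascript
// Applies one batch of documents to an in-memory "index" (an array).
// With insertOnly = false, stored documents with matching IDs are removed first.
function applyBatch(index, docs, idProperty, insertOnly) {
  let kept = index;
  if (!insertOnly) {
    const ids = new Set(docs.map(d => d[idProperty]));
    kept = index.filter(d => !ids.has(d[idProperty]));
  }
  return kept.concat(docs);
}

// Two versions of the same RavenDB document, sent in two separate batches.
const v1 = { DocID: "orders/1-A", Total: 10 };
const v2 = { DocID: "orders/1-A", Total: 12 };

const replaced = applyBatch(applyBatch([], [v1], "DocID", false), [v2], "DocID", false);
const accumulated = applyBatch(applyBatch([], [v1], "DocID", true), [v2], "DocID", true);

console.log(replaced.length);    // 1: only the latest version remains
console.log(accumulated.length); // 2: both versions are kept
```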

Test Transformation Script

Transformation script

  1. Document ID
    Type or select the ID of the document you want to test the script with.
  2. Test
    Click to run the test.
    The test will display the commands that would be sent to Elasticsearch, without actually sending them.
  3. Close Test Area
    Close this view.

Test Results

The test results view displays a preview of the tested document and the commands the task would send to Elasticsearch.

  • Document Preview Tab

Document Preview Tab

  • Test Results Tab

Test Results Tab

  1. Test Results Tab
    Displays the commands that the task would send to Elasticsearch.
  2. Elasticsearch Index
    The index the commands and data are sent to.
  3. _delete_by_query Segment
    Contains a list of IDs by which Elasticsearch would locate and remove existing documents.
    Deleting existing document versions is optional; enable Insert Only to prevent the task from sending _delete_by_query commands.
  4. _bulk Segment
    Contains a list of document objects, each with data extracted from RavenDB and an ID that Elasticsearch stores it by.