OLAP ETL Task



Ongoing task view

To begin creating your OLAP ETL task:

  1. Navigate to Tasks > Ongoing Tasks
  2. Click on "Add a Database Task"

Task selection view

  3. Select "OLAP ETL".

Define an OLAP ETL Task

New OLAP ETL task view

  1. The name of this ETL task (optional).
  2. The cluster node that will run this task (optional).
  3. A custom partition value that can be referenced in the transform script (see Custom Partition Value below).


Custom Partition Value

Custom partition value

  • A custom partition can be defined to differentiate parquet file locations when using the same connection string in multiple OLAP ETL tasks.
  • The custom partition name is defined inside the transform script.
  • The custom partition value is defined in the input box above.
  • The custom partition value is referenced in the transform script as $customPartitionValue.
  • A parquet file path with custom partition will have the following format:
    {RemoteFolderName}/{CollectionName}/{customPartitionName=$customPartitionValue}
  • Learn more in Ongoing Tasks: OLAP ETL.
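The path format above can be sketched as a small Python helper. The function name `parquet_folder` and the example values (`olap-exports`, `region`, `east`, `west`) are hypothetical; the sketch only assembles the documented `{RemoteFolderName}/{CollectionName}/{customPartitionName=$customPartitionValue}` pattern and does not reflect how RavenDB builds paths internally.

```python
from typing import Optional


def parquet_folder(remote_folder: str, collection: str,
                   partition_name: Optional[str] = None,
                   partition_value: Optional[str] = None) -> str:
    """Assemble {RemoteFolderName}/{CollectionName}/{name=value}.

    The custom partition segment is appended only when both a name
    (defined in the transform script) and a value (set in the task) exist.
    """
    parts = [remote_folder.rstrip("/"), collection]
    if partition_name is not None and partition_value is not None:
        parts.append(f"{partition_name}={partition_value}")
    return "/".join(parts)


# Two hypothetical tasks share one connection string but use different
# custom partition values, so their files land in separate folders.
print(parquet_folder("olap-exports", "Orders", "region", "east"))
# olap-exports/Orders/region=east
print(parquet_folder("olap-exports", "Orders", "region", "west"))
# olap-exports/Orders/region=west
```

Because only the last path segment differs, the two tasks never write into each other's folders even though the remote folder and collection are the same.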

Run Frequency

Task run frequency

  • From the dropdown menu, select how often and at what time this task should run.
  • The task can run at most once per minute.
  • To schedule the task with your own cron expression, select 'custom' from the dropdown menu.
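A custom schedule is expressed as a cron expression. The sketch below assumes standard five-field cron syntax (minute, hour, day of month, month, day of week), which this page does not specify, and expands only the minute field to show when within each hour a task would fire. Since the minute is the finest unit in that syntax, `*` (every minute) is the most frequent possible schedule, which matches the once-per-minute limit. `expand_minute_field` is a hypothetical helper, not part of RavenDB.

```python
from typing import List


def expand_minute_field(field: str) -> List[int]:
    """Expand a cron minute field into the minutes (0-59) it matches.

    Handles '*', '*/n', 'a-b', 'a-b/n', 'a/n', single values and
    comma-separated lists of these. Assumes standard cron semantics.
    """
    minutes = set()
    for part in field.split(","):
        range_text, _, step_text = part.partition("/")
        step = int(step_text) if step_text else 1
        if range_text == "*":
            start, end = 0, 59
        elif "-" in range_text:
            start_text, end_text = range_text.split("-", 1)
            start, end = int(start_text), int(end_text)
        else:
            start = int(range_text)
            # 'a/n' means "from minute a to the end of the hour, every n minutes".
            end = 59 if step_text else start
        if step < 1 or not 0 <= start <= end <= 59:
            raise ValueError(f"invalid minute field: {part!r}")
        minutes.update(range(start, end + 1, step))
    return sorted(minutes)


cron = "*/15 * * * *"  # every 15 minutes, every hour, every day
print(expand_minute_field(cron.split()[0]))  # [0, 15, 30, 45]
print(len(expand_minute_field("*")))  # 60 -> once per minute
```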

OLAP Connection String

  • Select an existing connection string from the dropdown, or create a new one.
  • If you create a new connection string, enter its name and destination here.
  • Multiple destinations can be defined.

OLAP ETL Destinations

OLAP ETL destinations

Select one or more destinations from this list. Turning on a destination's toggle reveals its additional fields and configuration options.

Transform Script

List of transform scripts

  1. List of existing transform scripts.
  2. Add a new transform script.
  3. Edit an existing transform script.

Transform script

  1. The script name is generated automatically when you click 'Add', in the format "Script #[order of script creation]".
  2. The transform script. Learn more about these scripts here.
  3. Select a collection (or enter a new collection name) on which this script will operate.
  4. The selected collection names on which the script operates.
  5. If this option is checked, the script will operate on all existing documents in the specified collections the first time the task runs. When the option is unchecked, the script operates only on new documents.

Every parquet table that is created by a transform script includes two columns that aren't specified in the script:

  • _id
    Contains the source document ID. By default this column is named _id;
    you can rename it in the task definition (see Override ID Column below).
  • _lastModifiedTime
    The document's last-modified time, taken from its metadata and represented in Unix time.
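To illustrate how these two columns sit alongside the fields a transform script produces, here is a minimal Python sketch of a single output row. `build_row` and the sample document IDs are hypothetical. It also assumes the metadata timestamp is an ISO 8601 UTC string and that "Unix time" means whole seconds; RavenDB's actual unit and precision may differ.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def build_row(doc_id: str, last_modified_iso: str,
              script_fields: Dict[str, Any]) -> Dict[str, Any]:
    """Combine a transform script's fields with the two implicit columns."""
    # Metadata timestamp assumed to be ISO 8601 in UTC, e.g. "2024-01-01T00:00:00Z".
    # fromisoformat() rejects a trailing "Z" before Python 3.11, so normalize it.
    moment = datetime.fromisoformat(last_modified_iso.replace("Z", "+00:00"))
    if moment.tzinfo is None:
        moment = moment.replace(tzinfo=timezone.utc)
    row: Dict[str, Any] = {
        "_id": doc_id,  # default ID column name
        # ASSUMPTION: Unix time in whole seconds; the real unit may differ.
        "_lastModifiedTime": int(moment.timestamp()),
    }
    row.update(script_fields)  # columns defined by the transform script
    return row


print(build_row("orders/1-A", "2024-01-01T00:00:00Z", {"Company": "companies/1-A"}))
# {'_id': 'orders/1-A', '_lastModifiedTime': 1704067200, 'Company': 'companies/1-A'}
```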


Override ID Column

Override ID column

These settings let you give the document ID column a different name in a parquet table. By default, the ID column is named _id.

  1. Add a new setting.
  2. Select the name of the parquet table for which you want to override the ID column.
  3. Select the name for the table's ID column.
  4. Click to add this setting.
  5. Click to edit this setting.
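Conceptually, each override setting maps a parquet table name to a replacement name for that table's ID column. The minimal Python sketch below models this with a plain dictionary; `apply_id_override` and the `Orders` to `OrderId` setting are hypothetical examples, not RavenDB's implementation.

```python
from typing import Any, Dict

DEFAULT_ID_COLUMN = "_id"


def apply_id_override(table: str, row: Dict[str, Any],
                      overrides: Dict[str, str]) -> Dict[str, Any]:
    """Rename the ID column of a row if its table has an override setting.

    `overrides` maps a parquet table name to the ID column name to use;
    tables without an entry keep the default '_id'.
    """
    id_column = overrides.get(table, DEFAULT_ID_COLUMN)
    return {(id_column if key == DEFAULT_ID_COLUMN else key): value
            for key, value in row.items()}


# Hypothetical setting: the Orders table stores the document ID in "OrderId".
overrides = {"Orders": "OrderId"}
print(apply_id_override("Orders", {"_id": "orders/1-A", "Freight": 8.5}, overrides))
# {'OrderId': 'orders/1-A', 'Freight': 8.5}
print(apply_id_override("Employees", {"_id": "employees/1-A"}, overrides))
# {'_id': 'employees/1-A'}
```

Tables without a setting fall back to the default name, so adding an override for one table leaves every other table's schema unchanged.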