Queue ETL Overview

Queue ETL tasks

RavenDB produces messages to broker queues via the following Queue ETL tasks:

  • Kafka ETL Task
    You can define a Kafka ETL Task from the Studio or using the Client API (see the sketch below this list).
  • RabbitMQ ETL Task
    You can define a RabbitMQ ETL Task from the Studio or using the Client API.
  • Azure Queue Storage ETL Task
    You can define an Azure Queue Storage ETL Task from the Studio or using the Client API.
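
For example, a Kafka ETL task could be defined from the Client API along these lines. This is a
minimal sketch in JavaScript using the Node.js client: AddEtlOperation exists in the RavenDB
clients, but the exact shape of the configuration object shown here mirrors the .NET client's
QueueEtlConfiguration and should be treated as an assumption, as are the database, task, and
connection string names.

    const { DocumentStore, AddEtlOperation } = require("ravendb");

    async function defineKafkaEtlTask() {
        const store = new DocumentStore("http://localhost:8080", "Northwind");
        store.initialize();

        // Assumed configuration shape, mirroring the .NET client's QueueEtlConfiguration
        const configuration = {
            etlType: "Queue",
            brokerType: "Kafka",
            connectionStringName: "KafkaConStr",  // must reference an existing connection string
            name: "KafkaEtlTask",
            transforms: [{
                name: "OrdersToKafka",
                collections: ["Orders"],
                script: "loadToOrders(this)"      // send each Orders document to the 'Orders' queue
            }]
        };

        await store.maintenance.send(new AddEtlOperation(configuration));
    }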

These ETL tasks:

  • Extract selected data from RavenDB documents in specified collections.
  • Transform the data into new JSON objects, as the sample script after this list illustrates.
  • Wrap the JSON objects as CloudEvents messages and Load them to the designated message broker.
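
For illustration, a transformation script for documents in an Orders collection might look as
follows. This is a sketch: the queue name Orders and the document fields are hypothetical, and
id(this) returns the ID of the document currently being processed.

    // Extract: the script runs once per document in the 'Orders' collection
    var orderData = {
        Id: id(this),                        // the document ID
        OrderLinesCount: this.Lines.length,
        TotalCost: 0
    };

    // Transform: aggregate a total cost from the order lines
    for (var i = 0; i < this.Lines.length; i++) {
        var line = this.Lines[i];
        orderData.TotalCost += line.PricePerUnit * line.Quantity;
    }

    // Load: the object is wrapped as a CloudEvents message
    // and sent to the 'Orders' queue on the target broker
    loadToOrders(orderData);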

Data delivery

What is transferred

  • Documents only
    A Queue ETL task transfers documents only.
    Document extensions such as attachments, counters, or time series are not transferred.
  • CloudEvents messages
    JSON objects produced by the task's transformation script are wrapped and delivered as CloudEvents Messages.

How messages are produced and consumed

  • The Queue ETL task sends the messages it produces to the target using a connection string,
    which specifies the destination and the credentials required to authorize the connection
    (see the sketch after this list). Find the specific syntax for defining a connection string
    per task in each task's documentation.
  • Each message is added to the tail of the queue assigned to it by the transformation script.
    As earlier messages are consumed, it advances toward the head of the queue, where it becomes
    available to consumers.
  • RavenDB publishes messages to the designated brokers in batches, opening a transaction
    to the destination queue for each batch.
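
For instance, a Kafka connection string could be stored through the Client API along these lines.
PutConnectionStringOperation exists in the RavenDB clients; the queue connection string shape
sketched here mirrors the .NET client's QueueConnectionString and is an assumption, as is the
broker address.

    const { PutConnectionStringOperation } = require("ravendb");

    // Assumed shape, mirroring the .NET client's QueueConnectionString;
    // 'store' is an initialized DocumentStore
    async function defineKafkaConnectionString(store) {
        const connectionString = {
            type: "Queue",
            name: "KafkaConStr",
            brokerType: "Kafka",
            kafkaConnectionSettings: {
                bootstrapServers: "localhost:9092"  // hypothetical broker address
            }
        };

        await store.maintenance.send(new PutConnectionStringOperation(connectionString));
    }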

Idempotence and message duplication

  • RavenDB is an idempotent producer: it typically does not send duplicate messages to queues.
  • However, duplicate messages may still reach the broker.
    For example:
    Different nodes of a RavenDB cluster are regarded as different producers by the broker.
    If the node responsible for the ETL task fails while sending a batch of messages,
    the new responsible node may resend messages that were already received by the broker.
  • Therefore, if processing each message only once is important to the consumer,
    it is the consumer's responsibility to verify the uniqueness of each consumed message.
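
Because the CloudEvents id attribute defaults to the document's change vector (see the table in
the next section), consumers can use it to detect duplicates. A minimal sketch in JavaScript,
assuming a hypothetical message handler and that the broker exposes the CloudEvents id as a
ce_id header (the Kafka binary-mode convention; header names vary by broker):

    // Ids of already-processed messages. In production this would be
    // a durable store shared by all instances of the consumer.
    const processedIds = new Set();

    function handleMessage(message) {
        const eventId = message.headers["ce_id"];  // CloudEvents id = document change vector
        if (processedIds.has(eventId)) {
            return;  // duplicate delivery (e.g. after a node failover): skip it
        }
        processedIds.add(eventId);
        applyBusinessLogic(message);  // hypothetical application-specific processing
    }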

CloudEvents

  • After preparing a JSON object that needs to be sent to a message broker,
    the ETL task wraps it as a CloudEvents message using the CloudEvents Library.

  • To do that, the JSON object is provided with additional required attributes,
    added as headers to the message, including:

    Attribute  Type    Description       Default Value
    ---------  ------  ----------------  ---------------------------------------------------
    id         string  Event identifier  The document Change Vector
    type       string  Event type        "ravendb.etl.put"
    source     string  Event context     <ravendb-node-url>/<database-name>/<etl-task-name>
  • The optional 'partitionkey' attribute can also be added.
    Currently, it is only implemented by Kafka ETL.

    Optional Attribute  Type    Description                               Default Value
    ------------------  ------  ----------------------------------------  ---------------
    partitionkey        string  Events relationship/grouping definition   The document ID
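
The default values can be overridden from the transformation script by passing the attributes
alongside the loaded object, as in this Kafka ETL sketch (the queue name and the Type and Source
values are hypothetical):

    var orderData = { Id: id(this), Company: this.Company };

    loadToOrders(orderData, {
        Id: id(this),                        // override: defaults to the change vector
        Type: "com.example.order-updates",   // override: defaults to "ravendb.etl.put"
        Source: "/campaigns/summer-sale",    // override: defaults to node/database/task
        PartitionKey: this.Company           // Kafka only: groups related events together
    });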

Task statistics

Use the Studio's Ongoing tasks stats view to see various statistics related to data extraction, transformation,
and loading to the target broker.

(Figure: Queue brokers stats in the Ongoing tasks stats view)