RabbitMQ ETL

Release 5.4 is all about integrations. RavenDB can seamlessly integrate with your existing ecosystem, talking to the RabbitMQ message broker.

What is RabbitMQ?

Using queues is a generally accepted approach for sharing resources between many consumers. Operating systems provide mechanisms that let your application share the same machine with other applications, starting at the lowest levels. Whenever your code attempts to write to or read from a disk, or to use a network connection, it competes with other processes trying to do the same. Multiplexing solutions at the micro level use queues to line up these resource requests and allocate part of the shared resource to each one.

The same sharing approach can be used on a macro level. Your application has a limited processing capacity, and the number of users may vary. You want all your users to experience the same high level of service, and queues are a great way to realize this goal.

Message queues in a nutshell

Message queues are a building block of modern distributed applications. They can help you decouple solutions into smaller building blocks, increasing performance, capacity, and reliability, all without increasing the complexity of the code you write. These smaller building blocks are usually called services, and they no longer communicate directly with each other. Instead, messages are placed in a queue, where they wait to be picked up by one of the other services. One service can be both a producer and a consumer of messages.

With message queues, you can replace synchronous calls with asynchronous ones, meaning that services will interact with the queue instead of interacting directly. Producers can hand over command messages to the queue and proceed with other tasks without waiting for the command to be executed. On the other side, consumers will individually handle messages from the queue at their own pace and capacity for processing. This approach optimizes data flow since components are not waiting for each other.
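The producer and consumer roles described above can be sketched in a few lines. This is a minimal illustration that uses Python's in-process `queue.Queue` and threads as a stand-in for a real broker; the `run_demo` function and the `SendEmail` command shape are hypothetical names chosen for the example, not part of RabbitMQ or RavenDB.

```python
import queue
import threading


def run_demo(num_commands=5, num_consumers=2):
    """Push commands into a queue and let several consumers drain it."""
    commands = queue.Queue()
    processed = []
    lock = threading.Lock()

    def consumer():
        # Each consumer pulls work at its own pace until it sees the sentinel.
        while True:
            command = commands.get()
            if command is None:
                commands.task_done()
                break
            with lock:
                processed.append(command)
            commands.task_done()

    workers = [threading.Thread(target=consumer) for _ in range(num_consumers)]
    for worker in workers:
        worker.start()

    # The producer hands each command to the queue and moves on immediately;
    # it never waits for a command to be executed.
    for order_number in range(num_commands):
        commands.put({"type": "SendEmail", "order": order_number})

    # One sentinel per consumer tells the workers to stop.
    for _ in workers:
        commands.put(None)
    commands.join()
    for worker in workers:
        worker.join()
    return processed
```

The producer loop finishes regardless of how quickly the consumers work, which is the decoupling the queue provides. The order in which commands end up in `processed` depends on thread scheduling.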

With this architecture, granular scalability becomes an option. When the workload peaks, services accepting requests generate command messages and place them in the queue. Since modern queues have high capacity and low latency, your application can accept a burst of requests quickly, converting each one into a message pushed to the queue. To process these messages, you can start multiple instances of the same service, each acting as a consumer of messages from the queue. This process can be automated with solutions like KEDA (Kubernetes Event-driven Autoscaling), which scales instances up and down based on the queue length. As a result, you can provide cost-effective scalability without adding complexity to your application.
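To make the idea of scaling on queue length concrete, here is a simplified sketch of how a consumer count could be derived from the backlog. It is an illustration of the principle only and is not KEDA's actual algorithm or configuration; the function name, the `messages_per_consumer` target, and the limits are assumptions made for the example.

```python
import math


def desired_consumers(queue_length, messages_per_consumer,
                      min_consumers=0, max_consumers=10):
    """Return how many consumer instances to run for a given backlog.

    messages_per_consumer is the backlog one instance is expected to handle
    (a hypothetical tuning value).
    """
    if messages_per_consumer <= 0:
        raise ValueError("messages_per_consumer must be positive")
    wanted = math.ceil(queue_length / messages_per_consumer)
    # Clamp to the allowed range so a spike cannot start unlimited instances.
    return max(min_consumers, min(max_consumers, wanted))
```

With a target of 20 messages per consumer, a backlog of 45 messages asks for 3 instances, and an empty queue lets the service scale down to the minimum.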

When to use it?

As a general principle, deferred execution, whenever possible, will increase the capacity and reliability of your system while also improving the perception of latency from the user's side.

Applications managing message queues are called message brokers, and RabbitMQ is one of the most popular. This open-source solution is a mature and robust message broker you can run on your own infrastructure or on major cloud providers without introducing vendor lock-in. Since it supports multiple messaging protocols (AMQP and MQTT, among others), it is a good choice for a wide spectrum of applications, including IoT.

Benefits of using built-in RabbitMQ ETL

With RabbitMQ ETL introduced in version 5.4, RavenDB can act as a producer of messages that will be placed directly into RabbitMQ queues. Instead of implementing code that would push messages to RabbitMQ, you can rely on a robust and reliable ETL mechanism that will extract documents from your database, transform them with your custom script, and hand them over to RabbitMQ. Not only will this shorten development time, but you will also be able to rely on this persistent mechanism that implements a retry process to resiliently enqueue your messages.

In distributed architectures, queues are most commonly used for sending commands reliably from one service to another. It is important to note that building reliable distributed systems is a hard task. Many assumptions you make when building monolithic systems no longer hold. As the saying goes, “Once in a million” can happen next Tuesday. In other words, what may go wrong will eventually go wrong, so you must plan and prepare for it. Moving messages from one system to another may sound like a trivial task, but it is far from one. The list of things that can go wrong is long: the network connection can be unreliable or completely down, servers can restart, or you can deliver a message successfully while the other party fails to inform you of successful reception.

The team behind RavenDB invested years of experience in developing robust and reliable mechanisms for guaranteed delivery of messages from RavenDB to external systems. Using RabbitMQ ETL, you leverage that built-in knowledge and move faster with the implementation without jeopardizing quality and reliability. Sending commands over the wire does not only mean handing them over to an external system. It also means verifying this handover and removing them from the source. One of the approaches RavenDB applies is the Outbox communication pattern. Messages that need to be handed over to RabbitMQ are stored inside RavenDB, which is then responsible for reliably placing them in the queue. Once the message broker confirms that it has accepted them, they are deleted from RavenDB. If the transfer fails, they are retained, and redelivery is attempted shortly afterwards.
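The store, confirm, delete cycle of the Outbox pattern can be sketched as follows. This is a conceptual illustration only and not RavenDB's implementation: `FlakyBroker` is a hypothetical stand-in for a broker that rejects its first few publishes, and the outbox is a plain dictionary.

```python
class FlakyBroker:
    """Hypothetical broker stand-in that rejects the first `failures` publishes."""

    def __init__(self, failures):
        self.failures = failures
        self.queue = []

    def publish(self, message):
        if self.failures > 0:
            self.failures -= 1
            return False  # no confirmation received
        self.queue.append(message)
        return True  # broker confirmed the message


def relay_outbox(outbox, broker, max_rounds=5):
    """Move messages from the outbox (id -> message) to the broker.

    A message is deleted from the outbox only after the broker confirms it;
    unconfirmed messages stay put and are retried on the next round.
    """
    for _ in range(max_rounds):
        if not outbox:
            break
        for message_id in list(outbox):
            if broker.publish(outbox[message_id]):
                del outbox[message_id]
    return outbox
```

Because a message is removed only after confirmation, a lost confirmation leads to a retry rather than a lost message. The flip side is that the same message may occasionally be delivered more than once, so consumers should tolerate duplicates.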

The idea is to get an out-of-the-box implementation of the Outbox pattern. You have a single transaction that writes both normal documents and the messages that need to go to the queue. Instead of dealing with distributed transactions or complex integration processes, you let the RabbitMQ ETL handle the issue of reliably sending the data to the queue.

Imagine a scenario where you must process a credit card payment and send a confirmation email to the customer. These two operations should be performed transactionally, i.e., either the payment completes successfully and the confirmation email is sent, or the payment fails and no confirmation email is dispatched. You will usually use external services for both operations, e.g., Stripe for credit card payments and MailChimp for emails.

RabbitMQ ETL in practice

An important detail of the implementation in this scenario is how you handle failures. You do not want to send a confirmation email if the payment fails. You also do not want a failure in sending the confirmation email to invalidate a payment that was processed successfully. What must succeed or fail together is recording the payment and recording the intent to send the email. This atomicity is the key to the expected behavior of your system.

Even though the implementation might seem complicated, it becomes quite straightforward with commands and queues. You call the payment provider's API to process the credit card payment, and if you get a confirmation, you save it to your RavenDB database. Since sending the confirmation email is contingent on the payment being processed, and you want to be sure the email will be delivered, you save the command for sending the email in the same transaction as the payment confirmation. At this point, the RabbitMQ ETL task is triggered and moves the command to the RabbitMQ queue. In the next step, a dedicated email service picks this command off the queue and processes it by sending the confirmation email via an external service such as SendGrid or MailChimp.
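The transactional save at the heart of this flow can be sketched as follows. To keep the example runnable, an in-memory SQLite database stands in for the document store; this is not RavenDB's API, and the table names, the function names, and the `fail_before_commit` switch are all hypothetical.

```python
import sqlite3


def make_store():
    """Create an in-memory store with a payments table and an email command table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE payments (id TEXT PRIMARY KEY, status TEXT)")
    conn.execute("CREATE TABLE email_commands (payment_id TEXT, recipient TEXT)")
    conn.commit()
    return conn


def save_payment_with_email_command(conn, payment_id, recipient,
                                    fail_before_commit=False):
    """Write the payment confirmation and the email command in one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO payments (id, status) VALUES (?, ?)",
                (payment_id, "confirmed"),
            )
            if fail_before_commit:
                raise RuntimeError("simulated crash between the two writes")
            conn.execute(
                "INSERT INTO email_commands (payment_id, recipient) VALUES (?, ?)",
                (payment_id, recipient),
            )
        return True
    except RuntimeError:
        return False
```

If the simulated crash occurs, neither row is stored, so there is never a confirmed payment without its email command or the other way around. The stored command is what a task like RabbitMQ ETL would later move to the queue.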

Conclusion

Overall, this feature will simplify the life of your developers by reducing integration efforts. At the same time, it will reduce overall implementation costs. None of this affects the architecture of your system negatively. On the contrary, you get reliable enqueuing to RabbitMQ natively from your RavenDB database.

You can also use RabbitMQ ETL to react to certain patterns in your documents. Consider a case where you have a new requirement: all large financial transfers need extra scrutiny. You can set up the RabbitMQ ETL task to send just those big transfers to the queue, where your service will apply the additional logic. You won’t even need to modify a single line of code in the application itself. The entire feature can be implemented following the Open-Closed principle, leading to a much simpler architecture and reduced complexity all around.
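The filtering step in this scenario amounts to a predicate plus a projection. The sketch below shows that logic in plain Python; it is not the RavenDB ETL script syntax, and the threshold, field names, and function name are hypothetical.

```python
LARGE_TRANSFER_THRESHOLD = 10_000  # hypothetical cut-off for extra scrutiny


def select_for_review(transfers, threshold=LARGE_TRANSFER_THRESHOLD):
    """Return review messages for transfers at or above the threshold."""
    return [
        {"transferId": transfer["id"], "amount": transfer["amount"]}
        for transfer in transfers
        if transfer["amount"] >= threshold
    ]
```

Only the matching transfers produce messages, so the reviewing service never sees the small ones, and the application that writes the transfers stays unchanged.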
