Using RavenDB from Serverless applications
I got a great question about using RavenDB from Serverless applications:
DocumentStore should be created as a singleton. For Serverless applications, there are no long-running instances that would be able to satisfy this condition. DocumentStore is listed as being “heavy weight”, which would likely cause issues every time a new insurance is created.
RavenDB’s documentation explicitly calls out that the DocumentStore should be a singleton in your application:
We recommend that your Document Store implement the Singleton Pattern as demonstrated in the example code below. Creating more than one Document Store may be resource intensive, and one instance is sufficient for most use cases.
But the point of Serverless systems is that there is no such thing as a long running instance. As the question points out, that is likely going to cause problems, no? On the one hand we have RavenDB’s DocumentStore, which is optimized for long running processes and on the other hand we have Serverless systems, which focus on minimal invocations. Is this really a problem?
The answer is that there is no real contradiction between those two desires, because while the Serverless model is all about a single function invocation, the actual manner in which it runs means that there exists a backing process that is reused between invocations. Taking AWS Lambda as an example, you can define a function that will be invoked for SQS (Simple Queuing Service), the signature for the function will look something like this:
async Task HandleSQSEvent(SQSEvent sqsEvent, ILambdaContext context);
The Serverless infrastructure will invoke this function for messages arriving on the SQS queue. Depending on its settings, the load and defined policies, the Serverless infrastructure may invoke many parallel instances of this function.
What is important about Serverless infrastructure is that a single function instance will be reused to process multiple events. It is the Serverless infrastructure’s responsibility to decide how many instances it will spawn, but it will usually not spawn a separate instance per message in the queue. It will let an instance handle the messages and spawn more as they are needed to handle the ingest load. I’m using SQS as an example here, but the same applies for handling HTTP requests, S3 events, etc.
Note that this is relevant for AWS Lambda, Azure Functions, GCP Cloud Functions, etc. A single instance is reused across multiple invocations. This ensure far more efficient processing (you avoid startup costs) and can make use of caching patterns and common optimizations.
When it comes to RavenDB usage, the same thing applies. We need to make sure that we won’t be creating a separate DocumentStore for each invocation, but once per instance. Here is a simplified example of how you can do this:
We define the DocumentStore when we initialize the instance, then we reuse the same DocumentStore for each invocation of the lambda (the handler code).
We can now satisfy both RavenDB’s desire to use a singleton DocumentStore for best performance and the Serverless programming model that abstracts how we actually run the code, without really needing to think about it.