High performance .NET: Building a Redis Clone–naively
I ran into this project, which aims to be a Redis clone with better performance and ease of use. I found it interesting because one of its main selling points is that it can run in a multi-threaded mode (instead of Redis’ single-thread-per-process model). They use memtier_benchmark (a benchmarking tool from Redis Labs) to test their performance. I got curious about how much performance I could get out of the system if I built my own Redis clone in C#.
The first version I built was done pretty naively. The idea is to write it in a high level manner, and see where that puts us. To make things interesting, here are the test scenarios:
- The memtier_benchmark client is going to run on a c6g.2xlarge instance, using 8 cores and 32 GB of memory.
- The tested server is going to run on a c6g.4xlarge instance, using 16 cores and 64 GB of memory.
Both of those instances are running in the same availability zone.
The command I’m going to run is:
memtier_benchmark -s $SERVER_IP -t 8 -c 32 --test-time=30 --distinct-client-seed -d 256 --pipeline=30
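What this says is that we’ll use 8 threads (the number of cores on the client instance) with 32 connections per thread, each connection pipelining 30 requests at a time. Since I’m not passing a ratio, memtier_benchmark uses its default of one SET for every ten GETs (roughly 9% writes & 91% reads), with values that are 256 bytes in size. In total, we’ll have 256 clients, and our tests are going to continuously push more data into the system.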
The server is being run using:
dotnet run -c Release
Here is an example of the server while under this test:
I chose 30 seconds for the test duration to balance doing enough work to feel what is going on (multiple GC cycles, etc.) while keeping the test short enough that I won’t get bored.
Here are the naïve version results:
====================================================================================================================
Type         Ops/sec    Hits/sec   Misses/sec   Avg. Latency   p50 Latency   p99 Latency   p99.9 Latency      KB/sec
--------------------------------------------------------------------------------------------------------------------
Sets        86300.19         ---          ---        8.14044       0.92700      99.83900       196.60700    25610.97
Gets       862870.15    36255.57    826614.58        8.10119       0.91900      99.32700       196.60700    42782.42
Waits           0.00         ---          ---            ---           ---           ---             ---         ---
Totals     949170.34    36255.57    826614.58        8.10476       0.91900      99.32700       196.60700    68393.39
So the naïve version, using C#, doing almost nothing, is almost touching 1 million queries/sec. The latency, on the other hand, isn’t that good, with the p99 at almost 100 ms.
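One way to sanity-check the average latency is Little's law: requests in flight ≈ throughput × average latency. The sketch below plugs in the numbers from this test. It assumes that every connection keeps its pipeline full for the whole run, which the benchmark output doesn't actually tell us, and the variable names are mine.

```csharp
using System;

// Figures taken from the test description and the results table above.
const int threads = 8;               // -t
const int connectionsPerThread = 32; // connections per thread, per the description
const int pipelineDepth = 30;        // --pipeline
const double totalOpsPerSec = 949170.34;

// Assumption: every connection keeps its pipeline full for the whole run.
int inFlight = threads * connectionsPerThread * pipelineDepth;

// Little's law: inFlight = throughput * latency, so latency = inFlight / throughput.
double expectedAvgLatencyMs = inFlight / totalOpsPerSec * 1000.0;

Console.WriteLine($"Requests in flight: {inFlight}");
Console.WriteLine($"Expected average latency: {expectedAvgLatencyMs:F2} ms");
```

That works out to about 8.1 ms, which is close to the average latency that memtier_benchmark reported. If the assumption holds, most of the average latency is simply time spent queued behind a deep pipeline, and the interesting number to chase is the p99.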
Now that I got your attention with the numbers and pretty graphs, let me show you the actual code that I’m running. This is a “Redis Clone” in under 100 lines of code.
Just a few notes on the implementation. I’m not actually doing much. Most of the code is there to parse the Redis protocol. And the code is full of allocations. Each command parsing is done using multiple string splits and concats. Replies to the client require even more concats. The “store” for the system is actually just a simple ConcurrentDictionary, without anything to avoid contention or high costs.
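To give a feel for the style described above, here is a minimal sketch of naive command handling. It is not the code from this post: the Execute helper and its error messages are hypothetical, it only knows GET and SET, it assumes one complete command per call, and it skips all of the socket handling.

```csharp
using System;
using System.Collections.Concurrent;

// The whole "database": a single shared dictionary, as described above.
var store = new ConcurrentDictionary<string, string>();

// Hypothetical helper (not the code from this post): takes one complete
// RESP-encoded command and returns the RESP-encoded reply.
string Execute(string request)
{
    // "*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n" splits into
    // ["*3", "$3", "SET", "$3", "foo", "$3", "bar", ""].
    // Every Split allocates a new array plus one string per element.
    string[] lines = request.Split("\r\n");
    if (lines.Length < 3 || !lines[0].StartsWith('*'))
        return "-ERR protocol error\r\n";
    if (!int.TryParse(lines[0].Substring(1), out int argCount)
        || argCount < 1 || lines.Length < 1 + 2 * argCount)
        return "-ERR protocol error\r\n";

    // Arguments sit at indexes 2, 4, 6, ... (the "$n" length lines are ignored).
    string command = lines[2].ToUpperInvariant();
    switch (command)
    {
        case "SET" when argCount == 3:
            store[lines[4]] = lines[6];
            return "+OK\r\n";
        case "GET" when argCount == 2:
            // More allocations: the reply is built by concatenating strings.
            // value.Length counts chars, which equals bytes only for ASCII.
            return store.TryGetValue(lines[4], out var value)
                ? "$" + value.Length + "\r\n" + value + "\r\n"
                : "$-1\r\n";
        default:
            return "-ERR unknown command '" + command + "'\r\n";
    }
}

// Usage: each call prints the raw reply.
Console.Write(Execute("*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n")); // +OK
Console.Write(Execute("*2\r\n$3\r\nGET\r\n$3\r\nfoo\r\n"));             // $3, then bar
Console.Write(Execute("*2\r\n$3\r\nGET\r\n$4\r\nnope\r\n"));            // $-1 (null reply)
```

Even in this tiny version, a single GET allocates the split array, a string per protocol line, an upper-cased command and several intermediate strings for the reply. Multiply that by close to a million requests per second and the GC has plenty to do.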
The manner in which we handle I/O is pretty horrible, and… I think you get where I’m going here, right? My goal is to see how I can use this (pretty simple) example to get more performance without having to deal with a lot of extra fluff.
Given my initial attempt is already at nearly 1M QPS, that is a pretty good start, if I do say so myself.
The next step that I want to take is to handle the allocations that are going on here. We can probably do better here, and I aim to try. But I’ll do that in the next post.