Global distributed consistency, the easy way
I run into an interesting scenario the other day. As part of the sign in process, the application will generate a random token that will be used to authenticate the following user requests. Basically, an auth cookie. Here is the code that was used to generate it:
This is a pretty nice mechanism. We use cryptographically secured random bytes as the token, so no one can guess what this will be.
There was a serious issue with this implementation, however. The problem was that on application startup, two different components would race to complete the signup. As you can see from the code, this is meant to be done in such as a way that two such calls within the space of one hour will return the same result. In most cases, this is exactly what happens. In some cases, however, we got into a situation where the two calls would race each other. Both would load the document from the database at the same time, get a(n obviously) different security token and write it out, then one of them would return the “wrong” security token.
At that point, it meant that we got an authentication attempt that was successful, but gave us the wrong token back. The first proposed solution was to handle that using a cluster wide transaction in RavenDB. That would allow us to ensure that in the case of racing operations, we’ll fail one of the transactions and then have to repeat it.
Another way to resolve this issue without the need for distributed transactions is to make the sign up operation idempotent. Concurrent calls at the same time will end up with the same result, like so:
In this case, we generate the security token using the current time, rounded to an hour basis. We use Argon2i (a password hashing algorithm) to generate the required security token from the user’s own hashed password, the current time and some pepper to make it impossible for outsiders to guess what the security token is even if they know what the password is. By making the output predictable, we make the rest of the system easier.
Note that the code above is till not completely okay. If the two request can with millisecond difference from one another, but on different hours, we have the same problem. I’ll leave the problem of fixing that to you, dear reader.