Modeling temporal data with RavenDB
We got an interesting modeling question from a customer: “What is the optimal manner to represent time sensitive information in RavenDB?”
The initial thought was that they would use revisions and they asked about querying those. The issue is that this isn’t really the purpose for revisions, they are great if you want to see what the state of the system at a particular point in time, but not so good if your business logic has meaning over time.
The best scenario for temporal data that I’m aware of is payroll. You have a lot of data that make sense only in the context of the time if was relevant for. For example, consider an employee that is hired at a given salary level, then given raises over time. The data in this case is divided into several layers.
I’m going to use paper documents as the model here, because it makes it much easier to consider the modeling implications than when talking about JSON or class structures.
On the most basic model, we have the Payslip document, which represent the amount (work, deductions, taxes, etc) that was paid to an employee at a particular point in time. This is similar to this:
Once created, such a document will not change. It represent an action that happened in the past and is immutable. From this you can figure out taxes, overall payments, etc.
The Payslip is computed from the Timesheet document, which is similar to this one:
A Timehseet document is updated during the Payroll Period whenever an employee signs in or out. At the end of the Payroll Period, a manager will sign off on the Timesheet and approve it for payment. At this point, all the relevant business rules are run and the final Payslip is generated for each employee. Once the Timesheet is signed off and paid, it is no longer mutable and will not change. This make sense, since it represent something that already happened.
In some cases, you’ll have new information, such as an employee that worked, but didn’t report their hours. They will need to do so in a new Timesheet and a new Payslip will be generated.
Using the real world analogy, the Timesheet document is stored at the head office, and you cannot go and update that once it was submitted.
So far, we haven’t seen anything related to things that change over time. In fact, the fact that we have separate documents for Payslips and Timesheets means that we can ignore a lot of the complexity that you’ll usually have to deal with in temporal databases.
We can’t completely get away from it, however. We need to consider the employee’s Contract, however. Usually when we think about the employment contract we think about something like this:
The contract specify details such as the hourly rate, overtime payment, vacation time, etc. In payroll systems, contracts are actually more complex than that, because you have to take into account that they change.
For example, consider the following scenario:
- 1996 – Hired as mailroom clerk – 4.75$ / hour
- 1998 – Promoted to junior clerk – 5.25$ / hour
- 1999 – Promoted to clerk – 5.40$ / hour
- 2002 – Promoted to senior clerk – 6.20$ / hour
How do you handle something like that, in terms of modeling?
The answer depends quite heavily on how your business logic handles this. One way to handle this is to create revisions. Using the real world logic, we are talking about signing a new contract and expiring the old one. But in reality, that isn’t how things are done. You’ll usually just update the payment terms.
How does this looks like in terms of the data modeling when using RavenDB,however? Well, there are two options. We can represent the data as simple values, like so:
When the data changes, we update those values (which generates revisions for the old data). However, that isn’t usually ideal, because business logic usually want to access the past values. For example, your contract may change mid payroll period, so your hourly rate is different depending if the hours worked on Monday or Thursday.
In this case, you’ll want to represent the values changing in the model directly, like so:
In most cases, this is the best option for modeling data where the temporal aspects of the data needs to be directly exposed to the business logic.