Challenge: Detecting liveness in a distributed cluster
I’m teaching a university course on cloud computing. That can be a lot of fun, but it is quite frustrating at times. The key issue for me is that I occasionally need to give students a way to do something that I know how to do properly, but I can’t hand them the proper solution.
Case in point: assume that we have a distributed cluster of nodes and need to detect which nodes are up and which are down. How do you do that?
With RavenDB, we assign an observer to the cluster whose job is to do health monitoring. I can explain that to the students, but I can’t expect them to use this technique in their exercises; there is too much detail there. The focus of the lesson or exercise is not to build a distributed system but to make use of one, after all.
As a rule, I try to ensure that all the projects we work on can be done in under 200 lines of Python code. That puts a hard limit on the amount of behavior I can express. Because of that, I find myself looking for ways to rely on existing infrastructure to deal with the situation.
Each node runs the same code, and the nodes are set up so they can talk to one another if needed. It is important that all the live nodes converge on the same list of active nodes in relatively short order.
The task is to find the list of active nodes in a cluster, where nodes may go up or down dynamically. We are running in the AWS cloud, so you can use its resources. How would you do that?
The solution should be as simple as possible and easy to explain to students.
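To make the shape of the problem concrete, here is a minimal sketch of one possible direction: every node periodically writes a heartbeat to a shared store, and any node can list the peers whose heartbeat is still fresh. This is an illustration under stated assumptions, not the answer to the challenge and not how RavenDB does it. The `SharedStore` class is a hypothetical in-memory stand-in for whatever shared AWS resource you would pick, and the names `heartbeat`, `active_nodes`, and `HEARTBEAT_TTL_SECONDS` are made up for this example.

```python
import time

# Hypothetical value: how long a heartbeat counts as fresh.
HEARTBEAT_TTL_SECONDS = 10


class SharedStore:
    """In-memory stand-in for a shared table that every node can read and write.

    In a real exercise this would be backed by some shared cloud resource;
    which one is deliberately left open here.
    """

    def __init__(self):
        self._last_seen = {}

    def put_heartbeat(self, node_id, timestamp):
        # Overwrite the previous heartbeat for this node.
        self._last_seen[node_id] = timestamp

    def all_heartbeats(self):
        # Return a copy so callers cannot mutate the store's state.
        return dict(self._last_seen)


def heartbeat(store, node_id, now=None):
    """Record that node_id is alive at time `now` (defaults to the wall clock)."""
    store.put_heartbeat(node_id, time.time() if now is None else now)


def active_nodes(store, now=None, ttl=HEARTBEAT_TTL_SECONDS):
    """Return the sorted ids of nodes whose last heartbeat is at most `ttl` seconds old."""
    now = time.time() if now is None else now
    return sorted(
        node_id
        for node_id, seen in store.all_heartbeats().items()
        if now - seen <= ttl
    )
```

Each node would call `heartbeat(store, my_id)` on a timer and `active_nodes(store)` whenever it needs the current view. Because every node reads the same store, the live nodes agree on the list within roughly one TTL of a node going up or down, assuming their clocks are reasonably in sync.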