Not so long ago, I have been tasked with the development of an in-memory IP address geolocation library. Yep, that was pretty cool and challenging at the same time (well, the challenge made it cool, right?).
In this short post, I want to share how the design of one component, the data driver, has evolved over time and how the Single Responsibility Principle (SRP) inspired this refactoring. The data driver was the poor guy charged of loading millions of geolocation data entry with the smallest possible memory footprint on a JVM (it did his job quite well, as the sizes of the zipped raw data and its in-memory form were of the same magnitude).
Like in any projects, everything started nice and simple. I have removed many moving parts and simplified the design to focus only on the discussion point, but what you see hereafter is pretty similar to where I started:
Quickly enough though, the rosy picture turned to a slightly less appealing color (kinda sorta brownish).
Loading all this data in memory takes time (around 15 seconds). It quickly became unacceptable to further slow down the usual crippled bootstrap of a JEE application server by holding the initialization thread longer than necessary. Consequently, the data driver had to delegate its actual initialization to another thread in order to free the main thread so it could perform its lengthy EJB bootstrapping business.
On top of that, because IP address blocks get re-assigned regularly, a geolocation database must be refreshed frequently, else you end-up with clients that appear to be in Antarctica instead of Kansas (that would be tough on the penguins). So the data driver had to be capable of hot reloading its data at any point of time while still being responsive (i.e. by keeping to use the old data until the new ones were fully loaded).
So here I went adding all these features and I ended up with that:
In complete violation of the SRP, my original jolly little driver had become the Mother Of All Drivers, capable of doing everything and even more.
At this point, my virtual green wristband started to burn my wrist pretty badly. I was hearing Uncle Bob's voice threatening me of disasters of cosmic proportions if I would not live up to my professional standards and refactor the code right away.
So I came up with this refactoring:
I gave the ReloadingDriver the single responsibility of hot reloading. It delegated the data access operation to the actual driver, to which it kept a private reference. Instead of dealing with state with flags, I refactored it according to the state pattern and used a null object to represent the "not ready" state.
To give you a better idea of how the state of the ReloadingDriver evolves, I have added a vertical time-line to the following diagram:
Interestingly, the mechanism that performs the regular reload is the same the performs the initial load that transitions from "not ready" to "Version1".
As a closing note, nothing of this would have been possible without my best friends: the Executor and the AtomicReference. I want to thank them here for their constant support in my concurrency software development endeavors.