Evolving your product isn't easy. Best case, nothing breaks, users love it, and you high-five until your hand hurts. Worst case, Digg v4.
Here's a technical diagram of what our Reporting 1.0 Backend looked like:
Joking aside, our 1.0 backend had reached its scalability limits, making it a constant source of firefighting. Not particularly liking this, we assembled a team to rewrite it, and soon we had a shiny new Reporting 2.0 Backend. The next step was decommissioning 1.0, but we didn't want to switch it over without further testing. This is where we used our good friend, The Four Step Process™:
"Place" here refers to an address for data. It could be database columns, cache keys, a variable name, a function call, a path on a filesystem, a URI, or just about anything!
Say you have a system that writes a `name` value that is first name and last name concatenated. Well, that's no good; we want `last-name` written individually. This will speed up our alphabetizer service tenfold. Time to 4-step this bad boy!
We write to the new place, `last-name`, while still writing to `name`.
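Here's a minimal sketch of that dual write. The in-memory `store` and the `save_user` function are made up for illustration; a real system would be writing to a database, cache, or whatever your "place" happens to be.

```python
# Hypothetical in-memory "database": user id -> record.
store = {}

def save_user(user_id, first, last):
    """Step 1: write the new place while still writing the old one."""
    store[user_id] = {
        "name": f"{first} {last}",  # old place, still what readers use
        "last-name": last,          # new place, nobody reads it yet
    }

save_user(1, "Ada", "Lovelace")
```

Nothing reads `last-name` at this point, so a bug in the new write can't hurt anyone yet.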
Alright, the new data looks legit, so we start reading from `last-name`; if it doesn't exist, we fall back to reading from `name`. No problem.
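The read-with-fallback might look roughly like this. `read_last_name` is a hypothetical helper, and the fallback assumes the last whitespace-separated token of `name` is the last name, which real-world names will happily violate.

```python
def read_last_name(record):
    """Step 2: prefer the new place; fall back to the old one."""
    last = record.get("last-name")
    if last is not None:
        return last
    # Fallback for records written before step 1 shipped.
    # Assumption: the last whitespace-separated token is the last name.
    return record["name"].split()[-1]

new_style = {"name": "Ada Lovelace", "last-name": "Lovelace"}
old_style = {"name": "Grace Hopper"}  # written before the dual write existed
```

Both `read_last_name(new_style)` and `read_last_name(old_style)` return a last name, so callers don't need to know which records have been migrated.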
Let's say our data is not ephemeral and we care about keeping old values. To stop reading from the old place, we need to modify the old data. We do so, making sure there is a `last-name` for each `name`. This step can be tricky, which is why it's important that we're still writing to `name`.
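A backfill for this step could be sketched as below. The `backfill_last_names` function and the list-of-dicts storage are assumptions for illustration, as is the rule for splitting a name. The sketch never touches the old `name` value and skips records that already have the new field, so it is safe to re-run if it dies halfway through.

```python
def backfill_last_names(records):
    """Step 3: give every old record a last-name; return how many changed."""
    changed = 0
    for record in records:
        if "last-name" in record:
            continue  # already written by step 1 or by an earlier run
        parts = record["name"].split()
        # Assumption: last token is the last name; empty names get "".
        record["last-name"] = parts[-1] if parts else ""
        changed += 1
    return changed

records = [
    {"name": "Ada Lovelace", "last-name": "Lovelace"},
    {"name": "Grace Hopper"},
]
first_run = backfill_last_names(records)
second_run = backfill_last_names(records)  # re-running changes nothing
```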
Step 3 has survived production for some time, so we stop writing to `name` and then celebrate total victory.
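Before pulling the plug on the old write, one sanity check you could run is to look for records that would still need the fallback. This is a sketch with a hypothetical `missing_last_name` helper, not something the four steps strictly require.

```python
def missing_last_name(records):
    """Return the records that would still need the old `name` fallback."""
    return [r for r in records if not r.get("last-name")]

records = [
    {"name": "Ada Lovelace", "last-name": "Lovelace"},
    {"name": "Grace Hopper"},  # never backfilled
]
stragglers = missing_last_name(records)
safe_to_stop_writing_name = not stragglers
```

If `stragglers` is non-empty, step 3 isn't really done yet, and the old write should stay on.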
Step 1 was building the Reporting 2.0 Pipeline, which is a story for another post.
At this point, we hadn't load-tested our new Redshift cluster. It could ingest data, but would it survive our users' queries? To test this, we:
By this point, we had ironed out 99% of the kinks. We were confident enough to roll out a final feature flag: stop sending requests to 1.0. Woo!
A few more months passed without any major incidents, so we terminated our oldest EC2 instances. Reporting 1.0 was officially dead. It felt good, but also somehow sad. That said, we overcame our grief pretty quickly after seeing the next month's Amazon cost graphs.
Burning down our Reporting 1.0 Backend was a fun and stressful adventure. We expected many late nights, but, surprisingly, the whole process went smoothly. I attribute this to how much time we spent planning our four steps (as well as having the patience to not immediately switch over).
So, next time you're in a position like this, consider using the Four Step Process™!
Sean is a programmer at Kevel. He enjoys debugging code, system maintenance, and improving infrastructure.