Data Migration When You’re Always Up

I had a problem. I needed to drastically change a table schema, but our system is always up. We can’t just stop the servers, run migration scripts, test everything, and bring it all back up; I needed to make the change while the system was in use. Someone likened this to changing the engines on a plane while it’s flying. I decided to take the idea of parallel change and apply it to our database, which broke the schema migration into three phases: expand, migrate, and contract. With data, though, the migrate phase is particularly tricky, because there is a period in which the old and new versions of the data have to coexist.

I made the old version of the data (V1) the source of truth while the data migrated to the new version (V2) through normal use. Updated clients asked V2 for the data. If V2 already had it, V2 returned it. If not, the service fetched the data from V1, converted it to the V2 format, saved it to V2, and returned it. As that process continued, more and more of the data ended up in the new format. Every write had to update both versions.
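To make the two directions of conversion concrete, here is a minimal Python sketch. The schema change is entirely hypothetical (V1 keeps a single `full_name`, V2 splits it into `first_name` and `last_name`), plain dicts stand in for the two collections, and `v1_to_v2`, `v2_to_v1`, and `dual_write` are illustrative names, not part of the original system.

```python
# Hypothetical schemas, for illustration only:
#   V1: {"id": "1", "full_name": "Ada Lovelace"}
#   V2: {"id": "1", "first_name": "Ada", "last_name": "Lovelace"}


def v1_to_v2(rec: dict) -> dict:
    """Read path: upgrade a V1 record the first time it is requested."""
    first, _, last = rec["full_name"].partition(" ")
    return {"id": rec["id"], "first_name": first, "last_name": last}


def v2_to_v1(rec: dict) -> dict:
    """Write path: produce the V1 shape so V1 stays the source of truth."""
    parts = (rec["first_name"], rec["last_name"])
    return {"id": rec["id"], "full_name": " ".join(p for p in parts if p)}


def dual_write(v1: dict, v2: dict, rec: dict) -> None:
    """While the versions coexist, every write lands in both collections."""
    v2[rec["id"]] = rec
    v1[rec["id"]] = v2_to_v1(rec)


v1, v2 = {}, {}
dual_write(v1, v2, {"id": "1", "first_name": "Ada", "last_name": "Lovelace"})
print(v1["1"])  # {'id': '1', 'full_name': 'Ada Lovelace'}
```

Whatever the real conversion is, it is worth checking how well it round-trips. This toy one does not always: a first name containing a space would be split differently on the way back from V1, and that kind of drift is exactly what dual writes can expose.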

The Internal Read

All the operations use the same read logic, which looks like this:

[Diagram: the internal read (CoexistentDataMigration_InternalRead)]

The steps needed only for coexistence are shown in green. When coexistence is over, those steps go away.
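A minimal sketch of that internal read, in Python with dicts standing in for the collections and a hypothetical `full_name` split as the conversion. The `Repository` class and its `coexisting` flag are assumptions made for illustration; the flag simply makes explicit which steps disappear once the migration is finished.

```python
from typing import Optional


def v1_to_v2(rec: dict) -> dict:
    # Hypothetical conversion: V1 "full_name" -> V2 "first_name"/"last_name".
    first, _, last = rec["full_name"].partition(" ")
    return {"id": rec["id"], "first_name": first, "last_name": last}


class Repository:
    def __init__(self, v1: dict, v2: dict, coexisting: bool = True):
        self.v1 = v1                  # old collection, source of truth
        self.v2 = v2                  # new collection, filled in as data is used
        self.coexisting = coexisting

    def _internal_read(self, entity_id: str) -> Optional[dict]:
        found = self.v2.get(entity_id)
        if found is not None or not self.coexisting:
            return found
        # Coexistence-only steps: fall back to V1, convert, and persist to V2
        # so this entity never needs converting again.
        old = self.v1.get(entity_id)
        if old is None:
            return None
        migrated = v1_to_v2(old)
        self.v2[entity_id] = migrated
        return migrated
```

After the first read of an entity, it is served straight from V2. With `coexisting=False` the fallback is skipped entirely, leaving only the V2 lookup.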

The Read

Reading from the service builds on the internal read logic and looks like this:

[Diagram: the read (CoexistentDataMigration_Read)]

When coexistence is over, the read logic won’t change at all, since the coexistence steps all live in the internal read.
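One way to see why the read survives unchanged: if it depends only on the internal read, it never touches V1 itself. In this hypothetical Python sketch the internal read is passed in as a function, and `read` and `NotFound` are illustrative names rather than anything from the original service.

```python
from typing import Callable, Optional


class NotFound(Exception):
    """Raised when an entity exists in neither version."""


def read(entity_id: str, internal_read: Callable[[str], Optional[dict]]) -> dict:
    # No V1/V2 logic here: all coexistence handling lives in internal_read,
    # so this function stays the same after the migration ends.
    entity = internal_read(entity_id)
    if entity is None:
        raise NotFound(entity_id)
    return entity


v2 = {"1": {"id": "1", "first_name": "Ada", "last_name": "Lovelace"}}
print(read("1", v2.get)["first_name"])  # Ada
```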

Inserting Data

Even write operations use the internal read. The insert uses it to see if the entity already exists before adding it.

[Diagram: the insert (CoexistentDataMigration_Create)]

During coexistence, the insert also has to write to V1 so that clients still using the older methods keep working.
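Sketched in Python, with dicts for the collections and hypothetical names throughout (`create`, `AlreadyExists`, and an injected `to_v1` converter). Going through the internal read matters here: during coexistence it also looks in V1, so an entity that hasn't been migrated yet is still detected as a duplicate.

```python
from typing import Callable, Optional


class AlreadyExists(Exception):
    """Raised when the entity is already present in either version."""


def create(rec: dict, v1: dict, v2: dict,
           internal_read: Callable[[str], Optional[dict]],
           to_v1: Callable[[dict], dict],
           coexisting: bool = True) -> dict:
    # Existence check goes through the internal read, which (during
    # coexistence) also consults V1.
    if internal_read(rec["id"]) is not None:
        raise AlreadyExists(rec["id"])
    v2[rec["id"]] = rec
    if coexisting:
        # Keep clients that still use the older methods working.
        v1[rec["id"]] = to_v1(rec)
    return rec
```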

Updating Data

Just like the insert, the update uses the internal read to check that the entity exists before updating it. During coexistence, it also updates V1.

[Diagram: the update (CoexistentDataMigration_Update)]
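A hypothetical Python sketch of the update, following the same shape as the insert: check existence through the internal read, then write both versions while they coexist. `update`, `NotFound`, and the injected `to_v1` converter are illustrative names.

```python
from typing import Callable, Optional


class NotFound(Exception):
    """Raised when there is nothing to update."""


def update(rec: dict, v1: dict, v2: dict,
           internal_read: Callable[[str], Optional[dict]],
           to_v1: Callable[[dict], dict],
           coexisting: bool = True) -> dict:
    # The internal read also finds entities that exist only in V1,
    # so not-yet-migrated entities can still be updated.
    if internal_read(rec["id"]) is None:
        raise NotFound(rec["id"])
    v2[rec["id"]] = rec
    if coexisting:
        v1[rec["id"]] = to_v1(rec)   # V1 is still the source of truth
    return rec
```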


Deleting Data

I hope by now you’re getting the idea. One more picture, and then some closing thoughts.

[Diagram: the delete (CoexistentDataMigration_Delete)]
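The delete presumably follows the same pattern, and the hypothetical Python sketch below assumes it does (`delete` and `NotFound` are illustrative names). One detail is worth spelling out: during coexistence the record must be removed from V1 as well as V2, otherwise the next internal read would find it in V1 and quietly migrate it back.

```python
from typing import Callable, Optional


class NotFound(Exception):
    """Raised when there is nothing to delete."""


def delete(entity_id: str, v1: dict, v2: dict,
           internal_read: Callable[[str], Optional[dict]],
           coexisting: bool = True) -> None:
    if internal_read(entity_id) is None:
        raise NotFound(entity_id)
    v2.pop(entity_id, None)      # it may not have been migrated yet
    if coexisting:
        v1.pop(entity_id, None)  # otherwise a later read would resurrect it
```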

Final Thoughts

One part of all this that seemed a little counterintuitive was how to get a list of entities. All the operations so far go against the V2 collection, but a list has to come from the V1 collection. Until all the data is migrated, V2 holds only what has been migrated so far, so a list taken from V2 could leave entities out.
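A hypothetical Python sketch of listing during coexistence: enumerate V1, which is complete, and return each entity in V2 form, reusing the V2 copy when one already exists. Whether listing should also persist the conversions is a design choice the post doesn't cover; this version doesn't, so listing stays read-only. `list_entities` and the injected `to_v2` converter are illustrative names.

```python
from typing import Callable


def list_entities(v1: dict, v2: dict, to_v2: Callable[[dict], dict]) -> list:
    # V1 holds every entity; V2 holds only the ones migrated so far.
    return [v2[eid] if eid in v2 else to_v2(rec) for eid, rec in v1.items()]
```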

Eventually, most of the data that will ever migrate through use has migrated. What remains in V1 still matters, even though nobody has touched it in a while. At that point, I finally needed a script to move the remaining data. The script got a list of the entities that were in V1 but not in V2, then did a get on each of them to force the conversion.
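That script can stay small because the read path already knows how to convert. A hypothetical Python sketch, where `get` stands for whatever read-through function the service exposes and the dicts stand in for the collections. Because it reuses the same conversion as normal traffic, running it twice is harmless: the second run finds nothing pending.

```python
from typing import Callable, Optional


def backfill(v1: dict, v2: dict, get: Callable[[str], Optional[dict]]) -> int:
    """Force-migrate everything still only in V1; returns how many were read."""
    # Snapshot the IDs first: each get() writes to V2 as a side effect.
    pending = [eid for eid in v1 if eid not in v2]
    for eid in pending:
        get(eid)   # the normal read path does the conversion
    return len(pending)
```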