How to do Big Data Migration
I have been involved in many data migration projects. Usually you cannot write about data migration in general because it is vastly project- oriented development. You can mention some of the errors and mistakes but it is not possible to make a general plan for all the data migration projects. Here in this post I will explain some of the mistakes and some tips worth to read for the data migration process. hopefully you find it useful.
What is a DataAny transference of data from one source to another is a data migration. sometimes this is happening from one SQL server database to a newer version of database with changes in the scheema. Example:
Consider that you are selling an application named CustRocket which is a web application for managing the customers information. You are selling this application in the market for 10 years and you have 2000 clients. Today one of the sale department member approach you and inform your department that they have a new client who decided to use your application but they already have 100,000 customers in different systems like excels and outlook and some other small applications. And one part of the contract is to convert and migrate their legacy data into your application which uses the SQL Server. This is an example of data migration project. But is this a big project to migrate the 100,000 customers data? how you can assess the size of a data migration?
Size of a Data Migration Project
The size and complexity of a data migration is not related to the value of data that you are migrating. It is more about the number of entities that is being migrated over. Just consider that have 10 tables to migrate to a system with 15 tables and you have just 10 records to migrate; this would be much complicated than 2 tables with 100,000 records. Now that we know where the complexity of the project comes from, we can do our estimations.
How to estimate a data migration project
Estimation should be done based on the number of entities you are migrating. If you have 2 tables to migrate to a new system that can be done in 7 to 8 hours but if you have 50 tables then you need to have 20 to 25 days to write a proper script for migrating the data. In my experience you can migrate two entities per day. Because you need to develop lots of logic for validation, insertion and reconciliation. And this time is just the development time not the time of the whole process.
Data Migration Steps
Before I start to explain the steps, I need to mention that these are the steps based on my own experiences, you may find some other steps which is crucial and I may missed some of them. If you have any of them in your mind please do me a favour and comment here.
1- Analyse of the Source Data: this step has to be taken by the analysts. They are those who know how data are structured in the system and what are the minimum fields need to be populated in a system to work properly. They have to analyse the source data which is — for the example above — the information that client uses at the moment. It can be a box excel sheet. A SQL server database, some plain text files or any other information. Analysts need to translate those pieces of data into the data structure of the application. So you have a source and a target. Source can be any of those material while target is our database tables.
2- Migrate one piece of data correctly: when you have the analyse of the data in hand, you need to migrate manually one set of data — like one customer with all its related data — from the source to target manually. by manually I mean do it directly in the target. For example, if the analyse document is saying a row of data in the excel sheet would be translated to these rows in the target db then you need to select one row and populated the required tables in the target database for that selected row. In this way you are migration one brick from the source to target to see how it goes. if you are not able to migrate one small brick how can you migrate the whole house.
3- Testing the migrated brick: after you migrate that specific data then ask your QA to test the whole system with that data to see if the migrated data is fit in the system or not. Believe me you will find lots of problems and issues. If you find just couple your analysts did their job perfect which is not the case for most of the time. Thus, do not trust in any one else job or even your job, just trust the data and test the system. DO NOT IGNORE this step.
4- Develop the Migration Scrips: when you are confident that you have migrated one piece of data — like one customer — successfully then it is time to develop a script to migrate all the data from the source to the target. If you do not do the previous steps correctly then you will not be able to do the migration in an effective period of time. All the three previous steps helping you to reduce the time of development, testing and debugging after the whole data has been migrated.
5- Test the Migrated Data: QAs need to test the entire system based on the newly migrated data. For example, you need to see whether a migrated customer is editable in the system, or is that possible to remove a migrated customer. Are the newly migrated data searchable in the system, do we migrate the security related data correctly, are migrated data participating in the queries for the reporting system. Or in general are they became part of the system or not.
6- data reconciliation: you this is a huge topic for itself. To put it short we can say you need a mechanism in the system to check the data reconciliation, means you need to to check if the all the data has been migrated or not. for example if there are 200,000 records in the source database do we have 200,000 records in the target db ?
Data migration is a very tricky process in the application life cycle. If you do not follow steps one by one then you could not migrate the data correctly. Always keep in mind that the migration will be overdue, so add 10 to 20% extra time to your estimation.
I am writing another article here for how to structure and develop the migration scripts. Thanks for reading this and please feel free to comment your ideas.
Photo by Dariusz Grosa from Pexels