Data migration plan. Case study of a high-quality, error-proof AWS project
A good data migration plan is a must, especially when moving data between two complex systems (a legacy system and a new one) that are both continuously operating. Migrating data, when you transfer terabytes' worth of it or work with sensitive records, is extremely demanding. So, in order to ensure data quality for GOconnectIT, our data migration specialists created a custom, AWS cloud-based solution that supported and future-proofed GCI's business processes.
Why did GOconnectIT look for a technological partner?
GOconnectIT is a company from the Netherlands creating professional software for managing the installation of optical fiber for private homes and businesses.
GOconnectIT is one of the leading companies in the utilities industry, working with 50+ customers, contractors, and network operators. They specialize in providing GEO information, supporting innovations in infrastructural projects, reducing damage to cables and pipelines, and making sure that all necessary deadlines and procedures are met.
The company had two products for fiber rollout (FTTH), one of which was designated as end-of-life, with the intention of moving all customers to the newer product – GO FiberConnect.
The initial analysis showed that:
- the old, legacy system has a very rigid logic,
- the data is extensive and complex.
The first problem we had to solve together was the configuration, which differed depending on GOconnectIT's customers and their end users. Sometimes several customers work for the same client, and then the configurations should be identical. Unfortunately, in FiberConnect they varied, and extra data engineering work was necessary.
The data migration plan was just one part of a much bigger project for GOconnectIT
Check out the full scale of our cooperation in the business case study below 👇
Initial problems and a lesson learned
Every project brings a lesson, sometimes a harsh one. From this experience, we gained an even stronger conviction that ETL projects require a huge commitment from both parties, domain experts, and a separate, meticulous analysis phase. Without a fixed plan, the implementation time will be subject to constant change. Due to the complexity of the project's needs and the very close alignment with the client, our assumptions evolved over time, which had a direct impact on how we finally carried out the project. We became convinced that it was worth spending more time on the analysis phase.
At first, it seemed that the task would consist of transforming the data from the relational database and entering it into the appropriate target system database. It turned out, however, that the data migration wasn't only on the relational database level – it also required integration with the new system via its API.
For these reasons, using existing solutions for database migration that operate on the SQL level was out of the question.
So, how did our data engineering services solve this?
Data migration strategy. AWS cloud – we choose you!
For the database migration, we picked the AWS cloud, and more specifically the following services:
- Lambda,
- Step Functions,
- SQS,
- Parameter Store,
- DynamoDB,
- S3.

We chose this stack because we already have vast experience with rate limiting on Lambdas. We needed a solution to control how fast data is processed into the new system and how fast data is downloaded from the old system.
Additionally, we required a service that would allow us to save the current state of the migration (errors, which data was migrated and which wasn't, etc.), so that, if necessary, we could process only the non-migrated data without having to delete the old records and migrate everything again. For this, we chose DynamoDB as a fully serverless, non-relational database designed to process large amounts of data (ETL).
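To give you an idea of what that state could look like, below is a minimal sketch of a producer checking the migration-state table before enqueuing a record. The table name, key schema, and status values are our illustrative assumptions here, not the actual GCI schema.

```typescript
// Sketch: a producer asks DynamoDB whether a resource was already migrated,
// so only non-migrated data gets processed again. Names are illustrative.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const isAlreadyMigrated = async (resourceId: string): Promise<boolean> => {
  const { Item } = await docClient.send(
    new GetCommand({
      TableName: process.env.MIGRATION_STATE_TABLE, // assumed environment variable
      Key: { pk: `resource#${resourceId}` },        // assumed single-attribute key
    })
  );
  return Item?.status === "MIGRATED";
};
```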
Reusing TCP/IP connections on Lambdas? Why not!
One of the reasons some people may frown upon building a complex migration strategy that fetches/uploads data from/to a relational database using Lambda functions is reusing TCP/IP connections.
Lambda is a classic FaaS (Function as a Service), i.e. a software development model based on code execution on virtual machines fully managed by a cloud provider. As FaaS users, we only need to provide the function logic (exported as a handler), which will then be imported by the virtual machine and eventually run.
It might seem that you have to re-establish the TCP/IP connection to the database every time you execute a Lambda function. As you know, establishing such a connection is time-consuming and the number of parallel connections is usually limited.
However, there are two ways to reuse the same TCP/IP connections to the database:
- Using Amazon RDS Proxy
This way is only possible if you use Amazon RDS or AWS Aurora as your database service. Amazon RDS Proxy allows your serverless applications to share connections established with the database. Once a connection is established, RDS Proxy takes care of it, and when another Lambda wants to connect to the database, it doesn't have to create a brand-new connection – it is served with an already existing one.
- Moving the TCP/IP connection out of the Lambda handler
Another approach, which is applicable no matter which database service you use (on-premise/cloud/etc.), is about reusing the same connection across a single virtual machine where the Lambda functions are running.
The secret is to take the initialization of the connection outside of the exported handler function. Thanks to this, the connection will be established only once and will last as long as the virtual machine is alive (usually around 15 minutes).
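As an illustration, here is a minimal sketch of that pattern, assuming a PostgreSQL source database and the node-postgres (pg) driver – the actual drivers, tables, and queries in the project may have differed:

```typescript
// Sketch: the connection pool lives outside the handler, so it is created once
// per execution environment and reused by every subsequent invocation.
import { Pool } from "pg";

const pool = new Pool({
  host: process.env.DB_HOST,
  user: process.env.DB_USER,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_NAME,
  max: 1, // one connection per execution environment is usually enough
});

export const handler = async (event: { batchOffset: number }) => {
  // This query runs on the already-established connection – no new TCP/IP handshake.
  const { rows } = await pool.query(
    "SELECT * FROM projects ORDER BY id LIMIT 100 OFFSET $1", // illustrative query
    [event.batchOffset]
  );
  return rows;
};
```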
More optimizations like this are described by AWS themselves. At The Software House, we use them very intensively, so we highly recommend the read.
Data migration process under the microscope
Long story short: we built an ETL mechanism that retrieves records from the source database and puts them on a queue. Then the ETL processes the queued data and enters it into the target system.
Now, onto the details.

So, what's with the graph? Let's analyze it step by step in order to understand what's happening under the hood:
1. The process starts with the steps configureMigrateCompletedProjects and configurePlanboardSecondsToWait.
These are simple ones that configure the parameters used later in the whole flow. The user can either provide the values manually when starting the migration or – if they're omitted – they will be populated with default values.
2. Step migrateOnlyPlanboard is a classic AWS Step Functions Choice state.
Call it a simple if statement. Based on the input parameter that the user specified (or that was populated automatically with a default value), the flow can split into two paths – it can either skip most of the flow and go right to the MigratePlanboardsFlow, or go gracefully step by step (see the CDK sketch right after this list). The migrated system is pretty complex, which is why data migrations also need to be configurable in order to address various use cases. The MigratePlanboardsFlow step depends on the completion status of the new system's asynchronous jobs. Sometimes we know what the status of those jobs is and we can go straight to this step. Sometimes we don't – and we have to traverse the flow's full path.
3. Steps saveActiveWorkflows, disableActiveWorkflows, and setEmailSendingFalse.
They're there to configure the state of the new system before the data transfer. They're like the warmup before the game – some minor tweaks and tricks before actually moving data.
4. Steps MigrateFlow, UpdateMigratedAssignmentsFlow, and MigrateDocumentsFlow are the ones where the real data migration happens!
They're categorized by the name of the entity that's currently being migrated. However, one of them, called MigrateFlow, migrates several resources that are logically linked with one another. That's why they were placed together under a common step. Also, all of them are separate AWS Step Functions! Why? Not only does that help to group common parts of the process under nicely named, separate AWS Step Functions and monitor the migration process very precisely from the business perspective, but it also helps to overcome the limit on history events per Step Function execution.
During AWS re:Invent 2022, the distributed map was added as a flow type, which is great for orchestrating large-scale parallel workloads. Every iteration of the map state is a child execution with its own event history – which means you won't run into the 25,000-history-events limit of the parent Step Function execution.
5. The next step – waitForProcessingPlanboard – is an AWS Step Functions Wait state.
Remember when we told you about those asynchronous jobs in the new system that need to be executed before going forward with the migration? If we know that these jobs haven't been finalized yet, we have to wait a bit. Here goes the… wait step. Badum-tss!
6. MigratePlanboardFlow is another stage where data migration happens.
This is that nasty step that depends on the execution of those asynchronous jobs. Since we know that at this migration stage all of them should already be executed, we can proceed forward.
7. Last but not least, the GenerateReportFlow step is triggered.
It generates a set of reports which contain a summary of the migration process:
- how much data was successfully transferred from the old to the new system,
- how much data was not migrated,
- why the remaining data couldn't be migrated,
- the duration of the migration process, etc.
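To make the Choice and Wait states from steps 2 and 5 a bit more tangible, here is a simplified AWS CDK (TypeScript) sketch. The input parameter names (migrateOnlyPlanboard, planboardSecondsToWait) and the placeholder states are assumptions made for illustration – the real definitions were considerably richer.

```typescript
// Sketch: a Choice state that either jumps straight to the planboard migration
// (after a Wait state) or walks the full path. Placeholder Pass states stand in
// for the real sub-flows from the diagram.
import { App, Stack } from "aws-cdk-lib";
import * as sfn from "aws-cdk-lib/aws-stepfunctions";

const app = new App();
const stack = new Stack(app, "MigrationFlowSketch");

const migratePlanboardsFlow = new sfn.Pass(stack, "MigratePlanboardsFlow");
const fullMigrationPath = new sfn.Pass(stack, "saveActiveWorkflows"); // first of many real steps

// waitForProcessingPlanboard – pause while the new system's async jobs finish.
const waitForProcessingPlanboard = new sfn.Wait(stack, "waitForProcessingPlanboard", {
  time: sfn.WaitTime.secondsPath("$.planboardSecondsToWait"),
});

// migrateOnlyPlanboard – a plain if statement on the execution input.
const migrateOnlyPlanboard = new sfn.Choice(stack, "migrateOnlyPlanboard")
  .when(
    sfn.Condition.booleanEquals("$.migrateOnlyPlanboard", true),
    waitForProcessingPlanboard.next(migratePlanboardsFlow)
  )
  .otherwise(fullMigrationPath);

new sfn.StateMachine(stack, "MigrationOrchestrator", {
  definitionBody: sfn.DefinitionBody.fromChainable(migrateOnlyPlanboard),
});
```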
Running operations per sub-project
Since we had to migrate not one, but many databases, we couldn't simply run one Lambda function and command it to perform the migration process for all the databases. That's because of some hard Lambda-level limitations, like the maximum execution time.
Because of that, we had to come up with a general solution that would go through all the databases and run the migration process for each of them. The diagram below illustrates our approach (which is – you guessed it – another AWS Step Function).

ProcessFlow is run using a dynamic variable holding the ARN of the Step Function that will be called. This not only allows us to group the process logically – by sub-projects – but also to reduce the number of history events in a single Step Function (the limit mentioned earlier in the article).
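One way to express that pattern – regardless of whether the child flow is started from a Lambda task or from a native startExecution task – is sketched below with the AWS SDK; the input field names are purely illustrative:

```typescript
// Sketch: start a child Step Function whose ARN arrives as a dynamic variable
// in the input, so the same ProcessFlow can orchestrate any sub-project.
import { SFNClient, StartExecutionCommand } from "@aws-sdk/client-sfn";

const sfnClient = new SFNClient({});

interface ProcessFlowInput {
  stateMachineArn: string; // ARN of the sub-project's Step Function (dynamic)
  databaseName: string;    // which sub-project/database to migrate (illustrative)
}

export const handler = async (event: ProcessFlowInput) => {
  const { executionArn } = await sfnClient.send(
    new StartExecutionCommand({
      stateMachineArn: event.stateMachineArn,
      input: JSON.stringify({ databaseName: event.databaseName }),
    })
  );
  return { executionArn };
};
```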
Resource migration process

The diagram above presents a generic resource migration process that starts with downloading data in batches from the old system's database.
Producers are the only part of the process that interacts with the legacy system's database to fetch data for migration. Because the data is downloaded in batches – instead of one record at a time – and thanks to the concurrent downloading of data using multiple Lambda functions, the time for which the database is under load has been significantly reduced.
Then, Producers check in DynamoDB (the resource cache in the graph) whether the downloaded data has already been migrated. If an entity hasn't been migrated yet, the Producer generates a message that contains all the information needed to migrate it. Thanks to this solution, the further steps of the migration process no longer need to communicate with the old system's database. That makes the process much more flexible, loosely coupled, and error-proof.
Going further, consumers retrieve messages from the queuing system, map the data to a format acceptable to the new system, and send the data using the API (or, if there is no appropriate endpoint, directly to the database – but this time the database of the new system). The result of each single resource migration is saved in DynamoDB (the same DynamoDB where Producers previously verified whether a given resource had already been migrated or not).
Modeling the process this way and using DynamoDB as a cache makes the migration system idempotent – what has already been transferred will not be transferred again. In addition, DynamoDB plays an audit role here. If a resource failed to migrate (e.g. because its data was not accepted by the new system), an appropriate error message and the whole payload that was used are saved in the database.
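As a sketch of that audit role, a consumer could persist a failed resource roughly like this (the table and attribute names are, again, our assumptions):

```typescript
// Sketch: when the new system rejects a resource, store the error message and
// the exact payload in DynamoDB so it can be reanalyzed and reprocessed later.
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const docClient = DynamoDBDocumentClient.from(new DynamoDBClient({}));

export const recordFailedMigration = async (
  resourceId: string,
  payload: unknown,
  error: Error
) =>
  docClient.send(
    new PutCommand({
      TableName: process.env.MIGRATION_STATE_TABLE, // assumed environment variable
      Item: {
        pk: `resource#${resourceId}`,
        status: "FAILED",
        errorMessage: error.message,
        payload, // the exact payload that was sent to the new system
        failedAt: new Date().toISOString(),
      },
    })
  );
```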
On 12th January 2023, AWS added out-of-the-box support for setting the Maximum Concurrency of an Amazon SQS event source, hence the trick described here is no longer needed.
The key element of the migration process was the use of an AWS FIFO-type SQS queue, which allows you to maintain the order of processed messages. This queue has a rather interesting feature for controlling the concurrency of our Lambda functions.
In ETL processes, controlling the processing speed is crucial. This is extremely important when communicating with services such as a database or an API – in order not to kill these services with too many requests, we need to control the number of requests per unit of time with clockwork precision.
The FIFO queue allows you to assign the MessageGroupId attribute to messages, specifying which group a given message belongs to. Because the order of message processing in this type of queue must be preserved, the number of unique MessageGroupId attribute values determines how many instances of the Lambda function can process messages at a given moment.
By properly classifying messages into different groups, we control the speed and concurrency of data processing, thus adapting to the limits of the system we're currently migrating.
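A minimal sketch of how a producer might spread messages across a fixed set of groups on the FIFO queue is shown below – the queue URL, the group count, and the helper function are illustrative assumptions:

```typescript
// Sketch: the number of distinct MessageGroupId values caps how many Lambda
// consumers can process messages from the FIFO queue at the same time.
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";

const sqsClient = new SQSClient({});
const CONCURRENCY = 5; // five message groups -> at most five concurrent consumers

// Deterministically assign a resource to one of the message groups.
const groupFor = (resourceId: number) => `group-${resourceId % CONCURRENCY}`;

export const enqueueResource = async (resourceId: number, body: unknown) =>
  sqsClient.send(
    new SendMessageCommand({
      QueueUrl: process.env.MIGRATION_QUEUE_URL, // assumed FIFO queue URL (ends in .fifo)
      MessageBody: JSON.stringify(body),
      MessageGroupId: groupFor(resourceId),
      MessageDeduplicationId: String(resourceId), // FIFO queues require deduplication
    })
  );
```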
What's special about the GCI database migration? Project benefits
The legacy system is no longer supported and will be discontinued. The migrated data will allow continued rollout and administration of the installed optical cable infrastructure.
The migration is error-proof. Records that failed to be processed end up in a separate place where they can be reanalyzed and reprocessed.
We're able to adjust the speed of both downloading data from the source database and uploading it to the API of the new system. This helped us schedule maintenance for times when it's least disruptive to users. It's always important to accurately analyze the amount of data beforehand, otherwise even the most thorough performance tests won't prevent performance issues.
The solution is fully automated. Once planned, the migration (of selected clients) is carried out completely automatically. Considering the scale of the migration (millions of records, terabytes of data), there's no need for a person to constantly check on it manually.
After the migration is completed, the system generates a set of reports that contain a summary of the migration process (a post-migration audit, if you will). Because the old system contains a lot of inconsistencies in the data, this report is a huge help – it shows exactly, already at the testing stage, which data is faulty and what end users need to fix before the production migration.

GOconnectIT's data migration project. Summary
With a solid data migration strategy, we've successfully transferred terabytes of data from a legacy system, where it was stored in a relational database, and transformed it into a format digestible by the new system.
We've completed the data migration plan in a fully controlled manner, iteratively refining the logic behind the data transformation, as well as the data quality of the records that could not be migrated.
"TSH has been able to help us in times of trouble"
– quote from Bart van Muijen (CTO, GOconnectIT). If you struggle to control your company's data, we've got a dedicated migration team who will solve your data migration problems too. We offer free, 1-hour consultations for anyone who wants to talk about their software solutions and how to improve them. No strings attached, so why not give it a try?