
Migration Asset

Let's migrate to the cloud and achieve great heights

Data Discovery & Migration Studio

The Migration Asset provides the customer with the advanced tooling needed to analyse their existing data estate and create a customized migration plan to a modern, cloud-based platform. The user typically has to move historic data to get the project started in the cloud. Sources can be data warehouses as well as files, RDBMS, and other systems; targets can be S3, Azure Blob Storage, Redshift, Synapse, Snowflake, and BigQuery.
Migration Asset

Enterprise Data Warehouses (EDWs) are central repositories of integrated data that are purpose-built and optimized to support reporting and analytics. EDWs store current and historic data in an integrated repository to enable business intelligence throughout the enterprise. Building these warehouses has traditionally been a multi-year project, with great thought and data management best practices leveraged to build them.
In a migration, the user has to move historic data, typically to get the project started in the cloud. The data can come from data warehouses, from Adobe, or from other sources; the usual starting point is migrating an existing data warehouse to the cloud.

Target Data Warehouses


Redshift

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.


Snowflake

Snowflake is a single, integrated platform delivered as-a-service. It features storage, compute, and global services layers that are physically separated but logically integrated.


BigQuery

BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.


Azure Synapse

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale.


Data Estate Modernization Process

The Migration Asset enables the user to extract the existing landscape through the Discovery service and to analyze the complexity of data transformation scripts using the ETL Complexity Analyzer.

It automates schema, data, and ETL migration from source to target through an intelligent NLP- and metadata-driven mapping framework built specifically for a given source and target.

Data profiling, data reconciliation, and data validation modules are provided through the ABCR & Data Trust module, which is used to validate the migration status.
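
As a simple illustration of the kind of check the reconciliation step performs, the sketch below compares row counts between source and target tables. The connections, table names, and helper functions are hypothetical; the actual ABCR & Data Trust module covers much more (profiling, column-level validation, reporting).

# Hypothetical row-count reconciliation sketch using generic DB-API cursors.
# The real ABCR & Data Trust module performs far richer profiling/validation.

def fetch_count(cursor, table):
    """Return the row count of a table via an existing DB-API cursor."""
    cursor.execute(f"SELECT COUNT(*) FROM {table}")
    return cursor.fetchone()[0]

def reconcile(source_cursor, target_cursor, table_pairs):
    """Compare row counts for (source_table, target_table) pairs."""
    results = []
    for src, tgt in table_pairs:
        src_rows = fetch_count(source_cursor, src)
        tgt_rows = fetch_count(target_cursor, tgt)
        results.append({
            "source": src,
            "target": tgt,
            "source_rows": src_rows,
            "target_rows": tgt_rows,
            "status": "MATCH" if src_rows == tgt_rows else "MISMATCH",
        })
    return results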


Migration Features

A brief summary

• Link Service Migration
• Discovery Service Migration
• Schema Migration
• Data Migration
• Complexity Analyzer
• ETL Migration

Link Service Migration

Link services enable the platform to connect to the source and target. A link service is much like a connection string: it defines the connection information required to establish a connection to an external source.

Link services support the following connections (a hypothetical configuration sketch follows the list below):

  • Teradata
  • Target warehouse
  • FTP/SFTP server
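
As a rough illustration of what such connection definitions could look like, the sketch below shows hypothetical link service entries for a Teradata source, a Snowflake target, and an SFTP server. The field names, hosts, and the credential reference convention are assumptions, not the actual IDEA metadata schema.

# Illustrative link service definitions; field names, hosts, and the
# credential_ref convention are hypothetical, not the IDEA metadata schema.
LINK_SERVICES = {
    "teradata_source": {
        "type": "teradata",
        "host": "tdprod.example.com",
        "port": 1025,
        "database": "SALES_DW",
        "credential_ref": "secrets/teradata_prod",  # resolved at runtime
    },
    "snowflake_target": {
        "type": "snowflake",
        "account": "xy12345.us-east-1",
        "warehouse": "MIGRATION_WH",
        "database": "SALES_DW",
        "credential_ref": "secrets/snowflake_svc",
    },
    "landing_sftp": {
        "type": "sftp",
        "host": "sftp.example.com",
        "port": 22,
        "root_path": "/exports/teradata",
        "credential_ref": "secrets/sftp_user",
    },
}
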
Discovery Service Migration

The Discovery service enables the user to extract metadata from the source and target databases. A point-in-time snapshot is captured from source and target and stored in IDEA metadata. The captured metadata is then used by the other modules for job registration as well as job execution.

Metadata extraction from Teradata is performed via:

  • FTP extract
  • SFTP extract
Metadata extraction for the target warehouses (Redshift / Synapse / BigQuery / Snowflake) is done over a live connection. Because these cloud-native services separate compute from storage, a dedicated data warehouse can be used to extract the metadata without impacting other target warehouse activities.

The Discovery plane presents the captured metadata visually to the user. APIs provide both an overview and a detailed list of components so that the user can explore the discovered objects.
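
For example, a point-in-time snapshot of Teradata table metadata could be captured from the DBC dictionary views roughly as sketched below; the function name, connection details, and choice of columns are illustrative, assuming the teradatasql driver.

# Hypothetical sketch of capturing a point-in-time metadata snapshot from
# Teradata's DBC dictionary views; in the real service the rows would be
# persisted into IDEA metadata for use by the other modules.
import teradatasql

TABLE_METADATA_SQL = """
SELECT DatabaseName, TableName, TableKind, CreateTimeStamp
FROM DBC.TablesV
WHERE DatabaseName = ?
"""

def capture_table_metadata(host, user, password, database):
    """Return table-level metadata for one database as a list of dicts."""
    con = teradatasql.connect(host=host, user=user, password=password)
    try:
        cur = con.cursor()
        cur.execute(TABLE_METADATA_SQL, [database])
        columns = [d[0] for d in cur.description]
        return [dict(zip(columns, row)) for row in cur.fetchall()]
    finally:
        con.close()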

Schema Migration

The Schema Migration service enables the user to migrate database components from Teradata to the target warehouse based on the mapping information updated in IDEA metadata during discovery. The schema migration service depends on two key components:

    • Teradata (aka Source) Database discovery service output
    • Mapping detail

The Teradata discovery service extracts the attributes for the following objects from the DBC dictionary tables. Based on compatibility with the target data warehouse, the applicable objects will be migrated:

    • Database
    • Tables
    • Views
    • Users
    • Roles
    • Macro
    • Stored Procedures
    • Triggers

The collection of Teradata-specific keywords is stored in mapping metadata tables. These mapping tables are used as a reference, and the relevant target warehouse keywords are populated from them. If no direct mapping is available, alternatives will be used. Mappings are maintained at the following levels (a minimal mapping sketch follows the list below):

    • View level mapping
    • Column level mapping
    • Database / Dataset level mapping
    • Table level mapping
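
A minimal sketch of how such a keyword mapping with a fallback might behave is shown below. The specific type mappings (Teradata to Snowflake here) and the fallback choice are illustrative assumptions; the real mappings live in the IDEA metadata tables.

# Illustrative Teradata-to-Snowflake type/keyword mapping with a fallback.
# The entries and the fallback are assumptions; the real mapping tables in
# IDEA metadata are far more complete and cover view/column/table levels.
DIRECT_TYPE_MAP = {
    "BYTEINT": "TINYINT",
    "SMALLINT": "SMALLINT",
    "INTEGER": "INTEGER",
    "DECIMAL": "NUMBER",
    "VARCHAR": "VARCHAR",
    "TIMESTAMP": "TIMESTAMP_NTZ",
}

FALLBACK_TYPE = "VARCHAR"  # used when no direct mapping is available

def map_type(teradata_type):
    """Map a Teradata data type to its target warehouse equivalent."""
    return DIRECT_TYPE_MAP.get(teradata_type.upper(), FALLBACK_TYPE)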

Data Migration

The Data Migration service migrates data from Teradata as well as from FTP / SFTP servers to the target warehouses: Redshift / Synapse / BigQuery / Snowflake.

Data migration jobs are created by selecting the list of tables from the discovery service and the refresh mechanism. When a category is selected, such as Bulk, Incremental, Timestamp, or Query, the relevant tables are listed based on the data captured at each table level.

The job run is triggered based on the details provided while registering the job and is split into two stages: data is first extracted from Teradata or FTP/SFTP servers to the target cloud storage, and then loaded from cloud storage into the target warehouse.

These two stages are decoupled using a Kafka queue, which helps manage failure re-runs effectively and allows the next table's extract from Teradata to start while the previous table's load into the target warehouse is still in progress.
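
A minimal sketch of that decoupling, assuming the kafka-python client and a hypothetical topic name and message shape, is shown below: the extract stage publishes one event per table landed in cloud storage, and the load stage consumes those events independently.

# Sketch of decoupling the extract and load stages with a Kafka topic.
# Topic name, broker address, and message fields are hypothetical.
import json
from kafka import KafkaProducer, KafkaConsumer

TOPIC = "migration.table.landed"
BROKERS = ["kafka:9092"]

def publish_landed_table(table, staged_path):
    """Extract stage: announce that a table's data has landed in cloud storage."""
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    producer.send(TOPIC, {"table": table, "staged_path": staged_path})
    producer.flush()

def run_loader(load_into_warehouse):
    """Load stage: consume landed-table events and load them one by one,
    so the next Teradata extract can start while a load is still in progress."""
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        group_id="warehouse-loader",
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
    )
    for message in consumer:
        event = message.value
        load_into_warehouse(event["table"], event["staged_path"])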

The job execution is considered successful only when all the tables associated with the job are successfully loaded from the source to the target warehouse.

The re-run option is enabled only for failed job runs. A failed job re-run performs the data migration only for the failed tables, and the data load resumes from the point where it failed.
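
In other words, a re-run only has to look at the per-table status of the previous run, roughly as in the hypothetical sketch below (the status values and function are illustrative):

# Hypothetical sketch of the re-run rule: only tables that failed in the
# previous run are migrated again; successfully loaded tables are skipped.
def plan_rerun(previous_run):
    """Given {table: "SUCCESS" | "FAILED"} from the last run,
    return the tables a re-run should process."""
    return [table for table, status in previous_run.items() if status == "FAILED"]

# Example: only CUSTOMER and ORDERS would be re-migrated.
print(plan_rerun({"CUSTOMER": "FAILED", "PRODUCT": "SUCCESS", "ORDERS": "FAILED"}))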

Complexity Analyzer

The Complexity Analyzer assesses the complexity of ETL/ELT scripts from a conversion standpoint.
Users need to perform the following steps for the Complexity Analyzer service:

    • Complexity Analyzer Job Creation
    • View Complexity Job
    • Edit Complexity Job
    • Execution of Complexity Job
    • Monitoring Job Execution
    • Delete Complexity Job

The Complexity Analyzer scans the Teradata BTEQ scripts and classifies them as Simple, Medium, or Complex based on the defined classification rules.

Job registration requires the user to specify the path where the BTEQ scripts are available, along with the link service used to connect to the FTP server where the scripts are placed. The classification rules also need to be specified while registering the job. The rules are specific to each registered job, and the classification values can differ based on the user input.
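
As an illustration, a classification rule might be as simple as counting SQL statements against user-supplied thresholds, as in the hypothetical sketch below; the actual rules, metrics, and thresholds are defined by the user for each registered job.

# Illustrative Simple / Medium / Complex classification of a BTEQ script.
# The statement-count metric and thresholds are assumptions; real rules are
# supplied by the user when the job is registered.
def classify_bteq(script_text, medium_threshold=10, complex_threshold=30):
    """Classify a script by the number of SQL statements it contains."""
    statements = [s for s in script_text.split(";") if s.strip()]
    count = len(statements)
    if count >= complex_threshold:
        return "Complex"
    if count >= medium_threshold:
        return "Medium"
    return "Simple"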

ETL Migration

The ETL Migration service consists of two services, which generate the following outputs based on the mapping document available in IDEA metadata:

• ETL Code Conversion
• Stored Procedure-wrapped ETL Conversion

An ETL script may contain DDL (e.g., temporary table creation) and DML statements, and ETL migration supports converting both DDL and DML into target-warehouse-compatible statements. ETL migration also uses parsers to generate two files:

    • Error Parsed File (AI powered)
    • Semantic Parsed File (AI powered)

A percentage statistics file helps you understand the conversion percentage of the input script, and the two parsed files specify the errors at a granular level for better understanding.
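
A highly simplified sketch of statement-level conversion with a conversion percentage is shown below. The rewrite rules, the choice to flag macro executions, and the percentage formula are illustrative assumptions, not the actual parser-based implementation.

# Hypothetical statement-level conversion pass with a conversion percentage.
# The rewrite rules are a tiny illustrative subset, not the real mapping.
import re

REWRITES = [
    (re.compile(r"\bCREATE\s+VOLATILE\s+TABLE\b", re.I), "CREATE TEMPORARY TABLE"),
    (re.compile(r"\bSEL\b", re.I), "SELECT"),
    (re.compile(r"\bEXEC\s+\w+", re.I), None),  # macro call: no direct mapping, flag it
]

def convert_script(script):
    """Return (converted_statements, errors, conversion_percentage)."""
    statements = [s.strip() for s in script.split(";") if s.strip()]
    converted, errors = [], []
    for stmt in statements:
        failed = False
        for pattern, replacement in REWRITES:
            if pattern.search(stmt):
                if replacement is None:
                    errors.append({"statement": stmt, "reason": "no direct mapping"})
                    failed = True
                    break
                stmt = pattern.sub(replacement, stmt)
        if not failed:
            converted.append(stmt)
    pct = 100.0 * len(converted) / len(statements) if statements else 100.0
    return converted, errors, pct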

For converted scripts, APIs are built to show the source and converted scripts in the UI for side-by-side comparison.


Design for Industrialization

Service Benefits

• Quick turnaround: new connectors can be built as needed to cover more sources, thanks to a very scalable and agile architecture.
• Use of cloud-specific native services for the job.
• Quick notification on the success or failure of the data migration service.
• Code analyzer to provide the current landscape of the code on the source side, giving a holistic view of code status and smoothing the migration process.
• Low Code No Code (LCNC) data pipeline creation: with the LCNC orchestration modules, developers can drag and drop components onto the canvas, easily build the data flows in the pipeline, and generate DAGs with the click of a button (a hypothetical sketch follows below).
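
A rough sketch of what "generate a DAG from the canvas" could boil down to is shown below; the pipeline spec format, task ids, and the placeholder task body are hypothetical, and recent Airflow 2.x is assumed.

# Illustrative generation of an Airflow DAG from a simple pipeline spec,
# in the spirit of the LCNC "generate DAGs" flow. The spec format and task
# bodies are hypothetical; assumes recent Airflow 2.x.
from datetime import datetime
from airflow import DAG
from airflow.operators.python import PythonOperator

PIPELINE_SPEC = [  # what an exported canvas might look like
    {"task_id": "extract_customer", "step": "extract", "table": "CUSTOMER"},
    {"task_id": "load_customer", "step": "load", "table": "CUSTOMER"},
]

def run_step(step, table, **_):
    print(f"running {step} for {table}")  # placeholder for the real work

with DAG(
    dag_id="generated_migration_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule=None,
    catchup=False,
) as dag:
    previous = None
    for node in PIPELINE_SPEC:
        task = PythonOperator(
            task_id=node["task_id"],
            python_callable=run_step,
            op_kwargs={"step": node["step"], "table": node["table"]},
        )
        if previous is not None:
            previous >> task  # chain tasks in the order they appear in the spec
        previous = task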


The technology stack includes Document Db, Python, Spark, Glue, Airflow, Kubernetes, and UI tools like React.js and Redux.
The data migration service relies on Spark and cloud-specific native services like ADF and Glue. Spark clusters, as well as ADF and Glue, can auto-scale quickly, so the data migration service can scale for large data volumes; scaling also depends on the scalability of the source. When moving a large volume of data, network utilization may spike, and there is a dependency on the network bandwidth between on-premises and cloud.
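
For the Spark-based path, the extract stage could look roughly like the PySpark sketch below; the JDBC URL, credentials, staging bucket, and table name are illustrative, and the Teradata JDBC driver is assumed to be available on the cluster.

# Hypothetical PySpark sketch of the extract stage: read a Teradata table
# over JDBC and stage it as Parquet in cloud storage. Connection details,
# paths, and the table are illustrative; the Teradata JDBC driver must be
# on the Spark classpath.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("teradata-extract").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:teradata://tdprod.example.com/DATABASE=SALES_DW")
    .option("driver", "com.teradata.jdbc.TeraDriver")
    .option("dbtable", "SALES_DW.CUSTOMER")
    .option("user", "migration_user")
    .option("password", "********")
    .load()
)

# Stage to cloud storage; a separate stage loads this into the target warehouse.
df.write.mode("overwrite").parquet("s3://migration-staging/SALES_DW/CUSTOMER/")
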
Currently, IDEA uses Airflow. For other data workflow orchestration technologies, IDEA needs to be customized; please contact the IDEA Support team to place your request.
Next Steps

To learn more about IDEA by Capgemini and how we can help make data your competitive edge, visit: www.capgemini.com/ideabycapgemini

    • Mukesh Jain
      IDEA Head
      mukesh.jain@capgemini.com


    • Harsh Vardhan
      IDEA Chief Digital Leader & Chief Technology Architect
      harsh.c.vardhan@capgemini.com

    • Eric Reich
      Offer Leader and Global Head AI & Data Engineering VP
      eric.reich@capgemini.com

    • Aurobindo Saha
      IDEA Principal Sales Architect
      aurobindo.saha@capgemini.com


    • Sameer Kolhatkar
      IDEA GTM Lead
      sameer.kolhatkar@capgemini.com


    • Sandip Brahmachari
      IDEA D&S Lead
      sandip.brahmachary@capgemini.com


    • Anupam Srivastava
      IDEA Engineering Lead
      anupam.a.srivastava@capgemini.com


    • Subramanian Srinivasan
      IDEA Shared Services Lead
      subramanian.srinivasan@capgemini.com