Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools.
Data Discovery & Migration Studio
Enterprise data warehouses (EDWs) are central repositories of integrated data that are purpose-built and optimized to support reporting and analytics. EDWs store current and historic data in an integrated repository to enable business intelligence throughout the enterprise. Building these warehouses has traditionally been a multi-year effort, drawing on careful design and data management best practices.
Migrations begin with moving historic data, typically to get the project started in the cloud. The source can be an existing data warehouse, or it can be Adobe or other sources; migrating a data warehouse to the cloud is usually the starting point.
Target Data Warehouse
Snowflake is a single, integrated platform delivered as-a-service. It features storage, compute, and global services layers that are physically separated but logically integrated.
BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence.
Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either serverless or dedicated resources—at scale.
Data Estate Modernization Process
It enables the user to extract the existing landscape through the discovery service and analyze the complexity of data transformation scripts using the ETL complexity analyzer.
It automates schema, data, and ETL migration from source to target through an intelligent NLP and metadata-driven mapping framework built specifically for a given source and target.
Data profiling, data reconciliation, and data validation modules are provided through the ABCR & Data Trust module, which is required to validate migration status.
A brief summary
Link Service Migration
Link services enable the platform to connect to the source and target. A link service is much like a connection string: it defines the connection information required to establish a connection to an external source.
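As a rough illustration of the connection-string analogy, a link service definition might look like the sketch below. The field names (`name`, `source_type`, `host`, `port`, `credentials_ref`) are assumptions for illustration, not the product's actual schema.

```python
from dataclasses import dataclass

# Hypothetical link service definition; field names are illustrative,
# not the product's real metadata schema.
@dataclass
class LinkService:
    name: str
    source_type: str      # e.g. "teradata", "snowflake", "sftp"
    host: str
    port: int
    credentials_ref: str  # pointer to a secret store entry, never a raw password

    def connection_string(self) -> str:
        """Render the connection info, connection-string style."""
        return f"{self.source_type}://{self.host}:{self.port}"

td_source = LinkService(
    name="td-prod",
    source_type="teradata",
    host="td.example.internal",
    port=1025,
    credentials_ref="secrets/td-prod",
)
print(td_source.connection_string())  # teradata://td.example.internal:1025
```

Storing only a reference to credentials, rather than the credentials themselves, keeps secrets out of the platform metadata.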
Link services support the following connections:
Discovery Service Migration
Discovery service enables the user to extract the metadata information from source and target databases. The point-in-time snapshot will be captured from source/target and stored in IDEA metadata. The captured metadata will be used for other module job registry as well as job execution.
Metadata extract from Teradata is performed by
For the target warehouses (Redshift, Synapse, BigQuery, Snowflake), metadata extraction is done over a live connection. Because these cloud-native services separate compute from storage, a dedicated compute warehouse can be used to extract the metadata without impacting other target warehouse activities.
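A live metadata extract typically boils down to querying the target's catalog views. The sketch below builds a generic ANSI `information_schema` query; it is an assumption about the query shape, and the exact catalog views differ per warehouse (for example, Redshift also exposes `SVV_` system views).

```python
# Sketch of live metadata extraction against a target warehouse.
# Uses the generic ANSI information_schema; real per-warehouse
# extraction may use vendor-specific catalog views instead.
def metadata_query(schema: str) -> str:
    return (
        "SELECT table_name, column_name, data_type "
        "FROM information_schema.columns "
        f"WHERE table_schema = '{schema}' "
        "ORDER BY table_name, ordinal_position"
    )

print(metadata_query("analytics"))
```

The result set would be stamped with a capture time and stored in the IDEA metadata as the point-in-time snapshot described above.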
The Discovery plane is used to present the extracted metadata visually to the user. APIs are built to provide both an overview and a detailed list of components so the user can explore the discovered objects.
The Schema migration service enables the user to migrate database components from Teradata to the target warehouse based on the mapping information updated in the IDEA metadata during the discovery service. The schema migration service depends on two key components:
- Teradata (aka Source) Database discovery service output
- Mapping detail
The Teradata discovery service extracts attributes for the following objects from the DBC tables. Objects are migrated based on their compatibility with the target data warehouse:
- Stored Procedures
The collection of Teradata-specific keywords is stored in mapping metadata tables. These tables serve as a reference from which the relevant target warehouse keywords are populated. If no direct mapping is available, alternatives are used.
- View level mapping
- Column level mapping
- Database / Dataset level mapping
- Table level mapping
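The keyword-mapping lookup with fallback alternatives can be sketched as follows. The table contents here are illustrative examples (Teradata's `SEL`/`DEL`/`INS` abbreviations are real, but the mapping and alternative entries are assumptions, not the product's actual metadata):

```python
# Hypothetical mapping metadata: Teradata keyword -> target warehouse keyword.
DIRECT_MAP = {"SEL": "SELECT", "DEL": "DELETE", "INS": "INSERT"}

# When no direct equivalent exists, a documented rewrite is used instead.
ALTERNATIVES = {"LIKE ANY": "multiple LIKE predicates joined with OR"}

def map_keyword(teradata_kw: str) -> str:
    """Return the target keyword, an alternative rewrite, or the input unchanged."""
    if teradata_kw in DIRECT_MAP:
        return DIRECT_MAP[teradata_kw]
    return ALTERNATIVES.get(teradata_kw, teradata_kw)
```

Keeping the direct map and the alternatives in separate tables mirrors the two-step lookup the text describes: direct mapping first, alternative rewrite only when no direct mapping exists.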
The Data Migration service migrates data from Teradata, as well as from FTP/SFTP servers, to the target warehouses: Redshift, Synapse, BigQuery, and Snowflake.
Data migration jobs are created by selecting the list of tables from the discovery service and the refresh mechanism. When a category is selected, such as Bulk, Incremental, Timestamp, or Query, the relevant tables are listed based on the data captured at each table level.
The job run is triggered based on the details provided when registering the job, and execution is split into two stages: data is first extracted from Teradata or FTP/SFTP servers to the target cloud storage, and then loaded from storage into the target warehouse.
These two stages are decoupled using a Kafka queue, which helps manage failure re-runs effectively and allows the next table's extract from Teradata to start while the previous table's warehouse load is still in progress.
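The decoupling pattern can be sketched with a simple in-process queue standing in for Kafka (the table names and stage bodies are placeholders): the extract stage keeps producing while the load stage consumes independently.

```python
import queue
import threading

# Stand-in for the Kafka queue between the two stages: the extract
# stage can move on to the next table while the load stage is still
# working on the previous one.
extracted = queue.Queue()
loaded = []

def extract_stage(tables):
    for t in tables:
        # ... extract table t from Teradata/SFTP to cloud storage ...
        extracted.put(t)
    extracted.put(None)  # sentinel: no more tables

def load_stage():
    while (t := extracted.get()) is not None:
        # ... load t from cloud storage into the target warehouse ...
        loaded.append(t)

tables = ["orders", "customers", "items"]
loader = threading.Thread(target=load_stage)
loader.start()
extract_stage(tables)
loader.join()
print(loaded)  # ['orders', 'customers', 'items']
```

A real Kafka topic adds durability on top of this: a failed load stage can re-consume from its last committed offset, which is what makes failure re-runs manageable.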
The job execution is considered successful only when all the tables associated with the job are successfully loaded from the source to the target warehouse.
The re-run option is enabled only for failed job runs. A failed-job re-run migrates only the failed tables, and the data load resumes from the point where it failed.
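The success and re-run rules above can be sketched as per-table status tracking (a simplified model; the real service persists this state in the IDEA metadata, and the `load` callable here is a placeholder):

```python
# A job run succeeds only when every table loads; a re-run retries
# just the failed tables, resuming where the previous run stopped.
def run_job(tables, statuses, load):
    """statuses maps table -> 'success' | 'failed'; mutated in place."""
    for t in tables:
        if statuses.get(t) == "success":
            continue  # already loaded in a previous run: skip
        statuses[t] = "success" if load(t) else "failed"
    return all(statuses.get(t) == "success" for t in tables)

tables = ["orders", "customers", "items"]
statuses = {}

# First run: "customers" fails, so the job run fails overall.
first = run_job(tables, statuses, load=lambda t: t != "customers")
# Re-run: only "customers" is retried; the other tables are skipped.
second = run_job(tables, statuses, load=lambda t: True)
print(first, second)  # False True
```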
The Complexity Analyzer rates the conversion complexity of ETL/ELT scripts. Users perform the following steps in the Complexity Analyzer service:
- Complexity Analyzer Job Creation
- View Complexity Job
- Edit Complexity Job
- Execution of Complexity Job
- Monitoring Job Execution
- Delete Complexity Job
The Complexity Analyzer also scans the Teradata BTEQ scripts and classifies them as Simple, Medium, or Complex based on the rules defined for the cloud-native services.
The job registry requires the user to specify the path where the BTEQ scripts are available, along with the link service used to connect to the FTP server where the scripts are placed. The classification rules must also be specified while registering the job; the rules are specific to each registered job and can have different classification values based on the user's input.
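Since the classification thresholds are supplied per job, the rule evaluation might look like this sketch. The metric (statement count) and the default thresholds are assumptions for illustration; real rules could weigh other script features as well.

```python
# Illustrative Simple/Medium/Complex classification of a BTEQ script.
# Thresholds are user-supplied per job; the defaults here are made up.
def classify(stmt_count: int, rules=None) -> str:
    rules = rules or {"Simple": 20, "Medium": 100}  # upper bounds per bucket
    if stmt_count <= rules["Simple"]:
        return "Simple"
    if stmt_count <= rules["Medium"]:
        return "Medium"
    return "Complex"
```

Two jobs registered with different `rules` dictionaries would classify the same script differently, which matches the per-job rule behavior described above.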
The ETL migration service consists of two services, which generate the following output based on the mapping document available in IDEA metadata:
- ETL Code Conversion
- Stored-procedure-wrapped ETL Conversion
An ETL script may contain both DDL (temporary table creation) and DML statements, and ETL migration converts both into target-warehouse-compatible statements. ETL migration also uses parsers to generate two files:
- Error Parsed File (AI powered)
- Semantic Parsed File (AI powered)
The percentage statistics file helps you understand the conversion percentage of the input script, while the two parsed files report errors at a granular level for better understanding.
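The conversion-percentage statistic reduces to a simple ratio, sketched below. Counting by statements is an assumption; the actual statistic could be computed per line or per construct.

```python
# Sketch of the conversion-percentage statistic: the share of input
# statements converted without a parser error.
def conversion_pct(total_stmts: int, error_stmts: int) -> float:
    if total_stmts == 0:
        return 0.0
    return round(100.0 * (total_stmts - error_stmts) / total_stmts, 2)

print(conversion_pct(200, 15))  # 92.5
```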
For converted scripts, APIs are built to show the source and converted scripts side by side in the UI for comparison.