Present-day data storage solutions like the Azure data platform offer improved scalability and performance over mainframe and midrange systems. By modernizing, you can take advantage of these benefits. However, updating technology, infrastructure, and practices is complex. The process involves an exhaustive investigation of business and engineering activities. Data management is one aspect to consider when modernizing. But you also need to look at data visualization and integration.
Successful modernizations use a data-first strategy. With this approach, organizations focus on the data, rather than the new system. Data management is no longer merely an item on the modernization checklist. Instead, the data becomes the centerpiece. Sustainable systems result, as harmonized, quality-oriented data solutions replace fragmented, poorly governed ones.
This reference architecture outlines an end-to-end modernization plan for mainframe and midrange data sources. The solution uses Azure data platform components in a data-first approach. Specifically, the plan involves:
- Object conversion: Converting object definitions from the source data store to corresponding objects in the target data store.
- Data ingestion: Connecting to the source data store and extracting data.
- Data transformation: Transforming extracted data into appropriate target data store structures.
- Data storage: Loading data from the source data store to the target data store, both initially and continually.
Potential use cases
Mainframe and midrange customers can benefit from this solution, especially when targeting these goals:
- Modernize mission-critical workloads.
- Acquire business intelligence to improve operations and gain a competitive advantage.
- Escape the high costs and rigidity associated with mainframe and midrange data stores.
The diagram contains two parts, one for on-premises components, and one for Azure components. The on-premises part contains boxes that represent the file system, the relational and non-relational databases, and the object conversion components. Arrows point from the on-premises components to the Azure components. One of those arrows goes through the object conversion box, and one is labeled on-premises data gateway. The Azure part contains boxes that represent data ingestion and transformation, data storage, Azure services, and client apps. Some arrows point from the on-premises components to the tools and services in the data integration and transformation box. Another arrow points from that box to the data storage box, which contains databases and data stores. Additional arrows point from data storage to Azure services and to client apps.
Download a Visio file of this architecture.
Data modernization involves the following steps. Throughout the process, an on-premises data gateway transfers data quickly and securely between on-premises systems and Azure services (1).
The object conversion process extracts object definitions from sources. The definitions are then converted into corresponding objects on the target data store (2).
Microsoft SQL Server Migration Assistant (SSMA) for Db2 migrates schemas and data from IBM Db2 databases to Azure databases.
Data Provider for Host Files converts objects by:
- Parsing COBOL and RPG record layouts, or copybooks.
- Mapping the copybooks to C# objects that .NET applications use.
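The real tool generates C# objects, but the mapping idea can be illustrated with a short, self-contained Python sketch. The copybook, field positions, and record values below are hypothetical; the point is that a copybook's fixed-width layout determines how each slice of an EBCDIC record becomes a typed field:

```python
from dataclasses import dataclass

# Hypothetical copybook:
#   01 CUSTOMER-REC.
#      05 CUST-ID    PIC 9(6).
#      05 CUST-NAME  PIC X(20).
#      05 BALANCE    PIC 9(7)V99.

@dataclass
class CustomerRec:
    cust_id: int
    cust_name: str
    balance: float

def parse_customer(record: bytes) -> CustomerRec:
    """Map one fixed-width EBCDIC record to a typed object."""
    text = record.decode("cp037")  # EBCDIC (code page 037) -> str
    return CustomerRec(
        cust_id=int(text[0:6]),
        cust_name=text[6:26].rstrip(),
        balance=int(text[26:35]) / 100,  # PIC 9(7)V99: implied decimal point
    )

# Build a sample record the way a host would store it: fixed width, EBCDIC.
raw = ("000042" + "John Smith".ljust(20) + "000123456").encode("cp037")
rec = parse_customer(raw)
```

Here `rec.cust_id` is `42`, `rec.cust_name` is `"John Smith"`, and `rec.balance` is `1234.56`.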
Third-party tools perform automated object conversion on non-relational databases, file systems, and other data stores.
Data ingestion and transformation
In the next step, the process migrates data.
Data Provider connects remotely to IBM host file system servers (3a). With non-mainframe systems, Data Provider reads data offline.
Mainframe and midrange systems store data on direct-access storage devices (DASD) or tape in EBCDIC format.
COBOL, PL/I, and assembly language copybooks define the data structure of these files. Data Provider converts the data from EBCDIC to ASCII format based on the copybook layout.
FTP converts and transfers mainframe and midrange datasets with single layouts and unpacked fields to Azure (3b).
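FTP handles only unpacked (zoned) fields; packed-decimal (COMP-3) fields store two binary-coded-decimal digits per byte plus a sign nibble, so they need explicit conversion. As an illustrative sketch (the field values are made up, and tools like Data Provider perform this conversion internally), a COMP-3 field can be decoded like this:

```python
def unpack_comp3(data: bytes, scale: int = 0):
    """Decode an IBM packed-decimal (COMP-3) field.

    Each byte holds two BCD digits; the low nibble of the final byte
    is the sign (0xC or 0xF = positive, 0xD = negative)."""
    digits = 0
    sign = 1
    for i, byte in enumerate(data):
        high, low = byte >> 4, byte & 0x0F
        if i < len(data) - 1:
            digits = digits * 100 + high * 10 + low
        else:
            digits = digits * 10 + high  # last byte: one digit plus sign nibble
            sign = -1 if low == 0x0D else 1
    value = sign * digits
    return value / (10 ** scale) if scale else value

# 0x12 0x34 0x5C packs the value +12345.
positive = unpack_comp3(bytes([0x12, 0x34, 0x5C]))
# 0x00 0x12 0x3D packs the value -123.
negative = unpack_comp3(bytes([0x00, 0x12, 0x3D]))
# With an implied decimal scale of 2, +12345 becomes 123.45.
scaled = unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2)
```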
IBM mainframe and midrange systems store data in relational databases, including Db2.
These services migrate the database data (3c):
- Azure Data Factory uses a Db2 connector to extract and integrate data from these databases.
- SQL Server Integration Services (SSIS) handles a broad range of data ETL tasks.
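The Db2 extraction step can be sketched as a Data Factory pipeline with a Copy activity that uses the Db2 connector. This is a minimal, abbreviated illustration: the pipeline, dataset, and table names are placeholders, and a real definition needs matching linked services and datasets.

```json
{
  "name": "CopyFromDb2",
  "properties": {
    "activities": [
      {
        "name": "Db2ToDataLake",
        "type": "Copy",
        "inputs": [ { "referenceName": "Db2SourceDataset", "type": "DatasetReference" } ],
        "outputs": [ { "referenceName": "DataLakeSinkDataset", "type": "DatasetReference" } ],
        "typeProperties": {
          "source": { "type": "Db2Source", "query": "SELECT * FROM ORDERS" },
          "sink": { "type": "ParquetSink" }
        }
      }
    ]
  }
}
```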
IBM mainframe and midrange systems store data in non-relational databases including:
- IDMS, a network model Database Management System (DBMS)
- IMS, a hierarchical model DBMS
Third-party products integrate data from these databases (3d).
Azure services like Data Factory and AzCopy load data into Azure databases and data storage (4). Third-party solutions and custom loading solutions can also load data.
Azure offers many managed data storage solutions (5):
- Azure SQL Database
- Azure Database for PostgreSQL
- Azure Cosmos DB
- Azure Database for MySQL
- Azure Database for MariaDB
- Azure SQL Managed Instance
- Azure Data Lake Storage
- Azure Storage
A range of Azure services use the modernized data tier for computing, analytics, storage, and networking (6).
Existing client applications also use the modernized data tier (7).
The solution uses the following components.
SSMA for Db2 automates migration from Db2 to Microsoft database services. While running on a virtual machine (VM), this tool converts Db2 database objects into SQL Server database objects and creates those objects in SQL Server. SSMA for Db2 then migrates data from Db2 to the following services:
- SQL Server 2012
- SQL Server 2014
- SQL Server 2016
- SQL Server 2017 on Windows and Linux
- SQL Server 2019 on Windows and Linux
- Azure SQL Database
- Azure SQL Managed Instance
Data Provider for Host Files supports several connection types:
- With offline connections, Data Provider reads and writes records in a local binary file.
- With SNA and TCP/IP connections, Data Provider reads and writes records stored in remote z/OS (IBM Z mainframe) datasets or remote i5/OS (IBM AS/400 and iSeries) physical files. Only i5/OS systems use TCP/IP.
Azure services provide environments, tools, and processes for developing and scaling new applications in the public cloud.
AzCopy is a command-line utility that moves blobs or files into and out of storage accounts.
SQL Server Integration Services (SSIS) is a platform for building enterprise-level data integration and transformation solutions. You can use SSIS to solve complex business problems by:
- Copying or downloading files
- Loading data warehouses
- Cleansing and mining data
- Managing SQL Server objects and data
Azure SQL Database is part of the Azure SQL family and is built for the cloud. This service offers all the benefits of a fully managed and evergreen platform as a service. Azure SQL Database also provides AI-powered, automated features that optimize performance and durability. Serverless compute and Hyperscale storage options automatically scale resources on demand.
Azure Database for PostgreSQL is a fully managed relational database service that's based on the community edition of the open-source PostgreSQL database engine. With this service, you can focus on application innovation instead of database management. You can also scale your workload quickly and easily.
Azure Cosmos DB is a globally distributed, multi-model database. With Azure Cosmos DB, your solutions can elastically and independently scale throughput and storage across any number of geographic regions. This fully managed NoSQL database service guarantees single-digit millisecond latencies at the ninety-ninth percentile anywhere in the world.
Azure Database for MySQL is a fully managed relational database service based on the community edition of the open-source MySQL database engine.
Azure SQL Managed Instance is an intelligent, scalable cloud database service that offers all the benefits of a fully managed and evergreen platform as a service. SQL Managed Instance has near 100 percent compatibility with the latest SQL Server (Enterprise Edition) database engine. This service also provides a native virtual network implementation that addresses common security concerns.
Azure Data Lake Storage is a storage repository that holds a large amount of data in its native, raw format. Data lake stores are optimized for scaling to terabytes and petabytes of data. The data typically comes from multiple, heterogeneous sources and may be structured, semi-structured, or unstructured.
Azure Storage is a cloud storage solution that includes object, file, disk, queue, and table storage. Services include hybrid storage solutions and tools for transferring, sharing, and backing up data.
An on-premises data gateway acts as a bridge that connects on-premises data with cloud services. Typically, you install the gateway on a dedicated on-premises VM. Cloud services can then securely use on-premises data.
Azure VMs are on-demand, scalable computing resources that are available with Azure. An Azure VM provides the flexibility of virtualization without the maintenance demands of physical hardware. Azure VMs offer a choice of operating systems, including Windows and Linux.
- When you use the Data Provider for Host Files client to convert data, turn on connection pooling to reduce connection startup time.
- When you use Data Factory to extract data, take steps to tune the performance of the copy activity.
Keep these points in mind when considering this architecture.
When you use an on-premises data gateway, be aware of limits on read and write operations.
- The on-premises data gateway provides data protection during transfers from on-premises to Azure systems.
- When you use Data Provider for Host Files to convert data, follow the recommendations in Data Providers for Host Files Security and Protection to improve security.
Use the Azure pricing calculator to estimate the cost of implementing this solution.
- Contact Azure Data Engineering - Mainframe & Midrange Modernization for more information.
- Read the Migration guide.