Mainframe file replication and sync on Azure

Azure Data Factory
Azure Data Lake
Azure SQL Database
Azure Storage
Azure Virtual Machines

Solution ideas

This article is a solution idea.

When you migrate an on-premises mainframe or midrange application to Azure, transferring the data is a primary consideration. Several modernization scenarios require replicating files to Azure quickly or maintaining synchronization between on-premises files and Azure files.

This article describes several processes for transferring files to Azure, converting and transforming file data, and storing the data on-premises and in Azure.

Architecture

The following diagram shows some of the options for replicating and syncing on-premises files to Azure:

Diagram showing the three steps of migrating on-premises files to Azure: transferring, conversion and transformation, and storing in persistent storage.


Dataflow

  1. Transfer files to Azure:

    • The simplest way to transfer files, whether between on-premises systems or to Azure, is File Transfer Protocol (FTP). You can host an FTP server on an Azure virtual machine (VM). A simple job control language (JCL) FTP job sends files to Azure in binary format, which is essential for preserving mainframe and midrange computational and binary data types. You can store transmitted files in on-premises disks, Azure VM file storage, or Azure Blob Storage.

    • You can also upload on-premises files to Blob Storage by using tools like AzCopy.

    • The Azure Data Factory FTP/SFTP connector can also be used to transfer data from the mainframe system to Blob Storage. This method requires an intermediate VM on which a self-hosted integration runtime (SHIR) is installed.

    • You can also find third-party tools in Azure Marketplace to transfer files from mainframes to Azure.

  2. Orchestrate, convert, and transform data:

    • Azure can't read IBM Extended Binary Coded Decimal Interchange Code (EBCDIC) code page files in Azure VM disks or Blob Storage. To make these files compatible with Azure, Host Integration Server (HIS) converts them from EBCDIC to American Standard Code for Information Interchange (ASCII) format.

      Copybooks define the data structure of COBOL, PL/I, and assembly language files. HIS converts these files to ASCII based on the copybook layouts.

    • Before transferring data to Azure data stores, you might need to transform the data or use it for analytics. Data Factory can manage these extract-transform-load (ETL) and extract-load-transform (ELT) activities and store the data directly in Azure Data Lake Storage.

    • For big data integrations, Azure Databricks and Azure Synapse Analytics can perform all transformation activities quickly and efficiently by using the Apache Spark engine for in-memory computation.

  3. Store data:

    You can store transferred data in one of several available persistent Azure storage modes, depending on your requirements.

    • If there's no need for analytics, Azure Data Factory can store data directly in a wide range of storage options, such as Data Lake Storage and Blob Storage.

    • Azure hosts various databases, which address different needs:

      • Relational databases include the SQL Server family, and open-source databases like PostgreSQL and MySQL.
      • Non-relational databases include Azure Cosmos DB, a fast, multi-model, globally distributed NoSQL database.

  4. Review analytics and business intelligence:

    Microsoft Fabric is an all-in-one analytics solution that your organization can use to study data movement, experiment with data science, and review real-time analytics and business intelligence. It offers a comprehensive suite of features, including a data lake, data engineering, and data integration.
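The binary transfer in step 1 can be illustrated with a minimal Python sketch. It stands in for the JCL FTP job that the article describes and assumes a client machine that can reach the FTP server hosted on the Azure VM. The function names, host, and credentials are hypothetical. The point it shows is that the file is opened as raw bytes and sent in FTP binary (image) mode, so packed-decimal and binary fields aren't altered by text translation in transit.

```python
from ftplib import FTP


def upload_binary(ftp, local_path, remote_name, blocksize=8192):
    """Send one file in binary mode and return the server's reply string.

    `ftp` is a logged-in ftplib.FTP object (or anything with the same
    storbinary method, which keeps the function easy to test).
    """
    with open(local_path, "rb") as fh:
        # storbinary switches the connection to TYPE I (image) before the
        # STOR command, so every byte arrives unchanged.
        return ftp.storbinary(f"STOR {remote_name}", fh, blocksize)


def send_file(host, user, password, local_path, remote_name):
    """Hypothetical helper: connect, log in, upload, and disconnect."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return upload_binary(ftp, local_path, remote_name)
```

A call such as `send_file("ftp.example.internal", "user", "secret", "cust.dat", "cust.dat")` would upload the file as-is. Plain FTP sends credentials unencrypted, so `ftplib.FTP_TLS` or the SFTP options mentioned in step 1 may be a better fit on untrusted networks.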

Components

Various file transfer, integration, and storage scenarios use different components. See the Azure pricing calculator to estimate costs for Azure resources.

Networking

An on-premises data gateway is bridge software that connects on-premises data to cloud services. You can install the gateway on a dedicated on-premises VM.

Data integration and transformation

  • Data Provider for Host Files is a component of HIS that converts EBCDIC code page files to ASCII. The provider can read and write records offline in a local binary file, or use Systems Network Architecture (SNA) or Transmission Control Protocol/Internet Protocol (TCP/IP) to read and write records in remote IBM z/OS mainframe datasets or i5/OS physical files. HIS connectors are available for BizTalk and Azure Logic Apps.

  • Azure Data Factory is a hybrid data integration service you can use to create, schedule, and orchestrate ETL and ELT workflows.

  • Azure Databricks is an Apache Spark-based analytics platform optimized for Azure. You can use Databricks to correlate incoming data, and enrich it with other data stored in Databricks.

  • Azure Synapse Analytics is a fast and flexible cloud data warehouse with a massively parallel processing (MPP) architecture that you can use to scale, compute, and store data elastically and independently.
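To make the copybook-driven conversion more concrete, here is a minimal sketch in plain Python. It doesn't use HIS or its API. It only illustrates the kind of work a converter such as Data Provider for Host Files performs. The record layout is hypothetical, and the code assumes EBCDIC code page 037 (`cp037`), which might not match your system. Text fields are decoded from EBCDIC, while the packed-decimal (COMP-3) field is unpacked nibble by nibble. This difference is why a record must be converted field by field rather than translated as a whole.

```python
from decimal import Decimal

# Hypothetical layout, standing in for a COBOL copybook such as:
#   05 CUST-ID    PIC X(6).
#   05 CUST-NAME  PIC X(10).
#   05 BALANCE    PIC S9(5)V99 COMP-3.
# Each entry: (field name, kind, length in bytes, implied decimal places)
LAYOUT = [
    ("cust_id", "text", 6, 0),
    ("cust_name", "text", 10, 0),
    ("balance", "comp3", 4, 2),
]


def unpack_comp3(raw, scale):
    """Decode a packed-decimal field: two digits per byte, sign in the last nibble."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)
    sign_nibble = raw[-1] & 0x0F
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):  # 0xD (and rarely 0xB) marks a negative value
        value = -value
    return Decimal(value).scaleb(-scale)


def decode_record(record, layout=LAYOUT, codepage="cp037"):
    """Split one fixed-length record by the layout and convert each field."""
    fields = {}
    offset = 0
    for name, kind, length, scale in layout:
        chunk = record[offset:offset + length]
        if kind == "text":
            fields[name] = chunk.decode(codepage).rstrip()
        else:
            fields[name] = unpack_comp3(chunk, scale)
        offset += length
    return fields


if __name__ == "__main__":
    # Build a sample record: two EBCDIC text fields plus a packed -12345.67.
    sample = ("C00042".encode("cp037")
              + "ADA LOVE  ".encode("cp037")
              + bytes([0x12, 0x34, 0x56, 0x7D]))
    print(decode_record(sample))
```

A real copybook can also contain zoned decimals, binary (COMP) fields, `REDEFINES`, and `OCCURS` clauses, which this sketch doesn't handle.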

Databases

  • Azure SQL Database is a scalable relational cloud database service. Azure SQL Database is evergreen and always up to date, with AI-powered and automated features that optimize performance and durability. Serverless compute and hyperscale storage options automatically scale resources on demand. With Azure Hybrid Benefit, you can use your existing on-premises SQL Server licenses in the cloud at no extra cost.

  • Azure SQL Managed Instance combines the broadest SQL Server database engine compatibility with all the benefits of a fully managed and evergreen platform as a service (PaaS). With SQL Managed Instance, you can modernize your existing apps at scale with familiar tools, skills, and resources.

  • SQL Server on Azure Virtual Machines lifts and shifts your SQL Server workloads to the cloud to combine the flexibility and hybrid connectivity of Azure with SQL Server performance, security, and analytics. You can access the latest SQL Server updates and releases with 100% code compatibility.

  • Azure Database for PostgreSQL is a fully managed relational database service based on the community edition of the open-source PostgreSQL database engine.

  • Azure Database for MySQL is a fully managed relational database service based on the community edition of the open-source MySQL database engine.

  • Azure Cosmos DB is a fully managed, multi-model NoSQL database service for building and modernizing scalable, high-performance applications. Azure Cosmos DB scales throughput and storage elastically and independently across geographic regions and guarantees single-digit-millisecond latencies at the 99th percentile anywhere in the world.

Other data stores

  • Blob Storage stores large amounts of unstructured data, such as text or binary data, that you can access from anywhere via HTTP or HTTPS. You can use Blob Storage to expose data publicly or to store application data privately.

  • Data Lake Storage is a storage repository that holds a large amount of data in native, raw format. Data Lake Storage provides scaling for big data analytics workloads with terabytes and petabytes of data. The data typically comes from multiple heterogeneous sources, and might be structured, semi-structured, or unstructured.
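As a small illustration of landing on-premises files in Blob Storage with AzCopy, the following sketch builds an `azcopy copy` command line in Python. The function name, storage account, container, and SAS token are hypothetical placeholders. The sketch assumes AzCopy is installed and that a shared access signature (SAS) is used for authorization. It only constructs the argument list and leaves it to the caller to run it.

```python
def azcopy_copy_args(source, account, container, sas_token, recursive=False):
    """Build the argument list for `azcopy copy <source> <destination>`.

    The destination is a Blob Storage container URL with the SAS token
    appended as the query string.
    """
    query = sas_token.lstrip("?")  # accept tokens with or without a leading "?"
    destination = f"https://{account}.blob.core.windows.net/{container}?{query}"
    args = ["azcopy", "copy", source, destination]
    if recursive:
        # Needed when the source is a directory rather than a single file.
        args.append("--recursive")
    return args
```

The resulting list can be passed to `subprocess.run(args, check=True)`. Passing a list rather than a shell string avoids quoting problems with the `&` characters in a SAS token.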

Potential use cases

On-premises file replication and synchronization use cases include:

  • Downstream or upstream dependencies, for example, when applications that run on a mainframe and applications that run on Azure need to exchange data via files.

  • Parallel testing of rehosted or re-engineered applications on Azure with on-premises applications.

  • Tightly coupled on-premises applications on systems that can't immediately be remediated or modernized.

Contributors

This article is maintained by Microsoft. It was originally written by the following contributors.

Principal authors:


Next steps