Data Integration Design Patterns With Microservices
My name is Mike Davison. I work at Microsoft as a Data Platform Solution Architect. In this role, I spend most of my time working with large customers on data and analytics solutions. Digital transformation is probably also worth a mention for those readers playing the IT Vendor Blog Drinking Game…
While this blog is focused on “BI, Big Data and SQL Server”, I will discuss these topics from the perspectives of application and enterprise architecture. The bulk of my professional expertise lies in these domains and my hope is to leverage my experience to add some novel, “outside in” perspective to matters data related. In this spirit, my first of hopefully many posts is focused on patterns of data integration for microservices. Microservices and traditional approaches to data integration often appear to be at odds, so we’ll explore some data integration approaches that respect the principles that define the microservices architectural style.
I can be reached on LinkedIn at https://ca.linkedin.com/in/mikeydavison. I look forward to hearing from you.
Martin Fowler defines the microservices architectural style as “a particular way of designing software applications as suites of independently deployable services” (Fowler, 2014). Beyond independent deployment, services, be they microservices or the “macroservices” of the service oriented architecture (SOA) style, adhere to common architectural principles including loose coupling, composability, use of standardized interfaces, opacity, etc.
Opacity is synonymous with object oriented programming concepts like encapsulation or information hiding. Practically the opacity principle means that microservice implementation details are hidden from consumers, thus minimizing coupling between microservice and consumer. The opaque nature of microservices, particularly with respect to implementation technologies like databases, is often problematic for traditional approaches to data integration for business intelligence or other analytic activities.
Data integration for BI and analytics is well trodden ground in most organizations. Almost every organization that produces reports or dashboards has some flavor of data integration platform working behind the scenes to pull data from source databases (relational, “no-SQL”, or otherwise), transform data, and load data into a dedicated data repository for BI and analytics. Data extraction necessarily involves the data integration platform knowing a source database’s location and detailed information about the source database’s structure. When a data integration platform extracts data from a database internal to a microservice, the result is coupling that violates the principle of opacity. The data integration platform is using information (e.g. table names, data types, etc.) it has no business knowing per the microservices style.
Tight coupling between microservice and data integration platform means that changes to the microservice must be coordinated, tested, and deployed in conjunction with corresponding changes to the data integration platform. This unfortunately sounds an awful lot like the complex and error prone approaches used in conventional application and data integration architectures today. Given that organizational reporting needs aren’t going away and further that organizations aren’t going to magically become effective at dependency management and change coordination, new approaches to data integration for microservices are required. Following are descriptions of a few approaches that enable data integration with microservices while remaining true to the principles that characterize the microservices architectural style. The approaches are:
- Microservice Data Integration via Operational Data Store
- Microservice Data Integration via Gateway
- Microservice Data Integration via Service Interface
1. Microservice Data Integration via Operational Data Store
A microservice may enable data integration via a microservice-managed operational data store (ODS). In this approach the microservice exposes an ODS, typically a relational database of some sort, via a standard open protocol like ODBC. The population of the ODS, via local data integration platform, code, or otherwise, remains the sole responsibility of the microservice. While the ODS is a common element in many data architectures, the shift of responsibility for ODS population from central data integration platform to microservice implementation detail preserves microservice opacity. In other words, all aspects of the ODS are within the scope of the microservice. In this capacity, the ODS interface is no different than any other microservice interface; preservation of the interface in the face of implementation change remains the sole responsibility of the microservice owner. From the perspective of a centralized data integration platform, interaction with the microservice ODS is business as usual. While the approach may appear to be simply shifting coupling from one database to another, it is precisely the shift of coupling to a managed, published interface from a rightfully internal implementation detail that enables microservice data integration while preserving the desirable characteristics of the microservices style.
Figure 1 – Microservice Data Integration via ODS
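The division of responsibility in the ODS approach can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: SQLite stands in for both the internal store and the ODS, and every table, column, and function name (orders_internal, ods_orders, refresh_ods, extract_for_bi) is hypothetical. The point is only that the mapping from internal schema to published schema lives inside the microservice, while the integration platform reads nothing but the ODS.

```python
import sqlite3

# All names below are hypothetical and exist only for illustration.

def create_service_storage(conn):
    """Internal table: private to the microservice and free to change."""
    conn.execute(
        "CREATE TABLE orders_internal (id INTEGER PRIMARY KEY, cust TEXT, cents INTEGER)"
    )

def create_ods(conn):
    """ODS table: the published contract the microservice promises to keep stable."""
    conn.execute(
        "CREATE TABLE ods_orders (order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )

def refresh_ods(internal, ods):
    """Owned by the microservice team: maps internal rows onto the ODS contract."""
    rows = internal.execute("SELECT id, cust, cents FROM orders_internal").fetchall()
    ods.execute("DELETE FROM ods_orders")
    ods.executemany(
        "INSERT INTO ods_orders (order_id, customer, amount) VALUES (?, ?, ?)",
        [(order_id, cust, cents / 100.0) for order_id, cust, cents in rows],
    )
    ods.commit()

def extract_for_bi(ods):
    """The central integration platform queries the ODS and nothing else."""
    return ods.execute(
        "SELECT order_id, customer, amount FROM ods_orders ORDER BY order_id"
    ).fetchall()

internal_db = sqlite3.connect(":memory:")
ods_db = sqlite3.connect(":memory:")
create_service_storage(internal_db)
create_ods(ods_db)
internal_db.executemany(
    "INSERT INTO orders_internal VALUES (?, ?, ?)",
    [(1, "acme", 1250), (2, "globex", 9900)],
)
refresh_ods(internal_db, ods_db)
extracted = extract_for_bi(ods_db)
```

If the service later renames cust or stores amounts differently, only refresh_ods changes; extract_for_bi and everything downstream of it are unaffected as long as the ods_orders contract holds.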
2. Microservice Data Integration via Gateway
The rise in adoption of microservices has catalyzed the development of a new family of microservice management solutions. A web search for terms like “API gateway” or “microservice gateway” yields many architectural approaches and offerings from vendors like Microsoft, Amazon, IBM, Mulesoft, and Layer 7 (now part of CA). Collectively, gateway-driven approaches share the notion of feeding all microservice traffic through a central access point for application of security, audit, quality of service, and other policies.
While gateways conceivably enable many data integration approaches, the following approach is inspired by the event log (Kreps, 2013) or Kappa Architecture, and the data lake. Data integration via gateway begins with the inclusion of a wiretap (Hohpe, n.d.) in the gateway processing pipeline. The wiretap feeds messages to an event broker (e.g. Apache Kafka, Azure Event Hubs). The event broker ensures messages are durable and buffers downstream message processing systems from spikes in message generation. Messages are then persisted in their raw form in an event log or data lake that is often built from the Hadoop family of technologies. The event log approach ensures that data integration platforms have access to current and historical data that may be needed by BI applications. While modern data integration tools usually support Hadoop, much of the data integration mindshare remains with SQL, so “SQL-on-Hadoop” technologies like Hive may be deployed on top of the raw event log to simplify integration. Stream processors may also be employed for near-real time data integration scenarios. The addition of real-time data integration yields an architecture that has much in common with the Lambda Architecture. Note that non-greenfield microservices architectures require a one-time data migration of pre-microservice databases to the data lake based event log.
Figure 2 – Data Integration via Gateway
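The wiretap, broker, and raw event log stages described above might be sketched as follows. This is a toy model under loud assumptions: an in-memory queue.Queue stands in for an event broker such as Kafka or Event Hubs (and, unlike a real broker, is not durable), a Python list stands in for the event log or data lake, and the Gateway class, drain_to_event_log, and create_order names are all hypothetical rather than any vendor's API.

```python
import json
import queue

class Gateway:
    """Hypothetical gateway: routes calls to services and wiretaps each exchange."""

    def __init__(self, routes, broker):
        self.routes = routes    # service name -> callable implementing the service
        self.broker = broker    # stand-in for an event broker

    def handle(self, service, payload):
        response = self.routes[service](payload)
        # Wiretap: publish a copy of the exchange without altering the primary flow.
        message = {"service": service, "request": payload, "response": response}
        self.broker.put(json.dumps(message, sort_keys=True))
        return response

def drain_to_event_log(broker, event_log):
    """Append buffered messages to the log in raw form; return how many were moved."""
    moved = 0
    while True:
        try:
            message = broker.get_nowait()
        except queue.Empty:
            return moved
        event_log.append(message)
        moved += 1

def create_order(payload):
    """Hypothetical business-capability service sitting behind the gateway."""
    return {"status": "created", "order_id": payload["order_id"]}

broker = queue.Queue()
gateway = Gateway({"orders": create_order}, broker)
event_log = []

reply = gateway.handle("orders", {"order_id": 7, "amount": 12.5})
gateway.handle("orders", {"order_id": 8, "amount": 3.0})
drained = drain_to_event_log(broker, event_log)

# A downstream integration job can replay the raw log at any later time.
replayed_ids = [json.loads(line)["request"]["order_id"] for line in event_log]
```

Because the log keeps messages in raw form, a data integration job or a SQL-on-Hadoop layer can reinterpret history later without the microservice or the gateway having to change.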
3. Microservice Data Integration via Service Interface
A seemingly obvious approach to microservice data integration is to create companion data services alongside the business capability focused services advocated by the microservices style (Fowler, 2014). One can easily envision microservice interfaces with methods like “GetData” and “GetDataChangedSince” that can be invoked by data integration tools. OData (http://www.odata.org/) is an attempt to standardize this approach. While intuitive and fully respectful of microservice principles, this approach has some drawbacks that must be carefully considered. Nevertheless, data services are common in microservices architectures.
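To make the “GetData” and “GetDataChangedSince” idea concrete, here is a minimal in-process sketch. It is an assumption-laden illustration: the CustomerDataService class and its record shape are hypothetical, a real service would sit behind HTTP, and a monotonically increasing version counter is used as the change marker instead of a timestamp purely to keep the example deterministic.

```python
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class CustomerRecord:
    customer_id: int
    name: str
    version: int  # change counter assigned by the service (an assumption)

class CustomerDataService:
    """Hypothetical companion data service for a customer microservice."""

    def __init__(self):
        self._rows = {}
        self._version = 0

    def upsert(self, customer_id, name):
        """Business-side write; every change gets a new version number."""
        self._version += 1
        self._rows[customer_id] = CustomerRecord(customer_id, name, self._version)

    def get_data(self):
        """Full extract, ordered by key."""
        ordered = sorted(self._rows.values(), key=lambda r: r.customer_id)
        return [asdict(r) for r in ordered]

    def get_data_changed_since(self, version):
        """Incremental extract: records changed after the caller's watermark."""
        changed = [r for r in self._rows.values() if r.version > version]
        return [asdict(r) for r in sorted(changed, key=lambda r: r.version)]

service = CustomerDataService()
service.upsert(1, "Acme")
service.upsert(2, "Globex")

full = service.get_data()
watermark = max(row["version"] for row in full)  # remembered by the integration tool

service.upsert(1, "Acme Corp")
delta = service.get_data_changed_since(watermark)
```

The integration tool keeps only the watermark between runs, so it never needs to know how the service stores or versions its data internally.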
Data volumes common in large data integration pipelines are not typically compatible with the JSON interchange format that is common in microservices. Binary formats can help with data size challenges but do nothing for data integration requirements like handling partial failures, retries, execution of arbitrary queries, etc. While a microservice interface can be constructed that fulfills these requirements and more, doing so is almost certain to fall victim to the Inner Platform Effect (Papadimoulis, 2006). When a microservice starts to implement the full set of functionality required by data integration, it starts to look an awful lot like a RESTful version of ODBC or JDBC. This path is almost assuredly a waste of time and money unless data integration requirements tend towards simple, entity based lookups and queries.
Figure 3 – Data Integration via Data Service
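The size argument against JSON for bulk extracts can be illustrated with a rough comparison. This is only a sketch: the two-field record layout (a 4-byte integer and an 8-byte double) and the field names are hypothetical, and Python's struct module stands in for whatever binary format a real service might adopt.

```python
import json
import struct

# 1,000 hypothetical (order_id, amount) rows.
rows = [(i, i * 1.5) for i in range(1000)]

# Typical microservice payload: a JSON array of objects with repeated field names.
json_payload = json.dumps(
    [{"order_id": order_id, "amount": amount} for order_id, amount in rows]
).encode("utf-8")

# Fixed-width binary layout: little-endian 4-byte int plus 8-byte double, no padding.
RECORD = struct.Struct("<id")
binary_payload = b"".join(RECORD.pack(order_id, amount) for order_id, amount in rows)

# The binary payload round-trips back to the same rows.
decoded = [
    RECORD.unpack_from(binary_payload, offset)
    for offset in range(0, len(binary_payload), RECORD.size)
]

json_size = len(json_payload)
binary_size = len(binary_payload)
```

The binary payload is a fixed 12 bytes per row, while the JSON payload repeats every field name in every row. Neither encoding, however, does anything about retries, partial failures, or ad hoc queries, which is where the inner platform risk begins.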
Microservices is an increasingly popular application and service delivery style that complicates many traditional data integration scenarios. Approaches based on microservice-managed operational data stores, microservice gateways, and companion data services can enable data integration for BI and analytics while preserving desirable characteristics of the microservices style.
The architectural approaches described herein can be implemented completely or in part on Microsoft platforms, both on premises and in the cloud. While Microsoft application, data, and cloud platforms are far too broad to describe in any meaningful depth here, some enabling solutions for microservices data integration include:
SQL Server and SQL Server Integration Services (SSIS)
A consequence of the data integration via ODS pattern is a proliferation of data integration platform instances. Conventional enterprise software licensing likely makes such proliferation an expensive proposition. SSIS is a data integration solution that is included at no additional charge with SQL Server. SQL Server and SSIS provide the ideal combination of price and functionality for the ODS-driven microservice data integration architecture. Learn more about SSIS at https://msdn.microsoft.com/en-us/library/ms141026.aspx
Azure Service Fabric
The microservices architecture has deployment and infrastructure requirements that are markedly different from those of traditional (a.k.a. monolithic) application architectures. In lieu of a single application deployment artifact (e.g. a Java EE EAR file), a microservices application may consist of dozens of independently deployed and managed assets. Azure Service Fabric is a cloud-born platform designed specifically for microservices. Microservices and the data services described in option 3 above may be simply and cost effectively deployed on Service Fabric. Service Fabric is described in detail at https://azure.microsoft.com/en-us/services/service-fabric/
Azure Big Data Solutions
The gateway and “big data” approach to microservices data integration can pose numerous challenges to time and resource constrained IT departments. Setup and ongoing administration of technologies like Kafka, Hadoop, Spark, etc. are time consuming and often depend on highly skilled and highly in-demand resources. Many organizations have turned to Azure HDInsight (https://azure.microsoft.com/en-us/services/hdinsight/), a cloud Hadoop platform as a service (PaaS) offering from Microsoft, to minimize the capital expense and administrative burden often associated with on-premises big data solutions. Azure HDInsight presents organizations an opportunity to implement the “big data” inspired approach to microservices data integration in an agile, cost effective, and low risk manner.
Fowler, M. (2014, March 25). Microservices. Retrieved from martinfowler.com: http://www.martinfowler.com/articles/microservices.html
Hohpe, G. (n.d.). Wire Tap. Retrieved from Enterprise Integration Patterns: http://www.enterpriseintegrationpatterns.com/patterns/messaging/WireTap.html
Kreps, J. (2013, December 16). The Log: What every software engineer should know about real-time data's unifying abstraction. Retrieved from LinkedIn: https://engineering.linkedin.com/distributed-systems/log-what-every-software-engineer-should-know-about-real-time-datas-unifying
Papadimoulis, A. (2006, April 21). The Inner Platform Effect. Retrieved from The Daily WTF: http://thedailywtf.com/articles/The_Inner-Platform_Effect