Data Replication and Synchronization Guidance
When you deploy an application to more than one datacenter, such as cloud and on-premises locations, you must consider how you will replicate and synchronize the data each instance of the application uses in order to maximize availability and performance, ensure consistency, and minimize data transfer costs between locations.
Why Replicate and Synchronize Data?
Cloud-hosted applications and services are often deployed to multiple datacenters. This approach can reduce network latency for users around the world, and it can provide a complete failover capability if one deployment or one datacenter becomes unavailable for any reason. For best performance, the data that an application uses should be located close to where the application is deployed, so it may be necessary to replicate this data in each datacenter. If the data changes, these modifications must be applied to every copy of the data. This process is called synchronization.
Alternatively, you might choose to build a hybrid application or service solution that stores and retrieves data from an on-premises data store hosted by your own organization. For example, an organization may hold the main data repository on-premises and then replicate only the necessary data to a data store in the cloud. This can help to protect sensitive data that is not required in all applications. It is also a useful approach if updates to the data occur mainly on-premises, such as when maintaining the catalog of an e-commerce retailer or the account details of suppliers and customers.
The key decisions in any distributed system that uses data replication concern where you will store the replicas, and how you will synchronize these replicas.
Replicating and Synchronizing Data
There are several topologies that you can use to implement data replication. The two most common approaches are:
Master-Master Replication, in which the data in each replica is dynamic and can be updated. This topology requires a two-way synchronization mechanism to keep the replicas up to date and to resolve any conflicts that might occur. In a cloud application, to ensure that response times are kept to a minimum and to reduce the impact of network latency, synchronization typically happens periodically. The changes made to a replica are batched up and synchronized with other replicas according to a defined schedule. While this approach reduces the overheads associated with synchronization, it can introduce some inconsistency between replicas before they are synchronized.
Figure 1 - Master-Master replication
Master-Subordinate Replication, in which only one of the replicas (the master) holds dynamic data that can be updated, and the remaining replicas are read-only. The synchronization requirements for this topology are simpler than those of the Master-Master Replication topology because conflicts are unlikely to occur. However, the same issues of data consistency apply.
Figure 2 - Master-Subordinate replication
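To make the Master-Subordinate topology concrete, the following Python sketch models it in memory. It is a minimal illustration under simplifying assumptions, not a production design. The master is the only store that accepts writes, and it records each change in an ordered log. Each read-only replica applies, in order, the log entries it has not yet seen. The names `MasterStore`, `ReadOnlyReplica`, and `sync_from` are hypothetical. Between synchronizations the replica returns stale data, which is the consistency issue that both topologies share.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class MasterStore:
    """The only writable replica; every write is also appended to a change log."""
    data: dict[str, Any] = field(default_factory=dict)
    log: list[tuple[int, str, Any]] = field(default_factory=list)

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value
        # Sequence numbers start at 1, so 0 can mean "nothing applied yet".
        self.log.append((len(self.log) + 1, key, value))

    def changes_since(self, seq: int) -> list[tuple[int, str, Any]]:
        # The entry with sequence number n is stored at index n - 1,
        # so every entry after `seq` starts at index `seq`.
        return self.log[seq:]


@dataclass
class ReadOnlyReplica:
    """A subordinate: applications read from it but never write to it."""
    data: dict[str, Any] = field(default_factory=dict)
    applied_seq: int = 0

    def read(self, key: str) -> Any:
        return self.data.get(key)

    def sync_from(self, master: MasterStore) -> int:
        """One-way synchronization: apply, in order, the changes not yet applied."""
        changes = master.changes_since(self.applied_seq)
        for seq, key, value in changes:
            self.data[key] = value
            self.applied_seq = seq
        return len(changes)


master = MasterStore()
replica = ReadOnlyReplica()
master.write("sku-1", 10)
replica.sync_from(master)
master.write("sku-1", 7)
print(replica.read("sku-1"))  # 10: the replica is stale until the next sync
replica.sync_from(master)
print(replica.read("sku-1"))  # 7
```

The replica remembers the last sequence number it applied, so it can catch up after a period without connectivity by synchronizing again. This sketch pulls changes from the master. A design in which the master pushes the same log entries to each replica works the same way.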
Benefits of Replication
The following list provides suggestions for achieving the benefits of replicating data:
- To improve performance and scalability:
- Use Master-Subordinate replication with read-only replicas to improve performance of queries. Locate the replicas close to the applications that access them and use simple one-way synchronization to push updates to them from a master database.
- Use Master-Master replication to improve the scalability of write operations. Applications can write more quickly to a local copy of the data, but there is additional complexity because two-way synchronization (and possible conflict resolution) with other data stores is required.
- Include in each replica any relatively static reference data that queries executed against that replica require. Those queries then do not need to cross the network to another datacenter. For example, you could include postal code lookup tables (for customer addresses) or product catalog information (for an e-commerce application) in each replica.
- To improve reliability:
- Deploy replicas close to the applications and inside the network boundaries of the applications that use them to avoid delays caused by accessing data across the Internet. Typically, the latency of the Internet and the correspondingly higher chance of connection failure are the major factors in poor reliability. If replicas are read-only to an application, they can be updated by pushing changes from the master database when connectivity is restored. If the local data is updateable, a more complex two-way synchronization will be required to update all data stores that hold this data.
- To improve security:
- In a hybrid application, deploy only non-sensitive data to the cloud and keep the rest on-premises. This approach may also be required by regulation, by a service level agreement (SLA), or by the business. Replication and synchronization can then take place over the non-sensitive data only.
- To improve availability:
- In a global reach scenario, use Master-Master replication in datacenters in each country or region where the application runs. Each deployment of the application can use data located in the same datacenter as that deployment in order to maximize performance and minimize any data transfer costs. Partitioning the data may make it possible to minimize synchronization requirements.
- Use replication from the master database to replicas in order to provide failover and backup capabilities. By keeping additional copies of the data up to date, perhaps according to a schedule or on demand when any changes are made to the data, it may be possible to switch the application to use the backup data in case of a failure of the original data store.
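The failover approach described above can be sketched as a read path that tries replicas in order of preference. When a store is unreachable, it falls back to the next one. In this minimal Python sketch, each data store is modelled as a hypothetical callable that either returns a value or raises `ConnectionError`. A real application would use the client library of its data store, with its own timeouts and retry policy.

```python
from __future__ import annotations

from typing import Any, Callable, Sequence

# Hypothetical model of a data store: a callable that returns the value for a
# key, or raises ConnectionError when the store cannot be reached.
ReadFn = Callable[[str], Any]


def read_with_failover(key: str, replicas: Sequence[tuple[str, ReadFn]]) -> tuple[str, Any]:
    """Try each replica in order of preference (for example, nearest first).

    Returns the name of the replica that answered together with the value.
    Raises ConnectionError only if every replica is unreachable.
    """
    errors: list[str] = []
    for name, read in replicas:
        try:
            return name, read(key)
        except ConnectionError as exc:
            # Note the failure and fall back to the next replica in the list.
            errors.append(f"{name}: {exc}")
    raise ConnectionError("all replicas unavailable: " + "; ".join(errors))


def local_replica(key: str) -> Any:
    raise ConnectionError("timeout")  # simulate an outage in the local datacenter


backup = {"postcode:98052": "Redmond"}
print(read_with_failover("postcode:98052",
                         [("local", local_replica), ("backup", backup.__getitem__)]))
# ('backup', 'Redmond')
```

If you put the nearest replica first in the list, reads stay local in the normal case. The backup copy is used only when the local store fails.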
Simplifying Synchronization Requirements
Some of the ways that you can minimize or avoid the complexity of two-way synchronization include:
- Use a Master-Subordinate Replication topology wherever possible. This topology requires only one-way synchronization from the master to the subordinates. You may be able to send updates from a cloud-hosted application to the master database using a messaging service, or by exposing the master database across the Internet in a secure way.
- Segregate the data into several stores or partitions according to the replication requirements of the data that they hold. Partitions containing data that could be modified anywhere can be replicated by using the Master-Master topology, while data that can be updated only at a single location and is static everywhere else can be replicated by using the Master-Subordinate topology.
- Partition the data so that updates, and the resulting risk of conflicts, can occur only in the minimum number of places. For example, store the data for different retail locations in different databases so that synchronization must occur only between the retail location and the master database, and not across all databases. For more information see the Data Partitioning Guidance.
- Version the data so that no overwriting is required. Instead, when data is changed, a new version is added to the data store alongside the existing versions. Applications can access all the versions of the data and the update history, and can use the appropriate version. Many Command and Query Responsibility Segregation (CQRS) implementations use this approach, often referred to as Event Sourcing, to retain historical information and to accrue changes at specific points in time.
- Use a quorum-based approach where a conflicting update is applied only if the majority of data stores vote to commit the update. If the majority votes to abort the update, all the data stores must abort the update. Quorum-based mechanisms are not easy to implement, but they may provide a workable solution if the final value of conflicting data items should be based on a consensus rather than on the more usual conflict resolution techniques such as “last update wins” or “master database wins.” For more information see Quorum on TechNet.
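The versioning approach can be sketched as an append-only store in which a change never overwrites existing data. In the following minimal Python sketch, the `VersionedStore` and `Version` names are hypothetical. Each update adds a new version for a key, and applications can read either the latest version or the full history. A single store assigns the version numbers here. If several locations append versions concurrently, the version identifiers would also need to be unique across locations, and this sketch does not handle that.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Version:
    number: int   # per-key version number, starting at 1 (single-store assumption)
    value: Any
    source: str   # the location that produced this version


class VersionedStore:
    """Append-only store: a change adds a new version instead of overwriting."""

    def __init__(self) -> None:
        self._versions: defaultdict[str, list[Version]] = defaultdict(list)

    def append(self, key: str, value: Any, source: str) -> Version:
        history = self._versions[key]
        version = Version(len(history) + 1, value, source)
        history.append(version)
        return version

    def latest(self, key: str) -> Optional[Version]:
        history = self._versions.get(key)
        return history[-1] if history else None

    def history(self, key: str) -> list[Version]:
        # Return a copy so that callers cannot rewrite the stored history.
        return list(self._versions.get(key, []))


store = VersionedStore()
store.append("price:42", 10, "on-premises")
store.append("price:42", 12, "cloud")
print(store.latest("price:42").value)                # 12
print([v.value for v in store.history("price:42")])  # [10, 12]
```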
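The decision rule behind a quorum-based approach is short, even though the full mechanism is not. The following Python sketch shows only that rule. A conflicting update is committed when a strict majority of all data stores vote to commit it, and every other outcome aborts it. This sketch assumes, as a design choice, that a store that cannot be reached counts as not voting to commit. The `Vote`, `quorum_decision`, and `resolve_conflicting_update` names are hypothetical. A real implementation must also collect the votes reliably and make sure that every data store applies the same outcome.

```python
from __future__ import annotations

from enum import Enum
from typing import Callable, Sequence


class Vote(Enum):
    COMMIT = "commit"
    ABORT = "abort"


def quorum_decision(votes: Sequence[Vote], total_stores: int) -> Vote:
    """Commit only if a strict majority of *all* data stores voted to commit.

    Stores that did not vote (for example, because they were unreachable)
    count against the update, so a minority of stores can never commit it.
    """
    commits = sum(1 for v in votes if v is Vote.COMMIT)
    return Vote.COMMIT if commits > total_stores // 2 else Vote.ABORT


def resolve_conflicting_update(voters: Sequence[Callable[[], Vote]]) -> Vote:
    """Ask every data store for its vote, then apply the quorum rule.

    The result is the decision that every data store must then follow.
    """
    votes: list[Vote] = []
    for ask in voters:
        try:
            votes.append(ask())
        except ConnectionError:
            pass  # an unreachable store simply does not vote
    return quorum_decision(votes, total_stores=len(voters))


print(quorum_decision([Vote.COMMIT, Vote.COMMIT, Vote.ABORT], total_stores=3))
# Vote.COMMIT
```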
Considerations for Data Replication and Synchronization
Even if you can simplify your data synchronization requirements, you must still consider how you implement the synchronization mechanism. Consider the following points:
- Decide which type of synchronization you need:
- Master-Master replication involves a two-way synchronization process that is complex because the same data might be updated in more than one location. This can cause conflicts, and the synchronization must be able to resolve or handle this situation. It may be appropriate for one data store to have precedence and overwrite a conflicting change in other data stores. Other approaches are to implement a mechanism that can automatically resolve the conflict based on timings, or just record the changes and notify an administrator to resolve the conflict.
- Master-Subordinate replication is simpler because changes are made in the master database and are copied to all subordinates.
- Custom or programmatic synchronization can be used where the rules for handling conflicts are complex, where transformations are required on the data during synchronization, or where the standard Master-Master and Master-Subordinate approaches are not suitable. Changes are synchronized by reacting to events that indicate a data update, and applying this update to each data store while managing any update conflicts that might occur.
- Decide the frequency of synchronization. Most synchronization frameworks and services perform the synchronization operation on a fixed schedule. If the period between synchronizations is too long, you increase the risk of update conflicts and data in each replica may become stale. If the period is too short you may incur heavy network load, increased data transfer costs, and risk a new synchronization starting before the previous one has finished when there are a lot of updates. It may be possible to propagate changes across replicas as they occur by using background tasks that synchronize the data.
- Decide which data store will hold the master copy of the data where this is relevant, and the order in which updates are synchronized when there are more than two replicas of the data. Also consider how you will handle the situation where the master database is unavailable. It may be necessary to promote one replica to the master role in this case. For more information see the Leader Election pattern.
- Decide what data in each store you will synchronize. The replicas may contain only a subset of the data. This could be a subset of columns to hide sensitive or non-required data, a subset of the rows where the data is partitioned so that only appropriate rows are replicated, or it could be a combination of both of these approaches.
- Beware of creating a synchronization loop in a system that implements the Master-Master replication topology. Synchronization loops can arise if one synchronization action updates a data store and this update prompts another synchronization that tries to apply the update back to the original data store. Synchronization loops can also occur when there are more than two data stores, where a synchronization update travels from one data store to another and then back to the original one.
- Consider whether a cache is worthwhile to protect against transient or short-lived connectivity issues.
- Ensure that the transport mechanism used by the synchronization process protects the data as it travels over the network. Typically this means using connections encrypted with SSL or TLS. In extreme cases you may need to encrypt the data itself, but this is likely to require implementation of a custom synchronization solution.
- Consider how you will deal with failures during replication. This may require rerouting requests for data to another replica if the first cannot be accessed, or even rerouting requests to another deployment of the application.
- Make sure applications that use replicas of the data can handle situations that may arise when a replica is not fully consistent with the master copy of the data. For example, if a website accepts an order for goods marked as available but a subsequent update shows that no stock is available, the application must manage this—perhaps by sending an email to the customer and/or by placing the item on back order.
- Consider the cost and time implications of the chosen approach. For example, updating all or part of a data store through replication is likely to take longer and involve more bandwidth than updating a single entity.
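Master-Master replication offers three ways to handle conflicts: one data store has precedence, the conflict is resolved automatically based on timings, or the conflict is recorded for an administrator. You can express these options as a small policy object. The following Python sketch is illustrative only. The `Change` and `ConflictResolver` names and the policy strings are hypothetical. Resolution based on timings assumes that the clocks of the data stores are comparable, which is not guaranteed in a distributed system.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass(frozen=True)
class Change:
    key: str
    value: Any
    store: str             # the data store where the change was made
    updated_at: datetime   # when it was made (assumes comparable clocks)


@dataclass
class ConflictResolver:
    """Chooses between two conflicting changes to the same item.

    policy is "master_wins", "last_update_wins", or "manual".
    """
    policy: str
    master_store: str = ""
    pending_review: list[tuple[Change, Change]] = field(default_factory=list)

    def resolve(self, a: Change, b: Change) -> Optional[Change]:
        if self.policy == "master_wins":
            # One data store has precedence and overwrites the other change.
            if a.store == self.master_store:
                return a
            if b.store == self.master_store:
                return b
            # Neither change came from the master: fall through to manual review.
        elif self.policy == "last_update_wins":
            # Resolve automatically based on timings; a tie keeps change `a`.
            return a if a.updated_at >= b.updated_at else b
        elif self.policy != "manual":
            raise ValueError(f"unknown policy: {self.policy}")
        # Record both changes and leave the decision to an administrator.
        self.pending_review.append((a, b))
        return None


cloud = Change("customer-7", "new address", "cloud", datetime(2024, 5, 1, 9, 5))
onprem = Change("customer-7", "old address", "on-premises", datetime(2024, 5, 1, 9, 0))
print(ConflictResolver("master_wins", master_store="on-premises").resolve(cloud, onprem).value)
# old address
print(ConflictResolver("last_update_wins").resolve(cloud, onprem).value)
# new address
```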
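To synchronize only a subset of the data, you apply a column projection, a row filter, or both before the data is sent to a replica. This minimal Python sketch uses in-memory dictionaries as rows. The `select_for_replica` function and the sample data are hypothetical. With a ready-built service such as SQL Data Sync, you would typically configure the subset instead of writing this code.

```python
from __future__ import annotations

from typing import Any, Callable, Iterable, Mapping

Row = Mapping[str, Any]


def select_for_replica(
    rows: Iterable[Row],
    columns: set[str],
    row_filter: Callable[[Row], bool],
) -> list[dict[str, Any]]:
    """Return only the rows and columns that a given replica should hold.

    `columns` drops sensitive or unneeded columns; `row_filter` keeps only
    the rows that belong to this replica's partition.
    """
    return [
        {name: value for name, value in row.items() if name in columns}
        for row in rows
        if row_filter(row)
    ]


customers = [
    {"id": 1, "name": "Contoso", "region": "EU", "card_number": "4111-xxxx"},
    {"id": 2, "name": "Fabrikam", "region": "US", "card_number": "5500-xxxx"},
]
eu_replica = select_for_replica(customers, {"id", "name", "region"},
                                lambda row: row["region"] == "EU")
print(eu_replica)  # [{'id': 1, 'name': 'Contoso', 'region': 'EU'}]
```

To replicate only a subset of the columns, pass a row filter that accepts every row. To replicate only a subset of the rows, pass every column name.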
For more information about patterns for synchronizing data see Appendix A - Replicating, Distributing, and Synchronizing Data in the p&p guide Building Hybrid Applications in the Cloud on Microsoft Azure. The topic Data Movement Patterns on MSDN contains definitions of the common patterns for replicating and synchronizing data.
How you implement data synchronization depends to a great extent on the nature of the data and the type of the data stores. The options include:
Use a ready-built synchronization service or framework. In Azure hosted and hybrid applications you might choose to use:
The Azure SQL Data Sync service. This service can be used to synchronize on-premises and cloud-hosted SQL Server instances, and Azure SQL Database instances. Although there are a few minor limitations, it is a powerful service that provides options to select subsets of the data and specify the synchronization intervals. It can also perform one-way replication if required.
For more information about using SQL Data Sync see SQL Data Sync on MSDN and Deploying the Orders Application and Data in the Cloud in the p&p guide Building Hybrid Applications in the Cloud on Microsoft Azure. Note that, at the time this guide was written, the SQL Data Sync service was a preview release and provided no SLA.
The Microsoft Sync Framework. This is a more flexible mechanism that enables you to implement custom synchronization plans, and capture events so that you can specify the actions to take when, for example, an update conflict occurs. It provides a solution that enables collaboration and offline access for applications, services, and devices with support for any data type, any data store, any transfer protocol, and any network topology.
For more information see Microsoft Sync Framework Developer Center on MSDN.
Use a synchronization technology built into the data store itself. Some examples are:
- Azure storage geo-replication. By default, Azure storage keeps three copies of your data in the primary datacenter and, unless you turn geo-replication off, also replicates the data asynchronously to a secondary datacenter to protect against the failure of one datacenter. This service can provide a read-only replica of the data.
- SQL Server database replication. You can use the built-in replication features of SQL Server to synchronize data between on-premises installations of SQL Server and deployments of SQL Server in Azure Virtual Machines, and between multiple deployments of SQL Server in Azure Virtual Machines.
Implement a custom synchronization mechanism. For example, use a messaging technology to pass updates between deployments of the application, and include code in each application to apply these updates intelligently to the local data store and handle any update conflicts. Consider the following when building a custom mechanism:
- Ready-built synchronization services may have a minimum interval for synchronization, whereas a custom implementation could offer near-immediate synchronization.
- Ready-built synchronization services may not allow you to specify the order in which data stores are synchronized. A custom implementation may allow you to perform updates in a specific order between several data stores, or perform complex transformations or other operations on the data that are not supported in ready-built frameworks and services.
- When you design a custom implementation you should consider two separate aspects: how to communicate updates between separate locations, and how to apply updates to the data stores. Typically, you will need to create an application or component that runs in each location where updates will be applied to local data stores. This application or component will accept instructions that it uses to update the local data store, and then pass the updates to other data stores that contain copies of the data. Within the application or component you can implement logic to manage conflicting updates. However, by passing updates between data stores immediately, rather than on a fixed schedule as is the case with most ready-built synchronization services, you minimize the chances of conflicts arising.
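The following Python sketch shows both aspects of a custom implementation together. It is a minimal, in-memory illustration. A `deque` stands in for the messaging service that communicates updates between locations. Each hypothetical `SyncNode` applies incoming updates to its local store and forwards them to its peers. Each update carries a unique ID, so a node drops any update it has already seen, which prevents the synchronization loops described earlier. Conflicts are resolved by keeping the update with the later timestamp, with ties broken on the origin name. This last-update-wins rule is only one possible policy, and it assumes that timestamps are comparable across locations.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field
from typing import Any


@dataclass(frozen=True)
class Update:
    change_id: str   # unique across all locations: "<origin>:<counter>"
    origin: str      # the location where the change was first made
    key: str
    value: Any
    timestamp: int   # time at the origin; assumed comparable across locations


@dataclass(eq=False)
class SyncNode:
    """Runs at one location: applies updates locally, then forwards them."""
    name: str
    store: dict[str, Update] = field(default_factory=dict)
    peers: list[SyncNode] = field(default_factory=list, repr=False)
    seen: set[str] = field(default_factory=set)
    counter: int = 0

    def local_write(self, key: str, value: Any, timestamp: int, bus: deque) -> None:
        self.counter += 1
        update = Update(f"{self.name}:{self.counter}", self.name, key, value, timestamp)
        self.receive(update, bus)

    def receive(self, update: Update, bus: deque) -> None:
        # Loop prevention: an update that this node has already seen is
        # dropped instead of being applied and forwarded again.
        if update.change_id in self.seen:
            return
        self.seen.add(update.change_id)
        current = self.store.get(update.key)
        # Conflict handling: keep the later update, breaking ties on the origin
        # name so that every node reaches the same decision.
        if current is None or (update.timestamp, update.origin) > (current.timestamp, current.origin):
            self.store[update.key] = update
        # Forward every new update, whether or not it won here, so it reaches all nodes.
        for peer in self.peers:
            bus.append((peer, update))


def deliver_all(bus: deque) -> None:
    """Stand-in for a messaging service: deliver queued messages in order."""
    while bus:
        peer, update = bus.popleft()
        peer.receive(update, bus)


bus: deque = deque()
a, b, c = SyncNode("on-premises"), SyncNode("cloud-east"), SyncNode("cloud-west")
for node in (a, b, c):
    node.peers = [p for p in (a, b, c) if p is not node]
a.local_write("sku-1", 10, timestamp=1, bus=bus)
c.local_write("sku-1", 12, timestamp=2, bus=bus)  # a conflicting update
deliver_all(bus)
print([n.store["sku-1"].value for n in (a, b, c)])  # [12, 12, 12]
```

Each update is passed on as soon as it is made rather than on a fixed schedule. This keeps the window in which conflicting updates can arise short, although not zero.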
Related Patterns and Guidance
The following patterns and guidance may also be relevant to your scenario when distributing and synchronizing data across different locations:
- Caching Guidance. This guidance describes how caching can be used to improve the performance and scalability of a distributed application running in the cloud.
- Data Consistency Primer. This primer summarizes the issues surrounding consistency over distributed data, and provides guidance for handling these concerns.
- Data Partitioning Guidance. This guidance describes how to partition data in the cloud to improve scalability, reduce contention, and optimize performance.
- The guide Data Access for Highly-Scalable Solutions: Using SQL, NoSQL, and Polyglot Persistence on MSDN.
- Appendix A - Replicating, Distributing, and Synchronizing Data from the guide Building Hybrid Applications in the Cloud on Microsoft Azure on MSDN.
- The topic Data Movement Patterns on MSDN.
- The topic SQL Data Sync on MSDN.
- Deploying the Orders Application and Data in the Cloud from the guide Building Hybrid Applications in the Cloud on Microsoft Azure.
- The Microsoft Sync Framework Developer Center on MSDN.