Sync data across multiple cloud and on-premises databases with SQL Data Sync
SQL Data Sync is a service built on Azure SQL Database that lets you synchronize the data you select bi-directionally across multiple SQL databases and SQL Server instances.
Architecture of SQL Data Sync
Data Sync is based around the concept of a Sync Group. A Sync Group is a group of databases that you want to synchronize.
Data Sync uses a hub and spoke topology to synchronize data. You define one of the databases in the sync group as the Hub Database. The rest of the databases are member databases. Sync occurs only between the Hub and individual members.
- The Hub Database must be an Azure SQL Database.
- The member databases can be either SQL Databases, on-premises SQL Server databases, or SQL Server instances on Azure virtual machines.
- The Sync Database contains the metadata and log for Data Sync. The Sync Database has to be an Azure SQL Database located in the same region as the Hub Database. The Sync Database is customer created and customer owned.
If you're using an on premises database as a member database, you have to install and configure a local sync agent.
A Sync Group has the following properties:
The Sync Schema describes which data is being synchronized.
The Sync Direction can be bi-directional or can flow in only one direction. That is, the Sync Direction can be Hub to Member, or Member to Hub, or both.
The Sync Interval describes how often synchronization occurs.
The Conflict Resolution Policy is a group level policy, which can be Hub wins or Member wins.
When to use Data Sync
Data Sync is useful in cases where data needs to be kept up-to-date across several Azure SQL Databases or SQL Server databases. Here are the main use cases for Data Sync:
Hybrid Data Synchronization: With Data Sync, you can keep data synchronized between your on-premises databases and Azure SQL Databases to enable hybrid applications. This capability may appeal to customers who are considering moving to the cloud and would like to put some of their application in Azure.
Distributed Applications: In many cases, it's beneficial to separate different workloads across different databases. For example, if you have a large production database, but you also need to run a reporting or analytics workload on this data, it's helpful to have a second database for this additional workload. This approach minimizes the performance impact on your production workload. You can use Data Sync to keep these two databases synchronized.
Globally Distributed Applications: Many businesses span several regions and even several countries. To minimize network latency, it's best to have your data in a region close to you. With Data Sync, you can easily keep databases in regions around the world synchronized.
Data Sync is not the preferred solution for the following scenarios:
|Scenario||Some recommended solutions|
|Disaster Recovery||Azure geo-redundant backups|
|Read Scale||Use read-only replicas to load balance read-only query workloads (preview)|
|ETL (OLTP to OLAP)||Azure Data Factory or SQL Server Integration Services|
|Migration from on-premises SQL Server to Azure SQL Database||Azure Database Migration Service|
How does Data Sync work?
Tracking data changes: Data Sync tracks changes using insert, update, and delete triggers. The changes are recorded in a side table in the user database.
Synchronizing data: Data Sync is designed in a Hub and Spoke model. The Hub syncs with each member individually. Changes from the Hub are downloaded to the member and then changes from the member are uploaded to the Hub.
Resolving conflicts: Data Sync provides two options for conflict resolution, Hub wins or Member wins.
- If you select Hub wins, the changes in the hub always overwrite changes in the member.
- If you select Member wins, the changes in the member overwrite changes in the hub. If there's more than one member, the final value depends on which member syncs first.
Get started with SQL Data Sync
Set up Data Sync in the Azure portal
Set up Data Sync with PowerShell
Review the best practices for Data Sync
Did something go wrong?
Consistency and performance
Since Data Sync is trigger-based, transactional consistency is not guaranteed. Microsoft guarantees that all changes are made eventually and that Data Sync does not cause data loss.
Data Sync uses insert, update, and delete triggers to track changes. It creates side tables in the user database for change tracking. These change tracking activities have an impact on your database workload. Assess your service tier and upgrade if needed.
Provisioning and deprovisioning during sync group creation, update, and deletion may also impact the database performance.
Requirements and limitations
Each table must have a primary key. Don't change the value of the primary key in any row. If you have to change a primary key value, delete the row and recreate it with the new primary key value.
Snapshot isolation must be enabled. For more info, see Snapshot Isolation in SQL Server.
A table cannot have an identity column that is not the primary key.
A primary key cannot have the datetime data type.
The names of objects (databases, tables, and columns) cannot contain the printable characters period (.), left square bracket ([), or right square bracket (]).
Azure Active Directory authentication is not supported.
Unsupported data types
XMLSchemaCollection (XML supported)
Cursor, Timestamp, Hierarchyid
Unsupported column types
Data Sync can't sync read-only or system-generated columns. For example:
System-generated columns for temporal tables.
Limitations on service and database dimensions
|Maximum number of sync groups any database can belong to.||5|
|Maximum number of endpoints in a single sync group||30||Create multiple sync groups|
|Maximum number of on-premises endpoints in a single sync group.||5||Create multiple sync groups|
|Database, table, schema, and column names||50 characters per name|
|Tables in a sync group||500||Create multiple sync groups|
|Columns in a table in a sync group||1000|
|Data row size on a table||24 Mb|
|Minimum sync interval||5 Minutes|
FAQ about SQL Data Sync
How much does the SQL Data Sync service cost?
There is no charge for the SQL Data Sync service itself. However, you still accrue data transfer charges for data movement in and out of your SQL Database instance. For more info, see SQL Database pricing.
What regions support Data Sync?
SQL Data Sync is available in all public cloud regions.
Is a SQL Database account required?
Yes. You must have a SQL Database account to host the Hub Database.
Can I use Data Sync to sync between SQL Server on-premises databases only?
Not directly. You can sync between SQL Server on-premises databases indirectly, however, by creating a Hub database in Azure, and then adding the on-premises databases to the sync group.
Can I use Data Sync to sync between SQL Databases that belong to different subscriptions?
Yes. You can sync between SQL Databases that belong to resource groups owned by different subscriptions.
- If the subscriptions belong to the same tenant, and you have permission to all subscriptions, you can configure the sync group in the Azure portal.
- Otherwise, you have to use PowerShell to add the sync members that belong to different subscriptions.
Can I use Data Sync to seed data from my production database to an empty database, and then sync them?
Yes. Create the schema manually in the new database by scripting it from the original. After you create the schema, add the tables to a sync group to copy the data and keep it synced.
Should I use SQL Data Sync to back up and restore my databases?
It is not recommended to use SQL Data Sync to create a backup of your data. You cannot back up and restore to a specific point in time because SQL Data Sync synchronizations are not versioned. Furthermore, SQL Data Sync does not back up other SQL objects, such as stored procedures, and does not do the equivalent of a restore operation quickly.
For one recommended backup technique, see Copy an Azure SQL database.
Can Data Sync sync encrypted tables and columns?
If a database uses Always Encrypted, you can sync only the tables and columns that are not encrypted. You can't sync the encrypted columns, because Data Sync can't decrypt the data.
If a column uses Column-Level Encryption (CLE), you can sync the column, as long as the row size is less than the maximum size of 24 Mb. Data Sync treats the column encrypted by key (CLE) as normal binary data. To decrypt the data on other sync members, you need to have the same certificate.
Is collation supported in SQL Data Sync?
Yes. SQL Data Sync supports collation in the following scenarios:
If the selected sync schema tables are not already in your hub or member databases, then when you deploy the sync group, the service automatically creates the corresponding tables and columns with the collation settings selected in the empty destination databases.
If the tables to be synced already exist in both your hub and member databases, SQL Data Sync requires that the primary key columns have the same collation between hub and member databases to successfully deploy the sync group. There are no collation restrictions on columns other than the primary key columns.
Is federation supported in SQL Data Sync?
Federation Root Database can be used in the SQL Data Sync Service without any limitation. You cannot add the Federated Database endpoint to the current version of SQL Data Sync.
Update the schema of a synced database
Do you have to update the schema of a database in a sync group? Schema changes are not automatically replicated. For some solutions, see the following articles:
Monitor and troubleshoot
Is SQL Data Sync performing as expected? To monitor activity and troubleshoot issues, see the following articles:
Learn more about Azure SQL Database
For more info about SQL Database, see the following articles: