Introduction to Microsoft Sync Framework

Microsoft Corporation
October 2009

Introduction

Microsoft Sync Framework is a comprehensive synchronization platform that enables collaboration and offline access for applications, services and devices. Developers can build synchronization ecosystems that integrate any application with any data from any store, using any protocol over any network. Sync Framework features technologies and tools that enable roaming, sharing, and taking data offline.

A key aspect of Sync Framework is the ability to create custom providers. Providers enable any data source to participate in the Sync Framework synchronization process, allowing peer-to-peer synchronization to occur.

Sync Framework includes a number of providers that support many common data sources. Using them is not required, but developers are encouraged to use them wherever possible to minimize development effort. The following providers are included:

  • Database synchronization providers: Synchronization for ADO.NET-enabled data sources
  • File synchronization provider: Synchronization for files and folders
  • Web synchronization components: Synchronization for FeedSync feeds such as RSS and Atom feeds

Developers can use any of the out-of-the-box providers, or create custom providers, to exchange information between devices and applications.

The goal of this document is to help you understand how Microsoft Sync Framework enables synchronization. In this document we outline the key concepts that form the basis for creating a provider.

Participants

Before discussing the specific components of a provider, we first need to understand the different types of participants that can be supported. A participant is a location from which information in the data source is retrieved. A participant could be anything from a web service to a laptop or a USB thumb drive.

Participant Types

The way a provider integrates synchronization varies with the capabilities of the device. At the very least, we assume that the device can programmatically return information when requested. Beyond that, we need to determine whether the device can:

  1. Allow information to be stored and manipulated, either on the device itself or within the existing data store; and
  2. Allow applications (in our case, a synchronization provider) to be executed directly from the device.

It is important to distinguish the types of participants in the synchronization ecosystem, because the type tells us whether a participant can store the state information required by the provider and whether the provider can be executed directly on the device. The participant model is meant to be generic, so a full participant could be configured to act as either a partial or a simple participant.

Full Participants

Full participants are devices that allow developers to create applications and new data stores directly on the device. A laptop and a smartphone are examples of full participants, because new applications can be executed directly on the device and new data stores can be created to persist information if required.

Partial Participants

Partial participants are devices that can store data either in the existing data store or in another data store on the device. These devices, however, cannot launch executables directly. Examples of partial participants include thumb drives and SD cards. These devices act like a hard drive where information can be created, updated or deleted, but they do not typically provide an interface that allows applications to be executed on them directly.

Simple Participants

Simple participants are devices that can only provide information when requested. These devices cannot store or manipulate new data, and they cannot support the creation of new applications. RSS feeds and web services provided by an external organization such as Amazon or eBay are examples of simple participants. These organizations may let you call their web services and get results back. However, they do not let you create your own data stores, and they do not let you run your own applications on their web servers.

Bringing it All Together

Ultimately, the goal of Microsoft Sync Framework is to allow any data source to be integrated regardless of participant type. For this reason, partial participants can synchronize information with full participants, and full participants can synchronize information with simple participants. At the very least, there needs to be one full participant that can store information and launch the synchronization process.

Microsoft Sync Framework

Core Components

Before implementing synchronization using Sync Framework, we first need to understand the key components of a provider. The following diagram shows how a provider built using Sync Framework communicates with a data source and retrieves state information from a metadata store. Providers in turn communicate with other providers through a synchronization session.

Data Source

The data source is the location where all information that needs to be synchronized is stored. A data source could be a relational database, a file, a web service or even a custom data source included within a line-of-business application. As long as you can access the data programmatically, it can participate in synchronization.

Metadata

A fundamental capability of a provider is storing state and change information about the data store and the objects within it. This metadata can be stored in a file, within a database or within the data source being synchronized. As an optional convenience, Sync Framework offers a complete implementation of a metadata store built on a lightweight database that runs in your process. The metadata for a data store can be broken down into five key components:

  • Versions
  • Knowledge
  • Tick count
  • Replica ID
  • Tombstones

For each item that is being synchronized, a small amount of information is stored that describes where and when the item was changed. This metadata is composed of two versions: a creation version and an update version. A version is composed of two components: a tick count assigned by the data store and the replica ID for the data store. As items are updated, the current tick count is applied to that item and the tick count is incremented by the data store. The replica ID is a unique value that identifies a particular data store. The creation version is the same as the update version when the item is created. Subsequent updates to the item modify the update version.
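The versioning rules above can be illustrated with a short Python sketch. It is not Sync Framework's API (Sync Framework is a .NET library). The names `Version`, `ItemMetadata`, `Replica`, `create_item` and `update_item` are hypothetical, and the sketch assumes the tick count starts at 1. The usage at the end reproduces the Replica A version table from the synchronization example later in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    """A version: a tick count plus the ID of the replica that made the change."""
    tick: int
    replica_id: str


@dataclass
class ItemMetadata:
    creation: Version
    update: Version


class Replica:
    """Hypothetical replica that stamps each item change with a version."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        self.tick_count = 1                      # assumption: ticks start at 1
        self.items: dict[str, ItemMetadata] = {}

    def _stamp(self) -> Version:
        # Apply the current tick count to the change, then increment it.
        version = Version(self.tick_count, self.replica_id)
        self.tick_count += 1
        return version

    def create_item(self, item_id: str) -> None:
        version = self._stamp()
        # A new item's creation and update versions are identical.
        self.items[item_id] = ItemMetadata(creation=version, update=version)

    def update_item(self, item_id: str) -> None:
        # Only the update version changes; earlier update versions are not kept.
        self.items[item_id].update = self._stamp()


a = Replica("A")
a.create_item("I1")   # I1: update (1, A), creation (1, A)
a.create_item("I2")   # I2: update (2, A), creation (2, A)
a.update_item("I2")   # I2: update (3, A), creation (2, A)
a.create_item("I3")   # I3: update (4, A), creation (4, A)
a.update_item("I1")   # I1: update (5, A), creation (1, A)
```

Because each replica stamps changes with its own ID, two replicas never hand out the same version. This is why the pair (tick count, replica ID) is enough to identify a change.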

The two primary ways that versioning can be implemented are:

  1. Inline tracking: In this method, change-tracking information for an item is updated as the change is made. In the case of a database, for example, a trigger may be used to update a change-tracking table immediately after a row is updated.
  2. Asynchronous tracking: In this method, an external process runs and scans for changes. Any updates found are added to the version information. This process may be part of a scheduled job, or it may be executed prior to synchronization. This approach is typically used when there is no internal mechanism to update version information automatically when items are updated (for example, when there is no way to inject logic into the update pipeline). A common way to check for changes is to store the state of an item and compare it to its current state. For example, the process might check whether the last-write time or file size has changed since the state was last recorded.
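Asynchronous tracking of a folder can be sketched with the Python standard library. The sketch keeps a snapshot of each file's last-write time and size, then compares it with the current state on the next scan. The function name `scan_for_changes` and the snapshot format are assumptions made for illustration.

```python
import os

Snapshot = dict[str, tuple[int, int]]   # file name -> (last-write time in ns, size in bytes)


def scan_for_changes(folder: str, previous: Snapshot) -> tuple[list[str], list[str], Snapshot]:
    """Compare every file in `folder` against the snapshot from the last scan.

    Returns (changed, deleted, snapshot): new or modified files, files that no
    longer exist, and the snapshot to store for the next scan.
    """
    snapshot: Snapshot = {}
    changed: list[str] = []
    with os.scandir(folder) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            st = entry.stat()
            state = (st.st_mtime_ns, st.st_size)
            snapshot[entry.name] = state
            # A new file has no previous state; a modified one has a different state.
            if previous.get(entry.name) != state:
                changed.append(entry.name)
    deleted = [name for name in previous if name not in snapshot]
    return sorted(changed), sorted(deleted), snapshot
```

A provider using this approach would then assign a new update version to each changed file and create a tombstone for each deleted one, before synchronization begins.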

All change tracking must occur at least at the level of items; in other words, every item must have an independent version. In the case of a database, an item might be an entire row within a table, or alternatively a single column within a row. In the case of file synchronization, an item will likely be a file. More granular tracking is highly desirable in some scenarios because it reduces the potential for data conflicts (two users updating the same item on different replicas). The downside is that it increases the amount of change-tracking information stored.

Another key concept is knowledge. Knowledge is a compact representation of the changes that a replica is aware of. As version information is updated, so is the knowledge for the data store. Providers use replica knowledge to:

  1. Enumerate changes (determine which changes another replica is not aware of).
  2. Detect conflicts (determine which operations were made without knowledge of each other).
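Knowledge can be sketched as the highest tick count known from each replica. For example, A5, B4 becomes `{"A": 5, "B": 4}`, which is enough to follow the examples in this document. This is a simplification: Sync Framework's actual knowledge representation is richer, and can for instance record exceptions for individual items. The names below are hypothetical. `enumerate_changes` covers the first use listed above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    tick: int
    replica_id: str


@dataclass
class Knowledge:
    """Simplified knowledge: replica ID -> highest tick known (A5, B4 -> {"A": 5, "B": 4})."""
    ticks: dict[str, int] = field(default_factory=dict)

    def contains(self, v: Version) -> bool:
        # A version is known if its tick is at or below the tick known for its replica.
        return v.tick <= self.ticks.get(v.replica_id, 0)

    def merge(self, other: "Knowledge") -> None:
        # Learn everything the other replica knows.
        for replica_id, tick in other.ticks.items():
            self.ticks[replica_id] = max(self.ticks.get(replica_id, 0), tick)


def enumerate_changes(update_versions: dict[str, Version], dest: Knowledge) -> dict[str, Version]:
    """Return the item versions that the destination's knowledge does not contain."""
    return {item: v for item, v in update_versions.items() if not dest.contains(v)}


source_versions = {"I1": Version(5, "A"), "I2": Version(3, "A")}
changes = enumerate_changes(source_versions, Knowledge({"A": 3, "B": 4}))
# Only I1 is selected: the destination already knows (3, A).
```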

Each replica must also maintain tombstone information for each item that is deleted. This is important because, if a deleted item is simply missing when synchronization is executed, the provider has no way of telling that the item was deleted and cannot propagate the change to other providers. A tombstone must contain the following information:

  • Global ID.
  • Deletion version.
  • Creation version.

Because the number of tombstones grows over time, it may be prudent to create a process that cleans up this store after a period of time to save space. Sync Framework provides support for managing tombstone information.
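A tombstone store with the three required fields and age-based cleanup might look like the following Python sketch. The `deleted_at` timestamp, the `TombstoneStore` class and the retention parameter are assumptions; the document does not say how old a tombstone must be before it is removed. Whatever the retention period, removing a tombstone before every replica has synchronized risks that replica never learning about the deletion.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Version:
    tick: int
    replica_id: str


@dataclass(frozen=True)
class Tombstone:
    global_id: str
    deletion: Version
    creation: Version
    deleted_at: float   # wall-clock time; an assumption used only for cleanup


class TombstoneStore:
    """Hypothetical tombstone store with age-based cleanup."""

    def __init__(self) -> None:
        self.tombstones: dict[str, Tombstone] = {}

    def record_delete(self, global_id: str, deletion: Version, creation: Version,
                      now: Optional[float] = None) -> None:
        when = time.time() if now is None else now
        self.tombstones[global_id] = Tombstone(global_id, deletion, creation, when)

    def cleanup(self, max_age_seconds: float, now: Optional[float] = None) -> int:
        """Remove tombstones older than max_age_seconds; return how many were removed."""
        cutoff = (time.time() if now is None else now) - max_age_seconds
        stale = [gid for gid, t in self.tombstones.items() if t.deleted_at < cutoff]
        for gid in stale:
            del self.tombstones[gid]
        return len(stale)
```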

Synchronization Flow

The replica where synchronization is initiated is called the source, and the replica it connects to is called the destination. The following sections outline the flow of synchronization described in the following diagram. For bidirectional synchronization, this process is executed twice, with the source and destination swapped on the second pass.

Synchronization Session Initiated with Destination

During this phase, the source provider initiates communication to the destination provider. The link between the two providers is called a synchronization session.

Destination Prepares and Sends Knowledge

As discussed previously, each replica stores its own knowledge. The knowledge stored in the destination is passed to the source.

Destination Knowledge used to Determine Changes to be sent

On the source side, the knowledge that was just received is compared to the local item versions to determine which items the destination does not know about. Note that what is sent at this stage is not the actual items but their versions, a summary of where the last change to each item was made.

Change Versions and Source Knowledge sent to Destination

Once the source has prepared the list of required change versions, it sends them, together with its knowledge, to the destination.

Local Version Retrieved for Change Items and Compared against Source Version and Knowledge

The destination uses the versions to prepare a list of items that the source needs to send. The destination also uses this information to detect whether there are any conflicts.

Conflicts are Detected and Resolved or Deferred

A conflict is detected when the change version on one replica is not contained in the knowledge of the other. Fundamentally, a conflict occurs when the same item is changed on two replicas between synchronization sessions.

Specifically, a conflict occurs when the source knowledge does not contain the destination's version of an item (it is implied that the destination knowledge does not contain the source version that was sent).

If the source version is contained in the destination's knowledge, the change is considered obsolete.
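The rules above can be condensed into a small decision function. This Python sketch reuses the simplified knowledge from earlier, a dictionary of the highest tick known per replica. The names `classify_change` and `Outcome` are hypothetical. The usage applies the function to item I2 from the conflict example at the end of this document.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


@dataclass(frozen=True)
class Version:
    tick: int
    replica_id: str


Knowledge = dict[str, int]   # simplified: replica ID -> highest known tick, e.g. {"A": 6, "B": 4}


def knows(knowledge: Knowledge, v: Version) -> bool:
    return v.tick <= knowledge.get(v.replica_id, 0)


class Outcome(Enum):
    OBSOLETE = "obsolete"   # the destination already knows this change
    CONFLICT = "conflict"   # the changes were made without knowledge of each other
    APPLY = "apply"         # safe to request and apply the item


def classify_change(source_version: Version, source_knowledge: Knowledge,
                    dest_version: Optional[Version], dest_knowledge: Knowledge) -> Outcome:
    """Decide what the destination does with one incoming change version."""
    if knows(dest_knowledge, source_version):
        return Outcome.OBSOLETE
    # dest_version is None when the destination has no copy of the item yet.
    if dest_version is not None and not knows(source_knowledge, dest_version):
        return Outcome.CONFLICT
    return Outcome.APPLY


# Item I2: source (6, A) with knowledge A6, B4; destination (5, B) with knowledge A5, B5.
outcome = classify_change(Version(6, "A"), {"A": 6, "B": 4}, Version(5, "B"), {"A": 5, "B": 5})
# outcome is Outcome.CONFLICT
```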

Replicas are free to implement a variety of policies for the resolution of items in conflict across the synchronization community. Below are some examples of commonly used resolution policies:

  • Source Wins: The change made by the source replica always wins in the event of a conflict.
  • Destination Wins: The change made by the destination replica always wins.
  • Specified Replica ID Always Wins: No matter who changes an item, the replica with the designated ID always wins.
  • Last-Writer Wins: Based on the assumption that all replicas are trusted to make changes and wall clocks are synchronized, allow the last writer to win.
  • Merge: In the event of two duplicate items in conflict, merge the information from one into the other.
  • Log Conflict: Choose to simply log or defer the conflict.
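Most of these policies are small functions of the two conflicting changes. The Python sketch below shows five of them. The specified-replica policy is left out because it needs extra information about which replica made each change. The `Change` type and every function name are hypothetical. The merge shown keeps fields from both sides and prefers the source's values, which is only one possible merge strategy.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Change:
    """One side of a conflict: the item's fields plus when they were written."""
    modified_at: float   # wall-clock time, only meaningful if clocks are synchronized
    fields: dict


def source_wins(source: Change, dest: Change) -> Change:
    return source


def destination_wins(source: Change, dest: Change) -> Change:
    return dest


def last_writer_wins(source: Change, dest: Change) -> Change:
    # Ties go to the source; that tie-break is an arbitrary choice.
    return source if source.modified_at >= dest.modified_at else dest


def merge(source: Change, dest: Change) -> Change:
    # One possible merge: keep fields from both sides, preferring the source's values.
    return Change(max(source.modified_at, dest.modified_at), {**dest.fields, **source.fields})


conflict_log: list[tuple[Change, Change]] = []


def log_conflict(source: Change, dest: Change) -> Optional[Change]:
    # Record the conflict and return no winner, deferring the decision.
    conflict_log.append((source, dest))
    return None


s = Change(10.0, {"title": "Q3 report", "pages": 12})
d = Change(20.0, {"title": "Q3 report (draft)", "owner": "sam"})
winner = last_writer_wins(s, d)   # d, because it was written later
```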

Destination Requests Item Data from Source

During this phase the destination has determined which items in the source need to be retrieved and communicates this request to the source.

Source Prepares and Sends Item Data

The source takes the item data request and prepares the actual data to be transferred to the destination. If the item being tracked is a row in a database, that row will be sent. If the item is a file in a folder then the file will be transferred.

Items are applied at Destination

The items received are applied at the destination. If errors occur during this process, such as a network failure, the affected items are tagged as exceptions and corrected during the next synchronization. The knowledge received from the source is then added to the destination knowledge.
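The apply step can be sketched as follows. This is a hypothetical Python outline, not Sync Framework code: `apply_changes` and the `write` callback are assumed names. "Tagging as exceptions" is modeled as a set of item IDs that the destination's knowledge treats as unknown. As a result, those items are enumerated and sent again on the next synchronization.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Version:
    tick: int
    replica_id: str


@dataclass
class Knowledge:
    ticks: dict[str, int] = field(default_factory=dict)   # replica ID -> highest known tick
    exceptions: set[str] = field(default_factory=set)     # items that failed to apply

    def contains(self, item_id: str, v: Version) -> bool:
        # An item tagged as an exception is treated as unknown, so it is sent again.
        if item_id in self.exceptions:
            return False
        return v.tick <= self.ticks.get(v.replica_id, 0)


def apply_changes(incoming: dict[str, tuple[Version, bytes]], source_knowledge: Knowledge,
                  dest_versions: dict[str, Version], dest_knowledge: Knowledge,
                  write: Callable[[str, bytes], None]) -> None:
    """Apply item data at the destination, then add the source's knowledge."""
    for item_id, (version, data) in incoming.items():
        try:
            write(item_id, data)                      # e.g. copy the file into the folder
        except OSError:
            dest_knowledge.exceptions.add(item_id)    # corrected on the next synchronization
            continue
        dest_versions[item_id] = version
        dest_knowledge.exceptions.discard(item_id)
    # Exceptions recorded in the source's own knowledge are ignored here for brevity.
    for replica_id, tick in source_knowledge.ticks.items():
        dest_knowledge.ticks[replica_id] = max(dest_knowledge.ticks.get(replica_id, 0), tick)
```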

Synchronization Example

Using the synchronization flow described in the previous section, we will walk through an example of how Sync Framework enumerates changes and ultimately applies item data.

In this example there are two replicas: Replica A and Replica B. Replica A initiates synchronization with Replica B (meaning Replica A is the source and Replica B is the destination).

Suppose we want to synchronize files between these two replicas. Each file in a folder is an item to be tracked and is denoted In (for example, I1, I2, I3…). When a new file (I1) is created, the metadata associated with that file also needs to be updated as follows:


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I1      1                    A                    1                      A

If that file was updated again, the version table might look as follows:


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I1      5                    A                    1                      A

There are likely multiple files being tracked, so let's bring in more items. As you can see, the version information grows as more items are created. Sync Framework does not require previous update versions to be stored; it only needs to know the most recent update version.


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I2      3                    A                    2                      A
I3      4                    A                    4                      A
I1      5                    A                    1                      A

Given the current state of the items on this replica, the knowledge of Replica A is represented as:

Replica A Knowledge = A5

A is the replica ID, and 5 is the tick count up to which this replica knows about changes.

Replica B may also contain a number of files. Its version table looks as follows:

Replica B


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I104    2                    B                    1                      B
I105    4                    B                    3                      B

The current knowledge for Replica B is:

Replica B Knowledge = B4

At this point, we initiate synchronization between the two replicas. Replica A is the source (the one initiating synchronization) and Replica B is the destination.

During synchronization, the destination sends its knowledge to the source. As mentioned earlier, the knowledge for the two replicas looks as follows:

Replica A Knowledge = A5

Replica B Knowledge = B4

The source (Replica A) receives this knowledge and uses it to determine which versions to send to the destination. Since Replica B is not aware of any of the changes made on Replica A, the versions for all items in Replica A are sent. In this case, that includes the following versions:


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I2      3                    A                    2                      A
I3      4                    A                    4                      A
I1      5                    A                    1                      A

The destination receives these versions and enumerates through them to determine which items are to be requested from the source. It also uses this information to determine if there are any conflicts (for example, the same file was updated on both replicas).

Once that is complete, the destination asks the source to send the items it is not aware of. In this case, Replica A sends the files associated with I1, I2 and I3.

The destination receives these files and adds them to its folder.

When this synchronization session is complete, the process is executed one more time, with the source and destination roles swapped. This allows Replica A to receive any files that were created or changed on Replica B (I104 and I105).

At the end of synchronization, both replicas have the following updated knowledge:

Replica A Knowledge = A5, B4

Replica B Knowledge = A5, B4
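The whole example can be replayed with a compact in-memory sketch of a one-way session, run once in each direction. Everything here is hypothetical: `Replica`, `sync`, and the simplified per-replica knowledge. For brevity, the destination adds the source's knowledge only when no conflicts were left unresolved. This is a coarse stand-in for the exception handling described earlier.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Version:
    tick: int
    replica_id: str


@dataclass
class Replica:
    """Hypothetical in-memory replica: update versions, item data and knowledge."""
    replica_id: str
    versions: dict[str, Version] = field(default_factory=dict)
    data: dict[str, str] = field(default_factory=dict)
    knowledge: dict[str, int] = field(default_factory=dict)   # replica ID -> highest known tick

    def knows(self, v: Version) -> bool:
        return v.tick <= self.knowledge.get(v.replica_id, 0)


def sync(source: Replica, dest: Replica) -> list[str]:
    """One-way session from source to destination; returns the IDs of conflicting items."""
    # The destination sends its knowledge; the source selects the versions it lacks.
    changes = {i: v for i, v in source.versions.items() if not dest.knows(v)}
    # The destination compares each change with its local version.
    requested, conflicts = [], []
    for item_id, v in changes.items():
        local = dest.versions.get(item_id)
        if local is not None and not source.knows(local):
            conflicts.append(item_id)       # left for the application to resolve
        else:
            requested.append(item_id)
    # The source sends the requested item data and the destination applies it.
    for item_id in requested:
        dest.data[item_id] = source.data[item_id]
        dest.versions[item_id] = changes[item_id]
    # Add the source's knowledge only if nothing is left unresolved (a simplification).
    if not conflicts:
        for replica_id, tick in source.knowledge.items():
            dest.knowledge[replica_id] = max(dest.knowledge.get(replica_id, 0), tick)
    return conflicts


a = Replica("A", {"I2": Version(3, "A"), "I3": Version(4, "A"), "I1": Version(5, "A")},
            {"I1": "file 1", "I2": "file 2", "I3": "file 3"}, {"A": 5})
b = Replica("B", {"I104": Version(2, "B"), "I105": Version(4, "B")},
            {"I104": "file 104", "I105": "file 105"}, {"B": 4})
sync(a, b)   # Replica A is the source, Replica B the destination
sync(b, a)   # second pass with the roles swapped
# Both replicas now hold all five items, and both knowledges are A5, B4.
```

Suppose both replicas then change I2, as in the conflict example that follows, and their versions and knowledge are updated accordingly. In that case `sync(a, b)` reports I2 as a conflict instead of applying it.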

Conflict Example

Extending the previous example, the two replicas are now synchronized, and the item versions on each replica look as follows:


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I104    2                    B                    1                      B
I105    4                    B                    3                      B
I2      3                    A                    2                      A
I3      4                    A                    4                      A
I1      5                    A                    1                      A

Similarly, the knowledge for both replicas is as follows:

Replica A Knowledge = A5, B4

Replica B Knowledge = A5, B4

At this point, the same file (item I2) is updated on both replicas.

On Replica A the item version table is updated to:


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I104    2                    B                    1                      B
I105    4                    B                    3                      B
I2      6                    A                    2                      A
I3      4                    A                    4                      A
I1      5                    A                    1                      A

On Replica B the item version table is updated to:


Item    Update Tick Count    Update Replica ID    Creation Tick Count    Creation Replica ID
I104    2                    B                    1                      B
I105    4                    B                    3                      B
I2      5                    B                    2                      A
I3      4                    A                    4                      A
I1      5                    A                    1                      A

The knowledge for both replicas is also updated to:

Replica A Knowledge = A6, B4

Replica B Knowledge = A5, B5

At this point Replica A initiates synchronization with Replica B. Skipping to the stage where the source sends the item versions and knowledge to the destination, the following steps are performed for item I2.

  1. Replica B sees a new change for item I2 with the following update version:

       Update Tick Count    Update Replica ID
       6                    A
  2. Replica B reviews the knowledge received from Replica A (A6, B4) and determines that Replica A was not aware of the change made to the same item by Replica B:

       Update Tick Count    Update Replica ID
       5                    B
  3. A conflict is detected and passed to the application or provider to be handled.
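Steps 1 and 2 come down to two containment checks. Here a version is written as a (tick count, replica ID) pair, and knowledge such as A6, B4 is read as "every change up to tick 6 from A and up to tick 4 from B":

```latex
\begin{aligned}
(6, A) \in K_B = \{A5, B5\} &\iff 6 \le 5 && \text{false: the change is new to Replica B} \\
(5, B) \in K_A = \{A6, B4\} &\iff 5 \le 4 && \text{false: Replica A did not know of Replica B's change}
\end{aligned}
```

Neither replica's knowledge contains the other's version of I2. The two changes were therefore made without knowledge of each other, which is the conflict condition described in the synchronization flow.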

Summary

Microsoft Sync Framework includes everything required to integrate applications into an offline or collaboration-based network, either by using the included providers or by writing new custom providers. Providers enable any data source to participate in data synchronization regardless of network or device type.