Data Distribution Network
[This article was contributed by the SQL Azure team.]
For certain scenarios it makes sense to distribute your data and your web site closer to the end user; reducing the network latency and providing a better user experience. You can think about it like your own data delivery network, very similar to the Windows Azure Content Delivery Network. This data delivery network can be built with SQL Azure and Data Sync Server for SQL Azure, we will talk about how in this blog post.
Geo Distributed Web Sites
When you deploy your Windows Azure web role (your web site) to multiple data centers around the world the site can be closer to the end user and allowing the responses to travel over the network faster. This can be accomplished with some intelligent routing; for example users in the US get sent to the US data center. Currently, Windows Azure doesn't provide a mechanism for this; you would need to do this yourself at the DNS layer. Another way to direct users to the closest web site to them is to have separate subdomains with an entrance page on the domain that prompts the user to select their country of origin.
It makes sense that if you move your web sites closer to the end user; you will want to move your databases closer to the web sites. Web sites that make heavy use of the database will need the faster performance of having that data collocated with the application. Because you have multiple web sites in different data centers, you will have multiple databases which might have exactly the same data. This is where Data Sync Services for SQL Azure comes into play, the basics of Data Sync Services for SQL Azure is covered in our previous blog post.
Finding the Right Fit
This isn’t a solution for all types of web sites; Data Sync Services works best when all the databases don’t have to be in synchronized 100% of the time. For example, an airline ticketing web site couldn’t use the solution that I am proposing here; data sync services doesn’t provide transaction replication that modifies all the member databases on a commit.
However, if you are running a publishing web site, like a magazine or newspaper that has schedule deployments of content and that content is stored in a database, running a data distribution network to push data to localized datacenters would work very well.
Publishing a Magazine
Let’s say that you were running a web site for a print magazine. All of the magazines content is stored in the hub database (refer to this blog post for Data Sync Service terminology), and that content is published once a week.
Because of the nature of the web site, the database usage tends to be high read, low write. Management wants a Windows Azure centric solution, and it has to be fast – new search engine optimization rules favor web sites that return contently quickly.
You know that the majority of your traffic comes from the United States, so you decide to use three SQL Azure databases to support the read queries; these are read-only member databases of the Data Sync Services synchronization group. They exist in the same data center as the hub database. Then you create two more member databases one in Asia and one in Europe, they are in the same data centers as your Windows Azure web roles.
Note: this is fictional data, and not a reflection of MSDN Magazine traffic, nor does MSDN magazine use SQL Azure – currently.
Your content team would always update the hub database; maybe you would provide a private edition of the web site so they could see how the content looks before you published it. Then when you are ready to publish, you would tell Data Sync Service services to synchronize the group, and all the member databases would be updated with hub data pushed out to the remote data centers. For the rest of the week, the Azure web roles would call to member database that was local to them.
Notice that there is a potential cost savings to you because you have reduced the transfer charges for cross data center queries. Anytime you cross the data centers there is a data transfer fee, this includes querying SQL Azure databases that are not in the same data center as the web role. If you have a low write, high read scenario it benefits you to use Data Sync Services to push the data to the datacenter where the SQL Azure web role is hosted. The caveat is that the price of running another SQL Azure database is less than the potential transfer charges.
What about writes? Even the simplest front facing web sites want to do some writes. If you have a low write scenario you can always send your writes directly to the hub database. The issue becomes that those writes are not synchronized with the member databases until the content is published on a weekly basis. If you write them directly to the localized member database (Data Sync Service supports bi-directional synchronization) you have the same issue, plus a chance of merge conflicts.
One potential solution is to use a separate database for writes, one that is hosted in the same data center as your hub database. Worldwide web roles read and write data directly to this separate database to support forums, comments, and user generated content.
In the future I will blog about a more complicated technique that provides real time writes with Data Sync Services that provides geographic redundancy.
Do you have questions, concerns, comments? Post them below and we will try to address them.