Maintaining Data Compliance and Choosing Azure Data Centers for Optimal Upload Speed

Imagine that you want to build a software product where users upload content to Azure Storage from all over the world. The goal is to give users the best possible upload speeds while keeping costs low. The more data centers you choose, the more likely you are to incur data transfer costs when users share content.

image001

Figure 1: Data access from the Philippines

But there is also the issue of compliance. This is a complex topic. How would you go about making sure that you adhere to the compliance laws that govern your data storage and transfer?

image002

Figure 2: What you need to consider

What you want to optimize

You need to consider the trade-offs. There is the cost of data transfer out of a data center. If you are sharing content among users and you have more data centers, then you have a higher chance of spending more on data transfer. Why? Because you get charged when data leaves the data center.

But data transfer speed is an issue as well. With fewer data centers, the average user is farther away from the nearest data center, which tends to slow uploads down.

So the classic trade-off between expense and speed is what needs to be optimized.
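
To make the cost side of that trade-off concrete, here is a rough back-of-the-envelope sketch. It is only a model under stated assumptions: every uploader and every reader is assigned independently to a home data center, content moves between data centers only when the two differ, and the price per GB is a hypothetical placeholder rather than an actual Azure rate.

```python
def cross_region_fraction(user_share_by_region):
    """Fraction of shares where uploader and reader use different data centers.

    Assumes uploader and reader are picked independently, in proportion
    to the share of users homed in each data center.
    """
    total = sum(user_share_by_region.values())
    shares = [count / total for count in user_share_by_region.values()]
    # Probability both land in the same data center is sum(p_i^2).
    return 1.0 - sum(p * p for p in shares)


def monthly_egress_cost(shared_gb_per_month, user_share_by_region, price_per_gb):
    """Estimated monthly cost of data leaving a data center due to sharing.

    price_per_gb is a hypothetical placeholder; look up the current rate.
    """
    fraction = cross_region_fraction(user_share_by_region)
    return shared_gb_per_month * fraction * price_per_gb


if __name__ == "__main__":
    layouts = {
        "one data center": {"Singapore": 1.0},
        "two data centers": {"Singapore": 0.5, "Hong Kong": 0.5},
    }
    for name, layout in layouts.items():
        cost = monthly_egress_cost(1000, layout, price_per_gb=0.10)
        print(f"{name}: {cross_region_fraction(layout):.0%} cross-region, "
              f"estimated cost {cost:.2f}")
```

In this model, a single data center never pays for cross-data-center sharing, and an even split across two data centers sends about half of all shared content across the boundary. Real usage is rarely this uniform, so treat the numbers as a starting point for discussion, not a forecast.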

How to go about finding answers

Well, read below for some guidelines.

If your goal is to maximize upload speeds by selecting the correct data centers, there are some basic steps you should consider:

  1. Use web analytics to identify the countries where most of your customers, or your most important customers, reside.

  2. Use a map (see figure 1) to identify the countries with the closest major data centers.

    • Let's assume that the Philippines is one of your core geographies, with lots of important customers. Using a map, you notice that the two closest Azure data centers are Hong Kong and Singapore.
    • Now you have to determine which of those data centers provides the best upload performance. That is the tough part, because so many factors lie between a data center and an end user.
  3. You need to test with end users as realistically as possible. Identify some customers who can run the tests.

    The challenge is that different users may be using different Internet providers, so you end up comparing apples and oranges. You also need to measure upload and download speeds separately. How big are the files being uploaded on average? Does the upload mechanism use multithreaded techniques, breaking up a large file into several smaller pieces that are uploaded in parallel?

    • Because the performance of Internet providers varies greatly, you can't just choose one user from a country and draw conclusions.

    • You will need to research which Internet providers are the most commonly used in the country you are optimizing for.

    • You might select the top three Internet providers for the Philippines and then find customers who use those Internet providers. They are the ones you should perform the tests with.

    • Wikipedia lists the Internet providers and major countries here:
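
If you would rather compute than eyeball the map in step 2, a great-circle distance calculation gives a first ranking of candidate data centers. The sketch below is only an approximation: the coordinates are rounded city-level values for Manila, Hong Kong, and Singapore (my assumption, not the exact data center locations), and, as noted above, distance alone does not determine upload performance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (latitude, longitude) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def rank_by_distance(origin, candidates):
    """Return candidate names sorted from nearest to farthest."""
    return sorted(candidates, key=lambda name: haversine_km(origin, candidates[name]))


# Approximate city coordinates (assumed; not exact data center locations).
MANILA = (14.60, 120.98)
DATA_CENTERS = {
    "Hong Kong": (22.32, 114.17),
    "Singapore": (1.35, 103.82),
}

if __name__ == "__main__":
    for name in rank_by_distance(MANILA, DATA_CENTERS):
        print(f"{name}: {haversine_km(MANILA, DATA_CENTERS[name]):.0f} km")
```

Use the ranking only to build a short list. The real decision should come from the end-user tests described in step 3.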

Some Great Tooling

Here are some links and tools to help analyze bandwidth and performance of data transfer.

It may become important to also incorporate automated testing tools, making sure you find patterns and can make safe conclusions about connectivity performance among countries.
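
Once test results start coming in, a small script can help you look for patterns instead of reacting to single data points. The sketch below assumes you have collected samples as (Internet provider, data center, measured Mbps) tuples. That record layout and the provider names are hypothetical, and the median is just one reasonable way to dampen outliers.

```python
from collections import defaultdict
from statistics import median


def median_throughput(samples):
    """Group samples by (provider, data center) and return the median Mbps."""
    grouped = defaultdict(list)
    for provider, data_center, mbps in samples:
        grouped[(provider, data_center)].append(mbps)
    return {key: median(values) for key, values in grouped.items()}


def best_data_center_per_provider(samples):
    """For each provider, pick the data center with the highest median Mbps."""
    best = {}
    for (provider, data_center), value in median_throughput(samples).items():
        if provider not in best or value > best[provider][1]:
            best[provider] = (data_center, value)
    return best


if __name__ == "__main__":
    # Made-up illustrative numbers, not real measurements.
    samples = [
        ("Provider A", "Hong Kong", 4.0),
        ("Provider A", "Hong Kong", 6.0),
        ("Provider A", "Hong Kong", 5.0),
        ("Provider A", "Singapore", 3.0),
        ("Provider B", "Hong Kong", 2.0),
        ("Provider B", "Singapore", 8.0),
    ]
    for provider, (data_center, mbps) in best_data_center_per_provider(samples).items():
        print(f"{provider}: {data_center} ({mbps} Mbps median)")
```

If different providers favor different data centers, that is a sign you need more samples, or that a single "best" data center for the country may not exist.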

Azure Throughput Analyzer

The Microsoft Research eXtreme Computing Group cloud-research engagement team supports researchers in the field who use Windows Azure to conduct their research. As part of this effort, we have built a desktop utility that measures the upload and download throughput achievable from your on-premise client machine to Azure cloud storage (blobs, tables and queue). The download contains the desktop utility and an accompanying user guide. You simply install this tool on your on-premise machine, select a data center for the evaluation, and enter the account details of any storage service created within it. The utility will perform a series of data-upload and -download tests using sample data and collect measurements of throughput, which are displayed at the end of the test, along with other statistics.

  1. https://research.microsoft.com/en-us/downloads/5c8189b9-53aa-4d6a-a086-013d927e15a7/default.aspx

Web Site to measure speed

Measures the latency from your web browser to the Blob Storage Service in each of the Microsoft Azure Data Centers.

  1. https://azurespeedtest.azurewebsites.net/


Some review questions and Compliance Issues

Do I need to incorporate Traffic Manager (TM) into the analysis? If your storage account is deployed to multiple data centers and you want each user to be routed automatically to the closest one, you should evaluate TM. Here is a lab you might consider (https://az12722.vo.msecnd.net/wazplatformtrainingcourse2-8/labs/windowsazuretrafficmanager1-0/Lab.docx). Traffic Manager provides many features to ensure that users have access to their data. To learn more, you can read here: https://azure.microsoft.com/en-us/services/traffic-manager

Is there a way to test upload speeds, for example from somewhere in Australia to the Singapore data center? Yes. Try https://www.azurespeed.com/, or use azcopy if your goal is to find the maximum upload speed with multiple threads. Remember that a storage account has a limit of 20,000 requests per second and each blob has a bandwidth limit of 60 MB/s. Here is an excellent post to help you understand the limitations: https://msdn.microsoft.com/en-us/library/azure/dn249410.aspx.

Where can I learn more about compliance? A good place to start is the Azure Trust Center (https://azure.microsoft.com/en-us/support/trust-center/privacy) and (https://azure.microsoft.com/en-us/support/trust-center/compliance).
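
If you want to script your own multithreaded upload test rather than rely on azcopy, the general shape looks like the sketch below. It is a minimal harness, not a storage client: upload_block is a hypothetical callback that you would replace with a real call into whichever storage library you use, and the 4 MB block size and 4 workers are arbitrary starting values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(data, block_size):
    """Split a bytes object into consecutive blocks of at most block_size."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def timed_parallel_upload(data, upload_block, block_size=4 * 1024 * 1024, workers=4):
    """Upload blocks in parallel and return the observed throughput in MB/s.

    upload_block(index, block) is a hypothetical callback supplied by you.
    """
    blocks = split_into_blocks(data, block_size)
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for every upload and re-raises the first failure.
        list(pool.map(upload_block, range(len(blocks)), blocks))
    elapsed = time.perf_counter() - start
    megabytes = len(data) / (1024 * 1024)
    return megabytes / elapsed if elapsed > 0 else float("inf")


def fake_upload_block(index, block):
    """Stand-in for a real upload: pretends each block takes 10 ms."""
    time.sleep(0.01)


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MB of zeros as sample data
    for workers in (1, 4):
        rate = timed_parallel_upload(payload, fake_upload_block,
                                     block_size=1024 * 1024, workers=workers)
        print(f"{workers} worker(s): {rate:.0f} MB/s (simulated)")
```

Run the same payload with different worker counts from each test location, and keep the per-blob and per-account limits mentioned above in mind when interpreting the results.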

Some Data Centers

Worldwide data centers as of 8/1/2014

United States: US Central (Iowa), US East (Virginia), US East 2 (Virginia), US North Central (Illinois), US South Central (Texas), US West (California)

Europe: Europe North (Ireland), Europe West (Netherlands)

Asia Pacific: Asia Pacific Southeast (Singapore), Asia Pacific East (Hong Kong)

Japan: Japan East (Saitama Prefecture), Japan West (Osaka Prefecture)

Brazil: Brazil South (Sao Paulo State), with one-way replication to US South Central (Texas)

Conclusion

Hopefully you now have some ways to think about the placement of Azure Storage Accounts in the various global data centers. I provided some guidelines about how to measure and what to consider when choosing a location to host data. You need to consider expense as well as performance. I also provided some guidelines about compliance. I can also follow up with you if you have specific compliance questions.