Windows Azure: Cost Architecting for Windows Azure

Designing applications and solutions for cloud computing and Windows Azure requires a completely different way of considering the operating costs.

Maarten Balliauw

Cloud computing and platforms like Windows Azure are billed as “the next big thing” in IT. This certainly seems true when you consider the myriad advantages to cloud computing.

Computing and storage become on-demand resources that you can use at any time, paying only for what you actually use. However, this also poses a problem: if a cloud application is designed like a regular on-premises application, chances are its costs will not turn out as expected.

Different Metrics

In traditional IT, one would buy a set of hardware (network infrastructure, servers and so on), set it up, go through the configuration process and connect to the Internet. It’s a one-time investment, with an employee to turn the knobs and bolts—that’s it.

With cloud computing, a cost model replaces that investment model. You pay for resources like server power and storage based on their effective use. With a cloud platform like Windows Azure, you can expect the following measures to calculate your monthly bill:

  • Number of hours you’ve reserved a virtual machine (VM)—meaning you pay for a deployed application, even if it isn’t currently running
  • Number of CPUs in a VM
  • Bandwidth, measured per GB in / out
  • Amount of GB storage used
  • Number of transactions on storage
  • Database size on SQL Azure
  • Number of connections on Windows Azure platform AppFabric

You can find all the pricing details on the Windows Azure Web site at microsoft.com/windowsazure/offers. As you can see from the list here, that’s a lot to consider.

Limiting Virtual Machines

Here’s how it breaks down from a practical perspective. While limiting the number of VMs you have running is a good way to save costs, for Web Roles it makes sense to have at least two VMs for availability and load balancing. Use the Windows Azure Diagnostics API to measure CPU usage, the number of HTTP requests and memory usage in these instances, and scale your application down when appropriate.

Every instance of whichever role you run on Windows Azure adds its hours to the monthly bill. For example, having three role instances running on average (sometimes two, sometimes four) will be 25 percent cheaper than keeping four instances running all the time with almost no workload.

For Worker Roles, it also makes sense to have at least two role instances to do background processing. This will help ensure that when one is down for updates or a role restart occurs, the application is still available. By using the available tooling for Windows Azure, you can configure a Worker Role out-of-the-box for doing just one task.

If you’re running a photo-sharing Web site, for example, you’ll want to have a Worker Role for resizing images and another for sending out e-mail notifications. Using the “at least two instances” rule would mean running four Worker Role instances, which would result in a sizeable bill. Resizing images and sending out e-mails don’t actually require that much CPU power, so having just two Worker Roles that handle both tasks should be enough. That cuts the compute hours for these Worker Roles in half on the monthly bill. It’s also fairly easy to implement a threading mechanism in a Worker Role, where each thread performs its dedicated piece of work.
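A rough sketch of that approach, assuming the standard RoleEntryPoint model (the two queue-processing methods here are hypothetical placeholders for the image-resizing and e-mail work):

using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class CombinedWorkerRole : RoleEntryPoint {
  public override void Run() {
    // One thread per task; both run inside the same (billed) role instance.
    var imageThread = new Thread(ProcessImageResizeQueue) { IsBackground = true };
    var emailThread = new Thread(ProcessEmailQueue) { IsBackground = true };
    imageThread.Start();
    emailThread.Start();

    // Keep the role entry point alive.
    imageThread.Join();
    emailThread.Join();
  }

  private void ProcessImageResizeQueue() {
    while (true) {
      // ... dequeue and resize images here ...
      Thread.Sleep(10000);
    }
  }

  private void ProcessEmailQueue() {
    while (true) {
      // ... dequeue and send e-mail notifications here ...
      Thread.Sleep(10000);
    }
  }
}

Each of the two instances runs both threads, so the “at least two instances” rule is still satisfied for both tasks.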

Windows Azure offers four sizes of VM: small, medium, large and extra large. The differences between each size are the number of available CPUs, the amount of available memory and local storage, and the I/O performance. It’s best to think about the appropriate VM size before actually deploying to Windows Azure, because you can’t change it afterward without redeploying your application.

When you receive your monthly statement, you’ll notice all compute hours are converted into small instance hours when presented on your bill. For example, one clock hour of a medium compute instance would be presented as two small compute instance hours at the small instance rate. If you have two medium instances running, you’re billed for 720 hours x 2 x 2.

Consider this when sizing your VMs. You can often realize almost the same compute power with small instances: four of them are billed at 720 hours x 4, the same price as the two medium instances, but you can scale down to two instances when appropriate, bringing you to 720 hours x 2. If you don’t need more CPUs, memory or storage per instance, stick with small instances, because they let you scale on a more granular basis than larger instances.

Billing for Windows Azure services starts once you’ve deployed an application, and occurs whether the role instances are active or turned off. You can easily see this on the Windows Azure portal. There’s a large image of a box. “When the box is gray, you’re OK. When the box is blue, a bill is due.” (Thanks to Brian H. Prince for that one-liner.)

When you’re deploying applications to staging or production and turning them off after use, don’t forget to un-deploy the application as well. You don’t want to pay for any inactive applications. Also remember to scale down when appropriate. This has a direct effect on monthly operational costs.

When scaling up and down, it’s best to have a role instance running for at least an hour because you pay by the hour. Spin up multiple worker threads in a Worker Role. That way a Worker Role can perform multiple tasks instead of just one. If you don’t need more CPUs, more memory or more storage, stick with small instances. And again, be sure to un-deploy your applications when you’re not using them.

Bandwidth, Storage and Transactions

Bandwidth and transactions are two tricky metrics. There’s currently no good way to measure them, except by looking at the monthly bill. There’s no real-time monitoring you can consult and use to adapt your application. Bandwidth is the easier of the two to keep in check: the less traffic you put over the wire, the lower the cost. It’s as simple as that.

Things get trickier when you deploy applications over multiple Windows Azure regions. Let’s say you have a Web Role running in the “North America” region and a storage account in the “West Europe” region. In this situation, bandwidth for communication between the Web Role and storage will be billed.

If the Web Role and storage were located in the same region (both in “North America,” for example), there would be no bandwidth bill for communication between the Web Role and storage. Keep in mind that when designing geographically distributed applications, it’s best to keep coupled services within the same Windows Azure region.

When using the Windows Azure Content Delivery Network (CDN), you can take advantage of another interesting cost-reduction measure. The CDN is metered in the same way as blob storage, meaning per GB stored per month. When a request reaches the CDN and the content isn’t cached yet, the CDN grabs the original content from blob storage (consuming billed bandwidth) and caches it at the edge location.

If you set cache expiration too short, the CDN cache will refresh itself more frequently and consume more bandwidth. If you set it too long, Windows Azure will store the content in the CDN longer and bill per GB stored per month. Think about this trade-off for every application so you can determine the best cache expiration time.
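As a sketch with the storage client library (the blob path here is a placeholder), the cache lifetime of CDN-served content is controlled through the blob’s Cache-Control header:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class CdnCacheSettings {
  public static void SetCacheLifetime(CloudStorageAccount account) {
    CloudBlobClient blobClient = account.CreateCloudBlobClient();

    // "images/logo.png" is a placeholder blob path.
    CloudBlob blob = blobClient.GetBlobReference("images/logo.png");

    // A longer max-age means fewer requests back to blob storage (less bandwidth),
    // but content stays cached, and billed, longer. Tune the value per application.
    blob.Properties.CacheControl = "public, max-age=14400";
    blob.SetProperties();
  }
}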

The Windows Azure Diagnostics Monitor also uses Windows Azure storage for diagnostic data such as performance counters, trace logs, event logs and so on. It transfers this data to your storage account at a pre-specified interval. Transferring every minute increases the transaction count on storage, leading to extra costs. Setting the interval to, say, every 15 minutes results in far fewer storage transactions. The drawback, however, is that the diagnostics data in storage is always up to 15 minutes old.
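As a sketch, assuming the SDK’s DiagnosticMonitor API and the default “DiagnosticsConnectionString” setting, the transfer interval is set when the monitor is configured, typically in a role’s OnStart method:

using System;
using Microsoft.WindowsAzure.Diagnostics;

public static class DiagnosticsConfig {
  public static void Configure() {
    DiagnosticMonitorConfiguration config =
      DiagnosticMonitor.GetDefaultInitialConfiguration();

    // Collect overall CPU usage (also useful for deciding when to scale down).
    config.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration {
      CounterSpecifier = @"\Processor(_Total)\% Processor Time",
      SampleRate = TimeSpan.FromSeconds(30)
    });

    // Transfer logs and counters to storage every 15 minutes instead of every minute:
    // fewer storage transactions, but the data in storage is up to 15 minutes old.
    config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(15);
    config.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(15);

    DiagnosticMonitor.Start("DiagnosticsConnectionString", config);
  }
}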

Also, the Windows Azure Diagnostics Monitor doesn’t clean up its data. If you don’t do this yourself, there’s a chance you’ll be billed for a lot of storage containing nothing but old, expired diagnostic data.

Transactions are billed per 10,000. That may seem like a generous unit, but in reality transactions add up quickly. Every operation on a storage account is a transaction. Creating a blob container, listing the contents of a blob container, storing data in a table on table storage, peeking for messages in a queue—these are all transactions. When storing a blob, for example, you would typically first check whether the blob container exists, create it if it doesn’t and then store the blob. That’s at least two, possibly three, transactions.

The same goes for hosting static content on blob storage. If one page of your Web site serves 40 small images from blob storage, that’s 40 transactions per page view. This adds up quickly with high-traffic applications. By simply ensuring a blob container exists at application startup and skipping that check on every subsequent operation, you can cut the number of transactions for those operations almost in half. Be smart about this and you can lower your bill.
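Here’s a sketch of that approach using the storage client library (the “images” container name is just a placeholder): create the container once when the application starts, then skip the check for every upload.

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public class BlobStore {
  private readonly CloudBlobContainer container;

  public BlobStore(CloudStorageAccount account) {
    CloudBlobClient client = account.CreateCloudBlobClient();
    container = client.GetContainerReference("images");

    // One transaction at startup instead of an extra check on every upload.
    container.CreateIfNotExist();
  }

  public void Upload(string blobName, byte[] data) {
    // No existence check here: just the upload itself.
    CloudBlob blob = container.GetBlobReference(blobName);
    blob.UploadByteArray(data);
  }
}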

Indexes Can Be Expensive

SQL Azure is an interesting product. You can have a database of 1GB, 5GB, 10GB, 20GB, 30GB, 40GB or 50GB at an extremely low monthly price. It may seem safe and sufficient to just go for a 5GB SQL Azure database at first. If you’re only using 2GB of that capacity, though, you’re not really in a pay-per-use model, right?

In some situations, it can be more cost-effective to distribute your data across different SQL Azure databases rather than having one large database. For example, you could have a 5GB and a 10GB database instead of a 20GB database with 5GB of unused capacity. This kind of strategic sizing will lower your bill if you do it smartly, and if it suits the way your data is structured.
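As a purely illustrative sketch (the connection strings and the routing rule are assumptions, not a prescribed pattern), such a split often comes down to routing each functional data set to its own database:

using System.Data.SqlClient;

public static class DatabaseRouter {
  // Placeholder connection strings for two smaller, appropriately sized databases.
  private const string CustomerDb = "<connection string to the 5GB database>";
  private const string OrderDb = "<connection string to the 10GB database>";

  public static SqlConnection GetConnection(string dataSet) {
    // Route each functional data set to its own database.
    string connectionString = (dataSet == "orders") ? OrderDb : CustomerDb;
    return new SqlConnection(connectionString);
  }
}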

Every object consumes storage. Indexes and tables can consume a lot of database storage capacity. Large tables may occupy 10 percent of a database, and some indexes may consume 0.5 percent of a database.

If you divide the monthly cost of your SQL Azure subscription by the database size, you get a cost per unit of storage. Think about the objects in your database: if index X costs you 50 cents per month and doesn’t really add much performance gain, simply throw it away. Half a dollar isn’t much, but eliminate a few tables and indexes and it adds up. An interesting example of this can be found in the SQL Azure team blog post, “The Real Cost of Indexes” (blogs.msdn.com/b/sqlazure/archive/2010/08/19/10051969.aspx).

There is a strong movement in application development to no longer use stored procedures in a database. Instead, the trend is to use object-relational mappers and perform a lot of calculations on data in the application logic.

There’s nothing wrong with that, but it does get interesting when you think about Windows Azure and SQL Azure. Performing data calculations in the application may require extra Web Role or Worker Role instances. If you move these calculations to SQL Azure, you’re saving on a role instance in this situation. Because SQL Azure is metered on storage and not CPU usage, you actually get free CPU cycles in your database.

Developer Impact

The developer who’s writing the code can have a direct impact on costs. For example, when building an ASP.NET Web site hosted on Windows Azure, you can share session state across role instances using the Windows Azure storage-backed session state provider. This provider stores session data in the Windows Azure table service, where the amount of storage used, the bandwidth consumed and the transaction count are all measured for billing. Consider the following code snippet that’s used for determining a user’s language on every request:

if (Session["culture"].ToString() == "en-US") {
  // .. set to English ...
}
if (Session["culture"].ToString() == "nl-BE") {
  // .. set to Dutch ...
}

Nothing wrong with that? Technically not, but you can optimize this by 50 percent from a cost perspective:

string culture = Session["culture"].ToString();
if (culture == "en-US") {
  // .. set to English ...
}
if (culture == "nl-BE") {
  // .. set to Dutch ...
}

Both snippets do exactly the same thing. The first snippet reads session data twice, while the latter reads session data only once. This means a 50 percent cost win in bandwidth and transaction count. The same is true for queues. Reading one message at a time 20 times will be more expensive than reading 20 messages at once.
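For example, with the storage client library (the “tasks” queue name is a placeholder), a batched read could look like this:

using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.StorageClient;

public static class QueueReader {
  public static void ProcessBatch(CloudStorageAccount account) {
    CloudQueueClient queueClient = account.CreateCloudQueueClient();
    CloudQueue queue = queueClient.GetQueueReference("tasks");

    // One GetMessages call retrieves up to 20 messages in a single transaction,
    // instead of 20 separate GetMessage calls (20 transactions).
    foreach (CloudQueueMessage message in queue.GetMessages(20)) {
      // ... process the message ...
      queue.DeleteMessage(message);
    }
  }
}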

As you can see, cloud computing introduces different economics and pricing details for hosting an application. While cloud computing can be a vast improvement in operational costs versus a traditional datacenter, you have to pay attention to certain considerations in application architecture when designing for the cloud.

Maarten Balliauw

Maarten Balliauw is a technical consultant in Web technologies at RealDolmen, one of Belgium’s biggest ICT companies. His interests are ASP.NET MVC, PHP and Windows Azure. He’s a Microsoft Most Valuable Professional (MVP) for ASP.NET and has published many articles in both PHP and .NET literature, such as MSDN Magazine Belgium and PHP Architect. Balliauw is a frequent speaker at various national and international events. Find his blog at blog.maartenballiauw.be.