Photo Mosaics Part 8: Caching

12/21/2011

My previous post in this series, on the use of the Service Bus within the Azure Photo Mosaics application, for all intents and purposes completed the explanation of the application’s features. There’s an alternative implementation that I had planned for, though, specifically to demonstrate the (then new) Caching feature, and that’s what I’ll focus on in this post.

Caching Overview

Windows Azure Caching provides a distributed in-memory cache as a service that you can use from applications running in the Windows Azure cloud. The cache is essentially a massive property bag of name/value pairs where data is stored across multiple nodes (machines) in the Windows Azure data center and managed on your behalf. Caching is a true service, since the only thing you have to do to set it up is pick which data center will host it, specify the size of the cache, and pick an endpoint name within the Windows Azure Portal. A discrete set of cache sizes is available (from 128MB to 4GB), and although you pay the same amount for the cache whether it’s 0% or 100% utilized, you can increase or decrease its size when needed (although just once per day).

You may have noticed too that a similarly named feature, Windows Server AppFabric Caching, exists for providing an analogous caching capability for on-premises applications. Although Windows Azure Caching shares a common core with it (and a common genesis in the project codenamed Velocity), there are some notable differences that you should be aware of when developing applications that run both on-premises and in the cloud.

Usage Scenario

Within the Azure Photo Mosaics application, I included a configuration option to enable Caching for the storage of the image library tiles. If you recall the flow of the application (below), when the user creates a photomosaic, one of the inputs is a library of images (say from Flickr, the user’s own vacation pictures, or what have you) that have been stored in a Windows Azure blob container. Those images are raw images, not yet resized to the dimensions requested for the final image (the tile size, after all, is another input variable), so the same base images might be used to generate one mosaic with 16x16 pixel tiles and then another with 32x32. Rather than store versions of the same tile for all the available tile sizes, the tiles are generated dynamically by the ImageProcessor Worker Role.

With the default implementation, each instance of the ImageProcessor Worker Role creates an in-memory “Image Library” that contains each of the tiles, resized from the original image in the selected image library. Although the in-memory implementation works fine for the application, there are a couple of drawbacks:

Scalability – since the entire image library is held in the RAM of the virtual machine hosting the given Worker Role, there’s an absolute limit on how large an image library can be. If storage requirements for the generated tiles exceed the RAM allocation, your only option is to scale the application up by selecting a larger VM size; moving from small to medium, for instance, doubles your RAM allocation to 3.5GB. You can only scale up so far, however. Recall that Windows Azure data centers house homogeneous, commodity hardware, so once you reach the largest option (currently extra-large with 14GB) there’s nowhere else to go!

Performance – each instance of an ImageProcessor Worker Role creates an in-memory tile library that it uses to generate a slice of the final image, and that complete library is re-created for each slice. So, for instance, if you generate a mosaic for a given image and specify you want it processed into six slices, then six tasks will be queued, and the processing for each task will involve recreating the image library. This seeming redundancy is required, since the application is multi-tenanted and stateless, so you cannot rely on the same instance of a worker role processing all of the slices for a given image.

Enter the caching implementation (zoomed from the overall architecture diagram above):

Here, the ImageProcessor first consults the cache to see if a tile for the requested image library in the requested dimension exists. If so, it uses that cached image rather than regenerating it anew (and recalculating its average color). If the tile is not found in the cache, then it does have to retrieve the original image from the image library blob container and resize it to the requested dimensions. At that point it can be stored in the cache so the next request will have near-immediate access to it.
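To make the payoff of that lookup concrete, here’s a minimal sketch of the pattern. It’s written in Python rather than the application’s VB.NET, a plain dict stands in for the distributed cache, and `resize`, `get_tile`, and `process_mosaic` are hypothetical names invented for illustration. It simply counts how many resize operations occur when six slices each build their own tile library versus when they share one cache.

```python
# Hypothetical stand-ins: a dict plays the distributed cache, and
# resize() plays the expensive blob download + thumbnail step.
resize_calls = 0

def resize(image_uri, tile_size):
    """Pretend to fetch the original image and shrink it to tile_size."""
    global resize_calls
    resize_calls += 1
    return f"{image_uri}@{tile_size}x{tile_size}"

def get_tile(cache, image_uri, tile_size):
    """Cache-aside lookup: try the cache first, fall back to resizing."""
    key = f"{image_uri}|{tile_size}"
    tile = cache.get(key)
    if tile is None:
        tile = resize(image_uri, tile_size)
        cache[key] = tile  # store it for the next slice or worker
    return tile

def process_mosaic(library, tile_size, slices, shared_cache=None):
    """Build the tile set once per slice; return how many resizes ran."""
    global resize_calls
    resize_calls = 0
    for _ in range(slices):
        # Without a shared cache, every slice starts from an empty library.
        cache = shared_cache if shared_cache is not None else {}
        for uri in library:
            get_tile(cache, uri, tile_size)
    return resize_calls

library = [f"img{i}.jpg" for i in range(100)]
without_cache = process_mosaic(library, 16, slices=6)
with_cache = process_mosaic(library, 16, slices=6, shared_cache={})
print(without_cache, with_cache)  # 600 100
```

With a 100-image library and six slices, the per-slice approach resizes every image six times, while the shared cache pays the resize cost only once per image and tile size.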

Creating a Cache

Creating a cache with the Windows Azure Management Portal is straightforward. When you log in to the portal, select the Service Bus, Access Control & Caching option on the left sidebar and then the Caching service at the top left. You’ll then get a list of the existing Caching namespaces:

The properties pane to the right shows information on existing caches, including the current size and peak sizes over the past month and year. To create a new cache, simply select the New option on the ribbon, and you’ll be prompted for four bits of information:

A namespace, which is the first part of the URL by which your cache is accessed. The URL is {namespace}.cache.windows.net, and so {namespace} must be unique across all of Windows Azure.
The region where your cache is located; you’ll pick one of the six Windows Azure data centers here.
The Windows Azure subscription that owns this cache.
The cache size desired, ranging from 128MB to 4GB in six discrete steps (each double the previous one).

Configuring the Cache in Code

The easiest way to use a Windows Azure Cache in your code is via configuration. You can generate the necessary entries from the Windows Azure Management Portal (as shown below), and simply copy and paste the configuration into the web.config file of your Web Role or the app.config file of your Worker Role.

What you do in configuration, you can of course do programmatically – with a bit more elbow grease.

Programming Against the Cache

The Microsoft.ApplicationServer.Caching namespace hosts the classes you’ll need to interface with the Windows Azure cache. Note that this namespace is also used for accessing Windows Server AppFabric caches, and so contains classes and properties which will not apply to Windows Azure Caches (such as DataCacheNotificationProperties). The two primary classes you will use for your Windows Azure caches are:

DataCacheFactory which is used to configure the cache client (via DataCacheFactoryConfiguration) and return a reference to a DataCache. Named caches are not supported in Windows Azure, so you’ll get the reference to the default cache via code similar to the following:

    Dim theCache As DataCache

    Try
        ' Uses the client settings from web.config / app.config
        theCache = New DataCacheFactory().GetDefaultCache()
    Catch ex As Exception
        Trace.TraceWarning("Cache could not be instantiated: {0}", ex.Message)
        theCache = Nothing
    End Try

DataCache which is a reference to the cache itself, and includes the Add, Get, Put, and Remove methods to manipulate objects in the cache. Each of these methods deals with the cached item as a System.Object; if you want to retrieve the object as well as additional metadata like the timeout and version, you can use the GetCacheItem method to return a DataCacheItem instance.
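The difference between those four methods is easy to blur, so here’s a toy in-memory model of their contract, sketched in Python. `ToyCache` is a hypothetical class invented for illustration and is not the DataCache API; it only mimics the behavior you can expect: Add fails if the key already exists (the real client throws a DataCacheException), Put adds or replaces, Get returns nothing on a miss, and Remove reports whether an item was actually deleted.

```python
class ToyCache:
    """In-memory stand-in that mimics the Add/Get/Put/Remove contract."""

    def __init__(self):
        self._items = {}

    def add(self, key, value):
        # Add only succeeds for a new key; KeyError here stands in for
        # the exception the real client raises on a duplicate key.
        if key in self._items:
            raise KeyError(f"key already exists: {key}")
        self._items[key] = value

    def put(self, key, value):
        # Put adds the item or silently replaces an existing one.
        self._items[key] = value

    def get(self, key):
        # Get returns None (Nothing in VB) on a miss rather than raising.
        return self._items.get(key)

    def remove(self, key):
        # Remove reports whether anything was actually deleted.
        if key in self._items:
            del self._items[key]
            return True
        return False

cache = ToyCache()
cache.add("tile-1", "first")
cache.put("tile-1", "second")       # replaces the value stored by add
print(cache.get("tile-1"))          # second
print(cache.get("missing"))         # None
print(cache.remove("tile-1"), cache.remove("tile-1"))  # True False
```

In practice this is why read-through code like the tile lookup tends to use Put rather than Add: two workers can miss on the same key at the same time, and Put lets the second write simply overwrite the first.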

Within the Azure Photo Mosaics application, the following code is used to retrieve the tile thumbnail images from the default cache (Line 5). If an image is not found in the cache, the original image is retrieved (Line 14), the thumbnail is generated (Line 18) and wrapped in a CacheableTile (Lines 16-19), and the result is then stored in the cache (Line 21), ready to service the next request.

    1: Dim cachedTile As CacheableTile
    2:  
    3: Me.Library.ImageRequests += 1
    4: Try
    5:     cachedTile = CType(Me.Library.Cache.Get(Me.TileUri.ToString()), CacheableTile)
    6: Catch
    7:     cachedTile = Nothing
    8: End Try
    9:  
   10: If (cachedTile Is Nothing) Then
   11:     Me.Library.ImageRetrieves += 1
   12:     Trace.TraceInformation(String.Format("Cache miss {0}", Me.TileUri.ToString()))
   13:  
   14:     Dim fullImageBytes As Byte() = Me.Library.TileAccessor.RetrieveImage(Me.TileUri)
   15:  
   16:     cachedTile = New CacheableTile() With {
   17:             .TileUri = Me.TileUri,
   18:             .ImageBytes = ImageUtilities.GetThumbnail(fullImageBytes, Me.Library.TileSize)
   19:         }
   20:  
   21:     Me.Library.Cache.Put(Me.TileUri.ToString(), cachedTile)
   22: Else
   23:     Trace.TraceInformation(String.Format("Cache hit {0}", Me.TileUri.ToString()))
   24: End If

Monitoring the Cache

The Windows Azure Management Portal includes some high-level statistics about your cache, namely the current size, peak size for the month, and peak size for the year; however, these statistics are not real time. Additionally, there is no way to determine transaction, bandwidth, or connection utilization. Because access to your cache can be throttled, you need to program defensively and handle DataCacheExceptions. For a quota exception, for instance, the SubStatus value will be set to DataCacheErrorSubStatus.QuotaExceeded. See the Capacity Planning for Caching in Windows Azure whitepaper for additional insight into effective use of your caches.
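One way to program defensively against throttling is to retry a few times with an increasing delay and then fall back to treating the request as a cache miss. The Python sketch below illustrates that idea only; `QuotaExceededError` is a hypothetical stand-in for a DataCacheException carrying the quota sub-status, and the attempt count and delays are arbitrary assumptions, not recommended values.

```python
import time

class QuotaExceededError(Exception):
    """Hypothetical stand-in for a quota-related DataCacheException."""

def get_with_retry(fetch, key, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call fetch(key); on throttling, back off and retry, then give up.

    Returns None when the cache stays throttled, so the caller can treat
    the result as a cache miss and go to the original source instead.
    """
    for attempt in range(attempts):
        try:
            return fetch(key)
        except QuotaExceededError:
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # double the wait each time
    return None

# Usage: a fetch that is throttled twice before succeeding.
calls = {"n": 0}

def flaky_fetch(key):
    calls["n"] += 1
    if calls["n"] < 3:
        raise QuotaExceededError("throttled")
    return f"value-for-{key}"

delays = []
result = get_with_retry(flaky_fetch, "tile-1", sleep=delays.append)
print(result, delays)  # value-for-tile-1 [0.1, 0.2]
```

Passing `sleep` in as a parameter keeps the example testable without real waits; returning None on exhaustion matches how a cache should behave, as an optimization the application can live without.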

Windows Server AppFabric Caching provides additional transparency into cache utilization for on-premises applications; for more information see the Windows Server AppFabric Caching Capacity Planning Guide.

If you do want to collect additional metrics on the cache utilization, consider overloading the Add, Put, Get, and other relevant methods of DataCache to maintain counters of utilization. In the Azure Photo Mosaics application, I added some simple properties to track cache hits and misses in each of the ImageProcessor Worker Roles:

    Public Property ImageRequests As Int32 = 0
    Public Property ImageRetrieves As Int32 = 0
    Public Property ColorValueRequests As Int32 = 0
    Public Property ColorValueRetrieves As Int32 = 0

The “requests” variables indicate the number of times an item was requested (tile thumbnail or color value), and “retrieves” indicate the number of times the item was retrieved from the original source (the same as a cache miss). Cache hits are calculated as requests – retrieves.
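The same bookkeeping can be sketched independently of the application. In the Python example below, `CountingCache` is a hypothetical wrapper (not part of the Photo Mosaics code) around a dict and a loader function; it tracks requests and retrieves the way the properties above do, and derives hits and a hit ratio from them.

```python
class CountingCache:
    """Wraps a dict-based cache and a loader, tracking requests and retrieves."""

    def __init__(self, loader):
        self._cache = {}
        self._loader = loader   # fetches from the original source on a miss
        self.requests = 0       # every lookup
        self.retrieves = 0      # lookups that went to the original source

    def get(self, key):
        self.requests += 1
        if key not in self._cache:
            self.retrieves += 1
            self._cache[key] = self._loader(key)
        return self._cache[key]

    @property
    def hits(self):
        # Cache hits are requests minus retrieves.
        return self.requests - self.retrieves

    @property
    def hit_ratio(self):
        return self.hits / self.requests if self.requests else 0.0

tiles = CountingCache(loader=lambda uri: f"thumbnail-of-{uri}")
for uri in ["a.jpg", "b.jpg", "a.jpg", "a.jpg"]:
    tiles.get(uri)
print(tiles.requests, tiles.retrieves, tiles.hits, tiles.hit_ratio)  # 4 2 2 0.5
```

Four requests over two distinct images yield two retrieves and two hits, for a hit ratio of 0.5.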

In the next post, I’ll continue on this theme of caching by comparing several implementations of the ImageProcessor component that use various approaches to caching.

Some FAQs about Windows Azure Caching

Can I control how long an item will be cached? By default, items expire in 48 hours. You cannot override the expiration policy for a cache (in Windows Azure); however, you can specify expiration times on an item-by-item basis when adding items to the cache. There is no guarantee an item will be cached for the duration requested, since under memory pressure the least recently used items are evicted first.
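The interplay between expiration and eviction can be illustrated with a small model. The Python sketch below is an assumption-laden toy, not a description of how the service is implemented: `ExpiringLruCache` is a hypothetical class, capacity is counted in items rather than bytes, and the clock is injected so the example runs without real waiting.

```python
from collections import OrderedDict

class ExpiringLruCache:
    """Toy cache with per-item expiration and least-recently-used eviction."""

    def __init__(self, capacity, default_ttl, clock):
        self._items = OrderedDict()   # key -> (value, expires_at), oldest first
        self._capacity = capacity     # item count stands in for memory pressure
        self._default_ttl = default_ttl
        self._clock = clock           # callable returning current time in seconds

    def put(self, key, value, ttl=None):
        lifetime = self._default_ttl if ttl is None else ttl
        self._items[key] = (value, self._clock() + lifetime)
        self._items.move_to_end(key)
        while len(self._items) > self._capacity:
            self._items.popitem(last=False)   # evict the least recently used

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]      # expired
            return None
        self._items.move_to_end(key)  # mark as recently used
        return value

now = [0]
cache = ExpiringLruCache(capacity=2, default_ttl=48 * 3600, clock=lambda: now[0])
cache.put("a", 1)                 # default 48-hour lifetime
cache.put("b", 2, ttl=60)         # per-item lifetime
cache.get("a")                    # touch "a" so "b" is least recently used
cache.put("c", 3)                 # over capacity: "b" goes before its time is up
print(cache.get("b"))             # None
now[0] = 48 * 3600
print(cache.get("a"))             # None (default lifetime elapsed)
```

Item "b" disappears even though its requested lifetime has not elapsed, which is exactly the "no guarantee" caveat: a requested duration is an upper bound, not a promise.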

Can I clear the cache manually or programmatically? Windows Azure Caching does not provide this capability at this time.

Is there a limit on what can be cached? Items are cached in a serialized XML format (although you can provide for custom serialization as well) and must be 8 MB or less after serialization.
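Because the limit applies to the serialized form, it can be worth measuring an item before attempting to cache it. The Python sketch below shows the idea only: `fits_in_cache` is a hypothetical helper, `pickle` stands in for the serializer (the service uses .NET serialization, so real sizes will differ), and the limit is passed in as a parameter rather than hard-coded.

```python
import pickle

def fits_in_cache(item, max_bytes):
    """Return (ok, size), where size is the serialized length in bytes."""
    size = len(pickle.dumps(item))
    return size <= max_bytes, size

small_tile = {"uri": "img1.jpg"}
large_tile = {"uri": "img2.jpg", "image_bytes": bytes(1000)}

print(fits_in_cache(small_tile, max_bytes=512)[0])  # True
print(fits_in_cache(large_tile, max_bytes=512)[0])  # False
```

An item that fails the check can simply skip the cache and be regenerated on demand.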

How much does caching cost? The costs for caching are wholly based upon the size of the cache (but do not include data transfer rates out of the data center, if applicable). As of this writing (December 2011) the cost schedule is as follows:

Cache Size    Monthly Cost
128 MB        $45.00
256 MB        $55.00
512 MB        $75.00
1 GB          $110.00
2 GB          $180.00
4 GB          $325.00

Beyond cache size, are there other constraints? Yes, each cache size designation comes with associated bandwidth, transaction, and connection limits. Since caching is a shared resource, it’s possible that your usage will be throttled to fall within the limitations listed below (current as of December 2011):

Cache Size    Transactions (1000s) per Hour    Bandwidth (MB) per Hour    Concurrent Connections
128 MB        400                              1400                       10
256 MB        800                              2800                       10
512 MB        1600                             5600                       20
1 GB          3200                             11200                      40
2 GB          6400                             22400                      80
4 GB          12800                            44800                      160
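Since the quotas scale with cache size, the size you need is not always dictated by how much data you store. As a rough illustration, the Python sketch below encodes the December 2011 figures from the two tables above and picks the smallest size that satisfies all four requirements; `smallest_tier` and the workload numbers in the usage example are hypothetical.

```python
# (size_mb, monthly_cost_usd, transactions_1000s_per_hour,
#  bandwidth_mb_per_hour, concurrent_connections), December 2011 figures
TIERS = [
    (128, 45.00, 400, 1400, 10),
    (256, 55.00, 800, 2800, 10),
    (512, 75.00, 1600, 5600, 20),
    (1024, 110.00, 3200, 11200, 40),
    (2048, 180.00, 6400, 22400, 80),
    (4096, 325.00, 12800, 44800, 160),
]

def smallest_tier(size_mb, tx_thousands_per_hour, bandwidth_mb_per_hour, connections):
    """Return the first (cheapest) tier meeting every requirement, or None."""
    for tier in TIERS:  # ordered smallest and cheapest first
        t_size, _, t_tx, t_bw, t_conn = tier
        if (size_mb <= t_size
                and tx_thousands_per_hour <= t_tx
                and bandwidth_mb_per_hour <= t_bw
                and connections <= t_conn):
            return tier
    return None

# 300 MB of tiles would fit in the 512 MB cache, but 25 concurrent
# connections exceed its limit of 20, so the 1 GB cache is selected.
choice = smallest_tier(size_mb=300, tx_thousands_per_hour=500,
                       bandwidth_mb_per_hour=2000, connections=25)
print(choice[0], choice[1])  # 1024 110.0
```

Here the connection count, not the data volume, drives the choice, which is the kind of trade-off the capacity planning guidance is meant to surface.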

Is there guidance on how to select the right cache size for my application? Yes, see the Capacity Planning for Caching in Windows Azure whitepaper.

Where can I read about those differences between Windows Azure Caching and Windows Server AppFabric Caching? The MSDN article Differences Between Caching On-Premises and in the Cloud covers that topic.
