Volume 27 Number 10
Forecast: Cloudy - Microsoft Azure In-Role Caching
By Joseph Fultz | October 2012
The old notion that luck favors the prepared is meant to convey the idea that no matter how lucky you are, you need to be prepared in order to capitalize on the lucky occurrence. I’ve often thought this statement describes caching pretty accurately. If you’re lucky enough for the universe to align in such a way as to drive high use of your site and services, you’d better be prepared to serve the content quickly.
Back in January I covered some concepts related to caching that focused rather tactically on some coding approaches (msdn.microsoft.com/magazine/hh708748). With the addition of the dedicated and co-located roles for caching in the Azure Caching (Preview), which I’ll refer to here as simply Caching Preview, I felt it would be useful to consider how to use these roles as part of the overall solution architecture. This won’t be an exhaustive coverage of caching features; instead, it’s intended to be a designer’s view of what to do with the big blocks.
A Cache by Any Other Name …
… is not the same. Sure, the back-end implementation is pretty similar and, like its forerunner Azure Shared Caching, Caching Preview will move the data you fetch into the local cache client. More important, though, Caching Preview introduces some capabilities missing from the Shared Cache, so switching to role-based caching not only expands the available feature set, it also gives you better control over the deployment architecture. To start, let’s clarify the primary difference between the dedicated and co-located roles: configuration.
When configuring the cache nodes, you have the option of dedicating the entire role to the cache or setting aside just a percentage of the role. Just as a way to quickly consider the implications of reserving RAM for the co-located cache, take a look at Figure 1, which shows remaining usable RAM after the cache reservation. (Note that the co-located option isn’t available in the X-Small instance.)
Figure 1 Remaining RAM
| Virtual Machine Size | Total RAM | 10%/90% Reserved/Available | 20%/80% Reserved/Available | 40%/60% Reserved/Available |
| --- | --- | --- | --- | --- |
| X-Large | 14GB | 1.4GB / 12.6GB | 2.8GB / 11.2GB | 5.6GB / 8.4GB |
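The arithmetic behind Figure 1 is simply total RAM times the reserved fraction. A minimal sketch (the helper function is purely illustrative, not part of any Azure API; the 14GB figure is the X-Large size from the table):

```python
def colocated_split(total_ram_gb, cache_fraction):
    """Split a role's RAM into the co-located cache reservation
    and what remains available for the application itself."""
    reserved = total_ram_gb * cache_fraction
    available = total_ram_gb - reserved
    return reserved, available

# X-Large role with 14GB total, as in Figure 1
for pct in (10, 20, 40):
    reserved, available = colocated_split(14, pct / 100)
    print(f"{pct}%: {reserved:.1f}GB reserved / {available:.1f}GB available")
```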
Often the first thought is to simply choose some medium or small size and allocate some amount of memory. As long as the amount of memory allocated is sufficient for its intended use and within the boundary of the available RAM, this is a fine approach. However, if the number of objects is high and there’s a reasonable expectation that the cache client on each machine might be holding its maximum number of objects, the result could be unexpected memory pressure. Moreover, too little cache RAM could lead to unwanted cache evictions, reducing the overall effectiveness of the cache.
Figure 2 shows the percentage of RAM use based on virtual machine (VM) size. The chart is based on the one at msdn.microsoft.com/library/hh914152, which shows the amount of RAM available for caching in dedicated mode.
Figure 2 Cache Use for Dedicated Role
| Virtual Machine Size | Available Memory for Caching | % of RAM Use Based on Virtual Machine Size |
| --- | --- | --- |
In my co-located grid (Figure 1), I didn’t go beyond 40 percent allocation for the co-located type because I assumed I’d need a majority of the RAM for the application. In comparison, the dedicated version usually provides more RAM, but appears to hit maximum efficiency of RAM allocation at the large VM size. In that sense, two medium VMs are less useful than one large. Of course, a single large instance can’t support options you might want in your caching infrastructure, such as high availability (HA), which duplicates your data. Still, it’s worth weighing the need for space against the need for redundancy and choosing a configuration that not only meets technical needs, but also optimizes cost.
When caching is done purposefully, a RAM drought typically isn’t an issue. However, in cases where the shared cache is used to back session objects and/or the output cache, the situation is a bit more challenging because of the tendency to use session for everything and the difficulty in predicting exact load. For example, if you’re running a Model-View-Controller app that has deep models you’re placing in Session and you increase the maximum number of objects for the cache client, you might encounter undesired results under a medium or greater load. This would surface as slower site performance caused by evictions from the shared cache, from memory pressure you didn’t expect; don’t forget, the cache client is likely holding more RAM than anticipated due to the combination of an increased max object count and a deep graph. The framework helps you out a bit by compressing the serialized objects, but for such a precious and finite resource as RAM, diligence in accounting is the best practice, especially when trying to share the RAM among the application, output cache, session objects, data cache and cache client. To assist you in sizing your cache, Microsoft has published the Capacity Planning Guide spreadsheet, which you can find at msdn.microsoft.com/library/hh914129.
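A back-of-the-envelope version of that accounting might look like the following. The workload numbers and the 20 percent per-item overhead factor are made-up illustrations; the Capacity Planning Guide spreadsheet does this exercise properly:

```python
def estimated_cache_need_mb(avg_object_kb, object_count, overhead_factor=1.2):
    """Rough cache footprint: serialized object size times count,
    padded for per-item overhead (keys, headers, bookkeeping)."""
    return avg_object_kb * object_count * overhead_factor / 1024

# Hypothetical workload: 5,000 concurrent sessions at ~40KB serialized each,
# plus 80,000 small reference items at ~1KB each.
need_mb = estimated_cache_need_mb(40, 5_000) + estimated_cache_need_mb(1, 80_000)
reserved_mb = 0.20 * 14 * 1024  # 20% of an X-Large role's 14GB
print(f"Estimated need: {need_mb:.0f}MB of {reserved_mb:.0f}MB reserved")
```

If the estimate approaches the reservation, expect eviction pressure under load; that's exactly the symptom described above.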
Regions add some nice functionality, but at a cost: a region is pinned to a single cache host, so every request for an object in that region that isn’t already in the local cache client must funnel through that one host. The upside is that regions provide tagging capability. My favorite use of regions is to hold pre-fetched reference data. Given the bottleneck problem, that might seem like folly at first, but let’s take a closer look.
To consider the cache use, let’s postulate that I have a catalog of 10,000 products with up to eight variants for each product, meaning a catalog of potentially 80,000 items. If each object representing an item averages 1K, that’s about 82MB to shuffle around on each request, as well as take up space in the cache client. In addition, there will be some number of virtual catalogs that are either a full copy or subset of the original, so I could end up with an explosion of reference data to be shuffled about, all served by the single region host (see Figure 3).
Figure 3 Cache Layout with a Single Region
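The catalog estimate is just multiplication, but it’s worth making explicit (decimal megabytes, with an assumed 1KB average item):

```python
products, variants_per_product, avg_item_bytes = 10_000, 8, 1_024
total_items = products * variants_per_product        # up to 80,000 items
total_bytes = total_items * avg_item_bytes
print(f"{total_bytes / 1_000_000:.0f}MB")            # ~82MB of catalog data
```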
However, with a little work I can create more regions to hold subsections of the data. For example, I might break my catalog into departments or segments. I could, for example, create one region for consumer products and one for professional products, resulting in something like what’s shown in Figure 4.
Figure 4 Cache Layout with Two Regions
This provides a little more granularity in the cache, enabling the use of smaller roles to hold all of my cache objects, ease cache updates, and decrease traffic by reducing the queries to each role and by filtering through the use of tag queries.
The ability to tag content is the primary function driving the use of regions. Thus, I can mark the content in my catalogs; for computers, for example, I might have tags such as: “laptop,” “4GB,” “8GB,” “15 in.,” “HD Audio,” “Desktop” and so on. In this way I can enable such UI elements as a faceted product search for navigation by using a call to one of the GetObjectsByTag methods. It also means reengineering the data access layer and, in some regard, treating the cache more as the primary data source by which the queries on the facets (tags) of data are satisfied.
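To make the faceted-lookup idea concrete, here’s a toy in-memory model of tagged regions. It mimics the shape of a GetObjectsByTag-style query; it is not the actual Caching client library, and all names and SKUs are invented for illustration:

```python
from collections import defaultdict

class RegionCache:
    """Toy model of tagged cache regions: each item lives in exactly one
    region (one host) and carries a set of tags for faceted lookups."""
    def __init__(self):
        self._regions = defaultdict(dict)  # region -> {key: (value, tags)}

    def put(self, region, key, value, tags=()):
        self._regions[region][key] = (value, set(tags))

    def get_objects_by_all_tags(self, region, tags):
        """Return items in the region carrying every requested tag."""
        wanted = set(tags)
        return {key: value
                for key, (value, item_tags) in self._regions[region].items()
                if wanted <= item_tags}

cache = RegionCache()
cache.put("consumer", "sku-1", "15in laptop", tags={"laptop", "15 in.", "8GB"})
cache.put("consumer", "sku-2", "desktop", tags={"Desktop", "8GB"})
laptops_8gb = cache.get_objects_by_all_tags("consumer", {"laptop", "8GB"})
print(laptops_8gb)  # only sku-1 carries both facets
```

Note that the region argument is required on every call: the query never spans hosts, which is exactly the pinning trade-off discussed above.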
An interesting way to take advantage of this feature is to use Azure Storage Tables as the back-end datastore, but to pre-fetch the data, tag it and put it into cache. This provides some of the filtering missing from the current incarnation of Storage Tables while keeping costs to a minimum.
Using regions provides a lot of flexibility in retrieving cached data, but do note the specific type of strain it places on the deployment infrastructure. Still, regions are handy as a means to pre-fetch and access reference data.
There’s a funny thing to consider with HA caches—you use an HA cache to be careful, but you need to be careful when using HA. At least, you need to be appropriately thoughtful about what really needs to be highly available.
Because every role enabled for duplication doubles the amount of space needed for the actual objects, you run out of RAM much faster. Thus, as a matter of design, it’s best to use HA only for those features that actually need it or that would vastly improve UX so as to not arbitrarily drive up costs or artificially trigger cache evictions due to memory starvation resulting from overconsumption by duplicated caches.
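The rule of thumb is easy to encode: an HA-enabled cache stores each object twice, so the same RAM holds roughly half as much unique data. A sketch, with illustrative numbers:

```python
def effective_capacity_gb(cache_ram_gb, ha_enabled):
    """With HA, each item is duplicated on a second host, so the same
    RAM holds roughly half as much unique data."""
    return cache_ram_gb / 2 if ha_enabled else cache_ram_gb

# Two dedicated roles contributing ~6GB of cache RAM each (illustrative)
raw = 2 * 6
print(effective_capacity_gb(raw, ha_enabled=True))   # 6.0
print(effective_capacity_gb(raw, ha_enabled=False))  # 12
```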
I’ve seen some guidance that suggests putting session objects into the HA cache so you can query across users in active sessions based on certain tag values. In most instances this isn’t a useful approach, because it distributes the load of retrieving session objects unevenly; that load pattern should adhere more closely to the load balancing of the site. Furthermore, because you may well have a lot of empty anonymous profiles in addition to under-identified registered users, a tag-based search across entities such as user profiles is more limiting than helpful.
I suggest you put user objects, sessions, output caching and the like into their own named caches, but don’t enable them for duplication. In cases where edit data is tied to a session, you might consider backing the session with an HA cache, depending on where you are in the application’s lifecycle. If the app is still being designed and created, it’s better to separate those in-situ state objects from the session and place them in an HA cache of their own. This lets you manage the edit data related to a user beyond the scope of the session while keeping the session in a much more evenly spread cache. However, if your app is further along and data you don’t want to lose for legal, financial or just ease-of-use reasons is bound up with the session object, it’s acceptable to simply wire the session to the HA cache; just be sure to specifically stress that in your load tests and know the limits of the implementation. Beyond data that’s important due to its content or its point in a process, such as the data backing an editing interface, the types of data that are immediate targets for HA are large reference sets, pre-fetched data and pre-calculated data.
The commonality among all of these is the cost to pre-fetch that data again. In the case of pre-fetched or pre-calculated reference data, the start-up cost to prime the cache can be quite significant, and losing the data at runtime might have a severe, even catastrophic, impact on site execution. Figure 5 depicts how the cache objects might be allocated with HA turned on. Because the duplicates must live in a different fault domain, you can see how they reduce the overall RAM available to the cache. That isn’t necessarily bad; it’s simply what HA requires. I’m suggesting only a conscious awareness of its potential impact.
Figure 5 High Availability and Fault Domains
Developers often like to think of development as building with Lego blocks; you create the basic blocks and snap them together into a useful application. This idea remains true even as we move further up the application stack from functions to objects to components to infrastructure. To that end, I want to leave you with some design guidance.
First, use all of the tools to your advantage. Don’t settle on only HA or on no HA because one way is easier. Don’t use only region-based caches because you can search them; or forego them because they get pinned to an instance. Rather, construct your caching infrastructure to suit your needs.
Figure 6 shows the dedicated roles I use to house my duplicated caches and the regions I use for more richly searchable caches. As a general rule, I favor dedicated cache roles for housing regions, because I don’t want to load down a role that’s serving user traffic with all of the traffic for cache fetches related to a given region. The bottom row in Figure 6 depicts using the co-located style of caching to hold session, output and various other data I might cache during the execution of my application. This is not to say I’m stuck with dedicated or co-located roles as depicted, as that mostly depends on the RAM requirements of the items I intend to cache in the roles. Indeed, for many implementations, the bottom row alone will do the trick with no need for HA, regions or the large amount of RAM afforded by the dedicated role.
Figure 6 Cache Deployment Possibilities
Finally, Figure 7 is a grid that identifies my starting point for different types of data when I’m considering how to architect my cache deployment.
Figure 7 Configuration Preferences
| Type of Data | Use HA | Use Region | Dedicated | Co-Located |
| --- | --- | --- | --- | --- |
This is in no way meant to dictate usage or suggest a role or feature is useful only for the slot I’ve identified. It’s just my starting point. If I had a large set of pre-fetched data I wanted to be able to search by tag, I’d combine the items I marked, ending up with a dedicated cache role that uses regions and HA. As an example of when I might deviate from my starting point, if I have a deployed application that uses session to cache its models while the user edits data, I would most likely toss my tendency to put session in a co-located cache and not only put it in a dedicated role, but enable HA as well.
So, if you’re lucky enough to have a busy site, make sure your luck will continue by properly preparing your caching infrastructure.
Joseph Fultz is a software architect at Hewlett-Packard Co., working as part of the HP.com Global IT group. Previously he was a software architect for Microsoft, working with its top-tier enterprise and ISV customers to define architecture and design solutions.
Thanks to the following technical experts for reviewing this article: Rohit Sharma and Hanz Zhang