Toolbox

Caching, Object-Object Mapping, Blogs and More

Scott Mitchell

Improving Web Application Performance with Distributed Caching

For most data-driven Web applications, each page displays a variety of information from the database. For example, a typical Web page at Amazon.com shows details about a product and includes user reviews, related products, information about the shopper, and so on. Consequently, whenever a Web page is requested, the application must issue a cascade of queries to the database to retrieve the information displayed on the page. This "chatty" behavior works fine for Web applications with light traffic, but it does not scale well.

Caching is one of the most effective tools for reducing load and improving scalability in read-dominated, data-driven Web applications. ASP.NET includes a built-in caching API that uses an in-memory backing store and includes features like time-based expiries and file system and database dependencies. There's also the Caching Application Block in the Enterprise Library, which can be used outside of ASP.NET applications and offers greater flexibility in terms of how and where the cache data is stored. However, both the ASP.NET cache and the Caching Application Block store their cache data locally. This results in suboptimal performance in a Web farm environment because the data cached on one server is not accessible to the other servers in the farm. One option is to designate a single server in the farm as the centralized cache server, have it store the only copy of the cache, and share it among the others. However, this approach introduces a single point of failure and a potential bottleneck.

Distributed caching overcomes the shortcomings of having several localized caches or resorting to a single, centralized cache store. In a nutshell, a distributed cache either replicates or partitions the cache store across multiple servers, providing a more efficient caching strategy in a Web farm environment.
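To make the partitioning idea concrete, here is a minimal sketch of how a client library might decide which server in the farm holds a given cache key. The CachePartitioner class, its method names and the server names in the usage note are hypothetical and are not taken from memcached, Velocity or any other product. The sketch hashes the key and takes the result modulo the number of servers, so every Web server arrives at the same answer without consulting a central directory.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch: decides which cache server owns a given key.
public static class CachePartitioner
{
    // FNV-1a-style hash over the key's characters. A hand-rolled hash is used
    // because string.GetHashCode is not guaranteed to be stable across processes,
    // and every Web server must compute the same value for the same key.
    public static uint Hash(string key)
    {
        unchecked
        {
            uint hash = 2166136261;
            foreach (char c in key)
            {
                hash ^= c;
                hash *= 16777619;
            }
            return hash;
        }
    }

    // Maps the key to one server: hash modulo the number of servers.
    public static string ServerFor(string key, IList<string> servers)
    {
        if (key == null)
            throw new ArgumentNullException("key");
        if (servers == null || servers.Count == 0)
            throw new ArgumentException("At least one cache server is required.", "servers");

        return servers[(int)(Hash(key) % (uint)servers.Count)];
    }
}
```

Calling CachePartitioner.ServerFor("UserProfile42", servers) with a list such as "cache1", "cache2" and "cache3" always returns the same entry for the same key. One caveat with plain modulo hashing is that adding or removing a server remaps most keys, which is why real client libraries often use a consistent-hashing scheme instead.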

A variety of distributed caching tools are available. One of the most popular is memcached (version 1.2.8), a free, open-source option created by Danga Interactive and used on high-profile sites like LiveJournal, Wikipedia, and SourceForge. Memcached has its roots in the LAMP stack (Linux, Apache, MySQL, PHP), but there are community-created Windows ports and .NET libraries available, along with open-source custom provider classes for integration with ASP.NET's session state. Microsoft is busy working on its own distributed caching library, code-named "Velocity," which at the time of this writing is available as a community technology preview. And there are also commercial distributed caching tools, such as ScaleOut StateServer and ScaleOut SessionServer by ScaleOut Software, and NCache by Alachisoft. (NCache was reviewed in the October 2007 issue: msdn.microsoft.com/en-us/magazine/cc163343.aspx.)

To get started with any distributed caching tool, you must first define the distributed cache's topology. With memcached, you simply start the memcached service or application on those computers that will store cache data, specifying parameters like the cache size via command-line switches. Velocity and most commercial offerings provide both command-line access and graphical user interfaces for creating and managing the topology.

The patterns for reading from and writing to the distributed cache are no different from those used with ASP.NET's built-in caching API. Both types of caches act as a giant hashtable, where each item in the cache is referenced by a unique string. The pseudocode in Figure 1 shows how data is read from the cache. When the "Get" statement is executed, the distributed cache library determines where the cached item exists in the topology and retrieves the data. Note that the client application cannot assume that the data exists in the cache because it may have expired, been removed by user code or because of a dependency, or been evicted because of memory constraints. Instead, the client application must always check whether data was returned from the cache. If the item is not found, then it must be re-retrieved and re-added to the cache.

Figure 1 Typical "Get" Request

Function GetUserProfile(UserID)
  UserProfile = Cache.Get("UserProfile" + UserID)
  If UserProfile = NULL Then
    UserProfile = Database.GetUserProfile(UserID)
    Cache.Add("UserProfile" + UserID, UserProfile)
  End If
  Return UserProfile
End Function

Whenever data is updated, any references to that data in the cache become outdated. To prevent showing stale data, all cached references must be removed or updated. Figure 2 contains pseudocode that would run when a user visiting the site updates her profile. This method not only updates the database but also updates the associated cache item with the new data. Other techniques for maintaining fresh data in the cache include expiries and cache dependencies.

Figure 2 Typical "Update" Request

Function UpdateUserProfile(UserProfile)
  Database.UpdateUserProfile(UserProfile)
  Cache.Update("UserProfile" + UserProfile.UserID, UserProfile)
End Function
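For readers who prefer compilable code, the read and update patterns in Figures 1 and 2 might look something like the following C# sketch. Everything here is an assumption made for illustration: the ICache interface, the InMemoryCache stand-in, the UserProfileRepository class and the DisplayName property are hypothetical and do not correspond to the API of memcached, Velocity or ASP.NET. The in-memory dictionary merely lets the sketch run without a cache server, and the database calls are passed in as delegates.

```csharp
using System;
using System.Collections.Generic;

public class UserProfile
{
    public int UserID { get; set; }
    public string DisplayName { get; set; }
}

// Hypothetical minimal cache contract; real client libraries expose their own APIs.
public interface ICache
{
    bool TryGet(string key, out object value);
    void Set(string key, object value);
}

// In-memory stand-in for a distributed cache client (not thread-safe).
public class InMemoryCache : ICache
{
    private readonly Dictionary<string, object> items = new Dictionary<string, object>();

    public bool TryGet(string key, out object value)
    {
        return items.TryGetValue(key, out value);
    }

    public void Set(string key, object value)
    {
        items[key] = value;
    }
}

public class UserProfileRepository
{
    private readonly ICache cache;
    private readonly Func<int, UserProfile> loadFromDatabase;
    private readonly Action<UserProfile> saveToDatabase;

    public UserProfileRepository(ICache cache,
        Func<int, UserProfile> loadFromDatabase,
        Action<UserProfile> saveToDatabase)
    {
        this.cache = cache;
        this.loadFromDatabase = loadFromDatabase;
        this.saveToDatabase = saveToDatabase;
    }

    private static string KeyFor(int userId)
    {
        return "UserProfile" + userId;
    }

    // Figure 1: try the cache first; on a miss, load from the database and re-add.
    public UserProfile GetUserProfile(int userId)
    {
        object cached;
        if (cache.TryGet(KeyFor(userId), out cached))
            return (UserProfile)cached;

        UserProfile profile = loadFromDatabase(userId);
        cache.Set(KeyFor(userId), profile);
        return profile;
    }

    // Figure 2: write to the database, then refresh the cached copy.
    public void UpdateUserProfile(UserProfile profile)
    {
        saveToDatabase(profile);
        cache.Set(KeyFor(profile.UserID), profile);
    }
}
```

With this arrangement, two consecutive calls to GetUserProfile for the same user hit the database only once, and a call to UpdateUserProfile leaves the cache holding the new data rather than a stale copy.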

Caching is an essential component in building a scalable, data-driven Web application. For large, heavily trafficked Web sites that use a Web farm, consider using a distributed cache to maximize performance. Tools like memcached, Velocity and others provide an easy-to-use API for working with the cache and encapsulate the low-level details of maintaining, updating and accessing a distributed cache.

memcached:

danga.com/memcached

Velocity:

msdn.microsoft.com/en-us/data/cc655792.aspx

Blogs of Note

Most of the technical blogs I subscribe to focus on the technologies I use on a day-to-day basis, including ASP.NET, AJAX, Web design and so forth. But I also make a point to find and read blogs from experts in other fields. To me, an expert is a person who has a wealth of knowledge and, more important, real-world experience and can share this wisdom in a way that's clear and meaningful―even to developers who are not well-versed in the technology.


Udi Dahan fits this description. Dahan is a speaker, trainer, and consultant on software architecture and design of distributed systems and has worked on numerous large-scale, service-oriented applications for enterprises. He shares his insights on his blog, on online resource sites and in MSDN Magazine (see msdn.microsoft.com/en-us/magazine/dd569749.aspx and msdn.microsoft.com/en-us/magazine/cc663023.aspx).

If you haven't visited Dahan's blog before, start with the "First time here?" page (udidahan.com/first-time-here), where you'll find his most popular articles, blog posts, interviews and webcasts on service-oriented architecture (SOA), domain models and smart client applications. Also check out the Articles section, udidahan.com/articles, which contains links to Dahan's published content. And when you're ready to implement a SOA, check out nServiceBus, Dahan's free, open-source messaging framework for .NET applications.

nServiceBus:

nservicebus.com

Udi Dahan’s Blog:

udidahan.com/?blog=true

A Helpful Utility for Object-Object Mapping

An object-oriented application architecture models the real-world problem domain through the use of objects and through object-oriented principles like inheritance, modularity and encapsulation. A point-of-sale application would have classes like Employee, Customer, OrderItem and Product. Figure 3 shows how these classes might be composed to model a Sale.

Figure 3 Parts of the Sale Object

public class Sale {
  public Employee SalesPerson { get; set; }
  public Customer Customer { get; set; }
  public IEnumerable<OrderItem> Items { get; set; }
}

public class OrderItem {
  public Product Product { get; set; }
  public int Quantity { get; set; }
  public decimal Discount { get; set; }
}

While this model works well within the application domain, it may be a less-than-ideal model for moving data outside of the domain. For example, imagine that our application included a Windows Communication Foundation (WCF) service that exposed sales data to a business partner. While the service could return a collection of Sale objects, those objects might contain more data than we care to expose. The Employee object or Customer object in the Sale object might include sensitive information, like Social Security numbers or payment details. The Products referenced by the OrderItems might include unimportant details that unnecessarily inflate the size of the transmitted payload.

A common technique for overcoming these issues is to define Data Transfer Objects (DTOs), such as SalesDTO, EmployeeDTO, CustomerDTO and so on. These DTOs would contain the precise set of properties to share. The code for the service would internally work with the domain models―Sale, Employee and so forth―but before returning the data, it would create the corresponding DTOs and populate them with the appropriate properties from the domain objects.
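As an illustration of the hand-written approach, the sketch below maps a domain object onto a DTO that leaves out the sensitive field. The property lists of Customer and CustomerDTO, along with the CustomerMapping helper, are invented for this example and are not taken from any real application.

```csharp
using System;

// Domain object: carries everything the application needs internally.
public class Customer
{
    public string Name { get; set; }
    public string Email { get; set; }
    public string SocialSecurityNumber { get; set; } // sensitive; must not leave the domain
}

// DTO: only the properties we are willing to share with the business partner.
public class CustomerDTO
{
    public string Name { get; set; }
    public string Email { get; set; }
}

// Hypothetical hand-written mapping code of the kind a service would need per type.
public static class CustomerMapping
{
    public static CustomerDTO ToDto(Customer customer)
    {
        if (customer == null)
            throw new ArgumentNullException("customer");

        return new CustomerDTO
        {
            Name = customer.Name,
            Email = customer.Email
        };
    }
}
```

This is only a handful of lines for one small type, but a similar method is needed for every domain type the service exposes, and each one must be kept in sync as properties are added or renamed.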

Writing the domain-object-to-DTO mapping code is tedious. If you find yourself routinely writing such object-to-object mapping code, check out AutoMapper version 0.3.1. AutoMapper is a free, open-source utility that can automatically map one object onto another with as little as two lines of code.

To start, call the Mapper.CreateMap method and specify the source and destination types like so: Mapper.CreateMap<SourceType, DestinationType>(). This creates a MappingExpression object that defines the mapping between the two object types. If there are nested types (as there are in Figure 3), you would call CreateMap once for each type that needs to be mapped.

After creating the mapping, call Mapper.Map and pass it the source object: Mapper.Map<SourceType, DestinationType>(sourceObject). The Map method returns an instance of the destination type whose properties have been assigned from the corresponding members of the source object.
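To get a feel for what mapping by convention involves, consider the following reflection-based sketch, which copies every readable source property onto a writable destination property of the same name and a compatible type. This is not AutoMapper's implementation and does not use its API. SimpleMapper is a hypothetical helper, and the Product and ProductDTO property lists are invented for illustration rather than taken from the article's figures.

```csharp
using System;
using System.Reflection;

public class Product
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
    public decimal WholesaleCost { get; set; } // internal detail, absent from the DTO
}

public class ProductDTO
{
    public Guid Id { get; set; }
    public string Name { get; set; }
    public decimal Price { get; set; }
}

// Hypothetical helper: a bare-bones, convention-based object-to-object mapper.
public static class SimpleMapper
{
    public static TDestination Map<TSource, TDestination>(TSource source)
        where TDestination : class, new()
    {
        if (source == null)
            throw new ArgumentNullException("source");

        TDestination destination = new TDestination();

        foreach (PropertyInfo destProp in typeof(TDestination).GetProperties())
        {
            if (!destProp.CanWrite)
                continue;

            // Match purely by property name.
            PropertyInfo sourceProp = typeof(TSource).GetProperty(destProp.Name);
            if (sourceProp == null || !sourceProp.CanRead)
                continue;

            // Skip properties whose types do not line up; no conversion is attempted.
            if (!destProp.PropertyType.IsAssignableFrom(sourceProp.PropertyType))
                continue;

            destProp.SetValue(destination, sourceProp.GetValue(source, null), null);
        }

        return destination;
    }
}
```

Calling SimpleMapper.Map<Product, ProductDTO>(product) yields a ProductDTO with Id, Name and Price filled in, while WholesaleCost is ignored because the DTO has no matching property. A production-quality mapper also has to handle nested types, collections, type conversion and configuration validation, which is where a library earns its keep.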

Figures 4 and 5 illustrate an end-to-end example. Figure 4 defines the two objects in this example: Product and ProductDTO. Figure 5 shows code from our WCF service. Here we have a Product object that we need to map onto a ProductDTO object to return to the client. Note how this mapping is performed by AutoMapper's Mapper class with just two lines of code.

Figure 5 AutoMapper's Mapper.Map Method Creates a New ProductDTO Object Based on the Specified Product

public ProductDTO GetProduct(Guid productId) {
  Product product = Product.Load(productId);

  // Map Product onto a ProductDTO object
  Mapper.CreateMap<Product, ProductDTO>();
  ProductDTO productDto = Mapper.Map<Product, ProductDTO>(product);

  return productDto;
}

AutoMapper can also map a collection of one type to a collection of another, such as mapping a List of Product objects to an array of ProductDTO objects.

In the real world, it's not always possible to have the property names or property types neatly line up between the source and destination object types, but that's no problem for AutoMapper. With one line of code, you can project a property (or properties) in the source type to a differently named property in the destination type. If the mapped property types do not align, AutoMapper can automatically convert the source property type to the destination property type if there is a suitable type converter in the .NET Framework. If no such type converter exists, you can create a custom type converter.

Price: Free, open source

codeplex.com/AutoMapper

Scott Mitchell, author of numerous books and founder of 4GuysFromRolla.com, is an MVP who has been working with Microsoft Web technologies since 1998. Mitchell is an independent consultant, trainer and writer. Reach him at Mitchell@4guysfromrolla.com or via his blog at ScottOnWriting.net.