October 2010

Volume 25 Number 10

Interoperability - Runtime Data Sharing Through an Enterprise Distributed Cache

By Iqbal Khan | October 2010

Many organizations use a combination of Microsoft .NET Framework and Java applications, especially midsize to large organizations that can’t commit to only one technology for various reasons. Often, they employ Web applications, service-oriented architecture (SOA) Web services and other server applications that process lots of transactions.

Many of these applications need to share data with one another at run time. Often, they’re all working on common business data that’s stored in a database. They typically deal with continuous streams of data (for example, financial trading applications), and they need to process it and share results with other applications, again all at run time.

Although the database should be the master data store for permanent storage, it’s not well-suited for runtime data sharing. One reason for this is that performance isn’t always great when reading data from the database. Furthermore, the database may not scale nicely in terms of handling transactions, so it may quickly become a bottleneck and slow down all the applications relying on it.

Moreover, you can’t effectively share data in real time. Real-time data sharing requires that as soon as one application updates some data, all other applications interested in that data should be informed. Similarly, some applications may be waiting for certain types of data to be created and made available, and when this happens, they should be notified immediately.

These problems are common whether the applications needing to share data are all based on the .NET Framework or whether some are .NET and others Java. In fact, if the applications are a mix of .NET and Java, the problems are compounded because there’s no automatic way for these applications to share data at the app-to-app level in a native fashion.

The Solution: Enterprise Distributed Cache

Luckily, the enterprise distributed cache can resolve these problems. This in-memory store spans multiple servers, pooling the memory of the servers so that memory storage capacity is scalable. Transaction capacity also becomes scalable, so the more servers you add, the greater the transaction load you can handle.

Enterprise distributed caches also provide event notification mechanisms, allowing applications to alert one another when they have updated data. Hence, you can have an asynchronous event notification mechanism where one application produces some data and others may consume it, creating a producer/consumer model or publish/subscribe model. Multiple applications subscribe to certain types of data and are notified when it’s published.
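
The producer/consumer flow just described can be sketched independently of any particular cache product as a topic-based subscription table. The `EventBus` class below is illustrative only, not an actual cache API; in a real enterprise distributed cache the subscribers live in other processes, possibly on other machines:

```java
import java.util.*;
import java.util.function.Consumer;

// Minimal topic-based publish/subscribe registry, illustrating the
// notification pattern a distributed cache provides across applications.
class EventBus {
    private final Map<String, List<Consumer<Object>>> subscribers = new HashMap<>();

    // A consumer registers interest in a named type of data.
    public synchronized void subscribe(String topic, Consumer<Object> handler) {
        subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
    }

    // A producer publishes data; every registered subscriber is notified.
    public synchronized void publish(String topic, Object data) {
        for (Consumer<Object> h : subscribers.getOrDefault(topic, Collections.emptyList())) {
            h.accept(data);
        }
    }
}
```

The key property is the decoupling: the producer never knows who the consumers are, which is what lets .NET and Java applications participate in the same flow.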

There’s also a read-through/write-through mechanism, which means the enterprise distributed cache can itself read data from the data source on behalf of the applications. Regardless of whether those applications are Java or .NET, their code becomes much simpler because they read the data from the enterprise distributed cache rather than the database; they don’t need all that database access code built into them. See Figure 1 for a simple example of a .NET Framework app using an enterprise distributed cache.

Figure 1 A .NET App Using an Enterprise Distributed Cache

using System;
using Alachisoft.NCache.Web.Caching;

namespace Client
{
  class Program
  {
    static string _sCacheName = "myAppCache";
    static Cache _sCache = NCache.InitializeCache(_sCacheName);

    static void Main(string[] args)
    {
      string employeeId = "1000";
      string key = "Employee:EmployeeId:" + employeeId;

      // First check the cache for this employee
      Employee emp = _sCache.Get(key) as Employee;

      // If the cache doesn't have it, make a database call
      if (emp == null)
      {
        emp = LoadEmployeeFromDb(employeeId);

        // Now add it to the cache for next time
        _sCache.Insert(key, emp);
      }
    }
  }
}

Moreover, an enterprise distributed cache can synchronize itself with any data changes in the database made by other third-party applications. It has a connection with the database, allowing the database to notify it whenever a certain type of data changes in the database. Figure 2 illustrates how .NET and Java applications can share data with one another at run time through an enterprise distributed cache.


Figure 2 .NET and Java Apps Sharing Data Through a Distributed Cache

.NET and Java Apps Sharing Data

With an enterprise distributed cache, multiple applications—both .NET and Java—can access the same cache and share data through it. If it were only .NET applications (or only Java applications) sharing data through a distributed cache, they could store the objects in a native binary form and serialize/deserialize them. But when both types try to share data with each other, they need a portable data format in which to store the data in the distributed cache.

Here’s how it works: when a .NET application stores an object in the distributed cache, it transforms the object into an XML document and stores that XML. When a Java application later reads that data from the distributed cache, it transforms the XML into a Java object. In effect, the XML serves as a portable data format; a .NET object becomes XML and then a Java object, and vice versa.

A number of open source libraries can help you transform your .NET or Java objects into XML and back again. You could develop your own, of course, but I recommend you pick an open source library. I personally like Web Objects in XML (WOX), developed by Carlos Jaimez and Simon Lucas (woxserializer.sourceforge.net). I’ll use examples of Java-to-.NET transformation from their Web site in this article (with their permission). Figure 3 shows Student and Course classes defined both in Java and C#.

Figure 3 Student and Course Classes in Java and C#

// Java classes
public class Student
{
    private String name;
    private int registrationNumber;
    private Course[] courses;
}

public class Course
{
    private int code;
    private String name;
    private int term;
}

// ***************************************************
// .NET classes in C#
public class Student
{
    private String name;
    private Int32 registrationNumber;
    private Course[] courses;
}

public class Course
{
    private Int32 code;
    private String name;
    private Int32 term;
}

If we use .NET and Java apps to store these Student and Course objects in an enterprise distributed cache, we can use the WOX library to transform them into XML. Then, when an application wants to read these objects from the enterprise distributed cache, it uses the WOX library again to transform the XML back into Java or .NET object form. Figure 4 shows both Student and Course classes transformed into XML.

Figure 4 Java and .NET Classes Transformed into XML

<object type="Student" id="0">
  <field name="name" type="string" value="Carlos Jaimez"/>
  <field name="registrationNumber" type="int" value="76453"/>
  <field name="courses">
    <object type="array" elementType="Course" length="3" id="1">
      <object type="Course" id="2">
        <field name="code" type="int" value="6756"/>
        <field name="name" type="string"
          value="XML and Related Technologies"/>
        <field name="term" type="int" value="2"/>
      </object>
      <object type="Course" id="3">
        <field name="code" type="int" value="9865"/>
        <field name="name" type="string"
          value="Object Oriented Programming"/>
        <field name="term" type="int" value="2"/>
      </object>
      <object type="Course" id="4">
        <field name="code" type="int" value="1134"/>
        <field name="name" type="string" value="E-Commerce Programming"/>
        <field name="term" type="int" value="3"/>
      </object>
    </object>
  </field>
</object>

Within your application, you should call WOX from your caching layer or the data access layer.
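
One way to structure that caching layer is a thin wrapper that converts objects to and from XML at the cache boundary. The sketch below hand-rolls the XML for a single class to stay short and self-contained; in practice the two conversion methods would delegate to a library such as WOX. The `XmlCache` class and its parsing approach are illustrative assumptions, not a real cache API:

```java
import java.util.HashMap;
import java.util.Map;

// A student record as the application sees it.
class Student {
    String name;
    int registrationNumber;
    Student(String name, int registrationNumber) {
        this.name = name;
        this.registrationNumber = registrationNumber;
    }
}

// Caching layer that stores objects as portable XML strings, so any
// platform that can parse XML can read what another platform stored.
class XmlCache {
    private final Map<String, String> store = new HashMap<>(); // key -> XML

    public void put(String key, Student s) {
        store.put(key, "<object type=\"Student\">"
            + "<field name=\"name\" value=\"" + s.name + "\"/>"
            + "<field name=\"registrationNumber\" value=\"" + s.registrationNumber + "\"/>"
            + "</object>");
    }

    public Student get(String key) {
        String xml = store.get(key);
        if (xml == null) return null;
        // Naive parse, adequate only for the XML produced above.
        String name = extract(xml, "name");
        int reg = Integer.parseInt(extract(xml, "registrationNumber"));
        return new Student(name, reg);
    }

    private static String extract(String xml, String field) {
        String marker = "name=\"" + field + "\" value=\"";
        int start = xml.indexOf(marker) + marker.length();
        return xml.substring(start, xml.indexOf('"', start));
    }
}
```

The application code above and below this layer never sees XML; it works with Student objects in its own language, which is exactly the point.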

Item-Based Event Notifications

Event notification is a powerful mechanism that allows multiple applications (both .NET and Java) to coordinate data sharing asynchronously. It helps the applications avoid the expensive database polling they would otherwise have to do. The mechanism is shared among .NET and Java applications, so they can notify one another seamlessly.

One common type of event notification is item-based notification. In this type, applications register interest in various cached item keys (which may or may not exist in the cache yet), and they’re notified whenever that item is added, updated or removed from the distributed cache by anyone for any reason. For example, even if an item is removed due to expiration or eviction, the item-removed event notification is fired.
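
The essence of item-based notification is a per-key listener table, as in the sketch below. The class and method names are illustrative, not an actual cache API; note that interest can be registered before the key exists, and that removal fires the same event regardless of the cause:

```java
import java.util.*;

// Per-key change notification: interest can be registered before the key
// exists, and remove() fires whether the cause is an explicit remove,
// expiration or eviction (all removal paths funnel through it).
class ItemEvents {
    interface Listener { void onChange(String key, String event, Object value); }

    private final Map<String, Object> data = new HashMap<>();
    private final Map<String, List<Listener>> listeners = new HashMap<>();

    void register(String key, Listener l) {
        listeners.computeIfAbsent(key, k -> new ArrayList<>()).add(l);
    }

    void insert(String key, Object value) {
        String event = data.containsKey(key) ? "updated" : "added";
        data.put(key, value);
        fire(key, event, value);
    }

    void remove(String key) {
        Object old = data.remove(key);
        fire(key, "removed", old);
    }

    private void fire(String key, String event, Object value) {
        for (Listener l : listeners.getOrDefault(key, Collections.emptyList()))
            l.onChange(key, event, value);
    }
}
```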

Both .NET and Java applications can register interest for the same cached items and be notified about them. The notification usually includes the affected cached item as well, which, as we saw in the previous section, is transformed into either .NET or Java, depending on the type of application.

App-Generated Custom Event Notifications

An enterprise distributed cache is also a powerful event propagation platform for both .NET and Java applications. Any applications connected to an enterprise distributed cache can fire custom events into the cache, and then all other applications that have registered interest in those custom events will be notified by the cache, regardless of where those applications are located. This by itself provides a powerful language- and platform-independent event propagation mechanism in an enterprise distributed cache.

This feature allows applications to collaborate in data sharing asynchronously. For example, if one application puts some data in the distributed cache, it can then fire a custom event so other applications that are supposed to consume or process this data further are notified immediately.

Continuous Query-Based Event Notifications

Item-based event notification is powerful but requires the application to know the key of the cached item. If you combine item-based event notification with other grouping features commonly provided in an enterprise distributed cache (such as tags, groups/subgroups and more), you can pretty much handle most of the cases where applications need to be notified based on what happens to various cached items.

However, there are two limitations with item-based events. First, as noted, the application has to know the keys of all cached items about which it wants to be notified. Second, it will be notified no matter what change is made to those items; it can’t specify more detailed criteria so that it’s notified only when specific changes are made to the data.

To handle such cases, an enterprise distributed cache provides a continuous query: a SQL-like query that captures an application’s business rules about the data it’s interested in. A continuous query isn’t a search query but rather criteria the enterprise distributed cache keeps; anytime something is added to or updated in the distributed cache, the operation is compared against the continuous query criteria. If the criteria match, an event is fired and the application that issued the continuous query is notified.

Continuous query allows applications to watch for more complex and sophisticated changes and be notified only when those changes happen.
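
Conceptually, a continuous query is a standing predicate the cache evaluates on every add or update. The sketch below substitutes a Java `Predicate` where a real cache would accept SQL-like query text; all names here are illustrative:

```java
import java.util.*;
import java.util.function.Predicate;

// A continuous query is a standing criterion: the cache keeps it and checks
// every add/update against it, notifying the subscriber on a match.
class ContinuousQueryCache<V> {
    interface Subscriber<V> { void onMatch(String key, V value); }

    private static class Query<V> {
        final Predicate<V> criteria;
        final Subscriber<V> subscriber;
        Query(Predicate<V> c, Subscriber<V> s) { criteria = c; subscriber = s; }
    }

    private final Map<String, V> data = new HashMap<>();
    private final List<Query<V>> queries = new ArrayList<>();

    void registerQuery(Predicate<V> criteria, Subscriber<V> subscriber) {
        queries.add(new Query<>(criteria, subscriber));
    }

    void insert(String key, V value) {
        data.put(key, value);
        // Every write is evaluated against every standing query.
        for (Query<V> q : queries)
            if (q.criteria.test(value)) q.subscriber.onMatch(key, value);
    }
}
```

This is why continuous query is more expressive than item-based events: the subscriber never names a key, only a condition on the data.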

Read-Through and Write-Through Handlers

Many times, applications try to read data that doesn’t exist in the enterprise distributed cache and must be read from a database. In these situations, applications could go directly to the database and read that data, but that would mean all of them end up duplicating the same data access code (in both .NET and Java). Or, they can ask the enterprise distributed cache to read the data from the database for them when they need it.

The read-through/write-through feature allows an enterprise distributed cache to read data directly from the data source. The applications can simplify their code so they don’t have to go to the database. They can just ask the enterprise distributed cache to give them the data, and if the cache doesn’t have the data, it will go and read it from the data source. Figure 5 shows how read-through and write-through fit into an enterprise distributed cache.


Figure 5 How Read-Through/Write-Through Is Used

One point of caution here: although there’s great benefit in having the distributed cache read data from the database for you, many types of data are best read directly from the database by the application. If you’re reading collections of data that involve complex joins, it’s best to read them yourself and then put them in the distributed cache.
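
In code, read-through amounts to the cache owning a loader callback that it invokes on a miss, so application code never talks to the database directly. This is a sketch with illustrative names, not a real cache API; the `loaderCalls` counter exists only to make the caching effect visible:

```java
import java.util.*;
import java.util.function.Function;

// Read-through: on a cache miss, the cache calls a registered loader
// (e.g. a database lookup), stores the result and returns it.
class ReadThroughCache<K, V> {
    private final Map<K, V> store = new HashMap<>();
    private final Function<K, V> loader;   // stands in for the database access
    int loaderCalls = 0;                   // visible for demonstration only

    ReadThroughCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        V value = store.get(key);
        if (value == null) {               // miss: read through to the source
            value = loader.apply(key);
            loaderCalls++;
            if (value != null) store.put(key, value);
        }
        return value;
    }
}
```

A second `get` for the same key is served from memory, which is where the simplification for application code (and the performance win) comes from.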

Database Synchronization

Because a lot of data is being put in the enterprise distributed cache, it only makes sense to make sure this data is kept synchronized with the master data source, usually a relational database. An enterprise distributed cache provides such a feature.

This database synchronization feature allows applications to specify a relationship (a dependency) between cached items and rows in database tables. Whenever data in the database changes, the database server fires a .NET event if it’s a SQL Server 2005/2008 database and notifies the enterprise distributed cache of such a change. For other databases that don’t support .NET events, an enterprise distributed cache also provides configurable polling, where it can poll the database (say, once every 15 seconds) and synchronize if data has changed there.

The distributed cache then either removes that data from the cache or reads a fresh copy of it if you’ve configured the read-through feature. Figure 6 shows how an enterprise distributed cache synchronizes with SQL Server.


Figure 6 Database Synchronization in Distributed Cache
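
For databases without change notifications, the polling approach described above can be sketched as a periodic version check. The version-stamp column and the `DbSyncPoller` class are assumptions made for illustration; a real cache would run the check on a timer (say, every 15 seconds) against the actual database:

```java
import java.util.*;

// Polling-based synchronization: compare a per-row version stamp against
// the version recorded when the item was cached, and invalidate stale keys.
class DbSyncPoller {
    private final Map<String, Object> cache = new HashMap<>();
    private final Map<String, Long> cachedVersions = new HashMap<>(); // key -> version at cache time

    void put(String key, Object value, long version) {
        cache.put(key, value);
        cachedVersions.put(key, version);
    }

    // One polling pass; currentVersions simulates a "SELECT key, version"
    // query against the database. Returns the keys that were invalidated.
    List<String> poll(Map<String, Long> currentVersions) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Long> e : cachedVersions.entrySet()) {
            Long dbVersion = currentVersions.get(e.getKey());
            if (dbVersion == null || dbVersion > e.getValue()) stale.add(e.getKey());
        }
        for (String key : stale) { cache.remove(key); cachedVersions.remove(key); }
        return stale;
    }

    boolean contains(String key) { return cache.containsKey(key); }
}
```

After invalidation, a configured read-through handler could fetch the fresh copy on the next access, matching the behavior described above.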

High Availability: Self-Healing Dynamic Cluster

An enterprise distributed cache is used as a runtime data sharing platform among multiple applications (.NET to .NET, .NET to Java, and Java to Java). In many cases, these applications are mission-critical for your business.

This means an enterprise distributed cache must be highly available, because so many mission-critical applications depend on it. The enterprise distributed cache can’t go down or stop working, and it should require virtually no downtime for maintenance or other normal operations.

An enterprise distributed cache achieves high availability by having a self-healing, dynamic cluster of cache servers. Self-healing here means the cluster is aware of all its members and adjusts dynamically if a member leaves or joins. It also ensures that data is replicated for reliability, and if a cluster member leaves, its backup data is made available to the applications automatically. All of this must be done quickly and without causing any interruptions in the applications using the enterprise distributed cache.

Scalability: Cache Partitioning and Replication

Many applications using an enterprise distributed cache are high-transaction apps. Therefore, the load on the cache cluster can grow rapidly; however, if the response time of the enterprise distributed cache drops, it loses its value. In fact, this is an area where an enterprise distributed cache is superior to relational databases; it can handle a lot more transactions per second because it can keep adding more servers to the dynamic cluster. But scalability can’t be achieved unless data in the distributed cache is stored intelligently. This is accomplished through data partitioning, with each partition replicated for reliability.

An enterprise distributed cache lets you exploit the benefits of a partitioned topology for scalability. Figure 7 shows a partitioned-replication topology.


Figure 7 Partitioned-Replication Topology for Reliable Scalability

An enterprise distributed cache automatically partitions all data you’re storing in the cache. Every partition is stored on a different server, and a backup for this partition is created and stored on yet another server. This ensures that if any server goes down, no data is lost.

So, to summarize, partitioning allows you to keep adding more cache servers to the dynamic cluster to grow storage capacity, and it also increases the transaction-per-second capacity as you add more servers. And replication ensures reliability of data because no data loss occurs if a server goes down.
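
The placement rule behind this scheme can be sketched in a few lines: a key hashes to a partition, the partition's primary copy lives on one server and its backup on the next server along. The class and server names below are illustrative, not any product's actual API:

```java
import java.util.*;

// Partitioned-replica placement: a key hashes to a partition; the primary
// lives on one server and the backup on the next one, so losing any single
// server loses no data as long as there are at least two servers.
class PartitionMap {
    private final List<String> servers;
    private final int partitions;

    PartitionMap(List<String> servers, int partitions) {
        this.servers = servers;
        this.partitions = partitions;
    }

    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions);
    }

    String primaryFor(String key) {
        return servers.get(partitionOf(key) % servers.size());
    }

    String backupFor(String key) {
        return servers.get((partitionOf(key) + 1) % servers.size());
    }
}
```

Because placement is a pure function of the key, any client can compute where an item lives without asking a coordinator, which is part of why this topology scales as servers are added.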

Capable and Collaborative

Wrapping up, an enterprise distributed cache is an ideal way for high-transaction .NET and Java applications to share data with other apps. It ensures that data sharing is done in real time because of its powerful event propagation mechanisms, including item-based event notification, application-generated custom event notifications and continuous query-based event notifications.

An enterprise distributed cache is extremely fast and scalable by design. It’s fast because it’s in-memory. It’s scalable because it can grow into multiple servers. It partitions the actual storage and stores each partition on a different server, and it stores a backup of that partition onto yet another server, like a RAID disk.

Today’s applications need to be much more capable than in the past. They need to work in a more collaborative fashion to share data and interact with one another. They need to be exceedingly fast while meeting the needs of extremely high loads to avoid compromising performance and scalability. Moreover, they must perform across different platforms so .NET applications can transparently and effectively work with Java apps. An enterprise distributed cache helps meet all these goals.

Iqbal Khan is the president and technology evangelist of Alachisoft (alachisoft.com), which provides NCache, a .NET-distributed cache for boosting performance and scalability in enterprise applications. Khan received a master’s degree in computer science from Indiana University in 1990. You can reach him at iqbal@alachisoft.com.

Thanks to the following technical expert for reviewing this article: Stefan Schackow