July 2017

Volume 32 Number 7

[Machine Learning]

Scale Applications with Microsoft Azure Redis and Machine Learning

By Stefano Tempesta

In a multi-tier application, bottlenecks can occur at any of the connection points between two tiers: between the business logic and data access layers, the client and service layers, the presentation and storage layers, and so on. Large-scale applications can implement various levels of caching to improve performance and increase scalability. Caching can be configured in memory or on a more permanent form of storage, in different sizes, and in diverse geographic locations. The open source Redis engine, as implemented in Azure, lets you intuitively configure and manage all these aspects, and work with a variety of programming languages.

This article presents design best practices and code examples for implementing the Azure Redis Cache and tuning the performance of ASP.NET MVC applications, optimizing cache hit ratio and reducing “miss rate” with smart algorithms processed by Azure Machine Learning.

Azure Redis Cache

Let’s start by saying that Redis is not a Microsoft product. Redis is an open source project freely available for download from the Web site redis.io. Anyone can download the cache engine and install it on their own servers. Microsoft, however, offers this, and much more, as a service in Azure. You can create a new Redis Cache instance in Azure in a few minutes and be ready to connect to it from your application.

What makes Redis different from other caching frameworks is its support for specialized data types, besides the typical key-value string pair, common in other cache engine implementations. You can run atomic operations on these types, such as appending to a string, incrementing the value in a hash, pushing an element to a list, computing set intersection, union and difference, or getting the member with highest ranking in a sorted set.
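Each of those atomic operations maps directly to a Redis command. As a quick illustration (not from the article; the key names are made up), the commands behind the operations just listed are:

```
APPEND    page:log "hit;"        # append to a string value
HINCRBY   contact:42 views 1     # increment a value in a hash
LPUSH     recent:contacts 42     # push an element to a list
SINTER    tag:vip tag:active     # set intersection
SUNION    tag:vip tag:active     # set union
SDIFF     tag:vip tag:active     # set difference
ZREVRANGE leaderboard 0 0        # member with the highest ranking in a sorted set
```

Because each command executes atomically on the server, no client-side locking is needed for these updates.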

From an operational perspective, the Redis implementation in Azure lets you replicate cache instances in a two-node primary/secondary configuration, entirely managed by Microsoft. Redis also supports master-subordinate replication, with fast non-blocking first synchronization, auto-reconnection on net split, and so forth. Just to expand on the replication feature, which, for me, is a differentiation point:

  • Redis replication is non-blocking on the master side. This means that the master will continue to handle queries when one or more slaves perform the initial synchronization.
  • Replication is non-blocking on the slave side. While the slave is performing the initial synchronization, it can handle queries using the old version of the dataset.

Optionally, Redis supports persistence. Redis persistence lets you save data stored in Redis cache permanently to an allocated storage in Azure. You can also take snapshots and back up the data, which you can reload in case of a failure.
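In open source Redis, persistence is driven by a handful of configuration directives. A minimal redis.conf sketch follows; note that in Azure Redis Cache the equivalent settings are managed through the portal, so this fragment is for illustration only:

```
# RDB: snapshot the dataset to disk if at least 1 key changed in 900 seconds
save 900 1
save 300 10
save 60 10000

# AOF: additionally log every write operation, for durable recovery
appendonly yes
appendfsync everysec
```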

Last, Azure Redis Cache comes with important monitoring capabilities that, when enabled, provide insights on the utilization of the cache, in terms of cache hits and misses, used storage, and so on, as shown in Figure 1.

Figure 1 Redis Insights in Microsoft Azure

Connecting to Redis Cache in .NET

Once Redis Cache is configured in Azure, you obtain a unique URL for accessing it from a software application, along with a key for authentication. These two pieces of information are necessary to establish a connection to the Redis Cache engine from your application. Let’s build an ASP.NET MVC application, then, that stores objects in Redis Cache.

A common option for storing the cache access credentials is the Web.config file of the ASP.NET MVC application. In the <appSettings> section, you can simply add a key:

<add key="CacheConnection" value="<instance-name>.redis.cache.windows.net,abortConnect=true,ssl=true,password=<access-key>" />

Parameter abortConnect is set to true, which means that the call won’t succeed if a connection to the Azure Redis Cache can’t be established. You can opt for a secured connection over SSL/TLS by setting the ssl parameter to true.

You also need to add one of the following NuGet packages to your projects:

  • StackExchange.Redis: A .NET implementation of a Redis Cache client, which provides an easy-to-use interface to Redis commands.
  • Newtonsoft.Json: The popular JSON framework, used here for serializing objects to JSON so they can be stored in a Redis Cache database, and for deserializing them on the way back.

In its simplified implementation, the MVC application stores and retrieves contact details from the Redis Cache by defining CRUD actions in a Controller. If the object isn’t found in the cache, it’s retrieved from the database and then stored in the cache for future access.

Your Contact model is defined as follows, with a few properties to enter name, e-mail address and country of origin:

public class Contact
{
  public Guid Id { get; set; }

  [DisplayName("Contact Name")]
  public string Name { get; set; }

  public string Email { get; set; }

  public string Country { get; set; }
}

You now add a new Contacts Controller to the application, choosing to add views using Entity Framework. As you’ll see, however, you’ll add a layer of fetching data from the cache first, before hitting the database.

Connecting to the Redis Cache, then, is purely a matter of defining a connection using the connection string stored in the Web.config file. Your ContactsController class will look something like this:

public class ContactsController : Controller
{
  static string cacheConnectionString =
    ConfigurationManager.AppSettings["CacheConnection"];
  ConnectionMultiplexer connection =
    ConnectionMultiplexer.Connect(cacheConnectionString);
  // ...
}

Now, this is far from being a good practice for defining a connection to any storage system, as hardcoding the ConnectionMultiplexer class inside the Controller’s code clearly creates a tightly coupled dependency. Ideally, you’d inject this dependency using an Inversion of Control library. However, for the sake of keeping things simple and straightforward in this example, the ConnectionMultiplexer class is all you need to obtain a connection to an instance of Redis Cache. The ConnectionMultiplexer, defined in the StackExchange.Redis namespace, works as a factory by exposing a static Connect method, and returning an instance of itself as a live connection to the defined Redis Cache.

A different approach to sharing a ConnectionMultiplexer instance in your application is to expose a static property that returns a connected instance. This provides a thread-safe way to initialize no more than a single connected ConnectionMultiplexer instance, which can be shared in a singleton class. By masking the ConnectionMultiplexer behind a Lazy<T> object, you also defer allocation of the connection until it’s actually used by a Controller’s action:

static Lazy<ConnectionMultiplexer> lazyConnection =
  new Lazy<ConnectionMultiplexer>(() =>
  {
    return ConnectionMultiplexer.Connect(cacheConnectionString);
  });

static ConnectionMultiplexer Connection => lazyConnection.Value;
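The thread-safety claim is easy to verify in isolation. The following self-contained sketch (not from the article) replaces ConnectionMultiplexer.Connect with a counting stand-in and shows that, even under concurrent access, Lazy<T> runs the factory exactly once:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class LazyDemo
{
    static int factoryCalls = 0;

    // Stand-in for ConnectionMultiplexer.Connect: counts how often it runs.
    static readonly Lazy<string> lazyConnection =
        new Lazy<string>(() =>
        {
            Interlocked.Increment(ref factoryCalls);
            return "connected";
        });

    static void Main()
    {
        // Many threads race to read .Value; the factory still runs exactly once.
        var tasks = Enumerable.Range(0, 20)
            .Select(_ => Task.Run(() => lazyConnection.Value))
            .ToArray();
        Task.WaitAll(tasks);

        Console.WriteLine(factoryCalls);         // 1
        Console.WriteLine(lazyConnection.Value); // connected
    }
}
```

By default, Lazy<T> uses LazyThreadSafetyMode.ExecutionAndPublication, which is what guarantees the single invocation.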

Reading from and Writing to a Redis Cache Instance

Now that you’ve established a connection to a Redis Cache instance, you can access it in the read-and-write actions of the MVC Controller. The Get method of a ContactManager class checks for an instance of the object identified by its ID in cache, and if not found, will retrieve it from the database and allocate it in Redis for future access, as shown in Figure 2.

Figure 2 Get Method in the ContactManager Class

public async Task<Contact> Get(Guid id)
{
  IDatabase cache = cacheContext.GetDatabase();
  var value = await cache.HashGetAsync(cacheKeyName, id.ToString());

  // Return the entry found in cache, if any
  // HashGetAsync returns a null RedisValue if no entry is found
  if (!value.IsNull)
  {
    return JsonConvert.DeserializeObject<Contact>(value.ToString());
  }

  // Nothing found in cache, read from the database
  Contact contact = databaseContext.Contacts.Find(id);

  // Store in cache for next use
  if (contact != null)
  {
    HashEntry entry = new HashEntry(
      name: id.ToString(),
      value: JsonConvert.SerializeObject(contact));
    await cache.HashSetAsync(cacheKeyName, new[] { entry });
  }

  return contact;
}

From the cache context, which identifies a connection to Redis Cache, you obtain a reference to the data storage inside Redis itself through the GetDatabase method. The returned IDatabase object is a wrapper around the Redis cache commands. Specifically, the HashGet method executes the HGET command (bit.ly/2pM0O00) to retrieve an object stored against the specified key (the object ID). The HGET command returns the value identified by a unique key in a named hash collection, if one exists, or a null value otherwise. As the key in the cache, you can use the object’s ID (a GUID), consistent with the same ID stored at the database level and managed by Entity Framework.

If an object is found at the indicated key, it’s deserialized from JSON into an instance of the Contact model. Otherwise, the object is loaded from the database, using the Entity Framework Find by ID, and then stored in cache for future use. The HashSet method, and more precisely its async variant, is used for storing a JSON serialized version of the Contact object.

Similar to this approach, the other CRUD methods are implemented around the HashSet method for creating and updating objects in Redis Cache, and the HashDelete method for removing them.
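For reference, here is one possible shape of those companion methods. This is a sketch, not the code from the download, and it assumes the same cacheContext, databaseContext and cacheKeyName members used in Figure 2:

```csharp
// Hypothetical companion methods to Get (sketch only).
public async Task Save(Contact contact)
{
  IDatabase cache = cacheContext.GetDatabase();

  // Write-through: update the database first, then refresh the cache entry
  databaseContext.Entry(contact).State = EntityState.Modified;
  await databaseContext.SaveChangesAsync();

  HashEntry entry = new HashEntry(
    name: contact.Id.ToString(),
    value: JsonConvert.SerializeObject(contact));
  await cache.HashSetAsync(cacheKeyName, new[] { entry });
}

public async Task Delete(Guid id)
{
  IDatabase cache = cacheContext.GetDatabase();

  Contact contact = databaseContext.Contacts.Find(id);
  if (contact != null)
  {
    databaseContext.Contacts.Remove(contact);
    await databaseContext.SaveChangesAsync();
  }

  // Remove the cache entry so stale data isn't served on the next read
  await cache.HashDeleteAsync(cacheKeyName, id.ToString());
}
```

Deleting the cache entry after the database write keeps the hash from serving stale data, at the cost of one extra miss on the next read.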

The complete source code is available in the associated code download at bit.ly/2qkV65u.

Cache Design Patterns

A cache typically contains objects that are used most frequently, in order to serve them back to the client without the typical overhead of retrieving information from a persistent storage, like a database. A typical workflow for reading objects from a cache consists of three steps, and is shown in Figure 3:

  1. A request to the object is initiated by the client application to the server.
  2. The server checks whether the object is already available in cache, and if so, returns the object immediately to the client as part of its response.
  3. If not, the object is retrieved from the persistent storage, and then returned to the client as in Step 2.

Figure 3 Level 1 Cache Workflow

In both cases, the object is serialized for submission over the network. At cache level, this object might already be stored in serialized format, to optimize the retrieval process.

You should note that this is an intentionally simplified process. You might see additional complexity if you check for cache expiration based on time, dependent resources and so on.
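Stripped of Redis and the database, this three-step workflow reduces to the classic cache-aside pattern. A minimal self-contained sketch follows (not from the article; dictionaries stand in for the cache and the persistent storage, and all names are hypothetical):

```csharp
using System;
using System.Collections.Generic;

class CacheAsideDemo
{
    static readonly Dictionary<string, string> cache = new Dictionary<string, string>();
    static readonly Dictionary<string, string> database = new Dictionary<string, string>
    {
        ["42"] = "{\"Name\":\"Contoso\"}"   // one pre-existing "database" row
    };

    static int cacheHits = 0, cacheMisses = 0;

    static string Get(string id)
    {
        // Step 2: serve from the cache when the object is already there
        if (cache.TryGetValue(id, out string cached))
        {
            cacheHits++;
            return cached;
        }

        // Step 3: fall back to the persistent storage, then populate the cache
        cacheMisses++;
        database.TryGetValue(id, out string stored);
        if (stored != null)
            cache[id] = stored;
        return stored;
    }

    static void Main()
    {
        Get("42");   // first access: miss, loads from the "database"
        Get("42");   // second access: hit
        Console.WriteLine($"{cacheHits} hit(s), {cacheMisses} miss(es)");
    }
}
```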

This configuration is typically called a Level 1 (L1) cache, as it contains one level of cache only. L1 caches are normally used for Session and Application state management. Although effective, this approach isn’t optimal when dealing with applications that move large quantities of data over multiple geographies, which is the scenario that we want to optimize. First of all, large data requires large caches to be effective, which in turn are memory-intensive, thus requiring expensive servers with a big allocation of volatile memory. In addition, syncing nodes across regions implies large data transfers, which, again, is expensive and introduces delays in availability of the information in the subordinate nodes.

A more efficient approach to caching objects in data-intensive applications is to introduce a Level 2 (L2) cache architecture, with a first cache smaller in size that contains the most frequently accessed objects in the larger dataset, and a second cache, larger in size, containing the remaining objects. When the object isn’t found in the first-level cache, it’s retrieved from the second level, and eventually refreshed periodically from the persistent storage. In a geographically distributed environment, the L2 caches are synced across datacenters, and the L1 cache resides on the master server, as shown in Figure 4.

Figure 4 Level 2 Cache Workflow
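The L2 read path simply adds one more lookup before the persistent storage is touched, typically promoting the object into L1 on an L2 hit. A compact sketch of the idea (again with in-memory dictionaries as stand-ins; not from the article):

```csharp
using System;
using System.Collections.Generic;

class TwoLevelCacheDemo
{
    static readonly Dictionary<string, string> l1 = new Dictionary<string, string>();
    static readonly Dictionary<string, string> l2 = new Dictionary<string, string>
    {
        ["7"] = "brochure-7"   // already synced into the regional L2
    };

    static string Get(string id)
    {
        if (l1.TryGetValue(id, out string value))
            return value;                 // L1 hit: hottest objects

        if (l2.TryGetValue(id, out value))
        {
            l1[id] = value;               // L2 hit: promote into L1
            return value;
        }

        return null;                      // both missed: go to persistent storage
    }

    static void Main()
    {
        Console.WriteLine(Get("7"));             // served from L2, promoted
        Console.WriteLine(l1.ContainsKey("7"));  // now cached in L1 as well
    }
}
```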

The challenge, then, is to define what goes in the L1 cache, what goes in the L2 cache, and with what frequency the regional nodes should be synced to optimize performance and storage of the cache instances. Performance of a cache is measured as “hit ratio” and “miss ratio.” The hit ratio is the fraction of accesses that are a hit (object found in cache) over all of the requests. The miss ratio is the fraction of accesses that are a miss (object not found in cache); that is, the remainder of the hit ratio from 100 percent.

With a mathematical formula, you can express the hit ratio as that shown in Figure 5.

Figure 5 Hit Ratio

The miss ratio is expressed as “1 - hit ratio.”
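Spelled out, the formula pictured in Figure 5 and the sentence above amount to:

```latex
\text{hit ratio} = \frac{\text{hits}}{\text{hits} + \text{misses}}
\qquad
\text{miss ratio} = 1 - \text{hit ratio} = \frac{\text{misses}}{\text{hits} + \text{misses}}
```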

To optimize the performance of a cache instance, you want to increase the hit ratio and decrease the miss ratio. Irrespective of adopting an L1 or L2 cache architecture, there are different techniques for improving cache performance, from pre-fetching data into the cache on a regular basis, to JIT caching, to allocating the most-used objects based on counters.

A prediction technique based on a machine learning algorithm is called Demand Estimation. Based on patterns of usage of objects, the Demand Estimation algorithm predicts the likelihood that an object will be used and, therefore, it can be allocated in cache before a request is submitted to increase the chance of a hit.

I’ll focus on the implementation of the Demand Estimation machine learning algorithm in the context of a data-oriented application, observing what objects are typically accessed, and populating the cache with the most used ones, as predicted by the algorithm’s outcome.

Machine Learning

Machine learning is a technique of data science that helps computers learn from existing data in order to forecast future behaviors, outcomes and trends. A machine learning system is one that uses data to make a prediction of some sort. Common techniques include logistic regression and neural network classification. Note that artificial intelligence (AI) is closely related, but generally refers to predictions that are associated with human behavior such as vision and speech. These predictions can make applications look smarter. For example, when you shop online, machine learning services might recommend other products you’d like based on what you’ve already purchased. When your credit card is swiped, another machine learning service compares your transaction against a database of millions of transactions, looking for behavioral anomalies, and helps detect potential fraud.

In the context of a data-oriented application, you start by analyzing the hit ratio in the implemented cache, and can easily identify some patterns of regular access to specific objects over a period of time, as shown in Figure 6. For each object type (Contact, Interest, Brochure and so on), you can also drill down to the individual objects allocated over time and gain better insight into the evolving data flow of your application. But say your business is very seasonal and seasons change according to geography, and specific campaigns might influence traffic, too. So how do you scale your cache hit estimation and create a cache of objects that are most commonly accessed under certain conditions? You implement predictive analysis and demand estimation in Azure Machine Learning.

Figure 6 Cache Hits by Object Type

Predictive analytics uses math formulas that analyze historical data to identify patterns of use to forecast future demand. For your machine learning model to provide predictions, the model must first learn from known data in a process known as “training.” During training, data is evaluated by the machine learning algorithm, which analyzes the distribution and type of data, looking for rules and patterns that can be used in later prediction.

Once the training phase is completed, “scoring” is the process of applying a trained model to new data to generate predictions. Making predictions, though, can be error-prone, even for machines. Will the predicted data have a good proportion of true results to total cases; that is, high accuracy (a characteristic called “generalization”)? Will it be consistent with the patterns identified in the historical data and give consistent results (“precision”)? You need to quality-check your predictions. “Evaluation” is the QA of the scoring process.

Demand Estimation in Azure Machine Learning Studio

Azure Machine Learning Studio is a cloud predictive analytics service that makes it possible to quickly create and deploy predictive models as analytics solutions. It’s possible to work from a ready-to-use library of algorithms or start with a blank experiment, use them to create models, and deploy the predictive solution as a REST service. Azure Machine Learning Studio is available at studio.azureml.net.

The entire process, or “project” in Azure Machine Learning Studio, consists of the following:

  • Importing your initial dataset; that is, the data on which the training process will be based. For our example, it’s page views over time and by region.
  • Defining the Machine Learning “experiment” that will generate the estimated outcome by choosing the appropriate algorithm for the process.
  • Executing the Training, Scoring and Evaluation processes to build the predictive model.
  • Deploying a REST service that external applications can consume to obtain the predicted outcome “as a service.” This is pure Machine Learning as a Service (MLaaS).

Let’s go through these steps in detail.


Importing the Dataset

To develop and train a predictive analytics solution in Azure Machine Learning Studio, it’s necessary to import a dataset to analyze. There are several options for importing data and using it in a machine learning experiment:

  1. Upload a local file. This is a manual task. Azure Machine Learning Studio, at the time of this writing, supports the upload of .csv and .tsv files, plain text (.txt), SVM Light (.svmlight), Attribute Relation File Format (.arff), R Object or Workspace (.RData), and .zip files.
  2. Access data from an existing data source. The currently supported data sources are a Web URL using HTTP, Hadoop using HiveQL, Azure Blob or Table storage, Azure SQL Database or SQL Server on Azure Virtual Machine, on-premises SQL Server database, and any OData feed.
  3. From another Azure Machine Learning experiment saved as a dataset.

More information about importing training data into Azure Machine Learning Studio from various data sources is available at bit.ly/2pXhgus.


Building the Experiment

Once the dataset is available (assuming you call it “cache hits”), it’s possible to build the machine learning experiment that will analyze the data, identify patterns of usage of objects in cache, and forecast future demand. This estimation, expressed as a number of hits under certain conditions, will be used to populate the L2 cache in Redis.

Building a new experiment consists of defining a flow of tasks that are executed by the machine learning engine. In its more linear representation, a demand estimation experiment includes the following steps:

  • Add the dataset as the starting point of the experiment. On the left-hand panel, existing datasets are available under Saved Datasets | My Datasets.
  • Optionally, apply a transformation to the existing dataset. This can be done for several reasons, including data cleansing or improving data accuracy. You’ll use the Select Columns in Dataset (bit.ly/2qTUQZ3) and the Split Data (bit.ly/2povZKP) modules to reduce the number of columns to analyze, and to divide your dataset into two distinct sets. This is a necessary step when you want to separate data into training and testing sets, so that you can evaluate a model on a holdout dataset. The two modules are available under the Data Transformation section.
  • You can now train the model on the first split of the dataset by adding the Train Model (bit.ly/2qU2J0N) module as a next step in the experiment flow, and connecting it to the left-hand side connection point of the Split Data module (the fraction of rows to consider for training). The Train Model module can be found under Machine Learning | Train. In the Column Set option, select the label column in the training dataset. The column should contain the known values for the class or outcome you want to predict. This is a numeric data type: in the example project, the number of cache hits for an object on a given day.
  • Training a model requires connecting and configuring one of the classification or regression models provided in Azure Machine Learning Studio. You’ll use the Linear Regression (bit.ly/2qj01ol) module, available in the Machine Learning | Initialize Model | Regression section.

Linear Regression

When a value is predicted, as with cache hits, the learning process is called “regression.” Training a regression model is a form of supervised machine learning (bit.ly/2qj01ol). That means you must provide a dataset that contains historical data from which to learn patterns. The data should contain both the outcome you’re trying to predict, and related factors (variables). The machine learning model uses the data to extract statistical patterns and build a model.

Regression algorithms (bit.ly/2qj6uiX) are algorithms that learn to predict the value of a real function for a single instance of data. These algorithms can incorporate input from multiple features by determining the contribution of each feature of the data to the regression function. Once the regression algorithm has trained a function based on already labeled data, the function can be used to predict the label of a new (unlabeled) instance. More information on how to choose algorithms for Azure Machine Learning is available at bit.ly/2gsO6PE.
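To make the idea concrete, here is a from-scratch least-squares fit for the single-feature case. The data is invented for illustration, and this is, of course, a toy version of what the Linear Regression module does:

```csharp
using System;
using System.Linq;

// Fit y = a + b*x by ordinary least squares, then score a new (unlabeled) input.
class LinearRegressionDemo
{
    static (double a, double b) Fit(double[] x, double[] y)
    {
        double meanX = x.Average(), meanY = y.Average();
        // Slope: covariance of x,y divided by variance of x
        double b = x.Zip(y, (xi, yi) => (xi - meanX) * (yi - meanY)).Sum()
                 / x.Sum(xi => (xi - meanX) * (xi - meanX));
        return (meanY - b * meanX, b);
    }

    static void Main()
    {
        double[] day  = { 1, 2, 3, 4, 5 };
        double[] hits = { 10, 20, 30, 40, 50 };  // perfectly linear on purpose

        var (a, b) = Fit(day, hits);
        double predicted = a + b * 6;            // score day 6
        Console.WriteLine(predicted);            // 60
    }
}
```

A real training run would use many features (date, object type, region, campaign) and minimize the same squared error over all of them.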

Scoring and Evaluation

“Scoring” is the process of applying a trained model to new data to generate predictions and other values. The Score Model module (bit.ly/1lpX2Ed), available in the Machine Learning | Score | Score Model section, will predict the number of cache hits according to the selected features, as shown in Figure 7.

Figure 7 Cache Hits Estimation Experiment

After the Scoring step, you can now connect the scored dataset to the Evaluate Model module (bit.ly/1SL05By) to generate a set of metrics used for evaluating the model’s accuracy (performance). Consider the Evaluation step as your QA process: You want to make sure that the predicted values are as accurate as possible by reducing the amount of error. A model is considered to fit the data well if the difference between observed and predicted values is small. The Evaluate Model module is available in the Machine Learning | Evaluate section.
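Two of the error metrics an evaluation step of this kind reports for regression models are mean absolute error and root mean squared error. A small self-contained sketch with made-up numbers:

```csharp
using System;
using System.Linq;

// Compare observed values against the scored labels produced by a model.
class EvaluationDemo
{
    static double MeanAbsoluteError(double[] observed, double[] predicted) =>
        observed.Zip(predicted, (o, p) => Math.Abs(o - p)).Average();

    static double RootMeanSquaredError(double[] observed, double[] predicted) =>
        Math.Sqrt(observed.Zip(predicted, (o, p) => (o - p) * (o - p)).Average());

    static void Main()
    {
        double[] observed  = { 100, 200, 300, 400 };  // actual cache hits
        double[] predicted = { 110, 190, 310, 390 };  // scored labels

        // Every prediction is off by exactly 10, so both metrics come out to 10
        Console.WriteLine(MeanAbsoluteError(observed, predicted));
        Console.WriteLine(RootMeanSquaredError(observed, predicted));
    }
}
```

The smaller these values, the better the model fits the holdout data.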


Exposing the Prediction as a Web Service

A prediction of cache hits would be pointless if you couldn’t access this information and use it to optimize the pre-allocation of objects in cache. Access to the outcome of the predictive experiment is via a Web service that Azure Machine Learning generates and hosts at a public URL. It’s a REST endpoint that accepts a POST request, with an authorization bearer in the header, and a JSON input message in the body.

The authorization bearer is a key that authorizes a client application to consume the service. The request body contains the input parameters to pass to the service, as specified in the predictive experiment. The format looks like that shown in Figure 8.

Figure 8 Request Body of the Machine Learning Service

  "Inputs": {
    "inputData": {
      "ColumnNames": [
      "Values": [
  "GlobalParameters": {}

The service’s response is a JSON message containing the scored label, as shown in Figure 9.

Figure 9 Response Body of the Machine Learning Service

  "Results": {
    "outputData": {
      "type": "DataTable",
      "value": {
        "ColumnNames": [
          "Scored Labels"
        "ColumnTypes": [
        "Values": [

Using HttpClient for establishing an HTTP connection to the service, it’s trivial to access the Web service and read the predicted outcome:

  • Input parameters are passed as a collection of strings.
  • The API key is assigned a bearer value in the request’s header.
  • The message is sent to the endpoint as a POST in JSON format.
  • The response is read as a string, again in JSON format.

Figure 10 shows the code for consuming the Machine Learning service in the Microsoft .NET Framework.

Figure 10 Consuming the Machine Learning Service in the Microsoft .NET Framework

using (var client = new HttpClient())
{
  var scoreRequest = new
  {
    Inputs = new Dictionary<string, StringTable>() {
      {
        "inputData",
        new StringTable()
        {
          ColumnNames = new string[] { "Date", "Object", "Hits" },
          Values = new string[,] { { "YYYY/MM/DD", "GUID", "#" } }
        }
      }
    },
    GlobalParameters = new Dictionary<string, string>()
  };

  client.DefaultRequestHeaders.Authorization =
    new AuthenticationHeaderValue("Bearer", apiKey);
  client.BaseAddress = new Uri(serviceEndpoint);

  HttpResponseMessage response =
    await client.PostAsJsonAsync(string.Empty, scoreRequest);
  if (response.IsSuccessStatusCode)
  {
    string result = await response.Content.ReadAsStringAsync();
  }
}

The full source code is available at bit.ly/2qz4gtm.

Wrapping Up

Observing cache hits for objects over several weeks generated a dataset that could be used in a machine learning project to identify access patterns and predict future demand. By exposing a Web service that can be consumed by an external integration workflow (running on Azure Logic Apps, for example), it’s possible to obtain predictions of demand for specific objects and pre-allocate them in Redis cache before they’re requested, in order to minimize the miss ratio. The observed improvement was nearly 20 percentage points of hit ratio, rising from about 60 percent to almost 80 percent in the L2 cache. This helped size the L2 cache accordingly and, by using the regional syncing capability of Azure Redis Cache, minimized sync time between distributed nodes by a similar proportion (about 20 percent shorter duration).

Stefano Tempesta is a Microsoft MVP and technology ambassador, as well as a chapter leader of CRMUG for Italy and Switzerland. A regular speaker at international conferences, including Microsoft Ignite, NDC, API World and Developer Week, Tempesta’s interests span across cloud, mobile and Internet of Things. He can be reached via his personal Web site at tempesta.space.

Thanks to the following Microsoft technical expert for reviewing this article: James McCaffrey
