01/25/2019

May 2017

Volume 32 Number 5

[DevOps]

Optimize Telemetry with Application Insights

By Victor Mushkatin | May 2017

The importance of monitoring your service is self-evident. In this article, we’ll focus on fundamental techniques to make your monitoring investments manageable. For the purpose of this discussion, “manageable” means that whatever telemetry you collect about your service, while actionable, doesn’t consume an unreasonable amount of resources.

When everything works smoothly, you don’t necessarily care about terabytes of log data collected for the service execution. You only care about the general trends. However, when the service goes down or performs poorly, you need everything, and then some, to diagnose the issues. How do you strike a balance between the data that’s required to detect a problem, which must be collected all the time, and the data that’s required to troubleshoot it, which must be collected only when it’s needed?

To illustrate the techniques, we’ll use the Microsoft Azure Application Insights Service and its highly extensible SDK model. While concepts we cover are universally applicable, our goal is to make you familiar with Application Insights SDK out-of-the-box capabilities and extensibility points that let you make your “telemetry exhaust” manageable. After reading this article, you’ll be able to understand the Application Insights domain model, how telemetry is collected, and what coding techniques are available to decrease the amount of telemetry, while preserving monitoring capabilities, analytical accuracy, and diagnosing depth.

Unbound Volume of Telemetry

Take a service that processes 1 billion transactions a day. If you log all details about every transaction, you’ll be able to answer all sorts of questions, such as the 95th-percentile transaction latency from a particular geolocation, or the failure rate for users running a particular browser. In addition to these monitoring metrics, you’ll be able to support users who call with specific questions about their failed transactions by looking into the logs. The more data you collect, the wider the range of questions you can answer by analyzing application telemetry.

But it all comes with a price. Even with low prices for the telemetry collection, do you really need to collect 1 billion data points? Or worse, if every transaction makes one call to SQL Database, do you really need to collect an additional 1 billion data points for all SQL Database calls? If you add up all possible telemetry that might be needed in order to monitor, troubleshoot and analyze your service execution, you might find the infrastructure to support it to be quite expensive, significantly affecting ROI of the monitoring.

Typically, for monitoring purposes, people use various service key performance indicators (KPIs)—for example, service load, transaction latency and failure rate. In order to expose these KPIs, the monitoring system has to have an understanding of the telemetry data structure. It should be able to differentiate events that represent transaction execution from, let’s say, events that represent SQL calls. With the defined semantic of telemetry, you can efficiently convert the unbound volume of telemetry into a manageable set of KPIs that’ll be cheap to collect and store, and have enough data to answer your monitoring questions.

The Application Insights SDK offers the model that enables the Application Insights Service to render effective and intuitive UI to support your monitoring, troubleshooting and analytical needs, as shown in Figure 1.

Figure 1 Application Telemetry Data Model

We’ll focus on two application models as part of this review: applications with an endpoint that receives external requests, typical for Web applications, and applications that periodically “wake up” to process data stored somewhere, typical for WebJobs or Functions. In both cases, we’ll call each unique execution an operation. An operation either succeeds or fails with an exception, and it might depend on other services or storage to carry out its business logic. To reflect these concepts, the Application Insights SDK defines three telemetry types: request, exception and dependency. For each of these types, the Telemetry Data Model defines the fields used to construct common KPIs: name, duration, status code and correlation. It also lets you extend every type with custom properties. Here are some typical fields for each of the event types:

Request (operation id, name, URL, duration, status code, […])
Dependency (parent operation id, name, duration, […])
Exception (parent operation id, exception class, call stack, […])

Typically, these types are defined by the application framework and are automatically collected by the SDK. For example, ASP.NET MVC defines the notion of a request execution in its model-view-controller plumbing, which determines when a request starts and stops; dependency calls to SQL are defined by System.Data; and calls to HTTP endpoints are defined by System.Net. However, there are cases where you might need to expose telemetry unique to your application. For example, you might want to implement diagnostics logging using an instrumentation framework you’re already familiar with, such as Log4Net or System.Diagnostics, or you might want to capture user interaction with your service to analyze usage patterns. Application Insights recognizes three additional data types for such needs, Trace, Event and Metric:

Trace (operation id, message, severity, […])
Metric (operation id, name, value, […])
Event (operation id, name, user id, […])

In addition to data collection, Application Insights automatically correlates all telemetry to the operation of which it’s a part. For example, if, while processing a request, the application makes SQL Database calls and Web service calls and records diagnostics info, all of that telemetry is automatically correlated with the request by placing a unique, auto-generated operation id into each telemetry payload.

The Application Insights SDK has a layered model where the previously stated telemetry types, extensibility points and data reduction algorithms are defined in the Application Insights API NuGet package. To focus discussion on core principles, we’ll use this SDK to reduce the number of technology-specific data collection concepts as much as possible.

Reduction Techniques

There are four data reduction techniques available in the Application Insights SDK. As a developer, you can apply them through the built-in extensibility API. We’ll demonstrate usage of those APIs later in this article.

Metrics extraction and aggregation is a technique that lets you locally reduce data by aggregating metrics from telemetry data and sending only aggregated values, instead of the events themselves. Imagine you have 100 requests per minute. If the only thing you care about is the number of requests per minute, this technique would let you locally count the number of requests and send the value once a minute, instead of sending each request and calculating counts from the raw telemetry.

Sampling is a technique that selectively collects a subset of telemetry from which you can estimate the characteristics of the service. For most services, you might collect every n-th request to get a well-distributed statistical representation of service behavior. On the one hand, this technique reduces the volume of collection by a factor of n; on the other hand, it preserves, with a certain accuracy, the statistical validity of the metrics derived from such telemetry. For better accuracy, a more sophisticated algorithm and data model must be used.

Exemplification is the ability to collect samples of interest without invalidating sampling statistical accuracy. For example, you might want to always collect request failures regardless of sampling configuration. This way, while you reduce telemetry load with sampling, you can preserve useful troubleshooting data.

Filtering is the ability to reduce data by filtering out telemetry you don’t care about. For example, you might want to ignore all telemetry related to traffic generated by synthetic monitoring or search bots. This way, your metrics will reflect true user interaction with the service.

Application Insights SDK

In order to demonstrate these reduction techniques, it’s important to understand how the Application Insights SDK processes telemetry. The processing can be logically grouped into four stages, as shown in Figure 2.

Figure 2 How the Application Insights SDK Processes Telemetry

Data collection is implemented as a set of telemetry modules, each responsible for a particular data set. For example, there are telemetry modules to collect dependencies, exceptions, performance counters and so on.

During telemetry enrichment, each item is augmented with useful properties. For example, the Application Insights SDK will automatically add the server name as one of the properties for each telemetry item. There are sets of predefined telemetry initializers; however, developers can add any number of additional initializers to include properties that help with monitoring, troubleshooting and analytical processes. For example, for geo-distributed services, you might want to add geolocation to analyze traffic processed by each datacenter separately. Essentially, during this step you increase the payload of the telemetry items.
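As a minimal sketch of what such an additional initializer might look like, assuming the Microsoft.ApplicationInsights NuGet package is referenced: the class name DatacenterTelemetryInitializer and the property key "datacenter" are hypothetical and chosen only for illustration. The initializer stamps each item that supports custom properties with the name of the datacenter that produced it.

```csharp
using Microsoft.ApplicationInsights.Channel;
using Microsoft.ApplicationInsights.DataContracts;
using Microsoft.ApplicationInsights.Extensibility;

// Hypothetical initializer: adds the name of the datacenter that produced
// the telemetry, so traffic can later be analyzed per datacenter.
internal class DatacenterTelemetryInitializer : ITelemetryInitializer
{
  private readonly string datacenter;

  public DatacenterTelemetryInitializer(string datacenter)
  {
    this.datacenter = datacenter;
  }

  public void Initialize(ITelemetry telemetry)
  {
    // Only telemetry types that carry custom properties can be enriched
    var withProperties = telemetry as ISupportProperties;
    if (withProperties != null &&
        !withProperties.Properties.ContainsKey("datacenter"))
    {
      withProperties.Properties["datacenter"] = this.datacenter;
    }
  }
}
```

An initializer like this would be registered by adding an instance to the TelemetryInitializers collection of the TelemetryConfiguration. Because every item then carries one more property, it illustrates how enrichment increases the payload.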

The telemetry processing pipeline is the place where you define logic for reducing the amount of telemetry sent to the service. The Application Insights SDK provides sampling telemetry processors to automatically reduce collected telemetry data without compromising statistical accuracy.

Telemetry transmission is a final step of the telemetry processing where all telemetry data processed by an application is queued, batched, zipped, and periodically sent to one or more destinations. The Application Insights SDK supports transmission to the Application Insights Service and other channels, such as Event Hub, out of the box.

In this article, we concentrate on techniques available to the developer to configure out-of-the-box sampling and additional telemetry processors to fine-tune data collection to service needs. All examples in this article build the monitoring configuration in code from scratch. However, in many production environments, most of the mentioned parameters are exposed as configuration settings that can be fine-tuned without the application recompilation.

Metrics Aggregation

Before going further, we want to discuss telemetry type concepts. Generally speaking, you can split all telemetry into two buckets—metrics and events.

A metric is defined as time-series data, pre-aggregated over specified intervals. For example, say you want to count the number of invocations of a function. This is a simple metric that gets incremented each time a call to the function occurs. The value of the metric gets aggregated over a period of time (for example, one minute) and is sent out at the end of that period.

An event is a single record of an occurrence that’s sent out every time it happens. In many cases, events have a very specific structure or type. In the Application Insights domain model, for example, an event of type Request has different properties than an event of type Exception. Going back to the previous example, if you want to capture every function execution, you might send an event with the function name and function parameters every time the function gets executed. These events let you answer all sorts of questions about function execution. For example, with raw event telemetry, you can calculate how many times the function has been called with a particular parameter value. Notice that with this higher data fidelity, in addition to simple analysis such as a count of function executions, you can now analyze the count of executions grouped by function parameter.
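The difference between the two buckets can be sketched with a small model that uses only the .NET base class library. It is not the Application Insights SDK, and the names MetricVersusEvents, CountPerMinute and CountPerParameter are hypothetical. The first method produces one record per minute no matter how many calls occurred; the second keeps enough detail to group calls by parameter value.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Simplified model, not the Application Insights SDK: contrasts a
// pre-aggregated metric with raw events for the same function calls.
public static class MetricVersusEvents
{
  // Metric: one (minute, count) record per minute, however many calls occurred
  public static Dictionary<DateTime, int> CountPerMinute(
    IEnumerable<DateTime> calls)
  {
    return calls
      .GroupBy(t => new DateTime(t.Year, t.Month, t.Day, t.Hour, t.Minute, 0))
      .ToDictionary(g => g.Key, g => g.Count());
  }

  // Events: every call is kept together with its parameter value,
  // so it can still be grouped by that parameter afterward
  public static Dictionary<string, int> CountPerParameter(
    IEnumerable<KeyValuePair<DateTime, string>> events)
  {
    return events
      .GroupBy(e => e.Value)
      .ToDictionary(g => g.Key, g => g.Count());
  }
}
```

The per-minute counts are cheap to send and store, but the parameter values are gone once the data is aggregated; only the raw events can answer the grouped question.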

While raw telemetry is much richer and lets you provide better insights, there’s a drawback related to the processing and storage costs associated with that. One way to address this is to create as many metrics up front as you think you’ll need to analyze your application. The problem with this approach is that your business is far more dynamic and it’s not always possible to know what metrics you might need in advance. The Application Insights SDK addresses this issue by providing a balance between aggregated and raw telemetry—it aggregates key application performance indicators and sends sampled raw application telemetry. This sampling approach lets the SDK minimize the overhead of raw data collection and increases the ROI of the collected data.

Sampling

There are two out-of-the-box sampling telemetry processors provided by the Application Insights SDK: fixed-rate sampling and adaptive sampling.

Fixed-rate sampling reduces the volume of per-node telemetry. For example, you might want to collect only 20 percent of all telemetry from each node.

Adaptive sampling automatically adjusts the volume of per-node telemetry. For example, you might want to decrease collection when the load is greater than 5 events per second (eps) per node.
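To make the idea of adaptive sampling concrete, here is a deliberately simplified model. It is not the algorithm the SDK uses, and the names AdaptiveSamplingModel and NextSamplingPercentage are hypothetical. It assumes the only goal is to keep the per-node output at or below a target rate, and it picks the sampling percentage accordingly; a fixed-rate processor, by contrast, would keep the same percentage regardless of load.

```csharp
using System;

// Simplified model, not the SDK's algorithm: picks the sampling percentage
// that keeps the per-node output at or below a target rate.
public static class AdaptiveSamplingModel
{
  public static double NextSamplingPercentage(
    double observedItemsPerSecond, double maxItemsPerSecond)
  {
    if (observedItemsPerSecond <= maxItemsPerSecond)
    {
      return 100.0; // Load is at or below the target: keep everything
    }
    // Keep only the share of items that fits into the target rate
    return 100.0 * maxItemsPerSecond / observedItemsPerSecond;
  }
}
```

With a target of 5 items per second and an observed load of 50 items per second, this model returns 10 percent; at 2 items per second it returns 100 percent.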

Note: There’s also ingestion sampling, which discards telemetry that arrives from your app at the service ingestion endpoint. We aren’t going to cover this technique in this article, but documentation can be found at docs.microsoft.com/en-us/azure/application-insights/app-insights-sampling.

Both sampling telemetry processors use a common algorithm that lets you mix and match those processors without affecting the statistical accuracy of the data. In order to decide whether a telemetry item is sampled in or out, the SDK uses a stateless hash function and compares the returned value with the configured sampling ratio. This means that regardless of which thread, process or server processes the data, telemetry with a hash value below the threshold will be consistently sampled in. In simplified form, you can codify this algorithm like this:

If exist(UserID): score = Hash(UserID) (returns value [0..100])
ElseIf exist(OperationID): score = Hash(OperationID) (returns value [0..100])
Else: score = Random [0..100]
Sample in if score < sampling percentage

As you can see, as long as UserID or OperationID is shared among all related telemetry items, they’ll all have the same hash value and will consistently be sampled in or out. Application Insights enables adaptive sampling by default when collecting data. All telemetry stored in the service has a column called itemCount. It reflects the sampling ratio at the moment of data collection: each stored record stands for itemCount original items. Because the hash calculation algorithm is stateless, this number doesn’t represent the actual number of sampled-out telemetry items; it only conveys the statistical ratio of telemetry sampled in. To quickly check whether your telemetry has been sampled, you can execute the following analytical query to compare the number of records stored with the number of records processed by the service:

requests | summarize sum(itemCount),
     count()

If you see the difference between those two numbers, then sampling has been enabled and your data set has been reduced.
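The sampling decision and the itemCount comparison described in this section can be sketched in a few lines that use only the .NET base class library. This is an illustration under stated assumptions, not the SDK’s implementation: the names SamplingModel, Score, IsSampledIn and EstimateOriginalCount are hypothetical, and FNV-1a stands in for whatever hash function the SDK actually uses. The important property is that the score depends only on the id, so every thread, process and server reaches the same decision.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Simplified model of hash-based sampling, not the SDK's implementation
public static class SamplingModel
{
  // Deterministic score in [0, 100): the same id always yields the same
  // score. FNV-1a is an assumption made for illustration only.
  public static double Score(string id)
  {
    uint hash = 2166136261u;
    foreach (byte b in Encoding.UTF8.GetBytes(id))
    {
      hash = unchecked((hash ^ b) * 16777619u);
    }
    return hash / (uint.MaxValue + 1.0) * 100.0;
  }

  // Prefer the user id, then the operation id; fall back to a random score
  public static bool IsSampledIn(string userId, string operationId,
    double samplingPercentage, Random random)
  {
    double score;
    if (!string.IsNullOrEmpty(userId)) score = Score(userId);
    else if (!string.IsNullOrEmpty(operationId)) score = Score(operationId);
    else score = random.NextDouble() * 100.0;
    return score < samplingPercentage;
  }

  // Each stored record stands for itemCount original items, so the sum of
  // itemCount estimates how many items existed before sampling.
  public static int EstimateOriginalCount(IEnumerable<int> itemCounts)
  {
    return itemCounts.Sum();
  }
}
```

In this model, three stored records with an itemCount of 4 each correspond to the query returning 12 for sum(itemCount) and 3 for count(), which is the kind of difference that indicates sampling took place.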

Data Reduction in Action

Let’s review all these techniques. We’re going to use a console application to highlight main concepts. The application pings bing.com in a loop and stores telemetry data in Application Insights. It treats each loop execution as a request telemetry, automatically collects dependency data and correlates all telemetry to the appropriate “request” to which it belongs.

To initialize the Application Insights SDK, you need to perform three simple steps. First, you have to initialize the configuration with the Instrumentation Key. The Instrumentation Key is an identifier used to associate telemetry data with the Application Insights resource; you can obtain it in the Azure Portal when you create the resource:

// Set Instrumentation Key
var configuration = new TelemetryConfiguration();
configuration.InstrumentationKey = "fb8a0b03-235a-4b52-b491-307e9fd6b209";

Next, you need to initialize the Dependency Tracking module to automatically collect dependency information:

// Automatically collect dependency calls
var dependencies = new DependencyTrackingTelemetryModule();
dependencies.Initialize(configuration);

Last, you have to add a Telemetry Initializer that adds a common correlation id to all related telemetry:

// Automatically correlate all telemetry data with request
configuration.TelemetryInitializers.Add(
  new OperationCorrelationTelemetryInitializer());

At this point, the Application Insights SDK is fully initialized, and you can access all APIs via the TelemetryClient object and code the main loop, as shown in Figure 3.

Figure 3 Typical Code Instrumentation for the Monitoring of Loop Processing

var client = new TelemetryClient(configuration);
var iteration = 0;
var http = new HttpClient();
while (!token.IsCancellationRequested)
{
  using (var operation = client.StartOperation<RequestTelemetry>("Process item"))
  {
    client.TrackEvent("IterationStarted",
      properties: new Dictionary<string, string>(){{"iteration",
      iteration.ToString()}});
    client.TrackTrace($"Iteration {iteration} started", SeverityLevel.Information);
    try
    {
      await http.GetStringAsync("https://bing.com");
    }
    catch (Exception exc)
    {
      // This call will not throw
      client.TrackException(exc);
      operation.Telemetry.Success = false;
    }
    client.StopOperation(operation);
    // An interpolated string can't span lines, so concatenate two parts
    Console.WriteLine($"Iteration {iteration}. " +
      $"Elapsed time: {operation.Telemetry.Duration}");
    iteration++;
  }
}

When you execute this application, you’ll see the screen shown in Figure 4 in a console window.

Figure 4 Loop Processing Telemetry Output—Iteration Number and the Duration of Every Cycle

All the telemetry will be sent to the cloud and can be accessed using the Azure Portal. During the development, it’s easier to analyze telemetry in Visual Studio. So if you run the code in Visual Studio under debugger, you’ll see telemetry right away in the Application Insights Search tab. It will look like what’s shown in Figure 5.

Figure 5 Loop Processing Telemetry Output in Application Insights

Analyzing this log, for each request you can see trace, event and dependency telemetry with the same operation id. At this point, you have an app that sends various Application Insights telemetry types, automatically collects dependency calls and correlates them all to the appropriate requests. Now, let’s reduce telemetry volume utilizing out-of-the-box sampling telemetry processors.

As previously stated, the Application Insights SDK defines the telemetry processing pipeline that’s used to reduce the amount of telemetry sent to the portal. All collected telemetry enters the pipeline, and every telemetry processor decides whether to pass it further along. As you’ll see, configuring sampling with the out-of-the-box telemetry processors is as easy as registering them in the pipeline and requires just a couple of lines of code. But in order to demonstrate the effect of those processors, we’ll slightly modify the program and introduce a helper class to showcase the reduction ratio.

Let’s build the Telemetry Processor that’ll calculate the size of the telemetry items going through, as shown in Figure 6.

Figure 6 Telemetry Processor for Item Size Calculation

internal class SizeCalculatorTelemetryProcessor : ITelemetryProcessor
{
  private ITelemetryProcessor next;
  private Action<int> onAddSize;
  public SizeCalculatorTelemetryProcessor(ITelemetryProcessor next,
    Action<int> onAddSize)
  {
    this.next = next;
    this.onAddSize = onAddSize;
  }
  public void Process(ITelemetry item)
  {
    try
    {
      // Serialize the item to measure its payload size in bytes
      item.Sanitize();
      byte[] content =
        JsonSerializer.Serialize(new List<ITelemetry>() { item }, false);
      this.onAddSize(content.Length);
    }
    finally
    {
      this.next.Process(item);
    }
  }
}

Now you’re ready to build the telemetry processing pipeline. It will consist of four telemetry processors. The first one will calculate the size and count of telemetry sent into the pipeline. Then, you’ll use the fixed-rate sampling telemetry processor to sample only 10 percent of dependency calls (in this case, pings to bing.com). In addition, you’ll enable adaptive sampling for all telemetry types except Events, which means that all events will be collected. The last telemetry processor will calculate the size and count of the telemetry items that’ll be sent to the channel for subsequent transmission to the service, as shown in Figure 7.

Figure 7 Building a Telemetry Processing Chain with Items Sampling

// Initialize state for the telemetry size calculation
var collectedItems = 0;
var sentItems = 0;
// Build telemetry processing pipeline
configuration.TelemetryProcessorChainBuilder
  // This telemetry processor will be executed
  // first for all telemetry items to calculate the size and # of items
  .Use((next) => { return new SizeCalculatorTelemetryProcessor(next,
    size => Interlocked.Add(ref collectedItems, size)); })
  // This is a standard fixed-rate sampling processor that lets only 10%
  // of dependency telemetry through
  .Use((next) =>
  {
    return new SamplingTelemetryProcessor(next)
    {
      IncludedTypes = "Dependency",
      SamplingPercentage = 10,
    };
  })
  // This is a standard adaptive sampling telemetry processor
  // that will sample in/out any telemetry item it receives
  .Use((next) =>
  {
    return new AdaptiveSamplingTelemetryProcessor(next)
    {
      ExcludedTypes = "Event", // Exclude custom events from being sampled
      MaxTelemetryItemsPerSecond = 1, // Default: 5 calls/sec
      SamplingPercentageIncreaseTimeout =
        TimeSpan.FromSeconds(1), // Default: 2 min
      SamplingPercentageDecreaseTimeout =
        TimeSpan.FromSeconds(1), // Default: 30 sec
      EvaluationInterval = TimeSpan.FromSeconds(1), // Default: 15 sec
      InitialSamplingPercentage = 25, // Default: 100%
    };
  })
  // This telemetry processor will be executed ONLY when telemetry is sampled in
  .Use((next) => { return new SizeCalculatorTelemetryProcessor(next,
    size => Interlocked.Add(ref sentItems, size)); })
  .Build();

Finally, you’ll slightly modify the console output to see the collected and sent telemetry and the ratio for the reduction:

Console.WriteLine($"Iteration {iteration}. " +
  $"Elapsed time: {operation.Telemetry.Duration}. " +
  $"Collected Telemetry: {collectedItems}. " +
  $"Sent Telemetry: {sentItems}. " +
  $"Ratio: {1.0 * collectedItems / sentItems}");

When executing the app, you can see that the ratio of collected to sent telemetry can be as high as 3 to 1!

Now, if you go to the Application Insights Analytics page and execute the query shown earlier, which compares sum(itemCount) with count(), you might see the stats shown in Figure 8, proving that sampling worked. You see only a few requests representing many telemetry items.

Figure 8 Number of Telemetry Items in Application Insights and Estimated Number of Originally Collected Items

Exemplification and Filtering

So far we’ve talked about sampling and you’ve learned how to build a custom telemetry processing pipeline and simple telemetry processor. With this knowledge, you can explore two other techniques—filtering and exemplification. We made a couple of examples to showcase what you can do.

First, let’s take a look at exemplification. Let’s say your application depends on a third-party service that guarantees a certain performance SLA for processing requests. With the existing approach, you can collect samples of dependency calls. But what if you want to collect evidence of every case where that service was out of compliance with its SLA? For this demo, we’ve created an exemplification telemetry processor that collects all dependency calls that are out of compliance with a 100 ms SLA, as shown in Figure 9.

Figure 9 Marking Slow Dependency Calls for Collection and Exempting Them from Sampling

internal class DependencyExampleTelemetryProcessor : ITelemetryProcessor
{
  private ITelemetryProcessor next;
  public DependencyExampleTelemetryProcessor(ITelemetryProcessor next)
  {
    this.next = next;
  }
  public void Process(ITelemetry item)
  {
    // Check telemetry type
    if (item is DependencyTelemetry)
    {
      var r = item as DependencyTelemetry;
      if (r.Duration > TimeSpan.FromMilliseconds(100))
      {
        // If dependency duration > 100 ms then "sample in"
        // this telemetry by setting sampling percentage to 100
        ((ISupportSampling)item).SamplingPercentage = 100;
      }
    }
    // Continue with the next telemetry processor
    this.next.Process(item);
  }
}

Unlike exemplification, which, in fact, increases the volume of the collected telemetry for the purpose of more precise data fidelity, filtering is more radical as it drops telemetry items on the floor, making them completely invisible to the service. For demo purposes, we’ve created a filtering telemetry processor that drops all dependency calls that are faster than 100 ms, as shown in Figure 10.

Figure 10 Filtering of Fast Dependency Calls Telemetry

internal class DependencyFilteringTelemetryProcessor : ITelemetryProcessor
{
  private readonly ITelemetryProcessor next;
  public DependencyFilteringTelemetryProcessor(ITelemetryProcessor next)
  {
    this.next = next;
  }
  public void Process(ITelemetry item)
  {
    // Check telemetry type
    if (item is DependencyTelemetry)
    {
      var d = item as DependencyTelemetry;
      if (d.Duration < TimeSpan.FromMilliseconds(100))
      {
        // If dependency duration < 100 ms, drop the item: stop telemetry
        // processing and return from the pipeline
        return;
      }
    }
    this.next.Process(item);
  }
}

Telemetry filtering is an effective way to reduce the amount of telemetry and increase its quality. When you know that a telemetry item isn’t actionable, you don’t want to see it in analytics. With the telemetry processor in the previous example, you’ll only see dependency calls that took 100 ms or longer. So if you try to calculate the average duration of dependency processing based on the stored dependency records, you’ll get incorrect results.
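A short worked example shows how large this bias can be. The sketch below uses only the .NET base class library, the durations are made up for illustration, and the names FilteringBias, AverageOfAll and AverageOfRetained are hypothetical. It compares the average over all calls with the average over only those calls that survive a filter that drops everything faster than a threshold.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Illustrates the bias that filtering introduces into averages
public static class FilteringBias
{
  // Average over every call, as a locally aggregated metric would see it
  public static double AverageOfAll(IEnumerable<double> durationsMs)
  {
    return durationsMs.Average();
  }

  // Average over only the calls that survive a filter that drops
  // everything faster than the threshold. Throws if nothing is retained.
  public static double AverageOfRetained(
    IEnumerable<double> durationsMs, double thresholdMs)
  {
    return durationsMs.Where(d => d >= thresholdMs).Average();
  }
}
```

For durations of 20, 40, 60, 80 and 300 ms and a threshold of 100 ms, the true average is 100 ms, whereas the average computed from the retained records is 300 ms.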

Let’s try to address this by locally aggregating dependency call telemetry and sending “true” metrics to the service. To do so, we’re going to use a new metrics API and modify the telemetry processor to expose metrics before dropping telemetry, as shown in Figure 11.

Figure 11 Filtering of Fast Dependency Calls Telemetry with Metrics Pre-Aggregation

internal class DependencyFilteringWithMetricsTelemetryProcessor
                                        : ITelemetryProcessor, IDisposable
{
  private readonly ITelemetryProcessor next;
  private readonly ConcurrentDictionary<string, Tuple<Metric, Metric>> metrics
    = new ConcurrentDictionary<string, Tuple<Metric, Metric>>();
  private readonly MetricManager manager;
  public DependencyFilteringWithMetricsTelemetryProcessor(
    ITelemetryProcessor next, TelemetryConfiguration configuration)
  {
    this.next = next;
    this.manager = new MetricManager(new TelemetryClient(configuration));
  }
  public void Process(ITelemetry item)
  {
    // Check telemetry type
    if (item is DependencyTelemetry)
    {
      var d = item as DependencyTelemetry;
      // Increment counters
      var metrics = this.metrics.GetOrAdd(d.Type, (type) =>
      {
        var dimensions = new Dictionary<string, string> { { "type", type } };
        var numberOfDependencies =
          this.manager.CreateMetric("# of dependencies", dimensions);
        var dependenciesDuration =
           this.manager.CreateMetric("dependencies duration (ms)", dimensions);
        return new Tuple<Metric, Metric>(
          numberOfDependencies, dependenciesDuration);
      });
      // Increment values of the metrics in memory
      metrics.Item1.Track(1);
      metrics.Item2.Track(d.Duration.TotalMilliseconds);
      if (d.Duration < TimeSpan.FromMilliseconds(100))
      {
        // If dependency duration < 100 ms, drop the item: stop telemetry
        // processing and return from the pipeline
        return;
      }
    }
    this.next.Process(item);
  }
  public void Dispose()
  {
    this.manager.Dispose();
  }
}

As you can see, we’re creating two metrics—“# of dependencies” and “dependencies duration (ms)”—with dimensionality of a dependency type. In our case, all dependency calls are tagged with HTTP type. If you go to Analytics, you can see the information collected for your custom metrics, as shown in Figure 12.

Figure 12 Pre-Aggregated Metric Collected by Application Insights

This example lets you calculate the total number of calls and the total time your app spends calling dependencies. Name contains the name of the metric, that is, dependencies duration (ms); value is the sum of the durations of all HTTP calls to bing.com; and customDimensions contains a custom dimension called type with the value HTTP. There were a total of 246 calls to the Track API; however, only one record was stored per minute for each metric. Both processing efficiency and cost are strong reasons to expose app telemetry using the MetricManager API. The challenge with this approach is that you have to define all your metrics and dimensions up front. When that’s possible, it’s the recommended way; however, in some cases, it’s either not possible or the cardinality of the dimension is too high. In such cases, relying on sampled raw telemetry is a reasonable compromise between accuracy and telemetry volume.

Wrapping Up

Controlling the volume of monitoring telemetry is an important aspect of getting a good return on your monitoring investments. Over-collecting will cost you too much; under-collecting will prevent you from effectively detecting, triaging and diagnosing your production issues. This article discussed techniques that help you manage the data collection footprint using the Application Insights SDK and its extensibility model. Using data reduction techniques such as sampling, filtering, metrics aggregation and exemplification, we demonstrated how to significantly decrease the volume of data while preserving monitoring accuracy, analytical correctness and diagnostic depth.

The Application Insights approach is being adopted by many new Microsoft services and frameworks, such as Azure Functions and Service Fabric (this support will be announced at this year’s Microsoft Build 2017 conference) and community through OSS contribution on GitHub. In addition to the .NET Framework, there are other SDKs available, including JavaScript, Java and Node.js (Node.js Application Insights SDK improvements like better telemetry collection and correlation, as well as easier enablement in Azure, will be announced at Build 2017). Through a consistent, extensible, cross-platform data collection SDK, you can take control and “manage” your telemetry across your heterogeneous application environment.

Victor Mushkatin is a group program manager on the Application Insights team. His main area of expertise is application performance monitoring technologies and DevOps practices. Reach him at victormu@microsoft.com.

Sergey Kanzhelev is a principal software developer on the Application Insights team. His career has been entirely dedicated to application monitoring and diagnostics. He’s passionate about connecting with customers, as well as an avid blog author and GitHub contributor. Reach him at sergkanz@microsoft.com.

Thanks to the following technical experts for reviewing this article: Mario Hewardt, Vance Morrison and Mark Simms.

Mario Hewardt is a Principal Field Engineer in Microsoft and author of Advanced Windows Debugging and Advanced .NET Debugging. With over 17 years at Microsoft, he has worked with the development of Windows starting from Windows 98 up to Windows Vista. With the advent of cloud computing, Mario has worked in the SaaS arena and delivered the Asset Inventory Service as well as leading a team of developers building the core platform for the next generation Microsoft online management service – Windows Intune. Mario has also worked closely with enterprise customers as a Dedicated Developer Premier Field Engineer helping ensure that our customers build their solutions on the Microsoft stack in the most efficient and reliable way possible.

Mark Simms is an architect on the Microsoft Azure engineering team, working with customers from Fortune 50 to startups, building out the largest and most challenging applications and services on Azure. Prior to joining Microsoft, Mark worked on a broad range of software projects and platforms, from embedded digital design and seismic visualization optimization through live site operations for a SaaS platform.

Vance Morrison is the Performance Architect for the .NET Runtime at Microsoft. He spends his time either making various aspects of the runtime faster or teaching others how to avoid performance pitfalls using .NET. He has been involved in the design of components of the .NET runtime since its inception. Previously, he drove the design of the .NET Intermediate Language (IL) and was the development lead for the just-in-time compiler for the runtime.
