June 2010

Volume 25 Number 06

Cloud Diagnostics - Take Control of Logging and Tracing in Windows Azure

By Mike Kelly | June 2010

Like many programmers, when I first started writing code I used print statements for debugging. I didn’t know how to use a debugger and the print statements were a crude but effective way to see what my program was doing as it ran. Later, I learned to use a real debugger and dropped the print statements as a debugging tool.

Fast forward to when I started writing code that runs on servers. I found that those print statements now go under the more sophisticated heading of “logging and tracing,” essential techniques for any server application programmer.

Even if you could attach a debugger to a production server application—which often isn’t possible due to security restrictions on the machines hosting the application—the types of issues server applications run into aren’t easily revealed with traditional debuggers. Many server applications are distributed, running on multiple machines, and debugging what’s happening on one machine isn’t always enough to diagnose real-world problems.

Moreover, server applications often run under the control of an operations staff that wouldn’t know how to use a traditional debugger—and calling in a developer for every problem isn’t desirable or practical.

In this article I’ll explain some basic logging, tracing and debugging techniques used for server applications. Then I’ll dive into how these techniques can be employed for your Windows Azure projects. Along the way you’ll see how logging and tracing are used with some real-world applications, and I’ll show you how to use Windows PowerShell to manage diagnostics for a running service.

A Logging Strategy

Ideally, any server application—and basically any Web application, including those running under Windows Azure—has a logging and tracing strategy designed in from the beginning. The logging information should be robust enough to describe nearly everything that’s happening within each component. However, just as those print statements I added to my first programs could produce a lot of output, so too can logging. Well-designed logging and tracing thus includes ways of adjusting the type and volume of logging from any component. This allows operational staff and developers to focus on a particular misbehaving component, perhaps even on a particular machine, to get much more information on exactly what’s happening inside it—without generating a lot of noise in the log that can be distracting and perhaps slow down the application significantly.

Furthermore, because server applications are commonly distributed applications, information must be collected and aggregated from multiple machines (perhaps in different application roles) to get a full picture of what was going on when a particular problem occurred. So a way to identify a transaction thread through the machines is important, allowing the aggregation after the fact.

The logging available in Windows Azure has matured through the Community Technology Preview (CTP) releases. The early logging wasn’t much more sophisticated than a print statement, captured as text in Windows Azure table storage. Starting with the PDC09 release, Windows Azure began to offer a much more full-featured logging and tracing infrastructure, based on the Event Tracing for Windows (ETW) framework.

This ETW framework is supported in ASP.NET through classes in the System.Diagnostics namespace. The Microsoft.WindowsAzure.Diagnostics namespace, which inherits from and extends standard System.Diagnostics classes, enables the use of System.Diagnostics as a logging framework in the Windows Azure environment. Figure 1 shows how ETW is implemented by Windows Azure Diagnostics.

Figure 1 High-Level Overview of Windows Azure Diagnostics

ETW provides a model in which code logs to one or more TraceSources. The level of logging allowed through each source is controlled by a SourceSwitch. Sources are in turn connected to one or more consumers, which persist the logging information in different ways.

Windows Azure provides a standard consumer or listener to persist the logging information you generate either to Windows Azure table storage or to blob storage. You also can write your own consumer if you want to do something different with the event data, or use an off-the-shelf consumer (although some might have to be modified to work in the Windows Azure environment).
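
Here’s a minimal sketch, in C#, of how those pieces fit together when wired up in code. The source name and event IDs are just illustrations; the sample code for this article hooks the same pieces up through configuration files instead, as shown later:

using System.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics;

// A TraceSource represents one logical component of the application.
TraceSource workerTrace = new TraceSource("MyApp.Worker");

// The switch controls which event types this source lets through.
workerTrace.Switch = new SourceSwitch("MyApp.Worker") { Level = SourceLevels.Warning };

// The Windows Azure listener is the consumer that persists events to storage.
workerTrace.Listeners.Add(new DiagnosticMonitorTraceListener());

// Error passes the Warning-level switch; Verbose is filtered out.
workerTrace.TraceEvent(TraceEventType.Error, 100, "Could not reach the payment service");
workerTrace.TraceEvent(TraceEventType.Verbose, 101, "Entering ProcessOrder");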

The ETW framework associates a TraceEventType with each event, as shown in Figure 2. The first five severity rows are the values most commonly used and they indicate the relative importance of the trace output. Note that the Suspend, Resume and Transfer types are used by Windows Communication Foundation (WCF).

Figure 2 Trace Event Types

TraceEventType | Value | Meaning
Critical | 0x0001 | Fatal error or application crash
Error | 0x0002 | Recoverable error
Warning | 0x0004 | Non-critical problem—may be an indication of more serious problems to come
Information | 0x0008 | Informational message
Verbose | 0x0010 | Debugging trace (such as detailed execution flow information, parameters, and so forth)
Start | 0x0100 | Starting of a logical operation
Stop | 0x0200 | Stopping of a logical operation
Suspend | 0x0400 | Suspension of a logical operation
Resume | 0x0800 | Resuming a logical operation
Transfer | 0x1000 | Transfer to a new activity

If you’re looking only for really bad things, you’ll want to be sure to capture Critical and probably Error values. If you want lots of information about what’s going on, capture everything from Verbose on up.

Your logging strategy should include consistent use of the event type and copious logging entries at the less severe levels (Information and Verbose). It should be possible to virtually follow the execution flow of your application if logging for all five severity values is enabled. This can be invaluable in troubleshooting an error or problem in production.

You can hook up listeners, sources and switches to enable different levels of output programmatically, but this is typically done through configuration files. You can configure the output in app.config (for Windows Azure worker roles) or web.config (for Windows Azure Web roles). However, as I’ll explain in detail later in the article, putting this in ServiceConfiguration.cscfg lets you adjust logging and tracing options while the Windows Azure service is running, without having to redeploy any updates to the running code or to even stop the service. Windows Azure also exposes a RESTful interface to allow control over some logging options remotely. Windows PowerShell can make use of the RESTful interface.

Logging, Tracing and Debug Output

The terms logging, tracing and debug output can sometimes be used interchangeably. In this article, I’m going to distinguish among four different types of what can generally be called diagnostic output in your code. These are roughly ordered from most verbose to least verbose.

  • Debug Output: This is information that appears only in the debug builds of your application and is excluded at compile time from release builds (based on whether the DEBUG preprocessor symbol is defined, which Visual Studio by default defines only in debug builds). Typically, debug output includes things like Asserts that you add to help find cases where code is not complying with expected preconditions, leading to bugs, or even dumps of data structures. Adding these helps you verify your algorithms during development and testing. (The compile-time difference between debug output and tracing is illustrated in the code sketch below.)
  • Tracing: These are statements that are intended to help track the flow of control and state of the program as it’s executing. Imagine running a debugger, stepping along through code and checking the values of key variables in the Watch window. Tracing statements are intended to replicate that experience in cases where you can’t attach a debugger. They should ideally provide enough context that you can see which path is taken at each control point in the application and sort of follow along in the code from reading the trace statements. Tracing is enabled when the TRACE preprocessor symbol is defined at compile time, and can be in both release and debug builds. (By default, Visual Studio defines TRACE in both debug and release builds, but you can of course change this.)
  • Event Logging: These are statements that are intended to capture major events in the course of running the application—for instance, the start of a transaction or the addition of an item to a database. Event logging is different from tracing in that it captures major states rather than detailed flow of control.
  • Error Logging: These are special cases of event logging in which you capture exceptional or potentially dangerous situations. Examples include any caught exception; cases where you can’t access a resource on another machine you expect to be able to access (and which, of course, your application will gracefully handle but would like to note); and cases where errors come back from APIs from which you don’t expect errors.

Error logging can also be useful to operations staff in situations where problems aren’t yet occurring, but indications are that they soon will—for instance, a quota that is nearing its maximum or a transaction that is succeeding but taking more time than usual. These sorts of logging events can help operations staff proactively address problems before they occur, avoiding downtime in the application.
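
To make the compile-time distinction between debug output and tracing concrete, here’s a minimal sketch (the class and method names are just illustrations). The Debug calls compile away unless DEBUG is defined, while the Trace call stays in release builds and can be switched on or off at run time:

using System.Diagnostics;

public static class OrderProcessor
{
    public static void ProcessOrder(int orderId)
    {
        // Debug output: compiled only when DEBUG is defined (debug builds by default).
        Debug.Assert(orderId > 0, "ProcessOrder called with an invalid order ID");

        // Tracing: compiled whenever TRACE is defined (both debug and release builds by default).
        Trace.WriteLine("ProcessOrder starting for order " + orderId, "Information");
    }
}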

Most good developers have gotten used to including debug output in their applications to help them diagnose problems during development, and many have developed some sort of solution for error logging.

However, you need to be sure you’re considering not just the debug output and error logging options, but also have a robust strategy for tracing and event logging, which can really help diagnose problems that occur only under stressful loads in production environments.

Also, carefully consider whether much of what you now think of as debug output shouldn’t instead be tracing and available in both release and debug builds. A misbehaving application in production will typically be running the release build. If you have the tracing statements present but disabled (as I’ll explain how to do later), you can selectively enable them to get very rich debug-like output from the release build, helping you diagnose problems.

Tracing and Logging in Windows Azure

You can use Windows Azure logging right out of the box—it’s part of the Windows Azure SDK. There are some advantages to using a logging framework like Logger.NET, Enterprise Library, log4net or Ukadc.Diagnostics. These add additional structure to your logging messages and also can help provide some of the configurability mentioned earlier. However, most have not been tweaked to work in the Windows Azure environment and some are much more than just a logging framework.

For the sample code for this article, I decided to use just the standard Windows Azure logging and tracing APIs with a thin layer on top to provide dynamic configuration. You will probably find building some helper classes and a framework for your logging and tracing strategy on top of this helpful, or watch for the other frameworks to provide Windows Azure versions.

When you create a new service in Windows Azure using Visual Studio, it’s already enabled to do basic logging. The boilerplate Worker Role and Web Role code generated by the Windows Azure templates has the diagnostics listener already configured and enabled.

To enable simple logging in a Windows Azure service, start a new project in Visual Studio using the Windows Azure Cloud Service (Visual C#) template. Give the project a name. For my sample I used MSDNSampleLoggingService. Click OK.

The New Cloud Service Project wizard will run. In the wizard, add one Worker Role and one Web Role to the project. Rename the Worker Role to LoggingWorkerRole and the Web Role to LoggingWebRole, then click OK. Visual Studio will create the project.

At this point, you can explore the generated code. Look at the app.config in the LoggingWorkerRole project and note that the following code appears:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.diagnostics>
    <trace autoflush="false" indentsize="4">
      <listeners>
        <add name="AzureDiagnostics"
        type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,
        Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral,
        PublicKeyToken=31bf3856ad364e35" />
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>

This hooks up the standard Windows Azure diagnostics listener to your code, meaning that any logging and tracing you do from the worker role will be directed to the Windows Azure listener (DiagnosticMonitorTraceListener) unless you change this. You’ll find a similar entry in the web.config for the Web Role created for this service.

If you look at the WorkerRole.cs file in the worker role project, you’ll also see this line in the OnStart method:

DiagnosticMonitor.Start("DiagnosticsConnectionString");

And in the Run method, you’ll see a call to trace:

// This is a sample worker implementation. Replace with your logic.
Trace.WriteLine("LoggingWorkerRole entry point called", "Information");

Finally, if you look in the ServiceConfiguration.cscfg file in the Service root, you’ll see this line for both the Worker Role and the Web Role:

<Setting name="DiagnosticsConnectionString" 
         value="UseDevelopmentStorage=true" />

This tells the Windows Azure listener which storage account to use for persisting the logging and tracing information. In this case, the logging information will be stored in development storage on your local machine. By switching this to a Windows Azure cloud-storage account, you can have the logs go to cloud storage. Here is an example of the format for that from the sample code provided with this article:

<Setting name="DiagnosticsConnectionString" 
  value="DefaultEndpointsProtocol=https;AccountName=Xxxxxx;AccountKey=Yyyyyy" />

The AccountName and AccountKey values need to be customized to your particular Azure account and key. You get this information from the storage account portal for your service at windows.azure.com. AccountName is the first portion of the URL for the table and blob storage endpoints (the part before “.table.core.windows.net”). AccountKey is the base-64 encoded Primary Access Key for your storage account.

Note that because there is a distinct storage account used for diagnostics, you can choose to store your diagnostics information separate from your other application data. To do this, set up a separate storage account by clicking New Service on the portal page, selecting Storage Account, then giving it a name (MyAppLogs, for instance). You might want to set up an affinity group so the storage for your logs is in the same region as your service.

Now that you’ve taken a quick tour of the tracing code in Windows Azure services, you can run the simple service you’ve created. Note that the default listener provided by Windows Azure also directs the output to the Output window in Visual Studio for debug builds, so you’ll see the worker role’s trace message appear in the debug window:

Information: LoggingWorkerRole entry point called

What if you want to look at logs after the service runs? By default, Windows Azure does not persist the logs to storage. You can tell it to do so by adding a few lines of code to your role’s OnStart method:

TimeSpan tsOneMinute = TimeSpan.FromMinutes(1);
DiagnosticMonitorConfiguration dmc =
    DiagnosticMonitor.GetDefaultInitialConfiguration();
// Transfer logs to storage every minute
dmc.Logs.ScheduledTransferPeriod = tsOneMinute;
// Transfer verbose, critical, etc. logs
dmc.Logs.ScheduledTransferLogLevelFilter = LogLevel.Verbose;    
// Start up the diagnostic manager with the given configuration
DiagnosticMonitor.Start("DiagnosticsConnectionString", dmc);

Adding this code to WorkerRole.cs and rerunning will cause Windows Azure to transfer the logs to development storage every minute. You also can choose to do an on-demand transfer of logs (see the code in admin.aspx.cs in my sample app for how to do this) or use the Windows PowerShell commands described later in this article. Remember that once you transfer logs to storage, you’ll be charged for the storage space and it’s up to you to delete the information when it’s no longer needed.
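
If you’re curious what an on-demand transfer looks like in code, here’s a rough sketch using the diagnostics management API in Microsoft.WindowsAzure.Diagnostics.Management. The deployment ID, role name, instance ID and queue name are placeholders; see admin.aspx.cs in the sample code for the working version:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics.Management;

// Connect to the storage account that holds the diagnostics data.
CloudStorageAccount account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=https;AccountName=Xxxxxx;AccountKey=Yyyyyy");

// Address one role instance of the running deployment (IDs are placeholders).
DeploymentDiagnosticManager ddm =
    new DeploymentDiagnosticManager(account, "yourDeploymentId");
RoleInstanceDiagnosticManager ridm =
    ddm.GetRoleInstanceDiagnosticManager("LoggingWorkerRole", "LoggingWorkerRole_IN_0");

// Ask for the last hour of basic logs, Verbose and above, and request a
// completion notification on a queue.
OnDemandTransferOptions options = new OnDemandTransferOptions
{
    From = DateTime.UtcNow.AddHours(-1),
    To = DateTime.UtcNow,
    LogLevelFilter = LogLevel.Verbose,
    NotificationQueueName = "wad-transfer-queue"
};
Guid requestId = ridm.BeginOnDemandTransfer(DataBufferName.Logs, options);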

Once you’ve gotten the logs into Windows Azure storage, you’ll need a tool to look at the storage tables to see the logs. I used Cerebrata’s Cloud Storage Studio (cerebrata.com). Cerebrata has since come out with a tool called Azure Diagnostics Manager, and there are also free tools available on CodePlex (codeplex.com) for looking at cloud storage and diagnostics. The logs are put in a table called WADLogsTable, which you can see in Figure 3.

Figure 3 Logs Persisted in Development Storage

You’ll notice a couple of things when you look at the logs in storage. First, Windows Azure automatically associates some information with each logged event: a timestamp, a tick count (which provides more detailed timing with 100-nanosecond granularity), and information about the deployment, role and role instance. This allows you to narrow down logs to specific instances if you want.

Second, there’s a Level and an EventId associated with each event. Level corresponds to the values in Figure 2—those Trace events logged as Information will have a Level value of 4, while those logged as Error will have a Level of 2. Generic events sent through Trace.WriteLine (as the boilerplate code does) will have Level 5 (Verbose).

The EventId is a value you specify. The basic Trace.WriteLine call shown earlier doesn’t let you specify it; you have to use other Trace methods to pass the EventId.
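
For example, assuming a TraceSource such as the one configured in the sample code, a TraceEvent call takes the event ID directly (the ID value here is arbitrary):

// 5000 shows up in the EventId column of WADLogsTable.
workerTrace.TraceEvent(TraceEventType.Warning, 5000, "Queue length exceeds the expected maximum");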

Selectively Enabling Tracing and Logging

A typical application consists of multiple logical components. For instance, you might have a database component that deals with the data model in Windows Azure storage. Your Web Role might in turn be divided into an administrative component and a user component (and that might even be divided further into logical components based on the needs of your application).

You can tie the logging and tracing options—what type of logging is enabled and at what level of detail—to these components. This allows you to selectively enable tracing in only the components for which you need it, avoiding a lot of clutter.

The key approach here is to not call Trace directly, but to use multiple TraceSource instances, typically one per namespace. A TraceSource has an associated SourceSwitch that controls whether the source is enabled, as well as what level of output is desired. Importantly, the SourceSwitch values are not set at compile time, but at run time. As a result, you can enable or disable diagnostic output from various parts of your application without having to recompile it, or even redeploy a different version.

WorkerDiagnostics.cs and WebDiagnostics.cs contain the configuration of the trace sources and switches in the sample code. Here’s an excerpt:

// Trace sources
public TraceSource ConfigTrace;
public TraceSource WorkerTrace;
// Add additional sources here
// Corresponding trace switches to control 
// level of output for each source
public SourceSwitch ConfigTraceSwitch { get; set; }
public SourceSwitch WorkerTraceSwitch { get; set; }
// Add additional switches 1:1 with trace sources here

Then, in the config file for your role, you hook these up to listeners as shown in Figure 4. This first sets up the standard Windows Azure diagnostics listener as a shared listener so it can be referred to in the <sources> items. It then configures two sources: a WorkerTrace source and a ConfigTrace source. It also sets up corresponding switches to enable you to adjust the level of output. ConfigTrace gives you the most verbose output; WorkerTrace gives you Errors only.

Figure 4 Configuring Trace Sources and Listeners

<configuration>
  <system.diagnostics>
    <sharedListeners>
      <add type="Microsoft.WindowsAzure.Diagnostics.DiagnosticMonitorTraceListener,
      Microsoft.WindowsAzure.Diagnostics, Version=1.0.0.0, Culture=neutral, 
      PublicKeyToken=31bf3856ad364e35"
        name="AzureDiagnostics">
        <filter type="" />
      </add>
    </sharedListeners>
    <sources>
      <source name="ConfigTrace">
        <listeners>
          <add name="AzureDiagnostics" />
        </listeners>
      </source>
      <source name="WorkerTrace">
        <listeners>
          <add name="AzureDiagnostics" />
        </listeners>
      </source>
    </sources>
    <switches>
      <add name="ConfigTrace" value="Verbose"/>
      <add name="WorkerTrace" value="Error"/>
    </switches>
    <trace>
      <listeners>
        <add name="AzureDiagnostics" />
      </listeners>
    </trace>
  </system.diagnostics>
</configuration>

You don’t have to name the switches the same as the sources, but it makes life easier. If they’re not the same, you add a switchName attribute to the source element to indicate the name of the switch that controls output for this source. This allows you to share a single switch across multiple trace sources. Note that the trace source and switch names are case-sensitive and must exactly match the case you pass to the constructor in your code.

You can avoid the switch altogether if you want by simply adding a switchValue attribute to the source element specifying the desired switch value for this source. The switch values you use are actually parsed from the config file as one of the SourceLevels defined in Figure 5, which also shows how the TraceEventType you pass to TraceSource calls interacts with the SourceLevel set for the source to determine what passes through.

Figure 5 Tracing Source Levels and TraceEventType

You might have noticed that the SourceLevel is just a bitmask that is ANDed at run time with the TraceEventType to determine whether to log the event. To get combinations like Warning and Activity Tracing, specify the numeric value for the bitfield as the switch value rather than using the symbolic values shown.
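
For example, to allow Warning plus Activity Tracing you’d use the numeric value 65287 as the switch value. Here’s a standalone sketch of where that number comes from:

using System;
using System.Diagnostics;

// SourceLevels.Warning admits Critical, Error and Warning (0x0007);
// SourceLevels.ActivityTracing admits Start, Stop, Suspend, Resume and Transfer (0xFF00).
SourceLevels combined = SourceLevels.Warning | SourceLevels.ActivityTracing;
Console.WriteLine((int)combined);   // 65287 (0xFF07) -- use this as the switch value

// At run time the check is effectively a bitwise AND against the TraceEventType:
bool errorLogged = ((int)TraceEventType.Error & (int)combined) != 0;      // true
bool verboseLogged = ((int)TraceEventType.Verbose & (int)combined) != 0;  // false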

In addition to switches, a listener can have a TraceFilter, which adds more sophisticated runtime logic as to whether a particular message is allowed through. Writing a custom TraceFilter is beyond the scope of this article, but you’ll find a helpful example in the Ukadc.Diagnostics project documentation on CodePlex (ukadcdiagnostics.codeplex.com/wikipage?title=LoggingPrimer).

Changing What Is Logged at Run Time

Configuration files are the default way ASP.NET tracing is controlled, and this works fine with services deployed to Windows Azure. The problem is that you’d like to be able to change the switch values at run time, and Windows Azure doesn’t allow you to just replace the web.config or app.config without redeploying the service. The generic ASP.NET solution for this is to use WebConfigurationManager to alter configuration values, but Windows Azure doesn’t currently allow you to do this for cloud-deployed services, either.

The solution is to mirror the values for these switches in ServiceConfiguration.cscfg. Windows Azure lets you edit that file (or upload a new one) through the development portal while your service is running. You’ll have to write some additional code to make this work, though.

The default System.Diagnostics code knows about settings only in app.config or web.config, but your roles will get run-time notification of changes in ServiceConfiguration.cscfg through the RoleEnvironmentChanging and RoleEnvironmentChanged events. You can then decide whether to recycle (restart) the role or simply update a configuration value. The latter is what you want for tracing switches, because restarting the role might make the very intermittent problem you’re trying to diagnose disappear. The sample code for this article shows how to do this by adding a couple of values to ServiceConfiguration.cscfg (note that you have to also edit ServiceDefinition.csdef, which provides the schema) and adding some code to your roles.
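
Here’s a rough sketch of what that role code might look like. The setting name WorkerTraceSwitchValue and the workerTraceSwitch field are hypothetical; the sample code is the working version. It also assumes the role doesn’t cancel the change in the Changing event, which would otherwise recycle the instance:

using System;
using System.Diagnostics;
using System.Linq;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical switch shared with the role's TraceSource; Error by default.
private SourceSwitch workerTraceSwitch =
    new SourceSwitch("WorkerTrace") { Level = SourceLevels.Error };

public override bool OnStart()
{
    // Apply the initial switch value, then reapply it whenever the service
    // configuration is updated through the portal (or csrun locally).
    ApplyTraceSwitch();
    RoleEnvironment.Changed += (sender, e) =>
    {
        if (e.Changes.OfType<RoleEnvironmentConfigurationSettingChange>()
             .Any(c => c.ConfigurationSettingName == "WorkerTraceSwitchValue"))
        {
            ApplyTraceSwitch();
        }
    };
    return base.OnStart();
}

private void ApplyTraceSwitch()
{
    // "WorkerTraceSwitchValue" is a hypothetical setting mirrored in
    // ServiceConfiguration.cscfg (and declared in ServiceDefinition.csdef).
    string value = RoleEnvironment.GetConfigurationSettingValue("WorkerTraceSwitchValue");
    workerTraceSwitch.Level = (SourceLevels)Enum.Parse(typeof(SourceLevels), value);
}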

Testing on Development Fabric

What about testing on development fabric, where you don’t have the Windows Azure portal to edit the configuration as you do for cloud-deployed services? First, determine the deployment ID Windows Azure has assigned to your running development fabric service. You can see this by showing the development fabric UI from the system tray while the service is running. This will be a number like 177.

  1. Go to the directory where your service binaries have been put by build—typically \bin\debug or \bin\release under your service code. You’ll find the copy of ServiceConfiguration.cscfg that was created when you built the app.
  2. Next, using a text editor, edit this file to use the tracing switch you want. For instance, in the sample code, change WebTrace from Off to Verbose.
  3. Next, in a Windows Azure SDK command prompt (Start | All Programs | Windows Azure SDK v1.1 | Windows Azure SDK Command Prompt), run this command:
csrun /update:NNN;ServiceConfiguration.cscfg

NNN is the Windows Azure deployment ID you found earlier. This will do in development fabric what clicking the Configure button on the Windows Azure development portal does for cloud-deployed services—update the configuration settings and trigger the events.

Other Diagnostic Information

While this article has focused on tabular data that your application logs using System.Diagnostics, Windows Azure can also capture IIS logs and Failed Request Tracing (formerly known as Failed Request Event Buffering, or FREB) logs, among others. Some of these are placed into Windows Azure blob storage, some in Windows Azure table storage. Figure 6 lists the available logs and where they’re stored. Note that for those not enabled by default, you typically have to make a change in web.config or app.config for Windows Azure to collect these logs. Let’s drill into one example not collected by default to show how this works.

Figure 6 Standard Azure Logging Options

Type of Log | Windows Azure Storage Format | Collected by Default? | Notes
Windows Azure logs generated from your code | Table | Yes | Trace listener must be added to the web.config or app.config file as shown in the sample code. Stored in WADLogsTable.
IIS 7.0 logs | Blob | Yes | Web Roles only. Stored in a blob container under the path wad-iis-logfiles\<deployment ID>\<web role name>\<role instance>\W3SVC1.
Windows Diagnostic Infrastructure logs | Table | Yes | Information about the diagnostics service itself. Stored in WADDiagnosticInfrastructureLogsTable.
Failed request logs | Blob | No | Web Roles only. Enable by setting tracing options under system.webServer settings in web.config. Stored in a blob container under the path wad-iis-failedreqlogfiles\<deployment ID>\<web role name>\<role instance>\W3SVC1.
Windows Event logs | Table | No | Enable by altering DiagnosticMonitorConfiguration.WindowsEventLog when setting up initial configuration. Stored in WADWindowsEventLogsTable. Steve Marx’s blog (blog.smarx.com/posts/capturing-filtered-windows-events-with-windows-azure-diagnostics) explains how to use this.
Performance counters | Table | No | Enable by altering DiagnosticMonitorConfiguration.PerformanceCounters. Stored in WADPerformanceCountersTable. The sample code Worker Role sets up a performance counter.
Crash dumps | Blob | No | Enable by calling CrashDumps.EnableCollection. Stored in a blob container under the path wad-crash-dumps. Because ASP.NET handles most exceptions, this is generally useful only for a Worker Role.
Custom error logs | Blob | No | Beyond the scope of this article, but see Neil Mackenzie’s blog (nmackenzie.spaces.live.com/blog/cns!B863FF075995D18A!537.entry) for a helpful example of how to use this.
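
Several of the rows above that aren’t collected by default are enabled with a few lines of code rather than a config change. Here’s a rough sketch (the event log source, counter, sample rate and transfer periods are illustrative, not taken from the sample code) of turning on Windows Event logs, a performance counter and crash dumps in a role’s OnStart:

using System;
using Microsoft.WindowsAzure.Diagnostics;

DiagnosticMonitorConfiguration dmc = DiagnosticMonitor.GetDefaultInitialConfiguration();

// Windows Event logs: capture everything written to the System log.
dmc.WindowsEventLog.DataSources.Add("System!*");
dmc.WindowsEventLog.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

// Performance counters: sample total CPU usage every 30 seconds.
dmc.PerformanceCounters.DataSources.Add(new PerformanceCounterConfiguration
{
    CounterSpecifier = @"\Processor(_Total)\% Processor Time",
    SampleRate = TimeSpan.FromSeconds(30)
});
dmc.PerformanceCounters.ScheduledTransferPeriod = TimeSpan.FromMinutes(5);

// Crash dumps: written to local storage and transferred to the
// wad-crash-dumps blob container (pass true for full dumps).
CrashDumps.EnableCollection(false);

DiagnosticMonitor.Start("DiagnosticsConnectionString", dmc);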

As an example, let’s look at how to enable FREB logging from IIS on your Web Role. To see this in action, download the sample code for the MSDNSampleLoggingService provided with this article. Open web.config for the LoggingWebRole and find the section labeled <system.webServer>. Notice that the lines shown in Figure 7 have been added to the default Windows Azure web.config. This results in failure logging for any requests that take longer than 15 seconds or with status codes between 400 and 599 (the failureDefinitions element).

Figure 7 Failed Request Logging for LoggingWebRole

<tracing>
  <traceFailedRequests>
    <add path="*">
      <traceAreas>
        <add provider="ASP" verbosity="Verbose" />
        <add provider="ASPNET" 
             areas="Infrastructure,Module,Page,AppServices"
             verbosity="Verbose" />
        <add provider="ISAPI Extension" verbosity="Verbose" />
        <add provider="WWW Server"
             areas="Authentication,Security,Filter,StaticFile,CGI,Compression,Cache,RequestNotifications,Module" 
             verbosity="Verbose" />
      </traceAreas>
      <failureDefinitions timeTaken="00:00:15" statusCodes="400-599" />
    </add>
  </traceFailedRequests>
</tracing>

If you open about.aspx.cs in the LoggingWebRole project, you’ll note that in the Page_Load method I’ve added an arbitrary delay of 18 seconds with this line of code:

System.Threading.Thread.Sleep(18000);

This will force the load of this page to be considered a failed request under the definition specified earlier.

To see the FREB log, rebuild and deploy the app into the development fabric and then find the development fabric controller in the notification area of the taskbar (you may have to click the Show Hidden Icons button on the taskbar because it’s often hidden as inactive). Right-click it and select Show Development Fabric UI. While your application is running, this will show information about the application.

Expand the Web Role and right-click on the role instance (0). Select Open local store to open the folder on the local machine where logs are being saved (see Figure 8). Within that folder, the logs are in the \directory\DiagnosticStore folder. This is because the Web Role in the sample code is configured to store diagnostics information in development storage. If instead you set the DiagnosticsConnectionString to a cloud-storage account, the persisted logs will be in the blob storage associated with that storage account. You can use Cloud Storage Studio to look at the blob storage containers to see the logs.

Figure 8 Opening Logs Saved to Local Development Fabric Storage

Managing Diagnostics for a Running Service

While you may deeply instrument your code with logging, you typically don’t want all the logging information persisted into storage while your production service is running. You might want only error and critical information to go to the persisted logs, while more detailed information (logged as Verbose or Information) is suppressed.

But what if a problem occurs? You don’t want to redeploy a new version of your service or the problem might magically go away—you probably know how effective rebooting can be at making elusive problems disappear.

Instead, it’s more effective to increase the amount of information going to the logging tables or blob storage while allowing the misbehaving code to continue running. This is more likely to reveal the cause of the problem in your application while it’s currently operating.

Earlier I described how to fine-tune the details of what logging is passed through to Windows Azure diagnostics for a particular TraceSource. That’s a sort of upstream editing of what information gets logged. In this section, I’ll show you the general Windows Azure diagnostics settings that determine how information that passes through a TraceSource gets into persisted storage.

Windows PowerShell cmdlets can manage many aspects of your running Windows Azure services, including diagnostics. You run these from your local machine and they connect over the Internet to the Windows Azure cloud servers running your service, providing information and adjusting parameters. Windows PowerShell is installed with Windows 7 and can be downloaded for Windows Vista from microsoft.com/powershell. Download the Windows Azure Service Management CmdLets from code.msdn.microsoft.com/azurecmdlets and follow the directions for installing them. The Windows Azure diagnostics-related commands are shown in Figure 9.

Figure 9 Windows Azure Management Diagnostics Cmdlets

Name | Description
Get-ActiveTransfers | Returns the set of active diagnostic transfers with associated transfer information.
Get-CommonConfigurationLogs | Gets the common configuration values for all logging buffers. This includes the time interval at which changes in configuration are polled for and the buffer size allocated for in-memory logs.
Get-DiagnosticAwareRoleInstances | Returns a list of IDs of active role instances that have a diagnostic monitor running.
Get-DiagnosticAwareRoles | Lists the set of roles that have successfully started at least one diagnostic monitor.
Get-DiagnosticConfiguration | Gets the buffer configuration for the specified buffer name (Logs, Directories, PerformanceCounters, WindowsEventLogs or DiagnosticInfrastructureLogs).
Set-CommonConfigurationLogs | Sets the common configuration values for all logging buffers.
Set-FileBasedLog | Sets the buffer configuration for file-based logs.
Set-InfrastructureLog | Sets the buffer configuration for the logs generated by the underlying Windows Azure diagnostics infrastructure. The diagnostic infrastructure logs are useful for troubleshooting the diagnostics system itself.
Set-PerformanceCounter | Sets the buffer configuration for performance counter data being collected by your service.
Set-WindowsAzureLog | Sets the buffer configuration for basic Windows Azure logs being generated by your service.
Set-WindowsEventLog | Sets the buffer configuration for Windows event logs being generated by your service.
Start-OnDemandTransfer | Starts an on-demand transfer of the specified data buffer. This moves the data to Windows Azure storage (either table or blob storage).
Stop-ActiveTransfer | Stops an active on-demand transfer, given a transfer ID.

For example, to find the current transfer parameters for a particular role instance, pass the deployment ID (from the Windows Azure developer portal where you deploy the service) and the storage account name and key you used for DiagnosticsConnectionString in the app.config or web.config for the service role (see Figure 10). Notice that Windows PowerShell prompts for a couple of missing required parameters—the role instance and the name of the buffer I want. Logs is the standard Windows Azure logs. The result shows that the filter level is Verbose and a transfer is scheduled every minute.

Figure 10 Diagnostics Configuration for a Running Service Using Windows PowerShell

To change the configuration for this role, use the Set-DiagnosticConfiguration cmdlet, as shown in Figure 11. Note that I changed the transfer period from one minute to two minutes and the filter from Verbose to Error, meaning that only events logged as Error and Critical will be sent to the persisted storage.

Figure 11 Changing Diagnostics Configuration from Windows PowerShell
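
If you’d rather make this change from code than from Windows PowerShell, the diagnostics management API sketched earlier for on-demand transfers can also rewrite a running instance’s configuration. The snippet below (deployment ID, role and instance names are placeholders) is a rough equivalent of that cmdlet call:

using System;
using Microsoft.WindowsAzure;
using Microsoft.WindowsAzure.Diagnostics;
using Microsoft.WindowsAzure.Diagnostics.Management;

CloudStorageAccount account = CloudStorageAccount.Parse(
    "DefaultEndpointsProtocol=https;AccountName=Xxxxxx;AccountKey=Yyyyyy");
DeploymentDiagnosticManager ddm =
    new DeploymentDiagnosticManager(account, "yourDeploymentId");
RoleInstanceDiagnosticManager ridm =
    ddm.GetRoleInstanceDiagnosticManager("LoggingWorkerRole", "LoggingWorkerRole_IN_0");

// Read the live configuration, slow the scheduled transfer to every two
// minutes and raise the filter so only Error and Critical are persisted.
DiagnosticMonitorConfiguration config = ridm.GetCurrentConfiguration();
config.Logs.ScheduledTransferPeriod = TimeSpan.FromMinutes(2);
config.Logs.ScheduledTransferLogLevelFilter = LogLevel.Error;
ridm.SetCurrentConfiguration(config);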

Perhaps the most useful thing you can do remotely from Windows PowerShell is to force a transfer of log information immediately. Figure 12 shows how to do this.

Figure 12 Using Windows PowerShell to Initiate a Transfer of IIS Logs

First, I query for any existing on-demand transfer of logs. There is a restriction in the current Windows Azure diagnostics implementation that only one on-demand transfer of a particular type can take place at a time. Seeing that none is in progress, I request one, passing in the deployment ID of the service, the role and instance, the type of log I want transferred and the time interval of data to transfer. (For the log type, Directories means file-based logs, including the IIS logs. Logs would be the Windows Azure table-based logs sent through TraceSource.)

I also pass a notification queue name, which is where Windows Azure diagnostics will send a notification that the transfer has completed. Through experimentation, I found that if I didn’t pass a notification queue, the transfer seemed not to happen. I get back a GUID identifying the transfer request. I then query for the request status and see that it’s published, meaning it’s in progress.

The current version of the Windows Azure Service Management CmdLets doesn’t seem to show when the request has been completed, but if you query the blob storage for your diagnostics storage you’ll see the logs pop up in the Containers hierarchy shortly (or in Windows Azure table storage if you requested transfer of information stored in table storage).

Wrapping Up

Using a combination of adjusting configuration parameters for the TraceSources you define in your code and using the Windows Azure Service Management CmdLets to move information into persisted storage, you should be able to fully control what diagnostic output you get from your Windows Azure services.

Of course, the most important thing you can do to troubleshoot your production code is to develop a robust strategy for diagnostic output early in development, then follow that strategy throughout coding. The use of TraceSources and the tools that Windows Azure provides for configuring the verbosity of diagnostic output that flows from your application to storage will help you to tap into that to get just the amount of information you need when a problem arises.

There’s nothing worse than feeling like the code behaving erratically on a server is a black box, opaque to you. Solid diagnostics code and the tools described here will let you open the covers of that box and peer inside.  


Mike Kelly  is a consultant focused on software development and helping integrate acquisitions to larger corporations. He previously worked for 15 years at Microsoft in a variety of product development roles and as director of Emerging Practices on the Engineering Excellence team. He can be reached at himself@mikekellyconsulting.com.

Thanks to the following technical experts for reviewing this article:  Sumit Mehrotra, Michael Levin and Matthew Kerner from Microsoft, as well as Neil Mackenzie and Steven Nagy