September 2017

Volume 32 Number 9

[ASP.NET Core]

Snapshot Debugging for Production Apps and Services in Azure

By Nikhil Joglekar | September 2017

The mantra of DevOps is to “move fast and break things.” However, this requires you to constantly fix the things that break. In production environments, detecting and diagnosing an issue can be costly in terms of time and resources. Issues that manifest only in production are among the hardest to solve. How often have you run into a problem in production with code that worked just fine on your development box, or had to sift through thousands of logs to try to understand what caused the issue?

Currently, there are several ways you can investigate reported site issues:

Try to reproduce the issue locally. A production environment is a lot more complex than a development environment: It’s often on a larger scale and houses real data in the production database. Getting a local reproduction of the issue might involve copying production data into a staging or testing environment. In the worst-case scenario, it might not be possible to reproduce an issue locally.

Sift through logs to try to find the cause of the issue. Logs will ideally contain details and context around the issue at the exact moment it happens. There’s ample tooling for collecting, aggregating and searching through logs, yet trying to find the exact details to correlate with the reported issue can be like looking for a needle in a haystack. Often, your app logs won’t contain the details necessary to identify the root cause of the problem. In such cases, you’ll have to add additional logs to your app and redeploy to get the details you need.

Call up the customer for more context. If logs and local reproduction prove fruitless, you might have to get directly in touch with the customer who ran into the issue. The customer might have interacted with the app in an unexpected or untested way.

Attach a real debugger as a last-ditch effort. Attaching a debugger to an app in a production environment will let you inspect the state of the app, but setting and hitting a breakpoint will effectively stop the app from serving requests while you’re debugging.

These methods each have their pros and cons, yet even the most experienced developers who cycle through them can find diagnosing production issues time-consuming and difficult. In this article, I’ll introduce the Snapshot Debugger for Azure, a set of new tools in Visual Studio that can drastically reduce the time and pain in dealing with production issues.

Snapshot Debugger was designed from the ground up as a tool safe to debug production services running in Azure. It lets you debug live site issues with almost no impact to the service. Snapshot Debugging brings the ease of using a debugger to a cloud scale. There are three key functionalities in Snapshot Debugger:

Snappoints. Like a breakpoint you set in your app while debugging on your development box, Snappoints are placed on a line of code you specify. When your app hits a Snappoint, the Snapshot Debugger takes a snapshot of the app, letting you inspect the call stack, local variables and objects on the heap. Unlike breakpoints, the app doesn’t stop. Customers interacting with your app won’t realize that you’re currently debugging your app.

Logpoints. These let you insert additional log statements into a live production service in cases where you might not have log messages that detail a production issue. These new log statements are temporary and cause no side effects to your app.

Conditions. Only a fraction of the number of requests hitting a live app might be interesting or result in error. Conditional statements you specify apply to both Snappoints and Logpoints to ensure that you only get diagnostics information on interesting requests that might help identify the root cause of the problem.

Currently, the Snapshot Debugger is in public preview for ASP.NET and ASP.NET Core apps running on Azure App Services. You can download and try out the Snapshot Debugger at aka.ms/snappoint.

Debug Safely in Production with Snappoints

In the example in Figure 1, Visitors is an ASP.NET MVC Web site running on Azure App Services that registers and tracks visitors to Microsoft offices.

Figure 1 Visitors Is an ASP.NET MVC Web Site Running on Azure App Services

To set up this sample scenario: One of the registered visitors was seen exiting the Secret Project building, a building visitors should absolutely not have access to! You’ve frantically fired up your computer and started trying to figure out how the app gave the visitor access to the building. The first thing you did was look through the app logs for suspicious security access grants to any of the visitors. Unfortunately, you weren’t able to find any incriminating logs.

In this app, visitors are represented by a “Visitor” object, which has an internal SecurityCode property that lets them access certain buildings. The only way visitors can enter a building is if the SecurityCode property associated with them has granted them the right level of access. This property is only handled internally by the app, but it’s possible someone could somehow force in an unverified SecurityCode. To investigate this further, open the version of the source code that’s running in production in Visual Studio and start the Snapshot Debugger (right-click the Azure App Service in Cloud Explorer and choose Start Snapshot Debugger). Visual Studio will then connect to the production app running in Azure and enter a Snapshot Debugger session, as shown in Figure 2.

Figure 2 A Snapshot Debugger Session

The Snapshot Debugger session functions a lot like a local debugging session on a development box. You can place Snappoints on a line you’re interested in debugging, instead of placing breakpoints like you do locally. You place Snappoints in the same gutter that you place breakpoints in a local debug session, but they’re inactive until you hit the Start/Update Collection button that turns Snappoints on in your production environment. Each Snappoint will only capture one snapshot by default: This snapshot reflects one request made to your app.

There are a few locations where the Visitors app updates visitors (and their SecurityCodes). The Update controller is one of those locations. You can investigate if there’s an Update request that sets a SecurityCode by placing a Snappoint inside the Update method and turning it on with the Start Collection button, as shown in Figure 3.

Figure 3 The Update Controller

The Snapshot Debugger takes a snapshot when the app in production runs a line with a Snappoint. This snapshot is captured in about 10ms, after which the app continues to execute requests. The snapshot shows up in Visual Studio inside the Diagnostic Tools Window on the right side; you can double-click it to open it. Opening the snapshot gives you the debug details for the point in time when the line of code with the Snappoint ran. You can view the call stack and Locals in their respective tool windows. You can go to the Watch window and add visitor.SecurityCode to inspect whether the app actually accepted a SecurityCode:

visitor.SecurityCode  null  string

SecurityCode on this snapshot is just as expected: null. The Snapshot Debugger captures a snapshot on the first request to hit the line of code where you placed a Snappoint. This snapshot reflects the correct behavior of your app and therefore doesn’t help you diagnose the issue. A snapshot where the SecurityCode isn’t null will help you investigate your hypothesis. To attempt to capture a snapshot when a SecurityCode is present, you can add conditions to the Snappoints by clicking the Options gear when hovering over a Snappoint, as shown in Figure 4.

Figure 4 Capturing a Snapshot When SecurityCode Is Present
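
The effect of a condition can be illustrated with a small conceptual model. This is a sketch in Python, not ASP.NET code and not how the Snapshot Debugger is implemented; the Snappoint class, its hit method and the max_snapshots parameter are hypothetical names chosen for illustration. It assumes only what the article states: a Snappoint captures one snapshot by default, and a condition restricts the capture to interesting requests.

```python
import copy


class Snappoint:
    """Hypothetical model of a conditional Snappoint that captures once."""

    def __init__(self, condition=None, max_snapshots=1):
        # With no condition, the first request to reach the line is captured.
        self.condition = condition or (lambda local_vars: True)
        self.max_snapshots = max_snapshots
        self.snapshots = []

    def hit(self, local_vars):
        """Called each time the instrumented line runs; never blocks the caller."""
        if len(self.snapshots) >= self.max_snapshots:
            return
        if self.condition(local_vars):
            # Copy the locals so later changes by the app don't alter the snapshot.
            self.snapshots.append(copy.deepcopy(local_vars))


def update(visitor, snappoint):
    # Stand-in for the Update controller action in the Visitors app.
    snappoint.hit({"visitor": visitor})
    return visitor


# Rough equivalent of a condition such as: visitor.SecurityCode != null
snappoint = Snappoint(
    condition=lambda local_vars: local_vars["visitor"]["SecurityCode"] is not None
)

update({"FirstName": "Ada", "SecurityCode": None}, snappoint)
update({"FirstName": "Jackson", "SecurityCode": "SecretProjects"}, snappoint)
update({"FirstName": "Grace", "SecurityCode": "Other"}, snappoint)
```

In this model the first request is skipped because its SecurityCode is null, the second is captured, and the third is ignored because the Snappoint has already used its single snapshot.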

The added conditional statement will let you narrow down and only take a snapshot on an interesting request. After turning on the new Snappoint by clicking Start/Update Collection and waiting for a few seconds, you see another snapshot visible in the Diagnostic Tools Window. Double-clicking to view the debug information for this snapshot shows the following information for the visitor local variable:

visitor  {MyCompany.Visitors.Model.Visitor}  MyCompany.Visitors.Model.Visitor
  Company  "Microsoft"  string
  Email  "toroidking@microsoft.com"  string
  FirstName  "Jackson"  string
  LastName  "Davis"  string
  Position  "Principal Software Engineer"  string
  SecurityCode  "SecretProjects"  string
  VisitorId  7  int

Bam! It looks as if your app somehow accepted an unverified SecurityCode. It appears as if someone had manually modified part of the HTTP request’s payload into the Update controller to add in a SecurityCode. Your app then accepted the SecurityCode without any further checks. To fix this problem, you could either make the SecurityCode property private or add in checks for clients manually setting a SecurityCode.

The manual modification of the PUT request was not a scenario that you built your app to handle, nor had this scenario been tested locally. Snappoints let you investigate real customer requests that come into your app to detect the error with almost no impact to the site itself.
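
One of the fixes mentioned above, checking for clients that set a SecurityCode themselves, can be sketched as an allow-list applied to the incoming payload. The example below is a language-neutral illustration written in Python rather than the Visitors app's actual code; the Visitor class, the CLIENT_EDITABLE_FIELDS set and the apply_update function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Visitor:
    visitor_id: int
    first_name: str
    last_name: str
    company: str
    security_code: Optional[str] = None  # meant to be set only by internal code


# Hypothetical allow-list: the only fields a client request may change.
CLIENT_EDITABLE_FIELDS = {"first_name", "last_name", "company"}


def apply_update(visitor, payload):
    """Copy allow-listed fields onto the visitor; return the rejected field names."""
    rejected = []
    for name, value in payload.items():
        if name in CLIENT_EDITABLE_FIELDS:
            setattr(visitor, name, value)
        else:
            rejected.append(name)
    return sorted(rejected)


visitor = Visitor(7, "Jackson", "Davis", "Microsoft")
rejected = apply_update(
    visitor, {"company": "Contoso", "security_code": "SecretProjects"}
)
```

With this check in place, the tampered request still updates the company, but the security_code value is dropped and reported in the rejected list, which an app could log or turn into an error response.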

Get More Information On Demand with Logpoints

Now that you’ve identified the source of your error, you can fix your code and deploy it into production. However, it might be some time before you can make the fix, get it reviewed by your team and have it propagate through your continuous integration systems up to production. For now, you’ll keep track of every user who tries to set a SecurityCode.

Logpoints let you add this additional temporary logging message to your app without any restart or redeployment and they let you evaluate variables to put in log messages. To add a Logpoint to your app, go to the settings on a Snappoint and convert it into a Logpoint (note any conditions you set will also work for Logpoints). Instead of taking a snapshot when the line of code runs, the app will emit a new log message. Logpoints will stay active only during a Snapshot Debugger session in Visual Studio. In the example in Figure 5, you log the visitor’s name when an invalid SecurityCode is set.

Figure 5 Adding a Logpoint

Logpoints can send logs to two places: back to Visual Studio in a live stream and to your app’s log store. When Send to Output Window is checked, new log messages stream back into Visual Studio and display in both the Output Window and the Diagnostic Tools Window. When Send to app logs is checked, you send the logs back to your app’s log store. To achieve this, you make a System.Diagnostics.Trace call with the input log message for ASP.NET apps and an ILogger.LogInformation call for ASP.NET Core apps. If you configure your logging framework to listen to System.Diagnostics or ILogger, the logs will appear in conjunction with any other app logs you currently collect.

Logpoints don’t cause any side effects to your app in production. The Snapshot Debugger virtually executes the input log statements in a way that can’t alter the state or execution of your app in production.

By using Logpoints, you’re now tracking any user who’s trying to use a SecurityCode in your app. You’re collecting new log information in a part of your app that previously generated no logs. Logpoints let you get additional contextual information about your app, only where and when you need it.

How Snapshot Debugging Is Production-Ready

The Snapshot Debugger aims to bring the ease of the Visual Studio debugging experience to a production scale for apps running in Azure. The process of capturing snapshots on a live app is minimally intrusive. Performance measurements show that Snapshot Debugger has a negligible impact on the speed of your app.

Snapshot Debugger is intended to be used on Release build versions of your app. Currently, debugging Release versions of managed apps can be challenging, because function optimization and inlining in Release builds can leave the debugger’s view of the code slightly off. The Snapshot Debugger gives you much better support for debugging Release build apps. It recompiles functions containing Snappoints with optimizations and inlining temporarily disabled. Resulting snapshots in these functions are, therefore, more easily “debuggable.” When you end your Snapshot Debugger session, the functions are recompiled once more and returned to their original production state.

When the Snapshot Debugger captures a snapshot of your app, it creates a fork of the app’s process. This snapshot, or forked process, reflects the full state of the app and the app’s heap at the point of time when the snapshot was taken. However, this snapshot doesn’t immediately copy the full heap of the app; instead, it only copies the page table and sets pages to copy on write. If some of the app’s objects on the heap change, their respective pages are then copied into the forked process. Each snapshot, therefore, requires only a small memory cost (on the Visitors app used in the example, the memory consumed by each snapshot was in the hundreds of kilobytes).
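
The copy-on-write behavior described above can be observed with an ordinary POSIX fork. The sketch below is an analogy only: it runs on Unix-like systems, it says nothing about the mechanism the Snapshot Debugger uses on Azure App Services, and the function name and state dictionary are hypothetical. The child process plays the role of the snapshot, and the parent plays the role of the app that keeps running and changing its state.

```python
import os


def snapshot_sees_original_state():
    """Fork a 'snapshot', mutate the parent, and return (parent_view, snapshot_view)."""
    state = {"security_code": None}
    go_r, go_w = os.pipe()    # parent -> child: "I have changed my copy"
    out_r, out_w = os.pipe()  # child -> parent: the child's view of the state

    pid = os.fork()
    if pid == 0:
        # Child: the frozen snapshot. Wait until the parent has mutated its state.
        os.close(go_w)
        os.close(out_r)
        os.read(go_r, 1)
        os.write(out_w, repr(state["security_code"]).encode())
        os._exit(0)

    # Parent: the live app keeps running and modifies its own memory.
    os.close(go_r)
    os.close(out_w)
    state["security_code"] = "SecretProjects"
    os.write(go_w, b"x")
    snapshot_view = os.read(out_r, 64).decode()
    os.waitpid(pid, 0)
    os.close(go_w)
    os.close(out_r)
    return state["security_code"], snapshot_view


parent_view, snapshot_view = snapshot_sees_original_state()
```

The parent ends up with the new value while the child still reports the value from the moment of the fork. The operating system copies only the pages that are written after the fork, which is why the snapshot stays cheap.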

The Snapshot Debugger aggressively throttles the number of snapshots created to ensure that it puts minimal memory pressure on your production server. The Snapshot Debugger won’t take snapshots if the amount of free memory on your server is too low. Additionally, snapshots only exist while the Snapshot Debugger session in Visual Studio is active. When you stop the Snapshot Debugger session, all snapshot processes are killed.
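
A throttling policy of this kind might look like the following minimal sketch. The article doesn’t give the actual thresholds or rules, so the SnapshotThrottle class, its min_free_bytes and max_live_snapshots parameters, and the numbers used are assumptions made purely for illustration.

```python
class SnapshotThrottle:
    """Hypothetical policy: refuse snapshots when memory is low or too many exist."""

    def __init__(self, min_free_bytes, max_live_snapshots):
        self.min_free_bytes = min_free_bytes
        self.max_live_snapshots = max_live_snapshots
        self.live_snapshots = 0

    def try_take_snapshot(self, free_bytes):
        """Return True and count a snapshot only if both limits allow it."""
        if free_bytes < self.min_free_bytes:
            return False  # the server is too short on memory
        if self.live_snapshots >= self.max_live_snapshots:
            return False  # enough snapshots already exist for this session
        self.live_snapshots += 1
        return True

    def end_session(self):
        """Ending the debugging session discards every snapshot."""
        self.live_snapshots = 0


MB = 1024 * 1024
throttle = SnapshotThrottle(min_free_bytes=100 * MB, max_live_snapshots=2)

results = [
    throttle.try_take_snapshot(free_bytes=500 * MB),  # allowed
    throttle.try_take_snapshot(free_bytes=500 * MB),  # allowed
    throttle.try_take_snapshot(free_bytes=500 * MB),  # refused: limit reached
    throttle.try_take_snapshot(free_bytes=50 * MB),   # refused: low memory
]
throttle.end_session()
```

The key design point is that a refused snapshot costs the app nothing: the request simply continues, and the debugger tries again on a later hit.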

Snappoints also let you investigate a request as it progresses through different lines of your code. Unlike breakpoints, you can’t step between Snappoints. However, you can place multiple Snappoints at different locations of interest in your code. The Start/Update Collection button will turn on all the Snappoints you place in bulk. Two Snappoints in one function will result in two snapshots in the Diagnostic Tools Window, assuming both lines with Snappoints execute. You can switch between the snapshots to see how variables change from the first snapshot to the second. The Snapshot Debugger will ensure that groups of Snappoints enabled in bulk will result in related snapshots. Two snapshots in one function will both be from the same request, even if there are hundreds of requests hitting your app at that moment in time.

In a production environment, there can be many servers running identical instances of your app. In some cases, interesting requests that might reflect an error may be rare and only occur on specific servers. The Snapshot Debugger enables you to investigate these issues as it supports debugging against multiple instances of your app at once. Snappoints placed in a function activate across every server running an instance of your app. Only the first instance to execute the line with the Snappoint will capture a snapshot. You can, therefore, use conditional statements to analyze seldom-occurring issues. The Snapshot Debugger will take the resulting snapshot only on the server where the input conditions become true.

Wrapping Up

The Snapshot Debugger was built after years of working with developers and listening to their difficulties in debugging and diagnosing production issues. The Snapshot Debugger enables you to have a rich diagnostics experience when developing .NET apps for Azure, in turn letting you save time and money when you run into inevitable production issues.

The Snapshot Debugger is currently in public preview for ASP.NET and ASP.NET Core apps running on Azure App Services. You can download the Snapshot Debugger and try it out at aka.ms/snappoint.


Nikhil Joglekar is a program manager at Microsoft, focusing on debugging and diagnostics for Azure services. He has worked on the Snapshot Debugger, Visual Studio Profiler and Azure SDK since joining Microsoft two years ago. Contact him at Nikhil.Joglekar@microsoft.com or on Twitter: @nikjogo.

Thanks to the following Microsoft technical experts for reviewing this article: Jackson Davis and Andy Sterland