Diagnose common scenarios with Service Fabric

This article illustrates common monitoring and diagnostics scenarios that users have encountered with Service Fabric. The scenarios cover all three layers of Service Fabric: Application, Cluster, and Infrastructure. Each solution uses two Azure monitoring tools, Application Insights and Azure Monitor logs, and its steps introduce how to use those tools in the context of Service Fabric.

Note

This article was recently updated to use the term Azure Monitor logs instead of Log Analytics. Log data is still stored in a Log Analytics workspace and is still collected and analyzed by the same Log Analytics service. We are updating the terminology to better reflect the role of logs in Azure Monitor. See Azure Monitor terminology changes for details.

Prerequisites and Recommendations

The solutions in this article will use the following tools. We recommend you have these set up and configured:

How can I see unhandled exceptions in my application?

  1. Navigate to the Application Insights resource that your application is configured to use.

  2. Click Search in the top left, then click Filter on the next panel.

    AI Overview

  3. Several types of events are listed (traces, requests, custom events). Choose "Exception" as your filter.

    AI Filter List

    Click an exception in the list to see more details, including the service context if you are using the Service Fabric Application Insights SDK.

    AI Exception

How do I view which HTTP calls are used in my services?

  1. In the same Application Insights resource, filter on "requests" instead of exceptions to view all requests made.

  2. If you are using the Service Fabric Application Insights SDK, you can see a visual representation of how your services connect to one another, along with the number of succeeded and failed requests. On the left, click "Application Map".

    AI App Map Blade AI App Map

    For more information on the application map, see the Application Map documentation.

How do I create an alert when a node goes down?

  1. Node events are tracked by your Service Fabric cluster. Navigate to the Service Fabric Analytics solution resource named ServiceFabric(NameofResourceGroup).

  2. Click the graph titled "Summary" at the bottom of the blade.

    Azure Monitor logs solution

  3. Here you have many graphs and tiles displaying various metrics. Click on one of the graphs and it will take you to the Log Search. Here you can query for any cluster events or performance counters.

  4. Enter the following query. These event IDs are listed in the Node events reference.

    ServiceFabricOperationalEvent
    | where EventID >= 25622 and EventID <= 25626
    
  5. Click "New Alert Rule" at the top. Whenever an event matching this query arrives, you will receive an alert through your chosen method of communication.

    Azure Monitor logs New Alert
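Before creating the alert rule, it can help to check which event IDs the range in the query actually covers. The following Python sketch mirrors the query's range predicate locally. The sample records and the `node` field are hypothetical and exist only to illustrate the filter; they are not output from a real cluster.

```python
# Bounds copied from the query above: EventID >= 25622 and EventID <= 25626.
NODE_EVENT_ID_MIN = 25622
NODE_EVENT_ID_MAX = 25626


def is_node_event(event_id: int) -> bool:
    """Return True when the ID falls inside the inclusive range used by the query."""
    return NODE_EVENT_ID_MIN <= event_id <= NODE_EVENT_ID_MAX


def matching_events(events):
    """Keep only the records whose EventID would match the query's where clause."""
    return [event for event in events if is_node_event(event["EventID"])]


# Hypothetical records for illustration only (the "node" field is an assumption).
sample_events = [
    {"EventID": 25621, "node": "node-0"},  # just below the range
    {"EventID": 25622, "node": "node-1"},  # lower bound, included
    {"EventID": 25626, "node": "node-2"},  # upper bound, included
    {"EventID": 29623, "node": "node-3"},  # an application event, excluded
]

matched_ids = [event["EventID"] for event in matching_events(sample_events)]
print(matched_ids)
```

Both bounds are inclusive, so an alert built on this query fires for any event ID from 25622 through 25626.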

How can I be alerted of application upgrade rollbacks?

  1. In the same Log Search window as before, enter the following query for upgrade rollbacks. These event IDs are listed in the Application events reference.

    ServiceFabricOperationalEvent
    | where EventID == 29623 or EventID == 29624
    
  2. Click "New Alert Rule" at the top. Whenever an event matching this query arrives, you will receive an alert.
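If you maintain several alert queries of this shape, you might generate the query text from a list of event IDs instead of writing each `or` chain by hand. The sketch below is one possible approach, not part of any Azure SDK: `build_event_query` is a hypothetical helper, and it emits the Kusto `in` operator, which is expected to match the same rows as the `EventID == ... or EventID == ...` form shown above.

```python
def build_event_query(event_ids, table="ServiceFabricOperationalEvent"):
    """Build a Kusto query string that filters a table on a set of event IDs.

    Hypothetical helper for illustration; it only produces text and does not
    contact Azure Monitor.
    """
    # De-duplicate and sort so the generated query is stable across runs.
    ids = sorted({int(event_id) for event_id in event_ids})
    if not ids:
        raise ValueError("at least one event ID is required")
    id_list = ", ".join(str(event_id) for event_id in ids)
    return f"{table}\n| where EventID in ({id_list})"


# Event IDs taken from the upgrade rollback query above.
rollback_query = build_event_query([29623, 29624])
print(rollback_query)
```

The generated text can be pasted into the Log Search window in place of the hand-written query.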

How do I see container metrics?

In the same view with all the graphs, you will see some tiles for the performance of your containers. You need the Log Analytics Agent and Container Monitoring solution for these tiles to populate.

Log Analytics Container Metrics

Note

To instrument telemetry from inside your container, you will need to add the Application Insights NuGet package for containers.

How can I monitor performance counters?

  1. Once you have added the Log Analytics agent to your cluster, you need to add the specific performance counters you want to track. Navigate to the Log Analytics workspace’s page in the portal – from the solution’s page the workspace tab is on the left menu.

    Log Analytics Workspace Tab

  2. Once you’re on the workspace’s page, click on “Advanced settings” in the same left menu.

    Log Analytics Advanced Settings

  3. Click on Data > Windows Performance Counters (or Data > Linux Performance Counters for Linux machines) to start collecting specific counters from your nodes via the Log Analytics agent. Here are examples of the counter format to add:

    • .NET CLR Memory(<ProcessNameHere>)\# Total committed Bytes

    • Processor(_Total)\% Processor Time

      In the quickstart, VotingData and VotingWeb are the process names used, so tracking these counters would look like this:

    • .NET CLR Memory(VotingData)\# Total committed Bytes

    • .NET CLR Memory(VotingWeb)\# Total committed Bytes

      Log Analytics Perf Counters

  4. Collecting these counters lets you see how your infrastructure is handling your workloads and set relevant alerts based on resource utilization. For example, you may want to set an alert if total processor utilization goes above 90% or below 5%. The counter name to use for this is "% Processor Time". You could do this by creating an alert rule for the following query:

    Perf
    | where CounterName == "% Processor Time" and InstanceName == "_Total"
    | where CounterValue >= 90 or CounterValue <= 5
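To make the alert condition concrete, the following Python sketch applies the same predicate as the query to a few in-memory samples. The field names (`CounterName`, `InstanceName`, `CounterValue`) are the ones the query uses; the sample values are invented for illustration and do not come from a real workspace.

```python
# Thresholds taken from the example above: alert at or above 90% or at or below 5%.
HIGH_THRESHOLD = 90.0
LOW_THRESHOLD = 5.0


def breaches_threshold(sample) -> bool:
    """Mirror the query: total processor time outside the 5%..90% band."""
    return (
        sample["CounterName"] == "% Processor Time"
        and sample["InstanceName"] == "_Total"
        and (
            sample["CounterValue"] >= HIGH_THRESHOLD
            or sample["CounterValue"] <= LOW_THRESHOLD
        )
    )


# Invented samples for illustration only.
samples = [
    {"CounterName": "% Processor Time", "InstanceName": "_Total", "CounterValue": 95.2},
    {"CounterName": "% Processor Time", "InstanceName": "_Total", "CounterValue": 42.0},
    {"CounterName": "% Processor Time", "InstanceName": "_Total", "CounterValue": 3.1},
    {"CounterName": "% Processor Time", "InstanceName": "0", "CounterValue": 99.0},
]

alerting_values = [s["CounterValue"] for s in samples if breaches_threshold(s)]
print(alerting_values)
```

The last sample is excluded even though its value is high, because the query only considers the `_Total` instance.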
    

How do I track performance of my Reliable Services and Actors?

To track the performance of Reliable Services or Actors in your applications, you should also collect the Service Fabric Actor, Actor Method, Service, and Service Method counters.

Note

Service Fabric performance counters cannot currently be collected by the Log Analytics agent, but they can be collected by other diagnostic solutions.

Here are examples of Reliable Services and Actors performance counters to collect:

  • Service Fabric Service(*)\Average milliseconds per request
  • Service Fabric Service Method(*)\Invocations/Sec
  • Service Fabric Actor(*)\Average milliseconds per request
  • Service Fabric Actor Method(*)\Invocations/Sec
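The counter specifications in this article all follow the pattern `Category(Instance)\Counter`. As a rough illustration of that format, the Python sketch below splits a specification into its three parts and rebuilds it. `CounterPath`, `parse_counter_path`, and `format_counter_path` are hypothetical names introduced here, and the pattern only covers the simple form used in this article, not every valid Windows counter path.

```python
import re
from typing import NamedTuple


class CounterPath(NamedTuple):
    """The three parts of a counter specification (hypothetical helper type)."""
    category: str
    instance: str
    counter: str


# Category, then (Instance), then a single backslash, then the counter name.
_COUNTER_PATTERN = re.compile(
    r"^(?P<category>[^()\\]+)\((?P<instance>[^()]*)\)\\(?P<counter>.+)$"
)


def parse_counter_path(path: str) -> CounterPath:
    """Split 'Category(Instance)\\Counter' into its parts, or raise ValueError."""
    match = _COUNTER_PATTERN.match(path)
    if match is None:
        raise ValueError(f"unrecognized counter path: {path!r}")
    return CounterPath(match["category"], match["instance"], match["counter"])


def format_counter_path(category: str, instance: str, counter: str) -> str:
    """Build a counter specification in the same format."""
    return f"{category}({instance})\\{counter}"


parsed = parse_counter_path(r"Service Fabric Service(*)\Average milliseconds per request")
print(parsed)

# Build the per-process counters for the quickstart process names.
voting_counters = [
    format_counter_path(".NET CLR Memory", name, "# Total committed Bytes")
    for name in ("VotingData", "VotingWeb")
]
print(voting_counters)
```

A helper like this can catch a malformed specification, such as a missing instance in parentheses, before you enter it in the portal.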

Check these links for the full list of performance counters on Reliable Services and Actors.

Next steps