Exercise - Implement code-based resiliency

In this exercise, you'll implement a resiliency handler with Polly. The initial eShopOnContainers deployment includes a failure simulation feature for validating a coupon from the checkout basket. This feature lets you configure the number of times a request for a specific discount code fails.

In this unit, you will:

  • Update the app's code to implement failure handling using Polly.
  • Publish the updated image to ACR and deploy the updated app to AKS.
  • Explore the system response under failure after implementing resiliency.

Note

If your Cloud Shell session disconnects due to inactivity, reconnect and run the following command to return to this directory and open the Cloud Shell editor:

cd ~/clouddrive/aspnet-learn/src/ && \
  code .

Add failure handling code using Polly

In this section, you'll modify the app to automatically retry a failing operation. If the operation still fails after several attempts, the UI displays an error.

When a discount coupon is validated, the HTTP request is sent to the web shopping aggregator, which is responsible for routing the request to the coupon service. This is an implementation of the Backends for Frontends (BFF) pattern. The BFF implementation:

  • Sends another HTTP request to the coupon service to get the required information.
  • Handles resiliency using IHttpClientFactory and Polly.

To make the coupon service resilient, you'll implement a Retry and a Circuit Breaker policy to handle failure within the web shopping aggregator. Using Polly with IHttpClientFactory to add resiliency to web apps is one of the archetypical failure handling solutions. The IHttpClientFactory is responsible for creating instances of HttpClient.

The following sequence diagram shows the flow of events from an HttpClient instance to Polly's Retry and Circuit Breaker policies:

An HttpClient call through multiple PolicyHttpMessageHandlers
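
As a rough sketch of the nesting the diagram describes, the following standalone program chains two delegating handlers in front of a stub handler and records the order in which they run. RecordingHandler, StubHandler, and the handler names "retry" and "circuit-breaker" are hypothetical stand-ins for illustration, not Polly's PolicyHttpMessageHandler, and no network call is made.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical handler that records when a request passes through it.
public sealed class RecordingHandler : DelegatingHandler
{
    private readonly string _name;
    private readonly List<string> _trace;

    public RecordingHandler(string name, List<string> trace)
    {
        _name = name;
        _trace = trace;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        _trace.Add($"{_name}: before");
        var response = await base.SendAsync(request, cancellationToken);
        _trace.Add($"{_name}: after");
        return response;
    }
}

// Hypothetical innermost handler that answers locally instead of using the network.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK));
}

public static class HandlerPipelineSketch
{
    public static async Task<List<string>> RunAsync()
    {
        var trace = new List<string>();

        // The outer handler wraps the inner handler, which wraps the stub.
        var inner = new RecordingHandler("circuit-breaker", trace) { InnerHandler = new StubHandler() };
        var outer = new RecordingHandler("retry", trace) { InnerHandler = inner };

        using var client = new HttpClient(outer);
        using var response = await client.GetAsync("http://example.invalid/");
        return trace;
    }

    public static async Task Main()
    {
        foreach (var entry in await RunAsync())
        {
            Console.WriteLine(entry);
        }
    }
}
```

The outer handler sees the request first and the response last, which is why a retry handler placed outside a circuit breaker handler sends every retried attempt through the breaker.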

Complete the following steps to implement failure handling for the coupon service as described above:

  1. Set your current location to the HTTP aggregator project directory by running the following command:

    pushd src/ApiGateways/Aggregators/Web.Shopping.HttpAggregator/
    

    Your current location is ~/clouddrive/aspnet-learn/src/src/ApiGateways/Aggregators/Web.Shopping.HttpAggregator.

  2. Run the following command:

    dotnet add package Microsoft.Extensions.Http.Polly --version 3.1.6
    

    The preceding command installs a NuGet package in the Web.Shopping.HttpAggregator project. The package integrates IHttpClientFactory with Polly and installs the actual Polly package as a dependency. The package is necessary to configure Polly policies to handle conditions representing transient faults when making HTTP requests. Such conditions are handled by invoking the package's HttpPolicyExtensions.HandleTransientHttpError method. The conditions include:

    • Network failures, as indicated by exceptions of type HttpRequestException
    • Server errors, as indicated by HTTP 5xx status codes
    • Request timeouts, as indicated by the HTTP 408 status code
  3. Apply the following changes in the src/ApiGateways/Aggregators/Web.Shopping.HttpAggregator/Extensions/ServiceCollectionExtensions.cs file:

    1. Replace the comment // Add the GetRetryPolicy method with the following method:

      public static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
      {
          return HttpPolicyExtensions.HandleTransientHttpError()
              .WaitAndRetryAsync(5,
                  retryAttempt => TimeSpan.FromMilliseconds(Math.Pow(1.5, retryAttempt) * 1000),
                  (_, waitingTime) =>
                  {
                      Log.Logger.Information(
                          "----- Retrying in {WaitingTime}s", $"{ waitingTime.TotalSeconds:n1}");
                  });
      }
      
    2. Replace the comment // Add the GetCircuitBreakerPolicy method with the following method:

      public static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy() =>
          HttpPolicyExtensions.HandleTransientHttpError()
              .CircuitBreakerAsync(15, TimeSpan.FromSeconds(15));
      
    3. In the AddApplicationServices method, call the AddPolicyHandler extension method twice. Chain the method calls to the AddHttpMessageHandler method call for the coupon service:

      public static IServiceCollection AddApplicationServices(this IServiceCollection services)
      {
          //register delegating handlers
          services.AddTransient<HttpClientAuthorizationDelegatingHandler>();
          services.AddSingleton<IHttpContextAccessor, HttpContextAccessor>();
      
          //register HTTP services
          services.AddHttpClient<ICouponService, CouponService>()
              .AddHttpMessageHandler<HttpClientAuthorizationDelegatingHandler>()
              .AddPolicyHandler(GetRetryPolicy())
              .AddPolicyHandler(GetCircuitBreakerPolicy());
      
          //code omitted for brevity
      
          return services;
      }
      
    4. Replace the comment // Add the using statements with the following using directives:

      using Polly;
      using Polly.Extensions.Http;
      using System.Net.Http;
      

      Importing the preceding namespaces resolves member references in the GetRetryPolicy and GetCircuitBreakerPolicy methods.

  4. Save the ServiceCollectionExtensions.cs file.

  5. Run the following command to build the app:

    dotnet build --no-restore
    

    The --no-restore option is included because the dotnet add package command already restored the project's NuGet packages, so the build can skip the restore step. The build should succeed with no warnings. If the build fails, check the output for troubleshooting information.

  6. Return to your previous location by running the following command:

    popd
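
To make the transient-fault conditions listed in step 2 concrete, here is a minimal sketch of an equivalent check written against the base class library only. TransientFaultSketch and its method names are hypothetical; the sketch illustrates the three listed conditions and is not the Polly package's code.

```csharp
using System;
using System.Net;
using System.Net.Http;

public static class TransientFaultSketch
{
    // Server errors (HTTP 5xx) and request timeouts (HTTP 408) count as transient.
    public static bool IsTransientStatus(HttpStatusCode status) =>
        (int)status >= 500 || status == HttpStatusCode.RequestTimeout;

    // Network failures surface as HttpRequestException.
    public static bool IsTransientException(Exception exception) =>
        exception is HttpRequestException;

    public static void Main()
    {
        Console.WriteLine(IsTransientStatus(HttpStatusCode.ServiceUnavailable)); // True (503)
        Console.WriteLine(IsTransientStatus(HttpStatusCode.RequestTimeout));     // True (408)
        Console.WriteLine(IsTransientStatus(HttpStatusCode.NotFound));           // False (404)
        Console.WriteLine(IsTransientException(new HttpRequestException()));     // True
        Console.WriteLine(IsTransientException(new InvalidOperationException())); // False
    }
}
```

A response such as HTTP 404 is not retried under these conditions, because repeating the same request is unlikely to change the outcome.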
    

With the preceding changes:

  • A Retry policy was defined that retries up to five times with an exponentially increasing delay between attempts. This policy's premise is that faults are transient and may self-correct after a short delay. The policy's delay:

    • Is 1.5 raised to the power of the retry attempt number, in seconds: 1.5 seconds before the first retry, 2.25 seconds before the second, and so on.
    • Commonly uses a base of 2 in exponential backoff examples. To decrease wait times for this exercise, a base of 1.5 is used instead.
  • A Circuit Breaker policy was defined which enforces a 15-second pause after 15 consecutive failures. This policy's premise is that protecting the service from overload can help it recover.

  • The HttpClient instance used by the coupon service was configured to apply the Retry and Circuit Breaker policies. This particular HttpClient instance is provided to the CouponService class via constructor injection:

    public class CouponService : ICouponService
    {
        private readonly HttpClient _httpClient;
        private readonly UrlsConfig _urls;
        private readonly ILogger<CouponService> _logger;
    
        public CouponService(
            HttpClient httpClient,
            IOptions<UrlsConfig> config,
            ILogger<CouponService> logger)
        {
            _urls = config.Value;
            _httpClient = httpClient;
            _logger = logger;
        }
    
        // code omitted for brevity
    

    The AddApplicationServices extension method is invoked from the ConfigureServices method in the project's Startup.cs file:

    public void ConfigureServices(IServiceCollection services)
    {
        // code omitted for brevity
    
        services.AddCustomMvc(Configuration)
            .AddCustomAuthentication(Configuration)
            .AddApplicationServices();
    }
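
The delay schedule of the Retry policy described above can be worked out with a few lines of code. The following is a minimal sketch that mirrors the Math.Pow(1.5, retryAttempt) expression from GetRetryPolicy; RetryScheduleSketch and its members are hypothetical names, and the retry attempt number is assumed to start at 1, as it does in Polly's sleep duration provider.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class RetryScheduleSketch
{
    // Delay in seconds before the given retry attempt (1-based).
    public static double DelaySeconds(int retryAttempt, double backoffBase = 1.5) =>
        Math.Pow(backoffBase, retryAttempt);

    // Delays for every retry, in the order they occur.
    public static IReadOnlyList<double> Schedule(int retryCount, double backoffBase = 1.5) =>
        Enumerable.Range(1, retryCount)
            .Select(attempt => DelaySeconds(attempt, backoffBase))
            .ToList();

    public static void Main()
    {
        var delays = Schedule(5);
        for (var i = 0; i < delays.Count; i++)
        {
            Console.WriteLine($"Retry {i + 1}: wait {delays[i]} s");
        }
        Console.WriteLine($"Total wait: {delays.Sum()} s");
    }
}
```

With a base of 1.5, the five delays are 1.5, 2.25, 3.375, 5.0625, and 7.59375 seconds, for a total of roughly 19.8 seconds. With a base of 2, the same five retries would wait 2, 4, 8, 16, and 32 seconds, for a total of 62 seconds.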
    

Deploy the updated microservice

Complete the following steps to deploy the changes that you've implemented:

  1. Run the following script to publish the aggregator's updated Docker image to ACR:

    ./deploy/k8s/build-to-acr.sh --services webshoppingagg
    

    The preceding script uses an ACR quick task to build the webshoppingagg image and publish it to the ACR instance. You'll see a variation of the following output:

    Building images to ACR
    ======================
    ~/clouddrive/aspnet-learn/src/deploy/k8s ~/clouddrive/aspnet-learn/src
    
    Building and publishing docker images to eshoplearn20200729161705092.azurecr.io
    ~/clouddrive/aspnet-learn/src ~/clouddrive/aspnet-learn/src/deploy/k8s ~/clouddrive/aspnet-learn/src
    
    Building image "webshoppingagg" for service "webshoppingagg" with "src/ApiGateways/Aggregators/Web.Shopping.HttpAggregator/Dockerfile.acr"...
    
     > az acr build -r eshoplearn20200729161705092 -t eshoplearn20200729161705092.azurecr.io/webshoppingagg:linux-latest -f src/ApiGateways/Aggregators/Web.Shopping.HttpAggregator/Dockerfile.acr .
    
    Packing source code into tar to upload...
    Excluding '.gitignore' based on default ignore rules
    Uploading archived source code from '/tmp/build_archive_1a826ecd8db64f8c846d796af13d6318.tar.gz'...
    Sending context (7.838 MiB) to registry: eshoplearn20200729161705092...
    Queued a build with ID: cf2
    Waiting for an agent...
    2020/07/29 17:03:19 Downloading source code...
    2020/07/29 17:03:21 Finished downloading source code
    2020/07/29 17:03:22 Using acb_vol_faae1c90-bbea-4ea6-89e9-daa0ab059f5a as the home volume
    2020/07/29 17:03:22 Setting up Docker configuration...
    2020/07/29 17:03:23 Successfully set up Docker configuration
    2020/07/29 17:03:23 Logging in to registry: eshoplearn20200729161705092.azurecr.io
    2020/07/29 17:03:24 Successfully logged into eshoplearn20200729161705092.azurecr.io
    2020/07/29 17:03:24 Executing step ID: build. Timeout(sec): 28800, Working directory: '', Network: ''
    2020/07/29 17:03:24 Scanning for dependencies...
    2020/07/29 17:03:25 Successfully scanned dependencies
    2020/07/29 17:03:25 Launching container with name: build
    

    Once the image has been published to ACR, you'll see the following line:

    2020/07/29 17:04:57 Successfully pushed image: eshoplearn20200729161705092.azurecr.io/webshoppingagg:linux-latest
    
  2. Run the following command to verify the URL of your ACR instance:

    eval $(cat ~/clouddrive/aspnet-learn/create-acr-exports.txt) && \
        echo $ESHOP_REGISTRY
    

    The setup script saved some environment variable declarations in a text file. The preceding command evaluates the text file to set the environment variables. You'll see a variation of the following output:

    eshoplearn2020072900000000.azurecr.io
    
  3. Run the following script to deploy the updated image in ACR to AKS:

    ./deploy/k8s/deploy-application.sh --registry $ESHOP_REGISTRY --charts webshoppingagg
    

    The preceding script uninstalls the old webshoppingagg Helm chart and installs it again. The AKS cluster uses the new image from the ACR instance. You'll see a variation of the following output:

    ~/clouddrive/aspnet-learn ~/clouddrive/aspnet-learn/src/deploy/k8s
    ~/clouddrive/aspnet-learn/src/deploy/k8s
    
    Uninstalling chart webshoppingagg...
    release "eshoplearn-webshoppingagg" uninstalled
    
    Deploying Helm charts from registry "eshoplearn20200731194920286.azurecr.io" to "http://13.87.153.177"...
    ---------------------
    
    Installing chart "webshoppingagg"...
    NAME: eshoplearn-webshoppingagg
    LAST DEPLOYED: Fri Jul 31 20:38:05 2020
    NAMESPACE: default
    STATUS: deployed
    REVISION: 1
    TEST SUITE: None
    
    Helm charts deployed!
    

Test the app again

The Polly Retry and Circuit Breaker policies have been deployed. It's time to test the app's behavior.

Verify availability of the services

  1. Execute the following command:

    cat ~/clouddrive/aspnet-learn/deployment-urls.txt
    
  2. Select the General application status link in the command shell to view the WebStatus health checks dashboard.

  3. Continue to the next section when all services are healthy.

Retry policy

Complete the following steps to test the Retry policy:

  1. Place an item in the shopping bag and begin the checkout procedure.

  2. Enter the discount code FAIL 2 DISC-10 and select APPLY.

    You'll receive the following confirmation message with the number of failures configured for the code: CONFIG: 2 failure(s) configured for code "DISC-10"!!.

  3. Replace the existing discount code with DISC-10 and select APPLY.

    From the user's perspective, the operation appears to succeed on the first try after a brief wait, because the resilient BFF handles the retries transparently. Notice that the 10 USD discount was applied.

  4. Run the following command to display the logging page URL, and then select the Centralized logging link:

    cat ../deployment-urls.txt
    
  5. Check the log traces. You'll see a variation of the following output:

    log traces

    In the preceding image, you can see:

    • The log traces when configuring the simulated failures, labeled as "1".
    • Three retries until the aggregator could finally get the value, labeled as "2".
  6. Complete the checkout procedure and select CONTINUE SHOPPING.

Circuit Breaker policy

To test the Circuit Breaker policy, you'll configure the code for 20 failures. Accordingly, you'll use the discount code FAIL 20 DISC-10:

configure a severe failure

  1. Place an item in the shopping bag and begin the checkout procedure.

  2. Enter the discount code FAIL 20 DISC-10 and select APPLY.

    You'll receive the following confirmation message with the number of failures configured for the code: CONFIG: 20 failure(s) configured for code "DISC-10"!!.

  3. Enter the discount code DISC-10 again and select APPLY.

  4. Wait about 20 seconds. You'll receive an HTTP 500 error message.

  5. Select APPLY again. The error message is received again in about 20 seconds.

  6. Select APPLY again. The HTTP 500 error message arrives much faster because of the Circuit Breaker policy.

  7. Select APPLY again.

    The error message is received immediately. You can see this error clearly in the log traces:

    severe failures in log traces

    In the preceding image, notice that:

    • After waiting for 7.6 seconds, labeled as "1", you received the HTTP 500 error message with the Retry policy in effect, labeled as "2".

    • On the next attempt to validate the code, you received the HTTP 500 error message after waiting only 3.4 seconds, labeled as "3". The "Get coupon..." trace is absent, meaning the request failed without going to the server.

    • If you check the details on this last trace, you should see a variation of the following output:

      severe failure log detail

      Notice that the last trace has the "The circuit is now open..." message.
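
The timing seen in the steps above follows from how the two policies interact. The following is a simplified counting model, not Polly's implementation. It assumes that the Retry policy wraps the Circuit Breaker policy, that every call to the coupon service fails, that a request rejected by an open circuit is not retried, and that all clicks happen before the 15-second break ends. BreakerModelSketch and its members are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BreakerModelSketch
{
    public const int RetryCount = 5;          // from WaitAndRetryAsync(5, ...)
    public const int BreakAfterFailures = 15; // from CircuitBreakerAsync(15, ...)

    // For each APPLY click: calls that reached the service, seconds spent
    // waiting between attempts, and whether an open circuit ended the request.
    public static List<(int Calls, double Waited, bool Rejected)> Simulate(int clicks)
    {
        var results = new List<(int Calls, double Waited, bool Rejected)>();
        var consecutiveFailures = 0;
        var circuitOpen = false;

        for (var click = 0; click < clicks; click++)
        {
            var calls = 0;
            var waited = 0.0;
            var rejected = false;

            // One initial attempt plus up to RetryCount retries.
            for (var attempt = 0; attempt <= RetryCount; attempt++)
            {
                if (attempt > 0)
                {
                    waited += Math.Pow(1.5, attempt); // delay before this retry
                }
                if (circuitOpen)
                {
                    rejected = true; // fails fast without calling the service
                    break;
                }
                calls++; // the call reaches the service and fails
                consecutiveFailures++;
                if (consecutiveFailures >= BreakAfterFailures)
                {
                    circuitOpen = true;
                }
            }
            results.Add((calls, waited, rejected));
        }
        return results;
    }

    public static void Main()
    {
        var click = 1;
        foreach (var (calls, waited, rejected) in Simulate(4))
        {
            Console.WriteLine($"Click {click++}: calls={calls}, waited={waited} s, rejected={rejected}");
        }
    }
}
```

In this model, the first two clicks each make six failing calls and wait about 19.8 seconds. The third click makes three calls, which brings the total to 15 consecutive failures and opens the circuit, so it ends after about 7.1 seconds of waiting. The fourth click is rejected immediately without calling the service.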

In this unit, you added code-based resiliency with Polly. Next, you'll implement infrastructure-based resiliency with Linkerd.