JamesThurley-1373 asked JamesThurley-1373 edited

Intermittent ERR_HTTP2_PROTOCOL_ERROR / HTTP2_STREAM_ERROR / Server reset stream on App Service with Azure Front Door

We have two applications, one is a static single-page web application (SPA), and one is an ASP.NET Web API application running in .NET Framework 4.7.2. We have two App Service plans, each one in a different region, and each one hosting the two applications as App Services. We have an Azure Front Door instance for each application, each of which load balances between the two App Service plans. Users fetch the front-end SPA on one domain and the SPA talks to the API on another domain.

In the last couple of weeks we have started getting intermittent errors in the form of ERR_HTTP2_PROTOCOL_ERROR. These are not specific to any particular API request and even happen when just requesting the static website, so the problem is not limited to either App Service. However, the errors are more common on API requests, as the SPA is generally loaded only once and then makes many API requests over a period of time.

The website will be fine for periods, but then intermittently one or more requests will fail with ERR_HTTP2_PROTOCOL_ERROR, often in a cluster of requests (all failing), and then a few moments later the same requests will succeed.

If I sit refreshing the SPA every couple of seconds, eventually the SPA will fail to load and Chrome will display "The site can't be reached" and "ERR_HTTP2_PROTOCOL_ERROR".

I've used chrome://net-export to record when this happens, both with the API and the SPA, and in both cases I get a similar output for a failed URL_REQUEST with "HTTP2_STREAM_ERROR" and "Server reset stream". I've included an example log when failing to load the SPA at the end of this post.

Because we are getting this issue with both the SPA and the API, it seems to indicate it is an issue with something sitting in front of those (either the App Service configuration, the App Service Plan, or Front Door).

One thing that made me think it might be a Front Door issue is that our App Services were configured to use HTTP/1.1 (I've since switched to HTTP/2, which didn't help), yet the errors are related to HTTP/2. The Azure Front Door docs say:

HTTP/2 protocol support is available to clients connecting to Azure Front Door only. The communication to backends in the backend pool is over HTTP/1.1. HTTP/2 support is enabled by default.

Does anyone have any suggestions on where to go from here?

Example error output

 t=652141 [st=   0] +REQUEST_ALIVE  [dt=9925]
                     --> priority = "HIGHEST"
                     --> traffic_annotation = 63171670
                     --> url = "https://portal.mydomain.com/sign-in"
 t=652141 [st=   0]    NETWORK_DELEGATE_BEFORE_URL_REQUEST  [dt=0]
 t=652141 [st=   0]   +URL_REQUEST_START_JOB  [dt=9924]
                       --> initiator = "https://portal.mydomain.com"
                       --> load_flags = 65794 (BYPASS_CACHE | CAN_USE_RESTRICTED_PREFETCH | MAIN_FRAME_DEPRECATED)
                       --> method = "GET"
                       --> network_isolation_key = "https://mydomain.com https://mydomain.com"
                       --> privacy_mode = "disabled"
                       --> site_for_cookies = "SiteForCookies: {scheme=https; registrable_domain=mydomain.com; schemefully_same=true}"
                       --> url = "https://portal.mydomain.com/sign-in"
 t=652141 [st=   0]      COOKIE_INCLUSION_STATUS
                         --> operation = "send"
                         --> status = "EXCLUDE_DOMAIN_MISMATCH, DO_NOT_WARN"
 t=652141 [st=   0]      COOKIE_INCLUSION_STATUS
                         --> operation = "send"
                         --> status = "EXCLUDE_DOMAIN_MISMATCH, DO_NOT_WARN"
 t=652141 [st=   0]      NETWORK_DELEGATE_BEFORE_START_TRANSACTION  [dt=0]
 t=652142 [st=   1]      HTTP_CACHE_GET_BACKEND  [dt=0]
 t=652142 [st=   1]      HTTP_CACHE_DOOM_ENTRY  [dt=0]
                         --> net_error = -2 (ERR_FAILED)
 t=652142 [st=   1]      HTTP_CACHE_CREATE_ENTRY  [dt=0]
 t=652142 [st=   1]      HTTP_CACHE_ADD_TO_ENTRY  [dt=0]
 t=652142 [st=   1]     +HTTP_STREAM_REQUEST  [dt=0]
 t=652142 [st=   1]        HTTP_STREAM_JOB_CONTROLLER_BOUND
                           --> source_dependency = 37851 (HTTP_STREAM_JOB_CONTROLLER)
 t=652142 [st=   1]        HTTP_STREAM_REQUEST_BOUND_TO_JOB
                           --> source_dependency = 37852 (HTTP_STREAM_JOB)
 t=652142 [st=   1]     -HTTP_STREAM_REQUEST
 t=652142 [st=   1]     +HTTP_TRANSACTION_SEND_REQUEST  [dt=0]
 t=652142 [st=   1]        HTTP_TRANSACTION_HTTP2_SEND_REQUEST_HEADERS
                           --> :method: GET
                               :authority: portal.mydomain.com
                               :scheme: https
                               :path: /sign-in
                               pragma: no-cache
                               cache-control: no-cache
                               upgrade-insecure-requests: 1
                               user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.96 Safari/537.36
                               accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
                               sec-fetch-site: same-origin
                               sec-fetch-mode: navigate
                               sec-fetch-user: ?1
                               sec-fetch-dest: document
                               accept-encoding: gzip, deflate, br
                               accept-language: en-US,en;q=0.9
 t=652142 [st=   1]     -HTTP_TRANSACTION_SEND_REQUEST
 t=652142 [st=   1]     +HTTP_TRANSACTION_READ_HEADERS  [dt=9923]
 t=662065 [st=9924]        HTTP2_STREAM_ERROR
                           --> description = "Server reset stream."
                           --> net_error = "ERR_HTTP2_PROTOCOL_ERROR"
                           --> stream_id = 5847
 t=662065 [st=9924]     -HTTP_TRANSACTION_READ_HEADERS
                         --> net_error = -337 (ERR_HTTP2_PROTOCOL_ERROR)
 t=662065 [st=9924]   -URL_REQUEST_START_JOB
                       --> net_error = -337 (ERR_HTTP2_PROTOCOL_ERROR)
 t=662065 [st=9924]    URL_REQUEST_DELEGATE_RESPONSE_STARTED  [dt=1]
 t=662066 [st=9925] -REQUEST_ALIVE
                     --> net_error = -337 (ERR_HTTP2_PROTOCOL_ERROR)
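
For anyone who wants to check their own capture for the same pattern, here is a minimal Python sketch that pulls the HTTP2_STREAM_ERROR events out of a net-export file. It assumes the exported JSON has a top-level `constants.logEventTypes` map from event names to numeric ids and an `events` list whose entries carry that id in `type`; that layout is an assumption about the file format rather than something visible in the rendered log above, and the function names are made up for illustration.

```python
import json
from typing import Any, Dict, List


def find_stream_errors(log: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the params of every HTTP2_STREAM_ERROR event in a parsed log."""
    # Assumed layout: constants.logEventTypes maps event names to numeric ids,
    # and each entry in "events" stores that id in its "type" field.
    event_types = log.get("constants", {}).get("logEventTypes", {})
    error_id = event_types.get("HTTP2_STREAM_ERROR")
    if error_id is None:
        return []
    hits = []
    for event in log.get("events", []):
        if event.get("type") == error_id:
            params = dict(event.get("params", {}))
            params["time"] = event.get("time")
            params["source_id"] = event.get("source", {}).get("id")
            hits.append(params)
    return hits


def load_net_export(path: str) -> Dict[str, Any]:
    """Read a net-export JSON file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Calling `find_stream_errors(load_net_export(path))` on a capture should list one entry per reset stream, which makes it easier to line the resets up against server-side timestamps.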


UPDATE:

Hi Mithun, thanks for the comment. That does sound like the same issue. I haven't resolved it yet on my side.

I've tried passing the x-azure-debuginfo: 1 header through with requests, from the SPA to the API, as mentioned in the docs here, but when the error occurs I don't receive any additional debug headers back.

I've tried enabling Front Door diagnostic logging, but the only ErrorInfo entries other than NoError are DNSNameNotResolved. This is odd in itself, as Front Door is resolving an azurewebsites.net address and the requests around it succeed. However, comparing the timestamps with when I get the HTTP2 errors, the two aren't obviously related. Below I've included sanitized entries, with and without an error, which occurred within seconds of each other.

With Error:

 {
     "time": "2021-01-22T10:05:16.5111237Z",
     "resourceId": "/SUBSCRIPTIONS/9C81A098-BAA3-4465-9502-E464004F2F6B/RESOURCEGROUPS/MY-RG/PROVIDERS/MICROSOFT.NETWORK/FRONTDOORS/MY-API",
     "category": "FrontdoorAccessLog",
     "operationName": "Microsoft.Network/FrontDoor/AccessLog/Write",
     "properties": {
         "trackingReference": "0XKMKYAAAAADXRIIsEmR+RoeguVrUl+LlTVVDMzBFREdFMDYxOQAxZTI5YTljNy01N2M1LTQyZGEtYjljMi1iNjYyMjY2MWQxYTE=",
         "httpMethod": "OPTIONS",
         "httpVersion": "2.0",
         "requestUri": "https://api.mydomain.com:443/xyz/abc/def?",
         "requestBytes": "646",
         "responseBytes": "1470",
         "userAgent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
         "clientIp": "123.456.153.25",
         "socketIp": "123.456.153.25",
         "clientPort": "27796",
         "timeToFirstByte": "0.082",
         "timeTaken": "0.082",
         "requestProtocol": "HTTPS",
         "securityProtocol": "TLS 1.2",
         "routingRuleName": "my-api",
         "rulesEngineMatchNames": [],
         "backendHostname": "my-api-msnsxfxpaaaaa.azurewebsites.net:443",
         "isReceivedFromClient": true,
         "httpStatusCode": "503",
         "httpStatusDetails": "503",
         "pop": "MUC",
         "cacheStatus": "CONFIG_NOCACHE",
         "ErrorInfo": "DNSNameNotResolved"
     }
 } 

Without Error:

 {
     "time": "2021-01-22T10:05:22.3334673Z",
     "resourceId": "/SUBSCRIPTIONS/9C81A098-BAA3-4465-9502-E464004F2F6B/RESOURCEGROUPS/MY-RG/PROVIDERS/MICROSOFT.NETWORK/FRONTDOORS/MY-API",
     "category": "FrontdoorAccessLog",
     "operationName": "Microsoft.Network/FrontDoor/AccessLog/Write",
     "properties": {
         "trackingReference": "0YaMKYAAAAADdAswx9kxdRbBqhPzDwxIlTVVDMzBFREdFMDYxOQAxZTI5YTljNy01N2M1LTQyZGEtYjljMi1iNjYyMjY2MWQxYTE=",
         "httpMethod": "OPTIONS",
         "httpVersion": "2.0",
         "requestUri": "https://api.mydomain.com:443/xyz/abc/def?",
         "requestBytes": "646",
         "responseBytes": "978",
         "userAgent": "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.141 Safari/537.36",
         "clientIp": "123.456.153.25",
         "socketIp": "123.456.153.25",
         "clientPort": "27796",
         "timeToFirstByte": "0.791",
         "timeTaken": "0.791",
         "requestProtocol": "HTTPS",
         "securityProtocol": "TLS 1.2",
         "routingRuleName": "my-api",
         "rulesEngineMatchNames": [],
         "backendHostname": "my-api-msnsxfxpaaaaa.azurewebsites.net:443",
         "isReceivedFromClient": true,
         "httpStatusCode": "200",
         "httpStatusDetails": "200",
         "pop": "MUC",
         "cacheStatus": "CONFIG_NOCACHE",
         "ErrorInfo": "NoError"
     }
 }
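
A rough Python sketch for sifting through exported access log entries like the two above: it keeps only entries whose `ErrorInfo` is not `NoError` and counts them by error, status code and POP. The field names come from the entries shown; the function names and the trimmed `SAMPLE_ENTRIES` list are only illustrative, and how the entries are exported (storage account, Log Analytics, etc.) is left out on purpose.

```python
from collections import Counter
from typing import Any, Dict, Iterable, List

# Trimmed copies of the two sanitized entries shown above.
SAMPLE_ENTRIES = [
    {"time": "2021-01-22T10:05:16.5111237Z",
     "properties": {"httpStatusCode": "503", "pop": "MUC",
                    "ErrorInfo": "DNSNameNotResolved"}},
    {"time": "2021-01-22T10:05:22.3334673Z",
     "properties": {"httpStatusCode": "200", "pop": "MUC",
                    "ErrorInfo": "NoError"}},
]


def failed_entries(entries: Iterable[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Keep entries whose ErrorInfo is anything other than NoError."""
    return [
        entry for entry in entries
        if entry.get("properties", {}).get("ErrorInfo", "NoError") != "NoError"
    ]


def summarize(entries: Iterable[Dict[str, Any]]) -> Counter:
    """Count entries by (ErrorInfo, httpStatusCode, pop)."""
    counts: Counter = Counter()
    for entry in entries:
        props = entry.get("properties", {})
        key = (props.get("ErrorInfo"), props.get("httpStatusCode"), props.get("pop"))
        counts[key] += 1
    return counts
```

Running `summarize(failed_entries(entries))` over a longer export would show whether the failures cluster on one POP or one error type, which may help when comparing against the times the browser errors occur.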

I'll continue investigating.






azure-webapps, azure-front-door

Hi, we've had an issue very similar to this across both UK regions, but it's stopped happening for the past 2 days. Has anybody else experienced this?

1 Vote 1 ·

That's interesting... yes exactly the same pattern here - two of our UK based front doors serving UK back end pools were exhibiting this random failure behaviour starting about 2 weeks ago, then yesterday morning the failures stopped and all has been fine since. No further issues reported from any users or APIs.

We made no changes, the random failures have simply stopped as suddenly as they started.

1 Vote 1 ·

Interesting indeed! I have a ticket raised with MS so will let everyone here know what I hear back from them.

1 Vote 1 ·

Hi @JamesThurley-1373 - Have you been able to resolve this at all? I've been facing the same issue for about the last two weeks, and it only happens in Chrome. My setup is exactly the same as what you've described, i.e. two geographically separated App Services (ASP.NET 4.7) behind Azure Front Door.

The error is random and happens on any page; it fixes itself after a second or two.

Cheers
Mithun

0 Votes 0 ·

Hi @MithunBose-7743, that does sound like the same issue. I've added some more detail below as it wouldn't fit in a comment. I haven't resolved it yet.

0 Votes 0 ·

Thanks @JamesThurley-1373, much appreciated - I am liaising with Microsoft Support on this, will let you know in case of any joy.

Cheers
Mithun

0 Votes 0 ·

This started happening to our Front Door Account too in the past few days

0 Votes 0 ·

Hi @EliPerlstein-0263 & @JamesThurley-1373, the timeout setting on our AFD was set to 30 seconds. I increased this to 120 seconds and that seems to have resolved the issue for now.

The Settings option is at the top right of the "Front Door Designer".



0 Votes 0 ·

Interesting suggestion @MithunBose-7743 . My timeout is already set to 240 seconds, unfortunately.

I've configured our front-end to retry API requests every 5 seconds if they fail with a status code of zero (which is all we get when this error occurs). I just witnessed it fail twice then succeed, with about 15 seconds between the first failure and the successful call. So it doesn't seem like the timeout is the limiting factor for us at least.
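
The retry behaviour described above, sketched in Python rather than the front-end's own code: a request that fails before any HTTP status arrives is retried after a fixed 5 second delay, up to a limit. `TransportError`, `call_with_retry` and the `send` callable are hypothetical names for illustration, and the retry limit of 3 is an assumption (the comment above only mentions the 5 second interval).

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransportError(Exception):
    """Hypothetical error: the request failed before any HTTP status was received."""


def call_with_retry(
    send: Callable[[], T],
    retries: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(); on TransportError wait delay_seconds and try again."""
    attempt = 0
    while True:
        try:
            return send()
        except TransportError:
            attempt += 1
            if attempt > retries:
                raise  # give up and surface the last failure
            sleep(delay_seconds)
```

Something like this is only safe for idempotent requests, since a reset stream does not tell the client whether the backend already processed the call.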

0 Votes 0 ·

Hi @MithunBose-7743

Thanks for your input. I've now tried increasing the timeout to 240 seconds from the previous setting of 90 seconds.

Our 3 backend machines are HTTP only, but I disabled "Certificate subject name validation" anyway. (.NET Core, IIS 10)

This strange behavior started 1-2 weeks ago, after 12 months on Front Door.

0 Votes 0 ·

For 2 weeks we had no issues, but the same issue has started again since Saturday (06 Feb).

0 Votes 0 ·

I had a report of it this morning as well.

0 Votes 0 ·

Unfortunately our users have reported this again as well. Nothing happened for 2 weeks and suddenly it has started again.

What about you @JamesThurley-1373 ?

Microsoft support hasn't been helpful at all when I raised this previously.

0 Votes 0 ·

I've reopened my support ticket with Azure and would ask you to do so as well, so they are aware it's not an isolated incident. Cheers.

0 Votes 0 ·

We are testing AFD at the moment and have seen the ERR_HTTP2_PROTOCOL_ERROR error in Chrome (89) a couple of times. We also tried in Edge with no observed issues. The backend is using nginx/gzip/HTTP 1.1 - this Stack Overflow question has various answers related to these. No large files are involved - the two backends are both tiny HTML test sites.

I notice there have been no updates to this question since Feb, so I'm wondering if other posters are still experiencing the issue?

Edit

We've just noticed we're also now getting the same on Azure CDN intermittently (obviously served from the same POPs). This is 100% Azure as we use Azure storage as backing for this. Also only observed on Chrome.

0 Votes 0 ·

Hi @matt-7925 @JamesThurley-1373 @EliPerlstein-0263

Yes, we have started to face this issue again today. I have reopened my support ticket, but I have been banging my head against the wall with support on this for months without receiving any meaningful explanation or solution. It works fine for weeks, and then suddenly all of us get the HTTP2 error for a couple of days before it automatically resolves again.

I just hope this can be resolved once and for all.

0 Votes 0 ·

We also saw it again today (although it seems to have gone again now). I managed to create a packet capture and an HAR file capture which I've added to my support ticket.

Was anyone else getting DNS resolution issues as well? (Not the major DNS issue from the 1st of April, but AFD-specific ones around the same time the HTTP2 errors were occurring today.)

0 Votes 0 ·

0 Answers