question

GraemeFoster-6236 asked GraemeFoster-6236 edited

Help understanding occasional very slow HTTP requests on Linux Azure Function Consumption Plan

Hi,

TLDR;

I'm seeing occasional, unpredictable slow outbound HTTP requests within an invocation of an Azure Function on a Linux Consumption plan. Runtime v3.0.15417.0. I can replicate it in Australia East and US Central. The problem doesn't occur on Windows Consumption plans.

I'm looking for an explanation why this might be occurring.

Function
My function makes 10 outgoing calls to http://www.google.com, sleeping for 0.2 seconds between calls. All HTTP calls are synchronous (I know I could optimise this, but the problem originally appeared in a Python Consumption-plan function using CosmosClient, which does not support async IO).
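A minimal sketch of that repro function (the function name, URL, and timing parameters are illustrative, not the original code):

```python
import time
import urllib.request


def probe(url="http://www.google.com", calls=10, pause=0.2):
    """Make sequential blocking HTTP calls, logging each call's latency.

    On the affected Linux Consumption plans, one of these calls
    occasionally takes several seconds longer than the others.
    """
    timings = []
    for i in range(calls):
        start = time.monotonic()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        elapsed = time.monotonic() - start
        timings.append(elapsed)
        print(f"call {i + 1}: {elapsed:.2f}s")
        time.sleep(pause)
    return timings
```

Logging per-call latency this way is what makes the single slow request stand out mid-invocation.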

Calling the function
I have an external script that calls the function 20 times sequentially from my local machine.
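The external caller can be sketched like this (the function URL is a placeholder; summarising min/median/max makes the outliers obvious):

```python
import statistics
import time
import urllib.request

FUNCTION_URL = "https://<your-app>.azurewebsites.net/api/probe"  # placeholder


def run_trials(url, runs=20):
    """Invoke the function sequentially and summarise end-to-end times."""
    times = []
    for i in range(runs):
        start = time.monotonic()
        with urllib.request.urlopen(url) as resp:
            resp.read()
        elapsed = time.monotonic() - start
        times.append(elapsed)
        print(f"invocation {i + 1}: {elapsed:.2f}s")
    print(f"min {min(times):.2f}s  "
          f"median {statistics.median(times):.2f}s  "
          f"max {max(times):.2f}s")
    return times
```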

Expected Results
I'd expect to see similar execution times for the invocations.

Actual Results
I get wildly unpredictable results. It's extremely easy to reproduce on any Linux Consumption plan. Windows Consumption plans produce steadier, more predictable response times.

Metrics

75% of the runs take about 4.5 seconds. But often, midway through an invocation, one of the requests takes up to 8 seconds. My hunch is that the underlying runtime is forcing it to wait for a socket, but I have no way to prove this.

There are no cold starts involved. I can see from my log statements that the slow request may be the 3rd, 4th, 5th, 6th, or 7th within a function invocation.

There is no concurrency involved. My test harness calls the function sequentially.

I can recreate similar results using a variety of URIs. I originally saw the problem using Cosmos DB.

Any explanations on what might be happening would be great.

Thanks,

Graeme







azure-functions

1 Answer

JayaC-MSFT answered GraemeFoster-6236 edited

Hello, we do see performance issues with Python functions (including under simultaneous calls), so we suggest increasing FUNCTIONS_WORKER_PROCESS_COUNT.

This behavior is expected due to the single-threaded architecture of the Python worker: blocking synchronous HTTP calls, or other IO-bound calls, will block the entire event loop.

Our Python Functions developer reference documents how to handle such scenarios: https://docs.microsoft.com/en-us/azure/azure-functions/functions-reference-python#scaling-and-concurrency . See especially the async section.

Here are two ways to handle this:

  1. Use async calls.

  2. Add more language worker processes per host via the application setting FUNCTIONS_WORKER_PROCESS_COUNT, up to a maximum value of 10. For the kind of blocking workload you are simulating, we recommend setting FUNCTIONS_WORKER_PROCESS_COUNT to a higher number to parallelize the work given to a single instance.

  Please note that a new language worker is spawned every 10 seconds until all of them are warmed up.
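A sketch of the async approach using only the standard library: since the original calls are blocking, one option is to push them onto worker threads with asyncio.to_thread (Python 3.9+) so the event loop stays free. Names and the URL are illustrative.

```python
import asyncio
import time
import urllib.request


def fetch(url):
    """Blocking GET; returns elapsed seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as resp:
        resp.read()
    return time.monotonic() - start


async def fetch_all(url, calls=10):
    """Run the blocking calls on worker threads, concurrently,
    so they don't block the event loop."""
    tasks = [asyncio.to_thread(fetch, url) for _ in range(calls)]
    return await asyncio.gather(*tasks)
```

A fully async HTTP client (e.g. aiohttp) avoids the thread pool entirely, but the thread-offload pattern works without changing the blocking client library.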

Here is a GitHub issue which talks about this issue in detail : https://github.com/Azure/azure-functions-python-worker/issues/236

Please let me know if this helps. If it does, please 'Accept as answer' and 'Up-vote' so that it can help others in the community looking for help on similar topics.


Hi Jaya,

Thanks for replying. This setting does not fix the problem. I've run the original function with / without, and the results are the same.

I'm a bit confused by your response, though. As per my original description, I can demonstrate this issue using synchronous C# code on .NET Core as well as Python, but only on a Linux Consumption plan. In all cases I'm executing the function sequentially; there is no concurrency involved. So I'm confused about how increasing FUNCTIONS_WORKER_PROCESS_COUNT would help.

Something I didn't mention in the original question: I can reliably recreate the problem using HTTP calls, but haven't tried other IO-bound calls like reading a file. I did try a blocking synchronous sleep, which would also block the event loop; sleeping does not exhibit the problem.

I'm happy to share the details of the C# code, or the Id of the Linux Consumption plans which have the problem if it helps.

Thanks,

Graeme
