Improve throughput performance of Python apps in Azure Functions
When developing for Azure Functions using Python, you need to understand how your functions perform and how that performance affects the way your function app gets scaled. This understanding matters most when you design highly performant apps. The main factors to consider when designing, writing, and configuring your function apps are horizontal scaling and throughput performance configurations.
By default, Azure Functions automatically monitors the load on your application and creates additional host instances for Python as needed. Azure Functions uses built-in thresholds for different trigger types to decide when to add instances, such as the age of messages and queue size for QueueTrigger. These thresholds aren't user configurable. For more information, see Event-driven scaling in Azure Functions.
Improving throughput performance
The default configurations are suitable for most Azure Functions applications. However, you can improve the throughput of your applications by employing configurations based on your workload profile. The first step is to understand the type of workload that you are running.
| Workload type | Function app characteristics | Examples |
|---|---|---|
| I/O-bound | • App needs to handle many concurrent invocations. • App processes a large number of I/O events, such as network calls and disk reads/writes. | • Web APIs |
| CPU-bound | • App does long-running computations, such as image resizing. • App does data transformation. | • Data processing • Machine learning inference |
Because real-world function workloads are usually a mix of I/O-bound and CPU-bound work, you should profile the app under realistic production loads.
After you understand the workload profile of your function app, you can use the following configurations to improve the throughput performance of your functions.
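One rough way to see which way a function leans is to compare its CPU time with its wall-clock time for a representative call. The sketch below is an illustration only: `cpu_fraction`, `io_like`, and `cpu_like` are hypothetical names that are not part of Azure Functions, and a real assessment should still rely on a profiler under production-like load.

```python
import time


def cpu_fraction(fn, *args, **kwargs):
    """Return the share of wall-clock time that fn spent on the CPU."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0


def io_like():
    # Stand-in for waiting on a network call or a disk read.
    time.sleep(0.2)


def cpu_like():
    # Stand-in for a computation such as image resizing: spin for 0.2 seconds.
    deadline = time.perf_counter() + 0.2
    total = 0
    while time.perf_counter() < deadline:
        total += 1
    return total
```

A value close to 0 suggests the call mostly waits (I/O-bound), while a value close to 1 suggests it mostly computes (CPU-bound).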
- Async
- Multiple language worker processes
- Max workers within a language worker process
- Event loop
- Vertical scaling
Async
Because Python is a single-threaded runtime, a host instance for Python can process only one function invocation at a time by default. For applications that process a large number of I/O events or that are I/O bound, you can improve performance significantly by running functions asynchronously.
To run a function asynchronously, use the async def statement, which runs the function with asyncio directly:

    async def main():
        await some_nonblocking_socket_io_op()
Here is an example of a function with an HTTP trigger that uses the aiohttp HTTP client:

    import aiohttp

    import azure.functions as func


    async def main(req: func.HttpRequest) -> func.HttpResponse:
        async with aiohttp.ClientSession() as client:
            async with client.get("PUT_YOUR_URL_HERE") as response:
                return func.HttpResponse(await response.text())

        return func.HttpResponse(body='NotFound', status_code=404)
A function without the async keyword is run automatically in a ThreadPoolExecutor thread pool:

    # Runs in a ThreadPoolExecutor thread pool. The number of threads is
    # defined by PYTHON_THREADPOOL_THREAD_COUNT.
    # The example is intended to show how default synchronous functions are handled.
    def main():
        some_blocking_socket_io()
To achieve the full benefit of running functions asynchronously, the I/O operation or library that your code uses needs to implement async as well. Using synchronous I/O operations in functions that are defined as asynchronous may hurt overall performance. If the libraries you are using don't have an async version, you may still benefit from running your code asynchronously by managing the event loop in your app.
Here are a few examples of client libraries that have implemented the async pattern:
- aiohttp - HTTP client/server for asyncio
- Streams API - High-level async/await-ready primitives to work with network connection
- Janus Queue - Thread-safe asyncio-aware queue for Python
- pyzmq - Python bindings for ZeroMQ
Understanding async in the Python worker
When you define async in front of a function signature, Python marks the function as a coroutine. When the coroutine is called, it can be scheduled as a task into an event loop. When you call await in an async function, it registers a continuation into the event loop, which allows the event loop to process the next task during the wait time.
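The hand-off at each await can be observed with plain asyncio, outside of Azure Functions. In this minimal sketch, `worker` and `interleaving_demo` are hypothetical names, and `asyncio.sleep(0)` stands in for any awaited operation that gives control back to the event loop.

```python
import asyncio


async def worker(name, log, steps=2):
    for step in range(steps):
        log.append(f"{name}:{step}")
        # Awaiting suspends this coroutine and lets the event loop run the next task.
        await asyncio.sleep(0)


async def run_workers():
    log = []
    # Both coroutines are scheduled as tasks on the same event loop.
    await asyncio.gather(worker("a", log), worker("b", log))
    return log


def interleaving_demo():
    return asyncio.run(run_workers())
```

Calling `interleaving_demo()` returns `['a:0', 'b:0', 'a:1', 'b:1']`: the two coroutines take turns on a single thread because each await yields to the loop.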
In the Python worker, the worker shares the event loop with your async function, so it can handle multiple requests concurrently. We strongly encourage you to use asyncio-compatible libraries (for example, aiohttp and pyzmq). Following this recommendation can greatly increase your function's throughput compared to using libraries that are implemented synchronously.
If your function is declared as async without any await inside its implementation, the performance of your function will be severely impacted, because the event loop is blocked and the Python worker can't handle concurrent requests.
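The cost of a blocked event loop can be measured with a small standalone experiment. This is a sketch under stated assumptions: `blocking_handler`, `cooperative_handler`, `run_concurrently`, and `compare` are hypothetical names, and `time.sleep` and `asyncio.sleep` stand in for synchronous and asynchronous I/O respectively.

```python
import asyncio
import time


async def blocking_handler(delay):
    # Declared async, but time.sleep never yields to the event loop.
    time.sleep(delay)


async def cooperative_handler(delay):
    # asyncio.sleep suspends this coroutine so other handlers can run.
    await asyncio.sleep(delay)


async def run_concurrently(handler, count, delay):
    """Start `count` copies of handler together and return the elapsed seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(handler(delay) for _ in range(count)))
    return time.perf_counter() - start


def compare(count=5, delay=0.1):
    blocked = asyncio.run(run_concurrently(blocking_handler, count, delay))
    cooperative = asyncio.run(run_concurrently(cooperative_handler, count, delay))
    return blocked, cooperative
```

With the defaults, the blocking variant takes roughly `count * delay` seconds because the handlers run one after another, while the cooperative variant finishes in roughly one `delay`.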
Use multiple language worker processes
By default, every Functions host instance has a single language worker process. You can increase the number of worker processes per host (up to 10) by using the FUNCTIONS_WORKER_PROCESS_COUNT application setting. Azure Functions then tries to evenly distribute simultaneous function invocations across these workers.
For CPU-bound apps, you should set the number of language workers to be the same as or higher than the number of cores that are available per function app. To learn more, see Available instance SKUs.
I/O-bound apps may also benefit from increasing the number of worker processes beyond the number of cores available. Keep in mind that setting the number of workers too high can impact overall performance due to the increased number of required context switches.
The FUNCTIONS_WORKER_PROCESS_COUNT setting applies to each host that Functions creates when scaling out your application to meet demand.
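The guidance above can be turned into a starting value for experiments. The helper below is a hypothetical sketch, not an Azure API: `suggest_worker_process_count` and its `io_multiplier` of 2 are assumptions made for illustration, and only the upper limit of 10 and the default of one worker come from the text. Application settings are exposed to the function code as environment variables, which is what `configured_worker_process_count` reads.

```python
import os

MAX_WORKER_PROCESSES = 10  # upper limit for FUNCTIONS_WORKER_PROCESS_COUNT


def suggest_worker_process_count(cpu_cores, io_bound=False, io_multiplier=2):
    """Hypothetical starting point for FUNCTIONS_WORKER_PROCESS_COUNT.

    CPU-bound apps: one worker per core.
    I/O-bound apps: cores * io_multiplier (the multiplier is an assumption).
    The result is always clamped to the range 1..10.
    """
    if cpu_cores < 1:
        raise ValueError("cpu_cores must be at least 1")
    target = cpu_cores * io_multiplier if io_bound else cpu_cores
    return max(1, min(MAX_WORKER_PROCESSES, target))


def configured_worker_process_count(environ=os.environ):
    """Read the current setting, falling back to the default of one worker."""
    return int(environ.get("FUNCTIONS_WORKER_PROCESS_COUNT", "1"))
```

For example, `suggest_worker_process_count(4)` gives 4 and `suggest_worker_process_count(8, io_bound=True)` gives 10. Treat the result as a first guess to validate with load tests, since too many workers adds context-switching overhead.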
Set up max workers within a language worker process
As mentioned in the async section, the Python language worker treats functions and coroutines differently. A coroutine runs within the same event loop that the language worker runs on. A synchronous function invocation, on the other hand, runs as a thread within a ThreadPoolExecutor that the language worker maintains.
You can set the maximum number of workers allowed for running sync functions by using the PYTHON_THREADPOOL_THREAD_COUNT application setting. This value sets the max_workers argument of the ThreadPoolExecutor object, which lets Python use a pool of at most max_workers threads to execute calls asynchronously. The PYTHON_THREADPOOL_THREAD_COUNT setting applies to each worker that the Functions host creates, and Python decides when to create a new thread or reuse an existing idle thread. For older Python versions (that is,
max_workers value is set to 1. For Python version
max_workers is set to
For CPU-bound apps, you should keep the setting at a low number, starting from 1 and increasing as you experiment with your workload. This suggestion reduces the time spent on context switches and allows CPU-bound tasks to finish.
For I/O-bound apps, you should see substantial gains by increasing the number of threads working on each invocation. The recommendation is to start with the Python default (the number of cores + 4) and then adjust based on the throughput values you see.
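The effect of the thread count on blocking I/O can be reproduced with a standalone ThreadPoolExecutor. This sketch does not show the language worker's own code: `blocking_io` and `run_batch` are hypothetical names, and `time.sleep` stands in for a synchronous network or disk call.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_io(delay):
    # Stand-in for a synchronous network or disk call.
    time.sleep(delay)
    return delay


def run_batch(max_workers, calls=4, delay=0.1):
    """Run `calls` blocking calls on `max_workers` threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces all results to be collected before the timer stops.
        list(pool.map(blocking_io, [delay] * calls))
    return time.perf_counter() - start
```

With one thread, the four calls run back to back (about 0.4 seconds); with four threads, they overlap (about 0.1 seconds). CPU-bound calls would not show the same gain, which is why the advice for them is to start low.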
For mixed-workload apps, you should balance both the FUNCTIONS_WORKER_PROCESS_COUNT and PYTHON_THREADPOOL_THREAD_COUNT configurations to maximize throughput. To understand what your function apps spend the most time on, profile them and set the values according to the behavior they present. Also see the section on using multiple language worker processes to learn about the FUNCTIONS_WORKER_PROCESS_COUNT application setting.
Although these recommendations apply to both HTTP and non-HTTP triggered functions, you might need to adjust other trigger specific configurations for non-HTTP triggered functions to get the expected performance from your function apps. For more information about this, please refer to this article.
Managing event loop
You should use asyncio-compatible third-party libraries. If none of the third-party libraries meet your needs, you can also manage the event loops in Azure Functions. Managing event loops gives you more flexibility in compute resource management, and it also makes it possible to wrap synchronous I/O libraries into coroutines.
Take the requests library as an example. The following code snippet uses the asyncio library to wrap the requests.get() method into a coroutine, running multiple web requests to SAMPLE_URL concurrently.

    import asyncio
    import json
    import logging

    import azure.functions as func
    from requests import get, Response


    async def invoke_get_request(eventloop: asyncio.AbstractEventLoop) -> Response:
        # Wrap the blocking requests.get call into a coroutine
        single_result = await eventloop.run_in_executor(
            None,  # use the default executor
            get,  # the blocking function to run, here requests.get
            'SAMPLE_URL'  # the URL passed to requests.get
        )
        return single_result


    async def main(req: func.HttpRequest) -> func.HttpResponse:
        logging.info('Python HTTP trigger function processed a request.')

        eventloop = asyncio.get_event_loop()

        # Create 10 tasks, each wrapping one synchronous requests.get call
        tasks = [
            asyncio.create_task(
                invoke_get_request(eventloop)
            ) for _ in range(10)
        ]

        done_tasks, _ = await asyncio.wait(tasks)
        status_codes = [d.result().status_code for d in done_tasks]

        return func.HttpResponse(body=json.dumps(status_codes),
                                 mimetype='application/json')
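The same wrapping technique can be tried without network access or the Functions runtime. In this minimal sketch, `blocking_fetch` is a hypothetical stand-in for requests.get that sleeps and returns a fixed status code, the example.invalid URLs are placeholders, and `asyncio.gather` is swapped in for `asyncio.wait` so that results come back in request order.

```python
import asyncio
import time


def blocking_fetch(url):
    # Hypothetical stand-in for requests.get(url): blocks, then returns a status code.
    time.sleep(0.1)
    return 200


async def fetch_all(urls):
    loop = asyncio.get_running_loop()
    # Each blocking call runs on a thread of the loop's default executor.
    futures = [loop.run_in_executor(None, blocking_fetch, url) for url in urls]
    # gather awaits all futures and preserves the order of `urls`.
    return await asyncio.gather(*futures)


def main_demo():
    urls = [f"https://example.invalid/{i}" for i in range(3)]
    return asyncio.run(fetch_all(urls))
```

The blocking calls overlap on executor threads while the event loop stays free to serve other coroutines, which is the benefit that wrapping a synchronous library is meant to provide.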
Vertical scaling
If you need more processing units, especially for CPU-bound operations, you might be able to get them by upgrading to a Premium plan with higher specifications. With more processing units, you can adjust the worker process count according to the number of cores available and achieve a higher degree of parallelism.
For more information about Azure Functions Python development, see the following resources: