Hi,
We are building a Python API to read data from Cosmos DB. We expect high concurrency, roughly 500 to 1000 concurrent requests.
The API is built on Azure Functions and hosted on a Premium V2 (P1V2) App Service plan.
When we tested at low concurrency, the response times were consistent and within our expectations. However, once we started load testing, the response times gradually increased, ranging from 2 s all the way up to 60 s.
CPU and memory usage barely changed, and even the HTTP queue length stayed low (<10). Our understanding is that the requests were somehow being queued and eventually served, but with a delay. We use a singleton Cosmos client and have set the app configuration parameters that allow maximum concurrency for Python apps.
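A thread-safe singleton of the kind described above could look like the following minimal sketch. It assumes the Python worker may handle invocations on several threads (for example when `PYTHON_THREADPOOL_THREAD_COUNT` is raised), so creation is guarded by a lock. `LazySingleton` and `make_client` are hypothetical names; `make_client` only stands in for constructing the real Cosmos client.

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class LazySingleton(Generic[T]):
    """Create an object once per worker process and reuse it for every invocation."""

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Fast path: the instance already exists, no locking needed.
        if self._instance is None:
            # Double-checked locking: only one thread runs the factory.
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance


def make_client() -> object:
    # Hypothetical factory. In a real function app this would build the SDK
    # client from app settings, e.g. azure.cosmos.CosmosClient(url, credential=key).
    return object()


# Module level, so the same client survives across function invocations.
cosmos_client = LazySingleton(make_client)
```

Each invocation would then call `cosmos_client.get()` instead of creating a new client, so connections are reused rather than reopened per request.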
My questions are as follows:
1) Which metric should we use to scale the App Service out automatically? CPU, memory and HTTP queue length all stayed low. The only metric that showed a significant jump was the socket count for inbound requests. Is this the metric we should use to autoscale our App Service?
2) I also read that we should follow the asynchronous request-reply pattern. However, I found an existing issue stating that the Cosmos SDK for Python does not support async methods (https://github.com/Azure/azure-sdk-for-python/issues/8636). Is this true? If so, what alternative should we follow for our use case? Is there another way to read from Cosmos DB asynchronously?
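When only a synchronous client is available, one commonly used workaround is to keep the function `async` and push each blocking call onto a thread pool with `asyncio.to_thread` (Python 3.9+), so that blocking I/O does not stall the event loop. The minimal sketch below assumes that approach; `blocking_read_item`, `read_item_async` and `read_many` are hypothetical names, and `blocking_read_item` merely sleeps to simulate a synchronous Cosmos read rather than calling any SDK API.

```python
import asyncio
import time
from typing import Dict, List


def blocking_read_item(item_id: str) -> Dict[str, str]:
    """Hypothetical stand-in for a synchronous SDK call such as a point read."""
    time.sleep(0.2)  # simulate network latency of a blocking call
    return {"id": item_id}


async def read_item_async(item_id: str) -> Dict[str, str]:
    # Run the blocking call in the default thread pool so the event loop stays free.
    return await asyncio.to_thread(blocking_read_item, item_id)


async def read_many(item_ids: List[str]) -> List[Dict[str, str]]:
    # Start all reads concurrently; gather returns results in input order.
    return list(await asyncio.gather(*(read_item_async(i) for i in item_ids)))
```

For example, `asyncio.run(read_many(["a", "b"]))` returns `[{"id": "a"}, {"id": "b"}]` in roughly the time of one read instead of two. This does not make the SDK itself asynchronous; it only moves blocking calls off the event loop, and concurrency stays bounded by the thread pool size.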
Any help would be appreciated.
Regards,
Anupam