Question

MarkBarge-7045 asked:

Durable Function CPU resources

Hi

I'm new to Durable Functions and have set up a Consumption (serverless) plan in which I've deployed a durable function.

The function is used to calculate a large set of matrices and usually takes between 1 and 5 minutes to run. The output is written to a blob in a storage account which is polled by the client app to retrieve updates and results.

It all works fine but I'm worried about scaling.

I notice that each time I run it, it provides access to two processors. However, if I run it twice from the same client, each run only seems to get one processor and runs at half the speed.

Note I've tried fanning out but that's ridiculously slow and totally useless to me as the data transfer sizes are large.


My questions are (I'm happy to pay more if needed):

- What will happen as it scales up to 100 simultaneous runs?
- Is there a more useful plan that guarantees a minimum of 2 processors per execution?
- Can I pay more and get 4, 8, or even 16 processors for EACH single execution?
- If it's really successful it may go to 500 or even 1000 simultaneous runs... can Azure cope with this?

I've thought about setting up 100 identical functions (obviously with different names) and calling them in turn for each client... would that work? If so, it seems odd that it's needed.

Many thanks

Mark Barge




Tags: dotnet-csharp, azure-functions

1 Answer

PramodValavala-MSFT answered:

@MarkBarge-7045 The Performance and Scale doc covers many of the questions you have and would be an informative read to understand more about how Durable Functions work.

Firstly, to get more compute while still being elastic, you could simply upgrade to the Premium Plan, which offers instances with more cores. Do note that there is a limit on how far you can scale out per region; you can open a support request to increase these limits.

Next, considering that your requirement is compute intensive, it would be best to control concurrency to ensure only a few activities run at the same time. There are Concurrency Throttles that you can set up at the host level (per instance).
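For reference, those throttles live in the function app's host.json under the Durable Functions extension settings; a minimal sketch (the values here are illustrative, tune them to your workload):

```json
{
  "version": "2.0",
  "extensions": {
    "durableTask": {
      "maxConcurrentActivityFunctions": 1,
      "maxConcurrentOrchestratorFunctions": 4
    }
  }
}
```

Because these limits are per instance, setting `maxConcurrentActivityFunctions` to 1 means each scaled-out instance runs only one CPU-bound activity at a time, which is what prevents runs from sharing cores.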

With the above, you should be able to achieve optimum throughput and still ensure all queued requests are processed.

Another enhancement would be to use the Event Grid Trigger as an alternative to polling blob storage.



Hi Pramod

Accessing GPUs would be game-changing here. The calculations involved are large numbers of Monte Carlo simulations, so the more processors I can bring to the table, the faster it will run. Currently a large financial projection can involve hundreds of thousands of multi-dimensional matrix calculations, which can take many minutes to complete.

I will certainly explore your suggestion on GPUs in Container Instances.

Thanks!


Thanks for your comments Pramod

I have investigated the Premium Plan but run into the same problem again: I'm trying to avoid sharing CPU resources between multiple calls, so I need each call to be allotted its own CPU resources. Regarding your comment on concurrency throttles, I tried setting maxConcurrentActivityFunctions to 1 and that seems to have improved the outcome, but on the Consumption plan each successive compute instance needs to spin up first, which seems to take about 20 seconds. I know I can avoid this with the Premium Plan (at a cost, which is fine if unavoidable). However, I think I can solve the spin-up problem by having each successive client call the durable function on a pre-run basis (call the function but deliberately stop it) while the client is preparing the parameters for the full run. That should give it time to spin up.

As regards polling, the application is a desktop one (WPF) and I'm really nervous about a client-side app having access to Storage connection details for polling. That means I can't use a blob storage approach after all (so the Event Grid Trigger is out). Instead I set up a separate VM that stores the computation progress data (sent to it by the durable function via a REST API) in a database, which the app then polls through another API (much more secure).
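That client-side progress check reduces to a simple poll loop against the intermediate API. A sketch of the pattern (in Python rather than the WPF app's C#, and the `state` field name and `"Completed"` value are hypothetical; adapt them to whatever the real progress API returns):

```python
import time

def poll_until_done(fetch_status, interval_s=2.0, timeout_s=600.0):
    """Poll a status-returning callable until it reports completion.

    fetch_status is any zero-argument callable returning a dict such as
    {"state": "Running"} or {"state": "Completed", "result": ...}
    (shape is illustrative, not a real Azure API).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("state") == "Completed":
            return status
        time.sleep(interval_s)
    raise TimeoutError("computation did not complete within the timeout")
```

In the desktop app, `fetch_status` would be the HTTP call to the VM's query API, so the client never needs the storage account's connection details.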

Finally I'm confident that your advice about multiple regions can be used to solve the high volume concurrency problems.

Thanks for your help.

Mark


@MarkBarge-7045 Glad I could help! That is definitely a workable solution for the spin-up delay, since cold starts are by design in serverless offerings.

While Durable Functions is the best option here with minimal engineering effort required, another approach you can explore is to offload the heavy processing to Azure Container Instances.

You could still use Durable Functions to orchestrate the entire workflow and just start a Container Instance on the fly for the heavy processing. You can leverage the corresponding SDK based on your language of choice (which are essentially wrappers over the REST API) to manage container instances as required.

Note that this service has its own set of limits as well but considering your use case, you might be able to leverage GPUs in Container Instances to cut down on time taken for processing.

PS: I haven't really done any GPGPU based programming before, so I might be wrong about using them here but just a thought. :)

