Azure Durable Functions

Mohamed Ramzy 21 Reputation points
2021-08-20T13:23:08.607+00:00

I have a function app that contains four durable functions (triggered by HTTP). Each of those starts a new orchestration through the orchestrator client. The activity functions are shared and are called by different orchestrations. Based on the parameter value passed in, an activity function executes the corresponding section of code and returns the value back to the orchestrator function. The activity functions contain database operations, external API calls, and business logic. This setup runs fine, but as the load increases I am getting weird results, like duplicate entries in the database, picking up wrong data, activity function timeouts, etc.

Is this setup wrong? Can multiple orchestrations call a single activity function? Should activity functions be broken up so that each performs a single task?

Azure Functions

Accepted answer
  1. Samara Soucy - MSFT 5,051 Reputation points
    2021-08-25T21:03:14.43+00:00

    You can increase your timeout to a maximum of 10 minutes (on consumption billing) by going into host.json and adding "functionTimeout": "00:10:00" to your settings. If taking more than 5 minutes is the expected behavior, this may be enough. If not, you'll probably need to dig into the MongoDB logs to see why this is happening. Timeouts on DB queries are usually because the database is struggling to keep up, as opposed to network latency, though that is possible.
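
    For reference, a minimal host.json with that setting (assuming the v2+ schema) would look something like this:

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
```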

    As far as design goes, I don't think it is wrong as long as the API and DB calls are being separated out. I can see an argument that it would be better to put each item in the switch statement into a separate activity, especially if there is a 1 -> 1 mapping between orchestrator function and activity type, but that adds complexity to the code that you just may not need. The main reason I might push to break them out is for troubleshooting/logging purposes, so as long as you can still troubleshoot effectively you should be fine.
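
    As an illustration only, a hypothetical TypeScript orchestrator (using the durable-functions package) that calls one activity per step instead of one generic activity with an "operation" parameter might look like this; the activity names and input shape are placeholders:

```typescript
import * as df from "durable-functions";

// Hypothetical orchestrator: each concern is its own activity function, so
// failures, retries, and logs are attributable to a specific step instead of
// one large activity that switches on a parameter.
const orchestrator = df.orchestrator(function* (context) {
    const input = context.df.getInput();

    // Each call below targets a separate activity function with its own bindings.
    const record = yield context.df.callActivity("LoadRecord", input.recordId);
    const enriched = yield context.df.callActivity("CallExternalApi", record);
    const saved = yield context.df.callActivity("SaveToDatabase", enriched);

    return saved;
});

export default orchestrator;
```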

    Race conditions could definitely cause some weird data issues and would get worse as volume increases. By default, functions don't know anything about each other: the first copy of a given activity function won't know what data is in a second copy running at the same time. Queuing is often the right solution for this, but it might be difficult to implement in practice. Assuming the race condition is limited to the DB call, MongoDB might have built-in capabilities that can help you here; I've not worked with that DB engine enough to know.

    If it needs to be at the function level, based on what you've shared so far, there are a few things that might help, though I don't think any of them would be ideal. The first would be to simply write the data to a queue, like Service Bus or Storage queues, instead of the DB, and have another function that picks up from there to do the DB writes. This means that the DB write is completely disconnected from the initial HTTP call.
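
    A rough sketch of that first option, assuming a Storage queue named "db-writes" and the default AzureWebJobsStorage connection (both placeholders), with a separate queue-triggered function doing the actual database writes:

```typescript
import { AzureFunction, Context } from "@azure/functions";
import { QueueClient } from "@azure/storage-queue";

// Hypothetical activity: hand the payload to a Storage queue instead of writing
// to the database here. A separate queue-triggered function drains the queue and
// performs the writes, decoupling them from the original HTTP call.
const enqueueDbWrite: AzureFunction = async function (context: Context): Promise<void> {
    const payload = context.bindings.payload; // activityTrigger input; binding name is an assumption

    const queueClient = new QueueClient(
        process.env.AzureWebJobsStorage as string, // assumes the default storage connection
        "db-writes"                                // placeholder queue name
    );
    await queueClient.createIfNotExists();

    // Queue-triggered functions decode base64 messages by default, so encode here.
    await queueClient.sendMessage(Buffer.from(JSON.stringify(payload)).toString("base64"));
};

export default enqueueDbWrite;
```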

    Second, you can ensure orchestration instances within a specific app have a unique id. So a request comes into Function 1 with a record that has an id of "foo". Function 1 starts up an orchestration with "foo" as the id. While that is still running, Function 2 gets a record with the id of "foo" and wants to start up a different orchestration function with that document. As long as all of this is within a single function app, Function 2 can check whether an orchestration with the id "foo" is currently active, and you can handle the case where starting the second orchestration fails because the id is not unique. It is important to know that this has its own race condition: if Function 1 and Function 2 are called at the same time, they might both report that they successfully started their own orchestration when only one of them actually succeeded. How this works is covered in the documentation: https://learn.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-singletons?tabs=javascript
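
    A TypeScript sketch of that pattern, loosely following the singleton sample in the linked docs; the orchestrator name and the assumption that the record id arrives in the request body are placeholders:

```typescript
import * as df from "durable-functions";
import { AzureFunction, Context, HttpRequest } from "@azure/functions";

// Hypothetical HTTP starter that uses the record id (e.g. "foo") as the
// orchestration instance id, so a second request for the same record is
// rejected while the first orchestration is still running.
const httpStart: AzureFunction = async function (context: Context, req: HttpRequest) {
    const client = df.getClient(context);
    const instanceId = req.body.recordId; // field name is an assumption

    const existing = await client.getStatus(instanceId);
    const stopped =
        !existing ||
        existing.runtimeStatus === df.OrchestrationRuntimeStatus.Completed ||
        existing.runtimeStatus === df.OrchestrationRuntimeStatus.Failed ||
        existing.runtimeStatus === df.OrchestrationRuntimeStatus.Terminated;

    if (stopped) {
        // No active instance for this record id, so start a new orchestration.
        await client.startNew("ProcessRecordOrchestrator", instanceId, req.body);
        return client.createCheckStatusResponse(context.bindingData.req, instanceId);
    }

    // An orchestration for this record id is already running.
    return { status: 409, body: `An instance with ID '${instanceId}' already exists.` };
};

export default httpStart;
```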

    A third option might be to write the record ID to a storage table when a function first starts up then delete it when it finishes. The next function can then check that table to see if the id of its record is already in the table and wait if it is. It's still an issue of reducing the possibility of a race condition rather than removing it completely, but it might be good enough for your purposes. The main issue I can see with this is forcing Function 2 to wait for Function 1 to complete when you are already having timeout issues.
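
    If you wanted to try that third option, a minimal sketch with the @azure/data-tables package might look like the following; the table name, partition key, and connection setting are all placeholders, and the table is assumed to already exist:

```typescript
import { TableClient } from "@azure/data-tables";

// Hypothetical "record lock": insert the record id before work starts and delete
// it afterwards. A concurrent caller gets a 409 on insert and can wait and retry.
// This narrows the race window; it does not eliminate it.
const tableClient = TableClient.fromConnectionString(
    process.env.AzureWebJobsStorage as string, // assumes the default storage connection
    "recordlocks"                              // placeholder table name
);

export async function tryAcquireLock(recordId: string): Promise<boolean> {
    try {
        await tableClient.createEntity({ partitionKey: "lock", rowKey: recordId });
        return true; // this caller registered the record id first
    } catch (err: any) {
        if (err?.statusCode === 409) {
            return false; // another function is already processing this record id
        }
        throw err;
    }
}

export async function releaseLock(recordId: string): Promise<void> {
    await tableClient.deleteEntity("lock", recordId);
}
```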

    1 person found this answer helpful.

1 additional answer

  1. haitham dheyaa 1 Reputation point
    2022-05-04T14:31:51.517+00:00

    OK, this is not making a lot of sense to me. My Azure durable function takes between 2 and 4 minutes to run. I'm not aware of any queue. I call it using a web API call and it starts running almost instantly. The start-up process calls the orchestration, which in turn calls a single activity function that does all the work. Where is the queue?
    Are you saying that while it's running it still counts as one item in the queue? Otherwise the queue would almost always be empty.
    So let's say I call the function twice in a short period of time, so that on both calls there has been no scaling out. While both are running, the scale controller hits its 30-second point, realizes there are two items running (i.e. in the queue), and so starts spinning up another VM. Is this correct?

    I currently use a USA company called UltaHost to provide VPSs with up to 16 cores each (32 threads). They are much cheaper than VPSs on Azure. My alternative to Azure is to simply build an array of multiple VPSs (each containing the code to run this process) and simply queue the calls to ensure I am guaranteed two threads per call. So each machine can handle 16 simultaneous computations.

    Will any configuration of Azure Durable Functions replicate this and avoid the need to purchase multiple VPSs? Clearly the Consumption Plan does not. But would the Premium plan solve the problem? If so why?
