question

MohamedRamzy-7269 asked haithamdheyaa-7428 answered

Azure Durable Functions

I have a function app that contains four durable functions (triggered via HTTP). Each of them starts a new orchestration through an orchestrator client. The activity functions are shared and are called by the different orchestrations. Based on the parameter value passed in, an activity function executes the corresponding section of code and returns a value to the orchestrator function. The activity function performs database operations, external API calls, and business logic. This setup runs fine, but as the load increases I am getting strange results: duplicate entries in the database, wrong data being picked up, activity function timeouts, and so on.

Is this setup wrong? Can multiple orchestrations call a single activity function? Should activity functions be broken up so that each performs a single task?

azure-functions

Theoretically your orchestrations should be able to call the same activity, as long as the activity functions are stateless, and it sounds like they are. I'm curious about the timeouts in particular: are they timing out early, or after the configured max runtime (the default is 5 minutes on the Consumption plan and 30 minutes on other plans if you haven't changed it)?

The reason I think the timeouts may be the root cause is how Durable Functions works. As an example of how you could see duplicates if you have both DB calls and API calls in the same activity function:

  1. The orchestration function calls the activity function.
  2. The activity function writes data to the DB.
  3. The API call times out, so the function fails.
  4. Some form of upstream logic causes a retry on the same data.
  5. The orchestration calls the activity again, which then writes the same data to the DB again.
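One way to make that retry scenario harmless is to make the DB write idempotent: key it on a stable identifier and overwrite instead of inserting. Here is a minimal sketch in plain Node, where an in-memory `Map` stands in for the real database and the `recordId` field and `saveTicket` helper are made-up names for illustration:

```javascript
// Sketch of an idempotent "write" that is safe to retry.
// A Map stands in for the real database; with MongoDB you would get
// the same effect with updateOne(filter, update, { upsert: true }).
const store = new Map();

function saveTicket(record) {
  // Key the write on a stable identifier so a retried activity
  // overwrites the same row instead of inserting a duplicate.
  store.set(record.recordId, record);
  return store.size;
}

// First attempt succeeds, then an upstream retry replays the same data.
saveTicket({ recordId: "T-1001", status: "open" });
saveTicket({ recordId: "T-1001", status: "open" }); // retry: no duplicate
```

With this shape, step 5 in the list above simply rewrites the same row instead of adding a second one.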

There are two recommendations for writing functions that help prevent this. The first is to have each activity function perform a single task and use the orchestration function to pass data between them. You wouldn't necessarily need a separate function for every API call, though that's not a bad thing to do, but mixing DB calls and API calls in the same activity isn't recommended. The second is to make sure that if a function, orchestrator or activity, is rerun with the same data, it won't cause a duplicate entry in the DB.
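As a sketch of splitting API and DB work into single-purpose activities, an orchestrator might look like the generator below. The activity names `CallTicketApi` and `SaveTicket` are invented for illustration; in a real app this generator would be wrapped with `df.orchestrator(...)` from the `durable-functions` package:

```javascript
// Sketch: an orchestrator that keeps the API call and the DB write in
// separate single-purpose activities and passes data between them.
// In a real function app: module.exports = df.orchestrator(orchestrate);
function* orchestrate(context) {
  const input = context.df.getInput();

  // Activity 1: external API call only.
  const apiResult = yield context.df.callActivity("CallTicketApi", input);

  // Activity 2: DB write only, fed by the API result.
  const saved = yield context.df.callActivity("SaveTicket", apiResult);

  return saved;
}
```

Because the orchestrator is a plain generator, you can also drive it by hand with a stubbed `context.df` to unit-test the control flow without the Durable runtime.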

I am unsure about what could be causing the incorrect data issue. Could you provide an example of what's happening there?


Thank you for your reply.

Yes, the timeout is happening after 5 minutes.

I have used two separate activity functions to handle the DB and API calls.
My activity function is promise-based.
It goes as follows (I just want to make sure this approach is acceptable):

```javascript
const soap = require("soap");
const request = require("request");
const db = require("../helper/retryQueue");

module.exports = async function (context) {
  return new Promise((resolve, reject) => {
    var failData = context.bindings.name.retryData;

    switch (context.bindings.name.type) {
      case "TICKETDETAILS":
        // ...
```



It uses the incoming parameter to decide which part to execute, and resolves with a value or rejects in case of any failure.
Orchestrator functions call the same activity function with different types several times and yield the returned data to proceed.
In addition, I should mention that the durable functions can receive multiple HTTP triggers with the same data, for example for consecutive updates, and each trigger will start its own orchestration. Can this cause issues? There could be a race condition. Is there any way to make them process one after the other based on a key?

I have noticed the timeout occurs when it performs the save action to the database. Our DB is MongoDB Atlas, hosted outside Azure, and connected via VNet.

What I meant by incorrect data is that there were cases where a value from one dataset appeared in a different dataset.

Appreciate your advice.

SamaraSoucy-MSFT answered MohamedRamzy-7269 commented

You can increase your timeout to a max of 10 minutes (on Consumption billing) by going into the host.json and adding "functionTimeout": "00:10:00" to your settings. If taking more than 5 minutes is the expected behavior, this may be enough. If not, you'll probably need to dig into the Mongo logs to see why this is happening. Timeouts on DB queries are usually because the database is struggling to keep up rather than network latency, though latency is possible.
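For reference, the host.json change would look something like this (a sketch; on the Consumption plan the value cannot exceed 00:10:00):

```json
{
  "version": "2.0",
  "functionTimeout": "00:10:00"
}
```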

As far as design goes, I don't think it is wrong as long as the API and DB calls are separated out. I can see an argument that it would be better to put each item in the switch statement into a separate activity, especially if there is a 1-to-1 mapping between orchestrator function and activity type, but that adds complexity to the code that you may not need. The main reason I might push to break them out is for troubleshooting/logging purposes, so as long as you can troubleshoot effectively you should be fine.

Race conditions could definitely cause some weird data issues and would get worse as volume increases. By default, functions don't know anything about each other: the first copy of a given activity function won't know what data is in a second copy running at the same time. Queuing is often the right solution for this, but it might be difficult to implement in practice. Assuming the race condition is limited to the DB call, Mongo might have built-in capabilities that can help you here; I've not worked with that DB engine enough to know.

If it needs to be at the function level, based on what you've shared so far, there are a few things that might help, though none of them is ideal. The first would be to simply write the data to a queue, like Service Bus or Storage queues, instead of the DB, and have another function pick it up from there to do the DB writes. That way the DB write is completely disconnected from the initial HTTP call.
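As a sketch of that first option, the activity could drop the record onto a Storage queue via an output binding instead of calling Mongo directly, and a separate queue-triggered function would perform the actual write. The binding name `outputQueue` and the `recordId` field are assumptions; the binding name must match what is declared in function.json:

```javascript
// Sketch: instead of writing to the DB inside the activity, push the
// record to a Storage queue via an output binding. A separate
// queue-triggered function then performs the Mongo write at its own
// pace, decoupled from the HTTP-triggered orchestration.
async function enqueueWrite(context, record) {
  // "outputQueue" must match the binding name in function.json.
  context.bindings.outputQueue = JSON.stringify(record);
  return { enqueued: true, recordId: record.recordId };
}

module.exports = enqueueWrite;
```

Because the function only touches `context.bindings`, it can be exercised with a plain object in tests, with no queue or database in sight.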

Second, you can ensure that orchestration instances within a specific app have a unique ID. Say a request comes into Function 1 with a record whose ID is "foo". Function 1 starts an orchestration with "foo" as the instance ID. While that is still running, Function 2 gets a record with the ID "foo" and wants to start a different orchestration with that document. As long as all of this is within a single function app, Function 2 can check whether an orchestration with the ID "foo" is currently active, and you can handle the case where starting the second orchestration fails because the ID is not unique. It is important to know that this has its own race condition: if Functions 1 and 2 are called at the same time, they might both report that they successfully started their own orchestration when only one of them actually succeeded. How this works is covered in the documentation: https://docs.microsoft.com/en-us/azure/azure-functions/durable/durable-functions-singletons?tabs=javascript
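A rough sketch of that singleton check, in the spirit of the linked docs: derive the instance ID from the record's key, ask for the status of that instance, and only start a new orchestration if none is active. The `client` object is passed in so the logic can be shown standalone; in a real HTTP starter it would come from `df.getClient(context)` (v2-style `getStatus`/`startNew`), and `makeInstanceId` is a hypothetical helper:

```javascript
// Sketch of the singleton pattern: one orchestration per record key.
// `client` mimics the durable-functions orchestration client; in a
// real starter you would obtain it with df.getClient(context).
function makeInstanceId(record) {
  return `ticket-${record.recordId}`; // stable ID derived from the key
}

async function startSingleton(client, orchestratorName, record) {
  const instanceId = makeInstanceId(record);
  const existing = await client.getStatus(instanceId);
  const active =
    existing &&
    (existing.runtimeStatus === "Running" ||
      existing.runtimeStatus === "Pending");
  if (active) {
    // An orchestration for this key is already active: skip (or hand
    // the update to it) instead of starting a duplicate.
    return { started: false, instanceId };
  }
  await client.startNew(orchestratorName, instanceId, record);
  return { started: true, instanceId };
}
```

Note this check-then-start is still not atomic, which is exactly the residual race the answer above warns about.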

A third option would be to write the record ID to a storage table when a function first starts and delete it when it finishes. The next function can then check that table to see if the ID of its record is already there and wait if it is. This still only reduces the possibility of a race condition rather than removing it completely, but it might be good enough for your purposes. The main issue I can see is that it forces Function 2 to wait for Function 1 to complete when you are already having timeout issues.
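That lock-table idea can be sketched in a few lines; here an in-memory `Map` stands in for the Azure Storage table, and all names are illustrative. The check-then-insert is deliberately left non-atomic, since that is the residual race this option cannot fully remove:

```javascript
// Sketch of the lock-table idea: register the record ID when work
// starts, delete it when done, and make later callers poll and wait.
// A Map stands in for the Storage table.
const lockTable = new Map();

async function withRecordLock(recordId, work) {
  // Poll until no other function holds this record's ID.
  while (lockTable.has(recordId)) {
    await new Promise((r) => setTimeout(r, 100)); // wait, then recheck
  }
  lockTable.set(recordId, Date.now()); // "insert" the lock row
  try {
    return await work();
  } finally {
    lockTable.delete(recordId); // always release, even on failure
  }
}
```

The polling loop is also where the "Function 2 waits for Function 1" cost shows up: a slow first writer directly delays every later caller with the same key.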


Thanks a lot for your valuable explanations. I am looking at the singleton pattern. Seems like I could make use of it with custom instance IDs.

Thanks once again.

haithamdheyaa-7428 answered

OK, this is not making a lot of sense to me. My Azure durable function takes between 2 and 4 minutes to run. I'm not aware of any queue. I call it using a web API call and it starts almost instantly. The start-up process calls the orchestration, which in turn calls a single activity function that does all the work. Where is the queue?
Are you saying that while it's running it still counts as one item in the queue? Otherwise the queue would almost always be empty.
So let's say I call the function twice in a short period of time, so that on both calls there has been no scaling out. While both are running, the scale controller hits its 30-second point, realizes there are two items running (i.e. in the queue), and starts spinning up another VM. Is this correct?


I currently use a US company called UltaHost to provide VPSs with up to 16 cores each (32 threads). They are much cheaper than VPSs on Azure. My alternative to Azure is to simply build an array of multiple VPSs (each containing the code to run this process) and queue requests to them so that I am guaranteed two threads per call. So each machine can handle 16 simultaneous computations.

Will any configuration of Azure Durable Functions replicate this and avoid the need to purchase multiple VPSs? Clearly the Consumption plan does not. But would the Premium plan solve the problem? If so, why?

