Hi All
We have Service Fabric clusters that include an application with a number of ServiceBusProcessors and ServiceBusSessionProcessors (it covers several scenarios, some with session-based queues and some without).
Since moving to the newer Azure.Messaging.ServiceBus SDK NuGet packages, one of our Fabric clusters, which listens on a lower-traffic Service Bus namespace, intermittently stops picking up messages. By this I mean that all nodes (minimum 5) are still running, but messages are backing up on the session-based queue. It doesn't appear to affect the non-session-based queues handled by the ServiceBusProcessor, only the session-based one handled by the ServiceBusSessionProcessor.
We are in the process of reviewing our application code, but one vague suggestion I came across elsewhere implied that there may be an "idle timeout" that could cause processing to stop if no new messages arrive on the queue. I do see the SessionIdleTimeout property, but if I'm reading it correctly that would just automatically close a specific session, not affect the main processing.
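For reference, this is where SessionIdleTimeout sits in our understanding - a minimal sketch, assuming a placeholder connection string and queue name (the option values here are illustrative, not our production settings):

```csharp
using Azure.Messaging.ServiceBus;

var client = new ServiceBusClient(connectionString); // placeholder

var processor = client.CreateSessionProcessor(
    "my-session-queue", // placeholder queue name
    new ServiceBusSessionProcessorOptions
    {
        // How long the processor waits for the next message on an
        // already-accepted session before closing that session and
        // trying to accept another. As I read the docs, this is
        // per-session behaviour, not a global stop of the processor.
        SessionIdleTimeout = TimeSpan.FromSeconds(30),
        MaxConcurrentSessions = 8
    });
```

So my assumption is that tuning SessionIdleTimeout alone wouldn't explain the processor going completely silent - happy to be corrected on that.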
It only seems to be happening on the one cluster/Service Bus namespace pairing (touch wood), roughly once a week - no regular pattern, just three occurrences over the last three weeks. When it happens, manually restarting the Fabric nodes (so the processors are recreated) kicks things back into life and the queue clears, but obviously this is not ideal: unless we notice it has happened, processing is suspended.
Are there any events we can tap into that we could use to restart the ServiceBusSessionProcessor? We do have logging in our ProcessErrorAsync handler, and it has shown some errors in our application code, but nothing that we can see should kill processing on all nodes (plus the documentation explicitly says not to try controlling processor run state from that handler).
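In the absence of a "processor went quiet" event, the only workaround I can think of is an external watchdog - a sketch of the idea, where the timestamp tracking, the 30-minute threshold, and the handler/loop names are all my own assumptions for illustration, not anything from the SDK:

```csharp
using Azure.Messaging.ServiceBus;

// Record the last time any message handler ran (thread-safe via Interlocked).
private static long _lastActivityTicks = DateTime.UtcNow.Ticks;

async Task MessageHandler(ProcessSessionMessageEventArgs args)
{
    Interlocked.Exchange(ref _lastActivityTicks, DateTime.UtcNow.Ticks);
    // ... existing message handling ...
    await args.CompleteMessageAsync(args.Message);
}

// Watchdog loop running outside the processor callbacks, so we are not
// controlling run state from ProcessErrorAsync (which the docs warn against).
async Task WatchdogLoopAsync(ServiceBusSessionProcessor processor, CancellationToken ct)
{
    while (!ct.IsCancellationRequested)
    {
        await Task.Delay(TimeSpan.FromMinutes(5), ct);

        var last = new DateTime(Interlocked.Read(ref _lastActivityTicks), DateTimeKind.Utc);
        if (DateTime.UtcNow - last > TimeSpan.FromMinutes(30)) // threshold is a guess
        {
            // Bounce the processor; StopProcessingAsync/StartProcessingAsync
            // are the documented ways to control run state.
            await processor.StopProcessingAsync(ct);
            await processor.StartProcessingAsync(ct);
        }
    }
}
```

The obvious weakness is that on a genuinely quiet queue this restarts the processor even when nothing is wrong (ideally you'd also check the queue's active message count first), so I'd much rather hear about a proper signal we should be listening to.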
Any advice anyone could give on how to handle this would be very much appreciated.
Thanks in advance
Mark