Troubleshooting guide for Azure Event Hubs
This article lists some of the .NET exceptions generated by the Event Hubs .NET Framework APIs, along with other tips for troubleshooting issues.
Event Hubs messaging exceptions - .NET
This section lists the .NET exceptions generated by the Event Hubs .NET Framework APIs.
These exceptions fall into the following categories, each with a general action you can take to try to fix the problem.
- User coding error: System.ArgumentException, System.InvalidOperationException, System.OperationCanceledException, System.Runtime.Serialization.SerializationException. General action: try to fix the code before proceeding.
- Setup/configuration error: Microsoft.ServiceBus.Messaging.MessagingEntityNotFoundException, Microsoft.Azure.EventHubs.MessagingEntityNotFoundException, System.UnauthorizedAccessException. General action: review your configuration and change if necessary.
- Transient exceptions: Microsoft.ServiceBus.Messaging.MessagingException, Microsoft.ServiceBus.Messaging.ServerBusyException, Microsoft.Azure.EventHubs.ServerBusyException, Microsoft.ServiceBus.Messaging.MessagingCommunicationException. General action: retry the operation or notify users.
- Other exceptions: System.Transactions.TransactionException, System.TimeoutException, Microsoft.ServiceBus.Messaging.MessageLockLostException, Microsoft.ServiceBus.Messaging.SessionLockLostException. General action: specific to the exception type; refer to the table in the following section.
The following table lists the messaging exception types, their causes, and the suggested actions you can take.
| Exception type | Description/cause/examples | Suggested action | Note on automatic/immediate retry |
|---|---|---|---|
| TimeoutException | The server didn't respond to the requested operation within the specified time, which is controlled by OperationTimeout. The server may have completed the requested operation. This exception can happen because of network or other infrastructure delays. | Check the system state for consistency and retry if necessary. | Retry might help in some cases; add retry logic to code. |
| InvalidOperationException | The requested user operation isn't allowed within the server or service. See the exception message for details. For example, Complete generates this exception if the message was received in ReceiveAndDelete mode. | Check the code and the documentation. Make sure the requested operation is valid. | Retry won't help. |
| OperationCanceledException | An attempt is made to invoke an operation on an object that has already been closed, aborted, or disposed. In rare cases, the ambient transaction is already disposed. | Check the code and make sure it doesn't invoke operations on a disposed object. | Retry won't help. |
| UnauthorizedAccessException | The TokenProvider object couldn't acquire a token, the token is invalid, or the token doesn't contain the claims required to do the operation. | Make sure the token provider is created with the correct values. Check the configuration of the Access Control Service. | Retry might help in some cases; add retry logic to code. |
| ArgumentException | One or more arguments supplied to the method are invalid. The URI supplied to NamespaceManager or Create contains path segment(s). The URI scheme supplied to NamespaceManager or Create is invalid. The property value is larger than 32 KB. | Check the calling code and make sure the arguments are correct. | Retry won't help. |
| MessagingEntityNotFoundException | The entity associated with the operation doesn't exist or has been deleted. | Make sure the entity exists. | Retry won't help. |
| MessagingCommunicationException | The client can't establish a connection to the event hub. | Make sure the supplied host name is correct and the host is reachable. | Retry might help if there are intermittent connectivity issues. |
| ServerBusyException | The service can't process the request at this time. | The client can wait for a period of time, then retry the operation. | The client may retry after a certain interval. If a retry results in a different exception, check the retry behavior of that exception. |
| MessagingException | Generic messaging exception that may be thrown in the following cases: (1) An attempt is made to create a QueueClient using a name or path that belongs to a different entity type (for example, a topic). (2) An attempt is made to send a message larger than 1 MB. (3) The server or service encountered an error while processing the request; see the exception message for details. This exception is usually transient. | Check the code and ensure that only serializable objects are used for the message body (or use a custom serializer). Check the documentation for the supported value types of the properties and only use supported types. Check the IsTransient property. If it's true, you can retry the operation. | Retry behavior is undefined and might not help. |
| MessagingEntityAlreadyExistsException | An attempt is made to create an entity with a name that is already used by another entity in that service namespace. | Delete the existing entity or choose a different name for the entity to be created. | Retry won't help. |
| QuotaExceededException | The messaging entity has reached its maximum allowable size. This exception can happen if the maximum number of receivers (which is 5) has already been opened on a per-consumer group level. | Create space in the entity by receiving messages from the entity or its subqueues. | Retry might help if messages have been removed in the meantime. |
| MessagingEntityDisabledException | A runtime operation is requested on a disabled entity. | Activate the entity. | Retry might help if the entity has been activated in the interim. |
| MessageSizeExceededException | A message payload exceeds the 1-MB limit. This 1-MB limit is for the total message, which can include system properties and any .NET overhead. | Reduce the size of the message payload, then retry the operation. | Retry won't help. |
QuotaExceededException
A QuotaExceededException indicates that a quota for a specific entity has been exceeded.
This exception can happen if the maximum number of receivers (5) has already been opened on a per-consumer group level.
Event Hubs has a limit of 20 consumer groups per Event Hub. When you attempt to create more, you receive a QuotaExceededException.
TimeoutException
A TimeoutException indicates that a user-initiated operation is taking longer than the operation timeout.
For Event Hubs, the timeout is specified either as part of the connection string, or through ServiceBusConnectionStringBuilder. The error message itself might vary, but it always contains the timeout value specified for the current operation.
There are two common causes for this error: incorrect configuration, or a transient service error.
- Incorrect configuration: The operation timeout might be too small for the operational condition. The default value for the operation timeout in the client SDK is 60 seconds. Check whether your code sets the value to something too small. Network conditions and CPU usage can affect the time a particular operation takes to complete, so the operation timeout shouldn't be set to a small value.
- Transient service error: Sometimes the Event Hubs service can experience delays in processing requests, for example, during periods of high traffic. In such cases, you can retry your operation after a delay until the operation is successful. If the same operation still fails after multiple attempts, visit the Azure service status site to see if there are any known service outages.
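The retry-after-a-delay advice for transient errors can be sketched as follows. This is a minimal, language-neutral illustration written in Python rather than against the Event Hubs .NET SDK: `TransientServiceError`, `retry_with_backoff`, and all parameter defaults are hypothetical names and values chosen for the example, and exponential backoff with jitter is one common policy, not one the service mandates.

```python
import random
import time


class TransientServiceError(Exception):
    """Hypothetical stand-in for a transient failure such as a busy server."""


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call operation(); on a timeout or transient error, wait and retry.

    Re-raises the last error once max_attempts is exhausted, at which point
    checking the service status site is the next step.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except (TimeoutError, TransientServiceError):
            if attempt == max_attempts:
                raise
            # Exponential backoff capped at max_delay, with jitter so that
            # many clients don't retry in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))
```

A caller would wrap the send or receive call in a function and pass it in, for example `retry_with_backoff(lambda: send(batch))`, where `send` is whatever operation is timing out. Non-transient errors are deliberately not caught, because retrying them won't help.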
ServerBusyException
A Microsoft.ServiceBus.Messaging.ServerBusyException or Microsoft.Azure.EventHubs.ServerBusyException indicates that a server is overloaded. There are two relevant error codes for this exception.
Error code 50002
This error can occur for one of two reasons:
- The load isn't evenly distributed across all partitions on the event hub, and one partition hits the local throughput unit limitation.
  Resolution: Revising the partition distribution strategy or trying EventHubClient.Send(eventDataWithOutPartitionKey) might help.
- The Event Hubs namespace doesn't have sufficient throughput units. You can check the Metrics screen in the Event Hubs namespace window in the Azure portal to confirm. The portal shows information aggregated over 1 minute, whereas throughput is measured in real time, so the portal figure is only an estimate.
  Resolution: Increasing the throughput units on the namespace can help. You can do this in the portal, in the Scale window of the Event Hubs namespace screen, or you can use Auto-inflate.
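To judge whether the first cause (uneven load) applies, it can help to compare per-partition traffic against an even split. The sketch below is only an illustration in Python: the function name, the input dictionary of per-partition event counts, and the 2x threshold are all assumptions made for the example, and how you collect the counts (for instance from your own sender-side logging) is up to you.

```python
def find_hot_partitions(events_per_partition, threshold=2.0):
    """Return the IDs of partitions whose load exceeds threshold x the even share.

    events_per_partition maps a partition ID to the number of events sent to
    it over some observation window (hypothetical input for this sketch).
    """
    total = sum(events_per_partition.values())
    if total == 0:
        return []
    even_share = total / len(events_per_partition)
    return sorted(
        pid for pid, count in events_per_partition.items()
        if count > threshold * even_share
    )
```

If a single partition is flagged, revisiting the partition key choice is likely the more useful fix; if no partition stands out and the error persists, the namespace-wide throughput unit limit is the more likely cause.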
Error code 50001
This error should rarely occur. It happens when the container running the code for your namespace is low on CPU, for no more than a few seconds before the Event Hubs load balancer begins.
Limit on calls to the GetRuntimeInformation method
Azure Event Hubs supports up to 50 calls per second to the GetRuntimeInformation method. Once the limit is reached, you may receive an exception similar to the following one:
ExceptionId: 00000000000-00000-0000-a48a-9c908fbe84f6-ServerBusyException: The request was terminated because the namespace 75248:aaa-default-eventhub-ns-prodb2b is being throttled. Error code : 50001. Please wait 10 seconds and try again.
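One way to stay under the 50-calls-per-second limit is to throttle on the client side before each call. The following is a minimal sketch in Python of a rolling-window limiter; the class name, its parameters, and the choice of a rolling window are assumptions for illustration, not part of any Event Hubs SDK.

```python
import time
from collections import deque


class CallRateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls=50, period=1.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Drop timestamps that have left the window.
            while self._calls and now - self._calls[0] >= self.period:
                self._calls.popleft()
            if len(self._calls) < self.max_calls:
                self._calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
```

Calling `limiter.wait()` immediately before each runtime-information request keeps a single process under the limit. If several processes share one namespace, each would need a lower `max_calls` so that their combined rate stays below 50.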
Connectivity, certificate, or timeout issues
The following steps may help you with troubleshooting connectivity/certificate/timeout issues for all services under *.servicebus.windows.net.
Browse to or wget https://<yournamespacename>.servicebus.windows.net/. This helps check whether you have IP filtering, virtual network, or certificate chain issues (most common when using the Java SDK).
Run the following command to check whether any port is blocked by the firewall. The ports used are 443 (HTTPS), 5671 (AMQP), and 9093 (Kafka). Depending on the library you use, other ports may also be used. Here is a sample command that checks whether port 5671 is blocked.
On Windows (PowerShell):
tnc <yournamespacename>.servicebus.windows.net -port 5671
On Linux:
telnet <yournamespacename>.servicebus.windows.net 5671
An example of a successful response when browsing to the namespace URL:
<feed xmlns="http://www.w3.org/2005/Atom"><title type="text">Publicly Listed Services</title><subtitle type="text">This is the list of publicly-listed services currently available.</subtitle><id>uuid:27fcd1e2-3a99-44b1-8f1e-3e92b52f0171;id=30</id><updated>2019-12-27T13:11:47Z</updated><generator>Service Bus 1.1</generator></feed>
An example of a failure response:
<Error> <Code>400</Code> <Detail> Bad Request. To know more visit https://aka.ms/sbResourceMgrExceptions. . TrackingId:b786d4d1-cbaf-47a8-a3d1-be689cda2a98_G22, SystemTracker:NoSystemTracker, Timestamp:2019-12-27T13:12:40 </Detail> </Error>
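The firewall port check described above can also be scripted to cover all three listed ports in one pass. This is a small Python sketch using only the standard library; the function name and the default timeout are illustrative choices, and a successful TCP connection shows only that the port is reachable, not that authentication or TLS will succeed.

```python
import socket


def check_ports(host, ports=(443, 5671, 9093), timeout=5.0):
    """Return {port: True/False} for whether a TCP connection could be opened."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and completes the TCP handshake.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts, and DNS failures.
            results[port] = False
    return results
```

For example, `check_ports("<yournamespacename>.servicebus.windows.net")` returns a dictionary in which any port mapped to `False` is a candidate for a firewall rule review.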
When there are intermittent connectivity issues, run the following command to check whether any packets are dropped. The command attempts 25 TCP connections to the service, one per second. You can then check how many of them succeeded or failed and see the TCP connection latency. You can download the psping tool from the Microsoft Sysinternals site.
.\psping.exe -n 25 -i 1 -q <yournamespacename>.servicebus.windows.net:5671 -nobanner
You can use equivalent commands if you're using other tools such as ping.
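Where psping isn't available, a rough equivalent of the repeated TCP connection test can be sketched in Python. The function below is an approximation for illustration only: its name and parameters are made up for this example, and its timing is coarser than a dedicated tool's.

```python
import socket
import time


def tcp_ping(host, port, count=25, interval=1.0, timeout=5.0, sleep=time.sleep):
    """Open `count` TCP connections, pausing `interval` seconds between attempts.

    Returns (latencies_ms, failures): the connect latency in milliseconds for
    each successful attempt, and the number of attempts that failed.
    """
    latencies_ms, failures = [], 0
    for attempt in range(count):
        start = time.perf_counter()
        try:
            with socket.create_connection((host, port), timeout=timeout):
                latencies_ms.append((time.perf_counter() - start) * 1000)
        except OSError:
            failures += 1
        if attempt < count - 1:
            sleep(interval)
    return latencies_ms, failures
```

Calling `tcp_ping("<yournamespacename>.servicebus.windows.net", 5671)` mirrors the psping invocation above; a nonzero failure count or widely varying latencies points to an intermittent network problem rather than a service-side one.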
You can learn more about Event Hubs by visiting the following links: