Diagnose and troubleshoot issues when using Azure Cosmos DB .NET SDK
APPLIES TO:
SQL API
This article covers common issues, workarounds, diagnostic steps, and tools for using the .NET SDK with Azure Cosmos DB SQL API accounts. The .NET SDK provides a client-side logical representation for accessing the Azure Cosmos DB SQL API.
Checklist for troubleshooting issues
Work through the following checklist before you move your application to production. It helps you avoid several common issues and diagnose quickly when an issue does occur:
- Use the latest SDK. Preview SDKs should not be used for production. This will prevent hitting known issues that are already fixed.
- Review the performance tips, and follow the suggested practices. This will help prevent scaling, latency, and other performance issues.
- Enable SDK logging to help you troubleshoot an issue. Logging may affect performance, so it's best to enable it only while troubleshooting. You can enable the following logs:
  - Log metrics by using the Azure portal. Portal metrics show the Azure Cosmos DB telemetry, which helps determine whether the issue originates in Azure Cosmos DB or on the client side.
  - Log the diagnostics string in the V2 SDK or diagnostics in the V3 SDK from the point operation responses.
  - Log the SQL Query Metrics from all the query responses.
  - Follow the setup for SDK logging.
- Take a look at the Common issues and workarounds section in this article.
- Check the GitHub issues section, which is actively monitored, to see whether a similar issue with a workaround has already been filed. If you don't find a solution, file a GitHub issue. For urgent issues, you can open a support ticket.
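The checklist item about logging diagnostics from point operations could be sketched as follows for the V3 SDK. This is a minimal illustration, not the SDK's prescribed pattern: the `DiagnosticsLogging` class, the `CreateWithDiagnosticsAsync` helper, and the 500 ms threshold are hypothetical, and `Console.WriteLine` stands in for whatever logger your application uses. Logging only slow or failed operations keeps the performance cost of logging low.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class DiagnosticsLogging
{
    // Hypothetical threshold; tune it to your workload's latency expectations.
    private static readonly TimeSpan LatencyThreshold = TimeSpan.FromMilliseconds(500);

    // Creates an item and writes the diagnostics only when the call is slow or fails.
    public static async Task<T> CreateWithDiagnosticsAsync<T>(
        Container container, T item, PartitionKey partitionKey)
    {
        try
        {
            ItemResponse<T> response = await container.CreateItemAsync(item, partitionKey);

            // GetClientElapsedTime reports the time the operation took as seen by the client.
            if (response.Diagnostics.GetClientElapsedTime() > LatencyThreshold)
            {
                Console.WriteLine($"Slow create ({response.RequestCharge} RU): {response.Diagnostics}");
            }

            return response.Resource;
        }
        catch (CosmosException ex)
        {
            // Failed operations carry diagnostics too; log them before rethrowing.
            Console.WriteLine($"Create failed with status {ex.StatusCode}: {ex.Diagnostics}");
            throw;
        }
    }
}
```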
Common issues and workarounds
General suggestions
- Run your app in the same Azure region as your Azure Cosmos DB account, whenever possible.
- You may run into connectivity/availability issues due to lack of resources on your client machine. We recommend monitoring your CPU utilization on nodes running the Azure Cosmos DB client, and scaling up/out if they're running at high load.
Check the portal metrics
Checking the portal metrics helps determine whether an issue is on the client side or in the service. For example, if the metrics show a high rate of rate-limited requests (HTTP status code 429), requests are being throttled; in that case, check the Request rate too large section.
Retry design
See our guide to designing resilient applications with Azure Cosmos SDKs for design guidance and to learn about the retry semantics of the SDK.
Azure SNAT (PAT) port exhaustion
If your app is deployed on Azure Virtual Machines without a public IP address, by default Azure SNAT ports establish connections to any endpoint outside of your VM. The number of connections allowed from the VM to the Azure Cosmos DB endpoint is limited by the Azure SNAT configuration. This situation can lead to connection throttling, connection closure, or request timeouts.
Azure SNAT ports are used only when your VM has a private IP address and connects to a public IP address. There are two workarounds to avoid the Azure SNAT limitation (provided you already use a single client instance across the entire application):
- Add your Azure Cosmos DB service endpoint to the subnet of your Azure Virtual Machines virtual network. For more information, see Azure Virtual Network service endpoints.
  When the service endpoint is enabled, the requests are no longer sent from a public IP to Azure Cosmos DB. Instead, the virtual network and subnet identity are sent. This change might result in firewall drops if only public IPs are allowed. If you use a firewall, when you enable the service endpoint, add a subnet to the firewall by using Virtual Network ACLs.
- Assign a public IP to your Azure VM.
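Both workarounds assume a single client instance across the entire application. One way to arrange that is sketched below under stated assumptions: the `CosmosClientHolder` class and the `COSMOS_ENDPOINT` and `COSMOS_KEY` environment variable names are hypothetical, and applications that use dependency injection would more likely register the client as a singleton service instead.

```csharp
using System;
using Microsoft.Azure.Cosmos;

public static class CosmosClientHolder
{
    // Lazy<T> is thread-safe by default, so the client is created once on first use
    // and every caller shares the same instance and its connections.
    private static readonly Lazy<CosmosClient> LazyClient = new Lazy<CosmosClient>(
        () => new CosmosClient(Require("COSMOS_ENDPOINT"), Require("COSMOS_KEY")));

    public static CosmosClient Instance => LazyClient.Value;

    // Reads a required setting; the variable names are illustrative only.
    private static string Require(string name) =>
        Environment.GetEnvironmentVariable(name)
        ?? throw new InvalidOperationException($"Environment variable {name} is not set.");
}
```

Callers would then use `CosmosClientHolder.Instance.GetContainer(...)` rather than constructing a new `CosmosClient` per request, which keeps the number of outbound connections, and therefore SNAT port usage, down.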
High network latency
High network latency can be identified by using the diagnostics string in the V2 SDK or diagnostics in the V3 SDK.
Suspect network latency when no timeouts are present and the diagnostics show individual requests where the high latency is evident.
Diagnostics can be obtained from any ResponseMessage, ItemResponse, FeedResponse, or CosmosException through the Diagnostics property:
ItemResponse<MyItem> response = await container.CreateItemAsync<MyItem>(item);
Console.WriteLine(response.Diagnostics.ToString());
Network interactions appear in the diagnostics as entries such as:
{
    "name": "Microsoft.Azure.Documents.ServerStoreModel Transport Request",
    "id": "0e026cca-15d3-4cf6-bb07-48be02e1e82e",
    "component": "Transport",
    "start time": "12:58:20:032",
    "duration in milliseconds": 1638.5957
}
The duration in milliseconds value shows the latency of that network interaction.
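To spot slow network interactions without reading the whole diagnostics string by hand, you could scan it for large duration values. The sketch below assumes the diagnostics string is JSON shaped like the entry above, with `name` and `duration in milliseconds` properties; that layout is an observation from the sample, not a documented contract, and it may differ between SDK versions. The `DiagnosticsScan` class and `FindSlowEntries` method are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

public static class DiagnosticsScan
{
    // Walks the whole JSON tree and returns every entry whose
    // "duration in milliseconds" is at or above the threshold.
    public static List<(string Name, double Milliseconds)> FindSlowEntries(
        string diagnosticsJson, double thresholdMs)
    {
        var slow = new List<(string Name, double Milliseconds)>();
        using JsonDocument doc = JsonDocument.Parse(diagnosticsJson);
        Visit(doc.RootElement, thresholdMs, slow);
        return slow;
    }

    private static void Visit(
        JsonElement element, double thresholdMs, List<(string Name, double Milliseconds)> slow)
    {
        if (element.ValueKind == JsonValueKind.Object)
        {
            if (element.TryGetProperty("duration in milliseconds", out JsonElement duration)
                && duration.ValueKind == JsonValueKind.Number
                && duration.GetDouble() >= thresholdMs)
            {
                // Use the sibling "name" property when present to label the entry.
                string name = element.TryGetProperty("name", out JsonElement n)
                              && n.ValueKind == JsonValueKind.String
                    ? n.GetString() ?? "(unnamed)"
                    : "(unnamed)";
                slow.Add((name, duration.GetDouble()));
            }

            // Recurse into nested objects and arrays.
            foreach (JsonProperty property in element.EnumerateObject())
            {
                Visit(property.Value, thresholdMs, slow);
            }
        }
        else if (element.ValueKind == JsonValueKind.Array)
        {
            foreach (JsonElement child in element.EnumerateArray())
            {
                Visit(child, thresholdMs, slow);
            }
        }
    }
}
```

A call such as `DiagnosticsScan.FindSlowEntries(response.Diagnostics.ToString(), 1000)` would then list the interactions that took a second or longer.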
This latency can have multiple causes:
- Your application is not running in the same region as your Azure Cosmos DB account.
- Your PreferredLocations or ApplicationRegion configuration is incorrect, so the client tries to connect to a region other than the one where your application is running.
- There might be a bottleneck on the network interface because of high traffic. If the application is running on Azure Virtual Machines, there are possible workarounds:
  - Consider using a Virtual Machine with Accelerated Networking enabled.
  - Enable Accelerated Networking on an existing Virtual Machine.
  - Consider using a higher end Virtual Machine.
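For the region-related causes above, a V3 client can be told which region the application runs in through `CosmosClientOptions.ApplicationRegion`. The following is a minimal sketch: the `RegionAwareClient` class and `BuildOptions` method are hypothetical names, and the region you pass is an assumption about where your own application is deployed.

```csharp
using Microsoft.Azure.Cosmos;

public static class RegionAwareClient
{
    // Builds client options that point the SDK at the region the application runs in.
    public static CosmosClientOptions BuildOptions(string applicationRegion)
    {
        return new CosmosClientOptions
        {
            // Must name the Azure region where this application is deployed,
            // for example Regions.WestUS2.
            ApplicationRegion = applicationRegion,
        };
    }
}
```

The options would be passed when the client is created, for example `new CosmosClient(endpoint, key, RegionAwareClient.BuildOptions(Regions.WestUS2))`, where `endpoint` and `key` are your account values.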
Common query issues
The query metrics help determine where the query spends most of its time. From the query metrics, you can see how much time is spent on the back end versus the client. Learn more about troubleshooting query performance.
If the back-end query returns quickly but a large amount of time is spent on the client, check the load on the machine. It's likely that there are not enough resources and the SDK is waiting for resources to become available to handle the response.
If the back-end query is slow, try optimizing the query and reviewing the current indexing policy.
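To compare back-end and client time for a query, you could log the diagnostics of every page it returns. The sketch below targets the V3 SDK and is illustrative only: `QueryDiagnostics` and `RunAndLogAsync` are hypothetical names, and `Console.WriteLine` stands in for your logger. It assumes the V3 `FeedIterator<T>` is disposable, as it is in recent SDK versions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using Microsoft.Azure.Cosmos;

public static class QueryDiagnostics
{
    // Drains a query and logs the request charge, client elapsed time,
    // and full diagnostics for each page of results.
    public static async Task<List<T>> RunAndLogAsync<T>(Container container, string sql)
    {
        var results = new List<T>();
        using FeedIterator<T> iterator = container.GetItemQueryIterator<T>(new QueryDefinition(sql));

        while (iterator.HasMoreResults)
        {
            FeedResponse<T> page = await iterator.ReadNextAsync();

            Console.WriteLine(
                $"Page with {page.Count} items, {page.RequestCharge} RU, " +
                $"client elapsed {page.Diagnostics.GetClientElapsedTime()}");
            Console.WriteLine(page.Diagnostics.ToString());

            results.AddRange(page);
        }

        return results;
    }
}
```

If the client elapsed time is much larger than the back-end time reported in the diagnostics, the bottleneck is more likely on the client machine than in the query itself.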
Note
For improved performance, we recommend Windows 64-bit host processing. The SQL SDK includes a native ServiceInterop.dll to parse and optimize queries locally. ServiceInterop.dll is supported only on the Windows x64 platform. For Linux and other unsupported platforms where ServiceInterop.dll isn't available, an additional network call will be made to the gateway to get the optimized query.
If you are using Windows and encounter the error "Unable to load DLL 'Microsoft.Azure.Cosmos.ServiceInterop.dll' or one of its dependencies", upgrade to the latest Windows version.
Next steps
- Learn about Performance guidelines for .NET V3 and .NET V2
- Learn about the Reactor-based Java SDKs