Troubleshooting outbound connection failures

This article provides resolutions for common problems that can occur with outbound connections from an Azure Load Balancer. Most outbound connectivity problems that customers experience are caused by SNAT port exhaustion and by connection timeouts that lead to dropped packets. This article provides steps for mitigating each of these issues.

Managing SNAT (PAT) port exhaustion

Ephemeral ports used for PAT are an exhaustible resource, as described in Standalone VM without a Public IP address and Load-balanced VM without a Public IP address. You can use this guide to monitor your usage of ephemeral ports and compare it with your current allocation, either to assess the risk of SNAT exhaustion or to confirm that it is occurring.

If you know that you're initiating many outbound TCP or UDP connections to the same destination IP address and port, and you observe failing outbound connections or support has advised you that you're exhausting SNAT ports (preallocated ephemeral ports used by PAT), you have several general mitigation options. Review these options and decide which are available and best suited to your scenario. One or more of them may help you manage this scenario.

If you're having trouble understanding the outbound connection behavior, you can use IP stack statistics (netstat). It can also help to observe connection behavior by using packet captures. You can perform these packet captures in the guest OS of your instance or use Network Watcher for packet capture.
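As a rough first check before capturing packets, netstat output can be summarized per destination, since many flows to the same destination IP address and port is the pattern described above. The following is a minimal sketch, assuming Linux-style `netstat -tn` columns (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State); the function name `count_flows_by_destination` and the sample addresses are hypothetical. It only counts sockets in the guest OS and does not read any SNAT state from the load balancer.

```python
from collections import Counter

def count_flows_by_destination(netstat_output: str) -> Counter:
    """Count TCP sockets per remote (IP, port) in `netstat -tn`-style text.

    Assumes Linux-style columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and any row that is not a 6-column TCP entry.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        foreign, state = fields[4], fields[5]
        # Count open, in-progress, and recently closed sockets.
        if state not in ("ESTABLISHED", "SYN_SENT", "TIME_WAIT"):
            continue
        host, _, port = foreign.rpartition(":")
        counts[(host, int(port))] += 1
    return counts

# Hypothetical sample output (documentation-range addresses).
sample = """Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 10.0.0.4:50112          203.0.113.10:443        ESTABLISHED
tcp        0      0 10.0.0.4:50113          203.0.113.10:443        TIME_WAIT
tcp        0      0 10.0.0.4:50114          198.51.100.7:1433       ESTABLISHED
"""

for (host, port), n in count_flows_by_destination(sample).most_common():
    print(f"{host}:{port} -> {n} sockets")
```

A destination that dominates this list is a good candidate for the connection reuse and pooling options described later in this article.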

Manually allocate SNAT ports to maximize SNAT ports per VM

預先配置的連接埠中所定義,負載平衡器會根據後端中的 VM 數目自動配置連接埠。As defined in preallocated ports, the load balancer will automatically allocate ports based on the number of VMs in the backend. 根據預設,這是為了確保擴充性而謹慎執行。By default, this is done conservatively to ensure scalability. 如果您知道後端將擁有的 Vm 數目上限,您可以在每個輸出規則中手動設定 SNAT 埠。If you know the maximum number of VMs you will have in the backend, you can manually allocate SNAT ports in each outbound rule. 例如,如果您知道最多可有 10 部 VM,即可為每部 VM 配置 6,400 個 SNAT 連接埠,而不是預設的 1,024 個連接埠。For example, if you know you will have a maximum of 10 VMs you can allocate 6,400 SNAT ports per VM rather than the default 1,024.

Modify the application to reuse connections

You can reduce demand for the ephemeral ports used for SNAT by reusing connections in your application. Connection reuse is especially relevant for protocols like HTTP/1.1, where connection reuse is the default. Other protocols that use HTTP as their transport (for example, REST) benefit in turn.

Reusing connections is always better than opening an individual, atomic TCP connection for each request, and it results in more performant, efficient TCP transactions.
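To make connection reuse concrete, the following loopback-only sketch sends five HTTP/1.1 requests over a single `http.client.HTTPConnection` and records the client source port the server sees. No load balancer or SNAT is involved; the point is that reused requests share one flow, and therefore one source port, instead of consuming a new port per request. The names `Handler`, `fetch_many`, and `seen_client_ports` are hypothetical.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

seen_client_ports = set()  # source ports observed by the server

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # keep connections open between requests

    def do_GET(self):
        seen_client_ports.add(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence default request logging

def fetch_many(host, port, n):
    """Send n requests over ONE reused TCP connection."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        for _ in range(n):
            conn.request("GET", "/")
            resp = conn.getresponse()
            resp.read()  # drain the body so the connection can be reused
    finally:
        conn.close()

server = ThreadingHTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
fetch_many("127.0.0.1", server.server_address[1], 5)
server.shutdown()
server.server_close()
print(len(seen_client_ports))  # 1: all five requests shared one source port
```

Creating a new `HTTPConnection` inside the loop instead would make the server see five different source ports, which is the per-request pattern that consumes SNAT ports.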

Modify the application to use connection pooling

You can employ a connection pooling scheme in your application, where requests are internally distributed across a fixed set of connections (each reused where possible). This scheme constrains the number of ephemeral ports in use and creates a more predictable environment. It can also increase request throughput by allowing multiple simultaneous operations when a single connection is blocked waiting for the reply to an operation.

Connection pooling might already exist within the framework that you're using to develop your application or in your application's configuration settings. You can combine connection pooling with connection reuse. Your multiple requests then consume a fixed, predictable number of ports to the same destination IP address and port. The requests also benefit from efficient use of TCP transactions, which reduces latency and resource utilization. UDP transactions can also benefit, because managing the number of UDP flows can in turn avoid exhaustion conditions and keep SNAT port utilization under control.
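If your framework does not already provide a pool, the idea can be sketched as a fixed number of slots handed out through a queue. This is a minimal illustration, not a production pool: `ConnectionPool` and the `factory` callable are hypothetical, connections are opened lazily, and at most `size` of them ever exist, so at most `size` ephemeral ports are used toward the destination.

```python
import itertools
import queue
import threading
from contextlib import contextmanager

class ConnectionPool:
    """Fixed-size pool: never more than `size` connections are open."""

    def __init__(self, factory, size):
        self._factory = factory  # hypothetical callable that opens one connection
        self._slots = queue.LifoQueue(maxsize=size)
        for _ in range(size):
            self._slots.put(None)  # None marks a slot not yet connected
        self._lock = threading.Lock()
        self.created = 0

    @contextmanager
    def connection(self, timeout=None):
        # Blocks (up to `timeout`) while every slot is checked out.
        conn = self._slots.get(timeout=timeout)
        try:
            if conn is None:
                conn = self._factory()
                with self._lock:
                    self.created += 1
            yield conn
        finally:
            # Return the connection (or the empty slot if opening failed).
            self._slots.put(conn)

ids = itertools.count()
pool = ConnectionPool(factory=lambda: f"conn-{next(ids)}", size=2)
for _ in range(100):
    with pool.connection() as conn:
        pass  # a real application would send a request over `conn` here
print(pool.created)  # 1: sequential requests reuse the same connection
```

The LIFO queue hands back the most recently returned connection first, so idle slots are only opened under real concurrency. A real pool would also detect and replace broken connections rather than returning them to the queue.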

Modify the application to use less aggressive retry logic

當用於 PAT預先配置暫時連接埠耗盡或發生應用程式失敗時,不含衰減和降速邏輯的積極或暴力重試會造成耗盡的情況發生或持續存在。When preallocated ephemeral ports used for PAT are exhausted or application failures occur, aggressive or brute force retries without decay and backoff logic cause exhaustion to occur or persist. 您可以使用較不積極的重試邏輯,以降低對暫時連接埠的需求。You can reduce demand for ephemeral ports by using a less aggressive retry logic.

Ephemeral ports have a 4-minute idle timeout, which is not adjustable unless you use outbound rules. If the retries are too aggressive, the exhaustion has no opportunity to clear up on its own. Therefore, considering how, and how often, your application retries transactions is a critical part of the design.
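One common way to make retries less aggressive is exponential backoff with jitter, sketched below. The helper `retry_with_backoff` is hypothetical; it assumes connection failures surface as `OSError` (which includes `ConnectionError` and `TimeoutError`), and it uses "full jitter", where each wait is a random fraction of a cap that doubles per attempt up to `max_delay`. The `sleep` and `rng` parameters are injectable only so the behavior can be checked deterministically.

```python
import random
import time

def retry_with_backoff(operation, max_attempts=6, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep, rng=random.random):
    """Call `operation` until it succeeds, backing off exponentially.

    Each wait is cap * rng(), where cap = min(max_delay, base_delay * 2**attempt).
    Re-raises the last error once all attempts are used up.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:  # assumption: connection failures raise OSError
            if attempt == max_attempts - 1:
                raise
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(cap * rng())

# Deterministic demo: record waits instead of sleeping, and use no jitter.
delays = []
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 4:
        raise ConnectionError("simulated outbound failure")
    return "ok"

result = retry_with_backoff(flaky, sleep=delays.append, rng=lambda: 1.0)
print(result, delays)  # ok [1.0, 2.0, 4.0]
```

Jitter spreads retries from many clients over time, so a burst of failures does not turn into a synchronized burst of new outbound flows.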

Assign a Public IP to each VM

Assigning a public IP address changes your scenario to Public IP to a VM. All ephemeral ports of the public IP assigned to each VM are available to that VM. (This is in contrast to scenarios where the ephemeral ports of a public IP are shared by all the VMs associated with the respective backend pool.) There are trade-offs to consider, such as the additional cost of public IP addresses and the potential impact of filtering a large number of individual IP addresses.

Note

This option is not available for web worker roles.

Use multiple frontends

When using a public Standard Load Balancer, you can assign multiple frontend IP addresses for outbound connections to multiply the number of available SNAT ports. Create a frontend IP configuration, rule, and backend pool to trigger the programming of SNAT to the public IP of the frontend. The rule does not need to function, and the health probe does not need to succeed. If you also use multiple frontends for inbound traffic (rather than only for outbound), you should use custom health probes to ensure reliability.

Note

In most cases, SNAT port exhaustion is a sign of bad design. Make sure you understand why you are exhausting ports before you use more frontends to add SNAT ports. Otherwise, you may be masking a problem that can lead to failure later.

Scale out

Preallocated ports are assigned based on the backend pool size and grouped into tiers to minimize disruption when some of the ports have to be reallocated to accommodate the next larger backend pool size tier. You may be able to increase SNAT port utilization for a given frontend by scaling your backend pool to the maximum size for a given tier. Keep in mind that the default port allocation is required for the application to scale out efficiently without risking SNAT exhaustion.

For example, two virtual machines in the backend pool would each have 1,024 SNAT ports available per IP configuration, for a total of 2,048 SNAT ports for the deployment. If the deployment were increased to 50 virtual machines, the number of preallocated ports per virtual machine would remain constant, so the deployment could use a total of 51,200 (50 × 1,024) SNAT ports. If you want to scale out your deployment, check the number of preallocated ports per tier to make sure you shape your scale-out to the maximum for the respective tier. In the preceding example, if you had scaled out to 51 instances instead of 50, you would move into the next tier and end up with fewer SNAT ports per VM as well as in total.
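The tier effect in this example can be reproduced with a small lookup. The table below is an assumption based on Azure's published default preallocation for Standard Load Balancer (SNAT ports per VM, per frontend IP configuration); check the current documentation before relying on it. `default_snat_ports` is a hypothetical helper.

```python
# Assumed default preallocation table: (max backend pool size, ports per VM).
DEFAULT_TIERS = [
    (50, 1024),
    (100, 512),
    (200, 256),
    (400, 128),
    (800, 64),
    (1000, 32),
]

def default_snat_ports(pool_size: int) -> tuple[int, int]:
    """Return (ports per VM, total ports) for one frontend IP configuration."""
    for max_size, ports in DEFAULT_TIERS:
        if 1 <= pool_size <= max_size:
            return ports, ports * pool_size
    raise ValueError(f"pool size {pool_size} is outside the table")

for vms in (2, 50, 51, 100):
    per_vm, total = default_snat_ports(vms)
    print(f"{vms:>4} VMs: {per_vm:>5} ports/VM, {total:>6} total")
```

Under this table, 50 VMs yield 51,200 ports while 51 VMs drop to 512 ports per VM and 26,112 in total, so the most port-efficient sizes sit at the top of each tier (50, 100, 200, and so on).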

If you scale out to the next larger backend pool size tier, some of your outbound connections might time out if allocated ports have to be reallocated. If you are only using some of your SNAT ports, scaling out into the next larger backend pool size tier is inconsequential. Half of the existing ports are reallocated each time you move to the next backend pool tier. If you don't want this to happen, shape your deployment to the tier size, or make sure your application can detect failures and retry as necessary. TCP keepalives can help detect when SNAT ports no longer function because they have been reallocated.

Use keepalives to reset the outbound idle timeout

Outbound connections have a 4-minute idle timeout by default. This timeout is adjustable via Outbound rules. You can also use transport-layer keepalives (for example, TCP keepalives) or application-layer keepalives to refresh an idle flow and reset this idle timeout if necessary.

When using TCP keepalives, it is sufficient to enable them on one side of the connection. For example, enabling them only on the server side is enough to reset the flow's idle timer; both sides do not need to initiate TCP keepalives. Similar concepts exist at the application layer, including in database client-server configurations. Check the server side for the options it offers for application-specific keepalives.
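Enabling TCP keepalives on a socket can be sketched with Python's standard `socket` module. The helper `enable_tcp_keepalive` and its default values are hypothetical: the 60-second idle time is simply chosen to be well under the 4-minute default idle timeout. Per-socket tuning constants differ by platform, so each one is applied only if the running Python exposes it; as noted above, applying this on one side of the connection is sufficient.

```python
import socket

def enable_tcp_keepalive(sock: socket.socket, idle_s: int = 60,
                         interval_s: int = 30, probes: int = 4) -> None:
    """Enable TCP keepalives so an idle flow is refreshed before the
    load balancer's idle timeout (4 minutes by default) expires."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Seconds of idleness before the first keepalive probe is sent.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    elif hasattr(socket, "TCP_KEEPALIVE"):  # macOS equivalent (Python 3.10+)
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPALIVE, idle_s)
    # Seconds between unanswered probes.
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    # Unanswered probes before the connection is considered dead.
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)  # True
s.close()
```

Besides keeping the flow alive, a failed keepalive probe surfaces a dead connection as a socket error, which is one way an application can detect SNAT ports that were reallocated during a scale-out.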

Next steps

We are always looking to improve our customers' experience. If you experience outbound connectivity issues that this article does not list or resolve, submit feedback through GitHub at the bottom of this page, and we will address it as soon as possible.