question

0 Votes
MikeTaylor-9364 asked SaiKishor-MSFT commented

TCP Connections dropped after approx 5 mins of inactivity

I have the idle timeout set to 20 minutes. However, tests with a Python TCP client talking to a Python TCP server using "epoll" fail once the connection has been idle for roughly 5 minutes. I lose the last packet sent, and the client throws an error when, after 10 minutes, it tries to close the connection:

Traceback (most recent call last):
  File "atlas_client.py", line 121, in <module>
    make_connection()
  File "atlas_client.py", line 111, in make_connection
    skt.shutdown(socket.SHUT_RDWR)
OSError: [Errno 107] Transport endpoint is not connected
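
Errno 107 is ENOTCONN, which typically means the kernel already considers the socket disconnected by the time shutdown() is called. As a rough sketch only, a close helper can tolerate that case; close_quietly is a hypothetical name and is not taken from atlas_client.py:

```python
import errno
import socket


def close_quietly(sock: socket.socket) -> bool:
    """Shut down and close sock (hypothetical helper).

    Returns True if shutdown() succeeded, False if the connection
    had already been torn down (ENOTCONN, errno 107 on Linux).
    """
    try:
        sock.shutdown(socket.SHUT_RDWR)
        return True
    except OSError as exc:
        # Anything other than "not connected" is unexpected: re-raise it.
        if exc.errno != errno.ENOTCONN:
            raise
        return False
    finally:
        # Always release the file descriptor, whatever happened above.
        sock.close()
```

This only hides the error at close time. It does not recover the packet that was lost while the connection was idle.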


Running this test locally on my Ubuntu VM gives no such issue.

I am at a bit of a loss as to what to try. I can (I believe) force keep-alive packets; however, on a 4G connection where we pay for each and every byte sent/received, I don't really want to do this.

How can I tell whether this is an Azure Firewall issue, an Ubuntu VM issue, or a problem with my code?

azure-virtual-network azure-virtual-machines-networking azure-firewall

0 Votes
MikeTaylor-9364 answered SaiKishor-MSFT commented

That's OK, thank you. I have asked the question via the Azure portal.


Thank you Mike.

0 Votes 0 ·
1 Vote
CristianSPIRIDON72 answered

I think this is because of Azure Firewall:
https://docs.microsoft.com/en-us/azure/firewall/firewall-faq

Check the paragraph on "TCP Idle Timeout".

Hope that helps.


0 Votes
SaiKishor-MSFT answered

@MikeTaylor-9364

A standard behavior of a network firewall is to ensure TCP connections are kept alive and to promptly close them if there's no activity. The Azure Firewall TCP idle timeout is four minutes, and this setting isn't configurable. If a period of inactivity is longer than the timeout value, there's no guarantee that the TCP or HTTP session is maintained.
A common practice is to use TCP keep-alives, which keep the connection active for a longer period. For more information, see the .NET examples given in the doc: https://docs.microsoft.com/en-us/azure/firewall/firewall-faq
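
Since the client and server in this thread are Python scripts, here is a minimal sketch of the same idea using the standard socket module. It assumes Linux, where the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT options are available. The helper name enable_keepalive and the timing values are illustrative assumptions, chosen only so that the first probe goes out before a four-minute idle timeout:

```python
import socket


def enable_keepalive(sock, idle_s=180, interval_s=30, probes=3):
    """Turn on TCP keep-alive probes for sock (Linux option names).

    idle_s:     seconds of silence before the first probe is sent
    interval_s: seconds between unanswered probes
    probes:     unanswered probes before the kernel drops the connection
    All three defaults are illustrative, not recommended values.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval_s)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
```

The helper can be called on a socket either before or after connect(). The probes are sent by the kernel, so the application code does not need a timer of its own.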

Hope this helps. Please let us know if you need any further assistance. Thank you!


0 Votes
MikeTaylor-9364 answered MikeTaylor-9364 commented

Just so I understand...

So although the TCP idle timeout is set to 30 minutes (according to the IP configuration on Azure), the network firewall will close inactive sessions after 4 minutes of inactivity (where keep-alives haven't been set)?

We have GPRS devices (which we have been using for years) that have no ability to set "keep-alives". Can we set this server side (on the Azure VM) to keep the session active?


@MikeTaylor-9364

Can you confirm where you set the TCP idle timeout to 30 minutes?
You can implement a script on the Azure VM for keepalives; however, there is no managed service on the Azure side that does this for you.
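
One way this might look on the VM, as a sketch only: the server enables keep-alive on each connection it accepts, so the kernel on the VM sends the probes and the remote TCP stack acknowledges them without any device-side setting. This assumes a Linux server (TCP_KEEPIDLE is a Linux option name); accept_with_keepalive and the 180-second value are hypothetical and not taken from the poster's server code.

```python
import socket


def accept_with_keepalive(listener, idle_s=180):
    """Accept one connection and enable keep-alive probes on it.

    listener is an already bound and listening TCP socket. With an
    epoll-based server, call this once epoll reports the listener
    as readable. idle_s is an illustrative value below four minutes.
    """
    conn, addr = listener.accept()
    conn.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    conn.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle_s)
    return conn, addr
```

The probes and their acknowledgements still cross the mobile link, so this keeps the session alive at the price of some extra traffic.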

0 Votes 0 ·

Please see below for the screenshot.

0 Votes 0 ·
1 Vote
KhurramRahim answered

0 Votes
MikeTaylor-9364 answered MikeTaylor-9364 edited

[Screenshot: 38221-screenshot-2020-11-08-172859.png]

I am still confused as to why the "idle" timeout value on the Azure system can be set to 30 minutes, yet it still drops connections after 5 minutes.


0 Votes
SaiKishor-MSFT answered

@MikeTaylor-9364
Thanks for providing the screenshot. I understand that you have configured the timeout value in the public IP address portal. A public IP address has an adjustable idle timeout of 4-30 minutes for inbound-originated flows, with a default of 4 minutes, and a fixed idle timeout of 4 minutes for outbound-originated flows.

So if this flow is outbound originated, you cannot adjust the idle timeout using the portal. The only other option is to use keepalives. Please let me know if you have any other questions. Thank you!


0 Votes
MikeTaylor-9364 answered MikeTaylor-9364 commented

@SaiKishor-MSFT

The client (the device on the mobile network) makes a persistent TCP connection to the server running on Azure. I would therefore have expected this to follow the "inbound originated" rule, which is set to 30 minutes. What is actually happening is that after ~4-5 minutes of inactivity the connection is dropped, yet "netstat" still shows it as being alive.

When, after 10 minutes, the client tries to send the next packet of data, it is lost. Our mobile operator confirms there is no time-out on the SIM, and this issue appears to have started since moving the server software to Azure.

I can force it to keep the connection going by enabling keep-alives; however, this means more traffic on the wire, which ultimately means more cost to us.
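
To put a rough number on that extra traffic, here is a back-of-the-envelope sketch. It assumes each keep-alive probe and each acknowledgement is a bare 40-byte packet (20-byte IPv4 header plus 20-byte TCP header, no options or payload). Real segments often carry TCP options such as timestamps, and how a mobile operator meters bytes is not known here, so treat the result as a lower-bound estimate. The function name and the 210-second interval are illustrative.

```python
def keepalive_bytes_per_day(interval_s, probe_bytes=40, ack_bytes=40):
    """Estimate daily keep-alive traffic for one always-idle connection.

    Assumes one probe plus one ACK every interval_s seconds and
    minimal IPv4 + TCP headers with no options (an assumption).
    """
    probes_per_day = 86_400 // interval_s
    return probes_per_day * (probe_bytes + ack_bytes)
```

Under those assumptions, keepalive_bytes_per_day(210) gives 32880, i.e. about 33 KB per day for a connection that is probed every three and a half minutes and carries no other traffic.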

Why, in this instance, would the fixed outbound flow idle period of 4 minutes be taking precedence?


@MikeTaylor-9364
Thanks for clarifying the direction of traffic. Could you provide the following details:

  1. A capture of traffic on both the client and server sides, using any packet capture software such as Wireshark. Let me know what you see between the server and client (if possible, please share it after removing or altering any sensitive data).

  2. Can you confirm the type of client and server machines on both sides and the traffic flow between them (i.e., the path and devices they traverse)?

Thank you!



0 Votes 0 ·

@SaiKishor-MSFT

1/. I tried to attach a pcap (taken from tcpdump) from both the client and the server, however this server won't allow it. Any suggestions? To simplify the situation, this example is not sending real data, but just highlights the problem we are seeing. After 5 minutes, the client will try to send its next payload of data, which fails and returns a "Transport endpoint is not connected" error.

Attachment: 39050-atlas-client.txt

2/. Both the server and the client in this example are simple Python3 scripts. Obviously, in real life, this is not what we are using. The Quectel modems send data across the 2G mobile network, but again, I have replicated this issue using my broadband connection. Is that what you meant?


0 Votes 0 ·
atlas-client.txt (3.7 KiB)
0 Votes
MikeTaylor-9364 answered SaiKishor-MSFT commented

@MikeTaylor-9364
Thanks for the captures. However, the packets on the two sides don't seem to match, i.e., they don't belong to the same flow of traffic. I was able to verify this based on the identification numbers of the IPv4 packets. Could you re-capture and make sure you do it simultaneously on both sides?

From the given captures, I see the client side is receiving a RST for the connection, but I don't see the server sending a RST packet to end the connection.

If you prefer to do this troubleshooting directly with a Support Rep over a call, I can give you a one-time free support exception. If this is something you want to do, please let me know your Subscription ID so I can request it for you. Thank you!

0 Votes 0 ·
0 Votes
MikeTaylor-9364 answered SaiKishor-MSFT commented

@SaiKishor-MSFT

Both sides were done simultaneously, but to be sure I have redone them and uploaded them to the same place as before.

Here is a guide to the timeline:
11:56:00 Server was started
11:57:00 Client was started (Pkt No 1 in client trace)
11:57:17 Client sends first packet (Pkt No 4 in client trace, Pkt No 6 in server trace)
12:02:17 Client attempts to send second packet (Pkt No 8 in client trace, Nothing in server trace)
12:02:36 Client gives up and closes the connection (Pkt No 10 in client trace, Nothing in server trace)
12:06:20 Server was stopped (Pkt No 10 in server trace).

Is the reason you don't see the server send a RST that the connection has already been broken?

Sub Id is 48dfe72b-8506-4366-9425-869bd1f64cef


@MikeTaylor-9364

I have reviewed the new captures, but the packets on the two sides still don't match. Analyzing the captures separately, I see that the client is receiving a RST packet; however, I don't see the server sending a RST. There may be an intermediate device in between that is sending the RST packet. However, I would suggest troubleshooting this issue over a call/chat with Support directly.

You have been enabled for one-time Free Technical Support. To create the support request, please do the following:
• Go to the Health Advisory section within the Azure Portal: https://aka.ms/healthadvisories
• Select the Issue Name "You have been enabled for one-time Free Technical Support"
• Details will populate below in the Summary Tab within the reading pane and you can click on the link "Create a Support Request" to the right of the message

Please let me know if you have any issues reaching out to support or need any further assistance. Thank you!

0 Votes 0 ·

@MikeTaylor-9364
Please let us know if you need further assistance with this issue. Thank you!

0 Votes 0 ·