question

Eggi-9481 avatar image
0 Votes"
Eggi-9481 asked GaryNebbett commented

Winsock2 - TCP Resends packet with increased sequence

Win21H1 (19043.1110) Win10 Pro 64Bit, and seen on 20H2, office PCs.

While sending data greater than the MTU with Winsock2 and TCP sometimes i experience that the last two packets of the stream are "retransmitted". Only that the resend packet contains the data of the 2nd last packet combined with the last packet and with a data offset of the last packet and an increased sequence number of the last packet - so it is not wrong data, but no typical retransmission where the same packet with the same data and sequence is sent.

I have not found any pattern to make this happen. It occures about every 5 to 15 times when sending the first data packet.
What i can say is, that after i connect to the peer and when i get this behaviour, i get it for a few (maybe even all the time, not tested) packets. A disconnect and reconnect, or sending data with a different size eliminates this behaviour.
Heres the sequence of what I've done and tried. I mixed all variants, the programm executes everything - except WSAStartup - in the same thread.

Initialization:
1. WSAStartup (2,2)
2. Winsock2.socket(AF_Inet,SOCK_STREAM,IPPROTO_TCP) and IPPROTO_IP
3. WSAIOCTL with FIONBIO to set it non-blocking
4. setsockopt to SOL_SOCKET, SO_REUSADDR FALSE
5. setsockopt IPPROTO_TCP TCP_NODELAY to enable and disable Nagle
6. bind with and without
7. connect - in non blocking with a select loop to check if valid
8. setsockopt SOL_SOCKET SO_SNDTIMEO to check if it's some RTO thingy
9. setsockopt SOL_SOCKET SO_RCVTIMEO
10. Here my programm loops through select fd_rcv to check if I got data

Data Send:

  1. setsockopt SOL_SOCKET SO_SNDBUF - made no difference, it's said that the newer version should not need it

  2. some select to see if fd_write is ok

  3. send - Tried WSASend too, no lappend

  4. The programm continues polling select to see if rcv data is ready

120792-retrseqincremented.png


windows-10-network
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

GaryNebbett avatar image
1 Vote"
GaryNebbett answered GaryNebbett commented

Hello @Eggi-9481,

Thanks for the .etl file - it was satisfying to see how clearly it shows "tail loss probe" behaviour.

In the .pcapng file that you made available, there were four retransmissions and one can measure the time from sending the last small segment to the retransmission of that data. For example:

 15:17:21.239458 192.168.127.10.57117 > 192.168.127.200.50101: P 16129:16385(256) ack 1 win 64544 (DF)
 15:17:21.270311 192.168.127.10.57117 > 192.168.127.200.50101: P 12353:16385(4032) ack 1 win 64544 (DF)

Here, the interval is 30.85 milliseconds and the subsequent intervals are 38, 61 and 106 milliseconds.

The default Tail Loss Probe (TLP) timeout is 2*SRTT (SRTT = smoothed round trip time); the .etl file shows how the SRTT develops during the transfer:

121623-image.png

The .etl file also contains explicit events for TLP - they look like this:

121661-image.png

The TLP mechanism "probes" with the next packet (rather than old (retransmitted)) data if it is available and allowed by the various "window" limitations (e.g. send window, congestion window). The .etl file shows how the congestion window develops over time:

121636-image.png

It does not look as though the send/congestion windows are the problem and this is confirmed by the connection summary:

121701-image.png

This basically says that the reasons for all the delays in sending data was lack of new data (the "R" before "Snd" in the last two lines is probably a typo in the ETW manifest).

It seems that your application is only sending a new 16 kibibyte "message" (block of data) when the previous block has been "flushed" (sent and acknowledged), so the only data available for a TLP is old data (hence the retransmissions).

The Windows side is behaving perfectly correctly and appropriately - it is the long delay before data is acknowledged that is causing the retransmissions.

Gary


image.png (30.7 KiB)
image.png (58.4 KiB)
image.png (57.0 KiB)
image.png (17.3 KiB)
· 3
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi,

Thanks for the info. Indeed, the data is always a message. And it does not look like i can disable this behaviour?
I am searching for a way to get this data out of the .etl file. Are you using XPerf? Can you share the Preset?

Regards,

 Georg
0 Votes 0 ·

Hello Georg,

There is a command that "claims" to disable TLP: netsh interface tcp set supplemental template=internet taillossprobe=disabled; however, I just get a "The request is not supported" error when I try it.

I use my own tools to process and view .etl files since most of the generally available tools seem to focus on aggregating the data in .etl files and are tailored to performance analysis rather than behaviour analysis.

Gary

0 Votes 0 ·

Hello Georg,

I just remembered two tools that can show the details in .etl files in a comfortable way, the two old network monitoring tools from Microsoft: Microsoft Network Monitor ("netmon") and Microsoft Message Analyzer ("MMA").

MMA is the newer and better tool, but it has now been discontinued and is no longer available for download from Microsoft. Netmon is still available for download.

Gary

1 Vote 1 ·
GaryNebbett avatar image
0 Votes"
GaryNebbett answered GaryNebbett converted comment to answer

Hello @Eggi-9481,

Making a trace of the Microsoft-Windows-TCPIP ETW provider when performing your tests might be helpful - the events reported by the TCP/IP stack in the trace might give some insight into the retransmission decisions.

There are a number of features of the trace that seem interesting:

  • MSS of 4034 (what technology is used for the connection?).

  • Client and server appear to be on the same network (192.168.127.200 and 192.168.127.20) and the round trip time might be approaching 15 milliseconds (quite high). The interval between frames 960 and 961, and also 967 and 969 are the basis for my RTT estimate.

  • Frame 968 might be acknowledging the data sent 45 milliseconds earlier; the fact that it arrives 125 microseconds after the retransmit means that it is unlikely to be an acknowledgement for that packet; the duplicate acknowledgement received 17 milliseconds later is probably an acknowledgment triggered by the arrival of the retransmitted data.

Gary



· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Gary,

So learnt something new with the providers :) Can i filter for a specific network adapter too?
I had TCPIP logged with logman and I am seeing Error:ReassemblyExit / Entry errors in the logging which match from the timing with the wireshark retransmissions. What can I do with this information? In the send and receive section I don't see anything strange.
I've found the Connection ID and when I look into RemoteBound, the Information:SendPosted event is always about 20-60ms before the Reassembly error.

I have to check the .etl a bit more if there is sensitive data logged before uploading it.
I uploaded a log, which is the wireshark capture (pcapng).

This is a point to point connection, where the Windows PC functioning as server is the 192.168.127.10 machine and the client, a custom device, is the 192.168.127.200 address. A switch is between them, but there are no other participants.

  • MSS: We defined some custom protocol based on TCP. We used the increased MSS to reduce overhead.

  • I think so too,

  • I set up a switch and port mirrowed the data on the client side to analyse if this is the case. It shows me the same behaviour as in the snippet i've shown. So I think the wireshark log is correct. And yes, the dup ack is because of the retransmitted packet.


120800-winreassemblyretrans.log


0 Votes 0 ·
GaryNebbett avatar image
1 Vote"
GaryNebbett answered Eggi-9481 commented

Hello @Eggi-9481,

I am not sure if it is just a simple error or different use of terminology - I would call 192.168.127.10 the "client" since it sends the original "SYN" and 192.168.127.200 the "server" (sending SYN,ACK).

 15:17:20.736945 192.168.127.10.57117 > 192.168.127.200.50101: S 3381389475:3381389475(0) win 64544 <mss 4034,nop,wscale 8,nop,nop,sackOK> (DF)
 15:17:20.751673 192.168.127.200.50101 > 192.168.127.10.57117: S 19990761:19990761(0) ack 3381389476 win 24192 <mss 4032>
 15:17:20.751722 192.168.127.10.57117 > 192.168.127.200.50101: . ack 1 win 64544 (DF)

It would certainly be interesting to see the .etl file (assuming that it was created on the "client", the system that retransmits the data). I am now fairly confident that this is "tail loss probe" behaviour - the .etl file should confirm this.

The "filtering on capture" possibilities are limited in this case - I would suggest just capturing everything. The events in the trace are not tagged with a network interface; one could imagine developing a program that "approximately" filters the captured data but there are no standard tools (that I am aware of) that do anything like this.

Gary

· 1
5 |1600 characters needed characters left characters exceeded

Up to 10 attachments (including images) can be used with a maximum of 3.0 MiB each and 30.0 MiB total.

Hi Gary,

I am adding the .etl file here TCPIIIP.etl. All Errors should match a "retransmission". You can check connection ID 230.
I went through RFC 8985 (RACK-TLP Loss Detection), should this be the behaviour of the retransmission?


0 Votes 0 ·