question

1 Vote
IanTurner-5104 asked vangyeco-0121 answered

Extremely slow upload speed in Windows (all other OSes on the network are fine)

I've been having this issue since at least January 2021 on my entire fleet of Windows devices. I first noticed it when my offsite backups stopped completing in time.

Upload speed in Windows is being throttled by something. Download speed is unaffected.

WAN is 2Gbps symmetrical. My ISP (Washington State K-20 Telecommunications Network) confirmed that their circuit is not the cause and is capable of 1800 Mbps symmetrical throughput. Distance is a factor: I can get 400 Mbps to my local telco (which isn't my ISP), but past a few hundred kilometers it drops to 30-70 Mbps. There is an initial burst, but the speed drops quickly.
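
One way to sanity-check the "distance is a factor" observation is the bandwidth-delay product: a sender held to a fixed window cannot exceed window / RTT, so the same window yields less throughput as latency grows. The Python sketch below is only an illustration. The function names are made up for this example, and the 24.65 ms round-trip time is borrowed from the Sacramento result further down, on the assumption that it is representative of the slower paths.

```python
def window_bytes(throughput_mbps: float, rtt_ms: float) -> float:
    """Bytes that must be in flight to sustain throughput_mbps at rtt_ms."""
    return throughput_mbps * 1e6 * (rtt_ms / 1e3) / 8


def max_throughput_mbps(window: float, rtt_ms: float) -> float:
    """Upper bound on throughput for a fixed window (in bytes) at rtt_ms."""
    return window * 8 / (rtt_ms / 1e3) / 1e6


if __name__ == "__main__":
    rtt_ms = 24.65  # assumption: RTT to a distant test server
    for mbps in (30, 70, 1800):
        kib = window_bytes(mbps, rtt_ms) / 1024
        print(f"{mbps:>5} Mbps at {rtt_ms} ms needs ~{kib:.0f} KiB in flight")

    # The same window allows far more throughput on a short path.
    w = window_bytes(50, rtt_ms)
    print(f"{w / 1024:.0f} KiB at 3.93 ms allows ~{max_throughput_mbps(w, 3.93):.0f} Mbps")
```

Under these assumptions, a window in the low hundreds of kilobytes would be enough to explain 30-70 Mbps at that latency, while sustaining 1800 Mbps would need several megabytes in flight.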

I have an HPE/Aruba network and a Sophos XG 310 v2 running SFOS 18.0.4 MR-4. If I plug a client directly into my 10Gbit fiber before my firewall, I can get acceptable speeds on Windows. I haven't been able to find any setting in Sophos XG to tweak that would make any difference. Local iPerf3 tests ruled out my core router/switch/datacenter.

All tests run from Hyper-V guests on Server 2019 Datacenter running on HPE ProLiant DL360 Gen10 hardware.
Windows Server 2019 Datacenter speed tests:
81206-image.png
81275-image.png

Linux (CentOS 8) speed tests:

 Speedtest by Ookla
    
      Server: Comcast - Seattle, WA (id = 1782)
         ISP: Washington State K-20 Telecommunications Network
     Latency:     3.93 ms   (0.16 ms jitter)
    Download:   915.29 Mbps (data used: 1.2 GB)
      Upload:  1537.96 Mbps (data used: 1.8 GB)
 Packet Loss: Not available.
  Result URL: https://www.speedtest.net/result/c/c4bca417-e246-4f46-964a-c4291e4a3914

 Speedtest by Ookla
    
      Server: Comcast - Sacramento, CA (id = 9436)
         ISP: Washington State K-20 Telecommunications Network
     Latency:    24.65 ms   (0.13 ms jitter)
    Download:  1232.96 Mbps (data used: 1.7 GB)
      Upload:  1007.46 Mbps (data used: 1.3 GB)
 Packet Loss:     0.4%
  Result URL: https://www.speedtest.net/result/c/21032a9c-8285-44fe-aadf-ad4dc3d90428

OS affected for me:

  • Windows 10 2004

  • Windows 10 20H2

  • Windows 2016

  • Windows 2019

All devices are fully updated, firmware included.

I've tweaked:

  • Limit reservable bandwidth

  • AV

  • Safe mode boot

  • Domain and non-domain computers

  • autotuning

  • Interrupt Moderation

  • Receive Side Scaling

  • TCP Congestion Control

  • Large Send Offload

I've tried the following hardware:

  • Dell Optiplex 7040

  • HP Elitebook 840 G5

  • HPE ProLiant DL380 Gen10

  • HPE ProLiant DL360 Gen10

These OSes are fine:

  • ChromeOS

  • Android

  • macOS

  • iOS

  • Linux (CentOS, Hyper-V)

This is a continuation of https://docs.microsoft.com/en-us/answers/questions/89768/slow-wired-upload-speed-vs-linux-on-same-hardware.html


windows-server windows-10-network

0 Votes
vangyeco-0121 answered

I have a similar issue. It has been going on for a year and a half now, since some Windows update.
I have made every change suggested on every forum I have seen.
At speedtest.net I get normal upload speeds, like I get on my Android phone (around 20 to 40 Mbps up),
but on other sites like testmyspeed and restream/testspeed I can't go above 4 Mbps up.
I tried 3 more network cards, both wired and wireless.
I tried using my mobile hotspot: still the exact same issue.
My phone works as it should on the same sites at the same time, whether on its data plan or on Wi-Fi through the same modem/router that the PC uses.
The problem is only on my Windows 10 PC. I tried 3 different PCs, 2 with completely different hardware, all of them fresh Windows 10 installations.
What is stranger is that sometimes the problem goes away by itself. Yesterday I tried everything again: I updated the BIOS and all drivers, changed cables, etc. At the start of the day everything was OK for a while, but late at night, until 7 am, when I was trying everything yet again, I was capped at 1.5 Mbps upload (!!). I even updated Windows with a manual beta update: same issues. Now, without touching anything, at 18:15, a time when many people use the internet, my upload is capped at 20 Mbps, as if the problem had gone away. But I'm sure it's not going to last!
So I'm starting to suspect that it's not purely a Windows 10 issue, but also how the ISP handles the data/packets it receives from Windows 10. Maybe at times when more people are online the ISP handles packets differently? If it were a Windows 10 issue alone, how could so many streamers on Twitch and YouTube stream without any issues? I wonder whether the problem would go away and stay away if I moved to Windows 11, although that would be a dealbreaker because I use many programs/games that run poorly or not at all on Windows 11.

1 Vote
Magnis-8095 answered

Hi there,

Any update on the topic after a year? I upgraded my ISP recently and noticed the same issue.
Linux, Android, and Apple machines are not affected.

Magnis

1 Vote
AxelRodz-4369 answered

I'm having the same problem. My setup is a Windows 10 host with a pfSense Hyper-V guest as the router, Intel T350-2 Ethernet NIC, 250/10 Mbps cable internet. Downloads are fine, and everything except my Windows 10 host uploads at the nominal upload speed; the host itself only gets to about 0.1-0.2 Mbps. However, when I switch from wired to WiFi (Intel WiFi 6 AX200), the upload speed jumps to the nominal speed, even without disabling the wired NICs (I just enabled the WiFi NIC and gave it the lowest route metric). So maybe the problem is not in the TCP/IP settings?

0 Votes
GaryNebbett answered EliasTaye-2762 commented

Hello @EliasTaye-2762,

That is all perfectly consistent with the foregoing messages. Reverting to an older version of Windows will not help - one needs to wait for the coming improvements. The slowness is not caused by intentional throttling - indeed probably quite the contrary; somewhere along the path to the systems that you are using, an update, probably intended as an improvement, is interacting badly with the current Windows TCP congestion mechanism.

Gary

@GaryNebbett Let's hope Microsoft fixes this problem quickly, because we users are really struggling! Switching to Ubuntu is on the table because in our work environment we will be forced to; we have no other choice. We're helpless. So we actually wasted the ISP's time over nothing; it wasn't their fault, it was Microsoft all along. My company is pushing for a quick fix. If this is not fixed by then, we plan to run our web app operations on Linux machines this coming week. Thank you, Gary, for the clarification. Microsoft needs to step up and fix the problem. Have a good day. - Elias

0 Votes
EliasTaye-2762 answered EliasTaye-2762 commented

@GaryNebbett I've seen them, and they don't say when the update will ship. Have you made peace with the problem and decided to live with it, or have you found a way to overcome it? And would downgrading help?

Hello @EliasTaye-2762,

I don't have the problem myself. Nothing other than these modifications will improve the situation for Windows.

Gary

@GaryNebbett Surprisingly, I have been facing this on all Windows devices, including Windows 7, and it started recently, after February. I checked with my ISP and they are not throttling. Other devices are just fine, and I am only having a problem with upload speeds!

0 Votes
EliasTaye-2762 answered GaryNebbett commented

@GaryNebbett Hey Gary, I'm currently facing the same issue. Did you find a fix for it? I've been struggling for months now. Linux is working fine, but I don't want to migrate. It's really hurting my upload speed.

Hello @EliasTaye-2762,

As my last comment mentioned, the following two articles indicate strongly that improvements are on the way (currently available via Windows Insider builds): https://techcommunity.microsoft.com/t5/networking-blog/algorithmic-improvements-boost-tcp-performance-on-the-internet/ba-p/2347061 and https://dropbox.tech/infrastructure/boosting-dropbox-upload-speed.

Gary

0 Votes
GaryNebbett answered GaryNebbett edited

Hello All,

I thought that a graphical representation of this problem might be helpful. Let’s start with two pictures of a “healthy” speed test (where the test results match the nominal or expected speed).

First, an overview of the 5 parallel (concurrent) connections used during the upload speed test:
94395-image.png

This is a larger view of one connection:
94422-image.png

The blue line is the TCP send sequence number; the green line is the size of the congestion window (not to scale on the Y-axis – it is its shape that is most interesting).

At 6 arbitrary Y-values, points are shown when “interesting” events occur; the events are:

• TcpEarlyRetransmit
• TcpLossRecoverySend Fast Retransmit
• TcpLossRecoverySend SACK Retransmit
• TcpDataTransferRetransmit
• TcpTcbExpireTimer RetransmitTimer
• DSACK [RFC3708] arrival

Some of these events do not always have a visible impact on the congestion window. For example, expiration of the retransmit timer (lime green coloured dots on the graph) does not affect the window if a “tail loss” is suspected. The potential for “tail loss” situations arises in the speed test because, within each connection, the WebSocket protocol is used to send chunks of data and this requires WebSocket protocol level handshakes.

The overview shows that there were only a few retransmissions (six, I think), roughly equally divided between fast retransmits (based on duplicate acknowledgements, SACK, etc.) and “slow” retransmits (retransmit timer expiration).

Here is a less healthy speed test:
94383-image.png

And again, a larger view of one connection:
94384-image.png

In these traces there are a large number of fast retransmits, almost all of which are accompanied (a few milliseconds later) by a DSACK notification (indigo coloured dots, the highest row of dots) – indicating that the retransmission was not needed. The impact of these retransmissions on the congestion window is clear to see – it is kept very small for almost the entire duration of the speed test.
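
For anyone who wants to quantify this pattern in their own trace, here is a minimal Python sketch that pairs each fast retransmit with a DSACK arriving shortly afterwards. It assumes the event timestamps (in milliseconds) have already been exported from the trace into two plain lists. The function name and the 10 ms pairing window are assumptions made for illustration, not anything taken from the Windows tooling.

```python
def count_spurious(retransmit_times_ms, dsack_times_ms, window_ms=10.0):
    """Count retransmissions followed by a DSACK within window_ms.

    Each DSACK is allowed to explain at most one retransmission.
    """
    retransmits = sorted(retransmit_times_ms)
    dsacks = sorted(dsack_times_ms)
    matched = 0
    j = 0
    for t in retransmits:
        # Skip DSACKs that arrived before this retransmission was sent.
        while j < len(dsacks) and dsacks[j] < t:
            j += 1
        if j < len(dsacks) and dsacks[j] - t <= window_ms:
            matched += 1
            j += 1
    return matched


if __name__ == "__main__":
    # Hypothetical timestamps, purely for illustration.
    retransmits = [100.0, 250.0, 400.0, 900.0]
    dsacks = [103.0, 254.0, 650.0]
    n = count_spurious(retransmits, dsacks)
    print(f"{n} of {len(retransmits)} retransmissions look spurious")
```

A high ratio of matched to total retransmissions would be consistent with the "less healthy" picture above; a low ratio would point towards genuine loss instead.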

The first half a second of the above trace hints at what might have been – the steep slope of the blue line suggests that a much higher upload speed is probably possible.

As mentioned in previous postings, there is not much that an end-user can do about this problem. The device responsible for the “ACK compression” (impacting timing and ordering of ACK delivery) is probably not under their control and the Microsoft congestion control provider does not use DSACK information to “undo” its erroneous reductions of the congestion window.

Gary


Hello All,

These articles suggest that improvements in Windows' handling of out-of-order packets are coming: https://techcommunity.microsoft.com/t5/networking-blog/algorithmic-improvements-boost-tcp-performance-on-the-internet/ba-p/2347061 and https://dropbox.tech/infrastructure/boosting-dropbox-upload-speed.

Gary

0 Votes
GaryNebbett answered

Hello All,

In the network traces that I have seen which record spurious retransmissions there is another characteristic that I mentioned as an aside, namely that "ack only" packets tend to appear in "bunches" with sub-microsecond spacing between the packets. I did not place too much emphasis on this because the process of capturing trace data can introduce "artefacts" (characteristics that would not be present if one could monitor the low-level signalling of the messages).

Whilst doing some background reading, I found this article by Christian Huitema: "Implementing Cubic congestion control in Quic". In the section on "Spurious retransmissions", he writes:

If you look closely at the graph of RTT above, you will see a number of vertical bars with multiple measurements happening at almost the same time. This is a classic symptom of “ACK compression”. Some mechanism in the path causes ACK packets to be queued and then all released at the same time.

This gives a name ("ACK compression") to the bunching that can be researched on the Web. Christian was writing about applying Cubic to HTTP/3 (UDP), but it is equally relevant to its use with TCP.

The use of the expression "Some mechanism in the path" is compatible with my belief that the source of the problem being discussed in this thread is beyond our control and that additional mechanisms are needed in the Windows implementation to compensate for it (e.g. undoing reductions in the congestion window caused by spurious retransmissions).
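
As a rough way to look for this bunching in one's own capture, the Python sketch below groups "ack only" arrival times into bunches whenever consecutive packets are closer together than a threshold. It assumes the timestamps have already been extracted into a list of microsecond values. The function name, the 1 µs gap and the minimum bunch size of 3 are illustrative assumptions, and capture artefacts can of course produce the same picture.

```python
def ack_bunches(arrival_times_us, max_gap_us=1.0, min_size=3):
    """Group ACK arrival times into bunches.

    A bunch is a run of at least min_size ACKs in which every
    consecutive pair is at most max_gap_us apart.
    """
    bunches = []
    current = []
    for t in sorted(arrival_times_us):
        if current and t - current[-1] > max_gap_us:
            # Gap too large: close the current run, keep it if big enough.
            if len(current) >= min_size:
                bunches.append(current)
            current = []
        current.append(t)
    if len(current) >= min_size:
        bunches.append(current)
    return bunches


if __name__ == "__main__":
    # Hypothetical arrival times in microseconds, purely for illustration.
    times = [0.0, 0.4, 0.9, 5000.0, 10000.0, 10000.3, 10000.5, 10000.8, 15000.0]
    for bunch in ack_bunches(times):
        print(f"{len(bunch)} ACKs bunched starting at {bunch[0]} us")
```

Evenly spaced ACKs would produce no bunches at all, so a trace where most ACKs fall into bunches is at least suggestive of ACK compression somewhere on the path.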

Gary

0 Votes
GaryNebbett answered

Hello @IanTurner-5104,

Speedtest uses several HTTP connections in parallel to test the upload speed. So far, I have just looked at one connection in detail. This connection retransmitted 85 segments, all of which were spurious (i.e. were ultimately received twice because the out-of-order delivery confounded the lost segment detection - the original segments were not lost but just arrived "late").

The "CUBIC for Fast Long-Distance Networks draft-eggert-tcpm-rfc8312bis-01" draft RFC (dated 2 February 2021), which might update "CUBIC for Fast Long-Distance Networks" (RFC 8312) says:

CUBIC MAY implement an algorithm to detect spurious retransmissions,
such as DSACK [RFC3708], Forward RTO-Recovery [RFC5682] or Eifel
[RFC3522]. Once a spurious congestion event is detected, CUBIC
SHOULD restore the original values of above mentioned variables as
follows if the current cwnd is lower than prior_cwnd. Restoring
to the original values ensures that CUBIC's performance is similar to
what it would be if there were no spurious losses.

The current Windows implementation of CUBIC is not undoing the reductions in the congestion window caused by spurious retransmissions - perhaps all of your other systems either do this or use a different congestion control mechanism.
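
To illustrate why the "undo" step matters, here is a toy Python model of a sender that hits a spurious fast retransmit every few ACKs. It is a sketch under heavy assumptions: window growth is linear rather than the real cubic function, only cwnd is saved and restored (the draft restores several variables), and the class and function names are invented for this example. The 0.7 reduction factor is the multiplicative decrease constant from RFC 8312.

```python
from dataclasses import dataclass
from typing import Optional

BETA_CUBIC = 0.7  # multiplicative decrease factor defined in RFC 8312


@dataclass
class ToySender:
    cwnd: float = 10.0                  # congestion window, in segments
    undo_spurious: bool = False         # restore cwnd when a DSACK arrives?
    prior_cwnd: Optional[float] = None  # cwnd saved before the last reduction

    def on_ack(self) -> None:
        self.cwnd += 1.0  # toy linear growth, NOT the real cubic curve

    def on_fast_retransmit(self) -> None:
        self.prior_cwnd = self.cwnd
        self.cwnd = max(2.0, self.cwnd * BETA_CUBIC)

    def on_dsack(self) -> None:
        # DSACK says the retransmission was unnecessary: optionally undo.
        if (self.undo_spurious and self.prior_cwnd is not None
                and self.cwnd < self.prior_cwnd):
            self.cwnd = self.prior_cwnd
        self.prior_cwnd = None


def run(undo_spurious: bool, cycles: int = 20, acks_per_cycle: int = 5) -> float:
    """Final cwnd after repeated (ACKs, spurious retransmit, DSACK) cycles."""
    sender = ToySender(undo_spurious=undo_spurious)
    for _ in range(cycles):
        for _ in range(acks_per_cycle):
            sender.on_ack()
        sender.on_fast_retransmit()
        sender.on_dsack()
    return sender.cwnd


if __name__ == "__main__":
    print("final cwnd without undo:", round(run(False), 2))
    print("final cwnd with undo:   ", round(run(True), 2))
```

In this toy model the window without undo stays below 12 segments no matter how long it runs, whereas with undo it reaches 110 segments after 20 cycles. The real numbers will differ, but the shape matches the flat green line in the unhealthy trace.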

Unless the out-of-order delivery is caused by a device under your control (unlikely), then there is not much that you can do about this.

Gary

0 Votes
GaryNebbett answered

Hello @IanTurner-5104,

There are certainly examples of very out-of-order delivery triggering unnecessary retransmissions and reduction of the congestion window in your trace data. The presence of D-SACK data (RFC 2883) in the trace leaves no doubt about this. I need to do some more work to quantify the frequency of such events and their impact on the throughput.

I doubt that the "out-of-order" behaviour is caused by your equipment; I think it is much more likely to be caused by network devices elsewhere in the Internet.

Gary
