From the Dept of Information Retrieval on .NET Compact Framework Network Performance

"How fast is the .NET Compact Framework?"

I have only recently organized enough thoughts and words to be able to answer this question with something more useful than "it depends".

The .NET Compact Framework, in the vast majority of cases, is fast enough. I know this because I've seen lots of complex applications running on the platform that are scalable and snappy. I've also seen apps that are dog slow. I used to claim that RPC's downfall was that it isolated the developer from the network but did not isolate the customer from the network. In a similar way, I fear we've isolated, or perhaps just delayed, developers from the pain of running on transiently connected, resource constrained hardware with lots of data and other apps running. Developing for devices, using any tool, is still a specialty. It's clear to me that we need to get faster (and we are), but we also need to do a better job at providing tools that give better visibility into the dynamic behavior of the system. Equally important, we need to provide some architectural guidance around best and worst practices ("don't read from a local table, convert to XML, serialize and send to another process via IPC, read into the DOM, modify it, pass it back, ..."). And finally I'm counting on the wider community to help us, and the community, to understand what works and what doesn't in practice and what we need to fix.

We presented the following information at a recent internal project review. I hope you'll be happy with the gains we've made to date. The work continues.

But this post isn't about the performance of the .NET Compact Framework, exactly. Lately, people, sensing that I'm feeling somewhat confident about general performance, have started asking me "How fast is networking with the .NET Compact Framework?"

"It depends."

I'd like to be able to write an entry about networking performance with lots of tables of numbers and details and stuff, but I don't have the numbers. Still, I think I can do better than this post about nothing. I'm still trying to think up a way to plug my web site. I haven't thought of anything clever yet.

But I am prepared to ramble on for a while about a few things I've learned about the general problem of network performance. We will get better at specific measurements in the future. If you have better data, please share it.

Let's start with a simple model of two boxes connected via a serial bus with hardware flow control. The bus has a fixed maximum throughput, measured in bits per second. The bus will deliver data to the destination as fast as, but no faster than, the target can receive it. The sender can just push bits out the bus as fast as they will be accepted by both the bus and the target computer. The test app has a loop that generates blocks of data that repeat the sequence and sends them to the bus. The receiver consumes and discards the messages as they are received.

The first thing we would likely measure on this setup is the overall throughput; how many bytes / second can we move between the machines? Let's say we're not happy with the result and we want to start optimizing. Where should we start? The first step is to isolate the bottleneck. In this case, either the pipe is at capacity, the sender can't fill the pipe, or the receiver can't empty it. Working on the wrong thing will have no effect at all on the performance. The first easy check is CPU utilization at each end. If it's low, then the pipe is likely the bottleneck, or there's a timer somewhere in the code path that is not allowing the CPU to fill the pipe. If it's close to 100% at either end, then that is the place to start optimizing.
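One way to write down that triage is the small Python sketch below. The function name and the 90% "busy" threshold are my own assumptions for illustration, not measured values; pick a cutoff that suits your hardware.

```python
def first_place_to_look(sender_cpu, receiver_cpu, busy_threshold=0.9):
    """Suggest where to start optimizing, given CPU utilization (0.0 to 1.0) at each end.

    busy_threshold is a hypothetical cutoff for "close to 100%".
    """
    if sender_cpu >= busy_threshold and sender_cpu >= receiver_cpu:
        return "sender"
    if receiver_cpu >= busy_threshold:
        return "receiver"
    # Both ends are mostly idle: either the pipe is full, or something
    # (such as a timer) is keeping the CPU from filling it.
    return "pipe or a timer in the code path"


print(first_place_to_look(0.15, 0.05))
print(first_place_to_look(0.98, 0.20))
```

The function only tells you where to look first. Telling a full pipe from a hidden timer still takes a throughput measurement against the bus rate.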

It's quite helpful to do some "order of magnitude" math to see if the measured results are in any way reasonable. Let's say we assume both machines above have a clock rate of 1 billion instructions per second (1 GHz) and the serial bus has a rate of 100Mbit/s. The serial bus throughput in bytes / sec is roughly bits/s / 10, or 10 Mbyte/s.

Bus Rate Bits/s: 100,000,000
Bus Rate Bytes/s (/10): 10,000,000

Let's assume that the sending machine requires 10 machine instructions, per byte, to generate the formatted data buffers, and that the cost to send through the driver stack is 5 instructions / byte. With this information we can calculate the number of machine instructions that will be consumed by all net processing on the send side per second.

Instructions / Byte - Data Manipulation: 10
Instructions / Byte - Send/Recv Path: 5
Total Instructions / Second: 150,000,000

Since we know the clock rate of the machines, we can calculate the amount of processor load on the sending machine devoted to networking. In this case, it's 15%.

CPU Clock Rate / Second: 1,000,000,000
CPU Utilization: 15%

Clearly the bus is easily going to be saturated. The receiver is even more lightly loaded, as it is simply discarding the incoming bytes.

Bus Rate Bits/s: 100,000,000
Bus Rate Bytes/s (/10): 10,000,000
Instructions / Byte - Data Manipulation: 0
Instructions / Byte - Send/Recv Path: 5
Total Instructions / Second: 50,000,000
CPU Clock Rate / Second: 1,000,000,000
CPU Utilization: 5%
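The arithmetic behind the sender and receiver figures can be sketched in a few lines of Python. The function and parameter names are mine, and the model is deliberately crude: it assumes one instruction per clock cycle and 10 bits on the wire per byte, as the text does.

```python
def cpu_utilization(bus_bits_per_s, data_instr_per_byte, path_instr_per_byte, cpu_instr_per_s):
    """Fraction of the CPU consumed by network processing when the bus is kept full."""
    bytes_per_s = bus_bits_per_s / 10  # rough conversion: 10 bits on the wire per byte
    instr_per_s = bytes_per_s * (data_instr_per_byte + path_instr_per_byte)
    return instr_per_s / cpu_instr_per_s


# 100 Mbit/s bus, 1 GHz machines
sender = cpu_utilization(100_000_000, 10, 5, 1_000_000_000)
receiver = cpu_utilization(100_000_000, 0, 5, 1_000_000_000)
print(f"sender {sender:.0%}, receiver {receiver:.0%}")  # prints "sender 15%, receiver 5%"
```

A result above 100% means the CPU cannot keep the bus full, which is the signal to stop looking at the pipe and start looking at the code path.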

Data compression would be a great optimization in this case. Assume that we count compression as an additional 25 instructions / byte on each side, and we achieve a 25% compression rate. For the sending side (the bottleneck), the math looks like this:

Bus Rate Bits/s: 100,000,000
Bus Rate Bytes/s (/10): 10,000,000
Instructions / Byte - Data Manipulation: 35
Instructions / Byte - Send/Recv Path: 5
Total Instructions / Second: 400,000,000
CPU Clock Rate / Second: 1,000,000,000
CPU Utilization: 40%

The sender's processor utilization is still only 40%, but the effective throughput of the system has increased by about 25%, to roughly 12.5 Mbytes/s.
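Here is a rough Python sketch of the compression arithmetic. "25% compression" can be read two ways, so the code computes both: a 25% gain in payload rate (the data shrinks by 20%) and a 25% reduction in size (the payload rate rises by a third). The function names are mine, and the CPU cost is charged per byte on the wire, the same simplification the table makes.

```python
BUS_BYTES_PER_S = 100_000_000 / 10   # 100 Mbit/s bus, 10 bits on the wire per byte
CPU_INSTR_PER_S = 1_000_000_000      # 1 GHz, one instruction per cycle assumed


def payload_throughput(bus_bytes_per_s, size_reduction):
    """Uncompressed bytes/s delivered when compression removes size_reduction of the data."""
    return bus_bytes_per_s / (1.0 - size_reduction)


def sender_cpu(bus_bytes_per_s, instr_per_byte, cpu_instr_per_s):
    """CPU fraction, charging instr_per_byte against each byte on the wire."""
    return bus_bytes_per_s * instr_per_byte / cpu_instr_per_s


print(sender_cpu(BUS_BYTES_PER_S, 35 + 5, CPU_INSTR_PER_S))   # CPU fraction with compression
print(payload_throughput(BUS_BYTES_PER_S, 0.20) / 1e6)        # Mbytes/s if data shrinks by 20%
print(payload_throughput(BUS_BYTES_PER_S, 0.25) / 1e6)        # Mbytes/s if data shrinks by 25%
```

Either way the conclusion is the same: the bus is the bottleneck, the CPU has cycles to spare, and spending them on compression buys throughput.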

It's interesting to do this math on a 56KBits/s link and a 200 MHz processor, not unlike what you might see on a PocketPC on a (most excellent) GPRS network.

Bus Rate Bits/s: 56,000
Bus Rate Bytes/s (/10): 5,600
Instructions / Byte - Data Manipulation: 250
Instructions / Byte - Send/Recv Path: 10
Total Instructions / Second: 1,456,000
CPU Clock Rate / Second: 200,000,000
CPU Utilization: 1%

I had to crank up the instruction / byte count to about 250 before I could get a 1% processor load. Clearly, compression is going to help, and CPU utilization is not a major concern.

If I substitute a 10 Mbit/s LAN in the same system and don't change anything else, the processor can't even keep the pipe full. Focusing on the overall code path length / byte is the right optimization in this case. It's conceivable that expansion, say, by always sending the same block size, might even help (and be a disaster in the previous case).

Bus Rate Bits/s: 10,000,000
Bus Rate Bytes/s (/10): 1,000,000
Instructions / Byte - Data Manipulation: 250
Instructions / Byte - Send/Recv Path: 10
Total Instructions / Second: 260,000,000
CPU Clock Rate / Second: 200,000,000
CPU Utilization: 130%
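The GPRS and LAN cases differ only in which side runs out first. A minimal Python sketch of that comparison follows; the function name is mine, and it assumes one instruction per clock cycle and 10 bits per byte on the wire.

```python
def analyze_link(bus_bits_per_s, instr_per_byte, cpu_instr_per_s):
    """Return (achievable bytes/s, bottleneck) for one end of the connection."""
    bus_bytes_per_s = bus_bits_per_s / 10               # what the pipe can carry
    cpu_bytes_per_s = cpu_instr_per_s / instr_per_byte  # what the CPU can produce
    achievable = min(bus_bytes_per_s, cpu_bytes_per_s)
    bottleneck = "cpu" if cpu_bytes_per_s < bus_bytes_per_s else "bus"
    return achievable, bottleneck


# 200 MHz processor, 250 + 10 instructions per byte
print(analyze_link(56_000, 260, 200_000_000))      # GPRS-class link
print(analyze_link(10_000_000, 260, 200_000_000))  # 10 Mbit/s LAN
```

On the slow link the bus limits throughput, so shrinking the data helps. On the LAN the CPU tops out at roughly 770 Kbytes/s, short of the 1 Mbyte/s the pipe could carry, so shortening the code path per byte is what helps.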

The next thing to consider is the size of transfers, as transfer size / effective throughput generally equals customer foot tapping delay, with a threshold of somewhere between 10 and 30 seconds generating a customer support call.

Transfer size   4800 bits/s   56 Kbits/s   10 Mbits/s   1 Gbit/s
1 Kbyte         2s            0.18s        0.001s       0.00001s
100 Kbytes      208s          18s          0.1s         0.001s
100 Mbytes      208333s       17857s       100s         1s

The key thing to note here is that at the bottom end (like on, say, GPRS networks), data size has a huge effect on responsiveness.
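The transfer times above are just size divided by throughput, using the same rough 10 bits per byte conversion. A short Python sketch that regenerates them and flags the combinations past a 10 second foot-tapping threshold (the names and the helper are mine):

```python
LINKS_BITS_PER_S = {"4800": 4_800, "56K": 56_000, "10M": 10_000_000, "1G": 1_000_000_000}
SIZES_BYTES = {"1K": 1_000, "100K": 100_000, "100M": 100_000_000}


def transfer_seconds(size_bytes, link_bits_per_s):
    """Transfer time = size / throughput, at 10 bits on the wire per byte."""
    return size_bytes / (link_bits_per_s / 10)


def slow_transfers(threshold_s=10):
    """Return (size, link) pairs whose transfer time exceeds the threshold."""
    return [(size, link)
            for size, nbytes in SIZES_BYTES.items()
            for link, bps in LINKS_BITS_PER_S.items()
            if transfer_seconds(nbytes, bps) > threshold_s]


for size, nbytes in SIZES_BYTES.items():
    row = [f"{transfer_seconds(nbytes, bps):g}s" for bps in LINKS_BITS_PER_S.values()]
    print(size, *row)

print(slow_transfers())
```

These are best-case numbers: they ignore latency, protocol overhead, and retransmission, all of which make the slow links look worse.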

Finally, it's worth considering latency separately from throughput. You might expect the time it would take (in seconds) to send a single byte would be equal to 1 / effective bandwidth. In practice, multiple different protocol layers add extra bytes as envelope, and the entire package needs to be delivered before the single byte can be processed.
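To put rough numbers on the envelope effect, here is a small Python sketch. The 40 bytes is the minimum IPv4 plus TCP header size with no options; the 500 byte text envelope is purely a hypothetical figure of mine. The model counts only the time to push the bytes through the link and ignores propagation and queuing delay.

```python
def delivery_seconds(payload_bytes, envelope_bytes, link_bits_per_s):
    """Time to push payload plus envelope through the link (serialization delay only)."""
    return (payload_bytes + envelope_bytes) / (link_bits_per_s / 10)


GPRS = 56_000          # bits/s
TCP_IP_HEADERS = 40    # minimum IPv4 (20) + TCP (20) header bytes, no options
TEXT_ENVELOPE = 500    # hypothetical size of a text message envelope

print(delivery_seconds(1, 0, GPRS))                               # the naive 1 / bandwidth
print(delivery_seconds(1, TCP_IP_HEADERS, GPRS))                  # one byte in a TCP/IP packet
print(delivery_seconds(1, TCP_IP_HEADERS + TEXT_ENVELOPE, GPRS))  # plus a text envelope
```

On a slow link the envelope, not the payload, dominates the time to deliver a small message.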

I'm going to break one of my strongly held principles, "measure twice and cut once", and speculate about where I think the .NET Compact Framework is going to "net out" on networking performance today.

At the managed sockets level, the .NET Compact Framework doesn't add much processor overhead to the code path. If you replicated the simple two box and water pipe test above, even in the CPU bound case, I would expect we would be reasonably close to a native C++ solution. In the real world, of course, developers like to do interesting and useful things with data, and that takes processor time. As the amount of C# processing increases, the delta between managed code performance and C++ will increase. I can imagine a worst case scenario (say, a fully loaded database engine written entirely in .NET Compact Framework - something I wouldn't recommend doing) where we're 50% slower. Fortunately we are not generally CPU bound in most real world networking scenarios today.

In the more typical slow network / fast CPU case, at the managed sockets level, I expect there is also little cost of choosing C#. You would have to do a great deal of data processing to create a situation where the processor couldn't keep up with the network.

From both a throughput and latency perspective, web services change the equation. The thing that makes web services great for real world interop, text readable messages, also adds considerable envelope size to the data. Swapping out managed sockets for web services, in the fully loaded, CPU bound case, would likely drop your performance considerably.

The best way to decide what technology to use is to try a simple test prior to writing lots of code. The key is to make sure the test scenario is representative of the ultimate configuration. If you develop a client / server app that, for example, requires a server round trip to validate every field, then latency is going to be a key issue. Be sure to try this on a slow network. But if you are downloading, say, high scores or small maps, then the convenience of web services might outweigh the envelope cost, even on the slow network, and will be in the noise on a fast network.

As compared to the bus scenario above, TCP/IP adds another level of complexity. Where a bus can be conceptualized as a water pipe, a TCP/IP connection looks more like a water pipe, feeding a pressure tank, with dozens of different size pipes coming off of it, some of which feed an outdoor fountain, which drains into a sump, which elevates the water to a dozen gravity feed storage tanks, each with dozens of different size pipes coming off of it ... Frankly, I'm amazed the thing works at all. The "backpressure" on a TCP sender is essentially a heuristic on the send side about how fast the network, end to end, can deliver packets. If the network delivers IP packets in roughly constant time, and damages or drops few of them, then TCP will stabilize at a relatively efficient transfer rate. If the packet delay varies widely, or the error rate is high or constantly changing, then TCP will become confused, thinking that it is overwhelming the network, and will throttle back the send rate. The bad news is that there is nothing you can do about this. The good news is that the TCP vendors, us included, are always working to optimize this.

The reason I bring this up is that it can bite you on GPRS, and result in 50% decreases in throughput and corresponding increases in latency. At some point, you need to switch your application to a store and forward model / database replication model, where the UI is essentially a view on the local data, and an asynchronous replication engine periodically tries its best to keep the local and remote databases in sync.
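A very reduced Python sketch of the store and forward idea follows. Everything in it is hypothetical: the class names, the in-memory dictionaries standing in for databases, and the one-way (local to remote) replication. A real replication engine also has to handle conflicts, remote changes, and persistence across restarts.

```python
from collections import deque


class LocalStore:
    """The UI reads and writes local data; changes are queued for later replication."""

    def __init__(self, send):
        self.rows = {}          # local data the UI works against
        self.outbox = deque()   # changes not yet accepted by the remote side
        self._send = send       # callable(key, value); raises ConnectionError if the link is down

    def put(self, key, value):
        self.rows[key] = value
        self.outbox.append((key, value))

    def sync(self):
        """Push queued changes in order; stop at the first failure and retry later."""
        sent = 0
        while self.outbox:
            key, value = self.outbox[0]
            try:
                self._send(key, value)
            except ConnectionError:
                break
            self.outbox.popleft()
            sent += 1
        return sent


class FlakyLink:
    """Hypothetical transport: delivers to a dict when up, raises when down."""

    def __init__(self):
        self.up = False
        self.remote = {}

    def send(self, key, value):
        if not self.up:
            raise ConnectionError("link down")
        self.remote[key] = value


link = FlakyLink()
store = LocalStore(link.send)
store.put("score", 42)   # the UI write succeeds immediately, even offline
store.sync()             # link is down: nothing is sent, the change stays queued
link.up = True
store.sync()             # link is back: the queued change is replicated
print(link.remote)
```

The point of the design is that the customer never waits on the network: the UI only ever touches local data, and the sync loop absorbs whatever the link is doing.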

The speed of your "first hop" TCP/IP network connection usually doesn't have much to do with the effective bandwidth, unless it is the slowest link (like, say, GPRS).

If you are using TCP, learn about the Nagle algorithm. If you introduce any stop and wait behavior in your system, be wary of the huge effect that can have on throughput (but not latency).
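Two small Python illustrations, offered as sketches rather than recommendations. The first estimates throughput when every block must be acknowledged before the next is sent; the 1 second round trip is a hypothetical figure, not a measurement. The second shows the standard socket option (TCP_NODELAY) that turns the Nagle algorithm off. Whether you should turn it off depends on your traffic pattern.

```python
import socket


def stop_and_wait_throughput(block_bytes, link_bytes_per_s, round_trip_s):
    """Bytes/s when each block must be acknowledged before the next one is sent."""
    return block_bytes / (block_bytes / link_bytes_per_s + round_trip_s)


def socket_without_nagle():
    """Create a TCP socket with the Nagle algorithm disabled (TCP_NODELAY set)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


# 1,000 byte blocks on a 5,600 byte/s link with a hypothetical 1 second round trip
print(stop_and_wait_throughput(1_000, 5_600, 1.0))
```

With those numbers the link delivers well under a sixth of its 5,600 bytes/s, because most of each cycle is spent waiting for the acknowledgement. Larger blocks, or several blocks in flight at once, recover most of that.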

I'm getting ready to go to MEDC in Las Vegas. I cut myself a pretty good deal this year. I'm on a panel discussion, which gets me one of those cool "speaker shirts", but I'm not giving a talk, which takes lots of preparation and frankly scares me silly. Both my team and some of our MVPs are working on new material for the show. I look forward to seeing you there.


"We're all in this together".




This posting is provided "AS IS" with no warranties, and confers no rights.