Parallel Programming: Where Do We Go From Here: Part 1
The Performance of Desktop Applications in the ManyCore Era
Beginning in early 2008, machines with the latest quad-core processors became available from the major manufacturers. Should you be excited about the prospect? Should you run right out and buy one?
These machines have four independent processor cores on a single processor chip, each able to run OS or application program threads concurrently. The term “processor core” distinguishes this architecture from processors that support simultaneous multithreading (SMT) technology, such as Intel’s HyperThreading (HT), which simulates multiprocessing on a single processor core. Quad-core means four physical processors reside on the chip; eventually, with HT enabled as well, a single chip can present 8 logical processors (supporting 8-way concurrency).
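To make the distinction between physical cores and logical processors concrete, here is a minimal Python sketch. The helper names are hypothetical and chosen only for illustration; the one library call used, `os.cpu_count()`, reports logical processors, so on an SMT-enabled machine it counts hardware threads rather than physical cores.

```python
import os


def logical_processor_count() -> int:
    """Return the number of logical processors the OS reports (at least 1)."""
    # os.cpu_count() can return None when the count cannot be determined,
    # so fall back to 1 in that case.
    return os.cpu_count() or 1


def expected_logical_processors(physical_cores: int, threads_per_core: int) -> int:
    """Logical processors = physical cores x hardware threads per core."""
    return physical_cores * threads_per_core


if __name__ == "__main__":
    print("Logical processors on this machine:", logical_processor_count())
    # The quad-core chip with 2-way SMT described in the text.
    print("Quad-core with 2-way SMT:", expected_logical_processors(4, 2))
```

The second function is just the arithmetic from the paragraph above: four physical cores with two hardware threads each present eight logical processors to the OS.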
These latest multi-core machines represent an extension of an ongoing trend in processor design that has become known as the ManyCore architecture. Instead of machines with significantly faster CPUs, with ManyCore we expect machines with more physical processors on the chip. And these new parallel processing chips are aimed at standard server, desktop, and portable computers. This year’s crop of quad-core machines is expected to be replaced by subsequent generations of ManyCore machines with 8, 16, 32, 64, or more processor cores over the next 5-10 years. These developments in computer hardware present a serious challenge for the software development community: to harness this computing power effectively in the next generation of computer software.
In this series of blog entries, I am going to look at that challenge from several perspectives, including the advances in software development technology that are necessary to make many-core processors yield their full potential. Multiprocessor scalability, together with the software necessary to exploit massively parallel processors, is such a wide and deep topic in computer engineering that it is difficult to tackle head-on. In this first installment, I will focus on the factors influencing the hardware evolution. I will also discuss why many-core processors are arriving now. And, finally, I will talk about what to expect in the way of scalability and performance from this hardware running both current application software and the software of the future that will be designed to exploit the ManyCore platform fully.
Along with the move towards multiple CPUs on a single multiprocessor chip, the discussion will highlight several related hardware developments, including power management, improved timing features, 64-bit computing trends, non-uniform memory access speeds, new instruction sets, and even a word or two about the impact of virtualization technology.
First, let’s look at the performance and scalability implications of deploying multiprocessors to run current Windows desktop and server applications. This will provide a broader perspective on some of the key performance-oriented challenges associated with many-core processors.
To illustrate these challenges, please look at Figure 1. It is a screenshot from a performance monitoring session that shows processor utilization on my desktop computer during a fairly typical day for me as a Knowledge Worker. During the course of this particular day, I attended a couple of meetings, during which my machine was virtually idle. (Ah, meetings, I just love them!) But while I was working at my desk, I was mostly using my computer. I answered a bunch of e-mails, surfed the web a few times, and even managed to find some time to do some programming and software testing. Figure 1 shows processor utilization across a dual-core (two-processor) machine at one-minute intervals throughout the day. As you can see, processor utilization peaked at less than 40% busy. Given this feeble load on my machine, explain to me again why I need more processing power. If the two processors I have currently are usually idle, wouldn’t having four of them mean just that much more idle horsepower?
This is just one piece of the strong evidence suggesting that the need for many-core processors on the average person’s desktop machine is currently quite limited. Therein lies the first and foremost challenge to the industry’s goal of replacing your current desktop hardware with a new machine with 4, 8, or more processors. The overall performance gains that this hardware promises absolutely require application software that can fully exploit parallel processing hardware. That software is generally not available today for typical desktop applications, which were written with conventional single-CPU machines in mind rather than parallel processing. But that is beginning to change as the software industry rushes to catch up with the hardware. (Nothing new about this, by the way: software technology usually lags the hardware by 3-5 years or more.)
But the situation is far from hopeless. Developers of desktop applications have not had good reasons to program for multiprocessors until recently. Consider, for example, the Xbox 360 platform, which is a three-way 64-bit multiprocessor with a parallel vector graphics co-processor (GPU) that is capable of performing something like 20 billion vector operations per second. The rich, immersive user experience associated with a well-designed Xbox 360 application is a harbinger of the future of the Windows desktop as we start to build desktop applications that fully exploit multi-core architecture machines.
But getting there will be a serious challenge. I say that for several main reasons:
· Although the software industry has made significant progress in parallel programming, there is no general-purpose method available that can reliably take a serial process and turn it into a multi-threaded parallel processing application. While some parallel programming patterns like “divide-and-conquer” can be used in many situations, most applications still require parallelization on a case-by-case basis.
· Multiple threads executing in parallel have non-deterministic execution patterns, which can introduce subtle errors that are very difficult to debug with existing developer tools.
· Although adding a little parallelism to a serial application often leads to major speed-ups, there are often diminishing returns from adding more and more parallel processing threads. Determining an optimal number of parallel processing threads can be quite difficult in practice.
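One common way to reason about the diminishing returns in the last point is Amdahl's law. The post does not name it, so treat it here as an illustrative model rather than the author's method. If a fraction p of a program's work can run in parallel across n threads, the best-case speed-up is 1 / ((1 - p) + p / n). The sketch below assumes p = 0.90 purely for illustration; the function name is hypothetical, and the model ignores synchronization and scheduling overhead, which only make matters worse in practice.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Best-case speed-up under Amdahl's law for the given thread count."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    # The serial part of the work is not helped by adding threads.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    p = 0.90  # assumed parallel fraction, chosen only for illustration
    for n in (1, 2, 4, 8, 16, 32, 64):
        print(f"{n:>2} threads: {amdahl_speedup(p, n):.2f}x")
```

Under that assumed fraction, four threads give roughly a 3x speed-up, while 64 threads stay below 9x, and no number of threads can exceed 10x, because the serial 10% of the work always remains.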
More about the multi-core challenge and related topics in software performance in the next blog post.
One footnote on Figure 1: during lunch and the other periods when I wasn’t even at my desk, utilization of the machine never dropped below 2%, for which I can thank all those background threads in both Windows Vista and Office 2007 that wake up and execute from time to time, regardless of whether I am using the machine or not.