To Share Or Not To Share

I remember finding the big-endian vs. little-endian argument fascinating back in the day, when there was a real question about which processor architectures were going to wind up on top. It was interesting both because it really did seem so very important (I liked little-endian machines) and because it was so obviously not a purely technical question, at least not from a software perspective.

A similar situation exists today in the area of programming concurrent systems -- should we build concurrent software on a shared-memory model or rely on message-passing between isolated memory components? It's a similarly silly argument, not because the question is irrelevant, but because the answer is yes and yes. The two are neither mutually exclusive nor in competition.

It seems pretty clear that any programming model based on accessing shared memory will have trouble scaling linearly beyond a certain number of concurrent threads -- the pressure on the cache-coherence infrastructure simply becomes too great unless you find ways to limit the actual concurrent accesses. The more concurrent processing, the greater the risk of lock contention, which inevitably cuts into scalability.
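To make the contention point concrete, here is a minimal sketch in C with OpenMP (the language and the toy workload are my choice for illustration, not anything prescribed above). It contrasts a counter that every thread updates through a single critical section with a reduction in which each thread counts privately and touches shared state only once at the end; on most machines the first version gets slower, not faster, as threads are added.

/* Sketch: contended vs. uncontended counting with OpenMP.
 * The "critical" version funnels every increment through one lock;
 * the "reduction" version lets each thread count privately and
 * merges the results once per thread at the end. */
#include <stdio.h>
#include <omp.h>

#define N 10000000L

int main(void)
{
    long shared_count = 0;

    /* Every iteration fights for the same lock: more threads,
     * more contention, little or no speedup. */
    double t0 = omp_get_wtime();
    #pragma omp parallel for
    for (long i = 0; i < N; i++) {
        #pragma omp critical
        shared_count++;
    }
    double t1 = omp_get_wtime();

    /* Each thread counts in private storage; contention is
     * limited to one shared update per thread. */
    long reduced_count = 0;
    double t2 = omp_get_wtime();
    #pragma omp parallel for reduction(+:reduced_count)
    for (long i = 0; i < N; i++) {
        reduced_count++;
    }
    double t3 = omp_get_wtime();

    printf("critical:  %ld in %.2fs\n", shared_count, t1 - t0);
    printf("reduction: %ld in %.2fs\n", reduced_count, t3 - t2);
    return 0;
}

(Compile with something like gcc -fopenmp; the absolute numbers don't matter, only how they change as OMP_NUM_THREADS grows.)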

On the other hand, models relying on message-passing between isolated components promote separation of the accessed addresses, but they must be fairly coarse-grained in their interactions to realize any benefit: the overhead of copying data into messages quickly eats into whatever you gain from isolation. And message-passing is typically implemented on top of shared memory anyway, both for the message payload and for the locks protecting the messaging infrastructure.
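To illustrate that last point, here is a toy bounded channel between two threads, sketched in C with pthreads (channel_send, channel_recv and the rest are names I made up, not any particular library's API). Strip away the vocabulary and it is nothing but a shared buffer, a mutex, and a pair of condition variables.

/* Sketch: a toy bounded "channel" between two threads. Under the
 * hood it is shared memory (the buffer holding the payloads) plus
 * a lock and condition variables protecting the infrastructure. */
#include <pthread.h>
#include <stdio.h>

#define CAP 16

typedef struct {
    int buf[CAP];                 /* shared storage for message payloads */
    int head, tail, count;
    pthread_mutex_t lock;         /* lock protecting the queue state */
    pthread_cond_t not_empty, not_full;
} channel_t;

static void channel_init(channel_t *ch)
{
    ch->head = ch->tail = ch->count = 0;
    pthread_mutex_init(&ch->lock, NULL);
    pthread_cond_init(&ch->not_empty, NULL);
    pthread_cond_init(&ch->not_full, NULL);
}

static void channel_send(channel_t *ch, int msg)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->count == CAP)
        pthread_cond_wait(&ch->not_full, &ch->lock);
    ch->buf[ch->tail] = msg;              /* copy the payload */
    ch->tail = (ch->tail + 1) % CAP;
    ch->count++;
    pthread_cond_signal(&ch->not_empty);
    pthread_mutex_unlock(&ch->lock);
}

static int channel_recv(channel_t *ch)
{
    pthread_mutex_lock(&ch->lock);
    while (ch->count == 0)
        pthread_cond_wait(&ch->not_empty, &ch->lock);
    int msg = ch->buf[ch->head];
    ch->head = (ch->head + 1) % CAP;
    ch->count--;
    pthread_cond_signal(&ch->not_full);
    pthread_mutex_unlock(&ch->lock);
    return msg;
}

static void *producer(void *arg)
{
    channel_t *ch = arg;
    for (int i = 0; i < 100; i++)
        channel_send(ch, i);
    channel_send(ch, -1);                 /* sentinel: no more messages */
    return NULL;
}

int main(void)
{
    channel_t ch;
    channel_init(&ch);

    pthread_t prod;
    pthread_create(&prod, NULL, producer, &ch);

    long sum = 0;
    for (int msg; (msg = channel_recv(&ch)) != -1; )
        sum += msg;                       /* consume on the main thread */

    pthread_join(prod, NULL);
    printf("received sum: %ld\n", sum);
    return 0;
}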

The scalability benefit of isolation comes from each component spending significant time minding its own business, and it may be very hard to find many components like that in an application.

Thus, we need to use a shared-memory model within each such component. This is what programmers in the HPC world are used to doing -- message-passing between nodes, shared-memory parallelism within each node.
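A minimal sketch of that hybrid pattern, again in C, assuming MPI and OpenMP are available (the workload -- a partial harmonic sum -- is just a stand-in): each rank parallelizes its slice of the loop over shared memory, and the partial results are combined by message-passing.

/* Sketch: hybrid parallelism -- MPI message-passing between ranks
 * (typically one rank per node), OpenMP shared-memory parallelism
 * within each rank. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Split the iteration range across ranks. */
    const long n = 100000000L;
    long chunk = n / size;
    long lo = rank * chunk;
    long hi = (rank == size - 1) ? n : lo + chunk;

    /* Shared-memory parallelism inside the rank. */
    double local = 0.0;
    #pragma omp parallel for reduction(+:local)
    for (long i = lo; i < hi; i++)
        local += 1.0 / (double)(i + 1);

    /* Message-passing between the isolated ranks. */
    double total = 0.0;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("partial harmonic sum H(%ld) ~ %f\n", n, total);

    MPI_Finalize();
    return 0;
}

(Run with something like mpirun -np 4, with OMP_NUM_THREADS set per rank; the point is only the division of labor, not the arithmetic.)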

By using component isolation to cut down on the complexity of the possible conflicts, together with carefully chosen shared-memory algorithms within each component, we are more likely both to find the concurrency we need to make our apps run like the wind on moderately parallel hardware and to make them scale to higher degrees of hardware parallelism in the future, without rewriting.

Anyway, it's interesting to think about.