{ End Bracket }

Transactions for Memory

Joe Duffy

Injecting parallelism into your app is easy—managed code offers things like explicit threading and a thread pool for that. Ensuring your code remains correct when run in parallel, on the other hand, is not quite so simple. And if you manage to make it correct, tuning it to effectively exploit parallel hardware is even harder. Let's face it: concurrency is not a walk in the park.

History is often quick to offer unique insight into new problems. Concurrency is no different. In fact, database vendors have faced and solved similar problems for years. When a client app generates some SQL, the client programmer seldom thinks in terms of managing concurrent access to individual tables, rows, and columns. Database server software takes care of correctness through the use of transactions, ensuring atomicity, consistency, isolation, and durability (ACID). Databases also manage parallel scale-out through intelligent data- and query-parallelization algorithms, resulting in efficient utilization across many processors and cores, and in some cases across a cluster of machines.

This sounds promising. What might transactions for memory bring to our concurrent software?

Locks Are Implementation Details

If you take a close look at lock-based programming, a reasonable conclusion is that locking is just a tool, not a programming model:

lock (...)
{
    try
    {
        // (1) 'atomic' updates
    }
    catch
    {
        // (2) ensure that invariants hold
        throw;
    }
}

The intent of this code is that updates to data inside (1) are atomic and consistent, provided that those accessing the data follow the locking protocol. As written, this is rather clumsy. It requires that the programmer hand-author the compensation code in (2) to roll back inconsistencies in the face of failure. But what if she could write the following instead?

atomic { /* (1) atomic updates */ }

Thread-safety, isolation, and compensation are implied.
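
To see what that hand-authored compensation costs in practice, consider a sketch of the lock-based pattern with the rollback in (2) written out. The scenario here (an Inventory type that moves items between two lists) is hypothetical and mine, not from this column; it simply makes the pattern concrete:

using System.Collections.Generic;

// Hypothetical example: reserving an item must update two lists together,
// so the catch block has to undo any half-finished work by hand.
public sealed class Inventory
{
    private readonly object _sync = new object();
    private readonly List<string> _available = new List<string>();
    private readonly List<string> _reserved = new List<string>();

    public void Reserve(string item)
    {
        lock (_sync)
        {
            bool removed = false;
            try
            {
                // (1) the 'atomic' updates: move the item between the two lists
                removed = _available.Remove(item);
                _reserved.Add(item);
            }
            catch
            {
                // (2) hand-written compensation: restore the invariant, then rethrow
                if (removed)
                    _available.Add(item);
                throw;
            }
        }
    }
}

With an atomic block, both list operations would simply sit inside atomic { ... } and the hand-written compensation in (2) would disappear, because an escaping exception discards the block's tentative updates.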

How Might Atomic Blocks Be Implemented?

One approach is transactional memory. The block runs inside a transaction and all memory reads and writes are isolated within a thread-private log. When the atomic block completes, conflicts are detected and, if none are found, the writes are committed to memory and the effects become visible to the entire program. Otherwise, the transaction log is discarded and the block is reexecuted. If an exception escapes the atomic block unhandled, all updates made by the block are automatically thrown away. It all just works, much like working with a database as I described earlier.
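
To make that mechanism concrete, here is a deliberately naïve C# sketch of the idea. Every name in it (TxCell, Transaction, Stm.Atomic) is hypothetical and invented for this illustration; a real transactional memory adds memory barriers, finer-grained commit locking, and far smarter conflict detection:

using System;
using System.Collections.Generic;

// A transactional cell: a value plus a version stamp used for conflict detection.
public sealed class TxCell<T>
{
    internal T Value;
    internal long Version;
}

// A thread-private log of reads (recorded as validation checks) and buffered writes.
public sealed class Transaction
{
    internal readonly List<Func<bool>> Validations = new List<Func<bool>>();
    internal readonly List<Action> PendingWrites = new List<Action>();
    private readonly Dictionary<object, object> _buffered = new Dictionary<object, object>();

    public T Read<T>(TxCell<T> cell)
    {
        if (_buffered.TryGetValue(cell, out object pending))
            return (T)pending;                       // read our own earlier write
        long seen = cell.Version;
        T value = cell.Value;
        Validations.Add(() => cell.Version == seen); // commit fails if someone else wrote
        return value;
    }

    public void Write<T>(TxCell<T> cell, T value)
    {
        _buffered[cell] = value;                     // visible only to this transaction
        PendingWrites.Add(() => { cell.Value = value; cell.Version++; });
    }
}

public static class Stm
{
    private static readonly object CommitLock = new object();

    // Runs the block optimistically; on conflict the log is discarded and the block re-executes.
    public static void Atomic(Action<Transaction> body)
    {
        while (true)
        {
            var tx = new Transaction();
            body(tx);                                // an escaping exception abandons the log
            lock (CommitLock)
            {
                bool valid = true;
                foreach (Func<bool> check in tx.Validations)
                    if (!check()) { valid = false; break; }
                if (valid)
                {
                    foreach (Action write in tx.PendingWrites)
                        write();                     // effects become visible to the program here
                    return;
                }
            }
            // conflict detected: retry with a fresh transaction
        }
    }
}

Note that an exception thrown from the body propagates out before the commit ever runs, so the buffered writes are never published; that is exactly the automatic roll-back behavior described above.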

Declarative Concurrency Permits the Runtime to Optimize

Serializing access to data with locks means trading away parallelism and scalability in exchange for correctness. Clever locking schemes can be used to reap the benefits of both, but they require complex code and engineering. If you're crafting a highly reusable, scientific-quality library, this amount of work might be par for the course. But if you're writing an application, the tax is simply too high.

Locks employ a technique known in database circles as pessimistic concurrency, while transactional memory prefers optimistic concurrency. The former prohibits concurrent access to data structures while the transaction executes, while the latter detects (and sometimes even tolerates) problematic conflicts at commit time. If two concurrent tasks work on the same data structure and do not conflict, for example, serialization is not necessary; yet a naïve lock-based algorithm might impose it. Permitting the transaction manager to choose an appropriate execution strategy using runtime heuristics can lead to highly scalable code.
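
A tiny example makes the difference concrete. Using the hypothetical TxCell/Stm sketch from the previous section, two tasks that update unrelated cells commit without ever waiting on one another, whereas a single coarse lock around both counters would have serialized them:

using System.Threading.Tasks;

public static class Program
{
    public static void Main()
    {
        var hits = new TxCell<int>();
        var misses = new TxCell<int>();

        // Each task touches only its own cell, so the optimistic commit finds no
        // conflict and neither transaction is forced to wait for or retry the other.
        Task.WaitAll(
            Task.Run(() => Stm.Atomic(tx => tx.Write(hits, tx.Read(hits) + 1))),
            Task.Run(() => Stm.Atomic(tx => tx.Write(misses, tx.Read(misses) + 1))));
    }
}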

Moreover, in-memory transactions can eliminate hard-to-debug "heisenbugs": deadlocks, which often hang a program and are notoriously hard to provoke on demand, as well as priority inversion and lock convoys, both of which can lead to fairness and scalability problems. Because the transaction manager controls the locking details, it can ensure fair and deadlock-free commit protocols.
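
The deadlock case is worth spelling out. In the hedged, hypothetical example below, two threads acquire the same pair of locks in opposite orders and hang; the same two updates expressed as atomic blocks would instead be validated at commit time, with at most one of them re-executed:

using System.Threading;

public static class DeadlockDemo
{
    private static readonly object A = new object();
    private static readonly object B = new object();

    public static void Main()
    {
        // Thread 1 takes A then B; thread 2 takes B then A. If each grabs its first
        // lock before the other's second, both block forever and the program hangs.
        var t1 = new Thread(() => { lock (A) { Thread.Sleep(100); lock (B) { } } });
        var t2 = new Thread(() => { lock (B) { Thread.Sleep(100); lock (A) { } } });
        t1.Start(); t2.Start();
        t1.Join(); t2.Join();   // with the timing above, this never returns
    }
}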

An Eye Towards the Future

There are several interesting programming models that seek to simplify mainstream concurrent programming, with transactional memory at the front of the pack. Whether it will win out remains an open question, but one thing is certain: with ever-increasing parallelism coming to the desktop and server machines near you, easier concurrent programming models will evolve alongside it. In fact, some might say we must do for concurrent programming what garbage collection did for memory management. Transactions for memory is a step in that direction.

Joe Duffy is a Program Manager on the CLR Team at Microsoft, where he focuses on concurrency programming models for managed code. Keep an eye out for Joe's upcoming book, Professional .NET Framework 2.0 (Wrox, 2006).