September 2011

Volume 26 Number 09

Windows with C++ - The Thread Pool Environment

By Kenny Kerr | September 2011

The objects that make up the Windows thread pool API can be divided into two camps. In the first are those representing work, timers, I/O and waitable objects. These all potentially result in callbacks executing on the thread pool. I’ve already introduced the work object in last month’s column and will explore the remaining objects in subsequent articles. In the second camp are those objects that control the environment in which these callbacks execute. That’s the focus of this month’s column.

The thread pool environment affects whether callbacks execute in the default pool or a specific pool of your own making, whether callbacks should be prioritized and so on. Being able to control this environment becomes increasingly important as you move beyond a handful of work objects or callbacks. It also reduces the complexity of coordinating the cancelation and teardown of these objects, the topic of next month’s column.

The thread pool environment isn’t an object in the same sense as the other objects that make up the thread pool API. For the sake of efficiency, it’s simply declared as a structure so that you can directly allocate storage for it within your application. You should, however, treat it the same way as the other objects and not assume any knowledge of its internals, but rather access it only through the public set of API functions. The structure is named TP_CALLBACK_ENVIRON, and if you go and look it up you’ll immediately notice that it has already changed since it was first introduced with Windows Vista. That’s just another reminder that you must stick to the API functions. The functions themselves simply manipulate this structure but shield you from any changes. They’re declared inline to allow the compiler to optimize them as much as possible, so don’t be tempted to think that you can do a better job.

The InitializeThreadpoolEnvironment function prepares the structure with default settings. The DestroyThreadpoolEnvironment function frees any resources used by the environment. As of this writing, it does nothing. This may change in the future, however. Because it’s an inline function, there’s no harm in calling it, as it will just be compiled away. Figure 1 shows a class that wraps this up.

Figure 1 Wrapping the InitializeThreadpoolEnvironment Function

class environment
{
  environment(environment const &);
  environment & operator=(environment const &);
 
  TP_CALLBACK_ENVIRON m_value;
 
public:
 
  environment() throw()
  {
    InitializeThreadpoolEnvironment(&m_value);
  }
 
  ~environment() throw()
  {
    DestroyThreadpoolEnvironment(&m_value);
  }
 
  PTP_CALLBACK_ENVIRON get() throw()
  {
    return &m_value;
  }
};

The familiar get member function is provided for consistency with the unique_handle class template I introduced in my July 2011 column (https://msdn.microsoft.com/magazine/hh288076). Attentive readers may recall that the CreateThreadpoolWork and TrySubmitThreadpoolCallback functions I introduced in last month’s column had a final parameter that I didn’t mention. I simply passed a null pointer value in each case. That parameter is actually a pointer to an environment, and is how you associate different work objects with an environment:

environment e;
work w(CreateThreadpoolWork(callback, nullptr, e.get()));
check_bool(w);
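
The same final parameter appears on TrySubmitThreadpoolCallback. Here’s a minimal sketch, assuming a simple_callback function of your own that matches the PTP_SIMPLE_CALLBACK signature:

void CALLBACK simple_callback(PTP_CALLBACK_INSTANCE, void *)
{
  // Whatever needs to run on a pool thread.
}
 
check_bool(TrySubmitThreadpoolCallback(simple_callback, nullptr, e.get()));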

What good is this? Well, not much—that is, until you start customizing the environment.

Private Pools

By default, the environment directs callbacks to the default thread pool for the process, the same pool that would’ve handled the callbacks had you not associated the work with an environment at all. Consider what it means to have a default thread pool for a process. Any code running within the process can use it, and the average process loads dozens of DLLs directly or indirectly, any of which may be queuing callbacks of their own. That contention can clearly affect how promptly your callbacks are serviced. This isn’t necessarily a bad thing, though: sharing a thread pool among different subsystems within a process can often improve performance, because the limited number of physical processors in the computer can be shared efficiently.

The alternative is that each subsystem creates its own pool of threads that all contend for processor cycles in a much more uncooperative manner. On the other hand, if a particular subsystem is abusing the default thread pool, you may want to shield yourself from this by using a private pool. This other subsystem may queue long-running callbacks or so many callbacks that the response time for your callbacks is unacceptable. You may also have specific requirements that necessitate certain limits on the number of threads in the pool. This is where the pool object comes in.

The CreateThreadpool function creates a private pool object completely independent from the default thread pool. If the function succeeds, it returns an opaque pointer representing the pool object. If it fails, it returns a null pointer value and provides more information via the GetLastError function. Given a pool object, the CloseThreadpool function instructs the system that the object may be released. Again, the unique_handle class template I introduced in my July 2011 column takes care of these details with the help of a pool-specific traits class:

struct pool_traits
{
  static PTP_POOL invalid() throw()
  {
    return nullptr;
  }
 
  static void close(PTP_POOL value) throw()
  {
    CloseThreadpool(value);
  }
};
 
typedef unique_handle<PTP_POOL, pool_traits> pool;

I can now use the convenient typedef and create a pool object as follows:

pool p(CreateThreadpool(nullptr));
check_bool(p);

I’m not hiding anything this time. The parameter in this case is reserved for future use and must be a null pointer value. The SetThreadpoolCallbackPool inline function updates the environment to indicate the pool to which callbacks should be directed:

SetThreadpoolCallbackPool(e.get(), p.get());

In this way, work objects and any other objects created with this environment will be associated with the given pool. You could even create a few different environments, each with its own pool, to isolate different parts of your application. Just be careful to balance the concurrency between the different pools so that you don’t introduce excessive scheduling with too many threads.
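
As a minimal sketch of that idea, two environments might each target a private pool of their own (the render_callback and network_callback names are just hypothetical placeholders for your own work callbacks):

pool render_pool(CreateThreadpool(nullptr));
check_bool(render_pool);
 
pool network_pool(CreateThreadpool(nullptr));
check_bool(network_pool);
 
environment render_environment;
SetThreadpoolCallbackPool(render_environment.get(), render_pool.get());
 
environment network_environment;
SetThreadpoolCallbackPool(network_environment.get(), network_pool.get());
 
// Work created with each environment is serviced by the corresponding pool.
work render_work(CreateThreadpoolWork(
  render_callback, nullptr, render_environment.get()));
check_bool(render_work);
 
work network_work(CreateThreadpoolWork(
  network_callback, nullptr, network_environment.get()));
check_bool(network_work);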

As I hinted at before, it’s also possible to set minimum and maximum limits on the number of threads in your own pool. Controlling the default thread pool in this way isn’t permitted because it would affect other subsystems and cause all kinds of compatibility problems. For example, I might create a pool with exactly one thread to handle an API that has thread affinity, and another pool for I/O completion and other related callbacks without any limits, allowing the system to adjust the number of threads dynamically as needed. Here’s how I would set a pool to allocate exactly one persistent thread:

check_bool(SetThreadpoolThreadMinimum(p.get(), 1));
SetThreadpoolThreadMaximum(p.get(), 1);

Notice that setting the minimum can fail, whereas setting the maximum can’t. The default minimum is zero, and raising it can fail because the function actually attempts to create, and keep around, as many threads as you request.

Prioritizing Callbacks

Another feature that the thread pool environment enables is the ability to prioritize callbacks. This happens to be the only addition to the Windows Thread Pool API in Windows 7. Keep this in mind if you’re still targeting Windows Vista. A prioritized callback is guaranteed to execute ahead of any callbacks with a lower priority. This doesn’t affect thread priorities and therefore won’t cause executing callbacks to be preempted. Prioritized callbacks simply affect the order of callbacks that are pending execution.

There are three priority levels: low, normal and high. The SetThreadpoolCallbackPriority function sets the priority of an environment:

SetThreadpoolCallbackPriority(e.get(), TP_CALLBACK_PRIORITY_HIGH);

Again, any work objects and other objects created with this environment will have their callbacks prioritized accordingly.
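
The low end of the scale works the same way. As a quick sketch, in which callback stands for whatever PTP_WORK_CALLBACK you’ve already defined:

SetThreadpoolCallbackPriority(e.get(), TP_CALLBACK_PRIORITY_LOW);
 
// Callbacks for this work object yield to any pending normal- and
// high-priority callbacks in the same pool.
work background(CreateThreadpoolWork(callback, nullptr, e.get()));
check_bool(background);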

A Serial Pool

I introduced the functional_pool sample class in last month’s column to demonstrate the various functions related to work objects. This time, I’m going to show you how to build a simple prioritized serial pool, making use of all the functions I’ve introduced this month that deal with the thread pool environment. By serial, I mean that I want the pool to manage exactly one persistent thread. And by prioritized, I’m simply going to support the submission of functions at either normal or high priority. I can go ahead and start defining the serial_pool class, as shown in Figure 2.

Figure 2 Defining the Serial_Pool Class

class serial_pool
{
  // concurrent_queue comes from the Concurrency Runtime's <concurrent_queue.h>
  // and function from <functional>, just as in last month's functional_pool.
  typedef concurrent_queue<function<void()>> queue;
 
  pool m_pool;
  queue m_queue, m_queue_high;
  work m_work, m_work_high;
 
  static void CALLBACK callback(
    PTP_CALLBACK_INSTANCE, void * context, PTP_WORK)
  {
    auto q = static_cast<queue *>(context);
 
    // Every SubmitThreadpoolWork call is preceded by exactly one push,
    // so a function is always waiting here for this callback to pop.
    function<void()> function;
    q->try_pop(function);
 
    function();
  }

Unlike the functional_pool class, the serial_pool actually manages a pool object. It also needs separate queues and work objects for normal and high priority. The work objects can be created with different context values pointing to their respective queues, so both can simply reuse the private callback function. This avoids any branching at run time on my part. The callback still just pops a single function off the queue and calls it. However, the serial_pool constructor (shown in Figure 3) has a bit more work to do.

Figure 3 The Serial_Pool Constructor

public:
 
  serial_pool() :
    m_pool(CreateThreadpool(nullptr))
  {
    check_bool(m_pool);
    check_bool(SetThreadpoolThreadMinimum(m_pool.get(), 1));
    SetThreadpoolThreadMaximum(m_pool.get(), 1);
 
    environment e;
    SetThreadpoolCallbackPool(e.get(), m_pool.get());
 
    SetThreadpoolCallbackPriority(e.get(), TP_CALLBACK_PRIORITY_NORMAL);
    check_bool(m_work.reset(CreateThreadpoolWork(
      callback, &m_queue, e.get())));
 
    SetThreadpoolCallbackPriority(e.get(), TP_CALLBACK_PRIORITY_HIGH);
    check_bool(m_work_high.reset(CreateThreadpoolWork(
      callback, &m_queue_high, e.get())));
  }

First up is the creation of the private pool and setting its concurrency limits to ensure serial execution of any callbacks. Next, it creates an environment and sets the pool for subsequent objects to adopt. Finally, it creates the work objects, adjusting the environment’s priority to establish the work objects’ respective priorities and make the connection to the private pool that they share. Although the pool and work objects need to be maintained for the lifetime of a serial_pool object, the environment is created on the stack because it’s only needed to establish the relationships between the various interested parties.

The destructor now needs to wait for both work objects to make sure no callbacks execute after the serial_pool object is destroyed:

~serial_pool()
{
  // Passing true cancels any callbacks that haven't yet started and
  // waits for those already executing to complete.
  WaitForThreadpoolWorkCallbacks(m_work.get(), true);
  WaitForThreadpoolWorkCallbacks(m_work_high.get(), true);
}

And finally, two submit functions are required to queue functions at either normal or high priority:

template <typename Function>
void submit(Function const & function)
{
  m_queue.push(function);
  SubmitThreadpoolWork(m_work.get());
}
 
template <typename Function>
void submit_high(Function const & function)
{
  m_queue_high.push(function);
  SubmitThreadpoolWork(m_work_high.get());
}

Ultimately, it all comes down to how the work objects were created, in particular what information about the desired thread pool environment was provided. Figure 4 shows a simple example that you can use, and in which you can clearly see the serial and prioritized behavior at work.

Figure 4 Serial and Prioritized Behavior at Work

int main()
{
  serial_pool pool;
 
  for (int i = 0; i < 10; ++i)
  {
    pool.submit([]
    {
      printf("normal: %d\n", GetCurrentThreadId());
    });
 
    pool.submit_high([]
    {
      printf("high: %d\n", GetCurrentThreadId());
    });
  }
  getch();
}

In the example shown in Figure 4, it’s possible that one normal-priority callback will execute first—depending on how quickly the system responds—because it’s submitted first. Beyond that, all of the high-priority callbacks should execute, followed by the remaining normal-priority ones. You can experiment by adding Sleep calls and raising the concurrency level to see how the thread pool adjusts its behavior according to your specifications.
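
One possible variation, sketched here rather than prescribed: raise the pool’s maximum in the serial_pool constructor and slow the callbacks down so the interleaving of threads is easier to observe:

// In the serial_pool constructor, allow up to four threads instead of one.
SetThreadpoolThreadMaximum(m_pool.get(), 4);
 
// In main, give each callback something measurable to do.
pool.submit([]
{
  Sleep(100);
  printf("normal: %d\n", GetCurrentThreadId());
});

With more than one thread available, of course, the pool is no longer strictly serial, which is exactly the kind of behavioral change this experiment is meant to make visible.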

Join me next month as I explore the critical cancelation and cleanup capabilities provided by the Windows thread pool API.


Kenny Kerr is a software craftsman with a passion for native Windows development. Reach him at https://kennykerr.ca.

Thanks to the following technical expert for reviewing this article: Stephan T. Lavavej