Introduction to the Windows Threadpool (Part 1)
Download source code from MSDN Code Gallery.
I regularly receive feedback that the Win32 Threadpool API is complex and that better examples are needed. To improve this situation, I decided to create three C++ wrapper classes which provide the following: queuing work items, associating callbacks with events, and timer functionality. You can use these wrapper classes directly or read the source to understand how to use the Threadpool APIs. The classes are header-only code in the windowsthreadpool namespace and can be used by including the header file “WindowsThreadPool.h”. The entire source code is available on MSDN Code Gallery.
A little background on threadpools:
Threads are the basic abstraction used to schedule work on the CPU in Windows. With the increase in the number of cores/CPUs, developers need to architect their applications to run asynchronously and get maximum performance out of the system. There are usually two approaches to running code asynchronously: explicitly creating threads, or using system-provided facilities like the threadpool, which manages thread lifetimes for you.
Explicitly managing thread lifetimes is cumbersome from a coding perspective and can degrade system and application performance if the lifetimes are managed poorly. Creating and destroying threads is expensive, yet keeping too many threads around without enough work is not ideal either, because it increases system memory utilization and context-switch overhead.
So how does a developer decide the ideal number of threads and get it right on different machines with different numbers of cores? That’s where the Windows threadpool comes into the picture; it frees the developer from managing thread lifetimes and provides a pool of worker threads appropriate for the hardware. The developer queues work items to the threadpool, which executes them asynchronously. As long as there are free CPUs to execute those work items, the threadpool will create new threads to run them, and once there is no more work, the threadpool will destroy threads based on its internal timeout heuristics.
Every user-mode process on Windows has a default threadpool available to it; the application does not need to initialize it and can submit work items by calling TrySubmitThreadpoolCallback or SubmitThreadpoolWork. System components within Windows also use this default threadpool to execute work items within the process.
You can use the class windowsthreadpool::SimpleThreadPool to queue work items to the default threadpool.
Note: All classes in the windowsthreadpool namespace only support callback functions with the following signature: a function that accepts a single PVOID pointer (which can be NULL) and returns nothing.
void CALLBACK FunctionName(PVOID state)
This is conveniently typedef-ed to THREADPOOLCALLBACK in the windowsthreadpool namespace. In the example below, PrintI uses the PVOID parameter whereas PrintRand does not.
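void CALLBACK PrintI(PVOID state) {
    int *i = reinterpret_cast<int *>(state);
    cout << *i << " ," << endl;
}
void CALLBACK PrintRand(PVOID state) {
    cout << rand() << " ," << endl;
}
int _cdecl _tmain() {
    using namespace windowsthreadpool;
    int *arr = new int[10];              // room for the ten elements written below
    for (int i = 0; i < 10; ++i)
        arr[i] = i;
    SimpleThreadPool tp;                 // assumption: the listing is truncated here; object name and default construction are guesses
    for (int i = 0; i < 10; ++i)
        tp.QueueUserWorkItem(PrintI, &arr[i]);
    tp.QueueUserWorkItem(PrintRand);
    tp.WaitForAll();
    delete[] arr;
}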
You queue work items to the threadpool using the QueueUserWorkItem member function of SimpleThreadPool (not to be confused with the legacy Win32 API of the same name). This overloaded function accepts up to two parameters: the function pointer (PrintI) and the data (an array element) to pass into the function. If you don’t need to pass any data to the callback (PrintRand), you can omit the second parameter. To wait for all the work items to complete, call WaitForAll. To wait for currently running work items and cancel any queued work items which haven’t started, use WaitForAllCurrentlyRunning.
Notice that you didn’t use any of the Win32 threadpool APIs or create any PTP_WORK objects; all you had to do was define your callback function and pass that function pointer to QueueUserWorkItem. To understand how the SimpleThreadPool class works, let’s take a look at the internal helper class SimpleCallBack. This is the key piece which submits work items to the process-wide threadpool and hides all the PTP_ parameters.
template <class Function>
class SimpleCallBack {
    const Function m_Func;
    PVOID state;
    static void CALLBACK callback(PTP_CALLBACK_INSTANCE Instance, PVOID Param) {
        SimpleCallBack<Function> *pc = reinterpret_cast<SimpleCallBack<Function>*>(Param);
        pc->m_Func(pc->state);           // runs the user-provided callback
    }
public:
    SimpleCallBack(const Function Func, PVOID st, PTP_CALLBACK_ENVIRON pEnv) : m_Func(Func), state(st) {
        if (!TrySubmitThreadpoolCallback(callback, this, pEnv))
            throw "Error: Could not submit work item.";
    }
};
The user-provided callback function runs inside the static callback function, when the line pc->m_Func (pc->state) is executed. If the user does not provide a parameter for the callback, the state value is NULL. The TrySubmitThreadpoolCallback function takes a pointer to an environment block; this pointer can be NULL, in which case the work item runs in the default environment. Here it receives the environment block which is initialized in the class SimpleThreadPool, and this environment block is associated with all work items queued to SimpleThreadPool.
The WaitForAll function waits for all work items to complete. It accomplishes this by associating every work item with the same cleanup group, a convenience which lets us wait for all work items with a single call to CloseThreadpoolCleanupGroupMembers. That call also releases the threadpool objects which are members of the group; the group itself is then closed with one call to CloseThreadpoolCleanupGroup. The second parameter to CloseThreadpoolCleanupGroupMembers controls whether to cancel any work items which haven’t started running yet.
void WaitForAllCallbacks(bool CancelNotStarted) {
    assert(InfraInitialized == true);
    CloseThreadpoolCleanupGroupMembers(CleanupGroup, CancelNotStarted, NULL);
}
Next up, work item priority…