General Best Practices in the Concurrency Runtime

Article
08/03/2021

This document describes best practices that apply to multiple areas of the Concurrency Runtime.

Sections

This document contains the following sections:

Use Cooperative Synchronization Constructs When Possible
Avoid Lengthy Tasks That Do Not Yield
Use Oversubscription to Offset Operations That Block or Have High Latency
Use Concurrent Memory Management Functions When Possible
Use RAII to Manage the Lifetime of Concurrency Objects
Do Not Create Concurrency Objects at Global Scope
Do Not Use Concurrency Objects in Shared Data Segments

Use Cooperative Synchronization Constructs When Possible

The Concurrency Runtime provides many concurrency-safe constructs that do not require an external synchronization object. For example, the concurrency::concurrent_vector class provides concurrency-safe append and element access operations. Here, concurrency-safe means pointers or iterators are always valid. It's not a guarantee of element initialization, or of a particular traversal order. However, for cases where you require exclusive access to a resource, the runtime provides the concurrency::critical_section, concurrency::reader_writer_lock, and concurrency::event classes. These types behave cooperatively; therefore, the task scheduler can reallocate processing resources to another context as the first task waits for data. When possible, use these synchronization types instead of other synchronization mechanisms, such as those provided by the Windows API, which do not behave cooperatively. For more information about these synchronization types and a code example, see Synchronization Data Structures and Comparing Synchronization Data Structures to the Windows API.

[Top]

Avoid Lengthy Tasks That Do Not Yield

Because the task scheduler behaves cooperatively, it does not provide fairness among tasks. Therefore, a task can prevent other tasks from starting. Although this is acceptable in some cases, in other cases this can cause deadlock or starvation.

The following example performs more tasks than the number of allocated processing resources. The first task does not yield to the task scheduler and therefore the second task does not start until the first task finishes.

// cooperative-tasks.cpp
// compile with: /EHsc
#include <ppl.h>
#include <iostream>
#include <sstream>

using namespace concurrency;
using namespace std;

// Data that the application passes to lightweight tasks.
struct task_data_t
{
   int id;  // a unique task identifier.
   event e; // signals that the task has finished.
};

// A lightweight task that performs a lengthy operation.
void task(void* data)
{   
   task_data_t* task_data = reinterpret_cast<task_data_t*>(data);

   // Create a large loop that occasionally prints a value to the console.
   int i;
   for (i = 0; i < 1000000000; ++i)
   {
      if (i > 0 && (i % 250000000) == 0)
      {
         wstringstream ss;
         ss << task_data->id << L": " << i << endl;
         wcout << ss.str();
      }
   }
   wstringstream ss;
   ss << task_data->id << L": " << i << endl;
   wcout << ss.str();

   // Signal to the caller that the thread is finished.
   task_data->e.set();
}

int wmain()
{
   // For illustration, limit the number of concurrent 
   // tasks to one.
   Scheduler::SetDefaultSchedulerPolicy(SchedulerPolicy(2, 
      MinConcurrency, 1, MaxConcurrency, 1));

   // Schedule two tasks.

   task_data_t t1;
   t1.id = 0;
   CurrentScheduler::ScheduleTask(task, &t1);

   task_data_t t2;
   t2.id = 1;
   CurrentScheduler::ScheduleTask(task, &t2);

   // Wait for the tasks to finish.

   t1.e.wait();
   t2.e.wait();
}

This example produces the following output:

1: 250000000 1: 500000000 1: 750000000 1: 1000000000 2: 250000000 2: 500000000 2: 750000000 2: 1000000000

There are several ways to enable cooperation between the two tasks. One way is to occasionally yield to the task scheduler in a long-running task. The following example modifies the task function to call the concurrency::Context::Yield method to yield execution to the task scheduler so that another task can run.

// A lightweight task that performs a lengthy operation.
void task(void* data)
{   
   task_data_t* task_data = reinterpret_cast<task_data_t*>(data);

   // Create a large loop that occasionally prints a value to the console.
   int i;
   for (i = 0; i < 1000000000; ++i)
   {
      if (i > 0 && (i % 250000000) == 0)
      {
         wstringstream ss;
         ss << task_data->id << L": " << i << endl;
         wcout << ss.str();

         // Yield control back to the task scheduler.
         Context::Yield();
      }
   }
   wstringstream ss;
   ss << task_data->id << L": " << i << endl;
   wcout << ss.str();

   // Signal to the caller that the thread is finished.
   task_data->e.set();
}

This example produces the following output:

1: 250000000
2: 250000000
1: 500000000
2: 500000000
1: 750000000
2: 750000000
1: 1000000000
2: 1000000000

The Context::Yield method yields only another active thread on the scheduler to which the current thread belongs, a lightweight task, or another operating system thread. This method does not yield to work that is scheduled to run in a concurrency::task_group or concurrency::structured_task_group object but has not yet started.

There are other ways to enable cooperation among long-running tasks. You can break a large task into smaller subtasks. You can also enable oversubscription during a lengthy task. Oversubscription lets you create more threads than the available number of hardware threads. Oversubscription is especially useful when a lengthy task contains a high amount of latency, for example, reading data from disk or from a network connection. For more information about lightweight tasks and oversubscription, see Task Scheduler.

[Top]

Use Oversubscription to Offset Operations That Block or Have High Latency

The Concurrency Runtime provides synchronization primitives, such as concurrency::critical_section, that enable tasks to cooperatively block and yield to each other. When one task cooperatively blocks or yields, the task scheduler can reallocate processing resources to another context as the first task waits for data.

There are cases in which you cannot use the cooperative blocking mechanism that is provided by the Concurrency Runtime. For example, an external library that you use might use a different synchronization mechanism. Another example is when you perform an operation that could have a high amount of latency, for example, when you use the Windows API ReadFile function to read data from a network connection. In these cases, oversubscription can enable other tasks to run when another task is idle. Oversubscription lets you create more threads than the available number of hardware threads.

Consider the following function, download, which downloads the file at the given URL. This example uses the concurrency::Context::Oversubscribe method to temporarily increase the number of active threads.

// Downloads the file at the given URL.
string download(const string& url)
{
   // Enable oversubscription.
   Context::Oversubscribe(true);

   // Download the file.
   string content = GetHttpFile(_session, url.c_str());
   
   // Disable oversubscription.
   Context::Oversubscribe(false);

   return content;
}

Because the GetHttpFile function performs a potentially latent operation, oversubscription can enable other tasks to run as the current task waits for data. For the complete version of this example, see How to: Use Oversubscription to Offset Latency.

[Top]

Use Concurrent Memory Management Functions When Possible

Use the memory management functions, concurrency::Alloc and concurrency::Free, when you have fine-grained tasks that frequently allocate small objects that have a relatively short lifetime. The Concurrency Runtime holds a separate memory cache for each running thread. The Alloc and Free functions allocate and free memory from these caches without the use of locks or memory barriers.

For more information about these memory management functions, see Task Scheduler. For an example that uses these functions, see How to: Use Alloc and Free to Improve Memory Performance.

[Top]

Use RAII to Manage the Lifetime of Concurrency Objects

The Concurrency Runtime uses exception handling to implement features such as cancellation. Therefore, write exception-safe code when you call into the runtime or call another library that calls into the runtime.

The Resource Acquisition Is Initialization (RAII) pattern is one way to safely manage the lifetime of a concurrency object under a given scope. Under the RAII pattern, a data structure is allocated on the stack. That data structure initializes or acquires a resource when it is created and destroys or releases that resource when the data structure is destroyed. The RAII pattern guarantees that the destructor is called before the enclosing scope exits. This pattern is useful when a function contains multiple return statements. This pattern also helps you write exception-safe code. When a throw statement causes the stack to unwind, the destructor for the RAII object is called; therefore, the resource is always correctly deleted or released.

The runtime defines several classes that use the RAII pattern, for example, concurrency::critical_section::scoped_lock and concurrency::reader_writer_lock::scoped_lock. These helper classes are known as scoped locks. These classes provide several benefits when you work with concurrency::critical_section or concurrency::reader_writer_lock objects. The constructor of these classes acquires access to the provided critical_section or reader_writer_lock object; the destructor releases access to that object. Because a scoped lock releases access to its mutual exclusion object automatically when it is destroyed, you do not manually unlock the underlying object.

Consider the following class, account, which is defined by an external library and therefore cannot be modified.

// account.h
#pragma once
#include <exception>
#include <sstream>

// Represents a bank account.
class account
{
public:
   explicit account(int initial_balance = 0)
      : _balance(initial_balance)
   {
   }

   // Retrieves the current balance.
   int balance() const
   {
      return _balance;
   }

   // Deposits the specified amount into the account.
   int deposit(int amount)
   {
      _balance += amount;
      return _balance;
   }

   // Withdraws the specified amount from the account.
   int withdraw(int amount)
   {
      if (_balance < 0)
      {
         std::stringstream ss;
         ss << "negative balance: " << _balance << std::endl;
         throw std::exception((ss.str().c_str()));
      }

      _balance -= amount;
      return _balance;
   }

private:
   // The current balance.
   int _balance;
};

The following example performs multiple transactions on an account object in parallel. The example uses a critical_section object to synchronize access to the account object because the account class is not concurrency-safe. Each parallel operation uses a critical_section::scoped_lock object to guarantee that the critical_section object is unlocked when the operation either succeeds or fails. When the account balance is negative, the withdraw operation fails by throwing an exception.

// account-transactions.cpp
// compile with: /EHsc
#include "account.h"
#include <ppl.h>
#include <iostream>
#include <sstream>

using namespace concurrency;
using namespace std;

int wmain()
{
   // Create an account that has an initial balance of 1924.
   account acc(1924);

   // Synchronizes access to the account object because the account class is 
   // not concurrency-safe.
   critical_section cs;

   // Perform multiple transactions on the account in parallel.   
   try
   {
      parallel_invoke(
         [&acc, &cs] {
            critical_section::scoped_lock lock(cs);
            wcout << L"Balance before deposit: " << acc.balance() << endl;
            acc.deposit(1000);
            wcout << L"Balance after deposit: " << acc.balance() << endl;
         },
         [&acc, &cs] {
            critical_section::scoped_lock lock(cs);
            wcout << L"Balance before withdrawal: " << acc.balance() << endl;
            acc.withdraw(50);
            wcout << L"Balance after withdrawal: " << acc.balance() << endl;
         },
         [&acc, &cs] {
            critical_section::scoped_lock lock(cs);
            wcout << L"Balance before withdrawal: " << acc.balance() << endl;
            acc.withdraw(3000);
            wcout << L"Balance after withdrawal: " << acc.balance() << endl;
         }
      );
   }
   catch (const exception& e)
   {
      wcout << L"Error details:" << endl << L"\t" << e.what() << endl;
   }
}

This example produces the following sample output:

Balance before deposit: 1924
Balance after deposit: 2924
Balance before withdrawal: 2924
Balance after withdrawal: -76
Balance before withdrawal: -76
Error details:
    negative balance: -76

For additional examples that use the RAII pattern to manage the lifetime of concurrency objects, see Walkthrough: Removing Work from a User-Interface Thread, How to: Use the Context Class to Implement a Cooperative Semaphore, and How to: Use Oversubscription to Offset Latency.

[Top]

Do Not Create Concurrency Objects at Global Scope

When you create a concurrency object at global scope you can cause issues such as deadlock or memory access violations to occur in your application.

For example, when you create a Concurrency Runtime object, the runtime creates a default scheduler for you if one was not yet created. A runtime object that is created during global object construction will accordingly cause the runtime to create this default scheduler. However, this process takes an internal lock, which can interfere with the initialization of other objects that support the Concurrency Runtime infrastructure. This internal lock might be required by another infrastructure object that has not yet been initialized, and can thus cause deadlock to occur in your application.

The following example demonstrates the creation of a global concurrency::Scheduler object. This pattern applies not only to the Scheduler class but all other types that are provided by the Concurrency Runtime. We recommend that you do not follow this pattern because it can cause unexpected behavior in your application.

// global-scheduler.cpp
// compile with: /EHsc
#include <concrt.h>

using namespace concurrency;

static_assert(false, "This example illustrates a non-recommended practice.");

// Create a Scheduler object at global scope.
// BUG: This practice is not recommended because it can cause deadlock.
Scheduler* globalScheduler = Scheduler::Create(SchedulerPolicy(2,
   MinConcurrency, 2, MaxConcurrency, 4));

int wmain() 
{   
}

For examples of the correct way to create Scheduler objects, see Task Scheduler.

[Top]

Do Not Use Concurrency Objects in Shared Data Segments

The Concurrency Runtime does not support the use of concurrency objects in a shared data section, for example, a data section that is created by the data_seg#pragma directive. A concurrency object that is shared across process boundaries could put the runtime in an inconsistent or invalid state.

[Top]

General Best Practices in the Concurrency Runtime

Sections

Use Cooperative Synchronization Constructs When Possible

Avoid Lengthy Tasks That Do Not Yield

Use Oversubscription to Offset Operations That Block or Have High Latency

Use Concurrent Memory Management Functions When Possible

Use RAII to Manage the Lifetime of Concurrency Objects

Do Not Create Concurrency Objects at Global Scope

Do Not Use Concurrency Objects in Shared Data Segments

See also

Feedback

Feedback

Additional resources