Практическое руководство. Использование функции parallel_invoke для написания программы параллельной сортировки

Статья
10/12/2023

В этом документе описывается использование алгоритма parallel_invoke для повышения производительности алгоритма битовой сортировки. Алгоритм битовой сортировки рекурсивно делит входную последовательность на небольшие отсортированные секции. Алгоритм битовой сортировки может выполняться параллельно, так как каждая операция секции не зависит от всех остальных операций.

Хотя битовая сортировка является примером сети сортировки, которая сортирует все сочетания входных последовательностей, в этом примере сортируются последовательности, длина которых равна двум.

Примечание.

В этом примере для демонстрации используется параллельная подпрограмма сортировки. Кроме того, можно использовать встроенные алгоритмы сортировки, которые предоставляет PPL: concurrency::p arallel_sort, concurrency::p arallel_buffered_sort и concurrency::p arallel_radixsort. Дополнительные сведения см. в разделе "Параллельные алгоритмы".

Разделы

В этом документе описаны следующие задачи:

Выполнение последовательной сортировки bitonic
Использование parallel_invoke для параллельной сортировки битонов

Выполнение последовательной сортировки bitonic

В следующем примере показана последовательная версия алгоритма сортировки битонов. Функция bitonic_sort делит последовательность на две секции, сортирует эти секции в противоположных направлениях, а затем объединяет результаты. Эта функция вызывает себя два раза рекурсивно для сортировки каждой секции.

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

[В начало]

Использование parallel_invoke для параллельной сортировки битонов

В этом разделе описывается, как использовать parallel_invoke алгоритм для параллельного выполнения алгоритма битовой сортировки.

Параллельное выполнение алгоритма сортировки битонов

Добавьте директиву #include для файла заголовка ppl.h.
```
#include <ppl.h>
```
Добавьте директиву using для concurrency пространства имен.
```
using namespace concurrency;
```

Создайте новую функцию, которая parallel_bitonic_megeиспользует parallel_invoke алгоритм для параллельного объединения последовательностей, если требуется достаточное количество операций. В противном случае вызов bitonic_merge последовательного объединения последовательностей.

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

Выполните процесс, похожий на тот, который на предыдущем шаге, но для bitonic_sort функции.

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

Создайте перегруженную версию parallel_bitonic_sort функции, которая сортирует массив по возрастанию порядка.
```
// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}
```
Алгоритм parallel_invoke сокращает затраты, выполняя последнюю из ряда задач в контексте вызова. Например, в parallel_bitonic_sort функции первая задача выполняется в отдельном контексте, а вторая задача выполняется в контексте вызова.
```
// Sort the partitions in parallel.
parallel_invoke(
   [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
   [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
);
```

Следующий полный пример выполняет как последовательные, так и параллельные версии алгоритма сортировки битов. В примере также выводится в консоль время, необходимое для выполнения каждого вычисления.

// parallel-bitonic-sort.cpp
// compile with: /EHsc
#include <windows.h>
#include <algorithm>
#include <iostream>
#include <random>
#include <ppl.h>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds 
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
   __int64 begin = GetTickCount();
   f();
   return GetTickCount() - begin;
}

const bool INCREASING = true;
const bool DECREASING = false;

// Comparator function for the bitonic sort algorithm.
template <class T>
void compare(T* items, int i, int j, bool dir)
{
   if (dir == (items[i] > items[j]))
   {
      swap(items[i], items[j]);
   }
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void bitonic_merge(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }
      bitonic_merge(items, lo, m, dir);
      bitonic_merge(items, lo + m, m, dir);
   }
}

// Sorts the given sequence in the specified order.
template <class T>
void bitonic_sort(T* items, int lo, int n, bool dir)
{
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;
      bitonic_sort(items, lo, m, INCREASING);
      bitonic_sort(items, lo + m, m, DECREASING);

      // Merge the results.
      bitonic_merge(items,lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void bitonic_sort(T* items, int size)
{
    bitonic_sort(items, 0, size, INCREASING);
}

// Sorts a bitonic sequence in the specified order.
template <class T>
void parallel_bitonic_merge(T* items, int lo, int n, bool dir)
{   
   // Merge the sequences concurrently if there is sufficient work to do.
   if (n > 500)
   {
      int m = n / 2;
      for (int i = lo; i < lo + m; ++i)
      {
         compare(items, i, i + m, dir);
      }

      // Use the parallel_invoke algorithm to merge the sequences in parallel.
      parallel_invoke(
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo, m, dir); },
         [&items,lo,m,dir] { parallel_bitonic_merge(items, lo + m, m, dir); }
      );
   }
   // Otherwise, perform the work serially.
   else if (n > 1)
   {
      bitonic_merge(items, lo, n, dir);
   }   
}

// Sorts the given sequence in the specified order.
template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{   
   if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

// Sorts the given sequence in increasing order.
template <class T>
void parallel_bitonic_sort(T* items, int size)
{
   parallel_bitonic_sort(items, 0, size, INCREASING);
}

int wmain()
{  
   // For this example, the size must be a power of two.
   const int size = 0x200000;

   // Create two large arrays and fill them with random values.
   int* a1 = new int[size];
   int* a2 = new int[size];

   mt19937 gen(42);
   for(int i = 0; i < size; ++i)
   {
      a1[i] = a2[i] = gen();
   }

   __int64 elapsed;
   
   // Perform the serial version of the sort.
   elapsed = time_call([&] { bitonic_sort(a1, size); });
   wcout << L"serial time: " << elapsed << endl;

   // Now perform the parallel version of the sort.
   elapsed = time_call([&] { parallel_bitonic_sort(a2, size); });
   wcout << L"parallel time: " << elapsed << endl;
 
   delete[] a1;
   delete[] a2;
}

В следующем примере показаны выходные данные, полученные на четырехпроцессорном компьютере.

serial time: 4353
parallel time: 1248

[В начало]

Компиляция кода

Чтобы скомпилировать код, скопируйте его и вставьте его в проект Visual Studio или вставьте его в файл с именем parallel-bitonic-sort.cpp , а затем выполните следующую команду в окне командной строки Visual Studio.

cl.exe /EHsc parallel-bitonic-sort.cpp

Отказоустойчивость

В этом примере используется parallel_invoke алгоритм вместо класса параллелизма::task_group , так как время существования каждой группы задач не выходит за рамки функции. Рекомендуется использовать parallel_invoke , если это возможно, так как оно имеет меньшие затраты на выполнение, чем task group объекты, и поэтому позволяет создавать более качественный код.

Параллельные версии некоторых алгоритмов выполняются лучше, только если достаточно работы. Например, parallel_bitonic_merge функция вызывает последовательную версию, bitonic_mergeесли в последовательности имеется 500 или меньше элементов. Вы также можете спланировать общую стратегию сортировки на основе объема работы. Например, более эффективно использовать последовательную версию алгоритма быстрого сортировки, если массив содержит менее 500 элементов, как показано в следующем примере:

template <class T>
void quick_sort(T* items, int lo, int n)
{
   // TODO: The function body is omitted for brevity.
}

template <class T>
void parallel_bitonic_sort(T* items, int lo, int n, bool dir)
{
   // Use the serial quick sort algorithm if there are relatively few
   // items to sort. The associated overhead for running few tasks in 
   // parallel may not overcome the benefits of parallel processing.
   if (n - lo + 1 <= 500)
   {
      quick_sort(items, lo, n);
   }
   else if (n > 1)
   {
      // Divide the array into two partitions and then sort 
      // the partitions in different directions.
      int m = n / 2;

      // Sort the partitions in parallel.
      parallel_invoke(
         [&items,lo,m] { parallel_bitonic_sort(items, lo, m, INCREASING); },
         [&items,lo,m] { parallel_bitonic_sort(items, lo + m, m, DECREASING); }
      );
      
      // Merge the results.
      parallel_bitonic_merge(items, lo, n, dir);
   }
}

Как и в случае с любым параллельным алгоритмом, мы рекомендуем профилирование и настройку кода соответствующим образом.

См. также

Параллелизм задач
Функция parallel_invoke