# How to: Write a parallel_for_each Loop

This example shows how to use the concurrency::parallel_for_each algorithm to compute the count of prime numbers in a std::array object in parallel.

## Example

The following example computes the count of prime numbers in an array two times. The example first uses the std::for_each algorithm to compute the count serially. The example then uses the **parallel_for_each** algorithm to perform the same task in parallel. The example also prints to the console the time that is required to perform each computation.

```
// parallel-count-primes.cpp
// compile with: /EHsc
#include <windows.h>
#include <ppl.h>
#include <iostream>
#include <algorithm>
#include <array>

using namespace concurrency;
using namespace std;

// Calls the provided work function and returns the number of milliseconds
// that it takes to call that function.
template <class Function>
__int64 time_call(Function&& f)
{
    __int64 begin = GetTickCount();
    f();
    return GetTickCount() - begin;
}

// Determines whether the input value is prime.
bool is_prime(int n)
{
    if (n < 2)
        return false;
    for (int i = 2; i < n; ++i)
    {
        if ((n % i) == 0)
            return false;
    }
    return true;
}

int wmain()
{
    // Create an array object that contains 200000 integers.
    array<int, 200000> a;

    // Initialize the array such that a[i] == i.
    int n = 0;
    generate(begin(a), end(a), [&] {
        return n++;
    });

    LONG prime_count;
    __int64 elapsed;

    // Use the for_each algorithm to count the number of prime numbers
    // in the array serially.
    prime_count = 0L;
    elapsed = time_call([&] {
        for_each(begin(a), end(a), [&](int n) {
            if (is_prime(n))
                ++prime_count;
        });
    });
    wcout << L"serial version: " << endl
          << L"found " << prime_count << L" prime numbers" << endl
          << L"took " << elapsed << L" ms" << endl << endl;

    // Use the parallel_for_each algorithm to count the number of prime numbers
    // in the array in parallel.
    prime_count = 0L;
    elapsed = time_call([&] {
        parallel_for_each(begin(a), end(a), [&](int n) {
            if (is_prime(n))
                InterlockedIncrement(&prime_count);
        });
    });
    wcout << L"parallel version: " << endl
          << L"found " << prime_count << L" prime numbers" << endl
          << L"took " << elapsed << L" ms" << endl << endl;
}
```

The following sample output is for a computer that has four processors.

```
serial version:
found 17984 prime numbers
took 6115 ms

parallel version:
found 17984 prime numbers
took 1653 ms
```

## Compiling the Code

To compile the code, copy it and then paste it in a Visual Studio project, or paste it in a file that is named parallel-count-primes.cpp and then run the following command in a Visual Studio Command Prompt window.

**cl.exe /EHsc parallel-count-primes.cpp**

## Robust Programming

The lambda expression that the example passes to the **parallel_for_each** algorithm uses the **InterlockedIncrement** function so that parallel iterations of the loop can safely increment the shared counter at the same time. If you use functions such as **InterlockedIncrement** to synchronize access to shared resources, you can introduce performance bottlenecks in your code, because every increment contends for the same memory location. You can use a lock-free synchronization mechanism, for example, the concurrency::combinable class, to eliminate simultaneous access to shared resources. For an example that uses the **combinable** class in this manner, see How to: Use combinable to Improve Performance.