Choosing between array and array_view in C++ AMP

07/17/2012

As you know, with C++ AMP you can perform a computation through parallel_for_each and you can pass data to the computation (and hence to the accelerator) by capturing the data in the lambda that you pass to the parallel_for_each.

Two common ways to pass data are through a concurrency::array or a concurrency::array_view. Although the two types have very similar interfaces, they represent different concepts, and you may wonder when to use one and when to use the other.

When in doubt, use array_view

The answer is simple: if you are facing that dilemma, you should use array_view. Using array_view future-proofs your code as shared memory architectures become more common, while still working with discrete GPUs. Because array_view is not associated with any specific accelerator, data transfers between different memory spaces happen implicitly under the covers, and you do not have to perform explicit copy operations. In contrast, an array is associated with exactly one accelerator from creation time, so any data movement has to be managed explicitly by your code through additional copy commands.

So by default you should always use array_view.

Then your question may become: "When should I use array?" Good question! There are 4 scenarios where you cannot avoid using the array type in your code, at least with v1 of C++ AMP.

For DX interop, use array

With C++ AMP you can smoothly interop with other DirectX code that you may have written, and the interop APIs expect an array, not an array_view. If you are not familiar with that, please read our blog post on DX interop.

For staging buffers, use array

With C++ AMP, for certain scenarios where frequent data transfers take place between the CPU and the GPU, you can use a technique to make those data transfers faster. That technique requires the use of at least one array object. If you are not familiar with that, please read our blog post on staging arrays.

For measuring the performance of the transfer TO the accelerator, use array

If you are not familiar with how data transfers work with C++ AMP, please read our blog post on transferring data between accelerator and host memory. If you are not familiar with how to measure performance of C++ AMP code, please read our blog post on how to measure the performance of C++ AMP algorithms. As stated in that blog post, when you use array_view it is not possible to separate the data transfer to the accelerator from the kernel execution on the accelerator, and hence you cannot measure them individually. So if you want to measure the performance of the data transfer to the accelerator, you have to use an array as described in that blog post.

In case it is not clear, even with array_view it is possible to measure the performance of the transfer from the accelerator back to the host, for example by creating a marker to measure the kernel execution on its own and then measuring the synchronization of the array_view separately.

For copying to the accelerator asynchronously, use array

If you are not familiar with the asynchronous APIs offered by C++ AMP, please read the blog posts linked in the preceding section where we discussed measuring performance. In addition, you should read our blog post on asynchronous operations and continuations. You certainly can, and are encouraged to, use array_view for asynchronous operations with C++ AMP. However, as per the preceding section, with array_view you cannot separate the transfer to the accelerator from the kernel execution, and similarly you cannot issue an asynchronous data transfer to the accelerator through an array_view - it is always synchronous.

Again, you can certainly synchronize the array_view back to the host asynchronously.

Even when using array, you could still... combine it with array_view

Even for the scenarios above, where you would use the array for separating the transfer of the data to the accelerator from the kernel execution, or where you are using array for DirectX interop, or using arrays for the staging array technique, you can still use an array_view object that wraps over the said array. For example:

    // other code
    array<T,N> arr(...);
    // other code
    array_view<T,N> av(arr);

    parallel_for_each(av.extent, [=](index<N> idx) restrict(amp){
        av[idx] = ...
    });

In other words, you use the array object for its facilities, but then just before the parallel_for_each you wrap an array_view over it and use that in the lambda. The only reason to do this would be stylistic, but I know folks who prefer this approach for consistency between their various parallel_for_each occurrences (especially when designing reusable libraries)...

Hope this helps, and if you encounter an additional scenario where you found array more useful than array_view, please let us know in the comments below or in the MSDN forum.