Introduction to PLINQ

Parallel LINQ (PLINQ) is a parallel implementation of the Language-Integrated Query (LINQ) pattern. PLINQ implements the full set of LINQ standard query operators as extension methods for the System.Linq namespace and has additional operators for parallel operations. PLINQ combines the simplicity and readability of LINQ syntax with the power of parallel programming.

Tip

LINQ provides a unified model for querying any enumerable data source in a type-safe manner. LINQ to Objects is the name for LINQ queries that are run against in-memory collections such as List<T> and arrays. This article assumes that you have a basic understanding of LINQ. For more information, see Language-Integrated Query (LINQ).

What Is a Parallel Query?

A PLINQ query in many ways resembles a non-parallel LINQ to Objects query. Like sequential LINQ queries, PLINQ queries operate on any in-memory IEnumerable or IEnumerable<T> data source, and they use deferred execution, which means they do not begin executing until the query is enumerated. The primary difference is that PLINQ attempts to make full use of all the processors on the system. It does this by partitioning the data source into segments and then executing the query on each segment on separate worker threads, in parallel, on multiple processors. In many cases, parallel execution means that the query runs significantly faster.

Through parallel execution, PLINQ can achieve significant performance improvements over legacy code for certain kinds of queries, often just by adding the AsParallel query operation to the data source. However, parallelism can introduce its own complexities, and not all query operations run faster in PLINQ. In fact, parallelization slows down certain queries. Therefore, you should understand how issues such as ordering affect parallel queries. For more information, see Understanding Speedup in PLINQ.

Note

This documentation uses lambda expressions to define delegates in PLINQ. If you are not familiar with lambda expressions in C# or Visual Basic, see Lambda Expressions in PLINQ and TPL.

The remainder of this article gives an overview of the main PLINQ classes and discusses how to create PLINQ queries. Each section contains links to more detailed information and code examples.

The ParallelEnumerable Class

The System.Linq.ParallelEnumerable class exposes almost all of PLINQ's functionality. In .NET Framework, it and the rest of the System.Linq namespace types are compiled into the System.Core.dll assembly; in .NET Core and .NET 5 and later, PLINQ ships in System.Linq.Parallel.dll. The default C# and Visual Basic projects in Visual Studio both reference the assembly and import the namespace.

ParallelEnumerable includes implementations of all the standard query operators that LINQ to Objects supports, although it does not attempt to parallelize each one. If you are not familiar with LINQ, see Introduction to LINQ (C#) and Introduction to LINQ (Visual Basic).

In addition to the standard query operators, the ParallelEnumerable class contains a set of methods that enable behaviors specific to parallel execution. These PLINQ-specific methods are listed in the following table.

ParallelEnumerable operator | Description
AsParallel | The entry point for PLINQ. Specifies that the rest of the query should be parallelized, if possible.
AsSequential | Specifies that the rest of the query should be run sequentially, as a non-parallel LINQ query.
AsOrdered | Specifies that PLINQ should preserve the ordering of the source sequence for the rest of the query, or until the ordering is changed, for example by an orderby (Order By in Visual Basic) clause.
AsUnordered | Specifies that PLINQ is not required to preserve the ordering of the source sequence for the rest of the query.
WithCancellation | Specifies that PLINQ should periodically monitor the state of the provided cancellation token and cancel execution if cancellation is requested.
WithDegreeOfParallelism | Specifies the maximum number of processors that PLINQ should use to parallelize the query.
WithMergeOptions | Provides a hint about how PLINQ should, if possible, merge parallel results back into a single sequence on the consuming thread.
WithExecutionMode | Specifies whether PLINQ should parallelize the query even when the default behavior would be to run it sequentially.
ForAll | A multithreaded enumeration method that, unlike iterating over the results of the query, enables results to be processed in parallel without first merging back to the consumer thread.
Aggregate overload | An overload that is unique to PLINQ. It enables intermediate aggregation over thread-local partitions, plus a final aggregation function that combines the results of all partitions.
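The PLINQ-only Aggregate overload is easiest to follow in a concrete case. The following minimal sketch computes a sum of squares. It uses the overload that takes a seed factory, an update function, a combine function, and a result selector. Each partition folds elements into its own `long` accumulator, and the combine function then adds the per-partition totals. The data and the sum-of-squares computation are illustrative only.

```csharp
using System;
using System.Linq;

var numbers = Enumerable.Range(1, 1000);

long sumOfSquares = numbers.AsParallel().Aggregate(
    () => 0L,                              // seed factory: a fresh accumulator for each partition
    (local, n) => local + (long)n * n,     // update: fold one element into the partition's accumulator
    (total, local) => total + local,       // combine: merge the per-partition results
    total => total);                       // result selector: final projection (identity here)

Console.WriteLine(sumOfSquares);           // 333833500
```

Because the combine step may merge partition results in any order, this pattern fits operations that are associative and commutative, such as addition.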

The Opt-in Model

When you write a query, opt in to PLINQ by invoking the ParallelEnumerable.AsParallel extension method on the data source, as shown in the following example.

var source = Enumerable.Range(1, 10000);

// Opt in to PLINQ with AsParallel.
var evenNums = from num in source.AsParallel()
               where num % 2 == 0
               select num;
Console.WriteLine("{0} even numbers out of {1} total",
                  evenNums.Count(), source.Count());
// The example displays the following output:
//       5000 even numbers out of 10000 total

Dim source = Enumerable.Range(1, 10000)

' Opt in to PLINQ with AsParallel
Dim evenNums = From num In source.AsParallel()
               Where num Mod 2 = 0
               Select num
Console.WriteLine("{0} even numbers out of {1} total",
                  evenNums.Count(), source.Count())
' The example displays the following output:
'       5000 even numbers out of 10000 total      

The AsParallel extension method binds the subsequent query operators, in this case where and select, to the System.Linq.ParallelEnumerable implementations.
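The binding can also be seen in the compile-time types. In this minimal sketch, `AsParallel` returns a `ParallelQuery<int>`. The `where` and `select` clauses that follow therefore resolve to `ParallelEnumerable` methods, which also return `ParallelQuery<int>`. `AsSequential` switches back to `IEnumerable<int>`, so later operators bind to `Enumerable` again. The range and filters are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 100);

// AsParallel wraps the source in a ParallelQuery<int>.
ParallelQuery<int> parallelSource = source.AsParallel();

// where/select bind to ParallelEnumerable.Where/Select here. This explicit
// ParallelQuery<int> annotation would not compile if they bound to Enumerable.
ParallelQuery<int> evens = from n in parallelSource
                           where n % 2 == 0
                           select n * 10;

// After AsSequential, Where binds to Enumerable.Where (LINQ to Objects).
IEnumerable<int> sequentialTail = evens.AsSequential().Where(n => n > 500);

Console.WriteLine(evens.Count());           // 50
Console.WriteLine(sequentialTail.Count());  // 25
```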

Execution Modes

By default, PLINQ is conservative. At run time, the PLINQ infrastructure analyzes the overall structure of the query. If the query is likely to yield speedups by parallelization, PLINQ partitions the source sequence into tasks that can be run concurrently. If it is not safe to parallelize a query, PLINQ just runs the query sequentially. If PLINQ has a choice between a potentially expensive parallel algorithm and an inexpensive sequential algorithm, it chooses the sequential algorithm by default. You can use the WithExecutionMode method and the System.Linq.ParallelExecutionMode enumeration to instruct PLINQ to select the parallel algorithm. This is useful when testing and measurement show that a particular query executes faster in parallel. For more information, see How to: Specify the Execution Mode in PLINQ.
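In the following minimal sketch, WithExecutionMode is called with ParallelExecutionMode.ForceParallelism. The Where/Select shape is only a placeholder. Whether PLINQ's heuristics would otherwise have run this particular query sequentially is an internal decision, and this sketch does not assume either way.

```csharp
using System;
using System.Linq;

var source = Enumerable.Range(1, 1000);

// Ask PLINQ to use the parallel algorithm even where its heuristics
// would otherwise fall back to sequential execution.
var forced = source.AsParallel()
                   .WithExecutionMode(ParallelExecutionMode.ForceParallelism)
                   .Where(n => n % 3 == 0)
                   .Select(n => n * 2);

Console.WriteLine(forced.Sum());   // 333666
```

Forcing parallelism changes how the query runs, not what it returns. Apply it only after measuring a real speedup.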

Degree of Parallelism

By default, PLINQ uses all of the processors on the host computer. You can instruct PLINQ to use no more than a specified number of processors by using the WithDegreeOfParallelism method. This is useful when you want to make sure that other processes running on the computer receive a certain amount of CPU time. The following snippet limits the query to at most two processors.

var query = from item in source.AsParallel().WithDegreeOfParallelism(2)
            where Compute(item) > 42
            select item;

Dim query = From item In source.AsParallel().WithDegreeOfParallelism(2)
            Where Compute(item) > 42
            Select item

If a query performs a significant amount of non-compute-bound work, such as file I/O, it might be beneficial to specify a degree of parallelism greater than the number of cores on the machine.
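The following minimal sketch sets the degree of parallelism above the core count for I/O-bound work. Blocking I/O is simulated with `Thread.Sleep`. `SimulatedIo` is a hypothetical stand-in for a real file or network read, and the multiplier of 4 is an assumption to be tuned by measurement. The value is capped at 64 because WithDegreeOfParallelism rejects values above an internal maximum.

```csharp
using System;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for blocking I/O: the thread mostly waits instead of using the CPU.
static int SimulatedIo(int id)
{
    Thread.Sleep(50);
    return id;
}

// Assumption: 4x the core count, capped, is a reasonable starting point for I/O-bound work.
int degree = Math.Min(Environment.ProcessorCount * 4, 64);

int[] results = Enumerable.Range(0, 40)
    .AsParallel()
    .WithDegreeOfParallelism(degree)
    .Select(SimulatedIo)
    .ToArray();

Console.WriteLine(results.Length);   // 40 (order is not guaranteed without AsOrdered)
```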

Ordered Versus Unordered Parallel Queries

In some queries, a query operator must produce results that preserve the ordering of the source sequence. PLINQ provides the AsOrdered operator for this purpose. AsOrdered is distinct from AsSequential. An AsOrdered sequence is still processed in parallel, but its results are buffered and sorted. Because order preservation typically involves extra work, an AsOrdered sequence might be processed more slowly than the default AsUnordered sequence. Whether a particular ordered parallel operation is faster than a sequential version of the operation depends on many factors.

The following code example shows how to opt in to order preservation.

var evenNums = from num in numbers.AsParallel().AsOrdered()
              where num % 2 == 0
              select num;

Dim evenNums = From num In numbers.AsParallel().AsOrdered()
              Where num Mod 2 = 0
              Select num


For more information, see Order Preservation in PLINQ.
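The following self-contained sketch compares the same filter with and without AsOrdered. The AsOrdered result is guaranteed to match the sequential LINQ to Objects result element for element. The unordered result contains the same elements, but PLINQ is free to return them in any order, so the comparison sorts it first. The input array is illustrative.

```csharp
using System;
using System.Linq;

int[] numbers = Enumerable.Range(0, 10_000).ToArray();

// AsOrdered: still processed in parallel, but results come back in source order.
int[] ordered = numbers.AsParallel().AsOrdered()
                       .Where(n => n % 7 == 0)
                       .ToArray();

// Default (unordered): same elements, no ordering guarantee.
int[] unordered = numbers.AsParallel()
                         .Where(n => n % 7 == 0)
                         .ToArray();

int[] sequential = numbers.Where(n => n % 7 == 0).ToArray();

Console.WriteLine(ordered.SequenceEqual(sequential));                    // True
Console.WriteLine(unordered.OrderBy(n => n).SequenceEqual(sequential));  // True
```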

Parallel vs. Sequential Queries

Some operations require that the source data be delivered sequentially. The ParallelEnumerable query operators revert to sequential mode automatically when required. For user-defined query operators and user delegates that require sequential execution, PLINQ provides the AsSequential method. When you use AsSequential, all subsequent operators in the query are executed sequentially until AsParallel is called again. For more information, see How to: Combine Parallel and Sequential LINQ Queries.
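The following minimal sketch filters in parallel and then calls AsSequential before a user delegate that is not thread-safe. The delegate appends to a plain `List<int>`, which is a hypothetical stand-in for any stateful, single-threaded consumer. After AsSequential, the Select runs as LINQ to Objects on the enumerating thread. AsOrdered keeps the handed-over elements in source order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = Enumerable.Range(1, 1000);

// Not thread-safe: List<T> must not be mutated from several threads at once.
var log = new List<int>();

List<int> result = source.AsParallel().AsOrdered()
    .Where(n => n % 5 == 0)                  // runs in parallel
    .AsSequential()                          // everything after this is LINQ to Objects
    .Select(n => { log.Add(n); return n; })  // runs on the single enumerating thread
    .ToList();

Console.WriteLine(log.Count);   // 200
```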

Options for Merging Query Results

When a PLINQ query executes in parallel, its results from each worker thread must be merged back onto the main thread for consumption by a foreach loop (For Each in Visual Basic) or for insertion into a list or array. In some cases, it might be beneficial to specify a particular kind of merge operation, for example, to begin producing results more quickly. For this purpose, PLINQ supports the WithMergeOptions method and the ParallelMergeOptions enumeration. For more information, see Merge Options in PLINQ.
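The following minimal sketch requests ParallelMergeOptions.NotBuffered, which asks PLINQ to hand each result to the consuming foreach as soon as it is available instead of buffering output first. The option is only a hint. Some query shapes may still be fully buffered, and the squaring query is illustrative.

```csharp
using System;
using System.Linq;

var query = Enumerable.Range(1, 20)
    .AsParallel()
    .WithMergeOptions(ParallelMergeOptions.NotBuffered)  // hint: yield results as they are produced
    .Select(n => n * n);

int count = 0;
foreach (int square in query)   // items can arrive before the whole query has finished
{
    Console.WriteLine(square);  // arrival order is not guaranteed
    count++;
}
```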

The ForAll Operator

In sequential LINQ queries, execution is deferred until the query is enumerated, either in a foreach (For Each in Visual Basic) loop or by invoking a method such as ToList, ToArray, or ToDictionary. In PLINQ, you can also use foreach to execute the query and iterate through the results. However, foreach itself does not run in parallel, so it requires that the output from all parallel tasks be merged back into the thread on which the loop is running. In PLINQ, use foreach when you must preserve the final ordering of the query results, and whenever you process the results serially, for example when you call Console.WriteLine for each element. For faster query execution when order preservation is not required and the processing of the results can itself be parallelized, use the ForAll method to execute a PLINQ query. ForAll does not perform this final merge step. The following code example shows how to use the ForAll method. System.Collections.Concurrent.ConcurrentBag<T> is used here because it is optimized for multiple threads adding items concurrently when no items are removed.

var nums = Enumerable.Range(10, 10000);
var query = from num in nums.AsParallel()
            where num % 10 == 0
            select num;

// Compute is a placeholder for your own per-element work (assumed to return int).
var concurrentBag = new System.Collections.Concurrent.ConcurrentBag<int>();

// Process the results as each thread completes
// and add them to a System.Collections.Concurrent.ConcurrentBag<int>,
// which can safely accept concurrent add operations.
query.ForAll(e => concurrentBag.Add(Compute(e)));

Dim nums = Enumerable.Range(10, 10000)
Dim query = From num In nums.AsParallel()
            Where num Mod 10 = 0
            Select num

' Compute is a placeholder for your own per-element work (assumed to return Integer).
Dim concurrentBag As New System.Collections.Concurrent.ConcurrentBag(Of Integer)()

' Process the results as each thread completes
' and add them to a System.Collections.Concurrent.ConcurrentBag(Of Integer),
' which can safely accept concurrent add operations.
query.ForAll(Sub(e) concurrentBag.Add(Compute(e)))

下图展示了 foreachForAll 在查询执行方面的区别。The following illustration shows the difference between foreach and ForAll with regard to query execution.

[Figure: ForAll vs. ForEach]

Cancellation

PLINQ is integrated with the cancellation types in .NET Framework 4. (For more information, see Cancellation in Managed Threads.) Therefore, unlike sequential LINQ to Objects queries, PLINQ queries can be canceled. To create a cancelable PLINQ query, use the WithCancellation operator on the query and provide a CancellationToken instance as the argument. When cancellation is requested on the token, so that its IsCancellationRequested property returns true, PLINQ will notice it, stop processing on all threads, and throw an OperationCanceledException.

A PLINQ query might continue to process some elements after the cancellation token is set.

For greater responsiveness, you can also respond to cancellation requests in long-running user delegates. For more information, see How to: Cancel a PLINQ Query.
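The following minimal sketch attaches a CancellationTokenSource to a query with WithCancellation and catches the resulting OperationCanceledException on the enumerating thread. To keep the example self-contained, the delegate itself calls `Cancel` when it reaches element 1,000. This trigger is hypothetical; in practice another thread, a timeout, or a UI action would request cancellation. As noted above, some elements may still be processed after `Cancel` is called.

```csharp
using System;
using System.Linq;
using System.Threading;

using var cts = new CancellationTokenSource();
bool canceled = false;

var query = Enumerable.Range(1, 10_000_000)
    .AsParallel()
    .WithCancellation(cts.Token)
    .Select(n =>
    {
        if (n == 1_000) cts.Cancel();   // hypothetical trigger for the demo
        return n * 2;
    });

try
{
    foreach (int _ in query) { }       // enumeration observes the cancellation
}
catch (OperationCanceledException)
{
    canceled = true;                   // PLINQ reports cancellation as OperationCanceledException
}

Console.WriteLine(canceled);           // True
```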

Exceptions

When a PLINQ query executes, multiple exceptions might be thrown from different threads simultaneously. Also, the code that handles an exception might be on a different thread than the code that threw it. PLINQ uses the AggregateException type to encapsulate all the exceptions that were thrown by a query, and it marshals those exceptions back to the calling thread. On the calling thread, only one try-catch block is required. However, you can iterate through all of the exceptions that are encapsulated in the AggregateException and catch any that you can safely recover from. In rare cases, some exceptions may be thrown that are not wrapped in an AggregateException, and ThreadAbortException instances are also not wrapped.

When exceptions are allowed to bubble up to the joining thread, a query may continue to process some items after an exception is raised.
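The following minimal sketch shows a single try-catch on the calling thread that receives the AggregateException and uses `AggregateException.Handle` to mark recoverable inner exceptions as handled. The failure on every 25th element is illustrative. Depending on partitioning and timing, one or several of those exceptions may be reported before the query shuts down.

```csharp
using System;
using System.Linq;

int handled = 0;

var query = Enumerable.Range(1, 100)
    .AsParallel()
    .Select(n => n % 25 == 0
        ? throw new InvalidOperationException($"Bad element {n}")  // illustrative failure
        : n);

try
{
    _ = query.ToArray();
}
catch (AggregateException ae)
{
    // One catch on the calling thread sees exceptions raised on any worker thread.
    ae.Handle(ex =>
    {
        if (ex is InvalidOperationException)
        {
            Console.WriteLine(ex.Message);
            handled++;
            return true;    // recoverable: treat as handled
        }
        return false;       // anything else is rethrown in a new AggregateException
    });
}
```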

For more information, see How to: Handle Exceptions in a PLINQ Query.

Custom Partitioners

In some cases, you can improve query performance by writing a custom partitioner that takes advantage of some characteristic of the source data. In the query, the custom partitioner itself is the enumerable object that is queried.

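int[] arr = new int[10000];
// MyArrayPartitioner<T> (a custom Partitioner<T>) and SomeFunction are placeholders for your own code.
Partitioner<int> partitioner = new MyArrayPartitioner<int>(arr);
var query = partitioner.AsParallel().Select(x => SomeFunction(x));

Dim arr(9999) As Integer  ' 10000 elements: Visual Basic array bounds are inclusive
' MyArrayPartitioner(Of T) (a custom Partitioner(Of T)) and SomeFunction are placeholders for your own code.
Dim partitioner As Partitioner(Of Integer) = New MyArrayPartitioner(Of Integer)(arr)
Dim query = partitioner.AsParallel().Select(Function(x) SomeFunction(x))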

PLINQ supports a fixed number of partitions, although data may be dynamically reassigned to those partitions during run time for load balancing. Parallel.For and Parallel.ForEach support only dynamic partitioning, which means that the number of partitions changes at run time. For more information, see Custom Partitioners for PLINQ and TPL.
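The `MyArrayPartitioner<T>` class in the snippet above is custom code that is not shown. As a runnable stand-in, this sketch uses the built-in `Partitioner.Create(array, loadBalance)` factory from System.Collections.Concurrent. It returns an `OrderablePartitioner<T>` that PLINQ can query directly. Passing `loadBalance: true` requests chunked, on-demand partitioning instead of static ranges. This is a built-in partitioner swapped in for illustration, not a custom one, and the doubling query is a placeholder for `SomeFunction`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

int[] arr = Enumerable.Range(0, 10_000).ToArray();

// loadBalance: true hands elements out in chunks on demand,
// which can help when per-element cost varies widely.
OrderablePartitioner<int> partitioner = Partitioner.Create(arr, loadBalance: true);

// As with a custom partitioner, the partitioner itself is the source that AsParallel queries.
long total = partitioner.AsParallel()
                        .Select(x => (long)x * 2)
                        .Sum();

Console.WriteLine(total);   // 99990000
```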

Measuring PLINQ Performance

In many cases, a query can be parallelized, but the overhead of setting up the parallel query can outweigh the performance benefit. If a query does not perform much computation, or if the data source is small, a PLINQ query may be slower than a sequential LINQ to Objects query. You can use the parallel performance tools in Visual Studio to compare the performance of various queries, locate processing bottlenecks, and determine whether your query is running in parallel or sequentially. For more information, see Concurrency Visualizer and How to: Measure PLINQ Query Performance.
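For a quick first check before using the profiling tools, you can time sequential and PLINQ versions of the same query with `Stopwatch`. This is a minimal sketch. `Work` is a hypothetical CPU-bound function sized so that parallelism has a chance to pay off. A single run also includes JIT and warm-up costs, so repeat the measurement before drawing conclusions. The two sums are compared with a tolerance because parallel summation of doubles adds values in a different order, which can change the last few bits.

```csharp
using System;
using System.Diagnostics;
using System.Linq;

// Hypothetical CPU-bound work per element.
static double Work(int n)
{
    double acc = 0;
    for (int i = 1; i <= 2_000; i++) acc += Math.Sqrt(n + i);
    return acc;
}

int[] source = Enumerable.Range(0, 20_000).ToArray();

var sw = Stopwatch.StartNew();
double seq = source.Select(Work).Sum();
sw.Stop();
Console.WriteLine($"Sequential: {sw.ElapsedMilliseconds} ms");

sw.Restart();
double par = source.AsParallel().Select(Work).Sum();
sw.Stop();
Console.WriteLine($"PLINQ:      {sw.ElapsedMilliseconds} ms");
```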

See also