Configuring LINQ to HPC Queries

The HpcLinqConfiguration class contains settings that control the behavior of LINQ to HPC queries and DSC operations. The following code shows how to create an instance of this class.

var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");

The argument to the constructor is the name of the HPC cluster head node computer.

Configuration options are available as properties of the HpcLinqConfiguration class. The following code shows how to set the LocalDebug property.

config.LocalDebug = true;

Setting configuration properties has no effect until you pass the configuration instance to the constructor of the HpcLinqContext class. The following code is an example of how to do this.

var context = new HpcLinqContext(config);

At this point, the context object has been configured and is ready to be used. You can still modify the configuration object that you passed to the HpcLinqContext constructor, but changes to the configuration are ignored by existing context objects. You must instantiate a new context object if you want configuration changes to take effect. After you create a context object, you can use the Configuration property to query for its configuration, but the configuration object that is returned is a read-only copy. An exception is thrown if you attempt to set a property of a read-only configuration object.
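The behavior described above can be sketched as follows. This is a minimal illustration, not a definitive implementation: it assumes that the types live in the Microsoft.Hpc.Linq namespace and that the LINQ to HPC client assemblies are referenced. Because the text does not name the exception type that a read-only configuration throws, the sketch catches the general Exception type.

```csharp
using System;
using Microsoft.Hpc.Linq;   // assumed namespace for HpcLinqConfiguration and HpcLinqContext

static class ReadOnlyConfigurationSketch
{
    static void Main()
    {
        var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
        config.LocalDebug = true;

        var context = new HpcLinqContext(config);

        // This change is allowed, but the existing context ignores it.
        config.LocalDebug = false;

        // The context exposes a read-only copy of the settings it was created with.
        HpcLinqConfiguration copy = context.Configuration;
        Console.WriteLine("IsReadOnly: {0}", copy.IsReadOnly);
        Console.WriteLine("LocalDebug seen by the context: {0}", copy.LocalDebug);

        try
        {
            // Setting a property on the read-only copy is expected to throw.
            copy.LocalDebug = false;
        }
        catch (Exception ex)   // exact exception type is not stated in the text
        {
            Console.WriteLine("Cannot modify a read-only configuration: {0}", ex.Message);
        }

        // To apply the changed setting, create a new context from the modified configuration.
        var newContext = new HpcLinqContext(config);
    }
}
```

The design keeps each context immutable with respect to its settings, so a query never sees a configuration change partway through its lifetime.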

The following example creates an instance of the HpcLinqConfiguration class, sets the NodeGroup property, and passes the configuration instance to the HpcLinqContext constructor. It then runs the lines query on the node group MyNodeGroup.

string fileSetName = ...
var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");
config.NodeGroup = "MyNodeGroup";

var context = new HpcLinqContext(config);

var lines = context.FromDsc<LineRecord>(fileSetName);
Console.WriteLine("The number of lines is {0}", lines.Count());

Note

The fileSetName variable in this example is the name of the DSC file set that contains the records that are used by the query. You should make sure that the file set name is unique. For example, you can prefix the file set with your user name. There is no directory structure for file sets. All users share the same pool of file sets and must, therefore, devise naming strategies that avoid naming conflicts among users.
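One way to apply the naming advice in this note is to build every file set name from the current user's name. The following is a minimal sketch. The FileSetNames class, its methods, the underscore separator, and the "weblogs" base name are all hypothetical and are not part of LINQ to HPC. The sketch also does not check which characters DSC permits in a file set name.

```csharp
using System;

// Hypothetical helper for building per-user DSC file set names.
static class FileSetNames
{
    // Joins the user name and the base name, for example "alice_weblogs".
    public static string ForUser(string userName, string baseName)
    {
        if (string.IsNullOrWhiteSpace(userName))
            throw new ArgumentException("A user name is required.", "userName");
        if (string.IsNullOrWhiteSpace(baseName))
            throw new ArgumentException("A base name is required.", "baseName");

        return userName + "_" + baseName;
    }

    // Uses the name of the user who is running the current process.
    public static string ForCurrentUser(string baseName)
    {
        return ForUser(Environment.UserName, baseName);
    }
}

static class Program
{
    static void Main()
    {
        string fileSetName = FileSetNames.ForCurrentUser("weblogs");
        Console.WriteLine(fileSetName);
    }
}
```

A prefix only reduces the chance of a collision between users. Two programs that run under the same account still need distinct base names.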

This table lists the HpcLinqConfiguration class's configuration properties. Types are specified in the Visual C# short form.

Property name

Type

Default

Meaning

AllowConcurrentUserDelegatesInSingleProcess

bool

true

LINQ expressions may be executed in parallel on multiple threads within the same process. Set this property to false to guarantee single-threaded execution of all LINQ to HPC components that call back to user code.

Note: Opting for single-threaded execution might have a negative impact on performance. You should do this only if your query requires it.

CompileForVertexDebugging

bool

false

This property enables you to specify whether to compile code that enables you to debug vertex tasks that execute on an HPC cluster. If the value is true, vertex code is compiled without code-level optimizations, and a program database (PDB) file is generated. The query execution job will include the PDB associated with each DLL resource that is part of the job.

EnableSpeculativeDuplication

bool

true

This property allows the graph manager to run additional instances of a vertex when doing so may improve performance. As a result, the same vertex may be executed more than once. Only the results from the first instance to finish are used; the other instances are canceled and their output is ignored.

GraphManagerNode

string

null

This property enables you to specify the node on which the graph manager runs. If the value is null, the graph manager runs on an arbitrary machine that is selected from the group of machines that are executing the job. Because you cannot run the graph manager on the same machine as a vertex, it is often a good practice to designate one node that is not used as a DSC node. This prevents copying data to a node that cannot make use of it.

In general, you should not set the GraphManagerNode property. Instead, let the system choose a compute node for this purpose.

HeadNode

string

No default value

This property enables you to specify the head node on which the HPC query job executes. It is set by the configuration constructor. This property must be set in order to get a valid context object.

IntermediateDataCompressionScheme

DscCompressionScheme

DscCompressionScheme.Gzip

The data compression scheme to use for intermediate data that is produced during the execution of a query. Typically, compressing data improves performance by reducing disk I/O. Consider disabling compression if the data is already compressed (for example, JPEG image data).

IsReadOnly

bool

(Always set by the system)

This property indicates whether the HpcLinqConfiguration object can be modified. If you use the HpcLinqContext class's Configuration property to retrieve a configuration object from a context, the configuration object that is returned has an IsReadOnly property with the value true. If you create a new instance of the HpcLinqConfiguration class, the value of the IsReadOnly property is false.

Note: You cannot set this property.

JobEnvironmentVariables

string

“”

A dictionary that initializes environment variables on the DSC node before executing the vertices of a LINQ to HPC query.

JobFriendlyName

string

null

This property enables you to specify the name that describes the HPC query job. This name appears in the HPC Cluster Manager. It can be overridden by cluster settings, such as job templates.

If you do not set the JobFriendlyName property, the default job template's JobName is used. If the job template limits the names, then the JobFriendlyName must be one of the names in the template's list of valid names.

JobMaxNodes

int?

null

This property enables you to specify the maximum number of cluster nodes that the HPC server job will use. The number should count both the nodes that run vertices and the node that runs the graph manager. This value is passed to the scheduler, which uses it to schedule the job. If the value is null, there is no upper limit.

JobMinNodes

int?

null

This property enables you to specify the minimum number of cluster nodes that the HPC server job will use. The number should count both the nodes that run vertices and the node that runs the graph manager. This value is passed to the scheduler, which uses it to schedule the job. If the value is null, there is no lower limit.

JobPassword

string

null

This property enables you to specify the RunAs password that is associated with the job. If the value is null, then the credentials associated with the current thread are used.

JobRuntimeLimit

int?

null

This property enables you to specify the maximum length of time that a job can run, in seconds. If the job exceeds this limit, it is canceled and treated as failed. The default is null, which means there is no time limit. The value can be overridden by cluster settings, such as job templates.

JobUsername

string

null

This property enables you to specify the account under which the job will run. If the value is null, then the credentials associated with the current thread are used.

LocalDebug

bool

false

This property enables you to specify whether to use the local debugging mode. If the value is true, then the query executes in the current AppDomain, and uses LINQ-to-Objects to execute the query. Local debugging is useful if you want to debug user functions before you execute the job on the cluster. This mode accesses the DSC in the usual way for input and output data. However, the vertex code is not compiled, and the job is not submitted to the HPC server.

MatchClientNetFrameworkVersion

bool

false

When the value is true, the LINQ to HPC runtime Vertex Host targets the .NET version of the client process that compiled the vertex code.

NodeGroup

string

null

This property enables you to specify the name of the compute node group on which the job will run. The value can be overridden by cluster settings, such as job templates. HPC enables you to identify groups of nodes within a cluster. This allows you to schedule a job on a named set of nodes. Use the HPC Cluster Manager to create and manage node groups. Remember that your data is distributed across the cluster. If you confine your job to a specific set of nodes, then data that resides on the excluded nodes must be copied to the node group. This can affect performance. Note that data locality is lost if a job runs on a subset of DSC nodes.

OutputDataCompressionScheme

DscCompressionScheme

DscCompressionScheme.None

The data compression scheme to use for the data that a query outputs. Typically, compressing data improves performance by reducing disk I/O. Consider disabling compression if the data is already compressed (for example, JPEG image data).

ResourcesToAdd

IList<string>

An empty IList<string>

This property enables you to specify the list of resources that must be added to a job in order to execute the query. By default, all the assemblies that your program loaded when the query was submitted are copied to the cluster along with the execution plan, except for any assemblies that begin with Microsoft.* or System.*. Use the ResourcesToAdd property when additional assemblies or files must be copied. Each resource should be specified with a complete path name, and the resource must be accessible from the local computer.

ResourcesToRemove

IList<string>

An empty IList<string>

This property enables you to specify the resources that must be removed from a job. You may want to delete resources to improve performance, and to reduce the amount of copying. Each resource should be specified with a complete path name.

RuntimeTraceLevel

HpcQueryTraceLevel

HpcQueryTraceLevel.Error

This property enables you to specify the trace level for a job. The trace level affects the logs that are produced by all the components that are associated with a job that has executed. The possible values are Off, Critical, Error, Warning, Information, and Verbose. Typically, this property is set by customers when they are working with Microsoft® to reproduce a problem.

SelectiveOrderPreservation

bool

false

This property guarantees that the output data set preserves the ordering of the input for the following query operators: Select, SelectMany, and Where. If the value is false, the LINQ to HPC runtime executes the most efficient query, regardless of whether ordering is preserved. If the value is true, ordering is preserved at the possible expense of execution efficiency.
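As an illustration of how several of the properties in the table might be combined, the following sketch configures a job before it creates the context. It uses only the property names, types, and enumeration values that the table lists. The namespaces, the numeric values, the job name, and the resource path are assumptions chosen for the example.

```csharp
using System;
using Microsoft.Hpc.Linq;   // assumed namespace for the LINQ to HPC types
using Microsoft.Hpc.Dsc;    // assumed namespace for DscCompressionScheme

static class ConfigurationSketch
{
    static void Main()
    {
        var config = new HpcLinqConfiguration("MyHpcClusterHeadNode");

        // Scheduling hints (int? properties; null means no limit).
        config.JobMinNodes = 2;          // illustrative value
        config.JobMaxNodes = 8;          // illustrative value
        config.JobRuntimeLimit = 3600;   // seconds; illustrative value

        // Name shown in the HPC Cluster Manager; hypothetical name.
        config.JobFriendlyName = "LineCountJob";

        // The intermediate data is assumed to be already compressed, so compression is turned off.
        config.IntermediateDataCompressionScheme = DscCompressionScheme.None;
        config.OutputDataCompressionScheme = DscCompressionScheme.Gzip;

        // Copy an extra file to the cluster with the job; hypothetical path.
        config.ResourcesToAdd.Add(@"C:\MyApp\lookup-table.dat");

        // Produce more detailed logs while investigating a problem.
        config.RuntimeTraceLevel = HpcQueryTraceLevel.Verbose;

        // The settings take effect only for contexts that are created from this configuration.
        var context = new HpcLinqContext(config);
    }
}
```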

Properties and job templates

Several of the properties in the table can be overridden by the HPC job templates. These properties are:

  • JobFriendlyName

  • NodeGroup

  • JobRuntimeLimit

There are two general rules that govern the interaction between properties and job templates. One is that if you do not specify the property, then the template's default value is used. The other is that if you do specify the property, then it must comply with the limits that are set by the template. In other words, a value that conforms to the template is used as specified, and a value that does not conform causes an error.

Unless you modify it, the HPC server's default job template does not specify any constraints, which means that configuration properties are unconstrained. You can, however, provide a job template that includes both defaults and range restrictions. These are independent of each other. For example, if the template specifies a default but no range restriction, then the configuration property overrides the default, and is not rejected by the job template.

If the template specifies both a default and a range restriction that is the same as the default, the job submission fails if you set the corresponding configuration property in the HpcLinqConfiguration object to anything other than the template's default value.
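The rules in this section can be modeled with a few lines of self-contained code. The following sketch is only a model of the behavior described here. The TemplateEntry and JobTemplateRules types are hypothetical and are not part of the HPC API. The restriction is modeled as a set of allowed values, and a failed job submission is modeled as an exception.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one job template entry; not a type from the HPC API.
sealed class TemplateEntry
{
    public string Default;              // value used when the property is not set
    public ISet<string> AllowedValues;  // null means the template sets no restriction
}

static class JobTemplateRules
{
    // Returns the value the job would use, or throws to model a failed submission.
    public static string Resolve(string configured, TemplateEntry entry)
    {
        // Rule 1: an unspecified property falls back to the template's default.
        if (configured == null)
            return entry.Default;

        // Rule 2: a specified property must comply with the template's restriction.
        if (entry.AllowedValues != null && !entry.AllowedValues.Contains(configured))
            throw new InvalidOperationException(
                "The value '" + configured + "' is not allowed by the job template.");

        return configured;
    }
}

static class Program
{
    static void Main()
    {
        // A default without a restriction: the configured value overrides the default.
        var unrestricted = new TemplateEntry { Default = "ComputeNodes" };

        // A default with a restriction equal to the default: any other value is rejected.
        var locked = new TemplateEntry
        {
            Default = "ComputeNodes",
            AllowedValues = new HashSet<string> { "ComputeNodes" }
        };

        Console.WriteLine(JobTemplateRules.Resolve(null, unrestricted));          // ComputeNodes
        Console.WriteLine(JobTemplateRules.Resolve("MyNodeGroup", unrestricted)); // MyNodeGroup

        try
        {
            JobTemplateRules.Resolve("MyNodeGroup", locked);
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine("Submission fails: " + ex.Message);
        }
    }
}
```

The model treats the default and the restriction as independent settings, which matches the two cases described above: a default alone never rejects a value, and a restriction alone never supplies one.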

You should be careful when you modify the HPC server’s default job template. It is possible to introduce failures. For example, you might inappropriately limit the UnitType and unintentionally exclude compute nodes.