Compute Cluster Server Command Line Interface Reference
Applies To: Windows Compute Cluster Server 2003
This reference provides command descriptions and syntax for all cluster-specific command-line executables used in Windows Compute Cluster Server 2003. The Command Line Interface (CLI) commands provide a keyboard alternative to most actions otherwise performed using the Job Manager or Administrator interfaces.
All commands in this reference except clusrun follow this general syntax:
<command> <operator>[options]
—Or—
<command> <operator>[options] <command_line>[arguments]
There are five commands in the CLI:
job
task
node
cluscfg
clusrun
Each command (except clusrun) has its own set of operators. It is the combination of a command and an operator that constitutes a CLI executable. For example, job new creates a new job.
Each operator has a set of options. Options are all preceded by a forward slash and take the form /<option>:<value>. For example, job new /jobname:my_job creates a job named my_job.
The command line parameter is the command line of a task. Arguments are the arguments associated with that command line.
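As an illustration of the general syntax above, the following Python sketch assembles a command line from a command, an operator, and /<option>:<value> pairs. The helper name build_cli is hypothetical and is not part of Compute Cluster Server; it only formats text and does not run anything.

```python
def build_cli(command, operator, options=None, command_line=None):
    """Format '<command> <operator> /<option>:<value> ... [<command_line>]'.

    build_cli is a hypothetical helper used only for illustration.
    """
    parts = [command, operator]
    # Each option is preceded by a forward slash and written as /<option>:<value>.
    for name, value in (options or {}).items():
        parts.append(f"/{name}:{value}")
    # The optional trailing part is the task command line with its arguments.
    if command_line:
        parts.append(command_line)
    return " ".join(parts)


# The example from the text: create a job named my_job.
print(build_cli("job", "new", {"jobname": "my_job"}))
```

Operators that take a positional job ID (for example, job add) place it differently, so this sketch covers only the general form.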
To obtain help for a CLI command, type:
<command> /? or <command> /help
To obtain help for a CLI command operator, type:
<command> <operator> /? or <command> <operator> /help
Syntax conventions
This document uses the following special syntax conventions:
/scheduler:<host> is a universal option and is used to specify a host other than the local host.
/jobfile:<template_file> is the XML file in which the specifics of a job and its tasks are stored.
/taskfile:<template_file> is the equivalent of /jobfile:<template_file> when the template file is being used as a task property.
standard_job_options are one or more of a set of options applicable to a job. These are defined once for job new and afterward are referenced.
standard_task_options are a set of options applicable to a task. These are defined once for job add and afterward are referenced.
task_options_subset is the subset of standard_task_options that can be used with the job submit command.
credential_options is the option pair /user:<domain\user> /password:<password> and applies when the job is run under a user other than the invoking user.
jobId is the system-assigned identification number for a job.
jobId.taskID is the system-assigned identification number for a task.
Job command
The job command is used to create, submit, view, and manage jobs. The job command operators are:
add
cancel
list
listtasks
modify
new
requeue
submit
view
job add
Adds a task to the task queue for a specified job and returns a unique task ID. Tasks can be added only to jobs in the Not_Submitted, Queued or Running state.
SYNOPSIS
job add <JobID> [standard_task_options] [/scheduler:<host>] <command> [arguments]
Standard Task Options
Option | Description | Maximum Characters |
---|---|---|
/name:<task_name> | Name of the task. | 80 |
/numprocessors:<min_processors> or <min_processors>, <max_processors> | Minimum and maximum number of processors to be allocated. The default is one processor. | N/A |
/rerunnable: true \| false | A flag indicating that a task can be rerun after a failure. Default is true. The scheduler allows a failed job to be requeued if the failure is due to any error that can be fixed without changing the task command line. If the job or task fails for reasons of system failure (for example, a node crashes), the scheduler requeues the job automatically. Only incomplete tasks are rerun. | N/A |
/requirednodes:<node1>,<node2>,…<nodeN> | Specifies by name the nodes to be allocated to the task. /requirednodes overrides /numprocessors and also forces the job to reserve the nodes that are specified. | 2080 |
/env:<name1>=<val1> /env:<name2>=<val2> … /env:<nameN>=<valN> | Specifies the environment variables for the task. (For more information about environment variables, see Use Environment Variables.) | 2048 |
/exclusive: true \| false | A flag indicating that the task has exclusive use of reserved nodes. | N/A |
/runtime:[[[days:<num>]hours:<num>]minutes:<num> \| infinite] | Maximum run time in day-hour-minute format. The task will be cancelled rather than allowed to run past the maximum run time. Default is Infinite. | 8 |
/workdir:<path> | The full path of the working directory (the directory for input, output, and error files). The path may contain environment variables. Default is %USERPROFILE%. | 160 |
/stdin:<file_name> | Take standard input for the task from the file <file_name>. | 160 |
/stdout:<file_name> | Redirect standard output of the task to the file <file_name>. | 160 |
/stderr:<file_name> | Redirect standard error of the task to the file <file_name>. | 160 |
/depend:<task_name> | Specifies that this task depends on a task or tasks of the name <task_name>. (Multiple tasks of the same name will have different task IDs.) If the task depends on multiple tasks of different names, the job add command needs to be repeated, for example: job add 21 /name:task3 /depend:task1 myapp3.exe followed by job add 21 /name:task3 /depend:task2 myapp3.exe | 320 |
An additional task option is:
Option | Description |
---|---|
/taskfile:<template_file> | Overwrites the contents of the task with the values in this template file, except where a different value is explicitly set in the command line. |
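The /depend description above says that job add is repeated once per dependency name. A minimal Python sketch of that pattern follows; job_add_commands is a hypothetical helper that only produces the command text, and the job ID, task names, and executable are the ones used in the /depend example.

```python
def job_add_commands(job_id, task_name, depends_on, command_line):
    """Return one 'job add' line per dependency name.

    Hypothetical helper: it mirrors the repeated-command pattern described
    for /depend and does not contact a scheduler.
    """
    return [
        f"job add {job_id} /name:{task_name} /depend:{dep} {command_line}"
        for dep in depends_on
    ]


# task3 depends on task1 and task2, so two 'job add' lines are produced.
for line in job_add_commands(21, "task3", ["task1", "task2"], "myapp3.exe"):
    print(line)
```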
job cancel
Terminates the running job and cancels all of its resource reservations.
SYNOPSIS
job cancel [options] [/scheduler:<host>] <jobID>
Option | Description |
---|---|
/message:<msg_string> | Msg_string is an optional user-written log entry for the cancellation. The default log entry is "cancelled by <invoking_user>". Messages containing white spaces should be entered in double quotes. |
job list
Lists all jobs in the cluster. The output contains a table for each job in the following form:
Job ID | Description |
---|---|
USER | Submission user |
NAME | User-specified name of job |
STATUS | Not_Submitted, Queued, Running, Cancelled, Finished or Failed |
PRIORITY | Highest, AboveNormal, Normal, BelowNormal, Lowest |
By default, if the invoking user is an administrator, then all jobs are listed. If the invoking user is a user without administrative rights, only his or her jobs are listed.
By default, only active (queued or running) jobs are displayed.
SYNOPSIS
job list [options] [/scheduler:<host>]
Options
Option | Description |
---|---|
/user:[<user_name> \| *] | Show only jobs of the user <user_name>. If ‘*’ is specified, all users' jobs are displayed. |
/status:[<stat1,stat2,…,statN> \| *] | Display the jobs of each status specified (Not_Submitted, Queued, Running, Cancelled, Finished, or Failed). If ‘*’ is specified, both active and completed jobs are displayed. If completed jobs are displayed, one more column, Complete_Time, is displayed. |
/all | List all the jobs in the system. This is the equivalent of "job list /user:* /status:*". |
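The default filtering described for job list can be modeled in a few lines of Python. This is only a sketch of the rules as stated above, not the scheduler's implementation; visible_jobs and the job dictionaries are hypothetical.

```python
ACTIVE = {"Queued", "Running"}


def visible_jobs(jobs, invoking_user, is_admin, user=None, status=None):
    """Model of the 'job list' defaults (hypothetical helper).

    user:   None = default, '*' = all users, otherwise a user name.
    status: None = default (active jobs), '*' = all, otherwise a set of names.
    """
    if user is None:
        # Administrators see all jobs by default; other users see their own.
        user = "*" if is_admin else invoking_user
    wanted = ACTIVE if status is None else status
    return [
        j for j in jobs
        if (user == "*" or j["user"] == user)
        and (wanted == "*" or j["status"] in wanted)
    ]


jobs = [
    {"id": 1, "user": "alice", "status": "Running"},
    {"id": 2, "user": "bob", "status": "Queued"},
    {"id": 3, "user": "alice", "status": "Finished"},
]
# A non-administrator sees only their own active jobs by default.
print([j["id"] for j in visible_jobs(jobs, "alice", is_admin=False)])
```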
job listtasks
Lists the tasks of the job <jobID>. Each task is displayed with the following fields:
Task ID | Description |
---|---|
Status | Status of the task. |
Name | Name of the task. |
Command line | Command line of the task. |
Number of processors | Minimum and maximum number of processors. |
Execution nodes | Compute nodes the task runs on. If the task is not running, this value is blank. |
SYNOPSIS
job listtasks [/scheduler:<host>] <jobId>
job modify
Modifies a queued or running job. For jobs in the Queued state, all modified terms take effect immediately. For jobs in the Running state, only changes to the following options take effect:
/runtime:
/rununtilcanceled:
/projectname:
The following modified terms will not take effect unless and until the job is requeued:
/jobname:
/numprocessors:
/askednodes:
/priority:
/license:
/exclusive
If /numprocessors: is specified at both the job and task level, the values for the job must be a superset of the values for the tasks.
If /runtime is specified at both the job and task level, the job run time must be equal to or greater than the longest task run time.
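The run time rule above can be checked before a job is modified. The following Python sketch covers only the /runtime constraint; job_runtime_ok is a hypothetical helper, run times are expressed in minutes as an assumption for the example, and math.inf stands in for Infinite.

```python
import math


def job_runtime_ok(job_runtime, task_runtimes):
    """Return True if the job run time is at least the longest task run time.

    Hypothetical helper. Run times are in minutes; math.inf represents
    'infinite'. A job with no tasks trivially satisfies the rule.
    """
    return job_runtime >= max(task_runtimes, default=0)


print(job_runtime_ok(120, [30, 90]))   # job limit covers the longest task
print(job_runtime_ok(60, [30, 90]))    # a 90-minute task exceeds the job limit
print(job_runtime_ok(math.inf, [math.inf]))
```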
SYNOPSIS
job modify [/jobfile:<template_file>] [credential_options] [/scheduler:<host>] <jobId>
job modify [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>
job modify [/jobfile:<template_file>] [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>
Option | Description |
---|---|
Standard job options | For job options, see job new. |
Credential options | This is typically used when there is a need to update the user’s password for the Job Scheduler. |
/jobfile:<template_file> | Overwrites the contents of the job with the values in this job template file, except where a different value is explicitly set in the command line. |
Examples:
job modify [/jobfile:<template_file>] [credential_options] [/scheduler:<host>] <jobId>
This command overwrites the content of an existing job jobId with the job options specified in the template file. The task options specified in the template file are ignored.
job modify [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>
This command updates the job jobId with the job options specified.
job modify [/jobfile:<template_file>] [standard_job_options] [credential_options] [/scheduler:<host>] <jobId>
This command overwrites the content of an existing job jobId with the job options specified in the template file, then updates the job options values with those made explicit on the command line.
Note
Modifying the run time for a backfill job (one that has jumped the queue to take advantage of idle reserved nodes) is not permitted, because doing so could delay the reserving job.
job new
Creates a new job and returns a unique job ID. The job is created in the Not_Submitted state and contains no tasks.
SYNOPSIS
job new [standard_job_options] [/jobfile:<template_file>] [/scheduler:<host>]
Option | Description |
---|---|
/jobfile:<template_file> | Use the settings in this job template XML file, except where a different value is explicitly set in the command line. |
Standard job options
Option | Description | Maximum Characters |
---|---|---|
/jobname:<job_name> | Name of the job. | 80 |
/numprocessors:<min_processors> or <min_processors>-<max_processors> | Minimum and maximum number of processors to be allocated. The default is one processor. | N/A |
/askednodes:<node1>,<node2>,…<nodeN> | Specifies nodes to be allocated to the job by name. By default, all nodes in the cluster are candidates. | 2080 |
/exclusive: true \| false | By default, a job has exclusive use of nodes reserved by it. If /exclusive: is set to false, idle, reserved processors on these reserved nodes are available to other jobs. This is reciprocal, making nodes reserved to other jobs available to this job if they have also been flagged as nonexclusive. | N/A |
/priority:<priority_class> | Schedule priority class: Highest, AboveNormal, Normal, BelowNormal, or Lowest. Highest and AboveNormal are available only to administrative users. The default is Normal. Within a priority class, the job is placed in the job queue in the order received unless requeued. If requeued, the job always goes to the top of its priority class. | N/A |
/runtime:[[[days:<num>]hours:<num>]minutes:<num> \| infinite] | Maximum run time in day-hour-minute format. The job will be cancelled rather than allowed to run past the maximum run time. Default is Infinite. | 8 |
/rununtilcanceled: true \| false | Flag indicating that the job will hold its resources until it is cancelled or reaches its run time limit. This way, additional tasks can be run on the nodes. | N/A |
/projectname:<project_name> | Name of a project, if any, to which the job belongs. | 80 |
/license:<feature1>:<amt1> /license:<feature2>:<amt2> …/license:<featureN>:<amtN> | License features required to run the tasks, and the number of tokens of each. | 160 |
job requeue
Requeues the job specified by jobId. To requeue a job is to stop it and reinsert it as the topmost job in its priority class segment. The job retains its original submission time, not the requeue time. Requeuing can be performed on running, canceled, and, in some cases, failed jobs. Only unfinished tasks are rerun.
By default, a failed job is requeued automatically if the failure is due to a system failure, such as a node reboot. If automatic requeue is not desired, set the task property /rerunnable to false.
A failed job can also be requeued manually if the failure is due to any error that can be fixed without changing the task command line. For example, a task may call for an input file that is missing or contains errors. Such jobs are not requeued automatically, because the error must first be corrected.
SYNOPSIS
job requeue [/scheduler:<host>] <jobId>
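The queue placement described for requeued jobs can be pictured as a sort: priority class first, then requeued jobs ahead of others in the same class, then order received. The Python sketch below is an assumption-laden model, not the scheduler's algorithm; queue_order and the job fields are hypothetical, and the relative order of several requeued jobs in one class is not stated in this reference (the sketch falls back to order received).

```python
PRIORITY_ORDER = ["Highest", "AboveNormal", "Normal", "BelowNormal", "Lowest"]


def queue_order(jobs):
    """Sort jobs by priority class, requeued-first, then order received.

    Hypothetical model: 'received' is a sequence number assigned at
    submission, 'requeued' is True for jobs that were requeued.
    """
    return sorted(
        jobs,
        key=lambda j: (
            PRIORITY_ORDER.index(j["priority"]),  # higher classes first
            not j["requeued"],                    # requeued jobs go to the top of the class
            j["received"],                        # otherwise first come, first served
        ),
    )


jobs = [
    {"id": 1, "priority": "Normal", "received": 1, "requeued": False},
    {"id": 2, "priority": "Normal", "received": 2, "requeued": True},
    {"id": 3, "priority": "Highest", "received": 3, "requeued": False},
]
print([j["id"] for j in queue_order(jobs)])
```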
job submit
Submits a new or existing job to the queue.
SYNOPSIS
job submit /id:<jobID> [credential_options] [/scheduler:<host>]
job submit /jobfile:<template_file> [credential_options] [/scheduler:<host>]
job submit [standard_job_options] [task_options_subset] [credential_options] [/scheduler:<host>] <command> [arguments]
Options | Description |
---|---|
Standard job options | See job options for job new. Option /numprocessors: will apply to both job and task. |
Task options subset | See standard task options for job add. Only this subset applies: /name: /rerunnable /workdir: /stdin: /stdout: /stderr: |
Examples:
job submit /id:<jobID> [credential_options] [/scheduler:<host>]
This command submits a job created by the job new command by jobID. Only the creator of the job can submit the job.
job submit /jobfile:<template_file> [credential_options] [/scheduler:<host>]
This command submits a job based on a job template file. It returns jobID and taskID.
job submit [standard_job_options] [task_options_subset] [credential_options] [/scheduler:<host>] <command> [arguments]
This command creates and submits a job. It returns jobID and taskID.
job submit "cmd.exe /k myapp.exe 1> %my_resultdir%\myapp_%CCP_JOBID%.out 2> %my_resultdir%\myapp_%CCP_JOBID%.out"
This command submits the command line of a user executable as a job.
job view
Displays the details of a specified job.
SYNOPSIS
job view [/scheduler:<host>] <JobID>
Display the details of the specified job, including:
Term | Description |
---|---|
Job ID | Job ID. |
Status | Not_Submitted, Queued, Running, Cancelled, Finished, or Failed. |
Name | Job name specified by the user. |
Submitted by | Cluster user that submitted the job. |
Number of processors | Minimum and maximum number of processors. |
Allocated Nodes | Execution nodes. |
Submit time | Submission time in date-hour-minute format. |
Start time | Time job started, in date-hour-minute format. |
End time | Time job ended, in date-hour-minute format. |
Number of Tasks | Number of tasks. |
Notsubmitted | Number of tasks not yet submitted to the cluster nodes. |
Queued | Number of queued tasks. |
Running | Number of running tasks. |
Finished | Number of finished tasks. |
Failed | Number of failed tasks. |
Cancelled | Number of cancelled tasks. |
Task command
The task command is used to view, cancel, and requeue tasks. The task command operators are:
cancel
requeue
view
task cancel
Terminates the running task and cancels all of its resource reservations.
SYNOPSIS
task cancel [<options>] <jobID.taskID>
Option | Description |
---|---|
/message:<msg_string> | Msg_string is an optional user-written log entry for the cancellation. The default log entry is "cancelled by <invoking_user>". Messages containing white spaces should be entered in double quotes. |
task requeue
Requeues the task specified by jobId.taskID, stopping it and reinserting it as the next task in the queue.
SYNOPSIS
task requeue [/scheduler:<host>] <jobId.taskId>
task view
Displays the details of a task in the following form:
Term | Description |
---|---|
Task ID | Task ID. |
Status | Status of the task (for example, Finished). |
Name | Task name. |
Command line | Task command line. |
Allocated nodes | Execution node list. |
Exit code | Exit code of the task: 0 = task finished; any other exit code = task failed. |
Submit time | Time the task was submitted. |
Start time | Time the task started. |
End time | Time the task ended. |
Kernel time | Kernel mode CPU time used by all processes since the start of the task. |
User time | User mode CPU time used by all processes since the start of the task. |
Working set | Current total working set size of all processes in the task. |
The output also includes a message: for a failed task, the error message; for a cancelled task, the cancellation message provided by the user (the default is "cancelled by <invoking user>"). It also reports values for the current usage of the task, including, for each node, the processes created on the node.
SYNOPSIS
task view [/scheduler:<host>] <jobId.taskId> | <jobId>
EXAMPLE:
task view 101.0
Displays the details of the first task of job 101.
Node command
The node command allows you to add, remove, and manage nodes. The node operators are:
approve
list
pause
resume
node approve
Approve an added node. After a node has been added, that node is in a pending state, awaiting approval by an administrative user.
SYNOPSIS
node approve [/scheduler:<host>] <node_name>
node list
Lists the nodes and the status and statistics for each. The output is a table with each row containing the following fields:
Term | Description |
---|---|
NODE_NAME | Name of the node. |
STATUS | Pending, Ready, Paused, Unreachable. |
MAX | Maximum number of job slots available. |
RUN | Number of job slots used by running jobs on this node. |
IDLE | Number of idle job slots available on this node. |
SYNOPSIS
node list [/scheduler:<host>]
node pause | resume
Pause a node or resume the activity of a node that is paused. When a node is paused, jobs running on the node continue to run but no new jobs from users without administrative rights will be started. New administrator jobs will be accepted.
SYNOPSIS
node [pause |resume] [/scheduler:<host>] {node_name} [/all]
Option | Description |
---|---|
/all | Pause or resume all nodes. |
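The effect of pausing a node on new jobs can be summarized as a small decision function. This Python sketch models only the rule stated above for Ready and Paused nodes; node_accepts_new_job is hypothetical, and treating Pending and Unreachable nodes as never accepting jobs is an assumption not spelled out in this reference.

```python
def node_accepts_new_job(node_status, submitter_is_admin):
    """Model of whether a node starts a new job (hypothetical helper)."""
    if node_status == "Ready":
        return True
    if node_status == "Paused":
        # Paused nodes start new jobs only for administrators;
        # jobs already running on the node continue to run.
        return submitter_is_admin
    # Assumption: Pending and Unreachable nodes start no new jobs.
    return False


print(node_accepts_new_job("Paused", submitter_is_admin=False))
print(node_accepts_new_job("Paused", submitter_is_admin=True))
```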
Cluscfg command
The cluscfg command allows monitoring and manipulation of the queue. cluscfg operators include:
delcreds
listenvs
listparams
setcreds
setenvs
setparams
view
cluscfg delcreds
Deletes the cached credential of the named user from the invoking user’s cache. If /user is not supplied, the invoking user is assumed.
SYNOPSIS
cluscfg delcreds [/user:<DOMAIN>\<user>] [/scheduler:<host>]
cluscfg listenvs
Lists the cluster-wide environment variables of the cluster.
SYNOPSIS
cluscfg listenvs [/scheduler:<host>]
cluscfg listparams
Returns the following cluster parameters, which are stored in HKLM\System\CurrentControlSet\Services\CCPSchedSvc\Enum.
Parameter | Description | Default Value |
---|---|---|
ActivationFilterProgram | Activation filter executable file name. | "" |
ActivationFilterTimeout | Activation filter program time-out. | 15 seconds |
BackFillLookahead | Specification of backfill behavior or number of jobs the scheduler searches to find jobs that can backfill the jobs at the top of the job queue. | <0 = search through the entire job queue (default); 0 = no backfill; >0 = number of jobs to search |
EventLogLevel | Sets the level of Job Scheduler events that appear in the Event Viewer. Levels are: ActivityTracing (Stop, Start, Suspend, Transfer, and Resume events are displayed); All (all events are displayed); Critical (Critical events are displayed); Error (Critical and Error events are displayed); Information (Critical, Error, Warning, and Information events are displayed); Off (no events are displayed); Verbose (Critical, Error, Warning, Information, and Verbose events are displayed); Warning (Critical, Error, and Warning events are displayed). For more information about event levels, see TraceEventType Enumeration (https://go.microsoft.com/fwlink/?LinkId=60988). | Error |
HeartbeatInterval | Interval at which the scheduler sends health probes to the Node Manager. | 60 seconds |
InactivityCount | Number of missing beats (no reply from the health probes) before the Job Scheduler declares the node Unreachable. | 3 |
JobRetryCount | Maximum number of times the system reruns a job. | 3 |
JobRuntime | Format: <dd>:<hh>:<mm>. | Infinite |
SpoolDir | The directory where the output of the clusrun command is redirected. | \\<head_node>\spooldir |
SubmissionFilterProgram | Submission filter executable file name. | "" |
SubmissionFilterTimeout | Time-out value (in seconds) for the submission filter. | 15 seconds |
TaskRetryCount | Maximum number of times the system reruns a task. | 3 |
TTLCompletedJobs | Time in days for completed job records to remain in the MSDE. | 5 days |
SYNOPSIS
cluscfg listparams [/scheduler:<host>]
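Two of the parameters above lend themselves to a quick worked example. The Python sketch below interprets BackFillLookahead as described in the table and estimates how long a node can stay silent before it is declared Unreachable. Both function names are hypothetical, and the time estimate assumes evenly spaced probes, so it is an approximation rather than documented behavior.

```python
def backfill_search_depth(lookahead, queue_length):
    """Number of queued jobs searched for backfill candidates.

    <0 searches the entire queue, 0 disables backfill, >0 limits the search.
    """
    if lookahead < 0:
        return queue_length
    if lookahead == 0:
        return 0
    return min(lookahead, queue_length)


def seconds_until_unreachable(heartbeat_interval=60, inactivity_count=3):
    """Rough silence period before a node is declared Unreachable.

    Assumption: probes are sent every heartbeat_interval seconds and the
    node is declared Unreachable after inactivity_count missed replies.
    """
    return heartbeat_interval * inactivity_count


print(backfill_search_depth(-1, 25))   # entire queue
print(backfill_search_depth(10, 25))   # limited search
print(seconds_until_unreachable())     # with the default values from the table
```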
cluscfg setcreds
Sets the credential of a named user into the credential cache of the invoking user. If /user is not supplied, the invoking user is assumed. If /password is not provided, the Stored User Names and Passwords UI prompts for it.
SYNOPSIS
cluscfg setcreds [/user:<DOMAIN>\<user>] [/password:<password>] [/scheduler:<host>]
cluscfg setenvs
Sets cluster-wide environment variables to the specified values.
Cluster-wide environment variables are variables that apply to the entire cluster. Cluster-wide environment variables can be viewed or set using the cluscfg command. To add or set a cluster-wide environment variable, you must have administrative credentials. For more information, see Compute Cluster Server Command Line Interface Reference (https://go.microsoft.com/fwlink/?LinkID=64065).
The following preexisting cluster-wide environment variables are set during system deployment and are rarely changed manually or used in commands:
Environment Variable | Description |
---|---|
CCP_CLUSTER_NAME | Name of the cluster. |
CCP_MPI_NETMASK | Subnet mask for the interface to be used by the MPI process, if a separate MPI network exists. Example: CCP_MPI_NETMASK=172.30.0.0/255.255.0.0. |
MPICH_SOCKET_SBUFFER_SIZE | Send buffer size for the socket and shared memory channel (CH3) used by MS MPI. Default size is 32*1024=32768 bytes. |
The most common example of an added cluster-wide environment variable is Path. Path functions identically to the Windows Path environment variable, but applies to all nodes in the cluster and only in the context of a job task.
SYNOPSIS
cluscfg setenvs "<name1=value1>" "<name2=value2>"… "<nameN=valueN>" [/scheduler:<host>]
To unset an environment variable, use an empty string as the value. Example:
cluscfg setenvs "MY_VAR="
This unsets the environment variable MY_VAR.
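The quoting pattern in the synopsis, including the empty value used to unset a variable, can be generated as follows. This Python sketch only builds the command text; setenvs_command is a hypothetical helper, and APP_HOME and its value are made-up examples.

```python
def setenvs_command(variables, scheduler=None):
    """Build a 'cluscfg setenvs' command line (hypothetical helper).

    variables maps names to values; an empty string or None unsets the variable.
    """
    parts = ["cluscfg", "setenvs"]
    for name, value in variables.items():
        # Each name=value pair is wrapped in double quotes, as in the synopsis.
        parts.append(f'"{name}={value or ""}"')
    if scheduler:
        parts.append(f"/scheduler:{scheduler}")
    return " ".join(parts)


print(setenvs_command({"MY_VAR": ""}))                       # unset MY_VAR
print(setenvs_command({"APP_HOME": r"C:\apps", "MY_VAR": None}))
```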
cluscfg setparams
Sets the named parameters to the values specified. Refer to cluscfg listparams for parameter definitions.
SYNOPSIS
cluscfg setparams [TTLCompletedJobs=val] [JobRetryCount=val] [TaskRetryCount=val] [JobRuntime=val|Infinite] [SubmissionFilterProgram=val] [SubmissionFilterTimeout=val] [ActivationFilterProgram=val] [ActivationFilterTimeout=val] [BackFillLookahead=val] [HeartbeatInterval=val] [InactivityCount=val] [SpoolDir=val] [eventloglevel=off|critical|warning|error|information|verbose|activitytracing|all]
[/scheduler:<host>]
cluscfg view
Displays the details of a cluster. The output contains:
Term | Description |
---|---|
Cluster name | Name of the cluster. |
Total number of compute nodes | Number of nodes in cluster. |
Number of ready compute nodes | Number of nodes with Ready status. |
Number of paused compute nodes | Number of nodes with Paused status. |
Number of unreachable compute nodes | Number of nodes with Unreachable status. |
Number of compute nodes pending for approval | Number of nodes with Pending for Approval status. |
Total number of processors | Number of processors in the cluster. |
Number of idle processors | Number of processors not running tasks. |
Number of busy processors | Number of processors running tasks. |
Number of jobs not submitted | Number of pending jobs not submitted to the Job Scheduler. |
Number of queued jobs | Number of jobs in the CCS job queue. |
Number of jobs running | Number of jobs from the queue that are running. |
Number of finished jobs | Number of jobs that completed successfully. |
Number of failed jobs | Number of jobs that have failed. |
Number of cancelled jobs | Number of jobs that have been cancelled. |
SYNOPSIS
cluscfg view [/scheduler:<host>]
Clusrun command
clusrun is an administrative command that runs an instance of a specified command on multiple nodes, redirecting output to the client node. The client node can be a head node or any compute node on the cluster, accessed directly or remotely. Redirected output includes the standard output and error streams as well as run time system error messages. The output from each node is delimited by a header indicating the node.
If clusrun is interrupted or terminated, the remote command instances are also terminated.
clusrun requires administrative rights.
Running MPI applications through clusrun is not supported.
SYNOPSIS
clusrun [/scheduler:host] [credential_options]
[/nodes:node1,node2…nodeN]
[/all] [/pausednodes] [/oknodes]
[/stdin:file] [/workdir:dir] [/env:name1=val1] [/env:name2=val2] command [arguments]
Options
Option | Description |
---|---|
/nodes:[<node1>[,<node2>…]] | Specify a list of nodes on which the command is invoked. Default is all Ready and Paused nodes. |
/all | Run command on all Ready and Paused nodes. This is the default. |
/oknodes | Run command on all Ready nodes. Default is all Ready and Paused nodes. |
/pausednodes | Run command on all Paused nodes. Default is all Ready and Paused nodes. |
/stdin:<file> | Take standard input for all command instances from the file <file>. |
/workdir:<dir> | Working directory for input, output, and error files. Default is %USERPROFILE%. |
/env:<name1>=<val1> /env:<name2>=<val2> … /env:<nameN>=<valN> | Specify the environment variables for the task. For more information, see Use Environment Variables. Environment variable expansion on the compute node side is not supported. For example, /env:myvar=^%XYZ_HOME^% will NOT cause %XYZ_HOME% to be expanded on the remote node. |
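To show how the clusrun options combine, the following Python sketch formats a command line from a node list, a working directory, and environment variables. clusrun_command is a hypothetical helper that only builds text, and the node names and values are made-up examples.

```python
def clusrun_command(command, nodes=None, workdir=None, env=None):
    """Build a 'clusrun' command line (hypothetical helper).

    With no node option, the documented default applies: the command runs
    on all Ready and Paused nodes.
    """
    parts = ["clusrun"]
    if nodes:
        parts.append("/nodes:" + ",".join(nodes))
    if workdir:
        parts.append(f"/workdir:{workdir}")
    # One /env:<name>=<value> option per variable; values are passed as
    # literal text because they are not expanded on the compute nodes.
    for name, value in (env or {}).items():
        parts.append(f"/env:{name}={value}")
    parts.append(command)
    return " ".join(parts)


print(clusrun_command("hostname", nodes=["node1", "node2"]))
print(clusrun_command("myapp.exe", env={"MODE": "test"}))
```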