Jobs and Tasks

A task represents the execution of one or more processes on a compute node. A job is a collection of tasks that together perform a computation. A job also reserves the resources that its tasks require.

To create a job, use the ICluster::CreateJob or ICluster::CreateJobFromXml method. To create a task, use the ICluster::CreateTask, ICluster::CreateTaskFromXml or ICluster::CreateTaskFromXmlFile method. To add a child task to a job, use the ICluster::AddTask method.

Compute jobs

The following diagram shows the job life cycle.

[Diagram: Job life cycle]

The ICluster::AddJob method adds the job to the cluster, but the job state remains JobStatus_NotSubmitted. While the job is in this state, you can add tasks to it and then submit it by calling the ICluster::SubmitJob method. If the job is successfully submitted, its job state changes to JobStatus_Queued. As an alternative to calling both AddJob and SubmitJob, you can call the ICluster::QueueJob method.

When resources are allocated to a job, the job state changes to JobStatus_Running. When a job finishes, fails, or is canceled, its allocated resources are released and its job state changes to JobStatus_Finished, JobStatus_Failed, or JobStatus_Cancelled, respectively.

To return a failed job to the queue so that it can run again, call the ICluster::RequeueJob method. Failed, canceled, and finished jobs stay in the cluster until the time-to-live period set by the TTLCompletedJobs value expires.
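
The job life cycle described above can be read as a small state machine. The following is a minimal sketch of that reading, not code that calls the cluster API: the job_model namespace, the JobState and JobEvent enumerations, and the next, is_completed, and failed_then_requeued functions are all hypothetical names introduced for illustration. The sketch models only the transitions that this topic states explicitly; the scheduler may permit others (for example, canceling a queued job).

```cpp
#include <cassert>
#include <stdexcept>

namespace job_model {

// Illustrative states; the names echo the JobStatus_* values in the text
// but this enum is NOT the API's own enumeration.
enum class JobState { NotSubmitted, Queued, Running, Finished, Failed, Cancelled };

// Hypothetical events that drive the life cycle described in the text.
enum class JobEvent { Submit, Allocate, Finish, Fail, Cancel, Requeue };

// Returns the next state, or throws if the text describes no such transition.
inline JobState next(JobState s, JobEvent e) {
    switch (s) {
    case JobState::NotSubmitted:            // state after AddJob
        if (e == JobEvent::Submit) return JobState::Queued;
        break;
    case JobState::Queued:                  // waiting for resources
        if (e == JobEvent::Allocate) return JobState::Running;
        break;
    case JobState::Running:                 // resources are held
        if (e == JobEvent::Finish) return JobState::Finished;
        if (e == JobEvent::Fail)   return JobState::Failed;
        if (e == JobEvent::Cancel) return JobState::Cancelled;
        break;
    case JobState::Failed:                  // only failed jobs are requeued here
        if (e == JobEvent::Requeue) return JobState::Queued;
        break;
    case JobState::Finished:
    case JobState::Cancelled:
        break;
    }
    throw std::logic_error("transition not described in the text");
}

// True for the three states in which the job's resources have been released.
inline bool is_completed(JobState s) {
    return s == JobState::Finished || s == JobState::Failed ||
           s == JobState::Cancelled;
}

// Walks one job through a failure and a requeue; returns its final state.
inline JobState failed_then_requeued() {
    JobState s = JobState::NotSubmitted;    // corresponds to AddJob
    s = next(s, JobEvent::Submit);          // SubmitJob (QueueJob does both steps)
    assert(s == JobState::Queued);
    s = next(s, JobEvent::Allocate);
    s = next(s, JobEvent::Fail);
    assert(is_completed(s));
    s = next(s, JobEvent::Requeue);         // corresponds to RequeueJob
    s = next(s, JobEvent::Allocate);
    return next(s, JobEvent::Finish);
}

}  // namespace job_model
```

In this model, calling job_model::failed_then_requeued() takes a job through NotSubmitted, Queued, Running, Failed, Queued, Running, and Finished in turn. Rejecting undescribed transitions with an exception is a design choice of the sketch, made so that gaps in the model are visible rather than silently ignored.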

Compute tasks

The following diagram shows the task life cycle.

[Diagram: Task life cycle]

All tasks that are associated with a job are queued or submitted when the parent job is queued or submitted. When resources are allocated to a task, its task state changes to TaskStatus_Running. When a task finishes, fails, or is canceled, its allocated resources are released and its task state changes to TaskStatus_Finished, TaskStatus_Failed, or TaskStatus_Cancelled, respectively.
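
The relationship between a job and its tasks can be sketched in the same way. The following model is illustrative only and does not use the cluster API: the task_model namespace, the TaskState enumeration, the Job structure, and its member functions are hypothetical. The NotSubmitted and Queued task states are assumptions made by analogy with the job states; this topic names only the running, finished, failed, and canceled task states.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

namespace task_model {

// Illustrative task states. NotSubmitted and Queued are assumed by analogy
// with the job states; the text names only the other four.
enum class TaskState { NotSubmitted, Queued, Running, Finished, Failed, Cancelled };

// Hypothetical stand-in for a job that owns its child tasks.
struct Job {
    std::vector<TaskState> tasks;  // one entry per child task

    // Adds a child task and returns its index; new tasks start unsubmitted.
    std::size_t add_task() {
        tasks.push_back(TaskState::NotSubmitted);
        return tasks.size() - 1;
    }

    // Queuing the parent job queues every child task not yet submitted.
    void queue() {
        for (TaskState& t : tasks)
            if (t == TaskState::NotSubmitted) t = TaskState::Queued;
    }

    // Resources are allocated to a queued task, so it starts running.
    bool start(std::size_t i) {
        if (tasks.at(i) != TaskState::Queued) return false;
        tasks[i] = TaskState::Running;
        return true;
    }

    // A running task ends as finished, failed, or canceled.
    bool end(std::size_t i, TaskState outcome) {
        const bool terminal = outcome == TaskState::Finished ||
                              outcome == TaskState::Failed ||
                              outcome == TaskState::Cancelled;
        if (tasks.at(i) != TaskState::Running || !terminal) return false;
        tasks[i] = outcome;
        return true;
    }

    // Number of tasks currently in state s.
    std::size_t count(TaskState s) const {
        return static_cast<std::size_t>(std::count(tasks.begin(), tasks.end(), s));
    }
};

// Runs a two-task job in which one task finishes and one fails;
// returns the number of finished tasks.
inline std::size_t two_task_demo() {
    Job job;
    const std::size_t a = job.add_task();
    const std::size_t b = job.add_task();
    job.queue();                               // both tasks are queued with the job
    assert(job.count(TaskState::Queued) == 2);
    job.start(a);
    job.start(b);
    job.end(a, TaskState::Finished);
    job.end(b, TaskState::Failed);
    return job.count(TaskState::Finished);
}

}  // namespace task_model
```

The point of the sketch is that tasks have no separate submission step of their own: a task cannot start until the parent job has been queued, which is why start returns false for a task that is still unsubmitted.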
