Jobs and Tasks
A task represents the execution of a single process or multiple processes on a compute node. A collection of tasks that is used to perform a computation is known as a job. Jobs are used to reserve the resources required by tasks.
To create a job, use the ICluster::CreateJob or ICluster::CreateJobFromXml method. To create a task, use the ICluster::CreateTask, ICluster::CreateTaskFromXml, or ICluster::CreateTaskFromXmlFile method. To add a child task to a job, use the ICluster::AddTask method.
The following diagram shows the job life cycle.
The ICluster::AddJob method adds the job to the cluster, but the job state remains JobStatus_NotSubmitted. You can add tasks to the job, and then submit the job using the ICluster::SubmitJob method. If the job is successfully submitted, its job state is JobStatus_Queued. As an alternative to calling both AddJob and SubmitJob, you can call the ICluster::QueueJob method.
When resources are allocated to a job, the job state changes to JobStatus_Running. When a job finishes, fails, or is canceled, its allocated resources are released and its job state changes to JobStatus_Finished, JobStatus_Failed, or JobStatus_Cancelled, respectively.
To return a failed job to the queue, call the ICluster::RequeueJob method. Failed, canceled, and finished jobs stay in the cluster until the period specified by the TTLCompletedJobs value elapses.
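The job life cycle described above can be sketched as a small state machine. The following C++ example is an illustrative model only and does not call the cluster API: the JobState and JobEvent enumerations and the nextJobState and failThenRequeue functions are hypothetical names introduced for this sketch. It assumes that only the transitions named in the text are valid, so for example cancellation is modeled only for a running job.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for the job states named in the text.
// This is not the enumeration defined by the cluster API.
enum class JobState { NotSubmitted, Queued, Running, Finished, Failed, Cancelled };

// Hypothetical events that move a job from one state to the next.
enum class JobEvent { Submit, Allocate, Finish, Fail, Cancel, Requeue };

// Returns the state that follows 's' when 'e' occurs.
// Throws std::logic_error for any transition the text does not describe.
inline JobState nextJobState(JobState s, JobEvent e) {
    switch (e) {
    case JobEvent::Submit:    // SubmitJob, or QueueJob as a single step
        if (s == JobState::NotSubmitted) return JobState::Queued;
        break;
    case JobEvent::Allocate:  // resources are allocated to the job
        if (s == JobState::Queued) return JobState::Running;
        break;
    case JobEvent::Finish:
        if (s == JobState::Running) return JobState::Finished;
        break;
    case JobEvent::Fail:
        if (s == JobState::Running) return JobState::Failed;
        break;
    case JobEvent::Cancel:    // assumption: only a running job is modeled here
        if (s == JobState::Running) return JobState::Cancelled;
        break;
    case JobEvent::Requeue:   // RequeueJob puts a failed job back in the queue
        if (s == JobState::Failed) return JobState::Queued;
        break;
    }
    throw std::logic_error("transition not described in the text");
}

// Walks one job through a failure, a requeue, and a successful second run.
inline JobState failThenRequeue() {
    JobState s = JobState::NotSubmitted;          // state after AddJob
    s = nextJobState(s, JobEvent::Submit);
    assert(s == JobState::Queued);
    s = nextJobState(s, JobEvent::Allocate);
    s = nextJobState(s, JobEvent::Fail);
    assert(s == JobState::Failed);
    s = nextJobState(s, JobEvent::Requeue);
    assert(s == JobState::Queued);
    s = nextJobState(s, JobEvent::Allocate);
    return nextJobState(s, JobEvent::Finish);
}
```

Calling failThenRequeue returns JobState::Finished. Rejecting undescribed transitions with an exception, rather than ignoring them, keeps the sketch limited to what the text states.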
The following diagram shows the task life cycle.
All tasks that are associated with a job are queued or submitted when the parent job is queued or submitted. When resources are allocated to a task, its task state changes to TaskStatus_Running. When a task finishes, fails, or is canceled, its allocated resources are released and its task state changes to TaskStatus_Finished, TaskStatus_Failed, or TaskStatus_Cancelled, respectively.
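The relationship between a job and its tasks can be sketched in the same way. The following C++ example is an illustrative model only and does not call the cluster API: TaskState, ModelJob, and isCompleted are hypothetical names introduced for this sketch, and the NotSubmitted and Queued task states are assumptions that mirror the job states. The example shows that submitting the parent job queues every task, and that a task reaches a completed state only after it has been running.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the task states named in the text.
// NotSubmitted and Queued are assumed here by analogy with the job states.
enum class TaskState { NotSubmitted, Queued, Running, Finished, Failed, Cancelled };

// True for the three states in which a task has released its resources.
inline bool isCompleted(TaskState s) {
    return s == TaskState::Finished || s == TaskState::Failed ||
           s == TaskState::Cancelled;
}

// Hypothetical model of a job that owns the states of its tasks.
struct ModelJob {
    std::vector<TaskState> tasks;  // one entry per task added to the job

    // Adds a task in the NotSubmitted state and returns its index.
    std::size_t addTask() {
        tasks.push_back(TaskState::NotSubmitted);
        return tasks.size() - 1;
    }

    // Submitting the parent job queues every task that belongs to it.
    void submit() {
        for (TaskState& t : tasks) {
            if (t == TaskState::NotSubmitted) t = TaskState::Queued;
        }
    }

    // Allocating resources moves a queued task to Running.
    // Returns false if the task does not exist or is not queued.
    bool allocate(std::size_t i) {
        if (i >= tasks.size() || tasks[i] != TaskState::Queued) return false;
        tasks[i] = TaskState::Running;
        return true;
    }

    // Ends a running task with Finished, Failed, or Cancelled.
    // Returns false if the task is not running or the outcome is not a completed state.
    bool complete(std::size_t i, TaskState outcome) {
        if (i >= tasks.size() || tasks[i] != TaskState::Running) return false;
        if (!isCompleted(outcome)) return false;
        tasks[i] = outcome;
        return true;
    }
};
```

In this model, calling allocate on a task before the job is submitted returns false, because the task is not yet queued.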