Jobs
Deployments and Replication Controllers are a great way to ensure that long-running applications stay up and can tolerate a wide array of infrastructure failures. However, there are some use cases they do not address: specifically, short-running, run-once tasks, as well as regularly scheduled tasks. In both cases, we need the tasks to run until completion and then terminate; scheduled tasks must also start again at the next scheduled interval.
To address this type of workload, Kubernetes has added a Batch API, which includes the Job type. This type will create 1 to n pods and ensure that they all run to completion with a successful exit. Based on restartPolicy, we can either allow pods to simply fail without being restarted (restartPolicy: Never) or retry when a pod exits without successful completion (restartPolicy: OnFailure). In this example, we will use the latter technique:
apiVersion: batch/v1
kind: Job
metadata:
  name: long-task
spec:
  template:
    metadata:
      name: long-task
    spec:
      containers...
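To make the "1 to n pods" behavior concrete, here is a minimal sketch of a separate, hypothetical Job that asks for several successful completions. The completions, parallelism, and backoffLimit fields belong to the batch/v1 Job spec. The Job name (parallel-task), the container name, the busybox image, and the command are illustrative assumptions and are not taken from the listing above. For contrast with that listing, this sketch uses restartPolicy: Never.

```yaml
# Illustrative sketch only: the names, image, and command are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-task          # hypothetical Job name
spec:
  completions: 3               # the Job is done once 3 pods have exited successfully
  parallelism: 2               # run at most 2 pods at the same time
  backoffLimit: 4              # mark the Job as failed after 4 retries
  template:
    metadata:
      name: parallel-task
    spec:
      restartPolicy: Never     # a failed pod is not restarted in place
      containers:
      - name: worker           # hypothetical container name
        image: busybox         # assumed image, chosen only for illustration
        command: ["sh", "-c", "echo working; sleep 10"]
```

If this were saved to a file (for instance, a hypothetical parallel-task.yaml), it could be submitted with kubectl create -f parallel-task.yaml and inspected with kubectl describe jobs/parallel-task.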