Courier jobs
Jobs serve as a way to encapsulate and automate a process. A Courier job is a configuration that represents:
- input options
- the steps involved in the process
- an expression that identifies the nodes where those steps execute
- execution control parameters
Chef Courier enables you to organize and execute jobs while monitoring their progress in real time. You can view a list of running jobs or delve into the details of individual executing steps. If needed, you can cancel pending job executions.
Each job execution is recorded, including the nodes where steps ran and the outcome and duration of each step. You can review and download the output of a job execution.
Schedule a job
Schedule a job by defining the job details in a JSON, YAML, or TOML file and then submitting that file to Chef Courier. The default format is JSON.
For more information on defining a job, see the job template documentation.
Create a job definition in a JSON file:
{
  "name": "a simple job to perform one action",
  "description": "Perform a simple shell command on specific nodes to understand the fundamentals of Courier jobs",
  "scheduleRule": "immediate",
  "exceptionRules": [],
  "target": {
    "executionType": "sequential",
    "groups": [
      {
        "timeoutSeconds": 240,
        "batchSize": {},
        "distributionMethod": "batching",
        "successCriteria": [
          {
            "numRuns": { "type": "percent", "value": 100 },
            "status": "success"
          }
        ],
        "nodeListType": "nodes",
        "nodeIdentifiers": ["--NODE1--"]
      },
      {
        "timeoutSeconds": 120,
        "batchSize": { "type": "number", "value": 1 },
        "distributionMethod": "batching",
        "successCriteria": [
          {
            "numRuns": { "type": "percent", "value": 100 },
            "status": "success"
          }
        ],
        "nodeListType": "nodes",
        "nodeIdentifiers": ["--NODE2--", "--NODE3--"]
      }
    ]
  },
  "actions": {
    "accessMode": "agent",
    "steps": [
      {
        "name": "step to sleep",
        "interpreter": {
          "skill": { "minVersion": "1.0.0" },
          "name": "chef-platform/shell-interpreter"
        },
        "command": {
          "linux": ["sleep 10"],
          "windows": ["timeout 10"]
        },
        "inputs": {},
        "expectedInputs": {},
        "outputFieldRules": {},
        "retryCount": 2,
        "failureBehavior": {
          "action": "retryThenFail",
          "retryBackoffStrategy": {
            "type": "linear",
            "delaySeconds": 1,
            "arguments": [1, 3, 5]
          }
        },
        "limits": {},
        "conditions": []
      }
    ]
  }
}
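If you maintain many node groups, the target portion of a definition like the one above could be generated rather than written by hand. The following Python sketch is only an illustration: the helper names make_group and make_target are hypothetical, and it uses only fields that appear in the sample definition.

```python
import json


def make_group(node_ids, timeout_seconds, batch_size=None):
    """Build one target group (hypothetical helper, fields copied from the sample)."""
    return {
        "timeoutSeconds": timeout_seconds,
        "batchSize": batch_size or {},
        "distributionMethod": "batching",
        "successCriteria": [
            {"numRuns": {"type": "percent", "value": 100}, "status": "success"}
        ],
        "nodeListType": "nodes",
        "nodeIdentifiers": list(node_ids),
    }


def make_target(groups):
    """Wrap the groups in a target that runs them one after another."""
    return {"executionType": "sequential", "groups": groups}


# Same two groups as the sample definition; node IDs remain placeholders.
target = make_target([
    make_group(["--NODE1--"], 240),
    make_group(["--NODE2--", "--NODE3--"], 120, {"type": "number", "value": 1}),
])
target_json = json.dumps(target, indent=2)
print(target_json)
```

The printed object can be pasted in as the "target" value of a job definition file.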
Schedule the job by submitting the job definition to the job scheduler using the scheduler jobs add-job command:
chef-courier-cli scheduler jobs add-job --body-file <FILENAME.json> --profile <COURIER_OPERATOR_PROFILE_NAME>
To submit the job in YAML or TOML, specify the file type with the --body-format option. For example:
chef-courier-cli scheduler jobs add-job --body-file <FILENAME.yaml> --body-format yaml
Chef Courier returns a response similar to:
{
  "item": {
    "exceptionReasons": [],
    "id": "61013744-bd6e-437c-a995-b6211318624e",
    "nextExecutionTime": "2024-03-27T14:57:20.551342549Z"
  }
}
The response includes a job ID (61013744-bd6e-437c-a995-b6211318624e) that you can use to get details about the job.
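When scripting around the CLI, the job ID can be read from the saved response instead of copied by hand. This is a minimal Python sketch that assumes the response has the shape shown in the example; the function name job_id_from_response is hypothetical.

```python
import json


def job_id_from_response(response_text):
    """Return the value of item.id from an add-job response body."""
    return json.loads(response_text)["item"]["id"]


# Sample response copied from the example above.
sample_response = """
{
  "item": {
    "exceptionReasons": [],
    "id": "61013744-bd6e-437c-a995-b6211318624e",
    "nextExecutionTime": "2024-03-27T14:57:20.551342549Z"
  }
}
"""
job_id = job_id_from_response(sample_response)
print(job_id)
```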
Get job instance details
Courier creates a job instance every time a job executes. An immediate job produces only one job instance; a recurring job produces a new job instance for each execution.
You can get the details of each job instance using the job ID.
Get the job instance using a job ID:
chef-courier-cli state instance list-all --job-id <JOB_ID> --profile <COURIER_OPERATOR_PROFILE_NAME>
It returns a response similar to:
{
  "items": [
    {
      "actionSpec": {
        "accessMode": "agent",
        "steps": [
          {
            "command": {
              "linux": ["sleep 10"]
            },
            "conditions": [],
            "description": "",
            "expectedInputs": {},
            "failureBehavior": {
              "action": "retryThenFail",
              "retryBackoffStrategy": {
                "arguments": [1, 3, 5],
                "delaySeconds": 0,
                "type": "none"
              }
            },
            "inputs": {},
            "interpreter": {
              "name": "chef-platform/shell-interpreter",
              "skill": {
                "maxVersion": "",
                "minVersion": "1.0.0"
              }
            },
            "limits": {
              "cores": 0,
              "cpu": 1,
              "timeoutSeconds": 0
            },
            "name": "sleep",
            "outputFieldRules": {},
            "retryCount": 2
          }
        ]
      },
      "createdAt": "2024-07-12T01:10:20.064474Z",
      "id": "befe88f3-ac3f-4bad-9e1a-6017332929ab",
      "jobId": "448ecaf3-5e44-4e7b-9e98-62dfa8c69b1c",
      "lastUpdatedBy": "00000000-0000-0000-0000-000000000000",
      "status": "running",
      "targetSpec": {
        "executionType": "sequential",
        "groups": [
          {
            "batchSize": {
              "type": "number",
              "value": 1
            },
            "distributionMethod": "batching",
            "filter": null,
            "filterId": "00000000-0000-0000-0000-000000000000",
            "listId": "00000000-0000-0000-0000-000000000000",
            "nodeIdentifiers": [
              "1f4c680d-4dc6-4edf-83b6-ded81388f244",
              "020ab866-9d3d-4ca7-919d-c21b4f9ef2a4",
              "925f244c-65b4-4ab9-9ebd-4b17f8e2277b"
            ],
            "nodeListType": "nodes",
            "successCriteria": [
              {
                "numRuns": {
                  "type": "percent",
                  "value": 100
                },
                "status": "success"
              }
            ],
            "timeoutSeconds": 60
          }
        ]
      },
      "updatedAt": "2024-07-12T01:13:20.62717Z"
    }
  ],
  "pagination": {
    "itemsPerPage": 10,
    "nextLink": "",
    "page": 1,
    "pageItemCount": 1,
    "previousLink": "",
    "startIndex": 1,
    "totalItems": 1,
    "totalPages": 1
  }
}
The response contains the job instance ID in items[<ITEM_NUMBER>].id. In this example, the job instance ID is befe88f3-ac3f-4bad-9e1a-6017332929ab.
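For a recurring job the items array can hold several instances, so it may help to pull out the ID and status of each one. The Python sketch below assumes the response shape shown in the example; instance_summaries is a hypothetical helper, and the sample is trimmed to the fields it reads.

```python
import json


def instance_summaries(response_text):
    """Return (instance ID, job ID, status) for every item in the response."""
    data = json.loads(response_text)
    return [(item["id"], item["jobId"], item["status"]) for item in data["items"]]


# Trimmed copy of the example response; only the fields used here are kept.
sample_response = """
{
  "items": [
    {
      "id": "befe88f3-ac3f-4bad-9e1a-6017332929ab",
      "jobId": "448ecaf3-5e44-4e7b-9e98-62dfa8c69b1c",
      "status": "running"
    }
  ]
}
"""
summaries = instance_summaries(sample_response)
for instance_id, job_id, status in summaries:
    print(instance_id, job_id, status)
```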
Get job run details
One job run is created for each target node in a job instance. You can get the details of all job runs using a job instance ID.
Get the details of the job runs of a job instance:
chef-courier-cli state instance list-instance-runs --instanceId <INSTANCE_ID> --profile <COURIER_OPERATOR_PROFILE_NAME>
It returns a response similar to:
{
  "items": [
    {
      "groupNumber": 0,
      "lastUpdateTime": "2024-03-27T15:00:09.185273Z",
      "nodeId": "03ba18c0-68b2-43ba-93fd-b6952443421e",
      "receivedTime": "2024-03-27T14:58:57.226555Z",
      "runId": "496f0d9a-f0be-4cb7-8cbb-2285b5c42045",
      "status": "success"
    },
    {
      "groupNumber": 0,
      "lastUpdateTime": "2024-03-27T14:58:48.582415Z",
      "nodeId": "2768a3b5-f5a6-45ae-a8b6-42d531c6416a",
      "receivedTime": "2024-03-27T14:57:51.753876Z",
      "runId": "306d3a00-d483-4aa2-bb09-1cd8c91c15e8",
      "status": "success"
    },
    {
      "groupNumber": 1,
      "lastUpdateTime": "2024-03-27T14:58:51.090507Z",
      "nodeId": "a8b1f470-fedc-45e8-ba0a-b26dd551c1d0",
      "receivedTime": "2024-03-27T14:57:49.229442Z",
      "runId": "376226b9-f916-42a1-8620-a4636598c5e5",
      "status": "success"
    },
    {
      "groupNumber": 1,
      "lastUpdateTime": "2024-03-27T14:59:58.186538Z",
      "nodeId": "acb38595-64af-4532-8589-2aeb2ad876fc",
      "receivedTime": "2024-03-27T14:58:55.934629Z",
      "runId": "22d7d6a9-33b5-4565-8575-2f2ee24a06a1",
      "status": "success"
    },
    {
      "groupNumber": 1,
      "lastUpdateTime": "2024-03-27T15:01:05.766027Z",
      "nodeId": "e4b1b524-4e77-4448-b1a9-01b80288c898",
      "receivedTime": "2024-03-27T15:00:03.220273Z",
      "runId": "56960929-ca32-463a-bc6e-c0a6cca4f89d",
      "status": "success"
    }
  ],
  "totalItems": 5
}
The response shows five job runs for this job instance, each with a unique job run ID. In this example, the job run IDs are:
496f0d9a-f0be-4cb7-8cbb-2285b5c42045
306d3a00-d483-4aa2-bb09-1cd8c91c15e8
376226b9-f916-42a1-8620-a4636598c5e5
22d7d6a9-33b5-4565-8575-2f2ee24a06a1
56960929-ca32-463a-bc6e-c0a6cca4f89d
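Because each run carries a groupNumber, the run IDs can be grouped by target group and checked for overall success. The following Python sketch assumes the response shape shown above; runs_by_group and all_succeeded are hypothetical helper names, and the sample keeps only the fields they read.

```python
from collections import defaultdict


def runs_by_group(response):
    """Map each groupNumber to the list of run IDs in that group."""
    groups = defaultdict(list)
    for run in response["items"]:
        groups[run["groupNumber"]].append(run["runId"])
    return dict(groups)


def all_succeeded(response):
    """True when every run in the response reports status 'success'."""
    return all(run["status"] == "success" for run in response["items"])


# Trimmed copy of the example response (groupNumber, runId, status only).
sample_response = {
    "items": [
        {"groupNumber": 0, "runId": "496f0d9a-f0be-4cb7-8cbb-2285b5c42045", "status": "success"},
        {"groupNumber": 0, "runId": "306d3a00-d483-4aa2-bb09-1cd8c91c15e8", "status": "success"},
        {"groupNumber": 1, "runId": "376226b9-f916-42a1-8620-a4636598c5e5", "status": "success"},
        {"groupNumber": 1, "runId": "22d7d6a9-33b5-4565-8575-2f2ee24a06a1", "status": "success"},
        {"groupNumber": 1, "runId": "56960929-ca32-463a-bc6e-c0a6cca4f89d", "status": "success"},
    ],
    "totalItems": 5,
}
grouped = runs_by_group(sample_response)
print(len(grouped[0]), len(grouped[1]))  # prints "2 3"
print(all_succeeded(sample_response))    # prints "True"
```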
Get the job run step details
You can get the details of each step of a job run using the job run ID.
Get the step details for a run:
chef-courier-cli state run list-steps --runId <RUN_ID> --profile <COURIER_OPERATOR_PROFILE_NAME>
It returns a response similar to:
{
  "items": [
    {
      "inputs": "",
      "interpreterPath": "/hab/pkgs/chef-platform/shell-interpreter/0.1.3/20240318113204/bin/shell-interpreter",
      "interpreterVersion": "",
      "numAttempts": 1,
      "outputs": "",
      "reason": "",
      "status": "success",
      "stepNumber": 1
    }
  ],
  "totalItems": 1
}
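The numAttempts and status fields make it possible to spot steps that failed or only passed after a retry. This Python sketch assumes the response shape shown above; steps_needing_attention is a hypothetical helper, and the retried step in the second sample is invented purely for illustration.

```python
def steps_needing_attention(response):
    """Return step numbers that did not succeed or needed more than one attempt."""
    return [
        step["stepNumber"]
        for step in response["items"]
        if step["status"] != "success" or step["numAttempts"] > 1
    ]


# Fields copied from the example response.
sample_response = {
    "items": [{"numAttempts": 1, "reason": "", "status": "success", "stepNumber": 1}],
    "totalItems": 1,
}
# Hypothetical variant: the same step succeeding only on its third attempt.
retried_response = {
    "items": [{"numAttempts": 3, "reason": "", "status": "success", "stepNumber": 1}],
    "totalItems": 1,
}
print(steps_needing_attention(sample_response))   # prints "[]"
print(steps_needing_attention(retried_response))  # prints "[1]"
```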
Get previous jobs
Get details of previous jobs:
chef-courier-cli scheduler jobs list-jobs --profile <COURIER_OPERATOR_PROFILE_NAME>
The response is similar to the following:
{
  "items": [
    {
      "actions": {
        "accessMode": "agent",
        "steps": [
          {
            "command": ["sleep 10"],
            "conditions": [],
            "description": "",
            "expectedInputs": {},
            "failureBehavior": {
              "action": "retryThenFail",
              "retryBackoffStrategy": {
                "arguments": [1, 3, 5],
                "delaySeconds": 0,
                "type": "none"
              }
            },
            "inputs": {},
            "interpreter": {
              "name": "chef-platform/shell-interpreter",
              "skill": {
                "maxVersion": "",
                "minVersion": "1.0.0"
              }
            },
            "limits": {
              "cores": 0,
              "cpu": 1,
              "timeoutSeconds": 0
            },
            "name": "sleep",
            "outputFieldRules": {},
            "retryCount": 2
          }
        ]
      },
      "createdAt": "2024-07-12T00:35:41.553747Z",
      "description": "",
      "exceptionReasons": [],
      "exceptionRules": [],
      "executionCount": 1,
      "id": "6f0adc12-adf5-466e-aa0f-c12870d61e93",
      "modifiedAt": "2024-07-12T00:35:50.063667Z",
      "name": "a job for TENANT-1 ORGA",
      "nextExecutionTime": "0001-01-01T00:00:00Z",
      "scheduleRule": "immediate",
      "status": "active",
      "target": {
        "executionType": "sequential",
        "groups": [
          {
            "batchSize": {
              "type": "",
              "value": 0
            },
            "distributionMethod": "batching",
            "filter": null,
            "filterId": "00000000-0000-0000-0000-000000000000",
            "listId": "00000000-0000-0000-0000-000000000000",
            "nodeIdentifiers": [
              "1f4c680d-4dc6-4edf-83b6-ded81388f244",
              "020ab866-9d3d-4ca7-919d-c21b4f9ef2a4",
              "925f244c-65b4-4ab9-9ebd-4b17f8e2277b"
            ],
            "nodeListType": "nodes",
            "successCriteria": [
              {
                "numRuns": {
                  "type": "percent",
                  "value": 100
                },
                "status": "success"
              }
            ],
            "timeoutSeconds": 240
          }
        ]
      }
    }
  ],
  "pagination": {
    "itemsPerPage": 10,
    "nextLink": "",
    "page": 1,
    "pageItemCount": 1,
    "previousLink": "",
    "startIndex": 1,
    "totalItems": 1,
    "totalPages": 1
  }
}
Get details of all job instances
You can retrieve details of all job instances. For example:
chef-courier-cli state instance list-all --profile <COURIER_OPERATOR_PROFILE_NAME>
The response is similar to the following:
{
"items": [
{
"actionSpec": {
"accessMode": "agent",
"steps": [
{
"command": [
"sleep 10"
],
"conditions": [],
"description": "",
"expectedInputs": {},
"failureBehavior": {
"action": "retryThenFail",
"retryBackoffStrategy": {
"arguments": [
1,
3,
5
],
"delaySeconds": 0,
"type": "none"
}
},
"inputs": {},
"interpreter": {
"name": "chef-platform/shell-interpreter",
"skill": {
"maxVersion": "",
"minVersion": "1.0.0"
}
},
"limits": {
"cores": 0,
"cpu": 1,
"timeoutSeconds": 0
},
"name": "sleep",
"outputFieldRules": {},
"retryCount": 2
}
]
},
"createdAt": "2024-07-12T00:35:50.107976Z",
"id": "8bf43df4-4af2-40f3-9eef-033c2e5c2260",
"jobId": "6f0adc12-adf5-466e-aa0f-c12870d61e93",
"lastUpdatedBy": "00000000-0000-0000-0000-000000000000",
"status": "failure",
"targetSpec": {
"executionType": "sequential",
"groups": [
{
"batchSize": {
"type": "",
"value": 0
},
"distributionMethod": "batching",
"filter": null,
"filterId": "00000000-0000-0000-0000-000000000000",
"listId": "00000000-0000-0000-0000-000000000000",
"nodeIdentifiers": [
"1f4c680d-4dc6-4edf-83b6-ded81388f244",
"020ab866-9d3d-4ca7-919d-c21b4f9ef2a4",
"925f244c-65b4-4ab9-9ebd-4b17f8e2277b"
],
"nodeListType": "nodes",
"successCriteria": [
{
"numRuns": {
"type": "percent",
"value": 100
},
"status": "success"
}
],
"timeoutSeconds": 240
}
]
},
"updatedAt": "2024-07-12T00:47:52.09999Z"
},
{
"actionSpec": {
"accessMode": "agent",
"steps": [
{
"command": {
"linux": [
"sleep 10"
]
},
"conditions": [],
"description": "",
"expectedInputs": {},
"failureBehavior": {
"action": "retryThenFail",
"retryBackoffStrategy": {
"arguments": [
1,
3,
5
],
"delaySeconds": 0,
"type": "none"
}
},
"inputs": {},
"interpreter": {
"name": "chef-platform/shell-interpreter",
"skill": {
"maxVersion": "",
"minVersion": "1.0.0"
}
},
"limits": {
"cores": 0,
"cpu": 1,
"timeoutSeconds": 0
},
"name": "sleep",
"outputFieldRules": {},
"retryCount": 2
}
]
},
"createdAt": "2024-07-12T01:02:20.082838Z",
"id": "80b8591c-1eb4-448b-a28a-a43e92921431",
"jobId": "2e6a6b46-d7eb-411e-8f90-94715c9ad20d",
"lastUpdatedBy": "00000000-0000-0000-0000-000000000000",
"status": "failure",
"targetSpec": {
"executionType": "sequential",
"groups": [
{
"batchSize": {
"type": "",
"value": 0
},
"distributionMethod": "batching",
"filter": null,
"filterId": "00000000-0000-0000-0000-000000000000",
"listId": "00000000-0000-0000-0000-000000000000",
"nodeIdentifiers": [
"1f4c680d-4dc6-4edf-83b6-ded81388f244",
"020ab866-9d3d-4ca7-919d-c21b4f9ef2a4",
"925f244c-65b4-4ab9-9ebd-4b17f8e2277b"
],
"nodeListType": "nodes",
"successCriteria": [
{
"numRuns": {
"type": "percent",
"value": 100
},
"status": "success"
}
],
"timeoutSeconds": 60
}
]
},
"updatedAt": "2024-07-12T01:05:20.665098Z"
},
{
"actionSpec": {
"accessMode": "agent",
"steps": [
{
"command": {
"linux": [
"sleep 10"
]
},
"conditions": [],
"description": "",
"expectedInputs": {},
"failureBehavior": {
"action": "retryThenFail",
"retryBackoffStrategy": {
"arguments": [
1,
3,
5
],
"delaySeconds": 0,
"type": "none"
}
},
"inputs": {},
"interpreter": {
"name": "chef-platform/shell-interpreter",
"skill": {
"maxVersion": "",
"minVersion": "1.0.0"
}
},
"limits": {
"cores": 0,
"cpu": 1,
"timeoutSeconds": 0
},
"name": "sleep",
"outputFieldRules": {},
"retryCount": 2
}
]
},
"createdAt": "2024-07-12T01:10:20.064474Z",
"id": "befe88f3-ac3f-4bad-9e1a-6017332929ab",
"jobId": "448ecaf3-5e44-4e7b-9e98-62dfa8c69b1c",
"lastUpdatedBy": "00000000-0000-0000-0000-000000000000",
"status": "failure",
"targetSpec": {
"executionType": "sequential",
"groups": [
{
"batchSize": {
"type": "number",
"value": 1
},
"distributionMethod": "batching",
"filter": null,
"filterId": "00000000-0000-0000-0000-000000000000",
"listId": "00000000-0000-0000-0000-000000000000",
"nodeIdentifiers": [
"1f4c680d-4dc6-4edf-83b6-ded81388f244",
"020ab866-9d3d-4ca7-919d-c21b4f9ef2a4",
"925f244c-65b4-4ab9-9ebd-4b17f8e2277b"
],
"nodeListType": "nodes",
"successCriteria": [
{
"numRuns": {
"type": "percent",
"value": 100
},
"status": "success"
}
],
"timeoutSeconds": 60
}
]
},
"updatedAt": "2024-07-12T01:13:20.62717Z"
}
],
"pagination": {
"itemsPerPage": 10,
"nextLink": "",
"page": 1,
"pageItemCount": 3,
"previousLink": "",
"startIndex": 1,
"totalItems": 3,
"totalPages": 1
}
}
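With many instances in the list, a per-status tally gives a quick overview. The Python sketch below assumes the response shape shown above; count_by_status is a hypothetical helper, and the sample keeps only the id and status of the three example instances.

```python
from collections import Counter


def count_by_status(response):
    """Count job instances per status value."""
    return Counter(item["status"] for item in response["items"])


# Trimmed copy of the example response (id and status only).
sample_response = {
    "items": [
        {"id": "8bf43df4-4af2-40f3-9eef-033c2e5c2260", "status": "failure"},
        {"id": "80b8591c-1eb4-448b-a28a-a43e92921431", "status": "failure"},
        {"id": "befe88f3-ac3f-4bad-9e1a-6017332929ab", "status": "failure"},
    ]
}
status_counts = count_by_status(sample_response)
print(dict(status_counts))  # prints "{'failure': 3}"
```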
Filter job instance results
You can filter the results by the time an instance ran or by the status of the job instance.
chef-courier-cli state instance list-all --run-before <END_TIME> --run-after <START_TIME> --status <JOB_STATUS> --profile <COURIER_OPERATOR_PROFILE_NAME>
Use the following parameters to filter job results:
--run-after
- Filters by job instances initiated after the given time.
Data type: String
--run-before
- Filters by job instances initiated before the given time.
Data type: String
--status
- Filters by the provided instance status.
Data type: String
For example:
chef-courier-cli state instance list-all --run-before "2024-03-31T19:59:20Z" --run-after "2024-03-29T12:00:00Z" --profile <COURIER_OPERATOR_PROFILE_NAME>
The response is similar to the following:
{
"items": [
{
"actionSpec": {
"accessMode": "agent",
"steps": [
{
"command": {
"linux": [
"sleep 10"
]
},
"conditions": [],
"expectedInputs": {},
"failureBehavior": {
"action": "retryThenFail",
"retryBackoffStrategy": {
"arguments": [],
"delaySeconds": 0,
"type": "none"
}
},
"inputs": {},
"interpreter": {
"name": "chef-platform/shell-interpreter",
"skill": {
"minVersion": "1.0.0",
"maxVersion": ""
}
},
"limits": {
"cores": 0,
"cpu": 0,
"time": 0
},
"outputFieldRules": {},
"name": "sleep",
"retryCount": 0,
"stepNumber": 1
}
]
},
"createdAt": "2024-03-29T21:24:46.537387Z",
"id": "86988dde-d0bb-4a3f-85a2-688403dceb23",
"jobId": "1bd70404-b619-4c99-9e4d-7bdedc787bbc",
"lastUpdatedBy": "00000000-0000-0000-0000-000000000000",
"status": "success",
"targetSpec": {
"executionType": "sequential",
"groups": [
{
"batchSize": {
"type": "number",
"value": 1
},
"distributionMethod": "batching",
"filter": {
"constraints": {
"attributes": [
{
"name": "kernel_name",
"operator": "=",
"value": [
"Linux"
]
},
{
"name": "primary_ip",
"operator": "MATCHES",
"value": [
"^172\\.31.*"
]
}
],
"skills": [
{
"name": "courier-runner",
"version": [
"\u003c= 1.0.66"
]
},
{
"name": "chef-gohai",
"version": [
"= 0.1.0"
]
}
]
}
},
"filterId": "00000000-0000-0000-0000-000000000000",
"listId": "00000000-0000-0000-0000-000000000000",
"nodeIdentifiers": null,
"nodeListType": "filter",
"successCriteria": [
{
"numRuns": {
"type": "percent",
"value": 100
},
"status": "success"
}
],
"timeoutSeconds": 3000
}
]
},
"updatedAt": "2024-03-29T21:32:32.297414Z"
},
{
"actionSpec": {
"accessMode": "agent",
"steps": [
{
"command": {
"linux": [
"sleep 10"
]
},
"conditions": [],
"expectedInputs": {},
"failureBehavior": {
"action": "retryThenFail",
"retryBackoffStrategy": {
"arguments": [],
"delaySeconds": 0,
"name": "none"
}
},
"inputs": {},
"interpreter": {
"name": "chef-platform/shell-interpreter",
"skill": {
"minVersion": "1.0.0",
"maxVersion": ""
}
},
"limits": {
"cores": 0,
"cpu": 1,
"time": 0
},
"outputFieldRules": {},
"retryCount": 0
}
]
},
"createdAt": "2024-03-29T21:52:46.721021Z",
"id": "0641d617-3ae1-4dd9-a610-5570a5ad8354",
"jobId": "50bdc41e-eac5-4681-9b2b-ce29e0096343",
"lastUpdatedBy": "00000000-0000-0000-0000-000000000000",
"status": "success",
"targetSpec": {
"executionType": "parallel",
"groups": [
{
"batchSize": {
"type": "number",
"value": 1
},
"distributionMethod": "batching",
"filter": null,
"filterId": "00000000-0000-0000-0000-000000000000",
"listId": "00000000-0000-0000-0000-000000000000",
"nodeIdentifiers": [
"03ba18c0-68b2-43ba-93fd-b6952443421e",
"2768a3b5-f5a6-45ae-a8b6-42d531c6416a"
],
"nodeListType": "nodes",
"successCriteria": [
{
"numRuns": {
"type": "percent",
"value": 100
},
"status": "success"
}
],
"timeoutSeconds": 1500
},
{
"batchSize": {
"type": "number",
"value": 1
},
"distributionMethod": "batching",
"filter": null,
"filterId": "00000000-0000-0000-0000-000000000000",
"listId": "00000000-0000-0000-0000-000000000000",
"nodeIdentifiers": [
"a8b1f470-fedc-45e8-ba0a-b26dd551c1d0",
"acb38595-64af-4532-8589-2aeb2ad876fc",
"e4b1b524-4e77-4448-b1a9-01b80288c898"
],
"nodeListType": "nodes",
"successCriteria": [
{
"numRuns": {
"type": "percent",
"value": 100
},
"status": "success"
}
],
"timeoutSeconds": 1500
}
]
},
"updatedAt": "2024-03-29T21:56:09.603277Z"
}
.
.
.
],
"pagination": {
"itemsPerPage": 10,
"nextLink": "",
"page": 1,
"pageItemCount": 3,
"previousLink": "",
"startIndex": 1,
"totalItems": 8,
"totalPages": 1
}
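The example passes --run-after and --run-before as UTC timestamps such as 2024-03-29T12:00:00Z. The Python sketch below produces strings in that same style for a time window; it assumes the CLI expects the format used in the example, and the helper names to_cli_timestamp and window_ending_at are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_cli_timestamp(moment):
    """Format a timezone-aware datetime as UTC, e.g. 2024-03-29T12:00:00Z.

    A naive datetime would be treated as local time by astimezone(),
    so pass aware values to avoid surprises.
    """
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def window_ending_at(end, hours):
    """Return (run_after, run_before) strings for the given number of hours up to `end`."""
    start = end - timedelta(hours=hours)
    return to_cli_timestamp(start), to_cli_timestamp(end)


end = datetime(2024, 3, 31, 19, 59, 20, tzinfo=timezone.utc)
run_after, run_before = window_ending_at(end, hours=24)
print(run_after)   # prints "2024-03-30T19:59:20Z"
print(run_before)  # prints "2024-03-31T19:59:20Z"
```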
View evidence of jobs
You can retrieve artifacts uploaded by nodes for each step and for each job run.
Use the following command to get job artifacts:
chef-courier-cli state run get-step-attempt-evidence --runId <RUN_ID> --stepNo <STEP_NUMBER> --attemptNo <ATTEMPT_NUMBER> --profile <COURIER_OPERATOR_PROFILE_NAME>
This command requires the following arguments:
--attemptNo
- The attempt number for a given step/action within a job run.
Data type: Integer
--runId
- The unique identifier of a job run.
Data type: String
--stepNo
- A single step/action within a job run.
Data type: Integer
For example:
chef-courier-cli state run get-step-attempt-evidence --runId 21c589d3-2a6d-4862-a801-a2d7435a9c01 --stepNo 1 --attemptNo 1 --profile <COURIER_OPERATOR_PROFILE_NAME> | jq -b .item.artifactUrl
Without the jq filter, the response is similar to the following:
{
"item": {
"artifactUrl": "http://pmtest2beta1.demos.chef.co/evidence/run/21c589d3-2a6d-4862-a801-a2d7435a9c01/step/1/attempt/1/evidence-run-21c589d3-2a6d-4862-a801-a2d7435a9c01-step-1-attempt-1.zip?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=admin%2F20240422%2Fregion1%2Fs3%2Faws4_request&X-Amz-Date=20240422T140827Z&X-Amz-Expires=900&X-Amz-SignedHeaders=host&x-id=GetObject&X-Amz-Signature=ed945c79bcec81aa23369f19b4ac887174e3a623e60763f78e35fb5b2a2d6ed3"
}
}
The artifact URL expires in 15 minutes.
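The 15-minute lifetime is visible in the URL itself: X-Amz-Expires=900 is a number of seconds counted from the X-Amz-Date timestamp. The Python sketch below computes the expiry time from those two query parameters. It assumes the URL is a standard AWS Signature Version 4 presigned URL, as the example's parameters suggest; presigned_url_expiry is a hypothetical helper, and the sample URL is shortened to a placeholder host with only the two parameters it reads.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url):
    """Return the UTC time at which a SigV4 presigned URL stops working."""
    query = parse_qs(urlparse(url).query)
    signed_at = datetime.strptime(
        query["X-Amz-Date"][0], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    lifetime = timedelta(seconds=int(query["X-Amz-Expires"][0]))
    return signed_at + lifetime


# Placeholder URL carrying the X-Amz-Date and X-Amz-Expires values from the example.
sample_url = (
    "https://example.invalid/evidence.zip"
    "?X-Amz-Date=20240422T140827Z&X-Amz-Expires=900"
)
expiry = presigned_url_expiry(sample_url)
print(expiry.isoformat())  # prints "2024-04-22T14:23:27+00:00"
```

Comparing the result with datetime.now(timezone.utc) shows whether a saved URL is still usable or needs to be requested again.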
Debug Chef Courier jobs
To debug a Chef Courier job, SSH into the node and check the Courier Runner logs.
For Linux nodes:
cd /hab/svc/courier-runner
tail -f logs/courier-log
For Windows nodes:
cd C:\hab\svc\courier-runner\
gc .\logs\courier-log -Wait