cback agents
A cback agent is a stateless process responsible for executing cback jobs and/or updating job definitions in the
job database as different actions occur. At the time of writing, cback has backup, restore, prune, switch, verify and portal agents available.
Below is a basic, high-level diagram showing the general layout of a cback system. Agents fall into two main
classes: those that orchestrate and update both the cback jobs db and the jobs found within it, and those that
run jobs from the db as units of work:
[note] As the cback jobs documentation page makes clear, agents that run jobs also update certain fields in
those jobs on conclusion or failure. In the context of this diagram, update agents are the ones that allow
cback operators or users to perform significantly more complex update operations on the cback job db.
Backup
The backup agent is responsible for the execution of backup jobs within cback. This takes the form of picking backup jobs that have entered a pending status, executing them by copying the delta of files in a defined source location into a new snapshot in the repository and finally updating the job to Completed so that a switch agent may schedule the job for reissuing.
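The pick, execute and update cycle described above can be sketched roughly as follows. This is a minimal in-memory illustration rather than cback's actual implementation: the Job class, the run_pending_jobs helper and the "failed" status name are assumptions made for the example; only the pending and completed states come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    """Hypothetical stand-in for a row in the cback job database."""
    name: str
    status: str = "pending"


def run_pending_jobs(jobs: List[Job], execute: Callable[[Job], None]) -> int:
    """Pick every pending job, execute it, and record the outcome.

    Returns the number of jobs that were picked.
    """
    picked = 0
    for job in jobs:
        if job.status != "pending":
            continue  # only pending jobs are eligible for picking
        picked += 1
        try:
            execute(job)
            job.status = "completed"  # a switch agent may later reset this
        except Exception:
            job.status = "failed"  # assumed status name for a failed run
    return picked
```

Jobs that are not pending are skipped entirely, which is why a completed job is not run again until something sets it back to pending.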
Restore
The restore agent is responsible for the execution of restore jobs within cback. This takes the form of picking restore jobs that have entered a pending status and executing them by copying the files defined in a specific snapshot (and those linked below it) in the repository to a destination location defined within the restore job. Restore jobs are typically one-off: they are added manually by a cback operator and are not switched back to pending by the switch agent.
Prune
The prune agent is responsible for the execution of prune jobs within cback. This takes the form of picking prune jobs that have entered a pending status and executing them by:
- checking against a group configuration retention_policy and the age of snapshots in a repository, to determine which snapshots have exceeded their lifespan and must be deleted from the repository.
- if graceful deletion is enabled for a tag provided by a cback portal SNAPSHOT DELETE http operation, the snapshot won't be immediately deleted; instead, it will be retained for a period specified by graceful_deletion_retention_period before being permanently removed.
[note] The default configuration behaviour is for graceful_deletion to be enabled in the prune agent at the
group level, thus protecting snapshots in a cback system from accidental scheduled pruning. It must be explicitly
disabled to use a retention policy.
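One possible reading of the two pruning modes is sketched below. Everything beyond the names graceful_deletion and graceful_deletion_retention_period is an assumption: the Snapshot class, the delete_requested field, and the reduction of a retention policy to a single max_age are illustrative only, and real retention policies are likely richer than this.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical snapshot record; field names are illustrative only."""
    snapshot_id: str
    created: datetime
    # Assumed to be set when the snapshot is tagged for deletion via the portal.
    delete_requested: Optional[datetime] = None


def snapshots_to_delete(
    snapshots: List[Snapshot],
    now: datetime,
    graceful_deletion: bool,
    max_age: timedelta,
    graceful_deletion_retention_period: timedelta,
) -> List[str]:
    """Return the ids of snapshots that a prune run would remove."""
    doomed = []
    for snap in snapshots:
        tagged = snap.delete_requested is not None
        if graceful_deletion:
            # Graceful mode: only tagged snapshots go, and only after the grace period.
            if tagged and snap.delete_requested + graceful_deletion_retention_period <= now:
                doomed.append(snap.snapshot_id)
        elif tagged or snap.created + max_age <= now:
            # Retention mode (assumed): tagged snapshots go at once, others by age.
            doomed.append(snap.snapshot_id)
    return doomed
```

With graceful deletion enabled, a snapshot that is merely old is never selected, which matches the note's point that the default protects against accidental scheduled pruning.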
Switch
The switch agent is responsible for resetting finished prune, verify and backup jobs so that they may be
repicked by their respective agents. This is achieved by comparing the run_completed field of a given job to
an expiration_time parameter set per job type. If a job has 'expired', i.e. run_completed + expiration_time < current_time, the switch agent will set the job back to pending from completed so that it may be repicked.
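The expiry check can be sketched as below, assuming a job counts as expired once the point in time run_completed plus expiration_time has passed. The function names and the use of datetime and timedelta values are assumptions for illustration; cback's own representation of these fields may differ.

```python
from datetime import datetime, timedelta


def has_expired(run_completed: datetime, expiration_time: timedelta, now: datetime) -> bool:
    """Return True once the expiration window after the last completed run has passed."""
    return run_completed + expiration_time < now


def switch_status(status: str, run_completed: datetime,
                  expiration_time: timedelta, now: datetime) -> str:
    """Reset a completed job to pending if it has expired; otherwise leave it alone."""
    if status == "completed" and has_expired(run_completed, expiration_time, now):
        return "pending"
    return status
```

For example, a backup job that completed at 12:00 with an expiration_time of one hour would stay completed at 12:30 and be switched back to pending at any time after 13:00.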
Verify
The verify agent is responsible for validating the integrity of a cback repository. It executes verify jobs by selecting repositories scheduled for verification and applying the following rules:
- The repository size is compared against the group verify agent configuration parameter full_verify_threshold.
- If the size is below this threshold, a full verification of the repository is performed.
- If the size is above this threshold, a partial verification is performed instead.
For partial verification, the agent uses two group verify agent configuration parameters:
- partial_verify_percentage: defines what fraction of the repository should be verified.
- partial_verify_threshold: defines an upper limit on the portion size to prevent excessively large or resource-intensive verification runs. If the calculated portion size from partial_verify_percentage exceeds this limit, only partial_verify_threshold worth of data is verified.
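The rules above can be sketched as a small decision function. This is an illustration under stated assumptions, not the verify agent's code: sizes are taken to be integers in bytes, partial_verify_percentage is taken to be a value from 0 to 100, and a repository whose size exactly equals full_verify_threshold is treated as partial, since the text does not say which way that case goes.

```python
from typing import Tuple


def plan_verification(
    repo_size: int,
    full_verify_threshold: int,
    partial_verify_percentage: int,
    partial_verify_threshold: int,
) -> Tuple[str, int]:
    """Return (mode, amount_to_verify) for a repository of the given size."""
    if repo_size < full_verify_threshold:
        # Small repositories are verified in full.
        return ("full", repo_size)
    # Larger repositories: verify a percentage, capped by the partial threshold.
    portion = repo_size * partial_verify_percentage // 100
    return ("partial", min(portion, partial_verify_threshold))
```

With full_verify_threshold=100, partial_verify_percentage=10 and partial_verify_threshold=20, a repository of size 150 would have 15 units verified, while one of size 1000 would be capped at 20.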
The verification logic described above is illustrated in the following diagram:
Portal
The cback portal is an optional agent, aimed at operators who wish to interact with cback programmatically. In
effect, it provides an http/s REST interface webserver that can be used for the creation, manipulation and deletion
of jobs / workloads in a cback system. The portal is self-documenting, and provides a FastAPI interface at
http/s://<portal-node-url>:<port>/docs#
that can be viewed to see the methods available to you.
Enabling agents
Just as jobs have an enabled state, agents have the same mechanism, which can be useful for temporarily
disabling all agents of a class on a given worker node. The following shows this in practice:
# disable all backup agents on a worker node
$ cback backup agent --disable "example reason"
2024-05-17 11:44:17 INFO /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:141 > agents disabled pid=1058614 agent=backup id=0 reason=example reason
# check to see the lock file exists
$ ls /etc/cback/locks
backup-0.disabled
# check that a running agent will not pick jobs by running one ephemerally
$ cback backup agent
2024-05-17 12:09:27 DEBUG /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:109 > agent is disabled pid=1072942 agent=backup id=0 reason=example reason
2024-05-17 12:09:27 DEBUG /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:112 > agent will sleep for 3 secs pid=1072942 agent=backup id=0
# reenable the backup agents
$ cback backup agent --enable
2024-05-17 11:44:28 INFO /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:120 > agents enabled pid=1058626 agent=backup id=0
[warning❗] Disabling a set of agents only stops them from picking new jobs; it will not terminate an existing job that an agent has already collected and is processing. Consider this before performing any action that might fail the job, such as a systemd unit restart or stop, and wait for the agents to enter an idle state of operation.
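The lock-file mechanism visible in the session above can be sketched as follows. The backup-0.disabled naming is taken from the ls output; the helper names are hypothetical, and storing the reason text inside the lock file is an assumption (the logs only show that the reason is available to a running agent, not where it is kept).

```python
from pathlib import Path
from typing import Optional


def lock_path(lock_dir: Path, agent: str, agent_id: int) -> Path:
    """Build the lock file path, following the backup-0.disabled naming seen above."""
    return lock_dir / f"{agent}-{agent_id}.disabled"


def disabled_reason(lock_dir: Path, agent: str, agent_id: int) -> Optional[str]:
    """Return the stored reason if the agent is disabled, or None if it may pick jobs."""
    path = lock_path(lock_dir, agent, agent_id)
    if not path.exists():
        return None
    # Assumption: the reason text is kept inside the lock file itself.
    return path.read_text().strip()
```

An agent loop built on this would call disabled_reason before each pick and sleep when it returns a value, which only gates new picks and leaves any job already in progress untouched.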