cback agents
A cback agent is a stateless process responsible for executing cback jobs and/or updating job definitions in the
job database as different actions occur. At the time of writing, cback has backup, restore, prune, switch, verify
and portal agents available.
Below is a basic, high-level diagram of the general layout of a cback system. Agents fall into two main classes:
those that update both the cback jobs db and the jobs found within it, and those that pick jobs from the db to
perform as a unit of work.
[Note] As the cback jobs documentation page makes clear, agents that pick jobs for execution also update certain
fields in the jobs they run, on conclusion or failure. In the context of this diagram, the update agents are the
ones that allow cback operators or users to perform significantly more complex update operations on the cback
job db.
Backup
The backup agent is responsible for executing backup jobs within cback. It picks backup jobs that have entered a pending status, executes them by copying the delta of files in a defined source location into a new snapshot in the repository, and finally updates the job to Completed so that a switch agent may schedule it for reissuing.
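The pick, execute, update cycle described above can be sketched in Python as follows. This is an illustrative in-memory model, not cback's own code: the `Job` class, the `pick_pending` and `run_once` helpers, and the `running` and `failed` status names are assumptions made for the example. Only the pending and completed statuses come from the text.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Job:
    """Hypothetical, minimal stand-in for a cback backup job record."""
    job_id: int
    source: str
    status: str = "pending"


def pick_pending(jobs: List[Job]) -> Optional[Job]:
    """Return the first pending job and mark it as running, or None if there is none."""
    for job in jobs:
        if job.status == "pending":
            job.status = "running"  # assumed intermediate status
            return job
    return None


def run_once(jobs: List[Job], execute: Callable[[Job], None]) -> Optional[Job]:
    """Pick one pending job, execute it, and record the outcome on the job."""
    job = pick_pending(jobs)
    if job is None:
        return None
    try:
        execute(job)
        job.status = "completed"
    except Exception:
        job.status = "failed"  # assumed failure status
    return job


jobs = [Job(1, "/data/a", status="completed"), Job(2, "/data/b")]
picked = run_once(jobs, execute=lambda job: None)  # no-op stands in for the real backup
print(picked.job_id, picked.status)
```

Jobs that are already completed are skipped by the picker, which is why a separate step (the switch agent) is needed to make them eligible again.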
Restore
The restore agent is responsible for executing restore jobs within cback. It picks restore jobs that have entered a pending status and executes them by copying the files defined in a specific snapshot (and those linked below it) in the repository to a destination location defined within the restore job. Restore jobs, and thus restore picking, are typically one-time: the jobs are added manually by a cback operator and are not switched back to pending by the switch agent.
Prune
The prune agent is responsible for executing prune jobs within cback. It picks prune jobs that have entered a pending status and executes them by:
- checking a group configuration retention_policy against the age of snapshots in a repository, to determine which snapshots have exceeded their lifespan and must be deleted from the repository.
- if graceful deletion is enabled, checking for a tag provided by a cback portal SNAPSHOT DELETE HTTP operation, and then acting based on a graceful_deletion_retention_period, which defines how long the snapshot should be protected prior to deletion.
[Note] By default, graceful_deletion is enabled in the prune agent at the group level, which protects snapshots
in a cback system from accidental scheduled pruning. It must be explicitly disabled to use a retention policy.
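The two prune modes can be illustrated with a short Python sketch. It rests on several assumptions that this page does not confirm: the retention policy is reduced to a single maximum age (`max_age`), the protection period is measured from the moment a snapshot was tagged for deletion, and the `Snapshot` class with its field names is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Snapshot:
    """Hypothetical snapshot record; field names are illustrative only."""
    snapshot_id: str
    created: datetime
    delete_requested: Optional[datetime] = None  # set when tagged for deletion


def snapshots_to_prune(
    snapshots: List[Snapshot],
    now: datetime,
    graceful_deletion: bool,
    max_age: timedelta,
    graceful_deletion_retention_period: timedelta,
) -> List[str]:
    """Return the ids of snapshots that are due for deletion."""
    due = []
    for snap in snapshots:
        if graceful_deletion:
            # Only tagged snapshots are candidates, and only after the protection period.
            if snap.delete_requested is not None and (
                now - snap.delete_requested >= graceful_deletion_retention_period
            ):
                due.append(snap.snapshot_id)
        elif now - snap.created > max_age:
            # Retention by age applies only when graceful deletion is disabled.
            due.append(snap.snapshot_id)
    return due


now = datetime(2024, 5, 17, 12, 0)
snaps = [
    Snapshot("old", created=now - timedelta(days=90)),
    Snapshot("tagged", created=now - timedelta(days=20),
             delete_requested=now - timedelta(days=8)),
    Snapshot("fresh", created=now - timedelta(days=1)),
]
print(snapshots_to_prune(snaps, now, True, timedelta(days=30), timedelta(days=7)))
print(snapshots_to_prune(snaps, now, False, timedelta(days=30), timedelta(days=7)))
```

With graceful deletion enabled, the 90-day-old snapshot survives because nobody tagged it, which mirrors the protective default described in the note.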
Switch
The switch agent is responsible for resetting finished prune, verify and backup jobs so that they may be
repicked by their respective agents. It does this by comparing the run_completed field of a given job to
an expiration_time parameter set per job type. If a job has 'expired', i.e. run_completed + expiration_time < current_time,
the switch agent sets the job back to pending from completed so that it may be repicked.
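A minimal Python sketch of the expiry check follows. It reads 'expired' as the expiration window having fully elapsed, so that the completion time plus the expiration time lies in the past. Representing a job as a plain dict and the function names `has_expired` and `switch` are assumptions for illustration, not cback's API.

```python
from datetime import datetime, timedelta


def has_expired(run_completed: datetime, expiration_time: timedelta,
                current_time: datetime) -> bool:
    """A completed job has expired once its expiration window has fully elapsed."""
    return run_completed + expiration_time < current_time


def switch(job: dict, expiration_time: timedelta, current_time: datetime) -> dict:
    """Set a completed, expired job back to pending; leave every other job untouched."""
    if job["status"] == "completed" and has_expired(
        job["run_completed"], expiration_time, current_time
    ):
        job["status"] = "pending"
    return job


now = datetime(2024, 5, 17, 12, 0)
stale = {"status": "completed", "run_completed": now - timedelta(hours=25)}
recent = {"status": "completed", "run_completed": now - timedelta(hours=1)}
print(switch(stale, timedelta(hours=24), now)["status"])
print(switch(recent, timedelta(hours=24), now)["status"])
```

With a 24-hour expiration time, the job completed 25 hours ago is reset to pending, while the one completed an hour ago stays completed.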
Verify
The verify agent is responsible for validating the integrity of a cback repository. It picks verify jobs, which are in effect restore jobs targeting a random snapshot, restored to a configured location on the worker node. It performs the restore and then validates the file integrity of the given snapshot. Once it has concluded, the local files are cleared.
[Note] Given that space on a worker node is often limited and backups increasingly exceed it, large cback repositories may benefit from having multiple verify jobs added to a workload, each targeting a specific compartment of the repository's file structure.
Portal
The cback portal is an optional agent aimed at operators who wish to interact with cback programmatically. In
effect, it provides an HTTP/S REST interface web server that can be used for the creation, manipulation and deletion
of jobs / workloads in a cback system. The portal is self-documenting, and provides a FastAPI interface at
http/s://<portal-node-url>:<port>/docs#
that can be viewed to see the methods available to you.
Enabling agents
As discussed for jobs, which have an enabled state, agents also have this mechanism,
which can be useful for temporarily disabling all agents of a class on a given worker node. The example below
shows this in practice:
# disable all backup agents on a worker node
$ cback backup agent --disable "example reason"
2024-05-17 11:44:17 INFO /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:141 > agents disabled pid=1058614 agent=backup id=0 reason=example reason
# check to see the lock file exists
$ ls /etc/cback/locks
backup-0.disabled
# check that a running agent will not pick jobs by running one ephemerally
$ cback backup agent
2024-05-17 12:09:27 DEBUG /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:109 > agent is disabled pid=1072942 agent=backup id=0 reason=example reason
2024-05-17 12:09:27 DEBUG /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:112 > agent will sleep for 3 secs pid=1072942 agent=backup id=0
# re-enable the backup agents
$ cback backup agent --enable
2024-05-17 11:44:28 INFO /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:120 > agents enabled pid=1058626 agent=backup id=0
[warning❗] It is important to note that disabling a set of agents only stops them from picking new jobs; it will not terminate an existing job that an agent has already collected and is processing. Consider this before performing any action that might fail the job, such as a systemd unit restart or stop, and wait for the agents to enter an idle state of operation.
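The lock-file behaviour visible in the transcript above can be modelled with a short Python sketch. It follows the `backup-0.disabled` file name shown in `/etc/cback/locks`, but the rest is assumed: that the reason is stored inside the lock file, that the file is keyed on the agent id, and all function names are hypothetical rather than taken from cback. A temporary directory stands in for the real lock directory.

```python
import tempfile
from pathlib import Path
from typing import Optional


def lock_path(lock_dir: Path, agent: str, agent_id: int) -> Path:
    """Build the lock file path, following the `backup-0.disabled` naming."""
    return lock_dir / f"{agent}-{agent_id}.disabled"


def disable(lock_dir: Path, agent: str, agent_id: int, reason: str) -> None:
    """Create the lock file; storing the reason in it is an assumption."""
    lock_dir.mkdir(parents=True, exist_ok=True)
    lock_path(lock_dir, agent, agent_id).write_text(reason)


def enable(lock_dir: Path, agent: str, agent_id: int) -> None:
    """Remove the lock file if it exists."""
    lock_path(lock_dir, agent, agent_id).unlink(missing_ok=True)


def disabled_reason(lock_dir: Path, agent: str, agent_id: int) -> Optional[str]:
    """Return the stored reason if the agent is disabled, otherwise None."""
    path = lock_path(lock_dir, agent, agent_id)
    return path.read_text() if path.exists() else None


# A temporary directory stands in for /etc/cback/locks.
with tempfile.TemporaryDirectory() as tmp:
    locks = Path(tmp)
    disable(locks, "backup", 0, "example reason")
    print(disabled_reason(locks, "backup", 0))
    enable(locks, "backup", 0)
    print(disabled_reason(locks, "backup", 0))
```

An agent built this way would call `disabled_reason` before each pick and sleep when it returns a value, which only prevents new picks and leaves a job that is already running untouched.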