cback agents

A cback agent is a stateless process responsible for executing cback jobs and/or updating job definitions in the job database as different actions occur. At the time of writing, cback provides backup, restore, prune, switch, verify and portal agents.

Below is a basic, high-level diagram of the general layout of a cback system. Agents fall into two main classes: those that update the cback jobs database and the jobs within it, and those that pick jobs from the database to perform as a unit of work.

[screenshot: cback system layout diagram]

[Note] As the cback jobs documentation page describes, agents that pick jobs for execution also update certain fields of the jobs they run, on completion or failure. In this diagram, the update class refers to the agents that let cback operators or users perform significantly more complex update operations on the cback jobs database.

Backup

The backup agent is responsible for executing backup jobs within cback. It picks backup jobs that have entered a pending status, executes them by copying the delta of files in a defined source location into a new snapshot in the repository, and finally updates the job to Completed so that a switch agent may schedule the job for reissuing.
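The pick, execute, update cycle described above can be sketched as follows. This is a minimal illustration and not cback's actual code: the `Job` class, the in-memory job list, and the "Running" and "Failed" status names are assumptions made for the example; only the pending and Completed states come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Job:
    job_id: int
    kind: str    # e.g. "backup", "restore", "prune"
    status: str  # "Pending" and "Completed" come from the text; others are assumed


def pick_pending(jobs: list, kind: str) -> Optional[Job]:
    """Return the first pending job of the given kind, or None if there is none."""
    for job in jobs:
        if job.kind == kind and job.status == "Pending":
            return job
    return None


def run_once(jobs: list, kind: str, execute: Callable[[Job], None]) -> Optional[Job]:
    """Pick one pending job, execute it, and record the outcome on the job."""
    job = pick_pending(jobs, kind)
    if job is None:
        return None  # nothing to do; a real agent would sleep and retry
    job.status = "Running"  # assumed intermediate state
    try:
        execute(job)
        job.status = "Completed"
    except Exception:
        job.status = "Failed"  # assumed failure state
    return job
```

Because the agent keeps no state of its own, everything it needs to know about a job lives on the job record, which is what allows a switch agent to reset the job later.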

Restore

The restore agent is responsible for executing restore jobs within cback. It picks restore jobs that have entered a pending status and executes them by copying the files defined in a specific snapshot (and those linked below it) in the repository to a destination location defined within the restore job. Restore jobs are typically one-off: they are added manually by a cback operator and are not switched back to pending by the switch agent.

Prune

The prune agent is responsible for executing prune jobs within cback. It picks prune jobs that have entered a pending status and executes them by:

  • checking the age of snapshots in a repository against the group configuration retention_policy, to determine which snapshots have exceeded their lifespan and must be deleted from the repository.
  • if graceful deletion is enabled, checking for a tag applied by a cback portal SNAPSHOT DELETE HTTP operation, and deleting tagged snapshots based on graceful_deletion_retention_period, which defines how long a snapshot is protected prior to deletion.

[Note] By default, graceful_deletion is enabled in the prune agent at the group level, protecting snapshots in a cback system from accidental scheduled pruning. It must be explicitly disabled to use a retention policy.
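The two pruning modes can be sketched as below. This is a sketch under stated assumptions rather than cback's implementation: the retention policy is reduced to a single hypothetical `max_age`, the `delete_requested` field stands in for the tag applied through the portal, and the protection period is assumed to start when the tag was applied. The real retention_policy format may differ.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Snapshot:
    snapshot_id: str
    created: datetime
    # Hypothetical field: when a portal SNAPSHOT DELETE request tagged this snapshot.
    delete_requested: Optional[datetime] = None


def snapshots_to_prune(snapshots, now, graceful_deletion, max_age,
                       graceful_deletion_retention_period):
    """Return the snapshots a prune run would delete under the assumed rules."""
    if graceful_deletion:
        # Only tagged snapshots are candidates, and only once the
        # protection period since tagging has elapsed.
        return [
            s for s in snapshots
            if s.delete_requested is not None
            and now - s.delete_requested >= graceful_deletion_retention_period
        ]
    # Retention mode (assumed simplification): delete anything older than max_age.
    return [s for s in snapshots if now - s.created > max_age]
```

With graceful deletion enabled, an old but untagged snapshot is never selected, which matches the protective default described in the note.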

Switch

The switch agent is responsible for resetting finished prune, verify and backup jobs so that they may be repicked by their respective agents. It does this by comparing the run_completed field of a given job to an expiration_time parameter set per job type. If a job has 'expired', i.e. run_completed + expiration_time < current_time, the switch agent sets the job back to pending from completed so that it may be repicked.
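The expiry check can be sketched as follows, treating a job as expired once the current time has passed run_completed plus expiration_time. The dictionary layout of a job and the `switch_job` helper are assumptions for illustration; only the run_completed and expiration_time names come from the text.

```python
from datetime import datetime, timedelta


def is_expired(run_completed: datetime, expiration_time: timedelta,
               current_time: datetime) -> bool:
    """A job has expired once its expiry moment lies in the past."""
    return run_completed + expiration_time < current_time


def switch_job(job: dict, expiration_times: dict, current_time: datetime) -> dict:
    """Reset a completed job to pending if it has expired.

    `expiration_times` maps a job type to its expiration_time. Job types
    without an entry (for example restore) are never switched.
    """
    expiration_time = expiration_times.get(job["type"])
    if (
        expiration_time is not None
        and job["status"] == "completed"
        and is_expired(job["run_completed"], expiration_time, current_time)
    ):
        job["status"] = "pending"
    return job
```

For example, with a 24 hour expiration_time for backup jobs, a backup that completed 25 hours ago is set back to pending, while one that completed an hour ago is left alone.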

Verify

The verify agent is responsible for validating the integrity of a cback repository. It picks verify jobs, which are in effect restore jobs that restore a random snapshot to a configured location on the worker node. It performs the restore, validates the file integrity of the given snapshot, and then clears the local files.

[Note] Space on a worker node is often limited, and backups increasingly exceed it. Large cback repositories may benefit from having multiple verify jobs added to a workload, each targeting a specific part of the repository's file structure.

Portal

The cback portal is an optional agent aimed at operators who wish to interact with cback programmatically. In effect, it provides an HTTP/HTTPS REST web server that can be used to create, manipulate and delete jobs and workloads in a cback system. The portal is self-documenting and provides a FastAPI interface at http(s)://<portal-node-url>:<port>/docs# where you can view the methods available to you.

Enabling agents

In the same way that jobs have an enabled state, agents also have this mechanism, which is useful for temporarily disabling all agents of a class on a given worker node. The following shows this in practice:

# disable all backup agents on a worker node
$ cback backup agent --disable "example reason"
2024-05-17 11:44:17 INFO /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:141 > agents disabled pid=1058614 agent=backup id=0 reason=example reason

# check to see the lock file exists
$ ls /etc/cback/locks
backup-0.disabled

# check that a running agent will not pick jobs by running one ephemerally
$ cback backup agent
2024-05-17 12:09:27 DEBUG /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:109 > agent is disabled pid=1072942 agent=backup id=0 reason=example reason
2024-05-17 12:09:27 DEBUG /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:112 > agent will sleep for 3 secs pid=1072942 agent=backup id=0

# re-enable the backup agents
$ cback backup agent --enable
2024-05-17 11:44:28 INFO /usr/lib/python3.9/site-packages/cback/model/agents/agent.py:120 > agents enabled pid=1058626 agent=backup id=0

[Warning ❗] Disabling a set of agents only stops them from picking new jobs; it does not terminate an existing job that an agent has already collected and is processing. Consider this before performing any action that might fail the job, such as a systemd unit restart or stop, and wait for the agents to enter an idle state first.
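The lock-file check from the transcript above can also be scripted, for example as a guard in maintenance tooling. This is a small sketch that assumes the layout seen in the transcript, namely a file named <agent>-<id>.disabled under /etc/cback/locks; the helper names are hypothetical and not part of cback.

```python
from pathlib import Path

# Assumed from the transcript above; adjust if your installation differs.
DEFAULT_LOCK_DIR = "/etc/cback/locks"


def disabled_lock_path(agent: str, agent_id: int = 0,
                       lock_dir: str = DEFAULT_LOCK_DIR) -> Path:
    """Build the path of the lock file for one agent, e.g. backup-0.disabled."""
    return Path(lock_dir) / f"{agent}-{agent_id}.disabled"


def is_agent_disabled(agent: str, agent_id: int = 0,
                      lock_dir: str = DEFAULT_LOCK_DIR) -> bool:
    """Report whether the lock file for the given agent exists."""
    return disabled_lock_path(agent, agent_id, lock_dir).exists()
```

Note that this only reports whether picking is disabled. It says nothing about whether an agent is still processing a job it had already collected.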