Introduction

This website provides documentation for cback Operators at CERN; however, the information provided is largely applicable to anyone who wants to run cback themselves.

cback is an orchestration tool built on Restic for the management and automation of a high volume of backups to an S3 endpoint.

Restic is not explained in detail in these docs, so it is recommended to gain a basic understanding of how the tool operates from the upstream documentation before attempting to use cback.

Is cback for me?

cback was designed for backing up large network file systems where high parallelism is possible (home folders, project spaces, etc.). In contrast to other backup software and systems, cback is considered to be "always-running": the software is run as a service that performs backups.

cback uses a centralized database to persist stateful information on different jobs (their state, progress, target destination and more) which can be queried, run and subsequently updated by a stateless agent of a corresponding type. cback allows for a scalable number of these agents.
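The relationship between the central database and a stateless agent can be illustrated with a minimal sketch. This is not cback's actual schema or code: the `job` table, its columns, the status values (`pending`, `running`, `completed`) and the function names are all hypothetical, and an in-memory SQLite database stands in for the real job DB.

```python
import sqlite3


def make_db():
    """Create a hypothetical job table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE job ("
        "id INTEGER PRIMARY KEY, repository TEXT, status TEXT, worker TEXT)"
    )
    conn.executemany(
        "INSERT INTO job (repository, status) VALUES (?, 'pending')",
        [("bucket-a",), ("bucket-b",)],
    )
    conn.commit()
    return conn


def pick_job(conn, agent_name):
    """Claim one pending job for this agent; return its id, or None."""
    with conn:  # commits on success, rolls back on error
        row = conn.execute(
            "SELECT id FROM job WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard keeps a job from being claimed twice.
        cur = conn.execute(
            "UPDATE job SET status = 'running', worker = ? "
            "WHERE id = ? AND status = 'pending'",
            (agent_name, row[0]),
        )
        return row[0] if cur.rowcount == 1 else None


def finish_job(conn, job_id):
    """Record the result; the agent itself keeps no state afterwards."""
    with conn:
        conn.execute(
            "UPDATE job SET status = 'completed', worker = NULL WHERE id = ?",
            (job_id,),
        )
```

Because all state lives in the database, any number of agents could run the same pick, work, update loop against it, which is the property that makes the agent count scalable.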

A cback backup has no fixed schedule of the form "run at hour X on day Y every Z weeks". Instead, scheduling is controlled by a job expiration timer that defines how frequently a job is allowed to run.

In essence, this means that while the rough window in which a job will be re-run can generally be determined, there is no guarantee that a job that has exceeded this timer will be picked up and run immediately.
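The difference between a fixed schedule and an expiration timer can be sketched as follows. This is an illustration under stated assumptions, not cback's implementation: the function name `is_eligible` and its parameters are hypothetical.

```python
from datetime import datetime, timedelta


def is_eligible(last_finished, expiration, now):
    """Return True once the expiration timer has elapsed.

    Eligibility only means an agent MAY pick the job; it says nothing
    about when an agent actually will.
    """
    if last_finished is None:  # the job has never run
        return True
    return now - last_finished >= expiration


# Hypothetical job that last finished at midnight with a 24-hour timer.
last_run = datetime(2024, 1, 1, 0, 0)
timer = timedelta(hours=24)

still_fresh = is_eligible(last_run, timer, datetime(2024, 1, 1, 12, 0))
expired = is_eligible(last_run, timer, datetime(2024, 1, 2, 6, 0))
```

Here `still_fresh` is `False` and `expired` is `True`. An expired job simply becomes a candidate for picking, so the actual start time depends on when a free agent next selects it.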

If you are looking for a tool to perform point-in-time consistent backups at specific intervals, cback may not be for you.

If you are looking for a tool to orchestrate a large number of parallel backups across an organisation where the exact run time is less important than data integrity, you might be in the right place :)

Preread Glossary

Below is a list of common terms that are used throughout this documentation.

cback system - A collection of hosts that provide cback orchestration.
Worker node  - An individual physical or virtual node in a cback system.
Agent        - A cback job agent, deployed on a Worker node.
Job          - A tangible unit of work stored in the cback database.
Picking      - The selection of a given cback job for execution by a cback agent.
Job DB       - A central database storing all job definitions and ancillary information.
Id           - A unique (per job class) integer used for job identification.
Repository   - An S3 bucket, storing a restic repo that jobs can target for work.
Workload     - A collection of cback jobs that form a lifecycled backup for a given repository.
Status       - The current lifecycle status of a given cback job.
Group        - A delimited group of cback workloads with a specific configuration policy.

Within the docs, the following tags are used to draw attention to particularly important aspects of the topic at hand:

[note] - refers to a concept the author believes to be fundamentally useful to cback operation
[warning❗] - refers to a disclaimer regarding a method by which you may damage cback