Backup and Restore Stateful Workloads using Velero and Restic

This is a quick post summarising how to use Velero to back up stateful workloads running on Kubernetes without using a cloud provider plugin to snapshot the persistent volumes.

The current setup

I’ve got a Tanzu Kubernetes Grid cluster provisioned with the Guestbook application installed into a namespace called (imaginatively) guestbook. I’ve also added some random comments to the guestbook so we have some data to back up from the persistent volume.

Guestbook Before

I’ve also got Minio installed on another VM, which will act as the S3-compatible object store for the backups.

Install Velero

Installing Velero is simples.

Install the Velero CLI using the right method for your OS. I’m using a Mac, so Homebrew; further details are available in the docs.

Once installed, ensure your kubeconfig is pointing at the right cluster and run the command below. At a high level, this installs Velero into the cluster (creating CRDs, DaemonSets, controllers etc.). I’ve used these options in this case:

  velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0 \
    --bucket velero \
    --secret-file velero-cred \
    --use-volume-snapshots=false \
    --use-restic \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.15.15:9000
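
The velero-cred file passed to --secret-file isn’t shown above. With the AWS plugin it uses the AWS shared credentials file format, so a minimal sketch looks like the below. The key values are placeholders (my assumption, not real credentials), so swap in the access key and secret key of your own Minio instance.

```shell
# Write the credentials file referenced by --secret-file.
# The values below are placeholders, not real credentials.
cat > velero-cred <<'EOF'
[default]
aws_access_key_id = <your-minio-access-key>
aws_secret_access_key = <your-minio-secret-key>
EOF
```

Keep this file out of version control, as it holds the keys Velero uses to write to the bucket.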

Confirm everything is OK with the deployed pods; for example, they won’t start if you’ve configured the backup location incorrectly.

  kubectl get pod --namespace velero

I’ve only got 2 worker nodes so only 2 restic pods:

NAME                      READY   STATUS    RESTARTS   AGE
restic-28vsn              1/1     Running   0          29m
restic-q6s66              1/1     Running   0          29m
velero-866bf8c8d5-7pxvr   1/1     Running   0          29m

Backups

The Velero CLI makes this really simple again. In general you can create a backup as a one-off, or create a schedule to run backups at regular intervals. The main decision I had to make was whether to use ‘Opt-in’ or ‘Opt-out’ volume backups. What does that mean?

Opt-In

In this case we run the backup without specifying what to do when a persistent volume is found. Once Velero finds the persistent volume it will check the annotations of the pod that has the volume mounted. If the pod has an annotation like the one below, naming the volume, that volume will be backed up using Restic:

  backup.velero.io/backup-volumes=redis-leader-data
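
As a rough sketch, the annotation can be added with kubectl. The label selector below is an assumption about how the Guestbook pods are labelled, so adjust it (or name the pod directly) to match your deployment.

```shell
# Volume name as defined in the pod spec (taken from the annotation above).
VOLUME_NAME=redis-leader-data

# Assumed label selector; check yours with:
#   kubectl --namespace guestbook get pods --show-labels
kubectl --namespace guestbook annotate pod \
  --selector app=redis,role=leader \
  "backup.velero.io/backup-volumes=${VOLUME_NAME}"
```

An annotation applied directly to a pod is lost when the pod is recreated, so for anything long-lived it’s better set in the pod template of the Deployment.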

If not, Velero will attempt a volume snapshot. In our case this will fail, as we haven’t got that option available to us. The backups will then show as PartiallyFailed, making true errors harder to spot.

Alternatively, you could run the backup with the --snapshot-volumes=false option.

Opt-Out

In the opt-out case we configure the backup job to run with the --default-volumes-to-restic option. For any persistent volumes that are found, Velero will attempt to back them up using Restic. In this case, if you don’t want a volume backed up, you opt out by adding the annotation below to the pod:

  backup.velero.io/backup-volumes-excludes=redis-follow-data

Run the backup

Which method you choose is a judgement call, and it can easily be set per backup job. In general I try to keep it consistent, as that makes supporting the platform easier. Running the backup is a simple case of running the command below.

  velero backup create guestbook-backup \
  --include-namespaces guestbook \
  --default-volumes-to-restic

Or to create an ongoing schedule:

  velero schedule create guestbook-hourly \
  --include-namespaces guestbook \
  --default-volumes-to-restic \
  --schedule="@every 1h"

The schedule can also be set using standard cron notation.
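
For example, a daily schedule using a cron expression might look like the below. The schedule name and the 02:00 run time are just illustrative choices of mine.

```shell
# Standard five-field cron expression:
#   minute hour day-of-month month day-of-week
# "0 2 * * *" means every day at 02:00.
CRON_EXPR="0 2 * * *"

velero schedule create guestbook-daily \
  --include-namespaces guestbook \
  --default-volumes-to-restic \
  --schedule="${CRON_EXPR}"
```

Quoting the expression matters here, otherwise the shell will try to expand the asterisks.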

Check the Backup

A quick velero backup get command will show us a list of the backups and their current status.

NAME                              STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
guestbook-backup                  Completed   0        0          2020-12-09 21:21:37 +0000 GMT   29d       default            <none>
guestbook-hourly-20201209212153   Completed   0        0          2020-12-09 21:21:53 +0000 GMT   29d       default            <none>

If we run velero backup describe guestbook-backup it will show us a little more information, including how many volumes were backed up using Restic. For a more verbose view, velero backup logs guestbook-backup gives you details of every action taken during the backup.

Restores

So we’ve got a successful backup, let’s test the restore. First I’ll delete the namespace that contains the application, which deletes all the objects created within it.

Now if the application were stateless, we could likely redeploy the deployment manifests and be pretty close to the original state of the application. In this case, though, we had all the guestbook comments stored on a persistent volume.

So let’s restore it. No surprises that we’ll be using the Velero CLI. In its simplest form it’s:

  velero restore create guestbook-restore \
  --from-backup guestbook-backup

An option I’ve made use of repeatedly is --namespace-mappings, which enables you to restore to another namespace. This has all kinds of use cases, such as restoring an application for troubleshooting, verifying data, restoring into a test environment or even just testing your backups. I mean, you are testing your backups, right?
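
As a sketch, restoring the backup into a separate namespace looks something like this. The target namespace and restore name are hypothetical; the mapping takes the form source:target.

```shell
# Hypothetical namespace to restore into, leaving the original untouched.
TARGET_NS=guestbook-restore-test

velero restore create guestbook-restore-test \
  --from-backup guestbook-backup \
  --namespace-mappings "guestbook:${TARGET_NS}"
```

Running velero restore describe guestbook-restore-test afterwards shows how the restore got on.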

Once the restore is complete, we open up the application’s page. Hey presto! It’s restored, along with our precious comments.

Guestbook After

Summary

In short, Velero is awesome. In a future post we’ll follow the same process but using the vSphere plugin. This leverages the vSphere CSI to allow volume snapshots.