Backup and Restore Stateful Workloads using Velero and Restic
This is a quick post summarising how to use Velero to back up stateful workloads running on Kubernetes without a cloud provider plugin to snapshot the persistent volumes.
The current setup
I’ve got a Tanzu Kubernetes Grid cluster provisioned, with the Guestbook application installed into a namespace called (imaginatively) guestbook. I’ve also added some random comments to the guestbook so we have some data on the persistent volume to back up.
I’ve also got Minio installed on another VM, which provides the S3-compatible bucket that Velero will write to.
Installing Velero is simples.
Install the Velero CLI using the right method for your OS. I’m on a Mac, so I used Homebrew; further details are available in the docs.
Once installed, ensure your kubeconfig is pointing at the right cluster and run the below command. At a high level, this installs Velero into the cluster (creating CRDs, DaemonSets, controllers etc.). I’ve used these options in this case:
- Installed the AWS plugin, which is needed to communicate with the S3 bucket (Minio)
- Instructed the installer to use the velero bucket
- Pointed the install at the credentials to login to said bucket
- Not to use volume snapshots (No cloud provider in this instance)
- Use Restic
- Some configuration parameters for the S3 bucket
velero install \
    --provider aws \
    --plugins velero/velero-plugin-for-aws:v1.1.0 \
    --bucket velero \
    --secret-file velero-cred \
    --use-volume-snapshots=false \
    --use-restic \
    --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://192.168.15.15:9000
Confirm that the deployed pods are healthy. For example, they won’t start if you’ve configured the backup location incorrectly.
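The velero-cred file passed to --secret-file needs to exist before the install runs. As a rough sketch, the AWS plugin reads the standard AWS shared-credentials format, so the file can be created something like this. The two key values are placeholders of my own; swap in whatever access and secret keys your Minio instance uses.

```shell
# Sketch: write the credentials file that --secret-file points at.
# The two key values are placeholders; substitute your own Minio keys.
cred_file="velero-cred"

cat > "$cred_file" <<'EOF'
[default]
aws_access_key_id = MINIO_ACCESS_KEY_PLACEHOLDER
aws_secret_access_key = MINIO_SECRET_KEY_PLACEHOLDER
EOF

# Keep the keys readable by the current user only.
chmod 600 "$cred_file"
```

The installer stores the contents of this file as a secret in the velero namespace, so the local copy only matters at install time.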
kubectl get pod --namespace velero
I’ve only got 2 worker nodes so only 2 restic pods:
NAME                      READY   STATUS    RESTARTS   AGE
restic-28vsn              1/1     Running   0          29m
restic-q6s66              1/1     Running   0          29m
velero-866bf8c8d5-7pxvr   1/1     Running   0          29m
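If you would rather script that check than eyeball the pod list, something like the below may help. It is only a sketch: check_velero is a helper name I've made up, and it assumes the default install names (namespace velero, deployment velero, daemonset restic). It waits for both rollouts and then lists the configured backup locations.

```shell
# Sketch: wait for the Velero components, then list the backup locations.
# Assumes the default install names: namespace "velero", deployment
# "velero" and daemonset "restic". check_velero is a hypothetical helper.
check_velero() {
  kubectl --namespace velero rollout status deployment/velero --timeout=120s &&
    kubectl --namespace velero rollout status daemonset/restic --timeout=120s &&
    velero backup-location get
}

# Example call:
# check_velero
```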
The Velero CLI makes taking backups really simple too. You can create a backup as a one-off, or you can create a schedule to run backups at regular intervals. The main decision I had to make was whether to use ‘Opt-in’ or ‘Opt-out’ volume backups. What does that mean?
With the opt-in approach, we run the backup without specifying what to do when a persistent volume is found. When Velero finds a persistent volume, it checks the annotations of the pod that has the volume mounted. If the pod has an annotation like the below, the volume will be backed up using Restic:
If not, Velero will attempt a volume snapshot. In our case this will fail, as we have no snapshot provider available. The backups will then show as PartiallyFailed, making true errors harder to spot.
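As a sketch of how that can be applied, the Velero Restic documentation describes the opt-in annotation as backup.velero.io/backup-volumes, with the volume names as its value. The helper name restic_opt_in and its arguments are my own illustration, not part of Velero.

```shell
# Sketch: opt a pod's volume in to Restic backups.
# Arguments: namespace, pod name, volume name(s) as listed under
# spec.volumes in the pod (comma-separated; not the PVC name).
# restic_opt_in is a hypothetical helper name.
restic_opt_in() {
  kubectl --namespace "$1" annotate pod "$2" \
    "backup.velero.io/backup-volumes=$3"
}

# Example call (pod and volume names are placeholders):
# restic_opt_in guestbook <pod-name> <volume-name>
```

For pods managed by a Deployment, the same annotation would normally go in the pod template instead, so that it survives the pod being recreated.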
Alternatively, you could run the backup with the
With the opt-out approach, we configure the backup job to run with the --default-volumes-to-restic option. Velero will then attempt to back up every persistent volume it finds using Restic. If you don’t want a particular volume backed up, you need to opt it out using the below annotation on the pod:
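Again as a sketch, the Velero Restic documentation describes the opt-out annotation as backup.velero.io/backup-volumes-excludes. The helper name restic_opt_out is hypothetical and only there to show the shape of the command.

```shell
# Sketch: exclude a pod's volume from Restic backups when running with
# --default-volumes-to-restic.
# Arguments: namespace, pod name, volume name(s) as listed under
# spec.volumes in the pod (comma-separated; not the PVC name).
# restic_opt_out is a hypothetical helper name.
restic_opt_out() {
  kubectl --namespace "$1" annotate pod "$2" \
    "backup.velero.io/backup-volumes-excludes=$3"
}

# Example call (pod and volume names are placeholders):
# restic_opt_out guestbook <pod-name> <volume-name>
```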
Run the backup
Which method you choose is a judgement call, and it can easily be set per backup job. In general I try to keep it consistent, as that makes supporting the platform easier. Running the backup is a simple case of running the below command.
velero backup create guestbook-backup \
    --include-namespaces guestbook \
    --default-volumes-to-restic
Or to create an ongoing schedule:
velero schedule create guestbook-hourly \
    --include-namespaces guestbook \
    --default-volumes-to-restic \
    --schedule="@every 1h"
The schedule can also be set using standard cron notation.
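For comparison, here is a sketch of the same hourly schedule written in cron notation. The schedule name guestbook-hourly-cron and the wrapper function are my own, purely for illustration.

```shell
# Sketch: an hourly schedule expressed in cron notation.
# "0 * * * *" fires at minute 0 of every hour.
# The function and schedule names are illustrative only.
create_hourly_cron_schedule() {
  velero schedule create guestbook-hourly-cron \
    --include-namespaces guestbook \
    --default-volumes-to-restic \
    --schedule="0 * * * *"
}

# Example call:
# create_hourly_cron_schedule
```

The cron form runs on the hour, whereas @every 1h counts an hour between runs rather than aligning to the clock.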
Check the Backup
The velero backup get command shows us a list of the backups and their current status.
NAME                              STATUS      ERRORS   WARNINGS   CREATED                         EXPIRES   STORAGE LOCATION   SELECTOR
guestbook-backup                  Completed   0        0          2020-12-09 21:21:37 +0000 GMT   29d       default            <none>
guestbook-hourly-20201209212153   Completed   0        0          2020-12-09 21:21:53 +0000 GMT   29d       default            <none>
If we run velero backup describe guestbook-backup, it will show us a little more information, including how many volumes were backed up using Restic.
For a more verbose option, velero backup logs guestbook-backup will give you details about every command run during the backup.
So we’ve got a successful backup; let’s test the restore. First I’ll delete the namespace that contains the application, deleting all the objects created within.
Now if the application was stateless, we could likely redeploy the deployment manifests and be pretty close to the original state of the application. In this case we had all the guestbook comments stored on a persistent volume.
So let’s restore it. No surprises that we’ll be using the Velero CLI. In its simplest form it’s:
velero restore create guestbook-restore \
    --from-backup guestbook-backup
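A minimal sketch of that deletion, wrapped in a helper so nothing runs until you call it. The function name delete_app_namespace is my own invention.

```shell
# Sketch: remove the application namespace and wait for it to go.
# Deleting the namespace also deletes the PersistentVolumeClaims in it.
# delete_app_namespace is a hypothetical helper name.
delete_app_namespace() {
  kubectl delete namespace "$1" --wait=true
}

# Example call:
# delete_app_namespace guestbook
```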
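To keep an eye on the restore, the CLI has get and describe subcommands for restores, much like it does for backups. A small sketch, with restore_status being a helper name I've made up:

```shell
# Sketch: list all restores, then show the detail for one of them.
# Argument: the restore name.
# restore_status is a hypothetical helper name.
restore_status() {
  velero restore get &&
    velero restore describe "$1"
}

# Example call:
# restore_status guestbook-restore
```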
The option I’ve made use of repeatedly is --namespace-mappings, which enables you to restore to another namespace. This has all kinds of use cases, such as restoring an application for troubleshooting, to verify data, to restore into a test environment or even just to test your backups. I mean, you are testing your backups, right?
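As a sketch of how that looks, the flag takes source:target pairs. The target namespace guestbook-test in the example call and the helper name restore_to_namespace are assumptions of mine for illustration.

```shell
# Sketch: restore a backup into a different namespace.
# Arguments: backup name, source namespace, target namespace.
# restore_to_namespace is a hypothetical helper name.
restore_to_namespace() {
  velero restore create "$1-to-$3" \
    --from-backup "$1" \
    --namespace-mappings "$2:$3"
}

# Example call (guestbook-test is a made-up target namespace):
# restore_to_namespace guestbook-backup guestbook guestbook-test
```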
Once the restore is complete, we open up the page with the application on. Hey presto! It’s restored, along with our precious comments.
In short, Velero is awesome. In a future post we’ll follow the same process but using the vSphere plugin. This leverages the vSphere CSI to allow volume snapshots.