Kubernetes 101 – Volumes

March 22, 2020

This is the fourth in a series of blog posts that will explain the different components of Kubernetes, primarily because if I can explain it here, I’ll have learned it quite well myself.

The first part is about Pods and can be found here.
The second part is about Controllers and can be found here.
The third part is about Services and can be found here.

Where does the data go?

Kubernetes was originally intended for deploying stateless applications, so when a pod died or was killed, its data went with it. In fact it’s a little more granular than that: when a container starts, it gets a fresh copy of the filesystem of the image it was built from. So if a container within a pod is restarted, anything written to that filesystem is gone. And if two containers are running inside the same pod, they have logically separate filesystems, so they can’t share data.

This is where volumes come in. What follows is an overview of your options with regard to volumes.

What is a volume?

A volume is a component of a pod and can’t be defined as a separate object in Kubernetes. Because it is defined at the pod level, it persists across individual container restarts. It can also be mounted into the filesystem of any of the containers in that pod. When defining your volume, you also define its type. Typically, if you want to share semi-persistent data between two containers in the same pod, you’re going to use one of the following:

emptyDir - Exactly as it says. When the pod first starts, it is empty until a container writes to it. When the pod dies, the volume (and its data) dies too.
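gitRepo - This starts off as an emptyDir. Then a git repository is cloned into the volume before the containers start. It is not kept up to date while the pod is running. Note that this type is deprecated; the usual replacement is an init container that clones the repository into an emptyDir.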
hostPath - A path on the worker node where the pod is running. This isn’t seen very often as it has quite a lot of drawbacks and isn’t very secure. However, there is the occasional edge case where it is required.

While these solve a problem, they’re not really persistent.
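
As a rough illustration of the emptyDir case, here is a minimal sketch of a pod with two containers sharing one volume. The pod name, container names, images and paths are all made up for the example and are not from any real deployment.

```yaml
# Hypothetical pod: names, images and paths are illustrative only.
apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  containers:
    - name: writer
      image: busybox:1.36
      # Appends a timestamp to a file on the shared volume every 5 seconds.
      command: ["sh", "-c", "while true; do date >> /data/out.log; sleep 5; done"]
      volumeMounts:
        - name: scratch
          mountPath: /data
    - name: reader
      image: busybox:1.36
      # Follows the same file through a different, read-only mount point.
      command: ["sh", "-c", "tail -F /logs/out.log"]
      volumeMounts:
        - name: scratch
          mountPath: /logs
          readOnly: true
  volumes:
    # Defined once at the pod level, mounted by both containers above.
    - name: scratch
      emptyDir: {}
```

If either container is restarted, the file should still be there. If the pod itself is deleted, the volume and its contents go with it.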

Configuration Volumes

Other types that could be used to pass configuration information into a pod are listed below. These will be the subject of a separate blog post in the future.

configMap - Presents a ConfigMap as a volume, with each key appearing as a file.
secret - Same again, but with a Secret.
downwardAPI - Exposes information about the pod itself (such as its name, namespace, labels and annotations) as files, so the application doesn’t have to query the Kubernetes API to find it.
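
To give a feel for how these look in a manifest, here is a small sketch that mounts a configMap volume and a downwardAPI volume into one container. It assumes a ConfigMap called app-config already exists in the same namespace; that name, the pod name and the mount paths are all hypothetical.

```yaml
# Hypothetical pod: the ConfigMap "app-config" is assumed to exist already.
apiVersion: v1
kind: Pod
metadata:
  name: config-demo
  labels:
    app: config-demo
spec:
  containers:
    - name: app
      image: busybox:1.36
      # Prints the mounted files once, then idles so the pod stays up.
      command: ["sh", "-c", "cat /etc/app/* /etc/podinfo/labels; sleep 3600"]
      volumeMounts:
        - name: app-config
          mountPath: /etc/app
        - name: pod-info
          mountPath: /etc/podinfo
  volumes:
    # Each key in the ConfigMap becomes a file under /etc/app.
    - name: app-config
      configMap:
        name: app-config
    # The pod's own labels are written to /etc/podinfo/labels.
    - name: pod-info
      downwardAPI:
        items:
          - path: labels
            fieldRef:
              fieldPath: metadata.labels
```

A secret volume follows the same pattern, using a secret entry with a secretName field in place of the configMap entry.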

Persistent Volumes

What do you do if you want to maintain a central repository of files on the network for your application to access, like static website content? Or run a database like MySQL? Or a stateful cache like Redis or memcached? In other words, you want a chunk of storage that truly persists across pod and node restarts.

If you’re on a public cloud vendor, you’re likely to have options around their block storage provider:

gcePersistentDisk - GCE disk
awsElasticBlockStore - AWS EBS disk

Or you could use an existing on-premises store:

nfs - Path to an NFS server.
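
For example, a pod that mounts an NFS export directly might look something like the sketch below. The server address, export path and pod name are placeholders I’ve made up; the point is that the storage details sit right inside the pod manifest.

```yaml
# Hypothetical pod: the NFS server and export path are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: static-site
spec:
  containers:
    - name: web
      image: nginx:1.25
      volumeMounts:
        - name: site-content
          mountPath: /usr/share/nginx/html
          readOnly: true
  volumes:
    - name: site-content
      # Infrastructure details are hard-coded into the pod spec here.
      nfs:
        server: nfs.example.internal
        path: /exports/site
        readOnly: true
```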

These are great if you’re only using a single provider. Otherwise you need to change your manifests depending on where you’re deploying to. What you really need is to do what the rest of Kubernetes is doing for you, and abstract the infrastructure away from the application. The types you’d be looking at for this are:

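persistentVolume - A statically created volume on shared storage. Unlike the volumes above, this is an object in its own right, usually created by the cluster administrator.
persistentVolumeClaim - The way a pod claims a persistentVolume. The pod references the claim by name, and Kubernetes binds the claim to a matching persistentVolume.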
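
Here is a minimal sketch of how those two fit together with a pod. It assumes an NFS export as the backing store; the server address, object names and sizes are hypothetical.

```yaml
# Created by the cluster administrator. Server and path are placeholders.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: site-content-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.internal
    path: /exports/site
---
# Created by whoever deploys the application.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: site-content
spec:
  accessModes:
    - ReadWriteMany
  # Empty string: bind to a pre-created volume rather than a storage class.
  storageClassName: ""
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: static-site
spec:
  containers:
    - name: web
      image: nginx:1.25
      volumeMounts:
        - name: site-content
          mountPath: /usr/share/nginx/html
  volumes:
    # The pod only knows the claim name, not where the storage lives.
    - name: site-content
      persistentVolumeClaim:
        claimName: site-content
```

The pod manifest no longer mentions NFS at all, so the same manifest could be deployed against a cluster where the claim is satisfied by a completely different kind of storage.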

Now these solve the persistence problem, but they still need to be managed (created, deleted, documented). Say you create a bunch of persistentVolumes for the developers to use. There’s still the “infrastructure bottleneck” when more volumes need to be created. Wouldn’t it be nice if Kubernetes could dynamically create them when the manifest is posted to the API server?

Dynamically Provisioning Persistent Volumes

Wouldn’t you just know it, you can! With the use of the Container Storage Interface (CSI) you can hook into whatever storage your cloud provider offers and initiate a request for a volume. Simply install the provider’s CSI driver (its controllers and any custom resource types) and all this extra functionality is exposed to be consumed by whoever needs it. All the cluster administrator needs to create at this point is a storageClass object (or as many as are required) and let whoever is deploying applications know of its existence. If they really don’t get on, the deployment person can query the API server for available classes, or the cluster administrator could nominate one of the classes as the default.

Once this is done, a persistentVolumeClaim can be created, and a volume matching the requested storageClass and size will be provisioned for it.
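
As a sketch of what that might look like, here is a storageClass and a claim that uses it. The provisioner name csi.example.com is a placeholder, not a real driver; you would substitute the name your provider’s CSI driver registers. The class name, claim name and size are also made up.

```yaml
# Created once by the cluster administrator.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
  annotations:
    # Optional: nominates this class as the cluster default.
    storageclass.kubernetes.io/is-default-class: "true"
# Placeholder: replace with the name of your provider's CSI driver.
provisioner: csi.example.com
reclaimPolicy: Delete
# Delay provisioning until a pod that uses the claim is scheduled.
volumeBindingMode: WaitForFirstConsumer
---
# Created by whoever deploys the application. No persistentVolume is
# written by hand; one is provisioned to satisfy this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
```

The available classes can be listed with kubectl get storageclass. If a claim leaves out storageClassName entirely, the default class (if one is nominated) should be used.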

Summary

There’s no shortage of options with volumes on Kubernetes. It’s more about ensuring that you’re using the right type and are managing them correctly. Anything that can be completed automatically by your platform can only make your life easier!