A container is a type of server virtualization technology. Compared with a virtual machine (VM), it has the following advantages:
Docker and Kubernetes are typical platforms for hosting containers. Containers are also highly compatible with microservices architectures and DevOps, and many companies are promoting containers as part of application modernization.
Another difference between VMs and containers is that containers don’t hold data—that is, they are stateless. For instance, in a VM, the data written to the virtual disk is still available after the VM is restarted. But every time a container is restarted, the data written inside the container is deleted. Therefore, if you’re configuring a stateful application (an application that holds data) such as a database on a container platform, you must make the data available (“persistent”) outside the container.
As an example, in Kubernetes, an external volume that stores data is managed by an object called a persistent volume (PV), so that the data persists. Also, to use this PV, application developers generally make a request by creating a Kubernetes object called a persistent volume claim (PVC). Furthermore, by introducing a storage plug-in for container platforms called a Container Storage Interface (CSI) provisioner, it’s possible to automatically allocate storage volumes that meet the requirements of the application.
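To make the PVC request described above concrete, here is a minimal sketch in Python that builds a PVC manifest as a plain dictionary and prints it as JSON. The claim name `db-data`, the namespace `demo`, and the storage class `example-csi-class` are hypothetical placeholders, not values from any real environment; the storage class is assumed to be backed by whichever CSI provisioner is installed in the cluster.

```python
import json


def build_pvc(name, namespace, storage_class, size):
    """Return a PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # ReadWriteOnce: the volume is mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical storage class; it decides which provisioner serves the claim.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


# All names below are illustrative assumptions.
pvc = build_pvc("db-data", "demo", "example-csi-class", "10Gi")
print(json.dumps(pvc, indent=2))
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be saved to a file and applied as is. The developer states only the size, access mode, and storage class; the details of the backing device stay out of the request.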
With these mechanisms, application developers can immediately use storage suitable for their applications without having to know the specifications of the devices that store the data, so they can focus on development work. For infrastructure administrators, after a storage provider is in place, management tasks such as storage volume creation and deletion are automated, reducing the operational burden. The following illustration shows a PV provisioning flow using NetApp® Astra™ Trident, a storage orchestrator provided by NetApp. (For more information about Astra Trident, see the official documentation and the technical blog.)
By combining container mechanisms such as PVCs and PVs, which abstract complex storage infrastructures, and storage plug-ins such as Astra Trident, development efficiency is dramatically improved and business needs can be quickly met. This is a major benefit of migrating applications to containers. On the other hand, containerization also creates new challenges.
The idea of backup in containerized applications is very different from traditional monolithic applications.
In Kubernetes, for example, an application is managed by a collection of files called manifests. A manifest is typically a YAML file that describes the desired state of Kubernetes objects (pods, services, secrets, PVCs, and so on). Consequently, to protect a container application during operations like backup, replication, and migration, it's necessary to manage and store the many manifests that constitute the application. Managing such a large number of manifests properly is one of the most common challenges in operating a container environment.
Protecting application data (data in a PV) is also important, especially for stateful applications. Data in a PV is generally stored outside Kubernetes, and the data itself isn't managed by a manifest. This means that even if you successfully back up a large set of Kubernetes manifests, that backup alone doesn't protect the data in the PV. Therefore, for a stateful application, in addition to backing up the Kubernetes manifests, you must also back up the application data in external storage, and the backed-up manifests must be consistent with that data.
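The gap between manifests and PV data can be illustrated with a short Python sketch. It takes a set of backed-up manifests (as dictionaries) and a record of which PVCs have a copy of their data, and reports the PVCs whose data is not covered. The manifest contents and the `data_copies` mapping are hypothetical; how a data copy is actually taken depends on the storage system and is outside the scope of this sketch.

```python
def pvc_names(manifests):
    """Collect the names of all PVCs declared in a list of manifest dicts."""
    return {
        m["metadata"]["name"]
        for m in manifests
        if m.get("kind") == "PersistentVolumeClaim"
    }


def unprotected_pvcs(manifests, data_copies):
    """Return PVC names that appear in the manifests but have no data copy."""
    return sorted(pvc_names(manifests) - set(data_copies))


# Hypothetical backed-up manifests for one application.
manifests = [
    {"kind": "Deployment", "metadata": {"name": "web"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "db-data"}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "db-logs"}},
]
# Hypothetical record: PVC name -> identifier of the copied data.
data_copies = {"db-data": "copy-001"}

missing = unprotected_pvcs(manifests, data_copies)
print(missing)  # ['db-logs']
```

In this example the manifests for both PVCs are backed up, yet restoring them would bring back `db-logs` as an empty volume, which is exactly the inconsistency described above.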
In the past, backing up an application often meant taking a system backup of its VM. With containers, the manifests and the externally stored data must be protected together, which can complicate operations.
So, how exactly can you achieve an application backup that includes the data in the PV? In this blog post, we’ll compare three backup methods. Methods 1 and 2 are widely used methods for Kubernetes backup operations; method 3 uses NetApp Astra Control.
| Backup method | Method 1: Etcd backup | Method 2: Version-control system (such as Git) | Method 3: Astra Control |
| --- | --- | --- | --- |
| Backup of application configuration information (manifests) | Application-specific backup and restore isn’t possible because it’s done cluster by cluster | Per manifest | Per namespace or per object with a specific label |
| Backup of application data (data in a PV) | Not included | Not included | Stored externally with standard functionality |
| Key use cases | Scheduled backups in case of site failure or for disaster recovery (DR) | • Backup before and after app updates • Application replication and migration to another cluster | • Scheduled backups in case of site failure or for DR • Backup before and after app updates • Application replication and migration to another cluster |
Method 1 backs up etcd, the database that stores the configuration information for the entire Kubernetes cluster. Because it backs up and restores the whole database, it's widely used for regular backups in case of site failure. On the other hand, it’s difficult to restore only a portion of the database, so this method isn’t suitable for use cases such as restoring specific applications or migrating to a different cluster.
Method 2 uses a version-control system such as Git to manage Kubernetes manifests. Unlike method 1 (etcd backups), this method allows granular, manifest-level backup and restore operations. But the manifests stored in the version-control system must be kept consistent with the configuration of the actual cluster at all times, so this approach requires a reasonably mature version-control process. Also, methods 1 and 2 protect only application configuration information (manifest groups). If the target application is stateful, application data stored outside Kubernetes (in a PV) must be backed up separately.
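The consistency requirement for method 2 can be sketched as a simple drift check between the manifests in the repository and the objects in the cluster. The Python below is an illustration under simplifying assumptions, not a production tool: both sides are given as lists of dictionaries, objects without a namespace are treated as living in `default`, and only the `spec` field is compared. Real cluster objects carry defaulted fields and status information, so a practical comparison would need to normalize them first. All object names are hypothetical.

```python
def index_by_identity(manifests):
    """Map (kind, namespace, name) -> manifest for quick lookup."""
    return {
        (m["kind"], m["metadata"].get("namespace", "default"), m["metadata"]["name"]): m
        for m in manifests
    }


def find_drift(repo_manifests, live_manifests):
    """Compare repository manifests with live objects and report differences."""
    repo = index_by_identity(repo_manifests)
    live = index_by_identity(live_manifests)
    return {
        "missing_in_cluster": sorted(repo.keys() - live.keys()),
        "missing_in_repo": sorted(live.keys() - repo.keys()),
        # Simplification: only the spec field is compared.
        "changed": sorted(
            key
            for key in repo.keys() & live.keys()
            if repo[key].get("spec") != live[key].get("spec")
        ),
    }


# Hypothetical repository contents and cluster state.
repo_manifests = [
    {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 2}},
    {"kind": "Service", "metadata": {"name": "web"}, "spec": {"port": 80}},
]
live_manifests = [
    {"kind": "Deployment", "metadata": {"name": "web"}, "spec": {"replicas": 3}},
    {"kind": "Secret", "metadata": {"name": "api-key"}},
]

drift = find_drift(repo_manifests, live_manifests)
for category, keys in drift.items():
    print(category, keys)
```

Any non-empty category means the repository is no longer a faithful backup of the cluster: here the Deployment was scaled by hand, the Service was never applied, and the Secret exists only in the cluster.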
The third method uses Astra Control to protect your applications. Astra Control is NetApp’s data protection solution for container workloads, allowing you to back up elements in bulk, whether they’re a set of Kubernetes objects with a namespace or specific label, or application data outside Kubernetes (data in a PV). This approach makes it possible to protect applications with simple operations—even for complex backups of stateful applications. The following is the execution flow for backing up a Kubernetes application by using Astra Control.
Astra Control also provides a wide variety of features, including the ability to store backup data in external object storage, the ability to manage periodic backups and backup generations (protection policies), and third-party integration through REST APIs and Python SDKs. It’s suitable for many use cases, including DR and DevOps.
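Selecting the objects that make up an application by namespace or by label, as described above, can be illustrated with a few lines of Python. This is only a sketch of the selection idea over plain dictionaries; it does not use or represent the Astra Control API, and the namespaces, labels, and object names are hypothetical.

```python
def select_objects(manifests, namespace=None, label=None):
    """Return manifests matching a namespace and/or a (key, value) label.

    A filter left as None is not applied.
    """
    selected = []
    for m in manifests:
        meta = m.get("metadata", {})
        if namespace is not None and meta.get("namespace") != namespace:
            continue
        if label is not None:
            key, value = label
            if meta.get("labels", {}).get(key) != value:
                continue
        selected.append(m)
    return selected


# Hypothetical objects spread over two namespaces.
objects = [
    {"kind": "Deployment", "metadata": {"name": "web", "namespace": "shop", "labels": {"app": "shop"}}},
    {"kind": "PersistentVolumeClaim", "metadata": {"name": "db-data", "namespace": "shop", "labels": {"app": "shop"}}},
    {"kind": "ConfigMap", "metadata": {"name": "tuning", "namespace": "shop"}},
    {"kind": "Deployment", "metadata": {"name": "blog", "namespace": "cms", "labels": {"app": "blog"}}},
]

by_namespace = select_objects(objects, namespace="shop")
by_label = select_objects(objects, label=("app", "shop"))
print([m["metadata"]["name"] for m in by_namespace])  # ['web', 'db-data', 'tuning']
print([m["metadata"]["name"] for m in by_label])      # ['web', 'db-data']
```

The two selections differ: the namespace filter also captures the unlabeled ConfigMap, while the label filter captures only the objects explicitly tagged as part of the application.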
In addition to this blog, we publish technical articles on Qiita.com, including an introduction to the basic usage of Astra Control and applied examples in DevOps. Both resources are in Japanese, but the screenshots and code examples are in English. You can also find more information on Astra Control in our documentation.
Be sure to check it out!
Since joining NetApp in 2020, Shimizu Yu has been a part of the company’s consultancy department, providing consulting services focused on areas such as containers/DevOps and AI/ML. He is a professional with more than ten years of experience in the industry, including designing and building IT infrastructure in general, as well as storage, and working as a cloud service provider. In recent years, he has been actively disseminating his knowledge in various fields to the outside world through NetApp events and communities such as Qiita.