
How to back up containerized applications


Shimizu Yu

What is a container, and how is it different from a virtual machine?

A container is a type of server virtualization technology. Compared with a virtual machine (VM), it has the following advantages:

  • It’s lightweight, so applications can be launched in a short time.
  • It shares the Linux kernel with the host operating system for better resource efficiency.
  • The application execution environment can be packaged as a container image, making it easy to ensure consistency from development to production.
  • A container is platform-independent and can run on premises or in the cloud.

Docker and Kubernetes are typical platforms for hosting containers. Containers are also highly compatible with microservices architectures and DevOps, and many companies are promoting containers as part of application modernization.

Managing data in containerized applications

Another difference between VMs and containers is that containers are designed to be stateless: by default, they don’t hold data. For instance, in a VM, the data written to the virtual disk is still available after the VM is restarted. By contrast, data written inside a container is lost whenever the container is replaced, which is what happens when a platform such as Kubernetes restarts it. Therefore, if you’re configuring a stateful application (an application that holds data) such as a database on a container platform, you must make the data available (“persistent”) outside the container.

In Kubernetes, for example, an external volume that stores data is represented by an object called a persistent volume (PV), so the data outlives any individual container. To use a PV, application developers generally request one by creating a Kubernetes object called a persistent volume claim (PVC). In addition, a storage plug-in for container platforms that implements the Container Storage Interface (CSI) can act as a provisioner and automatically allocate storage volumes that meet the requirements of the application.
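To make the PVC request concrete, here is a minimal sketch in Python that builds a PVC manifest as a plain dictionary and prints it as JSON, a format that kubectl accepts alongside YAML. The function name `build_pvc`, the claim name `db-data`, the storage class `example-csi-class`, and the 10Gi size are illustrative assumptions, not values from this post.

```python
import json


def build_pvc(name: str, storage_class: str, size: str) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteOnce: the volume can be mounted read-write by a single node.
            "accessModes": ["ReadWriteOnce"],
            # Hypothetical class name; the provisioner behind this class
            # is what allocates the actual storage volume.
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    pvc = build_pvc("db-data", "example-csi-class", "10Gi")
    print(json.dumps(pvc, indent=2))
```

Note that the claim states only what the application needs (capacity, access mode, storage class). Nothing in it names a device or an array, which is the abstraction the developer benefits from.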

With these mechanisms, application developers can immediately use storage suited to their applications without having to know the specifications of the devices that store the data, so they can focus on development work. For infrastructure administrators, after a storage provisioner is in place, management tasks such as storage volume creation and deletion are automated, reducing the operational burden. The following illustration shows the PV provisioning flow using NetApp® Astra Trident, a storage orchestrator provided by NetApp. (For more information about Astra Trident, see the official documentation and the technical blog.)

Developer to IT Admin data flow diagram

Combining container mechanisms such as PVCs and PVs, which abstract complex storage infrastructures, with storage plug-ins such as Astra Trident improves development efficiency and lets teams respond quickly to business needs. This is a major benefit of migrating applications to containers. On the other hand, containerization also creates new challenges.

Application backup made more difficult by containerization

Backing up a containerized application is very different from backing up a traditional monolithic application.

In Kubernetes, for example, an application is managed by a collection of files called manifests. A manifest is a file, usually written in YAML, that describes the desired state of Kubernetes objects (pods, services, secrets, PVCs, and so on). In other words, to protect a container application during operations like backup, replication, and migration, it’s necessary to manage and store the many manifests that constitute the application. Managing such a large number of manifests properly is one of the most common challenges in operating a container environment.

Protecting application data (data in a PV) is particularly important for stateful applications. Data in a PV is generally stored outside Kubernetes, and the data itself isn’t managed by a manifest. This means that even if you successfully back up the full set of Kubernetes manifests, that backup alone doesn’t protect the data in the PV. Therefore, for a stateful application, the application data in external storage must be backed up in addition to the Kubernetes manifests, and the backed-up manifests must be consistent with the backed-up application data.

external storage to application data back up flow process
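The gap between manifests and data can be illustrated with a short sketch: given a set of manifests (loaded here as plain dictionaries), it lists the PVCs they mount, which are exactly the volumes a manifest-only backup leaves unprotected. The function name `referenced_claims` and the sample objects are hypothetical. The sketch only follows `persistentVolumeClaim.claimName` references in pod specs, so claims created through a StatefulSet’s `volumeClaimTemplates` would need separate handling.

```python
from typing import Any, Iterable, Set


def referenced_claims(manifests: Iterable[dict]) -> Set[str]:
    """Collect the names of all PVCs that the given manifests mount as volumes."""
    claims: Set[str] = set()

    def walk(node: Any) -> None:
        # Search the whole document, because the pod spec sits at different
        # depths in a Pod, a Deployment, a Job, and so on.
        if isinstance(node, dict):
            pvc = node.get("persistentVolumeClaim")
            if isinstance(pvc, dict) and "claimName" in pvc:
                claims.add(pvc["claimName"])
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for manifest in manifests:
        walk(manifest)
    return claims


# Hypothetical application: one Deployment that mounts a claim, plus a Service.
SAMPLE_MANIFESTS = [
    {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "db"},
        "spec": {
            "template": {
                "spec": {
                    "containers": [{"name": "db", "image": "example/db:1.0"}],
                    "volumes": [
                        {"name": "data",
                         "persistentVolumeClaim": {"claimName": "db-data"}}
                    ],
                }
            }
        },
    },
    {"apiVersion": "v1", "kind": "Service", "metadata": {"name": "db"}},
]

if __name__ == "__main__":
    for claim in sorted(referenced_claims(SAMPLE_MANIFESTS)):
        print(f"data behind PVC '{claim}' must be backed up separately")
```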

In the past, backing up an application meant little more than taking system backups of its VMs. With containerization, backup operations can become considerably more complicated.

How do you back up containerized applications?

So, how exactly can you achieve an application backup that includes the data in the PV? In this blog post, we’ll compare three backup methods. Methods 1 and 2 are widely used methods for Kubernetes backup operations; method 3 uses NetApp Astra Control.

Method 1: Etcd backup
  • Backup of application configuration information (manifests): Application-specific backup and restore isn’t possible because it’s done cluster by cluster
  • Backup of application data (data in a PV): Not included
  • Key use cases: Scheduled backups in case of site failure or for disaster recovery (DR)

Method 2: Version-control system (such as Git)
  • Backup of application configuration information (manifests): Per manifest
  • Backup of application data (data in a PV): Not included
  • Key use cases: Backup before and after app updates; application replication and migration to another cluster

Method 3: Astra Control
  • Backup of application configuration information (manifests): Per namespace or per object with a specific label
  • Backup of application data (data in a PV): Stored externally with standard functionality
  • Key use cases: Scheduled backups in case of site failure or for DR; backup before and after app updates; application replication and migration to another cluster

Method 1 backs up etcd, the database that stores the configuration information for the entire Kubernetes cluster. Because it backs up and restores that database as a whole, it’s widely used as a regular backup in case of site failure. On the other hand, it’s difficult to restore only a portion of the database, so this method isn’t suitable for use cases such as restoring specific applications or migrating to a different cluster.

Method 2 uses a version-control system such as Git to manage Kubernetes manifests. Unlike method 1 (etcd backups), this method allows granular, manifest-level backup and restore operations. But the manifests stored in the version-control system must be kept consistent with the configuration information of the actual cluster at all times, so the version-control workflow itself requires a degree of operational maturity.

Also, methods 1 and 2 protect only application configuration information (manifest groups). If the target application is stateful, application data stored outside Kubernetes (in a PV) must be backed up separately.
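The consistency requirement of method 2 can be sketched as a simple drift check between a manifest kept in version control and the corresponding live object. This is an illustration under simplifying assumptions, not a production tool: the names `normalize` and `has_drifted` are hypothetical, and the comparison only strips a few fields that the API server populates. Real clusters also fill in default values for many spec fields, so in practice a server-aware comparison such as `kubectl diff` is a better fit.

```python
import copy

# Metadata fields that the API server fills in; they don't appear in a
# manifest kept in version control.
SERVER_MANAGED_METADATA = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "generation",
    "managedFields",
)


def normalize(obj: dict) -> dict:
    """Return a copy of obj without server-populated fields."""
    cleaned = copy.deepcopy(obj)
    cleaned.pop("status", None)
    metadata = cleaned.get("metadata", {})
    for field in SERVER_MANAGED_METADATA:
        metadata.pop(field, None)
    return cleaned


def has_drifted(manifest: dict, live: dict) -> bool:
    """True if the live object no longer matches the stored manifest."""
    return normalize(manifest) != normalize(live)


# Hypothetical ConfigMap as stored in Git.
STORED = {
    "apiVersion": "v1",
    "kind": "ConfigMap",
    "metadata": {"name": "app-config", "namespace": "shop"},
    "data": {"LOG_LEVEL": "info"},
}

if __name__ == "__main__":
    # The same object as returned by the cluster, with server-side fields added.
    live = copy.deepcopy(STORED)
    live["metadata"]["uid"] = "hypothetical-uid"
    live["metadata"]["resourceVersion"] = "12345"
    print(has_drifted(STORED, live))  # unchanged apart from server fields

    # Someone edits the live object directly instead of going through Git.
    live["data"]["LOG_LEVEL"] = "debug"
    print(has_drifted(STORED, live))
```

If a check like this reports drift, restoring from the version-control system would silently roll the cluster back to an older configuration, which is why the manifests and the cluster need to be kept in step.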

The third method uses Astra Control to protect your applications. Astra Control is NetApp’s data protection solution for container workloads, allowing you to back up elements in bulk, whether they’re a set of Kubernetes objects in a namespace or with a specific label, or application data outside Kubernetes (data in a PV). This approach makes it possible to protect applications with simple operations, even for complex backups of stateful applications. The following is the execution flow for backing up a Kubernetes application by using Astra Control.

IT admin or App Developer Astra control back up process
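The idea of scoping a backup to a namespace or a label can be illustrated with a small sketch. It is not Astra Control’s implementation or API; it only shows the selection rule on a list of objects held as plain dictionaries, and the function name `select_objects` and the sample objects are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def select_objects(
    objects: Iterable[dict],
    namespace: Optional[str] = None,
    labels: Optional[Dict[str, str]] = None,
) -> List[dict]:
    """Return the objects in `namespace` (if given) that carry all of `labels` (if given)."""
    selected = []
    for obj in objects:
        metadata = obj.get("metadata", {})
        if namespace is not None and metadata.get("namespace") != namespace:
            continue
        obj_labels = metadata.get("labels", {})
        if labels and any(obj_labels.get(k) != v for k, v in labels.items()):
            continue
        selected.append(obj)
    return selected


# Hypothetical cluster contents.
CLUSTER_OBJECTS = [
    {"kind": "Deployment",
     "metadata": {"name": "web", "namespace": "shop", "labels": {"app": "shop"}}},
    {"kind": "PersistentVolumeClaim",
     "metadata": {"name": "db-data", "namespace": "shop", "labels": {"app": "shop"}}},
    {"kind": "Deployment",
     "metadata": {"name": "batch", "namespace": "tools", "labels": {"app": "reports"}}},
]

if __name__ == "__main__":
    for obj in select_objects(CLUSTER_OBJECTS, namespace="shop"):
        print(obj["kind"], obj["metadata"]["name"])
```

Selecting by namespace or label groups an application’s objects, including its PVCs, into one unit, so the configuration and the data it refers to can be backed up together.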

Astra Control also provides a wide variety of features, including the ability to store backup data in external object storage, the ability to manage periodic backups and backup generations (protection policies), and third-party integration through REST APIs and Python SDKs. It’s suitable for many use cases, including DR and DevOps.
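Managing backup generations comes down to a retention rule: keep the newest N backups and expire the rest. The sketch below shows that rule in isolation. It is a generic illustration, not how Astra Control implements protection policies, and the function name `split_by_retention` and the sample schedule are hypothetical.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_by_retention(
    backups: List[datetime], keep: int
) -> Tuple[List[datetime], List[datetime]]:
    """Return (retained, expired): the `keep` newest backups and everything older."""
    if keep < 0:
        raise ValueError("keep must be non-negative")
    ordered = sorted(backups, reverse=True)  # newest first
    return ordered[:keep], ordered[keep:]


if __name__ == "__main__":
    # Five hypothetical daily backups, retaining three generations.
    first = datetime(2024, 1, 1, 2, 0)
    daily = [first + timedelta(days=n) for n in range(5)]
    retained, expired = split_by_retention(daily, keep=3)
    print("retained:", [d.date().isoformat() for d in retained])
    print("expired: ", [d.date().isoformat() for d in expired])
```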

In addition to this blog, we also publish technical information on Qiita.com, including the basic usage of Astra Control and applied examples in DevOps. Both resources are in Japanese, but the screenshots and code examples are in English. You can also find more information on Astra Control in our documentation.

Be sure to check it out!

Shimizu Yu

Since joining NetApp in 2020, Shimizu Yu has been a part of the company’s consultancy department, providing consulting services focused on areas such as containers/DevOps and AI/ML. He is a professional with more than ten years of experience in the industry, including designing and building IT infrastructure in general, as well as storage, and working as a cloud service provider. In recent years, he has been actively disseminating his knowledge in various fields to the outside world through NetApp events and communities such as Qiita.
