Hello everyone. Today, Xiangqian and I will give a presentation about the Data Protection Working Group in Kubernetes. My name is Xing Yang. I work at VMware on the cloud storage team. I am a co-chair of Kubernetes SIG Storage, and I also co-lead the Data Protection Working Group with Xiangqian. Hello everyone. This is Xiangqian. I am a software engineer on the cloud team, and I co-lead the Data Protection Working Group with Xing.

Here is today's agenda. First, we will talk about the history behind the Data Protection Working Group: who is involved and why we need data protection in Kubernetes. We will then cover the charter, the definition of data protection, the existing building blocks in Kubernetes, what is still missing, and what we are working on in order to fill those gaps. Finally, we will talk about how to get involved.

In Kubernetes, volume snapshot was introduced as an alpha feature in 1.12, was promoted to beta in the 1.17 release, and is targeting GA in 1.20. It allows us to back up and restore a volume based on volume snapshots. However, many things are still missing. At KubeCon in San Diego at the end of last year, we discussed this and decided that we should form a working group to focus on this area. The working group was formally established in January this year, and we have been holding bi-weekly meetings since then. As shown here, many companies have been supporting this working group; both backup vendors and storage vendors have been participating.

Next, I will talk about why we need data protection in Kubernetes. Applications have been around forever, but their architecture has changed drastically over time. With the transition from traditional applications to cloud native applications, the architecture is completely different from its predecessors, so we need a new way to do data protection for cloud native applications. When containerization took off, it was speculated that containerized applications would not need data protection because they are stateless. That has changed: a lot of stateful applications are now running in Kubernetes. Kubernetes stateful applications use persistent volumes to store their data. A persistent volume has a lifecycle independent of the pod that consumes it, so the data can be preserved on the underlying storage system even if the pod goes away. However, what if the underlying volume on the storage system gets corrupted for some reason? What if the underlying storage system is struck by a disaster? When that happens, even data stored on persistent volumes will be gone. To prevent such data loss, we need a way to protect the data stored in the persistent volumes used by Kubernetes stateful applications. Although how to provision persistent volumes is well understood, it is still a challenge to protect your workloads in Kubernetes. This is the problem that the Data Protection Working Group wants to solve.

This is our charter. This working group was formed so that we can have cross-SIG collaboration to figure out what functionality is missing and work together to design features that provide data protection support in Kubernetes. The sponsoring SIGs for this working group are SIG Apps and SIG Storage. Next, Xiangqian is going to talk about the definition of data protection.

Thank you, Xing. Next slide, please. As Xing stated before, more and more stateful workloads have started to move into Kubernetes environments.
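As a minimal sketch of what such a stateful workload looks like, here is a StatefulSet that stores its data on a persistent volume through a volumeClaimTemplate. The names, image, and sizes here are hypothetical; the point is only that the application state lives on a PVC whose lifecycle is independent of the pod.

```yaml
# Minimal sketch of a stateful workload storing data on a persistent volume.
# All names, the image, and the storage size are hypothetical.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongodb
spec:
  serviceName: mongodb
  replicas: 1
  selector:
    matchLabels:
      app.kubernetes.io/name: mongodb
  template:
    metadata:
      labels:
        app.kubernetes.io/name: mongodb
    spec:
      containers:
        - name: mongodb
          image: mongo:4.2
          volumeMounts:
            - name: data
              mountPath: /data/db        # the application state lives here
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi                # the PVC outlives the pod that mounts it
```

The PVC created from the volumeClaimTemplate is exactly the kind of data the rest of this talk is about protecting.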
We have observed a strong desire to protect stateful applications in the Kubernetes context. The main purpose of that protection, of course, is to ensure that those stateful applications or workloads can be restored to a previously preserved state at a given point in time, especially in cases like data corruption or disaster recovery. In the Kubernetes context, we mainly discuss two types of entities: one is the API resources, and the other is the data that users store on persistent volumes. This is itself a complicated, layered problem. So far, there are a couple of approaches, which I will go through later. Next slide, please, Xing.

Part of our challenge is to define the Kubernetes-native constructs that enable backup and restore at different levels. We are not in a position to provide an end-to-end solution that protects every single application, but we do want to provide common building blocks that backup vendors or users can easily use. At the persistent volume level, those include volume snapshot, volume backup, and how to rehydrate a volume from those snapshots and backups during restore. At the application level: how do you know which API resources belong to a specific application, and how do you consistently quiesce and unquiesce an application so that an application-consistent snapshot can be taken? And finally, there is the cluster level. Next slide, please.

As of today, we observe the following backup workflow in the Kubernetes context. The user starts a backup, which actually goes through two steps. The first step is to collect all the Kubernetes resources and back them up into some external backup repository. The other is the data backup. In the data backup piece, there are two models we have observed. One is the so-called application-native data dump: applications like MySQL already have native support for dumping their data into a file, and then some extra component can pick up those dumps and put them in the backup repository. The other model is controller-coordinated. In that model, the application does not have a native data dump mechanism, but it can quiesce and unquiesce itself. The controller will first quiesce the application, create volume snapshots for all the volumes the application is using, and then unquiesce the application so that it can start serving again. After that, the volume or snapshot data can be exported to some external backup repository.

On the reverse side is the restore workflow. Next slide, please. The user starts a restore: you import the backup into the cluster and first restore the Kubernetes resources. PVs and PVCs need to be handled specially because there are many dependencies. In the native model, matching the native model mentioned before, the application can restore from its native data dump. In the non-native model, you simply rehydrate PVCs from volume snapshots and volume backups. So we have these workflows; which building blocks are needed to support them is what we want to answer in this working group. Next slide, please.

So what exists at this moment? The existing building blocks. From an application's perspective, we have the workload APIs, such as StatefulSets and Deployments, and the Application CRD. Those are high-level constructs that group a set of Kubernetes resources together to form your application.
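As a rough illustration of that grouping, an Application CR from the SIG Apps project ties resources together by label selector and component kinds. The sketch below is hedged: the names and descriptor values are hypothetical, and the exact schema depends on the version of the CRD you install.

```yaml
# Hedged sketch of a SIG Apps Application CR grouping the pieces of a MongoDB app.
# Names and descriptor values are hypothetical; check the installed CRD for the exact schema.
apiVersion: app.k8s.io/v1beta1
kind: Application
metadata:
  name: mongodb
spec:
  selector:
    matchLabels:
      app.kubernetes.io/name: mongodb   # every component of the app carries this label
  componentKinds:                       # which resource kinds make up the application
    - group: ""                         # core API group (Service, Secret)
      kind: Service
    - group: apps
      kind: StatefulSet
    - group: ""
      kind: Secret
  descriptor:
    type: mongodb
    version: "4.2"
```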
On the storage side, we have volume snapshots, which are built on top of PVCs and can take a point-in-time snapshot of a volume. So how do these building blocks fit into the picture? Next slide, please. The workload APIs and the SIG Apps Application CRD fit into the Kubernetes resource backup process; they provide a convenient way to group Kubernetes resources together. Next slide, please.

Let's take a look at what this looks like for the Application CRD. The Application CRD is nothing more than an API for managing applications in Kubernetes; it aggregates individual Kubernetes components. In this example, it is an Application CR for MongoDB. It contains a Service, a StatefulSet, and maybe some Secrets underneath in the YAML. The key point is that with this single CR, you can tell which resources belong to this MongoDB application in the Kubernetes world.

Now, let's take a look at how volume snapshot fits into the picture. Next slide, please, Xing. In the backup workflow, volume snapshot can be used in the controller-coordinated model, where a volume snapshot is created after the application is quiesced. Next slide, please. In the restore workflow, a volume snapshot can be used to rehydrate a PVC and restore the workload to its previous state. Volume snapshot has been in beta since 1.17. Next slide, please. We plan to move the feature to GA in 1.20. An additional validating webhook was added in 1.19, and we are now working on enhancing the observability around the controllers and adding more end-to-end tests, mostly stress tests. We are looking forward to the GA of this feature.

We have talked about what exists; what is missing? Next slide, please. We are missing a whole lot, and this is not yet a complete list: volume backup, backup repository, quiesce and unquiesce hooks, et cetera. If you look at the full picture, next slide, please: the green boxes are components that exist today, the yellow boxes are work in progress, and the orange boxes are not there yet. Application backup can be used to group the resources and coordinate the data backup as well. Container notifier fits into multiple scenarios in this backup workflow. Next slide, please. The restore workflow is similar, so I will not spend too much time on it. I will let Xing go through all of these orange and yellow boxes in the next couple of slides. Thanks, Xing. It's all yours.

Thanks, Xiangqian. I am going to explain in more detail why we think these are missing building blocks and what we are planning to do about them. The first missing building block we identified is volume backup. We need this because we need to extract data to secondary storage. We already have a volume snapshot API, but there is no explicit definition in its design for storing snapshots on a backup device separate from the primary storage. For some cloud providers, a snapshot is actually a backup that is uploaded to an object store in the cloud. However, for most other storage vendors, a snapshot is stored locally alongside the volume on the primary storage. Therefore, it is impossible to design a portable data protection policy that works for all storage vendors. Without a volume backup API, the alternative is for backup vendors to implement two solutions: for storage systems that upload snapshots to an object store automatically, a snapshot is a backup; for storage systems that only take local snapshots, use the volume snapshot API to take the snapshot and then have a data mover upload it to a backup device. We have just started discussions about this in the working group.
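For reference, taking a snapshot of a PVC with the existing volume snapshot API looks roughly like this. This is a minimal sketch using the v1beta1 API; the snapshot class and PVC names are hypothetical.

```yaml
# Minimal sketch: snapshot an existing PVC using the volume snapshot beta API.
# The snapshot class and PVC names are hypothetical.
apiVersion: snapshot.storage.k8s.io/v1beta1
kind: VolumeSnapshot
metadata:
  name: mongodb-data-snap
spec:
  volumeSnapshotClassName: csi-snapclass       # maps to a CSI driver's snapshot class
  source:
    persistentVolumeClaimName: data-mongodb-0  # the PVC to snapshot
```

Note that where the resulting snapshot physically lives, local to the array or uploaded to an object store, is entirely up to the storage backend, which is exactly why a separate volume backup API is being discussed.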
Let's take a look at this diagram. Volume backup is next to volume snapshot here. We put it in an orange box to indicate that it is a missing Kubernetes component: we have started discussions about it, but there is no concrete design yet.

The next one is CBT, changed block tracking, along with the changed file list. Without CBT and a changed file list, backup vendors have to do full backups all the time, which is not space efficient, takes longer to complete, and uses more bandwidth. Another use case is snapshot-based replication, where you take snapshots periodically and replicate them to another site for disaster recovery purposes. So what are the alternatives? Without CBT, we can either do full backups or call each storage vendor's API individually to retrieve the changed blocks, which is highly inefficient. We have just started a discussion about this in the working group.

The next missing building block is the backup repository. A backup repository is a location, or repo, to store data. This can be an object store in a cloud, an on-prem storage location, or some NFS-based solution. There are two types of data that need to be backed up and stored: Kubernetes cluster metadata and local snapshot data. We need to back them up and store them in a backup repository. Currently, there is a proposal for an object store backup repository: the proposal for object bucket provisioning, or COSI. It proposes object storage Kubernetes APIs to support orchestration of object store operations for Kubernetes workloads. The goal is to make object storage a first-class citizen in Kubernetes, just like file and block storage. It also introduces the Container Object Storage Interface, or COSI, a set of gRPC interfaces for provisioning object storage. COSI is already a subproject under SIG Storage. The KEP was merged as provisional in 2020, and the plan is to do prototyping; we already have GitHub repos created for this. There is also a session about COSI at KubeCon; check it out if you are interested. Let's see where COSI is in this diagram. COSI is shown in a yellow box, indicating that it is a work-in-progress Kubernetes component. It serves as an object store backup repository and can be used to export and store backup data. Now let's take a look at restore: COSI is used to import backup data at restore time.

The next one is the generic data populator. Currently, we can only create a PVC from another PVC or from a volume snapshot. But what if the backup data is stored in a backup repository such as an object store? The generic data populator feature allows us to provision a PVC from an external data source such as a backup repository. In addition, it allows us to dynamically provision a PVC with data populated from that backup repository and honor the WaitForFirstConsumer volume binding mode during restore, to ensure that the volume is placed on the node where the pod is scheduled. There is an AnyVolumeDataSource alpha feature gate, which was introduced in 1.18. In 1.20, the plan is to do the design and prototyping for the generic data populator implementation. Now let's take a look at the diagram. We can see that the generic data populator is needed at restore time. It is in a yellow box, indicating that it is a work-in-progress Kubernetes component, and it is used to rehydrate a PVC from a backup repository during restore.
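As a minimal sketch of today's restore path, a new PVC can be rehydrated from a VolumeSnapshot through the dataSource field; the generic data populator work would extend the same pattern to other sources such as a backup repository. The storage class, snapshot name, and size below are hypothetical.

```yaml
# Minimal sketch: rehydrate a new PVC from an existing VolumeSnapshot.
# StorageClass, snapshot name, and size are hypothetical.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-mongodb-0-restored
spec:
  storageClassName: csi-sc
  dataSource:                        # today: PVC or VolumeSnapshot; populators would generalize this
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: mongodb-data-snap
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```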
The next one is quiesce and unquiesce hooks. We need these hooks to quiesce an application before taking a snapshot and unquiesce it afterwards to ensure application consistency. We investigated how quiesce and unquiesce work for different types of workloads. We looked at relational databases such as MySQL, which provides a FLUSH TABLES WITH READ LOCK command that we can use to quiesce. We looked at time-series databases such as Prometheus and InfluxDB, which do not have an explicit quiesce command but provide a CLI that supports consistent backups and restores. For key-value stores, we looked at etcd as an example: it provides commands to back up and restore but does not have an explicit quiesce command. We also looked at message queues such as Kafka. Kafka is designed for fault tolerance; we can do backup and restore in Kubernetes on a best-effort basis, but there are many issues: partitions may be rebalanced in Kafka when a broker goes offline, and that might cause data loss. We also looked at distributed databases such as MongoDB, which provides a command that flushes all pending write operations to disk and locks the MongoDB instance against writes, so we can use that command to quiesce; but we need to keep in mind that the processes to back up non-sharded versus sharded MongoDB databases are different. We don't have time to go over all the findings here, but we will include them in the white paper that we are working on. We want to design a generic mechanism to run commands in containers, but we want to mention that application-specific semantics are out of scope. We currently have a KEP submitted for a proposal called container notifier, and it is being reviewed. Let's take a look at the diagram here. Container notifier is mainly used at backup time to quiesce before taking a snapshot and unquiesce afterwards. This is also a work-in-progress Kubernetes component.

The next one is consistent group snapshot. We talked about the container notifier proposal, which tries to ensure application consistency. What if we cannot quiesce the application, or quiescing the application is too expensive, so you want to do it less frequently but still be able to take a crash-consistent snapshot more frequently? Also, an application may require the snapshots of multiple volumes to be taken at the same point in time. That is when consistent group snapshot comes into the picture. There is a KEP for volume group and group snapshot. It proposes a new volume group CRD that groups multiple volumes together and a new group snapshot CRD that supports taking a snapshot of all volumes in a group to ensure write-order consistency. The KEP is being reviewed. Let's take a look at the diagram here. We don't have container notifier to do the quiesce here, but we have consistent group snapshot, which facilitates creating a snapshot of multiple volumes in the same group to ensure write-order consistency.

The next one is application snapshot and backup. We have snapshot APIs for individual volumes, but what about protecting a stateful application as a whole? There is a KEP submitted that proposes a Kubernetes API that defines the notion of a stateful application and defines how to run operations on those stateful applications, such as snapshot, backup, and restore. This is still at a very early design stage. As shown in this diagram, application backup handles the backup of a stateful application. It can leverage container notifier to do the quiesce and use COSI as a backup repository.
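To give a feel for what such quiesce and unquiesce hooks would run, here is a purely hypothetical sketch for a non-sharded MongoDB. This is not the container notifier API from the KEP; the field names are invented, and it only illustrates the kind of pre- and post-snapshot commands a generic hook mechanism could execute inside the application container.

```yaml
# Purely hypothetical hook sketch; NOT the container notifier API from the KEP.
# It only illustrates the commands a quiesce/unquiesce mechanism might run
# inside a non-sharded MongoDB container around a volume snapshot.
preSnapshotHook:                                    # hypothetical field name
  container: mongodb
  command: ["mongo", "--eval", "db.fsyncLock()"]    # flush pending writes and block new ones
postSnapshotHook:                                   # hypothetical field name
  container: mongodb
  command: ["mongo", "--eval", "db.fsyncUnlock()"]  # resume writes after the snapshot
```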
Similarly, we can have an application restore that handles the restore of a stateful application. So these are all the missing building blocks that we have identified and are working on. We hope that eventually we can turn all of these yellow and orange boxes into green ones, and when that happens someday in the future, our mission will be accomplished.

All right, next I am going to talk about how to get involved. As discussed in the previous slides, this working group is identifying the functionality that is missing to support data protection in Kubernetes and figuring out how to fill those gaps. We have bi-weekly meetings on Wednesdays at 9 a.m. Pacific time. If you are interested in joining the discussions, you are welcome to join our meetings. We also have a mailing list and a Slack channel, as shown here. This is the end of the presentation. Thank you all for attending the session. If you have any questions, please don't hesitate to reach out to us. Thank you.