Hello everyone. Today, Xiang Cao and I will give a deep dive into the Kubernetes Data Protection Working Group. My name is Xing Yang. I work at VMware in the cloud-native storage team. I'm also a co-chair of the CNCF TAG Storage and Kubernetes SIG Storage, and I work with Xiang Cao in the Data Protection Working Group. Hello everyone. This is Xiang Cao. I am a software engineer at Google. I lead the Data Protection Working Group in Kubernetes along with Xing. Next slide, please.

We have a pretty full agenda today. First, we will quickly go through the motivations: what problems we are trying to solve within this Data Protection Working Group. Then we'll scan through the enterprises and companies involved in this effort. We have some exciting key updates we want to share with you all today, and then we will spend the majority of the time deep diving into the different components and the progress this group has been making in the past year or two. Lastly, Xing will go through how you can get involved in the Data Protection Working Group. Moving on to the next slide.

Motivation-wise, we are all aware that day-one operations for stateful workloads are very well supported today for Kubernetes users. We have constructs like persistent volume claims and persistent volumes, whose life cycle is decoupled from the pod's life cycle. Various operations are supported there, including volume provisioning, volume deletion, and so on. And we have a good set of workload APIs that allow you to use persistent volumes to deploy your stateful workloads in the Kubernetes context, such as Deployments, StatefulSets, and DaemonSets.

Now, more and more stateful workloads are moving into Kubernetes because of the benefits it brings in ease of management, scaling, and so on, and we have observed great productivity gains from moving off of more legacy infrastructure into a Kubernetes environment. With that drive, we see more and more gaps in the data operations needed to protect your valuable data assets in a Kubernetes environment. There are tools like GitOps which can effectively protect your application configuration, but there are still significant gaps in protecting your application state; in other words, your application's data in the Kubernetes context. So this is the main motivation of this Data Protection Working Group, and we are focusing on building or proposing components to support data operations. Next slide, please.

Thanks to all who have been contributing to this working group. This is a list of companies that are supporting the initiative; if you don't see your name there, please don't hesitate to reach out to me or Xing. We're more than happy to add it. Next slide.

There are some key updates I want to share here. The first is that we have published a white paper. It's a fairly long white paper that clearly articulates, as a working group, what we see as the problems, where the gaps are in Kubernetes, which components this group is working on, and some of the directions we are taking. The second link is the annual report, which documents what was achieved in 2022 and, looking forward, what this group will continue to focus on. We have also given quite a few talks in the past; all the links are below, so if you're interested, please go ahead and watch them. Next slide, please.
Now, before we dive into individual components, let's revisit what we see as an application backup in the Kubernetes context, and also how the data gets restored. Starting all the way on the left: what we ultimately want to do in the backup workflow is an application-level backup, and a typical stateful application consists of two parts. One is the Kubernetes resources that shape the application: for example, how many pods and persistent volumes it needs, how it should be scaled, how much memory it gets, and so on. The other big piece is the so-called data backup, meaning the data stored on persistent volumes. A persistent volume may well be backed by an externally managed storage system, with the volume attached to your pod, so typically that data lives outside of Kubernetes itself. So an application backup contains these two pieces.

As you can imagine, one of the big things we want to solve is: how do I effectively define an application? For example, a simple web service can contain a service entry, maybe an ingress entry, some secrets for the application, and then your Deployment or StatefulSet, and so on. All of these are the Kubernetes resources that construct your application. On the data front, we have all the data stored on the persistent volumes, and obviously there are many ways to back up your volume. We distinguish roughly two categories: one is the so-called native data dump, and the other is controller-coordinated, orchestrated volume backups. Native data dumps include things like MySQL dumps, while controller-coordinated backups can be volume-snapshot-based or changed-block-based, and so on. And then, of course, there need to be mechanisms to quiesce the application and unquiesce it afterwards, so that we can achieve application consistency.

On the right-hand side are the other building blocks we think are needed to support the entire application backup workflow. The green boxes exist today. Some good news here: in 1.27, consistent volume group snapshot support has been released to alpha, and volume mode conversion has been released to beta. We have COSI, the Container Object Storage Interface, to natively support object storage bucket provisioning and permission management; it has been in alpha since 1.25. Xing and I will give more details later on exactly what those components do. Next slide, please.

The restore workflow is typically the reverse. You have a backup, and you want to restore your application to the previously preserved state. Often the most challenging piece is the PVC and PV restoration. It needs quite a bit of special handling, and that's where components like volume snapshots for rehydrating the PVC come into the picture, along with the volume populator, which we will talk about; it went to beta in 1.24. Again, COSI is in the picture as well, because object storage often serves as the backup repository that stores your backups. Next slide, please.
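To make the "rehydrating the PVC" step in that restore workflow concrete, here is a minimal sketch of a PVC restored from a previously taken volume snapshot; the snapshot and storage class names are placeholders for illustration, not something from the talk:

```yaml
# Restore a PVC from an existing VolumeSnapshot by naming it as the dataSource.
# "app-data-snap" and "csi-sc" are hypothetical names.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data-restored
spec:
  storageClassName: csi-sc
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: app-data-snap
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```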
Now, let's dive into the individual components. The first one is called volume mode conversion. Why do we need this? There are two considerations here that go in contradiction with each other. The story starts from the fact that allowing volume mode conversion blindly may introduce vulnerabilities to the kernel, which is against the security rules. For example, if you take a snapshot of a block device and try to rehydrate it into a file system PVC, that file system PVC may well contain compromised sections that can cause issues in your kernel. This is a security vulnerability, and we have to solve it. However, this feature is needed in a backup workflow. It's not rare that you take a snapshot of your file system PVC, but instead of doing file-based difference calculation, you want to do block-level differences for efficiency. In this case, you really want to rehydrate the snapshot into a block PVC, calculate the block differences from there, and ship those block differences off-site to your backup repository. So on one hand it's a security vulnerability, and on the other hand there's a need for it in the backup workflow. So what we did here is, next slide, please, introduce this volume mode conversion feature.

Here is how it works. In the VolumeSnapshotContent resource, as of today, we introduce a sourceVolumeMode field, which records what kind of volume this snapshot was taken from, whether a block device or a file system volume. On top of that, an annotation that is part of the API contract, called allow-volume-mode-change, is added on the content. The behavior is that when the reconciler tries to rehydrate a volume from a volume snapshot, it checks whether that annotation is there and has been set to true before it even tries to rehydrate the volume. This way we leave the control to the users, because VolumeSnapshotContent is a non-namespaced resource. We expect users to clearly declare, at rehydration time, whether this snapshot can be rehydrated into a different type of volume. So this is basically the API change. Next slide, please.

This effort has moved to beta in 1.27. A huge callout to the dev lead, Raunak. The KEP link is here; the alpha blog post is there, and a beta blog post will come as well.
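As a rough sketch of the API shape just described, a VolumeSnapshotContent that opts in to mode conversion looks something like the following; the driver, handle, and object names are placeholders, and the field and annotation names follow the KEP as I understand it:

```yaml
# A snapshot taken from a Filesystem volume; the annotation opts in to
# letting it be rehydrated into a PVC with a different volumeMode (e.g. Block).
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotContent
metadata:
  name: snapcontent-example
  annotations:
    snapshot.storage.kubernetes.io/allow-volume-mode-change: "true"
spec:
  deletionPolicy: Delete
  driver: hostpath.csi.k8s.io    # placeholder CSI driver
  source:
    snapshotHandle: snap-123     # placeholder handle on the storage system
  sourceVolumeMode: Filesystem   # the volume mode the snapshot was taken from
  volumeSnapshotRef:
    name: example-snapshot
    namespace: default
```

Without the annotation set to true, the reconciler described above will refuse to rehydrate this snapshot into a PVC whose volume mode differs from sourceVolumeMode.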
Moving on, we will cover the volume populator. This is another component needed in the restoration path. When we create a PVC from some external data source, we used to limit which data sources a PVC can be rehydrated from: only a volume snapshot or another PVC was supported. The volume populator allows backup systems to plug in whatever format their volume backup mechanism uses in order to rehydrate a PVC. It now also supports the WaitForFirstConsumer binding mode. Going to the next slide, please.

There are a couple of components that have been developed in this effort. The first thing is that a volume populator needs some kind of signal. Imagine this controller as a pod running in your cluster: it needs to rely on some signal to trigger its actions. So a CRD that the populator supports is created to trigger those actions, and the corresponding CR becomes the data source reference of a PVC. The volume populator controller then watches PVC creation and examines the data source field; if it is a data source it understands, it takes action. There are also other Kubernetes CSI building components being developed. One of these is the library, which is effectively the API that each individual volume populator can use to develop its controller. The logic for actually rehydrating the data is left to the individual volume populators to implement, but the interface is fixed, to provide a consistent way of implementation. There is also a validator that has been implemented, which generates events on PVCs with a data source if there is no such volume populator or if some error happened during the process. Let's move on to the next slide to go through exactly how it works.

On the right side there are two Kubernetes resources. One is named example-hello, and its kind is Hello. This is the CRD we're talking about; it tells which volume populator should pick up this request. The other one is embedded into the v1 PersistentVolumeClaim, the PVC object. You can see that the dataSourceRef we introduced actually points to the CR specified above. In order to use this feature, you need to enable the AnyVolumeDataSource feature gate; it is beta in 1.24, so by default it should be on. The volume-data-source-validator controller needs to be deployed into the cluster, so that all the events can be populated on the PVC, and the specific volume populator developed by your backup vendor, or whoever, needs to be deployed into the cluster as well.

The workflow looks like this: you create a CR that a volume populator is aware of, and the volume populator watches for the creation of that CR. Then you create a PVC with the data source pointing to that CR; in the example, the PVC points to the example-hello resource. The corresponding volume populator then makes sure a PV is created and populated with the data from the data source specified in your example-hello. This CR is of your own format; in this case, it says the file name is example.txt and the content is as shown. After all this is done, the PV is bound to the PVC you specified, and you can start using the PVC with all your data already in place. With this, the restore workflow becomes more flexible, allowing different volume populators to plug in their own logic for how to rehydrate a volume. Next slide, please.
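Putting that walkthrough together, the two manifests on the slide look roughly like this; the hello.example.com group and v1alpha1 version are illustrative stand-ins for whatever API a given populator registers:

```yaml
# The populator-specific CR: its contents are entirely up to the populator.
apiVersion: hello.example.com/v1alpha1   # illustrative API group/version
kind: Hello
metadata:
  name: example-hello
spec:
  fileName: example.txt
  fileContents: Hello, world!
---
# The PVC points at the CR via dataSourceRef; the hello populator then
# creates and populates the backing PV before the PVC is bound.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
  dataSourceRef:
    apiGroup: hello.example.com
    kind: Hello
    name: example-hello
```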
A big shout-out to Ben, who almost single-handedly implemented all this logic in the community. It has already been in beta for a couple of cycles; the next step is really to move this effort to GA. There are a couple of reference links here; if you're interested, feel free to click through them. With this, I'll hand it over to Xing to go through the other components in this community.

Thanks, Xiang. I'm going to talk about CBT. This is a feature that the Data Protection Working Group is actively working on. CBT stands for changed block tracking. As its name suggests, it identifies blocks of data that have changed, which enables incremental backups. Without CBT, backup vendors have to do full backups all the time; that is not space efficient, takes longer to complete, and needs more bandwidth. The second use case is snapshot-based replication, where you take snapshots periodically and replicate them to another site for disaster recovery purposes. Without CBT, this solution becomes highly inefficient. Without a standard CBT API, we can either do full backups or call each storage vendor's API individually to retrieve the changed blocks, which is not ideal. We do have a KEP that is based on an aggregated API server, to avoid passing CBT records through the Kubernetes API server. However, there were concerns regarding that design.

So now the Data Protection Working Group has been working on a new design that proposes to introduce a Kubernetes API to create a session to request the changed block information, and to add a gRPC API to retrieve the data on the changed blocks within a session. There will be a new CSI CBT sidecar, and CSI drivers will need to implement the CBT logic. The CBT project is led by Ivan and Prasad, and there are others contributing as well.

Next, I'm going to talk about the backup repository. A backup repository is a location, a repo, to store data. This can be an object store, NFS, or another type of storage, and it could be in the cloud or in an on-prem location. There are two types of data to be backed up that we need at restore time: Kubernetes cluster metadata and snapshot data. We need to back them up and store them in a backup repository.

There's a project called COSI aimed at supporting object storage in Kubernetes. COSI provides Kubernetes APIs to provision object buckets and allow the buckets to be consumed by pods. It also introduces gRPC interfaces for object storage providers to write drivers that provision buckets. There are three COSI components: a COSI controller manager that binds COSI-created buckets to bucket claims, a COSI sidecar that watches COSI Kubernetes API objects and calls a COSI driver, and a COSI driver that implements the gRPC interfaces to provision buckets.

There are two sets of COSI Kubernetes APIs. The relationship between the bucket, bucket claim, and bucket class is very similar to that of the PV, PVC, and storage class. There are also Kubernetes APIs to allow a pod to access a bucket. As shown here, the Bucket is cluster-scoped; it represents a physical bucket in the storage system. The BucketClaim is a user's request for a bucket, and the BucketClass allows the admin to describe what type of bucket will be provisioned. It supports three protocols: S3, Azure, and GCS. And here we have the BucketAccessClass, which specifies the type of authentication; the authentication type can be key, which is the default, or IAM, which uses the service account token. In the BucketAccess, we specify the bucket access class name, the bucket claim name, and the credentials secret. The user then creates a pod with a projected volume pointing to the secret referenced in the BucketAccess, and the secret containing the bucket info is mounted in the specified directory.

Steve and Akash have been leading the COSI project. The project moved to alpha status in the Kubernetes 1.25 release. There are weekly stand-up meetings on Thursdays; join the meeting if you are interested in learning more about it and contributing to the project. If you are a storage vendor that has an object storage product, you are welcome to write a driver for COSI. I included a link to a blog post here for your reference.
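For a flavor of what that looks like, here is a minimal sketch of a bucket request plus an access grant; the field names follow the v1alpha1 API as I recall it from the 1.25 alpha, and all the object names are placeholders:

```yaml
# Request a new bucket from a class the admin has defined.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketClaim
metadata:
  name: backup-bucket-claim
spec:
  bucketClassName: backup-bucket-class   # placeholder BucketClass
  protocols:
    - S3
---
# Request credentials for that bucket; the driver writes them into the
# named secret, which a pod can then mount via a projected volume.
apiVersion: objectstorage.k8s.io/v1alpha1
kind: BucketAccess
metadata:
  name: backup-bucket-access
spec:
  bucketClaimName: backup-bucket-claim
  bucketAccessClassName: backup-bucket-access-class
  credentialsSecretName: backup-bucket-creds
  protocol: S3
```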
Now I want to talk about quiesce and unquiesce hooks. We need these hooks to quiesce an application before taking a snapshot and unquiesce it afterwards, to ensure application consistency. We investigated how quiesce and unquiesce work in different types of workloads; they have different semantics. Our goal is to design a generic mechanism to run commands in containers. We currently have a proposal called Container Notifier. It proposes a way to specify, inline in the pod definition, a command to run to quiesce and unquiesce an application. The proposal applies to general cases beyond quiesce and unquiesce. The KEP is still being reviewed.

We just talked about the Container Notifier proposal, which tries to ensure application consistency. But what if you can't quiesce the application, or quiescing the application is too expensive, so you want to do it less frequently but still want to be able to take a crash-consistent snapshot more frequently? Also, an application may require the snapshots of multiple volumes to be taken at the same point in time. There's a performance element here too: it's much more efficient to take one snapshot across all volumes in one step, if the storage system supports it, than to take one snapshot of one volume at a time. That's where the consistent group snapshot comes into the picture.

In the consistent group snapshot design, we have a VolumeGroupSnapshot that is a namespaced object; it represents a user's request for a group snapshot. We have a VolumeGroupSnapshotContent that is cluster-scoped; it represents a group snapshot on the storage system. And we have a VolumeGroupSnapshotClass that allows the admin to define the type of group snapshot. There are new CSI gRPC interfaces to create, delete, and get volume group snapshots, and new logic that manages the life cycle of volume group snapshots is being added to the snapshot controller and the CSI snapshotter sidecar. The KEP for the consistent group snapshot has been merged, and we are targeting alpha in the 1.27 release. I'm leading this feature, and a number of contributors are working very hard on this project; a big shout-out to them. A blog post will be out soon.

Next, I'm going to talk about application snapshot and backup. We have snapshot APIs for individual volumes, but what about protecting a whole stateful application? There is a KEP submitted that proposes a Kubernetes API to define what a stateful application is and how to take a snapshot or backup of those stateful applications. The KEP is still in a very early design stage.

Now let's take a look at this diagram again. COSI moved to alpha in the 1.25 release. Volume mode conversion moved to beta in 1.27. The consistent group snapshot is targeting alpha in 1.27. And in this diagram, the volume populator moved to beta in 1.24. As shown in these diagrams, we have made progress: the colors of COSI, the volume populator, and the consistent group snapshot have changed from yellow, work in progress, to green, existing. We hope to make more progress in the future.

Now let me talk about how to get involved. We have a Data Protection Working Group community page here; it has lots of information to help you get started. We hold bi-weekly meetings on Wednesdays, and we have a mailing list and a Slack channel. Join us and get involved. That's all we have today. Thank you all for attending. Bye-bye.