Hello, everyone. Today, Michelle and I will give an introduction and update on Kubernetes SIG Storage. My name is Xing Yang. I work at VMware on the cloud storage team. I'm also a co-chair of SIG Storage, along with Saad Ali from Google. Hi, I'm Michelle. I'm a tech lead for SIG Storage, and Jan from Red Hat is another tech lead in SIG Storage.

Our session today will include two parts. In the first half, we will give an introduction. In the second half, we will give an update and a deep dive. In the introduction, I'm going to talk about some basic concepts in Kubernetes storage and how to get involved.

First, I'll talk about persistent storage. Kubernetes storage provides a way for containers in pods to consume block or file storage. Persistent storage is one type of storage; it lives beyond a pod's lifecycle. The terms we hear most in SIG Storage are probably PVC, PV, and storage class. A PVC, or persistent volume claim, is a namespaced object. It is a request by a user for storage. A PV, or persistent volume, is a cluster-scoped object. It represents a physical volume on the storage system. A PVC and a PV have a one-to-one mapping. A storage class is also cluster-scoped. It is a way for an admin to describe classes of storage. Different classes might map to different quality-of-service levels or other admin-defined policies. In dynamic provisioning, the storage class is used to determine which provisioner should be used and what parameters should be passed to the provisioner when creating the volume. A pod is a group of one or more containers with shared storage and network resources and a specification for how to run the containers. A pod is a namespaced object, and a PVC is used by a pod.

In static provisioning, a cluster admin creates a number of PVs which carry the details of the real storage. The control plane can bind PVCs to PVs in the cluster. However, if you want a PVC to bind to a specific PV, you need to pre-bind them. When none of the static PVs the admin created matches a user's PVC, the cluster may try to dynamically provision a volume especially for that PVC. This provisioning is based on storage classes: the PVC must request a storage class, and the admin must have created and configured that class for dynamic provisioning to occur.

Here is an example of a pod, a PVC, and a storage class. The pod is using the PVC. The PVC has a capacity request, the access mode is ReadWriteOnce here, and the storage class name is specified here. In the storage class there is a provisioner that determines which volume plugin is used for provisioning PVs. The reclaim policy is Retain. This means the PV will remain, along with the physical volume on the storage system, when the user deletes the PVC. If the reclaim policy is Delete, the PV, along with the physical volume, will be deleted when the user deletes the PVC. allowVolumeExpansion is true, so volume expansion can be requested by the user here. And the volume binding mode is Immediate in the storage class. Immediate means volume binding and dynamic provisioning occur as soon as the PVC is created. However, this may result in unschedulable pods. A cluster admin can address this issue by specifying the WaitForFirstConsumer mode, which delays the binding and provisioning of a PV until a pod using the PVC is created. The storage class also has parameters that are storage-provider specific and opaque to Kubernetes. A sketch of these objects is shown below.

Next, I'm going to talk about ephemeral storage. Ephemeral storage becomes available when the pod is started and goes away when the pod goes down.
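To make the pod, PVC, and storage class walkthrough concrete, here is a minimal sketch of the three objects just described. The provisioner name, storage class name, image, and sizes are illustrative assumptions rather than the exact values from the slide:

```yaml
# Hypothetical storage class; the provisioner name is an assumption.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: csi.example.com          # vendor-specific CSI driver
parameters:                           # opaque to Kubernetes, passed to the provisioner
  type: ssd
reclaimPolicy: Retain                 # keep the PV and backing volume after the PVC is deleted
allowVolumeExpansion: true            # users may request a larger size later
volumeBindingMode: Immediate          # or WaitForFirstConsumer to delay binding until a pod is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9  # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-claim
```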
We have local ephemeral storage such as emptyDir volumes, secrets, config maps, and the downward API. And we have CSI ephemeral volumes and generic ephemeral volumes.

An emptyDir volume is first created when a pod is assigned to a node, and it exists as long as that pod is running on that node. As the name suggests, the emptyDir volume is initially empty. All containers in the pod can read and write the same files in the emptyDir volume, though the volume can be mounted at the same or different paths in each container. When a pod is removed from a node, the data in the emptyDir is deleted permanently. An emptyDir is usually used as scratch space.

A secret volume is used to pass sensitive information, such as passwords, to pods. You can store secrets in the Kubernetes Secret API and mount them as files for use by pods. Secret volumes are backed by tmpfs, so they are never written to non-volatile storage.

The next one is a config map. A config map provides a way to inject non-confidential configuration data into pods. When referencing a config map, you provide the name of the config map in a volume. In this example here, the config map is mounted as a volume, and all of its contents are mounted into the pod at the path derived from the mount path and the key in the config map.

Here's an example of the downward API volume. It makes downward API data available to applications. It mounts a directory and writes the requested data in plain-text files. In this pod.yaml file, you can see that the pod has a downward API volume and the container mounts the volume at a specified location. Each element in items is a downward API volume file. The first element specifies that the value of the pod's metadata labels field should be stored in a file named labels.

Now I'm going to talk about CSI inline ephemeral volumes. In this example here, we set the volume type to CSI in the pod's inline definition and specify the driver name and volume attributes. For a CSI driver to support CSI ephemeral volumes, it must be modified or implemented specifically for this purpose. A CSI driver is suitable for CSI ephemeral inline volumes if it serves a special purpose and accepts custom per-volume parameters, like drivers that provide secrets to your pod. The Secrets Store CSI driver is a good example. A CSI driver is not suitable for CSI ephemeral inline volumes when provisioning is not local to the node, or when ephemeral volume creation requires volume attributes that should be restricted to an admin, for example, parameters in a storage class.

Next, I'm going to talk about generic ephemeral volumes. The generic ephemeral volume feature allows any existing storage driver that supports dynamic provisioning to be used as an ephemeral volume, with the volume's lifecycle bound to the pod. It can be used to provide scratch storage that is different from the root disk, for example persistent memory or a separate local disk on that node. All storage class parameters for volume provisioning are supported. All features supported with PVCs are supported, such as storage capacity tracking, snapshotting, cloning, and volume resizing. This feature has been beta since the 1.21 release and is targeting GA in 1.23. A combined sketch of these ephemeral volume types is shown below.

Next, I'm going to talk about volume plugins. Kubernetes volume plugins include in-tree plugins, out-of-tree FlexVolume plugins, and CSI drivers. Some in-tree plugins, such as those ephemeral ones I mentioned earlier, will stay in-tree, but most other in-tree plugins are either deprecated or are migrating to CSI drivers. Michelle will talk more about that later.
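Here is a combined sketch of several of the ephemeral volume types covered above in a single pod: an emptyDir, a config map, a downward API volume, a CSI inline volume, and a generic ephemeral volume. The config map name, secret provider class, storage class, image, and sizes are illustrative assumptions, and the CSI inline volume assumes the Secrets Store CSI driver is installed:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: ephemeral-demo
  labels:
    app: demo
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9       # placeholder image
    volumeMounts:
    - name: scratch
      mountPath: /scratch                   # emptyDir: scratch space, deleted with the pod
    - name: config
      mountPath: /etc/config                # config map contents appear as files here
    - name: podinfo
      mountPath: /etc/podinfo               # downward API data written as plain-text files
    - name: secrets-inline
      mountPath: /mnt/secrets               # CSI inline ephemeral volume
      readOnly: true
    - name: ephemeral-data
      mountPath: /data                      # generic ephemeral volume, dynamically provisioned
  volumes:
  - name: scratch
    emptyDir: {}
  - name: config
    configMap:
      name: app-config                      # assumed to exist
  - name: podinfo
    downwardAPI:
      items:
      - path: labels                        # pod labels stored in a file named "labels"
        fieldRef:
          fieldPath: metadata.labels
  - name: secrets-inline
    csi:
      driver: secrets-store.csi.k8s.io      # assumes the driver and a SecretProviderClass exist
      readOnly: true
      volumeAttributes:
        secretProviderClass: my-provider    # hypothetical provider class name
  - name: ephemeral-data
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: fast            # any class whose provisioner supports dynamic provisioning
          resources:
            requests:
              storage: 1Gi
```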
FlexVolume is deprecated. CSI drivers are the recommended way to write plugins. The Kubernetes implementation of the Container Storage Interface (CSI) has been GA since the 1.13 release. CSI is designed to be vendor-neutral and interoperable, with a focus on the specification. It defines a set of storage interfaces so that a storage vendor can write one plugin and have it work across a range of container orchestration systems. In the CSI spec, we have RPCs for volume lifecycle management. This includes provisioning support, such as CreateVolume and DeleteVolume, and RPCs that make sure volumes are available for a pod to use, such as attach and detach volume and mount and unmount volume. It also has other functions such as volume expansion, snapshotting, cloning, volume health, and so on.

Here is an example of a CSI deployment. It shows the various Kubernetes components, the CSI driver, and the storage system that is used to persist the data. Here we have kube-controller-manager on the master node. The CSI driver controller plugin is deployed together with the Kubernetes CSI external-provisioner, external-attacher, external-resizer, and external-snapshotter sidecars. Note that the CSI driver controller pod does not have to run on the same node as the Kubernetes master, but it is recommended to run it on dedicated control-plane nodes. The Kubernetes CSI sidecars watch Kubernetes API objects such as PVCs, PVs, VolumeAttachments, and VolumeSnapshots to detect create-volume, attach-volume, volume-expansion, and volume-snapshot requests. The sidecars call the CSI driver, and the CSI driver communicates with the storage system to complete those volume operations. On the Kubernetes worker nodes, we have the kubelet and the CSI driver node plugin deployed together with the node-driver-registrar sidecar container. The node-driver-registrar fetches the driver information from the CSI endpoint and registers the CSI driver with the kubelet on that node. The kubelet directly issues CSI NodeGetInfo, NodeStageVolume, and NodePublishVolume calls against the CSI driver to get info and mount volumes. A sketch of the controller-side portion of such a deployment is shown below.

That's all for the basic Kubernetes storage concepts. Next, I will talk about how to get involved. Here I included the SIG Storage community page. It has a lot of information to get you started. We have bi-weekly meetings on Thursdays where we go through the features we are tracking for each Kubernetes release and discuss any design issues or other issues added to the agenda doc. This is a good place for a new contributor to get started: join the meeting, see how the SIG works, find what you are interested in, and get assigned to work on some tasks. Communication within the SIG is through the mailing list or the Slack channel. Here I included some resources for your reference. These are docs that explain the Kubernetes storage concepts and what CSI is. The last reference is an example of deploying the sample CSI hostpath driver; for a new contributor who wants to contribute code, it's good to follow this example and learn how CSI works. That's all for the introduction. Handing over to Michelle for the SIG Storage update.

Thank you, Xing. So for the SIG Storage update, I'm going to deep dive into a couple of major efforts that have been going on in the SIG for the last couple of releases.
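Referring back to the deployment walkthrough in the introduction, here is a minimal sketch of the controller-side portion: the vendor's controller plugin running alongside the external-provisioner and external-attacher sidecars, all sharing a Unix socket. The driver name, images, and tags are illustrative assumptions; a real deployment also adds the resizer and snapshotter sidecars, RBAC, and a node DaemonSet with the node-driver-registrar.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-csi-controller
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-csi-controller
  template:
    metadata:
      labels:
        app: example-csi-controller
    spec:
      serviceAccountName: example-csi-controller-sa    # needs RBAC for PVCs, PVs, VolumeAttachments
      containers:
      - name: csi-provisioner                          # watches PVCs, calls CreateVolume/DeleteVolume
        image: registry.k8s.io/sig-storage/csi-provisioner:v3.0.0   # registry and tag are assumptions
        args: ["--csi-address=/csi/csi.sock"]
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      - name: csi-attacher                             # watches VolumeAttachments, calls ControllerPublishVolume
        image: registry.k8s.io/sig-storage/csi-attacher:v3.3.0      # registry and tag are assumptions
        args: ["--csi-address=/csi/csi.sock"]
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      - name: example-csi-driver                       # hypothetical vendor controller plugin
        image: example.com/example-csi-driver:v1.0.0
        args: ["--endpoint=unix:///csi/csi.sock"]      # flag name is driver-specific
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      volumes:
      - name: socket-dir
        emptyDir: {}                                   # sidecars and driver share the CSI socket here
```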
After the deep dives, I'll give a summary of projects that we promoted in the 1.22 timeframe, talk about some things that we're working on in 1.23 and things that are currently in design or prototyping phases, and then cover other projects that we are working on with other SIGs.

So first, on the deep dives, I'll talk about CSI migration. To give some background on why we are doing CSI migration: the Kubernetes project has deprecated all of the built-in cloud providers, and the project is targeting to remove these cloud providers starting in 1.24. Work has been going on for many releases now to decouple the cloud-specific controllers in Kubernetes from the core Kubernetes engine, and that effort is finally approaching a point where we can confidently switch over to the external cloud provider model. However, persistent volumes have an especially interesting problem, because the cloud-specific volume types are built directly into the core Kubernetes volume APIs, and the Kubernetes APIs have very strict backwards-compatibility policies that make it difficult to modify or remove support for an API. So what we've come up with is the CSI migration project. Basically, it allows you to continue using any existing PVs and storage class objects you have today that reference the legacy volume APIs, even when the cloud provider controllers are removed from the core of Kubernetes. How this works is that, underneath the covers, Kubernetes will actually translate the legacy API into the new CSI API, and it will redirect any volume operations that would have normally gone to the in-tree controllers to the equivalent CSI driver.

So, sorry, going back a page: currently, the following in-tree volume types have a beta implementation of CSI migration that you can enable today in your clusters, and the plan is for these specific plugins' CSI migration features to be GA'd in 1.24. We have AWS EBS, Azure Disk, Azure File, OpenStack Cinder, GCE PD, and vSphere volumes.

All right, so moving on: what do you need to do to actually turn on CSI migration? The answer depends on whether you're using a managed Kubernetes distribution or you are creating and managing your own cluster. If you're using a managed Kubernetes distribution, you will need to double-check the documentation for your distro to see how they're handling CSI migration and whether there are any steps they require you to do. But in most cases the distro should be taking care of everything needed to enable the feature, including installing the CSI driver. So if you're using a managed distribution, you very likely won't have to do anything to enable the feature, but it's good to double-check just to be sure. Now, if you are managing your own Kubernetes cluster, then there are a couple of steps that you need to do to turn on the feature. First of all, you will need to install the replacement CSI driver into your cluster. Once you do that, you'll have to enable the Kubernetes feature gates in a pretty specific order, which is documented in the link below, so if you're in this boat, please take a look at exactly the ordering sequence for enabling these feature gates; a hedged sketch is shown below.

There are some caveats with this feature. Even though it is using CSI under the covers, you will not be able to use CSI-only functionality, such as snapshots or cloning, with the legacy API objects. The main purpose of CSI migration is to make sure you have feature parity with the legacy APIs.
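As a rough sketch of what enabling the migration feature gates can look like when you manage your own cluster, a kubelet configuration might include something like the fragment below. The exact gate names, the components they must be set on, and the required ordering depend on your Kubernetes version and on which plugin you are migrating, so follow the documented sequence rather than this example:

```yaml
# KubeletConfiguration fragment; kube-controller-manager needs matching --feature-gates settings.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true         # master switch for the in-tree to CSI translation layer
  CSIMigrationAWS: true      # example: route in-tree AWS EBS volumes to the EBS CSI driver
```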
To reiterate, the purpose of CSI migration is feature parity with the legacy APIs, not forward-looking feature compatibility. So if you do need to use the newer CSI features, then instead of relying on CSI migration, you will have to manually re-create the PV object as an equivalent CSI volume so that you can use CSI directly in your cluster. Another caveat is that some drivers have corner-case or uncommon functionality that is not going to work with CSI migration. We have identified most of these behaviors and have already deprecated them in Kubernetes, so please take a look at the Kubernetes release notes, with the specific drivers you're using in mind, to see if you are depending on any of these behaviors. We especially encourage everyone to try out the feature in their own dev environments first, to help us make sure there are no major issues and that we catch any surprises before we end up GA-ing the feature in Kubernetes and removing the cloud providers. So that's CSI migration in a nutshell. This is coming soon, over the next release or two, so please take a look if you're working in a cloud environment and using a cloud volume plugin, and please reach out to us if you have any questions or concerns about the feature.

All right. The next feature I'm going to talk about is CSI Windows. This feature GA'd in 1.22. To deal with the lack of privileged container support on Windows, the team created a binary called csi-proxy. It runs as a service on every host, and CSI drivers communicate with it through gRPC to perform any privileged operations they need to do, such as formatting and mounting disks. Currently it supports operations for NTFS-based file systems and SMB, and iSCSI support is available in alpha as well. And there are a couple of known drivers that have already implemented support for the csi-proxy API.

I did put a little asterisk next to the mention of no privileged container support on Windows. That's because in 1.22 there's actually a new alpha feature that will allow privileged containers on Windows. So what does that mean for the future of csi-proxy? If this new alpha feature for privileged containers goes well and matures to beta and GA, then we can remove the need for csi-proxy as a separate service. Instead of the csi-proxy client making gRPC calls, it could turn into a library that makes direct calls to the Windows system. And I think that plan also makes the migration from the csi-proxy model to a library model more seamless for drivers, because the APIs will remain the same, or at least substantially the same. So if you are writing a CSI driver and you want to support Windows, please take a look at csi-proxy, and if you have questions, there's a CSI Windows Slack channel where you can ask them, or if you want to help out with the project, that's great too.

All right, so I think that's it for the deep dives on some of the major features that we have been working on over the last couple of releases. Now I'm going to show a list of all of the features that we graduated in 1.22 and highlight a couple of them. Graduating to GA, as mentioned, we have CSI Windows, and then we have this other feature called passing the pod's service account token to a CSI driver.
This is a very important feature for any CSI driver authors out there, as it allows CSI drivers to authenticate on behalf of the pod. It allows you to support per-pod ACLs on your volumes, or on whatever data your CSI driver is accessing. And it's already being used in some ephemeral CSI drivers, like the Secrets Store CSI driver that we mentioned earlier.

Another interesting feature that we've done in 1.22 is the ReadWriteOncePod access mode. This feature went alpha in 1.22, and it basically fixes a common misconception about access modes in Kubernetes. The current ReadWriteOnce access mode that's available today indicates that a volume can be attached to one node at a time, but it doesn't actually limit how many pods on that node can mount that volume. So with ReadWriteOnce PVs today, you can actually have multiple pods scheduled on the same node that are all able to mount that volume, and that is an unexpected surprise to a lot of applications. So now we have added this new ReadWriteOncePod access mode where we enforce not only a single node for attach, but also that only a single pod can mount that volume. So that's a very interesting one to look forward to from 1.22.

I'm going to go on to 1.23 now and talk about some of the things that we're doing there. Again, we're doing a lot of things in 1.23 in the SIG, so I'm only going to highlight a couple of them here. First, we're planning to graduate two features related to fsGroup to GA. The first fsGroup feature is skipping volume ownership changes. This feature improves the time it takes to mount volumes that have a lot of files in them by only updating the ownership of the files when the top-level directory's owner doesn't match what the fsGroup says. For volumes that have thousands of files in them, we've seen in some cases that this feature brings the mount time down from something that originally took more than 30 minutes to just seconds. So this is a very important feature to enable in your pods if you're using fsGroup and your volumes have a lot of data inside them. The next fsGroup-related feature is the CSI fsGroup policy. This feature allows a CSI driver to explicitly opt into supporting fsGroup. Previously, without this, Kubernetes used a heuristic to determine which drivers would have fsGroup applied to them, and the heuristic is a bit flawed in certain scenarios. So this feature allows a CSI driver to opt in explicitly. And lastly, for our GA graduations, we are promoting generic ephemeral volumes, which Xing introduced earlier during this presentation, so I won't rehash that here. A sketch of the ReadWriteOncePod and fsGroup settings is shown below.

Moving on beyond 1.23, we have a number of features that are currently in prototyping and design phases. If any of these sound useful to you, please reach out to us via our mailing list or Slack channels. We'd be happy to discuss any of them further, and to have you involved in the design discussions too. In addition to these features, we are also working with other SIGs in Kubernetes to co-develop some other things. In the Data Protection working group, we're working on a design for changed block tracking, which is pretty important for enabling efficient backups of volumes.
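To illustrate two of the 1.22 and 1.23 items above, the ReadWriteOncePod access mode and the fsGroup settings, here is a hedged sketch. ReadWriteOncePod is alpha in 1.22, behind a feature gate, and requires CSI driver and sidecar support; fsGroupChangePolicy: OnRootMismatch skips the recursive ownership change when the volume's root already matches the fsGroup. Names, sizes, and the storage class are illustrative assumptions:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: single-writer-claim
spec:
  accessModes: ["ReadWriteOncePod"]     # only one pod in the cluster may mount this volume
  storageClassName: fast                # assumed storage class
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: single-writer
spec:
  securityContext:
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch  # skip the expensive recursive chown/chmod when possible
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9     # placeholder image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: single-writer-claim
```

The related CSI fsGroup policy lives on the CSIDriver object, where a driver can declare in spec.fsGroupPolicy whether fsGroup should be applied to its volumes.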
Continuing with the cross-SIG work: in SIG Apps, we are collaborating on some improvements to the StatefulSet PVC lifecycle. With SIG Node, we're working on an initiative called container notifier, which is important for the snapshotting feature to be able to quiesce applications before taking a snapshot. And lastly, with SIG API Machinery, we're proposing a new protection mechanism called liens to prevent objects from being deleted while they are still being used or referenced by another object.

So, as you can see, we have a lot of projects going on in various phases of design and implementation. If you're interested in learning more about any of these projects or you want to help out, please join us in our SIG Storage meetings or reach out to us in our Slack channel, and we will be happy to discuss any of these in more detail. All right, so thank you, everyone. This concludes our session, and we will be available for Q&A after this. Thank you.