Hello, everyone. Today, Michelle and I will give an introduction and update on Kubernetes SIG Storage. My name is Xing Yang. I work at VMware on the cloud storage team. I'm also a co-chair of SIG Storage, along with Saad Ali from Google. Hi, my name is Michelle. I work at Google, and I, along with Jan from Red Hat, am a tech lead for SIG Storage. Our session today will include two parts. In the first half, we will give an introduction. In the second half, we will give an update and a deep dive. In the introduction, I'm going to talk about some basic concepts in Kubernetes storage and how to get involved.

Kubernetes storage provides a way for containers in pods to consume block or file storage. Persistent storage is storage that lives beyond a pod's lifecycle. The terms you hear most often in SIG Storage are probably PVC, PV, and StorageClass. A PVC, or PersistentVolumeClaim, is a namespaced object: it is a request by a user for storage. A PV, or PersistentVolume, is a cluster-scoped object: it represents a physical volume on the storage system. A PVC and a PV have a one-to-one mapping. A StorageClass is also cluster-scoped. It's a way for an admin to describe the classes of storage; different classes might map to different quality-of-service levels or other admin-defined policies. In dynamic provisioning, the StorageClass determines which provisioner should be used and what parameters should be passed to the provisioner when creating a volume. A pod is a group of one or more containers with shared storage and network resources and a specification for how to run the containers. A pod is also a namespaced object, and a PVC is used by a pod.

In static provisioning, a cluster admin creates a number of PVs, which carry the details of the real storage. The control plane can bind PVCs to PVs in a cluster. However, if you want a PVC to bind to a specific PV, you will need to pre-bind them. When none of the static PVs the administrator created match a user's PVC, the cluster may try to dynamically provision a volume specifically for that PVC. The provisioning is based on StorageClasses: the PVC must request a StorageClass, and the admin must have created and configured that class for dynamic provisioning to occur.

So here are examples of a pod, a PVC, and a StorageClass. The pod is using the PVC. The PVC has a capacity, access modes (ReadWriteOnce here), and a StorageClass name specified. In the StorageClass, there is a provisioner that determines which volume plugin is used for provisioning PVs. The reclaim policy is Retain here; that means when the user deletes the PVC, the PV will remain, along with the physical volume on the storage system. If the reclaim policy is Delete, the PV, along with the physical volume, will be deleted when the user deletes the PVC. allowVolumeExpansion is true here, indicating that volume expansion can be requested by the user. The volume binding mode can be either Immediate or WaitForFirstConsumer. It is Immediate in this example, indicating that volume binding and dynamic provisioning occur as soon as the PVC is created. But this may result in unschedulable pods, so a cluster admin can address this issue by specifying the WaitForFirstConsumer mode, which delays the binding and provisioning of a PV until a pod using the PVC is created. A StorageClass also has parameters that are storage-provider-specific and opaque to Kubernetes. A sketch of these three objects follows below.
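The original slide manifests aren't reproduced in this transcript, so here is a minimal sketch of the three objects just described. The names, capacity, image, and the provisioner string are illustrative placeholders, not the exact examples from the talk:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: example-sc              # illustrative name
provisioner: csi.example.com    # hypothetical CSI driver
reclaimPolicy: Retain           # PV and physical volume survive PVC deletion
allowVolumeExpansion: true
volumeBindingMode: Immediate    # or WaitForFirstConsumer
parameters:
  type: fast                    # opaque, provider-specific
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  storageClassName: example-sc
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: nginx                # illustrative image
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: example-pvc    # the pod consumes storage through the PVC
```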
Next, I'm going to talk about ephemeral storage. Ephemeral storage becomes available when the pod is started and goes away when the pod goes down. Here we include local ephemeral storage, which covers emptyDir, Secrets, ConfigMaps, and the downward API. We also have CSI ephemeral volumes, and then there are generic ephemeral volumes. I will go over them.

An emptyDir volume is first created when a pod is assigned to a node and exists as long as the pod is running on that node. As the name suggests, the emptyDir volume is initially empty. All containers in a pod can read and write the same files in the emptyDir volume, though that volume can be mounted at the same or different paths in each container. When a pod is removed from a node for any reason, the data in the emptyDir is deleted permanently. An emptyDir volume can be used as scratch space.

A Secret volume is used to pass sensitive information, such as passwords, to pods. You can store secrets in the Kubernetes Secrets API and mount them as files for use by pods. Secret volumes are backed by tmpfs, so they are never written to non-volatile storage. A ConfigMap provides a way to inject non-confidential configuration data into pods. When referencing a ConfigMap, you provide the name of the ConfigMap in the volume. In this example, the ConfigMap is mounted as a volume, and all the contents are mounted into the pod at the path derived from the mount path and the key in the ConfigMap.

Next, I will talk about the downward API. Here we can see an example of a downward API volume. It makes downward API data available to applications: it mounts a directory and writes the requested data in plain-text files. In this YAML file, you can see that the pod has a downward API volume, and the container mounts the volume at the specified location. Each element in items is a downward API volume file. The first element specifies that the value of the pod's metadata.labels field should be stored in a file named labels.

Next, I want to talk about CSI inline ephemeral volumes. We set the volume type to csi in the pod spec and specify the driver name and the volume attributes. For a CSI driver to support CSI ephemeral volumes, it must be modified or implemented specifically for this purpose. A CSI driver is suitable for CSI inline ephemeral volumes if it serves a special purpose and needs custom provisioning parameters, like drivers that provide secrets to a pod; the Secrets Store CSI driver is an example. A CSI driver is not suitable for CSI inline ephemeral volumes when provisioning is not local to the node, or when ephemeral volume creation requires volume attributes that should be restricted to an admin, for example, parameters that belong in a StorageClass.

Next, I'm going to talk about generic ephemeral volumes. This feature allows any existing storage driver that supports dynamic provisioning to be used as an ephemeral volume, with the volume's lifecycle bound to a pod. It can be used to provide scratch storage that is different from the root disk, for example, persistent memory or a separate local disk on the node. All StorageClass parameters for volume provisioning are supported, and all features supported with PVCs are supported, such as storage capacity tracking, snapshotting, cloning, and volume resizing. This feature has been beta since the 1.21 release, and it is targeting GA in 1.23. So that's all for the introduction of ephemeral volumes; sketches of the downward API, CSI inline, and generic ephemeral patterns follow below.
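First, a minimal sketch of the downward API volume just described. The pod name, image, and mount path are illustrative stand-ins for the slide's example:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-api-example     # illustrative name
  labels:
    app: demo
spec:
  containers:
  - name: app
    image: busybox               # illustrative image
    command: ["sh", "-c", "cat /etc/podinfo/labels && sleep 3600"]
    volumeMounts:
    - name: podinfo
      mountPath: /etc/podinfo    # where the container mounts the volume
  volumes:
  - name: podinfo
    downwardAPI:
      items:
      - path: labels             # file named "labels"
        fieldRef:
          fieldPath: metadata.labels   # pod labels written into that file
```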
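Next, a sketch of a CSI inline ephemeral volume in a pod spec. The driver name and attributes here are hypothetical; a real pod would use the name of a driver that has opted in to ephemeral inline support:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: csi-inline-example
spec:
  containers:
  - name: app
    image: busybox
    volumeMounts:
    - name: inline-vol
      mountPath: /data
  volumes:
  - name: inline-vol
    csi:
      driver: inline.csi.example.com   # hypothetical driver name
      volumeAttributes:                # opaque, driver-specific settings
        foo: bar
```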
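And a sketch of a generic ephemeral volume, where the pod embeds a PVC template and the resulting volume shares the pod's lifetime. The StorageClass name is an illustrative placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: generic-ephemeral-example
spec:
  containers:
  - name: app
    image: busybox
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    ephemeral:
      volumeClaimTemplate:             # a PVC is created and deleted with the pod
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: scratch-class   # illustrative class
          resources:
            requests:
              storage: 1Gi
```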
So now I will talk about volume plugins. Kubernetes volume plugins include in-tree plugins, out-of-tree FlexVolume plugins, and CSI drivers. Some in-tree plugins, such as the ephemeral ones I mentioned earlier, will stay in-tree, but most of the other in-tree plugins are either deprecated or migrating to CSI drivers; Michelle will talk about that in more detail later. FlexVolume is deprecated. CSI is the recommended way to write plugins.

The Kubernetes implementation of the Container Storage Interface (CSI) has been GA since the 1.13 release. CSI is designed to be vendor-neutral and interoperable, with a focused specification. It defines a set of storage interfaces so that a storage vendor can write just one plugin and have it work across a range of container orchestration systems. In the CSI spec, we have RPCs for volume lifecycle management. This includes provisioning support, such as creating and deleting volumes, and RPCs that make sure volumes are available for pods to use, such as attaching and detaching volumes and mounting and unmounting volumes. It also has other functions, such as volume expansion, snapshotting, cloning, volume health monitoring, and so on.

So here is an example of a CSI deployment that shows the various Kubernetes components, the CSI driver, and the storage system that is used to persist the data. Here we have the kube-controller-manager on the master node. The CSI driver's controller plugin is deployed together with the Kubernetes CSI external-provisioner, external-attacher, external-resizer, and external-snapshotter sidecars. Note that the CSI driver controller pod does not have to run on the same node as the Kubernetes master, but it is recommended to run it on dedicated control plane nodes. The Kubernetes CSI sidecars watch Kubernetes API objects, such as PersistentVolumeClaims, PersistentVolumes, VolumeAttachments, and VolumeSnapshots, to detect create-volume, attach-volume, volume-expansion, and volume-snapshot requests. The sidecars call the CSI driver, and the CSI driver communicates with the storage system to complete those volume operations. On Kubernetes worker nodes, we have the kubelet and the CSI driver's node plugin, deployed together with the node-driver-registrar sidecar container. The node-driver-registrar fetches driver information using NodeGetInfo from a CSI endpoint and registers the CSI driver with the kubelet on that node. The kubelet directly issues CSI NodeGetInfo, NodeStageVolume, and NodePublishVolume calls against CSI drivers to get info and mount volumes. A rough sketch of the controller-plugin half of this layout follows below.
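As an illustration only, here is a trimmed Deployment sketch pairing a hypothetical vendor driver container with the external-provisioner sidecar over a shared Unix socket; image names and tags are illustrative, only one of the several sidecars is shown, and the service account's RBAC is omitted:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-csi-controller           # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example-csi-controller
  template:
    metadata:
      labels:
        app: example-csi-controller
    spec:
      serviceAccountName: example-csi-sa # needs RBAC for PVCs/PVs (omitted)
      containers:
      - name: csi-driver                 # hypothetical vendor driver
        image: example.com/csi-driver:v1.0.0
        args: ["--endpoint=unix:///csi/csi.sock"]
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      - name: csi-provisioner            # community sidecar; watches PVCs
        image: registry.k8s.io/sig-storage/csi-provisioner:v3.0.0
        args: ["--csi-address=/csi/csi.sock"]
        volumeMounts:
        - name: socket-dir
          mountPath: /csi
      volumes:
      - name: socket-dir
        emptyDir: {}                     # socket shared between the containers
```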
So that's all for the basic Kubernetes storage concepts. Next, I will talk about how to get involved. Here is the SIG Storage community page; it has lots of information to get you started. We have bi-weekly meetings on Thursdays, where we go through the features we are tracking for each Kubernetes release and discuss any design issues or other issues added to the agenda. This is a good place for a new contributor to get started: join the meeting, see how the SIG works and what you're interested in, and get assigned some tasks to work on. Communication within the SIG is through the mailing list or the Slack channel. I've included some resources for your reference: docs that explain the Kubernetes storage concepts and what CSI is. The last reference is an example of deploying the sample CSI hostpath driver; for a new contributor who wants to contribute code, it's good to follow this example and learn how CSI works. So that's all for the introduction. Now I will hand it over to Michelle for the SIG Storage update.

Thank you, Xing. So I'm going to give an update on some of the major projects and initiatives that the SIG has recently completed and is actively working on. I'll start with a deep dive into two major projects we've been doing, then give an update on 1.22 and what we're targeting for the upcoming 1.23 release, and then go into some longer-term projects that we're investigating and designing.

So let's deep dive into the first project, which is CSI migration. Next slide, please. This is a subarea of the overall cloud provider extraction effort, where the built-in cloud providers are going to be removed from Kubernetes starting in 1.24. The overall effort has been ongoing for a couple of years now, and we're finally reaching a point of maturity where we feel confident in switching over to the external cloud provider model. Now, persistent volumes have some unique challenges compared to the rest of the cloud provider components. The main challenge is that many of the original volume types are built directly into the core Kubernetes API. That imposes a very strict backwards-compatibility requirement on this in-tree API, so we can't just remove the built-in volume plugins directly. So what we've done instead is CSI migration. CSI migration is a feature that lets you continue using the existing API for your existing persistent volumes and storage classes, but underneath the covers, it actually makes calls to a corresponding CSI driver instead of going through the built-in drivers. Right now, all of the major cloud provider plugins have beta support for CSI migration, including AWS EBS, Azure Disk and File, OpenStack Cinder, GCE PD, and vSphere volumes. And all of these plugins are expected to GA their CSI migration implementations starting in 1.24.

Next slide. So what do you need to do to turn on CSI migration? The answer depends on how you provision your Kubernetes clusters. If you're using a managed Kubernetes distribution, then in most cases you should expect that the distribution will take care of everything for you transparently, but it is best to double-check with your provider to make sure that is indeed the case. If you are managing your own Kubernetes clusters, then you will have to install the corresponding CSI driver for your cloud and then enable the CSI migration feature gates in a very specific order, which is detailed in the link here (a rough sketch of what those gates look like follows below).

There are also a few caveats to watch out for. First, despite using CSI underneath the covers, turning on CSI migration does not enable CSI-only features for your volumes that are using the older APIs. CSI migration is meant to provide backwards compatibility with the in-tree API; it's not designed for forward compatibility with future CSI features. So if you need to use newer features like CSI snapshots, the best option is to re-create your persistent volumes directly using the CSI API. Another caveat is that some of the drivers have rare corner-case behaviors that might not be supported by the CSI driver. We expect these behaviors to be rare and not commonly used, so most people should not be impacted. Nevertheless, we have explicitly deprecated all of the behaviors that we're aware of. So please double-check the Kubernetes release notes, going all the way back to Kubernetes 1.17, and look for these deprecation notices for your cloud.
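The talk doesn't show the gate names, but as an illustration, the gates follow a common pattern: an overall migration gate plus a per-plugin gate. A sketch for the AWS EBS case, assuming a self-managed cluster and a Kubernetes version where these gates exist; check the linked documentation for the exact gate names and the required enabling order for your version:

```yaml
# KubeletConfiguration fragment. The kube-controller-manager needs the
# equivalent --feature-gates flags as well, and the gates must be
# enabled in the order described in the migration documentation.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true      # overall migration framework
  CSIMigrationAWS: true   # per-plugin gate; use the one for your cloud
```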
Also because of this, it's important that you start trying out CSI migration in your non-production clusters, so that we can catch any other behavioral differences we might have missed as early as possible and have more time to address those issues. So in summary, CSI migration is coming up really fast, within the next half year or so. If you're using one of these in-tree cloud providers, please help us out and start testing your workload and workflow compatibility with this feature. We have the CSI migration Slack channel, and we'll be there to help answer any questions you might have and to help you out with any issues.

All right, so that is CSI migration. Moving on, the next feature deep dive is CSI Windows. We GA'd CSI Windows in the most recent 1.22 release. This feature is a way to run CSI drivers on Windows nodes. One of the biggest challenges we faced was the lack of Windows privileged container support, which is something most CSI drivers require to do privileged operations on the file system and the mount points. To deal with this, we created a binary called CSI Proxy. CSI Proxy runs as a Windows service, and it exposes a gRPC endpoint to CSI drivers for doing privileged operations. The operations CSI Proxy currently supports include disk and volume operations, operations on the NTFS file system, and SMB/CIFS shares; iSCSI support is also available in the alpha phase. There are a number of CSI drivers that have implemented CSI support on Windows, including AWS EBS, Azure Disk and File, GCE PD, and a generic SMB driver. So if you're running on any of these platforms, please check this out.

One thing to note: in 1.22, there is also a new alpha feature put out by SIG Windows that adds privileged container support. Once that graduates in maturity, we can potentially remove the need for CSI Proxy and having a separate service to do the privileged operations. However, to retain backwards compatibility with existing CSI drivers, what we're planning to do is turn the CSI Proxy gRPC client library into a normal library using the same APIs. That way we can minimize the transition from the CSI Proxy model to making direct library calls, and that should mitigate any major changes you might have to make in your CSI driver to transition between the two models. So if you're interested in the Windows work, or if you have any further questions, we have a CSI Windows Slack channel where you can come in, ask questions, and get help with any issues you may have encountered.

So now I'm going to give a brief overview of some other projects in the SIG. There are a lot of efforts going on right now, so I won't be able to cover all of them today, but I'll highlight a few. First, in 1.22, we GA'd two major features: CSI Windows, which we just talked about, and CSI service account token. This is an important security feature that enables CSI drivers to authenticate using the pod's service account token. This allows CSI drivers to support per-pod authentication instead of the broader shared credentials that many systems may require today. One example of a CSI driver using this feature is the Secrets Store CSI driver, which lets you mount secrets from an external secret manager like Vault. If you're a CSI driver author, please take a look at this feature and see if it can help improve your security posture; a sketch of how a driver opts in follows below.
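Here is a minimal sketch of a CSIDriver object opting in to service account tokens; the driver name and audience are hypothetical placeholders:

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: secrets.csi.example.com   # hypothetical driver name
spec:
  tokenRequests:                  # kubelet passes these tokens to the driver
  - audience: vault.example.com   # illustrative audience
    expirationSeconds: 600
  requiresRepublish: true         # re-invoke NodePublishVolume so tokens refresh
```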
So moving on to 1.23, there are a few features we're targeting for GA. Next slide, please. Thank you. The first two are related to fsGroup and volume ownership. The first enhancement improves scalability when mounting very large volumes that contain a lot of files. It adds a new option to the pod spec that tells the kubelet to skip the process that recursively updates volume ownership. Before this change, we saw instances where mounting a very large volume could take more than 30 minutes; after this change, mounting such a volume is back to a couple of seconds. So please take a look at this feature if you're experiencing slow mounts and you have very large volumes. The second fsGroup feature is for CSI driver authors: it lets CSI drivers explicitly opt in to supporting fsGroup, instead of relying on a heuristic that was not accurate in a lot of cases. So if you're a CSI driver author, please take a look at this fsGroupPolicy feature. Sketches of both options follow below. The last feature we're targeting for GA is generic ephemeral volumes. Xing talked about this earlier, so I won't go into a lot of detail, but to summarize, this feature allows a pod to specify a volume template, which will dynamically provision and manage volumes with the same lifetime as your pod. Any existing volume type that supports PVCs today will work with this feature. So go ahead and check this out.
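Hedged sketches of the two fsGroup options just mentioned; the object names are illustrative. The first shows the pod-level field, the second the driver-level declaration:

```yaml
# Pod-level: skip the recursive ownership update when the volume root
# already matches the requested fsGroup.
apiVersion: v1
kind: Pod
metadata:
  name: fsgroup-example            # illustrative name
spec:
  securityContext:
    fsGroup: 1000
    fsGroupChangePolicy: OnRootMismatch   # default is Always
  containers:
  - name: app
    image: busybox
---
# Driver-level: explicitly declare whether the kubelet should apply
# fsGroup to this driver's volumes.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: ebs.csi.example.com        # hypothetical driver name
spec:
  fsGroupPolicy: File              # File | ReadWriteOnceWithFSType | None
```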
All right, and moving on, now I will highlight a couple of efforts that we're actively prototyping and designing right now, where we could really use help from the community. First up is non-graceful node shutdown. This effort is exploring ways that we can safely fence and detach volumes when a node is shut down, while still ensuring that we don't get into a split-brain scenario. This is in a design and prototyping phase, so if you have any ideas, or you have work in this area that you think would be useful, please join us in these discussions; we would appreciate any ideas or help that you can provide. Another interesting initiative we're starting to explore is volume snapshot namespace transfer. This project is looking at ways to easily move VolumeSnapshot objects to another namespace, and potentially, depending on the ideas explored there, we could expand that to also look into ways to move volumes across namespaces as well. That's another exciting area we're starting.

Next slide, please. We're also collaborating with a couple of other SIGs on a variety of projects. I'll highlight the container notifier effort. This is a feature that allows sending custom signals to a pod and defining custom actions for that pod to take when processing a signal. This is the mechanism we're exploring to quiesce an application before taking a snapshot, so it is a key driver for enabling application-consistent snapshots. Another interesting proposal that we're exploring with API Machinery is liens. This offers the ability to essentially lock an object and prevent it from being accidentally deleted. Now, you may think: how is this different from finalizers? The main goal of finalizers is to ensure that all of the deletion handlers for an object are able to finish before the object is finally deleted from the API server; once the deletion process starts, you can't reverse it and you can't undo a delete. The biggest difference with liens is that liens will prevent the deletion process from starting in the first place. So this will help in scenarios where you really want to protect some objects from being accidentally deleted. That's another exciting effort we're working on with API Machinery, and we're hoping to use it to add some extra protection to secrets, especially secrets that are used for volumes.

All right, so as you can see, there are a lot of different projects we're working on in SIG Storage, and we could use help in all of these efforts. So we always welcome new contributors. If any of these projects sound interesting to you, please reach out to us through our Slack channel or join one of our SIG meetings, which we hold every two weeks. We welcome new ideas and new contributors, and we'll definitely try to help you get involved in the SIG. So that concludes our presentation today. Thank you very much for watching, and we look forward to seeing you in the SIG. Thank you.