Hello, everyone. Today, Jan and I will be giving an introduction and update to Kubernetes SIG Storage. My name is Xing Yang. I work at VMware on the Cloud Storage team. I'm a co-chair of SIG Storage, along with Saad Ali from Google.

And I am Jan Šafránek, working for Red Hat, and I am a tech lead together with Michelle Au.

So here is today's agenda. First, we'll talk about what SIG Storage is. Then we will talk about what we did in the 1.20 and 1.21 releases. We'll talk about what we are working on for the future and about cross-SIG working groups and projects. Finally, we will talk about how to get involved.

So what is SIG Storage? SIG Storage is a special interest group that focuses on how to provide storage to pods in your Kubernetes cluster. SIG Storage's scope is the storage control plane. It provides a way for containers in pods to consume block or file storage. This can be persistent, long-term storage that lives beyond a pod's lifecycle, or it can be ephemeral, temporary storage, which becomes available when the pod is started and goes away when the pod goes down.

SIG Storage is responsible for the lifecycle of volumes used by pods. This includes provisioning a new volume, attaching the volume to a node and mounting it so that the pod can use it, and unmounting, detaching, and deleting the volume when it is no longer needed. It also includes taking snapshots so that they can be used to restore the volume if the original volume is corrupted for some reason. SIG Storage also looks at how to influence scheduling decisions based on topology information, to check whether the storage is accessible from a node and make sure the pod is scheduled to a node that has access to the storage. SIG Storage is also responsible for managing storage capacity, managing quota based on capacity or number of resources, and providing the ability to expand a volume if it runs low on space.

SIG Storage owns the persistent volume and persistent volume claim features. These allow storage vendors to create a volume and persist data in the volume, which is preserved even if the pod goes away. We also have the StorageClass concept. A StorageClass provides a way for administrators to describe the classes of storage they offer. Different classes might map to different quality-of-service levels. In dynamic provisioning, the StorageClass is used to find out which provisioner should be used and what parameters should be passed to the provisioner when creating the volume. SIG Storage has been working on migrating from in-tree volume plugins to out-of-tree CSI drivers. New features are only added to CSI drivers.

Other than persistent volumes, there are also ephemeral volumes. An ephemeral volume is specified directly in a pod spec. It's mounted in the pod as a directory, and data can be stored in files under that directory. Ephemeral volumes include secrets, config maps, generic ephemeral volumes, and so on. They follow the lifecycle of a pod. So that's a brief introduction to SIG Storage.

Next, I will talk about what we did in the 1.20 release. In 1.20, the CSI snapshot feature moved to GA. A snapshot represents a point-in-time copy of a volume. It can be used as a data source to create a new volume. This feature provides a basic building block for supporting data protection in Kubernetes, and backup vendors are standardizing on the CSI snapshot APIs to build their backup solutions. The CSI snapshot feature consists of the Kubernetes volume snapshot CRDs, a snapshot controller, and a validation webhook. Those should be bundled and deployed by the Kubernetes cluster distro.
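To make the StorageClass and dynamic provisioning flow described above concrete, here is a minimal sketch. The class name fast-ssd, the provisioner example.csi.vendor.com, and the type parameter are assumptions for illustration; real values come from your CSI driver's documentation.

```yaml
# Minimal dynamic provisioning example: a StorageClass, a claim, and a pod using it.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
provisioner: example.csi.vendor.com      # replace with your CSI driver's name
parameters:
  type: ssd                              # driver-specific quality-of-service parameter
reclaimPolicy: Delete
volumeBindingMode: WaitForFirstConsumer  # delay binding until a pod is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /var/lib/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data
```

When the pod is scheduled, the claim is bound to a volume that the named driver provisions according to the class parameters.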
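And since the CSI snapshot feature went GA in 1.20, taking a snapshot and restoring it might look roughly like this. The snapshot.storage.k8s.io/v1 API is the GA version; the VolumeSnapshotClass name csi-snapclass and the claim names are assumed for the example.

```yaml
# Point-in-time snapshot of the claim above, then a new claim restored from it.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: data-snap
spec:
  volumeSnapshotClassName: csi-snapclass
  source:
    persistentVolumeClaimName: data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-restored
spec:
  storageClassName: fast-ssd
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: data-snap
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```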
There's also an external snapshotter sidecar that is deployed together with the CSI driver.

In 1.20, two fsGroup-related features also moved to beta. The first one is non-recursive volume ownership. This feature allows users to skip recursive permission changes when mounting a volume. Traditionally, if your pod is running as a non-root user, you must specify an fsGroup inside the pod's security context so that the volume can be readable and writable by the pod. However, there is a downside: each time a volume is mounted, Kubernetes must recursively change the ownership and permissions of all the files and directories inside the volume. This can be very expensive for large volumes with a lot of small files and can make pod startup very slow. With this beta feature in Kubernetes 1.20, we are providing a way to opt out of recursive permission changes if the volume already has the correct permissions.

The second one is the CSI driver policy for fsGroup. The CSIDriver object now has an fsGroupPolicy field, which allows storage drivers to explicitly opt in to or out of recursive modifications. This way, Kubernetes can avoid needless modification attempts, which helps reduce volume mount time. There is a default policy called ReadWriteOnceWithFSType, which is applied if no fsGroup policy is defined. This preserves the behavior from previous Kubernetes releases.

In 1.20, we also added a new alpha feature called pass pod service account token to CSI. This provides a way to obtain service account tokens for the pods that CSI drivers are mounting volumes for. Since these tokens are valid only for a limited period, this feature also gives the CSI driver an option to have NodePublishVolume re-executed periodically to remount volumes. This is what we did in 1.20.

Next, I will talk about what we did in the 1.21 release. In 1.21, immutable secrets and config maps moved to GA. This feature allows users to specify that the contents of a particular secret or config map should be immutable for the lifetime of the object. For such secrets and config maps, the kubelet will not watch for changes to update the mounts for their pods, which reduces the load on the API server. It also enables users to better protect themselves against accidental bad updates that could cause outages.

In 1.21, CSI on Windows also targets GA. Windows containers cannot be privileged, but CSI drivers need to perform privileged operations such as mount. So we have a CSI Proxy binary that runs directly on the host and performs all the privileged operations, and CSI drivers communicate with the proxy through a gRPC interface.

We also have a few features that moved to beta in 1.21. Storage capacity tracking becomes a beta feature. Traditionally, the Kubernetes scheduler was based on the assumption that persistent storage is available everywhere in the cluster and has infinite capacity. Topology constraints addressed the first problem, but without this feature, pod scheduling was still done without considering that the remaining storage capacity may not be enough to start a new pod. Storage capacity tracking addresses that by adding an API for a CSI driver to report storage capacity, and the Kubernetes scheduler uses that information when choosing a node for a pod. We also have the generic ephemeral volumes feature moved to beta.
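As a rough illustration of the two fsGroup features described above, a pod can opt out of the recursive ownership change with fsGroupChangePolicy, and a driver can declare its preference in the CSIDriver object's fsGroupPolicy field. The driver name example.csi.vendor.com and the claim name are placeholders.

```yaml
# Pod-level opt-out of the recursive chown/chmod when the volume root already matches.
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  securityContext:
    runAsUser: 1000
    fsGroup: 2000
    fsGroupChangePolicy: "OnRootMismatch"  # beta in 1.20; skip the walk if the root already matches
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data            # assumed existing claim
---
# Driver-side policy on the CSIDriver object; the driver name is a placeholder.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.com
spec:
  fsGroupPolicy: File            # or None; ReadWriteOnceWithFSType is the default
```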
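The immutable secrets and config maps feature mentioned above is just a single field on the object; a minimal sketch with made-up contents:

```yaml
# Once immutable is set to true, the data cannot be changed and kubelet stops watching it.
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials
immutable: true
stringData:
  username: example-user
  password: example-pass
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
immutable: true
data:
  log_level: info
```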
Coming back to generic ephemeral volumes: Kubernetes provides ephemeral volume plugins whose lifecycle is tied to a pod. They can be used as scratch space, such as the built-in emptyDir volume type, or to load some data into a pod, such as the built-in config map and secret volume types or CSI inline volumes. The generic ephemeral volumes feature allows any existing storage driver that supports dynamic provisioning to be used as an ephemeral volume, with the volume's lifecycle tied to the pod. It can be used to provide scratch storage that is different from the root disk, for example persistent memory or a separate local disk on that node. All storage class parameters for volume provisioning are supported, and all features supported with PVCs are also supported, such as storage capacity tracking, snapshots and restore, and resizing.

In 1.21, we also have the pass pod service account token to CSI feature moved to beta. As mentioned earlier, this was introduced as an alpha feature in 1.20. We also have Azure File CSI migration moved to beta in 1.21.

We also have a couple of alpha features in 1.21. The first one is CSI volume health monitoring. This was first introduced as an alpha feature in the 1.19 release. In 1.21, we did a second alpha due to a design change, where we moved the node-side volume health monitoring logic from an external agent to the kubelet. This feature enables CSI drivers to share abnormal volume conditions from the underlying storage system with Kubernetes so that they can be reported as events on PVCs or pods. This feature serves as a stepping stone towards programmatic detection and resolution of individual volume health issues by Kubernetes.

We also have the prioritizing nodes based on volume capacity feature, which is a new alpha feature in 1.21. Without this feature, Kubernetes didn't take volume capacity into account when scheduling a pod that can run in multiple topologies. A large PV may be used by a PVC with a small capacity request, even if there are many suitable small PVs in other topologies, and PVCs with a large capacity request may not find feasible PVs to use if too many large PVs are consumed by PVCs with small capacity requests. With this feature, the scheduler takes volume capacity into account when scheduling pods to ensure balanced resource usage. It prioritizes nodes based on the best matching size of statically provisioned PVs. This is what we did in 1.21. Now, I'm going to hand it over to Jan to talk about our future plans.

Thank you, Xing. In SIG Storage, we keep track of our features in a planning sheet. During Kubernetes 1.21 development, we had about 40 features, so in the list below I have just the most notable ones for the next release. We are graduating CSI migration towards GA. Here we basically remove all the in-tree code we have in Kubernetes for cloud providers and redirect all the storage operations to CSI drivers under the hood. It should be invisible to users: you will be able to use the same PVs, PVCs, and storage classes as you use today, but all the storage operations will be done by CSI drivers instead of Kubernetes. Right now, we have all cloud-based volume plugins deprecated in Kubernetes, and CSI migration is in beta, but it is off by default for most cloud providers. The only one enabled by default in Kubernetes 1.21 is OpenStack Cinder. In Kubernetes 1.22, we want to have CSI migration on by default for most of the other cloud providers.
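Here is a minimal sketch of a generic ephemeral volume as described above: a PVC is created from the template when the pod is created and deleted together with the pod. The storage class name fast-ssd is an assumed example.

```yaml
# Generic ephemeral volume: the claim's lifecycle is tied to the pod's.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-example
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: scratch
      mountPath: /scratch
  volumes:
  - name: scratch
    ephemeral:
      volumeClaimTemplate:
        spec:
          accessModes: ["ReadWriteOnce"]
          storageClassName: fast-ssd   # assumed class, e.g. a local disk or PMEM
          resources:
            requests:
              storage: 1Gi
```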
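For the pass pod service account token feature, the CSI driver declares in its CSIDriver object which tokens it wants and whether the kubelet should periodically call NodePublishVolume again so the driver can refresh them. The driver name and the audience below are assumptions for illustration.

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.com
spec:
  tokenRequests:
  - audience: "example-vault"     # assumed audience the driver exchanges the token with
    expirationSeconds: 3600
  requiresRepublish: true         # kubelet re-runs NodePublishVolume so the driver can refresh tokens
```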
And in Kubernetes 1.23 or 1.24, we would like to have CSI migration go GA and finally remove all the in-tree volume plugins.

Another feature we have been evaluating for some time is volume expansion. It has been in beta for a long time, and it works pretty well. However, there are still some nasty corner cases we need to iron out.

And then we have a number of features in the design and prototyping phase; again, only the most interesting ones are listed here. Recovery from volume expansion failures is one huge corner case of volume expansion, so it got a feature of its own. As you know, Kubernetes retries after a failure: it tries to reach the requested state, which in this case is a bigger volume. But if a user, for example, asks for too big a volume, say exabytes, the storage backend is probably going to reject the expansion, and it will keep rejecting it as Kubernetes retries the resize. At the same time, we don't allow users to shrink volumes. So this user is basically stuck: they will not get the exabytes they asked for, but they cannot cancel the operation either. With this feature, we are trying to allow users to cancel a failed expansion safely and without races.

Another feature is volume groups. We want to allow users to put their persistent volume claims into higher-level groups. This will allow interesting features like taking a snapshot of a whole group at the same time. With the snapshots we have today, you can take a snapshot of a single PVC, and if you want to take snapshots of more of them, you need to do it one by one, so they will be taken at different times. With volume groups, you could take a snapshot of the whole group at the same time.

Generic volume populators expand on an idea we introduced with volume cloning and restoring snapshots. In the early days of Kubernetes, newly dynamically provisioned volumes were always empty. Today, with cloning and snapshots, you can pre-populate a volume with either the content of a different volume, that's cloning, or with a restored snapshot. With data populators, anybody can write a piece of code that pre-populates a volume during dynamic provisioning with virtually anything. For example, a populator could clone a git repository, put in the content of a container image or a virtual machine image, or basically whatever. We have had volume populators in Kubernetes since 1.18, and in 1.22 the API should be complete and in beta.

Container Object Storage Interface is an attempt to bring object storage, like Amazon S3, to Kubernetes pods. It has been in the design phase for quite some time, and there is a special working group for it. Now we are finishing the API and trying to go alpha in the next release, Kubernetes 1.22.

And finally, we are trying to allow users to move PVCs and volume snapshots among namespaces in a safe manner and without races, which is, again, harder than it may seem, and it will take some time until we implement this feature. Next slide, please.

Together with other SIGs, we are working on data protection, which got a working group of its own. Here we are expanding on volume snapshots again. Right now, if you take a snapshot of a single volume, the application can still be running, so there can be a pod writing to the volume at the same time we take the snapshot. Most applications, like databases, can recover from partly written data.
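For context on the volume expansion feature mentioned above, expansion is requested simply by raising the claim's storage request, assuming the StorageClass sets allowVolumeExpansion: true and the driver supports it; the names and sizes below are illustrative.

```yaml
# Expansion is requested by raising the storage request (shrinking is not allowed), e.g.:
#   kubectl patch pvc data -p '{"spec":{"resources":{"requests":{"storage":"20Gi"}}}}'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 20Gi              # was 10Gi; the driver and possibly a node resize do the rest
```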
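And the cloning data source that the volume populator work generalizes looks roughly like this today; the claim names are placeholders.

```yaml
# Cloning: a new claim whose data source is an existing claim in the same namespace.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-clone
spec:
  storageClassName: fast-ssd
  dataSource:
    kind: PersistentVolumeClaim
    name: data
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
```

Generic volume populators extend this same data source idea so that the source can point at other object kinds handled by a populator controller.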
To continue the data protection point: if you restore such a snapshot, you may see only part of the data written. Databases can usually recover from that, but it's not ideal. So in this working group, we are trying to add some hooks into pods, so that when taking a snapshot we can poke the pod to flush its caches and transactions and possibly freeze the file system. Then we take the snapshot and resume the application, so the snapshot will be consistent from the application's point of view. When this is combined with the volume groups I mentioned before, you could take a snapshot of a whole StatefulSet, and that snapshot could be guaranteed to be consistent.

That was the application side of backups, while change block tracking helps with the backend side. Backup software could do a diff between two snapshots and get a list of changed blocks between them, so it could back up only those changed blocks, to speed up the backup and to save storage capacity and transfer bandwidth.

Together with SIG Apps, we are improving StatefulSets to allow expansion of their volumes and also to optionally delete persistent volume claims after you scale down a StatefulSet. Right now, if you scale a StatefulSet down, it will leave the volumes behind, and it's up to the user to delete them manually.

And finally, together with SIG Node, we are trying to recover volumes from shut-down nodes better. Currently, when a node becomes unavailable for whatever reason, Kubernetes will not detach volumes from that node, because the node could still be running; it just may not be able to talk to the API server. Detaching such volumes would corrupt them, because they are still mounted on the node. But what if we know that the node is shut down? If the node is shut down, then we are pretty sure the volumes are not mounted, so we can detach them and attach them somewhere else. The pods that use those volumes can be rescheduled to other nodes, and your application can resume much more quickly than today. Okay, next slide, please.

So these are the features. Now, how to get involved with SIG Storage. At the top, you can see a link to our home page with all the details about the SIG: our meeting times, notes from the last meetings, the list of GitHub repositories, subprojects, Slack channels, mailing lists, and so on. We have several meetings per week dedicated to individual subprojects, but the main meeting is every other Thursday at 9 a.m. Pacific time. There we go through our features planned for the current release, and we can also discuss design details and even pull requests. It's open to everybody, so don't hesitate to join, and if you have a topic you want discussed by the SIG, please add it to the agenda.

Some people prefer to read the code. If you're interested in how storage works, you can read the code in many of our GitHub repositories, you can look at the issues that people report on GitHub, and you can submit your own code. It may be hard to find the right place where the code lives because, as I said, we have many repositories; just ask on Slack and we can point you the right way. We use labels to flag issues that are easy to fix, that's the "good first issue" label, or where we need help from the community. Next slide, please.

I already mentioned our feature tracking spreadsheet, so you can see what we are working on, and if you find anything interesting there, don't be afraid to contact the feature owner directly.
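One quick illustration of the StatefulSet point from earlier: each replica gets its own PVC stamped from the volumeClaimTemplates (data-web-0, data-web-1, and so on), and today those PVCs remain after a scale-down, which is what the work with SIG Apps aims to make optional. A minimal sketch with assumed names:

```yaml
# One PVC per replica is created from the template; scaling down currently leaves them behind.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: web
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: busybox
        command: ["sleep", "3600"]
        volumeMounts:
        - name: data
          mountPath: /var/lib/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast-ssd
      resources:
        requests:
          storage: 5Gi
```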
Again, ask on Slack when you are not sure. If you want to contribute a whole new feature, please join our bi-weekly meeting. A new feature can be added at any time, and we will guide you through the process. We don't have any specific process for new features; we just follow the generic Kubernetes one. You will need to create a tracking issue, so other people know there is a feature being developed, and you will need to write a Kubernetes Enhancement Proposal, KEP for short, and shepherd it through various reviews and through the alpha, beta, and GA phases. Don't be afraid; again, we are here to help you with those processes.

We are always interested in new contributors. Every little pull request counts. It doesn't need to be a whole new feature: you can improve our documentation, add unit tests, add E2E tests, or do some minor refactoring if you see that the code is ugly. It's up to you, and your contribution will always be welcome. Next slide, please.

Yeah, so we are at the end of our talk. Here is a list of other storage-related talks at this KubeCon. Most of them actually were yesterday, so at least you can watch the recordings. Thanks for watching, and we are here to answer your questions.