Thank you. My name is Jan Šafránek. I work for Red Hat and I am a tech lead of SIG Storage in Kubernetes. Xing Yang, our co-chair, was supposed to be here with me, but unfortunately she got COVID. She's fine, she just tested positive and can't be here. Sorry for the technical difficulties. I had a beautiful presentation that we worked on for a long time, but you won't see it, so for some part we will go without the slides. The first thing: who is SIG Storage? SIG Storage is the special interest group that looks after storage in Kubernetes. We have almost... hey! Thank you. So, well, now I can show you the agenda. I will briefly cover who we are, then I will dive deeper into CSI migration, which is one of the latest features we work on, and then I will cover the past releases, the future releases and how to get engaged.

So who are we? Officially, we have two co-chairs, Saad Ali from Google and Xing Yang from VMware, and two tech leads, Michelle Au from Google and me from Red Hat. But it's not only us four people; we have many contributors from many companies. Xing did the numbers: we have more than 4,000 people on our main Slack channel and more than 1,000 people on the Slack channel just for CSI. Our bi-weekly meeting is attended by roughly 25 people, and over time we have had 32 different approvers across our packages and GitHub repositories. So we are a pretty big group of people, but not everybody is active; somebody appears, asks a question, and then it's silent. So we are always looking for new contributors and new coders. You don't even need to code: you can help with reviews, you can help with triaging bugs, you can fix our documentation, which deserves some work, and so on. Don't hesitate, just contribute a pull request or ask on Slack. We are there for you.

What we actually do is maintain the storage APIs in Kubernetes. That means persistent volumes, persistent volume claims, storage classes and everything around them: dynamic provisioning, volume attachment, mounting, unmounting, detaching, deleting volumes, and related features like snapshots and resizing. We maintain the Kubernetes in-tree volume plugins, and we have a deal with SIG Node that we maintain part of the ephemeral volumes like Secrets and ConfigMaps: we do the real storage stuff, the mounting, and they do the rest, like fetching the ConfigMaps and Secrets from the Kubernetes API server. We also maintain the Kubernetes implementation of CSI and all the CSI sidecars. Quite surprisingly, we don't maintain many CSI drivers. Most of the CSI drivers are owned by someone else: the CSI drivers for the clouds are owned by SIG Cloud Provider, Rook has its own CSI driver, and so on. We maintain basically NFS and a couple of similar CSI drivers, but that's what we do.
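For anyone new to these APIs, everything below builds on a pair of objects like the following: a minimal, illustrative sketch of the dynamic provisioning setup, with made-up names and an in-tree provisioner as the example.

```yaml
# Minimal dynamic-provisioning setup: the user creates a StorageClass and a PVC
# referencing it. Names and the in-tree provisioner are illustrative only.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd   # an in-tree volume plugin
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
```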
Now I would like to dive deeper into CSI migration. Unfortunately we started late and I don't have much time, so we will barely scratch the surface. So what is CSI migration? As you probably know, we have in-tree code in the Kubernetes GitHub repository for volume plugins, and at the same time we have CSI drivers that do basically the same thing. So we would like to remove that code from Kubernetes completely, wipe it out, and under the hood silently route all the storage calls to the CSI drivers, so we don't need to maintain everything twice. SIG Cloud Provider had the same idea: they want to move the cloud providers out of the Kubernetes tree, so the volume plugins must naturally follow.

We are not migrating any data; the data stays where it is. Although the term "CSI migration" could resemble data migration, we are not migrating anything anywhere. We are not even changing the API. So if you have in-tree volumes, in-tree PVs, in-tree storage classes, and StatefulSets or Deployments that use them, you don't need to change anything, in theory. If we did our homework right, you just upgrade to a version that has CSI migration enabled, or you enable it manually while it's still off by default, and everything should work. And of course, the cloud-based volume plugins go first and the others will slowly follow. We are moving Portworx and a few others right now; who knows when the rest will move.

So, a brief introduction to how in-tree volume plugins currently handle storage requests. On the left, there is a user that creates an in-tree storage class and a PVC. On the right, there is the PV controller, the persistent volume controller in kube-controller-manager, which decides it needs to provision a new volume; this diagram shows dynamic provisioning. The PV controller sees the claim, it sees the storage class, it sees it's an in-tree storage class, so it finds the volume plugin and calls it. This is a simple function call, nothing complicated. The volume plugin calls the cloud provider, and the cloud provider uses some cloud API and provisions a volume.

Now the same scenario with CSI migration enabled. It starts the same: the user has exactly the same storage class as before, no change there. They create the PVC, and the PV controller wants to provision. But the PV controller doesn't have any in-tree volume plugins; they are gone. So what does it do? It translates the in-tree storage class to a CSI storage class using the CSI translation library, and since it now has a CSI storage class, it knows this is CSI's job and just marks the PVC for dynamic provisioning. On the CSI driver side, in the CSI driver pod, there is the external-provisioner, which basically translates Kubernetes API objects into CSI gRPC calls. The external-provisioner says: I have a PVC, I need to dynamically provision it — and it sees an in-tree storage class, because the translation in kube-controller-manager happens just in memory. So the external-provisioner uses the same library, translates the class to CSI, and handles it the CSI way. And of course the driver calls the cloud.

All these translations happen only in memory. We can't update storage classes, we can't update PVs in the API server; these objects are mostly immutable. Most of you have probably heard that already, and for good reason: if we changed the objects during attaching or during dynamic provisioning, things could get very, very racy, very, very wrong. So they are immutable, and everybody who now processes in-tree PVs or in-tree storage classes has the translation library, translates them to CSI, and works with the result. All the other components — the external-attacher, the attach/detach controller, the resizer, everybody — do this in-memory translation. So why is it so complicated? Because the API is bad. Well, it's not bad, it's good, but it doesn't allow us to change the objects.
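To make the in-memory translation concrete, here is a rough sketch of what the translation library produces for the in-tree "fast" class from the earlier example. The exact parameter mapping is plugin-specific (it lives in the csi-translation-lib staging repo), and an object like this is never written back to the API server.

```yaml
# In-memory result of translating the in-tree "fast" StorageClass (sketch only;
# this object never appears in the API server).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: pd.csi.storage.gke.io   # CSI driver name replacing kubernetes.io/gce-pd
parameters:
  type: pd-ssd                       # parameters are mapped per-plugin by csi-translation-lib
```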
And even if we could change the objects — say we changed the storage class and the PV, and then you decided that you don't like the new version, that CSI migration is broken and you want to downgrade — you would downgrade to a version that only has the CSI objects, so you couldn't really downgrade. In this design we keep the in-tree PVs and in-tree storage classes as they are, so you can upgrade and downgrade as you want. There is a link to the KEP and the design; it's pretty complicated, and the picture I showed barely scratches the surface.

We have a schedule. As you can see, we started in Kubernetes 1.14, three and a half years ago, with the first alpha. In Kubernetes 1.24, the first two migrations are GA: OpenStack Cinder and Azure Disk. GA means CSI migration is on by default, the in-tree volume plugins are not really used, and you cannot turn it off, so you must use the CSI drivers there. Most of the other cloud-based in-tree volume plugins will follow very shortly: Azure File, Amazon EBS and GCE Persistent Disk should follow in 1.25, and vSphere probably in 1.26. All these timelines are work in progress and subject to change; we have slipped the GA a couple of times already, but now it looks like we are really going for it. For Ceph RBD and Portworx we just started the migration; maybe it will go beta in the next release, maybe not, we will see.

Also, while doing CSI migration, we are sometimes deprecating features that existed in-tree and are not supported by the CSI drivers. For example, for vSphere, if you want to test CSI migration, you need vSphere version 7.0.2, which is a pretty recent one. So if you have anything based on 6.5 or 6.7, it's time to upgrade it to 7.0-something.

If you want to test it and you use some managed Kubernetes in the cloud or an enterprise Kubernetes distro, consult your Kubernetes vendor. They should tell you what to do, and they will also handle the migration during the upgrade to the GA version; just follow their documentation. If you are on your own and run vanilla Kubernetes, you of course must install the replacement CSI driver, and if you want to test it while CSI migration is still off by default, you just need to enable a couple of feature gates. Nothing special here; there is only one catch: you need to flip them in a certain order, kube-controller-manager first and then the kubelets. If you do it in the wrong order, bad things will happen. And when you are flipping the gates on the nodes, you should drain the node, flip the gate, and uncordon it, so that migrated and non-migrated volumes are not mixed on one node; the whole node must be either migrated as a whole or not migrated at all.

As for features: with CSI migration, we support only the features that were supported before the migration. So if a volume plugin supported resizing, the same volume migrated to CSI will support resizing, but it will not get the new features that the CSI driver offers. You will not get snapshots, you will not get cloning; for those you must use CSI PVs right away, it will not work with in-tree ones. And I already told you about the deprecations; please read the release notes very, very carefully.
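Going back to the feature gates for a moment, here is a minimal sketch of what enabling the migration on a node might look like, using the vSphere gates as an example. kube-controller-manager takes the same gates through its --feature-gates command-line flag and has to be updated first.

```yaml
# KubeletConfiguration fragment enabling CSI migration gates on one node (sketch).
# Remember the order: kube-controller-manager first, then drain the node, update
# the kubelet, and uncordon it.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CSIMigration: true          # the umbrella gate, on by default since 1.18
  CSIMigrationvSphere: true   # per-plugin gate; each in-tree plugin has its own
```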
And that was CSI migration. Now, a brief overview of what we did in Kubernetes 1.23. If you had issues with the fsGroup being applied too slowly, because we apply it recursively to every file on the volume, there is now a way to skip it: it's applied only once, when the pod first starts, and all the other pods will skip it. This is now GA. We also allow CSI drivers to opt out of fsGroup entirely. If a CSI driver provides a volume that isn't POSIX or cannot support fsGroup, for example an NFS share with root squash, the driver now has a way to tell Kubernetes, and Kubernetes will not try to apply the fsGroup. And finally, generic ephemeral volumes are GA, which is like emptyDir on steroids, a much better version of emptyDir. As beta, we allow CSI drivers to apply the fsGroup by themselves, but that is only for CSI drivers that have a mount option or something similar to apply the fsGroup; all the other CSI drivers should depend on Kubernetes to apply it. And we kept moving CSI migration forward.

In 1.24, as alpha, users can now cancel a volume resize if it failed on the storage backend. You know Kubernetes, it always retries, retries, retries, but the storage backend keeps saying no, no, no; so now you can cancel the resize. We are also honoring the reclaim policy of PVs better than we did: if you deleted a PV object manually, sometimes the reclaim policy was not honored and we left orphaned volumes in the storage backend. Now we will always honor the reclaim policy, even if you delete the PV manually. We are working with SIG Apps to remove PVCs automatically from StatefulSets. And again, there has been CSI migration work. And that was it for my part. Awesome.

For the rest I should... yeah, well, I closed it. Sorry — I have a recording from Xing, and I hope the audio is going to work. Can you pass the audio from...

[Recording from Xing Yang:] Volume expansion was introduced as an alpha feature in Kubernetes 1.8 and went beta in 1.11. With Kubernetes 1.24, it is finally GA. This is a big milestone. This feature allows users to edit their PVC objects and specify a new size in the PVC spec, and Kubernetes will automatically expand the volume in the underlying storage backend and also expand the file system in use by the pod. This can happen either online or offline, depending on the CSI driver's support. Hemant has done lots of work to bring this feature to GA. The recovering-from-resize-failure feature mentioned earlier is also aimed at making this feature more robust.

The second feature I want to highlight is CSI storage capacity tracking. It was introduced in 1.19, went beta in 1.21, and now it is GA in 1.24. Storage capacity tracking adds an API for a CSI driver to report storage capacity, and that information is used by the Kubernetes scheduler when choosing a node for a pod. This feature is especially important for local volumes, where capacity is bound to each node. Thanks Patrick for driving this feature to GA.

We also have a beta feature, Volume Populators, in 1.24. This is an important feature for the backup and restore use case. The Volume Populator feature allows us to provision a PVC from an external data source, such as a backup repository, not just from another PVC or from a volume snapshot. In addition, it allows us to dynamically provision a PVC with the volume populated from that backup repository, and to honor the WaitForFirstConsumer volume binding mode during restore to ensure that the volume is placed at the right node where the pod is scheduled.
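Here is a sketch of what such a populated PVC might look like; the data source group, kind and name are hypothetical placeholders for whatever custom resource a particular populator watches.

```yaml
# A PVC restored from an external data source via a volume populator (sketch).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 10Gi
  dataSourceRef:                   # needs the AnyVolumeDataSource feature gate
    apiGroup: backup.example.com   # hypothetical populator CRD group
    kind: BackupRepository         # hypothetical kind handled by a populator controller
    name: nightly-backup
```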
When there is a request to create a PVC with such a data source, the Volume Populator controller makes sure a PV is created and populated with data from the data source, and binds it to the PVC. To use this feature, the AnyVolumeDataSource feature gate needs to be enabled. It is beta in 1.24, so the feature gate is enabled by default now.

We also worked on a few other features in 1.24. The volume health monitoring feature allows a CSI driver to communicate back to Kubernetes about a volume's condition after it is provisioned and used by a pod, so that Kubernetes can report an event on a PVC or pod if the volume becomes unhealthy. It has controller-side and agent-side logic. This feature was originally introduced in 1.19; in 1.21, the agent-side logic was moved to the kubelet. In 1.24, we did an update and added volume health information to the metrics on the kubelet side.

In 1.24, we also introduced a new alpha feature, non-graceful node shutdown. This feature allows stateful workloads to fail over to a different node after the original node is shut down or in a non-recoverable state such as a hardware failure or a broken OS. You might have heard about the graceful node shutdown feature and wonder what the difference is between these two. A node shutdown can be graceful only if the shutdown action can be detected by the kubelet ahead of the actual shutdown. However, there are cases where a node shutdown action is not detected by the kubelet, either because the shutdown command does not trigger the systemd inhibitor locks mechanism that the kubelet relies on, or because of a configuration error. To use the non-graceful node shutdown feature, you must enable the NodeOutOfServiceVolumeDetach feature gate for kube-controller-manager and manually set the out-of-service taint on the shut-down node. The pods on the shut-down node will then be deleted, the persistent volumes attached to that node will be detached, and for StatefulSets, new pods will be created successfully on a different, running node.

Next, I want to talk about another new alpha feature, control volume mode conversion. Without this feature, it is possible to create a PVC from a volume snapshot with a volume mode that is different from the original volume mode. This could lead to a potential security issue if there is a CVE in the kernel. On the other hand, converting the volume mode from filesystem to block when creating a PVC from a volume snapshot is being used by backup vendors for more efficient backups. So we introduced this alpha feature that allows Kubernetes to check whether the user has the permission to convert the volume mode and, if not, reject the request. This way, we can support this valid use case only for authorized users.

There are also some deprecations and removals in 1.24. The VolumeSnapshot v1beta1 API is removed in 1.24; please update to the VolumeSnapshot v1 API as soon as possible. The CSIStorageCapacity v1beta1 API is deprecated in 1.24 and will be removed in a future release. vSphere versions below 7.0.2 are deprecated in 1.24; this is related to the CSI migration feature, and we recommend users upgrade to 7.0.2 or higher as soon as possible.

1.24 was just released and we just started with 1.25 planning, so I will quickly go over what we're working on in 1.25. There are a few features targeting GA in 1.25. The first one is CSI ephemeral volumes. For this feature, you set the volume type to CSI inline in the pod definition and specify the driver name and the volume attributes.
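As an illustration, here is roughly what a pod with a CSI inline volume looks like. The Secrets Store CSI Driver mentioned just below is used as the example, and the volumeAttributes value is a hypothetical name.

```yaml
# A pod with a CSI ephemeral (inline) volume (sketch).
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.7
    volumeMounts:
    - name: secrets
      mountPath: /var/secrets
      readOnly: true
  volumes:
  - name: secrets
    csi:                                   # inline CSI volume, defined right in the pod spec
      driver: secrets-store.csi.k8s.io
      readOnly: true
      volumeAttributes:
        secretProviderClass: my-provider   # driver-specific attribute; hypothetical name
```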
For a CSI driver to support CSI ephemeral volumes, it must be modified or implemented specifically for this purpose. A CSI driver is suitable for CSI inline ephemeral volumes if it serves a special purpose and needs custom per-volume parameters, like drivers that provide secrets to a pod; the Secrets Store CSI Driver is a good example. A CSI driver is not suitable for CSI inline ephemeral volumes when provisioning is not local to the node, or when ephemeral volume creation requires volume attributes that should be restricted to an administrator, for example parameters that belong in a storage class.

We are also planning to move the local ephemeral storage capacity isolation feature to GA in 1.25. This feature provides storage usage isolation for shared partitions. There is also the feature that delegates applying fsGroup to the CSI driver instead of the kubelet, which we are targeting to GA in 1.25. We are also targeting a few features to beta in 1.25: CSI volume health, recovering from resize failures, and non-graceful node shutdown.

In 1.25, we are also working on a few alpha features. COSI is a project we have been working on for several releases now. COSI proposes object storage Kubernetes APIs to support orchestration of object store operations for Kubernetes workloads. It also introduces gRPC interfaces for object storage providers to write drivers against to provision buckets. The COSI components include a COSI controller manager that binds COSI-created buckets to bucket claims; this is similar to how the PV controller binds PVs to PVCs. They also include a COSI sidecar that watches the COSI Kubernetes API objects and calls the COSI driver, and a COSI driver that implements the gRPC interfaces to provision buckets.

We also have SELinux relabeling with mount options, which tries to speed up container startup time by avoiding relabeling each file on the volume recursively. And there are volume snapshot namespace transfer and provisioning volumes from a cross-namespace snapshot data source; those features are also targeting alpha in the 1.25 release.

We also have a few features that are in design or being prototyped, and there are cross-SIG working groups and projects listed here. I want to mention changed block tracking, or CBT. This is a feature that the Data Protection Working Group is actively working on. It identifies blocks of data that have changed, which enables incremental backups that pick up only the changes from the previous backup and write only the changed blocks. Without CBT, backup vendors have to do full backups all the time, which is not space efficient, takes longer to finish, and needs more bandwidth. We have done a POC of CBT and the design is in progress; we're trying to target alpha in 1.25. There is also runtime-assisted mounting of persistent volumes, a project we co-own with SIG Node, which we are also trying to bring to alpha in 1.25. Another feature is in-use protection, which we co-own with SIG API Machinery. It proposes a generic way to protect objects from deletion while they are in use. We are targeting alpha in the 1.25 release.

So those are all the features we wanted to talk about today. Now let's talk about how to get involved. Here is a community page that shows a lot of information on how to get started in SIG Storage. We have bi-weekly meetings that happen every second Thursday at 9 a.m. Pacific time.
In that meeting, we go over the features we are working on for each release, discuss designs, and talk about PRs that need attention. Join that meeting and you can learn what we are doing there; if there is anything you are interested in, you can maybe pick it up and help contribute. We also have a mailing list, shown here; join it and you will get invites to our meetings. There are also Slack channels where you can ask questions. That's all we have today. Thank you all for attending. Bye-bye.

[Jan:] I just noticed there is one thing I forgot about: we are deprecating FlexVolume. If you are using FlexVolumes, please move to CSI. That's basically what I wanted to cover here. At the end, there are a couple of links for newbies, like the landing page of SIG Storage and some concepts about PVCs and that kind of stuff. If you are new, this is a great place to start. I am now open for any questions. I'm not sure how much time we have. He said we can go five minutes over, which is very nice of them.

[Moderator:] Wait for the microphone so that your question gets recorded for the recording. Who would like to start? Thank you, Xing — I think Xing is watching. There are no online questions at the moment, but if there are, I will flag you. Questions, come on. I know this is a very talkative group; I've known you all well for years, especially the front row over here. Okay, here we go. Hang on.

[Audience:] I was wondering about non-graceful shutdown of the node. Based on what information is the KCM able to detect whether the node is shut down or not?

[Jan:] Right now, it is a very manual process. Somebody must add a special taint to the node saying that the node is really shut down, and that person must also make sure that the node is really shut down and will not come back in any foreseeable future; only then will the cleanup start. Of course, people are thinking about some automation based on cloud APIs, or based on IPMI or whatever. That will come in the future, I believe.

[Audience:] Hello, do you hear me? We are using a very old provider, which is Fibre Channel. We don't have any CSI provider. What will happen with these kinds of volumes?

[Jan:] So, if we want to migrate an in-tree volume plugin to CSI, we must have a replacement CSI driver. We don't have a good one for Fibre Channel, so we are not migrating it; it will stay in-tree until there is a good replacement. At the same time, I know that most of the storage vendors have their own CSI drivers for iSCSI, NFS and Fibre Channel. So you should start using the vendor driver if you can, and leave the in-tree PVs for Fibre Channel or iSCSI where they are; they are supported until we have the replacement CSI driver. Does that answer your question? Don't worry. Other questions? Okay, hang on.

[Audience:] Hi. Are there any plans for supporting, like, a real production-ready HostPath CSI driver?

[Jan:] The HostPath CSI driver? The HostPath CSI driver is a testing tool. It's not anything like production ready.

[Audience:] Yeah, I know. I wonder if there is any plan or...

[Jan:] What's the use case, I would ask?

[Audience:] So, we are using the local disks for our database, and we would like, for example, to set up some quotas on the local disk, right?

[Jan:] Why can't you use the local static volumes, like the local volumes API instead of CSI, or TopoLVM or something like that?
[Audience:] So, TopoLVM is something that brings in another layer, and we are worried about performance, and we also require an XFS partition for our use case. So something like a production-ready HostPath driver would be really great.

[Jan:] Okay. Like, again, it's a testing tool; there's weird stuff in it that we need for testing. We use it for testing of CSI features, so there are hooks and that kind of stuff that you don't want in production. But the local volumes — like, what's wrong with just local volumes? It should just work.

[Audience:] But there is no dynamic provisioner for them.

[Jan:] Well, so, hold on. Either you want dynamic provisioning, and TopoLVM or something like it does the dynamic provisioning, or you want disks, like raw disks. You can't dynamically provision raw disks.

[Audience:] Why not? Like, maybe somebody brings a disk and connects it to the machine, right? You can call it a dynamic provisioner, but... well, if you create directories dynamically on some partition that is attached to your node, and you are able to set quotas on them, this is something like... it is dynamic provisioning.

[Jan:] Yeah, it is, but yeah. I guess this is a special use case; like, gather some community and write it as a CSI driver.

[Audience:] I did. I wonder whether I'm duplicating the HostPath driver, so...

[Jan:] Yeah, we don't have such plans with HostPath, really. I like this idea, though. It's a community; gather the community and build it, right?

[Moderator:] All right, here we go. We have time for, like, two more questions.

[Audience:] No, I think... because we've just built it. We just released an open source operator for databases. And actually, I think that's a use case that, especially on bare metal, you might want to explore. Two years ago in San Diego, I was actually asking the same questions to the Rook people; they took me for crazy at that time, I remember that. But I think we can talk, maybe. I mean, creating physical volumes and logical volumes on the fly, also with separate permissions — I think that's something that could have a use case. Maybe it's niche, but for high-end...

[Jan:] Yeah, yeah, okay.

[Audience:] We already contributed on the database side, so I think if we can get some help on the storage side, that would be good. Thank you.

[Moderator:] All right. I think that's probably all the time we have, but Jan can probably stick around for a few minutes if you have any extra questions for him. I just want to say thank you, Xing; we hope you feel better soon. And thank you, Jan. Can we give him a hand? Big round of applause. Thank you.