All right, everybody. Our talk is going to be about what's going on with storage and the new things coming up in Kubernetes as they relate to storage. How many of you use Kubernetes today? Any of you? Okay, cool. So most of you are probably already familiar with the basic concepts, which will make this a lot easier. My name is John Griffith. I'm an engineer at Red Hat. Most of my work these days is around Kubernetes, the KubeVirt project, and storage, and I've been doing container-type work for a number of years. A quick recap on some of the storage basics, just in case some of you don't know them; there are some interesting building blocks and principles that we use inside of storage. The base building block is a storage class. A storage class is a definition that allows you to define a type of storage. An administrator can use it in a fungible way, setting it up to represent a back end, a QoS tier, SSDs, or something like that; there are all kinds of interesting options. The next thing is the PV, the persistent volume, which is the actual allocation of storage on a back-end device. It's non-namespaced; it's a global object, and it usually isn't used directly by an end user. What the end user actually interacts with is a claim, the PVC, which is a request from a user to use a piece of storage from a PV. So that's a quick breakdown. The big thing about everything I'm going to talk about is that it's all related to CSI, the Container Storage Interface. CSI is the direction for Kubernetes going forward in terms of storage. Up until recently, there were a number of options: a number of different in-tree providers, the FlexVolume provider, and then a whole bunch of out-of-tree providers that each did their own thing.
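To make the building blocks I just recapped concrete, a minimal StorageClass and a PVC that requests storage from it look roughly like this (the provisioner name, class name, and size are illustrative, not from the talk):

```yaml
# Illustrative StorageClass: the admin's way of defining a "type" of storage.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd                # hypothetical class name
provisioner: csi.example.com    # hypothetical CSI driver
parameters:
  type: ssd
---
# A user's claim (PVC): a namespaced request for storage.
# A matching PV is allocated on the back end and bound to this claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-claim
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: fast-ssd
  resources:
    requests:
      storage: 10Gi
```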
The whole intent of CSI was to put together a standard interface and move all of the storage plugins to a common set of behaviors and expectations, and tighten that up. We went GA with CSI as of Kubernetes 1.13, so it's a real thing now, and it's in the stage of continuing to grow and getting more and more folks to write plugins for it. We're also working on converters to move you from the old, non-CSI architectures onto CSI. One thing I'd like to point out about CSI that gets lost a lot of times, in my opinion, is that the intent initially was for it to not just be a Kubernetes thing: we truly wanted it to cover Docker Swarm, Mesos, all the different orchestrators. That's the idea, and it would be a pretty cool effort. All right, so now on to features. Everything I'm going to discuss is centered around CSI; there are no discussions about backporting any of these features to FlexVolume or anything like that anymore. We did that for resize, but that's going to be it. Just because I talk about these things doesn't mean they're actually ever going to happen: these are things we have put on the roadmap and talked about, and most of them have somebody signed up, but you know how it goes; things could get squashed or changed drastically, so you never know. And finally, if any of you have use cases or feedback on these types of things, I'm going to ask some questions later, but also reach out to me. I'd love to get input from folks that are using Kubernetes and see what we can do. First up, volume resizing. In 1.10, we added support for resizing a volume that was not in use. What we're targeting for 1.14 is the ability to resize a volume while it is in use. There's a lot of extra work we need to do there, right?
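From the user's side, a resize is just raising the requested size on the claim, assuming the StorageClass opts in to expansion. A sketch (names and sizes illustrative):

```yaml
# The StorageClass must allow expansion for resize to work at all.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: expandable
provisioner: csi.example.com    # hypothetical driver
allowVolumeExpansion: true
---
# To grow an existing claim, edit or patch the request upward
# (e.g. 10Gi -> 20Gi). Offline resize applies while the volume is
# not in use; online resize (the 1.14 target) while it's mounted.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: expandable
  resources:
    requests:
      storage: 20Gi             # was 10Gi
```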
You need to do things like quiesce the file system and deal with file systems generally, so it's a pretty massive undertaking. I'm not sure when that may or may not land, but that's the plan right now. As far as use cases, the number one that comes up is: I have a MySQL database, it's getting full, but I don't want to take it down to increase my volume. Personally, I would shut down my pod, increase the volume, and come back; some people want to live on the edge. Next, topology awareness. This is another interesting one. With the CSI plugin model, the plugins are actually external: just like any other pod you deploy inside of Kubernetes, you're deploying a plugin. What's interesting about that is that some storage devices may have requirements in terms of where their software runs inside of the cluster. So what's being worked on today is the ability for an external controller or CRD, running in the control plane, to publish information about the topology of the different nodes and their capabilities and characteristics. That way, when you deploy a driver or a plugin, it can figure out which nodes it can or cannot run on efficiently. Does that make sense? It's fairly straightforward. Okay. Local PVs are another feature that's currently available; it's targeted to GA in 1.14. Typically, people in cluster scenarios are using shared storage of some sort, whether it's iSCSI or NFS or even Fibre Channel, which is still popular. But there are use cases where people want to just use the local disk on a node. That's great for performance, and for things like allocating an entire disk just to yourself so you don't have any contention issues.
You get all kinds of performance out of it, and some security if you need it. Of course, it means you can't do fancy things like failing over to another node and taking your data with you, which is kind of a problem. But that work is continuing to advance and be fleshed out, so I think there'll be some really good stuff in the next release to make it better. Snapshots. Snapshots are another really big thing. Just to recap, a snapshot is a point-in-time copy or record of a volume or data set: your standard storage concept. They were made available back in 1.12 as alpha, and that work continues to grow. It runs as an external CRD as well, so there's an external snapshot controller that you load up and run, your admin configures it, and everything's set. Some of the things currently going on there are additional features like group snapshots: having a group of volumes defined and snapshotting them all simultaneously. And along with that goes consistency handling, so we're back to the whole deal of being able to freeze applications, quiesce file systems, and take point-in-time snapshots. For some of these, I wanted to give an example of what the manifest looks like. You can see it's your standard API shape: your kind is VolumeSnapshot, one of our external CRD types, and then you define the source, which is the PVC that you want to snapshot. Okay, the next one is inline volumes, and I'll be completely honest with you: I've been in a lot of discussions about inline volumes, and some of it's still really foggy to me. Currently in Kubernetes with non-CSI volumes, you have the ability inside of your pod spec to define a volume, and it will just allocate storage on the node and pass it into the container. With CSI, you can't do that today.
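The snapshot manifest I just described, in the v1alpha1 shape of the 1.12 alpha, looks roughly like this (the snapshot class and object names are illustrative):

```yaml
apiVersion: snapshot.storage.k8s.io/v1alpha1
kind: VolumeSnapshot            # external CRD installed with the snapshot controller
metadata:
  name: mysql-snap
spec:
  snapshotClassName: csi-snapclass   # hypothetical VolumeSnapshotClass
  source:
    kind: PersistentVolumeClaim
    name: mysql-data                 # the PVC you want to snapshot
```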
CSI assumes everything's external, so the proposal is to bring that behavior forward into CSI and let you do that. There are two types of inline volumes being worked on for CSI: one persistent and one ephemeral. Persistent obviously means it's durable: you can delete the pod and still get the volume back. Ephemeral, of course, means that when the pod is deleted, the volume is deleted as well. In both cases, what's interesting is that this is not the standard PV/PVC model; it's something created on the back end outside of that PV/PVC context. Here's an example of what the proposed manifest would look like for an inline persistent volume. You'll notice the thing we have down here is a volume handle: some sort of identifier that the plugin can pass to the back-end device to know what piece of storage we're using. And it must exist already, so an admin is typically going to have to create it. In the ephemeral case, we don't pass that in; instead, if the back end supports it, it receives an instruction from the CSI controller saying, hey, I need a piece of ephemeral storage to attach to this node, and it does that. In either case, these bypass the provisioning aspect of CSI. All right. These last two are ones that I'm working on, so I can actually talk about them in more detail. The first is the ability to do a namespace transfer of a PVC. Currently, PVCs are namespaced and PVs are not. What that means is that once you create a PVC and put data on it, that's great, but it is relegated to that island: you can't take it out of that namespace. So what we'd like to do is add the ability to transfer an existing PVC into another namespace.
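Before moving on, here's a sketch of what that proposed inline CSI volume could look like in a pod spec. Keep in mind this was a proposal at the time, so the exact field names are illustrative, not a settled API:

```yaml
# Sketch of the inline CSI volume *proposal*; field names are approximate.
apiVersion: v1
kind: Pod
metadata:
  name: inline-demo
spec:
  containers:
  - name: app
    image: busybox
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    csi:
      driver: csi.example.com       # hypothetical driver
      volumeHandle: vol-12345       # persistent case: identifier for a volume
                                    # an admin pre-created on the back end;
                                    # omitted entirely in the ephemeral case
```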
This lets you do interesting things, in my opinion, like having a trusted user create a template of some data set, or an image if you're doing VMs inside of Kubernetes, and then provide those things to users in the cluster, or even a set of secrets, whatever it might be. It's also really useful for things like volume clones and snapshots, because you can take a clone or a snapshot of these things, but what if you want to put it into another namespace inside of your cluster? Today you can't do that. Well, you can, but it's really hard, very manual, and you have to be an admin. So the idea is to make those things possible. The next one we have is a thing we call populators. The idea of populators is to make PVCs inside of Kubernetes declarative, just like everything else. What I want to do with a populator is be able to say: hey, give me a PVC with this populator type that puts some data on it. This could be a repo from GitHub, a tgz file on an HTTP server, a blob inside of an S3 object store, whatever it might be. The idea is, again, you get back to this model of templates or known-good data sets: your spec specifies exactly what you want, and you get it in a declarative manner, instead of allocating a volume, populating it with the data, and then passing it into your pods or your workloads. We're going to look at these two. In the proposed manifest for the populator, what I wanted to show is that I'd like it to have multiple types. This is where you could specify Git or S3 or HTTPS, and I'd also like to make it extensible, so an admin could create additional types and implementations if they want. This might make a little more sense in a minute when we walk through it.
Here's an example of what it would look like if we just swapped out Git for an HTTPS location. So with that, let me switch over. Is that a decent font size for everybody, or do you want it bigger? Up a little bit? All right, I can certainly do that; this is not my laptop, so bear with me. There we go. What I'm going to do: I got burned really badly doing live demos yesterday, so last night I recorded screencasts of these. I'm learning. This is going to be the example of the namespace transfer. You can see here I just did a get namespaces, and I have the default namespace and demo-ns, so two namespaces on the system right now. I'm also going to show that there are no PVCs on the system: I'll run a kubectl get pvc in both namespaces, and I'll also show there are no PVs; you can't have any PVCs if there are no PVs. Now let's take a look at a spec for a PVC. We didn't specify a namespace, which means it will go into the default namespace. We'll do a create there, and if you look now, you'll see we have a PVC: pvc1 is bound and ready to go. The next thing we'll do is check and verify that we can't see it in the demo namespace; we get no resources found there, as we would expect. Here's our PV, so we know the PV is there and everything is cool. The next thing we want to do is create this thing called a transfer. We have this external controller, the CRD, running, and we have a change running inside of the persistent volume controller that looks for annotations on the PVC that say: hey, I want to do a transfer.
Right now I'm using annotations for this; long term it will either be a new API type, a label, or something else. This manifest is the exact same pvc1 manifest; I just added this transfer-destination annotation. What's going to happen is that the PV controller sees the annotation and acts on it. So we'll do an apply on that, and now if we look at the PVC again, you'll notice some things were added. We're a little wide on our output, but you can see the transfer-destination annotation that I added, and you'll also notice the controller added another annotation. That indicates, and lets me know, that the controller picked up my annotation and a transfer is pending. The way this works right now: I wanted to make sure I wasn't introducing a whole bunch of new security or quota problems, so the transfer stays pending and staged until the PVC is deleted by the user that currently owns it. When that PVC gets deleted, it triggers the controller to change the reclaim policy on the PV to retain it rather than delete it, but only allow it to be claimed again from the new namespace, with the name that was specified. So there are two pieces: it has to be in that namespace, and it has to have the name that we put in the transfer. I delete the PVC, and see, you can't see it anymore. I'll show again that it's not transferred yet, and then finally you can see the PV is still there. Normally, in the default scenario without any changes, that PV would have been deleted, because the reclaim policy by default is Delete. Oh, I forgot, sorry: what I did next is, in the new namespace, I went ahead and created the new PVC.
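To give a feel for it, the annotated pvc1 manifest from the demo would look something like this. The annotation key shown is illustrative of the prototype, not a settled name:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc1
  annotations:
    # Hypothetical prototype key: tells the PV controller to stage a
    # transfer of this claim's PV to the demo-ns namespace on delete.
    transfer.kubernetes.io/destination-namespace: demo-ns
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
```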
I just did a manifest that had that, and so now in the default namespace you see no resources found, but if I specify the demo namespace and get PVCs, there's my PVC. It's the same data; everything is exactly the same, just issued off of that PV. Does that make sense, anybody? Okay, I kind of forgot to explain some things in the middle there, so I want to make sure. Now, if I delete it without putting any sort of transfer on it, it just reverts to the normal behavior: the PVC is deleted, the PV is deleted, everything's gone, and neither namespace has anything. Cool. All right, we'll run through this next one really quickly, and then I want to leave some time for questions, unless you tell me now that you have no questions. This one is the populator piece. At least the colors are there, right? Okay, so again, it's an external CRD that we've already deployed. What I did just now is a describe on the CRD, so you can see what sort of fields are in there. The next thing we want to do is get populators; populators are the objects that we create, and right now we don't have any. I've got this populator YAML manifest here that's going to create the Git populator I talked about. You can see what we're doing: we're saying, hey, I want a populator of type git, and I want to mount this data inside of my container at /git, so that's where I'm doing a git clone to, basically. This is the name of the repo that I want to use, and this is the branch that I want out of that repo; you can also specify a tag and things like that if you want. Let's create that populator object real quick; we can do a describe, and you'll see we now have it, so that's good and ready to go.
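Roughly, the Git populator object, together with the PVC that consumes it through its dataSource (which I'll get to in a second), looks like this. The CRD group, kind, and field names are approximations of the prototype, and the repo is illustrative:

```yaml
# Prototype populator CRD object: "clone this repo onto the volume".
apiVersion: populator.storage.k8s.io/v1alpha1   # prototype group, approximate
kind: Populator
metadata:
  name: git-populator
spec:
  type: git                     # could also be s3, https, ...
  mountPath: /git               # where the clone lands in the populating container
  repo: https://github.com/example/dataset.git  # illustrative repo
  branch: master                # a tag can be given instead
---
# The PVC requests that populator via its dataSource field.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: populated-claim
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  dataSource:
    apiGroup: populator.storage.k8s.io
    kind: Populator
    name: git-populator
```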
Now here's my PVC spec. What I've done is change my PVC spec to have this thing called a dataSource, and the dataSource in this case is going to be a populator.storage.k8s.io type. Before, the only thing that was valid there was the snapshot API; what I'm trying to do, or what I would like to see, is adding populators and existing PVCs to that dataSource field, so that you can do cloning and external-population-type things. What I'd like to do long term is build this out almost like an SDK and make it extensible enough that you could write plugins into this populator, so it would almost be like CRDs plugging into the CRD, so to speak, which might be a little goofy. Anyway, that's how it works. What happens in the background is that the populator controller sees the dataSource, and when it gets it, it launches a Kubernetes job that uses the image and arguments you specified to actually populate data onto that PVC. One of the things missing right now is something to keep your actual application pod from grabbing that volume before it's populated. What we're trying to work on is taints and tolerations for PVCs, which will solve that: the provisioner would see that there's a dataSource associated with the claim that has work to do, so it would put a taint on the PVC so that nobody except that populator can use it. Okay, so the question was: can I not use the init container stuff that already does that? The thing is, today there is no concept of taints and tolerations for volumes at all; that's what we want to change. There are taints and tolerations for nodes and pods, and the proposal is to carry those over and allow them on volumes as well. Next question: if you have transferred the PVC and it retains the volume, how do you
handle not having zombie volumes hanging around, if nobody ever claims the volume in the second namespace? So, the question was: how do you deal with orphans if you created a transfer and the end user never actually picked it up? Right now, I don't. Long term, what we could probably do is have a controller put a timer on it, or make it admin- or user-configurable: the time-to-live on this transfer request is one hour, or one day, whatever it is, and it's automatically deleted if never claimed. I think that's a really good point. Next question: you speak about resize like it's a done feature, but in fact it's mostly useless; in general, we use PVCs in StatefulSets, and resizing a StatefulSet is not supported. Sorry, I'm having trouble hearing. Okay, so you're saying the resize feature is presented as done, but in practice it doesn't help, because PVCs are generally used in StatefulSets and resizing a StatefulSet is not supported. So, StatefulSets are one of the things targeted in the enhancements to the existing resize. You're correct: today, resize only works on a bare PVC, and only when it's not in use, so it's not all that useful in your case. One of the things being worked on, along with the quiesce work and everything on the online-resize side, is the ability to carry that through into StatefulSets and do resizes on StatefulSets as well. All right, next: the transfer being triggered on the delete sounds a little bit scary to me; if I don't have everything in place right, I delete the data. So, the transfer, the way it is right now, is about as safe as I can come up with, because it's running on the existing PV controller and just using the reclaim-policy semantics. But the beauty is that it's also a gated feature, so you can turn it off and not use it if it's troublesome. If it's something that you're interested in, though, in terms of the feature, but you don't like
the idea of how it works or how it looked here, I would love for you to take a look at it and give me feedback on how to make it better. Thank you. All right, well, thank you, everyone.