Alright. Thanks everybody for coming to my talk today. My name is Adam Litke. I'm a software engineer at Red Hat. Today I'm here to talk about declarative pre-population of your Kubernetes persistent volumes, to help make your apps happier. So it's a Kubernetes talk. I don't have enough time to cover Kubernetes on its own, but let's just say it's the de facto way to run containers at scale. A couple of the things I like most about it are the declarative nature of the system and how that's implemented with the controller pattern; we'll get into more detail on how we're going to leverage that today. That leads to the next point, which is that the cluster itself is very extensible: you can add new features on top of the base functionality by creating your own custom resources and your own custom controllers. A couple of the objects in Kubernetes that will be relevant to today's talk are nodes, which are the Kubernetes worker machines. They have everything that's needed to run pods, which I'll tell you about in a second. That includes the container runtime, like Docker or CRI-O, as well as all the Kubernetes bits like the kubelet, kube-proxy, etc. And of course the nodes are managed by the cluster's master to do all the fun work. So what are pods? They're basically a logical application unit that consists of one or more containers. The containers all get scheduled together and run on the same node, and because they're a logical unit, they share the storage and networking that's been given to the pod they live in. Storage gets managed in Kubernetes, in a nutshell, through two resources: the persistent volume and the persistent volume claim. Persistent volumes essentially represent a piece of storage that was provisioned to be available to the cluster for use. That can happen either statically, by a storage administrator directly, or via dynamic provisioning. All persistent volumes have a type.
There are different kinds you can have. If you're running in a cloud, you can have cloud volumes from Amazon or Azure or Google, for example. You can have software-defined storage like Gluster. You can even have local storage or hostPath storage. Lots of different options there. It doesn't really matter at my level what the storage is; this will work everywhere. A persistent volume claim is how a user makes a request for a piece of storage. It's just saying: I need 10 gigabytes of storage, and I might want to choose a storage class to pull it from. What happens is that Kubernetes satisfies the user's request by binding the claim to a volume, and then the user has an empty volume available to associate with a pod. I'll repeat that: Kubernetes is always provisioning empty volumes. That has generally worked for most applications, but it's not always going to be the best plan. Sometimes you're going to want your volumes pre-initialized. Here are a couple of examples. Databases, as they come into containerized environments, often know how to populate an empty database when starting for the first time. But that only happens once; after that point you probably have data that you want to resume from. If you're, for example, standing up that database on a volume from a new storage class, or on a larger volume, you're going to want to import that data, and it should be there before the database tries to start. If you have a big data application and you're processing multi-gigabyte data sets, it's probably better to have that data available when the pod comes up rather than having the pod try to acquire it in real time while it's running. Another case would be restoring from backup. I don't know how many people are actually backing up their persistent data in their Kubernetes clusters these days, but more should be, and more will be as this continues to mature.
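To make that concrete, a claim like the one described above would look roughly like this; the claim name and storage class name are placeholders that depend on your cluster:

```yaml
# A basic PersistentVolumeClaim: "give me 10 GiB from this storage class".
# Kubernetes will bind this claim to a matching (empty) PersistentVolume.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: standard   # placeholder; depends on your cluster
  resources:
    requests:
      storage: 10Gi
```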
So that's going to be another case: you have your backup and it's time to restore. Then the last category is mutable persistent data. This really comes as an inspiration from a project I work on called KubeVirt, which allows you to run virtual machines on top of your Kubernetes cluster. That's way beyond the scope of today's talk, but definitely check it out. In that case, you have virtual machine disks that live on persistent volume claims, and everyone knows that VMs don't boot very well off of empty disks, so you have to have the data in place. Kubernetes does provide a couple of options for populating data into your PVs. The first one I'll mention just because it comes up so often, but it's really not an option, and that's `kubectl cp`. This allows you to copy from your laptop into the file system of a running container. There are several reasons that's not a good idea: the app is already running, so you're too late; you're using the Kubernetes API server to proxy your data, which is not really production grade; and it's also very imperative. I have to sit down at my laptop, do some typing, and put the data in there. That's not really automatable, or in the spirit of Kubernetes. So let's throw that one out. The next option is running a special pod ahead of time that knows how to populate the PVC. That's better. The container image maybe has wget in it and some steps to do the download, or however you might want to pull that data in. So this is pretty good, except you still have to make sure you run it before your actual application, and if you have multiple PVCs that you're managing this way, you're again in charge of coordinating all of that. That sounds like something Kubernetes was supposed to help us solve. The third option is using an init container, so we're getting a little bit closer.
Kubernetes pod specs allow you to have containers, which are your main application, and init containers, which are guaranteed to run successfully before Kubernetes will allow the actual containers to start. This is what one of those specs looks like. In this particular example, we have an Nginx application container, and then an init container running BusyBox that just does a wget from somewhere. So this is getting closer; we're almost there. But in Kubernetes, it's important to note that the lifecycle of persistent volumes was deliberately kept separate from the lifecycle of pods. The reason is that pods can be short-lived, going up and down, iterating over the same piece of persistent storage, which is there for a long time. So I'm not sure it makes sense to tie the lifecycles together by always having population occur in the context of an init container. A couple of other points beyond the caveats of the existing options I've already mentioned: you're always going to want more features with something like this. What about error handling, even in that init container example? Do you want the logic of the populator to retry itself, or do you want Kubernetes to restart the pod over and over until it succeeds? That's a choice you'd have to make. If you want, for example, to access data that is privileged, or you need credentials to access it, you may want to integrate with the Kubernetes secrets API. And if you're using this a lot, with lots of data, you're probably going to want some discoverability or observability into the process, so you'll need some kind of logging, maybe Prometheus endpoints where you can monitor progress. Those are the features; then there's the management aspect again. I've kind of alluded to this already, but it's not pre-population if it doesn't happen before the app starts, so it must always happen first.
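The slide isn't reproduced here, but a spec along those lines would look something like the following sketch; the claim name and the download URL are illustrative:

```yaml
# A pod with an init container that downloads data into a shared volume.
# Kubernetes guarantees the init container completes successfully before
# the main nginx container is allowed to start.
apiVersion: v1
kind: Pod
metadata:
  name: web
spec:
  volumes:
    - name: content
      persistentVolumeClaim:
        claimName: my-data            # illustrative claim name
  initContainers:
    - name: fetch-content
      image: busybox
      command: ["wget", "-O", "/data/index.html", "http://example.com/index.html"]
      volumeMounts:
        - name: content
          mountPath: /data
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html
```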
If you have replica sets or stateful sets in your application which are spawning additional pods, potentially dynamically creating new PVCs to host the application, how are you going to make sure those dynamically created PVCs get populated? And error handling comes up here again, because you don't want to introduce instability or random behavior into your cluster when you start doing this. So I'd like to suggest a better approach, and I'm just taking inspiration from the design that Kubernetes started with, which made it so successful: let's take a declarative approach that uses custom resources and controllers to accomplish what we need. The way this works is that you define custom resources, and post them to your cluster, that describe a population strategy. This could be anything: please check out this Git repository, this branch and tag, and store it in this directory; or download from this web URL and provide the credentials found in this Kubernetes secret. It can really be anything along those lines. Then you have a PVC that wants to be populated. The creator of that PVC references one of the custom resource objects in a field called the data source. All it is is a local object reference, and that's it. The user posts that to the cluster. You then have a populator controller which is sitting there, just like all the other controllers, watching for objects. It discovers the PVC being created, looks at it, and determines that the type of the data source reference is one it can handle. So it takes charge and kicks off the process, which is really just spawning a pod, like we talked about in option two: spawning a pod it knows how to construct, attaching the PVC, and performing the population there.
If we stick with a standard approach like this, it gives us a really great opportunity to collaborate as a community and add those features we were talking about, with everybody sharing, so we end up with a robust solution that has the features everybody wants. So that's the idea; now I'm going to try to show you a demonstration of it. What I have here on screen, it's a little bit off, but it's a Kubernetes cluster running in a VM on my laptop. The next thing I'm going to do is prepare the cluster a little bit: making sure the cluster is up, and then adding some persistent storage to it using local storage. Give me just one second; I'm terrible with these track pads. Okay, so this did a few things. We created some persistent storage, and then, because I didn't want to rely on the famous FOSDEM wireless network, I created my own Git server inside of the cluster, so my laptop is in airplane mode. We pushed some data to that Git repository, and then I started up a populator controller, which is now running at the bottom here. That's the cluster setup. Now let's move to the persona of the user, and what we're going to do is add some objects to the cluster. The first thing is the populator object. You can see what these look like: these are two Git populators, and they're almost the same. The first one is requesting a checkout of the master branch, and the second one a checkout of the v1.0 branch. So let's add those. Okay, those have been added to the cluster. Now let's take a look at a PVC so we can start to populate. This is a pretty basic looking PVC. There's the data source field, where we're just referencing the master populator, and it has a kind and an API group; that's just how a local object reference works in Kubernetes. And we're just asking for 100 megabytes of storage.
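The exact CRD used in the demo isn't shown here, so the API group, kind, and names below are hypothetical, but the shape of a Git populator object and of a PVC referencing it through the data source field would be roughly:

```yaml
# Illustrative custom resource describing a population strategy.
# The apiVersion/kind names here are hypothetical, not a published API.
apiVersion: populators.example.com/v1alpha1
kind: GitPopulator
metadata:
  name: master-populator
spec:
  repository: http://git-server.default.svc/data.git   # in-cluster Git server
  ref: master
---
# A PVC that asks to be populated by referencing the populator
# through the existing dataSource field (a local object reference).
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-content
spec:
  dataSource:
    apiGroup: populators.example.com
    kind: GitPopulator
    name: master-populator
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
```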
So I'm going to create that. Now let's see how it's going. We have a bound PVC, and we have a job that was already run by the populator controller to do the population according to the request. Now I'm just going to show you this little app that we have. It's going to bounce off the screen a little bit, but we have an Nginx container, with the PVC mounted at the place where Nginx looks for web content by default, and then a service so that we can see it in a browser. So let's create that app. Notice how the app doesn't know anything about any kind of population; it's just expecting to have a PVC. Okay, we've created the app; let's see if we can take a look at it. So it's just a basic, super exciting website that has some content in it. Let's see what happens if we adopt the v1.0 branch. First I'm going to clean up that instance of the app. Okay, it's going to take just a second here, with a little bit of aggressive cleanup. Now let's take a look at the v1.0 PVC. It's exactly the same except for the local object reference here, which I'll bump up a little higher: it's now the v1.0 populator. So let's create that PVC, and let's get the PVC to make sure it's bound. And then the same kind of get again for the job. Okay, so we have a completed job again, for the different populator. Now let's create the app and see if anything changed. Much better. So basically what I've shown here is that, in a way that is completely unknown to the app itself and its configuration, we've done population behind the scenes. I will note that when you're shipping web content in Kubernetes, it obviously lives in the container image itself most of the time, so this is a bit of a contrived example, but one that's pretty visible on screen for you to look at; that's why I did it that way. So let's talk about declarative again, because what you have not seen yet is a declarative API.
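For reference, a minimal version of that demo app could look like the sketch below; the names and labels are placeholders, and the point is that nothing in it mentions population at all:

```yaml
# The demo app: nginx serving content straight from a PVC, plus a Service.
# There is no population logic here -- the app just mounts the claim.
apiVersion: v1
kind: Pod
metadata:
  name: demo-web
  labels:
    app: demo-web
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html   # nginx's default content dir
  volumes:
    - name: content
      persistentVolumeClaim:
        claimName: web-content               # illustrative claim name
---
apiVersion: v1
kind: Service
metadata:
  name: demo-web
spec:
  selector:
    app: demo-web
  ports:
    - port: 80
```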
The reason is that there's a promise you make when you're doing something declarative: if the object has state on it, and the PVC has a data source reference, Kubernetes must promise, as I say here, that a pod won't be allowed to run with that PVC until it is in fact populated. Otherwise somebody isn't telling the truth and things aren't working correctly. So how are we actually going to make that happen? You noticed that I first populated the PVC and then said, let's spawn the app. But typically you would have the PVC declared with its data source, and the app, all put together in one deployment. In order to make that work, we're going to reuse a concept that's already employed in Kubernetes today, called taints and tolerations. This was actually invented for use with nodes and pods: you can taint a node so that pods won't run on that node unless you want them to. We're instead going to apply it to PVCs and pods. What happens is that a taint gets applied to the PVC when it's posted to the cluster, if it has a data source reference. That's done by an admission controller as part of Kubernetes. The effect, or what it actually means, is that this PVC needs to be populated before it's generally usable; there's a problem with it, if you will. Regular pods that reference it, like that Nginx app you saw, would not be able to start with that taint on it, so they just wait. But the populator controller still detects the PVC and wants to populate it, so it spawns a pod that has a toleration which says: I'm going to tolerate the empty PVC, because I'm going to fill it in. That pod will schedule fine, and the population hopefully completes correctly. Then the populator controller removes the taint from the PVC, and everything's ready to go. So it's all orchestrated and managed by Kubernetes itself.
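Since PVC taints don't exist in Kubernetes, the following is purely a sketch of what the proposal might look like, borrowing the syntax of node taints and pod tolerations; the field names and taint key are made up:

```yaml
# Hypothetical shape only -- PVC taints are a proposal, not an existing API.
# An admission controller would add a taint like this to any PVC that
# carries a dataSource reference:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: web-content
spec:
  taints:                                   # proposed field, does not exist yet
    - key: populator.example.com/unpopulated
      effect: NoSchedule                    # ordinary pods must wait
---
# The populator controller's pod would carry a matching toleration,
# so it alone can attach the empty PVC and fill it in:
apiVersion: v1
kind: Pod
metadata:
  name: populator-worker
spec:
  pvcTolerations:                           # also hypothetical
    - key: populator.example.com/unpopulated
      operator: Exists
  containers:
    - name: populate
      image: example/git-populator          # illustrative image name
```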
The next steps for us along the lines of this idea are, first, to allow custom resources in the PVC data source field. They're not allowed yet, because that field was actually designed for creating volumes from snapshots and creating volumes from other volumes, that is, clones. So we're going to extend it. We need to do the PVC taints and tolerations; that's already been under discussion in the community and is in progress. And the last thing we want to do is create a repository where we can have an example populator and example CRs, maybe like this Git repo one and some others, to begin the process of collaborating with people, so that we can build up a library of these. Some other examples of things you can do: you've seen the Git repo, and I've talked several times about HTTP import using Kubernetes secrets. You could also do something really cool, which is cloning an existing PVC. If your storage can't do that behind the scenes, you can do a host-assisted clone. The way that would work is that the populator controller would spawn a pod that attaches to both PVCs and just copies from one to the other. Upload is even more crazy and interesting: rather than using kubectl cp to put data into the cluster, you can have an upload proxy server that listens, and when an authenticated request comes in via the Kubernetes API, it forwards the data to a running pod that happens to be attached to a PVC, and proxies the data in, so you could support direct uploads to your cluster. This is something we're actually already doing in a project called the containerized data importer, which I'll link to in a minute, so you can try that out in its current form. And then restore from backup is here again: if you're a backup vendor and you have specific logic for how to pull data out of your backup back end, you could write a populator that puts that data into the PVC. So that's all I have.
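As a sketch, the host-assisted clone case might spawn a pod like this one; the image, command, and claim names are all illustrative:

```yaml
# Sketch of a host-assisted clone: the populator controller spawns a pod
# that mounts both PVCs and copies the data across.
apiVersion: v1
kind: Pod
metadata:
  name: clone-helper
spec:
  restartPolicy: OnFailure     # let Kubernetes retry if the copy fails
  containers:
    - name: copy
      image: busybox
      command: ["sh", "-c", "cp -a /source/. /target/"]
      volumeMounts:
        - name: src
          mountPath: /source
          readOnly: true
        - name: dst
          mountPath: /target
  volumes:
    - name: src
      persistentVolumeClaim:
        claimName: source-pvc  # the existing, populated claim
    - name: dst
      persistentVolumeClaim:
        claimName: target-pvc  # the new, empty claim being cloned into
```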
Here's the link to the containerized data importer. It's part of the KubeVirt project as well, so it's kind of the grandfather of these ideas that we're pushing up into the Kubernetes community, trying to get some generalized traction. I think I have a few minutes for questions. Yeah, go ahead. [Audience] The taints and tolerations for PVCs, when do you expect them to hit upstream? So the question is, when would we expect taints and tolerations for PVCs to hit upstream? We're designing it now; I would hope by, like, Kubernetes 1.14. It's not super complex, and it's an existing concept that everyone gets, so it should be pretty easy to get through, but that's a dangerous thing to say with the Kubernetes community, so we'll see. Anything else? Yeah, go ahead. [Audience] You're using the data source as part of the spec in the PVC, and data source is an existing field in the PVC, so you're leveraging an existing field in order for the populator to work? So the question is, are we leveraging a pre-existing field in the PVC to make this work, and the answer is yes. In fact, we were part of the group participating in the snapshot and cloning features, so this has been a long time coming, and we chose this general-purpose way of declaratively describing the concept that this PVC should have something in it. Any other questions? Am I missing anyone? Alright, I guess not, so thanks a lot, everyone.