Welcome everyone. I'm glad to see so many people, even though I'm competing with the AWS talk. My name is Tomas Netana. I work as an engineering manager for Red Hat in the Czech Republic. I do a bit of technical work from time to time, so I also contribute some code to Kubernetes and other projects. Yeah, you can't hear me. I'm not sure if I can do anything about it, but I'll try to speak loudly. Is it better if I speak like this? Okay, so just wave your hands if you can't hear me, because I know I usually talk too softly, and if you don't understand, just shout. I work for Red Hat on the OpenShift project. That means I have to contribute to Kubernetes as well. I'm active in the storage SIG because of my previous engagements. I started working on Kubernetes about a year ago, and I'm not sure if you realize how large a project it is. I mean, it's not just a matter of the amount of code. It's mostly about the size of the community. If you look around, this is a huge conference for one software project. And the community is not only contributing code; there are many users. They run Kubernetes in production. That means they don't want this thing to break. So there are many processes established around Kubernetes. You cannot just, you know, go and change things there. If you want to contribute code... how many of you have ever tried to send a patch to Kubernetes? Yeah, so a few of you. So you probably know that it takes some time for the patch to be merged. It has to be tested thoroughly. It has to pass a review. It has to be approved. And depending on which part of Kubernetes you're contributing to, only certain people are allowed to approve a patch. So if you create something slightly bigger that touches more parts of Kubernetes, you have to chase the approvals. You have to convince the community that, yes, this is what you want. And no, it's not going to break anything, really.
Things get more interesting if you're not only fixing bugs, because people want you to fix bugs, but trying to add a new feature to Kubernetes. I think a new feature usually means doing all the things that go against stability. You create new API objects or you want to change the API. You want to interact with the other objects, and you have to prove that the existing behavior will not change. So you've got to go to community meetings. You've got to propose the feature. You've got to create a formal document describing your feature. How will it work? How will it interact with the other objects? You've got to go to the community meetings where they regularly track the progress of the feature's development. It's very difficult. And my goal was to teach Kubernetes to take snapshots of persistent volumes. A persistent volume is basically a disk that your pod can use to store data. And the disk will outlive the pod: once the pod goes down, the data stays there. So it's like a normal disk. And people who use disks on computers do things like backups. And a snapshot is part of this workflow. You take a snapshot of a disk and your application can continue. And then you back up the data, for example. That's one of the use cases for volume snapshots. It's a very frequently used feature. So we thought it would be really good if Kubernetes had an API to take a snapshot of a persistent volume, so all the applications could use the same API no matter where they run. So we proposed the feature. And we thought it was simple. It's a snapshot, right? A snapshot is basically a tiny thing. And I come from a Linux background. So for me, a snapshot is lvcreate -s. And there's nothing exciting about it. So I thought it would be simple. All right. So this was the plan. This is how it should look when you propose a feature. So you present the idea to the community. Somehow: send a mail to a mailing list, go to a meeting or whatever.
Create a formal proposal. The proposal has to pass a review. It has to be accepted into the Kubernetes tree. And then, of course, you have to implement the feature. That means the code, again, has to pass the testing and the formal reviews, and it has to be accepted. Then when the code gets merged, right, you can go work on something new and exciting. So what we did: I thought the snapshot was really easy. So how would you do a snapshot? You just tell the system to take a snapshot. So with Kubernetes, I would create an object. It's how we talk to Kubernetes. We just declare, I want this. So my idea was: the user comes, creates a volume snapshot object, and Kubernetes takes a snapshot of the PV that the volume snapshot object references. Simple, easy, would work. I presented the idea to the community by, you know, calling a meeting. So whoever is interested in developing this feature, please join my conference call at that time on that day. And it turned out I was right. Snapshots were a feature in very high demand, judging by the attendance of the call. Unfortunately, if you have this many people, they have many ideas, many opinions on how the thing should look, what it should do, what it should not do. So they ask questions like, would the snapshot object be namespaced or not? Of course it would be, because the user has to be able to create it. So how do I move the snapshot to another namespace, then? No idea. I mean, I didn't think about that. And then people were asking, like, and how do I stop the application before we start taking the snapshot? No idea. I didn't think about that. And do we need that? Can we do that? Can we even do that? Can it be so universal that every application or every pod would be able to tell Kubernetes what to do, you know, to make the snapshot consistent? It's not easy. So the first meetings were really dramatic. We were taking notes. The first change we made was that a snapshot cannot be just one object. We decided to make it two objects, similar to PV and PVC.
So the object that the user would be able to create or delete would be the volume snapshot object. And somewhere outside of the namespace, like a PV, there would be another object. We call it snapshot data. Doesn't matter. And this arrangement would eventually allow us to move snapshots between namespaces, because there is a way to make the non-namespaced object accessible somewhere else. It takes some very difficult gymnastics with editing the objects, but it's possible. And it's one of the things that came out of the meetings. One interesting thing: I work from the Czech Republic. That's the Central European time zone. Most of the people I interacted with at the meetings work from the States or from other time zones. So, you know, we were taking notes. I allowed them to suggest changes. And the meetings went like: we agreed that we will do this, this, and that. All right, I went home. When I came back in the morning, I opened the document, and there was this big, you know, list of suggested changes. We agreed we will do this. No, no, no. Again. So as time progressed, we didn't actually move forward as quickly as I wanted to. And we still hadn't written a line of code. We were still talking about the proposal only. So, we had to come up with some plan B. How do we do that? How do we really give the users the feature? And the plan B was: let's not go through this official path of creating a proposal, creating an implementation, and having it all reviewed and merged afterwards. Let's do it the other way around. Let's, you know, be agile. Create an MVP. Create the small thing that would take the snapshot. Give it to the users and see how they use it. What do they do with that? And see if it works for them or not; if it does not, make the changes, and rinse and repeat. And once we are sure we have something the users want, describe it and call it a proposal. And then we should be able to get it into the main tree. We would be sure we have a useful feature that works the way users want it.
And we would have the code for free because the logic would be basically the same. So, what did we do? It's obvious. Create a custom resource. So, we started with an external implementation. And here's the awkward part of my presentation, because if you've been here before, then you saw all the details of how custom resources are created, how they are compiled, how they are used, what the controller looks like. So, I don't know how much detail I should go into. So, I will just try to be quick: describe what I consider to be the necessary minimum for you to understand how custom resources work, and try to make it somewhat specific by talking about the volume snapshots. Moreover, if you are interested in details, times have changed, because we started working on snapshots a few months ago, maybe even longer, right? And since then, there are many projects using custom resources, custom resource definitions, external controllers. Here's one of the examples. There's the Kubernetes incubator organization on GitHub, where you can see many repositories that you can take a look at and get some inspiration from if you want to develop something on your own. I think it's the easiest way, you know, just copy the other code. That's why open source is so great. So, let's do the volume snapshots outside of Kubernetes first. So, okay, I was too quick. Custom resource, for those who have not been here before: it's basically a custom data type that you can create. It's like persistent volumes, nodes, I don't know, pods. These are data types known to the API server; in API server terms they are called kinds. So, you create your own kind that's initially unknown to the API server. And you can dynamically add it to the API server, register it so the server recognizes it, can store it, can validate it. That's an important thing, because the predecessor of custom resource definitions was called third-party resources. Third-party resources had the problem of not being validated by the API server.
That's why the change was introduced. An implementation detail that I consider important, because it's probably the one constraint on your new data type: it has to implement the runtime.Object interface. That means you have to use, or you should use, the code generators (again, you might have seen it in the previous presentation) to get the necessary methods, like the deep copy methods, for your new object. So, the custom resource is your new data type. The custom resource definition is the built-in API that allows you to register your new data type in the API server and teach the server to handle it. So, you can store your new object on the API server. You can delete it. You can edit it. And that's everything. The custom resource definition is usually used by the external controller on startup to register the type. You don't want to register your new data type manually. So, usually it's the new controller that registers the new data type on startup. Again, you've seen it in the previous presentation, and I bet you will see it in the following ones too, because it's how these things work. This is our volume snapshot data type. This is a piece of Golang code. It's a snippet. Don't try to compile it. It's just taken from the snapshot controller. So, there's nothing interesting in there. As you've seen before, we define some strings that allow you to access the data type. So, the plural is volumesnapshots. When you do kubectl get volumesnapshots, the API server knows it should retrieve this. And you also give it a group name and a version, so it's, I don't know, versioned as well. So, accessible. The data type, or the kind, is, again, if you know what the PV or PVC looks like, exactly that. We got inspired by it. So, there is a spec which says how I want my snapshot to look, which is the desired state. There is the status, which describes the current state of the object.
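To make the spec/status split and the deep copy requirement concrete, here is a minimal sketch in plain Go. It is not the code from the snapshot controller: the group name, the field names, and the hand-written DeepCopy are assumptions made for illustration, and a real implementation would use the Kubernetes API machinery types and the generated deep copy methods instead.

```go
package main

import "fmt"

// Hypothetical identifiers for illustration; the real controller's values may differ.
const (
	GroupName = "volumesnapshot.example.com" // assumed API group
	Version   = "v1"                         // assumed API version
	Plural    = "volumesnapshots"            // the name used with kubectl get
)

// VolumeSnapshotSpec is the desired state: which claim should be snapshotted.
type VolumeSnapshotSpec struct {
	PersistentVolumeClaimName string
}

// VolumeSnapshotCondition is one observed state of the snapshot.
type VolumeSnapshotCondition struct {
	Type    string
	Message string
}

// VolumeSnapshotStatus is the current state as reported by the controller.
type VolumeSnapshotStatus struct {
	Conditions []VolumeSnapshotCondition
}

// VolumeSnapshot is a simplified stand-in for the namespaced custom resource.
type VolumeSnapshot struct {
	Kind      string
	Namespace string
	Name      string
	Spec      VolumeSnapshotSpec
	Status    VolumeSnapshotStatus
}

// DeepCopy returns a copy that shares no slice memory with the receiver.
// In a real project this method is produced by the code generators.
func (in *VolumeSnapshot) DeepCopy() *VolumeSnapshot {
	if in == nil {
		return nil
	}
	out := *in
	if in.Status.Conditions != nil {
		out.Status.Conditions = make([]VolumeSnapshotCondition, len(in.Status.Conditions))
		copy(out.Status.Conditions, in.Status.Conditions)
	}
	return &out
}

// ResourcePath builds the REST path under which a namespaced custom
// resource collection is served by the API server.
func ResourcePath(namespace string) string {
	return fmt.Sprintf("/apis/%s/%s/namespaces/%s/%s", GroupName, Version, namespace, Plural)
}

func main() {
	snap := &VolumeSnapshot{
		Kind: "VolumeSnapshot", Namespace: "default", Name: "snap-1",
		Spec:   VolumeSnapshotSpec{PersistentVolumeClaimName: "data-claim"},
		Status: VolumeSnapshotStatus{Conditions: []VolumeSnapshotCondition{{Type: "Pending"}}},
	}
	clone := snap.DeepCopy()
	clone.Status.Conditions[0].Type = "Ready" // must not affect the original
	fmt.Println(snap.Status.Conditions[0].Type, clone.Status.Conditions[0].Type)
	fmt.Println(ResourcePath(snap.Namespace))
}
```

The deep copy matters because a controller typically works on cached objects; changing a copy rather than the cached original keeps the cache consistent.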
And Kubernetes, or your new controller, takes some action to make the desired state and the current state equal. That's it. This is the custom resource definition. Again, it should be quite self-explanatory. You just specify what you want the custom resource to look like. So, what is its API group name? What is its API version? There's a scope. So, for the volume snapshot, I said we decided to make it namespaced. So, the scope is namespace scoped. And, again, names, plural, kind. Again, it should be self-explanatory. And now, we have it. If you register this in the API server, the API server is able to store, delete, and validate the new objects. But it's not very useful on its own, because you will store your object, you will retrieve your object, but we want something to take action when you create the object. That's why you have to write the external controller. This is the biggest part of the code. So, the controller talks to the API server and watches for changes to the objects. Again, I'm repeating what my predecessor has already shown you in great detail in a great presentation. So, you just register your handlers for additions of objects, updates of objects, deletions of objects, and your controller can then act. So, for example, if the user creates a new volume snapshot object, the controller sees that, talks to the back end, tells it to create a snapshot, and gets some status back from the back end, like, okay, I've started. So, it updates the object. Once the snapshot is taken, it updates the object with success. And that's basically how it works. The difficult part is that no one takes care of the object interactions for you. You have to be careful with the races. So, what happens if somebody deletes the volume snapshot before it's actually taken on the back end? All the corner cases are up to you.
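The add and delete handling described above, including the delete-before-taken race, can be sketched in plain Go as follows. Everything here is hypothetical: the Backend interface, the in-memory store standing in for the API server, and the onCreate hook that simulates a concurrent delete are assumptions for illustration, not the snapshot controller's actual code, which registers its handlers through the Kubernetes client libraries.

```go
package main

import (
	"errors"
	"fmt"
)

// Snapshot is a simplified stand-in for the volume snapshot object.
type Snapshot struct {
	Name   string
	Volume string // desired state: which volume to snapshot
	Phase  string // current state: "Pending", "Ready" or "Error"
}

// Backend is a hypothetical interface to the storage system.
type Backend interface {
	CreateSnapshot(volume string) (string, error)
	DeleteSnapshot(id string) error
}

// fakeBackend keeps snapshots in memory. onCreate runs while a snapshot
// is being taken, which lets us simulate events arriving in the meantime.
type fakeBackend struct {
	next     int
	live     map[string]bool
	onCreate func()
}

func (b *fakeBackend) CreateSnapshot(volume string) (string, error) {
	if volume == "" {
		return "", errors.New("no volume given")
	}
	b.next++
	id := fmt.Sprintf("snap-%d", b.next)
	b.live[id] = true
	if b.onCreate != nil {
		b.onCreate()
	}
	return id, nil
}

func (b *fakeBackend) DeleteSnapshot(id string) error {
	delete(b.live, id)
	return nil
}

// Controller reacts to object events and drives the back end.
type Controller struct {
	backend Backend
	store   map[string]*Snapshot // stands in for objects on the API server
	ids     map[string]string    // object name -> back-end snapshot ID
}

func NewController(b Backend) *Controller {
	return &Controller{backend: b, store: map[string]*Snapshot{}, ids: map[string]string{}}
}

// OnAdd is the addition handler: take the snapshot, then record the result.
func (c *Controller) OnAdd(s *Snapshot) {
	c.store[s.Name] = s
	s.Phase = "Pending"
	id, err := c.backend.CreateSnapshot(s.Volume)
	if err != nil {
		s.Phase = "Error"
		return
	}
	// Corner case: the object was deleted while the back end was working,
	// so the snapshot just taken has no owner and must be cleaned up.
	if _, ok := c.store[s.Name]; !ok {
		_ = c.backend.DeleteSnapshot(id)
		return
	}
	c.ids[s.Name] = id
	s.Phase = "Ready"
}

// OnDelete is the deletion handler: forget the object and remove its
// back-end snapshot if one was already recorded.
func (c *Controller) OnDelete(name string) {
	delete(c.store, name)
	if id, ok := c.ids[name]; ok {
		_ = c.backend.DeleteSnapshot(id)
		delete(c.ids, name)
	}
}

func main() {
	b := &fakeBackend{live: map[string]bool{}}
	c := NewController(b)
	s := &Snapshot{Name: "s1", Volume: "pv-1"}
	c.OnAdd(s)
	fmt.Println(s.Phase, len(b.live))
}
```

Without the check after CreateSnapshot returns, a delete that arrives while the snapshot is being taken would leave an orphaned snapshot on the back end, which is exactly the kind of corner case an external controller has to handle itself.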
Also, the controller is usually the part that takes care of registering the custom resource definition in the API server, so the users don't have to do that manually, or you don't have to do something too difficult. And now snippets of code that I consider to be, again, the important ones. So, you get the clientset. Again, that's the thing that the code generator creates for you. The clientset talks to the API server and creates the custom resource definition. This is a polling routine that waits for the custom resource to be available in the API server so we can start using it. Once the custom resource is available, we install the handlers. Again, we need just the REST client and the scheme, so we can access the objects, and we can go to the API server and install the handlers. So, this is just the... There's a lot of syntactic sugar around it, but the gist of it is this. You say that onSnapshotAdd is the addition handler, onSnapshotUpdate is the update handler, and there's basically nothing complicated, and that's the point of it. It's really simple. If you want to create a simple controller that would do something for you, that would act on changes to your new objects, it's really easy. And again, in the previous talk, you might have seen it in great detail with demos. So, the advantages for us, because I'm now talking about extending Kubernetes with an external thing. Mind you, our goal is still to eventually get the feature into Kubernetes, to get it in tree. But now we are not bound to the Kubernetes development cycle. We can change our code as we want. We can say to the users that this is an experimental feature, something that you don't see in Kubernetes, or don't want in Kubernetes, or that is more difficult to implement inside of Kubernetes. We can make the changes. We can make them whenever we want. We can make deep changes. We can rewrite it completely and be more agile.
That's probably the point of why I think this is a good way, or the easiest way, of getting features like this into Kubernetes. If you have something that isolated, then today I probably wouldn't go down the path we started with. I wouldn't start with the community meetings and formal proposals. I would really try it out myself and show the code to the others first and tell them, see, I have this and it works. Is it useful for you? Yes, no. If no, change, redo. And it would be much, much faster. And we would be talking about a specific, concrete, real thing. I believe this is important. What I really like about CRDs is that it's so easy. It's so easy to create something on my own, something that works and something I can show. There are obviously some disadvantages. What I found in our code, maybe it's because we don't have it right, but I believe it's a consequence of having an external controller: you don't have things like shared informers, because there's nothing to share them with. It's your own tiny piece, it has to talk to the API server, and API calls are basically the only way you can interact with Kubernetes. So the API calls might need to be more frequent, and this might be somewhat less performant than if it had been implemented in Kubernetes itself. But what's more problematic is that this is not what the user gets with Kubernetes. If they install Kubernetes, if they run hack/local-up-cluster.sh, it's not there. So you also have to take care of deployment, or of good documentation of the deployment for the users and for the admins. Role-based access control: again, this is something where you don't have any access control by default in Kubernetes, and depending on how the cluster is set up, the access control might differ slightly. So you have to be really verbose in the documentation and tell your users how to do that.
Also, that turns you into a release engineer and a package maintainer, because you have to take care of updating your upstream images. You should really do some marketing around your project, because it's not in Kubernetes, so it's not so closely watched. So you must make sure that people really know what you're doing and that they will test it, because that's the point of doing this. I believe that this non-technical part is actually the most difficult one when you try to take this path. You get some working code really quickly, but getting that code to the users is not so easy. This is not for me. Another thing we're running into is dependencies. Vendoring in Golang, I don't know if you've seen that. It can become really, really difficult because there are dependency cycles. And now you depend on Kubernetes, which is large. And once Kubernetes' dependencies change somehow, you've got to reflect it in your code. So we spent a lot of time fighting with Glide and updating the vendor directory. Again, this is something that I hope somebody will eventually solve, because I don't think it's bearable. Yeah, I've talked about the lower visibility for potential users and contributors. One more thing that I'm seriously afraid of: because snapshots are such a cool feature, I'm not sure that somebody else in the world is not developing it in parallel. That's something you would probably avoid if you did it right away in Kubernetes. So it's mostly a social thing. In this big project, somebody else would see that and join you. If you're developing it in isolation somewhere else, it's quite possible that the forces will be split. I don't know. I hope that nobody else is working on snapshots. If you know someone who's working on snapshots, let me know. So we're getting to the end of my presentation. So just to sum it up: CRDs are a really, really easy way to extend Kubernetes.
And that's why I think they're a really great way of experimenting with Kubernetes. If you know that your feature is well isolated, then there's basically no reason to start by working in Kubernetes and changing the Kubernetes code itself. You can do it with custom resource definitions and external controllers and then try to merge it back into Kubernetes. And one more thing: if you heard the keynotes in the morning, people talked about slimming down Kubernetes. Much of the functionality that's now part of Kubernetes itself is going to become external anyway. So if we decide not to merge my new feature into Kubernetes itself, there's no problem. I still have not failed. I have it as an external component, same as many other features that will eventually exist in Kubernetes in the future. So this is basically a way to ensure that you don't fail, or don't fail that easily; it lowers the probability of failure. And I still hope we eventually get volume snapshots into Kubernetes. But if not, they exist. You can use them. And if you were at OpenShift Commons yesterday, they are going to be a tech preview in the next OpenShift release. So we're looking forward to some feedback. And I think that's the end. How much time do we have left? Okay, we have seven minutes left for questions. So first question, please. Yes. I agree. The thing is, this is actually quite old. So I think the work on snapshots is older than the aggregator. That's why. Yeah, we started at the end of 1.6, I think. We actually started with third-party resources and changed that later. So, yeah. There was a comment that we could use the API aggregator, and it would also work. This is high-level, close to the code. Yes, it's true. You might have used that. Other questions? Yes? Do you have the snapshot controller somewhere on GitHub? Sure. The snapshot controller is on GitHub. It's in the Kubernetes incubator repository. If you want to take a look, you can. I've built images on Quay, so it should be easy to deploy it.
And there's some documentation. Mind you, it's still very much alpha or beta quality. So it might be unstable. It works for Amazon. It works for GCE PD. It works for hostPath. And we have new patches for Cinder. That's OpenStack. And we have patches for Gluster. But these are very new, so be careful, please. Are there any other questions? Yes, please. It seems like if you have it in the public incubator, you're saying, I'm working on this, if anyone wants to collaborate with me. Yeah. Because you're in the open. Yeah, we do this. But again, that's right. So where do you draw the line, when you have a new feature, between doing it as an external thing with CRDs in an incubator and doing it as a formal proposal, the traditional way? Where is that line? Because it looks like for me, it's having a feature of. Again, we think this is something like a persistent volume. It belongs to volumes. So if you have persistent volumes, you should be able to take a snapshot. For me, it's actually quite a core feature. That's why I want to have it in tree. And it's up to the maintainers and the community to tell me whether I'm right or wrong. So my plan is to really show them that it's this useful. It does these things. Do you want to have them in tree? If they say yes, I'll work on merging it. If they say no, okay, this is the official snapshots now. All right. Are there any other questions? Yes, sir. Yes, for custom resources you can create roles that define access control for them as well. In this respect, custom resources behave like the standard API objects that already exist in the API server. So you can define role-based access even for CRDs. Some more questions? All right. Thank you very much for listening.