Welcome to the maintainer track session for Operator Framework, where we're going to explain to you what's new and what's happening over in Operator Framework land. By way of introduction, my name is Jonathan Berkhahn. I work for IBM. I'm one of the steering committee members of Operator Framework. Also here we have Austin Macdonald from Red Hat, also a steering committee member; Alex Greene from Red Hat, also a steering committee member; and Varsha from Operator SDK, one of our maintainers. Okay. In case you have forgotten what we look like, that's what we look like. Show of hands here: who here has used any of the things on this page? Have you ever built an operator? Do you know what an operator is? Have you ever used one? Okay, pretty much everybody here is at least passingly familiar with what we're going to be talking about today. But in case you aren't: what is an operator? An operator is a design pattern for deploying software on top of Kubernetes. Rather than statically deploying your software by writing a pod spec and running that on top of a Kubernetes cluster, what if we taught Kubernetes how to make your thing by extending the API? Then we would provision and interact with those resources the same way we interact with the normal Kubernetes core resource types. I'm sure we're all familiar with kubectl create; you can use that to push pods. We're going to teach Kubernetes to kubectl create your thing. And then there's going to be some process that lives somewhere, called a controller, that's going to see those new objects being created and is going to go make them happen, whatever that means for your thing. So Operator Framework, the stuff that we make, is an open source collection of tools designed to make this task easier. We have a variety of tools that we're going to be talking about today that can be used across basically all parts of the lifecycle of these operator software things.
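As a concrete sketch of "kubectl create your thing": once a hypothetical custom resource definition is installed, creating your thing is just applying a custom resource like any built-in object. All names and fields here are invented for illustration; a real operator defines its own schema.

```yaml
# Hypothetical custom resource. The group, kind, and spec fields are
# made up for illustration; the operator's CRD defines the real schema.
apiVersion: things.example.com/v1alpha1
kind: MyThing
metadata:
  name: my-instance
spec:
  size: 3
  version: "1.2.3"
```

You would `kubectl apply -f` a file like this, and the controller watching `MyThing` objects would notice it and drive the cluster toward that desired state, exactly the way built-in controllers handle Deployments or Pods.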
So we're going to talk about Operator SDK, which is a CLI that lets you build and scaffold operators. We're going to be talking about Operator Package Manager, which lets you distribute them. And then finally we'll talk about Operator Lifecycle Manager, which lets you manage and upgrade operators on clusters. We're talking about things we've been working on, future plans, and then we'll probably have some time at the end for some questions. So without any further ado, I'm going to pass this off to Austin to talk to you about Operator SDK. Hello. So of course all of you here know that the Operator SDK makes it easier to build Kubernetes-native applications, and hopefully you don't have to know as much as you once did. I'll be talking about our plug-in architecture, the new bundle format, and how you can be validating those operators. One of the things that I've encountered with people coming to the booth is that they're not aware that the Operator SDK and Kubebuilder are now on the same team. If you're scaffolding a Go operator, you're basically going to be getting the same stuff that you'd be getting from Kubebuilder, because we've fully integrated. And we've done that by introducing a plug-in architecture, which means that all of the new cool stuff in Kubebuilder automatically comes down to us. So I wanted to talk a little bit about some of the new plug-ins that were written for us through Google Summer of Code; we had two interns doing that for us. First of all, we've got the Grafana plug-in. If any of you came to my talk on Wednesday, one of the best practices for developing operators is to start collecting metrics early in the development cycle so that you can track how performance changes over time. With the Grafana plug-in, it will automatically scaffold for you the configuration that you need to plug straight into Grafana.
So not only are you collecting Prometheus metrics right out of the box, you can now visualize those metrics right out of the box. And we need to thank Tien Tony for writing that for us, and hire him. Next up, we've got the Deploy Image plug-in. For the most basic use cases, if all you need to do is deploy an image, this will do it for you. It basically gets you through our entire tutorial right out of the box. And we have to thank Nikhil Sharma for providing that to us. Next up, we've got a new bundle format, or rather a catalog format, which you don't really need to know too much about, because the Operator SDK will handle it for you under the hood. But what you can think of is that a bundle can represent an operator; it's just a set of manifests. And then a set of bundles, or operators, is a catalog. This catalog is going to be more human-readable. It's no longer going to be a SQLite database, which hopefully none of you had to deal with those problems, but more on that in a bit. And the last thing that I wanted to talk to you all about is that you can use external validators now with the Operator SDK. This is particularly valuable to pipeline teams validating operators: instead of being forced to use the validators that are provided in the Operator SDK binary, you can now write your own validator, or use validators that are provided anywhere on the web, and run them externally to make sure that your operator bundle is going to be doing exactly what you need. And if you want to learn more, you can obviously go to our website, which hopefully you all know by now. You can have a look at the Kubebuilder plugins, and also be sure to go to the Operator Framework community page, which is not on here, but it's on GitHub: the operator-framework org, slash community. That will list all of the meetings where you can get involved. And I hope to see all of you on the mailing list. So next up we've got Varsha. Okay.
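For reference, the plug-in workflow just described looks roughly like this on the command line. This is a sketch from memory, not verified output: the plug-in identifiers and flags (in particular the Grafana plug-in version and the external-validator flag) may differ in your SDK release, so check `operator-sdk --help` and the published docs before relying on them.

```shell
# Scaffold a Go operator (same layout Kubebuilder would give you).
operator-sdk init --domain example.com --repo github.com/example/memcached-operator

# Deploy Image plug-in: scaffold an API plus a controller that just deploys an image.
operator-sdk create api --group cache --version v1alpha1 --kind Memcached \
    --plugins="deploy-image/v1-alpha" --image=memcached:1.6

# Grafana plug-in: scaffold dashboard configuration for the default metrics.
operator-sdk edit --plugins="grafana.kubebuilder.io/v1-alpha"

# Run an external, out-of-binary validator against a bundle directory.
operator-sdk bundle validate ./bundle --alpha-select-external=/path/to/your-validator
```

The project names, domain, and validator path above are all hypothetical placeholders.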
So now that we know what Operator SDK does, let's look into Operator Package Manager, or OPM. The main task of OPM is to distribute operator content across clusters. OLM initially used a format of SQLite-based registries, where we had operator catalogs; then we had indexes, where we had operator images, and we had the process of fetching content through that and distributing it. Now, unfortunately, the APIs which OLM exposed weren't sufficient to enable operator authors to look into a bundle or even debug it. Imagine the daunting task of downloading an image locally, untarring it, and then typing a bunch of SQLite commands just to reach the point where you can debug a bundle. That's a very daunting task for an operator developer. The solution for this was file-based catalogs. The main goal of a file-based catalog is to enable better catalog editing, verification, and extensibility. Now, as dependencies scale and we have a lot more releases in a catalog, file-based catalogs become a little bit more complex. So veneers were introduced as an approach to make the interaction with file-based catalogs easier. We have two kinds of veneers: one is the basic veneer format, and the other is the semver veneer format. The basic veneer format is pretty straightforward. It helps with bundle identification: it basically lists out the bundle images, which can then be extended into a full file-based catalog. The semver veneer is a little bit more opinionated and a little bit more complex. As the dial on the slide suggests, it has an increased automation level. It not only lists the bundle images, it also gives a holistic view of the channels and the bundles which are being promoted inside each channel, and the packages. And an interesting feature which operator authors will love the most is a visualization of the upgrade graph. Now, upgrades are always painful. So we have a solution for it.
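To make the basic veneer concrete: it is essentially a file-based catalog in which the `olm.bundle` entries carry only an image reference, which `opm` then expands into a full catalog by pulling metadata from each image. The following is a sketch with invented package and image names, and the exact schema fields may vary between opm releases:

```yaml
# Basic veneer sketch: package, channel, and image-only bundle entries.
# Package name and image references are hypothetical.
schema: olm.package
name: example-operator
defaultChannel: stable
---
schema: olm.channel
package: example-operator
name: stable
entries:
  - name: example-operator.v0.1.0
  - name: example-operator.v0.2.0
    replaces: example-operator.v0.1.0
---
schema: olm.bundle
image: quay.io/example/example-operator-bundle:v0.1.0
---
schema: olm.bundle
image: quay.io/example/example-operator-bundle:v0.2.0
```

Rendering a file like this with opm fills in the rest of each bundle's metadata from the referenced images, producing the full file-based catalog.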
That solution comes through the opm alpha render-graph command. The alpha render-graph command generates a Mermaid output visualizing upgrade graphs, and this way operator authors can actually verify whether their upgrade cycle looks like what they want it to be in the production environment. We have a lot more discussion going on about this, and we have documentation on file-based catalogs and veneers, so please do have a look at it. Now I'll pass on to Alex. Hello, KubeCon. Hi, everyone. My name is Alex. I've been working on OLM for about four years now, and it's been a great pleasure working on that project. For those of you that don't know, OLM allows you to extend the Kubernetes API by installing operators onto it. What this looks like in practice for you is: you create an operator with the Operator SDK, you then package it with OPM, and finally you ship it out to your customers. OLM will notice that you released a new version, and will notify all your customers that an upgrade can commence if they're on a previous version, or allow them to install the new version of your operator. And that ability to push releases that developers make off their cluster out to customer clusters has been very powerful and well received by the community. So with that in mind: Alex, you have this incredibly powerful feature that lets operator authors push content to customers' clusters so that they're using the latest and greatest; why would you ever want to update OLM's APIs? Frankly, OLM has been in the game since CRDs were in their beta format. A lot of the design decisions that we made early on in OLM's development life cycle were based off of assumptions about where we thought CRDs were going to go. Early in OLM's development cycle, CRDs had no versioning.
They didn't have conversion webhooks, and a lot of these other nuanced changes really affected the plan of attack that OLM had in terms of the experience we were going to give to customers, especially when you apply it to our multi-tenancy solution. And so, with the knowledge that CRDs have evolved a lot and OLM's management of them has changed over time, it's been very difficult to create a set of APIs that are both backwards compatible with features that we already released and that also provide new features taking advantage of updates that have happened in the CRD space. Some of the design decisions that we made back then we simply wouldn't make knowing what we know now, although they made sense at the time. So what are the new APIs going to look like? OLM's API surface right now is very large, and when you jump into it you have to know a lot about how to use OLM before you can use it effectively. What we hope to do is break OLM down into a set of core components, each facilitating a very focused feature. These tightly scoped controllers will be able to be used independently to fulfill their intended purpose, or can be used in conjunction with one another to provide a more robust experience when installing off-cluster stuff onto the cluster. Okay? So there are basically three APIs that we are actively developing, and we welcome you to get involved with and contribute to them. First, there's going to be the Operator API, and you can think of that as your landing page when using all these components together. It's going to act as a viewpoint for understanding what's available for installation, what's currently installed, what resources are associated with what's currently installed, and stuff of that nature: a holistic view into the entire ecosystem of stuff. We are then going to introduce Deppy. Deppy is effectively a SAT solver. It's a generic one, so other projects will be able to use it as well.
But at a high level, Deppy is going to be used as a way to ask: hey, I have some constraints, which include installing this operator at this version; based on the stuff that can be installed, can I install something that meets those constraints? Okay? And the third component is RukPak. RukPak is simply in charge of installing a set of manifests. So, RukPak: installing a set of manifests. Notice that I didn't say it's in charge of installing operators. What is an operator? An operator is simply a set of manifests, one of which is a CRD that extends the Kubernetes API. We've mentioned that the ability to allow our developers to create a solution, package it, and then share it with all their customers is really powerful. We do not want to limit ourselves by requiring that it has to be an operator. So RukPak is going to be a generic way to retrieve off-cluster stuff and install it on your cluster. So, what are some of the concepts that we've been working on within the RukPak space? The first one is the Bundle CRD. Now, Austin mentioned this earlier in our presentation, but a bundle is a collection of stuff that defines a version of something you want customers to use. Right? So a Bundle points to a set of manifests that you want to install. Now, the second CRD is called the BundleDeployment CRD. A BundleDeployment effectively points to a Bundle, which, again, specifies a set of manifests that you want to install on the cluster. So if you point the BundleDeployment at a Bundle, it's going to install the stuff. Okay? Perfect. Then the final piece that makes up this solution, which is my favorite part of the whole design, is the concept of provisioners. Provisioners are effectively controllers that subscribe to the design philosophy behind RukPak. They serve two purposes: one, they're able to retrieve off-cluster content defined as a bundle; and two, they're able to install stuff defined by a bundle onto the cluster.
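To make "a bundle is just a set of manifests" concrete, here is a sketch of the simplest possible bundle content: a directory of plain Kubernetes manifests, the layout that the plain provisioner discussed below consumes. All resource names and the image reference are invented for illustration.

```shell
# Build a toy "plain" bundle: one manifests/ directory holding ordinary
# Kubernetes resources. Names and images here are hypothetical.
mkdir -p my-bundle/manifests

cat > my-bundle/manifests/namespace.yaml <<'EOF'
apiVersion: v1
kind: Namespace
metadata:
  name: my-operator-system
EOF

cat > my-bundle/manifests/deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-operator-controller
  namespace: my-operator-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-operator
  template:
    metadata:
      labels:
        app: my-operator
    spec:
      containers:
        - name: manager
          image: quay.io/example/my-operator:v0.1.0
EOF

# The whole bundle is just this directory tree.
ls my-bundle/manifests
```

Packed into a container image, or served from a Git repository or HTTP endpoint as mentioned later, this directory is the entire bundle; a provisioner's job is simply to fetch it and apply what's inside.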
So there's the aspect of reaching out and getting the stuff available for install, and then there's the aspect of actually installing it. This is extensible: you can write your own provisioner, and we'll go more into that soon. You can think of this as a BYO-provisioner model. So what does this look like today? On the left side of the screen, we see an example of a BundleDeployment. Within that BundleDeployment, the first field in the spec is the provisioner class name. Each provisioner has a unique provisioner class name that allows the provisioner to know, effectively, "I am supposed to handle this BundleDeployment." In our example, we've written the plain provisioner, and so any provisioner class name that is equivalent to core-rukpak-io-plain is going to be reconciled by our plain bundle provisioner, or BundleDeployment provisioner. If the name didn't match what it's set to, the plain provisioner would leave the BundleDeployment alone. This allows you to specify your own provisioner to reconcile the BundleDeployment. The next thing in the spec is the template. BundleDeployments allow you to specify Bundles inline in the spec of the BundleDeployment, so that you only have a single surface to interact with. This is nice for a development model where you just want to define it inline: it gives you one API, one CR, that you have to create. Alternatively, you could point to a Bundle that's on cluster. So in that template, the template has a spec, and there's a source. The source type is how you define where the bundle content is going to come from, and in this case we're looking at an image type. Then, within the source structure, there is an image and a ref field, and the ref field points to the image that needs to be pulled down and unpacked. And once again, you see the provisioner class name: it's the plain RukPak provisioner.
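Reconstructed from that walkthrough, the manifest on the slide would look something like the following sketch. The field names follow my recollection of RukPak's v1alpha1 API and the image reference is invented, so verify both against the rukpak repository before using them:

```yaml
# Sketch of a BundleDeployment with an inline Bundle template.
apiVersion: core.rukpak.io/v1alpha1
kind: BundleDeployment
metadata:
  name: my-operator
spec:
  # Matched by the plain provisioner; other provisioners ignore this object.
  provisionerClassName: core-rukpak-io-plain
  template:
    metadata:
      labels:
        app: my-operator
    spec:
      # The embedded Bundle spec: where the content comes from.
      provisionerClassName: core-rukpak-io-plain
      source:
        type: image
        image:
          ref: quay.io/example/my-operator-bundle:v0.1.0
```

Applying this single CR both defines the bundle source and asks for it to be unpacked and installed, which is the one-surface development model described above.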
So again, you could use any combination of provisioners to retrieve the stuff off cluster and then deploy it onto the cluster. It's a very plug-and-play kind of system, right? Bring your own provisioner. Now, what does the image actually look like? You can see that on the right side of the screen, where there's a single directory called manifests and a whole bunch of resources that need to be deployed. Those are the resources associated with the version of the thing that you want to install. That is the bundle; that is what will ultimately be deployed onto the cluster. What's really cool about the plain provisioner is that if anything follows that format, you can effectively apply it. So the plain provisioner today supports you specifying an image that has a manifests directory at the base level, and then it will apply the manifests. You could just as easily point it at a GitHub repository: it will pull down the repo, look for a manifests directory, and apply the stuff in there. And I believe you can also point it at HTTP endpoints; as long as it's set up with that directory structure, you're good to go. Some other provisioners that are actively under development right now include a Helm chart provisioner. So effectively you can specify charts, and the BundleDeployment spec has a blob that allows each provisioner to define its own spec, so you can supply a values field with all the key-value pairs that you want to inject into the chart. That allows support for Helm charts that are already available. With that, that's everything I wanted to say about RukPak, and I'm going to hand it off to Varsha so that she can talk about Deppy. So, a bit more about Deppy. It's the most interesting module out of the whole structure. Deppy is basically a Kubernetes API, an independent module that helps with dependency resolution for RukPak-based models. So what can it do? It can decide what to install.
It can decide if a set of constraints can be met, and it can identify any dependencies. Basically, if I, or a user, have to install an operator, it can check and tell me if the operator is installable without breaking any other dependencies present in the cluster. Just as a note, this project is still under design discussion; we are brainstorming a lot about it. So we do welcome you all to give your inputs and suggestions. We have GitHub repositories for RukPak and Deppy, and we have independent Slack channels, rukpak-dev and deppy-dev. So please do give your thoughts and suggestions on how you would like OLM v2 to look. And just as a note, I know today is the last day, but we do have interesting demos on a few of the topics which we spoke about, so please do visit our booth; it's the Operator Framework booth. And yeah, please provide your feedback on our talk and on future talks, the topics which you would all like to listen to. And I think we do have time for a few questions. So, any questions? We couldn't quite hear; one more time, please. So: do we have any tighter integrations with OpenTracing or OpenTelemetry for Operator SDK? Not that I'm aware of, but with the plugin architecture, if you have a specific need, it would not be very complicated to add a new plugin. But do mention that in our kubernetes-operators channel in the Kubernetes Slack, or actually a better place might be the operator-sdk-dev channel in Slack, and possibly we'll create an issue and see if we can meet that need. Any other questions? Can you spend a little bit of time on how easy it is to debug an operator? I think someone mentioned debug, but a little bit of insight into how easy it is to debug. Do you mean an operator itself, or building an operator with Operator SDK? An operator itself. Okay, so who here has ever tried to run a pod, and it blows up, and you're not really sure why the pod's not starting?
So you start, you know, running kubectl logs, and you see, oh, it's timing out when trying to pull the image, or for some reason the Docker container's not starting, or that sort of thing. It's pretty much like that. I mean, it also depends, I suppose, on how much logging you're doing in your controller. But it's basically the same workflow: you've got some sort of resource that is probably a thing running in a pod somewhere, and you have a controller that's responsible for reconciling it, and you've got to look for the logs in the various places to see where it's getting stuck. Why is it just sitting there and not actually coming up and running? So if you're at all familiar with debugging applications running on Kubernetes, it's a very similar experience. Does that answer your question? Okay, thank you. One thing I might add on to that is that the SDK, and possibly Kubebuilder, I forget if they do this, provide the ability to actually run your operator from the command line. What this effectively looks like is that it uses your kubeconfig and just runs the code like any code would run on your computer, so you can do rapid development, testing to make sure that it's receiving events from the API and reconciling them, without even spinning up a pod. You can just Ctrl-C out, make a change, spin it back up, and it allows you to really quickly iterate on the state and status of your controller. I've got one more hot tip. So apparently CERN has an operator to deploy their Drupal-based websites, which is apparently like a thousand websites or something like that, and they have a really interesting pattern where they add an annotation that tells the operator to stop. So basically you can tell the operator to stop at any point in your reconcile loop, which is a really great way to get in there and get some fantastic debugging at exactly the moment that you need it.
You would add this to your reconcile loop by hand in development; you wouldn't want it in production. But you can think of it as like a breakpoint in GDB that will just stop the operator in its tracks right then. So it won't be adding any more logs, it won't be noisy anymore, and the last thing that you'll see in the logs will be the bit of information that you're interested in. Just to add a bit more to that: we do have the scorecard testing framework, where basically you can test your operator on cluster without actually deploying it. It's more a kind of validation, where you can write a particular set of rules to check that what is expected is what is happening, before you send it to the production environment. I guess my question is: for existing v1 users of Operator SDK, what does the upgrade path for v2 look like? So, are you talking about the OLM upgrade? Yeah, it should be pretty seamless. We have a registry v1 provisioner, so once again, you can write a provisioner to handle whatever format you like keeping your stuff in, right? The registry v1 provisioner is aware of CSVs, and so it is backwards compatible. I think there's some nuance there, but it should be relatively painless.
From the Operator SDK side of things, assuming you use that to generate your bundles, the plan is that it's going to be completely seamless. You're going to, you know, cut a new version of your operator, you're going to generate a bundle for it, and the interface and user experience should be literally identical, although different stuff will be going on behind the scenes, so you hopefully shouldn't notice a difference. Oh, here we go. Sorry, one last note. This is a feature that I kind of just glossed over, but the existing bundle format is allowlisted, so you can only include certain resources inside the bundle. With the plain provisioner, it's currently anything, so there's no longer a limitation there; if you have something you want included, you can just throw it in. This support will be provider-, sorry, provisioner-dependent. So, thanks for the presentation of OLM, and I have questions, probably because I was not quite following the idea of OLM. Before this, if we had to create an operator and wanted to deploy it, we would possibly first install the CRD and then deploy the operator, and I guess OLM helps to resolve that overall process: you can just use one command to cover all of those things. So my question is: if we do want to do that, what is the difference between wrapping it in a Helm chart versus dealing with it through OLM? And is it possible to drive, through OLM, an operator that is not scaffolded by Operator SDK but is rather a customized operator? Yeah, so I'm going to answer that second question first, because it's a little more scoped. OLM doesn't really care how your operator is written; as long as you package it in a format that OLM v0 supports, which includes a CSV, OLM will run it. It just needs to be packaged the way we want it. But with the new model, that limitation no longer exists, because of provisioners, where you can define your own format for deploying stuff; you don't have to include a CSV for OLM to know how to manage it. Now, with that said, in terms of why you would use, oh, I'm trying to jump in real quick, I have another thing to say about the second one. If you have a non-Operator-SDK operator, it's also fairly trivial to add the little bits so that you can then use Operator SDK to generate a bundle for it; that's pretty easy. Yeah, the SDK does make it very easy to build a bundle that subscribes to the old packaging format, and it will do it for the new one too. And so, going back to your earlier question, why would you use OLM versus Helm? I think they serve slightly different purposes, and they're both great tools that you can use to meet your needs, depending on what they are. What OLM really tries to emulate is that you have brought a new cluster admin onto your cluster that's going to automatically know when new versions of things appear, so that it can move things along. It handles the upgrade automatically if you want it to, or you can do it in a manual mode where you just click OK, yep, approve the upgrade. And the ability to subscribe to the latest and greatest is a really powerful on-cluster thing. If a thing breaks, OLM is going to recover: if you installed an operator and somebody deleted it, OLM is going to come in and recreate it so that that service is still available on your cluster. That means DevOps teams aren't going to get pinged and asked to go in and fix the thing after it breaks. And one last thing is that Helm has somewhat shied away here; they released an official statement saying that they're not giving full support to CRDs as a resource. CRDs are very difficult to version, largely because when you introduce a CRD you're introducing a new API, meaning that people can create resources, and these resources are stored in etcd. And when they're stored in etcd, there's only a single version of your CRD that's actually stored,
so as you introduce new versions of the CRD, there is a potential to invalidate the existing CRs in the etcd database, and if you invalidate them, that means they can't be updated until they subscribe to the new storage format. What OLM does, which is kind of cool, is, it doesn't do it before you kick off the upgrade, but once you kick off the upgrade, OLM will say: hey, this upgrade includes a new CRD version, and that's the new stored version; there's stuff on cluster that does not match that new format, so upgrading would invalidate that stuff; therefore I'm going to block the upgrade until you fix those existing CRs. I think we have time for one more question. Yeah, that's really great, actually; we were facing problems with the upgrade of CRDs. So does it take care of the admission checks as well, somehow, or how does that pan out, actually? Can you define what you mean by admission checks? So, the admission server, basically all the checks, like we have the admission webhooks, right, the mutating and validating webhooks; are those also encompassed in the operator bundles? So, even with the existing OLM version, you are able to define admission and conversion webhooks, right? So there's no issue there. I will add in an asterisk and say that we only support it with self-signed certificates from OLM, which can be kind of wonky, so just know that in advance. But, so, yes: if you create an admission webhook or conversion webhook, it will be there, and we will handle the upgrade and installation of those things. I do recommend, in general, though, moving away from webhooks and looking up CEL, the Common Expression Language that's recently been introduced to CRDs. Webhooks can be fairly dangerous: if they are ever not available, you can effectively block Kubernetes garbage collection, and that can really cause some problems on a cluster in terms of resource consumption. So I really encourage anyone who's writing a CRD to look at the new CEL expression language that has been introduced. Oh, totally, yeah, but now's a good time to start moving to it, because it's out. I think that's time. Yeah, that's all the time that we've got, so we will be around to answer questions at the booth; we are going to head down there right now if you want to follow us. Thank you, everybody, for coming.