Hi, everyone. Welcome to this talk by the OLM team for KubeCon North America 2021. In today's talk, we will be discussing OLM catalogs and how a paradigm shift from imperative to declarative, in the context of building OLM catalogs, drastically reduces the cost of building and maintaining catalog artifacts. My name is Anik. I'm a software engineer working at Red Hat, primarily focusing on the Operator Lifecycle Manager project, commonly referred to as OLM. I'm joined by my teammate Joe, a principal software engineer also working at Red Hat and also part of the OLM team.

In today's presentation, I will start with a short refresher on what an operator is, followed by a short refresher on what the Operator Lifecycle Manager project is about. I will then talk about what OLM catalogs are, discuss what the current catalog building process looks like, and showcase some of the pain points involved in building catalogs the current, imperative way. Joe will then talk about the new declarative solution for building catalogs, how it eases the pain points of building imperative catalogs, and in the process show the significant reduction in cost the new solution enables for everyone involved in building and maintaining catalogs. Finally, we will close out with a demo of the new workflow for building catalogs.

So let's begin with a short refresher on operators. Operators were first introduced by CoreOS as a concept for a special class of software. They are application-specific controllers that extend the Kubernetes API to create, configure, and manage instances of stateful applications on behalf of Kubernetes users. An operator builds upon the basic Kubernetes resource and controller concepts, but includes domain- or application-specific knowledge to automate common tasks.

Now, when you talk about any software, you need a way to manage the lifecycle of that software, at least in the context of enterprise software. This is where the Operator Lifecycle Manager project comes into play. OLM is a component of the Operator Framework, an open-source toolkit to manage the lifecycle of operators in a streamlined and scalable way. OLM extends Kubernetes to provide a declarative way to install, manage, and upgrade operators on a cluster, and it provides a way to make operators and their services available for cluster users to select and install. Essentially, OLM enables operators to behave like managed service providers through the APIs they expose, and it has built-in mechanisms to ensure cluster stability while those operators are being used on the cluster. These features are enabled by allowing operators to express dependencies on specific platforms and on other operators, together with rich update mechanisms that keep operators up to date automatically via over-the-air updates to catalogs.

So catalogs: what are they? To discuss catalogs, we first have to go over what an operator bundle is. An operator bundle is essentially all the manifests that define one version of an operator, packaged in a directory. So what do I mean by the manifests that define the bundle? As you can see here, it's a directory with a Dockerfile and two folders. The first folder, the manifests directory, contains kubectl-applyable YAML manifests, including the custom resource definitions that the etcd operator owns, the cluster service version, which is an API exposed by OLM that allows the etcd operator to relay various information to OLM (we'll see a few examples of that information in the next few slides), and other Kubernetes resources, like config maps and services. The metadata directory contains application metadata, including the name of the operator, the version, and package information, including dependencies. The dependencies I mentioned on platforms and on other operators are expressed through the dependencies.yaml file. So this directory fully defines version 0.6.1 of the etcd operator.
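To make that directory structure concrete, here is a rough sketch of how a bundle for a hypothetical etcd 0.6.1 release might be laid out. The file names follow common operator-registry conventions, but they are illustrative rather than taken from the slides:

```text
etcd-bundle/
├── bundle.Dockerfile            # builds the bundle image from the two directories below
├── manifests/
│   ├── etcdclusters.crd.yaml                              # CRDs the operator owns
│   ├── etcdoperator.v0.6.1.clusterserviceversion.yaml     # the CSV for this version
│   └── ...                                                # other resources (RBAC, services, ...)
└── metadata/
    ├── annotations.yaml         # package name, channels, and media-type annotations
    └── dependencies.yaml        # dependencies on platforms and other operators
```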
And once you have that directory, those Kubernetes manifests are packaged into a container image and stored in a container registry. That is what we mean by an operator bundle and an operator bundle image.

So now that we've covered what an operator bundle is and how operator bundles are expressed for OLM, how do I make a catalog of these operator bundles? Essentially, a catalog, or an OLM catalog, is a collection of references to these operator bundle images that I talked about. It's a SQLite database that contains all the information about the operator bundles in that catalog, and we'll talk about how bundles are added to this database in the next few slides. Like I mentioned, in OLM you can define upgrade graphs for your operator. For example, etcd 0.6.1 can upgrade to 0.6.2, so in the operator bundle table you would have these two bundles stored. And for channels, for example, you can say that both of those bundles belong to the alpha channel, so the channel table will have the information about the alpha channel, and the channel entry table will have the information about both versions belonging to the alpha channel. Basically, all of the information that is required to define each bundle, and the relation between the bundle versions of your operator, is stored in this database.

This SQLite database is then packaged into a runnable container image. You can see an example of the Dockerfile here: it's basically the index database file being copied into the container image, and the contents of that database are then served over a gRPC API with a command that we'll talk about more in the next few slides. Once you have this catalog image, the way we introduce it to the cluster is through the CatalogSource API. You can see over here in the red box, that's my catalog image. Once I create this catalog source with this catalog image, what I get is a registry pod, essentially a pod that starts up and exposes the content of that SQLite database over the gRPC API, and anybody can query the content of the SQLite database through that API endpoint. An example of that is the PackageManifest API, which is also owned by OLM and leverages this endpoint to let users discover what operators are available to install on their cluster. So you can do kubectl get packagemanifests to see what operators are available to install on your cluster.
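For reference, a minimal CatalogSource pointing at a catalog image might look something like this. The name, namespace, and image reference are made up for the example:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: cool-catalog
  namespace: olm                # wherever catalog sources live in your OLM install
spec:
  sourceType: grpc
  image: quay.io/example/cool-catalog:latest   # hypothetical catalog image
  displayName: Cool Catalog
```

Applying a manifest like this is what causes the registry pod to come up and start serving the catalog contents over gRPC.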
Now, how is this database created? The operator-registry project, also a component of the Operator Framework, is where we originally housed all the tools that we use to create and manipulate this database. The project started off with an array of binaries: it had the initializer binary, the registry server, the configmap server, and each of these did its own job. For example, the initializer created a new SQLite database. Then we had commands to add the bundle directory that I showed earlier into that database, so all the information from that directory would get pulled into the database. The registry server binary would then use the database to expose its content over the gRPC API. So we had a lot of binaries going on. But with the introduction of the operator bundle, and with the introduction of the concept of storing these bundle directories in container registries, OPM was introduced to house all of these operations under the same tool and also to work on container images. The way OPM started off was pretty simple. It had the registry sub-command and the index sub-command, and between them they had add, rm, and serve commands. The add command would just add an operator bundle: once you have it in a container registry, you would pass the image reference to opm registry add or opm index add, and that would extract the information from the bundle image and store it in the database. The rm command would remove an entire operator. So you could add v0.6.1 and v0.6.2 of the etcd operator using add, but you had to do opm index rm etcd, and the entire operator, with all the versions that existed for it, would get removed from the database. And finally, you had the serve command that would, again, serve the content of that database over the gRPC API.

With these new commands, we had to build out processes for building these catalogs. So imagine you have a catalog called the cool catalog. Obviously, you have a team responsible for building and maintaining this catalog. And then you have the etcd operator team, the Prometheus operator team, and a lot of other operator teams who want to make their operator available to cluster users through your catalog. So they would first build each bundle that they have in their operator's upgrade graph, and submit those to the team responsible for creating the cool catalog. The cool catalog team would then run opm index add over the bundle images that were submitted to them to create the cool catalog image, essentially. That was fine. But imagine a scenario where an operator team makes a mistake and wants to replace a bundle in the catalog. Suppose the etcd team makes a mistake in v0.6.1 and wants to replace it. The pipeline team then had to opm index rm the etcd operator entirely from the catalog, and all the bundles had to be rebuilt. The reason we had to rebuild all the bundles is that v0.6.2 has a reference to v0.6.1 of the etcd operator. Think about it as a linked list, where v0.6.2 is where we say that this version replaces v0.6.1. So we had to rebuild all the bundles and then opm index add all the bundles individually back into the catalog. That was still fine. But now imagine a scenario where an operator team wants to fix a CVE in an existing bundle, or otherwise wants to replace a bundle in the catalog. Like I mentioned, the pipeline team had to rebuild all of the bundle images, remove the operator from the catalog, and then re-add all the bundles. Imagine now that the pipeline team is getting this request for 10 or 20 operators out of the 100 operators that you have inside the cool catalog. Certainly not a fun time for the cool catalog team.
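To give a feel for what that imperative workflow looked like, replacing one bad etcd bundle meant roughly the following two operations. The image references are hypothetical, and exact flag spellings can vary between opm versions:

```sh
# Remove the whole etcd package from the existing index image
opm index rm \
  --from-index quay.io/example/cool-catalog:latest \
  --operators etcd \
  --tag quay.io/example/cool-catalog:no-etcd

# Re-add every (rebuilt) bundle image for the package
# (depending on your container tooling, the intermediate image may need to be pushed first)
opm index add \
  --from-index quay.io/example/cool-catalog:no-etcd \
  --bundles quay.io/example/etcd-bundle:v0.6.1,quay.io/example/etcd-bundle:v0.6.2 \
  --tag quay.io/example/cool-catalog:fixed
```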
So this is when a request came in for OPM to add a new sub-command called prune, which would get rid of all of the packages, all of the operators, in the catalog except for the ones specified. That only helped a little bit; it didn't address all of the pain points that I described earlier. Now imagine another scenario, where an operator team wants to add a bundle in the middle of the upgrade graph. This is, again, where we have to rebuild all the bundles, because we have to edit the replaces field for each bundle that comes after that bundle. And again, like I said, think about this as a linked list: you're adding an item in the middle of the linked list, so you would have to rebuild all of the linked list items, in our case container images, opm index rm the operator, and then re-add all the bundles to the index. So this is when another request came in, to add a substitutes-for field in the cluster service version API that I mentioned earlier. Essentially, this would add a new field to your database, and through the cluster service version you could declare that a new bundle is a substitute for an old bundle. Now imagine another requirement coming in from another team, where they say: OK, we released a bundle earlier, but now we want to deprecate that bundle, and therefore the bundles that came after that bundle in the upgrade graph would have to be removed so that users cannot upgrade to that bundle. In the setup I described, the cool catalog team had no way of doing that. This is where the request came in to add another sub-command, called deprecatetruncate for lack of a better name, which would deprecate a bundle and then truncate the upgrade graph, i.e. remove those other bundles from the upgrade graph.

So, as you can see, I was just talking about the cool catalog, but imagine three or four catalogs being built, with each catalog being the responsibility of an individual team. Each team would come up with their own unique way of building and maintaining these catalogs, and therefore they would have their own unique requirements for how OPM should behave, for how that tool should let them handle and maintain the catalog. And as you could see, we kept getting requests for adding more and more sub-commands to OPM, and that increased the surface area of the OPM sub-commands. And we all know: with more code comes more bugs that we all need to maintain. So this is where I will hand it over to Joe, who will talk about how we solve this problem with the new solution. Joe, I think you're muted.

All right, so as Anik was saying, we got into this really difficult situation where all these operator authors and pipeline teams that were building catalogs would keep asking for more and more features, and we're almost at the point now where adding a new feature is going to add another three or four bugs on an unrelated feature, and that is difficult to maintain and keep on track. So what we came up with is a brand new format for the underlying index, which is just declarative plain-text files stored in a directory structure. So now every package, every operator team, so you might have an etcd operator, you might have a Prometheus operator, all of those operators can store their own package-level metadata for their index, just in a plain-text YAML or JSON file.
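To give a sense of what those files contain, the package-level metadata and one bundle entry for a hypothetical etcd package might look roughly like this. The schema names (olm.package, olm.bundle) are the real ones from the file-based catalog format, but the values and image reference are illustrative:

```yaml
---
schema: olm.package
name: etcd
defaultChannel: alpha
description: Hypothetical etcd operator package.
---
schema: olm.bundle
name: etcdoperator.v0.6.1
package: etcd
image: quay.io/example/etcd-bundle:v0.6.1   # bundle image OLM unpacks to get manifests
properties:
  - type: olm.package
    value:
      packageName: etcd
      version: 0.6.1
```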
So the opm tool has now been updated to support this new plain-text format. The SQLite database format has been deprecated, and we're going to try to migrate a lot of the catalog usage over to this new declarative format to alleviate a lot of the problems that Anik was talking about. This is just a short example of what something looks like. You have a couple of root-level schemas. Here we see an olm.package, which has package-level metadata: the name of the package, the default channel for the package, the description for it. And then you can start listing other things, like an olm.bundle, which is just an index representation of that bundle image. So we have the name of the CSV that was in the bundle, we have the package that this bundle is a member of, we have an image reference, which OLM can use to unpack that image and find those manifests and metadata that Anik was showing before, and then we have a list of properties that the OLM resolver can use to make decisions about what can I install, what dependencies do I need to have installed, what can't I install because the dependencies aren't met, that kind of thing. So now all of this stuff is plain text, and operator authors have the control to make changes declaratively based on what they need their index to look like. The issue Anik was explaining about inserting into the linked list of an upgrade graph now just means cracking open this index file and inserting the changes in the right place, rather than having to do this imperative workflow of removing the whole thing and then re-adding everything in order.

So there's a lot of motivation behind this. I think some of it is already kind of obvious from what Anik has been talking about, but the three main things we were looking for are editability, composability, and extensibility. From an editability standpoint, kind of like what I was just explaining, now both operator authors and catalog maintainers can directly edit the contents of the index. They don't have to rely on a tool like opm that knows how to change only a very specific set of things about the index; they can go and make arbitrary changes. So this opens the door for lots of new use cases. One case would be: I want to take a bundle that I already added a long time ago and add it to a new channel. Now I can just go into my index's plain-text file and add it to the new channel, done. Before, you would have to basically rebuild that bundle image, delete the entire package from the index, and then rebuild the entire index with the new bundle image that has the new channel reference. Composability: because this is organized as an arbitrary directory structure, there isn't a single source of truth for the entire contents of the catalog. Now operator authors can build an index that contains just their package, and there are a couple of ways you could use that. You can have a catalog that says, I want to maintain a list of references to other indexes that already exist, and I'll just copy those into my larger catalog. So basically we get this composability feature where you can build indexes of indexes of indexes, just by copying more sub-indexes into your index; see the sketch below.
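One possible layout for a composed catalog, assuming the common convention of one sub-directory per package, would be something like this. The directory and file names are illustrative:

```text
cool-catalog/
├── etcd/
│   └── catalog.yaml        # etcd package index, maintained by the etcd team
├── prometheus/
│   └── catalog.yaml        # prometheus package index, maintained by that team
└── ...                     # more packages copied in from other sub-indexes
```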
And then lastly, extensibility: because this is a plain-text format, JSON or YAML, there are lots of programming languages and lots of tools out there that understand JSON and YAML. So it's really easy for both operator authors and catalog maintainers to build their own tooling on top of this schema and format, and come up with things that the OLM team either hasn't come up with or has decided is too niche a concern for the larger opm tooling to maintain. It gives all of these other users an escape hatch to build their own workflows around the underlying schema.

So we've got maybe five or six new commands for opm that support this new file-based catalog format. A couple are really straightforward. opm init, all it does is generate that olm.package blob for a package. You give it a package name, you can specify some flags for the default channel, an icon, or a description, and it'll just generate and output on standard out the olm.package blob. You can then pipe that into a YAML file, or pipe it through yq or jq to make some edits. It's kind of the UNIX way of doing things, where it prints to standard out and then it's up to the user to decide how to use that. render is the same sort of thing, except that it takes an existing index image, or a bundle image, or an old SQLite file, and produces the equivalent file-based catalog for that content. This is nice because all of that existing content out there that's in a SQLite-based index image or in a SQLite file can just be readily converted over to this new file-based catalog format by running opm render on the index image that you want to convert. And again, this will just spit that entire thing out on standard out, and then you can redirect that directly into a file and package that into a new file-based catalog index, or you can pipe it through some other tools and maybe split things out per package. opm validate is a key tool that is super important for file-based catalogs. The difference with file-based catalogs is that when you make edits to a plain-text file, there's no way of ensuring that every single edit is valid. With the SQLite-based indexes, opm could guarantee that any changes made to the database were valid, based on opm's code and based on the database schemas underlying those commands. So with file-based catalogs, we needed a validate command that users can run when they make changes to their file-based config, and it will tell them whether what they have built is actually a valid catalog. opm serve is basically the equivalent of the opm registry serve command for SQLite databases, except this knows how to read a source directory that contains file-based catalogs. It serves the exact same gRPC API, and it's fully backward compatible, so you don't have to use a new version of OLM to use this: any version of OLM that works with the gRPC API will work with file-based catalogs as well, via this opm serve command. Then we have a couple of alpha commands. These are still subject to change, which is why we've got them under the alpha sub-command. opm alpha diff is basically a tool that you can give an old and a new ref, and it does pretty much what it says: it'll tell you what changed between the old reference and the new reference. These references, again, are basically index images or declarative config directories.
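Putting a few of those commands together, a typical sequence for starting and checking a file-based catalog might look something like this. The package name, paths, and image reference are made up, and exact flag spellings can vary between opm versions:

```sh
mkdir -p etcd migrated

# Generate the olm.package blob and start a new per-package index file
opm init etcd \
  --default-channel=alpha \
  --description=./README.md \
  --output=yaml > etcd/index.yaml      # --description takes a path to docs for the package

# Convert an existing SQLite-based index image (or a bundle image) to the new format
opm render quay.io/example/old-catalog:latest --output=yaml > migrated/index.yaml

# Check that the edited directory is still a valid file-based catalog
opm validate ./etcd
```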
And then lastly, there's opm alpha generate dockerfile. This takes the place, a little bit, of opm index add, which in the past would try to manage your images for you: it had underlying tooling and configuration to let you decide which container tools to use to push and pull images. For instance, opm index add has flags like --pull-tool and --build-tool, so you could say pull with docker and build with podman, and opm would actually invoke docker and podman for those actions. We've decided, based on lots of feedback and lots of real-world usage, that trying to maintain these image push and pull workflows directly in opm is not very maintainable. So the alternative we've come up with is basically: let's just have a command that generates a Dockerfile that knows how to package and serve your catalog, and then it's up to you to take the next step and invoke whatever your tool of choice is, whether that's docker build or podman.

So let's look at what this new catalog building workflow is. Right now we're seeing how this used to work; this is what Anik was explaining earlier. Operator teams would submit bundle references, and only bundle references, so a bundle basically had to fully express every change that could possibly be made to a catalog. And then the cool catalog maintainers would run opm index add, and opm index add had to know how to interpret that bundle and update the catalog appropriately. Now what we can do is say that each operator team maintains their own GitHub repos or directories, however they want to organize their directory of index metadata, and they can do a couple of things. They still need to push their bundle images, like they always have, to a container registry. And then, when they go and talk to the catalog, they can just make declarative changes to their operator in their package-based index and submit pull requests to the catalog that say: here are the changes I want to make to my index. There's another option where an operator team could say, we want to maintain our own index repository, and maybe the catalog says, that's fine, just tell us where your index repository is and we'll pull from that. So there are lots of new workflows that we're opening up by having this composability that I was explaining earlier. The other nice thing here is that now we can build some interesting CI/CD workflows. When the etcd operator team opens a pull request against the cool catalog, the cool catalog might have some GitHub Actions, for example, that run opm validate, that maybe check that the author of the pull request is in the OWNERS file, i.e. has permission to update that particular package in the catalog, or at least require a review from one of the maintainers of that package. It can also build the entire new index based on that pull request, validate the entire index with opm validate, and potentially spin up a Kubernetes cluster, install OLM, and attempt to install operators out of that package. So there are lots of interesting CI workflows that we can build on top of this new format; see the sketch below.
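As a minimal sketch of what one of those pull-request checks could look like, here is a hypothetical GitHub Actions workflow. It assumes a catalog directory named cool-catalog and an opm binary fetched from the operator-registry GitHub releases (the repository layout and download details are assumptions, not taken from the talk):

```yaml
# Hypothetical PR check for the cool catalog repository
name: validate-catalog
on: pull_request
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2
      - name: Install opm
        run: |
          curl -Lo opm https://github.com/operator-framework/operator-registry/releases/latest/download/linux-amd64-opm
          chmod +x opm
      - name: Validate the file-based catalog
        run: ./opm validate ./cool-catalog
```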
And then lastly, at the very end of this, if all of that passes, then the cool catalog could say, you know, yeah. Sorry, I don't know if we're trying to change slides, but the slides are not changing. That's okay, yeah. Yeah, I'll get to the next slide, I just want to finish my thought here. So the last thing is that once the entire CI has passed and we merge that PR into the main branch, we can actually build a new image from it.

So we have some examples of what this looks like. We have this cool catalog repo; it's just an example, but it demonstrates how all of this could work. In this case, we've got just a single operator: we've got an etcd folder, we've got a Dockerfile, and we've got some GitHub workflows. The current etcd index has just a single version in it: we declare our olm.package, name it etcd, and we've got 0.6.1. So now we decide, okay, we've got an etcd 0.9.4 bundle that we want to add to our index. So we render that, and we can see what it looks like in the output. We call render again and redirect it, just cat it onto the end of our index file. And then we submit a pull request. So we've just added some lines to our index file, and this is what the rendered output looks like there. We've submitted this pull request to the cool catalog, and we say: hey, we want the index to be updated such that this new 0.9.4 version is available to users who are using this catalog in their cluster. Then it goes through that CI and review process that I was explaining. We've got some GitHub Actions that can build the image and push it to a registry, and at the end of this run, after this thing is merged, we've got a brand new catalog image pushed that has this 0.9.4 version available. So at that point, users of this catalog will potentially get an automatic update, if their catalog source points at a tag that we're updating, or, if they don't have this catalog in their cluster yet, they can add it via a catalog source, and now they'll be able to install the etcd 0.9.4 version.
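The command side of that walkthrough, with hypothetical image references and file names, would look roughly like this. The Dockerfile produced by opm alpha generate dockerfile is named after the catalog directory, and in the example repo a checked-in Dockerfile could be reused instead:

```sh
# Render the new bundle and append it to the existing etcd index file
opm render quay.io/example/etcd-bundle:v0.9.4 --output=yaml >> etcd/index.yaml

# Sanity-check the catalog before opening the pull request
opm validate ./etcd

# After the PR merges, CI builds and pushes the new catalog image
opm alpha generate dockerfile ./etcd
docker build -t quay.io/example/cool-catalog:latest -f etcd.Dockerfile .
docker push quay.io/example/cool-catalog:latest
```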
So there are lots of resources here. We're constantly updating our documentation: if you go to the olm.operatorframework.io website, we've got all the docs, and there are lots of docs about this new file-based catalog. If you want to check out this example cool catalog workflow, here's the link to that; it's operator-framework/cool-catalog on GitHub. And if you're interested in OLM and you want to join the OLM or Operator Framework community, we'd be super interested. We're a CNCF incubating project. We have weekly meetings, both open public working group meetings and issue triage meetings, and if you're interested in discussing anything with us at those, we'd be happy to have you. We also have a mailing list, so if you have questions or comments about the project and want to talk to us there, we're interested in hearing what you think. Awesome. If you want more live chat discussion, we've got an OLM dev channel in the Kubernetes Slack, and there's always the full listing of ways to get in touch with us on the OLM community page. So we're super interested in getting more eyes on this project and more contributors to the project, and we welcome any contribution from the community that we can get. And I think that's all I've got. Anik, did you have anything else? Yeah, no, just thanks for joining, everybody. We'll be sticking around to answer any of your questions, so we'll talk to you there. Thank you. Thanks.