Welcome to our talk on developing applications: Automating Stateful Applications with Kubernetes Operators. Hello, my name is Marek. I am a developer advocate on the OpenShift team at Red Hat. At the end of the presentation I will do the hands-on part and show the demo — not a live demo, but whatever. And my colleague is Jorge.

My name is Jorge Morales. I come from Spain, and I work as an OpenShift developer advocate for Red Hat. OK. This is the agenda of what we are going to be talking about today. Just take a look at it. I hope this is what you expected to see; if not, there is still time to go to some other talk. But it's going to be about operators — Kubernetes operators.

Scaling stateless applications in Kubernetes is easy. There is a Kubernetes primitive that allows you to model and scale stateless applications: the ReplicaSet. Scaling up is really easy. You just have to execute one little command — kubectl scale, the name of your application, and the number of instances that you want — and you'll have your application scaled. Kubernetes is a declarative platform, so it will look at the state that you as a user have declared you want your application to have, which in this case is three instances. And it will look at the real state that your application has in the cluster. In this example, there is one instance. And it will just scale it up: it will create a couple more instances to match your desired state of three instances of your application.

OK. But what about applications that require you to store some data? What about applications that have the notion of a cluster? Databases are one example. There is software whose different instances might have different identities, different roles — masters and slaves. How easy is it to run those on a Kubernetes cluster? Well, running a database in Kubernetes is also easy.
It's just a matter of kubectl run and the name of your image. Or, if you want to use a StatefulSet, you probably write a JSON or YAML definition of your StatefulSet and apply it. And then you have your application, your database, up and running.

But running databases or clustered stateful applications on Kubernetes over time — that is much harder. Why? Because there are specific actions that stateful applications need over time that are more complex to achieve and that are specific to your application. Resizing and upgrading might be different from one application to another. The same goes for reconfiguring, doing backups and restores, or healing. Some applications might require, for example, some rebalancing of their data whenever they fail over; others may not. These kinds of constructs are specific to each stateful application, and you cannot model them in a generic way.

Every application on any platform must be installed and configured. But over time, applications also need to be managed and upgraded. Why do you need to do these kinds of things over time? Because once you install a database today, you want that database to still be up to date in a year or two. There will probably be security vulnerabilities that require an updated version of the database. Maybe there is a new version which provides new features. Security is a topic that is critical for everybody, and patching your software is critical to security. Security is critical to every business. So we eventually need to take care of the software that is running on our platform and update it.

Anything that is not automated is slowing you down. Who has never told somebody, hey, I need to upgrade my database? And then they get in contact with the DBA, and they start planning what the upgrade process is, what steps need to be taken.
And then they think, hey, when can I do this upgrade on the production cluster? And they plan for that. All of that takes a lot of time. If you can automate it — if you can automate all those processes, all those steps — they can happen much faster, and you get more agility in your application lifecycle.

If only Kubernetes knew. If only Kubernetes had the knowledge of how every application is managed over time: how it's upgraded, installed, configured, backed up.

So meet this guy, Grant. He's a DBA. He's fictional, of course — he doesn't exist. He's been working for a big database vendor for over 18 years. Do you know how much knowledge this guy has? A lot. In these 18 years, he has probably worked with different versions of the database. He has worked with many customers. He has seen different cluster topologies, database topologies, many different setups. He has a huge amount of expertise. But the problem is that this guy is rare. Not every organization has one of these guys. And even if you have one, not everyone can get hold of him at any moment. I used to work in a company where getting the DBA to help you with your project was like: hey, yeah, in one month you'll get some of the DBA's time. Like, cool. This is slowing my whole process down.

What if we could take his knowledge and put it in a box? Put it into a software component. We can turn what he knows into a software component. And this software component, which now does and knows what Grant knew about running databases over time, we can deploy on different clusters. And not only deploy it on different Kubernetes clusters — anybody can use it. From small companies to big companies, everyone now has the ability to have a DBA in a box. They have the ability to have production-ready databases anytime.
Or they have software that now knows how to upgrade the database whenever a new release comes. They now have the knowledge — the automated software — that knows how to configure the database, how to back up and restore that database. As long as this is running on a Kubernetes cluster, the software component will work.

So we come to the point: what are operators? Operators are automated software managers for Kubernetes applications. They deal with the installation and lifecycle of your application. One of the key pieces here is Kubernetes applications. This talk is about applications that, when they are created, take advantage of Kubernetes constructs. They will be configured in a cloud-native way using secrets, config maps, all these Kubernetes things under the hood. But the pattern itself can be used with OpenStack software, with any type of software — the pattern itself is not really specific to Kubernetes.

So what is the Kubernetes recipe? The recipe is to provide, through the extension points that Kubernetes has, hooks into the platform to make this work. Kubernetes has for some time been working on making the platform extensible: pluggable networking through CNI, pluggable storage through CSI, pluggable runtimes through CRI, and pluggable custom applications through custom resource definitions. Using these custom resource definitions and controllers, you are able to build your own application's operational knowledge into the platform. You no longer need to modify the Kubernetes code base to extend it with additional functionality. You just plug your additional functionality into Kubernetes, and Kubernetes will know: hey, this is custom, this is for this guy, this is how he wants to manage his software — I'll pass this to his specific software manager. So application-specific controllers are how this is built into Kubernetes.
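A custom resource definition like the ones just mentioned is itself just a Kubernetes manifest. Here is a minimal sketch — the group, kind, and fields are hypothetical, loosely modeled on the production-ready database example used in this talk:

```yaml
# Hypothetical CRD that teaches the API server about a new resource
# type. (On current clusters you would use apiextensions.k8s.io/v1,
# which additionally requires an OpenAPI schema per version.)
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: productionreadydatabases.example.com
spec:
  group: example.com
  version: v1alpha1
  scope: Namespaced
  names:
    kind: ProductionReadyDatabase
    singular: productionreadydatabase
    plural: productionreadydatabases
```

Once this is applied, the API server accepts `ProductionReadyDatabase` objects like any built-in kind, and the operator's controller watches for them.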
There is a set of controllers running in Kubernetes that will use the Kubernetes APIs to create, configure, and manage instances of your applications — of your complex stateful applications, of your software. These are packaged as a container image. This is deployed into Kubernetes like any other application. So it's a native Kubernetes application that runs and acts as a Kubernetes controller, and Kubernetes will, at some point in time, talk to this controller.

Then you need to model what your application is, how it will behave, what characteristics you want for it. This is a made-up example for a production-ready database. I might decide that whenever I create instances of this, I want to define the cluster size, how many replicas, or even the version. And notice that I'm saying the version, but I'm not saying where the software is coming from. That is built into the controller. The controller has all the knowledge of where it should be pulling things from, where it will be getting the images. The only thing that I want the user to provide is how they want to use the software, kept as simple as possible.

Then this descriptor that defines my application — how my application will be instantiated in the cluster — I deploy into the cluster using the custom resource definition mechanism. And then Kubernetes will say: hey, you are trying to deploy something called a production-ready database. I don't know anything about this production-ready database. But I have a guy — the operator — that knows about it. So I will give him the ability to start managing this production-ready database with the information that you have just provided.
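A descriptor like the one just described might look like this — a hypothetical sketch with made-up group, kind, and field names:

```yaml
# Hypothetical custom resource: the user declares only *what* they want.
# Where the images come from, and how to get there, lives in the operator.
apiVersion: example.com/v1alpha1
kind: ProductionReadyDatabase
metadata:
  name: my-db
spec:
  size: production   # a sizing profile the operator understands
  replicas: 3        # desired number of database instances
  version: "4.0"     # desired software version, not an image location
```

Applying this manifest is all the user does; the operator watching this kind picks it up from there.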
So what will happen is that the operator will take care of reconciling the desired state that you provided through this manifest into the cluster. In this example, I defined a database with three instances. So it will probably create a StatefulSet for the database. But it may not only use a StatefulSet — it may use secrets and config maps to store configuration that is internal to how I have modeled my application, to the knowledge about my application. If, instead of creating the database from scratch, I had provided a new manifest with an updated version, this controller would take care of upgrading the database the way it's meant to be done. It might stop certain instances and start them back up, or do some rebalancing — whatever is embedded in the operational logic that goes with the operator.

So, to make things easy, Red Hat and CoreOS have been working on operators — a couple of years, maybe three years now, on this pattern. They have gathered a lot of best practices and recommendations on how to do things, and they have created something called the Operator Framework. This framework is an umbrella for different, more specific projects that I'm going to describe now.

One of the things that happens is: I created an operator, I deployed my operator, and two years later, that operator that was deployed back then also has a lifecycle. The application may be on a different version; the processes that are involved with the database are probably also different. So I also need to upgrade my operator. I need my operators to be kept up to date. For that, why not use the same pattern? An operator is a software manager for applications — but because an operator is itself an application, I can have a software manager for operators. So this is an operator for operators.
And this is what the Operator Lifecycle Manager is. It's an operator that will take care of the lifecycle of your operators. And because operators are meant to be backward compatible, one of the cool features we might want in a cluster is to have them updated automatically over time. Like on your phone: whenever you have newly installed an application, you can say, hey, I want this application to be automatically updated when there is a new release — because you trust the vendor that is providing it to have the expertise to deploy the database in the most meaningful way. You can say: I trust this operator to be upgraded if it's coming from a trusted source, so I will let this operator upgrade automatically. Otherwise, you will just get a notification in your cluster — hey, there is a new version of this operator — and then you decide when you want to apply the update yourself. Either way, the Operator Lifecycle Manager will take care of upgrading that operator using the same techniques and the same patterns that the operators themselves use.

There is also a huge ecosystem — a marketplace of operators — and it's growing. All the ISVs, all the vendors, are trying to produce operators for their technology, because this eases the adoption of their technology a lot while also making it easy to run. That means I can now deploy a Redis or Postgres or Prometheus, and I don't need somebody who is an expert in operating that Prometheus, because the people developing it are giving me the expertise they have in managing it. There is a GitHub repository with a collection of open source operators that we are gathering, and there are more added every day.
Once you have the operators, you need to have a marketplace. So the framework also provides an operator marketplace. This is the implementation from OpenShift, but you can have your own implementation in your Kubernetes cluster. It will show you what operators are available in your sources of operators. Just as Docker Hub or Quay.io are sources for container images, the operators will, in the end, reside in some software channels, and this marketplace will connect to them and show you the available software that you can deploy.

As a cluster admin of your Kubernetes cluster, you might say: hey, I want to deploy this operator — for example, Couchbase — because I want my developers, my users, to deploy Couchbase databases. So as an admin, I go to this marketplace and deploy the Couchbase operator. I might decide whether I want to subscribe to automatic updates or not. Once it's available, the only thing that is left is for the users, the developers, to go to the service catalog and say: I'm developing, and now my application requires a Couchbase database — instantiate me a Couchbase database. That database is going to be instantiated by the operator, and it's going to be managed and lifecycled by the operator.

We also have something cool that we're working on for metrics: getting statistics, getting information about the operators and the things created by the operators, so you can eventually do some telemetry and chargeback. You can say: hey, you are using this technology, you have used so much of it — or, in the end, you may need to be charged for the use of this technology.
At the end of the day, if you deploy those Couchbase databases and you are paying subscriptions or licenses, you may want to know which departments are instantiating these Couchbase databases, to be able to do some reconciliation of the costs.

So how do you build your own operator? For that, the framework includes an SDK. The SDK is something that makes it easy for you to create operators. How? By providing scaffolding and code generation to bootstrap the boilerplate. You just run the operator-sdk command line — operator-sdk new and the name of your project — and it will give you the project structure for creating an operator, with a lot of code already built, so that you just plug in your operational logic. The generated code provides the extension points that cover the common operator use cases: backup, restore, install, upgrade, configure, reconfigure. All of that boilerplate is already created for you, so the only thing you need to write is what is specific to your application. All the wiring of these things is provided by the SDK.

It also provides high-level APIs and abstractions for you to use when creating your operational logic. You don't need to be a Kubernetes expert to use this SDK, because the SDK simplifies the process of working with Kubernetes. Normally, whenever you are working with a Kubernetes resource, you need to watch the API server for changes, and then you need to keep a local list that stays coherent with it — all of these things that are complex for engineers to do. All this complex code has been simplified and turned into high-level abstractions and APIs for you to use. So you just call an SDK method, and there is a lot of nice code happening under the hood that simplifies the creation of operators.

And with that, I will hand it over to Marek. He will give you a taste of it. Thank you, Jorge.
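To give a feel for the abstraction just described, here is a tiny, self-contained simulation of the watch-and-handle pattern in plain Go. It is a conceptual sketch — the types and functions are made up for illustration, not the real operator-sdk API:

```go
package main

import "fmt"

// Event is a simplified change notification, like what a watch on the
// API server would deliver for your custom resource.
type Event struct {
	Kind     string
	Replicas int // desired replicas declared in the resource
}

// Handler is what the operator author implements: it receives every
// event and reconciles the cluster toward the declared state.
type Handler interface {
	Handle(e Event)
}

// watch simulates the SDK machinery: it delivers each event to the
// registered handler, so the author never deals with the raw
// API-server watch, caching, or resync logic.
func watch(events <-chan Event, h Handler) {
	for e := range events {
		h.Handle(e)
	}
}

// dbHandler is a toy handler that just records the last desired state.
type dbHandler struct{ current int }

func (h *dbHandler) Handle(e Event) {
	fmt.Printf("reconciling %s toward %d replicas\n", e.Kind, e.Replicas)
	h.current = e.Replicas
}

func main() {
	events := make(chan Event, 2)
	events <- Event{Kind: "ProductionReadyDatabase", Replicas: 3}
	events <- Event{Kind: "ProductionReadyDatabase", Replicas: 5}
	close(events)

	h := &dbHandler{}
	watch(events, h) // the "SDK" drives the handler
	fmt.Println("final desired replicas:", h.current)
}
```

The operator author's whole job reduces to the body of `Handle`; everything around it is framework plumbing.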
Can you hear me fine, if I have the mic like that? So before I start, I would like to ask you a few questions. How many developers do we have here? That's the people who build cool stuff. Okay, a lot. How many operations people do we have here? That's the people who run cool stuff. Okay. And how many managers do we have? That's the people who slow down the other two groups. Ooh. Okay, almost no one. So we have a lot of engineers and a lot of operations people — that's good. As Jorge said, the operator framework is all for you. We have a few slides that actually show how to use the operator SDK itself. Would you like to see those as well? They're mostly for engineers. You would? Okay, we can do that.

So this is the section mostly for engineers. First, you run the operator SDK. The operator SDK bootstraps the skeleton of the application and generates some stuff for you, and then you can follow up from there. It follows a basic layout that is good and has been tested to work for this kind of application. It's a CLI application that you can use. Second, once you have bootstrapped your application, you will need to define your CRD. The CRD tells Kubernetes: this is what my resources are going to look like. Here you can see an example of the resource — it could be Jorge's production database that he was speaking about before. When I have these two things, I need to tell the SDK that I will be watching something. You see that you don't really have to go into the big details of communicating with Kubernetes.
You are essentially saying: this is what I care about — watch it for me, and this is my handler that is going to receive all the notifications. Has anybody here coded something using the Kubernetes API, something like an operator or something in this area? Raise your hand. Okay. How difficult is it? Pretty difficult, right? If you do it in the real world, you will see that you need to listen to a lot of changes, you need to check the state, you need to handle a lot of race conditions. The API is powerful, but also quite challenging if you want to build something that's really production ready. So if you see this and you have been doing something like that, you will see that this is pretty, pretty cool.

And the handler is very simple; we don't have to write it here. But then you have the reconciliation — I hate that word — the reconciliation logic that's inside the handler. I will come back to that in the demo, because I have a slide that shows all three steps next to each other. Then you generate the code that produces the deployment and all the resources you need to deploy the operator. And finally, you can push it to Docker Hub, Quay, whatever you like, and then apply it on your Kubernetes cluster, so you have your operator running. It will also create the RBAC configuration, so it sets up the permissions and all the user configuration correctly. That's another benefit that you get for free.

Okay. And then you can just deploy an application, just as Jorge did before. So this is, from the perspective of the developer, what a developer needs to do. I am simplifying a bit, right? We are at a talk, so there is still a lot of logic that needs to be handled in the reconciliation loop, and your handler will do a lot of work.
But essentially all the logic that needs to be done for communicating with Kubernetes — getting the right information, being notified at the right time — has been pulled into the framework. So you don't have to work so hard against Kubernetes. That has been done by the SDK already.

Okay, so now this is an etcd cluster. Who knows etcd? Good, everybody knows etcd. So this says: give me an etcd cluster, it will have three nodes, and it will be version 3.3.9, right? This is the custom resource that I can apply to Kubernetes, and my etcd operator is going to consume it and do something. And that something is here, in three different steps.

First, the operator is observing the state. Essentially it knows what the state of the world is. So I know that there is a cluster with two nodes, and those two nodes are 3.3.8 and 3.3.9. What did we specify? That we want 3.3.9, right? So what is happening here? It has picked up the change already and redeployed one of the nodes, because there was an upgrade. It's 3.3.8 and 3.3.9, so we have two nodes on different versions. So it is observing the state of the cluster, and it is observing the state of my definition of the resource, and when there is a change, it acts somehow. In this case, it has updated one of the nodes to 3.3.9 already; one is still 3.3.8.

So what happens then? It knows there should be a specific version running, and there should be three members of the cluster. So then it needs to — remove one member? Because we specified, where is it, three members... This is wrong; my slides are wrong, but that doesn't matter, right? I probably wrote it the wrong way. The idea is the same. First you're observing the state. You see that something's happening or something's not happening. When your definition changes, you need to analyze the definition.
You need to check whether you need to change the world to accommodate the new version of the resource. And eventually, when this happens, you act. You remove something, you add something, you upgrade something, you do something else. This is called the reconciliation loop — they could have picked a better name for it. This is how all operators work; this is what your handler does. So in your own operator, you will be writing these loops, this logic that has to analyze something — because you get the information from the observers — and then you act: you do something against the cluster to accommodate the changes.

So there is a demo that installs first the operator, then the cluster, and then you see that the cluster is ready. Then you can use the etcdctl tool to do something against the cluster. So you have three pods and you have the operator. We have deployed the cluster, and when we delete one pod, one is terminating and then there is a new one running — because we deleted one, and the operator knew that there should be three members, so it created a new member and joined it into the cluster.

If you have a basic Kubernetes cluster and you have a replication controller, it essentially knows that if there are not enough pods with a specific label, it should create new pods, right? Everybody's familiar with that? Okay, not so many people. So essentially, in Kubernetes you have something called a ReplicaSet or replication controller, and it knows that there should be some containers with a specific application running at all times. When I delete one, it will just create a new one — but it doesn't have any of the logic that is required to actually join a cluster. If I have a master-slave application, it doesn't know how to join MySQL or etcd into a cluster. It just knows that it should spin up a container.
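The observe–analyze–act loop just walked through can be sketched in a few lines of plain Go. Again, this is a conceptual simulation with made-up names — not the real etcd operator code:

```go
package main

import "fmt"

// Member is one observed member of the etcd cluster.
type Member struct {
	Name    string
	Version string
}

// Spec is the desired state from the EtcdCluster custom resource.
type Spec struct {
	Size    int
	Version string
}

// nextAction runs one iteration of the loop: observe the members,
// analyze them against the spec, and return a single action to take.
// A real operator repeats this until observed state matches the spec.
func nextAction(spec Spec, members []Member) string {
	// Analyze: upgrade one out-of-date member at a time before anything else.
	for _, m := range members {
		if m.Version != spec.Version {
			return "upgrade " + m.Name + " to " + spec.Version
		}
	}
	// Analyze: then fix the member count.
	if len(members) < spec.Size {
		return "add member and join cluster"
	}
	if len(members) > spec.Size {
		return "remove member " + members[len(members)-1].Name
	}
	return "nothing to do"
}

func main() {
	spec := Spec{Size: 3, Version: "3.3.9"}
	observed := []Member{{"etcd-0", "3.3.9"}, {"etcd-1", "3.3.8"}}
	// One member is still on 3.3.8, so the next action is its upgrade.
	fmt.Println(nextAction(spec, observed))
}
```

The application-specific knowledge — upgrade before scaling, join rather than just spawn — is exactly what a plain ReplicaSet cannot express.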
So what the operator puts on top of the basic logic in Kubernetes is that it understands how these applications work. It knows that I need to have three members even though I deleted one, so it knows how to join the cluster, not only how to spin up a new container with my application.

Okay, so there is a database example here. This is a bit more complex. This comes from Europe, right? Everybody knows GDPR, everybody loves GDPR, right? So when I'm running my application, I have to take care of my data. When I am creating a new operator for my database, I can take this into account. I can create a resource that is a compliant database, one that knows it has to be restricted to some specific region. It has to be deployed in Germany — because we are in Germany, right? And it has to be backed up hourly, because I have to have a consistent state, and under GDPR I have to always be able to restore the data. And I can scale it up, and my database can fail and recover.

But this logic is very specific to your deployment. If you write your own controllers, your own operators, they can be tweaked to the specific use cases that you actually have. It doesn't have to be something generic, like running an etcd cluster. It can be something like running a T-Mobile database somewhere in Germany, provided for a customer, where the data can never leave German soil. The operator can take all these things — these legal aspects — into consideration as well, because in the operator you can code any logic you want, and you can apply it automatically just by changing these definitions. But if you change these definitions, you need to teach the operator how to handle them, right? Because the logic is in the operator. So you are essentially encoding the operations logic into code, putting it on Kubernetes, and using the low-level Kubernetes features that provide resiliency, high availability, and all these things.
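The compliant database described above could be declared like this — a hypothetical sketch; the kind and fields are invented for illustration, and the operator would encode what each one means in practice:

```yaml
# Hypothetical "compliant database" resource: legal and operational
# constraints expressed declaratively, enforced by the operator's logic.
apiVersion: example.com/v1alpha1
kind: CompliantDatabase
metadata:
  name: customer-db
spec:
  replicas: 3
  region: germany      # data must never leave German soil
  backup:
    schedule: hourly   # consistent, restorable state at all times
```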
But you're putting in your own knowledge, the knowledge that you have gathered through years of real work, and trying to automate it, code it in, and make it repeatable. As Jorge said, what's not automated slows you down.

So do we actually dogfood this ourselves at Red Hat? Oh yes, we do. There will be OpenShift 4 — there's going to be a new version — and we will have a new installer. How is our new installer going to work? I wanted to have a demo, but it takes a bit too long because the Wi-Fi is slow and it works against AWS and creates things, so I will just talk you through it. Our installer provisions the infrastructure: it generates a Terraform template based on some definition and applies it. It can be AWS, it can be Google, it can be Azure, whatever, because Terraform supports those — or it can be on premise. But it provisions the infrastructure. Then it provisions a temporary Kubernetes cluster, so we are able to use Kubernetes. And then we start a so-called ignition process: it takes this temporary Kubernetes cluster and starts deploying operators from there onto the new cluster. And everything is handled through operators.

This slide is a screenshot of a deployed OpenShift cluster. How many operators can you count? And that's not all of them. Can you read it? Who can read it? Okay, half of the room. Just for your information: there is a cluster ingress operator, there is a cluster samples operator — an operator that deploys sample applications inside the cluster. There is an operator that handles the network logic. There is an operator that handles config injection, et cetera, et cetera. So essentially what happens is: we deploy one operator that starts deploying other operators, and those deploy other operators, until you end up with something like 100 operators.
And essentially the whole process ignites from nothing, from clean infrastructure, through operators, through this pattern. And it configures the network: once we have the provisioned cluster, we deploy the network operator, and the network operator writes down to the operating system, changes things, and deploys your networking infrastructure, et cetera, et cetera. There is a list of the operators we actually have, and there are a lot of them. So it doesn't have to be just "I am running my database cluster." It can be "I am igniting my own cluster of Kubernetes, or OpenShift, which is a distribution of Kubernetes." And I can use operators exactly for that. I can use them to bootstrap anything, to manage almost anything. In the end, what it is: I have encoded the operations logic into code, I containerize it, and I run it on Kubernetes — and that's pretty much it. And I think that's what I was supposed to say. Can I get —

So a cool part about using operators in OpenShift for the installation and bootstrapping process is not only that it helps you install. At the end of the day, the objective is not to have 20 or 30 operators up and running; it's that these operators will manage the cluster for you. Okay, so how many people run Kubernetes right now, and how many people do you need in order to run Kubernetes in an effective way? Now you have the operational expertise that we have for running Kubernetes clusters built into the cluster itself. So there is an operator that knows how to deal with the networking SDN.
If there is a new version of the networking SDN — if there is a patch that needs to be applied because there is a bug or a vulnerability — then whenever the new version of the cluster gets shipped, the cluster can heal itself and auto-install the updated version of the networking. And the same goes for the samples, for etcd, for anything else that is running in the cluster. So the operational burden that a platform like Kubernetes carries is simplified a lot. Does that mean you can fire all your operations people? No, hopefully not. But their work will be simplified. They will be able to do other things, and they will get all the expertise of the people creating the Kubernetes platform — Google, Red Hat, CoreOS (it's Red Hat now), IBM, Pivotal — all these companies that run Kubernetes clusters will provide their operational expertise in OpenShift, in Kubernetes, in their platforms, so you can rely on those platforms being up and running in a proper way, just as you trust the applications you are deploying via operators, like the databases we talked about.

After the talk, we'll tweet the link to the slides — that's why you should look at our Twitter handles. There is a link to some helpful resources about the Operator Framework. The most important one is the first one, the operator-framework organization on GitHub; most of the stuff that we have shown today is hosted in that GitHub organization. There are some blogs, some information that will help you understand even more what operators are about. And then, coming soon on our online learning platform, we'll have some training on how to create operators, available at learn.openshift.com. And with that, thank you all for coming.
I think we have a couple of minutes for questions if you have one; otherwise you can just grab us in the halls and ask us any question. Thank you. Thank you.