All right, thank you very much, everyone. The next talk is by Michael. He also spoke in this room last year, and he'll talk a little bit about workload management. Thank you very much.

Thank you. Welcome, and let's hit the road and get going. Come on in, find your favorite seat. My name is Michael. Hopefully you can all hear me okay in the back; if not, let me know. I've been at Red Hat for almost eight years now. This is, I think, at least my third, maybe fourth time at FOSDEM. I've been working on container tooling and technology at Red Hat for at least the last five years, basically since Docker became a thing and everybody decided, wait a minute, we need to get in on this Docker thing and start shipping Docker containers and so on. In the last two years I've been very focused on cluster orchestration, workload orchestration, workload management, deployment tooling, all that kind of stuff on top of Kubernetes, and that's what we're going to talk about today. In particular, the intersection of the control plane for Kubernetes, what we use to control infrastructure and Kubernetes itself, the workloads that run on top of that, and the tooling that manages the glue between all of it.

So we need to understand a couple of things about the Kubernetes API to really dive into that. Kubernetes itself, of course, we're familiar with: it runs containerized workloads across some collection of machines, some collection of nodes, right? And the API that enables us to do that includes a number of primitives like pods, which are the way we run a container at a very simple level, and services, which give you a network presence, and that sort of thing. If we look at that API, we see some interesting things about it. Has anybody ever used the kubectl proxy command before? A few hands. It's very handy if you want to explore the API and see what's actually there.
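If you want to try that exploration yourself, it looks like this against whatever cluster your kubeconfig points at (the output obviously depends on your cluster, so this is just the shape of it):

```
# Start a local, unauthenticated proxy to the cluster's API server
# (binds to 127.0.0.1:8001 by default)
kubectl proxy &

# List the available API groups...
curl http://127.0.0.1:8001/apis

# ...and the core /api/v1 space, where pods, services, etc. live
curl http://127.0.0.1:8001/api/v1
```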
So you can run kubectl proxy and it will start up a local proxy for you, unauthenticated, that you can connect to and just start hitting Kubernetes endpoints. Here I did exactly that: I just curled the root of that API, and we got back a listing of API groups. Now, this Kubernetes API is a web API. Some people call it REST; it's not really REST, if you know REST, but it's REST-ish. It's what a lot of people call REST, but really it's a web API, hierarchical in the way you would expect, and it has these API groups. Many of the APIs you're familiar with live under the core `/api/v1` URL space; that's where a lot of them are. But you can see the beginnings of this list, and we're only into the A's in terms of what APIs are available. You can add these groups, and we can in fact add our own. That's one of the really special and unique things about Kubernetes: its API service will allow you to add your own API endpoints, your own kinds, your own data structures. It will serve them for you, it will track them, it will even validate them for you, and all that. That's really pretty amazing, and it's central to what we're going to do with Kubernetes today. And this API is declarative. We'll dig into that in a moment, but declarative is a very important part of managing things in Kubernetes. Just as an example, you can see down here, I just made this up, I completely made it up, I have no idea if FOSDEM uses Kubernetes or ever will, but we could have, for example, an app.fosdem.org API group that we could then put APIs underneath in a Kubernetes cluster.

All right, let's dig into an actual resource and how resources are structured. If we were going to look at one individual resource, like a pod or a service or something along those lines, we would see these kinds of things. Let's start at the top, with an API version: what API is this endpoint a part of? This could be app.fosdem.org.
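For a sense of what those group names look like in practice, here are a few apiVersion values as they would appear at the top of a resource, shown as separate YAML documents (the fosdem one is, again, made up):

```yaml
apiVersion: v1                        # core group; the group name is implicit
---
apiVersion: apps/v1                   # group "apps", version v1
---
apiVersion: app.fosdem.org/v1alpha1   # a custom group we could add ourselves
```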
For example, I work with a lot of openshift.io endpoints, for obvious reasons. You have a group name and then a version. Versions are things like v1alpha1, which is where most of us start, or just v1, v2. These carry a very specific meaning within the Kubernetes context. And then the kind. The kind is like a table name in a traditional database: kind is pod, kind is service. We could make up a kind like WordPress and have it describe what a WordPress deployment will look like; that could be our custom kind.

Now, into this metadata section. Name and namespace are the way we uniquely identify one individual record of the type defined by group, version, and kind. You with me? So name is a unique ID within a namespace, and of course the Kubernetes namespace itself divides resources up within the cluster, largely for reasons of role-based access control and quota control.

And then this is the really interesting part: spec. Spec is where you declare what you want; you specify what state you want to exist in the world. So for you, as a user of the API, the spec part is yours to write. The status is not for you to write; the status is for you to read. The status is where a controller of some kind running in the cluster (we'll get into what a controller is in a minute) can write some information for you to find about its understanding of the world: what has it actually done? In the spec you may have asked for a certain number of things to exist, say I want 10 pods, or I want 15 of something else. The status is where it might tell you how many currently exist. Is it trying to make more or remove some? Is it encountering problems while doing that? The status is your way to get feedback from a controller about what it's been doing. But remember, this is all asynchronous.
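Putting those pieces together, a resource in our made-up app.fosdem.org group might look like this; the group, kind, and every field shown are invented for illustration:

```yaml
apiVersion: app.fosdem.org/v1alpha1   # group + version
kind: WordPress                       # the "table name"
metadata:
  name: my-blog                       # unique within the namespace
  namespace: devroom
spec:                                 # what you want to exist; you write this
  title: "My FOSDEM Blog"
  replicas: 3
status:                               # written by a controller; you read this
  readyReplicas: 1
```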
So when you create a resource, you define the spec, then you basically wait around and watch that resource for changes, and eventually you'll see some changes to that status that inform you about what's actually happening. So here are examples. Let's just look at the one on the left: a pod. We've got the API version; the group is implicit, part of core Kubernetes, and the version is v1. And the kind is pod. So we've got that down. It's got a name. Now, there's no namespace listed here, because if we were going to create this resource, whatever namespace we create it inside of gets passed as part of the create action, so the namespace is implicit at that time. In the spec, we've specified the typical kinds of things you would anticipate a pod to have: what container image should be running, what ports, if any, need to be exposed. And you can imagine other kinds of things we could specify here. Storage: do we need to attach some persistent storage? Do we need to mount some secrets in here? All that kind of stuff. Also notice there's no status here. Why is there no status? Because this is the resource in the form where we're about to create it; we're about to hand this to the API, so no controller has touched it yet. After we created this pod resource, a controller would start acting on it and start putting a status on it, and then we'd find out what's going on.

Okay, now this is the really exciting part. Custom resource definitions enable us to tell the Kubernetes API service to serve a new endpoint, among all the endpoints it's serving right now, and to tell it the structure of that resource: what specific data structure goes into the spec? We can then use those just like any other native Kubernetes resources. We don't have to write any code to do this.
We don't have to run any new API services to do this. All we have to do is, of course, fill out a YAML resource, because it's Kubernetes. There is an actual core resource called CustomResourceDefinition. We just create one of those and, poof, the API service starts serving this new endpoint and a new data type. It will even do validation of that data type using OpenAPI v3. So you can tell it things like: this field is a string, this field is a positive integer, this field has a minimum value of zero, whatever other kind of validation is meaningful to you and your data structure. This is really powerful.

Of course, once we've done that, all we have is a data store. All we have is: I can create a resource called WordPress, I can read them, I can update them, and the API service will keep track of them. But nothing is happening in the background. Nothing's taking action, because we don't have a controller yet. So here's an example: what if we wanted to deploy Memcached? We could make a Memcached custom resource definition, and then somebody could come along, create one of those resources, and specify a size of three, for example. Later on we'll see the Operator SDK mentioned; if you go through the Operator SDK's getting started guide, you'll start by making a Memcached operator. A nice simple example.

Okay, but what is an operator? An operator is three things. One is a custom resource. We make a custom resource, something like WordPress. If we were going to make a WordPress operator, our goal is an operator that understands how to deploy and manage WordPress in a Kubernetes cluster. We would start with a custom resource called WordPress that describes what a WordPress deployment would look like: what's the title of the blog, what's the contact info for the person that runs it, the color scheme, those kinds of things.
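A CustomResourceDefinition for that Memcached example might look roughly like this. This is a sketch in the apiextensions.k8s.io/v1 shape; the group name cache.example.com is invented, and the schema shows the "positive integer" validation idea from above:

```yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: memcacheds.cache.example.com   # must be <plural>.<group>
spec:
  group: cache.example.com
  names:
    kind: Memcached
    plural: memcacheds
    singular: memcached
  scope: Namespaced
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:               # OpenAPI v3 validation of the new type
          type: object
          properties:
            spec:
              type: object
              properties:
                size:
                  type: integer
                  minimum: 0           # "a minimum value of zero"
---
# ...and then somebody can come along and create one:
apiVersion: cache.example.com/v1alpha1
kind: Memcached
metadata:
  name: example-memcached
spec:
  size: 3
```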
I'm not much into blogging myself, so I don't do that, but you could. Next we need a controller, because with just a custom resource, no action is happening. So we need a controller running in the cluster. What is a controller? A controller is typically a pod, a container running in that cluster side by side with all the other workloads. It watches the Kubernetes API for the resource types it cares about, and whenever it sees an event happening related to one of those resources, it runs what we call reconciliation. Reconciliation is where the controller looks at the spec, at what you just specified: what do you want? What have you declared? Then it looks at the real state of the world: what actually exists right now? And then it makes changes to the state of the world to bring it closer to what you've asked for. So that's what a controller does; it's a key part of this pattern.

But there are lots of controllers. What makes a controller an operator? That is the operational knowledge: all the things that an SRE team might do, for example, in the course of managing some service. How do you deploy it? What steps do you take? How do you monitor it? How do you repair it when it breaks? Every ops team understands a certain number of common failure modes for their application, right? Something shows up in the logs, or some alarm goes off, and you say, ah, I bet I know what's wrong here; we just have to go do the thing and it'll be back on the road in a minute. You can automate that process; you can code those kinds of repairs. Custom scaling, too: if you're going to scale based on custom metrics, say queue depth for some particular service, you can customize scaling. All kinds of things around managing the full life cycle of your workload, you encode into a controller, and that's what makes an operator.
These three things make an operator. And what can an operator do? Deploying and managing an application is the prime use case; that's what we usually talk about, like WordPress for example. But that's not all. Sometimes people write operators to either report on or even enforce policy. Pick some resource type: say you want all pods to have some particular annotation on them, or all namespaces to have a certain quota in them; you can imagine other kinds of things. You could write an operator that watches those resources, and if it finds one that does not have whatever state you want it to have, it could either tattle and tell you about it, or maybe even enforce it and put it back the way it's supposed to be. And if somebody tries to change it, the operator gets that event immediately, slaps that away, and puts it back the way it's supposed to be.

Then there's managing external resources. This is a really interesting, somewhat newer area. You can manage things like a network switch. Imagine reconfiguring a switch that's in use by a cluster from the cluster's own API. So in a Kubernetes-native way, we're interacting with and managing the physical infrastructure around us from the Kubernetes control plane, using this controller and operator pattern. And you can imagine other examples.

So here's the pattern illustrated. We have this controller watching events. It tells the API service which resources and which namespaces it wants to watch, and any time one is created, updated, or deleted, it gets an event, and it's time to run the reconcile function. Now, if you wanted to implement your own controller, normally there are frameworks that will help you with this. They'll even scaffold out a basic skeleton of a controller for you, and your job is to go to the reconcile function and start implementing it.
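A toy sketch of what that reconcile function does, in plain Go with no Kubernetes client libraries; the `WordPressSpec` and `Observed` types, the `Reconcile` signature, and the action strings are all invented to illustrate the spec-versus-reality comparison, not real controller-runtime code:

```go
package main

import "fmt"

// WordPressSpec is the desired state a user declares in the custom
// resource's spec. All fields here are invented for illustration.
type WordPressSpec struct {
	Title    string
	Version  string
	Replicas int
}

// Observed is the actual state of the world, as a real controller would
// discover it by listing deployments, pods, and so on.
type Observed struct {
	Deployed bool
	Title    string
	Version  string
	Replicas int
}

// Reconcile compares the spec to the observed state and returns the
// actions needed to converge: look at what was declared, look at what
// exists, and compute the difference.
func Reconcile(spec WordPressSpec, obs Observed) []string {
	if !obs.Deployed {
		// Nothing exists yet: deploy the requested version.
		return []string{"deploy " + spec.Version}
	}
	var actions []string
	if obs.Version != spec.Version {
		actions = append(actions, "upgrade to "+spec.Version)
	}
	if obs.Title != spec.Title {
		actions = append(actions, "set title to "+spec.Title)
	}
	if obs.Replicas < spec.Replicas {
		actions = append(actions, fmt.Sprintf("scale up by %d", spec.Replicas-obs.Replicas))
	} else if obs.Replicas > spec.Replicas {
		actions = append(actions, fmt.Sprintf("scale down by %d", obs.Replicas-spec.Replicas))
	}
	return actions // an empty result means reality already matches the spec
}

func main() {
	spec := WordPressSpec{Title: "My Blog", Version: "5.4", Replicas: 3}
	obs := Observed{Deployed: true, Title: "My Blog", Version: "5.3", Replicas: 1}
	for _, a := range Reconcile(spec, obs) {
		fmt.Println(a) // prints "upgrade to 5.4" then "scale up by 2"
	}
}
```

In a real operator this function would be invoked on every watch event for the resource, and the "actions" would be API calls rather than strings, but the shape is the same: declared state in, corrective steps out.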
So the reconcile function usually starts with something like: let's retrieve our WordPress resource and start looking at it. This person has asked for a WordPress with this title. Does that exist? Is there a WordPress deployed that has that title? If not, let's go deploy one. Or maybe it is deployed, but it's got the wrong title, so let's change the title. Maybe the version that's deployed is not the one that's supposed to be deployed, so let's initiate an upgrade. You can imagine: you validate the entire state of the world, and then you make changes that bring you closer to the spec. In a traditional operator, what pops out as the product of this is an application, all the pieces of an application: pods, services, secrets, config maps, persistent volume claims, other services it depends on. All that stuff, you orchestrate the creation and life cycle of from your operator.

So this puts you in a good position. Think of an example with Prometheus; there's a Prometheus operator, you won't be surprised to know. By having a Prometheus operator, we're being proactive about the management. When something changes inside the cluster that is relevant to this running Prometheus, good or bad, the operator immediately gets an event, goes and checks out what's going on, and makes sure everything is still the way it's supposed to be. As opposed to being reactive: we're not paging somebody and then waiting for this to get fixed; we're not submitting a ticket when traffic increases, asking for more infrastructure, and then waiting for approvals. We're just taking immediate action.

So how do we make an operator? We have this Operator Framework that has a bunch of pieces, but the Operator SDK in particular is your starting point. The Operator SDK lets you make operators in different ways; you don't actually even have to know Go.
Go is what we typically think of for operators, and what many operators are written in, but it's not the only way. The SDK will scaffold out a skeleton project for you, and from there you mostly implement reconciliation logic; it's pretty easy. The Operator Lifecycle Manager is like a package management system for operators; you can think of it like yum or DNF or apt, pick your favorite package manager. And then metering is a more advanced feature that you might find useful as well.

There are three primary ways you can make an operator. Helm is a very good way if you already have Helm charts, which many people do, and you want to start participating in this operator pattern where you're being proactive and immediately responding to events related to whatever resources you're managing. You can take one or more Helm charts and, with one command, convert a Helm chart into a basic operator. There are limits to how much you can do with a Helm-based operator, because Helm by its nature is mostly about templates; it's harder to code this kind of active management into a Helm chart. But Ansible, which I can see many of you are familiar with, is very powerful. You can do just about anything with Ansible, and you can make an operator with Ansible, one where that reconciliation logic is written in Ansible. I'm pretty sure I gave a talk about that here last year; it's a great fit. So that's something you might consider. And then Go, of course, is a very popular way to write controllers and all kinds of tooling around Kubernetes.

So we've been talking about operators and controllers, and we dug into the API service. What does it really boil down to? What does it mean? What have we done? With an operator, one way to look at it, one advantage we have here, is that we've created a higher-level API.
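The scaffolding workflow mentioned above looks roughly like this; the exact subcommands and flags vary between Operator SDK versions, so treat these as illustrative rather than exact, and the domain, repo, and chart path are placeholders:

```
# Scaffold a Go-based operator project and a new API/controller pair
operator-sdk init --domain example.com --repo github.com/example/memcached-operator
operator-sdk create api --group cache --version v1alpha1 --kind Memcached

# Or turn an existing Helm chart into a basic operator in one command
operator-sdk init --plugins=helm --helm-chart=./my-chart
```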
So somebody who wants to deploy WordPress, rather than needing to go to the Kubernetes primitives of making their own pods or deployments, having secrets and persistent volume claims and services, doing load balancing and all this other kind of stuff, deploying databases, deploying logging infrastructure, whatever else might be involved, not to mention having an upgrade strategy and some kind of tooling to manage that upgrade strategy for them: we've abstracted all of that, and software engineers love abstractions. We've abstracted all of that behind just a WordPress API, where someone can describe, I want a WordPress that looks like this, and maybe this is the version of WordPress I want to be running, and all the rest is taken care of for them.

Our workloads are starting to look more like managed services. This is a really key part of this pattern. You're familiar, I'm sure, with working in various public clouds. If you want a database, a queuing service, or any number of other services, you can just get a fully managed service and use a database service provided by whoever your cloud provider is. Here we can approach that same kind of experience, but running in your own cluster, by having this operator, this automated piece of management, running side by side in that cluster right next to your actual workload: managing it, babysitting it, scaling it up and down as necessary, fixing common problems, upgrading it as necessary. We can start to approach that same experience.

Infrastructure and workloads live side by side. Infrastructure is not just the load balancing, not just the primitives we normally think of with running workloads on Kubernetes. It's storage. It's the real core networking infrastructure. In some cases it's the cluster itself, the virtual machines the cluster is running on, all managed side by side with one API.
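Concretely, managing through that one API means the standard tooling works against the custom type exactly as it does against built-ins; the resource and user names here are hypothetical:

```
# The custom kind behaves like any built-in resource
kubectl get wordpresses
kubectl describe wordpress my-blog

# Declare desired state the same way you would for a pod or a service
kubectl apply -f my-blog.yaml

# And RBAC covers the new type, too
kubectl auth can-i create wordpresses --as=some-user
```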
And that means we get to use one RBAC system for the whole thing. We can put quotas and other controls around who's allowed to deploy WordPress in this cluster, how many they can deploy, or how large their WordPress deployment can get. API discovery: Kubernetes is really powerful for API discovery, much like an application platform in that way, and using the same API discovery and service discovery mechanisms across that spectrum of infrastructure and workloads can be very powerful as we combine them.

When we start thinking of all this together, this is what starts to be what we think of as Kubernetes native. We could take something like WordPress, which vastly predates Kubernetes, but if we expose it this way and manage it this way, it's not only running natively on Kubernetes, it's being managed natively through Kubernetes, through that same API, with one set of tooling. You can use kubectl; any tool that can apply resources to Kubernetes can manage that WordPress now. That's what we often think of as Kubernetes native.

Who does this sort of stuff? Software vendors, of course. If you're a database vendor, you've got to have an operator these days, and you probably do. You can imagine why: databases are sticky things, and there's a lot to managing them, especially in a containerized environment. Cloud providers have made operators to expose their services natively inside Kubernetes. And then ops teams. This is really probably the heart of this for you: ops teams, for your own customized software, your own in-house proprietary software, whatever it is you're responsible for running on your infrastructure. Many, many ops teams are building their own operators to automate that in this Kubernetes kind of way. Super powerful. If you want to see some examples, go to operatorhub.io. There's something probably approaching 100 now; I think it was about 80 the last time I looked.
That was at least a couple of months ago. There are different operators there from all kinds of vendors, providers, and projects; it'll show you a good cross-section of real-world use cases for these kinds of operators. And then just a couple of interesting use cases here real quick. Cluster API: a fascinating project where people are making controllers such that, for example, with the Amazon controller, if you want to scale up your cluster, you can change a size from, say, 10 to 15, and that controller will go talk to the Amazon API, get you five more virtual machines, ensure they've got Kubernetes running on them, add them to the cluster, do all that work for you, and you've now just scaled out your actual infrastructure. There's the MetalKube project, which we unfortunately didn't hear from earlier, that is all about doing that with bare metal: taking cold, dark hardware, provisioning images onto it on demand, and adding it to Kubernetes. Rook with Ceph: running storage natively in a Kubernetes cluster, even on bare metal, managing data resiliency and redundancy on bare metal machines inside a Kubernetes cluster. An interesting problem space with some hard challenges, but they're using controllers and operators extensively to manage that and make it viable, make it something that's reasonable to do. KubeVirt is really fascinating: you can run virtual machines inside a pod in your Kubernetes cluster, so you can now schedule virtual machines the same way you're scheduling pods, and it's controllers that make that possible.

So that's all the time we have. I'm going to step outside and answer any questions you have, and I'll be there as long as you like. If you can't do that right now, feel free to contact me; email is probably best. I'll be around this weekend and happy to chat with you about operators. Thank you very much.

Thanks, Michael. You're a very good speaker. Thank you.