Okay, so a lot has been going on with Red Hat, so I wasn't actually sure which background I needed to put in here. Maybe you like this one better, but well, I chose red in the end, so we'll go with red. Anyway, welcome everyone. My name is Antonio. I work at Red Hat, or IBM, whatever. I'm a software engineer leading the MCO, or Machine Config Operator, team in OpenShift, and today's presentation is about coming from nowhere and understanding the huge landscape in Kubernetes with respect to controllers and operators. The whole talk is shaped around my experience and my team's experience, and all the gotchas and pitfalls that we had to learn on our way towards writing a core operator for OpenShift 4, but this applies to almost any controller or operator that you might want to write, or might be tasked with. Just to give you a rough idea of what we'll be discussing today: I'll introduce controllers, and specifically the controller pattern within Kubernetes, which is the skeleton you're going to use when writing controllers and operators. Then we'll dive into the differences between the two: what makes an operator compared to a controller?
Then I'm going to show all of this using the MCO, the Machine Config Operator, which is the operator my team has been writing for the past few months. Meanwhile I'll also do a live demo, which hopefully is going to go well, and then cross-check whatever we've been talking about against the actual code in Visual Studio Code.

First things first: the controller pattern is an extension pattern in Kubernetes. Much of Kubernetes' behavior is implemented through controllers. An example would be the DaemonSet controller, which makes sure that all the desired replicas of a DaemonSet are running at any given time. The controller pattern is actually made of other little pieces that together make a controller. The first one (this snippet comes from the MCO codebase, and it's pretty simple) is the control loop. The control loop is nothing more than, guess what, a loop that keeps running forever unless something panics. What it does is basically dequeue items from a queue (which we're going to look at in the next slides), run a sync handler, handle any potential error coming from the sync handler, and then keep going forever and ever.

The queue (this is the queue that we initialize in one of the controllers in the MCO codebase) is the bucket that holds whatever the controller has to reconcile or process; we take items from the queue one at a time. How the queue gets filled is through something called shared informers, because the queue basically holds Kubernetes objects that changed and that our controller is interested in. This snippet just initializes a pod informer. An informer is nothing more than the concept of "I want to be informed whenever something in the Pod objects in the API server changes; I want to be notified so that I can run my own logic in the controller." That just initializes it, but every informer also has a set of callbacks that are fired on specific events. In this example we're interested in pods, so when something or someone changes a pod in the API server, the add function gets called if it's a new pod, and the same goes for update and delete. What those callbacks are basically doing is stuff like filtering out whether you're really interested in that pod — maybe that pod is in a namespace that we don't care about — and similar tasks, but at the end of every one of these callbacks there is a call to enqueue an object in the queue, so that the sync handler, which is the core of every controller, can then act on that object.

The sync handler is the core, as I said, because the sync handler of a controller is the one that has all the logic of the controller itself. A usual task for a controller would be: I have a desired state for an object; the sync handler reads the desired state, processes the object, and turns that desired state into the current state of the object. After this run, the loop continues forever and ever. I think the best analogy I have for a controller is actually a state machine: there's something looping, something triggers or fires, then we process, and then we go back to step one and keep running.

There's another super useful concept that shared informers provide: listers. Listers are read-only getters, and we use them in controllers because there may be so many controllers running in your cluster. A lister is just a read-only cache, so that you can get or list objects of any given kind — in this case a pod. We're using them to avoid hitting the API server back and forth, basically to avoid overloading the API server, and I'll show you how we're using them in the code as well.

So those are basically the basic concepts behind the controller pattern. If you're interested, there is the sample-controller, which is something the Kubernetes community came up with as, I'd say, a reference implementation. I learned that it doesn't actually contain every behavior that you could implement in your controller, but it contains so many useful patterns that you can even copy-paste into your own controller.

And then, operators. What's the difference between a controller and an operator?
I really struggled with the definitions you can find online, like on the CoreOS web page, so what I came up with was this: an operator is nothing more than an application that implements the controller pattern — which means it can also contain more than one controller within a single operator — it has an API extension, which translates to the logic of the operator being driven by CRDs, custom resource definitions, and it has a single-app focus. I guess the best example for that last point is one of the most famous operators, the Couchbase operator. You have a database, and that operator is focused on the whole life cycle of that database: it makes sure all the desired replicas are available across the cluster, and if something goes down it takes action, brings another replica up, and so on. But then there was this confusion: if operators use controllers, are controllers operators as well? That's actually not true, because you need the single-app focus and the API extension. There are controllers, especially in Kubernetes itself, that are just controllers — the node controller, the DaemonSet controller. Those are controllers that act only on native Kubernetes objects.

And then, the MCO. MCO stands for Machine Config Operator. I'm not going to fully dive into the MCO, but I'll use it as a reference to explain what I've been discussing so far. What the MCO does is make sure, on an OpenShift 4 cluster, that the configuration a cluster admin or user wants is actually owned by the cluster itself — so that, say, when you want to add a node to a running cluster, Kubernetes itself, OpenShift, knows how to configure that node. It's also responsible for upgrading the operating system that runs on every node. This is a huge topic, but suffice it to say that on OpenShift 4 every node, at least in the control plane, runs on something called Red Hat CoreOS, which is an operating system built on top of OSTree, and that allows us to deliver atomic upgrades. The MCO manages applying the update and rebooting. I'm going to show you — not the upgrade itself, but we're going to see how it does that.

So yeah, the key takeaway for the Machine Config Operator is that the cluster itself is controlling the operating system, and this has been super powerful, because since the configuration lives in the cluster, a cluster admin can drive whatever they want related to the operating system through the cluster itself and through Kubernetes. If you want to, say, add a file to the nodes, you're going to do that using native Kubernetes objects, like a CRD. And this is actually the core piece within the MCO. This is part of the API extension I talked about before, when I highlighted the differences between operators and controllers. This CRD does nothing more than express an intent to write a file at /etc/test with "test" as its content. So you can see that we're driving configuration changes from Kubernetes itself, from the command line as well. I'm introducing this CRD so that we can go through the code later on.

The other important CRD I wanted to talk about is the MachineConfigPool. The machine config pool is a concept for grouping machines that have the same role. Say you have, like in this example, the default setup for an installation: three masters and three workers, and so two pools, one for the masters and one for the workers. What the pool does, other than grouping same-role machines, is also grouping together all the machine configs for a given role — like this "test" one, where the role can be worker or master. So the pool groups all the machine configs per role, generating one big machine config that takes all the changes we want on the nodes, and then applies that. You can see it here: the name of the config is "rendered-" dash something, because it's the union of all the machine configs for a given pool.

This is also what drives OS updates within the MCO, leveraging, again, rpm-ostree. This is just an example of the ConfigMap that contains the actual payload for driving upgrades. What happens is that through this ConfigMap someone — even Red Hat — can push an update: the ConfigMap is changed, its osImageURL is changed, and then the MCO will read it, unpack that container image (which contains nothing more than an rpm-ostree diff), apply the upgrade, and reboot — and it will do that on every node in the cluster.

So, the demo is actually pretty simple. What we're going to do — and this is to explain our controllers and the whole MCO operator with respect to the operator concepts — is just create a CRD and cross-check in the code, in Visual Studio Code, what is actually happening with respect to the flow I showed before. This machine config is basically saying: I want this file at /etc/test, with "test" as its content, deployed on every worker. We have three workers and three masters, so we're also going to cross-check that this file actually lands on at least one worker, because it may take some time to roll out to every node.

So the first thing we're going to do is create it. Okay — what I've done just now was create a manifest containing the MachineConfig spec. If you remember from before, every controller has an event handler for the objects it's interested in, and the first one here is the render controller — the MCO is made up of something like five or six controllers itself. The render controller is the very first one that takes action when you create the MachineConfig CRD, and that's because, as you can see here, when a MachineConfig is added to the API server, this controller is interested in knowing that a MachineConfig has been added. So from here — this is the informer callback; it contains a lot of stuff, but as I said before, the most important part is that this piece of code, the callback, enqueues an object, in this case the MachineConfigPool that it wants to act on when it runs the sync handler. And we can actually check: you can see this is the call that actually enqueues the machine config pool. Then, after some time, you'll see that when the machine config pool was added to the render controller's queue, it triggered the sync handler for this controller, and what this sync handler did was basically generate a rendered machine config — the union of the machine configs for the worker pool — generating a new one, and then updating another CRD, the MachineConfigPool here: it changed this field here.
Are you still following? Still with me? Somehow? Okay. So we changed the MachineConfigPool CRD — we changed yet another object — and guess what, we have another controller which is interested in understanding whether the MachineConfigPool CRD has changed, and that's the node controller. So we have another controller, and what happened was this: the node controller is interested in any change that happens on the MachineConfigPool objects. The render controller changed the machine config pool, and so here we are again: we enqueue the machine config pool, and what the node controller does when it runs its sync handler is, at the very end, nothing more than adding an annotation to a node object. And this can be cross-checked here: we can see that this rendered machine config is exactly what we have here. So, following the flow from before: we added the machine config, a new rendered one was created — this one — and then the node controller added an annotation to a node, to say "all right, I want this configuration for that node" — in this case a worker node. And following along again: our node now has this desired config. So guess what, we have yet another controller, which listens for node object changes, and that's the daemon. Again, it has an informer callback, so that when a node is updated we take action, and again it does nothing more than enqueue the node object, so that when the sync handler for this controller runs, we can act on it. What it basically does is reconcile the state of the node to what the user wants — in this case, we wanted to add a file.
So with this code — which I'm not going to show, because it's huge — what happens is basically this: we have a desired state, which is reflected by that annotation, so in this case the code writes a file and then reboots the machine, making sure that the desired config becomes the current one. Hopefully we landed that file... and it's here. So three controllers were involved, just to land a file. But imagine what this can do — for the MCO use case it's pretty powerful, because we're able to configure the node and have that configuration stored in Kubernetes itself. And in scaling scenarios, if I add an additional node to the worker pool, that node will ask Kubernetes itself: "give me the configuration for a worker," and then the MCO runs on that node and configures it as Kubernetes says. That has been pretty, pretty powerful.

[Audience] Did you mean /etc/test? — Yes. The one responsible for actually writing that file — or any file you specify in a machine config that is part of a machine config pool — is this controller; the state of the node itself, of the system itself, is managed through this controller, and specifically this sync handler. I can actually walk through the place where we wrote that file. The first thing we did in this sync handler was grab the node, so that we can read its annotations and understand which configuration we want. You can see here that since the annotations for current and desired weren't the same, we acted on that, and that triggered an update. So we jump here — this is the piece of code responsible for reconciling what's in the machine configs with what's on the disk, on the actual operating system. We're doing a lot of stuff here, but basically it's this function: we have an old state and a desired state, so we diff them to understand "this file must be added, this one must be dropped," and so on. Later on, you can see here, we walk the diff between the two machine configs to understand what differs and act on it. Then, if it's reconcilable, this is the function that actually writes that file to disk. But beyond that, we're responsible for much more: you can write files, you can write systemd units, we can upgrade the whole underlying operating system, and you can update SSH keys, kernel arguments, FIPS mode.

[Audience] Is there Ansible involved? — No. You can see from our machine config that the spec of the machine config itself is nothing more than Ignition. I don't know how many here are familiar with Ignition, but it's a declarative way of specifying the state of a system — similar to Ansible in that sense, I'd say. Can I say that? Okay. The Ignition part is the actual part that drives all of this. And lastly, to follow up on your question again: this is also the part where we go and upgrade the underlying operating system, if you're interested in doing that as well, when an upgrade is coming.

So, okay, maybe I went too fast, but this is actually a quote from one of my colleagues, Luca.
The whole MCO team was coming from nowhere: I was working on Docker, CRI-O, stuff like that; other people were working on operating systems. What we learned was that it was not that straightforward to jump onto Kubernetes. So he told me something like: you used to play checkers, but the whole Kubernetes land is chess. And I can actually feel that, because there are so many moving parts. Even an operator like the MCO — which you might say is easy to follow, as I presented it — when there is a bug at this level, nothing is straightforward. That's why it's chess: you need to be three steps ahead to understand what's going on.

This is just a short list of the gotchas and pitfalls we ran into during our lessons in understanding Kubernetes controllers. The first one: you should never invalidate the listers' cache. As I said before, the listers are nothing more than a read-only layer, so if you grab an object and you mean to update it — even just changing a field or adding an annotation — you should copy it first. Otherwise, changing it and pushing the update to the API server is going to cause a cache miss for everyone else, and that's really expensive. So: always deep-copy. We learned this the hard way; for a while we weren't sure what was happening.

The other thing we learned — I'm going to show you this in code — is about informers again. The informer is a way of staying informed about a specific kind of object. But imagine you have a pod informer, and you have 1,000 users on your cluster, each of them creating a pod: your informer is going to get flooded with all those callbacks for added pods. And we were wrongly running a sync every time we had an object changing.
It wasn't pods in our case — the pod is just an example; there are many objects you can get flooded with. What we learned was that there is a way to filter the informer, so that if you're interested in Pod objects, you can say: all right, only the pods from this namespace. The issue we had was that we were running a sync because of a pod in another namespace that we weren't interested in, and it was changing a lot, triggering re-syncs of the machine config operator. So what we had to do was leverage one of the informer functions and just add a filter, so that that pod in that namespace would no longer cause any re-sync of the operator. You can find it here: this is a filtered informer for ConfigMaps, and we're only interested in ConfigMaps coming from the openshift-config namespace, so that a user creating tons of ConfigMaps won't cause any re-sync of the machine config operator.

The last thing — well, one of the last things — we learned: Kubernetes has the concepts of generation and observedGeneration. I'm going to show you. You can see here that every object in Kubernetes has a generation; that number is increased every time someone changes the spec of the object, and your operator, your controller, is responsible for syncing it with the observedGeneration in the status. So when I change something in the spec of this object, the generation number increases, which means the spec — the desired state — has changed, but we're still on the old one, because observedGeneration doesn't match it yet. That was really hard for us to understand, because in the end we hit a bug where we were only touching the status field of the object, and the status reflects the current state, not the desired state.
So we were only playing with the status — updating a field in the status, the configuration name — but in the code we were comparing generation and observedGeneration, and they were always off by one, because we were never updating the spec. We learned at our own expense that your object needs to reflect both a desired state and a current state, and you need to play with generation and observedGeneration to understand where you are relative to the desired state.

I added this last slide yesterday, because I've been asked: why aren't you using Kubebuilder or the Operator SDK? Part of it is that some of the code in the MCO was written before we came, and some of the controllers were already written plainly, the way I showed you before — they don't use Kubebuilder or the Operator SDK. We might want to look at those in the future, but I'm not familiar with them, and the team isn't familiar with them yet. I went too fast, but that's it. Questions?

[Audience] Thanks for your talk. Do you have a test suite for this, and how easy is it to write tests for those various controllers? — So, as with any testing, there are many ways to run tests for this. What we've been doing is a mix of end-to-end tests and
unit tests faked up as integration tests. The end-to-end test is pretty simple: we spin up a cluster, run commands, and make sure everything reconciles the way we want. That's the easiest one. But then, for every controller we have, somewhere we're faking a run of the controller, as it would be in a cluster. We basically set up a fake API server holding all the objects, run the sync handler with all the listers set up the way we want, with the objects we want, and then make sure that at the end of the day the controller does the right thing. There is a desired state and a current state, and they have to match at some point — that's the whole point of the sync handler. So what we do in testing is make sure that given this input — these objects, this desired state — we get what we want. Testing also includes no-ops: say your object is already at the desired state, but the annotation that drives the difference isn't actually set — we test that too.

[Audience] How long does it take to run the whole suite? — The end-to-end test involves spinning up a full OpenShift 4 cluster, so it takes around an hour and a half, but the actual unit tests are really fast. Well, not now that I'm running them live, but usually it's a minute or so, even less — mostly just the time to compile every suite.

[Colleague] Antonio, I just want to add that for the e2e tests, you spin up the cluster and then run the full Kubernetes suite, right? All of our components do that; the MCO does that too.
So there are thousands of Kubernetes tests being run — the core Kubernetes suite runs against all of our OpenShift components, and I'm assuming the MCO does that too. Right, the AWS e2e test. Yeah, beyond our own tests, we also make sure we don't break the full OpenShift suite, which is a superset of the Kubernetes tests. So we're running — I don't know exactly how many, but a lot. I looked the other day; I think it's like two thousand two hundred and some. Yeah, it's a lot. But this full suite, which you can also see on GitHub, takes more or less an hour and a half, nothing more than that.

[Audience] I'm very new to operators, like most of us, I guess. Is an operator essentially a pod that's always running, or is it triggered based on events? Can you just go over the life cycle of the image that is running? — So yeah, I'll show how the MCO works with respect to the pods, DaemonSets, and Deployments we have. The MCO's architectural structure is as follows. There is one pod, the machine-config-operator pod; that's the operator code that syncs every other controller, because it updates an object and all the other controllers react to that when they need to. We have a Deployment for the machine-config-operator pod, which makes sure that at any given time we have at least one replica on a master. Then the machine-config-daemon is the last controller I showed, and it runs on every node in the cluster, because it's the one responsible for laying down files, as I showed you before — so that's a DaemonSet running on every node. The machine-config-controller is also a Deployment running on a master, and the three machine-config-server pods — it would take a while to explain what they are — run on the masters as well, as Deployments. So that's the structure of the whole MCO. I don't know how other operators within OpenShift are laid out, but I think this is the general structure.

[Emcee] Lunch! Thank you for the presentation. Those of you who are interested in the party tonight at 7pm, please collect your party tickets at the registration desk. We'll resume here at 1:10.