Okay, so my name is Luca Bruno. I'm going to talk here about auto-updates. It's going to be a kind of mixed talk: mostly technical, but also a bit of history. The idea is that in Container Linux, as a distribution distinct from Fedora, we were used to having auto-updates that were automatically applied to nodes. After CoreOS as a company was acquired by Red Hat, we have been trying to port the same model to Fedora CoreOS. We are redesigning parts of it, learning a few lessons, and re-implementing things, because we couldn't port them as they were from Container Linux to Fedora CoreOS. And this is more or less the whole talk.

About me: this is my second time here, and it's pretty nice. I'm mostly a Rust and Go developer, and I take care of a lot of things at the OS level itself. As I said, I used to work at CoreOS, based in Berlin; after the acquisition I was working at Red Hat, and after the next acquisition I'm now working at IBM. After the next acquisition, I don't know. I still plan to be based in Berlin, so if you're passing by, you'll find me there. Before this, I was into research, mostly security research, and after that I worked as a security engineer.

The overview of this talk: we're going to split it in three parts. The first is an overview of the Container Linux model, which I guess is new for most of you. Then I'm going to recap all the friction points that we found when applying this model to the real world; we learned a few lessons there, and we are trying to avoid the same problems in Fedora CoreOS. And then I'm going to introduce what we currently have as a model for Fedora CoreOS, which is still a work in progress: we are still deploying and implementing parts of it, but at least we have a rough idea of what we want to do.

So let's start from the Container Linux model. This picture is a diagram of our world in Container Linux, and the world is mostly split in three parts. There is something at the top (no, okay, this is out of battery), which is our own infrastructure: what we, as the CoreOS company, were running, which was mostly the backend for the auto-updates. Then there is a bottom part, which is what our users were running. In the context of Container Linux, we provide a distribution for clusters, so those users were running nodes with Container Linux as part of a larger cluster. On the bottom we have this cluster, which we generalize into a single machine (the one on the right) and then the rest of the cluster on the left side.

If we focus on a single machine, every machine is basically running at least two of these three components. It is always running update_engine, the top-level component, which is connected to our infrastructure. And then it is running at least one of the other two, locksmith or CLUO, a component which talks to update_engine in order to orchestrate reboots. These components normally have some kind of cluster-aware logic, so they defer to the rest of the cluster for actually orchestrating those reboots. I'm going to get into more details about each one in a second.
Just remember that there is this split between our infrastructure, every single machine, and the cluster infrastructure that the users are actually running.

So we start from the top part, the server side, our backend. On that side we have one component, or two depending on how you look at it, which is CoreUpdate. CoreUpdate is a server, written in Go if I'm not wrong. It is an implementation of the Omaha protocol, a protocol for providing updates, based on XML. On our side we used it as a normal web application with a JavaScript interface, and we also had a command-line interface for the same API. The key point is that this was proprietary code, something that we as a company were actively selling to customers as a service, except for the large majority of users on our free offering, who were just consuming it without access to the web interface or to the product itself. There are third-party implementations of this, which is why I have two names at the top. An interesting aspect is that it's a traditional application: it has a backend, a frontend, and a database, Postgres if I'm not wrong, and it is stateful at every single level. It's not even distributed state; it's actually tied to a specific database instance. As I said, this is the server side for the auto-updates, but at the same time, by the way the Omaha protocol works, it can also track client statistics and usage metrics, so that we can serve different updates to different clients. This is both good and bad. The problem is that this kind of architecture has several issues once you start distributing the setup, especially in the database, and in general tracking client statistics puts a lot of stress on the database. This is fine in most cases, you can just scale your resources as you prefer, but at the same time this is a key component of the auto-update infrastructure, so the more pain and stress we can take out of this path, the better.

The counterpart of this on the client side, on every single node, is update_engine, which is, unsurprisingly, a client for the same XML-based Omaha protocol that CoreUpdate serves. This component is pretty much a kitchen sink: it takes care of everything, from periodically polling for new update information, to downloading updates, to applying them to the system. In our case that means an A/B scheme for the /usr partition: we always have an active partition holding the active /usr and a passive partition holding the passive /usr, and this component is in charge of writing into one, activating it, rebooting into the proper one, or going back to the other one if something went wrong. This is a pretty big project, not the largest piece of software I've seen, but big. It's written in C++ and was originally written by Google. There are multiple forks of it, in public but also internally at Google. Ours lives under our GitHub organization and is just our own fork, maintained from some point in the past. Notably, it's also used by Chrome OS, for all the Chromebooks. So the key point is that this is a high-complexity piece of software, written in C++.
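To give a feel for the protocol shape, here is a minimal sketch (not taken from CoreUpdate or update_engine) of what an Omaha-style update check could look like from a Go client. The element and attribute names are illustrative of the protocol family, and the application ID, version, and endpoint URL are made up for the example:

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"net/http"
)

// Illustrative Omaha-style request: an <app> element identifying
// the client and its current version, wrapping an <updatecheck/>.
type Request struct {
	XMLName  xml.Name `xml:"request"`
	Protocol string   `xml:"protocol,attr"`
	App      App      `xml:"app"`
}

type App struct {
	ID          string   `xml:"appid,attr"`
	Version     string   `xml:"version,attr"`
	UpdateCheck struct{} `xml:"updatecheck"`
}

func main() {
	req := Request{
		Protocol: "3.0",
		App: App{
			// Hypothetical application ID and current OS version.
			ID:      "{00000000-0000-0000-0000-000000000000}",
			Version: "2135.5.0",
		},
	}
	body, err := xml.Marshal(req)
	if err != nil {
		panic(err)
	}
	// Hypothetical update server endpoint.
	resp, err := http.Post("https://updates.example.com/v1/update",
		"text/xml", bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("server replied with status:", resp.Status)
}
```

The server's XML response would then point the client at a download location for the payload; tracking works because every poll carries the client's current version and state.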
It owns every single aspect of the auto-update story, and from our point of view, unfortunately as a startup, we didn't have enough workforce to properly maintain it. So we were mostly using it as we initially forked it, and we touched it very, very rarely; it's effectively unmaintained software, in the sense that it works and we don't touch it.

Then there is the last piece of this story. Whenever we provide an update to some machine, that machine is going to apply the update and then reboot in order to actually activate it. This is because we follow an atomic model, where an update updates the whole OS as a whole, without single-package updates, so in order to use a new update you need to reboot into it. There is a problem with this model: if we start pushing updates to all the nodes, and you have, say, 20 nodes for high availability of your cluster, and all of them apply the same update at the same time and reboot, your availability just immediately disappears, because you get a downtime of the whole cluster. So something that sits next to update_engine is another component which takes care of reboot coordination: it's in charge of deciding which node can go down for a reboot at what time.

Initially we wrote something called locksmith, a Go binary that is part of the operating system itself. It provides a few strategies for rebooting: reboot immediately, reboot within a maintenance window, or reboot whenever you can grab a semaphore lock in a distributed database, for example etcd. That was our initial implementation. Then, as the world moved into the Kubernetes era, somebody came up with a different requirement: on Kubernetes I already have a distributed database and an API for putting and retrieving objects in there, the Kubernetes API, but I don't have access to the etcd cluster itself, because it's normally part of the internal implementation of Kubernetes. So we wrote another component, similar but different, which is a containerized version of the same logic, called the Container Linux Update Operator, or CLUO. We basically pushed this logic out of the operating system into a container. This container is scheduled by Kubernetes itself, and the fleet-wide reboot coordination is now done via Kubernetes objects: we push objects to Kubernetes in order to decide which node is allowed to reboot and when, and we retrieve those objects in order to get the state of reboots across the cluster. This architecture actually requires two components: the update operator itself, the manager, and something in charge of rebooting every single node, an agent. That agent is another container, deployed as a DaemonSet on every machine.

The key point in this slide is that these two components implement very, very similar logic, but they share pretty much no code, because they run in completely different contexts and they talk to completely different APIs. So there is a lot of overlap, but not a lot of compatibility or logic sharing.
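As a rough illustration of the locksmith-style etcd strategy, here is a minimal sketch using the etcd v3 concurrency package. Locksmith itself predates this API (it implemented a counting semaphore on etcd v2, with a configurable number of slots), so treat this as an approximation with a single reboot slot:

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

func main() {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"http://127.0.0.1:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		log.Fatal(err)
	}
	defer cli.Close()

	// A session keeps the lock alive via a lease while we hold it.
	session, err := concurrency.NewSession(cli)
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	// One distributed mutex, i.e. a semaphore with a single slot:
	// only one node in the cluster may reboot at a time.
	mu := concurrency.NewMutex(session, "/cluster/reboot-lock")
	ctx := context.Background()
	if err := mu.Lock(ctx); err != nil {
		log.Fatal(err)
	}
	log.Println("lock acquired, safe to reboot this node")

	// In locksmith the slot is released only after the node comes
	// back up healthy; here we just release it immediately.
	if err := mu.Unlock(ctx); err != nil {
		log.Fatal(err)
	}
}
```

The CLUO variant replaces the etcd key with Kubernetes objects (node annotations driven by the operator), but the acquire/reboot/release cycle is conceptually the same.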
And that's showing back like the picture that's just what we saw so far. And now we're gonna progress. It's like, that was the point when Red Hat acquired CoreOS and it was working pretty well except for a few friction point. Those friction point were a bit like everywhere in this picture, like the first one was the server side. The server side was proprietary software. It didn't honestly get a lot of like revenues from that but it was still like something that we were selling. The problem is that people also want to use it without paying royalties to us, which is fine. So they had to implement basically exactly the same stuff as open source, which is a failure if you care about like free software. Another like goal, so these are kind of like goals that are derived from those friction point. Another goal was like, this service was taking care of tracking stuff as well on top of like serving updates and that was, and that part was actually causing us more friction. So we can actually try to decouple those two problems and say, okay, if the out updates were working well and the problem were on the tracking part, let's move the tracking somewhere else. That way, another idea that we had is kind of like, we try to avoid an explosion in the cardinality in the database because we are not tracking stuff anymore actively. We can try to make it like as much stateless as possible. So getting rid of all the statefulness were possible. And then most of this was, most of this model was fitting well for a company that was doing stuff like internally. So people outside of the company were just consuming updates that were not in charge of like looking how we were making release except for like, they know that these contents goes inside these release, but when we are actually taking a release or how we are rolling out to the cluster, that's not their concern. So there was a lot of like private and hidden states which was, let's say it's likely a problem for the general audience, but also a problem for us because then we started like selling instances of these that were like in other environments, like in our gupt environment. And so there was always like some kind of manual coordination point for syncing all the database and syncing all the customers and kind of like providing these kind of like bundles which was, it was working if you are a company but if you are a community is not something that you are really looking forward for. Next step is like in the client side. In the client side, it was working pretty well. We didn't have major problem. The major problem was, so the only problem was kind of like maintaining these beasts. Like it's something that we got from Google was working pretty well, but we definitely don't have like the developer team that Google have dedicated to this. So we ended up with kind of like something very complex written in C++ that is not one of our main languages and effectively like not getting any kind of development or stuff which is good and bad depending on which stage you are of this journey throughout updates. One actual architectural problem is that everything was coupled into these update engine. Like it was doing discovery. It was doing the deployments. It was doing rollbacks and everything else. And it was also like mostly interactive. Like it offered a debuts API. That's how you do everything. It's kind of like you cannot configure this stuff initially on first boot in a declarative way. This was built before SystemD. 
So even the configuration itself did not take care of overlays and drop-ins and things like that; it's a traditional, old-style Linux daemon. And it's not particularly easy for administrators to monitor, which means that if something goes wrong with the auto-updates and there is a rollback, you're supposed to manually look into every single node interactively, or build your own tooling to monitor it. So these are all frictions that we saw, and goals that we are trying to achieve while redoing this.

The last part is reboot coordination. The problem here is that just in our small world we already ended up with two implementations of this, running in different scenarios. The reason we ended up there is that we initially built one implementation with the assumption that it would fit every single use case, and then we had to move most of the logic somewhere else because it did not fit another use case that we had. So this time we are trying to future-proof the design a bit more, moving most of this logic out of the host itself, so that the consumer, if they're running Kubernetes or any other cluster orchestration, can actually decide which backend to use. We are decoupling it from a specific database, because before it was only etcd, or only Kubernetes, and a specific version of Kubernetes at that. By doing this we hope to allow other people to implement their own backend logic: if you want to implement a Nomad backend or a Postgres backend, you can do whatever you want.

Okay, and the last point is not really technical; it's more about the human process. The human process behind this was, again, designed for a company, which means that you basically trust every single component. You don't need 100% auditing or review of what's going on, because you know the few people that have access to things, and you can always reconstruct what happened via logs, chats, emails and everything else. That is not normally true for a distributed open source team. There were a lot of manual operations and manual coordination via chat: now I'm doing this, now I'm doing that, now please review this, now please give me access to that, and so on and so forth. And there was no central public source of truth; the central source of truth was this internal communication channel. So we are also trying to improve the process in that regard, by having something which can be reviewed, audited and observed from the outside, and by being able to point people to a central public source of truth so that they can actually reproduce the whole flow.

So the new model looks more or less like this. It's still split in three areas: some infrastructure on our side, which at this point is the Fedora infrastructure; a local cluster that the user is running; and, within the local cluster, one specific machine that we are looking at, plus the rest of the cluster. The main point in this slide is that we are trying to split the logic a bit more and move some of these components around. As you can see, there are now two components at the top: one providing the auto-update hints, and the other providing the update payload.
Then there are two components at the bottom as well: one which is consuming the auto-update hints, and the other which is actually downloading and applying those updates. And then there is another component which we pushed to the left side. It was not there before, it didn't exist: something that you run on the cluster for doing reboot coordination. What before was on the host, or at least split between host and cluster components, is now pushed to the cluster. The slide says "airlock, etcd3, or other" because, from the point of view of the node, it doesn't really matter what you're running on the rest of the cluster.

So let's start from the last point we touched: how to do the release process itself. We overhauled this a bit, quite a bit actually, and we basically ended up with our normal development process: you have a repository somewhere, you open a pull request, somebody reviews it, it gets merged, and then some other action is triggered after that; in the middle you run CI or whatever else you want to run. That is exactly what we do, with a repository on GitHub that contains definitions of the current updates being rolled out to the clusters. We basically re-implemented, for this scenario, all the process that we had internally, so we can do the same things we were doing before. We can push multiple rollouts in parallel (a rollout means we are pushing an update gently, over a period of time, to all the nodes that exist in the world). We can pause a rollout that is going on, and resume it if we think it can be resumed. We can have update barriers, which means we want all the nodes to pass through some specific update. We can signal dead ends, because we are human: we introduce bugs from time to time, we make releases that cannot proceed further via auto-updates, and we need to signal this somehow to the client. This process is automatable, which means that we have an initial manual step, opening the pull request and pushing the definition, and then everything else can be automated in public. It can be audited by anybody looking at it, and there is no sprawl of multiple private databases; there is actually no database involved anymore. It is less eye-catching than CoreUpdate, no web UI or the like (it could be added), but it's definitely more developer- and DevOps-friendly.

The other component at the top is Cincinnati. Cincinnati is the backend; it's more or less what we had before with CoreUpdate, except that now it only does one thing, which is update hinting. The clients periodically poll this server, and the server returns a JSON object which is a DAG, a directed acyclic graph, of the available updates. This is just hinting to clients, telling them: hey, if you want, these updates are available. This component is completely stateless, as I said; in the process there is no state. It's written in Rust, it's deployed in the Fedora infrastructure, and we don't record any usage metrics anymore. We just have some metrics from the application itself to see if it's going well, and that's it. So by doing this, we reduce the scope, we reduce the protocol, and we formalize the graph model for the updates, and that's the direction we are following.
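For a feel of what that JSON DAG looks like, here is a hedged sketch of a client decoding a Cincinnati-style graph. The overall shape (nodes with version, payload and metadata; edges as index pairs) follows the published Cincinnati graph format, but the sample data and version strings below are made up:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
)

// Cincinnati-style update graph: nodes are releases, edges are
// allowed update paths, expressed as [from, to] node indices.
type Graph struct {
	Nodes []Node   `json:"nodes"`
	Edges [][2]int `json:"edges"`
}

type Node struct {
	Version  string            `json:"version"`
	Payload  string            `json:"payload"`
	Metadata map[string]string `json:"metadata"`
}

func main() {
	// Made-up sample of what the server could return.
	raw := []byte(`{
	  "nodes": [
	    {"version": "31.20200108.3.0", "payload": "sha256:aaa", "metadata": {}},
	    {"version": "31.20200118.3.0", "payload": "sha256:bbb", "metadata": {}}
	  ],
	  "edges": [[0, 1]]
	}`)

	var g Graph
	if err := json.Unmarshal(raw, &g); err != nil {
		log.Fatal(err)
	}

	// List the updates reachable from the node we are currently on.
	current := 0
	for _, e := range g.Edges {
		if e[0] == current {
			fmt.Println("available update:", g.Nodes[e[1]].Version)
		}
	}
}
```

Rollout concepts like barriers and dead ends map naturally onto this model: a barrier is a node that every path must pass through, and a dead end is simply a node with no outgoing edges.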
Then the component next to this one, still on the node itself, is rpm-ostree, which is used by other Fedora flavors as well. It does atomic OS management, so it's kind of like Git, but for the root filesystem. It's based on OSTree, which means you save disk space by not keeping duplicate copies of the same binary content. It gives you auto-updates, atomic updates and atomic rollbacks, an arbitrary number of deployments, and you can also install RPMs on top. It's written in C; parts of it are being ported to Rust, which is still in progress. It basically bridges two worlds: the traditional RPM world and the immutable OS world.

On the client side there is also something that polls Cincinnati, which is the Zincati client. It's another component, this time fully declarative, written in Rust. It checks for auto-updates and is in charge of triggering reboots. It is a bit less complex than update_engine: it's a single state machine with less than 10 states. It mediates between the other components (Cincinnati, rpm-ostree, and the airlock we're going to see in a moment), and it exposes metrics in the Prometheus format, so that you can monitor the whole fleet of nodes from a single point of view. This is the component we started writing from scratch, but in practice we took the logic from locksmith, reorganized and reshuffled it a bit, and wrote it as a component that works with rpm-ostree.

And the last piece is airlock, the logic that we pushed out of the node. It does reboot coordination: it's effectively a server that Zincati asks for permission to reboot. But it's not on the OS itself anymore; it's in a container somewhere. You basically do a counting semaphore with recursive locking over HTTPS, and nothing more. We have an implementation of this, airlock, with etcd3 as a backend, and we standardized the protocol between Zincati and airlock. That means we provide this implementation, but if you want to provide your own Kubernetes-based or Postgres-based implementation, and you are the expert in your domain, you are free to maintain it as your own container and deploy it as a container.

And that's the recap of what we just went through. That's the split: a client side which checks for update hints, a server side which provides update hints, a server side which provides OSTree updates, a client side which downloads and applies those OSTree updates, and then something in the cluster which takes care of reboot management across the whole fleet.
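To give a rough idea of how small that semaphore protocol is, here is a minimal sketch of a "give me a reboot slot" call in Go. The endpoint path, header name, and body shape reflect my reading of the standardized protocol and should be checked against the Zincati and airlock documentation; the server URL and node identity are hypothetical:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

// Lock request body: the node identifies itself with a stable ID
// and a reboot group, so re-sending the same request after a
// reboot is idempotent (the "recursive locking" part).
type lockRequest struct {
	ClientParams struct {
		ID    string `json:"id"`
		Group string `json:"group"`
	} `json:"client_params"`
}

func acquireSlot(base, id, group string) error {
	var req lockRequest
	req.ClientParams.ID = id
	req.ClientParams.Group = group
	body, err := json.Marshal(req)
	if err != nil {
		return err
	}
	// Assumed endpoint and header names; a matching release call
	// would be sent after the node comes back up healthy.
	httpReq, err := http.NewRequest(http.MethodPost,
		base+"/v1/pre-reboot", bytes.NewReader(body))
	if err != nil {
		return err
	}
	httpReq.Header.Set("fleet-lock-protocol", "true")
	resp, err := http.DefaultClient.Do(httpReq)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("lock not granted: %s", resp.Status)
	}
	return nil
}

func main() {
	// Hypothetical airlock endpoint and node identity.
	if err := acquireSlot("https://airlock.example.com", "node-01", "default"); err != nil {
		log.Fatal(err)
	}
	log.Println("semaphore slot acquired, proceeding with reboot")
}
```

Because the whole contract is just HTTP plus JSON, a backend in any language, against any database, only has to answer these two calls.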
And that was all. We started a bit late, we finished a bit late, so I can take, I guess, a couple of questions at most. These are the references for what I just talked about. You had a question from yesterday?

[Audience question, partly inaudible: whether this is expected to work out of the box without Kubernetes.]

Yeah, so that's the basic picture, and in this picture there is no Kubernetes or OpenShift involved at all. The idea is that we provide the initial implementation because we want one-to-one feature parity between what we had in Container Linux and what we have now in Fedora CoreOS, which means a model that works without Kubernetes or OpenShift in the middle. So you just need, basically, somewhere to deploy this airlock, which could be in the cluster itself, though that gets tricky at some point, and a database, which is an etcd database. But again, if you want to implement this stuff in Perl with a MySQL database, and you can talk the same protocol somehow, then there is no Kubernetes involved in any of this. Please.

[Audience question about release notifications.]

I think my personal answer is no, because this is just about providing updates, and your question is more about: when you build a release, do you actually send out messages? The answer is no, but we plan to, if I'm not wrong. But again, this is just about updates, not releases. Okay, so you would like to get a fedmsg message whenever we do this kind of operation.

[Audience question about the name Cincinnati.]

I'm from Italy, I live in Germany; somebody in the US decides to name stuff after American cities, and I'm fine with that, as long as I can pronounce them. This one is kind of okay to pronounce, so it's fine; there are weirder ones. I don't have a better answer than that. As for the protocol: we didn't come up with it, I mean, not me personally. This is something that we share with the OpenShift organization, sorry, with the OpenShift product, let's say. As you can see, the implementation is under the OpenShift org on GitHub, which means we are basically piggybacking on a design they did for the auto-updates of OpenShift, and we are providing another implementation of the same protocol, both server side and client side. Other questions? No? One, two questions.

[Audience question about whether the new model has proven itself.]

I would say that I submitted this presentation before actually implementing half of it. We actually tested it exactly once, which was this Monday, and it worked pretty well: it didn't show any major bugs, and it worked exactly as intended on the first try, which from my point of view is amazing. But I cannot say that this model, which we are still implementing, is better than the one we proved for about four years or a bit more; that's a bold statement I'm not making. It should be, according to what we are thinking and designing, but it could have bugs anywhere, and we could have missed things. So, I don't know. It's an experiment, at the end of the day, and we are trying it. Okay, I think I'm going to close here and leave the stage. Thank you very much.