Hey everybody, my name is Dusty Mabe. I work for Red Hat as a principal software engineer, primarily on Fedora CoreOS and Red Hat CoreOS, which is the basis for OpenShift, which you may or may not have heard of. I'll try to go fast; we were running a little over time earlier, but we'll see how it goes. So in today's talk I'm going to cover what Fedora CoreOS is, some recent developments, some future developments, and a challenge for you if you've never touched it before. Hopefully we'll have some time for questions, but we'll see about that.

So first off, what is Fedora CoreOS? I'm going to jump right into some of the features that make CoreOS unique, at least within the Fedora ecosystem. One of the first things we usually talk about is that automatic updates are turned on by default. If we really want people to use automatic updates, they need to be reliable; otherwise people will just disable them, it won't really be a feature anymore, and one of the goals is to have people leave them enabled so that their systems stay secure because they are updated. So to keep updates reliable, we have extensive tests in our CI pipelines to catch regressions in features that we want to keep working, but we don't have tests for everything. We also have several update streams that people can use to preview what's coming: we have tests for the things we know about, but users can find issues by previewing what's coming down the line. I'll talk a little more about those in a minute.

Another mechanism we have to make updates more reliable is managed update rollouts. Each update has a rollout window, usually a 48 hour period; some machines win the lottery and update early in the window, and some update really late in it. If somebody comes to us early in the window and says, hey, there's a real issue with this, we can pause the rollout, say when only 30% of machines have updated, investigate the issue, and then decide whether or not to continue the rollout, and hopefully the other 70% never hit the issue. But inevitably things sometimes go wrong anyway, and when that happens there's rpm-ostree rollback, which can be used to go back to the previous state. In the future we might make that more automated, so you don't have to roll back manually.

Another feature we like to talk about is automated provisioning. Fedora CoreOS uses Ignition to automate provisioning, which means baking any logic for that machine's lifetime into the Ignition config up front. This makes it very easy to automatically reprovision nodes: if something happens to a node, you don't really lose much (well, hopefully your data is backed up somewhere), because everything about how the node was configured is in the config, and you can replace that node very easily. Another point about Ignition, or about how we build things, is that because we use Ignition you have the same starting point whether you're on bare metal or in the cloud.
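To give a taste of what that config looks like, here's a minimal sketch of the usual workflow: you write a human-friendly Butane file and translate it into the Ignition JSON the machine actually consumes. The filenames and the SSH key are placeholders:

```
# Minimal Butane config: provision an SSH key for the default "core" user.
cat > example.bu <<'EOF'
variant: fcos
version: 1.4.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA...   # placeholder: your public key
EOF

# Translate the Butane YAML into the Ignition JSON config.
butane --pretty --strict example.bu > example.ign
```

You'd then hand example.ign to whatever fits your platform: the installer, cloud user-data, or kernel arguments.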
So traditionally, if you're in the cloud you'd use cloud-init, and if you're on bare metal you'd use something like Anaconda, an installer. For Fedora CoreOS we ship an image and use Ignition to do the provisioning. So if you have a heterogeneous environment, where some things are on-prem and some things are in the cloud, you're using Ignition wherever you are and getting the same experience.

The next feature I want to talk about is being cloud native and container focused. Container focused: software runs in containers. We have Podman, and we also have the Docker CLI via the moby-engine RPM, so those are two options to choose from for running your workloads. As far as the cloud native piece goes, we say we're ready for cluster deployments: you can spin up a hundred nodes and have them join a cluster, and the Ignition config is essentially what tells each node, hey, here's the cluster manager, here's how to check in and register. Then, theoretically, you'll see it pop up in your cluster management dashboard and it starts getting assigned work. This is the more cloud native model: you spin up nodes when you need them, spin them down when they're no longer needed, and you don't pay for them, right? That's the idea. The other piece of being cloud native is that we offer images for a variety of cloud platforms, ten plus at this point, and we're trying to add to that all the time.

The next feature I want to talk about is OS versioning and security. Fedora CoreOS uses rpm-ostree technology, which I describe as being like git for your operating system. You get an identifier, kind of like a tag that relates to a commit hash, and that tag is a single way to convey everything that's in a release. So you as a user can say, hey, I'm on this version of Fedora CoreOS, I'm running this command and I'm seeing this problem, and I as a developer on Fedora CoreOS can do the same thing locally, and we have pretty high confidence that we're running in very similar environments, which is really powerful. Another thing: Fedora CoreOS uses a read-only filesystem mount for most of the software. This prevents accidental OS corruption and stops trivial attacks from modifying the system. For somewhat more advanced security, SELinux is enforcing by default, which prevents compromised apps from gaining access they shouldn't have. See, Dan Walsh giving a little clap over there.

So what's in the OS? We have the latest Fedora Linux based components, all built from RPMs just like you're used to. We have hardware support: we want to be able to run on any bare metal server that, for example, Fedora Server can run on, so the hardware support RPMs are baked in. We have basic administration tools and the container engines I mentioned earlier. But we have also made some different policy decisions; some configurations can be slightly different between Fedora CoreOS and, say, Fedora Server. In the past this has been based on our target user base. One example is cgroups v2: Fedora switched to cgroups v2 a couple of releases before Fedora CoreOS did. That was because a lot of Fedora CoreOS users run Kubernetes on top, and the Kubernetes community wasn't quite ready for cgroups v2 at the time Fedora switched, so we held it back for a couple of releases.
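If you want to check which cgroups hierarchy a given node ended up on, a quick probe like this works on any modern Linux, Fedora CoreOS included:

```
# Report the filesystem type mounted at the cgroups root:
#   "cgroup2fs" => unified cgroups v2; "tmpfs" => the legacy v1 hierarchy
stat -fc %T /sys/fs/cgroup/
```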
But in general, we try to stay in line with Fedora and only differ when it makes sense.

As far as multiple update streams go, this is how we present them to users on the download page: we have a stable, a testing, and a next stream, and on any of these streams you can set your node to follow that stream into the future. I'll go into a little more detail here. Next carries experimental features and Fedora major rebases; we might put something there first. I think cgroups v2 is actually one of the things we put in next for a couple of months before we moved it over into testing. On the major rebase side: when Fedora 36 went beta, our next stream was switched over to Fedora 36 content, so our users were able to preview it and let us know what was breaking for them. We actually ran a Fedora test day based on our next stream while it was on Fedora 36 content. Testing and stable are more tightly coupled. Testing is a preview of what's coming to stable: a point-in-time snapshot of Fedora stable RPM content. If no issues are found or reported for that snapshot, it's then directly promoted to stable, meaning the same content gets built into a stable stream release and pushed. The goals of this update stream model are that we publish new releases often, every two weeks, and that we find issues before they hit stable. You can't catch everything in CI, so the goal is to have users actually tell us when things break. As for release promotion, here's a more illustrated example: we snapshot on a particular day, do a testing release, and two weeks later it gets promoted to stable. That's assuming nobody reports an issue with testing, right? If they do, we try to find and fix the issue and then promote the fix to stable.

So what happened around Fedora 36 specifically? As I mentioned, at Fedora beta release time our next stream was switched over to Fedora 36. The next milestone was the Fedora final freeze. When that happened, we said to ourselves, OK, we're getting closer to GA, so we want more rapid feedback when something breaks a user. At that point we switched our next stream to build weekly instead of bi-weekly, so that users would hopefully let us know sooner rather than later if something broke, and we could report back to the rest of Fedora and say, hey, this update that was introduced for a freeze exception actually broke something else, right? The next milestone is Fedora GA, and around GA, Fedora CoreOS actually re-orients its release schedule. Here's how it works, and the next slide illustrates it a little better. At week minus one, when the Fedora Go/No-Go decision is made, we do a next release with the latest Fedora 36 GA content. One week later, on the Tuesday that Fedora 36 GA actually happened, we released the testing stream, directly promoted from last week's next content. Two weeks after that, stable gets that Fedora 36 GA content, and then we do releases every two weeks from then on out. So that's kind of how it works.
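Switching an existing node between those streams is a one-liner. A sketch, assuming an x86_64 machine and the stream ref naming used in the Fedora CoreOS docs:

```
# Rebase this node onto the testing stream, then boot into it.
sudo rpm-ostree rebase fedora:fedora/x86_64/coreos/testing
sudo systemctl reboot
```

From there the node follows that stream's releases going forward.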
Okay, important note: none of this happens without passing tests. We have VM tests, we have bare metal workflow tests that exercise our ISOs and our live PXE boot path, and we also have tests for various cloud environments. Right now AWS, Azure, GCP, and OpenStack run tests on every build of Fedora CoreOS. We don't ship updates without them passing tests, and we're trying to add more cloud platforms as we go.

Okay, so what has happened recently in Fedora CoreOS land? We have the Fedora 36 rebase, which I just mentioned; Podman 4.0 was in there. There were a couple of backwards incompatible changes that we wanted to dig into, because we do automatic updates: for people who aren't paying as much attention, we want to know what's going to happen to them when a node gets automatically updated, and whether we need to do anything special. We also had the switch to iptables-nft by default. This happened a few releases ago in Fedora, but the way update-alternatives works meant it didn't actually apply to rpm-ostree based systems. So we had to realize, oh, we actually never absorbed that change, and figure out how to reconcile it, both for people who are automatically updating and for people who want to stay on iptables-legacy. We had to think about that. Other things we did: we added a few platforms, a VirtualBox and a Nutanix artifact, for people who want to use those. We added support for creating a minimal ISO image. We updated the VMware OVA to use UEFI and Secure Boot by default. We also added CI testing for Azure; that was one we weren't testing on before, mainly because we didn't have an account with credits, and now we do. And the last thing there, which Brent mentioned earlier: Podman machine is now using Fedora CoreOS as its base for platforms that need to run a virtual machine for Podman. So if you're on an Apple M1 Mac, you are now using a Fedora CoreOS aarch64 image as the backing for Podman machine, which is kind of cool.

Okay, so what's coming in the future? A few small things first. We have a few Fedora changes that we're going to propose; I'm working with various other working groups now to write up the proposals and get those submitted. There's community interest in PowerPC and s390x support that we're going to try to add. We're going to try to integrate better with Kubernetes distributors: have a bit more of a two-way relationship with them, add some documentation there, maybe some CI. We're always adding more cloud platforms and CI on those platforms; I'm talking to a few now, IBM Cloud and Alibaba, about getting our images there and CI tested. We're going to make SELinux policy updates safer and more reliable. Right now, if you make a policy modification locally, you won't get updates to your policy in the future, which is not a good experience, but we're fixing that and it should roll out soon. And the last one is a big one: CoreOS layering. This is one that Colin submitted a change proposal for in Fedora 36, as kind of an experimental change proposal. It's also known as OSTree native containers. The idea is that you're able to build and distribute a derived Fedora CoreOS.
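As a rough sketch of what such a derived build could look like, in the spirit of the Tailscale example described next: the registry path for your own image is hypothetical here, the Tailscale repo URL is the vendor's upstream one, and the exact location of the base image was still being settled at the time.

```
# Containerfile: derive from Fedora CoreOS, layer in a package, enable its service.
cat > Containerfile <<'EOF'
FROM quay.io/fedora/fedora-coreos:stable
ADD https://pkgs.tailscale.com/stable/fedora/tailscale.repo /etc/yum.repos.d/tailscale.repo
RUN rpm-ostree install tailscale && \
    systemctl enable tailscaled && \
    ostree container commit
EOF

# Build and publish the derived image (registry.example.com is a placeholder).
podman build -t registry.example.com/custom-fcos:stable .
podman push registry.example.com/custom-fcos:stable

# On a node, follow that image tag instead of the default OSTree remote
# (unverified transport shown; signature verification was still being worked out):
sudo rpm-ostree rebase ostree-unverified-registry:registry.example.com/custom-fcos:stable
```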
So you can easily make changes to Fedora CoreOS and share them, and have your nodes essentially follow a container image tag to receive updates in the future. Fedora CoreOS would be offered as a container in a container registry that you can then pull from, in this case from quay.io/fedora/fedora-coreos or something along those lines. Then you do a container build, push to a registry, and you're good to go. The example on the slide does essentially what the sketch above shows: it pulls from a Fedora CoreOS container, installs Tailscale, enables it, and then commits that. Once the container build is done and pushed to a registry, you can rebase your system onto that container in that registry. There are a lot of details we're still working out with this whole flow, right? There are implications in various parts of the stack, but we're pretty excited about what this should enable people to do. It will make it really easy for people to take Fedora CoreOS and make it a little more what they want it to be, which gives me some anxiety, because I like knowing exactly what the contents are, but our tooling makes it very clear when things have been derived, which is nice.

Okay, community growth. User engagement has been increasing in Fedora CoreOS land, at least in my anecdotal experience, because we don't really have a lot of good numbers there right now. On IRC and Matrix, people have been dropping in a lot more and asking questions. In our weekly meetings, people have been participating more and proposing issues to discuss. And our issue tracker and forum have had a lot more activity. So I'm pretty proud of how the community has grown and where we're going.

But we do have some data, which is more on the user side. We have our countme pings, which are stats based on yum repo hits, and we also have update server pings, which are stats based on update server hits. Given those two pieces of information, we can make some loose, contrived conclusions. First, the update server stats. The way the update server works, it keeps a count of unique IDs it has seen since the last time it was started, so every time it gets restarted or moved to a different machine it starts over at zero. In this case it was restarted April 1st, and six weeks later we're at 185,000 unique IDs seen, which is interesting. I'll talk about some of the differences in this data at the end. For countme, the stats are a little different. For all nodes in the countme data, meaning nodes of any length of lifetime, over the past few months we've gone from a little under 15,000 to something like 23,000 or 24,000. But for nodes older than one week, we've gone from a little less than 10,000 to more like 12,000 nodes. So we have a lot of shorter lived nodes being used, which is interesting; it's nice to know how people are using Fedora CoreOS. One thing to note is that our CI specifically does not show up in this data, because we disable pinging these servers in our CI. By Fedora release version and architecture, most of our users are on 35, which makes sense because that's where our stable stream is right now.
We have some that have already moved to Fedora 36, because testing and next are over there. And then we have some percentage still on 34, which means those people have disabled updates and are now on systems that probably have some security sensitive issues. I don't know. But yes, please enable your auto updates. As far as architectures, we have about 20% or so on aarch64 and the rest on x86_64. We have one or two on PowerPC and s390x, just because somebody's building them somewhere, but we don't offer those right now.

Okay, so some conclusions from that data, which may be right or may be wrong. We have many very short lived nodes; I'd say about 20,000 weekly. These short lived nodes spin up, do work, and then shut down. They don't stay up long enough for the countme ping, which happens semi-randomly within a three day window, but they do stay up long enough to ping the update server, which happens almost immediately once Zincati starts on boot. Then we have about 10,000 nodes that do stay up long enough for the countme ping: these spin up, accept requests for work, maybe stay up a few hours or a few days, long enough to hit the countme server, and then shut down. And then we have the longer lived nodes, the number from earlier, about 12,000 or so. Another interesting thing I derived from the data: about half of our longer running nodes don't contact the update server. When you restart the update server, within about an hour you get a number of nodes that are just consistently checking in, and that number was a little under 6,000, but we know our long running nodes number about 12,000. From that you can derive that about half of the longer running nodes aren't actually contacting the update server. Why is that? It could be that some people like to disable automatic updates and control them themselves; automatic updates are a little scary to some people, and I understand that. Another reason could be that in some scenarios the update server is disabled by default. In a live PXE environment, for example, it makes no sense to have automatic updates enabled, because the way you update a live environment is to reboot into the new live environment; it's a network boot type situation.

So, data conclusion: we have users. But my challenge to you here today is to join us. Join those users, try out Fedora CoreOS, join our community. We have a lot of different ways to get involved: website issues, forums, docs, mailing lists, IRC, Matrix. But one of the best ways to get involved, or just to check out Fedora CoreOS, is to look at the tutorials. You can go follow them and they'll walk you through some of the features I talked about earlier, one of the big ones being Ignition, and show you how you can use Fedora CoreOS and bring up nodes in an automated fashion: in a cloud, locally, on bare metal, or wherever.

So let's see, I think that was it for me. Does anybody have any questions? There was one question in the Q&A tab: do you take whatever is in the repo at the time a new testing release is composed, or do you cherry pick package updates?
So for the most part, we take whatever is in the repo. But in order to prevent breakage from entering our update streams, we have a job that's kind of like a pre-runner, and it runs every day. It does a new build of Fedora CoreOS and runs some basic tests, and only then will the packages in that build get promoted to the point where they enter one of our update streams. So RPMs won't even get into a build of Fedora CoreOS unless they pass an initial set of tests. Let's say a new glibc update broke some things, I don't know why, but let's just say it did, and it didn't make it through that initial pre-runner job, which we call bump-lockfile. In that case we would pin glibc to the older version, rerun bump-lockfile, and it would promote all of the other content, everything except glibc, into our testing stream. So that's a long way of saying: we take whatever's available, but we don't allow breakage to come in. In a similar vein, we're also able to fast track packages if they fix an issue specific to us. If a package comes in and hasn't quite made it to Bodhi stable yet, we can fast track it into our streams just so we can start running automated tests on it, and if it passes our CI, we can add Bodhi update feedback for it.

Okay. Can you proxy the update server behind a firewall? There are some open issues around proxied environments and completely offline updates. I don't know if Zincati itself specifically has the ability to set an HTTP proxy or something along those lines; Luca would know. But yes, ask that question on the discussion forum and we'll get you an answer.

How about dropping runc and using crun? You'd save size, and won't mention dropping Moby engine. Yeah, crun versus runc. I don't think I have a strong opinion there; that would be something we'd open an issue for in our Fedora CoreOS issue tracker and talk about in a meeting. Or maybe I should say, I don't know enough to have a strong opinion on that one yet.

And the last question: any chance for a way to provide an offline way to preset SELinux booleans? I don't know about offline, but you should be able to set SELinux booleans as part of your Ignition config, right? You write a service, defined in your Ignition config, that sets the boolean on first boot, and then it's set forever. I don't think there's a way to crack open the image we ship and set a boolean that way. Also, in the CoreOS layering GitHub repo, Colin has an example of setting an SELinux boolean using CoreOS layering. That way, theoretically, you do the boolean setting, quote unquote, server side, deriving from our image, and the node you're starting doesn't have to do it client side.
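A loose sketch of that first-boot service approach, written as a Butane config; the container_manage_cgroup boolean is only an example, so substitute whichever boolean you actually need:

```
# Butane config defining a oneshot unit that sets an SELinux boolean on first boot.
cat > sebool.bu <<'EOF'
variant: fcos
version: 1.4.0
systemd:
  units:
    - name: set-selinux-boolean.service
      enabled: true
      contents: |
        [Unit]
        Description=Set an SELinux boolean on first boot
        ConditionFirstBoot=yes

        [Service]
        Type=oneshot
        # -P persists the change across reboots
        ExecStart=/usr/sbin/setsebool -P container_manage_cgroup on

        [Install]
        WantedBy=multi-user.target
EOF
butane --pretty --strict sebool.bu > sebool.ign
```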