I'm Benjamin Gilbert, formerly on the Container Linux team at CoreOS, now on the CoreOS team at Red Hat, working on Container Linux and on Fedora CoreOS.

I'm Dusty Mabe. Previously I was involved in the Atomic working group within Fedora, and now I'm involved in the Fedora CoreOS working group. So that's what I do.

Cool. So we're going to do a few things here. I'm going to talk for a while about Container Linux; I assume that folks in this room aren't necessarily that familiar with it. Afterward, Dusty's going to talk about Project Atomic, and at the end we're going to close with how both of those projects have informed where we think we're going with Fedora CoreOS.

So where did Container Linux come from? This is a Linux OS project that we started in 2013 at the then-fledgling startup CoreOS. It's based on Chromium OS, the open-source upstream of Google's Chrome OS. We inherited a lot of its design choices, including the fact that it uses Gentoo packages and Gentoo tooling to build software. Since 2013 we've had a whole bunch of releases. The thing we're building is a minimal, container-focused server operating system for production use, targeting primarily cluster deployments, though it also operates standalone. We think a lot in terms of immutable infrastructure, and I'll talk about that more in a minute. Automatic updates are a big part of the story: the idea is that you should be able to install a Container Linux node and not think about it afterward. The node should just update itself, and you shouldn't have to worry. We try to have broad platform support, so that whatever cloud you want to run on, or whatever bare-metal system or whatever, you can do so. And of course, stability and security are a primary concern.

Okay, going into some of those in a little more detail. Production operating system: what does that mean, exactly? We do not ship anything that you don't need for running containers in production, for the most part. No development tools, except Git. Not a lot of debugging tools. Really just enough to get your hardware going and then run containers.

Immutable infrastructure, which I mentioned a moment ago: the idea here is that you have the base OS image, and then any customizations you want to make, to configuration or what have you, are encoded in a single provisioning configuration that you pass through user data, for example, in the cloud. We don't prevent you from SSHing in and arbitrarily modifying the node afterward, but really you shouldn't. Configuration management, and this is a bit tongue-in-cheek, is not something we really encourage. You're free to do it; Ansible can run with some effort. But really the idea is that if you want to make changes, you should probably be reprovisioning the node.

And then automatic updates: as I said, users shouldn't have to think about them. This has consequences for us. We cannot break compatibility, ever. If there are old bootloaders out there that we no longer ship, which is true, or old versions of cloud agents that are still on nodes, we have to keep working with them. If a daemon is updated, it still needs to be able to read the old version of its config file, or we have to go through some conversion process.
And if we really do need to deprecate a service and remove it from the operating system, there has to be a long deprecation window, which for us means probably upwards of a year. And then, of course, no regressions. That's hard to do. If we regress too much, users will disable automatic updates, they will no longer be getting security patches, and they'll be running some OS version from two years ago.

Here's a list of the platforms we support: pretty much all the major, at least US- and European-focused, cloud providers. Live PXE is an important case for us. Quite a large percentage of our user base is booting Container Linux from PXE into RAM, and the OS itself runs from RAM.

A big part of the story is install and update. How does that work? I'm sorry, but I'm going to present you with a partition table, and the reason is that it's pretty important for understanding how the operating system fits together. At the end there is the root filesystem, which is for user data and configuration data. You can put whatever you want in /etc, and that's fine. The rest of the partitions are owned by the operating system, and users are not encouraged to mess with them. Boot, of course, is for kernels. Then we have two identical partitions for /usr, because all of the operating system code is shipped in /usr as an immutable filesystem image. So there's an active/passive pair: we update one partition while running from the other, and then reboot. The OEM partition, which is a weird name for historical reasons, is for platform-specific customizations, like on GCE, or Packet, or whatever, and I'll talk about that a little more later.

How do we install this thing? There are no installers in the cloud: you launch an AMI or an image of some sort, and of course for PXE there's no installer either. That leaves bare-metal install to disk, and we didn't want to write an installer just for that one use case, where you'd be doing your customization in a way different from the other platforms. So the installer is just a shell script that really just downloads an image and dd's it onto the disk. That works pretty well. It doesn't work perfectly, 4K-sector drives are a problem, but for the most part it works. You get a monolithic image, same as on the other platforms.

Once you have that monolithic image booted, you need to customize it, provision it in some way. We've had a couple of attempts at this. Canonical's cloud-init is the obvious thing to try, but it's written in Python, and we don't ship Python. So we wrote our own, which is more or less cloud-init, kind of less, actually. It turned out to be a problem. It runs as a regular process halfway through boot, and at that point it runs around writing out config files, enabling and disabling services, and starting services. When it fails, you're left with a half-configured system. Even when it succeeds, you end up with complications around service ordering and that kind of thing. Also, it runs on every boot, which means that users try to modify the config behind its back and then reboot to have it pick up changes, and it was really never designed for that.

So to correct those problems, we built Ignition. Ignition is a provisioning system that runs in the initramfs, once, on first boot. And because it's running that early, it can do things with impunity that cloud-init couldn't.
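To give a flavor of what such a provisioning config looks like, here's a rough sketch in the Container Linux Config YAML format, which the ct tool transpiles into the JSON that Ignition actually consumes. The file path and unit here are made-up examples for illustration, not anything Container Linux ships:

```yaml
# Write a file and enable a (hypothetical) service on first boot.
storage:
  files:
    - path: /etc/motd
      filesystem: root
      mode: 0644
      contents:
        inline: provisioned by Ignition
systemd:
  units:
    - name: hello.service
      enabled: true
      contents: |
        [Unit]
        Description=Example one-shot service

        [Service]
        Type=oneshot
        ExecStart=/usr/bin/echo hello world

        [Install]
        WantedBy=multi-user.target
```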
Because it runs that early, Ignition can, of course, drop files; it can create systemd units, enable and disable services, and that's fine because boot hasn't really started yet. It can create users and groups. But because it runs in part before the root filesystem is even mounted, it can do things like reformat the root filesystem: if you want it to be XFS or Btrfs instead of ext4, you can do that with Ignition. And finally, if provisioning fails, you're done. We just drop you to an emergency shell and we don't boot the system at all. Which means that if the system comes up, you know that it's provisioned properly.

Okay, atomic updates. How do we update? I mentioned the active and passive partitions before. We ship you an ext4 filesystem image and a kernel, and they get written down to the disk. That's easy to explain, and it seems pretty nice, but it has limitations. Updates overwrite the previous version. We had a case recently where we had a bad version and we were trying to update out of it. That was kind of scary: we weren't sure whether we were going to be able to update out of it, but trying to do so was going to overwrite the last known-good previous version, and there was just no way around that with the A/B model. Once we boot into the new system, 45 seconds later it commits, and at that point the updater can fetch a subsequent update and download that. So there's this very narrow window where rollback is even possible. And we have essentially one update image per architecture, because the platform customizations are only embedded in the install image.

An important piece of how we do updates is staged rollouts. We do not just push an update to the mirrors and let everyone download it at once. If a client wants to receive an update, it has to call into a server to get permission, and we ramp up over time the number of nodes we're allowing to update. Nominally, this allows us to watch test reports coming back from nodes, but in practice the major use of it is that if a user reports a bug in the new version, we can stop the rollout before it hits all nodes. In the process of talking to the update server, the client sends some metrics: cloud platform, OS version, original OS version, that kind of thing. But because that's coupled to the update system, if a user disables updates for whatever reason, we no longer get those metrics. The node becomes dark matter; we have no idea that it's out there. These are some of the components that are involved, protocol, client, server, and they all have problems. That was not code that we wanted to carry forward.

Clustering. Once a node is updated, it needs to reboot into the new kernel and the new /usr partition. The original assumption was that the OS would be installed in clusters, and the cluster has to handle node failures anyway, so it's okay to just reboot a node. We can't reboot all the nodes at once, of course, or the cluster would go down. So we have a coordination mechanism called locksmith, which takes a lock in etcd to ensure that only one node in the cluster can reboot at any given time. But the cluster doesn't know about any of this. If you have a Kubernetes system with Container Linux underneath, just using this mechanism, a node doesn't drain before it reboots, so some connections that you're serving will just drop. Conversely, the cluster doesn't control the OS version running on each of the nodes. The nodes are all updating at their own pace, and that's not ideal either.
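For reference, the day-to-day interface to that etcd reboot lock is the locksmithctl tool. A hedged sketch of typical invocations, assuming locksmith is running against the cluster's etcd; the machine ID is a placeholder:

```sh
locksmithctl status                # show reboot slots and who holds them
locksmithctl set-max 2             # allow two nodes to reboot concurrently
locksmithctl unlock <machine-id>   # manually release a stuck holder
```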
So what we needed to do for Tectonic, and for Kubernetes, was build a higher-level coordination mechanism. And we did that, but it doesn't get any help from the operating system: it's reaching in and twiddling some of the same mechanisms without going through locksmith or any of the tooling that's shipped on the OS. So that's not ideal.

Automatic updates also have exclusions. The OEM partition that I mentioned earlier, which contains a lot of the platform-specific code, the cloud agents, cannot be updated, for weird historical reasons. The bootloader can't be updated either, on the theory that if you break the bootloader, you break the machine, so it just didn't seem safe. And then one day we had a bug where there was memory corruption inside GRUB, and depending on the exact block-level layout of the boot partition, GRUB would start boot-looping. We have a shell script, coreos-postinst, which runs after the image is laid down, and it contains all the hacks. So what we had to do to fix this was detect the broken versions of GRUB and use printf and dd to binary-patch them.

Just briefly, automatic rollback. It seems like a useful thing to do: if the node fails to come back after a reboot, boot back into the old version. That kind of works. There's no reason we couldn't have gotten this completely working, but we didn't put the time into it. If the new OS version fails early enough in boot, we will automatically reboot back into the old system. As you get further into boot, things get fuzzier. If the network doesn't come up because you have a bad network driver, that's a problem we don't detect. If there's a service that you particularly care about, like dockerd, that doesn't start, we don't detect it. You end up with a broken machine. Even if we do automatically roll you back, the update client doesn't know, and it'll just apply the update again; now you're boot-looping, except with a 300-megabyte download in between. And finally, we don't have user-specified health checks for things like particular services that the user cares about.

Very briefly, update channels. There are several. Most of the time you'll want to run Container Linux on the stable channel, but there exist faster-moving beta and alpha releases. For reasons I'll get to in a moment, we try to keep those usable too, and we encourage users to run a few percent of their nodes on each of them. Here's the reason: there's only so much routine manual testing we can do. We have CI, and the CI tests are fine as far as they go, but we're shipping millions of lines of kernel and systemd and Docker, and there are things that CI isn't going to catch. This is why you want to run alpha and beta. The idea is that by the time code is promoted to the stable channel, users who have a few percent of their nodes on the earlier channels should have caught problems that are specific to their environments, and may be able to report them to us.

Briefly, a few odds and ends about the Container Linux runtime. We support amd64. We added arm64 later, but never quite got there; we actually just removed it within the last month or so. It should surprise no one who's ever even looked at this that bolting a second architecture on after the fact is a bad idea. We push pretty hard on the idea that you should run software in containers on CL. You can copy binaries to the host system and run them, there's nothing preventing you from doing so, but we really don't think you should. There's no package manager; there are no conveniences for this.
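In practice, running your software in containers on Container Linux usually means a systemd unit wrapping docker run. A minimal sketch, with the unit name and image chosen purely for illustration:

```ini
# /etc/systemd/system/myapp.service
[Unit]
Description=Example containerized service
After=docker.service
Requires=docker.service

[Service]
# Clean up any stale container from a previous boot, then run fresh.
ExecStartPre=-/usr/bin/docker rm -f myapp
ExecStart=/usr/bin/docker run --name myapp --rm nginx
ExecStop=/usr/bin/docker stop myapp
Restart=always

[Install]
WantedBy=multi-user.target
```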
Run software in containers, okay. Well, it turns out that there are some things, like daemons associated with particular storage cards or whatever, that you might want to run on the host. And of course people are interested in out-of-tree kernel modules, which they need for various reasons. We don't have a great answer for that right now. People ask on IRC about out-of-tree kernel modules, and my coworker David sends them a gist.

Docker turned out to be a problem. Initially the model was that we were going to ship the latest version of Docker and everyone was going to be happy. For folks using Docker Engine standalone, or using Docker Swarm or whatever, that was fine. The problem is that Kubernetes is not qualified against any version of Docker that's still under support. So we had a choice to make, and whichever direction we went in terms of picking a version, someone was going to be unhappy. The net result is that in this OS that's just supposed to work, where you don't have to think about it, we had to provide a way to let you choose a version of Docker. The result was Torcx, which is just enough package manager to get the job done. It doesn't do dependencies; it installs, essentially on every boot, into a tmpfs. It works well enough, but no one was really happy with that compromise, I think.

We try to keep interpreters out of the operating system; this will become important later when we talk about Fedora CoreOS. We have bash and awk. We don't have Python, we don't have Perl. It keeps the image small, it keeps the attack surface small, and you probably shouldn't be running a lot of fancy code on the host anyway. However, then we have to do things like reimplement cloud-init. Finally, platform agents: we have them in the OEM partition, and they can't be updated. Sometimes we have to ship Python in the OEM partition that can't be updated. The code that we're running there is of perhaps uneven quality and perhaps uneven usefulness. That's been an ongoing issue for us.

At this point, Dusty is going to talk about Atomic Host.

I'm here to talk a little bit about some of the Atomic Host design goals, the structure, what went well, and what didn't. Let's dig right in. First of all, I want to take a detour from design goals and talk a little bit about the update model. Basically, there are three steps with Atomic Host: you download the update in the background, you stage a new deployment for reboot, and then you boot into the upgraded deployment. One, two, three, pretty easy, right? I'll talk about the design goals, and we'll keep that update model in mind the whole time.

Essentially, what we wanted to do with Atomic Host was create reliable, fault-tolerant updates. We didn't want people to be scared of updates. We wanted offline updates, and I'll get into what that means in just a minute. We wanted to focus on security. How do we do all this? One way is to shrink the base of the operating system: if you ship less, you are responsible for less, and you're less likely to have security issues in the software you deliver. But if we're shrinking the base, how do we still deliver something that is useful to the users and the admins who are administering these systems? Atomic Host was created in 2014, when container technology was starting to take off, so the answer was to leverage containers.
Then we needed to develop an image-based update system, different from the model we had been using in the past with yum and DNF. So let's revisit the design goals. First we had reliable updates, offline updates, and security; now, in consideration of those first three, we add being a good container host. That is something we want to do as well.

Okay, so: reliable updates. Consider the update model I mentioned earlier: one, download updates; two, stage a new deployment; three, reboot. In this model there are two deployments, so you essentially have your pre-upgrade deployment and your upgraded deployment. This means that if your new upgrade doesn't work for any reason, you can easily roll back to the old one. If it makes it all the way to user space and doesn't work, you can run rpm-ostree rollback, do a reboot, and you'll be back in the old one. If for some reason it doesn't make it to user space, say there was a kernel or initrd issue early on, you can boot into the pre-upgrade deployment and choose to keep that one forever as well. So there's a bit of fault tolerance here for people who are risk-averse and want to be able to go back if something goes wrong. This encourages people not to fear upgrades as much.

Offline updates, which I said I'd clarify: again, we download updates in the background, we stage a new deployment, and then we reboot. This means that no software ever runs in a half-upgraded state. Getting back to the first design goal, reliable updates: if you perform an update in place, like on a yum or DNF system, and you happen to have a failure somewhere in the middle, maybe DNF or RPM hits a stack trace, or you lose power or anything like that, your system might be in a half-upgraded state that you can't recover from. And on such a system, software that is currently running, say you didn't stop your services, can still be running old code after the update has completed. If you don't reboot after you run a DNF upgrade and you then check your packages to see what software is vulnerable, well, guess what: it's all updated on disk, but there might be some stuff in memory that's not. So this model gives us fully offline updates. You can leave your hosts online, serving content, performing their service, during the update; the only time they go down is, obviously, when you reboot.

The third design goal was security. Basically, a smaller base means less risk. The image-based update system also allows users to verify that what they have on their system matches what was built on the server: you can cryptographically verify it, and it's also signed. We also mount filesystems read-only, so if somebody does get onto your system, hopefully they're not able to actually modify binaries. And leveraging SELinux means that if they do happen to get onto the system, hopefully they're confined and not able to touch things.

And the last one, being a good container host. We provide container runtimes for users so that they can run their applications, because if you can't run your applications, the host is pretty much useless. So we provide container runtimes, and we manage the host updates ourselves.
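The three-step model maps directly onto rpm-ostree commands. A quick sketch of the cycle, including the escape hatch:

```sh
rpm-ostree upgrade    # steps 1 and 2: download the update in the background
                      # and stage the new deployment
systemctl reboot      # step 3: boot into the upgraded deployment

# and if the new deployment doesn't work out:
rpm-ostree rollback
systemctl reboot
```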
So you are essentially trusting another team to do your host updates for you, or at least to deliver you something that is reliable from an updates perspective, and you manage the applications that are on the system. The separation of concerns here, where the Atomic Host team manages the host updates and the admins manage the application updates, helps the reliability.

Okay, the structure of Atomic Host. I mentioned the image-based update system: for that we created rpm-ostree. It actually existed before Atomic Host, but this was quite a good use of it. rpm-ostree is really good at content tracking. It's a hybrid approach: it's not a disk-image-based system, so it doesn't work at the block-device level. It works right above the filesystem level, and it does smart content tracking. It knows about RPMs, it knows about bootloaders, and it's able to manage all of that for you. When it does an update, it gets the new content, it updates the bootloader, and once you reboot, you're in your new OS.

The nice thing about this content-tracking system is that it is like Git for your operating system. Five years ago, if somebody had told me "it's like Git for your operating system," I would have said: yes, that's exactly what I want. I love the conceptual model. An OSTree server-side repo is like a Git repo: it's a remote to your client, it has branches that you can follow, it has checkouts, it has rebasing. That's really nice. The other thing is that it shares common content between different deployments. For example, if there's a new update where only one RPM changed, and only 10 files changed in that RPM, you're only downloading those 10 files when you do the update, and on the local system you're only storing those 10 files in addition to what you already had. So that's nice.

The disk layout structure we have is pretty generic, and it gives the user or the admin a lot of flexibility. We can have partition-based, LVM-based, or Btrfs-based systems. I think that works; I haven't tried it lately. This is configured during install, so if you were to run a bare-metal install, you could configure it. Obviously, if you download a pre-baked image, you get a partition setup that is predefined for you. As for mount points: /usr is read-only, /var is read-write, and there are some others. Top-level directories like /home and /mnt are redirected to /var/home and /var/mnt, with symlinks in place. And the state in /etc is tracked and restored on rollback, which makes rollbacks more reliable, because updated configuration files for new software might not have worked with the old software that existed on the system.

Bootstrapping. You heard Benjamin talk earlier about using a shell script to install. For bare metal, we use an installer ISO with Anaconda in it, similar to what you might experience with a Fedora or CentOS-based system today. For cloud, we use Canonical's cloud-init, which is baked into the OSTree: you provide your user data, and it spins up the system. Obviously there are problems with that, which Benjamin mentioned earlier, because it runs during boot rather than before the system has switched into the new root.

Host extensibility. You heard Benjamin talk about Torcx. Our answer to that was two-fold: package layering, and system containers.
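Before getting into those two, to make the Git analogy concrete: the OSTree model shows up directly in the CLI. A sketch; the remote and branch ref are examples from the Fedora Atomic Host era, so treat the exact names as illustrative:

```sh
ostree remote list        # remotes, analogous to Git remotes
rpm-ostree status         # which commit/deployment you're booted into
# switch to a different branch, like a Git rebase of your whole OS:
rpm-ostree rebase fedora-atomic:fedora/28/x86_64/atomic-host
```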
Package layering first. Basically, what you can do is take an RPM and mutate the immutable host: you add that RPM as a layer on top of what you already have, what was delivered to you by the Atomic Host team. So that's one option. The other option is running system containers via the atomic CLI. What is a system container? It's basically an OCI image, just like you'd build with docker build or podman build. The atomic CLI helps you grab it and set up a systemd unit so it runs on every boot, and it also works for making them super-privileged, so they can do the low-level things they need to do to the host. Is this a good idea? I don't know. Probably not. It's too undefined what all those things can do, unfortunately, and it makes it harder to update reliably.

So, what worked well? I think using the Fedora and RPM ecosystem has worked really well for the Atomic Host project, because essentially we get to reuse all the hard work that's done in Fedora and also contribute back. If there's a bug in the kernel in Atomic Host, it's the same bug that's in the kernel in Fedora. So if we find a bug and get it fixed, it helps everybody; if somebody else finds a bug in Fedora Workstation or whatnot and gets it fixed, it helps us. So that's really nice. Package layering has worked really nicely. Obviously, containerizing low-level system tools is hard, and layering helps us get around those issues: somebody has to really want to containerize a low-level system application in order to do it right, and we haven't hit a lot of cases where people are jumping into the water ready to do that. And the Git-for-your-OS conceptual model is really nice for new people who want to understand what OSTree and rpm-ostree do.

The other thing that's really good is representing system state in a clear way. Here's an example of running rpm-ostree status on a system. You can clearly see there's a version, which people can use when they talk to each other, or to us when they report bugs: "hey, I'm having this bug, is anybody else seeing this? I'm on version XYZ." And I can tell them: "no, I'm on version ABC and I'm not seeing it; can you try ABC, or I'll rebase my system to XYZ and see if I see the problem." It also represents any mutations to the system: layered packages are listed out, and I've got a few on that system. So it clearly lists what a user has done to the system, and maybe you can replicate the state.

So, what didn't work well? I sent an email about this to atomic-devel; I forget, I think it was about eight months ago. There were four big issues that were plaguing our ecosystem; people would just hit these periodically. One of them is the lost-content-in-/etc problem. I mentioned earlier that /etc is tracked. The way it works is that a user runs rpm-ostree upgrade, and as part of that, a new deployment is created and a new /etc is made for it: it basically takes the current /etc on the system and the /etc from the new commit and merges them together into a new /etc for the new deployment. The problem was, if you ran rpm-ostree upgrade and then waited three days to reboot, any changes you made to /etc on the running system before the reboot would be lost, because the /etc merge was done at the time you ran rpm-ostree upgrade. That's now fixed: we started doing the /etc merge right at reboot rather than before.
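Concretely, the two extensibility options from a moment ago look something like this; the package and image names are illustrative, not a recommendation:

```sh
# Option 1: package layering - overlay an RPM on the immutable base
rpm-ostree install htop

# Option 2: a system container, installed and wired up as a systemd
# unit by the atomic CLI (the image ref is an example)
atomic install --system --name etcd registry.fedoraproject.org/f27/etcd
```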
Another problem: you can't package-layer some RPMs, specifically RPMs that write to directories that are read-write on the system. For example, /usr/local is read-write because people put stuff there, and /opt is read-write because people put stuff there. rpm-ostree does not allow somebody to install an RPM that puts content there, because we don't want to manage directories that are read-write; we want to manage the ones that are read-only, essentially.

And then third-party kernel modules. Benjamin mentioned this: DKMS and akmods don't work. This has been a problem for a while, and we haven't really got a good solution to it. His gist answer is probably about as good as we've got. We did have a user who actually created a subsystem for doing this, for WireGuard, but that's about as far as we've gotten.

The bad: the update philosophy. Obviously, our update philosophy from the beginning was more of a "you update on your own timeframe" scenario, rather than the automatic-update policy that Benjamin mentioned. I think it would have been better to go the other way, because obviously people aren't as proactive about their updates. That essentially makes upgrades less reliable, because if you're upgrading from something a year old, that's a lot more likely to have issues than upgrading from n-1 to n, which is probably something we tested. Package layering also makes upgrades less reliable, because we can test one base upgrading to another base, and we can even test base upgrades with some layered packages, but we probably didn't test your set of layered packages with our upgrades. It's a big matrix once people start adding a lot of layered packages. I'll hand it back over to Benjamin.

All right. So, first of all, what are we trying to build? I mentioned before that we're essentially trying to take the best of Container Linux and Atomic Host. This is the language we have now; I won't read it out. It looks a lot like the Container Linux usage model as it exists now: it's for running containers in production, a minimal system, a monolithic image. We have some use cases defined; there's some stuff further down that I haven't included on the slide, but essentially the primary use cases are either a single standalone node for running Docker or Podman, or a cluster node for Kubernetes or OKD; and secondarily, cluster nodes for running other container orchestrators.

As for platforms, we're targeting the same clouds and so on that Container Linux targets. The install process, we're currently thinking, will be something like coreos-install, probably not the same shell script, but essentially download a monolithic image and write it to disk. Provisioning will be by Ignition. The partition layout is still up in the air a little bit. The straightforward thing to do would be a single partition for both user data and operating system code, but it seems like it would be nice to separate those out, first so that you could format the user data partition using Ignition on first boot, and also just to keep a little more isolation in terms of disk space usage and so on.
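That kind of first-boot reformat is what Ignition's storage section handles on Container Linux today. A hedged sketch in Container Linux Config form; whether Fedora CoreOS keeps exactly this schema is an open question, and the device label is illustrative:

```yaml
# Reformat the root filesystem as XFS on first boot, before it's mounted.
storage:
  filesystems:
    - name: root
      mount:
        device: /dev/disk/by-label/ROOT
        format: xfs
        wipe_filesystem: true
        label: ROOT
```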
Automatic updates. We are throwing away all of the Container Linux update code: server, protocol, client. On the client side there will be rpm-ostree, installing primarily RPMs from Fedora. We will have rate-limited rollouts the same way we do on Container Linux. That code does not exist yet; there is some work going into a network wire protocol for it, and we'll have to write new server code. Because we're supplying automatic updates, the same model applies: we cannot break users. Any backward-compatibility breaks in the OS will need to be announced with a deprecation window, far in advance. And we're going to work on the automatic rollback story as well: there will be user-provided health checks and hopefully more comprehensive automatic rollback.

Update streams: this will probably not be the Container Linux alpha/beta/stable model, but we want to do something a little like it. There will be a stable stream, there will be at least one pre-stable stream, and some details are still TBD. The reason that's important is that the testing model will be similar as well: there will be CI, but it will also be important that users run some nodes on the pre-stable stream in order to catch any regressions.

Metrics turn out to be important. The metrics that we have on Container Linux are better than nothing, but they have gaps, and without them it's been really difficult to figure out how to spend our time: is some corner case actually worth spending a bunch of development effort on? Are we prioritizing the correct platforms? So having some sort of metrics gathering is useful. There will of course be privacy knobs, so that you can turn it off if you really don't want to give us that information, but hopefully we'll be able to do this in a way that lets us get metrics independently of whether you're obtaining updates.

Container infrastructure: the Docker problem has not gone away. There's still a need to be able to run either the latest Docker or the one that Kubernetes requires. We'll be shipping Podman, and at the moment it looks as though we will not be shipping rkt; rkt is not the direction the container ecosystem has gone, and we think it's a good time to part ways with it. In addition, farther up the stack, we're currently looking at shipping the kubelet and CRI-O in the OS image, so that's one less thing you have to provide separately. But the problem is that both of those pieces of software care about the version of Kubernetes running in your cluster, so we're going to have the same sort of issue as with Docker: we need to figure out a way to give you the version of those components that you need at any given time. How to do that is still an open question, but probably it'll have something to do with package overlays, because that mechanism exists and it works well.

Package overlays are also useful for debugging. Container Linux is a little bit weak in this area: if you need debugging tools that aren't on the machine, you have to bring them in, and there are a couple of different ways to do that. So we want package overlays around, but we're probably going to try to discourage them from being used any more widely than necessary. The exact nuance there has yet to be worked out, because as Dusty said, overuse of package overlays complicates the update story.

A couple of quick things. arm64 support: I mentioned before that bolting it on much later is a bad idea, so we will try to ship it right off the bat, though we may need to cut it for the initial release if time becomes a factor. Contrariwise, Python: we will try not to ship it, but we may need to include it if time becomes a factor.
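For the debugging case, the expectation, assuming Fedora CoreOS keeps rpm-ostree package layering as the overlay mechanism, is that a one-off would look something like this (strace is just an example package):

```sh
rpm-ostree install strace     # overlay a debugging tool on the base
systemctl reboot              # layered packages apply to a new deployment
# ...debug...
rpm-ostree uninstall strace   # remove the overlay when you're done
```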
Cloud agents: the OEM partition has to go; we're not doing that again. So for starters, we're going to ship all of the cloud agents inside the base image, effectively, and then conditionally enable them based on which cloud you're running on. The install images will be slightly different, because each image needs to know which cloud it's running on, but probably that will just be a few bytes, and otherwise the images will be identical. In the long run, the idea is not to ship all of the agents inside the OS, but rather to not ship agents at all where we can get away with that; in other cases, there's just a little bit of callback code or whatever that needs to be implemented to tell the cloud that the node is healthy, and we can implement that ourselves rather than shipping the vendor's code.

There are a couple of other things that are still up in the air, that we know will need more thought. I mentioned the cluster coordination issue before: Container Linux with locksmith wants to just reboot nodes out from under the cluster, and that's not okay. It would be nice to have some sort of unified model where code on the OS node interfaces with the cluster in order to coordinate. And then, of course, third-party kernel modules; that's still up in the air as well.

All right, I think we have just a few minutes left if people have questions.

So the comment is about problems where people want to use either Atomic Host or the future Fedora CoreOS for things that aren't necessarily the primary use cases we're targeting: it would be nice if we made it really easy for people to create their own Atomic Host or CoreOS system, so that they could deliver their own and have a little more control over it. A goal is that people are able to use the build tools and create their own systems that they can update, so that would be really nice. As for updates, whether offline updates or third-party update servers, that's how it will work as well.

I'll take the first part. The comment from SUSE is that they're doing something similar in the Kubic project, and they do recommend having a separate /var from root as part of that. And then the question was whether we've looked at the Kubernetes reboot daemon for coordination. I personally tend to work further down the stack, and there have been a variety of reboot coordination things in the Kubernetes context that have been attempted; I have not looked at that particular piece of software. Luca, are you aware? Right, so the comment was that several of them require a running Kubernetes cluster. I think, from the OS perspective, the interesting piece is finding a way to provide some hook or library, something on the host, that those things can interact with.

How do you do static IP network configuration without an installer? That's relevant for bare metal, and on bare metal, hopefully... I was about to say you control the DHCP server, but that's not the answer you wanted. When you're installing the system, whether from PXE or from an ISO image or whatever, you can modify the kernel command line in the GRUB config, and on Container Linux at least we support ip=, where you can pass configuration in for the initramfs. For the real root filesystem, Ignition can write down a networkd config. So it depends on exactly which piece you need to get working; there are a couple of ways to do it.
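Roughly what those two pieces look like; the addresses, hostname, and interface name are made up for illustration. The kernel command line uses the dracut-style ip= syntax for the initramfs:

```sh
# appended to the kernel command line in the GRUB config:
ip=192.0.2.10::192.0.2.1:255.255.255.0:myhost:eth0:none
```

And for the real root, a networkd unit delivered through a Container Linux Config:

```yaml
networkd:
  units:
    - name: 00-static.network
      contents: |
        [Match]
        Name=eth0

        [Network]
        Address=192.0.2.10/24
        Gateway=192.0.2.1
```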
Do we have, in the contract with Container Linux customers or folks like that, a requirement that they run alpha and beta? The need to run alpha and beta has been sort of an ongoing issue, because it's something we talk about in talks and things, but we don't actually document it very well. So we have users showing up reporting stable regressions, and that's the first time they learn they probably should have been running a little alpha and beta. The messaging there has been uneven. Any other questions?

The question is what happens if there's a merge conflict in the /etc directory. That's probably a better question for Colin Walters or Jonathan Lebon, because they know the technology better than I do, but I have not hit that problem. I think what they basically do today, and it probably could be smarter, is: if there is a file in the current /etc on the system that has been modified, then that overrides what comes in with the new deployment, as part of the new update. So if it's been modified, it stays; if it's the same, basically if you as the user have not touched it, it pulls in the one from the new deployment and uses the new one. I think that's what happens, whereas RPM today would drop down a .rpmnew file so you can inspect the old one and the new one. I think that's what it does on Atomic Host.

Josh, the question was why, in Fedora CoreOS, we're switching to automatic updates, which is the Container Linux model, rather than the Atomic Host model. The answer, I guess, comes down to: we hope you want updates, bug fixes, security fixes. The node should be essentially an appliance that you don't have to think about, because you're thinking about the things running in containers on top of it. If it becomes another thing to manage, then we're not achieving our goal, right? And to second that: basically, and I know I'm going to get dinged for using this analogy, but I feel like people know it really well, pets versus cattle, right? We essentially want to support the "spin up a node, throw it away" model, rather than "spin up a node, let me customize it exactly how I want, let me create a snowflake." In the cattle model, automatic updates should be something that's welcomed, because if, for example, something doesn't go right, you can throw that node away and bring it back up. I also feel like it's definitely resonated with a larger user base, because there are more people using CoreOS Container Linux than either RHEL Atomic Host or Fedora Atomic Host today. So obviously that spoke to someone, and we listened to that, so I think it's the right way to go in the future.

Say that again? The question is how we plan on dealing with RPM dependencies in Fedora CoreOS: how do we trim it down to make it smaller? Pretty much, right now we need to see what tools would be removed from the host if we removed Python, and which of those tools are ones we absolutely can't do without, and then try to work with those teams to somehow remove the Python dependency, because some of the things we've looked at are packages that have one Python script in them that could probably be rewritten, and some of them are packages that are completely Python. So we'll just have to deal with those one by one and see whether we can do this or not. Right. A way to do that is either to convince people to rewrite it, or to try to get them to put it in a subpackage. The other option is that when we create the OSTree, we can theoretically remove particular things from it before we do the commit, so that's an option too, though it's not as clean.
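One aside on the /etc merge question from a moment ago: OSTree tracks which files in /etc you've modified relative to the deployment's defaults, which is exactly what drives that keep-the-modified-copy behavior. You can list them yourself:

```sh
ostree admin config-diff   # files in /etc that differ from the defaults
```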
But real quick, let me try again on your static IP question. If you're running coreos-install, from whatever medium, it can inject an Ignition config, and the Ignition config can have a networkd unit in it. Real quick, are we done at 3:20? Yes. Anyway, we might be out of time. Thanks, everybody, for coming. I appreciate it.