Hi everyone, we are super excited to be here with you today to talk about container operating systems. The goal of today's session is to give you the key elements to get started with container OSes: we are going to introduce some low-level elements and technical implementation details, but also the high-level concepts, so that you understand why you should use a container OS to operate your container workloads.

Before going further, let me quickly introduce myself. My name is Mathieu, and I work as a Flatcar engineer at Microsoft. I'm mainly involved in the test automation of Flatcar, in feature development, and in broader topics like Cluster API or upstream contributions to other projects in the Linux ecosystem. Outside of work, I co-founded the SRE France association: with my friends, we organize DevOps and SRE events in France, meetups but also bigger events like the SRE Summer Camp, where we organize talks and so on around DevOps topics. And what about Timothée?

Hey, thanks Mathieu for the introduction. I'm Timothée Ravier, and I'm currently working as a CoreOS engineer at Red Hat, so I work on Fedora CoreOS and Red Hat CoreOS. On the community side, I work a lot in the Fedora community, especially on the Fedora Atomic Desktops. We just went through a rebranding: we used to call them Fedora Silverblue, Fedora Kinoite and all of those variants, and now we call them all the Fedora Atomic Desktops. I also do some work on the KDE side, as a KDE developer working on Discover and other applications.

All right, enough about us. Let's set up a little bit of context: why are we here, and why are we bothering to talk about container-focused OSes? We're here to run applications. This entire conference, KubeCon, is about cloud native: running applications inside containers and pushing them somewhere to run them, to get a service out to users, to our enterprise, to customers or clients. We now run those applications mostly in containers, and we need a platform to run them: to deploy our containers, to make sure they are OK and run well, to monitor them, to ensure security, all those things. And since we're at KubeCon, obviously we're going to look at Kubernetes and how all of this fits in.

So we run our applications in Kubernetes, which brings a lot of benefits around manageability, predictability, all those things, and for most people that's just fine: you get Kubernetes clusters from your cloud provider, they get auto-provisioned and auto-configured, and you start pushing out containers, applications, services. What we're going to do in this presentation is look underneath Kubernetes, at what makes Kubernetes shine: the operating system that makes Kubernetes work well. Here we're going to look at Flatcar Container Linux and Fedora CoreOS, two container-focused operating systems which are a great fit for Kubernetes deployments. Of course, the infrastructure layer doesn't stop there: you're going to run that on a cloud platform, on your private cloud, or in your data center, so usually you want those operating systems to support all of these platforms, and also different architectures, especially AMD64, the classic one, and now ARM64, which is getting more and more popular.
So that's why we're here: we're going to look at Kubernetes under the hood, at what it takes to run containers efficiently at scale inside a cluster, and at how the OS can actually help you do that, not just be a component of the stack but something that really empowers you to run Kubernetes.

All right, let's take a quick look back through time, a little bit of history. A little more than ten years ago, we got the first release of Docker, which in a sense started this whole thing: the popularity of containers, how we ship them, how we deploy images and applications inside containers. This happened around March 2013. A little while after, we got the first release of what was at the time named CoreOS, the CoreOS operating system: the first container-focused operating system, which notably also introduced etcd, and used Docker by default. A few months later, we got another entrant in the field, Fedora Atomic Host, the Fedora variant of a container-focused operating system, which also came with Docker, together with a project called geard, which was a bit like a mix between containerd and CRI-O: a component that would help you create and manage containers at scale. And all of this happened before we even had Kubernetes; Kubernetes only arrived at the end of 2014, in September 2014. That's where our journey starts. Along the way, two major events: the first one is that in January 2018, Red Hat acquired CoreOS, the company, and with it the Container Linux project; and a few months later, the Kinvolk folks started the Flatcar Container Linux project. So that's the history of our two projects, and why we're here talking about container-focused OSes. But I've been saying it a lot already: what actually is a container-focused OS?

OK, thanks a lot for this great history lesson. So yes, we keep using the words "container OS", but what is it, actually? It's an operating system, but unlike a general-purpose operating system, for example Ubuntu or Debian, a container OS is designed to run containers. That's it, nothing more. Everything inside the operating system is designed to run containers: you have the container runtimes already available in the system out of the box, and you have the kernel with the right kernel modules and the right kernel options to run containers in a safe way. You just want to run containers, that's it; it can be single containers, or it can be bigger workloads, for example with Kubernetes.

Container OSes also benefit from automatic updates, which means that once you've deployed a container OS, you can just forget about it in terms of updates. Unlike a general-purpose distribution, where you need to do the actual upgrade yourself, with a container OS it's up to the release process itself to push the new release onto the system. So you can just forget about it, and every two weeks or every month you get the new release of the container OS, which is really nice in terms of security and in terms of technical debt.

And finally, the last main concept of container OSes is immutability. With both systems, the idea is to get some nice features, for example the ability to have /usr mounted read-only. If you SSH into a container OS, you can't write anything into /usr; you can write there on Ubuntu, for example, but on a container OS you can't, not even as a super user. That's one of the main features of immutability, and the reason is that the whole system is image-based, and this image is /usr: the binaries, the libraries, and the configuration files from the maintainers are shipped via the /usr partition. With this, we ensure the immutability of the system, and you can be quite sure that your system has not been modified between the release servers and the actual deployment of the instance.
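As a quick illustration, this is roughly what that looks like from a shell on a node; a sketch, and the exact mount options and error message can differ slightly between Flatcar and Fedora CoreOS:

```
$ findmnt -no OPTIONS /usr    # /usr is mounted read-only (output abbreviated)
ro,...
$ sudo touch /usr/bin/example
touch: cannot touch '/usr/bin/example': Read-only file system
```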
Now the question is: how can I provision the system? How can I provision something immutable? With container OSes, we're using a pretty cool piece of software called Ignition. You've certainly heard about Ansible for provisioning systems, and you've heard about cloud-init; Ignition is another one. What are its main features? First, it runs from the initramfs, really early during the boot of the system. This is quite handy if you want to do some low-level operations, for example partitioning the disk, or injecting or removing kernel command line parameters: you can do this easily with Ignition from the initramfs. Second, unlike cloud-init, Ignition runs only at the first boot. Once your instance has been successfully provisioned, Ignition won't run anymore. With cloud-init, you need to add conditions: is this the first boot, yes or no, do something, or do something else if it's not the first boot. With Ignition you don't have to think about this: it applies only once, at the first boot. And if there is an issue during the provisioning of the instance, for example if you want to write a file or fetch a file from a remote location and you can't get access to it for whatever reason, the provisioning will just fail and your instance will drop into an emergency shell. This is quite counterintuitive at the beginning, but once you get used to it, it's pretty convenient, because you know that once your instance is up and running, the provisioning has been done perfectly.

Here is an example of an Ignition configuration. It's a JSON file, and it's declarative: you define which files you want on the system, which public key you want to add for which user, and you define some systemd units and whether to enable them. So this is the configuration of a system, and you can see that it's written in JSON format. We all know that we don't like writing JSON files directly, because it's pretty painful with the indentation and so on, so we use a handy tool called Butane. Butane is a tool used alongside Ignition to generate Ignition configurations. Butane has variants available for Flatcar, for Fedora CoreOS, for OpenShift, and other flavors like that, and the idea is to bring an additional layer on top of the Ignition configuration, so that you can easily catch any issue in your configuration, for example if you use a key that does not exist in the Ignition specification. You also get some pretty nice sugar on top of this, if you want to auto-generate parts of the Ignition configuration based on values in the Butane configuration. And finally, it's YAML, so it's easier to write, and we're at KubeCon, so everything is YAML. That's it for Butane, and now we can see how we go from Butane to Ignition, just using the CLI. The Butane configuration gets transpiled into the JSON configuration file, and it's not only converting YAML to JSON: as I said, a few other elements are handled along the way. For example, the content of my /etc/chrony.conf file has been converted to URL-encoded format, which makes things easier if you have characters to escape.
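A minimal sketch of what such a Butane file can look like; the SSH key, the NTP pool, and the file contents are placeholders, and the full schema is in the Butane specification:

```yaml
variant: fcos
version: 1.5.0
passwd:
  users:
    - name: core
      ssh_authorized_keys:
        - ssh-ed25519 AAAA...placeholder... user@example
storage:
  files:
    - path: /etc/chrony.conf
      mode: 0644
      overwrite: true
      contents:
        inline: |
          pool pool.example.org iburst
```

Transpiling it with the Butane CLI then produces the Ignition JSON:

```
$ butane --pretty --strict example.bu > example.ign
```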
OK, so now I have my Ignition configuration: what am I supposed to do with it? I want my container OS to be able to consume this configuration. With Ignition, we support a number of well-known cloud providers, for example Azure, AWS, OpenStack, recently Scaleway, but also VMware. Once you have your Ignition configuration, you just upload it to the cloud provider through the instance metadata service (IMDS), the same way as a cloud-init configuration: you take the Ignition configuration and put it on the IMDS of your favorite cloud provider. If your favorite cloud provider doesn't have this kind of service, as with VMware for example, you can use the guestinfo interface; and if you run on bare metal, that's fine too: you can specify a remote URL and tell Ignition to fetch the configuration from there (there's a short sketch of both paths at the end of this part). Then the instance boots and fetches the Ignition configuration based on some well-known values: your instance knows whether it runs on bare metal or on this or that cloud provider, and consequently knows how to fetch its Ignition configuration. That is how you provision a container OS using Ignition and Butane.

This mechanism has been leveraged by many projects. For example, Cluster API uses it: with Cluster API you can of course use cloud-init, but there is a flag you can enable when you init your project to say "I want to use Ignition provisioning", which means that with Cluster API you can, out of the box, use Flatcar or Fedora CoreOS, or openSUSE MicroOS which also uses Ignition, to deploy Kubernetes clusters. You can also use Typhoon, a collection of Terraform modules that lets you deploy Flatcar or Fedora CoreOS, once again on many cloud providers; it's just Terraform modules that use Butane and Ignition configurations. And by the way, there is a Terraform provider for Butane, so if you want to do infrastructure as code as usual with Terraform, you can use the Butane provider, which is quite convenient. And of course there are OKD and OpenShift, which use this same kind of mechanism to provision their nodes.

A final word on provisioning: as I said, Ignition runs only once, at the first boot, and that's it. So if you want to change something in your Ignition configuration, what we recommend is to scrap your node and start a fresh new one. Do not try to modify the node that is currently running: just delete it and create a new one with the new Ignition configuration.
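To make that provisioning step concrete, here is a sketch of the two common paths for handing the Ignition file to a machine; the AMI ID, device, and URL are placeholders, and the exact flags depend on your provider and installer version:

```
# Cloud: most providers accept the Ignition JSON as instance user data,
# e.g. on AWS:
aws ec2 run-instances \
    --image-id ami-0123456789abcdef0 \
    --instance-type m5.large \
    --user-data file://example.ign

# Bare metal: embed a remote config URL at install time; Ignition
# fetches it on first boot (Fedora CoreOS installer shown here):
sudo coreos-installer install /dev/sda \
    --ignition-url https://example.com/example.ign
```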
That's it about configuration; now, automatic updates. This is a major topic, because when we say "automatic updates" for a container-focused OS, the first questions that come to mind are usually: isn't making updates automatic risky? What if the update happens on Sunday morning, when everybody's away from work enjoying their weekend, or at 6 a.m.? What if there's a bug in the new version that breaks everything on my cluster? How do I control automatic updates? That's kind of the big question around all this.

The idea is that, first, automatic updates need to be reliable: they need to be consistent, always working. The main principle behind that is that on these systems, updates are atomic, so an update is either applied or not applied. You don't get a half-updated system: the entire system is updated as an image, so you either get the new version or you stay on the previous version. That makes updates reliable: you know that when you update a system, you're going to get the new one or stay on the current one, and that creates a lot of interesting properties. On Flatcar Container Linux, this is implemented using the A/B partition model, a little bit like ChromeOS or Android: you've got two partitions, you update the partition you're not currently using, and then you reboot into the new version. On Fedora CoreOS, it's a similar idea, except that instead of using partitions we have a single filesystem and we use OSTree to deduplicate the files of each version, to save a little bit of disk space.

The main benefit of this mechanism is that it makes updates less risky. You always have at least two versions of the system installed at the same time on a node, so when you update and you hit an issue, say a bug in runc (hopefully that doesn't happen often, but sometimes it does), or a bug in another component of the stack, you still have the previous version: you can immediately roll back, get your Kubernetes cluster and your system running straight away, and then take the time to debug the issue. You don't have to debug things in production right away, because you know that rolling back just costs a reboot. This also limits the risk when testing fixes: if there's a new, upcoming version of the system, potentially adding a feature that you like or fixing a bug that you really want fixed, you can give it a try, knowing that if something breaks really badly, you can always go back to the previous version.
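On Fedora CoreOS, for example, that rollback is a single command; a sketch, and on Flatcar the equivalent is booting back into the previous root partition as described in its documentation:

```
$ rpm-ostree status                   # shows the booted and the previous deployment
$ sudo rpm-ostree rollback --reboot   # switch back and reboot into the previous version
```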
So how do we make all of that safe across the stack? The idea is that we create update channels, so that nodes can get updates at a different pace. There are always going to be updates: we're not done updating systems, we're not done fixing bugs (if you know how to fix everything, call me), there's always going to be a next update, and we need to prepare for that. So we create channels that let you prepare and see things coming in advance: we have the Alpha channel (or Next, depending on the project), the Beta (or Testing) channel, and the Stable channel, and changes flow through those channels over time.

All the new changes and new features get into the Alpha channel first: that's where you get new bugs, new bug fixes, and new features, and you get them pretty early. Once they've been baked in, once we're confident that the new feature is good enough and the bug fix actually fixes the issue, we promote the content to the Beta channel. So a few weeks later, if you are on the Beta channel, you get the content from the Alpha channel that has been tested beforehand. Then we do that again: we make sure that everything works well, that nobody finds any major problems, and if a few weeks later we still have nothing major to report from the Beta channel, we promote the content again, into the Stable channel, so that everybody gets it.

What this gives you is time: time to see things coming. You can run a little bit of Alpha, some Beta, and some Stable on a cluster, and watch changes progress through the systems on your cluster. The ideal cluster we could describe would run a little bit of Alpha, say 1% of your nodes, to make sure you see problems as early as possible; if that one node goes bad, it won't break everything on your cluster, so it's OK to be a little risky there. Then you'd run about 5% of the nodes in your cluster on the Beta channel. The Beta channel should already be pretty safe, because its content has been running for a while in Alpha, so it's much safer to run a larger set of nodes there, but you still don't want to take risks, hence about 5%. And finally, you run the rest of the cluster on the Stable channel, the one that shouldn't have any major breakage at all, because its content has been proven by going through the two previous channels. This is the main way automatic updates are made safe: it gives everyone time to react to changes, and to make sure issues get spotted and fixed, ideally before they even land in the Beta channel.

So what happens during updates? All right, we've got safe, reliable updates, and you've got time to react to them, but what about when they actually happen? When a new version is pushed to the servers, all the nodes query the server regularly, start pulling the update, prepare it, and eventually reboot, because that is how we apply updates: we reboot the system. If you have a cluster, and we release a new update, and all your nodes start updating and rebooting at the same time, that's probably not going to go well: not even Kubernetes can withstand half or three quarters of the fleet going down for a reboot at the same time. So the idea is that we use reboot coordinators. These can be daemons, airlock or kured for example, that run on the nodes and coordinate updates, or they can be Kubernetes operators that make sure that not all the nodes in your cluster reboot at the same time. Usually you only want one or two nodes, depending on how big your cluster is, to reboot; you wait for them to come back, make sure they are on the right new version, and then start again, rebooting all the nodes progressively. This lets you reboot and update your cluster progressively. On the operator side, the Flatcar Linux Update Operator is one example, and there is also the Machine Config Operator, which does this as part of the OKD and OpenShift projects. But you don't strictly need Kubernetes for this: you've got locksmith and Zincati, daemons on each project, to do this kind of coordination, so whether or not you have a Kubernetes cluster, you can coordinate reboots. A major feature of those daemons is that they also support what we call reboot windows, so you can say, for example: I never want reboots to happen on weekends, because on weekends I don't want to have to call folks in; I want reboots on my cluster to happen only during work hours, during the day, so that everybody is in front of their computer and can react if there's an issue.
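On Fedora CoreOS, for instance, Zincati's periodic strategy expresses exactly that kind of policy; a sketch, with times in UTC, and Flatcar's locksmith offers equivalent reboot-window settings in its update configuration:

```toml
# /etc/zincati/config.d/55-updates-strategy.toml
[updates]
strategy = "periodic"

# Only reboot for updates Monday to Friday, 09:00-17:00 UTC
[[updates.periodic.window]]
days = [ "Mon", "Tue", "Wed", "Thu", "Fri" ]
start_time = "09:00"
length_minutes = 480
```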
All right, so we said "container-focused OS": what does it mean to be container-focused? Everything inside a container OS runs as a container, as we said, and the idea is to keep this philosophy in mind when we consider new packages for the operating system. The more packages you ship, the more vulnerabilities you ship, so we try to keep only the bare minimum needed to run containers in the operating system. If you ask to get Firefox or whatever into Flatcar or Fedora CoreOS, there's a big chance it won't get accepted; most of the time, we try to run everything as containers. Out of the box, the Docker and containerd runtimes are available, and also Podman on Fedora CoreOS.

We've talked about immutability, about provisioning the system, and about automatic updates, but now: how can I extend the system? Indeed, there are some edge cases where running things as containers is not enough, for example if you want to try a new container runtime, or if you want to go back to an old container runtime because you didn't have time to upgrade your workloads. These are the kinds of specific use cases where you can't run things as containers, or where you need some specific kernel modules or similar things loaded into the system. So, given that you can't use containers here, there is no package manager, and /usr is read-only: what can I do? How can I work with the immutability features to extend the system?

On Flatcar, we decided to take the systemd-sysext approach. systemd-sysext is a fairly new feature from the systemd project; the idea is to mount an image as an overlay on /usr and /opt. An image can be a directory, a squashfs image, anything that's compatible with systemd-sysext. Here you can see how to install Podman on Flatcar: Podman is not shipped in Flatcar, but using systemd-sysext, you can use Podman. One nice thing to mention is that we now also ship containerd and Docker as systemd-sysext images in Flatcar, so using Butane provisioning, you can deactivate Docker, for example, if you just want to use containerd.

What about Fedora CoreOS? On Fedora CoreOS, unfortunately, right now there are some slight incompatibilities between rpm-ostree and systemd-sysext, so we're not using that; we've taken another approach, which we call CoreOS container layering. The main idea is that we put the content of the OS, the entire filesystem, all the binaries of the applications, all the content including the kernel and the initramfs, into a container image. It's just a transport mechanism: you get a container with everything in it. You can also run it, although it doesn't really make sense to run applications on top of it; it's really a transport mechanism. And since you get a container, you can use container-native tools to extend it and to change the content of the filesystem. Here's an example Containerfile where you start from the base image of the Fedora CoreOS stable release and then add an external repo; here I picked the Tailscale daemon from our friends at Tailscale. You then install the daemon with rpm-ostree install, enable it with systemctl enable, and finally do a small cleanup with a special command, ostree container commit, which exists just for these special containers. As it's a Containerfile, just like a Dockerfile, you can use any container-native tool to build a layered container image out of it: here I used podman build to create the layered image, pushed it to a registry, and finally rebased all the nodes onto this image. At the end, you get a system that has the Tailscale daemon, still with the same properties, image-based, reliable updates, except that now you've customized the bits that you really wanted to have on your system.
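Reconstructed from that description, the workflow looks roughly like this; the registry name and tag are placeholders, and the repo URL is the one Tailscale publishes for Fedora:

```dockerfile
# Containerfile: layer the Tailscale daemon on top of Fedora CoreOS
FROM quay.io/fedora/fedora-coreos:stable

RUN curl -fsSL https://pkgs.tailscale.com/stable/fedora/tailscale.repo \
        -o /etc/yum.repos.d/tailscale.repo \
    && rpm-ostree install tailscale \
    && systemctl enable tailscaled \
    && ostree container commit
```

Then build and push the layered image, and rebase the nodes onto it:

```
$ podman build -t registry.example.com/fcos-tailscale:stable .
$ podman push registry.example.com/fcos-tailscale:stable
$ sudo rpm-ostree rebase ostree-unverified-registry:registry.example.com/fcos-tailscale:stable
```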
The landscape of container-focused OSes is not just our two projects. There are other projects here that we haven't talked about, and you can take a look at them: they focus on roughly the same use case and have similar properties, with slight differences between projects. Some are available only on certain cloud providers, some are available everywhere, and some are more minimal than others, so there are many more options for you. But here we took the focus on our two community projects, Flatcar Container Linux and Fedora CoreOS, which share a lot of components, a lot of history, and a lot of legacy; that was the goal of all of this. All right, thank you very much for attending and listening to us, and here's the famous feedback form slide.