Hello, everyone, and welcome to our session on COSI here at KubeCon EU. We hope you're enjoying the conference so far. I'm Stephen Borelli. And I'm Andrew Rynhard. We're very excited to show COSI to a larger audience for the first time. COSI is the result of developing Talos for the last five years and distilling the lessons we've learned into a specification we would like to share with the world.

Let's start at the very beginning and define what COSI is. COSI is an acronym for the Common Operating System Interface, and what it concerns itself with is the configuration of the nodes that run in a Kubernetes cluster. Whenever you provision a new cluster, first you provision the nodes, whether they're bare metal or VMs, and then the next thing is that you actually have to configure those nodes: networking, disks, any processes you want to run, and installing the Kubernetes software itself. That's what COSI concerns itself with. It also gave us a chance to reimagine operating systems: what should an operating system do in the Kubernetes world? Another thing about COSI that's actually been really fun is that it's given us the chance to rethink exactly what it means to be an operating system in the age of containers and distributed schedulers.

Finally, let's talk about the why of COSI. One reason is that Kubernetes and its ecosystem, from Cluster API to the storage and networking interfaces, are deeply intertwined with the underlying Linux operating system in many ways. What we find is that Kubernetes and Linux don't only differ in the way they're implemented technologically; there's also a difference in philosophy, in the way the systems are composed. We're going to talk about this a little bit, the philosophy of Unix, and what we see is that this difference causes problems at the boundary where Kubernetes and Linux interact with one another.

Now that we've talked about the motivation for this project, let's talk about some of the things we'd like to have as desired features. The first, and one of our highest priorities, is an operating system that is completely API driven. What this means in practice is that we want an operating system that doesn't need to have a shell installed, and we would not like people SSH-ing into the box to configure things. Everything on the operating system should be configurable via an API. Another thing we're really looking to implement is a standardized model for configuration settings. Currently, in the Unix world, we have a different file format for every single application, and this causes a lot of issues when integrating with other tools. So one of our goals is to be similar to Kubernetes and have a standardized model for any kind of setting we want to express, whether it's a disk, DNS settings, or anything like that. Another thing that was very inspiring to us was the pattern of Kubernetes controllers and custom resource definitions. Custom resource definitions allow you to extend the platform with a flexible schema, so you can make your own objects look like Kubernetes objects. On the back end, the system is decoupled: you run controllers that drive toward the desired state, you can write them in any language you want, and they continually run and reconcile. This kind of running model was something that inspired us. And then finally, we wanted these systems to be able to easily propagate events to other applications running on the system. There are a lot of eventing systems on Linux, but we've seen in the past that they're not heavily used, because sometimes the interfaces are difficult to use. We're looking for something where, say, if someone pushes a power button somewhere, we can actually tell Kubernetes that it happened.
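To give a feel for what that kind of event propagation could look like, here is a minimal sketch of a node event serialized as YAML; the type and field names are illustrative assumptions for this write-up, not definitions from the COSI specification:

```yaml
# Hypothetical COSI event emitted when the node's power button is pressed.
# A consumer (for example, a controller that cordons and drains the node
# in Kubernetes) could subscribe to events of this type over the API.
metadata:
  type: acpi.ButtonEvent        # illustrative type name
  id: power-button
  version: 1
payload:
  action: power-press
  occurredAt: "2021-04-01T12:00:00Z"
```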
So let's get to the agenda for the rest of the talk. First, we're going to venture into the past and talk about a little bit of Unix history, which we hope will help you understand where Unix comes from. Then we're going to come to the present and talk about modern container operating systems, and then we'll take a deeper dive into COSI. After that, Andrew is going to show you a COSI demo of some of the cool things we've implemented so far in the reference implementation, and then we'll have a Q&A.

So first, let's go back in time and learn about the history of Unix. To assist us in our history lesson today, we'll be sourcing a video from 1982 by Bell Labs called "The UNIX System: Making Computers More Productive," and I do appreciate the fancy graphics. The first thing you need to know is that if you're running Kubernetes today, the odds are overwhelming that you are running it on an operating system called Linux, which is a descendant, in a very complex family tree, of this earlier system called Unix. Unix itself is very opinionated. It's a very simple operating system, because it descended from another system being developed in the late '60s called Multics, and Multics, due to its complexity, was having a very slow development rollout. So the folks who were developing Unix at Bell Labs wanted simplicity as a core property of the operating system. Another thing that was very important to the early inventors of Unix, like Ken Thompson and Dennis Ritchie, is that they were programmers, and they wanted an environment they could enjoy working in communally. From the very beginning, this concept of a common work area where you can log in and program interactively was very important to them. These ideas, simplicity and communal work, influence Unix to this day.

Another core concept of Unix is the idea that it's a multi-user time-sharing system. What this meant was that Unix was designed so that many people could log into it at the same time and perform work. The way folks logged into a Unix system was via a serial connection: they would set up a terminal, hook it up to the system, and as soon as they did, they would get a program called a shell that they could issue interactive commands to. So there are two concepts here in Unix that we may not want in the Kubernetes world. The first is that we do not want users directly logging into our systems, and we don't want them sharing resources; the ideal Kubernetes workload is highly isolated from every other workload. Another issue is that these shells still exist today, in the form of programs like secure shell, and we still predominantly manage Unix systems by issuing commands and reading the output. Another key concept of Unix is that files in the system are really unstructured.
The canonical way to work in Unix is to take this unstructured text and pass it through a series of filters, all these small single-purpose utilities, to get the result you want. While this allows an enormous amount of flexibility, its legacy is that Unix ended up without many standards for configuration files, and this made building tooling around Unix very complex. The last thing I want you to know about Unix is fragmentation. When AT&T developed Unix, since they were a monopoly, they were forced by the government to license Unix to other companies and academic institutions, and because of this we ended up with many slightly incompatible Unix variants. That was the condition for the first couple of decades of the life of Unix, until in 1991 an open-source operating system came out called Linux that was mostly Unix compatible. Over the next couple of decades Linux ended up eliminating the market share of almost every commercial Unix and becoming dominant. Of course, Linux itself saw multiple distributions emerge, each slightly incompatible with the others.

In 2013, the Linux community was transformed by the emergence of Docker. Docker is built upon cgroups and namespaces in the Linux kernel, and it made it easier for developers to package, share, and deploy applications. Using Docker means you don't have to have a system administrator install your application on the server itself; they can just run your container. With the growing popularity of containers came the emergence of cluster schedulers, including our favorite one, which allowed organizations to run containerized workloads across a pool of servers. This combination of containers and schedulers caused many developers to start rethinking the role of the traditional operating system, and that led to the emergence of the container operating system. Container operating systems are focused on stripping functionality out of full-featured Linux distributions so that the only remaining software installed is what's required to run containerized workloads and the scheduler. This was a significant advance, but container operating systems are starting to present their own challenges.

So let's talk about modern container operating systems. As part of the research into COSI, we surveyed the documentation and code for a number of popular container operating systems, including Flatcar, Bottlerocket, k3OS, and Talos. We were really impressed with all of them, but there are several things we noticed that ended up influencing the design of COSI. First, these systems have very little in common in configuration, from the initial node user data that you send on boot to how OS settings are defined and validated. Another thing we noticed is that there was no common API. Some container operating systems use SSH, others have HTTP or gRPC APIs, and others are designed not to be managed via an API at all: you just boot them, and if you need to make changes, you boot them again with different user data. Another difference between all of these container operating systems is core technologies. Among the four surveyed operating systems there are three different init systems, and while two of them use systemd, the way they manage it is completely different. Finally, even though these operating systems share many of the same goals, we found that there wasn't a lot of shared technology in their underlying implementations.
So we're seeing the same fragmentation. We're running the risk of repeating Unix history and having multiple incompatible operating systems. But here's the thing: what we've seen with Kubernetes is that we can define common standards that still allow innovation.

All right, so let's talk about COSI. First things first, a disclaimer: we're recording this talk about a month before KubeCon, and COSI is in active development, so what you see at the time of this talk may be very different from what we're presenting here. The first thing we want you to know about COSI is that it includes a set of modeled configurations. What this means is that basically everything that can be configured on a container operating system has a model defined for it. The examples we're showing here are in YAML, but COSI doesn't care; it could be TOML or JSON. As long as it can be serialized into the COSI protobuf message, you can use it. The next thing that COSI defines is a set of common RPC definitions. If you're used to Kubernetes, these are things like get, set, list, watch, and others. COSI also defines plugin standards, and these are inspired by Kubernetes projects such as the CNI and the CSI. Finally, as part of the COSI project, there's a reference implementation written in Rust, which is inspired by an internal rewrite of the Talos OS engine.

Models in COSI are protobuf messages, and these messages map very closely to Linux and other Unix settings. The next piece of COSI is a series of RPCs, and these are inspired by and very similar to the Kubernetes RPCs. You can manipulate resources by listing them, getting them, creating them, and watching them, and COSI is a declarative system in the sense that you set the desired state for a resource and then the COSI plugins work to get the system into that state. For communication, COSI uses bi-directional gRPC. As part of the research into COSI we investigated other inter-process communication systems for Linux, but for simplicity of implementation we are starting with gRPC.

Another important thing to understand about COSI is that plugins are meant to be as close to the underlying operating system as possible. COSI is not a configuration management system; we don't want to wrap other commands, execute shell commands, and parse the output. The expectation is that a COSI plugin will be as low level as possible, either talking to the kernel or to low-level APIs. Finally, the things that do the work and actually change the system are the plugins. Plugins are inspired by both Unix binaries and Kubernetes controllers: every plugin is a separate executable, like a Unix utility, but it runs continuously, like a controller. Within a plugin you can manage multiple resources. When a plugin comes up, it registers with the engine which types of resources it is responsible for managing, and internally a plugin can be like a controller manager, running multiple controllers that manage different resources. The COSI engine can be responsible for the life cycle of plugins: when COSI starts up, it can start all the plugins, and if a plugin dies, the engine can restart it. Another thing to know about plugins is that instead of standard input and standard output, they take their inputs and outputs over the gRPC channel and act on those.
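To make the idea of a modeled, declarative configuration a bit more concrete, here is a rough sketch of what one of these resources might look like serialized as YAML. The resource type and field names are our illustrative assumptions, not taken from the COSI specification:

```yaml
# Hypothetical COSI resource describing the desired DNS resolver settings
# for a node. A plugin that registered for this resource type with the
# engine would reconcile the node's actual resolver state toward it.
metadata:
  type: network.ResolverConfig   # illustrative type name
  id: default
  version: 1
spec:
  nameservers:
    - 1.1.1.1
    - 8.8.8.8
  searchDomains:
    - cluster.local
```

A client would submit something like this through the create or set RPC and could then watch the corresponding status resource to see when the plugin has brought the node into that state.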
A special class of plugins generates events; these are called generators. The way generators are implemented in COSI is that they use the Linux kernel's eBPF infrastructure. A generator deploys a probe into the kernel, and this eBPF probe generates kernel events. The generator is then responsible for taking these kernel events and converting them into COSI events, and these COSI events can be used by other parts of the system. Next up are the engine and the runtime, and for this I'm going to hand it over to Andrew.

The core of COSI is two parts: the engine and the runtime. The engine's responsibilities are process management and runtime and plugin orchestration. It can run as PID 1 or as a container on existing systems; the idea is that we can write a bridge between systemd and COSI. We also aim to simplify what PID 1 should be and have plans to write an init system into the engine. The runtime is responsible for state management and for routing resource events to the appropriate plugins. It implements the basic CRUD operations you would expect from an API, with the addition of orchestrating the reconciliation of requests across all plugins.

Before I jump into the demo, I want to introduce some of the core concepts of COSI. One of the things we want to stress is that COSI is not meant to replace the Unix philosophy, but rather to evolve it. We have here the characteristics of Unix on the left and what we aim to evolve them into on the right. With the advent of VMs and containers, we have moved from shared to isolated workloads. This is an assumption the originators of Unix didn't have in mind, but one we should have as we work out the details of COSI. Plugins become the new Unix utility, but written with the concept of a controller in mind, not human interaction. One of the lessons we have learned from Kubernetes is the power of the controller pattern, so in COSI we swap human interaction for controllers. And if we aim to remove direct human interaction, we need an API instead of a shell. One of the pillars of Unix is the idea that text streams are the universal interface. One thing that pillar is missing is structure. With COSI, we amend the third pillar of Unix, as summarized by Peter Salus, to read: write programs to handle structured text streams, because that is the universal interface.

Let's now take a look at a simple demo to illustrate these concepts. We will see a model representing a mount point, use gRPC to create and delete the mount point, and then see an example of a generator. From the engine's output, we can see that it has started the runtime, the generators, and the plugins. The standard out of each process is piped to the standard out of the engine. Here is an example of structured input representing a desired mount point. You can imagine this as a replacement for an entry in fstab, and a list of these then representing /etc/fstab itself.
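As a rough sketch of what that structured input might look like (again, the field names here are illustrative assumptions rather than the specification's):

```yaml
# Hypothetical desired-state resource for a single mount point,
# roughly equivalent to one line in /etc/fstab.
metadata:
  type: block.MountRequest       # illustrative type name
  id: data
  version: 1
spec:
  source: /dev/sdb1              # device to mount
  target: /var/mnt/data          # where to mount it
  fsType: xfs
  options:
    - noatime
```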
Let's go ahead and create this. You can see in the engine's standard out that the mount plugin has acknowledged the request. At this point, the mount plugin's controller is responsible for reconciling the request and outputting a resource that represents the mount point's status. You can imagine that in another plugin I may have a controller that depends on this mount point; that controller can set up a watch, asking to be notified when the mount status resource is created, and can then go about doing whatever it needs. Let's go ahead and delete the mount point.

Now I want to move on to the concept of a generator, which I think is extremely powerful, and before I show an example, I want to set the stage a bit. One of the things we have done in Talos is that we have strived never to depend on any of the pseudo-filesystems in Linux and never to shell out to a utility. If we can query the kernel via an ioctl or the netlink API, we favor that over, say, parsing a file under /proc. The problem is that sometimes that just isn't an option, and we have to fall back to parsing unstructured text. Furthermore, getting updates comes in the form of polling. This isn't what we want in COSI. So what if we could actually tap into the moment the kernel updates its own state? This is the perfect use case for eBPF. Let's see what that looks like.

You can see that the disk generator picks up a kernel event when I plug in a USB stick, and now an SD card. This becomes a powerful way to reflect kernel state in real time and with events, solving the problems I mentioned. And to drive the point home, here is an example of an ACPI event when I unplug and plug in my AC power supply. Where this could be useful is in being able to subscribe to an ACPI power-off event and performing a graceful etcd leadership election first, or even becoming aware of ACPI events on other control plane nodes and refusing to shut down if it means etcd quorum would be lost.

That is all the time I have for the demo, and I hope by now you can see the potential we have in COSI. COSI is completely open source and released under the Apache license. The specification and code can be found at github.com/cosi-spec, and today we're also announcing cosi.dev, a website for all things COSI related. We'd love for you to join us. Thank you for attending.