All right, let's get started. Hello and welcome to the final day of KubeCon EU 2024, in the afternoon. We're amazed that so many of you made it here. This is the Special Purpose Operating System working group panel of TAG Runtime. And without further ado, I'll have the participants introduce themselves. Please go ahead, Sean.

Hi, I'm Sean McGinnis. I'm a former maintainer of the Bottlerocket project, currently working at Lambda Labs, but still a special purpose operating system enthusiast.

My name is Justin Haynes. I'm at Google right now, working on the Container-Optimized OS that underpins GKE. Before that, I worked on Bottlerocket.

Hi, everyone. My name is Mauro Morales. I work for Spectro Cloud, and I'm currently a maintainer at Kairos.

Hi, everybody. My name is Felipe Huici. I'm one of the creators and maintainers of Unikraft, a unikernel project, and more recently CEO and co-founder of a company that's building a new cloud platform based on unikernels.

Hey, gentlemen, and maybe two ladies I spotted in the crowd. My name is Danielle Tal. I work at Microsoft as PM for Flatcar, and I'm also co-chairing TAG Runtime. And maybe our lovely host here would want to introduce himself to you.

Hi, I'm Thilo. I also work for Microsoft. Today I'm trying to have no project affiliations, because otherwise that would make two of us, and that wouldn't work out.

All right, the way we want this panel to work is with your participation. So if you have any remarks or questions, or just want to stop us babbling and bring something in yourself, please raise your hand. We have an assistant in the back who'll find you with his mic, and then you'll be part of this discussion. So in this spirit, let's get things started: what, for you, would be a special purpose operating system? Did you come here with any idea? Anyone? There is an idea.
Of course immutable, and of course usable not only as a Kubernetes node but also for special purposes. Think of edge deployments without the need for a full Kubernetes cluster, but with a container runtime like Podman or something like that, always with the purpose of running containers, of course. Everything is a container, everything is possible.

That's a good opinion. We may have a surprise for you later. Any other opinions?

Maybe something running at the edge, or something running with low resources.

Good input. It's great to get your expectations. One more?

The first thing that comes to mind when talking about special purpose operating systems, for me at least, is CoreOS and Flatcar Container Linux, and also Talos Linux. So always specialized for the purpose of running containers, but I guess there are also others I don't know about.

Maybe one more?

I mean, if we're talking about special purpose operating systems, the purpose can be anything, right? It doesn't have to be anything to do with containers. It's just an operating system that's got one use case.

Great. When we prepared this session, we actually had a little bit of trouble capturing all of the input and all of the representation that we have in this group. It's a wide variety, as you'll see on the next slide. But for the time being: for us, special purpose operating systems, in the context of this working group, are free and open-source software projects that are designed to run very well-defined, specialized workloads with minimal boilerplate dependencies, often suited for niche use cases. So basically what you said, but with more words, thank you. Now, regarding the representation here, and some of our folks from the working group couldn't even make it, this is the very opinionated and probably slightly wrong landscape that we've built.
I'll hand over to Felipe to start on the very left, because I think Unikraft is very unique in what you're doing regarding specialization.

Yeah, hi everybody. How many people know what a unikernel is? Okay, so a unikernel is an extremely specialized virtual machine, and often it's based on a modular operating system, so you can pick the bits and pieces that the application needs to run. If you want to run a Redis or an Nginx, you build different images for each application, then you deploy each as a virtual machine, and you get very fast boot times, less memory consumption, and so forth. Unikraft is one project that implements unikernels.

Before joining the working group, I didn't really know about this field in particular, and I was very surprised by how minimal these systems can get. I'm going to try and channel Talos here. In our working group, we had representatives of all the operating systems introduce their projects, and you'll find the intro to Talos, by someone way more knowledgeable than I am, on our YouTube channel. Talos is a very minimal but still Linux-based operating system, so it's not as lean as Unikraft, which is just a few kilobytes or megabytes of binaries plus the application. Talos has a very opinionated user space. It's super small and it's entirely API-driven. Talos is made and tailored for operating Kubernetes workloads: it integrates individual nodes into Kubernetes and gives full control of the node to Kubernetes operators, through an API that the minimal user space provides. Talos is awesome.

What about Kairos?

Sure. I would say the way we try to distinguish the project is by saying that we try to run on the edge, and do it by providing solutions that make day-two operations as easy as possible.
Having said that, you can pretty much configure Kairos to run whatever you want, and it still provides enough tools to have anything you need, whether it's in a data center, at the edge, or wherever. So for me it's a little bit hard to put Kairos somewhere, but yeah, the edge is mainly what we're trying to tackle.

Awesome. Let's follow up with Bottlerocket. I'm not sure which of you two wants to present it.

We can fight over it. Yeah, so Bottlerocket is, I think, more along the lines of what the folks over here were talking about with a specialized operating system: it really is focused on running containers. But even within that, there are multiple ways to run containerized workloads. In Bottlerocket's case, there is a set of variants: an ECS variant, a VMware variant, and a Kubernetes variant. Rather than including the capability for everyone to run on, say, VMware, that's not needed if you're not running on VMware. That's the reason it focuses. It still tries to be flexible, but specialized: it will focus on, okay, I'm just running in the cloud within a Kubernetes cluster, so just include what's needed to run containers and the kubelet to be part of a Kubernetes cluster.

I don't have anything to add about Bottlerocket, that was well said. So Google's COS has a similar story to Bottlerocket, but it was largely built to underpin GKE. It's since grown, though; a lot of the managed services within GCP run on COS. So it's a little less specialized, but it's still really aimed at running containers. We have both Docker and containerd, because some customers want Docker and some want containerd, and it has a somewhat more full-featured user space, because we had internal customers who really needed all of that when it was first being built.
For Flatcar, I guess, we provide a lot of flexibility, because you can run it on-prem, on bare metal, on any cloud provider, or with Cluster API (CAPI). So that would be one thing to mention. And one thing we've been working on a lot lately is systemd sysext, system extensions: if you have use cases that aren't necessarily containerized, you can extend the system with them. One example would be simplifying distribution with CAPI; you have less of a complex matrix of images to create, you just add, on top of the cloud provider's base image, the flavor of Kubernetes or whatever you want. So I guess it's specialized, but kind of general.

Thanks, folks. Remember, if there's anything you want to say, just raise your arm, because you're part of this just as much as everybody on the stage. All right. So this is pretty much the landscape that we have, and it's both what unifies us and what distinguishes us. All of these projects have very specific focuses, and each wrestles with its own problems. We prepared a few questions to get the folks talking, but as I said, if you have any, just raise your hand. He'll be there for you.

Thanks so much for the introduction. I actually do have a question. We already talked about how there are a lot of operating systems specialized for running container workloads or Kubernetes workloads. And this year at KubeCon we talked a lot about Wasm and how it will integrate with Kubernetes. Do you see a specialized operating system coming that's focused on running Wasm workloads? Or is it something that will be integrated into the operating systems already mentioned?

So actually, like I mentioned, systemd sysext extensions: that's a use case we have for that. There's a bakery of images; maybe we should share the links later or something.
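To make the sysext mechanism just mentioned a bit more concrete, as an editor's sketch: a system extension is a read-only image placed (or symlinked) under /etc/extensions, which systemd-sysext merges into /usr at boot. A hypothetical Butane fragment for Flatcar might provision a Wasm-runtime extension like this; the URL and image name here are made up for illustration, not the actual bakery layout:

```yaml
# Hypothetical Butane config: fetch a Wasm-runtime sysext image at
# provisioning time so systemd-sysext merges it into /usr on boot.
variant: flatcar
version: 1.0.0
storage:
  files:
    - path: /etc/extensions/wasmtime.raw
      contents:
        source: https://example.com/sysext-bakery/wasmtime.raw
```

After boot, running `systemd-sysext status` on the node should show whether the extension was merged.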
Anyway, you extend the usability of Flatcar by just creating a symlink in the configuration, during boot time, that links to an image you want to use. In that case it could cover whatever specific Wasm use case you have. We already have a fairly extensive library of Wasm images for that.

I think, in the more general sense, the notion of Wasm, or Firecracker, or gVisor, or any of these other things that give maybe more isolation than containers but not a full-weight VM, is a really interesting space. The tooling around all of this is still growing and being built. So at least to me it's hard to imagine it being a whole new ecosystem; we've been here for three days hearing about this massive ecosystem, and there's so much integration. Firecracker's containerd integration, and a lot of the work that has been going on in Wasm land, really hooks nicely into these container notions. The Unikraft folks use a Dockerfile as the way you wind up building something that looks nothing at all like a container. So I don't know that it would be a whole new paradigm, but it might be, say, a Bottlerocket variant that doesn't have runc and containerd but instead is dedicated to spinning up Wasm VMs.

I'll say we didn't really talk much about the background: we're actually part of the Special Purpose Operating System working group that's under TAG Runtime. When we were forming this group, that was one of the questions that came up while writing the charter: are we talking about containers, or are we talking about Wasm, or other things like that? We actually did change some of our original wording, because since we're under the CNCF, everything is within the context of cloud native. And at least right now, containers are mostly synonymous with cloud native, but that doesn't mean we can't evolve and we can't change.
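For context on the Dockerfile-based unikernel build mentioned a moment ago: Unikraft's kraft tool pairs a Dockerfile, which describes the application's binaries and filesystem, with a Kraftfile describing the unikernel target. The sketch below is illustrative only; the exact schema and field names may differ across kraft versions:

```yaml
# Illustrative Kraftfile (field names approximate, values are placeholders).
spec: v0.6
name: hello
rootfs: ./Dockerfile   # application filesystem is built from a Dockerfile
runtime: base:latest   # prebuilt Unikraft runtime paired with that rootfs
targets:
  - qemu/x86_64        # platform/architecture to build the micro-VM for
```

Something like `kraft run .` would then boot the result as a tiny virtual machine, which is how a Dockerfile ends up producing something that looks nothing like a container.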
So I just want to say: if anyone out there is working on a specialized operating system that is really focused on Wasm, or really anything else in this space, let us know.

Just one minor note: we have targets to build things like Spin or WASI into Unikraft. And when you do, what you end up with is an operating system dedicated to Wasm that you can then deploy on the cloud.

Hello. With rising demand for special purpose hardware, like various accelerators and TPUs and whatnot, how are special purpose operating systems adapting to them? Does something need to change to support this better?

This is a constant challenge, right? At least in my head, when I think about these special purpose operating systems, they have just what they need and nothing extra. So if you're going to run on three different generations of GPUs, now all of a sudden, in your read-only operating system, do you have three GPU drivers that you link at runtime, with two of them just sitting there? How do you tackle that? I think it's a really good question, and I don't know that we've fully answered it. With COS, we host the GPU drivers in a bucket, and we have a nice little installer that either our first-party or third-party customers can run to pull those down at runtime or at customization time. They get to make that choice, and we got to punt on it. But no, I think it's really challenging, both for GPU drivers and for the folks who are trying to run on a broad suite of hardware. It gets really hard.

Yeah, with Kairos we're also trying to tackle some Arm devices, and part of the challenge has been that it's not like x86, where you simply use some BIOS; every board has its own way of letting you load the operating system. I would say it will really depend on the requirements and popularity of these devices.
Right now, for example, we're experimenting a bit with NVIDIA Jetsons, because so many people want to run LLMs at the edge. But it's a lot of work. It's complicated. It makes your ISOs or images very big, and a lot of these devices don't actually have the memory to run all of that. So I think we're going to start seeing more solutions, and hopefully also have the makers of these devices talk more with the creators of the different operating systems, to allow, for example, extra space to run these images. And I don't know if I mentioned system extensions before, but we're working on that as well.

It's also a good question for us, because we are a Linux-API-compatible operating system, but we're not Linux, so drivers could potentially be a problem. We often target the cloud, because then it's virtual drivers, just fewer drivers, and we do support those. There are some people on the OSS side who use Unikraft in automotive applications, where safety-critical systems need very little code, maybe not even a scheduler, and certification cost is linear in the number of lines of code. And there are even people working on a compatibility layer so we can steal drivers from FreeRTOS and others. So it's a good question.

So I'm actually new to the CNCF and to KubeCon; I found out about this session maybe two hours ago. I've been working on a specialized operating system under LF Edge called Project EVE, and maybe I can share some things from what we've observed over the years. It was actually triggered by a question over here about the scope of this. We actually started building something for running virtual machines six years ago: first with the Xen hypervisor, then with KVM and QEMU, and then containers and container runtimes came in, and now it's running with K3s.
One of the things we built is, I guess, closest to Kairos, I would think: A/B booting and so on, making the stuff robust and immutable, those types of things, and this notion that device management is a key function, from the perspective of the device being the computer. And once you have that in place, well, now, do you want a different runtime? You could download that one on demand. It might not have to be part of the image, right? But then you want it to be immutable as well. So what are the trade-offs in this space? I thought it might be useful to share those thoughts about whether we can potentially decouple this stuff.

Will we see you at the working group meetings? Because that would be awesome. We have a link at the end. Thanks for sharing.

Yeah, real quick: one of the things we hear a lot from customers is that there are two things they often want. One is some amount of customizability, the least amount, but just for their thing. Also, they want it to boot really fast and not have to do a whole lot when it first starts up. That becomes a really hard challenge, and trying to find the balance between those two things is, I think, a huge piece of why we see this size and spectrum of operating systems: some lean really hard towards doing one thing well at build time, while others let you take the thing and turn it into whatever you want at runtime.

With containerd 2.0 about to be released and the Sandbox API about to be the new default, do you think we'll see even more of an explosion in these kinds of things, with micro-VMs like Firecracker or Cloud Hypervisor and other things, specifically to cater to this container isolation use case?

Since I'm holding the mic, I'll talk first. I don't know if we'll see more of an explosion, but I think we'll continue to see the explosion that started before containerd's Sandbox API.
We had gVisor and Firecracker and all these other technologies trying to further isolate containers, like I mentioned earlier. So yes, I think we'll continue to see more and more, because we're building more, I'm going to keep using that word, niche tools that are really aimed at individual use cases, individual deployment techniques, and things like that. So the choice grows, but I think it means that an end user who's building their software suite, or, as Solomon talked about, building their factory, is able to pick things that are better for their factory. We used to have just a couple of robots, and now we have more and more customization in the robots we're able to pick.

Just a little clarification: Firecracker and QEMU are for virtual machines, not for containers. If you know Xen, there's xl and xm. You can then run container runtimes within those virtual machines; especially if you go to the cloud, you're always running a virtual machine underneath, whether you know it or not, and then the container runtime on top.

So it's probably clear that there is not going to be a single operating system that makes everyone happy. If that were possible, we would have just one generic Linux distro. So is there work ongoing to make it easier for users to choose, or to switch? One option is around initial configuration: we already have cloud-init, we already have Ignition, and I guess Talos is completely API-configured. There are other options too. So what can be done here to get a more generic, uniform configuration?

I think it's like that joke, right? You try to come up with a new standard, and now you just have yet another one. I don't know; every project will have its own ideology about the best way to tackle this. At Kairos, what we think is that it's very costly for you to change your existing know-how.
So if you're already using Ubuntu: Kairos, despite me not liking the word, is more like a framework, so it will run on top of Ubuntu. If you would rather use Fedora, it runs on top of Fedora. If you're paying licenses for Red Hat, it runs on top of Red Hat. So you don't have to retrain your people on a new operating system. In terms of configuration, it's the same thing: it comes with support for a subset of cloud-init, but if you want to run something else, or do it with Dockerfiles, that's still possible. I think there are a couple of approaches that have definitely gained significant adoption; in my opinion, a lot of people are very comfortable with Dockerfiles. But it will depend. I don't know what you guys think.

Yeah, we're a hybrid, in that we're not Linux. So when you go to deploy, there's no distro, right? When you go to build, we build based on Dockerfiles. We use the Dockerfile as a template to figure out which binaries you want to run and what your filesystem should be, how we should build the filesystem. And then we sort of automatically convert that for you into a specialized virtual machine underneath that just runs. So we're a little bit in the middle.

I guess we also have some history that Flatcar was built with, so our approach is based on Ignition. But we try to extend the usability: you can run it using CAPI and so on, and adjust it to whatever you want. But yes, there's also history that comes with some distributions.

Coming from the Bottlerocket perspective, it uses yet another one: it's not cloud-init, it's not Ignition. That, I think, is the hard part right now that we need to figure out: is there a way that we can have a common API that works with all of these?
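To illustrate the divergence being discussed, here is roughly the same provisioning step, adding an SSH key for a user, expressed in the two most common formats. Both fragments are minimal sketches with placeholder names and keys, not complete configs:

```yaml
#cloud-config
# cloud-init user-data (YAML), reprocessed by a Python runtime at boot.
users:
  - name: admin
    ssh_authorized_keys:
      - ssh-ed25519 AAAAC3... placeholder-key
```

```json
{
  "ignition": { "version": "3.4.0" },
  "passwd": {
    "users": [
      {
        "name": "admin",
        "sshAuthorizedKeys": ["ssh-ed25519 AAAAC3... placeholder-key"]
      }
    ]
  }
}
```

The design difference runs deeper than syntax: cloud-init re-renders configuration on each boot with a fairly heavy runtime, while Ignition applies its declarative JSON document exactly once, on first boot from the initramfs, which is much of why neither camp can simply adopt the other's format.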
But another aspect to think about, that I want to point out: ideally, hopefully, with these more minimal OSes you can really embrace that pets-versus-cattle idea. You may need to make a small adjustment in how you deploy initially, but then hopefully you're not really going to have to think about it much after that. You set up the way you deploy, and then you're done with the OS; you go on and do the things you need to do.

Both Bottlerocket and Talos are API-driven, with very different APIs, maybe unfortunately. At one point the founder of Talos was chatting with us about whether, for the operating systems that want to be API-driven, we should come up with a standard here. I don't know if that's gone anywhere since I left, but I think ideas like that are really good. A lot of the history comes from cloud-init, from when everybody was running big, full OSes in the cloud. That's just bash scripts on steroids, which is great for configurability, but for a lot of these more customized OSes it's way more functionality than you need. So figuring out where to draw the lines and make standards is still in the early days, for sure.

Justin, you set me up beautifully, because we did look at supporting cloud-init, and the thing is, on the cloud we boot in milliseconds, and then cloud-init comes in and does stuff for seconds. If we could keep our milliseconds but have the configurability, that would be nice. But it drives home the point that it's really hard to come up with a standard.

I'll just add that there's at least a little bit of hope: there are frameworks, as Danielle mentioned, like Cluster API, that abstract everything away from you and support both. And then there's collaboration between projects.
So instead of inventing a new standard, SUSE MicroOS went with Ignition, which had already been iterated on and improved, and there are multiple projects, at least on the Ignition side, that went back to the original Ignition source. Everybody's using upstream, and we're collaborating on extending it instead of coming up with our own spin. Those are baby steps, but there's light at the end of the tunnel.

So, maybe a related question. One difficult choice is which one you pick. Obviously, if you want to run on bare metal, that halves your choices, but it's still a bit difficult. And especially on bare metal you make a bit of a commitment, because you have to physically re-image if you want to switch between them. I don't know if it's at all possible, or if anyone can think of how it would work, but is there a way to even think about updating from one of these to another? If we could maybe have some standardization in the boot chain... I was looking at Talos, and I realized: oh wait, the UKI image is like 90 megabytes, and I only have 70 megabytes in my EFI system partition. So if I ever want to switch from what I'm running to that, I definitely have to repartition the EFI system partition, which is a bit scary. So maybe just a standard for the size of the EFI system partition, things like that, could be a step towards some flexibility for people deploying these systems.

I would love to see that, but even the sizes of the images you can install on some EFI devices aren't standard out there. So my guess is that it's going to be really hard to get all these hardware makers together to discuss that. Maybe at the operating system level, since it's software, we can come up with something. But I would say it's good that we have this kind of group to discuss these things a lot more, and hopefully we'll see something like that.
But right now, for example, with Kairos you can do upgrades between, say, Kairos Ubuntu and Kairos SUSE, but even those are not really supported; it just happens to work sometimes, you know? So, yeah.

I think one of the other interesting places to poke in that direction is this: a lot of the folks sitting up here build operating systems, but we also almost build operating system builders. And the further we can lean into that direction, the better. Containers weren't new; they were just a bunch of Linux primitives that got wrapped up in a really nice bow that made it super easy to bundle up your code and isolate the process. Maybe that's another direction: we just need to make it way easier for people to take well-maintained sets of software and turn them into operating systems, because then you don't have to do any migration, you get to choose from your menu.

All right, we're closing in on the end. I'm so happy to have all of you here, because I wasn't able to ask any of the prepared questions. It's amazing to have this engagement from you. Before I come to the last question, I think the slide I should display is our contact info. If you want to jump in on the working group and participate in our meetings, that's the left-hand side; just scan it and you'll get a link to the working group charter, which has all the information. If you want to leave feedback on how we did, that's the right-hand side. And now for the last question.

Sorry, but I just want to interject one more thing. It's not linked on there yet, I don't believe, but we mentioned that each of the projects gave a quick presentation overview of what their project is. So we should probably get that linked somewhere.
But if you are interested, go to YouTube and look for the TAG Runtime channel: there is a Special Purpose Operating System playlist where all of our meetings are recorded. We have one more to go, but each of the projects has a recording with someone from that project giving an overview of what it is. So if you're interested and want to find out more, you can go there. Sorry.

So, some of the underlying hardware in these sorts of systems has lots and lots of cores, so much so that a lot of traditional schedulers, originally developed on a four-core virtual machine, fall over, because they're not used to having 128 or 256 cores and two NUMA domains, or some other sort of large many-core system. Can you comment? It's possible this is just a special case of special purpose hardware for special purpose operating systems, but I have a sense that there's parallelism that has to be understood at a scale beyond what's typical for some of these systems.

When you say scheduler, are you talking about a pod scheduler, the Kubernetes kind of scheduler, or more like a CPU scheduler, scheduling work on cores?

I'm thinking of a problem I'm aware of where there are language primitives that are unintentionally single-threaded at some point, which scale really well up to 16 or 32 things and fall over at 128. So I just wonder: as you're developing a special purpose operating system, as the hardware gets more and more cores, do you have to do anything special to address that, or does it just fall out of the virtualization layer or whatnot, so it all continues to work as you'd expect?

I guess my fairly non-technical answer would be: for many of us, we get to lean heavily on the giant Linux kernel community to handle all of that for us, thank goodness. Some of us, though, might have a much more detailed answer about why that's hard. I don't know why you're looking at me specifically.
Yeah, so one thing I'll say: schedulers in Unikraft are modular, so we can have multiple of them. There's a well-defined API, and you can add others. We try to keep Unikraft BSD-licensed, so what we do is go to the FreeBSD kernel, and to other projects with BSD licenses, and steal from them instead.

I just wanted to say something quick before you wrap up: the next meeting will be on the 4th of April, and we would be very happy to see new faces turning up.

All right, thanks everybody for coming. Erik, we're going to see you on the 4th of April in the TAG meeting; looking forward to that. Pretty European-time friendly, right? 2 PM UTC. All right, thanks everybody for your great questions. They were a lot better than what we had prepared. So, a big round of applause for our TAG folks.