Can you guys hear me? Okay, good. Thank you. Good afternoon, everyone; I hope you're doing great. Today is day zero at KubeCon. My name is Sohan Kunkerkar. I work at Red Hat as a senior software engineer. Along with me: Hello, my name is Peter Hunt. I'm also a senior software engineer at Red Hat. I work primarily on CRI-O and SIG Node: the kubelet, container runtimes, that whole area of the stack. Today we are embarking on an exciting journey as we delve into the realms of CRI-O and Wasm. Our journey promises to be both lightweight and secure, while providing containerization and language-agnostic execution. Here's the agenda for today's talk. We'll quickly cover an introduction to CRI-O and Wasm. Then we'll talk about the importance of Wasm in the container space, followed by the benefits of Wasm with CRI-O and Kubernetes. Then we'll walk through our journey of integrating Wasm into CRI-O, followed by quick demos, closing remarks, and Q&A. Let's get started. Can I get a quick show of hands: how many of you know about CRI-O? That's good. So CRI-O is a Container Runtime Interface implementation; the O stands for OCI. It's a lightweight container runtime designed specifically for Kubernetes, to run containers within Kubernetes pods. CRI-O brings a host of features to the table: its native integration with Kubernetes facilitates orchestration, it excels at image handling, and it provides runtime execution and other security features. Next slide. Now let's talk about Wasm. Wasm started as a powerful technology for web browsers, but today we're seeing its reach extend beyond browsers to client and server applications. Wasm, in a nutshell, is a fast, secure virtual machine for executing binaries without tying you to particular CPU and OS resources. Wasm's appeal lies in its language-agnostic nature and fast execution, while providing a sandboxed execution environment.
Now, the magic happens when CRI-O and Wasm come together. We're leveraging CRI-O's lightweight containerization to execute Wasm workloads, integrating Kubernetes concepts with the language-agnostic execution of Wasm to create this synergy. Next slide. All right, here I want to talk about unlocking the potential of Wasm in the container space. Wasm has changed the way we use containers; it adds flexibility to how we run containers in production. Wasm lets us compile code into a binary format that can be used across architectures and across multiple operating systems, and that's exactly what we want in the container space. At the same time, Wasm's binary format is easy to execute and has minimal startup time, which really helps with optimizing resources in the container space. Wasm also adds security: it provides a sandboxed environment, along with module signing of code for integrity. It gives users flexibility to add security knobs at the container-runtime level without needing to understand host-level or OS privileges, and it enables fine-grained control over what an application can access. That not only gives you enhanced security but also prevents unauthorized access. Next slide. Now that we know the importance of Wasm, let's understand the benefits of Wasm with CRI-O in Kubernetes. Here I want to talk about real-world use cases. The first is edge computing, where Kubernetes is the de facto, or I would say the popular, platform that covers most edge-computing use cases. When we integrate Wasm with CRI-O, we not only get lighter deployments but also the flexibility to support a broader spectrum of operating systems and architectures.
So imagine deploying lightweight microservices as CRI-O containers on diverse edge devices within a Kubernetes environment. How fascinating is that? The second use case I want to talk about is dynamic scaling. Kubernetes supports dynamic scaling with the help of the HPA and VPA, and when resources are constrained, Kubernetes leverages those mechanisms. Now imagine adding Wasm into the mix. Wasm modules are very light in nature and, as I said, compact, so we can use them for faster deployment and minimal startup time. Since we're talking about microservices, we also need to think about the security aspect. Kubernetes already supports security for microservices deployed on the platform with CRI-O's container security features. Wasm is the cherry on top: it not only provides a sandboxed execution environment but also offers module signing of code, and lets users tweak security-specific knobs at the container-runtime level. Last but not least, polyglot microservices architectures. In the references at the end, I've cited a research paper which found that containerizing polyglot microservice architectures improves their performance. Now imagine packaging those services as Wasm artifacts; that will definitely help improve performance and reduce resource usage. With these benefits in mind, our journey of integrating Wasm into CRI-O began. Every journey has hurdles, so we faced a few roadblocks that I want to discuss on the next slide. CRI-O is a higher-level runtime. It supports two OCI runtimes, runc and crun. runc, the more popular low-level runtime, doesn't support Wasm natively; that is, it doesn't integrate a Wasm runtime to run Wasm workloads. However, CRI-O does support crun, and if you set the image annotation shown on the slide, you can run Wasm workloads.
However, there's a caveat: this mechanism passes an OCI image annotation down to the runtime. In a standard container runtime setup, passing OCI image annotations down to the runtime is not organic, and an image could try to change the runtime in ways that pose container-specific or security-specific issues. So, coming back to the CRI-O level, we thought about adding that support properly. When a pod gets created, at that point we don't yet know the actual platform of the image, and if you don't assign the right runtime class, CRI-O will error out saying there's a problem with the platform. The ideal scenario here, specifically for Wasm, is to treat wasi/wasm images as Wasm by default. What does that mean? We introduced a new field called platform_runtime_paths under CRI-O's runtime config, where you can map a platform architecture to the corresponding runtime for it. As you can see on the slide, there's a config there. The next one. In that config, we're using crun-wasm; just to note, under the hood we're using WasmEdge as the Wasm runtime. With this CRI-O config, you can run Wasm workloads without any problem. I can quickly show you the demo. There's a QR code you can scan to get the artifacts for it. I'll be using the same CRI-O config that's shown on the slide; you can use it as a drop-in config or in the main config, and then you need to spin up the CRI-O instance. I'll be using local-up-cluster to run Kubernetes locally. First, the CRI-O config: again, you can use this as a drop-in config or the normal config. With this, I'm going to spin up a CRI-O instance, and then we'll use Kubernetes' local-up-cluster.sh script to spin up the cluster with CRI-O as the container runtime. Let's wait a moment; it takes some time. The idea here is to run a Wasm workload.
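The drop-in config from the demo looks roughly like the following sketch. The file path and runtime binary locations are illustrative and will vary by installation; the key idea is the platform_runtime_paths mapping:

```toml
# /etc/crio/crio.conf.d/10-crun-wasm.conf  (illustrative path)
[crio.runtime.runtimes.crun-wasm]
runtime_path = "/usr/bin/crun-wasm"
# Map wasi/wasm images to the WasmEdge-enabled crun build by default,
# so no per-image annotation is needed to run Wasm workloads.
platform_runtime_paths = { "wasi/wasm32" = "/usr/bin/crun-wasm" }
```

With this in place, CRI-O selects crun-wasm automatically whenever the pulled image's platform is wasi/wasm, instead of erroring out on the unknown platform.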
I'll use a pod definition to spin up a pod running an HTTP server, which takes input and displays output. As you can see, the cluster is up and running. Here is the pod definition, and it's simply an HTTP-server Wasm artifact that accepts input and returns the appropriate output. I'll just go ahead and apply that definition, and then expose it as a service so we can hit the endpoint, providing the input and getting the output back. So here's the IP for it. I'm going to use that IP to provide the input, say "hello world", and that should come back as the output. All right, and that's the end of that demo. With this, I'll hand it over to Peter, who will continue with the rest of the presentation. Thank you, Sohan. So this, as it stands, is what currently works in CRI-O. We have a Wasm binary, we wrap it up in a scratch image, and we can push it to a registry. CRI-O can pull that down and, using crun-wasm and a specific CRI-O configuration, run that Wasm image and have a working Wasm container. So that's cool. It works. We love that. Next up, I'm going to talk a little bit about some of the things that we, as the CRI-O community, are thinking about for extending our integration with Wasm, and some of the ideas we have about changes we may or may not make in the future; we haven't decided yet. (A bit of wrestling with the slides.) All right, cool, we're back. So let's talk about the future. The first change we're thinking about is extending CRI-O's current behavior to integrate more of the features that some of the Wasm runtimes have. One of the things we're considering is adding support for plugins. (More slide trouble.) There we go.
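The pod definition from the demo might look roughly like this sketch. The image name and port are placeholders, not the exact artifact used in the talk:

```yaml
# wasm-http-server.yaml -- illustrative pod spec
apiVersion: v1
kind: Pod
metadata:
  name: wasm-http-server
spec:
  containers:
  - name: http-server
    # A Wasm binary wrapped in a scratch OCI image and pushed to a registry.
    # No runtimeClassName is required: the platform_runtime_paths mapping in
    # the CRI-O config routes wasi/wasm images to crun-wasm automatically.
    image: quay.io/example/http-server-wasm:latest
    ports:
    - containerPort: 1234
```

After `kubectl apply -f wasm-http-server.yaml` and exposing the pod as a service, the endpoint can be hit with the input and returns the echoed output.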
So, here we have support for plugins: wiring additional data into a running Wasm image. That allows us to do more meaningful work and leverage more of the capabilities of the various Wasm runtimes. We're focusing on WasmEdge, but crun-wasm also supports Wasmtime and Wasmer, I think. We're going to showcase some work that Sohan has been doing that is very subject to change. In fact, a little bit of this demo is already slightly wrong, but we'll update it in the slides and in the GitHub repo behind the QR code with the demo information, to be more accurate as we move forward and converge on the implementation. As I said, this is a future thing, but we're going to show another video, which means I'm going to have to look like this, which is cool. Okay, so we're going to do a very similar setup as before, but a key thing to note in a moment is the CRI-O configuration we print out: it's going to look a little bit different. In the middle there's the runtime environment section, which might still change a little. Basically, the idea is that we're passing environment variables down to crun-wasm itself, and that allows us to customize the behavior of the underlying Wasm runtime. We're looking to add plugins here, which will let us use a llama plugin to do some fancy AI stuff. So that's the CRI-O configuration, subject to change. Then we're going to start up our kube cluster in the same way we did before, and I'm going to time-travel a little bit. There, we did it, and that took no time. And then we're going to print out, okay, pause this, we're going to print out...
So this is the Containerfile, or Dockerfile as many people know it, of the workload we're going to be running. A couple of notes about it: we have two environment variables, defining both the plugin we're going to use (whose name I'm going to try to read... I can't read it; yeah, the WasmEdge one) and a model path we'll be using for the llama GPT work. We have our workdir, /app, and inside that workdir we're putting our Wasm binary. Notice this is a scratch image. We'll talk about this a little later, but currently you have to wrap the binary in an OCI image and specify it as you would an OCI image. So this is the image we're going to use. And then, I think, we inspect it, or maybe start it. What's next? Oh yeah, we inspect the image and show that it is indeed a Wasm image via the platform and the architecture, which is how the OCI has defined the designation of a wasm/wasi image. We're going to highlight that in a moment so you can see the architecture and OS fields show that this is, we promise, Wasm. And then we're going to show the pod spec, and I'll point out a couple of details of it as well. Here we have our pod spec. It looks similar to the one Sohan showed before, but a defining characteristic is that we're mounting a volume which contains our model. We did it this way because we haven't totally figured out the distribution of models yet. It could be cool as an OCI artifact, but then it takes forever to pull, because models are huge, as anyone who's run AI knows. So we just have it locally, and we're cheating a little bit.
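The Containerfile described here might look something like the following sketch. The environment variable names, plugin path, model path, and binary name are placeholders standing in for the ones used in the demo:

```dockerfile
# Containerfile -- illustrative sketch of wrapping a Wasm binary
FROM scratch

# Select the WasmEdge plugin and point at the model file the pod
# will mount in at runtime (names here are hypothetical).
ENV WASMEDGE_PLUGIN_PATH=/plugin
ENV MODEL_PATH=/model/llama.gguf

WORKDIR /app
# The Wasm binary is the only payload; scratch keeps the image minimal.
COPY llama-chat.wasm .
ENTRYPOINT ["/app/llama-chat.wasm"]
```

Inspecting the resulting image would then report a wasi/wasm platform in its OS and architecture fields, which is how CRI-O recognizes it as a Wasm workload.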
And we're using the image that we showed earlier, plus a couple of environment variables. So there's our pod. We're going to run it, or apply it, I suppose, would be the verb. Okay, here we have it created, and then we show that it's running. It won't look super fancy, but basically we're going to show the logs of the llama pod, and we have ourselves a little chat prompt. It's going to ask us to talk to it. In fact, we'll talk to it using kubectl exec, where we can exec into the process of a pod, and we can ask it some questions: some philosophical questions, some questions about life. In this instance, we're asking a trivia question about the capital of North Carolina. Does anyone know? (The video skips ahead.) I didn't want that to happen. Oh, man. Rats. Okay, well, you're never going to know then. And our plugin will tell us, as we do it all again. So that's that, and we go here, we're attaching, okay, we're back into it. What is the capital of North Carolina? Raleigh. Spoilers. It's really thinking about it; took me this long to figure it out too. So this is a pre-trained AI model running on our laptop in this video. And then we also ask how the weather is, but it doesn't know; it's an AI, so it answers in a kind of confusing way, something like "it is weather outside." But that's okay, because we don't want AI to take over the world yet, so we can still have weather people. "It's currently [insert weather]." Good thing we've got time. So that's our demo running our AI model, for all of its quirks. It could use some work, but it shows an idea we have about extending the mechanism for running Wasm in CRI-O.
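The pod spec with the locally mounted model might look roughly like this sketch. The image name, env var, and paths are illustrative:

```yaml
# llama-chat.yaml -- illustrative pod spec with a host-local model volume
apiVersion: v1
kind: Pod
metadata:
  name: llama-chat
spec:
  containers:
  - name: llama-chat
    image: quay.io/example/llama-chat-wasm:latest
    env:
    - name: MODEL_PATH          # hypothetical name for the model location
      value: /model/llama.gguf
    volumeMounts:
    - name: model
      mountPath: /model
  volumes:
  - name: model
    hostPath:                   # model kept locally on the node ("cheating"),
      path: /var/lib/models     # sidestepping model distribution for now
      type: Directory
```

Once the pod is running, `kubectl logs` shows the chat prompt and `kubectl exec` (or attach) lets you type questions to the model interactively.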
Next, we're going to talk about a couple of places where we're maybe thinking about changing the way we actually run Wasm in CRI-O. Is that right, plugins? No, we just did that. Okay. Sorry, folks. Yeah, this one. Okay, so there are a couple of optimizations we can make for this situation. You'll recall from before (and I won't go back, because it's kind of horrible to look at) that we had to wrap the Wasm binary in an OCI image to run it, because the way crun-wasm thinks about Wasm artifacts is scaffolded by the OCI image spec. But one can imagine a world in which we use an OCI artifact instead. An OCI artifact is just a file on an OCI registry, passed around with the OCI distribution spec. You can imagine pushing just the Wasm binary to a registry, then pulling it down and running it, without all of this extra image scaffolding around it. That's one option we're investigating, which would reduce the overhead. It wouldn't materially change the amount of data on the registry, because a scratch image is really just a top-level folder, so it's not a big change in that respect, but it does simplify the model and our conception of it. Oh, too far. The other thing we're thinking about is running Wasm runtimes directly from CRI-O. Currently we delegate this behavior down to crun-wasm, and that works well, but it also requires some scaffolding around the Wasm binary to massage it into looking like an OCI image and pass it down via the OCI runtime spec.
It's just a little bit of extra work that we don't necessarily need, because really we just have a binary and we want to run the binary. So an option we're considering, though we might not take it, because it's kind of nice having crun-wasm take care of this stuff for us, is to eventually integrate a Wasm runtime directly into CRI-O. If anyone's familiar with containerd: containerd shims are what they call their runtime classes or handlers, and they have runwasi, which does something similar to this. It runs Wasm directly within containerd rather than delegating down to something like crun-wasm, as far as I believe, at least. That could optimize things by taking crun out of the picture, at the expense of a bit more code in CRI-O. We're not totally sure we'll do that yet, but we might. The other thing, and this one I'm actually quite excited about, largely switches gears from thinking about running Wasm as a workload to running Wasm as a sort of plugin mechanism. But first, a quick aside: we're going to talk about NRI. For anyone not familiar, NRI is the Node Resource Interface. It was originally created by the containerd community, but we've since adopted support for it in CRI-O as well. It's basically a way of making arbitrary modifications to an OCI runtime spec. Imagine I'm a hardware vendor and I want to inject every pod with some piece of hardware. In the past, you had to manually change containerd and CRI-O in the same way so that both of them would do this injection. With NRI, you can just run a plugin on the node, and that does the injection for you. So let's look at the current diagram of how NRI works; this is how it works today.
So basically there's an adaptation layer that runs in CRI-O, and it speaks over ttrpc to the NRI plugin, which is a process running on the node alongside CRI-O, usually distributed with a DaemonSet. The plugin does its arbitrary modifications, or maybe triggers on some container event, and then returns a reply up to CRI-O, and CRI-O goes on its merry way. That works great today. It has a couple of issues, though. This is a separate process, so there's a certain amount of coordination an admin has to do to make sure their plugins are running before their workloads are, so that the plugins can inject their special resources, or do whatever the plugin is trying to do. There really isn't a way in Kubernetes to define that before/after ordering for different pods, so it adds some management overhead. What would be really cool is a plugin mechanism inside of CRI-O, or containerd if you prefer that, which would basically let you run a plugin within the runtime process itself, so there isn't this external process. And we've imagined a world in which Wasm could do that. The idea is basically to use the same ttrpc protocol, or a similar sort of protocol, between CRI-O and a Wasm binary embedded within it. That allows an admin to distribute Wasm plugins that can make arbitrary changes to OCI runtime specs, or trigger events from the node, but do it in a safe way, because we can very carefully define which resources on the node the Wasm plugin is able to access, and do it more efficiently, because it's all running within the process itself. There will still be a little bit of serialization, but there will be less of this distribution problem.
So you can see here that we take out a whole section of the back-and-forth, and instead the Wasm plugins are built directly into CRI-O. None of this has been done yet. We've only just started talking about it with Krisztian, the main maintainer of NRI. We're considering going down this path, and there are a number of problems, or hurdles, that we have to get over, but this is what we imagine; it would be really cool if we could do it. If that sounds interesting to you, we'd love your feedback about it. This is all to say: between what's currently possible today and what we're thinking about for the future, this paints a picture of CRI-O's adoption and backing of Wasm as a platform. A classic line in CRI-O talks is that CRI-O loves Kubernetes, because it's a container runtime made only for Kubernetes. We also love CNI, because that's how we love to do networking within containers. And we love Sigstore, because that's how we love to do signing. But, this is to say, we also love Wasm, and we see a lot of potential in merging CRI-O, as a lightweight container runtime for Kubernetes, with Wasm, as a way of actually running binaries within Kubernetes, whether as workloads themselves or as a way of modifying CRI-O itself. So, yeah, here are QR codes for the CRI-O repo and the CRI-O Slack, which you can use to harass us about anything we've talked about today. Here are some references. And thank you very much. Does anyone have any questions? You can clap too, if you'd like. Here we go. So, disclaimer: I've previously been very oblivious to my container runtime. It's just been there.
I know when Docker went away and we had to switch; I mean, well, the shim went away. But what's the risk of just making this the default, so that I don't need to fiddle with this whole CRI-O configuration? Because that seems very, very painful, especially if I'd like to roll this out at scale. Totally. So the idea of the CRI generally is that you actually don't really have to think about it; containerd and CRI-O should function pretty similarly. All of this special configuration is just to set up Wasm in CRI-O, and I think you have to do similar configuration to enable the containerd Wasm shim. It's all just container runtime configuration, so that doesn't really go away whichever runtime you pick. You can run CRI-O out of the box on your node, and as long as you tell the kubelet to talk to it, it should work. And if it doesn't, that's a bug. So ideally the overhead would be exactly the same between CRI-O and containerd. And I have a whole laundry list of things that make me feel CRI-O could be a better choice for you; if you want to talk about it with me later, I'd be happy to talk your ear off about it. It's just that I've seen things like runtime classes, and then it kind of falls into place, you know? I can just provide that and then I don't need to worry about it. Right. So the runtime class mechanism is something you have to configure either way between containerd and CRI-O, unless you have all of your workloads running as Wasm, or homogeneous workloads that run with one specific OCI runtime. All of that configuration is the same between containerd and CRI-O. Most people do exactly what you did: you choose a runtime based on whatever you've heard and you just let it go, and it just runs. And that happens with both CRI-O and containerd.
So that won't necessarily change. The specifics of deploying the runtime class, and making sure there's a corresponding runtime handler, a shim or a runtime handler in CRI-O that corresponds to it, that overhead doesn't really change between the two options. Okay. Thank you. Other questions? In the back here. Thank you for your presentation. I think I missed something really important here: what exactly is the end goal of running Wasm on top of a Docker container, or rather on top of the container runtime? I guess it's about making architecture-independent container images. What's the end goal here? Well, you have a couple of different options. CRI-O currently runs on Linux, primarily; we're working on adding support for FreeBSD. So platform agnosticism doesn't yet mean we can run on Windows, necessarily. But Wasm, as I'm sure you've gathered through today, by being a binary format, runs efficiently; you don't need a heavyweight runtime to manage it, and its portability means you can carry it across the different architectures that crun supports. So there are a couple of motivations: efficiency, and also portability. Yeah, and the idea is to actually integrate with Kubernetes, and since CRI-O is the lightweight thing, we want to leverage that part. That's the reason we're doing it this way. And the primary support we added here is because of OCP workloads; I mean, CRI-O is used in OpenShift, which is the enterprise-ready Kubernetes. So in the future, we might think about improving the use case; as mentioned, maybe there'd be no need to specify the runtime handler. We might think about this, but we definitely need some feedback.
Currently, this use case has been predominantly for OCP and, obviously, for Kubernetes, but we will definitely think about improving the use case and UX in the future. I've got maybe a spicy question. You know, in the last epoch of tech, there were a number of virtual machine orchestrators, things like OpenStack, and there were a bunch of things that tried to be hybrid orchestrators. And you guys are in this incredible position, sorry, the two of you are in this incredible position, as people who build a runtime, to think about whether there's an impedance mismatch between the atomic unit of compute with WebAssembly and with containers. What's that been like from your perspective? And obviously it's a better-together story for now, and there are lots of ways to integrate, all kinds of cool ways; we live for it, we love it. But is there eventually going to be a Wasm-native way to do this? Well, you can imagine a world... By "Wasm native", do you mean a runtime that only runs Wasm, but for Kubernetes? Or an orchestrator? So, I think the cloud-native ecosystem, or at least the CNCF, I believe the only orchestrator that's actually in the CNCF is Kubernetes; I could be wrong about that. But Kubernetes you can kind of think of as the now gold standard; it won the orchestration battle, per se. So a lot of the thinking going on around process orchestration at scale is going into Kubernetes. We kind of see crun-wasm (sorry, not crun) and CRI-O as a way to bridge that gap for Wasm workloads specifically. You can obviously run Wasm directly on a node, with WasmEdge or something; but then, how do you orchestrate it at scale?
And there could be a world in which another entity makes such a runtime. There was the Krustlet project, which did a virtual-kubelet thing, but for Wasm workloads; that's also an option. But by having more of an establishment in the Kubernetes space (we work in SIG Node directly, we're entrenched in the environment) we have a leg up in ensuring the pathways are clear to get from an orchestration environment to running Wasm on your node. So that would be our perspective. Maybe one more, and then we'll move on; this is the last talk of the day. Now, thinking again about the size of the LLM you complained about, right? What if you used the node resource interface for the LLM? You'd basically say: on this node, I have this LLM cached, so I can use the node resource interface to specify that this pod can be scheduled to that node in order to execute that LLM. So, you could do that, but NRI doesn't really speak to the scheduler. I think that might be something a little more like DRA, dynamic resource allocation, which is a big conversation right now and which actually registers with the Kubernetes scheduler. So you could say: I want a node that has this specific model. That might be a little bit of a better fit. We're thinking about the distribution of that, maybe using OCI artifacts or something; or, as we did here, using a volume, where you could have a shared volume on some NFS mount that all the nodes within a cluster can access. There are a bunch of different options, and we're still teasing that out. Yeah, and we are reducing the footprint of container images, and we're actually thinking about using OCI artifacts, which definitely fits this particular use case. Okay, let's thank Peter and Sohan.
Thank you so much. Thank you. Thank you.