So next up, we've got Dan Norris and Taylor Thomas of Cosmonic talking about orchestrating Wasm. Reconciliation loops are not owned by Kubernetes. Please join me in welcoming them to the stage. Thanks, sir. Thanks for the warm welcome. As Liam said, this talk is orchestrating Wasm, with the rather maybe possibly inflammatory subtitle of "reconciliation loops are not owned by Kubernetes." And we're going to talk a lot about that today. So first off, just some brief intros. My name is Taylor Thomas. I am an engineering director at Cosmonic. I'm a wasmCloud maintainer and a serial open source contributor, also a CNCF ambassador. I am a Wasm enjoyer — I've been doing Wasm for a while — and also a big Rust programmer. And I've done far too much Kubernetes. Because it actually matters in this talk, just to prove that Dan and I have some Kubernetes credentials, we'll each tell our worst Kubernetes horror story in five-ish words. I once had to troubleshoot why a cluster wasn't giving out DNS addresses — or IP addresses, I mean. So yeah, that was fun. Always good times. Anyway, my name is Dan Norris. I am the infrastructure lead over here at Cosmonic, and pertinent to this discussion, I've been working on Kubernetes on and off since about 2016. One big thing I did was help build DigitalOcean's internal Kubernetes service, and several other Kubernetes-based platforms in my career. I've also done, as you might imagine, too much Kubernetes, and my own Kubernetes horror story is upgrading a cluster from etcd v2 to etcd v3 with no ability to restore from backups. Good times. So let's talk about what we're going to go through today. The first thing we need to review is: what is wasmCloud? That provides a lot of context for everything else in this talk. We'll talk about why we're building a scheduler, what wadm is and how it works, seeing it all in action, and a little bit of the future. So with that, let's go ahead and talk about wasmCloud.
So wasmCloud is a CNCF project. It is currently in Sandbox, and it's on its way to incubating — the proposal's out there, so it should be incubating sometime soon. And wasmCloud is a bunch of different pieces. It's an orchestrator, which is what we're going to be talking about, with declarative deployments. It's cloud, edge, and platform agnostic through the power of WebAssembly. It gives you the ability to securely access hot-swappable, vendorless capabilities — this is taking your dependencies and externalizing them from your actual business logic. And it is a seamless compute mesh with automatic load balancing, failover, RPC, everything that kind of goes on underneath the hood. And we're based on a bunch of open source technologies that we have listed on this slide. Now, these next two slides are really the important part. So what is it good for? I always like to be really clear: what should you be using wasmCloud for, and what should you be using something else for? What we're really good at is anything with distributed data locations. So if you have data in several different regions or clouds, it's a very good tool for that. It's also very good in heterogeneous environments — if you're talking anything from, like, edge devices to a big, massive node sitting in a data center, and combining those kinds of pieces of compute, we're good at that. Same thing with network-constrained apps where you might have firewalling or other things in place. And then multi-region or multi-cloud — we do very well at that, and also anything that involves failover. If you're talking about running very simple applications or, like, functions-as-a-service type things, we can do that, but that's probably better served elsewhere. Also anything like a web front-end — that's what you have CDNs and other things for. And then the last two are kind of WebAssembly-specific right now.
Anything that's CPU- or OS-specific code, or very heavily optimized, you probably aren't going to be doing on Wasm yet. Now, this is the slide where we kind of have to firehose things here at the beginning. So if you've never heard of Wasm and you're like, whoa, what is this? — all you have to understand in the context of this talk is that, first off, WebAssembly is the thing that lets you run your applications anywhere. wasmCloud lets you run these WebAssembly components — and we mean that in terms of the component model — anywhere, and allows them to talk to each other the same way no matter where they're running or how they're being distributed. So those are the key points here as we go into what we're talking about with scheduling. Yeah, and that sort of begs the question: given that the idea behind wasmCloud, and Wasm in general, is to be able to run pretty much anything anywhere at any time, how do you actually end up managing those sanely, and how do you actually end up rolling that out or orchestrating it? And so our answer, of course, if you couldn't figure that one out, is we wrote a scheduler. And the obvious question that comes out of that, of course, is why didn't we just reuse something like Kubernetes or Nomad? We have a couple of different answers for each of them, actually. In terms of Kubernetes, I think the thing that we've realized and known from quite a lot of experience with that particular technology is that everything is really shaped like a container in Kubernetes. There aren't really great abstractions for anything that's not. I mean, people have tried it — my colleague actually worked on Krustlet, which was an attempt to basically get WebAssembly working natively in Kubernetes, and kind of had mixed results there. And so we could have, in theory — Kubernetes is real flexible — we could have used its data store to coordinate everything.
But again, lots of experience sort of told us that just writing custom resources and trying to basically extend everything in Kubernetes is kind of a square-peg-round-hole situation for what we wanted to do. And then the other thing that was really pertinent for us is that scheduling across clouds and regions really is not a thing in Kubernetes. You can try and make it work, but it's not that well supported. On the flip side, Nomad, which is an orchestrator by HashiCorp, could have actually been a really good fit for us. It is more flexible with various types of workloads — you can kind of plug things in — and we actually run it internally for various reasons. But what we really needed was something that natively understood what components are, and Nomad probably wasn't going to get there without a lot of effort. The other thing that kind of concerned us was that many orgs don't run Nomad. I mean, it's a lot easier to ask people to run Kubernetes than it is Nomad. So we didn't really want people to take a dependency on it. And again, scheduling across clouds or regions just really isn't a thing with Nomad. And so — but why, right? Writing a scheduler is a pretty big project. Why would you actually end up having to do this? Really, what it came down to for us is we want the ability to deploy components — not just single components, but components that actually comprise pretty complex WebAssembly-based applications. And there just really wasn't anything out there, particularly for wasmCloud. And so, you know, you can with a little bit of effort, right? If you're doing serverless, or just a one-off module here and there, you can figure out a way to kind of get that to work pretty well. But again, we wanted the ability to deploy many different components all at once, and probably with some sort of manifest type thing. So that was pretty much why we went down this road. So how does it work?
Well, before we go too far into that, I want to talk about our requirements — our must-haves and what we did not want to do. What we wanted to avoid was some of that tight coupling that we've seen in other places. For example, in Kubernetes you can't just reuse the scheduler, you can't reuse components — it's all kind of one big thing, even though it's shipped as microservices. We didn't want to introduce another dependency in order to actually run everything. We wanted to make sure that this scheduler could just take advantage of everything that wasmCloud already had available by virtue of just running wasmCloud. And we also didn't want to reinvent the wheel. We wanted to be able to reuse things that other schedulers have used — like, for example, label-based constraints. We didn't really need to reinvent anything here; there's a lot of prior art. In terms of what it had to do — our actual core requirements — we wanted the scheduler to work with environments that could split or otherwise go offline. We see there's a big frontier, for example, in IoT or embedded systems, things like that, or even just cross-cloud or in the cloud in general. So it needs to be resilient. We wanted to make sure that this scheduler really understood from the get-go how to embrace the distributed nature of what we're building. And we wanted it to also be easy to scale and manage. We wanted just one binary — just run that, drop it in somewhere, and it'll be able to figure everything out for you, and it'll just work. And we also wanted it to be able to just consume the wasmCloud API, just like any other client. We didn't want to have to shim in anything or add a whole bunch of custom stuff just to get this thing to work. It should just be able to reuse what's already available to anybody who happens to be running the system. And so in terms of what this looks like, I did mention the word manifest.
So unsurprisingly, it's just a giant pile of YAML, like anything else that we manage these days. The flip side is, because we didn't want to reinvent the wheel, we actually used the Open Application Model, OAM, that's being used in other environments — namely in Kubernetes — to describe various applications. And so if you've ever run Kubernetes, this should look fairly familiar. There's some metadata at the top, but one of the core differences here is — yeah, there's this thing called an image, because we typically do ship our WebAssembly modules and components through OCI — but the real key is that basically everything in this file is describing WebAssembly. So this first image is a WebAssembly component. The second image is a WebAssembly component. And this third thing is almost a WebAssembly component — it's not quite, it's actually a wasmCloud provider, but it will be a WebAssembly component soon. And you can start to see, by looking at these manifests, the difference from what you have in Kubernetes, and why we're building another scheduler. Each of these is a component, like Dan was mentioning, that can run anywhere. We'll show some of this in our demo in just a second. These could be running in two different clouds; they could be running next to each other — it doesn't matter. We're defining how these things are linked together and how they're supposed to interact. Basically, we're defining an application. And this is not something you can really do inside the confines of Kubernetes or Nomad or anything else that's out there. That's why we needed something custom that could deeply understand components and the fact that this is all distributed. So let's talk a little bit about how this works. Like I mentioned at the very beginning, this project is called wadm. You can pronounce it "wadm" or "wad-um" — either one. I like "wad-um" because it sounds like a punch. And so that's what it's called, and that's our fun little logo for it.
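For reference, a wadm-style OAM manifest along the lines of what's being described looks roughly like this — the image references, versions, and names here are illustrative, not the exact ones from the slide:

```yaml
apiVersion: core.oam.dev/v1beta1
kind: Application
metadata:
  name: echo-demo
  annotations:
    version: v0.0.1
    description: "Illustrative manifest: a component plus a provider"
spec:
  components:
    # A WebAssembly component, shipped as an OCI image
    - name: echo
      type: actor
      properties:
        image: myregistry.example.com/echo:0.1.0
      traits:
        - type: spreadscaler
          properties:
            replicas: 1
    # A wasmCloud capability provider (the "almost a component" piece)
    - name: httpserver
      type: capability
      properties:
        image: myregistry.example.com/httpserver:0.1.0
        contract: wasmcloud:httpserver
      traits:
        # Link definition: wires the provider to the component
        - type: linkdef
          properties:
            target: echo
            values:
              address: 0.0.0.0:8080
```

The point of the format is that every entry describes a WebAssembly-level unit and how it links to the others, not a container.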
Now, how it runs: you've got a bunch of hosts running in what we call a lattice inside of wasmCloud, communicating over NATS, which is basically our backplane. wadm runs alongside that, and you can actually spin up any number of instances — there's no configuration, no static host names, anything like that. You just spin it up. It's actually kind of cool. You spin those up, and they handle things by observing events and emitting compensating commands that say, if something's wrong, I want to compensate with this specific action. So it's these events and commands that are going on here. Now, this YAML is obviously hard to read — we made it that way on purpose, because it's YAML, and we all know it's a flaming pile of garbage anyway, because we deal with it all the time. But everybody uses YAML. So you give it some YAML. And when you start off, wadm is sitting there doing nothing. Then it will say, hey, I've got this manifest from you, I've parsed it, and then you can deploy it. And when you deploy it, you'll see here on the right that there are these things called scalers. Every single unit of a thing in here is managed by a single scaler. So if you have one of your WebAssembly components running, it is managed by a single scaler. The things that link them together, called link definitions, are handled by single scalers apiece. And each of them is responsible for modifying the state for that unit. So what happens is, as soon as you deploy, wadm goes, crap, it's empty, I need to do something. And so it goes, okay, I know what I need to do, I'm gonna do the work. It says, okay, here's the desired state, and when it sees that, it starts doing all the work and issues a bunch of commands to start everything that's supposed to happen. Once again, this is just using the normal wasmCloud API that anyone should be using if they're running wasmCloud. And then once it does all of those, or receives events, it's like, hmm, something's still not there yet.
Here's what I'm looking for, and I don't have it yet. You'll notice this is just a style of reconciliation loop, but very much in a distributed way, because any of these distributed wadm instances can handle it. And then it'll say, okay, I've computed the state. This thing's yellow because I haven't gotten it yet. I'm gonna issue some more commands, and then, bam, we're good to go. And then it just rinses and repeats — goes again and again and again. So it's a constant watching loop where it gets these feeds of events and then sends commands back. Now, this is where we start to get to some of the cool things that we tried to build into this to really leverage the fact that WebAssembly runs everywhere. Yeah, so one of the implicit guarantees of wasmCloud — because wadm is built basically on top of it — is that we always have NATS JetStream available. NATS is basically a pub/sub system; it's what's effectively connecting and coordinating all the communication between all of the various WebAssembly components in the system. The upside is that we actually have a durable data store available to us basically all the time — that's NATS JetStream. And what that allows us to do is basically scale out everything that wadm happens to be watching. We're able to take advantage of a whole bunch of key features to ensure our resiliency — stuff like durable consumers, to make sure that we don't miss things when we actually need to handle events. And one of the cool things about the design is that because the reconciliation loop is sitting around observing the state of the world and updating itself in real time, the only thing that we really need to worry about in terms of data durability is the manifests. Those are stored in NATS JetStream. We can back those up, we can restore them, we can version them. All of that's taken care of for us.
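The scaler idea described above — one scaler per unit, computing compensating commands from the gap between desired and observed state — can be sketched in a few lines of Rust. This is a hypothetical illustration of the loop's core, not wadm's actual types or API:

```rust
use std::collections::HashMap;

// Hypothetical command type: what a scaler asks the lattice to do.
#[derive(Debug, PartialEq)]
enum Command {
    Start { component: String, count: u32 },
    Stop { component: String, count: u32 },
}

// Hypothetical spreadscaler: owns exactly one component's replica count.
struct SpreadScaler {
    component: String,
    desired: u32,
}

impl SpreadScaler {
    // Compare the desired replica count against what events have told us
    // is actually running, and emit commands that close the gap.
    fn reconcile(&self, observed: &HashMap<String, u32>) -> Vec<Command> {
        let running = *observed.get(&self.component).unwrap_or(&0);
        if running < self.desired {
            vec![Command::Start {
                component: self.component.clone(),
                count: self.desired - running,
            }]
        } else if running > self.desired {
            vec![Command::Stop {
                component: self.component.clone(),
                count: running - self.desired,
            }]
        } else {
            vec![] // converged: nothing to do
        }
    }
}

fn main() {
    let scaler = SpreadScaler { component: "echo".into(), desired: 5 };
    // Nothing running yet: the scaler issues a start command.
    let mut observed: HashMap<String, u32> = HashMap::new();
    println!("{:?}", scaler.reconcile(&observed));
    // After events report 5 running instances, the loop has converged.
    observed.insert("echo".into(), 5);
    println!("{:?}", scaler.reconcile(&observed));
}
```

The "distributed" part of the talk's design is that any wadm instance can run this computation, because the observed state is rebuilt from the shared NATS event stream rather than held in one process.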
There's other state that we happen to serialize in and out, just to make things a little more efficient, but all of that can go away and everything's still pretty good to go with wadm. And so with that, we're gonna do a live demo and see how it goes. On conference Wi-Fi. I know. We'll see what happens. There will probably be at least one moment when you very much know it's real, because we are on conference Wi-Fi. That's true. So to give you a little bit of a lay of the land, I went ahead and spun up a whole bunch of different wasmCloud hosts in a bunch of different environments. Up at the top left, we've got a DigitalOcean droplet — a virtual machine running in Singapore, just a regular old wasmCloud host. I actually have two Kubernetes clusters in two different regions: one in Azure in Central US, which is in Iowa, and then another one in GCP, running in GKE — say that three times fast — in southamerica-east1, which I think is São Paulo, but I actually don't know. And then I've also got one running in our own managed cloud. One thing to notice is that everything is actually x86 other than this one running in our managed product, which happens to be ARM64. I point that out basically just to show you that it doesn't matter. So what we're gonna do is start up some applications and watch what wadm does with them in real time. So you wanna talk about the first one? Yeah, so both of these examples we're keeping simple so you can follow along with what's actually going on. This first application you'll see up here uses a spreadscaler, meaning that it's going to select hosts matching a specific set of requirements and make sure there are enough instances spun up and ready whenever requests come in. It's a simple, basically hello-world app that's gonna echo back some data about the HTTP request.
And so we're gonna go ahead and run that, and it's just going to be running in Singapore, with the traffic feeding in through basically our managed host in US East. So we're talking about literally opposite sides of the globe here. So we've deployed an application, it's going to deploy here, and — look, it did spin up, perfect. Very spun up. So I'm gonna go and click on our wormhole, which is just a space-themed ingress. Oops, I don't wanna edit it, I just wanna click on it. There we go. And all this really does — it's a pretty simple app, right? — is you give it an HTTP request and it responds with an HTTP response. I can tack on a little query string, right? It responds back to me. Not super complicated, but we're gonna do another version of this with a slightly different spin on it. And right here, you'll see that we changed this to be a daemonscaler. This is very similar to a DaemonSet — the previous one was similar to a Deployment, if you're coming from Kubernetes land. And so the daemonscaler here is now going to run this on every single machine in the whole cluster that we have connected. Obviously you're not gonna do that for a real hello-world app, but you would do it for an actual real app you're running in production. So we're gonna run it — and you can target this — we're gonna run it on every single host. So it should be spinning up. Should be spinning up. So I should be able to go back to the same ingress, right? Nothing really changes; it's just now distributed in more places and is potentially responding to me from more places. So we'll do yet another iteration of it. Let's go ahead and kill that one. Oh, right. We forgot to kill it. I forgot to kill it, that's true. I really like this VM, but unfortunately it needs to go away. So I'm just gonna terminate this host that happens to be running in Singapore. I don't know why it's upset at me. I can't click on it. We're zoomed in.
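To make the spreadscaler/daemonscaler distinction concrete, the only part of the manifest that changes between the two demos is the trait on the component. The field layout follows wadm's OAM conventions, but the specific names and values here are illustrative:

```yaml
# Deployment-style: run N instances on hosts matching the requirements
traits:
  - type: spreadscaler
    properties:
      replicas: 5
      spread:
        - name: singapore
          requirements:
            region: ap-southeast

# DaemonSet-style: run up to N instances on every connected host
traits:
  - type: daemonscaler
    properties:
      replicas: 5
```

With the daemonscaler, a host joining the lattice later (like the laptop in the demo) automatically gets instances scheduled onto it, because the scaler's desired state covers every host.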
No, it's because it's a different resolution and whatnot. That just — man, I was trying so hard. Okay, there it is. Look at that — goodbye. So this was the host that we were initially running it on, so we were getting those requests back from Singapore. We've now distributed everywhere, and now that that host is gone — boom. And we can also run it — we're gonna join Dan's laptop right now, live, to the same cluster. So this is gonna connect it all up, and if we go back, you'll see it pop up in the toast at the bottom. Bam — wadm saw that something had changed. There was a new host that came on, and it spun up five more copies — or basically a max of five running on each host — on here now. So that was the first thing. Now, that ties us into the second example right here. We're gonna take the same exact thing that we were running, but here at the very bottom of the manifest, you'll see that we have two different HTTP servers. One's gonna be running locally so that Dan can expose it from his laptop, and one's still gonna be running from the same place we've been showing the whole time. So we'll go ahead and modify that right now, and then you'll see this just pop up. What, do I not know how to copy-paste? Yeah, you only copied the labels. I only got the — there we go. So, what happens when you do things live. I swear, I actually do sometimes know how computers work, but not always. There we go. Perfect. Okay, so now that that's deploying, Dan will be able to show you this actually running locally as well. Yep. So what this spun up was basically just a local endpoint — a local HTTP server that bridges in to this echo wasmCloud component, right? The same way that I'm able to click on this ingress, I'm able to do the same thing from my laptop. So to show you that, I'm just gonna curl it, right? Slight spin on it and — ta-da, it works. It just bridged all in like nothing was really that hard.
The next one we're gonna go on to is a little bit more complex, because we're gonna introduce a data store. This one is again just a simple app, to keep it straightforward. So what will happen here is we're gonna deploy what's called our KV counter example. And this KV counter example is just a simple thing that increments a number in a key-value store. And we're gonna deploy it pretty much locally — it's gonna start on just one host, our managed host, and it's gonna use the managed Cosmonic key-value store just to start off. Yep, and so what that looks like from an architectural perspective: we've got this component, which is our built-in key-value store, and we've got the KV counter, which is our little example app. And I'm gonna click on its ingress. That's the wrong one. Now we have to guess, because I have three of them. There we go. So, very exciting, right? All of this stuff is now running in the cloud. I can just hit this button forever, and it's gonna continue incrementing this counter. Super awesome. Yeah, and so now just one last twist. On this last manifest — and I'll have you show this one — you'll see here at the top that this is now going to be a daemonscaler again, but it's gonna be running on our Kubernetes clusters. So what we're gonna do is take the actual business logic of our application and run it on two different Kubernetes clusters, in two different regions, from two different providers. And then we're going to also have the database that it's using be Redis on Dan's computer right here. This is going to be a stand-in for, let's say, your local data center or whatever it might be. And so he literally just spun up Redis right now, and he's gonna go ahead and apply that manifest change, and it's gonna redistribute the application entirely.
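The swap being described is just a change to the key-value provider and its link definition in the manifest — the business-logic component itself is untouched. Sketched out, with the image reference and URL as placeholders:

```yaml
    - name: kvredis
      type: capability
      properties:
        image: myregistry.example.com/kvredis:0.1.0
        contract: wasmcloud:keyvalue
      traits:
        - type: linkdef
          properties:
            target: kvcounter
            values:
              # Point the keyvalue contract at a local Redis instead of
              # the managed store; the component only sees the contract.
              URL: redis://127.0.0.1:6379
```

This is the "hot-swappable, vendorless capabilities" idea from the start of the talk: the component codes against the key-value contract, and the manifest decides which backing implementation is linked in.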
So we started from something that was basically fairly monolithic and redistributed it out across, like I said, two different Kubernetes clusters that are running this, and then something running here locally. Yep, and so you can see that changed. It spun up this little local version of the Redis key-value store — same application, but it now happens to be running in a whole bunch of different Kubernetes clusters. So now if I go back to this same application and refresh it — notice that it went to zero and then to one, because I refreshed the page and it automatically incremented — I can just keep incrementing it. So these Kubernetes clusters are actually what's processing the logic, right? The ingress is basically relaying the request, and then it's actually sending the increment command all the way to my laptop. So this could be anything, right? This could be Cassandra, this could be Dynamo, who knows what. But this is just kind of an example of what wadm can actually do. Yeah, and this is very important to notice: we completely redistributed the application. Again, this is not what you can do with schedulers right now. Wasm lets us do this thing where we're running everywhere, and it now allows us to connect it all together. But we always get this question: well, these are kind of toy demos — what does this look like for real? Well, Dan has that up right now. This is actually our production backend. We run most of Cosmonic on wasmCloud, and the bulk of what's actually running is a wadm app with, what, 23 actors, 10 providers, and a whole bunch of link definitions that define how they communicate with one another. And this is live — this is just me straight up logged in here. We've been doing this for a while. How long ago did we actually roll out wadm — was that like six months ago? Six months ago, yeah. So we've been, you know, kicking the tires on this for quite a while.
And I mean, it's a pretty complex set of applications, right? These lines are what we call linkdefs — they're basically what connects actors to providers, and they provide some config. But the takeaway from that kind of spaghetti is that this is a pretty non-trivial application, running all in WebAssembly today, distributed across a number of different hosts. So with that said, we wanna go ahead and wrap up with what the future of wadm is. We're gonna be doing custom scaler logic — you might have custom things that you're trying to do — and we're gonna do that via wadm plugins and/or wasmCloud actors, in the relatively near future. We also hope that as wadm continues to mature, we'll be able to compile it down to WebAssembly and just run it that way as well. We don't know — like I said, that could be someday in the deep, deep, distant future — but we know that we want to do those things as well. So, last but not least, just a call to action. If you're interested in these things, on the left is our wasmCloud Slack. We have a very welcoming community there — you can ask questions, try things out. You can pretty much just download our wash CLI, which is the wasmCloud shell, and that will have wadm included if you want to run it. And then on the right is the actual link to the wadm project. These are the kinds of things we really want feedback on — we want to see people using it and what kinds of things they run into. So with that, we'll go ahead and stop. We have just a small, small amount of time for questions. Wonderful presentation, you guys. I wanted to ask — it was something at the start of the slides — what exactly are CRDs, if that's the right term for it, or CDRs? It is, yeah. So it's just an acronym — custom resource definition is what it stands for. It's basically a way to extend Kubernetes with, like, your own custom stuff.
So you can define basically your own resource and manage it as if it were anything else. It kind of teaches Kubernetes about all sorts of new capabilities. And one follow-up: would that mean that it kind of helps you with all the applications that are being processed, or do you have another process for that on the whole? That's where we come in. CRDs are an interesting thing — honestly, they can be a little bit clunky in my experience, having done far too many of them. That's why wadm is its own thing: it uses OAM as the manifest format, but we have our own definition of what goes in there, because it needs to understand components, which is what we were showing off there in the demo. Other questions? Oh, one over there in the back, yeah. Test — awesome. Yeah, awesome talk, great. And thanks for making the language similar for the Kubernetes people. The question I have: is there also something like resource limits? Like, I want to limit my CPU and GPU, whatever, usage in that scheduler. Maybe I just didn't see it. Yeah, not yet. There's been a concept in one of the runtimes — I think it's just Wasmtime; here's where I look at Bailey to make sure it's only in Wasmtime — the concept of fuel. Because Wasm is very deterministic, you can know exactly how much it's gonna consume to do something, but we haven't as a community decided on something yet. So as we start getting more people using wasmCloud and they're like, we need these now, we'll start doing it and then saying, hey community, what does this look like? Do you like this? Is this good? And we'll keep contributing back, because with the component model we want this thing to be as flexible as possible. You can take stuff that you run in wasmCloud and run it elsewhere, or take something from elsewhere and run it in wasmCloud. So we want to make sure we choose things that work really well.
And so we'll probably have something like memory limits here in probably the relatively near future, because people will need them. Yeah, to elaborate quickly on that: the closest thing that we have right now — and it was probably hard to see on the manifest itself — is we effectively have a concurrency limit. So you can say, I only want 50 of these running at any given time, and you can kind of clamp down how much work you're actually doing. But yeah, as Taylor said, figuring out the fuel thing I think is TBD. Awesome, thanks. Yeah, of course. Other questions? Do we have time for one more? So I'm trying to keep up with Wasm as much as I can, but with the distributed execution that you have, is it still limited by the container sandbox of WebAssembly, or does WASI coming down the pipe give you more access, like you would have with containers and Linux kernel access? Yeah, so the good thing here is there's nary a container to be found in what you just saw. It's all WebAssembly. And so we have the capability — this is where, once again, the component model comes in. Those things that you saw on the lines — you probably saw something that said HTTP server and key-value — those are interfaces that are defined by WIT, which is the IDL for all of this. And we say, okay, this is a key-value server, this is the contract I'm expecting. And then you can bring anything custom. That's been a feature that wasmCloud has had since the beginning, before the component model was even fully realized, and that we still have now that we're going into the component model. We want you to be able to say, you know what, here's the generic contract that everybody can use for the 80% use case — but also, I might have my super-optimized database thingy at my corporation. So I can go "acme-corp:" and then whatever your interface name is, and you can bring that. You're not restricted to the stuff that we have coded inside of wasmCloud itself.
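As a rough sketch of what such a custom WIT contract might look like — the package, interface, and function names here are hypothetical, not wasmCloud's actual interface definitions:

```wit
// Hypothetical vendor-specific key-value contract. A component imports
// this interface; any provider implementing it can be linked in.
package acme-corp:storage;

interface key-value {
  get: func(key: string) -> option<string>;
  set: func(key: string, value: string);
  // the counter demo only needs an atomic increment
  increment: func(key: string, amount: u64) -> u64;
}

world kv-app {
  import key-value;
}
```

The component only ever sees the contract; the linkdefs in the manifest decide which concrete provider satisfies it at runtime.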
All right, Dan and Taylor, thank you so much. All right, thanks. Thanks, everyone.