Good afternoon, everyone. Thanks for being here. My name is Francisco Caudera. I'm a technical program manager on the AKS Hybrid team at Microsoft, and today with me is Angel.

My name is Angel. I'm part of the VMware AI Labs team, working on the exploration and development of WebAssembly-related projects. Today we'll be talking about how you can use runwasi to run WebAssembly in a serverless approach, specifically using containerd and Kubernetes.

But before I start, a little story. Over the last two years I've been coming to different Kubernetes and WebAssembly conferences, and every time I go back and talk with my Kubernetes team about WebAssembly, they always say: sounds great, but how can I use it? How can I actually start using WebAssembly with Kubernetes? How can I bring it to customers? And that is exactly where I lacked good answers. So when Angel and I were planning this talk, we thought about how to frame it so that we can help everyone actually start creating applications using WebAssembly, specifically with Kubernetes.

So let's go ahead and get started. We'll start by talking a bit about WebAssembly. Then we're going to create a really simple Kubernetes sample app. Then Angel will go over what Wasm Workers Server is and how we do the migration process of moving from Linux containers to WebAssembly modules. If we have time, we'll go over how you can extend this really simple app with serverless AI, using WebAssembly and WASI, and finally some conclusions.

So, the claim is that WebAssembly is the next wave. For those who are not familiar, WebAssembly is a low-level, assembly-like language with a really compact binary format, which gives near-native performance and has multiple benefits over containers. That sounds great, but it's still difficult to actually start using WebAssembly. If we look at the CNCF annual survey for 2022, it says containers are the new normal and WebAssembly is the future. Again, great. But when you look at the numbers, you see that 64% of respondents are using Kubernetes in production environments, and 25% are doing proofs of concept, which will probably end up in production too. But for WebAssembly, only 35% have even tested it. So we need to tackle how you actually move from testing to production and bring WebAssembly side by side with Kubernetes.

So how are things going? Let's start with the good things: the community is making great progress. Just this year you had Wasm I/O, there was WasmCon, and Wasm Day at both KubeCon EU and KubeCon North America. There's a lot of adoption happening around WebAssembly, WASI, and all the different community projects. On language support, there's Python and JavaScript, and now with the WebAssembly garbage collection proposal it's probably going to bring Java and .NET, and there's news around Go as well.
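As a concrete example of that language momentum: recent Go releases (1.21 and later) can target WASI directly, so compiling an existing program to a Wasm module is a one-liner (output name illustrative):

```bash
# Cross-compile a Go program to a WASI-targeting WebAssembly module.
GOOS=wasip1 GOARCH=wasm go build -o app.wasm .
```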
So a lot of these languages are actually gaining compilation to WebAssembly. There's also been new tooling to provide a better developer experience: you can now pack a WebAssembly component into an OCI image, push it to your container registry, and have Kubernetes download and run it.

But it's also true that there are a lot of challenges. First, restricted capabilities: right now there's no support for sockets and no support for threads; that's coming with new WASI interfaces. If you want to use a GPU, for example, that's also restricted. Again, things are coming, but right now the state of the art still has restricted capabilities. Then there's debugging: because you have this binary format, it's quite difficult to debug and to use the same tools that you use with your normal programming languages, say in Visual Studio, and bring them to WebAssembly. Better WASI support: some interfaces are being developed right now, but more are needed to bring all the required features for these new apps. And finally, the developer ecosystem and guidelines. When I talked with Angel about this, we agreed that every time you try to start a WebAssembly application, it's quite difficult, because you have to go through a lot of things and there are no clear guidelines on how you should start.

So the idea of this talk is to try to improve this adoption by showing a migration journey from containers to WebAssembly. To do that, we're going to use a sample app, a gallery app: you can upload pictures and react to them, you can "like" them, and we store the reactions in a database and the pictures as local files. If you go a bit into the detail of the architecture, there's a Node.js reactions module, a really simple API using Hono. Then we have a Python Flask app with the frontend, really simple again, just to upload pictures, show them, and communicate with the Node.js service for the reactions. And finally a database, just SQLite; we chose it because it's simple when it comes to integrating with APIs. On top of that, of course, you have your Kubernetes environment with logging, networking, and all the different Kubernetes components.

If we build this solution with containers, there are a couple of challenges. The first one, of course, is size and dependencies. Here we're using a Python base image, python:3.11, and the actual size of that is 300MB. If you go to the other app, the Node.js one, and use a Node base image with buster-slim, it's much better, you get down to about 80MB, but it's still quite big. And the worst part is that you're also pulling in a lot of dependencies.
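For reference, a representative Dockerfile for the Flask side might look like the sketch below (file names illustrative); everything in the base image ships whether or not the app needs it:

```dockerfile
# Typical Flask container: the python:3.11 base alone is ~300MB,
# before a single line of application code is added.
FROM python:3.11
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
CMD ["python", "app.py"]
```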
You're shipping a lot of packages inside these images that you're not actually using. Maybe you don't even need those packages in your app, but they're still there in the base image. The second challenge is security. Because you have these packages and these big base images, you have a lot of CVEs before you even start. We did a quick investigation, just going to Docker Hub and checking a base image, and in this case a Node.js image already had 31 CVEs to fix. You haven't even started on your application; that's just the base image.

And the last thing is portability. Over the past years we've heard a lot about ARM64 and RISC-V. If you go right now to Azure or Amazon Web Services, you'll see that if you use, for example, an ARM64 Kubernetes node pool, you can get a much better price compared to x86. In Azure, I think it's around 50% better price-performance, and in AWS something like 40%. And this trend of better prices with ARM64 is also coming to PC-class hardware devices. We all know about the Apple Macs with M1s, using the ARM64 chipset. And last week Qualcomm announced a big competitor to that chipset, claiming 68% better performance at the same low power, and 50% faster peak multi-threaded performance compared to the M2. I don't know if that will hold up, but we're going to see a lot of traction there with ARM64 on PC-class hardware devices.

When you start thinking about that, it means you have Linux containers for x64, now you need to add ARM64, and in the future, if the trend continues, probably RISC-V. And it could be another OS too, say Windows. That matrix keeps getting bigger and bigger, and you need one container image for each of those permutations (see the sketch below). So now let's talk about how we can tackle all these challenges by using Wasm Workers Server.

Thank you very much, Francisco, for the introduction and all the context. One of the things happening in the Wasm ecosystem is that there are different projects tackling these problems, different ways to deploy applications using WebAssembly as the underlying technology. One of these projects is Wasm Workers Server, and it's the one we're going to use today for moving the project we just described to WebAssembly. Wasm Workers Server is a tool to develop and run serverless applications. Internally it uses WebAssembly, and it enables you to do things like combining multiple languages in the same application; you can, for example, run Python and JavaScript together. And then you can run the application, as Francisco was describing, almost anywhere.
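To make that build-matrix point concrete before diving in: a multi-architecture container build today looks something like the sketch below (registry name illustrative), and every additional OS or architecture multiplies the work:

```bash
# Build and push one image per listed platform; each new OS/arch
# permutation has to be added (and tested) here.
docker buildx build \
  --platform linux/amd64,linux/arm64 \
  -t registry.example.com/gallery/carousel:latest \
  --push .
```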
Because one of the promises of WebAssembly is that once you compile an application, you're targeting a virtual machine with a binary format specific to WebAssembly. That same module can run in any environment that provides a runtime that can load and run it. It means that once you compile it, you can run it on ARM64, on x86, on any new chip the runtime supports, without having to recompile the project.

You can think of Wasm Workers Server as a regular server that you deploy in your infrastructure: you drop some code inside, and it figures out what the code is, how to run it, and how to expose it to users. To give you a very brief picture of how Wasm Workers Server works: it's a server, as I mentioned, and on top of it sits a set of language runtimes which we precompile for you, so you don't need to do that work. We have QuickJS, which is the JavaScript runtime we're using for this specific project; we have Python; we have Ruby; and the good thing is that the community is moving forward and adding more languages over time. A year ago, when we started working on this, there were just a few languages supported in WebAssembly, and if you wanted to use a new one like Ruby, you had to actually do the work of compiling the entire Ruby interpreter to WebAssembly yourself. That was a huge chore in some cases, depending on the interpreter, and it took time. But now languages are keeping up with the new proposals: Kotlin, for example, announced good support that is much faster and much smaller; other languages, like Java, will come in the future; and as far as I know, Go, Rust, and Zig all have really good support for targeting WebAssembly. Wasm Workers Server is, in reality, a layer on top of an existing WebAssembly runtime, Wasmtime, which is the one it uses under the hood. So this server can run in many different environments. Today we're talking about Kubernetes, we're at KubeCon after all, but it's just one of the targets. If in the future you want a different approach for developing the same applications we're going to show today, you can run Wasm Workers Server on your laptop or on a different kind of server, and integrate it into your infrastructure in a way that makes sense for your use case.

So now let's go back to the project architecture. As we mentioned, we have two different apps, reactions and carousel, both written in different languages using different frameworks, and then we have SQLite. The work we have to do today is to migrate these two applications and convert them so that Wasm Workers Server can run them on top of WebAssembly. The good news is that the process is not really that complex. It depends on the application, of course, but in the case of the reactions application, the Hono one, which is the first we're going to migrate: the language is already supported, which is good, so you don't need to change languages and start from scratch; and the framework is pure JavaScript, so you can basically move the code over as-is.
And it has all the features this specific application requires, calling a third-party service, the SQLite database, and that's also supported in Wasm Workers Server. So we have everything.

The steps I'm going to show now: first, you need to identify the required capabilities, because by default any application on top of Wasm Workers Server has access to no resources at all. It cannot perform HTTP calls, it cannot access the file system, environment variables, anything. You need to configure and grant all these permissions one by one, keeping your application totally isolated from other elements in your infrastructure. Then you need to create a configuration file that defines these features. In this specific case, for JavaScript, we currently don't support multiple modules the way you would with import/export in the JavaScript world, so you need to bundle everything into a single file, as you would for a front-end application. Then you define the application structure, meaning how you're going to expose the API and which endpoints you use, and then you run the application. That's all.

So let me see if I can play the video. There it is. (Do you know how to pause it in the middle? Okay.) So I'm going to show you how this works pretty quickly. The first thing I want to show is the code. It's a pretty simple project: you have a package.json that defines the project dependencies, an app.js file that contains the basic server logic that runs on Node.js, and then all the different pieces like the database connection and the handlers for this specific application. If we look at the package.json, it's pretty standard; it requires Hono, which we mentioned before, and the Node server package for exposing it. If we look at the app file, it's a Hono application, a framework similar to Express or Flask but for JavaScript. It contains all the handlers that reply to the different API endpoints: it configures the database at the top, then prepares the database to start receiving information, and at the bottom are all the API endpoints this specific API replies to. The next thing is the server file, which is just the Node.js piece for taking this application and running it.

Now let's take a look at how this works. First I set up the database; I'm going to use Docker Compose locally. I start the project using Node.js with npm start, and then I call the base API and get status OK from the main endpoint. So now let's migrate this project to Wasm Workers Server. We can do it in different ways; the one I'll explain is bundling everything together. Another option is to split the application code into separate pieces, but to avoid having to rewrite anything, we just bundle it into a single file. For that I'm going to use esbuild with the ECMAScript modules format and output it into this folder: just 42 kilobytes. And then I need to create the configuration file, which I already created before.
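A sketch of roughly what that worker configuration file contains, based on what the demo walks through next (this is an approximation; key names may differ, so check the Wasm Workers Server docs for the exact schema):

```toml
# index.toml: sits next to the bundled index.js worker (sketch only;
# key names approximated, see the wws documentation for the real schema).
name = "reactions"
version = "1"

# Environment variables exposed to this worker alone (value illustrative).
[vars]
DATABASE_URL = "http://db:8000"

# Capabilities are deny-by-default; grant outbound HTTP explicitly.
[features.http_requests]
allow_http = true                 # allow plain HTTP, not only HTTPS
allowed_hosts = ["db"]            # hosts this worker may call
allowed_methods = ["GET", "POST"]
```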
So this configuration file is associated with the application we just created, and it enables the different features that are required for this application. For example, here we have the environment variables we want to configure. And then we have the HTTP request feature, which is one of the features Wasm Workers Server provides to applications. Here we're allowing plain HTTP connections, not only HTTPS, and then the specific hosts this application is allowed to access. It means that if the application tries to access an HTTP endpoint that is not declared in the configuration, the call will fail, because the worker is totally isolated: every call has to go through the host, which validates it before it reaches the service.

So now we have all the pieces we need to run the same application in Wasm Workers Server. The first thing is to install it, which is a single command; it's a CLI that you download and install locally on your system. I have it installed already, so I won't run that. Then we define the folder we want to serve. At this point you may be wondering how Wasm Workers Server knows which endpoints this application should reply to. Here I'm using JavaScript, but I could drop Python scripts and Ruby scripts into the same folder and they would also reply to different requests. For this, Wasm Workers Server uses a pattern called file-system routing: it automatically maps the files it identifies as functions, what we call workers, and exposes them as an HTTP API. In this case we're using the dot notation, a catch-all, to ask Wasm Workers Server to send every route on the server to this one worker. If we instead wrote a file called hello.js, Wasm Workers Server would expose it at the /hello endpoint. But here we want all requests to go through the same worker, so the catch-all pattern is the one to use. So now I run Wasm Workers Server, in this case passing the port and the folder I want to serve as the project. And if I call the same API, I get the same result. Not very exciting, but it works: we migrated one application without changing a single line of project code, just by bundling all the JavaScript files into a single one and preparing the configuration for Wasm Workers Server.

The next thing we need for deploying to Kubernetes is distributing that file. For this step, we create a container image that we can push to an existing registry and then pull into our Kubernetes cluster to run the application. The good news here is that Wasm Workers Server can be installed as a shim inside the Kubernetes cluster, as Francisco will demonstrate later. So the only things you need in the Dockerfile are the project files: you have no dependencies, no runtime, nothing else to install. Just start from scratch, copy the files, and point to the worker you want to serve (sketched below). That's all you need. So let's go back. Okay, we're back. That was the JavaScript application; for the Python application, the process is really similar.
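For reference, the from-scratch Dockerfile just described might look like this (paths illustrative):

```dockerfile
# No base OS, no language runtime: the wws shim on the node provides those.
FROM scratch
COPY ./dist/index.js /index.js
COPY ./dist/index.toml /index.toml
```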
In fact, if we check the list of steps, the ones at the top, identifying the capabilities and preparing the configuration file, as well as defining the project structure and running it, are common to any language. What sits in the middle is specific to every language: for JavaScript we had to bundle all the files into a single one, and for Python we need to mount the libraries we want to use, for example the Flask libraries, and then configure Python to read them. Just because of time, we're going to skip the Python part.

Let's continue and talk a little more about the benefits and challenges. When you work with Wasm Workers Server, and the WebAssembly ecosystem in general, the first thing you get is that the application you distribute only includes the code you care about. All the layering, all the plumbing for exposing things, is handled by Wasm Workers Server; you only need to care about the business logic that matters for your use case. The second thing is fine-grained permissions: every individual worker has access only to the resources you grant it explicitly. And you don't need to install a runtime locally, because we distribute a set of runtimes like Python, Ruby, and JavaScript with Wasm Workers Server; you just install Wasm Workers Server and that's all you need.

So what are the challenges? There are always challenges. Not all languages and libraries are supported; this is not specific to Wasm Workers Server but true for the entire WebAssembly ecosystem. The community is growing and new languages are coming, as we said, so this will get better in the future. Also, performance is currently worse: response times are higher when you're running on Wasm Workers Server. I'll explain in the next slide why that happens and what we're doing to improve it. And as a general point, as Francisco mentioned, debugging and monitoring are not solved today for Wasm.

The main reason for the performance gap is that many of the applications you run today expect a long-running process. For example, with Python and Flask, when you boot the server it loads all the libraries once and then waits for requests. In the model we use with Wasm Workers Server, however, workers are invoked per request. On the good side, that means it can scale down to zero: if no user is making requests, you're not consuming anything on your server. But every time someone hits the server, it needs to load everything and produce a reply. What we're doing today is finding ways to speed up this initialization, for example with module pre-initialization, caching, and sharing resources so there's one runtime shared across all your workers. That's something we'll keep improving in the future.

With that said, let's go back to Kubernetes and deployments. The idea now is to migrate the initial application from Linux containers to WebAssembly modules.
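In Kubernetes terms, the switch is small. A sketch of the two pieces involved, a RuntimeClass pointing at the wws containerd shim and a Deployment using it (names and image are illustrative):

```yaml
# Register the wws containerd shim as a runtime class...
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wws
handler: wws              # must match the shim configured in containerd
---
# ...and point the workload at it.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: reactions
spec:
  replicas: 1
  selector:
    matchLabels:
      app: reactions
  template:
    metadata:
      labels:
        app: reactions
    spec:
      runtimeClassName: wws
      containers:
        - name: reactions
          image: ghcr.io/example/reactions:wws   # scratch image: worker + config only
```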
So as you can see here, it's the same architecture, but now we have Wasm Workers Server, and on top of that the two WebAssembly modules, plus the same database. Going a bit more into the details: you have the Kubernetes cluster, and if you look at the two deployments you can see two things. First, the runtime class: runtimeClassName is set to wws, for Wasm Workers Server. We're saying: when you download this container, run it with this specific shim, which already contains Wasm Workers Server. And second, the images: reactions with the wws tag, which is the Wasm Workers Server container that is much smaller, as we'll see.

So I'll go quickly through the demo. We start with the Kubernetes cluster, and I'll first apply the Linux version. All the services get created, and I can see the pods coming up. I deploy the database first, and now I deploy the two Linux containers, the Flask app and the Node.js app. Both are running. Now I check the service where I have my carousel Python Flask app, open it, and verify it's working. I'm going to upload a picture and click a couple of reactions just to make sure everything works. There it is: I uploaded a picture, pressed a couple of times, and I get the reactions, which are being stored in the database. That's with Linux containers.

The idea now is to move to WebAssembly modules. I've already deleted the previous deployment, and now I'm applying the Wasm deployment. Same approach: the same database, and what changes is moving from the Linux containers to these Wasm containers. Let's make sure they're running. I can check the logs: it's loading the workers; in the case of the Node.js app it was just one route, reactions, where you read from the database. They're loaded. I'll also make sure the carousel is running. Yes, it's running. And now I switch to the new service, the one targeting Wasm Workers Server, which you can see is on a different port. The experience should be exactly the same. So I open this new service that targets Wasm Workers Server, and we should see the same thing. The only difference here is that the image isn't shown, because we're storing the images directly in a folder inside the container, and that differs between the Linux and the Wasm containers. But as soon as I upload a new picture, we'll see the same data from the database. I upload the same picture, and, as you can see, the performance is a bit slower, but you see the same picture.
And it has the same reactions, the same data stored in the database.

Just to wrap up, let's go back. Everything is in our GitHub repository, and we also include a way to add a new AI service. In the end it's really simple: you just add one extra worker, and right now we have an example of how to do that with Rust and Wasm. Then you can connect it and get, for example, in the demo we uploaded, image tagging: every image you upload gets a tag.

To conclude, and to leave some time for questions: we're providing the same feature set, so you get the same features in both Linux and WebAssembly. Because we're using WebAssembly, we have a better security posture; I already mentioned deny-by-default and how you need to approve every feature you want to enable. You have fewer CVEs, because your base image is now scratch; there is no base image to patch. We improved portability: these modules can move around ARM64, x64, Windows, Linux, and should work anywhere. And finally, you get reduced container size and memory usage. In our example, the reactions Node.js Linux container was originally 86MB; with the Wasm version you get down to 125KB, a 688x reduction. Same with the carousel app: the original Python app was 264MB, and the Wasm version goes all the way down to 17MB, and most of that is Bootstrap and the styles; without those it would go way down too, same as the Node.js one. So that's it, and thank you. Any questions?

[Audience question] Yes. So actually, one of the benefits of Wasm Workers Server is that, the same way you mount the entire reactions or Python app at one specific root, say /whatever, you can also mount a second application at /api with the catch-all pattern. By doing that, you can deploy just one container with the two applications together: everything under /api is answered by the Python Flask application, and everything else by the main JavaScript application. The good thing is that this also gives you a way to start moving specific endpoints to a new language in a more iterative way, for example if you're migrating because you need to rewrite in a different language. You don't have to build a new service from scratch while maintaining the legacy one and then move things over; you can do it directly.

[Audience question] So the question is how this serverless Wasm approach combines with the Kubernetes space, right? Okay. The thing is that not all projects in the WebAssembly ecosystem work in a serverless way; Wasm Workers Server happens to. When we say serverless, it's more about how you architect the application than how you deploy the specific apps. In this case we're mounting all the endpoints on the same path, so you have one application with an internal router that redirects each request to the right method. But another option is to split those methods into individual workers, each with its own request and response.
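A sketch of what one of those individual workers could look like, using the file-system routing described earlier, e.g. a file api/reactions.js answering /api/reactions (the API shape here is approximated from the Wasm Workers Server JavaScript examples; check the docs for the exact interface):

```js
// api/reactions.js: one file, one endpoint, its own request/response pair.
// wws JavaScript workers follow a service-worker-style fetch API (sketch).
addEventListener("fetch", (event) => {
  event.respondWith(handleRequest(event.request));
});

async function handleRequest(request) {
  // Illustrative payload; a real worker would query the allowed database host.
  const body = JSON.stringify({ reactions: 42 });
  return new Response(body, {
    status: 200,
    headers: { "content-type": "application/json" },
  });
}
```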
So in this specific case, what WebAssembly brings to the ecosystem is more about how you can run applications in a portable way, so you can move them between different places, and how you can shrink those containers from carrying all those base resources and libraries down to a more streamlined form, just the binaries you want to run. So in general it depends on the application you want to run; you have both options. It's just another way to run applications in Kubernetes: you have the shim running there, and you can run the same resources in a different environment. Okay? So thank you very much.