Hi everyone, and welcome to our talk here at the Emerging OS Forum at Open Source Summit Latin America. I'm Shivay, here with my friend Rishit, and we are going to be presenting on the topic of WebAssembly-based AI as a service. A very quick introduction about ourselves: I'm Shivay, a developer advocate at Meilisearch and also a contributor to Layer5 and projects like Meshery, which is under the CNCF. Hello, I'm Rishit Dagli. I'm a high school student and an incoming student at the University of Toronto. You'll often find me working on machine learning, and I've contributed to multiple open source projects and created my own, including but not limited to TensorFlow, Kubeflow, and Kubernetes. So, we can move to the main part of our slides, and that is, of course, comparing Python and Rust when it comes to machine learning. Traditionally, Python has always been the number one language for doing any kind of machine learning inference or for actually writing machine learning models. But there are a few reasons why Rust might be a better language to select for certain machine learning tasks. One of the biggest reasons Rust is overtaking Python is performance: Rust is compiled directly to machine code, so there is usually no virtual machine or interpreter sitting between the code and the computer, which is the case when we are working with Python. Another key advantage of Rust over Python is thread and memory management. While Rust does not have a garbage collector like Python does, the Rust compiler enforces checks that catch invalid memory references at compile time, and that is where it really shines compared to Python. In the study by IBM shown in the picture here, we can see how Rust, when paired with WebAssembly, performs up to 12 to 15 times better than Node.js, and approximately 25 times better than Python. So there are certain use cases where it can be beneficial to use Rust instead of Python. Now, how does WebAssembly actually come into the picture? Well, first of all, what exactly is WebAssembly? WebAssembly is a compilation target that provides executables that can run at native speed in really small, efficient containers, while being portable at the same time. We don't have to install and run local dependencies for each platform we target: since it is portable, it can run virtually anywhere, and it is highly secure at the same time. And it's really great that multiple languages such as C++ and Rust, and even scripting languages like Python and JavaScript, can be compiled directly to WebAssembly as the compilation target.
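As a quick illustration of that compile-once idea, here is a minimal sketch, assuming a Rust crate (hypothetically named hello) and an installed WasmEdge CLI:

```bash
# Add the WASI compile target, build the crate to a portable .wasm binary,
# then run it in a Wasm runtime instead of as a native executable.
rustup target add wasm32-wasi
cargo build --target wasm32-wasi --release
wasmedge target/wasm32-wasi/release/hello.wasm
```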
And since the code can run anywhere, it is also a natural fit for things such as machine learning inference, letting us deploy machine learning on various platforms with the help of WebAssembly. Now, moving on to why WebAssembly is expanding into the cloud native space: WebAssembly is expanding into the arenas of machine learning, the web, and cloud native, and it is being directly leveraged by popular CNCF projects like WasmEdge and wasmCloud. For those who might not be aware, WasmEdge is a lightweight, high-performance WebAssembly runtime that has been specially designed for edge and cloud native applications. With WasmEdge we can, for example, provide serverless functions that can be embedded into multiple software platforms, or run applications on the edge. There is also a great comparison from Tilburg University where they compare executing machine learning with WebAssembly versus Docker. It clearly shows that with the WebAssembly implementation, for both model creation and evaluation, the container size and the execution time of the machine learning inference were smaller than with the Docker implementation. So for smaller, edge-based applications, leveraging WebAssembly — especially on edge devices — is much more lightweight, both in footprint and in overall inference time. Of course, we can also use WasmEdge, and WebAssembly specifically, for edge applications and edge functions. Now, one of the reasons to use WebAssembly, and specifically WasmEdge, for serverless computing is that high-performance functions written in C++ or Rust can be compiled directly to WebAssembly, and those WebAssembly functions are much faster than functions written in the languages more commonly used for serverless today, such as JavaScript or Python. There are a few other benefits of using WebAssembly for serverless functions as well. One of the biggest, as we mentioned previously, is that WebAssembly bytecode is really portable: developers don't have to make modifications to their serverless functions to run them on multiple platforms — multiple OSes or multiple edge devices — unlike, say, serverless functions written in JavaScript or Python that can be more dependent on the OS and hardware they run on. Essentially, the same WebAssembly functions can be reused by developers across various cloud environments. They are also much more secure: we can extend the WebAssembly sandbox security to these edge and serverless functions, which helps us run these serverless functions in a very safe environment.
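To make the serverless shape concrete, here is a hedged sketch of the stdin/stdout invocation pattern commonly used for Wasm serverless functions; the file names are hypothetical and this is not necessarily the exact setup used in this talk:

```bash
# The function is just a .wasm binary with no guest OS: pipe the request
# payload in on stdin and read the response back from stdout.
wasmedge function.wasm < request.jpg > response.json
```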
That means you can do things like machine learning and AI inference at full native speed, with the security provided by the WebAssembly sandbox. So those are some of the benefits you get when using WebAssembly compared to standard serverless functions written in JavaScript or Python. In this slide we have benchmarked different environments for a very popular machine learning model, MobileNet V2, comparing performance in terms of how much time inference actually takes. You can see the different environments we used: WasmEdge; TensorFlow Lite, which is very popular for machine learning on mobile devices; standard TensorFlow with Python; and TensorFlow.js, which lets you run machine learning models on the web. As you can see, the least time was taken by WasmEdge with ahead-of-time compilation. So specifically for edge devices, the inference time for WasmEdge was the lowest, which makes it one of the most performant options for edge use cases — that's why WasmEdge is really popular. So as you can see, WasmEdge is really secure and also very much optimized for edge-based applications. For the demonstration we have today, this is our tech stack: we'll be using Rust, WebAssembly, WasmEdge, and Vercel for the serverless functions, and also Kubernetes. So now we come to the interesting part — demos. We'll see a couple of demos using WebAssembly, we'll see how you can deploy it as a function as a service, and finally we'll see how you can use Kubernetes to manage it. The code for all of these demos is already open source; you can find it on GitHub at this link. So, on to the demos. The first demo is running a WebAssembly app locally using WasmEdge, with both JavaScript and Rust examples. For JavaScript we'll be using the wasmedge-quickjs interpreter, a runtime that can run JavaScript in WebAssembly. An obvious question you might have is: is it slower than V8? Well, the QuickJS interpreter is slower than V8, but under the specific conditions where you would actually use it — especially high-performance computing — wasmedge-quickjs can be very helpful. Machine learning is a prime example, which is what we'll be seeing. I also have all of this in a GitHub repository to make it easy for you to follow along. The first thing we'll do is add the wasm32-wasi target. WASI is essentially a standardization of system calls: it provides consistent system calls so you can use Wasm outside the web. So we'll add this target and then build wasmedge-quickjs. Notice that we also pass --features=tensorflow, which lets us add the TensorFlow and TensorFlow Lite extensions very easily, which is what we want to do. I've already built this, which is why it didn't take us a lot of time.
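For reference, a hedged sketch of those build steps — the repository URL is assumed, and the feature flag is as mentioned in the talk:

```bash
# Add the WASI target, then build the wasmedge-quickjs interpreter with the
# TensorFlow / TensorFlow Lite extensions enabled.
rustup target add wasm32-wasi
git clone https://github.com/second-state/wasmedge-quickjs
cd wasmedge-quickjs
cargo build --target wasm32-wasi --release --features=tensorflow
```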
But what I want to show you is — let's just quickly cd into target/wasm32-wasi/release, and let's do an ls over here. What you can see is we have a wasmedge_quickjs.wasm file: this is the wasmedge-quickjs interpreter, which we'll be using to run any JavaScript with WebAssembly. So now what we want to do is take a look at a JavaScript example and run it locally. I have a couple of them over here — you can try out any one, but for now let's try the MobileNet V2 demo. And I'm in the wrong directory, so let me go a couple of directories back. Okay. Now let's go to the MobileNet V2 bird directory. This is a very simple MobileNet model that classifies different bird species. And if you see, we have the wasmedge_quickjs.wasm interpreter here. The JavaScript code we have is pretty straightforward as well: we use the TensorFlow Lite APIs to load the image we have, do whatever preprocessing we need on it — here it is resizing it to 192 by 192 — then load my TensorFlow Lite model and simply run the inference with it. But you can of course have different preprocessing or post-processing steps as you need. So let's actually try running the JS example locally. What I'll do for that — oh, I have a command here, and feel free to use these commands while trying these demos out for yourself. What this command essentially does is run the Wasm application locally using the wasmedge-tensorflow-lite utility. It mounts the current directory and then passes it the Wasm file — the QuickJS interpreter — and the main.js file, which is the JavaScript code you want to run. So that was the JavaScript example and running it locally with WasmEdge. Let's now take a look at the Rust example. The idea is to run the same model in Rust that we just ran in JavaScript — and this is actually the code on which the benchmarks we saw earlier were based. So yes, it's the same model. Let's go to the Rust MobileNet V2 directory. Now that I'm in the directory, the first thing I'll do is build this application, and this shouldn't take a lot of time because I've already built it earlier. Yes — so now I've built my application, and something you'll see again is that if I go to target/wasm32-wasi/release and do an ls over here, I have the classify.wasm file, and classify.wasm is what we'll be running when we want to run this Rust app with WebAssembly. So let's start by running the classify.wasm file locally with WasmEdge. Another thing you can do is AOT compile it down: a lot of WebAssembly's speed actually comes from ahead-of-time compiling it — essentially converting the .wasm file to a .so file, the Linux shared library format. That is machine specific, so it is dependent on the machine; it's essentially machine-native code. You can do that with wasmedgec-tensorflow, but we'll see that in another demo.
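For reference, a hedged sketch of the run and build commands from this demo; the directory layout, file names, and exact flags are assumed from the demo repo:

```bash
# Run the JavaScript demo: mount the current directory into the sandbox and
# hand the QuickJS interpreter our main.js.
wasmedge-tensorflow-lite --dir .:. wasmedge_quickjs.wasm main.js

# Build the equivalent Rust demo to a .wasm binary and locate it.
cd rust_mobilenet_v2
cargo build --target wasm32-wasi --release
ls target/wasm32-wasi/release/classify.wasm
```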
Right now, what we can do is simply use the wasmedge-tensorflow-lite utility to run the Rust app locally. Hmm, it seems I don't have — oh, okay, I see: I put in the wrong image; that one is for another demo. You can feel free to try out the other examples I have in this directory, but the image we actually want here — because this is the same model — is the bird image. So let's put in that image and run. Okay, I got the path wrong over here: I don't have a classify.wasm file in this directory; it's in target/wasm32-wasi/release. Now this should work — yes. So it actually takes 215 milliseconds to run this MobileNet V2 model, and it correctly predicts the result as well. Just like the JavaScript demo we showed, this was able to classify it correctly. So that's good, but a lot of WebAssembly's speed can also come from ahead-of-time compiling it down, so you could get some really great speedups on this benchmark as well — and we'll see how you can ahead-of-time compile it too. Let's now take a look at how you would deploy this as a function as a service. Well, that is pretty simple too, and the first thing we'll do is go to the function-as-a-service directory. I'd love to show you how similar it is. I have a function over here, the image classification function. If you see my directory structure on the left, I have the image classification function, and I'll just open up the Rust source. This is essentially the same Rust source we used earlier: it makes use of the WasmEdge TensorFlow interface, loads my MobileNet model, and does any kind of preprocessing that's required — in this case we just have a resize, to 224 by 224 pixels. Of course, the preprocessing or post-processing steps could differ. And then I actually get the output from a particular node. So this is really similar to the Rust example we saw running locally earlier. First, we'll build this application — I have some commands over here to do it for me, so let's just run these. What this does is build the Wasm application for our function as a service. And remember the .wasm file that gets created: because we just built it using the wasm32-wasi target, this will also create a .wasm file, and what we want is to have that .wasm file in the root directory from where we are deploying the function. So that is just what we are doing: simply copying the .wasm file, which is in target/wasm32-wasi/release, to the root directory. So now let us try deploying the function. But before I do that — okay, so this is my function, and I have my classify.wasm file over here, which is what we just compiled. I also have a shell script here. What this script does is use wasmedgec-tensorflow — remember, wasmedgec-tensorflow is used to convert .wasm files into ahead-of-time compiled files. So this takes all the .wasm files — in our case just one, classify.wasm — and converts it to a .so file, the Linux shared library format. The ahead-of-time compiled file is machine-native code.
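A hedged sketch of that ahead-of-time step, with utility and file names as described in the talk; whether the runtime takes the image path as a trailing argument is an assumption:

```bash
# AOT-compile the portable .wasm into machine-native code (.so), then run
# the compiled artifact — same program, faster startup and execution.
wasmedgec-tensorflow classify.wasm classify.so
wasmedge-tensorflow-lite --dir .:. classify.so bird.jpg
```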
And because it is machine-native code, it can only run on a particular set of devices — and that also gives rise to something called the WebAssembly universal format, which is the .wasm file together with the ahead-of-time compiled native code. So we'll ahead-of-time compile our code this time as well. Let's run this, and what we'll do is simply use vercel deploy, which will let us deploy this as a Vercel function. Of course, you should have the Vercel CLI already installed. Let's open this link — this is my Vercel dashboard, where I can see that it's already building. It might take a moment or so to build, because we just deployed the function-as-a-service application right now and it still has to ahead-of-time compile the .wasm. And let's see — oh, if you see now, it actually started ahead-of-time compiling the .wasm files down into .so files, and the build is actually done. We should now be able to check out the deployed function as a service. So let's select a photo — I'll actually select a pretty popular photo — and click on "classify with wasm". This is actually calling the Rust function we wrote: the MobileNet model we wrote using Rust. Note that this is not the same bird model we were using earlier to classify between different bird species; this is a generic model trained on the ImageNet dataset, so it's able to classify other things apart from birds as well. And when we click on "classify with wasm", it calls our Rust code, which does the machine learning inference, and it classifies this as a comic book. So that was our second demo: deploying it as a function as a service. The next demo we'll be seeing is running WebAssembly apps and managing them with Kubernetes, side by side with Linux containers. So let's take the Rust MobileNet V2 demo we saw earlier. Because I already have that .wasm file, I'll just go down to it: let's go to target/wasm32-wasi/release — here I have the classify.wasm file. Let's create a Dockerfile over here and open it. This one is already pre-populated for you, but what it essentially does is simply run the .wasm file. So that is all our Docker image contains — that is all the Dockerfile contains. Now, when we build the image, we also need to specify that we don't want the image to have a guest OS, because we are running it with WebAssembly. What you can do is use something like buildah and simply add an annotation to say that this is a WebAssembly image and we don't need a guest OS. So let's just use buildah to build the Docker image — and I have actually built the image now. You could now easily do something like a buildah push and push it to any registry you want; a sketch of these steps follows below.
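Here is a hedged sketch of that Dockerfile and the buildah build. The annotation value is the one documented for Wasm images with a WasmEdge-enabled crun runtime, and the image tag and registry are hypothetical:

```bash
# A scratch image containing nothing but the .wasm binary — no guest OS.
cat > Dockerfile <<'EOF'
FROM scratch
ADD classify.wasm /
CMD ["/classify.wasm"]
EOF

# Build with an annotation marking this as a Wasm image, then push it
# to whichever registry you like.
buildah build --annotation "module.wasm.image/variant=compat" -t classify-wasm .
# buildah push classify-wasm docker://registry.example.com/demo/classify-wasm
```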
Let's take a look at an example image that is already out there. There is a pretty popular image maintained by WasmEdge, and this image is essentially made to try out WASI. It's a very simple application that prints some random output, opens a file, creates a file — a very simple WASI example — and an image for it already exists. I already have a kind Kubernetes cluster created, and what I'll try to do is just run this WebAssembly image. So what we have over here is the WASI example, and this runs a WebAssembly image in the kind Kubernetes cluster we already created. This is the sample application I was talking about, and it's running inside a kind Kubernetes cluster. So, yeah — that was the final demo, about how you can easily create images from your WebAssembly apps, run those images inside a Kubernetes cluster, and manage them with Kubernetes. Now, specifically on why you would use Kubernetes with WasmEdge: Kubernetes-managed microservices and edge services have gained quite a lot of popularity, but there is a growing demand for much lighter-weight containers appropriate for workloads such as the ones on edge devices, because we know edge devices are usually constrained in terms of resources. Traditional Linux containers are not the most optimized for edge applications, because they are generally large in size, and the compute time needed to set up and run them is also large. If we compare WebAssembly-based containers, they have a smaller footprint in terms of size and also in the amount of time it takes to boot up and start using the container. In terms of size, WebAssembly applications are typically just 1% of the size of a standard Linux application container, and in terms of performance they can be anywhere from 10 to 100 times faster than your standard Docker-based Linux containers. So that is why WebAssembly is a really good fit for the Kubernetes ecosystem. And with that, we come to the end of our presentation. You can reach out to us on our Twitter handles in case you want to get in touch or have any questions about our presentation today — but now we'll be open to questions. We did collect some questions about our talk from the audience, and we'll take up a few of them here. One of the most common questions we got is: is WebAssembly used outside the web? Of course, if you look at the terminology — WebAssembly — it's often confusing to those who hear it for the first time, since it's associated with the web. But actually, the term WebAssembly itself is not really tied to the web at all. We can break it down into "web" and "assembly"; the assembly part comes from how WebAssembly is actually built — as you know, it's a binary instruction format.
Originally, the focus of WebAssembly was mainly the web: being able to take other languages such as C++ and Rust, compile them into this bytecode, and run more computationally heavy tasks inside the browser. But since then, the WebAssembly ecosystem has expanded a lot on the server side and on the edge, and today you actually see more applications of WebAssembly outside the web than on it. Whether it's making traditional back-end frameworks like Node.js more performant, or running your standard services on WebAssembly instead of Docker — WebAssembly today is being used as a container. And of course we are seeing lots of different applications, like running highly computational tasks on the edge as well, thanks to WebAssembly's smaller footprint and the security it provides. In this talk we showcased an example of WebAssembly being used for serverless programming too, specifically machine learning as a function as a service with the help of WebAssembly. So the overall WebAssembly ecosystem is expanding into a lot of different use cases outside of the browser. There are a number of runtimes you can take a look at, including WasmEdge and wasmCloud, and we can even spin up microservices with the help of projects such as Spin and Neon, so you can take a look at those as well. From running simple AI-based services to spinning up entire microservices, you'll see WebAssembly used across all these different use cases, especially on the server side and the cloud native side. The next question we have is: could you shed some more light on WASI and how it is used? Okay, so the WebAssembly System Interface is at the forefront of bringing Wasm to platforms other than the web, and it plays an important role when you try to do that. A particularly interesting analogy I could give is this: think of how programming languages give you access to, let's say, system calls.
A system call could be something like opening a file, deleting a file, creating a file — all of that and a lot more; those are just a couple of examples of system calls. You might think a programming language directly gives you access to all of these system calls, but it doesn't. Let's say you are writing some C or C++ code to open a file: what you essentially do is use the C/C++ syntax to open a file, and under the hood that is translated into a system call, asking the kernel whether it can perform it. And why does the kernel come in between? Because all of these tasks are way too important: you need to ensure that the right user with the right privileges is doing them, and that they aren't affecting the performance of any other processes that might be running. So all of these system calls are pretty important, which is why in a traditional setting you have a Linux kernel in between. The WebAssembly System Interface essentially standardizes these system calls. So if you are writing some C, C++, or Rust code to be compiled to WebAssembly, you would want to use the WASI system calls. And WebAssembly doesn't just stop at porting your source code — you can most certainly port your source code, but even after porting it, you would normally still have to compile it into a machine-specific format. The WebAssembly System Interface goes beyond that: you can use any language you want, compile once, and then run it in any Wasm-recognizing runtime. That is what the WebAssembly System Interface allows you to do, and this OS-level emulation is actually highly beneficial — more so than traditional sandboxing. So the WebAssembly System Interface is very useful, especially in bringing WebAssembly outside the web, and it standardizes all of these system calls. Alright, so the next question is: what are the alternatives for AI-based function as a service, and why might the WebAssembly approach be better? We have seen the rise of a lot of different serverless offerings, which is why function as a service has become really popular. Traditionally, in most use cases, if you are using any of the major cloud platforms such as AWS, GCP, or Azure, you are aware of things like AWS Lambda, which allows for the creation of very easy serverless functions — and these serverless functions can be used for anything. There are some really great examples of AI-based services built with things like AWS Lambda; essentially, all of these cloud platforms let you run the inference directly inside the cloud provider and give you serverless functions that handle the training and inference for you. Now, the reason WebAssembly is a better approach comes down to the fundamental way WebAssembly works. The first point is that WebAssembly is highly portable. Traditionally, if you were to migrate your entire AI-based function as a service from one platform to another, there would be certain platform dependencies due to which you would have to make changes to your functions as well. But with WebAssembly, since it is highly portable, it's usually just a single target: once you have compiled your high-performance languages such as Rust or C++ into that bytecode, the artifact you have
generated can actually run across multiple OSes and multiple systems. So you don't have to worry too much about having to edit or change the configuration of your function as a service. The second point is the security model — the sandboxing model — on which WebAssembly is based. What that means is that your entire code runs inside of a sandboxed environment, and that makes WebAssembly highly secure. So whenever you're running any kind of AI-based inference, it will be running inside that sandboxed environment, making it a lot more secure compared to traditional function-as-a-service offerings — say, if you're traditionally using something like a Python-based or JavaScript-based serverless function. So those are the two major reasons why the WebAssembly approach is actually better than the alternatives, and it especially shines if you are running your function as a service on low-compute platforms, compared to some of the other options that currently exist for function as a service for AI-related tasks. The next question we have is: how does WasmEdge integrate with the JavaScript interpreter? This is an interesting question, because you can also have WebAssembly running side by side with any JavaScript business logic you might have. What we can do is use the Wasm bindings. The demo we showed covered how you can write JavaScript code and run it as WebAssembly, but we actually didn't see how you can run WebAssembly side by side with JavaScript. That is pretty easy to do with the Wasm bindings, which allow you to have the JavaScript interpreter and your JavaScript code side by side with the WebAssembly in WasmEdge. So you can very easily have both of these together and create JavaScript applications with embedded Rust functions — the link between the JavaScript and the WebAssembly is handled by the Wasm bindings, and you can take a deeper look at this as well. So those were all the questions we had. Thank you so much, everyone — we hope to see you in person at the next Open Source Summit Latin America. Thank you.