I hope the network permits, because I have been trying to connect for a while. I think you're watching now. Yes, so far. Thank God for that. Yes, our next session is going to be focused on WebAssembly-based AI as a service with Kubernetes. It will be presented by Shivay Lamba and Rishit Dagli. Rishit Dagli is a high school student from Mumbai, India, and loves working on machine learning, especially computer vision, and Kubernetes. You can always catch Rishit working with Android. He is an active contributor to multiple open source projects like TensorFlow, Kubeflow and Kubernetes. Shivay Lamba is a software developer specializing in DevOps, machine learning and full stack development, and also an open source enthusiast who has been part of various programs such as Google Code-in and Google Summer of Code as a mentor, and is currently part of the MLH Fellowship. Over to you both.

Welcome to our talk. I'm Shivay, here with my friend Rishit, and the topic of our talk is WebAssembly-based AI as a service with Kubernetes. A very quick introduction about myself: I'm Shivay, a developer advocate at Meilisearch, and I'm also a contributor at Layer5, which is part of the Cloud Native Computing Foundation ecosystem. Over to you, Rishit.

Hi, I'm Rishit. I'm a high school student and an incoming student at the University of Toronto. I contribute to, maintain, and create multiple open source projects, mainly in the machine learning ecosystem.

Talking about machine learning, the first thing I'd like to touch upon is why to choose Rust over Python for machine learning. Of course, we know that worldwide, Python is by far the most popular language when it comes to doing machine learning inference. But there are a few reasons why you might actually choose Rust over Python. One of the biggest ones is performance. Rust can be compiled directly into machine code; there is no need for a virtual machine or an interpreter, which is usually the case with Python. Another really big advantage is thread and memory management. You are probably aware of the global interpreter lock issue that still plagues Python. And while Rust does not have a garbage collector the way Python does, the Rust compiler enforces memory safety at compile time and prevents invalid memory references and leaks. These advantages make Rust much better in comparison to Python. As a matter of fact, there is a study by IBM which highlights how Rust and WebAssembly can gain 12 to 15 times the performance of Node.js, and a more than 25-fold increase in performance compared to Python. So that goes to show why you might choose Rust over Python, especially for machine learning.

And that brings us to WebAssembly, and how WebAssembly comes into the picture. WebAssembly is a compile target that essentially allows you to run executables at near-native speed and in very small containers. And these executables are portable, which means they can run anywhere. So you take one of your highly performant languages, such as C++ or Rust, and use WebAssembly as the compilation target. Because the binaries are portable, you only have to generate them once, and they can be executed anywhere without having to rebuild or redeploy them again and again. And you can also use scripting languages like JavaScript or Python and run them inside a WebAssembly executable.
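To make the compile-target idea concrete, here is a minimal sketch of building an ordinary Rust crate for WebAssembly instead of native machine code; the crate name is just a placeholder:

    rustup target add wasm32-wasi                    # add the WASI compile target once
    cargo new hello_wasm && cd hello_wasm            # any regular Rust crate works
    cargo build --target wasm32-wasi --release
    ls target/wasm32-wasi/release/hello_wasm.wasm    # a portable binary any WASI runtime can execute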
And the biggest factor is that since WebAssembly is a binary instruction format, it allows near-native decoding speed, which means it is much faster compared to other comparable runtimes. Specifically for machine learning, this makes WebAssembly a really great way to do machine learning inference as well.

Coming to how you can use WebAssembly within the cloud native ecosystem: WebAssembly is expanding within cloud native as well, especially because the CNCF today has a number of sandbox projects integrated, such as WasmEdge and wasmCloud. Speaking specifically about WasmEdge, it is a lightweight WebAssembly runtime mainly used for cloud native and edge applications. WasmEdge truly helps you bring WebAssembly to the edge, because it enables serverless functions, which we'll be talking about in a bit, that let you run WebAssembly use cases on edge devices. A researcher from Tilburg University published an article comparing machine learning inference running in WebAssembly against running it in Docker. Comparing the inference times between WebAssembly and Docker, there was an improvement of more than 5 to 10 times in performance for machine learning inference, while the WebAssembly containers were also smaller than the Docker containers.

Now, one of the reasons why you should also use WebAssembly for serverless: you must be aware that serverless computing is gaining a lot of popularity today, especially with edge functions provided by all the major cloud providers, like AWS Lambda, or Netlify and Vercel providing their own functions, and these edge functions add a completely new dimension to serverless computing. There are some really great reasons to use WebAssembly for these serverless functions as well. First of all, you can write highly performant functions in languages such as C and Rust, and these can be compiled directly into WebAssembly. These WebAssembly functions are much quicker than the JavaScript or Python functions commonly used for serverless, for example on AWS Lambda. There are some other benefits as well. One of the biggest, as we mentioned, is that WebAssembly bytecode is portable. That means that if you're using a WebAssembly-based function as a service, you only have to build it once, and you can then run those serverless functions anywhere, in any kind of cloud environment, without having to rebuild them again and again. At the same time, deployment of WebAssembly applications is very simple: there are far fewer platform dependencies required alongside a WebAssembly application compared to, say, JavaScript or Python based serverless functions. So that makes it very easy to work with these WebAssembly-based applications.
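As a small illustration of how light that deployment story is, this is roughly all it takes to install the WasmEdge runtime and run a compiled module; the install script URL is the one the WasmEdge documentation points to, and my_function.wasm is a placeholder:

    curl -sSf https://raw.githubusercontent.com/WasmEdge/WasmEdge/master/utils/install.sh | bash
    wasmedge my_function.wasm    # the same .wasm binary runs unchanged on any machine with a runtime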
With WasmEdge, we also build on the WebAssembly security model. As you may know, WebAssembly's security is very well regarded compared to other containers, because a WebAssembly module cannot really do anything outside its sandboxed environment. If you want it to interact with, say, the file system, you need to go through WASI, and this makes it a very safe environment. So you get that security model as well. For example, if you were to do AI inference as a function as a service, it can be very secure and still run at full native speed, because we get near-native performance from the WebAssembly executable. Those are some of the reasons why WebAssembly is preferred, especially for functions as a service.

This particular slide showcases the benchmark that we ran for one of the most popular machine learning models, MobileNet V2. We ran a performance comparison between various runtimes: WasmEdge, native TensorFlow Lite, WasmEdge with ahead-of-time compilation, and running the model in Python with TensorFlow. As you can see, the lowest inference time was with WasmEdge with ahead-of-time compilation, compared to the other runtimes. All of the inference times are in milliseconds, and you can see that the fastest one is WasmEdge.

The stack that we're going to be using for our demo includes Rust, WebAssembly, WasmEdge, Vercel, since we'll be showing how you can deploy a serverless function onto Vercel, and also Kubernetes. So now over to the demo. We'll meet you after the demo.

So now we come to the interesting part, the demos. The first demo we'll be seeing today is running a computationally intensive machine learning inference task in WebAssembly. We'll be seeing both Rust and JavaScript demos, and for the moment let's just see how to run this locally with WasmEdge. WebAssembly, as many of you might know, started out as a JavaScript alternative for browsers, to run high-performance applications and computations, like the machine learning inference we'll be seeing today, written in languages like C and C++, safely in the browser. So WebAssembly runs side by side with JavaScript, and to run JavaScript we actually need an interpreter. A lightweight but capable interpreter I've used quite often is the QuickJS interpreter, which lets you easily run JavaScript applications and also has really good support for TensorFlow and TensorFlow Lite models. Before taking a look at this QuickJS interpreter, you might wonder whether it will be slower than V8, because QuickJS does not have just-in-time compilation. But if you think about it, QuickJS is not only a lot smaller than V8, it's roughly one fortieth of the size. And the second part is that you only want to run some of your code in JavaScript, and call into the computationally intensive tasks, like machine learning inference or image processing, in Rust. So JavaScript will be calling the Rust functions for that. You can essentially have JS programs with extension APIs written in C++ and Rust, or C++ and Rust programs with embedded JavaScript. That is very much possible, which makes QuickJS a really nice fit for us.
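Before the build steps, one quick note that ties the WASI sandboxing point from earlier to the commands you are about to see: a Wasm module has no file system access at all unless a directory is explicitly pre-opened, which is exactly why the demo invocations below mount the current directory. A minimal sketch, with app.wasm as a placeholder:

    wasmedge app.wasm            # no pre-opened directory: the module cannot read host files
    wasmedge --dir .:. app.wasm  # maps the current host directory into the guest through WASI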
So first, let's take a look at building QuickJS, which you can do quite easily. Let's go to wasmedge-quickjs, and now we'll build the QuickJS interpreter. What I want to do is cargo build --target wasm32-wasi. You'll also have to add this target beforehand; I've actually already installed wasm32-wasi, but if you don't have it, you will need to install the target first. And I also want the TensorFlow-specific features, because we'll be using the TensorFlow extensions, since we'll be running a TensorFlow model. Oh, I made a typo there. I have actually built this already, so it didn't take a lot of time. But what I do want to show you is this: let's go to target/wasm32-wasi and list the files here. What you can see is wasmedge_quickjs.wasm. This is the JavaScript interpreter that we have.

So let's now go to our JavaScript example for MobileNet. We'll go to the JS directory and open the MobileNet example. Now we can start using the wasmedge_quickjs interpreter we just built, and we'll use the wasmedge-tensorflow-lite utility. That is a WasmEdge build with the TensorFlow and TensorFlow Lite extensions, and it makes it really easy to work with TensorFlow and TensorFlow Lite models. So let's do that: wasmedge-tensorflow-lite, and first we want to mount the current directory, so let's mount the current directory. Then we tell it which interpreter we want to use, which is the one we just built, and then we give it the JavaScript code.

Let's look at the JavaScript code we have. As you can see, it looks quite simple, and that is because of the TensorFlow Lite APIs as well. I start out by loading my image and doing any kind of preprocessing on it. Right now it is just resizing the image, but you could have more preprocessing steps as well. I then load my TensorFlow Lite model, and I get the predictions from a specific node in the model graph, which is what I'm doing here. The output I get is essentially a list of probabilities, and finally I take the index with the highest probability. So let's say I have zero or one as the index with the highest probability; each index corresponds to a particular species of bird, which is why I also have a label map down below. What this label map tells us is which index corresponds to which species of bird. For now, let's just print out the label. So this is the JavaScript code, which, as you saw, is pretty simple.

Sorry for the background noise. We just looked at the JavaScript code, so let's pass it the JS file, and it should give us the prediction. I also had a sample image here of a particular species of bird, and the TensorFlow model actually did predict it correctly. So that was the JavaScript example. There is another little thing which I wanted to talk about, but I'll come to that after the next example.
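For reference, a rough recap of the QuickJS part of the demo; the repository, the tensorflow feature flag and the example paths follow the wasmedge-quickjs project, but treat the exact names as assumptions rather than the exact commands shown on screen:

    git clone https://github.com/second-state/wasmedge-quickjs && cd wasmedge-quickjs
    rustup target add wasm32-wasi
    cargo build --target wasm32-wasi --release --features=tensorflow
    # the build produces the JS interpreter: target/wasm32-wasi/release/wasmedge_quickjs.wasm
    cd example_js/tensorflow_lite_demo
    wasmedge-tensorflow-lite --dir .:. \
        ../../target/wasm32-wasi/release/wasmedge_quickjs.wasm main.js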
So let's take the same model, and now let's see a Rust example. Let's go down to the Rust directory; this one is called rust_mobilenet. I'll show you this demo as well. First off, we'll start by building this, and we'll do it in a similar way to earlier. Where did the command go? Yeah. So let's first build this. I actually built it beforehand, so we don't spend a lot of time on these processes in the demo. But, similar to what you saw in the JavaScript example, we go to the wasm32-wasi target release directory and do an ls here. What I have is classify.wasm, and this is the .wasm file that I can use. What I can also do is AOT compile it, ahead-of-time compile it down to machine-native code, compiling it into a .so file, the Linux shared library format, and I can run that as well. Remember the benchmarks from earlier? This is actually the model on which those benchmarks were made. For the moment we'll run this with the wasmedge-tensorflow-lite utility, but WasmEdge also gives you a cool utility, wasmedgec, which allows you to AOT compile your code very easily and get a .so file you can use. And because the .so file is machine specific, this also gives rise to the universal format, which is the .wasm file together with the .so file.

For now, let's run the Rust code. I'm using the same model to do the inference with in Rust, and if you look at main.rs, the whole thing is pretty self-explanatory as well. I'm using the same birds model and performing the inference on it in exactly the same way as earlier. What I also wanted to do was show you how much time it takes to do an inference. So first let's just run this, using the wasmedge-tensorflow-lite utility like we discussed: wasmedge-tensorflow-lite, then the path to the .wasm file, which is classify.wasm, and we also pass in an image. This is the same image from earlier; the demo is essentially the same model shown in both JavaScript and Rust, which is what we are doing. This gives us the output in about 187 milliseconds, which is a bit much for a TensorFlow Lite model, particularly a small one. But a lot of the speed of WebAssembly also comes from ahead-of-time compilation, and if you remember the benchmarks we showed you earlier, ahead-of-time compilation can cut this down by 8 to 10 times, which is a very big increase in speed. So that is something you definitely want to try out with WasmEdge, and you can run it in exactly the same way. That marks the end of demo one.
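For reference, a rough recap of the commands from this Rust demo, including the ahead-of-time step just mentioned; the classify.wasm name matches the narration, while the image file name is a placeholder:

    cargo build --target wasm32-wasi --release
    # interpreted run, with the current directory mounted into the sandbox
    wasmedge-tensorflow-lite --dir .:. target/wasm32-wasi/release/classify.wasm bird_sample.jpg
    # AOT-compile to machine-native code, then run the .so for the big speedup
    wasmedgec target/wasm32-wasi/release/classify.wasm classify.so
    wasmedge-tensorflow-lite --dir .:. classify.so bird_sample.jpg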
Let's now go on to our second demo, which is deploying this as a function as a service. We have already seen how we can run a WebAssembly app with WasmEdge, so let's now try and deploy it as a function as a service. We'll actually be using similar code. We'll not be using the same model; you could put in the same model if you wanted to, but we'll be using the same category of model, built in exactly the same way: you have the label map, and you have the model from which you want to get the outputs. So let's go to the functions directory. Here we have the Rust code, and as you can see, this is essentially the same Rust code, just with the new model, and of course with different preprocessing steps than in our earlier model. For the earlier model we were resizing the image to 224 by 224, but of course any of the preprocessing or postprocessing steps might differ according to what model you have. We also have the hello.js file and the pre.sh file. Let's just go to the pre.sh file; what it does for us is take the .wasm files and ahead-of-time compile them to .so files, the Linux shared library format, so we can run them a lot faster.

So we'll take the MobileNet model and, as you might expect, we'll first go to the function which we have, which is image classification. Let's bring this up. We'll first go into image-classification and build it similarly to how we did earlier. Okay, let's build this, and our target is wasm32-wasi. I had built it earlier, which is why it just reused the compiled target very quickly. Now we'll prepare this: we have a classify.wasm file, like we did earlier, in the release directory, and we'll put it into the root directory of the function-as-a-service application. Oh, yes, I want to move the .wasm file, not the whole directory; so I'm just moving the .wasm file into the root directory here.

And what you can now do is deploy this to Vercel, which is the example we'll be taking a look at; we can deploy this to Vercel serverless functions. So let's just do vercel deploy. You already need to have the Vercel CLI installed. I'm in the wrong directory at the moment, so what I'll do is go down to the FaaS directory, which is where our classify.wasm file now is, and do a vercel deploy. This is actually building and then deploying it to Vercel functions, and we'll have our function ready. It is taking quite some time, so let's go to my Vercel workspace for this demo. Oh yeah, it gives you the build logs; that is what I wanted to see for the deployment. And it was actually able to build it. If you also look at when it ran the pre.sh script, it actually converted the code: you can see it uses wasmedgec to ahead-of-time compile our Rust code, so it can do the inference a lot more quickly. But you should remember that the .so file is machine-native, which is why it is only useful on a particular machine. You can also use the universal format, which carries the .wasm file that can be run in any WasmEdge sandbox along with the .so file that is machine specific.

So let's actually take a look at the deployment we just made; the deployment went through. We already looked at wasmedgec and why the code needed to be converted to .so files, so now let's open our deployment. Okay, there we go. This is the deployment which we just made, to Vercel functions, and let's try it on an image. This is an ImageNet model, not the same birds model. So let's try it on this image, which is a pretty famous image, and it actually tells me that it's a comic book. The ImageNet model has a different set of labels for detecting all kinds of images: ImageNet-1k has 1,000 labels and ImageNet-21k has 21,000 different labels for images. And that is the model which we deployed to serverless functions. As you might have seen, the inference was also quite quick, because we ahead-of-time compiled our code, and it rightly classifies this image as a comic book.
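For reference, the rough shape of the function-as-a-service deployment steps; the directory layout, the pre.sh contents and the file names here are assumptions reconstructed from the narration, while the Vercel CLI itself is the standard npm package:

    # build the classification function and move the .wasm to the project root
    cd api/functions/image-classification            # hypothetical path
    cargo build --target wasm32-wasi --release
    cp target/wasm32-wasi/release/classify.wasm ../../..
    # pre.sh runs at build time and, roughly, AOT-compiles the module:
    #   wasmedgec classify.wasm classify.so
    npm install -g vercel                            # Vercel CLI
    vercel deploy                                    # builds and deploys the serverless function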
So now we come to the third and final demo, which is managing WebAssembly apps with Kubernetes. One of the great advantages of the Linux containers ecosystem is that you have a lot of tools and a lot of support: you have Kubernetes to manage everything, and you have high-level and low-level container runtimes which allow you to work with Linux-based containers very easily. And you can do the same with WebAssembly-based containers, which is quite interesting, because WebAssembly-based containers are faster at startup, and you have already seen the speed, especially coupled with ahead-of-time compilation, and how WebAssembly can be faster. It would be really nice if you could have your WebAssembly container images and WebAssembly apps side by side with Linux containers in the same system, which is what we'll be seeing today.

First off, we'll take one of these examples and create a container out of our WebAssembly application. Let's go to an example; let's take the MobileNet example. We already have the wasm32-wasi target installed, so first we'll build this. We have already done this in the previous demos, but I'll just build it again for the wasm32-wasi target. And yeah, because we had already built it here, this was a lot faster. So now we want to create a Dockerfile which will run a .wasm file. If you remember where the .wasm file is, it's in the wasm32-wasi target directory. First let's add the executable permission to it, because that is what we'll be referencing in the Dockerfile and what our container will be running. So that's target/wasm32-wasi/release, and the file is called classify.wasm. There you go. And now we'll create a Dockerfile in the release directory. Let's create the Dockerfile here and open it. I've already written this down for you, but all you simply do is add your .wasm file and run that .wasm file. So it is very simple.

Now we'll build the container from this Dockerfile. CRI-O and containerd can already start this WebAssembly-based container image, but it requires an annotation on the container image to indicate that it is a WebAssembly application which does not have a guest OS. To do this, we'll use buildah and add an annotation saying that this is a WebAssembly application with no guest OS. So let's do that and add our annotation. The annotation is module.wasm.image/variant=compat, and it is just there to let the runtime know that we don't have a guest OS for this WebAssembly application. We'll tag it classify. So this adds the annotation to the container image, and then you can also do a buildah push to push it to Docker Hub or really any registry, wherever you want it.

So that is the part about building a container image from our WebAssembly application. At the moment I haven't pushed the image I created to Docker Hub, so what we'll do is take this simple piece of code instead. This one is actually by the WasmEdge team; it just prints a couple of random numbers, to test things out, and they already have a published container image for it, so we can test it directly. You can try making a WebAssembly container yourself and running it with kubectl against k3s, Minikube, kind, whatever you want. At the moment I already have a kind cluster up, and I'll just run this example we just saw. They also have a published image, which is the reason I wanted to show this example, because it's quickly doable and I haven't pushed my own image. So this is an example of running WebAssembly and managing it with Kubernetes: we have taken this WebAssembly app, a very simple WebAssembly app, made it into a container image with the annotation, and run it on our kind Kubernetes cluster. That was it for demo 3. I hope that you liked the demonstrations; in case you are interested, you can check out the code on GitHub via this particular link, which we'll also be sharing in the chat for all of you.
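For reference, a rough sketch of the container-image and Kubernetes steps from this demo. The Dockerfile contents and the module.wasm.image/variant=compat annotation follow WasmEdge's published Kubernetes examples, but the image tag is a placeholder, the kubectl arguments are the ones from WasmEdge's own example image rather than the image built here, and the cluster's container runtime has to be WasmEdge-enabled (for example crun built with Wasm support) for this to actually run:

    # package the .wasm into a guest-OS-less image (tag is a placeholder)
    cd target/wasm32-wasi/release
    chmod +x classify.wasm
    # write the three-line Dockerfile described above
    printf 'FROM scratch\nADD classify.wasm /\nCMD ["/classify.wasm"]\n' > Dockerfile
    buildah build --annotation "module.wasm.image/variant=compat" -t docker.io/myuser/classify-wasm:latest .
    buildah push docker.io/myuser/classify-wasm:latest

    # run the WasmEdge team's published random-number example on the kind cluster
    kubectl run -it --rm --restart=Never wasi-demo \
      --image=wasmedge/example-wasi:latest \
      --annotations="module.wasm.image/variant=compat" \
      /wasi_example_main.wasm 50000000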
Now we also want to cover briefly why to choose this combination of Kubernetes and WasmEdge. Developers can use the existing container tools, such as Kubernetes, to run lightweight WebAssembly applications. And you may ask why to run WebAssembly-based applications on Kubernetes. Especially when it comes to running applications on the edge, on hardware that is very resource constrained, running Linux containers on these devices can be an issue, because Linux containers are usually larger and they also take more time to start. In comparison, WebAssembly containers are not only much faster to invoke, they are also much smaller in size, up to hundreds of times smaller in many situations. At the same time, a good way to run these WebAssembly containers on Kubernetes is to pair them up with the existing Linux containers, especially the Docker containers that might already be running within that Kubernetes ecosystem. As we know, WebAssembly does not yet have the best tooling out there; a lot of tooling is not supported. So essentially what you can do is run your WebAssembly containers alongside Docker containers and use the rich ecosystem and tool chains provided by the Docker containers together with your WebAssembly containers, with the help of WASI. That makes WebAssembly a really good choice for a Kubernetes-based ecosystem.

With that, we will conclude our talk. Thank you so much. Of course, you can reach out to us via our Twitter handles: me, Shivay, at HowDevelop, and Rishit at rishit_dagli. So thank you so much for attending the talk, and of course now we are open for questions. Thank you very much. Awesome, that was an interesting session. We will now bring Shivay and Rishit online to answer your questions.
Hi, welcome Shivay and Rishit. Can you hear me? Yeah, yeah, we can hear you. First of all, thank you so much for inviting us to KCD Africa; we have been watching some of the other talks as well, so yeah, thank you so much for inviting us. We are very grateful to you for taking the time to join and prepare the video. We don't have any questions here at the moment. Since we will probably also be uploading the talks later to YouTube, we will share the videos with you so that you can check the feedback that was given, and viewers can also throw questions at you on Twitter. I think Rishit's handle is rishit_dagli on Twitter, and that of Shivay, I can't remember yours, maybe you can remind us. HowDevelop. But yeah, to sum up what we wanted to convey with our talk: a lot of times you might want to run machine learning applications, and in general any computationally heavy applications, on edge devices, and that's where WebAssembly really comes into the picture. Generally WebAssembly is perceived to be a front-end, browser-specific technology, but today it has been brought over and is being used quite a lot in the back-end, and especially in the cloud-native space as well. That was the message we wanted to convey, by taking machine learning as the example in our talk.

Yeah, awesome. Rishit, do you want to add anything else? At the moment, no, but another thing which I find particularly interesting with WebAssembly is that it can shift the paradigm because of its speed and size compared to Linux containers. We also showed the benchmarks for the MobileNet V2 demo; that was actually the same model we were running in the demo, and we did all kinds of benchmarks on it. So in that sense, it feels to me that managing your WebAssembly containers alongside Linux containers could change the paradigm for a lot of tasks, and you should definitely give it a try. All of the demos are already open sourced; some of them are example demos maintained by the WasmEdge team, but a lot of the demos, like the MobileNet V2 one which we saw today, use new models, so you can definitely try them out. We also have a GitHub repo where all the demos are, and you can take a look at them anytime.

Thank you so much to the entire KCD Africa team, and thank you so much, Anita, for having us. Thank you. Thank you so much. I think you said... Oh, I was busy talking; I was just thanking you all for joining, and we look forward to hosting you in our subsequent talks and engaging with you online. Thank you very much. Thank you.