Okay, we'll get started. Hello. In the next 30 minutes, we'll be talking about WebAssembly-based AI as a service on the edge with Kubernetes. I'm Rishith, a CS undergrad student at the University of Toronto, and I also work on the Machine Learning team on Finch at SpaceX. Shivai?

Hi, I'm Shivai. I'm a contributor and maintainer at Layer5, the service mesh company within the CNCF, and I also do engineering and research on an open source trust-based search engine. I'm really looking forward to sharing with all of you how we can leverage WebAssembly to enable very easy access to functions as a service on the edge.

So I want to start with a meme that comes up whenever I start talking about WebAssembly, which is that WebAssembly is neither web nor assembly. Whether that's true depends on what you consider the definition of "web", and in any case we want to run it on more than just the web: we want to run it on edge devices. That's what we are excited about.

So what is WebAssembly, if it's not web and not assembly, and why should you care about it if you are deploying edge applications, especially machine learning applications, which are quite compute intensive? That's what we want to talk about. Simply put, it's a binary instruction format for a stack-based virtual machine. It's analogous to machine code, though higher level, and it's designed as a compilation target. So you might have some JavaScript functions or some Rust functions, compile them into a WebAssembly module, and then run that module with a WebAssembly runtime. You can essentially take your code, your functions, your modules from other languages and compile them down to WebAssembly. That's the idea.
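As a minimal sketch of the "compilation target" idea: the function below is ordinary Rust with nothing Wasm-specific in it, and the same source can be built either natively or as a WebAssembly module (the function and name here are just for illustration).

```rust
// Ordinary Rust -- nothing wasm-specific in the source.
// Built natively, this runs on your machine; built with
// `cargo build --target wasm32-wasi`, the identical source
// becomes a portable .wasm module instead.
fn greet(name: &str) -> String {
    format!("Hello, {name}!")
}

fn main() {
    println!("{}", greet("WebAssembly"));
}
```

The point is that nothing in the source changes between targets; only the build flag does.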
And you can see that it's not just compiled languages like C++ and Rust, but also scripting languages like Python and JavaScript, that can be compiled to WebAssembly. So it's a polyglot environment: you can use any of your standard languages with WebAssembly as the compilation target.

I want to talk about WebAssembly especially in the context of the edge because it's efficient and fast: it's higher level than machine code, but still very close to native performance. It's also open; WebAssembly is open source, which is why we are presenting some of the work we have been doing with it. It can run on non-web platforms, which is why you should think about using it for your edge applications, and since it's based on the open web platform, you can also use it on the web for edge use cases where you might not have a lot of bandwidth or network. The WebAssembly security model is really interesting as well. It's not just simple virtualization; you tend to get a lot of security with WebAssembly too. So Shivai, would you like to talk about how Wasm comes into the picture?

Sure. As we recapped over the past few slides, what you're primarily getting is high-performance bytecode, because we are taking your high-performance functions written in languages like Rust and compiling them directly to this bytecode. One of the common use cases where WebAssembly started was on the web, where JavaScript functions couldn't be that performant; using a toolchain like Emscripten, you could run highly performant C++ applications and functions on the web.
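To make "bytecode for a stack-based virtual machine" concrete, here is a complete, if trivial, module in the WebAssembly text format, the human-readable rendering of the binary:

```wat
;; WebAssembly text format (.wat) for a module exporting one function.
;; Operands are pushed onto a stack and consumed by i32.add --
;; the "stack-based virtual machine" in action.
(module
  (func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)
  (export "add" (func $add)))
```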
Now we're also expanding it to the server side and the edge, and the story is still the same: we take your high-performance functions written in very popular languages like C++ or Rust, convert them into this bytecode, and run them, taking advantage of the small size that comes with a WebAssembly module. As we mentioned, it's compatible with all of these programming languages, and since it's a binary instruction format, serializing and deserializing it at the lowest level is very easy. And because it runs as bytecode close to native speed, it runs very fast without a lot of performance overhead to worry about.

On this point I would also like to add something. If you think about something running at native speed, the first thing you think is that you cannot run the same thing on multiple platforms, and that's right to a certain extent, especially in the case of ahead-of-time compilation. If you take one of the popular Wasm runtimes like WasmEdge and ahead-of-time compile a module into a .so file, which brings a lot of the native execution speed, the output is then tied to the machine type. WebAssembly in general is not tied to the machine it is running on; it's just an instruction format you can run anywhere. But if you ahead-of-time compile it, you essentially get native speed at the cost of being tied to the kind of machine it can run on. And there's one other thing to focus on here, since we're talking about functions as a service and WebAssembly as a service.
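As an aside, the build-then-AOT flow behind that trade-off can be sketched roughly like this with the WasmEdge CLI. This assumes the Rust toolchain and WasmEdge are installed; the crate name `app` is illustrative.

```shell
# One-time: add the wasm32-wasi compilation target.
rustup target add wasm32-wasi

# Produce a portable module: target/wasm32-wasi/release/app.wasm
cargo build --target wasm32-wasi --release

# AOT-compile with WasmEdge into a native shared library.
# Near-native speed, but the .so is now tied to this machine type.
wasmedge compile target/wasm32-wasi/release/app.wasm app_aot.so

# Both the portable .wasm and the AOT .so run with the same CLI.
wasmedge app_aot.so
```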
So one of the other benefits we get with WebAssembly is portability: being able to run these Wasm bytecodes anywhere you want. If you're using, say, a Python-based or a JavaScript-based function, it will be more limited, tied to dependencies on the specific platform you're running it on. So platform dependency is also something that improves with Wasm. And finally, when you're talking about machine learning, we know that it's very computation intensive, mathematically intensive, so especially if you're running it on edge devices, you want high speed given the constraints you have on size and on the compute available on an edge device. So it's really good for model inference, conversion, and deployment on the edge, and very quickly at that. And that brings us to WasmEdge. What do you say?

So WasmEdge is a popular runtime that lets you easily run Wasm on edge devices, which is also what we'll be showing in the demos and talking a bit about today. We want to talk a bit more about how Wasm runs on edge devices, which is very interesting, because we talked about all of these benefits but we didn't really talk about how it essentially runs the same code on all devices. How does it do this?
What it allows you to do is have a fast, scalable, secure way to run your same code everywhere. By safe and portable we mean that users and programs can only access what they have the right to access, and that one process should not create problems for other processes. Essentially we want standard, platform-independent methods: we want our system calls standardized and independent of the platform. This is where the WebAssembly System Interface, WASI, comes in, which solves a lot of these problems. What WASI does is give you standard, platform-independent system calls. There's a very popular tweet from Solomon Hykes, a creator of Docker, where he says that if Wasm plus WASI had existed in 2008, there would have been no need to create Docker. Could you go to the next slide?

I want to motivate this and talk more about Wasm by starting with a statement which is wrong: that C directly gives you access to all system resources. It does not, because that access is far too important for stability and security. The way all applications work, and I have taken this image from Lin Clark, is that they ask the kernel, "can I do this task?" Instead of having all of these platform-specific ways to do system calls, WASI lets you use standard libraries; you can use WASI in Rust and in C, just as an example here, to do a system call which is not tied to what kind of system or CPU you are running the Wasm code on. So Shivai, would you like to talk a bit more about how WasmEdge and Kubernetes can come together?
So far we have discussed how WasmEdge, as a popular runtime with ahead-of-time compilation, can be used to very quickly do things like model inference, especially from a machine learning perspective, but also other highly computational tasks. So let's understand where Kubernetes comes into the picture. The main idea of Kubernetes is that it helps you manage and orchestrate your Docker containers. The proposal is that we can run these Wasm modules side by side with the Docker containers. There has been a lot of debate about whether Wasm replaces Docker, but the ideal situation is that you run Docker containers alongside these Wasm modules, and they go hand in hand very nicely. One of the benefits of Docker containers is that WebAssembly itself is very limited in terms of functionality, because of the sandbox and security model you get with WebAssembly; from your Docker containers you can directly use system resources and a bunch of different libraries and functions. Whereas with WebAssembly, you get very fast execution and fast load times: heavier Docker containers can take a lot of time to load up initially, in the warm-up stage, and that's the benefit you get with Wasm workloads, because they are smaller in size and have a much faster startup. So the idea is to run these side by side across the entire Kubernetes stack. If you take a look at the entire Kubernetes landscape, the goal is to have all of your different Kubernetes applications, the high-level and low-level container runtimes, and Wasm running as part of this ecosystem, complementing each other. And that's where we'll now go over to the
demos, where we'll be showing you the functionality of WasmEdge and how you can configure it to run on your edge devices.

Sure. Before I go on to the demos, I also want to mention that we have some benchmarks. I worked on creating benchmarks for running TensorFlow Lite models directly with the TensorFlow Lite APIs on Android and iOS devices, versus running TensorFlow Lite models with Wasm, and with Wasm plus AOT compilation. We won't go very deep into those benchmarks right now, but if that's what you're interested in, we have a GitHub repository where you'll find all about the differences in execution time.

Another thing I wanted to quickly mention before we go forward: when we were talking about Wasm and Kubernetes, we particularly want to share that the idea of having Linux containers and Wasm containers side by side is a great one. We want you to take a look at running your edge applications with Wasm, but we don't want you to run all your processes on Wasm. By analogy, think of CPUs, GPUs, and TPUs: if you're just playing a YouTube video, you probably don't want to use a TPU. It's the same case here; you don't want to use Wasm containers for all your tasks. For a lot of your tasks, Linux containers will work just as well. So one thing we are focusing on is having them run side by side, not putting all your processes in just Linux containers or just Wasm containers.

So I'll start out with a demo of running WasmEdge apps. We have a few demos here; I'll start with one I was preparing just yesterday, which is running a MobileNet v2 model with WasmEdge. I first want to show how to run this app locally. This is also in a GitHub
repository; I have spent some time writing docs so people can understand it, and that's what we'll be doing today. So this is actually a Rust application, and I'll open up what my main.rs looks like. We are using tract-tensorflow, which is a pretty popular library for running machine learning with Rust. It's actually multi-threaded, and Wasm doesn't support multiple threads, so we have made some changes to how tract is loaded and how it does some things to still make it work. Essentially what I want to quickly show here is that we have a TensorFlow model being loaded, we do some pre-processing, the usual stuff, reading in an image, and then we just call model.run. The TVec type is something I also want to talk about, which is how Wasm interacts with the state. So I'm simply running a model inference here in Rust.

What I want to do now is run this locally. To do that, I'll first build this Rust application as a WebAssembly module, and for that I'll specify my target to be wasm32-wasi. You do have to install the wasm32-wasi target, which I've already done. So I'll start building this Rust application with the target set to wasm32-wasi, so it knows that it needs to create a .wasm module at the end. This will take a bit of time, and I would like to do this live, so I'm just going to set the scene here: this is the step where we're taking a Rust function, and this could be done with any other programming language as well. For example, with C++ you'd be using Emscripten, and you could use TinyGo or other languages too. This is the
step where we're generating the Wasm bytecode that can be used anywhere. So we'll wait for it to run for a while. I was expecting it to go faster than this, but it seems the internet is a bit slow. Later we'll also talk very quickly about WAT, the WebAssembly text format. Interesting; it's always the case in life that it worked yesterday. Okay, so regardless of that, I seem to have some dependency error here, but the GitHub repository does have the right one, at least.

What I wanted to show you is how easily you can compile this down to WebAssembly: what you'd get is a .wasm file, and the .wasm file can run anywhere. I also wanted to show you AOT compilation, which you can do with both wasmtime and WasmEdge; we are just showing an example of AOT compiling with wasmtime here. You can also AOT compile to the Linux shared library format. Then I wanted to run it locally.

I also wanted to talk a bit about the WebAssembly text format, which lets you see your WebAssembly module as somewhat readable code, which is pretty helpful for debugging. So I just wanted to show that you can get a .wasm file as well; all of this is up on a GitHub repository, so feel free to try it out for yourself. I unfortunately couldn't show it here, so I'll go back.

We also want to showcase the future of serverless functions. As we move ahead from writing functions and having to use managed hardware and resources to run these services, we are of course moving towards serverless platforms, which inherently have a number of benefits compared to standard Linux containers or standard virtual machines, because you get on-demand service, and of course it's very easy
to scale up and down compared to your standard virtual machines. That's where WebAssembly is also moving to become a really popular serverless platform, and that's what we want to showcase with edge devices specifically. As we mentioned earlier, some of the popular programming languages typically used to write these serverless functions, such as Python or JavaScript, are great, but they come with resource limitations and a lot of dependencies that might not run on various platforms, and that's where WebAssembly comes into the picture. The additional benefit you get is the security sandboxing that is typical of WebAssembly, which provides a lot more isolation to your functions when they're written for this infrastructure. Those are some of the benefits of serverless computing when we talk about how WebAssembly is getting into the function-as-a-service space. If you want to add to that?

So for this we also have a demo that we will quickly demonstrate. We want to start by showing a demo of using Kubernetes to manage your WebAssembly modules. The main thing we want to talk about here: a very interesting announcement right here at KubeCon yesterday was that Docker+Wasm is in technical preview, which is pretty interesting; they just announced it yesterday. They have essentially written a containerd shim for Wasm, which allows you to take Wasm workloads and run them directly, with Kubernetes as well.
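As a rough sketch of how such a shim surfaces in Kubernetes: a RuntimeClass names the Wasm handler, and pods opt into it. All names below are illustrative and depend on how the shim is registered in containerd on your nodes.

```yaml
# Illustrative only: handler and image names depend on your cluster's
# containerd configuration and the shim you installed.
apiVersion: node.k8s.io/v1
kind: RuntimeClass
metadata:
  name: wasmedge
handler: wasmedge          # must match the shim name registered in containerd
---
apiVersion: v1
kind: Pod
metadata:
  name: wasm-demo
spec:
  runtimeClassName: wasmedge   # schedule this pod onto the Wasm runtime
  containers:
    - name: app
      image: registry.example.com/wasm-demo:latest  # OCI artifact wrapping a .wasm module
```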
There is also Krustlet, which is pretty popular and allows you to run WebAssembly modules with Kubernetes. What we want to show here is something we have done to manage Wasm with Kubernetes, and especially how the shim layer works so you can run your WebAssembly modules.

I have a Kubernetes cluster here in Azure, and I also have two node pools. If I go to node pools, you can see I have a pool called pool one, which is three nodes for the system, and myWasiPool, which is a single node, and this is a WASI node pool. This is the idea we want to talk about: you have multiple pods, and you run some of them, not on pool one because that's a system node pool, but probably on pool two, which is a Linux node pool, and then you run some pods on myWasiPool. So you run some of your processes on WASI, which can give you some pretty nice improvements over what you might be doing. So I have two node pools, and what I do is schedule what I want onto the WASI node pool, which I do with this runtime class: I use it to schedule my pods onto the WASI node pool. I have also included the configuration for Spin, which is actually taken from the official Spin repository, but that goes beyond edge use cases, so we won't really talk about it in this demo; we'll just show how you can use Slight as your handler. I also just put in the official code from the Spin repository.

With that, we also have our container running over here. This is essentially a WebAssembly module converted to a container, and you can do that using wasm-to-oci. wasm-to-oci is a pretty popular tool which allows you to take your WebAssembly modules and convert them to OCI-compliant containers. So this is what has been done, and this is also using the containerd Wasm shim; this is actually one of the examples that uses
the containerd-wasm-shims project. So we'll just try to run this; it's one of the readily available containers out there to demonstrate using WebAssembly modules inside Kubernetes. I'll start by getting all the services I have. I do have one load balancer, called wasm-slight, created with this configuration. This is a very simple example of WebAssembly modules running in, and being managed by, Kubernetes. If I call the external IP and append /hello to it, I should get back a hello, which is what this container does: it just prints out hello, but it uses Wasm, and WASI under the hood, to do the system calls and so on. We also have some more examples running a TensorFlow model entirely in Wasm and managing it with Kubernetes, but we are already at 10 a.m. So that was our talk, and thank you so much for listening.

Thank you. We've got time for some questions. If anybody has any, raise your hand and we'll bring you the mic.

You were saying before that you don't think we should run all the workloads as Wasm modules; some are better fits for a regular container and others are probably more suitable for Wasm. Which workloads do you see Wasm shine in? When would you say, yes, this is a good candidate to run as a Wasm container instead of a regular one?

Sure. As we gave the machine learning example here: when we're doing machine learning inference, we need it to be very quick. Those kinds of high-computation tasks are where you can use Wasm, because of the faster load time for actually spinning up and running a Wasm container, its small size, and quick inference. Whereas with Docker, you could run these heavy
inference tasks using your Docker container as well, because it's well supported thanks to the entire Docker ecosystem and the ability to make system calls. That's where you'll use the interplay between Docker and Wasm. But specifically, if you're asking for the use case for Wasm, it's mainly these high-computation inference tasks.

Any other questions?

Is there an upper bound? I mean, I'm thinking about AI models that get very, very large sometimes.

I'm repeating the question: is there an upper bound to the size of a bytecode executable for Wasm, for large AI models and things like that?

So for larger models, you can still use Wasm, but I would urge you to first evaluate whether the model can be used on edge devices at all. Wasm allows you to raise the bar, taking even larger models than something like TensorFlow Lite alone would allow; I've been one of the contributors on TensorFlow Lite myself, and Wasm definitely raises the bar when used with TensorFlow Lite. But for very large models, I think one of the right approaches is to work on optimizing the model itself. To get back on track: the Wasm counterpart raises the bar because of the lower container sizes. One of the examples we show is with the TensorFlow MobileNet v2 model: a Linux container to run that model, even with size optimizations, would be at least 20 to 30 times larger than the Wasm container. So Wasm already does a great job at raising the bar, in essence, if that answers the question.