Hello. In this talk, we'll be talking about using Wasm for edge workloads. We'll also talk about Akri for discovering edge devices, and we'll show how you can do all of this in the context of Kubernetes. So let's get right into it. Hello, I'm Rishit. I'm a student at the University of Toronto, where I also research computer vision and machine learning, and I do AI research at CIO. Hi everyone, I'm Shubhai. I'm a developer relations engineer at Meilisearch, which is a Rust-based search engine, and I'm also a WasmEdge ambassador; WasmEdge is a WebAssembly runtime. I've been working primarily in the WebAssembly space for the past two years.

So, starting off: what exactly is WebAssembly? Some of you who have been attending the sessions at the Embedded IoT Summit may have watched the earlier session at 9:50 on edge devices and WebAssembly, so we won't spend a lot of time on WebAssembly itself, just a primer for folks who are new to the space. WebAssembly is a binary instruction format, primarily designed as a compilation target: source code from many languages compiles into this one binary format, which can then run across multiple platforms. It supports many kinds of languages, whether highly functional or object-oriented, and today even scripting languages; more than 20 languages can now be compiled into a Wasm binary with their respective toolchains.

Some of the biggest benefits you get with WebAssembly are what we'll cover in the next few slides. First, it's efficient, because it gives you near-native performance. Compared to a native binary, say a Rust or C++ native binary, it's not quite at that level, but it comes pretty close, and we'll see how being fast and efficient benefits our edge workloads. Second, it's open source, and it's also debuggable: in this morning's session there was a reference to debugging WebAssembly modules with the help of some VS Code extensions, so definitely check out that talk as well. Third, it works on non-web platforms. It started out as a browser technology, but today it finds even more usage in cloud, cloud-native, and edge workloads, as we'll showcase in today's demonstration. It's an open platform developed by many major companies, including Cisco, Adobe, and Microsoft, all of whom contribute to the Bytecode Alliance, the body that defines the various WebAssembly standards, especially for edge and cloud-native use cases. And it's also very safe: WebAssembly is a sandboxed technology, so it adds safety, and we'll see how those safety benefits can be applied to edge applications when working with many edge devices at scale.
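To make the compilation story concrete, here's a minimal sketch: a tiny Rust program compiled to the wasm32-wasi target. The function and file names are just for illustration, not anything from the talk's demo.

```rust
// Minimal Rust-to-Wasm example. Build it with:
//   rustup target add wasm32-wasi
//   cargo build --target wasm32-wasi --release
// The resulting .wasm file runs unchanged on any host with a Wasm
// runtime such as Wasmtime or WasmEdge.

#[no_mangle]
pub extern "C" fn add(a: i32, b: i32) -> i32 {
    a + b
}

fn main() {
    println!("add(2, 3) = {}", add(2, 3));
}
```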
Just to summarize what we've discussed so far: WebAssembly is a binary instruction format, primarily designed as a compilation target, which can compile source code from many languages, and it delivers near-native performance. It's also very portable, meaning it supports most types of architectures, whether you're talking about Arm or x86, or even RISC-V-based architectures. So you don't have to worry about your edge devices having different system architectures: as long as a WebAssembly runtime can run on those architectures, the WebAssembly module generated when you compile your code will run across all of them. In other words, it has native support for running on most of the different edge devices out there.

Now, how do you actually make WebAssembly modules interact with system resources? Let's step back and look at how the native applications you write typically operate: the kernel acts as the authority layer between the application and your system resources, and through syscalls you get access to things like file resources. Similarly, for WebAssembly we have the WebAssembly System Interface, WASI, which provides the specific API calls and syscalls a WebAssembly module can make to access system resources, whether that's files, the network, or input/output operations. WASI is essentially what powers your ability to access those resources: as we'll see in today's demonstration of IoT edge applications leveraging WebAssembly modules, whatever is happening behind the scenes happens with the help of WASI. The diagram you see here should give a better picture: we have the host OS, in between we have the WebAssembly System Interface, and at the top we have the WebAssembly modules; the system interface makes syscalls to get access to all those different resources I mentioned.

And now, the most important part of why we're all here: the edge is complicated, right? There are different kinds of constraints at the edge. The biggest are space and the performance of these edge devices: standard Linux containers can be pretty heavy, and even getting them started takes a while, since the cold-start time for Linux containers is quite large. Then there's security: we need to take security into consideration for all the different edge devices running inside the cluster. Network connectivity is also something we typically face a lot of issues with, when setting up the TCP/IP connections and the MQTT connections that manage communication between all these different edge nodes. And then the biggest one is the heterogeneous ecosystem of edge devices.
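As a small illustration of WASI in practice, here's a hedged sketch: ordinary Rust file I/O that, when compiled to wasm32-wasi, goes through WASI syscalls and only succeeds if the host grants access to the directory. The file name is an assumption for the example.

```rust
// Plain Rust std::fs calls compile down to WASI syscalls
// (path_open, fd_read, ...) when targeting wasm32-wasi.
// The sandbox denies all file access unless the host grants it,
// e.g.: wasmtime run --dir=. app.wasm
use std::fs;

fn main() -> std::io::Result<()> {
    // Fails at runtime unless the host preopened this directory.
    // "config.txt" is an assumed file name for the example.
    let contents = fs::read_to_string("config.txt")?;
    println!("read {} bytes via WASI", contents.len());
    Ok(())
}
```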
Say you have different node devices with different functions: some might be doing something like video surveillance, which needs one kind of architecture for video cameras, while other sensors may have a completely different architecture. Those are the complications that come up when you're working in a multi-node setup with different node devices, different functions, and completely different architectures, and managing all of that can be very difficult. That's where some of the superpowers we get with WebAssembly come into play. Because WebAssembly supports multiple platforms, it's very easy to use it across different system architectures. At the same time, you get some inherent benefits compared to containers. First, WebAssembly modules are much smaller: they don't package most of the things your containers typically package, so a module is essentially just the application code you're running. Because of that, the startup time for WebAssembly modules is also significantly faster than for Linux containers. Think about what that means: smaller module sizes and faster startup mean that in the constrained environments of these edge devices, it's much faster to spin workloads up and run them. Then, as mentioned, it's more portable, supporting multiple system architectures, and smaller. And generally speaking, it's also more secure: because of WebAssembly's sandbox model, a WebAssembly module can't really do anything by itself. There's a capability-based approach where you have to explicitly set up which resources your WebAssembly module is allowed to use; you grant those capabilities manually, which is why it's generally more secure than containers.

Now Rishit will talk more about the project Akri and how it's useful for detecting different types of edge devices. Yeah, so we talked a lot about WebAssembly being portable, small, and fast, and that's largely because a WebAssembly module is not an executable: you also need a WebAssembly runtime, something like Wasmtime or WasmEdge. That's what makes it portable; the module itself is small, and the runtime does some of the work for you. It can feel a bit like LLVM IR at this point, and for anyone new, it's not that, but we'll come back to the comparison. Right now I want to talk about how Akri comes into play and how Wasm can give you superpowers, especially when using Akri. You often face a lot of problems with edge devices that are connected to some Kubernetes cluster. What Akri lets you do is take all these edge devices and treat them as resources in a standard Kubernetes cluster. So it lets you detect these edge devices.
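To show what that manual capability grant looks like from the host side, here's a sketch using the Wasmtime embedding API with the pre-component-model wasmtime_wasi crate (roughly the wasmtime 8-13 era; newer releases restructure this API, so treat it as a sketch rather than a definitive recipe). The ./data directory and app.wasm module are assumptions.

```rust
// Host-side capability grant: deny-by-default WASI, with exactly one
// directory preopened for the guest. Uses the pre-component-model
// wasmtime_wasi API; newer Wasmtime versions differ.
use wasmtime::{Engine, Linker, Module, Store};
use wasmtime_wasi::sync::WasiCtxBuilder;
use wasmtime_wasi::{ambient_authority, Dir, WasiCtx};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let mut linker: Linker<WasiCtx> = Linker::new(&engine);
    wasmtime_wasi::add_to_linker(&mut linker, |s: &mut WasiCtx| s)?;

    // The guest sees "./data" as "/data"; everything else is invisible.
    let wasi = WasiCtxBuilder::new()
        .inherit_stdio()
        .preopened_dir(Dir::open_ambient_dir("./data", ambient_authority())?, "/data")?
        .build();
    let mut store = Store::new(&engine, wasi);

    let module = Module::from_file(&engine, "app.wasm")?; // assumed module
    linker.module(&mut store, "", &module)?;
    linker
        .get_default(&mut store, "")?
        .typed::<(), ()>(&store)?
        .call(&mut store, ())?;
    Ok(())
}
```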
And it lets you schedule workloads to them pretty easily. Here's a very quick overview of what Akri does. We have the standard Kubernetes control plane and nodes, and Akri adds these components: the Akri Agent and the Akri Controller. You also have an Akri custom broker deployed. The broker works with a discovery handler, which recognizes that a new edge device has been discovered, and the custom broker then lets you treat that device just like a standard resource. With the Akri Agent, you can schedule pods onto this edge device and still get all the benefits of Kubernetes management. That's what Akri enables, and it's pretty similar to what you would do if you had a separate node pool and wanted to schedule different kinds of pods onto it; fundamentally the same idea. So it lets you detect these edge devices and also schedule jobs to them very easily.

Of course, the inherent question is: why use Wasm with Akri? Primarily, the benefits we get from Wasm apply directly to Akri as well. As Rishit mentioned, Akri is used to detect these different kinds of IoT and edge devices and then add them as custom resources, similar to how we have volumes in Kubernetes; they can be added as resources to your Kubernetes node pools. So we're taking all the benefits we get from WebAssembly and applying them to Akri. First, the security enhancements you get with WebAssembly: that's one reason to use it, for better security around the devices being discovered across your Kubernetes node pool. Second, the performance benefits WebAssembly has over containers. And third, as we discussed with Akri's discovery feature, we're detecting many types of edge devices, so we use the portability of Wasm to support all the different device types Akri discovers. We'll see in today's demonstration how to leverage both together to build highly scalable edge applications.

So let's talk about what we can do with Wasm and Akri and how this all pans out. We understand what Akri does now; it's fairly simple to see how it detects devices and then schedules pods for them. Right now we have something like this: the control plane on some node, a user node where we want most of the work to happen, and then our edge node, which, for now, let's say is a camera. The edge cluster is a K3s cluster, a standard Kubernetes cluster; it could be anything, and you still get all the benefits and management of Kubernetes. When you deploy Akri, the Akri Controller is deployed as well, which talks to the API server and helps schedule pods through it. Once you tell Akri what protocol to discover devices on, the Akri Agent and the Akri discovery handler discover this camera. Once that's done, the Wasm broker comes into play.
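To give a feel for the discovery-handler contract, here's a deliberately simplified, hypothetical sketch. Akri's real discovery handlers implement a gRPC service; the trait and types below are ours, not Akri's, and exist only to show the shape of the contract: a handler is given a protocol-specific filter and reports the devices it currently sees, which then become schedulable resources.

```rust
// Hypothetical, simplified model of what a discovery handler does.
// The real Akri discovery-handler contract is a gRPC service; the
// trait and types here are illustrative only.

#[derive(Debug, Clone)]
struct Device {
    id: String,                        // unique identifier, e.g. an RTSP URL
    properties: Vec<(String, String)>, // exposed to broker pods as env vars
}

trait DiscoveryHandler {
    /// Scan the network (or bus) and report currently visible devices.
    fn discover(&self, filter: &str) -> Vec<Device>;
}

/// Toy handler that "discovers" a fixed simulated camera, mirroring
/// the simulated camera used later in the demo.
struct SimulatedCameraHandler;

impl DiscoveryHandler for SimulatedCameraHandler {
    fn discover(&self, _filter: &str) -> Vec<Device> {
        vec![Device {
            id: "simulated-camera-0".into(),
            properties: vec![("RESOLUTION".into(), "640x480".into())],
        }]
    }
}

fn main() {
    let handler = SimulatedCameraHandler;
    for d in handler.discover("onvif") {
        println!("discovered {} -> schedule broker pod", d.id);
    }
}
```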
And the Wasm broker lets you use this camera as a resource within the user node; the Wasm broker facilitates that. The Wasm workload can then be run on the edge device directly, so you still get to use the edge device, which is on the same network, and manage it through the user node. Then there's also the shim. The shim would typically be a containerd shim, or, if it's just an edge device on the network, you can simply use some Wasm runtime. What any of these shims gives you is the ability to run Wasm workloads, these WebAssembly modules, inside containers: you run them and get the feel of running a container, but of course they consume a lot less space and start much faster than containers; we'll talk about why they're faster. That's essentially what the shim is for: having WebAssembly modules packaged and run as OCI artifacts.

With that, I want to talk more about making Wasm work for the edge. We covered how we can rework some of the Akri pieces to make them work with Wasm and get all of Wasm's benefits at the edge, but let's look more closely at Wasm for the edge itself. Let's try to understand what's happening here: we have all these languages being compiled to .wasm. The process is that source code first gets converted to LLVM IR, which is then converted to .wasm, and .wasm is just an instruction format. You can run the same .wasm on any machine, because on each of those machines, alongside the WebAssembly module itself, you also have a runtime, and the runtime decides how to execute the WebAssembly module. If you compare sizes, the WebAssembly module and the runtime combined are still smaller than standard containers. That's why it becomes really portable: you can run the same .wasm everywhere.

Something very popular in a lot of edge deployments is JIT or ahead-of-time (AOT) compiling, and you can do the same with WebAssembly modules; but you sacrifice portability to do so, because AOT-compiled modules can only run on hosts compatible with the target environment they were compiled for. In exchange, for non-trivial programs you'll see that AOT compilation achieves higher performance than an interpreter or even a JIT-enabled runtime. One more thing: once you AOT-compile, the size of the WebAssembly artifact increases, because the elements of the Wasm runtime needed to run that module get compiled into the binary as well. So AOT improves performance by quite a large margin, but you sacrifice portability, and the artifact gets larger.

Even with AOT compilation, you still have some of Wasm's drawbacks: it's near-native performance, not native performance. That's why I want to talk about making Wasm work for the edge and how to get past these limitations. Before that, let's take an experimental look, where I want to run a compute-intensive workload with Wasm and do it better.
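As an example of what AOT compilation looks like in practice, here's a minimal sketch using Wasmtime's embedding API; the nerf.wasm file name is an assumption for the example.

```rust
// AOT-compiling a Wasm module with the Wasmtime embedding API.
// This trades portability for speed: the precompiled artifact is
// specific to the target CPU/OS, as discussed above.
use wasmtime::{Engine, Module};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();

    // Compile once, ahead of time (e.g. at build/deploy time).
    let wasm = std::fs::read("nerf.wasm")?; // assumed module name
    let precompiled = engine.precompile_module(&wasm)?;
    std::fs::write("nerf.cwasm", &precompiled)?;

    // Later, on a compatible host: load without recompiling.
    // `deserialize` is unsafe because the bytes must come from a
    // trusted precompile step for the same engine configuration.
    let module = unsafe { Module::deserialize(&engine, &precompiled)? };
    println!("loaded AOT module with {} exports", module.exports().count());
    Ok(())
}
```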
That should help us better understand how to make it work for the edge. Here's what we'll do: neural radiance fields, or NeRFs, are pretty popular, and what we want to do is generate novel views. We have an image of a car taken from some edge device, and we want to ask: how would the car look if the camera were placed here instead? The machine learning model then figures out what to do and produces this target image. That's the goal, and it's immensely costly; you can take a look at the original NeRF paper. It also happens in multiple stages: you actually train the model, and then at runtime you just tune parts of the model and produce these novel views. It gets pretty compute-intensive, and it's nowhere near real time at all. There's already MobileNeRF, which makes it possible to run NeRFs at the edge, and it's still quite compute-intensive: on a lot of edge devices, at least standard ones with 4 GB of RAM and a simple CPU, you still wouldn't get more than 10 to 15 FPS, which can be considered only partly real time. MobileNeRF already does this, which is pretty awesome, and they do it with vanilla JavaScript, running NeRFs with some algorithmic modifications that let them run partly on the edge. So the question we want to ask is: can we use Wasm to make this faster? Along the way, we'll also talk about how you make Wasm really work for the edge.

First off, AOT compilation. Doing AOT compilation, at least for NeRFs, gets us 14% larger artifacts, but 32% faster execution, and that includes rendering the neural field: rendering the 3D model combined with all the novel views. If you take this car, we also need to render its 3D model with novel views from all angles, and AOT compiling improves the neural-field rendering part by 32%. I mainly quote percentages here to give a relative sense of how each Wasm optimization helps. And this is pretty interesting, because for other kinds of machine learning models, say a standard MobileNet, you get 20% smaller artifacts: a Wasm MobileNet module comes down to just 3.3 MB, which is really, really cool, and it's also 40% faster. You beat the standard Node.js Wasm runtime, and you also beat standard TensorFlow Lite. Wasm with AOT compilation gets it running really fast: in the benchmark I did here, it classifies a single image with the MobileNet model in 500 milliseconds, which is quite a bit faster than what TensorFlow Lite does. And just AOT compiling also gets the module down to 3.3 MB, which you can run directly. So that's what we can do with AOT; at least for NeRFs, it helped quite a lot with inference time.

Another thing you can do with WebAssembly modules is initialization, or warm-up. This lets you figure out which instructions in the WebAssembly module, or which parts of the interpreter, are actually needed. You can do this yourself, but there also exist projects like Wizer that can help a lot with initializing, or warming up, WebAssembly binaries.
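A minimal sketch of the Wizer flow, assuming a guest module that exports the wizer.initialize entry point Wizer expects; the model.wasm file name is ours, for illustration.

```rust
// Snapshot a module's one-time setup with Wizer. The guest must export
// an initializer named "wizer.initialize", e.g. in Rust:
//
//   #[export_name = "wizer.initialize"]
//   pub extern "C" fn init() {
//       // e.g. decode pre-trained weights into static memory once
//   }
//
// Host/build side: run the initializer and snapshot the result.
use wizer::Wizer;

fn main() -> anyhow::Result<()> {
    let wasm = std::fs::read("model.wasm")?;
    let initialized = Wizer::new().run(&wasm)?;
    std::fs::write("model.wizened.wasm", initialized)?;
    Ok(())
}
```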
In our case that's pretty useful, because we have quite a few pre-trained weights from the earlier stages of the NeRF. Especially for machine learning models, you'll often see that just warming up the WebAssembly binary gets you at least a 50% improvement in inference time. If you take a standard MobileNet model, where there's no neural rendering, just weights and a DAG to compute, you get 132% faster execution just by pre-initializing the WebAssembly binary, which is pretty interesting to see. It also brings the size down by quite a lot, because you observe that many of the nodes and instructions, at least for standard machine learning models and the way TensorFlow or even PyTorch models get converted to Wasm, aren't needed after initialization. So you get much faster just by initializing the WebAssembly binary.

There's also link-time optimization, which you might be familiar with if you've worked with LLVM. LLVM does LTO really, really well, and Wasm doesn't, mainly because the WebAssembly module you have is still generic instructions: at compile time, the compiler isn't actively trying to prove which parts of the program are unreachable, which inlining would help the program most, or what could improve code locality. That's why, when you apply link-time optimizations to Wasm, you actually see quite an improvement, and it mainly comes from code locality and inlined functions. For the NeRFs, this gets us even smaller and even faster; the link-time optimizations give a marginal improvement, but it gets us faster still.

There are also quite a few other optimizations you can do, especially using the Binaryen IR, an IR made specifically for WebAssembly. Binaryen does a lot more: its optimizer actually tries to prove which parts of the code, and which parts of the bundled runtime support, are unreachable, and eliminates them. The ideal workflow with WebAssembly modules is to convert a module to Binaryen IR, optimize, and convert it back to a WebAssembly module that is a lot smaller and faster. There's also more intrusive stuff, which is pretty much self-explanatory: things like not throwing exceptions in Rust, which can make WebAssembly modules a lot faster, but we won't cover that here, since it differs a lot from codebase to codebase.

And there's LLVM's own optimization process. If you try to size-profile the WebAssembly module itself, remember that LLVM IR is what's generated before the WebAssembly module. The module ends up smaller than the LLVM IR, but what usually holds true is that the parts with the most usage in the LLVM IR will also take up the most space and execution time in the WebAssembly module. And of course, there's a lot of information loss in going from LLVM IR to a WebAssembly module.
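A short sketch of that Binaryen round-trip using the wasm-opt crate, which wraps Binaryen; the file names are assumptions, and the CLI equivalent would be something like wasm-opt -Os.

```rust
// Run Binaryen's optimizer over a module via the wasm-opt crate:
// the module is parsed into Binaryen IR, passes such as inlining and
// dead-code elimination run over that IR, and it is re-emitted as Wasm.
// Equivalent CLI: wasm-opt -Os model.wasm -o model.opt.wasm
use wasm_opt::OptimizationOptions;

fn main() -> anyhow::Result<()> {
    OptimizationOptions::new_optimize_for_size()
        .run("model.wasm", "model.opt.wasm")?; // assumed file names
    Ok(())
}
```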
So instead of trying to profile the WebAssembly module, which is quite hard to do and doesn't give you much to debug from, I always suggest size-profiling the generated LLVM IR. That gives you a much better understanding of which parts of the WebAssembly module you can optimize, and if you make those optimizations against the LLVM IR, the corresponding WebAssembly module will of course also come out smaller, or smaller and faster. That's another thing I always suggest. There are a few other optimizations too, but these are the ones that made the most sense for NeRF. With these optimizations, we're able to get NeRF running at native speed, as we'll see: NeRF at 40 FPS, in real time, on an edge device, which corresponds to native execution. Had it been written in C and optimized, that's pretty much the performance you could expect, and that's well within reach with WebAssembly, which is really interesting to see.

So this is what we want to be able to do: have the edge cluster, have the Akri agents, and the Wasm workload is now running this NeRF model. There's also this edge camera, which first needs to be discovered, and then pods need to be scheduled to it. Because the real NeRF setup is quite finicky to run, what we do to demonstrate this is use a simulated camera, which just gives you four to eight images. The setup I have right now is: the simulated camera gives you four images of an object, four different views, and the NeRF model identifies the camera poses from which each image was taken and then generates novel views from those images. Novel views meaning: if the camera pose were not one of the four positions we fed in, what would the camera image look like? That's what the Wasm workload will do for us; it's just running NeRF under the hood.

I'll quickly run through the demo; to keep things moving, I've also recorded it. What I'm doing first is creating Wasm-enabled nodes, each equipped with runwasi, which allows them to run WebAssembly workloads. I'm creating three containers to do exactly that, and once they're created, you can see three more nodes, all equipped for running Wasm workloads. Now that we have that set up, the next step is to install Akri, which brings up the discovery handler, as well as the Akri Agent and Akri Controller, so we're equipped to discover any edge devices; in our case, the edge device is just a standard simulator. Once we do this, we should be able to see the agent and controller pods, because I've already deployed the Helm chart. Let's look at the controller pods first: we can see the controller has been deployed; if you recall the diagram, this is the Akri Controller. And the Akri Agent gets deployed as well.
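To sketch what the host side of such a Wasm workload could look like, here's a hedged example with Wasmtime. The render export and its i32-to-i32 signature are purely hypothetical, invented to show the call path; the real NeRF module's ABI will differ.

```rust
// Hedged sketch of the broker/host side: load the NeRF Wasm module
// (plain or AOT-compiled) and call an exported entry point per pose.
// The "render" export and its signature are hypothetical.
use wasmtime::{Engine, Instance, Module, Store};

fn main() -> anyhow::Result<()> {
    let engine = Engine::default();
    let module = Module::from_file(&engine, "nerf.wasm")?; // assumed path
    let mut store = Store::new(&engine, ());
    // No imports here; a real NeRF module would likely need WASI.
    let instance = Instance::new(&mut store, &module, &[])?;

    // Hypothetical export: render a novel view for camera pose 0 and
    // return the guest-memory offset of the produced frame.
    let render = instance.get_typed_func::<i32, i32>(&mut store, "render")?;
    let frame_offset = render.call(&mut store, 0)?;
    println!("frame written at guest offset {frame_offset}");
    Ok(())
}
```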
With this, we can now define how we want the discovery handler to identify new edge devices, so let's take a look at that. And of course, there's wasm-to-oci, which I just started; that's more related to the Wasm workloads themselves, so we won't talk more about it in this talk, but wasm-to-oci is a really cool tool that lets you store Wasm modules as OCI artifacts. What we have here is the discovery handler, adapted from the standard Akri setup: you tell it the protocol on which you want to discover new devices. I also have a Kubernetes deployment set up so that whenever an edge device is discovered and has the capacity to take on more pods, the Wasm workload gets deployed to it. We just saw the discovery handler, and the Wasm pods will be deployed once the discovery handler picks the device up.

Here's what this looks like actually running the NeRF. This is running now, and we actually get to 40 FPS. This is running MobileNeRF; deployed with vanilla JavaScript, at least on this machine, we only get to 15 FPS, and Wasm gets us to 40 FPS, considerably faster than what vanilla JavaScript could do. Here's another example, reconstructed from eight images: I give it eight poses, and the NeRF reconstructs the scene from those eight poses. The chair example used just four camera poses; this one uses eight, which is why it also took more time to render some parts.

So, as you saw, this demonstration primarily showcased how you can leverage Wasm workloads with the help of Akri. There are also some best practices you can apply. Azure ships AKS Edge Essentials, which makes it very easy to bring up Kubernetes support for managing multiple IoT-enabled devices, for example with Akri on a Windows machine. You can set up all of your edge devices very easily, and they can be discovered automatically; it comes with predefined configurations for the deployment and for the resources required to discover and set up these devices in your Kubernetes cluster. Another very important thing to keep in mind when working with WebAssembly and Akri is your device strategy: when you're dealing with different types of devices, how you discover them and how they're defined as different node pools within your Kubernetes cluster needs to be very clearly laid out in order to work well with your Wasm workloads. Those are some of the best practices for working with Akri.

Of course, if you have any questions, we'll be more than happy to take them during the Q&A. With that, we'll conclude our talk. Thank you so much; we're open to questions now. Also, we did rush through the demo of actually running the node and using Akri to discover it, to strike a balance between covering content and still showing the demo, but you can most certainly try it out for yourself. I'll also open-source all the optimization passes I talked about briefly.
So yes, you can most certainly try those things out for yourself using Akri. [Audience question] Yes, at the end, when I was showing you native execution: a lot of the NeRF is not possible, or at least not feasible, to implement in C directly. And when I said I compared it, it was being compared with one large script, all the computation done in one large script. [Audience question] Yes, at least for this demo, all of it was CPU. [Audience question] So the question was that this should be more useful for AI, but not so much for other real-time workloads. Well, yes: Wasm will always be slower than going directly to native code, that's for sure. But there's also this balance with portability, security, and all the other benefits of Wasm. So yes, it's certainly slower, and you'll have to weigh that trade-off. Thank you, we're over time now. Thank you. That's all.