Hello everyone. My name is Mingqiu Sun. I'm a Senior Principal Engineer at Intel. Today we're going to talk about machine learning with WebAssembly, together with my colleague Andrew Brown.

Here's our agenda. We'll start with an introduction to WASI and the Bytecode Alliance. Then we'll get into some of the design decisions for wasi-nn, such as the loader versus builder API and the machine learning model as a virtualized IO type. Andrew will then talk about how to write a wasi-nn application in Rust and AssemblyScript, and about our reference implementation architecture on Wasmtime. He will then do an image classification demo with wasi-nn, followed by an analysis of the performance advantages of wasi-nn. We'll end the presentation with a call to action.

So what is WASI? WASI stands for WebAssembly System Interface. It is a modular system interface for WebAssembly, with a focus on security and portability. The body driving this specification is a subgroup of the W3C WebAssembly CG. wasi-nn is a WASI module for neural networks, and it is currently in phase two of the WASI subgroup's process.

Let's talk briefly about the Bytecode Alliance, which is an open source community dedicated to creating new, secure software foundations built on top of technologies such as WebAssembly, the WebAssembly System Interface, interface types, module linking, et cetera. A WASI implementation is definitely an important goal of the Bytecode Alliance, and the first reference implementation of wasi-nn is built on top of Wasmtime, which is a Bytecode Alliance project.

So let's dig deeper into why we want to do wasi-nn. The first question to ask is: why do we need WebAssembly for machine learning? It turns out that people do a lot of machine learning training and then typically deploy the trained models to a variety of devices with different architectures and operating systems. That's where WebAssembly comes into the picture, because it provides an ideal portable format for deploying those models to those devices.

The next question is: why do we need the WebAssembly System Interface? The reason is very simple. Machine learning workloads typically demand a lot of performance, so special hardware acceleration support is necessary; examples are AVX-512 on the CPU, and access to GPUs, TPUs, et cetera. The current WebAssembly spec only supports a limited form of parallelism, namely 128-bit SIMD, and that's clearly not sufficient for near-native performance on machine learning workloads running on modern hardware. Later in the presentation we'll actually demonstrate up to a 32x performance improvement with the wasi-nn approach we are proposing over a pure WebAssembly approach.

Let's take a look at the WebAssembly runtime environments and the use cases associated with each of them. There are three different types of environments in which WebAssembly executes. The first one is the standalone environment, where a WebAssembly application runs on top of a standalone WebAssembly virtual machine, which calls into the WASI software stack for system APIs. Typical users are standalone and cloud applications, such as content delivery networks, function-as-a-service platforms, and Envoy-proxy-style cloud middleware, as well as resource-constrained environments such as IoT or embedded devices.
Those are the typical applications that will make use of a standalone WebAssembly virtual machine.

The second one is the current mainstream browser environment, where you have access to a JavaScript virtual machine with a WebAssembly virtual machine as a component of it, and WebAssembly reaches the Web APIs through the JavaScript environment. That's where the WebNN API lives, and browser applications and PWAs make use of this environment.

The third category is the Node.js environment, which reuses the V8 JavaScript engine in a standalone setting and can make use of Web APIs as well as WASI APIs. This is for cases where you don't have tight resource constraints, and especially where you need both JavaScript and WASI/WebAssembly support; it's an ideal environment for those types of applications.

Now let's look at some of the design decisions we made for wasi-nn, in this case the loader versus builder API. When we started designing wasi-nn, these were the two options we could take. We worked very closely with the WebNN team, who went through a similar exercise. In the end, we decided to take the model loader API first and push the builder API out to a second phase.

The reason is that loading a model and running inference on it is obviously the main machine learning use case, and that's our initial focus; a loader API is a really easy way to support inferencing. In terms of feature completeness, machine learning is still evolving rapidly, with roughly 20% growth in new operations each year. A builder API would probably take multiple years to become usable and roughly functionally complete, covering all the major operations, whereas a loader API treats a model as an opaque object: we can just define the operations related to loading the model and be done with it. That's a much easier path to market. A loader API is also a much simpler API, with better IP protection opportunities. On the next slide we'll talk about how we treat the machine learning model format so that we can keep the main WebAssembly program framework- and model-format-agnostic. It's also very easy to support a variety of devices (CPU, GPU, FPGA, TPU) with the loader API.

As for the model builder API, its main advantage is that it can provide operation-specific acceleration. This is very useful for machine learning frameworks that lack support from model converters or from backend machine learning engines. If you want to compile a specialized machine learning framework completely into WebAssembly and have specific operations accelerated by the hardware, then the builder API will be very useful. We think we can look into supporting that sort of usage in the second phase of wasi-nn, and we could potentially leverage the work already done by the WebNN team.
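To make that contrast concrete, here is a minimal sketch in Rust. The loader call mirrors the shape of the actual wasi-nn `load` function (exact type, constant, and field names depend on the bindings version), while the builder below is purely hypothetical, invented here only to illustrate what an op-by-op API would require.

```rust
// Loader style: the model is passed as opaque byte buffers plus an encoding
// tag; the runtime dispatches to a backend without understanding the contents.
fn load_opaque_model(description: &[u8], weights: &[u8]) -> wasi_nn::Graph {
    unsafe {
        wasi_nn::load(
            &[description, weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO, // metadata: which model format this is
            wasi_nn::EXECUTION_TARGET_CPU,    // metadata: which device to run on
        )
        .unwrap()
    }
}

// Builder style (hypothetical, for illustration only): the application would
// reconstruct the graph one operation at a time, so the API surface has to
// keep pace with every new operation the frameworks add.
//
//   let mut b = GraphBuilder::new();
//   let x = b.input(&[1, 3, 224, 224]);
//   let y = b.conv2d(x, conv_weights /* , strides, padding, ... */);
//   let z = b.relu(y);
//   let graph = b.finish(z);
```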
Let's talk about the machine learning model as a virtualized IO type. Our view is that a machine learning model is a virtualized IO type, which we think fits the general WASI direction. This type is just like a media type, with its own data and metadata and defined operations associated with it, such as load. The approach we take is to push the mapping to actual implementations out to the edges. That way we can keep the main WebAssembly program portable and framework agnostic.

There are really two approaches to this idea. One is to perform a model conversion, before anything else, into whatever format the target platform accepts. Alternatively, we can let the WebAssembly virtual machine dispatch to different machine learning backends based on the metadata associated with the model; it could be OpenVINO, TensorFlow, or ONNX in this example. So that's our approach to the machine learning model. Next, Andrew will cover how to write a wasi-nn application.

Hi, my name is Andrew Brown. I'm a software engineer at Intel. Let me tell you a little bit more about the wasi-nn API. What you see here is the WITX specification of that API. It's a five-function API, including load, initializing an execution context, setting inputs, retrieving outputs, and, most importantly, computing the inference using the loaded graph. Here's an example of those bindings in AssemblyScript; what we're trying to do here is make it a lot easier for users to use wasi-nn from their WebAssembly applications. Here are the same bindings, but in Rust. You can see that the same functions are exposed, but with higher-level constructs, which should make them a lot easier to use.

Though wasi-nn is not tied to any specific WebAssembly engine or machine learning framework, we did have to start somewhere, so we used Wasmtime as the engine and OpenVINO as the machine learning framework. This diagram shows you all the various components in our implementation. Starting on the left, you'll see the user application code, which, when tied together with the wasi-nn bindings, can be compiled down to a WebAssembly file. That WebAssembly file will take tensor inputs, for example from an image, along with the model file or files, and will pass those on to the engine, in our case Wasmtime. Wasmtime provides the wasi-nn implementation, but when it comes to the machine learning heavy lifting, it proxies calls down to OpenVINO, which executes them on a CPU or GPU, etc. These middle boxes can be swapped out: both the engine and the machine learning framework could be replaced with different implementations.

Let me switch over to my IDE for this demo. Okay, let's look at an example of classifying this image of a pizza inside Wasmtime, with wasi-nn enabled, from AssemblyScript. The first thing we're going to need is a version of Wasmtime with wasi-nn enabled. That'll look something like the following: what I'm doing here is compiling Wasmtime with the wasi-nn feature and using some pre-installed OpenVINO libraries as the backing implementation. Next, I'll place Wasmtime on my PATH for convenience, and I'll set up my library path to load the OpenVINO libraries.

Let's go back to our example. I've written this AssemblyScript code, and you can see here that it's using the high-level API, the bindings we provide in this library. It loads the graph, both the graph description and the graph weights, attempts to use the OpenVINO implementation on a CPU, initializes the execution context, loads up the tensor, computes the classification, and then, with the output tensor, sorts those results and prints out the top five.
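For reference, the same five-call flow that the AssemblyScript demo walks through looks roughly like this with the Rust bindings mentioned earlier. This is a sketch, not the demo's actual code: the file names, the AlexNet-style 1x3x227x227 input dimensions, and the 1000-class output size are placeholder assumptions, and identifier spellings (such as the tensor's `type_` field) may differ between binding versions.

```rust
use std::fs;

fn main() {
    // Opaque model bytes: an OpenVINO graph description (XML) plus weights (bin).
    let description = fs::read("fixture/model.xml").unwrap();
    let weights = fs::read("fixture/model.bin").unwrap();
    // An image already preprocessed into raw f32 tensor bytes.
    let tensor_data = fs::read("fixture/tensor.bgr").unwrap();

    unsafe {
        // 1. Load the graph for a given encoding and execution target.
        let graph = wasi_nn::load(
            &[&description, &weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .unwrap();

        // 2. Initialize an execution context for this graph.
        let context = wasi_nn::init_execution_context(graph).unwrap();

        // 3. Set the input tensor (dimensions are model-specific placeholders).
        let tensor = wasi_nn::Tensor {
            dimensions: &[1, 3, 227, 227],
            type_: wasi_nn::TENSOR_TYPE_F32,
            data: &tensor_data,
        };
        wasi_nn::set_input(context, 0, tensor).unwrap();

        // 4. Compute the inference.
        wasi_nn::compute(context).unwrap();

        // 5. Retrieve the output scores, then sort to print the top five.
        let mut output = vec![0f32; 1000];
        wasi_nn::get_output(
            context,
            0,
            output.as_mut_ptr() as *mut u8,
            (output.len() * std::mem::size_of::<f32>()) as u32,
        )
        .unwrap();

        let mut indexed: Vec<(usize, f32)> = output.iter().copied().enumerate().collect();
        indexed.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
        for (class, score) in indexed.iter().take(5) {
            println!("class {} = {}", class, score);
        }
    }
}
```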
To compile this AssemblyScript example into WebAssembly, we're going to need some tooling provided in this repository, and we'll run asbuild. This uses the AssemblyScript compiler to compile the example I just showed you into WebAssembly. Let's take a look at what it generated. If we look at the imports of the optimized.wat file, you can see that it's using wasi-nn as well as some other WASI APIs, for example for reading files. Other things that show up in the build directory are an untouched version of that wasm file, and both .wasm and .wat versions of these compiled artifacts. I've placed in here the AlexNet weights and model files, as well as the image and the image converted to a raw tensor. We'll use those in our example as inputs to the classification.

Okay, let's run the classification. What it's doing here is loading the graph files; this helper function, readBytes, is probably not as efficient as it could be (I believe it's reading the file byte by byte). Then we'll see it set up the execution context and actually run the classification. Okay, so now we see that it's printed out the top five results for the raw tensor of a pizza that we passed in. But what do these mean? Let's take a look at the classes file that I also placed in our build directory. We'll grep for 963 and see that 963 corresponds to a pizza. So it is identifying a pizza from the image I showed you earlier. That's an example of using the AssemblyScript bindings to run wasi-nn programs from within Wasmtime.

After proving to ourselves that we could make this work, we wanted to see what the performance benefits are. The intuition was that WebAssembly couldn't make full use of the system's capabilities, whether that was wider SIMD, threading, or special machine learning instructions. To test that, we took as a starting point some work that Marat Dukhan had done to run TensorFlow models within the browser. We called that approach the wasm-exclusive approach and adapted it to work within a standalone engine, Node.js. On the right side, you'll see the wasi-nn approach, which uses a different engine, Wasmtime, and a different machine learning framework, OpenVINO. What we're trying to compare here is approaches, not necessarily engines or machine learning frameworks: we use the same machine learning model and the same input, which is then passed down to the machine learning framework.

So we ran some classifications using both approaches. What you see here is a chart of us running MobileNet classification a thousand times and measuring the average inference time on both a wasm-exclusive setup and a wasi-nn setup. You can see that, across various platforms, the wasi-nn approach is significantly faster. We thought this was probably because threads are available to wasi-nn but not to the Node.js TensorFlow setup, so we single-threaded the wasi-nn OpenVINO implementation, yet we still see a significant speedup. We think this is because of wider SIMD and other optimizations that OpenVINO can provide. Just to make sure our approach made sense, we also tested a more complex model: using the Inception model, we again ran classification a thousand times and took the average inference times. You can see that the wasi-nn approach is still significantly faster, even when single-threading the implementation.
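The measurement loop being described is straightforward; a sketch of that kind of harness (not the authors' actual benchmark code) might look like this, where `infer` would wrap a wasi-nn compute call on one side of the comparison and the wasm-exclusive TensorFlow inference on the other.

```rust
use std::time::Instant;

/// Run `infer` the given number of times and return the average latency in
/// milliseconds. Illustrative only: the real comparison ran each setup a
/// thousand times on the same model and input.
fn average_inference_ms(mut infer: impl FnMut(), runs: u32) -> f64 {
    let start = Instant::now();
    for _ in 0..runs {
        infer();
    }
    start.elapsed().as_secs_f64() * 1000.0 / f64::from(runs)
}
```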
Our summary is that there's performance being left on the table, and that the wasi-nn approach of using the full system capabilities is necessary to get the best possible performance; you can see speedups of up to 32x from the wasi-nn approach. We hope this motivates adding new engine implementations of wasi-nn, and adding new machine learning framework backends to those engines.

Finally, a call to action. If you are a machine learning practitioner, please go download Wasmtime and start using wasi-nn. Tell us what you like, what you don't like, and what kind of improvements you would like to see. If you are interested in the wasi-nn proposal in terms of the spec or architecture, please engage with us in the WASI community and help us drive this proposal to final approval; it's a lot easier to make changes now, while the proposal is in phase two, than later. If you are a WebAssembly virtual machine implementer, you have the opportunity to provide an additional reference implementation of wasi-nn on your own virtual machine, perhaps with different backends such as TensorFlow, ONNX, or anything else. So that concludes our presentation today. Thank you.