Hey everyone, my name is Radu, and in the next nine and a half minutes we're going to talk about WebAssembly and machine learning. We'll have a quick look at WASI-NN, a new proposal for neural networks in WebAssembly.

I'm a software engineer at Microsoft Azure. If you've seen a bunch of Deis Labs projects mentioned today, I work on that team. We also do a lot of work with the Bytecode Alliance and the CNCF. TL;DR: I'm interested in everything that has to do with WebAssembly, machine learning, and distributed systems, and most often the combination of the three. I'm a maintainer of or contributor to Wagi, Krustlet, Hippo, Bindle, the experimental HTTP extension to WASI, and WASI-NN. This talk, and a lot of WASI-NN itself, has been contributed by Andrew Brown and Mingqiu Sun from Intel. So this is a shoutout to Andrew, who is, I think, on a sabbatical. Hope he's enjoying his time off.

We're going to talk a little bit about WebAssembly and WASI, and then about machine learning in WebAssembly. I paid very close attention: no one actually mentioned today that WebAssembly is neither web nor assembly, and that is absolutely a given at any WebAssembly conference. A lot of people talked about what WebAssembly is and what it isn't, but in short, it's a binary instruction format for a stack-based virtual machine; it lets us run cool stuff on the web that isn't just JavaScript. But WebAssembly itself has no necessary connection to the web. You can run WebAssembly outside the browser pretty well, and that's where WASI comes in: it's a capability-oriented API, designed to standardize the execution of WebAssembly outside the browser. What WebAssembly and WASI give us outside the web is a portable, lightweight, fast, and secure way of executing semi-trusted or untrusted code. If you're just getting started, or interested in going in depth on anything that has to do with WASI, you have to read Lin Clark's Code Cartoons. They are probably the best resource for anyone getting into WebAssembly and WASI.

Why would we want to run machine learning in WASI? Mostly for the same reasons we want to run anything in WebAssembly. We want the portability that WebAssembly gives us, and the flexibility to run the same WebAssembly module anywhere. Specifically for machine learning, we want to run the same machine learning model across different architectures, we want language-agnostic access to different machine learning runtimes, and, more importantly, we don't want to keep re-implementing the same CPU instructions and machine learning operators.

But there are a couple of problems that make running machine learning in WebAssembly today not ideal. First and most obvious, we don't have GPU access in WebAssembly runtimes. There's also no multi-threading or other hardware acceleration. There are CPU instructions missing from WebAssembly that probably no one is going to add for each machine learning framework. And in general, deploying machine learning in production is just difficult.

So this is where WASI-NN comes in: the WASI neural network proposal. It's a way of allowing us to load a neural network model into a runtime and run inferencing on that model, and it's framework- and format-agnostic.
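To make that concrete, here is a minimal sketch of what the guest side of the proposal looks like, based on the early Rust bindings (the `wasi-nn` crate) and the OpenVINO backend; the file names, tensor shape, and output size here are hypothetical and entirely model-specific, and exact binding names have shifted between versions of the crate:

```rust
// A minimal sketch of a WASI-NN guest module, assuming the early Rust
// bindings (`wasi-nn` crate) and the OpenVINO backend in the host runtime.
// File names, dimensions, and output size are placeholders for your model.
use std::fs;

fn main() {
    // An OpenVINO model ships as two opaque byte arrays: the network
    // topology (XML) and the weights. WASI-NN never interprets these
    // bytes itself; the host runtime's backend does.
    let xml = fs::read("fixture/model.xml").expect("failed to read model description");
    let weights = fs::read("fixture/model.bin").expect("failed to read model weights");
    // Input tensor bytes, pre-processed ahead of time (e.g. an image
    // resized, normalized, and flattened to f32 bytes).
    let input = fs::read("fixture/tensor.bgr").expect("failed to read input tensor");

    // The raw bindings are `unsafe` because they pass pointers across the
    // guest/host boundary.
    unsafe {
        // 1. Load the opaque model bytes into the runtime's backend.
        let graph = wasi_nn::load(
            &[&xml, &weights],
            wasi_nn::GRAPH_ENCODING_OPENVINO,
            wasi_nn::EXECUTION_TARGET_CPU,
        )
        .expect("failed to load graph");

        // 2. Initialize an execution context for this graph.
        let context =
            wasi_nn::init_execution_context(graph).expect("failed to create execution context");

        // 3. Bind the input as a tensor; the dimensions are model-specific.
        let tensor = wasi_nn::Tensor {
            dimensions: &[1, 3, 224, 224],
            r#type: wasi_nn::TENSOR_TYPE_F32,
            data: &input,
        };
        wasi_nn::set_input(context, 0, tensor).expect("failed to set input");

        // 4. Run the inference; the host runtime does the actual work,
        //    with whatever hardware acceleration it has access to.
        wasi_nn::compute(context).expect("failed to compute inference");

        // 5. Copy the output tensor back into guest memory.
        let mut output = vec![0f32; 1001];
        wasi_nn::get_output(
            context,
            0,
            output.as_mut_ptr() as *mut u8,
            (output.len() * std::mem::size_of::<f32>()) as u32,
        )
        .expect("failed to get output");

        println!("first results: {:?}", &output[..5]);
    }
}
```

Compiled with `cargo build --target wasm32-wasi`, something like this runs under any runtime that implements the proposal; in Wasmtime at the time, that meant opting in to the experimental WASI-NN module when invoking the runtime.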
So if you have an implementation, you can run it for PyTorch, TensorFlow, or any other machine learning framework or model you want to run it for. It's much faster than running inferencing in pure WebAssembly, and it's really portable: we'll see a recording of the same machine learning model, neural network, and runtime across Raspberry Pis, Intel and AMD CPUs, and GPUs as well. The current implementations are for OpenVINO and the ONNX Runtime.

Essentially, it's a pretty simple API. It lets us load an opaque byte array as a neural network model; we can initialize the execution context and bind some inputs as tensors; we can compute the inference; and then we can get the output tensors using a dedicated get_output call. The API is written in WITX, which means that for anything you can either generate code for, or manually write WITX bindings for, you can do it. It's not fun. But in essence, if you want to write the low-level implementation, all you have to do is implement the handful of core API functions that WASI-NN gives you, and you can do that for any WebAssembly runtime.

The interesting part is here: we're going to see the same neural network, the same runtime, and the same WebAssembly module run first on an Intel MacBook, using the Wasmtime WebAssembly runtime. Hope that's visible; I'll just let it run. It takes the same machine learning model, built with PyTorch, that does image inferencing, and now runs it on a Raspberry Pi. The great thing about this is that you can cross-compile WebAssembly from anywhere, and, as I mentioned earlier, we can take the same model and the same runtime and run them in something like Wagi.

One of the important things is that it's more performant than running pure WebAssembly; you can have a look at the benchmarks. The actual inference is really, really fast, around two milliseconds. Tensor pre-processing is the slower part: it's around ten times slower than the actual inference. The other catch is that you still have to re-implement the pre-processing outside Python, which is not ideal. One of the other really interesting proposals we've seen recently is wasi-parallel, and one of the ideas is running the pre-processing using wasi-parallel.

You can find most of the implementations and resources about this on GitHub. Any questions?

Go ahead, and I'll repeat the question. The question is why it's faster than pure WebAssembly. There are ways of running inferencing in pure WebAssembly, and the way that's usually done is a re-implementation of the machine learning framework, compiled to WebAssembly. What WASI-NN does instead is let you offload the execution and the inference to the runtime, and the runtime has access to hardware acceleration, a GPU, or anything like that. So what you do is, through the WASI-NN API, you pass the model bytes, you pass the input as a tensor, the runtime does the inferencing using GPU access, and then you get the tensor back.

Okay, thanks, Radu. That was great. Thank you.