Hello, everyone. My name is Michael Yuan and I'm the CEO and founder of a company called Second State. We do software performance engineering. Welcome to my talk. This is my home office in Austin, Texas during the pandemic. I hope you enjoy this talk, and I hope we can meet in person soon. Let me start sharing my screen first and move myself to the corner. OK, so here we go.

The topic of this talk is performance engineering for application developers, especially web application developers. When this topic came up, a lot of people asked why, because for many years the main challenge of web applications has not been performance. That is very evident in the rise of frameworks, first with Java, then with Ruby on Rails, then with JavaScript on Node.js. None of those frameworks are known for their performance. They are known for developer productivity, ease of use, safety, and other features, just not performance. Over time they got optimized a lot, but when they first came out, most people would say those are slow languages. When Java first came out, it was definitely a slow language; it had a reputation for that. When Ruby first came out, and when JavaScript was first used on the server, the reaction was the same.

People typically have two answers to that. The first answer is what we might call the Bill Gates answer: yes, I know it's kind of slow, but the technology evolves so fast that in three or four months it will be fast enough. That's essentially the answer Bill Gates gave when people criticized Windows software for being too bloated. He said the CPU of a computer is evolving so fast that what's slow today is not going to be slow tomorrow, so why spend all this time optimizing things that will improve on their own? Why not spend that time on user experience, features, and other things that have long-term value? That worked really well for him and for Microsoft.

The second answer is: I'm just writing a web application, and all the application does is babysit a database. A request comes in, I go to the database, fetch the answer, and send it back. That use case doesn't need a whole lot of performance. So the first answer is that performance will improve by itself, and the second answer is that I don't need it.

But I think things have changed in the past couple of years, in ways that have forced the entire industry to take an entirely different look at this problem. I'll refer to an article published in the journal Science only a couple of months ago by a group of top computer scientists. From the title of the article you get a hint: there's plenty of room at the top, meaning the top of the stack. That is a reference to Richard Feynman's famous talk from decades ago, "There's Plenty of Room at the Bottom," where the bottom of the stack is the hardware, the CPU. The idea was that the CPU and the hardware would continue to improve at a rapid pace, so everything else would be lifted with the tide, and you wouldn't need to spend a lot of time optimizing software.
You could instead spend your time optimizing software for developer productivity, human productivity rather than computer productivity, because what's slow today would be fast enough three months from now. That is what we refer to as Moore's Law, and it has basically driven the productivity improvements of the entire computer industry for the past 40 years. So that's essentially the first argument I just gave: yes, it is slow, but it's going to improve.

However, this article points out that Moore's Law has stopped. As we all know, computer hardware is not really getting faster anymore. The clock speed is definitely not increasing, and transistor sizes have come down a little, but not a lot. The way to get more speed now is to use GPUs, or to add more cores and do more parallel computing, things of that nature. The raw performance is not improving; you are just throwing more computers at the same problem. That is perhaps one of the biggest challenges developers face today, a challenge that did not really exist in the past 40 years.

Then there's the second argument: OK, maybe hardware is not getting any faster, but all I do is database work, which is primarily limited by network bandwidth and disk speed, and if there are more requests I can just open more servers. There has been a lot of study of the horizontal scalability of those services; we can do sharding and many other things. So why is performance suddenly so important? Because another thing happened in the past five years while Moore's Law was stopping: on the software side, we figured out how to utilize more and more computing power, especially with the emergence of artificial intelligence and deep learning. Those applications require a huge amount of computing power that was previously not available. If you want to incorporate those features into the application you are building, something that is not just a database front end, you actually need to care about performance. As we will see later in this talk, there is a huge difference between the software stack you choose and the performance you are going to get.

Here is figure one from the Science paper. Hardware has stopped improving, but the demand for computing power has gone up. What's the solution? The leading computer scientists give us the answer: there's plenty of room at the top, meaning today's software stack has a lot of inefficiency that we can squeeze out by doing software performance engineering. Like I said, that's why I got into this, and that's why I'm giving this talk. It means removing software bloat and tailoring software to hardware. We now have new hardware designed for specific problems, like deep learning, or even Bitcoin and blockchain; there is special hardware being developed for those. Tailoring software to that hardware, rather than running generic software designed for the CPU, is the other half of the answer.
By combining those, the paper shows that taking a Python program, rewriting it in state-of-the-art C++, and running it on custom hardware can yield a performance gain of four orders of magnitude. That's over 10,000 times improvement. That is also something we have seen routinely. When people have a web application that just does database work, they don't add AI, not because they don't want to, but because they can't: the software stack is not efficient enough to support that kind of compute-intensive application. By changing the software stack, we can squeeze out a lot more performance without changing the underlying CPU. We may need customized software, and we may need customized hardware in the mix, but not more CPU power, because CPU power is not improving anymore.

So let's see a case study, an example in action. The goal here is to create a cloud application that runs TensorFlow inference on an image. I think that's a use case a lot of people have considered: I have a web service or web application that takes an image and runs a TensorFlow model on it; there are literally hundreds of thousands of TensorFlow models trained for different purposes, to do things like facial recognition or object classification, anything you can think of. So this is a typical use case: TensorFlow inference on an image.

The result depends heavily on the software stack we use. If we implement this TensorFlow inference in, say, JavaScript, and run that JavaScript in a very efficient runtime, for instance V8 in interpreter mode, meaning the JavaScript runs the way it was designed to run, as a scripting language, we are looking at 10 minutes on a state-of-the-art CPU to process a single image. Ten minutes is clearly too long for a web application. It may be OK for an asynchronous backend job, where there's plenty of time on the back end to process things, but 10 minutes of waiting is clearly not good enough for a web application.

Now we rewrite it with TensorFlow.js running as just-in-time-compiled WebAssembly. That's no longer interpreter mode: TensorFlow.js here is TensorFlow's C library cross-compiled into JavaScript and WebAssembly through the EMCC (Emscripten) compiler, and it runs in V8's WebAssembly engine with just-in-time compiler optimization. That speeds things up significantly, because the application is compiled and run instead of just being interpreted. That alone creates a 500 times performance gain: from 10 minutes down to around two seconds. Two seconds is fairly acceptable for a web application, because the network transmission alone takes on the order of seconds, so two seconds is probably good enough for a single interaction like this. However, we can optimize the WebAssembly stack further, not by going native, but by using WebAssembly with ahead-of-time optimization.
For instance, the SSVM, the Second State VM, is an open-source WebAssembly implementation that we built. Running the same application there gets you to half a second, which is another fourfold performance improvement. Half a second is very acceptable for most interactive web applications. But you can optimize even further. By giving it custom hardware like a GPU, and having the compiler emit instructions optimized for the GPU, you get another 10 times performance gain: now down to 0.05 seconds or less per image. If you think about it, that pretty much allows you to process real-time video, because real-time video runs at about 30 frames per second. If you want to recognize every face on every frame of a video stream, you may not quite manage all 30 frames per second, but you certainly have the capability to sample, say one recognition for every three frames, something like that. That level of performance lets you actually handle streaming video and run applications on top of it. Then, of course, you can go fully native, meaning you don't use WebAssembly at all: you write the code in C/C++ and run it on bare-metal CPUs, and that gets you even more performance, under 0.05 seconds.

So that's the range of things we can see. Going from interpreted JavaScript to precompiled WebAssembly, which is basically the same family of technologies because WebAssembly evolved from JavaScript, we already got four orders of magnitude of performance improvement. So I agree with the Science authors: there is plenty of room at the top, and in this talk I want to give you some real examples and tips on how to optimize your application.

There's a saying that it's turtles all the way down; so what is this world built on? As we just mentioned, when JavaScript on Node.js first came out, people said the performance was not good enough and really needed to improve, and it did improve; now most people who use Node.js think it performs pretty well. That's because Node.js fundamentally is not a JavaScript framework; it is a JavaScript runtime built on a C++ engine. Google's V8 is written entirely in C++, and most of the common tasks you use in Node.js, like file system operations, image resizing, and other compute-intensive tasks, are implemented as native modules in C++; JavaScript is just the interface provided to developers. So what I would argue is that Node.js achieves its high performance today not because it is a JavaScript runtime, but despite its use of JavaScript: most of the heavy lifting is done by native machine code written in C and C++.
The same thing happened in Python. If you do heavyweight machine learning and AI applications, you often write your application in Python, but you are only using Python as an interface: your Python calls get translated into native modules that run natively. OpenCV, TensorFlow, all those libraries and modules are written in C++ and compiled to machine code.

So the trend has become very clear. On one side we need something for developer productivity: JavaScript, Node.js, Python. On the other side we need something for high performance, which translates everything into native binary code. Most applications that application developers write sit in the middle. If you write a complex Node.js application, a lot of your logic is executed in JavaScript, and some of it, like image resizing or file operations, gains from the high performance of the native modules. But the more application logic, the more business logic, you have in your application, the more JavaScript it uses, and the more JavaScript it uses, the slower it goes. For most application developers, it's a balance between the two. For a very simple application, I can achieve really high performance, because almost all I do is call Node.js APIs that get translated into native function calls. But as my application grows more complex, say I'm an enterprise developer and 80% of my logic is written in JavaScript with no corresponding native library, then you can expect the application to run pretty slowly. I think that's a challenge more and more developers are discovering: Node.js or Python is really performant if you follow the tutorials, but if you write a ton of your own code, performance engineering becomes a really big problem. I think that will be more and more of a problem for future developers.
So, because of all that, Moore's Law is no longer holding, the demand for performance is great, and by choosing different frameworks you can achieve vastly different performance results. Most frameworks today take the route of a high-level interface with the actual computing done in native code. That's how we define great performance: follow the lead of what Node.js and Python have done. On the performance side, we have only one requirement: it should be as fast as a native library. If we come up with a new framework for high-performance computing in Node.js, it should make the JavaScript application you write as fast as the native libraries behind the Node.js API calls. That is the only requirement on the speed side.

On the other side, the software engineering side, there are lots of other requirements the framework must satisfy for developers to use it. If speed were all that mattered, we could write everything in C++, and people are not doing that and will not do it, because native binaries have all kinds of problems. We need the other benefits of software engineering that we have gained in the past 25 or 30 years, since Java came along: the safety and security of a managed runtime; the portability of the software, so that I can compile it on my Mac and run it on Linux, without needing the exact same machine in my development environment and my deployment environment, and so I can move it around between environments; and manageability, so I can allocate resources to it, say which resources it has access to and which it does not, stop it, restart it, move it to another machine, all the things IT folks want. The new framework also needs to integrate with existing ecosystems. We can't have something brand new, because the learning curve would be really steep, and it took something like 10 years for Ruby on Rails to get popular, and likewise for Node.js. To win developer mindshare, it needs to integrate with a large existing ecosystem. And it needs high developer productivity. We are looking to squeeze performance from the top, but not in a way that is so skewed it only optimizes for the machine; we want to preserve the software engineering gains we achieved over the past 30 or 40 years. What we want is native performance, but with all the nice features that managed languages and managed runtimes gave us.

That sounds like a wish list, right? Node.js and Python have achieved some balance of it, which is precisely why they are successful. But as I mentioned, their shortcoming is that while they provide fast APIs, they don't give developers the flexibility to write arbitrary JavaScript or Python application code and achieve the same performance as their native APIs provide. So what's our answer?
Well, the first thing we really need to think about is our choice of programming language. If we want to write high-performance applications, then, as the Science paper also alluded to, software bloat starts with the choice of programming language. C/C++ is frequently cited as the most efficient, but it's not for everyone, because of all the safety, portability, and other issues we just described. Here's a table of the performance of different languages, in terms of execution time and the memory they take at runtime. You can see that some languages, like Python, are very, very slow, and Java takes a huge amount of memory at runtime. C/C++ can be used as the benchmark: small and efficient. And what comes second? Rust. Rust is a very popular programming language, popular as a C/C++ replacement because it brings a lot of high-level language features to the C/C++ space, making it far more productive to write safe and secure code in Rust than in C/C++. Because of that, it is one of the fastest-growing programming languages, and it has been, I think, five years in a row the most loved programming language in the Stack Overflow survey. So I think Rust is a pretty good language for writing high-performance applications that can run inside environments like Node.js. That's my assessment, and if you don't agree, email me or get in touch. Arguing about which programming language is better is a holy war; everybody has their own opinion, and this is just mine. I like Rust; that's why I'm giving this talk.

However, how do we run Rust code on the web? There's the old way, which I think is still the predominant way people do it now: write it as a native module. I create a web server that has a native module; the server could be Node.js or something else. Or I write the whole web server in Rust, so that it executes Rust application logic behind the HTTP interface. A lot of projects have been done this way, and they have seen huge performance gains over approaches like plain JavaScript in Node.js. However, the problem is the one I mentioned: it has to run as native code, which is not friendly to cloud providers, and not even friendly to the IT department, because people cannot really audit native code. Even if you are completely trustworthy yourself and say my code has no bugs and no security problems whatsoever, most of this code depends on third-party libraries, and those libraries might have problems. Running native code is always an issue for big organizations. In my opinion, that's what hinders the adoption of Rust in big enterprises at this moment: the need to run things natively. Even with containers it's not completely safe, and containers have all kinds of other issues, which we'll talk about in a minute.

A better way is to compile to managed bytecode, the approach that Java pioneered 20 years ago. Compiling to bytecode and running inside a managed VM gives you a lot of benefits, and today the leading choice for this VM is WebAssembly.
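As a concrete illustration, here is a minimal sketch of what that looks like on the Rust side: an ordinary Rust function compiled to WebAssembly bytecode (for example with the `wasm32-wasi` target) and then executed inside a WebAssembly VM rather than as a native binary. The function name and logic are just an illustration, not part of the demo we discuss later.

```rust
// A minimal, self-contained example of Rust code destined for WebAssembly.
// Build it with something like:
//   cargo build --target wasm32-wasi --release
// and the resulting .wasm bytecode can be run inside a WebAssembly VM.

// `#[no_mangle]` + `extern "C"` keep a stable, exported symbol name
// so the host (a WebAssembly runtime) can find and call the function.
#[no_mangle]
pub extern "C" fn fib(n: u64) -> u64 {
    // Iterative Fibonacci as a stand-in for "compute-intensive logic".
    let (mut a, mut b) = (0u64, 1u64);
    for _ in 0..n {
        let next = a + b;
        a = b;
        b = next;
    }
    a
}
```

The point is that nothing about the Rust source changes: the same code that could be compiled to a native binary is compiled to portable bytecode instead, and the VM takes over safety and management.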
Here is what the co-founder of Docker, Solomon Hykes, said last year: WebAssembly on the server is the future of computing. He even went so far as to say that if WebAssembly and WASI, the WebAssembly system interface, had existed in 2008, they would not have needed to invent Docker. That is how important it is: a new high-level container that lets us run bytecode compiled from highly efficient programming languages. That's something we have not seen in a long time in this industry, and I think that's what's most exciting about it.

The benefits of a WebAssembly VM are points we have mostly touched on already; they're on my slide. The first, of course, is security: it provides a sandbox, so it protects you from yourself and from all the dependencies your application has, and you don't have to run them as native code. It also provides something called capability-based security, which is part of the manageability: when you start up the VM, you can specify which file system directories it has access to and which environment variables it can read. So you can have an independent security policy, one that is independent of the operating system itself. Security is a big draw for running Rust-compiled, high-efficiency code in a WebAssembly VM.

But of course, none of this matters if it's not fast. Since we are talking about high-performance computing, WebAssembly has to meet the requirement that it runs at close to native speed: bytecode running in WebAssembly should be almost as fast as code compiled to a machine binary. As we'll see in a minute, that is actually true, which may surprise a lot of people. Runtime safety is related to security: when an application crashes, it should not affect other applications running on the same system. Docker gives you some of that, but we'll talk about Docker's problems in a minute; they are mostly related to performance. Portability, or platform independence, is also a big draw: I can write my application on a development machine running Windows or Mac or whatever, and the same application can be executed at a cloud provider; I don't even need to know what operating system the cloud provider uses or what standard libraries it has. And there's the manageability we just talked about: it can be moved around, you can allocate resources to it, all those nice features a VM can give you.

Here's a graph from a study we did on WebAssembly, using the Second State VM as the example. These are execution times, so the shorter the bar, the better: the shorter the bar, the less time it takes to accomplish the task. The bottom two benchmarks, NOP and CATSync, measure startup time and file system performance. One thing that really stands out is that Node.js isn't that bad compared with Docker-native and Docker-Node.js running the same application: Node.js is only about 100% slower than the native application. That is, of course, because these standard benchmarks are heavily optimized in Node.js. They are essentially all number crunching, calculating binary trees, the N-body problem, and things like that, and those patterns are very heavily optimized in Node.js's V8 runtime. When the JavaScript engine sees something like that, it knows how to generate very efficient native machine code to execute it.
In a more realistic benchmark, Node.js would be 10 times or maybe even 100 times slower than the native code if you ran everything in JavaScript, but because of how these benchmarks are structured, it's actually not that bad, only about twice as slow. What's really surprising, though, is that in almost all cases the WebAssembly runtime, meaning Rust code compiled to bytecode and run inside the safe, sandboxed VM, is actually faster than native code running in Docker. Many people have challenged this result when we've shown it to them; we can talk about that in a minute. They say, how is that possible? Even with Docker, which has very minimal overhead, the machine code should definitely be faster than anything running inside a sandbox. The answer lies in how virtual machine technology has evolved over the past 20 years, pioneered by Java: AOT, ahead-of-time compilation. When the VM sees the WebAssembly bytecode, the first thing it does is compile it to machine code, and then run that. In the process of compiling to machine code, it can apply many optimizations that are not available when you compile to Docker plus native. When you compile a Rust or C++ application to a full native binary, you specify a set of compiler optimization flags that are quite generic, based on your guess about the target system: you might say it's an Intel processor, or just that it's an x86 processor, and optimize for that. But the SSVM, the WebAssembly runtime, knows exactly the machine it's running on. It knows whether it's an Intel or AMD processor, it knows how much memory it has, so it can intelligently choose the compilation flags that make the code run fastest, because all of this is done dynamically at run time, not at compile time anymore; at compile time you just generate the intermediate bytecode. That allows WebAssembly to achieve even higher performance than native code. One of our requirements was "as fast as native machine code," and you can see from these benchmarks that WebAssembly bytecode with AOT compilation actually does better than the precompiled binary. That's a very surprising result, and because of it we wrote an article that has been peer-reviewed and accepted for publication in IEEE Software; it's coming out this year. It's called "A Lightweight Design for Serverless Function as a Service," and it compares the Second State VM against leading serverless containers, including Docker, Amazon's Firecracker, and Google's gVisor with Docker. It's a research paper, already peer-reviewed and accepted, and if you want a preprint, let me know and I'll send you one. The surprising result is that WebAssembly provides safety and better performance, so you don't have to trade safety and developer productivity against native-code performance; you can actually have both. That's a big takeaway from this paper, and maybe from this talk as well.

Now let's look at the application architecture of a typical WebAssembly-based application running inside Node.js. On the front end there is the Node.js JavaScript runtime, so developers can write things in JavaScript as they want to, because it's been proven again and again that developers want to write most of their application in JavaScript, and call into Rust-compiled WebAssembly for the compute-intensive parts.
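To make that JavaScript-to-Rust boundary concrete, here is a minimal sketch of what the Rust side of such a module can look like, using the `wasm-bindgen` crate to export a function that JavaScript can call with a byte array and get a string back. This is just an illustrative shape, assuming a wasm-bindgen-style toolchain; it is not the exact code of the demo that follows.

```rust
// A minimal sketch of a Rust function exported to JavaScript through
// WebAssembly, using the `wasm-bindgen` crate.
use wasm_bindgen::prelude::*;

// JavaScript passes in raw image bytes (e.g. from a file upload) and
// receives a JSON string back; all the heavy computation stays in
// compiled WebAssembly rather than in JavaScript.
#[wasm_bindgen]
pub fn infer(image_data: &[u8]) -> String {
    // Stand-in for the real work (resizing, running a model, etc.).
    format!("{{\"bytes_received\": {}}}", image_data.len())
}
```

On the Node.js side, calling this looks like an ordinary function call into the generated module; the runtime handles marshalling the bytes across the boundary.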
At run time, the runtime figures out what the application is actually doing. If the application is making regular Node.js function calls, they are typically delegated to the C++-based module library, so they get executed as native binary. But as we said, native binary is only nice if it's audited and safe, and a lot of third-party native libraries are not, unless they come prepackaged with Node.js; installing third-party native libraries is the challenge. For the other compute-intensive tasks, the application can optionally run them in a WebAssembly runtime and get the result returned back to JavaScript. For instance, this could be a complete task of image recognition or face recognition: the Node.js runtime just passes the image and the model to the WebAssembly runtime, which accesses the GPU and does all that magic, and this particular WebAssembly machine code is compiled from a Rust program. So developers write two pieces of software: a JavaScript application that handles the web-related stuff, and a Rust application, compiled to WebAssembly, that the JavaScript calls through an API. We're going to show a very concrete example in a minute, and there's also a link if you're interested; you can go to that website and see exactly how the different components of this system work together.

Now let's look at the demo: AI as a service. Go to this website, secondstate.io. It's FaaS, function as a service, a WebAssembly-based function as a service. Let me switch my screen share to this website. Here we go: this web page is our demo, and it has a lot of tutorials and code; we're going to see an example in a minute. Let's go to face detection. It opens an application we wrote to detect faces, and I think this application represents the future, in the sense that it is a completely static application. There is no server here; we can put it anywhere, on GitHub Pages or any static host, or even host it locally. What it does is make a request to a serverless function. The serverless function is written in Rust, compiled to WebAssembly, and it processes the request via Ajax and returns to this web page. So all you need is a service where you can deploy a WebAssembly function and pay per invocation; there's no need to stand up a server running 24/7 to process image uploads and all that.

Here's how it works. You choose a file, an image, and of course you can do a lot more, and then you click "find faces." It takes the image and makes a web services call to the FaaS. The function as a service executes the WebAssembly function, which first detects all the faces in that image, then draws green boxes around each face, and then returns the image back to our browser. So we have this fully interactive application, available 24/7, but you don't need to start a server and pay for that server to keep it available. You can come here and use it at any time, and as the host of the service I'm not paying for anything except each request; all the idle time costs me nothing.
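As a sketch of what such a serverless function can look like at its simplest, here is a Rust program that reads the uploaded image bytes from standard input and writes the processed result to standard output. This is one simple way a WASI function can be wired into a FaaS request/response cycle, though not necessarily how this particular service does it, and `detect_faces` is a hypothetical placeholder for the real logic.

```rust
use std::io::{self, Read, Write};

// Hypothetical placeholder for the real face-detection logic; in the
// actual demo this would run a TensorFlow model and draw green boxes
// on the image.
fn detect_faces(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

fn main() {
    // The FaaS host feeds the request body (the uploaded image)
    // to the function on stdin...
    let mut image = Vec::new();
    io::stdin().read_to_end(&mut image).expect("failed to read input");

    let marked_up = detect_faces(&image);

    // ...and the bytes written to stdout become the HTTP response
    // (the image with the faces marked).
    io::stdout().write_all(&marked_up).expect("failed to write output");
}
```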
Some of you might recognize that this is Nobel Prize season, and this is the 1927 Solvay Conference, probably the most famous conference in the history of science. There are 29 faces here, and you may recognize some; most people would recognize Albert Einstein. Seventeen of the 29 people were or became Nobel laureates, and Marie Curie won the prize twice. These are perhaps the greatest minds in modern history. We submit the image, the computer runs a TensorFlow model and detects all the faces, and it comes back, very simple. You spend most of the time uploading the image, getting it back, and rendering it in the browser; the actual image recognition on the server takes under one second at this point, and we can optimize a lot more, but I just wanted to show you this example.

Then let's see another example: image classification, another very popular kind of TensorFlow model. I choose an image; I already know it's a dog, and I want to classify it. It runs a TensorFlow model to try to detect what is in the image and give the confidence of that. In this case it doesn't return an image; it returns a JSON string. It looks up its object classification table and says it's a pug, and it has very high confidence. This whole process, again, takes several milliseconds on the server, plus the time to upload the image and get the response back. And all this time, as the service provider I don't need to pay anything, because I have no server; I just have a bunch of HTML and a function as a service sitting somewhere for someone to hit. So that's high-performance computing plus a new billing model that gives us a lot of flexibility.

Let's go back to our slides. We have just seen an example of AI as a service. I invite you to go to that website and play with it yourself; all the source code is on GitHub, and we'll walk through one of the source files in the remainder of this talk to see how it works. This is the demo, and its architecture is the overall architecture I've just described, but let me dive a little deeper into how it works. On Node.js there is a JavaScript app for the web, network, and database tasks; that part is handled by the function-as-a-service infrastructure, so it's out of the picture here. Where we start is the Rust source code, which is the function itself. The function gets its input from the service, which is the image the user uploaded, and it does all the heavy lifting: processing the image, resizing it to the right size, selecting the TensorFlow model, passing the image and model to TensorFlow, and after the result comes back, doing things like drawing the boxes around faces, generating the JSON, making sense of the result, and returning it. All the heavy lifting is done in the Rust source code, which is compiled to WebAssembly and executed in WebAssembly. The actual TensorFlow work is delegated to an interface we call WASI for TensorFlow, an AI extension to the WebAssembly system interface. Because this is a fairly standard system command that requires full native functionality, it doesn't really make sense to run it inside WebAssembly: it's so standard that there's very little chance anyone could make it unsafe, so the sandboxing benefit of WebAssembly doesn't apply there. Instead we make a standard interface available as a Rust API so that developers can call TensorFlow.
TensorFlow has tremendous value for developer productivity, so we have a native wrapper for the TensorFlow command that can be invoked from WebAssembly. The WebAssembly module runs inside the function as a service and processes the application logic; when it encounters a task that needs TensorFlow, it passes it off to the native command, gets the result back into WebAssembly, processes it, returns to Node.js, and Node.js gives it back to us. That's the whole flow of the application.

The Node.js code, again, is very simple: it just takes the image and returns the result. What's important is the Rust and WebAssembly code, and as we'll see, this is the meat of the application. This is why you want to write the application logic in Rust and WebAssembly: each of these steps would be very slow if you wrote them in JavaScript. First, you load the image the user uploaded and resize it to what the TensorFlow model demands; for image recognition, TensorFlow demands a 224 by 224 image with RGB values for each pixel, so you take the image as JPEG or PNG and normalize it into the particular format TensorFlow can accept. Second, you load the model data: you select which TensorFlow model you want to run against this particular image. Is it face detection, face recognition, or object detection? Third, you load the data associated with the model. In this particular example it's object classification, but TensorFlow only gives back numbers, and you have to map each number to the actual word for the object, like dog or cat or uniform or whatever, so you need a label file, and we read the label file that goes with the model. Step number four is to actually execute the TensorFlow model, which we'll talk about on the next slide. Once we execute the model, it gives back an array of tensors. That's what TensorFlow is about, and here's a fun fact: the tensor was popularized by Albert Einstein, whom we saw in the last picture, because he needed it for general relativity, and now we are using TensorFlow to recognize a picture of Albert Einstein. Anyway, just something I thought would be interesting to mention. In step number five, we take the returned floating-point numbers and look up the table to map those numbers and probabilities to the objects they represent; if we were doing face detection, we would load the image again and draw the boxes, and we'd do it with much higher performance than any JavaScript library could ever achieve. Then, in step six, we send the result back.

That is what we call a high-performance function: a function written in Rust and compiled to WebAssembly, so that it runs at close to native speed, with access to the operating system to make it run even faster. For instance, with TensorFlow we don't have to rewrite TensorFlow in Rust; we just make a system call from the WebAssembly runtime. That shows how flexible the WebAssembly runtime is: on one end it integrates with JavaScript, and on the other end it integrates with the native operating system.
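Putting those six steps together, here is a condensed sketch of what such a Rust function can look like. It assumes the `image` crate for decoding and resizing, and a hypothetical `wasi_tensorflow::run_model` wrapper standing in for the WASI TensorFlow interface described next; the file names and helper are illustrative, not the exact demo source.

```rust
// A condensed sketch of the six steps above. The `image` crate is real;
// `wasi_tensorflow::run_model` is a hypothetical stand-in for the
// WASI TensorFlow interface discussed next.
use image::imageops::FilterType;

pub fn classify(img_bytes: &[u8]) -> String {
    // Step 1: load the uploaded image and resize it to the
    // 224x224 RGB format the model expects.
    let rgb: Vec<u8> = image::load_from_memory(img_bytes)
        .expect("bad image")
        .resize_exact(224, 224, FilterType::Triangle)
        .to_rgb8()
        .into_raw();

    // Steps 2-3: load the model data and its label file
    // (illustrative file names).
    let model = include_bytes!("mobilenet_v2.pb");
    let labels = include_str!("labels.txt");

    // Step 4: execute the model through the (hypothetical) wrapper,
    // naming the input and output tensors.
    let probs: Vec<f32> =
        wasi_tensorflow::run_model(model, &rgb, "input", "softmax");

    // Step 5: map the highest-probability index back to a label.
    let (best, confidence) = probs
        .iter()
        .enumerate()
        .max_by(|a, b| a.1.partial_cmp(b.1).unwrap())
        .expect("empty model output");
    let label = labels.lines().nth(best).unwrap_or("unknown");

    // Step 6: package the result as a JSON string for the caller.
    format!("{{\"label\":\"{}\",\"confidence\":{}}}", label, confidence)
}
```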
Step number four, which we left out, is what we call WASI for TensorFlow. WASI, the WebAssembly system interface, is commonly defined as the way WebAssembly accesses system resources like the file system, sockets, the random number generator, the clock, things like that. But we can use WASI to access any command available on the system; in this case, the system has TensorFlow installed, so we use WASI to access that command. WebAssembly's security model applies here too: when we initialize the WebAssembly runtime, we can say which commands it has access to, so it has no unpredictable behavior. Maybe this particular instance may only run a particular kind of TensorFlow model and no others; we can specify that when we start up the WebAssembly runtime. That's why we say WebAssembly is more secure and more manageable than native applications.

To execute this TensorFlow model, we specify which family of models it belongs to. We already selected a model file, but we also need to tell it the structure of the model, and there are different commands for that; in this particular case it's MobileNet version 2. Then we give it the arguments: the tensor name for the input in this model is called "input," the tensor name for the output is "softmax," and the image size the model expects is 224 by 224. After that we load the model data and the image data into the command, so that the command runs the model against the image. The actual execution of the command is one line; the result comes back, we package it into a JSON string, and the Rust WebAssembly application continues to process it and return the value. So that's the whole application: a single function, written in Rust, that does most of the work in compiled WebAssembly, and when it does need help from the operating system or from a system-installed command like TensorFlow, it calls out through a well-defined interface that can be secured.

As you can see, one of the things we work on is making as many types of TensorFlow models as possible available in our ecosystem. This is all open source. There are many families of TensorFlow models that we can incorporate as Rust APIs. We just showed you a MobileNet Rust API that calls the MobileNet family of TensorFlow models, and within the MobileNet family there are, I think, hundreds if not thousands of models that people have trained to recognize different things: a model very specific to recognizing different breeds of dogs, or a model that recognizes the different objects on your desk. There are a lot of models out there, and we would like to have open-source projects, like the TensorFlow model zoo but for TensorFlow models on WebAssembly, so that those models come prepackaged and available to open-source projects, and anyone using Rust and WebAssembly to write these high-performance applications can use those models just by installing them through their compiler toolchain. If you're interested in this initiative, go to the link here for the Second State AI-as-a-service model zoo.

Here are some resources to get you started: a guide, tutorials, and documentation, and again, the link to the FaaS. So that's it. Thank you very much for attending my talk, and I hope I'll see you in the community. Reach out to me if you have questions or want to criticize our approach; we welcome all comments. Just go to our website and find our contact information. Thank you very much; I'll talk to you later.