Welcome, everyone, and welcome to EuroPython. My name is Leandro, and I am a software engineer at Arm in the UK. I'm here to present a talk introducing Apache TVM, and the idea is to show you why it exists as a project, which gap it tries to bridge, and then some of the main features we have, along with a couple of demos. So to start, we will see an overview of how to install it, then a list of features and main functionalities you can access when using TVM, then a couple of demos with the command line and our Python API, and then some final remarks. I'd also like to mention my involvement with the project: I've been working on it for about two and a half years, I'm a committer for TVM, and I'm one of the maintainers of the TVMC command line.

To start, I would like to explain what Apache TVM is as a project and what gap it fills. As some of you might be aware, the ecosystem for neural networks and deep learning is quite complex, and I think this picture summarizes it very well. On the top side you can see that lots of groups are concerned with implementing input frameworks and operators so that you can create your own neural network models: TensorFlow, Keras, MXNet, PyTorch. These projects are concerned with creating those operators so that people can build models to accomplish something, be that image recognition, speech recognition, text to speech, all these sorts of cool things we see with deep learning every day. On the bottom side, you can see that other groups and companies are creating hardware to accelerate deep learning models, because some of these models are quite big and we want them to run with good performance.
So we try to squeeze out every little gain in performance in as many ways as possible, be that with hardware, with compilation, or with specialized libraries that know how to run these internal operators very well. But then there is a gap: if you are creating new hardware to accelerate deep learning, you want to be supported by as many input frameworks as possible; and on the other side, if you are working on an input framework such as TensorFlow or PyTorch, you are interested in supporting as much hardware as possible and in having access to those libraries that can give you performance gains. TVM exists as a project to bridge that gap: from the many input frameworks supported, to the ability to support your hardware and the functionality that will make your models perform well.

In that context, Apache TVM is an open source project, and it defines itself as an end-to-end machine learning compiler framework for CPUs, GPUs and accelerators. Translating that into more concrete terms: imagine you have a model in one of those formats, such as TensorFlow, PyTorch or TFLite. You will use TVM to target that model to run on your CPUs, GPUs, accelerators, and libraries as well.
So you will get a mix of which parts of your model are more suitable to run on each of those target hardware and/or libraries. Apache TVM is implemented in C++ and Python, but it presents itself as a Python-first framework, which means that every major feature is supported in Python and has an API that can be accessed from Python. As a community, we have been working for some time to get the project exposed to developers, and Python developers especially, such as yourselves, and we have been working on providing a Python package on PyPI. Literally in the last weeks we have been releasing TVM 0.9, and it should have a pre-release package available; I checked this morning. You can install it using this command. There are a couple of things to note: `--pre` is there to allow non-release versions of the package, and there is also the square-brackets part, `[tvmc]`, which installs a set of dependencies that are not installed by default. It is this way because there are multiple audiences: you can use TVM as an API to integrate with your project, but we also want this "batteries included" version so that you have everything you need to run TVM as an API or as a command line tool.

There are other options too. The build is highly customizable: you can include and remove optional parts as you wish, and the build process is very well documented; you can check that URL. Because we support lots of things (that wide support across the deep learning ecosystem is one of the main features of the project), we have lots of dependencies, so we make available Docker images with all the dependencies you need for a standard build. You can check those on the Docker repository as well. Now, some coverage of how to use Apache TVM.
There are two main ways. As I mentioned, there is an API you can use to do those transformations on your models, targeting specific hardware. Two APIs give you access to most if not all of the features: the Python API and the C++ API. Here we're talking mostly about the Python API; there is documentation you can check on how to use TVM with its C++ API. And there is also a more user-friendly way. If you are used to compilation tools, which in practice is what TVM is, tools such as Clang, GCC and the rest of the compiler ecosystem, we provide a command line interface, which we call TVMC, as part of the project, so that you don't need to get to know all the internal parts in order to make a standard workflow run. We'll cover the command line interface today, as well as the highest-level Python API you can use to access TVM's main features. So, as I said, TVMC exposes features of Apache TVM to end users: a user who has TVM installed on their machine and wants to accomplish a task using TVM as one piece of their workflow, with limited expectations on how much they need to understand about how it works internally. You should be able to just use TVM as a tool in your toolbox.
That's the idea. It comes in two flavors: one is the command line tool that we will see, a small tool you can use to access features, and we also make it available as a high-level Python API, so that you can accomplish most of what you can do with the command line via Python scripts. TVMC is part of TVM, so there is no need to install two things: you install TVM as a Python package and you have access to TVMC. The main advantage is this: before we upstreamed TVMC, every time you wanted to use TVM you would need to write boilerplate, which would break whenever the API changed, and you would need to repeat it for as many models and projects as you had. So, as a way to reduce the boilerplate code required, and also to capture the best practice for, say, compiling a model, or tuning a model to try to improve performance on your target hardware, we made that official as part of the command line tool, so that across time you always have access to the best way to accomplish those high-level tasks.
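A rough sketch of what that high-level workflow looks like through the Python flavor, assuming the `tvm.driver.tvmc` module as documented in recent TVM releases and a hypothetical ONNX model path (the import is kept inside the function so the sketch can be read and loaded without TVM installed):

```python
def compile_and_run(model_path, target="llvm"):
    """Compile a model and run one inference via the high-level TVMC API."""
    # Imported lazily so this sketch loads even on a machine without TVM.
    from tvm.driver import tvmc

    model = tvmc.load(model_path)                 # e.g. a ResNet-50 .onnx file
    package = tvmc.compile(model, target=target)  # native code via LLVM by default
    return tvmc.run(package, device="cpu")        # executes and returns outputs
```

The three calls mirror the three main subcommands we will see next: load plus compile corresponds to `tvmc compile`, and the last call to `tvmc run`.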
Let me go briefly over how the command line tool is organized. As is the fashion these days, you have a tool and then subcommands that expose features: `tvmc`, then a subcommand with its specific options. There are four subcommands these days. We have `tvmc compile`, which we use to translate a given input model: take an ONNX model, for example, from the ONNX deep learning ecosystem, and build a compiled version of it. If you understand a little about the compiler ecosystem: it can use, and will use by default, LLVM to generate native code, so that model, translated from the high-level framework, will be running natively on your own machine. There is also `tvmc tune`, the green box. It allows you to try many versions of the model with different compiler optimizations, to see which one gets performance improvements on your machine, and then it gives you a report of what it found to run best on the specific target you asked for. Then `tvmc run`, the orange box: that allows you to execute a specific model. You compile the model, you generate a package with it, and then you can run it, which is the final goal, generating predictions from a specific model. So if your model is for image recognition, you do `tvmc run` and you will be able to collect the outputs from that particular model. Finally, the grey box, `tvmc micro`. Many of these models can fit into microcontrollers, very small boards like the latest Raspberry Pi Pico and many other boards from different vendors. There are features in TVM that allow you to translate your model from the input framework to C, and `tvmc micro` is a set of features that automate integrating that model with an embedded project, so you can run it on a small board.

Going into a little more detail on `tvmc compile`: there are a lot of input frameworks we support. If you have a Keras, ONNX, PaddlePaddle, PyTorch or TensorFlow model, you can use `tvmc compile` transparently to take that model and generate code for it, so you get a compiled version of that model. The wording I'm using for the output is a "compiled module", because it is something you can integrate with your projects in many ways. If you know a little about compilation and its output formats: one of the outputs of `tvmc compile` can be a shared library, a `.so` file on Linux, for example, that gives you access to your model's functionality from C++. But we also offer facilities to load it using Python and then use the TVM runtime to execute the model locally on your machine. It's native code running on your processor; there are not many layers in between, unlike using an interpreted runtime, for example. The last bullet there is cross-compilation: you can cross-compile, again taking advantage of a codegen backend like LLVM, and in that way generate a compiled version of the model for a machine which is not the one you are running on. Now, `tvmc run`: the way it works is that it takes a compiled module, the output of the previous command, plus some input tensors, whichever inputs you want to pass to the model, and then it will generate predictions.
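Those input tensors are passed around as serialized NumPy arrays. A minimal sketch of the roundtrip, assuming a model whose input tensor is named `data` (the name is hypothetical; it must match your actual model):

```python
import os
import tempfile

import numpy as np

# tvmc run reads inputs from a serialized NumPy archive (.npz); the archive
# keys must match the model's input tensor names ("data" here is hypothetical).
batch = np.random.rand(1, 3, 224, 224).astype("float32")

path = os.path.join(tempfile.mkdtemp(), "inputs.npz")
np.savez(path, data=batch)

# Predictions come back from tvmc run in the same .npz format, so reading
# either side is the same operation:
with np.load(path) as arrays:
    assert arrays["data"].shape == (1, 3, 224, 224)
    assert arrays["data"].dtype == np.float32
```

We will see this format again in the demo; it is the generic input/output convention the tool uses.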
It will give you the output the model would give you when you run it: you have access to those tensors, those arrays, and you can interpret them as the output of your model. It also supports some cool features like profiling. If you are somebody working on the side of creating neural network models, you can see the cost in time of running each layer of your model; it can generate quite detailed profiling output. Also, it has something very interesting: using other TVM features such as remote execution, you can execute your model remotely on a different machine. You just need to have the TVM runtime running on the remote machine, and there are facilities within TVM to run via an RPC tracker, so you can have a pool of machines ready to execute your model; from your TVMC command line you just dispatch jobs onto those remote machines to run your model and accomplish your task. If you are integrating with hardware which is not the same hardware you are using for development, this is very handy, because it gives you an integrated mechanism to run things using the various parts of the TVM project. Then, `tvmc tune`.
Just going one step back: the tuning process is something particular to TVM, one of its main differentiators, so to speak. It aims to find configurations in the model, or to activate and deactivate compilation optimizations, to try to arrive at better configurations for your model. It runs for some time, searching through those configurations, things such as intermediate shape sizes, in order to get your model to run faster on a given machine. If you are running your model on a GPU, for example, it might benefit from having data in particular shapes to take advantage of your GPU architecture. The output of this is tuning logs. Tuning logs list, in a very detailed way, for each operator in your model: with this configuration, this is the performance the process achieved. It iterates on that over time and improves the configuration: if it finds a variant that runs faster, it will try to evolve it, over time giving you, or trying to give you, performance gains. One more thing on this: once you run a session of tuning, you can use those results to do further tuning and try to find even better configurations, or you can use them as an input for compilation. If you remember, two steps back, on `tvmc compile` you can provide tuning results, and then it's as if it says: yes, by the way, I know that these shapes in these layers will run faster on this machine.
So, generate me a model which takes that into consideration. Then finally, `tvmc micro`. This was implemented at the end of last year, and the idea is to provide facilities to simplify the workflow of taking a given model and making it run on an embedded operating system such as Zephyr, for example, if you are familiar with embedded operating systems; it can also generate Arduino-style projects. Behind the scenes, it puts the right sources in the right places and code-gens the model in the right way, so that you can transparently make the project compile with the model and make the model run on an embedded device. By embedded device I mean the microcontroller space. You start with a compiled module plus a templated project: the template project knows where the sources need to go, so when you generate your model as a C source file, it uses that template, puts everything in place, and produces a valid embedded project that you can put on your target hardware. If you look at Zephyr, a very big embedded operating system, you can use these tools transparently to generate such a project.

Now, a comment about inputs and outputs. When you are implementing a tool which aims to be generic, such as TVMC and TVM, you need to provide a generic way for people to feed data into their models and read the output. Every model, or every class of models, tends to have a particular way you must format your inputs so that the model accepts them. For example, an image recognition model will not accept just any image: it will expect the image to be a particular size, and the colors should be in a particular color space. It expects the input format to be quite strict, and if you don't follow it, you won't get the full benefits of the model you are using. Because that is particular to every model, when implementing this tool we decided to go with a very simple input format: serialized NumPy arrays. The format is well documented, and it is very simple to export to and import from. The idea is that if you are integrating this tool with your model, you work with inputs and outputs using that `.npz` format; I will show an example of how to do that. It can be integrated very simply: usually you already have NumPy as a dependency, so just before giving the inputs to your model you convert them to `.npz`, and once you get the outputs you parse them as an `.npz` archive. I will show how that is done, and a full workflow with it, in the first demo, which is soon.

When we were implementing TVMC as a command line tool for TVM, people noticed that the API we were creating to back the command lines was a very good abstraction level for accessing TVM features, so some work was done to make those APIs officially provided by the project. You get a set of stable APIs for the common workflow, compiling models, tuning models, running models, that sort of common task, in a way that is more stable than the underlying APIs used by these commands. If you stick to the APIs in this module, you are less impacted by internal API changes, so you can keep taking advantage of TVM features without breakage all the time. This will be our second demo for today, and I think now I'll start.

For the first demo, as I said, the idea is to do a full workflow with the command line. I'm going to show this live today, but it is also available as a tutorial in TVM; I was initially involved in writing the first version, and it was then improved by the community. It has a detailed explanation of every step we are doing, you can reproduce it on your own machine, and you will hopefully achieve the same results we achieve here.

So, going back to my terminal: the first step is to download a model. This model is a ResNet-50, an image recognition model, which means you provide an image and it tells you what the most relevant feature in the image is, what the main character of the image is, so to speak, and with what certainty. As I am doing this live, I already downloaded the model. The model is this file here, an ONNX model, so it uses the ONNX format; there are other input formats that implement the same model, just so you have an idea, but for this demo we are using this one. The next step is basically to compile the module and generate some code out of it. We run `tvmc compile`, and we need to provide a target, so I will use `llvm`: LLVM will be used to generate machine code for this specific model. Also, just to show you, and this is not a mandatory argument or anything, I'll display some of the internal layers TVM uses in order to code-gen this model: the LLVM code it generates, and also something internal and specific to TVM, which is called the Relay IR. Just to understand a little of how the process works internally: TVM takes the model in your chosen input format, converts it to Relay IR, which is particular to TVM, and then from Relay IR it goes to the backend, the layers closer to the hardware, where it code-gens or integrates with any libraries you want it integrated with. It always starts by converting a model from a particular format into Relay IR. Now I'll set the output name, calling it `module.tar`, so we don't generate a scattered set of files, just a tarball, and then pass the model.

This takes about a minute to convert, so while it is running I can comment on the next steps in the process. We will use this compiled module to generate some predictions, and then we will read those predictions and interpret them. As I mentioned before, every model generates arrays, basically sets of numbers, and we need to interpret those numbers as users in order to produce something readable. This model generates scores for around a thousand different classes: for each one, what is the probability of the image belonging to that particular class? Reading and interpreting that by itself is quite hard, so we can write a small script that opens it and presents the information in a simpler way.

Okay, so that's done. As you can see, there is that message: "one or more operators have not been tuned, please tune your model for better performance". This is basically saying that I didn't provide any tuning inputs for this model; we will do that later. What I get as output is this `module.tar`. Now I'll show you my preprocessing script: in order to feed something into this model, I need to do some handling first. What the script does is download an image from the internet, that little cat, and do some normalization of the image, which is required by the model. This is not required by TVMC: anywhere you used this network, you would need to do that handling of the input, normalizing according to some factors the model requires. At the very end it saves the input as an `.npz` archive, serializing the NumPy arrays to a file I'm calling `imagenet_cat.npz`; NumPy does that for us. If I call this script, it runs and produces that `imagenet_cat.npz` file. The idea is that we can then do `tvmc run`, pass the inputs, saying the input is `imagenet_cat.npz`, and for the output I'll just call it `output_1.npz`, the output of the first step of the demo. And then, finally, the `module.tar`.
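The preprocessing just described boils down to a few lines of NumPy. A sketch, assuming a 224x224 RGB image, the standard ImageNet normalization constants commonly used with ResNet-50, and a model input named `data` (all assumptions; check your model's actual expectations):

```python
import numpy as np

def preprocess(image_hwc):
    """Normalize one RGB image and reshape it the way ResNet-50 expects."""
    x = image_hwc.astype("float32") / 255.0            # scale pixels to [0, 1]
    mean = np.array([0.485, 0.456, 0.406], "float32")  # per-channel mean
    std = np.array([0.229, 0.224, 0.225], "float32")   # per-channel std
    x = (x - mean) / std                               # normalize each channel
    return np.expand_dims(x.transpose(2, 0, 1), 0)     # HWC -> NCHW, add batch

# A random stand-in for the downloaded cat image:
image = (np.random.rand(224, 224, 3) * 255).astype("uint8")
batch = preprocess(image)
assert batch.shape == (1, 3, 224, 224)
# np.savez("imagenet_cat.npz", data=batch) then produces the file for tvmc run.
```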
Oh yes, there is one more flag here, which is `--print-time`. Compiling, if you remember, took about a minute; running is very quick, because it just invokes the runtime, loads that shared library, and runs it with our input. Our input didn't need anything special from the runtime because we already provided everything in the right format, with the right names. We can see it ran on the order of 51 milliseconds. The numbers are all the same because I ran it only once; if we want to benchmark, we can say `--repeat` and then a thousand times or so, to average away CPU or GPU noise and that sort of thing. So this ran quite quickly, because it's just invoking the runtime. The next step would be running `tvmc tune`, saying `--target llvm` and then our model, and this would start a tuning process. I ran it beforehand, about an hour ago, with TVM sources from last night to make sure it was quite fresh, just so we don't wait twelve minutes here, to generate the tuning logs. The tuning logs are here, and if we have a look at them, don't get scared: these are the outputs from the tuning process. It's a sort of messy JSON file, but it's useful for the compilation process. A few things to identify here: this is a neural network operator, some data types, some shapes of tensors, the formatting of the input, and then some performance results.
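Conceptually, each record ties a measured time to one configuration of one operator. A heavily simplified, hypothetical version of that bookkeeping (the real log format is more involved, but the principle is the same):

```python
# Hypothetical, simplified tuning records: one measured time per
# (operator workload, configuration) pair.
records = [
    {"workload": "conv2d_1", "config": "A", "mean_ms": 3.1},
    {"workload": "conv2d_1", "config": "B", "mean_ms": 2.4},
    {"workload": "dense_1",  "config": "A", "mean_ms": 0.9},
    {"workload": "dense_1",  "config": "C", "mean_ms": 1.3},
]

# "Fishing out" the best result per workload, which is conceptually what
# compilation does with --tuning-records before applying each winner:
best = {}
for rec in records:
    seen = best.get(rec["workload"])
    if seen is None or rec["mean_ms"] < seen["mean_ms"]:
        best[rec["workload"]] = rec

assert best["conv2d_1"]["config"] == "B"  # 2.4 ms beats 3.1 ms
assert best["dense_1"]["config"] == "A"   # 0.9 ms beats 1.3 ms
```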
I guess the main mental model here is that it correlates performance results to configurations of a particular neural network operator. That's what it does, and it does this a lot of times. When the logs are used, basically the first thing done is to fish out the best results, see where they fit in your model, and apply them there. That's the main idea. But how do you use them? You do `tvmc compile`, you say `--tuning-records`, that's the official name, and you point it at the tuning logs, plus `--target` as well, and then `--output`; I'll call it `tuned_module.tar`, and then our ResNet model. One second... it's the beauty of live demos, right? Ah, step one, sorry. Okay, so now it is compiling that same model, but instead of doing it from nothing, it is using the tuning records to try to guide the compilation process for us, and the idea is that when it finishes, it will use the tested versions of the operators that gave some performance improvement. The longer you leave it tuning, the bigger the chances you get some improvement. Now, if we run that same model again, with `--print-time` and the `tuned_module.tar`, calling the output `output_2.npz`: if we compare this with the previous run, there was a gain in performance, just from the fact that we left it tuning for a while. And the idea is that if you have particular hardware, or a configuration with a GPU and so on,
you can give the model at least the opportunity to get some performance improvement on that particular hardware. Okay, but there is one more thing: we ran this quite a few times, but we didn't check the outputs. Given the input we are using (the same input you will use if you reproduce the tutorial), the `tvmc run` command gives us some vectors as output. What I have is a post-processing script that downloads the labels for the model, which map each class of the model: if this class scores very high in the prediction, the image shows a bicycle; if it's a different class, it will be a guitar, or something. The idea is that it correlates and prints the five most relevant classes in the predictions generated by the model. If we run the post-processing script on `output_1`, the output from the first step, it shows that it is a tabby cat with 62% probability. And having run all that tuning, we don't want to have degraded the model's quality: with the output of the tuned model, it generates the outputs with the same probability levels, so the tuning process we ran is accurate.

Okay, just one final thing: I wanted to give you a hint of how to accomplish a similar workflow using the Python API, which codifies everything we just did in practice. It loads `imagenet_cat.npz`, loads the model, runs the compilation process, runs it on CPU, and then generates the output. If we run this, we get output number three, with the correct outputs.

I'm getting out of time, so I'll finish with some final remarks. This was a high-level introduction to TVM: why it exists, what the main features are, and how to access them as a Python programmer or as an end user. If you are interested in TVM, I recommend you read about remote execution, and read more about how to run TVM on microcontrollers. Finally, I just wanted to say that TVM has a very active and very friendly community. If you are interested in this area of deep learning compilers, and the intersection of all these different layers in the deep learning ecosystem, I recommend you reach out; there are lots of opportunities in the project for Python programmers and Python development. I'll be around at the conference until Friday, so just reach out if you want to chat. Thank you very much.