Hi everyone. My name is Christian Keller and I'm the product lead for PyTorch at Meta. I've been working in AI for the better part of the last ten years, and on PyTorch for almost four years now. So, how many of you here have worked with AI and ML directly, and how many of you have used PyTorch? Thank you, it's always nice to see.

As a quick intro: PyTorch is a library based on the original Torch library. It was developed by Facebook AI Research, and it's no longer hosted inside Facebook, or Meta; it now lives in its own independent foundation, the PyTorch Foundation, as part of the Linux Foundation. It was released in 2016, and it's a programming interface for building and training neural networks.

Today PyTorch is the most used framework in AI and ML. Looking at it from the research perspective, and this doesn't go all the way back to 2016, over two thirds of AI research papers are built using PyTorch. So why is it so popular, and what made it so popular? It's simple and intuitive; that's one of our key principles. The low-level details are abstracted away, although you still have the ability to tweak most of the things you want for research. And it's a Python API wrapped around C++ kernels for speed.

Also at the core of PyTorch's product design is that we made eager mode the center of PyTorch. Other frameworks were focusing on performance and other elements, and so took a different approach, but for us eager mode was important. So what does eager mode mean? It means the code runs the way you write it: it's executed in the order you wrote it. It allows for dynamic control flow and for interactive experimentation and debugging, which makes complex models easier to work with. When an error pops up, you know where it's coming from based on where you wrote your code. But that was a risky bet, because for many people performance was what mattered most, and eager mode is considered slow: we're not optimizing anything, we're just running the code the way you've written it. That flexibility came at the cost of interpreter overhead. Initially that overhead was hidden behind GPU execution time, but the cover is starting to disappear now, because GPUs are getting more and more performant and the Python overhead becomes a bigger share of the total.

So that's what took us to number one, but what's going to keep us there? We want to be fast now. What I'm going to talk about today is our approach, and how we think about getting that speed while still maintaining the PyTorch-iness, in a way, for your deployments. So can we build a user-first compiler? We're looking at two dimensions here: performance and ease of use. They're not the only two dimensions that matter, but I think they're the ones we constantly struggle with: how many optimizations we want to bring in, and at what cost to the user in simplicity and in debuggability. We're trying to have both.

So how are we going to do it? We have torch.compile, which is what we branded as PyTorch 2.0 and announced at our last conference in December. torch.compile is a very simple line of code that you can add, and it can create massive performance improvements for your models, both in training and in inference.

So what does a deep learning compiler do? There are three main steps we're looking at here. First is graph acquisition: looking at the code that you have and translating it into a graph, understanding the various loops and the various changes that you make in your code.
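To make graph acquisition concrete, here's a minimal sketch; this is my own illustration, assuming a PyTorch 2.x install rather than anything from the slides. A Dynamo backend is just a callable that receives the captured graph, so a backend that only prints the graph and runs it unchanged lets you see exactly what was acquired:

    import torch

    def fn(x):
        return torch.relu(x * 2.0) + 1.0

    # A Dynamo "backend" is a callable taking the captured FX graph and
    # example inputs. Printing the graph shows what acquisition produced;
    # returning gm.forward runs the captured graph unchanged.
    def print_graph(gm: torch.fx.GraphModule, example_inputs):
        print(gm.graph)
        return gm.forward

    compiled = torch.compile(fn, backend=print_graph)
    compiled(torch.randn(8))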
Then there's graph lowering, which takes all the operators you might have started with, which could be over 2,000 different operators, and simplifies them into a subset that's manageable and that can then be used for optimizations. And then there's graph compilation: taking those simplified operators and combining them in a way that creates speed gains. The numbers I'll share here aren't exact, but roughly you have between 2,000 and 3,000 kinds of operators that exist initially, before the graph acquisition step, and graph lowering brings that down to somewhere between 200 and 300 different operators, which we then combine to create these optimizations.

So graph acquisition is capturing a static graph representing the program. That's important because, unlike the Python code you write, which can be quite dynamic, when we're creating these optimizations we need to see something static that can be run and optimized. Graph lowering I just talked about, and compilation produces device-specific optimized code.

The device-specific part is important here. Often we think of deployments on the server, where you have, let's say, NVIDIA or AMD GPUs, and you have one specific type of hardware you optimize for. That's great, but this also applies to the edge case, where you honestly don't control the hardware you have: each device will have a specific CPU, GPU, or even NPU that can be leveraged. So how would you create compilations that can be targeted at those devices, at those types of hardware?

We're only beginning our compiler journey here. Our step one was jit.trace, which captures a graph by running the model on certain inputs and recording the traces of all the executed operations into a graph. It's only correct if the function is independent of the data it operates on, and that's a little tricky: in effect, the model you wrote would only work for a space of inputs that had been defined ahead of time. That's limiting, because when you deploy at large scale, to billions of users at times, inputs don't necessarily fit into the limitations you set, so the model could behave in an unpredictable way.

So we decided to do better. We built jit.script, and that's when we introduced TorchScript. TorchScript is a statically typed subset of Python. It did allow more of that PyTorch-iness to be carried, in a way, all the way down to deployment, but one important point is that the source code needed to be refactored: you couldn't include third-party libraries with it. That's a problem when you're bringing in libraries like NumPy, for example, or any of the others you might have wanted; they needed to be taken out or rewritten inside the code.

So we had jit.trace, then jit.script, and now we've got Dynamo. How is Dynamo different? We include graph breaks, for completeness, to revert to eager mode, and we've got guards, for soundness, to ensure that all the assumptions are still valid, so that in a sense, no matter the inputs you get, the model will behave in the way you had predicted.
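Here's a minimal sketch of that difference; again this is my own illustration, assuming a recent PyTorch 2.x build, with toy branch values chosen just so the two paths give visibly different results:

    import torch

    def fn(x):
        if x.sum() > 0:      # data-dependent control flow
            return x * 2
        return x * 3

    # jit.trace only records the branch taken for the example input, so the
    # traced function silently gives the wrong answer on inputs that would
    # take the other branch (PyTorch warns about this at trace time).
    traced = torch.jit.trace(fn, torch.ones(4))
    print(traced(-torch.ones(4)))    # still multiplies by 2: wrong branch

    # Dynamo instead guards on the condition and inserts a graph break: the
    # branch is decided in Python, each side becomes its own static
    # subgraph, and every input gets the right answer.
    compiled = torch.compile(fn)
    print(compiled(-torch.ones(4)))  # correctly takes the "* 3" branch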
So the idea behind it is that instead of trying to get you one graph for your entire model, you create a set of subgraphs that we know we can compile, that are going to be static, and around those subgraphs we identify areas that remain regular Python code, which still allows for the dynamism that PyTorch permits.

So are we done? Almost. We talked about Dynamo, which does everything I've described until now. AOTAutograd captures the graph for the backward ops, so if you want to do training in particular, that's going to be important, and it's a similar process. And TorchInductor is there to lower to the device, to optimize. I'll talk a little bit about Inductor in a minute, but what's important here is that there are multiple entry points depending on where you want to go. You can use this whole stack, all the way through Inductor, down to the devices we optimize for, GPU and CPU for example, and I'll share some benchmarks on the GPU side in a minute. But if you're a hardware manufacturer, or a compiler vendor even, you could connect at the Dynamo level or at the AOTAutograd level, depending on what you're trying to achieve.

So Inductor optimizes the code for performance by fusing operators: this is about identifying operators that would work well together, in order to optimize for the device you're targeting. And you can register custom backends through this stack. The idea here is that we want to leverage the community to build on and tie into what we're building. We're not going to be the specialists at building optimizers for, let's say, AMD GPUs, or NVIDIA GPUs, or maybe Qualcomm chips, so having those vendors, or those compiler stacks, tie in at this level and create the optimizations they need is what we're hoping to achieve long term.
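As a concrete sketch of what fusion buys you, here's my own illustration, assuming a CUDA device and a PyTorch 2.x build (on CPU, Inductor generates C++/OpenMP code instead of Triton):

    import torch

    def f(x):
        # Three pointwise ops. In eager mode that's three kernel launches
        # and two intermediate tensors; Inductor fuses them into a single
        # generated Triton kernel with no intermediates.
        return torch.relu(x * 2.0 + 1.0)

    compiled = torch.compile(f)  # Inductor is the default backend
    compiled(torch.randn(1 << 20, device="cuda"))

    # Running with the environment variable TORCH_COMPILE_DEBUG=1 set
    # dumps the generated kernels so you can inspect the fusion yourself.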
So this new process, torch.compile, just setting that one line of code, gets you a 43% performance improvement over eager mode on NVIDIA GPUs in these benchmarks, for a set of about 120 models we looked at, drawn from Hugging Face, TIMM, and TorchBench, which you can find on our website. And that's just out of the box, with one extra line of code; it works 93% of the time.

One thing, as I mentioned earlier, is that we're an open-source project. We work not just with the teams at Meta, or at Amazon, or Microsoft, or Google, or any of the other companies that work on PyTorch, but with the many contributors who come and engage with us. Today there are over 3,000 contributors building PyTorch. You can find where to contribute on our website (all these slides will be shared later). An easy way to start contributing to PyTorch, especially if you're still learning about it, is by improving the documentation we have: we've got a docathon starting in late May, and you can scan the QR code if you'd like to join. There are more resources you can find: the conference we had in December has Soumith Chintala's keynote, Soumith being one of the creators of PyTorch, where he describes Dynamo and the 2.0 work I just talked about, as well as developer notes. And you can follow us on the various social networks and contribute on GitHub.

That's it, thank you. I'll take a few questions.

[Audience question, inaudible]

Yeah, so I think the first thing, if you haven't been engaging with this yet, is to get in touch with the team that's working on that. You can find them on that contributing page; there are specific entry points for that. The other piece is looking at our GitHub for PyTorch, which has all the documentation you need to get started, and there are also tutorials on how to engage with the Inductor piece specifically. Okay, if you have another question, I've got a mic here. One more thing: if you want to learn more about Inductor, there's also a video you can find online from our conference, by an engineer named Jason Ansel, where he describes in more detail how Inductor works, with the use of Triton underneath, an open-source language that was initially promoted by OpenAI.

[Audience member]: I just wanted to say it was interesting listening to you talk about AI; it's something that's on all of our minds nowadays. Thank you.

Thank you! And all the models you're hearing about these days, ChatGPT, DALL-E, all of these are built on PyTorch, so we're pretty proud of that. Keep an ear out for what's happening out there, and feel free to reach out if you have any questions. Thank you.