Hey everyone, my name is Parinita Rahi. I'm a Principal PM Lead on the AI Frameworks team at Microsoft, and I don't have a history lesson on PyTorch to give because I've only been involved with the community for about a year, but what a year, right? If we talk about the AI landscape, a lot was already covered in the previous talk, but there's so much game-changing work happening around us, and all of it is powered by PyTorch on the back end. So it's been a fantastic year, and we at Microsoft have been part of this journey early on, collaborating with Meta in the nascent stages of PyTorch and now with the Linux Foundation, and we're so happy to be a part of this community.

As we talk about this AI landscape, and if you remember the chart from the previous talk, one thing that's changing is the complexity of these models. GPT-3: 175 billion parameters. GPT-2 was, I think, a hundred times smaller. GPT-4, even more. So the complexity of these models is increasing, and the amount of data they're trained on is increasing. Organizations that are using or building these models have a problem at hand: how do we handle this, and how do we do it efficiently and in an agile fashion, given the rapid innovation happening? We've had a slew of innovations just at Microsoft across various products and cloud-based solutions, be it ChatGPT- and Bing-based solutions, Office, GitHub Copilot, and so on. All of these parameters and the data and model complexity require a real time commitment. So how do organizations scale up and perform these tasks efficiently, and also responsibly? Because it's not just that we want to output a model for product innovation; we want to make sure these models are built in the right way and are serving society broadly, not narrowly.

There are a couple of aspects I'm going to cover in my talk today. First: if you're a business owner, a data scientist, or an IT department lead, and you're posed with the question of how to train state-of-the-art models that are applicable to your organization, one of the tools we provide within Azure is pre-built environments, and I'll talk more later about how we build them specifically for PyTorch. Another question: how do I innovate fast? Training times are long, and everybody is working against deadlines, so how do I equip my scientists and researchers to move quickly? We offer a set of training optimization techniques built with the PyTorch community, as well as open-source accelerators, which I'll cover in the coming slides. Then: I have a model, but I want to make it small enough to deploy on an edge device, or I have limited GPU resources, so how can I use ONNX Runtime for faster inference? And finally, as organizations build these models, you have to make sure they're secure, compliant, and ethical. How do I integrate an audit trail for the models I develop? I'll cover that aspect of responsible AI in my slides.

So, deep learning on Azure. I think we have a pretty comprehensive and trusted platform for building intelligent enterprise solutions. You have responsible AI tools integrated, as well as the frameworks of your choice, all powered by flexible compute and capacity with GPUs at scale, like the A100s and H100s from NVIDIA. There are four pillars into which we bundle our value prop within Azure ML.
First is to improve time to value with rapid innovation and integrated tools. Second is to operationalize at scale with MLOps. Third is to develop on a secure, hybrid, and compliant platform. And fourth is to deliver responsible solutions. We have seen both startups and seasoned customers benefit from these. Fashable is an AI-based fashion startup that uses our curated environments so its researchers can focus on the task of building models rather than on the interdependencies between various packages and libraries. BMW realized time savings on the training of their image models with just a few clicks. PepsiCo was able to improve their model accuracy by up to 40% while still meeting their compliance needs. And Ernst & Young builds models that predict loan decisions; they were able to reduce the disparity between men and women in those decisions from 7% to 0.5%. All of these are examples of how the tooling we have in Azure ML helps meet our customers' objectives.

So, we're at the PyTorch conference: why do we love PyTorch on Azure? PyTorch as a framework aligns very well with our goal of making end-to-end model development seamless for our users. It's widely accessible: the Python interface makes it very intuitive and easy for users to build their models. It has rapid-development technologies and functions that help developers build models faster, and we've been part of its strong ecosystem right from the start. So PyTorch is very critical to our growth, but how do we make it even better? These are some of the principles we try to apply to make PyTorch work even better on Azure. First is performance: we improve the performance of PyTorch through systemic, high-leverage improvements, and I'll talk about some of those later. Second, we want to enable the largest production-ready PyTorch models at scale, and also provide portability: you should be able to take your models and deploy them across all computing platforms that can be leveraged for AI. And then, of course, developer productivity. We want to do all of this while building leadership in the open-source community.

So how does PyTorch work on Azure? This is a quick summary slide, and it's not the only path, but it illustrates what I'll be talking about in more detail: the Azure Container for PyTorch. At the bottom layer you see the hardware stack, powered by best-in-class NVIDIA GPUs like the A100s, H100s, and more. On top of that, there are multiple ways a developer can come in and build their training workflows: you have the Azure ML SDK, the CLI, and the Studio, and you can build and access your training jobs using any of these. Within Azure ML, we provide comprehensive training frameworks, packaged in what we call the Azure Container for PyTorch, a curated environment I'll speak more about later; it has the latest and greatest technology required for faster innovation on your ML models. On top of it, users can easily plug in their scripts, specify their compute targets, and create their models. You can then deploy the model, convert it to ONNX for faster inferencing, and use a common place to monitor and re-run these jobs.
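To make that concrete, here's a minimal sketch of what submitting such a training job can look like with the Azure ML Python SDK v2. The workspace details, the compute cluster name, and the exact curated-environment tag are placeholders I'm assuming for illustration, not values from the talk:

```python
# Hypothetical sketch: submit a PyTorch training script to Azure ML on an ACPT
# curated environment. Subscription/workspace IDs, "gpu-cluster", and the
# environment tag are placeholders; check your workspace for the real names.
from azure.identity import DefaultAzureCredential
from azure.ai.ml import MLClient, command

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

job = command(
    code="./src",                          # local folder containing train.py
    command="python train.py --epochs 3",
    environment="AzureML-ACPT-pytorch-2.0-cuda11.7@latest",  # assumed ACPT tag
    compute="gpu-cluster",                 # an existing GPU compute cluster
)

returned_job = ml_client.jobs.create_or_update(job)  # submit the job
print(returned_job.studio_url)             # link to monitor the run in Studio
```

The same job can equally be described in YAML for the CLI or created in the Studio UI; the SDK is just one of the three entry points.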
Training jobs and models can also be shared across all the researchers and scientists within your organization.

So what is ACPT? The Azure Container for PyTorch is an optimized training environment, and the primary goal is that when an ML engineer starts building their models, they don't have to worry about setup time. How can I make it easy for someone to start off using the best and most recent technology available? PyTorch versions keep getting updated; how do I make sure somebody has the most recent one? ONNX Runtime puts out a new package; how do I make sure they have the latest, used consistently across all the developers on the team? ACPT also serves as a common place to package a lot of the open-source training optimization technology that Microsoft participates in and contributes to, and the native integration with Azure makes it easy to monitor and deploy your models and to share them across your organization.

With ACPT, we now have PyTorch 2.0 included. The PyTorch Foundation announced PyTorch 2.0 on March 15th, and the Azure Container for PyTorch released within a week including this latest package. PyTorch 2.0 is accelerated, it's fast, and it runs large training jobs efficiently. torch.compile is the main API for PyTorch 2.0: it wraps your model and returns a compiled model, and it's backward compatible. With ACPT, you get an image that's already been tested against Python versions, CUDA drivers, and so on, so you don't have to do that testing yourself to figure out which package works with which, or which dependencies you need to worry about. We also test these curated environments across a lot of Microsoft first-party (1P) workloads. One thing we try to do with Azure ML is take the goodness you see in a lot of the technology out there, whether it's the Bing solutions, the Cognitive Services team, or other 1P teams that have tried these environments, which we've already tested for security, compliance, and so on, and pass it on to the larger community. That means you don't have to worry about getting these basics right; you can use the image at hand right away and do more.

Alongside PyTorch 2.0, we also include an open-source deep learning optimization library called DeepSpeed within ACPT, the Azure Container for PyTorch. With DeepSpeed, you can get training done about 10x faster for your models, at a cost around 6x cheaper. It's open source, it provides excellent system throughput, and it's an easy config to apply to your training jobs: with a couple of parameters, you can enable DeepSpeed while using the Azure Container for PyTorch. ONNX Runtime is another collaboration: a cross-platform machine learning model accelerator with a flexible interface for integrating hardware-specific libraries. You can use ONNX Runtime both for training and for inference, and it's available within the Azure Container for PyTorch. ORTModule is a pretty simple wrap: it's instantiated from the torch-ort backend for PyTorch, and it enables seamless integration of ONNX Runtime training into PyTorch training code with minimal changes.
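As a quick illustration of the torch.compile usage described above, here's a minimal sketch; the tiny model is just a stand-in for a real one:

```python
import torch
import torch.nn as nn

# A toy module standing in for a real model.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))

# torch.compile wraps the module and returns a compiled, drop-in replacement;
# anything it can't compile falls back to eager mode, so existing code keeps working.
compiled_model = torch.compile(model)

out = compiled_model(torch.randn(8, 64))  # first call compiles; later calls are fast
```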
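For DeepSpeed, here's a rough sketch of what that config-driven enablement can look like in a training script. The config values are illustrative only, and in practice the script would be launched through the deepspeed launcher or an Azure ML distributed job on GPU:

```python
import torch
import deepspeed

model = torch.nn.Linear(64, 10)  # stand-in for a real model

# Illustrative config; in Azure ML this can also be supplied as a JSON file.
ds_config = {
    "train_batch_size": 32,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-3}},
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 1},  # ZeRO stage 1
}

# deepspeed.initialize wraps the model into a distributed training engine.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# fp16 training expects half-precision inputs on the engine's device.
x = torch.randn(32, 64).to(model_engine.device).half()
loss = model_engine(x).sum()   # forward pass as usual
model_engine.backward(loss)    # DeepSpeed-managed backward (with loss scaling)
model_engine.step()            # optimizer step
```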
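And here's a minimal sketch of the two ONNX Runtime patterns, training with ORTModule and inference with an ONNX Runtime session, again using a toy linear model as a stand-in:

```python
import torch
import onnxruntime as ort
from torch_ort import ORTModule  # from the torch-ort package included in ACPT

# Training: wrap any nn.Module so forward/backward run through ONNX Runtime.
model = torch.nn.Linear(64, 10)  # stand-in for a real model
model = ORTModule(model)         # the one-line change; the training loop is unchanged

# Inference: export a (trained) model to ONNX and serve it with an ORT session.
torch.onnx.export(torch.nn.Linear(64, 10), torch.randn(1, 64), "model.onnx",
                  input_names=["x"], output_names=["y"])
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
(y,) = session.run(None, {"x": torch.randn(1, 64).numpy()})
```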
So with ORTModule, you wrap the module and just use it within your code. It's a simple one-line change, and it can give you a pretty good improvement in training time: we've seen up to 1.4x faster training with ONNX Runtime for training. Similarly for inference, you can take that same model and convert it, import ONNX Runtime as ort, and create an inference session, which then lets you deploy the model across a host of different hardware; and if you want to run on device or on the web, ONNX Runtime makes that much, much easier. We've seen customers spanning a range of scenarios use this. For example, InFarm runs computer vision on a tractor using ONNX Runtime, and Hugging Face has of course enabled it across a bunch of their scenarios as well.

Some of the results we see with ONNX Runtime: one example was published by Optimum. Optimum integrates ONNX Runtime training through an ORTTrainer API that extends the Transformers Trainer, and we've seen training time reduced by more than 35% for many popular Hugging Face models. The performance measurements here were done with ONNX Runtime for training in the second run, and then ONNX Runtime plus DeepSpeed ZeRO Stage 1 in the final run. All of this is easy to do because, with ACPT, all of these packages are available in one place: you can compose your base PyTorch with ORT, or your base PyTorch with DeepSpeed, and get these results faster. We also have a recent performance analysis done with PyTorch 2.0 on several Hugging Face models in both eager mode and compile mode. What we see is that torch.compile improves on several models; however, when we compose torch.compile with DeepSpeed, the gain in throughput is even higher. And similarly, DeepSpeed plus ORT provides gains for several model types. This is an area we continue to improve and evaluate, but the key point is that these technologies don't work in isolation: you can compose them to get even further gains, reduce the cost of running these models, and reduce your training times even further.

We have also introduced, with the recent release of the Azure Container for PyTorch, a new technology called Nebula, which boosts checkpoint speeds by up to 1,000 times. It's still in preview, it's fully compatible with PyTorch, and it runs only on Azure ML. Nebula can reduce checkpoint times from hours to seconds, saving as much as 95 to 99% of the time spent checkpointing, through a simple checkpointing capability.

So how do I use ACPT? Azure ML is available through a command-line interface, an SDK, and a UI; I'm showing some snapshots of the UI view. If you go into the Environments tab, you can search for ACPT or PyTorch, and you'll see a list of environments, available for various versions of PyTorch. You can easily use a pre-existing curated environment for your jobs. But what if I don't want to use a curated environment as it exists, because I have additional dependencies I'm building with? Say I'm training a Whisper model and I have transformers and accelerate that I want to add on top of a base image. We provide the capability of creating a custom environment.
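Sketched with the SDK v2 Environment class, under the assumption that the Docker build context extends a curated ACPT image; the image tag and package list below are illustrative, not the exact ones from the slide:

```python
# Hypothetical sketch: a custom Azure ML environment extending a curated ACPT
# image. "./docker-context" holds a Dockerfile whose first line is the curated
# base image, followed by extra pip installs, roughly:
#
#   FROM mcr.microsoft.com/azureml/curated/acpt-pytorch-2.0-cuda11.7:latest
#   RUN pip install transformers accelerate
#
from azure.ai.ml.entities import Environment, BuildContext

custom_env = Environment(
    name="acpt-whisper-finetune",
    build=BuildContext(path="./docker-context"),
    description="ACPT base plus Hugging Face transformers and accelerate",
)
ml_client.environments.create_or_update(custom_env)  # ml_client as in the earlier sketch
```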
You can start from a pre-existing curated environment, which is the first line (the base image) in the sketch above, and then add and install additional pip dependencies to create your own custom environment. Once saved, these curated and custom environments can be used across your training jobs in Azure ML; they're not tied to one instance, and you can share them with your team as long as the workspace is shared.

Then, how do I use this along with some of the acceleration technologies I talked about earlier, like ONNX Runtime and DeepSpeed? There's a simple argument you provide to your training jobs. There's a capability to upload scripts or associate scripts with your job, and within that you can provide the parameters; some of them are shown here. One is from the Optimum integration I talked about earlier, which extends the Trainer from Transformers, so you can specify a parameter here and use it. Similarly, for the DeepSpeed integration, you can upload a JSON file for your training job that holds the required configuration. It's a very simple and seamless integration: with a couple of clicks and a few lines of code, you should be able to use these technologies efficiently within your training jobs.

So now I have a model. What do I do with it, and how can I make it easy to share? Once you have a model within your training job, you can register it within Azure ML. Once a model is registered, it can be shared with your teammates, and you can deploy it and create an online endpoint, which can be tested within Azure ML but also called through a REST API, et cetera. You can register the model as-is in PyTorch or, even better, convert it to ONNX. That gives you the speedups I was talking about earlier, up to a 17x reduction in inference times, and you can also use ORT inference sessions to get the advantage of ONNX Runtime across your deployments.

Switching gears a bit: what we talked about earlier was how to make developing these models easier in terms of the steps required to train a model. But how do we do it mindfully, so that it's just as seamless for our developers, engineers, and scientists to make responsible AI part of their workflows? RAI was not a reactive approach for Microsoft; we have been very mindful and proactive in our approach when it comes to building products that empower every person and every organization on the planet to achieve more. We started our RAI journey around 2016; there's a Slate article from Satya Nadella that spearheaded this whole effort within our organization. Since then, within Microsoft, we have adopted various principles. First we defined these responsible AI principles: what do they mean, and what do we need to do? Then we established standards around those principles, and then built products and internal practices so that when we build models, these standards are applied and used. For example, when OpenAI gives us a new model, one of the first teams to get their hands on it is the RAI team within Microsoft. And we also make sure that the learnings we gain as we build these models are available for developers to use as they build their own.
And so we built an integration with Azure ML for RAI dashboards, and we keep improving on it as we learn and do more. There are six basic RAI principles that form the foundation of our RAI governance framework, and they serve as a guide for the development of AI technology. We take these six principles and break them into multiple goals, and each goal has a series of requirements that set out the procedural steps that must be taken, mapped to the tools and practices we have available. Multiple facets of implementation, including training, tools, and testing, have been developed around these. The six principles: accountability; transparency; fairness, making sure the models we build are not biased toward one cohort; reliability and safety, so they don't introduce vulnerabilities; privacy and security, so they comply with those standards; and inclusiveness. These six pillars are built into our enterprise risk management framework and ensure the operationalization of responsible AI at Microsoft. We've already made some of our early work available as open-source toolkits like Counterfit, EconML, and the HAX Toolkit, and we continue to integrate that work more and more within Azure ML. One of the newest additions to our toolbox is the Responsible AI Mitigations library, which you can look up, and there's also the Responsible AI Tracker, which helps you visualize your iterations.

So what are the steps required as we, as developers, go through our training cycle? First, we need to identify where our models might not be doing what we intended. If we can identify cohorts with a high error rate, then even while our overall model might be accurate, it might not be making fair predictions for a specific cohort, in scenarios like job applications, loans, or housing. So how do we figure that out? There are error analysis tools and fairness assessment tools embedded within the RAI framework, and it's a single-pane approach that makes it much easier for developers to test these out, pick cohorts, and do the analysis. Once you've identified these issues, you need diagnostic and mitigation capabilities: we provide model interpretability and counterfactual analysis within the RAI dashboard, as well as certain mitigation steps. Then you need to make a decision to address them. You can understand the causal impact of your features, or do counterfactual analysis, improve your model, and then compare, to check whether the issues identified earlier are still occurring. This cyclical approach can help make sure you're improving your models and that they're being fair and doing the right thing.

So, as I mentioned earlier, there are these pillars of responsible development, like fairness and explainability. There are automated pipelines and workflows, like a YAML-powered CLI experience you can use. You can also customize which components to utilize, tailored to your unique needs and scenarios. And there's a no-code experience offering users an end-to-end way to generate their Responsible AI dashboard. Once you have run all of this analysis and want to share it with both technical and non-technical business stakeholders, there's an easy way to create a PDF scorecard that ensures everyone is on the same page when deploying your AI systems.
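For a sense of what wiring this up looks like in code, here's a minimal sketch using the open-source responsibleai and raiwidgets packages that back the dashboard; the model and data are toy stand-ins:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from responsibleai import RAIInsights
from raiwidgets import ResponsibleAIDashboard

# Toy loan-approval data standing in for a real dataset.
train = pd.DataFrame({"income":   [30, 60, 45, 80, 25, 90],
                      "age":      [25, 40, 35, 50, 30, 45],
                      "approved": [0,  1,  0,  1,  0,  1]})
test = train.copy()

model = RandomForestClassifier().fit(train[["income", "age"]], train["approved"])

# Bundle the analyses: add the components you need, then compute them.
insights = RAIInsights(model, train, test, target_column="approved",
                       task_type="classification")
insights.explainer.add()       # model interpretability
insights.error_analysis.add()  # surface high-error cohorts
insights.compute()

ResponsibleAIDashboard(insights)  # the single-pane dashboard, e.g. in a notebook
```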
The dashboard is a single-pane-of-glass experience that brings together a robust set of responsible AI tools, including the already-previewed model explanations and fairness metrics. Okay, I think I spoke too fast, but that brings me to the end of my slides. If you want to know more, the advancements I talked about are only some of it; there's much more coming up. There's Microsoft Build, May 23rd to 25th, where you can hear about new practices in RAI and learn more about what's happening with ACPT in the labs. These are some of the sessions; please do sign up. And thank you, PyTorch, it's lovely to be part of this community. We'll do more together. Thank you.