So our next speaker is Ankit Patel. Ankit comes to us from NVIDIA, where he's a senior director. He leads product marketing for software development kits and developer tools. In his 10-plus years at NVIDIA, he has worked on both hardware and software for virtualization, ray tracing, and AI. Today he's going to talk about open source tools from NVIDIA that developers can leverage as they go and build their own AI projects. And, Ankit, welcome.

Thank you, Todd. All right. Hi, everyone. So I'm assuming a lot of you know NVIDIA. If you don't, then the sweater is here to represent. But did you know that NVIDIA has over 500 open source projects? Did you know that NVIDIA products serve basically every single industry out there? So why did I bring that up? Well, the reason is that as open source developers, you should feel confident that you can build your applications on NVIDIA. You have a large market to reach: there are over 100 million RTX PCs and workstations that you can run your applications on, and that doesn't even include the GPUs in data centers and cloud. We have many developers, startups, and enterprises building open source applications and tools on top of NVIDIA. But 500 projects, OK? 500 projects, 10 minutes. All right, are we ready? Nine minutes.

OK, before I talk about the open source, and I'm not going to talk about all 500 projects, let's talk about LLMs. This is one way of looking at the LLM stack: many different layers, lots of different ecosystem tools. NVIDIA's model is that we build the full stack, end to end. The reason we do that is because accelerated computing is a full-stack, end-to-end optimization problem that requires top-to-bottom understanding. So we build all the components, we understand the full stack, we accelerate the whole thing. And then, paradoxically, we take it all apart and make it available to the community, so that developers can take the pieces they want and build on top of them. Much of that is open source, and that's what I'm going to talk to you about today: a few of those tools. As you can see here, there are many, many companies represented today, companies you know about, tools that you use, that we at NVIDIA contribute to regularly.

OK. You've probably seen a version of this diagram somewhere. I've simplified it significantly to show a simple LLM workflow, and that's the rest of the presentation: we're going to walk through these stages. The gray boxes are a subset of the 500 open source projects that you can use as you develop your LLM application. We're not going to talk about all of them, just a few.

But if you're like me, you don't want to start with the data. You just want to get your hands dirty. You want to say, hey, how does this stuff work? So, good news: NVIDIA recently announced AI Foundation Models and Endpoints. You can go try it out. Just go there; there's a Gradio app and an API playground, so you can use Python or curl to start interacting with these models. Just yesterday, we uploaded the new Mistral AI model, Mixtral. Have any of you heard of this one? It's a sparse mixture of experts model. If you don't know what a mixture of experts is, quick sidebar: in a mixture of experts, you have a router model and a bunch of experts, and for each token the router decides to use a couple of the experts. So you have a 47-billion-parameter model, but the tokens don't go through all 47 billion parameters; they go through about 13 billion. So you get a lot of efficiency. It's a sparse mixture of experts model.
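To make the routing idea concrete, here's a toy NumPy sketch of one sparse mixture-of-experts layer. The expert count, the top-2 routing, and the dimensions are illustrative toys, not Mixtral's actual architecture; the point is simply that only the chosen experts' weights are touched for each token.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 8   # illustrative expert count
TOP_K = 2         # each token is routed to just 2 experts
D_MODEL = 16      # toy hidden size

# Toy "experts": each one is just a single dense weight matrix here.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router = rng.standard_normal((D_MODEL, NUM_EXPERTS))  # router projection

def moe_layer(token):
    """Route one token through its top-k experts and mix their outputs."""
    logits = token @ router                 # score every expert for this token
    top = np.argsort(logits)[-TOP_K:]       # keep only the top-k experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                # softmax over the chosen experts
    # Only TOP_K of the NUM_EXPERTS weight matrices are used per token:
    # total parameters are large, but *active* parameters per token are small.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D_MODEL)
print(moe_layer(token)[:4])  # first few values of the mixed output
```

With 8 experts and top-2 routing, each token activates roughly a quarter of the expert weights, which is where the 47-billion-total versus roughly 13-billion-active figure comes from.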
Super exciting. Go try it out; that's the URL. As you can see, there are also the Llama 2 models, there's Nemotron, there are over a dozen models there to play with. You can go have fun with it. The Mistral model on there is an optimized model that's accelerated with TensorRT-LLM.

So, back to what we're talking about. Let's start with data processing. All right, a tool for data processing: have you heard of NVIDIA RAPIDS? OK, good, I see some nodding heads. RAPIDS is a set of libraries whose goal is to speed up the data pipeline by leveraging GPU acceleration. So RAPIDS covers ETL, all the different stages of data processing, and even some ML. The cool thing we announced recently is a pandas accelerator mode in cuDF; cuDF is CUDA DataFrames. It lets you accelerate your existing pandas code. The way you use it: you run python -m cudf.pandas and then your script file, and it accelerates your pandas code on GPUs transparently. No code changes; you don't change your script at all. And whenever it hits an operator the GPU doesn't support, it falls back to the CPU and keeps going. So super, super exciting; there's no reason you shouldn't try it. If you have a GPU, and many of you do, even on your laptops, there's no reason not to: you simply run your script with one extra command-line argument, as in the sketch below.

Quick note: RAPIDS has a massive community. There's the RAPIDS project itself, but RAPIDS is integrated into many hundreds of projects. We have lots of contributors and huge download numbers. Back to my point earlier: we work with the ecosystem, we work with everybody in the community, and RAPIDS is a great example of that.

Let's keep going. So you've got RAPIDS for data processing. There's also Spark: if you're a Spark user, it's the same idea as the pandas accelerator. You can use the RAPIDS Accelerator for Apache Spark.
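Here's a minimal sketch of that pandas workflow. The script itself is plain pandas, and the file name and toy data are made up for illustration; the two launch commands in the comments are the documented cudf.pandas usage.

```python
# script.py -- plain pandas code, nothing GPU-specific in it.
#
# Run on CPU as usual:      python script.py
# Run GPU-accelerated:      python -m cudf.pandas script.py
import pandas as pd

df = pd.DataFrame({
    "key": ["a", "b", "a", "c"] * 250_000,   # one million rows of toy data
    "val": range(1_000_000),
})

# Standard pandas operations; under cudf.pandas these run on the GPU,
# and any unsupported operation transparently falls back to CPU pandas.
print(df.groupby("key")["val"].mean())
```

In Jupyter, the equivalent is loading the extension with %load_ext cudf.pandas before importing pandas; either way, the script itself stays unchanged.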
Let's talk about model training and customization. One of our projects there is Megatron-LM. Megatron-LM is for researchers who are training massive, and I mean really, really large, language models. The goal of Megatron-LM was to train the largest transformer; I mean, you know transformers, and you know Megatron: the biggest, baddest transformer out there. The idea was thousands, tens of thousands of GPUs. And scaling large language models, splitting the work across GPUs and pulling it back together, is a hard computer science challenge; it's not easy. We did all of that work in open source. We had algorithm research figuring out all these different techniques, model parallelism and parameter sharing, et cetera, and that's done in this Megatron-LM project. We did this back in 2019, actually; remember the date. And there's been lots and lots of follow-on work. If you look at all the different models out there, much of the community has been able to learn from and share the things we put in Megatron-LM, and it has inspired lots of follow-on work using those techniques to train more and more models. These are all transformer models.

All right, NeMo toolkit. I'm not going to say much about the NeMo toolkit other than that my colleague Elena had a talk yesterday; I'm sure it's recorded, so you can go back and watch it. Simply put, the NeMo toolkit is an open source toolkit for ASR, TTS, and LLMs; if you go into the GitHub repo, you'll see NLU as well, and there are a lot of things you can do. The NeMo toolkit is actually another one of the open source tools that was inspired by the work on Megatron, so a lot of those techniques are available there. NeMo is a tool for you to train and fine-tune large language models.

All right, let's keep going. Now that you've trained and fine-tuned: optimize and inference. We have a tool called Triton. Again, another open source tool; you can go to the URL. Triton is an inference serving solution. Once you have your application, you've got to serve all these different things in the data center. Triton supports many different modalities: you can do real time, you can do batched, you can do gRPC, you can do REST, all the different ways you want to serve. It supports many processors: NVIDIA GPUs, of course, but also x86 and Arm CPUs. Actually, let me just jump to the backends, because there are lots of backends. For large language models, it has vLLM and TensorRT-LLM backends, but it also supports TensorFlow and PyTorch directly as backends. There's an OpenVINO backend if you're going to run OpenVINO on Intel, and you can even bring your own custom C++ or custom Python backend. If you go to the GitHub repository, you'll see the server and then 30-something backends that either we created or were contributed by the community.
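As a quick illustration of the client side of serving, here's a minimal sketch using Triton's Python HTTP client. It assumes a Triton server is already running locally on the default HTTP port; the model name "my_model" and the tensor names and shape are placeholders for whatever your own model repository defines.

```python
# pip install "tritonclient[http]" numpy
import numpy as np
import tritonclient.http as httpclient

# Connect to a locally running Triton server (default HTTP port 8000).
client = httpclient.InferenceServerClient(url="localhost:8000")

# Hypothetical model with one FP32 input "INPUT0" and one output "OUTPUT0";
# substitute the names and shapes from your actual model configuration.
batch = np.random.rand(1, 4).astype(np.float32)
inp = httpclient.InferInput("INPUT0", list(batch.shape), "FP32")
inp.set_data_from_numpy(batch)
out = httpclient.InferRequestedOutput("OUTPUT0")

result = client.infer(model_name="my_model", inputs=[inp], outputs=[out])
print(result.as_numpy("OUTPUT0"))
```

The same request pattern works regardless of which backend, vLLM, TensorRT-LLM, PyTorch, or a custom one, actually executes the model on the server.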
All right, so you're thinking: wait a minute, that's pretty simple. How did you talk about LLMs without mentioning RAG? OK, let's do it. So RAG, as hopefully most of you understand: you've got your trained model, but then you have this separate path where you say, I'm going to take my company data, create embeddings with an embedding model, and then use vector search retrieval to take that data and give it to my model, so that when it's generating, it's using that information along with the priors it was trained on.

So again, back to RAPIDS. RAPIDS has a module called RAFT, which stands for Reusable Accelerated Functions and Tools. It's a terrible name. But I want to be clear about what it is: it is not a vector database. Well, maybe the name is not that bad, because it actually is reusable functions: accelerated functions for vector search. We expect developers who are building databases and vector search tools to use RAFT to accelerate their vector search and their embedding work on GPUs. We're working with several of these; you probably recognize some of them: Milvus, Redis, and FAISS, the similarity search library from Facebook AI Research. So again, that's how you can take advantage of RAFT. My point here was just to introduce you to a bunch of projects that can help you accelerate your applications.

Now, OK, so we talked about the basic LLM workflow, and we talked about RAG. But you can't build an application nowadays without worrying about safety and bias; you kind of have to think about that. So there's another project called NeMo Guardrails that I think you should check out. NeMo Guardrails is a tool that helps you control your model's behavior: you create rails, and rails are literally just instructions to your large language model. When you run your app, you say, hey, these are some rails I want to give you. So a rail might be: don't talk about politics. Or: if you get any insults, respond calmly. Things like that. So rather than retraining your LLM, you have these rails that you run your application with, and that way, when you build your application, you can control toxicity and bias. There's a minimal sketch of what this looks like in code at the end.

So that's kind of it. Two action items for you. Go to the open source page and check it out; we have lots of projects there for you to explore, and I just gave you a quick sampling of things you can use for LLM application development. And probably the most fun thing to do: go to the model endpoints. You can just play with the models; there's an interface where you can easily try them out. Cool, thank you.
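To round out the Guardrails discussion above, here's a minimal sketch of loading rails from Python with the nemoguardrails package. The ./config path and the politics rail are illustrative assumptions; in practice that directory holds a config.yml with your model settings plus Colang files defining the rails themselves.

```python
# pip install nemoguardrails
# Assumes a ./config directory containing a config.yml (model settings)
# and a Colang file defining the rails, e.g. a "don't talk politics" rail.
from nemoguardrails import LLMRails, RailsConfig

config = RailsConfig.from_path("./config")  # load rails + model config
rails = LLMRails(config)

# Every generation now passes through the rails you defined.
response = rails.generate(messages=[
    {"role": "user", "content": "What do you think of the election?"}
])
print(response["content"])  # e.g. a polite refusal, per the politics rail
```

The rails sit in front of whatever LLM the config points at, so the application code stays the same while the rail definitions control topics, tone, and toxicity.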