I'm Priyanka Sharma, and I'm the executive director of the Cloud Native Computing Foundation. Thank you. Before we begin, I would like us to take a moment to acknowledge those who can't be here today. Cloud Native is a global, diverse community, representing 190 countries. And today, our communities in the Middle East, particularly Israelis and Palestinians, are suffering. After the tragic terrorist attack on October 7th, we have lost cherished community members. And as the conflict goes on, so much suffering and loss of life continues. Let's take a moment of silence in recognition of the lives lost and for all those who need our love and support. Thank you for participating with me.

And now, welcome to KubeCon + CloudNativeCon North America, happening in Chicago. What do OpenAI, NVIDIA, and Hugging Face have in common? That's right, I heard it in the audience. They all run on Kubernetes. Yes, that's this group right here. Six years ago, the platform engineering team at OpenAI was on the stage at KubeCon Berlin. Anyone in the audience who remembers that? Yes, there are some OGs, awesome. They shared with us how they were using Kubernetes to train their AI models. These are the folks behind ChatGPT, by the way. Even though Kubernetes was originally built for web-scale workloads, OpenAI was able to use it to run huge and dynamic training jobs, sometimes lasting for weeks. While HPC-specific technologies such as Slurm have a very important place, Kubernetes's developer friendliness and APIs have made Cloud Native the scaffolding of the AI movement. OpenAI was able to scale to 7,500-plus Kubernetes nodes to build ChatGPT, the AI-powered chatbot that took the world by storm, one that my husband lovingly refers to as his best friend and work wife, and to think I'm the fool that introduced them.

ChatGPT surfaced the generative AI movement for all of us to see, the revolution that is already changing our lives. At its simplest, generative AI uses AI models to learn the patterns and structure of input data and then generate new data with similar characteristics. In essence, it creates content. Large language models, or LLMs, are a form of generative AI that has broken the barriers between human and machine communication, so that humans can ask for outputs in natural language. The infrastructure that AI uses, especially generative AI, is different from web-scale workloads, and it often, but not always, leverages graphics processing units, or GPUs. GPUs have been around since the 70s for image and graphics processing, and NVIDIA introduced them for the personal computer in the 90s. This new paradigm has unlocked a seismic shift in society. Speaking of seismic shifts, according to a McKinsey report, the impact of generative AI on the global economy could be as high as $4.4 trillion. For some context, the GDP of the United Kingdom in 2021 was $3.1 trillion.

I recently judged the TED AI hackathon in San Francisco. Yeah, it was fun. I met a lot of companies and individuals, all so excited about the possibilities, and I asked them about the emerging LLM stack: what were they using for infrastructure? I got some really confused looks and, obviously, Kubernetes as the response. Why was I even asking? And it made me wonder: could this be Kubernetes's Linux moment? A ubiquity so pervasive that it does not even need to be asked about. The cloud-native ecosystem has a symbiotic relationship with the GenAI movement.
Not only have we provided companies and organizations with the infrastructure layer to run LLMs and GenAI features, we've also leveraged the technology ourselves to make infrastructure more accessible and useful. For instance, I was talking to folks at New Relic the other day, and they told me that once they fully retooled on OpenTelemetry, which, as you all know, is a CNCF project, and added GenAI to their user interface, their traffic through OpenTelemetry increased by 79%. So it turns out that making it easy for folks to ask your infra what went wrong makes it that much more useful and accessible. GenAI and cloud-native infra have much to do with each other. All this is to say, regardless of whether you work for an end user organization or a vendor, learning how to support LLM infrastructure is important.

Now, there are two elements of the LLM app stack to be aware of. The first is training, which, in the case of LLMs, is how a large language model is built and developed. The next is inference, which is applying the capability the large language model has learned to answer questions, to create content, to give the response the user needs.

I started tinkering to see how easy it would be to deploy an inference LLM application on my computer with Kubernetes. Last time, when we were in Amsterdam, I shared with you over video how to run Kubernetes on your computer easily with kind clusters. Today, we're going to build upon that, and I'm going to show you how to use kind for Kubernetes and run an LLM app on your computer. There were two things that really mattered to me. One was that I wanted to create a simple chatbot. And the second was that everything had to be 100% open source. Because of those parameters, I had to pre-bake part of this demo. So as of now, on my computer, I already have kind and an LLM downloaded. I'm going to check that that is the case. Let me do a kubectl and check. OK, you see here, can folks see this? Yes. You see here, Ollama running. That's the registry from which I pulled the Mistral model, all open source. And at the same time, we also have the kind cluster running.

Knowing that this is here, I'm just going to run my demo script. And this is running now. The database is loading; I'm using Neo4j. Now keep in mind, while this is happening, it's taking some time. This could all be very, very fast if I had used a managed service, or if the Kubernetes feature called Dynamic Resource Allocation were enabled, so that I could use the M1 chip on this laptop to run the demo. But as things stand, this is what we're doing to keep the demo 100% open source. And it is important to have that, because being 100% open source means any person in this audience and beyond, regardless of what industry they work in, however regulated, can test this out and get familiar with LLM infrastructure. Thank you. And I'm hoping this will run. We'll give it a few seconds. I have faith that it's going to happen. As this pulls up, you will see two things: first, my chatbot; and second, an open source tool called Weave Scope, a 100% open source offering that shows you how the infrastructure is set up and what the traffic patterns are. So let's give it a hot second. Just so you know, I practiced this so many times, and every time it ran really fast. I was promised that this was going to run, and I have a backup video prepared. But if it's OK with you all, let's let it happen.
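To make the inference side of that demo concrete, here is a minimal sketch of what a chatbot like this ultimately does: send a prompt to the locally running model server and read back the generated text. This is an illustrative sketch, not the actual demo script; it assumes Ollama is serving the Mistral model on its default port (11434), which is how a stock local setup behaves.

    import json
    import urllib.request

    # Minimal inference request against a locally running Ollama server.
    # Assumes the Mistral model has already been pulled (for example with
    # `ollama pull mistral`) and that Ollama is listening on its default port.
    payload = {
        "model": "mistral",
        "prompt": "What is the CNCF?",
        "stream": False,  # ask for one complete answer instead of a token stream
    }
    request = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(json.loads(response.read())["response"])

In the demo itself, the model server and the Neo4j database run as pods inside the kind cluster, so the chat app would typically reach them through Kubernetes Services rather than localhost.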
Database pod, man. Maybe this will not go the way I would like it to go. Maybe we can pull up the video that we have of this, and by then maybe it'll have run and I can show it to you. I'm watching my computer closely; if it happens, I'll say something. That's what would have happened. That's Weave Scope showing us the setup. And Neo4j, of course. Nothing wrong with it, but given the constraints. There's the app running, getting started. And then we have our chat app over there, and you can see what's happening with the intro. Then we ask the question, what is the CNCF? Because maybe someone doesn't know. There you have the chat app giving a response. All open source, built on this computer, recorded just a few hours ago. A big thanks to the folks at Docker, who released a starter app for LLMs at DockerCon that was leveraged to build this application. If you'd like to try this out, use the QR code, on your home computer with fast internet. I promise you it'll work a little bit better. So give it a shot.

We know AI is changing the world, and cloud native is powering AI. With this scale of innovation, cloud native needs to keep evolving to stay at the cutting edge. The stakes are high, because anyone innovating with GenAI applications is particularly sensitive about their data security and privacy. As a result, it turns out a lot of LLM stack vendors need to be able to deploy into customers' VPCs, on their Kubernetes clusters, so that customers feel more secure. Kubernetes is the new on-premises here. And so if the end user is our astronaut, we provide the scaffolding for the rocket ship of the GenAI apps to take off. What we need to do to make this happen is to connect the end users building GenAI technologies with cloud-native maintainers. To that end, I'm pleased to welcome on stage Joseph Sandoval of Adobe, which recently released a GenAI product called Firefly. He's also a member of our technical advisory board and end user technical advisory board. More on that in Paris soon, I promise. And he will speak with key maintainers here: Kevin Klues from NVIDIA, Marlow Weston from Intel, and last but not least, Tim Hockin from Google. Welcome to the stage, panel.

Personally, I'm very excited to be a part of this, since at Adobe we have just shipped our first GenAI application, called Firefly, which is built and running on cloud-native technologies. To give you a little bit of an overview of how we did this and how we were able to go to market quickly: we have an underlying platform that is based on Kubernetes, using a lot of the familiar pieces of the cloud-native ecosystem, like Cilium and Envoy. We also have another part of our platform, focused on the developer experience, called Flex, which uses Backstage, Argo, and a lot of other familiar elements that help create that dynamic experience. So this was a great opportunity for us to really see how the power of cloud native can drive these applications. But there was also one learning: there is much more that the community could do for this type of workload.

So let's get right into this conversation. I want to start with you, Tim. Looking at this recent learning, there are definitely some gaps when it comes to running ML workloads on Kubernetes. Maybe you can help identify what some of these challenges are, as well as what the community is doing to address them. Sure.
Yeah, AI/ML workloads are a little different from the things that Kubernetes has been built to support for the last 10 years. I think it's really changing our relationship with the hardware, and so we need to think hard about how we're going to manage that in scheduling, resource management, and performance management. The workloads in particular are different from the classic sort of web apps. So as a community, as a project, we're thinking a lot about what new primitives we're missing, what new concepts, what sort of user experiences we want to assemble out of the pieces we've got. I think we've got a good platform for building that, but we don't have exactly the thing that we need yet.

Interesting, thank you. Marlow, I want to ask you something because of your background and the things you've been working on: what are some of the key differences between AI/ML systems and HPC systems, and what can the community learn from them? So HPC systems are very solid systems; basically, you label it by rack. Whereas with some of the AI/ML systems, we're starting to look at amorphous blobs of compute, where they're not necessarily racked and stacked the way that old HPC is. And so we need to be designing for cases where it may not just be in a data center, but maybe also at the edge.

Interesting. So, Kevin, I wanted to ask you something, and I think this one is for the panel as well. There are definitely some key takeaways that I'm hearing here, and some actions that need to be taken. Maybe you can provide your perspective on this. Yeah, sure. So from NVIDIA's perspective, and for using GPUs in general, we've heard a lot about LLMs, but it's not just LLMs that run on these GPUs; it's any kind of ML workload. And one of the biggest pain points we see people facing right now with using GPUs in Kubernetes is being able to get the right size GPU for the workloads they're trying to run. The conventional thinking has always been: for training, you need big, beefy GPUs; for inference, you need smaller GPUs or even a fraction of a GPU. But with the introduction of LLMs, that's not even true anymore. If you want to do inference on LLMs, you need these big, beefy GPUs. So just trying to find the right size GPU for your workload is a challenge in Kubernetes, among other challenges such as, once you do have access to a GPU, how do you control the sharing of those GPUs across the different workloads that you run?

One of the things we're working on in the community, as Priyanka mentioned in her keynote, is this new abstraction called Dynamic Resource Allocation, which we've been building out together with Intel, and which is becoming the new way of doing resource allocation in Kubernetes. There are lots of challenges with that. It's still in alpha form, and there are some costs that come with doing things this way that we as a community need to rally around and figure out how to solve before we can make this the standard way of doing resource allocation. But I'm hoping we can do that together over the coming months or years, so that we can make this generally available for people to use before too long.
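To ground what Kevin is contrasting Dynamic Resource Allocation against, here is a hedged sketch of the conventional device-plugin model, where a pod can only ask for whole GPUs through the opaque counted resource nvidia.com/gpu. It uses the official Kubernetes Python client; the pod name and container image are purely illustrative, and it assumes a cluster with the NVIDIA device plugin installed.

    from kubernetes import client, config

    # Conventional (device-plugin) GPU allocation: the pod requests a whole GPU
    # via the counted resource "nvidia.com/gpu". There is no way to say
    # "a fraction of a GPU" or "any GPU with at least N GB of memory" here,
    # which is the gap Dynamic Resource Allocation is meant to close.
    config.load_kube_config()  # uses your local kubeconfig

    pod = client.V1Pod(
        metadata=client.V1ObjectMeta(name="gpu-inference-demo"),  # illustrative name
        spec=client.V1PodSpec(
            restart_policy="Never",
            containers=[
                client.V1Container(
                    name="inference",
                    image="nvcr.io/nvidia/pytorch:23.10-py3",  # illustrative image
                    command=["python", "-c",
                             "import torch; print(torch.cuda.is_available())"],
                    resources=client.V1ResourceRequirements(
                        limits={"nvidia.com/gpu": "1"}  # whole GPUs only
                    ),
                )
            ],
        ),
    )
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=pod)

Dynamic Resource Allocation, by contrast, introduces ResourceClaim objects that let a driver negotiate exactly what gets allocated to a workload, which is why it matters for the right-sizing and sharing problems Kevin describes; as he notes, it is still alpha.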
That's great. Marlow, I think you may have some more thoughts to add on this from an environmental perspective. So when you start looking at the GPUs and their power use, you're talking Bitcoin mining, but bigger. Bitcoin mining, remember, was at something like 1.1 to 2% of the world's total power, and now we're talking bigger and bigger. So we need to find ways to optimize for power, so that people running the data centers, or people running locally, are minimizing the amount of power used, maybe powering things up only when they're actually in use.

That's great. One final takeaway I might add: as mentioned, we have the CNCF End User Technical Advisory Board, where hopefully we can get a lot of input. So, panel, I want to thank you for all these great comments and feedback. It's really appreciated. Thank you.

By the way, my demo loaded. If anyone wants to show the screen, AV, so it's real, I promise. All right. Awesome, back to slides, please. So, the panelists have given us clear action items. I'll recap them for you all so that we know exactly what we need to do. First, please engage on Kubernetes issues with how you are using Kubernetes for AI workloads. In particular, we need to look at Dynamic Resource Allocation, which was brought up here. And if that had been sorted, my demo would have run on time, so I'm just saying. We need all the folks to participate in that, especially all the cloud providers, big and small. Second, the End User Research User Group is a great place to go talk about HPC workloads, and Working Group Batch is the place to influence how AI workloads get built on cloud native. And last but not least, as Joseph said, the End User Technical Advisory Board is the set of people to connect with, because they will be interacting with our maintainer community very, very frequently. And that's how, together, we can build for the AI revolution.

So, Team Cloud Native, your work is cut out for you. There's a lot to do, and I'm counting on each and every one of you to step in and participate. If you don't know where to begin, no problem. Here at KubeCon + CloudNativeCon, tomorrow there is an event running called AI Hub. It's an unconference. It's in Ballroom CDE at the Hyatt, and it runs from 11:00 a.m. to 12:30 p.m., and then from 2:30 to 5:20 p.m. Go there for conversations. I'll be stopping by, I know the panelists plan to stop by, and there will be many folks interested in the intersection of AI and Cloud Native over there. With that said, thank you so much for your attention, and enjoy the show.