Bonjour, Paris. So wonderful to be here today. 12,000 people, right here. This is the largest KubeCon ever. I mean, it's so big that even my parents are here. It's a big year for cloud native. Kubernetes is turning 10 on June 6. Our CTO, Chris Aniszczyk, will tell you a lot more about all the celebrations and how much this means for us. And how far we have come in 10 years. Together, as a community, we've built up Kubernetes and cloud native to power the majority of web applications, and we keep extending to new and unique workloads almost every year. Together, we have made platforms more resilient, stable, and secure. Some of the largest brands in the world rely on Kubernetes and cloud native. Spotify, Mercedes-Benz Tech Innovation, Airbus, Deutsche Bahn, Intuit, McDonald's, Disney, Apple, Adidas, Zalando, you get the gist, are all powered by us. I believe the reason for that is the extensibility of Kubernetes. Today, bring us an edge device, a stateful or batch workload, and we've got this. Our ecosystem wouldn't be where it is today without the members who have supported us. Today, we welcome Akamai, Mercedes-Benz Tech Innovation, and Trend Micro as gold members as we continue to build infrastructure that supports every workload. Yes, thank you to all those companies.

The past 10 years have been remarkable. I'm also looking forward to the next 10, maybe even 20. I mean, six years ago at KubeCon + CloudNativeCon in Berlin, this conference, OpenAI told us that the future of AI was going to be powered by cloud native. Fast forward to today, and what we see is that the world of AI has expanded and entered what Alan Greenspan, the former chair of the Federal Reserve of the United States, would perhaps call an age of irrational exuberance. It can be off-putting sometimes: the wild valuations, the humongous investment rounds, every startup with AI in its name. But as Fred Wilson, a venture capitalist who lost almost 90% of his wealth in the dot-com bust, said when quoting a friend, nothing important has ever been built without irrational exuberance. If we need proof, the technology-inspired booms of the past, including railroads in the 1840s, automobiles in the early 20th century, radio in the 1920s, television in the 1940s, transistor electronics in the 1950s, computer time-sharing in the 60s, home computers and biotechnology in the 80s, and smartphones and the app store in the mid-aughts, all brought lasting change to humanity, and all looked a little bit giddy and preposterous while they were underway.

In this latest AI era, we are the people who are building the infrastructure that supports the future, and we are the partners that AI innovators need. Today, every company I have talked to utilizes cloud native. For those in the know, Slurm is a fan favorite for research and training workloads for AI. But even there, the realities of developer friendliness are making folks either move to Kubernetes or use Slurm plus Kubernetes, a favorite of mine. This big promise of AI will be actualized by our end users, people in this community here. And y'all are not wasting any time. End users developed a technical sophistication during the digital transformation and cloud journeys. The platform teams that got built, that's you all, really know what is needed for infra. The AI centers of excellence need us, as they eventually find out. Question: how many folks or organizations are prototyping AI-enabled features right now? Raise your hands. Yep, a lot of you. And how many of you are seeing them have challenges going from prototype to production?
They're the same people raising hands. Yep, yep. And that's because prototyping with generative AI is incredibly easy. But going to production at scale is how the AI dream is going to be realized. Feedback from our end user community indicates a significant push towards adopting the optimal AI stack for teams dedicated to gen AI development, but they're stymied by the prevalence of proprietary cloud-based solutions. This is not a new story. Proprietary or less open options can be faster to adopt, sure, and faster to develop against, but there's a cost. You're buying into an opinionated solution that offers less configuration and interoperability. Walled gardens are not for everyone. As you may recall from the cloud computing era, the irrational exuberance of a new type of user experience leads to rapid prototyping, and that can be reflected in the bills. Our friends at the FinOps Foundation ran their State of FinOps report in 2024 and found that 60% of large organizations are experiencing, or expect to experience, a rapid increase in costs because of AI/ML workloads. Some are saying it's the wild west of cost and knowledge. Costs are growing exponentially. It's all very stressful, but it's also understandable. We are trying to ship as fast as possible, new models, new versions of models, inference features, and we are doing this without standardization. So each time it's a huge lift to package and deploy to production.

This narrative is familiar. In the cloud-native world, we saw standardization emerge with the Open Container Initiative, or OCI. In the AI world, with the order of magnitude of data and hardware choices being so much greater, we also need standards to emerge, and that will help encourage the same interoperability and consistency in how workloads are built, run, and deployed. And cloud native is here to help in that journey. Right away, in the present moment, the answers to cost management and interoperability both lie in open source. With the right guardrails, platform engineering teams, which is all y'all, can help your organizations. And I'm going to show you how.

Just this past month, I discovered Moondream. It's a VLM, or visual language model. And what it does is, if you show it an image or a video, it will tell you what is happening in that picture or video. I tried it out on Hugging Face Spaces, which is Hugging Face's sandbox environment. Think of Hugging Face as the GitHub for machine learning. Now, Moondream is epic. It is so cool. When we were building the demo, Moondream had not yet been open sourced, so I couldn't use it. Today, it is Apache licensed, and I'll have a link for you to see it later. What I'm going to do now is go to my terminal and get my kind cluster going. This is how I'll be running Kubernetes on my laptop. Those who have seen my demos before know that's kind of what I do. All right, let's get started. So just kicking this off. And while that's happening, can we please go back to the slides? Perfect, thank you. I'm going to show you the architecture I'm using while the cluster is spinning up. So everything I have used is Apache 2 or MIT licensed. Oops, sorry. On the development side, I have an LLM application, an image summary inference app built on Kubernetes. Now, image summary inference app is a fancy way of saying: give it a picture, it'll tell us what's in the picture. And with Kubernetes, we have the flexibility and scalability we'll need for the modern software life cycle, which is essential once my app takes off.
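For readers following along at home, here is a minimal sketch of the kind of call such an image summary inference app makes. This is not the demo's actual source: it assumes the LLaVA model is served through Ollama (the local model runtime introduced next) on its default local port, and the prompt and file name are placeholders.

```python
# Minimal sketch (not the demo's source): ask a locally served LLaVA model,
# running behind Ollama's REST API, to describe an image.
import base64

import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def summarize_image(image_path: str, model: str = "llava") -> str:
    # Multimodal models served by Ollama take images as base64-encoded strings.
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode("utf-8")

    payload = {
        "model": model,
        "prompt": "Describe what is happening in this image.",
        "images": [image_b64],
        "stream": False,  # return one JSON response instead of a token stream
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    # Hypothetical file name, for illustration only.
    print(summarize_image("conference_photo.jpg"))
```

Swapping in a different vision model, such as the BakLLaVA option mentioned below, would just mean changing the `model` argument.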
Now, on the other side, you'll see here I have LLaVA. This is a visual language model, and it's what I'm using instead of Moondream. And Ollama is running locally on my laptop. LLaVA serves as our building block, a component that offers the runtime environment for the application. Ollama is our model registry service and acts as a gatekeeper, managing traffic and ensuring secure communication protocols are upheld. Think of Ollama as the repository and distribution hub for all the AI models our LLM app might need. It kind of reminds me of Docker Hub. And the LLM artifacts make me think of OCI artifacts. Now, if you look on the production side, it's a mirror environment with Kubernetes at the heart of the entire operation. The CNCF platform engineers, Taylor, George, Jeefi, have allowlisted the libraries I used, and we can go into production with Kubernetes the minute we want to. From a guardrails perspective, the model registry is the single source of truth for model management and serves as a consistent dependency between our development and production environments. Platform teams push approved models into Ollama, which are then version controlled and managed according to an organization's specifications.

Let's go and check if my cluster has loaded yet. OK, it looks like it. Before I go to the URL, I want to show you all the options that I have for models that I could use. You'll see three, oh, can people read? Should I make it bigger? It's always a good idea to make it bigger, so I just did that. Hopefully that helps. OK, so you'll see here I have three models: BakLLaVA, LLaVA, which I said I'm using, and Mistral. Mistral is a text-based language model, so that's not a fit for my app. BakLLaVA is a combo of LLaVA and Mistral and is something I could use should I want to, and I could just swap it in very easily. Now, going back, OK, here is my localhost environment. OK, this is my application. And here's the camera. So what we're going to do is everybody here is going to join me and say Kubernetes, and we'll take a photo. So everyone, one, two, three, say Kubernetes. Kubernetes. Awesome, great work. Best audience ever. Oh, and the app has already given me an answer for what's happening in this image. And it says it's an indoor setting, it resembles a conference hall, spot on, rows of chairs facing a stage or platform, et cetera. You can see how it's very accurate.

So now, that applause tells me we need to get to prod ASAP. No problem. All we need to do is change the variables and run the pipeline, and we'll be good to go. If you are interested in checking out this demo or the Moondream VLM, please feel free to use these QR codes to get access. Everything's available. My last demo got 145 stars. Pretty stoked about that. Let's please beat it if you like this. And this demo ran fully without a hiccup, so I'm just saying. This architecture demonstrates not just the harmony between development and production, but the seamless flow of work from the hands of our developers into the lives of our users. Cloud native isn't just connecting AI development with AI infrastructure. It's facilitating the connection between services, teams, and people. Our projects and ecosystem allow folks to focus on their areas of expertise and provide a trusted interface on which we can all depend. So I urge you: tell your developers that going from prototype to production can be easy, even when they start with open source. Give them the guardrails they need to be successful and help them make their dreams a reality.
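As an illustration of what "change the variables and run the pipeline" can mean in practice, here is a hedged sketch: the app reads its model runtime endpoint and model name from environment variables, so the development setup (Ollama on the laptop) and the production setup (an in-cluster Ollama service) differ only in configuration, not code. The variable names and the in-cluster service address are illustrative assumptions, not taken from the demo.

```python
# Sketch of environment-driven configuration for the same app in dev and prod.
# Variable names and the in-cluster service address are illustrative only.
import os

# Dev default: Ollama running on the developer's laptop.
# In production, a Kubernetes Deployment or ConfigMap would override these,
# e.g. OLLAMA_HOST=http://ollama.models.svc.cluster.local:11434
OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434")
MODEL_NAME = os.getenv("MODEL_NAME", "llava")  # set to "bakllava" to swap models

GENERATE_URL = f"{OLLAMA_HOST}/api/generate"

if __name__ == "__main__":
    print(f"Inference app will call {GENERATE_URL} with model {MODEL_NAME!r}")
```

Because the model registry is the shared dependency between both environments, promoting the app to production really can come down to changing these variables and rerunning the pipeline.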
Prototype to production at scale turns irrational exuberance into sustainable growth for humankind. Speaking of turning things into reality, the Cloud Native AI working group has released a white paper and reference architecture that's available to you today. And you can see the cloud native AI landscape that they have outlined here in this picture. I'm going to focus on a couple of sections, and you're going to see some pink logos and blue logos alongside the standard purple ones. Pink is CNCF, blue is Linux Foundation, and you'll see a comforting amount that's already within our purview. So here, as you can see, this is general orchestration, lots of pink over there, no surprises. There's MLflow and Kubeflow. Next, workload observability. And the rest you will see on our social channels soon after this conference. And this is the Cloud Native AI working group's reference architecture for AI workloads that's released in the paper. I highly encourage you to read it. As you can see, the community is hard at work on these infrastructure problems for AI. You will hear from Nvidia, Microsoft, Bloomberg, among others, about the latest developments after me.

I want to focus on another element. I want to bring up on stage some of the brightest minds in generative AI today and hear how they want cloud native to support them in building the next generation of AI. Please welcome on stage Timothée Lacroix, come on, Timothée, of Mistral AI, makers of open and portable generative AI for devs and businesses. Hi, Tim. Thank you for coming. They have received a lot of love and attention from press and investors alike for their open source models. Next up, Paige Bailey, who leads generative AI at Google, and they recently released Gemma, the open source language model. Excellent. Good to see you. Beautiful. Thank you. Oh, I appreciate that. And last but not least, please welcome Jeff Morgan, founder of Ollama, an open source tool for running models with 48,000-plus stars. Welcome. Good to be here. All right, folks. Thank you for joining me. It's such an honor to have you. It's an honor to be here and to get to see so many people who are enthusiastic about AI and cloud native solutions. This is awesome. Thank you for coming.

You know, I want to start with you, Jeff. In the demo that I just did, it was so evident that Ollama provides a very similar experience to Docker. Being a cloud native OG and someone who worked at Docker, was that intentional? Part of it was just what we knew after using containers and Kubernetes for 10 years. We kind of couldn't help but design it similarly. But the bigger idea here is one of keeping paradigms the same. And it turns out that there's a lot in common between distributing, running, and serving models and doing the same with applications. And so for that, it made a lot of sense to lean on existing paradigms and ways of designing tools like Docker, Git, and other cloud native tooling. Makes a lot of sense. And now you're positioned between the infra and AI developer teams. What does that look like? There's definitely a gap. And it's a gap of tooling and concepts and community. The best way to think about it is what we were all working on together 10 years ago, which was the gap between developers and ops. And obviously, we've worked to conquer that together and to build tooling to bridge that gap. But the same gap exists between AI research and AI engineering, as well as developers and ops, today. Yeah. I remember, I definitely agree.
I remember when I first started doing machine learning around 2009, and then started talking with some of the folks at Microsoft around 2017 and 2018, like Lachlan and Bridget, about how I would go about deploying some of these solutions, the tooling stack was completely different. And not just languages, but also libraries, infrastructure. Most of the machine learning engineers weren't even thinking about containers, and still aren't. So I really do feel like there's a lot that we can learn from each other, especially as more and more of these AI applications are getting deployed out into the world instead of being stuck in research land. Hear, hear. I mean, speaking of, there is so much innovation, fast innovation, in AI and LLMs right now. How does your infra team keep up with your scientists and model builders? Oh my gosh, they don't. So as an example, as we're training these models, people want to try different architectures. They want to do tailpatches. They want to incorporate more data, train bigger and bigger models or models of different sizes. And each of those sizes has different hardware constraints, different latency constraints, different deployment targets or configurations that would need to be included. So I feel really fortunate to be able to work with all of the AI infrastructure engineers that we have at Google. They help make sure that the stuff that we build is actually useful out in the world. And it's a really, really hard challenge.

Yeah. Tim, is your experience similar? I mean, it must be a bit different, because I don't have access to all of Google's crowd of infrastructure engineers. So there are like three different workloads that we consider. There is training, which has lots of large homogeneous resources that we have to allocate between researchers. For this, we use Slurm that runs on Kubernetes, which has been a very nice experience for us. Everything that goes to prod is deployed on Kubernetes directly. Nice. The second workload, so inference, is short workloads on a single node. The third workload is fine-tuning, for which the code looks like the code for training, but it runs on much smaller resources, over much shorter time spans. So this can also be allocated on Kubernetes as well. Very nice. It seems like we're being useful. You know, you're building so fast in your startup, it's gone zero to a billion in nine months. What are the scaling constraints you might see as different models evolve in sizes and capabilities? So I think demand for all sizes will grow, from edge to huge models that take up multiple nodes. I think right now people are still discovering the models and still don't know which size is the right one for their application. Once people start to figure this out and get more confident, they will be able to pick the right price point for their use cases. What's interesting as well is that as the architectures are getting more settled, chip makers have the opportunity to build more dedicated chips that are more efficient for these use cases. And so the variety of hardware that this community has to support will also grow, so that I don't have to think about it. Absolutely. And definitely, I also think that some of these inference constraints are driven by use cases. So as an example, in the AI for software engineering space, for things that are low latency like code completion, you might want to have smaller models, perhaps even local models, that are used for very, very fast, efficient inference.
Whereas for some of these more complex planning workloads, where you have massive models kind of building target opportunities for migrations, or for being able to do things like performance optimizations, you're using a much, much larger model, which would need to scale across multiple nodes. So I definitely think that we're going to be seeing a more interesting landscape in the hardware space for these inference cases, but also people using smaller and smaller models more efficiently, maybe doing fine-tuning or using long contexts. So big, big opportunity space there. A lot is evolving, it seems. And like you said, the GitHub Copilot-esque use case is one kind of model; targets and migration plans are a different kind, yeah? And then, as you were describing, we have to think of all kinds. There are three types of AI workloads: training, fine-tuning, and inference. The app that I had deployed was an inference application, for those who might want to know. And I could run it locally on my computer, thanks to y'all at Ollama. But it was on CPU, an M1 or M2 chip.

So I'm curious. In the land of AI right now, there's a lot of what can be called open source washing going on. What are the technologies that need to be Apache 2 or MIT licensed? I think the obvious answer is the models themselves. That wasn't quite true six to nine months ago. But thankfully, thanks to models like Mistral, we're able to have Apache 2 large language models and foundation models. The reason for the model needing to be open source is one of access and trust. Customers are deploying this and using it with their own customers as well. And so you have this situation where the surrounding infrastructure and the model need to be open source. Because today, the cloud tooling we use is open source, like Kubernetes. But if the model is not open source, how are we supposed to extend that in a way that's unique to our business? And how are we supposed to go build our dreams if we're not able to actually open up the model, understand how it works, and ultimately secure it so that it's a good steward of our customers as well? I know you have a take on that, Tim. Yeah, my take on what needs to be open source is really all of the tooling. So from the languages that we use every day, like JAX or PyTorch, down to all of the compiler stack that lowers this to the hardware, I think that's what I really hope will stay open source. Because, I don't know, it gives us so much more opportunity to just swap out things without asking ourselves, am I spending this year optimizing something for hardware that I'll have to throw out next year? So having this freedom of switching from one vendor to the next is really great. I have to plus-one everything that's been said. I really loved, when I was first exploring machine learning, benefiting from looking at all of the libraries, at all of the models that had been open sourced, and really being able to teach myself some of these concepts. If we don't do that as a community, then we run into this scenario where the next generation of AI scientists, and the next generation of people who are wanting to deploy AI applications, don't have that opportunity to learn. And the only place that they might be able to pick up those skill sets is by embedding in one of the companies that are doing it professionally. So I really do feel like we should make the models open source, the compilers and the frameworks. And then also, it's just been magical to see what people build.
When we released the Gemma model externally from Google, it's a smaller model, but almost immediately, like three days after, people were building mixture-of-experts versions of it and testing it out, building a code version or a special fine-tuned version for their particular use case. So I've been just super enchanted to see that. And the only way that we can understand what these models can do is by people testing the boundaries, pushing them for new things. And that's just amazing.

So if the cloud native community here could do anything for you, what would you ask for? Well, these models don't run in a vacuum. I think we built an immense amount of powerful tooling together to run applications at scale. And the same challenges are going to repeat themselves, just like you mentioned earlier, in this new world of running AI-based applications. And so my big ask is, how do we take the amazing tooling around monitoring, security scanning, logging, and bring that into this new world of AI-based applications? The challenge is really that we're going to have to do it fast, because these AI models are evolving so quickly. I would love a few things. I feel like everybody's prototyping with paid service APIs right now, which really obfuscates some of the infrastructure requirements. I would really, really love for the community to create patterns for how you could replicate the experience that you're getting with paid service APIs with purely open source solutions. And not just using them for inference and having these retrieval patterns, but also the things that are very, very important for deployments, like observability, logging, all of these things. So I would love, if anybody has a yearning for a weekend project, that would be wonderful to see. For me, I think it would be: keep abstracting the hardware, basically. I only got to know Kubernetes one year ago, so I'm really an end user. And when we run things in prod, basically, I don't want to care about whether my GPU is going to die in some weird way. And so I think all of this management of health, checking that everything is good, replacing the machines if they're not well, all of this should be done by this community. And I would feel much safer knowing that all of these health checks are done by the cloud-native community, basically. Awesome. That's a vote of confidence. Thank you so much.

Yes. There are so many questions I want to ask you, things I want to delve deeper into. And I'm sure folks in the audience want to have more of a conversation as well. So we don't have time right now, but let's do this after the keynotes. If any of you want to chat with the panelists, have a conversation, come to the AI Hub at 2:30 PM today. The panelists will be there, and all questions and ideas are welcome. Panelists, I hope you saw that in the cloud-native ecosystem, we are a way of working, we're a community. And our innovation is across all workloads. And we are here for you. Audience, the generative AI revolution is going to change the world. These people are changing the world right now. And they will do so with our help. Let's help our organizations, our friends, go from irrationality to completely thoughtful, rational exuberance, and give the prototype to production story a happy ending. Thank you so much, and enjoy the show.