Hi everyone, and welcome to today's webinar. We're excited to have all of you joining us from different parts of the world. I have with me my colleague, the creator of LocalAI, and he'll be the co-host for today. Today we're going to be talking about LocalAI meets K8sGPT: analyzing Kubernetes cluster states locally with CPU, at the edge and beyond. Whether you're a seasoned professional or just getting started in the field of Kubernetes, I believe there's something for you in this webinar. During the webinar, I encourage you to ask any questions you have, to share your thoughts, and to engage with us actively. A recording of this webinar will also be posted on the CNCF YouTube channel, so you can go back and have a look should you miss anything. So, without further ado, let's get started.

My name is Oluebube Egbuna. You can call me Bube, because Oluebube is difficult to pronounce. I'm a DevRel engineer at SpectroCloud, and I've been a software engineer for six-plus years. I enjoy writing technical content, writing code, and sharing my knowledge. You can find me on LinkedIn as Oluebube Princess Egbuna, the name shown on your screen, and on Twitter as Princes Oluebube; note that there's just one S in Princes there, not two. On all the other social media platforms you can find me under the same name. Next slide. Ettore, I'd like you to introduce yourself at this point.

Hey everyone, I'm Ettore. I'm the head of Open Source at SpectroCloud, and I've spent more than 15 years contributing to open source. I've been a maintainer of several open source projects, and I'm the creator of LocalAI. You can find me on Twitter as mudler_it, or on GitHub as mudler.

Awesome. Andrew Ng says AI is the new electricity: the same way that electricity powers houses, AI is powering industries, businesses, and the future of work. AI has come to change the way we do almost everything at this point in time. Most industries have seen AI disrupt some part of their operations and how they work generally. There are tons of tools released every day that utilize AI. Some of these tools include ChatGPT, which is designed to facilitate natural-language conversations: when you give ChatGPT text input, it returns human-like responses. There's also DALL·E, used for generating images. Besides that, there's CodeGPT, which is specifically designed to assist with code-related tasks, including code completion, code generation, documentation assistance, and much more. The field of Kubernetes is not left out: AI has come to disrupt the way we interact with our clusters as well.

So, what exactly is K8sGPT? Before we get to that, I'll go over the agenda. We have an exciting agenda: today we're going to look at an overview of LocalAI and K8sGPT, Ettore is going to walk us through the technical details, and then we'll have a small demo followed by a Q&A session.

So, what exactly is K8sGPT? K8sGPT focuses on leveraging GPT within Kubernetes clusters to analyze them. The analysis of the cluster generates information on what's going on inside it, which is transmitted to OpenAI; OpenAI in turn returns results explaining what's happening internally in the cluster, based on the information that was collected. K8sGPT works with an external API to be able to do this.
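To give you an idea, the default hosted flow looks roughly like this — a minimal sketch, and the exact k8sgpt subcommands and flags have changed across releases, so treat these as illustrative:

    # Authenticate K8sGPT against the hosted OpenAI backend (requires an API key)
    k8sgpt auth add --backend openai --model gpt-3.5-turbo
    # Scan the cluster and ask the backend to explain each finding
    k8sgpt analyze --explain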
It's truly magical what K8sGPT can do. With it, you can enhance your SRE powers. It works great when the environment is not isolated; with test environments, for example, K8sGPT works great. When it comes to isolated and air-gapped environments, though, the question of having to expose sensitive information to a public API becomes a valid concern. That's where LocalAI comes in. LocalAI lets you run your LLMs on your own device or hardware. Bringing it to the world of Kubernetes: with LocalAI, you can now harness the power of AI to analyze Kubernetes cluster states locally, on your own hardware. This way, you don't have to worry about exposing any data to the outside world, which makes it perfect for isolated environments. At this point, I'm going to let my colleague Ettore talk us through the technical details.

Thank you, Bube. Yes, I'm going to show you what LocalAI is. First of all, let's discuss a bit what it is. It is an OpenAI API drop-in replacement, which means that software you've already built that leverages the OpenAI API works out of the box with LocalAI. What does it do? It runs large language models on consumer-grade hardware. It leverages the CPU, so it works on almost any modern computer, and it can also accelerate the computation using a GPU. It uses open source models, which are provided by the community. It is a perfect fit for air-gapped or isolated environments, for small fine-tuned models, or wherever privacy is a big concern. You can see the quick adoption of LocalAI in this graph.

And how do we tie K8sGPT and LocalAI together? LocalAI gives you the ability to run large language models on-prem: you can install it locally on your machine or inside the Kubernetes cluster, because it has a Helm chart that you can already download and use. First of all, K8sGPT runs its analyses of the cluster state. It tries to find everything that doesn't seem right in the cluster, looking at problems like configuration issues, services not being reachable, and pods crashing, for instance. It collects all of those analyses and feeds them to the AI to enhance the error with a more comprehensive message, so you have a little more context on what's happening. This typically works by contacting the OpenAI API, which is remote, but we can now swap that component with LocalAI and do the inference completely locally.

LocalAI, generally speaking, is an OpenAI API drop-in replacement. How does it work? It has a set of different backends behind the scenes; the backends are C/C++ libraries, and LocalAI has Golang bindings to them, so it can actually load the models and run the inference locally. Since it implements the OpenAI API spec, you can use it to generate text and images, and also to transcribe audio.

And now, let's see the demo. I'm going to show you in this example how to bring up LocalAI. We talked about LocalAI already; this is the GitHub page. You can find instructions on how to run LocalAI down here in the usage section, including an example using the GPT4All model. We're going to try this out locally, and we're going to see K8sGPT debugging and analyzing a cluster with a workload that is having problems. First of all, let's create our cluster. In this case, I'm creating the cluster locally using kind, which will spin up a Kubernetes cluster using Docker on my system.
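The command for that is roughly the following — the cluster name here is just an example:

    # Spin up a local Kubernetes cluster backed by Docker
    kind create cluster --name k8sgpt-demo
    # Confirm the node is up
    kubectl get nodes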
Afterward, I'm going to create a deployment and slightly modify it so that it has an issue: it cannot pull an image, let's say. So now the cluster is up and running, and I should be able to see all the pods. I've already installed K8sGPT locally here; you can get it from the K8sGPT repository, which has binary releases and instructions on how to install it on macOS, all the Linux distributions, and also Windows.

Now we're going to set up LocalAI locally, following the example over here. I'm going to copy-paste this one: first I clone the LocalAI repository, and then I just get inside it. The first thing you'll notice here is that there is a models folder, which is empty. I'm going to download one of the models now. This model is free — it's Apache licensed — and it's the GPT4All-J one. There are also other models you can use, but we'll stick to this one and try with this model over here. The download is going to take a while, and now it's about to finish. The model is about 3.5 GB and gets partially loaded into memory, so you need a certain kind of hardware to run this. It works even on a Raspberry Pi, but there it's very slow to get answers from the model.

Now we're going to copy the template. Every model might have a default template for talking to it; the template basically lets you interact with the model in a specific way, because models are trained towards a specific prompt. In this case, we're going to use the prompt template that GPT4All was trained with. And then we're going to start LocalAI. It's pulling the images now, and then it's going to compile the LocalAI API binary the first time it starts. In future releases we'll also have options to ship prebuilt binaries, but keep in mind that it depends on the specific CPU and hardware you have, because it will leverage certain instruction sets. So it is still suggested to run the compilation before running the API, exactly for this reason: to leverage all the instruction sets of your CPU. This can actually make performance much better than having generic binaries.

So, it's about to run. You can see LocalAI coming up now; as you can see, the first boot step is actually compiling the code on the machine. As soon as it comes up, we'll try to run an inference locally to see if the model is actually up and running. It's about to finish — it's compiling the binary — and we should see the API starting. And there we go, the API has started, so I'm going to stop following the logs. Let's have a look at the models folder now: you can see we have the GPT4All model and the template file. Now we're going to run this command just to check that everything is right. As you can see, the ggml-gpt4all-j model is correctly listed in there, and we can try asking the model a question: "How are you?" This is basically the first call we make to the API, so it will load the model into memory, and it will be faster on the next inferences. Meanwhile, we can also check what's going on with the API over here. As you can see, the model loaded; you can get some extra output with the debug options, so you can see what's going on. Keep in mind that on CPUs it really depends on the CPU model, but more or less this is the kind of response time you can expect.
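To recap, the commands we just ran are roughly these, following the LocalAI README example; the model URL and file names are the ones from the README at the time, so they may have changed since:

    # Clone LocalAI and step inside it
    git clone https://github.com/go-skynet/LocalAI
    cd LocalAI
    # Download the ggml-gpt4all-j model into the models folder (~3.5 GB)
    wget https://gpt4all.io/models/ggml-gpt4all-j.bin -O models/ggml-gpt4all-j
    # Copy the prompt template the model was trained with
    cp -rf prompt-templates/ggml-gpt4all-j.tmpl models/
    # Build and start the API; the first boot compiles against your CPU's instruction sets
    docker compose up -d --pull always
    # Check the model is listed...
    curl http://localhost:8080/v1/models
    # ...and run a first inference (this call loads the model into memory)
    curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
      "model": "ggml-gpt4all-j",
      "messages": [{"role": "user", "content": "How are you?"}],
      "temperature": 0.9
    }'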
So, we have an answer; everything is ready. Now we can try to deploy something that is not working just fine in our Kubernetes cluster. Okay, let's set up K8sGPT to actually use LocalAI. We're going to follow the instructions in the K8sGPT project, over here, under running local models. The first step is to start the API, which we already went through, and now we run K8sGPT. This is the model name, and this is the backend: we're going to authenticate the localai backend with this model. The model name is this one. I had already provided this, so I'm going to remove it and add it again — right, perfect. So now I have the provider loaded in K8sGPT, and K8sGPT is going to ask LocalAI directly about things in my cluster.

Now I'm going to take a deployment directly from the Kubernetes documentation. Create a deployment — this one looks good — deployment.yaml. I'm going to put it over here. What I'm going to do now is change the image so it has a typo. Right? What about this bad guy? I'm going to check the cluster, and now I'm going to apply it. Deployments — there we go. Let's see what's happening. Okay, so I have something in my cluster which is not good. Now I'm going to ask K8sGPT about it; let's see what it gives me. What we can see, of course, is that it will definitely be slower, because it's running locally, and it's also bound to use just four threads over here. But it's going to give me an answer soon. Let's see the result: back-off pulling image — this was our typo — and the solution: check that the image exists in the cluster; check the pull policy for the image; if the pull policy is set to Always, adjust it to IfNotPresent; and re-run the image pull. So now we get some real insight from the errors in Kubernetes. We can check what was given in the logs, and here we see the message that was fed to the AI: "Simplify the following Kubernetes error message, delimited by triple dashes, written in English" — followed by the error message from Kubernetes, back-off pulling image, and the image name. And that's all about it.

All right, thank you very much for taking us through the demo. I hope you learned a thing or two from it.
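Putting the whole K8sGPT-plus-LocalAI flow together, the commands look roughly like this — a sketch, with an illustrative broken image name, and flags that may differ between k8sgpt versions:

    # Register LocalAI as a K8sGPT provider, pointing at the local endpoint
    k8sgpt auth add --backend localai --model ggml-gpt4all-j --baseurl http://localhost:8080/v1
    # Create a deployment with a deliberately misspelled image ("ngnix" triggers ImagePullBackOff)
    kubectl create deployment nginx-demo --image=ngnix
    # Analyze the cluster and have LocalAI explain the findings
    k8sgpt analyze --explain --backend localai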
Now we're going to get into the question-and-answer session, and I have just two questions on my table. The first question is: what kind of hardware can LocalAI run on? Are there any hardware restrictions? That's a great question. Generally speaking, there are no big restrictions, as the backends LocalAI uses behind the scenes work even on constrained hardware such as a Raspberry Pi; it depends mostly on the size of the model you're willing to run. This is the reason I would suggest LocalAI especially for fine-tuned models: if you have fine-tuned models, you can leverage this piece of technology more. For larger models you need very modern and capable hardware, and I would also suggest a GPU, but generally speaking, small fine-tuned models work just fine.

That's interesting. And then there's a second question, which says: security is usually top of mind for several organizations. Can you talk about security in LocalAI, especially with respect to production environment clusters? Right, that's also a very good one. I think LocalAI hits a very good sweet spot here, because everybody wants to use OpenAI, but the privacy and the sensitivity of the data you're sharing with it is very important — it's an aspect that can't be neglected. I was reading the news a few days ago that several companies are actually blocking access to ChatGPT, and I totally understand that. In this context, when you're analyzing cluster data, you don't want to expose any kind of sensitive data back to the API. So I think LocalAI is a great fit in this context.

I think that's clear enough. Thank you very much, Ettore. For next steps, you can follow LocalAI on Twitter, and you can also check out the project on GitHub using the link shown on your screen. Besides that, you can look at how to make use of LocalAI in your own clusters, using the link also shown on your screen. And with that, we've come to the end of today's webinar. On behalf of SpectroCloud, I'd like to extend my sincerest regards to Ettore, who was our speaker, and to all of you who are listening to us today. Thank you very much. Thank you, Bube. Thank you, everyone. Bye. Bye.