All right, everyone. Like Chris was saying, I'll just get kicked off now. This presentation is about using a quote unquote robot to help solve our issues with Kubernetes and the apps that run inside of Kubernetes.

A little bit about me. I've been around for a little while, a little over 20 years. I've worked with Chris in the past; he actually helped me kind of tangentially when I was building a company a while back. I've worked in different fields in tech, everything from building data centers to helping direct drivers, things like that. Currently I'm in the freelance market, and I raise chickens and things like that. Recently I've been working in observability, and for some reason that industry keeps pulling me back in. I'm also looking for opportunities in AI now, which is part of why I'm doing this, because it was kind of a good side door into that. And as you can see, if you want to get a hold of me at any point, I'm very hard to find online, so just go ahead and reach out any time with any questions about the presentation.

In order to figure out what the point of all this is, we have to figure out where we are currently. So I'm gonna walk through a little bit about Kubernetes and why it's important; observability, why that's important with Kubernetes and why it's actually needed, not a nice-to-have but an actual requirement; Prometheus, and how that plays into the observability ecosystem; Robusta, what that is and how it uses Kubernetes and Prometheus to build up a set of observability metrics and data points you can look at so you can find problems; and then what people are actually waiting for, this OpenAI integration that the Robusta team has put together. It's all open source; you can actually go in, see it, add prompts to it, things like that. It's pretty interesting. And then hopefully I'll be able to do a short demo of what it's actually doing, and hopefully the demo doesn't turn into a demolition.

So, Kubernetes. I'm sure everyone here has at least touched Kubernetes at some point, so you understand the advantages of using it, right? It's cloud-native, you get orchestration, you can run apps that are cloud-native, scalable, portable. The biggest thing, especially for the powers that be, is the cost efficiency: you can get more out of less hardware, so to speak. Obviously all of this is good stuff; everybody likes all this stuff. So what's actually the issue when it comes to Kubernetes?

This is the issue, right? If you don't really know what you're looking at, this is what you get. Think about being new to this: you sit down for the first or second time and you're trying to figure out a problem, or you're trying to write an app or get a pod running, and someone throws this in front of you. You don't know what's actually going on. There's a lot of complexity, which is just inherent with flexibility: if it's flexible, it can be complex. And because of that complexity, this is where observability comes in, in order to actually use these complex systems, get the most out of them, and know what's happening inside of them.
In the past, there have been these one-off flavors of observability out there. You have certain companies, and I'm not gonna speak ill of anybody, but companies with their own kind of native observability platforms that use their own agents and metric formats, that have their own query languages, all of these different things: a very fractured ecosystem. But at the same time, it was immediate. People would go out and get observability platforms, say a Datadog or a Sysdig, and put that into their environment so they could actually see what was happening inside of their Kubernetes environment.

That's all good; those platforms are fine, there's nothing wrong with them. But now, this is where Prometheus comes in. People want an actual cloud-native observability platform that works across AWS, Google, OpenStack, whatever you have, wherever you're deploying Kubernetes. And that's where Prometheus comes in. If you look at any of the companies out there, say Sysdig, or Datadog, or Chronosphere, or Grafana, any of them, under the hood, in some way, shape, or form, they're either compatible with or using Prometheus.

And why do people want to use Prometheus? Well, for all the things I just said, plus, out of the box, it just works with Kubernetes. You deploy Kubernetes, you deploy Prometheus, and you're up and running in very short order. You can write all of your applications that run inside of Kubernetes to have scrape endpoints; if you're using something like Python or Go, they have client libraries that are native to Prometheus, so you can build your own scrape endpoints and gather metrics from the application itself for Prometheus to scrape. So you'll have your application layer, your Kubernetes layer, and then also, say, your virtual infrastructure, maybe even your physical infrastructure, all with metrics inside of Prometheus. And that's very powerful, because when there's an issue you can go in and actually model the data and find out where things are: where the network issues are, where the bottlenecks are, why the pods are failing, all the things you do during a troubleshooting session when something broke.

Also, with Prometheus you get the added advantage of this very flexible query language that's built in, called PromQL. The double-edged sword of PromQL's flexibility, again, is a very high learning curve. If you sit down with PromQL, sure, you can look at a couple of docs and start doing some simple queries, summing and ranging and things like that. But once you get into much more complex queries and data modeling, the complexity goes way up. It's very complex; I'm by no means an expert in it, and I'm constantly asking people for help, to be perfectly honest with you.

Prometheus also simplified the metrics-gathering process. In the past, on some platforms, you actually had to push data to the platform, and you can see how that becomes a problem as you try to scale up the number of nodes pushing metrics. But with Prometheus, you get something very interesting.
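Before getting to that: to make the scrape-endpoint idea from a minute ago concrete, here's a minimal sketch of an instrumented app using the official Python client, prometheus_client. The metric names are made up for illustration.

```python
# Minimal Prometheus scrape endpoint using the official Python client.
# Run it, then curl http://localhost:8000/metrics to see the output
# Prometheus would scrape.
import random
import time

from prometheus_client import Counter, Gauge, start_http_server

# Application-level metrics; the names here are illustrative.
REQUESTS = Counter("myapp_requests_total", "Total requests handled")
QUEUE_DEPTH = Gauge("myapp_queue_depth", "Current work queue depth")

if __name__ == "__main__":
    start_http_server(8000)  # serves /metrics in a background thread
    while True:
        REQUESTS.inc()                          # count work as it happens
        QUEUE_DEPTH.set(random.randint(0, 10))  # fake a fluctuating gauge
        time.sleep(1)
```

Once Prometheus scrapes that endpoint, a simple PromQL query like rate(myapp_requests_total[5m]) gives you the per-second request rate, which is about where the easy part of the learning curve ends.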
Instead of pushing data to Prometheus, you're pulling data through the scrape method. Say you have a thousand endpoints out there that you need to scrape. You configure Prometheus to go out, scrape all those endpoints, and pull the data in. So you're not overloading the system; Prometheus just goes out and scrapes all those different endpoints.

So Prometheus: it's awesome, it's cool, I love Prometheus. There are two issues with it: there's no logging in it, and there's no tracing in it. Those are separate platforms, and it's worth mentioning; I know it's outside the scope of this, but it is worth mentioning.

Okay, so now we know Kubernetes and Prometheus are really cool, but they're complex and they're kind of a pain if you don't know what you're looking at and what you're doing. This is where this project called Robusta comes in. It's something I was explaining to Chris earlier that I found because I've been doing a little bit of research into AI, into how to make observability a little easier to look at and how to simplify Kubernetes troubleshooting. And I thought, hey, this is actually really cool stuff; it seems like they're putting a lot of work into this. It's open source, obviously. It does have a SaaS component that we'll get to later, which is not, to my knowledge, an open source project, but if you have under 20 nodes you can still use it for free.

What's really nice about Robusta is that it takes these two complex systems and simplifies them. It is, for all intents and purposes, a simplified observability and automation platform for Kubernetes. While you strictly don't need Prometheus to run Robusta, it's highly suggested that you have it in place. And if you don't have Prometheus already in place, when you deploy Robusta into your Kubernetes environment it'll actually set up Prometheus for you: when you go through and set up the configuration YAML manifest, you can specify yes, go ahead and deploy Prometheus for me, and it'll just do it. So it's very easy to deploy. And at the end, I have the Git repository that I used; I loaded all my notes and everything into it, so if you wanna go home, pull it down, and play around with it, feel free.

What's going on under the hood of Robusta? This is a very high-level diagram that I pulled from their website. They have the forwarder, which monitors Kubernetes and Prometheus; there's a little more on here, but we're just talking about Kubernetes and Prometheus. So it looks at Kubernetes and Prometheus, sees what's going on, and any kind of alerting, any of the events you're really watching for in that environment, gets forwarded on to the runner. And the runner is where a lot of the magic takes place. That's where these things Robusta calls playbooks live: automations that you can use to automate your environment or even remediate simple issues that come up. There are a bunch of playbooks already built into Robusta, so this thing kind of works out of the box when you deploy it, but you can pretty simply write these playbooks yourself and then deploy them into the runner. As a matter of fact, this OpenAI API extension is a quote unquote playbook that just gets deployed into the runner at install time.
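To give a feel for what writing one looks like: below is a minimal sketch in the Python playbook-action style from Robusta's docs. I'm reconstructing the imports and names from memory here, so treat it as illustrative and check the current docs before using it.

```python
# Sketch of a custom Robusta playbook action (names from memory of the
# docs; verify against the current robusta.api before relying on this).
from robusta.api import MarkdownBlock, PodEvent, action


@action
def pod_babysitter(event: PodEvent):
    """Attach a note to whatever pod event this action is triggered on."""
    pod = event.get_pod()
    if pod is None:
        return
    # Enrichments get forwarded to whatever sinks you configured
    # (Slack, the Robusta UI, and so on).
    event.add_enrichment([
        MarkdownBlock(
            f"Pod *{pod.metadata.name}* in namespace "
            f"*{pod.metadata.namespace}* needs a look."
        )
    ])
```

You'd then point your Helm values at the repository holding that action and map a trigger to it, which is exactly the mechanism the OpenAI playbook uses.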
So the OpenAI plugin just extends the runner to be able to talk to the OpenAI API and Prometheus and bring back information. Any questions so far?

[Audience] You said you're gonna share some of the work you've done, like the GitHub link?

Yeah, that's at the end. People can just go ahead and clone it and play around with what I'm talking about right now. Sure, thanks.

So how do you interact with Robusta? There are a couple of different ways. There's this thing called the Robusta CLI, which you can just install with pip. If you go to the Robusta GitHub repository or their website, they have very, very simple instructions for installing it. Once you have it installed, that's where you actually create this manifest. Well, it's not actually a Kubernetes manifest; it's a values file you're gonna use with Helm, because you use Helm to install, and that's what makes it so easy. You could go pull down the Robusta code that runs inside your Kubernetes environment, modify it, build it, write your own manifests, do whatever you want with it; I use Helm because it's just the easiest way to do it. So you'll use that CLI to generate that values file. And I mean, you can't really administer Robusta with the CLI, but from what I've seen with my limited exposure, it's very good at building that configuration file and at looking at log files. That's what I've been using it for.

You also have a Grafana-based local GUI. You'll obviously have to pipe this out of your Kubernetes environment; I'm not gonna go into how to do that because I don't know how everybody has their environment set up, but you can expose it and get to it. So you don't technically need to use the SaaS GUI. If you just want Robusta up and running, looking at your Kubernetes and Prometheus environments, or if you deploy Prometheus with Robusta and just want to look at the metrics flowing in from both, you can use these Grafana dashboards. And not only are there a bunch of dashboards, there's also a lot of really good alerting they've pre-built for you. Like I said, it's pretty great out of the box: if you have an environment set up, you can deploy this and feel pretty comfortable that you're gonna catch the issues you'd commonly see. But at the same time, it's very easy to extend using these playbooks, these runbooks, and you can also write your own alerts for Alertmanager and Prometheus, as well as your own dashboards in Grafana if you want to.

And then, last but not least, the SaaS GUI. The SaaS GUI enables a lot of different stuff. It's easier to look at than some of the dashboards, and those are already pretty easy to look at; I'll show you what I'm talking about in a second when I get to the demo. But it makes things easier to see: you can drill down, you can very clearly see and track problems, and it has a nice timeline, so you can see when problems fire and how they relate to each other. So the SaaS GUI is very cool.

And now for the actual point of the presentation. Now that you know where we're at, we can look at what we're here for.
So, the OpenAI integration. Like I said, it's a runbook that actually runs inside of the Robusta runner. When you deploy Robusta on its own, it doesn't have this; you have to add it. And if you look at my GitHub repo, as well as the Robusta docs, their GitHub repo, and their website, they explain how to set it up, and it's really super easy: all you do is paste a couple of lines into your values file and deploy with Helm, and you're up and running; or you can just do a helm upgrade if you already have it deployed, and it works.

So this gives you that OpenAI, that LLM integration: you have ChatGPT as an assistant. The alerts that come in get sent to OpenAI and queried with the prompting that's built into the integration. And what's neat, like I said, is you can actually modify the prompts. I modified the prompts a little bit, and I think you get slightly better results with the prompt I have, although the one they ship is actually really good. Theirs was giving back instructions along the lines of: oh, you're seeing this, well, maybe this is the problem. I wanted it to give back Kubernetes commands as well. So, kind of the use case of: I'm a level one support guy, and this will help push me in the right direction, right?

And what's cool is that Robusta has all these integrations, so you can forward all of this to Slack, Discord, Telegram; there's a whole bunch of different integrations. So if you're just out living your life, you can interact with this thing and actually get some work done, or remediate an issue if it's a simple issue, or troubleshoot an issue with somebody on the phone.

There are a couple of things you're gonna need to make it work. For this instance, it's a Slack channel, but like I said, if you go to the website, you can look at all the other integrations they have. So for this demo: a Slack channel, which I'm sure we all have in some way, shape, or form. And then the OpenAI API key. I'll come back to this, but there's a nominal charge to use the API. I guess if you're making hundreds of thousands of calls it's gonna cost you a lot of money, but for this, like I was telling Chris, I think I've spent five bucks in the last month playing around with it, so it's a nominal fee. And right here at the bottom you can see the actual GitHub repo, so you can go take a look at the actual code.

[Audience] Have you looked at what data's being sent to OpenAI? Like, from a security context, is OpenAI using the data you're sending it, mapping the topology of, for instance, your Kubernetes cluster, and learning from it?

That's a good question, and I can actually show you this. You didn't derail me at all. Let me talk about the logic and show you the repository so you can see the prompt. What it's really doing, and I'll explain it real quick, is this: it's not like the integration is scanning your network and pulling that data in. It's like when you go to ChatGPT and type in a prompt, say, hey, I wanna know what a blue jay is or whatever, and it gives back a bunch of information. That's really what it's doing in the background. It only knows what the prompt is that's getting sent.
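For a rough idea of what that amounts to in code, here's a hedged sketch in the style of the plugin, written against the pre-1.0 openai Python client that was current when it shipped. The real code and the shipped prompt live in the robusta-dev/kubernetes-chatgpt-bot repo; the wording below is my illustration, not the actual prompt, and ask_chatgpt is a name I made up.

```python
# Hedged sketch of the ChatGPT-enricher idea: only the alert text and a
# canned prompt go to OpenAI, no cluster topology. Uses the pre-1.0
# openai client that was current when the plugin shipped.
import openai

openai.api_key = "sk-..."  # your OpenAI API key


def ask_chatgpt(alert_name: str) -> str:
    """Send one Prometheus alert name to OpenAI and return the advice."""
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[
            # This is the part I tweaked: asking for kubectl commands,
            # not just prose. Illustrative wording only.
            {"role": "system",
             "content": "You are a Kubernetes assistant. Include the "
                        "kubectl commands a level-one engineer should run."},
            {"role": "user",
             "content": f"Explain this Prometheus alert and how to "
                        f"troubleshoot it: {alert_name}"},
        ],
    )
    return response.choices[0].message.content
```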
Okay, and it's probably gonna be clearer if I actually show you the prompt. It's a natural language processor, right? You just type a sentence in, and that's basically what's being sent: a sentence saying, hey, I need the answer in this format, blah blah blah, and here's my issue. So no, that's a very good question; that is a concern. For security reasons you obviously don't want that data just dumped out into the world. You're right, exactly.

Here you can see it with the Slack integration. When an issue happens, it shoots you a message with a couple of things you can do. You can click on the Investigate button, go right to the Robusta SaaS interface, and start drilling down and troubleshooting like you normally would. You can silence it, so it doesn't keep nagging you: yeah, I know that pod's gonna crash, okay, fine, whatever, it can wait until tomorrow; you can silence it for however long you want. Again, that takes you into the SaaS environment, where you configure the silencing. But then there's this new button: since we have this ChatGPT plugin installed in the runner, it forwards you this button as well, saying, hey, ask ChatGPT about it. You push that button, give it a couple of seconds to process, and it shoots back an answer to you.

Like I was telling Chris, right now it's good; it'll push you in the right direction. It's not a be-all and end-all yet, but give it another 18 months of development, and the ability to train the LLM a little more on the issues you're seeing in Kubernetes, and I'm pretty confident it's gonna give a heck of a lot better answers. At some point you're just gonna say, yeah, go ahead and remediate that, and it'll follow its own directions and do it for you, just go through and automate it. I mean, with Robusta you can already, like I said, remediate some of these simpler issues with a runbook. If an issue pops up, like this pod will crash if the load goes above some threshold, then okay, scale up by one, something like that. You can already do stuff like that; there's a rough sketch of that kind of scale-up after this exchange. So if you think about it, if you're getting this data back, you could somehow integrate it back in with a runbook: okay, this might be the issue, why don't we try to remediate it with this automation, and let's see what happens. Maybe that could be our level one support, right? If that doesn't work, okay, now I need to put hands on the keyboard and figure out what's going on.

[Audience] Do you know if there's a company behind Robusta, or is it like a bunch of, you know, nerds who came up with the idea?

That's a good question too. There's a small company behind it; they have a corporate website. Like I said, the SaaS environment is free to use up to 20 nodes or whatever it is, and they have licensing for anything over 20 nodes, for businesses; they also have kind of an unlimited, all-you-can-eat type of plan. So there is a company behind it. It's a startup, or at least it looks like a startup. It's a pretty good idea; I like it. If they play their cards right and keep honing in on something like this, I think they're gonna be in pretty good shape.
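Here's that remediation sketch I mentioned. The scale-up itself is just a patch through the Kubernetes Python client (those calls are real); wiring it into a Robusta playbook trigger is left out, and the deployment name is hypothetical.

```python
# Hypothetical "load too high -> add a replica" remediation step.
# The kubernetes client calls are real; hooking this up to a Robusta
# playbook trigger is omitted here.
from kubernetes import client, config


def scale_up(name: str, namespace: str, add: int = 1) -> None:
    """Bump a deployment's replica count by `add`."""
    config.load_kube_config()  # use load_incluster_config() inside a pod
    apps = client.AppsV1Api()
    current = apps.read_namespaced_deployment_scale(name, namespace)
    apps.patch_namespaced_deployment_scale(
        name=name,
        namespace=namespace,
        body={"spec": {"replicas": current.spec.replicas + add}},
    )


# e.g. scale_up("my-app", "default") when the alert fires
```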
Thank you. Yeah, any other questions? Okay.

Right there, that's the GitHub repo I used to put this all together, so if you wanna pull it down, you can play with this as well. Just to be 100% transparent: I copied their code down and changed the prompt to put it in this repository. I think I'm actually gonna fork their repository, put the changes I made in there, and submit a PR, because comparing the results, I think I'm getting better results than what they were getting, so I wanna see what they think too.

Okay, so this is demo time. Demo time, hopefully it doesn't turn into demolition time. Let's go ahead and deploy it. I'm running from my laptop; I've got minikube right now. Give it a minute to come up. I probably should have done this before, but it is what it is. These are all very kindergarten-level shell scripts; they exist honestly just because I can't type. I've been more heads-down prepping than demoing, so I forgot to do that. If you clone the repository, you can use these same scripts, and they'll do the same thing on your laptop.

All right, let's go ahead and run it. As you can see right here, I'm deploying Prometheus with Robusta. I'm not gonna go through and actually generate the Helm values file, for the simple fact that it's really straightforward: when you look at the docs, it's literally one line. Run it, done, it spits all the stuff out; copy and paste the OpenAI integration stuff into it, and you're done. It'll take you two minutes to actually do it. One note: if you pull this repository down, you will have to do that part yourself, because I wasn't gonna put the generated values file in there; it has a bunch of security keys in it, and for obvious reasons I don't want people getting into my environment.

So as you can see, it's deployed Prometheus, the forwarder, Grafana for those local dashboards; kube-state-metrics is running; the Prometheus Node Exporter, so I can actually scrape my physical node, or however you define that, since it could obviously be a virtual node too; and then the runner. So everything's up and running.

And if we go over here, press this, go in here, we can see the alerting. If you look at the alerting rules, like I said, Robusta comes with a bunch of predefined rules out of the box. Really good rules. If you were to deploy this in your environment, say you have a homelab setup, or you're just testing something out, or you have a dev environment up, you can feel pretty confident this will catch almost everything you're gonna see in general.

If we go back here to home and say we want to look at the dashboards: there's not gonna be a lot of data in there because it just came up, but there are a lot of really good dashboards in here. We'll go back over here. So we can go in, and as you can see, there are a lot of really good predefined dashboards. You can look at things like the kubelet that's running, or the Alertmanager overview. There are really no alerts firing right now because nothing's broken yet, so you don't see anything happening, but you know it's there, and it'll also tell you when messages are sent out to places like Slack or Discord or Telegram, things like that.
And what's nice is that you can go in there and either change those dashboards to fit your needs or create new ones if you want to, which is what I would obviously do: create new ones.

Now, the SaaS environment. This is the Robusta SaaS environment, the free version of it, obviously. When you do that generated values file, it actually has all the connection information to hook you up to this, so you don't have to set up an account or anything; you just go in, fire it off, and you're good to go. Like I said, there are things like a timeline, so you can see what errors are happening in your environment and when they happened. It's a little bit simpler than going and trying to build out a dashboard inside of Grafana with PromQL and trying to get at the data and model it. You can see what jobs have been running inside the Kubernetes environment, and you can start drilling down inside of them. Of course, there's not gonna be a lot of stuff to look at because this just came up and it's only a dev environment, but you get an idea of what you're actually looking at. I can go in and actually see the YAML that was used to deploy a job, the CPU and memory utilization, alerts that are firing for it; you can see a whole bunch of different stuff inside this SaaS environment.

And what I really like about the SaaS environment is the drill-down. I like the ability to see a good overview and then drill down, because in an environment you can get too much data on the screen at one time; it gets very confusing and, honestly, you just get lost. It's like, what am I even looking at here? Obviously this is minikube, so I have one node; but if you had 50 nodes running in here, you could go in, see all your nodes, drill down on them, and see what's going on inside your nodes: what's running on them, the CPU and memory utilization for each node, any alerts related to that node, events on that node, a whole bunch of different data. And again, you can drill down into all of these different things. Comparing clusters: I obviously don't have multiple clusters running, so I can't compare them to find out which one's running better and which one's not. But that's the actual cluster I'm currently running right here, called PromBot; you can drill down and see different stuff inside of there.

So, any questions so far before I actually hit the OpenAI demo?

[Audience] You mentioned that it works with Prometheus. Is there also, like, a plugin for Loki, for logs, or is it strictly Prometheus?

It's strictly Prometheus that I know of right now. It'd be really cool if they had a Loki plugin; if it could see all that, you'd be able to relate metrics, Kubernetes, and logging together. That'd be really cool. All right, thank you.

What we're going to do now is actually deploy a broken pod. So let's go ahead and run the demo. Okay, it's broken. Let's see if the SaaS environment caught it. The timeline here has caught something; it's going to fire off the alert. Yeah, oh, there it is. Okay, pod crashing. Here we go. So you can see that you've got a pod crashing.
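For reference, the broken pod my script deploys is nothing fancy: a container that exits immediately, so Kubernetes keeps restarting it into CrashLoopBackOff. My script just applies a manifest; the sketch below does the equivalent through the Kubernetes Python client, with made-up names.

```python
# Equivalent of the demo's broken pod: a container that exits right
# away, so Kubernetes restarts it into CrashLoopBackOff. Names are
# made up; the demo script applies a YAML manifest instead.
from kubernetes import client, config

config.load_kube_config()

crashpod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="crashpod"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="crash",
                image="busybox",
                command=["sh", "-c", "echo 'about to crash'; exit 1"],
            )
        ],
        # restartPolicy defaults to Always, which is what produces the
        # CrashLoopBackOff the alert fires on.
    ),
)

client.CoreV1Api().create_namespaced_pod(namespace="default", body=crashpod)
```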
Like I said, if you wanted to, we could go in and actually investigate this: pull it up, go in here, start drilling down, and do it yourself through here. You could silence the alert if you wanted to, if it's something that's really not urgent, and not be bothered by it for a little while; you can also configure a silence right from Slack if you want. But instead, let's hit Ask ChatGPT and give it a second to come back. I hope I have a little bit of time here; I might be up against a rate limit.

So, as you can see, this prompted the OpenAI API. The LLM looked at the issue that was sent to it and sent back a description of the problem, kind of an example of it; some of the possible causes it thinks it might be; and some simple troubleshooting steps you can go through to help alleviate it. And this is a crashing pod, so obviously there's not a whole lot to it: look for the pod, do a describe, do a logs, find out what's going on, right? So it just sends back those steps. What would be really interesting is something like, oh, I have a node down: to see if it can start handling more and more complex issues. I haven't actually been able to try that, but it would be interesting to see whether it can handle those kinds of issues, and to be able to train the model on the other end. Because if you think about it, the more data you send out, the more training it's gonna get, and it's gonna start giving back better answers over time. And then it'll also give you a possible solution. This is what I was talking about with that level one support guy who's sitting there going, I don't know what I'm looking at here. This can help him figure it out, maybe put him in the right direction, give him that nudge to figure out what's actually broken and what's actually happening in that environment.

So with that, I know the question was asked: what data is actually being sent? I'll show you that very quickly here. If you go down here, these are the actual prompts that are being sent to OpenAI. There's no environment data really going through. It's more like me asking Chris a question: hey man, I'm hitting this problem, what can I do? So yeah, thank you, it's a good question to bring up; you obviously don't want the entire map of your environment getting sent out there.

I think I'm actually a little bit over time, so any other questions? If not, like I said, I'm very hard to find online, so just go ahead and reach out to me and I'll help you out with it sometime.