Hey everyone, how are you today? Okay, so we're here to talk about something new. I don't know if you've ever heard about it. We're going to talk about AI, and to make it more than just more of the same, we invited our friends: we invited Floppy and friends, and we're going to go on this adventure and do this talk with them. So let's get started.

Nice to meet you all. I'm Niveen Gilson. I'm an AWS Community Hero and an AWS user group leader. Up until three and a half months ago I was the head of DevOps at Milio, and nowadays I'm a cloud consultant. I also have the nerdiest tattoo: a sudo tattoo on my finger. And I love everyone. Yeah, clap your hands. That's perfect.

Hello, everyone. I am Guy. I'm a solution architect at Commodore, I'm leading the Platformers community for platform engineering, and I'm also a CNCF ambassador. I'm really excited to be here and to share, together with Niv, our thoughts about this. And I'm really into observability.

So what are we going to do today? We're going to do a quick intro to AI, to make sure that everyone in the room knows what AI is and how we can use it; I guess some of you already have experience with it. Then we're going to talk about why it matters to observability, and how AI is actually going to change the way we observe and run observability, with real-life examples, by the way.

Cool. So let's start with a quick introduction to AI. I guess that some of you, or maybe all of you, have tried ChatGPT or one of the other chat assistants. What that essentially is, is a large language model, or LLM: a model, or program, that takes a text input. You prompt whatever you want, you ask it to do anything, and based on the data it was pre-trained on, it gives you back a text output. As simple as that. But it does something complex in between: basically, it tries to predict what the best result for you is.
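The "predict the best result based on pre-training data" idea can be illustrated with a toy bigram model. A real LLM is incomparably more sophisticated, but the mechanic has the same shape: count what follows what in the training text, then emit the most likely continuation. Everything here, including the tiny corpus, is made up for illustration:

```python
from collections import Counter, defaultdict

# Tiny "pre-training" corpus. In a real LLM this would be a huge
# slice of the internet; here it is a few made-up status lines.
corpus = (
    "the server is down the server is up the server is down "
    "the disk is full"
).split()

# Count which word follows which: a bigram model, the simplest
# possible "predict the next token from what you have seen" machine.
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def predict_next(word):
    """Return the most likely next word, based only on the corpus."""
    candidates = following.get(word)
    return candidates.most_common(1)[0][0] if candidates else None

print(predict_next("is"))  # "down" follows "is" most often in this corpus
```

Scale the counts up to billions of parameters learned over trillions of tokens, and you get the rough intuition behind an LLM's text output.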
So if you ask something, the model tries, based on all the data it was pre-trained on, to evaluate your question and then send you back what should be a good result.

The main use cases are usually summarization (you search for something and get back a summary drawn from sources across the internet), prediction, and generation: you ask it to write a poem, or even a book. All of that, large language models can do, and plenty of them are available. OpenAI's models are, I think, the most common today, but there are a lot of others; Mistral is one of the most common for on-prem environments, and you can use whatever you want.

But there is one big problem with them. We definitely want to connect our observability stack to large language models: we want to fetch our data and use the power of AI on it. The main problem is that, unfortunately, LLMs are too general for observability. They weren't trained on observability tasks at all; they were trained for the general tasks we just mentioned: summarization, generalization, and predicting what you'll be happy to get as a result. So, Niv, what do you think we can do to solve that?

Yeah, so we do have a possible solution: we can do fine-tuning.
What we're going to do is take a general model that can solve a problem, an LLM for example, and then train it further with our own data, the data of our organization. The answers we receive will then be much more customized to our needs.

Okay, but then we still have a few problems. The main issue is that an LLM cannot query live data by itself. That creates an issue: every week, or month, or whatever period we set, we need to fine-tune again with the new data we've collected, because the model isn't synchronized in real time.

But we can solve that too. There is a solution that lets us go to the prompt, ask a question, and get a result that is actually backed by live data, and it's called RAG: retrieval augmented generation. What it does, basically, is that before producing the text output, the model is connected to the predefined data sources we set up, not only so it can do more observability-oriented tasks, but also so it can query them. So, for example, if you ask, "What is the average memory over the last week?", it can go into Prometheus, create a PromQL query for that data, run it, and give you the result back. And that's super interesting, because it's used a lot, especially with new applications using AI, since each company can define its own use case, keep the model general, and connect the LLM to its own databases.

So how do you think the observability world will look if we add a little bit of AI to it?

Wow, which part of the observability world? Because there are so many steps. Now that we've gone through all the basics, we're all speaking at the same level and everybody knows what those buzzwords mean, so we're going to walk through each step of observability, and together we'll try to imagine how this world will look when we add the magic of AI to it. Maybe that will be a year from today, maybe five years, maybe a decade. My guess is that it's going to happen sooner rather than later, because all of us here are builders, and the timeline is really up to us.

So the first step is continuous preparation, and I bet all of you have done it: deciding which logs to collect from which server, which metrics are needed, what the retention period of everything is. On top of that we need to build the dashboards and set up the alarms, and it all takes a lot of time. Each time we add a new service to production, we need to go back and decide all those things all over again, for every single service. But maybe, when we add the magic of AI to the continuous preparation step, the model will create the dashboards for us. And that's not so futuristic, because even today we can generate YAML using AI, and it makes things so much easier.
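The average-memory example described above can be sketched as a minimal RAG loop: a question becomes a PromQL query, the query runs against Prometheus, and the raw result comes back for the model to phrase. Both functions below are stubs; the metric name, the window, and the returned value are assumptions standing in for a real LLM call and a real request to Prometheus's `/api/v1/query` endpoint:

```python
def question_to_promql(question: str) -> str:
    """Stand-in for an LLM call that emits PromQL for a question."""
    # A real implementation would prompt the model with the question
    # plus a description of which metrics actually exist.
    if "average memory" in question:
        return "avg_over_time(node_memory_Active_bytes[7d])"
    raise ValueError("no query known for this question")

def run_promql(query: str) -> float:
    """Stand-in for GET /api/v1/query against a Prometheus server,
    e.g. requests.get(f"{PROM_URL}/api/v1/query", params={"query": query})."""
    return 2.1e9  # fake result standing in for the live answer, in bytes

question = "what is the average memory over the last week?"
promql = question_to_promql(question)
answer = run_promql(promql)
print(promql, "->", answer)
```

The point of the retrieval step is exactly what the talk describes: the model no longer depends on stale fine-tuning data, because the numbers come from the live system at question time.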
It can create complex queries for us. We won't need so much knowledge or experience to create a good dashboard, or to build good visibility, good observability. And the fun part: you know how the DevOps or operations person is always the bottleneck? Well, no more, because with AI even a junior developer, or a product person, or anyone, can get whatever they need. They can create queries and dashboards and build everything by themselves, and that's amazing.

Like, how many of you in the last week went to one of your dashboards and did your weekly, daily, or monthly review? I see a few hands. So once in a while we go to the dashboard and try to assess: first of all, does all the data we're looking at actually look good? Is the data we review in this dashboard actually relevant? Do we have any constraint we need to solve? All of these are questions we ask ourselves once in a while, and that's kind of nice, but it takes a lot of resources. Imagine how many dashboards there are; walking through them can be really painful.

So what we can do is have an AI assistant that goes through our dashboards and refines them for us. It will check the results, review the data, and gather the insights for you, so you don't need to do it on your own. For example, if something is about to explode, you'd know beforehand, and that's amazing. More than that: we do basic calculations on our own, but AI tools come packed with advanced algorithms and advanced correlation abilities, which makes them far more capable than what we have today.

And there is one more thing, about ownership. We still need to be the owners of our dashboards, and we will still need to review them once in a while, but we can significantly reduce how long that takes and how frequently we do it, and we can make AI our own observability assistant. That can be amazing. So what do you think about alerting?

Okay, so I think that nowadays alerting has many challenges, and two of the biggest, for me at least, are these. The first is that we need to configure each alert separately. Even if you do it as code, you still need to define the threshold for each service, what is critical for production, and what isn't. The other one, which I think is even bigger, is this: we probably all have that channel with critical production alerts, the one that wakes you up at night. And at some point you had some alerts that weren't really false positives; they were positives once. But as the system changes and scales, those thresholds stay the same while the needs and the criticality change, and then you just get alert fatigue, because you have so many alerts in a channel that used to be just for the most critical ones. It's like the boy who cried wolf, right?
You don't know whether you really need to wake up in the middle of the night to take care of it or not, so it creates a big challenge.

With the magic of AI we can create alerts using an LLM, for a start, and we can get smarter thresholds. Nowadays thresholds are static, but AI can look across the entire system and make correlations that are harder for us as human beings to make, and it can create dynamic thresholds that change with the state of the entire system, or the day, maybe whether there's a holiday, which is really cool. We can also get complexities and dependencies handled out of the box; we don't need to configure them ourselves, one by one, which saves time on configuration and management.

How about investigation? Do you like investigating downtimes?

I don't like investigating at all, and I don't think anyone does. You get an alert in the middle of the night, and then you need to start investigating: find the right dashboard, figure out the right metrics, maybe do some correlation on your own. That's bad. And sometimes you don't even have an incident; you just have a query you want to run, something you're asking yourself about your infrastructure or your software. That's where we can leverage LLMs for investigation. Once we have connected them to our databases, they can create a dashboard for us when we have an incident. You don't need to define it in advance; you can say, okay, this is my incident, those are the relevant components, please bring me the most relevant dashboards. And since it can run PromQL queries, where today you'd get a single query, you would get the full dashboard. That can be a game changer when we go into an investigation.

Something we see a lot is the challenge of extracting value from logs.
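A minimal sketch of what the dynamic thresholds mentioned above could mean in practice: instead of a fixed number, derive the alert line from a rolling window of recent values. The latencies below are made up, and a real system would use far richer signals, but the contrast with a static threshold is the point:

```python
import statistics

def dynamic_threshold(history, sigmas=3.0):
    """Alert line derived from recent behaviour instead of a fixed number."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    return mean + sigmas * stdev

# Made-up request latencies (ms) over the last window.
quiet_day = [100, 102, 98, 101, 99, 100, 103, 97]
busy_day = [300, 310, 290, 305, 295, 300, 315, 285]

# The same "is this abnormal?" question gets a different answer
# depending on what the system has been doing lately: the alert
# line sits just above each window's own baseline.
print(dynamic_threshold(quiet_day))
print(dynamic_threshold(busy_day))
```

A static threshold tuned on the quiet day would page constantly on the busy one; here the line adapts, which is exactly the alert-fatigue problem described above.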
So when we talk about logging, sending the logs to our system is the easy part, but filtering them can be really hard, especially for people who are not the observability experts on the team. They don't really know how to filter events and get the juice out of the logs. So we can use AI to go into the logs, find all the relevant data, and summarize the logs for us, and that can be a real game changer.

One of the main things we talked about is the current state of the environment, and we want to keep that in mind when we investigate. One of the questions we ask ourselves is: should we just fix the problem, or should we keep investigating to find the root cause at a much deeper level, so we know that the first action we take is actually going to fix it? That question comes up in every part of an incident. What LLMs allow us to do is get a snapshot of the whole environment and all of its information, especially when our servers are constantly being recycled, and that can be a true game changer: we can query what we want, whenever we want, in whatever simple way we want. The whole investigation world is going to be turned upside down by LLMs.

What do you think about predictions?

That's my favorite one, actually. Who here does on-call?
Oh, yes, thank you. And if you don't, then maybe you're CTOs or something, because everybody I know does on-call duty. It means we're not so good at doing predictions by ourselves, so there's a lot of room for improvement. With AI we can basically make ourselves a sort of Megazord, a combination with a data engineer; and if you are a data engineer, then you can make yourself a Megazord with an operations engineer. Then you have all the skills needed to make correlations across the entire system, make the predictions more precise, and of course sleep better at night.

And after the downtime, as if it weren't bad enough, you need to write the post-mortem.

Yeah, exactly. No one likes to write post-mortems. You need to gather the information and go through it: the metrics, the observability tools, the incident itself, maybe Slack or whatever chat you're using. Gathering all these pieces of information takes a very long time. And after we gather all of that, we need to analyze it. We need to think: okay, which decision that we took a year ago made the impact we see today? What do we need to do in the future to improve? So we need to analyze; we need to be the big thinkers of what a good post-mortem should be.

And that's interesting, because there are so many post-mortems written all over the world based on the same kinds of data, and every company and team would write a completely different post-mortem from it. But what if, first of all, we didn't have to gather the information ourselves, and something else gathered it for us? That would be very impactful for the data gathering: you'd have everything in a single place. And from that point you'd get automatic analysis: something would take the smartest brain in the world, bring it together with your data in one place, and write the best post-mortem ever written for that case. Maybe it would even be pre-trained on similar post-mortems and how they were resolved. That would change how much you actually benefit from post-mortems.

And the last thing: a post-mortem doesn't end with writing it. It's not just words in a text document. We need to follow up on the action items and deliver them. And then we actually go back to the start: when we have action items, we go back to continuous preparation, and so on and so on.

Exactly, exactly. We want, first of all, to implement the action items, which is really hard, and second of all, we need to follow up on them. I mean, I guess all of you, and you're free to raise your hand if this isn't your case, are implementing all of your action items from post-mortems? Some people are laughing; I can see you understand what I mean. So it's really hard to follow up and to make sure we close all the action items, and we can use AI to track us and follow up on us.

So, Niv, what do you think: will AI replace us?

Well, that's a tough question, because yeah, maybe we'll be out of work soon. No, but seriously, something that we have and AI doesn't is ownership. We need to take ownership, and we need to be responsible for everything we make. So even when we add all these things we've theoretically suggested for every step, to make our life easier, AI still cannot take accountability. As long as you are the owner and you take accountability for everything that happens in each step, wherever you work, I think we're still needed. I think we'll still have food, even without deploying infrastructure as code.

That's it for today. Thank you everyone for being with us, and if you have any questions, feel free to jump up and ask us here. Thank you.