Hi, my name is Martin Plank; you can find me under this name pretty much everywhere. I'm currently head of DevOps at Threadmark, which is a small startup here in Brno, and many of the lessons learned that you'll see in this talk come from this company. Something about me: I hope some of you remember me from Red Hat, because I was formerly a software engineer working on oVirt and KubeVirt. So I have a really close relationship with infrastructure, with monitoring, and with performance. What I really like about moving away from developing the infrastructure to this thing they call DevOps or SRE or whatever it's called now, which usually means you have two full-time jobs, one developing and the other servicing the infra, is the move to the business side and to actual production and deployment. You're not building some piece of infrastructure in a silo, where you only think you know how it's going to perform; you're really trying to make sure you bring some value to your customer. I also maintain the monitoring stack at Threadmark, and I have a colleague here who is usually the one who points out my mistakes, so that's awesome. And I'm an automation and data geek: I need to have everything automated, and if something is not automated, it's never going to get used. The same goes for data: if something doesn't give me data, I don't like that thing and I won't use it at all. So why are we talking about Grafana? The thing is, I don't want this talk to be too Grafana-centric. Grafana is just a tool that displays data from your monitoring system. In this case I'm using Prometheus and Alertmanager, but there are more monitoring alternatives: there are the old-school Zabbix, Nagios and friends, and there's Datadog. These are really generic concepts of monitoring a real production workload: how it performs and what you need to look for in your monitoring system. But we have decided to go with Grafana, and why is that? First, you're using Grafana because you want to use Prometheus, because Prometheus is the good stuff, and Grafana just happens to complement it nicely. It also fits nicely into the cloud native landscape. Nowadays you're probably running Kubernetes or Docker or some other cool technology from this large cloud native landscape, and Prometheus fits there, and you need to display data from Prometheus, so Grafana fits there too. It's really beautiful: the graphs are beautiful. They may not be showing relevant information, but you can make some absolutely stunning graphs, and we'll get to that a bit later. And, yeah, it's super easy to mess up, because you can create many dashboards and monitor many metrics and items, and then you end up monitoring nothing. That's pretty bad. I would like to start this talk by talking about context, and what I mean by that is context in monitoring. If you have a piece of data, a metric: why are you monitoring it? Why does it matter? Why do you need to see it? You might be monitoring a performance metric because you need to deliver for your customer and you have some kind of crazy SLAs, or you may be monitoring availability because, again, you have SLAs. But there are also more use cases, like business intelligence and other things you just want to know about. So let me start by showing you a graph. I hope it's readable. Is it readable?
Okay. I have a backup picture, so if you can't really see this one, I can probably show you a high-res version. But this graph shows how many engineers are at lunch over a week. You can see there's a very small spike on Monday, it goes all the way up on Wednesday, and then it falls down. One thing I want to point out is that this is obviously a graph generated from fake data, because people at startups have to work on weekends, but that's just a small thing. The real question is: why would you graph this? It may be interesting, your manager may want to graph this, but let me ask you a question without giving you more context. If you think this graph shows an absolutely useless metric, raise your hand. That's not too many hands. So most of you think this is a useful graph. Cool. Because we can change the context a bit, and instead of showing this data in the context of a company that happens to be a tech company, we could be showing it in a restaurant. This restaurant is tracking how many people are at lunch, and that changes how useful the data is, because the restaurant needs to stock up on food, order in advance, and handle all the logistics of food. It becomes a critical business metric just through a small change of context. So something that might look irrelevant at first becomes relevant just by shifting it. In summary, what context means in monitoring is that there are some metrics you need to prioritize. The first level of prioritization is that you probably need to monitor the metrics that make your company function. If you're running a web application, you need to be serving it. If you're Google, you need your search working. If you're Amazon, you need Amazon working, probably. That's the most critical thing you need to know. The second part is the SLAs. I have started talking about SLAs and they're going to come up throughout this whole presentation, because I'm dealing with crazy SLAs. What does SLA mean? It's a service level agreement, and it's probably something written into a contract with your customer. Your customer may be a proper paying customer, but it may also be a different department in your company that you need to provide some service for and make sure it's available for certain periods of time. Next, two smaller items. One is optimizing for cost: for example, if you're running on AWS, you probably don't want to go bankrupt the next day, so setting up monitoring for all the things that AWS charges for might be a good idea. And the last one is extremely business specific, and that's the business intelligence part: you probably want to monitor metrics that help your business grow, scale, and go further. The one takeaway from the context part is that you should try to focus on the needs of your customer, whether it is a paying customer or just a different department or anything else: focus on the customer. I have a few examples, and I'm going to show some details and explain these charts. This is how an SLA-focused dashboard could look. What this is tracking is mostly performance SLAs: you have some kind of data source that shows you how your application is performing, in this case how many requests complete under a given time. Then you might have the return codes from your proxies, so you know you're not erroring out.
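To make that concrete, here is a minimal, hedged sketch of the kind of queries such an SLA dashboard might be built on. The metric names (an http_request_duration_seconds histogram and the HAProxy exporter's response counters) are assumptions about the stack, not the actual dashboard, so substitute whatever your exporters expose.

```python
# A minimal sketch of SLA-style queries; metric names are placeholders.
import requests

SLA_QUERIES = {
    # 95th percentile request latency over the last 5 minutes
    "latency_p95_seconds": (
        "histogram_quantile(0.95, "
        "sum(rate(http_request_duration_seconds_bucket[5m])) by (le))"
    ),
    # Share of 5xx responses per HAProxy backend - "are we erroring out?"
    "error_ratio": (
        'sum(rate(haproxy_backend_http_responses_total{code="5xx"}[5m])) by (backend)'
        " / sum(rate(haproxy_backend_http_responses_total[5m])) by (backend)"
    ),
}

def instant_query(prometheus_url: str, expr: str) -> list:
    """Run an instant query against the Prometheus HTTP API."""
    resp = requests.get(f"{prometheus_url}/api/v1/query", params={"query": expr})
    resp.raise_for_status()
    return resp.json()["data"]["result"]

if __name__ == "__main__":
    for name, expr in SLA_QUERIES.items():
        print(name, instant_query("http://localhost:9090", expr))
```

The same expressions can of course sit directly in Grafana panel targets; running them by hand is just a cheap way to sanity check the numbers before they go on a dashboard.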
And then there's this absolutely beautiful and, as we'll get to later, useless graph that shows exactly the same thing as the one above, just in a different way. This next one is almost the same dashboard, but slightly shifted: it mostly focuses on some kind of server queue length. You might have an SLA saying you need to display data to the customer within some specific time, so the customer can't be waiting two hours before the data is up to date, and you might be tracking this metric. Again, it depends heavily on the use case and the application. The last one is a slightly modified node exporter dashboard that shows metrics that are usually infrastructure related: your CPU usage, memory usage, and disk space. And this is an example of tracking your costs: if you're paying for your CPU time, you probably want to track your CPU time. So I've already started digging into this. The thing is, showing relevant data isn't always about showing nice data and nice graphs with nice colors; it's usually pretty ugly. You want to see these graphs that are nice and periodic, where you can see that your system is performing well. The problem is that kind of data doesn't show you when your system is on fire and dying and you need to save it, preferably before it dies. So I have a few examples of beautiful graphs. This graph is Redis command calls, and it's beautiful. I really like the colors and everything, and you can see how it peaks every day, which is really cool. But there's one hidden pitfall of Grafana: it is very good at hiding peaks that occurred within these time frames. There might be some spikes that are simply hidden because it's displaying too much data. But I really like the graph, it's nice. Then there's this one, which is HAProxy throughput. In our specific context this is the HAProxy where we see the customer pushing traffic to us and us returning traffic. It's not really application-centric, but it gives us an idea of whether the connection is working. Now, there's one kind of crazy pitfall with this graph: the whole Grafana and Prometheus setup is running on the side that this proxy connects to. So if this proxy went down, this monitoring would be useless, because we wouldn't even be able to access Grafana. Think about whether this is something you really want to track. And then there's the flame graph that I absolutely love, but it unfortunately shows almost no relevant information. You can see, hopefully, that there are some blank spaces where you can guess there was a maintenance window, for example, but other than that it doesn't show anything relevant. And then I have this super ugly graph that no one wants to see. It's ugly, it's spiky, and it seemingly shows nothing interesting, except there is a very well hidden spike here. I know it's probably not that visible, but if you look closely there's an obvious spike here; I even have an arrow pointing to it. And this ugly graph actually pointed out that our Redis was leaking keys like crazy, we were not deleting them, and that led to a crash. So it was this ugly graph, not the beautiful ones that are just nice and handy, that showed us an issue. Don't focus on the form of your graphs; really try to monitor relevant metrics, even in a not-so-nice way. So this is my first conflict with Grafana, where I think Grafana is terrible because it just happens to be too nice.
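A quick aside on that hidden-peaks pitfall from the Redis graph: when Grafana asks Prometheus for a wide time range, each plotted point covers a long interval and short spikes get averaged away. One hedged workaround, where the metric name is just a placeholder, is to plot both the averaged and the worst-case series in the same panel:

```python
# A sketch of surfacing peaks that wide time ranges would otherwise hide.
# redis_queue_length is a placeholder gauge; $__interval is Grafana's
# built-in variable for the current per-point interval.
PANEL_TARGETS = [
    # Smoothed view - what you normally see when zoomed out
    "avg_over_time(redis_queue_length[$__interval])",
    # Worst value inside each interval - short spikes stay visible here
    "max_over_time(redis_queue_length[$__interval])",
]
```

If the two lines diverge a lot, something spiky is hiding in the pretty picture. But back to Grafana being too nice.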
You can graph anything in Grafana, make it look nice, make it look important, and then show it to your managers, and they will probably like it. The problem is it doesn't have any value. You're just creating something that could be considered art, but you don't really want to be artistic with monitoring your infrastructure. So try not to fall into this pitfall: track the scary stuff, have ugly graphs, but make them show what matters. The next point is that once you have de-beautified your graphs and created ugly metrics, question them. It's really easy, in Grafana at least, to have wrong units, to break the laws of physics, or to track something that doesn't exist. I have a few examples of this. In this case, if you remember the HAProxy graph that was beautiful, this is how it started. I guess these numbers won't be really visible, but you can see there are 715 megabytes per second being pushed through this proxy. There's one issue with this graph, and that's the fact that it's a 100 megabit line. So in this case we're pushing more data than is physically possible through the line, and that's kind of wrong, right? You can hit this specific bug pretty easily if you use Prometheus with HAProxy and the HAProxy exporter and the proxy has multiple processes: you have to track each process separately, or this is what happens. So this is easily reproducible, and it can lead you to false assumptions about your infrastructure pretty quickly. Another one is disk IOPS, and again it's not too visible, but here the peak of the IOPS is 15K, and on the lower graph it's roughly 4K, with multiple devices shown. The thing is, this is the same system; the problem is there's a RAID on the system, and we happened to be double, or in this case roughly triple, counting the IOPS. So this is where questioning your data becomes relevant again. First, when you add a new metric, be it in Grafana or any other monitoring system, try to look at it from a really primitive standpoint and ask yourself: can this really happen? Have someone else ask: is this really the right metric, is this possible, isn't this a 100 megabit line pushing 150 megabits, which doesn't make sense? For the mistakes I have shown, really check for double counting or anything similar, display the labels in Grafana, and make sure the data is correct. Physics can be handy here: there are metrics that depend on the speed of light, and if your monitoring shows that you're breaking the speed of light, you're probably monitoring the wrong thing. So be careful about that; I'll sketch a tiny example of this kind of check after this section. No clutter, and this is pretty much where the talk name comes from. You might have many dashboards and many metrics you want to track, and it may start to look like this. And that is just crazy. It's intentionally made worse to prove the point, and you shouldn't be able to read anything there, except that there's a weird hole, but ignore it for now. The thing is, it's beautiful: there are these small speedometer gauges showing you things, and there are small graphs within them, but if I see this dashboard and I have five minutes before the next meeting, I have no idea what's going on with my system. The system could be breaking down, there's some small spike over there and I'm not sure if it's important. It's just showing too much data in no focused way, and I have no idea what's going on.
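Here is the promised sanity-check sketch: compare what monitoring reports against what is physically possible. The metric name and the 100 megabit link speed are assumptions, not the actual setup from the slide.

```python
# A hedged sanity check: flag throughput readings that exceed what the wire
# can physically carry, which usually means double counting (for example,
# summing the same HAProxy counter once per process).
import requests

LINK_CAPACITY_BYTES_PER_SEC = 100_000_000 / 8  # assumed 100 Mbit/s uplink

THROUGHPUT_EXPR = "sum(rate(haproxy_frontend_bytes_in_total[5m]))"

def instant_value(prom_url: str, expr: str) -> float:
    resp = requests.get(f"{prom_url}/api/v1/query", params={"query": expr})
    resp.raise_for_status()
    return float(resp.json()["data"]["result"][0]["value"][1])

reported = instant_value("http://localhost:9090", THROUGHPUT_EXPR)
if reported > LINK_CAPACITY_BYTES_PER_SEC:
    print(f"Impossible throughput: {reported:.0f} B/s on a 100 Mbit line - "
          "check the exporter for per-process double counting")
```

Anyway, back to the cluttered dashboards.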
Another example is a MySQL dashboard, where again there's a lot of data, it's not segmented by meaning, and it just looks crazy. This probably makes sense if you're operating something like a space launch, where you have a room full of people monitoring it and everyone has his or her assigned graph, but if you're monitoring a simpler system, I think that's too much data. So again, I think Grafana is terrible, and it's because you can start creating many dashboards and spam multiple graphs into them really easily. It's just so convenient: copy this dashboard, adjust something, leave it there, and that eventually turns into what we have seen before. So what can you do about that? One thing is to try to segment your monitoring into two parts. One of them is the overview part, for someone who probably doesn't have much time, where you just need a dashboard where you can see that your application is working and your customer is getting the data the customer needs. I call this the overview dashboard; it shouldn't have too many graphs, and it should focus on the important external metrics. Then it might make sense to have a few dashboards that are as detailed as these, but then think about having people who monitor them 24/7, the service or operations people who can actually care about these graphs. So one question is: can we fix this? And can I open the water? Yeah. The answer is: we probably can. I'm going to start with the previous MySQL dashboard. Nothing has changed, it's the same mess. We can just add more spacing and reorder the metrics, and it then becomes like this, which I think is way more readable. It's clustered, so your SQL locking is in one row with a graph that shows the general state of locking. Then you have other metrics grouped on the second row, so you have queries per second next to fsyncs per second. You just group this data together into rows, you make the charts bigger, and it becomes much more readable, at least to me, compared to the original. The amount of data is probably going to be the same; it's about presenting it in a way where someone who hasn't worked with your Grafana for two years can come in and see, okay, this little spike might be wrong. Whereas here, I have no idea: yeah, there's a spike here, but there isn't a spike there, so what's going on? It seems unrelated. Just moving dashboards around can lead to a much nicer and better experience with your monitoring. So these are some of the points: think about spacing, think about positioning, put data that correlates or is relevant together. And one Grafana-specific thing you need to take care of: if your dashboard grows too large, and you have many, many graphs in multiple rows, with the small folding rows and all the cool features, the dashboard becomes slow, crazy slow. If you try selecting a different timescale, you might be waiting a minute, and a minute in monitoring can mean the difference between your system crashing or not. That's really pronounced if you use the selector where you can compare multiple timeframes at once; it really kills your Grafana. So, I was talking about the split between overview dashboards and your regular deep-dive dashboards.
And the thing is, try to think about dashboard space as a limited resource. It's the same point as on the previous slide, I believe. Yeah, this is something important. The space at the top of your dashboard is like the supermarket shelf where they put the most relevant or most advertised items right at eye level. So think about the top of your dashboard as the most expensive space, where you really want to put the metrics that matter most to you, and as you go down, it should represent going deeper into less and less critical metrics. In overview dashboards, we need something where, when you open it, you can see: okay, my company is working, my company is running. Or, on the other hand, you can see that something is going terribly wrong and we need to fix it immediately. So this is an example of an overview dashboard, and I've circled a metric called "Are We Serving JS", because we're trying to serve JavaScript. We can see there's a spike going down, whereas every other metric is fine. The spike should be visible somewhere around here, but it's, I guess, invisible from where you're sitting. And this is what I want to see in the overview part: I know there was an issue with serving JavaScript. That's what's important for me. Then there might be someone else who needs to debug that and figure out, okay, we didn't serve JavaScript for a while, where's the issue? Then you start digging deeper into your node exporter, and you come to the conclusion, in this case, that we ran out of disk space. So in your node exporter dashboard you may have this kind of graph that shows really low-level, detailed metrics that aren't very relevant to your customer, but are relevant when you're debugging the issues you found in the overview dashboard. Now, don't try to track everything you have in your Prometheus in Grafana. You should probably have some kind of logging setup, you might have tracing set up, and all these other tools give you a different view of the system so you can debug and analyze it in a better way than by just graphing everything. One example of tracing is Jaeger, where you can see each request and how it performs, and you might not really need the big picture that your monitoring normally gives you. Logging is another example: you should probably feed your logs into some centralized logging system, which can pull out its own metrics, but it might not make sense to track too many logging-related items in your Grafana. Now, one struggle is that your company might have multiple deployments, multiple customers; it doesn't really matter, but your app is probably running in multiple instances. Imagine, for example, Amazon: there's only one Amazon, but they're trying to serve Europe and the USA and Asia and probably every other continent. In that case, you need something that gives you an overview of your company, of every deployment in your company, and you can do that in Grafana pretty nicely. There are roughly two approaches. One of them is this one. I can't really show it, and it would be nice if I could hover over it, but there's confidential data in there. We have multiple deployments, and each of them is tracked in this graph. It's just showing zero or one, whether that service is currently running, and if you hover over it you can see the customer name, so you can identify the deployment instantly. And we have the same thing for JavaScript.
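Under the hood, panels like those can be driven by queries along these lines. This is a hedged sketch; the job names and the deployment/customer labels are assumptions, since the real ones are confidential.

```python
# Placeholder queries for an "is every deployment alive?" overview panel.
# The label names (deployment, customer) and job names are assumptions.
OVERVIEW_TARGETS = {
    # 1 when every scraped instance of a deployment is up, 0 otherwise;
    # hovering the series shows which customer/deployment it belongs to.
    "service_alive": 'min(up{job="app"}) by (deployment, customer)',
    # Same idea for the JavaScript-serving check, via a blackbox probe
    "serving_js": 'min(probe_success{job="js-blackbox"}) by (deployment, customer)',
}
```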
And this dashboard simply tells me: okay, the company is fine and alive and it's probably going to survive the day, or it's crashing and we might have some kind of a problem. And currently this is the whole dashboard; I think there are no more metrics. The other metric is 500 responses, again from all deployments, and that's simple: if we see a spike in any of our deployments where we have started serving 500 errors, then again we might have a problem. Something may not be crashing yet, but we're not returning valid data. So this is one way you can capture your whole company, all your deployments, in a single dashboard. Now, this is great for an overview, but when you start digging deeper, you probably don't want to put every system in your company into a single graph; it would kill the Grafana and it wouldn't show you any relevant details. There are two features that can help with this struggle. One of them is Grafana templating, and I'm going to jump straight to that. This is how it looks. I would really love to show more deployments, but again, that's confidential. But imagine having a variable for each customer, call it deployment, and for each environment in that customer's setup, call it environment. In this way you can select a specific customer and a specific deployment running at that customer, and then use your more detailed dashboards to actually analyze what's going on. So again, you can see in the overview that something might be going wrong, then you start digging deeper: you choose the customer, you choose the environment, and then you can actually start debugging your metrics. I'll sketch the variables behind this in a moment. The other approach is repeating panels, and this is a pretty dangerous approach for multiple reasons. The data here is irrelevant; the point is that this is a single graph repeated for some kind of variable. It's similar to templating, but instead of a single dashboard displaying data from multiple sources one at a time, you have a single dashboard displaying one graph multiple times, just with different data. And this is dangerous because in Grafana the width of the dashboard is 12 units. It doesn't say 12 of what, it's just 12 units. That means you can only have a limited number of graphs in a line, and when you overflow that number, the panels start spanning multiple lines and it just looks bad. So this is great if you know that, for example, there are no more than two load balancers, or there will never be more than four load balancers. But at least in tech, saying "there will never be" is a terrible thing to do, and I wouldn't recommend it. That's why these repeating panels are not really a good idea. One thing that would really solve this is repeating rows: if you could repeat a row for each customer, that might be interesting, but as of Grafana 5.3, or whatever the current version is, that's not really supported. So repeating panels are pretty much the only way to track this much data on a single dashboard, and that's fine if you're tracking a genuinely limited resource. Now, another point: don't leave a mess in your Grafana. You might be tempted, for example after this talk, to create some beautiful graphs and visualize them in a different way, to stop tracking things as lines and create flame graphs and see how they turn out visually, maybe even do crazy things, try dashboards that don't make sense, try dashboards that are just weird.
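For the templating approach mentioned a moment ago, the usual pattern is two chained variables plus panel queries that reference them. Again, this is only a sketch and the label names are assumptions about how the metrics are labelled.

```python
# A sketch of two Grafana template variables and how panels use them.
# customer/environment labels are placeholders for whatever your metrics carry.
TEMPLATE_VARIABLES = {
    # Populated from Prometheus label values
    "customer": 'label_values(up, customer)',
    # Chained: only environments that exist for the selected customer
    "environment": 'label_values(up{customer="$customer"}, environment)',
}

# Example panel query in a detailed dashboard, scoped by both variables
PANEL_EXPR = (
    'rate(http_requests_total{customer="$customer", '
    'environment="$environment"}[5m])'
)
```

Chaining the environment variable to the selected customer keeps the dropdowns short, which matters once you have many deployments.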
Experiment as much as you like, it doesn't matter, but make sure you clean up your Grafana afterwards, so it doesn't look like this. You can see there are probably some relevant dashboards here, like "is our company running", but the test and debug dashboards, and the one whose name is just a string of a's, probably don't belong in your production Grafana, even if you do test there. Also try to name your dashboards so the name represents what they actually are. There's this running joke of a "disks are on fire" dashboard from when our disks were crashing, and I mean, it's a great internal joke, but if you come to that Grafana as an outsider, you probably have no idea what it means. Now, one thing I would like to mention is tags in Grafana. The relevant dashboards also have tags, and these tags usually point out what the dashboard is actually monitoring. You can see they can be combined; in this case it's the data source, so here we're taking logs and Redis and some kind of traffic as the sources for the data shown. That might be one way of tagging your dashboards. It's probably not the one right way, but try to think about some kind of structure for your tags, so you can always open your Grafana and know what's going on. And one thing I really like about Grafana is that if you're deploying it through Ansible, for example, you can do so-called provisioning. Provisioning is a process of bringing dashboards into a new deployment where, instead of going to the Grafana UI and clicking a dashboard together, you drop the dashboard JSON files straight into the Grafana provisioning directory, and then these dashboards can't be changed from the UI. So if you have some evil colleague, for example, who would enjoy changing the dashboards, this way you can lock them down and make sure they're not changed unless the change goes through a proper code review process. And we're at code review. The great thing about Grafana is that the dashboards are JSON files, so we can store them in Git, for example, and in Git, when someone makes a change to your monitoring, you can have a proper code review process. There's one pitfall, though: if you just take your Grafana JSONs, put them all in Git and then make a change, this is what happens, and it's a pretty crazy thing. I moved one panel below another panel and it produced 106 changes, because it generated new IDs and new coordinates for the whole dashboard. Reviewing that is just annoying and you don't want to deal with it. So there's one great tool that I recommend using, and the link is down here: it's called grafanalib, and it's a Python library where you define the dashboard as Python structures. You can use Python to hack around things that Grafana wouldn't normally support: you can use for loops, you can use templating, you can generate special templates. It gives you the power of Python in Grafana, which is absolutely cool. It also makes code reviews nicer: when you're reviewing a dashboard made that way, you can actually see, okay, this has been renamed, this has been moved down. It doesn't look that terrible. Actually, the main reason the raw JSON diffs look so terrible is the panel IDs and coordinates, and grafanalib has a function that generates all of those for you when you render the JSON. I have an example, which you probably can't see.
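Since the slide is hard to read, here is a minimal sketch in the spirit of the grafanalib README; the dashboard title, data source name, and query are placeholders, not the actual example from the slide.

```python
# A minimal grafanalib dashboard - most values are defaulted by the library,
# and auto_panel_ids() fills in the panel IDs that otherwise pollute diffs.
from grafanalib.core import Dashboard, Graph, Row, Target

dashboard = Dashboard(
    title="Example overview",            # placeholder title
    rows=[
        Row(panels=[
            Graph(
                title="Requests per second",
                dataSource="Prometheus",  # must match your data source name
                targets=[
                    Target(
                        expr="sum(rate(http_requests_total[5m]))",
                        legendFormat="req/s",
                        refId="A",
                    ),
                ],
            ),
        ]),
    ],
).auto_panel_ids()

# Render to JSON with grafanalib's CLI, e.g.:
#   generate-dashboard -o example.json example.dashboard.py
```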
The idea is that it looks like the JSON you would normally use to define your Grafana dashboard, but it uses Python structures that default most of the values, so you don't really have to see them and you only see the parameters of the dashboard you care about. So that's pretty cool. It's probably not visible here, but have a look at the GitHub link and you can see examples there. There's an issue, though, because again, Grafana is terrible in that you can add dashboards too easily. There's a unique dashboard ID on the Grafana website: you copy that ID, plug it into your Grafana, and it automatically downloads the dashboard, bootstraps it, and it just starts working, which is really cool, but when you're version controlling your dashboards, it doesn't scale for this use case. You're adding random graphs; it's pretty much "cool graph, pipe, Grafana". So how do we solve this once you start version controlling your dashboards? I have a few ideas listed here. One of them is re-implementing these dashboards in grafanalib. For me, that's a terrible idea: it's just duplicating what has already been done, and I don't want to re-implement the thing. But it's an idea, and it works. The other approach is to just consider them rendered: download the JSON, put it into Git, but don't ever review it; treat it as an external dashboard and just leave it alone there. It will probably rot away, so that's pretty bad. And if anyone has an idea how to store and track these dashboards, feel free to say it at the end of the talk. But these are the two approaches I've seen: the first one really doesn't work for me, and having them as external rendered dashboards works kind of nicely but isn't an ideal solution. Anyway, when you migrate to grafanalib, the diffs become way more manageable. I moved one panel here, and I'm not showing the code because it's too small for the projector, but you can see that I've brought 106 changes down to 11 changes. And you could see that it really just took one block and moved it below, so that's really good. Now, the next part: I've been talking about monitoring your memory, monitoring your response time, monitoring your availability and your proxies, but now I'm trying to monitor my application. A really important point I'm trying to make is that you can monitor your application by observing the infrastructure the application is running on. So even if your application isn't instrumented and doesn't have a Prometheus exporter ready, you can probably gather many metrics just by observing the infrastructure: the database, Redis if you're using it, or any other queue, and that's pretty good. And Grafana has this really nice feature where you can connect directly to a SQL data source. I know it's probably not too cool to talk about, because nowadays we all have document DBs where you throw your data in, you forget about it, and your data eventually gets lost; that's how modern databases work. But if you have a bigger application, there probably is SQL somewhere down the stack, and this is where the SQL data sources become handy. So this is how it looks, and this is the relevant part: when you deploy your own Grafana, you just specify the name of the database, its type, and the host, like a regular connection. And do read the note, not shown here, if you're doing this, because it tells you to create a specific user just for Grafana, so that someone cannot kill your production DB this way.
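The query itself might look roughly like this. The table and column names are made up for illustration, while $__timeFilter and $__timeGroup are Grafana's own macros for the SQL data sources.

```python
# A hedged example of the kind of query the MySQL data source runs.
# Table/column names (events, created_at) are placeholders.
TIME_SERIES_SQL = """
SELECT
  $__timeGroup(created_at, '5m') AS time,   -- bucket rows into 5-minute points
  COUNT(*)                       AS value,  -- events per bucket
  'events'                       AS metric  -- series name shown in the legend
FROM events
WHERE $__timeFilter(created_at)             -- restrict to the dashboard's time range
GROUP BY 1
ORDER BY 1
"""
```

The COUNT plus GROUP BY is exactly the part that can get heavy on a big table, which is what the next point is about.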
And then you just do a SQL query and you get free application monitoring based on your database, which is pretty cool. There are several problems with this, though. Again, it's probably not visible, but to generate this data you probably need to use GROUP BY, and doing a GROUP BY on a huge table is not exactly an efficient operation, so this query might be pretty heavy. Then there's a comparison like this: it's some kind of application score, the details aren't relevant, but there's a counting query grouped by time, and if you need to compare that against a single value, you need to select that value too, so you might end up doing SELECT * from the table. Again, if you have a huge database, this can become quite tricky. And really be careful using this data source, because not everyone has separate replicas for every department. I don't think there's a problem with pointing this at your production database, as long as you have the user permissions solved, but do consider the performance implications: it could happen that someone opens this dashboard and it kills your production database, and that's not what you want from your monitoring. Now, the last part I want to talk about is alerting, and I hope this is my last "Grafana is terrible" point, because alerting in Grafana looks really nice. I have to show this: again, this is a graph from our production Grafana, and there's this broken heart icon that shows you the alert is critical, and there's this line that shows you the cutoff for your alerting. It's really beautiful, but the problem is it's not sophisticated enough if you're monitoring something critical. You can see, and I have to stand this way, sorry, that I've set a cutoff line at a constant, and then there might be weekend fluctuations, monthly fluctuations, and even daily fluctuations, where alerting on a constant number just doesn't scale. Either it doesn't alert on anything, or it alerts every other day except the one you set it up for, or it just spams random alerts, and no one likes random alerts. The other problem is that with this approach you're alerting on what is already broken, and by then it's too late. You want your monitoring to catch problems before they actually break your production environment. That's why I think you shouldn't rely on Grafana's own alerting, and should rather use something real, like Prometheus and Alertmanager. The most important point, I hope, is that you should alert on symptoms. Don't alert only when your whole application is down: there have probably been thousands of alerts that should have fired before that happened. Try to monitor your symptoms: monitor your memory usage, look at which graphs spiked when something broke, and set up some kind of alerting there. One of the approaches is anomaly detection. You can use the simple approach where you take the average over some time window plus its standard deviation multiplied by two, and that pretty much shows when something has spiked. This doesn't really work when your workload fluctuates too much; in that case, you need something more sophisticated, like buckets for a specific day or a specific month, and you need lots of data to alert on that. But it's a good beginning, and it can sometimes predict something useful. The second thing is that if you're using Prometheus, for example, you can use linear prediction, where you take the rate at which something is growing and predict how it will extend into the future.
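As a sketch of those two approaches, here are the kinds of expressions involved; the metric names, windows, and thresholds are placeholders to adapt to your own data.

```python
# Hedged examples of symptom-style alerting expressions for Prometheus.
# Metric names, windows and thresholds are placeholders.
ALERT_EXPRESSIONS = {
    # Simple anomaly detection: current value is more than two standard
    # deviations above the one-hour average.
    "queue_length_anomaly": (
        "redis_queue_length > "
        "(avg_over_time(redis_queue_length[1h]) "
        "+ 2 * stddev_over_time(redis_queue_length[1h]))"
    ),
    # Linear prediction: based on the last 6 hours, the filesystem would be
    # full within 4 hours - alert before the disk actually fills up.
    "disk_full_in_4h": (
        "predict_linear(node_filesystem_avail_bytes[6h], 4 * 3600) < 0"
    ),
}
```

In practice these expressions would live in Prometheus alerting rules and be routed through Alertmanager, rather than in Grafana's built-in alerts.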
So this is really how monitoring can scale from issue monitoring to symptom monitoring. And there are, of course, recording rules and much more in Prometheus. This is the graph you have seen before, where we eventually ran out of disk space. In Grafana, this is how it would look if we alerted on a fixed threshold: the systems that are always above that threshold would be firing nonstop, whereas systems that stayed under it would never fire at all. But in our Prometheus, using linear prediction and a constant, we could see that there's a lead time: this one was at 14:17, this one at 14:04, and the crash occurred at roughly 16 p.m., not p.m., sorry, too much America today, at 16:00. The point is, we could see when Prometheus considered this alert to be critical and when Alertmanager actually started firing alerts. So this is why more sophisticated alerting might make sense. So, some summary. I want to leave time for questions, so the summary is everything you've heard before. Maybe the most important point is to try to show relevant metrics: make sure your metrics don't break physics, that they're real and you're not pushing more data than is physically possible, and try to focus on what the business and the customer need instead of what your developers want to see on nice dashboards. Adding a new metric is super cheap: you just create a graph for it, commit it into Git, and put it somewhere on the dashboard, and that's great. The problem is that if you start monitoring something you don't care about, and it takes up the space of something more important, there are implications: you could be crashing, you could be breaking SLAs. So really think about each metric you add to your dashboards. And that's it. Thank you for listening and staying, and if you have some questions, feel free to throw them at me. Yeah. So you've explained how to conserve space on the dashboard, but then you also have dashboards which show just binary data, true or false. Is a line graph really the right tool to display that kind of metric if there are just two values? Okay, should I repeat the question too? Yeah, so the question is about the dashboards showing binary data, for example "are we serving the API or not", and whether a line graph is the right tool for that. My answer is that it's probably not the right tool, or maybe not the right chart, really; we could have a table, we could have anything else. On the other hand, this is really good if you need to track your uptime or availability over time, because then you can query when this value was zero and when it was one, and you can compute your availability, like 99-point-whatever percent. So I think it's pretty good to display this kind of information this way. But again, if you had more important metrics, I wouldn't put it at the top. If you really just need to know whether you're running, a table or a board that just says OK or not OK would be enough. So, any other questions? All right, thank you.