So I'm going to give you a quick talk on how Datadog views monitoring and how we believe monitoring should be done. First, to introduce myself: I'm Greg, an open source software engineer at Datadog, and I work on our agent integrations team. Datadog is a SaaS platform that processes trillions of data points per day.

So first off, what is DevOps? DevOps is about culture. It's about automation, metrics, and sharing. If you're raising a barn, it doesn't take much modern infrastructure or modern technology. But if you approach it collaboratively, from the culture side, many hands can get the job done much more quickly when everyone is working together. And once you identify the bottlenecks within your culture, you can start working as a well-oiled machine. As many of you know, people are really our bottleneck. You have to start with culture; that's the big thing.

So culture, automation, and sharing are all great, but what about metrics? If you have no observability, all you can do is wait for your users to tell you that something has gone wrong. I know big organizations that actually operate like this. I have friends who tell me they have to wait for someone to tweet at them before they find out that something has gone horribly wrong and their platform is down. Collecting data is very cheap, and keeping a lot of data is very cheap; not having it when you need it is very expensive. So you should instrument everything. You saw this when British Airways went down hugely: they didn't know what was going on, they were using whiteboards to figure everything out, and it took them a long time to restore service because they didn't have observability into their platform.

So what are the qualities of good metrics? A metric must be well understood. You can't be mixing, say, imperial and metric units, as happened with the Mars Climate Orbiter, which caused the spacecraft to be lost as it reached Mars.

It must also have sufficient granularity. If your granularity is too coarse, all of these peaks look the same, so you need finer granularity to determine what's going on. Here you can see a one-second peak, a five-minute peak, and a one-minute peak, and they look very different once you have the granularity to resolve them. So how granular is your granularity? Azure only gives you one minute. AWS only gives you one minute. Google Stackdriver only gives you one minute. We offer 20-second granularity, kept for up to 15 months at full granularity, so you can really store that data.

It must also be taggable and filterable, so you can slice it by region, by batch job, or by batch deployment, and cut the data up along each of those dimensions. And if you can define a query over those tags and filters, you can monitor on them too.

It must also be long-lived. You need to know what happened last week or last month in order to identify trends, and when you have enough metrics for a long enough period of time, you can identify those trends.
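To make that tagging point concrete, here's a minimal sketch of what submitting a tagged custom metric might look like, assuming the `datadog` Python client sending through a locally running Agent's DogStatsD listener; the metric name and tag values are made up for illustration.

```python
# Minimal sketch: submit a tagged custom metric via DogStatsD.
# Assumes the `datadog` Python package and a local Datadog Agent
# listening on the default DogStatsD port. The metric name and
# tags below are hypothetical.
from datadog import initialize, statsd

initialize(statsd_host="127.0.0.1", statsd_port=8125)

# Tag the metric so it can later be sliced by region, batch job,
# or deployment, and so monitors can be scoped to those same tags.
statsd.gauge(
    "donut_shop.donuts.sold",
    42,
    tags=["region:us-east-1", "batch_job:glazing", "deployment:blue"],
)
```

Because the metric carries `region`, `batch_job`, and `deployment` tags, it can later be filtered, graphed, and monitored along any of those dimensions.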
You can see that here; I'm not sure if any of you recognize the pattern that's going on. This is a weekly pattern: you can see dips on the weekends, but on every weekday except that one Tuesday, an increasing number of requests. So what happened on that Tuesday? Was it an outage? Was it a holiday? You need to be able to figure that out, and to do that you need to keep the data long enough to analyze those trends. The shaded area you're seeing there is machine learning working out how the data fits together and the patterns within it.

So how long is data kept by the big cloud providers? Pretty long, but not very long at sufficient granularity, which is why a lot of people turn to Datadog. We keep it at full granularity for 15 months.

The type of metric is important as well, and there are a few different types: work metrics, resource metrics, and events. Work metrics are things like throughput, success, errors, and performance. If you're running a donut shop, it's the number of donuts you're selling; if you're serving requests, it's the number of requests you're serving, the number of errors, the throughput. Then there are resource metrics, things like utilization; to turn to the donut analogy again, it's the amount of sugar you're using in the donuts, and you need to know how much sugar that is. And finally there are events, things like National Donut Day, when an enormous number of extra donuts are going to be sold. You need to know when those things happen; discrete events are important. We like to overlay our graphs with events to work out what happened, and you look at these different types of metrics and events and recurse until you find the technical root cause.

We want to put metrics to work so we can figure out exactly what's going on. We figure out what to page on, so you know who to alert when different things happen, and you want those alerts to be actionable. You don't want them to be cryptic and nonsensical; you want actionable alerts that tell you what's going on and what you need to do.

So: we want metrics that are well understood, taggable and filterable, long-lived, and of sufficient granularity. We want to alert on the appropriate metrics and make your alerts actionable. My time is done, but if anyone has any questions, I'm happy to answer them.
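To leave you with something concrete, here's what recording a discrete event and defining an actionable alert might look like, as a minimal sketch assuming the `datadog` Python API client with valid API and application keys; the event title, metric name, monitor query, and the @-handle in the message are all hypothetical.

```python
# Minimal sketch: record a discrete event and create an actionable monitor.
# Assumes the `datadog` Python package and valid API/app keys; all names,
# queries, and handles below are illustrative, not real.
from datadog import initialize, api

initialize(api_key="<YOUR_API_KEY>", app_key="<YOUR_APP_KEY>")

# A discrete event (e.g. National Donut Day) that can be overlaid on graphs.
api.Event.create(
    title="National Donut Day promotion started",
    text="Expect a large spike in orders today.",
    tags=["team:donuts", "kind:promotion"],
)

# A monitor scoped by tags, with a message that says what happened and what
# to do about it, rather than being cryptic.
api.Monitor.create(
    type="metric alert",
    query="avg(last_5m):avg:donut_shop.order.error_rate{deployment:blue} > 0.05",
    name="Order error rate is elevated on the blue deployment",
    message=(
        "Error rate on the blue deployment has been above 5% for 5 minutes. "
        "Check the latest deploy and roll back if needed. @pagerduty-donuts"
    ),
    tags=["team:donuts"],
)
```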