All right, well, welcome everybody. Thanks for showing up at nine for Prom Day today. My name's Martin Mao. I'm the CEO and co-founder of Chronosphere. We provide a cloud-native observability solution and we get to work with awesome companies like DoorDash and Robinhood.
Hi, my name is Yash Kumaraswamy. I'm the tech lead for observability at Robinhood.
Awesome, and we're here today to talk about whether it's time to raise your metrics game. So what do we mean by that? Well, one of the ways we can value a lot of things, like our metrics system, is return on investment. It's a really simple calculation: how much return or value are we getting from a particular system, like our metrics system, versus how much investment are we putting in? And one of the interesting trends we're seeing when we talk to companies in the industry is that the return or value they're getting out of their metrics and observability system is actually becoming lower over time, while their investment is increasing, especially as the data grows. So you can imagine this makes the ROI equation far worse over time, and that's really not great. So that's what we're here to talk to you about today, along with some techniques for reversing that and improving the ROI.
So let's talk about the return component first. When we talk to companies in the industry, these are some of the ways that people think about how to value a metrics system: things like MTTD, how long it takes you to detect an issue, MTTR and so on. We did a study on this, and about 68% of companies are actually getting worse outcomes, which means they're getting worse MTTDs and worse MTTRs over time, which is an interesting fact for sure. So, Yash, how do you think about this at Robinhood, and how do you think about the return of this particular system?
Absolutely. I think as our metrics footprint increased, and as we went through the happy growth phase of the company, we realized that availability, MTTR and MTTD are very closely correlated. So we had to spend a significant amount of money to keep our availability high, and that's what prompted us to consider overhauling our metrics infrastructure in January of this year. That directly resulted in a drop in mean time to detect and mean time to recover, which also increases developer productivity, because developers are able to root-cause issues more accurately and in a more timely fashion as well.
That's interesting. So you found that availability was actually closely correlated with MTTD and MTTR, and with root cause analysis as well.
That is absolutely right.
Nice, awesome. So that's a quick snippet on the return side. What about the investment side and how much these things cost? We're all adopting cloud native, we're all containerizing everything, we're all moving to microservices, and generally that ends up producing a ton more metric data than ever before. You can imagine high cardinality is a hot topic on everyone's mind, and generally more data equals more cost. Whether you run the system yourself and need more infrastructure for it, or you pay a vendor for it, more data generally correlates with higher costs. However, that's not the only thing when it comes to cost. There are a lot of hidden costs as well. So perhaps, Yash, you can talk to us about how you thought about the cost of these systems at Robinhood.
That's right. At Robinhood we benchmark our costs on a continuous basis. The benchmarks are data-driven and empirical, and that allows us to make decisions based on data. What we found was that the total cost of ownership definitely increases over time with the metrics footprint, and the other cost, beyond the infrastructure and operating costs Martin just mentioned, is developer productivity.
How do we keep our developers productive and feeling confident about the underlying infrastructure, so that they can move faster and be more confident when it comes to product launches and so on?
That's interesting. So you found a pretty high correlation with developer productivity, as well as with the cost of downtime, as your data grows.
Exactly. So keeping your downtime as minimal as possible, as a core infrastructure team, a tier-zero infrastructure team, is very critical.
Nice. Cool. We talked about this equation and how the return is getting worse and the investment is rising. So what can we do about it? We've listed a few ways here that you can approach this. I'll go through one to begin with. Even very simple things, like changing the collection interval, or your scrape interval in Prometheus, are a great way to improve your return. You can imagine that if you have a CI/CD use case and you want to roll something back, setting the collection interval to five or 10 seconds allows you to detect an issue much faster than if you set it to 60 seconds, for example. So that's a very simple one. Alternatively, if you have a long-term capacity planning use case, setting the scrape interval to a minute, 10 minutes or an hour is a great way to save on costs as well. Yash, what are some other ways that we can look at improving the ROI here?
Yeah, so we decided to invest significantly in the solid foundation aspect, which brings me to downsampling and retention of data. We wanted to increase data retention from two weeks, which is what our previous infrastructure supported, to 13 months with Chronosphere, and this also ties in very closely with query performance.
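The scrape interval trade-off described above comes down to simple arithmetic: a shorter interval yields more samples to ingest, while a longer one yields fewer. The following is a minimal sketch of that arithmetic, not a sizing tool. The function names and the figure of 100,000 active series are hypothetical and chosen only for illustration, and the sketch assumes every series is scraped exactly once per interval.

```python
SECONDS_PER_DAY = 86_400


def samples_per_day(active_series: int, scrape_interval_s: int) -> int:
    """Samples ingested per day if every series is scraped once per interval."""
    if scrape_interval_s <= 0:
        raise ValueError("scrape interval must be positive")
    # Whole scrapes that fit in one day, multiplied by the number of series.
    return active_series * (SECONDS_PER_DAY // scrape_interval_s)


def compare_intervals(active_series: int, intervals_s: list[int]) -> dict[int, int]:
    """Map each candidate scrape interval (seconds) to its daily sample count."""
    return {i: samples_per_day(active_series, i) for i in intervals_s}


if __name__ == "__main__":
    # Hypothetical workload: 100,000 active series (an assumption, not from the talk).
    for interval, samples in compare_intervals(100_000, [10, 60, 600]).items():
        print(f"{interval:>4}s interval -> {samples:,} samples/day")
```

Under these assumptions, going from a 60-second to a 10-second interval multiplies sample volume by six, which is the cost side of detecting a bad rollout sooner. Going from 60 seconds to 10 minutes divides it by ten, which is the saving available to a capacity planning use case.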
We want to be able to effectively query large periods of data in a more consistent and predictable fashion. Another investment that we've been pushing for is developer productivity, in terms of out-of-the-box monitoring and observability for our teams. So any microservice that gets born at Robinhood comes with a RED set of metrics, a RED set of alerts and comprehensive coverage of its infrastructure pieces and dependencies.
Awesome. It's only a five-minute keynote, so there are plenty of other ways to improve return on investment that we didn't have time to cover today, but if you want to continue the conversation, come find us at the Chronosphere booth, G15. Chronosphere also created a video game called Olly Legends just for this conference, so come and check it out and play the game. I think there are some prizes to be won every day there.
And actually, there's one other announcement to make this morning. This news just dropped about 20 minutes ago: here at Chronosphere, we worked with Julius Volz, one of the co-creators of Prometheus, on open sourcing PromLens, which is a standalone application that serves as a visual query builder for PromQL as well as a query analyzer, so you can look at what the cardinality of your particular queries is, as well as which parts of your queries are efficient or inefficient. It's open sourced right now under, I believe, the Apache 2.0 license, at that URL. It's a standalone application you can run with Prometheus today, and eventually it is going to be upstreamed to core Prometheus as well. So we're very grateful to work with Julius on making that available to the whole community here. Thank you.
Thank you for your time.
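The "red" metrics mentioned above for new microservices are read here as the common RED convention (Rate, Errors, Duration); that reading is an assumption, since the talk does not spell it out. The sketch below shows what those three signals summarize for one window of requests. The `Request` type, the `red_summary` function, and the choice to count only 5xx statuses as errors are hypothetical illustrations, not Robinhood's or Chronosphere's implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    duration_s: float  # how long the request took, in seconds
    status: int        # HTTP status code returned


def red_summary(requests: list[Request], window_s: float) -> dict[str, Optional[float]]:
    """Summarize Rate, Errors and Duration for requests seen in one time window."""
    total = len(requests)
    # Assumption: only 5xx responses count as errors.
    errors = sum(1 for r in requests if r.status >= 500)
    durations = sorted(r.duration_s for r in requests)
    p95: Optional[float] = None
    if durations:
        # Nearest-rank 95th percentile, using integer math to avoid float rounding.
        rank = max(1, (95 * total + 99) // 100)
        p95 = durations[rank - 1]
    return {
        "rate_per_s": total / window_s,                      # Rate
        "error_ratio": errors / total if total else 0.0,     # Errors
        "p95_duration_s": p95,                               # Duration
    }


if __name__ == "__main__":
    sample = [Request(0.1, 200)] * 18 + [Request(0.5, 500), Request(2.0, 200)]
    print(red_summary(sample, window_s=10.0))
```

In a real Prometheus setup these values would typically be derived from counters and histograms with PromQL rather than from raw request records; the sketch only illustrates what each of the three signals measures.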