Good Friday morning, everyone, from Motor City. Lisa Martin here with John Furrier. This is theCUBE's third day of coverage of KubeCon + CloudNativeCon 2022 North America. John, we've had some amazing conversations over the last three days, including some good ones about observability. We're going to take that one step further and look beyond its three pillars.

Yeah, this is going to be a great segment. I'm looking forward to this. It's an in-depth conversation on observability, the guest is technical, and he's on the front lines with customers. It should be great.

Ian Smith is here, the field CTO at Chronosphere. Ian, welcome to theCUBE, great to have you.

Thank you so much, it's great to be here.

All right, talk about the traditional three pillars approach to observability. What are some of the challenges with it, and how does Chronosphere solve them?

Sure. So, as hopefully everyone knows, people think of the three pillars as logs, metrics, and traces. But what do you do with that? There's no action there. It's just data, right? You collect this data and put it somewhere, but that doesn't speak to any sort of outcomes, and I think that's really the heart of the issue: you're not achieving anything, you're just collecting a whole bunch of data. Where do you put it? What can you do with it? Those are the fundamental questions, and so one of the things we're focused on at Chronosphere is, well, what are those outcomes? What is the real value? For example, think about the phases of observability. When you have an incident, or you're trying to investigate something through observability, you first want to know what's going on, then you want to triage any problems you detect, and finally you want to understand the cause and be able to take longer-term steps to address it.

What do customers do when they start thinking about it? Because observability has that promise.
Hey, you know, get the data, throw AI at it, and that'll solve the problem. When do they get over their skis? When do they realize they're really not tackling it properly, versus the ones taking the right approach? What's the revelation? What's your take? You're on the front lines. What's going on with the customers, the good and the bad? What does the scene look like?

Yeah, so I think the bad is you end up buying a lot of things, or implementing them in open source, or self-building, and it's very disconnected. You don't have a workflow, you don't have a path to success. If you ask different teams how they address these particular problems, they're going to give you a bunch of different answers, and if you ask about their success rate, it's probably very uneven. Another key indicator of problems is: do you always need particular senior engineers in your incidents, or to help answer particular performance problems? That's a massive anti-pattern, right? Your senior engineers should probably be focused on innovation and competitive differentiation, but instead they become the bottleneck, and you have this massive wedge of maybe less experienced engineers, no less valuable from the overall company perspective, who aren't effective at addressing these problems because the tooling isn't right and the workflows are incorrect.

So the senior engineers are getting pulled in to fix and troubleshoot, or to interpret what the observability data did or didn't deliver?

Correct, yeah. And with the promise of observability, a lot of people talk about unknown unknowns, and there's a lot of crafting complex queries and all these other things. It's a very romantic sort of deep-dive approach, but realistically you need to make it very accessible.
If you're relying on complex query languages, plus the required knowledge of the architecture and everything every other team is doing, that knowledge is going to be super concentrated in just a couple of heads, and those heads shouldn't be woken up every time at 3 a.m. They shouldn't be on every incident call, but oftentimes they are the linchpin to addressing, oh, as a business we need to be up 99.99% of the time. So how do we accomplish that? Well, we're going to end up burning those people out, and also creating great dissatisfaction in the bulk of the engineers who are just trying to build and operate their services.

So you mentioned that some of the problems with the traditional three pillars are that it's not outcome-based and it leads to siloed approaches. What is Chronosphere's definition, and can you walk us through those three phases and how they really give you that competitive edge in the market?

Yeah, so the three phases are know, triage, and understand. You can relate them very specifically to capabilities, but it's not capabilities-first, not feature-function-first. For know, I need to be able to alert on things, so I do need to collect data that gives me those signals, particularly as the industry starts moving toward SLOs and you start getting more business-relevant data. Everyone knows about alert storms, and as you mentioned, there's this great white hope of AI and machine learning. But AI and machine learning mean putting your trust in a sort of black box, or, the more likely reality, it's really a statistical model and you have to go and spend very significant time programming it for not-great outcomes.
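The 99.99% figure Ian mentions translates into a concrete error budget. As a quick back-of-the-envelope sketch (the 30-day window is an assumption for illustration, not anything specific to Chronosphere):

```python
# Back-of-the-envelope: how much downtime does an availability target allow?
def allowed_downtime_minutes(target, days=30):
    total_minutes = days * 24 * 60
    return total_minutes * (1 - target)

for target in (0.999, 0.9999):
    print(f"{target:.2%} over 30 days -> {allowed_downtime_minutes(target):.1f} min")
```

At four nines, the budget over 30 days is only about four minutes, which is why waking the same two or three senior engineers for every incident does not scale.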
So for know: okay, I want to know that I have a problem, and maybe understand the symptoms of that particular problem. Then triage: okay, maybe I have a lot of things going wrong at the same time, but I need to be very precise about my resources. I need to be able to understand scope and importance. Maybe I have five major SLOs being violated right now. Which one has the greatest business impact? Which symptoms are impacting my most valuable customers? And from there, not getting into the situation, which is very common, where every customer-facing engineering team has to be on the call. We have 15 customer-facing web services, so they all have to be on that call. Triage is a really important aspect of mitigating the cost to the organization, because everyone goes, oh, well, I achieved my MTTR. My experience from a variety of vendors is that most organizations, unless you're essentially failing as a business, achieve their SLA, you know, three nines, four nines, whatever it is. But the cost of doing that becomes incredibly extreme.

This is a huge point. I want to dig into it if you don't mind, because we've all seen the total cost of ownership models and ROI, the cost of doing business, the shark fin, the iceberg, what's under the water, all those metaphors. When you look at what you're talking about here, there are actually real hardcore costs that might be under the water, so to speak, like labor and senior engineering time, because cloud native engineers are coding in the pipelines and there's a lot of impact. Can you quantify, or just share an example that illustrates, where the costs are? Because this is something that's not obvious in the hard costs. It's not just a dollar amount: time, resources, a breach, a wrong triage, a gap in the data. What are some of the costs?

Yeah, and I think they're actually far more important than the hard costs of infrastructure and licensing.
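The triage step Ian describes, ranking simultaneous SLO violations by business impact, can be sketched as a small prioritization. This is a hypothetical illustration, not Chronosphere's product logic; the services, burn rates, and weights are invented:

```python
# Hypothetical sketch: rank simultaneous SLO violations by business impact
# so the incident call only pulls in the teams that matter most.
violations = [
    {"service": "checkout",  "error_budget_burn": 4.0, "revenue_weight": 10},
    {"service": "search",    "error_budget_burn": 6.0, "revenue_weight": 3},
    {"service": "recommend", "error_budget_burn": 2.0, "revenue_weight": 1},
]

def impact(v):
    # Weight how fast the error budget is burning by how much
    # the service matters to the business.
    return v["error_budget_burn"] * v["revenue_weight"]

ranked = sorted(violations, key=impact, reverse=True)
page_first = ranked[0]["service"]
print(page_first)  # checkout: 4.0 * 10 = 40 beats search's 18
```

The point of the sketch is that the fastest-burning SLO (search) is not automatically the one to page on first; the business weighting changes the order.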
And of course, there are many organizations out there stitching open source observability components together, and they go, oh, it's free, no licensing cost. But think again about those outcomes. Okay, I have these 15 teams and X number of incidents a month. If I pull a representative from every single one of those teams onto the call, and it turns out, as we get down into the further phases and need to understand and remediate the issue, that actually only two teams are required, there are 13 individuals who did not need to be on the call. Okay, yes, I met my SLA and MTTR, but from a competitive standpoint, if I compare myself to a very similar organization that only needed to impact those two engineers versus my 15, who is going to be more competitive? Who's going to be more differentiated? And it's not just in terms of number of lines of code; it leads to burnout of your engineers and the churn that follows. For VPs of engineering, particularly in today's economy, the hardest thing to do is acquire engineers and retain them. So why burn them unnecessarily when you can say, okay, I can achieve the same or better result if I think more clearly about my observability, but reduce the number of people involved, reduce the number of senior engineers involved, and ultimately have those resources more focused on innovation?

You know, one thing that's come up a lot this year, more than I've ever seen before: we've heard about the skills gaps, obviously, but burnout is huge. It's coming up more and more. And this actually doesn't help the skills gap either. So you've got the skills gap, that's a cost potentially, and then you've got burnout, people just kind of sitting on their hands or walking away.
So one of the things we're doing at Chronosphere is, while we do deal with the pillar data, we're thinking more about what you can achieve with it, and aligning with know, triage, and understand. So you think about things like alerts and dashboards, being able to start triaging your symptoms, but really importantly, how do we bring in the capabilities of things like distributed tracing where they can actually have an impact? And it's not just in the context of what we can do in this one incident. There may be scenarios where you absolutely do need those power users, those really sophisticated engineers, but from a product-challenge perspective, what I'm personally really excited about is: how do you capture that insight and those capabilities and feed them back in from a product perspective so they're accessible? Everyone talks about unknown unknowns in observability, and everyone is a little dismissive of monitoring, but monitoring is the thing that democratizes access and decision-making capacity. I once worked at an organization where there were three engineers in the whole company who could generate the list of customers impacted by a particular incident. I was in post-sales at the time, so any time there was a major incident, you needed to generate that list, and those three engineers were on every single incident, until one of them got frustrated and built a tool. But he built it entirely on his own. From an observability perspective, can you build a thing that makes all of those kinds of capabilities accessible from the first moment? So you take that alert and you already know which customers are affected, or whatever other context was useful last time but took an hour or two to produce.
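The "which customers are affected?" capability Ian describes falls out of a simple query once trace spans carry a customer identifier. This is a generic, hypothetical sketch (the span fields and in-memory list are invented; a real system would query a tracing backend), not the tool from his story:

```python
# Hypothetical sketch: once trace spans are tagged with a customer ID,
# the affected-customer list becomes a one-line query instead of
# tribal knowledge held by a few senior engineers.
spans = [
    {"trace_id": "t1", "service": "checkout", "customer_id": "acme",    "error": True},
    {"trace_id": "t2", "service": "checkout", "customer_id": "globex",  "error": False},
    {"trace_id": "t3", "service": "search",   "customer_id": "initech", "error": True},
    {"trace_id": "t4", "service": "checkout", "customer_id": "acme",    "error": True},
]

def affected_customers(spans, service):
    # Distinct customers with at least one failed request to this service.
    return sorted({s["customer_id"] for s in spans
                   if s["service"] == service and s["error"]})

print(affected_customers(spans, "checkout"))  # ['acme']
```

The design point is the tagging convention, not the query: if the customer ID is attached at ingestion time, any engineer can answer the question, not just the three who know where to look.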
And that's what really makes a dramatic difference over time. It's not about the day-one experience, but how the product evolves with the requirements and the workplace.

And cloud native engineers are coding, so they can actually react to that, if there's a platform and a tool. Platform engineering is the hottest topic at this event, and this year, I would say, we're hearing a lot more of it with cloud native. I think that comes from the fact that SRE is not really SRE anymore; it's more platform engineering. Not every company has an SRE or an SRE environment, but platform engineering is becoming that new layer that enables the developer. This is what you're talking about.

Yeah, and I know there are lots of different labels for it, but organizations that really think about it well are thinking about those teams as developing for efficiency and developer productivity. Because again, it's about the outcomes. It's not, oh, we just need to keep the site reliable. Yes, you can do that, but as we talked about, there are many different ways you can burn unnecessary resources. If you focus on developer efficiency and productivity, there's retention, there's that competitive differentiation.

Let's up-level those business outcomes. Obviously, you talked about the three phases, know, triage, and understand, and you've got great alignment with the cloud native engineers, the end users. I imagine you're facilitating companies' ability to reduce churn, attract more talent, retain talent. But what are some of the other business outcomes? Talk about it in the context of the customer experience, the brand.

Yeah, one of the things not a lot of organizations think about is: what is the reliability of my observability solution? It's like, well, that's not what I'm focused on, I'm focused on the reliability of my own website. Okay, let's take the common open-source pattern.
I deploy my observability solution next to my core site infrastructure. Now I have a problem: when DNS stops working in the cloud provider of my choice, it's also affecting my observability solution.

And the toolchain and everything else.

Yeah. At the moment I need it most, to understand what's going on and to be able to know, triage, and understand, it fails me at the same time. So reliability has a very big impact. Making sure my observability solution is reliable, so that it's there when I need it most and I can protect the reliability of my own solution, my own SLA, is a really key aspect. One of the things we look at, though, is that it's not just about the outcomes and the value. It's ROI, right? What are you investing to get that? We've talked a little bit about the engineering cost. There's the infrastructure cost, but there's also a massive data explosion, particularly with cloud native.

Yes, all right, put that into a real-world example: a customer that you think really articulates the value of what Chronosphere is delivering and why you're different in the market.

Yeah, so DoorDash is a great customer example. They're here at KubeCon talking about their experience with Chronosphere and the cloud native technologies, Prometheus and the other components aligned with Chronosphere. They're a cloud native organization, but they underwent a transformation from StatsD to very heavy microservices, very heavy Kubernetes and orchestration, and did that through a massive explosion of growth, particularly during the last couple of years. Obviously that's had a very positive impact on their business, but they were able to do it in a cost-effective way, right?
One of the dirty little secrets about observability in particular is that your business growth might be, let's say, 50 or 60%, and your infrastructure spend with the cloud providers is maybe going to be another 10 or 15% on top of that. But then you have the intersection of: my engineers need more data to diagnose things, the business needs more data to understand what's going on, plus we've had this massive explosion of containers and everything like that. So oftentimes your observability data growth is going to be more than double your business growth. And in SaaS solutions, and even on-premises solutions, what's the main cost driver? It's the volume of data that you're processing and storing. So one of the key things we do at Chronosphere, because we're focused on organizational pain for larger-scale organizations, is ask: how do we extract the maximum value from the data you're generating without having to store all of it? That has benefits not just from a cost perspective, but also from a performance perspective, feeding into developer productivity and lowering the investment so that your return stands out more clearly and more valuably when you're assessing that TCO.

Better insights and outcomes drive developer productivity, for sure. That's also a top theme here at KubeCon this year. It always is, but more than ever now because of the velocity. My question for you, given that you're the field chief technology officer for Chronosphere, you have a unique position and great experience in the industry, and you've been involved in some really big, cutting-edge companies: what's the competitive landscape? Because customers are sometimes confused by all the pitches they're getting from other vendors. Some are bolting on observability, some have, I would say, a shim layer, or a horizontally scalable platform, or a platform-engineering approach. It's a data problem, okay? This is a data architecture challenge.
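The "extract the value without storing everything" idea Ian raises is commonly implemented as pre-aggregation: rolling raw, high-cardinality samples up into the series dashboards and alerts actually query. The sketch below is a generic illustration of that pattern, not Chronosphere's implementation; the metric and label names are invented:

```python
from collections import defaultdict

# Generic pre-aggregation sketch: raw samples carry a high-cardinality
# 'pod' label that dashboards never query, so we roll them up by
# (metric, service) before storage and keep far fewer series.
raw_samples = [
    {"metric": "http_requests_total", "service": "checkout", "pod": "checkout-7f9c", "value": 3},
    {"metric": "http_requests_total", "service": "checkout", "pod": "checkout-2b1d", "value": 5},
    {"metric": "http_requests_total", "service": "search",   "pod": "search-9a44",   "value": 2},
    {"metric": "http_requests_total", "service": "checkout", "pod": "checkout-7f9c", "value": 1},
]

def aggregate(samples):
    # Sum counter values across pods; the 'pod' label is dropped
    # so pod churn no longer multiplies the number of stored series.
    rolled = defaultdict(int)
    for s in samples:
        rolled[(s["metric"], s["service"])] += s["value"]
    return dict(rolled)

stored = aggregate(raw_samples)
print(stored)
print(f"{len(raw_samples)} raw points -> {len(stored)} stored series")
```

In Kubernetes environments this is where the "more than double" data growth comes from: every pod restart mints new series, while the questions being asked stay at the service level.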
You mentioned that many times. What's the difference between a pretender and a player in this space? What does the winning architecture look like? I won't say a phony or fake solution, but what should customers be aware of? Because in my opinion, if you have a gap in the data, or you configure a bolt-on wrong and, say, DNS crashes, you're dead in the water. What's the right approach from a customer standpoint? How do they squint through all the noise to figure out what's right?

Yeah, so I've worked with customers in a pre-sales capacity for a very long time, and I know all the tricks of guiding them through it. I think it needs to be very clear that customers should not be guided by the vendor. You don't talk to one vendor and let them decide, oh, I'm going to evaluate based on this. We particularly need to get away from feature-based evaluations. Features are very important, but they all have to be aligned around outcomes, and then you have to clearly understand: where am I today? What do I do today? And what is the transformation I'll have to go through to take advantage of these features? It can get very entrancing, oh, there's a list of 25 features this solution has that no one else has, but how am I going to get value out of that?

I mean, in distributed tracing, distributed is the key word. This is a system architecture; the holistic big picture comes in. How do they figure that out, knowing what they're transforming into and how it fits in? What's the right approach?

Too often I see distributed tracing in particular bought because, again, look at the shiny features, the promise, the MTTR expectations, all these other things, and then it sits off to the side. We go through the traditional usage of metrics, often very log-heavy approaches, maybe even some legacy APM, and then tracing is the last resort.
And of all the tools, I think distributed tracing suffers worst from the problem we talked about earlier, where the most sophisticated, longest-tenured engineers are the only ones who end up using it. So adoption is really, really poor. So again, what do we do today? Well, we alert. We probably want to understand our symptoms, but then what is the key problem? Oh, we spend a lot of time digging into where the problem exists in my architecture. We talked about getting every engineer in at the same time, but how do we reduce the number of engineers involved? How do we make it so that, well, this looks like a great day-one experience, but what is my day-30 experience like? Day 90? How does the product get more valuable? How do I get my most senior engineers out of this, not just on day one, but as we progress?

You've got to operationalize it. That's the key.

Yeah, correct.

Summarize this as we wrap here. When you're in customer conversations, what is the key factor behind Chronosphere's success? If you can boil it down to that key nugget, what is it?

I think the key nugget is that we're not just fixated on technical features and functions and, frankly, gimmicks, like, oh, what could you possibly do with these three pillars of data? It's more about what we can do to solve organizational pain at the high level, things like the cost of these solutions, but then also at the individual level: what exactly is an engineer trying to do, and how is their quality of life affected by this kind of tooling? It's something I'm very passionate about.

Sounds like it. Well, quality of life is important, right? For everybody, for the business, and ultimately it ends up affecting the overall customer experience. So, great job, Ian. Thank you so much for joining John and me to talk about what you guys are doing beyond the three pillars of observability at Chronosphere. We appreciate your insights.
Thank you so much.

All right. For John Furrier and our guest, I'm Lisa Martin. You're watching theCUBE, live Friday morning from KubeCon + CloudNativeCon 2022 in Detroit. Our next guest joins theCUBE momentarily, so stick around.