From theCUBE Studios in Palo Alto and in Boston, connecting with thought leaders all around the world, this is a CUBE Conversation.

Welcome to this special CUBE Conversation here in the Palo Alto Studios. I'm John Furrier, host of theCUBE, for this Leading with Observability series: Under the Hood with Splunk Observability. I'm joined by Arijit Mukherji of Splunk. He's a distinguished engineer. Great to have you on. These are my favorite talks. Under the Hood means we're going to get all the details: what's powering observability? Thanks for coming on.

It's my pleasure, John. It's always nice to talk to you.

Leading with Observability is this series. We're going to take a deep-dive look across the spectrum of the product and the problems it's solving. But Under the Hood is a challenge, because people are really looking at coming out of COVID with a growth strategy, looking at cloud native. Kubernetes is starting to see microservices really be a big part of that, in real deployments, at real scale. This has been a theme that's been growing. We've been covering it. But now architectural decisions start to emerge. Could you share your thoughts on this? Because this becomes a big conversation. Do you buy a tool here? How do you think that through? What's the approach?

Exactly, John. So it's very exciting times in some sense with observability right now. As you've mentioned and discussed a few times, there's a bunch of trends happening in the industry which are causing a renewed interest in observability, and also an appreciation of its importance. And observability now as a topic is a huge umbrella topic. It covers many, many different things: APM, your infrastructure monitoring, your logging, your real user monitoring, your digital experience management, and so on. So there's quite a set of things that all fall under observability. And so the challenge right now, as you mentioned, is: how do we look at this holistically?
Because at this point there are so many different parts to this edifice, to this building, that having a non-integrated strategy, where you just maybe go buy or build individual pieces, I don't think is going to get you very far, given the complexity of what we're dealing with. And frankly, that's one of the big challenges that we as architects within Splunk are scratching our heads over: how do we build all of this in a more coherent fashion?

You know, one of the things, Arijit, I want to get your thoughts on, because I've been seeing this trend and we've been talking about it on theCUBE a lot, is systems thinking. If you look at the distributed computing wave, just go back 20 years and look at the history of how we got here, a lot of those same concepts are happening again with the cloud, but not as simply. You're seeing a lot more network, I won't say network management, but observability is essentially instrumentation of the traffic, looking at all the data to catch things like breaches and cybersecurity issues, and also making systems run effectively. But it's distributed computing at the end of it. So there's a lot of science that's been there, and now new science emerging around how you do all this. What are your thoughts on this? Because this becomes a key part of the architectural choices some companies have to make if they want to be in position to take advantage of cloud native growth, which has multi-fold benefits, and your product people talk about them: faster time to market, all that good stuff. But these technical decisions matter. Can you explain?

Yes, it absolutely does. I think the main thing I would recommend everybody do is understand why observability: what do you want to get out of it?
So it is not just a set of parts, as I mentioned earlier; it brings direct product benefits, as we mentioned: faster mean time to resolution, understanding what's going on in your environment, having maybe fewer outages, and at the same time understanding your cost. There are so many different benefits. So the point is not that one has the ability to do APM or the ability to do infrastructure monitoring. The main question is, aspirationally, what are my goals that are aligned to what my business wants? What do I want to achieve? Do I want to innovate faster? In that case, how is observability going to help me? And this is how you need to define your strategy, in terms of what kind of tools you get and how they work together, right?

And so if you look at what we are doing at Splunk, you'll notice it's extremely exciting right now. There's a lot of acquisitions happening. We have a lot of products that we're building. And the question we're asking as architects is: what should we build that will help us achieve all of this and at the same time be somewhat future-proof? And I think any organization that's either investing in observability, building it, or buying it would probably want to think along those lines. What are my foundational principles? What are the basic qualities I want out of this system? Because technologies and infrastructures will keep on changing; that's the rule of nature right now. The question is, how do we best address that with a more future-proof system? And so at Splunk we've come up with a few guiding principles, and I'm sure others will have done the same.

You know, one of the dynamics I want to get your reaction to is kind of two perspectives. One is the growth of more teams that are involved in the work, right? So whether it's from cyber to monitoring.
So there's more teams with tools out there that are working on the network. And then you have just the impact of the diversity of use cases, not so much data volume, because that's been talked about; we're having a tsunami of data, that's clear. But different kinds of dynamics, whether it's real time or bursting. And so when you have this kind of environment, you can have gaps. And SolarWinds, if it's taught us anything, it's that you have to identify problems and resolve them. This comes up a lot in observability conversations: MTTI, mean time to identify, and then mean time to resolve. These are concepts. If you don't see the data, you can't understand what's going on. If you can't measure it. This is like huge.

Yes, absolutely right. Absolutely right. So what we really need now is, as you mentioned, an integrated tool set. What we mean by that is the tools must be able to work together, and the data must be able to be used across the board. So my use cases should not be siloed or fragmented; they should work as one system that users are able to learn and then use effectively without context switching. Another concept that I think is quite important is: how flexible are you? Are you digging yourself into a fixed solution, or are you depending on open standards that will then let you change out implementations or vendors or what have you down the line relatively easily? So understanding how you're collecting the data, and what kind of open standards and open source you're using, is important. But to your point about missing data and gaps, I think full fidelity, understanding every single transaction if you can pull it off, is a fascinating superpower, because that's where you don't get the gaps. And if you are able to go back and track any bad transaction anytime, that is hugely liberating, right?
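As a rough back-of-the-envelope illustration of why sampled tracing leaves the gaps Arijit describes (this is a generic probability sketch, not how any particular vendor implements sampling): with head-based sampling at rate p, each trace is kept independently with probability p, so a one-off bad transaction is usually lost.

```python
def capture_probability(sample_rate: float, n_bad: int = 1) -> float:
    """Chance that at least one of n_bad bad transactions was sampled,
    assuming each trace is kept independently with probability sample_rate."""
    return 1.0 - (1.0 - sample_rate) ** n_bad

# At a typical 1% head-based sampling rate, a single bad transaction is
# captured only ~1% of the time; even ten occurrences of the same bug
# are all missed roughly 90% of the time.
print(capture_probability(0.01))      # ~0.01
print(capture_probability(0.01, 10))  # ~0.096
print(capture_probability(1.0))       # full fidelity: always captured
```

With full fidelity (sample_rate of 1.0) the capture probability is always 1, which is why "no sample" means you can go back and inspect any bad transaction after the fact.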
Because without that, if you're going to do a lot of sampling, you're going to miss a huge percentage of the user interactions, and that's probably a recipe for some kind of trouble down the line, as you mentioned. And actually, that is one of the principles we are using to build the Splunk Observability suite: no-sample, or full fidelity, is a core foundational principle. And for us, it's not just isolated to, let's say, application performance management, where a user hits your API and you're able to track what happened. We're actually taking this upstream, up to the user. The user is taking actions in the browser; how do we capture and correlate what's happening in the browser? Because nowadays, as you know, there's a huge move towards single-page applications, where half of the logic my users are exercising is actually running in the browser, right? And so understanding the whole thing end to end, without any gaps, without any sampling, is extremely powerful. So yes, those are some of the things we are investing in, and things I think one should keep in mind when considering observability.

You know, we were talking the other day and having a debate around technical debt and how that applies to observability. And one of the things you brought up was really about tools and tool sprawl, that it causes problems. You have operational friction, and we've heard people say, yeah, I've got too many tools and just too much to replatform or refactor; it's just too much of a pain in the butt for me to do that. So at some point they break: I've taken on too much technical debt. When is that point of no return where someone feels the pain of tool sprawl? What are some of the signals that say you'd better move now, or else it's too late? Because this integrated platform seems to be the way people go, as you mentioned. But this tool sprawl is a big problem.

It is.
And I think it starts hitting you relatively early on nowadays, if you ask my opinion. So tool sprawl: if you find yourself using three or four different tools which are all part of some critical workflow or other, that alone suggests something could be optimized. For example, let's say I'm observing whether my website works fine. If my alerting tool is different from my data-gathering or infrastructure monitoring metrics tool, which is different from my incident management tool, which is different from my logs tool, then put on the hat of the poor engineer who's dealing with the crisis: the number of times they have to context switch, and the amount of friction and delay that adds to the process, is very, very painful. So my thinking is that at some point, especially if we find that core critical workflows are being fragmented and adding a bunch of friction, it's probably not good to let that keep going, and it would be time to address the problem. And frankly, having these tools integrated actually brings a benefit far bigger than the sum of the parts, because think about it: if I'm looking at, say, an incident, and I'm able to get cross-tool data all presented on one screen, one UI, that is hugely powerful, because it gives me all the information I need without having to dig into five different tools, and it allows me to make quicker, faster decisions. So I think this is an almost inevitable wave that everybody must and will adopt. And I think it's important to get on the right program early, because once you've built a lot of practices within an organization, it becomes very, very hard to change later; it's just going to be more costly down the line.
So from an architecture standpoint, under the hood, the integrated platform takes that tool sprawl problem away; it helps there. You have open source technology, so there's no lock-in. You mentioned full fidelity, not just sampling: full end-to-end tracing, which is critical if one wants to avoid those gaps. And then the other thing I want to get your thoughts on, which you didn't bring up yet but which people are talking about, is real-time streaming of analytics. What role does that play? Is that part of the architecture? And what function does it serve?

Right. So to me, it's a question of how quickly do I find a problem. If you think about it, we are moving to more and more software services. Everybody's a software service now, and we're all talking to each other as different services. Now, anytime you use a dependency, you want to know how available it is, what my SLAs and SLOs are, and so on. And three nines is almost a given; you must provide three nines or better, ideally four nines, of availability, because your overall system's availability is going to be lower than that of any single part. And if you're going to offer four nines, you have about four or five minutes of total downtime budget in one whole month. That's a hard thing to control. And if your alerting is on the order of five or ten minutes, there's no chance you're going to be able to promise the kind of high availability that you need to. And so the fundamental question is: you need to find problems quickly, fast, within seconds, ideally. Now, streaming is one way to do it, but that really is the problem definition: how do I find problems early enough that I can give my automation or my engineers time to figure out what happened and take corrective action? Because if I can't even know that something is amiss, then there's no chance I'm going to be able to provide the availability that my solution needs.
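The four-nines arithmetic Arijit is doing works out like this (a simple worked calculation over a 30-day month; error-budget windows vary by SLO definition):

```python
def monthly_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of allowed downtime per month implied by an availability SLO."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day month
    return total_minutes * (1.0 - availability)

print(monthly_downtime_minutes(0.999))   # three nines: ~43.2 minutes/month
print(monthly_downtime_minutes(0.9999))  # four nines: ~4.3 minutes/month
```

At four nines, the whole monthly budget is about four and a third minutes, which is why alerting that takes five or ten minutes just to fire has already consumed the budget before anyone can respond.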
So in that context, real time is very important, much more important now, because of all these software-and-service dependencies, than it maybe used to be in the past. And that's why, again, at Splunk we invested in real-time streaming analytics. The idea being, that's the problem: how can we address it? How can we provide customers with quick, high-level, important alerts in seconds? And real-time streaming is probably the best way to achieve that. And then if I were to, yeah, sorry, go ahead.

No, go ahead, finish.

Yeah, I was going to say that it's one thing to get an alert, but the question then is, now what do I do with it? There's obviously a lot of alert noise going out, and people are fatigued: I have all these alerts, I have this complex environment. Understanding what to do, which is the reducing-the-MTTR part of it, is also important. I think environments are so complex now that without a little bit of help from the tool, you are not going to be very effective; it's going to take you longer. And this is another reason why integrated tools are better: they can provide you hints by looking at all the data, not just one type, not just logs or just traces. They have access to the whole dataset, and they can give you far better hints. And that, again, is one of the foundational principles, because this is the emergent field of AIOps, where the idea is to bring the power of data science, the power of machine learning, to aid the operator in figuring out where a problem might be, so they can at least take corrective action faster; not necessarily fix it, but at least bypass the problem or take some kind of corrective action.
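To make the streaming-alert idea concrete, here is a minimal sketch, not Splunk's engine: the window size and threshold are invented for illustration. The point is that the check runs per data point as values arrive, so an alert can fire seconds after a spike rather than waiting for a 5-10 minute batch evaluation cycle.

```python
from collections import deque

def streaming_alerts(values, window=5, threshold=200.0):
    """Yield (index, rolling_mean) as soon as the mean of the last
    `window` data points crosses `threshold`. Evaluation happens on
    every incoming point, which is what makes it 'streaming'."""
    buf = deque(maxlen=window)
    for i, v in enumerate(values):
        buf.append(v)
        if len(buf) == window and sum(buf) / window > threshold:
            yield (i, sum(buf) / window)

# Hypothetical latency samples arriving once a second: the alert fires on
# the first window whose mean exceeds the threshold, right after the spike.
latencies = [100, 110, 105, 98, 102, 400, 420, 410, 415, 405]
first_alert = next(streaming_alerts(latencies))
print(first_alert)  # (6, 225.0)
```

A production system would add smarter detectors and alert deduplication on top, which is where the AIOps hints Arijit describes come in, but the per-datapoint evaluation loop is the core difference from batch alerting.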
And that's a theme that goes across our suite of tools. The question we ask ourselves in every situation is: what information could we have provided? What kind of hints could we have given to short-circuit their resolution process?

It's funny you mention a suite of tools. You have an Observability suite, which Splunk leads with as part of this series. A suite of tools is kind of like, you kind of don't want to say it, but it is kind of what's being discussed: a platform and tools working together. And I think the trend seems to be, it used to be in the old days you were a platform player or a tool player; you really couldn't do both. But now, with cloud native, as it's distributed computing, with all this importance around observability, you've got to start thinking of a suite as kind of like platform features. Could you react to that? How would you talk about it? What does it mean to be a platform? Because platforms have benefits, tools have benefits, and working together implies it's a combination. Could you share your thoughts and reaction to that?

That's a very interesting question you asked, John. If you ask me how I look at the solution set that we have, I'd explain it like this: we are a platform, we are a set of products and tools, and we are an enterprise solution. And let me explain what I mean by that, because I think all of these matter to somebody or other. As a platform, it's: how good am I at dealing with data? Ingesting data, analyzing data, alerting you. Those are the core foundational features that everybody has; these are the database-centric aspects of it. And if you look at a lot of organizations with matured practices, they are looking for a platform, maybe one that scales better than what they have. They're looking for a platform; they know what to do, and they'll build out on top of that.
But at the same time, a platform is not a product. 99% of our users are not going to make database calls to fetch and query data. They want an end-to-end thing they can use: monitor my Kubernetes, monitor my Elasticsearch, monitor whatever other solution I may have. So we build a bunch of products on top of the platform, which provide the usability, right? It's very easy to get on, send the data, and have built-in content, dashboards, alerts, what have you, so that my day-to-day work is fast, because I'm not an observability engineer; I'm a software engineer working on something, and I want to use observability. Make it easy for me, right? So that's the product aspect of it. But then if you look at organizations at a bit of scale, just the product is also not good enough. Now we're looking at an observability solution deployed in an enterprise, with many, many products, many, many teams, many, many users. How can one be effective there? If you look at what's important at that level, it's not the database aspect or the platform aspect; it's how well can I manage it. Do I have visibility into what I'm sending? What my bill is? Can I guard against incorrect usage? Do I have permissions to control who can mess with my cheese, and so on? So there's a layer of what we call enterprise capabilities that are important in an organizational setting. So I think in order to build something successful in this space, we have to think at all three levels, right? And all of these are important, because in the end it's about how much value I'm getting out of it, not just what's theoretically possible but what's really happening. All of these are important in that context.

And I think, Arijit, that's an amazing masterclass right there, a soundbite right there. And I think it's because the data also is important.
If you're going to be busting down data silos, you need a horizontally scalable data observability platform. You have to have access to the data. So I think the trend will be more integrated, clearly, and more versatile from a platform perspective. It has to be.

Absolutely, absolutely.

Well, we're certainly going to bring you back in our conversations when we have our events, and in our groups around the digital transformation Under the Hood series that we're going to do. Great voice, great commentary. Arijit, thank you for sharing that knowledge with us; appreciate it.

My pleasure, thank you very much.

Okay, I'm John Furrier with theCUBE here, with the Leading with Observability content series with Splunk. Thanks for watching.