From theCUBE Studios in Palo Alto and Boston, connecting with thought leaders all around the world, this is a CUBE Conversation. Hello and welcome to this CUBE Conversation here in Palo Alto, California. I'm John Furrier, host of theCUBE, with a special content series called Leading with Observability. And this topic is keeping watch over microservices and containers, with a great guest, Patrick Lin, VP of Product Management for the Observability products at Splunk. Patrick, great to see you. Thanks for coming on remotely. We're still in the pandemic, but thanks for coming on. Yeah, John, great to see you as well. Thanks for having me. So Leading with Observability, this is a big theme of our content series. Managing end-to-end user experience was a great topic around how data can be used for user experience. But now, underneath that layer, you have this whole craziness of the rise of the container generation, where containers are actually going mainstream, and Gartner forecasts anywhere from 30 to 40% of enterprises still haven't really adopted them at full scale. And you've got to keep watch over these. So what is this topic of keeping watch over microservices and containers about? Because we know they're being deployed. Is it just watching them for watching's sake, or is there a specific reason? What's the theme here? Why this topic? Yeah, well, I think containers are part of the entire stack of technology that's being deployed in order to develop and ship software more quickly, right? And the fundamental reasons for that haven't changed, but they've been greatly accelerated by the impact of the pandemic. For the past few years we've been talking about how software is eating the world, how it's become more and more important that companies go through the transformation to be more digital, right?
And I think now that is patently obvious to everybody, when the only way of reaching your customer, and for the customer to access your services, is through a digital medium, right? The ability for your IT and DevOps teams to deliver against those requirements, to deliver that flawless customer experience, to keep pace with digital transformation and cloud initiatives — all of that is coming as one big wave. And so we see a lot of organizations migrating workloads to the cloud, refactoring applications, building new applications natively. And when they do that, oftentimes the infrastructure of choice is containers, right? Because it's the thing that keeps up with the pace of development, and it's a much more efficient use of the underlying resources. So it's all part of the overall movement that we see. What is the main driver for this use case, microservices? And where's the progress bar, in your mind, of the adoption and deployment of microservices? And what are the critical things that you guys are looking at that are important to monitor and observe and keep track of? Is it the status of the microservices? Is it the fact that they're being turned on and off? Their state — stateful versus stateless? I mean, take us through some of the main drivers for why you guys are keeping an eye on the microservices component. Sure. Well, if we take a step back, the reason that people have moved toward microservices and containers fundamentally has to do with the desire to, number one, develop and ship more quickly, right? If you can parallelize the development and have APIs as the interface between these services, rather than one monolithic code base, you can evolve more quickly. And on top of that, the goal is to deliver software that is able to scale as needed, right? So that is part of the equation as well.
So when you look at this — the desire to iterate on your software and services more quickly, to be able to scale infinitely while staying up, and so on — that's all a great reason to do it. But what comes with it is a few additional layers of complexity, because now, rather than having, let's say, an n-tier app that you're watching over on some hosts that you could reboot when there's a problem, you have tens, maybe hundreds of services running on top of hundreds, thousands, maybe tens of thousands of containers, right? So the complexity of that environment has grown quite quickly. And the fact that those containers may go away as you scale the service up and down to meet demand also adds to that complexity. So from an observability perspective, you need to be able to do a few things. One is you need to be tracking all of this in enough detail and at a high enough resolution, in real time, so that you know when things are coming in and out. And that's been one of the more critical things that we've built at Splunk, that ability to watch over it in real time. But just as important is understanding the dependencies and the relationships between these different services. And so that's one of the main things that we've worked on here, making sure that you can understand the dependencies, so that when there's an issue you have a shot at actually figuring out where the problem is coming from, right? Because there are so many different services and so many things that could be affecting the overall user experience when something goes wrong.
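To make the dependency-mapping point concrete: in most tracing systems, the service-to-service relationships Patrick describes can be derived from the parent/child links between trace spans. Here's a minimal sketch of the idea — the span format and all names are illustrative assumptions, not Splunk's actual data model:

```python
# Sketch: derive a service dependency graph from trace spans.
# Each span records which service emitted it and its parent span, so an
# edge parent_service -> child_service marks a cross-service dependency.

from collections import defaultdict

def dependency_graph(spans):
    """spans: list of dicts with 'span_id', 'parent_id', 'service'."""
    by_id = {s["span_id"]: s for s in spans}
    edges = defaultdict(set)
    for span in spans:
        parent = by_id.get(span["parent_id"])
        # Only record cross-service calls, not intra-service spans.
        if parent and parent["service"] != span["service"]:
            edges[parent["service"]].add(span["service"])
    return {svc: sorted(deps) for svc, deps in edges.items()}

trace = [
    {"span_id": "a", "parent_id": None, "service": "frontend"},
    {"span_id": "b", "parent_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_id": "b", "service": "payments"},
    {"span_id": "d", "parent_id": "b", "service": "inventory"},
]
print(dependency_graph(trace))
# {'frontend': ['checkout'], 'checkout': ['inventory', 'payments']}
```

With a graph like this in hand, an alert on `payments` can immediately be traced back to the `checkout` and `frontend` services it affects — the "shot at figuring out where the problem is coming from" that Patrick mentions.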
I think that's one of the most exciting areas right now in observability, this whole microservices and container equation, because there's a lot of action there, there's a lot of complexity, but the upside, if you do it right, is significant. I think people generally are bought into that concept, Patrick, but I want to get your thoughts. I get this question a lot from executives and leaders, whether it's a cloud architect or a CXO. And the question is: what do I need to consider when deploying an observability solution? Yeah, that's a great question, because there are obviously a lot of considerations here. I think one of the main ones — and this is a pattern we're pretty familiar with in the monitoring and management tool world — is that over time, most enterprises have gotten themselves a very large number of tools, one for each part of their infrastructure or their application stack and so on. And so what you end up with is sprawl in the monitoring tool set, which creates not just a certain amount of overhead in terms of cost, but also complexity that gets in the way of actually figuring out where the problem is. I've been looking at some of the tool sets that some of our customers have pulled together, and they have the ability to get information about everything, but it's not woven together in a useful way. Having so many tools actually gets in the way when you're in the heat of the moment trying to figure something out. It harkens back to the time when you have an outage, you get on a conference call with a cast of thousands trying to figure out what's going on, and each person comes to that with their own tool, with their own view, without anything that ties it to what the others are seeing, right?
And so that need to provide an integrated tool set, with a consistent interface across the infrastructure, across the application, across what the user experiences, and across the different data types — the metrics, the traces, the logs. Fundamentally, that ability to easily correlate the data across all of it and get to the right insight, we think that's a super important thing. Yeah, and I think that points out — I've always said, don't be a fool with a tool. And if you have too many tools, you have a tool shed, with too many tools everywhere. And that's kind of a trend. Tools are great when you need tools to do things. But when you have too many, when you have a data model, essentially what you're saying is that a platform is the trend, because to weave stuff together you need a data control plane, you need data visualization, you need these things for understanding the success there. So really it's a platform, but platforms also have tools. So tools are features of a platform, if I get what you're saying. Is that correct? Yeah. So there's one part of this, which is, if I start from the user point of view, what you want is a consistent and coherent set of workflows for the people who are actually trying to do the work, right? You don't want them to have to deal with the impedance mismatches across different tools, whether that's the language that they use, or how they bring the data in and how it's processed. You go down one layer from that, and you want to make sure that what they're working with is actually consistent as well, right?
And that's the sort of capabilities that you're looking at, whether you're trying to chart something to look at the details, or go from a view of logs to the related traces; you want to make sure that the information being served up there is consistent, right? And that, in turn, relies on data coming in in a way that is processed to be correlated well. So that if you say, hey, I'm looking at a particular service and I want to understand what infrastructure it's sitting on, or I'm looking at a log and I see that it relates to a particular service and I want to look at traces for that service — those things need to be related from the data on in, and that needs to be exposed to the user so they can navigate it properly and make use of it, whether that's during wartime, during an incident, or during peacetime. Yeah, I love that wartime versus peacetime framing. There's a blog post from a VC, I think, that says don't be a Tom Hagen — the guy in The Godfather; the famous line is that he's not a wartime consigliere — which means things are uncertain in these times and you've got to get them to be certain. This is a mindset, and this is part of the pandemic we're living in. Great point, I love that. Maybe we can follow up on that at the end, but there are some topics I want to get your reaction to. So I want you to react to the following, Patrick. It's an issue and a topic, and here it is: missing data results in limited analytics and misguided troubleshooting. What's your reaction to that? What's your take? What's Splunk's take? Yeah, I mean, Splunk has been a proponent of that view for a very long time, right? Whether that's the log data, or, let's say, the metric data that we capture at high resolution, or tracing, right?
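The log-to-trace pivot Patrick describes usually hinges on a shared identifier stamped into every data type at ingest. Here's a minimal sketch of that correlation, assuming each log line and trace carries a `trace_id` — the field names are illustrative, not Splunk's schema:

```python
# Sketch: pivot from a log line to its related traces via a shared
# trace_id. Correlation across data types only works if both carry
# the identifier when the data comes in.

def traces_for_log(log, traces):
    """Return the traces whose trace_id matches the log's trace_id."""
    return [t for t in traces if t["trace_id"] == log["trace_id"]]

logs = [
    {"msg": "payment declined", "service": "payments", "trace_id": "t-42"},
]
traces = [
    {"trace_id": "t-42", "root_service": "frontend", "duration_ms": 930},
    {"trace_id": "t-43", "root_service": "frontend", "duration_ms": 85},
]
print(traces_for_log(logs[0], traces))
# [{'trace_id': 't-42', 'root_service': 'frontend', 'duration_ms': 930}]
```

The point of the sketch is the design choice, not the lookup itself: if the two data types are collected by separate systems that never agree on an identifier, this join is impossible, which is exactly the impedance mismatch being described.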
The goal here is to have the data that you need in order to properly diagnose what's going on, right? And I think that older approaches, especially on the application side, tend to sample data right at the source and provide hopefully useful samples of it for when you have that problem. That doesn't work very well in the microservices world, because you need to be able to see the entirety of a transaction — the full trace across many, many services — before you could possibly make a decision as to what's useful to keep, right? And so the approach that we believe is the right one is to capture all of those bits of information at full fidelity, partly because of what I just said — you want to be able to find the right sample — but also because it's important to be able to tie it to something that may be being pulled in by a different system. An example of that might be a case where you're trying to do real user monitoring alongside APM, and you want to see the end-to-end trace from what the user sees all the way through to all the backend services. What's typical in this world today is that that information is captured by two different systems, each making independent sampling decisions. And therefore the ability to draw a straight line from what the end user sees all the way to what is affecting it on the backend is pretty hard, or it gets really expensive. The approach that we've taken makes that easy and cost-effective, right? And it's tremendously helpful then, to tie it back to what we were talking about at the outset — where you're trying to provide services to your end user that make sense and are easy to access and so on — to have that end-to-end view, because you're not missing data. It's tremendously valuable.
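The "decide what to keep only after seeing the whole transaction" point is what's generally called tail-based sampling, in contrast to head-based sampling at the source. A minimal sketch of the decision logic — the threshold, span format, and field names are assumptions for illustration:

```python
# Sketch: tail-based sampling. The keep/drop decision is made only after
# all spans of a trace have arrived, so erroring or slow traces are never
# lost. Head-based sampling, by contrast, decides at the first span and
# may discard exactly the traces you needed for troubleshooting.

def keep_trace(spans, latency_threshold_ms=500):
    """Keep any trace that contains an error or is unusually slow."""
    has_error = any(s.get("error") for s in spans)
    total_ms = sum(s["duration_ms"] for s in spans)  # rough latency proxy
    return has_error or total_ms > latency_threshold_ms

healthy = [{"duration_ms": 40}, {"duration_ms": 60}]
failing = [{"duration_ms": 40}, {"duration_ms": 30, "error": True}]
print(keep_trace(healthy))  # False - fast and clean, safe to sample away
print(keep_trace(failing))  # True  - contains an error, keep in full detail
```

Note the transcript's actual position goes further than this sketch: capture everything at full fidelity so you never need to guess which traces matter. Tail-based sampling is the middle ground that head-based sampling can't reach, because the decision requires the complete trace.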
You know what I love about Splunk? I'm a data geek going back to when it wasn't fashionable, back in the '80s. And Splunk has always been about ingesting all the data: bring all the data, we'll take it all. And at the beginning it was pretty straightforward — I mean, complex, but still a great utility. But even now, today, it's the same thing you just mentioned: ingest all the data, because there are new benefits. And I want to ask you a quick question on this distributed computing trend, because pretty much everyone in computer science, or in the industry and technology, agrees: cloud with the edge is essentially distributed computing in a new way, a new architecture with great new benefits. But you can still apply some science there. You mentioned distributed tracing, because at the end of the day, that's also a major new thing that you guys are focused on. And it's not just "get me all the data"; distributed tracing is a lot harder than that, because of the environment, and it's changing so fast. What's your take on it? Yeah, well, fundamentally I think this goes back, ironically, to one of the principles of observability, which is that oftentimes you need participation from the developers in making sure that you have the right visibility, right? And it has to do with the fact that there are many services being strung together, as it were, to deliver on some end-user transaction or some experience. The fact that you have many services that are part of this means that you need to make sure that each of those components is actually providing some view into what it's doing, right?
And distributed tracing is about taking that and weaving it together so that you get that coherent view of the business workflow within the overall web of services that make up your application, right? So, the next topic I want to get into — we're running low on time, but I'm going to squeeze it in, and I'll read it to you real quick: slow alerts and insights are difficult to scale. If they're difficult to scale, it holds back the mean time to resolution, okay? And it's difficult to detect in the cloud. It was easier, maybe, on premises, but with cloud this is another complexity. How are you seeing the inability to scale quickly across these environments to manage the performance issues and delays that come out of that kind of slow insight? What's your reaction to that? Yeah, well, I think there are a lot of tools out there that will take in events or issues from cloud environments, but they're not designed from the very beginning to handle the scale of what you're looking at, right? As I mentioned, it's not uncommon for a company to have tens or maybe even hundreds of services and thousands of containers or hosts. And so there's the sheer amount of data you have to be looking at on an ongoing basis, and the fact that things can change very quickly — containers can pop in and go away within seconds, right? So the ability to track that in real time implies that you need an architectural approach that is built for it from the very beginning. It's hard to retrofit a system to handle orders of magnitude more complexity and change and pace of change; you need to start from the very beginning, right? And the belief we have is that you need some form of a real-time streaming architecture, right?
Something that's capable of providing that real-time detection and alerting across a very wide range of things, in order to handle the scale and the ephemeral nature of cloud environments, right? Well, let me ask you a question then, because I've heard some people say, well, it doesn't matter, 10, 15 minutes to alert on an event is good enough. How would you react to that? What's a great example of where it's not good enough? I mean, is it minutes, is it seconds? What are we talking about here? What's the good-enough bar right now? Yeah, I mean, anybody who has tried to deliver an experience digitally to an end user — if you think you can wait minutes to solve the problem, you clearly haven't been paying enough attention, right? And it almost goes without saying that the faster you know you have a problem, the better off you are. So when you think about the objectives you have for your service levels, your performance or availability, you run out of minutes pretty quickly if you get to anything like, say, three nines, right? Waiting 15 minutes maybe would have been acceptable before people were really trying to use your service at scale. Yeah, and the latency required there is super important. I brought that up tongue in cheek to kind of tee that up for you, because these streaming analytics, these streaming engines, are super valuable, and knowing when to use real time and when not to also matters, right? This is where the platform comes in. Yes, absolutely. The platform is the thing that enables that, right? And I think you have to build it from the very beginning with that streaming approach, with the ability to do analytics against the streams coming in, in order to deliver on the promise of alerts and insights at scale and in real time.
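The "you run out of minutes pretty quickly at three nines" remark is easy to make concrete with error-budget arithmetic — at 99.9% availability, the entire monthly downtime budget is under 45 minutes, so a 15-minute detection delay spends a third of it on a single incident:

```python
# Sketch: downtime budget implied by an availability target.
# availability 0.999 ("three nines") over a 30-day month leaves
# only about 43 minutes of allowable downtime in total.

def downtime_budget_minutes(availability, period_days=30):
    return period_days * 24 * 60 * (1 - availability)

for target in (0.99, 0.999, 0.9999):
    print(f"{target:.2%} -> {downtime_budget_minutes(target):.1f} min/month")
# 99.00% -> 432.0 min/month
# 99.90% -> 43.2 min/month
# 99.99% -> 4.3 min/month
```

At four nines, the budget shrinks to roughly four minutes a month, which is why the conversation frames minutes-long alert latency as a non-starter for services operating at scale.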
All right, final point, and I'll give you the last word here. Give a plug for the Splunk Observability Suite. What is it? Why is it important? Why should people buy it, adopt it, upgrade to it? Give the perspective, give the plug. Yeah, sure, I appreciate the opportunity. As we've been out there speaking to customers over the last year — as part of Splunk, and before that — they've spoken to us a lot about the need for better visibility into their environments, which are increasingly complex, where they're trying to deliver the best possible user experience, and, to add to that, where they're trying to consolidate tools; we spoke about the sprawl at the beginning. And so with what we're putting together here with the Splunk Observability Suite, I'd say we have the industry's most comprehensive and powerful combination of solutions, one that will help both IT and DevOps teams tackle these new challenges for monitoring and observability that other tools simply can't address. So you're able to eliminate the management complexity by having a single, consistent user experience across the metrics and logs and traces, so that you can have seamless monitoring, troubleshooting, and investigation. You can create better user experiences by having true end-to-end visibility, all the way from the frontend to the backend services, so that you can actually see what kind of impact you're having on users and figure it out within seconds. I think we're also able to help increase developer productivity, with high-performance tools that help DevOps teams get to better quality code faster, because they can get immediate feedback on how their code changes are doing with each release. And they're able to operate more efficiently, right?
So I think there's a very large number of benefits from this approach of providing a single, unified tool set that relies on a consistent source of data across it, but then has the particular tools that different users need for what they care about: whether you're the frontend developer needing to understand the user experience, whether you're the backend service owner wanting to see how your service relates to others, or whether you own the infrastructure and need to see whether it's actually providing what the services running on it need. Well, Patrick, great to see you, and I just want to say congratulations. I've been following your work going back in the industry, specifically with SignalFx. You guys were really early in seeing the value of observability before it was a category, and it's morphed and become so relevant, just as you guys saw it would. So congratulations, and keep up the great work. We'll keep the conversations open. Thanks for coming on. Great. Thanks so much, John. Great talking to you. All right, this is theCUBE, Leading with Observability. It's a series, check it out. We have multiple talk tracks; check out the Splunk series Leading with Observability. I'm John Furrier with theCUBE. Thanks for watching.