Hi, this is your host Swapnil Bhartiya, and welcome to TFiR Let's Talk. Today we have two co-chairs of the CNCF Technical Advisory Group for Observability, Alolita Sharma and Matt Young. Alolita, Matt, it's great to have you both on the show.

Yes, thank you. Looking forward to it.

Very nice to be on the show, and thank you so much.

Today's focus is going to be the recent observability micro survey. But before I go there, I would love to hear from you folks: how have you seen the evolution of observability in the cloud native space?

So observability has been a science for a long time. It's not new, and it's not an area of science specific to cloud native computing. That said, with the evolution of new services in the cloud space from different providers as well as traditional APM vendors, observability is very much alive and well. It's the new incarnation, if you will, of monitoring. It also incorporates a lot of the systems-level monitoring techniques, tools, and ideas that have been scaled up and brought into the cloud native space. Observability data only continues to grow. All the way from edge networks to the system services used for infrastructure, as well as container services, it's a whole swath. And looking at the types of observability data, whether that's traces, logs, or metrics, there is a tremendous amount of telemetry that we can collect today, which really leads to the opportunity to build cloud native observability services at scale.

Matt, I want to hear your opinion as well. How have you seen it evolve? As she said, observability is nothing new, but in the cloud native world it's a totally different beast.

Certainly.
Well, I've seen a few trends that, as Alolita hinted, have been moving and coalescing towards each other for some time. The technology building blocks found under the CNCF umbrella, and in open source in general in this domain, have accelerated their capabilities and their ability to deliver value quite dramatically, almost in lockstep with Kubernetes itself. Borg and Borgmon kind of seeded all of this, and not surprisingly, what you can do with platforms and how you can understand what's happening in them have been coupled. I think one of the biggest shifts in the last few years, one that I experienced as a technology leader along with many in the industry, is in building out platforms for teams so that they can solve business challenges. Initially it was almost entirely vendor driven, and sometimes with maybe only one or two vendors at a particular enterprise or business. That's obviously not a rule, but increasingly teams are shifting to running their own platforms in concert with a richer ecosystem of vendors. They can go a la carte if they want; they can do it all themselves if they want. Perhaps they need to be air gapped, or their industry is regulated, so for some portions they actually want a vendor-driven open source solution that they run on their own infrastructure. We're seeing all of these different use cases become possible now. Technology decision makers looking to deliver value have a deeper catalog with more deployment options, so they can build solutions that fit the demand rather than take a one-size-fits-all approach.

One of the things I'd like to add there, though: you're correct that the diversity of solutions available today is tremendous.
But we also shouldn't under-recognize the fact that open source projects in observability, specifically under the CNCF umbrella, or projects that evolved from the search space or other monitoring or Kubernetes spaces, have come to play a huge part in the technology innovation that's happening in the cloud native space. If you look at OpenTelemetry, or Prometheus, or some of the very fast-growing observability projects under the CNCF umbrella, such as Cilium with eBPF, and other projects like Pixie joining in, it's phenomenal to see the diversity of projects, each addressing the problems it set out to solve and building in an open standards and open source way. And that intersects with traditional vendor-based solutions, so the customer now has a tremendous amount of choice: they can select from a baseline of open source solutions and also from services that are scaled up to support the kind of scale they're looking for.

Yeah, I completely agree. In addition, it's an interesting, salient point that this entire landscape and many of those workloads are themselves cloud native workloads. So the same observability tools that you use to observe your own code, your own product, your own services, you can use across the breadth of these services. And I think another expansion that we're going to see, and it's already happening, is the rise of control planes and having Kubernetes orchestrate not just containers and their immediate surrounding workloads, but really all manner of things. All of those things are then observable cloud native workloads. So the reach and the scope of what they're being asked to do is going to expand.
So the solutions for observability need to keep broadening the tent of practitioners for whom they are accessible, and not require a super steep learning curve. So again, huge opportunities for the entire ecosystem to innovate and compete and move forward as a domain.

Thanks for explaining what observability is in today's cloud native workloads. Now let's switch focus and talk about this survey. What was the goal behind this micro survey? What were you trying to gain? What kind of insights were you looking for?

With the survey, we were looking at understanding some of the observability challenges that end users are trying to address in their organizations. As the amount of data grows, for example, the challenges and complexity of running an observability solution at scale grow too. So we looked at the practical, technical, and cultural challenges that different organizations are facing, the solutions they arrive at based on those challenges, and what kind of tooling and projects are available or going strong in supporting that. So that was really to understand the challenges customers and users are facing. The other aspect we focused on was looking at any other concerns in the growth of cloud native observability projects, and what some of the practical challenges in that growth were. Is it complexity of the technology? Is it lack of resources, or just the maturity of code, features, or projects? Or is it integration?
Because typically, when you already have existing large-scale monitoring systems, what is the cost of integrating new solutions, say for collection, or for modern observation, processing, or alerting, so that they can send data to an existing legacy system? That's typically very expensive for organizations to take on. The other aspect we focused on in this survey was daily operations on cloud providers versus the on-prem or self-managed setups that customers or users have. There are still users who have data centers that they have built out over time and still use. So it was really about understanding what kind of tooling exists there, and whether it intersects with the cloud native projects and tooling being built for a new generation. Matt, did you want to add any other areas? I think we looked at tooling a fair bit also. Were there other areas you think we focused on?

As to why engage in these micro surveys and check-ins with the community, I think the points you've raised are all spot on. Another aspect of this is the explicit goal of the CNCF, which sponsored this activity, around community building.
I mean, by taking a pulse, by capturing this feedback from practitioners who are attempting, and in many cases succeeding, to use these LEGO building blocks to assemble solutions that make sense to them and meet their needs, it's good to highlight this and bring more people in. The connectedness and shared ethos of open source pervades all of these projects. That's a shared value, and an advantage of having it as a core value is this open communication and collaboration. So in the end-user community there are case studies, and you can trade notes and talk about your experiences: what worked, what didn't, and why. These are advantages that you simply don't have in a closed source universe; you just have marketing materials, and you make choices as intelligently as you can. Here we have this superpower. So I think our goal is to coalesce these communities and engage in further discussion with additional actors.

I'd like to add that another very significant aspect of the collaboration Matt called out is that there is a key drive towards leveraging open source and open standards in the observability space. That has been a very significant change in the way that large-scale observability projects are guiding where interoperability requirements go. One of the examples I'd like to call out is OpenTelemetry, where I am very active, as well as the Prometheus community.
We have worked closely together to make sure that in the metrics observability world we have a common standard of interoperability on the data protocol. And the collaboration across all these projects, which have been very innovative in themselves, also means the data being shared across these different stacks is interoperable. I think that's very important to recognize, because typically you had a separation: open standards being developed in something like the W3C, open source projects running in parallel, doing their own thing and solving a particular problem, and then vendor-based solutions building out their stacks on their own. But in this generation, what we see is very close harmony and ongoing work between open standards, which are often part of the open source projects in the observability space, and vendors being stakeholders in these projects, so that they contribute to the standard and at the same time to the open source code base that end users benefit most from. I think that's phenomenal, because in previous generations of open source this hasn't been as aligned.

So we talked about what observability is, and we talked about the goal behind this survey. Now, what are some of the takeaways or findings? Which were things you were expecting, and which were things you were not expecting? Let's start there.

So I think one thing I was expecting to see from this, both as an end user and from talking with colleagues and observing the industry, was the general concern around complexity.
One of the dangers of an ecosystem that's growing so rapidly is that, without active, explicit communication, training, or dialogue, it's difficult to understand how to use all of these blocks, or whether you're even using the right thing in the right place. So that didn't surprise me at all. And I do think that different projections of that theme run throughout most of the responses. But that's also a good thing. It's a huge opportunity for things like the Technical Advisory Group on Observability to help provide, not all the answers, but the forum and the meeting place where those answers come from. I could leave it there in the interest of time. Do you want to add to that, Alolita, or were there things that you didn't expect? Just so I don't answer both sides.

One of the things that I didn't expect was also complexity being called out as one of the major aspects of deployment, of use, and also of configuration. But again, I would say that given where all the open source observability projects are at this point in time, it's very important for these solutions to address the areas where users use them the most. And those are really about deployment complexity. Kubernetes is a difficult environment to deploy in. And it's not only Kubernetes; the observability solutions built for Kubernetes kind of emulate that. It's difficult for most end users to use an operator. So complexity is an area we need to address, because as open source projects we need to reduce the complexity of deployment, reduce the complexity of configuration management, have better instrumentation, and make it as seamless as possible for an end user to use an observability solution out of the box.
It should not be that we are reinventing Nagios again for the next generation, and the end user still faces the same complexity in spinning up an observability solution and instrumenting it. Observability should be baked into the systems, the container infrastructure, or whatever infrastructure you're monitoring. That's where we want to get to.

I could add, too, that another thing that stood out to me, which I didn't expect but am thrilled to see, came from the question that asked folks about the challenges in moving forward with some of this tooling in a cloud native, observable way. Two things that didn't really make the list, and were actually the two lowest, were the fear of buggy code and the fear that their management won't understand the value of using open source. Those are just a given; those are table stakes now for modern managers, companies, and organizations. So I think that is a dramatic shift over the last three to five or even 10 years, when the debates around whether to use open source were quite different and mired in those two points. So that's encouraging.

Because of this ongoing geopolitical crisis, cybersecurity is becoming a very big issue. What role do you think observability is going to play in the whole DevSecOps space? When we look at cloud, there are no silos here. So if you can talk about that, I would appreciate it.

I think that we're going to see solutions being incubated in this space and brought to market very quickly, for a few reasons. One, as you pointed out, current events make clear the necessity and importance of this capability.
But there's also been a lot of innovation in the last five or six years around ad tech, machine learning, model training, and all of the associated infrastructure, learnings, and best practices around how to implement them. That whole technology stack and thought domain can be brought to bear quite quickly on solving those challenges, I posit. In addition, much of the data needed to train all of that obviously comes out of these observability systems and tools. So in my view it's lined up that this would be a natural place to see a lot of innovation this year and next, either from people taking off-the-shelf capabilities and mixing and matching them, or from people bringing tailored solutions to bear. And as with the previous question, I think a differentiator for these projects, one that will help their communities grow and also help the projects themselves be more sustainable, trusted, and vetted, is how they reach the customers and practitioners who are trying to use them. Alolita made great points about what the technical communities need to do around aligning on open standards. At the same time, some of the liberties enjoyed by a rapidly growing ecosystem are beginning to expire. Many of these projects are going to have to treat product roles, PMs and the like, as critical to their success: folks who can help ensure that not only are amazing technical achievements being accomplished, but that they can be translated, made practical, and actually solve problems, rather than just saying, "Hey practitioners, here you go." That matters because an overall industry trend is a move away from a centralized IT or infrastructure group, or even a forward-looking DevOps group, which is still a centralized place.
And we're seeing an inversion of control, or a democratization, depending on which side of that line you're looking from, of these capabilities out to lines of business who will make their own decisions. So increasingly you can't rely on that centralized function within an enterprise or organization to take on the cost of training and documenting and explaining: What is this? Why do you care? Who does it help? Why? In an a la carte world, even one with similar open standards, all of those things are needed for people to use a solution, and to know and be able to demonstrate to their peers that it's worth doing at all, or worth doing with this particular solution. Those are things people are going to need. And I think projects that provide that final mile, that enabling collateral, will see a good return on the investment for sure.

A couple of interesting things I'd like to call out in the security space especially. One is that there is a lot of ongoing work, especially in the eBPF world, to handle specific security use cases, and also data that comes from secure sources or needs a guaranteed end-to-end secure pipeline, even for telemetry. That's something that needs to evolve, both at the services and systems level and at the end-user device level.
Because those are, I think, complex use cases. Logging has typically been used in those use cases for collecting telemetry data, but you also see the evolution, with eBPF projects as well as others, towards looking at the kernel space to address security use cases, and towards collecting telemetry from other parts of the stack, whether that's the network layer or the different distributed systems running on top. That means handling not only logs but metrics and traces, and then being able to correlate and process them in a secure way, not only end-to-end for the pipeline but also for specific use cases where the data itself must be secure. So there is a lot happening here in addressing those, but I think we're not there yet. There's more work to be done before observability can fully handle those same use cases.

Excellent. Before I wrap this up, one more question I want to ask you folks, since you brought it up, is about the cultural aspect, which you were talking about, Matt, and Alolita, you talked about it too. Open source, for me, solves the day one problem. Day two is where a lot of vendors come in to package it as a service or product. But more importantly, there's that cultural shift within companies and organizations, where they look at observability as part of their workflow pipelines. Through the survey or beyond the survey, how much of a cultural shift are you seeing towards observability, where you don't have to educate folks about it, and companies understand the importance and are making changes internally?

In terms of a cultural shift, I do agree. For many years it was almost a meme, it was so common, that security is not a feature. It's something that is intrinsic to writing credible, workable solutions.
I think that something being observable to the consumer of that thing is now also not a check-the-box item anymore. It just has to be there, or you're not looking at it at all. I do think that will be the mindset folks have: if I'm going to invest in something, I shouldn't also have to invest in figuring out whether it's even working. So that day one experience really needs to be solid. I think that's a common expectation among practitioners and IT groups, DevOps groups, cloud groups, however they're called, or folks who are self-servicing for their teams. I sometimes ask my kids to imagine life without a cell phone, and they just can't. I think it'll be that way with observability, to the point where the term might not even be relevant because it's assumed.

Yes. Yes. And that's what it should be, right? Observability should be as core a pillar of computing as computing itself. It should be just baked in.

And in terms of a cultural shift, I can't echo it enough. We talked about edge computing for many years as the edge of the network, where somewhere there's a handoff and then you get to a user. Well, I think with cloud native systems now, this is a cloud, this is a cluster, my car is in that ecosystem, all my vehicles, smart transit. I think we are now walking around in a cloud native system. So from a cultural perspective, because it's so ubiquitous now, it raises ethical challenges for us in industry to build the right systems that protect privacy and whatnot.
But really, observability is so important because it's going to be how we observe ourselves and how we interact with each other in the increasingly automated, potentially robotic parts of the world that we're going to be walking around in. So it's very concrete now.

Yeah. And again, addressing the question you asked about culture: it's always been the case that you have several challenges in building an observability-first culture. Traditionally it's been taken for granted that if you're monitoring a system and you have SREs in the company, then it's the SREs' problem to go figure it out and use whatever tools they need. But from a cloud native perspective, as every organization builds out these large cloud native infrastructure investments, you also need to shift the skill set and the mindset of the organization's leadership, their understanding of complexity shifting to a different layer, and you need continuous support from leadership and an understanding of why an observability-first approach from day one matters. Because if your systems are not observable, if your end-to-end workflows and data workloads are not observable, if you cannot get a real-time pulse, whether on your IoT network or at the core of your cloud native services, then how do you optimize? How do you reduce the time to root-cause and mitigate a particular problem? And as large-scale computing is deployed at cloud native scale, these problems become even more complex to address.
And that is a continuous balance and learning that needs to happen in every organization, as you shift the balance of computing from everything being in your control to having to deal with a whole layer of cloud native services that have to be observable, and even end-user devices that have to be observable. That's a whole mix. So it changes the equation dramatically.

We should mention that this Tuesday at the TAG, we've got the Hubble project as well as the Pixie project giving an update on almost a year of development since they last joined us. So that should be really great. We're going to be getting an overview of the portion of the Cilium project that's focused on network observability; someone from that project will be presenting. And then lastly, there's one of the work streams that launched recently around making a five to 10-minute news segment on a cadence. Hendrick's rather will be giving an update on that. So, first and second Tuesday of the month.

Excellent. Alolita, Matt, thank you so much for taking time out today to talk about not only this micro survey; actually, the survey was a micro part of this discussion. We had a much wider discussion about observability. Thanks for sharing those great insights. As I said, I would love to have you folks back on the show, but thanks for your time today.

Anytime. Thank you so much.

Thank you, Swapnil. It's really a pleasure to chat with you.