Hi, this is your host Swapnil Bhartiya, and welcome to another episode of T3M, or Topic of This Month. The topic of this month is observability, and today we have with us once again Justine Hartung, managing partner at Carrick. Justine, it's great to have you on the show.

Thank you. It's great to be here.

You have seen the evolution of observability over the years, and when I say evolution, we can also talk about what role observability plays, why we need it, and how it has evolved, because the whole landscape has changed. Use cases have changed; technologies have changed.

I'll start with a brief take on how I think of observability, because it might be different from how some other people think of it. When I think of the origins of observability, it was really about understanding what went wrong, for incident management. Think of logs and metrics, so that people could dive in, figure out where something went wrong, and fix it. But then companies like Google and Netflix started to build on microservices, and they couldn't just rely on simple logs. It was hard to look at all the components, so they also started to build distributed tracing, which allowed every component to send telemetry back to a central system so you could understand how all those systems were working together. Then Google, through its SRE practices, also started to move from reacting after an incident happens to proactive measurements: SLIs, or service level indicators, to understand whether something is about to go south, and then intervene before it does. These practices, and the research papers that were published about them, really led some crafty vendors to rebrand the whole of monitoring as observability, and they took that from the scientific definition: understanding the state of a system based on its outputs, the data that it generates.

Now I think this has really evolved beyond simply looking at the technical systems, because the same data is incredibly useful for other people in the organization. Think of a product manager trying to figure out: are we shipping the right features, and do the changes we make impact user behavior and outcomes? To go back to an example at Google, they wanted to find out how the latency of a search query affects user behavior. They found that as the end-to-end response time of a search query went from half a second to a quarter second, the number of searches people did started to grow exponentially. What that meant is that people started to think of Google Search as just another way of thinking and to use it seamlessly, but as soon as latency went above 250 milliseconds, usage started to drop off quite a bit. That is a very interesting insight, and you can't get it without all that data and an understanding of how user responses change, in some cases even artificially injecting latency just to see how it affects behavior. So that's observability more broadly. As for where we're starting to go, take that same concept of latency: companies are now running thousands of experiments at once, and observability suites are using a lot of intelligence to highlight what is productive and what isn't, in a way that is consumable by the mere mortal engineer.

Excellent. You also lightly touched upon the next question I was going to ask: how has the scope of observability grown beyond the original idea? As you said, other teams can also leverage it.
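The proactive SLI idea from the first answer can be sketched in a few lines. This is a minimal illustration, not Google's implementation: it counts a request as "good" if it completes within the 250 ms threshold from the search example, and the 99% target and the 2x paging threshold are hypothetical values chosen for the example. Comparing the observed error rate with the error budget the target allows (the "burn rate") is one common way to get an early warning before the budget is gone.

```python
from dataclasses import dataclass

# The 250 ms threshold comes from the search-latency example above.
# The 99% target and the 2x paging threshold are hypothetical, for illustration only.
LATENCY_THRESHOLD_MS = 250.0
SLO_TARGET = 0.99


@dataclass
class SliWindow:
    """Good and total request counts for one measurement window."""
    good: int
    total: int

    @property
    def ratio(self) -> float:
        # An empty window has violated nothing, so treat it as fully compliant.
        return self.good / self.total if self.total else 1.0


def latency_sli(latencies_ms: list[float], threshold_ms: float = LATENCY_THRESHOLD_MS) -> SliWindow:
    """Count the requests in a window that finished within the latency threshold."""
    good = sum(1 for ms in latencies_ms if ms <= threshold_ms)
    return SliWindow(good=good, total=len(latencies_ms))


def burn_rate(window: SliWindow, target: float = SLO_TARGET) -> float:
    """Observed error rate divided by the allowed error rate (1.0 = exactly on budget)."""
    allowed_errors = 1.0 - target
    observed_errors = 1.0 - window.ratio
    return observed_errors / allowed_errors


def should_page(window: SliWindow, target: float = SLO_TARGET, max_burn: float = 2.0) -> bool:
    """Proactive alert: fire while the budget burns faster than max_burn, before it runs out."""
    return burn_rate(window, target) > max_burn


if __name__ == "__main__":
    # 97 fast requests and 3 slow ones: 3% errors against a 1% budget.
    window = latency_sli([120.0] * 97 + [400.0] * 3)
    print(round(burn_rate(window), 2), should_page(window))
```

In this sketch a window where 3% of requests exceed 250 ms is burning its budget about three times faster than the 99% target allows, so it pages even though most users are still being served quickly.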
Absolutely, and I think this is an evolution that we'll see continue to grow. Monitoring, alerting, and tracing are just signals being used by teams, and other signals are starting to be used as well. Think of the practice of FinOps: understanding the cost basis of everything you're running. Now engineers and product managers can also take cost into account, so you're not only optimizing the outcome for the user. What if that outcome gives you five more dollars of profit but costs you ten more dollars? Maybe that's a bad decision, and without the FinOps data you would only look at the revenue drivers. With it, you really understand the full picture. So I think we're going to continue to see this observability trend where more and more signals come in. Call it enterprise observability or something else, but it's about what the business is doing to understand how it is functioning as a business and then optimize that. I think that will expand well beyond just the technical components. Think about the flow of the developer: how many times do developers get interrupted? That really drives your cost of creating software. So that's where I think observability is going to continue to evolve.

How much adoption of observability are you seeing? Is it at the point where everybody knows it and everybody is embracing it, or do you still find, when you talk to your customers and clients, that you have to go and educate them: hey, you don't have any observability practices in place. You need to have them.

You used the key term there: observability practices. Too often companies think, I've got an observability tool, check the box, I'm done. But that doesn't really solve anything, and at worst it gives you a huge bill that doesn't deliver any value. It's much more of a cultural change. At companies like Netflix this is just built into their DNA, so of course they're way ahead of everyone else. But with a lot of other enterprises, which are the core of our customers, we help by really sitting with their teams and helping them develop cloud native practices such as using data to drive decisions. I think that's core to observability. And contrary to what all the popular observability vendors would like you to believe, you don't actually need observability software to start developing a practice of observability. What I'm seeing is a mixed bag: a lot of teams, IT teams in particular, have the software, but there are a lot of application developers and product managers I run into who aren't yet using this data to make intelligent decisions, or not even intelligent ones, but just to inform their decisions.

I'm so happy that you talked about the cultural aspect, because the technology part is easy; the culture part is the tricky part, and we have been seeing a lot of culture shifts. The biggest one, of course, is DevOps. Then we talk about DevSecOps.
We talk about the whole shift-left movement. We talk about SREs, all those things. From the observability perspective, when an organization wants to embrace this culture and these practices, what are some of the major challenges? Because the tools are there, but tools by themselves are not sufficient.

I think there are two aspects, and I'll touch on the cultural one first. I think it has to be set by leadership. For example, if leaders shoot the messenger when someone shows bad news based on the data, then people implicitly get a negative feedback loop: oh, we don't want to understand if something bad is happening. Conversely, if leaders embrace it, like, hey, that's an incredible insight, let's fix it and improve it, then that starts to create a culture of let's learn more. So the core is to start with leadership really embracing data and using it to make decisions. Too often I've seen leadership pay what I'll call lip service to this and say we need data, we need data, but they don't make the same kind of data-driven decisions themselves. For example, when they want to learn something about a system, they ask everyone for a bunch of custom reports, and people scramble to build custom reports just for that one-off inquiry, as opposed to using the systems and diving in. Now, I don't expect executives like a CEO to dive into an observability tool. It's more about how they enable and nurture people at different levels to create those dashboards and use the systems for the data they have and the data they generate. That cultural aspect is one of the key elements you need to start to foster.

The next is that you touched on the shift-left movement, and I think there's a lot of confusion here. What the enterprises I run into have done is say, hey, we're going to take the developers and now make them the operators, and make them the UX and UI designers, and they start piling on all these job descriptions and requirements without taking anything off the developers' plate. I think that's a recipe for disaster. There's a term people are starting to use: platform engineering. It's really about shifting from a checkbox tool implementation to implementing in a way that removes cognitive overload from developers. For example, if you give developers the right visibility into their system, then it's easy for them to start operating it. But if you don't have that in concert, it really starts to erode the developers' ability to work, and they spend a smaller and smaller percentage of their time on feature development, which is what started the split between Dev and Ops in the beginning. So you really have to look at those tools and how you build platforms that let people consume the data in a meaningful way, while you're doing that cultural change of embracing data.

Can you also talk about the business value of having observability practices? Because it's more than just a lot of plumbing in the back; it may have a direct impact on businesses as well.

I'll use an analogy that is related but completely different, which hopefully business folks watching will get. Before Google and online advertising, the joke was that 50% of your advertising is effective, you just don't know which 50%. Of course companies would spend lots of money on radio, TV, and newspapers, and of course they saw sales go up. But how would you optimize that?
They didn't really have a system that gave them observability into people's behavior and how it was driven by the different things they tried. Now look at online advertising, which, granted, is different from brand marketing. Just look at product advertising: companies can know exactly how much a customer is going to spend and how much profit they'll make if that customer clicks on an ad. So of course they're willing to spend and bid interactively on individual users, because they have all the information about that user, the products, and the user's historical behavior. You can also understand that this person is probably going to come back regardless, so you don't bid anything for that ad, because it would be a waste of money. This is a much more sophisticated way for a business to operate. At the macro level, companies that didn't embrace this kind of data in advertising were outmaneuvered by competitors that did. In fact, most of the companies on the stock indexes are relatively young, because they leveraged what I'll call this broader observability trend. Granted, they didn't call it observability at the time, but it's just an aspect of understanding how a system performs based on the metadata the system generates. I think the same thing is happening in companies today. Enterprises in particular that aren't embracing the data generated by their systems to make data-driven decisions are going to be outmaneuvered by companies that are. It's not a matter of whether they're a startup; it's simply whether the culture gives them permission to do that and whether they're provided with the tools, which are continually evolving. So the important thing is that companies need to start somewhere and keep going. It's not going to be perfect, but they should keep improving both the cultural and the technological aspects.
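The product-advertising reasoning above can be sketched as an expected-value calculation. This is a minimal illustration, not how any particular ad platform bids: the `Impression` fields, the probabilities, and the choice to credit only the ad's incremental effect (the uplift) are assumptions made to mirror the "they'll come back regardless, so bid nothing" point.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """Hypothetical per-user estimates an advertiser might derive from historical behavior."""
    user_id: str
    p_buy_if_shown: float      # estimated purchase probability if the ad is shown
    p_buy_if_not_shown: float  # estimated purchase probability without the ad
    expected_profit: float     # profit (not revenue) from one purchase, in dollars


def max_bid(imp: Impression) -> float:
    """Highest price worth paying for this impression under a simple expected-value model.

    Only the incremental purchases caused by the ad are credited. A user who will
    come back regardless has no uplift, so the bid drops to zero.
    """
    uplift = max(0.0, imp.p_buy_if_shown - imp.p_buy_if_not_shown)
    return uplift * imp.expected_profit


if __name__ == "__main__":
    candidates = [
        Impression("new-visitor", p_buy_if_shown=0.10, p_buy_if_not_shown=0.02, expected_profit=50.0),
        Impression("loyal-customer", p_buy_if_shown=0.90, p_buy_if_not_shown=0.90, expected_profit=50.0),
    ]
    for imp in candidates:
        print(imp.user_id, round(max_bid(imp), 2))
```

Using profit rather than revenue is the same point made earlier about FinOps: a bid that wins a sale worth less than it costs is a bad decision, and only data about both sides reveals that.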
Another thing I want to talk about: we all talk about generative AI these days. How do you see observability benefiting from these technologies?

As you rightly called out, there's nothing new about AI; it has been used in a lot of tools. But I have to admit ChatGPT and the whole generative AI movement have really taken me by surprise. Had you asked me a year ago, I would have said it was all science fiction, and today it's reality. I can't believe how impactful it is when used in the right cases, something as simple as writing something and saying, hey, make this better for me, or doing research. These are incredible capabilities that are being developed rapidly. So I do think that companies in the observability space, and we're already seeing this, are leveraging the same techniques to find the needle in the haystack: what are the insights and meaningful things you need? That's slightly different from a large language model, which is trying to generate new information out of a vast body of information, more text-based or image-based. This is much more a search problem of understanding correlation and causation. So it's a slightly different type of AI, but it's being leveraged quite a bit, and I think it will only help people better understand their systems and gain those insights. That's only going to increase.

Now, the flip side is companies starting to use generative AI, and I think that's a very dangerous, slippery slope. This is another area where you need observability. If you just let everyone start using AI, they're going to start going to non-sanctioned systems, and before you know it they're copying and pasting, likely, intellectual property into these destination systems, and information starts leaking. The problem companies have today is that they don't have any observability into who's using what, and that's leading to an unknown attack vector that erodes information security and intellectual property.

Before we wrap this up, I would love to know how Carrick is helping customers and users in this space, as we discuss observability practices.

As I mentioned, with observability, yes, the tools are great, but it's much more about the data-driven decision-making process. At Carrick we call ourselves characters. We're quite obsessed with data; it's one of our characteristics. You know the age-old saying: if you can't measure it, you can't improve it. So in every engagement we come into, we always ask: what are we trying to accomplish, and how will we know if we're successful? If we can't answer those through data, then we haven't yet asked the right question or chosen the right metrics, or we just need to build something to produce metrics. And then we sit with customers; that's what we really like to say. We sit with our customers rather than do something for them, because it's really about helping companies, and individuals at those companies, understand what questions they should be asking, what data they should be gathering, and how to use that to make a decision. Doing that once is nice, but it's the practice of understanding that you can ask questions, you can be wrong, you can add more data, and you can improve over time. That's the essence of what we instill in the teams we work with, so that their capabilities continue to evolve and grow beyond our engagements, and that's core to who we are at Carrick.

I can give you an example: a telecom company we worked with. They said, oh, we've got these DevOps pipelines that tell us who's doing what, what is being pushed and when, and how systems are working. We assumed that, so we didn't include it in our statement of work, but it was pretty clear they didn't actually have a usable system. So we just took it on as additional scope and built it, because how else would we know if we were improving? Maybe that's self-serving, a way of saying, hey, here's the impact we had. But the reality is that it's actually pretty easy to build some of these systems so you can start making data-driven decisions. That particular system was just some webhooks calling out to event streaming systems and into BigQuery on Google Cloud. It only took about two weeks to build, and within a month of pushing it we were collecting about seven terabytes per month. That allowed us to understand how the fleet of engineers was functioning, where they were pushing their code, how many of those pushes were successful versus rolled back, and how much time was spent on rework. This isn't a traditional case of observability where you're looking at microservices, but it's an aspect you can combine with it. For example, do certain engineers push code that causes spikes and increases in latency? You don't want to use that data to punish those engineers; you want to use it to help engineers know who to work with to figure out how to solve these problems, and really create a culture of asking the right questions. Since then, I've found two things interesting. One is that the system gathering those seven terabytes into dashboards that even executives could look at was costing, when I last asked them, about five hundred dollars a month. Think about how much it cost in the old days to build monitoring systems, versus now being able to just pump data in and analyze it. It's so much cheaper.
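The pipeline described above (webhooks, event streaming, BigQuery) isn't reproduced here. As a rough stand-in, this sketch computes one of the metrics mentioned, successful versus rolled-back deployments, in memory. The `DeployEvent` schema and its `outcome` values are hypothetical; in a setup like the one described, the same aggregation would plausibly be a SQL query over the warehouse table instead.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployEvent:
    """Hypothetical shape of a CI/CD webhook payload after parsing."""
    service: str
    commit: str
    outcome: str  # assumed to be either "success" or "rolled_back"


def rollback_rates(events: list[DeployEvent]) -> dict[str, float]:
    """Fraction of deployments per service that ended in a rollback."""
    totals: dict[str, int] = defaultdict(int)
    rollbacks: dict[str, int] = defaultdict(int)
    for event in events:
        totals[event.service] += 1
        if event.outcome == "rolled_back":
            rollbacks[event.service] += 1
    # Services with no rollbacks fall back to the defaultdict's 0.
    return {service: rollbacks[service] / totals[service] for service in totals}


if __name__ == "__main__":
    events = [
        DeployEvent("checkout", "a1", "success"),
        DeployEvent("checkout", "a2", "rolled_back"),
        DeployEvent("checkout", "a3", "success"),
        DeployEvent("checkout", "a4", "success"),
        DeployEvent("search", "b1", "success"),
    ]
    print(rollback_rates(events))
```

The sketch groups by service rather than by engineer, in keeping with the point above that the data should help engineers find who to work with, not single people out for punishment.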
And that's why it's part of everything we do. The second thing is that this company came back and said they had found a breach they would never have known about if it weren't for that telemetry pipeline. So that was another huge benefit: they were able to uncover what happened and when. I'd like to think that if we had also been engaged on security, maybe that breach wouldn't have happened, but who knows. What I do take a lot of pride in is that when we started working there, executives would have fired somebody if that kind of breach had happened. What happened instead is that executives started to understand what happened and when, and they started inspecting it: how can we learn and improve from this? It was quite comical. I shouldn't say comical, because it was a very serious incident, but the shift was: you know what, assume your source code is now going to be out in the public domain. Are you proud of your source code, or do you want to make it better? That was a completely different approach from what I think an enterprise would have taken 15 or 20 years ago. And that's also part of the evolution that's happening: the competitive advantage isn't simply the source code you have; it's the combination of the systems you build, how they work together, your customer experience, and the products you assemble from those components. So that's what we do at Carrick: we sit with engineers and teams to help them learn how to use data, and we also work with executives to understand how to set goals and how to reward people based on the data, rather than penalizing people for something that looks bad. It's much more about improvement than about absolutes.

I'll give you one other story that you can choose to use or not. When you create a data-driven culture where people just assume, sure, let's gather data, you can enable completely different use cases you didn't think were possible. I'll take an example from my days at Google. When they started using solid-state drives, they didn't know when the drives were going to fail. Some engineers were concerned that if we put these drives in all of our index servers, they would all fail at the same time, because they have very similar write and usage patterns. So some engineers thought: wouldn't it be great if we tracked every single bit that we turn on and off on every single disk across our entire fleet in real time, and then moved disks around the data center based on those usage patterns? If you think about tracking every single bit that flips on every single disk, you end up with far more log data than the data being stored. But a couple of engineers built this and streamed all the data in so they could understand usage and move disks around. What's fascinating is that they realized solid-state drives were much more robust than anyone thought. They then started selling the data back and working with all the vendors to understand how to improve their drives, because no one had this kind of real-world data. So just think: when you have a culture that says yes, we can, with data, what else could you be doing as a company?

Justine, thank you so much for taking time out today to talk about this.

Absolutely.

Thank you so much for all those stories. Great insights there, and as I said, I would love to have you back on the show. Thank you.

My pleasure. Have a wonderful day.