So folks, just to introduce myself, I head the engineering team at Capillary Technologies. We provide cloud-based SaaS products to large retailers and other consumer-facing businesses to help them engage better with their end customers. We are present in about 30 countries at this point and operating at a fairly decent scale: we touch almost 650 million consumers across those countries, processing close to 15 billion line items and around 3 to 4 billion transactions every year. We are based in Bangalore, with a small dev center in Shanghai, China as well. And thanks to Abha for that insightful talk. What caught my attention was that $18-per-day figure, since I'm always looking to optimize my engineering expenditure; I'll be reaching out to understand that in slightly more detail. All right, thanks folks for joining. The agenda for my talk is fairly straightforward: it's more of a 101-level talk on observability. I'll take the initial five minutes to set context on how the concept of observability came into being over the last five to six years, how the trend has progressed, what the three core pillars of observability are, and how they tie into the related areas of monitoring and alerting. Towards the end, we'll spend a couple of minutes on emerging trends in the observability space, beyond basic metrics, monitoring and alerting, and after that we can open up for questions and discussion with the audience. Let's just dive in. As many of you know, release to production is just the beginning of the life cycle of any software system.
There have been many studies, and plenty of anecdotal evidence as well, suggesting that 40 to 90% of the total cost of software is actually incurred after launch: maintenance, keeping the uptime, keeping performance healthy, pushing bug fixes, patching. A significant amount of cost goes in beyond just the first production release of the system. And as systems become more complex and more distributed, and as a very famous VC rightly said, software is eating the world, software is touching more and more parts of our lives. The consequence is that when systems fail, the impact on our day-to-day lives is becoming more and more prominent. And as systems become complex, they will fail; there is no doubt about it. No matter how hard you try to build a scalable, highly available system, it will eventually fail; you may be up towards the tail end of your availability numbers, but the system will fail eventually. As Amazon's CTO Werner Vogels keeps saying, everything fails all the time. The number of failure points in a distributed system increases with every new component you add: the more parts, and the more interactions between components, the more chances of failure you are introducing into your system. So people say, fine, we can have a very strong monitoring framework on our systems to alert us when something is going wrong, and we can just go and fix it. But the question that comes to mind is: what do you monitor? Let's do a very simple case study, something most of you will be familiar with.
Take a very simple RESTful API server: you're serving REST APIs over HTTP, probably using JSON as your request/response structure. What could be the potential points to monitor? Your API latencies: the 90th percentile, average, 99th percentile. If you are running on containers or on VMs, your CPU usage, load average, memory stats; if you are a JVM-based application, your heap usage; your HTTP error codes, exception rate, failure rate; and if you depend on external systems, the latencies of those external APIs, which could be impacting your application latency as well. And many, many more. These are just a few high-level sample points we could potentially monitor on this one API application. You may then want to multiply that by the number of servers you are running, because for redundancy you will obviously run multiple servers with the same API server code base. And if you are into disaster recovery and run a backup cluster or a hot-cold kind of setup, you also multiply by the number of deployments or clusters you have. A simple back-of-the-envelope calculation will tell you that you are looking at close to 400 to 500, maybe even 1,000 separate data points that you may want to monitor. And this is just for one REST service at the top of your stack; there could be multiple such services in your overall deployment. So it's really obvious that this does not scale, right? You can't have a monitoring setup where humans are watching thousands of metrics across so many servers and applications; you will go crazy.
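The back-of-the-envelope calculation above can be sketched in a few lines. All the figures here are illustrative assumptions, not measurements:

```python
# Rough estimate of monitorable data points for one REST service,
# per the talk's case study. Every number below is an assumption.

metrics_per_instance = 15   # latencies (p50/p90/p99), CPU, memory, heap,
                            # HTTP error codes, external-dependency latency...
instances = 10              # replicas run for redundancy
deployments = 3             # e.g. primary, DR/backup cluster, staging

total_data_points = metrics_per_instance * instances * deployments
```

With these modest assumptions you already land at 450 data points for a single service, which is how the hundreds-to-a-thousand figure falls out.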
You would have to deploy a much, much larger team just to watch these metrics. So let's step back and understand what monitoring is. In theoretical terms, monitoring means capturing the state of a system to determine its health. A couple of commonly used methods: you have health checks, which ask, first, is the service actually up and running, and second, can you send work to it? In the container ecosystem you have the liveness probe to check whether your service is running, and the readiness probe to check whether the application is ready to accept work. And you have metrics: system, application and functional; we'll do a deeper dive into these in the subsequent slides. Once you have these health checks and metrics, you can define anomalies on them. For instance, if your latencies go beyond a certain threshold, which you have set as the acceptable number for your consumer, you can define that as anomalous behavior. Alerts are usually set up on known failures: things that you already know can go wrong, for which you define a threshold and then an alert on top of it. It's a fairly knowledge-based, reactive, post-outage process. You have an outage, you realize something was going wrong and your system was not alerting on it, so you add an alert so that in future you can catch that anomalous behavior proactively. But what about the unknown failures? You can only put alerts on things that you already know could potentially go wrong.
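A minimal sketch of what "define an anomaly on a threshold" looks like in code: flag a window whose p99 latency exceeds the number agreed with your consumer. The threshold and samples are made-up values:

```python
# Threshold-based alert on a known failure mode: p99 latency too high.
# Uses a simple nearest-rank percentile; threshold is illustrative.

def p99(samples):
    ordered = sorted(samples)
    idx = max(0, int(0.99 * len(ordered)) - 1)  # nearest-rank index
    return ordered[idx]

LATENCY_THRESHOLD_MS = 500  # the "acceptable number for your consumer"

def check_latency_alert(samples_ms):
    observed = p99(samples_ms)
    return observed > LATENCY_THRESHOLD_MS, observed

# a healthy window vs. a degraded one (sample data is made up)
alert_healthy, _ = check_latency_alert([120, 180, 210, 250, 300])
alert_degraded, _ = check_latency_alert([120, 180, 210, 900, 1200])
```

Note this only catches the failure mode you anticipated, which is exactly the limitation being described: the unknown failures never trip a threshold you did not think to set.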
But as the complexity of the system increases, as you become more distributed and add more integrations and external dependencies, the number of unknown failures also increases. That's where observability comes into the picture very nicely: it helps you keep a tab on these unknown failures. I started my career back in 2007-2008 at Yahoo. At Yahoo, the dev team had zero access to production systems, and the message you see on the slide is something I lived through for two or three years: once my code had gone to production, I did not even know whether it was running or not. You handed it over to the ops team, let them manage the systems, and even if things came crashing down, it was not our problem; devs simply did not have access to production systems. But over the last decade or so, with the pace at which developers are building and shipping software, this barrier between ops and dev has been getting thinner and thinner. And this is one of the very popular and remarkable posts by Copy Construct, Cindy Sridharan; I'm a big fan of her writing. She posted this article in 2017, and it's a highly recommended read. The lighter take goes: observability is there because devs don't like to do monitoring, so you need to package it in new nomenclature to make it palatable and trendy. It's a tongue-in-cheek definition, but in some form you are essentially taking monitoring, keeping the systems up and running, and the operational aspects of your infra, and bringing them closer to the developers.
Baron Schwartz is one of the well-known names in relational databases, one of the chief architects at Percona, and also the founder of VividCortex; I'm not sure if you folks have used it, but it's a very nice tool for monitoring and keeping your databases healthy. He puts it in a very clear and lucid way: monitoring tells you whether a system is working; observability helps you understand why it is not working. That's why, as the ecosystem becomes more and more complex, observability is picking up at a similar pace. A slightly theoretical definition: observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. That might seem too fancy or too verbose, but the key idea is that you are trying to measure the internal states of your application by looking at its external outputs and inferring what's going on inside. I consider it fairly analogous to medical diagnosis. To diagnose what's going wrong inside a human body, doctors don't just start opening it up with surgery; they try to infer what's going on by looking at external outputs: your blood pressure, your heart rate, your temperature, your pulse, all those external signals your body is sending out. Observability of software systems is an analog of that: you are trying to observe what's going on inside the system by looking at the outputs your applications emit. So what could the potential internal states of an application be? They are very context-specific.
For instance, for a web server, the internal states could be characterized as availability and uptime (is the server actually responding to my requests?), incoming request rates, response latency, and failure rates. For an application service, it could be functional success and failure rates; for a message queue, it could be queue length and the number of active producers and consumers at any point in time. These are fairly context-specific states that you define depending on your application architecture. And the key point here is that to measure internal states, you have to instrument your code while you're writing it. It cannot be an afterthought. That's something I've seen newer engineers do: they finish the whole application development, and only when it's ready to go to UAT or to the next stage of the development process do they start retrofitting instrumentation. I strongly discourage that. While writing code, ask yourself: is this a point where I want to measure the internal state of my application? If yes, instrument it. External outputs are where the three pillars of observability come in: metrics, logging and tracing. Health checks are also a form of observability; I club them under metrics, but the boundary is fairly hazy, so feel free to classify them wherever you want. For the sake of this discussion, they're part of the metrics chapter. Let's spend a bit of time on each of these three pillars. What are metrics? Metrics are essentially the external state measured at a broad scope, and that broad scope is the time dimension: you will always look at metrics with time as one of the axes.
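To make the "instrument while you write, not as an afterthought" point concrete, here is a toy in-process counter registry. It stands in for a real client library such as prometheus_client; the metric names and the `place_order` function are invented for illustration:

```python
import threading
from collections import defaultdict

# Toy metric registry -- a stand-in for a real metrics client.
class Counter:
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()
    def inc(self, n=1):
        with self._lock:
            self._value += n
    def value(self):
        return self._value

REGISTRY = defaultdict(Counter)

def place_order(order):
    # Instrumentation written alongside the business logic: count every
    # attempt, success and failure of this internal state transition.
    REGISTRY["orders_total"].inc()
    try:
        if not order.get("items"):
            raise ValueError("empty cart")
        REGISTRY["orders_success_total"].inc()
        return True
    except ValueError:
        REGISTRY["orders_failed_total"].inc()
        return False

place_order({"items": ["sku-1"]})   # succeeds
place_order({"items": []})          # fails, and the failure is counted
```

The point is that the counters live next to the state transitions they measure, so the external outputs reflect the internal states by construction.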
For instance: what was the performance of my APIs between 8 a.m. and 9 a.m. IST? What was the incoming request rate in a particular slice of time? You always have a time axis available when you're talking about metrics. There are system metrics: CPU utilization, memory, swap, network and other system parameters. There are application metrics: success rates, failure rates, error codes; again, these are application-specific and depend on your architecture and design. And lastly, business and functional metrics, which are more relevant to your business and product managers: order rates; in a payments system, the number of successful payments, reversed payments, reconciliation requests; in a couponing system, the number of coupons you're issuing and the number getting redeemed successfully. It's good to classify metrics into these three buckets; it helps you design the metrics better and build your dashboards aligned to the consumers and stakeholders of those metrics. What should be the rule for deciding what to emit and what not to emit? I follow a blanket rule of thumb: be as generous as you want when emitting metrics. If you want good guidelines on designing metrics, I'd highly recommend the article "Metrics That Matter", published by ACM; it's a very nice framework for deciding whether something is a metric you want to track. So: be generous on metrics, but be judicious on alerts, because you don't want alert fatigue.
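The "metrics always have a time axis" idea can be sketched by bucketing raw request timestamps into per-minute counts, which is roughly the shape a time-series store like Prometheus or InfluxDB holds. The timestamps here are made up:

```python
from collections import Counter
from datetime import datetime, timezone

# Bucket raw request timestamps into per-minute counts: the incoming
# request rate over a slice of time. Sample timestamps are invented.

def minute_bucket(ts):
    return ts.replace(second=0, microsecond=0)

requests = [
    datetime(2021, 6, 5, 8, 0, 12, tzinfo=timezone.utc),
    datetime(2021, 6, 5, 8, 0, 47, tzinfo=timezone.utc),
    datetime(2021, 6, 5, 8, 1, 3, tzinfo=timezone.utc),
]

rate_per_minute = Counter(minute_bucket(ts) for ts in requests)
# two requests landed in the 08:00 bucket, one in the 08:01 bucket
```

Querying "what was the request rate between 8:00 and 8:01" then becomes a lookup on the time axis rather than a scan of raw events.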
Going back to the simple case study we discussed: you don't want to alert on 1,000 different data points. Your NOC team or your on-calls will go crazy with the volume of alerts you generate. Another common mistake I've seen is stuffing a lot of dimensions or data points onto metrics. It's highly advisable that your metrics have low cardinality on the metadata you attach to them: avoid attaching user-specific IDs, order IDs or entity IDs to a metric. Metrics provide system summaries in some form, and they help you answer questions like: how many transactions failed, how many logins succeeded, how many orders are being processed, how many payments are being processed. They answer "how many" kinds of questions. That's a simple way to decide whether something is worth a metric. Common tools to capture metrics: Prometheus, which was already touched upon in a deep dive; InfluxDB, again a fairly popular tool; TimescaleDB, one of the more recent additions in the last two to three years. Graphite is one of the old-timers among time-series databases, although I'm personally not a fan of Graphite because its data model is very flat and not very intuitive to people coming from a SQL-ish background. OpenTSDB is again a fairly old time-series database that runs on top of HBase. Scuba is a very nice architecture from Facebook, with a very nice paper, highly recommended if you want to go deep into the algorithms that optimize the read and write paths of a metric store.
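The cardinality warning can be made concrete with a little arithmetic: labels multiply into distinct time series, so a per-entity ID turns a handful of series into millions. All figures below are illustrative:

```python
# Metric cardinality: every distinct label combination is a separate
# time series in the store. Bounded labels (endpoint, status class)
# stay cheap; a per-order ID label explodes. Numbers are made up.

endpoints = ["/cart", "/checkout", "/login"]
statuses = ["2xx", "4xx", "5xx"]
orders_per_day = 1_000_000

low_cardinality_series = len(endpoints) * len(statuses)
high_cardinality_series = low_cardinality_series * orders_per_day
# 9 series vs. 9,000,000 series -- the order ID belongs in logs, not metrics
```

This is why per-entity questions ("why did order X fail?") belong to logging, the second pillar, rather than to metrics.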
Apache Druid is also used for metric storage, though it's largely an event storage and event analytics system, and Grafana for visualization. One common pattern and data structure you will see across these metric stores is the log-structured merge tree (LSM tree), because metric stores by nature have to be extremely scalable on writes compared to reads: you are writing at a much, much faster rate than you are reading. I think the earlier speaker also touched upon handling close to 900,000 metrics per second; obviously the read throughput is at a much smaller scale than the write throughput. Coming to health checks: as I said, is my service running (in the container world, liveness probes), and can I send work to it, is my application ready to respond (readiness probes)? What are the different ways to collect health checks? In a lot of peer-to-peer distributed databases you would have heard the term gossip protocol, very popular in Cassandra and Riak: whenever a node enters the cluster, it broadcasts its availability and health over gossip. There is also the service registry pattern: a service comes up, registers itself as available to serve traffic, and when it goes through a downtime it is deregistered from the registry; that's how it propagates its health to the rest of the services in the system. And the older, very reliable mechanisms do health checks via load balancers: ELBs, HAProxy, NGINX. They're very common and have been in use for many, many years. Coming to the second pillar: logging.
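A minimal sketch of the two health endpoints a load balancer or kubelet would poll. The paths and the `READY` flag are illustrative conventions; a real service would wire readiness to its actual dependencies (database, caches) instead of a bare flag:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

READY = True  # flip to False while warming up or draining

class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/healthz":      # liveness: the process is alive
            self._reply(200, {"status": "alive"})
        elif self.path == "/readyz":     # readiness: can accept work
            self._reply(200 if READY else 503, {"ready": READY})
        else:
            self._reply(404, {"error": "not found"})

    def _reply(self, code, body):
        payload = json.dumps(body).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):        # keep the demo quiet
        pass

# serve on an ephemeral port and probe it once, as an LB would
server = HTTPServer(("127.0.0.1", 0), HealthHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
liveness_status = urllib.request.urlopen(
    f"http://127.0.0.1:{port}/healthz").status
server.shutdown()
```

The liveness/readiness split matters because a process can be alive but unable to take work (warming caches, draining connections), and routing traffic to it anyway is itself a failure mode.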
The way to understand logging is that it helps you deep-dive at a much smaller scope, and not on the time dimension but on an entity dimension: a request, a customer, a transaction. Logging helps you answer "why" questions: why couldn't a customer place an order? Why did a transaction fail? Why did a checkout on an e-commerce application fail? Why couldn't I add a product to a cart? The scope is much smaller: you're not aggregating across a large number of requests, you're dissecting a single request. Logging has to be centralized; here come the log collection and aggregation technologies: Fluentd, Logstash and tons of other agents out there, with Fluentd being popular in the cloud-native ecosystem. Logs have to be searchable; that's where indexing technologies like Elasticsearch come in. In fact, index-free logging is also becoming quite popular these days: Loki, the last tool you see, has come into the Grafana ecosystem, and they're doing a really kick-ass job promoting the index-free logging approach. And logs have to be correlatable by a common key, because in a fairly large distributed system your request touches multiple services, and you have to tie things together to arrive at a common trace of the request. A request ID is the most commonly used parameter; a lot of commonly available load balancers can inject a UUID as the request ID as soon as a request enters your system. Standard tools these days: the ELK stack, paid SaaS technologies like Splunk and Sumo Logic, and Loki, as I mentioned; the index-free logging trend is picking up. The anatomy of a log line is very simple: timestamp, level, service, commit ID, build version, region, customer ID, trace ID and so on.
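A small sketch of structured, correlatable logging: every line is JSON and carries the request ID injected at the edge, so lines from multiple services can be tied back to one request. The service and field names are illustrative:

```python
import json
import logging
import uuid
from io import StringIO

# Structured JSON logging with a request_id correlation key.
# Field names and the "cart-service" name are invented for the demo.

stream = StringIO()                      # stand-in for a log shipper
logger = logging.getLogger("cart-service-demo")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.INFO)

def log_event(request_id, level, message, **fields):
    logger.info(json.dumps({
        "request_id": request_id,        # the common correlation key
        "level": level,
        "service": "cart-service",
        "message": message,
        **fields,
    }))

request_id = str(uuid.uuid4())           # normally injected at the LB
log_event(request_id, "ERROR", "checkout failed", customer_id="c-42")

record = json.loads(stream.getvalue())   # every line parses back cleanly
```

Because every line is machine-parseable and keyed by `request_id`, answering "why did this customer's checkout fail?" is a filtered search rather than a grep across free-form text.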
Having well-structured logs from day one helps you get the maximum value out of your logging system. Most popular libraries allow you to define your appenders and your logging format; it's a very common practice across languages and libraries. Third pillar: tracing. If you're working in a software stack where each of multiple services is responsible for only a small component of your request processing, you may want to visualize and trace what happened in each part of that processing; that's where tracing comes into the observability space. It became popular after Google published its Dapper paper, almost 10 to 11 years ago. Twitter came up with an open-source tracing system called Zipkin; then the developer community got together and published the OpenTracing standard, with Jaeger as a very popular implementation. And with a lot of the application performance monitoring tools like New Relic, Datadog or AppDynamics, tracing comes out of the box. It's a very useful tool for zooming in on which particular component of your request processing is slowing down and how much time it is taking; that's where the whole concept of spans and traces comes into the picture. If you look at the observability spectrum: you have health checks and some form of metrics that let you catch the known unknowns. Then comes the debugging and exploration space, where the other part of metrics helps you query and understand what went wrong; you're trying to discover the unknown unknowns. Tracing and logging help you with the debugging and exploration part, while health checks and your basic health metrics serve monitoring and resiliency.
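A toy illustration of spans and traces. This is not the OpenTelemetry or Zipkin API; the recorder, names and timings are invented purely to show the shape of the idea (each unit of work records its name, parent and duration under one trace ID):

```python
import time
from contextlib import contextmanager

SPANS = []  # in a real system these are exported to a tracing backend

@contextmanager
def span(trace_id, name, parent=None):
    start = time.perf_counter()
    try:
        yield
    finally:
        SPANS.append({
            "trace_id": trace_id,
            "name": name,
            "parent": parent,
            "duration_ms": (time.perf_counter() - start) * 1000,
        })

# one request, three spans: a parent and two children, so we can see
# which step of the checkout is eating the time
with span("trace-1", "checkout"):
    with span("trace-1", "reserve-inventory", parent="checkout"):
        time.sleep(0.01)   # pretend work
    with span("trace-1", "charge-payment", parent="checkout"):
        time.sleep(0.01)   # pretend work
```

Laid out on a timeline by `parent`, this is exactly the waterfall view a tracing UI renders: the parent span's duration bounds its children, and the slowest child stands out.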
We have discussed metrics, logs and tracing. But coming back to the problem we started with: what do you want to alert on? Enter service level objectives, a term popularized heavily by Google's SRE teams. Folks, bear with me; if the audience is fairly mature this might seem repetitive, but for the newer folks, the SLO is a very nice concept, and something I promote very heavily in my teams. An SLO is nothing but a simple, quantifiable and measurable goal for a service, and that goal should be linked to the user experience or a delight factor. As engineers we take a lot of pride in building scalable systems, but at the end of the day there's a user trying to derive value from the system you have built. SLOs help you tie the technical health of your systems to that user experience and delight factor. And SLOs should be defined before you start writing code: work backwards, define the SLO first, and then start writing your service. Have as few SLOs as possible, and make them representative of your system's behavior. You can get a deeper insight into SLOs by reading through this article or watching this talk on YouTube. Let's do a simple exercise on defining SLOs, which I'll open up for comments from the audience. Assume you are responsible for writing the service that manages the cart on an e-commerce portal. What could be the potential SLOs you would define for such a service? Pour in your answers in the comments; that works.
Or shall we open it up? I mean, if somebody wants to speak, they can unmute. How do you want to do it? Yeah, so as I said earlier in the previous talk as well, we have around 10 people watching on the YouTube live stream, and more people joined for this second talk who are all here. Thank you, Piyush, for such an insightful talk. While this was more of a 101, sometimes, as you correctly said, you need to start from scratch; there are always people coming into the practice new, and they have to learn first principles, which is something we always talk about. Chasing fancy tools is not always great; you need to understand how to use that data in the right way to arrive at your decisions. So, yeah, with that in mind... if you want me to finish, I'll probably just take a few minutes and touch upon the... Oh, sorry, I'm so sorry, I thought you were opening up for questions. I'd like to make this a collaborative thing rather than a monologue, and let the audience come in. Awesome, sure, we can do that format also. So, folks, since Piyush wants it that way, feel free to have a conversation with him; unmute yourself and interact as he directs. Please go ahead, Piyush. Sorry, I read that wrong; apologies. Yeah, no worries. So folks, again, assume that you are writing or managing a service with one of these three functionalities. It could be a cart service; it could be an authentication/authorization service, say you have built something like Okta or Auth0; or maybe a communication gateway sending out SMSes, emails, push notifications. What could be the potential SLOs you would design for these kinds of applications?
So it's open for thoughts, folks; feel free to send comments or speak up. Let me pitch in here while folks are still deciding: I'll go for the second one, authentication and authorization. One important SLO there would be: out of some number of authentication requests, how many are succeeding? And what is the latency? Those are the sort of base metrics we need, along with how many auth failures are happening. They could happen due to wrong logic, due to unavailability of the service, or due to an edge case in the authorization logic, because more often than not authentication is simple; authorization is where it becomes much more complicated to plan out the whole workflow and execute it. So those edge cases, and how we navigate them, should feed the SLOs for those two services. Right, absolutely, that makes a lot of sense. The idea is that SLOs are representative of what your service does. Going back to the same example: a CPU spike can cause high latency, which can cause a failure in my authentication operation. There could be a thousand underlying reasons that cause the behavior to deviate from the expected norm, but keeping a tight watch on the expected behavior helps you capture multiple failure modes in a single metric. So having alerts on the SLOs makes a lot more sense on a complex application, and makes the job a lot easier; you can always use the deeper-level metrics for analysis and troubleshooting when the SLO is not being met.
Thinking of an SLO-first approach helps a lot in designing your architecture, your routing and your monitoring in a much more communicable way. Yeah, please go on. Yeah, I mean, it should be business-metric driven, right? SLOs are mostly business metrics, and then you drill down to the deeper system-level, OS-level stuff. Right, exactly. Similarly, for the cart service it would be the number of products getting added to carts, the number of quantity changes succeeding on a cart, the number of successful checkouts. Those are the SLOs around which I should be designing my metrics, and essentially my alerts as well. And similarly, for a communication system it could be the number of successful SMSes I'm sending out, the number of incoming requests, the number of messages successfully submitted to FCM or SendGrid, rather than worrying about lower-level nuances like email payload sizes. And lastly, why should we spend time on observability? Obviously there are the technical and operational benefits: you can build scalable systems, you can do better capacity and load planning, and it leads to a lower mean time to repair. But besides that, having run large teams for a very long time, one thing I've realized is that having observability as a first-class citizen in the team creates a much more data-driven culture. The conversations get driven around: what metrics have you added? Do we have sufficient alerts on the SLOs? Your teams start talking in terms of data rather than in subjective terms like "is your service healthy?" or "can your service handle my scale?"
Things become a lot more data-driven, your outcomes become measurable, and subjectivity is removed. For example, I have a group of architects, and I give them quarterly goals like: here is a service operating at 700 milliseconds latency at 10,000 RPM; your objective for the next quarter is to bring the latency at 10,000 RPM down to 500 ms. They have a clear, measurable goal, with no subjectivity. Someone cannot come back to me and say "I changed the whole data structure, I changed the algorithm, I optimized the code": if the API latency has not moved, something has gone wrong, either the hypothesis or the implementation. It also brings a lot more accountability across teams. I have had so many product managers come to me saying their experiments are not working because the implementation is unstable or buggy. The way I solve it is this: when the tech team rolls out a feature, I ask the product managers to give me three KPIs which define that the implementation is successful. If those KPIs are met, the next analysis is entirely about the product hypothesis: was the hypothesis correct? Was the data analyzed correctly? Has the tech team delivered the features in a stable way? The product team can then go and conduct their experiments. It brings accountability because you're talking data; there's no subjectivity and no emotion in the conversation. Also, given how deep in the stack DevOps and observability sit, it's very easy to lose sight of the fact that you are actually doing all of this for a product. It's not about a single server; it's about the impact it's having. So SLOs help us prioritize the product thinking.
I think for me, as an engineering leader and manager, the accountability aspect is something that observability helps a lot with, right? The engineering groups commit to agreed SLA and SLO numbers with each other, right? Say I'm a consumer of the authentication service. I can clearly go and tell that team: I foresee traffic of at least 5,000 RPM coming in the next quarter; can you guarantee a 95th-percentile SLA of 50 milliseconds to me, so that I can commit a 100 millisecond SLA to the consumers of my service, right? So this level of accountability becomes much easier when you have observability as part of your culture and part of your implementation, right? Talking about standards: given the plethora of metrics-collection tools and technologies coming into the picture, the developer community came together and started defining standards that technologies can abide by. OpenMetrics, one of the standards which came about three to four years ago, standardizes the structure of metrics and how you emit them, so that you can replace the metrics backend seamlessly. And OpenTelemetry is a more recent standard which is getting a lot of adoption; it's basically a merger of the OpenCensus and OpenTracing projects, right? They have defined very clear APIs and SDKs to cover the pillars that we discussed: metrics, tracing, and context, where context is the generic broad term covering logging and annotations on top, right? The OpenTelemetry APIs and SDKs are open source and vendor-neutral. So maybe tomorrow you want to replace your metrics backend, say move away from proprietary vendors like New Relic or Datadog to something open source like Prometheus.
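As an aside, a minimal sketch of the latency-budgeting arithmetic from the authentication example above: committing 100 ms to my consumers while a dependency commits 50 ms to me. This treats the budget as a simple serial worst case, which is a simplification (percentiles do not add exactly), and the numbers are illustrative:

```python
# Hypothetical sketch of SLA latency budgeting between teams.
# Assumption: dependencies are called serially, so their budgets
# subtract directly from what I committed to my own consumers.

def remaining_budget_ms(committed_ms: float, dependency_budgets_ms) -> float:
    """Latency budget left for my own work after serial dependency calls."""
    return committed_ms - sum(dependency_budgets_ms)

# I commit 100 ms; the auth service commits 50 ms to me.
print(remaining_budget_ms(100.0, [50.0]))  # 50.0 ms for my own processing
```

The arithmetic is trivial, but making it explicit is what lets teams negotiate SLAs with each other in numbers rather than in vague assurances.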
Having an OpenTelemetry implementation will allow you to do that seamlessly, right? It also avoids vendor lock-in: you're not locked into a vendor, and you don't have to commit to long-term usage of a particular vendor, essentially, right? And one of the recent trends in observability, beyond just application metrics and monitoring, is data observability. This is something which I have started looking into in the last few months, right? With the advent of large data systems, and companies like Snowflake coming into the picture who are making data analysis and analytics at a much, much larger scale seamless and a commodity, data observability has become a recent trend. The idea is that since systems are becoming more and more integrated, you're collecting a lot more data from multiple sources, and the quality of the data, the correctness of the data, the availability of the data, the freshness of the data have become a lot more important, because your data teams and business teams need reliable data, right? They have to be sure that the numbers they are looking at are based on accurate data, right? So a lot of startups have sprung up in this space in the last 10 to 15 months, right? Acceldata, an Indian company, Monte Carlo, Soda, right? They're building some really nice platforms for observing your data very, very closely, right? So the five pillars of data observability are freshness, volume, schema, distribution, and lineage. There are some reference articles you may want to read up on, right? Freshness simply means: is my data up to date? Is my data two days old, three days old? The inferences and analysis I'm looking at, are they recent or are they slightly stale, right? Volume could be: I expect to ingest at least 500 gigs of data on a daily basis, and suddenly we see a dip of 200 gigs, right? So what happened?
Is my volume reduction organic, or is it a symptom of a failure somewhere, right? Schema essentially means that since I'm connecting to multiple systems these days, a slight change in a schema or a JSON format at one endpoint could cause my whole data pipeline to crash, essentially, right? So it's about ensuring the integrity of the schema across different integration touch points, right? Distribution, again, is about how you're tracking the data pipeline: you have data coming in from multiple sources, relaying and exchanging data over different APIs and through different steps in the pipeline, and you want to ensure that the whole exchange along the pipeline is healthy. Lineage, again, is something very, very critical, especially for enterprise systems like the ones we build, right? User data is changing over time, users are updating their profiles over time; how do you draw the lineage graph of that data? How do you arrive at the current state of a particular entity, essentially, right? So these kinds of questions can be answered by a data observability platform. But again, it's a fairly new trend, I would say not more than a couple of years old, so there's still a lot more innovation to happen in this area. But I feel, with the growth of data systems globally, this is going to be a very popular, very closely watched trend in the coming years, right? That's all folks, happy observing. And I strongly recommend folks read some of the references; even if you are a mature observability practitioner, running through these articles will definitely help. I circle back to these articles every six to eight months, to be honest, right? Yeah, I'll open it up for Q&A, and if you want to reach out to me, that's my Twitter handle; you'll be able to drop me a message or anything.