Hello there. There are many debates on Twitter regarding the difference between observability and monitoring. I'm not going to talk about that. I'm going to focus on what observability is, and on observability-driven design, development and testing: how applications should focus on producing golden signals during the design phase, so that those signals, produced during the operations cycle, can be consumed to derive meaningful insights from the data. Let's get into the session.

I'm sure all of us have seen this mobile phone, the Nokia 3310, known for its boundless battery, talk time of an entire day, and its strength. It has many good things. At the same time, if the battery goes down, it's completely dead. Even the basic functionality of calling my mom when there is a need is not possible. I took this as an analogy for monolith applications, where the user interface, business logic and data access layer are all packed as one single unit: very easy to develop, test and also monitor. All good. But if there is a problem, the complete system is unavailable; system availability is in question. And none of us like this hard failure.

Now another example: the smartphone. We have upgraded our mobile phones into smartphones. When the battery is at 100%, I can do wonders. I can do many things: my work, chatting, sharing, accessing all my applications. When the battery goes down, before it goes completely off, it still helps me do the basic functionality like calling, doing some work, or sharing something on LinkedIn. So do you like the soft failure? Soft failure is good.

Let me apply the same thing to our software development, segregated into application and infrastructure. Coming to the infrastructure, we have great caching tools to prevent overloading of a database, and scaling in and out to launch instances when there is demand.
A load balancer takes care of routing only to healthy instances, and our containers recover automatically.

On the application side, the architecture has evolved from monolith to microservices, where we have smaller, scalable and highly available functional units. With the monolith, development and operations were operating in silos: the development team would build, release and hand over to the operations team, which took care of deployment. Microservices merged these two phases, release and deployment, into a single automated step of continuous integration and delivery, where we deliver services rapidly to support the ever-changing needs of the business economy.

With a monolith application, the system is homogeneous; like development and testing, even monitoring is simple. With microservices, there are many moving parts and they generate a myriad of data, so if something goes wrong, identifying the root cause of an issue is like finding a needle in a haystack. Even Gartner says that traditional monitoring tools leave a gap in providing the data and insights that are required. In other words, traditional monitoring is just not enough.

So what is this new buzzword, observability? Let's start with the basics. The term comes from control theory: it is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs. I'm going to share a couple of ways to produce such external outputs so that we can infer the state of the system. Logs, metrics and traces are often called the three pillars of observability, and there are many powerful tools in the open-source as well as commercial markets, like Sumo Logic, the ELK stack, Prometheus, OpenTelemetry and Jaeger; I have added some of them here. So does that mean my application is observable? Not exactly.
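As a side note on the pillars: of the three, logs are often the first signal teams make machine-consumable. Here is a minimal sketch, using nothing beyond the Python standard library, of emitting each log record as one JSON object so an aggregator like an ELK stack can index its fields. The field names ("service", "trace_id") are illustrative, not a standard.

```python
import json
import logging
import sys

# Structured-logging sketch: every record becomes one JSON object,
# so a log aggregator can filter and index on individual fields
# instead of grepping free text.
class JsonFormatter(logging.Formatter):
    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Custom fields arrive via logging's `extra` mechanism;
            # default to a placeholder when they are absent.
            "service": getattr(record, "service", "unknown"),
            "trace_id": getattr(record, "trace_id", None),
        }
        return json.dumps(payload)

logger = logging.getLogger("checkout")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Emits: {"level": "INFO", "message": "payment accepted", ...}
logger.info("payment accepted", extra={"service": "checkout", "trace_id": "abc123"})
```

The design choice is simply that every line a service writes is machine-parseable, which is what turns raw logs into a queryable signal.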
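And on the metrics pillar, the golden signals this talk keeps coming back to (latency, traffic, errors and saturation, from Google's SRE book) can be sketched as a toy in-process recorder. This is a design sketch only; a real service would export these through a metrics library such as a Prometheus client, and the class and field names here are my own.

```python
from collections import deque

# Toy recorder for the four golden signals:
# latency, traffic, errors, saturation.
class GoldenSignals:
    def __init__(self, capacity):
        self.capacity = capacity            # max in-flight requests, for saturation
        self.in_flight = 0
        self.latencies_ms = deque(maxlen=1000)  # rolling latency window
        self.requests = 0                   # traffic counter
        self.errors = 0

    def observe(self, latency_ms, ok):
        # Record one finished request.
        self.requests += 1
        self.latencies_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def snapshot(self):
        lat = sorted(self.latencies_ms)
        p50 = lat[len(lat) // 2] if lat else 0.0
        return {
            "traffic": self.requests,
            "error_rate": self.errors / self.requests if self.requests else 0.0,
            "latency_p50_ms": p50,
            "saturation": self.in_flight / self.capacity,
        }

sig = GoldenSignals(capacity=10)
sig.observe(12.0, ok=True)
sig.observe(48.0, ok=False)
print(sig.snapshot())   # traffic 2, error_rate 0.5
```

The point is that these four numbers are decided at design time, then produced by the code itself, rather than bolted on by an external monitor.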
The services generate a myriad of data, and it wasn't clear to me what needs to be observed and what can be ignored. As a philosophy, observability is knowing what is going on inside our system from the signals it produces. During operation we can only consume those signals; we cannot start producing them there. So observability shouldn't come after the fact, during the operations cycle. It should be thought through well during the design phase: observability-driven design. Design for what behavior is normal and what is abnormal, and design the policies for them. During development, build the application to produce meaningful signals; Google calls these the golden signals and covers them in detail in its SRE book. During the testing phase, alongside functional testing, we should test for verbose logging and verbose tracing: observability-driven testing. Then, during operation and monitoring, we get good signals that can be consumed. We generate those signals in operation, but how do we feed them back to development so that there is continual improvement in development quality? I'm going to add a new phase called "observe", just after monitoring and before planning, to consume all these signals and feed them back into the development phase. Here we get a new cycle.

In order to achieve that, there are some must-have architectural patterns. We have seen them in philosophy; let's see how to achieve them.

Number one, monitoring. It is as important as the system itself, and it should not be conflated with observability at all. It is not just for identifying application issues or security issues, but also for planning and decision making, to ensure a more reliable system. At the same time, aiming to monitor everything is an anti-pattern.

Number two, distributed tracing.
How many of us have debugged a microservices application under fire, especially during an outage? Thanks to Dapper, the Google research paper on distributed tracing, the detailed concepts of spans and traces are well explained. Tools like OpenTracing, now merged into OpenTelemetry, have implemented it, and we need to add traceability into our systems.

Number three, event-driven. REST is not an all-in-one thing. It could have happened to all of us, even with applications like Google's, that there is slowness for obvious reasons. Gregor Hohpe detailed in Enterprise Integration Patterns how to make systems event-driven and loosely coupled.

Number four, retry. Most of us have implemented retry logic: if some API is not responding, retry after two seconds, three seconds, or some fixed amount of time. Now tell me, what is the use of retrying every two or three seconds if the application on the other side is in maintenance mode? We need an effective retrying strategy, like exponentially increasing backoff.

Number five, fail fast and fall back. What does it actually mean? Is failure a desirable state, or what? It's basically a behavior: service A can wait on service B's response for some time, and if it is not responding, just go to plan B, the fallback method.

All right, let me recap. We started with the monolith, and we learned a lot from hard failures. Then we started building microservices, where system availability is not in question; it's ensured even when there are some failures. Now it's time to build observable microservices with built-in resiliency mechanisms. Essentially, the observability culture changes the way we think about the development, testing and operation processes, and it injects an operational mindset into development. In doing so, it increases the resiliency of our applications and makes operations and monitoring very simple.
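Coming back to pattern number two: the span model from Dapper can be sketched in a few lines. This is an illustrative toy, not the OpenTelemetry API; the idea is only that every span carries the trace id of the whole request plus its parent's span id, so a backend can reassemble the call tree across services.

```python
import uuid

# Bare-bones span propagation in the spirit of Dapper:
# trace_id identifies the whole request, parent_id links the call tree.
class Span:
    def __init__(self, name, trace_id=None, parent_id=None):
        self.name = name
        self.trace_id = trace_id or uuid.uuid4().hex  # new trace at the entry point
        self.span_id = uuid.uuid4().hex               # unique per operation
        self.parent_id = parent_id

    def child(self, name):
        # A downstream call inherits the trace id and points back at us.
        return Span(name, trace_id=self.trace_id, parent_id=self.span_id)

root = Span("checkout")               # request enters the system here
payment = root.child("payment-api")   # downstream service call
assert payment.trace_id == root.trace_id
assert payment.parent_id == root.span_id
```

In a real system these two ids travel between services as request headers, which is exactly the "traceability" the pattern asks us to build in.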
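For pattern number three, the loose coupling can be sketched with an in-process queue standing in for a real broker such as Kafka or RabbitMQ. All names here are illustrative; the point is that the producer returns immediately instead of blocking on a slow synchronous call.

```python
import queue
import threading

# Event-driven decoupling sketch: the producer publishes and returns;
# a consumer drains the queue at its own pace.
events = queue.Queue()

def place_order(order_id):
    # Publish and return; no waiting on the downstream service.
    events.put({"type": "order_placed", "order_id": order_id})
    return "accepted"

processed = []

def consumer():
    while True:
        event = events.get()
        if event is None:          # sentinel to stop the worker
            break
        processed.append(event["order_id"])
        events.task_done()

worker = threading.Thread(target=consumer)
worker.start()
place_order(1)
place_order(2)
events.put(None)
worker.join()
print(processed)   # both orders handled asynchronously, in order
```

Swapping the in-process queue for a durable broker adds buffering during outages too: the producer keeps accepting work while the consumer is down, which is the soft failure we wanted.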
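And patterns four and five together: retry with exponential backoff and jitter instead of a fixed two-or-three-second interval, falling back to plan B once the retry budget is exhausted. A minimal sketch; the function and parameter names are my own.

```python
import random
import time

# Retry with exponential backoff + jitter, then fall back ("plan B").
def call_with_retry(primary, fallback, max_attempts=4, base_delay=0.1, sleep=time.sleep):
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            return primary()
        except ConnectionError:
            if attempt == max_attempts - 1:
                break
            # Double the wait each time, with random jitter so a fleet
            # of clients does not retry in lockstep.
            sleep(delay + random.uniform(0, delay))
            delay *= 2
    return fallback()   # fail fast into plan B rather than retry forever

# Usage: a flaky dependency that succeeds on the third try.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("service B not responding")
    return "live response"

print(call_with_retry(flaky, lambda: "cached response", sleep=lambda _: None))
# With a smaller retry budget than the dependency needs, we land on plan B:
calls["n"] = 0
print(call_with_retry(flaky, lambda: "cached response", max_attempts=2, sleep=lambda _: None))
```

The fallback here is a stale cached response, but it could equally be a default value or a degraded feature; the design choice is that the caller always gets an answer in bounded time.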
It's not a one-time heavy-lifting job; it has to be a continual process we should embrace. I come from Imaginea Technologies, which does consultancy as well as product development. We have a rapid application development tool called WaveMaker, an API platform, and HyScale for container orchestration. Thank you very much for joining me, and thanks to CNCF. Thank you.