So, I will talk about what observability is, why we need it, observability versus monitoring, because it's a common question, right: is observability the same as monitoring, and what are the differences? Then the pillars of observability, observability with Kubernetes built-in tools and kubectl, extending a Kubernetes cluster to use external observability tools, some best practices for observability in Kubernetes, and I'll share some links and some insight into how cloud native teams observe their clusters.

So, what is observability? From the English word, right, to observe is basically to know what's going on with something. So observability is the process of gaining insight into your application, its behavior, its performance, in order to identify issues, or basically to have, let's say, a bird's-eye view of what's happening, so you can make the system more efficient, and when you do have issues, diagnose and resolve them. For Kubernetes, it's about ensuring the stability of workloads and reducing downtime. Kubernetes is a lot, right? It's a cluster system with pods, nodes, several moving parts, so that's the essence of observability.

Why do we need observability? Over the years, the way we build software has changed, from monoliths to service-oriented, and now microservices and beyond. With monoliths, everything is in one place, deployed on one server. With service-oriented, maybe you have the web application and the database separated into three or four layers. Then with microservices, at the companies behind the services we use, like Netflix or Uber, there are hundreds of microservices deployed across several clusters, in different cloud environments, some on-prem.
To look at it in real-world terms: I currently have one PC, and it's easy to keep an eye on one PC. If I have three laptops, I can still watch all three, but it's already a lot harder to watch three at the same time than one. Now, when you have hundreds of services, you can't really look at every service at the same time or check each log manually. That is the essence of observability: bringing together tools and techniques to get insight into hundreds of services at once, setting up alerts, notifications, dashboards and so on, to make sure your system is performing as it should.

"Without observability, you can't understand what's happening in your systems and you can't fix your problems." I wasn't able to find out who exactly said this, so whoever did, if you're listening to this talk, thank you, this is good.

Observability versus monitoring. I don't know if you read the description of my talk, where I mentioned some confusion and struggles and a hard-won understanding of observability in Kubernetes. This is a popular point of confusion, and if you check a lot of the articles written on observability, you'll see they differentiate between observability and monitoring. In short: monitoring is collecting data from a cluster and its components, nodes, pods, containers, to ensure they're all performing as expected. Observability, on the other hand, is a broader concept, so you can see monitoring as sitting under observability. The goal of observability, like I mentioned earlier, is to understand your system, to know what state it's in and what's currently happening, in order to, let's say, do things better.
So when your system is using more CPU or memory resources, okay, the system is working, the users are using it. But then you think about peak periods, say, for a US e-commerce application, Black Friday: what can you do to make your system better, to save costs at quieter times and also give your users better performance? Everything that comes together here, debugging, maintenance, making your system better, can be put under observability. Both monitoring and observability use the same types of telemetry data, and telemetry basically means collecting data automatically. These data types are known as the pillars of observability.

So, next, the pillars of observability. In my talk description I mentioned four pillars, but I deliberately didn't put a number on this slide, because initially most people say three pillars, and then there's a new addition, profiling. And if you look at it, some teams right now may be using more than just these four in how they practice observability, because there are dedicated observability teams out there. Hence I just made it an open page, in the sense of whatever you want to add to it, whatever you want to call it for your organization and your team. So, with the new addition of profiling, we have: logging, metrics, tracing, profiling.

Metrics in Kubernetes are a type of telemetry data that tells you what's happening inside your pods, your nodes, your cluster as a whole, or a namespace, depending on which aspect of your Kubernetes components you want to look into. Kubernetes components emit metrics in Prometheus format, which is text-based and easy to read. I have a screenshot of an example.
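The screenshot itself isn't reproduced in this transcript, but the Prometheus text exposition format looks roughly like this (the metric names and values below are illustrative, not taken from the slide):

```
# HELP apiserver_request_total Total number of requests to the API server.
# TYPE apiserver_request_total counter
apiserver_request_total{verb="GET",code="200"} 12438
apiserver_request_total{verb="POST",code="201"} 310

# HELP process_resident_memory_bytes Resident memory size in bytes.
# TYPE process_resident_memory_bytes gauge
process_resident_memory_bytes 4.5613056e+07
```

Each line is a metric name, optional labels in braces, and a value, with `# HELP` and `# TYPE` comments describing it, which is why it's readable by both humans and machines.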
It's easy to read and easy for both humans and machines to understand, and from it you can create dashboards and set up alerts. There are some issues that your setup, say Grafana on top of Prometheus, may not catch, but that you'll be able to spot by reading the raw metrics while debugging, in certain use cases. In most Kubernetes components, metrics are available at the /metrics endpoint of the component's HTTP server. The example on the slide shows Prometheus-format metrics: like I said, it's text-based and easy to understand.

Then logging in Kubernetes. Logging is collecting and storing data about events, to diagnose issues when stuff occurs. An example: there's an error in our application, and the first step is to look at the logs, right? It also depends on the industry: in FinTech, or the financial industry generally, you have to keep logs for a few years for compliance purposes, in order to know what happened, and if there are large-scale issues to resolve, you need to be able to submit those logs for compliance.

This is an example of how a log might look. Breaking it down: a timestamp, a log level, the name of the component (say our app or the database), a unique identifier, then the message. In Kubernetes you'll see messages like "starting up container", failures, and other kinds of logs you might encounter.

Then tracing in Kubernetes. Tracing is basically knowing what happened, end to end. When a user tries to log in, where was each request sent, and which server handled it? Say the initial application was deployed in the EU region, and someone is trying to access it from here in Nigeria.
Where was the request served from? When the content or the response was given to the user, was it from the servers in the local zone in Lagos, or the ones in South Africa, if you're using AWS? Tracing lets you know the origin and the entire flow of a request, so you can find issues and also see bottlenecks in your application: do I need a better CDN to deliver this content or this particular service to my users? Tracing tools like Jaeger, however you want to pronounce it, or Zipkin, enable distributed tracing in Kubernetes. This slide shows an example of what I just explained: the user makes a request, the front end shows the result to the user, and if an error or something occurs along the way, we're able to know what happened, where it happened, and how to make it better.

Then the next pillar of observability, which you'll see is the latest addition, is profiling. Profiling is analyzing what resources each section of code, each pod, or each deployment uses, so you can make it better. There's kubectl-flame, by Yahoo, a kubectl plugin for this. With just a few commands, and little to no overhead, you don't have to build any infrastructure from the ground up to know exactly what's happening and profile your system, generating flame graphs. This example is a CPU flame graph for a MySQL database: when each SQL query was run, what happened? Does it use more CPU? How can I optimize that SQL query, should I use a dependency? This helps you make your system better and prevent issues in the long run.

Next: observability with Kubernetes built-ins and kubectl. That means implementing probes and health checks, and using kubectl tools to monitor a Kubernetes application.
So probes: the English word basically means to investigate, to ask questions to know what's happening. In Kubernetes, there are three types of probes: liveness, readiness, and startup probes. Readiness probes determine whether a particular container is ready to serve traffic. You know, when you deploy an application on Kubernetes, it starts from ContainerCreating, then there's an entire container lifecycle, right? Once a pod gets to the Running stage, it's possible for the pod to show Running while a container in the pod is in an internal failed state and not actually ready. And you'd still see Running by just running kubectl get pod. So how do you make sure there's no issue where your application shows Running with one-of-one containers ready, but it isn't actually ready to serve traffic, ready to go live, so to speak? By using readiness probes. Startup probes determine whether a container started up successfully, and there are also liveness probes.

I'm not going in depth into each particular feature, because this is an open source on-ramp, and the goal of this section of the summit is to give an introduction-level understanding of observability in Kubernetes.

This is a manifest file for, let's say, a demo deployment with a readiness probe. If you look at the readiness probe, you can see it runs a command, curl-ing localhost on port 3000, which is the port of the container, so it pings it. If this returns exit code zero, it means the container is ready to serve traffic. It has an initial delay of five seconds and a period of ten seconds, so every ten seconds it tries again.
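Reconstructing that manifest from the description, a sketch might look like this (the image name is an assumption for illustration, and port 3000 is as described on the slide):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-deployment
spec:
  replicas: 1
  selector:
    matchLabels:
      app: demo
  template:
    metadata:
      labels:
        app: demo
    spec:
      containers:
      - name: demo
        image: my-demo-app:latest   # hypothetical image
        ports:
        - containerPort: 3000
        readinessProbe:
          exec:
            # Exit code 0 from curl means the container is ready for traffic
            command: ["curl", "-f", "http://localhost:3000"]
          initialDelaySeconds: 5
          periodSeconds: 10
```

An httpGet probe would work equally well here; the exec-with-curl form matches what was shown on the slide.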
So, when that request is successful, you can be sure the container is ready to take traffic. Implementing this as, like, the first step when you deploy your application, you're sure that if the pod is Running, every container in the pod is actually ready.

So, I prepared a demo. Let's observe a cluster with kubectl together. You can scan this QR code to go to the Killercoda playground, and also scan for the GitHub repo with some commands. I'll just give you a moment to scan.

So, I have a kubectl playground here. Let me start a new session; this is free, I think you just have to log in to Killercoda. "k get pods": k is set as an alias for kubectl, or kube-control, however you want to call it. In this GitHub repo I've set up some commands, but let me start from scratch. First, see all the namespaces in this particular cluster. Then, to know what's happening in each of these pods, you can use the describe command, or to see the logs, the logs command.

The goal here is to see the resource utilization of pods and nodes in the Killercoda Kubernetes playground. To monitor node and pod utilization, we use the top command. This command shows you the resource consumption of nodes and pods, and it requires the metrics server. Like I mentioned earlier, Kubernetes components emit metrics through their metrics endpoints, and this command requires the metrics server to aggregate them. So if I try kubectl top node, I get "Metrics API not available". This entire process, right, is kind of like debugging: I want to see the resource utilization of the nodes in my cluster, but I can't, it's giving me an error, Metrics API not available.
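As a recap, the session up to this point looks roughly like this (the pod and namespace names are placeholders for whatever is in your cluster; the error text is what recent kubectl versions print):

```shell
# Alias used in the demo
alias k=kubectl

# See every pod across all namespaces
k get pods --all-namespaces

# Inspect a single pod: events, state, restart counts
k describe pod <pod-name> -n <namespace>

# Read a pod's logs
k logs <pod-name> -n <namespace>

# Resource utilization -- fails until the metrics server is installed:
k top node
# error: Metrics API not available
```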
So, I added some commands to the repo: we first need to deploy the metrics server. Copy that, then deploy it. This creates its RBAC (role-based access control) rules, service account, and everything it needs to run, so you can access the metrics of nodes and pods. If I run the top command again, I still get the same error, and that's because, okay, this is a playground cluster. It's not really a real-world cluster; it's not meant for production deployments. The metrics server, as designed in its Kubernetes repository, needs secure access, right? The RBAC and TLS rules are there. So what I need to do next, for this demo, is take that away: make it possible for me to call the metrics server insecurely. This is just for demo purposes.

So I edit the deployment, head over to the container spec, and add this argument. You know, with Kubernetes YAML configuration, the indentation defines the structure, so the fact that I don't put a dash before this flag at the top level means it's added under the container's existing list of arguments. Let me add it. If I run the command again, okay, now it's no longer "Metrics API not available". Let me check why with k get pods -n kube-system: I see it's terminating the initial pod of the metrics server and deploying the new one. Still terminating... okay. Now I have the metrics server running, so let me try again. Now I can see the control plane node is using 14% of its CPU, and I can see the metrics of the nodes in the cluster. And if I go for pods, I can see the metrics for pods too.
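Putting the fix together, assuming the standard metrics-server release manifest (to be clear, the --kubelet-insecure-tls flag is what makes this demo-only; don't do this in production):

```shell
# Deploy the metrics server: RBAC, service account, deployment, and service
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# Demo only: let the metrics server scrape kubelets without valid TLS certs.
# Open the deployment and add the flag under the container's existing args list:
kubectl -n kube-system edit deployment metrics-server
#   args:
#     ...existing args...
#     - --kubelet-insecure-tls   # <- added for the playground cluster

# Once the replacement pod is running:
kubectl top node
kubectl top pod --all-namespaces
```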
So, moving beyond just the metrics server: by implementing Prometheus you can have a complete monitoring pipeline, and from there you can build dashboards. Let me go back to the slides. Yes, the next part of this talk: extending a cluster to use external observability tools. For metrics, there's Prometheus, VictoriaMetrics, InfluxDB, Grafana. Grafana is not really just for metrics; it's mostly used for creating dashboards and alerts. There's also Loki, basically for logs; you can make dashboards for logs through Grafana too. And Pyroscope, for profiling, is built by Grafana Labs. Or you can use an all-in-one, non-open-source observability tool like Dynatrace; there are several out there. For logs: Fluentd, Logstash. For tracing, like I mentioned, Jaeger. For profiling: kubectl-flame, Pyroscope.

For metrics, like I mentioned earlier, most components in Kubernetes emit them; deploying Prometheus extracts those metrics, and by connecting Grafana you can create dashboards and display them.

Next, extending a cluster for external logging. I wrote an article on sidecar containers; it's on Spacelift, I don't know if you know the company, Spacelift, for IaC deployment, but that's not what I'm talking about today. Basically, a pod can have multiple containers, and the sidecar container here streams the logs to Fluentd. Ideally, you could just write a shell script to stream logs, but the point of Fluentd is that it's a unified logging layer: you can perform more complex actions on your logs, with better performance. Then you can send the logs to an external log backend, maybe Amazon S3, or deploy MinIO on-prem or on your own server.

For tracing, you know, we talked about Jaeger. This is implementing tracing with Jaeger and OpenTelemetry.
OpenTelemetry is an open standard, and open source, for how telemetry data is collected and transferred. The OpenTelemetry collector takes the data from the application, then Jaeger collects the traces, stores them in an in-memory database, and you can then see what happened with each request in your system. Then profiling: like you saw with kubectl-flame for generating flame graphs, you can also install the Grafana agent or a Pyroscope SDK to profile and know which section of code or which deployment is using more CPU, to improve performance or resolve issues.

Then, best practices for observability. There are a lot of best practices out there, but these, I would say, are the most basic things to keep in mind. Consider your specific telemetry needs: when you first deploy your application, what telemetry data do you need to look at first? Like I mentioned earlier, profiling has only recently been categorized as a pillar; it's a kind of new addition. So do you really need profiling right now? Kubernetes is complex on its own, and adding more monitoring increases the complexity of your system. Look at what you need now; the basics, like we did in the demo, are metrics, you need to know what's using what. Keep things simple, depending on your team and the application being deployed. From there, you can scale as your application scales and your team skills up.

An important note, and like Abubakar talked about in the earlier talk here: security, rights, permissions, user access, access control. You shouldn't be using a setup that grants everyone access. Remember the metrics server in the demo: it initially failed because the playground cluster I deployed it on doesn't have security set up.
It doesn't have user access and rules set up, so I had to make it insecure in order to get access. That's not what you should do in a production-grade environment. Whatever tool you're using, it's important to give precise privileges to each user: from the engineering manager to the administrator to the Kubernetes developer, everyone should have access only to what they need at a particular time.

And yeah, how do cloud native teams observe their clusters? There's an entire blog highlighting several user stories; I've highlighted it here and also added my presentation to the schedule. There are several use cases one can read through and learn from, in order to implement them on your application or platform. But one thing to keep in mind: no application is the same. No two applications receive the same amount of users from the same countries. An application deployed in Nigeria shouldn't try to replicate the observability of an application deployed in Spain or elsewhere in the EU. You need to know your platform and your application in order to deploy your observability tools and techniques. These user stories just serve as a guide, something to learn from, to know what you should try on sandbox environments before going into production.

One example: I don't know if you know Cilium. Cilium is basically a container networking interface, but it does basically everything on the networking side, from observability to security. I think Cilium has a feature for tracing packets via eBPF; I can't recall exactly, I worked on Cilium last year during Google Season of Docs. Then Razorpay: Razorpay is an Indian company, right?
So if you're someone trying to deploy an application in India, okay, maybe you're more interested in checking out what Razorpay did. Are you doing something fintech-related? Then Razorpay's story will be your best bet. These are several examples one can read to understand how teams observe their clusters. Some of them use the tools I mentioned; some of them build their own tools from scratch, like Netflix, I think Netflix builds its own from scratch. So it's for you to learn from, test as much as possible in a sandbox environment, and be sure what works for your application.

Then, conclusion. Like I mentioned earlier, observability is broad. There are a lot of things to consider, in metrics, in profiling, in logs, but it's about starting small, keeping things simple, scaling and learning, right? I wouldn't say I'm an expert in observability myself, but it's through learning that I'm able to understand, oh, this is used for that, I'd rather use this for my application. So you just want to keep learning, trying, and implementing within your organization's environment, and like I said, do more of that in a sandbox environment before moving into production. If things break, or if deploying your observability stack is in turn having a negative effect on your application, that doesn't make sense; find that out in a sandbox environment before going into production.

Like I mentioned, Kubernetes is complex. Nodes, networking; Kubernetes has a flat networking topology, I think that's the word. There are a lot of moving parts, a lot of things happening. In a team, maybe only two people know some things about a particular Kubernetes cluster that has been deployed. The more tools you add, the more the complexity increases. So start small and keep going, but trust in the point of starting small and keeping things simple in order to scale.
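To make the least-privilege point from earlier concrete, a minimal read-only RBAC sketch might look like this (all names and the namespace are hypothetical, for illustration only):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader          # hypothetical role name
  namespace: demo           # hypothetical namespace
rules:
- apiGroups: [""]
  resources: ["pods", "pods/log"]
  verbs: ["get", "list", "watch"]   # read-only: no create/delete
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-pods
  namespace: demo
subjects:
- kind: User
  name: dev-user            # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```

A developer bound this way can inspect pods and read logs in one namespace, nothing more; the same pattern extends to metrics and dashboard tools.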
And yeah, that's it. Thank you for listening to me. I'm on Twitter, LinkedIn, and GitHub; I try to write as much open source code as possible. I hope this was interesting and insightful, and if you have any questions, you can ask and I'll try my best to answer to the best of my knowledge. Yeah, thank you. Is there any question?

Absolutely, yeah. Thanks very much for the presentation. I was curious about why you think there is a new, I would call it a new fourth pillar in observability, in profiles. And maybe, are we ready to obliterate the pillars and just call them something new? They're all one thing at some point. Like, what's the difference between a log and a trace? Is it just an event at this point?

Okay, so I would say: it being new doesn't mean companies and organizations weren't doing it before, right? When I say new, I mean that when talking about observability, it has now been categorized as a pillar. That's also why I didn't label the slide with just three pillars or four pillars, because an organization can call it a different thing. I don't know if that answers your question.

Well, I think I want to keep pushing and say: okay, if the categorization is the only thing that's happening, let's stop categorizing it and just say observability is built on events.

Oh yeah, that makes sense too. So like I said earlier, it depends on what you need right now. Observability, like I mentioned, observes what happens in the cluster. You can decide to categorize it, or decide not to categorize it. Hence the confusion, right? Hence the confusion for me, like I mentioned in my description, trying to understand: is monitoring different from observability? Is it?
Well, it's actually the same thing, or you can say observability is built on, like you mentioned, events that are happening across the cluster, and on having an idea, in case of issues, of how to resolve them or make your application better. Yeah. Any other questions? Okay, thank you very much for that. Yeah.