OK, good. So good afternoon, and thank you very much for coming to our long version of the observability talk. We're going to spend about the next one and a half hours together, so I hope you all have a great time. This time also gives us a bit of flexibility on how to handle the content, so I just wanted to invite everyone: if you have any questions throughout the talk, please don't hesitate to ask us. We're happy to go back and show things again and reiterate if something is not clear. Also, if we speak too fast, or if you can't see things on the screen, just let us know and we'll try to adapt to that. All right, so with that, I'm going to hand over to my colleague for the introduction. Hi. Yeah, I'm Tiffany Jernigan. I am a developer advocate at VMware. I mostly deal with VMware Tanzu, open source, and Kubernetes. If you're trying to learn more about Kubernetes in general, which is probably part of why you're here in the first place, or you're interested in Spring, we have free classes at Kube Academy and Spring Academy; you can just go to kube.academy or spring.academy, and there's a bunch of free classes on there to learn a lot more in that space. All right, thanks. So I'm going to say a few words about myself as well. My name is Matthias. That's me. I work for a company called Novatec, a medium-sized company in Germany. We do mostly custom software development, but also consultancy and training. That's why I put this logo up here: we just got certified as a training partner for the Linux Foundation, so we teach classes for the CKAD exam. Besides that, I'm also a lecturer at two universities in the area of Stuttgart. So that's basically the reason why the two of us came together to give this kind of talk: as you have seen from our backgrounds, both of us help people get educated in and around the topic of Kubernetes and its ecosystem, and that's the same thing we're trying to do today. Today, the focus is going to be the observability part all around Kubernetes. And for the introduction, I'll hand over to Tiffany again. OK, so a big question for being here at all, or for this talk existing, is the question of why. So the need for observability maybe is clear, maybe it's not. Who here has already used Kubernetes? Some people, maybe? OK. All right. So if you've played around with Kubernetes, you've probably found that it can be pretty complicated. There are tons of different things that you have to learn. You have to understand all the things that you're dealing with, maybe how they interact with each other, and just try to figure out what is going on. So maybe you're this person here, and you have all these pods or containers running, and you're just trying to figure things out. So basically, our goal is to make things easy: to show some tools and ways to make it easier for you, so you can actually see what is happening in your cluster and how things are interacting with each other. So, going into just a high-level overview of some things, to make sure to level set with everyone: you can see that there are two different layers here. You have the top part, which is the applications and your workloads, and then at the bottom you have your infrastructure.
Just basically making sure that everyone understands, if you're pretty new to this: with Kubernetes you have a cluster that consists of multiple of these nodes, and on top of those nodes you have some sort of workloads running, which have containers, and that can be whatever application you basically want to have. So to look a little bit closer into the infrastructure side of things, we have two main parts. We have the control plane, which has the API server, which we will see a little bit later; it basically has control over the cluster. Then there is the set of worker nodes, of which you usually have multiple. You can scale up and down as you need to, and this is where you'll run your applications afterward. So to look at this a little further: if we have a bunch of different applications here, you can see that it is independent of whatever type of programming language or framework you have. And with these different types of applications, there are different ways to go about monitoring them. Basically you want to know: sure, I have these applications, but are they running okay? Are they talking to each other properly? Is what you're expecting actually happening? So if you are familiar with some of the Kubernetes API objects, there are things like pods at the base level, which is where your containers are running, and then things like services, ingresses, deployments, et cetera. These are built in, and with all these different things that you might have running, you might have multiple languages and a bunch of different components, and trying to observe all of these things manually on your own can be a lot. Maybe you can go look at some of the kubectl logs, or you can see, hey, this pod is not running, some container failed or whatnot, but trying to deal with that yourself can be pretty complicated, and you need to be able to understand how all these things work together. So one component I want to highlight is the API server. It runs once in the cluster, in the control plane, and it's how people such as yourselves interact with the Kubernetes API and can do things with objects like creating a pod, deleting things, scaling, et cetera. Later we'll see how significant a role it plays when it comes to observability for what's going on in your cluster. So just to summarize things a little bit: we have the infrastructure, we have our control plane, we have our worker nodes, we have the API server, there can be a bunch of different languages, and then there are all these things that we need to be able to observe. And we don't just care about all these individual things that are happening here, we also care about the state of how they are interacting with each other. Is this part of the application interacting with that one, and is it getting actual data? Is something actually making it to the database? Especially since we're dealing with microservices, it's not just one standalone thing. Is there high latency? Are there errors or things like that? And then there are things like app-level metrics or infrastructure-specific metrics, and basically, because there's just so much, observability is pretty complicated.
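To ground that a little: the kind of manual poking around described here is basically a handful of kubectl queries against the API server. A minimal sketch, with an invented pod name, might look like this:

```shell
# List workloads across the whole cluster
kubectl get pods --all-namespaces

# Events, restart counts, and container state for one pod
# (the pod name here is a made-up example)
kubectl describe pod todo-backend-abc123

# Logs of a specific container in that pod
kubectl logs todo-backend-abc123 -c app
```

Doing this for dozens of pods across namespaces is exactly the manual effort the talk is about reducing.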
So if we are observing applications in Kubernetes, we need to understand a little bit of how that's being handled. We've seen what we care about observing; the next question is, how do we get this kind of information? There are various places where you can add agents to do monitoring. There is the level of your application. Then there's the level of the container, outside of that is the pod, and outside of that is the node. And you, as the person who's trying to do observability, need to be able to care about each one of these levels, and that can be pretty complicated and confusing. So it's about interacting and talking with the API server, and having agents in various other places. For instance, we could have an agent there, and it can collect metrics for observability dashboards. You can also do that for your nodes. And then for your application, there's also the ability to have a sidecar, for dealing with a service mesh, for instance. So there are all these places where you can go and get metrics. There are a bunch of different types of metrics, and not just that, there are different levels of overhead. How much do things cost, how much effort do you have to put in to be able to observe at that level? Basically it's a trade-off of what you actually need. So that's where some tools come in. There's this lovely CNCF landscape. Hopefully you have seen it at some point in time. With all of these different things that are going on, this is not just observability; there's pretty much everything that you might care about in the CNCF landscape. In this one little section here, there are things for observability, and we're not looking into the chaos engineering side of things, but we're looking into observability for monitoring, a little bit of logging, and tracing. So just to give a little agenda: first we're going to talk about the Kubernetes API, then Prometheus and Grafana, then go into service mesh, both sidecar-based and eBPF-based, and then application-based observability. All right, I'll go back one slide real quick, to that overview and agenda we plan to address today. To summarize real quick what Tiffany just said, and I think this is very important for you to understand: the challenge that we have here is that on the one side we have a lot of complexity, because the workloads, the API objects, and the infrastructure all belong to the thing that you want to monitor. On the other hand, you have a lot of tools, and the challenge is very often to say, okay, which of the tools actually matches my kind of problem? What are the things that I want to see, and which tools would give me that? And also, what am I willing to invest for those tools? With investment, I'm not plainly speaking about financial issues, but also about complexity and overhead. How much does it take me to install those components? How much overhead will I have during runtime? Do I have to change my application infrastructure? Do I have to rebuild the container?
All of those things come into the evaluation when you set up your observability infrastructure for your own environment, and so we want to go through these and address exactly those points: okay, what are you going to get with the Kubernetes API and what are you not? How much do you need to change to get there? And similar for all the others. Also a little disclaimer here: as we have seen, there are many tools, and 90 minutes are definitely not enough to address all of them. So the things we are trying to show are, on the one hand, the kinds of tools that have a certain maturity status, like being graduated or incubating, or tools that we have some personal experience with, where we can say, well, these have helped us at that stage by doing this and that. In turn, and this is very important to me, this doesn't mean that if we don't mention a certain tool, we don't think well of it. Certainly, we might do some things an injustice by not knowing those tools as well. Also, for us it's kind of hard to keep up with the pace at which new technology is getting added to the landscape, how mature it is, and so on. So we try to give you a rough overview of the different things and the different aspects, but I'm sure there's one or the other that we might be missing. And this is also a question to you: if you have any feedback on those things, please give it to us, because we will surely talk about these things again in the future and would like to make sure we don't miss any significant part there. Okay, then I'm going to start with the first aspect. The way we're going to do it is that we kind of increase complexity and footprint, and also the depth of monitoring, as we go. With the Kubernetes API, most of the things will be very high level. With service mesh, we're going to go deeper into the cluster and into the instrumentation. And finally, application-based will be the deepest level. So in a way, we could say we first have some tools where we can sense an error, see on a high-level basis that there is something wrong. Then we're going to go further into how we can isolate the problem. And with the final one, it's more about drilling down to get to the root cause and find out why things are actually not as they should be. All right, so looking back at the graphic that Tiffany showed before: this is the API server that we're going to connect to and query information from, to figure out what is going on. For people who are just starting or have started with Kubernetes, this is probably also the first thing you're going to interact with: the kubectl CLI. And technically, the kubectl CLI is of course basically a client to that API server as well. It just helps you on the command line to query objects, to create objects, to drill into them, and so on. And we will see that kubectl is not the only tool that does that; there are a couple of others too. So this one is, let me just see, maybe. So another little disclaimer here: we actually intended to show you most of the tools live. However, we had quite a few internet issues here with the conference Wi-Fi. So we have backup recordings now that we're going to show. If there are further questions, we can still try to show things live, but we cannot guarantee that they'll actually work. These recordings have just been done yesterday afternoon or this morning, so they're brand new, and that should be fine.
Okay, let me just have a quick look. This is, oh, this is the debug one. We're going to do that later. So, sorry for a little chaos in the slides here. As you have seen, there are tools, as we said before, and those tools to connect to the API server can be sorted into three different categories. There are the CLI ones. There are also some web-based ones, and also some fat-client applications that you can install onto your machine and that get you all those metrics. I've been dealing with this topic for a while, and when I started with it, there were tools around like Octant from VMware, Weave Scope from Weaveworks, Lens, and K9s as a command-line tool. Now, there's an important thing that has happened here: two of those tools have been deprecated and discontinued. Another one has gone more into the commercial direction. So what we want to show here is definitely not a vendor pitch. We try not to focus on commercial tools; we really want to show open-source solutions, to give you the idea: okay, this is the technical capability, this is what you can do with it. And if you decide to go for something commercial, that's absolutely fine, but that's not what we want to do right here. So as those projects have been removed, new ones have come in. For example, there are Headlamp and Skooner; those are two in the CNCF landscape that we can look into. And K9s has been sticking around all the time, so this is definitely one tool we use quite a lot and want to highlight here as well. I have a few screenshots, but we are definitely going to see some live stuff as well. In the end, you see here, you get the overview in the command-line window with K9s, so this is certainly good if you don't have any browser or your own machine; if you do a remote terminal session, this will always work. If you have a browser, then tools like Octant, Lens, and the others roughly provide the same thing. They give you an overview of the various objects that you have in Kubernetes. They do kind of a logical grouping. They provide somewhat easier navigation so you can move around quickly. And they can certainly be helpful, especially if you're new and starting out with this. So there are screenshots from a few of those, and you can see, here, an aggregation of the various components. You can expand certain sections to make the navigation easier, and so on. The one that I'm going to show in a second is actually this one, which is called Headlamp. I think first we're going to look into the, where's my mouse? Ah, here. All right. So I hope this is actually visible, because we can't expand it much. I checked it on the TVs before, and it was actually quite okay. So this is what you have probably seen before: querying with kubectl get commands, getting information about the various components. But this is all within the limitations of your screen. You can see that if you have a lot of pods, it will become difficult to navigate around and keep the overview. And that's definitely something where K9s can jump in and help. Even if you make the font a little smaller, then you can't read it anymore, so this kubectl environment will be limited. You can, of course, group multiple objects; if you query pods and services together, the output becomes even longer.
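As a reference for trying this yourself, launching K9s and narrowing the view is as simple as this (the namespace name is illustrative):

```shell
# Start K9s against the current kubectl context, scoped to one namespace
k9s -n todo-app

# Inside K9s, ":" commands switch the view, for example:
#   :pods   :svc   :ns
```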
And in the end, all of these tools that we're going to show have access to the same source of information. So technically speaking, you can do all these things with kubectl, but doing it in a tool like this, once you get familiar with it, will certainly help you navigate around. So you see all the pods here now. You see the CPU and memory stats. You see how many containers a certain pod has. You see the services, the port mappings, and so on. Everything is, in a way, a quicker kind of overview. And it's certainly helpful to get a quick glance and say: okay, is my thing going all right? Do I have certain components that are actually restarting? Those would be highlighted in red. So you get high-level information on how your cluster is actually behaving. And you can switch the object types. In this one, I'm basically looking into the namespaces, so I can narrow things down to a single namespace and see only the objects of that one, which certainly helps. Okay, so this is how it basically looks: if you look into one of those pods, you can look into the containers. You get all the logging information there, so in case something is not behaving well, you get log access straight away. This is basically the equivalent of a describe command. So technically speaking, you have your whole kubectl tooling all in one place, with easy navigation. So I think that's enough for this one. The next one I want to show is the tool called Headlamp. Okay, sorry, thanks for pointing it out. So if you run this tool, you basically have to create a token first that lets you sign in, so your access is kind of secure. And then you have a similar thing based in a browser. You have the grouping of your objects on the right. It's categorized into workloads, storage, network, and security. And then you of course have more possibilities to display things in a more beautiful way. You get the list of all the pods. I'm sorry if this resolution doesn't come out super well here; I'm trying to explain what is happening. Here you see the number of containers in a certain pod. You see if the pod is running, if it has been restarted, how long it has been running. And one of the really nice things is: you can jump into a pod and get all its details, but you can also query the logs, which I'll show in a second. Yeah, so this icon displays the logs. But what I find even more helpful is the fact that you can execute a shell inside of your browser. So if you go back here and click on this little terminal window, it basically opens up a shell in your container, given that your container has a shell binary inside. So you don't actually need an SSH connection or anything; you can do all that through your browser. And most of the tools that I've been talking about before, like Octant and Skooner, pretty much all provide the same functional scope as this one, so we're not going to go through each and every one; we just wanted to cover that real quick. All right, how are we doing time-wise? That's okay. Yeah, and of course you have options like filtering your output to say, okay, I want to see only the components of a certain namespace, and basically adjust the output to the way you want it. It's a bit difficult for us, too, to navigate through the recordings if you don't know exactly what the next move will be by the time we did the recording.
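By the way, the sign-in token mentioned for Headlamp is a standard Kubernetes service account token. A minimal sketch of creating one might look like this (the account name is made up, and cluster-admin rights are only appropriate for a throwaway demo cluster):

```shell
# Create a service account for the dashboard login (name is illustrative)
kubectl create serviceaccount headlamp-admin -n kube-system

# Grant it permissions -- cluster-admin here only for a demo cluster
kubectl create clusterrolebinding headlamp-admin \
  --clusterrole=cluster-admin --serviceaccount=kube-system:headlamp-admin

# Generate a token to paste into the sign-in screen (Kubernetes 1.24+)
kubectl create token headlamp-admin -n kube-system
```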
So if you have any questions, as I said before, please stop us and we'll look into it. All right, with that, I'll go on to the next part. One more slide, I think, I want to go here. So looking at your kubectl commands, we have basically seen that there's get, there's describe, there is logs. So we can query quite a bit of the information that a container gives us. We also know that with kubectl exec, we can open a shell and look inside of a container and see what is going on. But sometimes this doesn't work, in case the container doesn't have a shell binary. In this case, you cannot really log into it; you have no possibility to attach to it from outside. There is this kubectl debug command that actually addresses this. This was introduced not so long ago, I think with Kubernetes 1.23, so it's been like one or two years now, but I still think a lot of people haven't used it yet or are not even aware of it. So let's say you already know you have a container or a pod which has a problem. Then you can use that debug command and attach another container to your running container. This is not exactly the same as a sidecar; it's more like loading the binaries of that debug container into the running application container. So say your application container is maybe lacking some things to troubleshoot with; then you can use that and bring them in that way. So I think the, no, no problem, the recording for that is over here. I'm going to show that real quick to make clear what I'm doing here. I have a container here which is distroless. So it has no underlying Linux distribution; that means there are no binaries inside except what the application needs to run. So if I now go in and say, okay, I want to execute a shell, I do a kubectl exec. Oh, I query once again, just to get the name. So this is the one where it's running. And now I can try an exec to start a bash. And if I do so, it will tell me it doesn't work, because there is no /bin/bash available to execute. Also, if I try other commands, like just a plain ls or something like that, or an env, they won't work, because those commands don't exist in that container. That makes the container really, really secure. But in turn, it also makes it very difficult to debug, because you can't get anywhere close to seeing what is actually going wrong here or what you can do. So with this debug command, I can basically say: I'm now loading this container, which is my own home-built debug container, into that pod, and I'm going to target the container with that name. And this will take a few seconds, I think, because it has to load the binaries in. But after that, let me see, so it's executed now. And this is the advantage of the recording: you can fast forward a little bit. And now we can see here, we now have a shell in that container, because the shell binary now exists. And now I can suddenly run all those commands that I wanted to run before, like env, ls, whatever. I installed a bunch of network tools in there to troubleshoot, like nslookup, ping, dig, and so on. And with those, you can check if your network connectivity works, if your applications behave fine, and more things like that. The idea actually came from one of our friends, my colleague Paul. He asked me at one stage: is there a way to have an htop command in a base container? And I said, no, normally it's not in a base container, but you can certainly add it.
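For reference, the two commands from this demo look roughly like this; the pod, container, and image names are placeholders for whatever you run:

```shell
# Trying to exec into a distroless container fails -- there is no shell binary
kubectl exec -it myapp-pod -- /bin/bash

# Attach an ephemeral debug container that targets the running app container,
# so tools like ls, env, and nslookup become usable against its processes
kubectl debug -it myapp-pod --image=my-registry/debug-tools --target=myapp
```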
So I added this command through the debug container, and now I can individually look at all the different Java threads of this application, and I can do much more local debugging and troubleshooting than with the initial container before. This is, of course, something to handle with care. As I said before, it will totally break your security, because now you're able to inject all kinds of binaries into a container. So don't allow this functionality in a production environment, of course, but allow it in a dedicated scenario, when you're at a point where you say: yeah, I really have a failing application now, and I need to figure out what's going on there. All right, so I'm just going to fast forward here again. I'm going to wrap things up here with the initial part, the Kubernetes API. Of all the different things we're going to show today, this is of course the least intrusive. That means we don't need to change the cluster, we don't need to change the application code, we don't need to rebuild the container. Everything can just stay as it is. That makes it very easy for you to just play around with those tools and say: well, I'm going to try it out and see how it can help me, and if it doesn't help me, maybe I need something else. It will also help you quite a bit if you're new to Kubernetes and trying to get familiar with it and get an overview. But with that, of course, it also has limitations. You can query some kinds of network metrics, but you cannot really see which component talks to whom. Like Tiffany said before, you have a distributed application, and none of those tools would give you the info: okay, those two belong together, and they have so much traffic between them. This is certainly something you might want to see, and this is something we're going to address in the next part. So, not quite yet; there's one thing in between. We just added this recently to this presentation: Prometheus and Grafana. Technically speaking, it's kind of difficult to say we're doing a talk about observability in Kubernetes without mentioning Prometheus and Grafana, because those tools are pretty much embedded in most observability solutions out there, and so we decided to give them a bit of a quick introduction at this point. I don't want to make it all too complicated, so you don't have to understand each and every thing on that slide, but in general you can say: Prometheus is like your database backend. It is a time series database that stores your data over time, and you can have multiple sources where Prometheus gets its data from. And on the other hand, you have this tool called Grafana, which is a visualization dashboard. So technically speaking, they wouldn't have to be connected, but they work very well together. Grafana can basically grab the metrics from the Prometheus storage, and you can decide how you want to render those things for various purposes. And we're going to see quite a few Grafana installations in the various toolings we're going to show. Okay, so a quick demo here as well. This one is rather short. I'm using, I think, K9s again here, just to look at what's inside the cluster in that certain namespace. All right, so here we are. If you scroll up a little, here we have the installation of this kube-prometheus-stack. And you can see there is basically the Grafana installation, and there's the Prometheus server.
There is an agent to collect all the metrics, and some other components, exporters, an operator; these are not so super relevant for us at the moment, they just come with that entire installation. Now, let's look into those components. We're going to look into Prometheus first. It also provides a very basic web interface. So this is Prometheus here, and you have this query bar where you can go in and say: please show me all the metrics that you currently have in your database. You see, if you start typing, they show up, and there are many. So let's say we're going to look at something CPU-related. This is the container CPU seconds metric. And if you query it, you get all the database results. Those are all correct metrics, but of course they are pretty hard to read in this way. So as you see, the scope of Prometheus is really to hold this time series data. You can do more sophisticated queries to get more granular information out of it, but in the end, it's still something that you probably wouldn't want to do every time in this way when you want to fetch certain information. The important thing to understand: Prometheus has all this information, and various tools can talk to it and use the same query language to extract that information (a couple of example queries are sketched below). I've narrowed the scope down now to a certain namespace. Of course, there are far fewer results, but it's still pretty hard to read. Another sample I have in here: you also have this graph view that shows the historical data over time. In this case, the amount of CPU time is accumulating. This is something I might want to look into if it continues that way. Another sample that I'm going to show is the restarts of pods. So if I just want to query how often a pod has been restarted, I can see there is one which continuously keeps restarting. That's that Wavefront component. Obviously something is wrong with this component, otherwise it wouldn't restart at this magnitude. There are a few others that may have restarted two or three times; that's probably okay. Sometimes when applications come up and don't find their peers, they might restart once or twice, so that looks more like a healthy curve. The other one is something you would probably want to worry about and say: okay, let's see what's going wrong here. Now switching over to Grafana. This Grafana dashboard is basically connected to a Prometheus instance, and we see we have a set of dashboards that come with the configuration that we had. And so things get a lot more colorful, visual, and easier to read. In here we see the CPU metrics, the network metrics, and the memory metrics for the various namespaces. We can also drill into these namespaces, and then we get the information for the individual pods. So in the end, this is the same information we just saw in Prometheus before, but prepared as a nicer aggregation and a nicer view for the end users. So that's basically the story behind Prometheus and Grafana, and as we will see later on with the other toolings, these are very often embedded as a backbone: on the one hand for storing the data, and on the other hand for extracting and visualizing it. Good, so I think with that, I'll come to an end with my part, at least for a while. So with that, I want to basically wrap up that intro part.
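For reference, the two queries from this demo look roughly like these PromQL expressions; the metric names are the standard ones exposed by a kube-prometheus-stack setup, and the namespace is illustrative:

```promql
# Accumulated container CPU time, narrowed to one namespace
container_cpu_usage_seconds_total{namespace="todo-app"}

# Restart counts per container -- a steadily climbing series signals a crash loop
kube_pod_container_status_restarts_total{namespace="todo-app"}
```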
So we've seen the Kubernetes API and standard metrics queried by Prometheus and Grafana, but what we're still lacking is more of an application kind of view: how do components belong together, how are their response times, and so on. And that is what gets introduced now. All right, so this next upcoming section is on service mesh, specifically pod-based service mesh; we'll talk about another form a little bit later. So basically, you can see the API server is grayed out, because for the section we were just looking at, we had an agent based over there, but it's a little different for how we're doing it with the service mesh. We have our little application there, and then basically an agent gets added inside of the pod, as another container that is running there. This allows you to get metrics on the pod level, which was something you weren't able to do before. It uses the fact that Kubernetes allows you to have a second container, or multiple containers, running within a pod. So this was a survey that Cilium did. Some people might be wondering: okay, hey, why do I care about service mesh with respect to observability? Usually when I hear about service mesh, I just care about networking or something like that. But they did a survey asking: which features of a service mesh are the most important to you? You can see that there are things you might expect, like encrypted traffic between services, rate limiting, circuit breaking, retries, the things that you might think of specifically for networking. But the one that most people actually said was a must-have was observability. So, stepping back a little bit: the base unit in Kubernetes is a pod, and you have a wrapper around your container, and this is connected to the network traffic. But pods can also have multiple containers in them, and those containers share the same network address. So basically, anything that one container in the pod can see on the network, the other container can see as well. In general, it doesn't make too much sense to have two application components just sticking around in the same pod. What actually makes sense is to have a proxy there, and then that proxy can get information on what is happening with the other application that you have running there. It can basically collect all the network data, since, again, they are sharing the same network. That's maybe not so useful if you literally only care about one application, but often what you probably want to do is care about all the applications that you have running within your cluster. So what you need to do is inject a proxy inside of every single pod that you have running in your cluster, and then all of that gets aggregated by the control plane, and you can decide what you want to do with it. You can create policies and rules for how you want to direct traffic. You can see things like how long a round trip might take, and basically you can get an overall picture of how things are. Each individual part won't know about the others, but the control plane will know about everything that is there.
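To make the shared-network-namespace idea concrete, here's a minimal pod sketch with an application container and a proxy container side by side. All names and images are invented for illustration; in a real mesh, the sidecar is injected for you rather than written by hand:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: todo-backend
spec:
  containers:
  - name: app
    image: example.com/todo-backend:1.0   # the actual application
    ports:
    - containerPort: 8080
  - name: proxy
    image: example.com/mesh-proxy:1.0     # sidecar: shares the pod's network
    ports:                                # namespace, so it sees the app's traffic
    - containerPort: 15001
```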
Another great thing is that it is independent of whatever framework or language you're using, because you are not modifying your application in order to add this. You just have a separate container that is running in there. You can't get things like app-specific metrics at this level, but you can get all of the network metrics, and it's independent of whatever you have running. Another nice thing is that if you later decide, hey, I don't want to have a proxy anymore, maybe there's just too much overhead, I need clusters that are too big for the CPU and memory usage, et cetera, you can just pull out the proxy. You don't have to completely rebuild your application or anything like that. I mean, it will restart your pod, but that's about it. So that makes it pretty accessible for you to try it out, decide you like it, or decide you don't, and it's not a big deal. And again, just a reminder that it is independent of whatever language or framework you're running your application with. And all of this stuff, again, ends up feeding into the control plane. So there's all this information that you end up having, and there needs to be some sort of way to go and look at and visualize it. So for instance, this is a screenshot from a tool called Kiali that specifically allows you to visualize what's happening with your service mesh. You can see things like, hey, you see the arrows, what direction is the traffic flowing? You can see things like percentage routing: that one's 80-something percent, that one's 13%, and how things are going with that. There are things you can look at with respect to security, observability, just a bunch of different things that you can look at or configure how you want. For instance, again, with splitting: say you have three copies of a pod; base Kubernetes will only do even splitting between those. You can't say, hey, I want some specific split between them, but with a service mesh you can (there's a sketch of that a bit further below). And this is a pretty basic application; here is a more complicated one. There's actually a demo that will go into this a little bit more. This one is for the Spring Pet Clinic app, if you have interacted with Spring, which is a Java framework. It's a more complicated example, and you can see a bunch of different colors and elements there showing different things happening. So we've been looking at a bunch of the stuff that says, hey, this is really cool, this is why someone would want a service mesh. But like most things, it comes with some pros and some cons as well. There is a lot of overhead that gets added by putting a proxy inside of every single one of your pods. It uses a lot more CPU and memory. It adds latency, because there are more network hops. This is a benchmark that was done for Linkerd and Istio, which are two of the most common service mesh providers. It's something that you should look into. Again, as a reminder, you can just pull it out if you decide you don't want it there anymore. Things have maybe improved a little bit since 2021, but probably not significantly. So yeah, just as a bit of an overview: service meshes extend Kubernetes to address its limitations in network traffic awareness and shaping capabilities, so things like percentage splits and whatnot.
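On that traffic-shaping point, a percentage split like the 87/13 one visible in the Kiali screenshot could be expressed in an Istio VirtualService roughly like this; the host and subset names are made up for illustration, and the subsets would be defined in a matching DestinationRule:

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: todo-backend
spec:
  hosts:
  - todo-backend            # illustrative service name
  http:
  - route:
    - destination:
        host: todo-backend
        subset: v1          # e.g. the current version
      weight: 87
    - destination:
        host: todo-backend
        subset: v2          # e.g. a canary
      weight: 13
```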
You basically add a sidecar proxy into every single one of your pods that you care to get more information about, and based on that, you can get information on your entire network flow. You don't have to make any changes to your application or to the application container. The one thing it doesn't give you, though, is application-level metrics. So let's take a look at a demo of this. This first part here is the screen that you come to when you first hit Kiali. You can see all the different namespaces. There are some charts, depending on whether you have traffic. So if you look over here, you can see, hey, there's no inbound traffic for that one, or that one, or that one, so there's nothing to be seen there. OpenTelemetry has constant load on it, so we can see things happening there. We have our Spring Pet Clinic, and you can also see how many applications there are. There's red because something is not working right. And then you can see the graphs of what is happening there, which is pretty cool. So if we click on graph right now, there's no load, so you don't see anything happening, because it is specifically looking for what load is there. There are some things with traffic you can choose. You can choose the namespace, and then things like the versioned app graph, and then there are different things that you can display: things like response time, traffic rate, and then some things we'll look at a little bit later as well. So, just setting that up. If we go to our little demo application, which you saw a screenshot of earlier, it's just a to-do list. You add something like, for instance, learn observability, or maybe something like enjoy KubeCon, which hopefully folks are doing, so that it can be checked off. And then I decided to add one for eating dumplings, because it was something I had to do while here, and since this has happened, I can click done and check that off. So if we go back over here, now we can actually see that there is traffic and things are happening here. We can see the different components. You can see the to-do app namespace; within that you have the UI, you have the backend, you have the Postgres DB. You can see, for each of the separate components with their pods and everything, things like what the percentage of errors is. You can see requests per second. There are things for HTTP and gRPC, and basically all of these individual things where you can see how they're connected. Because otherwise, if you're just looking at your Kubernetes cluster and you see, hey, in my namespace I have these different pods, you don't know how they're related whatsoever. And then there's also OpenTelemetry running in another namespace, and you can actually see the connection between these. And after some time, if there's no load, because we're currently not doing anything with the page, you can see that some things have disappeared as well. If I jump back to a few days ago, we can see this backend thing that's just sitting alone, because that one didn't have a proxy in it, so Kiali doesn't really know what to do with it. But if we jump back to an hour ago, which actually is probably more like two or three hours ago, we can see that it looks kind of like it did just a moment ago. So, just playing around with that. So again, this is a simpler application.
We can also look into something like the Spring Pet Clinic, which is a little bit more complicated. You probably can't see everything that is happening here, but there are things with different colors. For instance, the Wavefront proxy is not configured, so everything that is communicating with it is in red, which is kind of useful. You can see the arrows showing where things are going. You can also check this thing called traffic animation, which lets you see which direction the traffic is going in, at different speeds, and you can see all the other data that I was mentioning earlier, with response times and traffic rates, there as well. And then, yeah, the 100% there for things failing is because, again, the Wavefront proxy is not set up. If we go to the actual application, we can play around in here and add Matthias to the database, because why not? And then if we click around on different things that we're interacting with, maybe we look to see who all the people are and click on some random person that maybe exists, I don't know if they are a real person, and if we just wait a moment here, we will actually be able to see more parts of our application when we go and refresh. We can now see that we have the Istio ingress gateway. We can see all the different pieces. We have the API gateway. We have the different services, and within each service, what it consists of. We have the namespace for the Spring Pet Clinic and then all the things that it connects out to. So it's really cool: you can see the different pieces of what is happening in a visual way that you can't get from just base Kubernetes. And this can help you figure out, one, whether something is wrong, and then more details on that. And to make it even more complicated: if you look at OpenTelemetry, which has even more pieces, at this zoom level you can't really tell exactly what is going on. You need to be able to zoom in to see more, but there are just so many different pieces, and you can follow along and figure out where things are going. And for the different services here, we don't see as many errors. We can see which namespace the different parts are from as well. And where a bunch of things are being collected over there, you can see that there's a central point for all of that, and with things that are green, red, blue, assuming you're not colorblind, like with this one, you can figure things out from that as well. So basically, yeah, there's just a bunch of information that you can figure out based on having a service mesh inside of your cluster. Okay, so that was talking about having a proxy or some sort of agent running inside of every single one of your pods. There's a different form that is node-based, and that uses eBPF. It stands for extended Berkeley Packet Filter. Who here has heard of eBPF before? Okay, a decent number of people. Okay, so again, before, we were looking at having something in every single one of the pods, which is over there. In this case, we are now observing at the level of the nodes. So it's a pretty low-level functionality. We have this kernel space, which is where basically all your kernel software is running. It's usually a pretty protected environment, which means it's also kind of difficult to change. Maybe you can submit some pull request, and eventually, down the road, it'll get added.
What you normally fiddle around with, or deal with, are the things that are over in the user space. You can compare eBPF, in a way, to JavaScript in a web page: that's basically what the eBPF sandbox is in the kernel. You can run custom code, but you can't actually cause things to go wrong. And in the kernel space, it can interact with kernel events, such as network events, and then you send things over to the user space, which has the SDKs, the libraries, and the tools. So when we were looking at Kubernetes initially, for the infrastructure, we had this split between the control plane and the worker nodes. Previously, we were getting information specifically from the control plane. This time, we're going to get some stuff from the worker nodes. So this is roughly what your worker nodes look like: you have your operating system, you have your container daemon, you also have the kubelet and kube-proxy, and then you have your containers. That's basically what's happening on your Kubernetes worker node. Most of the time, not always of course, the operating system that is being used for your containers is some sort of Linux. I mean, there are also Windows containers, which don't help you in this scenario. But basically, using the operating system on the node, and via eBPF, you can inject code to create metrics, since it is running Linux. So instead of sitting at all of the top-level layers and trying to collect from there, you're going beneath them. All the container traffic eventually goes through the network interface of the underlying node, and this is where it's being intercepted. So this looks kind of similar to what we were seeing with the pod-based approach: you still have some sort of thing that collects. But instead of having one in every single one of your pods, you have one on each of the nodes. Chances are that you have fewer nodes than you have pods, so there's a little less overhead and less stuff running there. Another thing specifically about it: if you're using the pod-based approach, you need to make sure that you actually have a proxy in every single one of your pods. If you forget one, then you're not going to get network information on it. Whereas if you're putting it on the node, you don't have to worry about: is it in this pod, is it in that pod? It's just: is it on the node? So this is a visualization from using Hubble with Cilium. Basically, it's kind of similar to what we were looking at with Kiali. You can see things with traffic, you can see what's happening, you can see, okay, which pieces are talking to other pieces here. You don't necessarily get some of the metrics, like the response time or the throughput, from here, but Cilium itself does have these, and you can actually see some of them, not in Hubble, but through Grafana. So let's see. Right, yeah, it's going to switch. Okay, so basically you can see a bunch of the namespaces. Obviously, if you had a ton, that would go pretty far down. So one other way you can do it is you can click choose namespace and pick something there (as an aside, a quick Hubble CLI sketch follows below).
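Besides the Hubble UI shown here, Cilium also ships a Hubble CLI. A quick sketch of querying flows for one namespace might look like this, assuming Hubble is enabled in the Cilium install:

```shell
# Stream observed network flows for a single namespace
hubble observe --namespace todo-app --follow

# Narrow down further, e.g. only dropped traffic
hubble observe --namespace todo-app --verdict DROPPED
```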
So for instance, if we go to the ToDo app, which is the first one that we were showing a bit earlier, you can see the database, the UI, the backend, and then we have istiod there as well, which we didn't see earlier, because we didn't specifically have a proxy running there, whereas here you can see all these connections and how things are talking to each other, and you can see a bunch of other data down there. If we hit done on something else here, so we have traffic, and go back over, now we can see other things: we have the OpenTelemetry collector, we can see the ingress-nginx, which again is something we couldn't see earlier, because we didn't have a proxy running on it to be able to find it, but since this is at the node level, we can see all of this. There's a bunch of information down here: you can see the source and the destination. Here you can't see the numbers, like the rates or the errors, that we were looking at earlier, but again, that's stuff that Cilium itself has; as far as I'm aware, you just can't visualize it in this way, and there is a way to do that with Grafana. I'm not going to show that via demo here, but I will show a graph afterward of what that basically ends up looking like. It's going to switch soon. Okay, so now if we look at OpenTelemetry, again, this is way more chaotic. The bigger and more complex your application is, there is just so much more stuff that you have there, and seeing how everything is connected. It's kind of hard for me to read most everything there, which means it's harder for you all, but you can see all the different types of services that are in there and connected, and get more information based on that. You can see the destination port, and you can see whether traffic is forwarded, and it's basically just another way of getting more information from your cluster. Okay, so this is the part that I was mentioning that doesn't exactly exist in demo form, but basically you can have things like the HTTP request duration by source. You can see that there are levels of tracing that you are able to see, too, and this is stuff that is coming directly from Cilium, that you set up with Grafana, and you can see that as well. So between the two, the stuff from Grafana and the stuff that we could see from Hubble, you can see basically the same type of thing that you could see in Kiali, just done a different way. So just to recap: you are injecting your proxy component on the node level with eBPF, instead of on the pod level. It's a low-level Linux functionality, leveraged here specifically for Kubernetes observability. It's growing quite fast in the CNCF landscape. Again, your application and the container for your application are not touched, and you only need to configure your cluster once for this. Okay, yeah, then thanks, Tiffany. I'm taking over from here again for the final part. Now, just to recap quickly the things we covered before: with the initial part, the tools covering the Kubernetes API, we were able to get a lot of metrics, but it was kind of difficult to say, I want to isolate a problem within an application.
For that, we have now seen, with the technology of service meshes, that we are able to pinpoint and say: okay, this is the component which is running slow, or where something is not all right, but we don't have the possibility to look inside of it. We have also seen that to get this kind of information, we had to configure and do more in our cluster. Either we have to inject the agent within a pod, which triggers a restart of the pod being instrumented, or, with the eBPF technology, we have to place something on each node, which also has an effect on the cluster once you install it. So you'd better do this at the beginning, before you place your applications there, but then you get monitoring of the network traffic of the cluster as a whole. Now, to bring this to an end, we're going to look into how we can actually find out whether things are wrong within our application. So we saw something is wrong, we kind of know where it is. Now we want to pinpoint it and say: okay, what is the root cause of the problem within my application? That means, even though it's displayed here in a very simple way, it's actually not that simple: we're now going to place the agent inside of the application. That also means this has a much higher impact on, and disruption of, your overall system if you want to apply it, because you either have to rebuild the application, or at least repackage the container, to put some client libraries inside that will do this monitoring for you. In the end, the rest is kind of the same as it was before: we collect the metrics, put them into some database, and export them to some dashboard. Now, the problem here is, again, that there isn't only one solution. If you look at the various kinds of application-based monitoring things, you'll probably come across OpenTelemetry, or an ELK stack, or Fluentd for logging. Some of them can be combined and integrated, and there's certainly more; I just put up a few examples here of what I've seen before. Now, one technology which is definitely worth mentioning in that space is OpenTelemetry. This is one of the faster-growing CNCF projects, and in my opinion, it's taking a very, let's say, healthy evolution, as opposed to all the other things in the CNCF landscape. With the other things, we basically just see more and more solutions, and we have to do more evaluation of which is the right one for us. With OpenTelemetry, it's more like a certain package of solutions being combined into one, which is definitely something that we as end users benefit from, because we don't have to evaluate so many different things. So, the way it works: one of the central components is the so-called OpenTelemetry Collector. It aggregates all the various sources of information and makes sure that on the collection side of things, there is a standard which can be used across various vendors. Whatever is being fed into this collector can be of different types. Of course, if you go into an application-level monitoring mode, you need programming-language-specific agents. They can be built on your own, and there are already some available; they can feed into this collector. Other things, like Kubernetes and cloud metrics, can also be fed in there. And then on the other side, you can decide yourself where you want to export it to and where you want to visualize it. The good thing is that a lot of the commercial vendors have also agreed on that standard.
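To make the collector idea a bit more concrete, a minimal configuration sketch could look like this; receivers and exporters vary by setup, and this example just wires OTLP input to a Prometheus-style output:

```yaml
receivers:
  otlp:                 # applications / agents send OTLP data here
    protocols:
      grpc:
exporters:
  prometheus:           # expose collected metrics for Prometheus to scrape
    endpoint: "0.0.0.0:8889"
service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [prometheus]
```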
So the products are now not competing on a proprietary collection level anymore; they are competing on the level of how they handle that collected data and give you the most value out of it. And then it's your decision what you want to pay for and what's going to be helpful. But in the end, there are open-source solutions out there which can at least show us the things that we're actually going to get. So the question is now: how are we going to put such an agent into our application? As I said before, this is heavily dependent on what kind of programming language or framework you're using. Now, I have a bit of a Java background; that's why a lot of these examples are based on Java frameworks. Either way, there is coverage of multiple languages. What you see there is a screenshot taken from the OpenTelemetry website: these are all the languages for which an agent solution currently exists. And there's an open SDK. So if you want to contribute to a meaningful open-source project, please do that and write agents for more programming languages. A subset of these can also be used for automatic instrumentation. Automatic means you don't have to place the agent in the code; you can load it alongside your application as a library. In terms of intrusiveness, it means you have to rebuild your container, but you don't have to recompile your application. So this is the way things would look in a Java example. This is a Dockerfile that basically shows that flow. I'm using a base container with Java 17. At this point, I'm downloading the latest OpenTelemetry Java agent and putting it into my container. Here, I'm copying my own application jar file into the container. And in the Java invocation, when I start the container, it's just going to add this Java agent which has just been downloaded. So it's halfway intrusive, I would say, because the actual application is not touched; you can reuse the jar file that you had before, but you need to rebuild the container to load that library. Then you need to specify some environment variables so that the container actually knows where to send the data to. It of course needs the OpenTelemetry collector information; in this case, it's a built-in solution with Jaeger, and this is the endpoint where the collected data will be sent to. It also has a service name; this is basically how this component will show up in the traces at a later point. There are other solutions where you actually go into the code. This is a Spring Boot example using Spring Cloud Sleuth. In that case, you just add dependencies to your Java build and then rebuild it. At that point, you don't need to add additional libraries to the container; the dependency mechanism and the rebuild will pull them into the jar. In this case, you have your client sitting in the jar file, and you basically have a rebuild of everything. The same things apply here: you need to know where the collector is running, and you need to give your application a name as an identifier in the traces later on. The same things exist for multiple different languages; there's a Quarkus example here, for instance. So I'm not going to go through each and every one of them, just to show that there are options out there, and most likely the frameworks and programming languages you use are already supported in one way or another by various implementations. All right.
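A rough reconstruction of the Dockerfile flow described above might look like this; the base image, jar name, endpoint, and service name are illustrative, while the agent download comes from the official opentelemetry-java-instrumentation releases:

```dockerfile
FROM eclipse-temurin:17-jre

# Download the OpenTelemetry Java agent into the image
ADD https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar /otel/opentelemetry-javaagent.jar

# Copy the unchanged application jar
COPY target/myapp.jar /app/myapp.jar

# Tell the agent where the collector endpoint is and how to name this service
ENV OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317 \
    OTEL_SERVICE_NAME=todo-backend

# Load the agent alongside the application at startup
ENTRYPOINT ["java", "-javaagent:/otel/opentelemetry-javaagent.jar", "-jar", "/app/myapp.jar"]
```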
Once you start doing that, you will also start getting application-level metrics. Again, this is a sample Grafana dashboard, but now we're not seeing CPU and memory from the operating system anymore; we're seeing JVM metrics. We see the JVM heap usage, how often the garbage collector is running, all those things which are much better suited to debugging an application at a deeper level than just scanning the high-level details. And in order to get this information out of an application, you need to go deep inside of it.

One more graduated CNCF project I should mention at this point is Jaeger. This is actually a German word, Jäger, meaning hunter: it hunts and tracks down traces, like the traces of animals, to figure out the details of what they have been doing. This is a very simple screenshot. Here I have a trace with only one nested span, where you can see the application we had before: on the outside it's calling the UI application, and within that it's calling the backend component. The interesting part is that this one was not captured with OpenTelemetry; this is something we could already see with the service mesh, which is also the source of information for this screenshot. If you use service meshes, you only see things from the outside: when does a request enter a component and when does it leave again? You don't see anything on the inside. If you instrument the application with something like OpenTelemetry, you suddenly get metrics from within the application. Here, for example, I see the JDBC call from the Java perspective within the application, and I can say, okay, this is the time it spent, so most likely there is no problem right here. With this solution, you can define quite granularly how deep you want to look into the application and which things you want to pull out. Of course, it makes more sense as your application becomes more complex. If you can't read this, that's no problem; it's just to show how complex those traces can grow. And from here it would be very easy to spot if a certain call or sub-call took a time which totally stands out and is something you should potentially look into.

All right, I also have a quick demo on that. If you look here at the OpenTelemetry sample that I'm currently using... what am I doing here, is that the right one? So here in Istio, you can still see there is a Jaeger component; that's the one I showed just before. In our sample cluster, we have put in many solutions. Technically speaking, you wouldn't need a Grafana and a Jaeger for each and every instance, but we didn't have time to configure it explicitly, and we didn't want them to interfere with each other either. So here in the OpenTelemetry setup, you have another Jaeger instance, you have another Prometheus where the data is stored, and you also have the Grafana part as an alternative way to visualize that data.

All right, so now I'm going to switch over to Jaeger. Again, you can see there are two different ones. This is the Jaeger fed by the service mesh; here you can see the spans don't have a lot of depth, because it can only monitor the components from the outside. If you switch to the other one, which is fed by OpenTelemetry, we get all the data from within the services. The way that works: you basically select the service.
It gives you a list of all the things it can find. If you select one of them, you can also select the sub-call within that service, and if you click Find Traces, you get a kind of timeline of how much has been collected over the given time span. In here, you can see these are your traces; this is the depth and the number of spans, and you can also see all of the services which are part of each trace. Let's select one of them, maybe this one with forty-something spans. Now you can see your overall trace with all the spans that happen in it. If it gets too complicated to read and you want to isolate things, you can use these sections here to hide the parts you don't want to see. This is basically your first level of depth, where you see which services are being invoked. Then you can see, on a high level, how much time each of them has taken, and you can drill down into them and see the details of the various components. Sorry for the back and forth here; it wasn't so easy to navigate while recording. So these are the individual amounts of time being spent. If you click into one of them, you can see the details of such a span. You will also see the details of what actually collected that span: in this one, if we go into the process section, we can see it was the OpenTelemetry agent for the Go programming language. This sample is a mixture of various programming languages, to show the polyglot support of OpenTelemetry.

And maybe, I'm not sure if I recorded this... yes: if I switch to the other Jaeger instance and select basically the same component, then looking into such a trace, it only has a depth of one. It's basically the component, and then the component talking to the OpenTelemetry collector, so there's not much information that you can get from the inside. And if you expand... one second, it wasn't happy with this one, I think... yes, if you expand that and also look into the process right here, you can see that this one was actually not collected by OpenTelemetry but, in this case, by Istio. Of course, the two approaches we show here will at some point overlap. The tricky thing for you to figure out will still be: which of those metrics do I really need, and how deep do I need to go in order to debug my application?

And with that, we are almost coming to an end. The one thing I still wanted to show, and now we're going really deep, down to the level of code, is this: if the automatic instrumentation does not give me everything I need, I can use the OpenTelemetry API and insert things into the application code myself. You don't need to understand the code here in detail; it's just about this @WithSpan annotation and this @SpanAttribute right here. They are not from the libraries of your application; they come from the OpenTelemetry library. With the annotation, I declare that this method will appear as its own span in the overall trace. And with the attribute, I basically log the live value of that parameter on each invocation of the method. So these are the two things you should pay attention to (see the sketch below).
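A minimal sketch of what that looks like in Java with the OpenTelemetry instrumentation annotations; class, method, and attribute names are made up, and this only takes effect when the Java agent or an instrumentation library is loaded to pick the annotations up:

```java
import io.opentelemetry.instrumentation.annotations.SpanAttribute;
import io.opentelemetry.instrumentation.annotations.WithSpan;

public class CheckoutService {

    // @WithSpan makes this method appear as its own span in the trace
    @WithSpan
    public double applyDiscount(@SpanAttribute("discount.code") String code,
                                double price) {
        // @SpanAttribute records the live value of 'code' on every invocation
        return "SUMMER".equals(code) ? price * 0.9 : price;
    }
}
```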
And later, when you look at a trace of your application, you can see (you can't read it very well here) that this is the method that would not have been collected automatically; we told it to do so, and then it gets integrated into the trace. If I expand it, I also get the value of the variable that I annotated with the span attribute. This is probably the deepest level we can show right here. Anything further would actually mean extending the API or going deeper into the code of the agent, and I don't think that is what we want to do.

Our idea was to show you the different aspects of monitoring components in and around Kubernetes. We started at a very high level and then went very deep into what OpenTelemetry can deliver. Now, what does this mean for you? We probably won't be able to tell you 'this is the right one and that is the wrong one,' but hopefully we were able to give you an idea of what you should look into. Also, as I said before, if you have any feedback for us, or there is a tool which we should potentially include in this, definitely feel free to comment.

And just to close it out with the characteristics: we've seen that this approach provides application metrics and lets you do root cause analysis. But, and this is the big but, you have to change your application or container, and your specific programming language needs to be supported. That is of course not the case if you use service meshes or the API server, but in turn those won't let you dig as deep.

Right, so with that we have maybe five minutes left, which hopefully gives a bit of time for questions. First of all, I would say thanks for listening; I know it's been a long session, thanks for being here. So, are there any questions, or are you tired of listening to us by now?

A question about kubectl debug: I want to know if there are any limitations if I want to attach the image to a distroless image.

Okay, so to repeat, if I understood correctly: it was about the kubectl debug feature I showed at the beginning of the slides, and whether there are any limitations. Technically speaking, no. Your Kubernetes cluster needs to have access to the image that you want to attach, so it needs to be present in a registry which the cluster can access, but that's basically the only requirement. The other thing is, of course, that the debug command must be allowed by the API, because very often it is restricted, for a good reason: it lets you modify each of the running containers and add new binaries. But if the API call is allowed and the container image is accessible, then you can do it. The one I showed is an image I built myself, where I assembled the tools that I want to have in it, and I can then attach it to any other container. It's available on Docker Hub, so technically speaking, everyone can use it.

So if I attach a debug container built from a different base image to a distroless image, that's no problem?

No, that's no problem. Let's say the base layer of that debug container might not even be so important, because no new main process is started by the debug container; the container process will still be the one from the initial application container.
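For reference, a typical invocation of the feature being discussed might look like this; the pod, image, and container names are made up:

```sh
# Attach an ephemeral debug container to a running pod.
# The image must be pullable from a registry the cluster can reach,
# and the ephemeral containers API must be allowed.
kubectl debug -it my-pod --image=docker.io/someuser/debug-tools --target=app
```

The --target flag is what makes the debug container share the Linux process namespace of the application container, as described next.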
It's just that the debug container is loaded, and all the binaries in there then become accessible in the namespace of this process. So it's basically using namespace technology, not the Kubernetes kind but the Linux kind, and the debug container is simply joined into it. Okay, okay, thank you. Yeah, thanks, good question.

And on the way he created that container: it was done with a Dockerfile, installing the things you need. One quick way of doing something similar is nixery.dev. You can pull an image like nixery.dev/kubectl/sh/jq, with whatever tools you want, well, not whatever, but a lot of tools, and it will pull down an image that already has those things in it, which is pretty useful.

Okay, thank you for your talk. From your talk, we learned that Prometheus and Grafana are a great combination for observability, especially for metrics. People use Prometheus and Grafana to figure out what's going on with an application, but Prometheus relies on the Kubernetes cluster. What if the cluster goes down? We have faced this situation: the cluster went down, and we could not access Grafana from outside the cluster. What should we do there?

Okay, to repeat the question, and please correct me if I got it wrong: the combination of Prometheus and Grafana is very helpful to monitor the things in the cluster, but what if the cluster goes down and takes all of that down with it? Well, you don't necessarily need to run the Prometheus server and Grafana in the same cluster. You can put your agents into the cluster and run Prometheus and Grafana in another cluster. There are also high-availability options; I think there's Cortex, and there's also Thanos for persisting the data over a longer period of time. That means you would externalize the storage of your Prometheus data to outside the cluster. If the cluster crashes, you won't be able to access it at that very moment, but you're not going to lose your metrics and your data.

Okay, I see. So we should do some work on the architecture side, right? Probably, yes. Thank you. Okay, thanks.

Thank you for your sharing. I think we covered many tools in this presentation, and for users to choose the best observability setup for themselves, they have to pick the right tool. So may I ask, do you have any recommendation? I think this may be a hard question. Is there any information that can help us choose the right observability tool for our application?

Well, you're correct, this is a hard question. First of all, let me repeat and see if I got it right: you asked whether there is any further guidance, across the various toolings, on how to pick the right monitoring or observability solution for your problem. We tried to narrow this scope down for you. Of course, we cannot say 'use this one, it will always work,' because each application landscape is different and each scenario is different. It certainly needs a good level of understanding from the application owners and the architects of which metrics are actually relevant, because they are not always the same.
The way we approached it, we went from top to bottom, and that is basically what we would always recommend. It doesn't necessarily have to be the same tools we used, but, for example, I would never recommend saying 'please use a service mesh all the time,' because you might generate too much overhead for things you never need. Or 'please use OpenTelemetry in every application component,' because that would mean refactoring and rebuilding all your applications. These tools provide value if you place them at the right point; if you just put them in everywhere for good measure, you will probably pay more in overhead than you get out in the end.

My first suggestion would be: try it out. All of the things we've shown here are free, open source, and available, so you can just start installing them and playing around with them. Especially Istio and the API-server-based tools are super easy, because you can just swap them in and out and figure out whether they do the right job for you; if they do, you can continue in that direction, and if not, you can remove them again. On the other hand, when it comes to production workloads, it might also be an option to consider commercial solutions. That is something we didn't want to do today, for obvious reasons, but the commercial solutions will probably combine these things in a more efficient way than we have shown. Technically speaking, though, they don't have any monitoring capabilities beyond the tools we've shown today. So most likely you can try out commercial solutions and see if they are what you need. Beyond that, at least I am not aware of any definitive guidance that says 'this is how you approach it and this will solve your problem.' Otherwise, we wouldn't have had to give this talk. Thank you.

Thank you. If there are no more questions, then once again, thanks from our side. Thanks for being our guests, and enjoy the rest of the conference. Thank you.