Hi, everyone. Welcome to the SIG Instrumentation deep dive. We're going to start by introducing ourselves. Hello, everyone. I'm Damien Grisonnet. I work for Red Hat as a software engineer. I've been working on OpenShift monitoring for two years, I guess, and now I've switched to the API server. I'm also a tech lead for SIG Instrumentation. Hi, everyone. I'm David Ashpole. I'm the other tech lead for SIG Instrumentation. I've been working on Kubernetes for five or six years now, and I work for Google. Hey, everyone. I'm Han. I've been chairing SIG Instrumentation for about three years now, working on Kubernetes for about four years. Yeah. And let's go into SIG Instrumentation.

This is our agenda. We're first going to talk about what SIG Instrumentation actually is and what we do. Then we're going to talk about our SIG subprojects, things that we hope you may participate in and contribute to. Then we're going to go into some of the main themes or work streams that we work on, like logs and traces and metrics. And then we'll talk about how you can contribute and where you can find us. First, let's go over the charter. Yeah, the charter is basically to cover the best practices for observability across Kubernetes components. So basically, this means that while we don't own individual metrics, we own the practices, the formats, and the tools that people use to instrument Kubernetes components like the kube-apiserver, the scheduler, the controller manager, all of these things that you are familiar with. We have a large number of subprojects. Here we have listed kube-state-metrics, which you've probably heard of since it's quite popular; klog, which is used in basically all Kubernetes components; and metrics-server, which is used for autoscaling. And we have many more subprojects. We also work a lot on metrics, meaning the Prometheus metrics that are instrumented across all of the Kubernetes components, and on logs, which are also in all of the Kubernetes components. We're working on structured logging, so that you can get a structured format instead of dealing with free-form text, and on events as well as traces. So how do we do it? We have bi-weekly triage meetings where we go through all SIG Instrumentation-related issues and PRs. We triage them and assign them to contributors who join us, and we hope that you will join us too. We review all code changes for metrics, and we develop new features and enhancements; we'll go over some of the things that we're working on in this talk. And we also maintain the large number of subprojects that we have.

All right, on to the subprojects. Yeah. So thank you for the introduction. I will go through the subprojects, some of which I'm more familiar with since I'm also contributing to them. The three most popular subprojects that we have, and that the SIG governs as well as helps maintain for the community in general, are kube-state-metrics, metrics-server, and prometheus-adapter. Even though you may not have heard of some of them, I will explain their purpose and why they fall under our scope as well. So the first one is kube-state-metrics, probably the most popular one and the one that you are the most likely to have running alongside your Kubernetes cluster. kube-state-metrics is a simple Prometheus exporter, which is a piece of software that is able to convert third-party data into metrics in the Prometheus format.
And in the case of kube-state-metrics, we are getting metrics from all of the Kubernetes objects in general, since those can be quite insightful in a cluster. The reason this is not part of Kubernetes directly is that the metrics can be quite verbose and maybe out of scope for a normal Kubernetes setup, so we have this exporter to expose that kind of data. For example, we have metrics about pods, about Deployments, about StatefulSets, which are useful for getting insights into your cluster or even for building alerts when, say, a deployment is failing. One example is kube_deployment_spec_replicas, which tells you how many replicas your deployment specified at a given point in time, as well as the updated-replicas status metric, kube_deployment_status_replicas_updated, which means that if there were failures during a rolling update or something similar, you would be able to see it via this metric. So it's really insightful for investigating failures. And how does kube-state-metrics do that? kube-state-metrics watches for events on the API server. Whenever a new object is created or updated, or there is any mutation to an object in Kubernetes, kube-state-metrics gets that information and converts it into a metric that can then be collected by Prometheus, and you can then create alerts on it.

Another project that you may not have heard of, but that might be running in your cluster because it's essential nowadays in a Kubernetes cluster, is metrics-server. It implements one of the APIs that we support as a SIG, the Resource Metrics API, which is the API behind commands such as kubectl top and which essentially lets you know the resource usage of your pods and nodes. Another use case of this API is autoscaling. Most basic autoscaling is done via the Resource Metrics API, where you tell the autoscaler at which point, in terms of CPU usage, you want a new replica of your application. And this is done via the API that we own as a SIG. How it works is that metrics-server is what we call an aggregated API server, a kind of extension API server that the main API server forwards resource metrics requests (for example from the autoscaler) to. metrics-server itself grabs the CPU or memory usage from the kubelets and then returns all of this information to the Kubernetes API server. So if you have an application, or kubectl, that tries to get CPU usage info, the request goes through the API server and then to metrics-server.
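To make the Resource Metrics API a bit more concrete, here is a minimal Go sketch, not taken from the talk, of the kind of query that kubectl top nodes performs against the metrics.k8s.io API served by metrics-server; the kubeconfig handling and the output format are assumptions for illustration only:

```go
// Minimal sketch: query the Resource Metrics API (metrics.k8s.io), which
// metrics-server serves as an aggregated API. Assumes a reachable cluster
// and a kubeconfig at the default location.
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Load the kubeconfig the same way kubectl does.
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	mc, err := metricsclient.NewForConfig(cfg)
	if err != nil {
		panic(err)
	}

	// Rough equivalent of `kubectl top nodes`: list node usage reported by metrics-server.
	nodes, err := mc.MetricsV1beta1().NodeMetricses().List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, n := range nodes.Items {
		fmt.Printf("%s cpu=%s memory=%s\n", n.Name, n.Usage.Cpu(), n.Usage.Memory())
	}
}
```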
Another project that is quite similar, but has a broader purpose, I would say, is prometheus-adapter, which supports the three autoscaling-related APIs that we own as a SIG: the Resource, Custom, and External Metrics APIs. The motivation is that there are limitations when you are doing autoscaling with metrics-server, since you only have resource usage metrics, and you might want to autoscale your application based on, say, the number of requests currently being processed by your application. That can be done via prometheus-adapter, for example, but there are many other adapters; as long as they support the Custom or External Metrics API, they should be able to autoscale based on any kind of signal. And in the case of prometheus-adapter, when it receives a call from the API server to get, say, the number of items in your RabbitMQ queue, it converts that request into a Prometheus query, queries the Prometheus server directly, and returns the results, so that your autoscaling pipeline can contact prometheus-adapter to get any kind of information related to your application.

So now that we've gone through a couple of projects that we own: as I mentioned in the beginning, we own the three major observability signals as part of our SIG, and I will now go through logs. For logs, we own the infrastructure around logging in Kubernetes as well as some parts of the logging output that is produced by all the Kubernetes components. We don't control what the other SIGs put in their logs or how they produce them, but we are responsible for helping them add logging to their code, as well as for guidance on how to write a good log line in general. We've invested quite a lot in logging over the past couple of years because, well, before, we had no real structure in our logging. The format was essentially inherited from klog, which is the logging library we use to produce logs all over Kubernetes, and it was just a basic string. So whenever you wanted to correlate a log with any other signal, or try to do some pattern matching on it, you had to go through complex regular-expression matching, which wasn't ideal. So we came up with what we call structured logging, where we try to build some kind of format around logs in general so that they can be easily correlated and aggregated by third parties, like a logging platform. In general, we try to have message fields that are constant strings, so that we get patterns that recur between logs, plus key-value pairs that can be used later on. And we have two formats for this today. The first one is text-based, which is mostly used by developers because it's quite easy to read, and investigating it is much simpler than JSON. And then we also have JSON, which is what most logging platforms ingest today; it's easier for these platforms to build querying on top of the log lines, since JSON is more structured in general. So that's what we've worked on for the past couple of releases. It was fully integrated into the kubelet in 1.21 with the help of a lot of contributors, because it was a very big effort, and later added to the kube-scheduler, which took quite some time because there were some hiccups, but we got that done in 1.24. In general, there are still many Kubernetes components where we want to add structured logging, and that's why we need even more contributors to help us build that in Kubernetes.
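As an illustration of the structured logging just described, here is a minimal sketch of the difference between a free-form klog call and the structured InfoS/ErrorS style with a constant message plus key-value pairs; the messages and values are made up:

```go
// Minimal sketch of structured logging with klog (k8s.io/klog/v2).
// The messages and key/value pairs here are made up for illustration.
package main

import (
	"k8s.io/klog/v2"
)

func main() {
	defer klog.Flush()

	// Old style: free-form text, hard to parse or correlate with other signals.
	klog.Infof("Pod %s/%s failed to start: %v", "default", "nginx", "image pull backoff")

	// Structured style: a constant message plus key/value pairs, which the
	// text and JSON backends render and logging platforms can query.
	klog.InfoS("Pod failed to start", "pod", klog.KRef("default", "nginx"), "reason", "image pull backoff")
	klog.ErrorS(nil, "Failed to update node status", "node", "worker-1", "attempt", 3)
}
```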
One thing we also found while doing that is that klog, the logging library we use in the Kubernetes components, has a lot of flags which may not be meaningful nowadays, flags that you would rather expect to see on a logging platform than in the logging client itself. So we've initiated a deprecation, as well as a removal, in all Kubernetes components, and these flags will be removed in 1.26. So if you are a component owner that is still using these flags, make sure you're aware of it, because we will remove them in the next release. That won't be a problem for any normal klog user, though, because we are only doing this for the Kubernetes components; a normal user of klog won't have the flags removed. We've done this to reduce the maintenance burden incurred by klog in general, because there were way too many flags. If you want to have a look at this particular effort, there is a KEP linked on the slide that you can check.

But now that we have structured logging, we've noticed that it isn't complete; there is still a way to improve logging in general, which is to add context, in what we call contextual logging. This is new in Kubernetes; we added it in alpha in 1.24, and only a couple of components have it today. The idea behind contextual logging is that if you were to do logging in Kubernetes today, you would just inherit from a global logger that is shared between all your processes and all your components, which is not ideal if you want very specific behavior depending on the scenario you are in. So what we did is that now we pass the logger via the context, which means that along your call chain you can modify the logger and add new information to it. The most common use case is to make sure that, once you reach a certain point in the call chain, you add key-value pairs to all the logs that will be produced via this context. So let's say you are in the scheduler and you want all the calls related to a pod being scheduled to attach the name of the pod; you can do that via contextual logging just by attaching that particular key and value to all the logs. And yeah, this is implemented via a new API in klog that was introduced by the SIG, and there are many code changes that will need to be made in order to spread it across all the Kubernetes components. We really need additional contributions for that part of the code. Yes, please come help us. And we need additional contributors in the Structured Logging working group. So far I've only talked about the SIG, SIG Instrumentation, but in Kubernetes there are also working groups, which are smaller groups related to a SIG but dedicated to one particular area of the code. In the case of WG Structured Logging, it was first meant to introduce a way to build structured logging in Kubernetes, but now it has a bigger purpose, since it also covers contextual logging. There are two organizers, Marek and Patrick; you can reach out to them via the Slack channel, and you can also join the bi-weekly meetings if you want to contribute. This is one of the best areas to contribute to, I guess, as a new Kubernetes contributor, because you will get to discover various parts of the codebase. Since everything needs to be changed in terms of logging, you will get your hands into many aspects of the codebase, and it's also well documented how to make the changes. So that's a really good area to start your contribution career, I guess.
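To make the contextual logging flow described above a bit more concrete, here is a minimal sketch of the klog context API; the function names and key-value pairs are hypothetical and not actual scheduler code:

```go
// Minimal sketch of contextual logging with klog/v2: the logger travels in
// the context and can be enriched along the call chain.
package main

import (
	"context"

	"k8s.io/klog/v2"
)

func schedulePod(ctx context.Context, podName string) {
	// Enrich the logger once; every log produced from this context below
	// this point carries the "pod" key automatically.
	logger := klog.FromContext(ctx).WithValues("pod", podName)
	ctx = klog.NewContext(ctx, logger)

	logger.Info("Attempting to schedule pod")
	bindPod(ctx)
}

func bindPod(ctx context.Context) {
	// Retrieve the enriched logger from the context.
	logger := klog.FromContext(ctx)
	logger.Info("Bound pod to node", "node", "worker-1")
}

func main() {
	defer klog.Flush()
	schedulePod(context.Background(), "default/nginx")
}
```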
Cool, let's talk about traces. Traces are exciting and actually pretty new to the Kubernetes project. So first, what is distributed tracing, just really briefly? If a user makes a request to something, it often passes through a variety of services along its path, and we'd like to have an idea of what path that particular request took. And it would be great if we got a nice graph or something that showed that to us visually; that's what tracing provides. So there's a thing called a trace context that's propagated between components in your microservice architecture, and it carries a trace ID and a span ID. With that, each component can produce telemetry that we call spans, which share a single unifying attribute across all of them, and that way a trace backend can stitch them all together into a graph like you see below. In Kubernetes, we have a couple of components that look very similar to this diagram, the main ones being the kube-apiserver and etcd, but we've also gone ahead and instrumented the kubelet and the container runtime, which serve requests back and forth to each other. A few things to note: we use OpenTelemetry. The OpenTelemetry Go project has been stable for more than a year now, so that's what we use for tracing, and we export our traces in the OpenTelemetry format.

Okay, so what's the current state? API server tracing was introduced in Kubernetes 1.22 and is planned to go beta in 1.26. So now is the time to try it out, and now is the time to provide feedback. Feel free to hit me up on Slack if you have any thoughts, or if you've tried it out and want to see anything different. Kubelet tracing was just released as alpha in 1.25, and that's really exciting. And another big thing to call out, which many of you may have been impacted by: before 1.26, which is like now, we used an unstable version of the OpenTelemetry library. So if you were trying to use OpenTelemetry and the Kubernetes client libraries at the same time, you might have had some problems. But don't worry, those are resolved now. Yay.

Okay, so let's dive a little bit deeper into API server tracing, because that's been a fun project I've been working on. This is what it initially looked like in Jaeger, which is another CNCF project, as of Kubernetes 1.22. It's pretty bare-bones. You can see the top span is the user's call to the API server and the API server serving it. The middle one is the API server's etcd client making a request to etcd, and the bottom one is actually coming from etcd, responding to the API server. As an aside, there's been this thing that's existed in Kubernetes for a long time, which I refer to as log-based tracing. Basically, there's something that looks very much like trace instrumentation already sprinkled across the Kubernetes codebase, and if something takes a long time, you get a giant blob of log lines printed out that sort of looks kind of like a trace and tells you roughly what happened. One of the improvements that landed, I think yesterday, is that we were actually able to sneak in and change those calls to also integrate with OpenTelemetry-based tracing. So now you can see the red boxes there highlighting some of the new spans that have come from taking that existing log-based instrumentation and migrating it to also use OpenTelemetry.
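For a rough idea of what OpenTelemetry span instrumentation looks like in Go, here is a minimal sketch; the tracer name, span names, and attributes are illustrative and not the actual kube-apiserver code:

```go
// Minimal sketch of creating spans with the OpenTelemetry Go API.
// With no SDK/exporter configured, the global tracer is a no-op; wiring an
// OTLP exporter to a backend like Jaeger is what the demo below does.
package main

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
)

func handleCreate(ctx context.Context) {
	tracer := otel.Tracer("example/apiserver")

	// Start a span; the trace ID and parent span ID travel in ctx, which is
	// how child spans end up stitched into the same trace.
	ctx, span := tracer.Start(ctx, "Create node")
	defer span.End()

	span.SetAttributes(
		attribute.String("resource", "nodes"),
		attribute.String("verb", "create"),
	)

	storeObject(ctx)
}

func storeObject(ctx context.Context) {
	// A child span for the downstream call, e.g. the write to etcd.
	_, span := otel.Tracer("example/apiserver").Start(ctx, "etcd write")
	defer span.End()
}

func main() {
	handleCreate(context.Background())
}
```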
And actually, I have a really quick demo. Let's do that. So first, I'm going to start up a Jaeger pod locally. And second, I've written this; it's actually just a modified version of the API server integration test. Basically I'm going to configure API server tracing like so, set the sampling rate to 100%, and pass that to it. I've also turned on etcd's tracing here, just for the integration test, and I've pointed it at a local OTLP endpoint, which is supported by Jaeger. And this will take 15 seconds. Oh, how do I clear the test cache? Do you remember? Let's do it. Okay. There we go. So this will take 15 seconds. We pop over here. Jaeger's up and running, but it doesn't have any traces yet. Okay, there it went. So now we should be able to pop in here and go in. Oh, the other thing I meant to mention is that the API call this makes is creating a node object. So that's what we've done: set up an API server, create a node, and let's see what kind of traces we get back. So I'll select the API server, look for a create operation, and click on the first result. And there we go. This is actually the same as that screenshot, but it's much more fun, because I can actually go in and look at all of the interesting attributes and things that have been added here. All of that actually existed in the API server code, probably has existed for years, but hopefully it's now much, much more usable. So if I were trying to debug some issue with a create call, it would be much easier now, because I can see a lot of detail about exactly what was happening.

Okay, cool. Let's talk briefly about kubelet tracing. This is a kubelet trace from the feature that landed in 1.25, and it's pretty basic, to be honest. What you can see is that the top line is from the kubelet, and that is when the kubelet started the request to create a container, and the bottom line is from the container runtime, which is where it's serving the request to create the container. As you can see, it's basically the same amount of time, and there are useful attributes, but it's almost like we could have just used a metric for this. So what's really missing here? One of the things that's missing from the kubelet is that we'd really love to have some way to tie together a whole bunch of container runtime operations. To create a pod, I need to make a sandbox, I need to pull images, I need to create containers, right? And I'd like to see how all of those are related, so that if a pod creation is taking a while, I can go figure out which aspect of it was actually responsible. As far as general tracing goes, both in the kubelet and in the API server, we really want to add exemplars to the Prometheus metrics that are associated with traces, so that someone using a trace backend and a metrics backend like Prometheus can link back and forth between the two. And the same thing goes for logs: you can insert those common IDs we were talking about earlier into the log messages, and that way you can jump from traces to logs, or even just find all of the logs associated with a particular request. And while I was at KubeCon, I made a proof of concept of what kubelet tracing would look like if we had that unifying parent span. You can see that the span we were looking at before is actually this tiny one right here. So what before looked not very useful, now that it's put in context, in the context of creating an entire pod, is actually much more valuable. It just goes to show that it may only be adding a span, but it actually changes the nature of what you can figure out from your trace telemetry. And that's it.

Cool. Let's talk about metrics. This is probably one of my favorite topics; I am sometimes referred to as the metrics guy. Let's go over some boring stuff first. Yeah. Kubernetes uses Prometheus, and Prometheus uses a client-server architecture. Basically, your software is instrumented with a Prometheus client and exposes a metrics endpoint, which is text-based. This is scraped and stored in a time series database.
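As a small illustration of that client-server model, here is a minimal sketch of Go code instrumented with the Prometheus client, exposing the text-based /metrics endpoint that a Prometheus server would scrape; the metric name and port are made up:

```go
// Minimal sketch of Prometheus client-side instrumentation: register a
// metric and expose the text-based /metrics endpoint for scraping.
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var requestsTotal = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "example_requests_total",
		Help: "Total number of handled requests.",
	},
	[]string{"code"},
)

func main() {
	prometheus.MustRegister(requestsTotal)

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		requestsTotal.WithLabelValues("200").Inc()
		w.Write([]byte("ok"))
	})

	// Prometheus scrapes this endpoint and stores the samples in its TSDB.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```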
So there was this one time I was on call, and basically I was debugging some issue and had to look at charts, except one of the charts I wanted to look at had no data; it was blank. I looked at the underlying data and there was no data. After doing a bit of digging, it turned out that open source, upstream Kubernetes, had basically renamed a metric. I don't know how much you've thought about this, but one cannot simply rename a metric, because when you rename a metric, the old metric just ceases to exist and you end up with a new metric. So all of your charts, which reference a metric by name, now just don't do anything, right? And there was this other effort going on at the same time called the metrics overhaul. What the metrics overhaul was doing was taking a huge set of metrics in Kubernetes and trying to conform them to best practices in upstream Prometheus. This is obviously a noble effort; we should definitely adhere to standards, and we want to do that in Kubernetes. The problem is that this basically broke the entire world's ability to keep using their charts, right? Because you basically just deleted all of the Kubernetes metrics from the control plane and kubelets, the scheduler, whatever.

So as a result of this, we launched one of the larger initiatives in SIG Instrumentation, something called the Kubernetes metrics stability framework. What it does is introduce stability classes for metrics, which allows us to guarantee a certain API around metrics. So how did we do this? Well, Prometheus doesn't natively allow you to add stability levels to metrics or metric definitions, so we wrapped all of the Prometheus descriptors and added a metadata field called the stability level. So you can annotate metrics with a stability level, and then we created a static analysis pipeline. I don't know how familiar you are with static analysis; it's basically compiler-level stuff where we go through the entire Kubernetes codebase, analyze it, look for all metric definitions, and look for stability classes. Then we parse that information, and we guarantee that the structure of a stable metric does not violate the guarantees we provide the community. So we have built that, and we have been improving on it over time. And not only that, but this requires a lot of code, right? We don't want to break the entire world's ability to operate Kubernetes; that would be terrible, and we very nearly did that with the metrics overhaul. We created a package in component-base for all of this code, because it is a lot of code to prevent, you know, obviously tremendously bad breakages. And if you want to read the original KEP — this was a few years ago that we started this effort — there's a bit.ly link, metrics stability. We are now extending this framework. When we started it, we actually only had two stability levels: stable and alpha. Stable metrics are guaranteed not to change for at least one year, so you can be sure that your charts and alerts will not break for a year; alpha metrics don't have any such guarantees.
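Here is a minimal sketch of what declaring a metric through the wrapped framework in k8s.io/component-base/metrics looks like, with the StabilityLevel field that the static analysis pipeline checks; the metric itself is made up for illustration:

```go
// Minimal sketch of the Kubernetes metrics stability framework: the
// Prometheus options are wrapped and gain a StabilityLevel field.
package main

import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

var exampleOps = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Namespace:      "example",
		Name:           "operations_total",
		Help:           "Number of example operations, by result.",
		StabilityLevel: metrics.ALPHA, // or metrics.STABLE for guaranteed metrics
	},
	[]string{"result"},
)

func init() {
	// Kubernetes components register through the legacy registry wrapper.
	legacyregistry.MustRegister(exampleOps)
}

func main() {
	exampleOps.WithLabelValues("success").Inc()
}
```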
We are now introducing beta, which is experimental. We have not yet defined what guarantees beta metrics have; we are working on this, and we would love community input on what would be reasonable for a beta-class metric. We also have internal metrics, which are likely to not have stability guarantees: some metrics basically describe code, an implementation detail of Kubernetes, and therefore change when the code changes. These are highly variable, mutating kinds of metrics; those would be classified as internal, would probably not have stability guarantees, and you probably don't even want to chart them or define SLOs on them. We may also introduce stability guarantees for alpha metrics; that is on the table. You should come to our meetings and we can discuss it, and, you know, we would love community input on what would be reasonable. So, yeah, we're establishing the guarantees for the new stability levels. The static analysis pipeline that we built parses the entire Kubernetes codebase, all of the metrics, and it allows us to auto-generate documentation for every single Kubernetes metric defined in the codebase. This recently landed, and in 1.26 we will have documentation: you will be able to see every single metric that the core Kubernetes control plane outputs — the scheduler, the kubelet, the API server, the controller manager. We even have meta-metrics about registered metrics: there's a metric called registered_metrics_total, which tells you how many stable metrics there are and how many alpha metrics there are. And for the KEP, there's a bit.ly link, extending stability. We're still working on this; it's in alpha.

So if you want to contribute, come to our meetings. We are, you know, a pretty friendly, open SIG, and I think one of the easier SIGs to get involved with. Yeah, it's a little bit easier, I think, for newcomers, so we would love for you to get involved. You should come to our SIG meetings. We have a SIG meeting every week; we basically alternate triage meetings, where we go through PRs, with a regular SIG meeting, where we go over higher-level issues and concerns about subprojects and, you know, KEPs. And we are also open to any ideas that you might have on how to improve observability in Kubernetes in general. We welcome you to participate. We're also looking for contributors for our subprojects, like kube-state-metrics and prometheus-adapter. In fact, we have been discussing deprecating some of our subprojects because of a lack of contributors; basically, we can't support the large number of subprojects that we own, and we are in dire need of people to come and help us with them. So please, please come. metrics-server and structured logging are also in need of contributors, and you can contact Marek, who has been driving the working group for structured logging. These are the SIG meetings that we have; they're basically weekly, and we alternate, like I said before. We have a Slack channel, #sig-instrumentation, and we also have a mailing list on Google Groups. You can hit all of us up on Slack; we are, you know, available basically all the time. We might not respond right away, because some of us don't check Slack that often, but it's not because we're ignoring you; it's just that whenever we check Slack is whenever we check Slack. I don't use Slack day to day for my job, so it's kind of intermittent for me, but I will respond to you as soon as I see it. Thank you for coming to our talk.
And yeah, I hope to see you in SIG Instrumentation. So if you have any questions, we'd be happy to answer questions about our SIG.

Hi. Thank you for the presentation, it was very insightful. I was wondering, for the tracing of the kubelet, have you ever considered using kernel traces to collect more information? Because I'm actually doing a research project on that. That sounds very cool. I had not thought of that, so happy to talk. Yeah. We also accept KEPs, so if you want to write a KEP on how to integrate that kind of thing with Kubernetes, we are definitely open and amenable to any ideas to improve things.

Thanks for the talk. So my question is around instrumentation libraries, right? It sounds like you moved to OpenTelemetry for tracing, and I'm maintaining the Prometheus client_golang library. I'm interested in what the future of metric instrumentation is. Are you also trying to maybe converge on using OpenTelemetry Go for metrics as well, or how can we collaborate? What's the future here? David and I were actually just talking about this. So we did a lot of work wrapping the previous libraries. I have been suggesting to David that if OpenTelemetry were to actually natively support stability classes, there would be a path for us to move towards that. Otherwise, we can still get exemplars through the Prometheus library. So, yeah, those are the options on the table, actually. Nice, thank you. I think the stability framework would be a good feature for client_golang as well. So, yeah, thank you. I would say as well that we're clearly very committed to metric stability, and we already have a lot of Prometheus endpoints, so whatever we do, we would make sure that we maintain the same metric names and attributes and so on. So in some ways this is just an internal implementation detail of how we manage code in Kubernetes.

Thank you for your talk. For the tracing part, I have instrumented several operators with the native OpenTelemetry client directly. I'm wondering if you will provide a framework or library to instrument common operators in Kubernetes. Yeah, I think operators are really tricky, right, because they're driven by Kubernetes watch events: something changed in Kubernetes, and now they're reacting to it. We haven't quite solved the context propagation puzzle there, but it's something — I mean, I gave a talk in 2018 or 2019 on a proposal that unfortunately hasn't come to fruition. So it's something a lot of people are interested in, and there has been a lot of excitement about it historically. There's another project called kspan, which addresses this partially, that you might be able to look into. But, yeah, right now there isn't anything today. Oh, okay, yeah, thank you. Yeah, I was going to ask the same question about kspan, because it turns Kubernetes events into traces, right? Are Kubernetes events also part of SIG Instrumentation? Yes — I mean, Kubernetes events are part of SIG Instrumentation, and by that I mean the definition of an event, not the events produced by the components. So you're correct that that is part of the code we own within Kubernetes. Okay, yeah, sorry, I have one more last question. I think, Han, you mentioned there's auto-generated documentation for metrics, right? Can I ask how that is implemented?
Because I think a lot of open source projects need such a capability. We have a very elaborate static analysis framework, which basically runs through every single file in the Kubernetes codebase, analyzes all of the metric definitions, and then outputs a YAML file with basically the entire set of metric definitions. And then we parse that into a markdown file, which ends up in the Kubernetes website repository. Okay, thank you so much, thank you. I guess we can close it here. Thank you everyone for attending. Yes, thank you everyone. Happy KubeCon.