Hi everyone, welcome to the talk, an introduction and deep dive for SIG Instrumentation. I am Richa. I'm a software engineer at Google, and I've been a contributor to SIG Instrumentation since around last year. Hello everyone, I'm Damien. I work for Red Hat. I'm a maintainer of kube-state-metrics, metrics-server, and prometheus-adapter, and I'm also a co-tech-lead for SIG Instrumentation. Hi everyone, I'm David Ashpole. I'm also a co-tech-lead, and I work at Google. Hi everyone, I'm Han. I'm also a software engineer at Google, and I'm one of the chairs for SIG Instrumentation.

All right, so our agenda for today is to first go through a quick introduction of SIG Instrumentation — our group, what we do, and what our purpose is — and then we'll dive into the concrete subjects, which are the sub-projects that we are responsible for, as well as the three principal observability signals that we see every day: metrics, logs, and traces. And then we'll go over our future plans for the group and how you can help us and contribute.

So, what do we do? For those who are not familiar with how the Kubernetes project is structured, it is divided into special interest groups that each have a specific area to work on. In our case, our charter is to cover best practices for cluster observability across all the Kubernetes components, and to create new components to cover some of the gaps that we are seeing. For the sub-projects that cover those gaps we have, for example, kube-state-metrics, klog, and metrics-server, but there are many more. We are also responsible for the signals — metrics, logs, and traces — and we also have events in Kubernetes, which are similar to logs. And how do we do it? We triage and fix the issues that are relevant to instrumentation in Kubernetes and in the sub-projects that we own. We also review all the code changes that are made to any signal in Kubernetes — metrics, logs, and traces — and we develop new features and enhancements to drive observability in Kubernetes further. And of course we still need to maintain all of the sub-projects that we have and not abandon them.

So, talking about sub-projects, today we will dive into four of them: metrics-server, prometheus-adapter, usage-metrics-collector, and kube-state-metrics. The first one is one of our oldest. Even if you don't know its name, metrics-server, you might have used it at some point, because it is the source of kubectl top, the command that you can use to inspect pod and node utilization. It does that by implementing the Resource Metrics API, which is our way to connect to the autoscaling pipeline and for metrics-server to expose resource utilization metrics to Kubernetes. So you would see it via the HPA or the VPA, which use this API to autoscale your application based on, for example, the CPU usage of your pods. The metrics it serves come from the kubelet — you have one instance on each of your nodes — and metrics-server scrapes that and exposes it via the kube-apiserver to any application. But metrics-server is very lightweight, has a dedicated purpose, and only supports resource usage autoscaling.
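To make the Resource Metrics API a bit more concrete, here is a minimal Go sketch (not metrics-server's own code) that reads it through the metrics client — the same API that kubectl top and the HPA consume. The kubeconfig loading and the "default" namespace are just placeholders.

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/tools/clientcmd"
	metricsclient "k8s.io/metrics/pkg/client/clientset/versioned"
)

func main() {
	// Load a kubeconfig the usual way; error handling is kept minimal here.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	mc, err := metricsclient.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// List the current CPU/memory usage of pods in "default", as served by metrics-server.
	podMetrics, err := mc.MetricsV1beta1().PodMetricses("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for _, pm := range podMetrics.Items {
		for _, c := range pm.Containers {
			fmt.Printf("%s/%s cpu=%s memory=%s\n", pm.Name, c.Name, c.Usage.Cpu(), c.Usage.Memory())
		}
	}
}
```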
If you want to do more than that, you can use another project that we own, prometheus-adapter, which implements three APIs instead of just the Resource Metrics API: it also implements the Custom Metrics and External Metrics APIs, and those allow autoscaling based on any kind of metric. So for example, if you want to autoscale your application based on the rate of requests it is receiving, you can use a project such as prometheus-adapter. The way it works is that it queries Prometheus and then exposes those metrics to the kube-apiserver, which then exposes them to your autoscaling pipeline. And because it queries Prometheus, any metric that you collect in your Prometheus backend can be used for autoscaling. Nowadays there are solutions that go even further, such as KEDA, which lets you do that with any kind of data source, not only Prometheus.

The next project that I want to talk about is pretty new, and it was pretty exciting because it was given to us earlier this year. It covers a gap that we noticed quite recently, which is that scraping resource usage metrics doesn't scale well. There are limits in terms of performance that we were hitting, which were preventing us from reducing the scrape interval to, let's say, one second, to get even more data than we used to have — for example, for capacity planning. usage-metrics-collector is very specialized for these metrics and allows one-second scrapes, so that in your dashboards you can get a data point for the resource utilization of your workloads every second. It also performs aggregation at collection time, which means that instead of having all the time series for resource utilization stored in your monitoring backend, the project aggregates them ahead of time, so only the key information is stored in your database. Another advantage of this project is that it doesn't require any PromQL knowledge, which can be quite difficult to work with sometimes — I'll show you an example in a moment. One thing that is important to mention is that it currently only works with cgroup v1, which means that from Kubernetes 1.26 onward you need to disable cgroup v2 in order to make it work. We are actually looking for contributors to help us figure out how to fix that, and contributions are definitely welcome. So, an example: to get the P95 utilization for your workload with a sampling rate of one second, you would use the query on the screen right now. As you can see, there is no PromQL at all — you spell out your aggregation and which operation you want to perform, so it's pretty straightforward, and it gives you the metrics that you want.

Then kube-state-metrics, which is one of the most active sub-projects that we maintain today. It is used to generate Prometheus-style metrics from any Kubernetes API object — for example, metrics about pods or deployments. In the example I shared below, these two metrics can give you an idea of how a rolling update of your deployment went, which is really insightful for cluster admins.
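As an illustration of what "Prometheus-style metrics from Kubernetes API objects" means, here is a hedged Go sketch that builds one deployment gauge with the Prometheus client library. The metric name follows kube-state-metrics' naming conventions, but the code is illustrative and not kube-state-metrics' actual implementation.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
	appsv1 "k8s.io/api/apps/v1"
)

var deploymentAvailable = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{
		Name: "kube_deployment_status_replicas_available",
		Help: "The number of available replicas per deployment.",
	},
	[]string{"namespace", "deployment"},
)

// record turns the state of one Deployment object into a metric sample.
func record(d *appsv1.Deployment) {
	deploymentAvailable.
		WithLabelValues(d.Namespace, d.Name).
		Set(float64(d.Status.AvailableReplicas))
}

func main() {
	prometheus.MustRegister(deploymentAvailable)
	// In kube-state-metrics the objects come from watches on the API server;
	// here you would call record() whenever a Deployment changes.
	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":8080", nil)
}
```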
There had also been a request for quite some time to add this kind of support for CRDs, because cluster admins wanted metrics about their custom resources. So we added a new feature to support those, and via configuration any cluster admin can add metrics for their CRDs — here is an example for uptime on a custom resource. However, while working on that we noticed that the original implementation wasn't ideal, and we are currently trying to improve it: first by moving the configuration to a CRD, so that we don't need to restart kube-state-metrics every time, and also by trying to simplify the configuration, because it had some limits, there were corner cases that we didn't handle well, and the original syntax of the configuration wasn't what we wanted — it wasn't the best for users. So we are trying to figure that out right now, and if you want to contribute, feel free to join us. And that's it for the sub-projects. I will let Richa walk you through metrics.

Thanks, Damien. So let's talk about metrics in Kubernetes. Kubernetes uses Prometheus to instrument a ton of metrics that can be consumed by software that understands the Prometheus metric format, and those tools can then build monitoring — in the form of dashboards and alerts — on top of the consumed metrics to monitor Kubernetes workloads. Prometheus has a client-server architecture, so Kubernetes components are instrumented using the Prometheus client libraries, and the metrics are exposed over HTTP on a text-based /metrics endpoint.

Before I jump into one of the bigger projects that falls under SIG Instrumentation's metrics arena, I wanted to give a bit of context. A few years ago, SIG Instrumentation was involved in a metrics overhaul project, in which we were trying to bring the then-current Kubernetes metrics up to the recommended Prometheus standard. As part of that project, we ended up renaming a bunch of the existing metrics. Now, renaming a metric is tricky, because when you do that, the original metric ceases to exist — what you have actually done is create a new metric in place of the original one. So all the dashboards and alerts that were still referencing the original metrics stopped working; they broke. And that's what we ended up doing: we broke monitoring for all the Kubernetes users who were still relying on the original metrics that we had renamed as part of the metrics overhaul. That's how we realized that one cannot simply rename metrics. So, to prevent this from happening again, SIG Instrumentation came up with the concept of a metrics framework around Kubernetes metrics to express certain stability guarantees. This framework also gave us some automation, in the form of checks that prevent contributors from introducing changes to metrics that would break things for the end users of those metrics. It also provided a mechanism to centralize all the instrumentation-related code and processes in one place. Here is a link that you can visit to read more about the Kubernetes metrics framework.
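To show what the framework looks like in code, here is a small, hedged sketch of declaring a metric with k8s.io/component-base/metrics, where the definition carries its stability level; the metric name here is made up for the example.

```go
package example

import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

var exampleRequests = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Namespace:      "example",
		Name:           "requests_total",
		Help:           "Number of requests handled, by result code.",
		StabilityLevel: metrics.ALPHA, // ALPHA: no stability guarantees, may change at any time.
	},
	[]string{"code"},
)

func init() {
	// Kubernetes components register framework metrics with the legacy registry.
	legacyregistry.MustRegister(exampleRequests)
}
```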
Stability for Kubernetes metrics is currently expressed in the form of stability levels, or classes, and right now we have these four classes. Initially we started with just alpha metrics and stable metrics. Alpha metrics can change at any time; they do not have any stability guarantees. A stable metric, on the other hand, cannot change arbitrarily — it has certain well-defined stability guarantees. For example, if you wanted to deprecate a stable metric, you would have to provide a public announcement of its deprecation, and even after the deprecation the metric is still supported for a specific period of time. But with only these two levels, we realized that going from having no stability guarantees to suddenly making a metric immutable was too big a jump, and there was a need for two more levels, so the internal and beta levels came about. The beta level was introduced as a transitory phase for a metric on its journey to becoming a stable metric, so beta metrics have more stability guarantees than alpha metrics. Internal metrics are more fluid; they are very tightly coupled with the Kubernetes code base, and as the code base evolves these metrics can change structure, so they do not have any stability guarantees associated with them — it's not even recommended to use internal-level metrics for monitoring Kubernetes workloads. You can read more about the stability levels in the first link mentioned here, and if you want to know more about the process for deprecating a metric, the second link is helpful.

Talking about some of the recent additions we have made for metrics: we recently released auto-generated documentation for every single metric that exists in the Kubernetes code base, except for internal-level metrics. We did this using a fairly elaborate static-analysis pipeline, which parses all the files in the Kubernetes code and identifies all the metric definitions along with their stability classes. As you can see in the picture, it generates nicely formatted documentation with the metric name, a description of the metric, the stability level it is at, the type of the metric, and the labels it exposes. We hope this documentation comes in handy when you are debugging cluster issues and want to quickly identify the signals the components emit that can be useful during troubleshooting.

There is also a new metrics endpoint that we introduced recently for all the Kubernetes control plane components. It went GA in 1.29; I personally worked on this when I joined SIG Instrumentation last year, so it would be cool if you could check it out and let me know if you have any questions. This endpoint returns SLI health data for the different Kubernetes control plane components. There are different health checks that are performed, and when you invoke this endpoint, it gives you details about the health checks that were done for those components. The endpoint exposes two metrics: one is a gauge denoting the current state of a health check performed for the component, and the second is a counter denoting the cumulative results of that health check. The example is showing the /metrics/slis endpoint for the kube-apiserver. As I said before, the first metric here is the gauge, telling you that the result of the ping health check it performed was a success, and the second metric, kubernetes_healthchecks_total, is telling you that the ping health check has been seen to succeed twice so far. These metrics are intended to be scraped at a higher frequency, because they are low-cardinality metrics, and because they can be scraped at a higher frequency they give you a more granular signal about the health of your Kubernetes components and can be used to compute SLOs or availability stats for your cluster. You can read more about these endpoints in the link provided here.
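If you want to poke at this endpoint yourself, here is a hedged Go sketch that pulls /metrics/slis from the kube-apiserver with client-go; the kubeconfig handling is a placeholder, and the same pattern works for any raw path such as /metrics.

```go
package main

import (
	"context"
	"fmt"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	clientset, err := kubernetes.NewForConfig(config)
	if err != nil {
		panic(err)
	}

	// GET /metrics/slis on the API server; the response is Prometheus text format
	// containing kubernetes_healthcheck and kubernetes_healthchecks_total.
	raw, err := clientset.CoreV1().RESTClient().
		Get().
		AbsPath("/metrics/slis").
		DoRaw(context.TODO())
	if err != nil {
		panic(err)
	}
	fmt.Println(string(raw))
}
```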
We have also introduced a beta-level metric called the feature enablement metric, which exposes all the feature gates that exist in the Kubernetes version your cluster is currently using, and it also tells you whether each feature is enabled or disabled at a given point in time. So in the example, you can see that APIPriorityAndFairness, a beta feature, is enabled in the cluster, but APIServerIdentity, an alpha feature, is currently disabled. That's it about metrics, and up next, to talk about logs, is Han.

Thanks. Hi, everyone. My name is Han. My GitHub handle is logicalhan, but today I'm going to be talking about logs, so I guess you can call me log-ical Han. Sorry — I didn't actually run this by my co-speakers here, because I feared they would veto me, but I had to make the joke. So, serious business: logging. In Kubernetes we use a library called klog, short for Kubernetes logger. It is actually forked from a different library called glog, which was short for Google logger, and it has been heavily modified and adapted for Kubernetes. As you can see from the snippet here, it knows how to render Kubernetes objects in a string-friendly format, and it also conforms to the logr interface. For those of you who don't know logr, it is a side project by Tim Hockin that introduces a logging API, and what it does is decouple the implementation of the logger from the writing of the log — basically, it's an API over loggers. The reason for this is that the people who write logs are often different from the people who consume them: Kubernetes developers are the ones writing these logs during development, and it is cluster administrators who care about ingesting logs, because they want to debug their clusters. So what we have done is integrate logr directly into klog. klog is therefore not only an implementation of a logger; it also exposes an API that allows you to inject other loggers into it, which is a little bit confusing, but basically it allows you to output logs in multiple formats — specifically, today we support text and JSON output.

Logs are notoriously problematic. I don't know how many of you have run a grep for logs using an error string — like literally grep "error". Probably all of you; I've done it myself. But that isn't really the best way to look for logs, and we can do better than that. So one of the efforts our SIG embarked on was structured logging, which allows you to append typed key-value pairs to your logs in order to get typed information in your output. This gives you a systematic structure for log messages, which makes it easier to find the specific log line you are looking for, or even to output everything as nicely structured JSON. What this looks like in practice is this. This is the text-based format: you have your log header, which is a timestamp, you have your message, and then you have arbitrary key-value pairs — you can see an example with a pod, and you can see how easy it would be to grep for that. In JSON, you get nicely formatted output like this, and you could ingest it into a database and query against whatever fields are relevant to you. So, pretty helpful.
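Here is a minimal Go sketch of what those structured calls look like with klog; the function and the fields are made up for the example, but InfoS, ErrorS, and KObj are the structured-logging entry points described above.

```go
package example

import (
	"errors"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/klog/v2"
)

func syncPod(pod *corev1.Pod) {
	// Renders as: "Syncing pod" pod="default/nginx" phase="Running" in text
	// output, or as separate JSON fields when the JSON backend is enabled.
	klog.InfoS("Syncing pod", "pod", klog.KObj(pod), "phase", pod.Status.Phase)

	if err := doSomething(); err != nil {
		// ErrorS keeps the error as its own structured field.
		klog.ErrorS(err, "Failed to sync pod", "pod", klog.KObj(pod))
	}
}

func doSomething() error { return errors.New("example error") }
```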
But we realized there was something missing: not everywhere in the Kubernetes code base is it easy to pass parameters around for logs, and it's not always practical to start appending a bunch of parameters to method signatures just so you can pass them into structured logs. So we embarked on another effort called contextual logging, which allows you to attach key-value pairs to a context. It is actually good Go practice to pass a context along — that's how your goroutines know when to stop — so we can piggyback on an existing best practice: we attach metadata about a request that we want instrumented, and then we pass that to a logger, so that we have access to data in places where normally you might not be able to get it. This allows us to have, say, pod-level information in deeply nested request methods, which makes things a lot easier. This effort is currently in beta. We have converted the kube-scheduler, but we are looking for contributors to help us migrate the rest of the code base — the kube-controller-manager, the API server, and the kubelet would be next. So if you are interested in helping out with our logging efforts, we have spun out a working group; the organizers are Marek and Patrick, and they are available on Slack. There is a regular working group meeting that deals with the structured logging and contextual logging issues that we've mentioned here. And next, I believe, is David.

Cool. Hi, everyone. I'm David, and I'm going to tell you about tracing in Kubernetes. To start out with: metrics and logs have been around in Kubernetes for quite a while, but tracing is relatively new. For those who aren't familiar, distributed tracing is a way to emit telemetry from multiple different applications and somehow combine it back together to get a picture of what happened to a single request along the way. The user might make a request to a front end, and that might require making a request to a back end, and you would like to be able to combine those into a single view for that particular request, to get context around what was going on and what maybe caused it to be fast or slow, et cetera. To do that, distributed tracing generates and attaches an ID to that specific request — we call that the trace context, and it's a W3C standard. That way, the front end in this case can attach that ID when it writes telemetry, and so can the back end, and at the end you can group by it in order to reconstruct the tree. So those are the basics. In Kubernetes we use OpenTelemetry, which has been gaining traction recently, but we only use it for tracing today.

There are a few places in Kubernetes that match that server-with-nested-server model. One of them is the API server and etcd, where, if you're debugging an API server issue, it might be very difficult to figure out: okay, I see a log here about this request, but how do I know which etcd transaction it's associated with? If you're using distributed tracing here, you'll get a span for the API server, and then you'll see the matching span in etcd, which will help you figure out that maybe the problem was that etcd's write was slow, or maybe it was the API server waiting on authorization, or something like that. The other place where we have this model is between the kubelet and the container runtime, where you're creating a pod, and that involves pulling the image and starting the container, and you don't necessarily know whether your pod is starting slowly because the kubelet is waiting on the API server, or because the container runtime is slow creating files or doing cgroup work. So it would be nice to know where exactly the problem is.
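To make the trace context idea concrete, here is a hedged Go sketch of the general pattern with the OpenTelemetry Go API: start a span around some work and inject the W3C trace context into an outgoing request so the next component can attach its own spans to the same trace. The URL and tracer name are placeholders; this shows the pattern, not the actual Kubernetes wiring.

```go
package example

import (
	"context"
	"net/http"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/propagation"
)

func handleRequest(ctx context.Context) error {
	// Start a span for this unit of work; it ends when the function returns.
	ctx, span := otel.Tracer("example").Start(ctx, "handleRequest")
	defer span.End()

	// Build the downstream request and inject the trace context (traceparent
	// header) so the callee can continue the same trace.
	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://backend.example/healthz", nil)
	if err != nil {
		return err
	}
	otel.GetTextMapPropagator().Inject(ctx, propagation.HeaderCarrier(req.Header))

	_, err = http.DefaultClient.Do(req)
	return err
}
```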
We've been hard at work on this for quite a few releases now. Both of these features — API server tracing and kubelet tracing — went beta in 1.27, which was two releases ago. Since then we've been getting a lot of feedback and making a lot of small bug fixes, and we hope to promote both of them to GA in the near future, but not a whole lot has changed if you, say, watched this talk last year. To give you an example of what this looks like, because I think these are pretty cool: this is an example of an API server trace. You can see the etcd span in there, and — well, you actually can't quite see it, but there's also a little authentication span here — so you can see the breakdown of where the API server spends time on a request. If something were wrong, this would be very useful for identifying exactly where in the API server, or where in etcd, the problem was likely to lie. Same thing with the kubelet: the current state is that if you create a pod and that request, or that trace, is sampled, you'll get a view that looks like this, showing the breakdown — pulling the image, creating the sandbox, starting the containers, et cetera — which can be really helpful, especially if you're trying to optimize for fast pod startup.

So that's all well and good. Let's talk about where we're going in the future with this, now that tracing is entering a more mature phase in Kubernetes. The slide says "current," but really this is maybe the previous way most people would debug Kubernetes components: you get an alert for some metric that isn't behaving the way you expect, you go check your dashboards, and you find approximately what's wrong — maybe it's a problem isolated to one node, maybe it's isolated to a particular replica of the API server — and you find some metadata that correlates with the problems you're seeing. So the next thing you do is search through the logs for that particular node: what was the kubelet doing at this time, what was the container runtime doing at this time — to see if you can find any clues about what might be going wrong. And traces today are mostly still used as a kind of special debugging tool: you turn them on, you generate some data, you maybe run some test scenarios to get this really cool, detailed tracing data, but you don't generally run it all the time, at least in Kubernetes today.

But we hope for a better future. Ideally, we want to be able to start from the metrics: we get an alert, and we'd like to be able to look at our dashboards and use things like exemplars to link from the latency spike in our dashboard to an example trace — an example pod startup or API server request — that shows exactly what was going on and demonstrates the issue that generated the alert. So one of our plans is to add exemplars to the metrics that we produce. The other thing is that you can actually move between traces and logs — you can start with logs or start with traces and move back and forth — if you attach the trace metadata that I talked about earlier, that trace context, to the logs that you write. And the feature that Han was talking about earlier, contextual logging, will allow us to easily do that everywhere we have tracing, so that if you have a trace that shows an example of what's going on, you can easily go find the logs associated with exactly what was happening at that point in time.
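As a sketch of how those two signals could be tied together, here is a hedged Go example that reads the trace ID from the current span and attaches it to the contextual logger, so that log lines written further down carry the same ID you would look up in your tracing backend; this shows the general idea rather than the exact Kubernetes implementation.

```go
package example

import (
	"context"

	"go.opentelemetry.io/otel/trace"
	"k8s.io/klog/v2"
)

func withTraceID(ctx context.Context) context.Context {
	logger := klog.FromContext(ctx)
	if sc := trace.SpanContextFromContext(ctx); sc.HasTraceID() {
		// Every log line written via this logger now carries the trace ID.
		logger = klog.LoggerWithValues(logger, "traceID", sc.TraceID().String())
	}
	// Code further down retrieves the enriched logger with klog.FromContext(ctx).
	return klog.NewContext(ctx, logger)
}
```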
So this is where we'd really like to get to, and those are the kinds of things we're going to work on moving forward.

We're an open SIG — we're all friendly, I promise — and we love new contributors, or people who have experience trying to use any of the features we've talked about today; that's all very helpful for us. So we'd love your involvement. The best way is probably to attend our SIG meetings or to message us on Slack. We definitely need help with reviews and with debugging issues that people report, and especially some of our sub-projects could really use contributors. If you're interested in kube-state-metrics or the prometheus-adapter, you're welcome to contact Damien here. For metrics-server, you can contact Marek if you're interested in helping with that. And if you're interested in contextual logging or any of the logging efforts, Patrick is a great contact for that. This is when and where we meet, and we'd be happy to see everyone. And that's it for our talk. Thank you everyone for coming. I think we have a couple of minutes for questions — use the microphone there if you have questions.

Hey, good to hear about the developments. I want to ask: so you're using OTLP for traces now, but the metrics are exposed in the Prometheus format, right? That's correct. Do you think it's reasonable or feasible — is it on the roadmap — to start exposing metrics or logs, maybe sending them over OTLP to an OTLP endpoint? So that's something we've talked about informally. I think right now we have to balance the trade-offs of supporting an extra format, which means more ways that people can report issues, against the potential benefits. People have been very pleased with our Prometheus metrics for a long time, and there are a lot of dashboards built around them. OTLP is super cool and is definitely becoming more popular, but for us we just have to evaluate the pros and cons of doing it versus the cost of maintaining the additional format. So we're certainly not against it in any way, but we haven't put together a justification for why we would do something like that. Sure, that makes sense — so it's definitely not crossed out, but there's no rush with it at the moment. No rush. If you have a compelling reason why you really want OTLP for something, reach out to us and we can evaluate it. Yeah, makes sense.

I actually don't know — maybe you could quickly describe how exemplars work in Prometheus, how they're exposed? Yeah, so Prometheus does have support — or, it's a little complex. OpenMetrics, which is very, very similar to Prometheus, supports exemplars: if you've ever seen the hash and then a big trace ID equals blah, that's what an exemplar looks like in the OpenMetrics text format. We currently use the Prometheus format and not OpenMetrics, so that's part of what we're trying to figure out as we try to get to that ideal end state I put up.
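For reference, here is a hedged Go sketch of what recording an exemplar looks like with the Prometheus Go client; the metric name and trace ID handling are placeholders, and note that exemplars are only rendered when the handler negotiates the OpenMetrics format, which matches the caveat above.

```go
package main

import (
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var latency = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name: "example_request_duration_seconds",
	Help: "Example request latency.",
})

func observe(seconds float64, traceID string) {
	// Attach the trace ID as an exemplar on this observation.
	latency.(prometheus.ExemplarObserver).ObserveWithExemplar(
		seconds, prometheus.Labels{"trace_id": traceID},
	)
}

func main() {
	prometheus.MustRegister(latency)
	// Exemplars only appear when OpenMetrics is enabled on the handler.
	http.Handle("/metrics", promhttp.HandlerFor(
		prometheus.DefaultGatherer,
		promhttp.HandlerOpts{EnableOpenMetrics: true},
	))
	http.ListenAndServe(":8080", nil)
}
```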
Hi, I'm not sure if this is the right place for this question, but we would like to get the ephemeral storage usage of our pods, and I think we currently have to install something third party for that. Is that something that would be considered to be taken up by any of the sub-projects that were talked about here? That's very likely SIG Node. Funnily enough, in my previous life as a SIG Node approver, I worked on the ephemeral storage feature and the metrics for it, so I'd be happy to chat afterwards as well. Yeah. All right, thank you so much everyone for attending. We really appreciated having everyone here — have a nice KubeCon, and come chat with us if you want.