Hey, everyone. Welcome to the introduction and deep dive for SIG Instrumentation. My name is Han. I'm a software engineer at Google, and I've been working on SIG Instrumentation and instrumentation-adjacent stuff for probably about five and a half years.

And I'm Damien. I work for Red Hat as a software engineer, and I'm a co-TL for SIG Instrumentation. I've been working on monitoring for almost four years now, all of that time in Kubernetes. Although I focus more on kube APIs these days, I'm still heavily involved in SIG Instrumentation. We have a couple of other co-leads who aren't present today, David Ashpole and Frederic Branczyk.

So this is going to be our agenda today. We're going to give an overview of SIG Instrumentation, go over the basic signals we use to observe Kubernetes, and then go into the SIG subprojects and our future plans.

The charter for SIG Instrumentation is basically to cover best practices for observability in Kubernetes, and this is important since everyone wants to know how to operate Kubernetes clusters. We steward the way the various SIGs instrument their components, for instance the kube-apiserver, the kube-scheduler, the kubelet, and kube-proxy, and the way they do that is generally with metrics, logs, traces, and events. We also have a number of subprojects which help people observe what's actually going on inside their Kubernetes clusters, like kube-state-metrics, and metrics-server, which lets you horizontally and vertically scale the workloads in your Kubernetes cluster.

So how do we do it? We run a bi-weekly SIG Instrumentation meeting, and on alternating weeks we run a triage session where we go through GitHub issues and pull requests and make sure they get addressed. We review code changes and we develop new features, so if you have any ideas on how to improve instrumentation and observability, you should join our sessions. We also maintain the various subprojects, and Damien will go into those a little bit later.

So first, I'm going to start with logs. Logs are the most granular piece of data you're going to have in your Kubernetes cluster, so when you're debugging an issue, the pinpointed stuff you're looking at is often going to be derived from logs. Internally, this works through a library called klog. It's forked from a library called glog and modified specifically for Kubernetes. If you look at the code snippet, you can see that it basically understands the structure of Kubernetes objects, so we can print them out consistently in our logs. It conforms to the logr interface, which is a generic logging interface that lets you inject logging implementations into whatever the logger is wrapping. That has allowed us to do a number of things upstream, including structured logging, and we'll get into that in a second.

Currently our logs default to a text-based format, which means everything is just lines of text, but we also offer a JSON format. Not everyone realizes this, I think.

One of the things we've been working on is structured logging. Basically, over the past few years we've been modifying the call sites for all of our logs, and we're actually still in the process of migrating all the call sites, we're about halfway done, so that we can annotate logs with structured information about Kubernetes objects.
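To give a rough idea of what an instrumented call site looks like, here's a minimal sketch using klog's structured logging helpers (InfoS and KObj are the real upstream helpers; the message, keys, and pod are just made up for illustration):

```go
package main

import (
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/klog/v2"
)

func main() {
	defer klog.Flush()

	pod := &v1.Pod{ObjectMeta: metav1.ObjectMeta{Namespace: "kube-system", Name: "kube-dns"}}

	// Key-value pairs instead of a printf-style format string. KObj renders a
	// Kubernetes object as namespace/name in both the text and JSON log formats.
	klog.InfoS("Pod status updated", "pod", klog.KObj(pod), "status", "ready")
}
```

That KObj call is what produces the kube-system/kube-dns style value you'll see in the example in a moment.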
With calls like that, we can pass pods or nodes into the log call and have the node or pod information automatically output in Kubernetes logs. The reason this is good is that it lets you keep the information you need in your logs without necessarily having to pipe it through everywhere. It's also very convenient to consume, because the output is structured, which means you don't need very obtuse regular expressions to parse your logs. There's a more systematic format, and I'll show you what that looks like.

Basically, in the text-based format we have things that look like key-value pairs, and if you look at the example you can see that the pod's value here is kube-system/kube-dns. This gets output automatically for any of the relevant logs nested in these call sites. The same output can be emitted as JSON, so you can optionally ingest Kubernetes logs in JSON format, which makes them much easier to ingest. You don't even need regular expressions if you do it this way, unless you're parsing a value out of a JSON field. It's also quite convenient if you want to insert log data into a database, because now you have everything in key-value form.

One thing we've been working on more recently is contextual logging. So not only do we have structured logging, where we can pass pod and node information to logs, but now we're also piping the context through, so we can embed data in the context and have it automatically propagate to the relevant call sites. Again, this makes things easier to parse and makes log messages more consistent. It also makes the Kubernetes code base a bit easier to read, because we're not passing a zillion things to every function; instead you embed the data in the context and pass the context around, which is Go best practice, and that data becomes available at all of the relevant log sites.

Patrick is leading this initiative, and we have a Structured Logging working group that meets regularly, Thursdays at 15:30 UK time it seems, and they could use your help. We're not completely done with the migration, so anyone who wants to contribute is welcome to come join the structured logging group.

Metrics is probably my favorite topic. It's the thing I'm most familiar with in Kubernetes and the thing I've actively worked the most to improve. In Kubernetes we use Prometheus, and for those of you who don't know, this is the basic Prometheus architecture: there is a client, components generally expose a metrics endpoint with a metrics payload, and those metrics are scraped by some scraping agent and ingested into a time series database. We're currently exploring a push model as well. Nothing has been decided, but it's been a recent topic of discussion in our SIG meetings: being able to push metrics directly to a time series database or an OpenTelemetry Collector. We haven't decided one way or the other, but it's something we're thinking about, and if you're interested in this topic you should join our meetings; we'd love your feedback on it.

The way I got involved in Kubernetes is actually kind of funny. We had metrics at Google, and various charts and alerts, and every once in a while these charts and alerts would just stop working. I'd poke around and try to figure out what happened, and it turned out that people would rename metrics.
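To make that concrete: a chart or alert is ultimately just a query against a metric name, so a rename silently breaks it. One real example from the metrics overhaul era, if I'm remembering the names right, was apiserver_request_count being replaced by apiserver_request_total:

```promql
# This alert worked fine until the metric was renamed...
sum(rate(apiserver_request_count{code=~"5.."}[5m])) > 10

# ...after which it evaluates over an empty series set and never fires again.
# The fixed version has to reference the new name:
sum(rate(apiserver_request_total{code=~"5.."}[5m])) > 10
```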
You actually can't rename metrics. When you rename a metric, what you're really doing is deleting a metric and creating a brand new one. So when you have a chart that assumes the name of a metric and you change that name, the chart just does nothing; you end up with no data.

There was a pretty big initiative happening at the time called the metrics overhaul, and in order to get the Kubernetes metrics into more proper Prometheus form, there was a big refactoring where basically all of these metrics got renamed, which would have broken all of your charts and alerts. That was not a great thing, so we started working on what we call the metrics stability framework. The reason this matters is that you don't want your charts to just stop working on some minor version boundary, right? That would be pretty undesirable. If the schema changes, you have to rewrite your charts and alerts, and that would be a pain, and in Kubernetes we really strive for compatibility across versions.

So instead, we decided to make metrics an API. What we did was wrap the Prometheus client libraries and annotate metrics with a stability level. That allows us to run a static analysis framework which ensures people aren't mutating metrics or breaking their contracts, so your charts and alerts will keep functioning across minor version boundaries. We have various stability levels: stable, beta, and alpha, with different stability guarantees, which you can see in this table. Depending on the stability level, you can rely on your charts and alerts over the lifetime of that metric, and deprecation follows the rules in the official Kubernetes deprecation documentation.

While we were building the stability framework, we built this static analysis pipeline that makes sure people aren't breaking the metrics that everyone here depends on. It ended up being quite useful for another reason too: the static analysis basically parses the entire Kubernetes code base and all of the metric call sites, and we realized we could auto-generate metrics documentation from it, because if you're parsing the whole code base you have references to all the metrics. So now we have auto-generated documentation for all of the metrics in the Kubernetes code base, and you can see it on the official Kubernetes documentation website. It's pretty cool.

Not only that, but we instrumented our instrumentation. We have metrics about the number of stable metrics you have, the number of beta metrics, the number of alpha metrics. We also added metrics that show which features are enabled, since features can get enabled across minor version boundaries, and it's important to know what's going on in your cluster.

More recently, we worked on something called component health SLIs, which I'm quite excited about. It basically exposes liveness and readiness data in metrics format. The reason this is important is that we're currently working upstream to improve upgrades, and what we'd like to do is plug these kinds of metrics into the upgrade sequence so that upgrades are safer.
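To give a rough idea of what that looks like, the component SLI endpoint (served at /metrics/slis on components like the kube-apiserver) exposes each health check as a metric. The names and labels below are from the component health SLIs feature as I recall them, so treat them as illustrative:

```
# HELP kubernetes_healthcheck This metric records the result of a single healthcheck.
# TYPE kubernetes_healthcheck gauge
kubernetes_healthcheck{name="etcd",type="livez"} 1
kubernetes_healthcheck{name="etcd",type="readyz"} 1

# HELP kubernetes_healthchecks_total This metric records the results of all healthchecks.
# TYPE kubernetes_healthchecks_total counter
kubernetes_healthchecks_total{name="etcd",status="success",type="readyz"} 15
```

Something like kubeadm could then compare these values before and after each upgrade phase.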
For instance, kubeadm progresses through a number of phases in an upgrade sequence, and currently, while there are pre-flight checks to make sure the upgrade can go ahead, we don't actually check that the Kubernetes control plane hasn't imploded before and after each phase. By exposing liveness and readiness data from, say, the kube-apiserver, the kube-scheduler, and the controller manager, we can check for anomalies before and after the upgrade sequence, and that will allow us, for instance, to halt an upgrade before it destroys the cluster. This is part of a larger initiative we're pursuing across a number of SIGs, and this is one of its constituent pieces. So, exciting stuff.

Okay, so Han talked about logs and metrics, which have been in Kubernetes pretty much since the start. More recently we've added tracing to Kubernetes as well. For those who don't know what distributed tracing is: you essentially gather telemetry data from different sources and merge it together to get an overall view of what's happening in your cluster. If you take a request, tracing essentially follows it through the distributed system and gives you the exact time it took in each step. It does that by propagating a trace context from one system to another. For example, if a user makes a query to a front end, the front end then makes a query to the backend along with the trace context, so the telemetry data the backend emits carries that context and can be merged with the front end's data to know exactly what happened to that request. That lets us build a graph, a tree of all the different steps the request took and how long each of them took. To do that we're using the OpenTelemetry library. Han touched on OpenTelemetry earlier, but we're only using this library for tracing at the moment, not for metrics and logs.

In Kubernetes we have two components with exactly this kind of front end/backend relationship where we really wanted tracing. The first is the kube-apiserver. I'm pretty sure a lot of you have experienced slowness in the API server and didn't really know what caused it; it could be in the API server, it could be etcd. Our goal was to be able to pinpoint where the slowness was coming from, so by propagating a trace context through both the API server request and the request it makes to etcd, we can surface that information to users, and then it's much easier to know what caused the slowness. The other component is the kubelet, and more specifically the relationship between the kubelet and the container runtime, because that's where pod creation happens and that's the request we want to measure. It's super useful to know how long a pod startup took and what caused slowness there: it could be the creation of the container, the creation of the sandbox, the image pull from the registry, many reasons, and as a cluster administrator that's something you want to know. So that's why we invested heavily in these two components, and today it's available in Kubernetes.
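To give a rough idea of how this is wired up, here's roughly what enabling kubelet tracing looks like in the KubeletConfiguration. The field names follow the upstream tracing docs as I remember them, and the OTLP endpoint is just a placeholder for wherever your collector runs; the kube-apiserver takes a very similar TracingConfiguration via its --tracing-config-file flag:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletTracing: true          # feature gate for kubelet trace export
tracing:
  # OTLP gRPC endpoint of an OpenTelemetry Collector (placeholder address)
  endpoint: localhost:4317
  # Sample 1 in 10,000 requests; raise this while debugging
  samplingRatePerMillion: 100
```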
It has been beta since 1.27, and since then we've invested a lot in fixing the couple of bugs that were reported, and also in gathering as much feedback as possible early on to make sure the feature was getting stable and ready to be consumed by the majority of users. We were initially targeting GA for 1.30, but that window has already closed and we didn't graduate, because there was a last-minute bug we had to fix on the kube-apiserver side. But we're on track to get it to GA in the upcoming releases.

To give you an example of what it looks like today for the relationship between the kube-apiserver and etcd: this is a Jaeger UI, and I'm pretty sure you can't really read what each span refers to, but you can see there's one particular span for the time it took for the request to be authenticated by the kube-apiserver, then the time it spent in the kube-apiserver itself, as well as the time it spent in etcd. So you can see where an issue could arise, and it's a much better view than if you had just logs or simple metrics. For the kubelet it's even more detailed, because there are more steps: you can know exactly how long the creation of the sandbox took, how long the image pull took, and how long the creation of the actual container took. So it's super useful if you want to optimize pod startup, which is something a lot of people are trying to do.

As I was saying, we're going to work on making the feature GA, but that's not the only thing we have in mind for tracing. What we want to do is make traces a first-class citizen of the observability story in Kubernetes. At the moment, the debugging path a user typically takes, kind of the old way of doing things, is to first be alerted about a problem, then look at metrics to find approximately where the issue is coming from and spot particular patterns, say a namespace, a node, or whatever it is, and then, based on that pattern, look at the logs and try to find out what's happening. Traces are used on more of a case-by-case basis: sometimes we use them, sometimes during development cycles, but they're not very well integrated into the usual debugging workflow.

What we want instead is that whenever a cluster admin gets notified of, let's say, an SLO violation, they would first look at the metrics to know which request is slow. On that metric we would have exemplars, which essentially tie a trace ID to the metric, and that trace ID would point to an example trace of this particular behavior, the slowness. We could then open that trace in any tracing UI, and from the trace we could link to particular logs. Han mentioned contextual logging earlier; as part of contextual logging we're thinking about injecting a span ID into the logs, so you can jump from your spans to a group of logs, or even a single log line, and know exactly what was happening at that moment in your cluster. That would be super useful and would make debugging more efficient.

So that's mostly it for tracing and the observability signals in Kubernetes, but there are some gaps in the observability story that we can't cover in Kubernetes itself, because it wouldn't fit all users, so we have subprojects for that. I've listed four of them here, but there are plenty more, as Han mentioned; these are the main ones.
So, metrics-server. You've probably heard the name, though maybe you don't know exactly what it does, because it's most likely installed by default. Essentially, SIG Instrumentation is responsible for three different metrics APIs. These are not the metrics API framework Han was mentioning; they are actual API endpoints where real Kubernetes resources are served as metrics, such as pod metrics, node metrics, and so on. Metrics-server is the implementation of the resource metrics API in particular. It's the source of truth for kubectl top, which you can use to introspect the autoscaling pipeline, and it's also the source, if you have it installed, of resource-metrics-based autoscaling. So if you want to autoscale an application based on CPU utilization with an HPA, it's most likely metrics-server implementing the resource metrics API underneath. It does that by collecting metrics from the kubelet at fixed intervals and then serving them through the API server whenever a user or the HPA controller requests them.

To go even further, we've noticed that just CPU and memory don't help much when you want to autoscale an application. It's pretty basic; if you want to autoscale, say, a web server based on the number of requests per second it's getting, you can't do that with resource metrics alone. So we have the two other APIs, the custom metrics and external metrics APIs, for that purpose. Prometheus-adapter is one implementation; KEDA, another very successful CNCF project, is another. These implementations talk to a third-party monitoring backend, in prometheus-adapter's case Prometheus, which collects the metrics. Prometheus-adapter then queries those metrics based on the user's configuration, say HTTP requests per second, and serves them to the API server and on to the HPA controller, so it can make autoscaling decisions based on them.

One of the newest projects, recently brought under the subprojects we oversee, is usage-metrics-collector. It's meant to cover a gap we've had for years, which is that you can't really do high-frequency scraping of the kubelet for memory usage, CPU usage, and CPU capacity metrics in Kubernetes, because cAdvisor is not very optimized. We've tried to optimize it, but so far we haven't been able to. Usage-metrics-collector works around that to allow one-second scrape intervals, and with that high-frequency scraping you can get more accurate dashboards as well as faster autoscaling. That's super useful, and it bridges a gap we've had for a while. It also lets you perform aggregation at collection time, meaning that despite the amount of data coming from the kubelet and the system, you don't ingest nearly as much into your monitoring platform, so the cost is fairly minimal while the value is significant. The only problem at the moment is that it was written against the cgroup v1 API, and I'm not sure if you know, but there was a big change between cgroup v1 and cgroup v2, so we'd have to migrate it to v2 for it to work on the most up-to-date clusters, which support v2 only. That's something we're looking into, and we're looking for contributors to help us get there.
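For contrast, without a collector doing the aggregation for you, getting something like a 99th percentile of container CPU usage means hand-writing PromQL against the raw cAdvisor series, roughly like this (a sketch, using the standard container_cpu_usage_seconds_total metric):

```promql
# p99 of per-container CPU usage over the last 5 minutes,
# computed from 1-minute rates of the raw cAdvisor counter
quantile_over_time(
  0.99,
  rate(container_cpu_usage_seconds_total{container!=""}[1m])[5m:]
)
```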
Here's an example of a configuration. One of the main benefits is that you don't have to know PromQL at all, which blocks some users at times. This is the kind of result you'd get for the 99th percentile of resource utilization with a sampling rate of one second. It's pretty minimal: you'd do that over all the containers, it would generate the two metrics, the two time series you can see below, and then you can query those once you've collected them on your monitoring platform.

Last but not least, kube-state-metrics, I'd say one of the most active projects we own. It's a simple exporter that generates Prometheus-style metrics out of Kubernetes objects: pods, deployments, stateful sets, any object really. An example would be these two metrics, which give you the number of replicas you have in a deployment. If you want to know when a rolling update has completed, for example, you can tell by looking at the status and the updated replicas. In the monitoring platform of your choice, you can then see what's happening inside your cluster. It does that by constantly watching changes made through the kube-apiserver and reflecting them as metrics to your monitoring backend, which could be Prometheus or anything else.

One of the biggest features we've introduced recently is support for custom resource metrics, so it's now possible to generate metrics from your CRs and CRDs. You can reuse kube-state-metrics, extend it for your own needs, and generate essentially any metrics you want. The problem is that the configuration is not very easy to understand, and we're planning to simplify it in a way that lets us both extend it and maintain it more easily. To do so, we're planning to use CEL, the Common Expression Language, which is starting to be widely used in the Kubernetes ecosystem. Another effort we're going to pursue is moving the configuration to a CRD.

So yeah, we're a pretty welcoming community, I'd say, and if you want to join us we'd love to see you around. The best way to interact with us is to join the SIG meetings; we have a meeting once a week, so you can join, you can also participate in reviews, or do any kind of contribution. If you're interested in a particular subproject, I've listed a couple of contacts you can reach out to directly on Slack to contribute. We're always looking for new people, and we can always mentor people as well; we'd love to. Here you can find the times of our meetings, the Slack channel we have, as all SIGs do, and the contacts for our chairs and leads, so feel free to reach out to any of us. And yeah, thank you for coming.

One last thing. Yeah, if you find that the instrumentation or observability pipeline isn't what you want and we're doing something weird, you should definitely come to our meetings, especially if you have ideas on how to improve it. We welcome any contributions and any ideas to make the experience better. So yeah, if you have any questions, feel free to ask; we have five minutes to answer questions, I guess. We'll also stay around for a bit, so if you want to ask us anything, feel free to. Otherwise you can go, feel free. Thanks, everyone. See ya.