All right, welcome everyone to the SIG Instrumentation session. I'm Frederic, the CEO and founder of Polar Signals. I've been working on Kubernetes things for over five years now, and I'm also a Prometheus maintainer. I generally do pretty much everything in the intersection of observability, monitoring, and Kubernetes, so if you're using any of the tools in that intersection, there's a good chance I've touched them over the past five years. You can check out my GitHub, and if you want to chat, feel free to hit me up on Twitter.

In case you're new to Kubernetes and Kubernetes SIGs: SIGs are special interest groups, essentially groups of people within Kubernetes who share a common desire to improve Kubernetes in some particular way. SIG Instrumentation, as the name implies, is about all the kinds of instrumentation you can add to Kubernetes, but also, through subprojects, about making Kubernetes more observable. We have several metrics projects, several logging libraries and projects, and more recently we've started to dive into some tracing aspects as well. Organizationally, SIGs are actually quite well organized, I'd like to say. We have a charter, and it says we cover the best practices for cluster observability across all Kubernetes components and develop relevant components. That's the crisp vision statement of our group. Some of the subprojects you might have heard of are kube-state-metrics and klog, the logging library used throughout the Kubernetes ecosystem (some hate it, some love it). Then there's metrics-server, a component that is vital for running Kubernetes clusters, but I'll dive a bit deeper into metrics-server later, and the same goes for the Prometheus adapter.
Just to give you a quick glimpse into what we do within SIG Instrumentation: if you're new here, it's very easy to get started and talk to us. We have regular meetings every two weeks on Thursdays at 9:30 a.m. Pacific time, and every other week, on Wednesdays, we have our triage meeting, where we go through the latest issues and assign them to folks who are on the call. If you're interested in contributing to Kubernetes, this is a really fantastic way to get started: you can just join one of these meetings and we'll help you find your first issue. If you just want to chat, you can hop onto Kubernetes Slack in the #sig-instrumentation channel, or you can join our mailing list, kubernetes-sig-instrumentation. And, as is common throughout most special interest groups in Kubernetes, once you join the mailing list you automatically get access to all of these meetings: you'll automatically be invited, and you'll have access to all the meeting notes and documents. So you're immediately part of the community as soon as you join the mailing list.

I certainly don't do this alone; I have three other wonderful co-leads and chairs. David Ashpole works at Google; he leads most of the tracing aspects lately, but also some really awesome node-related topics. Elana also recently started doing a lot of the node-specific things; I believe she's now more deeply involved in other node topics too, not just instrumentation. And then there's Han, who I'll come back to in some of the topics later. All really, really awesome people, and I certainly couldn't do it without them.

So what are we going to talk about in this talk? Let me give you a quick outline. What it is that we do, I've already covered.
But I also want to guide you through some of our current activities: some that we've just completed and are proud of, and more importantly, the things that are ongoing and how you can potentially collaborate, and hopefully one day become part of SIG Instrumentation. So let's get into it.

As I already mentioned, we have our triage meetings. If you're already particularly interested in instrumentation issues, they're all labeled on GitHub, so you can just search for the sig/instrumentation label and you'll see all the currently open issues and PRs that people are working on. We review pretty much all metrics changes, because we try to help people follow the guidelines and give advice on how to best structure metrics. We also have a specialized framework within Kubernetes that wraps the Prometheus library so that we can enforce a bit more structure, just for the Kubernetes project, so that guidelines are enforced with tooling rather than just by people. But we certainly still review a lot of the changes. I'll talk a little more about that framework in a bit.

This part is common throughout all SIGs within Kubernetes: Kubernetes has the so-called KEP, the Kubernetes Enhancement Proposal process, and you can find all of the KEPs that we have written, and may still be working on, at this link. That's only for things in the Kubernetes repo; the subprojects are able to organize themselves and don't necessarily need to go through the KEP process. The KEP process is mostly for things that may span multiple SIGs in terms of responsibility, scope, and who is affected. So let's look at some of our current topics, starting with metrics.
Just as a very high-level overview, some of you may already know this, but Kubernetes integrates very deeply with the Prometheus project. Prometheus is another CNCF-graduated project. I happen to also work on Prometheus, but Prometheus was already integrated when I started working on Kubernetes, so it's not my doing, though I do enjoy it. Essentially, Prometheus uses a pull model: Prometheus makes HTTP calls to each Kubernetes component and scrapes the metrics that way, then writes them to its internal time-series database, from where you can query them. If you look at this picture, we in SIG Instrumentation are concerned with the left side: actually serving that metrics endpoint to Prometheus, and hopefully providing enough value that you can do useful things to monitor your Kubernetes clusters.

One very long-running project we've been working on is our metrics stability framework. I touched on the framework briefly already. It's been a long time coming, and I really want to give Han a shout-out here, who initially started this, though many other contributors helped get it across the finish line. Just now, in the 1.21 release, we have actually graduated this feature: we have our first metrics marked as stable, which means there is a defined period of time where you can definitely rely on those metrics existing. Before this framework we essentially had no stability guarantees around metrics, and that is still the case for most metrics, but we now have a framework that allows us to mark certain metrics as stable, and once they are, you can actually rely on them. And then there are a couple of other really cool features we're working on, both in alpha. The pod resource metrics one I personally find really useful.
Essentially, the pod resource metrics feature is the scheduler reporting metrics the way it sees them: the available resources in the cluster. The reason this is useful is that we no longer have just an outside perspective on why particular pods may not be schedulable; it's now the scheduler itself reporting this information, the actual way the scheduler perceives the state of the world. So this is incredibly useful for understanding and capacity-planning your Kubernetes clusters in terms of resources.

One other really neat feature is dynamic cardinality enforcement. With metrics, cardinality explosions can happen if we're not really careful with our reviews, and there will always be human error. We do have some tooling available that prevents as many mistakes as possible, but no human is perfect, and neither is tooling necessarily. This is essentially a fail-safe mechanism: if a cardinality explosion accidentally happens in your Kubernetes cluster, we don't need to patch the entire codebase to stop it; we can use some configuration to prevent it in place. I think it's going to be a really useful feature for those cases when it does happen.

These are the metrics-focused features we're working on right now or have just completed, and they're definitely great ways to contribute in the metrics space. But let's go on to logs and events. Here, actually, not too much has happened with the events API since our last KubeCon, where we announced the new version of events, and we try to always mix it up and present the latest and greatest content here. So right now most of our logging efforts are concentrated on structured logging.
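The dynamic cardinality enforcement idea above amounts to an allow-list for label values applied at metric-recording time. Here is a hypothetical stdlib-only sketch; the metric and label names are made up, the type name is mine, and as I understand it the real Kubernetes feature is configured via component flags and collapses disallowed values into a single placeholder value so one bad label can't explode cardinality:

```go
package main

import "fmt"

// cardinalityLimiter maps "metric/label" keys to the set of label
// values that may be recorded as-is. Anything else is collapsed into
// a single "unexpected" bucket. This is an illustrative toy, not the
// actual Kubernetes implementation.
type cardinalityLimiter struct {
	allowed map[string]map[string]bool
}

// clamp returns the value unchanged if it is allowed (or if no
// allow-list is configured for this metric/label), and the sentinel
// "unexpected" otherwise.
func (c *cardinalityLimiter) clamp(metric, label, value string) string {
	if vals, ok := c.allowed[metric+"/"+label]; ok && !vals[value] {
		return "unexpected"
	}
	return value
}

func main() {
	limiter := &cardinalityLimiter{allowed: map[string]map[string]bool{
		"apiserver_requests_total/verb": {"get": true, "list": true, "watch": true},
	}}
	fmt.Println(limiter.clamp("apiserver_requests_total", "verb", "get"))
	fmt.Println(limiter.clamp("apiserver_requests_total", "verb", "not-a-verb"))
}
```

The key point is that the clamp happens at runtime from configuration, so an operator can contain an explosion without waiting for a code fix and a new release.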
If you've been around for a while, you know that Kubernetes did not have structured logs in the past, but that is changing. Since 1.19 we've had the alpha feature for structured logging, which introduced a couple of new methods to the Kubernetes logging library, and a flag with which you can enable structured logging in Kubernetes components. This is still alpha, and it's been a really fantastic way for a lot of people to start contributing to Kubernetes, because essentially the entire codebase needs to be migrated to these new structured logging calls. This is a really nice thing to contribute to if you're interested in contributing. On a larger scale there's even more to be done aside from migrating: we need to do performance tests, we need to verify that the decisions we made in our designs are still panning out the way we had hoped, and overall there's just a lot of stability work left so that we can mark this as beta and then hopefully eventually graduate it, just like we did with the stability framework, for example. That was also a process we went through, and there's a really cool blog post that was written about it. If you're interested, there's a lot of work to be done here, and I believe a specific working group is being formed around this effort, so keep an eye on our mailing list or join any of our meetings if you're interested in structured logging in Kubernetes.

The last thing we currently have in progress for logging, I believe, is log sanitization. Essentially this is a combination of static analysis and knowledge about particular things in Kubernetes to make sure that we don't accidentally leak secrets into logs, for example.
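The structured logging migration mentioned above replaces format-string calls like klog.Infof with key-value calls like klog.InfoS. As a rough stdlib-only illustration of that calling convention, here is a toy infoS function; it is not the real klog API, which additionally handles verbosity levels, typed object references, and JSON output:

```go
package main

import (
	"fmt"
	"strings"
)

// infoS mimics the shape of klog.InfoS: a fixed message followed by
// alternating key/value pairs, rendered as key="value". A fixed
// message plus structured keys is what makes the logs machine-parsable,
// unlike a format string with values interpolated into it.
func infoS(msg string, kvs ...interface{}) string {
	var b strings.Builder
	fmt.Fprintf(&b, "%q", msg)
	for i := 0; i+1 < len(kvs); i += 2 {
		fmt.Fprintf(&b, " %v=%q", kvs[i], fmt.Sprint(kvs[i+1]))
	}
	return b.String()
}

func main() {
	// Before: klog.Infof("Pod %s is unschedulable", podName)
	// After:  klog.InfoS("Pod is unschedulable", "pod", klog.KRef(ns, name))
	fmt.Println(infoS("Pod is unschedulable", "pod", "default/nginx-1", "reason", "insufficient cpu"))
}
```

The migration work in the codebase is mostly mechanical rewrites from the first form to the second, which is exactly why it makes such an approachable first contribution.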
So log sanitization is all about security, and it actually came out of a security audit. I think it's so cool that we have these audits sponsored by the CNCF, for example; that way we got an independent security audit, and it has now been converted into an initiative led by the Kubernetes community, in particular under the umbrella of Kubernetes SIG Instrumentation. So that's it in terms of logging initiatives in the Kubernetes project itself.

Now, the latest thing we've been looking at: we want to make sure that not only metrics and logs, things focused on a particular process, are in place, but that we also make use of the really awesome capabilities that tracing allows. We have introduced a couple of tracing features, not necessarily throughout the entire Kubernetes codebase, as there are some challenges here, but we definitely have some tracing functionality available in Kubernetes since 1.20. You can essentially just enable it using this flag, which lets you configure the OpenTelemetry collector, and that way you can send the traces to whatever tracing backend you like.
Yes, I really love tracing, so I'm really happy that we're getting started on these topics. But as I said, these are just the topics we're working on within the Kubernetes project itself. As SIG Instrumentation we do a lot more than the Kubernetes repository: we have a number of subprojects. I'm not going to highlight every single one, just the ones I think are the most noteworthy and have the most happening right now: kube-state-metrics, PromQ, metrics-server, and the Prometheus adapter. Maybe you're already familiar with one or the other project here; no worries if you don't know what these are, I'm going to walk you through each of them.

kube-state-metrics is essentially an add-on agent that you can add to your Kubernetes cluster. It looks at what's going on within your cluster, the actual domain-specific things within Kubernetes, and converts anything that looks like a number into a metric, which you can then scrape with Prometheus just like everything else in the Kubernetes world. One really awesome example, which we have here on the slide, is comparing the expected replicas of your deployment with the actual replicas. This is really useful for alerting purposes, for example when you want to make sure that your deployment is actually doing what you wanted it to do, that it rolled out successfully, or for all sorts of situations where the deployment is not in the state you want it to be. One really exciting thing about kube-state-metrics right now: as of this recording it's not released yet, but we are in the process of releasing its second major version. As I record this, release candidate one is out, and I believe that by the time you watch this recording the final v2 release will probably be out; if not, it will be soon.

I'm not going to walk through all the changes, but basically v2 was a chance for us to get rid of the technical debt that accumulated over the years, and we've done a major cleanup throughout. There are a couple of flag changes and a couple of new, and changed, features. If you're already using kube-state-metrics, check out the changelog, and look out for a blog post that I believe should be going out soon as well. It's not too big of a change, but for us maintainers it's just a relief to finally get rid of some of these things.

PromQ, I think, might be our most recent addition. I believe this was started by Han and Solly and maybe a couple of other Googlers, I can't remember anymore, but essentially this is like Prometheus, within your terminal. I think that's a really neat idea: you can explore /metrics endpoints in real time, locally, and query things immediately from your command line, without requiring an entire Prometheus to be set up and scrape configurations to be created. It's really just for understanding your local instances a little better, and I think it's a really fantastic tool for understanding the metrics you might have available from a process. And it's not specific to Kubernetes at all; it's actually totally general purpose.

The next project is metrics-server, and the way you may have already interacted with it is through kubectl: kubectl has a subcommand called top, just like the Linux command, and with it you can ask Kubernetes how much CPU and memory your pods and containers are using, as well as your nodes overall.
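Coming back to the kube-state-metrics deployment example from a moment ago: the desired-versus-available comparison boils down to the check below. The Go here is a toy model of the alerting logic; the PromQL metric names in the comment are the ones I believe kube-state-metrics exposes for deployments, and they may differ between versions:

```go
package main

import "fmt"

// deploymentState mirrors the two numbers kube-state-metrics exports
// per deployment. In PromQL, the alerting check is roughly:
//
//   kube_deployment_spec_replicas != kube_deployment_status_replicas_available
//
// (metric names to the best of my knowledge; check your version's docs).
type deploymentState struct {
	name      string
	spec      int // desired replicas
	available int // replicas actually ready
}

// mismatched returns the deployments whose rollout hasn't converged,
// i.e. the ones an alert based on the expression above would fire for.
func mismatched(states []deploymentState) []string {
	var out []string
	for _, d := range states {
		if d.spec != d.available {
			out = append(out, d.name)
		}
	}
	return out
}

func main() {
	states := []deploymentState{
		{"frontend", 3, 3},
		{"backend", 5, 2}, // rollout stuck
	}
	fmt.Println(mismatched(states))
}
```

In practice you would express this purely in PromQL against the scraped metrics, usually with a `for:` duration on the alert so brief rollouts don't page anyone.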
kubectl top is almost a side effect of what metrics-server was initially created for, but it's useful for that as well. Essentially, this was originally created for autoscaling purposes: let's say you're using 80% of your memory, then horizontally scale your application by one more pod, or whatever your horizontal pod autoscaler definition says. You can find it under the kubernetes-sigs GitHub org. As a very high-level overview of how metrics-server works: very similar to Prometheus, it goes and scrapes the Kubernetes nodes and gathers that information, and then it is what we call an aggregated API in Kubernetes. That essentially means it is registered with the Kubernetes API server to serve a specific API, and whenever the Kubernetes API server receives a request for that API, the request is just proxied to metrics-server, and metrics-server actually answers it. That's how a couple of APIs are made pluggable within the Kubernetes ecosystem.

I go into this detail because the Prometheus adapter is essentially an alternative implementation of this same API, which you could use if you already run Prometheus within your Kubernetes deployment. Because metrics-server essentially does the same thing as Prometheus, scraping nodes for this information, it's natural not to duplicate that task within your cluster and spend additional resources on it if Prometheus already does it anyway for your alerts and dashboards. So if you already use Prometheus, I recommend using the adapter. I actually forgot to update the repo link here: this repository recently moved under the kubernetes-sigs org. It was originally developed by Solly, who kindly donated it to become a Kubernetes SIG Instrumentation subproject, so now it's under the umbrella of SIG Instrumentation.

So, hopefully I've demonstrated that we do a lot of exciting and interesting things within Kubernetes SIG Instrumentation, and there are so many ways to get involved. As I said throughout the presentation, there are many metrics-related topics you can get involved in, and logging aspects too, for example the migration to structured logging or the performance tests, and the tracing work is definitely also not done. If you're at all interested in any of these things, do attend our SIG meetings and we'll be more than happy to find something you can work on. If you're interested in any of the subprojects: for kube-state-metrics, you can contact Lili or myself, and we're always seeking new contributors; for metrics-server, reach out to Marek; for PromQ, to Han, Solly, or Yuchen. We're all a nice bunch of people, and we're always happy if you reach out directly or join our SIG meetings; we'll be more than happy to find a contribution you can get started with. One more time: our SIG meetings are every two weeks on Thursdays at 9:30 a.m. Pacific time, and every other week on Wednesdays at 9 a.m.
Pacific time, we have our triage meetings, and those are the perfect meetings for figuring out something you can work on, if that's what you're interested in. If you just want to listen in, that's totally awesome as well. We always like to hear feedback, so even if you just have some feedback about any of the things we're doing, we're more than happy to listen. And all the other channels, either Slack or the mailing list, are awesome as well. Once again, the chairs are Elana and Han, and the tech leads are David Ashpole and myself. And just to give you a quick shout-out: there are some other cool observability groups in the CNCF, for example the CNCF observability group, which is also having a talk on Thursday, May 6, so do check them out. There are many other observability-related talks in the observability track and the maintainer track; do check those out as well. But that's it. Thank you so much for coming, and I hope you have a great KubeCon!