Hello, everyone. Welcome to KubeCon 2021. This is the SIG Node intro and deep dive talk. My name is Dawn Chen, and I am a software engineer at Google, currently working on GKE and Anthos. I'm one of the founding engineers of Kubernetes, and I initiated the SIG Node community back in 2016.

Hi everyone, I'm Derek Carr. I'm an engineer at Red Hat. I work on upstream Kubernetes as well as our distribution, OpenShift. Like Dawn, I've been working on Kubernetes since the pre-1.0 days, and I'm happy to be here and talk with everybody.

Hi everyone, I'm Elana Hashman. I'm a software engineer at Red Hat, where I work on the OpenShift node team. I've been contributing to Kubernetes since about 2018, and I work upstream on the node team as well as in SIG Instrumentation and the production readiness review team.

Hello, I'm Sergey Kanzhelev, and I work for Google, on GKE specifically. I've been involved in Kubernetes for over a year now, and I'm super excited to be part of SIG Node. So Dawn, can you tell us about SIG Node?

Okay. Before getting into today's agenda, I want to briefly mention the previous SIG Node update given by Elana and Sergey back in May at KubeCon EU. If you click each of those links, you can watch the recorded video and see the slides. Next.

Here's today's agenda. We are going to first introduce SIG Node's responsibilities. Then we are going to talk about current activities and the roadmap for 1.22 and 1.23. Then we are going to talk about some interesting projects and efforts currently driven by the SIG Node community, for example the 1.22 pod lifecycle refactoring and the CI subproject, continuing from the last update. Then we are going to talk about bug triage. At the end, we are going to talk about how to get involved and how to get help. Next.

What is SIG Node and what is it responsible for? Let's briefly talk about the node's responsibility in Kubernetes first. Kubernetes is a cluster orchestration solution for containerized applications and services. Those containers, including the Kubernetes control plane components, run on the nodes. On each node, there is an agent called the kubelet. The kubelet registers the node with the Kubernetes control plane. The kubelet, together with the container runtime, manages pod and container lifecycles on the node: set up, run, tear down, and clean up. The kubelet also does node-level resource management, such as ensuring applications get their requested resources, detecting node-level resource starvation, and taking action to prevent out-of-resource situations. The kubelet also reports status back to the control plane. Next.

In summary, SIG Node owns all the controllers running on the nodes, which ensure the node itself and the applications running on it stay healthy. SIG Node is very large and owns many projects. If you click through the following links, you will find many interesting projects, as well as the tools and daemons that keep nodes healthy. Next.

So thank you, Dawn. Let's talk about the SIG Node roadmap. As Dawn mentioned, we already gave a talk in May about the roadmap and what we planned for the 1.22 release. I want to refresh that and show what we actually delivered in 1.22 in terms of KEPs. By the way, a KEP is our way in Kubernetes to track features and improvements; it's short for Kubernetes Enhancement Proposal. We track all the KEPs through different stages, and this list shows the KEPs that we worked on in 1.22 and actually merged.
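(Editor's aside: to make the kubelet's node-level resource management that Dawn described a bit more concrete, here is a minimal, illustrative KubeletConfiguration sketch. The field names come from the kubelet.config.k8s.io/v1beta1 API; the reservation and eviction threshold values are arbitrary examples, not recommendations.)

```yaml
# Illustrative kubelet configuration file only; values are placeholders.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Reserve resources for system daemons so workloads cannot starve the node itself.
systemReserved:
  cpu: "500m"
  memory: "512Mi"
# Evict pods when node-level resources run low, before the node becomes unhealthy.
evictionHard:
  memory.available: "100Mi"
  nodefs.available: "10%"
```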
I tried to split the KEPs into themes, and it's really hard to split anything into themes. A theme is not an attribute of a KEP; it's just our attempt to group them and demonstrate what we've been working on. I think it's important for Kubernetes to stay on the edge of which workloads we support, be it high performance applications or some databases. We try to support as many workloads as possible, and we constantly improve how we support those workloads and how we make it possible to run applications with high availability and high performance. Another theme is reliable operations. It's critically important to stay reliable and secure by default. And we are also constantly working on cleaning up the code base, and "no permanent beta" is a topic that is always top of mind, to clean up the code base and make sure that we have a way for new features to come in and deliver a lot of value for users.

I don't want to go into the details of every single KEP we worked on for 1.22; you can go to the previous talk to hear what we were planning to work on and get the details. I just wanted to highlight again that these KEPs are by no means in any priority order. For everybody, priority is different: some people care about some features, other people care about other features. So I just ordered them by KEP number; you can see it at the end of every link. And it's also hard to structure and categorize KEPs into themes. For instance, node swap support, which you worked on, Elana: it can be attributed both to supporting more workloads, like some AI workloads that load huge data models, or to reliable operations. Which one do you believe applies the most?

I mean, to some extent, it's a bit of both. Honestly, for the alpha, I think we're targeting reliable operations maybe more than workload support, because swap is currently only tunable at the node level rather than for each individual workload. However, with the alpha you are currently allowed to let workloads use up to unlimited swap on the node, or no swap, within the constraints of cgroups. So it's giving folks some options, and I'm also excited to continue working on that in the 1.23 cycle.

And Derek, I know that another highlight of this list is memory QoS support. It's a seemingly easy feature that allows requests and limits for memory to be set up and made more reliable. How long did it take to get to the alpha stage of memory QoS?

Yeah, it's a great question, Sergey. I think cgroup v2 has been a journey in the Linux community and then as well in the broader Kubernetes community. It's important that, as the underlying operating systems that Kubernetes orchestrates containers on keep evolving, we keep up with the emerging changes at the actual host at the Kubernetes tier and take advantage of them. A lot of folks might not realize that today, the memory request on a pod running on a cgroup v1 host isn't actually used to provide any quality of service guarantee to that container beyond just the minimal scheduling guarantee. So we're excited about some of the new features in cgroup v2 that we can exercise now to provide differentiated memory QoS, so that stronger guarantees, such as memory minimums, can be provided via the memory.* options in the v2 version of that controller. I think this is an effort, Dawn, that you and I have talked about for probably the last five years.
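(Editor's aside: as a rough illustration of what Derek is describing, assuming a node running with cgroup v2 and the alpha MemoryQoS feature gate enabled in 1.22, the memory request below is expected to translate into a cgroup v2 memory.min reservation for the container rather than only informing scheduling. Names and values are placeholders.)

```yaml
# Illustrative pod only; behavior described in comments assumes the alpha
# Memory QoS feature on a cgroup v2 node.
apiVersion: v1
kind: Pod
metadata:
  name: memory-qos-demo   # hypothetical name
spec:
  containers:
  - name: app
    image: nginx           # placeholder image
    resources:
      requests:
        memory: "256Mi"    # intended to become a memory.min reservation
      limits:
        memory: "512Mi"    # hard cap, enforced via memory.max
```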
And so it's been a slow, emerging process, but an important one for the SIG going forward.

Yeah, I still remember the first out-of-memory discussions, the memory quality of service work, and the long list of problems related to that. Yeah, thanks.

And Dawn, since you've been here a long time and you initiated dynamic kubelet config, which is currently on the list for deprecation: do you have any hard feelings about it?

Actually, I'm really glad that we finally decided to deprecate this one. This first started at the beginning of Kubernetes, in the early adoption stage, when we didn't have a clear node spec. We also didn't have a clear, standardized way to handle system configuration. Many users at that early time wanted to customize node configurations and specs, and at the same time, all we had were tons of kubelet flags to customize with, which is unstructured and not manageable. So we founded this project back in 2017. And to support it, we actually defined the Kubernetes component configuration project, which is not just for the kubelet; it applies to the whole control plane. We drove that work successfully a while back, so all of our control plane components can be managed that way. Another thing is that we also deprecated tons of unused flags and converted them to kubelet config fields. And over that time, we also figured out how to do more dynamic kubelet configuration at the node level without needing this intrusive and much harder to manage dynamic kubelet configuration feature. So I think it served its original purpose, and now is a good time to deprecate it.

Yeah, thank you, Dawn. And I want to say from myself how much I enjoy working with you, Derek, and every member of the community. So much history and hard work was put into these features, and that history is quite exciting and thrilling for me.

In Kubernetes, we are also committed to improving our processes all the time. As part of SIG Node, we are committed to doing the same, and we ran a retrospective for the 1.22 release. I think the theme of the 1.22 release was reaching new heights or new peaks. Yeah, reaching new peaks. And I think we reached another peak in SIG Node: we tracked a record number of 24 KEPs for this release and merged 13 of them. It's a very good number considering everything. Two of them were exceptions, so very last moment, and four were almost ready. So we could have done 17, but the exceptions were denied for different reasons. Among the things that went well: again, a record number of people working on a record number of KEPs, which is great. We are now looking more at reliability, and requiring end-to-end tests at early stages allowed us to catch some issues on KEPs that weren't merged, which is great because we don't want to ship something that is not working and create a bad impression of an alpha feature. Also, we realized that all the small KEPs merged very smoothly into the release with no issues along the way. Some things we plan to improve: we will define a soft early deadline for a draft PR. This will allow us to first confirm that a KEP is actually on track for the release, and it will also allow us to request an early API review. This release, we realized that API review is a bottleneck for us; too many KEPs were trying to merge at the same time.
And API reviewers just don't have enough bandwidth to review everything, and then we don't have time to react to those reviews. Plus, the PRR process can find a lot of issues. We just started using PRR (PRR stands for production readiness review, by the way), and we realized that early discussions in PRR can find a lot of issues and enable smoother implementation of a feature. Anyway, it was a great release, and now we are working on the 1.23 release.

For 1.23, I kept the same themes for the KEPs. We have the same workload support theme, because we want more workloads, and the same secure by default theme. But a new theme is ease of troubleshooting, which is about improving the observability of workloads and infrastructure. It's mostly handled by SIG Instrumentation, which Elana is a chair of, but at the same time we in SIG Node also need to make some core investments and core features to enable advanced scenarios that are outside of SIG Instrumentation's scope.

Yeah, and just to jump in on that: SIG Instrumentation for the most part does not actually own any of the code in the various components that provide observability. SIG Instrumentation owns the core library, like the metrics library and framework in component-base, but not necessarily the specific metrics in each component. So it's the responsibility of each SIG, such as SIG Node, to actually make investments in those areas to meet the guidelines set forth by SIG Instrumentation.

Yeah, thank you. Another thing that I'm very excited about is intelligent operations. We want to make sure that we minimize workload interruptions and apply some intelligence to how we do that and how we optimize resource use, for instance. Intelligence is replacing manual labor, so this is great for operators and it's great for workloads. With that, I can go into details of some of these KEPs. One of them is a long-awaited feature that will enable some SIG Storage related functionality and improve support for some workloads. Another long-awaited feature is gRPC probe support. And from intelligent operations, I'm really excited about in-place pod update. This is something that was discussed for a long time; you can see by the KEP number, it's about 1,000 less than everything else. It was discussed for a long time, and finally we reached a milestone where we understand the API changes that we want to implement. So hopefully in this release we can make it happen.

So KEPs are very important; they allow us to deliver new features. But just as important is the reliability, refactoring, and optimization work, and Elana will tell us about one of the refactorings that we made in the 1.22 release.

Yeah, thanks, Sergey. It's really important for SIG Node to be able to pay down technical debt as we accumulate it over time. To give you a little bit of context, in the 1.22 release cycle we did some pretty substantial refactoring of the pod lifecycle. To understand that refactoring, I first want to give a very quick overview of the kubelet as part of our SIG Node introduction. The kubelet is just a controller that turns a machine into a Kubernetes node. The code for the kubelet lives mostly in pkg/kubelet, with entry points in cmd/kubelet. I've listed the files under the pkg/kubelet directory here if you want to go take a look at what logic lives where.
But essentially, like any other controller, the kubelet manages a certain kind of Kubernetes object; in the kubelet's case, it is the pod. It has a main sync loop, and you can find that main sync loop in kubelet.go. The sync loop uses pod workers to track operations on pods, and you can find that logic in pod_workers.go. Each pod describes a collection of one or more containers, and those containers have a lifecycle that's defined in the kubelet: starting in a waiting state, moving to running, and eventually being terminated. You can find more of that logic in kubelet_pods.go.

On the next slide, to illustrate with a picture, this is an idea of the core workflow, and it comes from a design proposal. If you click on this slide, you will get a link to that design proposal, all the way back from the 1.2 Kubernetes release cycle and planning. So the core design has remained the same over time, and you can see here, in the middle, the various pod workers that the kubelet spawns in order to manage various pods and pod changes.

So, the pod lifecycle refactor: why did we need to do this? Initially we found a bug related to pods setting terminationGracePeriodSeconds to zero on the pod specification. That is, when you tell the pod to terminate, this is the grace period in seconds that it will use by default before it tries to force kill the pod. When you set that to zero, we found a bug where it was accidentally causing forced deletion, which was never intended by design. So we went to submit a patch to fix that, and we found that once this was fixed, we started seeing a number of flakes in the end-to-end tests, as well as some failures, because those tests were relying on this undefined behavior. Without the forced deletion, the timing in our end-to-end tests no longer lined up. So we found a new problem: over time the SIG Node code had evolved, and we needed a single source of truth in the kubelet during pod shutdown in order to avoid various things, like storage, racing with one another. The solution was to go back to the original design of the kubelet, clean up some of the drift that had happened over time, and unify all of the pod termination logic inside of that pod worker.

While fixing this, since we have a pretty substantial test suite in Kubernetes, we started uncovering some occasional flakes as well, which led us to find more rare bugs. For example, we might encounter a rare race condition on pod startup where there's a cached container status and the kubelet thinks there are zero running containers, which is not correct in some cases. In other cases, we found that code that had been added was only checking running containers, but not newer types of containers such as ephemeral containers, which also need to be handled as part of the pod spec; sometimes that could cause rare correctness issues. As well, some loops had confused whether a pod was terminating or already terminated. As a result, if one component assumed that a pod was terminated while it was still in the process of terminating, these components could race with each other because the pod wasn't completely torn down. By working through this, we were able to clean up a bunch of these subtle, rare issues that we would occasionally encounter as flakes in our test suite, and thus significantly improve the code in the pod lifecycle in the kubelet.
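(Editor's aside: to make the triggering condition concrete, here is a hedged sketch of the kind of pod spec involved. Setting the grace period to zero should simply mean "no grace period on deletion", not a force deletion; the names and image are placeholders.)

```yaml
# Illustrative pod spec only. Before the refactor, a zero grace period here
# was unintentionally treated by the kubelet like a force deletion.
apiVersion: v1
kind: Pod
metadata:
  name: zero-grace-demo   # hypothetical name
spec:
  terminationGracePeriodSeconds: 0   # delete immediately, but not "force delete"
  containers:
  - name: app
    image: nginx           # placeholder image
```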
So what did we learn from this? What went well? We fixed a lot of these subtle, really difficult to reproduce, difficult to debug bugs by going back and fixing this code. As well, we found that the test suite worked out really well: it caught a bunch of bugs before we went to release, so we were able to fix a number of issues before we cut the release that might otherwise have shipped. But because the majority of our test suite tends to lean towards periodic jobs, and many of our most detailed tests can be very involved to run and require a lot of infrastructure, we still found ourselves catching some regressions after the release was cut. As a result, we found a couple of regressions very early in this 1.23 cycle, and since this merged so late, we are going to be working on getting those fixed as soon as possible and then backporting them to 1.22, so they'll be available in a patch release. So what are we doing next here? Well, we still need to actually fix the issue with terminationGracePeriodSeconds causing pods to be force deleted when it's set to zero, as well as any other bugs that we've managed to uncover here. And hopefully, as such, we will end up improving reliability. And speaking of reliability, I think it's time for me to hand over to Sergey to talk about our CI subproject.

Thank you, Elana. Yes, the CI subgroup was originally formed as a reaction to one release where many release-blocking tests were failing and nobody was around to fix them. So the reaction was that we formed a subgroup, but now it's much more than just a reactionary subgroup fixing release-blocking issues. We have actually converted it into a reliability group. We work on improving stability and watching for kubelet regressions, including performance regressions, from version to version. Ultimately, we ensure smooth releases and confidence that Kubernetes is a stable and reliable platform. The CI subgroup is also a great place to start learning Kubernetes and to get started contributing, because we have very friendly people and there is always work to do, including work that is easy to get started with. In the last half year, we switched a lot from being reactionary, responding to emerging and ongoing issues uncovered by our CI with new check-ins or infrastructure changes, to a more proactive mode where we proactively clean up test structures, add tests, require tests from new KEPs, and do all those great activities. We also started looking into reliability more and realized that to be even more proactive, we need to start looking into incoming bugs, and for that we want to get into bug triage. So Elana, can you talk about bug triage a little bit?

Yeah, absolutely. Let me talk about bug triage. Next slide, please. Oh, I missed the one before this. Yes, here we go. So SIG Node has one of the largest workloads in the entire Kubernetes project; I believe we're around the third largest SIG by absolute workload. That means in a given week, we have over 200 open PRs on average. We tend to close out anywhere between 20 to 80 PRs in a given week, both merged and closed. And over the past year, we've seen contributors from nearly 50 companies and many countries, probably more than that, making contributions to SIG Node. So your help is needed; we've got a lot of work to do. And so how do we manage such a large workload?
Well, we focus on ensuring that we have efficient processes that allow anybody to jump in and work through the stuff on our backlog. In 1.21 and 1.22, we focused on pull requests: we added GitHub project boards for tracking the lifecycle of PRs, both for the wider SIG and focused on CI signal and testing. We improved our documentation by adding a triage guide and a contributing file for SIG Node. And we also worked to mentor new contributors via a ContribEx mentoring program in order to help train more reviewers for SIG Node. So that was focused on pull requests.

In the 1.22 release, we wanted to start doing a better job of tracking bugs, because when we started a particular bug scrub event, we had over 450 bugs on our backlog. So I organized a very large scale global event where we tracked three different regions, all working together for 48 hours. We had over 90 attendees in our Slack channel, where we all met virtually, and we had 13 region captains and mentors. During this event, we closed 136 issues, we updated over 200, and we ensured that the vast majority, 96% of SIG Node issues, had received an update in the past 90 days. Here are some of the stats from this event. We really drove down the backlog: on the left, you can see the total number of bugs and issues, and on the right, how many have received a recent update. I mentioned that we previously had tracking on our GitHub boards for the steady state of pull requests, to ensure that we're tracking them throughout their lifecycle, but we didn't have such a thing for bugs. So for this upcoming release, we have added a bug tracking board that we go through every week in our CI subproject, ensuring that we can be very responsive when bugs are filed against SIG Node: we're answering folks, we're requesting any needed information very quickly, and thus we can drive down our mean time to resolution. So how can you get involved in all of this? Derek is going to tell you a little more.

Yeah, so I know we're almost at time here, but hopefully everyone saw the themes of the presentation today: if you're looking to contribute to the SIG, we love contributions that prioritize helping us improve our overall stability posture. So if you're looking to open issues or provide bug fixes, we deeply appreciate fixes that come with end-to-end tests demonstrating the problem and the resolution. For folks who want to help Kubernetes run more optimally and make better use of resources on their machines, we welcome all contributions looking to make the kubelet more resource optimal: better usage of CPU, better management of memory, disk, et cetera. And finally, new features are always welcome, but they are less pertinent to us than those first two principles we just talked about. Still, we do welcome contributors coming forward with their new ideas and new use cases so that we can ensure the project is successful going forward. Great ways to get started, beyond what Elana and Sergey have talked about today, are helping improve code documentation, enhancing the logging and metrics reported by our components, and helping us keep up with the workload within the SIG.

So let's go to the next slide. How to join. The first step was participating in this presentation; hopefully you saw we're happy, fun people to talk to.
And so we welcome you to join one of our regular SIG meetings. We have a primary meeting where we cover ongoing features, KEPs, and broader design topics, as well as the CI and triage meetings that Elana and Sergey talked about earlier. And critically, as the SIG is constructing new features and evolving the platform, please give early feedback from your own real-world environments so we can inform our future directions. Finally, if you go to the next slide, you can reach out to us by joining the SIG meetings linked here, as well as joining our Slack channel; reach out and ask questions, send a note to our mailing list, and contact the SIG chairs and any of the subproject leads, and we'd be happy to assist. So thanks again to everyone for taking the time to learn about SIG Node.