Hello everybody, and welcome to this year's Kubernetes SIG Node intro and deep dive update at the KubeCon Europe 2021 virtual edition. My name is Elana Hashman and I'm a principal software engineer at Red Hat. I currently work on the OpenShift container platform engineering node team, and I've been involved in Kubernetes since 2018, primarily in SIG Instrumentation, but more recently also in SIG Node.

I'm Sergey Kanzhelev. I'm from Google and I'm working on Google Kubernetes Engine. I joined SIG Node hands-on more than a year ago, and I'm excited to present it to you today. What we will be talking about today is an overview of SIG Node: what it is and what components it has. We will talk about current activities and what we did in recent releases, and we will talk about the roadmap going ahead. Then we will talk about a specific project, the Dockershim deprecation, and finally we'll go into subprojects and how you can get involved in SIG Node or its subprojects.

What is SIG Node? If you think about Kubernetes as a cluster orchestration solution, you can think of nodes as the machines that run the containers scheduled by Kubernetes. Once the Kubelet, the agent running on a node, receives a pod that needs to be scheduled, it breaks it down into individual containers and schedules them according to resource management constraints and other possible limitations. SIG Node consists of subprojects like the Kubelet and the Container Runtime Interface, and it also includes specific tools that help SIG Node, like tools to manage containers (CRI tools) or to detect problems on nodes (Node Problem Detector). You can follow the link and find more subprojects that we own; SIG Node is huge. Now, Elana will talk about the SIG Node roadmap. Thanks, Sergey.
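The pod-to-containers flow described above can be sketched roughly as follows. This is a hypothetical Python model for illustration only, not real Kubelet code: the actual CRI is a gRPC API whose calls (RunPodSandbox, CreateContainer, StartContainer, and so on) the fake runtime below merely imitates.

```python
# Hypothetical sketch of the Kubelet-to-runtime call sequence over CRI.
# The real interface is gRPC against containerd or CRI-O, not Python.

class FakeRuntime:
    """Stands in for a CRI runtime such as containerd or CRI-O."""

    def __init__(self):
        self.calls = []  # record of CRI-style calls, in order

    def run_pod_sandbox(self, pod_name):
        # First, the runtime sets up a sandbox (network namespace, etc.)
        self.calls.append(("RunPodSandbox", pod_name))
        return f"sandbox-{pod_name}"

    def create_container(self, sandbox_id, image):
        self.calls.append(("CreateContainer", image))
        return f"ctr-{image}"

    def start_container(self, container_id):
        self.calls.append(("StartContainer", container_id))


def schedule_pod(runtime, pod):
    """Break a pod down into individual containers, as the Kubelet does."""
    sandbox = runtime.run_pod_sandbox(pod["name"])
    for image in pod["containers"]:
        ctr = runtime.create_container(sandbox, image)
        runtime.start_container(ctr)


rt = FakeRuntime()
schedule_pod(rt, {"name": "web", "containers": ["nginx", "sidecar"]})
print([c[0] for c in rt.calls])
# → ['RunPodSandbox', 'CreateContainer', 'StartContainer', 'CreateContainer', 'StartContainer']
```

The key point the sketch captures is the ordering: one sandbox per pod, then a create/start pair per container, with resource constraints applied at each step by the real runtime.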
One of the things that I want to talk about today is what we've been up to in SIG Node in the past few releases: what features we've graduated, what features are on our upcoming roadmap, that kind of thing. One of our themes for the most recent releases has been graduating features stuck in beta. So for example, we have sysctl support, which went alpha in 1.4, but we only recently graduated it in the 1.21 release. As well, we graduated RunAsGroup and CRI log rotation in 1.21, and in 1.20 we graduated PID limiting, runtime classes, third-party device monitoring plugins, and many more.

We're also working on new features. Some of the things that we've seen in the past couple of releases: we added CRI support on Windows, which graduated in 1.20. In 1.21, we graduated immutable Secrets and ConfigMaps support in the Kubelet. We've also added alpha support for memory management in the Kubelet, which enables NUMA use cases. We have graduated the PodResources API extension to GA. We've launched graceful node shutdown in beta as of 1.21, and many more. We've also worked on a couple of enhancements that are bug fixes, particularly around probes. So for example, we fixed exec probe timeouts in 1.20, and we added a configurable grace period for probes, which went alpha in 1.21, in order to override the pod-level spec.

In terms of our upcoming roadmap, some of the features that we will be working on in the next release include swap support on nodes, both for system-level stability and for workloads; pod overhead, which we target graduating in 1.22 for resource accounting; and user namespace support, which we hope to add in alpha in 1.22. We're also looking to graduate the CRI API to beta in the next release, promote cgroups v2 support to beta in 1.22, and also remove the Dockershim, in addition to many more features.
But since the Dockershim is so big, Sergey, do you want to tell us a little bit more about the Dockershim removal? Yeah, thanks, Elana. Indeed, we have so many features and so many different areas we're working on, and this feature is something that we're already talking about, even though it's scheduled for 1.24, which is April next year.

Docker was the very first runtime that the Kubelet used to schedule containers. You can think of the Kubelet as something that receives signals from the API server and schedules containers on the machine. But to do the actual work of running them, it uses Docker. So it calls the Dockershim through the API called CRI, the Container Runtime Interface. Docker in its turn calls into containerd, and containerd schedules the containers. You may note that Docker also provides an extra set of tools to work with these containers, and it also puts them in a special Docker environment, so containers have more information about each other. So Docker provides more tooling around containers than the Kubelet needs. For many years already, we've been working on removing the dependency on Docker to eliminate this overhead of extra features that the Kubelet doesn't need.

With the Container Runtime Interface, you can call the runtime directly, and this runtime may be the same containerd as with Docker, or CRI-O, or one of a few other runtimes. With this direct call into the runtime through the Container Runtime Interface, you get slightly faster scheduling because you don't have the overhead of Docker. You have fewer dependencies on your machine and better portability of your environment, because you don't take a dependency on Docker as your runtime of choice. We also get faster feature development as a benefit, because today, with Docker alongside the other runtimes, we need to implement every feature for Docker as well as through CRI to make it work on all the runtimes.
With the removal of the Dockershim out of the tree, we only need to concentrate on a few runtimes, and this will make feature development faster and test coverage of features better. So we recommend migrating off the Dockershim now, before 1.24 comes. And you can already do that, because containerd and CRI-O are production quality and well supported, and you can use them at any moment now.

So what will you be losing? Generally, you won't be losing anything, because your containers will run the same way. You will only get problems if you intentionally or unintentionally took a dependency on Docker. You may be calling docker ps from your support scripts, or you may even be doing something dangerous like executing code in running containers with docker exec. You may be pre-pulling images with docker pull, following some instructions from the Internet. None of that is going to work after migration to other runtimes, because Docker won't know about these containers: we don't call into Docker to schedule the containers, we call into the runtime directly. Some things will work if you install Docker; docker build may still work if you have Docker on the machine. But why would you do that if you have other tools that don't require running privileged pods? So you can keep running docker build and pull and push of images, but you may not want to. Eliminating the dependency will make your workload even more portable and faster. So please migrate off the Dockershim.

With the Dockershim, we use slightly different code paths than we use with CRI in general, and that increases the need for test coverage. And in general, the number of components SIG Node manages and the number of areas we are responsible for makes test coverage a big issue for us. So in SIG Node, we always prioritize stability first, and we've been taking tests over features. We've been taking tests over bug fixes.
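As a rough illustration of the kind of audit described above, a helper like the following (a hypothetical sketch, not an official migration tool) could flag support scripts that still shell out to the docker CLI and would therefore stop seeing Kubernetes containers once the node talks to containerd or CRI-O directly:

```python
import re

# Hypothetical helper: flag lines that invoke the docker CLI (docker ps,
# docker exec, docker pull, ...). After the Dockershim removal, Docker no
# longer knows about the containers Kubernetes runs, so these calls break.
DOCKER_CALL = re.compile(r"\bdocker\s+(ps|exec|pull|inspect|logs)\b")


def find_docker_dependencies(script_text):
    """Return (line_number, line) pairs that call the docker CLI."""
    hits = []
    for i, line in enumerate(script_text.splitlines(), start=1):
        if DOCKER_CALL.search(line):
            hits.append((i, line.strip()))
    return hits


support_script = """#!/bin/sh
echo "checking node..."
docker ps -a
crictl ps   # CRI-native tooling keeps working after the migration
"""
print(find_docker_dependencies(support_script))
# → [(3, 'docker ps -a')]
```

In practice you would point such a scan at your automation repositories; anything it flags is a candidate for porting to CRI-native tooling such as crictl.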
We take bug fixes only with tests, because we want to make sure that everything is stable and that we will catch a regression next time. And we have always been cautious about test infra and the health of the project in general. In fact, a few releases back, we even delayed a release a little bit because of stability issues on release-blocking test jobs.

So we formed this CI subproject of SIG Node, and we have separate meetings and a separate subset of people who really like working in this area. So what do you work on? You look at the test matrix, something you can learn about by going to the Kubernetes community website, and you look specifically at the SIG Node tests. Whenever we see some flaking or failing test, we schedule someone to work on that flake and fix it. This way we have constant attention on the test matrix, and we can guarantee that the Kubernetes code base is stable and can be released successfully. The CI subgroup is a very welcoming community, and it's a really good place to start learning about Kubernetes, because you learn by doing actual work, and this work is something that we prioritize really highly. So please join us in the CI subproject. And now we will talk about the scale of SIG Node. Elana?

Yeah, for sure. One of the interesting things about working in SIG Node, which is a vertical SIG, is that SIG Node in terms of absolute workload is the third largest SIG in the Kubernetes project, and that's out of about two dozen SIGs. So what does that translate to in terms of contributor volume? On an average day, we have about 200 open PRs, and in a given week, we usually merge or close anywhere between 20 to 80 PRs. We have hundreds of contributors who work on SIG Node, and in the past year, we've had contributors from over 39 companies. So your help is needed. We can't do this alone.
And one of the ways that I've worked to manage this scale is moving a lot of our work into project boards that help us track the state of these hundreds of PRs. As you can see on the right of this slide, there's a quick snapshot, which we'll zoom in on in the next slide, of a project board with multiple columns that help track the state of a pull request. For now, we're tracking at the kubernetes/kubernetes level, specifically PRs within SIG Node, with the hope of expanding to project issues in the future. At the scope of SIG Node, we look at PRs and separate them out into categories: needing triage, waiting on author, waiting for a reviewer, waiting for an approver, and done. This allows new contributors who want to jump right in to ensure that pull requests are correctly categorized in the right column on the board and that the correct labels are applied, and if you want to go ahead and start either reviewing or approving PRs, the column of everything that you need to work on is right there. So hopefully it's very handy, and with tools like project filtering, it can be even easier to work with.

This is an example of how we're doing this on the triage board for the CI and test subproject. In this subproject, we also focus on issues, since people sometimes file flaky test issues against us, and we need to track those to ensure that they get closed. So we have multiple columns here for issues and PR reviews, and there are more columns if you were to scroll right, but we can't do that on the slide. And here you can see in the search box I've applied a no-assignee search, which allows us to look at all of the items on the board that don't currently have an assignee, which means that probably nobody's working on them right now. So we might want to ensure before the end of our meeting that someone's assigned. We can also filter on labels, on whether a PR is open or not, on whether something is a PR versus an issue, and so on.
So this makes it much easier to track the hundreds of PRs and issues that the SIG has within the project. As I mentioned, we focused on pull requests in the 1.21 release, but we do hope that in the future we might be able to expand this to issues as well, and I would look forward to help from the community to make that happen. And if you look at our slides after the talk, we've included lots of links, so you can access the SIG Node PR board as well as the CI and test enhancements board.

One of the other things we've been working on as part of triage is improving our documentation, so more people can jump in and get involved. If the board looks a little intimidating, or you're not really sure where to jump in, or what reviewing a PR even means, or how to find an approver, we've written all of that down to make it easier for you to get started, or to get a refresher if you haven't worked on the project in a while. You can click on the triage guide within the SIG Node community repo and read up on all of these various roles and what you can do at your level in the project, whether you're currently interested in becoming a Kubernetes org member, already an org member, a reviewer, or an approver. We're also looking to add a contributing file specifically to help you get started with SIG Node, so you can use that as your one-stop shop. It'll have links to all sorts of things that you need to focus on in terms of working with the Kubelet, working with the CRI, and getting started.

One of our other efforts has been to bring on new contributors to assist with triage, and I hosted and recently graduated a cohort of new node reviewers from underrepresented groups. You can read more about the program in the community repo under the mentoring section. That is to say, we really want you to get involved, and we're trying to make it easy for you. So what can you work on? As Sergey said earlier, we're prioritizing stability first.
So we want to make sure that our test coverage is good and our tests are passing before we start working on bug fixes or known open issues, and then we prioritize features last. Of course we graduate things from all three categories, but if you're looking for somewhere to get started, the highest impact things you can work on are probably tests and bug fixes. We also need help with our test infrastructure monitoring and health. We also want to improve both the user experience for an operator of a cluster who's working with Kubelets and the developer experience for folks who want to work on SIG Node. So it is very helpful if you want to contribute documentation, or help improve our logging and metrics, or help us keep on top of pull requests and issues, because there are just so many on the board.

In the latest release, one of our exciting new features was that we migrated the entire Kubelet to structured logging. That's one example of the sort of effort where new contributors found it relatively straightforward to get involved and are now more involved in SIG Node. So more to come, things like that. And this effort, I can say, is very important because it will improve the stability and troubleshooting of the node. So kudos to the new contributors who participated in this effort. Thanks all.

So okay, you're interested in contributing, you know approximately what you want to work on, and you're excited to get involved. How do you do it? Attend our SIG meetings; that is usually your first stop. Our SIG meetings, which are weekly, cover features, Kubernetes enhancement proposals, and more, and they typically involve all of the wider discussions we undertake as a SIG. Our CI and triage meetings are a smaller hands-on group that still happens weekly. So if you want to learn how to triage issues or improve CI, those are really good settings to get started in, because you'll be able to ask us questions one-on-one and we'll be able to help out.
We also invite you to participate in code review, filing issues, commenting on issues, and improving our documentation. So again, here I've got links to our triage guide as well as both of our boards in case you want to get involved there. We also welcome people to adopt new features and give feedback. Writing code is not the only way that you can make Kubernetes better, and it's really important for a lot of our features, particularly as we focus on graduating things from beta, that we get test data from real environments in order to ensure that we're not introducing any regressions, that performance is okay, and so on. So testers are very welcome to join us as well, and we'd love to see you at our meetings. And even if you don't consider yourself a tester, just give us feedback on what you're using and what you're not using, so we know where to pay attention.

So where can you find us? Our regular meetings are weekly on Tuesdays at 10 a.m. Pacific time, and if you click on the link, you'll be able to convert that into your local time zone. Our CI and triage meetings are weekly on Wednesdays at 10 a.m. Pacific time. Our Slack channel is #sig-node on the Kubernetes Slack. Our mailing list is the kubernetes-sig-node Google group, and our SIG chairs are Dawn Chen at Google and Derek Carr at Red Hat, in case you want to ask them any questions. Thanks so much for listening to our talk. Thank you.