Hello, everyone. Welcome to KubeCon North America 2022. This is the SIG Node intro and deep dive. My name is Dawn Chen. I am a software engineer at Google working on GKE and Anthos. I'm one of the founding engineers of Kubernetes, and I initiated SIG Node back in 2016 together with Derek. Derek?

Happy to be here. My name is Derek. I'm an engineer at Red Hat. Like Dawn, I've worked on and around the Kubernetes community for a very long time, and I also work on the OpenShift product from Red Hat, which is our Kubernetes distribution. Sergey, you want to introduce yourself?

Yeah, absolutely. Compared to you two, I'm the newest addition to SIG Node. I've been in the SIG for two years, and I work for Google on GKE.

Before we get into today's agenda, I want to briefly mention the previous SIG Node update we gave at KubeCon Europe earlier this year. You can access the related slides and recordings by clicking those links.

Here is today's agenda. We're going to first recap SIG Node's responsibilities. Then we'll talk about the current activities since the last update, the roadmap from 1.25 to 1.26, and some of the interesting projects and efforts currently driven by the SIG: cgroup v2, in-place pod resizing, and evented PLEG. After that, we'll briefly introduce several new subprojects initiated by the SIG since the last update: kernel module management, dynamic resource allocation, and Batch Working Group participation. Then Sergey will update us on the CI project, which is the highest priority in our community. Finally, we're going to talk about how to get involved with, and how to get help from, the community.

What is SIG Node, and what is it responsible for? Let's briefly talk about the node's responsibilities in Kubernetes. Kubernetes is a cluster orchestration solution for containerized applications and services, and that includes the controllers running on the nodes. On each node, there is an agent called the kubelet. The kubelet registers the node with the control plane. The kubelet, together with the container runtime, manages pod and container lifecycle, doing the setup, run, tear down, and clean up. The kubelet also does node-level resource management, such as ensuring services get their requested resources, detecting node-level resource starvation issues, and taking actions to prevent out-of-resource situations (there's a configuration sketch below). The kubelet also sends status back to the control plane. I always call the kubelet the brain of the node. In summary, SIG Node owns all the controllers running on the nodes that keep the node itself and the applications on it running happily.

SIG Node is very large and owns many projects. You can click the link here to find them all. Recently, starting with 1.20, we've been focusing on eliminating permanent betas. A permanent beta is a feature that enters the beta stage, where we start collecting feedback and accumulate many customers and a lot of usage, but it never officially goes GA. That creates confusion among customers and is generally bad for the software. So we deliberately tried to eliminate those betas, and we also did other major deprecations like the dockershim removal. We've been doing a lot: we removed deprecated functionality and graduated very stable features that were already widely used but never had time to be officially called GA. But the work is not complete yet. We still have a few feature gates and features in beta that go all the way back to Kubernetes 1.4, and we're working on eliminating those as well.
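To make the node-level resource management described above concrete, here is a minimal KubeletConfiguration sketch. The fields are real kubelet options; the reservation and threshold values are illustrative assumptions, not recommendations.

```yaml
# Sketch of kubelet node-level resource management: reserve resources for
# system daemons and set eviction thresholds so the node acts before it
# actually runs out of resources. Values are placeholders.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:              # set aside for OS daemons
  cpu: 500m
  memory: 500Mi
kubeReserved:                # set aside for the kubelet and container runtime
  cpu: 500m
  memory: 500Mi
evictionHard:                # evict pods immediately below these thresholds
  memory.available: 200Mi
  nodefs.available: 10%
evictionSoft:                # evict after the grace period below
  memory.available: 500Mi
evictionSoftGracePeriod:
  memory.available: 1m30s
```

With settings like these, the kubelet starts evicting pods before the node itself runs out of memory or disk, which is the out-of-resource prevention mentioned above.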
Coming back to permanent betas: at least two of them are planned for 1.26, and we will be working on the rest of them to make sure we don't have any permanent betas left. We also have very old features that have been in beta for a long time, like the CPU Manager. But for that kind of feature we are actually working on new policies and collecting feedback to make sure the feature is done right. So even though it has been beta for a long time, we are actively working on it; that's why it's still in this status and not graduated to GA yet.

That said, even with this focus on removing permanent betas, we're still working on new features. Some of them are brand new, some of them are just moving from alpha to beta. We currently have 21 feature gates, give or take, in SIG Node, and all of them are being worked on in parallel. Derek here will give us an update on what exactly happened in 1.25 and what our plan is for 1.26 and onward.

Thanks, Sergey. So in 1.25, we graduated a number of features that were in development to stable, or generally available, status. What this means is that the feature is well-tested, well-understood, and documented, and offers a backward compatibility guarantee going forward. Two I'll highlight here that I think are particularly interesting are the ones whose issue numbers are low. Kubernetes has been a project for a very long time, and some features take a long time for us to mature and bring to stable. Ephemeral containers are a good example of that. You have a feature that changes the lifecycle of a pod and how we run containers in those pods, so it's really good to see that feature go to stable in 1.25 and start to close out these very early issue numbers, in this case 277. Other features of note here: ephemeral storage capacity isolation, so you can schedule on node-local storage, graduated to stable. cgroup v2, which we'll discuss in a little more detail later, is also stable, so we can better support newer, more modern Linux distributions. An example of a use case that was removed from the kubelet was the ability for the kubelet to advertise particular GPU accelerator usage metrics; this has now been offloaded to individual GPU device plugins and vendors. And then lastly, a feature that went to stable that I really appreciate: you can now identify the operating system that is required by a pod. So if you're running Windows nodes or Linux nodes, you can actually look at the pod and know it should land on a Linux node or a Windows host. A very helpful feature (there's a small manifest sketch below). Next slide.

We had fewer features graduate to beta. The one of note here was enabling seccomp by default. This has been long explored, so it was good to see this feature continue to make progress. One note I'll make about beta concepts going forward: it used to be that beta features were on by default. There has been a posture change in the project to make beta features off by default, so keep that in mind for future beta graduations.

And then last, as Sergey noted, we are continuing to explore new features. Things that are alpha typically might not be as richly tested; they're definitely off by default; and the problem spaces they explore are complicated and often need more time to mature. The one issue I will highlight here is user namespace support, which you can see at a very early issue number of 127. User namespaces have been a complicated topic throughout Kubernetes, and the problem space is large, particularly as it intersects with storage.
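To illustrate two of the behaviors just mentioned, here is a minimal pod manifest sketch that declares its required operating system (the field that went stable in 1.25) and opts into the runtime's default seccomp profile. The pod name and image are placeholders.

```yaml
# Illustrative pod: declares the OS it requires and requests the container
# runtime's default seccomp profile.
apiVersion: v1
kind: Pod
metadata:
  name: os-aware-pod
spec:
  os:
    name: linux                # stable in 1.25: this pod must run on a Linux node
  securityContext:
    seccompProfile:
      type: RuntimeDefault     # use the runtime's default seccomp profile
  containers:
  - name: app
    image: registry.k8s.io/pause:3.8   # placeholder image
```

With `spec.os.name` set, a kubelet on a mismatched host can reject the pod up front instead of failing at runtime.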
Returning to user namespaces: we chose to make some forward progress in this space to try to facilitate user namespace support for stateless pods, and it will be interesting to see that feature continue to evolve going forward.

So as we look ahead to 1.26 and onward, we are hoping to continue to make progress on better resource sizing of pods, particularly allowing you to update resource requirements in place for things like CPU and memory. As always in SIG Node, we're trying to bring down the management overhead of supporting your nodes, so you have more resources available for your workloads. Toward that effort, you'll hear more detail in this discussion on an effort we're calling evented PLEG. And then security is obviously a concern in the Kubernetes community, broadly, and to our users, so improving the ability to better integrate with secured image sources is important, and you'll see work in that space. And finally, as we look to support the evolving environment, we are trying to explore possible prudent evolutions of both the pod lifecycle and the resource plugins that pods might consume going forward. You'll hear more details on those in this presentation.

So let's dive in a little bit on cgroup v2. We talked about this a little in the last SIG Node update, but I think it's important to emphasize it now that it went stable, because it will likely get more production usage. Just to level set and remind folks: cgroup v1 and v2 have had feature parity in Kubernetes since 1.24, but many Linux distributions, and many environments where users run production workloads, are still only using cgroup v1, and v2 support is still something we anticipate feedback on in the years ahead. We are not actively deprecating cgroup v1 support just because we added v2 support; we will tolerate both cgroup controllers for an extended period of time. But as we add new resource management features into SIG Node's domain, we will likely only build new resource controllers on v2.

If you are exploring newer Linux hosts, whether that's a newer Fedora, Ubuntu, or COS sort of distro, it's very possible those distros now default to booting into cgroup v2, so it's great that the kubelet can now work on a stable basis on those platforms. Be aware as you're exploring this feature that there's not a unique knob to turn it on; it is self-identified based on how you configured your operating system. For new features in this space that we're excited about: we do think Kubernetes can offer better memory quality of service, and we have work in flight today in the kubelet to support that, and we also think there are opportunities to explore OOMD, the userspace out-of-memory killer, to ultimately make the node more reliable.

What should you do to get ready for cgroup v2 as you're exploring putting it into your environment? Know that many workloads actually have no dependency on the cgroup version; it should be transparent or opaque to you. If you are using cgroup v2, we do recommend that you use the systemd cgroup manager for your container runtime and the kubelet when you deploy (a minimal configuration sketch follows below). Finding all possible issues for a change like this is hard to do automatically, and sometimes we expect feedback: you might discover that intersecting projects or solutions that layer on top of Kubernetes had unknown dependencies.
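Here is that recommendation as a minimal sketch of the kubelet side. The `cgroupDriver: systemd` field is a real KubeletConfiguration option (the container runtime must be configured to match); the MemoryQoS feature gate shown is the alpha memory quality-of-service work mentioned above, and its availability in your version is an assumption to verify.

```yaml
# Minimal KubeletConfiguration sketch for a cgroup v2 host.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd    # match the runtime, e.g. SystemdCgroup=true in containerd
featureGates:
  MemoryQoS: true        # alpha memory QoS on cgroup v2; verify gate availability first
```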
One example of such a layered dependency: if you are using a security or a monitoring agent, it's likely it needs some updates to support the cgroup v2 environment. There has been a lot of work here, and we believe many agents have already updated, so it should be smooth. There are also domain-specific challenges to be aware of: in certain industries there might be a unique cgroup v1 feature that is not yet present in v2 that workloads depend on. Where those have come up, we try to redirect engagement into the appropriate upstream kernel community to get that capability added, if possible. If you're using container images that carry particular language runtimes, sometimes you might need to update those runtimes to have awareness of a cgroup v2 host. For example, with Go you might need to manually set GOMAXPROCS, or if you're using older JDKs, you might need to update so your JDK is aware of a v2 environment. So for those who are starting to explore this and want to share production feedback or areas for improvement, please reach out to the SIG Node community; we're excited to continue to see this be successful. Next slide. Dawn, you want to talk about in-place pod resizing?

Thanks, Derek. I'm going to talk about the ongoing feature, in-place pod resizing. Before we get into the details, I want to quickly summarize the autoscaling features offered by Kubernetes today. The horizontal pod autoscaler automatically adds or removes pods based on the usage of your application. When your application's usage goes up, the autoscaler adds pods for you; when it goes down, the autoscaler automatically removes pods. This happens based on CPU and memory by default, but it's also possible to use custom metrics (a minimal example follows below). This is great for handling more requests when you have enough space to run more pods. But if there's no space to run additional pods, even if the horizontal pod autoscaler wants to add more, you're out of luck, unless you are using the cluster autoscaler. What the cluster autoscaler does is add more resources to your cluster, in this case more machines, more nodes, that pods can run on.

Those two features together address scale-out issues. But in many cases the application would run out of memory on a heavily loaded node, and there wasn't a really good way to estimate how much memory was needed at any given time. The vertical pod autoscaler helps modify the resources provisioned to an individual service, like when a pod doesn't have enough CPU or has too much memory. The vertical pod autoscaler exists to automatically set up-to-date resource limits and requests for the containers in your pods. It can both downscale pods that are over-requesting resources and upscale pods that are under-requesting resources, based on their usage over time.

So far everything works great, except that vertical pod autoscaler updates to pod CPU or memory resources require pod restarts. This is very disruptive to services and expensive for long-running applications. It is very important to have the ability to scale pod resources without restarts, and in-place pod resizing is designed to address this problem. This is a long-desired feature, and the SIG initiated the discussion back in 2015. The project wasn't actually designed and prototyped until 2018, by Vinay.
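As a quick aside to the autoscaling recap above, here is a minimal HorizontalPodAutoscaler sketch that scales a Deployment on average CPU utilization; the target name and the numbers are placeholders.

```yaml
# Scale the "web" Deployment between 2 and 10 replicas, targeting
# 70% average CPU utilization across its pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```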
Back to in-place resizing: over the last four years, Vinay, together with the community, has iterated on and evolved the design and implementation several times, keeping pace with several new resource management features: huge pages, cgroup v2, and so on. Finally, we are ready to release the alpha version in 1.26. I'm calling out this ongoing feature today to get the community's and users' attention. Please help us review and test this long-desired feature and share your feedback with us (a hypothetical manifest sketch follows below).

I want to talk about evented PLEG as one of the highlights of this release. PLEG, the pod lifecycle event generator, was an improvement over the way the kubelet used to learn about container events, where every pod controller queried its containers' statuses itself. That was very racy, and we had to think about parallelism and how many requests the container runtime needed to handle. The pod lifecycle event generator is a single thread doing a relisting of all the containers on a node: it asks for the status of all the containers and then makes its own updates about what each container is doing based on that status. For instance, if it knows a container was running and now it's not running, the kubelet may decide whether that container needs to be restarted or something else needs to happen with it.

This works very well. The relisting PLEG works great in existing environments. Its implementation is simple and very straightforward. It guarantees consistency, because every time we just take all the containers' information; we don't rely on the container runtime being able to generate events for us. And performance is acceptable: open source Kubernetes only supports up to 100-something pods per node, and some environments support maybe two times more, but in general, performance-wise, it's okay. However, we started receiving more requirements. We have environments with many pods, where every pod has many containers. We have high availability requirements for workloads, and some workloads want much faster failure detection than we can provide with relisting. We also have questions about low-overhead environments like edge devices, where even the relisting takes a lot of overhead and needs to be eliminated or done far less often.

So PLEG v2 will be streaming-based. We will rely on the container runtime to give us information when something happens, instead of us constantly asking whether anything changed. This should be a very good improvement that will help a lot, and we hope you will install a new version of Kubernetes and get free CPU cycles out of it, just because we made some internal improvements.

So again, we've been talking about the 1.26 roadmap: we've asked for feedback on one GA feature and one alpha feature, and gave you a highlight of the internal improvements we're making with things like evented PLEG. Now we want to talk about the new projects and new subprojects we're running, and we start with Derek.

Yeah. So one of the new subprojects that SIG Node has formed since our last update is around improvements to support kernel module management on Kubernetes platforms. For those who are interested in exploring this space, you'll see there's a new repository in the kubernetes-sigs GitHub organization where approaches to kernel module management on Kubernetes nodes are being explored.
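Returning to in-place pod resize for a moment: the design attaches per-resource resize policies to containers. Here is a hypothetical sketch with field names taken from the enhancement proposal; the feature is alpha, so treat the exact fields as assumptions that may change.

```yaml
# Hypothetical pod using in-place resize policies (alpha; fields may change).
apiVersion: v1
kind: Pod
metadata:
  name: resizable-pod
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.8    # placeholder image
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired        # CPU can be resized without any restart
    - resourceName: memory
      restartPolicy: RestartContainer   # memory changes restart this container only
    resources:
      requests:
        cpu: 500m
        memory: 256Mi
      limits:
        cpu: "1"
        memory: 512Mi
```

Updating the live pod's resource requests and limits would then resize the running container according to these policies, instead of recreating the whole pod.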
If kernel module management is an area of interest for you, please reach out to the contributors in that repository and help make the solution broadly available to the community as a whole. We're excited to see new projects come along. Dawn, you want to go next?

Dynamic resource allocation is another long-desired feature. As Kubernetes has been adopted by the industry as the standard container orchestration platform, there is an increasing need to better support non-native compute resources on Kubernetes. This covers a wide range of resources, such as GPUs, high-performance NICs, InfiniBand, and so on. Such resources often require vendor-specific setup, and they have a rich set of different properties, even across devices of the same type. This brings new requirements to Kubernetes. The SIG proposed an initial design back in 2018, but didn't prioritize the implementation due to the complexity, the overhead, and also the dependencies. A new design was proposed with a narrowed-down scope; it has now been approved in 1.25, and hopefully we can have the alpha implementation in 1.26. We call out this feature today because we want to get the community's attention, and we also want to call out the complexity this feature adds to Kubernetes management. Next?

Let's talk about the Batch Working Group participation of our SIG. The Batch Working Group was formed very recently, in response to Kubernetes having been a not-ideal environment for running batch workloads. We have many problems with job scheduling and with ways to control job execution. Around Kubernetes there are many third-party vendors providing ways to run jobs on Kubernetes, but those solutions are quite hacky, and they require changes in the core of Kubernetes, and that is what we're working on. Specifically from the SIG Node perspective, we concentrate on KEPs like retriable and non-retriable job exit codes, so we can be more granular about which jobs need to be retried and which jobs have failed forever, so we don't waste resources on them (see the sketch after this section). There are also lifecycle controls, like the keystone containers KEP we're discussing, that will help with scheduling jobs with sidecar containers and such. Plus, jobs are very resource intensive: typically they aren't waiting on some user, they're just doing a lot of calculation. That's why we need to be as resource-aware as possible, and all the numerous scheduling improvements and device plugin improvements help the Batch Working Group and job execution.

And all of these subprojects are only possible when we keep SIG Node stable and all the code reliable, so we keep running the CI working group. It was formed two years back, we've kept it running since, and we've made a lot of good progress. We've had very successful releases recently, and we keep fixing old failing tests in testgrid. We recently fixed the NPD (Node Problem Detector) testgrid that had been failing for a long time without anybody paying much attention; we finally got our hands on it and it was fixed, along with some other improvements. We're running more environments: CRI-O was added to the regular test set and is now green most of the time. So we're doing a lot of work. We saw some regression over the summertime because fewer people were participating, but we hope that this work will now resume, and we will do more work on test classification, so we know which tests need to be looked at urgently and which can wait a little, plus do more permutations of which tests to run and when.
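To make the retriable versus non-retriable exit code idea concrete, here is a sketch using the pod failure policy that went alpha on Jobs in 1.25. The exit code 42 and the names are placeholder assumptions.

```yaml
# Job sketch: exit code 42 is treated as a permanent failure (no retries),
# while node-disruption failures don't consume the retry budget.
apiVersion: batch/v1
kind: Job
metadata:
  name: calc-job
spec:
  backoffLimit: 6
  podFailurePolicy:              # alpha in 1.25 (JobPodFailurePolicy feature gate)
    rules:
    - action: FailJob            # don't waste resources retrying a permanent failure
      onExitCodes:
        containerName: main
        operator: In
        values: [42]
    - action: Ignore             # disruptions (e.g. node drain) don't count as retries
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: main
        image: registry.k8s.io/pause:3.8   # placeholder image
```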
We are tracking our work in a GitHub project board during our meetings. This board is publicly available, so if you want to contribute and you don't know where to start, you can always pick up work from this board. We also proactively keep an eye on SIG Node's health by triaging the SIG's bugs. Again, if you're interested in which issues users are experiencing, you can join this meeting and hear the discussions we have while triaging all the SIG Node bugs. And as I said, there are many issues, and we need to get you involved. Dawn?

Thanks, Sergey. There are many things going on in SIG Node. SIG Node is one of the largest SIGs in the community; by component count it is the third largest, as you can see here and as was introduced earlier. We have so much going on: SIG Node has 200-plus open PRs per week on average, and we also merge, and close, 20 to 30 PRs every week. We have developers and contributors from all over the world, from more than 39 companies, dedicated to working on improving SIG Node. And we really need more people and more contributions from you, from the community.

So how do you contribute? Here are SIG Node's priorities. First, we are really keen on stability and sustainability. There's the effort on the CI project, which Sergey mentioned earlier, and we triage issues in a meeting held every Wednesday. Please join that effort if you want to help SIG Node and get started; that is the first way you can get involved. Second, we've also started to focus on optimization, reducing management overhead and making the node more efficient. Third, there are more features: the new subproject initiatives, the ongoing features, and graduating features from alpha to beta and then to GA. So there are many things going on. Also, to help users and developers, we have a lot of documentation, user guides, and developer guides that we need to put effort into. So welcome; please get involved and contribute to all of those items and categories.

How to contribute, and also how to get help if you have questions as a user: you can attend our SIG meetings. We have two SIG meetings held regularly so far. One is the regular SIG Node meeting every Tuesday, which covers features, KEPs, designs, and even issues. We also have the CI and triage meeting every Wednesday, which is smaller, and which goes through all the user issues, bugs, and test failures. You can also join our Slack channel to ask questions, write a proposal, or file bugs and issues with us. Please reach out to us; we also have the SIG Node mailing list, where you can send suggestions and feedback directly to us. Thank you.