My name is Marshal Jiangba, I'm a software engineer at Google, and I'm also one of the two chairs of SIG Scalability. Today we'll be talking about SIG Scalability: an introduction and also a deep dive.

The first question we need to answer is what we do as SIG Scalability. SIG Scalability is a bit different from other SIGs because we do not own production code; we only own the test code that we use for testing the scalability of Kubernetes. We have five areas we are interested in when it comes to scalability. The most important one is actually defining what scalability is. Once we know what scalability means, which I will explain later, we also want to set some goals. You can think of it this way: we could start making improvements to Kubernetes, but it doesn't matter if the end user is not able to see those improvements and actually get better scalability properties. Besides defining and driving those goals, we also contribute to Kubernetes to make it more scalable. What we usually do is coordinate between SIGs, or ask other SIGs to make specific improvements, in order to make Kubernetes more scalable.

Okay, let's say we have those goals for Kubernetes that we want to achieve. First of all, we need to measure and monitor how the performance and scalability of Kubernetes actually change over time. You probably know that every day there are multiple PRs merged into Kubernetes, and each of these PRs, features, and changes can potentially impact the performance of the whole system. That's why one of the most important areas of SIG Scalability is monitoring the performance of Kubernetes to make sure we do not regress, which is the next point here. Once we have this monitoring in place, we can track different metrics and see if there is any regression. And last but not least, we also consult and coach other SIGs. Most of you probably know about KEPs; a KEP is a kind of design doc, and when you want to implement a new feature there are some sections specific to SIG Scalability, with questions like how your feature impacts the control plane, or what additional calls your feature makes compared to the previous state. One more important thing: do not confuse SIG Scalability with SIG Autoscaling; these are two different SIGs.

So what is Kubernetes scalability? Sometimes we ask our users what they want, and they say they want scalable clusters. But if we ask them what that actually means, they usually don't really know, because scalability is not just a single number. I can give you one example: in 2015, Kubernetes 1.0 supported 100 nodes. That number changed over a few releases to 1,000 nodes, then 2,000, then 5,000 nodes; 5,000 nodes was reached in 2017 with Kubernetes 1.6, and that number has not changed since. What I want to say is that the scalability of Kubernetes is not just the number of nodes; it's much more than that. So now I want to introduce the concept of the scalability envelope. When you think about Kubernetes scalability, you should think of it as a multi-dimensional problem.
There are many, many more dimensions you want to take into account when thinking about the scalability of Kubernetes. To give you a few examples: the number of nodes is just one of the dimensions we are interested in, but then there is the number of secrets, how many pods you can have per node, or pod churn, for example. Taking all those dimensions into account gives us the scalability envelope. The scalability envelope is a safe zone: if you are within it, your cluster is happy. But we still haven't said what it means for the cluster to be happy, so let me try to explain that.

What does the happiness of the cluster mean? To reason about a cluster being happy, we need to introduce two concepts: the SLI, which is a service level indicator, and the SLO, which is a service level objective. In simple words, I think of an SLI as a metric. Let's say I have a metric that says pod startup latency is X seconds. On top of this metric I can put some kind of threshold; for example, I want the 99th percentile of pod startup latency to be below a specific threshold, which in Kubernetes is five seconds. The cluster is happy when all the scalability SLOs are satisfied.

I want to give you a few examples of scalability SLOs. This list is actually changing over time; we started a few years ago with two very simple SLOs, API call latency and pod startup latency. You can find all those SLOs and their descriptions on our SIG Scalability page. But I want to give you an idea of how precisely those SLOs are defined, because just saying that we care about API call latency is not enough; there is much more to the definition. Let's start with the SLI for API call latency. The SLI for API call latency, for mutating requests only, is defined like this: you take a five-minute window, you choose one resource and one verb, let's say deployments and PATCH, and you measure the 99th percentile within that window. So for each resource and each verb you have a separate metric. The SLO then aggregates this per resource and verb over the whole day. A whole day contains 288 five-minute windows, and the threshold is one second: we want all mutating API calls to complete within one second, and out of those 288 windows, at most two may fail to meet the one-second threshold. That is how the definition of the API call latency SLO looks, and it's just one of the SLOs I showed before.

Once we have that and we have the scalability envelope, what we can do is test some limits, because computing the whole scalability envelope is impossible. As you can see, there are multiple dimensions, and you could imagine a cluster with zero pods but one million namespaces; is that useful? Not really, not for the user. So what we want to do is model how our users actually use Kubernetes, and say that we are interested in certain limits, for example up to 5,000 nodes, 30 pods per node, and up to 10,000 services.
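As an aside, the window arithmetic in that mutating-call SLO can be made concrete with a minimal Go sketch. This is not the actual SIG Scalability tooling; it simply assumes the per-window 99th-percentile latencies have already been computed and checks how many windows miss the threshold:

```go
package main

import (
	"fmt"
	"time"
)

// windowLatency is the measured 99th-percentile latency of one resource/verb
// pair (e.g. deployments/PATCH) over a single five-minute window.
type windowLatency struct {
	p99 time.Duration
}

// sloMet reports whether a day of five-minute windows satisfies the SLO:
// at most allowedBad windows may exceed threshold. For the mutating-call SLO
// described above, threshold is 1s and allowedBad is 2 out of 288 windows.
func sloMet(windows []windowLatency, threshold time.Duration, allowedBad int) bool {
	bad := 0
	for _, w := range windows {
		if w.p99 > threshold {
			bad++
		}
	}
	return bad <= allowedBad
}

func main() {
	const windowsPerDay = 24 * 60 / 5 // 288 windows per day
	windows := make([]windowLatency, windowsPerDay)
	for i := range windows {
		windows[i] = windowLatency{p99: 200 * time.Millisecond}
	}
	// Two slow windows are still within the SLO; a third would violate it.
	windows[10].p99 = 1500 * time.Millisecond
	windows[42].p99 = 1200 * time.Millisecond

	fmt.Println("SLO met:", sloMet(windows, time.Second, 2)) // true
}
```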
And as SIG Scalability, what we do is run tests that push all those limits at once and check that all the SLOs are met. You can find all the thresholds I mentioned; here we have three examples, but there are many other thresholds and limits, for example for pod churn, number of secrets, and so on.

Now we will move on to our infrastructure: the tools we use to make sure the scalability envelope and all those SLOs are satisfied in Kubernetes. The most important tool is ClusterLoader2. ClusterLoader2 is our internal SIG Scalability tool for testing the scalability of Kubernetes. The way to think about it is that you describe the state you want your cluster to be in. Let's say you start with an empty cluster and you want to make sure the cluster works when it has 1,000 deployments and 100,000 pods, for example. You specify this state, and you also specify how you want to transition from the empty state to the desired state; for example, you say you want to create one deployment per second, and ClusterLoader2 does exactly that. ClusterLoader2 creates those deployments as requested, and along the way it measures all those SLIs and SLOs, to make sure that, for example, pod startup latency did not exceed five seconds during the whole test. There is also a bunch of other features you can use, for example for debugging, but I will mention a few of them a bit later.

You can imagine that testing 5,000 nodes on real clusters is pretty expensive; even at one CPU per node, that's still 5,000 CPUs. Running those tests is expensive, so as SIG Scalability we cannot run them super often, and the 5,000-node tests run only once a day. But we also have Kubemark, which is what we call cluster simulation. The idea is that we have a Kubemark master, and it is this master we want to test, without using 5,000 real nodes. So we create, let's say, 80 or 100 real nodes, and on those nodes we run hollow nodes. Those hollow nodes simulate the traffic that a normal control plane would see, and with that, instead of 5,000 nodes we can usually make do with around 80 nodes with four or eight CPUs each, which brings the cost down significantly. Of course there is a question here: let's say we want a setup with three actual nodes and 15 hollow nodes; how do you actually run those hollow nodes? They need to be scheduled onto the real nodes somehow, and this is pretty funny, because we use Kubernetes to do that too. There are actually two clusters: one cluster is responsible for running the hollow nodes on the actual nodes, and those hollow nodes are connected to the separate master that we want to scale test. Thanks to that, we can run scalability tests much more often, because they do not consume as many resources as regular clusters.
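To give a flavor of the kind of test described above, here is a rough, hypothetical Go sketch using client-go: it paces Deployment creation at one per second and records how long each one takes to become available. The real ClusterLoader2 is driven by declarative test configs and measures far more than this, so the names, image, and numbers here are purely illustrative:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/utils/ptr"
)

// newDeployment builds a tiny single-replica Deployment; name and image are
// placeholders, not what the real scalability tests use.
func newDeployment(name string) *appsv1.Deployment {
	labels := map[string]string{"app": name}
	return &appsv1.Deployment{
		ObjectMeta: metav1.ObjectMeta{Name: name},
		Spec: appsv1.DeploymentSpec{
			Replicas: ptr.To[int32](1),
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{Name: "pause", Image: "registry.k8s.io/pause:3.9"}},
				},
			},
		},
	}
}

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()
	const namespace = "default"

	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		name := fmt.Sprintf("cl2-sketch-%d", i)
		start := time.Now()
		if _, err := client.AppsV1().Deployments(namespace).Create(ctx, newDeployment(name), metav1.CreateOptions{}); err != nil {
			panic(err)
		}
		wg.Add(1)
		go func(name string, start time.Time) {
			defer wg.Done()
			// Naive stand-in for the startup latency measurement: wait until the
			// Deployment reports an available replica and record how long it took.
			err := wait.PollUntilContextTimeout(ctx, 500*time.Millisecond, 5*time.Minute, true,
				func(ctx context.Context) (bool, error) {
					d, err := client.AppsV1().Deployments(namespace).Get(ctx, name, metav1.GetOptions{})
					if err != nil {
						return false, err
					}
					return d.Status.AvailableReplicas >= 1, nil
				})
			if err == nil {
				fmt.Printf("%s available after %v\n", name, time.Since(start))
			}
		}(name, start)
		time.Sleep(time.Second) // pace the load: one Deployment per second
	}
	wg.Wait()
}
```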
Besides that, we have a bunch of monitoring and observability tools, and Perfdash is one of our favorites. It looks super simple, and it is super simple, but it has allowed us to find multiple regressions, because, as I mentioned before, we have SLOs and we have tests, but some regressions are not caught by the SLOs alone. Take pod startup latency; this is actually a pod startup latency example, and there was a regression some time ago. We saw that pod startup latency increased by about 400 milliseconds, and once we found the cause, we fixed it; here you can see that we brought pod startup latency back down by those 400 milliseconds. Our tests were not able to catch it, because the SLO only says the 99th percentile needs to stay below five seconds, which was still true, but we still spotted the regression thanks to Perfdash and all those charts.

Besides that, if you want to look at one particular run of our scalability tests, you can use our Grafana dashboards. They are super useful, and you can use them for debugging scalability issues. Also, when you run ClusterLoader2, out of the box you get much more observability, including profiling. While a test runs we also gather memory and CPU profiles, so later you can see that, for example, during the load test there were phases where the control plane was burning CPU quite heavily, and you can just grab the profile and see which parts of the Kubernetes control plane were actually consuming that CPU.

Now we move on to the scalability tests themselves. I showed you the tools we use for monitoring and testing the scalability of Kubernetes, but what kind of tests do we actually run to ensure Kubernetes is scalable? There are two types of periodic tests. The first are release-blocking tests: performance tests at 100 nodes and 5,000 nodes, which run on real clusters, and also a correctness test at 5,000 nodes. The performance tests are purely about performance, like how fast pod startup is or what latencies we see on the control plane; the correctness test is about whether the features Kubernetes provides still work when you have 5,000 nodes in your cluster. Besides that, we have non-release-blocking tests, and they are super informative. One of my favorite examples is the Golang benchmarking. Over the last three or four years we noticed multiple regressions coming from Go itself. So what we did was freeze all the dependencies except for the Go compiler: we keep a frozen Kubernetes version, run our ClusterLoader performance test, and only change the Go compiler version. Based on that, we notice whether there are regressions in the Go compiler, and we inform the Go team that, for example, we see a regression between these two commits, and they can easily pin down those regressions as well, because Kubernetes is, I think, one of the biggest projects written in Go. So it's super useful for them too, as a way to validate that they are not introducing regressions. Besides that, we also have the Kubemark jobs, a bunch of storage tests, and so on.
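Going back to the profiling mentioned above: if you have direct access to the control plane, grabbing a CPU profile from the kube-apiserver by hand looks roughly like the sketch below. It assumes the API server's profiling endpoints are enabled (the --profiling flag, on by default) and that your credentials are authorized to read /debug/pprof; in the CI jobs the profiles are collected automatically by the test framework, so this is only a hand-rolled illustration of the same idea:

```go
package main

import (
	"context"
	"os"

	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	// Ask the kube-apiserver's pprof endpoint for a 30-second CPU profile.
	data, err := client.Discovery().RESTClient().
		Get().
		AbsPath("/debug/pprof/profile").
		Param("seconds", "30").
		DoRaw(context.Background())
	if err != nil {
		panic(err)
	}

	// Inspect the result with: go tool pprof apiserver-cpu.prof
	if err := os.WriteFile("apiserver-cpu.prof", data, 0o644); err != nil {
		panic(err)
	}
}
```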
There is one more thing I wanted to mention. If you are contributing to Kubernetes, then whenever you open a PR there is a bunch of presubmits, and one of them is the 100-node performance test. So whenever you open a PR, there is a job that creates a 100-node cluster and runs the load test to check that there are no obvious performance regressions. And that's basically it in terms of scalability tests.

How do we protect the scalability of Kubernetes? I mentioned those CI tests, and you can see our Testgrid. Basically we have those tests running for the master branch, and also for all the releases that are still being patched, for example 1.24 or 1.25. We keep running those tests, and this Testgrid is available to everyone; if you want, you can just go to our Testgrid and see what state the current Kubernetes scalability is in.

I mentioned a few of the typical regressions we have already seen, but scalability is super sensitive: one small change can break the whole scalability of Kubernetes, and we've seen regressions come from pretty much everywhere. Golang, as I mentioned, is my favorite, but also the operating system, controllers, API machinery, the scheduler, etcd, the kubelet, you name it. What usually happens is that once we observe a regression, we try to narrow down where it happened, and we usually contact the author, or even revert the whole feature if we think the scalability is no longer as good as it was before.

There are a few pretty interesting regressions that we solved quite recently. To name one example: API Priority and Fairness was introduced recently, and you've probably heard about it. Some of the calls in Priority and Fairness were incorrectly estimated, and because they were incorrectly estimated they were consuming far more of what we call seats. What was happening was that in some of the setups our customers use, API call latency was significantly higher than it is supposed to be. That's something we detected and fixed over the last few versions.

There is also the interesting part where we actually drive improvements. Going back to Go: it's not as if Go only introduced regressions into Kubernetes. The migration to Go 1.18 helped Kubernetes performance quite significantly, although you may also have heard that the memory footprint increased significantly. This is one of the examples where the improvement was quite big: the 99th percentile of API call latency was basically ten times smaller than before, so it was a huge improvement. There are also other improvements we made, like page-size progression for list calls with selectors. You can list resources from the API server and provide a limit; let's say you provide a limit of one, but you also have a field selector. What used to happen was that whenever you made such a call, the elements were fetched from etcd one by one. So if you have a very rarely matching selector, and let's say you have 1,000 items and only the last one matches, there were actually 1,000 calls made to etcd. We fixed it with progression of the limit, so the limit you use for the API call to the API server is no longer bound to the limit we use for fetching the results from etcd.
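To make that last case concrete, here is a small, hypothetical client-go sketch of the kind of request involved: a paginated list with a tiny limit and a field selector. The namespace and selector are placeholders; the point is only that, before the fix, a request like this could force the API server to page through etcd with the same tiny limit, turning one client call into many etcd round trips:

```go
package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)
	ctx := context.Background()

	// List pods one page at a time with a rarely matching field selector.
	opts := metav1.ListOptions{
		Limit:         1,
		FieldSelector: "spec.nodeName=some-rare-node", // placeholder selector
	}
	for {
		pods, err := client.CoreV1().Pods("default").List(ctx, opts)
		if err != nil {
			panic(err)
		}
		for _, p := range pods.Items {
			fmt.Println(p.Name)
		}
		// Follow the continue token until the server has nothing more to return.
		if pods.Continue == "" {
			break
		}
		opts.Continue = pods.Continue
	}
}
```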
So, to sum up: if you want to get involved, we have a homepage where you can find all the contact information if you are interested in scalability. We have bi-weekly public meetings, and we also have mailing lists. And if you want to get involved in, for example, our testing infrastructure, we have issues marked as help wanted, so you can just pick one up and get involved in the scalability of Kubernetes. And now we have, I think, a few minutes for Q&A. Do we have a microphone, maybe?

Hi, my question is basically: is the scale you recommend to Kubernetes operators the same scale you are testing? So if they go beyond that, it's at their own risk, so to say?

That's a great question. What happens is that there are limits, as I mentioned, that we are testing, for example 5,000 nodes or 10,000 services. If you go beyond them, the performance usually degrades more or less gracefully: latencies start increasing and everything starts to slow down, but usually nothing just breaks outright. But it's true that it's usually the cluster operator who is responsible for that, who needs to monitor the cluster and make sure it stays within those limits. There is one more thing I want to mention: to make sure that all those SLOs are met, there is also a set of requirements your cluster needs to follow. For example, one of the requirements is that the control plane has to have 64 cores, or that you have to have two etcd instances, and so on. So there is a whole list of requirements you also need to implement in your cluster to make it scale to those limits, but it's all documented there. Thank you.

A follow-up to that question: is it documented what happens when those latencies increase beyond 5,000 nodes, how it degrades, and whether it can recover when you reduce the load?

There is no document like that, and the reason is that it really depends. We are sure that if you have a vanilla Kubernetes setup and you don't have super fancy controllers, then it will work, right? But once you start using certain components, I have seen in the past that there are components that do not degrade gracefully; they just break the whole cluster at some point. If you reach the limit of, let's say, secrets, then at 20,000 secrets they just stop working, for example. And cluster operators sometimes install components like that. So there is no simple answer here, because there are so many components you can install in Kubernetes that it always depends on what you have.

The second question I had was: what is the significance of that 100-node test you talked about?

This is our first test that every change is checked against, because in a perfect world, whenever someone opens a PR to Kubernetes, we would run a 5,000-node test that tests everything, but that's super expensive. So instead we run this 100-node test, which is much cheaper, and that we can afford to do.

Got it, but why 100? Isn't 100 too small a number? Is it just because of the cost, or...?
It's actually both: the cost, but also the fact that it detects the very basic, significant regressions. It saves us quite a lot of debugging time, because once a change is merged to the master branch of Kubernetes and we then observe a regression at a larger scale, it takes an actual person to go through the changes that were made and figure out what regressed. So this is a kind of first-line test to reduce that. For what it's worth, we did have one of our clusters go above 5,000 nodes, and the control plane just broke and never recovered.

That's actually a great segue to my question. You had a slide on profiling the kube-apiserver. Do you need access to the control plane for that? Because a lot of cloud providers don't give you that access.

That's true, yeah. In open source we test fully on our own VMs; we are not testing EKS or GKE, we are only testing Kubernetes, the open source version. So there we have the access.

Just two questions. The first one: we are developing a platform as part of the Kubernetes ecosystem. Is there any guideline for testing those operators and custom resources?

Yes, that's a great question. Currently there is no guidance for it, but this topic is coming up more and more often, so we will probably need to invest some time into guidance on how to write controllers and operators in a way that does not put too much load on the control plane. From experience, I would say the biggest issues are usually with agents that run on all the nodes; there is a chance that, if you have some specific bug, you put too much load on the control plane and can basically break it.

Okay, yeah. Was that your question, or...?

Sort of. We're trying to evaluate the performance and scalability of a controller in the Kubernetes ecosystem, maybe not the Kubernetes cluster per se, but the controller talks to the API server, the scheduler, and etcd, so all the load lands there. In that context we're still trying to figure out the best way to evaluate these controllers platform-wise. But I think that's fine; we can skip to the second question: in Kubemark there are no pods created, right? It's a hollow node, as you mentioned. So if we want to evaluate pod creation time, is there any other tooling that can evaluate the actual pods?

Not really. In Kubemark we are not running those pods, it's just simulated, and I don't think anything else exists, because once you start running real pods and containers, you are basically back to having a real cluster, I think.

We have time for one last question, I think. Thank you.

Thank you. I think on one of your slides you listed three regression issues, and one of them, if I remember right, was about pod startup latency. Was that issue ever fixed?

Oh yes, it was fixed; I can actually give a bit more context there. There was a feature developed in Kubernetes, and by accident, for each request made to the API server, two goroutines were created.
By itself that shouldn't have a huge impact, but in our scalability tests we have hundreds of thousands of goroutines in the API server, and because we doubled that number, the Go scheduler started slowing down, and because of that we saw the regression in pod startup latency. We then simply reverted that PR, because the regression was quite significant and we decided it had to be fixed, and the author had to re-implement the feature in a way that did not increase the number of goroutines in the API server.

Okay, I don't know, can we take one more question?

Just a quick one. Basically, all of the examples and information you gave were mostly about reacting to changes in Kubernetes: you're doing a lot of testing, catching scalability issues, and fixing them when they get merged. But what about trying to improve the actual scalability of the project beyond the current baseline? For example, you mentioned that five-second pod startup threshold; is the SIG working on lowering those thresholds, actually profiling what's taking up the seconds you now consider the normal baseline? Or is that out of scope because you don't own the production code?

So, the way we usually operate is that we really want to set our goals based on our customers' expectations. It's not just improving for the sake of improving. There are metrics our users care about, and there are metrics users do not care about but that we still measure in order to catch regressions. There have been improvements that we drove, but they still came from our users. For example, we were seeing that people use quite a lot of secrets, and when you use a secret and mount it into a pod, the kubelet actually opens a watch for it. So we introduced immutable secrets to resolve this, to reduce the number of watches and improve scalability. But I think we need to finish now. I will still be around, so if you want to talk, feel free. Thank you.
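As a footnote to that last answer, marking a Secret immutable is just a field on the object. A minimal, hypothetical client-go sketch might look like this; the secret name, namespace, and data are placeholders:

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
	"k8s.io/utils/ptr"
)

func main() {
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(config)

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "app-config"},
		StringData: map[string]string{"token": "example"},
		// Marking the secret immutable lets the kubelet stop watching it once
		// it has been read, which cuts the watch load on the control plane.
		Immutable: ptr.To(true),
	}
	if _, err := client.CoreV1().Secrets("default").Create(
		context.Background(), secret, metav1.CreateOptions{}); err != nil {
		panic(err)
	}
}
```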