Okay, hello everyone, thanks for joining. I know it's Friday, so I appreciate you being here. My name is Wojtek Tyczyński, I work at Google. I'm the TL of SIG Scalability, and together with Marcel Zięba, who is one of the chairs of SIG Scalability, we are going to present the SIG Scalability update today.

Okay, so let's start. What is SIG Scalability actually doing? There are basically five main aspects to what we are doing, which roughly correspond to subprojects — though SIG Scalability doesn't really own any non-test-related code, so the subprojects are somewhat fluid here.

The first thing we are doing is defining what scalability really means for Kubernetes. That means defining it well, but also deciding where we want to be. The core principle for any scalability-related effort is to focus on actual real-life scenarios and real-life needs. Optimization for the sake of optimization doesn't really make sense, because — with some exceptions — it usually just complicates the system. We should only be focusing on things that are really needed by someone.

The next thing, strictly related to that, is that once we have those goals, we need to actually execute toward them: driving and ensuring that the improvements needed to reach those goals are being worked on. In many cases this doesn't mean we do the improvements ourselves. If they fit into an individual SIG — say SIG Node or SIG API Machinery — we try to work with that SIG and ensure they will do it, though we are often also contributing ourselves. And for cross-SIG improvements, of which there have been a bunch over the past years, we usually coordinate the work across those SIGs.

The next thing is checking where we actually are. We know where we would like to be, based on the first item, but we often don't know exactly where we are now. So monitoring and measuring where exactly we are is pretty critical to understanding how far we are from our goals, and whether there is really any work we have to do in a given area.

The next item, strictly related to that, is ensuring that we are not regressing. Once we reach a certain level of scalability, we need to ensure that we actually stay at that level.
It's pretty easy to regress on scalability, because every new feature, or every new change in the system, can easily regress the system if it's not well thought through.

And finally, we need to ensure that scalability is not the job of a very small group of people sitting in a corner — it's actually a job for everyone, similarly to reliability or security. When you are designing and implementing your feature, you need to be thinking about scalability on your own; we can't ensure scalability if the overall community isn't thinking about it. We've actually made a very long and, I would say, very successful journey since the early beginnings of Kubernetes, when there were pretty much two or three of us thinking about scalability. Now scalability is an inherent part of pretty much every single KEP — and I hope you all know what a KEP is; it's our design doc, more or less. People are thinking about scalability, but not everyone knows exactly how to do it; they just know they should be thinking about it. So this item is about ensuring that when they reach out to us, we have best practices and can recommend what they should be doing in their situation, and so on.

Okay, one more thing probably worth mentioning: SIG Scalability is often confused or mixed up with SIG Autoscaling, and we shouldn't confuse them. Autoscaling is about scaling either the size of your cluster — adding or removing nodes — or scaling your pods, either horizontally or vertically. Scalability is about the amount of stuff you can do: for example, how big your cluster can be, or how big the services you can handle in a single cluster are, and so on.

Okay, so what does scalability really mean in terms of Kubernetes? As I mentioned, we should always be focused on actual user needs. When you ask users whether they want their cluster to scale, the answer is obvious — they all say yes, of course. But if you ask them what that really means, the answer is often "I don't know".
Even more often it's "I don't care", because they don't want to understand all the details of the system — they want us to handle that. So we need to define it on our own.

The first approximation that we made, in early 2015, was to approximate scalability with the size of the cluster — the number of nodes, basically. When I first started looking into this, in February 2015 or so, a 25-node cluster was basically blowing up. Over time we kept improving that, starting from support for 100 nodes in 1.0 and reaching 5,000 nodes, in steps, in the 1.6 release. And we didn't really go higher than that since then.

So now you are probably wondering whether we did anything at all in the last five years — and yes, we did quite a lot, because scalability is actually much more than the size of the cluster or the number of nodes. Scalability in Kubernetes is a multi-dimensional problem, where things like the number of services, the number of pods, the number of endpoints in a service, the number of persistent volumes and so on all matter. So we introduced the concept of a scalability envelope, which basically says that if your cluster fits within each of those dimensions, it fits into the scalability envelope, and that means your cluster scales — or your cluster will be happy, basically.

So what does it really mean that your cluster scales, or is happy? We define that based on two main concepts: SLIs and SLOs. An SLI is a service level indicator, an SLO is a service level objective. You can conceptually think about them as the SLI being a metric and the SLO being a metric with a threshold. It's actually much more subtle, but conceptually it's basically that. So we are saying that the cluster is happy if all those scalability SLOs are satisfied — more or less, all the metrics are within the thresholds that we defined.

We don't have enough time to talk in much detail about the SLIs and SLOs. If you are interested, I gave a talk purely about that topic at KubeCon Barcelona, so you can probably find it on YouTube. Just very briefly: we have six main SLIs and SLOs that we are currently measuring. Some of them have sub-variants, but conceptually there are six of them. The first two have been with us since the very beginning; the latter four were added over time. As you can see, we don't have super large coverage — there are different pieces of the system that aren't really covered. I mean, some of them are covered implicitly: for example, scheduling latency is actually part of pod startup latency. So more is covered implicitly than it looks like at first glance, but we still have a bunch of holes.
So it's one of the things we've wanted to invest in for quite some time, and we never have time for it. If you are interested, it's definitely one of the areas where we would benefit from your help.

So let's take one example and look into it in a bit more detail. I'm not going to read it, and I don't even expect you to read it. The main point is that even though conceptually the SLIs should be — or we are trying to make them — very simple and high-level, we have to be precise about them. For mutating API calls, what we want to achieve is basically that 99% of mutating API calls finish within one second; that is, more or less, what is written on the slide. But we need to be super precise, to ensure that the way we understand it and the way we measure it are exactly how you as a user also understand it. We had a bunch of cases — and I've heard about other cases outside Kubernetes too — where the people who defined an SLO understood it differently than the actual users did, and that was causing a lot of friction. So we need to be super precise here.
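For reference, the published wording of this pair in the sig-scalability SLO docs (in the kubernetes/community repo) is roughly: the SLI is the latency of mutating API calls for single objects, for every (resource, verb) pair, measured as the 99th percentile over the last 5 minutes; the SLO is that, in a default Kubernetes installation, the 99th percentile of that SLI per cluster-day is at most 1 second.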
Okay, so basically, once we have all those SLOs defined, users know what to expect from the system — how the system will behave if they are within the scalability envelope. But they also, maybe even primarily, want to know how large the scalability envelope is: how far they can go in the dimension they are actually interested in. Computing that precisely is super hard, maybe even impossible, so we are not even trying. Fortunately, we are able to approximate it pretty well by providing certain thresholds in all of the dimensions. If you take the combination of those thresholds, it's not actually the whole envelope — it's possible to go much farther in certain dimensions if you go lower in others — but we want to provide something that is easily consumable for everyone. For the most sophisticated users, we can work with them and explain how much farther they can actually go. Okay, and with that I'm passing the microphone to Marcel, who will be talking about the other stuff.

Thank you. So basically, right now we know what scalability means and what kind of SLOs we care about, but the question is: how do we measure it? How do we ensure these SLOs are actually satisfied in Kubernetes? So we will go through the scalability testing infrastructure, what kind of tools we use, and basically how we ensure that Kubernetes is scalable.

Our main tool, which we use every day, is ClusterLoader2. You can think of it as declarative, kind of similar to the regular deployments you do for Kubernetes: you specify the states you want your cluster to be in. For example, let's say you want to make sure you can run 200,000 pods in your cluster. You specify this state as, for example, "create 1,000 deployments with 200 pods each". And then you can specify even more states — say, once I have those 200,000 pods, I want to delete half of them. Once you have those states, you can specify how you transition between them, because obviously, if you create 1,000 deployments with 200,000 pods in total all at once, most likely something will break. So you can say, okay, I want to create one deployment per 10 seconds, or you can specify that you want to create all 1,000 deployments within 10 minutes, or within one hour. So you can specify all of those transitions between states.

Then, on top of that, you can measure the SLOs we mentioned: you specify that, for example, you care about API latency, or that pod startup latency should be within 10 seconds, and so on. And apart from that, we have a bunch of extra features that allow us to monitor, observe and test. You can find all of that in our documentation for ClusterLoader2, and if you ever want to scale-test Kubernetes, I strongly recommend using this tool.
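To give a flavor of it, here is a minimal sketch of what such a test config can look like. This is illustrative only — the exact schema and real examples live in the kubernetes/perf-tests repository, and the field names here are approximate:

```yaml
# Illustrative ClusterLoader2-style config: reach a "200,000 pods" state
# by creating 1,000 deployments with 200 pods each at a controlled rate,
# while measuring pod startup latency. Field names are approximate.
name: load-test
namespace:
  number: 100                      # spread objects across 100 namespaces
tuningSets:
- name: Uniform5qps                # pace object creation at 5 per second
  qpsLoad:
    qps: 5
steps:
- name: Start measurements
  measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: start
      threshold: 10s               # complain if pod startup exceeds 10s
- name: Create deployments
  phases:
  - namespaceRange:
      min: 1
      max: 100
    replicasPerNamespace: 10       # 100 namespaces x 10 = 1,000 deployments
    tuningSet: Uniform5qps
    objectBundle:
    - basename: test-deployment
      objectTemplatePath: deployment.yaml   # template with spec.replicas: 200
- name: Gather measurements
  measurements:
  - Identifier: PodStartupLatency
    Method: PodStartupLatency
    Params:
      action: gather
```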
But of course, as you may imagine, running a 5,000-node cluster is pretty expensive. In open source we do actually run 5,000 nodes every day to test for scalability regressions, but this alone doesn't give us really good observability. Imagine that on one day there are 50 PRs merged into Kubernetes and we see some regression — it would be quite hard to figure out which of those PRs is actually breaking Kubernetes.

Fortunately, we have Kubemark. Kubemark is a simulation of the cluster: instead of running 5,000 nodes, you run, say, 80 nodes, and on those nodes you schedule pods — what we call hollow nodes. A hollow node is something that simulates a regular node, but doesn't actually run the pods. You can think of it like this: a regular kubelet takes a pod and runs its containers, but a hollow node just reports back to the API server "I'm running the container" without actually running it. A hollow node actually consists of three parts. One is the hollow kubelet. We also have a hollow kube-proxy, because kube-proxy is one of the parts of Kubernetes that puts quite a lot of pressure on the API server and the Kubernetes master. And apart from that, we also have a hollow node-problem-detector. With this kind of setup, with 600 CPUs you can simulate a 5,000-node cluster.

But then you might ask: okay, so how do you actually run those 5,000 hollow nodes? It's not easy, right? What we do is actually create a second cluster. So we have one Kubernetes cluster that is responsible just for running those hollow nodes, and those hollow nodes connect to the actual Kubernetes master that we are scale-testing with ClusterLoader2.
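To make the hollow-node idea more concrete, here is a heavily simplified sketch of what a hollow-node pod can look like. This is illustrative only — the container names, image, and flags are approximations from memory; the real templates live under test/kubemark in the kubernetes/kubernetes repository:

```yaml
# Simplified sketch of a hollow-node pod: a single kubemark binary
# "morphs" into a fake kubelet and a fake kube-proxy that register with
# the real API server under test but never run real containers.
apiVersion: v1
kind: Pod
metadata:
  name: hollow-node
spec:
  containers:
  - name: hollow-kubelet
    image: example.com/kubemark:latest        # placeholder image name
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name            # register under the pod's name
    command:
    - /kubemark
    - --morph=kubelet                         # act as a kubelet
    - --name=$(NODE_NAME)
    - --kubeconfig=/kubeconfig/kubelet.kubeconfig
  - name: hollow-proxy
    image: example.com/kubemark:latest
    env:
    - name: NODE_NAME
      valueFrom:
        fieldRef:
          fieldPath: metadata.name
    command:
    - /kubemark
    - --morph=proxy                           # act as kube-proxy
    - --name=$(NODE_NAME)
    - --kubeconfig=/kubeconfig/kubeproxy.kubeconfig
```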
Apart from that, here you can see our great tool, perf-dash. As you can see, it looks quite outdated, but it's one of the most useful tools we have in SIG Scalability. Here you can see an example: this is the 5,000-node CI test that we constantly run for Kubernetes, and we measure pod startup latency here — you can see the 99th percentile, the 50th percentile, and so on — and based on that we can detect possible regressions. The SLO says that the 99th percentile of pod startup latency needs to be below five seconds, but we are still unhappy when the 99th percentile was three seconds and then bumps to four seconds, for example. We see that in perf-dash, and based on that we can debug it.

Then, of course, we have Grafana. We are collecting a bunch of metrics — I actually recommend checking it out. Those dashboards are in our repository, and you can use them for your own cluster if you want. There are a bunch of really cool graphs that let you check things like API server latency, scheduling, and all of that.

Then we also have profiling. When you are running ClusterLoader2, it's not only gathering those Prometheus metrics — it's also collecting profiles from the API server, for example. What we usually do with that profiling is go through it and see which parts of the API server are actually consuming most of the CPU or memory, and based on that we can plan and iterate on improvements to the API server.

So, okay — I talked a little bit about our infrastructure and what kind of tools we use for scalability testing. Now we can go back to the scalability tests themselves: what kind of tests we actually run, and what you can see in our testgrid. We have periodic CI tests, and we split them into two categories: release-blocking and non-release-blocking. Release-blocking tests are the ones where, if there is a new release of Kubernetes and we see that scalability doesn't look great, we say: okay, we need to stop the release of Kubernetes and debug what happened. There we have a performance test with 100 nodes, a performance test with 5,000 nodes, and a correctness test that makes sure regular Kubernetes features work at scale as well.

Apart from that, we have the non-release-blocking tests, which are more informative for us. Kubemark is one of those examples I talked about. Beyond that we have storage tests and benchmarking of different types, but one of my favorites is actually the Golang one. With Go, we saw multiple times that the compiler changed and it totally broke Kubernetes. So we have one dedicated test just for testing the Go compiler: it runs a fixed version of Kubernetes while we change only the Go compiler. And if you are a contributor, you've probably seen our presubmits: we have a 100-node presubmit test that runs for each PR someone is trying to merge into the Kubernetes master branch. This is an early warning that something might be broken, and it basically protects Kubernetes from the really big regressions.

So yeah, now let's get to protecting the scalability of Kubernetes. This is our testgrid — you can check it out; basically all the tests are there, and you can find the sig-scalability testgrid. We also test old release branches, to make sure that if there are some cherry-picks to 1.22 or 1.23, they are not breaking old versions of Kubernetes either. As I mentioned before, scalability is very sensitive, and there are so many things that can break the scalability of Kubernetes. To name a few: as I mentioned, we saw breakage from Go compiler changes, but also from the operating system, from controllers, and from API machinery — meaning the API server, the scheduler, etcd, and the kubelet; basically everywhere. So what we do is triage those issues and debug them, and once we know that the fault is, say, a change in the scheduler or in Go, we reach out to those SIGs and try to help them fix the issues — and sometimes we just fix them ourselves.

I will give you a few examples of really cool regressions that we recently debugged. One was pod startup latency. Since around 1.20 we've been heavily investing in API Priority and Fairness, and along the way there was one regression that significantly increased pod startup latency. We were the ones debugging it, and the root cause was actually quite simple: we started supporting watches in Priority and Fairness, and the number of goroutines each watch required was doubled. So instead of having 150,000 goroutines we had 300,000 goroutines, and at that point pod startup latency just degraded. There was also an API call latency regression, which was also connected to Priority and Fairness — that one basically increased API call latency for all resources. We were debugging and fixing that one too; there is a pretty cool debugging story in the issue, which you can find on GitHub.

Apart from protecting against scalability regressions, we are also driving scalability improvements. Recently we were helping with the migration of Go to 1.18, and we also helped with implementing efficient watch resumption, and immutable Secrets. Secrets are actually one of those things in a cluster that can put quite a lot of pressure on the API server, so if you are deploying your workloads and, let's say, you are using Secrets, then maybe you should consider using immutable Secrets, just to make your Kubernetes API server more reliable.
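As a quick illustration — a minimal sketch with placeholder names — marking a Secret immutable is a single field; the same field exists on ConfigMaps too:

```yaml
# An immutable Secret: the kubelet no longer needs to watch it for
# changes, which removes a whole class of watch load from the API server.
apiVersion: v1
kind: Secret
metadata:
  name: app-credentials      # placeholder name
type: Opaque
immutable: true              # cannot be updated after creation, only deleted
data:
  password: cGFzc3dvcmQ=     # base64-encoded example value
```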
And apart from that, we also work heavily on API Priority and Fairness, which further increases the scalability of Kubernetes.

So, if you want to get involved, there are a bunch of links. You can attend our public meetings — they are on Thursdays at 17:30. Join our Slack channel if you have any questions, or our mailing list; you can always reach out to us and we are happy to help. If you are developing new features, we can help with reviewing them, and if you have any issues, we can try to help with debugging them. And if you want to get involved, there are some issues marked as "help wanted" and "good first issue": if you are interested in our tooling, you can help us develop the tooling, and if you are interested in regressions, you can help us debug regressions. So thank you, and now it's time for Q&A.

So, nowadays there are sometimes more CRDs and controllers in a cluster than actual workload, or anything else. What do you think about this, and how do you account for it?

Yes — I think there are different aspects of that. Well-designed controllers based on CRDs are not really a problem. I mean, CRDs are a little more expensive than built-in APIs, because they use JSON and not protobuf, for example — but that's not the core of the problem; that's fine. The biggest problems we see are the node-oriented controllers — DaemonSets, more or less — that are watching, or even worse, listing, a bunch of state from many nodes. That is really what causes problems. So I think our main problem is not the fact that we have CRDs and controllers; it's the fact that there are still people designing their controllers in an inefficient way, I would say. I think we would like to improve the efficiency of protobuf in general — the discussions about supporting protobuf for CRDs have been happening for, I don't know, four years or something. It's just a big thing with not high enough priority at this point. Okay, thanks.

Hi, my question is: where are the trade-offs or focuses in terms of improving scalability, or where are the red lines that you would not like to cross? More concretely: for example, have you considered rewriting some of the components in another language, or switching out a very important component, or deprecating a feature that impacts scalability a lot but that a lot of people rely on?

Good question. I think the first example you gave — rewriting components, especially in a different language — is something we really would like to avoid; I don't think we want to do that. Consistency is fairly important for the project as a whole, for us as a community; we don't want to diverge too much between components. So I think that is probably one of those lines we don't want to cross. Redesigning individual components I would treat as a last resort, but I wouldn't exclude it. For example, in networking, SIG Network is actually considering — there's even a KEP, not yet approved but active — how to redesign kube-proxy to make it a little more efficient. So those kinds of things are something we definitely consider, but they should go into the individual SIGs. The requirements should come from us, but it shouldn't be us doing the work — it should be that SIG driving the work to redesign that component, in coordination with us and with us helping, but driven by them, more or less. Sorry, I forgot the last part of your question; I think there was something more.

The last part was about deprecating features.

Oh yeah. The goal is actually to uncover the regressions — uncover the unscalable things — as early as possible. In general, purely from a scalability perspective, wearing my scalability hat: yes, there are some things I would like to deprecate and get rid of. Wearing more my production-readiness hat, and all those higher-level ones: we really don't want to do that — we really don't want to break users.
So what we are trying to do is introduce a different way of doing the same stuff — a more efficient way. For example, pod anti-affinity is one of those scheduler features that doesn't really scale well, and it causes a lot of trouble for the scheduler. We introduced a feature called pod topology spread, if I remember correctly, which does a very similar job for the majority of cases. So we are trying to steer people toward the more scalable solutions, not by disallowing them from using the old ones, but rather by giving them a carrot to use the new, better ones — however you want to call it.
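For a rough illustration — a minimal sketch with placeholder names — a topology spread constraint that spreads an app's replicas across nodes can look like this:

```yaml
# Spreading pods across nodes with a topology spread constraint, which
# covers many of the cases people used pod anti-affinity for, but scales
# much better in the scheduler.
apiVersion: v1
kind: Pod
metadata:
  name: web-server             # placeholder name
  labels:
    app: web
spec:
  topologySpreadConstraints:
  - maxSkew: 1                          # allow at most 1 pod of imbalance
    topologyKey: kubernetes.io/hostname # spread across nodes
    whenUnsatisfiable: ScheduleAnyway   # soft constraint: prefer spreading
    labelSelector:
      matchLabels:
        app: web
  containers:
  - name: web
    image: nginx:1.21          # example image
```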
There are exceptions, though. I think we recently deprecated the selfLink field: every object used to have a selfLink, which was basically a link to the object itself, and we deprecated it — it's no longer set. But that was done based on significant research, with the outcome that no one was really using it for anything useful, and all the usages we were aware of, across many different repos, could easily be replaced with already-existing mechanisms not based on selfLink. That's the only example I remember from the last two or three years where we did that, and we generally would really like to avoid it as a project.

Sorry, so I have two questions. Firstly, I think way at the start you mentioned you have some documented design principles for other maintainers — or maybe I misunderstood. So, as a component maintainer in Kubernetes, if I want to go and read the, you know, the cliff notes — what should I be thinking about when making design changes to my component? Do you have a general principles doc?

We are missing that a little bit. There are some small pieces written here and there, but I think one of the things we are missing is this go-to page where you can go and read the seven things you should think about first, and then dive into more detail if you are interested. Let's chat — but we are actually missing that, really.

Yeah — but also, if you are developing some component, then I would recommend just adding it to our load tests, so it will be covered by our regular CI tests, which can also be helpful for you.

If it's possible, of course — it's not always possible. I maintain cloud components, so normally not. Okay. So my other question was kind of related to the previous one. You described diagnosing a pod latency issue, and the root cause was, I think you said, a change that doubled the number of goroutines on watches. Presumably somebody had a reason for doubling the number of goroutines on watches?

It was just, you know, a programming bug, basically, that was introduced. And still, the SLO was fine, right? So we didn't see it at first. But using perf-dash, where we can see the changes over time, we were able to spot that the 99th percentile of pod startup latency increased from like three seconds to four seconds. Based on that, we could see it was within this time window that pod startup latency increased; then we went through a few PRs, picked the one that was most likely the cause, and then worked with Wojtek to fix it.

Yes — maybe to tweak the answer, since I'm the one responsible for it: I wouldn't say it was a bug. When you are implementing or designing a feature, you want to get it to production as soon as possible, so there are some shortcuts that you sometimes take, and this was that kind of shortcut. The tests were passing, so we didn't really pay that much attention. But yeah, it was basically "don't over-optimize if it's not really needed" — and it turned out that it actually was needed, so we fixed it.

Okay, so you didn't have to ask somebody — yourselves, in this case — to remove a feature or redesign something?

Fortunately, it was a relatively small change, so it wasn't really conceptual. We are trying to pay as much attention as possible to the design itself, and once the design looks scalable, we should be pretty convinced that the implementation can be done in a scalable way. Even if the first attempt is not scalable, you can re-implement it, or pieces of it, and keep the same semantics — which is the core part, right?

Hey, so you mentioned the 5k-node number, and edge scenarios are pushing that number quite a bit. What is your approach to tackling an eventually increasing number of nodes running in higher-latency environments?

I think we've never really focused on high-latency environments, to be honest. In terms of increasing the number of nodes: GKE, which is Google's managed Kubernetes, already supports 15,000 nodes, and it's built on top of Kubernetes. We needed to do a bunch of other things around it to make that really work, and it was a lot of work, but the building blocks in open-source Kubernetes are there. So it's not that it's impossible to get there. We didn't see enough use cases — people in the open-source community asking for more — so we didn't try to do it in open source, because there's a bunch of other things to do there to make that work. But it's certainly possible, if we start hearing those requests. But yeah, as I mentioned, we've never really looked into edge cases, or high-latency or isolated things like that. There is some work happening in that area, just not in the context of scalability, and I'm not super familiar with it, to be honest.

Okay, thank you.

Okay, and the last question. Oh, there is — yeah: what do you think the biggest constraints on scalability are, going forward with Kubernetes? Like, if you wanted to do 50,000 nodes, what are the biggest challenges you think you'd face?

I think there are many different things, depending on where exactly you'd like to go. I can imagine going to 50,000 nodes with not much effort if you, for example, run just simple batch pods, without using any sophistication in the networking or storage or anything like that.
That shouldn't be that hard, especially with low churn in the cluster and so on. So it really depends on which dimensions you would like to stress more. Usually the networking stack is what stresses the control plane the most. But let me take a step back: we should split the problem into the control-plane components — of which there are very few per cluster; I mean very few instances of a single component per cluster, not very few components — and those that are running on nodes. The things running on nodes are usually what contributes the highest load on the control plane, and among those, the networking ones are usually the most stressing. So this is something we would definitely need to come up with a solution for if we really wanted to go visibly higher than that.

In terms of individual components, it usually boils down to throughput and things like that at the individual-component level, which is kind of solvable per component. I would say that being able to horizontally scale the API server is something that is fairly critical and that we may want to invest in a little more. The storage layer — etcd itself — is a potential bottleneck. And then the components themselves can drive improvements, if we really need to go higher in a dimension related to a given component — which will usually be throughput, or the time to process all the objects of a given type, or something like that.

Okay. Thank you very much for coming.