Cool. Okay, all right. I'm Max, a test engineer at CoreOS. What I pretty much do all day is write logic that spins up clusters, sometimes shoots things down, and hopes that everything comes back up. So that's my job. You can reach out to me over social media if you have any questions; email is fine as well. You can ask questions during the talk or afterwards; let's keep that pretty free here. What does the company I work for do? I work for CoreOS: we secure, simplify, and automate container infrastructure. That is quite a broad topic, so maybe some projects that we're involved in: definitely Kubernetes upstream and our own Kubernetes distribution, then for example etcd, the database, and CoreOS Container Linux. These are things we're well known for, and they are actually quite important for the self-healing and self-hosted Kubernetes that I'll jump into in a second. Okay, what is Kubernetes? Who here is familiar with Kubernetes? Okay, who's using Kubernetes? Okay, all right, cool. Who here has heard of self-hosted Kubernetes? One, two, three, okay. All right, cool. So Kubernetes, in the end, is just a platform for running your applications. You can think of it as a platform as a service: you run your applications on top of it, and Kubernetes offers you all kinds of nice features around them, like easy deployment and easy scaling. It takes care of a bunch of networking, and in general it just takes nice care of your applications, so they stay alive and healthy. Now all of this tooling is great, but not very useful if the underlying layer is in any way fragile, right? If Kubernetes dies, your application probably dies as well. So what we need to do is make that underlying layer very sturdy too. And how can we do that?
Well, we already have all this tooling to make our applications stay healthy. Why don't we use the same tooling to make sure our Kubernetes cluster stays healthy as well? So what we'll do is run Kubernetes in Kubernetes, and not in a second cluster, but actually in itself. You might think that's a little bit crazy, but for now you just have to believe me that it is possible, and we'll go into how it is actually possible. All right, so that's the idea of self-hosted Kubernetes, maybe self-driving Kubernetes. The idea comes from compilers: writing your own compiler in the same language that you're actually compiling. And you can go to different levels of self-hosted Kubernetes; there are five of them. Level four is DNS: none of the core components really rely on DNS, so you can pretty much run your entire cluster and then just run DNS on top of it, and you're self-hosted at level four. That's an easy one. Then level three is a little bit more difficult: when you want to start a scheduler, how do you schedule its pods without a scheduler that schedules that scheduler? It's kind of difficult to achieve, right?
The same counts, for example, for the controller manager. Then you can go further up, like the API server, the core component that everything needs to communicate through: if you don't have that running, it's difficult to start a Kubernetes cluster on something that doesn't have an API server yet. And then you can go more crazy, like self-hosted etcd, the brain of Kubernetes, its database. That is right now, from our side, still behind an experimental flag, so we don't fully support it. And then you can go even more crazy: self-hosting your kubelet, which means self-hosting the kubelet that actually talks to the container runtime, the thing that actually starts containers. It's difficult to start containers without anything that can start containers. Okay, so I told you that you need to believe me that self-hosted is actually possible, and today we'll go to level two. We might point at some level-one stuff, but that's pretty much it. Now I'm going to go into detail on how we actually self-host our Kubernetes cluster, and that is possible with a nice little tool in the Kubernetes Incubator called bootkube. By the way, all the stuff I'm talking about today is open source. You can check it all out after this talk and spin up your own Kubernetes clusters the same way. I'm not going to touch any closed-source stuff, except that I'm running my cluster on AWS, and that is not open source. Okay, so bootkube. We want to start our Kubernetes cluster, and we want to start it in a self-hosted way. What we need first of all is a node, a machine, and on that machine we start the kubelet, which talks to the Docker daemon in our example here. Then we kick off bootkube, and what bootkube does is, first of all, with the help of that kubelet, start a full-fledged cluster. So here you now have etcd, an API server, a scheduler, everything in there. But it's not a self-hosted cluster.
It's just a normal cluster. But the nice thing is that it is now a full cluster: you can start stuff on it, you could start your applications. But, well, you could also start a Kubernetes cluster on it. So what we do is, on that bootstrapping cluster, we start our actual self-hosted components. These are the components that are going to be long-lived, the ones we're going to keep alive for a long time; the others are throwaway, just for the bootstrapping process. Now these self-hosted components will just idle around; they will pretty much do nothing at this point in time, just lying there. And they don't really know anything: the etcd cluster of this level-two cluster is empty, it doesn't know anything about the world. So what we do as the next step is transfer the bootstrapping knowledge into the self-hosted cluster, and thereby the self-hosted cluster now knows about its environment, and it knows about itself running in itself. Now we've got everything ready, and we don't really need the bootstrapping components anymore. So we delete them, and at that point their self-hosted counterparts kick in. They notice that they are needed, they take over the work, and at this point we have a Kubernetes cluster running in itself. So why all this madness? Why go so crazy? Why not simply start a Kubernetes cluster and be done with it? Well, first of all, we have a very small dependency chain this way; we reduce the total amount of tooling that we need. We also get deployment consistency: the same way that we deploy applications, from now on we deploy our Kubernetes cluster.
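To make that deployment-consistency point a bit more concrete: a self-hosted control-plane component is declared like any other workload. Here is a rough, illustrative sketch of a scheduler as an ordinary Deployment; the image tag, labels, and flags are placeholders from memory, not bootkube's exact generated output:

```yaml
# Illustrative sketch only: the scheduler deployed like any other app.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-scheduler
  namespace: kube-system
  labels:
    k8s-app: kube-scheduler
spec:
  replicas: 2                                  # scale it like any workload
  selector:
    matchLabels:
      k8s-app: kube-scheduler
  template:
    metadata:
      labels:
        k8s-app: kube-scheduler
    spec:
      nodeSelector:
        node-role.kubernetes.io/master: ""     # keep it on the masters
      containers:
      - name: kube-scheduler
        image: gcr.io/google_containers/hyperkube:v1.7.0   # placeholder tag
        command:
        - /hyperkube
        - scheduler
        - --leader-elect=true                  # only one replica active at a time
```

Because it is just a Deployment, scaling it is the usual `kubectl -n kube-system scale deployment kube-scheduler --replicas=3`.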
So for example, if I want another scheduler, I just deploy it as a component in my Kubernetes cluster itself. Then, in addition, Kubernetes offers a lot of nice tooling around introspection, for debugging my cluster and my applications, and now I can use that same tooling to introspect my cluster and debug the components running in there. Also, and maybe some of you have been through this: updating Kubernetes is actually difficult. Just as Kubernetes could already nicely roll out new versions of your application, it can now roll out new versions of itself, via a rolling deployment of its own components. And then, of course, easier high-availability configurations. We want to run this in production in the end, so for example we would like to have more schedulers, to be redundant. Instead of `kubectl scale` on my application, I can now say `kubectl scale` on my scheduler. It's the same thing; Kubernetes does all of this for us. So all of these benefits are valuable in themselves, but they are also important for our self-healing idea, and what I want to do now is touch a couple of points where a Kubernetes cluster could possibly fail, and then see how self-hosted Kubernetes reacts. Okay, just so we're all on the same page, let's look at the architecture these examples apply to. We're running multi-master; it's going to be a production cluster, so you don't want to run a single master, right? That would be a single point of failure, and that would not be a good idea. On those masters we first of all start the API server. The API server is supposed to be a stateless application. It's not quite, it does a bunch of caching, but anyway: it runs on all of our master machines, and it doesn't really matter how many we have running at the same time, so we can run multiple at once.
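One natural way to express "one API server on every master" in Kubernetes terms is a DaemonSet pinned to the master nodes. A rough sketch, with a placeholder image and flags, and with certificates, service accounts, and most real flags omitted; a working manifest needs considerably more:

```yaml
# Illustrative sketch only: one API server per master via a DaemonSet.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-apiserver
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: kube-apiserver
  template:
    metadata:
      labels:
        k8s-app: kube-apiserver
    spec:
      hostNetwork: true                        # serve on the node's own address
      nodeSelector:
        node-role.kubernetes.io/master: ""     # masters only
      containers:
      - name: kube-apiserver
        image: gcr.io/google_containers/hyperkube:v1.7.0   # placeholder tag
        command:
        - /hyperkube
        - apiserver
        - --etcd-servers=http://etcd.example.com:2379      # placeholder endpoint
        - --secure-port=443
```

Because the API server keeps no state of its own, adding a master simply means the DaemonSet schedules one more replica there; no coordination between replicas is needed.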
We don't really have any races here. In addition, we start the scheduler, and the scheduler is a little bit more of a problem: you don't want multiple schedulers acting at the same time. You only want one scheduler to be active at any point in time; that is very important. Now, Kubernetes offers a very nice leader-election feature that you can build into your applications. And since Kubernetes, or rather its components, are running as applications on our Kubernetes cluster, we can just use that leader election for our scheduler as well. So we do leader election around our scheduler, as provided by Kubernetes: only one scheduler is active at a time, and all of the others just idle as followers, waiting for the leader to die or anything like that. The same counts for the controller manager, and of course we run its active instance on a different master. And then there are all the other components, and these together manage our worker fleet on the right. Okay, let's go through some scenarios. What happens, for example, if one of the API servers dies? Well, as I just said, we are running redundantly, so we pretty much don't care at this point. You probably want logging in there; if your API server dies every minute, you probably want to wake someone up, because something is wrong. But there's no real reason to when it happens once. Kubernetes will notice that the DaemonSet is not successfully deployed on that node, and it will just start a new API server there. In the meantime, all the load is distributed across the other API servers, and everything stays up and running. So, next scenario: what happens if a scheduler dies?
Well, if it's a follower, again, you really don't care; the leader will just start up a new one. When the leader dies, the followers race to become leader; one of them wins, takes over, restarts the failed scheduler, and you're up and running again. No reason here to wake up any engineers in the night, for example. Now, this all sounds pretty much like the happy path: you don't really have to interact with anything. The next scenario is a little bit more difficult. What if all your schedulers die? What if all your controller managers die? Well, first of all, you're very much out of luck at that point. That is very unfortunate: you had three machines and all three of them died, so that's pretty unlucky. But you only have to interfere a little bit manually: you can use the bootkube recover tool at this point. What it does is check etcd, your store, take all the current state, bake that into manifests, and deploy those manifests as a new cluster, and your cluster is back up. So you don't really have to debug anything at this point. Okay. And if all masters die, same thing: you can use bootkube recover and you're good to go from there. Okay, can I delay the questions until afterwards? Because I only have five minutes left and I've got a full demo, I think. Okay, all right, go ahead then. Question from the audience: do the masters also run the etcd cluster, or is that separate? That really depends. If you go self-hosted etcd, for example, you would run it on the masters themselves, but this particular cluster does not host etcd on the masters; it's in a separate cluster. Okay, so running multi-master is the way you should do it in production.
Well, that's maybe a little bit boring. So for this presentation, let's scale it down a little and go single-master. That's of course not how you're supposed to do it in production, but it could happen: for example, two of your masters die and suddenly you only have one left. What happens now? What kinds of failure scenarios can we still mitigate from here on? What this means is that this single master is going to be your single point of failure; all your control-plane components are going to run on that single master. So what happens, for example, if the only API server dies? Nothing can communicate from now on; the entire control plane is dead at this point. Don't get me wrong: your applications will keep running, they don't really need Kubernetes to run. But they do need it if, for example, you want to roll out a new version of your front end; that you could not do at this point. So what we have here is a little tool called the checkpointer, and that's the bit of logic I'm going to show you. What the checkpointer does: it's just a pod on each master node, and every now and then it checkpoints important manifests, for example the API server's, to disk. Now you've got those manifests on your disk, and once your API server dies, the checkpointer notices and brings that API server back up. So at the top we checkpoint the API server in the happy case; the API server dies, and we can bring it back up from the node itself, without needing a full Kubernetes cluster running at that point in time. Now we have this API server back up.
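In bootkube's pod checkpointer, opting a pod into this checkpointing is done with an annotation on the pod. The sketch below is from memory of the project and worth double-checking against the bootkube repository; the image is a placeholder:

```yaml
# Illustrative sketch only: opting a pod into on-disk checkpointing.
apiVersion: v1
kind: Pod
metadata:
  name: kube-apiserver
  namespace: kube-system
  annotations:
    checkpointer.alpha.coreos.com/checkpoint: "true"   # ask the checkpointer to snapshot this pod
spec:
  containers:
  - name: kube-apiserver
    image: gcr.io/google_containers/hyperkube:v1.7.0   # placeholder tag
```

The checkpointed manifest lands on the node's disk where the kubelet can run it as a static pod, which is exactly why it works even when no API server is reachable.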
It's just a temporary one, though. Now everything can function again: we can start the real API server on the left, then kill the temporary one, and we're back at a normal cluster. And at the very top we're checkpointing again, going through the same cycle as before. So an API server failure is really not a big problem here. Now, another problem might be: what if that single master dies? What if it, for example, reboots? This can happen if your operating system needs a reboot for its updates, but your cluster only has one master. So what are you going to do? Well, the same logic applies again: we use the checkpointer, and this is pretty much the runbook from here. Your master will come back up; hopefully it still has its disk, and the disk still works. Then systemd will start, systemd will start the kubelet, and the kubelet will start the checkpointer. The checkpointer will look at what it checkpointed to disk during the happy path before, and it will see: oh, there's no API server, I'd better start an API server. From that, you've got a running API server, at which point your kubelet can communicate again. It then sees that it has a full cluster around it, notices that a scheduler, a controller manager, and so on were all scheduled on it, starts all of that, and your cluster is back up and running. Okay, so I think I've got five minutes left, and I've got a demo. This has been a lot of nice talking, but it's very nice to see it as well; I hope those five minutes are enough. If it doesn't come up right away, it takes a little bit of time, and we can show it back there afterwards. Okay, what I've got here is a Kubernetes cluster running on AWS right now. I list the nodes with their labels, I've got these machines, these nodes here, and if I grep for master, just to make sure and so you believe me:
There's only one master here. So if I shoot this one master down, we've got a bit of a problem. So: `kubectl get pods -n kube-system`. These are the pods that are running right now, and these are the pods running on that master machine as well. What you see here, for example, is the kube-apiserver running inside your Kubernetes cluster, so this is actually a self-hosted cluster. What I'm going to do now, down here where I have a shell on the master machine, is reboot it, and let's see if our Kubernetes cluster comes back up. So my SSH connection is lost, the machine is rebooting. You see at the top that the API server is still responding at this point: it's still in the process of rebooting but still answers my requests. That should soon disappear. There we go: now the control plane is dead, nothing is responding. I can try to SSH back into it in a second, and then we can look at what comes back up first. Okay, so what I'm going to do here is `watch docker ps`: I'm watching for Docker to come back up, to show what containers are running on the host. It still takes a little bit of time... there we go, Docker just came up. Now the next thing is that the checkpointer gets started. There we go, the checkpointer is going to start another checkpointer instance. Now the API server came up; you see here the kube-apiserver container just started. From here on, the requests up there should soon be answered, because we've got a running API server now. The kubelet can communicate with that API server, knows what it needs to run on itself, and there you go: all the components are coming back up, and everything is up and running again. It still needs a second, but yeah, that's your failure scenario. Okay, I think I should finish now, or just take a second. All right, what are scenarios for future talks? I skipped a bunch of stuff
today. So self-hosted etcd is definitely a story that could fill an entire talk, and it opens up a whole new set of failure scenarios. Then of course AWS could die; that could happen again as well. Then you could, for example, use Kubernetes federation; that is likewise a topic for an entire talk. And then of course the internet could die; that is going to be a very creative talk, and one I'm not going to give at any point in the future. All right, so as I said, this is all open source, and if you want to try it out and check it out, please feel free. We're very happy for feedback on any of these things, and as it's open source, it lives from contributions, so feel free to open pull requests. In addition, if you want to get paid for creating pull requests on these repos, we're also hiring: in San Francisco, New York, and Berlin. Feel free to reach out to me, Luca, or Casey, for example, back there as well. Yeah, that's it from me, Max; feel free to ask questions now, after the talk, and so on. Thank you very much.