Okay, excellent. Thank you very much for joining our talk. Before we start, a quick introduction to what we're going to do: we're going to have a look at how to troubleshoot deployments. We have structured this talk in two parts. In the first part we'll look at a little bit of the theory behind deployments, and in the second part we're actually going to break and fix things. We're going to have two teams, Team A and Team B. So if you're sitting on this side of the room and you want to join Team B, now is the time. And yes, Team B, please invite your friends to join you for the ultimate match. That's it. Without further ado, Madhvi is going to start our talk. Thank you very much for joining us.

Okay, welcome back Team B and Team A, welcome to our talk. I'm Madhvi, I'm a solutions architect at Dell Technologies, and I'm joined by Daniele from Learnk8s. If you are interested in following well-researched articles about everything Kubernetes, please follow Dan.

So this is what we are going to do: we are going to debug and deep dive into this particular picture.

Kubernetes embraces the idea of treating servers as one single unit, abstracting away how individual compute resources operate. From multiple servers, it makes it look like you have one single cluster. Imagine you have three servers: you can install the Kubernetes control plane on one, and the remaining two can join as worker nodes. Kubernetes abstracts over them, making them look like a single unit, and you access it as one single cluster. If you want to deploy a container, Kubernetes finds the best place for it and deploys it. The same goes for many containers: in reality, what's happening is that Kubernetes is finding the best-suited place for your container — which worker node matches its requirements — and deploying it there.

Now that we understand how Kubernetes works, we'll see how a deployment actually takes place in Kubernetes. Imagine you have a simple static page like this. You can have your users visit this page even without Kubernetes. But if you have more than one instance, how are you going to route the traffic to those instances? Team A or Team B, who knows the answer? ("Load balancer!") Team A, one point. Yes, you use a load balancer to route the traffic. Now what happens if you have more than one app? You use another load balancer. And to route the traffic to both these load balancers, what do you need? Another load balancer, or a router which routes the traffic to the right application.

In Kubernetes, you don't call the load balancer a load balancer — you call it a Service. The router in front is called an Ingress, and the instances that receive traffic are called Pods. Ingress, Service and Pods are fundamental concepts in Kubernetes. But there is something invisible in this picture: the Deployment. We don't actually deploy pods in isolation; we deploy them together as an abstraction called a Deployment. In this picture you see an ingress, a service and three pods running. What you don't see is that this is a Deployment. So what is a Deployment?
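To make that idea concrete, here is a minimal sketch of what such a Deployment could look like in YAML. The name, image and replica count are made up for illustration — this is not the manifest used in the demos:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web            # hypothetical name
spec:
  replicas: 3                # how many identical pods you want running
  selector:
    matchLabels:
      app: hello-web
  template:                  # the single type of pod to run
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: nginx:1.25  # any container image
          ports:
            - containerPort: 80
```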
It is a single type of pod that you want to run, plus how many such instances you want running. That's it. This is how Kubernetes takes care of it: the deployment checks how many instances you asked for and creates that many. If a pod is deleted accidentally, it creates a new one.

Now we'll see how a request goes through the Kubernetes cluster. The journey begins with the kubectl apply command. Who's listening? The first component that is listening is the API server. The API server talks to etcd, which is the single source of truth here.

The API server is a big pipeline by itself. If you want to remember what the API server does, there are three As to take away from this slide. Authentication: you say who you are and you prove it. Authorization: what kind of permissions you have on the cluster. And admission controllers: there are two types, mutating and validating. The mutating admission controller modifies the request based on some custom logic, and the validating admission controller validates the spec and passes it on to etcd.

Now the second step of the lifecycle. Remember that everything is stateless in Kubernetes except for etcd. The API server gets the request and persists it in etcd, where we now have the record for a deployment. Who else is listening? A core component called the controller manager. The controller manager is a bundle of controllers — a bit like what we saw with the API server — but in this context two of them are important: the deployment controller and the ReplicaSet controller. The controller manager also subscribes to all the changes happening in etcd.

Once the controller manager sees that you asked for three pods to run, it creates that change and persists it in etcd: the pods are recorded as Pending. And who's listening to that? Once the pods are in the Pending state, the scheduler picks them up, and it's the scheduler's job to decide where each pod should run. The pods are basically queued; the scheduler picks them up and does two very important things: filtering and scoring. Filtering figures out which nodes are actually fit to run that particular pod, and scoring ranks which of those nodes should take it.

Now we go further down the request and see what happens. Right now the scheduler has changed the state in etcd to Scheduled. By now, do you think there is anything running? No — not a single thing has happened yet. Everything so far has happened only in the database; nothing has happened on the infrastructure side. So who installs the pods, who creates them? The kubelet. The kubelet is the Kubernetes agent. Remember, when we started we created a control plane and we had worker nodes; the kubelet runs on each worker node — you can see the kubelet on the worker node there. So what does the kubelet do?
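If you want to watch this lifecycle yourself, a rough sketch of the commands could look like this — assuming a manifest file named deployment.yaml, which is a made-up name:

```bash
# Submit the deployment; the API server validates it and stores it in etcd
kubectl apply -f deployment.yaml

# The controller manager creates a ReplicaSet and the pods it asked for
kubectl get replicasets

# Watch the pods move from Pending (waiting for the scheduler) to Running
kubectl get pods --watch

# The events show scheduling decisions and kubelet activity
kubectl get events --sort-by=.metadata.creationTimestamp
```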
The kubelet continuously polls the control plane and checks if there is something for it to create on its worker node. The kubelet's job is to see what is there in the control plane, pull down the spec of the pods assigned to its node, and create them on that node.

If the kubelet is doing that, what are these other things — CNI, CRI, CSI? These are just binaries that live on the worker nodes. You could do this even on your own laptop — just create a Docker container yourself — so why the kubelet? Because the kubelet does it automatically by subscribing to the control plane. So we said deploy three replicas; what does the kubelet do? It checks with the control plane, gets the pods assigned to it, and passes them to the CRI — the Docker daemon in my case, but any container runtime — which creates the container. The rest of the kubelets in the cluster act independently and create their containers accordingly.

Now let's look at this from the rolling updates point of view. Do you want to take over? Yeah.

So until now we had a look at how this deployment is created, and a rather simple process turns out to be quite complex, involving several steps running in sequence. Let's have a look at something else related to deployments, and that's rolling updates. So this is the infrastructure: I've got a cluster with three nodes and three instances of the application deployed. With a rolling update, we gradually roll out a new version of the application and then we remove the previous one, right? But how does that work in the context of Kubernetes and the controllers we just described?

The way it works is: you go to the command line, you type kubectl apply, and an image version is changed. Simple enough, right? But what that translates to is a request to the API server. The API server will change that deployment resource; it will mark the resource as changed. What happens next — who is listening for changes to deployments? Team A or Team B, show me: who is listening for changes? The controller manager, yes, exactly. The controller manager notices that something has changed, receives the new spec and says: oh, okay, you asked me for a rolling update. So what I'm going to do next is create a new pod, and this pod is Pending. Who is listening for Pending pods? Yes, the scheduler: it will look at this pod, see if there is space for it, and assign a node to it. Done — it is scheduled. What happens next? The kubelet, yes: the kubelet will pick up the work and ask the control plane, hey, is there anything for me? Yes, there is a new pod for you. Then the kubelet delegates creating the container to the container runtime, and eventually the pod is created. Is this enough? No. What's next? Termination? Yeah, termination — but before terminating the old pod, the kubelet does something else. Among other things, it goes and executes the liveness and readiness probes. And when the readiness probe actually succeeds, it goes back to the control plane and says: hey, all done, this pod is ready to receive traffic.
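A rolling update like this could be triggered and observed with something along these lines — the deployment and container names here are hypothetical:

```bash
# Change the image version; this edits the Deployment through the API server
kubectl set image deployment/hello-web web=hello-web:v2

# Watch the controller manager replace the pods one at a time
kubectl rollout status deployment/hello-web

# If the new version misbehaves, roll back to the previous ReplicaSet
kubectl rollout undo deployment/hello-web
```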
Well, that's a little bit oversimplified, but that's the idea. Then the controller manager says: okay, fair enough, we've done the first one — that's when we terminate the old pod. That's when we go down, remove the previous pod, and we've got only three pods running inside the cluster again. Is that it? What happens next? I just removed the pod inside etcd; the kubelet will pick up the work. Right — it's the kubelet who now reconciles the state of the node with the state of the cluster: it removes that pod from the node.

Am I done? No, this is just step one of three. So we go all over again. The controller manager creates a new pod, it is Pending — who's going to pick up the work? The scheduler, yes, which marks it as scheduled. Who picks up the work next? The kubelet, yes, well done: the kubelet goes and creates the pod. What happens next? Liveness and readiness, yes, absolutely — we report back to the control plane. What happens next? Delete the previous pod, yes. What happens next? Okay, I know it's getting confusing. What's next is: the kubelet deletes the old pod, the controller manager adds a new one, it's Pending — who's next? The scheduler, yes, exactly. It's scheduled, then the kubelet picks up the work, creates the pod, reports back to the control plane after liveness and readiness, the old pod gets deleted, and finally: done.

Done — so you just change the version on your deployment and Kubernetes goes through all of these steps to roll out the newer version of the container. What could possibly go wrong? Well, the reality is that a lot of things could go wrong in this process, and it's generally a little bit hard to debug where the problem came from. The model I use to think about this sort of structure is: if I have to debug this very long process, or any part of it, I generally start from the bottom. I check that my application is running. Then, if that is running, I look at the interaction between what we call services — these internal load balancers — and the pods. And if that is working, then comes the hardest part: figuring out why the ingress isn't actually routing traffic from the top.

So this is the plan. We're going to have a look at this diagram, which I was hoping would look bigger on the big screen, but this is the harsh reality we live in. The first part is what kind of things we have to do to inspect the application and see why it is broken or isn't coming up correctly. The second part is why the service isn't routing the traffic to our pods. And the third one is: how do I actually route traffic from the outside, and why isn't that happening at all? I thought the best way to do this was to break something and fix it, so we can have a look together.
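The readiness and liveness checks mentioned above are declared on the container itself and executed by the kubelet. A minimal, hypothetical sketch — the pod name, image, path and port are assumptions, not taken from the demo app:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-web-probe-example    # hypothetical name
spec:
  containers:
    - name: web
      image: hello-web:v2          # hypothetical image
      ports:
        - containerPort: 8080
      readinessProbe:              # kubelet reports the pod as ready only when this succeeds
        httpGet:
          path: /healthz           # assumed health endpoint
          port: 8080
        initialDelaySeconds: 5
      livenessProbe:               # kubelet restarts the container if this keeps failing
        httpGet:
          path: /healthz
          port: 8080
        periodSeconds: 10
```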
Okay, so — it goes without saying that I'm running a cluster locally. The Wi-Fi doesn't let me use SSH and it's not very good at downloading either, so we're going to have some fun. First of all, I'll show you what the end result looks like, so we can start from it. I applied some YAML — so you don't know much right now, I just typed some magic commands and things happened on the screen, fair enough — but the end result is that if I visit a URL, I see the application running. Okay, let me just delete that and do it again.

So what I've done right now is: I submitted the YAML file and I've got a pod running in my cluster. It is the same application, so if I go to this URL and refresh, it should work — but it doesn't. This is more or less the experience you as developers will face when you deploy something, or your colleagues deploy something: they get a 503, come back to you and say, hey, it isn't working, what should I do next?

So what do you do? What's your advice here? I've got a cluster, I've got some kubectl commands — what's your first command? Yeah, go for it. Actually, that's a good idea. So first of all, it's showing an nginx error — so is the problem with nginx? What you're doing right now is basically starting to debug from the top layer, right? And while I think you're in the right mindset, what we've found is that this area at the top is very hard to debug. So you might be right and the error might actually be inside the ingress, but before we reach that point, the suggestion is: let's start from the bottom before we move up to the ingress.

Okay, so tell me what I should do next. kubectl get pods — whoa. What is that? Check the logs, someone is saying — yeah, let's do it. You can do a describe — let's do a describe. Is this enough? No. What else do you want? Okay, I'm going to go for the logs. Whoa. It's ugly, but it's very clearly an application error. So at this point: we started with "it must be nginx" because we saw an nginx string, then we inspected the logs, we started from the bottom, and it's actually an application error. It is saying we are missing some stuff in our YAML, so it's time to go back and fix it before we can move on.

So this is the YAML file — the description of the deployment that I just created inside the cluster — and you can see there is no environment variable in this container. But if I go back to the terminal, the error is suggesting that something should be set to actually make it work. So let's fix this. I'm going to cheat — I'll tell you straight away, I'm just going to copy it from the solution. And then I'm going to apply the changes. Okay, I can see it running — looks like it's running. Let me go and refresh the web page. It is running now. Okay, cool, fix done. And I've got time for more.

So I think the important lesson from this demo is that if there is something wrong, even if it looks like it's coming from the network, it's much easier if you start from the bottom and then level up. Okay, let's move on to more interesting stuff. Let me just check — yeah, so generally what we've done...
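The actual variable the demo app complained about isn't shown here, so this is only a hypothetical sketch of what adding a missing environment variable to the container spec looks like — the variable name, value and image are made up:

```yaml
# Fragment of the Deployment's pod template; the env entry is the fix
spec:
  containers:
    - name: web
      image: hello-web:v1                      # hypothetical image
      env:
        - name: DATABASE_URL                   # hypothetical missing variable
          value: "postgres://db:5432/app"      # whatever the application error asked for
```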
Oops, I'm going to reveal a little bit too much now. Okay, so I deployed another application, and this time it's running. Let me just check what kind of resources it's going to create. This one should be running — and it's a 502 Bad Gateway again. Okay, what do I do? How do I debug this? I'm supposed to see a page saying hello world.

You want to see the YAML? Well done, I agree with you. Okay, so this is the YAML for the application we're working with: it's got some generic container image, which is basically just a hello world, then it's got a Service, and it's got an Ingress. Do you spot anything? Let me see if I can do a split and show all of it at once. Right — and this is already revealing some of the issues. Can you spot the issue? No port? Okay, getting closer. Any issue with the type? What port, what type? 8080? What should 8080 be? The ingress port? Okay, and why — what should that be?

I know, I know, it's very hard. You come to a talk expecting to have a good time and I just dump a bunch of YAML on the screen and say "fix this, please". But there are a couple of things here that are a little bit fishy. Again, when you look at this it's very complicated, and it's a little bit unfair of me to show it and ask you to fix it, but the way we debug this is exactly like the previous example. Yes, we know the problem is probably something around nginx, but what if it is before that? So what we do first is go back and check the status of the pod: is it running, is it healthy? And the answer this time is yes.

Okay, so now we have a couple of things at our disposal. If the pod is running, then one thing I can do is connect to the pod directly — connect to the port which is actually exposed, route the traffic to it myself, and see if it actually works. So I can do kubectl port-forward, then the name of the pod, and then, if I remember correctly, 8080. It isn't connecting at all. This is really telling me that whatever we're doing — the way we are attaching our service to the pod — is wrong, because on 8080 there is nothing. So I can do kubectl logs and then the pod. Do you notice anything fishy or different? Yes — it's listening on a different port. There's no way this could work, so we need to go back and change that. So this is telling me that this should be changed to 9898.

Any other change that should follow from this? I know I haven't explained it, but maybe some of you have seen this in the past and know what else should be done. Someone is saying targetPort — yes, you are absolutely right, the 9898 should match. Okay, so this is good. Let me just — I wrote this two weeks ago and haven't touched it since, so I actually don't remember. Okay, let's say yes, that was the fix.
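The port-forward check used here, sketched out — the pod name and the 9898 port are placeholders based on this demo:

```bash
# Forward local port 8080 to port 8080 inside the pod, then try to reach it
kubectl port-forward pod/hello-web-abc123 8080:8080
curl http://localhost:8080        # in a second terminal; fails, nothing listens on 8080

# The logs reveal which port the application actually listens on
kubectl logs hello-web-abc123

# Forward to the real application port instead (9898 in this demo)
kubectl port-forward pod/hello-web-abc123 8080:9898
curl http://localhost:8080        # now the app responds
```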
Okay, so let's have a look at why that was the fix. Generally, what we have is a containerPort, which describes where the application is exposed; then we have something called targetPort and port on the Service; and there's another port on the Ingress as well. Generally, the targetPort and the containerPort should stay together: if the app listens on 9898, like in this case, then those two should match, and the Ingress should in turn match the Service's port. Now, this example didn't have every one of these problems, but those are the most common ones you'll find.

So we've done demo two as well. The other thing I think is important to remember is that when there is a problem with the network, another thing we can generally do — and we haven't done here, but could have — is a kubectl describe service. Let me just get the services first. There is a line here saying Endpoints. So let me just go and break it again and have a look: the Endpoints field here is empty. Endpoints are a very convenient mechanism for checking which pods are going to receive traffic from the service. If I had looked at this — if I were to go back in time and check the service before doing the fix — I would probably have found it empty.

So how are these endpoints created? It turns out that when the kubelet creates the container, attaches it to the network and attaches any volumes, at that point in time it assigns an IP address and reports it to the control plane. The only one who knows the IP address of that container at the beginning is the kubelet. The kubelet says: hey, control plane, this is the IP address I assigned to the pod — and that is reported back. If you then look inside the control plane, you see those IP addresses as well. These IP addresses are usually called endpoints, and they are one of the most useful things you get in Kubernetes: every time you add or remove a pod, that state of the cluster is updated as well. And when you do a kubectl describe service, you're basically just asking: hey, can you show me all of the endpoints that are related to this particular service?
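Putting the port relationships together, here is a hedged sketch of how the three resources could line up — the names and numbers are illustrative, loosely based on the 9898 example, not the demo's actual YAML:

```yaml
# Pod template fragment: the app listens on 9898
containers:
  - name: web
    image: hello-web:v1
    ports:
      - containerPort: 9898   # must match what the app actually listens on
---
apiVersion: v1
kind: Service
metadata:
  name: hello-web
spec:
  selector:
    app: hello-web            # picks the pods whose IPs become the endpoints
  ports:
    - port: 80                # the port other workloads call the service on
      targetPort: 9898        # must match the containerPort
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: hello-web
spec:
  rules:
    - http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: hello-web
                port:
                  number: 80  # must match the Service's port
```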
And it basically just goes and collects those IP addresses. So a service is basically just a list of endpoints, because that's what it is. I wish I could do demo three, but I'm running out of time; if you want, we can do it later, but right now I can't, sorry.

So I just want to recap some of the stuff we've done today. Hopefully you enjoyed it and you got a feeling for how the debugging should be structured, or how you can approach it when you see something not working. What I suggest you remember is, first: creating a pod looks very simple to you, but it turns out there are several components involved in sequence in creating that pod, and any of those processes could fail at any time. Essentially, we're left with a magic incantation that just works — and it's brilliant, until it isn't anymore. So if you hit any of those issues, even if it looks like it's coming from the top, from networking, it's usually much easier if you start from the bottom and work your way up as you confirm that the application is working. Then we had a look at matching ports — what you should be checking — and we had a look at endpoints. Those endpoints are so crucial that they're used for so many other things, and they're very useful when it comes to debugging: just check where they are propagated.

I'm over time. I hope you enjoyed it. Thank you very much for joining our talk. I don't know if there are any questions — I'm conscious of the time. If you've got any questions, you'll find me just outside. Thank you.
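As a take-away, the bottom-up flow from the talk could be condensed into a short checklist like this — the resource names are placeholders:

```bash
# 1. Is the application itself healthy?
kubectl get pods
kubectl describe pod <pod-name>
kubectl logs <pod-name>

# 2. Does the Service actually select the pods? (Endpoints must not be empty)
kubectl describe service <service-name>
kubectl port-forward pod/<pod-name> 8080:<container-port>

# 3. Only then look at the Ingress and external routing
kubectl describe ingress <ingress-name>
```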