Hello, everyone, welcome to Cloud Native Live, where we dive into the code behind cloud native. I'm Annie Telesto, I'm a CNCF ambassador, and I will be your host tonight. Every week we bring a new set of presenters to showcase how to work with cloud native technologies. They build things, they break things, and they answer all of your questions, so you can join us every Wednesday to watch live. This week we have Salman here with us to talk about operating high-traffic websites on Kubernetes. If you are seeing a different title right now, that's due to some technical difficulties, so don't mind that. Today we're going to be talking about, as I said, operating high-traffic websites on Kubernetes; very excited for this topic. As always, this is an official live stream of the CNCF, and as such it is subject to the CNCF Code of Conduct. Please do not add anything to chat or questions that would be in violation of that code of conduct; basically, please be respectful of all of your fellow participants as well as the presenters. With that done, I'll hand it over to Salman to kick off today's presentation. Hey, good morning, good afternoon, good evening, everybody. My name is Salman Iqbal; thank you very much for the introduction. I work for a company called Appvia; we are a cloud native consultancy, and we work in the cloud native ecosystem. Today I'm going to talk to you about how we scale our websites running in Kubernetes, and it's not restricted to websites: it applies to any workload you're running in Kubernetes. When we talk about Kubernetes, there's a lot of talk around autoscaling, things that can automatically scale based on traffic coming in, perhaps a metric, or whatever it might be. But that doesn't happen by default. There's a lot of work we have to do and a lot of configuration we have to put into our system, and what those things are and how we can do them is what we're going to look at today. It's going to be all demo, and hopefully everything will work. If it doesn't, don't worry, I have a video of the whole demo as well; we can go through it and pretend it was all live, but I've tested it a few times before, so I think it should be all good. If you have any questions at any time, feel free to ask and we can answer as we go along. If I know the answer, I'll try to answer it; if I don't, we can search for the answer the normal way, which is using Google, or, if you're feeling adventurous, we can ask ChatGPT. How about that? Whatever you prefer, we will do. Excellent. What you're seeing in front of you is my screen, and whatever I'm doing, we'll go through it together. So here's the scenario: you're running a website. That website could be anything: your blog, an e-commerce website, whatever it is. And a lot of requests are coming in. Actually, I'll put some diagrams up in a second. Give me a second and I shall open this diagram here so we can all see it together. Let's just double-check; this is where we are. Okay, perfect, let's get this open. Here you go. I hope you can see this.
I know we all know the various components and how things work in Kubernetes, but just as a bit of a recap for all of us, here's what we're going to be focusing on today. Imagine this is the current setup of our website. At the bottom you can see we've got some pods; let's say the red ones are the pods serving the traffic, that's our website running. Then we have a service sitting in the middle, in the gray bar at the bottom. A service is basically a thing that sits on top of the pods. It's an interface, and I think of it as an internal load balancer: if you have multiple replicas of your pods running in a deployment, the service acts as an internal load balancer, routing traffic to one pod or another. Something needs to do that, and that's where the service sits. A service is all internal; we don't usually expose anything outside the cluster using a service. For that, we use an Ingress, which I would call an external load balancer. The purpose of the Ingress is to take requests coming from outside the cluster and send them over to the container running our website. And the good thing about the Ingress is that it understands HTTP requests. So if somebody says, hey, I want to send a request to www.cncf.com/, I don't know, /checkout, it can understand the request: it can look at the path and the headers and make the routing decision based on that. So we'll look at the Ingress, we'll look at the service, because we'll create these, and we'll look at the pod. Now, the thing about the Ingress, which we're going to focus on today (here you go, next diagram; these aren't really slides, just some pictures of what we're going to see), is that an Ingress usually comes in two parts. There is the Ingress pod, as you see here. The Ingress controller is basically a normal deployment, and you can pick any Ingress controller you like; in this case, we're going to look at NGINX. Everything we talk about today is open source, and we also have a link where you can try all of this yourself; all the projects are open source. So the NGINX Ingress pod we're talking about is the open source NGINX Ingress controller pod. The point is that the cluster doesn't usually come with it: we install the Ingress controller, it runs inside the cluster, and it's attached to a service. So any time a request comes in from outside the cluster, it goes to this pod. And that's fine, but the problem is, what if you get millions of requests? This can become a bottleneck. Would you agree with that, Annie? I hope you would, right? It could become a bottleneck, because all the requests are going to this one pod. So if that happens, there are two things you can do. Number one, you can always have many Ingress pods running. (The questions that are coming in, I'll answer in a few minutes, so please keep them coming.) The point is, you can start off with a fixed number of pods: since the Ingress controller is a normal deployment, instead of running one Ingress pod in your cluster, you can run five, or ten, or fifteen, something like the sketch below.
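As a hedged illustration (the deployment name here is hypothetical and depends on how the controller was installed), pinning the controller to a fixed replica count is just an ordinary edit to its Deployment:

```yaml
# Fragment of the ingress controller's Deployment; everything else unchanged.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: main-nginx-ingress-controller   # hypothetical controller name
spec:
  replicas: 5   # run five controller pods instead of one
```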
And that's absolutely fine, you can do that, but it'll take up resources. You will have to allocate those resources, and the pods might not be doing anything during the times when there's not much traffic coming in. So it's a bit of a waste, and we don't like waste. What we should do instead is scale up when there are lots of requests coming in and scale back down when there aren't. That's what we're going to look at today: how we can do all of this and then scale up. So before we go any further, there are a couple of questions here. Can we go through them, Annie? Is that okay with you? That's perfect, we should go through them. Yeah, and I think the first one that came in is: can you route traffic other than HTTP or HTTPS using Ingress, for example a PSQL request? So yeah, thanks, Shadi, for that question. Usually you route HTTP and HTTPS traffic using Ingress. I'm not sure if you can do PSQL, but we can check that later in the stream; we can do a quick search together. Usually we deal with HTTP and HTTPS. There might be some controllers that can, because there are different Ingress controllers and different controllers provide different capabilities. Actually, you know what we can do: I'll share a reference with you which might be handy. So I work with Appvia and I also work with a company called Learnk8s; shout out to Daniele Polencic from Learnk8s. There are different kinds of Ingress controllers, and you can see here, this is a comparison sheet. All of this is open source; you can check it out and contribute yourself as well. Just search for the Learnk8s research and this will come up. In here is NGINX Ingress, and on the left you can see different types of routing mechanisms. So maybe some of them might support it; you can check it out, but it doesn't look like we can here. So different kinds of Ingress controllers may or may not support it. I hope I've kind of answered that question. Hopefully. Shadi, if you want to ask any more or you have extra questions, feel free to pop them in there. And then we had another one from Laurentinas, who is curious how they should define high traffic. Oh, very good question. So the question is: how do you define a high-traffic website? How do we know something is high traffic? That all depends on your application itself. For example, you can say, look, high traffic is if I get, I don't know, 100,000 requests per minute. It all depends on what your setup is and how many requests it can deal with at a time. For example, today we're going to say (I'll share the metrics in a second) that if you have 100 active connections inside each NGINX pod, you scale up. You can test this out yourself and see how much memory and CPU it consumes. This is just an example, but 100 requests per second is quite a lot, so then we scale up. So this is entirely dependent on your setup and your application, what a high-traffic website is. You can pick a different metric, and we're going to touch upon that later on today. Perfect. And I think Shadi continued with, I suppose, yes:
"I have configured an Ingress controller once to route other ports than 80 and 443. I used it to route port 3100 UDP there." Okay, cool. So it looks like it can be done. Excellent. All right, sounds good to me. Shall we carry on then, Annie? Yeah. There was a comment at the beginning saying it is like a L7 load balancer; there's not a question there, but obviously if you want to say something. Yeah, yeah, that's correct. What they're saying is absolutely right: it's a layer 7 load balancer, that's what it's doing. A service doesn't understand that; services do the layer 4 stuff, and the Ingress does the layer 7. So what they're saying is absolutely correct. Perfect. And then there was an ask from Dennis to share the materials. I don't think we showed a Google Doc here before, but if there are any materials, we'll get them linked to everyone attending, and maybe you can share some learn-more resources at the end as well. I will definitely, that's great. All right, excellent. So here's the thing. What we're going to do in this demo, I hope you all understand, is deploy an application, a normal application as you see here, and then we're going to throw a lot of traffic at it, and what we want is to scale this pod up. There are a few steps to it, which we'll talk about as we go along. But first things first, we should deploy our pods, right? So let's deploy the application to the cluster, and then we can see what we can do in terms of scaling. Here we go. Let me just make sure it's all nice and big and everybody can see it. If you can't, please let me know, and if you want the background changed from dark to white, we can do that too. But here's what we're going to do. First, I need a Kubernetes cluster. Luckily for me and for all of us (you can use any cluster you like), in this case we're going to use a Minikube cluster. Nothing is running on the cluster right now, so let me just make sure we have enough space. So: kubectl get nodes. This is Minikube, right? A local cluster running on my machine, and it's got nothing running inside, just the components to run Kubernetes itself. That's all it is; we have nothing in there. So I'm going to cd into the demo directory, just to make sure we're all in the right place. Yeah, excellent. That's where we are. Okay, cool. So first things first, we need to deploy an application, deploy a website. Let's just pretend this is our website, our super fancy website that's going to get lots of traffic. Let me just stick this here. I'm sure you've seen deployment files plenty of times in your life. We're using Stefan Prodan's podinfo (shout out to Stefan Prodan). I'm sure you might have seen this. Basically it's just a website that's going to run, a static website, and we'll deploy it on our cluster. It is a deployment, so we could use it to scale replicas if we wanted, but in our case we're only going to run one replica. We've got some other information in here as well, like the labels on the pods that will run, but the point is we're going to create this deployment first. Let's go: kubectl apply -f, and something like the sketch below is roughly what we're applying.
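For reference, here is a minimal sketch of the kind of deployment file being applied, a single replica of podinfo serving on port 9898; the exact manifest in the demo repo may differ:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: podinfo
spec:
  replicas: 1                # one pod for now; the scaling comes later
  selector:
    matchLabels:
      app: podinfo
  template:
    metadata:
      labels:
        app: podinfo         # the service will select pods by this label
    spec:
      containers:
        - name: podinfo
          image: stefanprodan/podinfo   # the demo website
          ports:
            - containerPort: 9898       # the port the container serves on
```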
We'll create the deployment. Hopefully, if everything's all good, we can do kubectl get pods, and it's in ContainerCreating; there's going to be a lot of that during the demo, of course. We'll just make sure the container has come up. It's taking a little longer than I expected. Luckily, it's all come up. Okay. So the pod is now running, and here's a little tip to share with you all: how do I know it's running correctly? Because I'm going to be deploying a number of components (a service, then an Ingress) and then we'll try to access it, instead of just jumping through everything, I can test each piece as I go, right? Because this is local and I have access to the cluster, I can use kubectl port-forward. I say which resource I want to forward to, which is a pod, this pod; let's stick that in here. Then the syntax goes: I pick any port on my machine, 8888, and then the port the container is running on, which is 9898 in this case. Let's run that. So 9898. Now this is forwarding, so if I send a request from my laptop (here you go, localhost:8888), this is the website running inside the pod, right? So the first step is all good: we have a pod running and it's serving a website. This is the website people really want to visit, because they want to see this cute little creature on here. That's what they want to visit, right? What we're going to do next is make this available outside the cluster so people can send requests to it. For that, as we've talked about before, I need to create a service and then I need to create an Ingress. So let's go ahead; nothing too extensive at the moment. We have another YAML file (there are going to be quite a few of those today, as is always the case; I'm sure you agree, Annie, lots of YAML files). We'll create a service. A service, as we said, is an internal load balancer, and there are a few things in here we need to make sure match up, so it picks the right pods. The deployment we created has these labels, app: podinfo, and then we have this bit here, which is the target port. When I deploy the service, the service will be attached to those pods. So let's go ahead and create the service. We can stop the port-forward first; we don't need it anymore. So I'm going to k apply -f deployment... oh no, we've done the deployment: service.yaml, right? That's what we're applying. Now the service should have been created; I can do kubectl get service just to make sure everything's correct. Now, this is a service of type LoadBalancer, and there are different types of services in Kubernetes: there's ClusterIP, there's LoadBalancer, there's NodePort, and there's one more which I can't think of right now, and then headless services as well. Yeah, that's right. This one is using the LoadBalancer type, and it's running locally; there's a sketch of it below.
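A hedged sketch of the service being applied: the selector must match the pod labels from the deployment, and targetPort must match the container port; the service port itself is an assumption here.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  type: LoadBalancer    # as in the demo; ClusterIP is enough once an Ingress fronts it
  selector:
    app: podinfo        # matches the pods created by the deployment above
  ports:
    - port: 80          # the port the service exposes (assumed)
      targetPort: 9898  # the port the podinfo container listens on
```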
If this were running inside a cloud provider, the cloud provider would actually go and provision an actual load balancer and attach it to the nodes running these workloads. And you can imagine: if you created multiple services of the LoadBalancer type, you'd end up with so many load balancers that it could become quite expensive. This is why, in order to expose your applications outside the cluster, we use an Ingress. But we still need the service, because we could have multiple replicas running. Now, I've deployed this, and in order to check everything is correct I can do the same port-forward as before. Yeah, and there was actually a question from the audience: Oliver was a fan of the comparison table of Ingress controllers; is it available publicly, and could we get the URL? Sure, let's just do that right now, because I'm just looking at the other screen. Here we go. I can stick this in the chat, right? So I'll stick it in the chat and we can share it. That's the Ingress one, and there are actually quite a few other things in there you can check: there are Ingress controller comparisons, managed Kubernetes comparisons, and comparisons of service meshes. And yes, it's all open source as well. Of course, please feel free to add to it and send pull requests; all the information is on that page. Okay, excellent. All good, right. So this is the service that's running, and I just want to make sure the service is configured correctly. I can do the same as what we did before: port-forward on 9000 and test it out. So let's just quickly do that: localhost:9000. As long as we see the same page, that means we have done everything correctly, so the service is configured correctly, and so far we're looking good, right? But what we really want to do is install an Ingress. I'm going to deploy a few components first, and then we'll break and take questions in a few minutes, if that's all good. So here's the thing: I want to deploy this Ingress, and the Ingress is something like this. I'm going to show you the rules in an Ingress. You saw the service file, and now I'm going to show you the Ingress file, which looks something like this. We have the kind of resource, some metadata, but this is the important bit in here: the rules, the rules for where we're going to send requests. These are all HTTP rules, and they go like this. All it says is: if somebody sends a request to this path, just the base path (you could put anything you like in here, /blah), send the request to the service podinfo, the one we just deployed. That's where we want to send it. Now, the other thing is the host: the request should be coming for example.com. So this is not just path-based routing, this is host-based routing: if the request is for example.com, then we send it to the service and then down to the pod. For reference, there's a sketch of this file below. But here's the thing: I can apply this to the cluster right now. So let's just stop this.
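A hedged reconstruction of the ingress.yaml being described, with host-based routing for example.com to the podinfo service (the backend port is assumed to match the service sketch above):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: podinfo
spec:
  ingressClassName: nginx        # must match the class the controller watches
  rules:
    - host: example.com          # host-based routing: only this host matches
      http:
        paths:
          - path: /              # the base path; any prefix would work here
            pathType: Prefix
            backend:
              service:
                name: podinfo    # the service created earlier
                port:
                  number: 80
```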
We don't need the port-forward anymore. Now, k is the alias I use on my machine for kubectl, so: k apply -f ingress.yaml. This deploys it to the cluster, and it just goes in and stores that information inside the etcd database. But the cluster doesn't know what to do with it, because there's nothing inside the cluster that tells it what to do with it. There's no controller running for it; we haven't installed anything yet. This Ingress pod is not running yet. So how do we do that? We can install it on the cluster by going to the project's website; there are a number of ways of installing it, and what we're going to do is install it using Helm. Helm is a package manager, as you might already be aware, and the good thing about Helm is you have what are known as Helm charts, and a chart installs all the components necessary. That's what we're going to do, so let's just quickly install this. I'm going to do helm; I have the command on my other screen, so I'll copy it from there. This first part is just adding the repository, which I already have, and then I can use the helm install command to install the NGINX Ingress controller on my cluster from that repository. That will install a bunch of things in my cluster: the controller that runs inside a pod, the services, and anything else it needs. And you can set some variables at the same time; I'm also telling it which ingress class to use. You can define the class that an Ingress uses, and I'm just setting that right now. So once I run that, it should go ahead, get all the bits it needs, bring all the configuration down, and apply it to the cluster. Just for demo purposes, I'm installing everything in the same namespace, the default namespace. Usually you would deploy things into different namespaces, and that is the right way of going about it, but this is just for demo purposes. If I do kubectl get pods, what I should see is this bit here: we've got podinfo, and this is the NGINX Ingress controller pod that's running. So far, we're all good: we've deployed the Ingress controller. And here's what we're going to do next. Because this is Minikube, running locally, and the name of this ingress class is nginx, let me see if I can get to the pod I deployed. I can do a number of things here. Because this is local, I could get the IP of the Minikube VM and use that to access it. (I see some good questions coming up; I'll answer them in a second. I'm just going to quickly show you the website first, because our rule was: if somebody sends a request to example.com, and this is running locally, send it to the pod.) So there's a neat command I can use. We have kubectl get service; as I said, the Ingress controller sits behind a normal service, which is in here. You can see this LoadBalancer service called main-nginx-ingress. You know how we did port-forwarding? Similar to that, Minikube has: minikube service main-nginx-ingress --url.
This will just give me a URL I can use locally to access the website, just to make sure what we've done is absolutely correct, right? So it's going to run (there's a spelling mistake in here) and it should run in a second and give us the URL. Here you go, it's giving us two URLs. So I can access this and see if the website is there, but the problem is I can't just open it, because I have that host property inside the Ingress. So what I can do instead is curl it. Let's make sure this command keeps running, and as long as it's running, I can curl and pass in a header (now we're building up to the scaling part in a second): Host: example.com, and then I stick in the URL that it gave me. I know this is not as exciting as seeing the cute little creature in the browser (let me see if I can pop back to that; it's not as exciting as this), but what we can see is: look, there you go. That's the message, "greetings from podinfo". This is where the logo is coming from. I hope you all agree that's what we're doing. So far what we've done is deploy our application, and the next bit is what we need to do for scaling. We'll take a couple of minutes to answer some questions real quick and then we'll move on to the next part. We've got this setup complete: we've deployed our application, we've deployed the Ingress, we know the Ingress works; we've sent a request through it and we can see it responding. But that was only one request. What we're going to do in a few minutes is pretend we have... not pretend, we're actually going to send a lot of requests and see how we can scale up. But for that I need to do a few things: I need to decide how I scale up, I need to pick a metric to scale on, and then I need something else to help me do the scaling. And I think, Annie, this is maybe a good point to answer some questions. There's one question I'll start with, if that's okay, because it's one we can answer really quickly: can I add multiple Ingresses, and if so, are there any precautions when using them? Yes, you can add multiple Ingresses in the cluster. What you have to do is in here: you have to define which ingress class name each one uses. What do you have to watch out for? Well, just make sure you use the right ingress class name for each Ingress. I've seen examples in the past where people have run multiple Ingress controllers in the same cluster because of different requirements from different kinds of applications. But everything else you have to watch out for is the same as when you set up any Ingress; for example, don't declare the same host twice. I don't think there are that many pitfalls. I wanted to answer that one because it was related. Any other questions we can quickly answer? Yeah, of course. And I also appreciated that Mikhail provided a link to the Kubernetes docs for Zoom Jane to check out as well; that's always great to see. And there was also a question before to get the link to the previous resource, and Jillian helped out with that one, so thank you so much there as well. Absolutely, thank you, Jillian. Yeah, and then we have the questions.
So there was a question which goes: how does Kubernetes know that I'm running my cluster in a cloud provider such as AWS, and how does it know what type of load balancer to provision? There was some helpful information already provided by another commenter, but obviously let's answer the question here as well. Yeah, so I think Alvin asked that one. When you deploy inside Kubernetes on a cloud provider, what happens is that when you create resources, you have things called controllers running inside the cluster. There are built-in controllers, like the replication controller: if you create a deployment, the replication controller is watching etcd to see if there are any changes for it, and it creates the pods. The same thing happens with a cloud provider. They have their own controllers, extra bits of logic running inside the cluster, so when you deploy something, it acts upon it. It says: I need to create a load balancer, and the cloud provider already knows what kind of load balancer to create and what configuration to pass in order to create it. You can see here in the spec there's the metadata information, and there's an annotations section, which we have included here. In the annotations section you might have to give it some helpful hints if you need something additional. Each cloud provider will ask you to do something slightly different in the Ingress configuration; the Ingress is the only part that can be slightly specific to each managed provider, but everything else stays the same. In the annotations section you might have to add some extra bits. Hopefully that answers it. Yeah, perfect. And then the last question so far comes from Alejandra, who asks: is it necessary to deploy a service in LoadBalancer mode if I'm using my cluster within a public cloud? In this case, wouldn't it be enough to configure an Ingress pointing to the service, so the load balancer assigns the external IP to the Ingress instead of the service? Oh yeah, it's a very good question. Again, I just did it that way to show it, but no, it's definitely not necessary to deploy the service in LoadBalancer mode. We don't want to do that either, because we're going to have an Ingress that takes care of everything. So you don't have to deploy it in LoadBalancer mode; that's absolutely correct, because with a plain load balancer service you can't do things like authentication or any of that stuff. So yeah, that's what we're going to do. Perfect, and we got a comment from one of the earlier question askers that it makes sense. Thank you; perfectly done there. Perfect, yeah. And then we have Parul asking: if this repo is public, can you please add the GitHub link as well? Sure, I will do. I'll share the link in a few minutes when we take a break. There's a whole blog that Daniele put together on this stuff; I'll share the link in a few minutes, if that's all good. I'll have to dig it out. It's somewhere here, but you'll have to give me two minutes to do that. But we can continue on if that's all good. Yeah, sounds good. Thank you, Annie, for keeping it ticking along; excellent stuff. Okay, cool. So what we've done so far is deploy our application, but here's the next bit: what we want to do is scale our Ingress.
I'm going to talk about the Ingress, but you can assume all of this also applies to anything else, not just the Ingress; you can apply it to any kind of pod running inside the cluster. You can pick any pod you like and do the same. We're using the Ingress just because it's a good use case. Before I can do that, the question is: how do I scale up? This is what we want to do; I'm going to put this diagram up. Here you go. Imagine we have a deployment, our Ingress deployment, and it's running some number of pods; let's just say it's running one. What we want to do is query some metric. (I see some questions about service meshes; we'll answer those later.) So we want to query some metric, but how do we get that metric? Well, there are a few things you have to do. Number one, your application has to provide these metrics. Usually, if it's a website, you create an endpoint in your application at /metrics and add whatever metrics you care about in there, and I'll show you an example of that. The other thing is that you need something to scrape those metrics and store them, and for that we're going to use yet another open source project, called Prometheus. Prometheus monitoring, that is; I don't know if people have seen the movie. I haven't, but apparently it's a very good movie. So we're going to use Prometheus, and the good thing about Prometheus is that it comes in different parts. We can run it on the cluster: you have what's known as the Prometheus server, and that's basically the central component for everything. It scrapes the metrics and stores them in the format it needs. Also, Prometheus can talk to the right components (by which I mean the Kubernetes API) and discover all the services and pods running in the cluster, so it can go and find things itself. The main thing is that if containers running in the cluster expose metrics on an HTTP endpoint, usually on the /metrics path, Prometheus can collect all those metrics regularly and store them. And it also gives you a really cool dashboard, which we're going to use in a second. So basically we need somewhere to store these metrics. Then, once we have a metric, there's another kind of resource inside Kubernetes called the horizontal pod autoscaler. Really, in Kubernetes you have three kinds of autoscalers: you've got the cluster autoscaler, which scales the cluster itself if you're running out of resources; you have what's known as the horizontal pod autoscaler, which just increases the number of replicas of your pods; and then there's the vertical pod autoscaler, which we're not going to touch upon today, which increases the size of the pods, allocating more memory and CPU and that sort of thing. We're going to talk about the horizontal pod autoscaler: there's a metric we're going to query, and we're going to use the horizontal pod autoscaler to act on it. So how do we store, and collect, these metrics? (A quick aside on the collection part below.)
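As an aside on collection, and this is an assumption about the default setup rather than something shown on screen: the community Prometheus Helm chart's stock scrape configuration typically discovers pods through annotations on the pod template, along these lines (10254 is the usual ingress-nginx metrics port):

```yaml
metadata:
  annotations:
    prometheus.io/scrape: "true"     # opt this pod in to scraping
    prometheus.io/port: "10254"      # where the metrics are served
    prometheus.io/path: "/metrics"   # the endpoint to scrape
```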
How do we get them? Well, again, lovely Helm; we're going to use Helm and install it. I think the screen is big enough. So: helm install. Ooh, helm install, let's just do this. For the sake of saving a couple of seconds, we're just going to install Prometheus. Once we install Prometheus, a bunch of pods will be spun up for us. So let's just make sure everything is good: k get pods. You can see in here, as I said, it's installed a number of components. There's the server, the bit we really care about (well, we care about pretty much everything), and there are a couple of other bits doing different things: we have Alertmanager, if you want to send alerts. If you haven't tried Prometheus, I would highly recommend trying it out. If you're looking for a monitoring solution, there are many out there; check out landscape.cncf.io for a bunch of them. But this is a really good one to get started with. As you can see, for me it was quite simple to start with: I can install it inside the cluster using Helm, and I'll just wait for it to come up. Everything's almost there; the server is almost up. So once the server comes up, we're going to start collecting some metrics, right? But what metrics can we collect? Because here's the thing: we can decide ourselves what we're going to scale on. Luckily for us, I don't have to modify the NGINX Ingress deployment itself, because it already exposes a number of metrics on the /metrics endpoint, and one of them is nginx_connections_active, the number of active connections. So if there are lots of active connections, I can scale up. And we can define the rule ourselves: we can say, hey, if there are more than 100 active connections inside each pod, scale up; that's quite a lot for one pod to take. That's what we're saying, right? So we don't have to do anything here, but if this were your own application, you'd have to go in and make sure you're exposing the right metrics. For example, let's just go in here and make sure our pods are up and running. Luckily, so far we're good; everything is running. Let's check: minikube service prometheus-server. We'll get that lovely UI where we can go in and have a quick look at some of what's running. So here you go; it's going to start up in a second. Here we are, that's what it looks like. I can query something. Let's say I want to see how many CPU cores this machine has: machine_cpu_cores. Let's just execute that. It looks like there's just a line here, but really all of these are the labels that go with the result we asked for, and you can see, oh, this machine has eight cores; that's the value it's giving. And we also have a lovely graph in here; we can see this graph, and it looks good, right? But how about the bit we actually care about? Well, the metric we care about is the NGINX one. Ooh, let me see if I can type this correctly: nginx_connections_active. Hey, look at that. Magic. All I did was install Prometheus.
And Prometheus managed to grab the metric and understand it already, because the metric is already exposed; it just picked it up. So let's execute this and go to the table view real quick. One; you know, we sent one request, remember? That podinfo page we opened, the request we sent. So there's just one active connection. That's kind of cool. But we're going to pump a lot more in there. (I'll answer some questions in a minute; let's just get through this.) You can see there's only one active NGINX connection. Now, how are we actually going to pump in a lot of requests? For that, we're going to use Locust. Locust is another open source tool, written in Python, that you can use for load testing, and it's really great. You basically have a locustfile; if you haven't checked it out, definitely do, if you're looking for a load-testing tool. I'm just going to briefly show you an example here. If you've written Python before, it might seem familiar; if you haven't, it's fairly straightforward. You define a locustfile.py, and in there you write all the configuration: go to this URL, go to that URL, here's how many users I want to have, this is how many concurrent users I want. And the good thing is we can run this in Kubernetes as well. How can we do this in Kubernetes? Well, I'm glad you asked. We write this configuration: first we deploy more YAML, Locust itself, which is basically a Python package you could otherwise install with pip. Locust will run in our cluster. Then we have a service, because we need the UI to drive the test. And then the locustfile.py that I was talking about, the one where we write what it should do, is in this Locust ConfigMap right here. A ConfigMap is something we use in Kubernetes to store configuration data, and the ConfigMap in here is just saying: send the requests to example.com. That's it, send the requests to example.com, and then we're going to ramp up, right? So let's deploy Locust. It's making sense so far, Annie, I hope, right? Excellent, so perfect. For me, this will keep running, and let's just go to the right place. Make sure we're good; one sec. All right, yeah, so let's apply that. Ooh, kubectl (I just want to make sure we do the thing we're here to do, because we might be running out of time): apply -f, come on. And kubectl get pods: Locust is now up and running. That's just our load-testing thing, right? That's how we're going to do the load testing. But how do we actually do the scaling? Well, there's the horizontal pod autoscaler, and the horizontal pod autoscaler is very good. If you search for Kubernetes HPA, that's the Kubernetes horizontal pod autoscaler; you can check it out yourself later on. The thing with the horizontal pod autoscaler is that I can define things in here; if we scroll down for a second to the right place (okay, where are we? Am I in the right place? Yeah), we can say stuff like: hey, if the application is consuming more than 50% of its CPU, scale up, or if it's using this much memory, scale up or scale back down. A hedged sketch of that is below.
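For reference, a minimal sketch of a plain HPA doing the CPU-based scaling just described; the target deployment name is hypothetical:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: podinfo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: podinfo            # the deployment to scale
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50   # add replicas above 50% average CPU
```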
What the HPA doesn't do so easily is custom metrics; you can do custom metrics with it, but it's a little bit more involved. There is an easier way of doing all this, and this is where something called KEDA comes in: the open source project for Kubernetes event-driven autoscaling. It doesn't just give you a few fixed things to scale on. It basically feeds into the horizontal pod autoscaler; it's not doing the autoscaling itself, because the purpose of the horizontal pod autoscaler is to increase the number of replicas in the deployment. So what we can do is deploy KEDA in the cluster, and it comes in multiple parts too; I have a diagram to show. Let's go in here: KEDA. Once you install it in the cluster, it has a few things. It has a metrics API, so it can serve metrics; for us, the metrics come from Prometheus, and that's fine, KEDA can consume those. It has an adapter to make sure the metrics get to the right places. And it has a controller, the bit that runs inside the cluster that says, okay, this is what I need to do, and it installs a bunch of custom resource definitions, which we're going to touch upon. Also, the thing we're not going to cover today, but something you should really check out, is KEDA's scalers. For example, if I go to the scalers page, you can see there's a whole bunch of them. You might have a Kafka queue sitting outside the Kubernetes cluster, and you might say: hey, I want to scale up if there are more than a hundred messages in the Kafka queue. And you can define that. Or you might base it on a SQL query and scale up on the result. That's what makes KEDA excellent; definitely check out the open source project KEDA. And installing it is the same as before: really, we're just going to install it using Helm. (We'll answer some questions in a second; I just want to make sure we get to where we want to get to.) So I'll install this inside the cluster, give it a second, and it should bring up all the components we need. And clear. Okay, get pods. So here you go: you can see the operator starting up. That's the bit that's going to figure out when to scale up. But how does it know when to scale? Well, this is where we use (let me just go here) something called a ScaledObject. This is not Kubernetes-native; once you install KEDA, it installs these custom resource definitions. In this case, what I'm telling it is: hey, your target is this main-nginx-ingress controller deployment, right? That's the one we want to scale. We give it some information about how many replicas we'd like as a minimum and maximum, what the cooldown period is (after which it can go back down), and what the polling interval is, one minute or whatever it might be. But the main thing is the trigger, and the trigger is Prometheus: go to the Prometheus server and look for this metric name, nginx_connections_active. That's the metric name, and then this is our query: we take the active NGINX Ingress connections we talked about before, averaged over one minute, and the threshold is 100. So if the active connections for a pod go over 100, scale up; something like the sketch below.
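A hedged reconstruction of the ScaledObject from the demo; the deployment name, the Prometheus address, and the replica bounds are assumptions:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: main-nginx-ingress-scaler
spec:
  scaleTargetRef:
    name: main-nginx-ingress-controller   # the deployment KEDA will scale
  minReplicaCount: 1
  maxReplicaCount: 10
  cooldownPeriod: 300       # seconds to wait before scaling back down
  pollingInterval: 60       # how often the trigger is checked
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus-server.default.svc.cluster.local
        metricName: nginx_connections_active
        query: sum(avg_over_time(nginx_connections_active[1m]))
        threshold: "100"    # scale out above 100 active connections per replica
```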
And KEDA will decide how many replicas it needs to have. That's what we need to do, right? So if I go in here and make sure we've got the right pods (pods are running, everything is good), I can deploy this ScaledObject: k apply -f scaledobject.yaml. So the ScaledObject is going to sit in the cluster; let's just give it a second. Taking longer than I'd like. Ooh, there you go. Even when it takes a split second longer during a demo, it feels like an eternity. But that's where we are: everything is now set up in our cluster. Let's just quickly scale up, because I know we're running out of time. But there's one more thing I'm going to do first. What we want to see is... let's bring up Locust. It's deployed with a service as well, so it's going to come up in a second, and then we'll have a UI from which we can start sending requests. Let's just put this on the side real quick. How long have we got, Annie? How are we doing for time? We have 10 minutes left, but we already have five questions to answer. So yeah. Okay, cool. So let's just quickly do this demo and then we'll answer the questions: five minutes for the demo, five minutes to answer questions, finish bang on time. Does that sound like a plan? Absolute perfection. Okay, excellent. So we've got a couple of things here. We have Prometheus, which is running. So let me just quickly do this. Okay. How about this, right? Look at that. Nice. What we're going to do is start sending requests; this is the Locust UI. But what we really want to see is the pods scaling up, and we're not just going to do kubectl get pods, that's a bit boring. So I'm going to do something a little fancier. Let's quickly go to the UI, the scaling dashboard; this is also in the repository. It's a dashboard that Daniele put together, so I'm going to run it and you'll see it in a second. Let that spin up. Give it a second. Localhost:8001. Oh, that's not great. How about that, right? This is basically pulling information from the cluster itself: all the pods running inside the cluster, that's all it's showing you. So these are all the KEDA operator pods and so on; it all matches up, and we have one NGINX Ingress pod. So once we start sending requests, what we should see is the number of requests increasing, and then eventually we should see the pods scaling up, because we've defined everything: we've got metrics being collected, we've got the ScaledObject, and we have the pods running. So here's what we're going to do: we'll send the requests to the Ingress directly, via the host here, which is the service itself. And we'll do something like this. Let's have a peak user count of 2,000, why not? As you can see, I've tested this out: spawn 10 users per second. So we're going to start swarming in a sec, and then Locust will start giving us some information; you can see it start ramping up. What you can see is the total number of requests. I'll have to zoom out slightly so I can show you all the information. Total number of requests, ooh. Let me make it a little bit bigger.
Total number of requests going in, and response times. It's taking a little while, because what we want to do is... if I execute this again, we should see in a second a bunch of requests coming in. So let's just execute that. It'll take a second, and what we should see is a lot of requests coming in; there's a bit of a lag here, as you can see. And then, if I go in here, let's open this: kubectl get pods. What we should see is... oh, look, the Ingress is already starting to spin up more pods. We didn't do any of that; you saw, I didn't touch it. KEDA came in and started to see: hey, you have a bunch of requests and they're going up. As the requests went up, response times went up too, but active connections were also going up. So let's just quickly execute the query again. Oh, there was a flat line here, but you can see the active connections shot up real quick. And what we have now, if I hop back onto the dashboard, is three pods that have come up. I'm not making this up; this is all happening in the cluster. So if I do kubectl get pods, we have a bunch of these pods that have come up, and more will come up as they're required. What KEDA does is actually drive the horizontal pod autoscaler: it creates that autoscaler resource, and that updates the deployment directly and starts scaling it up as needed. And that's what's happened: we've gone from one to three. If I let this run for a while, you can see three pods are dealing with it, 100 requests per second; they're handling it fine. Usually, if I let it keep running, we can pop open the horizontal pod autoscaler and watch it scale further. And you don't just have to do this for the Ingress; you can do it for anything you like. How does that sound? It looks like the demo worked, right? We look good, we scaled up. You can do this for anything; it doesn't just have to be the Ingress. So I think we should start answering some questions. Oh, look, we've got more pods. What I'll do while we answer questions is stop the load, so by the time we're done, we should see it scale back down; we don't want to send any more requests. All right, go for it then. Great, perfect. I'm glad the demo worked; there's always a bit of nerve-wracking about that. So there's a question that came in, which is first in line: please explain how all of this comes together with a service mesh, like Istio, for example. So, a service mesh is used for a number of other things. This autoscaling is not part of a service mesh; it's not one of the features of a service mesh. But you can still have a service mesh running; you'll just have to make sure you're using the right ingress gateway, or configure the Ingress itself correctly, and that configuration doesn't touch any of the autoscaling stuff. Service meshes like Istio don't have these autoscaling capabilities; for that, you have to use this. I hope that answers the question a little bit. But yeah, apart from that, everything else is the same; it's not much different. Great. Then there was another question: I thought the default installation of the NGINX Ingress controller using Helm installs the controller as a DaemonSet, that is, one pod per node. Yeah, I think it's a deployment. I can't remember; you could be absolutely right.
If it's a DaemonSet, it will start one pod per node, but the thing is, I have one node here because it's Minikube, which is fine, so there'd be one pod, and it's not necessarily true that one pod per node will be able to deal with all the requests coming in. Just for demo purposes I'm running it this way, but you can go with the DaemonSet; it might be enough to handle the traffic, or it might not, in which case you'd want to scale beyond what the DaemonSet gives you. Yeah, that's good. And then Jose asked: can we configure the time interval in which Kubernetes scales our application in the cluster based on usage metrics? Let's say, for example, we have some deployments that need to react faster to an increase in incoming load than others. So the question, if I understand it correctly, is whether we can scale specific deployments faster than others. Is that correct? Did I get that right? I guess it's the time interval they're asking about. All right, okay. So the polling interval: yes, you can change the polling interval. I can't remember the lowest value you can set, but yes, you can change it and go much faster. Very well picked up there. And yeah, you can change that. Yeah, and then Rama asked: can I use Kibana for visualization? Oh yeah, you can use anything you like for visualization. Usually Grafana goes well with Prometheus; I know the two names sound similar, Kibana, Grafana, but Grafana pairs quite well with it. I've never used Kibana with just Prometheus, but yes, you can use it. Great. And then Jules asked: is it possible to scale up across different cloud providers instead of relying on one? Very good question. So the question is, I'm going to assume: can I use KEDA to scale across multiple cloud providers? Is that correct, do you think? I think it was maybe during the KEDA part, or just before it, that the question was asked, but they may be looking for any solution that can help them with that. There is some work that's been done there; let me have a quick think, or we could quickly pop into ChatGPT and ask it. If you're going across multiple cloud providers, there is more work you have to do. There are probably some projects out there that can help with it, but I can't think of one off the top of my head, so honestly, no idea. Multi-cloud stuff is always harder; you might have to check out some other projects to do that. Yeah. Multi-cloud is definitely a very big topic, so there should be a lot of content around it. And then there was another question, one from Diego: if there's a DDoS attack that sends a lot of requests, how could we stop the autoscaler from draining the budget; would a WAF in front of the Ingress be a solution? Yeah, a web application firewall, that's an excellent suggestion. You can definitely use one to protect yourself, so those requests never reach the cluster; there are a few options for that. So yeah, Diego, very good point. Great. And then we had Oliver asking: what is the URL of the Git repo that has this code? I think it's the same as the previous one, if we can share it. Yeah, I'm going to find it in a second and I'll share it; I'm just looking for it right now. Yeah, no worries. Because yeah, it's here.
There are two more questions to go, and we're at the hour, but if we're super quick we can maybe tackle both. So: what is the difference between the autoscalers provided by a CSP and KEDA? Is KEDA compatible with Traefik ingress and an OQA cluster? CSP and KEDA; I'm not aware of CSP, but... You mean cloud service provider? Oh, the cloud service provider, okay, that's my guess. Yeah, so I've seen that cloud service providers are integrating KEDA into their solutions, but the cloud service providers don't provide this by default; you have to add it on top. There's nothing the cloud service providers give you out of the box to scale your application like that. I'm just sending the URL as well. Yeah, perfect. Let's see your URL. There you go. Good. It's a bit.ly URL, nice and easy to find. Perfect. And then to the last question. Oh, there we go. I sent it; so for people who aren't on YouTube, the link is bit.ly/kcd, and then, is it? Dash scaling. Dash scaling, yeah. So let me put it up in here. Here you go, that's the link; you can screenshot it, do whatever you like. So here are the links: appvia.io, where we have a bunch of blogs, check it out, and learnk8s.io. All the resources are in here. The demo at the bottom is bit.ly/kcd-scaling, so you can check that out. Perfect. And then just a quick 30-second answer or so to the question: do we need to scale up other cloud-based Ingress controllers, like the AWS Load Balancer Controller? It all depends. If it can handle the requests coming in, that's fine; otherwise, if you have control over it, you should scale it up. I mean, Ingress controllers are usually very good at handling traffic if you have the right number of instances running; sometimes you might have to boost them up, sometimes not. But this is just one example; you should really look at whether you need this for your workloads, depending on what's happening in your cluster. If your cloud provider's controller can handle it, and usually they are good, you're fine. Perfect. And well answered, and quickly too, because we are out of time. So let's start wrapping up. Thank you, everyone, for joining the latest episode of Cloud Native Live. It was great to have a session about operating high-traffic websites on Kubernetes, and as always, particularly this time, I really loved the interaction and questions from the audience. So many questions; it's great that we got through them all. As always, we'll bring you the latest cloud native code every Wednesday, so stay tuned for more great sessions in the coming weeks. Thank you for joining us today, and see you all next week. Thank you.