So they're also one of our sponsors for today. So listen to this talk. Let's give them a round of applause.

Thank you. Thanks for the introduction. And thank you, everyone, for staying so late on a Saturday; we appreciate that. So today's topic is Container Native Load Balancing on GCP. Before we go into that: whenever we speak to customers of Google Cloud, because that is our business, we like to concentrate on the unique things, the differentiators of GCP, the places where Google Cloud outshines the other cloud providers right now. Two things we find customers really convinced by, which quite resonate with them, are the global VPCs found on Google Cloud, where the networking is fast and the setup is hassle-free, and Kubernetes as a service, the managed Kubernetes offering GKE, which is much more mature compared to other cloud providers. Container-native load balancing is something which combines the best of both those worlds: it takes some of the unique networking products on Google Cloud Platform and combines them with your applications running on Google Kubernetes Engine, and together they enable your applications to perform better, with much lower latencies. So let's see how all of this fits together.

Right. At the heart of this is something called network endpoint groups. If one looks at that slide, the left side, entirely in blue, is the traditional way load balancing has been implemented all these days. We have a load balancer which is front-facing, where the user traffic hits first, and the backends are simply VMs, virtual machines, as you see on any cloud or even on-premises. Most of the time, the way load balancing is done is that once the traffic hits the load balancer, it is distributed...

Sorry, let me increase the volume, because some people are saying they can't hear. OK. Am I audible to everyone? OK. Thank you.

So yeah, when we speak of traditional load balancing, in most cases the VMs happen to be the backends: the traffic, after hitting the load balancer, is redirected to one of the VMs, and the VM is the thing taking the traffic. And this stays pretty much the same when we start moving our applications to containers. When we say we are containerizing our applications, they are still running inside the VM, on its operating system; there is just one more layer added, called Docker, or containers. So it doesn't make sense to still be relying on the virtual machines to actually load-balance the traffic, right? That is the point we try to emphasize today in our presentation.

If we move to the right side of the slide, there is something called a network endpoint group. On the left-hand side, the traffic first hits the load balancer, then comes to the instance group, which is a collection of multiple virtual machines, and from there it is load-balanced to one of the virtual machines. Once it reaches the virtual machine, kube-proxy, which is part of Kubernetes, consults the iptables rules
and tries to reroute the packet from the virtual machine's interface to one of the pods, the containers actually hosting your application. So if you count here, there are one, two, three hops, and the fourth and final hop is the pod: your request takes four hops. Contrary to that, on the right side, if we count the number of hops the network packet takes, it is only three. The instance-group layer has been completely removed from the picture. The way this is done is that instead of having your virtual machines as the backends for your internet-facing application, you can directly have the pods as the backends of your load balancer. So instead of health-checking the virtual machines and distributing the traffic across them, the load balancer directly health-checks the pods which are actually hosting your application, identifies a healthy pod, and load-balances among the pods. That way we save one hop in the network round trip, and that way we expect better performance.

The next thing we want to emphasize today is that we have actually put this feature to the test. It went generally available earlier this year; I think it was announced into general availability in June. So we thought it would be a good time to test it.

Before we go deeper into this, there are a few constructs we would like to walk you through. When we create a Kubernetes cluster in Google Cloud, there are essentially two kinds of GKE clusters. One is called a routes-based GKE cluster, which was the traditional way of doing it until late last year. When you create a Kubernetes cluster, as most of you might know, there are three different IP ranges: one for the hosts themselves, one for the pods, and one for the services. When you create a routes-based GKE cluster, GKE installs some routes on your behalf into the VPC in which the cluster sits. The way the routes are installed is shown on the slide: Google automatically creates one route per node, saying that if traffic is meant for a pod running on a specific node, send the traffic to that node, and the node will take care of rerouting it to the pod. That's the traditional way.

Then there is the next step, called VPC-native GKE clusters, and that's where network endpoint groups and container-native load balancing come into the picture. There, the way routes are managed is a bit different; we will show it on the next slides. GKE eliminates some routes for you and creates VPC-native routes instead. By VPC-native routes, we mean the following: irrespective of whether you are working with Kubernetes or not, when you create a subnet in Google Cloud, a route is automatically created for you. That is the native route Google Cloud creates, and it is quite efficient because it just works out of the box. So when you create a VPC-native GKE cluster, Google creates such native routes for your pod and service networks. Your pods have a CIDR range and your services have a CIDR range, and both of these ranges become VPC-native routes, with the routing handled differently.
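By the way, you can observe this difference from the command line as well. A quick sketch, assuming GKE's usual "gke-..." naming for the routes it installs:

```
# In a routes-based cluster, this should list one GKE-installed route
# per node; in a VPC-native cluster, no such per-node routes appear.
gcloud compute routes list --filter="name~gke-"
```

If that list grows every time you add a node, you are on the routes-based model.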
Alias IP ranges are the concept which enables Google to do this. What they do is take the subnetwork in which your Kubernetes cluster is running and create additional secondary IP ranges on it: one range goes to the pod CIDR and one range to the service CIDR. And apart from these, there is something called Ingress which we will be touching upon; we will go into more detail on that in later slides.

So this is what we were speaking about, the routes-based cluster. At the top, where it says routes for VPC subnets, that is what a default VPC route looks like. For example, I created a subnetwork in the Singapore region of Google Cloud, and something like that top line got created. And when you create a Kubernetes cluster, you get something like the GKE-installed routes at the bottom. The name of the cluster is gke-std-cluster; that is the cluster we created, in the Singapore region, as a three-node cluster, and that is the reason you see three routes. Which means that when you grow the number of nodes in your cluster, you are actually increasing the number of routes. Google gives every customer a quota saying you can have this many routes, and every time you fall short of the quota, you have to go back, contact Google, get it increased, and start creating new routes. So if you are supporting a large Kubernetes cluster taking traffic from all over the world, you will very quickly run out of this quota. That is the downside of doing things the old way.

On the flip side, when we look at the VPC-native clusters, the top screenshot shows the alias IPs and the VPC-native routes. The top line, asia-southeast1, corresponds to the Singapore region, and 10.148.0.0 is the CIDR corresponding to the nodes, which means every node in the Kubernetes cluster gets an IP address from this range. Apart from that, in the dropdown, still within the Singapore subnet, two more ranges have been allocated for you. Those are VPC-native secondary ranges, which means you don't need to install manual routes for them: Google automatically takes care of routing for any IP address which falls into those segments. And if we look in the routes section, filtered to the Singapore region, this route is similar to the native route we saw on the previous slide. This is what we call a native route, and it corresponds to 10.0.0.0/20, which is the pod CIDR range. So the pod range is being installed as a native route.

Now, if we are to create a Kubernetes cluster with these advanced features, how does that happen? It's actually pretty simple. When you create a cluster from the console, you see this box about the networking of the Kubernetes cluster. All one has to do is check that tick mark, and all the VPC-native features will be installed into your cluster by default. Actually, one would need to turn it off, because Google is turning it on by default.
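For those who prefer the command line over that console checkbox, creating the two kinds of clusters looks roughly like this. A sketch: the zone and the first cluster's name are assumptions, while gke-std-cluster is the routes-based cluster from the earlier slide:

```
# VPC-native cluster: alias IPs on, secondary ranges for pods and
# services allocated automatically on the subnet.
gcloud container clusters create gke-neg-cluster \
    --zone asia-southeast1-a \
    --num-nodes 3 \
    --enable-ip-alias

# Routes-based cluster, for comparison (the old behavior):
gcloud container clusters create gke-std-cluster \
    --zone asia-southeast1-a \
    --num-nodes 3 \
    --no-enable-ip-alias
```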
Right, so we have touched upon services earlier. A Service in Kubernetes, for those who don't know, is basically an abstraction over the pods. So my actual application... yes, please. [Audience question about what a pod is.] OK. In the Kubernetes world, you can think of a pod as a container itself. When you say your application is containerized, a container is actually implementing your application, and when you want to speak to your application, you speak to the container, right? So you can think of the pod as the basic fundamental element in the Kubernetes world. Strictly speaking it is a collection of containers, but in most cases one pod is one container; it is not a hard and fast rule, but you can think of it like that for now.

Right. So a Service is an abstraction on top of the pods. When your application is implemented by a collection of pods, the IP addresses keep changing, because pods keep dying and getting restarted. You need to get hold of a fixed IP address which you can use to communicate with the application, and that is what a Service is meant for. This is what a Service YAML file looks like.

When we start speaking about Services, there are three kinds of Services, actually. The first one is called ClusterIP: when your service only has to live inside the Kubernetes cluster, internal to the cluster, that is when you choose ClusterIP. But if your service or application is expected to take traffic from outside the cluster, the ClusterIP option doesn't fit, and that is where the other two options come in. One is NodePort (there is a typo on the slide), and then there is LoadBalancer. Both NodePort and LoadBalancer are the kinds of services one will choose to support external-facing traffic, traffic from outside the cluster. NodePort has some downsides: you have to know in advance which range of node ports you will reserve for your applications, the same range has to be reserved on all the nodes, and when a node goes down, the endpoint, the IP-address-and-port combination you use to speak to your application, changes. So it is not quite optimal, and that is where LoadBalancer came in as an improvement. But again, when you create a Service of type LoadBalancer, it creates an L4 load balancer on Google Cloud, and the downside is that every time you want to expose a service to the internet, you end up with as many load balancers as services you expose. This is not economical for most customers.

That is where Ingress comes in and saves a lot of customers. Ingress is a kind of object built into Kubernetes, and the way it behaves on Google Cloud is that when you create an object of type Ingress, it creates an L7 load balancer, which is called the HTTP or HTTPS load balancer on Google Cloud. As soon as we submit this YAML file to the Kubernetes API server, there is an Ingress controller watching for such requests; it picks up the request, starts speaking to the Google Cloud networking components, and creates an HTTPS load balancer for us. And what is unique about this, given the load balancers we spoke about just now, is that you can expose multiple services in a single cluster, all using one Ingress, which is one load balancer.
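A minimal sketch of what that looks like; the names, ports, and paths here are made up for illustration, and the cloud.google.com/neg annotation on the Service is the one we come back to in a moment:

```yaml
# Service backing the app. The annotation asks GKE to create a network
# endpoint group (NEG), so the pods themselves become the load balancer
# backends; with NEGs, a plain ClusterIP service should suffice.
apiVersion: v1
kind: Service
metadata:
  name: hostname-svc
  annotations:
    cloud.google.com/neg: '{"ingress": true}'
spec:
  type: ClusterIP
  selector:
    app: hostname
  ports:
  - port: 80
    targetPort: 8080
---
# One Ingress = one HTTP(S) load balancer and one public IP,
# fanning out to as many services as you like by path.
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: demo-ingress
spec:
  rules:
  - http:
      paths:
      - path: /hostname
        backend:
          serviceName: hostname-svc
          servicePort: 80
      - path: /other
        backend:
          serviceName: other-svc   # a second, hypothetical service
          servicePort: 80
```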
So you have multiple services living in the same cluster, all exposed to the outside world using one static public IP address and one load balancer, which is quite economical and cost-saving compared to the alternative.

Now, coming back to the network endpoint groups; I will show you in further slides how this actually works. Whether you choose the network endpoint groups option or not, creating an Ingress will definitely create an L7 load balancer. But when you choose network endpoint groups as an additional option, it affects the backends of that load balancer. If you don't choose the option, the backends created for the load balancer are simply the nodes which are part of your Kubernetes cluster. When you specify that annotation there, cloud.google.com/neg: '{"ingress": true}', you are telling Google Cloud that for this particular service, supported by so-and-so Ingress, when the request is submitted it should create the Ingress, and at the same time the backends managed by the load balancer it creates will not be VMs but the pods directly. So all of this awesomeness, users ask for in a single line. When we create such YAML files and submit them to Google Cloud, everything else is automatically created by Google for the customer.

To show what benefits or performance improvements network endpoint groups offer, we have run some tests ourselves. These tests are quite lengthy and time-consuming, which is why we ran them on our own beforehand; what we can show you today are screenshots we captured from the tests we performed, and we will share those screenshots and the test results with you.

The test setup is this: on the left side, you have a Kubernetes cluster created in Singapore, and you have an Ingress, a.k.a. a load balancer, which is exposing the services inside the Kubernetes cluster. Then you have a test VM situated in us-central1 which, using the Apache benchmarking tool, is sending traffic to the load balancer, and the load balancer is diverting the traffic to the Kubernetes cluster. So the service actually responding to my requests is sitting inside the Kubernetes cluster in Singapore, and the traffic is being generated by a VM living somewhere in the United States. It is pretty much the same setup on the right side, but the difference is that we created two different clusters and two different load balancers: one which does not avail itself of the VPC-native and network endpoint group features we just discussed, and one advanced cluster which makes use of all of those features. It is the same setup, but if you remember the checkbox we showed earlier, one cluster was created with the checkbox turned off, and the other vice versa.

All right, so this is an interesting slide. Take notice of the last column, which says node. The service we deployed onto both Kubernetes clusters was deployed as a Deployment with six replicas, which means it creates six pods for me, all hosting the same application. So when a request hits my load balancer, it can be load-balanced and sent across to any of these six pods.
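The workload behind this slide was a plain Deployment; a rough sketch, where the image is a stand-in for the simple hostname-echo app we used:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hostname-app
spec:
  replicas: 6                # six identical pods across the three nodes
  selector:
    matchLabels:
      app: hostname
  template:
    metadata:
      labels:
        app: hostname
    spec:
      containers:
      - name: hostname
        # Stand-in image: any app that just echoes its hostname will do.
        image: gcr.io/google-samples/hello-app:1.0
        ports:
        - containerPort: 8080
```

The node column on the slide is essentially what kubectl get pods -o wide prints: which node each pod landed on.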
The response will still be the same, because all of the pods are doing pretty much the same thing. The entries in that last column are the names of the nodes. Both clusters, the one using the feature and the one with the feature turned off, have three nodes each. And if you look here, three entries end in the same node-name suffix, which means three of my six pods are sitting on one node, two more pods on another node, and the last pod on the third node. That is how Kubernetes spread my pods across all three nodes in the cluster, and pretty much the same applies to the next cluster as well, meaning the one making use of the network endpoint group feature. The distribution is the same in both clusters: six pods, three of them running on one node, two running on another, and one running on the last.

Now, using the test setup we mentioned earlier, we put some load onto the Kubernetes clusters. This is the one where the feature is turned off, where your backends are actually virtual machines, and this graph shows how the traffic is distributed. If you look at those green lines, the thickness of each line indicates how much traffic each node is taking. Here it says North America and 225.72 RPS; RPS stands for requests per second, so when we captured this screenshot we were sending around 200-plus requests per second. On the right side, where it says rate, rate, rate, that is the rate at which each node in the Kubernetes cluster is receiving those requests. There is not much difference between them: the top one says 69.13 and the bottom one 77.70. That sounds good in the sense that the distribution is happening evenly across all the nodes. But the catch is that this is actually not efficient or optimal load balancing. The reason is that only one of those three nodes is hosting three pods for us: the number of pods living on each node is not evenly distributed. One node has three copies of my application, one node has two copies, and the other node has only one copy. So if one were to go by the copies of my application, it does not really look good, right? Irrespective of how many copies of my application a node is running, the traffic is getting equally distributed. That is the traditional way of doing it.

On the flip side, we did the same thing with the load balancer and the cluster which has all these features turned on, and this is how it behaves. I think after having discussed the previous slide, this one looks pretty much self-explanatory. You see one line is thin, one is thicker; if I take the ratio, I can say 1 to 2 to 3, which is the same as the ratio of the pods, right? So this one is actually considering how many healthy pods are living on each node, and it keeps that fact in mind while doing the load balancing. This suggests that using this feature gives more optimal and efficient load balancing: I am relying on my healthy pods for load balancing, rather than depending on the virtual machines as a backend.
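For reference, this is roughly how the load was generated from the us-central1 test VM with Apache Bench; the address is a placeholder for the load balancer's public IP:

```
# 50,000 requests in total, 100 of them in flight at any given time,
# aimed at the Ingress (HTTP(S) load balancer) IP.
ab -n 50000 -c 100 http://<LOAD_BALANCER_IP>/
```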
And one more thing to take away from both of these slides. If I flip back: for the tests, we sent approximately 50,000 requests at a concurrency of 100, which means 100 requests go out at the same time, and we keep going like that until the 50,000 requests are sent. Once the load balancer, or rather the backend Kubernetes cluster, had successfully served all my requests, the tool printed some statistics, and we have highlighted the details that actually matter here. The number of requests per second this cluster was able to serve is 230.27, let's say 230 as a round figure. So the traditional cluster was able to serve 230 requests per second, and this one around 400 requests per second, which is a huge improvement. And this application is not even complex: for demo purposes we took a really simple application which, for every request, just responds with its own hostname, "my hostname is" and the pod name. It does not do anything fancy. Such a simple application, nothing complex happening, and we are not even talking about HTTPS, TLS termination, establishing sessions, and things like that. And if we go back again and look at the time per request: here it says each batch of 100 concurrent requests takes 434 milliseconds, and here it is down to 247 milliseconds. So this speaks volumes about the kind of performance improvements and lower latencies your application will gain from using this feature. Thank you.

Any questions before we call it off? Yes, please. [Audience question, partly inaudible: the balancing presumably cannot be as simple as just spreading incoming requests evenly; it must depend on the ratio of pods per machine, say a bigger machine running many pods next to a smaller machine running a few. Is there any possibility to control it?] So the load balancer offers you several settings where you can play around with stickiness, for example session affinity, saying that all requests from the same user should be served out of the same machine or the same pod. So there are things you can still play around with; these are possible on Google Cloud, but we tried to keep things simple for the sake of this session. All we wanted to concentrate on was the optimal load balancing and the better latency experience. Thank you. Thank you.