We have Lukonde with us. We used to work together, at the same company, and then he decided to leave paradise and jump into hell, namely AWS. Lukonde, please, give it up for him. Thank you.

All right, hello everyone, how are you doing? Looks like people are cold; I'm not the only one feeling it. I'm kind of hoping being up here will warm me up a little, unless I'm just delusional. Thanks for coming to the talk. As you can see, I'm going to be speaking about spreading applications, controlling traffic, and optimizing costs in Kubernetes. As William already pointed out, my name is Lukonde Mwila, but you can also call me Luke, and I'm a Developer Advocate for Kubernetes at AWS. If any of you want to get in touch with me, feel free to do so; you can reach out on LinkedIn, where I use my full name, so you can search for me, and hopefully you'll be able to recognize me. I also produce a lot of Kubernetes and cloud-native content, which you can find on YouTube; subscribe to the channel if you find it interesting. The developer advocacy team I'm part of also has a dedicated YouTube channel called Containers from the Couch, with lots of great content that could be useful for you and your teams.

Great, let's jump right in. This talk was inspired by a combination of things from the past and the present, the first being an event several years ago on a specific Black Friday. At the time I was working as a developer for a software company, not the same one as William, and all the developers worked together in a shared open space, so from my vantage point I could see the screens of some of my colleagues. Because it was Black Friday, as you'd expect, a number of my colleagues were especially excited to take advantage of the discounts on a particular e-commerce website, and that site was really popular amongst most of the folks at the company. Unfortunately, and some of you might know where this is going, on that day the
site crashed, and I could see this from where I was seated. My colleagues who were trying to access the site were all met with the same screen: the company had essentially issued an apology letting people know that their systems had crashed due to a high volume of traffic. They didn't have that bit about trying to restore balance in the Force; I'm a Star Wars fan, so I threw that in there. Just for the sake of historical accuracy, that wasn't there.

Now, when you experience something like this as a developer and you're the end user, or even if you're just involved in software architecture and understand it to some degree, you can't help but ask yourself: what would I do if I was in that position, or how would I have prevented that kind of situation? You can empathize with the people involved, because they're also involved in software. So naturally it spawned a lot of discussions between myself and some of my colleagues at the time, and even over the years, because it was a relatively significant incident. The general consensus from a lot of those discussions was that if you want to prevent something like this, then you should probably have some form of high availability and auto-scaling mechanisms in your architecture. Now, I just want to be clear: we didn't have context as to what the root causes were, so we were basically guessing, but that was a safe guess in terms of the kind of measures you'd want in place to mitigate the risk of such an outage, especially for a business of its stature and on such an important day.

I want to start off by focusing on high availability before I get to auto-scaling, and these two components are widely considered best practices today.
They're relatively standard, common patterns, and in many ways you'd think of them as straightforward. But you need to give ample thought to both high availability and auto-scaling, because they come with implications, and the implications I want to focus on are the cost implications.

It's all fair and fine to have a highly available architecture, and these architectures are generally fronted by a load balancer that proxies traffic to different upstream servers or destinations. However, you need to give thought to the load-balancing algorithm or approach you're taking in these kinds of setups. For example, if you're going with a round-robin or a random approach, you need to be aware that you're going to have a lot of egress cross-zone traffic, and this is one of the points where you incur a lot of costs. This is something that has been coming up a whole lot with some of the teams I've been engaging with, especially in cloud contexts.

So high availability is important because it addresses a resilience issue: you want to eliminate any single point of failure. In addition to that, it improves performance, because you don't want a minimal set of resources getting strained. But bearing that in mind, you need to be aware of your particular constraints for a particular project in order to implement the solution in the best possible way. So there are a lot of costs associated with the load-balancing strategy.

In addition to that, we also have to consider auto-scaling. The traffic being proxied from the load balancer is headed to upstream destinations, and one of the other common pitfalls here is wasted compute capacity. Even while I've been here, a couple of people have shared that they've also been seeing this a whole lot. Do your nodes actually align with the workload
requirements? So auto-scaling is important, but another thing we see happen a whole lot is over-provisioning of compute capacity. It's great that your infrastructure is scaled out to accommodate those events where you need to scale; however, there are situations where you've provisioned a whole lot more than you actually need. The reason for that is probably using a cluster autoscaler that takes a more static approach, working with a scaling group in the particular cloud provider you may be working with.

So these two areas incur a whole lot of costs, and it's now a whole lot more critical because we're in a difficult economic time. A lot of teams want high availability for their workloads and infrastructure with Kubernetes, and they want to achieve that in a cloud environment because of what it has to offer, including the elasticity they can get, but they're also trying to find the best ways to address the operational costs associated with such a model. That's what I'm going to be focusing on.

The first thing I want to look at is spreading your application, and there are two different features you can make use of in Kubernetes in order to spread your application. The first one I want to focus on is pod affinity rules. Pod affinity rules are how you can apply scheduling constraints in the form of rules that define a relationship between different workloads. It could be the same workload as well, and that's what you'd want in the case of achieving high availability. These rules influence the scheduler's behavior when it's placing pods on nodes. So if there is a specific affinity rule between workload A and workload B, and the scheduler has already placed workload A on a specific node, then depending on the affinity rule you have defined, when the scheduler is about to place a
pod on a specific node, it's going to look at that relationship and see whether the pod should actually be placed on the same host, or whether it should be in a different AZ. These rules are based on topology domains, and a topology domain can be either a host or an availability zone.

Now, what we're interested in, as you can see in this diagram, is an anti-affinity rule. An affinity rule would be what we'd use if we wanted our pods in close proximity, whether that's based on a host topology or because we want them in the same AZ. With anti-affinity, we essentially want a repelling effect: we want the pods to repel each other so that they're spread across the different topology domains. That's how we would achieve high availability with our application.

Next, I want us to consider pod topology spread constraints; this is the other approach. Pod affinity rules, or rather pod anti-affinity rules, can help you achieve high availability, but the issue is that they introduce a fault tolerance problem. Pod topology spread constraints not only provide you with high availability but also deal with the issue of fault tolerance. They give you more control over how you actually want your pods to be distributed across the different topology domains in your cluster. If you take a look at this diagram, you'll see we have ten pod replicas, and they're close to even when distributed across three different availability zones: we've got a spread of three, four, three. This is something you can't really achieve with pod anti-affinity rules, because the pods have an anti-affinity toward each other, so you could easily end up with a situation where you have a single replica running on a node, which is not good for fault tolerance, and it's not good for resource utilization either. So this works out a whole
lot better if you're trying to address not only availability but also fault tolerance for your workloads. And because this is the one I'm going to be demonstrating a little bit later, I'm going to focus on the kind of properties you would be defining. These are the properties you'd primarily be concerned with if you're going to use pod topology spread constraints.

I'll start with maxSkew. This is how you define the maximum degree of imbalance, or inequality, you'll accept in the distribution of your pods across the different topology domains. Take the example from the previous diagram; I'm just going to go back quickly so you can see it. With 10 replicas across three availability zones, there's no way to spread the pods perfectly evenly, but what we can do is say we want a maximum imbalance of one, which is why we've got three, four, three. That could just as easily end up being four, three, three. The maxSkew can be anything between one and the number of replicas you have, so in this case between one and ten. If you went with ten, that means there'd be a chance of ending up with all ten replicas in a single topology domain, whether that's a host or an availability zone.

You'll also see the topology key. This is the key that's attached to the nodes, as one of their labels, and it's how you define the kind of topology you want to work with, whether a zonal approach or a host-based one. whenUnsatisfiable is somewhat self-explanatory: it's how you want the scheduler to respond in case it can't meet these scheduling constraints, whether you want it to go ahead and schedule the pods anyway, or whether you want those pods to remain in a pending state. And then lastly we have the label selector, and this is similar to pod
affinity rules: in this case it's essentially saying that any pods that have this particular label, or these labels, are the ones to which these constraints should be applied.

So that's pod affinity rules and pod topology spread constraints; that's how you would achieve availability of your application. The next thing we want to consider is how to start addressing those two main areas of optimizing our costs: the load-balancing area first, and then we'll get to the nodes a little later. Let's start with controlling traffic, where I'm going to speak about two different approaches you can take.

The first approach I'll discuss uses Istio. For those who aren't familiar with it, Istio is an implementation of a service mesh, and the job of a service mesh, in a nutshell, is to unburden applications from having to deal with networking concerns. Service meshes generally do that in four main domains: connecting applications, securing those connections, controlling traffic (traffic management), adding resilience mechanisms, and providing observability features. What we're concerned with in this particular case is controlling traffic, so changing the way load balancing is actually going to work.

If we look at the diagram over here, you'll notice we have a request that starts from an end user, and that request goes through to the load balancer that is exposed by the Istio ingress gateway. That traffic then gets proxied to the component labeled "virtual service". A virtual service in Istio works a lot like Ingress: if you're familiar with the Ingress resource, this is essentially how routing takes place. You define your routing rules using virtual services, and after routing takes place you can apply additional policies, such as what we're about to carry out, which is controlling
the traffic. You would do that with destination rules. With destination rules, we can essentially say that traffic coming from a certain point of origin should go to a certain destination. Some of you might already be familiar with these concepts from different deployment strategies: directing a certain amount of traffic to a certain version of an application for either a blue-green deployment or a canary deployment. In this case, though, what we actually want is to take advantage of an Istio feature known as locality weighted load balancing, and if I'm correct, other service meshes also have this. Istio takes the information about the topology domains in your cluster and uses that information to let you carry this out for a load balancer that is highly available, and I'll demonstrate that a little bit later. Let's say we have our cluster running in the region eu-west-1, and we've also got a load balancer that's highly available.
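To make this concrete, a DestinationRule along these lines is roughly what locality-weighted distribution looks like in practice. This is a sketch, not the exact file from the demo: the service host name, namespace, and weights are assumptions, and note that Istio requires outlier detection to be configured for locality load balancing to take effect.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: express-test
spec:
  host: express-test.default.svc.cluster.local  # assumed service name
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
        # Keep most traffic that originates in eu-west-1a within that zone
        - from: "eu-west-1/eu-west-1a/*"
          to:
            "eu-west-1/eu-west-1a/*": 60
            "eu-west-1/eu-west-1b/*": 40
    # Locality load balancing only takes effect when outlier detection is set
    outlierDetection:
      consecutive5xxErrors: 3
      interval: 30s
      baseEjectionTime: 30s
```

The locality strings follow Istio's region/zone/subzone format, with `*` matching any subzone.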
So it's across three different availability zones: eu-west-1a, b, and c. For traffic that hits that particular load balancer, we want to configure things in such a way that traffic coming from eu-west-1a goes to an application instance running in eu-west-1a, and we can determine how much of that traffic goes where. In this case, you'll see this particular diagram is saying 60% of the traffic goes to eu-west-1a and 40% to eu-west-1b. This is a powerful mechanism, because it's one of the ways you can drastically reduce egress traffic costs.

Now, I think some of you were probably just being kind and nodding your heads as I was talking about Istio, because I know there's sometimes not so much love for Istio due to the operational complexity associated with it, and that's totally understandable. But something that has been interesting, as these kinds of issues have come up and I've had the chance to engage with different teams, is that they've essentially been faced with two main options: either they take the approach of using the destination rules that Istio has to offer, or they accelerate upgrading their cluster to a point where they can make use of topology aware hints. If I'm correct, topology aware hints hit beta as of Kubernetes 1.23, and I think the feature is still in a beta state. You might have situations where some teams are running older versions of Kubernetes and so can't make use of topology aware hints, but I still want to speak about it so you're aware of it in case you're running a version that allows you to use this feature.

For a bit more context, I'm going to go through how routing happens when we're load balancing for our applications. We've got services; most of you, if not all, are already familiar with these. Services are a stable network abstraction layer that sits in front of our pods, and because our pods are ephemeral, the service is what gives us a stable IP. Now, because
our pods have a short life cycle, their IPs are continuously changing, but while those pods are alive, their IPs are stored in what are known as endpoints. Every time a service is created, endpoint slices get created, and those endpoint slices are created by the EndpointSlice controller. The EndpointSlice controller is what's responsible for allocating the different endpoints, with the pod IPs, across the different topology domains in your cluster. So if you have a highly available cluster across eu-west-1a, b, and c, the EndpointSlice controller is responsible for allocating the different endpoints across those topology domains.

Then there's kube-proxy, a DaemonSet running on each of the nodes. kube-proxy also serves a form of internal routing, and what it does is consume from the endpoint slices. Outside of topology aware hints, each of these endpoints just has information about the specific pod: its IP address, the node it's running on, and any additional topology information. But when you have topology aware hints enabled, this is what happens: without hints, the endpoint slices have endpoints that essentially say they serve traffic for just about any zone, but with hints enabled, each endpoint carries the specific availability zone of the pod it points to. So if we have a pod running in eu-west-1a, that information is stored in that particular endpoint, and the EndpointSlice controller adds a hint saying it should serve traffic coming from eu-west-1a. That's the mechanism for keeping traffic within a specific zone, to minimize the costs associated with egress cross-zone traffic.

So those are the two approaches; hopefully that helps when you're trying to decide which approach to take. Again, having a service mesh does come with
operational complexity and additional domain knowledge, and it might be the case that your team isn't in a position to take that on. Alternatively, if you decide to go with topology aware hints, you just need to be aware of the specific version of Kubernetes you'd have to be running in order to take advantage of that approach. But the process of actually getting up and running with topology aware hints is really simple. It's just a matter of ensuring that your nodes have the relevant topology domains, and if you're running a Kubernetes cluster in a cloud environment, that gets generated for you automatically: the topology domains are attached as labels to the respective nodes. Then you would simply add an annotation to the service that's going to be proxying traffic to the different pods, in order for hints to work.

All right, then lastly, before I get to the demo, I want to turn to managing nodes. Just by a show of hands, has anyone here heard of Karpenter? Okay, great, a couple of people. And who here is familiar with the Kubernetes Cluster Autoscaler? Okay, most people, which is great, since it's a fantastic project, and several years ago, before I was in developer advocacy, I was still consulting.
That's the project that we primarily used back then. But a challenge we ran into was the fact that the Cluster Autoscaler takes a static approach to scaling: it works specifically with an autoscaling group in a particular cloud environment, whether you're using AWS, Google, or a different cloud provider. This can be particularly challenging when you're trying to deal with the whole issue of reducing costs and improving utilization for the underlying nodes.

What Karpenter does differently is that it's more dynamic. Instead of working with a scaling group, it looks at pods that are in a pending state, takes into consideration their specific pod requirements as well as their scheduling constraints, and then reaches out directly to the EC2 API to provision the nodes that are actually needed for those particular workloads. This drastically improves your resource utilization as well. In addition to that, Karpenter also has a feature called workload consolidation. When workload consolidation is enabled, Karpenter continuously monitors the nodes in your cluster that it controls to see whether resource utilization is at a good level, and in the case that any node is underutilized, it will remove that node from your cluster and consolidate things to make sure there's improved resource utilization on the nodes that are actually needed, so you save on costs as well.

Now, obviously, the big challenge with Karpenter is that even though it's an open source project, at this point in time it only supports the AWS cloud provider. There's still an open issue for this, and it's also somewhat of an invitation for more people to get involved with the project so it can be extended to additional cloud providers. One of the other things I love about Karpenter is the fact that it also respects
scheduling constraints. To circle back to where I started, in terms of high availability: when you put particular topology spread constraints in place for your workloads, Karpenter will respect them when it's adding nodes to your cluster.

All right, so that's the talk. What I'm going to do now is switch to a demo, with a specific focus on controlling traffic. I'm going to zoom in a little here. Okay, can everyone see that clearly? Great. The application I want to focus on is a basic Node.js application, or actually two different versions of the same application. The reason I'm doing that is that they're going to give me different responses for the exact same endpoint, and I just want to be able to differentiate between the two applications when traffic is being proxied, to see whether or not our destination rule configurations are actually working as expected. I'm just going to scroll down slightly here so you can see. You'll notice we have one application that is using version 1.1.2, and this one is called express-test. If I scroll down further, this is express-test-2, and express-test-2 is running 1.1.4. Both of them have topology spread constraints applied; if you take a look at this, you'll see it's the exact same code block I walked through earlier when I was talking about topology spread constraints. In addition to that, I've got a node selector to ensure that these pods are only placed on nodes that are added by Karpenter. So that's our application, and the two versions are fronted by a ClusterIP service that proxies to both of them, treating them as two different versions of the same application.

Next over here is a custom resource for Karpenter known as a Provisioner. This is the file that essentially controls the life cycle of your nodes in Karpenter, and you can have multiple provisioners or a single one. So
in this case, I've got a provisioner dedicated to my express-test workload; as you can see here, it's called express-test. You can apply different parameters or constraints for how you want Karpenter to add nodes to your cluster. In this case, it's defined to add nodes in every one of the availability zones, but if you wanted to add constraints to that, you could. If you want to restrict it to only adding Spot instances, that's something you can also apply; in my case, I want both Spot instances and On-Demand instances. You can add further configuration like defining the instance families you want, which is a really powerful feature, because obviously certain instance families are more costly than others, and this lets you manage that. Then right at the top here is consolidation, and you can see it's enabled. Remember, this is that feature I spoke about that allows Karpenter to continuously watch our nodes and check whether resource utilization is at an optimal level, so it keeps working to reduce your cluster costs by removing nodes that aren't needed. All right, next up over here is our destination rule.
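As a rough sketch, a v1alpha5 Provisioner with the settings just described might look like the following. The zone values, the instance families, and the referenced AWSNodeTemplate name are assumptions for illustration, not the exact manifest from the demo.

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: express-test
spec:
  # Continuously remove and consolidate underutilized nodes to cut costs
  consolidation:
    enabled: true
  requirements:
  # Allow nodes in all three AZs of the region
  - key: topology.kubernetes.io/zone
    operator: In
    values: ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
  # Allow both Spot and On-Demand capacity
  - key: karpenter.sh/capacity-type
    operator: In
    values: ["spot", "on-demand"]
  # Restrict instance families to keep node costs in check
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values: ["t3", "m5"]
  providerRef:
    name: default  # assumed AWSNodeTemplate name
```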
This is the custom resource for Istio, and I'm just going to focus on the distribution section. You can see what I'm saying is: for traffic coming from eu-west-1a, I want 80% of it to go to eu-west-1a, 10% to b, and 10% to c. The other sections are very similar: 80% of traffic coming from eu-west-1b should go to eu-west-1b. So this is basically minimizing the amount of cross-zone traffic.

Then, real quick, just to show you, this is the script I'm going to be running. The applications are all sharing the same Istio ingress gateway, and you can see that's the endpoint I'm going to be accessing. Remember, I have those two different versions of the same application, so the requests are going to be sent there, and they'll go through that same flow from the diagram I had up: virtual service, destination rules, and eventually through to our application.

Before I run it, I just want to quickly show you: over here is express-test-2, already running in my EKS cluster, and here you have express-test. Express-test-2 is running on this node over here, the one ending in 2165, and express-test is running on the one ending in 0087. So I want to quickly come over here so you can see it: we have the 0087 node over there.
I'm going to describe that node, scroll down, and just highlight this particular thing to you: you can see this node is running in eu-west-1a, so that's where one of our applications lives. And this is our other Karpenter-controlled node; if I scroll down, you'll see this one is running in eu-west-1c.

All right, the next thing I'm going to do is simply run this load-balancing script. There we go. We've got some responses from version 1.1.2 and some from version 1.1.4. Now, if we're honest, it's really hard to deduce from this that the destination rules are actually working; for all we know, it's just a random approach. The best way to verify is to go back to our code editor and modify the destination rules. Instead, we're going to say that regardless of where the traffic is coming from, we want 98% of it to be sent to eu-west-1a, so I'm going to apply this to each of the sections. That's wrong; let me apply that again. Okay, so that's configured. Run the load-balancing script again, and you'll see that this time we're only getting version 1.1.2. So that's just one way to verify that our destination rules are actually working as expected. Great. All right, I'll move over to a bit of Q&A now.

That was brilliant. Thank you, Lukonde. Do we have any questions? Yes.

Can you hear me? Yes. Thank you for your talk.
I just wanted to go back to Karpenter, specifically under the topology spread constraints, where I noticed you set a condition for when the constraints can't be satisfied.

Yes, that was intentional. Again, it's going to depend on how critical your workload is. If you're fine with having those pods end up in a pending state, then you can essentially change that value to a different approach. So it's totally up to you, depending on the type of application you're running and how critical it is that it keeps running. Obviously, the desired approach is to have high availability across your different topology domains, and this is an attempt by the scheduler to take those rules into consideration when it's applying them. It just comes down to how you want it to respond in the case that those constraints can't be satisfied.

We have more questions, I suppose. Yes.

I would like to ask you something related to Karpenter, because I've been using it in production for a while, and one of the issues I found is that while Karpenter works really well for scaling based on memory or CPU, disk space is actually a problem. I had applications crashing simply because the node's disk space was not enough. Is that something that it's possible to monitor with Karpenter?
Yeah, so in that case, I think it's probably best to make use of an observability stack. You want to take advantage of tools like Prometheus, have that monitor what's actually taking place on your nodes, and have it send alert messages so that you're aware of it. In terms of optimizing those particular nodes, if it's a case of disk space, it's probably about making sure you have the right amount of storage in place, and also just reviewing the instance families you're using. Those would probably be the two main starting points: reviewing the disk storage you're actually using and trying to see what exactly is consuming it so quickly, and, better yet, making use of the observability stack to actually monitor your nodes.

Thanks for the talk. I have a question regarding the topology spread you talked about at the beginning, and the fact that we have quite a lot of information about where our workloads are actually deployed, in which zone, for a given cloud provider. We have this information on the pods, because the pods are on the nodes, and we also have this resource called endpoints, so we know where to route our traffic so that it ends up in a particular availability zone. However, I also noticed that in one of the configuration files, when you were showing how to set up Istio, you had the configuration with the names of these zones in the paths. So my guess is that there is no information right now, at least in vanilla Kubernetes, on where the traffic comes from; this has to be done on the external load balancer, either on premises or provided by the cloud provider, right?

Right, yes, that's correct. In this case, I knew my load balancer.
It's an external load balancer created by the Istio ingress gateway, and it's in eu-west-1, so that's why I was able to implement those specific rules.

Thank you.

Thank you, Lukonde. That was the last question we have. We're a little bit off schedule at the moment, but we're going to come back to it. I'm going to be very selfish and ask Lukonde for a photo with me. Thank you. Thanks, cool. Thank you very much. Please give it up for him.

Okay, we are back on schedule now, and we're going to have the next talk in 10 minutes, at 2:30. So please don't leave the room, or call your friends to come upstairs so we have a full room. Thank you very much.