Hello, everyone, and welcome to my talk at Open Source Summit North America. I'm going to be speaking about spreading apps, controlling traffic, and managing costs in Kubernetes. My name is Lukonde Mwila, and I'm a developer advocate on the Amazon EKS team at AWS. I'll start with a short background about myself. I started my career as an application developer working on both web and mobile solutions, and a lot of that early work was focused on SaaS products for startups. After that, I transitioned into consulting in the financial services sector, where I primarily worked as a DevOps engineer and also performed a lot of solution architecture and platform engineering tasks. All the while, I was very involved in various developer communities, because I've always been passionate about enablement and about shaping or influencing the product experience for end users. So when the opportunity arose, I happily jumped into the pool that is developer advocacy. If you want to connect with me, please feel free to do so on LinkedIn or Twitter. You can also subscribe to my YouTube channel if you're interested in more Kubernetes or cloud native content.

All right, with that out of the way, let's get started. This talk was inspired by a combination of things from both the past and the present. The first was an event several years ago on Black Friday. At the time, I was working as a developer for a software company, and all the developers worked together in a shared open space. Because it was Black Friday, as you would expect, a lot of my colleagues were especially excited about taking advantage of shopping discounts on a particular website. Unfortunately, on that day, that specific site crashed. From my vantage point in the shared open space, I could see that every one of my colleagues trying to access that site was met with the same apology screen. The company shared that its systems couldn't handle the amount of traffic it was experiencing, and so it was down for several hours, if not most of the day.

Now, if you experience this as a developer, or as someone who understands software architecture to some degree, you can't help but ask: how would I have dealt with that situation, or better yet, prevented it from happening? Such events naturally spawn interesting discussions, and the common consensus among the people I spoke to about it was that shifting to the cloud, making the application and its underlying architecture highly available, and taking advantage of autoscaling capabilities would have solved the problem. That may be true. I don't know for sure, because we didn't have enough context about the problem. But I do want to focus on those two elements: high availability and autoscaling. Today, these are critical components of the architecture for solutions like an e-commerce site. Let's drill down on both of them.

I'll start with high availability. High availability helps us improve resilience by eliminating single points of failure, and it increases performance by reducing the strain on our resources. That's why it's widely regarded as a best practice and is now a very common pattern. At a lower level, highly available architectures are typically fronted by a load balancer that proxies traffic to different upstream servers in their respective locations. As a solution, highly available architectures seem straightforward.
But you have to give ample thought to these types of solutions. For example, if your load balancer distributes traffic using a round-robin or random approach, you may end up incurring higher costs because of all the cross-zone egress traffic, since that traffic is destined for different topology domains, such as availability zones. This is where autoscaling comes in as well. You definitely want to be able to increase or decrease your compute capacity when the need arises. But another question you have to ask is: how well does that compute capacity align with your workloads' requirements to begin with? If there's a misalignment and your worker nodes are larger than what you need, you will be paying a lot more than you should, both before and after scaling events. And this is a common pitfall. I've spoken to a number of people from different companies who shared that they're trying to solve challenges related to wasted compute capacity, because they're running underutilized nodes across different zones in their Kubernetes cluster environments. These kinds of architectural implications are becoming increasingly concerning for teams, especially in the economically challenging times we're in.

Over the years, running Kubernetes in the cloud has become a very common model. Some of the driving factors have been achieving high availability for both the infrastructure and the applications running on the clusters, and taking advantage of the autoscaling capabilities of cloud environments. But again, because of the current economic situation, teams are trying to figure out: how do we achieve high availability, take advantage of autoscaling, and manage operational costs when running Kubernetes in the cloud?

I'll start by focusing on spreading your application. In this section, I'll deal with the ways you can make your application highly available in Kubernetes, and there are two features I'm going to focus on: pod affinity rules and pod topology spread constraints. Pod affinity rules inform the Kubernetes scheduler's approach to deciding which pod goes to which node based on its relation to other pods. Similar to node affinity, this can be applied in a hard or soft way, which tells the scheduler whether certain rules are required or preferred, and it will act accordingly. Pods that have a required rule will only be placed on a node that satisfies the relevant constraints, and pods with a preferred rule will be scheduled to the nodes that match the highest preference. These rules are applied in relation to the topology of your cluster. If you're wondering what exactly the topology of your cluster is, it simply refers to the arrangement or layout of the physical components and resources of your Kubernetes infrastructure. With these constraints, you can apply parameters that determine how pods are distributed across your topology domains, such as nodes, regions, or availability zones. The pod affinity rule informs the scheduler to match pods that relate to each other based on their labels: when a new pod is created, the scheduler searches the nodes for pods that match the label specification of the new pod's label selector. In contrast, the pod anti-affinity rule allows you to prevent certain pods from running on the same node if the matching label criteria are met.

Our concern in this context is pod anti-affinity. Pod anti-affinity is typically helpful in preventing a single point of failure by spreading pods across AZs or nodes for high availability. For such use cases, the recommended topology key for anti-affinity is the zone or the hostname, and this is implemented using the topologyKey property, which determines the search scope across the cluster's nodes. The topology key is the key of a label attached to a node.
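To make that concrete, here's a minimal sketch of a deployment with a hard pod anti-affinity rule. The names, labels, and image are placeholders for illustration, not the exact manifest from the demo later on:

```yaml
# A minimal sketch: spread replicas of a hypothetical app across zones.
# The anti-affinity rule tells the scheduler not to place a replica in a
# zone that already runs a pod labeled app: my-app.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          # "Hard" rule: required, not preferred.
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              # Search scope: one replica per zone. Use
              # kubernetes.io/hostname to spread per node instead.
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: my-app
          image: my-app:latest # placeholder image
```

With the zone topology key, the scheduler will refuse to co-locate two of these replicas in the same availability zone.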
Okay, so next, let's consider topology spread constraints. This feature allows you to make your application available across different failure domains, or topology domains, like hosts, AZs, or regions. This approach works very well when you're trying to ensure fault tolerance as well as high availability by having multiple replicas in each of the different topology domains. You see, pod anti-affinity rules can easily produce a result where you have a single replica in a topology domain, because pods with an anti-affinity toward each other have a repelling effect. You can set how strict you want the anti-affinity rule to be, but you don't have much control over how the pods will spread out across the nodes or AZs. As you can imagine, a single replica on a dedicated node is not ideal for fault tolerance, and it's also not a good use of resources. With topology spread constraints, you have more control over the spread, or distribution, that the scheduler should try to apply across the topology domains.

There are a few important properties you should be aware of when using this approach: maxSkew, topologyKey, whenUnsatisfiable, and labelSelector. I'll cover each of them briefly. Let's start with maxSkew. This is used to control the maximum degree to which the spread can be uneven across the topology domains. For example, if an application has seven replicas and is deployed across three AZs, you can't get an even spread, but you can influence how uneven things will be. In this case, the maxSkew can be anything between one and seven, and a value of one means we can potentially end up with a spread like two, two, three or three, two, two, and so on. The topologyKey is the key of one of the node labels and defines the type of topology domain, like a zone, and is paired with an appropriate value. In this case, the value could be an AZ, so that we can apply that kind of zonal approach to our spread. whenUnsatisfiable determines how you want the scheduler to respond if the desired constraints can't be satisfied. Pretty straightforward. And then we have the labelSelector, which is used to find the matching pods that the scheduler takes into account when deciding where to place pods in accordance with the constraints you've specified. So that's spreading the application.
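Putting those four properties together, a minimal sketch of a zonal spread looks like this; again, the names and image are placeholders rather than anything from my repo:

```yaml
# A minimal sketch: seven replicas across zones with maxSkew of one, so
# the scheduler aims for a 3/2/2-style spread across three AZs.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 7
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          # ScheduleAnyway would treat this as a soft preference instead.
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: my-app
      containers:
        - name: my-app
          image: my-app:latest # placeholder image
```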
Next, I'm going to look at controlling traffic. In this segment, I'll cover the use of a service mesh, specifically Istio, as well as topology-aware hints, or topology-aware routing, for controlling traffic.

I'll start with the service mesh approach. Istio is an implementation of a service mesh, and the main goal of a service mesh is to unburden applications from worrying about networking specifics. There are a few important areas or domains that service meshes are concerned with addressing. The first is connecting applications for intercommunication. They're also concerned with securing the communication between those applications with things like mTLS and authorization policies. Service meshes also help with improving the resilience of that communication through retries, circuit breaking, load balancing, timeouts, and more. Coupled in here as well is being able to control traffic for different use cases, and that's what we'll be focusing on. Last but not least, they allow you to observe and trace network activity across the applications in the mesh.

As I said, for this talk I'm focusing on controlling traffic, specifically the type of load balancing that will be used. With Istio, you can apply what is known as a weighted distribution to your load balancing. This enables you to control how much traffic goes where from a given source, based on the percentages you apply. In the context of this talk, you can opt to route the majority of the traffic coming from a certain availability zone to a destination in the same availability zone, to minimize the costs associated with egress traffic. In Istio, this is implemented using destination rules, and my demo will focus on this later on. I'll have a highly available load balancer exposed by the Istio Ingress Gateway, spread across three AZs in the eu-west-1 region, and I will use destination rules to direct traffic to a specific destination based on the AZ it's coming from.

Next, I want to cover topology-aware hints, or topology-aware routing, as it's now called. When topology-aware hints are enabled and implemented on a service, the EndpointSlice controller will proportionally allocate endpoints to the different zones that your cluster is spread across. For each of those endpoints, it will also set a hint for the zone. Hints describe which zone an endpoint should serve traffic for. kube-proxy will then route traffic from a zone to an endpoint based on the hints that have been applied. In some cases, the EndpointSlice controller may apply a hint for a different zone, meaning an endpoint could end up serving traffic originating from another zone; the reason for this is to maintain an even distribution of traffic between the zones.

When deciding which approach to go with, the important things to consider for the service mesh approach are the general complexity, domain knowledge, and operational overhead it will introduce into your environment and workflow. If your team is in a position to adopt a service mesh and make use of a number of its capabilities, then I'd definitely recommend that approach. However, if you're looking for a leaner and simpler solution, then you should go with topology-aware routing. Just remember that you have to be running a Kubernetes version that supports this particular feature, and bear in mind that, at this point, it is still in beta.
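Enabling it is mostly a matter of annotating the service. Here's a minimal sketch, assuming a hypothetical service name; on Kubernetes 1.27 and later the annotation is service.kubernetes.io/topology-mode, while earlier versions that support the beta feature use service.kubernetes.io/topology-aware-hints with a value of auto:

```yaml
# A minimal sketch: opting a Service in to topology-aware routing.
apiVersion: v1
kind: Service
metadata:
  name: my-app # placeholder service name
  annotations:
    # On clusters before 1.27, use:
    # service.kubernetes.io/topology-aware-hints: auto
    service.kubernetes.io/topology-mode: Auto
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080
```

One thing to bear in mind: the controller applies safeguards, so if the endpoints can't be allocated proportionally across zones, it won't set hints and kube-proxy falls back to routing across the whole cluster.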
So, we've covered spreading our application for high availability, and how to do so in a fault-tolerant way. Then we looked at how to control network traffic to minimize the costs associated with distributing traffic in highly available environments. Now I'm going to focus on how to achieve alignment between what our application actually needs and the compute capacity that gets added to the cluster for the workloads to run on. In addition to that, I'll touch on scaling in a cost-effective way and avoiding the underutilization of your nodes. The tool I want to highlight here is Karpenter, an open source cluster autoscaler for Kubernetes.

You can use Karpenter to manage nodes for the workloads in your cluster environments. But how does it work? When you have pods in a pending state because they can't be scheduled, Karpenter responds by adding a node or nodes that meet the specific pod requirements, along with any scheduling constraints you've applied. This dynamic approach is what makes Karpenter so powerful, and it minimizes the risk of running underutilized nodes. This is different from the Kubernetes Cluster Autoscaler, which integrates with an auto scaling group in your cloud environment and scales your compute capacity in a static way based on that group. Karpenter also has a feature known as workload consolidation, which helps you achieve cost-effective autoscaling by consolidating workloads onto the fewest, least-cost instances while still adhering to the pods' resource and scheduling constraints. That being said, at the time of this talk, Karpenter only has support for AWS environments. I'm certainly hoping that changes; there's a lot of work going into it, and I'd encourage you to follow the project. Either way, it's something you should be aware of when deciding between Karpenter and the Cluster Autoscaler; the latter obviously doesn't restrict you to a single cloud provider's environment.

That's the theory. Now I'm going to show you a demo making use of some of the features and tools I've just spoken about. Before I get to the demo, I want to walk through some of the relevant components that will help you understand how things are working in the background. I won't go into too much detail on the source code, but it will be available on GitHub, and I'll share links to the relevant documentation for you to go through in your own time.

For starters, I've got a manifest file that contains three deployments. These deployments are different versions of the same application. The reason I'm doing this is to make it easy to identify the results of our load balancing configurations: each application version will give me a different response for the exact same endpoint. So when I query the Istio Ingress Gateway's load balancer and the relevant application endpoint, the response I get is what will help me distinguish the results from the different destinations. You'll also notice that I have an anti-affinity rule for any workload matching the particular key-value pair I've specified here; the key is app and the value is express-test. In addition to that, it's based on the topology key for availability zones, so it will repel the pod replicas spun up by the other deployments, because they all have the same label. Lastly, you'll notice a node affinity rule to ensure the pods are only scheduled to nodes created by Karpenter. The other deployment configurations have this exact same specification as well. These deployments share a ClusterIP service that forwards traffic to their replicas; because of the selector, it treats them as if they were the same application and version, which is what I want for this particular test.

Now let's move on to the Istio configurations, with two custom resources in particular: the virtual service and the destination rule. First off, I've got a virtual service that is responsible for routing traffic that comes in from the Istio Ingress Gateway, and you can create multiple gateways. In my case, I've created a gateway called express-test-gateway, which is for this particular application. As you can see under the routing rules, it will route traffic for this test URI to the ClusterIP service for my deployment.
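For reference, the general shape of those two resources looks roughly like this. The hosts, test URI, and service name here are stand-ins, not the exact values from my repo:

```yaml
# A hedged sketch of a Gateway and VirtualService pairing.
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: express-test-gateway
spec:
  selector:
    istio: ingressgateway # bind to the default Istio Ingress Gateway
  servers:
    - port:
        number: 80
        name: http
        protocol: HTTP
      hosts:
        - "*"
---
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: express-test
spec:
  hosts:
    - "*"
  gateways:
    - express-test-gateway
  http:
    - match:
        - uri:
            prefix: /test # hypothetical test URI
      route:
        - destination:
            host: express-test # the shared ClusterIP service
```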
After routing occurs, we can apply further rules using the destination rule resource. Let's pay close attention to the distribute property under the locality load balancer settings. You can see that I'm setting certain weights for the different AZ destinations based on where the traffic is coming from. If it's coming from eu-west-1a, then most of the traffic should go to the upstream service in eu-west-1a, and the same principle is applied for the other AZs in order to reduce cross-zone traffic.

Lastly, let's look at the provisioner for our Karpenter nodes. The provisioner is a custom resource that allows us to configure how Karpenter spins up nodes, and lets us apply parameters that fit the workload. Let's look at a few of the properties here. You'll notice that I have consolidation enabled, so that Karpenter can continuously attempt to reduce cluster costs with the nodes it provisions. In addition, if we take a look at the requirements section, you'll see that it can provision nodes in all three availability zones of eu-west-1. I've excluded certain instance categories, and I've capped the CPU capacity it's allowed to provision across the EC2 instances it creates. There's more you can do with it, but that's all we're concerned with at this point, and we can now finally take a look at the demo.

If we take a quick look at k9s, you can see that I've got three replicas running for this application, as expected: a single replica for each of the deployments. And if we look on the far right, you can see that each of these is running on a separate host. But let's see which availability zones these hosts are in. I'm going to fetch the nodes that have been provisioned by Karpenter for this application in eu-west-1a. As we can see, there are no hosts for this application in eu-west-1a, so let's have a look at the other AZs. Karpenter has created two nodes in eu-west-1b and one in eu-west-1c. And if I take a quick look at eks-node-viewer and select the nodes created by Karpenter, I can see the instances provisioned and how much they will cost me on a monthly basis. As we can see here, I've got three t3.medium instances that would cost me about $91.04 per month.

Now I'm going to run a simple script that makes 20 requests to the Istio Ingress Gateway. The 20 requests are just an arbitrary value; you could increase or change that value as you see fit. The Istio Ingress Gateway uses a Classic Load Balancer running in three AZs, eu-west-1a through eu-west-1c. This will show us whether our destination rule configurations are working as expected. Remember, we've configured traffic coming from a given AZ to be sent to the service in that same AZ. If we take a look at the results from querying that specific endpoint, we can see three different applications, or rather three different versions of the same application. Each of them gives us the response "simple node app working", but two of them add a suffix that specifies the version, v1.12 and v1.14. If we're honest, though, these results don't exactly show us that our destination rule configurations are working as they should.
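Before I show how to verify that, here's roughly the shape of the destination rule I described a moment ago. The weights below are illustrative rather than the exact ones from my configuration, and note that Istio expects outlier detection to be configured for locality load balancing to take effect:

```yaml
# A hedged sketch of a locality-weighted destination rule.
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: express-test
spec:
  host: express-test # the shared ClusterIP service
  trafficPolicy:
    loadBalancer:
      localityLbSetting:
        enabled: true
        distribute:
          # Keep most traffic originating in eu-west-1a inside eu-west-1a;
          # each "to" map must sum to 100.
          - from: eu-west-1/eu-west-1a/*
            to:
              "eu-west-1/eu-west-1a/*": 70
              "eu-west-1/eu-west-1b/*": 20
              "eu-west-1/eu-west-1c/*": 10
          - from: eu-west-1/eu-west-1b/*
            to:
              "eu-west-1/eu-west-1a/*": 20
              "eu-west-1/eu-west-1b/*": 70
              "eu-west-1/eu-west-1c/*": 10
          - from: eu-west-1/eu-west-1c/*
            to:
              "eu-west-1/eu-west-1a/*": 20
              "eu-west-1/eu-west-1b/*": 10
              "eu-west-1/eu-west-1c/*": 70
    # Locality load balancing requires outlier detection to be enabled.
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 30s
```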
So one way to verify that is to update the destination rules. As a test, I'm going to send most of the traffic to eu-west-1a, regardless of the source. And there you have it: you can see that the destination rule is working correctly. So I can now revert the changes I just made and go back to the correct configuration of sending traffic from a given source to a destination service in the same AZ. And that brings things full circle.

Thanks a lot for watching my session. I hope you found it helpful. As I mentioned earlier, feel free to get in touch with me if you want to connect. You can also download the slides for this presentation; they're attached to the talk. Enjoy the rest of the conference.