So I'm Yash, and today we'll discuss how you can optimize the overall network efficiency of your Kubernetes cluster by observing and monitoring relevant data points. We will see how we can optimize both cost and performance by tackling several problems in network monitoring, and by staging them together we'll show the impact of solving all those problems at once in an actual cluster. So I hope you're excited, and let's get started.

As a quick recap, we know microservice architectures very well; we are at KubeCon, after all. Microservices typically involve each application performing a specific part of a bigger system, and Kubernetes hosts them in the form of containers and pods. Comparing this to the monolithic architectures we used to have, where one big application performed the entire request-response flow, we now have several services communicating with each other to produce the response. One critical observation here is the increased number of network calls. As an example, here we have five microservices making more than ten service-to-service interactions for the same request-response flow, which means the overall performance of the system directly depends on how optimized these network calls are, even more so in container and Kubernetes systems.

So let's look at some of the common problems we have in managing and hosting these microservices, and how these network calls hold much more potential for optimizing our systems than we usually give them credit for.

Let's start with the simple ones first. How do you identify the workflow for any given application, specifically the containers and other parts involved in a single flow of the application? This might seem very easy if you are the owner, developer, or architect of that application, but that is not necessarily going to be the case for everyone. Imagine cluster administrators, or anyone not technically involved in the life cycle of that application: they would have to go through the entire design documents and understand the systems well enough to extract this information, which may not even be necessary if you're doing something completely different, like optimizing the system. Additionally, there are complexities like having the same services shared across different teams, maybe bounded by geographical regions as well. And sometimes you have a very fast pace of development, especially in startup-like environments, so things are not very ideal that way. What we need is a way to solve this automatically.

And the solution, if you think about it, is very simple: all you have to do is observe and monitor the network logs. In any operating system, we have network logs containing information about the network operations going through it. So we observe the network entries in container-level files; for Linux containers, you can use the /proc/net/tcp files, which hold the network information about the source IP, the destination IP, and the processes making these network calls. Once you map all these IPs to the fully qualified domain names of the services you have in Kubernetes, you get a span of what each network call has been, and once you aggregate them together, you can form a span, or a directed graph, of these interactions. And there you have it: you have the flows in hand.
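To make this concrete, here is a minimal Python sketch of that idea: read a container's /proc/net/tcp, decode the hex-encoded endpoints, and aggregate them into a directed interaction graph. It handles IPv4 only, and the resolve_service() helper is a hypothetical stand-in for mapping IPs to Kubernetes service FQDNs (say, via a cached view of the endpoints API); a real collector would also filter out listening sockets and loopback entries.

```python
# Minimal sketch: build a directed service-interaction graph from /proc/net/tcp.
import socket
import struct
from collections import defaultdict

def decode(hex_addr):
    """Decode '0100007F:0050' into ('127.0.0.1', 80)."""
    ip_hex, port_hex = hex_addr.split(":")
    # /proc/net/tcp stores IPv4 addresses as little-endian hex
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)

def parse_proc_net_tcp(path="/proc/net/tcp"):
    """Yield (local_ip, local_port, remote_ip, remote_port) per socket entry."""
    with open(path) as f:
        next(f)  # skip the header line
        for line in f:
            fields = line.split()
            yield decode(fields[1]) + decode(fields[2])

def resolve_service(ip):
    # Hypothetical: map a pod IP to its service FQDN; identity for the sketch.
    return ip

# Directed graph: edges[src_service][dst_service] = observed connection count
edges = defaultdict(lambda: defaultdict(int))
for l_ip, l_port, r_ip, r_port in parse_proc_net_tcp():
    edges[resolve_service(l_ip)][resolve_service(r_ip)] += 1
```

Aggregating these edge counts across all containers over time is what gives you the directed graph of the application flow.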
And there are several open-source and even third-party closed-source tools out there that can do this out of the box, but they're all going to use the same logic we just discussed. So: simple problem, simple solution. Let's go to the second one.

The second problem is how to identify the latencies involved in these container interactions, which is a widely discussed problem. You may need this information for debugging, or for judging why your application is not responding as well as you would have expected, and so on. So how do we solve that? Again, similar to what we had before: we use the same network logs, which also carry timing information. For each source and destination we have, all we have to do is keep the delta of the timestamps. We collect this data over a period of time, figure out what the time interval has been, and maybe apply some basic data smoothing and operations like that, and this information is readily available for you to use as well. Again, if you're using a service mesh or any of several third-party tools for monitoring and observability, this could be of no worry to you, because it is a very trivial thing in network monitoring.

These two problems we just discussed were quite simple and quite easy to solve: either there are tools directly available, or there are at least well-defined industry patterns for solving them. Now let's look at more difficult and challenging problems.

Imagine we have a sparse cluster where applications are interacting across high-latency barriers. These barriers could be data centers across geographical locations, or something else entirely. Now, if you would like to optimize these calls, the obvious thing that comes to mind is to replicate these microservices across the barriers, so that fewer long-distance calls go out and the response could be faster. That is, of course, only if replication is feasible; but if it is, that is one possible option you have.

But while this is necessary, it may not be sufficient. Why do I say so? Because latency barriers are often hard to isolate and could be several, ranging from network all the way to software. Imagine a service call coming from a particular location or a particular country being subject to extra authentication, or sitting behind a firewall: things that may not be obvious from looking at the architecture itself. Sometimes there are CDN-specific quirks, completely different things you may not even be able to foresee, especially if you're not deeply involved in the development of these applications. And there are other cases: if you're running a private or hybrid cloud infrastructure, you may not have the obvious culprits like regions, but you may still have bigger issues. For example, a particular data center could have a different QoS in network responsiveness, or a particular data center could have some hardware issue which you may not be able to detect that fast in real time. So what we need is a solution, customized to your application, that automatically groups the nodes that sit close together network-wise. That way you can figure out which nodes are approximately close to each other, and those are the only places where you would want to replicate things.
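As a rough illustration of the smoothing step from problem two, which the grouping below builds on, here is a minimal Python sketch that keeps an exponentially weighted moving average of latency per source/destination pair. The service names, the sample values, and the choice of EWMA as the smoothing method are all illustrative assumptions; the timestamped deltas would come from whatever source you actually monitor (network logs, probes, a service mesh).

```python
# Minimal sketch: smooth per-edge latency samples with an EWMA.
ALPHA = 0.2  # smoothing factor: higher reacts faster to recent samples

smoothed = {}  # (src, dst) -> EWMA latency in milliseconds

def observe(src, dst, rtt_ms):
    key = (src, dst)
    if key not in smoothed:
        smoothed[key] = rtt_ms  # initialise with the first sample
    else:
        smoothed[key] = ALPHA * rtt_ms + (1 - ALPHA) * smoothed[key]

# Illustrative request/response deltas between two hypothetical services
for rtt in (12.0, 15.5, 11.2, 90.0, 13.1):  # one outlier at 90 ms
    observe("cart-svc", "payment-svc", rtt)

# The 90 ms outlier nudges the average but does not dominate it
print(smoothed[("cart-svc", "payment-svc")])
```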
So what you would do is build on the same network latency data we used for our second problem. We aggregate the latency information we had across containers over time, and group it per node pair. This gives us an estimate of the average latency between containers that have historically run on those two nodes and have tried to reach each other at least once. Again, we may need a few data analysis techniques, like exponential smoothing, to derive this, but it's something that can be done fairly straightforwardly once you have the incremental data at hand. And once we have this information, we are in a position to group the nodes in a way that solves the problem at hand.

Let's take an example to understand this. Assume we have interactions in a graph like this, where each edge denotes the average latency between two nodes. How do we figure out which nodes are close and which are not? It looks like a simple graph problem, so let's go back to data structures. We can use a series of traversal methods to group nodes with low inter-node latency together. One such method is a simple greedy approach: you keep growing a group until you cannot add any new node whose edges into the group are no worse than the edges already inside it.

Let me give an example right away. Say we start with node N1. We see there are two neighbors, N2 and N3, in close proximity. Following the greedy approach, you first add N3, because that is the edge with the least latency, so N1 and N3 go in the same group. Now you try to add the next node, N2, but both edges from this group to N2 are more than one millisecond, and one millisecond is the only latency inside the group so far, so N2 cannot join the current group; you create a new group for it instead. You go ahead, add N2, and continue, and starting from N1 you end up with groups like these: N1 and N3 in one group, N2 and N4 in one group, N5 and N6 in one group, and then N7 and N8. So that is one possible grouping of these nodes that could be of use to you. Similarly, you try the same thing starting from node N2, node N3, and so on, and you get all the possible candidate groupings of these nodes.

Why are we grouping these again? Because we want to figure out where replicating the applications would be necessary and sufficient at the same time. You obviously do not want to replicate things unnecessarily everywhere; that is a manyfold increase in cost. At the same time, you want to replicate in the places where it is actually needed. So once you have all these candidate groupings from the different starting points, you have to figure out which set of groups you should actually pick. One mathematical tool that helps here is something called the latency ratio: the ratio of the sum of latencies internal to a group to the sum of latencies leaving the group, because you want low internal latency within a group and high external latency to everything outside it. What you want is the lowest possible value of this ratio for the groups you create.
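Here is a minimal Python sketch of one way this greedy grouping and the latency-ratio score could look, under my reading of the rule above: a node joins a group only if every edge from it into the group is no worse than the group's current worst internal edge. The edge weights are hypothetical numbers chosen to reproduce the N1/N3-versus-N2 example; a real implementation would run this from every seed and compare scores.

```python
# Minimal sketch: greedy latency grouping plus the latency-ratio score.
import itertools

# Symmetric node-pair latencies in milliseconds (hypothetical values)
lat = {
    ("N1", "N2"): 4, ("N1", "N3"): 1, ("N2", "N3"): 5,
    ("N2", "N4"): 2, ("N3", "N4"): 6,
}

def latency(a, b):
    """Look up the symmetric edge weight; missing edges count as unreachable."""
    return lat.get((a, b)) or lat.get((b, a)) or float("inf")

def greedy_groups(seed_order):
    """Grow groups greedily: a node may join only if every edge from it to the
    group is no worse than the group's current worst internal edge."""
    ungrouped, groups = list(seed_order), []
    while ungrouped:
        group = [ungrouped.pop(0)]
        while True:
            # consider the nearest remaining node first, as in the N1/N3 example
            cands = sorted(ungrouped, key=lambda c: min(latency(c, g) for g in group))
            internal = [latency(a, b) for a, b in itertools.combinations(group, 2)]
            worst = max(internal) if internal else float("inf")
            joined = next(
                (c for c in cands
                 if min(latency(c, g) for g in group) != float("inf")
                 and all(latency(c, g) <= worst for g in group)),
                None,
            )
            if joined is None:
                break
            group.append(joined)
            ungrouped.remove(joined)
        groups.append(group)
    return groups

def latency_ratio(group, all_nodes):
    """Sum of internal latencies over the sum of finite latencies leaving the
    group; lower is better (tight inside, far from everything outside)."""
    internal = sum(latency(a, b) for a, b in itertools.combinations(group, 2))
    external = sum(
        latency(g, n)
        for g in group for n in all_nodes
        if n not in group and latency(g, n) != float("inf")
    )
    return internal / external if external else 0.0

nodes = ["N1", "N2", "N3", "N4"]
partition = greedy_groups(nodes)  # -> [['N1', 'N3'], ['N2', 'N4']]
score = sum(latency_ratio(g, nodes) for g in partition)
print(partition, score)  # compare this score across different seed orders
```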
And out of all the possible combinations we have, we select the overall set of groups that has the least sum of these individual group ratios. The end result is a logical grouping of nodes telling us where replication should ideally be done; within any one of these groups, replicating a service on even a single node is sufficient. This not only keeps performance high and provides visibility into your system dynamics, but also saves a lot of cost and avoids unnecessary application instances. And of course there are bigger implications, like carbon footprint and so on.

So that is the story on replication, but replication has another side as well: you cannot always replicate. Perhaps you have compliance regulations that prevent you from running a replica in a particular region at all. And that is before considering that replication carries an additional cost; you get a manyfold increase in the cost of running your systems. Imagine you have a large number of network barriers, or a large number of applications: even if you have ten applications running in parallel and you need several replicas for each of them, it becomes a multiplicative increase in the number of instances you are running. Do we really need replicas for every single thing? What if you could trade off a little bit of performance for potentially a lot of cost gains, and perhaps end up realizing that the potential loss of performance is not even observable in real usage? Let's see what we can do about that.

Let's take an example again. Say we have three distinct network groups, and our interactions look like this: an average of 300 milliseconds of latency between the first two network groups, an average of 150 milliseconds between group one and group three, and about 90 milliseconds between group two and group three. And say we have a sparse application with interactions across all three. If you look at this example, one option you have for application C is to replicate it in all three places, because there are interactions coming from all of them. But if instead you migrate it to sit near applications A and B in group one, you would still save a lot of overall cost and network response time, because the number of calls is much higher from that particular zone. You still save a lot of the overall time spent in this system per hour, and this is despite the slightly increased duration of calls coming from the third network group; the savings more than offset that increase.

So how do you figure out these kinds of things? We go back to the previous solution again. We already had the historical latency information from the solution we just built; we just had to append the interaction counts to our data. Take the volume of calls going from one container to another, sum it up, and you have that data at hand. Then we look at the possible migrations you could do given the latency information: as an example, what if you do this one and let go of that one?
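As a sketch of that what-if scoring, here is one way to compare candidate placements for application C by total network time per hour: each caller group pays the latency to the nearest copy of C, weighted by its call volume. The inter-group latencies are the ones from the example above; the per-group call counts are assumed for illustration.

```python
# Minimal sketch: score candidate placements of a service by network ms/hour.
cross_latency_ms = {  # symmetric inter-group latencies from the example
    ("g1", "g2"): 300, ("g1", "g3"): 150, ("g2", "g3"): 90,
}

def lat(a, b):
    return 0 if a == b else cross_latency_ms.get((a, b), cross_latency_ms.get((b, a)))

calls_per_hour_to_C = {"g1": 5000, "g2": 200, "g3": 300}  # assumed volumes

def network_ms_per_hour(placement_groups):
    """Each caller group pays the latency to the nearest replica of C."""
    return sum(
        volume * min(lat(src, dst) for dst in placement_groups)
        for src, volume in calls_per_hour_to_C.items()
    )

for placement in (["g1", "g2", "g3"], ["g1"], ["g3"]):
    print(placement, network_ms_per_hour(placement), "ms/hour")
# ['g1', 'g2', 'g3'] -> 0, ['g1'] -> 105000, ['g3'] -> 768000
```

Replicating everywhere drives the network time to zero but at three times the instances; with these numbers, a single copy of C next to the heavy callers in group one captures most of the benefit at a fraction of the cost, which is exactly the trade-off being described.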
You run a lot of these what-if scenarios and figure out which combination may actually give you the same behavior while perhaps letting go of a little performance. How much performance you are willing to trade is completely user-customizable, but it is something that can be very useful for a lot of users and use cases. Once you do that, you get a recommendation that looks something like this: instead of replicating pod 1 here, you can move pod 1 from wherever it is running in group one to node 1, 4, or 3, and you could potentially get a cost saving of so many dollars while still getting a speed-up of so much latency in a given hour, or whatever time frame you have. And how do we get this cost value? It is fairly simple: there are third-party cost tools for this, or you can derive it yourself from the network and resource utilization you already collect by taking an average of those values.

So these four solutions together are something we have already put out for everyone to use, if you would like to know more details about it, and there are a lot of options you can customize. So feel free to go ahead and try it out.

I would like to show how this has actually worked in one of our environments. We had a test environment with around 30 to 35 nodes, and it was just meant for running end-to-end test cases against automated workflows. We had about $4,500 of monthly cost for this particular cluster, and what we did was apply all four of these optimizations together at the same time and try to figure out how much of an improvement we could get out of it, whether in performance, cost, anything. When we did that, we were able to avoid at least around 20 to 25 unnecessary replicas of various applications that we used to have, and when we translated that into the network and resource values and the cost of running those services, we saved around $230 in this environment alone. That may not sound like a lot, but keep in mind this was a very, very small cluster. And once we had it running, we found it was not only helpful, but at the same time there was absolutely no observable performance hit in any of the developer workflows; no one actually figured out it had been running for a while. We had avoided so many replicas, and no one could tell; there was no performance impact. At the same time, the cost savings were such that if you extrapolate to our bigger test, staging, and production environments, they could be immense. This is a further testament to how these smaller optimizations can ripple into a big impact in larger systems.

So again, this is the tool that we have. I'll be here for any questions that you have. At this point, I think I've pretty much covered everything. So...