All right, I think we can get started. Hello everyone, thanks for joining the session. I'm Yash, and today we'll discuss two specific problems in container scheduling that have a high potential impact on the overall cost and performance of a system, and how they can be solved using logic derived loosely from the classic games Jenga and Tetris. So let's play.

Let's have a look at container schedulers. As you know, a scheduler is responsible for finding the optimal node on which your container should run. In general, it's a two-step process: the scheduler first filters out the nodes that can fit the incoming container in terms of available resources, and then scores those nodes on the basis of several predefined parameters. All the logic and intelligence of figuring out the best node lives in that scoring step. Whichever node ends up with the highest score is chosen, and the container is placed there. All schedulers, whether default or highly customized, perform these two steps, and they perform them statically, at the moment a new container needs to be placed. That, as we'll see, is where the trouble starts.

This is analogous to playing a game of Tetris. Each row represents a node in the cluster, with utilized capacity shown as blocks and available capacity as gaps. The incoming block is the container to be scheduled, and its size is the amount of resources it requests; in this example it needs four units. In the game, you make a best effort to place the block in the existing rows while minimizing gaps, aiming for something like the bottom few rows here: a densely packed system. But as anyone who has played for a while knows, you eventually end up with gaps scattered through every row, and the game gets harder. Something very similar happens with containers. As free resources in a cluster shrink, the optimization possibilities the scheduler has in the scoring stage shrink too. It becomes a best-effort scenario: there are very few choices left, and the container has to be placed somewhere fast. Going back to the game analogy, you're playing at a very high level, and in n dimensions, because you don't have just one resource; you have CPU, memory, and possibly more, all being considered at once.

Because there are fewer and fewer scheduling optimizations available, less and less effort goes into densely packing the system, and we start seeing more and more fragments of available resources that are too small to schedule anything new, so they simply sit unused. And if you think adding more nodes helps, it doesn't really: the scheduler can optimize placements on the new nodes, but whatever has already been scheduled, and whatever fragments exist on the old nodes, remain exactly as they are. We'll see the impact of this in a moment.
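To make that two-step process concrete, here is a minimal sketch of a filter-and-score pass. The Node and Container types, the numbers, and the dense-packing score are illustrative assumptions for this talk, not the actual Kubernetes scheduler API.

```go
package main

import "fmt"

type Node struct {
	Name    string
	FreeCPU int // millicores
	FreeMem int // MB
}

type Container struct {
	Name   string
	ReqCPU int // millicores
	ReqMem int // MB
}

// Step 1: keep only the nodes that can fit the container's requests.
func filter(nodes []Node, c Container) []Node {
	var fit []Node
	for _, n := range nodes {
		if n.FreeCPU >= c.ReqCPU && n.FreeMem >= c.ReqMem {
			fit = append(fit, n)
		}
	}
	return fit
}

// Step 2: score the survivors. This toy policy prefers the node that
// would be left with the least free capacity (dense packing); real
// schedulers blend many such signals.
func score(n Node, c Container) int {
	return -((n.FreeCPU - c.ReqCPU) + (n.FreeMem - c.ReqMem))
}

func main() {
	nodes := []Node{
		{"node1", 400, 800},
		{"node2", 1000, 2000},
		{"node3", 300, 500},
	}
	c := Container{"web", 250, 500}

	best, bestScore := "", -1<<31
	for _, n := range filter(nodes, c) {
		if s := score(n, c); s > bestScore {
			best, bestScore = n.Name, s
		}
	}
	if best == "" {
		fmt.Println("scheduling fails: no node fits", c.Name)
	} else {
		fmt.Println("schedule", c.Name, "on", best) // node3, the tightest fit
	}
}
```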
Let's take a real-world example. Say we have three nodes in the system with nine containers running across them. At this stage the cluster has a total of 600 millicores of CPU and 1200 MB of RAM available; let's consider only these two resources. Now, if I try to place a container C4 requesting just 300 millicores of CPU and 600 MB of RAM, which is much less than what the cluster has available overall, scheduling will fail, because no single node has both of those amounts free at once. If you have any kind of autoscaling enabled, it will spawn a few more nodes and place the container there. That solves the problem directly, but it adds to the cost of your cluster.

So what if there were some kind of overlooker that could identify these situations and check whether a rescheduling exists that fits the container within the cluster itself? In our example, if we move container C1 from node one to node two, we free up more resources on node one than C4 requests, and we can place it right there. This sounds simple, and it actually is, if you can identify the opportunity. One thing to note is that regular schedulers will never do this: once containers are scheduled, they are never touched again, which is fair enough, because you don't want your containers killed off without user input. So what we do is generate recommendations in the form of move-and-place operations, which can be implemented on clusters like Kubernetes using their underlying APIs and constructs such as node and pod affinities.

We have two migration strategies, singular and multiple. In the singular strategy, we displace exactly one container to make room for a bigger one, then recursively solve for the displaced container until everything is placed. In the multiple strategy, we remove several smaller containers to fit a big one and check whether the resulting subproblems can be solved. In both cases we always displace less overall capacity than what we are placing, so the problem keeps getting smaller as we go. None of this happens live on the cluster: we run simulations on an in-memory map of the cluster state, and whichever strategy yields the best result is the one chosen. We cap the search at a certain number of iterations; if it finds a placement, we use it, and if not, the regular flow continues as it would have anyway. So that's the first problem and how we solved it.
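Here is a rough sketch of what the singular strategy could look like in simulation, mirroring the C4/C1 example above. The types, helpers, and numbers are assumptions made for illustration; they are not the project's actual code.

```go
package main

import "fmt"

type Container struct {
	Name     string
	CPU, Mem int
}

type Node struct {
	Name             string
	FreeCPU, FreeMem int
	Placed           []Container
}

func fits(n *Node, c Container) bool {
	return n.FreeCPU >= c.CPU && n.FreeMem >= c.Mem
}

func place(n *Node, c Container) {
	n.FreeCPU -= c.CPU
	n.FreeMem -= c.Mem
	n.Placed = append(n.Placed, c)
	fmt.Printf("recommend: place %s on %s\n", c.Name, n.Name)
}

// tryPlace attempts direct placement first; failing that, it evicts one
// smaller container in the simulation and recursively re-places the
// evicted one elsewhere.
func tryPlace(nodes []*Node, c Container, depth int) bool {
	for _, n := range nodes {
		if fits(n, c) {
			place(n, c)
			return true
		}
	}
	if depth == 0 {
		return false
	}
	for _, n := range nodes {
		for i, victim := range n.Placed {
			// Only displace less overall capacity than what we are
			// placing, so the problem shrinks at every step.
			if victim.CPU+victim.Mem >= c.CPU+c.Mem {
				continue
			}
			if n.FreeCPU+victim.CPU >= c.CPU && n.FreeMem+victim.Mem >= c.Mem {
				n.Placed = append(n.Placed[:i:i], n.Placed[i+1:]...)
				n.FreeCPU += victim.CPU
				n.FreeMem += victim.Mem
				place(n, c)
				return tryPlace(nodes, victim, depth-1) // solve the smaller problem
			}
		}
	}
	return false
}

func main() {
	nodes := []*Node{
		{Name: "node1", FreeCPU: 200, FreeMem: 400,
			Placed: []Container{{"C1", 150, 300}}},
		{Name: "node2", FreeCPU: 200, FreeMem: 400},
		{Name: "node3", FreeCPU: 200, FreeMem: 400},
	}
	if !tryPlace(nodes, Container{"C4", 300, 600}, 3) {
		fmt.Println("no rescheduling found; fall back to scaling out")
	}
}
```

Because every displaced container is smaller than the one being placed, each recursive call works on a smaller problem, which is what keeps the search bounded.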
Now for the second problem, which is nearly invisible but significant. Take another example: three nodes, six containers running. If you take a closer look, node one is running at as high as 90% CPU usage, but its memory usage is very low. On node three it's the complete opposite story: 20% CPU usage but very high memory usage. That means all the CPU-intensive containers have somehow been scheduled onto node one, which is already starved for CPU. Those applications will fight among themselves to get hold of the CPU and may get throttled; in the memory case, they may even be killed and restarted. That's where this problem bites: it has huge potential to cause abrupt performance degradations and even downtime.

How do we solve it? Quite simply, actually: keep a balance. Say we swap the applications causing the most imbalance between node one and node three. The most CPU-intensive application on node one is C2, and the most memory-intensive application on node three also happens to be called C2. If we swap these two, the crucial effect is that resource usage on the two nodes becomes much more equitable: both end up at roughly 40 to 50% utilization of both resources, which means the applications can now absorb a sudden burst of load or traffic without degradation.

This is analogous to playing a game of Jenga, where each row again represents a node and each block a container. Just as in the game you remove the block contributing least to the overall stability of the tower, we swap the containers causing the most imbalance, the greatest disparity in resource usage. We identify that imbalance through a mathematical construct we call entropy, which, when only two resources are considered, boils down to the CPU-to-memory utilization ratio. We aim to equalize this ratio across all the nodes in the system, and we drive toward that with swap recommendations. We run these swaps periodically, over a fixed window, until we no longer observe a significant change in the entropy values of the nodes.
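To illustrate, here is a toy version of that balance pass. The skew metric below is a simple stand-in for the entropy measure, and the node numbers and swap amounts are invented for the example.

```go
package main

import (
	"fmt"
	"math"
)

type Node struct {
	Name            string
	UsedCPU, CapCPU float64
	UsedMem, CapMem float64
}

// skew is positive when a node is CPU-heavy relative to its memory use
// and negative when it is memory-heavy; zero means the ratios match.
func skew(n Node) float64 {
	return n.UsedCPU/n.CapCPU - n.UsedMem/n.CapMem
}

// imbalance sums each node's distance from a matched ratio; a perfectly
// balanced cluster scores 0. Each swap pass should drive this down.
func imbalance(nodes []Node) float64 {
	total := 0.0
	for _, n := range nodes {
		total += math.Abs(skew(n))
	}
	return total
}

func main() {
	nodes := []Node{
		{"node1", 9, 10, 2, 10}, // 90% CPU, 20% memory: CPU-starved
		{"node2", 5, 10, 5, 10},
		{"node3", 2, 10, 9, 10}, // 20% CPU, 90% memory: the opposite
	}
	fmt.Printf("before: imbalance = %.2f\n", imbalance(nodes))

	// Swap node1's most CPU-hungry container with node3's most
	// memory-hungry one: say 3.5 cores of usage move one way and
	// 3.5 GB of memory usage move the other.
	nodes[0].UsedCPU -= 3.5
	nodes[0].UsedMem += 3.5
	nodes[2].UsedCPU += 3.5
	nodes[2].UsedMem -= 3.5

	fmt.Printf("after:  imbalance = %.2f\n", imbalance(nodes))
}
```

Repeating this pass periodically and stopping once the imbalance number no longer improves gives the convergence behavior just described.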
Let me showcase the results we observed in one of our executions. We ran this at one of my previous firms, in a test Kubernetes environment that had heavy cloud nodes costing more than $2 per hour, and with autoscalers that would spawn two extra nodes in resource-crunch situations. After running this for a couple of days, we were able to reduce those autoscaled spawn-outs significantly, enough to save close to $5,000 in cost every month. We were all astonished, because that was close to 30% in savings, and it shows how impactful these seemingly invisible problems and minor optimizations can turn out to be.

So that's all we have. If you would like to know more about the implementation details on Kubernetes, and perhaps contribute or collaborate, here's the GitHub link for the project; we hope to see you there. If there are any questions, we have a minute for that.

[Audience question, inaudible.]

Yep, yep. Actually no, the only thing that's limited is the extra time you're willing to take to schedule the pod. Let's say you allow, for example, five extra seconds to schedule your container. If instead you did this at runtime by spawning a new node, say on a cloud provider like GCP, adding more machines to the system would take much more time. So you'd rather spend a few extra seconds. That budget is an input to the algorithm: given those five seconds, our search logic probes every branch of the search space it can until the deadline is reached. If it finds a solution in time, great; if not, whatever happens is still better than what would have been the case otherwise. So that's the input we take: how much time, what deadline, you have to complete this scheduling, or rescheduling, I should say. If there are no other questions, I think we're done. If you'd like to discuss anything with me, please find me and I'll be happy to answer your questions. Thanks for joining, and enjoy the rest of the conference. Thank you.
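As a rough illustration of that time-budget answer, the deadline can be treated as a plain timeout wrapped around the branch exploration. Everything below, the branch loop, the fake solution, and the numbers, is invented for the sketch; only the budget handling is the point.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// searchBranches stands in for exploring the move/place search space;
// it checks the deadline between branches and stops when time is up.
func searchBranches(ctx context.Context, branches int) (string, bool) {
	for i := 0; i < branches; i++ {
		select {
		case <-ctx.Done():
			return "", false // budget exhausted: fall back to normal scheduling
		default:
		}
		time.Sleep(10 * time.Millisecond) // simulate evaluating one branch
		if i == 42 {                      // pretend branch 42 is a valid rescheduling
			return "move C1 -> node2, place C4 -> node1", true
		}
	}
	return "", false
}

func main() {
	// The caller supplies the budget: a few seconds of extra scheduling
	// latency is still far cheaper than waiting for a new cloud node.
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	if sol, ok := searchBranches(ctx, 1000); ok {
		fmt.Println("found in time:", sol)
	} else {
		fmt.Println("no solution within budget; scheduling proceeds as usual")
	}
}
```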