Good afternoon, everyone. I hope you are all doing well. I know it has been a long day with a lot of presentations, so this is the last one of the day, and I'm really happy to welcome you all to this presentation on optimizing Kafka consumers with Kubernetes and KEDA. Let me introduce myself: this is Shubham here, and I have my colleagues Simran and Yonzo, who will be co-presenting with me today.

What do we do? We work in the platform team at Grab, which is responsible for the real-time data engineering platform that we run on Kubernetes. This platform is used by various service teams. Before I dive into our journey of optimizing the scaling of this platform, I would like to go over our use case a little, so that you can relate to the experiences we went through.

Essentially, we run a data processing platform on Kubernetes that consumes millions of messages streamed across thousands of Kafka topics. Each of these pipelines runs as a deployment, so we have thousands of deployments running across multiple Kubernetes clusters, and together they form the event sourcing platform that we have built in-house in Go. I will not go too deep into the application side of things, but the essential part to know is that these are Kafka consumers, and being Kafka consumers comes with certain limitations, as you may know if you have worked with Kafka. In the world of Kafka, data flows through topics, and topics are divided into partitions. In simple terms, when we design a deployment, we cannot have more pods than the number of partitions, because the extra pods would have no partitions to consume from. Secondly, the load that each pod serves depends on the partitions it is consuming from. This is a little different from a typical load-balanced application, which spreads load evenly across all pods; here, the load depends on the partitions assigned to that particular pod.

To give a little more context about the platform itself: it performs various data engineering operations, like aggregation, filtering, and mapping. A very basic example: when a customer makes a booking at Grab, the data generated by the booking platform needs to go through certain data engineering transformations, like aggregation, before it can be used by other teams. This is where our generic platform kicks in. Teams design their own logic, their own business case, and their own data transformations on this platform in a very generic way, and then use the resulting data. For example, this booking data might be used by analysts to improve the customer experience.

So this is a brief architectural anatomy of our platform. You can see a Kubernetes cluster running these deployments, consuming from Kafka, with the data being produced by various service teams. After transformation, the data goes to predefined stores, like Scylla, Kairos, MySQL, and S3, for real-time as well as offline use cases.

Today, we will deep dive into how we optimized our infrastructure for this platform to scale in a very generic way across thousands of different use cases, pipelines, and deployments. But there are certain platform goals that guided our decisions, and I want to go over them first, because they will help you understand why we made certain choices the way we did.
So firstly, we definitely want a platform infrastructure that is optimal in cost while at the same time scalable and available. We cannot compromise on the stability of this platform, but at the same time we have to think about cost optimization.

Secondly, we want to balance load across all the pods in a deployment. As I mentioned earlier, being Kafka consumers that consume directly from the partitions of a topic, we want the load on each pod of the deployment to be the same, so that we don't run into problems like the noisy neighbor problem or pod-level throttling, which can be quite problematic for us.

Third, as a platform team we give our customers, in our case the service teams, certain SLAs around data freshness. In simple terms, data freshness means how soon data can be used after the producer has generated it and it has passed through our platform. We generally measure this through the consumer lag of a given pipeline, and it is very crucial for our system.

And lastly, being a platform team, we want to provide a very good user experience to our service teams and don't want them to do a lot of resource tuning for their particular pipelines. We want to abstract that away so that they can focus on their pipeline logic rather than worrying about infrastructure scalability.

With these platform principles in mind, we began our journey with the Vertical Pod Autoscaler, also known as VPA. You might have used it or know about it: it scales the application vertically rather than horizontally, by changing the size of the pod. In our case, in this first attempt, we relied on average CPU and memory metrics for VPA to scale on.

So how did the Vertical Pod Autoscaler work for us? This shows how VPA works in general, where it changes the size of the pod. And as I was saying, these are Kafka consumers, so each pod is attached to one of the partitions. To begin with, we kept the number of replicas in the deployment the same as the number of partitions that the deployment consumes from. After that, we relied completely on VPA to right-size the deployments based on their resource requirements for stability. This setup also gave us a good abstraction for our end users, because they don't have to configure any resources or even re-tune when there is an organic change in traffic over time. The pods would automatically resize with changing business trends over the years, with no intervention required from us as a platform team or from the end user.
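Just to make this first setup concrete, here is a minimal sketch of the kind of VerticalPodAutoscaler object involved. The deployment name, replica assumptions, and resource bounds are illustrative placeholders, not our production values.

```yaml
# Hypothetical pipeline deployment: replicas pinned to the topic's partition count,
# with VPA left to right-size CPU and memory requests per pod.
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: booking-aggregation-pipeline-vpa   # illustrative name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-aggregation-pipeline     # deployment runs one pod per Kafka partition
  updatePolicy:
    updateMode: "Auto"                     # VPA evicts pods and recreates them with new requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: "2"
        memory: 4Gi
```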
This setup performed well for us, but there were certain challenges. For example, resource utilization was quite good at peak hours, but it was not very good during off-peak, as you can see from the dashboard. As I stated earlier, these are Kafka consumers, so despite our effort to design an architecture with equivalent load across all pods, which is how it should work in theory, in practice it didn't come even close. The load on a partition is decided by the partition key, which is chosen by the producers, and these partition keys are business-oriented. Hence, in reality, we didn't have equivalent load across all the pods.

This unbalanced load skewed the average CPU and memory metrics that VPA uses to right-size, and eventually led to consumer lag, affecting our SLAs on data freshness for the pods consuming from heavier partitions compared to the others. Third, the deployments we ran with VPA were consistently running at the maximum number of pods at all times, which meant we were always running the same number of nodes, peak or off-peak, and that was definitely not cost efficient for us. With these shortcomings in mind, I will invite my next speaker, Simran, who will talk about our next design attempt.

OK, so as my colleague just mentioned, with VPA we hit a few roadblocks with regard to CPU efficiency as well as the cost effectiveness of the solution. So what we did was take a different dimension of scaling altogether: we moved from the Vertical Pod Autoscaler to horizontal scaling, using the Kubernetes Horizontal Pod Autoscaler.

With the Horizontal Pod Autoscaler, the metrics we used were the same as for VPA, namely average CPU utilization and average memory utilization. What changed for us was that we no longer had to keep the consumer replicas equal to the number of topic partitions at all times, which meant that during off-peak hours these deployments could scale down. The deployment used these average CPU and memory utilization metrics to gauge the amount of load it was processing: whenever there was an increase in load on the topic during peak hours it would scale out, which also helped with better distribution of load across pods, and when the incoming load on the topic reduced, HPA would trigger a scale-in.

Using these scaling operations that follow the fluctuations in the traffic coming into the topic, we were able to raise CPU utilization from about 20% to about 50%, which was a great achievement in terms of the resource efficiency we were looking for. Beyond resource efficiency, we could also improve on cost effectiveness: during off-peak hours there was a drastic reduction in the number of pods running, which meant the number of Kubernetes nodes required at off-peak hours was also significantly lower. So we definitely did achieve the goals we were looking for in terms of resource efficiency and cost effectiveness.
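As a rough sketch of this stage, an HPA of the kind we are describing would look something like the following; the target utilizations and replica bounds here are illustrative, with maxReplicas capped at the topic's partition count.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: booking-aggregation-pipeline-hpa    # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-aggregation-pipeline
  minReplicas: 4                            # allows scale-in during off-peak hours
  maxReplicas: 32                           # never more than the topic's partition count
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70              # averaged across pods, so a few hot pods stay hidden
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 75
```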
But then, the Horizontal Pod Autoscaler came with a new set of challenges, and there were four of them. The first was uneven load distribution across pods. As Shubham mentioned earlier, one aspect of the load on a pod is which partitions are assigned to it; HPA can only scale in or out based on the load the entire deployment is processing, but which partition gets assigned to which pod is something we inherently cannot control. The second was unbalanced resource utilization across pods. As I mentioned, we were using average CPU and memory utilization metrics to drive these scaling operations, which meant there could be scenarios where one or two pods were over-utilizing their resources, and that kind of over-utilization is not something HPA can detect.

And hence this led to our third challenge, which was higher consumer lag. When resource consumption went well above the target resources we had provided to the deployment, processing slowed down, and there was higher consumer lag for these pipelines, which essentially meant we were breaching our service level agreements. Other than these three challenges, another major one was the need for continuous resource tuning with HPA. We set a fixed resource size for each deployment, and all HPA can do is scale out to the maximum number of replicas to process the load. In scenarios where the organic traffic in the topic changed, it required manual intervention from the user to provide the appropriate resources to the deployment.

So in order to solve some of these challenges, we incorporated KEDA. As most of you might have heard in the keynote, KEDA is an event-driven autoscaling mechanism: an open source operator that helps horizontally scale pods based on external triggers or events. While HPA also provides the possibility to incorporate external metrics, KEDA is better suited for applications that are event-driven, and it provides an extensive set of scalers to address some of the challenges we just talked about.

Using KEDA, we incorporated custom metrics to gauge the Kafka consumer lag, which is directly associated with our SLAs. By utilizing these custom metrics, we were able to keep track of the Kafka consumer lag and hence maintain our service level agreements. Apart from that, KEDA also helped us achieve higher resource efficiency. How? Once we had guardrails on the important metrics like consumer lag, we could set a more relaxed target for our resource metrics. Before KEDA, we were setting a resource target for CPU utilization of, let's say, 70%; now, since we had targets or guardrails on consumer lag, we could push CPU utilization to 80% or higher, which helped us achieve higher resource efficiency as well.

In the context of our system, KEDA helped us achieve our goals in three major ways. One was the ability to incorporate complex scaling rules. The second was the seamless integration it provided with our Datadog monitoring stack: the custom, application-specific metrics we talked about are maintained on our Datadog dashboards, and using KEDA's Datadog scaler, which is provided out of the box, the integration was seamless. And the third was, obviously, the ability to use application-level metrics, which in our case specifically was the Kafka consumer lag.
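A minimal sketch of a KEDA ScaledObject wired to the Datadog scaler is below. The Datadog query, lag threshold, secret names, and replica bounds are placeholders, not our production configuration; KEDA manages the underlying HPA for the targeted deployment.

```yaml
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: datadog-auth                         # illustrative; points at a secret holding Datadog credentials
spec:
  secretTargetRef:
  - parameter: apiKey
    name: datadog-secret
    key: apiKey
  - parameter: appKey
    name: datadog-secret
    key: appKey
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: booking-aggregation-pipeline-scaler  # illustrative name
spec:
  scaleTargetRef:
    name: booking-aggregation-pipeline
  minReplicaCount: 4
  maxReplicaCount: 32                        # still capped at the topic's partition count
  triggers:
  - type: datadog
    metadata:
      # placeholder query: consumer lag for this pipeline's consumer group
      query: "avg:kafka.consumer_lag{topic:booking-events,consumer_group:booking-aggregation}"
      queryValue: "50000"                    # scale out when lag exceeds this illustrative threshold
    authenticationRef:
      name: datadog-auth
```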
Now, incorporating KEDA with HPA did solve a lot of the challenges concerning our CPU and memory utilization efficiency, our cost effectiveness, and the major challenge regarding SLAs. But there were also scenarios we needed to consider around KEDA reaching its maximum scaling limits. This essentially meant that whenever there is an organic change in traffic, as we talked about with HPA as well, we would still need to manually resize deployments. And this need to manually resize deployments also meant there was a lack of abstraction of these resource configuration details from the end user. So in order to solve these challenges specifically, we designed an operator based on a custom resource definition. This Resource Advisor operator is something we designed in-house, and I would like Yonzo to talk about it.

Okay, before we dive into the details of the Resource Advisor, let's first talk about why we need it in our case. The Resource Advisor plays a crucial role in addressing the vertical dimension of our deployment strategy, which specifically relates to the allocation of resources within a pipeline deployment. While KEDA effectively manages the horizontal scaling of the pods, it does not govern the size of the individual pods. Consequently, the Resource Advisor becomes important in addressing this vertical dimension within our pipeline infrastructure.

Okay, so now let's talk about how the Resource Advisor is used. First, when do we run it? We execute the Resource Advisor during off-peak hours for a pipeline, when the traffic is low, to avoid conflicting with the scaling being done by KEDA; this is handled by a component called the scheduler. Next, how does the Resource Advisor work? There are three key steps in this process. We gather historical data to understand the application's processing and resource requirements before changing its pod size, which is handled by a component called the data collector. After that, we make a recommendation based on that data, which is done by a component called the recommender. And lastly, we need a way to apply the advised resources, and that is the job of a component called the updater. So in total we have four major components in our Resource Advisor, all of them managed by a Kubernetes custom resource definition operator. Hence, by scaling vertically, our Resource Advisor ensures we allocate resources optimally for our pipelines, which means we can smoothly adapt to organic changes in traffic over time.

Okay, here is the whole workflow of our Resource Advisor integrated with KEDA. First, the horizontal scaling is still managed by KEDA at all times, and it ensures peak resource utilization based on the day's traffic throughput. Next, the Resource Advisor is triggered during off-peak hours based on a schedule. After that, data is collected from Datadog custom metrics, along with the current resource metadata for the deployment. We also define certain conditions under which the Resource Advisor makes a recommendation. For example, one scenario is a pipeline being throttled even when KEDA has already scaled the deployment to the maximum number of pods; this would lead to consumer lag for the pipeline and affect our service level agreements. At that point, to stabilize the pipeline, the obvious choice is to allocate it more resources, so the Resource Advisor will recommend a higher resource size for its deployment. Finally, the recommendation is applied, the changes are made to the deployment, and the pods are resized.
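Purely to illustrate the shape of such a custom resource, here is a hypothetical example. The API group, kind, and every field name below are invented for this sketch; they are not Grab's actual CRD schema, only a way to map the four components onto one object.

```yaml
# Hypothetical custom resource for the in-house Resource Advisor operator.
# All names and fields are illustrative assumptions, not the real schema.
apiVersion: resourceadvisor.example.com/v1alpha1
kind: ResourceAdvisor
metadata:
  name: booking-aggregation-pipeline-advisor
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: booking-aggregation-pipeline
  schedule: "0 3 * * *"              # scheduler: run during off-peak hours to avoid fighting KEDA
  lookbackWindow: 7d                 # data collector: window of Datadog custom metrics and resource metadata to examine
  recommendation:                    # recommender: conditions under which to propose a resize
    maxConsumerLag: 50000            # if lag persists at KEDA's max replicas, recommend larger pods
    targetCPUUtilization: 80
  updatePolicy:
    mode: Auto                       # updater: apply the recommendation to the deployment
```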
Okay, now let's talk about the tangible benefits at the end of our journey, along with our plans for the next steps. There are four major benefits I would like to talk about here. Firstly, improved resource efficiency: based on our statistics, switching from VPA to HPA with KEDA and the Resource Advisor CRD operator increased the daily average platform-wide CPU utilization from 15% to 57%, which means our pipelines are using resources better even during off-peak hours. Secondly, cost savings: as a result of the increased CPU utilization efficiency, the new architecture reduced the daily cost by 55%. Thirdly, upholding SLAs on data freshness and completeness: KEDA and the Resource Advisor ensure that we continuously deliver our platform SLAs around data freshness and zero data loss. Lastly, abstraction of resource configuration from platform users: the automatic Resource Advisor lets platform users focus on their business logic, without having to tune resources manually.

Okay, now let's talk about our next steps. We will be focusing on two areas. The first is uneven load distribution among pods. KEDA and the Resource Advisor don't guarantee an equal sharing of the load among pods; their main job is to act as safeguards, making sure we don't run into problems like consumer lag, which might violate our service level agreements. To address the challenge of uneven throughput caused by uneven partition assignment among pods, we need to consider implementing measures like rate limiting at the application level. The second is advanced scaling based on more complex custom metrics. Instead of depending only on the metrics we use today, which are CPU usage, memory usage, and consumer lag, we would like to explore more advanced scaling rules to make the most of what KEDA and the Resource Advisor CRD can do, so that we can tackle complex business requirements more effectively.
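As one possible direction, KEDA already allows multiple triggers on a single ScaledObject, so combining a resource metric with an application-level Datadog signal could look roughly like the sketch below; the metrics, thresholds, and names are illustrative, not something we run today.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: aggregation-pipeline-multi-signal    # illustrative name
spec:
  scaleTargetRef:
    name: aggregation-pipeline
  minReplicaCount: 4
  maxReplicaCount: 32
  triggers:
  - type: cpu                                 # built-in resource-based trigger
    metricType: Utilization
    metadata:
      value: "80"
  - type: datadog                             # application-level signal, e.g. consumer lag or aggregation backlog
    metadata:
      query: "avg:kafka.consumer_lag{consumer_group:aggregation-pipeline}"  # placeholder query
      queryValue: "50000"
    authenticationRef:
      name: datadog-auth
```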
Okay, that's all of our presentation. We are excited to see these ideas already turning into real progress, and thank you for your time and for being an essential part of today's presentation. Now we will answer any questions you have for us.

Thank you for the talk. My question was around the Resource Advisor. Is the key to not having the two conflict mainly around having the Resource Advisor focus on long-term, integral data, versus having your KEDA scalers focus more on short-term, proportional and derivative signals?

Sorry, can you repeat the question?

To narrow it down: is the Resource Advisor more focused on long-term data?

Yes, that's correct. We use the Resource Advisor to right-size over a period of time, in cases where KEDA alone could not work for us: when we see an organic increase or decrease in traffic, or other changes, and since these are Kafka messages there can be many such changes, for example the schema might change or the topic size might change. So yes, the Resource Advisor handles long-term change in the entire pipeline setup, whereas KEDA does the daily scale-ins and scale-outs.

You mentioned that you had to leverage KEDA for custom metrics, but HPA also works with custom metrics. Why did you choose KEDA for custom metrics?

Yes, first of all, the next steps are definitely more visionary for us, but even before that, KEDA provided more extensive integration with different scalers, and we also have to expand our scaling rules beyond consumer lag alone for very different use cases. For example, in some use cases where aggregation is involved, we need to dig into different kinds of metrics. So KEDA helped us expand the portfolio of signals that we can incorporate in our system. What we talked about here, given the time, was a fairly basic setup, but we are working on, and will keep working on, more signals that we want to integrate with KEDA, which was not very easy with HPA right off the bat. Any other questions? Okay, then with that, that's it. Thanks for your time. Thank you. Thank you.