Thank you. So I'm here to talk about how we optimize resources in Kubernetes: how you can reduce cost and how you can improve utilization on your Kubernetes clusters. I'll show you what you can do with open source Kubernetes, I'll show you what we have built on top for our business use cases, and hopefully you'll get some ideas that you can go back and use yourselves. One thing that people know me for, for good or bad, is that I started the Jenkins Kubernetes plugin. I'm happy that you all made it here right after lunch instead of going to take a siesta, which would be the proper thing. I hope you got coffee, because we're going to be here for the next two hours. Just kidding.

I work on one team at Adobe. We use Kubernetes at Adobe; a lot of people tell me, "I didn't know Adobe uses Kubernetes." Yes, we use Kubernetes a lot, and we're hiring, by the way. I work on the Experience Manager product, so here's a bit of an introduction so you understand what type of application we run. It's an existing Java OSGi application, with all the things that happen with Java and the JVM. It was already distributed before running on Kubernetes. It uses a lot of open source components from the Apache Software Foundation; we use a lot of open source and we contribute back. It has a huge market of extension developers who write their own extensions, pluggable components that run on Adobe Experience Manager, and they write modules that run in-process on AEM. So this is an interesting use case, because we run customer code in our multi-tenant clusters.

On the cloud service, AEM Cloud Service specifically (there's also on-prem and managed services), we have more than 25 Kubernetes clusters, and we keep growing every month. Because this is a content management system, people want to run it in multiple regions.
They want to run close to their customers, so we have the US, Europe, Australia, Japan, and we keep adding new regions as needed. And because customers can run their own code, we also limit permissions for security reasons. We use namespaces to provide scoping, so different customers cannot see each other's data, plus network isolation, and we make use of quotas and permissions to isolate tenants.

I like to refer to it as a micro-monolith, because our use case is not the typical "I have one deployment and I scale it up to 200 pods." We have thousands of deployments that are very similar. They also scale, but it's not one deployment that scales a lot; it's thousands of smaller deployments. We have multiple teams building services on top of Kubernetes at the application level. This is also important, because we want these optimizations and ways to scale to be orthogonal to the developer teams. We don't want to chase people on each team to do something; sometimes it's way better to have something that applies to the whole cluster, where we don't have to require each team to do anything, we do it for them. Those solutions are way better.

We are extensive users of resource requests and limits. For those of you who are newer to Kubernetes: the request is how many resources are guaranteed, and the limit is how many resources can be consumed beyond the requested amount. We play with these a lot to make sure we can scale while keeping the cluster stable. You can apply them to CPU: if you have more CPU usage than your request, you may end up with CPU throttling. For memory, the limit is enforced: if you try to use more memory than the limit, the kernel is just going to kill your workload. The same applies to ephemeral storage in containers.
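The requests-and-limits mechanics described here map directly onto the container spec. A minimal sketch with made-up values; the pod name and image are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example                     # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:latest     # hypothetical image
      resources:
        requests:
          cpu: "500m"               # guaranteed: half a core
          memory: "1Gi"             # guaranteed memory
          ephemeral-storage: "2Gi"
        limits:
          cpu: "2"                  # sustained usage above this is throttled
          memory: "2Gi"             # exceeding this gets the container OOM-killed
          ephemeral-storage: "4Gi"  # exceeding this can get the pod evicted
```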
There's also a request and a limit you can set for ephemeral storage, and if you have pods that use more ephemeral storage than the limit that was set, you're going to get pod evictions. So you have to play with all these resources, with requests and limits, in a way that you don't need a huge cluster to run, and your workload is not going to crash all the time.

On AEM there's a specific issue for Java applications. Anybody here running Java on Kubernetes? Yes, a lot of people. So you probably know that the JVM is going to take all the heap memory at startup and manage it itself. You set the heap size, and if you set, I don't know, 75% of the memory as the heap size and you have one gig, it's going to take those 750 megs and use them all the time, and Kubernetes doesn't have any visibility into what's actually used or not; the JVM just takes it. That makes it a bit harder to get visibility into what's happening. On JDK 11 and later, the JVM will detect how much of a memory limit is set and will never try to use more than that. In previous versions, by default it would size itself based on the host memory, and that would typically cause crashes.

So, to start on the things you can do to improve usage. First, obviously, the Kubernetes Cluster Autoscaler. Who is using the Cluster Autoscaler? Everybody. Unless you're running on bare metal, you're going to use the Cluster Autoscaler to increase and reduce the cluster size. You can base it on CPU and memory requests. We always leave room for spikes, because you don't want new pods waiting for nodes, given the time it takes for a node to come up and so on. We have multiple scale sets for different reasons; specifically, we want different availability zones, and we have multiple worker tiers, Kubernetes node groups, on Azure. We have a maximum number of nodes defined that the cluster can have; we don't want to scale past the limits.
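For reference, settings like the ones just described (multiple node groups, a hard cap on cluster size) live in the cluster-autoscaler's flags. A rough sketch; the node-group names and numbers are made up, and exact flags vary by cloud provider and autoscaler version:

```yaml
# Fragment of a cluster-autoscaler Deployment (illustrative values only)
command:
  - ./cluster-autoscaler
  - --cloud-provider=azure
  - --expander=least-waste         # prefer the node group leaving the least idle capacity
  - --max-nodes-total=300          # never scale past a size known to be safe
  - --balance-similar-node-groups=true
  - --nodes=3:100:workers-zone-1   # min:max:node-group, one entry per scale set
  - --nodes=3:100:workers-zone-2
```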
We know which limits are safe. And we use the least-waste scaling strategy, which minimizes idle CPU. I put a rough number there, 30 to 50 percent. I don't think anybody in their right mind would not use the Cluster Autoscaler, maybe in some very specific use cases; if we had to run clusters at full capacity all the time, the amount of money we would be wasting would be crazy.

These are real examples, so you can see the Cluster Autoscaler moving the number of nodes in a cluster up and down. Some spikes could be, I don't know, maybe a day of the week, or maybe business hours or something like that for our customers, and we have these typical patterns. Sometimes you'll see other patterns that are a bit more scary, like this one going to the limit of the cluster size. This was because of a bug that triggered the autoscaler to scale up. You see how at some point the autoscaler went crazy, and because we have the limit set up, it didn't keep going; once we figured out what was happening, the number of machines started stabilizing and going down.

The other typical option is the Horizontal Pod Autoscaler. Who is running the Horizontal Pod Autoscaler? A lot of people. This is basically creating more pods of a deployment when you need them. We have two metrics set up for the HPA: autoscaling on CPU and on HTTP requests per minute. CPU is a bit problematic, because you could have periodic tasks or startup tasks that spike the CPU. Imagine somebody makes a mistake, or a customer makes a mistake, and on startup the CPU spikes, especially with Java, maybe with garbage collection and things like that. This would cause a cascading effect.
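The two HPA metrics mentioned here would look roughly like this in an autoscaling/v2 HorizontalPodAutoscaler. The metric name `http_requests_per_minute` is hypothetical and assumes a custom-metrics adapter exposes it; the object names and numbers are made up:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: customer-env                # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: customer-env
  minReplicas: 2                    # two pods for HA
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70    # beware startup CPU spikes cascading
    - type: Pods
      pods:
        metric:
          name: http_requests_per_minute   # assumes a custom-metrics adapter
        target:
          type: AverageValue
          averageValue: "1000"
```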
If every time a pod starts there's a spike in CPU, that triggers another pod to start up, which will also spike on CPU, and so on. So it's something you've got to be careful about. AEM specifically also needs to be warmed up on startup: because we are serving content, we want caching to be warmed up, and there are a bunch of other reasons. For us, request-based scaling is better suited, as long as customers don't have expensive requests; if you have one request that takes a lot of resources, then the number of requests is not a good indicator of when to scale up. This probably saves us 50 to 75 percent compared to running fully scaled, eight pods, ten pods or whatever, for each customer. It allows us to run from two pods up; we use two pods for HA in production environments and one pod for other environments, and we don't run at the limit all the time. Here's another example: you can see that the number of pods more or less matches the requests per minute we get, and the requests per minute is, again, a very typical business-hours-versus-night pattern, and the pods just match it.

Another option you have in Kubernetes is the Vertical Pod Autoscaler. Anybody using the Vertical Pod Autoscaler? Less people, okay. This basically increases and decreases the resources used by each pod or each deployment. You don't scale with more pods; you take one pod and increase or decrease its resources. Something that is problematic is that it requires restarting the pods. If you have something that is very fast to start, that may not be a problem; if you have something that takes a long time to start, that's a problem. It can be applied automatically or on the next start, so you're not killing pods continuously.
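A VerticalPodAutoscaler that applies recommendations only when pods are recreated, rather than evicting them, is configured through `updateMode`. A sketch with hypothetical names; the "scale down only, some containers only" behavior is approximated here by capping `maxAllowed` and switching other containers off:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: dev-env-vpa                 # hypothetical name
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: dev-env
  updatePolicy:
    updateMode: "Initial"           # only apply when pods are (re)created
  resourcePolicy:
    containerPolicies:
      - containerName: app          # only manage this container
        maxAllowed:                 # cap, so recommendations mostly shrink requests
          cpu: "1"
          memory: "2Gi"
      - containerName: "*"
        mode: "Off"                 # leave all other containers untouched
```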
You could just say: if these pods get rescheduled or restarted, apply the changes then. But that also makes it slow to respond, and it can exhaust the resources of a node if all the pods running on that node scale up at the same time. So in our case we only use it in developer environments, we only use it to scale down if needed, and only for some containers, because for all those reasons it's not very good for our production use. This saves us a small percentage, five to fifteen percent or something like that, of the resources for developer environments.

Now, some things outside of Kubernetes that we built ourselves for our use case. The first one is hibernation, and it's very similar to the scale-to-zero problem. You can solve that today with the Horizontal Pod Autoscaler and custom metrics, or with Knative or functions or something like that, but this has a twist. First, our pods take a while to start, some minutes, so it's not like you can just bring them down to zero and scale up again. And we don't only scale down a deployment; most of the things you find in Kubernetes are deployment-specific, but this is more of a business concept of a customer environment. When a customer environment doesn't have any requests coming in for x amount of hours, we scale the whole environment down, which is several deployments, and we also delete ingress routes and other objects that may limit cluster scale. Because we have a lot of environments, thousands of environments running in each cluster, we have thousands of ingresses, and at some point that becomes a problem for reprogramming the ingress controller.
So on hibernation we also delete those ingresses. This is implemented very simply: it's a cron job that goes to Prometheus, checks the number of requests in the last n hours, and if the environment was not accessed, it hibernates it. For the hibernation we change the ingress route, so the customer or the user gets pointed to a website where they can click a button to de-hibernate it. We do this only for some environments, what we call sandboxes: development environments for customers' developers, more like playgrounds. The savings we get, as I said, are 60 to 80 percent, so this is huge for us, because it allows us to pack a lot of things into the same cluster, and it's very stable in our case. Then at some point, if you haven't used your environment for x amount of months, we're just going to clean it up, delete it, garbage collect it, whatever you want to call it.

Another thing that we built, in collaboration with another team at Adobe, is a project called ARC, which is automatic resource configuration. One thing we noticed when we analyzed our clusters is that a lot of services request more memory and CPU than they are actually using. ARC can transparently reduce those CPU and memory requests. If we see that a cluster has a very low utilization rate, like five or ten percent on CPU, we just apply a percentage reduction to the whole cluster or to specific namespaces. This is what I was telling you before: we have different teams doing different applications, so instead of telling them "go analyze your usage and do this and that," we just apply this across all the namespaces, specifically for sandbox and stage clusters, non-production. We just go and say: everything goes down. It doesn't touch the limits, so the side effects are limited.
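Going back to the hibernation mechanism for a moment, the "cron job that checks Prometheus" could be sketched as a Kubernetes CronJob like the following. Everything here — the image, the tool and its flags, the Prometheus address — is hypothetical; the real implementation also swaps the ingress for a wake-up page and needs RBAC to scale deployments:

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hibernator                  # hypothetical name
spec:
  schedule: "0 * * * *"             # run hourly
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: hibernator   # needs rights to scale deployments and edit ingresses
          restartPolicy: Never
          containers:
            - name: check
              image: example/hibernator:latest   # hypothetical image
              args:
                # Hypothetical tool: query Prometheus for requests in the last
                # N hours; if zero, scale the environment's deployments to 0
                # and point its ingress at a "click to wake up" page.
                - --prometheus=http://prometheus:9090
                - --idle-hours=8
```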
Most likely you're not going to trigger Java out-of-memory errors or the kernel killing your pods. Obviously, if for whatever reason many pods that use a lot more resources than they request happen to be on the same node, you could get CPU throttling and some side effects. But that's part of what we analyzed: we look at the utilization and at the chances of pods with high resource usage being on the same node at the same time. It's a risk/benefit thing.

ARC also has a recommender part that leverages historical metrics at the deployment level. It gives you recommendations, as annotations, about optimizing the deployment based on how much is actually being used. So it's a bit like VPA, but it gives you historical data, it applies to the whole cluster or namespace, and people don't have to know about VPA or create a new CRD for VPA or anything like that; it's automatic for them. So we can dial down requests at the cluster or namespace level, and this gives us a 10 to 15 percent reduction.

In this graph, for instance, we have the requests, the blue line, and we see that it's very consistent that the utilization is very low. For this you also have to consider that our workloads are very specific: we are serving content, and if there are no users coming to the website, there's no activity, but you cannot just shut it down.
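The utilization gap that ARC exploits can be measured directly. Assuming Prometheus scrapes cAdvisor and kube-state-metrics (these are the standard metric names, but your label setup may differ), the fraction of requested CPU actually used, per namespace, is roughly:

```promql
# Fraction of requested CPU that is actually used, per namespace
sum by (namespace) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))
/
sum by (namespace) (kube_pod_container_resource_requests{resource="cpu"})
```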
So we cannot scale to zero in production; we rely on the HPA to pick up the pace if needed, but we always have to have at least two pods running all the time, even if there's no traffic, because at any point in time there could be. So we have pretty low utilization in this cluster. We have the current requests and the original requests, and you can see that the actual request is a percentage lower than the original request; this is what we are saving. Then we have the limits, which we always set pretty high, just in case we have temporary spikes, so we don't have to rely on the HPA for those. The only risk is having very noisy neighbors on the same node.

A question I anticipate: why do we use ARC and not the VPA recommender? This comes from the team where Joe is working; ARC allows them full control of the recommendation engine, and the implementation, as I said, is at the cluster level, so you don't have to deal with specific deployments, it just applies to everything running in the cluster.

Okay, so to sum it up, we have a few optimizations. I didn't mention FinOps in the whole talk; maybe I should have, because that's all the trend now. From the Kubernetes ecosystem we use the Cluster Autoscaler, the HPA, and the VPA, and there are some new things coming, like HPA scale down to zero, which was very interesting and was already out there some releases ago. And for VPA, I don't know if it's ready or not, but I think there is, or will be, the possibility of changing the requests live, without having to restart the pods. I read about that; I don't know what the status is right now.
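The live resize referred to here exists as the `InPlacePodVerticalScaling` feature gate (alpha as of Kubernetes 1.27). With it enabled, a container can declare which resources may be resized without a restart; a sketch with hypothetical names:

```yaml
# Requires the InPlacePodVerticalScaling feature gate (alpha as of Kubernetes 1.27)
apiVersion: v1
kind: Pod
metadata:
  name: resizable                   # hypothetical name
spec:
  containers:
    - name: app
      image: example/app:latest     # hypothetical image
      resizePolicy:
        - resourceName: cpu
          restartPolicy: NotRequired       # CPU can change in place
        - resourceName: memory
          restartPolicy: RestartContainer  # memory changes restart the container
      resources:
        requests:
          cpu: "500m"
          memory: "1Gi"
```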
Internally, we built this hibernation, very simple and more business-case oriented, and ARC, which is not so business-case oriented but more of a Kubernetes cluster thing. They apply at different levels, the application level and the infrastructure level, and hopefully a combination of them will help you optimize and reduce the resources you need. So I hope this will be useful to you and you can go and apply some of these ideas to your use case. And now I'll take some questions.

Moderator: Thank you, Carlos. We have one question here from the online audience. They want to know more about ARC: is it open sourced, and do you have any plans to let the community access it?

Carlos: No, it's not open source. I don't know what the plans are, and this is from another team. Do you have anything to say yet?

Answer from the audience: It is in the works. About a month ago there was, you know, talk about this. Stay tuned; we should hopefully have something coming out in the summer.

Moderator: Great, thank you. I think there's another microphone a little bit further down as well, if we're going to line up and ask questions there, and I'll pass this one around up here.

Q: Hello, thank you for the talk. I want to know: after a deployment is scaled down, how do you deal with resource fragmentation, so the nodes themselves can be scaled down? Typically the workloads are distributed across multiple nodes, so the cluster autoscaler will not scale down the nodes. If you could consolidate some pods onto some nodes, you could scale those nodes down.

Carlos: Yeah, I think whether you want to do that or not is all tunable in the Cluster Autoscaler. We have a lot of things that come and go, so we don't have very long-lived pods and we don't have that problem much. I don't think we have anything more than a few days old, so there's continuous movement.
We never had any issue with that, and we also apply updates more or less frequently, which may trigger node restarts and evictions and so on, so we have enough movement of workloads that we never had to worry about it too much. Thank you.

Moderator: There's a microphone halfway down the room as well; please line up down there, so I don't have to run across the whole room.

Q: Hi. I have a small question about how you know how much resource to give to each pod in the request, CPU and memory. I'm not talking about autoscaling, but about the static numbers.

Carlos: That was just trial and error and experience over time. We started with some numbers; the different teams building applications came up with those numbers, and we just help them see what's happening by providing them Grafana dashboards and things like that. Each team looks at it and says, you know, I'm using more, I'm using less, and they keep refining the initial numbers.

Q: Did you think about building a system or platform that would tell each deployment the optimal CPU and memory it needs?

Carlos: Yes, our recommender does that: it sets annotations at the pod level and tells them what their real usage is, if they want to use that. With some teams we go and tell them, look at this, because it's wasting a lot of CPU, and we let them know. But my whole goal is that each team is vertically independent: we provide the tools and they own it. Unless there's a problem, we just let them know they should tweak it; it's up to them to do it.
Q: Do you instruct them to put requests and limits for both memory and CPU?

Carlos: We enforce that at the pull-request level. We use Rego policies with conftest, and we have a set of policies, typical ones like security policies: you should not mount secrets as environment variables, you should not mount things from the host, and you have to put requests and limits.

Q: Hi, thank you for your talk. You mentioned a couple of times the risk of keeping your requests quite low while the limit is quite high, if you've got noisy neighbors. Do you have any recommendation for that type of scenario?

Carlos: It depends on your case. In our case, on average, statistically, we don't have much of that problem, but it depends on your business case. The only solution is to raise the requests, or to identify the workloads you know are going to be very busy and use labels to get them scheduled on certain nodes, and have other nodes that are not so busy. You could have the workloads with fewer resources and more spiky behavior on one node, and the ones with more resources elsewhere. If you run machine-learning workloads, for example, those are going to be at 90 or 95 percent CPU all the time; otherwise you're just wasting money, so you want them packed up there. In our case we cannot do that, because it doesn't depend on us; it depends on how much traffic we're getting, we cannot make the load be there all the time, and it has to handle the spikes.

Q: Is there any recommendation on the instance types you're using? And are you using spot? I don't know if you're using AWS, but are you using spot instances?

Carlos: We use Azure, and we don't use spot instances, for pricing reasons; I guess it's not worth it. But I've done it in the past, using spot instances for things like Jenkins builds.
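Coming back to the pull-request enforcement mentioned a moment ago: a conftest policy requiring requests and limits could be written in Rego roughly like this. A minimal sketch, not the actual policy set:

```rego
package main

# Reject Deployments whose containers omit a CPU request...
deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.requests.cpu
  msg := sprintf("container %s has no CPU request", [container.name])
}

# ...or a memory limit (analogous rules would cover the other resources).
deny[msg] {
  input.kind == "Deployment"
  container := input.spec.template.spec.containers[_]
  not container.resources.limits.memory
  msg := sprintf("container %s has no memory limit", [container.name])
}
```

Run with `conftest test deployment.yaml` in CI, so the pull request fails before anything reaches the cluster.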
On Kubernetes you can just use spot instances if you don't care that sometimes a build fails because there are no spot instances available; if you can wait a bit more, that's fine. As for the types of nodes, we use standard ones. What we measure is the proportion between CPU and memory that our application uses, and then we went and looked at VMs that have that proportion. Ours is pretty normal, as I said, but if you have high CPU usage, you would go with a higher CPU-to-memory ratio. So it depends on your use case, and also on whether you mix workloads or not.

Q: Would you prefer large nodes, or many smaller ones?

Carlos: We prefer fewer, larger nodes, because there's also a limit on how many nodes you can have, so the larger the better, typically.

Q: I'm wondering if you could go into more detail about your JVM configuration and memory usage. I think the newer garbage collectors are able to release memory to the operating system, but I haven't had a really good experience with that, so I'm wondering if you could share your insights.

Carlos: One thing is that we don't let the default JVM algorithm decide how much heap to use; we set it, I think, to 75% of the available request. We don't have much off-heap memory, so 75 was the safe high number. You also have to consider that the JVM changes its defaults based on how much memory and CPU is available, so I wouldn't recommend sticking with the defaults, because you're going to get surprises: maybe you move to pods with less memory and suddenly the JVM changes your garbage collection algorithm, your amount of heap, and things like that. It's better to always set it explicitly to what you need. For us it was mostly about memory, setting the right amount of memory. In this application's case, we want the heap to always be the same size.
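Setting the heap explicitly, instead of trusting JVM ergonomics, can be done with standard flags; since JDK 10 (and 8u191) the JVM reads the container's memory limit, so the percentage flags are relative to it. A sketch with illustrative values:

```yaml
# Container env fragment: pin the heap instead of relying on JVM defaults
env:
  - name: JAVA_TOOL_OPTIONS
    value: >-
      -XX:InitialRAMPercentage=75.0
      -XX:MaxRAMPercentage=75.0
```

Setting the initial and max percentages to the same value keeps the heap a fixed size, matching the preference for a heap that doesn't grow and shrink.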
We don't want it to go up and down. You could also set a minimum and a maximum, but that also affects how garbage collection works and things like that.

Q: Thank you so much for the presentation. At my company we came up with similar solutions, although probably a bit more crude. So my question is: what's next? Do you have any ideas on how to make even better use of resources?

Carlos: Yes. We want to improve hibernation a bit more. And, trying to figure out which ones I can tell you about: we were looking at different VM sizes, but we kind of left that for now, it looks good enough for us; as we start mixing different workloads, we may have to revisit it. Probably also increasing the CPU usage, because our CPU usage is very low on average, and then we have times when there's a lot of traffic, Black Friday, whatever. And we're looking now more at other costs; it's not so much about the infrastructure level, but about how we can pack more customers into the same clusters, because not only does that increase the usage, but having bigger clusters also means less maintenance, less work to do. So it's not directly about resource utilization, but also about how to reduce the cost of operating the whole thing.

Moderator: I think we have time for one last question down there.

Q: One question over here, Carlos. I've been reading a lot about not setting CPU limits; I mean, about people recommending not setting CPU limits on pods. We've been trying to implement that at our company, we just did it recently. What's your opinion about that?

Carlos: Yeah, I was looking at that recently. What happens if you don't set CPU limits is that your workloads are basically going to share CPU based on how much they request.
So if you have two pods requesting one CPU each, they are going to share the CPU 50/50. The problem is when you have pods that request a huge amount of CPU, like the ones for our customers, which have, I don't know, four CPUs or eight CPUs, and then you have cluster services that you want always running, operators or something, that only request 0.5 CPUs. Now suddenly the big workload can take a lot of the resources of the node, and this other workload, which may be critical for you, is starved. So that's the balance, and why it's not clear to us whether we want to remove the limits or not. Thank you.

Moderator: Excellent, I think that concludes the Q&A section. If you have more questions for Carlos, please take them outside, and don't forget to rate the session in the app afterwards.