Hi everyone, thanks for joining us today. We're going to give people a few minutes to join, so hang tight while we get ready to start. A reminder: today's webinar is about better open source scaling and right-sizing in Kubernetes. Joining us today are Andy Suderman, CTO, and Stevie Caldwell, lead site reliability engineer, both with Fairwinds. We'll give everyone a few minutes to join so we can get started in just a moment. I see a few folks joining us from a few of our different platforms. Wonderful.

Great, I'll go ahead and get started now. I'm Katie Greenlee and I'll be moderating today's webinar. I'm going to read our code of conduct, and then we'll hand it over to Andy Suderman and Stevie Caldwell from Fairwinds. A few housekeeping items before we get started: during the webinar you're not able to talk as an attendee, but there is a Q&A box at the bottom of your screen. Please feel free to drop your questions in there and we'll get to as many as we can at the end. This is an official webinar of the CNCF, and as such it is subject to the CNCF code of conduct. Please do not do anything in the chat or questions that would be in violation of that code of conduct; basically, please be respectful of all your fellow participants and your presenters. Please note that the recording and slides will be posted later today on the CNCF online programs page at community.cncf.io under Online Programs. They are also available via your registration link, and the recording will also be available on our Online Programs YouTube playlist. With that, I will hand it over to Andy and Stevie to kick off today's presentation.

Thank you. I'm going to share my screen so we can see these amazing slides I created. We're good? Cool. I'll kick things off. We're here today to talk about better open source scaling and right-sizing in Kubernetes. First, let's talk a little bit about who we are. I'm Andy, the CTO at Fairwinds. I've been with the company about six years, and I've been running Kubernetes in various production environments for about eight. We run Kubernetes clusters for other people, so that's where we get all this information, and we use all of this tooling on a day-to-day basis, running infrastructure for a lot of different customers across a lot of different verticals and a lot of different sized companies. I'll hand it off to Stevie to introduce herself.

I'm Stevie Caldwell, and I'm the tech lead for the SRE team here at Fairwinds. I've been at Fairwinds for about five years, so I came in the door shortly after Andy, and I've been working in tech in general for as long as I can remember: lots of previous lives as a network admin, sysadmin, and help desk person. For the last few years I've been focused primarily on the container space, and Kubernetes in particular.

Awesome. A lot of what we talk about these days is platform engineering, and that's one of those terms we like to throw around in various circles and mediums, so I always like to start with a foundation: what do we think of as platform engineering? What do we call a platform? What is it that we deliver to our customers that's built on top of Kubernetes?
Some people refer to a platform as an internal developer platform, but really the goal is developer enablement. We want to create a place for developers to build and deploy their applications in a way that's consistent, safe, and ideally quick. One of the things we talk a lot about is how quickly we can get new features out and how fast we can deploy. Kubernetes by itself is not necessarily a platform, but it is the foundation for a platform, and so a lot of the content we make and a lot of the things we talk about are things we build, install, run, and configure on top of Kubernetes to enable developers, platform engineers, DevOps engineers, and the myriad of other titles that work on this stuff to run their applications in production on top of Kubernetes. Today we're talking about right-sizing and autoscaling. Stevie's going to get into that a little more, but that's one of the pieces we deliver as part of the platform we give to our customers so they can run their applications securely and safely and deploy rapidly. So I'm going to hand it off to Stevie to talk about KEDA and all the fun things we can do with autoscaling.

Thanks, Andy. I did have a quick question for you before I get started. What do you call an iPhone that isn't kidding around?

An iPhone that isn't kidding around? I don't know, what do you call an iPhone that isn't kidding around?

Dead Siri-ous.

All right. Now that everyone's relaxed and feeling good, we're going to talk about autoscaling. First of all, I like to start off my webinars by level setting and not making assumptions about what people know, because I feel that makes a better experience for everybody. If you already know this, there's no harm in hearing it again; if you don't, it brings you up to speed so you get more enjoyment out of what we're talking about today. So what is autoscaling? I went searching around online and found a definition that seems decent: autoscaling, also spelled auto-scaling and sometimes called automatic scaling, is a cloud computing technique for dynamically allocating computational resources. A mouthful, like most of our tech definitions. Also, Andy, I'm full screen, so if you're trying to say something and I can't see you, just jump in.

Will do. I'm surprised there's not an acronym in that sentence, though.

Yeah, that's a really wordy way of describing autoscaling. In a nutshell, autoscaling is adjusting your compute resources in response to some threshold, some trigger, some desired state. At its core, it's something that informs our systems that we need more or less compute, and then something does that for us.

I was working on these slides and I asked Andy to come take a look at them, and he said, "What's with the burger?" I have this burger here because I was both hungry and thinking about autoscaling, and it occurred to me that one of my favorite places to get burgers is kind of a cool example of autoscaling, so bear with me here. This is a Five Guys burger; Five Guys is a burger chain, if you're not familiar with it.
When you walk into a Five Guys, what will often happen is that the person at the register turns around and yells "two patties" to the people cooking in the back. You've just walked in the door, you haven't talked to anybody, and they've already put two patties on the grill for you. So by the time you get up to the register, place your order, pay for your food, get your receipt, and step off to wait, your patties are that much closer to being ready to be slid into a bun and into your hand. To me that's kind of like autoscaling: the cashier is the loop, the thing that watches for some threshold, some event, some change. Me walking through the door is that change, that event, and having the cooks throw down two patties is the autoscaling, the action taken in response to the event. So I like it; I thought that was pretty cool.

I've never actually been in a Five Guys, but this makes a lot more sense having a burger on the slide. And now I'm hungry.

Of course you're hungry; it is a good-looking burger.

So there are many different types of autoscaling. In the space we're in, containerization and specifically Kubernetes, when we talk about autoscaling we're typically talking about one of three types. There's vertical pod autoscaling, which is increasing the CPU or memory requests of the existing pods; to keep with the Five Guys analogy, that's making bigger burgers: instead of two patties, make one really big patty. The horizontal pod autoscaler increases the number of pods to increase compute, which is putting more and more patties on the grill. And the cluster autoscaler increases the number of nodes, which also increases compute and often plays into increasing the number of pods or the CPU and memory requests, because you might need more underlying compute to actually accommodate those changes. That could be equated to adding another grill to the restaurant.

You might need another grill cook to work that grill, but yes.

Unless you're Squidward or SpongeBob. Maybe Squidward.

So those are the different types of autoscaling, just to give you a high-level overview. Today, as we talk about autoscaling and right-sizing in your cluster, we're really going to focus on the horizontal pod autoscaler. So what is the horizontal pod autoscaler? It consists of two main parts. There's a controller, which is a control loop that runs every 15 seconds by default: every 15 seconds it evaluates some things and takes some action. And then there's the HPA resource, which configures what the controller is going to do and what it's looking to affect. Over here we have a little snippet from an HPA resource configuration. It has a spec field, which should be very familiar if you've looked at any YAML for Kubernetes, and the various fields of that spec. We have maxReplicas and minReplicas, which give the controller guardrails for the scale parameters: always have no fewer than three replicas running in this cluster, and you can scale up to a hundred.
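For reference, a minimal sketch of the kind of autoscaling/v2 HPA manifest being described might look roughly like this; the scaleTargetRef and metrics fields are walked through next, and the names and namespace here are illustrative rather than copied from the demo:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: basic-demo            # illustrative name
  namespace: basic-demo       # illustrative namespace
spec:
  scaleTargetRef:             # which workload this HPA scales
    apiVersion: apps/v1
    kind: Deployment
    name: basic-demo
  minReplicas: 3              # never fewer than three replicas
  maxReplicas: 100            # never more than one hundred
  metrics:
  - type: Resource            # a "resource" metric: CPU or memory
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30   # act when average CPU usage exceeds 30% of the request
```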
And then there is the scaleTargetRef. The scaleTargetRef tells the controller which underlying pods to pay attention to, and it does this by using the selector labels on the deployment to find the associated pods. So in our example it will find the pods that are owned by this basic-demo deployment. And then the metric is the metric we're keeping an eye on, the one that determines whether or not this controller actually does anything. The metric we're looking at here is very basic, and this whole spec is super basic: it scales off a metric type called a resource metric, which is really just CPU or memory. You can probably infer what this means: averageUtilization 30. In a nutshell, it's going to look at all the pods, find the CPU utilization for each of them, average those out, and figure out whether the average utilization across those pods is 30 percent or more. If it is, it's going to take action and scale up accordingly.

Question for you. That utilization percentage: what is it a percentage of? What do we say utilization is? Is it the CPU limit, the CPU request, which one?

It's the actual current usage of CPU by the containers in the pod.

Cool. That's a good question, because if we're asking what this utilization is, that leads to the next part: where does it get this information? We're talking about average utilization across all the containers inside those pods, so where is that data coming from? That's where the metrics server comes into play. As you see on the slide, this requires a metrics server, because the metrics server exposes metrics for these workloads in a way the HPA controller can access and use to make its scaling decisions. The metrics server exposes metrics at the metrics.k8s.io API endpoint in the API server. This diagram gives you an idea of how that works without going into too much detail: cAdvisor runs on every node and collects metrics at the container level, then exposes those metrics through the kubelet API; the metrics server aggregates those metrics from all the kubelets on the nodes and makes them available at metrics.k8s.io.

I'm going to pop out of here for a moment and go over to my cluster to show you a little of what I'm talking about. I also get super butterfingery when I'm doing webinars, so I copy and paste a lot of my commands; otherwise we'd be here all day watching me type.

We're still only seeing the slides.

Oh, you're only seeing the slides? Hold on.

Stop sharing while you're switching; I'm going to answer a question that came in from the audience, if that's all right. We have a question about how HPA and Argo CD work together: "I assume that once HPA changes the number of replicas, Argo CD will change it back to what it was in the first place. Any idea how to approach that?" This is a really common problem when running HPAs on your deployments.
You can deploy a deployment spec without a replica count. If you leave out the replicas field in your deployment spec, Kubernetes will assume one, but it will then defer to letting the HPA set the desired replicas, so you don't have to specify it in your YAML. This actually goes for any deployment system, not just Argo CD: if I'm using an HPA with my deployment, I want to omit the replicas field from my deployment YAML so that when I do a deploy, or when Argo CD syncs, I'm not overriding the desired replicas of that HPA. We've actually seen behavior where you have a deployment that's being scaled up by an HPA, say to 20 pods, and you redeploy that YAML where you've specified replicas: 1, and it will immediately jump down to one replica and then jump back up as the HPA takes back over. You really want to avoid that situation by omitting the replicas from your deployment YAML whenever you're using an HPA. Very common issue, easy to fall into; there's a sketch of what that looks like a little further below.

Yeah, it's a good question. Can you see my terminal now?

I cannot. I think the host might need to add you.

It's fine, I will just share the tab specifically. How about that?

Now we've got terminal.

Okay, great. I'll just have to go in and out when I switch back and forth, so there will be a slight pause. All right, I wanted to show what the metrics endpoint that the metrics server is serving looks like. This kubectl command, if you pass in the --raw flag, lets you access API paths directly, in case you didn't know, which I think is pretty neat. So if I hit that endpoint, you can see it lists all these metrics: for every pod that's running in the cluster, it has metrics for every container within that pod. This is what the metrics server is making available to the HPA in order for it to make its scaling decisions.

Now we're going to go into this basic-demo namespace to show how HPA works, and we'll see what we have in here, which is essentially, if this is working, yep: we have a deployment called basic-demo where I have set replicas to two, so it's running two pods, and it has a service of type ClusterIP. It's just a generic demo site, and I think I have it set up with an ingress, so I'm going to go ahead and... oh wait, that's going to be a problem, I need to switch back to my other window. Sorry, y'all. All right, let's try going over here; I just wanted to show you the demo app. Can you see this tab?

Indeed.

Okay, great. So this is just a basic Kubernetes demo. There are two pods running, nothing super special. I've got my website going; nobody knows who I am, no one ever comes to it, so running two replicas of this is totally fine. And then one day I have an incident at Five Guys, and the next thing I know I've gone viral. Everyone's going to my website because they want to know more about me, and I've only got two pods running. Pretty quickly I'm going to max out on CPU or memory, so I'll either start being throttled or get OOMKilled, and that's a bad experience for my newfound fame.
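To make that earlier replicas advice concrete, here is a rough sketch of a deployment manifest with spec.replicas omitted so the autoscaler owns the replica count; the image, labels, and resource values are placeholders, not the actual demo app:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: basic-demo
  namespace: basic-demo
spec:
  # No "replicas" field: Kubernetes starts one pod on first creation,
  # and after that the HPA (or KEDA) owns the replica count, so a
  # re-apply or an Argo CD sync won't briefly stomp on the autoscaler.
  selector:
    matchLabels:
      app: basic-demo
  template:
    metadata:
      labels:
        app: basic-demo
    spec:
      containers:
      - name: basic-demo
        image: registry.example.com/basic-demo:latest   # placeholder image
        resources:
          requests:
            cpu: 100m        # requests matter: CPU "utilization" targets are a percentage of this
            memory: 128Mi
```

With Argo CD specifically, another option people use is pointing an ignoreDifferences rule at /spec/replicas in the Application definition, but simply omitting the field is the simpler path.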
So I want to scale up. One thing you could do, obviously, and this is the way things worked for a while, is go run a "kubectl scale deployment basic-demo", set the number of replicas to something, and scale up that way. But that's kind of guesswork and it's not particularly quick. If you're trying to do this in a reasonably smart way, you're going to want to look at metrics, look at your graphs, and make a guess about what you need, and by the time that happens your window of fame is over. So you want to move a little more quickly than that, and HPAs are super easy to deploy, at least in the form I've shown so far. You can literally copy and paste this kubectl autoscale command, and when we run it, what we'll see is that an HPA has been created referencing the basic-demo deployment. We can take a look at it, describe it, look at the YAML: you see its status and what the last recommended action was, and you can see the resource metric, which should look very familiar because we just saw it in my slides, averageUtilization 30 percent. We're going to scale up on that, but we're going to keep minReplicas at two, so at any given time we keep at least two of these pods running. Now that I'm famous and people are starting to come to my website, it will scale up accordingly when it comes under CPU pressure. So if we get pods and watch, and I have k6 over here, we can run k6 to hit this with some traffic while I'm doing a watch on the pods. I am not fancy like you, Andy; I couldn't get a fancier split-terminal setup working for this.

It's okay, it's fine. Can we do a kubectl get pods,hpa and watch that? I'd be curious to see the metric changing, or not.

Sure, let's do that. Oh yeah, so we're at 100 percent CPU utilization now.

Yeah, and as you can see, we spun up two more pods.

And if we keep going, I think we're going to see a lot more pods being created. So I've been able to run to my computer, tippity-tap my little command, and I'm scaling up. Now, there are obviously more pieces to this, like we were talking about with the other types of scaling, specifically cluster autoscaling: as you create more of these pods, they have to have somewhere to land, and you might run into some problems there. That's a topic for another time. But at its very basic level, you get a number of pods spinning up that will hopefully help your workload deal with the increased load. And then what happens is there's a certain amount of time these pods will sit around (you can see there's a bunch of them), and then the HPA will start to scale them back down once it determines, based on the metrics available to it at that metrics endpoint and your scaling rules, that it should. Eventually it gets back down to two pods once my moment of fame is over, traffic drops, and people stop coming to my website. By default there's roughly a five-minute window before that scale-down kicks in.
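That scale-down delay is tunable on the HPA itself. Assuming the autoscaling/v2 API, a hedged sketch of the behavior block looks like this; the policy numbers are illustrative rather than what the demo used:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: basic-demo
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: basic-demo
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 30
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # roughly the default five-minute wait before dropping pods
      policies:
      - type: Percent
        value: 50                        # shed at most half of the surplus pods per period
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0      # react to spikes immediately
```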
And I left it at that default, so it'll be a while before we see those pods start to drop. But that is essentially the horizontal pod autoscaler in a nutshell. It's super simple to use and super simple to deploy; I was able to deal with my sudden fame with no problem.

We're going to need to see a YouTube video of this Five Guys incident. The audience is requesting it, so I'm definitely going to need to see that.

So given how easy and simple this is, one might ask: why do we need anything else? Why do we need something like KEDA? That's because HPA does have some limitations. One of the limitations is that it can be complex to set up other metrics. The spec we looked at was very basic, scaling on CPU, and you can also scale on memory. But what if you want to scale on something else? What if you have apps deployed in your cluster that you made, which I'm sure you do, that you've instrumented to be scraped by Prometheus, and you want to use those metrics to scale your app? Or you're talking to a queue that lives outside of your cluster, like Amazon SQS, and you want to scale on the number of messages in your queue? It's doable, but it's not particularly easy, and it can be a little complicated to set up. In the API server there are two additional metrics endpoints, a custom metrics endpoint and an external metrics endpoint, that you can register a metrics server with in order to expose metrics at those endpoints in the API server. They require what's called a metrics adapter, and metrics adapters are usually provided by the thing that is providing your metrics.
They are a bridge between the source of your metrics and the API server. We talked about the metrics server, and it is a bridge of sorts as well; these metrics adapters are also bridges that do the same thing. Prometheus has a metrics adapter that is an additional installation. Datadog, which we also use internally, has a metrics adapter that I think is built into the agent. You have to register those adapters with the API server, and this diagram gives you an overview of some of the pieces you have to deal with if you're trying to do something as simple as using the metrics from your own app, like in this example HPA spec, to autoscale. In addition to your typical deployment and the pods that you've instrumented at the /metrics endpoint that Prometheus is scraping, you also have to install the Prometheus adapter, and then you have to configure the adapter with your custom metric so that it's available to Kubernetes in the API server. The part I find particularly difficult with this setup is that in the adapter you have to write the query that creates the metric as part of the adapter's configuration, and then you write your HPA to reference that metric. So you have to modify configuration in two different places just to scale...

...on any Prometheus query.

Exactly.

And I could write a Prometheus query in five minutes, but now I have to go put it in three different places.

Right, right, and that makes no sense. I had never tried to use Prometheus to scale on custom metrics before, and I thought, that can't be true, the metrics are right there, they're in Prometheus; why do I need to repeat that process with another add-on? It's silly, but it's what you have to do, and the Prometheus adapter is that bridge. It sits at the custom metrics API endpoint, and the horizontal pod autoscaler knows it can look there for those metrics when you use a custom metric type. So you can set Prometheus up and that works for custom metrics, and you can set up an adapter for, say, Datadog, and that would use the external metrics API. The difference between custom and external is truly just the source: custom metrics still come from inside your cluster, and external metrics typically come from outside your cluster. So here's Prometheus sitting at the custom metrics API endpoint serving up your app metrics, and you've got Datadog sitting outside.

And then what if you have something else outside your cluster that you want metrics from? Now I've got to get that metric into either Datadog or Prometheus and then go through the other three steps we just talked about.

That's right. And you can only have one metrics service there; another limitation of HPA is that you can only have one metrics server serving the external metrics API at a time, so in some cases you even have to decide where you want your metrics to come from.
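To make that "two places" pain concrete, here is a hedged sketch of what the setup tends to involve, assuming the kubernetes-sigs prometheus-adapter and a hypothetical http_requests_total counter exposed by the app; the exact rule syntax varies by adapter version, and all names and numbers here are illustrative:

```yaml
# Part 1: a prometheus-adapter rule (typically in the adapter's ConfigMap or Helm values)
# that turns a raw Prometheus series into a custom metric the API server can serve.
rules:
- seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
---
# Part 2: the HPA that references the metric the adapter rule now exposes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                 # hypothetical workload
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods                              # custom (in-cluster) metric, averaged per pod
    pods:
      metric:
        name: http_requests_per_second      # must match the name produced by the adapter rule
      target:
        type: AverageValue
        averageValue: "100"                 # target roughly 100 requests per second per pod
```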
So that's a limitation of HPA. HPA is also limited in the objects it can scale: by default, with resource metrics, it can scale deployments and StatefulSets, but not DaemonSets. And HPAs can't scale to zero, so if you have workloads where you know you don't need those pods running all the time, you still have to run a minimum of one; that's the default if you're scaling off resource metrics with HPA.

That's why we have KEDA, which bridges some of these gaps, fills them in, and extends HPA. Honestly, when I was first learning about KEDA I thought it was a replacement for the horizontal pod autoscaler; I didn't realize it worked in concert with it to do its job, which is pretty cool. KEDA is a CNCF project, and it was developed, I want to say, by Red Hat and Microsoft. What does it stand for? It stands for Kubernetes Event-Driven Autoscaling.

So the basic high-level pieces of KEDA (and I'll drop into other windows to show some of these things): we have the scaler, which is responsible for fetching metrics. The scaler can talk to an external source of metrics, and we'll look at what those scalers look like and how you implement them. There are a bunch that are built into KEDA, and you can also make your own. They fetch metrics from an external source, they watch that external source constantly, and then, based on a trigger, some event that you configure, they send the metrics to KEDA's internal metrics adapter. So there's a metrics adapter involved here as well, registered at the external metrics API endpoint, and it puts those metrics there. And then the controller, or operator, is responsible for bringing the workload up to the desired state based on the trigger and the metrics. There's actually a thing I learned that's pretty cool, and I'll go into a little more detail if we have time when we look at a ScaledObject: KEDA does its scaling in two phases. There's a phase where it controls scaling itself, and then it hands off to the HPA to do further scaling.

So those are the basic elements behind KEDA's operation. When you install KEDA in your cluster, it installs CRDs. TriggerAuthentication and ClusterTriggerAuthentication are CRDs that provide objects that allow you to set up authentication to your external trigger, your external sources, and the difference between the two is the same as between a RoleBinding and a ClusterRoleBinding: one is namespaced and the other is available across all namespaces. And then there are ScaledObjects and ScaledJobs, which are resources you define that tell KEDA how to create an HPA, because under the hood that's essentially what KEDA is doing: getting these metrics, creating an HPA based on your definition, then fetching metrics, watching the source you defined, and helping the horizontal pod autoscaler scale.

Let's look real quick at scalers. We'll go to the KEDA site; there are 64 scalers available just built in, out of the box.
So there are all these scalers, and each one of them defines the way...

We are still seeing your slides.

No? I don't know what's wrong; let me go over to this other page. Screen sharing will be the end of me. Curse you, screen sharing. All right, so here's the KEDA scalers page. Like I said, there are 64 available, and all of these define the parameters you use to watch, interact with, and get events and data from the source, and how you authenticate with the source. This is an unfortunate thing I have to keep doing; I would have made this more dynamic if I'd known. All right, so here's an example of a ScaledObject spec. I would have gone back and forth between them, but that's going to be a pain, so we'll just look at this.

This is a Prometheus scaler, and the Prometheus scaler is defined on that page I showed you. These are just some of the trigger parameters the Prometheus scaler makes available: activationThreshold and threshold. This is actually related to the different scaling stages I was referring to earlier. The activationThreshold tells you what level the metric has to reach for KEDA to do what's called activating the scaler, where it scales from zero to one, or back. And threshold is the scaling threshold, the point at which it hands control over to the HPA and has the HPA continue scaling based on its configuration, which again came from KEDA. Then we have a metric name and a query, and the server address for Prometheus. I feel like I'm running out of time, so I want to pop over to my terminal real quick.

While you're pulling that up, I'm going to answer a quick question from the audience, asking whether KEDA supports cluster autoscaling: does it increase and decrease the number of, this says clusters, but I'm guessing nodes, based on the defined metrics? The answer is no. KEDA is specifically focused on workload scaling, not on cluster scaling, and you really need both: you need a cluster autoscaler, and then you need to scale your workloads, and those should work somewhat in conjunction.

Okay, so real quick: I have until 1:45, is that it? I thought we had until the hour.

Oh, we do have until the hour.

Okay, then maybe I don't have to rush through so much. Great. All right, so I'm on my terminal; you can see that, right?

Yep.

Awesome. We looked at HPA and how that works with the basic-demo app. I have another app called Boutique that I've installed here. You'll have to take my word for it that it works, because I'm not about to stop sharing and go back over to my browser, but it's there. It's a web app that you can get online, developed by Google, I think, and here are all the pieces that are running. It's just a basic web app that shows a shop: a basic web app with ten services, very basic.

These days it feels like that's basic, but yeah, point taken.

Of particular interest to us is this deployment, front-end, which is what provides the UI for this fake website. We're going to see what it looks like to use KEDA to scale up our front-end deployment instead of using an HPA. So, like I said, this is my deployment, and I have here locally a KEDA object called boutique-scale.
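As a reference point for the ScaledObject slide from a moment ago, a Prometheus-triggered ScaledObject looks roughly like the sketch below; the server address, query, and threshold values are illustrative assumptions, not taken from the demo:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: basic-demo
  namespace: basic-demo
spec:
  scaleTargetRef:
    name: basic-demo                   # the Deployment KEDA will scale
  minReplicaCount: 0                   # unlike a plain HPA, KEDA can park this at zero
  maxReplicaCount: 10
  triggers:
  - type: prometheus
    metadata:
      serverAddress: http://prometheus-server.monitoring.svc:9090   # illustrative address
      query: sum(rate(http_requests_total{deployment="basic-demo"}[2m]))  # illustrative query
      threshold: "100"                 # per-replica target the generated HPA scales on
      activationThreshold: "5"         # below this, KEDA keeps the workload scaled to zero
```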
First, let me just show you the CRDs that are installed in my cluster because I installed KEDA. There's TriggerAuthentication and ClusterTriggerAuthentication, which we touched on in that slide, and there are ScaledObjects and ScaledJobs. I actually didn't touch on ScaledJobs. ScaledJobs are essentially what they sound like: instead of scaling pods that are long-lived until you no longer need them, KEDA scales your workload as a job, and your job will pull a message off of a queue, process it, and then go away. It's supposed to be particularly good for workloads that have long-running tasks involved.

Long-running tasks that end is the key, right?

Right. But yeah, it's great for doing queue processing.

And this one, CloudEventSources, have you used it? It's super alpha, I think.

I have not; I don't even know what that is. Okay. So those are the CRDs, but we're not focusing on those, because what we're going to focus on is HTTPScaledObjects. We talked about ScaledObjects, and I think we covered that pretty well, and I was going to show a demo using one of those with Prometheus as the scaler. Prometheus and I had a fight, and I decided that rather than continue that unhealthy relationship, I was going to pivot to another CRD that is very similar, because ultimately you just want to see what this looks like. So forget Prometheus, I don't know you.

This is an HTTPScaledObject. It looks a lot like the ScaledObject we saw in the slide. It's got the basic Kubernetes YAML portions here, name and namespace. The spec is almost a combination with an ingress controller because of the host portion, but it has this familiar scaleTargetRef section. The scaleTargetRef tells us it's a deployment; we're looking for a deployment named front-end, using the apps/v1 version, and it also includes the service that the deployment is backed by. Then we have our number of replicas, min zero and max 10; remember, you can't scale down to zero with an HPA, so we're going to start off at zero here. targetPendingRequests is what it sounds like: how many requests have to be pending in order to start scaling up. And scaledownPeriod you don't have to specify, but I set it to 60 because I wanted to show the pods coming back down too; without it, that would take five minutes, just like the basic-demo pods. And actually, just to show you the basic-demo pods: I'm no longer popular. Just regular old Stevie again.

That's sad.

So here's what we're going to do. In this boutique namespace there's no HPA, and we have one pod of the front-end. We're going to apply that manifest I have down there, boutique-scale.yaml, and a couple of things are going to happen. One, we're going to create this HTTPScaledObject called boutique. We'll go in there and see the spec; here's the target (I think this is a deprecated field that is going to be removed eventually; it's empty because I didn't use it), and then we see that it is pending creation of our HPA. So if we go look at our HPAs: yep, it created an HPA for us based on the specification in our HTTPScaledObject YAML. This is what the HPA that KEDA created for us looks like.
We have maxReplicas 10 and minReplicas 1 in the spec, we have our averageValue target and the metric we're looking at to get that, and then we have the actual object we're going to be scaling.

Yeah, and what I love about this is that that metric is pointing at the external metrics API source, and the metric has a unique name associated with this scaled object. But we didn't have to create any of that. We don't have to manage the external metrics server, we don't have to write a query to satisfy that metric; KEDA did that for us under the hood and just pushed it out into this HPA object. I don't even really have to care about the external metrics API myself, which is great.

Right, good callout. KEDA took care of all of this for you, so you now have this HPA that KEDA created. An interesting thing: if we look at the pods again, we'll notice the front-end pod that was running is no longer running, because when we deployed that HTTPScaledObject it took over and scaled that deployment down to zero, so that pod is no longer there. So... I went to Wendy's, and I had an incident at Wendy's.

Are you cheating on Five Guys?

Yeah, it's an incident at Wendy's this time, a new incident, and now I am in the news again. So I'm going to grab my host's name from a window that you can't see, and that's fine, and I'm going to update my k6 script. Then we're going to do the same thing we did before: we'll watch the pods and the HPA (or, if we can't watch them, we'll look at them here), and then we'll go over to k6 and run our deal.

And you can see immediately...

That was quick.

You can see immediately there are front-end pods being spun up to deal with my second bout of infamy. We went from zero to, well, a lot.

Although this format of watch makes it look like it's probably more than it actually is.

But as you can see, we're spinning up front-end pods, and we literally just created this object in KEDA and told it what we want; it took care of the rest. So it's just as easy, I think, as using the HPA, but I was also able to scale on something more powerful and maybe more significant for me than CPU or memory: in this case I'm scaling on HTTP metrics. And again, with all the scalers KEDA provides, you could be scaling on Kafka, ActiveMQ, Prometheus; you can even scale on those base resources like CPU, because they have a scaler for that as well.

That's awesome. We do have a question related specifically to this, and I think it's worth addressing. The question is: the pod is no longer there, meaning that if there's a request, it won't work until the pod is created? It's important to note that in this case, with the HTTPScaledObject, you're actually using an add-on to KEDA, the HTTP add-on, and it proxies the request and holds it open until the pod comes back up.
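For reference, the HTTPScaledObject being shown looks roughly like the sketch below. This is hedged: the HTTP add-on's field names have shifted between releases (for example, host versus hosts, and the exact shape of scaleTargetRef), so treat it as the general shape rather than an exact spec, with an illustrative hostname and port:

```yaml
apiVersion: http.keda.sh/v1alpha1
kind: HTTPScaledObject
metadata:
  name: boutique
  namespace: boutique
spec:
  hosts:
    - shop.example.com          # illustrative hostname routed through the add-on's interceptor
  scaleTargetRef:
    name: front-end             # the Deployment providing the UI
    kind: Deployment
    apiVersion: apps/v1
    service: front-end          # the Service that fronts those pods
    port: 8080                  # illustrative port
  replicas:
    min: 0                      # allowed to scale all the way down to zero
    max: 10
  targetPendingRequests: 100    # how many queued requests it takes to start scaling up
  scaledownPeriod: 60           # seconds of quiet before scaling back down
```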
So that HTTPScaledObject, again, is part of an add-on to KEDA. It's a really useful add-on, especially if you want to scale HTTP services to zero. But there are other types of scalers, as Stevie noted, that can scale to zero, and if your scaling metric is something else, maybe a queue-based system or some other event-driven component of your architecture, you might be okay with no pods running; you scale up as soon as there's information ready to consume. So you don't always have to use the HTTPScaledObject to scale to zero; it just handles those HTTP-specific scale-to-zero situations where you need something to proxy the request so you don't get a 503 or 504 before the pod spins up.

Yeah. And as you can see, my fame was even more fleeting this time, and the front end is gone. No one's hitting my website anymore; I've been forgotten very quickly.

So quickly, and my fame is fleeting. So that is a very quick demonstration of KEDA, and like Andy said, that HTTPScaledObject is an add-on. I bet you can't see this tab, can you?

You can't see anything at the moment. And I would like to note we are at seven minutes left, and I know we wanted to leave a little room for Q&A. So if there's something you want to wrap up real quick, we can then jump to Q&A.

All right, yeah, let's take some questions. Keep the questions coming; I know we have a few. I think Katie is here to help us with the questions.

We cannot hear you. We can't hear you.

I'm sorry, sorry about that. It looks like you were pretty on top of it, Andy, with answering a few questions. We did have an earlier question come in that I'll post to the screen now for you.

Yeah, we did address this one partially, but there was a follow-up question to it as well, so I'd like to clarify again: when you're using HPA at all, in any form, and you're deploying your deployment YAML, you need to omit the spec.replicas field. If you set it, then whenever you deploy that YAML it will temporarily overwrite the desired replicas, and then the HPA will take back over, so you'll have a brief disruption. The spec.replicas field should be omitted anytime you use an HPA, regardless of what's creating that HPA. So even if we're using KEDA, you want to omit that replicas field from the deployment, or from the StatefulSet if we're scaling StatefulSets.

Any additional questions? It looked like you were really great about answering them
on the fly, Andy, so I don't see any others.

Yeah, there was one, I think, about HPA, asking whether you can scale on metrics other than CPU and memory. I think we kind of covered that with the demo, but you can definitely scale on lots of other stuff; there's just more complex setup involved, and it doesn't come out of the box. There's also a question about defining several HPAs with different percentages for different cases.

Yeah, that was an interesting one. I don't know how that works, outside of...

You can combine multiple metrics in the same HPA, and it will evaluate them, and whichever one makes the biggest change, I think, is the one it will scale to. But I don't know if that's the same with actual separate HPA objects.

Yeah, I don't think you'd want to do that; I think they'd be fighting each other. If you wanted to deploy one at a certain time and then deploy a different threshold at a later time, you could do that, but in general it's one metric, one threshold, especially with the basic HPA.

Now, with KEDA you could have different activation thresholds at which your trigger takes over, so you could have multiple triggers with different metrics or thresholds you want to scale on, and then set different activation thresholds. That would be a pretty complex situation. Actually, Stevie showed a little bit of this with the first ScaledObject she put up, because that ScaledObject had two triggers in it: a cron trigger, which was designed to scale the deployment up to one and then back down to zero at certain times of day, and then a request metric.

Yes, this is one I'm actually working on; I wrote this one, so it's funny you grabbed it. I'm trying to write a deployment that will scale down at night, because it's a development environment, and then scale back up in the morning. But if somebody wants to use it during the time when it's scaled down, they can send a few requests at it and force it to spin up. That's why that activation threshold is 0.001: I want to be able to generate just a little bit of traffic against this pod and force it to scale up even when we're inside that scale-to-zero window defined in the cron trigger. So this is one example of using multiple triggers with a KEDA scaler.

Great. And then another one came in; I'm going to go ahead and put it on the screen. From Francisco: the configuration of the ScaledObject seems to be a Prometheus query. Could it be configured to connect with a Prometheus alert?

I think the answer to that is no, but at the end of the day Prometheus alerts are just Prometheus queries, so you could just rewrite the same query into your ScaledObject. Stevie, does that sound right?

Yeah. I don't know what it would do specifically with an alert, so yeah.

I think you just take the query that drives your alert and use that as the query for your ScaledObject. I don't think it can connect directly to alerts specifically.

Great. And then an additional question came from Roy: can any of these methods help with right-sizing replica counts based on previous baselines?

I'm not sure I quite follow that one.
If I try to dissect that a little bit, right-sizing replica count based on previous baselines, there are a couple of possibilities. One of them could be that we're talking about historical analysis, where we have a pattern of scaling: at 3 p.m. every day we always scale up because that's the lunch rush, or whatever. That could be the basis of this question, and I don't think this really helps with that. That would be more of a predictive autoscaling thing, where you'd have to write your own trigger, or maybe use that cron trigger I was talking about, based on your own analysis; it wouldn't do the analysis for you. Other than that, most of these things are reactive. All of these different triggers are reacting to a specific event that happened: we crossed a threshold of traffic, or our queue filled up, or whatever that trigger is. So they're very much reactive, not proactive.

Great. And then we have time for one last question, from Conrad: is it possible to scale based on the deployment's resource definitions, or is it better to put this information within the ScaledObject?

If by resource definitions you mean what you're putting in for requests and limits in the deployment spec, for example, then no, I don't think that's a thing you can do. No, I don't believe so.

Well, I think it's important to note that when you set a basic HPA and you say you want an average utilization of 30, that's 30 percent of the CPU request for that container; utilization is calculated against the actual CPU request that's defined. For example, if a container requests 500m of CPU, a 30 percent target means the HPA starts adding pods when average usage climbs past roughly 150m. So the deployment's resource definitions do affect autoscaling when you're scaling on CPU or memory. That's an important thing to note.

That's true, yeah.

Cool. Well, thank you both, Andy and Stevie, for joining us today and giving such a wonderful presentation to our audience. As a reminder, the webinar recording and the slides will be online later today on our online programs page, which you can find at community.cncf.io. We look forward to hosting both of you at another future CNCF webinar. Thank you everyone for joining, and have a great day.