Well, hello everybody. Welcome to KubeCon EU. My name is Kyle, I'm here with Mario, and we're here to tell you our story of the Battle of Black Friday and proactive and reactive autoscaling at StockX. Before we start, we want to introduce ourselves and introduce StockX a little bit, so some of this makes more sense as we walk through it. Like I said, my name is Kyle. I'm a senior software engineer at StockX. I like GraphQL probably just a little too much — our Twitter handles are in the lower right-hand side, so if you ever want to talk about it, feel free to hit me up about anything GraphQL. I've talked at a few other conferences about it; it's a fun passion of mine. And Mario, do you want to give yourself a little intro?

Of course I do. Thank you, Kyle. I'm Mario Loria. I actually just left StockX a few months ago for Carta. I'm a YAML aficionado and a DevOps/SRE engineer. I love everything infrastructure, I of course love Kubernetes, and autoscaling is one of the most fun challenges. Kyle and I have a lot to discuss today — it's going to be a fun session.

Like we mentioned, this happened at StockX, and a lot of this is based on work we did there. StockX is a stock market of things: a live bid-and-ask model where people actively bid and ask, just like a stock market, but instead of trading stocks they trade a product. The typical instance, the stuff most people know, is some type of Jordan or some type of Nike. We just expanded to collectibles, so trading cards are a big thing we do now as well, and we also do some electronics. But again, it's that live market model where people are actively asking for something and bidding on something, and when they match, that's when you get your item.

So the first thing we want to do is frame the problem we had to solve and what this battle was. The big thing is surges in traffic. Our traffic is very abnormal. In a lot of places you'd expect traffic to go up through the day and down at night, following a nice, easy wave. Unfortunately, we don't have that. We have very spiky trends where all of a sudden Kanye will feel like he needs to drop a shoe, and we go from zero requests to a million requests almost instantly. That's where a lot of this talk comes from: massive traffic spikes, very fast, that forced us to do some pretty cool things.

To give you a real graph of what happens, this is the requests per minute at our edge layer when a push notification from our marketing team goes out. As you can see, we're hanging out around 15,000 requests per minute, and then all of a sudden we hit 80K within 30 seconds. How do we handle that without going down and without a lot of problems? Originally, what we did was run some good old kubectl commands to make things work and scale stuff up. We would manually go through and add nodes to our cluster, and we would go around and say, "Hey, did you scale your stuff up? We have something coming." The question is: does this work? It kind of worked. We didn't go down all the time, but we sometimes went down.
And that's what happened, right? We had two major problems with that approach. One was tons of tedious human interaction: you had to remember to do it beforehand, and there was no formal plan around it. It was a lot of just me and Mario and a few other people on our teams running around saying, "Hey, marketing is going to do something at one o'clock today, we think," or "Kanye might drop something tomorrow — are you guys ready? What time do we need to do all this?" That was very time-consuming and painful. The other big thing was that we had a very hard time figuring out what everybody else was doing. We had up to a hundred microservices, and we'd have to run around and make sure we covered all of them at once, which was really painful. And as our push notifications evolved, they started hitting different services, so what worked yesterday would no longer work today because there was a new usage pattern. Trying to stay on top of all of that — more things happening, more engineers coming on, more services being developed — is a very hard problem.

All right, so let's look at reactive scaling. This talk is about proactive and reactive scaling, and I'm first going to cover what we did in the reactive realm, which was very SRE-focused for the most part. I really believe that dialing in the reactive components with HPA and cluster autoscaler, which I'll get into, is pivotal for understanding the kinds of challenges you're going to face on your reactive scaling journey before you go on to build something else.

So let's get into it. The key thing about our infrastructure at StockX: it's all EKS, and we run cluster autoscaler from the existing Helm packages — pretty simple. We fully manage our nodes, meaning we use AWS's unmanaged nodes. That gives us access to configure the kubelet certain ways or run things like NodeLocal DNS. Cluster autoscaler then automatically taps into the AWS APIs to interact with the auto scaling groups. So there's a lot you get for free, but there's also a lot of setup: dialing in tolerances and figuring out the optimal intervals. We don't want flapping; we don't want nodes that were added two minutes ago to get removed a few minutes later. You need to really iron out that core decision making of when to scale out versus when to scale in — those are calculated in different ways by cluster autoscaler. There's plenty of documentation on this on the Kubernetes GitHub as well as the Kubernetes docs site, so you can get really far. However, there are definitely nuances; this is not perfect. Things like running cluster autoscaler across multiple zones take a little more effort and a little more watching and maintaining, and dialing in those configuration values is very dependent on your workloads as well.
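For a concrete sense of the dials Mario is describing, here is a minimal sketch of tuning cluster autoscaler through its public Helm chart. The flag names are real cluster-autoscaler options, but the cluster name and every value are illustrative, not StockX's actual settings:

```yaml
# Illustrative values for the cluster-autoscaler Helm chart on EKS.
# All values below are examples only, not a recommendation.
autoDiscovery:
  clusterName: my-eks-cluster              # hypothetical cluster name
awsRegion: us-east-1
extraArgs:
  balance-similar-node-groups: true        # helps decisions across per-AZ node groups
  scale-down-delay-after-add: 10m          # don't remove a node right after adding one
  scale-down-unneeded-time: 10m            # how long a node must sit idle before removal
  scale-down-utilization-threshold: "0.5"  # below this utilization, a node is a candidate
```

The scale-down flags are where most of the anti-flapping work happens: lengthening those windows trades some cost for stability, which matches the "don't remove a node added two minutes ago" goal.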
So I would say status and metrics were one of the key things we tried to understand early on, so that we could get a good grasp on how to manage cluster autoscaler in our environment, with Datadog and the other ways we took in metrics and got alerted. Of course, as I was saying, it's definitely not perfect. Getting scale-in just right is really tricky — you don't want waste, you want scale-in to work — but it's something you have to iterate on, and we had to do a lot of testing there. We saw a lot of unbalanced decision making that was hard to understand; we could see the status of the cluster autoscaler and what it was doing, but we often wondered why it was making certain decisions. So you're going to need to watch this, and dialing in those configuration values will get you there. Then there are actions after a burst. Say we just added six nodes to the cluster: any further scaling actions a few minutes later could be hindered by AWS API rate limits, which is never fun, and that's not made very clear to you — it's something you have to dig into. We even had to customize our ASGs beyond what was created by default: we had to suspend certain processes on them, or they would fight against the cluster autoscaler, which again is not ideal.

But this is all to say: it got us far enough. We were able to comfortably use cluster autoscaler and trust it was making a good decision — not the best decision ever, not the most cost-effective decision, but better than what we had before, which was nothing. We had a sense that our clusters could scale in and out with the capacity we needed, and scaling out is actually a pretty fast operation with cluster autoscaler. So: definitely good enough, and I'm sure it's a lot better nowadays.

That takes us to the Horizontal Pod Autoscaler, also known as HPA. First, I wanted to show a "before" snapshot of our traffic — I believe this was around a Black Friday event, with one of our key services. This is a backend service, one of many microservices running in our production clusters. As you can see, as traffic starts to ramp up, it's very clear that we do not handle it well; that service does not handle traffic flowing its way that rapidly at all. And with cluster autoscaler and HPA, you can see that we really evened this graph out. There's still a jump, but it's nowhere near the jump in the prior graph — and we're actually taking more traffic through as well, which is really great to see.

So let's get into how HPA makes its decision. The equation it uses is essentially desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue) — in other words, it's trying to get a sense of how close the ratio of the current to the desired metric value is to one. We did all of our scaling on CPU. For optimizing the HPA, there are different facets to look at: you're focusing on one application and how it behaves, and with CPU you're looking at your resource requests and limits — mainly the requests. The way we did this at StockX is that I focused on one particular service: our front end.
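As a rough illustration of that setup — requests tuned first, then a CPU-utilization target — a CPU-based HPA paired with its Deployment's resources might look like the sketch below. The names, image, and every number are hypothetical, not the real StockX manifests:

```yaml
# Hypothetical Deployment resources plus a CPU-based HPA, in the style described.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend                      # illustrative service name
spec:
  replicas: 8
  selector:
    matchLabels: { app: frontend }
  template:
    metadata:
      labels: { app: frontend }
    spec:
      containers:
        - name: frontend
          image: example/frontend:1.0 # placeholder image
          resources:
            requests:
              cpu: 500m               # tune this first; the HPA percentage is relative to it
              memory: 512Mi
            limits:
              cpu: "1"
              memory: 1Gi
---
apiVersion: autoscaling/v2beta2       # API of that era; autoscaling/v2 on newer clusters
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 8                      # a floor that bakes in the buffer Mario mentions
  maxReplicas: 60
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60      # scale when average CPU passes 60% of the request
```

Because the utilization target is a percentage of the request, a mis-sized request silently skews every scaling decision — which is why the requests get tuned before the HPA target.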
The front end, written in Node, actually lived in its own cluster environment, so it was an easy boundary where we felt comfortable working on just that instance. Of course, we did this in dev and staging environments first, scaling solely on CPU. A lot of this was analyzing the trends — all manual work, using your Datadog tools or the K9s terminal UI to look at the live values for our front ends. Our general approach was to tune our requests first, to make sure we were requesting something sane, and then scale based on that. (Those are the request values for CPU and memory, although the Horizontal Pod Autoscaler only cared about CPU.) Then we would tune the CPU utilization target in our HorizontalPodAutoscaler definition, and those were co-located with the applications themselves in their Helm deploy.

One of the key things that we really liked — and I'll mention this multiple times — is building in buffers, a safe overflow. If an application was deployed with six instances, we would run maybe eight instances, 25% or more extra capacity, and we didn't consider that waste: if there was some sort of operation or marketing push, as Kyle mentioned, we would be able to handle that traffic for at least some period of time until we got more scale. The other part of this is events and knowing what's going on, especially during the testing phase. Kubernetes makes this relatively easy to find — looking at the Deployment object and the HPA object gives you a sense of it — and a lot of platforms (we were using Datadog) will let you feed those events in so they become something you can act on. Those are really important.

And so you can see that we managed to get to 40,000 requests per minute with under-300-millisecond response times, which is a huge, huge win from where we were before. Again, this is about flattening those response-time curves out and making sure error rates stay low. That's huge. And again: good enough, never perfect. This is unlimited — you can keep dialing it in and reacting and trying to get it as perfect as possible, and that's unlimited time you could spend. We couldn't do that, but we definitely spent plenty of time testing. We had many different scenarios — as Kyle mentioned, when Kanye feels like doing a drop — so we had plenty of data to go off of, and we dialed it in really well. And this is me listening to music while coding and working on HPA. HPA was fun. Here's Kyle to talk about proactive scaling.

Thank you, Mario. Cool — we're going to go to the other side of this and start talking about proactive autoscaling. HPA is great, but it's not always enough. It's great when you have the organic, normal ups and downs of traffic that services usually see. And when traffic drops, you're fine too. It's really only a problem when you have a dramatic step increase in traffic — you can see that in Mario's graphs from before, when the traffic would sharply increase. That's the only time we'd have a problem; with drops, or a normal slow flow up or down, you're usually okay. And that's the problem we're trying to solve.
That's where our battle was: when we have these very big spikes in traffic due to a push notification or a drop, something that causes a ton of people to instantly open the app or show up to the site — that's when it's not enough. And the reason it's not enough is, like Mario said, we tried hard to prevent flapping in our HPAs, so they don't constantly go up and down. But that actually hurt us here, because of the delays we built in to make sure a change is real. Even when we're way over the target and constantly climbing, the HPA waits a bit before scaling, to make sure this isn't something that will cause flapping. That's one of the problems, but there are a few other things layered on top. Mainly it's your configured delay, which you're explicitly telling it to wait for, plus your metric intake: you have to wait for the metrics to come in and say, "Hey, we have this large increase in traffic," before you can even start to say, "We should scale" — or, in a lot of cases, "We should scale in five minutes if we still see this heavy traffic" (maybe not five minutes, but some timeframe in the future). And this stuff is really time sensitive. If you wait out that configured delay, most of the time your app is already dead in the water.

Think of a marketing push notification: you pick up your phone, it says, "Hey, there's this new shoe, this new thing," you tap it, and the app crashes or gives you a white screen. You're normally done at that point, because people don't stick around and keep trying to open your app or reach a page unless they really want something. A lot of times people open it, see it's not working, and leave. And that's where our problem was: these massive step increases in traffic caused huge problems. We would send out a marketing push, everybody would get a banner, they'd hit the banner, and boom — our traffic goes up. A lot of people ask, "How much does your traffic go up?" By our math, about 533% in less than 30 seconds. That is a lot of people hammering our app very, very fast. Where our HPAs kick in later, it's very hard to have them scale for a 533% increase almost instantly — and then not behave that way at other times.

So you'd ask, "What does this look like? Give me some data." Here's our data. We saw this top graph before — it's basically our requests per minute, and you can see that massive shoot-up, the 533% increase in traffic. What you didn't see before is the error chart on the bottom, and this is what would end up being really bad for us: you can see those lines constantly climbing, and at some points we would hit over 90% errors — sometimes as high as 95% errors. Which means one out of 20 people is not having a problem, but 19 people are.
And that's really, really hard to deal with. Going back to the scaling side: these are our pod counts. You can see these nice little stair steps — curve up, curve up, curve up — where we're trying to catch up. They're almost perfectly incremented to where our delays would kick in: we've delayed long enough, scale up, take some metrics, try again. And we would level out sometime after the push — but by that point we've lost so much traffic, so many users, that it's not worth it anymore. We wanted all of this to happen beforehand.

And that's why we created something called Barricade. Barricade is basically our version of a warming solution: we spin up all of those pods, but we do it before the event — we're not going to wait on the HPA. That's why we call it a warming system: it warms our Kubernetes services and systems to be ready for that push, ahead of time. It's also a very, very small event system. So what's in Barricade? Two things: an API to intake events — you just say, "Hey, something's going to happen from here to here," with the timeframes as ISO dates — and an internal cron job that keeps checking for those dates to come up. When it finds one, it warms things up. And the really nice part, so we don't forget about it: once an event has passed its expiration date, Barricade cools everything off and puts it back the way it was before, so we're not wasting money, CPU, and energy after the fact.

Here's a practical example of its impact on pods. We put it on one service to test it out, and this is the graph we saw. There's a box on the left-hand side where you see that really strong jump up — equivalent to all of the smaller stair steps you saw before, but it happened before anything actually hit, which is the point. Now, when that service got hammered, we were ready for it. We're no longer waiting for everything to reactively scale and hoping it scales all the way up to where it needs to be; we're telling it beforehand, "This is where you need to be," so when the push hits we're not underwater.

And I can say all of this, but it doesn't mean anything without data to back it up — data on why this is a good idea. These are side-by-side charts of a push notification before and after we added Barricade. In the "before" graph we had huge error spikes — we would hop up to 95% errors at points, and we were consistently over 85% errors. After — with basically just the HPA doing its normal thing throughout and after the event — we hit 0% errors most of the time, staying under 1%. Those remaining errors are normally just a bad UUID, a bad product ID, somebody not being authenticated — the normal noise of people using the website. We're no longer failing because we don't have enough resources to serve the request.
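The talk doesn't show Barricade's actual API schema, but an intake event in the spirit Kyle describes — a window bounded by ISO dates, handled by the internal cron job — might look roughly like this. Every field name and value here is hypothetical:

```yaml
# Hypothetical Barricade event (the real schema isn't shown in the talk).
# POSTed to the single intake endpoint; the internal cron job acts on the dates.
service: frontend                  # workload to warm
warmReplicas: 60                   # capacity to have ready before the event
startsAt: "2021-05-07T12:55:00Z"   # ISO 8601: warm shortly before the push
endsAt: "2021-05-07T14:00:00Z"     # after this, the cool step restores normal scaling
```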
Now let's get into what happened when we created Barricade. We had a few driving forces behind it. One of the easiest was keeping it simple. We had a process that worked before, where we truly did this stuff manually — we'd go around with kubectl commands, scale our stuff up, and hope we got everything. All we really did was take that same idea, keep it really simple, and have a service do it for us. Instead of us running those kubectl commands, the service runs them — and the service is much more reliable than we are, because it remembers, and it has a system inside it for when and where to do all of this. The other thing we kept simple: it has three jobs. We didn't try to overload it or bolt on a ton of features. It intakes events, it warms stuff, and it cools stuff — nothing more, nothing less. It does one very pointed job and does it very well.

Let's talk about the processes we use around it, because they really illustrate how nice things are now versus before. What happens now is that we have a Lambda that polls a third-party system to see if marketing has something going on. If it sees something, it sends an event to Barricade, so Barricade knows when it's going to happen, and the internal cron job eventually picks it up, runs the event, warms our pods, and later cools them off. We've also had microservices do this: if a team has an internal service that schedules something — an email blast or whatever — it can go talk to Barricade too. The beauty of keeping Barricade as simple as we did is that anything can do this. Nobody has to have the Kubernetes knowledge of what and when to go do things — how do I scale it up, how do I scale it down, where do I put my pod counts, how far should I go with the CPU scaling? You make one API call, just like you would to update a user or reset a password or sign in, and Barricade does the rest. You don't have to worry about anything else — and that's the beauty of it.

To recap quickly, what makes Barricade so useful: it's really easy — there's one endpoint — and it's abstracted away for you, so you don't have to know Kubernetes under the hood. I'm sure at this conference everybody knows Kubernetes well enough to get this done, but the other application developers at your company might not. The ease of use of saying, "Here's your endpoint, here are the few fields you send, and it just works" is a huge benefit, because you don't have to keep sharing that knowledge and forcing people to learn things they might not be comfortable with. The other huge thing is that it's fire-and-forget: you put an event in Barricade, and Barricade handles it from there.
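The talk doesn't reveal how the warm and cool steps are implemented internally. One plausible mechanism — an assumption, not confirmed by the speakers — is to raise and later restore the HPA's minReplicas floor, which pre-scales the pods while leaving the HPA in charge of anything above the floor:

```yaml
# One plausible implementation of warm/cool (hypothetical, not from the talk):
# adjust the HPA floor so reactive scaling still governs everything above it.
#
# Warm, at startsAt:
#   kubectl patch hpa frontend -p '{"spec":{"minReplicas":60}}'
#
# Cool, at endsAt — restore the floor recorded when the event was ingested:
#   kubectl patch hpa frontend -p '{"spec":{"minReplicas":8}}'
spec:
  minReplicas: 60   # the only field the warm step needs to touch
```

Raising the floor rather than pinning an exact replica count would mean a genuine overload during the event can still scale past the warmed capacity.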
And unless everything you put in Barricade somehow wipes itself out — your databases crash, you have a full system outage — you really can forget about it and it will just work, which is a huge benefit. And the biggest part of this, the one thing to take away from one nice slide: we went to a peak of over 80,000 requests per minute in less than 30 seconds, and comparing error rates before and after Barricade, we had 93% fewer errors — which is massive. I don't know how I'm ever going to feel better about these graphs after this, now that I have a 90%-plus number I can drop on a slide. But again, it's a huge win: if you proactively scale for the things that would crash your system, you get these awesome results. And now Mario is coming back with a few takeaways from this talk.

All right, thank you, Kyle. Here are the takeaways. Of course, we didn't do everything perfectly — I guarantee it. First, I wanted to cover HPA, and what we'd be thinking about if we were to attack it today and implement it for our services. The space has gotten a lot better, with many more people focusing on this problem of resource management. Goldilocks, from Fairwinds, is a great tool that surfaces, in a nice UI, suggested values based on the Vertical Pod Autoscaler — helpful for providing a self-service way for your developers to bring more sanity to the resource requests and limits for their services. StormForge, formerly Carbon Relay, is a fantastic company that has made massive strides on providing that feedback loop of continuous testing, to get a sense of what your resources should be and what makes the most sense for the long-term scalability of your application. KEDA, a Cloud Native Computing Foundation (CNCF) project, is really, really cool: it lets you scale on things like your queue size in SQS, leveraging other APIs and other services to make decisions beyond just CPU and memory. And then, with — I believe — Kubernetes 1.18, there's a slew of new HPA features that go beyond "what is our target percentage?" You can leverage custom and external metrics, which we would have used a lot more — I know Datadog now offers the ability to scale on any metric in Datadog. There are also more container-focused metrics, so it's not just pod-level metrics HPA decides on; you can make decisions on individual containers within the pod. And behavior policies — controlling scale rates, which KEDA supports as well — are huge for HPA, really ensuring, like I was saying with cluster autoscaler, that we have safe intervals, that we're not thrashing thresholds, and so on. That's unique to every workload, so making it configurable is really awesome. I think that's good for that side.
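To make those behavior policies concrete, here is a hedged sketch of the `behavior` block available from autoscaling/v2beta2 onward — the shape is the real API, but every number is illustrative:

```yaml
# Illustrative HPA behavior block (autoscaling/v2beta2 and later):
# scale up aggressively, scale down cautiously, to avoid the thrashing Mario describes.
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0      # react immediately to a spike
      policies:
        - type: Percent
          value: 100                     # allow doubling the replica count...
          periodSeconds: 15              # ...every 15 seconds
    scaleDown:
      stabilizationWindowSeconds: 300    # wait 5 minutes before trusting a dip
      policies:
        - type: Pods
          value: 2                       # shed at most 2 pods...
          periodSeconds: 60              # ...per minute
```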
So, a few other things I wanted to mention. Always keep a sufficient buffer — I mentioned that before; it will always be helpful, and hopefully you're doing that already. And prefer more instances: we always told our developers, if you're ever questioning how many instances to run, lean toward the higher side. We were willing to spend the money up front to ensure your application has the capacity it needs when a traffic event does hit. SLIs, SLOs, SLAs — the SRE methodologies and principles — are really important. Imparting them on developers is definitely another story, but there are lots of organizational things you can do; twelve-factor methodologies and other resources out there can help you share and implement these. Work with developers, not against them. There are a lot of misconceptions between the ops side of the table and the development side — that developers throw things over the wall, that they don't understand, that they don't care. I think the reality is that the problems in front of them are just very different from your problems running the actual infrastructure. So do one-on-ones with developers, help them understand, and take those opportunities to educate as much as you can about why this is important and why they should care as service owners over the long-term lifecycle — and they don't want to get pinged by PagerDuty, either. There are many ways to impart that knowledge in a more flexible way.

Also: what happens when you do a deployment while you're in the middle of a traffic event, and that's not something you wanted, because you just wanted things to stay stable? This ties into everything from code going through a pipeline to getting deployed — what controls do you have in place there? It relies a lot on your pipeline and how you structure your deployments. And of course, everyone should be considering open-source and third-party tools. I mentioned Goldilocks; there's StormForge; Spot.io also helps with node waste. There's a lot to consider — let things soak for a while, and test; testing is really important.

And just a couple more things. We managed nodes ourselves, and that is painful. If you're on EKS or GKE already, you should really look at the other solutions that are provided. GKE Autopilot just released; AWS Fargate has been around for a little while. Really question: do you need nodes? When we talk about understanding the cost of one particular service and just that service, it's really hard when you're managing your own nodes — there's Kubecost and other solutions coming out, but Autopilot and Fargate really let you say, "We're just going to run exactly what you need, exactly what you're requesting, and that's all you pay for." You don't have to worry about the nodes being spawned, their sizes, their types, or allocated versus actual capacity.

So, moving on: HPAs are great, but they have their own set of challenges — I hope you've seen that. Again, it's trial and error.
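Since KEDA came up a couple of times in these takeaways, here is what a minimal ScaledObject looks like — scaling a worker on SQS queue depth. The trigger type and fields are KEDA's real API, but the names, queue URL, and thresholds are placeholders, and authentication setup is omitted for brevity:

```yaml
# Minimal KEDA ScaledObject sketch: scale a worker Deployment on SQS queue depth.
# Names and values are placeholders; AWS credentials/auth config omitted.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker                 # hypothetical Deployment to scale
  minReplicaCount: 2
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/jobs  # placeholder
        queueLength: "100"       # target messages per replica
        awsRegion: us-east-1
```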
KISS — keep it simple, stupid — is one of my favorite mottos. And manual first, then automate: I think it's one of those things where you have to do it the hard way before you can do it the easy way, to understand the nuances of how something actually works and what's going on behind the scenes.

Excellent — let me talk about myself some more. I'm Mario Loria. Thank you so much to everybody for tuning in today; I look forward to some amazing questions. I work at a company called Carta — if you have stock options, you might have heard of Carta. We're also doing a lot more around 409A valuations, and we're releasing CartaX, which lets people invest in private companies, which is really, really cool — it helps people holding stock options get some liquidity as well. You can look me up on the web — Mario Loria — for all of my other resources. And with that, I'll hand off to Kyle to talk a little more about himself.

As Mario and I mentioned earlier, we both worked at StockX — I'm still there. If you're interested, we are hiring for all positions: go to StockX.com, scroll down to the bottom, and click the jobs link; everything is listed there. We have a tech blog — it's recent, and there's a lot of GraphQL on it. Like I said, I like GraphQL a little too much, so you'll probably see my face on a lot of it. Feel free to go read it, come talk to me, and tell me it's either great or bad, whatever — my Twitter is on there as well, so whatever you're feeling, let me know. It's a good time. And with that, I'd like to say thank you, everybody, for listening, and thank you to everybody at the CNCF for putting this on. It's an awesome time. Thanks.