OK, I guess I'll get started. I'm not sure how to remove the toolbar and stuff off of Chrome. OK, so thanks, everyone, for coming. I really appreciate it. Thanks for the introduction. I'm here from Yelp. As you said, my name's Rob. I'm an SRE at Yelp; I've been there two and a half years now, and I mainly work on the PaaSTA team. We look after the platform as a service that runs all of the HTTP services and things that actually run the website.

If you're not familiar with Yelp, Yelp's mission is to connect people with great local businesses. So whether you're looking for a pizza or a plumber, the hope is that, either using the mobile app or the website, you should be able to find someone that can fulfill that need with Yelp.

To give you an idea of what we're going to go through today: I'll give you some background into how we use Mesos at Yelp. We have a couple of different things that we do, so I'll give you some idea of the scale that we run at and things like that. Then I'm going to talk about autoscaling. We autoscale at two separate levels, at the service level and at the cluster level, and of course the changes at the service level create or reduce demand on the cluster, and that's what the cluster autoscaler reacts to. And then finally, I'm going to talk about AWS Spot Fleet. I'll give you an idea of what Spot Fleet is if you're not familiar with it, some of the pitfalls that can come from using it, what value you can get from it over regular EC2 instances, and then finally some strategies you've got for dealing with those pitfalls.

So the first thing: Mesos at Yelp. One thing we use Mesos for is PaaSTA. PaaSTA is Yelp's platform as a service. It's been running in production for a couple of years now, and it runs all of our HTTP services and a lot of batches as well. The second thing is Seagull. Seagull is a distributed task runner that we built at Yelp to run unit tests. Our monolith, which still exists now and is deployed with PaaSTA, is about 3 million lines of Python code and runs something like 100,000 unit tests. Long ago, it became impractical for developers to run those unit tests on their own machines, and so we built Seagull as a distributed task runner to parallelize those unit tests for all of the developers at Yelp, of which there are hundreds. Anyway, I'm not really here to talk about Seagull. If you are interested in such things, Sagar, who's a fellow engineer, is talking about it later, so I'd recommend you go along to his talk and learn about Seagull.

As I said, I'm here to talk about PaaSTA, which is Yelp's platform as a service. These are the main Mesos ecosystem components of it. Marathon and Chronos are the two frameworks that run most of the tasks; we do have a couple of other things that launch tasks as well. We've transitioned to services over the last three years or so. Historically, we had the same monolith that everyone else does. It was deployed with some bash scripts and ran on a special class of machines that existed at Yelp. One really big step forward that we've made this year is that we've managed to have the monolith treated like any other service. It's not so micro, as I hinted at earlier, but it's monitored and deployed like any other service at Yelp, which is a huge win for us. It reduces the cognitive overhead of having separate classes of machines, separate deployment strategies, and so on.

So what does the site look like?
Well, we run three main production clusters: two on the West Coast, one on the East Coast. On the West Coast there's us-west-2, which is up in the Pacific Northwest. And then for San Francisco, in California, we run in the us-west-1 AWS region as well as our own data center; those two are joined via Direct Connect, so the latency between them is pretty good. On the East Coast we do the same: we use us-east-1, but that was preceded by our own data center in Virginia, and in the same way there's a Direct Connect that runs between those to reduce the latency. We are moving wholly to AWS, but that physical infrastructure in our own data centers does still exist.

We run about 900 Marathon apps. Now, we don't have 900 microservices. Each service can opt to be deployed in a number of ways, so if a given service has an HTTP server to it as well as a long-running worker that consumes off a queue, then that will end up in Marathon as two separate apps. But yeah, we end up at 900 Marathon apps in the biggest cluster; the other two are about two-thirds the size of that, though of course there are duplicates between them. We have 5,500 Mesos tasks in that big cluster and about 600 Mesos agents, and as I said, they span across our own data centers and AWS.

So the first thing I'm going to talk about is autoscaling. This is Yelp's traffic at its edge. On the y-axis there's requests per second, and the x-axis is time. You can see it's pretty diurnal, right? Every day, as America wakes up and New York goes and finds breakfast, and then San Francisco wakes up and has breakfast, lunch, and dinner, that's the spikes that you see. And the troughs are just EU daytime, where we have a low amount of residual traffic, but it's certainly not the size of when America hits. Traditionally, that line across the top is what we've had to provision enough infrastructure for, to deal with those spikes, which as a result means that all of this stuff in pink just goes to waste, right? It's just there hanging out, not really doing much, and costing us money. Especially now that we're on AWS and we're paying by the hour like everyone else does. Not ideal.

So we autoscale in two different ways: at the service level and at the cluster level. And it's the changes that we make at the service level that, of course, create and remove demand for the cluster, as I said earlier, and so we adjust for that.

So, the service autoscaler. In the same way that we always had to provision enough infrastructure to deal with the peaks in load, when service owners described the deployment of their service, which they do in a YAML file that lives in a Git repository, they had to describe enough instances to deal with peak load. So that meant they hard-coded a number, say 10 instances. In this new world of autoscaling, that changes somewhat. They describe a lower bound for how they want their service to be deployed. They also describe a metric that gives the best representation of their load at any given time. And then they also describe a decision policy, and the decision policy is a model that is used to correct the error between the target utilization and the real utilization. So this is pretty simple. We have a program that runs from cron on every master every 10 minutes. This is massively simplified, but this is roughly how it works.
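To make that concrete, here's a minimal sketch of the kind of thing that cron job does, with a naive proportional policy standing in for the real decision policies. The ZooKeeper path, hostnames, and function names are made up for illustration; this isn't PaaSTA's actual code.

```python
# Hypothetical sketch of one service-autoscaler tick: read the target and
# observed utilization, ask a decision policy for an instance count, and
# write that count into ZooKeeper for the deployment daemon to pick up.
from kazoo.client import KazooClient


def proportional_policy(current_instances, target_util, observed_util):
    """Scale the instance count in proportion to the utilization error."""
    if target_util <= 0:
        return current_instances
    desired = current_instances * (observed_util / target_util)
    return max(1, round(desired))


def autoscale_service(zk, service, current_instances, target_util, observed_util):
    desired = proportional_policy(current_instances, target_util, observed_util)
    # The deployment daemon watches this node and adjusts Marathon to match.
    path = "/autoscaling/{}/instances".format(service)
    zk.ensure_path(path)
    zk.set(path, str(desired).encode("utf-8"))
    return desired


if __name__ == "__main__":
    zk = KazooClient(hosts="zookeeper.example.com:2181")
    zk.start()
    # e.g. 10 instances running at 90% observed utilization against an 80% target
    autoscale_service(zk, "example_service", 10, target_util=0.8, observed_util=0.9)
    zk.stop()
```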
It takes the target utilization and the real utilization, and then uses that model, the decision policy, to figure out how many instances are required to correct the error between those two values. And then it just writes that number into ZooKeeper. That's kind of it for the service autoscaler. We do that for every service that opts in, obviously.

We then have a long-running deployment daemon. This is an event-driven program that decides when a service needs to be redeployed. There are a number of triggers for this. Obviously, the main one is when a service developer pushes new commits to their service and wants to make a new release and get it into production; this notices that. Other changes that it might watch for are things like changes to configuration: if a service owner asks for more memory, or to be deployed in a specific availability zone, or anything else, then again we have another watcher in the deployment daemon for that. Another watcher that we have here watches those ZooKeeper nodes that the service autoscaler writes to, and when it notices a change, it changes the number of instances that are deployed in Marathon. Now, it has a pretty easy job if we're scaling up, because scaling up's easy: we just ask Marathon for more instances, and that's that. If we're scaling down, it's slightly harder, because we've got to make sure that the tasks we're about to kill are killed in a graceful way. We don't want to interrupt current HTTP connections, and so we have to take them out of the load balancer until we're confident that not only has new traffic stopped arriving, but that any existing traffic has finished. And then we can kill the tasks and scale down accordingly.

So I said that service owners have the opportunity to describe the metric that best represents their utilization at any given time. We offer three of those right now, the first being CPU. If a given service is CPU bound, this is the one they'll use: we look at the task's CPU utilization, and if it goes above a given threshold, whether that be 80% utilization or whatever, then we'll scale up or down accordingly. The second is uWSGI. We are a Python shop at Yelp; many of our services are Python, fronted by pre-forking servers like uWSGI. When uWSGI forks into a number of workers, we can take the fraction of workers that are busy at a given time as a rough guess of its load. So if there are a number of idle workers, then obviously it's not really doing much and we can probably scale it down. Likewise, if the requests that are coming in are taking time to be processed because all of the workers are busy, then we take that as an indication that it needs to scale up. And the third is HTTP: if a service has a really bespoke way of representing its load, whether it's reading off a queue or something like that, and it wants to use the rate at which messages are being processed off that queue as an indication of its load, then it can do that as well.

So, decision policies. We now have the utilization; we have a target and the real value. How do we correct the error between those two things? The first thing we offer is a PID controller. PID controllers have a long history in industrial control systems, and simply put, they do exactly what we ask for: they take a target value and an observed value, and they make changes to try and correct the error between them.
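For illustration, a textbook discrete PID step looks roughly like this; the class and the gains here are hypothetical stand-ins rather than the real controller, but the three terms are the ones I'll describe next.

```python
class PIDController:
    """Textbook discrete PID controller: the output is a correction to apply
    to the instance count, driven by the utilization error."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd  # the three tunable constants
        self.integral = 0.0
        self.previous_error = 0.0

    def step(self, target, observed, dt):
        error = observed - target
        self.integral += error * dt                       # error accumulated over time
        derivative = (error - self.previous_error) / dt   # rate of change of the error
        self.previous_error = error
        return (self.kp * error            # proportional: react to the current error
                + self.ki * self.integral  # integral: push harder if the error persists
                + self.kd * derivative)    # derivative: predict and damp the change

# e.g. 10% over the utilization target on a 10-minute (600-second) tick
pid = PIDController(kp=1.0, ki=0.0005, kd=0.1)
correction = pid.step(target=0.8, observed=0.9, dt=600)
```

Drop the integral and derivative terms and you're left with something very close to the proportional policy I'll come back to in a minute.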
The three constants in PID are how the person operating the controller can change how big the changes are that the PID controller makes to try and correct that error. The proportional part of PID just says that if you're over-utilized by 10%, then we'll scale up by 10%. The integral part of the PID controller adjusts for time as well: if you're continually making adjustments over time and they're not having the impact that you need, then the constant that you apply to the integral part will slowly increase the weight of that change as time goes on. And then the derivative part predicts into the future and tries to dampen any changes that you make, so that you don't keep oscillating around a given value by over- and under-estimating how much change you need to make.

Next up, threshold. This is far, far simpler. The threshold decision policy just says that if you hit or go over your target utilization, then we'll scale up by 10%. That number's hard-coded, and we'll scale back down by 10% as well when you're under-utilized. We don't really use this; it was kind of a proof of concept that we built.

Bespoke: again, we have people for whom a generic model for describing how we should scale up and down just won't do it. And so we give service owners the opportunity, if they want, to write their own autoscaler. In the same way, if we go back to that program I showed you previously that writes into ZooKeeper, all we do with the bespoke autoscaler is give service owners the responsibility of writing that value into ZooKeeper instead.

And finally, proportional. This was kind of an evolution of the PID controller. If you were to take the PID controller, remove the I and the D out of it, and just take the proportional part, that's pretty much what this looks like. It just directly changes the number of instances that we have deployed, proportional to the error between the target and the real utilization. But on top of that, it has a couple of additions that try and keep the changes it makes sensible. Things like what we call a good-enough window: if you're within 5% of your target utilization, then rather than trying to correct for that, and then scaling up, and then scaling back down when we go over by 5% again, we just accept that it's good enough. We're close enough to the target utilization, and we don't make any changes.

So those are the models that we use. How effective has it been? I went looking for this graph, and I saw this, and I was like, yeah, that's pretty cool, right? Dramatic spikes. And then you notice that it's not zero-scaled, and we're actually oscillating here between 5,200 and 5,500 tasks. So a small dent, maybe a little less than 10%. And when you do zero-scale it, you see that we're making a fairly small dent in the number of tasks. But we targeted this first at the monolith, right? The monolith was the main thing that we had to autoscale, because it's probably the biggest consumer of resources in our cluster. So if we take a different view and look at the number of CPUs that have been autoscaled overnight, rather than just the number of tasks, then you see the impact of autoscaling just that monolith, as well as a few satellite services around it: we're making a change of about 20% here in the number of CPUs.
And so just by going after that one service, we've made a pretty big dent in the cluster here. Especially given that we lose about 50% of our traffic overnight, so that's the most we're ever going to scale down, and we aim for 80% utilization as well. So we're leaving ourselves quite a bit of headroom, but there's definitely more work to do.

The next part of this is the cluster autoscaler. This is a bit simpler, right? There's only one consumer of the cluster autoscaler, and it's us, the operations team. So we only have a single decision policy; we don't need more flexibility than that. It's very similar to the proportional one. It runs every 20 minutes rather than every 10, mainly because it can take a long time, because we do things serially and we have to make sure that we drain all of the tasks out of the load balancer on every host that we're shutting down, and so on.

We aim for 80% utilization in the cluster. This is a really big deal, and we constantly try and tune it, because if we were to go to 100% utilization and only afford the cluster exactly the amount of resources that were required, then we'd end up without any of the operational freedom that the spare capacity gives us to scale up and down in periods of unexpected traffic. By running at 80%, it means that in an emergency we can really quickly scale up the number of tasks of a given thing without having to wait for hosts to come up first. It also improves the speed of our deployments. The way that most services are deployed is that we'll run the old and the new version of a given service in parallel, by bringing the new one up first, and then, as the old one drains out of the load balancer, we'll kill off its tasks. If we were to run at 100% utilization, then the side effect of that would be that we'd have to do this in lockstep: as we killed an old one, we'd launch a new one, and kill an old one, and launch a new one, which makes for a much slower deployment, especially when a given service can have hundreds of tasks. So that 80% utilization, we keep tuning it and seeing where the right trade-off is between spending a bit of extra cash and giving ourselves some freedom, but we've settled on this for now.

And then finally, we have to err on the side of defensiveness, right? Ultimately, we've put this program in control of the amount of infrastructure that we have to serve the website. So there's a whole bunch of checks in there that we have to keep a tab on, to make sure that the cluster autoscaler isn't about to shut our whole cluster down because there's a bug in it that thinks, hey, we have 0% utilization, let's kill it all. So there are an awful lot of safety checks in here to make sure that it's not doing anything destructive. Other things that we check for are things like: if we're rolling the Spot Fleet that we're running, because we've had to attach new launch specifications or something to it, then it can make sure that we don't scale the old Spot Fleet down until the new one has come up and we've won enough bids, and things like that. So there's just an awful lot of complexity that we've had to put in there to make sure that it's safe.

Now, the final thing I wanted to share was how we deal with scaling down. Scaling down is a pretty disruptive thing because, of course, we're killing tasks, taking them out of the load balancer, launching them elsewhere, and so on.
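As a rough sketch of the kind of ranking I'm about to describe, with made-up weights and a hypothetical Host type rather than anything from the real cluster autoscaler:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Host:
    hostname: str
    tasks: List[str] = field(default_factory=list)
    batch_tasks: List[str] = field(default_factory=list)
    has_scheduled_aws_event: bool = False


def host_fitness(host: Host) -> int:
    """Score a host by how cheap it is to shut down: lower is better.
    The criteria mirror the ones described next; the weights are illustrative."""
    score = len(host.tasks)             # fewer running tasks means less churn
    score += 5 * len(host.batch_tasks)  # interrupting non-idempotent batches is expensive
    if host.has_scheduled_aws_event:
        score -= 100                    # AWS is retiring it anyway, so prefer it first
    return score


def pick_hosts_to_terminate(hosts, how_many):
    # Terminate the cheapest-to-kill hosts first.
    return sorted(hosts, key=host_fitness)[:how_many]
```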
So when we do scale down, we have to figure out which hosts we can scale down whilst causing the least churn in the cluster. The way that we do this is we rank hosts, giving each one a fitness score that describes how easy or difficult it will be to shut it down. The number one input for that is the number of tasks running on it. If we have to shut something down, it's definitely preferable to pick a host that's not doing anything, or is maybe running one or two tasks, than one that's running 20 tasks just because it's been around a bit longer. Other things that we take into account are the number of batches running there. We do have a fair number of batches that are run by Chronos, but unfortunately some of those batches have been around almost as long as Yelp, and so they were built for a time when you didn't have this notion of ephemeral hosts. When you were a service owner writing a batch, you could assume that the host you were running on would exist for the lifetime of the batch. In reality, you kind of assumed the box was just going to be there for the next year or so, and so if your batch took a day to run, it was no big deal. The side effect of that is they were never designed with things like checkpointing in mind, so it can be difficult to interrupt a batch, because it has to start again from the beginning. That's made even worse when batches aren't built to be idempotent, and so we have to take that into account as well. And then finally there are the AWS events. We launch a whole bunch of AWS hosts, and inevitably we get these notifications that say, hey, you're on old hardware and this instance needs to be restarted or retired. So in the case where it's inevitable that a host is going to be shut down anyway, we keep an eye on that, and if we find a host like that, we prefer to shut it down first, since it's going to happen regardless.

So how effective has this been? Well, you can see, again going back to when I said that we targeted the big monolith: just by doing that, we're making waves here, where we go from about 500 hosts in the low periods up to about 600, even a bit higher than that in some cases. So we're doing a pretty good job; this is a lot of capacity that's going away overnight, which is really good to see.

So the final thing I'm going to talk about is AWS Spot Fleet. Is anyone familiar with Spot Fleet? Does anyone run it in production? Great, I'd really like to talk to you afterwards. That's really good. If you're not familiar with Spot Fleet, this is Amazon's way of selling their spare capacity to the highest bidder, effectively. Imagine what you see on the left there to be Amazon's capacity, in a world where Amazon has seven instances, and on the right are people bidding for those instances. You can see on the left-hand side that Amazon has three of their seven instances in use at the moment, so there are four of them free and available for bids. And on the right, you see that there are a number of users. User A wants two of those instances and is prepared to pay $4 for them. User B wants one and is prepared to pay $3. User C wants two of them and wants to pay $2. User D wants three instances, but they're only prepared to pay a buck for them. So who wins? Of course, user A wins; they get their instances. User B gets their instance as well; they were prepared to pay $3. User C gets one of the instances that they asked for, but not the other.
But the key thing here is they're all paying user C's bid, because that's the lowest winning bid. Even user A, who was willing to pay $4, only has to pay $2. And then, of course, if user B comes along and says, hey, I want another one now, and I'm still prepared to pay $3 for it, user C gets kicked off, and now everyone pays $3 for those winning instances.

So what are the conditions? Well, you saw user C get pretty brutally kicked off that instance there. You get two minutes' notice. If you lose the bid for your capacity, you get two minutes, and then that's it, the instances are terminated. So you've got to figure out a way of bringing up the capacity that you're about to lose within two minutes. Or that's it.

So why? Why would you live with such volatile capacity? Well, the number one thing is absolutely money. If you look here, this is a comparison between the on-demand price for an r3.8xlarge and the price that's been paid over the last month on Spot Fleet. You can see it's $2.96 that you'd normally pay on demand, and people have paid $0.73 for those instances over the last month, roughly a quarter of the price. That's an enormous saving.

So, that sounds great; how do you go about reducing the risk? The number one thing, counterintuitive as it may seem, is to bid high. The hope is that the savings in the low periods will outweigh the expenditure that you see in the really expensive periods. Of course, don't bid too high, because if you bid really high and you're still winning during those spikes of really expensive times, where the price can go up to three or four times the on-demand price, then you're very quickly going to eat into your gains. Just as a data point, we bid two times the instance price. Don't come along and bid 2.1 times. But yeah, we bid two times the instance price.

The second strategy is diversifying. When you ask Amazon to fulfill a Spot Fleet request, you give them the desired capacity, which can be expressed either as a number of instances or in some other arbitrary weighting that you give to instances. And then you give a description of all of the instance types that you want to bid for, how much you want to pay for them, and how much each contributes towards that capacity you're after. Now, when you do that, you can say to Amazon: here's the spec of all the things I'm willing to pay for, fulfill it in the cheapest way possible, in which case Amazon might find an instance type that's going crazy cheap right now and lump all of your capacity onto that. The alternative is that you can ask them to diversify across as many instance types and availability zones as possible. Of course, you're going to spend a bit more when you diversify, but you're far less susceptible to a single instance type spiking in price and losing all of your capacity.

So this is a view of how Yelp asks for Spot Fleet instances. This is the Terraform configuration language; we wrote a module that wraps the Spot Fleet one. The key thing that you see here is the minimum and maximum capacity. We turn that into a file that lives in S3, and then the cluster autoscaler turns that into a real number that we give to Amazon. But the thing here is that we're asking for between 7 and 70 units of capacity. Now, that's not instances; it's actually the number of CPUs that we want, divided by 100.
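Just to make that arithmetic concrete, here's a small worked sketch. The one-CPU reservation and the 0.15 and 0.35 weights are explained next; the helper name is my own.

```python
# Hypothetical sketch of the capacity arithmetic: one "unit" of Spot Fleet
# capacity is 100 CPUs, and each instance type's weight is the number of
# CPUs it advertises to Mesos (vCPUs minus one kept back for system
# processes) divided by 100.
CPUS_PER_UNIT = 100
SYSTEM_RESERVED_CPUS = 1


def spot_fleet_weight(vcpus):
    return (vcpus - SYSTEM_RESERVED_CPUS) / CPUS_PER_UNIT


print(spot_fleet_weight(16))  # c4.4xlarge -> 0.15
print(spot_fleet_weight(36))  # c4.8xlarge -> 0.35

# Asking for 7 units of capacity means 700 CPUs; fulfilled purely with
# c4.8xlarges, that would be 7 / 0.35 = 20 instances.
print(7 / spot_fleet_weight(36))  # 20.0
```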
So we use CPUs as the primary metric to define how much capacity we actually need. When we ask for seven units of capacity from Amazon, we're actually asking for 700 CPUs, spread across different instance types. And then we provide them with the specification of how each instance type contributes to the capacity that we're after. You can see here that the c4.4xlarge has 16 virtual CPUs, and we give that a weight of 0.15. The reason for the discrepancy is that we keep one CPU back for system resources, like running Puppet, and we don't advertise that amount to Mesos. But you can see the spread of the instances: the c4.8xlarge has 36 virtual CPUs, we give that a weighting of 0.35, and so on. So when we then ask for seven units of capacity, you can see how they'll add up according to their weights, and that's how Amazon decides how many of each instance type it needs to fulfill that capacity request.

The final thing that I'll talk about now is, once we get notified of that two-minute timer telling us we're about to be shut down, how do we actually deal with it? I'd really like to hear other stories, if anyone's got them, of how they go about doing this. You can see the fact that you're about to be shut down via the EC2 metadata API, so we constantly poll that. When we notice a given instance is about to be shut down, we immediately use the Mesos maintenance primitives to mark the host as draining. The reason we do that is that it's a natural point at which we can say: this box is about to be shut down. And the hope is that one day all of the frameworks that we use will actually take that into account.

So we mark the host as draining. Then, in the deployment daemon, in the same way that we have watchers for changes in ZooKeeper for the number of instances that we need, and watchers for new git commits and things, we have another watcher for hosts that go into draining mode. We poll the master API that exposes all of the hosts that are draining at a given time, and once we notice that a new host is draining, we'll immediately scale each affected application up by the number of instances of that application that are running on the host. So if a host is running two instances of our monolith, then we'll go along and make the API call to Marathon to say: whatever capacity you have at the moment, increase it by two. And hopefully the hosts that it launches those new tasks on won't be the one that's about to be shut down. We then take the host out of the load balancer, to stop any new HTTP connections being made to it and to let the ones in flight finish. And once we're confident that's done, we kill the task.

At that point, we enter into a race between ourselves and Marathon: once we kill the task, we've got to try and get hold of that resource before Marathon takes the offer from Mesos and launches a new task using it. The way that we do that is by making a dynamic reservation in Mesos for the resources that have just been freed up. Of course, if we lose that race and Marathon launches a task before we get the dynamic reservation in place, then we have to go through this loop all over again. So we just loop through this until all of the tasks are done; basically, until the end of the two minutes, this is all we do. And then eventually, of course, Amazon terminates us.
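Here's a minimal sketch of just the first step of that pipeline: noticing the two-minute warning and scheduling Mesos maintenance for the host. The spot termination-time metadata path and the master's /maintenance/schedule endpoint are real, but the payload is abbreviated, the hostnames are placeholders, and real code would merge with the existing maintenance schedule and then go on to do the Marathon bumping and load-balancer draining described above.

```python
# Hypothetical sketch: poll the EC2 metadata API for the spot two-minute
# warning, then mark this host as draining via the Mesos maintenance API.
import time
import requests

TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def spot_termination_notice():
    """Return the termination timestamp if one is scheduled, else None."""
    resp = requests.get(TERMINATION_URL, timeout=1)
    return resp.text if resp.status_code == 200 else None


def mark_host_draining(mesos_master, hostname, ip, start_ns, duration_ns):
    # NOTE: posting a schedule replaces the whole schedule; real code would
    # fetch the current one and append this window to it.
    window = {
        "windows": [{
            "machine_ids": [{"hostname": hostname, "ip": ip}],
            "unavailability": {
                "start": {"nanoseconds": start_ns},
                "duration": {"nanoseconds": duration_ns},
            },
        }]
    }
    requests.post("http://{}:5050/maintenance/schedule".format(mesos_master),
                  json=window).raise_for_status()


if __name__ == "__main__":
    while spot_termination_notice() is None:
        time.sleep(5)

    # We have roughly two minutes: mark ourselves as draining immediately.
    mark_host_draining("mesos-master.example.com", "this-host.example.com",
                       "10.0.0.1", start_ns=int(time.time() * 1e9),
                       duration_ns=int(120 * 1e9))
```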
I do want to call this out: there was a great contribution made to Marathon where it'll now ignore offers from hosts that have a maintenance schedule attached to them. If the contributor, Alan Bover, is here, then it's an enormous thank you from all of us at Yelp, because this is going to make our lives a heck of a lot easier, and definitely reduce the amount of disruption that comes from shutting down a host.

So, is it worth it? Specifically, I'm talking about Spot Fleet here. This shows how much we've paid for a given instance type in a given region, compared to the cost of a three-year, all-upfront reserved instance. That's probably the cheapest way you can buy Amazon instances, right? You say: I'm prepared to commit to having this for the next three years, and I'll pay all the money upfront, and that's how you get the biggest discounts from Amazon. And this is the price that we've paid on Spot Fleet compared to that. So you can see, in the worst case, in us-west-2 for instance, for the c4.8xlarge we've paid 81% of that price. And that's the worst case. In the best case, in us-west-1, we've been paying 27% of the price for c3.8xlarges. And this is over the last three months, I think. So yeah, in some places we're making really big savings; in the worst case, we're still making some savings, just not quite as much. And at the bottom you can see the weighted totals. In us-west-1, we're paying 47% of what we were; in us-east-1, we're paying 51% of what we were paying; and in us-west-2, we're paying 60%. So in the worst case, 60% of our previous bill, which is an enormous saving. Combined with the autoscaling, of course, this is really bringing our costs down, which is good.

So, future plans: what's next? Well, the first thing is predictive versus reactive autoscaling. As you saw, Yelp has a really predictable traffic pattern, and yet the way that we autoscale is in reaction to changes in utilization. Really, we could get ahead of that by preemptively scaling up when we can reasonably say, yeah, the load on this thing is about to go up; let's just get ahead of it and scale up. Because at the moment, in that period between when we notice that the utilization has gotten higher and the time when we correct that error, we've potentially got higher response times or something like that. So we can get ahead of that, as I said, with some predictive autoscaling.

Parallel scaling as well. I hinted at it taking a long time for the cluster autoscaler to run, and part of the reason for that is that it runs in serial: after every shutdown, it reassesses how much capacity we need to change, and then kind of keeps going. In theory, we can just do this in parallel and make it significantly faster, so that's definitely another goal for us.

And then finally, deployment to more services. As I said, we targeted this at the monolith, and a few satellite services have started to take advantage, but we definitely want to encourage more and more service developers to take this on and make it the default, because it's only going to be better for us if we can increase the utilization on the cluster even more.

So, the conclusion. Well, we get a lot of business value from autoscaling; we save an awful lot of money from it.
I think the 80% efficiency thing is a really good lesson that we've learned. Previously, when we tried to over-optimize and run without a reasonable amount of headroom, we suffered in developer productivity, where people had to wait ages to get their deployments out, or in times of unexpected load, we didn't have the room to scale up accordingly. Spot Fleet has further reduced our AWS bill; I think that kind of undersells it, because we've saved an awful lot of money by using Spot Fleet. But of course, we've also had to spend a lot of engineering time building tooling that's able to weather the extra volatility you get from running on spot capacity. And finally, the thing that we've built all of this on, and that we really hope frameworks continue to adopt, is the Mesos maintenance primitives. They definitely give us a good place to express the fact that our hosts are going to be in and out of use quite frequently.

So, I've got time for questions, haven't I? Quick shout out: we are hiring. If you're interested in coming to work on this kind of thing, we have offices in Hamburg, San Francisco, and London as well, which is where I am. So if you're interested, come and talk to me and I can put you in contact with the right people. But questions, has anyone got anything?

Oh, I've got a question about mitigating risks when autoscaling. Have you considered an option where you would have two node pools for your servers? One smaller, with moderate autoscaling or almost static, for long-running batches, which you'd better not touch, and another big, dynamic, Spot Fleet based pool with extremely aggressive autoscaling for everything else?

We have. I didn't tell the whole story: we do have some regular autoscaled, on-demand hosts that are less volatile, and we run the things that really can't deal with being shut down that frequently on those. As for the batches, it's kind of a trade-off against us increasing our utilization across the default pool, because of course batches run on a schedule, and so if we were to just have static infrastructure available to them, then there would be periods where it's just not being used. So we've got to find the right balance: sure, it's risky in case the instance does get shut down, but it's not worth us paying for an instance all of the time just for that batch to run once a week or something like that. So at the moment our strategy is asking for forgiveness: we run as much as we can on Spot Fleet, and then if a developer comes along and says, hey, what's up, I just lost my instance, then we gently move them over to something more stable. Yeah, that's the strategy we've taken.

OK, actually, we do the same that we have, by each every one minute, so our excuses are not working anymore. Do you have some statistics on how long, on average, a spot instance runs? Or how often you have to reschedule, and terminate, and do all the scaling?

Yeah, I was trying to come up with this and didn't quite get there. I think I found that the 95th percentile uptime for our instances over the last two weeks has been a little over a day. So we do churn through them quite quickly, between the autoscaler shutting things down and, obviously, Spot Fleet bids being lost and things like that.

Thank you. Oh, sorry, another question: using Spot Fleet, you probably needed a lot of engineering capacity to get all of that running.
Have you made a total cost of ownership calculation, in terms of what you're really saving from an infrastructure point of view versus what you're investing in engineering? Is it still net positive?

I haven't. I'm certain someone above me has.

Oh, OK, thank you.

Sorry, maybe I missed something. So you scale the services, and you scale the cluster, the number of services in the cluster, but do you scale the resources of the service itself? Like if you allocate two gigs of memory for a service and in reality it's only using maybe 10% or 50% of that?

No, we don't do that. We do monitor those things, and we expose graphs to service developers and so on, so that hopefully they'll take it upon themselves to see that they've asked for 10 gigabytes of RAM and they're using 0.5, and adjust accordingly. We don't dynamically change that value. It's definitely an option, but it's not something we do right now.

In our case, we use the same, we use spot instances and these kinds of things, but we noticed that developers just put arbitrary numbers in at the beginning when they develop a new service, because they really don't know what it will use in reality. And that's a lot of waste; more than 40%, I don't remember the number exactly, but around 40% was just waste. Even though we were utilizing the cluster well in terms of allocation, the allocation itself wasn't being used. So we also wrote a little job that gets the historical utilization of every task for the last three days and adds 20%, so that it targets about 80% utilization, and then notifies on Slack that the service should use the recommended values. So developers can do it manually; we didn't do it automatically, because it was a bit scary.

Yeah, it's definitely something we've considered. We don't do it, but what we do is, I think on a monthly basis, we produce a report of the services that are wasting the most resources, and then send them an email saying: you should do something about this.

Yeah, cool. Any more questions? OK, thanks, Rob. Great, thank you. Thank you.