Hey, everyone. My name is Joel Priest. I'm with Rackspace Public Cloud, an engineer with Rackspace now for going on four years, Public Cloud the whole time. And my name is Ben Burdick, also an engineer with the Rackspace Cloud. Been there since 2008, working originally with Slicehost, then the first-gen Cloud Servers, and now OpenStack. Cool. So we're going to be talking to you today a little bit about how Rackspace handles fleet management in our OpenStack cloud.

A little bit of background information about Rackspace. If you've been to any Rackspace talks, you probably know these facts better than me. We're a managed cloud hosting company based out of San Antonio, Texas, founded in 1998, home of Fanatical Support. We have more than 200,000 customers across 120 countries. Our OpenStack cloud has been in production since about August of 2012. We're in six regions: we originally launched in Dallas and Chicago in the States, plus London, and we've since spread into Virginia, Hong Kong, and Sydney. Each of our regions is essentially a separate installation of OpenStack. They all have their own separate API endpoints, and the control planes for them are all completely separate. This encompasses tens of thousands of hypervisors, hundreds of thousands of instances, 340,000-plus cores, over 1.2 petabytes of RAM. Those were conservative estimates from several months ago, so the real numbers are higher by now. All of our hypervisors are running XenServer. And as far as OpenStack services, we run Nova, Glance, Neutron, Ironic, Swift, Cinder, and more; we're always looking to add more. Ben's and my primary focus is Nova and Glance for the most part, and Neutron as well.

So why are we here? Spoiler, slash a little bit of a teaser: we want to start giving back more to the community. At our scale, we've been running into issues regularly with scaling OpenStack, and we've done a lot of work to account for that and make OpenStack scale to tens of thousands of hypervisors across six regions, yada, yada, yada. And we want to start bringing those tools back into the community so that the same people don't have to fight the same battles we did a couple of years ago. So we really want to start open sourcing some of the tools we're using on the fleet-ops side. We wanted to have a lot of that ready to go by this summit; unfortunately, that didn't happen. So the teaser is that hopefully, if you make it to the Austin summit, we'll have some stuff for you there. That's the plan. It just takes a little while to get all of our in-house stuff cleaned up and ready for public consumption. But that's what we're working on right now. Or, one of the things we're working on right now.

So, the rest of the story: why are we here? We looked at the things we struggled with, and, like I said, we go to these summits and we see other implementers, other operators, running into the same challenges we did. So obviously there's a need for these kinds of tools to help y'all, and us as well, grow our clouds. And as we started looking at our cloud, when we were first starting, we asked: what could we do differently if we had 50,000 instances? What could we do differently if we had 100,000 instances? And the thing that we realized is that, ultimately, it shouldn't matter how many instances there are. It should be just as easy to operate a cloud with a million instances as it is with one instance, if you're doing it right.
That's the ultimate goal. Obviously it doesn't scale that way in reality, but that's what we're reaching for: a set of tools, a set of implementations, that more or less teaches the cloud to run itself, so we can just be there to babysit it and make sure those things are doing what they're supposed to do.

So how do we do that? The elusive OpenStack configuration management database. If you've been to any of the recent summits, especially the operators' mid-cycles, that kind of thing, this has been a very big topic recently, on the mailing lists, at the mid-cycle, and there have been some blueprints floating around between us and some other large implementers about how we do this: how do we get a CMDB into OpenStack, one that makes sense, one that does what we want it to do? The main thing we've run into is relating things outside of OpenStack to things that are in OpenStack. Not all of the data that is relevant to our cloud and how we run it lives within an OpenStack service. It just doesn't work that way. We have asset-tracking software, we have networking software, and those don't have a real construct inside of OpenStack. How do we put all of those things together into one place so that we can access that information and then build tools around it? My ultimate goal with the CMDB, as an operator, is that I shouldn't have to think about where to go to get information about this or that part of a hypervisor, or where something lives. I should just be able to say, "tell me about this hypervisor," and get back a big object that says where it is, what's on it, what it's doing, et cetera.

So how do we do that? In our view, that came down to a collection of tools, not one tool. We don't want a giant monolithic piece of software that's just as difficult for us to run as OpenStack is in the first place, because that doesn't help us at all; that actually makes our job twice as hard. So we want, and I hate to use this word, an agile, small set of tools that are focused, directed, and easy to use, that each do what they do well, hopefully without too much complication, so that they do their job of making our lives easier without too much hassle.

On the operator side, what does that mean? The things that I want from my CMDB are correlation, consistency, sleep, and vacation. The last two are way more important than the first two, by the way, but it's how you get there. Correlation: how can I tell what's affected within my OpenStack cluster if something outside of the OpenStack implementation has a problem? If a switch goes down, I should be able to hit some kind of database, make an API call, and say: this switch is down; what instances are affected, what hypervisors are behind it? And if it's our internal cloud, because we run OpenStack on OpenStack, which of our services actually live back there? Do we have four API nodes down because this switch died? So I need to be able to correlate data from outside of OpenStack, data OpenStack doesn't recognize, to OpenStack things. Consistency: I want to query the infrastructure and say, hey, give me this configuration value from every hypervisor. If I get 999,000 answers back that are one value and three that are another, that's probably bad. So we need to be able to make everything as consistent as possible. And then sleep.
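To make those first two goals concrete, here is a minimal sketch of the kind of correlation and consistency queries an operator wants. Everything in it is an assumption for illustration: the CMDB endpoint, paths, and field names are invented, not a real Rackspace or OpenStack API.

```python
# Hypothetical sketch: correlation ("this switch died, what's affected?")
# and consistency ("give me this value from every hypervisor") against an
# imagined CMDB REST API. All URLs and field names are placeholders.
import requests
from collections import Counter

CMDB = "https://cmdb.example.com/v1"  # assumed endpoint

def blast_radius(switch_id):
    """Correlate a failed switch to the hypervisors and instances behind it."""
    hosts = requests.get(f"{CMDB}/switches/{switch_id}/hypervisors").json()
    affected = []
    for host in hosts:
        instances = requests.get(
            f"{CMDB}/hypervisors/{host['id']}/instances").json()
        affected.append({"hypervisor": host["id"],
                         "instances": [i["uuid"] for i in instances]})
    return affected

def consistency_outliers(key):
    """Ask every hypervisor for one config value; flag the few that differ."""
    rows = requests.get(f"{CMDB}/hypervisors", params={"fields": key}).json()
    counts = Counter(row[key] for row in rows)
    expected, _ = counts.most_common(1)[0]  # majority value is "correct"
    return [row["id"] for row in rows if row[key] != expected]
```

The point of the sketch is the shape of the question, not the implementation: one API call in, one answer out, with no human figuring out which backend system holds which piece of data.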
If we can make everything consistent and correlated, and thus build tools around enforcing that correlation and that consistency, I can actually go to sleep at night, which would be great. And if you want to learn more about how I can't sleep, I have another talk tomorrow at about 9 a.m. where you can learn all about how I didn't get to sleep for about a week or so. And then, ultimately, vacation. I should be able to go to an OpenStack conference for a week, come back, and have my cloud not be a smoldering pile of rubble. It should be able to run itself with very minimal intervention on our side.

So this is what we started asking ourselves: how do we build these tools, and what tools do we need? Can the fleet provision its own capacity? That's a big one. I want to be able to plug in a set of new hardware, say this is earmarked for this region and this type of hardware, and have the system detect that and go: bare-metal provisioning, install XenServer, configure it with this IP, yada yada yada, plug it in, run tests on it. Can I build to it? Does it work? Great, go have fun, customers. Congratulations, here's some new hardware. What will it take to get the cloud to heal itself? If I can plug in new capacity and have it build itself from the ground up, I should be able to take a problem hypervisor, run any fixes on it, and if it's got bad RAM, live-migrate everything off of it and send a ticket in to our DC operations, who, by the way, are the unsung heroes of a large cloud implementation. Replace the RAM, run tests on it, make sure it's good, run any updates on it, put it back in. It's almost the same thing as adding new capacity.

What can we learn from web apps? One thing we started thinking about was hypervisors. The fact that they're physical doesn't mean anything to me. Again, DC ops are the ones doing all that. As far as I know, our physical hypervisors don't actually exist. I've never touched one. I've never seen one except in pictures. It's just compute power to me. It's just as real to me as the VPSes are for our customers. Once you start thinking about it that way, it's just another resource in your cloud. It's another way of doing computation, another place you direct things. It's no different from any other worker node in a virtual environment. So that means hypervisors are expendable. Pets, not... sorry, cattle, not pets. We want to be able to get rid of a hypervisor. Obviously not if customer data is on it; ideally we want to save those. But as an operator, if I have a hypervisor that has too many problems, and either it can't be salvaged or we've moved everyone off of it and it's not worth it, I don't want to have that thing around forever. I don't want to have to worry about it. I want it replaced, or fixed, or just gone if it needs to be. Then we start treating everything like a node. Again, we run OpenStack on OpenStack, so a physical hypervisor is no different from whatever VM I have that happens to be running Glance or happens to be running Neutron. It's just another service as far as I'm concerned. And how do we scale this? If our cloud got ten times bigger today, again, it should be just as easy to run as it is for me today. In practice that's not going to happen, but that's the end goal and that's what we're striving for. And the ultimate endgame, when we know we have done all this as best we can, is we get to the point where our cloud is ordering its own gear.
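One way to picture the "provisioning and healing are the same pipeline" idea is as a host lifecycle. The sketch below is purely illustrative; the states and transitions are assumptions about the workflow described above, not the actual Rackspace implementation.

```python
# Illustrative sketch: the hypervisor lifecycle described above as a tiny
# state machine. A suspect host, once evacuated, re-enters the same
# pipeline as brand-new capacity. States are invented for illustration.
from enum import Enum, auto

class HostState(Enum):
    RACKED = auto()         # new gear detected on the network
    IMAGING = auto()        # bare-metal provisioning, hypervisor install
    BOOTSTRAPPING = auto()  # compute VM, networking, code versions
    TESTING = auto()        # smoke tests: can I build to it? does it work?
    IN_SERVICE = auto()     # taking customer builds
    SUSPECT = auto()        # problem host: live-migrate everyone off
    RETIRED = auto()        # cattle, not pets

# Healing is just provisioning with a detour: an evacuated SUSPECT host
# goes back through IMAGING like new capacity (or to RETIRED if the
# hardware can't be salvaged).
NEXT = {
    HostState.RACKED: HostState.IMAGING,
    HostState.IMAGING: HostState.BOOTSTRAPPING,
    HostState.BOOTSTRAPPING: HostState.TESTING,
    HostState.TESTING: HostState.IN_SERVICE,
    HostState.SUSPECT: HostState.IMAGING,
}
```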
I want to be in a position where, say, our Chicago cloud gets low on capacity for, let's say, our performance-one hypervisors; it detects that and sends a ticket off to our supply chain saying, hey, I need another cell of performance gear. Gear shows up, gets rolled in, gets plugged in. The switch says, oh hey, all my ports just lit up. All that gear starts going through the provisioning pipeline, hands-off. Maybe somebody checks when a bunch of purchase orders come in for a few million dollars' worth of gear. But more or less, it just happens and we don't worry about it.

So, assumptions in terms of our solution for this. One, we're running cells. We were basically the first people to run cells. That's not normal; it's becoming more normal. And just out of curiosity, is anybody here running cells in their implementation? Sweet. Well, there have been a lot of talks recently about bringing cells more into the fold, making it more of the norm. That was a big topic of discussion, I believe in Paris or Atlanta, I can't remember which, about even going so far as to make cells the default, so you start with a one-cell deployment out of the box. XenServer: anybody here running XenServer other than us? Yeah, I didn't think so on that one either. So for a lot of our specifics, obviously, even if we open sourced them today, we can't give you our runbook on bootstrapping a hypervisor; if you're not running XenServer, that doesn't help you very much. And we run computes as VMs, and we do it in essentially a one-to-one ratio. Every hypervisor we bootstrap, we put a small VM on it; that's our compute node, and it's only in charge of that hypervisor. Except for Ironic: obviously you can't put a VM on a bare-metal machine someone else is using. But other than that, one to one. It increases the complexity quite a bit. I mean, you've just doubled the number of nodes you have to control, because now you have one compute for every hypervisor. But it gives us a degree of flexibility that outweighs the complexity. And control planes: it's all on VMs. Like I said, we run OpenStack on OpenStack; we call it Inova. It's a small OpenStack implementation, one for each region, and that's where almost everything lives. I think we only have one service that isn't completely virtualized at this point. Everything else: VMs.

So how do you manage that complexity? The first example is hardware. We have five separate flavor classes. Within those, we're sourced from at least one and up to three hardware vendors, and from those vendors we might have multiple revisions of the same piece of hardware. So worst case, you're looking at five flavors times three vendors: at least 15 permutations just on your hardware types. And that's just the hypervisor level. Things might change upstream in our networking implementation, so we have to know what our top-of-racks look like, we have to know what our aggregation switches look like, and all the way up the chain. So that complicates things. That complicates everything. Testing: pushing a small code change should work on everything, right? Well, you have to double-check on at least 15 different flavor and hardware combinations to make sure that's actually true, because that's not something you want to make an assumption about. So even very small things get extremely complicated when you have to do them 50,000 times, 100,000 times, 200,000 times.
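The combinatorics here are worth spelling out. A toy sketch, with placeholder names standing in for the real flavor classes and vendors: five flavor classes times three vendors already gives the 15 combinations mentioned above, and hardware revisions multiply that further before you even reach network topology.

```python
# Toy illustration of the test-matrix explosion. Flavor class, vendor, and
# revision names are placeholders, not Rackspace's actual lineup.
from itertools import product

flavor_classes = ["standard", "performance1", "performance2", "io", "memory"]
vendors = ["vendor_a", "vendor_b", "vendor_c"]
revisions = ["rev1", "rev2"]  # even two revisions doubles everything

matrix = list(product(flavor_classes, vendors, revisions))
print(len(matrix))  # 30 configurations to validate for one "small" change
```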
Edge cases are death by a thousand cuts when you're trying to push things at our scale. So here's a very high-level view of how we started to attack this problem. The first layer is human interaction: a fleet management interface, plus trending and reporting. You need someplace you can go and look and say, how is my cloud behaving? What's my API availability? What's my latency? How long does it take me to download an image? How long does it take me to do this? You need to be able to look at that so we can make intelligent decisions as the human operators of this cloud. And by a fleet management interface, we mean that sometimes you do have to trigger something manually. I don't want to have to go in and manually type out an Ansible command to run a playbook for a cell. I can do it, I do it all the time, but I don't want to have to. At the very worst, it should be a button I can press that goes and runs itself.

That layer is built on top of automation services. Automation services: playbooks. We're big fans of Ansible; at any Rackspace cloud talk you go to, private or public, we're probably going to be talking about Ansible to some degree. So we have a set of playbooks to address all of the things we're doing across all these hypervisors, and we need those to be doing provisioning, auditing, and remediation. New capacity needs to come up, we need to audit it to make sure it's as we expect it to be, and we need to be able to fix it when it's not. And then inventory. We need to be able to pull all of that data from all those different places and aggregate it: a one-stop shop for everything you need to know about your cloud, with an API to interact with it.

So now we get to an inventory management system. The goal: a single place where all that data lives. I want to make one API call and know everything I need to know about a hypervisor. And that's pulling information from asset management, Nova data, and wiki pages. There's probably stuff that no one at Rackspace knows except for two or three people, which is bad, right? So you want to be able to put that someplace you can access, because things happen. We want to put all of that in one place. So what did we do? We built something called Galaxy. Again, we talk about Ansible a lot; we are not talking about Ansible Galaxy. We unfortunately, independently, picked the same name. What this is, is a database of all the information we have, from multiple sources. We can do human entry to put data in there manually about something if we have to, or it can pull from our internal systems that predate the Rackspace cloud by five, six, seven, ten years that we still need to be in line with: pull information out of sources like CORE or anyplace else, and give me an API so I can pull from it and push to it. At its core, it's a very simple concept: it ends up being a key-value store. The challenge is building one that works well with OpenStack, and building all of the collectors to pull in all of that different information and put it in that key-value store in a uniform way.

So this is a very high-level view of Galaxy. Primarily it's an API, but we do have a graphical front end for it that gives a very simplified view. It doesn't have all the data, basically the greatest hits, where you can take a look at it: see how many hypervisors, see how many cells, is the cell enabled?
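The "collectors feeding a key-value store" idea is simple enough to sketch. Here is a hedged, minimal version of what one collector might look like, assuming a python-novaclient-style client on the Nova side; the Galaxy endpoint and record schema are invented for illustration.

```python
# Hypothetical collector sketch: normalize Nova's view of each hypervisor
# into a uniform record and push it to Galaxy. The Galaxy URL and schema
# are assumptions; the novaclient hypervisor fields are real attributes.
import requests

GALAXY = "https://galaxy.example.com/v1"  # assumed endpoint

def collect_from_nova(nova_client):
    """One source, one uniform write path into the key-value store."""
    for hv in nova_client.hypervisors.list():
        record = {
            "source": "nova",
            "hostname": hv.hypervisor_hostname,
            "vcpus": hv.vcpus,
            "memory_mb": hv.memory_mb,
            "running_vms": hv.running_vms,
        }
        # Asset management, network gear, and wiki-page facts would flow
        # through collectors of the same shape, keyed by the same hostname.
        requests.put(f"{GALAXY}/hypervisors/{hv.hypervisor_hostname}",
                     json=record)
```

The design point is that every source, however old or odd, gets its own small collector, and the uniformity lives in the store, not in the sources.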
Maybe it's in the middle of being provisioned and not enabled yet. And you can drill down a little bit more. This looks specifically at a hypervisor. Again, not all of the information is here. The API is there for those of us doing the heavy lifting who really need all that data; this is more of a high-level view, either for us to give it a cursory review, or for less technical people who need to take a look, say an account manager who needs to see: what's going on with this hypervisor? What does it look like? That kind of thing. And all of those tools, all of the data put into Galaxy, enable us to build all of the other tools for provisioning and remediation. It gives us the platform to build on top of. You're doing fine. I'll keep going. I'm here all week.

So, yeah, we're going to talk about the provisioning process. Along with all the automation and the tools we've built, we've built a tool called Terraform, which, again, I know there's another tool called Terraform. We're very unlucky with picking our names, apparently. Terraform uses the metadata that's set in Galaxy. When you reboot a host, it boots via DHCP, connects to Galaxy, and reads the appropriate metadata, things like: does this host need to be re-imaged right now? Does this host need to be re-kicked? Does this host have customers on it? The host is identified via LLDP. And if it does need to be re-imaged, Galaxy stores the data about which image we want to put on the machine, whether that's XenServer 6.0 or 6.2, or some other future hypervisor version we may decide to use. Ideally, we want hands-free provisioning at all times. At any given time we have hundreds or more hosts at some stage of the provisioning process, and it's just too much for anybody to keep up with manually. So this really helps speed up that process. We used to rely on other teams to set up the base OS for us, and then from there we would provision the software we need, the compute configuration, all that kind of stuff. But then we were relying on another team with their own work priorities, and sometimes things just don't come online as quickly as we might need when we're relying on a different group. And we know that our future is multi-hypervisor, for a variety of reasons.

We also want to focus troubleshooting on hosts that are known good. If a host is acting up, we have the ability to easily migrate or live-migrate instances to a new host machine, reboot the host, and have it re-kick from scratch. If it's still got problems after that, then we know it's a hardware problem and not a software problem. And there's always some set of hosts that are going to have software problems for a variety of reasons, and it's easier to just go re-image them than sit there and try to diagnose what the actual problem is. Hopefully someday we can reboot a host and it will just upgrade itself magically. And we built a lot of automation services on top of Galaxy. In addition to Terraform, once the host has been Terraformed and has a new image on it, as we said, we use a lot of Ansible.
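The boot-time decision Terraform (the Rackspace tool, not HashiCorp's) is described as making can be sketched in a few lines. Everything here is assumed for illustration: the Galaxy API, the metadata field names, and the return values are placeholders for whatever the real provisioning hooks are.

```python
# Hedged sketch of the decision described above: a host boots via DHCP,
# is identified (e.g. via LLDP), and asks Galaxy what to do with itself.
# Field names and the Galaxy endpoint are invented for illustration.
import requests

GALAXY = "https://galaxy.example.com/v1"  # assumed endpoint

def on_pxe_boot(host_id):
    meta = requests.get(f"{GALAXY}/hypervisors/{host_id}").json()
    if meta.get("has_customers"):
        # Never re-image a host with customer instances on it.
        return "boot_local"
    if meta.get("needs_reimage"):
        # Galaxy stores which image goes on, e.g. XenServer 6.0 vs 6.2.
        return f"kickstart:{meta['target_image']}"
    return "boot_local"
```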
We have what we call our bootstrapping playbooks: you run a playbook and it gathers all the information it needs from Galaxy, networking information, hypervisor information, code version information, basically everything the hypervisor could possibly need to know, and about 30 minutes later the host is online. We have some system auditing that checks for things that monitoring doesn't. For example, if you're setting up an HA pair, and as we said, we run the cloud on top of the cloud, what are the chances that those two instances land on the same hypervisor? Maybe slim, but it happens, and we want something out there looking for things like that, saying: hey, you have an HA pair where if this host goes down, they're both going down; that doesn't help.

Another one of our great automation services is what we call Resolver, and it does a lot of automated alert resolving. Alerts come in for a lot of basic things, and it's easily pluggable: disk usage, is the Glance API up, what's the load average like. A lot of these are easy to fix; you don't need a human to do it. If the disk is full, we've identified the areas where the disk fills up, files that can be removed, and it's just a waste of time for a human to go in there and delete log files that are no longer needed, or clean up a mess somebody made in root. So an alert comes in, Resolver grabs the alert, and goes and does it for us. You can also plug things into Resolver manually if you need to. And 60 to 80 percent of all alerts can be handled by a service instead of a person. It makes a lot more sense to have people deal with the alerts that can't solve themselves. Between Auditor and Resolver, we kind of have a self-healing cloud going. Auditor goes out and makes sure everything is where it should be and the universe is happy. If it's not, it tells Resolver: hey, we've got a problem over here. Resolver reaches out and tries to fix the problem. If Resolver cannot fix it, it gets escalated to a human. But if Resolver can handle it, a human never sees it, never has to deal with it; everyone's happy.

We still need people to interface with the fleet, and a big part of this is that not everyone who needs to interface with it is going to be looking everything up via the command line, so we've built some really nice interfaces for these tools. This gives you a lot of sorting ability and a lot of at-a-glance overview of what's going on with the fleet. You can drill down to open issues, issues that are currently being worked. You can drill down from the top of the region, to all the individual cells, to an individual hypervisor, all the way down to an individual instance. Here is also where we hold our live-migration orchestration. Live migration is something we've been trying to get better at, and from here, ops folks who may or may not have access to the hypervisors can easily help resolve customer issues by going in and scheduling a live migration, checking its status, seeing if there are any failures and what a failure might be, and escalating if need be. So this is an example of a cell. From here you can take an aggregate instance view or an aggregate host view. At Rackspace, we have data in a lot of different places.
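Since Resolver is later described as essentially a wrapper around Ansible playbooks, its core loop can be sketched briefly. This is a hedged sketch of the pattern, not the real service: the alert fields, playbook names, and escalation hook are all assumptions.

```python
# Illustrative sketch of the Resolver pattern: known alerts map to
# playbooks, playbooks run, anything unhandled or failed escalates to a
# human. Names here are invented; only the shape matches the talk.
import subprocess

PLAYBOOK_FOR_ALERT = {
    "disk_usage_high": "cleanup_logs.yml",      # hypothetical playbook
    "glance_api_down": "restart_glance_api.yml",  # hypothetical playbook
}

def handle(alert):
    playbook = PLAYBOOK_FOR_ALERT.get(alert["check"])
    if playbook is None:
        return escalate(alert)  # no automation for this one: wake a human
    result = subprocess.run(
        ["ansible-playbook", playbook, "--limit", alert["host"]])
    if result.returncode != 0:
        escalate(alert)  # Resolver tried and failed: a person takes over

def escalate(alert):
    # Stand-in for the real ticketing/paging integration.
    print(f"paging on-call for {alert['check']} on {alert['host']}")
```

With this shape, the quoted claim follows naturally: any alert with a reliable playbook fix never reaches a person, and new fixes are added by registering one more playbook.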
And that's one of the things Galaxy helps with; this view is getting its information straight from Galaxy. It takes data from all those various sources and puts it in one place, and it gives us a place to store the metadata that we find important but that the other tools within Rackspace don't. And this is the view many of the operators live in. There's an instance-level view as well, but this is the host view. From the top down, at a glance, you can see the health of the host: when it last alerted, what the alert was for, whether it was resolved, when it's going to check next. It tells you the cell it's in, the region it's in, which Nagios node monitors the host, as well as a variety of other information, even which alerts are enabled. You can also see, on the right there, the host tasks. That's where you could manually trigger an action if you wanted to. Let's say a human needed to look at this and decided the host was unreliable and needed to be migrated off of. That section on the right where it says host tasks is where the admin could go and say: live-migrate, I don't trust this thing, disable it in the host DB and move on. And it would move all of the instances off of there for us. Ideally. Ideally. You mean it's not doing it right now? No, it is doing it right now. It just means that sometimes, if it's a suspect host, there's maybe something that would prevent it from working. Come on, Ranga. Oh, I got up here to heckle. It was a legitimate question. What?

So yeah, along with visualization, we capture all the aggregate data and the trends. We have all kinds of pretty graphs. You can go look at pie charts, bar graphs. We use a combination of Abacus and ELK (Elasticsearch, Logstash, Kibana), as well as our own homegrown tool called O3 Fleet Reports, which gives a single dashboard where people can view all these various services in one place. For example, we send OpenStack logs to ELK. If you see 500s from the API, you can easily filter by region, by cell, by tenant, and identify exactly where those 500s are coming from in the stack. And in ELK, you can see the entire fleet of hypervisors and drill down to which cells and hosts are running specific XenServer versions or have specific hardware profiles. Abacus has been very helpful for viewing capacity reports. Capacity is a challenge, especially when you're running cells, because eventually you may have a cell where you actually have no more physical room to grow, and it's important that you know when that happens. That's actually one of the things Resolver will do: if Resolver sees that a cell is getting too full, it goes and weights that cell down so that customers can no longer build there. And we want to leave a good gap. We want to leave empty hosts, because if we do need to live-migrate off a host, if we do have an issue and we're going to migrate instances off but don't have a host to migrate them to, then we're in a pickle.

So yeah, our next big goal is kind of a self-aware cloud: provisioning from nothing. We want the DC to be able to hook a cabinet up, turn it on, and have it automatically boot via DHCP into Terraform, which installs an OS on it, which then tells Resolver: hey, we've got a new, clean OS, bootstrap me.
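The capacity-headroom behavior described above is a small, self-contained rule. A hedged sketch, with the threshold and the weighting call invented for illustration:

```python
# Sketch of the capacity guard: when a cell's free capacity drops below a
# headroom threshold, weight the cell down so the scheduler stops placing
# new builds there. The 10% figure and weight_down hook are assumptions.
HEADROOM = 0.10  # keep free hosts so live migration always has a landing spot

def check_cell_capacity(cell):
    free_ratio = cell["free_ram_mb"] / cell["total_ram_mb"]
    if free_ratio < HEADROOM:
        weight_down(cell["name"])

def weight_down(cell_name):
    # Stand-in for adjusting the cell scheduler's weighting so this cell
    # sorts last and stops receiving customer builds.
    print(f"weighting down {cell_name}: below headroom, no new builds")
```

The reason for the gap is spelled out in the talk: the empty hosts are not waste, they are the landing zone that makes evacuating a bad host possible at all.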
And from there, it runs the bootstrapping playbook, sets up all the Nova compute nodes, sets up networking, gets everything going, and Resolver calls some post-provisioning automation, which does some basic tests on the host, a basic QE smoke-test kind of thing, to make sure the host is actually functional: you can build instances there, and the instances ping when they're done. Once monitoring is green and the smoke test is green, the host automatically gets enabled. So the cabinet gets rolled in and turned on, and a couple of hours later it's automatically in production. No one had to do any work that day; we did all the work a couple of months ago.

So, yeah, the point of all this is, as some of you may have seen at Darren's talk yesterday, there are the OSIC clusters, Rackspace Private Cloud and beyond; they're spinning up two large 100-node clusters for community use to kind of... What, 1,000-node? You said 100. Oh, yeah, that's a big difference. Yeah. And we want to use those as a testing ground to integrate a lot of these tools we've built, because we'll be setting up the cloud, but it's not really our cloud. So this will be the first trial, a dry run of running these tools on a cloud that isn't so Rackspace-specific, because eventually we'd like to release these tools to the community, and we have a ways to go, because there are so many Rackspace-specific things in there. This is going to help us identify what we need to rework, what we need to tweak, and what we need to leave out for it to be helpful to everyone else. It's probably not going to be part of the big OpenStack ecosystem, but we do want to give it back. It is very useful, OpenStack or not; I think it would be useful for a variety of folks' infrastructure needs. We don't know yet what work we need to do to get there, but that is the plan. We're engaging with other groups in these discussions, and we're going to figure out what we need to do. And we'll have an updated version of this talk in Austin, hopefully with some progress to report on that front. We'd be bad hosts if you came to our backyard in Texas and we didn't have some nice gifts for you when you showed up. So that's the idea. Cool. Well, thank you. Questions? Any questions? Come on.

I don't know that we have anything automated right now that would say we've had... I'm sorry. So the question is: if we've had multiple, say, host-down events, do we do tracking to correlate that with a set of firmware, software versions, hardware type, et cetera, so that if something was put into the environment that was disruptive to that particular configuration, do we have anything to automatically detect it? I don't know that we have anything built in right now that would say we had a spike in host-downs and it was this firmware version or this or that. That would be something we could definitely see if we looked at the trending on our graphs, because if we made a change and then we see a spike, we would start looking at: one of these things is not like the other. What's our common denominator here? Why did this happen? And we have absolutely done that in the past, where we've pushed something out and we have multiple different versions running. Just in the next-gen cloud, we have hypervisors from XenServer 6.0 to 6.2, and we're already evaluating 6.5 for new stuff as well.
So, I mean, that's getting worse before it gets better, until we get some of this stuff sorted out on the live-migrate and upgrade paths and all that. We have definitely had things happen in the past where we've had a large number of host-downs, and you do a Pareto on it, right? What's the 20 percent of causes behind 80 percent of these host-downs? And we've looked and said, oh, it's something in 6.1. What's different in 6.1 from 6.2? Oh, it's this driver; we need an update for this. Get our contact on the XenServer side on the phone, get our contact at our NIC manufacturer on the phone, whatever we have to do. So I don't know that that's automated yet. No, that part is still on the human side. We do have quite a bit automated on that front, though. That's part of the Auditor service, which validates the host against a certain set of rules. Some of those rules are: does this version of the hypervisor have all the latest patches? Has it been patched but not rebooted, or has it been patched and rebooted? Because often those can be very different things, depending on the kind of updates you're doing. So we're enforcing what we want a host to have, but we don't have anything automatically telling us when what we want it to have is wrong, I guess is the way of looking at it. And if a host does fail some of those rules, they'll talk to Resolver, and if Resolver can handle them, it will, things like that. Updating firmware through an automated process, live, in production, can be kind of iffy, so it's not quite doing anything like that yet. We would like to be alerted on that, which is not currently happening, but it's definitely something we're looking at.

Anything else? Right. Yeah, something like 60 to 80 percent of the alerts generated in our environment are handled by Resolver at this point. Thank you. Thank you. Whether we alert on way too much is debatable, though. Yeah. There is that. Yeah. Ultimately. What if Resolver is down? What if Resolver is down? We have a second Resolver, and all it does is watch Resolver. That's not true. So, you know, ultimately that would be real bad. So we have Resolver monitored. We have our normal monitoring infrastructure, and those alerts are still coming in to us, in places we can see outside of Resolver, outside of alerts; Resolver is just getting its data from elsewhere. We do have multiple Resolvers running, but if something were to happen, we actually have playbooks to spin up new Resolvers. So if it is down, it's not going to be down for very long. Yeah. And it'll pick up right where it left off. If, for instance, the auditing and monitoring tools went down, our admins and engineers would notice immediately, because that's where they live all day at work, looking at that stuff. If Resolver went down, they would also probably notice, because their queue of things to look at would be huge; Resolver wouldn't be fixing any of those things automatically, so that would probably sort itself out real quick. I don't know of any faster way to get a notification than doubling an admin's workload in a few minutes. They'll find someone fast. But we don't have anything automated, I think, to bring up a new Resolver. That's an interesting idea, though. If we did something like that, we could do it cross-regionally, potentially, right?
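An Auditor rule of the kind described, distinguishing "patched but not rebooted" from "patched and rebooted", is easy to sketch. The host fields and patch list below are invented for illustration; only the three-way outcome matches what's described above.

```python
# Illustrative audit rule in the spirit of the Auditor service described
# above. Field names (installed_patches, pending_reboot) are assumptions.
def audit_patch_state(host, required_patches):
    """Return (status, detail) for one host against the patch policy."""
    missing = set(required_patches) - set(host["installed_patches"])
    if missing:
        return ("fail", f"missing patches: {sorted(missing)}")
    if host["pending_reboot"]:
        # Patched on disk, but the running hypervisor predates the patch:
        # a very different state from patched-and-rebooted.
        return ("warn", "patched but not rebooted")
    return ("ok", "patched and rebooted")
```

A failing result would feed Resolver, per the flow above; a warn might just queue the host for a maintenance window.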
Like having a Resolver in, say, our Chicago implementation that monitors the Resolver in DFW, and vice versa. And if Resolver notices Resolver is down, it spins up new Resolver nodes. That's the kind of flexibility we have; that would actually be very simple for us. If we already have an Ansible playbook for something, and we can make a Nagios alert for it, we can have a Resolver trigger for it very quickly. It's basically just a wrapper around Ansible playbooks. It can do other things as well, but that's how we ultimately end up using it most of the time. Questions? Thanks for coming, everybody. Thank you.