Hi, my name is Brendan A. This is James Webb. We are the platform engineering team from T-Mobile. We got started around three years ago with Cloud Foundry. At this point, we've grown to around 25 people across both PAS for Cloud Foundry as well as PKS. We have a common team that handles monitoring and logging over the top, as well as a common team dedicated to customer success. They help teams onboard, make sure they run properly, follow best practices, things like that. So a pretty varied background. It's very hard to find people who already know Cloud Foundry or Kubernetes who are not actively employed. So we found a lot of people who are ex-UNIX admins, software developers, things like that. We really looked for the right mindset in those people and trained them up to fit the mold we're looking for. In terms of what we manage, we're really looking at Pivotal Cloud Foundry, or Pivotal Application Service at this point (they keep changing names on us), as well as Kubernetes. We are going to breeze through these slides pretty quickly; we assume there are going to be some questions and answers, so some of this content we'll just go through fairly fast. We also have some IaaS that we deal with for our customers. We have some folks who need to run services that are not part of the platform itself, so we BOSH-deploy some RabbitMQ, a lot of Concourse, jump boxes, things of that nature. In terms of where we were versus where we are now: it used to take us seven months and 72 steps to get new code to production. Think about how slow that is when our CEO or someone else in the business wants some feature added, some new promotion they're launching, and we have to scale up for major events like an iPhone launch or Black Friday. That is just a crippling speed to move at. So we've been able to turn that around. We can now deploy same day from dev to test to production. We have some of the folks in this room who are involved in that process who could speak to the benefit they've seen in their own teams. A big part of that, moving toward cloud-native microservices and optimizing these things for the scale they're running at, has led these apps to run a lot faster. On average, we've seen a 40% to 50% improvement in application response time, and a lot more reliable applications as well. They break 83% less frequently, and we can fix them much faster when they do break, due to their very narrow scope. So all in all: a lot more changes happening a lot more frequently, apps running faster, breaking less frequently, and being fixed a lot more quickly when they do break. Beyond that, we have some applications that used to use around 2,000 physical servers. Due to the consolidation we've done, they now run on fewer than 150 physical servers, along with a whole suite of other applications. So incredible density that we're able to achieve through this platform and through containerization. Throughout this whole process, we asked two questions of the broader community: How big is too big for a single foundation? And how many foundations is too many? Everyone has different answers. We've been pointed to case studies showing Diego and BBS scheduling 250,000 containers. And sure, that's great, but what about logging? What about monitoring? What about metrics? What about the Firehose? Those aren't really the concern there.
Other enterprises say, we don't worry about logging. Some enterprises say customers are on their own for monitoring. Really a lot of different options all across the board, but we couldn't get a good answer to these two questions. So we started to find the answer ourselves. Our largest foundation: 16,000 apps. These are not all running, which is why we only have 9,500 AIs (application instances). So 16,000 apps in the Cloud Controller, and maybe half are running. 365 orgs and 2,200 spaces, which is just insane to us. There are some services that page through all the spaces, and it takes forever. 3,200 users, only around 500 of those monthly active, so it's not 3,000 developers logging in every single day. 8,700 services and 43,700 bindings. That is just a lot of services. A lot of these are syslog services that almost every app uses. As you can see, we have 11,000 syslog drains, a lot of AppDynamics, a lot of Spring Cloud Services, including Config Server, Service Registry, Circuit Breaker. You name it, we've got it running. We've got a lot of stuff in this foundation. A recent update did not go all that well, which we'll get to in a minute. So at vertical scale, we found some limits. We found what does not work that well, and we'll talk about some of the problems we've actually seen that showed us this is too large. First, some general issues. And I apologize in advance; if you have sunglasses, you may want them to block out all the magenta. We've got a lot of that coming up. General issues: we find that we have some TCP buffers overflowing from our Diego cells that are doing round-trip traffic back to our foundation. Just some bizarre problems where we occasionally get a few percent packet loss that leads to some disconnected active sessions, some SSL handshake failures, things like that. Just some really strange stuff we see only in our large foundations that we don't see in our small ones, and we have not quite been able to pinpoint it. Firehose nozzles in our large foundations disconnect every minute or two; regardless of how we scale them, we cannot keep them running. We still use them, but we end up with somewhat sparse and not always very accurate metrics. (A sketch of the kind of reconnect loop this forces on us follows below.) And then beyond that, just some occasional errors where we'll get issues pushing applications, whether it's going to the blobstore, talking to the BBS, or something else, which makes it difficult for us to run dedicated transactions, like a smoke test for Spring Cloud Services or for the general PAS runtime. We have a hard time saying: is this an actual failure that's going to impact customers, or is it just something that will work without a problem if we retry? So it's made it difficult for us to troubleshoot some problems at times.
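A minimal sketch of the pattern just described: a consumer loop that reconnects with backoff whenever the nozzle drops. This is illustrative Go, not our actual nozzle; the connect function and envelope type are placeholders for whatever Firehose client library you use.

```go
package main

import (
	"errors"
	"log"
	"math/rand"
	"time"
)

// Envelope stands in for a Firehose event envelope.
type Envelope struct{ Payload string }

// connect is a placeholder for dialing the Firehose with a real
// client library; here it just simulates a stream that dies.
func connect() (<-chan Envelope, <-chan error) {
	msgs := make(chan Envelope)
	errs := make(chan error, 1)
	go func() {
		defer close(msgs)
		// Simulate a connection that drops after 60-120 seconds.
		deadline := time.After(time.Duration(60+rand.Intn(60)) * time.Second)
		for {
			select {
			case msgs <- Envelope{Payload: "metric"}:
				time.Sleep(100 * time.Millisecond)
			case <-deadline:
				errs <- errors.New("nozzle disconnected")
				return
			}
		}
	}()
	return msgs, errs
}

func main() {
	backoff := time.Second
	for {
		msgs, errs := connect()
	consume:
		for {
			select {
			case m, ok := <-msgs:
				if !ok {
					break consume
				}
				_ = m                 // forward to your metrics pipeline here
				backoff = time.Second // a healthy stream resets the backoff
			case err := <-errs:
				log.Printf("nozzle error: %v; reconnecting in %s", err, backoff)
				break consume
			}
		}
		time.Sleep(backoff)
		if backoff < 30*time.Second {
			backoff *= 2 // exponential backoff, capped
		}
	}
}
```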
One of the big ones we've seen: here's an example of an upgrade that's running. You can see it preparing the manifest, then it goes to compile packages, which it does not have to do. And then 45 minutes later, it finishes with no changes. It just sits there chewing through, deciding what's going to change, realizes nothing is going to change after 45 minutes, and moves on. Upgrades that take a while when nothing happens are not a huge problem on their own. But when something fails and we have to start the upgrade again and wait, or we push a wrong change and have to go through and change it again and wait through all that wait time, it gets pretty brutal to sit there while it chews through and then does almost nothing at all. So we don't like that much. We've got these slow upgrades where it can sometimes take 60 to 90 minutes for Ops Manager and BOSH to even start doing the work. Beyond that, with a new stemcell or a new major PAS version, it can take 18 to 24 hours to repave our whole platform. This is all running on-premise, not AWS, so we have underlying infrastructure that could be causing some of these issues, but it's still a very long time for upgrades. And then when we start adding in some of the on-demand services, Pivotal Cloud Cache or on-demand RabbitMQ, Redis, MySQL, it gets to one of those jobs and it may take a minute, it may take an hour, it may take ten hours to go through all those instances. It's really an unknown amount of time, and we can't control the order or the timing. So we're not crazy about that. Another big problem is blast radius. We have more or less five or six production foundations. However, we started out with one of these before we scaled out to more, and teams still like that foundation. They know it by name, they love it, they want to run all their stuff there. As a result, it handles 70% of our traffic. We had an issue about nine months ago that led to a 15-minute outage where no traffic was getting to that foundation. The impact was so large that our CEO was aware of it and wanted to make sure it did not happen again. That shows you the outsized importance of this one single foundation, a concentration we just cannot allow. The Cloud Foundry API: as I mentioned, we've got 16,000 apps and 43,000 service bindings, and that takes a long time to page through. When logs or metrics come out of the platform, they are associated with a GUID, not with the app name or the org or space name. A GUID is not very useful to most people, so we have to correlate this information. To correlate it, we have to look it all up through the Cloud Foundry API, and we have to keep that data fresh; refreshing it every few minutes with that many records is pretty significant. Trying things like the Prometheus CF exporter, we actually hit a limit with open file handles, because it was doing a massive, unconstrained parallel lookup of all these objects. So we had to submit an issue to them for a sized wait group, so that it does not spin up three or four thousand threads, or goroutines rather, all at once. (A sketch of that bounded-lookup pattern follows below.) On latency: we occasionally have issues where, if we're doing cache refreshes for some of our downstream systems, we might see API latency of three to six seconds. As an end user, when you do a cf login and it goes to pull up your list of orgs and takes six seconds, that sucks. It just feels slow. Same for the spaces: more waiting, then looking up your services, your apps. All these things take time, and it creates a very bad customer perception.
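A minimal sketch, assuming nothing about the real cf_exporter internals, of what a sized wait group looks like in Go: a buffered channel acts as a semaphore so thousands of lookups run with only a fixed number of goroutines in flight.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// fetchApp is a stand-in for one Cloud Foundry API lookup.
func fetchApp(guid string) string {
	time.Sleep(50 * time.Millisecond) // simulate API latency
	return "app-for-" + guid
}

func main() {
	guids := make([]string, 4000)
	for i := range guids {
		guids[i] = fmt.Sprintf("guid-%04d", i)
	}

	const maxInFlight = 50 // bound, instead of one goroutine per object
	sem := make(chan struct{}, maxInFlight)
	var wg sync.WaitGroup

	results := make([]string, len(guids))
	for i, g := range guids {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot; blocks when 50 are running
		go func(i int, g string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			results[i] = fetchApp(g)
		}(i, g)
	}
	wg.Wait()
	fmt.Printf("looked up %d apps with at most %d concurrent requests\n",
		len(results), maxInFlight)
}
```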
Logging is one of our favorite and least favorite things to talk about. As I mentioned, we have 11,000 syslog drains. For production, we produce around 4.5 billion log messages per day. We lose around five to ten percent of those at a sustained rate that we have not been able to improve upon yet. And then we have some customers, one in particular, that logs out 300 million messages in about 10 or 15 minutes every night for some big batch job. That's basically one instance logging out all those messages, and we drop 70 or 80 percent of those logs. There's not a whole lot we can do about that. We try to work with these customers and try to improve things, but in general we've scaled Loggregator as large as the Loggregator team says we should scale it: 40 Dopplers, 20 Traffic Controllers, 40 syslog adapters. And it still is not big enough, and there's not much you can do about it at this point. So there's not much we can do about this vertical-scale problem beyond saying: you're going to lose some percentage of logs, so log less and use Loggregator less, and then it will probably be better overall. So, a lot of problems with vertical scaling. The best solution to vertical-scaling challenges is horizontal scaling: many smaller foundations instead of a few very large ones. But a set of challenges comes with that, so James will talk through some of those issues. I want to add to what Brendan just said. It's a varied experience through the day for our large vertical foundations, and we're not happy about it, right? We know it's not the right thing to do, and we're working really hard to fix it. We have a persistent sense of dread about making changes and how things impact our customers. There are a couple of key contributors right down here who are going to help us fix that. Horizontal scale brings its own set of problems, right? We're hoping we're going to make life better for our customers, but we're going to take on a lot of complexity ourselves. One of the main things is: how do you do automation across so many foundations? There are a lot of objects. PCF Automation is complex; it's fantastic, but it takes a lot of automation just to get PCF Automation working. So one of the things we're looking at is how we automate our automation. The first thing we do is deploy PCF Automation with BOSH and Concourse, which is its own set of things and tooling to maintain. We need to make sure that configs are standard across our foundations, and that our config management is solid. If it isn't, once we start automating things and pushing them to production, we can have a severe impact, and this has happened before. So we need to make sure we're doing the right thing with config management. We need everything to be CI/CD. We have to dogfood what we tell our customers to do, which is: everything you do, do with automation, with continuous integration and continuous delivery. We need to do the same thing. Horizontal scale requires a lot of tooling. We have a lot of sidecar infrastructure to maintain, and again, that's just a lot of complexity to manage. We have 15 sidecar BOSH instances, and that number is growing daily. We had centralized Concourse and centralized BOSH, and what we found is we hit the same problems with scale there that we were hitting in our foundations. So now pretty much every hardware region gets its own Concourse and its own BOSH sidecar, and every foundation is going to get its own Concourse, right? So again, we run into a management problem of how we automate all that. We have a lot of other tools as well. We love BOSH, we love Concourse. (Like, the Concourse is on fire. That's it telling us now. Yeah.) We have to maintain current state for all this automation tooling. We have to keep it patched, upgraded, monitored, backed up. It's a lot of complexity.
So essentially we're taking the complexity of managing Ops Manager by clicking around the GUI and pushing it one level up. And it's hard. So those are the issues on our side. Now let's talk about the issues that we're giving to our customers by scaling out foundations. Teams running their applications in multiple foundations is very problematic. We don't have a good geo load balancing solution yet. We have something we've worked on, and we're rolling it out in its early stages, but for a customer to move from one foundation to another, they've got to notify all their upstream consumers of the change. In some cases that might be 10,000 care reps who have a bookmark linked to our primary foundation. So it's not as easy as telling the customer: just move, and when are you going to be done, next week? They can't do it. I'll talk about the GSLB. We've tried to solve this problem with a GSLB. We've created a service broker, built on the Open Service Broker API, where you can register which apps you want to be geo load balanced. You create the service, bind the service to your app, and then when you create a geo route and bind it to that app, as long as the health check passes, traffic starts routing to that application. At that point it's DNS load balancing, and it seems to be working pretty well. We're hoping this takes that problem off our plate. We're also really interested in working with Pivotal to figure out how service mesh can solve these problems, so we can eliminate our own solutions. And that's pretty much all we've got. We figured we'd be looking for questions and feedback from folks in the room, because this is a pretty interesting topic to get feedback on live. We've chatted with a couple of other customers. We had a customer advisory session on Monday, along with the PCF user group, and found that we're definitely not unique in these issues. Our friends at JPMC have six times our foundations and are managing horizontal scale in entirely different ways than we are. Some other customers have a similar vertical scale and have issues similar to ours. So we wanted to open it up for questions and answers here. We also set up a channel on the Cloud Foundry Slack, running-at-scale, to hopefully continue this conversation beyond the conference, with some of the large customers joining up and talking through the issues we've been experiencing, so we can find some common solutions, or at least patterns to mitigate them. We ran way too short. I'm going to get you the mic. For the global load balancer service, is that on a particular global load balancer? And what is it doing? Yeah. So we have F5 GTM set up, with a separate DNS zone delegated to it. When customers do a cf create-service against the GSLB, it goes out there and configures a wide IP on that F5 GTM. So when a customer needs to get to myapp.go.t-mobile.com, that lookup goes to the GTM. The GTM says: I know this app is configured in these four PCF foundations; it figures out which of those are healthy and which one it can route traffic to, and returns one of those IP addresses for the DNS lookup. That's effectively how DNS load balancing works. It's just the automation we put in place in front of that, along with the OSB, that's made it work. (A sketch of that healthy-endpoint selection appears below.)
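A minimal sketch of the selection logic just described, with made-up foundation names and IPs: given the set of foundations an app is registered in, answer the lookup with the address of one that is passing its health check. A real GTM wide IP does much more; this only illustrates the idea.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand"
)

// Foundation pairs a PCF foundation's ingress IP with its
// latest health-check result (hypothetical data model).
type Foundation struct {
	Name    string
	IP      string
	Healthy bool
}

// resolve answers a DNS-style lookup: pick one healthy
// foundation at random from wherever the app is registered.
func resolve(app string, registry map[string][]Foundation) (string, error) {
	var healthy []Foundation
	for _, f := range registry[app] {
		if f.Healthy {
			healthy = append(healthy, f)
		}
	}
	if len(healthy) == 0 {
		return "", errors.New("no healthy foundation for " + app)
	}
	return healthy[rand.Intn(len(healthy))].IP, nil
}

func main() {
	registry := map[string][]Foundation{
		"myapp": {
			{Name: "prod-east-1", IP: "10.1.0.10", Healthy: true},
			{Name: "prod-east-2", IP: "10.2.0.10", Healthy: false},
			{Name: "prod-west-1", IP: "10.3.0.10", Healthy: true},
		},
	}
	ip, err := resolve("myapp", registry)
	if err != nil {
		panic(err)
	}
	fmt.Println("answering DNS lookup for myapp with", ip)
}
```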
Right back there. So you were talking about the different ways companies are handling instances at scale, for example going to multiple foundations. That's something we've done to try to address it, and where we've struggled is getting people to actually move to those foundations; I think you described that very well. How are you handling that? As you onboard new applications, are you having to think: OK, you're a small fish, so you go in this cluster, you're a bigger fish, you go in this other cluster? How are you addressing that problem? Sure. For production, we have cf-mgmt set up, which Pivotal put together. Any time a customer wants to be onboarded to a production foundation, they get the same orgs, same spaces, and same quota across all production foundations. So from the very start, they can push their app to all those places at once (a sketch of that fan-out appears below). We have dashboards we expose to customers that show current utilization and the available free space in the foundations themselves, so they can make their own decisions: what version is this at, how much space is available, how much am I deploying, and where should that go. We tell all of our customers: you should deploy to multiple foundations. We have two regions of hardware in our primary DC, four foundations across that. And we say: use at least two of these, run three instances of your app in each of those foundations, and distribute load across all of them at the same time. That way you can tolerate failure of an AZ, of an entire region, of a whole PCF foundation, without your application being impacted. Not all customers do the right thing. So we're working with some of our teams now, directly saying: we've noticed your application is not running in more than one place. Are there barriers keeping you from doing this? When can you make it happen? And we make sure that our leadership and their leadership know the state they're currently in, and that if we have issues, it will likely impact their application. When we add more foundations, we're going to have to figure out the right approach, right? If we have 15 foundations, do we really want everyone peanut-buttered across all 15? Or do we start putting together groups of foundations and say: you deploy to foundations A, B, C? So we're figuring out what that transition looks like, and we have some limitations because of hardware retirement and other lifecycle events that are coming, where we hope we're going to be able to do it right, but it's going to be very challenging.
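A minimal sketch of that fan-out, shelling out to the cf CLI with hypothetical API endpoints and org/space names; a real pipeline would handle credentials and failures far more carefully.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// foundations lists the API endpoints a team deploys to
// (hypothetical URLs for illustration).
var foundations = []string{
	"https://api.sys.prod-east-1.example.com",
	"https://api.sys.prod-east-2.example.com",
	"https://api.sys.prod-west-1.example.com",
}

// run executes a cf CLI command and streams its output.
func run(args ...string) error {
	cmd := exec.Command("cf", args...)
	cmd.Stdout, cmd.Stderr = os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	for _, api := range foundations {
		fmt.Println("deploying to", api)
		// The same org/space names exist everywhere, courtesy of cf-mgmt.
		if err := run("login", "-a", api, "-o", "my-org", "-s", "prod",
			"--sso"); err != nil {
			fmt.Println("login failed, skipping:", err)
			continue
		}
		// Three instances per foundation, per the guidance above.
		if err := run("push", "myapp", "-i", "3"); err != nil {
			fmt.Println("push failed on", api, ":", err)
		}
	}
}
```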
A couple right there. You mentioned CI/CD for your Ops Manager configuration; could you elaborate a little on that? And you also mentioned automating your PCF automation, which is very curious. What did you do to achieve that? Thanks. Sure. Do you want to talk about that? It's a work in progress right now. Right now we manually put in a Concourse and the automation on top of that, but we have a sub-team looking at how we automatically deploy Concourse, and then how we parameterize everything so that we can set up PCF Automation to then deploy a foundation. In terms of config management for a particular foundation, it's all just kind of a GitOps model, where we update repos and then run the automation. We're very diligent now about making sure we don't push bad things, and don't change too much with a given push. So we do some configuration drift analysis between what it was and what we're going to, and we're very careful about what we push, gaining confidence toward making it much more push-button automated. We'd like to get to the point where something goes to PivNet, it goes to our staging foundations, runs for a couple of days, and then our automation starts rolling it through all the foundations. We're not there yet. We know we need to automate the automation; it's not solved yet. We're hoping the community sees the same need and starts working on those problems as well. Great presentation, by the way, thank you. I've seen other places, especially with more inexperienced development teams: if they have a problem, the first place to put the blame is the platform. That's where the problem is going to be, and it's on the platform team to prove them wrong. That works kind of okay in smaller environments, but I would assume that at the scale you're running, it doesn't work anymore. So how do you handle that kind of communication, especially when things are not perfect, between the product you provide and the consumers you have? I wish I could say our team is unique and that never happens to us, but we're just like everyone else. It used to be you blamed the network team, then the VMware team, and then it became the platform team instead. To a large extent, we try to communicate the status of our platforms as best we can. We run synthetic transactions, smoke tests, against our services and against the foundations themselves. So when somebody says, app pushes aren't working, please fix it, we can say: well, your app pushes aren't working; ours are. What's different between them? And for the most part, we've gotten our customers to understand that if there's a platform problem, we'll probably know about it before they do, and we'll be raising flags and letting everyone know as we ring the alarms. If they're seeing an issue, it's more than likely on their side. We'll help where we can, but our job is not to debug their application. We'll investigate things to a certain extent, but we're a small team, and we can stay a small team by not fixing every customer problem. So we'll help customers as best we can, but if they're not willing to help themselves, then we'll have a conversation with their management: we've seen four times in the last two weeks that your team hasn't done basic troubleshooting, like looking at your own app logs or trying to investigate your own problems. And we level-set with them: this is not the kind of environment where you can depend entirely on us to fix your stuff. It's a shared partnership. We want to help you, but you have to help yourselves as well. And we try to be as transparent as possible. I feel like we've earned the trust of our user base, and that helps a lot, right? We go out of our way to make sure that if there's a platform problem, we're completely transparent about it, and when there's not and users have issues, we share those issues and try to keep a blame-free environment and just move forward that way. There's also great leadership support, right? The leaders believe in the platform and trust that we're trying to do the right things.
And when we communicate up that something wasn't a platform issue, right, we have a lot of credibility with them. Awesome, thank you. Great talk. Thank you. A question, actually: do you manage your own infrastructure? That's the first question. And the second question: you said your ops team is not that large; can you give an idea of how large that is? Yeah, we do not have to manage our own infrastructure. We have infrastructure dedicated to us. We work with our VMware team to architect what that looks like from a compute and a storage perspective, and then they deploy vSphere for us, dedicated to our team. But everything above the IaaS layer is our concern; everything vSphere and below is theirs. And so we say: we want to be able to provision a VM and know it will be done, and not have to worry about any of the underlying stuff. That makes it much easier. However, we occasionally find problems at the infrastructure or vSphere level where we can say, hey, can you take a look at this? It doesn't look quite right. And they'll give us feedback on whether that's true or not. In terms of the size of our ops team, we have both a core PKS team and a core PAS team, each around six or seven people, plus a shared team of around four or five people that deals with logging and monitoring across both platforms. Then we have a customer team, I think four or five people, for when a customer says: I'm having issues, I can't get this to work, or hey, I'm new to your platform, how do I make this work? They sit with them, explain things, and do training or onboarding if needed. They're the ones who make sure customers are successful, and that if customers are going to have issues, we know about them long before they've grumbled to their management and said, this stuff doesn't work at all, I don't want to use it. So about 25 with leadership: probably 20 tech people, then three managers, a senior manager, and a product owner. Yes, it does. Yeah, Kubernetes with PKS, also running on BOSH. The number of logs you reported seems very large. Have you at least tried to determine whether some of these logs are duplicate or unnecessary logs, something to try to tame it down, and you're still at that number? Yeah, I'd say of the 4.5 billion per day, probably 95% are not necessary logs. We work as best we can with our teams to cut those numbers down. For example, we'll see teams that don't escape line breaks in a JSON body they log out, so in Splunk we'll see all the log metadata with just a single curly brace, or maybe just a blank line. We had a team recently where we found they were escaping, with a backslash, almost every single quote in a JSON body, over 1,000 of them per message, so 1,000 bytes just for backslashes in every message they put out. We have teams that do transaction reconciliation, where they say: I need to be able to find every transaction I've done in Splunk. And we say: you're losing 5% or 10% of those, so it's not accurate. And they say: well, we still have to do it, the business tells us to. We also have teams that have debug or trace logging turned on in production; we find those and tell them to turn it off. As much as we can, we try to identify these things. We have reports in Splunk now that show our top producers, and we're going to use that as a wall of shame to say: here's who's logging. If you're sending out 300 million logs in 15 minutes, it's probably not that useful. (A sketch of the kind of log hygiene we push for follows below.)
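A minimal sketch of that hygiene: let a JSON encoder produce one line per event instead of hand-building strings, so line breaks are escaped properly and quotes aren't double-escaped into kilobytes of backslashes. The field names here are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// event is one structured log line (hypothetical fields).
type event struct {
	Time    time.Time `json:"time"`
	Level   string    `json:"level"`
	Message string    `json:"message"`
	Body    string    `json:"body,omitempty"`
}

func main() {
	enc := json.NewEncoder(os.Stdout) // one JSON object per line

	// A payload containing quotes and a line break. Hand-built log
	// strings tend to either leave the newline raw (Splunk then sees
	// two half-events) or over-escape every quote.
	payload := "response: {\"status\": \"ok\",\n\"items\": 3}"

	if err := enc.Encode(event{
		Time:    time.Now().UTC(),
		Level:   "info",
		Message: "downstream call finished",
		Body:    payload, // encoder escapes \n and quotes exactly once
	}); err != nil {
		fmt.Fprintln(os.Stderr, "log encode failed:", err)
	}
}
```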
We have some guys here that might be on that wall at some point, so we'll see what happens and whether it works or not. It's worked well for overall usage and memory consumption on the platform, so hopefully it does something similar for logging, though we know that's a bit more of a change for the application, potentially, if they have to change the way the application logs at a fundamental level. We've got a few minutes left, and I want to make sure I plug the chaos engineering talk we have at 4:35 p.m. We've got some cool stuff cooking there. It's a demo-heavy session where you'll be able to see all about how it works: the extensions we made to the Chaos Toolkit, and Turbulence, that let us do some pretty cool stuff at the PAS level. Any other final questions? Quick one. Are you all leveraging selective deploys in Ops Manager, and have you built any automation around that? We are not yet. We just got to PCF 2.3; we had been on 2.0 in the majority of our foundations, so we're kind of blitzing through from 2.0 to 2.3.8. Going forward, we do plan on using selective deploys heavily, because that's a big part of what makes our deploys go so slowly. That 45 minutes of doing nothing? We could skip it entirely and move straight to the stuff we care about. Yeah, one more question up here. Yeah, I can make it. Sure. How do we manage your... oh, she wants it on the recording, probably. How do you manage your release of features as new versions are rolled out? Is there any communication channel or something? And how do you certify features that can be used by the community? Yeah, there are definitely some features we don't want people to use. An example would be volume services, where we think a proliferation of NFS mounts is going to be bad for us to manage overall. So we look at features like that and don't advertise them. For any given new major or minor PAS version, we'll look at the features listed, see what we think is most useful to our customers or addresses the biggest pain points people have had, and we'll communicate that to our Slack channel maybe six weeks before we ever do upgrades: we have these upgrades planned, here's the new stuff they bring you. And as soon as we have a foundation ready that can take customers, we'll say: hey, we've upgraded to this version over here, you can start playing around with it. Again, here are the key features we think will be interesting to you. For the most part, we are pretty selective about the things we enable, like container-to-container networking. We only recently enabled that, even though we've had it for a few versions, because we wanted to make sure we could tolerate it well: that it's not going to impact our Cloud Controller database, for example, and that we have the right overlay network setup so we're not going to run out of virtual IP addresses, which we did previously. So yeah, a bit of a slow rollout overall, but we try to make sure we release the things we want to release, and if customers are asking for things we're not going to release, we're pretty emphatic that this is not something we're going to support: it will never be turned on until we see some changes made to it. So, please stop asking. Can you talk more about your process for deploying upgrades to the platform itself?
Not necessarily on the communication side, but do you soak test? You mentioned 70% of the traffic is in one foundation, so where do you stage that, and do you stage it multiple times? That one comes last. Yeah, I mean, do you follow the sun? We have a global deployment as well; we'll start in one region, follow the sun over half a day, and hope that somebody from the first region screams early enough. I don't know, what do you do? Yeah, so we have a number of staging foundations, pretty much one for every hardware region we have. We'll usually deploy there first and leave it running, usually for a few weeks or so. We'll play around with stuff, leave our smoke tests running; it just kind of sits around. That's usually for minor versions; with patch versions we're a lot quicker. Potentially. Oh, okay. Yeah, it's not always that long; sometimes it'll be days. What's that? Oh no, a critical CVE is generally going to be a patch release anyway, so we can push that out pretty quickly. A stemcell update we're not going to let soak for weeks; it might be one or two days at most, depending on the severity of the CVE. Just curious, are the users using the staging foundations? No, they are not. Okay, so it's an internal staging one? Exactly, internal staging; it's just for us. Anything we push to production goes through staging first. Once we think things look pretty okay there, we'll push it out to the non-production foundations, which are the ones users are using for their own non-production work. Okay, and do you do production simultaneously, or do you roll that one too? We'll normally do one or two non-production foundations at a time and then roll through. After those are totally done, we'll start the production foundations, and depending on regions, we might do one of the two foundations in a given region at the same time, across several regions, just to get things moving more quickly, because if we did them all in serial it would take a very long time. So we're still working on improving that. Right now it still takes way too long to push these changes out, so we're figuring out which level of changes we can push at the same time and which foundations are the most critical. And for that one with 70% of all our traffic, we're actually deciding to end-of-life that foundation: it's going away in quarter three of this year, here are your new foundations, please move to those, because we're not going to support this one after that period of time. It's just not safe anymore for us to upgrade it. We had some non-production upgrades with that one largest foundation that made us a little too gun-shy about touching the other big one. Thank you. And when we get more horizontal, this is going to change. Hopefully everything just becomes automated and we become less fearful of it; there will be less load in any one given foundation, so those upgrades should not only go more smoothly but have less blast radius if something does go wrong, right? And hopefully folks use our GSLB tool, where there should be almost no impact. But our current upgrade process is kind of all hands on deck, and somebody's pretty much up 24/7 watching Slack and watching a BOSH task somewhere, right?
It's pretty brutal right now, just because we're trying to move as rapidly as we can to get caught up to N minus one; then our lives should get a lot better, right? Once-a-quarter patching instead of where we are now. Yeah, and as we move to more, smaller foundations, the impact of any one of them going down is much less, so we can be more tolerant of risk in a given foundation. Well, thank you all, appreciate your time. Thank you.