All right, so this is a user story: your data center will implode in 30 days. It is based on a true story, but the names and data center locations have been changed to protect you.

We are Stark & Wayne. We are a founding silver member of the Cloud Foundry Foundation, and we've been coming to these summits for a very long time; I think we've been to every one of them since the inception. As I like to say, we're the leading Cloud Foundry and cloud native technology consultancy. I'm Brian Segwin, the COO of Stark & Wayne, and with me is Bill Chapman, our VP of engineering.

We wanted to have some fun with this because it's kind of a fun story, so we've orchestrated a little play. I'll be playing the VP in charge of the cloud platform at a multi-billion-dollar company: a platform with 300 business-critical applications on it, 2,000 application instances, and a whole bunch of data services.

As this VP, I had thought my director of engineering was taking care of the migration effort, because we knew it was a long time coming: our data center contract was approaching its end. I thought everything was being handled. Then, a little more than 30 days out from the migration, I approached him and said: all right, what's the plan? What's the progress? I need a report. And he says to me: I don't have anything, and by the way, here's my notice.

So at this point I'm sitting here thinking: my director of engineering has left, and I really only have a few options. I could quit. I could beg our data center people for an extension, but apparently they had already sold the space and were enforcing the end of the contract. Or I could find someone dumb enough to think they can actually do this all in 30 days. So I hire a new director of engineering. Please welcome Bill Chapman, our new director of engineering, who says he thinks he can do this.

We find our fearful C-level executive pacing in their office. They've just gotten the bad news.

Bill, what are our options here?

Well, I think our primary focus should be standing up a new platform in the new data center, and then the application developers will each have a maintenance window where they deploy their applications into the new environment.

We have 300 applications here, there are 300 application teams, and about three times that many people. Do you think you can orchestrate all of those teams to get in sync and do all of this in that small a window?

Maybe we should go to the whiteboard.

So what you're saying is: we've got our Cloud Foundry and its individual platforms, we've got our data services, and we've got 300 applications with 2,000 application instances.

One thing we haven't mentioned: we've got six brokered service teams. Those are a separate issue, and they need to be migrated as well. And we have various user-defined services that aren't on the board. It might make more sense to deal with the six brokered service teams, and then treat the platform itself as a data service and migrate it the same way.

Well, I like that idea: six teams versus 300.
That's a lot easier. So how would you do the migration? How would this work?

Let me talk to the team about that. We have some options.

They've all slept on it. Calls have been made, teams have been alerted. No one is happy. No one is happy. But plans are in action.

Hey, boss. Good news. I think we have a plan we can work with. So: we've got a staging platform, a pre-production platform, and a production platform, plus a data services tenant in Poughkeepsie. We need to migrate all of the databases, and we need to migrate all of the blobs. These blob stores could be large, and we don't know how fast the pipe is going to be between the data centers; the second data center is in Anchorage, Alaska. So we may need to fly it there.

Fly it there? I mean, I guess I could talk to the CEO about the corporate jet, but I don't think he's going to go for it. There's a lot of risk associated with that. It's not likely.

We need to do some investigation. I think the pipe will be fine; we should be able to sync everything over. I'm just keeping you aware.

Okay. So with this plan, is there going to be any freeze on development? What's our downtime going to look like?

Well, most of the risk falls on the service teams: our database-backed services, our MySQL team, our RabbitMQ team. Those teams are going to need to migrate their data services individually, and that will have to be coordinated across all six of them. For the platform itself, we won't have to worry too much about freezing development, but we will have to freeze deployment: the developers will not be able to deploy to the platform while it's in active migration. The data service teams, on the other hand, are a different story. They may have extended periods with outages that affect some of the applications. We're hoping to mitigate that. What might work best is if we deploy those data services to a separate location and proxy to that location; the apps won't know the difference, and there will just be a small cutover. But we're not sure yet. We're still on day one here.

Well, what's our data-loss risk?

Data-loss risk is minimal. Remember: data migration is really just disaster recovery that you get to plan.

So what happens if it doesn't work? Am I asking the CEO for the corporate jet? Are we playing Ice Road Truckers and trying to drive it to Alaska?

Well, once the new platform is deployed, we can always fall back to what I said on day one: the application developers can individually re-push their applications, and everything will converge on the new state.

Okay. So how many people do you actually need to get this done?

I would say we need at least one member of each data service team, someone who knows that data service, and they can plan out their migrations individually. Since I run the platform team, I will focus on that, and I think I'm going to need two engineers. Thirty days is going to be cutting it kind of close, but they're going to need to stay focused.

How much time is this data going to take to sync? Do we even have enough time to sync the data within our 30-day window?

If we use the jet? But seriously, no, I think the blob store is our biggest concern. Those are the bits of source code and compiled artifacts sitting up there, and those can be very large; the last time I had to do a sync, it took about 10 hours. But we'll sync ahead of time, so by the time we get to the actual migration date we're only syncing the difference: whatever has changed since the last sync. If we sync daily, that's only 24 hours' worth of change. Couple that with the deployment freeze, so the developers can't actually push anything different in that last 24 hours, and there might be no difference at all. We'll already know the platform is ready when we do the final switchover.
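To make that sync-ahead-then-delta idea concrete, here is a minimal sketch of the logic, assuming both blob stores are reachable as local filesystem paths. The paths are hypothetical, and a real migration would more likely lean on rsync or the blobstore's own replication; the delta logic is the same either way:

```python
# Sketch of "sync ahead, then only ship the delta" for a blob store.
import hashlib
import shutil
from pathlib import Path

def digest(path: Path) -> str:
    """Content hash, so unchanged blobs are skipped on every pass after the first."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def sync_delta(src: Path, dst: Path) -> int:
    """Copy only blobs that are new or have changed since the last pass."""
    copied = 0
    for blob in src.rglob("*"):
        if not blob.is_file():
            continue
        target = dst / blob.relative_to(src)
        if target.exists() and digest(target) == digest(blob):
            continue  # already in sync: the common case once daily passes start
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(blob, target)
        copied += 1
    return copied

# The first pass ships everything (the ~10-hour sync). Each daily pass after
# that ships at most 24 hours of change, and the deployment freeze can shrink
# the final pass to nothing.
if __name__ == "__main__":
    changed = sync_delta(Path("/var/vcap/store/blobstore"),
                         Path("/mnt/new-dc/blobstore"))
    print(f"{changed} blobs copied this pass")
```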
It's been a rough two weeks. Some things went well; some things went very poorly. Jane, the platform team lead, came in Friday and said, don't forget, I'm on vacation next week.

So: we've managed to migrate staging, and the data services needed for the applications in staging were handled by the individual data service teams. Pre-prod and production are still not provisioned, but we have automation ready to run once they're available.

Well, I spoke to Tom. Why did it take you a week to get all the networking you needed for the environment?

It turns out the networking team was complaining that there isn't enough IP space available for what we're asking for in the new platform. And remember, since we're doing a direct migration of the platform, we have to have the same IP space available in the new data center.

This is a common problem we have with networking: we always want too many IP addresses.

It's a large platform.

All right. What other blockers do we need to worry about?

At this point, since we've done the migration to staging, we've proven out the idea and we have that test in place. The plan seems to work as expected, but we're really pushing it for time. Also, two of the data service teams were not yet able to migrate.

Okay, so it took you two weeks to get what I'm looking at: one environment. By my math, it's going to take two weeks each for the next two environments. That's four weeks out, and we only have two weeks left.

I understand, but the results in staging have been encouraging, and the reason it took so long is that we did a large up-front proof of concept.

I thought you said this was going to work. Why do you need a proof of concept if you've done this before?

I might not have done this before. But I promise you, it works now. Also, by the way, the Stark & Wayne team has a really cool product called SHIELD that helped us out a lot along the way.

Anyway, the migration is complete. Things have gone pretty well; some developers are grumpy, but developers are always grumpy. The team is reflecting on the month.

So although this was successful, I don't want to seem ungrateful to our director of engineering, who actually pulled this off. Bill, can you confirm that we did not lose any data?

Well, the data services teams have informed me that their migrations were successful, and I'm going to trust them. Honestly, it really is just an expansion of the application migration paradigm: when you push an application, you move the data, move the application, and cut over DNS. This is the exact same thing; we've just done it for 300 applications. I'm fairly confident that it went pretty well.
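That per-application pattern (move the data, move the application, cut over DNS) might look something like the skeleton below. Every helper here is a hypothetical stand-in for real tooling (a backup/restore tool such as SHIELD for the data, the cf CLI for the push, a DNS provider's API for the cutover); the only point is the ordering of the three steps:

```python
# Skeleton of the per-app migration loop: data first, then the app, then DNS.
from dataclasses import dataclass

@dataclass
class App:
    name: str
    route: str           # e.g. "orders.example.com" (hypothetical)
    services: list[str]  # bound service instances that must move first

def restore_service_data(service: str) -> None:
    # Stand-in for backing up a bound service and restoring it in the new DC.
    print(f"restoring {service} in the new foundation")

def repush(app: App) -> None:
    # Stand-in for `cf push` against the new foundation; the app converges
    # on the state recorded in its manifest.
    print(f"pushing {app.name} to the new platform")

def cut_over_dns(route: str) -> None:
    # Stand-in for repointing the route at the new foundation's load balancer.
    # Endpoints stay identical, so clients never notice the move.
    print(f"repointing {route}")

def migrate(app: App) -> None:
    for service in app.services:
        restore_service_data(service)
    repush(app)
    cut_over_dns(app.route)

if __name__ == "__main__":
    migrate(App("orders", "orders.example.com", ["orders-db", "orders-mq"]))
```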
I have talked to the leads of every data service team, and they have confirmed that each data service is intact and running as expected. We did have some grumpiness from the developers when they couldn't deploy, but they were all happy to find that their data was where it needed to be when they were able to get back to work.

So when we spoke a couple of weeks ago, you said we might have only minutes of actual application downtime and a 10-hour window of development downtime where they couldn't push their applications. What downtime did we actually incur?

We had a 24-hour deployment freeze for our application developers. So we had 300 developers who were not able to do their weekly deployments, and some of them missed deadlines; we didn't do a very good job of communicating the freeze to them. But with respect to actual downtime, it was measured in minutes. The data services were migrated ahead of time, things were proxied accordingly, and when the platform came up it converged on the proper state. There were only about five applications that weren't running as expected, and one of them is still being troubleshot. It turns out it wasn't running as expected before the migration either.

So you're the scapegoat.

Yes.

Okay. In all honesty, I'm looking at the team's projects, and I still see migration efforts going on. Everything's been up and running for a few days now, so why are people still working on this?

There are still a lot of questions about the migration. There are still developers who had problems before the migration that weren't noticed, and those problems followed us to the new platform. When we did our due diligence up front, we knew what state every application was in, but it turned out some of the developers didn't know what state their own applications were in. So we're still fighting through that. You've told me our mission statement is to be helpful, so that's what we're doing.

Okay. So how much longer will they need to be on this project?

It's mostly done. We're going to need somebody partially engaged on PR for the platform team, so that people understand what happened and why this was a tremendous effort and success on our end. But we're also going to want to help the developers understand that there really aren't any differences in the new platform. If it weren't for the 24-hour freeze on deploying, most of them would have been unaware that we moved the platform 6,000 miles away.

Awesome. Well, good job, and thank you for your efforts. As a token of our appreciation, please accept this pen.

This is my pen.

Thank you.

Are there any questions?

So, you know, we took some comedic liberties with what actually happened, but this really is a migration effort that Stark & Wayne was involved in last year, and it was crazy. We have a customer who comes to us and says, we only have 30 days, or this data center is going to delete all our data. And we said, well, can't you talk to the data center people? Can't you do this? Can't you do that?
And they said, we've had all of those conversations, and the answer is no. Do you think we can do it? And the funny thing is, they were going off of a timeline and quotes that we had given them three to six months earlier. So they come knocking on our door and say, hey, we have this emergency; can you still commit to the timeline you said was a best-case scenario? And we kind of said yes and no. We actually had some really big discussions about it, and we did do some proofs of concept before saying yes.

The other fascinating part, something that was implicit in the play: there are a couple of ways you can do this migration, a lot of ways really, but the easiest is to just deploy a new platform, get all of your developers on the same page, and let them cf push on the other side. There were literally 300 development teams that nobody had control over, so at the end of the day the customer wanted to leave them out of it. What we had to do instead was deploy the platform, move all of the data and the blobs and the bits that matter over to the new platform, and then make sure everything converged on exactly the state it was in in the old data center.

There is a technical talk on this concept today at 5:15, where the actual engineers who did the migration will give you the details; we just wanted to talk through the high level here.

Yes, I believe we would have coordinated that many different teams and given them a minimal downtime window if we'd thought we could, because that was the real problem. We didn't have to go lights-out here, lights-on there, because what they did was proxy all the data services from a third location. But in the beginning we didn't know that; we were a week and a half or two weeks into the process before we knew how they were going to handle the data services. The concern was that if all the data moves, then all 300 teams have to be on board right away and all cf push their apps, and they're going to have downtime until they actually cf push. So we figured: move it, let it converge on that state, and don't worry about whether the development teams even knew. From our perspective, we weren't the ones notifying the development teams anyway; we're under the impression that a lot of them weren't even told this was happening, because once DNS was managed and the data was moved, they wouldn't know the difference. All the endpoints were identical in the new data center.

Honestly, in the beginning we had many heated discussions about the best way to do it. We've done this before, but the fact that the developers might not know about it was the real clincher here, because we had to make sure they didn't experience any significant downtime. Even when we got to the other end, we were pretty happy that it worked. Now we know it works; it's a viable strategy because we've proven it out. But we had concerns all the way through the process. All we had access to was the CF endpoint, so we could profile the system and figure out what was there.
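Profiling a foundation from nothing but its API endpoint amounts to paging through the Cloud Foundry v3 apps listing and recording each app's desired state, so the new foundation can be diffed against that snapshot after the move. A rough sketch: the endpoints and token are placeholders, and app names are assumed unique enough to diff across foundations.

```python
# Snapshot every app's desired state via the CF v3 API, then diff foundations.
import requests

OLD_API = "https://api.system.old-dc.example.com"  # hypothetical endpoints
NEW_API = "https://api.system.new-dc.example.com"
TOKEN = "bearer <token from `cf oauth-token`>"     # placeholder credential

def snapshot_apps(api: str, token: str) -> dict[str, str]:
    """Map of app name -> desired state (STARTED/STOPPED) for a foundation."""
    apps: dict[str, str] = {}
    url = f"{api}/v3/apps"
    while url:
        page = requests.get(url, headers={"Authorization": token}).json()
        for app in page["resources"]:
            apps[app["name"]] = app["state"]
        nxt = page["pagination"]["next"]  # null on the last page
        url = nxt["href"] if nxt else None
    return apps

# Anything that didn't converge after the migration shows up in the diff.
if __name__ == "__main__":
    before = snapshot_apps(OLD_API, TOKEN)
    after = snapshot_apps(NEW_API, TOKEN)
    for name, state in sorted(before.items()):
        if after.get(name) != state:
            print(f"{name}: was {state}, now {after.get(name, 'MISSING')}")
```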
It's a viable strategy because we've proven it out, but We had some concerns all the way through the process We all we had access to was the CF endpoint so we could profile the system we could figure out what's there But as I joked in the talk We didn't know if all of those developers even knew what state their apps were supposed to be in in any given Oh, you know any given part or some of those apps didn't even have teams assigned to them They were just there and some some automation somewhere was pushing it Yeah, and what was neat about this is any teams that were modern And being proactive about their development and had pipelines that were working on things all of that automation should have Just worked because all the endpoints stayed the same and all the access was proxied appropriately So it was it was pretty neat to see happen MKB right here is who accomplished this feat and we just scratched on the surface But his effort was Herculean. It was actually impressive very impressive Thank you I do want to say that the the satire about actually flying it there or driving it there was a legitimate conversation that we had Yeah Right and and and we were we need we're like well how big are the blobs and are we gonna have to sneak or net it to you know to the other statement and We couldn't believe we were having that conversation in 2070 we couldn't believe we were having a conversation and it was If we still have some time if we have more questions Thanks, everyone. Thank you