Greetings, programs. My name is Michael Broadhead with Stark and Wayne. This is my colleague, Pat Jones. Once upon a time, in a faraway land called the Cloud, a few of us sat down with a new client. I have three foundations, said the client. And on these foundations, I'm running some apps. And I need you to move all three foundations and the apps that are running on them to my shiny new data center. Well, I inquired, how many apps are we talking about? Well, came the reply, we don't actually know for sure, but across the three foundations, it's on the order of several hundred apps. I thought to myself, gee, that's a lot of dev teams to coordinate with. And they don't even have hard numbers for how many apps we're looking at. So I paused. I rolled up my sleeves. And I said, you're in good hands. We can do that in six months. Now, the people who know this story are laughing. No, no, said the client. You don't understand. We have already given up the lease on that data center. You have 60 days. Because in 61 days, literally, people are showing up to start pulling cables, physically unracking servers. That facility is shutting down. We have a hard deadline. I grew concerned. Pupils dilated. Blood pressure went up. Respiration increased. Cortisol levels went up. And as we dug into the details, it became apparent that we didn't actually even have 60 days; when all was said and done, what we really had was about 30 days in which to do the work. Which brings us to the hero of our story: bulk application migration. All right. So let's talk about a few assumptions we're going to make, not only to scope what we're going to talk about today, but also to give you an idea of why we're doing what we're doing, why we're not doing what we're not doing, and things along those lines. We're going to assume that both foundations are using internal blob stores. If you're using something like S3 for your application droplets, this is going to be a little simpler.
So, internal blob stores. We're also not going to be talking about application services and service data, although that's really important for migrating foundations. That's kind of its own thing; it's almost a whole talk's worth of material. So it's important, but we're not going to talk about that today. We do have to assume connectivity between the new foundation and the old foundation, not only between those two sites, but also to the original application service data as well. And when we say connectivity, what we mean is we need to do some file transfer, usually over SSH, between the two sites. All right, so the first thing we do before we can migrate anything: we need our new foundation. In this case, we create a new BOSH with Genesis, and the big thing for going between two foundations like this is that you need to minimize the entropy you have. So things like versions between Cloud Foundries, sizes of Cloud Foundries, you need these to be similar. You don't want a lot of moving parts making an already hard process even harder. So basically, we want to copy our manifest, changing whatever details we need to about networking. For the Genesis stuff, we need to change the director details. And then we're going to kick that off. So we've got our brand new green-fielded environment, we've got our old environment with all our applications, and we need to start making a game plan for how we're going to actually migrate those applications in one fell swoop. So the big picture when it comes to migrating lots of CF applications at once: there are two big operations we need to do. We need to migrate and sync our blob stores and the databases of Cloud Foundry. Those are what we need to do at the end of the day to get those applications to spin back up in the new environment.
Now, if you've gone poking around in CF databases, you know there's more than these three, but these are the three main ones for what Cloud Foundry knows to be the lay of the land. As far as application details and tenancy details go, these are the important databases we want to make sure are maintained on the new site so that everything comes up. Now, before we move anything, we want to make sure we understand what it is that we're moving. So we need to take inventory. We want to inventory the applications that are running on the foundations, the services those applications are consuming, domains, any domains that are bound, and any certificates that are on your gorouters. In this case, it turned out that we were looking at about 1,100 apps. Surprisingly, almost all of them were single-AI (a single app instance), but it's important to know if there are any outliers. As an aside: anything database-y that you see in our talk today is going to be using Postgres. Many of you may be running MySQL on your foundations. The concepts are all the same, just a few little details differ. So as part of the inventory, we did a query to identify any large applications. If there are applications that are outliers, running lots and lots of AIs, we want to know about them, because an app that's running a lot of AIs has a lot of visibility in the org. So we want to contact those teams and make sure they're in the loop about what we're doing. Similarly, there might be orgs where no single app is really large, they just have lots and lots of apps. These are also effectively high visibility, so we want to be in touch with those teams and make sure they're in the loop as well. We also need to take inventory of any applications that are failing. If an application is in a failing state on the old foundation, we can't reasonably expect it to come up and run beautifully on the new one.
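As a rough sketch of that inventory step: the CCDB query itself varies by Cloud Foundry release, so the `processes`/`apps` table and column names shown in the comment below are illustrative rather than guaranteed, and the query just feeds a small, generic outlier filter.

```shell
#!/bin/sh
# Inventory sketch: flag apps whose instance counts make them outliers.
# The CCDB query below is illustrative -- table and column names vary by
# CF release, so check your own schema first:
#   psql ccdb -At -F' ' -c \
#     "SELECT a.name, p.instances FROM processes p JOIN apps a ON a.guid = p.app_guid;"

flag_outliers() {
  # Read "app_name instance_count" lines on stdin; print any app whose
  # instance count exceeds the threshold given as $1.
  awk -v t="$1" '$2 > t { print $1, $2 }'
}
```

For example, `printf 'billing 1\nfrontend 12\n' | flag_outliers 4` prints only `frontend 12`, which is the kind of high-visibility app whose team you'd want to contact.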
So determining what applications are already failing helps guide our definition of done. Once we've scraped all of the services and figured out what services the apps are consuming, we need to make sure we really have connectivity to those services. Is there some firewalling in the way? Is somebody running a WAF that needs to be reconfigured? So we want to try all this stuff out. My favorite tool for doing this is nmap. If you don't have that available to you, pretty much every host has netcat, which you can use. In some cases, we only had host names and not ports, so we just had to ping those hosts and make sure that they were pingable. While we're still in the beginning planning phases of the actual migration, we definitely want to do some housekeeping with certificates. If you have any external load balancers, you're going to want to adjust those for the new site and get them ready for the switchover. It's also a good idea, early on, to go to your DNS and shorten the TTL, so that when you do eventually flip the switch to your new apps, the propagation window is shorter and you can bring down that old foundation sooner. All right, so again, more planning: we need to actually start timing the operations that we're going to be doing. It's hard to give you an idea, even roughly, of how long these operations will take, because it just depends on your Cloud Foundry. How many apps you have, how big your blobstores are, these sorts of things are really going to make a difference on a lot of these operations. So you really want to test, to make sure you're accurately planning and giving yourself a good buffer for your change window so you're not going to run into any issues. We're going to run time against the commands for those two operations we talked about. First, pre-syncing the blobstore. The blobstore is our most expensive operation; this is what's really going to take the most time.
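A minimal sketch of those connectivity checks, using netcat for host-and-port pairs and falling back to ping when only a host name is known. The `endpoints.txt` file name is our own invention for the list scraped during inventory.

```shell
#!/bin/sh
# Connectivity-check sketch. endpoints.txt (our own naming) holds one
# "host [port]" pair per line, scraped from the service inventory.

check_endpoint() {
  host="$1"; port="$2"
  if [ -n "$port" ]; then
    # nc -z probes the port without sending data; give up after 5 seconds.
    if nc -z -w 5 "$host" "$port" 2>/dev/null; then
      echo "OK $host:$port"
    else
      echo "FAIL $host:$port"
    fi
  else
    # Only a host name known: fall back to a single ping.
    if ping -c 1 -W 5 "$host" >/dev/null 2>&1; then
      echo "OK $host"
    else
      echo "FAIL $host"
    fi
  fi
}

if [ -f endpoints.txt ]; then
  while read -r host port; do
    check_endpoint "$host" "$port"
  done < endpoints.txt
fi
```

Anything that prints `FAIL` is a firewall, WAF, or routing conversation you want to have well before the change window.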
So not only do we want to time this, like we said before, but we want to do this as much as possible going forward, to minimize the difference in the blobstore as developers push to it while we approach the deadline. The longest it's ever going to take is the first time you do it, so we want to keep running it periodically to keep that diff as low as possible. We're just using rsync across the two boxes. Here we use the vcap user, but just know that you can do it other ways; at the end of the day, make sure that directory is owned by vcap and everything will be good. We want to do the same for the databases. So we're going to do a pg_dumpall, time it the same way, and put together your change window: approximately how long it's going to take, plus a good buffer. So now that we know how long things take, we can decide whether or not we need to take downtime. One of the complicating factors in this particular case is that the client told us not all of their apps were very 12-factor friendly. They had some apps that were designed to run with just a single AI, and they didn't know what would happen if they were running multiple AIs. It might be problematic. So they specifically told us we can't have more than one instance of those apps running at a time. They would rather take the downtime than risk the undetermined behavior of running multiple instances of those apps. So now we know we need downtime, and we know how long things take. So we can go to the CCB and schedule our change windows for each foundation. Now, those change windows are inevitably going to be at least a few days out. So we want to keep syncing the blob store every single day, on a regular basis, so that we, as Pat was saying, minimize the work that has to be done during our actual downtime. We were fortunate on this project that one of the people on the team was my colleague Ramon, who's in Central European time.
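One way to capture those rehearsal timings is to wrap each operation in a small helper that records wall-clock time. The commented-out rsync and pg_dumpall invocations are illustrative, with made-up host names and paths; adjust them to your own blobstore job and database VM.

```shell
#!/bin/sh
# Rehearsal-timing sketch: wrap each expensive operation so its wall-clock
# time is recorded, which is what feeds the change-window estimate.

time_step() {
  label="$1"; shift
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  echo "$label took $((end - start))s (exit $rc)"
  return "$rc"
}

# Illustrative rehearsal runs (host names and paths are made up; adapt
# before uncommenting):
# time_step blobstore-sync rsync -az \
#   vcap@old-blobstore:/var/vcap/store/shared/ /var/vcap/store/shared/
# time_step db-dump sh -c 'ssh old-db "pg_dumpall -U vcap" > cf-dump.sql'
```

Keeping the per-run numbers lets you watch the blobstore sync time shrink as the daily diffs get smaller, and size the change-window buffer from real data instead of a guess.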
I'm on US West Coast time, so between us, we were able to keep things pretty well locked down. And at that, it is go time. So this is it: we're officially in our change window, we're ready to start the actual process, and things are ready to go. Our first step is an optional one. You don't have to do this, but the question here is: what happens during this time if developers are pushing changes? The answer depends on how easy it is to communicate with all your development teams. If you have a bunch of development teams and you can't necessarily talk to all of them, you might want to bring down the gorouters to prevent them from actually pushing changes, so you're not going to get any emails saying, hey, I pushed this, what happened? Or you can not do that, and if they still send you those emails, just say: you missed the email, we're doing a change window, that's on you, I'm sorry. But if you don't want to get those emails, then this is an option for you. It's optional; it's up to your organization how you want to handle it. So then, like I said, the first actual step is our final blob store sync. Like I said, this is expensive; even if it's a no-op, traversing that tree and checking every file is going to take a long time. So we're going to kick this off, front-loading our time and our work, so to speak, while we move on to our database bits. With the blob store sync running, it's time to dump our database. So we bosh ssh into the database host and dump everything into a file. Now, before we ship that file off to the new site, we want to run a hash of it. Modern protocols are real good at getting files to the right place in good shape. But if we're talking about hundreds of apps, the cost of failure is high. So it's worth being a little paranoid and making sure we dot every i and cross every t.
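The dump-and-checksum step might look something like this. The dump commands are shown as comments (host names and paths are illustrative), and the reusable piece is the verification you run on the receiving side.

```shell
#!/bin/sh
# Dump-and-verify sketch. The dump itself runs on the old database VM
# (reached via bosh ssh; paths illustrative):
#   pg_dumpall -U vcap > /tmp/cf-dump.sql
#   sha256sum /tmp/cf-dump.sql   # note this hash before transferring

verify_checksum() {
  # After copying the dump to the new site, confirm the hash noted on the
  # old side matches what actually arrived.
  expected="$1"; file="$2"
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "checksum OK"
  else
    echo "checksum MISMATCH: expected $expected got $actual" >&2
    return 1
  fi
}
```

The paranoia is cheap: a few seconds of sha256sum against a dump that represents hundreds of apps is a very good trade.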
It costs very little to do these checksums, so we may as well do them. We run a checksum and note the output. Then we go over to the database on our new foundation. But we can't load the database right away, because that foundation is up and running. There are a bunch of processes using that database, using those tables, and we don't want to change things out from under them; madness ensues. So the first thing we need to do is figure out who's connected to the database. The query at the bottom will tell us the answer to that, and the output will look something like this. Now what we can do is go through this list and, from the IPs, identify all of the jobs that are connected to the database. And we can bosh stop every one of those. We go through one at a time, rerun our query, and see what we missed. Eventually, when we run the query, we get this. So now we know nobody but us is connected to the database (subject-verb agreement, gotta love it), and it's safe to do our load; we won't be mucking about with any running processes. So we run that checksum on the local copy of the file, compare the hashes, and make sure it came across good. Then we can kick off our load. Now we wait for a little bit. Once the load is done, we make sure our blob store sync is done, and at that point, we can start up our foundation. So we issue a bosh start command. Now all of the jobs that we stopped are gonna come back up; they're gonna reconnect to the database and start doing their thing. And Diego in particular looks at the expected state of the world, which is: hey, I've got all these apps that are supposed to be running; and the actual state of the world, which is: hmm, I don't see any apps running at all, maybe we should do something about that. And it begins to converge the state. So it fires up those apps, and if you're talking about hundreds of apps, that can't happen instantaneously. It takes a while to spin up every single one of those containers.
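A sketch of that connection-draining step. The `pg_stat_activity` query is standard Postgres, but the user, database, and BOSH deployment/job names are illustrative; the small helper just reduces the query output to the distinct client IPs you need to trace back to jobs.

```shell
#!/bin/sh
# Connection-draining sketch. On the new foundation's database VM, list who
# is still connected (user/db names illustrative):
#   psql -U vcap postgres -c "SELECT client_addr, usename, datname \
#     FROM pg_stat_activity WHERE pid <> pg_backend_pid();"
#
# Map each client_addr back to its BOSH job and stop it, e.g.:
#   bosh -d cf stop api
# Re-run the query until only your own session remains, then load the dump:
#   psql -U vcap postgres < cf-dump.sql

unique_clients() {
  # Reduce "client_addr usename datname" rows to the distinct client IPs
  # that still need to be traced back to jobs.
  awk 'NF { print $1 }' | sort -u
}
```

Piping the query output through `unique_clients` after each round of bosh stop gives you a shrinking to-do list; when it's empty except for your own session, the load is safe.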
So we can watch that process by running a query on the state of all the processes in the system. This lets us see a summary of everything that's there, and we're looking for everything, or just about everything, to get into that running state. Once we've converged and we're happy that all our apps are doing the right thing, we can go to DNS and move it over to point to the new site. And there we have it: disaster narrowly averted. Thank you, folks. Does anybody have questions? Ah, yes, so the question is: what about environment variables that have been set for your apps? Do those make it over? And yes, they do. Anything that was set with cf set-env in the old place is all accounted for in the database, and it will be set in the new place. So all your configuration will survive. Anything else? Oh, that's its own bag of bees. Come back next year and we'll do a talk on service migration. Everything depends on what the service is, what kind of volume of data, how live the data is, what the usage is like, and how much downtime is acceptable. It all depends on your particulars and what you've got in your environment. But if you wanna come up and talk to us afterwards and give us some specifics for what you have in your app, we can dig into that if you like. Anybody else? All right, well, we are gonna declare victory. Thanks very much, folks. Thanks everyone.