Hi, my name is Aimee Degnan. I'm the CEO of Hook 42, but also a long-term migrator, starting around 1995 and '96. I've migrated lots of really basic websites, but also very big custom enterprise systems, and I specialized in CMSs.
And I'm Ryan Weal. I've actually been involved with migration much longer, but I've been migrating Drupal since 2013, and I've been around the Drupal community since 2008. I run my own little company called Kafei Interactive.
Who are you? Sorry, I'm sassy today. Who's a developer? Okay. Who's a project manager? Okay. Who's a product manager? Oh, a few, very few actually. Who's an executive? Okay. Who's all of them? Exactly. Yeah. Okay. So this is going to be good for developers and project managers. I hope it resonates with both groups here. We're going to go fast, so enjoy the ride. The slides are text heavy, but so is project management, and so are migrations. Sorry about that. Slides will be posted on the session page after the session. And before I forget, please rate the session when it's over.
What we're going to do today is walk through a sample case. A fictitious site. A fictitious site that kind of combines some of our different projects from the past, ideas, and inspiration, we'll say. Right. Then we're going to give you background about migrations, and then we're going to talk specifically about migration phases.
So who's migrated a site here? Okay. Actually, not everybody. Those who have done it, can you describe it in one word? Hard. Hell. Okay. Hard, messy. Long. Huh? Tornado. Tornado. Yeah. Crazy storm going on. Yeah. Those are all emotional, feeling words, and they're very valid. So let's have process and data and planning help mitigate those angry words. Smooth over the lost parts.
I got into this because I got really sick of doing copy and paste and having the variables. So this is the way to try to contain the stuff. As we'll see. Hang on for the ride. Right. So first let's get a migration project, and that's nexustravel.com. This is our fictitious site. I actually don't know this domain, but nothing is on it right now. If you go there, GoDaddy is like, no. This is the multilingual demo, which Hook 42 put together in the D8 multilingual initiative. This is Nexus Travel. It's a travel site. It's an online business that sells pre-planned trips. It was built on the now unsupported Drupal 6. The website is large. It has many enterprise-grade features. There's lots of content of different types on the website, and much of the custom code interacts with data.
And about the content: we have location data, tours, vendors, members, pictures, tagging, advertisements, commerce stuff. But then there are always, you know, things that people forget, so we'll get to that too. We've got nodes, so like pages, and users with additional information. We've got media entities that store our pictures and have metadata. We've got many more metadata blocks that have additional info, advertisements. And then commerce has many levels of entities for payments, orders, line items, payment transaction IDs, shipping things. I think there are like 25 or so migrations just for commerce. All sorts of things. So, entities.
The client says, hey, we're under a tight timeline. And the new and improved features, we haven't defined those yet. And we've actually committed to our vendors on a huge amount of investment. They paid us a lot of money for these features, and they have to be out by a certain time. And organic SEO is the largest driver of traffic to their site. So that's fun, right? And how do you feel right now? Who feels like this? I have goosebumps. How are you feeling? Oh, okay. Or who feels like this? Like a superhero.
Darn right, we can do this. So... Yeah, for the video, no one says yes. Well, hopefully everybody should feel kind of irked, looking like that, when the business says, oh, we haven't defined things, and we have a tight timeline, and we've committed money to this. Or even, a multi-net's about to hit and we need it just, like, three minutes before that launches. Yeah, these things have happened. This is real. On lots of projects we've been on, these are real situations.
But before we get into the how, let's understand migration projects in general. They're totally easy, right? Yeah, that would be simple. Just leave it to the end. But it could go wrong. I know.
Well, first of all, let's explore why we migrate. Drupal 6 is dead, but there are lots of Drupal 6 sites out there. I've even heard of some Drupal 5 sites out there. Oh, yeah. Well, you know, if it ain't broke, don't fix it, right? But sometimes your company gets purchased and you have to merge into different technologies. And sometimes you get a new client and the site is just horrid. You migrate out of a site that someone else built because it was so convoluted that it's easier to rebuild from scratch. And it's not just easier, it's more cost-effective to do a migration than to unravel the crazy yarn. Also, there's infrastructure or architecture cleanup, if you're moving from one kind of major infrastructure to another. Like merging multilingual sites into one, or all sorts of different things people think of where the structures are slightly different now. Right, and sometimes there's some massive rebranding that you want to do somewhere else. Maybe a lot of new content architecture or new content messaging, but you want to keep your other site up and going at the same time.
And there are more reasons to migrate, but these are some of the big ones.
There are about four main types of migration. First, there are one-to-one migrations, which is: just keep the data structure and the functionality exactly the same, please. This is like the Drupal 6-to-8 migration magic button, right? That's in core, and in Drupal 7 the Drupal-to-Drupal migration sort of does it too, or at least tries to. That seems like the most straightforward, because you maintain data types and functionality and you're like, yeah, that's apples to apples.
But sometimes you have transformation, right? You're going to take old data and switch it into a new architecture. So you might have apples, and you get applesauce, and then apple slices, and dried apples somewhere else. It's different. Or maybe you had components of an address field that are now going into a button field. Right.
Then there's multiple sources to a single source. Yeah, often somebody will have an internationalization strategy with, like, en.foo and fr.foo, and then they're going to migrate into foo.com/en. So that's actually multiple sources into a single source. Or vice versa: you can split it out into different domains. And in real life your project may actually use all types. You may not be dealing with just one.
And when you start migrating the content, you can just run it all at once and say, oh, we did it. But then what happens is there are new blog posts. If you've got commerce in the mix, you've got new orders coming in while you're developing, and an incremental pass will bring in the new items. Some people will request that you roll everything back and then roll it forward again, but you can just pull the new items. So these are strategies that you should be thinking about. Right.
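To make the incremental pass concrete, here is a minimal Python sketch of the idea, assuming each source row carries a numeric "changed" timestamp. The class, field names, and sample rows are hypothetical illustrations, not the Drupal Migrate API and not code from the Nexus Travel project. The migration remembers the largest timestamp it has imported (a high-water mark) and only pulls rows newer than that on the next run.

```python
from dataclasses import dataclass, field


@dataclass
class IncrementalMigration:
    """Hypothetical migration that tracks a high-water mark between runs."""
    high_water: int = 0                              # largest 'changed' value imported so far
    destination: dict = field(default_factory=dict)  # stand-in for the new site's storage

    def run(self, source_rows):
        """Import rows changed after the high-water mark; return how many were imported."""
        pending = [r for r in source_rows if r["changed"] > self.high_water]
        for row in sorted(pending, key=lambda r: r["changed"]):
            self.destination[row["id"]] = row["title"]  # insert or update, keyed by source id
            self.high_water = row["changed"]
        return len(pending)


# Illustrative source data only.
source = [
    {"id": 1, "title": "About us", "changed": 100},
    {"id": 2, "title": "First blog post", "changed": 150},
]
migration = IncrementalMigration()
first_pass = migration.run(source)

# New content arrives on the old site while development continues.
source.append({"id": 3, "title": "New order", "changed": 200})
second_pass = migration.run(source)
```

The second run picks up only the row added after the first run, which is the "just pull the new items" strategy, as opposed to rolling everything back and forward again.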
And in real life your project, based on your content, might use both types, right? The single pass is like: press the button, you migrate it once, it's done, everybody has champagne. Incremental is like: I have a ginormous site where you have to trickle it in, because it just physically takes a long time to migrate. So there are many different reasons why you'd use single pass versus incremental. The most common I've found is that a single pass can be very good for the marketing content, like your basic pages. They want to rework that content, they're going to throw out some of this stuff, but meanwhile there are new blog posts, there's new commerce stuff, and they don't want the stuff they're editing changed. So that's where you start making that split: we're going to do these incrementally, and we're going to do these once, and then those are yours to edit. Right, sometimes the client will ask, hey, can you please get the structure of those basic pages over, and then we're going to tweak it while you guys do the rest of the migration. So it's like a content freeze on a subset of content on your site.
But also, size, scale, and complexity really matter. If you have a small enough amount of content, why pay to code a migration for it? It's worthwhile to count the number of objects, and we'll be repeating this throughout. If you find that you've only got 40 items, maybe it makes sense to get your friend who has some spare time to just do some copy and paste for you, versus coding a lot of stuff. The thing to consider is that you don't always need to code everything. But coding everything is nice in that it's ready when you're ready to migrate: you can put something on hold, come back to it, and then go, oh, we just press the button again. Right, and also, if you don't have enough skilled developers to do everything, you can put together a migration party.
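The advice to count objects before deciding whether to code a migration can be sketched as a back-of-the-envelope comparison. This is a minimal Python illustration, and the function name and every number in it (minutes per copy-and-paste item, hours to code the migration) are assumptions for the example, not figures from the talk beyond the 40-item and 50,000-member counts.

```python
def cheaper_approach(item_count, minutes_per_manual_item, hours_to_code_migration):
    """Return 'manual' or 'coded', whichever has the lower estimated effort in hours."""
    manual_hours = item_count * minutes_per_manual_item / 60
    return "manual" if manual_hours <= hours_to_code_migration else "coded"


# Illustrative estimates only: 5 minutes per copy-and-paste item, 16 hours to code.
small = cheaper_approach(40, 5, 16)
large = cheaper_approach(50_000, 5, 16)
```

A real decision would also weigh what the talk mentions next: a coded migration can be re-run at the press of a button, which a manual pass cannot.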
So order pizza, have everybody get together, everybody has 100 things or 50 things or whatever, and they're bringing those over. There are issues with that sometimes, but if you have budget constraints and a lot of hands that are already within budget, then use them as necessary. And really pick your battles: if there's going to be development for various components and you know one is critical to the business, get that done, because that's critical. So it's important to think about that and not spend your migration time on the little details that aren't business critical.
Yeah, and we'll talk a little bit about architecture later, but say you go, oh, I have this whole site and everything's in the body field, and we have all these content blobs where we want to use Paragraphs in the future, or some type of structured fields. You really want to consider the amount of time it takes to actually do something like that, and it's very complex. So if we look at this: for a small amount of content it's manual, and programmatic is for large amounts of content with lots of interrelationships. And then sometimes there's a mix. What we've done is set up the structure and kept the big blob of body for everything. We've migrated that over into a structured content type, but we kept the body field filled with content so we could display the old content while the migrators manually move it over, because it needs humans. So you get the best of both worlds here: I'll give you all this stuff over here programmatically, get you jump-started, and all your team needs to do is jump in, reference that information, and put it in place. Or, rather than populating the new fields, you leave those for new content, bring the old stuff into the equivalent of the body field, and just say that's it. That can be really nice because they don't have to rework every piece of content or validate as much stuff.
And they can start with a new content architecture. In real life, how many files do you have? This is real. And there's so much content. So yeah, this is real. Who has a photo-heavy site? There's a lot of them. You guys know what this feels like.
You also can cover multiple technologies. So, Drupal to Drupal: thanks, community. There are a lot of tools there. It's fairly automatic, it's got rough edges, and it's getting better, but you can assume there's custom development that has to happen on every migration, every complex-ish migration. And there's flat websites to Drupal, or custom DBs to Drupal, and other CMSs to Drupal. We've had to do a hybrid of this, where Joomla content doesn't really get built until the later stages, so you can't just query the database and get your HTML. You end up building a screen scraper, which is pretty much a flat-file migration. It's a different approach, but you get it done. In the wild there are a lot of different tools you can use for site scraping or data transformation, but you have to actually do work. The nice thing about Drupal 8's migration platform is that it gives you a nice interface to start dealing with those multiple sources. And your project may use many types: if you're coming in from a Joomla marketing site, and you have a WordPress blog, and then you have a couple of other custom database stores, your product database, there's a lot going on if you're bringing it all into Drupal.
Also, there are some infrastructure considerations. Infrastructure is a huge deal in migrations. On Pantheon you can't move files or rename files; you pull them all down and push them all back up, or copy them over to another area. But that's okay, it's because their infrastructure has a super smart file system. On Acquia, the files directory structure is often kind of crazy looking, and you have to be very deliberate that you don't migrate to or copy from the wrong place. Also, if you're working on a
local host, you might run into memory limitations, either on the local host or the remote host. Also, you don't have debugging tools remotely. There are a lot of things. If you don't have the files locally, you can defer to the HTTP URL, which means you're hammering the firewall, downloading thousands of files from your local machine. So it's something to be mindful of. Right, and network is a big deal. It's not just your transfer speed up, it's down. If you're doing it from host to host in the cloud, it's the speed between those and what's capped. But also, do you have access to get to this port from where you are? These are legitimate impacts to your planning, because trying to get access takes a long time, trying to fix the access takes a long time. If you plan for it, you can really mitigate those horrible delays on projects that happen just because somebody can't get through a damn port.
So we talked a little bit about the pain. It probably takes longer than expected. Migrations run into after hours. Sometimes people want to launch on the weekends, which is a little crazy. And the work is really detail oriented: if someone on your team is not a detail person, strongly consider not having them on your migration project. And you need focus time. If, as a client, you're going, hey, let's have another meeting, let's have another meeting, well, you just killed a day. That can be very problematic as well. So it's important to time-box your development time and know that you have focus. Also, super careful, deliberate note taking is required, and we'll get into why in a bit. And the work can be intense, right?
And who likes to work like this? Everybody's shaking their head. Seriously. Developers on a migration team want it understood that they don't want to be worked like that. Project managers are also kind of stuck in the middle, but you have to look after the health of your team for longevity and project fitness. You really have to
watch when it's like, well, let's just do a 12-hour day, because then there's another thing that comes into play, and then you're running charge. So you need to respect that people have time limits. For those familiar with agile, I think it's: if you have people work overtime for a total of one sprint, fine. If you have a two-week sprint and you do it for one sprint, it's fine, but after that it's diminishing returns. Everybody checks out, and it takes longer for your team to recover from that.
Also, you need very specialized people on your team if you want your project to be a success. So, a migration project manager: it is very hard to project manage a migration project well. There's a lot of kitten wrangling. New requirements, forgotten requirements, somebody gets sick, something like that happens, everyone wants it yesterday. Yeah, and they're there to plan and wrangle and educate.
You also need someone who knows the source technology. If you're coming from ColdFusion or some other thing, you need someone who actually knows it a little bit, so you can get access to it, go under cover, go into the belly of the data, so you can get to it. I was once asked, when someone else took over, how did you get all of that stuff into the site? And I said, well, I knew it was an MS SQL site, and I knew MS SQL had XML support, so I taught myself to write MS SQL queries so I could get XML, and then I migrated from XML. They were like, whoa. And I said, yeah, that's how it felt, but you gotta do what you gotta do. The migration engineer at that point is being creative and developing migration code.
Then the migrator. There's this button. Not a lot of people think about this, but it's the person pushing the button. More coffee. So you've got everything ready, but somebody has to go through your script of activities. You can code a lot of things, but you're still going to have to make sure that all
of the pieces fit together at the end. So there are going to be manual steps, and that's the person doing those steps. Right, and that person is not just anybody: they're actually the person responsible for recovering the whole site from failure. If you're pushing the button, you're fixing it if it breaks. And if you have a big, ginormous site on certain infrastructures with gigabytes of files, that's harder to do.
And then you also need a data specialist to test the migrated data. Not just, oh, it looks good on the page, but, is the data correct? The worst feeling is just before launch: oh, whatever happened to this field? You really need to review every single detail. I've seen people print out pages and mark X's over every field to verify that they looked at it, because sometimes it's easy to forget. Right, and where do you get these people? You call us. You talk to people who have done migrations before.
We're not going to go into details, but as project managers you have to address the personas of the people on your team and what they're looking for from you in communication plans. Business owners want to know when their site's going to be done and how much money they're paying. Account managers are like, is everything okay with the business? Project managers are like, oh my god, are we going to get this done? Migration engineers are like, don't bother me, I'm coding a migration. The developers are like, don't bother me, I'm building a site. And the site builders are like, hey, can I make my content yet?
I just want to change this one thing. I just want to change one thing. And the themers are like, why can I not make CSS tweaks on things yet? Yeah, different tracks. Yeah, and they have different considerations over the course of the project. But the takeaway here is: thorough planning and vigilant (not vigilante) management leads to project success. And let the numbers prove it. We're going to do some math along the way with our sample website. And really, in the end, make it easier on your team, and simplify when you can.
Oh, spreadsheets. Spreadsheets. No cell left behind. One student per row. No, really, really: spreadsheets. This is meant to address the variety of people working on the project. You'll have people who may not be familiar with the bug tracker, or you'll have your lead on the project but they'll have other content specialists working on stuff. You just want to get that data. And with spreadsheets, when you start to realize there are more vectors you want to be tracking, you can just expand that out really quickly. You're not being constantly bludgeoned by the tools, like, I have to spend 10 minutes per issue. You can just paste a few things and come back to it if you need to. Yeah, and Google Sheets is great. You can share it, and you can actually track who's touched anything. So, wow. Yeah, project managers, you're like, who touched that? Why did you change it? That was requirements freeze. Things we end up adding are a notes column for the developer and a notes column for the client, and sometimes you'll want other teams to have some input too. That's where you just say, we found this problem, and then when you're going through your meeting you have a bit of info, but you can elaborate, and you know that point needs to be dealt with.
And I kind of feel like this right now. I feel like we're this person going, see this stack of papers? That's hard. This is easy. But we're talking about... oops, come on, there it is,
like the on click isn't clicking. We're talking about migrations, or this stack, right? This is the stack of data we're talking about for migrations, to track it and manage it. You can't not do this, because otherwise it leads to time wasted, high cost, and oops. And the devil's in the details. It's true.
So now we're going to get started. That was just thinking about migrating; now we're going to start actually planning the migration. But what planning techniques do you use? There are benefits to both waterfall and agile. Don't make it a battle. Absolutely. For migrations, waterfall is perfect for order of operations: this must go before this, because that is dependent on that. That's beautiful. But the biggest beautiful piece is the sign-off piece. You have to have your client sign off, so you can be like, thank you, because this is true. We'll show it in a second. But agile is actually great because of the meeting, review, and acceptance of things. In a lot of cases people are like, okay, we'll just get the migration done, and then hooray, it's done. But the first meeting after isn't when it's done. That's the first time you review the odd things in the content, because you're going to find weird things, and then you're going to iterate, and then you're going to review it again with them. And you're going to need to review multiple times, because this is their baby. They have their business infrastructure based on this, and if they assume that it's going to be perfect the first time, it's not. Sometimes that just means they want to rearchitect, or they realize there was something they didn't like about the old site. Those are the things that start coming up.
And our client, it's a Drupal 6 site. Maybe they built that on Drupal 6. Do you remember that person in 2008? That's a 9-year-old site. And do you think data would have corruption over time, maybe due to changes? Yeah. A lot of stuff in the older versions of Drupal, and any other
system for that matter: we were not as good then as we are now at input validation. So that stuff may have been brought through multiple generations of the project, and now you're going to find, well, hey, products from three years ago are starting to fail. Did you have a different product type? Oh yeah, we deleted that. And then you're worried about it. But if you don't have that meeting with them, you're going to go and fix all of that and spend hours fixing it, and then you show up with the client and they go, yeah, I don't need that. So your spreadsheets are coming in handy, because you're using spreadsheets to map out your fields, and that's what you want. And they forget all the details of how big it actually is in the site, because they're just happily using it and not thinking about the architecture on a day-to-day basis.
Cool. So we've gotten through two big chunks of the presentation. Now we're going to move into the phases of the migration project. There are ten. You can add some phases or whatever, but these are just very broad concepts. We're going to group them into three segments: getting started with your project, building the site and migration, and then the actual production migration. So three phases are in getting started, four phases are in building, and the rest are in production migration. You guys ready to get started? Because that's the best.
Okay, we're going to talk about pre-project education, audit for migration, and discovery. So, setting the expectation that we are going to need to review is very important. Building out spreadsheets. Well, this is the key one, especially if you have a paying client that has never gone through a migration: you have to let them know what the phases are, what they're going to experience, and what's asked of them. You want to make sure that you clarify the impact of requirements and, if they're not frozen, the impact on budget and time. And this is early in the account management: you want to identify, if it's a really big site, do you want to
do phased statements of work? Maybe the getting-started part, like the discovery and architecture phase, is one statement of work, and then you do another statement of work after that, because as we talk through migration you'll see how the cost and time are quite unknown until you get more details along the way. This can even be hard to do if you've been the manager of the site over the past generations, because you yourself forget, like, oh yeah, we built that crazy thing and I forgot about it. So you still need to go through this process even if you have been the maintainer of the site for a decade.
And no client wants to hear "it's ready when it's ready," but that's kind of what migrations are, due to all of the different variables that change over time. We don't want them to change; we want very clear things so we can build it right. So the key points: budgeting; projects take time; specialization; requirements locked down; project fitness, because they're long; and transparency, like, hey, this took longer, this took shorter, hey, please don't make that feature, we're stuck on this thing and can't proceed until that's done. Open communication.
And then for our sample site, Nexus Travel, after we've gone through that phase: we let them know that their undefined features are a huge risk. We said that the time constraints will impact developer work-life balance, because nights and weekends will have to be worked. And they need to manage their customers' expectations about new feature dates and launches. But okay, so, yeah, thanks, the client knows. Let's get something started.
Right. Then you start digging into the site. You're looking around, doing your discovery as you would if you were looking at the site for the first time: going through, quantifying things, looking at the content types, all sorts of stuff like this. Figuring out what you may have forgotten about, what they have forgotten about, functionality. You find this
section of the site that has a bunch of custom code and stuff associated with it, custom database tables. And something that's interesting here is that customers will have, like, eight different words for some feature that has evolved over the last nine years since the D6 launch. You can see that sprinkled through code and configurations and notes and data. So basically you'll get all the AKAs, because everybody calls things differently.
At this point, in the audit for migration, we have a risk register. We have a huge content audit spreadsheet with structure, data, size, and source of information. We have a functionality audit, and this is where custom code surfaces that they might not have told you about. We have a data health audit, so it's talking about, oh, years ago, this product, what happened? You do this early so you don't get there later and spend that time developing against it. Also look at the existing infrastructure: do we have to do this in place, or do we bring the source content down? And also do SEO, accessibility, and permissions and access controls. And you get all the source URLs, right?
because you have to start planning, like, oh, if I'm going to migrate I need to do redirects for good SEO. But what's really cool out of this step is links to representative content on the source site. If you have a product page but you have 12 variants of product, you want 12 variant links that represent each type of product, because each one might have different display rules and different data fields shown. That little list is like money through the whole project. You'll use it through every single phase. You want the most complex of the things that are built in the site. So if you have a product that has all these options, and some of the products don't have those options, you want the one with all the options. This will be great when you're going through your meetings with the client, saying, okay, we've got all of the fields, you can see they're all populated. It's easy to just migrate and look at the first item and go, oh yeah, the stuff is there, I see stuff. And if the developer has that list while they're developing, they're like, why didn't that thing come over? Oh, it's because that bit is a silent error. Oh yeah, cool. So getting that up front is great. In the olden days they called them use cases, but for everybody on the team it's like, oh, did that page work? And everybody gets it. Did the page look right? That is a really tangible thing.
Oh, I almost forgot. Yeah. What if your DNS is set to cache for seven days or something ridiculous? Which has actually happened to us. Well, we made the change, and it takes a while. So you can start thinking about that early on and readjust things beforehand, so that when you actually get to launch you can be quicker to refresh, things like that. And sometimes you have to ask three IT guys deep to change the DNS, because somebody who's not you or the marketing team owns that access.
Lessons learned here: very few developers know how to audit for migration, and it
takes way longer than you expect to get the detail that you need, even using tools to help you. Especially if it's the first time you're doing this, it's going to take longer to get your head around some of the concepts. Yeah, and auditing twice is costly, especially after you've done a lot of work and then have to go back to audit.
Also, here comes "no cell left behind." If you're doing an audit and there's a blank cell, it means nobody has thought about it yet. If it's N/A, that means not applicable: I have looked at this cell and made the decision that this is not necessary. Empty means I haven't done it yet.
And also, keep everything in one place. Make one Google Drive folder and have just a few spreadsheets, because if you have a ginormous number of spreadsheets you cannot relate information back and forth. People get lost, and through the life cycle of projects they're like, oh, I just have the link, do you have the link? And that's how we talk about it in meetings: hey, what's the link? And the link is the migration mapping spreadsheet, or the link is the go-live checklist with all of the things we've done. It avoids that information-overload paralysis where you're like, oh, I have this issue, do I put it there, do I put it there, do I have to put it in three places? Just put it in one place. Put everything in one place, so it's easy to get your head around. It also mitigates developers and architects, or anybody really, going toward the end, oh, I didn't think about that. Right, because it really gets you into everything that's happened, and it starts making you think about content, which is an ethereal thing.
So out of our project, we have all of these content types, nodes, and blocks. This is the count that we have, and the complexity. We have a lot. We have 50,000 members. So we're trying to quantify the number of items so we have an idea of importance and how long it's going to take to run. But we're also quantifying the number of migrations
that we're going to be running. So we're looking in this case sort of at entity types: users, and entity file and media, or entity taxonomies. But as we get down to the end of the list, we're seeing e-commerce entities. There's many of those. Actually, those are going to break out into like five or six different types, maybe, or even seven or eight, depending on how complex the store is. And naturally, when we were putting this together, we just made it in the migration order, in the dependency order of content. So normally you do user roles first, and users, and taxonomies or files first, and then you start doing the content types. You can do the most simple one, like basic page, and then you add some with complexity that feed into other content types, entity relationships. So it's like building a house. So we just naturally did it, but basically a huge thing, and we'll talk about this later, is that you have to put the migrations and the content in the dependency order. You must have a user with a vendor role to be related to the vendor node and have access to update it. So you can totally import the nodes, but if you haven't imported the users, all the nodes are owned by anonymous, so that's not that useful to you. You want to make sure that everything you need to hook into on that page is there. Migrate will stub out some of the things in certain cases, but it's just easier to run it, you know, in the order of things that need to be there. So all of that stuff comes before commerce, because commerce needs to know that there is a product, it needs to know that a user bought this, and, you know, who that user was, so it can put the... And then of course at the end you have aliases and redirects. Everything has to be in to make sure the last of the aliases and all the redirects happen. So we just averaged, like, for all entities that are public, or all entities that have an alias or a Pathauto pattern, there's two redirects and at least one active alias. So that's where
those numbers came from. That's not huge, right? Are you guys... who's scared? Who's the scared person here now? That's okay, we can do this. So then you go into your classic discovery phase, right? And this is where we're going to do new functionality, take them through requirements gathering, and prioritize the feature development, like what's really important, with the data in mind. And then this is where you capture expectations on the data migration: oh, make sure that blog posts get carried over. We like to look at it. And also a big thing is we have to re-educate the business with findings here. It can also be good at this stage to ask for pain points, like, what do you not like about your site? What things do you want to re-architect? Because that will open up the opportunity to change those things. And you'll find in some cases that in Drupal 6 they did it this way because that was easy to do in Drupal 6, but maybe that's not actually what they wanted or needed. So it can be much easier if you find out, well, this is kind of difficult for us, and they say, well, that's actually really easy to do in Drupal 8, we can throw all that out. So awesome, you know. Yeah, it's pretty cool. And then a lot of the artifacts that come out of here are like a feature list, and a feature roadmap as well. Right. Now, when I say feature, it's like a higher-level one. This is what the client will actually talk about: oh, is the landing page done? Is the shopping cart ready? Yeah. Right. So this is like here, but your team's in the details. You might have many steps behind "is the shopping cart ready?" This is where you get the feature requirements. You also continue expanding your project glossary, because we had a project that's like "travel agent": I worked with a travel agent to create the tour, and those travel agents meant three different things. It's like, I have a feature on my website that uses Features' feature, right? And it's like using the Features module. It's the same type of thing. So you really want to elaborate with
the "also known as," clarified product glossary or project glossaries, and go from there. And you elaborate on the representative links list. It's like, oh yeah, there are things now that we're talking about that we didn't find before. And then when we're talking with Nexus Travel: what data are we going to keep? Oh, the transformation of things is interesting. Yeah. So our client, they want to take old Drupal 6 select lists and make them into taxonomy terms. This happens a lot, because taxonomy is huge, it's really strong in Drupal 8. This is again just re-architecture, but it's not trivial. So yeah, functionality is not the same. So if you hear a client say, oh yeah, don't worry about it, most of the functionality is going to be exactly the same, don't ever trust them. Ever. We've heard it on every project that has a migration, and it's not true, because the pain points start coming up. Yeah, exactly. So if you don't ask, later you'll get the pain point: well, why did you do it like that? Well, you said do it exactly the same way. I mean, you'll of course re-architect a little bit with new technology, but you're like, wait, no, no, you said the same thing. It's like, if you're moving away from Drupal 6, don't rebuild Drupal 6. It doesn't make sense. It's like going backwards and forwards at the same time. Yeah. So we haven't even started building stuff, but now we are. So we're going to architect the site, map the migration information, we're going to actually build it, and then we're doing this thing called pre-production migration. We're going to zip through these. In architecture, you're going to define the content structure and then define the infrastructure. And we're not going to talk about this in this session, because that's why we're at DrupalCon, to learn how to architect the new site. You're going to take all your spreadsheets from the audit and you're going to expand on them. And then this is where we know a little bit more about which order the features should be developed
and what's in scope and possibly out of scope. Yeah, pushing stuff forward into the roadmap. If you're looking at new functionality, it might be better that we put that into phase 2, that we launch and then bring in the new stuff after, so that it's ready. This will help constrain the timeline. Yeah. And one thing we do a lot is a site architecture spreadsheet. So when we look at the audit, we capture things like the source field name, the data type, any kind of notes or values that it may have. And now the site architecture spreadsheet: that audit was the as-is, this is the to-be. So this is where we define our new architecture. We look at the other one and we're like, oh yeah, we want it to look like this. But it's not the migration mapping spreadsheet. It's like old stuff, new stuff, and then later you have a spreadsheet that's called mapping stuff, right? So there are three separate things: are you looking at old stuff, new stuff, or the mapping? Because they get quite big, right? And then also doing URL pattern planning, so... I can't say that. All right, so back to the quantification. Entities are each generally a migration. You'll run into cases, like with Paragraphs, where you have like five variants, and you can go and try to code all your migrations to do each variant, but you sometimes end up shooting yourself in the foot, and you'll often find it's just easier to do five migrations, one for each variant. So even though that sounds like it's more work, doing a "File, Save As" with all the same fields and then augmenting that little structural difference can be simpler than joins. We have the thing next for you, you'll see it. So media entities require at least two passes, and the first one is interesting. You have to copy all the files up, and then you have to run the files. Basically, files are managed: you have to tell Drupal the files exist, and then you have to import them as a media entity. So that's a big deal. To our earlier point about validating the data to see if there's any corruption: 1-2% of the files will just be
gone. Bit rot is real. I never really believed this until I'd done it like 10 times over with different sites over the years, and I was like, oh yeah, there's always going to be 1-2% of the files that are just gone. Who knows why? Could be the operating system, could be the disk, could be any number of things, but that's going to be a thing. It could be that the client deleted it and they didn't delete the references; they went behind the scenes and deleted it outside of the interface. And I really want to say that you must architect everything before you start building, and do not ever let your site builders just start building things without writing it down. It's harder than if they do. I'm not kidding, that's such a costly thing. Anyway, this has happened before, and it's not like our people, it's just we've seen it happen on other projects. You're like, where did you lose that data? Where did you lose the decision point? How do you know what's happening now with this? Because there's all this other information that goes with it. And they're like, oh yeah, I just made this thing over here, and it's like, well, we needed that first, and now we can't migrate yet. The architect said you had to do it in this field format because the migration has a certain data type. You can't use that format or the formatter, you can't use that plugin, you can't use that field, because the data types need to match. And now on our travel site, we architect, and the commerce is doing special things, and then they want new things, because we've been talking about stuff and they're all like, oh, this old stuff is boring, I saw this shiny new thing, so now I want it, and I forgot to mention it in our past 12 meetings getting set up. But here we are. The CEO is happy that we're doing this project, and we're like, yeah, we can finally do this thing that we've had on our list for a year, because you're here. Well, you're talking to the developers, could you just get them to do this thing? Wait, yeah, like, it's only a little bit of A/B testing, it should be
easy, right? And the conversion funnel. Oh, come on, analytics for conversion, it's just clicky-clicky stuff. Yeah, just put a JavaScript thing in there. It's okay. Good, I hear nervous laughter. That's on purpose. So migration mapping spreadsheets are going to be like each of your content types. We do one spreadsheet with multiple tabs, and each tab will be one entity type, and we list out each field. So on the left side you've got your old, and on the right side you've got your new, and you have the client go through and validate each of these things, because that's where you find out that this really convoluted field is just going to go away. Yeah, and that saves you a lot of time down the road. And because migration engineers... you're going to be like, hey, start migrating the site, and they're like, well, what do you want me to do? And it gets all quiet. So this migration mapping spreadsheet is their technical requirements for doing their development. And it also creates a great testing matrix for the post-migration data audit. So the big people that actually use this are the migration engineers and the data testers. This can be a good place to put the representative nodes as well. Yeah, you reference back out to the representative nodes, and they can then just be focused on the spreadsheets. They're mapping out the fields, they run it, and they go, oh, I want to see node 1496. They go right to that, they see it's there, they move on. You've just made their job way more efficient. So for migrations, a lot of times you'll just run the whole migration and see just what happens, and maybe a representative node is really deep, you know, in your migration. But if you have representative nodes, they can migrate those exactly. So the migration development time is drastically reduced. It's like, oh, I can just run 10, make sure everything's okay, and be like, all right, I think I got everything, now I'm gonna run a full pass to see
like, how does this work on all the data, not just the few representative nodes. You'll find many sites where they built out the initial content using a set of base fields and then later on they were like, well, let's add something to it, and that won't come until a thousand nodes later. Yeah. And also the artifacts here: we talked about that middle one, the migration mapping spreadsheet. You also look at all of the other spreadsheets, like, oh cool, the source field list is already on your audit, I'm just gonna cut and paste those over. Oh cool, the destination already is on the architecture sheet. You're like, why are you copying those things over? But it actually lets another engineer look at the data to see if it's right. Right, so it's like one more set of eyes and validation that our architecture didn't shoot ourselves in the foot for migration. So it's like, let's review that, and then they start going. But also a huge thing that happens is the client will be required to clean up select lists. If they used select lists before and they want to go to a taxonomy term system, you'll have a list of old strings to new term IDs, right? And that may be different for different types of content. Yeah, you can do things like, this TID or this string will match to this taxonomy term, or this string will match to this TID, which represents a taxonomy term. And you can take different approaches with that, some may be faster than others, depending on what you're trying to do. Right. And what's interesting is, we talked a little bit about migration dependencies: you have to do roles first, then you do users. So at this point you're working with the migration engineer to surface any of the subtleties of what really needs to go first, based on all the different entity types. And some lessons learned: you can do it at the same time as you're doing your new architecture, and you probably should, right? Like, I'm architecting based on what I need to migrate over and what I need to fulfill for functionality. You do that at the
same time. So it's not like, here, here's a migration mapping task. You just do it at the same time. So we've got types, field lengths, formats, dates, filters. All of these things have their own challenges, so you have to write them down. Seriously. Dates: you're going to find things like formats are different, and you may have to augment those. This stuff you often don't find until later in the process. Latitude and longitude fields are a good example. You have to look at the new format, and what I'll do in those cases is create a node in the new site, use the Devel tab, see what structure it came out as, and then just try to re-implement that structure. And one thing that's been interesting is the field length got us on certain text fields, because somebody went in and site built really quickly without looking at the old source. They're like, no, I want that new field to be displayed smaller. What do you do, you just crop everything? But they said, I want to display smaller, so I'm just going to choose the allowed character length of 60 instead of 150. And what happened on the migration is, well, it just chopped everything at like 60. They didn't even make a new field. They said, can you just keep the old one and, like, do something with it, transform it, whatever. There's no, like, "summarize it at 300 characters" thing. So that happens, right? And also this is when you're splitting the blobs. It does take a lot of time to go into structured content from, like, a known-unknown bunch of random HTML. Okay, migration mapping: the basic page, tags, select list, the image field is mapped to a specific media type. Hey, let's add Spanish. So at this time, wait, you just added that language? Wait, no, it means five, and we already architected the site. Oh, that's going to add some time and effort. So at least it's in the architecture phase, we haven't built anything. So now we're building, and we have to re-estimate the work based on the knowledge we have, and we have to go back to the site
architecture if necessary, because we added Spanish. We have to re-educate the client, because this phase of development is like the hunker-down phase. So they might be like, why aren't you talking to us as much anymore? Because we're hunkering down. This is where you told them about their own data, what was that thing three years ago, like, oh yeah, yeah. And at this point we're building the site, just building. But then we keep going, and, well, actually the biggest thing is we start a checklist. This is where we're starting all of the migration lists that we're doing and how we roll back the site, so very early. And once again, the site building must be finished, or at least finished for a specific feature, and then you can move that on in parallel. So within our migration mapping spreadsheet we might mark one as ready, so we make that tab green, and that means go ahead. One might be red because we still need the client to approve the fields, right? And then we're going to do the migration dependencies and develop the code, and the developer is responsible for the first population of how long it takes for a migration pass. When you do it full bore, how long does it take for 50 to 50,000 users to happen? We're going to get through the other ones, they're a little bit briefer. But the big thing is, don't over-engineer, and don't let your team over-engineer, because you're only doing this once. Yes, if you're doing incrementals, you're doing migrations more than once, but that should just be pulling the new engine, you shouldn't be going back and redeveloping it again. But yeah, you are not making a work of art. You're here, Darryl, this one's for you: the max joins on MySQL databases is 61. Because of those content types with over 150 fields that are split out and joined together. And don't be fancy. Like, this is one database table, I have one database table. Keep it flat. Don't try to overcomplicate it. It will run faster if it's really simple. Yeah. Documentation along the way is your
friend. Comments and user-generated migrations need the parent entity to exist. Make sure you watch the published and unpublished status of the source. When you're developing and it's very slow, this is a typical indication of a memory leak. There's ways around that by batching things and running them as, like, units of 100, but it can also be easier to swap out whatever you're using that's causing that memory leak. Yeah. And I bring this up again: splitting a body field into structured fields, good luck. Everyone likes DOM XPath parsing, it's great. And at this point for our Nexus Travel, we hit the 61 join limit. The "share my trip" migration ran out of memory and had to be batched in groups of a thousand. The network latency at one of the developers' homes is really high and the bandwidth is low, and that can impact migration run time, especially for file copies. That's Darryl, at the end. This is our story. This can be a case where, if you're really familiar with working on servers, maybe you want to have a little server that you set up to do the migration, so you can just SSH in and have it, like, on the backbone. Anyway, so now we look at our data, and we have some migration times here. The big juicy one is that trips can take 16 minutes to 120 minutes, and so we're going to look at that one and a couple of these. The members take about 90 minutes. So this is important when you're communicating back to the client, like, it's going to take four hours to run the migration, so do we want to put the site down for four hours while we do the transition, the cutover? This prepares you for that conversation, right? We also track incrementals, but the grid was too big for the document. But, like, it will work. Is everybody okay on time? Okay, we're finishing up. Okay, then you have to do some pre-production migrations, and you just keep running the migrations to see if things are working, if there's no regressions that happen. This is when the client tests the data, and of course they're going to do something else. Like, you have
to populate the bulk of the data, really. So, oh yeah, and we talked a little bit about how you estimate the duration of the final go-live migration. Yeah. And at this point the client says, we missed something, can you add it to the migration? So this is kind of the most salient point: people are going to hit this point where they go, oh, we forgot, we just completely forgot about this. We're pretty much done, and we've got to go back, and that's where the costs start to add up. So you want to try to contain that. This is why we quantify things, so we know it's not in scope, we know that we have to go over our budget. We're really close on our budget of time here, but if we do the math: so they asked for the trips, and the trips had a two-hour migration. So per migration, they're like, oh, I have to do two hours of redefinition with the business, because it's a couple of meetings. I have eight hours of development, perhaps. There's two hours, basically, times four rounds of migrations: for the developer to run migrations, the client testing to run migrations, the migrator to run a migration, maybe to the production site or the pre-production site, and then the data site QA. So we've got like eight hours of testing and deployments, and two hours of deployment overhead. At that point there's a total of 20 hours. You're like, yeah, that's cool, that's a day. No, it's not. That's a week, because it takes, like, meetings, and then building, and then test, test, test, test, test. It's a week, and it's about two to four thousand dollars in addition. So that "oops, I forgot" costs money and time. And now they're looking at you like, why are you giving me an invoice that's bigger? You're gonna fix this. Oh, anyway, as you develop... I'm gonna make it look prettier, I haven't forgotten to search for those fixmes. You want to do your site architecture, then right behind it, at the same time, you're doing migration mapping. And you can start the infrastructure early too, because you have enough infrastructure stuff defined. So get your hosting set up, and then
you're like, okay, I'm gonna do that initial scaffolding. Some of those features are defined and the mappings are solid, let's get that started. Then that gives something for the developers to start developing migration passes on. Then the themers are like, why can't I see this with real content? They will then have some real content to look at. And at that point, when it kind of all comes together, the client can be really into testing and bug fixes and incremental passes. So that's kind of what it looks like. It takes a little downtime, or quiet time, before the client can really actually touch the site and play along with it. So we're going to do this one really quick. The production migration phase is really where the rubber hits the road. So we're pretty much, quote-unquote, done, and we have one minute for our talk, but we're going to go through this one last time and just run the migration. And the client's going to come to us and say, oh, we forgot one more thing. And then we go, okay, well, can we go live or not? Are we going to do that thing again and figure out all the new things and figure out what our budget is? And there's a point around this space where that new live server really becomes production, and so you have to treat it as production gold. You can't just chuck migrations at it again, going, like, oh no, that doesn't work, and now it's a big mess, because you've invested all this time and data. So it's real. Remove all those fixme pages. And if you want to do more stuff, and you're like, really, do you need that? Then it costs more, because, oh, they asked us for something that depended on three other migrations. So now you're at like two calendar weeks, it's like 12 to 24,000 dollars. It's overage, it's real. This is real. Oh, but why didn't you go live? Can you do that on the weekend? Can you just run those migrations? I've worked every weekend for the last six weeks. Then you go live. So you use your go-live checklist, you check your DNS, and your DNS, and then you
check it again after you go live, and you check it again the next day, and you keep doing that a little bit. And I think the biggest one here is that you have to practice over and over, because it's your A game. Stuff will happen, and you have to relax with it and just make sure that you know that you've done everything you can. And your team is probably going to be super tired, right? So nobody's going to want to deal with the midnight, like, well, crap, something happened. But yeah, you know what, on this site, the DNS for the host... so, upstream vendor problem. Vendor problem. We couldn't have planned for that. But, well, we have a mitigation strategy. Yeah, well, there is no mitigation for that. The vendor... I cannot hit the vendor sites for the source or destination at all. There is no mitigation, you're just screwed. Here it turned off for four hours, and that failure was identified on one of the backups that we had asked to run, and thank goodness not a migration pass, because if you had run it on a migration pass, you'd have to roll back as you're going live. That sucks. That's hard. Everybody's, like, on edge, and it will add time to your go-live. So if you had a window with your client to go live in two hours, because that's what you expected, and it's going to add six, then that's not good. And the DNS... but our DNS propagated, finally, so ours was fine. This is a real story. Finally, the last step is post-launch validation, and you can't underestimate this, right? So is the site working? Are your redirects in place? Did the DNS completely propagate, or was there some old TTL value that's getting in the way? Maybe you want to mitigate by... Fancy. Your artifacts are the speed tests, SEO tests, error logs, and lots of feedback from site users. And for Nexus Travel, for the site, we saw the 404 log showed some missing redirects, so we added them. They were, like, super old ones that were on their Apache. There were redirects in their old server that we didn't see when we were in Drupal. And the vendors were happy
with their new features, and they were, like, informed that the features took a little bit longer, so the client must be happy. And some data expected by members was, like, missing, because that was new features by the client that, like, surprised their member base. So that's okay, it's the client's problem, but they told us to do that. So the takeaways from this are: incomplete requirements equals rework and increased time and cost, and the more dev and more testing, it's just going to get bigger and bigger, bigger and bigger. And your team may change over time. Your team will change over time. And write everything down, because you might pull other people on to help you with targeted migration work. So if you have everything written down, like on the last project, I'm like, hey Ryan, can you come in and just lend us a hand for a couple of passes? He's like, oh cool, you have your migration mapping done, cool, thanks. And it's like, yeah. Yeah, it's much easier to start when you have everything presented to you in one document and you can just go, I'm just going to code really quick. But normally you'd have to go, okay, how do I find out about this? And you have to look all over the place. Yeah. So thank you, everybody. Join us for Contribution Sprints on Friday, and please go to the session page and give us a rating, and we will put the whole slide deck up so you guys can have the 10 phases. And this will be on video. We're only 5 minutes over budget. Oh, that's actually pretty good. What percentage is that? Is that like 5%? Just under 10. Yeah, just under 10, sorry. Yeah, we actually set the overtime budget expectations. I did want to start 5 minutes early. We also did that, so technically we were like 10 minutes over budget, but that's okay, it was the value. It was a value-based overage. The baby was, like, kind of straw and I'm like, ooh, that's my baby. Yeah. Are you okay? Yeah, I know. I can tell. I know, I know, I know.