That's when you do the serious development, and it's an iterative process. As I said, the target, the Drupal site, is going to be evolving out from under you, and meanwhile you're also discovering more things that may need to be accounted for. So this is a cyclical process. You're going to iterate: you're going to run your test migration, you're going to find you need to go in and tweak it, and go back again. And once it's all done, you launch.
So, the analysis phase. Usually the design of your new site is done from the top down. Your designers have looked at the old site and sort of imagined what it'll look like in Drupal, and they've taken the visual elements, what they can see on the existing site, and designed the Drupal site to accommodate those. But there's a lot to a website that isn't directly visible. So it's important: as the migrator, you're going to be getting very intimate with that legacy data. You need to dig down into it. You've got to find those pieces that no one else has thought of, and you need to surface them and communicate them to the rest of the team.
I just said that, pretty much, and I pretty much said that too. Getting ahead of myself again. So it's really critical: you have to be very thorough, very methodical. When you're analyzing the data you need to look through the source data table by table, row by row. Make sure you're accounting for everything you find, and you're going to find a lot of interesting things. It's amazing how often we've found that whatever system we're coming from had, at some point in the past, been migrated from something even older, and there are always little bits and pieces from that ancient first-generation system still lurking around. You need to watch out for those. You'll see things like odd markup in your older data that you're going to need to massage to make it look nice in Drupal. While you're going through this you're going to be acting like a three-year-old. You're going to say, "Why? Why? But why?" There are a lot of WTFs in the legacy data, always.
So once you've done your initial analysis, as early in the project as possible, you want to start implementing your basic migrations, and with the Migrate module it's actually very easy. You can very quickly do an initial analysis and an initial implementation that will get, for users, just their usernames and email addresses. For nodes, you can very quickly put together something that at least gets the titles and bodies. Start getting something concrete that people can look at, so they know you've been doing something. And based on your analysis you can actually fill in a fair amount right from the beginning. Some things are fairly obvious: it's usually very obvious where the titles of your articles come from. There are other things that you don't really have to think too hard about: say there's a status column where 98% of the values are 2.
That probably means "published." There are things that are a little less clear, and then there are your WTHs or WTFs, as you prefer. And this is where I'll talk a little about the Migrate module. In a normal development process you're going to be explaining what you're doing in your mappings in code comments, but no one sees those except the developers. What the Migrate module lets you do is put the comments in the mappings themselves, and the mappings are displayed through the Migrate module's UI, so you can share what you've learned and also what questions you have. They get annotated there. They can also be exported, which we'll see in a moment.
So once we've got the basic implementation done in the Migrate module, the next step is the meetings. And I have to tell you: my name is Mike. I am a geek. I am never happier than when I have my nose buried in my monitor, coding away. I don't want to take time from that for meetings. So you must believe me when I tell you that you need to do these meetings. You need to sit down with everyone.
You need the technical resource, whoever it is on the client side who understands the old data. You need the stakeholders to tell you what's important and what's not. And you need the site builders, because you're going to be finding new work for them. All these people need to get together, whether by teleconference or in the same room, and review the data methodically and tediously. This is really critical. You can't skip this step. Much as you may think you understand the data from just looking at the tables, you're going to learn more from these meetings.
So, as I said, with the Migrate module you can build out your mappings in the module and you can export them. What we do is export them and pull them into a Google spreadsheet. At these mapping meetings we pull up the spreadsheet and everyone's looking at it. Going down the left side: one nice feature of the Migrate module, if I do say so myself, is that it automatically detects everything that's available on the Drupal side, all the fields that are available to be filled in. Also, as a result of our analysis, we've listed all the source fields that are available, and you see those in the middle columns. Where you see the two lined up, we've already mapped those fields. It was not that hard to figure out that the forum name in Jive should become the title of the forum node in Drupal. Then the default column: some values can be hard-coded when you're migrating. In this case we want all our forum nodes to be owned by the administrative account, that is, account number one, and you see that in the default column. And finally, as I said, you can comment on these mappings within the mapping itself, and we've exported that: that's the description. So in this scenario we've got these four fields at the top. We've looked at them and we have no idea what they are. So we have done two things with them.
We've added a description asking the question: what is this, and do we need it? And you may notice in the first column we've got sections. We've grouped these under Client, Done, and DNM. We assign each mapping to what we call an issue group, and by convention we usually have five or so issue groups. Working from the bottom up: DNM means "do not migrate." These are things we know we don't need to deal with, and we are always happy to put things in that group. Above that, Done: those are the things we're confident about. We know what we're doing and we've implemented it. We've got the names going into the titles; that's beautiful. The other issue groups represent the different people involved in the project, and basically we're assigning these fields to those people. Right here, the next step to be taken with these four fields is for the client to tell us what they are and what to do with them. A very typical workflow is that these fields will migrate through different issue groups. If the client tells us at the mapping meeting that this forum index counter field is something they really, really need, then the next thing we do is change the issue group for that field to the site builders, because the site builders are going to need to add a field to the Drupal node to hold it. Once they've added that field, the issue group becomes the migrators: it's now our job to implement the mapping. And once that's done, we move it to Done and we are happy.
So once we've gone through this process and identified all our mappings, we implement them. And no matter how good you feel coming out of the mapping meetings, however much you feel you've addressed everything and figured everything out, more things are going to come up. Again, the Drupal destination site is going to keep evolving. And even though you think you've found every table and row, there's going to turn out to be some odd little out-of-band thing: "Oh, we have these XML files off to the side that we forgot about." Or, once we dive deeper in and start doing full migrations, we might find that the bodies of the articles have some sort of custom tags in them, and we have to figure out what to do with those. So we iterate: we implement, we test, and we refine.
A critical thing, and this is again something you want to get set up as early in the project as possible: you set up a staging server. You set up a server that every night is going to pull in the latest code from your repository, build a Drupal site from scratch using Drush make or Drush site install or whatever it is, and run the full migration. Ideally this site should be visible to everyone who's involved with the project. It's good for you as migrators that they see you're making progress, and it's also good because they're going to see things they don't like. The first time that staging server comes up, you think you've got everything: after the mapping meetings you've debugged your mappings and everything looks great. And the first time one of the stakeholders looks at it, he says, "Where are the related articles?" And you say, "What related articles?" This always happens. There are always these disconnects, these things that were left on the table. The customer just took for granted that this was part of the article, so of course you knew it was there and you had to bring it across. You didn't know about that.
This is something you don't want to find out the week before launch. So get that staging server up and get people looking at it as early in the development process as you can.
Now, one other thing I want to point out about this stage, the main development phase, which is the longest stage of the project, is that there's kind of a long tail of development effort on the migration. Your migrators are going to be very, very busy immediately after the mapping meetings, but as the site build stabilizes and as you work out all your problems, the migrators are going to have less and less to do until you get close to launch. So when you're resourcing these projects, that's something to keep in mind: whoever you have doing the migrations might be fully utilized at first, but somewhere in the middle they're going to be half utilized, or they're only going to need to put in, you know, ten hours a week on the migration.
So after all of that we're ready for the launch. Typically what you want to do is get your eventual production site established and get your code deployed there. And usually, if you have a large volume of data, if you have a million nodes or something like that, you want to do an initial migration of that data a day or two before your launch. Get it in there.
You don't want to be waiting eight hours for your migration to run on launch day. It's a subject for a more technical talk, but if you can at all, you want to set up your migrations so that they use high-water marks, that is, some sort of flag that tells you when content is updated or inserted, so that the last migration you run on top of this one can very quickly find what's changed. Even if your full migration is an eight-hour job, your final migration, the delta of whatever changed in the last day or two, is hopefully a 20-minute job, or at least less than an hour. So at the designated time (and hopefully you've been communicating to your site users, your customers, that you're going to be making this transition and that there'll be a little time when you're not accepting new content, no comments and so on), you set the old site read-only. You run that delta migration and pull in the last couple of days' worth of content. One quick look to make sure that last bit of content did make it across, and you switch over the DNS or your load balancers, whatever it takes to bring your Drupal site live. We're done. The best part.
So, just a quick review of some of the main elements of success in a data migration project. Get started on the migration as early as you can. Start analyzing as early as you can, get to the mapping meetings as quickly as you can, get that staging server up as quickly as you can. And each of these pieces doesn't have to be perfect. The faster you can get people looking at your site, looking at the results of the migration, the better. You can always refine it later. Again referring to the Migrate module, it makes it very easy to import, roll back, and import again.
So it supports iterative development. And communication all along the way is very important. You as migrators need those technical resources to be responsive when you have questions about the legacy data. You'd also like the site builders to be proactive in telling you when they're going to add more fields that you need to migrate into. And you also need to keep everyone else informed when you're discovering the new little tweaks and strange little things in the legacy data.
Now I'm going to back up a little, to before the beginning of all of this. A common question we get is: how do you estimate a migration project? How do you figure out beforehand how long it's going to take? Typically people's first thought, when they come to us and want to know how long the process is going to take, is to say, "Well, we've got a million articles. We've got a hundred thousand users. How long will it take to migrate them?" I don't know. If you think about it a little more: what is going to be harder to migrate, a million articles that are all just like each other, with three simple fields on them, or five different content types with anywhere from five to 20 fields on each? You've got recipes.
I've run into recipe migrations a couple of times. Those are lots of fun, because they have complex internal structure. What matters in terms of the difficulty of a migration isn't the volume of the data. The main thing volume affects is the launch: how long it takes to run that big migration before the launch. It really has very little effect on the development time, which is the key point. The key is the complexity of the migration: how many different things you're migrating and how complex each of those things is.
We do have a very simple formula. It's a little bit arbitrary, but it at least helps us get a ballpark estimate. The most fundamental aspect is the number of migrations, and one migration in this context is basically one combination of a source thing and a destination thing: an article on your legacy site to your new Drupal article content type, users to users, categories to vocabularies, and so forth. And, sort of arbitrarily, we decide that each individual field is about a fifth of the effort of the whole migration. Developing the migration itself means figuring out what the query is to get the source data, and there's a certain amount of overhead there. As for fields, most of them are very simple, straightforward mappings, but the ones that are not will end up sucking up a lot of your time, like those recipes. So that's sort of the first-draft formula. For the second-draft formula, I don't show the full one here, but some migrations are easier than others. Taxonomy is pretty easy; files can be a lot trickier. So you sort of weight those. And, as I said, there's a magic factor, and that's the secret sauce. That really depends a lot on the experience of the people who are doing the migration, and on other factors like whether you're migrating directly from a database, which is generally the easiest thing to do, or from some XML feed (and if it is an XML feed, how complex that feed is), or from CSV files or some other source.
So that's just about it. I just want to add one little note at the last moment for the more technically inclined here. This is actually a question that came up after Drewish's talk. There's some talk of putting the basic Migrate API into Drupal core for Drupal 8, and we're going to have a BoF session tomorrow at 1 o'clock in room 210 to talk about that. So, any questions? Feel free to line up at the mic so everyone can hear what you're asking.
So I understood a lot of your discussion was about mapping content fields, for instance from a database in an old-style content management system to Drupal's. But what are some of the issues if you're dealing with files that maybe don't have references to them in the old database? I know you alluded to the file issue taking more time. Could you amplify a little bit?
Well, what kind of files? You say they're not referenced from the database. Are they referenced perhaps from the content itself?
You mean like hrefs?
Well, that's one of the things. Sometimes you're pulling article bodies, and the article bodies contain image tags that refer to images, and you've got no other way to pull them. That's one of the things that makes files more complex, because then you have to develop code that's going to parse the bodies, find those references, and pull them in. Files are challenging in a lot of ways. They can come from a lot of different places. You might be scraping them off websites.
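As an editorial illustration of the body-parsing step just described (finding image references inside article bodies), here is a minimal sketch. It is written in Python using only the standard library, not in PHP against the Migrate module, and the names `ImageRefCollector` and `find_image_refs` are invented for this example. It assumes the bodies are HTML fragments and only collects the `src` values; fetching or copying the files is left out.

```python
from html.parser import HTMLParser


class ImageRefCollector(HTMLParser):
    """Collects the src attribute of every <img> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this,
        # and self-closing tags (<img ... />) are routed here by default.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def find_image_refs(body_html):
    """Return the image paths referenced in one article body, in order."""
    collector = ImageRefCollector()
    collector.feed(body_html)
    collector.close()
    return collector.sources


if __name__ == "__main__":
    body = '<p>Intro <img src="/files/cake.jpg" alt="Cake"> and more text.</p>'
    for path in find_image_refs(body):
        print(path)
```

In a real migration the returned paths would still have to be resolved against the legacy site or file system and then rewritten in the body to point at the new location; that part depends entirely on where the files live.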
They might be blobs in databases. You might be able to, and this is often the easiest thing, just copy the files to a file system that you can mount, and either copy them or even link them directly into your Drupal system. Drewish alluded to it, and I'll admit it up here: the handling of files is the weakest part of the Migrate module, and the main theme for the next version of the Migrate module is trying to make that easier.
And just one more thing about file sources. Migrate has terrific support for migrating from XML files. So if you happen to have huge XML files and directories of XML files, Migrate is quite well suited to moving all of that data into Drupal.
Yeah, so the question was: what about HTML files? Really well-formed HTML files can be read in and the data can be parsed. With messy HTML files, SimpleXML won't do the job, so you need to look at other tools. One of them is QueryPath. Another is Beautiful Soup, in Python. You can find tools that will open up HTML files and get the relevant bits of data that you need.
Hi. Just a couple of resource planning questions. What's the minimum skill set to do a very simple kind of migration, and what's the maximum, worst-case skill set for a very complex one?
Well, the Migrate module is very much a developer's tool. The baseline is experience with the Drupal API, comfort with calling things like node_save; PHP, of course; and more particularly familiarity with PHP's object-oriented support. The Migrate module is an object-oriented framework, so you have to be very comfortable with classes, what an abstract class is, static functions, and so forth.
Okay, and this is probably the wrong session. I thought this was going to be the technical session; that's my mistake. In terms of debugging in Migrate, when you're a developer and you're in the middle of it and something's not going well, what kind of tools are available to you?
What kind of tools are available to you? um, I Hate to admit, but I use a lot of drush print statements. Okay, and they're key points Like prepare row prepare complete I throw in some drush prints to see how the data is being transformed as it goes goes through Okay, thank you very much Yeah, so just elaborate on that The migrate module keeps track of every error that occurred during the migration if it wasn't able to save a particular node You will see that in the error table for that migration So you want to review that and do some correction in your code to cope with those problems Standard PHP debugging is still there. So things like X debug and firing up your debugger are really helpful for my great module debugging and Xh prof for performance You were talking about having a migration team and a site building team What if your company is small or your freelancer and you have small resources? What? Suggestions do you have on getting all that worked on? If you have a small team you have to be creative As I said it the the migration resource is Someone who's going to ramp down after the early part of the Development so they can You know be be freed up to participate in QA UAT or or maybe your site builder and your migrator are the same person And I just add to that I might suggest as well that if you have limited resources Focus on choosing someone in your internal staff that really understands the legacy data and try to outsource the actual migration to someone else But the the real inside knowledge of what your source data Did how it used to behave? 
That's something you can't really outsource. That's something that someone with real, deep institutional knowledge has, and that person is going to be very deeply involved in that.
I know this is supposed to be for questions, but I've been buried in Migrate for the last couple of years, and there were some points in your presentation where I'd like to share some of my experiences, because I think they might be useful. I hope that's okay.
I just want to point out that this is Christopher Woodlum from Martha Stewart, and he actually made many of the same points I made here in their presentation yesterday.
One of the things that we learned: we were talking about being a code geek and really loving being in the code, and then we were talking about finding people who are going to be doing this work of mapping and analyzing the source data. One of the things I've found over the years of working with data (I'm very much a data geek) is that when you're dealing with developers, there are code geeks and there are data geeks, and they're very different kinds of people. So when you're thinking about which resources in your company you're going to assign these tasks to: some people really get what data is about, and they get excited about data. People who don't have that inclination are not going to discover the issues as quickly, and generally are not going to perform as well. It's not because they're not smart enough. It's not because they're not trying hard enough. They just don't have that mindset. So find people who get jazzed about going through a spreadsheet with 180,000 lines in it and figuring out where the discontinuities are. There are people like that, believe it or not, and those are the people in your organization that you want to put on this project. If you put people on it who couldn't care less, they're not going to perform. That's been my experience.
There's no motivation that I can give them. It's just in their nature: they open that spreadsheet and they just want to go to sleep. That's one thing I wanted to say.
The other thing I've noticed (and I think these are just human factors that I want to contribute from my own dealings with this): when you raise the beer glass in the "celebrate" part, my experience is that instead of the beer glass, it's panic. No matter how hard I've tried to get people to QA the staging site, most product owners don't really take it seriously until it's live. And again, I think that's human nature. I can go in there and, in meeting after meeting after meeting, say, "Look at the staging site. Look at the staging site. Have you checked everything out?" They say, "Yeah, it looks great." Well, what they've done is browse the five top channel pages. They've gone through some of their favorite articles, all the stuff that's being promoted, which is usually your highest-quality data. But then somebody starts searching, and an old video from 2002 shows up and it's all messed up. So that's another thing, in terms of that long tail you're describing. You're absolutely right, the development resources will diminish towards launch. But you might want to be ready for the fact that after launch, suddenly your migration team is going to be working 12 hours a day for the next three days. So that's all I wanted to share.
Thanks, that's a good insight about the data geeks. I have to admit, as someone who is very proud of constructing a clever SQL GROUP BY, I am a data geek. And on the latter point, it's really important to have good project management: someone who is going to get all those people into those mapping meetings, someone who is going to get those stakeholders to look at the staging server.
You mentioned quality assurance. There seem to be two critical places where you have to be sure you're right. One is when you finish your model. Do you have any guidance on how you know that you really did understand the source and that the model is correct?
I'll take a shot at that one. Mike showed that the Migrate module actually exposes each individual mapping for each of the migrations. The second half of those mapping meetings is about looking at those web pages for each migration, and you can actually go to the trouble of getting sign-off from product owners who say, "Yes, all these mappings look correct to me." I agree, it's definitely a critical QA point, and I think the Migrate module helps with that part of the process a lot.
The second one (I'm older, but) is test and evaluation of the final product. You mentioned a million nodes. How many of those do you look at to know that it all ran well? Do you have any idea? Do you look at 10? Do you look at a thousand?
What you try to do is identify a diversity of nodes to look at. You don't just look at the 10 most recent articles, which are probably going to be the cleanest ones. In your analysis you try to identify the outliers. You find the ones that have no categories versus the ones that have 10 categories on them, and look at samples of those. Look at the ones that have no markup in them versus the ones that have a lot of links and other fancy stuff. There's no way you can look at everything, but you can feel fairly confident that you're on the right track if you've got a good diversity of samples to look at.
I'm curious to know more about the high-water mark that you mentioned, when you make the initial dump that might take eight hours to migrate and then you want to do a delta dump when it's in the read-only phase.
At my company we haven't done anything big enough that we've had to do it in separate stages like that, but it's bound to happen. So I'm curious how you manage that: some more detail about the delta idea, the high-water mark idea.
Sure. Well, you're right, the ideal scenario is that every item in the legacy data has an updated timestamp on it. When the item is created, that is set to the creation date, and every time the item is changed, that timestamp is updated. Then, when you implement your migration with the Migrate module, what you do is sort your source data by that updated field and query only for items whose updated timestamp is greater than the last one you recorded. Every time you run the migration, you record the highest one. So that is a very efficient way, if it's available to you, to identify what's been created or updated. If you don't have that, then it can be a challenge. You can always identify new data, because the Migrate module maintains a map of what it has already migrated, and sometimes you just end up letting updates go and only pulling in new things. The alternative is to do a full update, just to make sure you've got all the latest data. It's so much easier if you have that updated timestamp, though, or some other means.
Another option, and I believe we did this with Martha Stewart, is if there's a transaction log of some sort in the old system. We had this for some of the content with Martha Stewart, so we had a special migration process that would read through the transaction log, look for updates or deletes, and then mark content for update or for delete. While that's not quite as clean as the high-water mark, it's still a lot more efficient than doing a full re-import.
That's about it.
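As an editorial illustration of the high-water mark idea described in the answer above, here is a minimal sketch. It is written in Python with the standard `sqlite3` module rather than with the Migrate module itself, and everything in it is an assumption made for the example: the `articles` table, its `updated` column holding an integer timestamp, the `state` dictionary standing in for wherever the last mark is stored, and the `import_row` callback standing in for saving a node.

```python
import sqlite3


def migrate_delta(source, state, import_row):
    """Import rows changed since the stored high-water mark.

    source     -- sqlite3 connection to the (hypothetical) legacy database
    state      -- dict that persists the mark between runs
    import_row -- callback that saves one row on the destination side
    Returns the number of rows processed in this run.
    """
    last_mark = state.get("highwater", 0)
    # Sort by the updated timestamp and fetch only rows newer than the mark.
    rows = source.execute(
        "SELECT id, title, updated FROM articles "
        "WHERE updated > ? ORDER BY updated",
        (last_mark,),
    ).fetchall()
    for row_id, title, updated in rows:
        import_row(row_id, title)
        # Record the highest timestamp seen so far, so an interrupted run
        # can pick up where it left off.
        state["highwater"] = updated
    return len(rows)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT, updated INTEGER)"
    )
    db.executemany(
        "INSERT INTO articles VALUES (?, ?, ?)",
        [(1, "First", 100), (2, "Second", 200)],
    )
    state, imported = {}, {}

    def save(row_id, title):
        imported[row_id] = title

    print(migrate_delta(db, state, save))  # initial full run: 2 rows
    db.execute("UPDATE articles SET title = 'Second, edited', updated = 300 WHERE id = 2")
    print(migrate_delta(db, state, save))  # delta run: 1 row
```

One caveat with this sketch: because it compares with a strict "greater than," a row saved with exactly the same timestamp as the recorded mark after a run has finished would be skipped, so a coarse timestamp resolution may call for some overlap between runs.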
I want to just make a little pitch here: Acquia provides professional services around data migration. We encourage you guys to do migrations yourselves, in your own organizations, but if you think you need help, please come talk to us, Mike or me, or fill out the sales forms at Acquia, and we'd be happy to help you guys.
All right. Well, thanks for coming. Please complete the survey on the DrupalCon site and let us know what you thought of the session. Thanks.