Good morning, everybody. This is Managing Highly Successful Drupal Migrations. We turned the lights down so you could see the screen, so hopefully you won't fall asleep. So who are the folks up here? I'm Frank Febbraro. I'm the CTO with Phase2 Technology. Michael Morris, Vice President of Solutions at Phase2 Technology. And we do CMS and web application development for government, publishing, and community building, primarily in the DC and New York markets, but kind of all over the country. We won't actually talk about the details of the successful migrations that we've done, but at a more general level, what are the things that we found successful across many of them. But some of the ones that we've done: moving WhiteHouse.gov from a proprietary CMS into Drupal. We are currently migrating the agency websites for the state of Georgia into Drupal out of Vignette and static sites. Moving FEMA.gov and soon-to-be DHS.gov into Drupal. We've done WashingtonExaminer.com, and that was a Drupal-to-Drupal migration. And other things like TakePart.com, The New Republic and The Nation, an ESPN sports property called Bassmaster that we actually moved out of ESPN into Drupal, and the Wilson Center. So before we get started, I just wanted to take a poll of the people in the audience. First, who are the tech folks in the room that want to know about the tools that you'd use for migration? Okay, who are the people that are on a team that is actually executing a migration for a client? And who are the clients that are getting migrated? All right, so there's a good mix. So we'll talk a little bit about the process that we go through, what are some things to look for, and ways to smooth the process. Up in the top right, there is a hashtag we call the DC Migrate.
So if you have any questions while we're going, you could just send a tweet to that and hopefully we'll get a second to look through them and answer the questions. But otherwise, you know, when we're done, there's the microphone in the center of the hall right there. So what's a migration? You know, a lot of talks here have been technical and they've been focused on just the specifics of how do we get data from one place into Drupal. But we're not really just moving webpages or data when we're migrating people to Drupal. We're moving their entire organization. We're kind of moving their mindset from what they used to be doing into the way Drupal makes you think, or the way a specific implementation of Drupal makes you think. And basically we're moving the ways that people do their job every day into an entirely different system. There's some care that has to be taken when you do that in order to have a really smooth and efficient migration. All right, so first we're gonna talk just a little bit about the characters that are involved on these projects. All right, so as most of you know, there's a lot of different people that get involved in these kinds of projects and we're just gonna focus on a couple of key ones. You know, we're not gonna try to be too obvious, but there are some interesting characteristics about the roles that are being played and how you may approach some of these folks. So obviously one of the most important are the people that are maintaining the content. Generally these folks are coming off of former systems that they probably don't like. They have lots of complaints, they've been very frustrated. You know, they are completely dependent on things like WYSIWYGs, but yet they hate them at the same time because none of them are very good.
You know, some folks don't understand even basic concepts like the fact that you can write a piece of content once and then display it anywhere. You know, they may be coming off of more static web page builder type systems. So there's a lot of concepts that are obvious to Drupal developers and to web developers in general that just have to be explained. You know, they also probably write really bad markup. They're used to being able to go in and create HTML tables and put styles in there and do whatever they want. And we're gonna get into why that's all a problem later on. You know, be really careful of features that they have fallen in love with. There's probably something about their old system that they really like and that they wanna keep. So make sure you ask them and always keep them in mind. And then you have to be really aware of things like newsletter signups and analytics and ads, things that are very important to the website owners, and make sure that those kinds of things don't get messed up during a data migration. So the next group of people are what we call the technical owners. They could be managing the infrastructure that these systems are running on, but in any case, they have to maintain the technology. And sometimes they're very protective of their environments and what you're allowed to run in them. So, you know, we say up here, does it run on IIS 3? I mean, we actually get asked those questions sometimes. So you have to be careful. You just don't come in and say, oh, well, this is crap, and you have to run on this latest up-to-date stuff, because in many ways you might actually be threatening their job. So it's important to win them over, bring them along and help them understand the challenges of hosting a Drupal site.
Yeah, so this is one where you guys all probably know people like this on your projects or in your organizations. You know, this is the Saboteur, and they generally can take on different forms. In a lot of cases it could be a team that maybe you're replacing, like a previous vendor or a previous web maintenance group. So there's gonna be sensitivities to that. More times than not, in a pretty big robust website there's gonna be external systems involved, either authentication systems or data that's being pulled in from various places. And those systems are gonna have owners that you'll have to interface with. You know, they might be a little bit standoffish. So make friends with those kind of folks that you're gonna depend on to complete your job. Security guys: security is really important, obviously. It's a huge focus. There's been a lot of presentations here at DrupalCon about it. And I'm not trying to paint them in a bad light, but just be aware that you'll have to do things and check boxes and do testing. We actually talk about security review later in this, but I have run into a few security people over the years that tend to be blockers more than helpers. And then outside auditors, that's another group that you may encounter: folks that have to come in and review code and review project management processes and things like that. So, you know, I'm not saying they're all bad. They're not all trying to sabotage your project, but there's characteristics and elements about those folks where you really have to adapt to their way of thinking in order to be able to work with them. Another one is, hopefully there's always a project champion. So hopefully somebody who owns the website, whether they're paying the bill for it or whether they are an evangelist of the technology or whatever, luckily there's always gonna be somebody on your side.
So identify those folks, use them to your advantage, make sure that they can help you. In return, you know, communicate with them very clearly and effectively, make sure that you're staying within scope and on budget and on schedule. And you know, first and foremost, make sure that they feel they're really getting value out of the work that you're doing and make sure that they feel like, you know, there's a good reason that we're migrating this, you know, all of this content and moving a website from some other technology to Drupal or even from Drupal to Drupal. Along with that are expectations. So, you know, this is a really common thing that we encounter a lot when we're doing projects. You know, you're always sort of being evaluated. Is Drupal really as good as everybody says it is? You know, how, you know, you're being compared to like former systems. So, you know, I think, again, it's all just like part of an overall package and overall approach to a project that when you're dealing with different individuals, try to key in on the things that are important to them and make sure that you're addressing their concerns and setting the project up for success. Yeah, I mean, one of the reasons that we have a section called People is that, you know, I'm assuming that most people here can actually execute a good site build and technically perform everything that's needed to have a successful launch. But if you overlook the people aspect and the organizational aspect of getting somebody onto a new platform, I mean, a simply bad handling of people can cause something to be perceived as a failure instead of a success. Now we'll talk about preparation. All right, so this is a big one. This is, without a doubt, one of the most difficult parts of any migration project is estimating it. And I wish I had some kind of silver bullet that I could provide to you that is a surefire way of making sure that your estimates are perfect every time. But. No, ours are. 
So, you know, you have to really factor in a lot of things. You need to factor in organizational complexity and how many people are gonna be involved; every person kind of adds a communication line and those factors start to build up. There's a couple different ways to try to estimate. You can think of it like, well, how many people is it gonna take to do this? How many developers am I gonna need to write scripts? How many QA folks am I gonna need? You can also look at historical evidence if you've done it before. I mean, that's obviously one of the most common ways to try to do estimation. And then some of it's just gut feel. You know, if you're going in and looking at the data and you're like, man, there's just a lot of stuff in here that doesn't look good. Sometimes it's a little bit of a gut instinct. I think the most successful way to estimate the actual move of the data itself, and today we're talking about a lot of things, but we will talk about actually moving content from one system to another, is to try to get a look at it first. Don't just throw a number out there based on the number of pages. Really look at the source data, try to evaluate it, understand how complex it is, understand the relationships. And then you have to factor all this in and give some kind of estimate. And how you present the estimate in a lot of cases will affect the opinion of it. If you go incredibly detailed, down to line items where you have tasks estimated at one hour, you might be conveying a false sense of fidelity: that you know every single thing that's gonna happen and, down to the hour or half hour, how long everything's gonna take. So you have to be careful with the level of detail that you go into with your estimates because you might be setting up the wrong expectations. So when you do a migration, there's lots of opportunity.
You're coming out of a lot of times old systems, you know, unstructured content, things like that, and it's a really great opportunity to reorganize the content. So don't necessarily just, you know, take these blobs of HTML and move them over into a new system directly. Move the data into the new technology. So reevaluate your information architecture and your navigation, whether you go from unstructured to structured, if that's actually possible. You know, evaluate a better use of taxonomy and navigational structures. And obviously it's a great time to refresh stale outdated designs and sometimes brands. Yeah, I'll add to that too. I mean, one of the things that we encounter a lot is you really have to do this in order to even get the project paid for sometimes. I mean, it's a hard sell to just say, well, I'm just gonna replace the underlying technology. But you know, the people that are paying for that will like, well, so what's different? You know, what I really get out of it. They may not really understand the value of that. So it really is good to, when you're gonna undertake all of this technical lift to try to make a move like this, make sure that you're giving the stakeholders and the users of the site and the reader something out of it by making it look better, making it navigate better and making the content better. So it's a really good opportunity to kind of take a holistic approach to improvements. So site architecture is another thing that you obviously encounter when you're building a new site in Drupal. And you have to basically, I mean, it goes without saying, you have to take into account the needs of the organization and what they're trying to do with their content and how everything is gonna be structured. There's no sense in moving over all this content and doing like a big lift there to be putting it in a platform that's either flimsy or not suited to their goals. 
So some of the decisions that you have to make, are you gonna build their site on a distribution, something like OpenPublic or Open Academy or something like that, are you building a platform for them? So are you basically building a customer-specific distribution that can support multiple sites, whether through your standard multi-site or something more like a virtual site or a domain access model? Are you gonna use organic groups? And then kind of take it down another level. Do all the tools that they need already exist or are you gonna have to start creating some tools or some concepts that don't exist? So you're gonna have to create new layers of functionality that kind of combine many building blocks that are already out there. A lot of these decisions, you should know upfront there's nothing worse than kind of being three quarters of the way through a project and you don't really know how you're gonna solve a particular tricky aspect of how they wanna present data or organize their content, only to be presented with the fact that you actually can't or it's gonna require another month just to get it done. This one's pretty self-explanatory, but we felt pretty obligated to talk about it. I mean, clearly any project that's going to be successful requires good communication, good planning. One of the things that we're always grappling with is kind of the right level of transparency. We work for a lot of people that really wanna see every little detail. They wanna get into our ticketing system and evaluate every single thing. And that can be great, but they're also gonna be seeing the sausage being made and that's not always a great thing. So yeah, make sure you have good project management, good folks that can plan out these activities because there's a lot more to it than just simply the technical aspects of it, which is what we're gonna get into next. So some tools and some tips. 
Not these kind of tools, but we're gonna talk about technical aspects and some considerations when migrating. So the first thing we're gonna talk about is data sources. Mike said earlier, it's important to get a look at the data that you're moving from. And some things that you need to really carefully evaluate: what format is the data in? Are you getting it as spreadsheets? Is it static HTML files, direct database access? Sometimes you get these proprietary formats, and if you get those, do you have the right libraries in order to extract and investigate the data? Is it structured or unstructured content? The difference is, is it just some blob of HTML that represents a page, or do you have the title separate from the subtitle? Is the author separated, or is there a separate section for tags? A lot of that will help determine the level of effort required to get it into the new system. Another question is how do you access the data that you're given? Do they email you files? Do they put it in a Dropbox? Do you have FTP access? Do you actually have an API that you could use to extract the data? Do you have direct database access? And a really important thing: are you actually relying on other people in their organization to give you this data? And if so, what's their responsiveness? And then another thing to consider when you're looking at the data is where you are migrating it. Is it all going into nodes? Is it going into taxonomies? Do you have to build menu structures? What about media, things like images, PDFs, video? Do they want to migrate their blocks and callouts and the little custom ads that they might put on the sides? Do they want to actually migrate landing pages themselves, or are those going to be built after the fact? So when we're talking about moving data, it's important to get a process that you can run, verify, and rewind, so basically pull the data back out, tweak it and run it again.
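As a rough illustration of that run, verify, rewind loop, here is a minimal Python sketch. It is not how any particular Drupal module works; the `Migration` class, the `legacy_id` field, and the in-memory `destination` dictionary are all hypothetical stand-ins for a real source system and a real site. The one idea it tries to show is that remembering which legacy item became which new item is what makes a selective rewind and rerun possible.

```python
class Migration:
    """Toy re-runnable import that remembers legacy ID -> new ID (illustrative only)."""

    def __init__(self, transform):
        self.transform = transform  # function applied to each source row
        self.destination = {}       # stand-in for the new site's content store
        self.id_map = {}            # legacy_id -> new_id
        self._next_id = 1

    def run(self, source_rows, only=None):
        """Import rows; rows that were imported before are updated in place."""
        for row in source_rows:
            legacy_id = row["legacy_id"]
            if only is not None and legacy_id not in only:
                continue
            new_id = self.id_map.get(legacy_id)
            if new_id is None:
                new_id = self._next_id
                self._next_id += 1
                self.id_map[legacy_id] = new_id
            self.destination[new_id] = self.transform(row)

    def rewind(self, only=None):
        """Remove imported items: all of them, or just the listed legacy IDs."""
        for legacy_id in list(self.id_map):
            if only is None or legacy_id in only:
                del self.destination[self.id_map.pop(legacy_id)]


# Hypothetical source rows and a trivial transform.
rows = [
    {"legacy_id": "a-17", "title": "  Press Release "},
    {"legacy_id": "a-18", "title": "About Us"},
]
migration = Migration(lambda row: {"title": row["title"].strip()})
migration.run(rows)                    # first full run
migration.rewind(only={"a-17"})        # pull one item back out
migration.run(rows, only={"a-17"})     # tweak and rerun just that item
```

In a real project the map would live in a database table rather than in memory, so that it survives between runs.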
So getting these automated processes of moving data in place is very important for verification, and so is having a dedicated content verification system that isn't the system you're developing on at that time. So another thing is when you're moving data, especially from an old system that might actually have something resembling structured data, if you're gonna be able to run it, rewind it and repeat it, or update just individual content items as you tweak your migration, it's really important to track what the legacy ID of each item is and what its ID in the new destination system is so that you could do selective updates. And another thing is to create reports that list content where you think there's a discrepancy or something that needs to be reviewed. It's great if you could somehow identify content that you think could be a problem so that there's a checklist for people to verify. So specifically, some ways of moving data from one site to another: you could do direct SQL queries out of one database into another. You could extract data through RSS or Atom or XML out of one system and then use a module like Feeds to import it into your new system. You could use the Views module; let's say you're moving from an older Drupal system. You could add the Views module if it's not already there and you could create APIs to get at their data as XML or JSON, and then use that with modules like Migrate or custom scripts that you've written to transition the data from one system to the other. So I mentioned the Migrate module, which is a really great OO approach to doing automated migration and transfers. It has a pipeline process where you could tweak the data and reformat it and structure it on the way in. And then there's also professional services like Acquia's migration service, so you could actually outsource it if you think it's a heavy lift or it's a system that you're completely unfamiliar with.
It's a technology you've never seen and you feel too intimidated or you want somebody else to be on the hook. Frank, why don't you talk a little bit about the importance of going through node saves and things like that versus just writing directly into the database. Yeah, so as you're building the system a lot of times modules will, they'll kind of intercept save operations and do things like when you save a node through the node edit form, it'll take the title and it'll create a path alias for it so you can have a clean URL. So if you just extract your content from a source system and put it right into the node tables, you miss operations like that that happen and you have to account for them. So that's why modules like Migrate are really efficient is because they actually put content through the normal saving process and updating process that it would go through as you were using the CMS yourself. So a lot of things like, if you have automated tagging services or like I said, path aliases and things like that. So once all this data makes it over usually looks perfect, right? Boy, so this is where the fun starts. For folks like me, I don't do a lot of the script writing anymore but I work with the teams that are doing QA and kind of looking at this data and trying to make sure that we're getting an actual nice looking website out of it. So you'll find everything. You'll find everything from just poorly formed HTML. You see a lot of, sometimes it's really crazy stuff. People will just take embedded JavaScript and flash tags or whatever. Just things that if they had the flexibility to put whatever they wanted to in the body of the content of the former system, if they had the ability to, then they definitely did. And then other things which aren't quite as malicious but are things that they do, inline links. 
So if they're linking within their website and they're hard-coding paths and things like that. Inline styles: you always encounter a lot of styles and you have to make sure that you are handling those properly. Either have equivalent styles on the new system that can match up and treat block quotes and things like that the same way, or come up with translations. So writing scripts that will grab a style and rename it to something else, a new style that's gonna match your new theme. Let's talk a little bit about what are some of the good and bad things about moving data. So if you just move the data from wherever it was to your new system, it'll work, but it's not necessarily the best thing. Something better might be, like I mentioned earlier, as you're moving the data over, to try to identify if there's embedded links and there's links to PDFs or image references, and flag them so that at least you can have a list of things that people need to review manually. Even better is to perform scripted cleanups, so as you're migrating the data, use things like regular expressions and strip out inline styles. And then the best thing is, as you're moving data and you're cleaning it up, to also translate asset references and embedded links so that the content, when it hits your new system, is perfect. So what are some of the cleaning supplies that you could use to actually do the cleanup that we just mentioned? So regular expressions is fourth on this list, but it's really number one. Most of the time you wind up having to do things like strip out bad tags and translate tags, and regular expressions are really a great thing for that. But there are other tools out there like Google Refine. Sometimes people think Google Refine is really just a spreadsheet tool, but it actually allows you to do a lot of data manipulation and changes across whole swaths of data.
So if you're getting your data in spreadsheet format, this is actually sometimes a really good tool. And even in other cases, if it's not given to you in spreadsheet format, sometimes you might want to export it into a CSV, move it into Refine, do some tweaks, pull it back out as a CSV and put it into the database again. So it could be a many-step process to clean up your data. So you could consider it like a pipeline, and each step has its goals. And that's what you do with Yahoo Pipes. You can actually put together these pipelines of content processing to go from one source of content to a finalized source. And it works really well if you have feed data like RSS or Atom. And then another thing is things like XPath and XSLT. If you wanna actually dive into the markup itself, if it's well-structured HTML, you could dive into the markup, identify and remove certain tags, and XPath is actually one of the good tools for identifying things that you might want to flag later. So there's a lot more than just markup that's being brought over. Everybody has lots of different things on their website. Obviously we find lots of images and videos and assets, lots of PDFs. Usually there's some treasure trove of a folder that has thousands or hundreds of thousands of PDFs that have to be pulled over. And those assets are being linked to all throughout the markup. So you really kinda need to be prepared to spend a lot of time in this area. One of the things we encounter is access control issues. So if you're building a site that has group-based permissions, you need to think about where the files are being stored. And usually they're probably coming just out of a directory that may have subdirectories, and you need to either maintain some kind of division on the other side or come up with some other plan. So there's usually something that has to be worked through there.
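To make the scripted cleanup idea a bit more concrete, here is a small Python sketch of a cleanup pipeline: strip inline styles with a regular expression, translate old class names to new ones, and flag links to PDFs and images for manual review. Everything named here is an assumption for illustration: the `CLASS_MAP` entries, the function names, and the sample markup are made up, and the flagging step uses Python's standard `html.parser` rather than the XPath tooling mentioned in the talk.

```python
import re
from html.parser import HTMLParser

# Removes style="..." or style='...' attributes. This is a blunt tool: it would
# also hit the same text appearing outside a tag, which is rare in practice.
STYLE_ATTR = re.compile(r'\s+style\s*=\s*("[^"]*"|\'[^\']*\')', re.IGNORECASE)
CLASS_ATTR = re.compile(r'class="([^"]*)"')

# Hypothetical mapping from the old site's class names to the new theme's.
CLASS_MAP = {"pullquote-old": "blockquote"}


def strip_inline_styles(html):
    return STYLE_ATTR.sub("", html)


def translate_classes(html):
    def swap(match):
        names = [CLASS_MAP.get(name, name) for name in match.group(1).split()]
        return 'class="%s"' % " ".join(names)
    return CLASS_ATTR.sub(swap, html)


class AssetLinkFinder(HTMLParser):
    """Collects image sources and links to PDFs so a person can review them."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                if tag == "img" or value.lower().endswith(".pdf"):
                    self.found.append(value)


def clean(html):
    """Run each cleanup step in order, then return (clean_html, flagged_assets)."""
    for step in (strip_inline_styles, translate_classes):
        html = step(html)
    finder = AssetLinkFinder()
    finder.feed(html)
    return html, finder.found


body = ('<p style="color:red" class="pullquote-old">See the '
        '<a href="/docs/report.PDF">report</a> <img src="/img/a.png"></p>')
cleaned, flagged = clean(body)
```

Each step is a plain function, so adding another cleanup (say, rewriting asset paths) is just another entry in the tuple inside `clean`.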
Like, and I already mentioned, but there's always gonna be references to these things. So again, that's where the previous stuff we're talking about. You gotta write scripts to go in if you're pointing to, if you can keep the same path that you had in the old system, that might be fine. But if you can't, you need to make sure that you translate them. And another thing to consider is, in a lot of older systems, you'd link directly to a PDF. But in something like Drupal, you could have fields that are specifically for referencing images and uploads. So you might need to consider figuring out which things are embedded and moving those into a more structured asset relationship if you're gonna use that field to, for example, display a list of related attachments or something like that. So then also, when you're moving things like media specifically, like images and video, where are they going? So they were probably just in a directory in the old system. Are you gonna maintain the self-hosted video and images? Or are you gonna move them to something like Vimeo or BrightCove or YouTube, which is much better? Or are you gonna actually move them to something like a content delivery network? And if you do things like that, you might have to have URL redirects in order to make sure that they're referenced properly. And then also, when you're moving, are there media player changes? Are you gonna use like a different player? Yeah, in a lot of cases, depending on the video solution, we don't always recommend a different one. If you've already got BrightCove and Vimeo and that's working great, or YouTube, YouTube embeds work really well. And we're not always suggesting you have to go out and get some high-priced video solution, but at a minimum, you've gotta think about how that content gets moved over, or at least references to it, in the case of YouTube embeds and things like that. So there's always some trickery involved in moving over. 
And when you move to a service, you also have to consider, how are they gonna be adding new ones? Are you gonna integrate that through a WYSIWYG? Is it through a media field, a video embed URL, or something like that? So those are more things to consider. Legacy URLs are always a very fun situation to be dealing with. And it's really tricky, too. There's some bad ways to do it, like having no redirects whatsoever, in which case you're gonna lose a lot of traffic and you'll probably fall in your Google rankings considerably. There's really bad things to do, like generating over 100,000 redirects and then putting them in a dynamically loaded file like an .htaccess file, and then wondering why every single page load takes 45 seconds. There's other solutions like the global redirect module, which basically bootstraps Drupal. When there are 404s, it actually looks to see if there's a redirect and then it'll send you there. Well, that's good because you could be reviewing Google Analytics or other analytics and figure out what the remaining 404s are and then use this module to account for them as well. The bad part is that you actually have to load up Drupal for a request that's just gonna be a redirect. Some better things are to do pattern-based redirects. So some type of regular expression that you could put in a statically loaded web server configuration. Those can be executed pretty quickly, especially if you're using something pattern-based. You might only have to evaluate 40 or 50 options instead of 100,000. And then ideally, you do something like redirects at the edge, and what the edge means is, if you're using something like a content delivery network, a lot of times they'll have support for redirects. So the best redirect is one that doesn't have to get to your site until you're actually loading the page that they're after.
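The pattern-based idea can be sketched in a few lines of Python. In practice these rules would live in web server configuration or at the CDN, as described above; the Python here only illustrates how a handful of ordered patterns can stand in for a very large list of one-to-one redirects. The legacy paths, the target paths, and the `resolve` function are all hypothetical.

```python
import re

# Hypothetical legacy URL patterns and their targets on the new site.
# Order matters: the first matching rule wins.
RULES = [
    (re.compile(r"^/news/(\d{4})/(\d{2})/([\w-]+)\.html$"), r"/news/\1/\2/\3"),
    (re.compile(r"^/about/staff\.asp$"), "/about/staff"),
    (re.compile(r"^/files/pdf/(.+)$"), r"/sites/default/files/\1"),
]


def resolve(path):
    """Return the new path for a legacy path, or None if no rule applies."""
    for pattern, target in RULES:
        match = pattern.match(path)
        if match:
            return match.expand(target)
    return None
```

A request for `/news/2011/03/budget-update.html` would be checked against at most three patterns here, no matter how many articles the old site had.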
Frank, what's the best way to keep track of URLs that come from an old system? Do those get delivered in a spreadsheet, and they have to get parsed and put into some kind of, like, what's the best process for actually doing that? Yeah, so a lot of times what we'll do is we'll look at some analytics, like Google Analytics, we'll take the top 100 URLs and we'll start there and we'll see, well, do these actually maintain their paths from one system to the other? And then if not, you'll determine the mapping from this path to this other path, you'll create a spreadsheet, and then you could sometimes automatically generate what the redirects would look like, through regular expressions or through a scripting language like Perl. Yeah, and the number of redirects that you ultimately choose, I mean, it's kind of up to you. A good starting point is to take your top 100 most-trafficked pages, but for a lot of sites that's just not good enough. I mean, you have to keep track of every single one of them, and that's where some of these other approaches come in, because you can't just dump all of those into your .htaccess file or you'll have big problems. But it's really important to monitor your site in the few months after launch to see what is still a 404. And you could use your web server logs for things like that; it's really easy to filter based on status code. For those that don't know, a 404 HTTP status code means not found, so you could filter your access logs for 404s. So once you've done a lot of these things, it's absolutely crucial to bring in the people that can evaluate the content, because nobody's gonna know the content better than the folks that maintain the website. You've gotta get them in early. And this is where you've gotta be willing to let them see stuff that's a work in progress, and they need to understand that it's a work in progress and evaluate it appropriately.
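Going back to the point about filtering access logs for 404s after launch, a minimal Python sketch might look like the following. It assumes the logs are in the common or combined log format that Apache and nginx produce by default, which may not match a given server's configuration; the `top_404s` function and the sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Pulls the requested path and the status code out of a common/combined
# log format line. Assumes that format; adjust for custom log layouts.
LOG_LINE = re.compile(r'"(?:GET|HEAD|POST) (\S+) [^"]*" (\d{3}) ')


def top_404s(lines, limit=100):
    """Count requests that returned 404 and return the most frequent paths."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group(2) == "404":
            counts[match.group(1)] += 1
    return counts.most_common(limit)


sample_log = [
    '10.0.0.1 - - [10/Oct/2012:13:55:36 -0700] "GET /old/page.html HTTP/1.1" 404 512',
    '10.0.0.2 - - [10/Oct/2012:13:55:40 -0700] "GET /news HTTP/1.1" 200 2048',
    '10.0.0.3 - - [10/Oct/2012:13:56:02 -0700] "GET /old/page.html HTTP/1.1" 404 512',
]
missing = top_404s(sample_log)
```

Because the function accepts any iterable of lines, an open log file can be passed to it directly; the resulting list is a reasonable starting point for deciding which redirects are still missing.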
But, you know, very early review, you know, get them involved in like the iterative process, you know, let them know like, you know, you're gonna run a new, you're gonna rerun your scripts like, you know, every night and they can come in the next day and see how much better they've gotten. So that's obviously really important. Yeah, I just said some of that. So make sure you run it multiple times. Another really tricky thing is trying to coordinate the moving of content with the building of the website that will host the new content. And with Drupal, this is an interesting problem because you can make, you know, sometimes, you know, the theme, your whole theming process might be running like a little bit behind the rest of your development process and you may have a site, you know, we call it a skeleton site or an unthemed site. And if you wanna be using that to kind of stage your content migration, that's great. Make sure you explain that and make sure that the person that's doing the reviewing on the stakeholder side, make sure that you kind of trust that they can evaluate the content without having to see it in its perfect final form. Which is sort of the best thing to do if possible. If you can, if you have enough time and you can coordinate some of these things in more of a series, then you can actually be pushing content in to a staging site that's pretty fully built out. It's got most of the functionality, it's already been themed and it will sort of feel a lot more comfortable to someone who's trying to evaluate the effectiveness of the migration. 
And when you're doing this, and this is a slide I wanted to put in here because, believe it or not, this is something that can really be confounding to folks if you're not careful: generally in a project like this, you're gonna have multiple environments. You're gonna have an integration environment where developers' code is coming together, you're gonna have a staging environment where you're looking at content, and at some point you will have a production environment. Make sure that you're very clear about what is happening on each of these environments. Make sure that you're naming them clearly, with subdomains or something like that. And make sure that the folks looking at the content understand what they're looking at. I'll just give you a scenario. I'm looking at a piece of content that got pulled in by an automated script, and I see problems, but what I'm actually seeing are functional bugs: this navigation thing didn't work right, or this widget didn't work right. It has nothing to do with the content, but it's natural that people are gonna find those kinds of bugs. So they're gonna report those bugs and then you fix them, but you might not be fixing them on the actual content staging site. You might wanna fix them on another site. That's pretty common for us, especially on big projects, where we try to keep our content environment fairly stable because you don't want it to be constantly changing. You wanna be able to compare apples to apples when you're doing these iterative runs of your scripts, and it's hard to do that if you're changing all these variables at one time. If you're fixing a bunch of bugs and functionality, doing a bunch of theming changes, and trying to do content migration all at the same time, with people also doing manual tweaks of content, which we're gonna talk about in a minute, it just gets really confusing.
So just make sure that you delineate these environments very carefully. Use something like the Environment Indicator module, or put up a big red flashing banner or something, so people can understand what site they're actually on.
Social and semantic integration. These are things to test as you're migrating content and as you're building out site functionality. Have you implemented Open Graph tags? Have you tested what your articles will look like if somebody shares them on Facebook or on Twitter? These are all things that you should go through as part of your process. To get into more of the semantic web and technical aspects that help with search: have you specified RDFa properties for your content through the Drupal hooks? Or are you not using RDFa and using something like microformats or microdata instead? And once you've specified these things, have you sent them through a validator to confirm that they're actually readable and structured correctly? This should be part of the process all along. It's very easy for theming changes to mess up your microdata or RDFa markup, so part of the process should always be a verification of these things. And one of my favorites: I think you should always use that Meebo toolbar. That thing is awesome. It sits at the bottom of your site and pops things up all over the place. It's really great. Yeah, it loads really fast too. Yeah, makes your site fast.
Code reviews. This isn't really part of a data migration; I think it's just a best practice in every site build. Code reviews are important because having many eyes on your code makes it better. It catches bugs. And the important thing is keeping the customers happy through a migration.
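Going back to the Open Graph question raised above, one way to make that check routine is a small script that reports which tags a rendered page is missing. The following is a minimal sketch using only the Python standard library. The class and function names are hypothetical, and it only looks for the four properties the Open Graph protocol marks as required; it says nothing about how Facebook or Twitter will actually render a shared link.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collect <meta property="og:..."> tags from a page's HTML."""

    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:"):
            self.tags[prop] = attributes.get("content") or ""


# The four properties the Open Graph protocol lists as required.
REQUIRED = ("og:title", "og:type", "og:image", "og:url")


def missing_open_graph(html):
    """Return the required Open Graph properties that are absent or empty."""
    parser = OpenGraphParser()
    parser.feed(html)
    return [name for name in REQUIRED if not parser.tags.get(name)]
```

Running something like this against a sample of migrated articles after each theming change is one way to notice when a template edit has dropped the tags.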
And when they're moving to a new tool, if they see things constantly breaking, or if they report a bug and you fix it but break something else in the process, that erodes their confidence in the system. So code reviews should be trying to ensure things like consistency and reuse and isolation. One thing that can be a big problem is that something might not be themed correctly in one place, and when you update the CSS, it actually unstyles another page. So look for things like overlapping CSS selectors, hard-coding, and data munging in preprocess functions. If you use hook_preprocess to shift data around or reformat it a lot, that can negatively affect how the site looks, how it functions, and how stable it is. Not everybody has an organization big enough to have multiple eyes looking at code, so if you just have one guy and that's all you have, then that's great, but ideally you'll have many eyes looking at your code. If you use a system like GitHub to develop your site, there are these things called pull requests. If you have multiple people working on a code base, they can each fork it into their own repositories and then issue requests to have that code reintegrated into the mainline. And it's a good opportunity. It's basically like a gate: the code has to come through the gate, and someone has to look at it and say, okay, everything looks good, there's nothing out of whack. Sometimes you might not review every single line of code, but it's a good indication. If it's a ticket to fix a blog feature, and in looking at the code change you realize that there's actually a whole bunch of changes to some module that you never even considered, it's a good opportunity to go look and say, oh, did they mess something up while they were doing it? And a tool that we use and like a lot is called Crucible, from a company called Atlassian.
And it's actually a formal code review tool that allows you to upload change sets or whole files or whole modules or whole sites, and it allows for threaded discussions on any given line of code. So it's a really good tool for ensuring uniformity and making sure that other people are looking at your code. The tool, it's called Crucible.
Yep, security reviews. Those are also very important. For some clients it's more important than for others, but in general, a hacked site is bad for everybody. It's bad for your client, it's bad for the company's perception of you and your work, and it's bad for Drupal. So we think security reviews are also important. You could do none, and for some people that's perfectly acceptable as long as you follow secure coding practices in Drupal. There's actually a page on Drupal.org about it: things like check_plain, verifying content, and using input filters correctly in the HTML text areas. But then you could also do automated tests. You could use tools like Drupal Scout. There are the Acquia Insight tools for security. There are also automated penetration testing tools that you could use. And then the last one, which really only applies in a very small number of cases, is IV&V, Independent Verification and Validation. It's basically having a third party come in and perform a security audit. It usually involves you giving them the code, and they perform what's called static code analysis, where they scan the code to look for bad practices, things that might be SQL injections or cross-site problems. And then they'll run tests against a stood-up instance of your site to test for things like cross-site scripting attacks, which can actually compromise administrators.
So that's the tools and tips part. Now we're gonna talk a little bit about the transition, which is as you're finishing up a lot of the work and you're preparing to go live. So some of this stuff is probably self-explanatory.
So we'll try to move through it a little bit quickly so we have time to get some questions going. Obviously training is really, really key when you're doing this kind of thing. It generally takes longer than you think it will. There's a lot of different ways of doing it. I'm a big fan of the good old in-person approach: let's all get in a room, let's put everything up on the screen, let's go through it, let's spend a lot of time together. I even think it's a good technique to get people out of their office. If you're building the site away from their office, have them come to you and get them out of their daily work routine. Don't let them check email and all that kind of stuff. There's a lot of other ways too. Obviously screencasts are really cool, and they're really helpful. I find they tend to get out of date pretty quickly, so the best success we've had with screencast-type training is in small pieces, a minute long or five minutes long, a little thing that they can go back and reference fairly quickly. If you put together a really long 30-minute training video, you'll probably regret it, because it'll be out of date pretty quick. I think the screencasts that actually work best are when you do a training, just record it, and then post it for them to look at later, because a lot of times in a training you're kind of overwhelming them.
It's like an hour or two or three hours of all this new stuff, and sometimes they're still thinking about old things you've shown them as you're going over new things. So it's helpful to have the recording so that they can go back and reference it, watch it a few times, and understand it. And you can't always get people in the same room, and you don't necessarily wanna perform the same training again and again and again, so if you record that training, they can just share it internally within the organization. Yeah, they can also share documents, and obviously documentation is a good way too. Again, that's probably one of the things I wouldn't go super overboard on. We like more of a quick start guide, or a reference guide: put some screenshots in there of the most commonly performed tasks, like entering an article or creating a menu item or whatever. So there's a lot of ways of doing that.
When you're going through this, usually we try to time training to happen right before we give everybody the keys to the new house. So they've been trained, they get their logins, they go in, and this is when, if you've done a really good job, things can go really smoothly, or all hell could break loose. Your folks are seeing the site for the first time, they're seeing all of their content in there, they're starting to visualize it, they're going back and scrutinizing it against their design comps, and they're checking their requirements list and things like that. So have a really good QA and feedback loop. I think this is where I see a lot of organizations fall down, and it sounds really simple, but it really does take a good process. Make sure that there are ways for the people reviewing the site to submit bugs to you, either through some kind of system, or even shared Google Docs might be okay.
I mean, it can get a little bit out of hand. We use Jira, another Atlassian product; we've used it for 10 years and we love it. So if you don't have some kind of ticketing system, you should get one. Yeah, visibility is the most important thing. They wanna know that someone has looked at, commented on, or even just acknowledged the feedback they're giving you. And if they actually have access to your system, it's also a good opportunity to ask them for clarification. They might say this isn't working, and then you can say, well, can you give me a screenshot or detailed steps to reproduce it? Yeah, we also use another Atlassian plugin called Bonfire, which is kinda cool because you can start testing sessions, take screenshots, and submit test tickets right into the ticketing system.
Give your folks plenty of time to prepare their content for launch. Do not schedule the completion of your automated migration right up to the day before they're supposed to launch. I think several weeks at a minimum, and a month is not too long, to give people time to get comfortable with their content and start working on their homepage, because that's gonna be the most important thing to them. And then there might be a lot of manual work that has to happen. They may wanna go back into some archives and do some tagging now that they've got taxonomy in place. So just give the editors plenty of time to go in there and do that. And that actually impacts how you do your migration. If your migration is one big move of everything at once and you wanna do it a few weeks before launch, that potentially means two weeks of dual content entry until you launch. So another way you could do it is to build your migrations to have what I guess they call high-water marks.
So you track what the last piece of content you imported was, or what its date was, and then you can start again from that point forward, so you can move subsets of your content over. You can move it over a month before, then the following Monday you move over that last week's worth of work, and each Monday you move over the previous week, or each night you move over yesterday's content. So developing the migration scripts this way can actually help with the content prep. Oh, and also remember to clear out test content before you go live. Remove all the pictures of cats. Right, so one of the things that we do early in the process is add flags to all of our content that let you just hit a checkbox and mark it as test. Then later on, when you're going live, one of your steps is to search for everything that has this test checkbox and bulk delete it all at the same time. You can also add flags for things like final. If an editor has looked at a piece of content and they're completely happy with it, they just mark it final, and then everybody knows not to touch it. Then you can run reports and tell them what percentage of their content has been reviewed and where you are at various steps in the process.
Pre-launch, this is always the fun part. One of the things that we try to do is really pick the launch date carefully, taking into account weekends, holidays, vacations, things like that. I can't tell you how many times we've had clients try to launch a site on December 28th or something like that. That's the week between Christmas and New Year's, and a lot of people take vacation. Or somebody wanted to launch their site on January 1st, things like that.
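Going back to the incremental imports described earlier (remember the date of the last item you brought over, then pick up from there on the next run), the core of that idea fits in a few lines. This is a minimal sketch in plain Python, not the Migrate module's actual implementation. The row layout with `id` and `modified` keys, the function name, and the `save` callback are all assumptions made for illustration.

```python
from datetime import datetime


def import_new_items(source_rows, high_water, save):
    """Import rows modified after high_water and return the new mark.

    source_rows: iterable of dicts with hypothetical 'id' and 'modified' keys.
    high_water:  datetime of the newest item imported by previous runs.
    save:        callable that writes one row into the new site.
    """
    newest = high_water
    # Process oldest first so the mark only ever moves forward.
    for row in sorted(source_rows, key=lambda r: r["modified"]):
        if row["modified"] <= high_water:
            continue  # already brought over by an earlier run
        save(row)
        newest = row["modified"]
    return newest
```

The returned mark would be stored somewhere durable between runs, so the nightly or weekly job only touches content created or edited since the previous import.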
So really work hard and plan with the client and be reasonable. Ideally you can be done and sitting in what we call, I guess, a pre-production environment for a week, and then launching is really just a matter of flipping a switch. And I would also expect it to slip, because that's just kind of the way things work, right? Most of the slips that we see are not that we didn't get something done; it's that the organization that owns the website is just not ready to go. They haven't had enough eyes on the content, they haven't gotten various approvals, or they decide at the last minute they want to add some more stuff. So there's any number of things that push that launch date back, and the developers should be prepared for this. Don't expect your development team to just drop off the face of the earth the day of the planned launch. Make sure that you've got some time to deal with shifts in the launch date, because it will almost always happen. Yeah. And another thing is have checklists. As you're building the site, there are things that you know you're gonna have to pay attention to on launch day, whether it's adding API keys or checking that analytics are working, that the ad server is working, that commenting is working, all these things. So build checklists and assign the items to people who have the responsibility of verifying that they're done. You could even set calendar invites, or if you use a tool like Basecamp or Atrium, or your ticket system, you could set up basic tickets to make sure that people perform these actions, and assign them due dates. Yeah, we always even go in and look at the settings on user accounts to make sure that you don't accidentally let people anonymously create accounts.
So work on a checklist, and as you do lots of sites over time, build it up. Ours is probably about this long now, and we go in and check every little thing. Even if we're sure that it's been done, we double check it. And another thing we do is what we call disaster scenarios. If you have a multi-server environment, plan for what happens if the database server fails. The way you test that is to have your site running in the pre-production environment, turn off the database, and see what happens. You run through all of these disaster scenarios, and then you have a book that you execute: when the database fails, these are the five steps that I take. By actually executing these scenarios you figure out what your process is, because there's nothing worse than trying to figure out how to recover from something like that while your site's down and the pressure's on. So get that all out of the way beforehand and you'll feel a lot more comfortable when things go wrong. You can just fall back on this document that you've developed based on what actually happens when things go to hell.
And then there's launch. Yeah, running out of time. So you have to prepare for traffic. Hopefully you've done things like load tests, and you've figured out where the bottlenecks are and resolved them. Is your monitoring working? When you run your disaster plans it's really good to have monitoring on, because you'll see: does my phone get a message when my database server fails? Throw a load test at it, try to overwhelm your web server, and see if you get notifications that there are too many Apache processes or things like that. Did you remember to have a custom 404 page? Did you remember to theme your maintenance page? Because sometimes your site actually will go down, and if so, the blue Drupal "site under maintenance" page isn't good.
At least if you're gonna go down, have something that looks really good, maybe make a joke of it or something like that. And then how's your caching? Are your caches configured correctly? Do you have the right TTLs, meaning how long cached items are gonna stay in your cache? Does the cache clear reliably when you update content? Is your search index up to date? So a lot of things like that.
So the big day, this is always a fun one. Hopefully everything you've done up to this point makes this process go pretty smoothly, but we've certainly learned a lot of lessons over the years. One fairly comical one that we've run into is the client can't find the login to their DNS. Some IT guy that worked there last year had an account at GoDaddy or wherever the site's hosted, and they don't know how to log in. I know it sounds ridiculous, but it can happen. It does happen. So make sure that's part of your checklist. And a lot of times we'll recommend that clients log into the DNS a few weeks beforehand and turn down the time to live on their DNS records, because that helps with quicker propagation. And then also have a rollback strategy. We all like to think that nothing's gonna go wrong and the world's great, but things actually go wrong. So think about what you're gonna do. If the site goes down and you can't get it back up, how do you recover? Can you revert back to the old site? Do you have a static site that you could put up in its place? Plan for these things. Yeah, if everything's gone really well, if you've done lots of great marketing and your site's awesome, you'll get tons of hits and your traffic will spike. So be prepared for that. Make sure that you can take that first wave that comes in, make sure that you've got your XML sitemap and everything up to date, and make sure that Google and the other search engines are finding the content on your new site.
And then there's the day after the launch. All the adrenaline's gone, your site's up, and hopefully it's running well. Assuming that things didn't go badly, what you should be doing now is monitoring your site for performance and seeing how people are using it. What we like to do a lot is look on Twitter and Facebook and see what people are saying about the site. Sometimes there are problems and you don't even know it, and people are tweeting about it, so you can figure out what's happening from that. Social media won't cut you any slack when it comes to those things. And then another thing to consider is whether your editors need access to the old site for any reason: to migrate over other pieces of content, or to verify things from the old system against the new system. So basically don't take the old site away; give them some access to it for some period of time. So that's it. That's how to manage highly successful migrations. Are there any questions? Well, there's somebody at the mic, so if you could go to the mic as well.
Thank you, this was very useful for me. Talking about cleaning up content, do you have any recommendations on how we can get rid of Microsoft Word coding?
Regular expressions. So the question is how do you get rid of Microsoft Word encodings? A lot of times it has to do with either using regular expressions to strip them out or using some type of XSLT transformation. If you can get it into well-formed XHTML or something like that, you can sometimes use XSLT to pull the good pieces of content out of it and leave the bad pieces behind.
I was asking if you can tell a little more about the estimation part. I know it's always a challenge to size the migration effort. So how much detail is appropriate?
Usually you have to sacrifice a small animal first in order to get the gods right for estimation.
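As an aside on the Word cleanup question above, the regular expression approach might look something like the following. This is a minimal sketch, not a complete cleaner. The patterns cover a few kinds of markup that Word commonly leaves behind (conditional comments, `<o:p>` tags, `Mso` classes, `mso-` inline styles), the function name is made up, and regular expressions over HTML are brittle, so real content would need testing against actual samples from the old site.

```python
import re

# A few patterns for markup that Word commonly leaves in pasted HTML.
WORD_PATTERNS = [
    # Conditional comments such as <!--[if gte mso 9]>...<![endif]-->
    re.compile(r"<!--\[if.*?<!\[endif\]-->", re.DOTALL | re.IGNORECASE),
    # Office namespace paragraph tags: <o:p> and </o:p>
    re.compile(r"</?o:p[^>]*>", re.IGNORECASE),
    # class="MsoNormal" and similar Mso* class attributes
    re.compile(r'\s+class="?Mso[^"\s>]*"?', re.IGNORECASE),
    # Inline style attributes containing any mso- rule (drops the whole attribute)
    re.compile(r'\s+style="[^"]*mso-[^"]*"', re.IGNORECASE),
]


def strip_word_markup(html):
    """Remove the Word-specific markup matched by WORD_PATTERNS."""
    for pattern in WORD_PATTERNS:
        html = pattern.sub("", html)
    return html
```

Note that the last pattern throws away the entire `style` attribute when it contains an `mso-` rule, including any ordinary CSS in the same attribute. That is a deliberate simplification for the sketch.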
But yeah, I think really the key here is you need to be able to get a good look at the source. I think volume is sometimes an indicator. The different types of content matter too. If you're building a new Drupal site, often the way it happens is there might be 12 new content types that you've created, but maybe only about five of them are getting content from the old system. So just try to narrow down exactly the kind of content that's gonna come over. Like I said, we really don't have a great way of doing it. We'll do a lot of ranges, so we say we think it's gonna be about this big. Sometimes we get surprised. We've done some migrations that literally just take a couple of days: the content came over really clean and it just went in really nicely. And we've had others that really take a long time. Several weeks.
Okay, yeah. How do you usually mitigate the hit you're gonna take to SEO as you have all of these new pages out there? Is it 301s and rel canonical tags pointing back? Do you map that out in advance? Just at a high level, how do you usually mitigate that risk?
Yeah, I mean, certainly 301s are really important. That's why we said in the presentation that sometimes just having a handful, or 50 or 100, isn't good enough and you really have to map every single URL. Or in some cases, we'll maintain aliases from an old site. That's the best strategy. If you can maintain your aliases from one site to another, even if they're not good or you don't like how they're structured or they don't necessarily make sense, you're gonna suffer the least impact by not changing them at all. So what you could do is leave the legacy ones the same but have new structures for the new content that they're creating. Yeah, and then just like all the other SEO things, make sure your markup is structured correctly and uses H1s and H2s.
I mean, I'm sure we've all been hearing presentations for years about that, so I don't need to go into details. And like I said earlier, if you use things like RDFa or microformats or microdata, that can actually help your old content as well as your new content.
Hello. This was a great roadmap for migration from the organizational point of view, but what about your customers that actually hit the site? How would you integrate them into this process, or would you?
Yeah, that's a really good question. We could have touched on that, and to be honest, we had a hard time figuring out what exactly we wanted to talk about because there are so many aspects to it. I think that's when you wanna do usability tests; you wanna get your audience involved very early on. Focus groups and things like that. Yeah, running focus groups, identifying the different audience types that are reading your site. You can run usability tests very early, on wireframes or design comps. In fact, I'm sure there have been a lot of sessions here this week about that, so I'd encourage you to find out more. But yeah, back when we had the slide with the funny guy from the 60s, the refresh and reorganize, that's kind of where it fits. When you're gonna change a website and make improvements to it, certainly taking into account your readers and your audience is an important aspect of that.
Yeah, I just wanted to get some high-level advice from you guys. My current site is for a law school, and half of it is static content that's semi-structured in Dreamweaver templates, where there's a title and a content area and a sidebar. And then the other half of the site is dynamic JavaServer Pages with content that's actually coming out of a MySQL database. I was just curious about a high-level strategy to deal with each side of the website.
So there's obviously countless ways that you could do it.
One way would be to do a migration of one set of content into a format very similar to the other. So maybe export the data from the database into a structured format, or take the semi-structured data and put it into MySQL with the other content. That might help you write one set of migrations instead of two. You could also just have two different sets of migrations, and either have different people work on them or the same people. Does that get close to answering your question? I think that really depends on who's doing it. Some people are more comfortable with a command line tool and some people are more comfortable with SQL scripts. And if you have some Drupal talent, using the Migrate module is probably a really good way to do it. You can write different things to get the data into Migrate, so Migrate can handle both the static files and the database content.
Sort of a technical question. Yeah. Going on what you were just saying about the Migrate module: we had a recent migration and we went with Feeds. The more I used Feeds, the more I found that its use case might not be migrations but more other content displaying and sharing. I was wondering what your recommendation is on Feeds versus Migrate and what the pros and cons are.
I think in general, Migrate provides a much better full end-to-end process for getting data in, because you can intercept at all points of the import, through the node saves and things like that. So you can have a more robust migration, where something like Feeds is strictly: I have a few source fields, and I wanna put them in a few destination fields. So just a very basic mapping.
So I think in general Migrate is the best, but it really depends on the content and how much you have to do to it to bring it into your system. If you don't have to do much and you don't have much content, I think Feeds can work really well and be a much simpler thing. You could even do it without writing any code. But if there's a lot of translation, and taking disparate pieces of data and putting them together within the same node, then you're definitely gonna wanna use something like Migrate instead. Migrate also gives you a lot of nice rollback features too, right? Yeah, it lets you play it, rewind it, play it again. It keeps track of all the nodes that you put in, lets you go in and see how many made it, and then you can wipe them out and start over again. Yeah. And I also wanna thank Laura on our design team for doing all the fun illustrations. Thank you. Thanks.