So, good afternoon, everybody. Thanks for coming. My name is Steve Drucker. I'm the founder and president of Fig Leaf Software, and I'll be your special celebrity guest instructor for today's case study on migrating content from Ektron into Drupal. First we'll talk a little bit about my company, then we'll get into the case study and a somewhat technical discussion of what's involved in migrating a website from .NET and SQL Server over to Drupal 7, and then hopefully we'll have time for a few questions at the end. I've also brought along Will Kurchimer; Will is a senior software engineer at my company. My involvement in this was as a project manager on the content migration side.

A little bit about us: we've been in business for about 25 years. I actually started the company after graduating from the University of Maryland with a degree in computer science. We're a pretty broad company: we work with a lot of different CMSes on a lot of different platforms, we build custom applications and single-page web applications, and we're now doing some Amazon Alexa work, Node.js, PHP, Adobe ColdFusion. We kind of got our start doing ColdFusion. Anybody here use ColdFusion before? Yeah, it's pretty good.

You've probably had some exposure to some of the other websites we've developed and launched. The one we're going to talk about today is BSEE, the Bureau of Safety and Environmental Enforcement. But if you've been on the Metro website since about December, that's a new site we developed for those guys: a very nice responsive site.
Also the National Park Service, NPS.gov; we've been managing their infrastructure and their web initiatives for about the last 15 years. And Exim.gov, which is in Drupal 7. So you get three different examples on really three different platforms, using a bunch of different technologies.

In addition, we also have a training division. Some of the courseware we've built is available for sale on Amazon.com, and we do in-person and remote training. We've got about eight different Drupal courses, spanning everything from site building with Drupal all the way through complex module development.

So, like I said, we work with a lot of different organizations. We've also managed and hosted websites on Amazon EC2 and Google Cloud, as well as Microsoft's cloud, and we're an Acquia partner. And those are just a few of the other websites produced by us over the years.

What we're here to talk about today is a site we launched, I guess, in the middle of last year for the Bureau of Safety and Environmental Enforcement, known as BSEE. BSEE was actually created after the Deepwater Horizon incident, and their charter deals with not only oil well inspection but a whole host of environmental inspection issues. Originally their site was produced in Ektron/.NET. We were selected because we're both an Ektron partner and an Acquia partner, so we actually knew both ends of the system and how to move from one to the other as part of that Drupal 7 migration.
We were using OpenGov — actually a variant of OpenGov called OpenDOI. DOI is the Department of the Interior, of which BSEE is a part, and there's an initiative within the Department of the Interior and its many bureaus to move everyone over to that OpenDOI platform.

So, the objectives. BSEE was kind of an interesting site in that most of their content is file based. They've got a ton of different PDF files — 11,000 of them — which had to be moved from Ektron into Drupal, and most of those PDF files did not meet any kind of Section 508 accessibility compliance. So there were really two major components to the gig. First: how do we migrate them from Ektron onto Drupal and remap their content into something that's going to be easier for folks to find? But then also: how do we take the existing content, both the files and the unstructured database content, and make it Section 508 compliant to meet the Americans with Disabilities Act?

As part of that, we went through a fairly significant discovery phase, a lot of which was programmatic. We would use things like site crawlers to walk through the site. Because we had a fairly deep understanding of the Microsoft SQL Server database schema involved with Ektron, we'd write a ton of ad hoc queries to analyze the current content. We'd look at things like Google Analytics to see what the most popular content on the site was and how many clicks it was taking for people to locate it. And we set our IA people to work; they ultimately produced a web content mapping — a big, giant Microsoft Excel spreadsheet that we worked through with BSEE to show where the content existed on the current site and how that might map over
to content types in Drupal, and what those URLs might ultimately be in the finished site. We also set up business rules for which content could be programmatically migrated versus which content would have to be manually migrated — and, of course, which content could actually be thrown out.

This was one of those initiatives that had to be done very quickly. There was a political reason why the site had to launch on a very specific date, so I think we had roughly four or five months to go from start to finish. That's fairly quick, particularly since we had to migrate 11,000 PDF files and run them through an accessibility check and accessibility fix process.

There were actually two different groups working in parallel on this. There were the Drupal site developers, who were building out the site using the new design our folks had created. I've actually got a before and after here: this is the current site that we produced and launched in Drupal 7, and then the previous site over here, which was in Ektron.

One of the challenges we had was that while Ektron had a metadata or vocabulary system in place, it wasn't actually being used by the folks at the agency, or was being used in a very haphazard way. So as part of the data conversion process, we had about 15 people who went through and read all of the content on the existing BSEE site, over a period of about four or five weeks, to accurately categorize it. We talked about doing keyword extraction.
We talked about doing a lot of different programmatic things, but one of the challenges you find with data migration — and this is probably my fifth or sixth data migration of the last 20 years — is that it's very difficult to achieve one hundred percent accuracy programmatically. Folks tend to complain about the two percent: even if it's only off by two percent, that becomes very visible, and they will complain about it. Data conversion is a very thankless job in computing. Most of the folks who do data conversion, if it's their very first one, tend to quit after they've completed the job. In fact, one of my first jobs in computing after I graduated with a degree in computer science was to migrate a 50-table SQL Server database over to a 120-table SQL Server database, and at the end of that process I decided I needed to grow the company by a lot so I would never have to do that again. So, just to put this out there for any of you who are going to put people on data migration: make sure there's a bonus or retention bonus at the end, because it's always a thankless task.
It's one of those things where, if you're building a site in Drupal, you can go to your friends and say, "Hey, check out this cool new site that I built." There's a lot of job satisfaction in that. It's a lot harder to say, "Hey, go to the site and notice that all the content's in the right place." That's a little harder to quantify. It's a little bit like walking into a library and saying, "Gee, this library is really neat, but I've got to complain because there are three books sitting out on the desk." It's a lot like that. So it's a very thankless job, and I always sit down with developers before we undertake these things: you need the right level of expectation as the developer on the project, and the client has to have the proper expectation too. There is going to be some end-user cleanup at the end, right? There are some quality-of-life issues. In fact, on that very first content migration job I did, about 20 years ago, I was bringing a pillow into work because the job would take all night to execute, and I'd check on it every hour to make sure it hadn't broken and needed restarting. So you learn a lot of life lessons along the way, such as: always make sure you have a client with a comfortable couch.

So again, you're looking through this, and you have to determine which content is going to be migrated — whether some of it's going to be manual, some of it programmatic. Setting expectations with the customer is absolutely key. And as part of the data migration process — just skipping ahead a bit — when you build out your algorithms to do those moves, make sure you build them in such a way that they're very granular, so you can execute them on specific content types or specific sections, and you can stop and restart them at various
points in the process, because you're talking about a lot of data. I mean, relatively speaking it's not a lot of data — about 15 gigs overall — but just doing the site backups, moving that stuff around, re-executing it: you don't want a process where you start it and you're not going to know until 24 hours from now whether it's done, which is sometimes how long it takes. Build it so you can run it in little bits, say, "Okay, that looks good, let's go," and fire off the next piece.

One of the things that made this project more challenging is that we had to separate out the PDF files from the regular content, because the PDF files needed special treatment by a group of about 15 content remediators, who spent three months using Adobe Acrobat Pro and a product called CommonLook to validate that the files were all accessible — and most of the time they weren't — and then go through and remediate them.

So you go out there, you take a look at the existing content, and you decide: is there going to be a direct mapping to content types? A lot of times in legacy systems, some of the things you have may wind up turning into vocabularies rather than content types, so you've got to make that decision. And in terms of restructuring, you may be changing URLs and changing file paths. From a programmatic standpoint, when you're creating these import processes, make sure you're creating audit trails. A lot of the time you spend is just making sure that everything you do is written out to a log — exactly what happened — so you can either undo it if you need to, or take that data and feed it to a secondary or maybe a tertiary process.

So, the first step is, of course: back up your content database.
It's amazing to me — and BSEE didn't fall into this category — but it's still amazing to me how many folks aren't actually doing regular backups, or if they are, they're not restoring periodically to make sure the backup is a good backup, right? In this case, we were backing up our Microsoft SQL Server database. We had written some code to pull the data out of Microsoft SQL Server and export it as tab-separated value files — something that could be read in through the Drupal Feeds module.

Then you get into cleaning the data. This is setting up business rules to make sure we're not bringing over data that's more than, say, five years old or ten years old, or whatever that threshold is, and having that understanding up front. Then we import the content into Drupal using the Feeds module — and there's a bunch of related modules that let you take the incoming data and programmatically tweak it before actually injecting it into the Drupal CMS.

And then, of course, you go through the testing process. Again, this comes down to expectations: you need to be clear with everybody that the first run of the data is probably going to look like a disaster, and that's okay, right? A lot of times you go, "Okay, you ran all the data in last night, let me go check out the site" — and boom. Don't worry; this is going to be an iterative process. A lot of this is always: we'll fix it.
We'll get it there. So you go through that refining of scripts, and this is really what I was getting at when I talked about creating scripts that you can start and stop very easily and that run relatively quickly. You don't want a day of latency, or even an hour of latency, before you can check the results and make refinements. You need to be making refinements as quickly as possible. So on steps five, six, and seven you're going to spend a lot of time refining as you bring the data in: making sure it's getting categorized correctly, making sure there aren't broken links, running link checkers — all that good, standard stuff.

BSEE was interesting in the sense that about three-quarters of the content was actually PDF files, so a couple of things had to happen there. One is that Ektron exported data through XML, so we used their built-in content exporter to export all the URLs, as well as writing some directed SQL queries directly against their SQL Server database. The nice thing about Ektron is that most of their PDF files were stored in one directory — one file folder with about 8,500 files — so that was fairly easy to quantify. Then we had about another 2,500 files spread all throughout the directory structure, and those were a little harder to work with.

My advice — and this is going to be a little counterintuitive — if you have to do something like this, where your PDF files are a separate entity that needs to be checked for Section 508: when you import them into Drupal, maintain the same file names if possible, and again, dump them all into one folder. Because we had parallel processes going on: we had folks who were writing the import to go from Ektron into Drupal, but then we also
had separate folks who were actually tweaking the content of those documents themselves, and we needed to bring that work back together. One of the things our Drupal devs did is they took the original PDF file names, changed them, and threw those into the Drupal file system. That just made things more complicated: anytime we had a big batch of PDFs that had been modified by our Section 508 folks, we had to write a special routine to pull each one out of that monolithic directory structure and then drop it in and overwrite the file in the Drupal CMS. So we set ourselves up for a bit of extra work there; we should have just tried to maintain the directory structure for the PDFs as much as possible.

For the Drupal import modules, Feeds and Feeds Tamper were a big part of what we used — to take the data we had exported from Ektron and import it into content types and vocabularies. Along the way we used Feeds' debugging output to verify that our mapping was correct, based on the Excel spreadsheets and the strategy we had laid out. And Features as well. Again, because of the large volume of data, we set up multiple instances on Amazon EC2 and ran them pretty much constantly, moving data back and forth all the time.

On PDF and Section 508 compliance: if someone asks you to make 11,000 PDF files Section 508 compliant, I can tell you that it's over three months of work. That's kind of a big deal. We actually hired about 15 people off the street. We had developed a one-day Adobe Acrobat PDF accessibility course, and we put them through that training, as well as having them use a product called CommonLook. And then I issued everybody a quota: I said, you will get through eight PDF files a day, regardless of file size. That worked great in the beginning, when most of the PDF
files were, you know, two pages, four pages, eight pages. Towards the end of the process, when you had a 700-page document, that started to get a little more unreasonable. So generally our target was an average of about 30 to 45 minutes per document to make them Section 508 compliant, which was a requirement of the contract. A lot of that comes down to tagging; I won't get into the details of Section 508 accessibility for Adobe PDF files, but it was a big part of the job, and it had to happen in parallel in order to make the deadline. We couldn't do the import into Drupal and then edit the PDF files on the Drupal site — the remediation had to happen while the Drupal site was being developed, in order to make the aggressive deadline.

When you're dealing with gigs and gigs of files, just moving them around becomes a bit of a challenge. So we were doing quite a bit of work tarring PDF files — a lot of tar, a lot of FTP, a lot of scp copies — and we'd written some additional scripts to move things around in PHP and Adobe ColdFusion. Then we brought the files in and, for the most part, created content objects for each file using Feeds Tamper — and, again, changing the names in the process.
So, if I had to do it all over again, this is something I would not do: change the file names from the original Ektron names. Keep those file names the same, and that way, when you're ready to move the remediated files over and drop them into place, it becomes a lot easier to do.

Just talking about how files are stored in Ektron: they've got this notion of managed files and unmanaged files. Unmanaged files are all stored in one big location, one directory; managed files are stored throughout the entire directory structure. So this was a matter of writing some SQL to figure out what those files were and where they were located, and pulling them out. Our data feed from Ektron was XML, and we had something convert the XML over to TSV so it could be used by the Feeds module to import the data into our different content types.

So we set up a clone of the customer's Ektron site, which was in production at the time, and then went through and used a combination of custom SQL and the XML export to pull data out of these different SQL Server tables — which, fortunately, is not hugely complicated.
They've got a fairly flat database schema in Ektron. That generated a list of content IDs, and we could use a SOAP web service in Ektron to pull the data out of the CMS and put it into a format which could then be migrated into Drupal using the Feeds module.

In keeping with the strategy of breaking things down so you're not running the entire import in one shot, we created separate processing scripts for each content type, so we could make sure each one was operating reasonably well before we moved on to the next. And again, a big part of this was making sure we had detailed audit trails in a format which could be consumed programmatically, to then apply any little cleanup fixes we needed — and that really turned out to be a huge lifesaver. So each content type basically had its own execution script which could run independently. We had a lot of different content object types that got imported, some of which involved PDF files, some of which didn't. We of course needed to bring over all the images, remap all the URLs to the images, and remap all the hyperlinks. So it was definitely a crunch.

The Feeds Tamper module allows you to take that data, before it gets written out to the CMS, and tweak it: do things like convert file name case, remove HTML tags, and do various find-and-replaces — which we had to do for file paths, because of course all the file paths were changing from Ektron to Drupal.

Then, on the QA side — again, one of those things that always takes a little longer than you expect — we had issues related to date-time formats, because dealing with date formats is painful on every platform. It's one of those things where I wish everybody who builds a CMS would just store date-times as the number of seconds since the epoch or something like
that, but that's one of my personal pet peeves. Then there was making sure the metadata came over. A lot of what we did revolved around making sure our counts were correct: if we had 3,000 content items, we had to make sure 3,000 actually made it into Drupal. Verifying taxonomy terms got properly applied. You also run into issues with special characters — I don't think the database in Ektron was actually UTF-8, and if you're moving from a non-UTF-8 character set to UTF-8, you can run into issues there.

One of the lesser-known gotchas: during the import process we found that cron kept trying to delete everything we were doing. That was a little disconcerting. We were able to fix it — I don't remember exactly how. Yeah — oops, right. The issues you run into with this much content, as Will indicated, tend to be both somewhat hilarious and terrifying at the same time. "Why do I have 10,000 cron jobs running?" Oops. And again, that's why you want to set this up on an EC2 instance that you can kill very easily — although when you get the CPU bill for that, you'll probably also be very unhappy.

Then you get into questions like: what if people are still working on content while you migrate? This was about a three-month process; it wasn't realistic to go to folks and say, "Hey, stop working on your site for three months while we deal with this." There were a number of different things we used for that. Ultimately we had a cutoff — I think it was the final week. We said, "Okay, there's a content freeze for this final week while we move everything over and verify it." But again, on the PDF side, this comes down to writing all of the scripts in such a way that they can be executed against files from a certain last-edit date.
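The per-field cleanup described above — Feeds Tamper-style tweaks plus the character-set fix — can be sketched as one small pipeline. This is an illustrative stand-in, not the project's actual Tamper configuration: the path mapping and the cp1252 assumption are hypothetical examples of the kinds of rules involved.

```python
import re

# Hypothetical mapping from legacy file paths to the new Drupal layout.
PATH_MAP = {"/uploadedFiles/": "/sites/default/files/"}

TAG_RE = re.compile(r"<[^>]+>")

def tamper(value):
    """Mimic the per-field transforms run before import: decode a
    non-UTF-8 (here assumed cp1252) byte string, strip HTML tags, and
    remap legacy file paths to their new Drupal locations."""
    if isinstance(value, bytes):
        value = value.decode("cp1252")   # legacy DB was not UTF-8
    value = TAG_RE.sub("", value)        # remove stray markup
    for old, new in PATH_MAP.items():    # rewrite changed file paths
        value = value.replace(old, new)
    return value.strip()
```

In the real build these rules lived as Feeds Tamper plugins per field mapping; the point is that every rule is a small, testable transform you can run and re-run on each incoming value.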
Okay, so you can say, "We're going to do everything up to today," and then deal with the rest in phases. Another tool you can use is called GatherContent. GatherContent has a nice little Drupal integration that allows you to do a continuing synchronization from your feeds.

So, just to wrap up: planning is critical. One of the things a manager used to tell me all the time was "speed kills," right? This was one of those things where the procurement was a little late. The project got off to as early a start as it could, but it was still a very tight turnaround. As a result, the customer said, "Hey, look, we've got these 11,000 PDF files; we're going to give them to you today so you can start working on them." Well, it turned out that out of those 11,000 PDF files, we only really needed to migrate about 8,000 at the end of the day — so a lot of extra work wound up getting done, because we hadn't figured out what those parameters were yet.

Again: communicate early and set the proper level of expectation. It's a little bit like screen readers and text-to-speech: if it's 99 percent successful, all you're going to hear about is the complaints about that last 1 percent. So setting expectations is really key, for the development team and for the client.

There are lots of PDF testing tools out there. The company CommonLook actually has a product which will spider through your site and validate whether all of your PDF files are actually Section 508 compliant. Some really good validation tools there.
But again, it takes time to remediate some of these issues. We had external test consultants on the Section 508 side, so we were looking at it internally as well as having an external group look at it — because when you produce a site for the government, if it's not 508 compliant, somebody's getting sued. There's a very low threshold of pain there; everything's got to be good to go.

So I think that takes us to one minute for questions. Any questions? Yes?

[Audience question, off-mic, about CommonLook]

No, CommonLook is a premium product; it's relatively expensive. But I will say that one of the big challenges you have with PDFs is dealing with tables, and it'll pay for itself just because of that one specific feature.

[Audience]: I guess two other questions. How many of these migration RFPs do you see from other CMS platforms to Drupal? And can you tell me the motivation for moving from Ektron to Drupal?

Sure, yeah. So the question is how often we see this. Pretty much every deal we're pitching at this point involves some level of content migration. The folks who are moving to Drupal are not moving from a previous version of Drupal; they're moving from another technology like ColdFusion, or they're moving in from .NET. I'd say about 50 to 60 percent of our practice is public sector government, and there's been a huge move in the DC area to move from whatever CMS you have to Drupal. So that's been the motivation. I won't comment as to whether it's fiscally responsible for an agency that's very successful with its current CMS on its current technology platform to move to Drupal, but that's kind of the reality, right? And that covers most, if not all, of the Drupal 7 and Drupal 8 sites we're doing.

[Audience]: Right, so just broad strokes — you might see ten of these a quarter or something like that?
Yeah — pretty much everything we're pitching right now. I'd say all the public sector stuff is Drupal, with very few exceptions, and they're moving from something to Drupal. And the private sector stuff, I'd say it's about 50/50 Drupal and .NET.

[Audience]: Great, thanks. Yes, sir — as an organization that's currently on Ektron and looking to migrate in the next six months, I can speak to your question and say that we want to switch because Ektron is a terrible garbage fire. But one thing we're very worried about — I don't know if you had to deal with any user data. Was any of that migrated as well?

Yeah — it's not a membership-based site, so we didn't have to worry about user data. We were an Ektron partner, though I had not actually messed around with Ektron until this project, and, well, I'm being recorded, so I won't say what I think, but I'd share your general sentiment of "wow." Actually, they got bought by another company which has its own CMS, and they killed Ektron. So in that case, the motivation for moving is that Ektron is no longer really a directly supported product, and moving from Ektron to their replacement — I think it's called Site Affinity — is a migration effort in itself. So if you have a migration effort anyway, you might as well evaluate a bunch of different products at that point.

Anybody else? Very good. We can help you out with that migration, by the way. All right, well, thanks, everybody, for coming. Again, my name's Steve Drucker, and I appreciate your time. Yeah, we'll be posting the deck. Not sure who we're supposed to get it to at the conference, but we'll get it to the conference. It's in Google Slides, so it'll just be a URL.