Thanks everyone for coming, we're about to get started. Today we're talking about migrating at scale and how not to fail. This assumes that you are already familiar with migrations. If you've never done a migration before, then this is not necessarily the session for you. Just to note, there will be some code later on, and it assumes you've already written YAML files, or you've written process plugins, and so on.

So first, a quick introduction to ourselves. I'm Stella Power, I'm the managing director of Annertech. This is a digital agency based out of Ireland, but we're actually distributed all across Europe. And yeah, I'm a recovering backend developer too, so while I am managing director, every so often I like to get my hands dirty in code, and what I like to do are migrations. I'll pass over to Eric.

Hi, I'm Eric Erskine. I've been working with Drupal for about 12 years, and I've been working with Annertech for about a year and a half. I primarily do backend development, and far too many migrations.

There's never too many migrations. So Annertech specializes in, well, Drupal, that's why we're here. A lot of what we do are integrations and migrations, and a lot recently has been migrating from legacy Drupal 7 sites. Given that end of life for Drupal 7 is next year (trivia question hint), that's not surprising.

Recently we were tasked by University of Limerick to take their existing Drupal 7 infrastructure and migrate it to Drupal 9. They have over 200 sites, and we decided that for phase one we would take 50 of those. There were a number of sites in a multi-site platform, and then there were lots of sites that were sort of standalone, so governance was proving tricky. Every site had a different login. While some of them were similar, some of them were not: a course node on one site, for example, was a bit different on other sites, and the person content type on one site was different from another site. So they
had, while some of them had started from the same base, diverged over time. Maintenance was a bit of a nightmare as well: security updates, different modules, some had extra plugins, and some of the modules weren't supported anymore. So the decision was taken to upgrade to Drupal 9.

As I mentioned, for phase one, which we've just completed, we took a subset of 50 sites. These were the 50 sites that were on a multi-site installation, and we were migrating them onto one single unified Drupal 9 platform. This would allow the University of Limerick to centrally manage all the logins and all the permissions, and have one platform to maintain.

For this we decided to use the Group module. Hands up, anyone who's ever used the Group module or is familiar with it? So every department used to have its own website, or every school in the university had its own website. Each now became a group with its own editors and its own content. If you're an editor for a particular school's website, you log in and then you can edit the content just within the group that you're assigned to. We also used Layout Builder: the legacy platform had paragraphs, and we switched that to Layout Builder.

Across the 50 sites there were over 60,000 nodes and over 70 content types. And that's not 70 unique content types, that's just 70 when content types with the same machine name are counted once. Then there were over 95 paragraph types and over 60 vocabularies. At the end of the day there were over 37,000 individual migrations that we ran.

This session is about the lessons learned in that: the tips and tricks that we picked up along the way, and some things that made our lives much, much easier. We'll go through that in this session. One thing I want to cover first, though, is why we decided to rebuild versus upgrade, or how you might decide in your project whether you're going to
upgrade your site or just rebuild it.

Drupal 7 was released over 11 years ago. Some of you who were around then might remember the Drupal 7 release parties. There's a map there, and you can see all the parties all around the world. But it's 11 years old, and when it reaches end of life, as currently scheduled for next year, it will be 12 years old. Technology moves on. Certainly if you have an early-days Drupal 7 site, it has grown organically and things have changed. You might have moved from field collections to paragraphs, so you might have both in your site. You will have made decisions that suited your organization back then, but your organization has moved on and your website has moved on; or your website hasn't moved on, and you need to refactor and rebuild. So it's simply no longer fit for purpose, and that's really why we recommend that you rebuild. It's the perfect opportunity to do a redesign and set up the site for the future, rather than carrying all your legacy decisions and legacy infrastructure with you. And then of course there's the issue of abandoned modules, where there is no upgrade path, or where there's simply no clear upgrade path: maybe you have to use a different module, but there's no way to get from A to B.

So, preparing for migrations. Get to know your data: analyze, analyze, analyze. There's no such thing as too much planning. Look at all your legacy data structures, every single field and every single entity, whether it's nodes, paragraphs, field collections, vocabularies, beans. Anyone remember beans? You need to go through it all. You need to understand your data, and you need to understand what you're going to keep and what you're not going to keep. Maybe there's some content that is completely out of date and you don't want to migrate it, so you need to know that. Or maybe there's some you do want, but you want to restructure it. Maybe you want to move from
field collections to paragraphs. Maybe you want to move from paragraphs to Layout Builder, like we did.

So you start by mapping your data structures. What we did was create a spreadsheet for every single entity and map it to where that data is going to live in the new site. This process can take a while. It can take weeks, it can take months, depending on the size of your migration and how many sites you're doing.

Then you define your control data. This is the data that covers the different aspects of the sites and the different combinations of fields, and that's what you're testing with constantly. You run the migration, it doesn't work, you need to tweak it, you roll it back, you run it again. When the tests pass with your control data, then you move on to running it on the full data set.

This is an example of one of the many migration mapping spreadsheets we created. This was all done by hand for all 50 sites, for all 95 paragraph types, for all 70 content types. On the left-hand side, the blue side, is the legacy data structure. You can see we've got the field label, the field name, the type of thing it is, and then, if it's a term reference or a paragraph or entity reference field, what it references: what can that field hold? Then you've got whether it's a required field, and how many values it can take (is this a single-value field or multi-value?), plus any notes you might want to keep. For this particular project we also had a site column, as some of the fields, the ones in yellow, were only on some of the sites. You can see that the news space was only on the main site in the core platform, whereas the private news was originally on the careers fairs section website. Some of them, like the additional content field, were on all of the sites, but the configuration was different. Then you say what you're going to migrate and what you're not, whether it's done, and then
you say where it's going to in the new infrastructure. So this was all created by hand, and it was laborious, so Eric came up with a pretty nifty module for the next migration.

Thank you, Stella. We like working with these spreadsheets because they're useful for collaborating. Front-end developers and back-end developers collaborate on them to say where fields are going, which content types we're going to keep, which content types we're going to merge, all sorts of things, and we can add notes to them. Stella likes to color things in. So when the next Drupal 7 migration came up, we looked at whether we could automate the process of creating this spreadsheet.

We have this little demo of a tool that we wrote. It's a command-line tool, and it will create a Google Sheet for a single entity type. Let's choose paragraphs. Now we've got this overview of all of the paragraph types in a source site. The first tab gives us a list of them all, and then we've got individual tabs for every bundle. Let's have a look at this first paragraph type, the CTA section. On this tab we can see all the fields in this paragraph type. You see there are some nested paragraphs in there, an image and some text. Over here we've got this figure, which is usage: we count the number of times that field is used. If we click on that, we get taken to this report, which shows everywhere it's being used and in what context. We've actually linked this up to the client's legacy site, so we can go there if we want and see that page directly. We can also see some detail about the relevant entity. Let's see what's in this paragraph, 1036. Now we get a nice, comprehensive view of everything that's in that entity. I like to compare this side by side with the existing site and see what's what. Do you have anything to add on that?

That module is pretty nifty. It would have saved me hours of work if I'd had it on the University of Limerick site. So we
are contributing it back. There's a rough version of it up in a sandbox, so we'll be cleaning that up a bit, making it more suitable for other sites, and then we'll contribute it back to the community. The URL is there on the screen.

Thanks. I want to pick up on the point we made about 37,000 migrations. That's a big number, and we didn't write them all one by one, as you've probably gathered. Let's imagine a migration from a Drupal 7 site. We would have an individual migration for each of the different kinds of entities, maybe some taxonomy terms, nodes and users. We write each of those as a YAML file, and each of them gets turned into a plugin. It's quite a typical way to go about this kind of migration. But University of Limerick had one Drupal 7 site per faculty. So this is one faculty, and they had about 50 faculties. You can see how the number of these migrations just mushrooms. If I had 20 things along here and 50 sites, immediately I'm up to a thousand.

We can visualize this on this sort of two-dimensional graph. We've got the definitions along the X axis, and we've got the variations of them along the Y axis. There are also a few discrepancies, a few differences, between these sites. They started off from one Drupal 7 site, it was copied to another one, a few changes were made, it was copied again, so they slowly diverged over time. What we need is a way to take a base configuration for any one of these things and multiply it to get this number of individual migrations that we can run one by one, and end up with all of the plugins that we need.

It turns out that Drupal has just the thing for that, called plugin derivatives. Has anyone here heard of derivatives before? OK, about ten of you maybe. They're used in quite a few other cases, not just migration. Let's have a look at how these things work. We're going to take a typical YAML file, the definition of a migration. Then we add in an extra line to say that this definition
uses a deriver. What happens now is that the YAML file becomes the definition of a base plugin, and it's up to the deriver to take this base definition and return lots of copies. A deriver looks something like this. This getDerivativeDefinitions function gets passed the base plugin definition, which is the YAML we were just looking at. In this instance it's returning three migrations: it's creating one called arts, one called science and one called language. It's up to the deriver to say, OK, I need to copy the base plugin definition, and maybe I need to tweak it, maybe I need to remove something or add something. The end result is that we end up with lots of migrations.

Here are some faculties. This is pulling in articles for each different faculty, and on the screen here we've got five different migrations. Each one is an individual migration, just like a non-derivative one. They've got individual queries that they run on the database, and different totals. We can import one of these, we can import all of them, we can roll them back. They're just the same as non-derivative migrations, and these plugins can be thought of as normal plugins. They just have this slightly unusual name, made up of the base identifier, then a colon, and then the derivative name. Other than that they're exactly the same.

So as Stella mentioned earlier, we opted to change the way the site was built, switching from paragraphs to Layout Builder. Not only that, but we went down the route of giving a unique layout to every node where we had been using paragraphs. I want to talk a little bit about the migration aspects of that.

First, a quick bit about the Layout Builder terminology. The topmost concept within Layout Builder is called a section, and sections have one or more regions. On this page I've got two sections: one is a single column, one is three columns. Every bit of content we can call a component, and we place the components into the sections. All of that information
lives in a field on the node called layout_builder__layout. I have no idea why it's named like that, with the double underscore, but it is a field just like any other, and we want to set it in the migration. I'm going to talk about two aspects of the migration that we need in order to get that filled in.

The first step is that we need to get inline blocks created. In Drupal 9, inline blocks are block content; that's the name of the entity type. This migration is like any other migration: we're taking paragraph entities and turning them into block content entities, and we use the process section to populate the fields that we have. One field I want to draw your attention to is this thing called reusable. We need to make that false. If we made it true, every block we migrated would be available in a list to add to every page. We don't want that. The blocks that we're creating are intrinsically tied to one node, so we're just going to make sure they're not reusable. Now, we can run this, and we do run it first, but bear in mind that we haven't migrated any nodes at this point, so at this point all these blocks are going to be orphans.

The next step is to match them up to nodes. Again, the node migration is like any other migration. We can home in on the layout_builder__layout field that we talked about earlier. We've got to turn paragraphs into sections and components. The first thing we do is locate and load the previously migrated block entity. We wrote our own plugin for this, and it looks a bit like this. It uses the migration lookup service first, which gives us the block ID, and then we load the entity.

The second thing we want to do is make a component out of that block. The component describes how a block will be placed into a section. Again, it's a custom plugin. We take our block and we set certain parameters: in this case the region that we're putting it in, whether to show the label, what the label is, and what
view mode we want to see on the block. We end up with this section component object at the end.

Lastly, we create one or more sections. Again we've got a plugin. We choose the layout that we want and we put all our components into it. What we want to end up with is an array of sections right at the end, and all of that gets fed back into this magic layout_builder__layout field that we have. That is how we went about migrating paragraphs to Layout Builder. We've got a few tips and tricks we want to finish off with, and I'll hand over to Stella for the first few of those.

Thanks, Eric. Just to add that we will be writing a blog post with the code, and we will be publishing the slides. The blog post will go into more detail than what we've covered here. That was a very simple example with a one-column layout. What if you've got multiple columns? How do you know what goes where? What if you've got a nested paragraph tree? That will be covered more in the blog post later on, and we'll publish that.

So, some tips and tricks. First one: I mentioned earlier that you're running migrations, they don't work, you need to tweak them, you roll back, and you do it again and again. Sometimes you're going, my database is completely messed up and I need to start afresh. So a nifty tip: drush site-install will reinstall the site with a clean database, but if you add the existing-config option, it will take all the configuration that's currently exported to your YAML files and import that for you. That's very handy when you just want a clean slate to start and test with again. Also, if you're switching branches, the configuration can change, so if you're jumping between branches a lot for migrating different things or testing different things, that can be helpful.

Another thing that we recommend is that you turn off caching for your configuration. You will be editing your YAML files over
and over and over again, and you don't want to do a drush cache clear every time. So the setting shown there will disable the backend cache for your configuration. Obviously don't deploy that to production, that's for your local environment, but it is very handy and it saves you a few headaches. We also put the migrations in the migrations directory rather than config/install. That's actually another tip I should have mentioned. So you can just edit, clear the cache or not, and it'll be there.

Another one that I've fallen foul of a few times is Search API. You're working away, you're migrating, and everything is fine. You merge the latest main branch into your migration branch so you have the latest configuration, and somebody has configured Search API or Solr, and now every time you run a migration it's also indexing the content you're migrating. That slows things down considerably, and when you're doing thousands of migrations you don't want that. So turn off search indexing. Whether it's Search API with Solr or some other tool that you're using, turn it off.

I'm sure most of you who are familiar with migrations know this already, but there's drush migrate:import to run your migration. The idlist option is very handy if you just want to test with one particular entity ID. If it's a multilingual site, you'll probably also have a colon and a language code after that. And there's the migrate-debug module, or sorry, not module, option. Is it provided by Migrate Tools or Migrate Plus? I forget which one; Migrate Tools, I think. That's very handy. There's also the migrate-debug-pre option for when the entity doesn't get saved, so you can still debug and print out some output. It will display the legacy data structure and then also the new structure that you're migrating into. If you use migrate-debug and the migration did create the entity, it will also give you the ID of the new entity created, so you can then go and check it out in your browser and make sure
it's correct.

All right, next one is to plan around the dependencies that you have. It's a good idea to order your migrations in such a way that when you come across a reference, like I do here, the thing you're referencing already exists. Here I've got some category terms, I've got some people and I've got some articles. Articles have a taxonomy field that looks at topics, and articles also have an author field that looks at a person. I would say it makes sense to migrate the categories and the people first, and then come back and do the articles.

I don't have to do that. Drupal has this concept of migration stubs, where if it comes across a reference pointing to something that doesn't exist, it will create a kind of temporary placeholder node for it and reserve the ID. Then, when you come to migrate the real thing, it will populate that placeholder and fill it in for you, which is kind of neat. It can be a bit problematic, though, because sometimes we find that we have reference fields that are pointing to nothing. They've just got bad data in them, and if we use stubs then we end up with a stub node sitting around. So I would prefer that we don't do that and just order things in the right way.

We can't always do that. Sometimes we end up with a circular reference. In this case each person has one particular article that's about their biography, and then you might have to use a stub. What I would say, though, is that you can really think of one of these dependencies as the main one. You can still think of the article as being dependent on the person. I'd do that one without a stub, and do the biography with a stub.

Links. Links are hard, particularly links in free-form text like this, and particularly if you're mixing migrated content with new content. If you end up with something like this, good luck, because that's really hard to deal with. Everything that we just said about organizing your
dependencies goes out the window here, because the link could be anywhere. Things that you can try here: first of all, decide if your numeric IDs are actually a public interface. These things do have a habit of leaking out and appearing in various places. If you can avoid that, then do. One thing you can try is to write a process plugin that looks up whether there's an alias for node/123 and swaps it out in the HTML.

There's another thing that we were playing around with just last week, which may work really well. We're still trying it out, but that's to reserve a block of IDs for all your legacy content, and give any new content a much higher number. This will work if you're migrating from a source that's about to be decommissioned. It's not going to work if you have a continuous migration that you keep running. I'm curious to see how well that works and whether there are any issues with it, because it would solve a lot of problems.

The advantage of that approach is that you don't need to use the DOM import and the other DOM process plugins that Migrate Plus provides. They're really handy for parsing HTML, but you don't need to worry about doing that, you don't need to worry about link fields, and you don't need to worry about redirects. So it can save a lot of pain, and I think it will work fine. You also wouldn't need to worry too much about dependencies and broken dependencies, because you keep the node IDs the same, and if they're broken in the source, they're broken in the destination, and that's OK. And by not doing a migration lookup to find out the alias, you're making it faster.

This is one that really confused me for a while: single values and multiple values. It confused me because there are these two plugins, and both have a transform method that looks a bit like this. I thought, what's going on here? This is not doing anything. When would you use
this? It turns out that the process pipeline does one of two things. It either treats the data it's given as a set of values, multiple values, and passes all of those to the process plugin. Something like concatenating text would be a good example: if you have five pieces of text and you want to concatenate them, you pass all five to the process plugin. The other thing it can do is call a plugin once for every single value. Something like formatting a date would be one of these: the plugin operates on a single value, so if you have five dates, you call the transform method five times. Now, Drupal normally kind of knows which one of these to pick and how to treat the data, but it can get it wrong. So these plugins don't actually change the data in any way. What they do is operate as a hint to the next plugin: OK, you should be treating this data, which happens to be an array, as a single value; or, you should be treating this data as multiple values.

One more: migrations work best when they have clean and simple source data to work from. If your data is a bit jumbled up, consider having a separate step, before you even start the migration, to clean it up. I was working with a site that had data scattered across CSV files, and I was having to cross-reference things across different CSV files. I thought, if only this was a database, then I could write a nice query with a join and it would be so much easier, but I've got to hunt through these different CSV files. So what I did was put them in a database. CSV is tabular data, so it's nice to load in. I can also recommend using something like SQLite for this. There's no extra infrastructure you have to worry about. It's a single file, and it lives in the private files directory. We dump our data into there, and now we have much easier ways of querying it. That's it. Anything to add?

I think another thing about pre-processing and cleaning your data beforehand (sorry, I'll come up higher on the stage) is
to try to avoid the skip-on process plugins, as they mess up your counts. You'll see that you have so many unimported, then you run the thing, and then you have a smaller number unimported and no failures, and you're going, what happened to the other ones? So if you can write a source plugin to narrow down your data set in the first place, then your counts will be sane.

Yeah, I'd kind of reserve those skip plugins for error conditions, I think.

OK, thank you for listening. If you have any questions, there's a microphone that can go around, or if not, you can come up and take the microphone. Put up your hand. There's also the app: if you're more shy and want to ask questions in the app, we can answer them that way.

Yes, thank you. You mentioned that data cleanup is a good idea before the migration, but we discovered in our organization that the content owners are mainly non-technical staff. Is there any tool in Drupal, maybe a module or anything else, that could help non-technical people do data cleanup?

I don't know of a tool in Drupal. We do use Screaming Frog to create a list of all the pages on the site in a crawl, and then we give that, or a simplified export of it, to the client. You can get a lot of data from that crawl, and they can use it as the basis of their content audit. We also have a content team who can work with the client to help them if they're struggling.

Another question here?

Not a question, but another trick. I had good success with installing the UUID module on Drupal 7, so I don't have to work with auto-increment IDs. It's then very easy: you don't have to keep the IDs, because the entities have a UUID as an identifier, and you can always use it to resolve from your destination back to the source data set. The UUID is essentially the same even though the ID can change, and I think it gets easier then.

Yeah, I hadn't thought of that one. UUIDs are useful. It's also worth noting that
there are ways to create a UUID from an arbitrary string in a predictable way. So if you can get a key that you know is going to stay the same, you can always create a UUID from it.

One question here, close by.

Did you do the migration in one clean step, or did you do it in incremental steps, where you took it step by step?

In development or in production? We did it in one fell swoop. Obviously, during development you just do what you need. We would have tagged each one, so node articles were tagged as articles and we could run all of those as one batch, and we would also have tagged them by site, so we could do multiple groupings of things and roll out one site and not another. As it happens, we really ran it all from start to finish, and then there was some manual cleanup of data after we did the migration. While the design wasn't changing for this particular project, we were consolidating and narrowing down components, so the 95 became 20 or 30. We were saying, well, you have to make a decision as to which sort of layout, which sort of design you're using; you have an insane number of paragraphs. So there was some tweaking required to the content afterwards. We had a content freeze of sorts, where they could add new content if they wanted but we wouldn't update the migration, and they were doing manual edits. But yeah, we did it all in one swoop.

There was a bug in the Migrate Tools module that meant that when you ran everything in a batch, it would run the first migration nice and speedy; then when it went to run the second one, it would run the first one again and then the second one; when it went to run the third one, it would run one, two and three; and so on. That was taking hours, like 12-plus hours. That got fixed, and in fact it's much faster when you do one at a time. So it took about three hours, I think, in the end, from top to bottom.

Hello. I'm assuming that you had links
inside content, and across several projects the links were referring to the same page or to external pages. How did you manage that, or clean up the URLs at the end, since there were probably links referring to old content? What was the strategy?

So the new strategy that we outlined actually wouldn't work for this site; it wasn't used for the 50 sites. We wrote process plugins to clean up the text. I mentioned earlier the DOM import and DOM export process plugins from the Migrate Plus module; you can then do manipulation on the DOM, like a string search and replace. So we had process plugins that would parse the DOM, find links, and do a search and replace on them. Every site that we were migrating was at ul.ie/ followed by the name of the faculty or the school. So if a link began with one of our 50 faculties, we would manipulate it. If it went to, say, ul.ie/library, which wasn't in scope for that migration, we would leave it alone.

We were also a little bit fortunate with the university, given that they had a site for each faculty. They would have www.ul.ie/science, so if we found that there were numeric paths, it would be www.ul.ie/science/node/5. As we were migrating that into a single site, which is just ul.ie, we could put in aliases with the old site prefix. So we now have an alias that says /science/node/5 points to node 250, or whatever it is.

Got one more question, I think.

So I'm assuming you didn't have much custom code on these sites, like there were no custom entities and things like that. They were a bit hard to create in D7, but you could. So I assume there would be some problems you'd encounter if you had them?

There was custom code, a lot of it, but there were no custom entities. I didn't migrate from any custom entities, but if you did have a custom entity, you'd just write a custom source plugin rather than relying on the D7
node source or whatever as your base.

Thanks. I think that's time. OK, if anyone has any further questions, we're happy to talk afterwards. But yeah, thank you very much for coming.
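
To make the earlier tip about loading scattered CSV files into SQLite a little more concrete, here is a minimal sketch. It assumes Python and its standard csv and sqlite3 modules, which were not part of the talk, and the table names, column names and sample rows are all hypothetical. It only illustrates the idea that once the CSV data sits in one database, a cross-reference becomes a single join.

```python
import csv
import io
import sqlite3


def load_csv(conn, table, csv_text):
    """Load CSV text (with a header row) into a new table of TEXT columns."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({columns})')
    # The reader now yields only the data rows, one list of strings per row.
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)


# Hypothetical sample data standing in for two legacy CSV exports.
PEOPLE_CSV = "person_id,name\n1,Ada\n2,Grace\n"
ARTICLES_CSV = "article_id,title,author_id\n10,Hello,1\n11,World,2\n12,Orphan,99\n"

# An in-memory database keeps the sketch self-contained; the talk describes
# a single SQLite file kept in the private files directory instead.
conn = sqlite3.connect(":memory:")
load_csv(conn, "people", PEOPLE_CSV)
load_csv(conn, "articles", ARTICLES_CSV)

# The cross-reference that meant hunting through several files is one join.
# LEFT JOIN keeps articles whose author is missing, so bad references show up.
rows = conn.execute(
    """
    SELECT a.article_id, a.title, p.name
    FROM articles AS a
    LEFT JOIN people AS p ON p.person_id = a.author_id
    ORDER BY a.article_id
    """
).fetchall()

for row in rows:
    print(row)
```

In this sample the third article points at an author ID that does not exist, so it comes back with no name. That is the kind of dangling reference worth finding in a clean-up step before the migration runs, rather than discovering it through stubs or skipped rows.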