Hi, everyone. Welcome to DrupalSouth, and thanks for coming to my session. Today I'm going to share my experience with content migration. Let me start with a quick introduction. I'm a Drupal developer at MoV, working across back-end and front-end, and recently I've been focusing on web accessibility and content migration. The Migrate API in Drupal is a three-step process: extract, transform, and load. Extract is where you get the data from your source, but the raw data you're getting may not be in the right format to work in Drupal, and that's where process plugins are used to transform the data into something useful in Drupal. So what can we do with migration? There are tons of use cases. Maybe you're doing a site upgrade from Drupal 7 to 8. Maybe you're rebuilding a non-Drupal site as a Drupal site and need to migrate the content across. Or maybe you're doing a bulk update on content, say populating data in a new field, or transferring data from one content type to another. There are also continuous migrations, where you do regular updates, maybe pulling weather information from a forecast feed, or stock prices from the share market. You can also build a centralized content repository where different satellite sites fetch data from one place. Now, migration sources. The Migrate API supports a wide range of data sources. You can import from CSV, XML, JSON, JSON:API, and you can even run SQL queries if you have access to the database. But in some cases, exporting data from the source may not be possible, or the data structure is too complicated or too hard to handle. So what can you do? There's content scraping, and there are a number of libraries that let you scrape content from a website. What's good about it is that what you see is what you get, so an image is an image.
You're not getting a token that you need to process to get your data. It also works for Drupal or non-Drupal sites. The downside is the content: you don't have much structure in it, and it's also harder to deal with things like entity references. So let's see some examples. Say your clients want you to migrate a simple blog site like this one. What are you dealing with? Looking at the front page, you can see each article has a title, a date, a summary field. Drilling into the articles, we've got more fields: the tags, the HTML body, internal links, PDF downloads. There are also other hidden elements: the publishing status, manual URL aliases, 301 redirects if you want to divert your traffic from the old path to the new one. So what are the challenges? Let's look at the date. Here is the date format you're getting from the source, but in order to import into a date field, there's a specific format you need to give Drupal. Taxonomy: you're getting a list of tag links, but in the Drupal world you need to first create the terms, and then you're handling target IDs in the term field. The same thing goes for images. How do you download the image, and where do you save it? Again, you need to create a file entity, and you're dealing with the file ID in the file field. So what tools are available? There are a number of process plugins we can use to do the job, and I'm going to go through some handy ones. concat is a good tool to combine a number of fields into one single string. In this example, we're converting an address field into a single one-line address, if that's the format you need on your new site. Or you could go the other way around if you want some structure in your address: the explode plugin lets you break up a single string into an array. The substr plugin lets you take a substring of a string and output just that portion.
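To make those string plugins concrete, here is a rough sketch of how concat, explode and substr might look in a migration YAML. The source and destination field names here are invented for illustration; only the plugin names and their options come from Drupal core.

```yaml
process:
  # Combine separate address parts into one single-line string.
  field_address_oneline:
    plugin: concat
    source:
      - street
      - suburb
      - state
      - postcode
    delimiter: ', '
  # Or go the other way: break a delimited string into an array.
  field_tags_raw:
    plugin: explode
    source: tags_string
    delimiter: ','
  # Take just a portion of a string.
  field_state:
    plugin: substr
    source: address_oneline
    start: 0
    length: 3
```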
So in this example, we're just getting the state from the address. These should all look familiar, because they mirror PHP functions. In some cases, the values you need aren't available in the source, and you can use default_value. In this example, we're setting the node author to UID 12 for all the articles, because it's not something you can get from the source. static_map is another good tool to use. It allows you to create a one-to-one mapping from one set of data to another. For instance, in Drupal 7 a user role is stored as a role ID, and you can map that to the corresponding role in Drupal 8. Date formats, we've seen this before. The dates you're getting may not always be ones you can use straight away, and the format_date plugin can transform the date format. So let's look at entity references. Entity reference isn't that complicated if your data source knows what the target ID is, which is possible if you're doing a Drupal-to-Drupal migration. But in most cases, this is more likely what you're getting: just a list of tag names. That's where entity_lookup can be helpful. You basically take the tag names, look them up in the system, and work out the TID to use for the term field. And if you have a new tag that doesn't exist on the site, you can use entity_generate: during the import, if it finds something that doesn't exist on the site, it can generate the term on the fly. migration_lookup is another awesome plugin. It allows you to reference an entity that was created by another migration. In this example, you first import all the users, and then while you're importing the articles, you're able to reference the author from a user that was created in the first migration. The download plugin allows you to grab a file from a remote source and save it to a destination.
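As a hedged sketch, the date, taxonomy and author handling just described could look something like this. format_date and migration_lookup are Drupal core process plugins; entity_lookup and entity_generate come from the migrate_plus contrib module; the field names, formats and the my_users migration ID are made up for the example.

```yaml
process:
  # Transform the source date format into what a Drupal date field expects.
  field_date:
    plugin: format_date
    from_format: 'd/m/Y'
    to_format: 'Y-m-d'
    source: post_date
  # Look up tag names and return term IDs; generate missing terms on the fly.
  field_tags:
    plugin: entity_generate
    source: tag_names
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
  # Reference the author created by an earlier user migration.
  uid:
    plugin: migration_lookup
    migration: my_users
    source: author_id
```

Swapping entity_generate for entity_lookup gives you the lookup behaviour without creating new terms.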
And then we can use migration_lookup to populate the file ID into a file field. But I would recommend using file_import, which is a very handy tool. It basically combines the download, the creation of the file entity, and the migration lookup in one go. So this is all you need from your source: if you know the URL of your file, this is how you can do the import. The same goes for images, where you can use the image_import plugin to do the magic. URLs: if you need to handle URL redirects or aliases, redirect is an entity in Drupal 8, so here we can do a simple migration that imports the data into the redirect entity; that's the destination plugin we're using. The same thing for aliases: with the url_alias destination plugin, you're able to give your content some SEO-friendly URLs. And there are more. The callback plugin allows you to use PHP functions to process your data. In this example, I'm calling a function from Drupal core to convert line breaks into paragraph and break tags; this exact function is used in the text filter. A pipeline itself is not a plugin, but it allows you to run a number of plugins sequentially. In this example, after running the line-break filter, we further run another plugin to do a string replace to fix some typos. And if none of the existing plugins do what you need, you're always welcome to create your own; it's not hard to create a custom plugin. So there are a lot of process plugins available, and I've just covered a few of them, the ones highlighted here, and there are more from contrib modules. Here are some useful links you can check out if you're interested in content migration. Thanks. Any questions?

Thank you. So first of all, just a question for you: have you ever experienced continuous migration?

Yes, yes I do.

Okay, so here's a live problem that we have.
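A sketch of the file import and the callback/pipeline combination mentioned in the talk. file_import is provided by the migrate_file contrib module and str_replace by migrate_plus; _filter_autop is the core function that converts line breaks into paragraph and break tags. The field names, the source URL field and the typo being fixed are all illustrative.

```yaml
process:
  # Download the file, create the file entity and return the fid in one go.
  field_document:
    plugin: file_import
    source: pdf_url
    destination: 'public://documents/'
  # A pipeline: run plugins sequentially on the same value. First convert
  # line breaks to <p>/<br> tags, then fix a recurring typo.
  'body/value':
    - plugin: callback
      callable: _filter_autop
      source: raw_body
    - plugin: str_replace
      search: 'recieve'
      replace: 'receive'
```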
So we have a website where we do continuous migration using migrations, and we hit the same issue that was talked about earlier, which is when the migration gets stuck. That's bad. Basically, I'm not sure about now, because in Drupal 8.7 migration went into core, so maybe it's an issue that has been fixed, but our issue hasn't been fixed: it gets stuck and then we need to reset that migration. Has this been solved, do you think?

That's bad luck when that happens. I didn't go through the drush commands, but you can run drush migrate-import --update, which forces an update. So if you're stuck in the middle and you can work out what the problem is, fix that, and then you can do an update from the beginning and just run it again. There's also a hash key you can set, so it only looks for the differences: if something has been updated, it only updates those ones. So if you have 1,000 records, 1,000 pieces of content to migrate, and you've only made one change, you don't need to re-migrate all 1,000. With a hash key set, it finds the content that has been updated and migrates just that one. The same goes if it breaks down in the middle: with the hash key set up, it won't re-migrate the first 500 that are already done; you can pick up from where it failed and do the rest of the content.

Actually, we didn't really have an issue with the high water mark; for example, if we have 1,000 items to be migrated, even if only 500 were done, that's fine, we can continue. The problem is with the migration status: when it says it's still processing, we need to run drush migrate-reset-status.

Migrate reset, yeah.
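On the incremental side discussed in this exchange, the hash-based change tracking and the high water mark are both source plugin options in core. Roughly, using a migrate_plus url source as the example (the URL and property name are hypothetical):

```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls: 'https://example.com/articles.json'
  # Hash each source row so only changed rows are re-imported on the next run.
  track_changes: true
  # Alternatively, if the source exposes a reliable "changed" timestamp,
  # a high water mark skips rows already processed:
  # high_water_property:
  #   name: changed
```

Normally you would pick one of the two: track_changes hashes every row, while the high water mark relies on a monotonically increasing source property.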
So basically what we did is, on every cron run, we run drush migrate-reset-status for all the migrations, for example. We had to do it all the time, which is fine, it's working for us, but it feels fiddly.

But do you know what the problem is? I think the first thing is you need to work out what's causing the-

It can be anything, even for the simplest reason: a connection timeout, that's it, the migration is stuck forever.

Yeah, I'm not sure if that can be handled in the Migrate API; you might need to do some checking on top of that, and it depends on the data size. Like you say, it could be anything. But the Migrate API does its work, and if there's something else causing the problem, then you probably need to fix that in order to get things working.

Just to add, migrate is actually quite stable. If the internet connection is the issue, for example when dealing with a large set of data, how we beat it is a separate process to download the data, be that CSV or JSON. So you get it onto your machine and then you process it locally. That was one of the issues, because the Migrate API never got stuck on me, ever, apart from timeouts and the like. So maybe look at the process that gets the data to your machine, separate from the migration, and then process it.

Any more questions?

Thank you. Just one comment for anybody using entity_generate: be careful. It's very useful and does have its place in the world, but it creates entities and you have no record of them in your migration maps. So when you start doing things based on your maps, like rollbacks or lookups, you can get out of sync and get confused about why things don't work. entity_generate does not follow the rules of an ETL process; it breaks the rules. So use it at your own risk. Thank you very much.
I was wondering if you have experience scraping a site: you have an HTML website that you want to scrape and migrate from. Do you have a strategy, for example using an intermediate tool to create a JSON file, or reading directly from the HTML into migrate? Have you done that before?

Sorry, I didn't quite get the question.

If you're scraping an HTML site, what would be a good way to do that? Directly query the HTML of the site, or maybe have a tool that reads the site and creates a JSON representation, and then use that JSON file to migrate into Drupal. I was wondering, if you've ever had to do HTML scraping, how would you do it?

How do I do it? I think what you suggest, putting it in a JSON format, is good. So you first scrape all the pages from the website into JSON objects, and if you want, you can make some changes in the JSON before you import it into Drupal. Yeah, I think that will work.

Just to add on top of that: we were scraping a Squiz Matrix CMS site, and it had broken HTML. In fact, it wasn't just one pattern; it was broken in different ways on different pages. So I would really recommend you have an intermediate step to process the data and make sure that whatever tool you're using for scraping is actually giving you the right data. Once we had the body, we then used migrate to turn it into paragraphs. So it depends how trustworthy your data is; you really need to make sure your data is valid, because otherwise you'll get all sorts of stuff. And a question from me: what's your preferred way? Do you have a rule of thumb for when to use which format, like XML, JSON, CSV? Do you have preferences?

It really depends on what is available from your source or from the clients you're doing the migration for.
But I think JSON is pretty good. It's a well-structured format, and we have been using a lot of JSON objects for migration as well. All right, thank you. Last question, Daniel? All right, thanks Kelvin. Big round of applause. Thank you.