If you can see my screen, we can start. Yep. Yep. So today I'll be talking about the Merlin framework. As I mentioned in my introduction, I work with Salsa, where I got introduced to this migration framework called Merlin. Today I'm going to give you pointers about what exactly this tool does and how you can use it during your migrations.

So what exactly is Merlin? Merlin is a tool that helps us migrate content from a website into structured data. Basically, you can extract from any source website, and what you get as output is structured data. That data can then be imported into Drupal or any other system, so overall it simplifies the migration process. Hello, can you hear me? Yes, yes. So irrespective of the source of the migration, we can extract data from the source and get structured data as output, that is, JSON files. We extract the structured data into JSON files, which can then be used by other systems during the migration. Let me talk about the details of what exactly it does and how it works.

Merlin does three things. First, it helps us identify all the public URLs that need to be migrated. For example, if you have a website to migrate, the first step is identifying all the public URLs and deciding which of them you want to bring across to the target system. Second, it loops through each of those URLs to extract, or scrape, content from the source. Finally, it gives you the extracted data in JSON format, which you can then import into Drupal or any other system. For the first step, Merlin comes with a URL crawler that helps us generate an extensive list of all the public URLs.
Once you have the public URLs, you also have the option to group them, in multiple ways: you can group URLs by URL pattern, by the presence of a DOM element, or by the value of an element's property, and I'm going to talk about these in detail. Just to put it in context: imagine you have a website that needs to be extracted. You can extract the full list of public URLs and segregate them based on the target system, for example by target content type. So as a starting point you have the whole URL list, segregated by content type.

Let's see how exactly we do that. The Merlin framework uses config.yml files. Here you define your source domain, and there are plenty of options: you can set a delay, the concurrency (the maximum number of parallel requests), and a starting URL. The delay ensures you don't burden your source system. There is also a group_by option, which is used to segregate the URLs. This is a typical config.yml file.

Let's see the group_by conditions and how we can use grouping. While extracting the URLs, as I mentioned, we can group them together for the target system. First, you can group by URL pattern, so that URLs matching a certain pattern are grouped together. Second, you can check whether an element is present on the page and group those URLs together. Third, you can match a property of a particular element and group URLs based on its value. These are the three types of grouping currently available.

This is how we run a basic crawl: we provide the config.yml to the crawl command and define where the output is going to be written.
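The pieces described above (source domain, delay, concurrency, start URL, and group_by) could be sketched in a config.yml like the one below. This is only an illustration: the key names and the shapes of the three group_by conditions are assumptions based on what the talk describes, so verify the exact schema against the Merlin documentation.

```yaml
# Hypothetical crawl config sketch -- key names are assumptions
# based on the talk; check the Merlin docs for the real schema.
domain: https://www.example.com

options:
  concurrency: 10      # maximum number of parallel requests
  delay: 100           # pause between requests so the source isn't overloaded

urls:
  - /                  # start URL for the crawl

group_by:
  # 1. Group by URL pattern.
  - id: news
    type: path
    pattern: '/news/*'
  # 2. Group by the presence of a DOM element.
  - id: landing_pages
    type: element
    selector: '.hero-banner'
  # 3. Group by the value of an element property/attribute.
  - id: articles
    type: attribute
    selector: 'body'
    attribute: 'class'
    value: 'node-type-article'
```

The crawl is then run by passing this file to the crawl command together with an output directory, along the lines of `crawl -c config.yml -o output/` (the exact binary and flag names are per the Merlin documentation).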
Once you run this command, you get the list of all the URLs, and the output files are generated from them. That is the crawl, the first step of the migration that we typically follow.

Once we have an extensive list of all the URLs in the system, and once we have finalized which URLs we need to migrate, we start with generate. What the generate command does is scrape the pages at the URLs we have provided, extracting content based on the mappings defined in the config file. So if you have a page and want to extract some data out of it, you define that in the config and use the generate command. I'm going to explain how this is done.

Merlin provides a lot of inbuilt types that we can use; its documentation lists all of the predefined types. For example, if there is an accordion on the source page that needs to be extracted, there is a type for that, and if there is media and you want to extract the file name, alt text, and so on, you can do that too. These predefined types help us during the migration. There are also preprocessors that perform operations on the extracted content, for example whitespace removal, or string replacement applied at extraction time, and a few other things.

So that's generate. For example, let's see a mapping. Say we have an H1 tag and we want to extract it from all the pages. How do we do that? We add a mapping: the field is title, the type is text, and the selector targets the H1 tag. This will extract the data. Let me show you an example of how this can be done and give you a demo. I have a generate command script.
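The mapping just described, plus the preprocessors mentioned earlier, could be sketched in a generate config like this. The type names (text, long_text) and processor names (whitespace, replace) mirror the talk but are assumptions; the exact identifiers should be checked against the Merlin documentation's list of predefined types.

```yaml
# Hypothetical generate config sketch -- type and processor names
# are assumptions based on the talk; verify against the Merlin docs.
domain: https://www.example.com

urls:
  - /about-us
  - /news/some-article

mappings:
  # Extract the page H1 as the title, exactly as in the talk's example.
  - field: title
    type: text
    selector: h1
  # Extract a longer text region and clean it up while extracting.
  - field: body
    type: long_text
    selector: .content
    processors:
      - processor: whitespace    # whitespace removal
      - processor: replace       # string replacement at extraction time
        pattern: 'Old name'
        replace: 'New name'
```

Each mapping pairs a destination field name with a CSS selector and a predefined type, so the same config drives both what is scraped and how it is keyed in the output JSON.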
I'll extract some data from the Salsa Digital website. Let me show you the website first. I'm going to target only two elements at the moment: first the title tag, and then this description. To do that, I have created a YAML file and defined the domain as the Salsa Digital site. I have provided a list of URLs in this part here, so I'm going to extract only five pieces of content at the moment. There are a few other configurations, including the concurrency and the delay, which I've kept at standard values for now.

The main thing we need to look at is the mapping, which contains two fields: title and summary. For the first one, I have used a selector that extracts the H1 tag as text; for the second, I've used long text, which extracts the hero banner description.

Let me run this and show you. I've provided the config file as input and a reports folder for the output; this is how I run it. It tells me which URLs it has parsed and where the file has been generated. It has generated a JSON file for me: if I go back to the reports folder, I have the article.json file, and it has extracted the title and also the summary. Similarly, you can do this for a lot of the defined types, including complex ones like media, accordions, paragraphs, ordered lists, and menus; you can extract menus as well.

Once we have this extracted JSON (this is the JSON that I have), we can proceed with the migration. I have already created a Drupal JSON migration, which has article.json as its input, title and summary as the source fields, and process plugins to import them into the title and body. So let's migrate; I have created this importer already.
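The Drupal migration described above could look roughly like the following, assuming the JSON keys title and summary from the demo. The plugin names here (url source with a json data parser, entity:node destination) are standard Drupal Migrate API / migrate_plus configuration, but the item_selector and file path are assumptions, since the exact structure of Merlin's JSON output isn't shown in full.

```yaml
# Hypothetical Drupal migration sketch (migrate_plus JSON source).
# The item_selector and the path to article.json are assumptions.
id: merlin_articles
label: 'Import Merlin-extracted articles'
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - 'private://reports/article.json'
  item_selector: data
  fields:
    - name: title
      selector: title
    - name: summary
      selector: summary
  ids:
    title:
      type: string
process:
  title: title
  body/value: summary
destination:
  plugin: 'entity:node'
  default_bundle: article
```

Running this with the Migrate API then creates article nodes from the scraped JSON, which is the import step shown in the demo.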
Let's import the content. And yes, we have this content imported. That's mostly it about the Merlin framework. The benefit of using this framework is that you don't have to get access to the source system; irrespective of what the source is, you can migrate from it, and it simplifies the migration overall. So that was my presentation on the Merlin framework. Any questions?

I don't see any questions, so one question from my side. The one use case I can see is that it's applicable to public websites only. Is there any way we can use it for internal sites or internal applications?

When you say internal, do you mean sites hidden behind some login?

Yeah, something like that.

If it's hidden behind just a basic shield, then yes, of course, we can provide the URL with that.

I understand that basic authentication is a different thing, but what about an intranet site that's only accessible internally? That might be another use case for Merlin.

Not yet, but that's an excellent feature that could be added. Merlin is open source. I'll put in the links to the Merlin documentation as well as the Merlin GitHub repository for anybody to have a look at it and have a play with it if they want to.

Hi, I see that you have your hand raised.

Yeah, thank you, I do. I have a very selfish question. And I'm sorry, I came in late to the session, so I may have missed the answer.

Sorry, hold on. Can Max go first?

I can. I was wondering if it's also able to bring across assets that are attached to the pages being scraped, like PDFs, images, that type of stuff.

It can bring in the URLs of the assets, but not the assets themselves. The aim of Merlin is to get the content into a structured JSON format. So when we're doing a media migration, for example, it can bring in the URLs, and then the Drupal Migrate API can be used to actually bring those files into Drupal.
Cool. Are you there now? Can you talk now?

It's okay, that answered my question. Thank you.

All right, cool. I've put in the links for the Merlin documentation as well as the GitHub repository for Merlin. Merlin is an open source tool, so please feel free to create issues and so on. If you see something that you want added, please go ahead and create a pull request and we can have a look at it.

Thanks a lot for your presentation. I'll just stop recording now.