Welcome to Custom Importers 101. You're going to learn some things today, I hope. My name is Jordan Thompson, and I'm a Drupal Solutions Lead at Northern Commerce. I'm a level six Drupaler, so I've been in Drupal land for about six, six and a half years. That's my little XP bar right there. You can find me on Drupal.org as Nord 102. Some more things about me: I collect Funkos, if anyone knows what those are. They're little figures, and I have way too many of them. I also like Lego a lot; my desk is full of it at work. And I like cool stuff, and this is cool. At least I think it's cool.

So the agenda: what are importers, the available import methods, why you would make a custom one as opposed to not, a little walkthrough of how to create one, scheduling and automation, and some gotchas. And this is the link to a PDF of the slides if you want it, so I'll give it a second for anyone who wants that. Everyone else can look at our sweet office, which is what you're looking at. We're located a little outside of Toronto, Ontario, Canada, so it was a bit of a journey to get here. If my accent doesn't give it away, I'm from Canada. OK, has everyone who wants the bit.ly got the bit.ly? I'm seeing nods. Cool.

So what are importers? They're tools you can create and use to make the process of creating large amounts of content from external data sources easier, while still keeping some control over it. And it's much better than manually entering content to bring stuff over. It could be a migration, or it could be something coming from an external API. It just works as expected, and there's no manual entry. That's really the big thing: automation is key.

So, the currently available import methods. We can start with Feeds. It allows you to import and aggregate data into content entities using a web interface, so it's a nice little UI where you do your mapping, and it's pretty decent for that. It doesn't really require a lot of coding either, unless you want to get into some tampering: you can combine it with Feeds Tamper, where you can do some preprocessing on the data before it actually goes in. Maybe you need to capitalize some stuff or do a little tweaking, because no data is perfect when it comes in; it's usually pretty bad. You can also write your own custom Tamper plugin to handle anything that's not out of the box.

Then we have the Migrate API. It's not just for migrations, but in my experience it's most commonly used for them. It provides lots of different APIs and tools for importing data. It's very good for going from Drupal to Drupal, but it also works for any other external source into Drupal. You can write plugins, and there are contrib modules you can plug and play to get a lot of functionality out of the box, and then add any specific components you need based on your data, such as media handling or other things.

And then there's the Queue API. Hands up if you went to the queue talk yesterday. Hey, yo. This allows you to handle a number of tasks at a later stage: you can make a bunch of queue items and then process them whenever you want; it doesn't have to be right away. It follows a first in, first out methodology, which is nice. So anything you want processed in order actually goes in order; it's not just whatever order Drupal feels like. And here's a little summary of the steps. Using the queue interface, you create an item, a queue item, and it just sits there waiting to be processed. You claim it, which is the processing part. If something goes wrong, you can release it, which basically says, sorry, I don't know what happened, but I'm moving on to the next guy, so we don't completely kill our process. And then you delete the item when you're done with it. The release is nice because you can see, oh, that one didn't go through; maybe the data's corrupt, maybe there's something weird with it, we can take a look. It's not just gone forever.
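To make that create / claim / release / delete lifecycle concrete, here's a minimal sketch using Drupal's queue service. The queue name and the processing step are placeholders, not anything from the talk:

```php
// A minimal sketch of the queue lifecycle described above. The queue name
// 'my_import_queue' and the processing logic are placeholders.
$queue = \Drupal::queue('my_import_queue');

// Create: the item just sits in the queue until something processes it.
$queue->createItem(['row' => ['title' => 'Example', 'sku' => 'ABC-123']]);

// Claim: grab an item and lease it for 60 seconds so nothing else tries to
// process it at the same time.
while ($item = $queue->claimItem(60)) {
  try {
    // ... do whatever processing you need with $item->data ...

    // Delete: we're done with it, so remove it from the queue.
    $queue->deleteItem($item);
  }
  catch (\Exception $e) {
    // Release: something went wrong, so put it back rather than losing it,
    // and move on to the next item.
    $queue->releaseItem($item);
    \Drupal::logger('my_import')->error($e->getMessage());
  }
}
```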
So why make a custom importer as opposed to some of those tools? I'm actually going to be talking mostly about the Queue API with mine, so I'm a little biased, because I like queues.

Feeds is a little limited in how much you can do with it. It's simple, it's good, it has Tamper, sure, you can write hooks for it, sure, but there's only so much you can do even with the ways you can customize it. It works for simple cases, simple scenarios, but if you want anything complex... it's like easy mode for imports.

Then there's the Migrate API. I find the Migrate API has a very steep learning curve; there's a lot of knowledge to funnel into your brain. I have a coworker I was just talking to who almost exclusively does migrations, and there are still things she doesn't understand when she looks at something and hits a problem; she still has to go down the Google train and figure out, what is this thing? So anyone who's mastered the Migrate API, hats off to you. You can use plugins, you can make your own plugins, but it may or may not be intuitive to make those plugins either. A big one, too, is that you typically have to run multiple migrations or imports because of dependencies. If you want to make a node that references a taxonomy term, a paragraph, or a media item, you have to make all those dependencies first, run those, and then run the actual nodes. So you might be running five migrations just for one content type, because it has all these dependencies connected to it.

And then you have the Queue API, which we're going to dive deeper into. You really have full control over the mapping and full control over the manipulation of the data. It's very custom, in a good way and a bad way, I suppose: the semi-downfall is that it's so custom, but the good thing is also that it's so custom, so you can do whatever you want. You can perform multiple tasks in the same run. For example, I'll show pieces where, if you want to create taxonomy terms and nodes more or less simultaneously, you make the taxonomy terms and then make the nodes right after, but it's one run. You don't have to do one import and then the other; you could, but you don't have to, is the point. Whereas in migrations you're kind of forced to split it up; there are ways to work around it, but again, it's not necessarily intuitive to hack that in. And you can also combine import sources.
So say you're reading in a CSV for an import, and then for some reason you have to make an API call to get some extra information. Again, Migrate's not super friendly for that; you do one thing at a time. Whereas with a custom importer, you can do whatever you want.

So, a little walkthrough. This is going to be biased towards how I make importers, so take it with a grain of salt, but hopefully you're on the same vibe as I am.

Typically I start with an administration form, and I like this because it lets someone upload a file, say, and trigger the import. In all my examples I'm just going to be talking about CSVs, because they're easy; we could work with something else, but we'll stick with something easy. It's nice because even later, once we've automated it, what if there's a scenario where we need a quick fix and we don't want to wait until midnight for the thing to run, and there's no other way to trigger it? Someone can quickly go in, run it if they want to, and the content's fixed, nothing wrong with it. So that's what I like to start with, and it's also pretty easy to test with.

Then we make an import manager service, and you can use this service as a central source of information. It can contain all your functions to populate your queues, helper functions, validation functions, kind of your one-stop shop for everything you need, and then we can use it wherever we need to. For example, we can have some import mapping. This is just an example of an import mapping structure, but we have a title. Sometimes I'll have a queue directory where, say if I'm importing a bunch of files, they sit in the queue directory, and as I process them I move them over to an archive directory so I still have them if I need to look at them. We have an expected header, which we'll get into for validation, and then we also define our queues; I'll get more into those types of queues in a little bit. But I like to define my structure in one central place, so all my information's there.
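As a rough idea of what that central mapping on the import manager might look like: all of the class, key, directory, and queue names here are invented for illustration, not taken from the slides.

```php
class ProductImportManager {

  /**
   * Central definition of the import: directories, expected header, queues.
   *
   * Everything else (validation, queue population) reads from this, so the
   * whole import is described in one place.
   */
  public function getImportMapping(): array {
    return [
      'title' => 'Product import',
      // Files to import get dropped here, then moved to the archive
      // directory once they have been processed.
      'queue_directory' => 'private://import/queue',
      'archive_directory' => 'private://import/archive',
      'delimiter' => '|',
      // Used by header validation to make sure the file matches.
      'expected_header' => ['sku', 'title', 'price', 'category'],
      // The queues that make up this import, processed in order.
      'queues' => [
        'get' => 'product_import_get',
        'save' => 'product_import_save',
      ],
    ];
  }

}
```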
So getting into that file validation piece: I find it's really important, because I've had so many scenarios where we go to import something, say overnight, we already have that automation in place, and oh, the file delimiter's wrong or the header's wrong, and now the import process just tanks and we've lost a whole day. Or someone's trying to import something and they messed it up because they accidentally put a comma instead of a pipe, or a pipe instead of a comma. I'll walk through some of those a little deeper with examples, but a tool I find very useful is League CSV, and it's kind of my go-to tool for CSVs. It's really helpful, there are a lot of good functions in it; I've probably only scratched the surface of what it has to offer, but the pieces I use from it are really handy.

So, delimiter validation. A mismatched delimiter can make the import not actually read the data: if you're looking for commas and the file has pipes, it won't parse the file properly. It might run into an error. It might decide that a whole line is the first column for some reason and try to import that, and it'll just keep going because it doesn't know any better, right? It'll go, yep, I'm just doing my thing, and then hours later you're like, hold on, all my data's wrong. So I find it's helpful to set that delimiter and also make sure you have validation in place. The key is that validation means not running the import unless the file matches your business cases, your use cases, your expected behavior. You don't want this running night after night and not actually doing anything because your data's wrong.

For example, there's a little function from League CSV called getDelimiterStats. You pass in your CSV and an array of delimiters, and it returns a count of how many of each delimiter it found, so you can double-check what your delimiter actually is. I specify my file delimiter in my import manager; in this scenario I was using pipes, so: make sure it's pipes, and if we actually have pipes in there, the count's greater than zero, let's continue. If not, we mark a flag saying you failed your delimiter validation, we throw an error, and we don't even attempt to import.

Similar for header validation. A mismatch can cause data not to be imported, or to be imported incorrectly. Say you have a CSV with 15 columns and one of them has a typo in it. Now you're missing that one piece of information, which could have been crucial, and you're not importing it. Now you have to fix it and rerun the whole thing, so you've wasted a bunch of time, and maybe you've accidentally imported something into the wrong column. It just gets messy. Setting strict header and delimiter rules can save you a lot of time, and a lot of figuring out what went wrong in your import, because it could fail while you think your import's fine, when really there's a typo in the 15th column header out of 50 that you'd otherwise have to sift through and find. I've run into these scenarios, and that's why I put these things in place, so I can tell people: no, the header's wrong, I know it for a fact, I run it, it gives me the error, it pops up. We're good to go.

So this is the code that runs after checking the delimiter. To summarize, it's converting the headers in the file to a machine-name style and cross-referencing them with the expected header in our import mapping. If everything matches, it sets our "has valid headers" flag to true. If not, since there's a keyed array for those headers like you maybe saw earlier, it gives them a nice string version of, hey, these specific headers are wrong. It's not just going to tell you that something is wrong with your headers; it says this specific one is wrong, it's spelled wrong or whatever, and that's the one you need to fix. So it helps people figure out where the problem is, because they might not even know something's wrong; they probably got that file from their data people, and those people got it from some other data people, and they're just importing it.
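Here's a sketch of what that delimiter and header validation could look like with league/csv 9.x. It isn't the code from the slides; the mapping structure and exceptions come from the hypothetical import manager sketched above.

```php
use League\Csv\Info;
use League\Csv\Reader;

// Sketch of the validation step, assuming the getImportMapping() structure
// shown earlier. $file_path is the uploaded or dropped-off CSV.
$mapping = $this->getImportMapping();
$csv = Reader::createFromPath($file_path, 'r');
$csv->setDelimiter($mapping['delimiter']);

// Delimiter check: count how often the expected delimiter actually shows up
// in the first few rows. Zero means the file probably uses something else.
$stats = Info::getDelimiterStats($csv, [$mapping['delimiter']], 10);
if ($stats[$mapping['delimiter']] === 0) {
  throw new \RuntimeException('Delimiter validation failed: expected "' . $mapping['delimiter'] . '".');
}

// Header check: normalize the file's header to machine-name style and
// compare it against the expected header from the mapping.
$csv->setHeaderOffset(0);
$header = array_map(
  fn (string $column) => preg_replace('/[^a-z0-9]+/', '_', strtolower(trim($column))),
  $csv->getHeader()
);
$missing = array_diff($mapping['expected_header'], $header);
if (!empty($missing)) {
  throw new \RuntimeException('Header validation failed, missing: ' . implode(', ', $missing));
}
```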
So getting into queue population. You can have the import manager handle this process as well: it reads in your file, your CSV file for example, and makes that initial queue item with all of the data, just bringing it in and getting it ready.

You can actually have multiple queues, and if you have multiple queues in the process, you can chain them using the Batch API, which is nice. So you'll see that I have a queue that makes queue items for another queue, and they just keep processing each other, and if you batch them together, it all runs together. You don't need to trigger one, hope it finishes in the time you think it finishes, and then trigger the other one at some arbitrary time.

So for example here... excuse me, one moment. (Fisherman's Friend, thanks. I don't have a real cough, for anyone who's scared, just a dry one. Now you're going to have a Fisherman's Friend cough too. That's okay, you get bonus points; there's a quiz later and she already passed.)

There's this data-queue-populating method that we have, and I'm going to run through a few slides of what's in it. First up, we grab those queues from our import manager and we just clear them out. We don't want any old data that might be bad or anything like that; we want to start fresh, and that's what this is doing: it grabs the queues and deletes all their queue items. Then we have our initial operation that creates that initial item. I don't actually have the code snippet of what that queue-create does, but it's really just getting the file contents of the CSV and shoving it into one queue item, with any other details we need. The other operations we have are processing the get queue and the save queue, and I'll dive a little deeper into what those are, but those are really our three operations, and two of them are the main queues, if you will: the get queue and the save queue. And at the end we just throw them all into a batch process and let it run.
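Roughly, that populate-and-batch step might look like the sketch below, as a method on the hypothetical import manager. The queue names, the processQueue() helper, and the initial item shape are all assumptions carried over from the earlier sketches, not real module code.

```php
public function populateQueues(string $file_path): void {
  $mapping = $this->getImportMapping();

  // Start fresh: clear out any old items from every queue in the import.
  foreach ($mapping['queues'] as $queue_name) {
    \Drupal::queue($queue_name)->deleteQueue();
  }

  // One initial item holding the raw file contents, plus batch operations
  // that chain the get queue and the save queue so they run back to back
  // instead of waiting on separate triggers.
  \Drupal::queue($mapping['queues']['get'])->createItem([
    'file_contents' => file_get_contents($file_path),
  ]);

  batch_set([
    'title' => t('Running product import'),
    'operations' => [
      [[static::class, 'processQueue'], [$mapping['queues']['get']]],
      [[static::class, 'processQueue'], [$mapping['queues']['save']]],
    ],
  ]);
}

/**
 * Batch operation: claims and processes every item in the given queue.
 */
public static function processQueue(string $queue_name, array &$context): void {
  $queue = \Drupal::queue($queue_name);
  // Assumes the QueueWorker plugin ID matches the queue name.
  $worker = \Drupal::service('plugin.manager.queue_worker')->createInstance($queue_name);
  while ($item = $queue->claimItem()) {
    $worker->processItem($item->data);
    $queue->deleteItem($item);
  }
}
```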
So what is the get queue? The get queue is responsible for parsing the data and preparing it for the save queue. Again, these aren't standard things, just things that I like to do, so if you wanted to do this in a different order, feel free. While preparing the data, you can use the get queue to manipulate it: do your string manipulation, map values, structure the content. What if you have lat/long pairs that you want to put into an array instead of keeping them as two separate columns? You can do that. The sky's the limit; it really depends on your content. But this is basically the preparation stage.

With that get queue, we typically have a base class and a manual class. The base class is an abstract class that extends QueueWorkerBase, and it's where all the generic get functionality lives: parsing the data and creating queue items for the save queue. Our get queue processes the full file and splits it into smaller chunks, because you don't want one queue item with thousands of things to process; that's really going to slow things down. Think about one queue item with a hundred things in it: as it goes, it's going to start chugging, take up your memory, and maybe time out or run out of memory. But if you had ten items with ten each, it'd be quick, boom, boom, boom, and work like a charm. It depends on your content, but I find the fewer things in each queue item, even if you have more queue items, the quicker it goes. You might think, oh, I have 500 queue items with so much in each, but it's going to be much faster than you'd expect.

The manual class extends the base class, and this is where your more specific mapping lives; it's intended to override parts of the base class. Something useful is that you can have multiple manual classes. For example, I have an import where reading the CSV is the same for two files, because they're both CSVs and I want to parse them the same way, but the mapping of the data inside each file is different, so each has its own manual class that still extends the base. You don't have to make a whole new module to have two different imports; one module really can service multiple, as long as they share the same base, and because you can override things, you can do whatever you want. In my example I have two; you could have five, it doesn't really matter. I'll also make a note, since it's not in my slides: you can have queues run on cron if you want them to; there's a little flag in the annotation for running on cron. And "manual" is a little misleading, too. It just means it's not being triggered automatically by anything else; it's triggered by something you run yourself, like a Drush command.

So for example, we have processItem, parse CSV data, clean index, and then our processData, which is intended to be overridden by the manual classes; that's the base. And here's a little bit of that processItem. We're getting the records (I guess these are out of order: you parse the CSV first and then go through the records, so apologies for that). These are functions from League CSV, which are really nice: you read in the data, you set the delimiter, you set the enclosure, you say the header is the first row, and then it just grabs it all, nice and easy. Then you loop through those records, maybe do some cleanup of your header keys, processData runs all of the mapping, and this is also where we're grouping them: you can see we're creating items based on some group size. For example, I think I have my number of rows per item set to a hundred or something like that, so it's queue items of a hundred, and it just keeps going and makes a whole bunch of them.
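A stripped-down sketch of that base/manual split is below. The plugin IDs, field names, delimiter, and chunk size are placeholders, and in a real module the two classes would live in separate files under src/Plugin/QueueWorker.

```php
use Drupal\Core\Queue\QueueWorkerBase;
use League\Csv\Reader;

/**
 * Generic get-queue behaviour: parse the CSV and chunk it for the save queue.
 */
abstract class ProductImportGetBase extends QueueWorkerBase {

  // How many rows end up in each save-queue item.
  protected int $rowsPerItem = 100;

  public function processItem($data) {
    $csv = Reader::createFromString($data['file_contents']);
    $csv->setDelimiter('|');
    $csv->setEnclosure('"');
    $csv->setHeaderOffset(0);

    $rows = [];
    foreach ($csv->getRecords() as $record) {
      // The mapping/manipulation step, overridden per import.
      $rows[] = $this->processData($record);
    }

    // Split into small save-queue items so no single item gets too heavy.
    $save_queue = \Drupal::queue('product_import_save');
    foreach (array_chunk($rows, $this->rowsPerItem) as $chunk) {
      $save_queue->createItem(['rows' => $chunk]);
    }
  }

  /**
   * Per-import mapping and manipulation, provided by the "manual" class.
   */
  abstract protected function processData(array $record): array;

}

/**
 * The "manual" worker: the plugin annotation plus the import-specific mapping.
 *
 * Add cron = {"time" = 60} to the annotation if you want cron to process it.
 *
 * @QueueWorker(
 *   id = "product_import_get",
 *   title = @Translation("Product import: get queue"),
 * )
 */
class ProductImportGetManual extends ProductImportGetBase {

  protected function processData(array $record): array {
    // Example manipulation: combine two columns into one structured value.
    return [
      'title' => trim($record['title']),
      'coordinates' => [$record['lat'], $record['long']],
    ];
  }

}
```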
So after the get queue has made all of those save queue items, the save queue can process them. All of the data has been processed, it's ready to be saved, and now we're going to save it. We don't necessarily want to do those in the same step, because that's more work for the system in one go. We want nice, quick steps: a very specific processing step and then a very specific save step. You could combine them, but the more you do in one process, the heavier it gets.

Prior to saving the data, say we're importing nodes, we can check whether that content already exists, maybe based on a unique import ID or some combination of fields. If it doesn't exist, we create it; if it does, we update it. If there are content references it needs, we can check whether those need to be created or updated, et cetera. We can also make translations, for example; we can do all of this in the same run.

And again we come back to our base and our manual. The base has the generic stuff: checking whether it needs to create or update, handling creating and updating references, translation handling. That's all very generic code. The manual has the actual field mapping, all those keys. You could do some of that field mapping in the get queue too, but I find it's handy to do it on the save. So we grab our mapping and it looks at how the keys from our file match up with our fields, and maybe the structure is a little different because we have to make a paragraph, or a taxonomy term; the way we check whether a taxonomy term already exists is a little different, because it's probably coming in as a string, so we check by name, that kind of stuff. Again, if you have a use for more than one manual class you can do that, so the mappings are different, but they still use the base functionality to create or update. So our base has a process method, the get-mapping method which is there for overriding, a get-node-ID that checks based on some arbitrary comparison field, and then we update or create. There are more functions below, but that's the gist.
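To make the save side more concrete, here's a rough sketch of the generic base behaviour. The import ID field, bundle names, vocabulary, and class names are invented for the example; the "manual" class (with the QueueWorker annotation, plugin ID "product_import_save") would supply getMapping().

```php
use Drupal\Core\Queue\QueueWorkerBase;
use Drupal\node\Entity\Node;
use Drupal\node\NodeInterface;
use Drupal\taxonomy\Entity\Term;
use Drupal\taxonomy\TermInterface;

/**
 * Generic save-queue behaviour: create or update nodes and their references.
 */
abstract class ProductImportSaveBase extends QueueWorkerBase {

  public function processItem($data) {
    foreach ($data['rows'] as $row) {
      // Check whether this row already exists, based on a unique import ID.
      $node = $this->getExistingNode($row['sku']);
      if (!$node) {
        $node = Node::create(['type' => 'product', 'field_import_id' => $row['sku']]);
      }

      // Field mapping (CSV key => field name) lives in the "manual" class.
      foreach ($this->getMapping() as $source_key => $field_name) {
        $node->set($field_name, $row[$source_key]);
      }

      // References: look up a term by name, create it if it doesn't exist.
      $node->set('field_category', $this->getOrCreateTerm($row['category']));
      $node->save();
    }
  }

  protected function getExistingNode(string $import_id): ?NodeInterface {
    $nodes = \Drupal::entityTypeManager()->getStorage('node')
      ->loadByProperties(['field_import_id' => $import_id]);
    return $nodes ? reset($nodes) : NULL;
  }

  protected function getOrCreateTerm(string $name): TermInterface {
    $terms = \Drupal::entityTypeManager()->getStorage('taxonomy_term')
      ->loadByProperties(['name' => $name, 'vid' => 'product_category']);
    if ($terms) {
      return reset($terms);
    }
    $term = Term::create(['vid' => 'product_category', 'name' => $name]);
    $term->save();
    return $term;
  }

  /**
   * CSV key to field name mapping, provided by the manual class.
   */
  abstract protected function getMapping(): array;

}
```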
Now, our queues aren't limited to get and save, but those are the most common: you get your information, you process it, and then you save it. There could be other queues you want to run before, after, or in between; it depends on your use case. Some examples: a clean queue, where maybe we need to delete content that doesn't matter anymore, like orphaned paragraphs we no longer need. Or an update queue, for something that needs to happen with more context. I have a real-life example: we have one import that brings in taxonomy terms, and those initial taxonomy terms have a reference to another taxonomy, and there are nodes that also reference that secondary taxonomy. So we import the first taxonomy in the get and the save, and then we have an update queue that checks which nodes reference the same secondary taxonomy, and updates those nodes to reference the taxonomy terms we just imported. I know that sounds confusing, but you can see how you wouldn't really be able to do that in Migrate; it's very specific. So something like that can run after you've done all your saving.

And after all of that, you probably want to make a Drush command, because you want to be able to eventually automate this and schedule it. We can just leverage everything we built in the import manager: the form can use it, our Drush command can use it. It doesn't matter what's using it, because it's just a service; anyone can use it. The only real difference is how the Drush command gets the data. The form uploads a file, whereas the Drush command can point to whatever directory you want; we can use that same queue directory we talked about before and just grab the file from there and go, or grab a URL from somewhere and use that.
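A sketch of what that Drush command could look like, in the Drush 9+ annotated style. The command name, the service wiring (it would still need an entry in the module's drush.services.yml), and the directory handling are illustrative, reusing the hypothetical import manager from earlier.

```php
use Drush\Commands\DrushCommands;

/**
 * Drush command that reuses the same import manager service as the form.
 */
class ProductImportCommands extends DrushCommands {

  public function __construct(
    protected ProductImportManager $importManager,
  ) {
    parent::__construct();
  }

  /**
   * Runs the product import for every file sitting in the queue directory.
   *
   * @command product-import:run
   * @aliases pir
   */
  public function run(): void {
    $mapping = $this->importManager->getImportMapping();
    $directory = \Drupal::service('file_system')->realpath($mapping['queue_directory']);

    // Same logic the admin form uses; the only difference is where the file
    // comes from (the queue directory instead of an upload).
    foreach (glob($directory . '/*.csv') as $file) {
      $this->logger()->notice('Importing ' . $file);
      $this->importManager->populateQueues($file);
    }

    // Kick off the batch that populateQueues() set up, since we're on the CLI.
    drush_backend_batch_process();
  }

}
```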
So, getting into import scheduling and automation. Why would you even schedule an import? It means nobody has to trigger it. You don't have to sit at your computer at midnight and say, yep, I'm going to press that button, every day for the rest of the week. For the rest of my life, I'm just the import guy. Granted, I did do that before I had scheduling: oh yeah, can you just run that after hours, it's only going to take three hours, you just have to watch your computer to make sure it doesn't fail. You can also coordinate it with other processes. For example, if you have an external source that does a publish or a drop-off into your queue folder at 1 AM, you can have your process run at 2 AM, so you can time it nicely with other things. I even have scenarios where we have an import like that, but we also have an export: we export a file somewhere and they grab it at another time. So it's nice to be able to schedule all of that.

And how do you schedule it? Cron jobs and crontabs. A crontab is a file that contains scheduled entries, cron jobs: specific sets of execution instructions specifying when and what to execute. This is the syntax, which is hopefully sort of easy to read. It's probably not, but that's okay: minute, hour, day of the month, month, day of the week, and then the command. Now you might think, that's kind of confusing, how do I figure this out? I have this lovely handy-dandy reference down here, so I'm going to stop sharing for a second and go to crontab.guru. It's pretty sweet if you've never seen it before, so I'm just going to fill it in. It shows you what the expression actually does, and the syntax, and whether you got something wrong, because there isn't really any error checking when you write these by hand. Say I want this to run every five minutes: I can do that, and it tells me "every fifth minute." What if I want it to run in January? I can do that too. You can do ranges, like Monday to Saturday.

It's nice that it validates it, so you know you got it right. Literally almost every time I write a cron job, I go to the site and fill it out just to make sure I'm not crazy and it's valid, and then I copy it and put it wherever I'm running it. (What is that site? crontab.guru, yeah. It's pretty handy; the link is on the slides as well. This is a lifesaver, just saying.) Now, there is a caveat: I think some systems don't like the "every five minutes" kind of thing, so your mileage may vary, but more or less Acquia likes this kind of stuff, Pantheon probably likes this kind of stuff. It depends. Going back to the slides, some examples: every five minutes, every hour, very specific times every day, Monday to Friday at noon or midnight, January 1st. I literally have a scheduled job that runs December 31st of every year at a very specific time because it does a yearly report. And I can do that because it lets me, so it's pretty cool.

And then, gotchas. As I mentioned, the con of custom imports is that because they're custom, there's nothing out of the box to start with. Granted, I have a structure that I use, so I really have a skeleton I start with every time and fill in the blanks as I need to. Wrapping your head around the Queue API can be a bit of a learning curve, but not as much of one as Migrate; I think we're all in the green there. The Migrate API is way up here; the Queue API takes a little bit, but it's not years of learning, going to the library and reading a book. Queues will get processed when cron runs if you have the cron flag in your QueueWorker annotation. I touched on this a little: I had a scenario where someone had put that annotation in, which tells Drupal it should run on cron, and it kept running every hour, and I'm like, why are you running? You're supposed to run at midnight. Then I saw, oh, there's a little cron thing there; take that out and it won't trigger anymore. But sometimes you might want that: you might want every hour, or however often you run cron, to trigger it without having a Drush command that does it. Also, custom importers don't inherently have the same ability to roll back that Migrate does, and that's a big one. If you don't have rollback, it could be bad. I'm sure there are ways to build it in if you needed to; it's not impossible, it's just not out of the box.

And then for scheduling, I've found that time zones and daylight saving time are the worst, because you have to convert everything. I have to convert the times I want to UTC and then worry about daylight saving time on top of that. I've had scenarios where it doesn't run at the right time, or on the right day. Daylight saving time actually moved a job to the day before, because I like to run stuff between midnight and five, and it accidentally ran at 11 PM the previous day and screwed everything up. (I had a newspaper whose servers were in a different time zone, so it did that all the time.) Yeah. And I just said, it's going to happen.
It gets me every time. I literally have to use a converter site: I'm in EST, so it's EST to UTC, make sure my cron is right, because you have to write your cron job in UTC, not EST. It's a little confusing, a little hard to wrap your head around. Also, when you're scheduling overnight, there's no guarantee it won't fail or hit an error, because you're not watching it, right? But if you have logging in place, you can at least see, hey, it failed because the file header was wrong, or, this ran as expected; I can see an output of yep, running, running, running, we're good to go. So logging is key, I will say that for sure. It's saved me more than once when someone said, hey, it didn't work, can you figure it out?

And that's about it. If you have any questions or comments, feel free to ask. You can also clap. I could do a live demo, but it would take a while; I have an importer that imports a lot. So here's a little story time before anyone asks a question. I have an importer that runs against a file with 300,000 lines in it. The file gets dropped off, and we actually have a bash script that chunks it into smaller files of 5,000 lines each, because even reading in that initial file is heavy, and you might throw an error just trying to make a queue item out of that much data. So we chunk it, then we run the same get/save process on each of those 5,000-line files. But we can only run so much in a night because it takes so long; it takes about a week to run, and we do maybe 60K a night between certain hours. And I turn on maintenance mode as part of my import process: my Drush run turns on maintenance mode, runs the import, and once it's run however many files, turns maintenance mode back off. Happy days. Something else that's key, that I've run into too: sometimes you run out of memory. You can actually increase your memory temporarily, just for that run, so it can get through everything. And because I run it overnight, it doesn't really affect too many people. Not that you have to run things overnight, but heavy stuff like that, especially on a site like this one with a lot of users: if something were being updated while people were trying to access it, not a good time. Anyway, enough rambling by me. Any questions or comments?

Question: I've used the Migrate API for images. Can you use a custom importer to handle images too, like import the image and then map it in your import? As long as you have some kind of source for those images, yes. I've done one where the row has a URL for the image somewhere, so we grab the image via that URL, create the file, create a media entity, and have the node reference it. So you can definitely do it; it really depends on your source data, but yeah, you can totally do that.
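For the image question, a rough sketch of that fetch-the-URL-then-make-a-media pattern might look like this. It assumes the standard core image media type and its field_media_image field; $row and $node are the row and node from the save worker, the destination directory is assumed to already exist, and the field names are placeholders.

```php
use Drupal\Core\File\FileSystemInterface;
use Drupal\media\Entity\Media;

// Fetch the remote image, save it as a managed file, wrap it in a media
// entity, and reference that media from the node being imported.
$data = (string) \Drupal::httpClient()->get($row['image_url'])->getBody();

$file = \Drupal::service('file.repository')->writeData(
  $data,
  'public://imports/' . basename($row['image_url']),
  FileSystemInterface::EXISTS_REPLACE
);

$media = Media::create([
  'bundle' => 'image',
  'name' => $row['title'],
  'field_media_image' => [
    'target_id' => $file->id(),
    'alt' => $row['title'],
  ],
]);
$media->save();

$node->set('field_image', $media);
```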
Question: what about running cron at a seconds granularity? Say you have something that runs every minute, and you have to make sure it finishes before the second thing runs, so you want to trigger that at 30 seconds. Yeah, that one's hard, because you don't want to run multiple things at the same time, which has happened to me as well. In that scenario where I was running 60K a night, the first batch wasn't done yet, so the second batch started and was trying to process part of the first batch, and then things overlap and you get database locks and what have you. It really depends on what you're importing. Sometimes it's just not feasible to run it that quickly, and that's just what it is; it depends on how heavy it is. If you're just importing and not, say, checking for existing content, maybe that's the key, if you're just straight importing something. It's kind of hard. I do have one that runs every minute. It's not as big as that 300,000 one; it's really just checking whether, say, 20 things still exist, and if something doesn't exist in some kind of source list anymore, it deletes it. So again, it depends on how heavy it is. The 300,000 one also has about 50 columns, so it's even heavier: that's how many fields, and most of those are taxonomy terms, which is heavier again. So yeah, it depends.

Question: I'm just curious who you like to have as the author of these, assuming they're nodes. Do you have a special user so you can tell that the importer did it? Good question. Typically I just specify the site superuser, and you can usually tell because all of the timestamps are the same, more or less. But that is a good question; I could see a scenario where you'd have a specific user just for the sake of saying, this is the import doing this. I have a scenario where it's an events importer, but not all events need to be imported; other users can create them manually if they want to, it's just that the main ones are being imported. So granted, you'd see, oh yeah, the site superuser made these, and it's probably not someone making them manually. But yeah, that's a good question. Any other questions, concerns?

Question: what do you do when an item is failing, when it couldn't process the item? You mentioned logging, but in my scenario there's an import job against a Ticketmaster API that creates events continuously. It's not a single run, and I'm not really monitoring the job; it runs in the background, and I get called in when events stop coming in, and it turns out an item hasn't been released for three months. It can't be cleared because it keeps erroring out. Is there a way to program it to get deleted, or does the queue item just retry and retry forever? I think when you're looping over them to claim them, if you release them, they get an expiry date in the queue table, and I believe when cron runs and that expiration date is in the past, it'll kill it. So the good news is it sets that, so it doesn't try to keep running the same item forever. Even in the scenario where you have five items and the second one has a problem, it'll say, sorry, problem, release you, let's go to the third guy, because he might not have a problem. But granted, if it's a code problem rather than a data problem, they're all going to have problems.
But if it's a data problem, it'll just say, sorry, I need to let you go. Which is probably the best-case scenario, so it's not just chugging away trying to import the same thing over and over. You will lose that data, but at that point it's probably a data problem anyway. (Yeah, it is, but it just happens that thousands of items got imported and then one day you hit one, and the next time it runs it tries to process the first hundreds again and they all fail.) Yeah, I would recommend releasing them if they hit an exception; that's how I have mine set up, so it's a little more foolproof that way. And I suppose you could also toss in some logging to give you more context about what it's releasing: if you need to see what data was in there, you can log some of it before you release it, because you still have the item before you let it go into the void. I have a lot of logging in mine, so it's constantly going, and if it hits something, more times than not I'll know where it's failing. There are scenarios where I just have to put a lot of checkpoint logging in, let it run, and hope it hits all the things I need. So hopefully that answers your question. Any other questions?

Question: I'm trying to wrap my head around why you would need 300,000 taxonomy terms. Well, they're not different terms. There are maybe ten of them, but there are 300,000 nodes that all reference those same ten, and I don't necessarily know whether a term has already been created, so I have to check, grab the term ID, and associate it. Or what if a node has now changed to a different one? I have to make sure that one's created and do the same thing. So it's a heavy process just because of that, and fortunately there's no media involved, because that would be even more processing. It's just the amount of content, and the fact that around 20 of the columns are taxonomy terms, so it's a lot to process. But again, all the functions that do that are very generic: I have some taxonomy term, the mapping gives me what I need to compare it with, whether that's an import ID or just the name of the term, and it works its magic. And the good news is that if I need to make a new importer, I can reuse most of that generic code in the base; unless I need something more specific, I really just have to change the mapping, and it's more or less plug and play. I'm actually working on a scaffold for these right now, because that's basically what I do anyway, but it would be nice to have an actual scaffold, not quite a module, but something you could just drop in and already be about 60% of the way there, and you just fill in the details. So yeah. Go queues. Woo. Thank you.

(Where outside of Toronto are you?) London, Ontario. We just say Toronto because no one knows where London is.