Hi, everyone. Welcome to my session called Drupal Anti-Patterns. Thank you very much for coming. Before I start, I'd like to give a brief introduction of myself. My name is Joe Chin. I'm a Canadian expat who's been living in Singapore for quite a number of years now, and I've been making quite a few trips down here to Australia. Thank you for welcoming me to your country; it feels like I'm back home in many ways. I've been working in software for a really long time. I'm not going to say how long, because that just embarrasses me with my age. I've worked with ERP software in manufacturing, HRM (human resource management), insurance, CMS obviously, and more recently digital experience. So I've worked with lots of different software applications and platforms. In terms of Drupal, 15 years of experience, doing pretty much everything from project management to module development to theming. Back in the older days with Drupal 6, you could be a full-stack developer who knew pretty much everything about the application. These days it gets more complicated, because the application is just that much more complex, so you tend to specialize in one area. I wouldn't say I'm an expert in any one; I tend to be a generalist. But over those 15 years I have collected some worthy experience that I'd like to share with you today. Currently I am a technical account manager at Acquia. You may have heard of us in this capacity. I look after a handful of our top clients in the APJ region, and oftentimes what I help them with is technical issues: reliability issues, performance issues, things like that. So for this talk I'll be sharing some of the experiences I see with my everyday customers. My email is listed below, so if you'd like to reach out to me, please do.
Alright, so what is an anti-pattern? I did a casual Google search and it gave me three different definitions, but generally an anti-pattern is a bit of a play on words on "design pattern". A design pattern is often thought of as a best practice for doing something in software development, so an anti-pattern is basically what you shouldn't be doing. And what are some Drupal anti-patterns? Before I go through this list, I want to highlight something: Drupal itself, the core application, generally doesn't contain bad practices. So maybe my slide was a little misleading, and there are some core developers and core contributors in here thinking: what is Joe talking about, saying there are anti-patterns in Drupal core? I'm not talking about that. I'm referring to custom applications, where developers working on a project are building some kind of custom feature and implementing bad patterns, anti-patterns, there. So to all the core contributors: no hate here, guys. Today we're going to talk about some of the Drupal anti-patterns I've encountered. There are many, but in the interest of time I kept it to three of them. The first one is called long-running batch jobs. What I'm going to show is pretty generalized; I'm not going to get too far down into the weeds, and I'm hoping it will resonate with you because I'm keeping it as simple as possible. We're talking about concepts rather than deep diving into details. So let's start with a bit of background: what's a long-running batch job? Oftentimes there's a goal; the product managers might say something along the lines of, we need a nightly job that will process a whole bunch of users or a whole bunch of content. Say we want to apply some kind of flag to all our user base.
And your user base might be tens of thousands of users. So that's the ask coming from your customer. The typical anti-pattern, and now we're looking at the code (I hope you know a little bit about programming, because this is not a programming course), is that you have an array called users, which you fetched from a previous call, and then you iterate over those users and call a particular method, let's say UpdateUser. It could be anything. This method might take a second to process, and it seems pretty fast when you run it for one or two users. But if you have a collection of, say, 10,000 or 20,000 users, even at half a second or one second each, it adds up to 10,000 seconds, and 10,000 seconds works out to nearly three hours. That's a really long job, a really long process in computer terms. So the problems, as I just indicated: it's a single, long-running PHP process, and it can have high memory usage. If you're just operating on one user, that's a little bit of memory, but collectively, over 10,000 users, it adds up. PHP does have automatic memory management (garbage collection, they call it), but memory problems can still occur. The bigger problem, at least how I see it, is that this long-running job is typically run automatically in the middle of the night, when your traffic is low, and if for whatever reason your server gets rebooted or there's an outage of some sort, this process just stops mid-stream. You've got 10,000 items, it might stop at 7,000, and then suddenly your server stops.
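As a sketch (this is my reconstruction, not the actual slide; get_all_users() and update_user() are hypothetical stand-ins), the loop being described looks something like this:

```php
<?php
// Anti-pattern: one long-running PHP process that churns through
// every user in a single request or cron run.
$users = get_all_users();  // Hypothetical; e.g. tens of thousands of IDs.

foreach ($users as $user) {
  // Each call might take ~1 second. 10,000 users at 1 second each
  // is 10,000 seconds -- nearly three hours inside one PHP process.
  update_user($user);
}
```

Nothing here is wrong for ten users; the problem is purely what happens at ten thousand.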
And if you're on a hosted environment, this happens a lot more than you may realize. At Acquia, we have customers where we're managing their server and applying security updates, so we give a warning saying, hey, we're going to reboot your server on such and such a date. They don't always heed that message, and they forget: oh, I've got a job running at that time. So the server reboots, kills the job, and then they say, oh crap, I didn't get that report in the morning that I was expecting. And the worst thing is that you've processed half your items, but the other half isn't processed. So you have to either wake up when the server comes back and manually run that batch job (because it was executed from a cron job that was set up earlier), but you also don't know at what spot the process stopped. So you end up reprocessing some of the same items, and you could end up with duplication, a doubling up of entries, or some kind of data inconsistency where an item has been processed twice. We want to eliminate all of these problems. And the last thing, a minor one, is that it's really difficult to monitor what stage your job is at. Unless you purposely put some logic inside this loop that prints a log, you can't really tell where your job is. So these are some of the things we've encountered when customers use an anti-pattern like this. So what is the solution? The solution is using Drupal queues. A queue is exactly what it sounds like: you put something in, you pull something out. The Drupal queue system works on FIFO, the first-in, first-out idea. The first couple of lines are just things to initialize a queue.
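A rough sketch of what that slide describes (the queue name and update function are my own stand-ins, not from the talk):

```php
<?php
// The first couple of lines initialize a named queue via Drupal's
// queue factory. 'user_update' is an assumed queue name.
$queue = \Drupal::queue('user_update');
$queue->createQueue();

// The last three lines: the same loop as the anti-pattern, except
// createItem() just records a lightweight placeholder in the queue
// table. Queuing tens of thousands of items takes a second or two.
foreach ($users as $user) {
  $queue->createItem($user);
}
```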
But I want you to focus on these last three lines. It's essentially identical to the previous iteration, but instead of processing each user at this point, I'm just creating a placeholder for it inside the queue. And that's a really, really fast operation, because it's not doing the heavy work yet; I'm just adding the item and saying, hey, be prepared to process this. You can actually queue tens of thousands of items in a matter of a second or two. The next thing you do is create what's called a queue worker. Don't get too concerned about the top two lines of code; you can think of them as a wrapper. What you notice is that this line here, this update user method, is actually identical to what you had earlier in the anti-pattern. But what this means is that each item gets processed individually, inside its own single PHP process. The final thing you do is set up a cron job that says: run a process that executes any items sitting in my queue, every five minutes. So you create this job that runs every five minutes, picks items off the queue, and executes them one by one. So what are the benefits of all this? When you're executing items one by one, it creates one PHP process, finishes the job, closes that process, and then executes a new one. So you end up with PHP processes with a very small footprint that don't last a long time; each one lasts for, say, a second. And if your server happens to reboot while you have this big long list of queue items, it doesn't create any data inconsistency, because the queue is saved.
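The queue worker might look roughly like the following; the module name and plugin ID are assumptions on my part, but the shape (an annotation "wrapper" plus a processItem() method) is the standard Drupal pattern:

```php
<?php

namespace Drupal\my_module\Plugin\QueueWorker;

use Drupal\Core\Queue\QueueWorkerBase;

/**
 * Processes one queued user per call.
 *
 * The cron time of 60 means: on each cron run, keep pulling items off
 * this queue for up to 60 seconds, one processItem() call per item.
 *
 * @QueueWorker(
 *   id = "user_update",
 *   title = @Translation("User update worker"),
 *   cron = {"time" = 60}
 * )
 */
class UserUpdateWorker extends QueueWorkerBase {

  public function processItem($data) {
    // Same per-user logic as the anti-pattern loop, but each item now
    // runs in a short-lived process, and unprocessed items survive a
    // server reboot because they stay in the queue table.
    update_user($data);
  }

}
```

For monitoring, `drush queue:list` shows the item count per queue, and `drush queue:run user_update` drains a queue manually.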
It's saved in the database, and when your server reboots and comes back up, cron runs every five minutes again, picks up whatever items are left in the queue, and just proceeds on merrily. And the last thing, admittedly a small benefit, is that you can run a single Drush command to see how many items are in the queue. That makes it really easy to say: okay, I'm 50% of the way through, I'm 75%, whatever it may be. So those are the benefits of using Drupal queues. Super easy, but not everyone's familiar with this concept. Okay, let's move on to the second anti-pattern. This one's a bit of a mouthful: process-blocking external API calls. Maybe I'll just go on to my next slide to make it a little easier to comprehend what I mean by this. I see this one pretty often as well. The goal is to display user-specific information on a web page, and this information comes from an external server. Maybe you have a CRM connected to your Drupal site, and it's pulling out individual user information. Generally that means the page is not cached. The typical anti-pattern for this kind of logic is that you create a custom block, and inside that block, at some point, you execute an API call to your external server saying: fetch me the user profile for user 123. Once the response comes back, you execute the next set of statements, which theme that result into your page, and after it does all that, it renders the HTML back to the user's client. The problem with this particular pattern is that when the user tries to load the page, they're first confronted with basically a blank screen, because the browser makes a request to the server, and the server then calls this external API. This API might be a bit slow; you're not in control of it. It might take five seconds.
It might take 10 seconds to come back, and this whole time the user is staring at basically a blank screen. So this is not the greatest user experience. A secondary problem is that the page itself is not cacheable, because we're making a unique request for this particular user. So it ends up being a pretty unpleasant experience for end users, because they're waiting; they're seeing this little hourglass spinning, and it's not going anywhere anytime soon. So what is the solution? We basically break it up into two parts: a REST API plus asynchronous JavaScript. The way to go about this is you create a custom API on Drupal, probably using REST, and it's responsible for calling this slow service. Then that block that was initially used to render the information doesn't call the external API anymore. All you're doing with that block is injecting JavaScript. The JavaScript is static; it's going to be the same for everyone. When it loads in the client browser, the JavaScript makes what's called an asynchronous call, and what that does is call your custom API in the background. Here you have to understand the difference between asynchronous and synchronous processing. Asynchronous processing works kind of like this: hey, can you do me a favor? I'm going to give you this; you go and process it, but I'm not going to hang around waiting for you to come back with a response, because I've got better things to do. And that better thing is processing all the other JavaScript that's part of my front page, or wherever it may be.
So I make this asynchronous call. It goes into Drupal, because that's where I created the API; that's down here in the little blue box. It goes into Drupal and takes its sweet time getting a response. The JavaScript, meanwhile, is not waiting for that response; the front page is loading up the other things that are necessary to show your user, and that happens relatively quickly. The end result is that the user gets most of their page content rendered immediately, because that part is actually cacheable. They come onto the page, and 90%, 95% of the information is already there. They're maybe left with this one blank spot that doesn't have their user detail yet, but you put some kind of hourglass there that says: hey, I'm fetching the information. Then, when that asynchronous call does finally come back, if you coded your JavaScript correctly, it handles the response and displays the information in that little block. So the real benefits of this approach: the user generally experiences a very fast page load when they first come to that page, because everything else is cached. The asynchronous API call doesn't block anything else from happening; if you have other JavaScript widgets going on in the background, they just continue merrily doing their thing. So while the user is waiting for, say, their points to update, they can be distracted by other things on your page, and generally the user is happy. And secondly, there's less stress on the Drupal server. The API call that Drupal makes still waits a long time, but it's a REST API call, and that doesn't take up as many resources as, say, loading up a full customized page.
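Sketched in browser JavaScript (the endpoint path, element ID, and response shape are all assumptions, not from the talk), the static script injected by the block might look like this:

```javascript
// Sketch of the asynchronous pattern: the block only ships this static
// JS, so the page itself stays fully cacheable.
document.addEventListener('DOMContentLoaded', () => {
  const placeholder = document.getElementById('user-profile');
  placeholder.textContent = 'Loading your profile…';  // the "hourglass"

  // fetch() returns immediately; the rest of the page keeps rendering
  // while Drupal's REST endpoint calls the slow external CRM for us.
  fetch('/api/user-profile/123')
    .then((response) => response.json())
    .then((profile) => {
      // Only this one block updates; everything else was served cached.
      placeholder.textContent = 'Welcome back, ' + profile.name;
    })
    .catch(() => {
      placeholder.textContent = 'Profile temporarily unavailable.';
    });
});
```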
So overall it leads to fewer resources being used on the server. So that's what we consider a good best practice, but it's very easy, particularly for a newer programmer, to fall into this trap. I myself have fallen into it before. Sometimes it'll be fine if it's just a small list, but when you're building at the enterprise level, where you're expecting tens of thousands of users in a given hour, these kinds of things matter. How are we doing on time? Okay, I think we're pretty good. I'm not checking my messages. All right, so my third example is blindly increasing the PHP memory limit. I'm certain that if you've done any kind of work on Drupal, you've done this yourself. I've done it plenty of times, until I figured out: oh, okay, this is bad. So what's the background? The ultimate goal is to get rid of those PHP out-of-memory errors in your error log, presuming you do read your error log messages, which every good developer should. Usually you're not aware of it, but it does appear, and often the use case is that you're asked to build a page, you build it, it looks great, and then when you deploy it to production, for whatever reason it doesn't work as expected. In a good scenario, you get part of your page and some of the information is missing. In a really bad situation, you just end up with a blank screen, the white screen of death. So you go into your PHP error logs and you see this error: PHP fatal error, allowed memory size exhausted. Basically it says you ran out of memory. And there's a very, very typical first move if you're familiar with how server configuration is done.
And this is one of those cases where a little knowledge can be a dangerous thing. You go into your PHP settings and say: okay, I'm just going to increase the memory limit. Right now it says 64 megabytes; that's way too low, so I'll increase it to 128. You run your test again, it fails, so you increase it to 256. You keep doing this iteratively, repeating it until you find some value that gets rid of the error message. So now we're up to one gigabyte. And that's really, really bad, even though, hey, it solved this particular case. It's bad for a couple of reasons. When you blindly increase the memory limit without really knowing what you're doing, and without adjusting some of the other server configuration, you can actually crash your system. I'm not going to get into the details, but basically you allocate too much memory to PHP, which takes away from your overall Apache server memory; it eats up all the available memory on your operating system, and then your system crashes, and that's a really bad thing. You thought the white screen of death was bad for one user; it's far worse when you crash the whole server this way. The other problem with increasing the memory limit without thinking about why you're doing so is that there's an inverse relationship between how much memory each process is allowed and how many PHP processes you can run concurrently. If you have two gigs of memory available for PHP and you set your PHP memory limit to one gig, you end up with only two concurrent PHP processes allowed.
If you reduce that limit to, say, 256 megabytes, then (you can do the math, or just trust me on this) it becomes eight concurrent processes you can run at the same time. And why this matters is that when you set a memory limit of X, you have to budget your server as if each process could use that full amount, whether it actually needs it or not. It's like being tasked with transporting people back and forth between two locations, and each trip it's only one or two passengers, but you're driving a huge bus. There's a lot of wasted resource going on. So that's bad: it reduces the number of concurrent PHP processes, which ultimately means you can't have as much throughput. So how do you fix this? My fix is actually not to start twiddling with Apache settings and all that, because that gets pretty complicated; it's a whole discipline in itself. But there are some things in Drupal you can do. The first thing, though, is to understand how much memory your processes typically use, and then set your PHP memory limit to that typical value. What is meant by typical? You can use something like the Web Profiler module, which can tell you what the memory usage is for a particular page. If you're subscribed to something like New Relic, which offers free accounts, you can use it to produce statistics like the average memory usage per page. I'm not going to get into that, but you have to figure out how much memory is appropriate for your particular application. I'll give you a little tip. Almost all my customers, and these are customers with literally tens of thousands, if not millions, of hits a day, show the same pattern across 99% of their requests.
99% of those requests use less than 256 megabytes. The vast majority are actually somewhere around 28 to 150 megabytes; a few more come closer to 256, and then there are some that go beyond that limit. But we have a way to handle those top memory-hogging processes. So you set your PHP memory limit to whatever that typical value may be. The next thing you do is bring in ImageMagick. Why? Because most of the time, high memory usage is due to image preprocessing. In Drupal, image preprocessing means you're uploading an image file and Drupal runs it through image styles that crop, compress, and so on, to get your 64-megabyte image down to something reasonable for the internet. That process is really memory intensive, and the larger the image, not in terms of file size but in terms of resolution, the worse it gets. If you have something huge, say 20,000 by 20,000 pixels, it eats up a ton of memory, and it's in those one-off cases that you run out of memory. The easy fix is just to switch to ImageMagick. By default, Drupal uses a processing library called GD. For reasons too technical to get into, both libraries use a fair amount of memory, but the way ImageMagick does its work, the memory isn't counted against the limit set by PHP's memory_limit setting; it's allocated separately, so it's not constrained by it. So you can much more readily process large image files using ImageMagick. That's number two. And then we still have to address the elephant in the room, which is: I still have certain processes that eat up a lot of memory.
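To make those last two fixes concrete: on a typical composer-managed site, the ImageMagick switch might look like this, and the conditional memory increase the talk turns to next can live in settings.php. These are sketches; the commands assume the contrib ImageMagick module, and the path and limits in the PHP fragment are assumptions you'd match to your own profiling.

```shell
# Hedged sketch: switching Drupal's image toolkit from GD to ImageMagick.
# Assumes a composer-managed Drupal site with Drush available.
composer require drupal/imagemagick
drush pm:enable imagemagick
# Point the image system at the ImageMagick toolkit instead of GD:
drush config:set system.image toolkit imagemagick
```

```php
// Fragment appended to your existing settings.php. Image derivatives
// are generated under /sites/default/files/styles/ on a typical install.
if (strpos($_SERVER['REQUEST_URI'], '/sites/default/files/styles/') === 0) {
  // Only these requests get the bigger limit; everything else keeps
  // the leaner default (say, 256M) from your PHP configuration.
  ini_set('memory_limit', '512M');
}
```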
That's where you do what's called a conditional increase of the PHP memory limit. I hope you can read this from back there. Basically, in your settings.php file, you have an if condition that says: if the URI being hit is the image-processing path, or whatever it is that causes the high memory usage, then set whatever memory limit is necessary just for that particular request. Once that request is done, subsequent requests revert to the normal, lower default setting. So that's how you address these kinds of memory issues. And generally what you find is that out of 20,000 requests, one or two might hit this condition, while the other 19,000-some-odd requests run with the more efficient memory limit. I've mostly already said what I wanted to say on the previous slide, but basically you end up with more efficient usage of your server resources, which ultimately leads to handling more page requests and higher throughput. More importantly, you're reducing the risk of PHP processes running out of memory, giving users the white screen of death, or even worse, crashing your server. So that's it: three fairly simple concepts I hope you can take back to your next project. Any questions? Yes? Oh, yep, go ahead. [Audience] Okay, so you said that your function temporarily increases the limit to 512 megabytes. If, for example, the request is larger than that, is it okay to temporarily increase it to one gig? [Joe] Yeah, I mean, it's just for that one process, right? And that's probably okay. But I will add a caveat: if you have a process that needs one gigabyte, then you might want to evaluate whether there's something wrong with your code that's taking up that much memory.
[Joe] Yeah, it's usually, as I said in the previous example, images that tend to be the cause of a lot of the memory usage. When you're processing a large amount of data, it typically doesn't get to one gigabyte unless you've loaded a one-gigabyte data set, and if you're loading a one-gigabyte data set, there might be a better solution. That's where you have to explore: how can I optimize this? Can I use Drupal queues, like in my first example? Yeah, okay. Yes? [Audience] I guess you're queuing the data set, which at least stops it from being lost when it falls over. But do you find that you still have the memory issues from the database tables when Drupal's dealing with, say, a node save that's got so many joins? I mean, I found most of my memory issues were resolved better not just by changing the code, but by fixing the DB buffer pools, to avoid ever having to increase the PHP memory to stop the deadlocks. But then Drupal's on, what is it? Drupal.org's on read committed anyway. [Joe] Right. So is the issue about Drupal queues or about this memory setting? [Audience] No, more like when you were talking about the users in the loop and you queued the users. Would that still resolve itself if you ran into all those table-locking issues that Drupal's so renowned for? [Joe] If I'm understanding you correctly, I think those are different problems. This is about isolating the processing of items one by one, whereas in your case, you're talking about a single process that just eats up a whole lot of memory. [Audience] Or more like, I'm doing something like that update user, but if I'm using a standard Drupal user, that's only so many fields.
[Audience] But if I've gone and added my own custom fields to that user and I've got a complex account system, then all of a sudden that update user is huge anyway, even if I queue it. [Joe] Right. But then you're talking about just that one user, and that's a separate issue? [Audience] Yeah, because you're going to fall out of memory anyway, with everything being joined on that final update that you're running. The user save is inside that update user. [Joe] Yeah. Maybe that'll be a topic for my next Drupal talk. [Audience] That's really helpful. Thank you. [Joe] Great. Any other questions? Oh, quite a few. This lady up here. Yep, go for it. [Audience] So, what about when you have, like, a hook update, an update function that you want to use to batch update users or content or something? How does the queue functionality play along with it when you batch process your hook update? It's a huge batch, a process that has so many parts that it will take forever. Could the queue be helpful? We need the same kind of update. [Joe] So, you're talking about hooks. It sounds like you're on Drupal 7. [Audience] No, Drupal 9. [Joe] Oh, you're on Drupal 9. Okay, fair enough, there are still hooks going on there. So, the hook is not necessarily what's calling the job, is it? You would have an individual item, and, you know, I simplified this; I just had one method. But you may have hook functions being called within this iterative loop. The same concept applies: when you have a single item and it ends up calling hook functions, that's still really just part of the same process. So they're not necessarily tied together, unless that hook function itself iterates through a large number of items; then your hook function can implement this usage of queues. [Audience] The whole point was to run a hook update to update the user names.
[Joe] Right. Okay. So... [Audience 2] Yeah, that has a queue behind it. That was kind of my question: is the queue part of it? The Batch API is built on top of the Queue API, and when you use a progressive batch (the $sandbox in an update hook gives you a progressive batch), you don't need to worry about it. If you use that, you can run it through Drush for a long time and it will just keep processing to the end. [Joe] There you go. Thank you. I told you I'm not an expert at any one thing, but I have a wide, wide array of knowledge. So thank you very much for that answer. [Audience 2] It specifically resolves the performance issue, in the fashion that you laid out here. So there is a solution for that. [Joe] I think there was a gentleman at the back. Yep? [Audience] I'm not sure about the answer to that, but the Batch API compared to the Queue API: when do you use one or the other? [Joe] Right. The Batch API... I'm going to say it's probably not as resilient as using queues. What's happening when you're using the Batch API (and unfortunately I don't have a working example here to show you) is that you're typically on a screen, and when you hit start, you'll often see it process items one by one. The Batch API is really good when each of those individual operations eats up a lot of PHP memory and you want to isolate them to run one at a time. But the problem with the Batch API is that you still need your connection back to the server. It doesn't even matter whether the server keeps running: because it's sending a signal back to the client each time (the server processes a chunk, then the client gives the signal, says, okay, process the next one, and updates the progress), if your network connection happens to fail, when you break that connection, the batch just fails.
[Joe] And these batch processes, I may be wrong here, but you usually initiate them through the front end, right? So it's not quite as robust, whereas with queues you can just create a cron job to execute them. I did actually experiment one time with combining the two, because sometimes people want that visual that it's happening right now. You can create the queue and then run the items in the queue through a batch, so that people can watch and get real-time updates. Okay. Other questions? No? Okay, great. Thank you. I'll leave you my email address here. I'm a technical account manager, so don't worry that if you ping me, I'm going to sell you something. Do feel free to reach out if you have questions or want to understand a little better what I said today. Thank you very much for your time. Thank you.