Thank you all for coming. This is the talk about performance audits and the most common issues that we see when doing them. If you would like to follow along with the slides on your laptop or phone, here you can get access to the slides. I will show the QR code one more time a few slides later, so if you miss it, don't freak out. My name is Giannis. I'm a senior engineer at Tag1 Consulting. I used to be the lead of the Media Initiative for Drupal 8, and I was also at the largest Drupal website on the internet at the time, examiner.com. That was not the first time I was introduced to performance, but it was a totally different beast, and since then performance has been dear to my heart. A few words about the company I work for: it's the second all-time contributor to Drupal. We are a globally distributed team of experts with the largest concentration of Drupal core contributors at any organization, and we contribute a full-time infrastructure expert to the Drupal Association. So we basically help you when you're using the issue queues and everything else on Drupal.org. We work with many clients from different industries: a lot of enterprise, government, nonprofits, you name it. First I would like to explain what a performance audit is. We are usually approached by clients when they notice that they have problems on their website. Either it's too slow for their needs, or users start to complain, or even worse, sometimes the site goes down or other really bad things happen, and we're approached to try to figure out what the causes are and how to fix them. Usually it's way too late: sometimes it's weeks or days before the go-live, or it can be too late in the sense that years of technical debt have accumulated and it's really hard to roll that back. What we do is review the site. First we talk to the client. They tell us what the problems are. 
Is the site generally slow, or are there specific pages that are problematic? Based on this info we go in, review the site, and try to identify the biggest problems as well as the smaller ones. Of course, we usually want to fix the lowest-hanging fruit first: things that give the biggest bang for the buck. There usually are some, and some can give you quite big improvements. Sometimes we just provide recommendations and the internal teams implement them. Other times we either do the work ourselves, or we assist or train the client's team to do it themselves. Again, if you missed the QR code before, here it is one more time. And now let's talk about what we see the most. By far the most common problem that we see on Drupal 8, 9, and 10 websites is incorrect use of caching. Specifically, incorrect use of cache metadata. By cache metadata we mean cache tags, cache contexts, and cache max-age. Those are the things that define how something that is cached will be cleared or invalidated, how it will vary, and for how long it will be cached. If this information that is provided to Drupal is correct, then Drupal's cache system is really powerful and will help you a lot. If you don't provide the correct metadata, or if you don't provide it at all, then you'll start getting problems. These problems are sometimes really weird and hard to debug, because they won't show up immediately. They only show up when you have multiple users on the site, and then you see that somebody is seeing something that they shouldn't, because some other user generated, let's say, that part of the markup, and since the cache context was not correct and it's not varying correctly, it's being displayed to another user who shouldn't see it, or who should see it in some other form. 
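To make that metadata concrete, here is a minimal sketch of a render array carrying all three kinds of cache metadata. The variable names and the specific tag are mine, not from the talk:

```php
// A block's render array declaring how it may be cached.
$build = [
  '#markup' => $output,
  '#cache' => [
    // Invalidated whenever node 42 changes.
    'tags' => ['node:42'],
    // A separate cached copy per combination of roles, because the
    // output depends on the viewing user's roles.
    'contexts' => ['user.roles'],
    // Kept for at most one hour even if nothing invalidates it first.
    'max-age' => 3600,
  ],
];
```

If the output relied on the role but the `user.roles` context were missing, one user's markup could be served to another user, which is exactly the class of bug described above.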
It's often easy to overlook this when you're developing locally, because you're either running your local environment with caches disabled, or, even if you're testing with caches on, you don't test with every user in all the different combinations of how those users interact with the site. So it usually only appears on UAT, or even worse, on production. In my opinion, the most important thing when it comes to fixing this problem, which is really common, is to learn and understand how caching in Drupal works. There are plenty of blog posts and sessions at DrupalCons, and it's really not that hard. We have to remember that in Drupal everything is cached, and whatever you're doing, you have to think about it. Let's say you have a block, and you need to rely on the user's role when generating that block; that probably means it needs to vary by role, or by permission, or something like that. And that means you have to add the user-role or user-permissions cache context to it. If you don't, you have problems. Core also provides debugging tools that help with that. First are the HTTP headers, where all the cache metadata for the page that you're looking at is displayed. Here we can see a header that tells us that there was a cache miss on this page, which contexts are used, how long it will be cached (and when there will be a cache hit), and all the cache tags. This is useful if you're trying to debug a page in general, but when you have a lot of blocks or a lot of pieces on a page, it's sometimes hard to figure out where something comes from: which part of the page caused, say, a session-based cart cache context to appear here. That is why we also have the debug output which, when you enable it in services.yml, adds an HTML comment above the block it is about. 
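If I remember the settings correctly (verify against default.services.yml for your Drupal version), both debugging tools are switched on in sites/default/services.yml roughly like this:

```yaml
# sites/default/services.yml -- development only, never on production.
parameters:
  # Adds X-Drupal-Cache-Tags / -Contexts / -Max-Age response headers.
  http.response.debug_cacheability_headers: true
  # Copy the whole renderer.config block from default.services.yml and
  # flip debug on (the debug option exists in Drupal 10.1+); it emits
  # HTML comments with cache metadata and render times per element.
  renderer.config:
    required_cache_contexts: ['languages:language_interface', 'theme', 'user.permissions']
    auto_placeholder_conditions:
      max-age: 0
      contexts: ['session', 'user']
      tags: []
    debug: true
```

Note that `renderer.config` is replaced wholesale when you define it, which is why the sketch carries the default keys along rather than setting `debug` alone.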
Any piece of the page that can be cached will, with this setting enabled, have something like this above it. Here you see basically the same information as before, except it is specific to that part of the page. So for, say, a most-recent-news block, you will see which cache contexts and cache tags are attached to it and for how long it will be cached. You can also see whether it was a cache hit or a cache miss (this one was a miss, as it says here), and the rendering time below, which means this is also a performance measurement tool in a way. Let's say your page is slow: you disable caches so nothing is cached, you run this, and you see that most of the blocks render in maybe 50 or 100 milliseconds, but there is one block that takes 10 seconds to render. Then you know that's the block causing your problems. Related to the previous problem is the next one. Very often, when people experience the first problem and don't know how to fix it, they resort to this one: they just disable caching. The problem goes away, because now things work, but your performance starts to suffer. Everything is okay while you have low traffic, but then your site grows. This is the worst scenario. When you start doing this, it becomes a habit. The first time you see that setting max-age zero on some render array fixed your problem, you know that next time you have a similarly weird problem you can use the same fix, and you start doing it everywhere. If this goes on for years, your site ends up full of these things, and at the same time the site grows and gets more and more traffic, and at some point you hit the tipping point where everything falls apart. 
The problem with this, and why it is so bad, is that over the years you accumulate so much technical debt that it's now really hard to remove. You have to spend a lot of time going through the codebase, figuring out where these things are and removing them, but then you also have to fix the bug that caused you to resort to this fix in the first place. A similar thing exists in Views: in a view's configuration you have the caching setting, and by default there are three options there: time-based, tag-based, and none. Tag-based is the default, and none basically disables caching. It's the equivalent of max-age zero, and it can potentially affect the whole page it sits on. So even if the view is just a small part of the page, it can affect the entire page. The fix is to never do this, seriously. I've never seen a valid use case for it. Even if you have a part of the page that is updated really frequently, it's still useful to cache it for at least 5 or 10 seconds, some short period of time. That's still better than a max-age of zero. With lazy building and BigPipe, if you have something that is updated really frequently, you can separate its rendering out from the rest of the page, so the page can still be cached, and just this frequently updating piece is added at the end of the rendering pipeline. This is actually what Drupal does by default if you do things correctly, though later I will come to contributed modules: there are cases where contributed modules cause this to stop working, and then this frequently changing block again affects the whole page. And if you think you need max-age zero, you probably have the first problem, so the recommendations from the first problem apply as well. 
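As a rough sketch of what "separating the volatile piece out" looks like in code: the service and method names below are invented, and the real service would need to implement TrustedCallbackInterface, but the shape is this:

```php
// Isolate the frequently changing piece with a lazy builder so Drupal
// can placeholder it (and stream it via BigPipe) while the rest of the
// page stays cacheable. 'my_module.ticker' is an illustrative service.
$build['live_ticker'] = [
  '#lazy_builder' => ['my_module.ticker:build', []],
  '#create_placeholder' => TRUE,
];

// Inside that build() method, even "frequently updated" content can
// usually afford a short max-age instead of zero:
return [
  '#markup' => $ticker_markup,
  '#cache' => ['max-age' => 10],
];
```

The key point from the talk: a 10-second TTL on one placeholder is far cheaper than max-age zero bubbling up and uncaching the whole page.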
The next one is a slightly hidden piece of functionality in Drupal that many people don't know about, and the situation that makes it really bad is kind of blurred, so we have to dissect it a little to understand when it can cause problems. Drupal core, by default, has a "_list" cache tag for every entity type (for nodes, that's node_list). Even if you have a custom entity type, core adds it by default, so you don't have to think about it; it's always there. That means that if you update any node, or create a new one (any CRUD operation on nodes, if we're talking about nodes), this cache tag is invalidated, which in turn means that everything that has this cache tag on it will be cleared or invalidated. The thing is, Views automatically adds this cache tag to every view. The reason it does that is that views are cached by default: say you build a list of the most recent news. If Views didn't add the tag and you added a new article, that list wouldn't update. So it's a good thing that we have it. The problem is that it's too general. If you're just listing articles in a block and you have dozens of content types, saving content of any of the other content types will also clear this block, which is only about articles. So it's overdoing it. It makes sense that it overdoes it, because Drupal needs to make sure views work in every situation, but it also causes a performance hit, so to speak. Now let's imagine a scenario where you basically build the entire site with nodes; all the content that you have is nodes. And you update those nodes very frequently. An example would be a content type whose content comes from some other source, with a script that runs regularly and imports this stuff all the time. If you have a lot of those, you're basically clearing this cache tag all the time. Boom! 
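To make the breadth of the tag concrete: since Drupal 9.3, core also ships bundle-specific list cache tags, so a listing can depend on a much narrower tag than the generic one. A small sketch (the render arrays are illustrative):

```php
// Too broad: this tag is invalidated by a save of ANY node,
// regardless of content type.
$build['#cache']['tags'][] = 'node_list';

// Narrower (available since Drupal 9.3): invalidated only when an
// article node is created, updated, or deleted.
$build['#cache']['tags'][] = 'node_list:article';

// Saving an article invalidates, in effect, both of the above, which
// is roughly equivalent to:
// \Drupal\Core\Cache\Cache::invalidateTags(['node_list', 'node_list:article']);
```

So an articles-only block tagged with `node_list:article` survives the constant imports into unrelated content types, while one tagged `node_list` is cleared on every import run.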
Because, yes, you're using caching, but it's invalidated all the time, so you're basically not using caching. And everybody uses views for everything, and everybody uses nodes for everything, so what could possibly go wrong, right? There are actually quite easy solutions for that. The simplest one is the Views Custom Cache Tag module, which adds another option, called custom cache tag, next to those three options in the cache configuration of a view. When you choose it, it will, first, remove the "_list" cache tag from the view so it doesn't cause problems anymore, and second, give you a text area where you can type in the cache tags that you know are more optimal for that specific use case. Core already has content-type-specific list cache tags, so you can use node_list:article, and that means this view will only be invalidated when an article is updated, which is what you want, right? You can probably solve most use cases by using these. If you can't, you can very easily invalidate your own custom cache tags instead. I remember an example where we had a block that showed which colleagues of yours have a birthday today; it was an intranet. Users don't update that often, right? So this shouldn't be a problem; you shouldn't be invalidating the cache of this block frequently. Except if you're using single sign-on of some sort. Single sign-on systems usually, every time you log in, send fields about the user to the site that you're logging in to, and that gets saved. So with SSO, users are often updated all the time, that is, every time a user logs in, and a block like this gets invalidated all the time. It's really easy to fix: you hook into the saving of users and check whether the birthday changed, or maybe something even more specific than that, but this is quite simple to do. 
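The birthday fix can be sketched like this; the module name, custom tag name, and field name are all made up for illustration:

```php
use Drupal\Core\Cache\Cache;
use Drupal\user\UserInterface;

/**
 * Implements hook_ENTITY_TYPE_presave() for user entities.
 *
 * Only invalidate the birthday block's custom tag when the birthday
 * actually changed, not on every SSO-triggered user save.
 */
function my_module_user_presave(UserInterface $user) {
  $original = $user->original ?? NULL;
  if ($original === NULL) {
    // New account: today's birthday list may change.
    Cache::invalidateTags(['my_module:birthdays']);
    return;
  }
  if ($user->get('field_birthday')->value !== $original->get('field_birthday')->value) {
    Cache::invalidateTags(['my_module:birthdays']);
  }
}
```

The block itself would carry `my_module:birthdays` in its cache tags, so it stays cached through every login and is only rebuilt when a birthday value really changes.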
The user's birthday won't change every time they log in, because your birthday doesn't change. So this block can stay cached basically for the entire day, after it has been generated for that day. Another issue is a lot of joins, particularly in views. This is basically caused by Drupal's data model, which puts each field that we create into a separate table. That's great, because it's powerful and flexible: you can have single-value fields, you can have multi-value fields. The problem is that when you have listings, a.k.a. views most of the time, and you want to filter on those fields, especially in a really complex view with a lot of filters on different fields, you need to add joins to bring those fields into the story and then add WHERE clauses on those joins. MySQL can sometimes optimize this, but often it can't, and these situations often cause queries to become really slow. A LEFT JOIN, which is what you usually get when you don't tick the checkbox that says "require this relationship", is worse, because the size of the dataset grows way quicker than with a normal join. Instead of joining just the rows that exist, it adds null rows for everything that doesn't have the field, and when you have a few joins doing this, the dataset that the database needs to work with explodes quite quickly. The most efficient fix for that, in my opinion, is to use custom entities where it makes sense. It's seriously not a sin to create a custom entity in Drupal. The Entity API is actually quite powerful, so it's not that hard to do. And when you do it, you still get all the goodies: you get the entity form, you can use formatters, you can use it in views. Everything works more or less out of the box. 
The reason this solution is great is that if you have a custom entity whose single-value fields are base fields, everything is in one table in the database. So it's much easier to query, especially if the dataset is large, and it's easier to index, so it's better. Also, if everything is a node, everything that you have on the page will be in one set of tables, basically node_field_data, or just node. If you have custom entities that each handle their separate thing, you also separate the data into multiple tables, so your datasets by default are not that big. It obviously depends on your use case, but it's seriously not a sin and it's seriously not that hard. Another thing that is almost treated as a sin in the Drupal world is not using Views when you need some custom display of something. Creating a custom block that loads whatever you need and then displays it exactly how you want is not that hard to do, and sometimes it's way easier. Views have to fire up all these plugins that they have, and they generate a query that needs to be very general, so it's usually not that optimized; and then templating it is hard, because you potentially end up with a gazillion templates. But if you have a block where you load stuff and just print it out in a render array, it's easy to maintain and it's not that hard to do. If you have to use Views, we often use what we call the IN-subquery trick, which is this. 
On the left side, you have what Views would do out of the box: if you have a field on a node and you put a condition on it, this is roughly how the query will look. If you figure out that this query is slow, and that this is the field causing the biggest damage, you can use the trick on the right side, which is basically querying the field table first, getting entity IDs out, and then using a WHERE on the entity ID in the main entity table. The entity ID is a primary key, which is indexed, so this is generally much faster. To do that, you usually need to create a custom Views plugin for the condition or contextual filter or whatever the specific thing is, but it's not that hard. You can do a subquery, or you could do two separate queries and bring the IDs into PHP and put them into the next query; both work. The difference between the two approaches is often substantial: we're talking a few seconds, or even more than 10 seconds, down to 200 milliseconds. I already touched on this one. Views are great; Views are one of the reasons why Drupal is so successful, in my opinion. But you can also overuse them, and there are situations where they probably shouldn't be used. I've seen Views being used as a field formatter: say you have a reference field and you just want to display the things that are referenced in the field, and instead of a formatter, a view loads them from the entity and displays them. There was a view that joined from the field to the node table and back to the field again; it already had a few joins before it even started doing what it needed to do. It was basically using Views as a template layer, because you can add fields in the UI and then it's easy to do. But that's not really the use case Views were made for. Also, as we've seen before, views out of the box have to be really general when it comes to caching. 
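Since the slides aren't reproduced here, the two query shapes compare roughly like this; the table and field names are illustrative, not the actual slide content:

```sql
-- What Views generates out of the box: join the field table and
-- filter on the joined column.
SELECT n.nid
FROM node_field_data n
LEFT JOIN node__field_category c ON c.entity_id = n.nid
WHERE c.field_category_value = 'news';

-- The IN-subquery trick: resolve matching entity IDs in the field
-- table first, then hit the main table on its indexed primary key.
SELECT n.nid
FROM node_field_data n
WHERE n.nid IN (
  SELECT entity_id
  FROM node__field_category
  WHERE field_category_value = 'news'
);
```

The second shape lets the database narrow down to entity IDs using the small field table before touching the wide main table, which is where the seconds-to-milliseconds improvements mentioned above come from.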
So if you build a custom field formatter or a custom block instead, you can be much more specific about how this thing needs to be cached, which brings your cache hit ratio up, and by doing that, it also improves performance. Another thing that is very common, and is again caused by Views wanting and needing to be very general, is the count query. As soon as a view needs a count, which usually means either a full pager, or the total number of items displayed somewhere in the footer or header of the view, it doubles the execution time. It does that because, in order to get the count, it takes the query that it already has to fetch whatever you need, wraps it in a COUNT, and runs it again. Which means that if you have a slow query that takes 10 seconds to execute, you just added another 10 seconds on top of it. So this is one of those low-hanging fruits that we see. If there is a slow query that is hard to optimize and we want an immediate improvement, we just ask: would you be happy using the mini pager instead of the full pager? The mini pager doesn't need the count, because it doesn't have first, last, and numbered pages; it's just back and forth, and you go forward until nothing is displayed. So it doesn't need to know how many items there will be. It's really okay to custom code your block or your formatter. Views are great for prototyping and for a lot of things, but sometimes, when you have more complex things, going the custom way is the better approach. Also, measure what you're doing. When you're building a view, enable the setting that displays the query below the view and gives you the performance statistics, and you will see how long your query takes. And if you add a semi-realistic dataset to your tables, by using Devel Generate or something like that, you will have a rough idea how long that view will take to execute. 
If you see that it takes a few seconds even in development, that's probably not okay, and you have to figure out some other way to do it. Another Views-related one is overly complex queries. It's easy to click things together and not look at the query; then, when you do, you see that it looks like this. And if you throw contrib modules into the mix, it can get even worse. "It was easy to configure and it looks good in the UI" does not mean "it's well done and it will perform well". I remember a case where we had a view with really complex language-fallback conditions, and it was hard to build. In order to achieve it, the conditions section of the view was this big. But it worked, and it performed well. Then somebody found a contrib module that did the same thing, kind of. That module was used, the old condition set was removed, and just a single line in the conditions section was added. It looked great; it seemed way better because it was smaller. But the query that that module created was awful, so it was actually way, way worse. Things like that. So again, the same recommendations: look at the query while you're building the view. Use EXPLAIN on the query to figure out what it's doing, check the size of the datasets, all those things. I also always recommend making this part of the peer review process. If there is a significant change to some query or some view, add a comment: "this is covered by this index". Add the EXPLAIN output to the issue so that reviewers can verify that it actually is covered by the index and that the EXPLAIN does indeed look good. It's about culture. If you really want to achieve great performance, it needs to become part of the team's culture; it has to be part of the day-to-day process and constantly thought about. 
This one kind of goes against the "there is a module for that" mentality in Drupal. While that is true, and contrib space is awesome (Drupal wouldn't be the same without it), if you just blindly install 500 modules on your site, you're doing yourself a disservice. In general, more modules means more complexity, which means a slower site, because everything is heavier and more complex. On the other hand, you also have to be aware that contrib is not perfect. Even core gets weird bugs sometimes, weird regressions, and in contrib this is even more frequent, because there are fewer eyes looking at it. Sometimes a module is maintained and built by a single dev, and nobody ever reviews it. When was the last time you installed a module on your site and, before doing it, actually reviewed it and tried to understand what it does? If everybody did that, great. I'm guilty of this as well, right? But if everybody is just blindly installing modules and there is a single person working on a module, there is not a lot of quality assurance going on. And problems can sneak into not just smaller modules, but quite widely used ones. One example that really surprised me is the Context module, which is still being used. Probably not as much as in Drupal 7, but I've seen it used in Drupal 8, 9, and 10 as well. It has a bug that destroys the auto-placeholdering that Drupal provides. This is the thing I mentioned before: when you have a piece of the page that is updated frequently, or that varies for every user, Drupal detects that, removes it from the page, renders it separately, and injects it into the page at the end, so it can cache the rest of the page. The Context module has a bug that completely disables that. So if you have, for example, a cart block that varies per user, it will ruin the cacheability of every page that cart appears on. 
It also completely disables BigPipe. So if you have the Context module and BigPipe, you're probably not using BigPipe at all. This issue has been open for six years, and there is a patch, but it's still not committed. So if you're using Context, just use that patch. My recommendation is: don't be afraid. First, the thing I mentioned earlier: when you're bringing a contributed module into your project, treat it as a custom module initially. Do a review on it, try to understand what it does, try to figure out if there is anything potentially problematic, and try to fix it. Work with the maintainer, provide a patch, and so on. Also, it's not a sin to have custom modules in your project. If the decision is between a five-line fix in a custom module and a huge extra contributed module that you would need to install, go with the first option. It will be easier to maintain long term and cause fewer problems down the road. The problem is that as time goes on and you keep accumulating modules, it's really hard to scale back, because your team members will change too. There will be a module that at some point nobody knows why it's there, and you will be afraid to disable it, because you never know what it will break. So when you don't need something, disable it and remove it from the codebase; then it's clear it's not used anymore. The last ones, and the hardest ones to fix, are data model issues. In Drupal, you create things in the UI and you rely on Drupal to create the data model, and we are coming back to the field tables and things like that. Using nodes for everything, as we've seen, can be problematic. Using complex field types like paragraphs for things they were not meant to be used for is also wrong. 
Once I had a situation where a Drupal instance was the data storage for data scientists, and they were using APIs built with Views to pull the data from this Drupal instance and work on it. In order to get nice filters in Views, they used paragraphs for nested complex data, and as soon as you had a decently large dataset and started using filters on those API endpoints, it just took forever to query. On the other hand, if you thought about this beforehand, you could use, for example, a custom field type with multiple properties, because most of those paragraphs were exactly that: a few fields on another field, or something like that. And that would be much easier to handle. The problem with this one is that if you don't do it in the architecture phase, once the site is live and you have real data in it, it's really expensive to fix. You basically have to redo the entire architecture and migrate the data, which costs money and is time-consuming. Basically, at some point you're rebuilding your entire site again, which is not ideal. I promoted custom entity types before and custom field types just now, and those are among the solutions for this. Don't blindly trust Drupal to generate a good data model for you from the content types that you create and the modules that you install. Build it, then check what's in the database and see what happened. It could be fine. Or you could find some problems that would bring you a lot of headaches down the road. To conclude: the type of site matters. All the things I've been telling you about were with an enterprise project in mind. You have to be pragmatic. If you're building a hobby site, like a site for a small local sports club or something like that, it probably doesn't matter. You can do all of those things and it will be fine. It's still nice to fix the lowest-hanging fruit, but it will probably be fine. 
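For scale: a custom field type with multiple properties is a fairly small plugin. This is a trimmed sketch with invented names (see core's own field type plugins for the full pattern, including constraints and an empty-value check):

```php
namespace Drupal\my_module\Plugin\Field\FieldType;

use Drupal\Core\Field\FieldItemBase;
use Drupal\Core\Field\FieldStorageDefinitionInterface;
use Drupal\Core\StringTranslation\TranslatableMarkup;
use Drupal\Core\TypedData\DataDefinition;

/**
 * A two-property field type, replacing a paragraph that held two fields.
 *
 * @FieldType(
 *   id = "measurement",
 *   label = @Translation("Measurement"),
 * )
 */
class MeasurementItem extends FieldItemBase {

  public static function propertyDefinitions(FieldStorageDefinitionInterface $field_definition) {
    // Both properties live as columns in ONE table row: no extra joins.
    $properties['value'] = DataDefinition::create('float')
      ->setLabel(new TranslatableMarkup('Value'));
    $properties['unit'] = DataDefinition::create('string')
      ->setLabel(new TranslatableMarkup('Unit'));
    return $properties;
  }

  public static function schema(FieldStorageDefinitionInterface $field_definition) {
    return [
      'columns' => [
        'value' => ['type' => 'float'],
        'unit' => ['type' => 'varchar', 'length' => 32],
      ],
    ];
  }

}
```

Filtering on `value` and `unit` then touches one field table instead of joining through paragraph entities, which is the whole point of the data-model argument above.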
On the other hand, if you're building an enterprise thing, or something that you can expect could become an enterprise thing in the future, you absolutely have to think about these things, because otherwise, by the time you notice there are problems, it will cost you a lot to fix them. Never guess or blindly trust; always measure. When you're testing performance, change only one thing at a time; otherwise, you won't know what brought the improvement that you're seeing. Measure with realistic datasets, use the available tools that we mentioned, build this into the peer review process, and integrate it into your culture. Don't be afraid of custom code and custom entities, and always keep learning, because there's always something new to learn. This is basically the most important fix. Speaking about tools, I would like to mention Goose, an open source load testing framework that we are developing at Tag1. It's built in Rust, we love it, and there are also examples of using it on Drupal, so check it out. We are also building Gander, a new thing in Drupal core that brings performance testing to core itself, which should, in the long term, prevent performance regressions from sneaking into core. And it's open source: there is a base test class in core, coming out in 10.2, so you can also start using it to performance test your own websites, either immediately by using the patch in the issue, or, if you wait until 10.2, you won't even need to patch. Especially if you already have CI and tests running for your project, it's really simple to do. Just look into core, look at these issues; there are a few example tests already in core. To write a simple performance test, you need four or five lines. So if you already have all this running, it's really easy to add. 
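From memory, and worth double-checking against the actual 10.2 API and the example tests in core, a minimal Gander-style test looks roughly like this; the thresholds and class name are invented:

```php
namespace Drupal\Tests\my_module\FunctionalJavascript;

use Drupal\FunctionalJavascriptTests\PerformanceTestBase;

/**
 * Sketch of a front-page performance test using core's base class.
 */
class FrontPagePerformanceTest extends PerformanceTestBase {

  protected $defaultTheme = 'stark';

  public function testFrontPage(): void {
    // Collect metrics while visiting the front page.
    $data = $this->collectPerformanceData(function () {
      $this->drupalGet('<front>');
    }, 'front_page');
    // Guard against regressions; these limits are illustrative only.
    $this->assertLessThanOrEqual(50, $data->getQueryCount());
    $this->assertLessThanOrEqual(2, $data->getStylesheetCount());
  }

}
```

Run in CI, a test like this turns "the homepage suddenly does twice as many queries" into a failing build instead of a production incident.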
If you want to learn more, there is a blog post on the Tag1 website with links and a little bit of background and all that. Thank you for your attention, and we have five minutes for questions. For questions, there is a microphone there, or I can repeat the question. Yes? I will repeat it. You mean like a different module? Okay, so are you reusing? Reusing, okay. So the question was about paragraphs: whether reusing paragraphs would help. In general, I think it would, because if you reused them a lot, your dataset would become smaller. So I guess it really depends on how much you would be able to reuse, but in general, the answer is yes. I would also add that paragraphs are great if you're using them as components. If you use paragraphs to lay out content displayed on the node page, it's great; it works fine. The problem with paragraphs is if you start using them in listings and you need to add WHERE conditions on paragraph fields. Because what that means is: you have a node, and there is a paragraph field on the node; that's one join. Then you potentially have to join the main paragraph table; that's another join. Then you have to join the field table on the paragraph; that's another join. And then you have a condition on that. So that doesn't work really well. But if you're using paragraphs as a component tool, it works great; it's not a problem. No, when you're just retrieving values, it's not a problem; it's when you're trying to filter on the values in paragraphs. I've also seen pages that used paragraphs as components but used hundreds of paragraph entities. That was the problem, because the page needed to load hundreds of entities to display, and obviously it was rendering forever. But if you're not overdoing it, and you're using it as a component tool, I think it's a great tool. 
So the question is about AJAX performance, specifically about AJAX forms, because the AJAX requests on forms can sometimes be slow. The thing is, if you do a lot of things in that AJAX request that take time, of course it will be slow. So it comes down to backend performance: you need to make sure that the things you are doing are fast. That's obvious. But also, when you're doing AJAX, you usually update just a part of the page, and every time an AJAX form interaction fires, the entire form is rebuilt. So if you're doing compute-heavy things in the form build that are not needed for that specific AJAX use case, maybe you can make sure those things are not executed at all. To repeat the comment: there is a proposal to fix this, because the problem is that every AJAX request rebuilds the entire page, a.k.a. the entire form, and there is an idea, a proposal, to change that to only rebuild whatever needs to be rebuilt. It's not done yet, but if you want to have some fun, you're more than welcome to help. [Audience comment:] The problem with the joins is only for filters and sorts, not for displaying deep fields or something, and I didn't see this added to the query. One way to avoid the joins is also to use Search API, I guess, and then you can index the paragraph fields. So you can go three levels into the paragraphs and configure your search index. Of course, if you go too deep and you have this super nested big thing, then maybe you pollute your index. And I also had the problem on the display side that you had to index some stuff just for display. Search API maybe needs to be improved in that way, so that for display we actually load the real entity and just look at that, and for filtering we use the indexed data. 
You're exactly right with everything you said, and you just made me realize that I have to update my slides to mention Elasticsearch, Solr, and the like. Sorry, we don't have time for more questions; we're out of time. Thank you for coming. Have a nice evening.