Hi everyone, welcome to the Fix Me If You Can lab session. First of all, about this session: this is going to be a brief introduction talk, because for us this is the first lab session that we have, and I think for you guys as well, and we were quite curious as to how it will work out. We have prepared a site for you that is broken in several places. We broke it deliberately, but all of the issues that we prepared with the site are things that we typically find in the projects that we, unfortunately, have to work with. We will ask you to work in teams because we have prepared only so many copies of this site. So after this introduction we'll ask you to break into teams, we'll give out a password and a site to each team that you can reach through your laptop, and you will work with that. The sites are identical copies of each other, so that should be fine.

So let me introduce the team. Delivering the session will be Alex, Hernani, Theodore and myself. Let's hear a little bit about Alex. You want to introduce yourself a bit?

My name is Alex and I work as a technical consultant at Acquia. Part of my job is to do site audits, security audits and performance audits of sites, and that's when we discover the kind of things that we want to show you today.

My name is Hernani. I'm a technical team leader of professional services in Europe for Acquia. As Alex says, most of these examples came from real stories from our clients, so we expect that sharing them with you will mean we don't find them at many more of the clients we audit.

And I'm Theodore. I work from France, as a technical consultant as well. I do a lot of work in Drupal core with JavaScript, and I'm also taking care of the mobile practice of the PS team for Acquia. That's pretty much it.

All right, my name is Balange. I'm also part of the Acquia professional services team.
I'm also a consultant. We all travel the world and try to solve clients' problems, or at least help them along the way. So all of the issues that you will see today are things that we typically come across throughout our engagements.

What we prepared for you is a typical setup for a Drupal website. We have a LAMP stack, we have Varnish set up, which is supposed to be part of a standard stack, and we have set up approximately 20 to 25 sites. So we would like you to group up, perhaps with people who are physically close to you, in groups of about five, so that we have enough sites to hand out to everyone. One of us will be responsible for each individual session, and the other three will walk around and help the teams as you go along and try fixing these sites.

So now, if you could find a team for yourself and just physically group up. We should have groups of about five people. I'm not sure how many people there are, so I can't say how many teams that is, but we have about 20 to 25 sites, so that's basically the limiting factor. I'll share the URL in a second, after we've covered the schedule.

All right, so we have the teams set up. Just a quick run through the schedule. We will first start with site building. Then we'll have a 10 minute break, during which you can come to us and ask any questions. We would like to ask you to start sharp at the end of each break, because we are on quite a tight schedule and there's a lot of stuff to discuss. We would not like to run out of time, and we know that at the end of the lab everybody's going to get tired, so let's be punctual. After that, we'll discuss security, and we'll have another break. After that, we'll talk about performance and performance-related issues. And we have about 25 minutes at the end for overrun, questions, and a wrap-up and summary.

I would like to get a count of the teams.
So I will say the team number, and if you could just raise your hand, we can make sure that a different team number is assigned to each of the teams. So: team one, team two, team three, four, five, six, seven, eight. Perfect.

We have one site per team and one login per site, so obviously you're encouraged to work together. The address will be fix me dot acquia dash ps.com, as it's on the screen, slash team and a number. The password for your team, and I think your URL as well, is going to be on the site. So if you just hit this URL, you should be able to get started.

As soon as everyone is set up, we will begin with the first session. Just look at me when you feel like you can get started, and once I get a feel for where you guys are at, we can begin.

So, login works for everyone? Any problems with the sites? It worked? That was the first obstacle, but it's a feature, not a bug. It's on the front page: if you go to the site, it should be on the front page. Who's still struggling with the setup or authentication? Everyone is more or less okay to go? Okay. Anyone still struggling with the setup? No? Okay, so I will assume we're fine.

Sorry, I don't understand. Yes, there will be no file access required right now.

So we encourage you to do this hands-on.
It will be best if you have two browsers available. You all have laptops, so you probably know how to start an incognito window, because in some of the exercises we would like to show you how to demonstrate a particular issue as an authenticated and as a non-authenticated user. So that might be useful for you.

So we need to show the GitHub repo. Okay, where is it? Okay, but it's not public, is it? So if you want to see the code and check it out, you can go to GitHub, acquia-pso, the fix me repository. You should have read access to it. Everyone on the page? All right, so we'll get started. Right now we don't really need to look at the code straight away, so we can get back to it afterwards.

Okay, so, first part. We're going to look at site building. What we mean by that is: you get a new website that your client says is broken. He doesn't tell you what's broken, or how it's broken, or how badly it's broken. He just wants you to fix it, or to tell him how to fix it.

What we're going to look at first is best practices. What are the best practices, and how can we check a huge code base against them? We have coding standards, and we have security and performance best practices. Security and performance are going to be covered by Alex and Hernani afterwards. For now, we'll just look at the coding standards. Then you have code architecture for custom modules: how you can check that a module is properly developed and that it follows Drupal best practices. Then content architecture, and a little bit about configuration.

Which process do you use when you get a new website? Well, usually you get the code base and a database dump. So the first step is to make it run on your local computer.
That way you can actually do things with it. Then you run the automated tools that exist for checking very common mistakes that you don't really need to check manually. You look at what those tools are giving you, and you decide which warnings or errors are important and which are not. That takes a bit of experience, and if you have questions we can walk you through it.

Usually there's custom code, so you need to read everything. Even if it's one or two thousand lines of code, you will need to go through it at least quickly to make sure that you didn't miss anything. Automated tools are great, but they don't catch everything, so you still have some work to do. And then you just look into the messy areas that you found by reading the code and by looking at the output of the automated tools.

What do we use? First, the Update module, to see if there's any update you can make to Drupal core or to the contributed modules on the website. The Hacked module, to see if any patch has been applied to Drupal core or to any of the contributed modules. I mean, we say "don't hack core", but it happens all the time, so you need to check for it. The Coder module will tell you if the custom code follows Drupal coding standards, or if there are any potential security issues that you can check for without going too deep into it.

Then PHP_CodeSniffer. That's a tool developed by the PHP community, and the Coder module provides a Drupal configuration for it to check the coding standards: documentation standards, two spaces for indentation and not tabs, the kind of things that can be checked automatically by the code sniffer tool.

We work at Acquia, so we get Insight, which is a tool for our hosting environment. Basically, it does what the Update and Hacked modules do. So, I mean, it's easy; I just put it there.
You don't need to use it, but it's really handy sometimes. And also you need a brain, because tools don't find everything.

Once you get the output of your tools, you should check for a few things that will tell you whether the audit, or your fix, is going to be hard or not. The first one is the PHP filter module. If it's enabled, you can start to get a bit scared, because you will need to see if there's any PHP in nodes, in views, in blocks, these kinds of things, and that can be really messy to debug.

Then PHP in templates: if the developers have made database queries inside the templates, that's not a good thing. It's probably going to take you a while to list all that's wrong with the website.

When you have a lot of template files, it might tell you that the design wasn't really thought through properly and that it could use some work. Same with views, blocks and panels: maybe they didn't see that they could configure something to reduce the number of views and panels. And the content types as well. That's tied to the template files, but we'll see an example afterwards.

So the first question we're going to look at is: is my Drupal core or a contrib module hacked or patched? To check that, log in to your website as admin. You should enable the lab application module on the module page. That will enable the Update module, the Hacked module and everything that you need.

Sorry, can you repeat the question? With the website you just got access to earlier, with the password. Yeah, remote.
Yeah, if you have it locally you can do that as well.

So, for example, I'm logged in, I go to the module page, and... yeah, well, demo effect. So now we should be going to the module page and enabling the lab application module. That's toward the bottom. That will enable the Update module and the Hacked module. And I don't have internet, apparently. Well, it sometimes happens.

So everyone has the Hacked module enabled? Okay. If you go to Reports, then Hacked, it should run the script to check whether core or any contrib module has been hacked. It takes a little while because there are a lot of modules. And then, if something is different, you look into it. You also have the Diff module enabled, so you will be able to see exactly what has been changed or patched in contrib or core. It may take a few minutes to run, and you end up on a page like this afterwards.

So how far along are you, like 10 percent, 20 percent? So maybe I can just show you what it looks like at the end, and once it finishes for you, you can verify that I'm not making this up.

Once this very long process finishes, you end up on this page, and it shows you all your contrib modules and your core and says whether each has been changed from the version on drupal.org or not. So there are a few modules and the theme, and the only thing that has been hacked, strangely enough, is core. If you have the Diff module enabled, you can click on the link and get the list of files and see which ones have been changed. Below that it's all green; there are too many files anyway. So, I mean, they hacked the README and the .gitignore. That's not a big problem. We don't really care.
It's not executed. But they also changed system.install, and this one can be pretty dangerous. If we look into that, we can see that in the current version, which is on the right, an update function has been added. So if you run the updates on your website, you will get this thing executed.

This particular patch is to make Drupal 7 handle files bigger than 4 gigabytes, because the size of the column in the table is not big enough, so it doesn't handle files of more than 4 gigabytes. So this one is not really dangerous. But sometimes they will hack, say, the user module to do some weird stuff when a user registers. That's more dangerous. We can see some crazy things.

So I guess while the Hacked module is still running we can look into the rest. Do you have any questions about the Hacked module, or the process of just seeing what's been changed? No? All clear? All right.

Then we have the Update module. It's going to take a while as well, but if you run it on the website now, it will say that there's a module with a security release available. That's a very basic check, but you'd be surprised how many developers just turn off the Update module, develop for a few months and forget about it. So then you get a six-month-old Drupal core that you have to audit, and you're like, well, there have been five security patches since then, so maybe you should update it.

How do you keep it up to date? It's easy: just keep the Update module enabled, and actually update the modules when it tells you to, because that way you will catch bugs earlier.
So that's a win for everyone. And I couldn't be more sad, because I don't have internet, so you have to trust me on this one as well.

Then we have coding standards. This one is kind of more boring than the Update and Hacked modules, because you will need to read code and check that it's two spaces, not three, and this kind of thing. The Coder module, when you have the PEAR PHP_CodeSniffer package installed on your laptop, can check for most of the coding standard violations. I used to have the page open but it's gone now. It's the same kind of concept.

If you're live on the website you can go to the configuration page, and you have a Coder link on the right side of the page. So you go to Configuration, you follow the Coder link, and you can see what you want to check. Here I have Drupal Code Sniffer, because the code sniffer package is installed, so the tool can run. Then the coding standards, documentation, and some very basic security checks. Usually you start with the "normal" or "critical" severity, because otherwise you get way too many warnings.

Here I'm just going to select two modules, so as not to have a crazy output. That's going to be the email login module, if I can find it. Oh yeah, email auto login. And jQuery Countdown. That's a contrib module I downloaded because I wanted, you know, maybe to make some fancy things on my website. Usually it's pretty fast. And we also see a lot of clients that try out modules and then don't remove them once they're done with them. In the best case they disable them, but usually they just leave them there.

So is everyone on the Coder page? Were you able to get to the form I showed just before? Yes, no, maybe? No? Oh, it's okay.
Yeah Slowly Because I mean that's automated too, it's not really It's the less interesting part of the session There are some It's a little bit of both because it doesn't catch everything that could break your websites. I mean the email auto login Module is actually an example of that It will tell you I guess probably most of the of the big problems At least the one that you can check automatically it won't catch like recursion or redirect loops or whatever But it will tell you that your SQL statement is probably not filtered enough or those kind of basic mistakes Um, so the kind of things it tells you is that, you know, you're missing a dog block. You're missing Um I like the way you write the else if is not the proper way This kind of thing. So that's why you need to scan through the results and decide which one are important or not I mean coding style, it will make the code hard to look to look at but it won't break the module Um, so now you see on the email auto login module you have Uh Why a few pages of warnings comments? But you know, there's no critical issue about this module But if you go to the module page and you try to enable the module Your website will crash Because if you open the email module five, there's a php error in it And the coder module didn't tell you that so that's why once you run the automated tool you you also need to read all the code Because that's the only way to check for for those kind of things Um And that's where ideas are really helpful Because they can do Better checks than the coder module. So, you know, a pretty boring output. It's going to tell you that You know, don't use global as well. The name of your global variable is not, you know That's not great Um And uh, yeah as far as the coder module goes that's that's basically it It's just to guide you to to know which Area of the website you will need to look into more afterwards So is that clear for everyone like what we do with the coder upgrade update and hack module? 
So did anyone end up running that, or is it still running somewhere? Still running? Okay, right, so you have the output available. Okay. It's gone? Oh, wow. Well, you broke the website. So fix it! Well, that's not what this is about.

I just have two other slides that I will go through very fast, because I only have a couple of minutes left in the site building slot before the break. In the break you can come up and ask us questions or whatever you want.

Oh yeah, sorry. Well, here you can't really fix anything, because you don't have access to the code. But the problem is that if you don't follow coding standards, it's hard for people to look into the code. It's a problem for the customer, because it increases development time: if it's hard to get into the code, it's hard to change it. And if you look at the email login module, the developer doesn't use brackets on if and else all the time, and the PHP error is that he put one on and didn't close it. So it's those kinds of things. Coding standards will remove that kind of thing, and PHP errors are a problem for your customers. Okay? No, that's fine. I mean, if you think it's useless, let me know and I'll try to change your mind.

Then you have the Views configuration kind of things. I'll just skip this one. Basically, the problem is that the user created three different views where one view with a contextual filter would have worked just fine. So this just increases the number of views. That's more memory, that's more crap.

So that was the Views architecture. Then the content architecture, the last exercise. It's just a show and tell. It's linked to the number of templates. It happens pretty often, actually. So you have an article content type.
There are 2,000 content items of that one. Then a teaser or microsite type, with 200 nodes. And then you have things like a sports homepage, a teams homepage, a change password content type, a login form content type or a footer homepage content type, and there's just one node of each of them. You can have something like 20 content types with one node each. Here you can see that they probably should have used a panel, or one content type with a different taxonomy term value that they could switch the template with, or something. It's clearly wrong, and if you have a hundred content types, it's going to impact your performance. So it's those kinds of things that you need to check for site building as well.

And it's the break. So if you have questions, feel free to ask. There are four of us, and we have 10 minutes.

Which one? Well, for this one you have to go into the database and count it. We have a tool that does those red flag checks automatically and spits out a report that says this many items in your content type, this many vocabulary items, this kind of thing. But it's not black and white, so we can't put that in code. That's what I was saying: when we can, we use Insight, the tool that we have developed for the hosting, because it does the Update and Hacked checks and also configuration checks, like "this text format is not secure because you allow random HTML in it". That will show up in Insight. But in contrib there's nothing that I know of that does this kind of check.

There's a module called architecture that would give you at least this one. Oh yeah, here you go. But this one is very easy to get. Yeah, it does a few things, but I think Alex is going to talk about that in the security part of the presentation.

Also, for the end of the session, think about whether you have any use case right now where you'd say, "Well, I got this, but I don't know how to fix it."
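
The "go into the database and count it" check for content types with a single node can be sketched as one GROUP BY query. This is a hedged illustration in Python: it assumes a node table with a type column, as in a Drupal 7 schema, but runs against a throwaway in-memory SQLite database with invented sample rows, and the function name single_node_types is made up for this example.

```python
import sqlite3


def single_node_types(conn, threshold=1):
    """Return (type, count) for content types with at most `threshold` nodes."""
    rows = conn.execute(
        "SELECT type, COUNT(*) AS n FROM node "
        "GROUP BY type HAVING COUNT(*) <= ? ORDER BY type",
        (threshold,),
    ).fetchall()
    return [(node_type, count) for node_type, count in rows]


# Throwaway database standing in for the site's real one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, type TEXT NOT NULL)")

# Invented sample content: two well-used types and three one-off types.
sample = (
    [("article",)] * 5
    + [("teaser",)] * 3
    + [("sports_homepage",), ("login_form",), ("footer_homepage",)]
)
conn.executemany("INSERT INTO node (type) VALUES (?)", sample)

flagged = single_node_types(conn)
for node_type, count in flagged:
    print(node_type, count)
```

A type that shows up in this list is not automatically wrong, which is why the speakers call the check "not black and white"; it is a prompt to ask whether a panel or a shared content type would have done the job.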
If you have a case like that, it may be a good one to have a look into. And if you want to grab a drink or something, feel free. We'll start with the security part of the presentation after that. Well, if the internet is working.

So, hi everyone. I'm Alex, and the next session is about security. Can you hear me like this? Is it better like this? Okay, cool.

I wanted to start with a question. How many of you have hacked into a website? Nice. You should do it, because it's fun, and it's also very useful. To be able to secure your website you need to understand who is on the other side of the barricade: what tools they have, what methods they use. You also need to put your mind into the frame where you understand that as soon as you put your website live, or connect your infrastructure to the internet, it will get hacked at some point. Or not necessarily get hacked, but there will be attempts to hack into it. Being aware of this is important, and when you are developing you should know the techniques that an attacker can use to exploit and get into your website.

Here's a graph that shows vulnerabilities by popularity. The number one vulnerability has always been cross-site scripting. The next ones are access bypass, cross-site request forgery and SQL injection, and there are many others, but we will not cover them today.
So for today, we will start with access bypass. For most of these exercises you will need two sessions open to the same website. One is anonymous and one is a logged-in user, like admin, with the username and password that you have received. You can use either two browsers or a single browser with two windows; in Chrome you can use incognito mode. You will need to see the difference, because some actions will need to be executed by a logged-in user. Okay.

So, access bypass is something that can happen when you misconfigure permissions in your Drupal site, or you misconfigure access control in things like Views, for example. Basically, access bypass can happen when you have weak control over a resource, so you let people see something that they don't need to see and do something that they shouldn't be able to do. There are two levels where you can add protection. One of them is authentication. It's the step where you authenticate the user.
So you decide who that individual or system is: is he a user that you know about? The next step is authorization. When you already know who he is, you need to decide whether he has permission to perform a certain action on a resource, like view a node, change a node, create a user, things like that.

So, just to recap: access bypass is basically when the user has the ability to do something that he shouldn't be able to do, like view an entity or modify that entity, whatever that entity is, or perform some custom action, like send an email from your website.

This is a good example. You probably know the Devel module, and the Devel module has the variable editor as part of it, a tool that lets you edit variables in the variable table. If you misconfigure your permissions, anonymous users can access this tool, and then they can basically do anything with your site. And there are sites live now that have that permission for anonymous users. It may sound strange that people can do this, but people do. Maybe people forget: they debug something and then they don't revoke that permission.

So, to stop access bypass you need to implement checks. If you're writing a custom module, for example, you need to check before providing a user with the tools to perform an action. Before sending him a node edit page with the node edit form, you need to check whether that user has access to that form. And the next step is that you need to check whether that user can actually create or modify that node. That next step happens when the user submits that form, or when an attacker just sends a direct POST request to your site. Okay.

So, the first hands-on exercise. You need to go to a URL, which is here. Can you guys see it?
Yeah, it's /admin/dashboard/users/all, and you should go there as an anonymous user. Once you get there, you will see an administration view with some Views Bulk Operations. That module is enabled, and it provides some actions that you can perform on nodes or users, or some custom actions like sending an email. You are an anonymous user, so Drupal should not let you block users or send emails from the system. But once you get to that page, you will see that the action list is actually there. So we kind of let people do that: you can send an email or block a user. You can try it, and it actually works. Better not to try sending email, though. Blocking users is fine, but sending email is just not something we want you to do. Okay.

Does anyone have an idea why that happened? Why did an anonymous user get to that view and get to those actions? There are two levels here. One level is the access to that page, to see that view of all users, right? And the second level happens when you perform the action, so when you block a user or try to send an email. Another check should be there. So there are two places that have something wrong in them. Does anyone have an idea?

Okay, so that's actually a view, and that view has an access check on it. But in the permissions, the anonymous user has the permission to bypass all Views access control. It can sound a bit stupid for a person to check that box, but in Drupal 6 that would happen very often, because the permission was just named in a bad way. People were thinking that you need to check that box for people to be able to see any kind of view. And still nowadays we see websites that do this.

But this doesn't answer the other question. The first question is: why can I access this page, the view of all users? Now the second question is: why can I send an email?
Or perform any of those actions? There should be a check there. Any idea?

Well, the permissions are fine; users should not be able to block users. So this is something bad in the code, custom code or, let's say, contrib code. The VBO module comes with one module inside it, which is called the Actions permissions module. I'm not sure why they do it this way, but you need to enable that module to prevent users from sending emails and blocking other users. That's an example of something done in not the best way for protecting your site. You have two modules, and it's not obvious that you need to enable the second one. And I don't really see a use case for letting most users send emails; it's kind of a strange use case. Okay.

So the next vulnerability type is cross-site scripting, and this one is the most popular one: about 40% on that diagram was cross-site scripting. If there are any theme developers here, you should be really aware of cross-site scripting, because most of the cross-site scripting vulnerabilities that we find are in template files or preprocess functions.
So, basically, in the theme layer.

In essence, cross-site scripting is something that lets an attacker perform an action without the user's intent but with his credentials, and without him actually knowing about it, because this can happen in the background. If you introduce this type of vulnerability on your site, then anything that the user can do with your site, an attacker can do through cross-site scripting using JavaScript, and much faster. So if you can delete users, I don't know, one by one, then the attacker can delete them with a script very quickly. Okay.

This diagram shows the steps in which a cross-site scripting attack can happen. An attacker submits a piece of JavaScript together with content, like with a comment or a blog post, and then that JavaScript ends up in the database. Nothing bad happens here. This is normal Drupal behavior, and it's good that the malicious JavaScript code ends up in the database, because then you have more tools and ways to investigate an issue if it happens. You can actually see which user did this, and maybe see the IP address that did this. If Drupal filtered out that malicious JavaScript before putting it into the database, that wouldn't be the case. You wouldn't be able to track it down when it happens.

The second step is the important one. It's when a victim requests a page, and Drupal renders it together with that malicious JavaScript code and sends it back in a format that the browser will recognize as JavaScript and execute. That's the bad part of it. That should never happen, because you should always assume that any user-supplied data is insecure. So before you output that data back to another user, or to the same user, you need to sanitize it. You need to strip the HTML tags out of it, or the script tags, any malicious JavaScript code.
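
The "store it raw, sanitize it on output" pattern described above can be sketched in a few lines. This is a minimal, hedged illustration in Python rather than Drupal code: html.escape from the standard library plays the role that an escaping function such as Drupal 7's check_plain() plays on a real site, and the names save_comment, render_comment and stored_comments are invented for this example.

```python
import html

# Stand-in for the database: submissions are stored exactly as received.
stored_comments = []


def save_comment(author, body):
    """Store the raw submission, so it can still be investigated later."""
    stored_comments.append({"author": author, "body": body})


def render_comment(comment):
    """Escape every user-supplied value at the moment it is output."""
    return '<div class="comment"><strong>{}</strong>: {}</div>'.format(
        html.escape(comment["author"]),
        html.escape(comment["body"]),
    )


# An attacker submits a script tag as a comment.
save_comment("mallory", "<script>alert(1)</script>")

rendered = render_comment(stored_comments[0])
print(rendered)
```

The raw payload is still in storage for anyone investigating the incident, but the browser receives it as inert text instead of an executable script tag.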
There are libraries to do that, and Drupal has text formats, and in text formats you can configure which tags you want to allow. Very often we see that people just allow any tag, and that's the most dangerous thing. Sometimes people think: well, my site is only accessed by admins, and I trust my admins, they are trusted users, so they should be able to post script tags and iframe tags and embed and object tags. But at any point in time your trusted user can become untrusted. You never know how he connects to your website. You never know where his laptop is, and you never know which network he uses, whether it's secure or not. So no user is really a trusted user. I'm a bit paranoid here, but I don't trust users or user authentication.

This is step three, just showing that once the JavaScript has been sent to the user's browser, the browser will execute it, and as soon as it does, that JavaScript code has the access that your user has. It has the session cookie, and for any request that comes into your website, your Drupal backend won't ever recognize that it's malicious activity. It will be a normal request as far as Drupal is concerned.

There are many ways to sanitize your data, and many ways to output unsanitized data.
So here's an example of what we sometimes find. The node title is user-supplied data, so you shouldn't trust it. Then you set a title with that title, and then you just print that title in your tpl file. If I put some JavaScript code there, it will be sent to the browser in a format that the browser understands, and the browser will execute that JavaScript. That's very important for theme developers, because we find most of these things in the theme layer. You need to understand where the data comes from. Unless it comes from, I don't know, your own PHP, if it comes from the database it was most probably supplied by a user. No such data should be trusted; it should be sanitized before you output it.

I already mentioned that Drupal has text formats. The important bit is to configure those text formats not to be over-permissive. You need to strip out script tags, iframe tags, embed tags, object tags, and actually image tags, but most often that last one doesn't happen, because you have business requirements that say people should be able to post images in comments. There are other tools.
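The allow-list idea behind a restrictive text format can be sketched in Python as follows. This is only an illustration under stated assumptions, not how Drupal's filters are implemented: the tag list, `AllowListFilter`, and `filter_html` are hypothetical, and a production site should rely on a maintained sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list; script, iframe, embed, object and img are absent.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "ul", "ol", "li"}
# Tags whose text content is dropped along with the tag itself.
DROP_WITH_CONTENT = {"script", "style"}


class AllowListFilter(HTMLParser):
    """Re-emit only allow-listed tags (without attributes) and escaped text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a script/style element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are dropped on purpose

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED_TAGS and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data))


def filter_html(text: str) -> str:
    parser = AllowListFilter()
    parser.feed(text)
    parser.close()
    return "".join(parser.out)


sample = '<p>Hi <script>alert(1)</script><b onclick="x()">there</b> <img src=x onerror=alert(1)></p>'
filtered = filter_html(sample)
print(filtered)
```

Dropping every attribute is a deliberately blunt choice here: it removes event handlers such as `onclick` without having to enumerate them.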
There are libraries; one is called HTML Purifier, which can detect things like that. It's not easy, but just keep in mind that any data supplied by a user is not really trusted, and you cannot output it without any checks or filtering.

Our next hands-on is to see this in action. The first step is to go to the profile page of user one. You need to be logged in. Once you get there, note the value in the full name field. The next step is to open a special node that we have prepared, at slash node slash 56. You should be logged in, because there is some malicious JavaScript there that will do something bad, and it needs your permissions. In a real-life case, an attacker who prepared a page like this would send you a URL to that page. For example, on Twitter or somewhere, he can use a URL shortener and send you a short URL with a funny cat picture. Once you click on it, you open that page and your browser executes the JavaScript code. And then...

Okay, can we start? How many of you have had a client call you and say, "My site is slow"? Pretty common. And usually the traditional answer is: what do you mean, slow? Where is it slow? Is it slow on the server? Is it slow on the front end? That's the first thing you need to understand when you talk about performance: are you talking about a problem on the front end, or something that is happening on the server?
Usually, when you look at a problem from the server or backend perspective, what we typically find behind slow applications are services the website uses that are very slow or unresponsive: the database is slow, or there's a web service call that is very slow to respond, and because of that the PHP process starts to hang. Or the application is too complex. We see that a lot: code that was not really written to handle 50,000 nodes or one million users. When you get to that scale, or when you start to have a lot of traffic, it starts to degrade. This one also happens, but I think it's something you always manage to fix in the end: you don't have enough servers to handle your traffic, and at that point you start to scale. But much more often, every time we work with clients, we find things that can be fixed and improved before we start thinking about changing the server infrastructure or adding more servers.

It can also be front-end slowness. You have too many assets on the site, or you are not compressing your CSS, so the browser blocks on the requests, or you have some JavaScript that is very slow to render your DOM or to change something in your DOM.

So when you start looking at your site, what's the first thing you look at when you try to understand why it is slow? Anyone? Say you realize that it's not only one page that is slow, everything is slow. If everything is slow, a good way to start is to look for the simplest page that you can render in a Drupal site. And you know what it is?
It's probably a 404. If you request a page that does not exist on your site, you are rendering your theme and most of your blocks, but you are not running anything that belongs to a specific path. So basically, if a 404 takes two seconds to render, every other page on the site is going to take at least two seconds to render. Then you can move on to more common pages, things your users hit all the time: nodes, the home page, landing pages, pages with a lot of traffic. If you have a problem on one of those pages, it can be a big problem for your site. And if you have more data that lets you track what's going on, then you should go to the pages it points to.

And this is something that I don't think many people think of, and it's very easy to understand. Drupal, as a framework or CMS, has built millions and millions of sites, so it's quite easy to know how much time a page in Drupal is expected to take to render. Of course it depends on the complexity of the page, on how much information you have there, and on what you are doing to load data from the database and display it. But usually you should be looking at around one to one and a half seconds to generate the page. If it's lower, much better; if it gets higher than that, you start to see that something is not very well done. Then 40 to 60 megabytes of memory, and 100 to 300 queries, okay?
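Those rules of thumb can be written down as a simple budget check. The sketch below is illustrative only: `PageStats` and `over_budget` are hypothetical names, and the default thresholds are the upper ends of the ranges quoted in the talk.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    seconds: float    # page generation time
    memory_mb: float  # peak memory for the request
    queries: int      # number of database queries


def over_budget(stats: PageStats, max_seconds: float = 1.5,
                max_memory_mb: float = 60, max_queries: int = 300) -> list:
    """Return a list of human-readable budget violations (empty if none)."""
    problems = []
    if stats.seconds > max_seconds:
        problems.append(f"render time {stats.seconds}s exceeds {max_seconds}s")
    if stats.memory_mb > max_memory_mb:
        problems.append(f"memory {stats.memory_mb} MB exceeds {max_memory_mb} MB")
    if stats.queries > max_queries:
        problems.append(f"{stats.queries} queries exceeds {max_queries}")
    return problems


healthy = over_budget(PageStats(seconds=0.4, memory_mb=45, queries=86))
unhealthy = over_budget(PageStats(seconds=4.0, memory_mb=80, queries=500))
print(healthy, unhealthy)
```

Running the same check against a 404 page gives a floor for every other page on the site.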
Usually, 100 queries to render a page in Drupal is pretty normal. If it gets to 200, it's okay. At 300, 400, 500, something starts to be very wrong. And again, simple pages like the 404 are a very good way of understanding what's happening at the general level, and then you can go to the more detailed level.

So, tools that you can use to chase it: tools that help you understand how complex your application is, how to find what's going on, and how you can fix it afterwards. What we're going to use in this lab is basically three tools. Devel: does everyone know Devel? Yeah. XHProf: does anyone know how to debug with XHProf? No? Okay, good, so we are going to take a look at that. Sometimes in Drupal it's hard to understand the XHProf profile. The main reason is that for a lot of functions in Drupal, what's important is not the function name, which is what XHProf gives you, but the function arguments. If you get call_user_func, it doesn't really mean anything if you don't know what the argument is. So either you have experience and you know how to trace that call back to the upper level and understand what's going on, or sometimes it's easier just to use timer_start and timer_read, API functions from Drupal core. When you want to measure a specific piece of code, put the timer_start at the beginning and the timer_read at the end and see how long it took.

Okay, so five things that we are going to see today. First of all, slow queries. Usually you open Devel and you look at the number of queries. Like I told you, 100 queries. That's cool.
That's fine. However, five seconds to run those 100 queries is not fine. If you have a low number of queries and a large amount of time spent on them, it usually means one, two, three, four, five bad queries, so it's easier to trace to a single point where your application is loading something from the database that is very slow. We are going to look at one of those.

The second one is a bit more complicated: the query time is high, but the number of queries is also high. That means your queries are probably fast, but there are too many of them. So you are probably loading too much information from the database, or doing something weird to get to that data. And most of the time it's not only the time that you spend getting that data from the database...

Yep? Ah, sure. So I was saying: the number of queries is high, and the query time is high, almost four seconds. But another interesting thing is that when you load data from the database, Drupal is probably doing something with it as well. So it's not only the query time that is slow; what you do afterwards with your data is slow as well. And you can see that the amount of memory I'm consuming here is already 80 megs, which is double what I told you would be okay. So you are loading a lot of data from the database, you are spending a lot of time processing it, and you are consuming too much memory.

The third case that we are looking at is edge cases. Basically, something where you thought: well, this is a bit slow, but it only happens once or twice. So that's fine.
And it actually happens all the time. Things like a hook_init where you thought it was only going to run in this specific situation and that specific situation, and it runs on every page render. Or things that are a bit weirder and harder to find: it's very easy to plug into Drupal with a hook_node_load or a hook_node_view. The problem with those types of hooks is that it's very easy to plug something into a location you didn't intend, and then every time something does a node load, you are executing that piece of code. The other thing you can put in this category: you have a block, and the block is rendered on all pages, but you thought the block was only showing on nodes, because you control where it shows via a template or via CSS or something like that.

Special tasks are something that happens a lot as well. Imagine that you have a special task on your site, like a cron job, that is very slow and runs from time to time or periodically. At that time the site can have problems.

Okay, so let's go hands-on. As I said, you will need three tools for this part: Devel, XHProf, and the third one is just a browser inspector, like Chrome Developer Tools or Internet Explorer Developer Tools or Firebug or something like that. So, everyone knows how to enable Devel? That's a very easy one, right? Let me see if I can access this site. Was anyone able to access it? The websites that you have, are they working or not? Very slow? [unclear exchange about Varnish] Okay, back to life.

To enable Devel you just need to go to configuration, and Devel allows you to do three nice things. Query log is interesting.
It allows you to see which queries you are executing on the page. The other two interesting things are the page timer and the memory usage. Devel also integrates with XHProf, so you can enable XHProf here, but you need to have the XHProf library installed on the site. So usually we recommend using the XHProf module for Drupal, which is much easier to install: just enable the module, and as long as you have the XHProf extension, you are good to go.

So let's enable Devel. Let's enable XHProf. If everything goes well, now if I go to my home page, logged in as the super admin in this case, I have some more interesting debug information at the bottom of the page. There's a link to an XHProf report, and there's the number of queries that I'm doing: 86, in 100 milliseconds. That's cool. And now I can see all the queries that I'm doing on the site.

Okay, so let's look at the first exercise. If I go up on the site... wait, how do I put it. If I go to the list of Drupalers, say this is a DrupalCon site, and I click on the first user, demo. Let's say this is a DrupalCon site and I'm showing the profiles of different users who contribute to Drupal, and I have a tab that integrates with drupal.org and shows me all the commits that the user has made to the Drupal project. If I click on Drupal commits, what it's actually doing is running a query against another database, and this query is very, very slow. Let's see if I can even show it. Can we start running it again? There's a query here... I cannot show you this one. If you are able to download the site, you should be able to see it. But basically it's a good example of the first case
I gave you: a large query that does a bunch of joins, and because of that it's very, very slow. You would see at the bottom the amount of memory and the amount of time that you are consuming. Okay, so this one did not work. Let's try the next one. This one should work. Let's see.

I told you that one of the things you could use to understand what's going on on the site is to look at a 404 page and see how slow it is. That would be the bottleneck, or the baseline, for the rest of your site. So let's say I look for something like "Prague", or something with an error that does not exist at all. It's harder with the French keyboard, I can tell you. I'll tell you, doing this thing with the numbers was a good idea.

So I was saying: to have XHProf working with Devel, you need to have the XHProf library installed and configured as a virtual host on your server. You go to your server slash xhprof, and then you can access the XHProf reports that your server is generating. If you use the XHProf module, all of that routing is done via Drupal, so you don't need to worry about anything like that. Yeah, that's what I'm going to show you right now. That's a good question.

This page takes about 400 milliseconds to render, and it's a 404 page. So it's kind of okay, but it's probably a bit too slow. If you look at an XHProf output, it gives you essential information. Two things are very important. One is wall time. Wall time is the total elapsed time, including the time when your application is not really working but waiting for something, usually a database or a web service call. So look at the difference between the CPU time and the wall time.
It's a good way of understanding whether you depend on something slow that is not really the PHP process. And the first thing it gives you is the list of functions on the site, sorted from the top. If you look at these calls, in the top 100, or the top 20, everything seems normal. Drupal starts, does the main(), does the menu_execute_active_handler, which is: I want to know which function I'm going to render on that page. It renders the page. Until now it's fine, but now it starts rendering the blocks. And if you look at the blocks, you see that there's a function near the bottom called drupalcon show weather. And if you look at the page itself, this page does not have any weather block. You don't see it. However, the code is there. If you go back to the calls and click on drupalcon show weather, it tells you: I'm taking 200 milliseconds to run that call. And if you look inside it, there's a sleep of 200 milliseconds. So it's slow because we put the sleep there to make it slow. I can show you the reason here, if you are able to look inside sites... this network connection is terrible. Inside sites/all/modules/custom you have a DrupalCon weather module, and that's what is slow.

So, inside the weather function, as I was saying, there are child functions. One of them is a sleep, and the sleep is 200 milliseconds. That's slow. But you don't see the block on the page. If you look at the blocks configuration, and this is something that happens a lot, all the blocks that you have here,
defined as blocks for the different regions of your site, unless you set them to appear only for certain content types or only for certain paths, are being rendered even if you are not showing them, as I was not showing it on the 404 page. The weather block right now: I'm not showing it, because I only have its region enabled in some page templates, but I'm rendering it all the time. So it's always taking 200 milliseconds on every page render on my site. That's very, very common. We often see pages that take two, three, four seconds because all the blocks on the site are rendered on every page: on the home page, on node pages, on 404s, everywhere. Every time you render a Drupal page you are rendering all those blocks that have no visibility rules.

Something else that we see a lot is complexity. You know that doing something in Drupal is always very easy, and there are always a lot of ways of doing things in Drupal, and sometimes one way is not really better than another. However, if you think about performance, and about what you are doing with the data you are getting and rendering, you can get into situations where adding extra complexity can be very problematic. For this one, let me show you something on the site.
So if you go to the home page again, and then to the sessions page, I have a block here that shows me all the past sessions that have been presented at a DrupalCon, for instance. It's a normal jump menu where you click on an entry and it takes you to the session. If you look at the number of queries and the amount of time spent on the queries, it's a little higher, and the page execution time is also higher. And if you look at the XHProf output, again it will tell you why.

You start looking at the page, and again you see the 10 normal functions that are usually there. If you sort by inclusive time, they will always be there. The one trick you can do, however, is to sort by exclusive time. There you get things like sleeps and PDO executions and memcache gets and things like that. It's basically the time that you spend inside the function itself, not counting the functions it calls. If you go back, you see that there is a function called drupalcon sessions block view that is very slow. Of the one and a half seconds that we took to render this page, this function took 500 milliseconds. So one third of the time that our application spends rendering this page is spent on this little block that just shows me the list of past DrupalCon sessions and presents a select box, which is something very simple, right? I need to go to the database, get, I don't know, 100, 200, 300 titles, and show them. So why is it slow? If you look at the code... let's see if I can show you. In GitHub... do you have the code open anywhere, Teodor? You saw the GitHub? Can someone open the code and see what's there? It should be inside the DrupalCon sessions module, inside the custom folder. There. Here. Okay, sorry. Yeah, so you see here.
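The difference between inclusive and exclusive time can be made concrete with a small Python sketch. The call tree and its timings below are invented for illustration (only the 1,500 ms page and the 500 ms block echo the figures mentioned in the talk), and the function names are hypothetical; this is not XHProf's data format.

```python
def exclusive_times(node, out=None):
    """Map each function name to its exclusive time in milliseconds.

    Exclusive time = the function's inclusive time minus the inclusive
    time of its direct children.
    """
    if out is None:
        out = {}
    children = node.get("children", [])
    child_total = sum(child["inclusive_ms"] for child in children)
    out[node["name"]] = out.get(node["name"], 0) + node["inclusive_ms"] - child_total
    for child in children:
        exclusive_times(child, out)
    return out


# Hypothetical call tree for one page request.
call_tree = {
    "name": "main", "inclusive_ms": 1500, "children": [
        {"name": "bootstrap", "inclusive_ms": 200},
        {"name": "render_blocks", "inclusive_ms": 900, "children": [
            {"name": "sessions_block_view", "inclusive_ms": 500, "children": [
                {"name": "load_nodes", "inclusive_ms": 450},
            ]},
            {"name": "other_blocks", "inclusive_ms": 300},
        ]},
    ],
}

exclusive = exclusive_times(call_tree)
slowest = max(exclusive, key=exclusive.get)
print(exclusive, slowest)
```

Sorted by inclusive time, `main` and `render_blocks` always sit at the top; sorted by exclusive time, the leaf that actually burns the time surfaces instead.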
Sorry. So let's see if we can see the code. Oh, I put this too... so, to show... like this. I'm calling this function, the drupalcon show session menu, which is this function. What I'm doing is basically an EntityFieldQuery, which goes to your database and gets you entity IDs, and then I'm doing a node_load_multiple, and then I'm just passing the content to a form. Okay, so why is this slow?

Yeah, the node loads. If you try to count the number of options that I have in the drop-down, I think I have around 400 or 500. So to populate that drop-down, I'm actually loading 400 or 500 nodes, and that's always very slow. Even if most of that node information is cached after the first load, and even if you can use Entity cache afterwards to cache that information, it's still work that you are doing when things are not cached. In situations where things are not cached and you have a lot of concurrent requests, you might have a problem. So what would be a solution for this? Instead of doing this, what should I be doing?

Yep, fetching just the titles in a custom query would be one way. A lot of the time you can also just use Views for things like this. If you create a view whose display format is the jump menu, then you'd have the same thing, right?
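The "fetch only the titles" suggestion can be illustrated outside Drupal with an in-memory SQLite table. Everything here is a stand-in: the `node` table, the 500 rows, and both function names are assumptions for the example, and the per-row full load only approximates the cost of building complete node objects.

```python
import sqlite3


def make_db(count: int = 500) -> sqlite3.Connection:
    """Create an in-memory table with `count` fake session nodes."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE node (nid INTEGER PRIMARY KEY, title TEXT, body TEXT)")
    db.executemany(
        "INSERT INTO node (nid, title, body) VALUES (?, ?, ?)",
        [(i, f"Session {i}", "x" * 10_000) for i in range(1, count + 1)],
    )
    return db


def options_via_full_load(db: sqlite3.Connection) -> dict:
    """Get the IDs first, then load every full row just to read its title."""
    ids = [row[0] for row in db.execute("SELECT nid FROM node ORDER BY nid")]
    options = {}
    for nid in ids:
        row = db.execute(
            "SELECT nid, title, body FROM node WHERE nid = ?", (nid,)
        ).fetchone()
        options[row[0]] = row[1]  # the large body was fetched and then ignored
    return options


def options_via_title_query(db: sqlite3.Connection) -> dict:
    """One query that returns only what the drop-down needs."""
    return dict(db.execute("SELECT nid, title FROM node ORDER BY nid"))


db = make_db()
full = options_via_full_load(db)
light = options_via_title_query(db)
print(len(full), len(light), full == light)
```

Both functions produce the same drop-down options; the second does it with one query and without pulling the bodies into memory.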
And most of the time, instead of writing a custom query, it's actually easier to write your view and then call that view and get the results back from it. You don't need to worry about SQL, and you don't need to worry that the next time you have a major Drupal update you will have to rewrite all your SQL. If everything is done via the normal ways of getting data, using Views, you don't have that problem anymore.

The last example is a bit more tricky, and it involves infrastructure. How many of you have experience with Varnish? Okay. Varnish is not something very hard to explain. It's basically a box that you put in front of your web servers. The first time it gets a request, it doesn't know what you are talking about. It goes to the web servers, gets the response, gives it back to the user, and saves the response in memory. The next time someone asks for it, it says: I know the answer to that request, so I'm going to reply to you directly from memory. So it's going to be super fast. Very easy.

Usually, in Drupal or any application that has some sort of personalization, you do not cache data that differs between users. Meaning that if you have a session cookie, you are not cached; if you don't have a session cookie, you are anonymous, and then you can be cached. In Drupal 7 you have direct integration with Varnish: as long as you enable Varnish, configure your VCL, and set Drupal to cache pages, you should have pages being cached. If you are authenticated, you stop being cached. So it means that every time I make a request as an anonymous user, if the request is the same, I should always get the same answer. It should come from Varnish, so it should be super fast.
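The behavior described above can be modeled in a few lines of Python. This is a toy model, not Varnish or VCL: `PageCache`, the `session` cookie name, and the HIT/MISS/PASS labels are assumptions chosen for the illustration.

```python
class PageCache:
    """Toy reverse-proxy cache keyed by URL, bypassed for session cookies."""

    def __init__(self, backend):
        self.backend = backend      # callable that renders a page for a URL
        self.store = {}             # URL -> cached response body
        self.backend_requests = 0   # how often the backend was actually hit

    def get(self, url, cookies=None):
        if cookies and "session" in cookies:
            # Personalized request: always go to the backend, never store.
            self.backend_requests += 1
            return self.backend(url), "PASS"
        if url in self.store:
            return self.store[url], "HIT"
        self.backend_requests += 1
        body = self.backend(url)
        self.store[url] = body
        return body, "MISS"


cache = PageCache(backend=lambda url: f"<html>page for {url}</html>")
first = cache.get("/node/1")[1]
second = cache.get("/node/1")[1]
logged_in = cache.get("/node/1", cookies={"session": "abc"})[1]
print(first, second, logged_in, cache.backend_requests)
```

Anonymous repeat requests are answered from memory, while the authenticated request reaches the backend every time.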
Okay, so let's try something here. If I go to DrupalCons, and then to DrupalCon track, it's a node. What I'm going to do is copy this, and pray that copy on a French keyboard is the same as in other languages, and open it. Now I should be requesting the page from Varnish. So I open it a few times, and then I'm going to try to open the developer tools. There we go. Let's see if I can show it.

This is not standard in Varnish, but most of the VCLs that you find out there on the web contain this: usually it's a good idea to include in your Varnish response a header that says whether the page came from the Varnish cache or from the backend. In this case, if you look at the headers, you find that there is a hit here. That means that the page is coming from cache. And if I refresh it several times... no, not this one, because this one is going directly to Apache. Okay, so you see that it came from Varnish, and it has hit Varnish once already. If I refresh twice, I get a hit twice. So it's coming from Varnish, and Varnish is working well, right?

So supposedly this page is never hitting the backend, right? I request the page, the page is in Varnish, I'm fine. I don't need to worry about performance on the backend, because the page is coming from Varnish. And if you look at most of the assets, the same thing is true, right? You have CSS, you have JS, and for all of them you see the X-Cache header saying it's coming from Varnish, correct?
However, this page has a problem. If you refresh the page and look at all the requests, you see that I've added one. Let me see if I can show you what it's doing. The use case: imagine that this page is in Varnish because it's a node about the DrupalCon track, but you want this little piece here, the number of people attending, to be refreshed from time to time. You want an expiration time on the attendance number that is different from the main page's. You don't want to clear the main page; you just want to refresh that small bit. So I'm loading that small bit via Ajax.

If you look at all the requests, you'll see that there's one. If you want, you can filter by XHR, which is the Ajax calls, and you see this request. On all pages I'm requesting something like this: a URL that contains the node ID, the DrupalCon node ID, in this case 42002. And if we refresh several times, you'll notice that I'm always getting a miss. You see the X-Cache here: it's always a miss. So all the other requests on the website are cached, everything is coming from Varnish, but every time I request this page, the page makes another request via Ajax to another resource, and that one is never cached. In reality, this means that I'm hitting the backend all the time.

So why am I hitting the backend? Why am I getting a miss from this call? Any ideas? I can refresh several times. The URL seems similar, right?
It's something like the DrupalCon attendance path, slash 42002, and then I have some parameter, and if you look at it, the parameter is always different. If the parameter is always different, the URL that you are requesting from Varnish is actually always different. It's not the same. Varnish identifies resources by their URL. If you ask for something with a different URL, you always get a miss.

I can show you the reason for the miss in the code. It's something that is very easy to forget. If you look at the code of this DrupalCon attendance module, which is doing the request... I can hide this. Yeah, there you go. That's a normal jQuery Ajax request, right? There's only one thing there that causes trouble for Varnish, which is the cache parameter. If you make a jQuery Ajax request with cache set to false, jQuery adds a small cache-busting timestamp at the end of the URL. So the cache key for the entry is always different, and that's why Varnish never recognizes the request. That means that every time you request the page, in reality you are hitting the backend, even if you don't know.
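The effect of the cache-busting parameter on a URL-keyed cache can be simulated as below. This is a Python model of the behavior, not the lab's JavaScript: the `/attendance/42002` path, `ajax_url`, and the fake clock are hypothetical, while the `_=<timestamp>` query parameter mirrors what jQuery appends when `cache` is `false`.

```python
import itertools


def make_cache():
    """Return (store, fetch) for a toy cache keyed by the full URL."""
    store = {}

    def fetch(url):
        if url in store:
            return "HIT"
        store[url] = f"response for {url}"
        return "MISS"

    return store, fetch


# Stand-in for a millisecond timestamp that differs on every request.
_fake_clock = itertools.count(1_380_000_000_000)


def ajax_url(path, cache=True):
    """Mimic the URL an Ajax GET ends up requesting."""
    if cache:
        return path
    return f"{path}?_={next(_fake_clock)}"


store_busted, fetch_busted = make_cache()
statuses_busted = [fetch_busted(ajax_url("/attendance/42002", cache=False)) for _ in range(5)]

store_plain, fetch_plain = make_cache()
statuses_plain = [fetch_plain(ajax_url("/attendance/42002")) for _ in range(5)]

print(statuses_busted, len(store_busted))
print(statuses_plain, len(store_plain))
```

With the unique parameter, every request is a miss and each one also leaves its own entry in the cache; without it, the first request is a miss and the rest are hits on a single entry.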
Okay. It also means something else that is also bad. Is Varnish caching this response or not? It is, right? The second time I access the site I don't get it from cache because I'm asking for something different, but the first thing I asked for is cached. So in reality Varnish has cached thousands and thousands of responses for no reason, because every time I ask, it's going to be for something different. So not only are you hitting the backend, you are also filling your Varnish cache. It depends on how big your Varnish cache is, but you can run into a situation where it runs out of memory because you're saving too many things there without any need. And that was the last example I had.

After all these comes the normal performance talk about caching. Before you start caching, try to understand whether what you are doing is really needed or not, and whether you are doing it in the fastest way. After that you can start caching. And there are plenty of ways of caching: caching for anonymous users, caching for authenticated users, caching at the object level, at the partial level, at the page level. I have a blog post about it if you want to check it out. And that was the end.

So, as a summary: Drupal is powerful because of its community. Before trying to fix something your own way, or with something you think is going to work because you know how to do it in PHP or knew how to do it before Drupal, always look for others with the same problem in the community. Look at the issue queues. Make sure that you are not coming up with something very tricky. If you are doing something very, very tricky in Drupal and you are the first one doing it, there's probably a very good reason why you are the first one doing it.

Go step by step. Never exclude possibilities. There are things that seem basic, seem obvious, and people forget about them.
So always check the obvious. Learn the tools that we introduced to you. They are simple, and if you know how to use them, if you have a good tool set in your hands, that's probably all you need. And always try to understand the whole system, the whole picture. Drupal can become very complex, because you can assemble a bunch of modules that change the way other modules work. So always make sure that you understand what's at the root, and always go to the root of the problem.

So that's the end, before we open for questions. Again, we are hiring everywhere. If these kinds of problems excite you and you feel passionate about solving clients' problems, we have several positions, either on our team or in support or in sales engineering, almost everywhere in Europe. If you are interested, just talk to us. And we can open for question time. We have supposedly six minutes until we finish.

Were the problems that we presented here something you had seen before? Which ones: performance, security, architecture? Which were the ones that affected you most?

Yeah, the behavior of an application that has thousands or millions of users is very different. You can get into a situation where a small thing that is usually fast is very slow at that point. We see that every day. To understand where it is slow, at that point you need to go a bit more low level. You need to go to XHProf, you need to analyze the queries that you are generating, understand why they are slow, explain them. And another thing that I always find missing with most of our clients when it comes to performance is data. If I ask a client how much time their pages take to render, it's very rare that someone is able to answer me. Okay.
So for those kinds of situations you need data. You need data in your Apache logs that records the rendering time of your pages. You need things like New Relic, which lets you save a sample of pages from your site and understand how much time they take to render. And at that point, if you know, okay, all my pages are taking 10 seconds to render, then you know you have a problem, and you can start digging into where the problem is. But the first thing is always data. Then things like XHProf, or some setup of XHProf that runs from time to time so that you can check it from time to time, or things like TraceView or New Relic, are very good tools to understand what's happening at the function level, and they make it easier to understand why it is so slow. And usually, when you don't seem to have a concurrency problem, when you only have something like three or four concurrent users, it's not really a big problem. It probably means all the pages are slow there, so it's easier to get the answer from one of those reports.

The basis is always to guarantee that what you call testing is really testing: it should be as similar as possible to production, and if it's not, then it's not really testing. Sometimes that can be tricky. I think everyone knows that convincing a client to have a complete replica of production is not easy, and it can be costly. But even simple data helps. For example, the XHProf module allows you to do an XHProf run every 300 requests. XHProf is slow to run, but if you run it only on every 300th request, no one is going to die from that, and it's probably going to pay the bill in the end, because you at least know what's going on. And there are other things, like just putting your render times in the Apache logs, for instance. That's something that will not affect your performance a lot, and you'll know how much time every page took to render, and then you can
even say, well, looking at the data, I can say that the problem only happens when they are looking at a node of a certain content type, or just on this landing page, or just in some situations. So the worst risk is always trying to look at the site completely blind and trying to guess what's going on. And what's going on can be completely different between development and production.

Yeah, I mean, it's very rare these days to find a client, a very high-end client, that is not using Memcache. So Memcache is definitely recommended, and it's definitely something that helps you scale your site. It's also very easy to install.

What kind of problems