Welcome to my session. I'm Mark Sonnabaum, a performance engineer at Acquia. I put this talk together based on the experiences I have in my job, which is a little unique, and the things I see from the Drupal community and Drupal developers in general. My job is actually to work on internal products at Acquia. I don't work on sites that much, but I do end up getting pulled in and seeing a site — excuse me, louder? It's not on? Sorry, I'm tall. All right, I can get closer. So I usually see sites after they've already been through a couple of iterations of performance optimization and tuning; the problem has been worked on before I ever see it. I've been fascinated by the process of how a site gets to that point, and I've found some things I want to talk about.

This is the typical scenario I see all the time. The general symptom is always something like "this page is slow" — and honestly, you're lucky to even get that much sometimes. It's really common to just turn on the Devel module's query log, start looking at queries, and then start fixing whatever you found there. That may or may not have actually fixed the problem, but you fix those things because those are the things you found. And as soon as you can't find anything else in that one little piece of data you collected, you jump to crazy, crazy things like that. It's almost not funny, because I see it all the time: you get to a certain point, and I hear "slave databases" or "read slaves" and "ESI" so much that I almost don't want to use them when they actually are appropriate.

Looking at a performance problem and fixing it is relatively simple — that's a total oversimplification, but the real problem is that the first step is the most important one, and it's often the one people skip. All the other steps depend on getting it right. If you don't know what your problem actually is, you'll go off solving other problems. Unfortunately, it's also the most difficult step. I can't tell you that collecting the data, looking at it, and really figuring out what your problem is is simple, but it is completely necessary. It's also the easiest to fuck up. I see it all the time: you get the data and look at it the wrong way, or you get the wrong data, or you just don't know what data to get to begin with.

When you do misdiagnose, the thing I see most often is premature optimization. For example, if you're looking at the queries on a page — I've seen this a lot — you notice one query is called ten times, the same query over and over. Okay, it might help to fix that so it only gets called once, but how fast was that query? It might be so fast that it really doesn't matter in the end; you're not going to get much of a gain out of it. Premature optimization is a whole talk in itself, and micro-optimization is sort of the same thing — actually closer to what I was just describing. Solving problems you don't have sounds really dumb, but I see it all the time. People are constantly solving problems they don't have, and there are consequences to that. I get the feeling people perform these optimizations as if they were free, and they're really not. So you say, okay, add a MySQL read slave, because it can't hurt, right?
It can only make performance better — except that's not the actual cost. The cost is the complexity in your setup. When you go around adding static caches to everything, your code becomes less readable and more bug-prone. I don't remember who said it, but there's a quote along the lines of "a cache is a bug waiting to happen," and it's very, very true. Caches are great, but they make your code more complex and harder to manage, so they're not always the answer to everything. And especially with static caches, one thing we don't really think about much is that you're adding memory overhead to the page. If you have some big result set that maybe gets called a couple of times, and you want to avoid that query, so you put a static cache on it — well, now you have that result set in memory for the entire request, where you may not have before. Static caches are not free.

The same goes for your infrastructure: adding something like Varnish, or read slaves. Varnish by itself is not that bad, but Varnish with ESI — you have to have a team that can handle that, and very few Drupal teams I've seen also have the infrastructure team to manage it. You can get in over your head very, very quickly.

What I'm describing sounds very unscientific, right? What we're supposed to be doing as developers is sort of related to computer science, you'd think, but it kind of isn't the way we do it. As developers, we really like to use logic and reason our way to "this is the problem," but that's totally useless. It doesn't matter how clever you are or how well you can reason your way toward the issue. The only thing that matters is when you measure it and have the raw data. This is something I've learned doing this job, because I'm often measuring things — I work with the hosting team, Barry's here somewhere — I'm always measuring things and showing the data to Barry, and I've just gotten into that workflow, because I know that if I don't give him the data, he's going to ask for it. When you do that, it's a very humbling experience, because you'll find that when you actually measure things, you're wrong 50 percent of the time or more, and then you see: oh, I thought I knew a lot about this, but I actually don't. MySQL is a big one for me — I thought I knew things about MySQL, and it turns out MySQL is really, really complex.

So collecting data is the big issue, and we're going to briefly go over the general tools and techniques for collecting the right data. If you're debugging typical response time — "this page is slow" in Drupal, the time to first byte before anything gets to the browser — profile it. I much, much prefer XHProf; I think there's probably no reason not to use XHProf. Anything else is guessing.
You can use the Devel module's timer, which will tell you how long your page took, but that doesn't give you anything to dig down into. What I actually see most is that for some reason there's this barrier to profilers. How many people here are developers? Okay, most. How many have Xdebug set up in their environment with an IDE so you can do step debugging? How many have XHProf installed and ready to use? Fewer — and this is probably a special room; generally that's not the case. Those of you who have set up XHProf know it's considerably simpler than actually getting Xdebug working with a debugger. It's not difficult at all, but it's just not something we consider part of our standard toolchain, and I do everything I can to fix that. I wrote the XHProf Drupal module, which saves you the step of getting the PHP source from Facebook and setting it up on another vhost, and I'm working on some other XHProf-related things to make it easier. But there's still the barrier of installing the PHP extension, and even then, a lot of people set it up, get the report, and have a hard time reading it. I'd say even getting to that point is valuable, because then you can hand the report to somebody who does know how to read it and they can help you. Getting to that point is really important, and I find it's what most people miss.

Most of the time I ever see anyone do benchmarks, especially on drupal.org, they're using something like ab. It isn't awful, but I just want to point out: if you're making code changes — "I just made this small code change, now I'm going to run ab to figure out if it's faster or slower" — that might work, it probably will work most of the time, but why would you want to measure that many things when you only need to measure one? It doesn't really make sense, because there are a lot of ways that test can get messed up. You can end up with a confounding variable that makes your test essentially worthless, and then somebody else runs it and says, "well, I get wildly different results" — that's because you're testing five different things. For code changes, just use XHProf. It's really easy: with both the XHProf module and Facebook's PHP source, you can take two runs and do a diff between them, and then drill down to just the function you care about. If you've ever seen any of the core issues where I've done profiling, I always attach images from XHProf showing that particular function, the difference, and the time spent within it. It's so easy to get that there's just no excuse to do anything else — the data is right there in front of you, you just have to collect it.

This is a gross oversimplification, but for front-end work — there are a bunch of ways to collect data on front-end performance — the most helpful thing, and the thing I see missed a lot, is that people don't know the Chrome dev tools as well as they could. It's the most incredible tool we've ever had for looking at these things. The Network tab has been great forever, and the JavaScript timeline — I forget what the little tab is called, but it shows how things get painted — is really complex, but it's also really worth learning.
If you look at that network graph and understand what the two lines mean — the document-ready and window load events — and what those events represent, the first line is super important: until it fires, a user can't really interact with the page. So if you see a bunch of requests that seem to be pushing that line out, that's a good thing to look into, because you probably have scripts, likely at the top of the page, blocking other content from loading. That's obviously an entire talk in itself, but learn that tool. For front-end stuff, it's really the only tool I use. Firebug does essentially the same thing.

One really great way to collect raw data from browsers is the HAR format. There are a bunch of different ways to collect HARs, but in Chrome you can actually right-click on the network graph and export it as a HAR, which is just a big JSON array of all of that data. Things like webpagetest.org, or BrowserMob's little checker — all the sites where you put in a URL and they go crawl it — save HARs, and they usually display them with HAR Viewer, an open-source, really simple JavaScript/HTML tool: you give it a HAR and it shows it to you. That code was actually extracted from the Firefox net tab. So it's good — I still like Firefox.

For web servers: please just stop benchmarking them. It's really, really unlikely that they're your problem. People love to talk about web servers and benchmark them and look at how much faster nginx is than Apache or whatever. That's awesome — I would love it if you could show me that your web server is actually your bottleneck, because that would seem worthwhile — but I've yet to see the Drupal site where it is, so I think it's very unlikely. It's more likely that your web server is misconfigured. If you don't have, say, MaxClients configured correctly, that's an issue, but that's not a performance optimization, that's just configuring it correctly.

Most of the actual measurement I end up doing is custom, because I have to measure something specific to, say, the Acquia hosting stack, and the one thing I have to say there is: always record the raw values. A lot of people give me data they collected in a format that's already calculated — it's already an average, or requests per second, or something — and that seriously limits your ability to look at that data later, because you're already locked into the way you wanted to view it. If you can, get the raw data. CSV is incredibly simple and useful. I also see a lot of people put stats directly into MySQL; I guess you could, but that seems a lot more complex to me. Any scripting language — you can even do this with bash — makes it really simple to write out data in CSV format, and most good plotting and visualization tools can read CSVs.
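A minimal sketch of that raw-values idea, written in R since that's where this data ends up later in the talk — the file name timings.csv and the ms column are invented for illustration:

```r
# Hypothetical example: one raw response time (in ms) per request.
times_ms <- c(212, 198, 187, 240, 1050, 201, 195)

# Record the raw values, not a pre-computed average -- means, medians,
# and quantiles can all be derived from this later.
write.csv(data.frame(ms = times_ms), "timings.csv", row.names = FALSE)

# The anti-pattern: saving only mean(times_ms) throws the distribution away.
```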
I just want to briefly talk about the idea of confounding variables. A lot of you probably saw, six months or so ago, a really good blog post by Zed Shaw — "Programmers Need to Learn Statistics or I Will Kill Them All" — it's worth googling. He touches on this a lot, and it really is a huge problem that people don't understand how confounding variables affect their benchmarks. It's somewhat what I touched on earlier about not using ab: any other piece of the stack you're measuring that isn't the thing you're supposed to be measuring is potentially a confounding variable. And here's my completely made-up stat about benchmarks on the internet. You should know that you can't always remove all of them — you shouldn't use that as an excuse not to try — but I've found it's about finding the balance between realism and isolation.

There are some kinds of tests, when I want to test a whole site or a whole stack, where I'll use a tool like BrowserMob and make sure requests are actually going through browsers and everything is happening like it normally would. But if I'm just trying to measure the difference from a code change, that's probably overkill — there are way too many things in there that I'm actually measuring, and I'm not isolating what I want to measure. So I might compromise and just write a small script that hits the URLs I care about. But if you go to the other extreme — say you're tuning MySQL and you just use something like sysbench — that's great, and you can write down your I/Os per second and show them to people, but it may or may not tell you whether your Drupal site will actually run better. It's always about finding the balance between measuring exactly the thing you want and making it realistic enough that it's still relevant to your actual use case.

Here are the really simple things I do to try to avoid this. Work in a known, clean environment — for me, that's my MacBook Air. When somebody gives me a site and says "the site is slow," I never, ever look at it in that environment first. I always get a copy of it, get it running locally, and profile it, because if I can recreate that same page load time on my laptop, it's really clear I can figure out what it is — even if the time is usually a little bit faster. If it's a couple of seconds different, that's fine, because really all I'm looking at in XHProf is the relative times. I can see that 70 percent of the time is spent in this one thing, so if I fix that, ideally it's going to have the same effect in the other environment. And it's fine if it doesn't, because if it doesn't, and I've got it working well locally, then I can move into the other environment and say, okay, now I know this is an environmental issue. So it doesn't matter so much that I'm on a completely different OS, on an SSD, on a totally different stack — the only thing that matters is that I know it, so I know what to expect, and if something is different I'll recognize it quickly.

Once you get the data: tools like XHProf have built-in viewers, and ab, if you are using it, has a really good option to export CSV that you'd normally feed into gnuplot. But I've been using R a lot, and I think it's a really great tool for visualization.
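As a minimal sketch — again assuming the hypothetical timings.csv with a single ms column of per-request times from earlier — getting that raw CSV into R and looking at the distribution instead of a single average might look like this:

```r
# Read the raw per-request times back in.
timings <- read.csv("timings.csv")

summary(timings$ms)                       # min, quartiles, median, mean, max
quantile(timings$ms, c(0.5, 0.95, 0.99))  # median and tail latencies
hist(timings$ms)                          # quick base-R histogram
```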
R is not the easiest thing in the world, but it's very worthwhile, because once you've visualized your data one way in R, viewing it in a completely different way takes very little work and very little code. Out of the box it comes with its own built-in plotting, but most people use a library called ggplot2, which has nicer graphs. This is a pretty shitty example, because it makes it look like ggplot just gives you pastels, but it's a bit more than that.

A little bit about the different ways to visualize data. This is something I'm still learning myself — I have music degrees, I don't know statistics that well — and it never occurred to me how different the different types of plots are. It wasn't obvious to me initially which ones were appropriate for which kind of data set.

Line plots: I always tend to go to lines, especially for time-series stuff. I do it less now, but this is one example where it's pretty appropriate. I was measuring the rate at which PHP processes were spawned: I sent a bunch of traffic at a machine and waited to see how fast FastCGI would spawn them. This was really illuminating, because I could see over time how long it took — while some machines had already hit the max, it still took a while for the others to catch up, which was surprising. But that's specifically a time-series thing.

Area plots: I like to use area for time series, but I think mostly just because it's pretty. I don't think it's all that useful most of the time. You can see that it's very spiky, which tells you something — it tells you there's a lot of variation — but I think there are better ways to see that most of the time.

Bar is another one we see a lot, and it's a little more complex, because if you're looking at data you usually see in bar form, it's probably been averaged. I think the better way to view it is something like this, a histogram. Here I measured what percentage of memory MySQL was using across all the servers in Acquia Cloud, and the plot shows how many servers there are at each percentage point, with the median drawn as a line on top. That's a good example of giving R the basic raw data and letting it calculate how many occurrences there are in the data set — I don't have to pre-calculate that.
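A rough sketch of that histogram-plus-median idea with ggplot2 — the memory_pct numbers here are invented, and the point is just that you hand ggplot the raw observations and let it do the counting:

```r
library(ggplot2)

# Invented data: percent of memory MySQL is using on each server.
servers <- data.frame(memory_pct = c(31, 42, 38, 55, 47, 62, 40, 44, 51, 39))

# geom_histogram does the counting from the raw values;
# the dashed vertical line marks the median on top of it.
ggplot(servers, aes(x = memory_pct)) +
  geom_histogram(binwidth = 5) +
  geom_vline(xintercept = median(servers$memory_pct), linetype = "dashed") +
  labs(x = "MySQL memory usage (%)", y = "Number of servers")
```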
As I touched on before with averaging: you see this all the time, where you're presented with data and told "this thing is faster than that thing, because I took all of the values and averaged them." Especially when you're doing any kind of performance measurement, averages are really inappropriate most of the time; they're not telling you the whole story. This actually came from data we collected very recently — another engineer collected some numbers on the new Amazon Provisioned IOPS, and when he gave us the numbers I said, all right, that's interesting, give me the raw data, because I have a feeling there's maybe more to this. If you just looked at the averages you'd say, well shit, Provisioned IOPS is slow, let's not use it, right? But look at it with a box plot. Box plots are really great — I use these all the time. That big line in the middle is the median. I should explain quickly: the mean is the average; the median is the point at which 50 percent of the data is above or below it, which I think is a better metric than the mean most of the time, though it's good to see both. The top of the box is the third quartile and the bottom is the first quartile, so the middle 50 percent of the data — everything between the 25th and 75th percentiles — fits inside the box. The lines extend out toward the min and max, and it also marks what it considers outliers. From this you can see that one of the promises of Provisioned IOPS is that it's supposed to be way more consistent — there isn't going to be a lot of variation in I/O — and that's clearly visible here, because you can't even see the box: there's practically no variation at all, whereas with regular EBS there's quite a bit. So what you get from this graph is not that one is slower than the other; it's that one is more consistent. Without knowing the standard deviation of the data set, averages are just completely useless.

It feels a little silly to put up a link to Stack Overflow, but if you're interested in learning R, it's absolutely the best resource, and there's actually a statistics-specific Stack Exchange site as well. That one book I've been going through has been really helpful too. There's a huge R community out there, with lots of resources and lots of help. I didn't put it on the slide, but I recently started using RStudio, which is an IDE — I don't usually use IDEs, but it's really great — and you can write things in a format called R Markdown, where the code looks like it normally would in Markdown but the plots are actually embedded in the output. There's a service tied to it called RPubs: you hit publish and your code goes to the site — it's like gisting graphs — and you get a URL where all the data, all your code, and the results live. That's really nice, because I can just pass that on and everyone can see each other's work. So I highly suggest checking it out.
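To tie the box plot idea back to something concrete, here's a minimal sketch with made-up I/O latencies for two volume types — none of the real Provisioned IOPS numbers from the talk are reproduced here:

```r
library(ggplot2)

# Made-up latencies: one volume type is spiky, the other slower but steady.
io <- data.frame(
  volume     = rep(c("standard EBS", "provisioned IOPS"), each = 6),
  latency_ms = c(2, 9, 25, 4, 60, 12,   # high variance
                 8, 9, 8, 9, 8, 9)      # consistent
)

# The heavy line is the median, the box spans the first to third quartile,
# the whiskers reach toward min/max, and anything beyond them is an outlier.
ggplot(io, aes(x = volume, y = latency_ms)) + geom_boxplot()
```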