Is this working? Am I good here? A couple more questions, and then we'll get started. We have about a minute, according to my phone.

The two tools that I really spent a lot of time on are LoadStorm and New Relic. Are you guys using New Relic? Okay, so it looks like about a third of you are New Relic users, right? What other tools are you using if you're not using New Relic? Okay, XHProf. JMeter, that's for load testing. What about other load testing tools? Has anybody heard of LoadStorm? Okay, cool. Has anybody used the new version? Okay, cool. I'm going to talk a lot about that. It's a nice tool, but I'm sure there are others, and I've used others that are probably equally good. So, other load testing tools: JMeter, Siege. Oh yes, Siege, that's an oldie but goodie. Anything else? Which one? Blitz. What's that? Is that the one that comes with the Acquia subscription? Okay.

Well, I think I'm going to get started. So again, thank you so much for coming. I've planned 45 minutes. We're covering a lot of stuff, but we may get you guys out of here a little bit sooner, in particular if you don't have any questions.

So my objectives were pretty simple.
I've talked about performance before, and that's something I really enjoy about Drupal. I've been around in IT since before Drupal, so it's kind of a personal hobby and interest of mine. We've load tested other sites in our normal jobs, and I wanted to take something that I can really present to the community and break down in MythBusters mode: what works and what doesn't work, or what are some of the results that we've had on a pretty standard install with a pretty standard set of test cases, which hopefully you can take away to see what really works.

Obviously that's not going to work for every case. Every Drupal site is different, and every Drupal site is going to have a different set of performance problems. But if we're going to talk about generalities, and if you're kicking around things like everybody said, Memcache, Varnish, I thought it would be interesting to see what kind of improvements they actually give you, what kind of impact they have on a pretty typical setup. So that's what we did: we stood up a site, built a script, stress tested it, looked at the results, made an improvement, and repeated. That's what this talk is about. It's pretty straightforward, and hopefully it's somewhat interesting.

What I'd like everyone to take away is what to think about when you're looking at stress test results. And also, and again this is particular to how you test, which is equally important as what you test: what are some of the improvements that we found to have the most impact? Does that sound like what you came in here for? Awesome, well, thanks.

So a little bit about me. That's me. I'm not a scuba diver; it's me biking in Chicago. I do like to bike in the winter, although a lot of my colleagues are a lot tougher.
It was like minus 20 and they kept on biking. It's very impressive.

By the way, I made it very easy for everyone to follow me on Twitter. My Twitter handle is akucharski. You probably don't even need to write that down, it just rolls off the tongue. I'm the founder and president of Promet Source. We're a Drupal firm just like many others here. We focus on DevOps, we cook with Chef, and we provide support. We've been around since 2003, and as I mentioned, we're in Chicago. Thanks to those of you from Promet for coming, Melissa.

I want to give a big shout-out to a few folks from my team that really helped me out. Basically, they did all the work. That's Kabla, Vincent, and Greg. They're on our team in dev or in sysops. They're really great, and I really couldn't have done this without them. I also want to thank Scott Price, who is the CEO of LoadStorm. Again, I'm not here to push the LoadStorm tool, but he has been instrumental and has helped me tremendously in looking at and analyzing some of the results when I mentioned that I was going to be using his tool. So Scott, if you watch this on recorded video, thank you very much. I really appreciate it.

Okay, so a couple of things about what this is not. We're going to get to the good stuff in a couple of minutes. What load testing is not: it's not front-end performance testing. We're not looking at improving how your web page loads without any stress on your site. That's very important, and I highly encourage everyone to do that, but that's not what we're doing here. It's also not high availability analysis, right?
So somebody said add more servers. Absolutely, that will increase your performance. However, it is not testing your high availability. High availability gets tested when something gets broken. If you haven't heard of Chaos Monkey, I highly suggest you look it up. It's a methodology used by Netflix to inflict pain and make sure that they survive any kind of unexpected event.

So why load test? Capacity planning: if you're expecting a big event, if you're expecting a spike in traffic, if you're expecting to partner up with somebody who's going to send you a lot of traffic and you have some kind of an estimate. It also allows you to state what you can handle. Performance tuning is done on a page-by-page basis. XHProf obviously does a lot of that for you, and tools like Devel help you improve your page performance. That's very important. But I have found that certain sites behave differently under load, so we have been able to take some lessons learned from that.

I have one little story before I jump into data. I found this study really fascinating. I came across it at the Velocity conference. By the way, the Velocity conference is amazing. If you haven't gone to it, I highly recommend it. It's all about making the web stronger. Strangeloop is a company that presented this case study. They are in web performance, all aspects of web performance, from network delivery to DNS to application improvement. What they did is they talked a large e-commerce client of theirs into allowing them to take a small percentage of their traffic over a period of time, delay it, and then measure the results. Whoever talked them into it, it's amazing, and the findings were really fascinating.
As you can see, they started noticing an impact right away. From basically a 0.2 second, 200 millisecond delay, page views started dropping. And then it was consumer behavior on this e-commerce site: with a half second delay, their bounce rates went up, the conversion went down, and their page views went down. When you delayed it by almost a second, their cart sizes started going down. So performance has a big impact on human interaction with your site. Always keep that in mind.

So, some concepts. I'm sure most of you have seen this. This is a waterfall diagram of the objects on an HTML page. The horizontal axis is the time it takes for each of the objects to download to your browser. The vertical axis is the number of objects on the page. You always want a short x-axis and a short y-axis: on the x-axis, things loading faster; on the y-axis, a small number of objects. This will also impact your performance testing. The fewer objects your server has to respond to, the more quickly it can churn through pages. You can also do things like offload things to a CDN, which will allow it to respond faster.

Further, you can break down each one of these objects into its own individual performance. I wanted to mention this because we are not going to talk about SSL in this talk. However, SSL negotiation does have an impact, especially on e-commerce sites. And I think whenever you have a site that has a login, it's now generally accepted to have an SSL certificate installed so nobody can pick up that traffic. The breakdown goes from DNS lookup, to the initial connection, to how long it takes the server to process the request and send you a response, and then how long it takes to have that content download.
So these are the basics. The one other thing to keep in mind is that when you have a lot of your static objects offloaded from your server, performance on the server side is mostly about the first HTML request, or the few objects that are being processed on that application server. So back-end performance is generally that first request. Most of the other things can be greatly improved on the front end: spriting images, reducing the size of your objects, combining them, and so on. Does that make sense? Okay, I just wanted to get the basics down. We're going to talk about the performance stuff now.

If anybody's interested, the site is still up. As I mentioned in the beginning, we used LoadStorm and Commerce Kickstart, which is the Commerce Guys Drupal distribution for commerce. It comes preloaded with products. We used New Relic to analyze what was happening on the server as we were load testing the site. You can take a look: this URL should be open. If anybody's tried it and it's closed, let me know. We'll keep this site around for a couple of weeks after the presentation if you want to take a look at the PHP configuration and everything that this was running on. This is just an Amazon instance that's going to get blown away.

Okay, so here's our victim. This is Commerce Kickstart out of the box. Here's what the home page looks like. It's pretty quick: 37 objects. It doesn't have anything terribly heavy on it.
The other important thing about this page is that it doesn't have a lot of objects that call things from the outside. During load testing we wouldn't be able to do that unless we got authenticated anyway. But generally speaking, you want to make sure that anything you reference that you have no control over gets put at the bottom of your HTML.

Okay, so load testing is comprised of steps. A step is generally a click on a page or opening up a home page. A series of these steps builds a scenario. Then we take those scenarios and we plan a test. This is really important, but I only have one or two slides on it because it basically stays constant. When you build your load testing plan, what you're trying to do is simulate the traffic that you expect at the highest peak of usage. This is key, because the results are going to be very, very different if you make assumptions that do not reflect the reality of your traffic. Most important is whether you're testing with anonymous users or authenticated users. If I were to take this test and change the distribution between anonymous and authenticated users, most of these findings would be completely different and invalidated. So this is for this type of test.

So let's take a look at what we did. I've built four scripts. Think of that as four different types of users that are hitting your site. My first script we run 70% of the time. When we fire up our test we give it a weight of 70%, which means that as I'm incrementing the users, 70% of them are going to be running script one. That's basically an anonymous browser: somebody jumping in, hitting the
homepage, hitting a couple of products, and bouncing out. Script number two is a user logging in, six page views, and logging out. In script number three, somebody logs in, puts a product in the cart, and then bounces. And script number four is the key script for this case: a user logs in, puts two products in the cart, and starts the checkout process. They input information into the checkout process and go on to the next screen. LoadStorm, or sorry, Commerce Kickstart saves that information into the database, so I'm able to view that information in the database.

The reason why this is key, and I'll probably repeat this multiple times, is that as we're applying certain tools to improve performance, some of them really benefit anonymous users, for example like Views, and some of them impact logged-in users. So if you change your tests around, if you change your assumptions around, you can get different results.

Our target environment was a single-CPU medium instance on Amazon. That's it. There's nothing else about the slide.

Okay, so the second thing that we're trying to do is determine the breakpoint. How do you determine a breakpoint under a stress test scenario? Actually, that's a question for you. How would you determine a breakpoint? Server timeout. Okay, great. So when you get a 500 or a 502, that's it, we have a problem. How else can you define it? How else do you guys define it? Okay, perfect.
Yes, so you can say: my goal is to make sure that my pages, or my object responses, don't hit a certain threshold, and I'm going to define what is acceptable to me. So there are different ways of defining it. This was actually an exercise that we went through a couple of times, because we kept on finding breakpoints, or we kept on finding some things acceptable. We looked at load average, and we found that our little server was running at a load of 20 and was still sending responses, and they were acceptable to the users. We had that insight.

So I decided that we have reached an unacceptable response time, or broken our SLA, when the average response time across all objects is greater than one second, when we have an error rate greater than 0.5 percent, or when my home page onload in the first script takes more than 10 seconds. Those are just my definitions. They don't have to be yours. But through running my tests, repeating them, and looking at them, I found this was an easy gauge for me to say: okay, now I know that with this performance we can have this many page views, or this many visitors, before I break my SLA promise to my client, or to my boss, or to myself. Does that make sense?

Right, so we're going to use this all the time. These can also vary. If you set your error rate to be zero, then you're going to get different results, right?
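Editor's sketch of the failure criteria just described, written in Python as a rough illustration rather than anything LoadStorm provides. The three thresholds (1 second average response, 0.5% error rate, 10 second home page onload) come from the talk; the `Sample` type, the function names, and every sample value except the 59-user reading (1.2 seconds, 2% errors) are hypothetical and only there to make the example runnable.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional

# Thresholds as stated in the talk; adjust them to your own SLA.
MAX_AVG_RESPONSE_S = 1.0    # average response time across all objects
MAX_ERROR_RATE = 0.005      # 0.5 percent, expressed as a fraction
MAX_HOME_ONLOAD_S = 10.0    # home page onload in the anonymous script


@dataclass
class Sample:
    """One reading from a load test (hypothetical structure)."""
    users: int              # concurrent users at this point in the test
    avg_response_s: float   # average response time in seconds
    error_rate: float       # errors / requests, as a fraction
    home_onload_s: float    # home page onload time in seconds


def breached(sample: Sample) -> List[str]:
    """Return the names of the criteria this sample violates."""
    reasons = []
    if sample.avg_response_s > MAX_AVG_RESPONSE_S:
        reasons.append("avg_response")
    if sample.error_rate > MAX_ERROR_RATE:
        reasons.append("error_rate")
    if sample.home_onload_s > MAX_HOME_ONLOAD_S:
        reasons.append("home_onload")
    return reasons


def breakpoint_users(samples: Iterable[Sample]) -> Optional[int]:
    """User count at the first sample (in test order) that breaks any criterion."""
    for sample in samples:
        if breached(sample):
            return sample.users
    return None


# Illustrative ramp; only the 59-user reading echoes a figure from the talk.
ramp = [
    Sample(users=25, avg_response_s=0.4, error_rate=0.0, home_onload_s=2.0),
    Sample(users=59, avg_response_s=1.2, error_rate=0.02, home_onload_s=4.0),
    Sample(users=100, avg_response_s=3.0, error_rate=0.10, home_onload_s=12.0),
]
print(breakpoint_users(ramp))
```

Treating the criteria as "any one of them fails" matches how the results are read later in the talk, where the earliest threshold to be crossed sets the reported user count.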
But we are a little bit more tolerant. It all depends on you. Setting these goals in the beginning is very important, because as you're testing you'll be able to say with some level of confidence: we know that we can take this much traffic and the server will do this, based on this type of traffic, based on this type of user behavior profile.

All right, so here we go. This is going to be lots and lots of graphs. Baseline: out-of-the-box install, no optimization, and no caching. We started a load test with 250 users. Any wagers on how far we got before we hit one of our failure criteria? Oh my goodness, that was spot on. Sir, you should come up here and help me out.

So, a couple of things about this graph. This is a LoadStorm graph. It gives me a lot of information, and it can be a little bit much. Let me see if I can show you what this looks like. Live demo, be kind. We're going to be seeing a lot of these, so I just want to explain it. The light blue line gives you an idea of how many users I have at a given time. So I'm going to start my test at 50 or something, and as time goes on
it's going to go up. So the blue line indicates how many users I'm hitting the site with.

The next thing that I like to look at, to start seeing where my problems occur, is the peak response time. Remember, I did not set a failure criterion for peak response time. However, when I see this thing going over 15 seconds, I know that something is failing on the server, because LoadStorm tells me that it did not get a response. Rather than waiting indefinitely, it stops waiting after 15 seconds, and you can set that parameter.

The next thing that I'm going to take a look at is errors. Now, I did have some errors that I did not remove in my script. If you look at this, this is what we did: I removed those errors, and it allows me to get rid of them. And the last thing that I really pay attention to is the yellow line, which gives me the average response time across all of my objects.

Okay, so when you see this graph, you can see that it's not really looking that pretty. Pretty would basically mean that everything follows this blue line. I can see that initially, on load, we had a problem loading some of the first pages, and I had non-responses starting right here. As I drag my cursor along this line, I see that at 59 users I hit 2% errors and a 1.2 second average response time. So that's what we're using to determine when we have a failure and we've reached our limit. Does that make sense?

How do I define errors? Great question. Basically, no, this is a bad example. Errors are basically any error response that comes back from the server, not counting the 404s. That's really the easy answer.
Let me give you something that has a lot of errors. At a certain point your server starts degrading and it's going to start throwing all kinds of errors at you. Here I turn off 403s, per my internal agreement with myself. I do not count those as errors because I know that I can get rid of them. The Drupal Commerce install out of the box has some 404s and 403s, and some are due to my script. So please just ignore those and put your trust in me. Generally, these are the things that we've seen: bad gateway, request timeout, connection timeout. Basically the server starts being too busy to accept connections, or starts tripping over itself. So here are some errors.

The other graph that I look at in LoadStorm is at what point my home page starts taking more than 10 seconds to respond. I call that a failure too.

So let's take a look at what's happening in New Relic. This little cloud server started getting overwhelmed very quickly. It went up to a load of a hundred. Load is basically a measure of how many processes are waiting for your CPU behind the current process. Generally, at a hundred, that's a server you probably can't SSH into. It's pretty bad. However, amazingly, it kept on responding throughout this process. It just tried to do what it did, but we called it around here.

So what were some of the top transactions? Actually, one other thing I wanted to point out is that PHP was our limiter here on the application side. New Relic allows you to take a look at what's happening on the application side. Image style deliver, views page, node page view: just the typical standard Drupal operations.
They were timing out or using up all of our resources. And the shopping cart and collections of products are the largest CPU consumers.

So we turned on Drupal cache. That was a no-brainer. To measure and compare what we were able to get from the baseline to subsequent improvements, what I did is come up with a number: we took the number of users at the point where the failure occurred, and extrapolated how many users the site could sustain in an hour. That's more of a number that somebody making business decisions can understand, not to talk down to anybody. If you can say, I am able to sustain 150, or a thousand, users an hour, that's a number that you can think through. You can think through how many users you might get from search or from referral links. You can take a look at history or Google Analytics and see what spike can be expected.

Turning Drupal cache on is basically the biggest improvement. I'm not going to spend a lot of time here, because that's a very obvious thing. However, it's also important to know that not all the advantages Drupal cache gives you are used by logged-in users.

So here's our famous graph that we're going to look at a lot. You can see that we're at 265 users. That's pretty good, or much better.
The thing to note here is that my point of failure for the average response time was at 265, and my point of failure on the errors was at 290. And something interesting happened here, which I'm going to go through. Something really bad happened around minute 40 of this test. As you can see, I had a pretty reasonable, normalized load, and then something happens. At this point, even though my CPU usage was already maxed out pretty early on, I can see that PHP is consuming a lot more memory. When I look at my overview in New Relic and the web transaction response times: huge failure. Throughput also went down. So basically I had a failure during my test. It looks like there's a cache_field table that was being refreshed at this point. I had a cache_field insert event happening here, lots of insert time being spent here, and then delete time being spent here. That basically invalidated my test.

The other thing that I should mention is that we run our load tests at least three times. Unfortunately, that wasn't so great for me preparing for my presentation, because we run each for about three hours, which means it took days and days to get valid results, and I didn't have that much time. However, it's really key, because we found a lot of anomalies from test to test. We looked at them to see: what can we really expect in a real situation, or is there something happening that gave me an invalid result, like here? So we ran it again. Luckily we had three of them. We found another one, and actually the real number was 154. That's the result.

We have normalized it. The humpback whale, that's what I call this. The top memory consumers and CPU consumers jump up pretty quickly. So this rise was pretty steep. You don't want that. You want that to be gradual. Views is consuming most of the memory.
So let's get to Memcache. All right, Memcache should give us a pretty good boost in this case. Why? Because... yes?

The question was whether, when I turned on Drupal cache, it was using the database to store the cache. Yes, by default Drupal 7 stores cache information in the database. Well, I shouldn't say it's a bad idea, but it's bad for performance. And I couldn't hear your other question.

So we have a pretty good improvement with Memcache. Why? Because Memcache basically eliminates the database. Going back to the slide where we talked about what kind of tests we have: 70% of our tests are anonymous users and 30% are not. So 30% of them bypass the reverse proxy and all kinds of caches from Drupal, and they interact with the database as they're shopping. Especially when they add products to their cart, we're doing direct inserts into the database.

So this graph is starting to look better. It's starting to look more normal. The SLA goes up to 253 users. Is anybody surprised or not surprised by the Memcache finding? Does this align with your experiences?

If we take a look at the top five processes on the server, it's still PHP. That's our limiter. Among CPU consumers, PHP is by far the highest. So we are bound by that CPU on the PHP side. What's nice here, though, is that we have more runway before the cliff. Our humpback whale is getting a tail. I didn't practice that. That's pretty good. So we have more runway here.
This is not catastrophic and immediate, and that's good. You can see that our problems start happening right around here. Memcache is engaged. PHP is still using most of our resources, or taking the most time to respond. Here's node page view. So it's still Views that's causing most of the delays. I was actually a little bit surprised by that, because I thought that it would be the inserts. But it's not. Maybe that's because it's just 10% of the time. So: home page, node page view, and then the commerce cart. That's always going to be a problem on this e-commerce site, because on every page view the commerce cart module gives our users an indicator of how many items they have in the cart. So it must be checking the database and pulling it from there. If you look at the top five Drupal modules in New Relic, it gives you Views, menu items, and then Commerce Product Reference.

Yes? You know, I'll actually hold off on that. I'm going to wrap up pretty fast, hopefully in 15 minutes, which leaves us 15 minutes for questions.

So the next thing that we did is one of the general good practices for Drupal. Again, we try to eliminate the database, so stop writing your system logs into the database. We want to write them into a file or a third-party system. So stop doing inserts into the database. We expected some improvement from that. Any wagers on how much improvement we saw from that? A lot? That's what I thought, but we didn't.

I really thought this was going to be huge, and this is a surprise finding of ours. Again, this is with the type of tests that we were running: we were putting stuff in the cart 10% of the time and doing a bunch of browsing 70% of the time. We saw absolutely zero. And it wasn't a fluke. We ran every test three times.
We saw absolutely no, or negligible, improvement in the number of users at which we fail for error rate and average response time. However, the home page kept on responding under 10 seconds a little bit longer. I expected a lot more; we got a little bit. Because it wasn't a big improvement, we didn't dive into analyzing why that was happening. So let's go to the next improvement.

Okay, so the next thing that we did: we had problems with Views, but I didn't cache Views yet. What I did is I turned off Views UI. Now, a lot of you guys said you're system admins. You probably really don't want anybody using Views UI on the live site. Developers probably don't either: why don't you just use Features? Unfortunately, a lot of marketers and communications directors really like Views, because they probably treat it as content. So this may be a controversial move. However, turning Views UI off had, okay, I kind of spoiled it, a pretty big impact. I was very, very surprised by this. This is almost a 40% performance improvement by just turning Views UI off. So you can see we're getting further here. The graph is starting to look better and better. We're at 352 users.

I pointed this out here because I thought it was pretty interesting. We were breaking things down at somewhere around a load of 15. I know this says two CPUs. Ignore that, because we switched the machine when I was taking these screenshots. When the test ran, it was a one-CPU machine. So a one-CPU machine was working just fine at a load of over 10. That was something of a surprise for me. Generally, we try to keep it at half. Even that is too much, right? If your load is at the number of CPUs that you have, that's generally a warning sign.
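Editor's sketch of the rule of thumb just mentioned (keep load at about half the CPU count, and treat load equal to the CPU count as a warning sign), in Python. The 0.5 and 1.0 cutoffs are taken from the talk; the function names and the three labels are hypothetical. `os.getloadavg()` is only available on Unix-like systems, so the live reading is kept behind the main guard.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Normalize a load average by the number of CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def classify_load(load_avg: float, cpu_count: int) -> str:
    """Label a load average using the talk's rule of thumb.

    'ok'       : at or below half the CPU count
    'elevated' : above half, but below the CPU count
    'warning'  : at or above the CPU count
    """
    ratio = load_per_cpu(load_avg, cpu_count)
    if ratio >= 1.0:
        return "warning"
    if ratio > 0.5:
        return "elevated"
    return "ok"


if __name__ == "__main__":
    # One-minute load average of the machine running this script (Unix only).
    one_minute, _, _ = os.getloadavg()
    cpus = os.cpu_count() or 1
    print(f"load {one_minute:.2f} on {cpus} CPU(s): {classify_load(one_minute, cpus)}")
```

By this measure the single-CPU test server at a load of 10 to 15 was far past the warning level, which is why it was surprising that it kept serving acceptable responses.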
So this thing was able to keep on responding at 10. The other thing, and I don't know if anybody's noticed this on this screen: this is an Amazon thing. Do you see a problem with the CPU usage here? Something that you would not be happy about if you were using this server? Like 50% of our CPU time is stolen. That means it's going to the virtual environment rather than to our machine.

Okay. Does anybody else call this a humpback whale graph? No? It kind of looks like it though, right? This is looking much better.

PHP still: we haven't been able to get around it. This is nginx, not Apache, but the PHP processor is still taking a lot of time. It does have caching enabled. We did check, and it did have a pretty good hit rate. It was still consuming, and that was our limiter on this one-CPU server. Views is still using a lot of it, with menu items not too far behind. On the database side, the commerce order is now starting to come in. That's something that we're seeing a lot. So for the developers in here, what would you start doing now? You would either ask the sysadmin guys to throw more machines at this problem, or you would say: let's figure out how we can optimize this query. Why is this query consuming so much of the database resources?

Okay, so now we have a silver bullet: the reverse proxy. Anyone want to wager on how much improvement we had with a reverse proxy? One order of magnitude? 50%? Shockingly, none.

I really thought about this. Why am I not seeing a huge impact? I think it's the way we set up our tests.
Let's see if I can show this. I really expected a lot here as well. I mean, everybody says the first thing you've got to do is put a reverse proxy in front of it, and all of your problems are going to be solved forever and ever, right? Well, they're not in this case. And why are they not solved in this case? Why are we still having failures at 364 users, which is not much more than what we had before?

So I looked at this graph and I thought: okay, what's happening here? I'm getting some responses that are taking a really long time. What are they doing here? What I did next is tricky. I'll show you how this looks on a graph. Again, I'm not trying to sell LoadStorm, but this is a pretty cool tool. Remember that my script number four was the script that did most of the heavy insertion work? So what I did is I selected it, and now it only gives me statistics on script number four.
That's the one that has a logged-in user and starts inserting stuff into the database. In this random test that I selected, you can see that it immediately starts jumping up in errors and in average response time compared to all of the scripts together, which by the way are not weighted evenly, because one of them runs 70% of the time. Looking at the fourth script, which runs 10% of the time, it seems to me that as this next batch of users starts loading up (the blue line is the number of users; oops, where's my cursor?), they start writing orders into the database, and the database must get choked up here.
We've defined our point of failure in terms of number of errors and average response time, so this particular script must be pushing up the overall average response time. That's why we have that point of failure. Remember at the beginning, when I said it's really important to think about what kind of test scripts you're writing against your application? If we only had anonymous users,
I'm sure that would have made a huge improvement.
Oh, by the way, another fun fact. This is New Relic, looking at web transaction response time. Something was happening here. It's actually not relevant to us anymore, but we had a pretty big failure here, and it looks like we had a flood insert. I didn't look into what happened, but something triggered this massive spike in clock time, and it threw off our graph: because everything is relative, we really couldn't see what was going on elsewhere. It was irrelevant, because if you go back to this test, it was past the point of failure. We don't really care about that; we care about that line. But you can see that something really bad was happening here. A little fun fact. I didn't solve it, I'm just showing you.
All right. So, Views cache. Why didn't we turn it on earlier, right? Duh. We have to turn it on separately because it's a separate cache. It had some impact, not huge, but it helped. You guys are used to this graph by now. I have two more minutes. This is a comparison of when we turned on Views cache versus before. You can still see it gives us a lot more runway. This doesn't even look like a whale anymore. Views is still consuming a lot of our time, and we had this spike, with failures starting around here, although database throughput is pretty good. We don't really start seeing anything problematic until here.
Okay, so this was the biggest surprise to me. We had limited time to present and limited time to decide what to try, and we thought, why don't we test Percona? Percona MySQL. Percona is a company that provides MySQL performance consulting. Anybody here using Percona? Anybody using anything but MySQL? MariaDB, okay. MariaDB as well.
Okay, so does MariaDB have multi-threaded capabilities? Okay, it does. Perfect. So MariaDB will probably have the same result. The MySQL enterprise version does; the free version does not. So look at that. That was pretty surprising. We had a huge improvement in performance by installing Percona MySQL. This graph looks great, right? We only have some small problems. Again, not a whale. It was actually running so well that I had to rerun it with 1,000 users, where before we were maxing out at 500, to get some failure points. We're still not out of the woods, right? We're still running at 100% CPU usage, and the load still goes up really high, to 35, but it kept on responding and kept on performing. Views and the menu items are still up there.
This is one improvement I would love from New Relic, if anybody here is from New Relic. When I'm looking at the application level, and it knows I'm on Drupal, I'm looking at the most time-consuming modules, and I know that's Views. I wish I could just click through on Views and have it tell me which view. Or, for whatever module I clicked on, it would take me to the database view showing only that module's select, insert, or other CRUD statements.
So I can clearly see commerce order. I cannot get around this commerce order query. So I have two options, if we continue to work on it as a team and we don't want to scale it with hardware, right?
Option number one: basically eliminate that commerce order view from my pages. Otherwise, my option as a developer would be to work on this module and figure out how to make it run better.
So, in summary: we implemented Drupal cache, which obviously had the biggest performance improvement. Memcache had the second largest. Syslog, sadly, did not improve performance much in our test; I wish it had. Turning off Views UI really, really did. I want to thank Greg Palm on our team for pushing us on that up front; his number one piece of advice is always to turn off Views UI. Reverse proxy in this case did not have a big impact, and neither did Views cache, and I believe that's because of how we were testing. And the Percona database, which again was kind of a last-minute surprise finding, really, really did, and we're going to move forward with using a lot more of it.
So I was scheduled to finish at 5:45. It is now 5:47. Here are some resources. By the way, I put all of my stuff in a spreadsheet, which this is the link to. I kept track of all of our tests: when they were run, what scripts they ran, the results, and the breakpoints where they broke, and I've graphed all of it here. So all of this is available. I will upload my presentation to SlideShare, and I'll be more than happy to take any questions you have right now. Yes, could you come up? I was told to ask people to come up to the mic so that we can record the questions. Or head for the doors.
Yes, I was just wondering if you could elaborate on why you think Views UI has that impact.
My understanding was that if you were an authenticated user, but you didn't have permissions to edit views or access the UI, it wouldn't really impact you.
Yeah, unfortunately, I can try to speculate, but I'm not enough of a Drupal developer to do it justice. I'm sure we'll have some blog posts written up after this, and I'll ask some of our devs to contribute. I'm going to guess that it's probably because it's calling some hooks or some code while other views are being rendered. Contextual links, maybe.
I just had one other one, quickly, about LoadStorm. We didn't really get to see the experience of building a test in LoadStorm, and I was curious what's involved. Is it easy? My experience with JMeter and tools like that is that it's a little tricky if you're new to it.
Yeah, okay. I really struggled with whether to keep that in or take it out. I just didn't have enough time to cover everything, but let me try to show you really quickly. It's a really powerful interface. Building tests takes place by going through your browser, clicking around on what you want to do, and then saving that as a HAR file. If I went to my developer tools, turned them on, and kept the log, it basically keeps track of all the objects that are downloaded.
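A HAR file is plain JSON, so it is easy to inspect a recording before uploading it. The sketch below assumes the standard HAR 1.2 layout (`log.entries[]`, each with `request`, `response`, and `time` in milliseconds); the function name `summarize_har` and the example URLs are placeholders, not anything from the talk's actual recording.

```python
import json


def summarize_har(har):
    """Return (method, url, status, time_ms) for every entry in a HAR dict."""
    rows = []
    for entry in har["log"]["entries"]:
        request = entry["request"]
        response = entry["response"]
        rows.append(
            (request["method"], request["url"], response["status"], entry["time"])
        )
    return rows


# A tiny hand-written HAR document; the URLs are placeholders.
har_text = """
{"log": {"version": "1.2", "entries": [
  {"time": 120.5,
   "request": {"method": "GET", "url": "http://example.com/"},
   "response": {"status": 200}},
  {"time": 340.0,
   "request": {"method": "POST", "url": "http://example.com/user/register"},
   "response": {"status": 302}}
]}}
"""
rows = summarize_har(json.loads(har_text))
for method, url, status, time_ms in rows:
    print(method, url, status, time_ms)
```

A real recording exported from browser developer tools would be loaded with `json.load` from the saved file instead of the inline string.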
I take that file, I upload it here, and I create a script from it. Then I have really a lot of options with this script. One of the things I didn't talk about is that we had to build users, because we had users logging in, so I had to build something like a thousand users. So I built a create-account script, saved it, and uploaded it here. Then you parameterize it. This is the HAR file, and you can just look at the HTML. You can modify the URLs, and you can modify the forms, so you can use your own data as input into your forms or have LoadStorm generate it for you. You can replace the query string or the server URL. It's really pretty powerful. You can also add transactions: if you're interested not in error rate or average page load, but in how long it takes your user to buy something or create an account, and you have an acceptable limit for that, you can create a setting there and basically fail the test on it.
So building scripts is fairly easy. It sounds like a pain to click around in your browser and pull this file, but it's not. Then, to run it, you select your scripts, add them here, and add the weights. Then you go into parameters and say how long you want your test to be, how long it has to be at peak, how many users you start with, and what that peak is. Okay.
Yeah, I wanted to offer a theory on why the reverse proxy cache and Views cache didn't offer you a huge advantage.
Go for it.
I'm thinking that possibly, since you started out with Memcache before those, Memcache is essentially caching the page, so you're already sort of using Memcache as a reverse proxy cache, and you're not going to get as much advantage over the caching that happens before that.
That's a good point. So Memcache does, you're right. I'm not using the database to cache my menus, pages, blocks, views.
So you have a page cache with that view already cached into it. You're probably not going to get as big an advantage. You might get more of an advantage for the logged-in users. So there's still that incremental increase, but not the huge jump that you would expect. I think if you had done them in a different order, or turned some off, you would see different results.
Yes, this is true. I could have had more drama. But that's something to think about too. Thank you for that. Seriously, that makes a lot of sense.
You mentioned several times that PHP was a limiting factor. Maybe I missed it, but you didn't mention what version of PHP you were using. Would going to PHP 5.5 with the new opcode cache help your performance?
Okay, sorry, which version am I using? 5.4.4.
So you might see some performance improvements by switching to 5.5; the new opcode caching is just a lot more performant. But did you also look into running XHProf and seeing whether there were extra database calls being made, or extra optimizations that could be made at the code level?
Okay, absolutely. XHProf is a tool that I highly recommend. It basically breaks down your PHP execution and looks at what's really happening, and it allows you as a developer to home in on where the problems are. I completely agree. I tried to keep this talk pretty simple.
That's just kind of my approach to it. We used very easily accessible tools to take a look at it without really having to dive too much into the code. That's kind of my audience.
It was mostly in reference to your assertion that PHP was your limiting factor. Maybe there are some improvements that could lower the resource hog there.
Yes, absolutely.
And then I'll ask one more question, sorry to everyone else. You said that these tests take quite some time to run. I'm wondering if you see a role for running these load tests as part of continuous integration.
You know, that's a great question too. I had a conversation with somebody who was running a sub-CNN site. They had a lot of CI going on. We do a lot of CI in our shop, but we don't do load testing in it. What he told me was pretty interesting. He doesn't do performance testing as such, but just by running their tests against their dev server, if he sees a big difference, whether at the web transaction level or at the server level, from the previous CI run or maybe last week's CI run, then he knows he's going to have a performance issue on the server. So I think it would be wise to do so periodically, to make sure you can still guarantee your SLAs to whichever customers you're serving, especially if you have variable traffic, right?
I think looking at New Relic during your CI is kind of like that. It's not really load testing, but it's a quick way of seeing whether you have a performance problem, because what you want to do is look at your performance over time. You want to look at what happens when something changes. That's why graphs are great: they tell you a story in time.
Thank you.
My pleasure. Thank you.
Yes, I would like to share something related to the last question. There is a load testing tool called Gatling that you can use with Jenkins. Jenkins has a plug-in for Gatling, so based on the values that the load test from Gatling gives you, you can fail the build.
What is it called?
Gatling.
Gatling. It's an open-source load testing tool. That looks like it. But you see, I could also get a Gatling gun for sale.
Second question: do you know of a cache warmer module, either for Drupal cache or for Varnish? We use Varnish, and sometimes we have to prime the cache.
What do you mean, like priming the cache?
It's a Drupal module that warms the cache.
Yeah, what is it called?
I don't know.
So actually this is a really good point. At one point I demonstrated that we had a really big failure. Thank you for bringing this up, and I hope this helps. Really bad things can happen when you're running a stress test and you make an insert and all of your caches get refreshed. I didn't do it this time, but generally we do have a script that runs at a very low percentage and updates some piece of content during stress testing, so that it forces a cache flush. All the things that we did here, we assumed that nobody's editing any content, right?
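The idea of mixing in a low-percentage content-editing script can be sketched as a weighted assignment of virtual users to scripts. The script names and weights below are hypothetical (the talk only states that one script ran 70% of the time and the fourth ran 10%), and `assign_scripts` is an illustrative helper, not part of any load testing tool.

```python
import random
from collections import Counter

# Hypothetical script names and weights (percent of virtual users).
SCRIPT_WEIGHTS = {
    "anonymous_browse": 68,
    "search": 10,
    "add_to_cart": 10,
    "logged_in_checkout": 10,
    "edit_content": 2,  # low-weight script whose edits force cache flushes
}


def assign_scripts(user_count, weights, seed=0):
    """Assign each virtual user a script, in proportion to the weights."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=user_count)
    return Counter(picks)


counts = assign_scripts(1000, SCRIPT_WEIGHTS)
for name in SCRIPT_WEIGHTS:
    print(name, counts[name])
```

Even a small share of editing users is enough to make a test exercise cold caches, which a read-only mix never does.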
So what happens is that a content edit flushes your caches and basically throws things away. What I showed is like a perfect scenario, where nobody's touching content. Obviously, if that's a problem, you have to ask yourself how fresh your content has to be. I don't remember offhand, but I remember speaking with Jeff Eaton about microcaching: basically they only refresh the piece of content that they know is always changing and don't flush the rest of the cache. But I don't know the module. I'm sorry. I think it exists.
Cache warmer. The module is called Cache Warmer.
Yes, I work with Memcache. I love Memcache, but I had a problem when we had to use more than one Memcache server, because I think Memcache is not designed as a cluster. We have, for example, two different Memcache servers, but we are not able to make them work as a cluster. I know in Drupal you can define an array with all the Memcache servers, but when the front web server hits one Memcache server it finds the key, and when it hits the other Memcache server the same key is not there. So you have to reload the page to fill the Memcache.
Well, if I recall correctly, you don't have redundant Memcache servers. You define the bins and you split them up between different servers.
We have three front web servers.
Sorry, I apologize. We can try to talk about this, but I'm probably not going to be able to answer your question. That's too deep for me. I haven't set up Memcache on multiple servers myself.
Okay.
Thank you so much.
All right, guys. Thank you so much. I really appreciate everybody sticking around until six o'clock. Two minutes after six. Sorry for going over. Thank you so much. Have a great rest of DrupalCon.