into my keynote. So my name is Stella Cotton. I also maybe destroyed my voice a little bit last night at Ruby Karaoke, so I'm probably going to be drinking a little more water than usual. And I decided to completely rip off a gimmick that my friend Lily did in her talk on Wednesday: any time I take a sip of water, I would like you to cheer and go crazy. So I'm going to start it off, and we're going to practice this. Yes! You know what you're doing. Okay, cool.

We're going to start off a little interactive, which is going to be funny because I'm not going to be able to see your hands. Here's the scenario: the phone rings, it's the middle of the night, the site is down. Every single person on your engineering team is out of cell range, or they're at RailsConf. It's just you. Raise your hand if you feel like you know exactly what to do or where to start. Okay, a couple of veterans in this audience. Now close your eyes, please — I can't prove that you're doing this, but nobody else can judge you. Try again: raise your hand if you feel like you know what to do and where to start. Okay, everybody's very honest in this audience. My hope is that by the end of this talk, the people who raised their hands will get some ideas for sharing what they know with teammates who might not be as comfortable with site availability, and that the rest of you will find ways to get comfortable yourselves.

One of the big challenges with site availability is that it often just catches us off guard. Every day we might practice refactoring and testing, but a site outage can happen for a lot of really random reasons, and that randomness is a little scary. So I'm going to start off by telling my scary story.

It's July 2015. I'm working as a Ruby developer at a company called Indiegogo. It's a crowdfunding website, a site where people come to fund things that matter to them. We had a lot of successful campaigns: an Australian beekeeping campaign that raised about $12 million, a campaign to fund the movie Super Troopers 2. And in July of 2015, the news breaks that Greece is the first developed country in the entire world to fail to make an IMF loan repayment. And through a strange course of events, this action manages to take down our entire website. It's the middle of the night in California. In Europe, people are waking up, they hear this news, and they also hear this incredible news story about a really great British guy with a wild scheme to end the Greek financial crisis. He decides he wants to help the Greek people out, so he launches a 1.6 billion euro campaign to bail out the country of Greece. His rationale is: everybody in Europe throws in three euro, they're fine, they meet their goal, they bail out Greece. And so traffic just starts building. People are contributing really small amounts of money at super high rates, and eventually the Indiegogo website goes completely down. It doesn't fully recover until we put a static page up front to handle the load. And for me, this is just so unlike my day-to-day coding, deploying, even triaging and investigating 500s. Honestly, I was pretty unprepared, and I was pretty afraid. And I wondered afterwards: how could I have been more prepared for this?
Load testing is a way that you can programmatically simulate many users making simultaneous requests to your website. It acts as a low-stress way to simulate really high-stress situations. You can play around, build your confidence, and come up with your own site availability playbook before disaster occurs. As an added benefit, you can also identify bottlenecks in your application that could be dangerous in the future, and you can measure the performance benefits of changes that you make along the way, which is really important. The downside of load testing is that when I started — because I don't come from a DevOps background, I'm just a regular Rubyist — I found a lot of high-level instruction that gave commands to just kick off a load test, and then a lot of really technical instruction about site performance, but not a lot of instruction on how to bridge those two things. So it was a lot of trial and error and frustrated Googling. So I'd like to share a couple of things with you in this talk: how to get started with load testing, how to turn up the volume on your load testing to really add some load to your site, and then a couple of tools to explore the results that you get. A-plus.

So how do we get started? We're going to kick things off by preparing our load testing tool. We're going to talk about Apache Bench, because it's preinstalled on many Linux boxes and it's a really simple tool to get started with. This is the command that starts with ab, for Apache Bench, and it's really all you need to kick off your first load test. We'll break it down a little further. You want to choose an endpoint to send the simulated traffic to, and to start, a good idea is a simple static page that doesn't make any database calls — it's just a way to get a baseline. Once you're confident that your load testing tool is actually configured correctly, you want to start choosing the pages that will bear the brunt of traffic. For Indiegogo, for example, that's our actual campaign pages — the homepage is not where the traffic is going to go. For your site it could be the homepage, or it could be something else.

You can start by testing localhost if you're just playing around. But the load test itself consumes resources on your computer, and because it's using those resources, it takes away available resources from your web server, which will really impact your results, especially as the load starts to increase. On the flip side, running a load test against your production website can impact your users' experience or even bring down your website. So it's best to point at a staging server, or a production-like server that doesn't host any external traffic, unless you're specifically looking to do stress testing on your production system. If you're just trying to investigate, don't point it at your production site. And because at least one person — Lily — was thinking it: you can read this one yourselves. So in that same Apache Bench command that we saw earlier, you'll want to configure the traffic that you're going to use to simulate your tests. To finish up this basic command, you need to provide two things.
One is the number of requests that you want to execute concurrently, which is the -c flag, and the other is the total number of requests over the life of the load test, which is the -n flag. Here we're starting with a concurrency of one and enough requests that the system gets time to warm up, because that's important. This basically means you'll execute one concurrent request, 1,000 total times. You just want to make sure you're able to run the load test for a few minutes. And to define our terms a little bit: when I talk about requests, what is a single request in this scenario? It typically doesn't mean a single visitor to your webpage. Depending on the number of assets your page is loading, or the asynchronous client requests that your frontend application makes to your server, a single unique visitor could actually make a lot of requests in one visit. And on the other side, browser caching of assets means that a returning visitor might make even fewer requests than a new visitor. Another thing to keep in mind is that Apache Bench and other server-side benchmarking tools won't actually render HTML or execute your JavaScript, so the latency you're seeing here is only part of your user experience — it's the very baseline, and there will be more delay for your users on top of it.

So let's look at an example of Apache Bench output. Here's a snapshot of the full results, and we'll zoom in a little bit. Zooming in, we can see that Apache Bench shows us the percentage of requests served within a certain amount of time. When you analyze the initial results, you want to validate that the latency you're seeing matches the latency you would expect from a request in real life. Load testing is kind of a black box: if you just start plugging in random numbers and you don't really understand your system, you can get really amazing results and think your site is amazing when it's actually not real. So you want to have a hypothesis for how you would expect the system to perform. If you have any numbers on how your production server performs, they can give you a ballpark for your expected request time. For example, if you look at the line for the 99th percentile, it's saying that 99 percent of the 1,000 requests that we made were served in less than 693 milliseconds. If you have a graph of your production response times and its 99th percentile is showing something like 650, you're probably on track. But if you're seeing something like 100 milliseconds, you should be investigating an issue with your load testing setup. A really common issue that causes you to see really good results in load testing and shit results in production is that you're testing an error page, especially if you're using a staging server for your load testing. For example, if you're using basic auth, you're going to need to add that to your Apache Bench command with the -A flag, because otherwise you're just testing how well your server returns a 401. Another common issue is hitting a 500 page, or redirects — Apache Bench won't actually follow redirects, it'll just log them for you as non-2xx responses.
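To make that concrete, here's roughly what a first command might look like — a minimal sketch, where the host, path, and credentials are placeholders for whatever your own staging setup actually uses:

  # 1,000 total requests, one at a time, against a simple static page on staging
  ab -n 1000 -c 1 http://staging.example.com/

  # if staging sits behind basic auth, pass credentials with -A
  # (otherwise you're just benchmarking your 401 page)
  ab -n 1000 -c 1 -A username:password http://staging.example.com/

The exact numbers don't matter much at this stage; the point is a low-concurrency baseline you can sanity-check before turning anything up.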
The easiest way to tell that you're load testing error pages is to look at your Apache Bench output. It will show you the number of non-2xx responses, and if that number is non-zero even without significant load on your site, you're probably running into one of these issues. And if you tail your server logs while you're running a load test, you should see the server logging the issue. I love the enthusiasm. There's also a slightly weird thing where you need to differentiate non-2xx responses from failed requests. Apache Bench remembers the content length of your very first request, and if it changes in subsequent requests — say you have a dynamic session value, or any other reason your content length might change dynamically — it registers these in a mostly unhelpful failed requests section, and you'll see it in your output. Just make sure your logs show that you're rendering the correct pages, and ignore it. You can also add a -l flag in later versions of Apache Bench, and it will accept variable document lengths.

So we feel pretty good: this low-key, low-concurrency load test isn't running into any errors, so we'll start turning up the volume. And as we turn up the volume on our load tests, we can see how our app starts to be impacted by load. Let's talk first about how queuing might affect the user experience. As we increase load, we'll start to see the average response time of this page increase as well. There's something called Little's Law, which is a fundamental law of queuing theory, and it tells us L = λW: the average number of customers in a stable system is equal to the average effective arrival rate times the average time a customer spends in the system. That sounds kind of ridiculous, but if you think about it in terms of a real-world example, it's actually super intuitive. Think about a store where there's one cashier checking people out. The total number of customers waiting in that line over a specific period of time is going to be the rate at which they come in times the total time they spend in line. Now say a new cashier comes on duty, they're new to the grocery store game, and they're twice as slow at checking people out as your prior cashier. If people keep getting in line at the same rate, the law basically says your line is going to get longer and it's going to take longer for people to get through it. So it's pretty intuitive.

And you can rearrange that equation to help you understand why you see increasing response times as you add load: the mean response time is the mean number of requests in the system divided by the throughput. To put rough numbers on it, if there are 20 requests sitting in the system and you can only process 10 per second, the average request spends about two seconds in the system, no matter how fast a single request is on an idle box. So even if your server takes 500 milliseconds to process a request and that stays the same — which it might not under load; it will probably go up — the total response time is going to increase as you add more requests into your system. A simple web application acts like a giant queue that requests are processed in. So in this example we'll talk about a web stack that consists of a few things: a proxy server, an application server, and a database. The proxy server sits behind your firewall and communicates back and forth between your client — aka your users — and your web server.
So a common example is going to be HAProxy or nginx. And then next, there's the application server, which deals with the requests that need processing and makes calls to your database. In our scenario, we're going to talk about a single-threaded server like Unicorn. Unicorn has a master process — this is actually something that Aaron brought up in his keynote — with a configurable number of child processes, or workers, that do all of your work. So even though you have a single server on a single machine, it handles multiple requests at once; it's like having multiple cashiers at your grocery store. There are other web servers, like Puma, that use multiple threads instead of multiple processes, but it's a similar idea. And in this simple stack we only have one database, so all the cashiers at your grocery store are making requests to the same repository. They can all live on the same physical machine and share the same resources, or they can live on different machines — it's just an example, and your website's stack will probably look quite different.

And so as we add more and more requests to the system, Little's Law shows that the average response time is going to increase. Eventually, if you keep adding requests but you don't add any more cashiers to process them, the wait grows too long and your users see a timeout. Your proxy server will allow a client request to wait a preconfigured length of time, and eventually it's just going to say, I'm sorry, I can't help you. As you increase the load on your system, there might be a moment where you think, maybe I should just increase my application server's queue so it accepts more requests. But the danger here is that under extreme load, a request can remain queued up at your application server level even though your proxy server has long since returned that timeout. That means your application server keeps churning on those requests, but no one is around to see the rendered page. It also goes against a recommendation of queuing theory, which is that a single queue for multiple workers is more efficient when job times are inconsistent. Think about it in practice: you have two available web workers, one is processing a huge file and one is processing a tiny file. If another request comes along and you arbitrarily queue the short request behind the worker that's handling the huge file, it's unnecessarily blocked by the long request when the other worker could have executed it sooner. There's also another maybe strange instinct you might feel, which is to increase the timeout threshold on your proxy server — push it higher and higher so you decrease the error rate. But a user who doesn't see a web page load after a minute or two is going to have an equal or probably worse reaction than just seeing the timeout from your proxy server.
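For reference, that preconfigured wait lives in your proxy server's configuration. Here's a minimal sketch of what it might look like for nginx, assuming an upstream called app_server — the values are purely illustrative, not recommendations:

  # nginx.conf (excerpt): how long nginx waits on your application server
  location / {
      proxy_pass http://app_server;
      proxy_connect_timeout 5s;   # time allowed to open a connection to the app
      proxy_read_timeout    30s;  # time allowed between reads of the response
      proxy_send_timeout    30s;  # time allowed between writes of the request
  }

Cranking those values up doesn't add capacity; it just changes how long your users stare at a blank page before they see the error.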
But it's also not just your application that gets affected by load — you'll start to see effects at the operating system level, on the machine that's actually hosting your application. Some of these configurations might already be in place on your production machines, but especially when you're bringing up a new staging server or a local machine, you'll probably find that you need to tweak them when you get started with load testing. Your proxy server has to keep track of each incoming client request, and it basically does this by keeping track of the IP address and port number associated with each one. And because each of these connections takes up a Linux file handle, you'll start seeing errors like "too many open files." So make sure your operating system isn't arbitrarily capping the number of file handles, or file descriptors, that your proxy server can access — 1024 is a common default. And since your proxy server will use one handle for the incoming request and one for the outgoing connection, you can run up against the limit pretty quickly. You can see what these limits are on your machine, for the user that hosts your proxy server, with the ulimit command if you're using Linux. There's a rule of thumb for calculating this number given by the 2.6 kernel documentation: each file handle takes roughly one kilobyte of memory, and you shouldn't allocate more than about 10% of your available memory to file handles by default, so you get about 100 file descriptors per megabyte of RAM. You can start there and see if your issues go away.

And you actually want to check two levels. One is the system level, which you can edit in the sysctl configuration — that's the global default for the whole operating system. But you'll also want to adjust the user limit, since that's what your proxy server is actually going to come up against. You'll want to set the soft limit and the hard limit, and those need to be less than the max limit that you set for the whole system. You save and close your file, and you reload your changes with the sysctl command. And then finally, if your proxy server has its own file limit configuration, you'll want to adjust that as well — this was an example for nginx, but it'll be specific to your proxy server.

Another issue you might run into is TCP/IP port exhaustion. There's a finite number of ports available on your machine, and only a subset of those ports — called ephemeral ports — are available to your application to handle web requests. Once a request is complete, the port is released back to the system so it can be used for the next request. You can tweak two settings to increase the number of ports that are available. One, you can decrease the TIME_WAIT period, so a port is recycled back into the system more quickly — even after a port is no longer in use, the system holds it back for a period of time to prevent stray packets from leaking across connections. And you can also configure your operating system to increase the available port range altogether; that's going to be different on each operating system. And another thing: when you're running your load tests, make sure there are a few minutes between each test, because these ports will still be in use and you want to let them recycle back. Bless you.
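Here's a rough sketch of what checking and raising those limits can look like on a Linux box — the numbers are placeholders, and the exact file paths and user names will depend on your distribution and setup:

  # check the current user's open-file limit
  ulimit -n

  # system-wide ceiling and port range: add to /etc/sysctl.conf, then reload
  #   fs.file-max = 100000
  #   net.ipv4.ip_local_port_range = 1024 65000
  sudo sysctl -p

  # per-user soft/hard limits (keep below fs.file-max): /etc/security/limits.conf
  #   nginx  soft  nofile  20000
  #   nginx  hard  nofile  40000

  # nginx's own file limit, in nginx.conf
  #   worker_rlimit_nofile 20000;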
The Unicorn documentation has some really good suggestions for operating system tuning for these settings. But at the end of the day, your application is a very special snowflake — you have to think about how it behaves in the wild and how that affects performance in ways that aren't being accounted for in your sterile testing environment. One thing to consider is the relationship between user actions, cache busting, and database queries. Say you're testing an endpoint that returns user comments from the database. An important consideration is how many rows come back and how complex the query you're making against that database is. If there are no comments seeded on your test machine, or very few, but the expected user behavior under high load is that people are commenting left and right — it's the Justin Bieber website and comments are coming in constantly — that's a different real-world feeling than your load testing environment. And even if you decide to seed a million comments in your load testing environment, you're still going to see that query get cached after your first request. So if the real-life scenario is that your users are posting something like 1,000 comments a minute and continuously busting the cache, you can run scripts alongside your load tests that simulate the true effect of that comment creation.

Another scenario to consider is blocking external requests. If you're experiencing heavy load and all of your workers are overwhelmed, any worker making a slow, blocking HTTP request — to a payment processor, for example — adds to the overall latency experienced by everyone waiting behind that request. And so at this point, you should be really comfortable with the life cycle of a web request in your stack, what logs to look at, and where to keep an eye out for errors. And you should be confident that when you're running a load test, you're actually load testing your infrastructure and not just coming up against the limits of your testing tool. Once you're there, you can use additional tools to understand what the actual bottlenecks are. One place to start is to investigate the limits of the actual machines hosting your web server. As you increase load, you can use top — a really common tool — while your load tests are running, to view what percentage of CPU and memory is being consumed overall and who the specific culprits might be. And one thing to keep in mind is that the percentage displayed is a percentage of a single CPU, so on multicore systems the percentage can be greater than 100%. There's also something really nice called htop, which I really like. It's probably not preinstalled on your system, but it has a really great visual representation of CPU consumption across cores, and it just makes you look like a total badass comparatively. And when you think about these resources — they're hosting your proxy server, your web server, maybe your database — it's all a zero-sum game. So if you recall from earlier, with a server like Unicorn there's a single master process that runs, and you can configure an arbitrary number of child processes to handle your web requests — roughly sketched below.
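As a minimal sketch — assuming a Unicorn setup with a config/unicorn.rb, and with the values purely illustrative — that configuration might look something like this:

  # config/unicorn.rb (illustrative values, not recommendations)
  worker_processes 4    # number of child workers forked from the master
  timeout 30            # kill a worker that's stuck on a single request for 30s
  preload_app true      # load the app in the master once, before forking workers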
And this is awesome, because more workers mean that you can process more requests simultaneously — but they're also consuming the resources of your physical host machine. So you want to make sure you don't have so many web workers configured that your system runs out of physical memory and starts hitting swap, which lives on the hard drive and is a lot slower to access than physical memory; you'll see that slowdown. If you look at the average memory consumption of each of those workers on your machine — and you don't have a memory leak, so you see pretty consistent memory behavior — you can calculate how many workers you can reasonably run on your box. And if you're running other applications on the same box, those also constrain the resources your Ruby application can consume. For example, if your database is on the same machine, you might run out of CPU as you increase your web workers long before you're able to put real pressure on the database itself. And in real life, when site availability is compromised, you might find it's actually pretty easy to spin up more workers to handle traffic — but if they're all trying to access the same database and they're all waiting on it, it can cause huge issues. So if you want to investigate that scenario and see how your database behaves under pressure, you might want to point at an external database, which is pretty easy in Rails: if you have a database server handy, you just configure your database.yml to point to an external address rather than localhost. You'll also need to configure your firewall to accept external connections, but that'll be pretty specific to your setup. But as a PSA, please don't use your production database in this scenario, because you can bring down your website. Please don't do it. I'm not responsible.

Another thing a lot of people are probably already familiar with is using application performance monitoring tools to investigate performance issues. One of the most useful features is transaction tracing, which collects data on your slowest HTTP requests. New Relic and Skylight are third-party tools that typically come to mind, but ideally you should set up the ones you use in production on your load testing server, so you can see the issues that would actually show up in real life. You can also use the rack-mini-profiler gem in production or in your load tests — just remember, if you're running it on staging, switch the environment to run the application in production mode so that it'll actually profile the requests. And there are a couple of tools you can use to see how load is impacting your database. It's disabled by default, but you can turn on the slow query log in MySQL to see a log of SQL statements that take more than a certain time to execute; you can configure that threshold, which defaults to 10 seconds and can be set as low as zero. And there's also the situation where you're calling a query way too often, and it may not be expensive enough to show up in the slow query log. But if you look at SHOW PROCESSLIST, you'll be able to see all the queries that are currently running.
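A minimal sketch of flipping those switches from a shell on the database host — assuming you have the necessary MySQL privileges, with a one-second threshold picked purely for illustration:

  # turn on the slow query log and log anything slower than 1 second
  mysql -e "SET GLOBAL slow_query_log = 'ON';"
  mysql -e "SET GLOBAL long_query_time = 1;"

  # see every query the server is executing right now
  mysql -e "SHOW FULL PROCESSLIST;"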
And if one query shows up suspiciously frequently, it could be a bottleneck or a performance regression you didn't realize you had — especially if it's one of those special snowflake queries we talked about earlier, where the cache is frequently busted under load. And Apache Bench is not always going to be the best tool on the market. It's very simple, and it's available to you probably right now, but there are other tools. Siege is another tool that lets you use a configuration file to hit multiple endpoints at once, which is really convenient, and the wrapper Bombard lets you programmatically ramp up the load on your application, which is really, really nice. There's also Bees with Machine Guns, which is both awesome and has the most awesome name. It's an open source tool brought to life by the news applications team at the Chicago Tribune, and it easily lets you spin up a lot of micro EC2 instances to load test your web applications — useful if you find that running from one box isn't enough load to really make an impact on your site, probably because your site is a lot faster than mine. And then there's also Flood.io, which is a paid service, but they maintain ruby-jmeter, a Ruby DSL for writing test plans for JMeter, which is a more heavy-duty load testing tool.

And so your app might look pretty different from the app I've talked about today. It might be hosted on Heroku. You might be using Puma instead of Unicorn. You might be using some kind of complex, fault-tolerant distributed system, in which case you have a totally different set of problems, and I'm sorry you came to this talk. But the great thing about load testing is that it's a framework for curiosity. It gives you some tools to shine a light into the dark and scary places in your app, and it can really be a lot of fun. So thanks, everybody. I'll tweet out a link to my slides right after this if you're interested in taking a look. You can find me on Twitter, practiceCactus, and come up afterwards if you'd like to ask me any questions. Thanks.