We're doing some stuff at our booth, if you haven't come by yet. We're doing a cool thing: if you go to HerokuLove.com, you can vote for your favorite open source project, and we're going to donate $500 to that project. There are also a few Ruby trend questions, I think four of them in total. Here's the QR code, but I don't actually expect you to scan that with your phone. We're also doing another thing: Ruby Heroes isn't happening anymore, but I did enjoy the spirit of saying thanks to people in the community who helped you on your journey as a developer, so we have these cool postcards at our booth where you can write a thank-you and either give it to the person, if they're here at RailsConf, or post it up on the whiteboards at our booth, and we'll tweet them or figure out some way to make them public. Right after this talk there's a break, and a bunch of people from the Rails core and contributor teams will be at our booth doing office hours, so if you have questions or want to meet folks like Aaron or Eileen or Rafael and others, you can come do that and get those questions answered. I know a lot of people came by and tried to get shirts, and we ran out within the first 30 minutes, maybe even less, but we will have more shirts tomorrow, so if you stop by tomorrow we hopefully will have shirts for you. With that, I'll give this to Nate to take it away. Thank you.

Thank you. All right, so this is Heroku's sponsored talk. I don't know if this is on. I'm on here. I do not work for Heroku, I'm not a Heroku employee, but they were very nice to give me this slot. This talk is called "Your App Server Config Is Wrong." When I talk about application servers, I'm talking about things like Puma, Passenger, Unicorn, Thin, and WEBrick: application servers are the things that start and run our Ruby applications.

But first, a little bit about who the heck you're listening to right now. I'm a skier; I recently moved to Taos, New Mexico basically just for the skiing. I'm also a motorcycle rider, and I've ridden my motorcycle cross-country three times on dirt roads. This is my motorcycle taking a nap in the middle of nowhere in Nebraska. I was also on Shark Tank when I was 19, on the very first season; that's me on Shark Tank. One of my readers gave me this gift, and I enjoy it very much. I'm also a part-time meme lord. I make spicy programming memes. I like this one, and here's the other spicy meme I made. You probably know me, though, not from any of these things but through my blog, which looks like this. I write about Ruby performance topics, like making Rails applications run faster. I also have a consultancy I call Speed Shop, where I work on people's Ruby applications to make them faster, more performant, and make them use less memory and fewer resources. I've also written a book, a course, about making Rails applications faster. It's at railspeed.com, and it's called The Complete Guide to Rails Performance.

Incorrect app server configuration is probably the most common issue I see on client applications. It's really easy to kneecap yourself with an app server config that isn't optimized. It's easy to over-provision, and it's easy to have an app server config that makes you require more dynos and more resources than you actually need.
It's very easy to spend a lot of money on Heroku, which is great for them, but it's easy to scale your way out of your problems by just cranking that little dyno slider all the way to the right: now I don't have a performance problem anymore. If you're spending more dollars per month on Heroku than you have requests per minute, you're probably over-provisioned. You don't have to spend $5,000 a month on your 1,000 RPM app. Maybe if you have some really weird add-on, something unique to you, then you might have to, but that's a rule of thumb I've found, and I've been able to get client apps to at least that point, if not less. The other thing that can happen with a misconfigured app server is the opposite: you're over-using your resources, running too small a dyno for the settings you've set.

Let's talk about some definitions. A container: I use the words container and dyno interchangeably, because that's basically what a dyno is, right? It's a container on a big AWS instance or whatever they use, and you get some proportion of that larger server. This is a Heroku talk, so I'm going to use Heroku terminology and say dyno, but a lot of this stuff is not unique to Heroku; I'm just going to discuss it in Heroku terms.

A worker: in Puma, which I'm now a maintainer of along with Richard, we have workers. I don't know what Passenger or Unicorn call them, they might use a different word, but all three of the top modern Ruby application servers use a forking process model. What that means is that they start your application, initialize the Rails app, and then call fork, and that process creates copies of itself. Those copies are what we call the workers. How many of those processes to run per dyno is probably one of the main config settings.

A thread: okay, I guess we all kind of know what a thread is, but I want to draw the distinction here because it's very important in regular C Ruby. The difference between a process and a thread is that processes run independently, so two processes can work on two different requests at the same time, but two threads cannot process two requests concurrently. We can do things like start building a response in one thread, and then, while we're waiting for a database call to return, release the global VM lock, pick up another request in a different thread, do some work there, and then go back to the original thread. So we can do some limited concurrency in Ruby, though it's usually just I/O. In general: one thread, one request.
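To make that worker definition concrete, here's a toy preforking server in plain Ruby. This is just an illustration of the model, not Puma's actual code:

    require "socket"

    server = TCPServer.new(9292)   # the parent opens the socket once, like app boot

    3.times do
      fork do                      # each fork is a "worker": a full copy of the parent process
        loop do
          client = server.accept   # workers inherit and share the listening socket
          client.write "HTTP/1.1 200 OK\r\n\r\nhandled by worker #{Process.pid}"
          client.close
        end
      end
    end

    Process.waitall                # the parent sticks around to supervise its workers

Run it and hit localhost:9292 a few times and you'll see different PIDs answering, which is exactly the "two processes can serve two requests at once" property I just described.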
Okay, so here's the overall process; we're going to go through each of the five steps. First, we determine theoretically how many concurrent workers we need: how many requests do we need to complete concurrently? Second, we determine how much memory each worker, each process, is going to use. Third, we choose which container size, which dyno size, we want to use, and how many workers, how many processes, we're going to put in each dyno. Fourth, we check our connection limits, how many connections we make to the database, and make sure we're not going over them. And fifth, we deploy and monitor queue depths and queue times, CPU usage, memory usage, how often our processes restart, and how often we hit timeouts.

Step one is a little hobby horse of mine. Little's Law is a concept from queueing theory. It's used a lot in factory management: when they want to know how many packing machines they need on a floor, they use things like Little's Law. It's a very small equation, which is why it's very small on this slide. This is the fancy Greek-letter version: L = λW. You can Google Little's Law to get the process engineering version of it. The version we're going to use here just says that the number of things inside a system at any given time is, on average, equal to the rate at which they arrive multiplied by the time they spend in the system. Translating that into Ruby application server terms: the number of requests we're serving at any given time is, on average, the number of requests we get per second times our average response time. And dividing the average number of requests in the system by how many workers we actually have gives us an idea of how much we're utilizing the workers we have. If that was a little confusing, I'm going to work through an example in a second. It's important to know this is just an average. It kind of assumes that requests arrive at equal intervals, like one request arriving every 300 milliseconds. That's not the case; we know requests arrive in bunches, randomly distributed. So this is just a starting point and a guideline.

So let's walk through some numbers as an example. I found these numbers in an old Envato presentation from 2013. Envato runs ThemeForest, if you've ever used that; it's a big Rails app. They said they receive 115 requests per second, averaging a 147 millisecond response time, and they use 45 workers, 45 processes. I forget which application server they use, actually. What we do is multiply the number of requests per second, 115, by the average time it takes to complete a request. I have to keep my units the same here, so both numbers are in seconds, and that gives me 16.9. So on average, Envato is processing about 17 requests at any given point in time. They use 45 workers to do that, and 16.9 divided by 45 is about 38%. So they're using 38% of their workers at any given time.

What I tell people to do is run this calculation for themselves. You know how many requests you get per minute; that's right on the Heroku dashboard. And you know your average response time; that's also on the Heroku dashboard. Multiply them together, then multiply that again by a factor of five, so you're using 20% of your theoretical capacity. That gives you your initial estimate of how many processes you need, okay? Five is just a fudge factor, accounting for the fact that your requests don't come in uniformly, one after the other, 200 milliseconds apart or whatever your number is.
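Here's the Envato arithmetic from above as runnable Ruby, with the factor-of-five rule applied at the end:

    # Little's Law: L = lambda * W
    arrival_rate  = 115      # requests per second
    response_time = 0.147    # average response time in seconds (147 ms)
    workers       = 45

    in_system   = arrival_rate * response_time   # => 16.905 requests in flight on average
    utilization = in_system / workers            # => ~0.38, about 38% of capacity

    # The rule of thumb: pad by 5x so you sit near 20% utilization.
    initial_worker_estimate = (in_system * 5).ceil   # => 85 processes as a starting point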
If that was all very confusing, I find Heroku's dyno load number on the dashboard to be fairly accurate as a starting point. It's at the bottom of the dashboard. This is impossible to read, but the numbers here on the left go from zero to eight. The dark blue line is the average load over one minute, and the lighter line is the maximum load over the last minute. Just look at that max number; here it looks like my max load is five, so run five dynos, okay, fine. Of course, that only tells you that you need five dynos with whatever your config is at this particular moment; it's just a starting point. What you'll probably find with dyno load, and what most of my clients find, is that this number is a lot lower than the number of dynos they actually use, because their app servers are not configured correctly, so we'll get into how to fix that. So that's step one, estimating our worker count. We know how many processes we need; say we need 45 processes to serve our load.

So how do we divide that among containers? Do I want to use a 1x dyno or a 2x dyno, or, now that there are perf dynos, Perf-M or Perf-L? What's the right choice? I find most people mess up container sizes because they have an incorrect mental model of how Ruby uses memory. Most people think Ruby application memory graphs should look like a flat line. We've been duped. Duped! Bamboozled. We've been smeckledorfed. That's not even a word, and I agree with you. It's not true: they look like logarithms. A regular Ruby application's memory usage over time will look like this. There's a pretty steep startup period, when we're requiring code and building out caches, like Active Record's statement cache and a bunch of other things like that, creating long-lived objects, and then after a while it starts to level out. But it never goes flat, and I don't want you to think it ever will. This is probably partly why Heroku restarts your dynos every 24 hours: if they just let them run forever, this line would creep upward forever. It doesn't mean you have a memory leak. If memory usage isn't flat, that doesn't necessarily mean you have a leak; I'll talk a little more about that in a minute. You just need to be aware that the line will never completely level out. So you're going to have to use a little less memory than the max of your dyno. You're not going to be able to run right up against 100%; you have to give it some headroom.

A common mistake I see here is using things like Puma Worker Killer (sketched below) with a RAM threshold: kill my Rails process when it's using more than 300 megabytes. If you set that number too low, your memory graph, instead of looking like that long red line, looks like this purple stuff: it goes up to here, kills itself, goes back down, kills itself. People see that memory graph, that sawtooth pattern, and think: wow, I must have a memory leak. But really what's happening is they're not letting their processes live long enough to reach that stable point. People sometimes also use Puma Worker Killer as a faster restart: you can give it, say, a six-hour limit and have it restart your processes every six hours. That can produce this kind of memory graph as well. So what I'm telling you is: let your process run for 24 hours, and if you have to tune the number of processes per dyno down to do that, do it. Even just as a temporary thing, tune WEB_CONCURRENCY down to one and let that process run. You're going to have to run more dynos, but see what it looks like after 24 hours. If it does this, if it eventually starts to level out, that's the real number for how much memory you need per process.
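If you do have to run Puma Worker Killer, here's roughly the shape of the two modes I just described. This is a sketch assuming the puma_worker_killer gem's documented config; the point is to set the limits loosely enough that workers live for hours, not minutes:

    # config/initializers/puma_worker_killer.rb
    PumaWorkerKiller.config do |config|
      config.ram           = 1024   # total dyno RAM in MB; set too low and you get the purple sawtooth
      config.frequency     = 60     # seconds between memory checks
      config.percent_usage = 0.98   # only kill the largest worker near the real ceiling
    end
    PumaWorkerKiller.start

    # Or the scheduled-restart mode: at least six hours, per the advice above.
    # PumaWorkerKiller.enable_rolling_restart(6 * 3600)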
So: deploy with 1x or 2x dynos, one worker per dyno, five threads per worker, and look at the average memory usage after 24 hours. The average app will come out to somewhere between about 256 megabytes and 512. That's the number you'll get; 512 isn't great, but that's kind of what happens with big old mature Rails apps. They use a lot of memory, and that's what you get. There's really no magical way to reduce that number. I gave a talk at RubyConf this year about reducing memory usage, but there's no magical way to do it; it's a long, hard process.

Okay, so that's step two: we determined how much memory we use per process, per worker. So now, how do we decide what size container to put them in? They should feel nice and comfy in their dyno. You should be sitting at about 80% memory usage in that dyno. It should be just right: not hitting 100% and starting to swap, but sitting at four-fifths, two-thirds of the total memory capacity of your dyno.

These are the main dyno types you're going to use in production; I didn't include hobby and free for obvious reasons. The main difference most Rails applications are going to care about is the memory, right? You can read the numbers here; I'm not going to read them out to you. Heroku dynos are shared, kind of like a VPS, and although a 1x and a 2x dyno technically have the same count of CPUs, the 2x dyno gets twice the CPU time, and so on and so forth with Perf-M and Perf-L. So a Perf-M dyno should have, I guess, 2 or 3x the CPU capacity of a 1x dyno. Although, from what I understand from what Terence told me (so blame him if this is wrong), there's something kind of interesting here: 2x and 1x dynos have access to eight hardware threads, while the Perf-M dyno only has access to two. That's an interesting, weird difference between Perf-M and all the other dynos, although Perf-M does get a bigger share of that time than 2x. And the whole reason perf dynos exist is that you do not share CPU time with other people's Heroku apps. You should get more stable performance from a perf dyno, because you don't have someone else's badly tuned Rails application sitting alongside you on whatever server is actually backing it, crowding you out of CPU time.

Another interesting thing I noticed when comparing perf dynos to the 1x and 2x: the Perf-M dyno costs $250 a month, which makes it a little less cost-effective than the other dyno types. Perf-L dynos are just as cost-effective, in terms of dollars per compute unit and dollars per gigabyte of RAM, as the 1x and 2x dynos, but with Perf-M you take a bit of a hit. And I already talked about how 2x dynos have eight CPUs, which might mean they can support higher thread counts than a Perf-M dyno; we'll get to how to set thread counts in a second.

So if you need more than 25 app instances, if based on Little's Law you need more than 25 processes, I would recommend using Perf-L. The performance dynos do get more stable, consistent performance than 1x and 2x because they don't share the server with anybody else. Otherwise, try to use 2x. The reason you don't want to use 1x is that you should be aiming for at least three workers, three processes, per dyno. If you can't fit three workers inside a 2x dyno, you might have to use Perf-M.
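As a sketch, the formation advice above comes down to a few lines of config/puma.rb; WEB_CONCURRENCY and RAILS_MAX_THREADS are the standard Heroku and Rails env var names:

    # config/puma.rb
    # Aim for at least 3 workers per dyno; drop WEB_CONCURRENCY to 1 temporarily
    # when you're measuring 24-hour memory usage per process (step two).
    workers Integer(ENV.fetch("WEB_CONCURRENCY", 3))

    threads_count = Integer(ENV.fetch("RAILS_MAX_THREADS", 5))
    threads threads_count, threads_count

    preload_app!   # boot the app before forking so workers share memory via copy-on-write

    port ENV.fetch("PORT", 3000)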
The reason you need three workers per dyno is the way Heroku does routing. Requests can be routed to any random dyno in your application, or sorry, any random dyno in your formation, I guess. If you only have one worker per dyno, and Heroku randomly routes a request to a dyno whose worker is already working on someone else's request, that request is going to sit there and wait until the first one is done. This goes back to an old queueing theory result: at a grocery store, instead of having multiple checkout lines, like the ten checkout lines at Walmart, it's more efficient to have one line feeding multiple registers, the way Whole Foods does it, if you've ever been to Whole Foods. So the more workers we have per dyno, the more efficient routing we can get out of Heroku. Generally, I've found that if you have at least three workers per dyno, you're maximizing your routing performance.

If you're struggling to fit three workers in a 2x dyno, you can try reducing thread count. If you have Puma or Passenger Enterprise, a multi-threaded application server, and you're running high thread counts, reducing the thread count to three can help. Or you can use jemalloc. Sam Saffron at Discourse has been the pioneer in using jemalloc for production Ruby applications; you can Google him and read about how to do it yourself. It can sometimes reduce memory usage by 5 to 10% and give you that extra little bit of headroom to squeeze into a 2x dyno. There's a jemalloc buildpack, which I help maintain, so you can do this on Heroku. If you search "jemalloc buildpack Heroku", you'll find it and learn how to use it.

If you have a bit of knowledge of application server management, you might think the maximum number of processes you should run per dyno should equal the core count: you shouldn't run nine processes if you only have eight cores, because in theory we can only run eight processes at one time on an eight-core machine. What I've found in production is that that's not really the case. Applications can really benefit from worker counts that are 3 to 4x the number of cores available. I know Product Hunt is a Rails application, and Product Hunt runs 30 to 40 workers on a Perf-L dyno, which is 4x the number of cores available, and they also run some Node processes in the same dyno, so there's tons of stuff competing for the CPU time. I don't know if it's just a lot of waiting on I/O, but whatever the reason, if you've ever heard the advice that processes must equal core count, don't restrict yourself; it can be 3 to 4x that number.

Keep thread counts to 3 to 5. The way we set this now is RAILS_MAX_THREADS, right? More than five threads per process tends to fragment memory too much. It's also really difficult with high thread counts to keep yourself under your connection limits. For Rails to connect to your Postgres database, for example, each thread needs its own connection to the database. So in general, we keep the number of threads per process equal to the size of the database pool; Rails does this by default. If you have 20 threads per worker and your database's connection limit is only 100, it's really easy to outstrip that limit really quickly.
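This is why the Rails-generated config/database.yml keys the pool size to RAILS_MAX_THREADS. A trimmed-down version of that default looks like this:

    # config/database.yml
    production:
      adapter: postgresql
      # one Active Record connection per thread
      pool: <%= ENV.fetch("RAILS_MAX_THREADS") { 5 } %>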
So I've found that thread counts of 3 to 5 offer a really good compromise between processing requests concurrently, keeping connection limits out of reach, and avoiding memory fragmentation. How do you know if your app is thread-safe? I get this question all the time, because people are afraid of Puma, or afraid of making their app multi-threaded. What Evan, the maintainer of Puma, recommends is to start slow: just try two threads, and if things start breaking, you can change that config var back and pretend it never happened. If you use minitest, you can try minitest/hell: you just require minitest/hell at the top of your test helper, and it will run each test in its own thread (sketched below). If that doesn't break things, you're good. And at the end of the day, if you're running MRI, it's probably fine. I don't see many people running into actually weird multi-threaded bugs, and when they do, they know it's their fault. They're like: yeah, I probably shouldn't have used that Redis global in this controller, or class-level state like User.current, or class variables. Generally they find it and think: yeah, that's really obvious, I should have realized that. The other thing I hear is: oh, but I don't know if my libraries are thread-safe. Same thing. I know that as a library author I really pay attention to thread safety, and I go through our code to make sure it's thread-safe. And in MRI, any time you execute Ruby code, it happens with the GVL, the GIL, around it, so it's actually kind of difficult to run into a threading bug. When it does happen, it is annoying, but don't be so afraid of it that you don't even try it.
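The minitest/hell experiment is a one-liner; minitest/hell ships with the minitest gem and forces every test to run in parallel. The User.current class below is a hypothetical example of the kind of state it tends to flush out:

    # test/test_helper.rb
    require "minitest/hell"   # every test now runs in parallel

    # The sort of code that starts failing under minitest/hell (and under a
    # multi-threaded Puma): class-level state shared by every thread at once.
    class User
      def self.current=(user)
        @current = user   # one thread's "current user" silently clobbers another's
      end

      def self.current
        @current
      end
    end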
Okay, we've got our container size and we've got our worker count, so now let's make sure we're not going to run over our connection limits. Things that use connections: Active Record, through Active Record's DB pool, and you probably have connections between your dynos and Redis, maybe Memcache. I think most of the Memcache add-on providers for Heroku don't limit connections, or at least don't limit them aggressively. Redis To Go used to limit them really heavily, but some of the newer Redis providers don't limit them so much. Postgres, your database, is really the main connection pool that needs to be watched, because those limits are very easy to hit. You change that in database.yml, and you need one connection per thread; that's the default in the database.yml that Rails generates, as shown above.

You may need more than one database connection per thread. If you use things like rack-timeout, which most people do on Heroku because of the 30-second limit, what can happen is that rack-timeout raises while we're waiting on a Postgres query to return, and when it raises, that connection can get lost. So you may need up to double the number of database connections per process as you have threads, if that's a problem for you. You'll know it's a problem if you're getting errors saying Active Record spent too long waiting for a connection, or didn't have one available.

These are the Heroku Postgres plans and how many connections they support. After standard-4, the larger sizes are all still limited to 500 connections. If you need more than 500 connections on Heroku Postgres, Heroku provides, I think it's a buildpack, right? The PgBouncer buildpack, which you can add to your app. It will pool these connections for you, and you'll be able to share a smaller number of connections per process than you actually have threads.

So just do the math to figure out how many dynos would outscale your connection limits. As an example, if I have a Perf-L dyno with 20 app workers, and each of those app workers has five threads, that's 100 threads and 100 DB connections per dyno. So if I have five dynos, that's 500 connections, and I've hit my standard-4 Heroku Postgres connection limit. So now we've checked our connection limits; we know the maximum number of dynos we can scale to before we hit them, and we're ready to deploy.
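Spelled out, that dyno math looks like this:

    workers_per_dyno   = 20    # Perf-L formation from the example
    threads_per_worker = 5
    connection_limit   = 500   # Heroku Postgres standard-4 and up

    connections_per_dyno = workers_per_dyno * threads_per_worker   # => 100
    max_dynos = connection_limit / connections_per_dyno            # => 5 dynos before the cap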
So here are some things to watch after deployment. Watch memory. This is a pretty typical pattern that I see: memory usage is fine, and then it blows out when someone hits, say, the CSV export controller. Looks like this. That dark purple is swap, and that's really bad; you don't want to see it. It means you're using too much memory, and you need to back off the number of processes per dyno. This is not a memory leak; it's a fat action. The only way you can really track it down, if you're seeing a curve like this where it's flat and then some action someone used blew it out to double that number, is to install an APM that does memory profiling. New Relic does not do this very well, as much as I love New Relic for everything else. Skylight and Scout are both commercial services that have memory profilers in production, and they can tell you: hey, this controller action allocates 18 million objects, and you can say, that's really bad, I'll fix it. An open source alternative is Oink. Oink basically writes to your logs, saying this action did such-and-such memory things, and then it has a log parser that will give you statistics about which controllers allocate how much.

So if you're running out of memory, scale down WEB_CONCURRENCY. That's the way Heroku has us all set up by default. If you're not using 75% of the available RAM, you can scale up WEB_CONCURRENCY. You can also tweak thread counts; fewer threads will use less memory. You may think that because threads share memory, all you need to create an additional thread is eight megabytes of stack. But the way malloc works, and this is something that changed with the Cedar-14 stack, is that it allocates what are called arenas to threads when they contend. At the end of the day, what it really means is that glibc malloc can have really bad memory fragmentation for high thread counts, for very highly multi-threaded programs. You can control that with the MALLOC_ARENA_MAX environment variable, for example heroku config:set MALLOC_ARENA_MAX=2. I can't get into too much detail about this because I'm running out of time, but if you Google "MALLOC_ARENA_MAX Heroku", Terence wrote a really good explanation of what it is and how to tune it. This is really only relevant for people running high thread counts, maybe your Sidekiq processes running 25 threads or whatever. Or jemalloc, which I talked about earlier, tends to do a good job with this. Here's a customer example, a client example, of tuning MALLOC_ARENA_MAX on a Sidekiq process. They had a Sidekiq process that would balloon from 256 megabytes to a gig over 24 hours; that's really bad. And then right here, they changed MALLOC_ARENA_MAX to two, and it almost completely stabilized their memory usage.

Watch queue times. New Relic will tell you how much time on average a request spent queueing, that is, how much time it was not actually being processed. Less than 10 milliseconds is good; more than that is bad. If you have high queue times, that just means you need more dynos; that's the time when you want to scale up. CPU usage: if your CPU usage is low, you may benefit from a higher thread count. Restarts: if you're using Puma Worker Killer, if you have to because you have a leak and you can't fix it, you need to be watching how often those processes are restarting. What I find is that some people install these automatic killer tools and then don't know how often they're restarting, and it's restarting every other request. That's really bad; you're going to really hamper the performance of your application if your processes don't get to live very long. At least six hours between restarts is a good goal, and if Puma Worker Killer, or whatever you use, is killing your processes faster than that, you need to change those settings or use a bigger dyno.

Timeouts. We all know Heroku has this 30-second timeout: if your application takes longer than 30 seconds to respond, the router basically gives up on you and will not return your response anymore. So we have things like rack-timeout to deal with that. If you have a lot of controller actions that tend to time out frequently and you don't have time to fix them, a good band-aid is to change to a dyno formation where you're running more workers per dyno. As an example, I had a client that had some controller actions which took 10 or 15 seconds to complete; it was admin stuff. What would happen is a bunch of these requests would come in one after the other, and they would back up all the other requests behind them. So the dyno would take 15 seconds to do this admin action, a bunch of requests would pile up behind it, and all of those requests would now take 15 seconds plus whatever time they'd normally take. If you have problems like that, where your 95th percentile times are really high, you're going to benefit from having more workers per dyno. And that's because while Heroku will route randomly to whichever dyno it wants, your application server will not. They all work differently here; Passenger probably has the best model for this, but even Puma will do a better job of routing requests to open processes that don't have any work to do. With this customer, they were running 2x dynos; I put them on Perf-L dynos, and they almost completely got rid of their timeouts and reduced their average response time by about 20%. This is not big enough, but: you probably have rack-timeout already, and Puma also has a setting called worker_timeout; in Passenger it's passenger_max_request_time, and I don't know what it is in Unicorn. You can actually just kill the process after a request has taken a certain amount of time. In Puma, we do this by default; it's 60 seconds. Passenger doesn't turn it on by default, you have to turn it on yourself, and if you are using Passenger, I do suggest you turn it on, because your requests probably don't need to take a minute, and if they do, you might as well just give up.
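In Puma, that setting is a one-liner in config/puma.rb; 60 seconds is already the default, shown here just to make it visible. Passenger's equivalent, passenger_max_request_time, goes in your Passenger config instead:

    # config/puma.rb
    # Kill and respawn any worker that's been stuck longer than this (Puma's default is 60).
    worker_timeout 60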
So this is it. That's the process; those are the steps. This is a slide you probably want to take a picture of. I'm Nate Berkopec, @nateberkopec on Twitter, and I'm going to tweet these slides out as soon as I'm off the stage here. And the website of my blog slash consultancy is speedshop.co. Thank you very much.