I've got a lot to cover today, so I'll probably need my full time allotment. Thank you for coming, it's 4:20 p.m. Thank you for skipping your smoke breaks to come to this talk. All right, so let's talk about halving your memory usage with 12 weird tricks; the 11th one will shock you. My name is Nate Berkopec. I run an independent one-man consultancy called Speed Shop. I work on people's Ruby and Rails applications to try to improve their performance and scalability. So let's talk about memory. The inspiration for this talk was that a good amount of the people that come to me to fix their Ruby application's performance have memory issues. And even if they don't come to me for that, I usually have to fix the memory issues first. I'm also very active in reading the Puma and Sidekiq GitHub repositories. And if you look at either of those repositories, especially Sidekiq, maybe 90% of the issues have to do with memory: "my app uses too much memory", "I switched to Puma and now I have a memory leak", "I switched to Sidekiq and now I use 300 gigabytes of memory". About 90% of those are not actual leaks or bugs, just misunderstandings of how Ruby works with memory. So part of this talk is gonna be about those misconceptions, correcting them, and then providing you some solutions to fix the real problems that you have. We think we're leaking memory all the time, but really we're not. The thinking, I think, goes like this: Ruby is a garbage-collected language, but my memory is going up, therefore that must be a memory leak. And that's just not the case. As Ruby programmers, we're allowed not to think about memory, and that's a good thing. Thank God we don't have to call malloc and free on our own, otherwise we'd all be C programmers. So I think it's okay and expected that as Ruby programmers we don't really understand what's happening at the memory level.
So this is Puma's top issue ever in terms of comments: 172 comments about memory usage going up over time. I want you to remember the shape of this graph, because it's very interesting: this guy thought he had a memory leak or memory problem. And this thread is really interesting. It's just dozens of people talking about how they switched to Puma and now they have a memory leak, or they switched to Puma and now their processes are using four gigabytes of memory and Unicorn didn't do that, so there must be a problem with Puma. So solution one of our 12 solutions here is just to dial back the number of application instances, because a lot of the people in that thread didn't actually have any idea how much memory one instance of their application used. And this is pretty common: people hit an R14 error on Heroku, or they're hitting the memory limits on their AWS instance, or they're using a worker killer that kills workers after a certain amount of memory is used. And they never really find out: if you just left this application running for 18 or 24 hours, how much memory would it use after that amount of time? Does the memory usage ever level off? People just look at too short a window of time when they're asking how much memory a process uses. So what we first need to do is dial back the number of instances we're using, so that we're not hitting any worker killer limits, and we're not hitting the limits of our server, our container, or whatever. Because if you are doing either of those things, you do not know how much memory you're actually using in the long term. It's bizarre to me how many Ruby applications I see that are like literal sinking ships, applications on fire, running out of memory, when it's so easy to fix that problem: you just turn it down, you turn down the number of instances and the memory pressure you're creating.
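Before you can dial anything back, you need a number: what does one instance actually use? A minimal sketch, assuming a Unix-like system with the `ps` utility available, is to have the process log its own resident set size periodically over 12 to 24 hours:

```ruby
# Sketch: log this process's resident set size (RSS) so you can see
# where it levels off over 12-24 hours. Relies on `ps` (Linux/macOS).
rss_kb = `ps -o rss= -p #{Process.pid}`.to_i
puts "PID #{Process.pid} RSS: #{rss_kb / 1024} MB"
```

In a real app you'd call something like this from a periodic background thread or a logging middleware, then graph the result; the point is simply to observe the full curve instead of the first 30 minutes of it.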
I very rarely come up against an application where someone had an out-of-memory error and they were already running just a single instance of their application in a container. And what I'm talking about when I talk about instances is Puma workers; Unicorn calls them workers too, I think: each forked process of your application. I'm not talking about threads here, because threads share memory. Processes only sort of share memory; we'll get to that in a minute. So the myth here is that memory usage should just look like a long flat line, that memory usage should never grow in the steady state. That's what people think a Ruby memory curve should look like. The reality is they look like this: they look like logarithms. And there's so much important information about what's going on here. So this section is the first two hours or so after your application starts up. Code is getting required. Not everything gets required at boot; Rails tries to do that, but maybe your libraries don't, maybe your application code is getting required lazily, things like that. So code's getting required, and that's gonna increase our memory usage. We're filling up caches. Even if you don't do application caching (which, shame on you, you should), you might be filling up caches like Rails' Adequate Record cache, which was introduced in Rails 4.2. You could be creating connection pools to the database. All these things create objects, create memory. It takes a while for all those code paths to get hit under production load, so you're gonna see memory increase during the first hour or two for that reason. And finally, different actions in your application require different amounts of memory, right? In the simple ones maybe you only allocate a couple thousand objects. The complicated ones allocate 200 megabytes worth of objects.
And as those complicated paths get hit, you're gonna see the amount of memory usage grow, right? Because Ruby needs a bigger heap to process that action, okay? So that's why we see that sharp increase in the first couple hours after starting a Ruby application. And when we get to the end here, you'll notice it doesn't really ever level out completely. We're gonna get to why that happens in a minute, but I don't want you to expect that a steady-state Ruby application will somehow never grow in memory usage. And I should add a caveat here: I'm talking about MRI, about CRuby. This talk is very CRuby-focused, not about JRuby. Sorry if you came here with a JRuby application, because I'm not sure how much of what I say about memory is gonna be relevant to you. But I think most of us are running MRI in production, so that's why I've tailored the talk this way. The other problem with this logarithmic memory curve is that if you look at any small portion of it, say memory usage over 30 minutes or an hour, it looks like a short, sharp, linear line. If you look at just one hour of memory usage from here to here, it looks like, oh, it's growing like crazy, I have a leak; you never see the full logarithm. That's why it's so important to let a Ruby process run for 18 or 24 hours. And again, this depends on how much load you're under. The less load you have, the longer you have to wait to see your true steady-state memory usage. If you're Shopify or whatever, it takes like 20 minutes, because you have a million requests coming into each instance; it depends on your load. But for most people, I think you really need to be waiting like 12 to 18 hours. And the problem with worker killers is that we have this hard memory cap that the worker killer imposes.
And if you set that cap too low, you're killing the Ruby process before it ever has a chance to get to a steady state, okay? So if this line here is at a gig, and the true steady state of your instance is two gigs, you're killing it after one, and you see this sharp curve in your memory usage graph. It goes up and then goes down, and you think, oh, it's this super sharp, intense curve, I must have a leak. But you're not looking at the full curve. You're not letting it live long enough to prove that you have a leak. So my recommendation, and this is a very general, one-size-fits-all thing, is that you should be aiming for 300 megabytes per instance. That's for a typical Rails application. It's probably gonna be a lot less if you're some bare-metal Rack application just serving an API or whatever. But for most Rails applications, I think 300 megabytes is a good goal. I've seen as far north as 600 per instance; that's not great. You should be aiming for 300. And that also applies to Sidekiq. Sidekiq has to load your application code, right? So your Sidekiq processes should be somewhere around that number as well. Solution two: stop allocating so many objects at once. This is the most important thing, and it's where I'm probably gonna spend most of my time in this talk. The myth: shouldn't the GC just clean up all of our unused objects after our job or action completes? Memory goes down, right? No. Garbage collection is very lazy. Garbage collection doesn't work based on timers; it works based on thresholds. There are various thresholds in the Ruby VM that trigger GC. It's not something that just runs constantly in the background, at least not the part that you care about; I'll get to that later. The sweeping phase of GC, the phase that actually frees memory, frees objects, runs based on these thresholds.
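You can see some of these thresholds and counters for yourself in any Ruby process via GC.stat. A minimal sketch (key names here are from Ruby 2.2+ and can vary slightly between versions):

```ruby
# Peek at the GC counters and thresholds the talk is describing.
# Keys shown here exist in Ruby 2.2+; names may differ in other versions.
stats = GC.stat
puts "GC runs so far:       #{stats[:count]}"
puts "available slots:      #{stats[:heap_available_slots]}"
puts "malloc GC threshold:  #{stats[:malloc_increase_bytes_limit]} bytes"
```

Watching these values before and after a chunk of work is exactly how you confirm that it's thresholds, not timers, driving collection.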
The other reason why memory usage doesn't go down is something called heap fragmentation, which we're gonna get into. And free, the C function, doesn't actually always free memory back to the operating system. We're gonna get into that too. So there's a large number of reasons why memory usage does not go down. Let's talk about thresholds for garbage collection. There are three main thresholds. First, the slots that the Ruby VM has for objects can run out. If we run out of slots, we need to get more slots, so that will trigger garbage collection. Ruby will first attempt to garbage collect its existing slots, to find objects it doesn't need anymore, and it'll throw those out of their slots, so then it gets free slots. And then there are two malloc thresholds. Since Ruby 2.1 we have generational garbage collection, so we have an old object generation and a new object generation. Objects in those generations can allocate memory on the heap, and I'm gonna get to that in a minute. But when the memory we've allocated there crosses a certain threshold, that can trigger a GC. These thresholds move; it's not a single value, it multiplies. So for example, with free slots, we start with 10,000 or whatever it is, I forget. Then we run out of slots and we need to grow the heap to get more slots. I think we multiply by a factor of 1.4 now; someone can correct me on that. So Ruby multiplies the size of the heap by 1.4, and now we have 14,000 slots, right? Then we fill up those slots, and when we need more space, we multiply by 1.4 again, and so on and so forth. So that threshold moves. Then we have heap fragmentation. Reading the C source of gc.c is a little bit like opening the Ark of the Covenant. But from what I've been able to glean out of it, we have something called the object space.
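You can actually watch the object space grow when you outrun the available slots. A small sketch: allocate a large number of live objects (live, so the sweep can't reclaim them) and compare slot counts before and after. The exact growth factor you observe will depend on your Ruby version's tuning:

```ruby
# Force the object space to grow: keep references to a lot of objects
# so GC can't reclaim their slots, and Ruby must add pages.
before = GC.stat[:heap_available_slots]
keep = Array.new(1_000_000) { Object.new } # live references pin the slots
after = GC.stat[:heap_available_slots]
puts "slots before: #{before}, slots after: #{after}"
```

The jump from `before` to `after` is the heap-growth behavior described above: Ruby didn't add one slot at a time, it grew the object space in multiplicative steps.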
This is also sometimes called the heap in Ruby, but I feel that's a little confusing, because there's some other stuff in the heap, so calling this part of Ruby the heap is a little weird; I'm gonna call it the object space. In the object space we have pages, which are these columns, and we have slots. Each page has 408 slots for objects. Pages are 16 kilobytes, and each slot is 40 bytes. Each slot contains something called an RVALUE, which is just the internal C name for it. It basically says: this is the object, this is its class, and either here's its data, or here's a pointer to the data for this object. So what happens with heap fragmentation is, let's say I allocate 600,000 strings, so I fill 600,000 slots, and then I don't need them anymore, so I get rid of them. But if I held a reference somewhere else to 10,000 of those strings, I have 10,000 slots scattered around that are still full, right? Ruby cannot move objects between slots in pages, because of the C extension API: any C extension can hold a pointer directly to where an RVALUE is, and if we move it, we'll break it. So if we move objects around in Ruby's object space, or if the garbage collector compacts the object space, it will break your C extensions and cause a segfault. We don't wanna do that, so we cannot move the pages, and we cannot move the objects in the pages. People ask why Ruby can't have a compacting GC; that's the reason. And Ruby can only release a page, 16 kilobytes of memory, back to the operating system, can only call free on it, if there are no objects in the page, okay? If there's even one object in the 408 slots of a page, we cannot give it back to the operating system.
Aaron Patterson is doing some work on this. The idea is that we should put old objects together, so he wants a separate heap for classes and modules versus strings, because we figure classes and modules will be around a long time. That should reduce heap fragmentation; he'll probably talk about it tomorrow morning in his keynote. I mentioned there's also another place that we allocate memory, and that's what I'll call the heap. I said an RVALUE is only 40 bytes, right? So what happens if I have a 500-character string? That's much larger than 40 bytes. Well, instead of storing the string in that 40-byte slot, we have a pointer to space on the heap: Ruby calls malloc, allocates some space on the heap, and puts the data there. So we have two areas where Ruby objects are stored, right? This is where those oldmalloc and malloc limits come in; that's where that threshold is. If we have more than X bytes allocated in this space, that can cause a GC in different cases. These values you can get from GC::INTERNAL_CONSTANTS, which is available in any Ruby process. It tells you how big a slot is, that's the size of an RVALUE, and how many slots are in a page. This changes based on your architecture; these numbers are from a 64-bit system and will be different on other architectures. Malloc and free are suggestions, not commands. Ruby calls free and says, hey, I have an empty object space page, I would like to return it to the operating system; it'll call free on the address and say, I'm done. The allocator, I should say, may hold on to that memory.
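Here's what reading GC::INTERNAL_CONSTANTS looks like in practice. The key names have changed across Ruby versions (older Rubies expose RVALUE_SIZE; newer ones report a base slot size instead), so this sketch tries both:

```ruby
# Inspect the slot/page sizes the talk quotes (40 bytes, 408 per page
# on 64-bit). Key names vary by Ruby version, so we try both spellings.
constants = GC::INTERNAL_CONSTANTS
slot_size = constants[:RVALUE_SIZE] || constants[:BASE_SLOT_SIZE]
slots_per_page = constants[:HEAP_PAGE_OBJ_LIMIT] || constants[:HEAP_OBJ_LIMIT]
puts "slot size:      #{slot_size} bytes"
puts "slots per page: #{slots_per_page || 'not exposed on this Ruby version'}"
```

On the 64-bit MRI the talk describes, the slot size comes back as 40.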
This depends on what allocator you're using, but it can sometimes put that memory into a free list, because the allocator's idea is: well, programs allocate a lot of memory, so I should hold on to this, because you're just gonna ask me to allocate it again, right? So people think that if we call free on a memory page, my RSS, my memory, should go down. That's not necessarily the case, because the allocator can hold on to it. Also, the operating system may not necessarily want that memory back, and may not reclaim it. macOS has a thing called inactive memory, which is kind of like this. So there's really no guarantee that any part of this memory stack, the Ruby VM, the allocator, or the operating system, is actually going to cause RSS to go down, even when you call free and say the memory is available. Another thing that can cause heap fragmentation in Ruby applications: we can't move pages around, right? Just like we can't move slots, because that could break pointers from C extensions, we cannot move the page itself, because that would also change memory addresses and break the C extensions. So if, for example, right before allocating 600,000 strings, I created a constant or some other non-garbage-collected object, then I allocate the 600,000 strings, and then I create another new constant. Now I have in memory, basically: a constant, 600,000 slots' worth of pages, and another constant. The space in the middle will get freed up, because I don't need those 600,000 strings anymore, and that space in the middle is heap fragmentation. This can be as little as 16 kilobytes, but if it's 600,000 strings, that's gonna be something like 30 megs.
And when that space gets freed, malloc implementations usually have a lot of trouble with memory where we need some at one side of the address space and some at the other side, with a bunch of free memory in between. Allocators usually don't work very well with that. What they like is the opposite: they like all the free space to be on one side of the address space. So if we had created our two new constants first and then allocated the 600,000 strings, it would be a lot easier for the allocator to deal with. But because Ruby applications are super complicated and we're doing all kinds of things all the time, we can end up with these Swiss-cheese-looking heaps. And like I said, we can't move these memory addresses around, so we're stuck with that. And that creates more memory usage than you would think is strictly necessary. If you make that diagram backwards, it's a French heap. Oh, wrong way. So this can cause long-term slow leaks. Even in an application that has constant memory needs, where we only really need 200,000 slots or whatever, this phenomenon of heap fragmentation can cause a slow increase in RSS. And that's not necessarily a leak, right? It's just heap fragmentation, just chaos created by the running of our program. Memory fragmentation, I should say, usually looks like a small leak over time. It's not some big 300 gigs every hour or whatever; it's that really slow, couple-of-kilobytes-at-a-time memory growth you see when your Ruby process is otherwise in a steady state. So the end result is that Ruby memory usage over time, the steady state of your process, is really the point of maximum memory pressure. What I mean by that is: this is our memory usage curve over time, right? And this is how much memory Ruby actually needs at any point in time.
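One rough, in-process signal of object-space fragmentation is slot occupancy: how full are the slots Ruby is holding on to? A sketch using GC.stat (key names from Ruby 2.2+; this is an approximation, not a precise fragmentation metric):

```ruby
# Rough occupancy check: in a long-running steady-state process, low
# occupancy suggests many pages pinned by a few live objects, i.e. the
# Swiss-cheese object space described above.
s = GC.stat
live  = s[:heap_live_slots]
avail = s[:heap_available_slots]
occupancy = 100.0 * live / avail
puts format("slot occupancy: %.1f%% (%d live of %d available)", occupancy, live, avail)
```

A freshly booted process will usually show high occupancy; the interesting reading is from a worker that's been serving production traffic for a day.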
So you're hitting a request here, and then another one here, and then Billy from accounting hits that CSV export action in the admin controller, right? And that allocates a million objects. So over time, because free is a suggestion and not a command, and because of heap fragmentation, we're gonna see long-term memory usage settle at the top of that peak. All this free space is what malloc and Ruby are holding on to, in expectation that Billy will come back and run another CSV export. So the general idea here is that we need to reduce the size of these peaks, and really you can only do that by allocating fewer objects, which is a whole other conference talk for a whole other day. Most of the time these are N+1s. Most of the time this is: you created 3,000 Active Record objects at once and you need to fix that. So that's a whole other talk for another day; I can't go deep on finding these problems right now. You need to use an APM. New Relic is not great for this, unfortunately. New Relic has tons of features, but none of them are really all that great in this area. I really like New Relic, I'm not ragging on them, but Scout and Skylight are better here. Skylight's profiler is really nice and their memory information is great. Scout works as well, and Scout, I think, you can get for free. So I would recommend checking those out. They tell you how many objects you're allocating per controller action. You can also use memory_profiler and Oink if you're a cheap bastard that doesn't want to pay anybody. These are two Ruby gems. Oink looks like this. No, I'm just kidding, Oink looks like this. Oink tells you how much memory is being allocated by each controller action, which controller actions are blowing out your heap. Memory profiler looks like this. It can tell you what memory is allocated by a block of code and where it's being allocated.
So what I usually do is look for bad actions in Oink or in my APM, and then dig down with memory_profiler. I can put it in a before-filter/after-filter kind of thing, or use the memory_profiler hooks in rack-mini-profiler, which is another gem, if you're familiar with that. And then I use memory_profiler to dig down: okay, exactly where is all this memory going, what's allocating all this memory? You can also make your own tooling with ObjectSpace and GC.stat. GC.stat is a thing you have in every Ruby process. It's just a hash with a bunch of information. The keys are unfortunately kind of opaque, and to really understand them you have to understand how gc.c works, but some are simple, like how many GCs you've done. So some of them you'll get right off the bat, and it's a great way of counting how many GCs are happening during an action. The idea is you just have a before filter that checks what GC.stat says, and then an after filter to compare the difference; you get the idea. ObjectSpace.count_objects will do the same thing. The keys on the left are actual names of internal MRI representations of your data, like T_STRING, right? So if you're in a before filter and an after filter, and you see in the after filter that ObjectSpace.count_objects grew by 6,000 strings, you know that's where the problem lies, and you can dig down a little more. And if all of that fails, put it in a Rake task. The idea is that if we're trashing the VM with a certain action, let's just move the action into another VM, and then throw that VM away when we're done. Heroku makes this super easy, because you can just `heroku run rake whatever`, and it'll throw away the VM when it's done. So if you can move your big export task, or, you know, Billy from accounting's thing, into a Rake task or a Sidekiq worker, that's gonna reduce the size of that maximum peak and reduce the total size of your Ruby process.
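The before-filter/after-filter idea with GC.stat and ObjectSpace.count_objects can be sketched outside of Rails as a simple wrapper around a block. Everything here is stdlib; in a Rails app you'd split the two snapshots across `before_action`/`after_action` instead:

```ruby
# Diff GC.stat and ObjectSpace.count_objects around a block of code,
# the same way you would in a before filter and an after filter.
def measure_allocations
  GC.start # sweep first so the baseline is clean
  gcs_before     = GC.stat[:count]
  strings_before = ObjectSpace.count_objects[:T_STRING]
  yield
  {
    gcs_triggered: GC.stat[:count] - gcs_before,
    string_growth: ObjectSpace.count_objects[:T_STRING] - strings_before
  }
end

held = nil # keep the allocations alive so the diff is visible
result = measure_allocations { held = Array.new(50_000) { |i| "record #{i}" } }
puts result
```

If `string_growth` comes back in the tens of thousands for a controller action, you've found your Billy-from-accounting problem.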
Throwaway VMs are much better than bloated VMs. I also heard a really great idea from Mike, who's sitting in the front row: they move their bloated Sidekiq jobs into a different queue, and that queue runs in a separate dyno on Heroku. So they kind of have the bad-job queue, and that's the only one that gets blown out. That's another interesting idea. You can't really do that with web requests, but you can do it with jobs, right? So how do we take out the trash? Next: audit your Gemfile. derailed_benchmarks, written by Richard Schneeman, sitting in the front row, is an awesome tool with a lot of cool benchmarks. I mainly use it for this one: it basically goes through each gem in your Gemfile and tells you how much memory requiring that gem takes. Richard used this to find a bug in the mime-types gem, which is required by Mail and therefore by everyone's application. This is super cool for checking how much a dependency really costs you, because there's a myth that dependencies are free: if I need user auth, I should just throw Devise or OmniAuth into my project and I'll be fine. But dependencies are not free, and they cost memory. So going through your Gemfile with derailed and checking how much each gem really costs you is awesome. You can also `require: false` your asset gems. We lost the assets group in Rails 4.1 or whatever, and that means a lot of people are requiring gems like Sass, Uglifier, et cetera in production. If you're precompiling your assets like you should be (shame on you if you don't), then you're requiring dependencies you don't need in production and wasting memory. Sprockets tries to require these things before it actually precompiles your assets, so you can save some memory by marking them `require: false` in your Gemfile instead of requiring them all the time. I know this works with Sass.
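The Gemfile change described above looks roughly like this. This is a sketch, not a drop-in: the gem names are illustrative, and as the talk says, whether a given asset gem still works under `require: false` depends on what Sprockets autoloads.

```ruby
# Gemfile sketch (illustrative gem names): if assets are precompiled at
# deploy time, don't pay the memory cost of requiring these at boot.
gem 'sass-rails', require: false
gem 'uglifier',   require: false
```

Verify with `derailed bundle:mem` before and after, and run a production-mode boot to make sure nothing needed the constant at require time.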
It may not work with other gems, so you're gonna have to read the Sprockets source and try it out yourself. You can look at autoload.rb in Sprockets to get an idea of what it will actually pick up. Okay, jemalloc. You actually have a choice of memory allocator in Ruby. Normally your program uses the glibc memory allocator, but you can use jemalloc. jemalloc comes out of Facebook, and they said that its mission was to emphasize fragmentation avoidance and scalable concurrency support. Basically, their PHP processes were blowing out after three days of being up. And when you're at serious scale like Facebook, you want to keep your processes long-running, right? Because if you're restarting your processes every four hours at Facebook scale, you're losing all that warm code, you're losing all those caches; you just can't do that. Your processes need to be longer-running. So jemalloc is trying to solve exactly the problem that a lot of Ruby processes have. You can use it two ways. You can use the LD_PRELOAD environment variable, which loads jemalloc before all other libraries, or you can compile Ruby with `configure --with-jemalloc`. For all these solutions, I have a gist at the end that I'm gonna link you to, so you can find more details, because I know you're not gonna remember all this; it's fine, I've got notes for you. Solution five: use copy-on-write. I don't really know anyone that isn't using Puma or Unicorn in production right now, but I do know that some of them don't use preloading, and I think that's a mistake. I know it's kind of complicated to set up, and I'm not gonna talk about how to do that today, but you really need to look at that option for your application, because getting some advantage from copy-on-write memory is important. Copy-on-write increases shared memory, so we can have shared memory and private memory between processes.
Let's talk about that. What preloading does in Puma or Unicorn is: it loads your application, runs the Rails initialization or whatever, and then forks after that point. What that means is that all the memory we allocated before forking can be shared between our two workers, and after that point they have their own private memory. Now, this is a little complicated, because from the perspective of these two workers, they think they have their own memory; they're not aware of this process. It happens at the operating system level. Whenever one of these processes attempts to read one of these shared memory locations, it gets to read it, that's fine. But when it tries to write to one of them, the operating system copies it; that's why it's called copy-on-write. So we want to avoid writing to shared memory. Unfortunately, that's kind of difficult, because a lot of writing to shared memory happens in the garbage collection process, so you don't have a ton of control over it. But the reality is that some copy-on-write is usually better than no copy-on-write. And there's a little bit of a myth here: that the total memory usage of your Ruby application is just the sum of the resident set sizes, that's RSS, of your processes. That's usually not true, because of shared memory. ps, I think, will give you shared plus private RSS. So if you just sum those up, you'll end up looking at it and saying, well, copy-on-write didn't save me any memory. You really have to look into whatever tool you're using to measure memory usage, because a lot of them won't split shared and private out separately. What you really care about is: okay, when I kill this worker, how much memory will be freed up? How much less RAM will I be using? And usually what you mean by that is: how much private memory, how much private RSS, am I using? So you've got to make sure, when you're measuring with these memory tools, that you're getting that right.
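The preloading setup described above, in Puma's config DSL, looks roughly like this. This is a sketch: the worker and thread counts are illustrative, and the `on_worker_boot` body is a placeholder for whatever per-process resources your app needs to re-establish after fork.

```ruby
# config/puma.rb sketch (counts are illustrative, not recommendations)
workers 2             # forked application instances
threads 1, 5          # min, max threads per worker
preload_app!          # boot the app once, fork after, so boot-time memory is shared

on_worker_boot do
  # Per-process resources must be re-established after fork, e.g.:
  # ActiveRecord::Base.establish_connection if defined?(ActiveRecord)
end
```

Everything allocated before the fork (required code, eagerly loaded classes) is a candidate for sharing; everything after is private to each worker.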
And it's really surprisingly difficult to measure. Memory can be virtual or real, shared or private, resident or swapped, and a bunch of other things that I really don't have time to cover in this talk. There's a memory FAQ which I will link to at the end of this talk if you want to learn more about what these terms mean. But I really want you to understand that if you try copy-on-write with preloading and you somehow don't see a decrease in memory usage, that may just be because of the way you're measuring it, and you need to dig into your tool a little more. And again, it isn't perfect. Because of changes to the garbage collection algorithm, copy-on-write effectiveness has been reduced a bit since Ruby 2.0, but it's still good, it's still a start. I recommend everyone give it a shot. Solution six: use a threaded web server. Puma and Passenger Enterprise (90% sure) are really the only ones you can do this with in production; Thin has a threaded mode, but no one really uses it. This is a way of increasing concurrency with lighter memory usage, right? Threads use the same memory in our Ruby process, so we don't have to allocate hundreds of megabytes more when we fork. And here's what I think is a mini-myth; I'm gonna go out on a limb here. Most people think their applications are not thread-safe, but the reality is that most people aren't really writing crazy, super-unsafe app code, I think. That's been my experience, anyway. People are like, oh, Nate, you're crazy, you're gonna create a bunch of threading bugs telling people to do this. I do want you to give it a shot. Threading can feel a bit like juggling: all this crazy stuff, and you don't know where the shared mutable state is. The only way I know of right now to get some idea of whether your app might be thread-safe in production is using minitest/hell.
So you require minitest/hell before your tests run, and Minitest will run each test in a new thread. If that doesn't find a threading bug, I don't know what will. It only works with Minitest, obviously. Maybe RSpec has a similar thing that I don't know about, but I don't use RSpec, so whatever. Solution seven: keep Ruby and gems up to date. Authors like Richard are working hard all the time on the performance of their libraries. Please help them by running bundle update every once in a while. My general recommendation here is that people should be on Ruby 2.2+ and Rails 4.2+. Ruby 2.2 got incremental GC. 2.3 doesn't really have any crazy performance improvements that you need to upgrade for; it fixes a memory leak in prepend, if you're using prepend anywhere, but there's really no huge performance upgrade, so I don't think it's a huge problem if you're not on it. And Rails 4.2+, because in Rails 4.2 we got this thing called Adequate Record for caching Active Record queries, which is awesome. I would watch out for Ruby 2.4 when that drops, because it's gonna have a faster hash implementation and a faster regex implementation, and it's gonna have some additional control over the number of free slots that we can give back to the operating system. I think that's gonna be an interesting development for performance; I would definitely upgrade to that as soon as I could. Solution eight: tune malloc. There are a couple of things we can tell malloc to do to change its behavior. The most interesting one is MALLOC_ARENA_MAX, for people running threaded web servers in production. If you're running Puma with threads, or Passenger Enterprise, I think this is an interesting setting you need to look at. Basically, when you have a threaded application, glibc malloc, the default malloc, will create these things called arenas. What it's trying to do is reduce contention for memory reading and writing between threads.
So it creates a new arena every time it detects contention for memory access between threads. The problem is that the default limit for the number of arenas it can create is something like eight times the number of cores. So that can end up being a lot of memory, right? So changing this value from 64, or whatever it is on your machine, to a number like two or three can reduce the total memory usage of a Ruby process. It will also decrease performance, right? Because what malloc is trying to do is reduce thread contention, so by decreasing the number of arenas, we are increasing contention, which will cause waiting. So there is a performance decrease. But for some people, that's not as important as the amount of memory they can save. The performance decrease for changing this value to two or three is sometimes in the neighborhood of 10%, and it can save 25% of your memory usage. So if that's an interesting trade-off for you and you run a threaded web server, I would check that out. There's a great Heroku article about that, which I'm pretty sure I linked to in the notes. There's a whole list of other things you can tune with mallopt, and you can tune all of these things with environment variables. So if you know a little bit about C and you know a little bit about allocators, I would check out mallopt and see what you can tune with environment variables to try to reduce some memory usage. It's very interesting. MALLOC_ARENA_MAX is the only one that I currently kind of recommend people take a look at, but if you know your C, I would definitely check out what you can mess with in mallopt. And then the final solution is tuning your GC. Actually, I don't recommend you do that. If you can read gc.c and understand what the environment variables you can set in there actually do, I'd say go ahead, give it a shot. But a lot of people find these Ruby GC settings on the internet and just copy-paste.
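Before copy-pasting any GC settings, it's worth at least looking at what your GC is actually doing. A minimal sketch using `GC.stat` (the exact key names can vary a bit between Ruby versions, and the 0.5 threshold here is an arbitrary illustration):

```ruby
# Inspect the object space before reaching for RUBY_GC_* tuning.
stats = GC.stat
live  = stats[:heap_live_slots]
free  = stats[:heap_free_slots]
ratio = free.to_f / (live + free)

puts format("live slots: %d, free slots: %d (%.0f%% free)", live, free, ratio * 100)
puts "lots of free slots -- this is the symptom GC tuning can address" if ratio > 0.5
```

If the free-slot ratio is small and startup is fine, there's nothing here for tuning to fix.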
And the problem is that the Ruby GC has changed so much in the last three years that those settings are out of date. Settings may not be appropriate for your application, and you can really shoot yourself in the foot with this stuff. You can really mess up your garbage collector. So, GC tuning can fix too many free slots being in the object space, it can fix slow startup times, and it can fix too many or too few GCs. If you don't have one of those problems, you shouldn't be looking at GC tuning, in my opinion. And the amount of memory you can save from reducing the number of free slots is really not that much, probably 5% of your application's total usage. And you can really mess up your application if you get these numbers wrong. So I don't recommend you do that. I think we need some more up-to-date documentation on what these things actually do that's accessible to a normal Ruby programmer who may not know how the GC works. So if you see this stuff and you're like, oh, I should try these variables that I copy-pasted, please don't do that. All right, so that's all I had to talk about. Tomorrow there's gonna be a performance birds-of-a-feather meeting at 1:15 p.m. If you have questions, I'd love to see you there. I have an entire course on tuning the performance of Rails applications. It's at railsspeed.com. It has 18 hours of video with interviews and screencasts and over 350 pages of content covering front-end performance, Ruby performance, database performance, and how to measure and profile and do all that great stuff. If you've read my blog online, you'll recognize what it looks like. It's being moved to speedshop.co right now, which has been a fun project. My Speedshop website is all of 10 kilobytes and everything's inlined, it's super performance-optimized. So I'd love to talk about that too if you wanna ask me about it. Anyway, thanks.
The slides and the notes for this talk are on my Twitter, @nateberkopec. You can find me at speedshop.co and railsspeed.com. Are there any questions? I don't know how much time I have, yeah. Yeah, so puma_worker_killer, Richard's project. It's awesome that it exists, but again, a lot of what I see when people use puma_worker_killer is that they're getting like 30 instance restarts per hour. They're restarting their workers 30 times per hour. puma_worker_killer is not a tool for cramming 30 workers into a single one-gigabyte dyno, right? It's really meant to be a tool for that one dyno that gets out of control and hits two gigabytes of memory usage. So I think that worker killers are a very sharp instrument, and I would prefer that if you're gonna use one of those, you first turn it off, not permanently, but turn it off and run that process for 24 hours, see what the memory usage is like at that point, and then use that information to set the limit for puma_worker_killer, okay? Don't just set that number out of nowhere without knowing exactly how much memory you're actually using in the first place. They are necessary tools, I think, but I think that a well-architected application usually doesn't have to deal with those problems, and puma_worker_killer and unicorn-worker-killer are really more band-aids than real solutions, and Richard's nodding his head, so I think he agrees. So the question was: I talked about how constants, or basically old objects that can't immediately be garbage collected, can cause heap fragmentation, and is there anything we can do about that? Honestly, I don't really think so. All you can really do is not allocate tons of objects at once. All you can really do is not blow out the heap, the object space, to huge proportions in the first place.
So your heap fragmentation will be lower if, for example, the maximum number of allocations that happen in a controller is 300,000 objects rather than five million, which is a number I've seen, actually I've seen 18 million. So reducing that maximum allocation number, the maximum amount of memory that needs to be allocated at a single time, is probably the most important thing you can do to reduce heap fragmentation. So, the question was whether there are any drawbacks to jemalloc. There really aren't any as far as you as application developers are concerned. You can go read the issue on Ruby core as to why it was not included in Ruby core. Basically, I think the developers feel like they don't wanna hitch themselves to a big open source project that they don't control. I think their idea was that they would basically have to bring all of jemalloc's code into the Ruby core repository, like they do with OpenSSL, and it's a big project, and I understand they don't wanna do that. But from an application developer's perspective, there really aren't any drawbacks. It's not gonna cause random segfaults. You can use it, I should mention, on Heroku. I maintain a Heroku buildpack for adding jemalloc to your Heroku dyno, and I would like more people to try that. It's not super battle-tested at this point, but you can spin up a new dyno, add that buildpack, and try jemalloc with your application. So, what would I do differently if the problem was many small objects versus one single large object? I don't know, I think it's really the same problem. I think most people have the many-small-objects problem. Most people have the problem of: my controller has a million Active Record objects it needs at once. And I think it's a little bit easier to deal with on the fragmentation side, because when you think of fragmentation as a problem, it's about the space between the objects. So that space is necessarily bigger if there are 600,000 objects, right?
Like, there are many opportunities for each of those 600,000 objects to not be garbage collected, to stay around, and then that fragments the heap space, right? With one single large object, there's only the beginning and the end of that object in the address space, so it doesn't cause as much fragmentation. But like I said, I normally see the many-small-objects problem. Okay, so the question was: I used my example of 600,000 objects where I kept 10,000 of them around, and that's what caused the fragmentation. What if I didn't keep those 10,000 object references anywhere and just let all 600,000 objects go away? I actually have some demo code, which I'm not gonna run right now, that does exactly that. And what you actually see in reality is that it still causes a lot of heap fragmentation. I haven't used Valgrind or any crazy memory heap analysis tools to figure out exactly why, but it can still cause heap fragmentation. I think what's happening is that memory is being allocated for other reasons after we allocate those 600,000 strings, and that's creating a new heap page or memory access somewhere, and we still end up with all that free space. That test script is in the notes, and if it's not, I'll add it, and you can mess around with what causes fragmentation and what doesn't. Again, it depends on the allocator. jemalloc deals with that case a lot better than glibc malloc does. As for other allocators: no, there are no other allocators right now that I would recommend. There's tcmalloc, but I found it a lot harder to set up, and I found the results not as good as jemalloc's. And there are some closed-source allocators that I haven't tried, so maybe one of those is okay, but jemalloc is good, it's free, it's battle-tested by Facebook, and it's also battle-tested by Discourse, which is a huge Rails application. So that's currently the only one I think people should really be looking at super hard.
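A rough reconstruction of the kind of demo script described above: allocate a burst of strings, keep a scattered minority of them alive, then GC. The scattered survivors pin their heap pages, so the heap stays large even though most of the burst is gone. This is a sketch, not the actual script from the talk's notes; all numbers are illustrative.

```ruby
# Demonstrate object-space growth from a burst of allocations with
# scattered survivors. Survivors pin nearly every page they land on.
GC.start
before_pages = GC.stat[:heap_allocated_pages]

keepers = []
600_000.times do |i|
  s = "x" * 40
  keepers << s if (i % 60).zero? # keep ~10,000 scattered survivors
end

GC.start
after_pages = GC.stat[:heap_allocated_pages]
puts "heap pages before: #{before_pages}, after burst + GC: #{after_pages}"
```

Because roughly one object in sixty survives, almost every page touched by the burst holds at least one live object, so very few pages can be returned even after GC.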
Anyone else? All right, cool. I'm happy to talk about performance war stories, and if you have any questions about Ruby performance at all, I'm happy to talk to you. So, hopefully see you tomorrow.