 curious case of Ruby's memory. It's two talks rolled into one, because, you know, that's what I wanted to do. Who here has used Ruby 1.0 in production? 1.8? Ah! So you remember the pain, right? Ruby has had garbage collection since the beginning, and up through 1.9, the way it worked was mark and sweep. The garbage collector had two phases: the marking phase and the sweeping phase. In the marking phase, all the objects that are still referenced get marked, and whatever is not marked is swept away as garbage. And that was pretty good: efficient, simple, and easy to reason about. Except the problem is that if you're running a real Rails app that is taking 400 MB, and you have maybe a million objects on your heap, then traversing that entire heap on every collection is very costly. So Ruby 2.1, thanks to Koichi Sasada (ko1), brought generational GC. Three cheers for generational GC. The insight behind generational GC is that most objects die young; they barely survive one or two garbage collections. So what does it do? These blue guys here, if you can see them: if they survive a predefined number of garbage collections, they are promoted to a generation called the old generation. And whenever more memory is required, a minor GC is performed first, and those old-generation objects are not examined at all. So, as you can imagine, garbage collection gets a lot faster. That's very nice. But there's a problem. Let's say you have this hash called active_users — a hash that is pretty much live through the entire lifecycle of your application — and it stores a reference to a user object, which is a new guy. So that's the case here: an object in the old generation refers to an object in the new generation. The problem is that on a minor GC, Ruby is not going to check the old generation for references at all. So your user object might be deleted even though somebody is actually still holding a reference to it.
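The active_users situation above can be sketched in plain Ruby. This is a hedged illustration: the promotion threshold, and the `GC.stat` / `ObjectSpace.dump` fields shown, are MRI implementation internals and can differ between versions.

```ruby
require 'objspace'
require 'json'

# A long-lived hash, like the active_users example: after it survives a few
# GCs, MRI promotes it to the old generation.
active_users = {}
4.times { GC.start }

# Now an old-generation object holds a reference to a brand-new object --
# exactly the case a minor GC alone would miss without a remembered set.
active_users[:alice] = Object.new

info = JSON.parse(ObjectSpace.dump(active_users))
puts info["type"]              # the dump describes the hash itself
puts GC.stat[:old_objects]     # how many objects are in the old generation
puts GC.stat[:minor_gc_count]  # minor and major collections are counted separately
puts GC.stat[:major_gc_count]
```

On a real MRI, the write barrier on Hash puts active_users into the remembered set when the new object is written into it, which is why the young object survives minor GCs in practice.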
So that would be a really bad bug. So all generational garbage collectors typically implement this trick called a remembered set: whenever you write something into an old object, that object is put into this separate set, the remembered set. And on a minor GC, not only is your young generation traversed, but your remembered set is traversed as well. So now you can see that the user object will not be garbage collected erroneously. So that's pretty good, and that's what the JVM and other VMs do. But Ruby has a problem: if you're running a Rails app, it probably has at least 10 or 20 C extensions loaded. And these C extensions can do crazy things. They can, you know, take your Ruby pointer and put something into that array via C, and Ruby won't know that this array, which is in the old generation, has just been given a new object. I'm just giving examples: say we are adding something to a Ruby array — not a C array — by doing a memcpy into the Ruby array, and this array is in the old generation. What will happen is that Ruby is again not aware that an object in the old generation is holding a reference to an object in the new generation, and that object could be deleted on a minor GC. So this is a big problem. Unlike the JVM, Ruby doesn't own the heap space of a process completely. So to work around this problem, what ko1 introduced was something called write barriers. And what it does is somewhat simple and very clever: it categorizes objects into two categories, shady objects and sunny objects. When you take a Ruby object and do something with it in C — via any of the macros or methods, macros particularly — then the object is marked as a shady object. And if you're writing a C extension with your own custom data structures, they are all shady, actually.
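You can actually see the sunny/shady distinction from Ruby: on MRI, `ObjectSpace.dump` reports a `wb_protected` flag. A hedged sketch — the flag name and its presence in the dump output are MRI internals:

```ruby
require 'objspace'
require 'json'

# Core arrays carry write barriers ("sunny") on recent MRI; an object that
# a C extension touches through raw pointers loses this protection and
# becomes "shady", so minor GCs must scan it every time.
ary_info = JSON.parse(ObjectSpace.dump([1, 2, 3]))
puts ary_info["type"]                       # "ARRAY"
puts ary_info.dig("flags", "wb_protected")  # true while write-barrier protected
```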
Only a few types are write-barrier protected by default — things like hashes and arrays — while procs, for example, are shady by default. So a lot of objects in Ruby are shady, actually. And what happens is that on a minor GC, not only is the young generation marked, but all the shady objects and all the remembered-set objects are traversed as well. The bad part is that this takes away some of the benefit of generational GC, because we are looking through more objects even on a minor GC. But the advantage is that it's 100% compatible with almost all the C extensions out there. And, yeah, the objects that you're not accessing from C, that you only touch from Ruby, will generally stay write-protected: they are marked as sunny objects and will not be touched. So that's how Ruby's generational GC works in 2.1. 2.2.0 is bringing something called incremental GC, so the major GC work gets split into small steps, and objects get more states than just young and old. There are tracepoint hooks that fire on GC events, and there's symbol GC coming in 2.2.0 as well, which will reduce heap size further. So all that is pretty great. The key takeaway from this part of the talk is: don't use low-level access to Ruby data structures, like raw pointers. Ruby actually provides APIs — functions you can use if you're writing a custom data structure that you want to expose to Ruby. You don't have to just keep it in C; you can expose the C data structure to Ruby, right? And you can mark it as write-protected — there are functions to do that. So use those. So that's about it. Now, the next thing I'm going to talk about is how to tune the garbage collector via environment variables. But before I do that, I'm going to take a trip down memory lane and talk about memory profiling Ruby applications. It was horrible, right?
Like, Twitter had to use DTrace, and experts had to be brought in — like people from the Expendables and things like that. So it was hard. And usually profiling looked like this: a graph generated from some dump you managed to get, and then you just stare at it trying to see what's going on, and it's terrible. Now, Ruby 2.1, which has tracepoints, has great instrumentation support. I work for a company called Codemancers, and we as a team decided to do something about it. Although this was not our full-time work, we decided we were going to build a Ruby profiler that would be as easy to use as something like YourKit, that you can use in production with very low overhead, and that doesn't cause problems with your application. And I'm happy to present to you RBKit. I have never announced it before; it is the first time I'm showing it off, to you at Rocky Mountain Ruby. So, yeah. It's up on GitHub; we just open-sourced it a few days back. It's a two-part application. You can see this is a snapshot, a picture of the app — I'm going to actually demo how to use it. It runs on OS X. So, it's great. And it's not a one-person effort: there are four people from Bangalore working on it, not full-time but part-time, at the place where we work. It was not easy, you know, to convince a bunch of Ruby developers to go do C. The rbkit gem is written mostly in C, because a lot of the tracepoint hooks only work from C, and the desktop app is written in Qt/C++. So it was very hard — there was a fear factor, actually: oh my God, if I do something funny with a pointer, what's going to happen? But we did it. C is, for us, something that people like Linus Torvalds write. It was hard, but we did it, and we built RBKit. So, what is RBKit? RBKit is a low-overhead Ruby profiler built for MRI, written almost completely in C. It has two parts: a desktop application and a Ruby gem.
What is great about RBKit is that the gem doesn't do much other than gathering the data — you know, this object has been deleted, this object has been created, this object holds references to that object — it doesn't do anything else. And it sends all the data via ZeroMQ to the desktop client. It uses MessagePack in C as the serialization format. It's just really fast. The great thing about ZeroMQ is that it has its own I/O threads. So when we send the data, we are not going to block the Ruby thread or Ruby processing at all. I can send a million messages and it's just going to work. So the beauty is a profiler that can be used in production. Using RBKit is really simple. In your Rails app you can just put `gem 'rbkit'` in the Gemfile, and then in boot.rb you put `require 'rbkit'` and `Rbkit.start_profiling`. It will then listen on a socket for incoming desktop client connections, and yeah. The next thing I'm going to talk about is the RBKit app, the desktop app. It's a cross-platform app written in Qt C++ — not via Ruby Qt bindings — and Qt has a WebKit widget, so you can render certain portions of your application using WebKit: JavaScript, D3.js, whatever. We use SQLite. So, as I said, in this application all the heavy lifting is done on the client side — like, you know, which object holds references to which object, where it was allocated, all that stuff. And we benchmarked various GUI libraries and came to the conclusion that Qt C++ really matches what we are trying to do. And we didn't want to build the OS X-only application that unfortunately pervades Ruby culture. So we wanted it to work on OS X, Linux, and preferably Windows if we can. We can still render certain pages using plain HTML via the Qt WebKit bridge. About the tools we use: it would be funny if a memory profiler itself leaked, right? You don't want to use that in production.
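The setup described above would look something like this in a Rails app. Hedged: the API names are as stated in the talk; check the rbkit README for the current interface before relying on them.

```ruby
# Gemfile
gem 'rbkit'

# config/boot.rb -- start the profiler server; the RBKit desktop app
# connects to it over ZeroMQ when you hit "connect".
require 'rbkit'
Rbkit.start_profiling
```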
So, one tool that is very important is Valgrind, and another tool we use is a simple command-line program available on OS X called leaks. If you have OS X, you can run it right now and track memory leaks in your C applications very easily. It's great. So, the status report: RBKit is not going to be just about memory profiling; we are going to build CPU profiling as well. Right now, today, our memory profiling pretty much works, CPU profiling is in the works, and it's open source. So, RBKit demo time. Let's see if it works, right? Okay. So, this is the client. I need to resize it slightly; I hope you guys can see it in its entirety. Okay. So, I'm going to just click connect, because it needs to connect to a running Ruby application, and I'm going to press this thing. One great thing about ZeroMQ is that the server need not be running for clients to start connecting. So, I'm running this, and now if I switch over, you can see the live profile is on, and there are GC stats — it took almost three seconds for the GC run — and the object count growing, and then you can see the memory numbers: the RSS memory size and the heap size of the process, and all these various stats. Now, generally if you're profiling a Rails app, it's always a good idea to trigger a GC before you take a heap dump. So I can trigger a manual GC here, and then I can take a heap snapshot. It shows a neat progress bar as it works, but you cannot see it because it's just — yeah. So, this is the state of the heap. It's showing all the objects that are live on your heap, and I can see that I'm leaking: there are 29,000 strings allocated there. There's this Foo class — the one allocated on line 90 has 12,000 instances, and it holds references to 24,000 objects. Then you can see there's a hash here, and that holds references to 49,000 objects.
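The GC-then-heap-snapshot step from the demo can also be reproduced with plain MRI tooling, independent of RBKit — a rough sketch:

```ruby
require 'objspace'
require 'tempfile'

ObjectSpace.trace_object_allocations_start  # record file:line for new objects
leaky = Array.new(1_000) { "leaked string" }

GC.start  # collect garbage first, as in the demo, so the dump shows live objects

# Dump every live heap object, one JSON document per line.
dumped = 0
Tempfile.create('heap') do |f|
  ObjectSpace.dump_all(output: f)
  f.rewind
  dumped = f.each_line.count
end
puts "dumped #{dumped} live objects (including #{leaky.size} we kept on purpose)"
```

Diffing two such dumps by allocation site is essentially what the RBKit snapshot comparison automates.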
I can view references and see where all these references are coming from. We can of course take another heap snapshot, because one snapshot is not good enough, and then, yeah, I can compare heap snapshots — oh, I'm not mirroring, I'm sorry, guys. I'm just going to enable mirroring for a second so that I can see what you guys are seeing. All right. So, yeah. I just did compare snapshots, and I can see, you know, these are the objects that were in heap snapshot 2 but not in heap snapshot 1, and I can see where they are coming from. So there's something going on on line number 21. Obviously, I intentionally wrote an app that leaks to demo what I wanted to demo, but this all works already: you can download a DMG file and just double-click it, install the gem, and run it. Simple, and yeah, it's easy. Okay, next: I hope this was not enough, because we are going to talk about more stuff. I'm going to talk about GC tuning visualizations. Now that we have built this, we can do crazy things — once you have all the data available on a ZeroMQ socket, and there are ZeroMQ-to-WebSocket bridges, people are going to go crazy. But I'm going to just use the RBKit desktop application to show you some cool stuff about GC tuning that, you know, you have read in blog posts and everything. It's out on the internet; you can find it. But I'm going to make it more apparent how it affects your application's performance. The first one is a simple one: RUBY_GC_HEAP_INIT_SLOTS, the number of heap slots that Ruby starts with. The default value is small, and you can find suggested values like 60,000 or 600,000 on blogs if you go read them. So, I'm going to just stop this, and the first thing I'm going to do is unset whatever I have set.
So one of the things you don't want is the GC triggering too often, because that can be detrimental to your application's performance — even on your own machine, if you're running unit tests, GC triggering while Rails boots up can be, you know, bad. So, I have unset whatever I had set, I'm going to start RBKit, just connect, and then run the script that does something here. So, as you can see, by default there's already a GC trigger happening. That's kind of bad, right? It means you're running slowly. So, what I'm going to do next is set it to a higher value — sorry — I'm going to set it to a value like this. One of the things we did not implement in the RBKit desktop client is clearing all the state and restarting a new profiling session; that's the reason I'm closing the application and starting fresh. I hope that won't be too much of a problem. So, this time you will see that no GC happens. This thing is a little bit obscured, but you can see that the major GC count is zero, and there's no GC on the charts as well. So, when we set the init slots to a higher number, the number of GC runs dropped. So, that was one. The second one I want to talk about is RUBY_GC_HEAP_GROWTH_FACTOR. Ruby does not store all objects in Ruby's heap, by the way. Ruby's heap has slots where all the RVALUEs are stored, and if an object needs more than what can be accommodated in a heap slot, Ruby uses a typical malloc call to take memory from the C heap and allocate more. But let's focus on the Ruby heap: when Ruby needs more memory on its heap, it uses this environment variable to decide by how much the heap should grow.
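The effect of RUBY_GC_HEAP_INIT_SLOTS shown in the demo can also be checked from a terminal without RBKit. A hedged sketch — run the script twice, once with the variable set, and compare the numbers (the GC.stat key names are MRI-version specific):

```ruby
# e.g.  ruby gc_slots.rb
#       RUBY_GC_HEAP_INIT_SLOTS=600000 ruby gc_slots.rb
stats = GC.stat
puts "heap slots available: #{stats[:heap_available_slots]}"
puts "minor GC runs so far: #{stats[:minor_gc_count]}"
puts "major GC runs so far: #{stats[:major_gc_count]}"
```

With a larger initial heap, the GC counts at boot should come out lower, which is exactly what the demo's chart shows.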
The default value is 1.8. Some people suggest keeping it at, you know, a lower or a higher value. I'm going to set it to 1.99 and demo this. But before that, I need to unset the init slots, because if the initial number of slots is higher, then the GC is going to occur much later, and we'd be sitting here waiting for a GC to happen. So, all right — oops. So, if you keep an eye on this, this blue line represents Ruby's heap size, and this gray one represents the actual live object size. Live object size means all the objects that were allocated, including on the C heap. So, you can see here the size was initially approximately 1 MB, it grew past 1 MB, and then it grew by whatever factor we defined in the environment variable. So, it had a direct impact. So, that's great. And so, we covered those two. Now, yeah, another thing I wanted to talk about is the Ruby heap versus the C heap. An RVALUE is 40 bytes, but for a typical application like a Rails application, the strings and objects are going to be much bigger, and they're not going to fit in a heap slot. So, Ruby uses ruby_xmalloc to take memory from the C heap and, you know, allocate the object's data there. Ruby doesn't keep accurate track of how much memory has been allocated via malloc, but it knows roughly. So, in the earlier case I was allocating small strings; if I just increase the size of the strings, you will see that the RSS memory size — the memory taken by all the live objects — is going to go much higher when you're allocating larger strings. If instead I were allocating small objects, everything would fit right in the heap slots and this line would be the same as this one. So, that's another insight we can draw from looking at this data. All right.
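You can watch the same heap growth without RBKit by sampling GC.stat while forcing the heap to expand — a rough sketch (the step size between expansions is what RUBY_GC_HEAP_GROWTH_FACTOR, default 1.8, controls):

```ruby
before = GC.stat[:heap_available_slots]

# Keep half a million objects live so the Ruby heap is forced to grow.
keep = Array.new(500_000) { Object.new }

after = GC.stat[:heap_available_slots]
puts "heap grew from #{before} to #{after} slots for #{keep.size} live objects"
```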
So, the next thing I want to talk about is RUBY_GC_MALLOC_LIMIT. This is another important factor that controls how frequently GC runs. The idea here is that the initial heap slots and the growth factor are not alone in controlling how frequently GC happens. RUBY_GC_MALLOC_LIMIT is another variable, which controls how much memory can be allocated on the C heap before a GC must happen. The default value is 16 MB. It means that if your application has already allocated 16 MB on the C heap via malloc, it's going to trigger a new GC — even if, let's say, your initial heap slots number was pretty high, because heap slots don't store all the objects, as you know. So this variable covers all the live objects that require more than 40 bytes, and 16 MB is generally a pretty low value. So, we can see this with a Rails app. The first thing I'm going to do is define a really high value for init slots — actually, I'm going to unset whatever I defined here; it doesn't matter. I'm going to start. I've already included rbkit here: rbkit in the Gemfile and Rbkit.start_profiling. So, it already knows how to do this. And I have the profiler running and connected, really nice. Whenever you're profiling an app, a Rails app in particular, it's usually a much better idea to start it in the production environment. So, as you can see, even though the init slots value is much higher, there has already been a GC on boot. And as the application grows, there will be more GCs. So, you don't want this to happen either. So, what we can do is close that and define a higher value for the malloc limit — I'm going to make it 64 MB. And restart — sorry about this, almost done.
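The 40-byte-slot versus C-heap split described above is easy to see from Ruby — a hedged sketch (the GC.stat key names are MRI internals and version-dependent):

```ruby
require 'objspace'

big = "x" * 1_000_000   # the string payload lives on the C heap via malloc
puts ObjectSpace.memsize_of(big)  # ~1 MB, far beyond one 40-byte RVALUE slot

# Bytes malloc'd since the last GC, and the limit that, once crossed,
# triggers a collection -- this is what RUBY_GC_MALLOC_LIMIT tunes.
puts GC.stat[:malloc_increase_bytes]
puts GC.stat[:malloc_increase_bytes_limit]
```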
So, you can see that the number of garbage collections you see now will be much lower, hopefully. It crashed, actually. So, there's some problem here. So, yeah — that's all I have for this. The code is already available on GitHub, and you can use it. You'll find me on Twitter and everywhere. My name is Hemant Kumar. I'm from Bangalore and Atlanta, depending on which time of year it is. Thank you. I have time for one or two small questions. So, the question is: what kind of overhead do you see when you run RBKit in production? And the answer is that RBKit can be run in two ways. In one mode, it just starts the server — it starts the ZeroMQ server, but it won't install the tracepoint hooks; it's just listening for incoming client connections. So, in that mode it has zero performance penalty, actually. Then, from the desktop client, you can manually trigger "start profiling", and at that point there is indeed some performance penalty, because when each object is allocated, we are running some code. So, yeah. But it's still not that bad, because all we are doing is recording objects, putting them in a hash table, and then every thousand milliseconds — every second — we send the data to the desktop client. So, yeah. So, the next question is: this is quite a big project; what kind of problem did you have that made you write so much code? So, I've been working with a client in England, and we had some issues there with the memory usage of Rails processes. But beyond that, I was, in general, very enthusiastic about the instrumentation support that was there in 2.1, and I wanted to see what could be built on top of it.
There are a lot of command-line tools where you write a dot file and then process it with Graphviz and do something like that, but they feel so inaccessible to, you know, a common developer sometimes. So, I just thought, okay, we can do something really great with this — so let's do that. That was the idea, more or less. Yeah. So, I didn't want to go deep into all of that, but this third link — and obviously I'll be posting the slides online — covers all the environment variables and their explanations in a lot of detail, actually. Okay. So, yeah. Thank you again.