 All right, did everybody make it to Jim Wyrick's great keynote this morning? All right, good, because your mind needs to be kind of limber for the adventure we're about to set off on. OK, so I titled my talk There and Back Again, or How I Set Out to Benchmark an Algorithm, and ended up fixing Ruby, because that's basically exactly what I did. And I would like to share the story of that adventure with you. OK, so first of all, my name is Joshua. I live and work from home in Ankara, Turkey, which is actually a very lovely place. And I work for a company that used to be called ELC, but we're actually in the process, right in the middle of the process of rebranding is Burnside Digital. So if you go home from RubyConf and want to find out more information, Burnside Digital is the place you want to go looking. All right, so like any good story, this one comes with a warning. This is an advanced presentation. While I do not expect you to be an expert on the inner workings of Ruby, I do hope that you have at least some familiarity with pointers, memory allocation, memory pages. If you're prone to motion sickness or become nauseous at the sight of hexadecimal memory addresses, this talk may not be for you. And if you want to take a moment now to check out one of the others, I would not be offended. I realized when I proposed this talk that it was a little ambitious, but I figured, what the heck? Maybe somebody else out there has as much fun doing this kind of stuff as I do. Okay, I do want to make sure that everybody gets a little something out of this talk, though, so I want to, before we jump into the actual adventure story, do a little bit of talking about contributing to open source, Ruby or any open source project, really. I'm going to be talking in the context of Ruby, but this applies pretty ubiquitously. So I've been hanging out on the Ruby Core mailing list for a while, and there's a question that pops up occasionally. You get some gung-ho programmer, discovers Ruby or gets really excited and wants to contribute. How can I contribute to Ruby, they say, right? Let me add those bugs. Well, one thing that I've found is the best way that you can contribute to Ruby or any open source project, for that matter, use it. Don't go looking for the bugs. Use Ruby, and the bugs will find you. And that's actually a sort of a more efficient way of going about contributing to a project because after all, we write software not to write software, we write software to run software. So the more you run it, the better it is going to be. Now, let's say you've been using Ruby and you happen to stumble across a bug. What next? Well, I want to introduce you to the four Rs of bug reporting. These are things that you should do if you want to be a good open source citizen. So you want to reproduce, report, reduce, and regress. Okay, let's look at those a little closer. So this is an often overlooked step in reporting bugs, but the first thing that you need to do if you think you found a bug is make sure you found a bug. A bug that happens once is not a bug that you can really go about fixing. So the first thing you want to do is run the code again and see if it crashes, or if whatever the bug happens to be reproduces. Can you make it occur every time? Now, not every bug is 100% reproducible. Once you get into things like threading and memory allocation, maybe it's only 60%, 50%. It should be more than zero, though. If you can't reproduce it, you're not gonna be able to fix it. In the process of reproducing it, there are a number of things that you need to be aware of. Ruby can be influenced by a number of different configuration files, an IRBRC, a GEMRC, various other RCs, the environment that you're running it in, the particular shell, shell configurations, environment variables, all of these things can have subtle effects on the way that Ruby is running. So make sure when you're trying to reproduce this that you pay attention, especially lately, a lot of people have been using RBM and RBN, and these also do a lot to the environment that may alter it in such a way that it affects whether or not the bug is going to occur. The best thing that you can do if you're trying to reproduce a bug is ask a friend. Get one of your buddies on chat or whatever and say, hey, I ran this code, I think there's a bug, can you run it? Do you see the bug? If two people see the bug, it's probably a bug. Okay, so once you're sure you have a bug, the next step, report it. Again, something that is quite frequently overlooked. You have newbie programmers come into something like Ruby or a library that's out there and they're using it and they say, oh, that looks like a bug, but I don't know, somebody else must have found it. I don't need to report it. No, if you find the bug, report the bug. Duplicate bug reports are not a problem, actually. Some people might complain about them, but honestly, duplicate bug reports are a good thing. They're a good thing for two reasons. One, if a particular bug is reported, and again, and again, then that means that people are obviously encountering this bug. This is an important bug. This is how open source projects prioritize their fixes. The bugs that get reported more often are fixed sooner. The other reason duplicate bug reports are good is because not everybody is going to hit the bug the same way, and one of the most challenging things about fixing any bug is narrowing down the specific set of criteria that are needed to trigger the bug. So if you have multiple bug reports, you can look for the commonalities between them to help zero in on what the actual bug is. Okay, so Ruby has a bug tracker, bugs.ruby-lang.org. When you're reporting it, make sure you include all of the steps. It's really good if you can actually just attach a little script and say, run this script, you'll see the bug. That's great. Make sure you make a note of what Ruby version you're using. It should go without saying not every Ruby version has the same set of bugs, so if you found a bug in, say, 193, it may already be fixed in 2.0. So make sure that you report the version. And if the bug that you're reporting happens to be a crash, make sure you attach the crash log. Now, if you're on OS 10, oh, and by the way, I should mention that this talk presumes you are on OS 10, a number of the tools that we'll encounter later are on OS 10. There are equivalents on other operating systems, but just so that caveat is out there. So on OS 10, there's this app console.app. If you don't find your crash log in your terminal, you can go to console app, and there's this little section, it's kind of hard to read, but it says user diagnostic reports, and there you will see all the various programs that have crashed and their crash logs. And you should see one that says Ruby underscore, and then it'll give a date. So you just look for the one that corresponds to the date that the crash actually happened, and you get the full output of everything that happened there. Okay? Next step. Now those first two steps, those are kind of the bare minimum for reporting a bug, right? Make sure it can be reproduced, and then actually go and report it. Now if you want to be a really good open source citizen, the best thing you can do is to reduce the bug. So if you find a bug running a Rails app, right, and you go and you say, well here's a 100K line Rails app, and I was running it, and I did these 30 things, and then I found a bug. Okay, that's a bug, but it's not gonna be horribly easy to reproduce, right? So try as much as you can to eliminate the need to include any extraneous libraries. Try and zero down on just the code that's crashing. Simple scripts are the best thing that you can have, five or fewer lines. I know that sounds really, really small, but seriously, if you can get it down to five lines of Ruby or less, that's the ideal case. And aside from making it easier for whoever's going to try and fix this bug to understand, we'll see a little bit later that some of the tools that you're going to use to try and fix the bug depend on the bug being a small test case. And finally, if you're feeling really gung-ho and you really wanna be a really, really good open source citizen, try and regress it. So Ruby has multiple different versions, multiple different patch levels of those versions. You can try all of those and see does it occur on 187, does it occur on 193, does it occur on 2.0, what patch level of 19. If you're really ambitious, Git bisect is an amazing tool that you can actually just set to run automatically and find the exact commit that caused your bug. That's really helpful. It doesn't always work, but if it does. And finally, if you really are ambitious, you can also try different build settings, try different optimization levels, but that's kind of almost going a little too far at this point. Okay, I said four, there's actually five. If you wanna be a really good open source citizen, try and fix it. And so that brings us to our story. So a long time ago, about six months, maybe a little longer, in a faraway land called Turkey, I was working on a delegation library. Now, yes, I know this isn't technically what delegation is, we've all been lectured on the delegation pattern is not what you're doing in Ruby. All right. When I say delegation, what I mean is I have an object that I'm going to pass to the initializer to a second object, and I want methods on that first object to be called when I attempt to call them on the second object. There's a number of ways that you can achieve this in Ruby. So there's what I was calling pre-delegation, where as soon as I initialize the object, I'm going to take all of the methods on the second object and hoist them into the object that I'm creating using defined singleton method. Then there's what I was calling post-delegation, where I don't hoist the methods until the first time they're called. So I rely on method missing the first time you call a method, and then after that, I'm going to hoist the method into my object, and from then on, it'll be called as if it was a method on this object. Then there's on-demand, where it's basically just going through method missing every single time. Then there's this complicated beast that I was actually interested in a delegation library that would allow me to do quick calls to the delegate, but also swap out the delegate at run time. So I came up with this crazy scheme where I undef method when I decide that I want to reset the delegate and all the rest of that, right? So that was, and then the next step, benchmark it. How do I know which of these strategies to choose? Well, run through a couple of different scenarios. Find out what the run time looks like. That'll help me make an informed, evidence-based decision. So that's what I did. Now I ran through those kind of quickly. I put up a couple of blog posts on our company blog describing the backstrand a little bit more detail why you want to benchmark and how I came up with those various tests. But as I was running my delegation benchmark, when I got to one of the later benchmarks that was putting a lot of stress on the system, all of a sudden, bang, crash. Ruby crashed. I found a bug. Now I don't know about you guys. Maybe you get annoyed when you find a bug. I get excited. I think yes, all right. So, okay. So going through those four Rs, right? Before I reported it even, I was able to reproduce it. That was not a problem. Every time I ran that benchmark, it crashed. Before I reported it, I wanted to reduce it because this is actually, I know you can't read this. You're not supposed to. This is the entire benchmark that I was running, right? And that's a lot of code. So the first thing I did is those blocks at the bottom are all the different benchmarking runs that I was doing. It was only the last one that was finally causing the crash. So I said, okay, what if I take out all of the benchmark runs before that, right? Will that last benchmark run on its own still trigger the crash? And yep, it did. So then I said, okay, well it's crashing on the first of the methods that I'm benchmarking in that last benchmark method. So let's eliminate all of the other delegation strategies that I was trying. Just go with this pre-delegation one and see if just running that in that last benchmark causes a crash and yep, it certainly does. Okay, so at this point I don't need the benchmark library or I suspect that I didn't need the benchmark library anymore. I could just hoist the code out and run the method that was actually being benchmarked on its own, see if it's still crashed and yep, it did. Now, when I was doing this benchmark in this inner loop here, right, so I've got the 10,000 times do this initialization and then 10,000 times call method, right? I was actually calling three different methods in the benchmark that was triggering the crash and I wanted to know, do I actually need to call all three of those? What if I just call the first one? So the first method I called in the benchmark was the length method. So let's see, if I do this, is it still gonna crash? Nope, that didn't crash. And this is kind of the art of creating a reduced test case. You sort of vector towards the minimum amount of code that will trigger the crash and as you do, you'll probably end up eliminating the crash at some point. So then you got to backtrack and say, okay, what can I do differently, right? So the second method that I called of those three was reverse. And I said, what if I call just reverse, right? Do I actually need to call all three or is this one enough? And yep, that one did cause a crash. So the next thing I wanted to know is do I actually need to hoist all of the methods from my delegate onto the object that I'm created or is it enough for me to simply hoist the reverse method, the one that I'm calling, right? And so I eliminated that iteration through all of the methods and just did that single defined single method with reverse and yep, that's still crashed. So at that point, I was pretty much at the minimum amount of code that I could have that would still trigger the crash. The last thing to do is to play with the number of times of iteration and so you'll notice that I only needed to run the outside loop a thousand times. The inside loop, I still needed to run 10,000 times in order to trigger the crash. And so now I have my minimally reduced test case that this is the minimum amount of code that I need to trigger the crash. At this point, I went and I filed the bug and then I went to sleep because it was probably about 2 a.m. on a Friday night. But then I woke up on Saturday and I couldn't stop thinking about it. I said, I wanna fix this. All right, so before we set off on this adventure, we need to pack our bags. If you're going to debug Ruby, you need Ruby. So going at the source code, conveniently it's mirrored to GitHub, so it's actually really easy for you to get now. You don't have to worry with subversion checkout commands. And go ahead and build it. Over the years, building Ruby has become incredibly easy. In fact, I do believe you don't even need that first configure step anymore. You can just simply type make and it'll run configure for you before you actually make. If you want to play around with build settings, configure dash dash help has a lot of information about different build settings. We'll see some of them a little bit later. All right, so make sure that you've got the Ruby source code and you can build the Ruby source code. Then study the map. Again, I don't expect you all to know Ruby internals, but there are a couple of things that are useful to know if you're going to head off and start debugging Ruby. I do suggest that, I'm forgetting the name of the book, but the book that was discussed in the session yesterday, Ruby internals. Ruby under microscope, thank you, yes. That sounds like it would be an excellent resource if this is the kind of thing that you want to get into. But at a bare minimum, things that you should be familiar with. Inside of the Ruby source directory, you'll find this file include ruby ruby.h. This is the public C API, but it's also a lot of convenient methods that you can use when you're attempting to debug things. When you look at Ruby source, you're going to see a couple of values that are repeated over and over again. One is ID, and an ID is essentially just a symbol. Then value, which is essentially everything else. Ruby is dynamically typed and C is not. So Ruby sort of fakes out C by calling everything a value. Then there are a couple of methods, good methods to know. RB ID to name, this actually will take an ID and it will dump it out as a C string. It's important that you have a C string because we're going to be in a second using GDB. And GDB only knows about C strings. It can't print the Ruby string, but it can print the C string. So RB ID to name turns an ID into a C string so you can print it out in GDB. Likewise, RB string value Cster is essentially like two string, but it's going to, as the name implies, output a C string so that you can view that in GDB. And finally, as we're going to be looking at some back traces, VM call method is an important one to pay attention to because this is how methods get called in Ruby. Okay, and finally, make sure you bring your gardener because what's an adventure without your gardener. And in this case, that's mini Ruby. So if you build Ruby and pay attention to the build log, you'll notice the first thing that happens when you build Ruby is that you actually build mini Ruby. Mini Ruby is sort of a bootstrap, although it's not exactly what you would consider a traditional bootstrap. It's actually a very capable Ruby executable just without a lot of the extensions and standard library parts. So it's smaller. It still has the Lexar parser interpreter, VM runtime, all of the core library or core classes. So long as your bug is in one of those core library classes, you actually don't need to run Ruby to debug. You can run mini Ruby. And this is convenient because mini Ruby builds a heck of a lot quicker than Ruby does. And if you're going to be putting in print dev statements and all the rest of that, it's nice to be able to rebuild quickly. Okay, so we're ready. Let's set off. The first weapon that you need to get proficient with if you're going to be debugging anything in C-Land is GDB. And the way that this works is you simply give GDB the executable to run, which in this case is going to be mini Ruby. What will happen is GDB will start up and then it'll stop and it'll wait. And that's so that you can do any sort of break point setting that you need to do before you begin running the actual program environment variables. We'll see all of this in a second. Then the arguments that you would normally pass to mini Ruby, right? So if you're going to have a script that exhibits the bug, normally you would do mini Ruby and then pass the name of the script on the command line. Well, because we're inside of GDB, instead of passing it on the command line, you actually pass it as an argument to the run command in GDB and that'll pass it through as an argument to mini Ruby. And then in this case, because we had a crash, we wait for it to crash. So here's what that looks like. Okay, so we're starting up GDB on mini Ruby. It's going to give you some introductory information, copyright notices, you know, this is GDB as compiled by blah, blah, blah. None of that really matters. And then for a first go, let's just run our program. Now I have this tradition, I like to name my bug scripts bug.rb, so I don't have to remember the name, I just always do a run bug.rb. And so that's now going to tell us it's going to start the mini Ruby program executing with the argument bug.rb. It's loading all of our libraries, shared symbols, and there's our crash. So we crash in tiny free list, remove pointer. What the heck is that, and how the heck did we get there? All right, well, the backtrace command, or BT for short, will show us the stack trace at the location of the crash. And in this case, this is actually much longer. So like I said, I tried to fit as much as I could onto these slides without making it crazy, so the output is actually way longer if you do this on your own. But if we look backwards through our stack frames, a little bit back, back, back, eventually we'll see there is a VM call method. And if you've looked at the call signature for VM call method, you're going to see that, one of the arguments that it takes is recv, which is the target of the call. And so, gee, wouldn't it be nice if we could figure out what object we were actually calling a method on, right? This is just trying to figure out which direction we're going, right? What happened at the point of the crash? So we use our RB string value cster with the memory address of the receiver and crash. We got a crash inside of our crash. So this time, Ruby is telling us that we can't do object allocation during garbage collection. What? Well, go back to the back trace that we had before and you'll notice between the method that actually triggered the crash and the VM call method, at some point we were triggering a GC lazy sweep. All right, so this crash is actually happening inside of the GC routines. And if you study some of Ruby's internals, you'll know that once you're inside of the GC routines, there's not a lot that you can get away with, right? So you could get around this, you know, the RB string value cster method actually goes in and looks at the structures and figures out what the name is based on values stored in different structures and pointers and you could trace that down on your own, but you know, and I did, but it didn't really actually tell me anything. All right, so let's go back to the message that we got. In our terminal, the first time that we saw the crash, it said, incorrect check sum for freed object, object was probably modified after being freed, set a break point in malloc error break to debug. Now this is something that OS 10 does. Malloc error break is a method that doesn't do anything. It simply exists as a method for you to set a break point on if you're attempting to debug something that is error in malloc, right? So let's go back to GDB, start it up again. This time before we run the program, let's set the break point. Now you notice GDB does this weird thing, it tells us you're breaking on some method, I don't know what this is, right? That's because the libraries haven't been loaded yet. So it says, should I wait and see if this method shows up and you say, yeah, okay, go ahead. And then you run the program, right? And then it says, oh yeah, there's malloc error break. Okay, so the break point is set. And then we're gonna run, run, run, and crash in the exact same place. So setting that break point didn't exactly help anything. Hmm, okay, enough with GDB. Time to turn to some more advanced weaponry. So if you go into the OS 10 man page for malloc, you'll find that there are a whole collection of really interesting environment variables that you can set that modify the way that malloc works. One of those is called malloc stack logging no compact. All right, it's fun to say. What happens when you set malloc stack logging no compact is that every time there is an allocation or a free of memory, right? So every time memory is allocated or freed, the system is going to look at the stack at the point that free or allocation happened and record it to disk. And then there's a tool that you can use malloc history. You can go in and you can retrieve those stack traces, right? And this is the history for that memory address for the entire runtime of the program. Now first thing we're gonna need to do before we set off on GDB again, and actually probably could have done this a little bit sooner, is go back to that configure make step and make sure that we are making with our optimizations set to zero and our debugging symbols included so that we can actually see all of the memory addresses. Because if you go back and look at those back traces before you'll see a number of them it said inlined or value not available because of optimizations and stuff like that. So recompiling and optimization level zero will allow us to see all of the memory addresses that we need to see. Okay, so start up GDB again, before the program runs, set the environment variable. Malloc stack logging no compact equals true, right? And in GDB it's confusingly set environment instead of set end because whatever. Run and you'll notice as soon as we start running we start getting some logging messages. Malloc is telling me, okay, I turned on your stack logging and I'm not gonna compact the stack logs but your disk is gonna fill up really fast. And this is why it's so important to have a reduced test case because as soon as we get into these debugging tools they stress the system pretty significantly. Okay, so we let it run and pay attention to the process ID that it reports to us for what we're running because we're gonna need that when we go to use Malloc history and there's our crash again, same crash as always. Okay, so now when we look at the back trace we're gonna go and these methods appear because we're optimization level zero now, right? Before the Ruby X free and VMX free were all inlined so we didn't see them. So we can actually see the pointer that Ruby is attempting to free when this crashes. And so now we can take that pointer, we can take the process ID from the screen before, we can go Malloc history, process ID, memory address and bam, we get this whole, and I mean by the way this is like 20, 30 pages of this, I just showed the very end of it, right? Every time this memory address was either allocated or freed anywhere in the Ruby running process it records the stack, right? And it reports the stack with pipes. So you can see that this is the last time that it was allocated and freed. You can see the last time that it was allocated and freed it was a start main Ruby run node, all the rest of that, that, that. And actually if you look the free one actually matches the back trace that we got for our crash, because this is our crash, right? We're crashing as we're freeing this block. Now what you can do is you can go back, so at some point there was a problem with an allocation or a free, right? The bug that told us that somebody was touching our memory inappropriately, right? Probably after it was freed. So we can look through this log and we can look for things like, for example, you can make sure that every allocation and free is matched up, right? So there are different things that Ruby allocates and frees. There are procs and blocks and all the rest of that, right? So you can see here the last allocation was a proc new that allocated it and then a proc free that freed it, right? So this matches up, and actually if you go back through the history, it matches up the whole way back. Hmm, that didn't actually tell us much of anything. Kinda cool to look at, but we need something else. We need gmalloc. And you can read the man page for gmalloc, I suggest you do, it's actually kind of entertaining. But I think the first line summary is enough to give you an idea. Says guard malloc is an aggressive debugging malloc library. Let's see how it looks. Okay, so this is your memory. You've got pages and normally what malloc is gonna do is it's gonna give you a little bit of memory from a page and another page and then it's gonna go to the next page. So it's allocating your memory and when you free, it's gonna mark something as freed and then it's gonna allocate some more and then maybe you come along and you want another pointer and it's got its free list of pointers that have been freed so it recycles that memory and gives it back to you, right? This is how malloc normally works. This is how your system runs every day. gmalloc does something interesting. When gmalloc allocates memory, it allocates it on the edge of a page boundary and then it marks the next page as unusable. So what this does is for example, if you were working in C and you allocated enough space to have say an array of 10 inch and then you went and attempted to access the 11th member of the array, right? That's a bug. But normally C will not complain. It'll just give you whatever is at that memory address, right? C doesn't care. So what gmalloc does is to catch those kinds of bugs. It ensures that you cannot access the memory after the end of the memory that you've allocated, right? So the next allocation is gonna happen on the next page boundary and then the next page after that is going to be marked off. Do you see how quickly we're using up memory? Again, this is why it's so important to have a reduced test case. gmalloc really stresses the system. And on top of that, the other thing that gmalloc does that's particularly important for our case is that when you free memory, it doesn't return it to the system. It marks that page as unusable as well. So if you attempt to use memory after you freed it, you will crash. So the whole point of gmalloc and the man page explains this, right? Is your program should crash? It should crash immediately where the bug occurs, right? The reason that we are having such a hard time finding this bug is because somebody allocated memory, somebody freed memory, somebody touched the memory inappropriately, then somebody else allocated the memory, somebody else freed the memory and that's where the crash happened, right? The crash happened all the way over here, but the bug was somewhere in the past. So gmalloc should help us catch that. So here's what we do. We start up GDB again. We have to set our malloc stack logging no compact. We wanna know what happens on all these allocations. We set this dial D insert libraries environment variable, which on OS 10 tells the dynamic loader use this library, not the one you would normally use. We tell it to use libgmalloc and then we run. And again, we're getting all these nice wonderful errors about or warnings about how much our program is gonna suck up all the resources of our machine. And now when we crash, it's a completely different crash. This is something. This looks like it could be the location of our bug. We're setting a mark on this ME, which stands for method entry on CFP, which is the not current frame pointer. That's corrected me yesterday, but I forget. Anyway, so we can print out what is CFP ME and there's the memory address. Okay, so we have a memory address that's been inappropriately touched. We have a process ID, which we recorded before. So now we go in, we do malloc history, we give the process ID, we give the memory location and it tells us the last time that it was freed, right? So somebody freed this memory before they should have because somebody else is attempting to mark it. The GC mark phase is attempting to mark this memory is still in use, but it's already been freed. Who was it? Turns out it's that guy. Free method entry I. Now if you go and you look at the source for free method entry I, it turns out it's one line. Should have been three lines because it didn't check to see whether or not the method entry that it was freeing was already marked. So all you have to do is add that guard so that we don't free method entries that have already been marked by the GC and the bug goes away. And that's it. Got a little bit of time for questions, so thanks. Yes. How did I, well because I actually spent a good 48 hours straight looking at almost every method in the stack and trying to figure out who's marking the method entry and when are they marking it and when are they freeing it and tracing through all of the methods. So no, I mean, it's a fun adventure. If I told the whole story, we'd be here till nightfall. So yeah, I kind of shortcutted that but took a little bit of our, what's that? Did you get blinded to find out who? Yeah, the same person that fixed it is the one who broke it, so yeah. If we were statistic enough to want to reproduce this, did the blog post everything you mentioned before with the Ruby version and all that? Well, that blog post isn't up yet. I'll actually be posting my slides. Yeah, I forgot to mention. I'll post my slides so that, because I know the text is kind of small, so you can go through it at a more leisurely pace. Turns out as far as Ruby version, you can actually run the most recent 193 because it didn't get back ported to the most recent patch level. So it's only fixed in Ruby 2.0. Whoops. Any other questions? Well, thank you for entertaining my little adventure story and hope you enjoyed.