Let's continue talking about performance here in this room. This is a talk about some performance secrets, but most importantly, about how to uncover them. So let me say a few words about who I am. My name is Alexander Dymo. I've been coding for a while, and I've been doing Ruby and Rails since 2006. Before that, I was coding in C and C++, so I got used to the best performance. So when I started writing Ruby, I immediately noticed that my code was slow, and I had to optimize. I have never stopped since then; I optimize every day, and I still do that right now. I have given a lot of talks on Rails performance at various RailsConfs and other conferences, and I have also written some articles on Ruby performance, which you can find online. Some of them are less relevant, some of them are more relevant. And I am writing a book with the Pragmatic Programmers on Ruby performance, which I hope will become the first comprehensive book on Ruby performance ever, because I'm not aware of any other book right now that has all the information about performance. So by all means, go to rubyperformancebook.com and sign up for the mailing list. I'll notify you when the beta is ready, and it's going to be soon, and I hope that the book will be released next year. But this talk is not about the book. So before we start, big thanks to the sponsors who helped me get here. This is Ninefold, of course, the hosting that will give you more performance for the same money; GitLab, the open-source GitHub alternative; and Algae Energy, the only energy company that will not screw too much. So given my background in performance and the various talks that I have given, you could have expected me to talk about performance tips, how to write well-performing Ruby, and best practices, but this talk is not about that.
So today, I am going to talk about how you can find these performance tips and tricks, and how you can come up with the best practices yourself. I will show you how I did it, and hopefully you will learn by example how you can do it. So I'll give you two examples. The first one is complicated and the second one is simple. Here's the first example. What can go wrong with this code? It's just a loop inside a loop, and there is a sum. For those who do not do Rails, sum is basically inject. So what can go wrong? I found that this code was slow, and it was slow only sometimes. What I thought I had found is that this was because of the sum, or inject. So I replaced sum with a while loop, but let me show you the simple example. On the left is just one line, and on the right is the ugly while loop that does the same thing without inject, but faster. How much faster? For me, it was something like 100 to 200 milliseconds faster, and only sometimes. So what can be wrong if your code is slow sometimes? The answer is: it smells like garbage collection. And that's exactly what happened. So I wrote a simple synthetic example of the same code. On the left is a loop which calls inject 10,000 times. And on the right side is the same loop with while. The loop doesn't do anything; it just calls inject or while. And it has the same performance characteristics as the code that I showed you before: the right side is faster sometimes. So if we suspect garbage collection, what can go wrong with inject? Garbage collection happens only if you allocate too much memory, or you do too many allocations, that is, you create too many objects. I don't think inject can allocate a lot of memory. It doesn't do anything, right? So the only reasonable assumption would be that inject allocates new objects by itself. So how to check this? There is only one way to check: measure. And I used the profiler for that.
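The slides are not part of this transcript, so here is a rough sketch of what the synthetic comparison could look like. The method names, the sample array, and the `allocations` helper are made up for illustration. It counts allocated objects with the standard `GC.stat(:total_allocated_objects)` counter rather than with the profiler used in the talk, and the numbers it prints depend on the Ruby version, since newer interpreters may allocate fewer objects for inject than the one discussed here.

```ruby
# Sum an array with inject (stands in for the one-liner on the left).
def sum_with_inject(values)
  values.inject(0) { |total, value| total + value }
end

# Same sum with an explicit while loop: no block, no iterator.
def sum_with_while(values)
  total = 0
  index = 0
  while index < values.size
    total += values[index]
    index += 1
  end
  total
end

# Hypothetical helper: number of objects allocated while the block runs.
# total_allocated_objects is a cumulative counter, so a difference is enough.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

data = [1, 2, 3]
inject_allocs = allocations { 10_000.times { sum_with_inject(data) } }
while_allocs  = allocations { 10_000.times { sum_with_while(data) } }

puts "inject: #{inject_allocs} objects, while: #{while_allocs} objects"
```

A counter like this only says how many objects were created. To see which method created them, you still need the profiler.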
So there is a way to profile memory in Ruby, which some people know about and some people don't. But it needs a patched interpreter. By default, Ruby will not give you back the information about memory allocations and memory usage. Or it will, but not in the right places. So if you are going to use the profiler tool which I will use, you need a patched Ruby. For a long time it was hard to install and the patches were bad, but right now it's easy. Many people here probably use RVM, and it's just one rvm reinstall away. There is a set of patches called the railsexpress patches. Those are memory profiling plus more patches, and you can easily install them using RVM. The only thing is that at this point in time you cannot use 2.1. That is the reason I crossed it out: memory profiling does not work there yet. Either I will fix it or somebody else will fix it; it's just a matter of fixing. Right now it doesn't work, but in a month it may work. So if you're using 2.1, it's easy to go back to 2.0, profile your memory, and go back to 2.1. It's not going to be a huge problem. And of course, if you're still using 1.8, if there are such people, there are patches for 1.8 as well. The profiler that I use is called ruby-prof. I know there are several profilers available, but I like this one. I like it because it's a gem, and I like it because there are good visualization tools available. One of them that I like and personally use is called KCachegrind. It was originally written for Linux, but there is a Mac version and there is a Windows version. So you can use ruby-prof with a patched Ruby and with KCachegrind or QCachegrind to see what's wrong. So this is how. The command is: run your Ruby program inside ruby-prof, and just tell it which mode to use. In this case, I'm interested in the number of allocations. So I pass...
Okay, it should have been --mode, but I pass allocations to it, and I ask it to give me the output in the callgrind format, which is call_tree. And then I visualize it. So let's go back to this code and talk about what we expect. How many allocations do we expect? How many do you think there are on the left side? It's this simple loop, and inside the loop we allocate the array every time. So we have 10,000 arrays, at least. And on the right side, it's the same: we allocate 10,000 arrays. And that should be it. And of course, that's not what we see in the profiler. This is a visualization of the profile of the inject. As you see, the whole program actually allocates 30,000 objects. 10,000 are allocated in this Integer#times; those are the actual arrays. And 20,000 objects are allocated by inject itself. You can see that there is a column called Self in the output, and it shows 20,000 objects for the inject that does nothing, right? It's just an empty inject. It does nothing, but at least in this case it allocates two objects per call. Why? We'll see why. And look at the while loop. This is the profile of the while loop code. No extra allocations. Just 10,000 arrays, and that's it. So what happens? And, more importantly, how can we understand what happens? How can it be that inject is slowing down your Ruby application? The only way to know is to actually look at the source code. In this case, inject is implemented in C, so let's go and see what it does in C. For those of you who are not familiar with C, do not fear. You can always take a look at the C code and understand what it does, because Ruby is so cleanly written that even if you don't know any C, you can actually understand what happens. In this case, it's easy to understand. There's an object that inject creates that is called memo. That's internal storage for the inject iterator. And this is exactly one of the two extra objects that we create.
But just by looking at the code, I couldn't tell where the second one is. So that's the first one; where is the second one? I couldn't tell. So I ran this inside gdb, inside the debugger. This is also quite easy to do: if you don't understand the source, you can just run Ruby inside gdb. It's as simple as typing gdb and then the path to the Ruby executable. In this case, I knew where my code is, so I knew the function name. So I used the l command, the list command of the debugger, to see the source listing for the function. And I used the b command to set a breakpoint on that function. And then you can run your Ruby code. In this case, I'm just passing the Ruby code as an argument. So I used the r command, the gdb run command, and then passed -e, which is the usual Ruby interpreter option, and then my code. I'm interested in just one inject. And then I step over some things. I see that inject is creating this memo object, and then it goes to rb_block_call, which calls the block that I passed to inject using the iterator. So I go inside rb_block_call. It sets up some things, and then it calls the iterator itself. And here is what it does: when it calls the iterator, it converts my block into a function, a function node in this case. And this is the second object. Here we go. So this is why Ruby created two objects for me instead of zero, as I would have expected. So this is the code. This is the first one, and this is the second one. So two objects, two node objects, per inject call. In my case it's called 10,000 times, so I get 20,000 extra objects. As you can easily guess, that's a lot of objects, and you can easily pass the garbage collection threshold. So you can trigger garbage collection because of these allocations. It's indeed some work for the GC. You may ask whether other iterators are as bad, and the answer is yes and no, of course. So I compiled a huge list of iterators, and here's just a sample of it.
So the bad guys are all?, each_with_index (surprisingly), inject, find, and any?; and the good guy is each. If you call each, not each_with_index, you're just fine. You're not allocating any extra memory. If you want to see the whole list, of course, it's going to be in the book. That's a shameless plug. But that's not what I am trying to urge you to do. You see that by using a profiler, by using a debugger, and by looking at the source code, I actually understood what happens. I understood why this is slow, and I understood that I probably should not use bad iterators like inject and all? inside other loops, because they might become a problem. So this is the tip which I can give you, but the value is not in the tip. The value is in the process. You can repeat the same process. If you see something weird happening, just run a profiler, go look at the source code, go inside the debugger to see what Ruby actually does, and then you will understand what happens and why. So that was one example. And I have another example which I really like. As you might know, people always tell you to use bang functions. Everything that ends with a bang is good for performance, right? Even if you didn't know that, I'm telling you: everything with a bang at the end is good for performance. It's not supposed to allocate a new object. It's supposed to modify the object in place. So here's a 10-megabyte string, and I'm doing a replacement. In case you don't need the previous version of the string, you just use the bang version, or at least you're supposed to. Supposed. Guess what happens? I replaced gsub with gsub! in my code and saw no improvement. Then I set out to figure out why. And you know why? I didn't know why. So I went back and ran a profiler on that code. I ran the profiler on gsub! to tell me what's wrong.
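For reference, here is a minimal sketch of the kind of replacement being discussed. The string contents and variable names are made up, and it assumes the 10-megabyte string is built by repetition. It only shows what the bang version promises, namely that gsub returns a new String while gsub! returns the receiver itself. It does not measure memory; that still takes a memory profiler.

```ruby
# Build a string of about 10 MB by repetition (contents are arbitrary).
text = ("a" * 1023 + "b") * 10_240

# gsub leaves the receiver untouched and returns a new String.
copy = text.gsub("b", "c")
puts copy.equal?(text)     # false: a second String object now exists

# gsub! rewrites the receiver and returns it (or nil when nothing matched).
result = text.gsub!("b", "c")
puts result.equal?(text)   # true: same object, new contents
```

Getting the same object back says nothing about how much memory was used while the replacement was being built.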
So in this case, I actually suspected that gsub! does not save me any memory, that it copies memory. The profiler I used is the same ruby-prof, but with memory profiling. So I'm measuring memory allocation, not object allocation. I'm measuring memory usage. And this is what I see. The flat profile section here has this string multiplication that allocates 10 megabytes, and you will see that gsub! itself also allocates 10 megabytes. So here we go. The performance tip did not work. What actually happened is that gsub! did not save me any memory at all. It copied my string in memory to do the replacement. So what I saved is just a slot for one object, which is 40 bytes. That's a great saving. And as with the iterators example, you may ask: are all bang functions bad? No, not at all. I haven't checked them all; I haven't had time to do that. I just checked one, downcase!. And downcase! is just fine. As you can see, there is the 10-megabyte string, and downcase! itself allocates 0 megabytes of memory. That's what I like to see. In this case, the performance tip worked. So again, the thing that you should remember after this is not that gsub! does not save you memory, but that if something bad happens, or something uses a lot more memory than you expected, you just use the memory profiler. It's easy to use. Remember that you do need a patched interpreter to do that. But profile memory and you'll know what happens. So in any case, challenge all performance tips, tricks, and best practices. Just try them for yourself. Run them inside the profiler and see what you gain. Because in this case, I committed this crime myself. I was always telling people: use gsub!, it will save you memory. But it just doesn't. So I was wrong. So these are the two simple examples that I wanted to show you. And the conclusion of this talk is: go and profile your code. It's simple.
It's not as hard as people tell you. It's simple. You don't have to take any guesses. You just profile and see what happens. So if you don't know what's wrong with your code, you need to profile CPU. That's the default mode for ruby-prof. If you don't specify the mode, you get CPU mode, and you do not need a patched interpreter for that. If that doesn't help, and it will not help in 80% of cases, you need to profile memory. So go ahead and profile memory, and profile memory allocations. Do not profile the number of garbage collection calls. I've seen that ruby-prof can tell you the number of garbage collection calls, but this is what you don't want to profile. It's a derivative of the amount of memory that you allocate. You need to profile the first-order things, not the second-order ones. So profile CPU and memory, and that's it. And if something bad is happening, just look at the source code, see what it does, and try to understand. So again, big thanks to the sponsors. If you're interested in learning more about Ruby performance, or if you are not into profiling yourself, go ahead and sign up for the book mailing list. If you want to talk to me about performance, just let me know on Twitter. And thank you for your attention.