How is everyone feeling? I feel amazing. I feel incredibly good. I was actually in Japan 48 hours ago, so I'm completely jet lagged. However, I decided to go take a nap, and naps are amazing, so I feel really good now. So I'm going to give a talk now, and I'm kind of excited, because no matter how poorly I do, I'm not going to get invited back. In fact, I'm going to make a huge faux pas right now and get my clicker out. It's in my bag. One sec. That wasn't a pun or anything, me getting my clicker. Okay. So this talk is titled Reducing Memory Usage in Ruby. Hello. Hello. I am so excited to be here. My name is Aaron Patterson. You may also know me on the Internet as Tenderlove. If you don't recognize me here on stage, this is what I look like online. I do look different. That is a wig. Some people don't know that. "Aaron, you've cut your hair." Yes, I did. I work for a tiny startup company called GitHub. You may have heard of it. It is the only legit company I've ever worked for. I love using Git, but I will not force push it on you. Yes. I use these puns at every conference now that I have started working for GitHub. However, our company has recently been in the news, so I feel like I'm going to be able to branch out soon. In fact, I think that this will open a lot of doors for me. Did I say doors? I meant to say Windows. I can't wait. I'm going to make so many puns. It's going to be so good. Okay. I have two cats. One of them is named Gorbachev Puff Puff Thunderhorse. This is him. You may have seen me AirDrop him to you, or you may have rejected my AirDrops. I'm not going to call out those people rejecting my AirDrops. This is my other cat, SeaTac Airport Facebook YouTube Instagram Snapchat. What is another social network? I don't know.
Anyway, her short name is Chuchu, but we just keep adding social networks to her name, mainly because cats have no idea what their real names are, so it doesn't really matter. This is the very last GoRuCo. I'm excited to be here. I'm really happy to be here once again in the Big Apple. Francis specifically told me not to call it the Big Apple, so I have to. I'm trying to do all of the typical tourist stuff I possibly can: going to Sbarro, going to Times Square. Getting mugged. Yes, getting mugged. I'm staying at the Hilton up the street, and they have a points program called Hilton Honors, so I'm a member of it, and every time I stay at a Hilton I say, hey, I need you to enter my number in, I'm an Honors member. So I went to the Hilton yesterday and did the same thing, and she said to me, well, you can't use it because you booked through a third party, so it's not going to help you out. So that was my experience at the Hilton yesterday. Anyway, let's talk about GoRuCo first. GoRuCo has always been a special conference to me, and I have a lot of firsts in my career associated with this conference. GoRuCo is the first regional Ruby conference that I ever attended. I think it actually is the first regional Ruby conference; there was just RubyConf, and then GoRuCo was the first regional one. And I loved attending these Ruby conferences because, as Francis was saying earlier, back at that time there weren't many technical conferences for programmers. At the time I had just gotten my first Ruby job, but I used to be a J2EE developer. Don't tell anybody that. So I used to be a J2EE developer, and I would go to conferences, and it was really not fun, because I couldn't use any of that knowledge. First off, we used a proprietary J2EE container at work.
So anything I learned at work I couldn't really use in other contexts, and whenever I went to conferences it was mainly marketing stuff, nothing I could use very much. So going to Ruby conferences felt like a breath of fresh air, and I really felt that way attending GoRuCo as well. GoRuCo was also the first conference I ever proposed a talk to, and it was also the first conference to reject my talk proposal. But here I am today, delivering the last keynote at the last GoRuCo. So this is a huge honor to me. It really is, and I want to say thank you to the organizers. Thank you for all of your hard work. Please give them a round of applause. They deserve it. So we've gone through the jokes part of my presentation; let's get to the actual technical part. I'm going to talk about reducing memory usage in Ruby, and specifically about two patches that I wrote for Ruby to reduce memory usage. We're going to talk about the loaded features cache, and we're also going to talk about a thing which I call direct ISeq marking. And we're not going to look at the code so much as the techniques that I used to find these optimizations, and how to actually derive these particular optimizations. Because I don't think the patches themselves are that interesting. I mean, they're interesting from the perspective of, yes, they reduced the amount of memory that we use. But I think it's more interesting for us to learn how to find those things, because we can use that knowledge in different contexts, and maybe in our own applications. So the first thing we need to do when dealing with memory is figure out what our memory usage is. MRI, C Ruby, is written in C, so we need to be able to find memory usage in C-based programs.
And there are two ways that I typically go about finding memory usage in C-based programs. The first way is not very good. It's a very bad way of finding memory issues. And that is reading the code. I think this is the worst way to find a memory issue in your program. But sometimes you have to read code, so I do that. The other thing that I like to do in C-based programs is what's called malloc stack tracing. In Ruby, we actually have two different types of allocations: we have to worry about allocations made with the garbage collector, and we have to worry about allocations made with malloc. And of course, GC memory is also allocated with malloc, but we have tools in Ruby to help us inspect Ruby objects, so in that case it's not as interesting to me. We could use, for example, the ObjectSpace API that's built into Ruby, or the allocation_tracer gem; these are both useful ways for us to inspect Ruby-level memory, when we're looking at Ruby objects and whatnot. But if we need to find things like array bloat, which I'm going to talk about later, then we need to use lower-level C-based tools. One of my favorites is what's called malloc stack logging. This tool is only available on macOS, but there are equivalents on other operating systems as well. So I'm going to show you how to do this on macOS, but you can definitely reuse this knowledge on Linux. Oh, boy. I don't know about Windows. This talk isn't recorded, is it? Okay. Moving on. So, malloc stack logging. This is how you use it on macOS. First off, you have to enable the logger using an environment variable.
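Concretely, that looks roughly like the following. This is a sketch: `boot.rb` stands in for a hypothetical script that loads the app, prints its PID, and then pauses, and `<PID>` is whatever PID it printed.

```
# Terminal 1: boot the process with malloc stack logging enabled (macOS).
MallocStackLogging=1 ruby boot.rb

# Terminal 2: dump every allocation and free event recorded for that
# process into a log file for later analysis.
malloc_history <PID> -allEvents > mallocs.log
```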
In this particular case, what we do is boot up a simple Rails application, so we're profiling a Rails app. We print out the PID, we run the garbage collector, because we're not really interested in dead objects, and then we pause the process. The reason we have to do this is that we're going to get heap information from the process in a different terminal, and when we start a process, the addresses where things are allocated are randomized (address space layout randomization), so we have to profile a live process in order to know the addresses of things. So we start this up, pause the process, and then in a different terminal we dump the malloc logs using this tool called malloc_history. We give malloc_history a PID, and in this case we say all events, which means I want to know about all of the allocations and all of the frees that happened in the process, and then we dump it out to a log. You need to make sure that you have some disk space, because even for this very, very basic Rails app, the log is a little over six gigabytes. So have some room on your hard drive when you're using this. The log file looks like this, where we have basically an allocation. It says alloc at the very beginning; that tells us it's an allocation. We have a memory address, so we know where it was allocated. We have a size, and then we have a stack. And in the case of a free, we have similar information: we know it's a free, we know what address was freed, and we know the stack where it was freed. So if we process these logs, we can reconcile live memory within the application at basically any point in time. This is an example program to reconcile that data. Since we have a record of every malloc and every free while the program is executing, we can know a few things.
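A reconciliation program of the kind described can be sketched in Ruby like this. The log format here is simplified and mine, not malloc_history's exact output: one `ALLOC addr size: frame;frame;…` or `FREE addr: …` line per event, innermost frame first.

```ruby
# Reconcile a simplified malloc/free event log into live memory.
# Each ALLOC records its address, size, and call stack; each FREE
# removes whatever is currently recorded at that address.
def reconcile(log_lines)
  live = {} # address => [size, stack]
  log_lines.each do |line|
    case line
    when /\AALLOC (0x\h+) (\d+): (.*)\z/
      live[$1] = [$2.to_i, $3.split(";")]
    when /\AFREE (0x\h+)/
      live.delete($1)
    end
  end
  live
end

# Total bytes still alive at the pause point.
def live_bytes(live)
  live.values.sum { |size, _stack| size }
end

# Blame bytes on the function that called malloc: the frame just
# above malloc in each recorded stack.
def by_caller(live)
  live.values.each_with_object(Hash.new(0)) do |(size, stack), tally|
    i = stack.index("malloc")
    tally[stack[i + 1] || "unknown"] += size if i
  end
end
```

With every alloc and free in hand, `live_bytes` gives the memory alive at any sample, and `by_caller` gives the "top allocators" view the talk describes next.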
First, we can know how much memory the program was using at any point in time, so we have a history over the entire time the program was running. Second, we can know what is allocating the most memory, because we have the stack information from everything that called alloc or free. And finally, we can know how much memory is alive when we pause the program. But the bottoms of these stacks aren't that interesting to us, because we don't really care that malloc itself was called here, for example. We don't care that malloc_zone_calloc was called, or how much that particular function is allocating. We care about its callers. So in this particular example, we want to know that rb_objspace_alloc called malloc. Those are the ones we want to blame; we want to know who called malloc. So basically all we have to do is modify the previous program to look at the caller of that malloc call. And if we do that, we can get an idea of who the top allocators are in our program. So here is an example from some sample data: we know that 16% of the memory in our booted Rails process was allocated by rb_ast_newnode. So we can go inspect that function and figure out what is actually allocating all that memory. So in this case, we combine the two techniques: malloc stack logging, instrumentation, plus reading the code. We instrument the process to find where we're using up memory, and then go read that function to figure out how we might reduce it. So now that we know how to find where our program is using memory, let's talk about the improvements that I made. The first one I want to talk about is the loaded features cache; I made an optimization to the loaded features cache. And in order to talk about this particular optimization, I first want to talk about a shared string optimization. This is just a general optimization. So for example, we have a string in Ruby, this particular string here at the bottom, and this is the Ruby code to generate those strings: x, a, and b.
Strings in Ruby are actually backed by character arrays in C. So we have a Ruby object that points off to some character array in C. Now, because of that, we can say, okay, the x variable points at the head of that array. And so does a; even though we duped it, we have two Ruby objects, but they both point to the same character array. And the b object can point into the middle of that array and say, okay, I'm slicing from here to the end. So in this case, we have three Ruby objects that all point to the same C string. This optimization works because Ruby strings are implemented using C character arrays. Now, unfortunately, what this means is that we have to share the string all the way to its end. So here's another example. Let's say we have this x variable and we take a slice from it, but we don't slice all the way to the end. In this particular case, we have to take the first couple of characters and make a new string from them. And the reason for this is that C strings are terminated with a null byte, so we have to have that null byte at the end in order to know where the string ends. So the shared string rule, and there are a few caveats to this particular optimization, please come talk to me later, I love to talk about this and we can get into the fine details, but just imagine this for now, the shared string rule is: always copy to the end if you can. And we're going to use this optimization to reduce memory usage with the loaded features cache. So what is loaded features? What is this thing? It is a global variable, $LOADED_FEATURES, that is basically a database of everything that has ever been required in your program. And if we look at it, it's basically an array.
If we dup the array, require some new file, and then take a look at the difference, we'll see that it has this new foo entry in it. So it has a list of all the things we have ever required. Now, one behavior of Ruby that I'm sure we're all used to is that if you require the same file over and over and over again, it only gets loaded once. So this particular program is only going to load the foo file once. Ruby uses that loaded features array in order to determine whether or not something has been required already. Okay. So an annoying thing is: what is "the same file"? What does that mean? This is actually kind of a complicated problem. If we take a look at these examples, they are all actually requiring the same file. You'll see that the arguments to require are different in each of them, but they all try to require exactly the same file. So we have to have some heuristics about the parameters sent to require. Now, a problem with this loaded features cache is that doing an array search is slow. If we have to search that array every single time we require a file, the boot time of our process is going to get really slow doing array searches. So what Ruby does in order to speed this up is generate a cache of possible parameters that you could pass to require. It's basically optimistic. It says, okay, I figured out the file that you're trying to require. Now I'm going to create a cache of things that you could possibly pass to require that may refer to the same file. So what that cache generation looks like is: let's say we require this file, a/b/c.rb. We'll generate a cache that looks something like this, where these are all the possible parameters that could have been used to require that particular file. Now we have a fast way to look up those files in the loaded features cache.
If we look at the cache structure, it looks something like this. We say, okay, that file lives at index two in $LOADED_FEATURES, so all these possible parameters point at index two. And when somebody calls require, if we get a hit in that cache, we can go look at that index and figure out whether or not it's actually the same file. So there's a function for generating these keys, and this is what it looks like. You don't need to read it too closely. It's actually written in C, but I converted it to Ruby, because this is a Ruby conference, not a C conference. So let's look at how the algorithm works. Let's say we require this string, a/b/c.rb. First we start off with two pointers that point at the very end of the string. Then we move one pointer back to the first period. Then we move the other pointer back to the first slash. From here, we're able to determine our first two keys: one goes from the character after the slash to the end of the string, and one from that character to the period. So in this particular example, we end up with c.rb and c as our first two cache keys. Then we move the pointer back again to the next slash and repeat that process, so we end up with two more keys. We move the pointer back again; two more keys. And we keep scanning until we get to the very beginning of the string. So this is how we end up with all of these keys. Now, the way this works internally, if you look at that actual Ruby code, is that we're doing substrings on this string: we call rb_str_substr on these and just keep copying these strings out. What this means is that we actually end up with multiple C character arrays. This is the structure that it'll look like: we end up with eight Ruby objects that point at four or five different C character arrays. So the other problem is we have to call malloc to allocate all of these.
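The key generation just described can be sketched in Ruby like this. The method name and the scan-by-offsets structure are mine, not the actual C function's, but the output matches the keys from the slides.

```ruby
# Generate every require argument that could refer to this feature
# path: for each trailing path suffix, one key with the extension
# and one without.
def cache_keys(feature)
  ext  = File.extname(feature) # e.g. ".rb"
  base = feature.chomp(ext)    # e.g. "a/b/c"
  # Offsets just past each slash, plus the start of the string.
  offsets = (0...feature.length).select { |i| feature[i] == "/" }.map { |i| i + 1 }
  offsets.unshift(0)
  keys = []
  offsets.reverse_each do |off| # scan from the last slash backwards
    keys << feature[off..-1]    # e.g. "c.rb"
    keys << base[off..-1]       # e.g. "c"
  end
  keys
end
```

For `"a/b/c.rb"` this yields the six keys from the example, in the order the backwards scan discovers them: `c.rb`, `c`, `b/c.rb`, `b/c`, `a/b/c.rb`, `a/b/c`.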
So one technique we can apply is reducing mallocs with that shared string optimization we talked about earlier. Instead of copying everything out of one string, what we're going to do is create two strings. We take a substring of the whole path down to the dot, so we have two character arrays now. Then, instead of taking substrings from one, we take substrings from one of the two, depending on which key we're trying to generate: keys that end in .rb share the full string, and keys without the extension share the trimmed string. In that case, we end up with eight Ruby objects that point at only two character arrays, so we've been able to eliminate three mallocs. Now, the next thing that I wanted to do here is eliminate these intermediate Ruby objects altogether. The loaded features hash pointed at a bunch of Ruby objects, and those all pointed to the C character arrays. But this hash is actually implemented in C, and that means we can bypass the Ruby string objects. We can say, I want the hash to point directly into that C character array. So instead of having this intermediate Ruby object, we point directly into the character array. There's a technique we can use in C to eliminate these Ruby string allocations. So let's take a look at the implementation of this. Here's the implementation, the patch that I wrote. That is the entire thing. It's very simple, as you can see. Yes. So let's measure the impact of this. We can measure the impact by just requiring some files and checking how many objects that allocates. Here's an example of doing that, using the allocation_tracer gem, so we can see how many objects require allocates. And if we look at the output of this, this is the output on Ruby 2.5 versus the output on Ruby 2.6. I know this is hard to read, so I made a graph out of it. The green bars are Ruby 2.6. The blue bars are Ruby 2.5.
And you can see here that we've been able to reduce string allocations by about 50%. Now, what we can do next is take a look at the malloc stack logging and see how this impacts the amount of memory used. This is an example from an actual Rails application boot process. Let me explain this graph. The x-axis is not actually time; it's sample number. Every time we call malloc, we consider that a sample. The y-axis is the total amount of memory at that particular sample. So the very end of the line is the total amount of memory used by the process at the time we paused it, and one of the lines is shorter because we actually call malloc fewer times. So if we can move the line down and to the left, that is better. Before is up there at the upper right, and after is a little bit below that. With this particular patch, we saved about 4.2% overall for a Rails boot process. But of course, this depends on how many files you actually require, and the neat thing is that the more you load, the more you save. How much would you pay for this feature? $100? $200? Well, you can get it today for free, by upgrading to Ruby 2.6 when it gets released. So I can't tell you how much memory this will save on your particular application, because it really depends on how many files you require. But the more you require, the more you save, as I said. I have a little bit of backstory about this patch. I found the memory issues, I wrote the patch, and I was really excited about it. So I uploaded it and posted my results. I was so happy. And then somebody commented on it, a person by the name of funny_falcon. He commented and said, hey, did you compare this to this other patch? And I was like, I have not seen this other patch. So I go and click on it.
And it turns out he had found exactly the same problem that I had, and written a patch that was very, very similar to mine. However, he had posted it five years earlier, and nobody had applied it. So I applied his patch instead of mine. The moral of the story is: always search the issues. All right. So the next patch that I want to talk about is called direct instruction sequence marking, or direct ISeq marking. Unfortunately, this technique means that you need to know a little bit about how Ruby's virtual machine works in order to understand how it works. So we're going to talk about Ruby's VM for a minute. Actually, more than a minute, because I need more than a minute. All right. Ruby's virtual machine is a stack-based virtual machine. What that means is that we have a list of instructions and we have a stack, and these instructions manipulate the stack. The instructions represent our program, and the stack represents our workspace; it's what our program is working on at the time. Now, each instruction has zero or more operands. In this particular case, we have a push with an operand of three. We also have a program counter, and this program counter always points at the next instruction to execute. So the way that the VM executes is it says, okay, I will increment the PC and then do what that instruction said. So here we push three onto the stack, next we push five onto the stack, and then we add the two and push the result onto the stack, so we get eight. But how do we get instructions? We get these instructions by compiling our Ruby code. Our Ruby code gets compiled before it's executed, and that compilation process results in instructions. So the processing phases are basically: we take some source code, the text that you wrote, and we convert it to an AST, an abstract syntax tree. That AST gets converted into a linked list, and that linked list gets translated into bytecode.
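You can actually watch the end of this pipeline from Ruby itself, using the CRuby-specific RubyVM::InstructionSequence API: it compiles a snippet to bytecode without executing it, and lets you disassemble and run the result.

```ruby
# Compile a snippet to an instruction sequence, inspect it, run it.
iseq = RubyVM::InstructionSequence.compile("3 + 5")
puts iseq.disasm # human-readable listing of the compiled instructions
p iseq.eval      # execute the compiled bytecode; returns 8
```

The disassembly shows the push-style instructions (`putobject 3`, `putobject 5`) followed by the addition, which is exactly the shape of program the next slides walk through by hand.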
So we can roughly divide these steps up into essentially parsing and, kind of, compiling the code. Optimizations to your Ruby code are done here, during the linked list phase. That is because linked lists are a bit easier to manipulate than the AST is. And then finally, our product is the bytecode. That's the final product that we want, and that's what we actually execute. So let's take a look at some of the data structures used here. Now, as I said, our program gets converted into an AST, which stands for abstract syntax tree, and that's really just a fancy way of saying a tree data structure. I'm not sure why it's called abstract, because it is a very concrete tree. All right. So first off, we take our Ruby code and we convert it to an AST. Our plus operator is represented by a node, and it has two children, the three and the five. And what's neat is that these nodes are actually represented internally as Ruby objects. This is going to become important. They're represented as T_NODE objects. Now, next we take those T_NODE objects and we convert them into a linked list, and we do that by walking the AST. So first we visit the plus node. The plus node has some children, so we have to go visit those children before we can finish processing the plus node. We visit the first child; three has no children, so we add a node to the linked list, which is our push instruction for three. Then we go to five, and we add a linked list node for push five. Then finally, we're done processing the children of plus, so we add a plus linked list node. So we end up with a linked list that looks like this. Here is the C code converted into Ruby. It's essentially just a recursive procedure for processing these AST nodes. So the next thing that we do is apply optimizations. However, I am going to hand wave over applying optimizations. Yes. Visual pun. Yes.
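That recursive walk can be sketched in a few lines of Ruby, with AST nodes as plain arrays (a toy representation of mine, not the real T_NODE layout):

```ruby
# Walk a toy AST depth-first, appending instructions to a list.
# Children are compiled before the node itself, so their values are
# already on the stack when the operator instruction runs.
def compile_node(node, list = [])
  case node.first
  when :lit
    list << [:push, node[1]]
  when :add
    compile_node(node[1], list) # left child first
    compile_node(node[2], list) # then right child
    list << [:add]              # operator last
  end
  list
end
```

Compiling `[:add, [:lit, 3], [:lit, 5]]` yields the push-three, push-five, add sequence from the slides.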
So I'm going to hand wave over applying optimizations, because we're talking about memory optimizations here, not speed optimizations. So imagine that we do our optimization pass. We start out with a linked list, and essentially what the optimization pass does is take that linked list and convert it into another linked list that is an optimized version of our code. Since there is nothing to optimize here, we will just call this the optimized linked list now. Yay, we did it. Our code is optimized. All right. So before we get to generating bytecode, let's talk about what bytecode is. What is the bytecode that's executed by the virtual machine? This bytecode is actually just a binary representation of your code, of the instructions and operands that are going to execute, and it's literally just a list of numbers. It really, truly is a list of integers, and this will be important a little bit later. So what we do to get this list of integers is walk that linked list. We say, okay, I have a push instruction here, push three, so I'm going to need to convert that into some bytecode. And we'll say that the push instruction is represented by the number 123, and the operand is just the number three. Same thing for push five: we represent that push as 123, and then five. And maybe add, we represent as 456. So this is how we end up with our array of instructions. Now, another interesting thing about this is, as we said, these linked list nodes are Ruby objects represented as T_NODEs. Over here, the bytecode is also a Ruby object. It is called an IMEMO object, so it is represented internally as a Ruby object as well. So we can make a simple VM using this knowledge. All we have to know is that we have instructions and a PC, and the PC walks those instructions. But a difference here is that this, okay, I messed up my transitions.
When we incremented the PC in the earlier example, we only had to increment by one. But if we look at our instruction sequences, we don't have an array of arrays; we have a flat array of just numbers, so we may have to increment the PC by more than one. In this particular example, we need to increment our program counter by two in order to push three onto the stack. Then we need to increment it by two again in order to push five onto the stack. And then we only need to increment by one in order to do our add. So let's write our VM. This is a very simple implementation: just a loop that executes that bytecode, pushes things onto the stack, et cetera. And right here, we have the extra increment to our program counter. So we have a very simple VM here. We know how compilation works and we know how the VM works, and I think this is a really good accomplishment for us so far. So let's do one more program, and then we'll look at a few more details of the VM. In this case, I want to manipulate a string. We have this program here on the left; we can all read this, it's a very simple program. We had numbers before, but in this case, we have two strings. So we'll convert this to an AST again, like this. Now, what's interesting here is that when it gets converted to an AST, these string literals are actually represented as Ruby objects. That hello and world, those are Ruby strings. They are stored internally as Ruby string objects. Now we convert this AST into a linked list. We go through the same procedure we went through earlier, but those two Ruby objects down here at the bottom are inserted into the linked list. They are the same Ruby objects; they are truly Ruby objects. So then finally, we translate this linked list into bytecode. It goes through the bytecode translation process.
And since the bytecode is just numbers, we can't insert a Ruby object in there. We have to insert an address for the Ruby object. So we have a pointer, and these pointers are object addresses that actually point at the Ruby objects themselves. So we have Ruby objects being pointed to from our virtual machine instructions. Now let's write a VM again in Ruby, but before we do that, let's make a little change to our bytecode. This is our bytecode representing that program from before. We don't have pointers in Ruby, so we're just going to embed the actual strings into the bytecode. So this is the bytecode we'll use with our VM. Then we'll make our VM implementation. You don't need to read it too closely, but it's essentially the same thing as before. Now, there's a bug in this VM: if we execute the same program twice, the output will actually be "hello world", then "hello world world". So why did this happen? Well, the reason is that we have Ruby objects actually stored inside of our instruction sequences. When we execute this code, we push the string hello onto the stack, we push world onto the stack, and then the append is a destructive operation that appends those two strings together. When the append gets called, it turns the first one into "hello world", and that actually mutates the string that's stored inside the instructions as well. So the literal becomes "hello world". Then, when we execute the program again, we have "hello world" stored inside of our instruction sequences, and we end up with "hello world world". And as expected, if we were to execute this program three times, we would end up with "hello world world world", et cetera. So how do we deal with this problem?
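The whole failure mode fits in a toy Ruby version of the VM. This sketch embeds the string objects directly in the flat bytecode, and the opcode numbers (123 for push, 456 for a destructive concat) are made up; a `dup_on_push` flag lets the same VM show both the buggy behavior and the fix.

```ruby
PUSH   = 123
CONCAT = 456

# Execute flat bytecode. When dup_on_push is false, the VM pushes the
# literal object itself, so a destructive concat mutates the string
# stored inside the instructions.
def run(bytecode, dup_on_push:)
  stack = []
  pc = 0
  while pc < bytecode.length
    case bytecode[pc]
    when PUSH
      obj = bytecode[pc + 1]
      stack.push(dup_on_push ? obj.dup : obj)
      pc += 2 # opcode plus one operand
    when CONCAT
      right = stack.pop
      left  = stack.pop
      stack.push(left << " " << right) # mutates left in place
      pc += 1
    end
  end
  stack.last
end
```

Running the same bytecode array twice with `dup_on_push: false` yields "hello world" and then "hello world world", because the first run rewrote the literal inside the instructions; with `dup_on_push: true`, both runs yield "hello world".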
The way that we deal with this problem is that we have to dup the objects before we push them onto the stack. So right here you'll see I added a new line that just calls dup on that string. When we execute this program again, it pushes copies onto the stack. This is exactly how the virtual machine works today, and if you look up the push object instruction in the VM, you'll see that it has to do this duplication before pushing onto the stack. What's interesting is this also explains object allocation in our code. Say we have some code that looks like this. When we compile this code, we end up with one IMEMO object and two string objects, but when we execute the code, we end up with two more string objects, because we had to dup them when we put them onto the stack. Side note: this is why the frozen string literal magic comment can help us out. If we specify frozen_string_literal here, then the compiler knows that these string literals cannot be mutated, and because they can't be mutated, we can optimize this and say, okay, we're not going to dup when pushing onto the stack; we're going to push the real thing, because it can't be mutated. So in this case, we'll have two strings and one IMEMO at compile time, and only one string object allocated at runtime, which is our new, concatenated string. And I think knowing this is neat. This is neat to me; I hope it is neat to you as well. So let's look at actually reducing memory usage. We've learned enough about how the VM works to understand the technique for this particular patch. Bytecode is stored on instruction sequences. Instruction sequences are Ruby objects, and note that this means your code is actually managed by the garbage collector. So your code is managed by the GC. (Note: ISeq means instruction sequence.) So let's take a look at the object layout of these objects that are being managed.
So we have this iseq object, which is your code; it points at the bytecode, that list of numbers. That list of numbers, as we talked about earlier, has addresses for those Ruby strings, so it points at some Ruby strings. Now, if those Ruby strings were to go away, say they get GC'd, then your program will blow up. If the instruction sequence object does not tell the GC, "hey, I have a reference to these strings," those strings will go away and your program will explode. Like this. It catches fire. So how did the VM deal with this? How do we keep those strings alive? The way the original VM author chose to deal with this was to introduce something called a mark array. Essentially, the instruction sequence points at an array, and this is truly a Ruby array. As the program is compiled, when we come across those string literals, we'll add a pointer to them from the instructions, like this. There we go. And we'll also push them onto the array. As we're compiling, we get more string literals. Now we have an array that has references to these strings, and the bytecode also has references to these strings. So when we're done compiling, this is the structure that we end up with. Now, when the GC marks this, it will mark the instruction sequence object, and the instruction sequence object will say, "hey, I also need you to mark this array." It goes and marks that array, which is truly just a regular old Ruby array like you would use in your programs. And the Ruby array says, "oh, I have some strings internally, and I need you to mark those two as well." Now those two strings stay alive, and your program won't blow up. But there are some problems with this mark array technique, and I think the first problem is just an organizational one. We have here, essentially, an instruction sequence object maintaining two pointers to these strings.
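The mark array scheme can be sketched in Ruby, even though the real thing lives in C inside the interpreter. The `SketchISeq` class and `emit_push` method here are hypothetical names for illustration; the point is that every literal embedded in the bytecode is duplicated into a plain Ruby array whose only job is to be visible to the GC.

```ruby
# A Ruby-flavored sketch of the mark-array scheme: while compiling,
# each literal embedded in the bytecode is also pushed onto a regular
# Ruby array that the iseq keeps alive purely for the GC's benefit.
class SketchISeq
  attr_reader :bytecode, :mark_array

  def initialize
    @bytecode   = []
    @mark_array = []  # lives as long as the iseq does
  end

  def emit_push(literal)
    @bytecode << [:push, literal]  # reference hidden inside the instructions
    @mark_array << literal         # duplicate, GC-visible reference
  end
end

iseq = SketchISeq.new
iseq.emit_push("hello")
iseq.emit_push(" world")
# GC marking, conceptually: mark the iseq, which marks @mark_array,
# which marks each literal -- so the strings stay alive.
```

Note how the same string object ends up referenced from two places, which is exactly the duplicate-information problem the talk raises next.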
One of them may be through an array and the other through the instruction sequences, but we're duplicating information. If you look at the mark function for the instructions, you will never know that this reference exists, this dotted-line reference. So from the standpoint of reading the code, these references are hidden, and I think this is just an organizational problem: we have duplicate information here, or what I like to call hidden references. The other problem is array bloat. When you add items to an array in Ruby, sometimes the array has to increase the amount it can store. Every time you add an item to an array, it doesn't increase the capacity of the array by one; it increases it by some larger amount, so that the next time you append to the array, it can append faster, without doing any more allocations. If we graph that, here's an example: as we add items to the list, blue is the number of items in the list and green is the capacity of the list. The gap between them is unused memory; we don't use it unless we fill up to capacity. Normally this isn't a problem, because typically you're manipulating arrays and throwing them away, but these mark arrays stick around forever. This array lives forever in your program, so that excess space is just sitting there doing nothing. So how do we deal with this? We can remove the bloat by resizing the arrays; that's one technique. But I think it would be even better if we just removed the array completely. And we can actually do that. The solution is almost exactly the same as the technique we used to write the virtual machine in the first place.
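You can observe this capacity stepping yourself with the `objspace` library from Ruby's standard distribution. The exact sizes are implementation details of your Ruby version; what matters is that the reported size jumps in steps rather than growing by one slot per push.

```ruby
require 'objspace'

# Append 100 items and record the array's memory size after each push.
# Capacity grows geometrically, so memsize_of changes only at the
# growth points -- the plateaus in between are the "unused memory"
# gap between item count and capacity.
ary = []
sizes = []
100.times do |i|
  ary << i
  sizes << ObjectSpace.memsize_of(ary)
end

puts sizes.uniq.inspect  # only a handful of distinct step values, not 100
```

A short-lived array gives that slack back when it's collected; a mark array that lives forever keeps its slack forever.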
So what we can do is this: suppose we had a function that would walk through the instructions and say, okay, I'm going to decode this instruction; I see that this instruction has a string parameter, so I'm going to mark that parameter, and then continue through the instructions, decoding and marking. If we had a function that could do that, we wouldn't need the mark array in the first place. And this function looks exactly like our virtual machine did, but rather than actually executing those instructions, we just say, "I want to mark this parameter instead." Using this technique, we can eliminate the array completely and mark the objects in the instruction sequences directly. This is why I call it a direct marking technique. For the real-world impact of this, here's a basic Rails application. Along the bottom are the different types of objects that are allocated when you boot the application. Green is Ruby 2.6 and blue is Ruby 2.5, and you can see a reduction in the number of arrays allocated when you boot your Rails application. In Ruby 2.5, we allocate about 35,000 arrays; Ruby 2.6 is down to about 8,500. Overall, this is a 34% reduction in the number of objects made at boot. Oh, thank you. No, no, no, no, stop. This isn't that good. It's okay. I mean, it's nice. But the problem is, even though we've reduced the number of objects by 34%, that doesn't mean the amount of memory your program uses is reduced by 34%, because there's a bunch of other crap in there, too. So if we use the malloc stack logging stuff that we were looking at earlier, we can see what the actual impact on total memory usage is.
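The direct-marking walk described above mirrors the toy VM's decode loop, except each decoded operand is reported to a marking function instead of being executed. In this sketch, the block stands in for the GC's real marking function, and the instruction names are the same made-up ones from the toy VM.

```ruby
# Direct marking, sketched: the same decode loop as the VM, but
# instead of executing each instruction we just report (mark) any
# Ruby object embedded in it. No separate mark array needed.
def each_marked_object(bytecode, &gc_mark)
  bytecode.each do |insn, operand|
    case insn
    when :push            # this instruction embeds a Ruby object
      gc_mark.call(operand)
    when :append, :print
      # no embedded objects; nothing to mark
    end
  end
end

bytecode = [[:push, "hello"], [:push, " world"], [:append], [:print]]
marked = []
each_marked_object(bytecode) { |obj| marked << obj }
p marked  # => ["hello", " world"]
```

Because the walk finds every embedded object on demand, the references are no longer hidden, and the long-lived mark array, with its wasted capacity, goes away entirely.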
And this is almost exactly the same graph we looked at earlier, but using this patch rather than the load-path cache thing we talked about earlier. So up here is before, that red line, and down here, the green, is after the patch is applied. What this means is our process memory is reduced by about 6%. Now, of course, this depends on how much code you load: the more you load, the more you save. How much would you pay for this feature? $50? $100? You can have it today for free by upgrading to Ruby 2.6. So this patch is here; you can take a look at it if you want to. Now let's take a look at the actual code itself. The patch is actually very, very simple. So, conclusion. Let's wrap this up now. We learned about virtual machines. We learned about memory management. But I think most of all, we learned about ourselves. Upgrade to Ruby 2.6. Thank you so much for having me.