That said, I'm really happy to introduce our first speaker, Aaron Patterson, also known as Tenderlove. I feel weird doing all of these because, like, who doesn't know who Aaron Patterson is? Our local boy turned famous. I think Aaron is like the epitome of "you too can become a major global success, Ruby core team member, Rails core team member, if you get out of Utah." So, thank you, Aaron. All right, all of you other speakers out here, remember when you come up on stage to ground yourself, and I don't just mean, you know, get down to earth, I mean literally touch ground so that you don't ruin the equipment. All right, sorry, that's a different presentation. Sorry. Hi, hi. If you get anything out of this presentation, this is what I want you to go home with: if you go to Preferences, Keyboard, Text on your machine, you can enter a bunch of stuff, which is, this is what I have. And what it does is, when you're typing stuff out, it's basically a shortcut. So when I type "face", it turns into that. Or if I type "parts", it turns into that, and I'll give you a quick little demo here. So this is practical information that you can have on your Monday morning. See that, right? Yes. And then I can do that. I got a bunch of hearts there. So please, yeah, if you learn anything from this presentation, take this home with you today. So I came into this building, and as I was coming in I heard that Confreaks was recording, and then I saw this sign that said no taping or photography allowed in the theater. And it made me laugh because I just thought that we were having some Mountain West anarchy. So, you know, use that hashtag if you want to. All right, so today I'm gonna talk about how are methods formed. I personally wanted to call this Method Mania but I thought it might be a little too non-descriptive. And we're here today at Mountain West, in case you have forgotten where you are this Monday morning, just as a reminder.
I tried to come up with the conference name in emoji and this is the best that I could do. We have a mountain, and that's a map, so that was the closest I could find for "west", and then there's no Ruby, so I did "root B". So that's Mountain West, Root B. Now, you may or may not know, but this is the final Mountain West. And I was extremely nervous about doing well, like, you know, entertaining everybody and giving a good presentation, but then I realized that it doesn't matter how poorly I do, I'll never be invited back again. So it doesn't matter. So to the rest of you speakers, the pressure is off now, it doesn't matter. It's fine. So as Mike said, my name's Aaron Patterson. You might know me on the internet as Tenderlove. That is my avatar; that's not my real hair. You should follow me on Twitter: 90% of my tweets are puns, the other 10% might be cats or technical content. And it is true, I am from Utah. I grew up here but I moved out in 2001, so if you're from out of town, don't ask me for any restaurant recommendations, I don't think I can help you. Actually, how many of you are here from out of town? Raise your hands. Okay, okay, maybe 25% or 30%, okay, that's good. All right, so this is the last Mountain West conference and I wanna tell a story about when I spoke here two years ago, about my Tender Mom and Tender Dad. So two years ago I spoke at Mountain West. I was the backup speaker: apparently one of the backup people canceled, and then the next backup person canceled, and Mike IMed me and said, "Aaron, would you wanna come give a talk at Mountain West?" And he'd asked me many times before and I usually said no, because I come home every year for Christmas and I'm like, I'm already home, I have had enough of my parents. I don't need to see them twice in a year. But I thought to myself, it would be nice if my parents knew what I actually do, right?
So I would like them to see me give a talk. And this happened to be perfect: my parents live here in town. So I said, yes, I will give a talk at Mountain West, but only if you give me two free tickets for my parents. I was driving a hard bargain there, right? Two free tickets for my parents, and Mike said, yes, of course, absolutely. So I showed up with my parents, and the thing is, both of my parents are engineers and I talk to them very frequently. So they know what I do; I tell them what I do, and they don't think it's weird that I type on a computer all day for a living. I tell them everything about what I do, but I have never, ever told them my internet name. That is one thing that I never told them. And so we show up at the conference, my parents and I, and we meet Mike, and I'm like, "Hey Mike, these are my parents." And he's like, hey, everything's going good. And he's like, all right, I've reserved three seats for you down in the front. And so we go down to the front, and there are three seats with signs on them, and the signs say Tender Love, Tender Mom, and Tender Dad. And I'm just like, oh no, no, why, why right now? Why right now? So I had to very quickly say to them, I'm like, mom, dad, people know me by this name Tenderlove and just don't worry about it. People are gonna ask you about me and they're gonna ask you about this name, but just don't worry about it. It's fine, it's all fine. And I did not explain it any further to them, and that was the end of the conversation. That was it, it was done. So now they know; they don't know why, and we have never talked about it since then. So that's fine. Anyway, that was my nice story I wanted to share about Mountain West a couple years ago. All right, so let's move on. I work for a company called Red Hat. That is where I work.
I'm on a team called ManageIQ. We develop a project that manages clouds. So if you have clouds that you need to manage, we can manage your clouds. And our project is open source. It's up on GitHub; you can go here and check it out. I'm gonna be talking about this project a little bit later, near the end of the presentation, for some research that I was doing for these slides, and you'll see that later. Also, I love cats. I love cats. And I brought some stickers of my cats, but I left them at my parents' place. So ask me about them tomorrow. I'll bring them tomorrow. This is one of my cats. Her name is SeaTac Airport Facebook YouTube. She likes to sit on my chair, and I decided that I would dress her up as Donald Trump. So we did that, and it was adorable. I like this one so much. This is my other cat, this is Gorbachev, Gorbypuff. His full name is Gorbachev Puff Puff Thunder Horse. When my wife and I got married, I wanted to change our last names to Thunder Horse, because I think it's awesome. But she said no. So that's unfortunate. Anyway, also my wife really wants me to give a TED talk, so she made this slide for me. So I'm obligated to put this into my slides now, but I love it, it's amazing. I also enjoy hugs, so please come give me a hug later in the day. I would be very happy if I got that. I know it's not Friday, but I will absolutely accept Monday hugs too. Monday is a very hard day, although we're at a conference, which is really awesome. This is a great way to start the week, I think. All right, so we're gonna talk about methods today. We're gonna talk about methods and method optimizations, and specifically with methods we're gonna talk about types of methods, how methods work, byte code, and VM internals.
And then we're gonna go on to talk about some method optimizations, specifically inline caches, polymorphic methods, polymorphic inline caches, and then we're gonna look at some optimization tests. So, actually implementing optimizations and testing them against real code. The important thing that I want you to know is, basically, there is a method to my madness. Monday. All right, so this is a very highly technical presentation, and I apologize for that on a Monday morning. Usually I start out with some jokes and then go on to the technical portion of my presentation, but today I wanna start out with something a little bit softer: I wanna give some advice for new people in the audience as well as people who are experienced. This is an advanced presentation, but I wanna make sure that it's accessible, so even if you're a person who's new to programming, I want you to be able to get something out of this presentation. So my goal for this talk is to make sure that there's something in this for everyone, for all levels. And the other thing I wanna make sure is that if you are new, if you don't understand some of the things that I'm talking about, don't be embarrassed to ask me questions about this. I mean, I didn't know a lot of this stuff. There was a point in my life when I didn't know the stuff that I'm presenting to you. I had to learn it somehow, and the same is true for all of you as well. So if you have questions, ask me, please. You don't have to ask me at the end of the presentation. You can come up afterwards. I don't bite, I promise. And to those of you who are more experienced and know all this stuff: if somebody is new and comes up to you and asks a question, be kind to them and answer their questions; they need to learn stuff too. And as long as you're asking questions, make sure to be genuine about the questions that you're asking.
Try to learn. That's just advice that I wanna give to new people as well as experienced people. So we're gonna be looking at high level method stuff, high level concepts and low level concepts, and like I said, I wanna make sure that anyone can pick up something from this presentation, so don't be shy, and ask me questions later, please. So let's get started. First off, I wanna say this presentation is a failure. I failed, that is it. This is the end. We're gonna get to the end of the talk and you're gonna find out that everything failed at the end, but it's fine, because what we're gonna do is learn about all this stuff along the way. It's all about the journey, right? It's the journey. So we're gonna go through this journey together. So first off, I wanna talk about call sites, and what specifically a call site is. This is an example of a call site. Here's some code. You can easily recognize a call site by the dot: you see right there, there is a call site right there. That's very interesting, and I want you to know that call sites in your code are unique, so if we were to repeat that line multiple times, there would be multiple call sites there. Throughout the presentation I might refer to a left hand side and a right hand side, and what that means is the left hand side of the call site (the receiver) and the right hand side of the call site (the method name), so if I use those terms, you'll understand what I mean. I wanna give some more examples of call sites. This is just some sample code here. Of course we have that one that we saw a little bit earlier. We have one here too; it's basically the same except our left hand side is a class rather than an instance. We have another one right here. This is a call site as well, but you'll notice that there's no left hand side.
It's an implicit left hand side, where the left hand side is self, and in this case, when you're just writing a script like this, the left hand side is gonna be main, or it'll be whatever object you're inside of at the time. We have a few more down here at the bottom. We've got one here, this case/when statement, that we're gonna talk about a little bit later, but you can think of that as translating to Object === x. We could actually rewrite this case/when statement as Object === x, so we have one there. We've got another one here, that's x == 10; that == is a method call. We have another one here, an implicit one, in this print. All right, so now we have an idea of where these call sites are, and we know they're all over the place, and we can kind of figure out where they are in our code. So let's talk a little bit about how Ruby's VM works, and I'm just gonna breeze through this fairly quickly. It works similar to a calculator, and I'm gonna kind of explain to you how I visualize this in my head. When I was in school I used an HP calculator that looked like this, kind of like this actually, this is a newer model. Mine was a 48G, and it's broken, so I have a 49G now, and I really love HP calculators because they are rad. So for example, if we wanna use this calculator, say we wanna do nine times 18. Let's say we're trying to figure that out. The way you do that is you do nine, enter, 18, enter, and then that puts those two numbers on the stack. So you have the stack here, right? So you hit nine and put it on the stack. We hit 18 and put it on the stack, and then we do times, and then that pops both the values off the stack and then puts the calculation back on the stack, like that. So I know many of you out there are saying, that's so much work, why bother? And to you I say, go back to your TI-83 Plus, we don't need you here. Anyway, so Ruby's VM works very similarly.
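The RPN sequence described above can be sketched as a tiny stack in Ruby. This is a toy illustration of the push/pop discipline, not MRI's actual stack:

```ruby
# A toy RPN evaluation mirroring the HP calculator description:
# push both operands, then pop them and push the result.
stack = []
stack.push 9          # 9, enter
stack.push 18         # 18, enter
rhs = stack.pop       # "times" pops both values off the stack...
lhs = stack.pop
stack.push lhs * rhs  # ...and pushes the calculation back on
puts stack.last       # => 162
```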
It has a stack, pushes things onto the stack and then pops them off, and we can do exactly the same thing with our Ruby VM. So we have on the left-hand side our byte code, and on the right-hand side is our stack, and it just works its way through the byte code. So we say push six, and we put six on the stack; push eight, and we put eight onto the stack; and we say add, and then that'll pop both those off the stack and put 14 on the stack. So that's all it is, that's it. We can go home everyone, we understand Ruby's VM, yay. Now, an important thing, obviously there's a bit more to it than that: this byte code, I wanna tell you, is actually stored somewhere, so this byte code isn't some magical thing, it's actually in the computer. The byte code is in the computer. And it's actually stored as an array of arrays. So if we look into Ruby's VM internals, you'll see it's essentially stored like this. If I were to translate this into a Ruby-ish data structure, it's an array of arrays. The outside array is a list of byte codes, and then on the inside we have each individual byte code with an operator and then an operand, and that's essentially what the entire array looks like. So it's important to know that this byte code is actually stored in memory in the machine, and you can manipulate it; it's there and can be manipulated, okay? All right, so let's move on to how methods work. From the high level: we find the name of the method, then we look at the type of the left hand side, we figure out where the method is given the above information, and then we just execute the method, all right? So that's, from a high level, how these things work. At the low level, if we wanna take a look at this, we can, by inspecting the instruction sequences of a particular method. So this is how we can actually take a look at the byte code. So if you execute this, well, it'll output the byte code for the foo method.
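The inspection just described can be done with `RubyVM::InstructionSequence`; here's a minimal sketch, where `foo` and `baz` are stand-in names rather than the exact code on the slides:

```ruby
# Dump the bytecode for a method that performs a single call, bar.baz.
def foo(bar)
  bar.baz
end

iseq = RubyVM::InstructionSequence.of(method(:foo))
puts iseq.disasm  # human-readable listing: getlocal, opt_send_without_block, ...
# iseq.to_a       # the same bytecode as a plain Ruby array-of-arrays
```

`disasm` gives you the readable listing, and `to_a` hands back the array-of-arrays structure mentioned above, so you can poke at the bytecode from Ruby itself.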
So if we run that, it'll look something like this, and you'll see right here, those are the two important lines for executing our method. And if we do that twice, say bar.baz, bar.baz, and we output the byte code, you'll see it's almost exactly the same, but you'll see those two pairs are repeated in our byte code. So that's what we're looking for, and we can match these back to our code: this getlocal thing is matching back up to bar, that's our local variable bar, and this opt_send_without_block, that's matching to the actual method call, right? Now we can also see here in the byte code we've got our operator and our operand. I know those arrows probably aren't lining up very well, but that's right there, essentially. Those two things are split, and if you move down here, we have the same thing, our operator and our operand, and then we have this magic value over here on the very far right, which we're gonna cover more in depth a little bit later. Okay, so as this executes, we can view the stack. If we view the stack while this is executing, what it'll look like is, we do this getlocal, and what that does is it pushes the value of bar onto the stack. So that's our local variable, it's on the stack. Then we execute the next byte code, which will pop that off the stack, call baz on it, get the value for baz, and then push that value back onto the stack. So that's what the VM looks like internally, and it's not much different than our calculator example. And we can actually look at how this byte code works by looking at this file, insns.def. If you go check out Ruby, you can find this file; this is the definition of the byte code, and you'll see there's the name right there, opt_send_without_block, and you can see basically all it does is say: search for a method, and then call that method.
So for your homework, what I want you to do is go mess around with this. Grab some methods, output their byte code, take a look at it. Just write some small methods, mess around with some Ruby code, take a look at the byte code that's output from that, and then go look at insns.def and see what those different byte codes do. It's a really good way for you to start learning how Ruby's VM internals work, and you don't really need to know C code too well; you can get an idea of what's going on inside. And a key for insns.def: essentially, if you look at the instruction definitions, there'll just be a bunch of different blocks, and they all have this header. It starts with the name of the instruction, followed by the parameters to the instruction, any values that it's gonna pop off the stack, and then finally its return value. So you can just look at those parts in the header to kind of get an idea of what's going on. All right, so we have to find the method before we can actually execute it, and I've rewritten that algorithm as Ruby so we can take a look at how it works. Essentially what we do is we say: hey class, give me your method table. Do you have the method? If not, let's try the superclass; if so, great. So we say, okay, we're gonna find this method, and we keep recursing up the ancestor chain. So for example, if we have some code that looks like this, when we go to find the method, the algorithm's gonna go a bit like this. We'll say, hey, class A, give me your method table. Do you have foo? Nope. All right, let's try B. Nope. C? Again, nope. D? Yes, we have it. Great, we found the method, we can call it. So if we think about this algorithm, this means that our method lookup is an O(n) operation, where n is the number of ancestors. It means that the more ancestors we have, the longer it takes to look up that method, right?
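The lookup algorithm just described can be sketched in plain Ruby. This is an illustration of the walk, not MRI's C code: `instance_methods(false)` stands in for reading a single module's own method table, and classes A/B/C are stand-ins for the slide's example.

```ruby
# Walk the ancestor chain, asking each module for its own method table.
def find_method(klass, name)
  klass.ancestors.each do |mod|
    # instance_methods(false) lists only methods defined directly on mod
    return mod.instance_method(name) if mod.instance_methods(false).include?(name)
  end
  nil
end

class A; end
class B < A
  def foo; :found; end
end
class C < B; end

# Lookup on C misses C's own table, then finds foo in B's table.
puts find_method(C, :foo).owner  # => B
```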
The slower, that is: the more ancestors, the slower the method call, which is interesting. So let's do something, let's test method speed. Let's say, okay, this is a great idea. We know the algorithm. We know that an object with 10,000 ancestors is gonna be slower than one with 10 ancestors, so let's take a look. We have a test here that says, all right, we've got a class called 10 with 10 ancestors, and we've got a class called 10,000 with 10,000 ancestors, and I know it's not actually exactly 10 or 10K, there's a few more there, but come on, who cares, right? Anyway, we run this benchmark and look at the results. This benchmark is using iterations per second, which means that the more iterations per second, the faster it is, and if we look at this output, they're almost identical, right? So I said previously, the more ancestors you have, the slower it gets, and when we benchmark this, clearly that's not true. So how do we speed those things up? How do we speed up these method calls? This algorithm is true, but somehow they're the same speed. The way that we speed up these method calls is we essentially cache things that never change. So if you take a look at this code, you'll notice, well, the ancestors for that 10 variable never change, and the ancestors for the 10,000 variable never change, so why do we need to look up that chain every single time? If we know that those ancestors never change, we can just look it up once, cache that value, and then the next time we call it, we just use the cached value, right? So that's essentially where this comes in. That cache is stored right here, in this call cache. So that method lookup cache is actually stored inline with the byte code. This cache exists in the byte code itself. It's inline with the byte code, so we call it an inline cache, okay? So you can go back to work and say, hey, I know about inline caches. They're caches that are stored inline with the byte code.
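The ancestor-count benchmark described above can be roughly reconstructed. The talk used the benchmark-ips gem; this sketch uses only the stdlib Benchmark module and a smaller, arbitrary depth of 100 extra ancestors, so the numbers and names here are illustrative, not the talk's actual figures:

```ruby
require 'benchmark'

# One class with a short ancestor chain, one with a deep chain.
shallow = Class.new { def foo; end }

deep = Class.new { def foo; end }
100.times { deep = Class.new(deep) }  # pile 100 extra ancestors on top

a = shallow.new
b = deep.new
n = 200_000
t_shallow = Benchmark.realtime { n.times { a.foo } }
t_deep    = Benchmark.realtime { n.times { b.foo } }
# Thanks to the inline cache, the two times come out roughly the same:
# the expensive ancestor walk only happens on the first (uncached) call.
puts format("shallow: %.4fs  deep: %.4fs", t_shallow, t_deep)
```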
That's great. So this is what inline cache means. Now, what's interesting is when we break that cache. When people say "breaking method cache", whoa, breaking method cache, what they're talking about is breaking that particular cache right there, and we're gonna talk about how to break that cache in a bit. So I wanna take a very slight detour and look at case/when statements versus if statements. So we were looking at case/when. We had this particular bit of code here, and we said, okay, we've got all these call sites, but we've got one special one here, this when Object, and what I wanna do is take that case/when statement, break it down, and expand it into an if/else statement. So we have on the very far left an if/else statement that's using ===, and in the middle we've got our version that's using just a case/when. These two methods should be exactly the same, right? And we're gonna benchmark the two and compare them. If we execute our benchmarks, we'll see that the if statement actually runs faster than the case/when statement, even though those two methods do exactly the same thing. So why is that? The reason is that we don't actually have a cache at that when clause, where we do have a call cache at all of those === call sites, and you can verify this inside the byte code. So if we dump the byte code for the if/else statement, you'll see that there's a call cache there, and if we compare that to the case/when statement, that checkmatch instruction essentially does the === call, but there's no cache there, okay? Now, what I am saying to you today is: don't go changing all of your code. Don't go changing all of your case/when statements to if/else statements. We can fix this in Ruby itself, it's fine, okay? It's really fine. But just know that in some cases we don't have a call cache, and you can use those instruction sequences to see that.
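The desugaring described above is easy to check: a case/when behaves the same as chained === calls. The two method names below are hypothetical stand-ins for the slide's pair of methods:

```ruby
# Two behaviorally identical methods: case/when vs explicit === calls.
def classify_case(x)
  case x
  when Integer then :int
  when String  then :str
  else              :other
  end
end

def classify_if(x)
  if    Integer === x then :int
  elsif String  === x then :str
  else                     :other
  end
end

[1, "one", :one].each do |x|
  puts "#{x.inspect}: #{classify_case(x)} / #{classify_if(x)}"
end
```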
The other thing I want to talk to you a bit later about is noticing how we have all these cache sites everywhere, and this is important because the size of the cache matters. Since we've got call caches everywhere, if we double the size of each cache, that's probably going to double the size of your byte code, right? And you probably don't want that; the memory of your program is going to increase too much. All right, we'll talk about that a little later. The next thing I want to look at is, well, we've discussed where this cache lives, it lives inside of the byte code, but what's actually in the cache? We have to have a key and a value, and I'm sure most of you deal with caches at work, like you might have a memcached or whatever, and you know you need to have a key and a value. So the key here is actually derived from the class of the left-hand side, and the value is the method that we looked up using the right-hand side. So to calculate this cache, we'll say, okay, give me the class of "hello", and we get a serial number for that class, and then the value for that is the method that we looked up. So this class-of operation returns us the very first class in the ancestor chain. So for example, our Foo class is going to be just Foo; it's not going to be any of the ones above it. So let's look at how to actually make cache misses. When people talk about breaking method cache or making cache misses, let's talk about how we actually do that. And to do that, we need to be able to measure it. So this is how we can measure it: if you call RubyVM.stat, you'll get this hash back, which has three values in it, and I'm only going to talk about two of those values today. If you break this one, this global_method_state, it impacts every single call site cache in your program. So it's very bad if you break that number.
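You can look at these counters yourself. A caveat, since this is version-dependent: on the Rubies this talk targets (2.x), RubyVM.stat exposes :global_method_state, :global_constant_state, and :class_serial, while Ruby 3.2+ renamed and replaced those keys, so this sketch just prints whatever counters the running VM has:

```ruby
# Peek at the VM's invalidation counters. Key names vary by Ruby version;
# on 2.x you'd see :global_method_state and :class_serial, the two
# counters discussed in this talk.
RubyVM.stat.each do |name, counter|
  puts "#{name}: #{counter}"
end
```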
And I wanted to unnecessarily shorten "very" to "v" because I know that that's what the kids are doing these days. So it's v bad. So if you break that one, it's very bad. This other one, this class_serial, only impacts the particular class that you broke, plus its descendants. So it's not good if this serial number increases, but it may not be super terrible, depending on how you're using the code. So let's look at examples of actually breaking the cache. If we define a new module, you'll see that the serial number increases. I should have printed that out earlier. If we define a new class, we'll see that the serial number increases. If we monkey patch the class and add a new method, you'll see that the serial number increases. So the idea here is that any time the shape of our code changes, this cache gets broken. So we've got to think about the shape of our code as in the classes and modules that are defined, and the methods that are defined as well. Now, this happens as soon as your code gets loaded. So you're thinking, oh my god, when I require all these files, all the cache gets broken. But it doesn't matter, because this only happens once. This happens at the very beginning when you boot your program, and this cost should be amortized across your process, and you're not hitting those caches anyway, so it's fine. Now let's take a look at some places where cache was getting broken at runtime. We don't care that it's being broken at boot time; we care that it's being broken at runtime. So I decided, just randomly, why don't I try this with RSpec? So I said, okay, let's print out RubyVM.stat inside of some "it" blocks and then run this. And if you run this code, you'll see that between each of the it blocks, the class serial number actually changes. So RSpec is doing something internally that breaks this method cache. Now don't rush out and change all of your RSpec tests to minitest tests, although Ryan might tell you something different.
We will try to fix this, right? This can be fixed. Anyway, so let's take a look at what actually does this at runtime. Where can these runtime breakages come from? And I'm gonna show you a few examples here. I wrote this method, this stat_diff, that actually prints out a diff so you can see which hunks of code will impact your caches. And over there on the left, there's no diff, it's fine. If you extend something, that breaks the method cache, because we're adding new methods onto the instance. If you instance_exec, that's gonna break the method cache. If you access the singleton class, that's gonna break the method cache. And the thing that's common between all of these examples is that we're accessing the singleton class of that instance, okay? So we're accessing the singleton class of this "a" instance, and that is what's breaking the method cache. And let's talk a little bit more in depth about that now. So this is an interesting question. If you look at a variable, ask yourself: what is this an instance of? And this seems like it might be a very easy question to answer. So here's an example: class Foo; foo = Foo.new. What is this an instance of? Now this one isn't a trick. There's no trick question here at all. foo is an instance of the Foo class. That's it, right? Now, what if we had code that looked like this? We say, all right, let's define a singleton method on foo. Now what is the class of foo? What is the class of that instance? Well, it can't be an instance of just Foo, because there's also this bar method, and other instances of Foo won't respond to bar, right? What it is, is it's actually an instance of a singleton class that inherits from Foo, okay? So when you access the singleton class, you can imagine that magically a singleton class gets created, and that singleton class inherits from Foo. So we can think of this class, whether it's actually Foo or the singleton class, as the "real class".
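The singleton-class behavior described above can be seen directly in Ruby; A here is a stand-in class:

```ruby
class A; end

a = A.new
other = A.new

# Defining a singleton method conjures up a singleton class for `a`
# that inherits from A; only `a` responds to bar.
def a.bar; :bar; end

puts a.class                            # => A (class skips the singleton)
puts a.singleton_class                  # something like #<Class:#<A:0x...>>
puts a.singleton_class.superclass == A  # => true: it inherits from A
puts a.respond_to?(:bar)                # => true
puts other.respond_to?(:bar)            # => false
```

Note that `a.class` still answers A: `class` already reports the "real class" and hides the singleton, which is exactly the terminology Ruby's source uses.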
And if you look through Ruby's source code, you'll find this terminology used: they refer to the "real class" of an object, where, for example, the real class here of foo is gonna be the Foo class, and the real class here is gonna be a singleton class, okay? So you'll find this referred to throughout the source code. And the way that this impacts us is, let's say we have this sort of access here, where we're doing an instance_exec and we create that singleton class. When we need to calculate the serial number for the cache, we say, okay, well, now we're mapping over to a singleton class on the top, whereas the real class is just Foo on the bottom. And then we ask that class for the serial number. So you can see how accessing the singleton class is going to break those caches, because it's actually gonna change the type of that variable. So far, and I wanna kinda recap here, we've looked at two types of cache misses. We've got stuff where we're defining new methods, new classes, all that, and we're just doing that at boot time. This is normal Ruby code; it looks totally fine. And then we have stuff that's done at runtime, where we're accessing the singleton class. And this stuff is a bit more, I don't know, I don't wanna say controversial, but shady. Like, why are you instance_exec-ing? Why are you accessing the singleton class? These are the types of things where you look at it and you're like, hmm, why are we doing that? A little bit more questionable. So those are different ways to break the cache. Now I wanna talk about another way to break the cache that is a perfectly reasonable way to write code, and that is polymorphism. So hopefully we're all doing polymorphism in our code. I'll give you a hint about the end of this talk: it turns out none of us are doing polymorphism. Hopefully we are doing polymorphism, but we are not.
So let's take a look at this example. We have, on the left, a test that's using just A, calling foo with an instance of A, and we've got, on the right, B, calling foo with instances of A and B. And I've made sure that both sides are calling foo the same number of times, so we're getting the same number of method calls going through foo, okay? So if we run this and compare the two, you'll see that A actually runs a bit faster than B. It takes about 400 milliseconds more on the polymorphic test than it does on the other one. Now, if we look at that, the reason is that at that particular call site, we say, all right, let's look up the class of bar and access the serial number for that class, and we know by looking at this code that it's just oscillating between A and B each time. So every time this method gets called, that's a cache miss. We never hit that cache in there, so we're always having to look up the method. So if we look at this test again, we'll notice that this call site only sees instances of A, where this call site sees instances of A and B. Now, we call call sites that see one type monomorphic. Those are monomorphic; they see one type. And call sites that see two or more types, we call those polymorphic, okay? Now, if a call site sees "too many" types, we call those megamorphic. And I put "too many" in quotes, because how do you know what is too many, right? What is too many? We'll talk about what too many is in a bit. So, speeding this up: how do we speed up this polymorphic example? The cache looks like this today. The cache says, okay, give me the class of that variable. Does the cache have the same serial number as that class? If it does, we've got a cache hit. Otherwise, we need to go look up the method. That's what the cache looks like, and it's actually in C. This is Ruby, but I have translated it.
So we can call this a monomorphic inline cache, because we're caching one thing. Let's break down this terminology. It's monomorphic: it only sees one type and only caches one type. It's inline because it's cached inside the bytecode. And it's a cache because it's a cache. I am good at defining things. So what would be nice is a future cache that looks something like this: what if we had a list of caches and we iterate over those, a small list? Let's say the list is small, maybe we keep two or three cache entries in there, right? We cache the last two or three things that we saw, and if we find any of those things, then it's a cache hit. Otherwise, we have to go look it up and put it into the cache. So when I say we're gonna cache two or three, that's gonna be our polymorphic cache, and if we see four or more, that's too many, and that's gonna be a megamorphic cache. So this type of cache is called a polymorphic inline cache, because we're caching multiple entries, it's inline as well, and it's a cache. So I wrote a patch to do this against Ruby, and this is what the patch looks like. I need you all to study this very carefully because we're gonna have a test. There it is. That is the patch. Actually, it's not that bad. I made it look really bad with all the scrolling. It's actually only a 186-line diff, and that was just `wc`; I'm not even sure how many lines I actually changed. It's not too bad. So if we run our test again, this exact same test, we go ahead and time it, and the numbers are exactly the same. Our monomorphic and our polymorphic tests run in exactly the same amount of time. So, you know, mission accomplished, we did it. Woo! Commit! Everything's great. We did it, yay! Now, unfortunately, we've come to the climax of the story. Everything's happy and fast, and now we're going to descend into where everything begins to suck for me. Not for you, the audience.
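Since the translated cache code from the slides isn't in the transcript, here's a hedged Ruby sketch of both shapes of cache. `CacheEntry`, `serial_of`, and the three-entry limit are illustrative stand-ins; the real structures live in C inside MRI.

```ruby
CacheEntry = Struct.new(:serial, :method)

# Stand-in for the per-class serial number MRI bumps on invalidation.
def serial_of(klass)
  klass.object_id
end

# Monomorphic inline cache: one entry per call site.
def call_mono(cache, recv, name)
  klass = recv.class
  if cache.serial == serial_of(klass)
    cache.method                      # hit: skip method lookup
  else
    cache.serial = serial_of(klass)   # miss: refill the single slot
    cache.method = klass.instance_method(name)
  end
end

# Polymorphic inline cache: a small list of recent entries.
def call_poly(entries, recv, name, limit: 3)
  klass = recv.class
  hit = entries.find { |e| e.serial == serial_of(klass) }
  return hit.method if hit            # hit on any of the recent types

  entries.shift if entries.size >= limit  # beyond this it'd be megamorphic
  found = klass.instance_method(name)
  entries << CacheEntry.new(serial_of(klass), found)
  found
end
```

With `call_poly`, a site oscillating between two classes hits the cache after each class's first call, whereas `call_mono` would miss on every single call.
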
For me, for me, everything is terrible. So, you know, a very important thing about doing any sort of optimization is that you need to measure its impact. It's really important. If you speed something up but nobody uses that thing, does it matter? The answer is no, it does not matter. So my performance tip for you today is to only speed up bottlenecks, okay? Only speed up bottlenecks. In order to figure out what a bottleneck is, you need to measure your code, and there are various tools for doing that. I'm not really going to talk about all those tools today, but you need to measure your code, look for those bottlenecks, and then only optimize those bottlenecks. So the question here today is: okay, great, we sped up polymorphic call sites, but what percentage of our call sites are actually polymorphic, right? This particular optimization is only going to increase the speed of code that has polymorphic call sites. So how do we measure that, how do we find that? The way that I measured it is I added logging to the cache key lookup. In the code, I said, okay, where we look up this cache key, I'm going to log it. I'm going to log whether it was a hit or a miss, and I'm going to log what the type is. And this is it right here. You don't need to read this diff, it's not that big. What this does is it adds a trace point to Ruby that allows you to hook into when there is an inline cache hit and an inline cache miss. And we can actually use this in Ruby like this. We create a new trace point and we say, okay, I'm going to print out the call site information. I want to know what the information is at that call site, and I care about hits and misses; I want this probe fired any time there is a hit or a miss. And we're going to enable the probe right here, right before we call o.foo and before anything else happens.
So, if we look at this, we'll see this probe is going to be fired two times: once down here when we call o.foo, that's our first call, and then up there where we call bar.baz. Those are the only two call sites that get executed after we enable this probe. So when we run this, we'll see that the output looks something like this. The information that we get is: the call site ID, which is just a sequential, unique ID for every call site in your code; the serial number of the class, so that's our cache key; and then the name of the class. Now, the Ruby VM internals don't care what the name is. I only output the name because I am human and I don't understand what a number is. A number is meaningless to me. I wanna know what the name of the class is. So this is the information that we get out, and we can log all of this. And if we add a few more calls, we can log all that information, and we can now separate the call sites that see one type from the ones that see two. So using that log information, we can say, okay, which call sites have one type, which call sites have two types. And if we break this down, we'll see that we have, let's see, one, two, three, four, five call sites, and one of them is seeing multiple types. So if we create a graph of that, this graph is exactly the same thing: four call sites, three of those call sites see one type, one of those call sites sees two types, and that foo call site is the one that sees two types. All right, so great. We have a way of measuring the polymorphic call sites in our code. So we need some actual code to test that with, and I used our application at work, because I figured it's probably good for me to do work. I should work. Yes, so I used our application, it's a very large Rails application, and I logged about four million calls.
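As a hedged illustration of that post-processing step (the log rows are made up, shaped like the probe output described above: call site ID, class serial number, class name), separating call sites by how many types they see is just a group-and-count:

```ruby
# Fake probe output: [call_site_id, class_serial, class_name]
log = [
  [1, 100, "A"], [1, 101, "B"], [1, 100, "A"],  # site 1 oscillates A/B
  [2, 100, "A"], [2, 100, "A"],                 # site 2 only sees A
  [3, 102, "String"],
  [4, 103, "Set"],
]

# How many distinct types (serial numbers) did each call site see?
types_per_site = log
  .group_by { |site_id, _serial, _name| site_id }
  .transform_values { |rows| rows.map { |_, serial, _| serial }.uniq.size }

types_per_site.each do |site, n|
  kind = n == 1 ? "monomorphic" : "polymorphic"
  puts "call site #{site}: #{n} type(s) (#{kind})"
end
```
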
So, basically what I did is I booted the application into production mode and then went and clicked around through a bunch of pages. Well, not just that: I got some production data, put it into our database, booted the application into production mode, and then went around and did some actions on the website, to try and recreate what actual usage of our application would look like. And here is where the bad news comes in. Here is a histogram of the types at call sites. Along the x-axis, that's the number of types at a call site, and along the y-axis, that's the number of calls at that call site. So you can see there on the very far left, those are our monomorphic call sites. This polymorphic optimization will only optimize everything to the right of that, which you can see is a very, very tiny percentage of all the call sites in our application. So what we can learn from this is that most call sites are monomorphic, at least in our application. So let's at least take a look at some interesting stuff from this. And the interesting thing was: oh wow, there is a call site that sees over 16,000 types in our code base. What is that? So I tracked that down, and what it is is this: we're using Thin as our web server, and Thin uses EventMachine. And if you take a look, every time EventMachine gets a new connection, it does this allocate and then instance_eval, which, as we just learned earlier, creates a singleton class. Which means that every one of these will be a cache miss. Every time an EventMachine connection is used, it's going to be a cache miss. All right, so that was not fun. We have since switched to Puma. Now let's take a look at the call sites where the number of types is two, and break down what those look like. This is what it looks like. I know you can't read it. Unfortunately, R was the only thing that would generate good graphs, and unfortunately, they look like crap on a screen.
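To see why that one call site looked like 16,000 different types, here's a minimal sketch of the EventMachine pattern described above. The `Connection` class is a hypothetical stand-in; the real code is EventMachine's allocate-then-instance_eval on each new connection.

```ruby
class Connection
  def process
    :ok
  end
end

# Simulate what happens per connection: instance_eval reifies a
# singleton class on each freshly allocated object.
singletons = Array.new(1000) do
  conn = Connection.new
  conn.instance_eval { }  # each call creates a brand-new singleton class
  conn.singleton_class
end

# 1000 connections -> 1000 distinct classes, so a call site that
# handles these objects misses its inline cache on every connection.
puts singletons.uniq.size  # => 1000
```
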
So, I tried to put some into Keynote and it looks a little bit better, but let's just take a look at what the top types are. The very top type is a Set or an Array. So at one particular call site, we would see either a Set or an Array, and we were calling `include?` on it. And what that was is mime types. I gotta hurry. What that was is a list of mime types. What's interesting is, if you go trace through the code and ask yourself, well, I see why it's a Set, but in this other case, why is it an Array? It turns out there's no reason for it to be an Array. It could just be a Set. So we can change that to a Set, and now it's monomorphic, and we're getting call site hits again. Unfortunately, we're getting call site hits, or fortunately, we're getting call site hits, but unfortunately, now polymorphic inline caches won't help this at all. So that's no fun. The next highest one was something.blank?. At that call site, we would see an Array or a nil, and we would call .blank? on that thing. And if you trace back through that code and ask, well, why is it a nil? Turns out there's no reason for it to be a nil. It could just be an empty Array. So we change it to that, and now it's a monomorphic site and we're getting hits again. The next one is a little bit harder: symbols or strings, and we're calling to_s on those. So it'd either be a Symbol or a String, and we would call to_s on it. Now, what's interesting about these top polymorphic sites is that none of them were performance or behavior related, right? When you look at them, they're just like, well, it doesn't need to be an Array, it could be a Set. It doesn't need to be a nil, it could just be an empty Array. None of them were done on purpose.
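As a hedged before-and-after sketch of those first two fixes (the method names and data here are made up for illustration; the real ones came from mime type handling and a .blank? call in the application):

```ruby
require "set"

# Before: the caller sometimes gets a Set and sometimes an Array,
# so the .include? call site downstream sees two types.
def allowed_formats_before(strict)
  strict ? Set.new(%w[text/html]) : %w[text/html application/json]
end

# After: always return a Set -- the call site is monomorphic again.
def allowed_formats_after(strict)
  strict ? Set.new(%w[text/html]) : Set.new(%w[text/html application/json])
end

# Same idea for the nil case: return an empty Array instead of nil,
# so the downstream call site always sees an Array.
def matches_before(found)
  found ? [1, 2, 3] : nil
end

def matches_after(found)
  found ? [1, 2, 3] : []
end

# Downstream call sites now each see exactly one class:
p [true, false].map { |s| allowed_formats_after(s).class }.uniq  # => [Set]
p [true, false].map { |f| matches_after(f).class }.uniq          # => [Array]
```
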
None of them were done by us saying, hey, I really think polymorphism would help out here, and I want to look at some examples of that. So, for example, one way you can increase performance: let's say we have this class Foo on the left here, and it's got this predicate method, interesting?. Every time you call that predicate method, it's doing that comparison. We can actually split this into two classes, where we know at instantiation time whether or not this is an interesting object. So what we can do is create two classes: one that's interesting and just always returns true, and one that's not interesting and always returns false. So we're able to eliminate that conditional. We execute it once, and subsequent times we don't execute it again. That's one thing I'm talking about when it comes to performance: we want to reduce the number of conditionals in our code. The next one is behavior: let's say we want to configure some object with a particular behavior. This is just some abstract bit of code. Now, unfortunately, none of these call sites were using that type of stuff. All right, all right. So it's time for me to wrap up. This polymorphic inline cache, we're probably not going to apply it to Ruby, because as you can see, it doesn't really do much for our application. And I want to say, okay: it's only a failure if you learn nothing. And this is not for you, this is for me, to make myself feel better after all of this work. What I need to say about inline caches is that they are load-specific. So if you looked at this, you'd say, well, you know, it's interesting: Aaron, your application would not benefit from polymorphic inline caches. And I hope that all of you are thinking to yourselves, well, maybe mine would. Maybe Aaron's application is not representative of Ruby programs at large, so, "I feel like I should test your code, Aaron." Right, you're all saying that?
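A hedged sketch of the refactoring described above (the class names follow the talk, but the comparison itself is illustrative; the slide's actual condition isn't in the transcript):

```ruby
# Before: every call to interesting? re-runs the comparison.
class Foo
  def initialize(score)
    @score = score
  end

  def interesting?
    @score > 10  # executed on every single call
  end
end

# After: decide once at instantiation, then pick a class whose
# predicate is a constant -- the conditional runs exactly once.
class Interesting
  def interesting?
    true
  end
end

class Uninteresting
  def interesting?
    false
  end
end

def build(score)
  score > 10 ? Interesting.new : Uninteresting.new
end

p build(42).interesting?  # => true
p build(3).interesting?   # => false
```

Note that this deliberately makes the caller's `interesting?` call site polymorphic, which is exactly the kind of code a polymorphic inline cache would reward.
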
Please be saying that. And if you do want to test it, go check out these branches. You can test it against your code, and I would urge you to actually do that, because I don't know that our application is representative of Ruby programs at large. It could be that ours is an anomaly, and everybody out there is using polymorphism and it's great and this will totally speed up your code; it just won't speed up ours. So please test this. Also, only optimize code that matters. Honestly, I should have tested our application and looked at where the polymorphic call sites were before coming up with this patch. If I had done that, I wouldn't have gone through all the cycles to try and figure out how to actually jam an inline cache into MRI. So please do measurements before you do optimizations. I guess I just said that. And finally, please use more polymorphism. Validate me. Make my patch worthwhile. Please. All right, thank you very much. I'm honored to be here at the last Mountain West. Thank you for having me. So the question was: those polymorphic call sites where it was Array versus nil, how much is the miss gonna cost? That's a very good question. The miss actually isn't going to be very expensive, because Ruby actually has two layers of caching. The first layer is that call cache, which we talked about in this presentation. The second layer is a secondary cache: if we miss that call cache, it'll cache all the ancestors in a hash. So it's actually not very expensive. We would get slightly more performance benefit with this PIC, but we're not killing ourselves. It's okay. The question was: what do you think about writing a Dr. Seuss-like book called The Cache and the Hatch?
I think that that is a very, very good idea, and I will co-author it with my cats. So the question is: what about Active Record polymorphism? I don't know. I mean, we use Active Record polymorphism in our application at work, but I don't know if we use it as heavily as everybody else does; that's the problem. I don't have a good idea of what everybody uses in their applications, which is why I'm pleading with all of you to give this a try. If you're using Active Record polymorphism, this will definitely help you; I just don't know by how much. Sure, so the question is how to balance performant code versus readable code, essentially. When do you say, hey, no, we need to make sure that this is readable versus performant? And the answer to that, in my opinion, is: how hot is the hotspot? What you should be doing is coming up with a number where you say, this is fast enough, right? And then matching that number. For example, one of the extreme hot spots in Active Record is where we iterate over a bunch of records when we pull them out of the database, and if you go look at that part, it's so hot, everybody hits it all the time, that the code is really, really terrible. It's horrible to read, but it is also the most performant code that you can get. So I think the best answer is: come up with a number that is fast enough, make the code only as ugly as it needs to be to hit that number, and stop there. Or, if you can, try to think about the larger picture rather than focusing on that particular bit of code and optimizing that one little thing. Think, well, is there an easier way to accomplish the thing that we need to do? Maybe there's a more performant, not necessarily algorithm, but design that we could have in the code. So, any other questions?
Okay, one other thing I wanted to talk about that I did not mention in the slides, another thing that totally sucks about this patch. I was showing you where we had call sites seeing two types, right? And this inline cache would especially optimize oscillating calls: type A, then type B, then type A, then type B, then type A. Now, the interesting thing is, on a cache miss, let's say we see type A and it misses, it stores that type A information into the cache, right? Now imagine that we have a call site that's type A, A, A, A, A, A, A, then B, B, B, B, B, B. All those subsequent A's are gonna hit, and all the subsequent B's are gonna hit. So that first miss is actually amortized over the life of those subsequent A calls. And it turns out, when I logged a lot of the method calls in our application, many of them exhibited exactly that behavior: it would be the same type many times, then switch once, and be that same type many times. So in that particular case, this isn't gonna help that much either. So, wah, wah.
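To make that amortization argument concrete, here's a small hedged simulation of a one-entry cache (the counting logic is illustrative, not MRI's): oscillating types miss on every call, while long runs of the same type miss only when the type changes.

```ruby
# Count the misses a one-entry (monomorphic) cache would take on a
# stream of receiver types at a single call site.
def miss_count(types)
  cached = nil
  types.count do |type|
    miss = (type != cached)
    cached = type  # a miss refills the single cache slot
    miss
  end
end

oscillating = [:A, :B] * 500           # A, B, A, B, ...
long_runs   = [:A] * 500 + [:B] * 500  # A x500, then B x500

puts miss_count(oscillating)  # => 1000 (every call misses)
puts miss_count(long_runs)    # => 2    (one miss per run)
```

With long runs, the monomorphic cache already amortizes its misses to almost nothing, which is why the polymorphic cache bought so little on this workload.
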