like if I do extremely poorly, at least it happened at the beginning of the conference and everyone will have forgotten by the end of the conference. But today I'm the one talking, so that kind of made me nervous, and then I realized something: I'm giving the last talk, so I'm between you and beer. (Am I still on mute? Am I on mute? Ah, sorry. Now we're on. Wow. Much better. Okay.) That is, I'm between you and beer. And I also figure that once we go to the after party, hopefully after a few beers, you'll have forgotten everything that I say. So it'll be fine. I'll be fine. At least I tell myself that. I work for Red Hat. I'm on the ManageIQ team at Red Hat, and we manage clouds. So if you have a cloud, we can manage your cloud. That is what we do. That is what our team does. So if you happen to have one, we'll manage it for you. On the team, my job title is Hacker Man. That's me. I'm on the Ruby core team, I'm also on the Rails core team, and I'm a pro Neko Atsume player. So I've gone pro. Pro at this game. This is a picture of my setup. I really want to say thank you to all the organizers here. Thank you, Winston. Thank you, all of you, for coming. Give yourselves a round of applause, please. Thank you. Thank you for being here. This is an awesome conference. I feel really privileged to be here for the third time. It's amazing. I'm so happy to be here. So I want to talk a little bit about my experiences on this trip, the stuff that's happened to me while I've been in Singapore. I had to endure the heat. I ate a lot of food. I took many selfies. So these are all my selfies. More selfies. Selfies with Matz. I took over 200 selfies and wefies, and I plan to do more of that after this talk is over. I also want to talk a little bit about some of the presentations I saw here and what I learned from them. I learned that RubyMotion is not PhoneGap. So it is scientific. Rails is not omakase.
I'm really glad to hear that it's not omakase, because once I heard that, it ruined sushi for me. Everybody was doing Richard Stallman stuff, so I decided to Google "awkward Richard Stallman," and that's what I found. I learned that refinements need more refinement. There have been a lot of really, really awesome talks here today, so what I'm really trying to say, as you watch my presentation, is that you should lower your expectations. I'm really sorry. I'm going to apologize up front. I really have to apologize up front, because all the stuff that I'm going to present to you today is very, very technical. Most of what we're going to talk about is technical, number-heavy stuff, and I'm afraid the content might be very boring. It might even be super dry. So welcome, welcome. Welcome to my talk. You may have noticed that I'm using the Baskerville font in all of my slides. The reason I'm doing this is that I read online that if something is written in Baskerville, it's more believable. I'm not making this up. You can go to that URL and read about it. I wanted people to believe the things that I have to say. And then I decided I should put in a picture that I like, which is this picture. I saw it on the internet and I just wanted to share it. That's it. It's just a picture I like. All right. So I've got a couple of cats. This is one of my cats. Her name is Choo Choo, or SeaTac Facebook YouTube Airport, which is her full name. And this is Gorbachev, or Gorby Puff Puff Thunder Horse. I actually have stickers of him, so if you want a sticker of my cat, come say hello to me and I will give you one. If you want to know how I got these cats, I'm going to explain that. Basically, this is the process, if you want a cat like mine. I also like extremely strange keyboards. One of my hobbies is building keyboards.
I love building keyboards. This is one of my keyboards, and I have it with me today, so if you want to check it out, I'll tell you all about it and we can talk about strange keyboards. My cats love them too. There's another one of my keyboards at home, and they like to sit on it. That's Choo Choo, and Gorbachev also likes it. It's really convenient while I'm programming. It's right there, helping me out, right? This is my other one: Choo Choo is my other keyboard. Recently, well, I guess last December, my wife and I decided to get some professional photos taken, and I wanted to share them with you. These are supposed to be for holiday cards. This is my wife and me. This is another one of me. I'm not sure if I'm actually supposed to show these, because I don't own the copyright on them, and I'll explain that to you over some beers. It's an interesting story. Slightly interesting. And this is another picture of my cat, just for fun. I'm really into extreme programming. I love XP. I really love XP, and when I was at RailsConf, I got the opportunity to meet Kent Beck, which was really awesome. He took a photo with me, which was really cool of him, and now we're best friends forever. Anyway, while we were being best friends, he and I collaborated on an extreme programming setup. I've been prototyping it at my apartment, and this is the extreme setup, so that you don't get hurt while you're extreme programming. Anyway, you might have read in the program that the title of my talk is Code Required, but the real title of my talk is "Everything You Ever Wanted to Know About Loading Files but Were Afraid to Ask (or Maybe You Didn't Really Want to Know About It Anyway, but You're Here Just Because)." They told me that was too long for the program. They couldn't fit it, so I had to give them a short name. So this is called Code Required. And I also put it in an emoji.
So I want to talk a little bit about security. This isn't the main part of the topic, but other talks here have covered it, and Andre talked about security, so I want to tell a little story about a security mistake that I made. And this story sucks. So I learned this command. You don't need to run this command today: if you're on a newer Git, you don't need this one. But it's a really, really important command that you needed to learn back then. I'm on the Rails security team, and I deal with security issues that come up from time to time. My personal way of handling those is this: we have a stable branch, for example the 3-2-stable branch, and that's the one that's up on GitHub. There are the other stable branches as well, right? Then on my local machine I've got a 3-2-sec branch, and I do that for each of the branches that we're maintaining. I keep that branch on my machine until it's time to actually do the release, right? When it's time to do the release, I merge those back into the stable branch and then push them up to GitHub. Anyway, I had been working on some security stuff, and everything was good. We were going to do the release the next day. Everything was going fine. Since we had planned on doing the release the following day, I was like, well, I'll just do some bug fixes now or whatever, just go about my work, because we got that done. So I'm working against master and I decide, okay, I'm going to fix a bug and do a commit. So I do that, I do a commit. And then I was like, okay, I'm going to do git push. And as soon as I did git push, that happened.
And all the security patches were posted to GitHub a day early. And then I learned how to delete branches on GitHub very quickly. To fix this problem, you use that command that I showed earlier. I was thinking I should find what happened, like the Campfire conversation from when I did this. So I searched Campfire for this, and I couldn't find anything; there were way too many records in there. So anyway, after that, I learned how to delete branches on the remote on GitHub, and I did that very quickly. And then I tweeted this. And everybody thought, wow, that's a very handy command, I should use that too. And I was actually just facepalming the entire time. Why didn't I do this earlier? Anyway, the moral of this story is that security is not fun. And don't do open source. I'm just kidding, I'm just kidding. I'm kidding about the "don't do open source" part; security is not fun. It's just not fun. All right, so I'm on the Rails team, and I spend a lot of time thinking about Rails performance. I like working on Rails performance, and a lot of the time I think about boot time. Our application at work is actually open source. I should have put the URL in here, but you can go to the ManageIQ GitHub and look at our repositories. You can see what's happening there. Our app looks a little bit something like this: we have over 500 models and 83 controllers. These are just the numbers from rake stats, so I don't know if they're totally accurate. Our boot time is about 12 seconds. The way that I'm measuring boot time for this purpose is just like this: I'm running rails runner and just printing out the version. It takes about 12 seconds to do that. And a lot of the time spent in our application is loading files, so I've been thinking about how we can load files faster.
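For anyone who wants to reproduce that kind of number, here's a minimal sketch of timing a whole command, assuming plain wall-clock time is good enough. The helper name time_command is hypothetical, and the commented-out rails runner line assumes an app with a bin/rails binstub; the runnable part only times a bare Ruby process as a baseline.

```ruby
require "rbconfig"

# Hypothetical helper: wall-clock seconds for a command to run to completion,
# measured with a monotonic clock so system clock adjustments don't skew it.
def time_command(*cmd)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  ok = system(*cmd, out: File::NULL)   # discard the command's stdout
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  raise "command failed: #{cmd.join(' ')}" unless ok
  elapsed
end

# Baseline: a bare Ruby process that loads no application code.
printf("bare ruby: %.3fs\n", time_command(RbConfig.ruby, "-e", "puts RUBY_VERSION"))

# Inside a Rails app (assuming it has a bin/rails binstub) you would compare:
#   time_command("bin/rails", "runner", "puts Rails.version")
```

The difference between the baseline and the rails runner number is roughly the cost of booting the app, which is where all the file loading happens.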
That's the stuff that I've been working on recently, and that's really what this presentation is about. Before we get into how to make this faster, I want to talk about what it does today. Then we'll talk about usage: how people load files. And then we'll talk about speeding it up. So first, let's talk about how files are loaded. We're going to look at three functions: load, require, and autoload. In order to talk about these functions, we need to know about a few global variables. You probably know about these variables already, but let's look at them. The first global variable is the load path, $LOAD_PATH. It's just a list, and you can modify it: you can call unshift on it, or you can use -I to mutate it. So if I run with -I hello and print out the load path, you'll see that hello was added at the top. You can think of the load path as essentially our code database, our database of code to load. When we want to find some code, when you say require foo, that's where we go look for it. Now, the other global variable we need to know about is $LOADED_FEATURES. This is the code that has already been loaded. It's just a list, with the caveat that it's not just a list. It looks like a list to us Ruby programmers: if you print out the global variable, it's just a list. But under the hood, inside of MRI, there's actually a cache, a hash of it, so that lookups are faster. So it's not just a list under the hood, but to us it is. This is our already-loaded code database. This is how Ruby knows that when you require a file twice, it shouldn't load it twice.
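A small sketch of both globals in action, assuming a throwaway temp directory; the file name hello_lp.rb and the constant HELLO_LP are made up for illustration. It prepends a directory to $LOAD_PATH, requires a file from it by short name, and diffs $LOADED_FEATURES to see the full path Ruby recorded.

```ruby
require "tmpdir"

# A throwaway directory (not cleaned up) holding one file we can require by name.
dir = Dir.mktmpdir
File.write(File.join(dir, "hello_lp.rb"), "HELLO_LP = 'hello world'\n")

# $LOAD_PATH is a plain, mutable Array; `ruby -I dir` prepends to it the same way.
$LOAD_PATH.unshift(dir)
p $LOAD_PATH.first            # the directory we just added

# $LOADED_FEATURES looks like a plain Array too.
before = $LOADED_FEATURES.dup
require "hello_lp"            # found by searching $LOAD_PATH
p($LOADED_FEATURES - before)  # the full path that require recorded
```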
So if we need to find code, we look in the load path, and if we need to test whether or not code has been loaded, we look in $LOADED_FEATURES. Those are our main players when we're trying to look up files to load. All right, so the first function to look at is load. This is a function for loading a file. It takes a file name, and it also takes this wrap option, which we'll talk about later. You can give it a full path: load, full path. In this example, test.rb loads x.rb twice, and when we run it you'll see it actually loads the file twice. It just takes whatever code is in that file and executes it. We can also give it a relative path, so you can just say load x.rb, and in that case it will search the load path to find the file. If we run it, you'll see the output is hello world twice, and you'll also notice that I had to pass -I /tmp so that it knew where to find x.rb. So load searches the load path, but only if you provide a file name that's not an absolute path. Now, the wrap option is interesting. How many of you have used this? That's what I thought. Yes. Okay. You can pass true as the second argument to load. Now, I don't want to spoil the surprise, sorry, so let's look at this. This is our usage of it: we say load x.rb and then print out the class X, and in x.rb, on the right-hand side there, we print out the name of the class. If we run this, it prints exactly what you'd expect: the string X, because that's the name of the class, and then X again, because that's the class that was defined. Now, if we add true, Ruby will actually evaluate that code inside of an anonymous module.
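Here's a self-contained sketch of those load semantics, assuming a temp directory instead of /tmp directly; the $x_runs counter is a hypothetical addition so you can see how many times the file's code ran.

```ruby
require "tmpdir"

dir  = Dir.mktmpdir
path = File.join(dir, "x.rb")
# x.rb prints a greeting and bumps a (hypothetical) global run counter.
File.write(path, "puts 'hello world'\n$x_runs = ($x_runs || 0) + 1\n")

# Absolute path: load executes the file on every call and never
# consults or updates $LOADED_FEATURES.
load path
load path

# Relative name: load searches $LOAD_PATH (same as running with -I dir).
$LOAD_PATH.unshift(dir)
load "x.rb"

p $x_runs   # => 3, one run per call to load
```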
So if we execute this program, the output is "#<Module:...>::X" and then "uninitialized constant X". It's kind of interesting: you can actually load some code inside of an anonymous module. I think this could be interesting if, let's say, you wanted to load two versions of the same library, each inside an anonymous module. However, the caller can't get access to that anonymous module outside of the load. The other interesting problem with this: let's say we have this setup here, where test.rb loads x.rb, and then x.rb loads y.rb. So test.rb evaluates x.rb, which then evaluates y.rb, and each file prints out the name of the class it defines. So when we execute this program, we expect Y's name to come out first and X's name to come out second. So does anyone know what the output of this will be? Anyone? Probably not, since nobody has used the true value. The output will be this. You might think that both of those classes would be wrapped in the anonymous module, but they're not. Y is actually defined at the top level, and X is defined inside the anonymous module. So my question is: is this actually very useful? I talked to Matz about this earlier, and he was like, I don't know. I don't know if anybody uses this, so the answer is, eh. It seems to me like we should probably wrap everything, or somehow let you pass a module to that argument, meaning "wrap it in this module," or remove the feature. I'm not sure what we would use it for otherwise. Although since nobody is using it, it doesn't really matter, right? All right. So to recap load: load searches the load path, as we saw when we gave it a relative file name, but it doesn't have any interaction with $LOADED_FEATURES at all.
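A minimal sketch of the wrap behavior, assuming a single file in a temp directory; WrappedX is a made-up class name, used instead of X so it can't clash with anything else. It only covers the one-file case, not the nested x.rb/y.rb case.

```ruby
require "tmpdir"

dir  = Dir.mktmpdir
path = File.join(dir, "wrapped.rb")
File.write(path, <<~RUBY)
  class WrappedX; end
  puts WrappedX.name   # the name includes the anonymous wrapper module
RUBY

# Passing true as the second argument evaluates the file inside a fresh
# anonymous module instead of at the top level.
load path, true

# The class lives inside that anonymous module, which the caller never
# gets a reference to, so it is not visible from the top level.
p Object.const_defined?(:WrappedX)   # => false
```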
Now let's look at require with another example. We can give require a full path, so we require the same file twice, and that second file, x.rb over there, prints out hello world. If we execute this program, just as you'd expect, hello world is only printed once. We've all used require and we all know the semantics: it'll only load that file once. We can also give require a relative path, like this. When we execute that, again hello world is printed, and you'll notice that I had to pass /tmp to -I so that it knew where to find the file. What's also kind of cool about require is that it returns a boolean telling you whether or not the file was loaded. When we run it the first time, it returns true, meaning yes, I loaded some code. The second time, it returns false, meaning I didn't load any code. In order to do that, it has to search $LOADED_FEATURES to figure out whether or not it already loaded that code. This is how our second global variable is used. We can actually see that modification in action with this program right here: we look at $LOADED_FEATURES before we require, then print out the difference between before and after. If we look at that, we'll see that yes, indeed, it added the fully qualified path to the $LOADED_FEATURES array. require is also smart: if we say require x and require x.rb, it canonicalizes that file name before checking $LOADED_FEATURES. If we print that out, it'll be exactly the same output as the previous slide. Now, I keep mentioning canonicalization, and this is going to be important. The load path is used for canonicalization. In this example, on the left-hand side we have the non-canonical format, and on the right-hand side we have the canonical format, which is the entire path of the file. Using the load path, Ruby figures out that whole file path and then checks $LOADED_FEATURES to see whether or not it's been loaded.
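Roughly the same experiments as the slides, sketched in one script under the assumption of a temp directory and a made-up file name, req_x.rb: require's return value, the $LOADED_FEATURES diff, and the x versus x.rb canonicalization.

```ruby
require "tmpdir"

dir = Dir.mktmpdir
File.write(File.join(dir, "req_x.rb"), "puts 'hello world'\n")
$LOAD_PATH.unshift(dir)             # same effect as running with -I dir

before = $LOADED_FEATURES.dup
first  = require "req_x"            # canonicalized via $LOAD_PATH, then run
second = require "req_x.rb"         # canonicalizes to the same file: not run again
added  = $LOADED_FEATURES - before

p first    # => true  (code was loaded)
p second   # => false (already recorded in $LOADED_FEATURES)
p added    # one entry: the full path of req_x.rb
```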
The logic for this looks a little bit something like this. Ruby asks: is it canonical? We say no. Then it says okay, we canonicalize it, and then we go back to: is it canonical? Yes. Then we check: is it loaded? If it isn't, we load it, add it to $LOADED_FEATURES, and then we're done. If it's already been loaded, then we're done. The canonicalization step right here is where the load path is used, the "is it loaded?" check is where $LOADED_FEATURES is used, and if we actually load the code, $LOADED_FEATURES is used again right there. Got it? It's just that simple. Those seven easy steps. We all got it? Yes. Okay. All right. So the next thing we're going to look at is autoload, and autoload usage looks like this. We say, okay, I want to autoload some particular constant, Bar, and when that constant is referenced for the first time, I want you to load it from this particular file here. So whenever anybody looks at Bar, it'll look in file x.rb for that thing. As soon as we reference the constant Bar, it'll load x.rb, which prints out "hi", then defines Bar, and we're done. So when we run this program, you'll see the output is just "hi" and then the constant Foo::Bar. The semantics of autoload are exactly the same as require in this particular case: if we reference Bar multiple times, it's not going to load the file multiple times. If we execute this, you'll see "hi" is printed once, and then it just prints out the constant three times after that. So the autoload logic looks a little bit something like this: when a constant is referenced, it asks, did we load it already? If we didn't, then we go do the require logic and we're done. If we did, then we're done. So: we reference the constant, evaluate this file, and then execute the whole thing.
So we know we print out "hi", now we get Bar, and then as soon as Bar is evaluated... I think I said this already. I'm scared. Anyway, when Bar is referenced, x.rb isn't necessarily evaluated. Now, the interesting thing is that the autoload logic is a little bit more complicated than that. Let me be a little more clear: when we're evaluating this file, what actually happens is that we're referencing that constant twice. We reference it once here, in Foo::Bar, but then, as we're evaluating this file, it prints out "hi" and then references the constant a second time, right there. So what happens when that bit of code is executed? We know that we're referencing the constant a second time, and we have to say: well, we're currently loading this file, and we don't want to reload it. If we reloaded it, we'd be in an infinite loop, right? So when Bar is referenced that second time, x.rb is not evaluated again. That means our autoload logic is a little bit more complicated. We have to ask: did we load it? Are we currently loading it, is it in flight? If we're not currently loading it, then we do the require logic; if we are currently loading it, then we're done. So we need some mechanism for knowing that we're currently loading that particular file. I'm going to get a little bit hand-wavy here, but what's going on under the hood is that there's a hidden global variable inside of Ruby that isn't exposed to Ruby code. It keeps track of the files that we're currently requiring, the files that we're currently loading, and it's called the loading table. If you look inside the MRI source, you'll find a function called get_loading_table, and this is the thing that keeps track of files that are currently being loaded. So when we looked at the file load steps, this load section right here is the part where we actually add to
that in-flight list. So our load steps look a little bit something like this: we take out a lock, we add the file to the loading table (you don't normally see this from Ruby), we eval the file, add it to $LOADED_FEATURES, and then release the lock and remove the file from the loading table, and we're done. So far we've been looking at functions that live inside of Ruby itself. We haven't talked about RubyGems at all, and I want to talk a little bit about RubyGems and its relationship with Ruby's require. This is important for figuring out how we're going to speed up loading in Rails. Let's say we do require rack. We know that we've installed the rack gem, but when we do require rack, how does Ruby know where to find rack? Where do we look it up? The way that it works is that RubyGems implements require. To prove this, we can get a reference to the method and ask for its source location, and you'll see there that it's implemented somewhere inside of RubyGems, at line 38 or whatever, right there. Now, if we run IRB without RubyGems, with --disable-gems, you'll see that the method's source location is nil. --disable-gems means no RubyGems whatsoever, and if we look for the source location of require, we'll see that it's nil, which means that it's implemented in C. If a method is implemented in Ruby, you'll get a source location for it; if it's implemented in C, you'll get nil. So how does RubyGems' require work? I'm going to boil this down very simply. It looks like this: basically, it aliases Ruby's require off to the side, then tries to call the original require, and if there's an exception, it goes looking for any gem that contains that particular file, mutates the load path, and tries the require again. I know that's a lot of code. The way that it works is we say, all right, try Ruby's
require. If that works, great, we're done, we just return. If an exception happens, then we go find a gem that contains that file, mutate the load path to put that gem's directory onto it, try the require again, and then we're done. So you can see that when we require files from rack, the very first require is going to cause an exception, but the second require will not, because the gem is now on the load path. In this very first section here, we say require rack/lock, we hit an exception, we go all the way down through here, and we're done. The second one, the rack/etag one, goes straight down to done, because rack is already on the load path. There's actually a way to load rack without causing any exceptions in your process: if you run gem 'rack' like this, it looks up the gem and mutates the load path, and that's all it does. Then neither of these requires raises an exception. To tie this together, we can see the exception in action by running with -d. If you run Ruby with the debug flag on, you can see all the exceptions that are occurring inside your app. You can see right there we have one exception, on rack/lock; after that, it doesn't matter anymore. We can also see this from inside IRB: if we dup the load path and then require rack/lock, you'll see down there at the bottom that the gem has been added to the load path. You'll also see that $LOADED_FEATURES is mutated at the same time. So far, we've looked at loading code with require, autoload, and load. We've looked at the global variables that are involved: $LOAD_PATH, $LOADED_FEATURES, and the hidden one that we don't see in Ruby land, the loading table. And we've also looked at RubyGems' require and how it mutates the load path. The next thing I want to look at is Ruby gems usage, its performance characteristics, and the performance improvements that we can make. I want to note that I'm
saying "Ruby gems" usage, with a space: I want to know about people's development environments. When I'm doing development on Rails and trying to improve the Rails development environment, it's difficult for me because I don't have access to all the applications that all of you are developing. I have access to my application at work, so I use that for sure, and maybe some other open source ones, but I don't know what the typical developer is like. That's the question that I want to answer: what does the typical development environment look like? So I created a survey thing here. This is the code; you can go visit it. You don't need to run it now, because I'm going to show you the results that I got from it, and hopefully I'd like to get this thing running every year so we can see how development environments change. The data that came back looks a bit like this. Make sure to read all this; I'm going to quiz you on it later. I think the data I'm collecting is kind of interesting. Basically, all this gem does (it's not even a gem, it's just a script you run) is collect some data about your environment and post it to a Google form, and that goes into this document here. The data that I'm collecting is: how many gems are installed on your system, system-wide; how many gems are in your project, like your Rails project, because the gems that you use in your Rails project are different from the ones installed globally on your system; how many files are in each gem; what versions of RubyGems you use; and what versions of Ruby you use. The reason I want to know this is that gem count impacts performance: it modifies the load path, the load path impacts our performance, and we'll see how later. The file count matters as well: the
number of files affects how we're going to do caching, what our caching strategy is going to be. Specifically, the data collected is: the gem count, per project and system-wide, for that particular user; your Ruby version; your RubyGems version; your host OS; and the file counts for the gems you have installed (min, max, median, mean), but nothing specific about any individual gem. I also collected a "unique" user ID and a "unique" project ID. This is an interesting thing I did, so I'm going to share the code with you. I put quotes around the user ID and project ID because they're not necessarily unique; they could be duplicated. Basically, I generated a hash for that particular user, and this is what the code for generating that hash looks like: give me your host name, your IP address, your time zone, and whatever your home directory is, mash that together into a string, SHA-256 it, and send it off. That's our ID. So theoretically, two people running exactly the same setup could have sent duplicate results, but that's probably not the case. For each project, I just grabbed your Gemfile (Bundler actually sets an environment variable that points at your Gemfile), took the SHA of that, and sent it up. So duplicates are possible but unlikely, and this data is also pretty anonymous: I don't know anything in particular about anyone who submitted data. As far as responses are concerned, I got 466 unique projects and 140 unique systems. For a lot of the data I'm going to present to you today, I used R, the R programming language, to do the processing, and after using R for a while I can tell you that R is terrible. It sucks. I wasted many hours on this. I would have been done way earlier if I had just stuck with Ruby. We'll talk about that over beer tonight: please come ask me for a sticker and I'll tell you about how R is terrible. These are the versions that we looked at; this is just a version breakdown. I think
I had one response using 1.8.7, so that's kind of cool. You have to remember that these statistics are totally biased, because I advertised this through Twitter, so they're biased toward people who follow me on Twitter. They're also biased toward people who will actually send me some data, but I guess whatever: those are the people who are going to get performance improvements, so good for them. For implementations system-wide, the only responses I got were MRI and JRuby, and that's what the breakdown looks like. What I think was interesting about this is that JRuby users are, I guess, running more projects: a person running JRuby on their system has more projects than usual. Another thing I thought was interesting was RubyGems upgrades. It turns out 44% of the people who responded have upgraded RubyGems once. The way that I measured this is that I went through every version of Ruby, looked at the RubyGems version that shipped with it, and compared that to the RubyGems version they told me they were running. So 44% have upgraded once, and 23% are on the latest version, which I thought was interesting. Looking at project distribution, I wanted to know how many projects are on each machine. This is what the project distribution looked like: most people are only running a couple of projects, but there are a few running almost 90 on one machine. Our summary looked a little bit like this. This is our max: we had 82 projects on one machine, but most people have like 3 at most. Our OS distribution looked like this: zero people with Windows responded to me; everyone is using OS X or Linux. I think this reflects development environments, and we're trying to get development environment information, because what I care about optimizing is your development environment. I don't care about your production environment. Don't quote me on that; I do care about your
production environment, but I'm trying to optimize your development environment. Now for project distributions: I wanted to look at gem distribution per project, how many gems your project depends on. This isn't just the number of gems in your Gemfile; this is the entire dependency graph. This is what it looks like, which is kind of interesting. I'm not sure what type of graph that is. It is a graph. It is a Keynote graph. I was going to say, oh, it's linear, blah blah blah. I don't know. It's Keynote. Here are the statistics: our max was somebody with 287 gems, and most people are running about 100 or so gems. For comparison, I think the project we have at work has about 200 gems, but the average is about 100 gem dependencies. The file distribution is interesting: this is the number of files that RubyGems thinks are requireable in a particular gem. Okay, now look at that. That's the number of files in each gem, not the total across the project. So you'll notice on the very right-hand side there are gems that have 14,000 files in them, and those are requireable files: you can actually require all of them. The average here is about 4,000 files. What I think is interesting about this is that it means there are 4,000 files that are potentially requireable inside of your project, but probably not all 4,000 of them are being required. Next, system distribution: the number of gems on each system. This is interesting to me because the number of gems on your system impacts RubyGems binstubs, so when you run bundle exec, bundle, whatever, this number impacts that command. That's why I wanted to know this: we want to optimize your projects, and we also want to optimize your system. The summary looks like this. This is the number of gems installed system-wide: one person who responded had over 1,200 gems installed on their system, which is crazy. And the number of files per system: almost
90,000 files. It's crazy; there's some really interesting data out here.

So, just to summarize: the average project has about 100 gems and about 4,000 files. The average system has about 3 projects on it, typically 280 gems installed, and maybe 13,000 files. I think what this boils down to is that people typically install gems on their system, then go into their projects and run bundle there. So you're probably installing more gems on your system and then using those inside of your bundles.

Performance characteristics. Let's move on to the future: let's talk about performance and how we're going to improve it. As the number of gems grows, how does require change? We have a range of projects here, from very few gems up to many, many gems. How does the performance of require change if we change the number of gems on the system? What I'm really asking is: as the load path grows, how does require time grow? Because that's what we're doing with gems. When we load the gems, we put them on the load path, right?
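A minimal sketch of the kind of measurement described here, assuming Ruby 2.3 or later (for Process.clock_gettime and Float#negative?). The helper name time_require, the directory layout, and the file names are all hypothetical; this is not the talk's actual test code. It creates N empty directories, puts one empty file in either the first directory (best case) or the last one (worst case), prepends the directories to $LOAD_PATH the way -I would, and times a single require with the monotonic clock. Each call uses a fresh feature name so $LOADED_FEATURES never short-circuits the lookup.

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper: times one `require` of an empty file after prepending
# `size` directories to $LOAD_PATH. `position` is :best (file lives in the
# first directory searched) or :worst (file lives in the last new directory).
# Returns elapsed milliseconds as a Float.
def time_require(size, position, root)
  dirs = Array.new(size) do |i|
    File.join(root, "lp_#{size}_#{position}_#{i}").tap { |d| FileUtils.mkdir_p(d) }
  end

  # Unique feature name per (size, position), so it is never already loaded.
  feature = "empty_#{size}_#{position}"
  target  = position == :best ? dirs.first : dirs.last
  File.write(File.join(target, "#{feature}.rb"), "")

  saved = $LOAD_PATH.dup
  $LOAD_PATH.unshift(*dirs) # same effect as passing each directory with -I

  # Monotonic clock: unaffected if the system clock is changed mid-run.
  start  = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  require feature
  finish = Process.clock_gettime(Process::CLOCK_MONOTONIC)

  (finish - start) * 1000.0
ensure
  # Restore the original load path so later requires are unaffected.
  $LOAD_PATH.replace(saved) if saved
end

# Usage: compare best and worst case as the load path grows.
Dir.mktmpdir do |root|
  [10, 100, 1000].each do |size|
    best  = time_require(size, :best, root)
    worst = time_require(size, :worst, root)
    printf("%5d dirs  best %.3f ms  worst %.3f ms\n", size, best, worst)
  end
end
```

The new directories are prepended rather than appended so that the worst case really has to walk every one of them before reaching the standard library paths. Absolute timings depend on the machine and Ruby version, so no sample output is shown; the interesting part is how the worst-case column changes as the directory count grows.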
So what we're really talking about here is load path search time: how long does it take to search the load path? This is the test code I used. Again, please read it; you're going to be quizzed on it later. I know it's a small point size, but it is Baskerville. Here's the test code zoomed in a little bit. Basically what I did is: get the clock time, require the file, and then get the clock time afterwards. If you can read this code, great; if not, I'm going to post the slides later. What I think is cool about this is that Ruby exposes a high-resolution clock to us that's also monotonic. Monotonic means that if the system clock changes, it doesn't impact our test.

So what I did is grow the load path one entry at a time, and at each size time a worst-case file and a best-case file, and graph those times. This is what the graph looks like. The red line is our worst-case file, and the blue line is our best case, the fastest one. Along the x-axis is the number of gems we have activated, so that's roughly the size of the load path, and the y-axis is require time in milliseconds: how many milliseconds it took to require that one file. You can see that we scale linearly: as the size of the load path increases, the time it takes to require also increases.

When I say worst case, what does that mean? Let's say we have a load path here and we're looking for foo. The worst case means we go in and say, "Oh, it's not there, it's not there, it's not there," and it's all the way at the end where we finally find it. The best case means it's at the beginning: we find it right away, it's there, we're done. What I think is really interesting about this graph, and it's a question I don't have an answer for today, is that if the file is at the beginning of that list, it should be constant time. It
should always be the same speed. We know it's at the beginning, we're done, every single time. Yet you can see here it actually increases linearly. I think this is a bug. I don't know why it does this; it's something I need to study a bit more. And I can tell you, when I was producing these graphs I was crying, because I was hoping I could come in here and say to you, "Oh, it's linear time, everything's great." But this is the reality.

At 300 gems, it takes four to six milliseconds to require one empty file. This is an empty file. And you think, "Wow, four to six milliseconds, that's actually pretty fast." But then consider that your project has 3,000 files in it: six milliseconds times 3,000 files is 18 seconds. That adds up, right?

So let's look at some performance improvements. We have this load path search, and that load path search is O(n). I want to change it to O(1): I want constant-time lookups, I want constant-time requires. Searching the load path is O(n), and the reason we have to search the load path is that we keep mutating it. So how can we improve the performance here? I think the way we can improve performance is: what if we just stop searching the load path? Eh, let's just stop doing it. Just don't do it anymore.

How can we do that? If you provide a full path name, Ruby doesn't search the load path anymore. So if you do something like require "/foo/bar/baz.rb", that doesn't search the load path, and we get a constant-time require in that case. But the question is how we can accomplish this. You don't want to write out /foo/bar/baz on your system, because you're shipping that code out to somebody and they might have it in a different place. Also, that's a pain to write. We write Ruby for fun, and writing out the entire path like that is not fun. Why are you making me write out all this stuff? This is really terrible. So what we need to talk
about is canonicalization. We talked about it a little bit earlier. It happens in two places: when we search the load path, and when we search gem specs. When you say require "foo/bar/baz", Ruby says, "Okay, I'm going to go look for foo/bar/baz.rb, I'm going to go look for foo/bar/baz.so or foo/bar/baz.o, maybe there's a file named just foo/bar/baz," and it looks for those files in every directory in the load path.

Here's a little side thing I found while I was researching this: the logic in RubyGems' require and the logic in Ruby's require are different, unfortunately. This is plain old Ruby: require "nokogiri.bundle" works, and require "nokogiri.so" also works. Now if we use that same code with RubyGems, it just breaks. This is my life. This is what I do every day for all of you. So maybe I will have two beers tonight.

Anyway, you're getting off track. Yes, yes I am.

You say require "foo", and it calculates all these candidates and tries to find each of them. We have a require parameter, and it goes from the require parameter to the file name, right? But what if we went backwards? We know which files it's going to look for. Given a particular full file path, we can predict what the parameter to require will be: it will be either foo.rb, or foo, or the entire path. We know this data in advance; we don't need to compute it at run time. So the idea is that we can put together a translation hash: a hash that has foo.rb and foo in it, with both pointing at lib/foo.rb. Then we can change RubyGems' require so that instead of looking like this, where we have O(n) here, O(n) here, and O(n) here, we look up that parameter in the hash, and we actually end up with constant time on those two
first steps. Alright, so I put together a little proof of concept for this. I ran exactly the same benchmarks I showed you before, except that this time we're looking up the file in a hash rather than scanning the entire file system, or scanning everything in the load path. This is what the graph looks like. Along the x-axis is the number of gems I've activated, so on the very right-hand side I've activated a thousand gems, and on the y-axis is the time in milliseconds to require one file. What I think is really, really cool about this is that it's less than one millisecond. I actually had to change my test: if you look at my test, it says "time this, in milliseconds," and it was always returning one, so I had to measure in nanoseconds instead, which was fun.

Anyway, when can we do this? When can we calculate this cache? We know everything we need at gem install: as soon as somebody installs a gem, we can look at all those files and say, "Okay, here are all the files they installed," calculate the short names for them, and put that in a hash. Then when you run your program, we can use that hash.

Now, astute watchers, those of you who are still awake and not mad at me because you want beer, will note that there are challenges with this, and I want to talk a little bit about the challenges I'm trying to overcome. First off, we have to deal with -I. I showed this at the very beginning: people can run with -I, and -I takes precedence over gems. If you run with -I, you want to get that file, not the one in your gems. We also have to deal with load path mutations. I've seen this in Ruby code: somebody does $LOAD_PATH.unshift and then requires something. The other extremely important thing is that we need Bundler support if we want to use this strategy, because the
way your applications work, if you're using a bundled application like we are at work, is that when you say bundle exec rails whatever, that rails whatever isn't actually using RubyGems anymore. RubyGems is completely out of the picture in that case, so we need to support that case too. What happens there is that Bundler sets up the load path for you, so you have a load path of however big it is, and then you scan against that. So it would be nice if we could integrate this into Bundler as well and get constant-time lookup in that case too.

Now, I've been talking for a long time, and I want to end with a strange bug, a super strange bug that I ran into while trying to figure this stuff out. We have two files here, a.rb and b.rb, and a autoloads b. It's the same setup we had earlier, except the earlier files had names like temp and x or something like that. I have a here and b there, and if I run this program I get an error. What's weird is that it printed out "hi," so clearly it's getting into that second file, but it still gives me an error. Weird, right? I couldn't figure this out. I'm looking at this: what is it doing, why is it breaking? Does my code look wrong to any of you? Is there a bug in this code? Anyone see a bug? Class inside class? No, class inside class is fine. Double colon? Nope, nope. Is there a bug? Okay, nobody sees a bug.

Alright, so I get an error. I get on IM with a friend of mine and say, "Hey, my code is breaking, can you help me out? I'm trying to run this thing, and it just seems like autoload is completely broken. What am I doing wrong?" The conversation went a little bit like this. "It works for me." "Really?" "You must have broken something." I said, "No, no, no, what file names are you using?" I'm shortening the conversation here, but they said, "Well, I'm using x.rb and y.rb." I said, "Try a.rb and b.rb." Oh. So there's no bug in my code. What actually breaks is if you have a file named b.rb or rb.rb: it won't work. So the solution
is don't name your files like that. No, no, no, no, no, that's not right. You should be able to name your file b.rb or rb.rb, and it should work fine. What's interesting is that I researched this bug and dug into the internals of Ruby. It is a bug in Ruby, and I fixed it. While I was working on it, I was pairing with somebody on the fix, and we were poring through the documentation, because three people thought I was doing something wrong. I contacted three of my peers, and they said, "You must be doing it wrong, you must be doing it wrong, autoload works for me." "No, no, use a.rb and b.rb." So I'm pairing with somebody, we're reading through the documentation for autoload, and the example in the autoload documentation uses a.rb and b.rb.

I was thinking about this situation for quite a while after we fixed this bug together. I think as developers, 99% of the time we blame ourselves. If you had those two files, a.rb and b.rb, and you were running that and getting that error, you'd say, "Man, what am I doing wrong? I'm doing something wrong." Then you'd go along and just change the file name, and now it works. I think most of the time you just move on. Everyone's like, "Well, I must have been doing something wrong. I don't know what I was doing before, but now it works." I know you've all said that; I know every one of you has said that phrase. We're all trained to blame ourselves 99% of the time, and the reason we're trained to do that is that 99% of the time it is our fault. We did screw something up; that's really the truth of it. I know that personally, 99% of the time I do make mistakes. But I think what's important is that when you get an error like that, you take the time to understand why. Why is it giving me that error? I know it's a lot of work to look that up,
but I think if you do that, if you're always asking why, you'll become a better developer once you understand the source of the problem. For example, in this particular case it turned out to actually be a bug. It was not my fault. This is the one time it was not my fault, but the only way I found that out is because I persisted and kept digging and digging. So what I want to encourage everybody to do is take the time to find out. Always be asking why it's doing the thing that it's doing. Take the time to figure out why, and maybe you can fix it. And if it turns out you are doing something wrong, you've learned something new.

I'm not sure. It is Friday, right? Oh, so we have an answer over here that says regular expression, and that's almost true. Someone else ask me a question while I find the patch for you, because it's really amazing. I have a joke about regular expressions, if you want to hear it: a programmer had a problem, so he solved it with regular expressions, and now he has two problems. That's true, that's true.

Okay, so that probably needs to be bigger. Can everyone read that? Is it readable? Okay, so here is the... nope, that's wrong, hold on, let me do git log. No, no, no, no, come on Aaron. Ah, there we go: "files named b." There is the answer right there. So I'll explain what the bug is, and it basically boils down to pointers in C, so just as bad as regular expressions. What's happening here is that function there, loaded_feature_path. Basically this iterates over everything inside of a hash: I talked earlier about the loaded features array, and the hash that's used as a cache for it. It iterates through that hash calling this function, looking at the file names and comparing them to... the canonicalized file name? No, no, not the canonicalized file name, the short file name. Now, it's trying to take a shortcut here. Everything in the loaded features path is a
canonicalized file, and the path we passed in here is actually the short name; it's whatever you passed to require. So this particular shortcut is trying to tell whether you passed in the full path, or most of the full path. It says, "Alright, I'm going to take this key and move the pointer all the way out to the end of the string, then take yours and move it all the way out too, then step back by the length of your string and check whether the two are the same." Since pretty much all the files end in rb, and my file name was b, it would walk back one character, compare those two, and say, "Yes, I have a match." That's also why rb.rb fails: if you try to autoload rb, the same thing happens. So basically, the fix says, "Only do this speed hack if the file name has a dot in it." So technically, files named .rb.rb would probably still fail. Don't do that. The other ones will work.

Other questions? "You mentioned load, require, and autoload. How about require_relative?" I didn't mention require_relative, and that's basically because all require_relative does is figure out the full file path and send that to require. I probably should have put it in here for completeness, but it wasn't interesting to me as far as performance improvements are concerned. Also, I don't particularly like require_relative. Should I explain why? I'll ask myself questions. I'm not a huge fan of require_relative, and the reason is that it calculates the full file path and basically does a require on that full path. That is faster, because you're not searching the entire load path. But the downside, the thing you're giving up, is that if you change -I, it won't impact require_relative. Here's a technique I find very useful, especially when dealing with
legacy code. Let's say you need to test some code, but there's a file you don't want to load; you want to fake it out. You have a class inside a file that you want to fake, right? What you can do is provide a special path with -I. Let's say you have foo.rb in your main application, but you want to replace foo.rb with your stubs. You can pass -I with a directory containing a foo.rb that has your own stubs in it, and Ruby will load that one instead of the one inside the application. It's very handy when dealing with legacy code: you can say, "I want to replace this one file, stub out this one section of my code base." And you can't do that if somebody is using require_relative; if they're using require_relative, you'll always get that file. So I'm not a huge fan of it.

More questions? Come on, I'm not always in Singapore. Please.

So the question is: what if the file system changes, or should we dig around in the load path if we can't find the file in the cache? Essentially, I don't know. I guess the reason I wanted to do it on install is that we know we can calculate it at that particular moment. We could do it at runtime too; I just don't know how expensive that is. Probably the problem is this: let's say we have a gem that contains 4,000 files. Then we'd have to calculate that cache for all 4,000 files, even though your app probably only requires 100 of them, something like that. So maybe don't do it at runtime. Although I don't see why you couldn't: if we're able to calculate that cache at gem install time, we could have an option to do it at runtime too.

Okay, if not, I have a question for you. What day is it today? Friday. Therefore, Friday hug time. Okay everyone, are you happy that it's Friday? I don't really believe that. No, no, come on, are you happy that it's Friday? Come on, we're getting beers soon. Please, everyone stand up. I like to do this because I work at home; I've worked remotely for five years, and I get very lonely when I work at home. So what I do on Friday is I
give the internet a hug to say hello. Okay, hold on. Okay everyone, on the count of three, say "Happy Friday." One, two, three: Happy Friday!