I have a math background, so I am going to throw some math concepts at you. Don't be too afraid. It's mostly just stats, and I'll try to explain it well. It's not going to be scary, I promise. So when I was first becoming a developer, I would ask a lot of questions of people who were more senior than me. I wanted to know more things, and so I would ask questions. Oftentimes I would ask things such as, why did you do this instead of this? And they would say, because it's faster. And then I would just fire a bunch of questions at them. What do you mean faster? Why is it faster? How is it faster? Can you show me how it's faster? And they would just kind of tell me, junior developers do not need to worry about the performance of their code. Just go away. So not all of it was just to go away, to be fair. And how many of you are junior developers? Okay, some. But yeah, it's not completely a myth, because really you could spend probably another year of your life optimizing the current application that you're working on, and you would probably still have some room to improve. Also, computers nowadays are pretty fast. So if your website takes, let's say, half a second to load, as opposed to a quarter of a second, most of your users will not notice and will not care. So why should you even worry about the performance of your code? Well, you don't want to make it so slow that you break production. And this actually happened to me when I was first learning to develop. I put in one of my first PRs, and it was to fix a very small bug in a calculation that ran every night. It was a very large cron job, and it was not optimized at all. It took six hours to run overnight, every night. It was terrible. And I fixed the bug. The calculations were correct. And the next day it took 12 hours to run. It was not the only cron job we had, and it sort of broke production. So I'm here to make sure that this does not happen to you.
So really, the moral of the story is not that you need to understand every single detail there is to know about code performance today, otherwise you can't develop. Ruby is friendly for beginners. Ruby is supposed to be easy to learn. Really, you just need to be aware that there is such a thing as code performance, and that sometimes it's important to make sure you don't slow things down too much. So how can you be aware of code performance? If you Google around, there are a lot of books, and you can read through all of them and fall asleep as you're reading, because a lot of them are very dry and very difficult to actually get through. A lot of them are great as well, and I don't want to discourage you from learning more theory. The more you learn, the better for you. But there's just a lot of theory, and it's not that easy to learn all of it. So I'm not going to recommend any books to you today. I'm not going to ask you to read through all of that dry theory. What we're going to do is go through a simple code example, so that we all have the same code in mind as we talk through different concepts. We're going to talk about sampling, because I have a statistics background and I want to show it off. And also it's a little relevant. And we're going to talk through the benchmarking module, how to even approach optimizations, and when you should even worry and go ahead and benchmark your code. So imagine that you work for a cat shelter, and they have a bunch of different cats. They have some small, cute little kittens, and they have some older cats that are maybe less cute or maybe have health issues. And the cat shelter wants you to work on an application that gives them the expected number of days it will take for each of those cats to get adopted. So you have some basic code, and don't look at it too much. It's not good code.
And it's not based on any actual cat adoption statistics. It's just for the purposes of this talk. But each cat, and Cat is a class, so each cat is an object, will have different attributes. And each of those attributes will affect the expected days to adoption in a different way. It will be either a positive or a negative attribute. And each will affect it at a different scale. So maybe a loud cat is less desirable than a cute cat. But people would rather have a cute cat that is loud than a non-cute cat that is not as loud. So you write an app, and there's a method in there. I'm not going to go through it in detail. Basically, the more desirable the cat is and the more desirable attributes it has, the quicker it will get adopted. And then the business unit comes around and tells you, actually, we've noticed that the time of year affects how quickly cats get adopted. Specifically, around Christmas, people are more likely to adopt cats. Now by the way, if you want to buy someone a cat as a gift, don't do it unless you're 100% confident that they want a cat. That's how cats get abandoned. You don't want to do that. And then around Halloween, there's some sort of magic in the air, and people just adopt a bunch of black cats. And the closer it is to Halloween, the more likely black cats are to be adopted. So how fast is this code? Well, to answer that, you could use the Benchmark module. And even if you have a slightly older Ruby version, as early as 1.9.3, it comes with it, so it's very easy to use. Now what does it actually mean to benchmark code? Well, all the smart people always put in a Google definition, so I did that too. It means to evaluate or check something by comparison with a standard. Now, in software terms, that's essentially what it is. You take the old implementation of the code, and you compare it to the new implementation.
Or to break it down in more details, you run the old code and the new code, you measure the time for each, and you check which one's quicker. Hopefully your new code is faster than the old code. So when you have any type of method or a function, usually the way that functions work is you have an input, you have a black box of your function, something that it does, and then it outputs some sort of output. Maybe not all of your methods actually have inputs, like directly in there. Maybe they don't all have attributes, but they will probably pull some sort of data. Maybe they're going to query the database directly. That's still some sort of data that you need to run that code. Now, if you're benchmarking, you want to run the code probably through a lot of different data, which means you're going to have to create a sample of data that is hopefully somewhat representative of the real population, which is where sampling comes in. So there's a lot of different sampling methodologies, and we're going to go through each one of them individually. So if all of this sounds really weird to you, or maybe some of it sounds familiar, we're going to go through it in more detail one by one. So simple random sample, it's kind of like pulling random elements out of a hat. So if I were to collect all of your names and then shook the hat and then picked five out of the hat, then that would be a simple random sample of people that came to my conference, and I would be very happy. Each person in here would be just as likely to get picked. So this means that I am eliminating a lot of biases by using a simple random sample. I'm not looking at gender. If I'm looking at my cat application, I'm not looking at which cat is the cutest and only picking those. I'm picking them randomly. And then there's the stratified sample. So this one, you divide based on certain attributes. 
So in my cat application, if I were to divide my cats by color, for example, I would have a group, or a stratum, of black cats, white cats, orange cats, gray cats, and other cat colors. And if I were then to pick from each of those groups specifically, that would be a stratified sample. Now, the reason why you would want to do that is if you want to specifically include elements from each of those groups. You know that, hey, there's something about the color attribute in my cat application that may slow down certain groups of cats. You want to make sure that you include elements from each of those groups. Now, there are two subtypes of stratified sample. You can do it either proportionally or disproportionately. So if you know that 30% of your cat population are black cats and you're going to pick a sample of 100, then 30 cats in your stratified sample would be black cats if you're doing it proportionally. You could also do it disproportionately, so if you have 10 colors and a sample of 100 cats, you pick 10 from each color. Then there's cluster sampling. So you, again, divide your population into groups. This time we're calling them clusters instead of strata. And then you pick specific clusters. So again, if we're going to talk about cats, this would be if you were to pick only black and white cats and ignore all the rest of the population of cats. This is often used in the real world because it's cheaper. In software, you may still want to use it if you know that there are specific populations that you're interested in and you don't really care about any other populations. Then there's systematic random sampling, which probably all of you have experienced at some point in your life. So that's if there were 20 people in here and I said, I need 10 volunteers, count off one, two, one, two, one, two. Twos, come on stage. That would be systematic random sampling. So I get my population, I put them in a random order, and I pick every nth element.
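The four sampling methods described so far could be sketched in Ruby roughly like this. This is only an illustration, not the code from the talk: the Cat struct, the color list, the helper method names and the population size are all made up for the example.

```ruby
# Hypothetical stand-in for the shelter's cats; only the color matters here.
Cat = Struct.new(:name, :color)
COLORS = %w[black white orange gray other].freeze

# Simple random sample: every cat is equally likely to be picked.
def simple_random_sample(cats, size, rng)
  cats.sample(size, random: rng)
end

# Stratified sample, disproportionate: the same number of cats from each color.
def stratified_sample(cats, per_stratum, rng)
  cats.group_by(&:color).flat_map { |_color, group| group.sample(per_stratum, random: rng) }
end

# Stratified sample, proportional: each color contributes according to its share.
def proportional_stratified_sample(cats, size, rng)
  cats.group_by(&:color).flat_map do |_color, group|
    group.sample((size.to_f * group.size / cats.size).round, random: rng)
  end
end

# Cluster sample: keep only the chosen groups and ignore everyone else.
def cluster_sample(cats, colors)
  cats.select { |cat| colors.include?(cat.color) }
end

# Systematic sample: put the population in random order, then take every nth cat.
def systematic_sample(cats, step, rng)
  cats.shuffle(random: rng).each_slice(step).map(&:first)
end

rng = Random.new(42) # fixed seed so the sketch is repeatable
population = Array.new(1_000) { |i| Cat.new("cat#{i}", COLORS.sample(random: rng)) }

puts "simple random: #{simple_random_sample(population, 100, rng).size} cats"
puts "stratified:    #{stratified_sample(population, 20, rng).size} cats"
puts "proportional:  #{proportional_stratified_sample(population, 100, rng).size} cats"
puts "cluster:       #{cluster_sample(population, %w[black white]).size} cats"
puts "systematic:    #{systematic_sample(population, 10, rng).size} cats"
```

Because of rounding, the proportional version may come out a cat or two above or below the requested size, which is usually fine for benchmarking purposes.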
And then finally, there's multi-stage, which is sort of a mix and match. So that's where you would pick, for example, a stratified sample and then within each strata you pick them randomly. That would be stratified plus simple random sample. Or you could do a cluster and then pick clusters based on a different stratification. So really any combination you can imagine, that's multi-stage. So when benchmarking and either creating your own data or picking data from population, what method should you use? Well, I usually use a strata if I know that there's some variables. How would I know it? I would look at the code and see, okay, there's this one attribute that goes into a specific code path and I don't know how long this code path will take as opposed to any other code paths. So I want to split my population into different strata and make sure that my benchmarking goes through every single code path that is possible. Now, there may be some things that I do not see as I'm looking at the code because code gets complicated, there may be metaprogramming, there may be a lot of stuff I don't understand, especially if it's a new code base I'm working with. So within each strata, I would do a simple random sample to account for those unknown variables. And then finally, if I know, okay, it's a black cat specifically that I care about, then I'll pick a cluster of black cats specifically just to look at those. So in the cat application, we can see that there's two branches of code that split up, there's two if statements, so very obvious branching off, and it's specifically based on color and date. So variables to consider, black cats and the current date. Now before we can actually pick a sample, we want to know how big a sample should be. Generally in statistics, the larger the size, the larger the confidence level. If you pick a sample of one, that's not a very good sample size. 
If you pick a sample of two, that is significantly better than one, but that's still pretty bad. If you do 50 versus 51, it's not that big of a difference. So you get diminishing returns. And it also depends on population size. If your population size is 10 and you don't expect it to grow, then pick all 10. Now in statistics, there's this magic number, assuming a very large population, of roughly 384. That number gives you a 95% confidence level, with about a 5% margin of error, that your sample resembles your population and isn't all outliers. Now in benchmarking, it doesn't have to be that exact of a science, especially if you're already making sure that you take certain attributes into account and you're doing a lot of stratification yourself. So I usually just pick something between 100 and 500, and I usually pick a round number. So if I know I'm going to have five different groups, I'll pick 500 because that's five 100-element groups. And that's just easy math. Now the Benchmark module itself has four main methods. There's the benchmark method. It allows you a lot of pretty customization. You can change the label, the caption, the width, the formatting. All those really nice options. If you don't really care that much about what your output looks like and you just want the data, there's bm. Same thing, but fewer input options. You can still provide it with a label and a label width, but you don't have to provide any other customization options for the way that the output looks. Then bmbm, this one is kind of interesting. It's my favorite, too. This is the one that will run your code twice. And this might seem redundant. However, Ruby does a lot of magic for you. It takes care of things like garbage collection and other things that you don't need to know about to write code in Ruby, but that means that it sometimes slows things down at initialization.
So basically, if you're passing a couple of blocks to Benchmark, a lot of the time the first run through whatever the first block is will be slower than the rest. And if you're trying to show which implementation of your code or which sample is slower, that's not going to be good for you. So doing the first run-through as a rehearsal will make it a lot better for you. And then there's measure. For that one, you just pass in a block of code. You can still put in a label, and it will just measure how long the code takes. So if that sounds like a lot of stuff, here's an example. I'm using the bmbm method. Like I said, my favorite. And I divided my cat sample into two different samples. So I have the simple random sample, which includes all cats except for black cats. And I have black cats separately, just to see how much slower my black cats are. And then for each of those cats, I just run expected days to adoption. And since this is bmbm, I get the rehearsal block over here. And as you can see in the results here, the simple random sample that does not include black cats took about twice as long as the black cats on their own. Now in the real run, it's the opposite. And that is because of all those background processes. Also, you'll notice the Benchmark output has four different columns. There's user, so that means user-written code. That's your application. There's system, so that's kernel stuff. So stuff running sort of in the background, not your application specifically. And there's the total, which is this plus this. Since the cat application was actually pretty quick, you don't have anything here because it kind of rounds down to zero. And there's real, which is the real time it takes for everything to run. This may be slower or faster than the total. And the reason for that is that it is the real, wall-clock time your application takes. So if you're waiting for user input, real will take that into consideration. Total will not.
Total is just user plus system. If you're doing multi-threading, then total may be longer than real, because you're doing multiple things at once, and in real time it takes less time. So if you think you want to optimize some code, here are some things to watch out for, red flags that may make your code slower. Loops. Every time you have things like each and map and other looping methods, your loop will go through each element individually and perform whatever's inside of your block for each of those elements one by one. That means that the longer the list you pass into it, the slower it will be. Now if you have nested loops, that will be even slower, because for each element it goes through each element of the inner loop. That's n squared in big O notation, which in English means very slow, and it gets slower as your list grows. Recursion. That one is a little bit tricky. It can be fast if you cache it well, and it may be useful a lot of the time, but it's kind of a red flag where you may want to benchmark it and see how quick it is. If you require a lot of things at the beginning of your class, they all have to load before your code can run, so that will slow down your application. And if you have a lot of callbacks and observers, that's something kind of tricky that's not directly in your methods but will also run and slow down your code. If you're not sure what it is that's taking a lot of your time, check if you have any callbacks. So in the cat application, what made it slower? The method that only the black cats go through is a recursive method, so that means that the farther we are from Halloween, the more times we're going to run that method, and any time you're doing the same thing over and over again, it will take you time. So that's why black cats specifically were slower in the cat application.
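To make the cat benchmark a bit more concrete, here is a rough sketch of what such a comparison might look like with Benchmark.bmbm. The slide code is not part of this transcript, so everything below is an assumption for illustration: the Cat struct, the attribute weights, the hard-coded Halloween date and the halloween_penalty helper are all hypothetical. The only part that mirrors the talk is the shape: black cats go through an extra recursive method, and the two samples are measured separately.

```ruby
require "benchmark"
require "date"

BASE_DAYS = 30                     # made-up baseline, not real shelter data
HALLOWEEN = Date.new(2024, 10, 31) # hard-coded year, for the sketch only

# Hypothetical recursive helper: one extra day of waiting for every day
# between `date` and Halloween. Deliberately naive, to mimic the slow path.
def halloween_penalty(date)
  return 0 if date >= HALLOWEEN

  1 + halloween_penalty(date + 1)
end

Cat = Struct.new(:color, :cute, :loud, keyword_init: true) do
  def expected_days_to_adoption(today)
    days = BASE_DAYS
    days -= 10 if cute
    days += 5 if loud
    days += halloween_penalty(today) if color == "black" # only black cats recurse
    days
  end
end

rng = Random.new(7) # fixed seed so the samples are repeatable
coin = -> { [true, false].sample(random: rng) }
today = Date.new(2024, 3, 1)

other_cats = Array.new(250) do
  Cat.new(color: %w[white orange gray].sample(random: rng), cute: coin.call, loud: coin.call)
end
black_cats = Array.new(250) { Cat.new(color: "black", cute: coin.call, loud: coin.call) }

# bmbm runs every block twice: a rehearsal first, then the run that counts.
Benchmark.bmbm(12) do |x|
  x.report("other cats:") { other_cats.each { |cat| cat.expected_days_to_adoption(today) } }
  x.report("black cats:") { black_cats.each { |cat| cat.expected_days_to_adoption(today) } }
end
```

The output has a rehearsal section followed by the real run, each with the user, system, total and real columns. The exact numbers will differ from machine to machine.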
Hopefully, you're getting a little bit excited to start optimizing some code, but before you run into your application and actually start doing it, a few more things. Tests, make sure you have tests generally before refactoring any code. It will prevent you from introducing bugs. It will give proof to other developers that stuff is still working as expected, so they don't have to spend as much time code reviewing and making sure that you think through all the corner cases because they will be in the test cases, hopefully. And if you don't see any tests for the code that you're about to refactor, that doesn't mean you don't need tests. Write them first, make sure they're green, make sure you have a good understanding of what this code should do, then make sure they're green, refactor the code, make sure they're still green. Rethink the problem. A lot of the times, you will be given a feature and you'll implement it, and it will work well. And then you will be asked to implement 20 sub-features that you did not ever think of while doing your original design. And now you have to write some spaghetti code and now it all looks ugly and terrible and it's not performant at all. So it's useful to sometimes kind of step back and think about, well, if I were to write this code to accomplish what it accomplishes today, how would I write it differently? Step back, try to look at the big picture. If there's anything that looks like it's complicated, think about how you can simplify it. And also, talk to stakeholders. Your job as a developer is to understand how to write code and hopefully you can do that well. But you may not understand all the intricate details that go into your business and your business model. That's why you have a separate business unit. And they probably understand the problems better than you. So when I was preparing for an interview a few months ago, I went through a lot of code challenges. 
And one of the code challenges I came across was this triangle of odd numbers. Given the number n, I was supposed to find the sum of the nth row of that triangle. So I solved it. It was not a good solution. And this was my solution. I basically reconstructed the triangle one row at a time, up to the nth row, to create an array of arrays where each sub-array represented a row of the triangle. And then at the end, I would sum the nth array. After thinking about it for a while, I couldn't really think of a better way of doing it. I just wasn't sure how to do it in a better way. So I looked up other people's solutions. And the top solution, oh, yeah, and sorry. If you look at this, there's one loop here and a nested loop here, and as we talked about earlier, that is very slow. A better solution: n to the third power. I had no idea that this could have been done. And this is one of those little things. Although it wasn't a real-life application, sometimes you'll find things like that in code, where you overcomplicate a problem or you come to a solution that is not straightforward. It works. But the person who actually understands the problem best is probably not a developer. And it can be good to talk to them and get that different perspective, that, hey, you can just raise the number to the third power. You don't have to construct a triangle of odd numbers. And the performance difference? Huge. My way, again, I'm using bmbm. My way took 2.8 seconds to find the sum for the number 9999. The better way, yeah, a lot less. Find the bottlenecks. If you're given the task of refactoring a very large class with a lot of sub-methods and you don't really know how to approach it, you don't necessarily have to refactor every single line of it. Find the methods that actually take up the most time. Maybe it's two or three methods that take up 90% of the time. Refactor those.
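For the triangle challenge, the two approaches could look something like the sketch below. This is a reconstruction from the description, not the original slide code, and the method names are made up. The closed form works because row n of the triangle holds n consecutive odd numbers whose average is n squared, so the row sums to n cubed. The benchmark uses a smaller n than the 9999 from the talk, just to keep memory use modest.

```ruby
require "benchmark"

# Rebuilds the triangle of odd numbers row by row (1 / 3 5 / 7 9 11 / ...)
# as an array of arrays, then sums the last row. A loop inside a loop.
def row_sum_by_construction(n)
  next_odd = 1
  rows = Array.new(n) do |row_index|
    Array.new(row_index + 1) do
      value = next_odd
      next_odd += 2
      value
    end
  end
  rows.last.sum
end

# Closed form: the nth row sums to n cubed, so no triangle is needed.
def row_sum_closed_form(n)
  n**3
end

n = 2_000 # smaller than the talk's 9999, an arbitrary choice for the sketch

Benchmark.bmbm(14) do |x|
  x.report("construction:") { row_sum_by_construction(n) }
  x.report("closed form:")  { row_sum_closed_form(n) }
end
```

Both methods return the same value for any positive n, which is worth checking in a test before trusting the faster one.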
Don't worry about the rest. Optimize your own time as a developer. And you don't actually have to use Benchmark to measure time. Really, measuring time is as easy as setting start_time = Time.current at the start of the method and end_time = Time.current at the end, and printing the difference with a puts statement. So, yeah, if you're not super comfortable writing too many benchmarks, sometimes it might be easier to just use some puts statements. And you're more or less getting the same information. And finally, whenever you're tasked with optimizing code, or you think you're optimizing code, make sure you actually run benchmarks and show some sort of proof, for yourself and for other developers, that what you did actually improved the code. Specifically, if you're making a lot of changes, it's very easy to get into the groove and say, oh, I did this and it sped this method up, so I'm going to do the same thing here. It should speed it up as well. Prove that it does. Show some data. Show people, look, this one little improvement improved performance by 10%. This one improved it by 20%. I ran into an issue where I was optimizing code and I thought I had made a small improvement. Actually, I had slowed it down by 10%. I did five other improvements, so overall it was still quicker. But if I hadn't checked incrementally, I would not have optimized it as much as I could have. So when should you actually benchmark your changes? From experience: any time you're working on large cron jobs that take six hours, definitely benchmark every single bug fix you make to them. But really, any time you have code that runs frequently or runs through large batches of data. If you have a small method that takes a quarter of a second and runs through 10 million pieces of data, you want to make sure you're not slowing it down, or that will have a huge impact even if it's a small performance degradation.
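Going back to the manual timing idea from a moment ago, a minimal sketch might look like this. The timed helper and slow_calculation are hypothetical names made up for the example. Plain Ruby uses Time.now here; Time.current is the Rails equivalent and would work the same way inside a Rails app.

```ruby
# Hypothetical helper: runs the block, prints how long it took,
# and returns both the block's result and the elapsed seconds.
def timed(label)
  start_time = Time.now # Time.current in a Rails app
  result = yield
  elapsed = Time.now - start_time # subtracting two Times gives Float seconds
  puts "#{label} took #{elapsed.round(4)} seconds"
  [result, elapsed]
end

# Made-up slow method standing in for whatever you want to time.
def slow_calculation
  (1..200_000).sum { |i| Math.sqrt(i) }
end

timed("slow_calculation") { slow_calculation }
```

This gives a single wall-clock number, roughly what the real column reports, without the rehearsal run or the user and system breakdown.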
If code is already slow, that doesn't mean you can slow it down even more. Make sure that it's still below the acceptable threshold. And you can always talk to other developers about it and just put a comment on your PR saying, this will slow it down, but it should still be fine. But if it's not, blame it on me and I'll fix it. Take ownership. And obviously use benchmarking when your task is to optimize code. Show that you did what you say you were going to do. So yeah, with that, go forth and benchmark. And if there's any questions, I'll stick around here and can talk to anyone.