While people are coming in: this talk is called Making It Fast: The Power of Benchmarking. My name's Cameron; I'm Cameron P. on Twitter and most other services, including the Elixir Slack and GitHub. I come from New York, and I really want to thank the organizers for having me here, because it gave me the excuse to come to Bangalore, which has been fantastic. It's my first time in India, and you might think it's very different from New York, but there are similarities, and I have to say I feel very comfortable, right at home, walking across the street in this town. It all makes a lot of sense. I write software. I've been writing software since '93, working at Microsoft and RealNetworks and various other companies: Java, and then Ruby, and then more recently Elixir. If I'm perfectly honest, for the last eight or ten years I've mostly been managing people who write software rather than writing much in production myself, but I try to at least keep my hand in and write things as much as possible. Three years ago I started looking into Elixir, got really, really excited about it, and started focusing on it. In fact, I helped a few people put together a conference in New York called EMPEX, which was the first regional Elixir conference in the United States and has now expanded: we're doing it in New York in May, and we're also doing one in LA. Desmond Bowe was the original founder of EMPEX; he's moved to LA, and we're going to do a conference there. If anyone is interested in speaking at EMPEX LA in February, their CFP is open and you should hit them up. With that, I end the self-promotion. Making it fast.
When you talk about making something fast and improving performance and things of that nature, you could be talking about a lot of different things. You could be talking about optimizing the garbage collector, or little tricks that are dependent on your language and your framework, or hacks in the compiler; if you've built any Java code, there's lots of stuff you can do in the compiler. I'm not going to talk about any of that. The stuff I want to talk about is applicable, I think, more or less no matter what you're doing. What I mean by that is I want to talk about the process of optimization, and how we build that kind of thing into a continuous process instead of being reactive, instead of saying, oh, I've done this thing and, oh crap, it's slow in production, now let's fix it. Can we build something into the process in the same way that we have with unit testing, continuous integration, and so on? Specifically, I'm going to be talking about benchmarking and how we can build that in. A lot of what I'm going to say is applicable to different frameworks and languages, but I am going to be a little bit Elixir-specific later, as you'll see. What do I mean by benchmarking? Benchmarking is not profiling. Profiling, in my opinion, is something you tend to do reactively. You tend to say, OK, this application is slow and I need to find out why. You hook a profiler up to it, and depending on your language or framework that might be quite a big job, but you start looking at it, and you're doing it reactively. Once you've done it and you say, OK, problem solved, the application is fast enough now, you probably put the profiler back on the shelf and don't touch it again for a while.
Benchmarking is more like unit testing, in that I feel it's something you can do early in the process. Not first; we're not talking about benchmark-first development here. But early enough, and continuously enough, that it can hopefully continue to yield dividends over time. It has certain characteristics. Benchmarking is granular: it's not something you necessarily do at the very top level of your code. Just like with unit tests, you try to find small pieces of code that you can benchmark. It's repeatable: if I've got a benchmark written, I can run it again and again, and as I make changes to the code, I'm going to be able to see how those changes have affected my benchmarks. And it's relative, whereas with something like a unit test, it either passed or it failed: you described an expected behavior, you either got it or you didn't, true or false. [In response to an audience question:] Yes, absolutely, exactly the kind of thing I'm not going to talk about. Depending on your framework, there may be really specific details exactly like that, where you need to make sure that you're actually benchmarking something relevant. I'll touch on something like that, but I'm not going to get into the weeds too much there. So, it's also relative. What I mean by relativity is that there's no such thing as a successful run of a benchmark. You don't ever say, oh yeah, that was fast enough, because whether it's fast enough is totally dependent on your problem domain. But if you make some change and something unexpectedly gets five times slower, that's actually pretty interesting. So to illustrate this, I'm going to tell a short and slightly contrived story about performance. Here it goes.
When I learn a new language, any language: a lot of people like to do a big project and spend a couple of weeks on it. I don't really do that. I like to do small problems, things that take me a couple of hours, maybe half a day, but not something that takes weeks. And I like to do lots of them; I find that I learn a lot faster that way. So that's my approach. But I don't just do the small problem and, once I've successfully solved it, call it a day. I try to treat it professionally, as seriously as I would if I were doing the problem in a professional context. So I do the small problem, then I refactor it, then maybe I optimize it if that's relevant, and maybe I do the documentation. By doing that, I find you learn about lots of bits and pieces of the language that you might not necessarily delve into otherwise. So where to get small problems? There are lots of places, but I have a personal favorite, which I advocate to everybody who'll listen to me, and that's a site called adventofcode.com. They've got loads of great problems on there. It's essentially an advent calendar, which is a thing we use for kids at Christmas time, starting on December 1st: you get a new puzzle every day from the 1st to the 25th, and they get harder and harder. The first few are easier, then they get really, really hard at the end. This story is about Advent of Code 2015, and this is what it looked like. The problem we're talking about is right there: December 6th, 2015, when I received the following problem. There's lots of background to the problem; it's all about Santa Claus and all sorts of crap, but I'll break it down simply. You have a 1,000-by-1,000 grid of lights that are addressable from (0,0) in the top left corner to (999,999) in the bottom right corner.
And they're controllable using special commands, so they give you an input file to parse, and you're supposed to process the commands and figure out how many lights are lit at the end. So this is a million lights. The commands look kind of like this. They describe rectangles: turn off the lights from (660,55) through (986,97), meaning turn off all the lights in that grid region. A command can turn lights off, turn them on, or toggle them. You're supposed to process all of these commands, and there's like 25 or 30 of them, and at the end you're supposed to count how many lights remain lit. Then you enter that into the website and it tells you whether you got it right or not. Now, to be honest, this is really not an ideal problem for Elixir. If you were actually solving this in a professional setting, you probably would not reach for Elixir (you would also maybe ask yourself some questions about your job). The reason it's not great for Elixir is, first of all, it's a 2D array, and we don't have 2D arrays. Second, it's mutating this giant data structure over and over and over again, which is not something we do as efficiently in Elixir as you might in C, or even Java or Ruby. And by the nature of the problem, you're not going to get any wins from concurrency. But we're not doing this to solve a real problem; we're doing this to learn. So we're doing it anyway. The title of this talk comes from a mantra that I did not invent, but I've been saying it for years without knowing where it came from, or, it turns out, exactly what it means: make it work, make it right, make it fast, in that order.
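A command line like the ones above parses naturally into a tuple with a regex. This is a minimal sketch, not the talk's actual code; the module and function names here are made up for illustration:

```elixir
# Hypothetical parser for one command line of the puzzle input,
# e.g. "turn off 660,55 through 986,97".
defmodule Commands do
  @pattern ~r/(turn on|turn off|toggle) (\d+),(\d+) through (\d+),(\d+)/

  # Returns {op, top_left, bottom_right}, e.g. {:off, {660, 55}, {986, 97}}.
  def parse(line) do
    [_, op, x1, y1, x2, y2] = Regex.run(@pattern, line)

    {op(op), {String.to_integer(x1), String.to_integer(y1)},
     {String.to_integer(x2), String.to_integer(y2)}}
  end

  defp op("turn on"), do: :on
  defp op("turn off"), do: :off
  defp op("toggle"), do: :toggle
end
```

So `Commands.parse("turn off 660,55 through 986,97")` yields `{:off, {660, 55}, {986, 97}}`, which is roughly the shape of tuple the rest of the talk's code consumes.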
In the course of researching this talk, which is weird because I've been saying it forever and never looked it up, it turns out it's maybe Kent Beck. Not sure. Probably Kent Beck. Seems like it. Let's say it was Kent Beck. It wasn't Einstein. So first of all, make it work. What does that mean? Make it work just means: OK, you've been given the problem space; do what you've got to do to solve the problem as quickly as possible. Maybe not as quickly as possible, but don't spend a lot of time thinking about performance. Don't spend a lot of time thinking about architecture. You might have to throw it all away. But if that's the case, why go through that process of possibly throwing away code? My feeling is that what we're trying to gain is understanding. If you go through the make-it-work exercise, at a minimum you're going to understand the problem space a little bit better than you did before. Maybe you'll have something worthwhile and maybe you won't, but even if you don't, you're going to at least personally understand it a little better. So given the problem I just showed you, let's take a look at the code I initially wrote. I have to do a little bit of jumping around here. OK, so this is the code for the thing I originally did. Is that big enough that everybody can see? Yep. You want it bigger? Nope, you're good. All right, so there's not much to this. I usually break problems down like this: load just loads the input file with all those commands, process is the part that is probably going to take up the most time, and compute just counts how many lights are lit at the end. And when I thought about it, I was like, God, how am I going to do this? I don't have 2D arrays. I don't have anything like 2D arrays, really, and I just didn't really know what to do. But I did know it wasn't going to be a list of lists.
That's not a good idea. The only thing I could come up with was a map where I've got tuples representing the points, {x, y}, and the value of the map is a true or false, or a one or a zero, or something like that. It didn't feel right to me, because a map is not a 2D array, but I didn't have any other ideas, so we went with it. So I implemented this map. If you look at process... well, actually, let's look at load first real quick. I'm not going to delve into load too much, because it's not that important, but it just parses the file, and it produces a bunch of tuples like this, where we have on, off, toggle. And then we've got this function parse_range: it runs a regex, parses out the integers, and produces what our range looks like in terms of two points. That's really all it does. So process just takes those commands that load created and reduces them into a map. And exec picks the appropriate function based on the command. If you look at what those functions do, all they do is find all the points in the range (it literally takes the two corners and produces a list of points), and then these on-light and off-light functions just update the map. So that's all that's going on here. It's not terribly elegant, but maybe it works. And so I thought, OK, well, maybe that's pretty good. So now... are you kidding me? Oh, there we go. You can't help me with anything. Well, we're just going to have to do this the hard way. So now we've come through the working code, and it worked. It ran, and it took a while, like 45 seconds or a minute, so maybe not as good as it could be. But it worked. I've kind of got make it work done. So now it's about making it right.
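As a rough sketch of that map-based approach (with hypothetical names; the talk's actual module is laid out differently):

```elixir
# Sketch of the map-as-grid idea: keys are {x, y} tuples, values are booleans.
defmodule MapGrid do
  def new, do: %{}

  # Expand the two corners into the full list of points in the rectangle.
  def points_in_range({x1, y1}, {x2, y2}) do
    for x <- x1..x2, y <- y1..y2, do: {x, y}
  end

  def exec(grid, {:on, p1, p2}) do
    Enum.reduce(points_in_range(p1, p2), grid, &Map.put(&2, &1, true))
  end

  def exec(grid, {:off, p1, p2}) do
    Enum.reduce(points_in_range(p1, p2), grid, &Map.put(&2, &1, false))
  end

  def exec(grid, {:toggle, p1, p2}) do
    Enum.reduce(points_in_range(p1, p2), grid, fn p, g ->
      # Absent lights are off, so the first toggle turns them on.
      Map.update(g, p, true, &(!&1))
    end)
  end

  # The "compute" step: count the entries whose value is true.
  def count_lit(grid), do: Enum.count(grid, fn {_point, lit} -> lit end)
end
```

Reducing the command list into the grid is then just `Enum.reduce(commands, MapGrid.new(), &MapGrid.exec(&2, &1))`, with `count_lit/1` at the end.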
When I looked into it, the concept of making it right was another thing that was open for debate about this mantra. Some people thought make it right meant handle all the edge cases and corner cases, and some people thought it meant refactor the code and make it architecturally sound, which is what I always thought it meant. So I'm just going to say it's make it refactored: make it work, make it refactored, make it fast. So let's go ahead and look at the refactored code, which looks like this. All I've really done here is some small refactorings to remove a bit of duplicated code, which I won't really go into, nothing major. But the main thing I've done is separate the file into three parts, and this is going to turn out to be important. I'm still happy with load, process, compute, but I've moved everything that relates to the map into one section of the file, and I've basically said, OK, this here could potentially be extracted from this file; you could move it into another module. I'm just holding off doing that for the time being, because I tend to like to push that back. But now I've isolated the piece of code I'm most suspicious about, so I'm effectively preparing to make it fast, and I'm also preparing to benchmark it. That's really all I've done: I've removed some duplication in the way the command processing works, but that's all. So what to do next? The next thing is make it fast. Now, years ago there was a product called FoxBase, which most of you are probably too young to remember. FoxBase was like a version of another product called dBase, which most of you are definitely too young to remember. It was exactly like dBase, but literally 10 times faster at everything. And the guy who wrote it was interviewed and asked, how did you make it 10 times faster?
"I just took out the slow parts." Which is, yeah, that's great, man. Just take out the slow parts, and you're going to have a much faster program. So this is where benchmarking turns out to be a big win, because it allows us to identify what the slow parts are. I use something called Benchfella. There are many benchmarking toolkits out there; I like this one mainly because it's simple. It's not super feature-rich, but it's got enough, and you can get going with it really quickly. And it's really analogous to the way ExUnit works, so it feels right in terms of my whole benchmarking/unit-testing analogy. So you put this in your deps, in your mix.exs, and you create a folder called bench underneath the root of your project, inside of which your benchmark files are going to go. Now, I wanted to make one more change. We had this big list of inputs with all of these commands, and I don't want something that's going to take 45 seconds when I'm doing my benchmarking, any more than you want your unit tests to take that long. I want my benchmarks to run a little quicker than that. So I've made a simplified version of the input: it has smaller numbers and fewer commands, and hopefully my benchmarks are going to run a little quicker. Let's see how it goes. All right, we'll load another thing. So what we have here is the bench directory I've added, and in it we have this lights_bench.exs. As you can see, we're just touching the top level. You could get really granular with this and benchmark loads of things, but for now I'm just going to benchmark the top level, and we're going to see which part is slow. We've got load, which we suspect is not going to be slow; we've got process, which we suspect is going to be very slow indeed; and we've got compute, the adding up at the end, which is also potentially slow. You don't know.
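For concreteness, the setup looks roughly like this: a dependency in mix.exs, plus a `*_bench.exs` file under bench/. This is a sketch of Benchfella's conventions from memory; the version constraint, module name, input path, and `Lights` functions are illustrative, not the talk's exact code:

```elixir
# In mix.exs, add Benchfella to your deps (version is illustrative):
#
#   defp deps do
#     [{:benchfella, "~> 0.3", only: :dev}]
#   end

# bench/lights_bench.exs
defmodule LightsBench do
  use Benchfella

  # A simplified input so the benchmarks run quickly.
  @input File.read!("bench/input_small.txt")

  bench "load" do
    Lights.load(@input)
  end

  bench "process" do
    @input |> Lights.load() |> Lights.process()
  end

  bench "compute" do
    @input |> Lights.load() |> Lights.process() |> Lights.compute()
  end
end
```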
And so now, once we run this, we're going to see which one's slow and which one's not. All you do is type mix bench, and it runs the three things, basically running these functions as many times as it can in the allowable time frame. And sure enough, to no one's surprise, load is very fast at 100 microseconds per op. Compute is also pretty quick, a little slower, but still 267 microseconds per op. And process is the slowest by far. So process is the slow part, and process is the thing that's dealing with that giant map where we really want a 2D array. Well, that's great, but what now? It's not like I suddenly have 2D arrays. I thought about this for quite some time, and mostly I think I just googled it. And once again, Erlang/OTP just has all the things. Here in the Erlang standard library is a module called :array. I don't know how many of you have delved into the Erlang standard library, but you just should. Go in there and look at all the things that are available: they've got directed graphs, they've got stuff you wouldn't believe is in the standard lib. One of the things they have is this array, and I see it and I'm like, oh, this is the business. Functional, extendible arrays. It's going to be fast; it's optimized. Here it is. And there were a couple of problems with it. One is that it only supports one-dimensional arrays. But I said, all right, I'll just have a one-dimensional array of arrays, and then I'll have two lookups, and maybe it'll be quicker. So let's see how that looks. Now, I've done a bunch of things here. One thing is that I've moved the persistence layer into its own module, so that I can have a map grid and an array grid: map grid being the original one, and array grid being the one implemented using the Erlang array.
And the array grid is just exactly the same. Wow, can I make it a little smaller? It implements exactly the same functions that we implemented for the map before, but it calls this :array module: it calls get to get the rows, and it calls set to set the values. We've had to turn a bunch of arguments around, because it's Erlang and the object goes at the end, but basically this is exactly the same implementation as before, just a different underlying persistence model. The other thing I've done, I should show you, is I've changed these benchmarks so that they can use either the map grid or the array grid. So now I'm going to have benchmarks that compare and contrast what's going on: process using the Erlang array, process using the map, compute using the map, et cetera. So when we run mix bench here, it goes through and runs all of these things. And at this point I am super optimistic. I'm like, this is going to be awesome, right? I can't wait to see how much faster it's going to be using the Erlang built-in array instead of the map. And, I don't know if you can see that, but it's substantially slower with the array than it was with the map. The level of disappointment here, I can't even begin to convey, because I was so excited about how this was going to work. I was like, wow, Erlang has everything. And then it's slower. But since I've come this far, let me at least just try to run the real problem with the two implementations. Now, this first line up here is 35 seconds; that's the original way, with the map. But the second line is 18 seconds, using the array. So despite what the benchmarks were just telling me, it's faster. It is faster, right? It's nearly twice as fast.
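The array-of-arrays idea looks roughly like this (a sketch with illustrative names, not the talk's exact module). Note the Erlang-style argument order in :array: index first, the array itself last.

```elixir
# Sketch of a 2D grid built from Erlang's one-dimensional :array,
# as an array of row arrays. :array is sparse, so unset cells just
# return the default value.
defmodule ArrayGrid do
  @size 1000

  def new do
    row = :array.new(@size, default: false)
    :array.new(@size, default: row)
  end

  # Two lookups: fetch the row, then the cell within it.
  def get(grid, {x, y}) do
    :array.get(y, :array.get(x, grid))
  end

  # Functional update: rebuild the row, then put it back into the
  # outer array, returning a new grid.
  def set(grid, {x, y}, value) do
    row = :array.get(x, grid)
    :array.set(x, :array.set(y, value, row), grid)
  end
end
```

Every set rebuilds one row plus the outer array rather than one big map, which is why it isn't obvious up front whether this beats the map; that's exactly what the benchmarks are for.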
So I'm like, well, what's wrong? Why did my benchmarks just come up with nonsense right there? How to think about that? And, no, I don't have the preview here, so things are surprising me. If you remember, I simplified the input by shrinking the number of lines, but that also shrank the numbers themselves. The effect was that I wasn't just doing fewer operations; my map was smaller, by a lot. And that small map was having a substantial impact on the performance, which shouldn't really be a huge surprise, but it didn't occur to me. So let's change this so that we have a realistic map. In this case, what I've done is added a new input which is a bit more realistic: only a few commands, but big numbers, and big numbers should create a big map. I put that in a second input file, and then I also need to make a change to the bench file to make sure it loads that one. So if we go down here and switch to that input, now we're going to run this benchmark with somewhat more realistic data, and we'll see what happens. And lo and behold, now the Erlang array is substantially quicker. Interestingly, it's a little slower for compute, but that slowness is more than made up for by how much faster it is for the process part. The mutation is, for whatever reason, a lot quicker. But the important thing to realize is that somewhere in between these two sizes of maps was a crossover point. Whether the right thing for you for this kind of problem is a map or an Erlang array is entirely dependent on your program, and only something like benchmarking with realistic data can tell you that. So in this case, this explains why I was seeing the behavior I was seeing. Now, I want to show you a couple of other things. I can't demo this live, because it actually requires the internet, and I don't trust it.
But there are a few other little features of Benchfella that are worth looking at. One is snapshots: every time you run a benchmark, Benchfella saves a snapshot of that run, and you can compare the latest snapshot to the previous one (there's a mix bench.cmp task for this). And I think you can make it generate warnings when things in the snapshots get significantly slower, which is really useful. The other thing it can create is graphs, and this is the thing that requires the internet. You just say mix bench.graph (I'm not going to do it now because it'll break things), and it will either render the last snapshot, or you can make it compare the last two snapshots, and it'll show all of your stuff. Pretty neat. I ran it earlier, so hopefully you can see it. This is what the graph of the first run looked like, where we were just using the map: we've got compute, and you can see very clearly that process is the issue. And then this was the one where we had all five. We're computing using an Erlang array, and this is compute using the map, and you can see that it's slower... it's faster, rather, than the Erlang array. And this is process using the Erlang array, and it's considerably quicker. No, it's not. I'm not sure what the hell's going on; that shouldn't be that way. In any event, you can generate graphs. So, back to the slides. Like I said, the important thing is that the size of your data matters. You want to simplify, maybe, but don't oversimplify. So my tips for benchmarking: you should be writing your benchmarks early, but not first. First get it working. Get it into a state where you, or your team, would be comfortable maintaining it. Then build your benchmarks.
Simplify the inputs enough that you've got benchmarks you're not afraid to run all the time, that you might be willing to automate or make part of a continuous integration process. But not too simply. And if you're seeing weird behavior where the benchmarks just don't agree with the production behavior, then you need to ask: what's wrong with my benchmarks? Something is clearly not representative. And keep running them all the time. And yeah, that's pretty much what I have to say. Like I said, I'm Cameron P on all the various services. Are there any questions? Nope. Thank you very much.