Do I have a mic? Oh, I do. How was lunch? Good? Bad? OK. Jameson nailed the bio. Let's talk about Big O. This is a great topic for right after lunch, right? Good topic to take a nap to. Who here already feels pretty proficient in the world of Big O notation? Great. Before we start on Big O, I wanted to show you something that warms my local Utah heart, which is my favorite place to buy tires. I think if you're a software engineer and you don't get tires from here, you really need to think about what you're doing.

OK. Why do we talk about Big O? For those of you that don't know much about Big O, you're probably saying, that's exactly why I'm here. It's a great tool for communicating with your peers about the speed of algorithms, and a really good tool for reasoning about your own code. Also, job interviews: it seems to be a favorite go-to question. How many people have been asked a Big O question in a job interview and just went, uh, de-de-de? Yeah, I have. And last one: if you ever get invited to a dinner party with Rich Hickey, you'll be happy to know something about Big O.

You can see Big O out in the wild; it's getting pretty popular nowadays. This is a screenshot from the React documentation. I've highlighted four places on this page where they talk about the runtime complexity of their algorithms in Big O notation.

The big question is: how do we talk about the speed of our programs? Big O gives you a language for that. Do you have the vocabulary to express to one of your peers how fast this function is? I'll walk you through it, because we're going to revisit it a few times. This is a dumb little function that takes a string as input and returns the index, as a number, of the first digit, zero to nine, that it can find in that string. It uses a little regex to do that, and returns undefined if no digit is found.

How fast is this function? Could you say? If you just blurted out an answer, one answer is probably not enough, because the real, pragmatic answer to that question is pretty long. It depends on a lot of stuff: on the position of that digit in the string, on whether the just-in-time compiler has run, or whether your runtime even has a JIT, on whether other programs are running on the computer, and on the length of the input string. It probably doesn't actually matter in most cases how fast this function runs. And it depends on the hardware and the browser, among many other things. These are all perfectly correct answers to the question of how fast this function is.

But Big O doesn't have anything to do with that stuff. It doesn't care about your hardware, your JIT, your runtime, or your browser. What it does care about is limits, like in calculus; the steps your program takes; and big inputs. It's a way to categorize your code. So let's go back to our example, findDigit. There's really only one point on that list that actually matters to Big O, and it's the size of the input string.
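The slide code isn't reproduced in the transcript, but here's a minimal sketch of what a findDigit like that might look like (the name and the exact regex are assumptions):

```js
// A minimal sketch of the findDigit function described above: returns
// the index of the first digit (0-9) in the string, or undefined if
// the string contains no digit.
function findDigit(str) {
  const match = /[0-9]/.exec(str);
  return match ? match.index : undefined;
}

findDigit("abc4def"); // 3
findDigit("abcdef");  // undefined
```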
Here is a graph of what we might expect that function's runtime to be: runtime on the vertical axis, input size on the horizontal axis. This is pretty much what you would expect: as the string gets longer, and the digit is randomly placed within it, the function takes longer and longer, linearly, relative to the length of the input string. That's just what you'd expect by looking at it.

Could we make it faster? Yes, we could probably tune it a little: change the regex, maybe compile the regex first and cache it, and make it a little faster. But at the end of the day, with the code written the way it is, the best you can really hope to do is move the slope of that line downward. You aren't going to change the shape of it; it's still going to be a straight line. This is a complexity class, in Big O speak. We use the term linear to describe this kind of function: a function where the input size affects the runtime linearly. More input, or bigger input, equals longer runtime, in a linear fashion. Some other terms you'll hear used to describe this are big O of n, linear runtime, linear complexity, and linear order of complexity. These all mean basically the same thing: O(n).

All right, could we do something to change the complexity of this function? Could we implement the code a different way, so that finding the first digit depends less on the input size? We probably could, but we'd have to make some assumptions. What if we assume the incoming string is always sorted alphabetically, or as I like to say, "asciibetically," by ASCII code? Could we change our function to make it faster then? Yes. We could do something more complex: look at the middle of the string first and test whether that character is a digit; if it's not, and it's less than a digit, go one way; if it's greater than a digit, go the other way; then bisect the remaining half of the string again, dividing the string in half at every step of the algorithm until we hone in on the actual digit, if there is one.

If we implemented it that way, what would the graph look like? Does anyone have any ideas? Yeah, people are like, "it would be like this," and that is correct. The key here is that every iteration of that while loop cuts the input size in half, which means you could double the size of your input and add only one step to the loop. This kind of complexity is called logarithmic. Has anyone ever looked at a stock chart with a little option that says linear or logarithmic? That's the same logarithm we're talking about. As your input size increases, the runtime doesn't increase linearly; it increases logarithmically, and we use the term big O of log n to describe these kinds of algorithms. By the way, this algorithm is called binary search, which most of you are probably familiar with.
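Here's what that bisecting search might look like as a sketch; the talk's version isn't shown, so assume the input really is ASCII-sorted, which makes any digits contiguous, and binary search for the leftmost character that is at least "0":

```js
// A sketch of binary search over an "asciibetically" sorted string:
// find the index of the first digit, or undefined if there is none.
// Because the string is sorted by ASCII code, any digits it contains
// are contiguous, so we look for the leftmost character >= "0".
function findDigitSorted(str) {
  let lo = 0;
  let hi = str.length;
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (str[mid] < "0") {
      lo = mid + 1; // middle char sorts before the digits: go right
    } else {
      hi = mid;     // candidate: keep looking to the left
    }
  }
  // lo is now the first index whose char is >= "0" (or str.length)
  return lo < str.length && str[lo] <= "9" ? lo : undefined;
}

findDigitSorted("!!++048AMZ"); // 4 -- each loop iteration halves the search space
```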
Could we do even better? What would be better than logarithmic? What if we had a cache of some kind, like a memoization store, that has already solved the answer for us, and now we just have to fetch it and return it? If we make some pretty silly assumptions about the speed of that, and about having a pre-computed cache, we could do a pretty crazy thing and just return the pre-computed answer.

But we really do have to make some crazy assumptions. The first one is not too crazy: that JavaScript object properties can always be looked up in constant time. That's true most of the time, thanks to some really cool stuff in JavaScript runtimes these days. Can we assume it's true for very large strings? No, we really can't; that's actually a bad assumption, but let's make it anyway so we can have a pretty graph. We also have to assume that the cache I showed earlier is pre-computed and ready to go.

If we make all of those assumptions, we achieve what's called constant time complexity, or big O of 1. What that means is that regardless of the input size, your function always returns in a constant amount of time. In other words, as you increase the size of the input, the runtime does not increase. It doesn't necessarily mean the constant time is fast; it just means the runtime doesn't grow when you grow the input. Hashing is what we sometimes call that; memoization is the term commonly used in the JavaScript ecosystem.
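As a sketch of that idea, under those same assumptions (the cache contents here are hypothetical, and pre-computing them is exactly the part we're waving our hands about):

```js
// A minimal sketch of the "constant time" findDigit, assuming a
// pre-computed cache and constant-time object property lookup.
const cache = {
  "abc4def": 3,
  "no digits here": undefined,
  // ...a pre-computed answer for every input we will ever see
};

function findDigitCached(str) {
  return cache[str]; // one lookup, regardless of input size: O(1)
}
```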
Here are a couple of other constant time operations you probably use often. The first is deciding whether an integer is even or odd: modulo two. That's a constant time operation; it doesn't matter whether you pass in the number one or the number three billion, it always takes the same amount of time. Reading the length of a string is another. You could imagine a world where that would be a linear algorithm, but it's not, because JavaScript caches the length of every string on the string itself. What about C, though? Anyone here a C developer? I'm so sorry for you. C implements strings by default as an array of bytes with no length field, so you actually do have to walk the string, byte by byte, and that would be a linear operation. Overwriting an element in an array is also constant time.

So here's a list of the most common complexities we'll be talking about today. First is constant, which we just covered. Logarithmic is the kind of problem where each step of the algorithm divides the input space in half, or by some other significant factor. Linear is when the input size dictates the runtime linearly. Quadratic, which I'll show you an example of in a second, is where the runtime goes up as the square of the input size. And the really nasty one is exponential, two to the n: every time you add a single unit to the input size, your runtime doubles. Those are actually quite rare in practice. I have hardly ever written one; in my experience it's actually hard to write one on accident.

So let's look at what quadratic looks like. You can see some numbers here: on an input of size one, you get about a nine or ten-ish; on an input of size two, you go up to thirty-something; size three, eighty-something, and so on. These are the squares: the idea is that if I have an input of size n, the runtime will be n squared. You'll hear that called quadratic complexity or big O of n squared; the terminology works either way.

An example of quadratic complexity is a sorting routine called selection sort. Here you can see visually what it does. Selection sort iterates over the elements in a list, collection, or array, and at each point it spawns off another for loop that goes through the whole rest of the array looking for a value less than the one at the current point; if it finds one, it swaps them into place, so the smallest remaining value always ends up at that point. Then it does it again and again. As you increase the size of the array you want to sort with selection sort, you increase the runtime of that function as the square of the size of the array. So it's a pretty bad way to sort things.

Let's look at the code. It's actually pretty simple: just a couple of nested for loops with an i and a j. You walk through the array with i, and then for each i, j walks through the rest of the array looking for a value less than the value at i; if it finds one, it swaps them. Pretty much any time you see a for loop nested under a for loop operating on the same data, in this case the same array, you're almost always dealing with an n squared algorithm. Very common, very easy to write. We'll talk in a minute about the special case where the nested for loop isn't operating on the same data, which is not necessarily a quadratic, n squared algorithm.
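Here's a sketch of selection sort the way it's described here, swapping whenever a smaller value is found (the textbook variant tracks the index of the minimum and swaps once per pass, but the shape, and the complexity, are the same):

```js
// Selection sort: for each position i, scan the rest of the array for
// anything smaller and swap it into place. Two nested loops over the
// same array, so the runtime grows as the square of arr.length: O(n^2).
function selectionSort(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = i + 1; j < arr.length; j++) {
      if (arr[j] < arr[i]) {
        [arr[i], arr[j]] = [arr[j], arr[i]]; // swap
      }
    }
  }
  return arr;
}

selectionSort([5, 3, 1, 4, 2]); // [1, 2, 3, 4, 5]
```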
Okay, this last one is called exponential complexity, or big O of two to the n. This one goes off the chart incredibly quickly, because every time you add a single unit to the input size, the runtime doubles. They're pretty rare in practice, but one example you'll hear mentioned is the traveling salesperson problem. You have a set of cities and a set of distances between those cities, and you have to write code that visits every city and comes back to the starting city while covering the least distance. To do that, your algorithm effectively has to consider a huge number of orderings to find the shortest path, and it takes an exponential amount of time as you add cities. If you have five cities and it takes ten seconds to run, adding a sixth city will make it take about twenty seconds, a seventh about forty seconds, and so on. That's what two to the n, exponential, looks like. The only other example I hear talked about much is brute-force matrix chain multiplication. Anybody done that before? Did I see a hand? No, just scratching. Okay. I won't even talk about that; I don't really know much about it. You can really only find a couple of examples of two-to-the-n algorithms.

All right, what happens when you start combining complexities? For example, what if I have two different inputs? Say I need to iterate through each character of each string in an array of strings. What does that look like in terms of Big O? We sometimes call that big O of n times m, where n represents the character count of the strings and m represents the number of strings in the array. A purist would probably say, oh, that's just O(n), because it doesn't change the complexity class; but to the pragmatic engineer, that distinction is usually pretty important.

Another one you'll often hear is called n log n. Has anyone heard that term before? Probably. Most sorting algorithms that are efficient come out at n log n in the average case. This one has a funny name: sometimes people call it linearithmic, because it's linear times logarithmic. It comes out that way because as the function iterates through the array, for each element it does a logarithmic amount of work, a search over a limited subset of the rest of the array.

Okay, let's talk about amortized Big O. How many of you have heard of amortization charts for mortgages? In fact, mortgage and amortize share the same root. The idea is that over time values move up and down, but some things remain constant if you consider them over time. In the case of a home mortgage, your payment remains constant, but the amount of interest and the amount of principal in any given month will move, depending on the principal remaining.

Amortization also applies to algorithms, because sometimes you have an algorithm that is fast in most cases, but occasionally, say every hundredth step through the array, it needs to do something unusual that takes more time. Let me give you an example. C++ has a vector class, and under the hood the vector is allocated as a contiguous array of memory. It has to pre-allocate that contiguous block, and when you grow the vector past it, it has to find a new chunk of memory to hold your new element. Say the runtime allocated 10 units of memory and your vector has two elements in it. You can add a third, fourth, fifth, all the way up to the tenth element, and each of those adds is constant time. But on the eleventh, the runtime has to do something, because it's out of memory. It has to find a new block to allocate, and most C++ runtimes are smart enough not to allocate a block of size 11; instead they double it and allocate 20. So on the eleventh add it does a reallocation, on the twenty-first it does another, doubling to 40, and so on, doubling every time. What happens is that adding an element to a C++ vector is constant time most of the time, but occasionally it's linear, because it has to find new memory and copy every element in the array to the new location. That's what we call amortized constant time. If you want to read more about that, there's a really cool Stack Exchange link down there that talks about it.
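To make that concrete, here's a little JavaScript sketch of the doubling strategy (JavaScript arrays manage their own memory, so this is purely illustrative):

```js
// A sketch of a vector with the doubling strategy described above.
// push() is O(1) most of the time; occasionally it pays an O(n) cost
// to "reallocate" and copy, which averages out to amortized O(1).
class Vec {
  constructor() {
    this.capacity = 10;
    this.length = 0;
    this.data = new Array(this.capacity);
  }

  push(value) {
    if (this.length === this.capacity) {
      this.capacity *= 2;                  // double, don't just add one slot
      const bigger = new Array(this.capacity);
      for (let i = 0; i < this.length; i++) {
        bigger[i] = this.data[i];          // the occasional linear copy
      }
      this.data = bigger;
    }
    this.data[this.length++] = value;      // the common constant-time case
  }
}
```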
There's a great website called bigocheatsheet.com. This person put together a whole bunch of graphs that show each of the complexity classes visually, so you can reference them. They even threw in some that are really rare, like n factorial, which is kind of cool. Along the bottom you can see the flat blue line, barely visible behind the red line: that's your constant time. Moving up from there you have log n, then n, n log n, n squared, and so on, and you can see which complexity classes are faster than others for large inputs. Which is pretty cool. Here are some other links you can check out later.

One thing I found when trying to study Big O is that it's really easy to stumble into mathematics land and get a completely different picture of Big O. If you pull up "Big O notation" on Wikipedia, you'll walk away thinking, I have no idea what this is; this is nothing like what I learned about today. But if you look at the article on time complexity, you'll see exactly what we just talked about.

All right, I wanted to do a little quiz with the audience. I'm going to name some algorithms, and you yell back the order of complexity in Big O notation. I'll probably just have you yell it out, or maybe hold up fingers, one, two, or three; we'll see. On the lower right you've got your answer bank to choose from: I've listed five complexities.

We'll start with this one: uppercase a string, where the length of the string is n characters. Complexity? Nailed it: linear. Determine if an integer is even? Nailed it: constant, even for very big integers. Count the number of bits that are one in a 32-bit integer? Constant. Do I hear any dissenting voices? What's that? I heard a linear. Okay, this one's a little tricky, because you might think it's linear since there might be up to 32 bits to check. But a 32-bit integer is always exactly 32 bits long, so your algorithm, depending on how you implement it, will always check 32 bits. There are faster ways to do it, of course, but it's constant time for that reason. Copy an array? Linear, that's right. What if the array has deeply nested objects in it, a deep copy? Still linear? Yes? No? That one could probably be debated, but I'd say still linear.

Okay, quiz number two; we're going to get a little harder. Insert an item into the middle of an array, let's say in JavaScript, since this is a JavaScript conference. Linear. Why is it not constant? What's that? Yes, exactly: you have to copy all the remaining values in the rest of the array one slot to the right, and it might even be worse depending on whether memory needs to be allocated to fit the new contents; but since JavaScript arrays are not contiguous in memory, it'll probably be linear. Remove an item from an array? Linear, yeah. What if it's always at the end? Then it's constant; anywhere else, linear. Append an item to the end of an array? Constant, thank you JavaScript. Remove duplicates from an array? Linear? Quadratic? Logarithmic? You're not going to get logarithmic. (Exponential? If I write it, exponential every time.) Almost all of those are right: n log n is right, linear is right, and quadratic is right, depending on how you implement it. I'll leave it as an exercise for the reader, but if you can get a linear algorithm for this, that's the best case I can think of. Part of the reason some of these answers are what they are is that JavaScript arrays are not arrays the way a lot of other programming languages define them; they're actually objects where the array indices are object keys, in most cases.
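For the curious, here's one way to get the linear-time answer for the dedupe, leaning on Set; a sketch, not the only approach:

```js
// Remove duplicates in (roughly) linear time: Set membership checks
// are (roughly) constant time, and we make one pass over the array.
function removeDuplicates(arr) {
  return [...new Set(arr)];
}

removeDuplicates([1, 2, 2, 3, 1]); // [1, 2, 3]
```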
Okay, final quiz. Insert an item into a linked list? Constant time, if you already have a pointer to the place you want to insert it. Remove an item from a linked list? Constant time; man, you guys are amazing. Find an item by index, in other words by number, in a linked list? Linear, yeah, very good. And append an item to the end of a linked list? Constant.

Oh, bonus quiz. First one: compute the checksum of each string in an array, so multiple checksums, one per string. Yeah: n times m, or if you're a purist, just linear. And then sorting an array; of course, we already talked about that. Who knows what that one is? (Exponential, of course, if I wrote it.)

Good. So in closing, I wanted to give you an example of Big O complexity problems in popular culture. Who knows this song? Yes. What is the runtime complexity of this song, where n is the number of items your true love gave to you? It's quadratic, that's right. So, quadratic in the wild, there it is.

That's all I wanted to tell you about Big O. If you want to geek out with me about Big O anytime, feel free to tweet me. Thanks very much.