Hey, everyone. In this course, we're going to be learning about all the fundamental data structures and algorithms that you need to know, especially with a focus on coding interviews. This course is meant for beginners or anyone who just needs a refresher. We're going to be talking about the design, implementation, tradeoffs, and analysis of all the common data structures and algorithms. Being able to solve these problems efficiently, analyze them, discuss their tradeoffs, and then communicate the idea to others is what can make a difference of hundreds of thousands of dollars in compensation. I hope the problem-solving skills you learn in this course can serve you for your entire career. For more details on what we're going to cover, please scroll down. And when you're ready, let's get started. So the first data structure that we're going to cover is arrays. But before we even get into that, let's first understand what a data structure even is. Well, as the name implies, it's a way of structuring data. But in our case, we're talking about computers, so we're going to be structuring this data inside of RAM. Let me show you what I mean. Suppose that this is our RAM, and RAM is basically where all of our variables are going to be stored. As we write code and as we use array data structures, let's say that this over here is our array. Even though maybe we don't quite know what it is yet, we know that it's going to be stored in RAM. Let's say we have a one, a three, and a five in our array. So this is the information that we're trying to store. Now, how are we going to store it in RAM? Well, first of all, RAM is measured in bytes. It's common for many computers to have, let's say, eight gigabytes of RAM. What do we mean by gigabyte? Well, giga in this case means about 10 to the power of nine, so roughly a billion bytes. But the more important question is, what exactly is a byte? Well, a byte is nothing more than eight bits.
So the next question is, what exactly is a bit? A bit can be thought of as a position that can store a digit, but with the restriction that the digit can only be a zero or a one. You might already be familiar with this: zeros and ones are the language of computers. And as you can already see, individual bits can be grouped into bytes, bytes make up RAM, and RAM can be used to store more advanced data structures. But now let's get back to what arrays are. In this case, we have an array of integers, as you can tell. Don't worry, I know I still haven't told you what an array actually is. So in this case, we have the integer one. How can we store this integer one in our RAM? Well, we have to store it in terms of bytes. Now, it's pretty common for integers to be represented not by a single byte, but actually as four bytes. What that means is instead of using eight bits, we would actually have 32 bits. I'm not going to draw all of them out, but you can assume that in total over here, we have 32 bits. So how are we going to represent this one? We're going to talk about this a little more in depth later in the course, but for now, I'll just tell you that it's going to be 31 zeros, and then all the way at the end, a single one. What's important here is that we have a way to take this one, represent it in terms of bytes, and then put that into RAM. Now, while RAM can be thought of as a contiguous block of data, it also has two components to it. Of course, we're going to be storing values in our RAM, but every value is going to be stored at a distinct location, which we call an address. I've put a dollar sign in front of every address just to distinguish the addresses from the values that we're going to be storing. But what you can see here is that the first address is zero.
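The "31 zeros followed by a one" representation is easy to check yourself; this small Python sketch just formats the integer 1 as a 32-bit binary string:

```python
# The integer 1 as a 32-bit value: 31 zeros followed by a single one.
bits = format(1, "032b")
print(bits)       # 00000000000000000000000000000001
print(len(bits))  # 32
```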
Now, as we store an array in RAM, we don't always get to choose what location it's going to be at. But one thing about arrays is that they're always going to be contiguous. What that means is: here we're going to be storing the one, here we're going to be storing the three, and here we're going to be storing the five. Nothing is going to be in between the three that we have stored over here and the five that we have stored over here. But if that's the case, why am I incrementing the address by four every time? Shouldn't this be stored at address zero, this at address one, and this at address two? Well, remember, each value takes four bytes to store. So this doesn't represent one byte, it represents four bytes. We're storing 32-bit integers, which means they take four bytes to store. Now, this might seem very straightforward, and it is for the most part. That's because arrays are considered the simplest data structure. We store them pretty much the same way that we're representing them here. In memory, they look exactly the same as the way we use them: a contiguous set of values. In this case, we're storing integers, but we could be storing other values as well. Let me show you what it would look like if instead we were storing characters like A, B, C. Characters are stored contiguously, pretty much exactly as we would expect. But you'll notice the addresses are a bit different. In this case, we're incrementing by one as we add a new character. That's because each character actually only takes one byte to store in memory, not four bytes. At least that's the typical case if we're storing ASCII characters, though that detail isn't super important. What I'm getting at is that we can store values contiguously, regardless of how big or small they are, as long as we increment our address by the size of the value. So this was definitely a lot of theory to understand.
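The address arithmetic described above can be sketched as a one-line formula; the base address of 0 here is a hypothetical example, not something the program gets to choose:

```python
def address_of(base, index, element_size):
    # The address of element i is the base address plus i times the element size.
    return base + index * element_size

# 32-bit integers: four bytes each, so addresses step by 4.
print([address_of(0, i, 4) for i in range(3)])  # [0, 4, 8]
# ASCII characters: one byte each, so addresses step by 1.
print([address_of(0, i, 1) for i in range(3)])  # [0, 1, 2]
```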
But don't worry, this course is mostly going to focus on practical knowledge. At least we learned a bit about what memory is and how arrays can be stored in memory. Now it's time to understand how arrays can be used and what exactly their properties and tradeoffs are, up next. Now let's discuss the properties of arrays. But first, to review: we know that an array is a contiguous block of data, and it's stored the same way in RAM, as a contiguous block of data. Now, with any data structure, the two most common operations are reading the data and writing the data. We're going to first handle the simple case, which is reading the data. Let's suppose we want to read the first element of our array. Well, as a programmer, we have an array; let's say we allocated it and called it myArray. So this is a variable that we have. To read the first element, the intuitive thing is to go in memory to the first address and then read the value. Now, that's exactly what's happening under the hood, but we as programmers don't need to know the exact address of every single value. As we use the array, we use indexes to access values. The first value is always at index zero, not at index one. That can be confusing if you're a beginner, but it always starts at zero. The next one is at one, and the next one is at two, et cetera. So when we read the first value, which is at index zero, it's automatically going to go to this location in our memory and read the value that's stored there. Since we can automatically map any index in our array to a location in memory, that means we can instantly read any value as long as we have the index for it. To represent that this is an efficient operation, we call it big O of one. We're going to talk more about big O notation later in the course. But for now, what this means is that in the worst case, to read a value, assuming we have the index, so suppose we want to read the value at index two, it's an instant operation.
It happens in constant time. No matter how big our array gets, if we want to read any value in the array, we can read it instantly. That's what big O of one means, at least as an oversimplified explanation so far. The fact that we can go to any arbitrary address in our RAM and then read the value at that address is a property of RAM. It's actually in the name itself: RAM stands for random access memory. It basically means we can randomly access any portion of the memory in constant time. Suppose we wanted to access this value: we don't have to start from the beginning of our RAM and keep looking for it. We can access it directly. And that's pretty important, as you can tell, because if we have a lot of RAM, we don't want to have to go through the entire RAM every single time. If we want to access the third value in the array, we don't want to have to go through the first and second values, because that's not necessary. Now, after we read our first value, next we want to read the second value and the third value and keep doing that until we get to the end of the array. That's pretty easy. We know that the first index we start at is zero. I'm going to call it i; in code, we would typically say myArray at index i to read the value. Then, of course, we would increment i to be one, then increment it to be two, and we would keep doing that until it equaled the length of the array, and then we would stop. So essentially, we would loop through the entire input array. In most languages, you can do so with a for loop or a while loop. Now, while this might seem really simple, the ideas that we're talking about now are going to be the building blocks for very advanced algorithms we talk about later in the course. So we talked about reading from an array. Now it's time to talk about writing to an array, which is going to bring us to an important property of arrays, which is that they're of fixed size.
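The loop just described can be written either way; here's a minimal Python sketch of both forms:

```python
my_array = [1, 3, 5]

# for loop: i goes from 0 up to (but not including) the length of the array.
for i in range(len(my_array)):
    print(my_array[i])

# Equivalent while loop: increment i until it equals the length, then stop.
i = 0
while i < len(my_array):
    print(my_array[i])
    i += 1
```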
In this case, we declared an array of three values. It's fixed size. Suppose we want to add a fourth value to it, let's say a seven. Well, taking a look at our model of memory, the simple thing to do would just be to add a seven in memory. But we have to make sure it's contiguous, so we would do so only at address 12. Then we have an array of size four, and it's still contiguous. This seems like it might work. But the way RAM works, we don't always get to decide exactly where we allocate our memory. We don't always get to decide exactly where we store our values, and we don't know for sure that this spot in RAM is even available for us. It could be that we already have another array stored over here, or maybe the operating system is using this part of our memory for some other purpose. So we're not necessarily allowed to put the seven over here. Okay, then you might think: well, let's just let the operating system decide, and put the seven anywhere in memory. But if we do that, we don't follow the property of arrays. It's no longer contiguous. Now we have two arrays. That's not what we wanted. We wanted a single array, because if we do it like this, and then let's say I'm looping through this array, I'm at index two, now I increment and I'm at index three. What we're going to try to do by accessing this is go to this spot, but we don't know that the seven is stored over here. It might be stored somewhere else. So this is going to break. Basically, we've come across the biggest limitation of arrays, at least for static arrays, which is what we've talked about so far. Static arrays are fixed-size arrays. Before we continue to discuss the limitations of static arrays, I do want to mention: if you're used to a programming language like Python or JavaScript, you might have never come across this limitation, because those languages don't really offer static arrays.
You never have to worry about running out of space when you're using an array in those languages, and that's because they usually offer dynamic arrays as the default, which we're going to discuss later on. So suppose we allocated an array of size three. Initially it's empty, but there has to be something stored here. Usually by default, languages will initialize it to be all zeros, or sometimes they won't initialize it at all and there can just be some random, arbitrary values here that we didn't even put there. So initially we have some space. Let's say we start adding values: we add a five, that puts a five in memory; we add a six, that puts a six in memory; we add a seven, that puts a seven in memory. Now, we're allowed to remove values. But when we say remove, we can't actually delete this in memory. We can't deallocate it with a static array. When we say remove a value, we're basically saying we're going to overwrite it by putting, let's say, a zero there, and we would say that this value is no longer relevant. We have a zero here, and we only care about these two. The old value is still in memory, but we don't care about it. And we can remove all the values in a similar way. But what we can't do is add a new value beyond the capacity. By the way, as we add a new value to the end of the array, at the next empty spot that we're keeping track of, it's also going to be an instant operation. Because we know this is index zero and this is index one, if we want to write to this position, we know instantly that we're writing to this position in RAM. So writing to any position is also a big O of one operation. It's constant time; we can think of it as an instant operation. Similarly, removing a value is also as efficient as big O of one. It's a constant-time operation, because what we're really doing is just going to that position in memory and crossing it out. We could be replacing it with a zero or a negative one or something like that.
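A sketch of those end operations on a static array, assuming (as the course does) a fixed-size, zero-initialized block plus a counter tracking the next empty spot; the class and method names here are my own for illustration:

```python
class StaticArray:
    def __init__(self, capacity):
        self.arr = [0] * capacity  # fixed block of memory, zero-initialized
        self.length = 0            # number of values actually in use

    def insert_end(self, value):
        # O(1): write at the next empty spot, assuming there's capacity left.
        self.arr[self.length] = value
        self.length += 1

    def remove_end(self):
        # O(1): "remove" by overwriting with 0 and shrinking the length;
        # the memory itself is not deallocated.
        self.arr[self.length - 1] = 0
        self.length -= 1

a = StaticArray(3)
a.insert_end(5); a.insert_end(6); a.insert_end(7)
a.remove_end()
print(a.arr, a.length)  # [5, 6, 0] 2
```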
We're not actually deleting it, but we're overwriting it. So these are the efficient operations with a static array. Now, we have a couple of values in our array, and remember, these are ordered values. The five comes before the six; we know that because of the way the indexes work. So let's say we wanted to arbitrarily insert a value into the array at any position. By that, I mean any position in between these values, at the end of these values, or at the beginning. We know inserting at the end is always efficient, because we're inserting into an empty spot, at least if there's one available; we're assuming that there is space available. That's efficient, that's O of one. But inserting at any arbitrary position, like the middle or the beginning, is not going to be efficient. Let's say I want to insert the value four at this position. It's easy enough to overwrite the value with a four, but then we have a four and a six, and what we actually want is to have four, five, six. We could put the four over here, because that's efficient, it's O of one, and there is space there. But that doesn't achieve four, five, six; we'd have the elements out of order, and with arrays, including static arrays, the order of the values does matter. So for us to insert the four over here, we first have to shift the five to the next position. But if we shift it over here, we're going to overwrite the six, and that's not what we want. So before we shift the five, we have to shift the six, and then we shift the five. In other words, we're going to take the six, which is at index one, and set it to be at index one plus one, meaning it's going to be at index two. The same thing with the five: it's going to go to index zero plus one, which is one. So it's going to go here. Only after we've performed this shifting are we allowed to put the four over here; there's space for it now.
And we preserved the order of the values. The downside of this is we couldn't do it in a single operation; we had to move every value in the array. In this case, we only had two values in the array. But imagine if we had a really long array filled with values and we wanted to insert a value at the beginning: then we'd have to take all of these, shift them, and only then could we insert the new value. This is not very efficient; that's what I'm getting at. We call this a big O of n operation, where n refers to a variable; in this problem, n is the number of elements in the array. We can call that the length of the array, meaning the actual values we have, not the size of the array, but the number of values. And I've been using big O, but what it actually means is the worst case, because what we just did right now was the worst case. We had to shift every value in the array, which means n values, over to the right by one, and then we were allowed to insert the new value. But if we were inserting in the middle between five and six, let's say what we actually wanted was a five, then a four, then a six, then we wouldn't have had to shift the five at all. We would only have had to shift the six; the five would have stayed over here on the left, and we would have inserted the four over here. So in that case, we would have only needed to shift one value, not every value in the array. But we don't want to go through every possible scenario, so we generalize it. We say that if you're inserting a value at any arbitrary position in the array, in the worst case we might have to shift every value, meaning n values. So in the worst case, the time complexity is going to be big O of n. And the exact same thing is true if we're removing a value at any arbitrary index. When I say removing, I'm not just saying replacing it with a zero or something.
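The right-to-left shifting described above can be sketched like this, assuming the array has at least one empty spot at the end:

```python
def insert_middle(arr, length, index, value):
    # Shift every value from the last element down to `index` one spot to
    # the right, starting from the end so nothing gets overwritten.
    for i in range(length - 1, index - 1, -1):
        arr[i + 1] = arr[i]
    arr[index] = value  # now there's room for the new value

arr = [5, 6, 0]            # capacity 3, length 2, one empty spot at the end
insert_middle(arr, 2, 0, 4)
print(arr)                 # [4, 5, 6]
```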
I'm saying I want to pretend like it doesn't exist anymore; I want the first value to be six. In that case, we would do something similar, but we would take the six and shift it to the left by one. So we would say one minus one is zero, and we put it at the zeroth index. Seven is at index two; we take two minus one, which is one, and we move it to that index. In the end, we're left with a six and a seven. Technically, the seven from over here would still be there, or we could have replaced it with a zero, or done something else to pretend like it doesn't exist. But the memory that we allocated here definitely still exists, and it would be reflected in RAM. So, to summarize the operations for a static array: reading or overwriting the i-th element of an array (i-th basically means any arbitrary element; you could pick any index and read or write from it) can be done in big O of one time, in constant time. Inserting a value at the end of the array, or removing the value at the end of the array, is also a big O of one operation. That's assuming that we have an empty space left at the end, or at least one element at the end for us to remove. Now, inserting into the middle of an array in the worst case is going to be big O of n. For an empty array, inserting in the middle would be very efficient; it would be big O of one. But we know we're talking about the worst case; that's what big O refers to. To insert in the middle would be big O of n: we might have to shift every value in the array. The same goes for removing from the middle; we might have to shift every value in the array. Next, let's move on to dynamic arrays, which are much more common and much more useful than static arrays. You've probably already used a dynamic array; you just might not have known it. In Python and JavaScript, dynamic arrays are the default.
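Removing at an arbitrary index works the same way, but shifting left instead of right; a minimal sketch:

```python
def remove_middle(arr, length, index):
    # Shift every value after `index` one spot to the left,
    # overwriting the removed value.
    for i in range(index + 1, length):
        arr[i - 1] = arr[i]
    # The last slot still holds a stale copy; mark it as unused.
    arr[length - 1] = 0

arr = [5, 6, 7]
remove_middle(arr, 3, 0)
print(arr)  # [6, 7, 0]
```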
In Java, you can use an ArrayList, and in C++, you can use a vector. So let's remember that the problem we're trying to solve here is the fixed-size problem: our static array is a fixed size. But what if we have an array that is dynamic? Now, when we were creating static arrays, we usually had to specify the size. So when we were using static arrays of size three before, we specified that size when we created the array. But in this case, let's say this is a dynamic array, and we created it without initializing any of the values. With dynamic arrays, you don't necessarily have to specify the size. If you don't, it'll usually be initialized to some default size. For example, in Java, I think the default capacity of an ArrayList is usually 10, but that number could be arbitrary, and it's not necessarily important. Let's say we initialized a default empty dynamic array, and it got initialized to size three in this case. While the size is three, the length of the array is actually zero, because we haven't added anything to it yet. So we say the length is zero. Now, as we add elements to the end of the array, we call that pushing to the array. So we pushed a zero to the array. Now we're going to push another element, and what that means is: insert it at the next empty spot. So let's push a four into this spot. Now, the internal implementation of this dynamic array will maintain a pointer, basically a variable telling us at what index the last element of this array is. So we know that the first element is at index zero, the next element is at index one, and we know this is the last element so far. We have a pointer telling us where the last element is, and we can also use this pointer to get the length of the array. So right now, this pointer is at index one.
That means the length of the array must be two, because remember, we're starting at zero. So we talked about pushing, but we can also pop values. Popping in this case also refers to the end; that's why this end pointer is important for us. We're going to pop the last value. Just like how pushing a value to the end is an O of one operation, popping a value from the end is also a big O of one operation, because we know exactly where that value is; we have that pointer to the last element. We pop it, and we also take this pointer and shift it to the left by one. So now we're maintaining that this is the new end of our array. We could pop this value as well, which would make our array be of length zero again. But that's the main idea. Now, for the problem we're trying to solve: let's push another element, a seven. Now this is the end of our array. Let's push another value, say a nine. Well, we ran out of space. So what is our dynamic array going to do? So far in memory over here, we have a zero, a four, and a seven. We previously talked about why we can't just allocate more memory right next to it. So what are we going to do? Well, we're just going to allocate a brand new array that will be able to contain all of the elements that we're trying to store. What I've done here, though, is take the original capacity, the original size, which was three, and multiply that by two. So now we have an array of size six rather than size three. We're only inserting one extra element, though, so it would have been sufficient to just have an array of size four. So why did I double it? Well, we'll get to that in a minute. Since we allocated a brand new array, we know it's going to go somewhere random in memory, but we don't really care where it is. Of course, it has to be a different location than where our first array is, because that one is still in memory so far.
And what we're going to do is take all of the original values stored in the array over here and copy them into the second array, starting at the beginning. So if this is at index zero, it's going to get copied to index zero in the second array. If this is at index one, it's going to get copied to index one. If this is at index two, it's going to get copied to index two. So our second array is going to look like zero, four, seven. You can tell we solved the problem of not having enough space: now we definitely have enough space for this nine to also be here. So now our memory looks correct; we have all the values that we wanted. But the problem is we still have the original array. Do we really need it anymore? No, it's just taking up space, and we have all the values in the second array anyway. So at this point, we deallocate it, we free this memory. We basically tell the operating system we're not using this anymore, and it can use that memory to allocate different variables if it needs to. So this doesn't exist anymore. This definitely allows us to get around the fixed-size property. Now, if we add even more values to our array, which replaces the original array, we can keep doing that. Let's add a 10, let's add an 11, let's add a 12. If we add a 12, we're going to do the exact same thing that we did before, because we ran out of space. Now it's time to allocate a new array, and we're going to make one that's double the length of the original one, so size 12. Then we're going to insert the new element, and we'll have a bunch of extra space to add more elements if we need to, though we might not need to. Now let's get back to one of the earlier questions: why did we double the capacity rather than just increasing it by one for the space that we needed for the new element?
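Putting the push, pop, and resize steps together, here's a minimal dynamic array sketch; the names `pushback`, `popback`, and `_resize` are my own choices, not from the course:

```python
class DynamicArray:
    def __init__(self):
        self.capacity = 3               # initial size of the underlying block
        self.length = 0                 # number of values actually stored
        self.arr = [0] * self.capacity

    def pushback(self, value):
        if self.length == self.capacity:
            self._resize()              # out of space: grow first
        self.arr[self.length] = value   # write at the next empty spot
        self.length += 1

    def _resize(self):
        # Allocate a new block of double the capacity and copy every value over.
        self.capacity *= 2
        new_arr = [0] * self.capacity
        for i in range(self.length):
            new_arr[i] = self.arr[i]
        self.arr = new_arr              # the old block is freed (garbage-collected)

    def popback(self):
        # O(1): just move the end pointer back by one.
        self.length -= 1
        return self.arr[self.length]

d = DynamicArray()
for v in [0, 4, 7, 9]:
    d.pushback(v)
print(d.arr[:d.length], d.capacity)  # [0, 4, 7, 9] 6
```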
Well, notice that since we allocated a brand new array, we basically did a big O of n operation, because allocating the memory itself takes big O of n, where n is the size of the memory that we allocated. But regardless of that, we also had to take the original values from the original array and move them into the second array; we had to copy every single value. So that was also big O of n, where n is the length of the array, meaning the actual values that exist in the array. But what we're used to is pushing a value to the end of an array; in this case, let's say we push a value here, a 10, that would have been big O of one. So what we notice is that if we run out of space, yes, we can still add the next value, but it takes extra time, because we have to create a brand new array and push all the original values into it, which is big O of n time complexity. That's the reason why, when we create a bigger array, we double the capacity every single time rather than just increasing it by one: we're finding a middle ground so that we don't have to allocate a brand new array every single time, but we also don't end up with a bunch of empty space. We could have just created a super large array where, say, 90% of it is empty space, but that wouldn't necessarily be good, because then we're taking up a bunch of memory from our operating system. If we do it this way, where we double the capacity, we get something called amortized time complexity. What that basically means is: yes, it took big O of n to add an element when we ran out of space, but we know it's going to be pretty infrequent that we run out of space.
So we can still say that the amortized time complexity of pushing a value to a dynamic array is big O of one. You can think of it as the average time: on average, it's going to be big O of one, not big O of n, because the vast majority of the time we add elements, it's going to be constant time. And there's actually a mathematical proof for why this is the case; in math, it has to do with power series. I won't go too in depth into the math, but I will give you a high-level explanation for why this is true. Let's say our goal is to push eight elements into a dynamic array, and the default dynamic array was only of size one when we allocated it. So let's say we add our first number, five, to the array, and so far we have an array that looks like this. Next we want to add a six to the array, but we know we don't have space. So what we really have to do is take this five, move it down to here, and then insert the six over here. I'm just counting the operations we're doing so far: we did one operation when we added this five, and then to add the six, we actually had to do two operations, where we moved the five and then added the six. Next, let's add the number seven. For us to do that, we have to allocate a new array, and since we're doubling every time, this array is of size four. So we're going to move the five here, move the six here, and add the seven here. And while we did three operations inserting three values here, allocating the array itself actually also took time: allocating an array of size four took four operations, we can say. And we're going to continue to do this. I'm going to add an eight over here; that's perfectly fine. We're going to add a nine here; we know that's going to require shifting all of these. And then let's say I add a 10, an 11, and a 12, which we do have space for here.
So let's say it took us an additional eight operations to actually build this one. Now, I'm oversimplifying a little bit, because it actually took eight operations to allocate the space and then another eight operations to move each value into the array, but we're going to ignore that for now. The main point I'm trying to get at is that to create an array of length eight, it actually took us eight plus four plus two plus one operations. But since we're doubling the size every single time, this calculation is always dominated by the last term. What that means is the last term is always going to be greater than or equal to the sum of all previous terms. This is a math concept, and it's the reason why we're doubling rather than just increasing by one. But why is this actually important? Well, to create an array of size eight, in this case, it took us 15 operations. We know that this number is dominated by the last term, which means that it's always going to be less than or equal to two times the last term. In this case, the last term is eight, but we know in the general case that it's going to be n, for however many elements we wanted to push to the array. So we say that this is two times n. In big O terms, we would say it's big O of two times n, but with big O notation, we never care about constants when they're multiplying a variable. In this case, two times n can be simplified to big O of n. We don't really care about constant values, at least when they're used in multiplication or addition. So finally, the result that we arrived at is: pushing n values to an array is big O of n time overall, even though sometimes, when we exceed the capacity, we have to shift a bunch of values. Overall, it reduces to big O of n, which is pretty much the same as what we expected from our static arrays.
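That counting argument can be checked empirically. This sketch counts every operation (allocations and writes, using the same simplified accounting as above) while pushing n values into a doubling array that starts at capacity one:

```python
def count_operations(n):
    capacity, length, ops = 1, 0, 0
    for _ in range(n):
        if length == capacity:
            capacity *= 2
            ops += length  # moving every existing value into the new array
        ops += 1           # writing the new value itself
        length += 1
    return ops

# For n = 8 this gives 8 + 4 + 2 + 1 = 15 operations, and in general the
# total stays within a constant factor of n: amortized O(1) per push.
for n in [8, 1024, 10**6]:
    print(n, count_operations(n), count_operations(n) / n)
```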
So as long as we double the size every time it's exceeded, we do have an efficient data structure when we use dynamic arrays. By the way, let me illustrate why we don't care about constants when we're talking about big O time complexity. We don't care about the difference between big O of two times n and big O of n; we don't care about any constant c that could be there, no matter how big it is. It could be 10, it could be 100, it could be 1000; we don't care about it. And again, it's because of math. Suppose this line represents the runtime of big O of n: as the input size grows, the runtime also grows, but it's growing linearly; it's a straight line. But suppose instead we had a runtime complexity of n squared. Now, there is a two here, but it isn't the type of constant I'm talking about, because it's an exponent: n is being raised to the power of two, not multiplied by two. Powers are definitely important. But we don't care about some constant multiplied by n, and we don't care about some constant added to n. As you can see, n squared grows much more quickly than n, and that's the idea. Now, for some smaller inputs, it might matter. These two curves intersect at input size n equals one. But suppose we had a constant here, maybe a line like two times n. It doesn't matter what the constant is, because the curves are definitely going to intersect at some point; in this case, probably around 1.4 or so. The idea is they're going to intersect at some point, and after that point of intersection, the faster-growing function is always going to be greater than the slower-growing function. And that's what we care about.
We don't care about small inputs like one, two, eight, even 100, or even 1000. We care about really, really big input sizes, because those are the ones that slow down our CPU when we run programs. That's why when we talk about big O time complexity, we don't care about constants; we care about how quickly the function is growing. Is it big O of n, or is it big O of n squared? The same would be true if we added a constant. For example, this is what n plus eight would look like. It gets a head start over the quicker-growing function, but at some point they're going to intersect, and after that point the quicker-growing one grows way faster. It's going to go off the charts, while this one still grows at the same rate. Now, to summarize, the time complexity for dynamic arrays is essentially the same as for static arrays. We can still access the ith element anywhere in constant time. We can push and pop elements at the end also in constant time; even though in some cases we might have to resize the array, we still basically treat it as a constant time operation for the reasons I talked about earlier. And the good thing about dynamic arrays is we can do so even if we don't have space, because we can allocate more space. The downside, though, is that inserting into the middle or removing from the middle is still going to be a big O of n time operation, because we're going to have to shift every value over. And there's no amortization for this; in the worst case, this is always a big O of n time operation. So now that we understand dynamic arrays, the next thing for us to do is understand how we can use them. There's a common data structure in programming called a stack, which typically supports three operations: you can push an element to the end of the stack, you can pop an element from the end of the stack, or you can look at the last element in the stack.
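A quick numeric check makes the point about constants versus growth rates concrete (a throwaway sketch, not from the course):

```python
# Compare a linear function with a constant factor against n squared.
# 2n gets a "head start" for tiny inputs, but n squared overtakes it
# as soon as n > 2, and the gap keeps widening.
for n in [1, 2, 10, 1_000, 1_000_000]:
    linear = 2 * n
    quadratic = n * n
    print(n, linear, quadratic)
# At n = 1,000,000: 2n is 2,000,000 while n squared is 1,000,000,000,000.
```

No constant factor on the linear term changes which function eventually wins; only the exponent does.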
Meaning, in this case, we just read the last element without removing it like we would in pop. These operations should run efficiently. So pushing should be a constant time operation, popping should be a constant time operation, and peeking at the last element should be a constant time operation. So do we need to design a data structure from scratch to be able to implement this? No, the dynamic arrays that we talked about earlier actually satisfy all of these requirements. So a stack can be implemented using a dynamic array. Typically, when we're programming, we actually just use the default dynamic array that's given to us in the language that we're using. So remember, our stack is nothing but an array, and we're allowed to push elements, so we can push a one onto the stack. And sometimes people like to think of stacks as being vertical. So when we say we're pushing onto the top of the stack, we're throwing a value in here and it gets thrown to wherever the top of the stack is; since right now it's empty, it'll go to the bottom. Now, in reality, we would have some kind of pointer, or we would maintain the number of elements added so far, so we would know exactly where to push the next element. Let's push a two; it's going to go where the next empty spot is, which is now going to be shifted here. Same thing with the vertical representation: we push to the top of the stack, which is considered the end of the stack if you're talking about a regular array. And we can keep doing this. Let's push a three. We can also now push a four. Even though technically we don't have space, we know our stack is implemented with a dynamic array, so we don't even have to worry about running out of space. We know it's going to end up allocating some more space for us anyway. Stacks are just one common use case of dynamic arrays, but we can also support popping elements.
So when we pop, we can only remove elements from the end of the array, but we know that's efficient. So in this case we're popping from the top of the stack, or the end of the array. Internally, we wouldn't necessarily delete this portion of memory; we would just say that this is now the new end of our array, or the new top of our stack, and we can keep popping elements. Notice something about the way the elements are being inserted and removed. Remember, we first pushed a one, then we pushed a two, then we pushed a three. Then we popped a three, then we popped a two, and now we can pop one more element from our stack, which is the one. So notice how the order that the elements were inserted in is the reverse of the order that the elements are removed in. In other words, the last element that was added to the stack is going to be the first element that's removed from the stack. The second to last element that was inserted is going to be the second element that's removed, if we pop all of the elements. So stacks are considered a LIFO data structure: the last in is going to be the first out. There are many use cases for a data structure like this. The most obvious one is the one that we kind of implicitly just showed, meaning we had some sequence of values. In this case it was numbers, but it could have been a string of characters like ABC. The order that we added them in was the reverse of the order that we removed them in, so we could use a stack to reverse a sequence if we wanted to, even though there are other ways of doing the same thing. There are definitely a lot of other use cases for stacks that can get a lot more complex, and as you solve problems related to stacks, you'll definitely notice this. We will continue to talk about stacks throughout the course, but if you want to practice using this data structure, you can find some stack practice problems in the practice section.
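Since a dynamic array already satisfies the stack requirements, a Python list works directly as a stack. Here's a quick sketch of using one to reverse a sequence, as described above (the function name is my own):

```python
def reverse_sequence(seq):
    # Push every value onto the stack in order...
    stack = []
    for value in seq:
        stack.append(value)   # push: amortized O(1)

    # ...then pop them all back off. Because a stack is LIFO,
    # they come out in reverse order.
    result = []
    while stack:
        result.append(stack.pop())  # pop: O(1)
        # stack[-1] would "peek" at the top without removing it
    return result

print(reverse_sequence(list("ABC")))  # ['C', 'B', 'A']
```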
Now let's learn about linked lists, which have a lot of overlap with arrays, but also a lot of differences. The first thing to know about linked lists is that they are made up of list nodes or sometimes just called nodes for short. A list node is an object that encapsulates two things at a minimum. These are the value of the list node. It could be an integer, a character. It could be another object. It could be anything. Maybe in this case, we have a four as the value. And also we have next, which is a pointer. This is going to tell us what is the next list node in the linked list. So this is essentially a way of connecting multiple list nodes together, which forms a linked list. If we just had a single list node, it wouldn't be very useful. In this case, let's say our next pointer points at null, which is another way of saying that it points at nothing. That means that in this case, we just have a single list node. That's not very interesting. So now let's understand how these nodes can be connected together. Let's first build a linked list. So first, I'm going to create one list node. The value I'm going to give it in this case is going to be a string. Let's call it red. And the next pointer is initially going to be empty. Or we could say it's by default set to null, so it's not pointing at anything. That means this list node is not connected to anybody yet. And we know when we create this object, it's going to go somewhere in memory. We don't necessarily get to control where it is, but we know it's going to be somewhere. We know that we're going to have a red node somewhere in memory, and it's not going to be pointing at any other node yet. Next, we'll create a couple more nodes. Let's say a blue node and a green node. So now that we have three list nodes, it's time to actually connect them together. Now, in most languages, the next pointer is actually a reference to another list node. That's how it's implemented. So in terms of code, this is what we could do. 
We could set list node one (assuming that's the name of the variable for the object we created) dot next, the next pointer, to be list node two. So what that would do is take this pointer and basically tell it that the next node in the list is going to be this node. Now, what's actually happening under the hood, at the lowest level, is we have some address for our second list node, our blue list node. Let's call it $xx, and maybe this one is stored at address $yy. The way we know how to get to the second node is that, at the lowest level, maybe your language or operating system is doing this: the pointer tells it that the address is here. So that's how we know how to get to the second node. But usually, depending on the language that you're using, you don't have to worry so much about the address, because the reference to that object is enough. So when we talk about pointers, we're usually talking about references to objects. But I'm mentioning this because, as you can see here, the nodes are out of order in memory compared to the way that we're going to connect them. We've connected our red node to our blue node, but the order in actual memory is the opposite. What this means is that linked lists are not stored in memory the same way that we use them as programmers. They could be in some random order in memory, but they're connected via pointers. So the red node is going to point at the blue node, and eventually we're going to have the blue node point at the green node. We could have some big mess like this in memory, but as programmers, we're going to have something nice and clean like this. So this is a difference between linked lists and arrays. Arrays are stored contiguously in memory, but linked lists, as we can see, are not necessarily stored contiguously. Next, for us to connect list node two to list node three, we would do the same thing we have down here, except instead of list node one dot next, we would have list node two dot next, which is going to be set to list node three.
So something like this. And then we would finally have a nice linked list formed, like this. So now that we have an idea of how linked lists are stored in memory, we're not going to worry so much about the details of what's exactly going on under the hood. Now, what if we wanted to start at the beginning of our linked list and then loop through every single node until we got to the end? And we would know we got to the end, by the way, because our last next pointer is not pointing at anything. Now, depending on the language that you're using, the code for that would look something like this, where first we have a variable called cur. We initialize it to be at the first node in the linked list, and we keep looping as long as our pointer cur is not pointing at null. What that means is we have a valid linked list node. So let's walk through this code really quickly. Suppose this is our cur pointer; it's pointing at the red node. That's how it's initialized. Then we go to the while loop statement. Right now cur is not pointing at null, so we're going to enter the while loop, and cur is going to be set to cur dot next. So we're going to follow our next pointer over here, which means cur is now going to be pointing at the next node. Very simple. Then we're going to go back to our condition: cur is still not equal to null, so again we set it to cur dot next. So we're going to follow the next pointer, which is now pointing at the green node. So this is where our cur pointer is now going to be. We're going to again check the condition; it's still not equal to null. So now we're going to follow the next pointer again, which in this case is pointing at null; it's not pointing at any other node. So now our cur pointer is going to be set to null, and then we're going to evaluate the condition, and in this case, it's going to be false. So we're no longer going to execute the while loop, and we're done with that piece of code.
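The node-building and traversal described above can be sketched in Python. The names `ListNode` and `cur` follow the transcript's description; the rest is a minimal illustration:

```python
class ListNode:
    def __init__(self, val):
        self.val = val     # the value stored in this node
        self.next = None   # pointer to the next node, null by default

# Create three disconnected nodes...
node1 = ListNode("red")
node2 = ListNode("blue")
node3 = ListNode("green")

# ...then connect them via their next pointers.
node1.next = node2
node2.next = node3

# Traverse from the first node until we fall off the end (next is None).
cur = node1
while cur is not None:
    print(cur.val)
    cur = cur.next
# prints red, blue, green — O(n) to visit every node
```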
Now my question for you is: what would have happened instead if the next pointer of the third node was actually pointing at the first node? Well, we would have started here, then gone to the blue node, then gone to the green node, and then we would have come back to the red node, then back to the blue node, back to the green node, and just kept doing that over and over again. It wouldn't be very fun to run, because the program would never terminate; it's an infinite loop. But going back to the first case, where we didn't have an infinite loop, what was the time complexity of going through the entire linked list? Well, it was the same as doing so with an array: it was big O of n, where n is the size of our linked list. So far, we've learned that linked lists are pretty similar to arrays. Let's talk about a few more similarities real quick. As programmers, it would probably be helpful if, as part of our implementation of the linked list, we were always keeping track of the head of the linked list and the tail of the linked list, or the end of it. So let's assume that we always have pointers that give us access to the first element and the last element. If we have a linked list of size one, suppose the other nodes didn't even exist; then our tail pointer would be pointing at the same element as our head pointer. That's how we would know we only have a linked list of size one. But in this case, we have a couple more nodes. Suppose we now wanted to add a new element to the end of the linked list. Let's say we create a new node and its value is purple. Well, if we want to insert it at the end, conveniently, we have a tail pointer at the end of our current linked list, and all we need to do to add this to our linked list is connect the pointer. So we would simply take the tail dot next pointer and set it to list node four, which is our purple node. So that would look something like taking this pointer and instead connecting it to this one.
But the problem here is that our tail is still pointing at the third node. We want to update our tail pointer to actually be pointing at this new one. And in terms of code, that's also pretty easy: we can set tail equal to list node four. We could also have set tail equal to tail dot next, because remember, tail is still pointing at the third node, and we know tail dot next is now set to list node four. So by setting tail equal to tail dot next, we're automatically setting it to this node. Can you tell me the time complexity of this operation? Well, it was a constant time operation. Since we already had a tail pointer at the last node, we didn't have to iterate through every single element in the linked list. We started all the way at the end, created a new node, and connected the pointers. It was just one operation that we needed to do, which is exactly the same as with an array. Now, let's say instead we wanted to remove a node. Removing any node in a linked list, whether we're talking about the end of it or the beginning of it, is also always a constant time operation. But there's a really big caveat: that's assuming we have a pointer to the node before the one that we're trying to remove. So if we're trying to remove this node, we should hopefully have a pointer to the previous node. So if we're removing this node, what we would do is take our head dot next pointer and, instead of it pointing at the second node, set it to the next node that the second node is pointing to. This is a little complicated because we're chaining the fields. Head dot next, remember, refers to this node; this node is equal to head dot next. And then if we say dot next one more time, we're following this pointer to the third node. So what we know is this evaluates to the third node. So when we say head dot next, which is this pointer, is now pointing at the third node, we're basically getting rid of this guy by taking this pointer and setting it to the third node.
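Both operations described above, appending at the tail and removing the node after the head, can be sketched in Python. `ListNode`, `head`, and `tail` are the names the transcript uses; the exact class layout is my own minimal version:

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

# A three-node list: red -> blue -> green
head = ListNode("red")
head.next = ListNode("blue")
head.next.next = ListNode("green")
tail = head.next.next

# Append at the end in O(1): connect the pointer, then move tail.
node4 = ListNode("purple")
tail.next = node4
tail = tail.next            # equivalently: tail = node4

# Remove the second node in O(1): we already hold a pointer (head)
# to the node *before* the one being removed, so we just skip over it.
head.next = head.next.next  # red now points straight at green
```

Notice there's no shifting of elements anywhere; each operation is a couple of pointer assignments.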
Now, if we want to, we can free the memory that this node was taking up. But in most languages, we have garbage collection, so we don't really need to take care of that; the language and operating system will take care of it for us. We can assume the node no longer exists if there aren't any pointers pointing at it anymore. But this is sort of a benefit compared to arrays: we can remove any element, whether we're talking about the beginning or the end, in constant time, because we don't have to shift everything over like we did with an array. But in the general case, if we wanted to, let's say, delete an arbitrary element in the linked list, we would first have to start at the beginning of the linked list, follow the pointers until we found the element, and then remove it. So, practically speaking, to remove an element it can still take big O of n time if we have to search for the element. Now, so far we've talked about singly linked lists, but there's a slight variation called doubly linked lists, which can also be useful. The main difference, as the name implies, is that we have two pointers. We have one pointer for the next node, the same as for singly linked lists, but we also have a previous pointer, which is going to point at the previous node in the linked list. So with three nodes, one, two, and three, the next pointers are going to be connected to the next node in the linked list, and the previous pointers are going to be connected to the previous node in the linked list. Now, the first node does not have any previous node, so its previous pointer is going to be set to null, we can assume, the same way the last node has a next pointer set to null. That's how we know we've gotten to the end of the list, and this is how we know we've gotten to the beginning of the list. Now, just like with singly linked lists, let's assume that we have a head and a tail pointer to the first and last node of our linked list.
Now, let's say we wanted to add a new node to the end of the linked list. It would be very similar to doing it with a singly linked list, meaning we would take the next pointer of our current tail node and assign it to list node four. We would also want to set the previous pointer of list node four to point at list node three, because we want to make sure that this is doubly linked; we want to preserve the properties of our doubly linked list. And then we would take our tail and shift it to the new node that we just inserted. This is what the code for that operation would look like. We would first say tail dot next is equal to the new node that we're inserting; we're connecting the pointers, we're adding that node to our list. We're also connecting the previous pointer of that new node. The previous pointer should be set to the current tail. Remember, our tail pointer was originally pointing at node three, so that's what our previous pointer is going to be set to. But finally, in the end, we take our tail pointer and set it to the next node, which is going to be list node four, from the assignment we did right here. The one thing to note is that the order we do these operations in is very important: we have to do this operation before we update the tail pointer. So now let's look at removing a node. Let's remove the last node from our list. And it's a little bit easier with a doubly linked list, because we're allowed to look backwards. Assuming we only have a pointer to the tail, that's actually all we need. We don't need a pointer to the previous node, because from our tail, we can follow the previous pointer and it's going to take us to the previous node. This is really convenient. With a singly linked list, we would have had to start at the beginning and keep going forward until we got to the node that we wanted to be at.
We would start at the tail and follow the previous pointer, so we would be at list node two. We would take its next pointer, which is currently pointing at list node three, and instead point it at null. And it's really that simple; now we have a linked list of size two. Technically, this node still exists and its previous pointer is still pointing here, but if we take our tail pointer and set it to be here, as far as we're concerned, this node doesn't exist anymore. Technically it does, but we have no references to it, so we're never going to see it ever again. And depending on the language you're using, garbage collection might delete it automatically for us. The code for that operation would look something like this, where we create a variable called node two and assign it to tail dot previous, which is going to be here. We didn't originally have a pointer to this node, so we created one by taking tail and getting the previous pointer. Now we take the next pointer of this node and set it to be null. It used to point at node three; now it's not pointing at anything. That means that this is the new end of our linked list, so we should take the tail pointer and set it to node two as well. At this point, we have a doubly linked list of two nodes left. By the way, we did a few operations, but we didn't have to loop through the entire linked list. Regardless of the size of the linked list, this delete-last operation is always going to be the same: it's going to be a constant time operation. So with doubly linked lists, we can insert a value at the end and we can remove a value at the end, so we can do both of those in constant time. Doesn't that satisfy the requirements of a stack data structure? Yes, it does. That means that stacks can also be implemented with linked lists, just like they can be with arrays.
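The doubly linked operations above come out to a handful of pointer assignments. Here's a minimal Python sketch; the transcript only describes the assignments, so the class names and method layout are my own, and `remove_last` assumes at least two nodes, as in the walkthrough:

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None
        self.prev = None

class DoublyLinkedList:
    def __init__(self):
        self.head = None
        self.tail = None

    def insert_last(self, val):
        node = ListNode(val)
        if self.tail is None:        # empty list: head and tail are the same node
            self.head = self.tail = node
            return
        self.tail.next = node        # connect the old tail forward to the new node
        node.prev = self.tail        # connect the new node backward
        self.tail = self.tail.next   # move tail last, AFTER the links are set

    def remove_last(self):
        # Assumes the list has at least two nodes, as in the walkthrough.
        node2 = self.tail.prev       # the node before the one we're removing
        node2.next = None            # cut the old tail out of the list
        self.tail = node2            # node2 is the new end of the list

lst = DoublyLinkedList()
for v in [1, 2, 3]:
    lst.insert_last(v)
lst.remove_last()   # list is now 1 <-> 2; both operations are O(1)
```

The ordering note from the transcript shows up in `insert_last`: `tail` is moved only after both links are connected.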
But it's a lot less common to implement stacks using linked lists, because with our original dynamic array, we could pop and push at the end of the array efficiently, but we could also access any arbitrary element of the array. This is a downside of linked lists, whether we're talking about singly linked lists or doubly linked lists, because we can't just arbitrarily access the second element or the third element or the seventh element. We have to follow our pointers to do that. That means that to access a random element of the linked list, it would be an O of n time operation. So if that's the case, it's probably better to use a dynamic array if we're talking about stacks, because we get extra functionality that we lose when we use linked lists. So let's wrap up by reviewing the time complexity of linked lists and comparing that to arrays. So remember, accessing the ith element of an array, accessing any element at any arbitrary index, is a constant time operation with arrays. Inserting or removing at the end is also efficient: we can push and pop efficiently. But inserting into the middle or removing from the middle is going to be a linear time operation, because we might have to shift all of the elements over in either direction. Now, with a linked list that looks something like this, accessing any element in the linked list is not necessarily going to be efficient. Of course, if we already have a pointer to this node, for example, it will be a constant time operation, but that's not always going to be the case. If we had a really big linked list and we wanted to access an arbitrary element in the linked list, in the worst case we would have to start at the beginning of the list and keep iterating through the list until we got to that element. So linked lists are different: we can't just randomly access any element. The worst case time complexity is big O of n.
But if there's a downside to linked lists, we can also expect that there's going to be an upside to them as well. Because that's the whole point of data structures. Some are better at certain things. Now inserting and removing at the end is the same as arrays. It's a constant time operation. We talked about that earlier. Inserting and removing from the middle, though, is a constant time operation. Remember, if we have a linked list that looks something like this, if we want to remove the middle element, we don't have to now shift everything over. We don't have to take all of these and then shift them over by one. All we have to do to remove an element like this one is take the pointer. It's currently pointing at that. But now we want it to point at the next one. That's all we have to do to remove from the middle. Inserting works very similar to that. So, yes, it's efficient to do that with linked lists, but the caveat and it's a really big one for us to remove any random element. First, we have to arrive at that element. We need a pointer to the element before we can remove it. So in most cases, we're going to have to start at the beginning, keep iterating until we find the element that we want to remove and then remove it. So while doing the insertion or removal is efficient, usually we have to iterate through the linked list before we can do that. So in most cases, it's going to be a linear time operation. So we can see that linked lists do have a slight benefit compared to arrays. But I'll be honest and say that arrays are much more common and much more useful for problem solving, being able to access the ith element, any arbitrary element very efficiently is really, really important, much more than something like this, especially when most of the time we do this, we're not actually going to be able to do it in constant time. We're going to have to iterate through the linked list. 
These are a couple of very fundamental data structures, and I can't wait to show you even more. Next, let's talk about queues. Queues are another important data structure, similar to stacks. The difference is that queues follow FIFO, which means first in, first out. They typically support two operations, enqueue and dequeue. When we enqueue elements, it's the same as pushing, so in this case it's actually similar to a stack: we push elements to the end of the queue. That's called enqueuing. Now, dequeuing is removing a value, but instead of removing from the end, we're actually going to remove from the beginning. That's why it's first in, first out: the first element that we add is going to be the first element that we remove. Elements are going to be removed in the same order that they were added to the queue. And we want to do these operations efficiently: we want to be able to add to the end of the queue in constant time and remove from the beginning of the queue in constant time. We know we couldn't do that with arrays, because if you remove from the beginning of an array, then you have to take every other element and shift it over, and that turned out to be a big O of n time operation. But with linked lists, we can achieve this. Let me show you how. So suppose initially we have an empty queue, so we'll have a head pointer which is initially set to null; it has nothing. And then we add a value to our queue. Let's say we add red to our queue. We create a list node, and then our head pointer is going to be set to that list node, because now we have one element in our queue. By the way, with this linked list, we will be maintaining a head and a tail pointer, which are both initially going to be at this node as of now. But we want to add another value to our queue, so we add another value, in this case blue, to our queue. It's as simple as taking the next pointer of this node and setting it to the new node that we're inserting.
At this point, our head pointer will be pointing at the first element here, and our tail pointer will be pointing at this one, which is our tail. That's how we know, when we insert a new element like this one, that we're going to take the next pointer of the tail node and set it to the new node. This is really easy with linked lists, but now it's time to do the other operation, which is dequeue: remove from the head of the list. Well, the good thing is it's as easy as taking our head pointer and setting it to be head dot next. Since our head pointer is at this node, dot next will follow the next pointer, which will be here. So it's as easy as that to take our head pointer and then point it at this node. And then, if this is the beginning of our linked list, we can pretend like the old node doesn't even exist anymore. So dequeuing an element is very efficient; that's how we can do it in constant time, which you can't really do with arrays. But I do want to mention that it technically is possible to implement queues with arrays, but it can get a lot more complicated; it's much simpler to implement them with linked lists. Queues are a very commonly used data structure, and we'll continue to use them throughout this course. Recursion is a very important concept for data structures and algorithms, and we'll be using it throughout the rest of this course. It can be pretty difficult to wrap your head around, so I think it's best to introduce this topic as soon as possible. We're going to start with what I call one-branch recursion, and we're going to do so using a math formula, specifically n factorial. If you're not familiar with it, n factorial is a shorthand for writing n times n minus one times n minus two, all the way until we get to the base number, which is one.
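The enqueue and dequeue steps above can be sketched in Python with a linked list. This is a minimal illustration; the class name `Queue` and its layout are my own, not from the course:

```python
class ListNode:
    def __init__(self, val):
        self.val = val
        self.next = None

class Queue:
    def __init__(self):
        self.head = None   # front of the queue (where we dequeue)
        self.tail = None   # back of the queue (where we enqueue)

    def enqueue(self, val):
        node = ListNode(val)
        if self.tail is None:       # empty queue: head and tail are the new node
            self.head = self.tail = node
        else:
            self.tail.next = node   # link the old tail to the new node
            self.tail = node        # the new node is now the back
        # O(1): no shifting, just pointer updates

    def dequeue(self):
        val = self.head.val
        self.head = self.head.next  # the old front node is simply abandoned
        if self.head is None:       # queue became empty
            self.tail = None
        return val                  # O(1)

q = Queue()
q.enqueue("red")
q.enqueue("blue")
print(q.dequeue())  # red — first in, first out
```

Contrast this with popping from the front of an array, which would shift every remaining element over in O(n).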
So, for example, five factorial would be five times n minus one, so five minus one, which is four, then n minus two, which is three, and we keep decrementing the number by one until we get to the base number, which is one. And then we do this calculation; it turns out to be 120. Now, you definitely don't need recursion to do this calculation. The easiest way to do it would be with a loop, where we take this number, multiply it into the result, and decrement it by one, until we get all the way down to one. But we can also do this with recursion, and I think it's a good way to get introduced to recursion. Notice how five factorial is the same as five times four factorial, because we know four factorial is going to be four times three times two times one, and we see that repeated over here. So another way of representing five factorial is to say five factorial is equal to five times four factorial. Or the general equation, which we've written up above, would be to say n factorial is equal to n times n minus one factorial. The reason it's important to understand it in this way is because recursion is all about sub-problems. We can see that it's not super simple to calculate n factorial, but we know it's going to be n multiplied by n minus one factorial. So we took a big problem and made it a slightly smaller problem. We call this the sub-problem, and this is why we're allowed to use recursion for this problem: because we have a sub-problem. Now, I could show you what the code would look like to solve this recursively, and it's actually very short, but I think it's better to start with a visualization and then move to the code, which makes it easier. So suppose we're trying to compute five factorial. We know that the sub-problem is four factorial, n minus one, right? So we take five minus one, and we want to compute four factorial first.
We know that the sub-problem we have to compute first is four factorial, and then we can multiply that by n, which in this case is five. By far, the best way to represent recursion is using a decision tree. In this case, we only have one decision. That's why I call this one branch recursion, but a tree is still helpful to visualize this. So before we can compute this, we have to compute four factorial. So in terms of code, this step would look something like this, where we have a function called factorial. It's going to take in some integer where we're going to compute the factorial of that, and then we're going to return the result, which is also an integer. Now, so far, we know that to compute n factorial, we have to solve the sub-problem, which is n minus one factorial. So we call the function that we're inside of, we call our own function. We call the same function. This is called the recursive step, but this isn't all we need. That's why I've left a little space over here. So let's continue with this and see what happens as we continue to solve the sub-problem. So we have four factorial. Well, again, we know that's equivalent to four times four minus one factorial. So we want to compute the sub-problem three factorial before we have four factorial. But the same applies for three factorial. It's equivalent to three times three minus one factorial, which is two factorial. So let's compute that. And again, same applies to two. It's equivalent to two times two minus one factorial, which is one factorial. Now, this is where we get to something different. Once we get to a one, that's the base of our series. That's the last term of the multiplication. So when we get to one factorial, that's equivalent to one. So that's what we call our base case. That means that for one factorial, we're not going to use this formula because we assume that one factorial is mapped to a constant, which is one. 
So we don't have to do this calculation because it would result in one times one minus one factorial, which is zero factorial, which in math is valid. Zero factorial would also map to one. But that's not so important for our case here. What's important here is that one factorial is when we can stop. So in terms of code, that would look something like this. If n is less than or equal to one, we can return one. The reason I say less than or equal is because of that zero case. If we were given zero as our original input, we would want to return one. That's just the definition in math. But you can see this is very important because if we don't have this, we would continue this infinitely. We would continue to call the same function that we're inside of. It would never stop. And we would never go back up to compute our original result, which was to compute five factorial. We would never go back here. We would just keep going down and down and keep getting more and more negative. So how is this actually going to run? Well, let's say we were given five as the input. Five is not less than or equal to one. So we say that we're going to return five times four factorial. So that's how we go down here. We're going to go down a step to here. And then for four, we're going to see the same thing. It's not less than or equal to one. So we're going to compute three factorial. So we're going to keep going down and down and down until we get to one here. So that's going to be our base case. That's when this is actually going to execute. We're going to end up returning one. So we're going to go back up to our parent and have a one that we're returning. So at that point, this code is going to execute. We passed in two minus one factorial here. This return call evaluated to one and currently n is two. So this is going to be two times one. And then from here, we're going to end up returning two. So from here, we pop back to the parent and return two. 
So then this line is going to execute because what we passed into here was three minus one, which was two factorial. We know two factorial now is equal to two. That was the return value of this. So then we're going to compute three times two and then return that; that's six. So now it's starting to get a little interesting. We're going to return six from this line. We're going to return six back up to the previous call. So this, which was four minus one, which was three, three factorial, is six, multiplied by n, which in this case is four. So four times six is going to be 24. We're going to return 24 up to the previous call. And then we'll know that four factorial was 24, and n in this case is five. So five times 24 is 120. That's going to be the result of the call, which is five factorial. So when we called factorial and passed in five, this is going to be the result. n is five. So five times 24 is 120. That's going to be our result. So when it comes to analyzing the time complexity, it can be difficult to determine just by looking at the code. That's why drawing a picture like this can be helpful. You can see that to compute n factorial, we basically have n steps. We have n calculations that we have to do, n multiplications rather. So to compute this, the time complexity is big O of n, where n is, of course, the input that we're given. This is the same as it would be if we used a while loop or a for loop where we start with n and we decrement it by one each time to get this computation. So there's definitely no benefit in this case of actually using recursion. I'm mainly doing this to teach you, and in this case the recursive solution is actually worse than doing it with a while loop, because we end up needing extra memory. This solution actually has big O of n memory complexity. We care about time complexity, but we also care about memory complexity. We'd rather not allocate additional variables that take up extra memory if we don't need to. 
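To make the walkthrough concrete, here's a sketch of the recursive function we just traced, alongside the loop version mentioned earlier. This is just one way to write it in Python; the exact code shown on screen may differ slightly.

```python
def factorial(n: int) -> int:
    # Base case: 1! is 1 (and by the math convention, 0! is also 1).
    if n <= 1:
        return 1
    # Recursive step: n! = n * (n - 1)!
    return n * factorial(n - 1)

def factorial_iterative(n: int) -> int:
    # Same computation with a loop: still O(n) time, but O(1) extra memory,
    # since there is no chain of pending function calls to keep alive.
    result = 1
    while n > 1:
        result *= n
        n -= 1
    return result

print(factorial(5))            # 120
print(factorial_iterative(5))  # 120
```

Notice that each call to `factorial` has to wait on its sub-problem before it can multiply, which is exactly why n calls are alive at once and the recursive version uses O(n) memory.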
Now, in this case, we're not even allocating any major variables. We don't have any arrays or linked lists or anything like that. So why is this taking up O of n memory? The way this code is going to run is similar to how I've drawn this picture. We're going to have a function that's computing this. That function is going to take up space in memory, and we're going to save that. We're going to put that function on hold and we're going to say, OK, stay right there. First, we have to compute four factorial, and then we're going to keep this in memory as well. We're going to save our spot here, and we're going to say: before that, compute three factorial; before that, compute two factorial; before that, compute one factorial. At the point where we're calculating one factorial, we're going to have all five of these functions taking up memory. Now, each function doesn't really declare any major variables, but each function is going to have its own input parameter. So you can think of the function declaring one variable at the very least, and the function itself actually takes up a little bit of memory so that it knows, when this function is done, to return back up to the parent. But that's getting a little too into the details. The important thing here is that we have n function calls that are all going to be alive at the same time. So this takes O of n memory. And by the way, the non-recursive solution would look something like this, where we just have a single variable. Let's call it our result or our factorial. It can be initialized to one because we know that's going to be the base case anyway. I don't think negative factorials are something that we actually have to consider. So we're going to assume that n is always greater than or equal to one. And then while n is greater than one, we would multiply the result by the value, so n, and then decrement n by one. So then we would multiply by n minus one and n minus two until n is equal to one. 
And then we would stop. Now let's look at the more interesting case of two branch recursion. And let's use another math example. In this case, let's calculate the Fibonacci numbers. Now, the general formula to calculate the nth Fibonacci number is to take the n minus one Fibonacci number and the n minus two Fibonacci number and add them together, which gives us the nth Fibonacci number. But if we do it like this, we're going to go on forever if we try to calculate the fifth Fibonacci number, because when do we know we can stop? Well, we do have some base cases defined for us: the zeroth Fibonacci number is zero, and the first Fibonacci number is one. So using this, what would the second Fibonacci number be? Well, it would be the n minus one Fibonacci number, which is one, plus the n minus two Fibonacci number, which is zero. And we know that these are defined as a part of our base case. This is one. So one, this is zero. So plus zero. That means the second Fibonacci number is equal to one. So now that we have that, then we can compute the third Fibonacci number, which we know is going to be n minus one, that's two, plus n minus two, that's one, the first Fibonacci number. So now we have the second Fibonacci number, which is one. We have the first Fibonacci number, which is also one. So this is equal to one plus one. So therefore the third Fibonacci number is two. Now, the way I'm calculating it right now is very straightforward. I'm pretty much looping through these and we could go as high as we want. We could now calculate the fourth, the fifth, the sixth Fibonacci number. But this is not recursion. The recursive solution for this problem is actually less efficient. But I still want to show it to you because it illustrates for us two branch recursion. So let's say we want to calculate the fifth Fibonacci number. This is where our decision tree is going to get very useful. 
Maybe we can't immediately compute this, but we know this can be broken up into a couple sub problems like n minus one and n minus two. So to compute the fifth Fibonacci number, we can compute the n minus one, which is four, and n minus two, which is three. But for either of these branches, we still haven't reached the base case yet, which we know is zero or one. So let's continue down the left path. This can also be broken up into a couple sub problems. Again, n minus one and n minus two. So n minus one would be three and n minus two would be the second Fibonacci number. For the three, it's going to be n minus one, which is the second Fibonacci number, and n minus two, which is the first Fibonacci number. Now let's continue to draw this out. So the third Fibonacci number can be broken up into n minus one, which is the second Fibonacci number, added with the first Fibonacci number. To get the second Fibonacci number, we can add the first Fibonacci number and the zeroth Fibonacci number. It's starting to get a little bit messy, but we're almost done. So over here, the second Fibonacci number, which we sort of just did over here, is also going to have two steps, which is the first Fibonacci number added with the zeroth Fibonacci number. Now over here, do we also need to break this up into sub problems? No, because we finally reached our base case over here. So now to take a quick preview at the code, the base case we know is going to be either zero or one. So we could have an individual case for each of these. We could say if n, the input, is zero, then return zero. And if it's equal to one, return one. But we can condense that into a single if statement: if n is less than or equal to one, that means if it's either zero or one, then return n, because when n is equal to one, we're returning one, and when it's equal to zero, we're returning zero. So whatever it is, that's what we can return when it's less than or equal to one. 
But for the recursive case, as we've been doing, we know we have to compute the n minus one and the n minus two Fibonacci number. And then when we have those, we're finally going to add them up and then return that integer value. But we're not quite done yet. We know that this here, the first Fibonacci number, it is one. So that's what I'm going to put here. And actually, we've reached the base case with all of these, but there's just one branch left for us. So let's quickly complete that one and then we can compute the result. So the second Fibonacci number we know is n minus one, which is one, plus n minus two, which is zero in this case. So adding these together will give us the second Fibonacci number. So now going down this path, for n equal to one, we're going to end up returning one in our code over here, and for n equal to zero, we're going to end up returning zero. And then we're going to be back up here in our decision tree. For this, the code we're going to be at is this return code. We've computed n minus one, which turned out to be one. We computed n minus two, which turned out to be zero. Now we're adding them together: one plus zero. So we're going to end up returning one from here. And this was also a base case. So recursively here, we would have executed this line of code and we would have ended up returning one. You can see that it might not be easy to know what's going on just by looking at the code, especially when you get to two branch recursion and beyond that. That's why we draw out these decision trees, which make it relatively simple. So we know to compute the third Fibonacci number over here, we get the results of the two paths. We were first computing the second Fibonacci number, which turned out to be one, and the first Fibonacci number, which turned out to be one as well. We add those together. So we add the return values from here, and then we return two from here. So now we're trying to compute the fourth Fibonacci number. 
First, we needed to compute the third Fibonacci number to do that. We did that here. Now we need to get the second Fibonacci number and to get this, we get its sub problems, the first and the zero Fibonacci number, which both are, well, this is one and then the zero is zero. We add those together. That's going to be one. So from here, we return one. So to get the fourth Fibonacci number, we have two, which is the third Fibonacci number and one, which is the second Fibonacci number. We add those together and we return three. Now at this point, you're probably starting to get the idea. So finally, to get our original result, which is the fifth Fibonacci number, we needed to compute the fourth Fibonacci number. We did that. We have three. And now we need to compute the third Fibonacci number. And to do that, we need to get the second and the first Fibonacci number. And to get the second Fibonacci number, we need to get the first and the zero, which are going to be one and zero. Add those together. We get one as the second Fibonacci number. Add this and this together. We get two as the third Fibonacci number. And then add this and this. We get five to be the fifth Fibonacci number. It's gotten very messy here. But we did finally get our result. And again, this is definitely not the most efficient way to solve this problem. Earlier, we were doing it with a loop. So to get the Nth Fibonacci number with the loop technique I was talking about, the time complexity would be big O of N. But here we've drawn it out recursively. And clearly this is not big O of N. So to analyze the time complexity of this recursive solution, looking at the code is not going to help us a ton. The picture is a lot easier to use, but it's going to require a bit of math. First, to get the fifth Fibonacci number or just some general Nth Fibonacci number, we have to break that up into two sub problems, right? 
And then those two sub problems have to be broken up into two more sub problems each, and then those sub problems might need to be broken up into two more sub problems unless, of course, we've reached the base case, which for this one, we have reached the base case. But in general, for each of these, we might have to break it up into two more sub problems. Now, the fact that we have to break it up into two sub problems comes from the fact that we have two branches in our recursive tree. But in this recursive tree, how many times are we going to have to break it up into two steps? In other words, what is the height of this decision tree going to be? The height is basically how many levels this tree has. You can see one, two, three, four, five levels. That makes sense because for the path where we have five, then we have four. And then to get that, we have to go to three, then to two, and then to one. That's what the longest path is going to be. So the number of layers in this decision tree is going to be n, where, you know, that's the nth Fibonacci that we're trying to compute. Now, this is the part where I could get overly technical if I wanted to, but I want to try to keep it simple in terms of math. So in general, to get the number of Fibonacci values in the last level, theoretically, we start with one and then we double every time. So in theory, it'd be two times two times two. And the number of terms that we would have is equal to n. Roughly speaking, it's equal to n because that's the height of the tree, right? If we're taking this and then doubling it and then doubling it, the number of times we double it is going to be n. And that's the same as the input value. So the number of values in the last level is going to be roughly, let's say, two to the power of n. Now, in this case, we can see that the last level only has two. 
So in reality, this is actually an upper bound for the number of values we could have in the last level. It's not the precise value. It's an upper bound. But what we know about this last level is the same thing we talked about earlier in math: when you have a series like one and then two and then four and then eight, where you're doubling it every time, this is special in math. It's called the power series. It's always dominated by the last term. So eight is going to be greater than or equal to all the previous ones combined. If we add a value 16, it's going to be greater than or equal to all the previous combined. So what I'm saying here is that this is an upper bound for the last level. That means it's also an upper bound for the total number of values in this decision tree. If you don't believe me, well, remember, we can multiply this by any constant. So let's say our constant is two. Two times two to the power of n is still, in terms of big O time complexity, equivalent to two to the power of n. Precisely, though, it would be two to the power of n plus one. But remember, we don't care about constants when they're multiplying or adding to variables. So two to the n plus one is equivalent to two to the n minus one as well, right? In terms of big O time complexity, all three of these are equivalent. So this is essentially a long explanation for saying that two to the power of n is how we can bound this in terms of big O time complexity. Even though this isn't the precise number of values in this decision tree that we drew, it's still a very good upper bound, because remember, we don't care about constants like plus one or minus one. And this idea of two to the power of n is something we're going to continue to see throughout the course. It all comes from that idea from math, from the power series. Two to the power of n turns out to be a very special number for computer science and math. 
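To connect the analysis back to code, here is the two-branch recursive function we've been tracing, along with a small call-counting helper. The helper is my own addition for illustration, not part of the lesson; it just confirms empirically that the total number of calls stays under the two-to-the-power-of-n upper bound.

```python
def fibonacci(n: int) -> int:
    # Base cases: F(0) = 0 and F(1) = 1, so we can just return n.
    if n <= 1:
        return n
    # Two-branch recursion: F(n) = F(n - 1) + F(n - 2).
    return fibonacci(n - 1) + fibonacci(n - 2)

def count_calls(n: int) -> int:
    # Count every node in the decision tree for fibonacci(n):
    # one call for this node, plus all the calls its two branches make.
    if n <= 1:
        return 1
    return 1 + count_calls(n - 1) + count_calls(n - 2)

print(fibonacci(5))    # 5
print(count_calls(5))  # 15 nodes in the tree, well under 2**5 = 32
```

So the bound is not tight, but it is a valid upper bound, which is all big O asks for.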
Next, let's move on to a very fundamental topic, which is sorting, and we're going to start with the insertion sort algorithm. Sorting is used all the time, and there's a ton of different sorting algorithms, all with their own trade-offs. It's usually applied to arrays like this one, but you can also apply it to other data structures, including linked lists and others. So in this case, we have an array of five values and clearly they're not in order. They're not in ascending order, which would be an increasing order like one, two, three, but they're also not in decreasing order, which in this case would be six, four, three, etc. Suppose we want to sort them such that they are in ascending order. So our target array would look something like this: one, two, three, four, six. We can achieve this with the insertion sort algorithm. Let's see how we can do it. Well, the main idea is to break the problem into sub problems again. So if we want to sort the entire array, the first step would be to have the first portion of the array, meaning just the first element, be sorted. Well, one value like two is sorted by default. I mean, how could this be not sorted if it's just one value? So we consider any array of one value to be sorted. The next thing to do, the next sub problem, would be to sort the first two values, make sure these are sorted, then have the first three values sorted, then have the first four values sorted, and keep doing that until we have the entire array sorted. So thinking of it this way in terms of sub problems makes the algorithm a lot more simple. Now, since we are doing it with sub problems, technically, if we wanted to, we actually could solve this using recursion, which is a technique we just learned, but it's not really useful in this case. So I won't do it. We can do this without recursion, AKA we can do this iteratively. We can have an iterative solution, which is using loops. 
Now, the idea is we're going to loop through the array, but we're going to start at the second element because we know that this sub array is already sorted. So starting here and assuming that every value before this position, meaning this sub array, is already in sorted order, our goal now is to have this array be sorted. So our goal really is to figure out where to put this value. We already know all of these are sorted. Where should we put this one in terms of this array? In this case, we don't have a lot of choices. The logical thing to do is compare this value to the only other value in our array. Is this larger? If this is larger, then it should stay where it is. If it's smaller, it should be swapped with the value over here. Since this is larger, it can stay here. Now, there's the other edge case: what if both of these are equal? If that's the case, this value can also stay here, because it doesn't matter whether this goes here or here if they are both equal; it'll be sorted either way. This is for ascending order, but it would work the same way with descending order as well. So now that we know where this value should go, it should stay where it is. We can now move to the next value. So by the time we get here, we know that this sub array is already sorted. So at this point, we're going to do a similar comparison. We're going to compare this to its left neighbor: is four greater than three? Yes, it is. So it's going to stay exactly where it is. And at this point, we didn't even compare four to two, but we don't need to. Do you know the reason why? Well, we already know these were in sorted order. So if four is greater than three, of course it's going to be greater than every value to the left of three, because these are sorted. Every value to the left of three is going to be less than or equal to three. And now every value to the left of four is going to be less than or equal to four. So at this point, we know that these values are in sorted order. 
We can shift our pointer to the right by one over here. And now things are going to get a little bit interesting. So now we have this sorted portion and we have to figure out where the one should go. Should it go at the end of these three values, should it go in between these two values, should it go in between these two values, or should it go over here? Well, let's see what the algorithm does. This is why it's called insertion sort, because we're taking one value and inserting it into a sorted array. So let's see what the algorithm does in this case. First, we're going to compare one to four: is one greater than or equal to four? It's not. So now we finally execute the other case. And by the way, the pseudocode for this would look something like this. Remember our pointer, which is I in this case, started at index one, and we're going to keep going until we've reached the end of the list. So that's pretty straightforward. And now we're going to get to the interesting part, where we compare this to the neighbor. So we have a second pointer for the neighbor; we call that J. The current value is at I, or you could also call that J plus one. That's what we're going to be doing. And then you compare these: is the value at J plus one less than the value at J? That's this part of the code. This part is basically to make sure that we don't go out of bounds. And you'll see what I mean in just a second. But now that this comparison holds, we're going to perform the swap. This is where we take this value and move it over here and take this value and move it over there. So the first thing we would do actually is take the value that was originally at J plus one, which is one, and put it in a temporary variable. Because what we're going to do now, at this index J plus one, is move the value that was here. So we're going to put a four over here. Now that we've lost the one, we have to use our temporary variable. 
So we're going to put a one over here from our temporary variable. And originally, our J pointer was over here. We're going to decrement it by one, which means we're going to shift it to the left by one, over here. And the reason this is a while loop and not just an if statement is because now we're going to continue this whole process. We originally had the one over here, and we know it should be shifted to the left because it's definitely less than four, but we don't know that we're done. We don't know that this is the final spot where this one is going to be. We have to continue comparing to the left neighbors. So now we're going to execute the condition again. First, we're going to make sure that J is not out of bounds yet, because as we keep shifting it, we know at some point it's going to be all the way over here. It could be out of bounds. But right now it's not out of bounds. We know that because it is greater than or equal to zero. And now we're going to compare the value at J plus one to the value at J. The value at J plus one is less than the value at J: one is less than three. So now we're going to do the exact same thing. We're going to perform the exact same swap. So now we have our three over here and our one over here, but we're not done yet. We're going to decrement our J one more time. Now J is going to be over here. It's still in bounds, and the value at J plus one is again less than the value at J. So now we're going to do another swap. So the two is over here and the one is going to be over here. And now J is going to be shifted to the left by one again. Now it's out of bounds. So now there's no point in even doing a comparison. That's why we're checking that it's in bounds before we do this. We don't want an index out of bounds error. So since it's not in bounds, we can stop doing this loop. And our I pointer, by the way, was over here. 
So now at this point, our I pointer is going to be shifted over here, because we're going to execute the next part of the for loop, the outer loop that's over here. And we know because of that, this portion of the array, these first four values, are now in sorted order, and they look sorted as well: one, two, three, four. So now that our I pointer is over here, our J pointer is going to be initially over here, and J is in bounds. So now let's compare the two values: is six less than four? It's not, so we know that this is greater than four. Therefore, it's going to be greater than all of these as well. So that means that this is already in sorted order. Our outer for loop is now going to stop because there's no more elements over here, you know, left for us to traverse. Now, we've been talking about sorting integers so far, but of course, this could be applied to characters if we had something like B, C, A, as long as we have a way to compare characters. So in this case, let's say we're sorting based on just alphabetical order. We can basically sort any set of values as long as we have a way of comparing two values together. So in this case, we could sort these to be something like A, B, C. Of course, we could also sort strings themselves the same way a dictionary sorts words. So a word like apple would go before a word like banana. With sorting, there's also this notion of stable versus unstable sorting. Let's say we had this array of values: seven, three, seven. Any sorting algorithm is going to take these values and put them in sorted order, which would be three, seven, seven. Any sorting algorithm could do that, but here's the difference between a stable sorting algorithm and an unstable sorting algorithm. This is what a stable sorting algorithm would do. It would put the numbers something like this. The thing to notice here is that seven appears twice in the array. 
So this first seven could have gone here and then the second seven would have gone here, or they could have been in reverse, right? This seven could have gone first and then this seven would have gone second. Both of those are valid sorted orders, but a stable sorting algorithm will preserve the original order when there's a tie. We know that there's two sevens. And if, for some reason, we wanted the original relative order of these two sevens to be preserved, meaning this one showed up first before this one, so it should go first in sorted order, that's called stable sorting. Unstable sorting might achieve this, but there's no guarantee. It's not necessarily the case that that order will be preserved. So it could have flipped them for some reason. We'll touch on this again as we continue. But my question for you, based on this: do you think the insertion sort algorithm is stable or is it unstable? Well, let's quickly go through how it would execute on this. First, our pointer would be over here. We would compare the three and the seven. They're out of order, so we would swap them. So then our first two values would look like this. So that's good so far. And then our pointer would be at the third value. This is a three and this is a seven. And we would ask: is this seven less than the seven on the left of it? That's where this would execute. This seven is not less than that. So it's going to stay exactly where it is. So therefore we would achieve this output, where the relative order in case of a tie would be preserved. These two would stay in the order they were in originally. So based on this implementation, this is a stable sorting algorithm. It's preferred to keep an algorithm stable if you can. We could code this in such a way where it's not stable, but that would have no benefit for us, so we're going to keep it stable. So now finally, let's talk about the time complexity, which is interesting for insertion sort, actually. 
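Before we get to that, here is the whole algorithm from the walkthrough pulled together as a Python sketch. The `key` parameter is my own addition so we can check stability on labeled pairs (the "first"/"second" tags are hypothetical markers for original position, not part of the lesson); the core loop matches the pseudocode described above.

```python
def insertion_sort(arr, key=lambda x: x):
    # i starts at index 1: a one-element prefix is sorted by definition.
    for i in range(1, len(arr)):
        j = i - 1
        # Shift the value at j + 1 left while it's smaller than its left
        # neighbor; the j >= 0 check keeps us from going out of bounds.
        # The strict < never swaps on a tie, which is what makes this stable.
        while j >= 0 and key(arr[j + 1]) < key(arr[j]):
            arr[j], arr[j + 1] = arr[j + 1], arr[j]  # swap via tuple unpacking
            j -= 1
    return arr

print(insertion_sort([2, 3, 4, 1, 6]))  # [1, 2, 3, 4, 6]

# Stability check: two sevens tagged with their original order.
print(insertion_sort([(7, "first"), (3, "x"), (7, "second")],
                     key=lambda pair: pair[0]))
# [(3, 'x'), (7, 'first'), (7, 'second')] -- the tie keeps its original order
```

Python's tuple swap replaces the explicit temporary variable from the walkthrough, but the effect is identical.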
And the best way to actually understand it is to look at a couple examples. Let's say we were running insertion sort on a list that's already sorted. What would the algorithm do? Well, we would have a pointer over here. We would want to, of course, iterate through the array. But then we would do a comparison from here. We would compare it to the left neighbor. They're already in sorted order, so we don't have to do anything. We can shift our pointer over here. We compare these two values. They're already sorted. We don't have to do anything. We shift the pointer again. Then we compare these two values. They're already sorted. We don't do anything. We shift the pointer. We're out of bounds. That means we're done. So as you can see, when we have an array that's already sorted and we run insertion sort, the execution is just iterating through the list, which is big O of n time complexity. Of course, we have nested loops here, but just because we have a loop here doesn't mean it's necessarily going to execute. I just showed you an example where this inner loop is never going to execute. So in this case, the best case time complexity is big O of n. Now, technically big O describes an upper bound, usually on the worst case, so I'm kind of misusing it here. But it's kind of common for people to talk about it this way. In this case, I'm saying the best case time complexity, which would occur when the input is already sorted, is going to be linear time complexity. Let's look at the other case: the worst case time complexity, which would actually occur on the opposite example, where the input is in reverse order and we want to sort it in ascending order. So in this case, again, we would loop through the array. We would start here. We would compare the two neighboring values. We would see that they're out of order. So then we would have to swap them. We would put the three over here and the four over here. Then we would shift our pointer over here and then we would compare these two values. 
And then we would see they're out of order. We'd have to swap them again. We would have to put the two over here and the four over here. And at this point, we're still not done yet. This two is going to be compared to its left neighbor now. And they're again out of order. So we would have to put the three over here and the two over here. So you can see it's definitely getting messy. But ultimately what we would find is we would have to take this three, shift it over here, then we would get our pointer over here. This two, we would have to shift it here and then we would have to shift it here. Then when we get to this one, we would have to shift it to the left, then shift it to the left again and then shift it to the left again. So for every single value that we iterate through, the inner while loop would have to execute the maximum number of times until that value is shifted all the way to the left. When we get to the two, we'd have to shift it all the way to the left. When we get to the one, we'd have to shift it all the way to the left. Now, for an input size of four in this case, n squared time complexity would look like four squared because the length is four, which would be something like four plus four, right? Added together how many times? Well, four times. That's just the math formula here. But this isn't that. What we were doing right now was actually a bit different. What we actually have is something like one and then plus two and then plus three and then plus four, because for this value, we might have to consider it to be in these two positions. For this value, it could be in these three positions. For this value, it could be in these four positions, which correlates to that four. And I just talked about the two and three case. And we might not have a one case because remember, we're skipping the first index. But again, this is just an approximation. And that's enough for us when we're doing big O time complexity. 
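That one-plus-two-plus-three pattern can be checked by instrumenting the sort to count how many times the inner loop runs. This counting helper is hypothetical, not from the lesson; it just makes the best-case and worst-case behavior visible.

```python
def insertion_sort_shift_count(arr):
    # Sort arr in place and return how many times the inner while loop ran.
    shifts = 0
    for i in range(1, len(arr)):
        j = i - 1
        while j >= 0 and arr[j + 1] < arr[j]:
            arr[j], arr[j + 1] = arr[j + 1], arr[j]
            j -= 1
            shifts += 1
    return shifts

print(insertion_sort_shift_count([1, 2, 3, 4]))  # 0: already sorted, best case
print(insertion_sort_shift_count([4, 3, 2, 1]))  # 6: 1 + 2 + 3, worst case
```

On the sorted input the inner loop never fires, so the whole run is just the outer loop; on the reversed input every value gets shifted all the way to the left.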
Now, what most people do when they have nested loops is automatically assume that the time complexity is big O of n squared. And usually it's the correct assumption, but I bet most people don't know why that's the case. And I'm going to explain it to you briefly right now. This equation is bounded by n squared. It's pretty obvious when you look at it like this, because every term here is greater than or equal to every term here. But I'm going to show it to you in terms of big O time complexity, where, remember, we know that constants don't matter when we're multiplying by a variable. So some constant C times n squared doesn't matter regardless of what the constant is. I'm going to use this idea to prove to you that this is bounded by n squared. And I'm not even going to use a math proof. Now, maybe I'm not smart enough to show you the math proof of why this is true, but I'm smart enough to draw a picture, and that turns out to be enough. We know that n squared, where n is four, would look something like this in terms of geometry, right? Four times four, right? This is us having to iterate through an array of size four, and we have to do that four times. We have to iterate through it once, twice, three times, four times. But we also have the case where we iterate through just, let's say, the first value of the array, and then the first two values of the array, and then the first three values of the array, and then the first four values of the array. So maybe now it's becoming a little more obvious why this is bounded by this. And not only that, this is actually pretty much exactly half the size of n squared. And we even talked about how the first term in this case doesn't actually apply. So maybe this isn't even there, but still, you can see it's approximately half the size of n squared. We didn't even use a math proof to figure this out. This is obvious enough from looking at a picture.
So if this is half the size of n squared, we can write it as n squared divided by two, or in other words, one half times n squared. And this is just a constant, remember, and we don't care about constants. So we can say this, or in other words, iterating through an array like this would be the same as big O of n squared time complexity. That's the worst case time complexity of insertion sort. So that was a lot to cover. But to finally wrap up our first sorting algorithm: insertion sort, which is a stable sorting algorithm, has a worst case runtime of n squared. But it's also worth mentioning that the best case runtime is big O of n. That's not always true for other sorting algorithms. But it does turn out that there are better sorting algorithms than n squared worst case time complexity. Merge sort is one of the most common sorting algorithms, and for good reason: it's a pretty efficient algorithm. The main idea is to take the input array and then split it into two approximately equal halves, and then for those halves, split them into approximately equal halves, and then continue to split each sub array until we can't split them anymore, which would mean we have individual elements left for us to then sort. The idea is to break up the original problem of sorting the entire array into sub problems. So in order for us to sort the entire array, first we can sort the first three values, three, two, four; so we can sort the first half of the values. In this case, we have an odd number of values, so we can't split it exactly in half, but more or less, it's approximately half. And then the second half of the values, we want to sort those as well. But then for these sub arrays, we also have sub problems. We can split this array in half, into the first half and the second half, and sort those portions of the array before we sort this original array.
And the same thing over here: we have a natural way of splitting this into sub problems until, of course, we reach the base case where we just have sub arrays of a single element, like down here as well. This problem naturally lends itself to recursion. I promised you that recursion was going to come up a lot more. And this is one of those cases. And this is clearly two-branch recursion because we're splitting this into two halves, and I'll talk later about why exactly we're splitting it into halves. But to give you a hint, as you know, the number two is pretty important. And as we take the array and divide it by two each time, that lends itself to an efficient algorithm. We'll talk about that more a little later. But this array as well is being split into two halves, two branches. This array, same thing. But these are the base cases. Now, we talked about how we're going to divide this into sub problems. By the way, the technique we're using right now is sometimes called divide and conquer, because we're taking the original problem and dividing it into sub problems and then solving those sub problems before we solve the original problem. But I haven't talked about how we're even going to solve these sub problems and what we're going to do once we have that done. Well, let's start on the left side. We're going to first take this array and get the left half of it. We're going to take that and get the left half of that. And then we're going to take this and then split it into two individual elements, three and two. Now, this is technically a sorted array and this is technically a sorted array. When you have two sorted arrays, we can merge those two arrays back into the original array such that they are in sorted order. That's why this is called merge sort: because we break it up into two sub problems and then merge them back together.
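The split-and-merge scheme can be sketched in Python. This is a hedged sketch rather than the course's exact code; the names `merge_sort` and `merge` are mine. `merge_sort` sorts `arr[s..e]` in place, and the `merge` helper copies the two sorted halves into temporary arrays and merges them back with two pointers:

```python
def merge_sort(arr, s, e):
    # Base case: a sub-array of length one (or zero) is already sorted.
    if e - s + 1 <= 1:
        return arr
    m = (s + e) // 2           # middle index, rounded down
    merge_sort(arr, s, m)      # sort the left half
    merge_sort(arr, m + 1, e)  # sort the right half
    merge(arr, s, m, e)        # merge the two sorted halves
    return arr

def merge(arr, s, m, e):
    # Temporary copies so we don't lose the original values
    # while overwriting arr during the merge.
    left, right = arr[s:m + 1], arr[m + 1:e + 1]
    i = j = 0  # pointers into left and right
    k = s      # pointer into the output positions of arr
    while i < len(left) and j < len(right):
        # <= keeps the sort stable: on a tie, the left element goes first.
        if left[i] <= right[j]:
            arr[k] = left[i]
            i += 1
        else:
            arr[k] = right[j]
            j += 1
        k += 1
    # One half ran out; fill the remaining spots from the other half.
    while i < len(left):
        arr[k] = left[i]; i += 1; k += 1
    while j < len(right):
        arr[k] = right[j]; j += 1; k += 1

print(merge_sort([3, 2, 4, 1, 6], 0, 4))  # [1, 2, 3, 4, 6]
```

Passing the start and end indices instead of slicing new sub-arrays at every call means the only extra memory is the pair of temporary copies made during each merge.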
So forgetting the original values that were in the array, we create a copy of the value that was here and we create a copy of the value that was here. And in this case, it's pretty simple because we just have two elements. So we're going to compare these two. Which one is smaller? Two is smaller. So the two is going to go over here and the three is going to go over here. Now, from the perspective of this array, we've solved the left sub problem. We have this portion sorted. Now we need this portion sorted. Well, it's just a single element. So this is the base case. So now we don't care about the original values that were over here. We have two sorted arrays and we want to merge them together into the original array. In general, the algorithm we use to do that is two pointers. We're going to start at the beginning of this array and we're going to start at the beginning of this array, and we're going to compare the values, because we know the beginning of each array is going to hold the smallest value, because this is sorted and this is sorted. So we're going to compare the two values. Which one is smaller? Two is smaller than four. So two is going to go over here. The pointer that was over here is now going to be shifted to be over here. And now we're going to continue the algorithm. We're going to compare the value here to the value here. Which one is smaller? Three is smaller. So the next value that we insert is going to be three, in the next position of this array. And at this point, our pointer that was in the first array is now going to be out of bounds. So at this point, we have no choice but to place the value from the second array in the next spot, which is also the last spot. So we put the four over here. So now we finally arrived back at the original problem we were trying to solve. We sorted the left half of the array. Now it's time for us to sort the right half. So then we have this portion. We break it up into individual elements.
We don't care about what's originally here. We compare these two values. Which one is smaller? One is smaller than six. So we put a one here. That is what the original value actually was. And we put the six here. So in this case, the original was already in sorted order, but we still had to do this work anyway. But now that we've taken the original array and sorted it into a left half and a right half, we can merge these two portions together to make the original array sorted. So forgetting about the original values that were over here, we again use our two pointer technique. We have a pointer at the beginning of the left array and a pointer at the beginning of the right array. We compare the values. One is smaller than two. So we're going to take the one, put it at the beginning, and then we're going to shift the pointer that was here to be over here. By the way, we are going to have a third pointer in the output array. It's going to tell us where we should insert the next element. Originally, it's over here. Now that pointer is going to be over here. And then next, it's going to be over here, and then here, and then here, and keep going until we run out of elements to insert. But right now we're going to be comparing the two and the six. Which one is smaller? The two is smaller. So we're going to insert the two over here. We're going to take the pointer that was at the two and then shift it to the right to be over here. And we're going to repeat: which one is smaller, three or six? Three is smaller. So let's put the three over here and then shift the pointer to be at the next element. Let's compare these two again. Four is smaller. So we put the four over here and then shift our pointer to be over here. Now it's out of bounds. So we have nothing left to compare this six to.
So at this point, whether we have one element over here or maybe a couple more elements over here, regardless, since we ran out of all elements of one of the arrays, at this point we would just fill in the remaining spots from the other array that's non-empty. So in this case, we only have a six. So we put the six here. But if we had more elements, like a seven and an eight, we would just start filling them in over here. So taking a quick look at the code, like I said, this is clearly going to be a recursive algorithm where we're breaking the problem up into two halves. So this is going to be two branches of recursion. Now, into this merge sort function, I'm passing in the starting index and the ending index, as well as the array that we're actually going to be sorting. That's because as we break this up into sub arrays, we don't necessarily have to create a variable for each sub array. We can just kind of keep track of each half based on the starting index and the ending index rather than actually creating a brand new sub array like that. But how do we know if we've reached the base case? Well, it's when the length of the sub array is less than or equal to one. But how do we get the length itself? We can take the ending index minus the starting index plus one. So basically, you know, this element by itself is at index zero. So what we do is take the ending index, which is zero, minus the starting index, which is zero, plus one, which gives us one. So this is of length one. So that's how we know it's a base case. That means it's already sorted, so we can just return the array itself. But if we got to a case like this one, where the starting index is zero and the ending index is one, then we're going to get the ending index minus the starting index plus one, which is going to give us a length of two. That's correct. This is length two. But that means our length is not less than or equal to one. So what we have to do is calculate the middle index between these two.
So we take the starting plus the ending and divide it by two. So zero plus one divided by two. So most programming languages will round down. That's what we want to do round this down. So one divided by two is going to evaluate to be zero. So we're going to say the middle index is zero. This is the middle index. So what we're going to do now is break this up into two halves. We're going to call merge sort from starting to middle. So in that case, this means call merge sort on the array starting at index zero and ending at the middle index, which is zero as well. So we're going to call basically that evaluates to this. We're going to call merge sort on just the first element. And we're also going to call merge sort on the second half of the array. So from index M plus one, which is one and the ending index, which is also one, so, you know, that's pretty much correct. We're going to call merge sort on this portion of the array. And these, of course, are the base case. So after we call merge sort, we're going to expect those two halves to be sorted. So then what we're going to do is merge those halves together. Now, I haven't shown you the implementation of this. It's not super complicated. It's just a decent amount of code. It's basically the two pointer technique we were talking about a moment ago. But what we're saying with this function call is merge the left half of the array, which starts at index S and ends at index M and merge that with the right half of the array, which starts at M plus one and ends at index E. So it will basically merge these two together into the original two element array. Now, I will say when we do merge the left and right half together, we do need to create extra memory. So for the left and right half array, we're going to create temporary arrays. So we are going to create copies of the left and right half arrays, but we're only doing that for the merge step. 
So only when we start to actually merge them together into the original are we going to create those copies, so that we don't lose what the original values were. So in general, we start out with a big problem. We split it into two sub problems, solve the sub problems and then merge the result of the sub problems together. Now, in terms of time complexity, this is a relatively efficient sorting algorithm because we're splitting it into halves. We're doing that on purpose because it does make things more efficient. So the first question is: starting with an array of length N, how many times can we split it in half? In other words, how many levels are there to this recursion? It's going to be the same as the number of times we can take N and divide it by two. Let's actually assume we just had four elements to keep the math simple. We would have four elements, we would split it into two elements on each side, and then we would split it into one element on each side. And these two wouldn't exist. I'm doing this just for simplicity, but we start with N equals four. We divide it by two, so N equals two. And then we divide it by two until N equals one. We know when N equals one, that's our base case. So what's the math formula for how many times we can take this number and divide it by two until it's equal to one? Well, if we were starting at one and then multiplying it by two every single time, that would be something like two to the power of N, but we're doing the reverse of that. We're dividing it by two. So what we're asking is: we have a number N and we want to divide it by two and then divide it by two and then divide it by two, which is the same as dividing it by two times two times two, et cetera, et cetera, right? Which is the same as dividing it by two to the power of some variable like X, right? This is the number of times we're going to divide this by two. And we want to keep dividing it by two until it's equal to one.
Well, by doing a little bit of algebra on this, we can rearrange this to be the same as N equals two to the power of X. We care about X. That's how many levels there are going to be in this merge sort algorithm. So how do we solve for X? Well, in case you forgot a little bit of math, I'll tell you: we can take the log of both sides of this equation. That will give us, and by the way, I'm doing log base two: log base two of N is equal to log base two of two to the power of X. And if you forgot your log operations, this can be simplified: take this X away and put that X over here. Now, log base two of two is the same as asking: two to the power of what is going to be equal to two? Well, two to the power of one is equal to two. So this whole thing, log base two of two, simplifies to one, is what I'm getting at. So what I'm saying is we already solved for X. X is equal to log base two of N. So what I'm saying is, if we have a number N equals four, or any number, how many times can we divide it by two until it's equal to one? Well, our answer is log base two of N. Log base two of that value is how many times we can divide it by two until it's equal to one. And this turns out to be a recurring theme throughout algorithms and data structures. It's something we're going to see a lot more throughout the course. And this is the kind of math background that most people don't touch on, and most people never learn it and never understand it. But I wanted to explain it to you because I think thinking about it this way and understanding your thought process is very important throughout algorithms and data structures. It's really what separates beginners from people who can solve problems on their own. So we did all that work just to figure out that the height of this is going to be log base two of N. I'm usually going to omit the two. It's usually implied that logs are base two most of the time. But that's not the entire time complexity.
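The result we just derived, that repeatedly halving N takes log base two of N steps, is easy to sanity-check with a tiny loop (a sketch; the function name `halvings` is mine):

```python
def halvings(n):
    # Count how many times n can be divided by two until it reaches one.
    count = 0
    while n > 1:
        n //= 2
        count += 1
    return count

print(halvings(4))     # 2, and log base two of 4 is 2
print(halvings(1024))  # 10, and log base two of 1024 is 10
```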
That's the number of steps that we're going to have. But what's the time complexity of each step of this algorithm? Well, it's big O of N. Let me tell you why. Basically, it has to do with this merge step. So suppose we're at this step of the recursion. We have our sub problems here and here. And now we want to merge them together in this array. What are we going to have to do? We're going to have to iterate through every element in each of the sub arrays, which is going to result in an element being put in each spot in the original array. So basically, for this step, we're going to have to iterate through every position of the input array. So that's clearly an O of N time operation. But my argument is that we're going to be doing that for every level in this recursion. So here for this sub array, we're going to split it into a couple of smaller ones and then merge these into this array. But this is only half of this level, right? To do this part of the level, we're going to have to do the same thing. We have an individual element here and here. We're going to merge these together into the original array. And so building this level took O of N. And similarly for the bottom level: it's the base case, but we still have to visit this, right? Like, one operation is still going to run for this part. And the same for this and for every part of this. So it takes O of N time complexity to construct this level. So basically, the number of levels we have is log N, and for each level we're doing big O of N time complexity. So the overall time complexity becomes log N times N. Or in other words, we could write it as N log N time complexity, where this is being multiplied by that. So overall, the big O time complexity of merge sort is N log N. Now, the memory complexity is going to be big O of N, because at any point, like for this point, we're going to have to create two temporary arrays, which are these ones, to build the original array.
So that's going to have extra memory, which is roughly big O of N, which is the size of the original input array. So now the last question to answer is: is merge sort a stable sorting algorithm? Does it preserve the order of elements when there's a tie? Like when we sort this array, will this seven come before this seven? Well, we can definitely code it in such a way that it does. So first, we'll merge these two together. There's only one way to merge them together. The three, of course, is going to go before the seven. It's smaller. It has to do that. And over here, there's only one seven. So that's the only way we can have that. And now, as we build our new output, we're going to have two pointers, one pointer here and one pointer here. This one is smaller, so we have no choice but to put the three over here. Next, we're going to shift this pointer to the next position. Now, to make sure that this is stable, we have to put this seven before this seven, because now there's a tie. How are we going to code that up? We're going to say that as long as the element from the left array is less than or equal to the value in the right array, then this value is going to go in the next spot. So it matters how we handle the equality case. That's the edge case. If this one is smaller, it's going to go first. If this one is smaller, it's going to go first. But if they're both equal, then we're going to have the left element go, because that makes sure that this is stable. Because if the element showed up in the left half of the array, that must mean that, in the original order, it came before the element in the right half of the array. So to add to our list: merge sort has a worst case time complexity of n log n. And it's also a stable sorting algorithm. So merge sort is in most cases preferred over insertion sort. That's why this is one of the most common sorting algorithms used. But you might be wondering, is n log n really that much more efficient than n squared?
And the answer is yes. We know that the value n grows linearly, and the formula 2 to the power of n grows very quickly. And the comparison between those two is the same as the comparison between log n and n. Log n grows so much more slowly than n. Think about it like this. If we had a value like eight and we wanted to get it down to one and we were decrementing it, we would go like seven, six, and then keep doing that until we got down to one. That's pretty much big O of n. It's linear. But when you have something like log n, we start at eight, and instead of decrementing, I'm going to divide it by two. We're going to go way faster. We're going to hit four, and then we're going to divide it by two again. We're going to hit two, and then we're going to hit one. This is a small example, but this is equivalent to log n. And you can imagine that for really, really big numbers, it's going to be a lot faster to take a big number and keep dividing it by two to get it to one, rather than taking a big number and then subtracting one each time, which would be linear. So log n is very, very efficient compared to n. Quicksort is another common sorting algorithm. And it's actually very, very similar to merge sort. The idea is that instead of splitting the array into two equal halves and then sorting those halves, we're instead going to pick a random value. And for convenience, we usually pick the rightmost value. And this is called the pivot value. So the pivot value is special, because what we're going to do is iterate through every single value in the input array except for the pivot value. So basically every value before the pivot value. And we're going to compare that value to the pivot, and every value that's less than or equal to the pivot value is going to go in the left partition of the array.
So what we're going to do is take the input and partition it into two pieces, a left and right piece where every value that's less than the pivot, less than or equal to the pivot, is going to go in the left half and every value that's greater than the pivot is going to go in the right side. And performing this partition is actually pretty simple in terms of code. And we can actually perform the partition in place. That means we don't need to allocate any extra memory to do the partition. We can do it with just a single array. What we're going to do is take the right most value. That's going to be called our pivot. We're going to have a pointer that's going to start at the beginning of the array. And the idea is if we took every value that's less than or equal to the three and then placed them in this leftmost area, if every value in the input array that's less than or equal to three was placed over here, then by definition, all the values that are greater than three would take up the remaining spots. As long as we swap them appropriately, that's going to be the case. So all we have to worry about is taking the values less than or equal to this and then filling them starting at the left side. If we do that correctly, then the less than or equal to values will be here and the greater values will be here. So let's start doing that. So we're actually going to have two pointers. One pointer is just going to help us iterate through here. And then the second pointer is going to tell us where we should insert the next value that is less than or equal to three. So first we get to six, six is not less than or equal to three. So in this case, we don't do anything. We take our pointer over here and then shift it to the right by one. This pointer stays here because when we do find a value less than or equal to three, this is the first spot we're going to put it. So now we find a two, two is less than or equal to three. 
So we swap two with whatever happens to be in this position. So we can swap these two values: six goes here and then two goes here. Now this pointer that's over here is going to be shifted over here. Now, this is the next spot where we're going to insert a value that's less than or equal to three. So next we can take this pointer and then increment it one more time. So now we're at four. Is four less than or equal to three? It's not, so we don't do anything with it. We increment our pointer one more time. We get to one. One is less than or equal to three. So that means this one is going to be placed where this pointer happens to be. So we'll be swapping these two, placing the one over here and placing the six over here. And then this pointer will finally reach the pivot index. And we don't really need to do anything there. We basically ignore that. But at this point, our array looks something like this: two, one, four, six, three. We wanted to partition the array such that every value less than or equal to this was on the left side and every value greater than this was on the right side. But where should we actually put the pivot itself to satisfy that? Well, this is where our pointer ended. That means our pointer is here. That means every value to the left of that is less than or equal to three. So this is a good position to put our pivot at. So that's exactly what we're going to do with quicksort. We perform one last swap between where our left pointer is at and our pivot value. So we end with a three over here and a four in this position. So to make it more clean, after doing our partition, we have a two, a one and a three over here, and we have a six and a four in the right partition. Now, notice how the left side is not sorted and the right side isn't sorted. Just because we partitioned them does not mean they're sorted. But what is true is that every value on the left is less than every value on the right, and every value on the right is greater than every value on the left.
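The in-place partition just walked through can be sketched in Python. This is a hedged sketch (the function name `partition` is mine): the left pointer tracks the next spot for a value less than or equal to the pivot, and the pivot is swapped into that spot at the end:

```python
def partition(arr, s, e):
    # Pick the rightmost value as the pivot.
    pivot = arr[e]
    left = s  # next spot for a value <= pivot
    for i in range(s, e):  # scan every value before the pivot
        if arr[i] <= pivot:
            # Swap the value into the left region, in place.
            arr[left], arr[i] = arr[i], arr[left]
            left += 1
    # Finally, swap the pivot into its correct position: everything
    # before index `left` is <= pivot, everything after is greater.
    arr[left], arr[e] = arr[e], arr[left]
    return left  # index where the pivot ended up

arr = [6, 2, 4, 1, 3]
p = partition(arr, 0, len(arr) - 1)
print(arr, p)  # [2, 1, 3, 6, 4] 2
```

Note that no extra arrays are allocated; the whole rearrangement happens through swaps inside the one input array.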
Now, this is where the recursive step comes in. Now, let's run quicksort on the left side and run quicksort on the right side. But before we recursively run quicksort here and here, one thing to notice about the three is that it's already in sorted order. Everything to the left of it is less than or equal to it. And everything to the right of it is greater than or equal to it. So this three doesn't actually need to be part of our equation anymore. We can just ignore it. So it does exist, but we're not focusing on it. When we run quicksort on this sub problem, let's just focus on these two values. This is already exactly where it needs to be. And that's always going to be true of the pivot value. So now we're going to choose the rightmost element, which is going to be our pivot. So in this case, one is our pivot. We're going to start at the beginning of the array. We're going to compare two to the pivot value. Is two less than or equal to one? It's not, so we don't do anything. And then we would take our pointer and increment it to be here. It's at the pivot, so we don't do anything. That's when we stop. And the last thing we do before we split the array is take the pivot and swap it with wherever our left pointer would have been. Remember, our left pointer starts here. And since we didn't end up swapping anything here, our left pointer would stay here. So what we're going to do is take one and two and then swap them. Because we know our pivot should go at the beginning: one should go at the beginning, two should be here. And so this is the index that our pivot went to. And I didn't mention this earlier, but when we take the array and then split it into two pieces, we do it based on where the pivot value was inserted. So the pivot value three was inserted right into the middle. That's why we took the first three values and then put them over here, even though we didn't actually include the last value.
And then we took all values to the right of that pivot and, you know, ran quicksort on them over here. So now that our pivot went over here, to the beginning, all the remaining values, in this case just the two, are going to go on the right side. And that's essentially the base case anyway. So when we run quicksort on this and this, we're not going to do anything. Of course, an individual element is already sorted. And remember, for quicksort we're not allocating any extra memory. These swaps that we're doing are happening in place. That's the advantage of quicksort over merge sort. We don't actually have to allocate extra arrays. Now, running quicksort on these two elements, this is going to be our pivot. This is where our pointer is going to start. And this is also where our left pointer is going to be, the pointer that tells us where to insert the next element that's less than or equal to the pivot. So is six less than or equal to the pivot? Nope. So our left pointer will stay here, but our i pointer will be incremented over here. And that's when we know we can stop: when we reach the pivot. And then the pivot will always go to the position that the left pointer stopped at, which was over here. So we swap these two values. So this would be a four and then this would be a six. And then we would run quicksort on these two sub problems, four and six. But this is the base case. This is also the base case. So we're done. Now, remember, this is happening in place. These are not extra arrays. So if this is in the first position, this is in the second position, we know that three was in the third position, right? And that was always going to stay there. This is in the fourth and this is in the fifth. We have a sorted array. Now, based on this example, the time complexity looks very similar to merge sort, because we're breaking it up into roughly equal halves.
But the thing about the pivot is that we're always picking the rightmost value, and it might not result in equal halves. So if the elements were arranged a bit differently, in this case already in sorted order, we would have six be our pivot. And what we would end up doing is partitioning every value less than or equal to six onto the left side. All of these values are less than or equal to six. So in that case, we would split this into two sub problems, where we have the first sub problem of four values over here, one, two, three, four. And then on the right side, we would just have six. And then this four would be the pivot when we're running this sub problem, which would, you know, keep the values in the same exact order. All of these are less than or equal to four. So then this would end up splitting it into something like this: one, two, three, and then four would be over here. And we would essentially continue doing this until we complete it, which would look like this: one, two, three. And then lastly, we would split this into one and two, the individual values. But then all of a sudden the height of this becomes n, where previously, if we split it into equal halves, we know the height of it is going to be log n. But in this case, we're not splitting it into equal halves. So in the worst case, the height of this is going to be big O of n. Now, for each level, we're still having to iterate through approximately the size of the entire input. That's approximately how many elements we're going to have on each level. So in that case, the overall time complexity becomes n squared in the worst case. So that kind of shows us that quicksort is not necessarily better than merge sort. But on average, this is not going to happen. This is the worst possible case, where the input is already sorted. One way to get around this is not to pick the pivot as the rightmost element; it's to take, you know, maybe the leftmost value, the rightmost value and the middle value.
And among those three, choose the middle value and then use that as the pivot. So the way we've been selecting our pivot is a bit naive, but there are optimizations you can do, which will help us make sure that we don't hit this worst possible case. So on average, if we split the array into two roughly equal halves, then the time complexity of this, obviously, is going to be very similar to merge sort. In fact, the height is going to be log n. And since we know each level we're doing an O of n operation, the overall time complexity for quicksort on average is going to be n log n, not n squared. But the worst case is still going to be n squared overall. So quicksort is a pretty interesting algorithm. So the code for the partition would look something like this. Of course, the quicksort algorithm is recursive, just like merge sort, because we're splitting it up into subproblems. The base case is the same as merge sort: if the length of the sub array is less than or equal to one, then the array is already sorted and we can return. Otherwise, we're going to pick our pivot. In this case, we're doing it the naive way, where we're just taking the rightmost element. We're also initializing our left pointer to s, which is the start of the array. And that left pointer is basically what's going to tell us where we should insert the next value that's less than or equal to the pivot. So now, to actually run the partition, we have a single pointer i, that's that pointer I've been referring to on the bottom, just scanning through the input. And we check, is this value less than or equal to the pivot? That's what we're checking here. If it is, we're going to perform a swap. If it's not, we don't do anything. In this case, it's not less than or equal to the pivot. We don't do anything. We increment our pointer to be over here. Is this less than or equal to the pivot? Yes, it is. So let's perform a swap.
First, we take the left value over here and put it into a temporary variable, because we're about to overwrite it. Now we're going to replace it with the value at index i. This is that value, so we're going to put two over here. And then we're going to replace this value with that temp value that we stored. So we're going to put six over here. And then lastly, we're going to increment our left pointer. So this pointer is now going to be incremented to be here. This is the next spot where we would insert a value that's less than or equal to the pivot, which is three. So in general, this is how the partition is going to work. And don't forget, the last part of the partition is to take that pivot value and swap it with whatever happens to be at the left index. For this example, this is where we ended up swapping it to. And that's really the difficult part of the work. After that, the last thing would be the recursive step, where we run quicksort on the left portion. We're going up to left minus one because left is the index where we put our pivot, and we know that the pivot is essentially in the perfect spot. In this case, three would be here, four would be here. The pivot is already exactly where it needs to be. So we're running quicksort on this half and running quicksort on this half. We're not even including the pivot value anymore. So we would do the left side and do the right side and then we would return. So in this case, you can see the recursive step is pretty simple. The partition is where most of the logic happens to be. So the last question is, is quicksort stable? And the answer is, generally speaking, no. There is a way to technically modify it so that it is, but generally speaking, quicksort is not considered a stable sorting algorithm. And let me show you why; the easiest way to do so is through an example. So we're going to partition this array. Five is going to be our pivot.
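Before walking through that stability example, here's the partition and the recursive step put together as a minimal Python sketch. This is my own rendering of the idea described above (naive rightmost pivot, in-place swaps), not the course's exact code:

```python
def quick_sort(arr, s, e):
    # Base case: a sub-array of length 0 or 1 is already sorted.
    if e - s + 1 <= 1:
        return arr

    pivot = arr[e]   # naive pivot choice: the rightmost element
    left = s         # where the next value <= pivot will be placed

    # Partition: everything <= pivot ends up left of the pivot's final spot.
    for i in range(s, e):
        if arr[i] <= pivot:
            arr[left], arr[i] = arr[i], arr[left]
            left += 1

    # Move the pivot into its final position, where left stopped.
    arr[e] = arr[left]
    arr[left] = pivot

    # Recurse on both sides, excluding the pivot itself.
    quick_sort(arr, s, left - 1)
    quick_sort(arr, left + 1, e)
    return arr
```

Everything happens in the one array; no extra arrays are allocated, which is the advantage over merge sort mentioned earlier.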
So we're going to be iterating through the array. Seven is not less than five. Three is less than five, so we swap these two values. So now the three goes before the seven. We'll look at this value. It's not less than or equal to five. And next we'll look at this. Four is less than or equal to five. So this is going to get swapped into the next position, which is over here. And you can probably see where I'm going with this. So swap these. So now our three is here, our four is here, and we have a couple of sevens over here. I mean, the partition worked perfectly, but you can see that the original relative order of the sevens changed: the red seven was here and the yellow seven was over here, but now the yellow seven goes before the red seven. So this definitely does not preserve the relative order of values when there's a tie between them. So generally speaking, quicksort is not stable. So adding to our list, we have quicksort, which is an unstable algorithm. Generally speaking, it is better than insertion sort, though the worst case time complexity, as I showed earlier, is N squared. The average time complexity is N log N. I know usually we care about the big O time, which is the worst case time. So if we're being super technical, it would be N squared. But generally speaking, people consider quicksort to be an efficient algorithm; on average, it's N log N. So that's how I'm going to list it, but in a real interview, it would be worth mentioning that the worst case is N squared, though generally it runs more efficiently than that. Now it's time for another sorting algorithm called bucket sort. Are you surprised at how many sorting algorithms there are? You should be, because there are a lot more sorting algorithms that we're not even going to cover. But bucket sort is definitely worth covering because it's unique in that it can actually run in big O of N time, even in the worst case. So it's super efficient.
So why did we even bother learning the other algorithms that run in, let's say, N log N time? Well, that's the downside of bucket sort. It's very rare that you're able to use bucket sort. It's sort of a forbidden technique, which very rarely gets to be used. And that's because there need to be certain constraints for the problem. In this case, the constraint is on the specific values that we are sorting. We're only allowed to use bucket sort if we're guaranteed that all the values that we're sorting fit within a finite range. So in this case, the only values we have are zero, one and two. So we can say that the range of values is between zero and two. Now, typically when you have an array of integers, usually the integer is bounded. It might be a 32 bit integer or a 64 bit integer. And therefore we would have a range like negative two to the power of 31, approximately, all the way to positive two to the power of 31, approximately. So we could say that that's our range, but that's a really, really big range, and that usually doesn't qualify for bucket sort. We're talking about generally smaller ranges, but in this case, we have a very small range, zero to two. I mean, you could have a lot bigger range, zero to a hundred, or maybe even a thousand, or 10,000, or 100,000, right? Like, that's still relatively small. Back to our example: we have this constraint, so we can use bucket sort. So what are we going to do? Well, for every single value in our range, in this case zero, one and two, we're going to create a bucket. That's where the name bucket sort comes from. So for zero, we're going to have a bucket; for one, we're going to have a bucket; and for two, we're going to have a bucket. So I have an array of three values. This is where each of our buckets is going to go. So each of our buckets is basically a value itself. So for this array, we have indexes zero, one and two.
So conveniently for us in this case, the zero is going to map to the zero index, one is going to map to the one index, and two is going to map to the two index. It's not necessarily always going to be the case like this. For example, instead of having a two here, we could have had, let's say, a four. In that case, we would just arbitrarily put the four in this spot. The point is, for each of these values, we have it mapped to a position. And what this position is going to represent is the number of zeros that we have. So what we're going to do here is go through this array, count how many zeros we have, and then put that count over here. So we know we have two zeros, so two would go over here. Here, we're going to put how many ones are in the input array. We just have a single one, so we're going to put one here. How many twos are in the array? We have one, two, three. So we're going to put a three over here. Now, what would be the time complexity of counting all of these? You might think we have to pass through the input three times, one for each of our buckets, but we actually don't. Let me just quickly show you how we would actually go about counting these. It's mostly straightforward. We would have an index, like i, starting at the beginning of the array. Actually, just jumping into the code really quickly: assuming we're given an array like this, which, you know, looks exactly like our example, and we have our counts, we're going to have an i pointer going from the beginning of the array all the way to the end. And what we're going to do is take this value, two. Well, first, let's talk about how we're going to initialize our counts. Initially, we're going to say we have zero zeros, we have zero ones and we have zero twos. But as we iterate, we're going to get to the first value. We have a two. So what we want to do is increment our number of twos.
We can do that by going to counts at index two and incrementing it by one, because we know that, you know, two maps to the index two. That's convenient for us. It might not necessarily be the case, but we could have some kind of mapping. So we can increment this by one, and we're going to continue to do that. We're going to go here. We're going to end up incrementing our number of ones. We're going to go here. We're going to increment our number of twos again. We have two twos. We're going to go here. We have one zero; now we have two zeros. So increment the number of zeros to two. And then we have our last value, which is a two. So we increment the number of twos to three. So now, just cleaning this up a little bit: after counting, we know we have two zeros, one one, and three twos. Now it's time for the second portion of the algorithm, where we actually sort the array. So the reason we counted each of these values is because we know that when we take this array and we sort it, of course, the zeros are going to go at the beginning, then the ones, and then the twos are going to go at the end. We know the output is going to look something like this. So we just need to know how many zeros to put at the beginning, then how many ones to put, and then how many twos. And of course, maybe we would have some threes as well, and some fours, but in this case, we don't. If we can do that, then we can fill this in without having to swap these values around. Every sorting algorithm we looked at so far would swap values. We have a zero, we know that's going to go at the beginning, so we swap it with the beginning. But bucket sort does not do that at all. By the time we get to this point, where we have counted all of the values, we don't even need the input values at all. We don't even care what these are anymore. We're just going to start overwriting them, and we're going to do that with a couple of loops.
So our outer loop is going to have a variable called n, which is going to iterate through every position in counts. So we're going to iterate from zero to counts dot length. So we're going to go from here to here to here. By the way, this is just pseudocode. It's not necessarily syntactically correct, but this is the main logic that any program in any language would follow. So we have n, and then in the inner loop, we're going to have a second pointer. Let's call it j. You can see j is not actually being used anywhere; we're just using this pointer to iterate this many times. So let's forget about j. Let's just see how many times we're iterating. Well, zero to counts of n. So basically what I'm saying here is we're going to iterate this many times. At this point, our n pointer is here, so we're going to iterate two times. Now, what exactly are we going to do? We can see we have a third pointer, i. This is important, because i is going to tell us where to fill in the value. Initially, i is going to be at zero. Of course, we would want to fill in starting from the beginning. So we're going to fill this position with whatever n happens to be. We know n is zero, so we're going to put zero over here. And how many times are we going to iterate over this loop? Well, whatever counts of n is: two times. I hope it's starting to make sense now. We have two zeros, so we're going to put two zeros at the beginning of our array. So we're going to increment i now to be over here. We're going to put a zero over here. And then our inner loop is going to finish executing. And then we're going to go back to the outer loop. Now our n is going to be incremented. Our n is going to go over here now. So now, how many times are we going to execute the inner loop? Well, n is at one. How many ones did we have? We only had a single one. So now, with our i pointer over here, we're going to fill in a one.
And then we're going to increment our i to be at the next position. And this loop only executed a single time. And then our n has shifted here. So we're going to put a two over here. We're going to increment our i. We're going to put a two over here. We're going to increment our i, and we're going to put a two over here. We just ran that three times. So our inner loop is finished executing. Now we're going to go to the outer loop and end up shifting our n one more time. Now it's out of bounds. That's how we know we're done with the outer loop. We're done with the algorithm. Our array is sorted and we can return it. So now the question is, why is this algorithm big O of N? Because don't we have nested loops over here? Well, just because we have nested loops does not mean the time complexity is N squared. How many times is this nested loop portion actually going to run? We know that this i pointer is going to be incremented every time the inner code actually runs. And as I just showed you, our i pointer is going to start at the beginning and then keep being incremented until it goes out of bounds. How could that possibly be N squared? The first time the inner loop ran, it ran twice. Then the next time it ran once, and the next time it ran three times. All of those are always going to total up to N, because, of course, we only have N values. These are just the counts, and then we're going to put the values into the array, which of course is of size N. So what I'm saying is, this nested portion is big O of N time complexity. Of course, we did have to iterate through the input one time before that, over here, and the time complexity for that was also big O of N. Adding these together would make the time complexity big O of two times N, which we know reduces to O of N, because we don't care about constants.
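The counting pass and the overwrite loops just described can be sketched in Python like this. It's a minimal version assuming the values fall in the fixed range zero to two, as in the example, and the names are my own:

```python
def bucket_sort(arr):
    # One bucket (count) per possible value; assumes values are 0, 1, or 2.
    counts = [0, 0, 0]

    # First pass: count each value. O(n).
    for i in range(len(arr)):
        counts[arr[i]] += 1

    # Second pass: overwrite the array, writing each value n
    # exactly counts[n] times. The i pointer only ever moves
    # forward, so despite the nested loops this is O(n) overall.
    i = 0
    for n in range(len(counts)):
        for _ in range(counts[n]):
            arr[i] = n
            i += 1
    return arr
```

For the example input with two zeros, one one and three twos, this writes the zeros first, then the one, then the twos, with no swaps at all.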
So the overall time complexity is big O of N. In this case, though, we are using a bit of extra memory. We're using an array, but what's the size of this array going to be? It's going to be whatever the possible range of values happens to be. And we know in this case it's from zero to two, but it could be from some constants N to M, something like that. And as long as it's some fixed size that isn't insanely large, we assume that this is a constant. So the extra memory we're allocating in this case, you know, it's big O of three here, but in general it's going to be some constant C. And we know constants reduce to big O of one space complexity. But I do want to mention that this is sort of a forbidden jutsu. It's very rare that you're going to actually use bucket sort. The only time you're probably going to end up using it is in an interview, to be honest. But there definitely are some practical applications, and maybe you'll come across them at some point in your career. Now, the last question for us to answer is: is bucket sort a stable algorithm? Well, the answer is pretty short: no. And I think in this case it's actually pretty clear why. When we overwrite the array, we're not even caring about, you know, which two came first, because we just counted that there were three twos and they're all going to go in the last three positions. We don't care which one goes where. In fact, we're not even swapping the values anyway. We're just overwriting them. So it doesn't preserve the relative order at all. So bucket sort is definitely not a stable sorting algorithm. But if we wanted to, technically we could make bucket sort a stable sorting algorithm, because, for example, as we count the twos, we could preserve that order somehow. For example, if this was actually a linked list instead of an array, then as we count the twos, instead of just counting them, we could have taken each two and added it to a linked list.
So we would have a series of twos like this, where this one would have been the first, this one would have been the second, and this one would have been the last. And then, when we finally build the result, we would have put them together something like this. Now, this isn't super important to understand. I'm just mentioning it anyway. It's pretty rare that bucket sort is used on anything other than an array. So now, lastly, let's compare our sorting algorithms. So bucket sort, which is generally an unstable sorting algorithm, runs in the worst case in big O of N time. But it's a little bit misleading. Remember, I said this is sort of a forbidden jutsu. Most likely you won't be able to run bucket sort on the input that you're given. But if you ever are given an input where bucket sort can be run, don't ever use any of these other algorithms, because they're not going to be as efficient as bucket sort. If you are given an input where the values are within some specified range, always use bucket sort. Otherwise, if you're given just a general input, most likely you will end up using either merge sort or quicksort. I think merge sort is generally more common. Next, let's look at the binary search algorithm, which is very related to sorting. And that's because binary search can only run on an input that is already in some type of sorted order. Binary search is sort of an algorithm that you might already know intuitively. If you're searching for something in the worst case, you would have to look at every single value in the input to find what you're looking for. And let's say you were looking for an A in this example; you wouldn't reach it until the last position. So you would have to look through all of these before you realized, well, the A was at the end. Now, if you knew that the input was actually in sorted order, you could have done it more efficiently. And the idea is the same with dictionaries.
When we have a really massive list of words, we put them in alphabetical order in a dictionary. So if you're looking for the word banana in the dictionary, let's say you go to the halfway point of the dictionary and find words starting with the character J. At that point, where would we go? Would we go to the second half of the dictionary or the first half? Well, we know the character B comes before the character J. So we would go to the first half of the dictionary and then look somewhere over there. The idea behind binary search is the exact same. Another example: if I told you to guess a number between one and 100, and I told you that if you guess wrong, I'll tell you if your guess was too small or too big, of course what you would do is guess 50. So then if I tell you your guess is too small, then you know that the answer is somewhere between 50 and 100. But if I tell you your guess is too big, then you know the answer is between one and 49. Or, the third case, your first guess was already correct. The good thing about this guess is you can eliminate half of the possibilities and reduce your search space, and you don't just do it once; you can repeat this operation. So if I told you your guess is too big and the real answer is between one and 49, what would you guess? Would you guess two? No, you wouldn't guess two, because there are two possibilities. Maybe I tell you that your guess is too big. That's the good case, because if two is too big, then the only answer left is one, so you found the answer. But what if I tell you your guess is too small? Then the answer is somewhere between three and 49. You didn't really narrow down much; you only eliminated like two values. So that's why we guess the halfway point. It's less risky, because we know that every single time we're gonna eliminate at least half of the possibilities.
So in this case, we would guess something like 25, and maybe I tell you your guess is again too big, so then we know that the real answer is somewhere between one and 24. We narrowed down half of the possibilities again. This is the idea behind binary search. Now, it's most common to run binary search on some input, usually an array. And it's very important for the input to be in sorted order, because if it's not, the best we can do is just individually look at every single element, and we know the time complexity for that is big O of N. Binary search is gonna be even more efficient. Let's continue. So in this example, our target is eight. We want to return the index that the value eight appears at in the array. And if the target doesn't exist in the input, and that's definitely a possibility sometimes, what we normally do is just return something like negative one. Negative one isn't a valid index, so that indicates that we weren't able to find a solution. So what we would wanna do is go somewhere close to the middle of our search space. In this case, we could either go here or here, but what's the main algorithm that we use? Well, first we want to have our boundaries. Our boundaries tell us what portion of the array we're currently considering. At the beginning, the value eight, the target, could be anywhere in the input. So we have to consider the entire input array. So we have two pointers. I call them the left and right pointer, but sometimes people will call them the low and the high pointer. Basically, they're our two boundaries which make up our search space. So by taking our two boundaries, adding them together and then dividing by two, we can get approximately the middle. So in this case, that's gonna be zero plus seven divided by two, and usually we round down, so this is gonna evaluate to three. So this is what we call our middle index.
I'm gonna call it M for short. M is gonna be at index three. So taking a look at some pseudocode, this is what we've done so far. We've initialized our left pointer to zero and we've initialized our right pointer to be the length of the array minus one, and then we start looping. We calculate our mid pointer by adding the left and right pointers and dividing by two, and then we wanna check the value at the mid pointer. We know there are three cases. Either our target is greater than this value, meaning it's actually somewhere to the right, or our target is less than this value, meaning it's somewhere to the left. And the third case, which I've used as the else case, is we've found the target. Because if the target isn't greater and it's not less, then it must be equal. At that point, we're going to return mid, which is gonna be the index. That's what we're trying to return: the index. So in this case, clearly our target eight is greater than four. So we know we should now start searching over here. So what should we do? Well, we have to update our search space to be over here. How do we do that? Well, our right pointer is already where it should be. So the only thing we have to do is take the left pointer and shift it over here. And how am I gonna do that? Well, I'm just gonna take mid plus one, because we know this is not the solution, and we know that everything to the left of it is also not going to be the target, because the input is in sorted order. We can only assume that because this is sorted. So we're gonna eliminate all of these from consideration. And now we're gonna go back up to our loop, and let's just assume that this mid is gonna be changed now as well. So this is the case that we ended up executing. The target was greater, so we updated the left pointer. And actually, to make things a little bit more interesting, I'm gonna change the target. Instead of being eight, I'm gonna change it to be five. So let's assume our target is five.
The initial part of the code would have executed the same regardless of whether the target was eight or five, because five is also greater than four. We would have still executed that. But continuing, we're gonna now calculate mid again because our loop is gonna execute. I'll touch on this loop condition a little bit later on, but our mid now is going to be four plus seven divided by two. That's 11 divided by two; we know we round down, so we're going to get five as our mid. So let's put mid over here at index five. Now let's compare six to our target. Is our target greater than six? Is five greater than six? Nope, so we don't execute that. Is our target less than six? Yeah, our target is less than six. Five is less than six. So now we're gonna execute this portion of the code. So what we would logically wanna do at this point is, since our target is not six and it's less than six, we know it's gonna be somewhere to the left. So would that mean it's somewhere over here? Well, we already eliminated all of these from consideration. So yeah, it's somewhere to the left, but it's not this, and not anywhere beyond this point. So the way we're gonna handle that is by taking our right pointer and moving it to be at mid minus one. That's to the left of mid. So this is where our right pointer is gonna go. And you can see our left pointer is also at that position. What that means is our search space has been reduced to a single value. That's good, because that means we're either close to finding our result or our result doesn't exist in the input. By the way, since we're searching to the left, that means we removed all of these from consideration. So now, finally, we're gonna execute the loop one more time. We're gonna calculate our mid. Left and right are both four. So we're gonna say four plus four divided by two. That's of course gonna be four. And that makes sense, because this is the only position in our search space. So now we're gonna check, is our target greater than five?
Nope. Is our target less than five? Nope. That means the else is gonna execute. That means our target, which is not greater than five or less than five, must be equal to five. So therefore we found the solution. We can return the index mid, which is four. So we return four. We found the solution. Now, it's possible we would not have found a solution. Let's look at a slightly different example. Let's say our target was actually nine. We would have still gotten to this point. Our mid would have been over here, and we would have executed the loop again. Is our target greater than the value at mid, which is six? Yes, it is. So then we're gonna execute this portion. We want to then search to the right. If our target is greater than six, it must be somewhere over here. So we update our left pointer to be over here, and we remove these from our search space. So next we're gonna execute the loop again. And the condition, by the way, for us continuing to execute the loop is if our left pointer is less than or equal to our right pointer. That basically means we have some existing search space that we're searching over. Even if left and right are equal, that means we have one element remaining. And if left is less than right, that means we have multiple elements remaining for us to search over. So now we calculate mid again. Six plus seven divided by two, rounded down, is gonna be six. We're gonna check, is our target greater than seven? Yes, it is. And this is where our mid was. So we're gonna take left and set it to mid plus one, which is gonna be over here. So we remove this from our search space. Our left pointer is over here as well. Now we check, are we executing the loop again? Well, left and right are equal. Therefore, left is less than or equal to right. We calculate mid again. It's gonna be seven plus seven divided by two. That's of course seven. So this is where our mid is going to be. Now, is our target greater than the value here? Yes, it is, once again. So what are we gonna do?
The algorithm is gonna say, set left equal to mid plus one. So now left is gonna be over here. So basically it's gone out of bounds. And then we try to execute the loop again. But at this point, left is definitely not less than or equal to right. And by the way, our search space is empty. Once our search space is empty, left is never gonna be less than or equal to right, so the loop always exits here. And that means we did not find the target that we were looking for. So we're gonna end up returning negative one. This would have executed the exact same way if, instead of our target being too large, it was actually too small. So maybe it's zero. We would have crossed all of these out, and our left pointer would have been here, and our right pointer would have gone right over here. Or what would have happened if our target was actually equal to two? It's not too small and it's not too big; it just doesn't happen to exist in this range. Well, to quickly simulate: first we would check mid. It's not the target. We would cross all these out. Our target must be somewhere over here. So among these, we approximate the middle. We check, is this equal to the target? No, the target is too small. So it's not this or this. Then our right pointer would be over here. And we would check, is this equal to two? No, our target is greater than this value. So then we would take our left pointer and move it to the right by one. And then at this point, we would say, well, our left pointer is not less than or equal to the right pointer, and we don't have any elements remaining to search. So we return negative one. That's the general simulation. That's the idea behind binary search. It can seem complicated at first, but once you get used to it, it's not too bad. But what about the time complexity? Why do we care about binary search at all? We could just scan through the input and find the target in O of N time. Isn't that good enough?
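For reference, the full loop we've been simulating can be written out in Python roughly like this. It's my own sketch of the standard algorithm, and it assumes the array is already sorted:

```python
def binary_search(arr, target):
    # Boundaries of the current search space [left, right].
    left, right = 0, len(arr) - 1

    # Keep looping while the search space is non-empty.
    while left <= right:
        mid = (left + right) // 2   # middle index, rounded down
        if target > arr[mid]:
            left = mid + 1          # target can only be to the right
        elif target < arr[mid]:
            right = mid - 1         # target can only be to the left
        else:
            return mid              # found the target; return its index
    return -1                       # search space empty: target not found
```

Returning negative one covers the cases above where the target was too large, too small, or simply absent from the range.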
Well, with binary search, with every iteration of the loop, we are eliminating at least half of the search space. With one execution of the loop, we eliminate half of the remaining values, and then we eliminate half of the remaining values again, until we have a single value, and then we eliminate that as well. So what's the math formula for what we're doing here? When we eliminate half, we're taking the length of the input, let's say N, and dividing it by two again and again and again until it's equal to one. We talked about that earlier with merge sort. How many times can we divide a value by two until it's equal to one? Because that's gonna tell us how many times this loop executes. Well, we talked about it earlier: the formula for that is log base two of N, and usually we don't even mention the base two. So this is log N. That's the time complexity of binary search. That's how many times the while loop is gonna execute. Inside the while loop, we're just doing a constant time operation, so we don't care too much about that. But this is the worst case runtime, and we know log N is much, much more efficient than big O of N. And in terms of memory complexity, we're just allocating a few pointers. We're not allocating any additional data structures or arrays or anything of variable length. So the memory complexity is big O of one. There's a slight variation of the binary search algorithm that I wanted to talk about, because it can come up in interview problems and it's actually pretty simple to understand once you know it. It goes back to the idea I was talking about earlier, where, let's say, we were guessing a number between one and a hundred. In this case, we're not necessarily given an array of all the numbers from one to a hundred in sorted order. That's not the problem that we're given. We're just given a range of values. And in this case, we need to guess the correct number.
And in this case, we're not actually given a target either, where we're given some target, let's say 10, and we have to find the index of that. That's not the problem here. We're not given a target. Our goal in this case is to search along this space of values, this range of values, and find that correct value, where that n value, whatever it happens to be, satisfies the conditions. And in this case, that would mean that n is the secret number that you're looking to guess. But in other problems, it might be something different. In this case, it's just a guess. There's nothing special we have to do with that n value. But what it could be is we're searching this range of values and we need to check if n satisfies some requirements. So let's say we have some kind of dummy function which calculates if n is correct, meaning it is the value that we're looking for. And this function could have anything inside of it. It could be some computation. It could be any kind of algorithm, but that's the idea here. In our original binary search, we were just checking that value. We were checking n, checking if it's equal to the target. So this is sort of a more general case. For the original case we were looking at, our is correct would basically just check: is n equal to the target? But it could get a lot more complex than that. So this is the general template that you might end up using. That's what I'm gonna be talking about now. And it's very, very similar to the original binary search. I just think it throws some people off guard because they've never seen it before, which is why I'm showing it to you now. So this is a sort of template we can use to solve these range style binary search questions. Now this is Python code, but we're mainly focused on the main ideas of the algorithm, not the exact syntax. Now for this binary search, we're gonna be given a range of values, let's say, and we're given a lower bound and an upper bound. We're not given an array. Notice that. 
And the rest of the binary search is gonna be pretty similar. While our low is less than or equal to our high, we're gonna perform binary search. We're trying to guess the number between one and a hundred. We're gonna calculate the midway point because we know we wanna eliminate half of the possibilities every single time. So one plus a hundred divided by two is gonna give us a value of 50, somewhere in the middle. Now, since we're not given a target, how do we know if this is correct or not? Well, it's gonna depend on what type of problem you're solving. But either way, let's have a helper function that's gonna decide it for us. In this case, I'm calling it is correct. We pass in a single number and then this function is gonna somehow compute for us if our guess is correct. And if it's not correct, we do need to know was our guess too big or was it too small? So that's for this specific example, but in other types of problems, this helper function could have some different type of computation. In this case though, you can see I've hard coded it so that if n is greater than 10, we're gonna return one, which indicates that our guess is too big. If n is less than 10, I'm gonna return negative one, which indicates that our guess is too small. But if we execute the else case, that means n is exactly equal to 10, so we return zero. That means our guess was correct. So in this case, I've hard coded it such that 10 is the target value that we're looking for. But of course, there could be different types of more complex computations that we do. So in this case, for our guess 50, it's too big, we're gonna return positive one. So our is correct is going to be greater than zero. So then we know our guess was too big, so we're gonna update our high value to be mid minus one. So our guess, which was mid, is 50. So now we're gonna update high to be 50 minus one, which is 49. We eliminated half of the possibilities. And now we go back up to the loop. 
Low is less than or equal to high. Let's recalculate the midway point: one plus 49 divided by two. That's gonna give us a guess of 25. We're gonna call is correct. And in this case, our guess 25 is greater than 10. So we're gonna return positive one. This part is gonna execute again. So that means our guess was too big. We have to update our high once again. Our high is 49, we're gonna set it to be mid minus one. So it's gonna be now 25 minus one, which is 24. And repeat: take these two, add them together, divide by two, our guess is going to be 12. We're gonna call is correct. 12 is still greater than 10. So our guess was too high once again. We're gonna update high to be 12 minus one. It's gonna be 11. So now our high is 11. We re-execute the loop, add these together, divide by two, we get six as our mid. We check, is six correct? Is six greater than 10? Nope. Is it less than 10? Yup, so we're gonna return negative one. So which one of these two is gonna execute? The bottom one, because is correct is less than zero. So now we're actually gonna update our low pointer because our guess was too small. We're gonna update low to be mid plus one. So our low is gonna be six plus one, which is seven. And at this point, you can see how the algorithm is gonna progress. Add these two together, divide by two. Our mid is going to be nine. Is that too big? Nope. Is it too small? Yup, so we're gonna take our low, set it to mid plus one. So our low is gonna be 10 now. So now finally, we're gonna take these two, add them together: our guess is going to be 10 plus 11 divided by two. So that is gonna round down to be 10. So our guess is 10, we're gonna check is correct. Is it greater than 10? Nope. Is it less than 10? Nope, it's exactly 10, so we're gonna return zero. This isn't gonna execute, this isn't gonna execute. Our else is going to execute. So we're gonna return 10. That was the result we were looking for. 
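To tie the walkthrough together, here's a sketch of this range-style binary search in Python, with is correct hard-coded so that 10 is the secret number, as in the example. The function names are just for illustration.

```python
def is_correct(n):
    # Hard-coded check for this example: 10 is the secret number.
    if n > 10:
        return 1    # guess is too big
    elif n < 10:
        return -1   # guess is too small
    else:
        return 0    # guess is correct

def binary_search_range(low, high):
    # Binary search over a range of values instead of an array.
    while low <= high:
        mid = (low + high) // 2  # our current guess
        result = is_correct(mid)
        if result > 0:
            high = mid - 1   # guess too big: eliminate the upper half
        elif result < 0:
            low = mid + 1    # guess too small: eliminate the lower half
        else:
            return mid       # found the correct value
    return -1
```

Running binary_search_range(1, 100) follows exactly the guesses from the walkthrough: 50, 25, 12, 6, 9, and finally 10.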
We weren't looking for the index. We were just looking for what is the correct value. It turns out to be 10. So of course, this is very simple, straightforward binary search, just that we're not given an array. But the time complexity is still going to be the same. So the big O time complexity is still going to be log N. But in this case, what is N? Before N was the size of the array that we were given. In this case, N is just the range of values. So since we're going from one through a hundred, the N is basically a hundred in this case. If it was one through a thousand, it would be a thousand. It's the range of how many values we're running binary search on. I hope this gave you a decent understanding of how binary search can have multiple variations and how in general this method could be extended to do a bunch of different things while keeping the core binary search template still the same. Now let's move on to probably the most complicated data structure we've covered so far, which is binary trees. We're actually going to be using pretty much every topic we covered so far to understand binary trees because there's a lot of complexity to them. To start with, we're just going to understand the structure of binary trees. They're similar to linked lists in that we have nodes which have some kind of value. I mean, it could be a character, it could be an integer, or it could be a string, it could be an object, there could be anything contained in the value of this node. To keep it simple, let's assume we have integers and let's say we have the integer two here. And we know that these nodes are no fun unless we have some pointers to connect them together. With binary trees, as the name implies binary, which means two, these nodes are going to have two pointers. But with linked lists, we had some next pointers and previous pointers for doubly linked lists. So it's kind of similar to that in a sense. It's similar to a doubly linked list node. 
But in this case, we draw the pointers typically down. And that's because there are some relationships with these nodes. We call this the left pointer and the right pointer. And when we have a pointer like this, that's connecting it to another node, let's say a one value here, that node is also gonna, of course, have a left and right pointer. Now these nodes have a relationship. It's not like a linked list where nodes are just connected left and right. We say the relationship between these nodes is that this node is the left child of this node. So there's a relationship called the child relationship. This is a child of the parent node. So this node is considered the parent node of this node. But as I said, since this is the left child, this node could also have a right child. Let's say the value is three. This node is also a child of this node and this node also has the same parent as this node. Both of these nodes have the same parent node. And what about these two nodes now? Well, this node does not have a left child as of now and it does not have a right child either. We can assume that these pointers are pointing to some default value of null. They're not actually pointing at any real nodes. And when we have nodes like these that don't have any children, we call these leaf nodes. So this would be a leaf node and this would also be a leaf node. But what if for this node we actually did have a child over here? Let's say it's four. Then this node would no longer be a leaf node. This node would be a leaf node. It doesn't have any children but its parent node over here does have a left child. So it's not a leaf node. With binary trees, we're always guaranteed to have leaf nodes. If we start at the top of the binary tree, which by the way is called the root of the binary tree. The top node is always called the root node whereas the bottom nodes are called leaf nodes. And these are terms that come from trees in general. Trees have roots, trees also have leaves. 
But the interesting thing here is you might have noticed the root goes at the top and the leaves go at the bottom, which is kind of the opposite of a regular tree where you know, let's say you had a tree that looks something like this where the roots would be at the bottom of the tree and the leaves would be at the top of the tree but we know that that's not the case here. So that's just an interesting point about trees. We kind of draw them upside down but these are important terms that are used with binary trees. Of course every tree is only gonna have a single root node but it could have multiple leaf nodes and it's guaranteed to have some leaf nodes but why is that the case? I mean, what if I had from every single leaf node one of the pointers just pointed back up to the root? I mean, we had that with linked lists, right? In some linked lists I talked about you could have a cycle but when it comes to binary trees we are not allowed to have cycles. That's part of the definition of a binary tree we're not allowed to have cycles. So since binary trees can't have cycles that's the same reason why two adjacent nodes like these can't be connected. We can't take the right pointer of this and then connect it to that. That would sort of form a cycle. Now technically this here does not form a cycle because each of these pointers connecting nodes can only be one way. We can only go along it in one direction but the definition of a binary tree we say that it doesn't form any cycles basically assuming that these pointers are not directed. Assuming we could go both ways there should not be any cycle in the graph. So therefore we can't connect these two nodes together. By the way, nodes like these that have the same parent are considered sibling nodes because they share the same parent so that's where this word comes from sibling like brother or sister. And I guess it's hardly worth mentioning but all the nodes in a binary tree have to be connected together. 
We can't just have some random node over here which isn't connected to the rest of the binary tree and then call this one cohesive binary tree. It's not, because every node has to be connected. We can put the node here where it's actually connected to another node, but we can't just have it floating randomly. All nodes have to be connected to each other. Now there's another property of a node in a binary tree which is called the height property, and it's mostly straightforward. First of all, let's measure a single node. We can say the height of a single node like this one is one. We say that because there's just a single node, so we call its height one. The height of this is one. It doesn't have any children. But for a node like this one, where it does have some children, we measure the height starting from this node going down to the lowest descendant of the node. This is another keyword. Descendant basically means, for some node such as this one, any child or any other node that comes beneath that one. So basically, for the root node over here, every node in the tree is a descendant of that node. And it's also worth mentioning that another keyword used is ancestor, which is straightforward, just like descendant. It would mean any node that comes in the parent chain of this node. So this is the parent node, this is sort of the grandparent, and basically any node going up that chain would be considered an ancestor of the original node. This node doesn't really have any ancestors because it doesn't have any parents, but it does have a bunch of descendants. But going back to what we were talking about: the height of this node is measured starting from this node going to its lowest descendant, which in this case is simple because it only has a single descendant. What's the length of the path? Or in other words, what's the number of nodes starting from here going to its lowest descendant? Well, we can count two nodes. 
So we say the height of this node is equal to two. Can you tell me what would be the height of this node? Well, we could go down to the right; this forms a path of two. That's the longest path we can form going on the right side. On the left side, we can form this path, which has three nodes in it. Clearly three is bigger than two. So we say the height of this node is equal to three. I do want to mention that in some cases people measure the height of a single node to be zero. So that would mean the height of this node is equal to zero. That would mean the height of this node, which we said was two, would actually be measured to be one. I don't prefer that way of calculating the height, but I just want to mention that sometimes in some textbooks and courses people do that. So don't be too surprised if you see it. But whether we start at zero or start at one, the important thing is how you would actually calculate it. And I've illustrated to you that, starting at the node, the height would basically be measured by finding the longest path going down. We have the opposite concept of height, which is depth. So what would be the depth of the root node? Well, it's measured as the path from here to the root node, and this is the root node. So we could say that the depth of this node is one, based on how I like to calculate it, but some people would say that the depth is zero. What about for this node? What's the depth of this node? Well, what's the path from here up to the root node? Well, it contains two nodes. So I would say that the depth of this is two, but some people, instead of counting the nodes, would count the pointers connecting the nodes together, and in that case you could call it one. What's the depth of this node? Again, the way I calculate it, it would be two, because we have two nodes here. What about the depth of this? Well, it's gonna be one, two, three. 
So the depth of that node is three because that's the path from this node up to the root. So these are a lot of the key concepts that can come up when talking about binary trees, and I wanted to introduce you to some of the basics. It wouldn't be complete if we didn't at least talk a little bit about the code. For now I'm just gonna cover the general structure of a tree node. It's pretty similar to a linked list node, where we have a few properties. One is gonna be the value. In this case it's an integer, but it could be anything else. And also we're gonna have a couple pointers, which could be different based on the language that you're using, but in general they're gonna point at a couple tree nodes: the left and right child. Initially they could be set to null, and to add a child node you would simply reassign this to the child tree node. And generally when you create a tree node, you want to at the very least initialize it with whatever the value happens to be. So the value for this tree node of course is two. The left pointer is pointing at this node, the right pointer is pointing at this node. So now we have a decent understanding of the basics of binary trees. Next let's talk about a special type of binary tree, which is called a binary search tree, or BST for short. Binary search trees are pretty much exactly like binary trees, except they have a certain sorted property to them. They're not literally sorted like an array would be, but they have a sorted property which actually allows us to use them similar to how we used a sorted array, especially when we were doing that binary search algorithm, which is sort of why these are called binary search trees. So what is that sorted property? Well, it's very simple. For every single node in the tree, the guarantee of BSTs concerns the left and right subtrees. We call this a subtree: this entire thing is the BST, but this is a subtree. 
Everything in the left subtree has to be less than the root value, and every single node in the right subtree has to be greater than the root value. Now what about the case where there's equal values? What if instead of this being a three, it was actually a two? Well, generally speaking, binary search trees do not contain duplicates, so we don't have to worry about that case. So we're talking about the left subtree having values that are strictly less than two, and values in the right subtree being strictly greater than two. But what about this tree? Is this a binary search tree? Well, every value in the right subtree is greater: three is greater than two. What about the left subtree? One is less than two. Four is not less than two. Four is in the wrong spot. Can you tell me where the four should actually go? Well, first of all, it has to go in the right subtree. Okay, but where should we insert it now? Now we're at the three node. I mean, I guess the four could either go here or it could go here. Let's put it over here. Does this satisfy the requirement of a binary search tree? Well, four is greater than two and it's in the right subtree. Should that be enough? No, it's not, because the definition of a binary search tree is actually recursive, which means that the sorted property is not only true for the root node, it's true for every single node in the tree. In other words, it applies not only to the tree but also to every single subtree as well. We call that recursive because, from the perspective of this node, this is its own tree. We call it a subtree because we know it does have a parent, but this is essentially the exact same as any other binary search tree. It should have the same properties. It has to be sorted. So when we look at this node, we check: are all the values in the right subtree greater than three? Well, it doesn't have any right children. What about the left subtree? Are all values here less than three? It's not true. 
Four is not less than three. So we can't put the four over here. We have to put it over here. Now you might be wondering why. Why do we care about having a sorted property with this data structure? What does it achieve for us? Well, pretty much exactly the same thing as having a sorted array. With a sorted array, when we need to search for a value, we can do it in log n time. If the array was not sorted, we would have to look at every individual element, so we would have to search for a value that we're looking for in O of n time. But with sorting, we can do it more efficiently. The exact same thing is true for binary search trees. If it didn't have a sorted property, we would have to look at every single node to find some target value, but with the sorted property, we don't. We can follow the same logic as binary search. Suppose we're looking for a target value of five in this case. We start at the root. We're looking for five. Is this five? No, it's two. Okay, so we didn't find the target, but is our target greater than two or is it less than two? Well, it's five, so it's greater than two. So this wasn't the target, but now tell me, which direction are we gonna look? Are we gonna go left or are we gonna go right? Well, we know with binary search trees all values greater than two are gonna be on the right side, so if five exists, it must be somewhere over here. So by making that comparison, we not only eliminated this value, but we also eliminated everything in the left subtree. In this case it's only one value, but it could have been a bunch of nodes, and by doing a single comparison we eliminated roughly half of the possibilities. And then we would continue. Now we would look at the right child here, and we would do the same exact comparison. This is not equal to five, but is five greater than three or less than three? It's greater than three, so we're gonna go look to the right. We would have eliminated anything on the left, but there isn't anything here anyway. 
So now we're at pretty much the last node. Is this equal to five? It's not. Five is greater than four, so we're gonna go down to the right, but at this point we reach null. There's no more nodes left that are greater than four. So basically we did not find the target that we were looking for. So let's say we were returning true or false depending on whether we find the target. In this case we didn't find the target, so we would return false. If we do find the target, we return true. So that's the idea, and it's pretty simple when you're looking at a picture for how to perform this search algorithm. Now let's take a look at the code. Since binary trees are recursive in nature, meaning a subtree has the exact same structure as the entire tree, we can expect to use a recursive algorithm to traverse it and to search for values in it. Now, it's not required to use recursion; there are other ways to do this. But the easiest way by far is recursion, once you get familiar with it, which is why we talked about recursion earlier in the course. Now, even though this data structure has two branches, meaning it has left and right children, searching a binary search tree is actually one-branch recursion. The reason is, when we get to a node, we do our comparison based on whatever the target value happens to be, and we only go in one direction. We're gonna recursively go along either the left subtree or the right subtree. There's no need to do both. With binary search trees, that's the efficient part. Let's suppose our target was three. In this case, you can see the search function is gonna take the root of some binary tree and some target value. We're gonna search for that target value. And let's start with the base case, which is the simplest one, but it's the one that we're gonna reach last, of course. 
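As a sketch of the code we're about to walk through, here's a minimal tree node class like the one described earlier, plus the recursive search. This is a sketch assuming the nodes store integer values; the class and function names are just for illustration.

```python
class TreeNode:
    def __init__(self, val):
        self.val = val     # the value stored in this node
        self.left = None   # pointer to the left child (null initially)
        self.right = None  # pointer to the right child (null initially)

def search(root, target):
    # Base case: we ran out of nodes without finding the target.
    if root is None:
        return False
    if target > root.val:
        # Target can only be in the right subtree.
        return search(root.right, target)
    elif target < root.val:
        # Target can only be in the left subtree.
        return search(root.left, target)
    else:
        return True  # not greater and not less, so it's equal: found it
```

Building the example tree with root two, children one and three, and a four under the three, searching for five returns false, while searching for three returns true.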
We saw in the previous example, where we were searching for a five, we started here, then we got here, and then we got here, and then we went to the right child of this node, but that's when we reached the null case. So in general, if the root node is equal to null, we're gonna return false, because that means we ran out of nodes to search and we didn't find the target, so we can return false. But if that's not the case, that means we have a valid node. Let's assume we started at the root. We're at this node currently. We're gonna do a few comparisons that might look very similar to the binary search comparisons we were doing earlier. We're gonna check: is our target greater than the root value, or is it less than the root value? Or if it's not greater and it's not less, that must mean it's equal, so we can return true. So in this case, our target is three, which is greater than two. So what we're gonna do is recursively call search. But where are we gonna do it? Are we gonna go to the left node or the right node, or are we gonna stay at the root node? We're gonna go to the right node. So we're gonna pass in as the node root dot right. We're gonna run search on this child node. So this is the recursion part. This is the subproblem that we're gonna run search on. First we started with the whole tree. Now we know it's not that, but this is the subproblem now. So now, being at this node, we know that this is not a null node. We're still looking for three. Is our target greater than three? Nope. Is it less than three? Nope, it's exactly equal, so at this point we return true. But when we return true, this return call is gonna go back up to the parent, because remember, we're inside of a recursive call. So where did the parent call search on the child? 
Well, we remember it was this conditional where we called search on this node. Now we're gonna return whatever the result of that was, whether it was true or false. In this case it was true, so we're gonna end up returning true from here, which means we're returning true from here. That means this entire tree does contain this target value. So what's the time complexity of performing this search? Well, just like binary search, it's going to be log n. That's the benefit. But the asterisk with that is that it's only gonna be log n if we have a binary tree that's roughly balanced, because our assumption is, when we get to a node and we choose either to go left or go right, as we do that, we're eliminating half of the remaining possibilities every single time. So from here we would eliminate half of the possibilities and continue doing that, but we can only do that if the tree is roughly balanced. When we say balanced, that means for every single subtree, including the root tree, the heights of the left and right subtrees are roughly equal, meaning that they differ by maybe one. That would be a perfectly balanced tree. In this case, this tree is balanced, because the height of the left subtree here is one, the height of the right subtree is two, so they differ by only one. What about the left and right subtree of this node? Well, it's zero here and zero here, so they differ by zero. What about this node? 
The height of the left subtree is zero, the height of the right subtree is one, so the difference is one. And lastly, for this leaf node, again the heights of the left and right subtrees are zero, so the difference is zero. So this is a balanced tree. But we might not have a balanced tree. What if we didn't have this node, and we just had a big string of nodes going like this? Well, this is essentially a linked list, and we could have values like five and six. While this is sorted, the way our binary search is gonna run, we can't just go halfway in between all of these nodes. We're gonna start here, say, okay, our target is too big, go here, maybe our target is too big again, and it could just keep going like that. And in this case, clearly we're not eliminating half of the possibilities every single time. So if the tree is like this, where it's not balanced, the search algorithm in the worst case would be big O of N. But the reason we have binary search trees is to have balanced binary search trees, and it's usually assumed that if you're working with one, you do have a balanced tree, which would usually be log N. But in terms of technicalities, a lot of people will say the time complexity of running search is going to be big O of H, where H is the height of the tree. Which makes sense, because for an imbalanced BST the height would be N, because we just have a string of nodes like this, but for a balanced tree the height would be log N, for similar reasons we talked about earlier in the course, especially when it comes to merge sort. Right, we know this is being multiplied by two every single time, so then the height of that would be log N. So this was definitely a lot to take in, but there's one last question I wanna answer for you, and that is: why do we have binary search trees if we already have sorted arrays? If we already have sorted arrays, which we can run binary search on, we can already search for values in log N time. So why create all this complexity, a complicated data structure, just to 
achieve the exact same thing? Well, the downside of having sorted arrays is if you want to add values to the array and remove values from the array, and keep the sorted property of it. We know removing from an array in the worst case is gonna be big O of N, because even if we can find the value really quickly, we still have to then maybe shift over every single value in the array. And if we wanna add a value, like if we wanted to add a value over here, then we would have to take all of these and then shift them over by one. So when it comes to inserting and deleting from an array, it's always gonna be big O of N, whether it's sorted or not. But that's not true for binary search trees. With binary search trees, inserting values and deleting values can also be log N. That's what we're gonna talk about next, and that's the main benefit of binary search trees. So we talked about how the main benefit of binary search trees over sorted arrays is that we can insert and remove in log N time, assuming the tree is roughly balanced, in which case the height of the tree would be roughly equal to log N. So inserting is going to basically traverse the height of the tree. Let me show you how it's gonna work. Let's assume we call insert on our tree over here, and assuming this is the root, let's say we call insert on the root node of this tree with the value six. That means we're gonna insert a new node into this tree with the value six. So we start at the root. Really, we just need to find which position we're going to insert the node at. Now technically, there are two ways we could do this. There are two valid BSTs that could be the result. We could have a six as the new root node and then have a four as the left child of the six, because remember, all values in the left subtree should be less than the root node, all values in the right subtree should be greater than the root node, and this satisfies that. Or we could have this, where six is actually the child. Now, we prefer to do it this way because it's easier to 
insert a node as a leaf node. In this case, we can do it just like this, and you'll see what I mean as we add even more nodes into the tree, but it's usually easier to add nodes as leaf nodes. So what we're gonna do here is we're gonna start at the root, and we're gonna keep traversing the tree pretty much the exact same as we did when we were searching for a value. So essentially we're gonna be searching for six, but we know duplicates don't typically exist in a binary search tree, and if they did, in this case we wouldn't wanna insert another six anyway. But at this point, we're gonna compare: is six greater than four or is it less than four? It's greater, so we're gonna go down to the right. Now we reach null. There's no more nodes here anyway. So what we're gonna do is essentially create a new node over here with the value six, and then we would take that node and return it up to the parent. And what the parent would essentially do is reassign its right pointer equal to the result that we returned. We'll take a closer look at that when we look at the code, but for now, let's insert a few more nodes. Next, let's insert a five. So we start at the root again. We compare: is five greater than this or less than it? It's greater, so we're gonna go down to the right subtree. We have another node. Is five greater than this or less than this? It's less than it, so we go to the left subtree, and there's nothing there. So now we can actually create that node with the value five, and of course we're gonna end up returning to the parent, where the parent is gonna assign its left pointer now to this node. That's how we're gonna connect them together. This example does a better job of illustrating why we insert at leaf positions rather than in the middle, because here, if now we wanted to add a five, possibly as the root node, what we'd have to do is create the five, connect its right pointer to six, and connect its left pointer over to four. Now this is actually the preferred way of doing this, because notice how this tree 
is more balanced than this one. There are implementations of balanced binary search trees where the insert and remove operations always result in a balanced tree — a couple of advanced data structures can do that, one being the AVL tree, which I cover in my advanced algorithms course — but for now we're going to focus on the simple cases of inserting and removing, because removing is already going to be a little more complicated than you think.

Taking a quick look at the code, you can see it's very similar to searching: we're passed a node, which is the root of the tree, and a value, but instead of searching for the value, we're inserting it. The base case is again reaching a null node, but instead of returning null, in that case we create the node and return it. So suppose that instead of this tree we actually had an empty tree — for root we were passed in null. Then this insert would return the new tree after creating a single node: we were pointing at null, we call insert, we end up creating a node with some value, and we return that node. Normally, though, we're inserting into an existing tree.

If we don't reach the base case, suppose we were inserting a seven right now, with our pointer here. We're not at null, so we check: is our value greater than the root's value? It is, so we call insert on the right subtree with the exact same value — we're still inserting a seven, just checking in the right subtree now. Once that insertion is done, we end up returning that subtree with the seven inserted, and that subtree gets assigned to the right pointer — we'll see what happens with that. For now, our pointer goes down over here. We're in a recursive call now: we called the function again, we're not in the base case, this is not null, and we check — is our value greater than six? It is, so we execute this again: we call insert on the right subtree over here, and the result gets assigned to the right pointer of this node. Here we actually do reach the base case: we're in another recursive call, at a null node, so we create a node with the value seven. It's not connected to anything just yet, but we return it — and remember, this is where we called insert from when we were at this node — so we take the returned node and assign it to the right pointer of this node. I'll just redraw this over here. Now we're at the perspective of this six node, and the if branch just executed, so the else-if won't — only one of them runs. Then we return the node we're currently at (that's what root is), this six, and it gets assigned to the right child of the node where that recursive call was made. We know that assignment already existed, so in this case it didn't change anything — but if we had ended up inserting the seven right over here instead, that's where it would have been useful.

That's the idea of inserting. We didn't cover the else-if case, which is inserting into the left subtree. For example, if we were inserting the value two: we start at the root, we're not at null, two is not greater than four but it is less, so we call insert on the left subtree. That recursive call hits the base case, creates a node with the value two, and returns it — and over here, instead of assigning it to the right pointer, we assign it to the left pointer of four. After that, we return the root node. So that's basically how it works — pretty similar to searching. And the time complexity is also the exact same as searching, because as we iterate through the tree, we go either left or right, never both ways, so in the worst case we visit one node for every level as we keep going down. Again, the time complexity is the height of the tree, which is usually log n if we have a balanced tree.
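The insert function just described can be sketched in Python like this — a minimal version, assuming a simple TreeNode class with val, left, and right fields:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def insert(root, val):
    # Base case: we fell off the tree, so this is where the new leaf belongs.
    if root is None:
        return TreeNode(val)
    if val > root.val:
        root.right = insert(root.right, val)
    elif val < root.val:
        root.left = insert(root.left, val)
    # Duplicates are ignored. Returning root lets the parent reattach this subtree.
    return root
```

Calling `root = insert(root, 6)` works whether root is an existing tree or None, because the base case simply returns the new node.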
So now let's talk about the more difficult algorithm, which is removing a node. But before we even do that, let's talk about how to find the minimum of the tree, because we're actually going to need that in our remove function. The good thing is that finding the minimum is pretty straightforward. Just like with a sorted array — where the smallest element is not in the middle or at the right, it's all the way at the left, assuming ascending order — the same is true for a tree. We know the recursive property: all greater values go to the right side, all smaller values go to the left. So of course we go to the left, and from here the same property holds — all greater values are on the right, all smaller on the left — so if we keep going down to the left, we will arrive at the minimum.

To take a quick look at the code, we get something like this — at least in Python, but it would be similar in most languages. We're passed in some node and we want to find the minimum node. We could return the value itself or the node; in this case we return the node, because that's what we'll need in our remove function. We assign some variable to the node we're given — we don't have to, we could use the parameter variable itself — and either way we have a pointer starting at the root node. While our pointer is not null and its left pointer is not null, we go to the left. From here we satisfy that condition, so we go down to the left; here we satisfy it again, so we go down to the left again. Now current is non-null, but its left pointer is null, so we don't continue — that means there aren't any smaller nodes. Here we stop, and here we return. You can see this isn't even a recursive function, because there's no need for it to be: as we go in one direction, we're essentially traversing a linked list, and we know that doesn't take recursion.
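That find-min loop can be sketched like this — the TreeNode class is repeated so the snippet stands alone:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def find_min_node(root):
    # The smallest value in a BST is as far left as we can go.
    current = root
    while current is not None and current.left is not None:
        current = current.left
    return current
```

Note that it returns the node itself, not just its value — the remove function will need the node's value later, and an empty tree simply returns None.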
When it actually comes to removing a node, we break it up into two cases. The easy case, case one, is when the node we're removing either has zero children — like this node, or this node, or this node — or has just one child; for example, this node only has a left child. The more difficult case is when we're removing a node that actually has two children: this node has two children, and this node does too, and its children even have children, so that can definitely get complicated. But first, let's start with the simple case.

Let's say we're removing the value two. The call is similar to the insert call: we're given the root of some tree and the value we're looking to remove. It's also going to be similar to searching, because to remove the two, which is over here, we first have to find it — and we don't have to look through every node, because we have the sorted property of the BST. We start at the root: is two greater than four or less? Less, so we go down to the left. Is two greater than three or less? Less, so we go down to the left again. Is two greater than three or less? Neither — it's exactly equal, so we've at least found the node. Now, how do we remove it? Similar to the insert call, from this subtree we're expected to remove the two node and then return the new subtree, after the removal, up to the parent. The two could be here, or somewhere down here, or down here — in this case it's at this node. After we remove this node, what does this subtree look like? It's basically null. So that's what we return to the parent: we return null, and the parent node, three, assigns its left pointer to null. This node essentially doesn't exist anymore; it's no longer connected to the tree.

Now, that was really simple because this node did not have any children. What if instead of removing the two, we remove the three? This is still case one, because three only has a single child. The searching goes the same: we start at four, three is less than four, we go here, and this is the three. So now, what do we return? We can't return null, because returning null would disconnect not just the three from the four, but the two as well. So what we do is check: is one of my pointers null — the left pointer or the right pointer? In this case the right pointer is null, which means we have at most one child, so we can execute case one. Since the right pointer is null, we return the left child. What that means is we're not returning this node — we're essentially removing it — we return its left child instead. Then, when we get back up to the parent, four's pointer, instead of pointing at three, now points at the return value, which is this node. That's essentially how we remove this node from the tree. And by the way, if this node didn't have a left child either, the code would execute the same: we'd check — we don't have a right pointer, okay, return the left pointer — which in that case happens to be null. So we'd return null up to the parent, and it simply wouldn't have a left child anymore. You can see it works out exactly the same whether we have zero children or one child.

Now for the difficult case, where we're removing something like six, which has two children. It starts the same: we start at four, six is greater than four, so we go
down to the right, and we arrive at six. Now we check: is our right pointer null? Nope. Is our left pointer null? Nope. So we don't have the simple case. If we just removed this node, it would disconnect everything from the four — it would basically delete the entire subtree, which is not what we want. The easiest fix is to replace this node with one of its children or descendants. In this case we could replace it with either node, five or seven. So what we'd do is take the six and replace it with, say, the seven from over here, and then recursively call our remove function on the right subtree — except now, instead of six, we remove seven. Once that removal is finished, assuming that node doesn't exist anymore, this is what our tree looks like: the six is replaced with a seven, and the six is removed from the tree.

So that's the idea, and it looks really simple in this case because the children of this node are leaf nodes — we can just move one up. But it's a little more complicated if we're removing the four. We can try the same idea: replace it with a value from its right subtree. Naively, let's try six. Okay, we put the six over here, but now we have to replace the six with something. It works in the sense that six is greater than four, and therefore greater than everything in four's left subtree, so six can be over here — but five can't stay in the right subtree of six, because five is less than six. That's what complicates this, and that's why, when we replace the four's value here, it's best to replace it with a leaf node. But which leaf node? If we're replacing the four with some value from the right subtree, should we pick the largest value in that subtree or the smallest? We have to choose the smallest value. We can take this five and put it over here, and it's still a valid binary search tree: since five was the smallest value, when it becomes the parent of all these nodes, the sorted property is satisfied, because all these nodes are greater than five. So to replace this value, we take the smallest value in the right subtree.

Now, technically you could also take the largest value from the left subtree, and that would still work. If we take the largest value here, which in this case is three, and put it over here, the sorted property is still valid: since three was originally less than four, it's also less than all of these values, and since three was the largest in the left subtree, it's greater than all of these values. I'm going to do it with the right subtree, but it really doesn't matter. And you can see where this is coming from, right? We wrote our find-min function so that we could find the minimum in the right subtree. We also could have written a find-max function to find the maximum in the left subtree — if finding the minimum is as easy as going all the way left, finding the maximum is as easy as going all the way right. They're both easy algorithms to implement.

So now, to finally look at the code, let's actually walk through this with the example of removing four. You can see we have our find-min function — that's the exact same — and remove has several cases. Let's say we're passed in the root node and the value four. We get here and don't execute the base case, because this is not null. Four is not greater than this value, and it's not less than it either, so we're no longer searching for the value — we found it, which is this else case. Now we check: are we missing the left child? Nope. Are we missing the right child? Nope. So we can't execute the simple case, where we have just a node with zero or one child; in those cases we
would return whichever child is non-null. For example, with this node, both of its children are null, so we'd end up returning root.right, which is null itself — that would end up deleting this node. For this node, we'd check that root.right is null, so we'd return the left node, here, and that would become the new child of four. But we're not executing either of those simple cases; we're executing this else case, because this node has two children. So what we do first is find the minimum node of the right subtree — this node — and return it: we do a search, find this node, and return it. Then we change the value of our current node to the minimum node's value, so this now becomes a five. The good thing about nodes is that we can just change their values, and in this case that turns out to be useful. The last thing: we now have two copies of this five, so we actually have to remove the other one. Recursively, we call the remove function again on this subtree and pass in the value five.

The good thing here is that since we found the minimum value of this subtree, we know that node has no left child — actually, it's not necessarily a leaf node; it could still have a right child over here or something like that — but it's definitely not going to have a left child, which means it has at most one child. Therefore, even though we're calling the same function within the function, we know the second call is going to be the simple case, where we don't have to make another recursive call. Let's run through that really quickly. We're looking for the five: five is less than six, so we go down here. Now we've found the five — it's not greater or less, it's the exact node. If not root.left — and in this case it doesn't have a left child — we return the right child, which in this case is null. But that's okay: we return null from here, and up at the parent, six, where we called root.left = remove(...) with five on this subtree, the left child of six ends up becoming null. Then of course we return this node up to its parent, but that doesn't change anything — these are still connected. And you can see we removed four from the tree, moving the five up here to replace it.

This case that we just walked through is actually the worst case for time complexity: we had to find the node we're removing — which was pretty easy here, we found it immediately — but then we had to find the minimum, traversing the whole height of the tree, which, let's say, is log n. And then, after replacing the value with the five, we had to remove the five, traversing the height of the tree again. So we traverse the height twice — and that's the worst case; we won't traverse it more times than that, because we know the second removal targets a node with one or zero children, so that's the last node we remove. Two is a constant, so the big O time complexity is log n. This is the benefit of BSTs over sorted arrays: we can remove and insert in log n time.

So, when we have a sorted array, there are many times where we want to iterate through it, and we can do that pretty simply, just going left to right. It would be useful if we could do the same thing with a binary search tree — and actually we can: we can go through all the values in sorted order and then do whatever we want with them. It's not going to be as simple as iterating through an array, but it is going to be as simple as going from left to right.
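Before we move on, here is the remove function we just walked through, as a sketch in Python — the TreeNode class and the find-min helper are repeated so the snippet stands alone:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def find_min_node(root):
    current = root
    while current is not None and current.left is not None:
        current = current.left
    return current

def remove(root, val):
    if root is None:
        return None
    if val > root.val:
        root.right = remove(root.right, val)
    elif val < root.val:
        root.left = remove(root.left, val)
    else:
        # Case 1: zero or one child -- return the other child to the parent.
        if root.left is None:
            return root.right
        elif root.right is None:
            return root.left
        # Case 2: two children -- copy in the minimum of the right subtree,
        # then remove that duplicate from the right subtree (a case-1 removal).
        min_node = find_min_node(root.right)
        root.val = min_node.val
        root.right = remove(root.right, min_node.val)
    return root
```

Just like insert, each recursive call returns the (possibly new) root of its subtree, and the parent reattaches it — that's how a removed node actually gets disconnected.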
If we want to traverse the entire tree and do it in sorted order, we call that an in-order traversal — we want to go through the values in order. Like I said, we want to go left to right: starting at the root node, we know the smallest values are in the left subtree, so before we even process this value, we want to process the entire left subtree. And the same applies recursively — it's a recursive definition, a recursive function, to go through them in sorted order. So we go to the three, and from here we know all values smaller than three are in its left subtree, so we go there. From here, same thing: all values smaller than two would be in its left subtree, but there aren't any, so we're at the two. Now, what do we actually want to do with this value? We could do anything — print it, add it to some output array. For now, let's say we're printing it, and I'll write the output up here. So we go ahead and print the two.

Next, before we pop back up to the parent, there could be some values over here — in this case there aren't, but if there were, we'd want to traverse the right subtree of this node right after processing the node itself, because any values there would be greater than two and less than three. In this case there aren't any, so we don't have to do anything; we pop back up to the parent. From here, we know we just processed the whole left subtree, so now we process this node, the three, and then its right subtree — it doesn't have one. Same thing: back up to the parent, we process this node because we just did its whole left subtree, so we print the four, and now it's time to recursively do the right subtree. In this case we actually have a right subtree to traverse, and of course all the values there are greater than everything we've printed so far — you can tell just by looking at it.

So we start at the six, but before we can print the six, we recursively traverse its left subtree. We get to five; before we print five, we recursively traverse its left subtree and get to null — that's the base case of our recursion. So now it's time to print five, because we just did its left subtree. Then we do the right subtree of five: it doesn't have one, that's the base case, so we return back up to our parent, six. We just did the whole left subtree of six, so we print six, and then recursively do its right subtree. It does have one, so we get to seven: do its left subtree (it doesn't have one), then this value, seven, then its right subtree (it doesn't have one). Notice how for every node we went left, then did the current node, then went right. So it is similar to going through an array in sorted order — we are kind of going left to right — it's just that we're doing it with pointers and recursion.

Now, even though this is kind of complicated to understand, especially if you're a beginner — it took me a while the first time I learned it — the good thing is that the code is really, really simple. Let's take a look. In-order traversal is another recursive function. We're given some root node, like this one, and the base case is similar: if the current node we're at is null, we can simply return; there's nothing for us to traverse. That's what happens when we go down here, or down here, or to the right child of here, or of course here, here, here, and here. In the recursive case, we run in-order traversal on the left subtree first. So we recursively go through the entire left subtree; from that node we do the same thing, and the same thing again, going to its left subtree, which ends up being null — so we return.
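Here is that in-order routine in full — a sketch in Python that appends to a list instead of printing (printing each value works the same way); the TreeNode class is repeated so the snippet stands alone:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def inorder(root, out):
    # Base case: nothing to traverse.
    if root is None:
        return
    inorder(root.left, out)   # the entire left subtree first...
    out.append(root.val)      # ...then the node itself...
    inorder(root.right, out)  # ...then the entire right subtree
```

Running this on the example tree (root 4, children 3 and 6, leaves 2, 5, 7) produces the values in sorted order.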
Then, after we've done a node's entire left subtree, we print the node itself — we print two — and run in-order traversal on its right subtree, which is the base case, null. So we're done with that subtree and we're back up at three. We just processed its left subtree, so now we print three and do its right subtree; it doesn't have one over here. You probably get the idea at this point: we go to four, we've already done its left subtree, so we print four and then recursively do the exact same thing on the entire right subtree.

Now, what's the time complexity of doing this? Well, in the worst case — and pretty much every case — we have to traverse the entire tree, visiting every single node once, and that makes the time complexity the size of the tree: big O of n, the same as traversing a sorted array. There's no way to get around this — we have to go to every single node, so it can't possibly be less than big O of n.

But you might have noticed something. If we were given some random values — these same values, but in some arbitrary order rather than sorted — we could sort them using a binary search tree and get back a sorted array: instead of printing each value, we take each value and add it to some array. What would the time complexity of that be? Well, first we'd have to build the binary search tree: for every value, we insert it into the tree, and we know an insertion into a binary search tree is big O of log n — for a balanced tree, at least, which is what we're assuming. We have to do that for all n values, so n times log n is the time complexity of building the entire tree. Then, to traverse it and build the output array in sorted order, the in-order traversal costs big O of n. So the time complexity is n log n plus n.

We can write the time complexity like that, but remember how I said that any constants are ignored — for example, if we had 2 times n log n, we wouldn't care about the 2. Well, in terms of the math, n log n plus n is at most n log n plus n log n, which is the same thing as 2 times n log n — just adding the same term together twice. And obviously n log n is greater than n itself: as n becomes a really big number, n log n grows faster than n. So n log n plus n is at most 2 times n log n, and we know that reduces to just n log n once we drop the constant. The point is this: when we have added terms that both contain variables, we only care about the larger one. The time complexity is dominated by the larger term, so we don't even care about the n here — n log n plus n is just n log n.

So basically, what we've found is that, given some random values, sorting them using a binary search tree takes n log n — similar to some of the other sorting algorithms we talked about, like merge sort and quick sort. Sorting, as you can see, is really not going away. I told you there were a lot more sorting algorithms than you might have thought, and there are — including some we're not even going to cover in this course. There are just a ton of sorting algorithms.
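That tree-sort idea — insert all n values, then read them back out with an in-order traversal — can be sketched like this, reusing compact copies of the earlier helpers so the snippet stands alone:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def insert(root, val):
    if root is None:
        return TreeNode(val)
    if val > root.val:
        root.right = insert(root.right, val)
    elif val < root.val:
        root.left = insert(root.left, val)
    return root

def inorder(root, out):
    if root is None:
        return
    inorder(root.left, out)
    out.append(root.val)
    inorder(root.right, out)

def tree_sort(values):
    root = None
    for v in values:          # n inserts at O(log n) each (balanced tree) -> O(n log n)
        root = insert(root, v)
    out = []
    inorder(root, out)        # O(n) traversal to read the values back in order
    return out
```

Two caveats worth noting: this insert silently drops duplicates, and a badly unbalanced tree (for example, inserting already-sorted input) degrades the build to O(n^2) — which is exactly why balanced variants like the AVL tree exist.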
Now, if you actually want to go through the values in order — which is usually the most common thing with binary search trees — this is the way to do it. But there are some cases, and this applies whether it's a binary search tree or just a generic binary tree, where we'd want to visit the root node first and then visit its children, recursively. By the time we get to a node, we want to visit it — print it, add it to some array, do something with it — before we do anything with its children. We call that pre-order traversal. So let me ask you: if we wanted to traverse this tree such that we visit four — we print four — then print its entire left subtree, and then print its entire right subtree, a.k.a. pre-order traversal, how would you augment this in-order function we have over here? The good thing is we don't have to change much: we just swap the order of these two lines. Instead of going through the entire left subtree before printing the four, we print it before we go through its left subtree, and then we go through its right subtree. To make it more clear, this is exactly what that code would look like: printing the node before we run pre-order on the left, then pre-order on the right. Otherwise it's the exact same — the exact same base case, the exact same parameters.

Then there's the other case, which we call post-order traversal — the opposite. If, for this node, we want to traverse all the values in the left subtree, then all the values in the right subtree, and only then print the root node, we recursively run post-order traversal on the left, then post-order traversal on the right, and then print the value. So these three functions are very, very similar; we just swap the ordering of the traversal. That's the nice thing about doing this recursively: it's hard at first to wrap your head around recursion, which is why I've really tried to emphasize it throughout the entire course, but once you do understand it, things can become really, really simple. So we've got in-order, pre-order, and post-order. By the way, we could do reverse order as well — do you know how we could augment this?
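Here's what pre-order and post-order look like next to each other — a sketch where the "visit" appends to a list so the resulting order is easy to see:

```python
class TreeNode:
    def __init__(self, val):
        self.val = val
        self.left = None
        self.right = None

def preorder(root, out):
    if root is None:
        return
    out.append(root.val)        # visit the node first...
    preorder(root.left, out)    # ...then its entire left subtree
    preorder(root.right, out)   # ...then its entire right subtree

def postorder(root, out):
    if root is None:
        return
    postorder(root.left, out)   # the entire left subtree first...
    postorder(root.right, out)  # ...then the entire right subtree...
    out.append(root.val)        # ...and the node itself last
```

Only the position of the visit line changes; the base case and the recursion are identical to in-order.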
What I mean is: we printed the values in order — we got two, three, four, five, six, seven — but what if we wanted the opposite, seven, six, five, four, three, two? How could we do that? Well, if we want the largest value first, then when we arrive at the root node we should probably go to the right subtree, then print four, then go to the left subtree. So we get to the six: we should go to its right subtree before we do the six. Down here, we go to the right subtree first — there's nothing there — then we print the seven, then its left subtree — nothing there. We get back to six, print six, then do its left subtree: five, print five. Then we're at four: print four. Then we go here: we print three, go to its right subtree — nothing there — and then print two. That's how we do it in reverse order. And how would you augment the code to do that? Pretty much just swap these two lines: instead of doing the left subtree first, do the right subtree first, then print the value, then traverse the left subtree. By the way, for all of these traversals — pretty much every way you can traverse the entire tree — the time complexity is big O of n.

Now, believe it or not, we just talked about probably the most common algorithm when it comes to coding interviews, and one of the most common algorithms in general. It's called depth-first search, a.k.a. DFS for short. All three of these recursive functions are examples of DFS, and it's pretty much like the name implies: as we search, we go depth first. For example, we start at this node and go in one direction — we could go to the left or to the right, we could process this node first or process it after — but either way, it's a traversal where we go as far deep as we can first. We're not going to do this node, then this node, and then this, this, and this; we're
going to go as far deep as we can in the left subtree before we go into the right subtree. So we go here, then to the left, as deep as we can; down here in this subtree, we again go as deep as we can, to the left node; and here, as deep as we can again. This is the first thing we did: we reached the bottom of the tree. We went depth first — as far down as we could in one direction before we even visited any other node — and obviously the tree could have been even larger, but you probably get the idea. We'd have done that with all of these functions, whether they go into the left subtree first or the right, whether we process the current node before or after — it doesn't matter; we go deep first. Then we pop back up, get here, and recursively run DFS on the right subtree: starting from six, we go as deep as we can, to the left child, five, and keep going — there aren't any children, but same thing, we went as deep as we could. Then we pop back up to six and go as deep as we can in its right subtree — there's only a single node, but you get the idea. We're going depth first: imagine that for every single leaf node, individually, we're going as deep as we can.

Now, it's hard to understand DFS without understanding the alternative, and I kind of glossed over it a second ago. The opposite of DFS is breadth-first search, a.k.a. BFS. For that, we go layer by layer: the first layer is just this node itself, then we do this node and this node — the entire layer — and then this entire layer. That's clearly not depth-first search, because we're not reaching the bottom immediately; we're kind of sweeping across. And breadth-first search is actually what we're going to start talking about next.

So next, let's talk about breadth-first search, or BFS for short. Just like DFS, this doesn't necessarily need to be applied to a binary search tree — it can be applied to any tree, whether it has the sorted property or not. The idea is that we want to traverse the tree layer by layer: breadth first, meaning we want all the closest nodes first, then the next layer of closest nodes, and so on. We don't want to go all the way to the bottom before we traverse the closest ones; we want to do it the opposite way. Typically with breadth-first search we go left to right within a layer — you don't necessarily have to, but for trees it usually makes sense. So, traversing this tree and, let's say, printing each value: first the root node, we print four. Then the next layer: going left to right, we go to the left child first, so from here we print three, and then the right child, so we print six.

Now, if we print six right after we print three, how are we going to get to this node? For our breadth-first search, we want to do the next level: two, then five, then seven. So how are we even going to get to the two? Assuming we had a pointer here when we were at three, and next a pointer here when we're at six, we kind of have to save that earlier pointer. So this algorithm doesn't really suit recursion very well — it's not like we can recursively traverse this subtree and then recursively traverse this one and get breadth-first order — so we're going to have to do this iteratively. But let's continue: next we want to traverse two. How do we know we want to traverse two? Well, we traverse all the children of each node, level by level, just like before: first we processed four and then all of its children; next we processed three and six, and now we
want to process all the children of those nodes from left to right. So we get the children of three and print them, giving us two; then we get the children of six and print them, giving us five and then seven. Now, this didn't produce anything super useful; we didn't print the values in sorted order or anything. But BFS is definitely a useful algorithm to know. It can also be called level order traversal, because we're going level by level in this tree. The code for it is going to be pretty interesting. Remember how I said that as we process a node, we then want to process its children from left to right, and as we process the next node, we want to process its children from left to right as well? So as we go through every node and process it, we should take the children of that node and add them to some data structure to remind us which nodes to process next. When there are no more nodes left to process, that's how we know we can stop. What kind of data structure can we use? Well, we've actually already implemented the data structure that we need, and it's called a queue, because we're going to process these elements first in, first out. After processing the root, we add its children in left-to-right order: we add the three and then the six to our queue. We process them first in, first out: we process the three, and then we add its children to the queue, so at this point we would have a six and a two in our queue. Since the six was added first, we want to process it first, and that makes sense, because we want to finish this entire level before we get to the next level. Let's take a look at what the code for this would look like. This is Python code, but it would be very similar in other languages. The idea is that we're going to use a queue; in Python it's called a deque, a double-ended queue, but it's basically the same queue that we implemented earlier in the course. So we have a queue for our BFS, and we're given the root of some tree. Of course, the tree could be empty, in which case we don't add anything to our queue, but suppose we were given this node. That would mean we add this node (not the value, but the node itself) to the queue. So as of now, our queue has a single element: this node. Now, suppose we actually want to count the levels as well: as we print these values, we say this is level zero, this is level one, this is level two, et cetera. We don't have to do it that way, and it would be simpler if we didn't, but let's do it like this, because sometimes it's more useful, and it requires a couple of extra lines of code, so it's a little more complex, which is why I'm showing it. So here we're at level zero. Is the length of our queue greater than zero? Well, this is our queue right now, so it is. We can print the level now, level zero, and then we're going to print every node in the zeroth level. There's only going to be one node in the zeroth level, because every tree has only one root node. So from our queue we're going to pop; in Python it's called popleft, but you can just imagine this is what the queue looks like: we're adding to the right and popping from the left. This is our standard queue pop, popping from the opposite side that we're pushing, so it's first in, first out. We pop the node four, we print its value, then we check its left and right children. We want to take its children and add them to the queue, but if its left child is null, we don't want to add it. So we check: is the left child non-null? Yes it is, so we add it to the queue. It's important to add the left child before the right child if
you want to traverse each level left to right. Of course we could do it the opposite way if we really wanted to, but usually it's more common to go left to right. So we add the left child to the queue, then we add the right child, because it's also non-null. This is what our queue looks like so far, and we've printed four. At this point our inner loop is done executing, because we took the length of the queue, which initially was one, and we're finished. Now, we did add a couple more elements to the queue, but taking the length up front acts as a snapshot: we read the integer one and ran a loop that iterated one time, because we only had one node. It's basically a constant value; we only calculated the length of the queue a single time. Having finished the inner for loop, we increment the level by one and go back to our outer while loop: is the length of the queue greater than zero? We have two elements in the queue, so it is. We print the next level, level one if we're starting at zero, and then we pop from the left of the queue. We pop the leftmost node, the three, because we inserted it first. We pop it, print it, and check its children: it has a left child, so we add that to the queue; it does not have a right child, so we add nothing there. So we've printed four and three so far, and our queue looks something like this: we first had the four on the queue, and we popped it; then we had the three and the six; we just popped the three and printed it, so we still have the six, and we just added the two. Going first in, first out, next we pop the six. We print its value and look at its left and right children. First the left child: it's non-null, so we add it to the queue. By the way, I'm drawing it as the value, but in reality the queue would actually contain the node itself, because when we pop a node we want the pointers to its left and right children, and we wouldn't have those if we were only storing the value. It's important to store the actual node itself. We do the same thing with the right child: it's non-null, so we add it to the queue, and at this point we continue. We pop the next node in the queue, which is two; we print two, and two does not have any children, so we don't add anything to the queue. You can start to see how this is going to finish up. Now we pop the five and print it (it's getting a little messy over here, but the printed values themselves aren't so important); it didn't have any children either. We pop the seven next; it doesn't have any children, so we print its value or do something with it, and this is our level order traversal. At this point our queue is empty, so we stop; in terms of the code, the length of our queue is not greater than zero, so we stop. By the time we got to the last level, when we were here with the two, five, and seven, we would have taken the length of the queue to be three, executed the inner loop three times, popping a node each time, and tried to add the children, but the children don't exist. So our queue at that point would have been empty, we'd go back up to the outer while loop, the length of our queue is now zero, and we stop. So that's the idea behind breadth-first search. Now, what's the time complexity? People get really confused when they see nested loops; they automatically assume the time complexity is going to be n squared, but that's not the case here. The simplest way to understand that is to look at what we did: you watched me go through this tree, and we only traversed each node once, right? We printed each node once, added it to the queue once, and popped it from the queue once.
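Putting the whole walkthrough together, the level-order traversal might look like this in Python. This is a sketch: the `TreeNode` class is one common shape for the nodes, and I'm collecting the values grouped by level rather than printing them, so the result is easy to inspect.

```python
from collections import deque

class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def bfs(root):
    """Level-order traversal: returns a list of value-lists, one per level."""
    queue = deque()
    if root:
        queue.append(root)       # enqueue the node itself, not just its value
    levels = []
    while len(queue) > 0:
        # Snapshot the length: only nodes already in the queue
        # belong to the current level.
        level_size = len(queue)
        level = []
        for _ in range(level_size):
            node = queue.popleft()       # pop from the left: first in, first out
            level.append(node.val)
            if node.left:                # left child first, so we go left to right
                queue.append(node.left)
            if node.right:
                queue.append(node.right)
        levels.append(level)
    return levels
```

For the tree in the example, `bfs` visits four, then three and six, then two, five, and seven, one level at a time.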
You could say that for each node we had, let's say, two operations; technically you could say three, because we're printing the node, appending it to the queue, and popping it from the queue. But it doesn't matter: it's some constant c times n, where n is the number of nodes. That's the number of operations we did, and we know that reduces to big O of n. Don't worry so much about the nested loops. Yes, we have nested loops, but clearly the total number of operations we're going to do is big O of n, so that's the time complexity. Just like any traversal, we visit every single node once, so just like DFS, the time complexity is the size of the data structure that we're traversing, which is the number of nodes. I mentioned DFS is probably the most common algorithm you'll use; BFS is probably the second most, and maybe it's actually the most common. We definitely have not seen the last of BFS, I'll tell you that much. Right now I want to introduce a couple of data structures to you, sets and maps, because these two data structures are commonly implemented using binary search trees. A set is pretty straightforward: it's some set of values, like one, two, three. It's pretty similar to just having an array, but usually the word set implies that some other underlying data structure is being used, not an array. One implementation could be a binary search tree: if we have the values one, two, three in a binary search tree, it could look something like this. The advantage of having a set implemented with a binary search tree rather than an array is that we can search for values, and insert and remove values, in log n time. So that's pretty straightforward. Now I also want to introduce the idea of a map. Let's first understand why it's important. Suppose we had something like a phone book, sorted in alphabetical order from A to Z based on the name of the person. Most people don't use phone books anymore, but I think it
serves as a good example. So we have a bunch of names in the phone book, something like Alice, Brad, Colin. Notice these are in alphabetical order, but what's important here is that we don't just care about the names themselves: every name is mapped to something else. That's where the word map comes from. In this case every name is mapped to some phone number; let's just put one two three for simplicity, Brad could have some other number like four five six, and Colin could have something else. The idea is that we want to store all this information sorted by the name, but for every name we want some other associated information. We call this a key-value pairing: we map a key to a value, and the data structure is typically sorted by the key. In our original binary tree, nodes just had keys; that's our simple case, a set. With a map, each node has a key, which decides how the binary tree is structured in terms of sorting, but each node also has an associated value, which in this case would be a phone number. The value could even be its own object if we wanted; it could be multiple values, all the information about the person: their phone number, their social security number, their email, a bunch of stuff. The idea, though, is that we're sorting by the key. So if we want to search for somebody in the phone book, it's efficient to do so by searching the names; we can do that in log n time because we know it's sorted by the name. And once we have that name, we have all the information about the person, a.k.a. the value. So this is one way of implementing sets and maps: with binary search trees as the underlying data structure. It's similar to how we talked about stacks earlier. The stack is the interface, where we can push and pop, and underneath, it's implemented with a dynamic array. It's also similar to how queues are implemented with linked lists (sometimes they can be implemented with dynamic arrays as well). Sets and maps can be implemented with other data structures too, which we'll talk about later; one example is hash maps and hash sets. Depending on how you implement these, which data structure you use, whether a BST or a hash map, there are different trade-offs; we've already talked about the time complexities of binary search trees. Depending on what language you're using, these can have different names, but for binary search trees it's typically called an ordered set or a tree set, and for maps, an ordered map or a tree map. Let's take a quick look at what this looks like in some common languages. The good thing about Java is that it has a built-in TreeMap, so you can create a tree map where the key is a string and the value is a string; we could also use integers and other types as well. C++ also has native tree maps; they're just called maps in that case. The downside of Python and JavaScript is that they don't really have native tree maps. You can install other packages: in Python, for example, you can use the sortedcontainers package and its SortedDict, which is essentially a tree map under the hood, and for JavaScript there are other packages which implement tree maps as well. In terms of real interviews, it's most common that you're actually implementing a tree map, or running some algorithm on one, like searching or inserting. It's not super common that you actually need the built-in tree map, but if you run into that case and you're using Python or JavaScript, most likely your interviewer will let you assume that such an object or interface already exists, and you can use it without implementing it from scratch.
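To make the key-value idea concrete, here is a toy BST-backed map. It is a sketch only: the class and method names are my own, and the tree is unbalanced, so a real ordered map would use a self-balancing tree to actually guarantee the log n bounds.

```python
class Node:
    def __init__(self, key, val):
        self.key = key
        self.val = val
        self.left = None
        self.right = None

class TreeMap:
    """Toy ordered map backed by an unbalanced BST, sorted by key."""

    def __init__(self):
        self.root = None

    def insert(self, key, val):
        if not self.root:
            self.root = Node(key, val)
            return
        curr = self.root
        while True:
            if key < curr.key:           # smaller keys go left
                if not curr.left:
                    curr.left = Node(key, val)
                    return
                curr = curr.left
            elif key > curr.key:         # larger keys go right
                if not curr.right:
                    curr.right = Node(key, val)
                    return
                curr = curr.right
            else:
                curr.val = val           # key already present: overwrite value
                return

    def get(self, key):
        curr = self.root
        while curr:
            if key < curr.key:
                curr = curr.left
            elif key > curr.key:
                curr = curr.right
            else:
                return curr.val          # found the key: return its value
        return None                      # key not in the map
```

With the phone-book example, inserting Alice, Brad, and Colin with their numbers lets you look any of them up by name.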
Sometimes, though, the interview question itself might be to implement the data structure, or parts of it, from scratch. What's most important is understanding what kind of interface a tree map would have, the things that we talked about (you can insert, remove, search, and iterate with the in-order traversal), and the associated time complexities of those operations, which we've covered. So I think we've learned enough to understand the backtracking algorithm pattern. It's essentially based on the DFS algorithm that we already talked about; in this case we're going to apply it to a binary tree, not a binary search tree, just some regular binary tree. It's going to be recursive, and the best way to understand it is to talk about an example. So let's look at one example question we could be asked: determine if a path exists from the root of this tree to a leaf node, with the restriction that the path may not contain any zeros. We want to find a path from the root to any of these leaf nodes such that the entire path does not contain any zeros; let's say we return true if such a path exists, and false if it doesn't. Obviously we're going to try this recursively, starting at the root node. What we want to verify is that the root is not a zero, because if it is, there can't be any zero-free path from the root to any of the leaves. In this case four is not zero, so we're fine so far. Let's check if a path exists in the left side of the tree. We get to the left child, and it is a zero; we're not allowed to have any zeros. What we say in this case is that we're backtracking: we tried one possibility, and now we're going back up. The idea is similar to going through a maze. Let's say you start here and your goal is to get to the end of the maze. You go and try every path: you go this way, but then you realize that it's a dead
end, so you backtrack and try another route, another way to reach the end. Basically, we're recursively trying every single path. It's kind of a brute-force approach; we haven't talked about brute force yet, but it's a style of algorithm where you go through every single possibility. For example, with binary search trees we know that to find a value we can take advantage of the sorted-order property: if we're looking for a value greater than four, we look in the right subtree and don't even consider the left subtree. But if we're given a binary tree, not a binary search tree, we can't use that; we actually have to check both the left subtree and the right subtree, so we kind of have to brute-force it. That's the idea behind backtracking: we have to go through every possibility, just like in a maze, where you have no idea which path actually leads to the end. In this case, even though there's a seven down here, we can't go there, because the zero above it is basically a roadblock; we can't go through any zero, so we're not allowed to go this way. So we backtrack up, and now we try the same thing in the right subtree. Recursively, we go to the right child. It's a one, which is good, it's not a zero, but unfortunately it's not a leaf node, so we do have to continue. We try its left subtree: it's a two, which is good, it's not a zero, and it does not have any children. That means we found a path. From the zero on the left we didn't find a path, so we would have returned false; that's how the root node knows we didn't find the answer in the left subtree and should try the right subtree. In the right subtree we did find the answer, so from this node we return true up to the parent, and that node knows that since we already found the solution in its left subtree, we did find a path and there's no need to look in its right subtree anymore, so we don't even go there. From there we again return true up to our parent, and from the root we end up returning true: yes, the answer for this tree turned out to be true. Now suppose we changed this node to a zero instead. We would have gone down the left path, found that it was not the answer, and returned false up to our parent; so we backtrack up and try the right subtree, see that it's a zero, so that wasn't the path either. We've run out of options, so we backtrack up, return false to our parent, and from there we also return false; the answer for that tree would have been false, not true. The code for this solution would look something like this; you can see it's pretty similar to the other recursive tree traversals that we already looked at. The base case is if we reach a null node, or a node with the value zero; a node like that cannot be part of the path we're looking for, so in that case we return false. If we get to a leaf node, that's another base case: for example, if somehow we were able to reach this leaf node without encountering any zeros, we would return true at that point (we know we can't actually get there, because there's a zero blocking us). But eventually we get to the two: it's a leaf node, and we know we got there without encountering any zeros, because if we had encountered a zero we would have already returned false. So if we ever reach a leaf node, we can return true. Otherwise, we recursively check the left subtree and see if it contains a path; if it does, we can return true. If it doesn't, we don't do anything, and in this case we know the left subtree did not contain a path.
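The true/false version just described might be sketched like this. The function name and the `TreeNode` shape are my own choices; `root.val == 0` encodes the no-zeros rule from the problem.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def has_zero_free_path(root):
    """Return True if some root-to-leaf path contains no zeros."""
    # Base case: a null node or a zero blocks this path.
    if not root or root.val == 0:
        return False
    # Base case: we reached a leaf without hitting a zero.
    if not root.left and not root.right:
        return True
    # Otherwise, a valid path through either child is enough.
    if has_zero_free_path(root.left):
        return True
    if has_zero_free_path(root.right):
        return True
    return False
```

Note how a `True` from either recursive call short-circuits: once the left subtree succeeds, the right subtree is never explored.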
So then we run it recursively on the right: can we reach a leaf node in the right subtree? We know we can; eventually we went here and then here, hit the leaf base case, and returned true. So from that node we end up returning true, that pops back all the way up to the root, and from the original function call we return true. Now if the root was a zero, for example, none of these checks would evaluate to true, and we would return false. So now let's look at a slightly more complicated version of this same problem. Instead of just returning true or false, let's actually return what the values of that path would look like. For example, the solution here would be that we have to build a path that looks like four, one, two, because those are the values in this path. The solution is just slightly modified. In this case we'll call it leafPath; we're passing in our root, but we're also passing in another variable, path, which in this case is an array, a dynamic array. It's sort of going to serve as a global variable for this function: it's basically going to be the same array that we pass into every recursive call. So initially, let's say we have an empty array. We're still going to return true and false to indicate whether we found the path or not, because that tells us whether we can stop looking or should continue. For simplicity, let's assume there's only one path from the root to a leaf node that's a valid path, one that doesn't contain any zeros, because if there are multiple paths we get into the question of which one to return. To keep it simple, we assume at most one path will exist: maybe we'll have zero paths, or we'll have one path. In this case we have one path. So we start at the root, and we don't
execute the base case, because the root is not null and it's not equal to zero, so we don't return false. What we do to our path is push the value from the current node, so we push the value four; assuming this is our path so far, it now contains the four. And by the way, I'm actually going to swap the values of these two nodes, because that will illustrate this algorithm a little better, so let me do that now. Here's how the rest of the algorithm executes. We check: does this node have any children? It does, so we haven't reached a leaf node. We call recursively to check if there's a leaf path in the left subtree, and we pass in the same path variable; we're not creating a new copy of it, we're passing the exact same reference to this array into the recursive function, which works pretty much the same in most languages. So we're at the left child, but here the same base case executes, because the value is zero, so we return false. We pop back up to our parent; we backtrack. From the four node, the check of whether the left subtree contained the path is false, so we're not returning true just yet; we still have to build the path. So now we're back here, and we recursively call leafPath on the right subtree. And actually, I realize I want to modify this tree a little more to illustrate the complete backtracking: I'm going to change this node to, let's say, a three, and add a right child which is a zero. This will give us the full picture of the algorithm. Now we're at the one. It's not null and it's not equal to zero, so we consider it part of our path so far; the one gets added to our path down here. We check: is it a leaf node? Nope. Then recursively we check if a path exists in its left subtree, so we get down to the three and continue the algorithm. It's not a base case, so we add three to our path; so far our path is four, one, three. From this three, it's also not a leaf node, so we recursively check whether we can find a path in its left subtree, but we see that the left child is null, so there we return false; that means we didn't find a path yet. Then from this node we check its right child using the recursive call, and that also returns false, because while that node is not null, its value is equal to zero. So from there we return false up to our parent, and the check evaluates to false over here, so we're not going to return true. Now this is the interesting case; this is part of the backtracking. We were originally considering this path, checking whether it leads to a solution, and now we know that none of the paths from the three lead to a solution. So, while we were building up our path, now we have to backtrack: we say we've tried this path, but it didn't work for us, so we have to pop the three from our array. By the way, the dynamic array here is essentially being used as a stack, because we're pushing to it and popping from it; that popping is important to backtracking. Then we recursively get back up to the one. We know its left subtree did not lead to anything, so let's try the right subtree: we call leafPath on its right subtree, we get to the two, and we check: is it null or equal to zero? Nope. So we append two to our path, and right now our path is four, one, two. Removing the three was very important, because otherwise we would have gotten the wrong answer. At this point, after we append, we check: is this a leaf node? It is a leaf node; it does not have any children.
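The path-building variant just walked through could be sketched like this. Again, the names are illustrative; `path` is the shared list passed into every recursive call, and the pop on failure is the backtracking step.

```python
class TreeNode:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def leaf_path(root, path):
    """Fill `path` with the values along a zero-free root-to-leaf path.
    Assumes at most one such path exists. Returns True if one was found."""
    # Base case: a null node or a zero blocks this path.
    if not root or root.val == 0:
        return False
    path.append(root.val)            # tentatively include this node
    # Base case: reached a leaf, so the path is complete.
    if not root.left and not root.right:
        return True
    if leaf_path(root.left, path):
        return True
    if leaf_path(root.right, path):
        return True
    path.pop()                       # backtrack: this node led nowhere
    return False
```

On the modified example tree, `path` briefly holds four, one, three; the three is popped when its subtree fails, and the final answer is four, one, two.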
At this point we found our answer, and this is the first time we're returning true. We return true from the two back up to the parent over here, and since the call to leafPath on the right subtree evaluated to true, we can return true from there as well. Then all the way at the root node the same thing happens, and it returns true. So we did find the path, and the path variable that we passed in is now populated with the path we were looking for: four, one, two. Backtracking is an algorithm that can actually be used on more than just binary trees; it's similar to the recursive algorithms we talked about earlier, like the Fibonacci sequence. But understanding the fundamentals that we talked about here will serve as a really good foundation for understanding more complicated versions of backtracking. The ideas are the same: you maintain some kind of solution, keep adding values to it as you search for a solution, and as you find out that you went down the wrong way, you pop from the solution. By the way, since in the worst case we would have to traverse the entire tree, the time complexity for both of the backtracking algorithms we talked about is big O of n, where n is the size of the input tree. That's usually the case for backtracking: it's usually a brute-force algorithm, so in the worst case it runs over all possibilities, which in this case means the size of the tree. Next, let's look at another really important data structure, called a heap, a.k.a. a priority queue. Let's start with the meaning behind priority queue. We already talked about regular queues, where the first value in is the first value out of the queue, but we might want to order the queue a little differently: instead of ordering by first in, first out, we could order it based on some type of priority value. For example, we add some values like 7,
3, and 9. So now, which value are we going to pop first? Well, it depends on the priority of the values. There are typically two variations of this: one is the minimum priority, and the other is the max. In this case we could prioritize the minimum values first: if we did it like that, we would first pop the 3, then the 7, then the 9. Now, this is dynamic, meaning as we add values we might choose to pop a different one: right now we would pop the 9 next, but if we add a smaller value, suppose a 4, in that case we're going to want to pop the 4 first and then the 9. That's if we did it based on minimum; of course we could do it based on maximum as well, in which case we'd pop 9 first, then 7, then 3. So the idea is actually pretty straightforward. Now, the reason we have two different names here is kind of the same idea as when we originally had queues, which were actually implemented with linked lists. The relationship between these two terms is similar: the interface that we're using is called a priority queue, that's the idea behind it, but underneath the hood it's implemented using a heap. People use these two terms interchangeably, so you'll see that a lot; I'll probably do the same, and outside of this course people generally use them interchangeably too. I think it's more common to use the term heap, probably just because it's shorter, to be honest. In this case, the priority queue is going to be implemented with a binary heap, which is the data structure we have here and what we're going to talk about. And by the way, that binary heap could be either a min heap or a max heap, depending on how we want to implement our priority queue, whether we care about minimums or maximums. I'm mainly going to focus on the minimum case; it's generally more common to use a min heap rather than a max heap, and that's what we have here.
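We'll get to the implementation, but for reference, Python's standard library already provides a min-heap through the `heapq` module, and it behaves exactly as described with the 7, 3, 9 example:

```python
import heapq

heap = []                     # heapq treats a plain list as a min-heap
for val in [7, 3, 9]:
    heapq.heappush(heap, val)

print(heapq.heappop(heap))    # 3: the minimum has the highest priority
print(heapq.heappop(heap))    # 7
heapq.heappush(heap, 4)       # a smaller value arrives...
print(heapq.heappop(heap))    # 4: ...and jumps ahead of the 9
print(heapq.heappop(heap))    # 9
```

For max-heap behavior, a common trick with `heapq` is to push negated values and negate them again on the way out.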
The implementation is pretty much exactly the same either way; it would just be swapped: instead of taking minimums, we would take maximums. I'll talk about that more now. At first glance this looks like a binary tree, and that's because it is a binary tree, but it's not a binary search tree. You can tell because the root here is 14: with a binary search tree, every value in the right subtree would be larger than 14, which it is in this case, and every value in the left subtree would be smaller than 14, but that's definitely not the case. Actually, when you look at the root node here, it's smaller than every single descendant in the tree; that's going to be the order property that we talk about in a minute. But let's first start with the structure property. A binary heap is essentially a binary tree that is a complete binary tree. A complete binary tree basically means that every single level in the tree is completely full, with no holes anywhere, except possibly the last level; the last level might have some holes, but every other level in the tree won't. You can see that this requirement is satisfied in this case. Now, if we were missing this node, it would not be, because we're supposed to have four nodes in this level but we'd only have three. If any of these four nodes is missing, we don't have a complete binary tree; if either of these two nodes is missing, we don't have a complete binary tree. The last level here can have some missing nodes, and that's perfectly okay. But if we insert a node over here, regardless of its value, tell me: is the structure property satisfied? Definitely not, because while the last level is allowed to have missing nodes, the second-to-last level now also has missing nodes, and that's definitely not okay. Now, another thing about the structure property: as we add values, you can see they're added left to right. It's not enough that only the last level is allowed to have missing nodes; if it does have missing
nodes, they should be at the end of that level. Meaning, as we add nodes (assume this last level doesn't even exist yet), we'd first add a node here, then here, then here, then here, and so on. So it's kind of intuitive: we want to have a complete binary tree, and as we add nodes we're adding them in a pretty straightforward way, in the next available position, essentially. So that's the structure property. Next is the order property. Remember, the whole point of having this heap, a.k.a. priority queue, is to find either the minimum or the maximum value really quickly and easily; that's the entire point of having a priority queue. In this case we're talking about minimums, so it makes sense that we would want the minimum value among all of these to be at the root, because from the priority queue we want to be able to find the minimum easily, just by looking at the root. We can do that pretty quickly, in O(1) time, as long as we're not removing it. So what order property should we give this to do that? Well, we should say recursively that for this tree, we want every value in the left subtree to be greater than 14, and every value in the right subtree to also be greater than 14. That's the only requirement we have, but it's recursive. That means that for the root node of this subtree, we want every value in its left subtree to be greater than 19, which is the case as you can see here, and every value in its right subtree to be greater than 19, which is also the case. Same thing for 16: 19 should be greater than 16, and it is; 68 should be greater than 16, and it is. And technically, in heaps we're actually allowed to have duplicates; you can see we have two 19s. So actually, when we look at a node, we don't necessarily need every value in the left and right subtrees to be strictly greater than it; it's okay if they're equal as well.
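The order property can be stated as a quick recursive check. This is a sketch over pointer-based nodes (we'll see shortly that heaps are really stored in arrays); the `Node` class and function name are illustrative, and it checks only the order property, not the structure property.

```python
class Node:
    def __init__(self, val, left=None, right=None):
        self.val = val
        self.left = left
        self.right = right

def is_min_heap(node):
    """Verify the min-heap order property recursively:
    every child's value is >= its parent's (duplicates allowed)."""
    if not node:
        return True                          # an empty subtree is fine
    for child in (node.left, node.right):
        if child and child.val < node.val:   # a smaller child breaks the property
            return False
    return is_min_heap(node.left) and is_min_heap(node.right)
```

Running it on the drawn tree (root 14, then 19 and 16, then 21, 26, 19, 68, then 65 and 30) confirms the property holds, duplicates included.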
binary search trees but it is true for heaps we could have some duplicates and that's okay it's just that for every node all the descendants have to be greater than or equal to that node at least when you're talking about min heaps which is what we're doing here obviously for a max heap we would want the opposite we would want for every node all of its descendants should be smaller than it because in that case we would want the maximum to be the root now one of the most interesting things about binary heaps is that while we draw them as binary trees that are connected via pointers and nodes and all that they're actually implemented under the hood using arrays and another interesting thing is that we actually don't start at index zero for this array we essentially don't care about the zeroth index and I'll show you why so what we're going to do is put the root node at index one that's always going to be the case the root is always going to be at index one so we got the root node next we fill in the next level we go left to right so for the 19 we're going to put it at index two for 16 we're going to put it at index three and we're just going to continue going like this but one thing I want to point out is that we know the root is at index one to get its left child we go to index two to get its right child we go to index three let's continue this and then I'll show you what's the pattern behind it and then we'll understand exactly why we use arrays rather than nodes and pointers so filling in the next level we have 21 which goes at index four 26 which goes at index five 19 which goes at index six and 68 which goes at index seven and the last couple nodes we have 65 and 30 they're going to go in index eight and index nine so the main reason we skip index zero is to get the math to work out what do I mean by math well for index one to get its left child we basically have a few formulas so for any node so for 14 which is at index one to get its left child we simply take 
the index that it's at and multiply by two. So we know its left child is at index two, which is 19; that is the case over here. Its right child is at 2 * i + 1; i in this case is one, so two times that plus one is index three, just the next position over: 16 is its right child. Continuing: for 19, to get its left child we multiply its index by two and get 21 (that is the case); to get its right child we multiply by two and add one, which lands us on 26, its right child.

Now, these properties are only true because we have a complete binary tree, where every level is filled except possibly the last. This would not be true for a regular binary tree or a regular binary search tree; that's why we use arrays for binary heaps specifically, because they're complete binary trees. It's also why we can use an array to represent the binary tree at all: if we had holes, say this 26 didn't exist, it wouldn't make any sense to use an array, because we'd have random gaps everywhere; but with a binary heap like this one, we're never going to have holes. That's the entire point of a complete binary tree.

And by the way, if from any node we can multiply by two to get its left child and multiply by two and add one to get its right child, then from any node we should also be able to get to its parent, and we can do that by dividing by two. It's pretty obvious why that works for the left child: we multiplied by two to get there, so we divide by two to get back. But why is it also true for the 2 * i + 1 case? Pretty much because when we divide by two, we round down. We know 2 * i is always even (we're multiplying some number by two, so by definition it's even), and we know 2 * i + 1 is always odd (an even number plus one); take nine as an example of such an odd number. When we divide 2 * i + 1 by two, we're left with i plus one-half, and since we always round down (pretty much every language rounds integer division down automatically; with Python you use double slashes), that extra half is eliminated and we're left with just i, exactly the same as if we had divided 2 * i by two. So when you take nine and divide it by two, you get 4.5; round that down and you're back at four.

For example, starting at this node, we'd start at index one. Suppose we want to go left: we take its index and multiply by two, which leads us over here in the array. Now suppose from here we want to go right: we take the index, multiply by two (which leads us to four) and add one (which leads us to five), so we'd be here, or here using the tree representation. From here, maybe we want the left child: multiply the index by two, which gives 10; that's out of bounds, because it's greater than the last index, so we've effectively reached null. In other words, as long as you have these properties, you can traverse the array exactly as if these indices were pointers in a regular tree.

So that's a good introduction to heaps: the structure property, the order property, and how they're implemented using arrays. Next, let's talk about how to insert values into a heap and remove values from a heap while maintaining the structure and order properties; that's very important, because without those properties we don't have a heap anymore. Inserting into the heap is also called pushing into the heap; that's usually the term that's used with heaps and priority queues.
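The index arithmetic above can be sketched with a tiny example. This is a minimal illustration (the helper names are my own); the array values mirror the tree from the walkthrough, with a dummy 0 at index 0:

```python
# Heap stored as an array with a dummy value at index 0,
# matching the tree: 14, then 19/16, then 21/26/19/68, then 65/30.
heap = [0, 14, 19, 16, 21, 26, 19, 68, 65, 30]

def left(i):
    # Left child of the node at index i.
    return 2 * i

def right(i):
    # Right child of the node at index i.
    return 2 * i + 1

def parent(i):
    # Parent of the node at index i (integer division rounds down).
    return i // 2

# The traversal described above: root -> left -> right -> left (out of bounds).
i = 1            # root (14)
i = left(i)      # index 2 -> 19
i = right(i)     # index 5 -> 26
assert heap[i] == 26
assert left(i) >= len(heap)               # index 10 is out of bounds, i.e. "null"
assert parent(9) == 4 and parent(8) == 4  # 65 and 30 both point back to 21
```

Note how `parent` works for both children of index 4 only because integer division rounds down, exactly as discussed.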
So, if we want to push a new value into the heap, suppose we want to push the value 17. Thinking about the structure property, the obvious place to put the node is over here. I mean, we could, I guess, put it there, but then there'd be a hole over here; we want these to be contiguous, so we always want to insert in the next available position. We definitely don't want to put it there. So we put the value here; it's a little bit messy, but bear with me: we have a 17 over here. The good thing is that the structure property has been satisfied. What about the order property? Remember, this is a min heap: for every node, we want its children to be greater than it, so for the node we just inserted, we want its parent to be smaller than it. So let's compare the value here to the parent. By the way, in our array it's going to look something like this: the 17 was inserted over here, and we compare it to the parent. How do we get the parent? Take the index and divide by two, so from index 10 we compare with what's at index five. Clearly, that's the same value you see in the tree, and in this case the parent is actually larger, but it should be smaller (technically they're allowed to be equal, but in this case they're not equal), so these should be reversed: the smaller one should be the parent. What we do here is simply swap the two values in our array. It's a bit messy, but we swapped the values, and the tree representation now looks something like this. Then we continue the algorithm for the node, now that we've moved it over here: we compare it to its parent, making sure the order property is being satisfied. We know, of course, it's satisfied everywhere else in the tree, but with this new node it might not be. So we look at its parent, which is at five divided by two, rounding down: index two. In that case, you can see that's also the same node; the parent should be smaller, but the parent here is larger than the child, so again we swap the two values. Now 17 is at index two, over here, and 19 was moved down here.

Before we compare this one to its parent, I want to mention why, having moved the 17 up here, we don't have to compare it to its left child and pretty much everything over here: we already knew the 19 that was in this position was valid. 19 was allowed to be here because 19 was smaller than all of the values over here, but we replaced 19 with a value even smaller than it, 17, so 17 must also be smaller than all the values over here. That's just basic math and logic, so we don't have to compare it to every single descendant. Now, doing our last comparison, against the root node: 17 should be greater than 14, and in this case it actually is. We compare it to the parent, and the order property is satisfied, so we know we're allowed to stop. That's how you insert, aka push, into a min heap; it works similarly with a max heap, just with the order reversed.

Now, in the case that we had an even smaller value, suppose instead of 17 we had a value like 10: we would end up swapping it with the root node, 10 would be over here, and 10 would not have a parent (this is the root node), so at that point we would stop. So either we go all the way up to the root, or we get to a spot like this one where the order property is already being satisfied.

Now, taking a quick look at the code for what we've talked about so far, this is what the class implementation of a heap would look like. In the constructor, we initialize the heap as an array with a single value, a zero; that's just a dummy, since we know that the zeroth
index is actually not going to be used. So this is essentially an empty heap that we've initialized. Now, to actually push values into the heap, we do what I just talked about: take the new value we're inserting and append it to the heap, putting it at the last index. The heap itself, remember, is just an array (a dynamic array, but essentially an array). Then we get the index we just inserted at, which is the length of that array minus one, so we'd be at index 10. Then we do something called percolate up, which is pretty much what I just walked through: we insert the node (the one we inserted was 17), then shift it up while the order property is not being satisfied. We shifted 17 up here, then up here, and then we stopped. This code is doing exactly that. You can see that index i, which is 10 initially, is compared to its parent at i divided by two (in Python we use double slashes to round down, but the general idea is the same in any language). If the value is less than its parent, we swap it with its parent: we have a temporary variable, move the parent into the child position, move the child into the parent position, and then set i to i divided by two, because we just moved up to the parent position, and from here we may need to compare it with (and potentially swap it with) its parent. We continue the loop, doing that comparison, until the condition is no longer true, i.e., until the node is in the correct position.

What would be the time complexity of this, by the way? Essentially the height of the tree, and in this case we know the tree is always going to be balanced; that's kind of the point of having a complete binary tree. So we can definitively say the time complexity is O(log n), where n is the number of values in our heap, aka in our array.

Next, let's talk about removing from the heap, aka popping from the heap. We don't actually specify a value to pop, because remember, a priority queue is about popping the priority element, and we know the minimum is at the root, so that's the one we always pop. It seems simple enough, but popping is actually more complicated than inserting. If we just wanted to read the top element, the minimum element in our priority queue, we could go to index one and return it; that would be easy. But to actually pop it is more complicated, because we have to maintain the structure and order properties.

Now, after removing the root value from the array, your first instinct might be to replace it with the minimum of its two children. That would kind of maintain the structure (we have a hole here to fill, and we can fill it with one of its children), and for the order property we want to make sure the minimum value goes here; the minimum must be one of these two children, because all of these are greater than 16 and all of these are greater than 19. It seems simple enough, but if we do it that way, we put the 16 over here, which works, and then we have to replace that spot with the minimum of its children, which is 19, so we remove the 19 and move it up. Now the order property is satisfied, but not the structure property: notice how this level is full, this level is full, but this level is not full. We introduced a hole over here, and we didn't want to do that; if we're removing a node, it should come from the bottom level, but we weren't able to do that. This is why this approach doesn't work. It's a good idea, but it doesn't work.
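The push/percolate-up routine just described can be sketched as follows. This is a minimal sketch under stated assumptions (the class and method names are my own; the transcript's class is assumed to store the array in `self.heap` with a dummy at index 0):

```python
class Heap:
    def __init__(self):
        # Index 0 is a dummy so the parent/child math works out.
        self.heap = [0]

    def push(self, val):
        # Insert at the next available position (end of the array),
        # then percolate up while the order property is violated.
        self.heap.append(val)
        i = len(self.heap) - 1
        while i > 1 and self.heap[i] < self.heap[i // 2]:
            # Swap child with parent, then move the pointer up.
            self.heap[i], self.heap[i // 2] = self.heap[i // 2], self.heap[i]
            i = i // 2

# Pushing 17 into the example heap shifts it up past 26 and 19:
h = Heap()
for v in [14, 19, 16, 21, 26, 19, 68, 65, 30, 17]:
    h.push(v)
assert h.heap[1] == 14   # minimum stays at the root
assert h.heap[2] == 17   # 17 percolated up to index 2, as in the walkthrough
```

The swap here uses Python's tuple assignment rather than an explicit temporary variable, but it's the same idea.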
To get pop to work, to maintain the structure and order properties, and to do it efficiently, we actually have to use a sort of genius technique, though it's a relatively simple one once you know it. What we actually do is remove the 14 and replace it with the last value in the array, the last node in our tree, which is over here; remember, that's the way to maintain the structure property. So we remove that node and move the 30 over here; of course, under the hood this is also happening in our array, but I won't draw all of that out, because it gets kind of complicated. The good thing is we've now satisfied the structure property, but not the order property. We can fix that: instead of percolating up like we did when we inserted a node, we do the opposite and percolate down, because we don't know that this value is in the position it should be. We need the minimum in this position, and we know 30 is probably not the minimum; the minimum is one of these two children. So we take the minimum of the two children, left and right; the minimum is 16. Now we compare that minimum to this node, 30: which is smaller? 16 is smaller, so we swap the two nodes; 16 goes up here and 30 goes down here. We already know 16 is smaller than all of these, because it's smaller than 19, which is smaller than everything below it; remember, heaps have a recursive order property. But we're not done yet, because we need to satisfy that recursive order property: 30 should be smaller than all of its descendants, meaning smaller than both of its children. So let's take the minimum of the two children: it's 19. Is 30 smaller than 19? Nope, so we need to perform another swap: the 19 moves up into this node, and the 30 moves down into this one. It's definitely gotten pretty messy, sorry about that, but we're pretty much done at this point.

It's really counterintuitive that we'd do it this way, literally taking the last node, moving it to the root, and percolating down, but that is the most efficient way to do it. That's why I say it's a genius thing: you wouldn't really come up with it yourself, but once you know it, it seems really simple. And now our min heap satisfies all the min heap properties.

Now let's take a quick look at what the code would look like; it's actually a little bit complicated. This is what pop would look like in Python. self is just a Python thing that gives us access to all the member variables of the class; we're assuming this is inside some class. I didn't write out all the code because we're already running out of space, but remember, the ideas here can be applied in any language. So, when you pop from a heap: first of all, if our heap is empty, there's nothing to pop, and we know it's empty if its length equals one (remember, we have a dummy value in it), so in that case we return null. If the length is two, that means we have just a single value, the root; we remove it from the array and return it, so we'd return 14, remove it, and have nothing left. The complicated case is when we have multiple values in the heap. We first assign the result: we take 14 and copy it into a temporary variable, because we know that's what we'll end up returning, but before returning it we have to delete it from the heap and then reorder the heap. Next we take the last value and move it into the root position: we pop the last value of the array and move it to index one, putting the 30 over here. Then we set i to index one, meaning our pointer is at the root node, and we percolate down. This is a lot of code, and there are many different ways to write it, but what's important here is the
logic. We know there are multiple cases when we're at a node: it could have both children, or just a left child (for example, if this node didn't exist, then this node would only have a left child, not both). Now, it's impossible for a node to have a right child and not a left child, because that doesn't satisfy our complete binary tree definition. So we only worry about the cases where we have two children or just a left child; the third case, of course, is no children at all. That's what these conditionals are all about. The first case is where we have both children, but before that: this percolate-down loop only runs while we at least have a left child. How do we know we have a left child? We take where our i pointer is at, compute 2 * i (which gives us the pointer to the left child), and check that it's less than the length of our heap; if so, we do have a left child. Next, in our loop, we check whether we also have a right child (we know for sure we already have a left child): we check 2 * i + 1, which is the pointer here, and we do have a right child. And if the right child is smaller than the left child (which in this case is true, 16 is smaller than 19), and the current node is greater than the right child (the current node is actually 30, remember, because we swapped it down here, and 30 is greater than 16), then all three conditions are met; they're logically ANDed together. So in this case we end up swapping with the right child; that's the code down here. If those conditions were not satisfied for any reason, then we check whether we want to swap with the left child instead, which we'd only do if 30, the current value, is greater than the left child, which is 19.

But let's just continue with this example: suppose we put 16 over here and 30 down here. Before we continue to the next iteration of the loop, we want to update our i pointer: initially it was over here, and now we want it wherever we moved the node to, which was here, so we set i to 2 * i + 1. Now our i pointer is over here. Does this node at least have a left child? Yes, it does, so we might continue to shift it down. Let's check: first of all, does it have a right child? Yes. Is the right child less than the left child? Nope, so we don't execute the first if statement; we take a look at the else if: if 30, the current value, is greater than the left child, 19, then we swap with the left child. That's exactly what we do in this case: we move 19 up here, move 30 down to this node, and set i equal to 2 * i, because the pointer goes from here to the left child. Then we'd execute the loop again, but we'd see that in this case we don't have a left child anymore, so the loop stops, and at that point we return the result, the original root node we were trying to get rid of.

Now, the last case, which I didn't go over: suppose we had a 16 here and were moving the 16 down. We'd compare it to its right child: it's not greater than its right child. We'd compare it to its left child: 16 is not greater than its left child either. That means the last else case executes, which breaks us out of the while loop; the while loop terminates, and then we return the result.
Why do we break out like that? Basically, we know the node we're percolating down is already in its proper position; we don't want to swap it anymore or shift it down, it's already where it needs to be, so we break out of the loop. This is definitely pretty complicated, and there are many ways of writing it; I tried to write it with distinct cases for each thing we're doing: swapping right, swapping left, or not swapping at all. Most likely you won't actually have to implement this in a real interview (I think that's pretty rare), but knowing the general idea of how it works can definitely be helpful. Now, the time complexity is again going to be the height of the tree; we know it's a balanced binary tree, so the height is basically log n. That's the big O time complexity, and it essentially comes from the percolating down.

So we've already talked about one of the advantages of heaps over binary search trees: we can get the min or the max, depending on what kind of heap we've implemented, in constant time, whereas with a binary search tree you'd need log n time, because you have to go all the way left in the tree. Another advantage of heaps is how we can actually build them. We know that with a binary search tree, you have to insert a node every single time to build the tree, and the time complexity of that is n log n, because we do n insertions and each one takes log n time. For heaps, you can do it the exact same way, pushing elements one at a time, and that also takes n log n time. But there's actually a special algorithm for heaps, sometimes called heapify or just build heap. The idea is that we can be given some set of values, some list of values, suppose in the format of an array like this, in no particular order, and we can take them and turn
them into a heap that satisfies the properties of a heap, and we can do that in O(n) time. Let me show you how. First of all, if we're given an array in this format, it doesn't even satisfy the structure property, because we know we don't really want a real value at the zeroth index. The simple thing to do, since these are in no particular order anyway, is to take this one and move it to the last position. At this point, with this array, what would our heap look like? Exactly like this: 50 in the root position, then the 80, then the 40, etc. So we've satisfied the structure property pretty easily, but now we have to satisfy the order property. Remember, the order property is recursive: it essentially means that for every single subtree, every node should have a value smaller than all of its descendants. So that's what we're going to do: for every single node, check whether its value is smaller than all of its descendants. (That's because we're building a min heap; for a max heap you'd do the exact opposite.)

We can start at the bottom nodes and move our way upwards. We could start at the last node over here, but when we get there, we know it doesn't have any children, so there's no point doing any comparison. Same for this node, this node, this node, and this node: they don't have any children, so there's nothing to compare them with. That's actually true for half of the nodes, so what I'm saying is we can just skip all of these. I mean, we could start at the end and run the algorithm anyway, but there's no need; to be a little smarter, we skip half of them. How would we do that in code? We get the number of nodes in the heap, which is not necessarily the length of the array, because we have a dummy: we take the length, which in this case is 10, minus one, giving nine actual nodes, and divide that by two. Nine divided by two, rounding down of course, lands us at index four, so 30 is the first node that actually has children; taking a look at our picture, that is definitely true. If you don't remember a fact like this, it's okay to start at the end and go backwards anyway; it shouldn't change the overall time complexity, but this is how it's taught in most courses.

We already know all of those bottom subtrees are valid; they don't have any children anyway. Now, moving to 30, let's check: is 30 valid, i.e., smaller than all of its children? It has two children, and it's smaller than 90 and smaller than 60, so we don't have to do anything. Notice that what we're doing here is the same percolate-down algorithm we talked about earlier when removing nodes; it's the exact same thing. Essentially, we're going through every single node that has children and percolating it down if it can be moved down. We did it for 30, and no swaps were needed. Now the next one, 40: does it have two children? Yes. Is the right child smaller than the left child? Yes. Is the right child smaller than the parent? Yes, so we swap it with the right child. We already knew this subtree was a valid heap and this subtree was a valid heap; now we know this entire subtree is also a valid heap. Moving to the left, we get to 80, this node over here: does 80 have two children? Yes. Is the right child smaller? Yes, so if we're going to swap 80 with something, it's probably this one. Is 10 smaller than 80? Yes, so we swap 80 with this node. Originally we knew this subtree was valid and this subtree was valid; now that we've percolated down, we know this entire thing is valid. When we're percolating down we'd keep going, but at this point this node doesn't have any children, so there's nothing more to do.

Now we get to the last node in our array — well, the first node, actually: 50, the root. Time to percolate. Compare the two children: which of them is smaller? 10 is smaller, so if we're going to swap 50 with anything, it's probably this one. Compare 50 to 10: is 10 smaller than 50? Yes, so swap these two. It's definitely starting to get a bit messy, but let's try to continue; we're almost done. We're at the 50 now, continuing to percolate it down: compare the two children (we have two children); which is smaller? 30 is smaller, so compare 50 with the smaller child, 30. Is 50 bigger than 30? Yes, so swap them. At this point I'm not even going to keep maintaining the array drawing, because I think it's getting too complicated, but you get the idea: any time we perform swaps here, under the hood there's an actual array maintaining all of it. So the 30 is now going to be over here and the 50 over here. Continuing to percolate: we still have two children; compare the two children; this one is smaller; now compare that smaller child with 50. Is 50 bigger than 60? Nope, so 50 can stay exactly where it is; we've found the correct spot for the 50. So at this point, this is what our heap would look like. Tell me, does it look valid to you or not? It's a little bit hard to read because it's gotten messy, but: the smallest value is at the root, 20 is smaller than all of its descendants, 30 is smaller than all of its descendants, 50 is smaller than all of its descendants, 80 doesn't have any descendants, and the rest of these nodes don't have any descendants either, so we
definitely were able to do this. Now let's take a quick look at the code. The heapify algorithm is pretty similar to the remove operation we talked about earlier, only in this case heapify (or build heap, you could call it) is passed some input array. When we have that input array, we basically take the zeroth element and move it to the last position; that's what append is doing, pushing it to the end, and then we just assign that array to be our heap. We know it satisfies the structure property; now it's time to satisfy the order property. We have a current index: we take the length of the heap, subtract one, and divide by two, just like I talked about earlier (this is integer division in Python, which rounds down, but most languages round down by default), and current is at this point, just like we showed in the walkthrough. Then we make another copy of current, which we call i, and at that point it's the exact same percolate-down algorithm from earlier, when we were removing, or popping, from the heap, so I won't go super into detail; it's the exact same idea. After we've percolated this one down (in this case we wouldn't actually do any swaps), we're done with the while loop, and then we take our current index and subtract one, because remember, we're going in reverse order: we go to this node, then this node, then this node, and for each of them we percolate down if we can. Now, if you were actually implementing your own heap class, you'd probably put this in a helper function, since it's used not only here but also in heap pop; the reason I wrote it out explicitly is just to show you exactly what's going on with the heapify algorithm.
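That heapify routine can be sketched as follows, under the same assumptions (dummy slot at index 0, percolate-down written inline rather than as a helper, and illustrative input values of my own choosing):

```python
def heapify(arr):
    # Move the 0th element to the end so index 0 becomes the dummy slot.
    heap = arr[:]            # copy so we don't mutate the caller's list
    heap.append(heap[0])
    heap[0] = 0              # optional: zero out the ignored dummy slot
    # Percolate down every node that has children, in reverse order.
    cur = (len(heap) - 1) // 2
    while cur > 0:
        i = cur
        while 2 * i < len(heap):         # while a left child exists
            right = 2 * i + 1
            if (right < len(heap)
                    and heap[right] < heap[2 * i]
                    and heap[i] > heap[right]):
                heap[i], heap[right] = heap[right], heap[i]
                i = right
            elif heap[i] > heap[2 * i]:
                heap[i], heap[2 * i] = heap[2 * i], heap[i]
                i = 2 * i
            else:
                break
        cur -= 1
    return heap

h = heapify([50, 80, 40, 30, 90, 10, 60, 20, 70])
assert h[1] == 10   # the minimum ends up at the root
# Every node is <= both of its children: the order property holds.
assert all(h[i] <= h[c] for i in range(1, len(h))
           for c in (2 * i, 2 * i + 1) if c < len(h))
```

With 9 input values the heap array has length 10, so `(len(heap) - 1) // 2` gives 4, the first node with children, matching the walkthrough.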
Now, why is this heapify algorithm a linear-time algorithm? At first glance it seems to be an n log n algorithm, because we're going through pretty much every node in the tree and percolating it down, and percolating down could be, let's say, a log n operation. So why is the big O time complexity O(n)? First of all, I'll just say you're probably never going to have to prove this, or even argue that it's linear; just knowing it is enough. But I'll quickly go over the general intuition. If we had to take every node and percolate it up, or take every node and percolate it down, which would we rather do? Which seems more efficient? Well, if we're percolating down, then only the root node has to travel the whole height of the tree, these two have to travel the height minus one, a bunch of nodes here travel an even smaller distance, and we get more and more nodes as we go down (remember, this is a complete binary tree). But if we're percolating up, it's worse: now a huge number of nodes have to travel the whole height of the tree, and a smaller number travel the height minus one. So percolating down is going to be more efficient: at the root, only a single node would have to percolate up the full height, whereas with percolating down, a whole bunch of nodes don't have to be percolated at all. Shifting nodes down is generally more efficient if we have to do it for all the nodes. Now, let's say n is the total number of nodes in the tree, and let's say in this case we actually had a full binary tree, just to make the math a little simpler. We know that roughly half of the nodes are going to be in the last level; that's something we talked about earlier in the course: the levels have one, two, four, eight nodes, and each term is roughly equal to all the previous terms combined.
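Written out, the intuition above becomes a summation. This is a sketch assuming a full tree of about n nodes and height roughly log n, where level k is counted up from the bottom, so the n/2^(k+1) nodes at level k each percolate down at most k steps:

```latex
\sum_{k=0}^{\log n} k \cdot \frac{n}{2^{k+1}}
  \;=\; \frac{n}{2} \sum_{k=0}^{\log n} \frac{k}{2^{k}}
  \;\le\; \frac{n}{2} \cdot 2
  \;=\; n \;\in\; O(n)
```

using the standard fact that \(\sum_{k \ge 0} k/2^k = 2\).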
Let's say we have roughly n/2 nodes in the last level. For each of those nodes, how much do we have to percolate them down? Well, remember, the last level doesn't need to be percolated at all, so we get 0 × n/2. What about the next level up? It has half as many nodes, so call it n/4, and each of those percolates down at most one step, so that's 1 × n/4. We add these terms together, 0 × n/2 plus 1 × n/4 plus 2 × n/8 and so on, and if we expand that, we get a summation that looks like this. The number of terms we're adding is roughly equal to the height of the tree, so if you remember your math you'll get something like this, and at this point you probably don't care, but it roughly evaluates to n. So we know the big O time complexity of this is indeed O(n). You won't have to memorize this, don't worry, and you most likely don't care about it either; I just wanted to mention it for the sake of completeness. But the important thing here is this: just because we can be given some random n elements and turn them into a heap in O(n) time doesn't mean we can actually sort them in O(n) time. If we wanted to turn that heap into a sorted array, we first built the heap in O(n) time, but then we'd have to pop every single element, and we know popping is still a log n operation; doing it for n elements takes n log n. So what I'm saying is we can actually use heaps to sort values, and we can do that in n log n time, just like binary trees, just like merge sort and the rest. This is yet another example of sorting. So, we've talked about some of the advantages of heaps over binary search trees: we can get the min or max in constant time, and we can heapify, building the heap, in O(n) time. But the downside of
heaps is that we can't just search for a random element in log n time like we can with BSTs. With a heap, suppose we're looking for the value 30. We get to the root node, we're at 14, and it's not 30. Well, we know all values in the right subtree are greater than 14, and we know all values in the left subtree are greater than 14, so the 30 could be in either subtree; it could be here or it could be there. We don't know where it is; we haven't narrowed it down. So to search for an element, we'd have to go through every single node in the tree, which would be an O(n) operation, whereas with binary search trees it's log n. So binary heaps are not good for searching, but that's okay, because they're not intended for that; they're intended to get the priority element, which is either the min or the max. And the last thing I want to mention about heaps is that when it comes to coding interviews, you actually have to use a heap more often than you have to use a binary search tree, at least as a built-in data structure, where the data structure is part of the solution. BST problems are also common, but usually you're implementing part of the BST: you're searching for something, you're calculating something. When it comes to problems where you actually have to use the data structure as a utility, it's pretty rare that you end up needing a binary search tree; it's more often that you need a heap, because it's more often that you need a minimum or maximum value, and you need to keep getting it. There are a ton of algorithms where heaps are really important, so it's a very good data structure to understand. You should at the very least know that heapify runs in linear time, that you can push and pop from a heap in log n time, and that you can get the min or the max, depending on what type of heap you have, in constant time. Those are all important to know, even if you don't necessarily remember how to implement all of them.
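As a side note, the heap-sort idea just mentioned — build a heap in O(n), then pop n times for O(n log n) total — can be sketched with Python's built-in `heapq` module (`heap_sort` is a name I'm choosing here):

```python
import heapq

def heap_sort(values):
    # Build the heap in O(n) time (heapq.heapify runs in linear time).
    heap = values[:]   # copy so the input list isn't mutated
    heapq.heapify(heap)
    # Pop the minimum n times; each pop is O(log n), so O(n log n) overall.
    return [heapq.heappop(heap) for _ in range(len(heap))]
```

`heapq` implements a min-heap, so this sorts ascending; for a max-heap you'd typically negate the values on the way in and out.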
So next, let's move on to probably the most common data structure you're ever going to use, whether we're talking about coding interviews, real coding at your job, or projects, or anything like that: hash maps and hash sets. They're really, really useful. To start, I'm going to focus on the application of these rather than the implementation, because it's much more important to know how to use them than it is to actually implement them. We already talked about sets and maps earlier, specifically the ones implemented with binary search trees. I talked about how a set is just a set of values, like 1, 2, 3, whereas a map is a little more complicated: it's a set of keys, where each key is mapped to a different value. We gave the example of a phone book, where some name is mapped to, you know, the phone number 123, something like that. We're primarily going to focus on maps, because they're generally more common and a little more complex, but the main ideas behind a map are nearly exactly the same as a set: instead of just having a set of keys, each key is mapped to some separate value. The key is what's used to sort the binary search tree, and it's also what's used to access values in a hash set or hash map. First, though, let's compare tree maps and hash maps. There are typically three main operations you want to do when you have a map. You want to insert: with a tree map, the big O time complexity is log n to insert into the tree, which is definitely better than maintaining a sorted array, because inserting into a sorted array in the worst case would be O(n). Removing from the tree is the exact same; it's also log n. Likewise with searching: we essentially run binary search on the tree map, and we can search for any value in log n time. That's pretty good. And there are other bonuses of tree maps: because they're ordered, we can
iterate through one in sorted order in O(n) time; that's like our in-order traversal. Just by having a tree map, we are kind of maintaining that sorted order. With a hash map, though, things get interesting: inserting is actually a constant-time operation, removing is also constant time, and searching is also constant time. So you can already see why hash maps are probably the most useful data structure you're ever going to use; they really blow tree maps out of the water. But there's a bit of a caveat here: this is actually not the worst-case time complexity, even though I'm writing it as big O, which usually means worst case. For hash maps, constant time is actually the average-case complexity for these three operations. The vast majority of the time, though, people still assume it's the worst case, so we write it as O(1). If you're ever doing a coding interview, it's safe to assume these operations run in constant time; that's what people assume at jobs and in general, and I think if you asked, most people don't actually know that these bounds are average case. In the worst case, inserting, removing, and searching could actually be O(n) time, depending on how you implement the hash map, but that's not so important, and we'll actually touch on it a little later. Now, the downside of hash maps — and there's always a downside when you have something this good — is that hash maps don't maintain any kind of ordering at all. So if you wanted to iterate through all the keys of a hash map in order, you could not do it in O(n) time; you'd basically have to take all of the keys and then sort them, and depending on what sorting algorithm you use — merge sort, or you could even use a binary tree sort — that would typically be n log n. So that's sort of the downside, but you
can see that the positives definitely outweigh the negatives, especially if you don't need to traverse in order and you just need to insert, remove, and search, which is what we would probably do with a phone book. Hash maps are very important. So let's take a look at a common use case of hash maps. Let's say we were given a list of names, maybe in the form of an array, and for each name we want to count how many times it shows up in the array. How would we use a hash map to do that? Well, the obvious way would be to map every single name, using the name as the key, and the value in this case would be an integer: the count of that name. So initially our hash map is empty. We start at the first name; it's Alice, so we go ahead and add Alice to our hash map. But before we even do that, what we could do is run search on our hash map: we check, is Alice already in the hash map? No, it's not, and remember, search runs in constant time. So since it's not in the hash map, when we do insert it, we should put the value one, because this is the first time we're seeing this name. Next we go to the next position. By the way, for us to have gone through that one position, we did it in constant time; if we were doing it with a tree map, in the average case it would run in log n time. I mean, we know inserting a node into an empty tree is probably going to be O(1), but in general it's going to be log n. So now we get to the second name, Brad. We search for it in our hash map; that search is O(1). It doesn't exist, so we do the same thing with Brad: we add it as a key, and the value is initially one. We get to the third name, Colin, and do the exact same thing: Colin is also now added to the hash map, and the count of it is one. Now we get to our first duplicate, Brad. We've already seen Brad, so when
we search our hash map, we find that it already exists, at this position. So instead of assigning one here — we don't know what the count is; right now it's one, but it could have been two, it could have been three, it could be some other value; we just know it's not going to be zero — instead of assigning it to one, we should add one to it, which puts it at two. So we overwrite that value with a two. By the way, handling that position was also a constant-time operation. And notice we don't have duplicates in this hash map. I mean, we could have added another entry for Brad and put a one there, but that isn't what we're trying to do; for Brad, we're trying to count how many times he showed up, and it doesn't make any sense to add a second entry. This brings us to a property of hash maps: they cannot contain duplicate keys. Duplicates are not allowed in hash maps, because it doesn't really make sense; that's not the use case for a hash map. Why would we want multiple copies of the same thing? And it's the exact same with binary search trees: remember, we don't allow duplicates in binary search trees, so a tree map is also not allowed to have duplicates. So now let's go to the next position: Dylan. He doesn't exist in our hash map, which we figure out by searching, an O(1) operation, so we add Dylan to the hash map with the count initially assigned to one. We get to the last person, Kim, and they also are not in the hash map, so we add Kim and set the count to one. Now, for each position in the input array, we basically did a few O(1) operations, so that means going through this entire array and building this data structure was an O(n) time algorithm. The space complexity is also going to be O(n), because in the worst case every single name could have been unique, which would make the size of our hash map O(n).
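The counting walkthrough above, written out in Python (a sketch; `count_names` is a name I'm choosing here):

```python
def count_names(names):
    count_map = {}                 # hash map: name -> count
    for name in names:
        if name not in count_map:  # O(1) average-case search
            count_map[name] = 1    # first time seeing this name
        else:
            count_map[name] += 1   # duplicate: bump the existing count
    return count_map
```

One O(1) search plus one O(1) insert or update per name gives O(n) time overall, with O(n) space in the worst case where every name is unique.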
But this is definitely better than using a tree map. A tree map would have used roughly the same amount of space, O(n), but the time complexity for each of those insertions and searches would have been log n, and we know we have n values in the input, so the overall time complexity with a tree map would be n log n. Hash maps are much more efficient. By the way, the code for this type of algorithm using a hash map would look something like this in Python. Suppose we're given an input array, names, with the same exact names we have listed up here. We create our hash map; this is how you do it in Python, which is a little different from most languages, where you'd declare a hash map data structure with a constructor or something like that. We're calling it count map. Then we go through every single name in our list of names, and remember, we have two cases. If count map does not contain this name — in Python the syntax for that is `if name not in countMap`, but most languages will have a hash map data structure with a method called something like `contains`, where you'd pass in this key as the parameter — then we assign that name an initial value of one. The else case is that the name already exists in the count map, so we increment that count by one, like we did for Brad. So it's a pretty straightforward algorithm, and it runs in linear time. Next, let's talk about how we can actually go about implementing a data structure like this. Now, most likely you'll not be asked to actually implement a hash map in an interview; it's pretty rare that you have to do that, and the implementation of hash maps can get really, really complicated. We'll be talking about the
general concepts, and then we'll be coding up a relatively simple hash map. I think the concepts are what you should really focus on, because the concept of hashing and the other ideas we're going to talk about definitely come up in software in general; this is definitely not the last time you're going to hear about hashing, key-value pairs, and things like that. First of all, under the hood, a hash map is actually implemented with an array. So we'll have some index, and that index will be mapped to a key-value pair. And just because we have an empty hash map right now does not mean the size of our array is zero; even an empty hash map has a non-zero array size. In this case, let's say we have an empty hash map, but our array is of size two: we have index zero and index one. The actual key-value pairs at those slots, suppose they're just null, some kind of default value; we're basically indicating that nothing is stored there just yet. Now, with an empty hash map, the first thing we'd probably want to do is start inserting some values, but how does an insertion actually work? Let's say we're calling it put; it could be called insert or something like that. In this case the key is going to be a string, and the value itself is also going to be a string. That doesn't necessarily have to be the case: the key could be an integer, or the value could be an integer, or they could be some other type; it could actually even be an object. But what's important here is that we have a way to convert the key — we don't care so much about the value — into an integer, and how we do that is called hashing. We're going to have some kind of hashing function which takes the key and converts it into an integer, and when we have that mapping, you can see where I'm going: we're going to use that integer as the index, and that's how we're going to decide
where to put that key-value pair into our array. So this array is actually an array of objects, where each object is a pair, a key-value pair. Well, the simplest thing would be to map each character to some integer. For example, the first character, A, could be mapped to zero; I think L is the twelfth letter, so it could be mapped to the integer 11. We could do that for every single character, then add them up, and we'd get some big integer. Let's just assume the integer we get is 25; I'm just making that up, but the idea is that we converted each character to an integer, added them all up, and got another integer, 25. Now, the only problem here is that our array is only of size two, so 25 is not really a valid index for our array. How can we fix that? Well, this is where a little bit of math comes in. We can take any integer, no matter how big, as long as it's positive, and mod it by the size of our array, so in this case mod it by two. If you're not familiar with modding, we're basically dividing 25 by two and getting the remainder that's left over. Two keeps fitting into 25 until we get to 24 — if we added another two we'd get to 26, which is too big — so with 24 accounted for, we have a remainder of one left over: 25 mod 2 is 1. If instead of 25 we actually had 24, and we modded it by two, we'd have a remainder of zero, because two fits into 24 perfectly, twelve times, with nothing left over. And those are really the only two possibilities. But what's important to notice here is that when you mod by the length of the array, no matter what integer you're modding, you're always going to get a valid index as the output. The result we get will always be a valid index if we're modding by the size of the array, which in this case is two; it could have been three, it could have been
four, but the mod is always going to result in a valid index. And the reason behind that is pretty simple: no matter what number we're modding by — it could have been five — the remainder is always going to be less than five; the remainder always has to be less than the number we're modding by, because if it weren't, it wouldn't really make any sense. We can't say 25 mod 5 equals 5, because we could still divide that by five and get a remainder of zero. So it's just a little bit of math, but that's the main idea: we mod by the size of the array to convert any integer, like 25, into a valid index. Modding 25 by two in this case gives one, so we'd say Alice is going to be inserted at index one. So in this case Alice is at index one: the key is Alice, and the value is the city she lives in, which let's say is New York City. Now, we're going to insert a couple more values, but suppose we first wanted to get Alice. We could do that in O(1) time: we call get on Alice, and using our exact same hashing function, we take Alice, convert it to an integer — let's say again it's 25 — mod that by two, and we get one. So in constant time we were able to get the index Alice is stored at; then over here we can go to that index and say, okay, what city does Alice live in again? Well, it's New York City. So that's the idea, and that's why it's so efficient. Now, we're going to run into a few problems, though. Let's say we now want to add Brad to the hash map, so we convert Brad to some integer, taking the sum of the characters. And by the way, when we actually convert a character to an integer, most likely we're going to use the ASCII representation of the character. There are more complicated encodings, but if you're using a set of 128 characters, a relatively small alphabet, we can do this; basically, it's a way of converting each character into an integer.
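As a small illustration of the character-sum hashing just described — using Python's `ord`, which returns a character's ASCII code (so the real sums differ from the made-up 25 in the walkthrough; `hash_key` is a name I'm choosing):

```python
def hash_key(key, capacity):
    # Sum the ASCII codes of the characters, then mod by the array size,
    # so the result is always a valid index: 0 <= index < capacity.
    total = 0
    for ch in key:
        total += ord(ch)
    return total % capacity
```

Whatever the character sum turns out to be, the final mod guarantees the index fits the array.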
And in that case, capital letters are actually different from lowercase letters; for example, a capital B evaluates to 66 in ASCII, while a lowercase b is 98. But that's not so important; let's just assume we have a way to convert each character to some integer. Let's say we took Brad and the integer it turned out to be was 27. Now we mod this by two and we get one again. So what are we going to do? We'd say, okay, at index one, let's insert Brad. But Alice is already at index one, so we can't really do that without losing the information we've already stored. This is a problem, and it's the problem with pretty much all hashing, something you can't entirely avoid: it's called a collision. Clearly it's bad to have collisions, but there are ways we can work around them. We can minimize collisions and try to work around them, but we can't entirely get rid of them; that's really important to know. Hashing collisions pretty much always happen. Now, first let's understand how we can try to minimize collisions. There's sort of a problem here: no matter what we convert Brad into, no matter what integer he hashes to, modding it by two is either going to give a zero or a one. If our hash map is half full, with one of the two spots already taken, we basically have a 50/50 chance of a collision versus getting the position that's empty. Those aren't great odds. So typically, what happens with hash maps is we keep track of the size of the array and of how many keys are actually inserted into the array. In this case the size of the array is two and we only have one key actually inserted, so we say the hash map is half full, and typically when it becomes half full is when we resize the array. Resizing the array works pretty much exactly the same as when we did it with dynamic arrays, meaning we take the size and roughly double it. And we would have done that not at the
time we're inserting Brad; we would have done it as soon as we inserted Alice. We inserted Alice, and then we realized our hash map was half full, so at that point we would have resized the array, roughly doubling it, so the array now has index two and index three: four slots. But, ignoring Brad for a few more seconds, what if we now try to get Alice? Well, we convert Alice to the integer 25, and we mod it — now, instead of modding by two, we're actually going to mod by the size, which is four — and it turns out the index is one again. So we see that, okay, Alice is still in the correct spot. In this case it works out, but it's not always going to work out. For example, what if instead of 25, Alice hashed to 27? In that case, when we modded by two, she would have gone to index one, exactly where she's still at. But then when we resize and try to get Alice again, we search for her, we mod by four now, because the size has increased, and 27 mod 4 is 3, so we'd say, okay, Alice is probably at index three. We'd check and see that no, Alice is not there. What I'm saying here is that when we resize the array, we can't just leave the values where they previously were: the size changed, therefore the index computed by our hashing function might also have changed. What we have to do when we resize is take every single key that's already in the hash map, recompute its hash using the new size, and move it to where it should now be. In Alice's case, 25 mod 4 is still 1, so she can stay where she is, but if it weren't, we'd have to move Alice to the new position. By the way, we call this process of increasing the array size and then moving all the elements to their new positions rehashing the array. And since we do it when the hash map is half full, and we're doubling the size every time, it's still relatively efficient, because it's going to be infrequent that we have to do this rehashing. Obviously it's pretty
expensive, because we're going to have to go through every key in our hash map, but as long as we double it the same way we did with a dynamic array, it's going to be relatively efficient in the average case. So now, finally, let's get back to Brad. Instead of modding by two, now that our array is size four, we're going to mod by four: 27 mod 4 is 3, so in this case we don't have a collision. And it makes sense, because our chances of a collision have definitely decreased; we have a bunch more empty positions. So Brad is going to go over here, and the city Brad lives in is Chicago. And since once again our hash map is half full, we're going to resize it again, doubling the size, so now it would be of size eight. I'm not going to draw out the whole thing, because it's not super important, but let's just assume the size has now become eight; this will help us minimize the chance of a collision. But we're going to insert one more person now: Colin. Let's assume that after hashing the name Colin, we get an integer value of 33. We mod that by the size of our array, which is now eight, and we get a remainder of one. So even though we tried to minimize the chance of a collision, which is good, sometimes we still run into the case where we have one: we are going to insert Colin at index one, even though Alice is already there. And by the way, when we resized this from four to eight, we would have actually had to recompute the position for each existing key and potentially move it. For Alice — since I think Alice was 25, that was the hash of Alice — we'd mod that by eight, which in this case would be one once again, so Alice actually stays exactly where she is. The hash of Brad was 27; 27 mod 8 is 3, exactly where he still is. That's mainly just a coincidence, to be honest; I didn't even plan that, but it's
convenient for us, because we don't have to redraw this. Now, how would we actually handle this? I mean, if we really wanted to, we could resize the array, and maybe that would decrease the chance of this being a collision, but it still might end up being one. So we need a different way of handling collisions, and there are actually multiple ways of doing that. One way is, at every index, instead of just storing a single pair, we actually store a linked list of pairs, so we can have multiple pairs occupying the same index. Now, the downside of that approach is that we have to kind of maintain that memory, and the pointers, and all that, and when we actually run the get operation and arrive at, let's say, index one looking for Colin, we'd have to traverse through the linked list: we'd first see Alice, that's not who we're looking for, we'd go to the next position in our linked list, and then we'd see, oh, Colin is over here, and we'd return whatever value we're storing for Colin, which in this case is Seattle. But that's kind of the downside, and that linked-list approach is commonly referred to as chaining. There's another approach called open addressing, and it's called that because we only loosely define the position, the index, a key should go at. Even though we computed index one for Colin, we don't have to put him there; we can say, well, this is already occupied, let's try the next position. So we take that index one and add one to it: is index two empty? In this case it is, so we can go ahead and put Colin over here, with the city he lives in, Seattle. Now, in this case, when we run get on Colin — we're looking for Colin, what city does Colin live in? — we'd compute the hash, 33 mod 8, and again get one. We'd see that, well, this is not where Colin is, so if Colin exists, maybe he's at the next index; maybe we open-addressed him. So let's try
the next index. We look at the key: well, this is the same key we were looking for, Colin, so we found him, and the city he lives in is Seattle. Now, possibly we weren't looking for Colin; we were looking for somebody else, named Dylan, who actually does not exist in the hash map, and maybe we somehow computed the exact same index for Dylan. We'd look at index one and see this is not Dylan, so we'd open-address and try the next available address: is Dylan over here? No, he's not. We try the next available address: is Dylan over here? Nope. Is Dylan over here? Nope. And at this point we know we can stop; we don't have to search through the entire array. First of all, that wouldn't be efficient — that would be O(n), and we definitely don't want to do that — but we know that if Dylan did exist, the hash of Dylan would be one. That slot was occupied, so we would have put Dylan at the next position, but Dylan was not over there; then we would have tried the next position, where Dylan would have been inserted, but he's not there; and then we tried the next position over here, but this is empty. So if Dylan existed, he would have been here, but he's not; therefore Dylan does not exist in our hash map. This will make a little more sense if I explain how we would insert Dylan. So let's say that instead of getting Dylan, we're inserting him, and let's say hashing Dylan gives the index one. We try to insert Dylan over here: well, this is occupied, so we try the next available index: occupied. Next available: occupied. Next available: well, we're going to insert Dylan over here, with whatever city he happens to live in. So this is a sort of naive way of doing open addressing, because we're just trying the next available position every single time. The downside of this approach is that we're potentially going to cluster the hash map: if we have a really big hash map and we get a lot of collisions, all the keys are going to be stored close to each other. There are better ways of doing this open
addressing: instead of just computing the plus one, maybe we take that offset and square it, or do other things with it. But the important thing to focus on here is that we have collisions, and we have ways of handling those collisions; there are more optimal ways of handling them, and you could spend a lot of time learning about that — it's pretty math-heavy, to be honest — so we won't go too in depth into it, and you likely won't need that kind of knowledge for coding interviews. But it's important to know that there are ways of handling things like this. Now, the last thing I want to mention before we actually take a look at a simple code implementation of a hash map is that when we maintain the size of the array, it's more optimal for the size to actually be a prime number; for math reasons, that ends up resulting in fewer collisions. So typically the size of the array is going to be a prime number: instead of this being eight, it would actually be better for the size to be seven, and then, instead of doubling that to 14 (because 14 is not a prime number), we would try to find the next prime after doubling, let's say 17, and after that we'd roughly double again to another prime, like 37. But that's the main idea. Now, this is kind of complicated to implement in code, so I won't be doing that, and it's very rare you'd actually be asked to implement a hash map like this. I mean, if you're asked to implement a hash map and you have to do it so perfectly that the size is always prime and you're doing all these kinds of crazy optimizations, that would be a very, very long interview question, and it's very unlikely you'll be asked that. Even if you are asked to implement a hash map, it's probably okay to do it a very simple way; that's why I'm going to be showing you the simple way to do it right now.
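Here's a sketch of the simple open-addressing hash map that the rest of this section describes, assuming string keys and values; the class and method names are my own choices, and a real implementation would also need a remove operation and smarter (e.g. prime-sized) capacity growth:

```python
class Pair:
    def __init__(self, key, val):
        self.key = key
        self.val = val

class HashMap:
    def __init__(self):
        self.size = 0        # number of keys actually stored
        self.capacity = 2    # length of the underlying array
        self.map = [None, None]

    def hash(self, key):
        # Sum the character codes, then mod by capacity for a valid index.
        index = 0
        for c in key:
            index += ord(c)
        return index % self.capacity

    def get(self, key):
        index = self.hash(key)
        while self.map[index] is not None:
            if self.map[index].key == key:
                return self.map[index].val
            index = (index + 1) % self.capacity  # open addressing: try next slot
        return None  # an empty slot means the key doesn't exist

    def put(self, key, val):
        index = self.hash(key)
        while True:
            if self.map[index] is None:
                self.map[index] = Pair(key, val)
                self.size += 1
                if self.size >= self.capacity // 2:
                    self.rehash()  # keep the array at most half full
                return
            elif self.map[index].key == key:
                self.map[index].val = val  # key exists: overwrite the value
                return
            index = (index + 1) % self.capacity

    def rehash(self):
        # Double the capacity and re-insert every existing pair, since
        # each index depends on the capacity.
        self.capacity *= 2
        old_map = self.map
        self.map = [None] * self.capacity
        self.size = 0
        for pair in old_map:
            if pair is not None:
                self.put(pair.key, pair.val)
```

Keeping the array at most half full is what guarantees get always finds an empty slot to stop at, so the probing loops terminate.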
So first, let's look at how we'd initialize our hash map. Like I said, we're actually going to be storing pairs in this array, and this is one way to implement that: we have a class, and that class stores two values, the key and the value of that pair. In our case we were doing strings, but these could be any kind of type, as long as you have a way of hashing the key; this implementation will mostly focus on strings as keys and strings as values. So, initializing our hash map — this is Python, but of course you could implement this in any language, and the logic would be pretty similar — we initialize the size of the hash map. The size is the actual number of keys in our hash map, initially set to zero. We also initialize the capacity; I'm arbitrarily setting this to two, but it could be any number (optimally it should be a prime number), and in this case it's two, just like we started with in the example. And then our map itself, remember, is maintained as an array, and in this case there are only two slots, and to indicate that it's empty, I'm initializing the two values as null. If they were non-empty, we would have some pair stored in those slots. The next really important thing for a hash map is to have some hashing function: given some key, we should be able to convert it into an integer that fits within the boundaries of our array — in this case it's size two, so it should fit in our array of size two. Assuming the key is a string, we go through every single character in the key and convert that character to some integer; in Python, this converts it into the ASCII value of the character, but that's not super important, as long as we have a way of converting the character to some integer. We take that integer and just sum them all up: initially our index is going to be zero, and we just keep adding each character's integer to it. We could end up with some really big integer; it could be out of bounds
of our capacity so what we're going to do before we return is mod it by the capacity so if our index ended up being something like 10 we would mod it by our capacity which is two we would get zero if our index is actually not out of bounds for example maybe it's one we would mod that by two and we would still end up getting one which is still in bounds anyway so if it's in bounds that's okay if it goes out of bounds then we'll mod it to bring it within bounds so by the time we return the integer for this hashing it will be some index that is in bounds of our array now let's move on so first let's look at searching for a key aka the get operation so given some key we want to return the value of that key if it exists if it doesn't exist we can return null or we could throw an exception or we could do something but the details aren't super important we care mostly about the logic so first we're going to convert that key into an index using the hashing function we talked about a second ago and then we're going to loop this implementation by the way is assuming we're doing open addressing where if we find a collision we're just going to try the next index so given that index first we're going to check is it non-null suppose we're searching for alice and we get an index of one we would look here it's non-null so then we would check is the key of what we're storing here of that pair equal to the key that we're searching for in this case it's alice what we can do is just return that value we found what we're looking for but maybe we're actually looking for somebody named colin in that case we would have not found what we're looking for this is not colin so instead we would have not returned we would have incremented our index right try the next index and here the reason why we're modding by the capacity is assume that actually this part of our array didn't exist we're looking for colin we took one and incremented it by one well that took us out of bounds to bring us back
in bounds we can mod the index by the capacity just like we were doing earlier that would bring us up to index zero now we're still looking for colin we execute the loop but at this point we see that there's null here this is empty that means if colin existed in the hash map he would have been here but he's not therefore we know he doesn't exist and our loop stops we can return null or we can throw an exception we can somehow indicate that the key colin does not exist in the hash map now let's insert into the hash map a.k.a put we pass in some key and some value we want to insert that into the hash map the idea is similar we convert the key to some index and then we start looping we're looping while true because at any point that one of these executes we're going to return immediately anyway so this is just kind of a convenient way that i like to write it so there's two possibilities one is that this key does not exist in the hash map let's say we're trying to insert colin the index we computed was one our loop would execute like this first we would check is this index empty it's not empty if it was then we would insert colin over here but it's not then we execute the else if case else if the key of this is already equal to colin in that case we would just overwrite it but this is not colin this is alice so then neither of these execute we just increment the index by one and mod it to make sure it stays in bounds so then we would try this index and we would go back to our loop we would see that this index is actually empty so what we can do is create a new pair for colin and whatever value colin has assign it to this index increment the size by one and this is important remember if the size is greater than or equal to the capacity we are going to call our rehash function which i'm going to show you in just a second and after we do that then we can go ahead and return now assuming we were inserting alice and we wanted to change her city from new york city to
something else then the else case would have executed we would have found alice is already here we would just simply overwrite the value we don't have to increment the size by one because we're not adding anything we're just overwriting the value and then we could return immediately so lastly let's look at the rehash function it doesn't really take any parameters because we already have all the member variables of the hash map we know we're going to double the capacity at least that's the naive way to do it more optimally we could find the next prime number that's roughly double so we're going to declare a temporary variable for our new map it's going to be an empty array so far but we want it to be of the new capacity which is roughly double the previous capacity so we could do a loop for that we could just append null to it to indicate that there aren't any actual pairs in this just yet and then we're going to take our current map our current member map and assign it to a temporary variable old map then we're going to take our new map and then assign it to the member variable so now our new map is the map that we're maintaining but all we have to do now is take the values from the old map and then put them in the new map so as we iterate through every pair in the old map if that pair is non-null we're just going to run the insert function or the put function that we talked about earlier putting the key of this pair with the value of this pair the reason we reset the size to be zero is because we know internally this put function is going to be incrementing the size by one each time anyway so that will give us the correct size so that's why we reset this to zero here so these are the main ideas i wanted to talk about you don't have to focus so much on memorizing every detail just understanding the main concepts behind collisions rehashing increasing the size to minimize collisions having some kind of intelligent hashing function
where we also minimize collisions having the size be prime also helps to minimize collisions these are the important concepts to understand when it comes to hash maps and even more important than that is just understanding how to use hash maps and hash sets which we talked about earlier we didn't talk too much about the details behind why inserting removing and searching runs in o of one time on average the analysis could get complicated and i don't think it's super important to understand it's pretty safe to assume that those operations run in o of one time that's what most people do and to be honest most people don't even understand what we've discussed so far so if you understand even half of this you're far ahead of most people next let's talk about graphs which are a really common topic when it comes to real coding interviews and they're actually pretty challenging there's a lot to cover here but the good news is that we've actually already been talking about graphs since pretty much the beginning of the course linked lists are actually a form of graphs they are a subset of graphs and another subset that we talked about are actually trees whether it's binary trees or other kinds of trees a graph is essentially made up of nodes like we have over here and possibly some pointers connecting them together we've been calling these nodes throughout the course but another way of referring to them is as vertices so it's kind of a synonym a vertex is essentially the same thing as a node typically and when we talked about pointers connecting nodes together another word for that is edges so suppose we had a graph that looks like this i mean this is sort of shaped like a tree right this node could have some kind of child maybe you know it has another pointer that's pointing to null in this case it doesn't what i'm getting at here is that we could have graphs of all kinds of shapes when we talked about binary trees and binary search trees we mentioned that they never have
cycles but what if i take an edge from c and connect it to a here you can see we have a cycle going from a to b to c and then back to a so when it comes to regular graphs to some generic graph that we have there are no restrictions we could have any number of nodes and any kinds of edges connecting them together in any way now technically there are some restrictions and that is that e is going to be less than or equal to v squared now this is math so maybe it's a little bit scary at first but v just refers to the number of vertices that we have a k a nodes in this case you can see we have three nodes so let's write that down three squared is of course equal to nine so we're saying that the number of edges is going to be less than or equal to nine but why is that the case here well what we're saying is from every single node a k a vertex vertex is basically the singular for vertices from every single vertex we could have a pointer going to every other vertex so a pointer a k a an edge going here to one node and then going to the second node and actually we can have an edge going in on itself so a node can be pointing to itself so we really don't have any restrictions here so for every single node we could have v edges where v is the total number of nodes so you can see that's how we're getting v squared now to actually draw it out let's go to b b could have a pointer going to a it could have a pointer going to c it could have a pointer coming onto itself same thing with c it could have an edge going to a it could have an edge going to b and it could have an edge coming into itself so each node has three edges going out of it so we get three times three edges right this is the maximum number of edges we could have i mean technically you could have duplicate edges but usually we don't consider that when it comes to graphs so you're pretty safe to assume that we never really have duplicate edges one thing you might have noticed here is that we have a pointer
from a going to b we also have a pointer from b going to a so that means we can go between a and b you know back and forth and you know you could also say that there's a cycle formed between these nodes and you know there's many other cycles throughout this graph that i'm sure you can find another thing to mention about this graph is notice how every single pointer aka an edge has a direction this is referred to as a directed graph a directed graph means that the pointers the edges have a direction so you know maybe we don't actually have pointers going both ways so an example would be b and c from b we can go to c but from c we can't necessarily go back to b that probably makes sense to you because we've already seen instances of directed graphs trees are directed graphs linked lists are directed graphs but what does an undirected graph look like well when it comes to actually drawing them they can be drawn many ways like you can just have a pointer that looks like this this indicates that this edge is undirected meaning you can go in either direction you can go from b to a or a to b another way of drawing that and i think it's more common is just to draw it like this without any direction so if we had an undirected graph that looks like this that would essentially mean that every single one of these edges you can go in either direction and that would be true for every single edge that we add to the graph in case we had more nodes or something like that so this is a bit of background and this graph that we've been looking at is a very generic form of graph it's most common to represent a graph like this using adjacency lists which we'll talk about in a bit but first we're going to start with matrices or matrix if you're just talking about a singular matrix and we're going to go through all three of these because these are three very common ways of representing graphs and to be honest this second one adjacency matrix is much less common the most common
ways especially when it comes to coding interviews is a matrix or an adjacency list so those are the two that we're really going to be focusing on throughout this section so let's start with a matrix we haven't really talked about this much throughout this course and you might already be familiar with what a matrix is but in case you're not i'll go over it briefly it's essentially a two-dimensional array but it can be used to represent a graph which is what we're going to do in this case but like i said it's a two-dimensional array so in a language like python it would look kind of like this you have one array this is the first row then you have the second array which is the second row and we keep going like that so in this case you can see we have four rows and four columns sometimes people use x and y to you know refer to the coordinates which in my opinion can get confusing because in math x is actually commonly used to refer to a row and y is used to refer to a column which is something you might not be used to especially you know when you think about the x y coordinate system right when it comes to algebra or something like that so this can get confusing that's why i very much prefer to think of it in terms of rows and columns so i use r and c to do that and i would recommend that you do the same thing whenever i've conducted an interview i've seen so many candidates get messed up when they're using x and y to implement their coordinates so moving on how would you access some arbitrary coordinate like for example this one well remember that everything is indexed by zero so this is the zeroth row this is the first row this is the second row this is the third row this is the zeroth column this is the first second third column so if we want to access this point we'd go to row equals one and column equals two so with this coordinate system we would say grid at index row equals one so this will give us the first row remember this represents an array so this
is not a coordinate it's an array before we had grid which is a two-dimensional array we accessed index one this gives us a one-dimensional array so now within this one-dimensional array we can access a value like you normally would with a one-dimensional array we want to access the value at index two aka column two so we would get this and this is equal to zero right we're not assigning it to zero i'm saying here that this is already equal to zero this is the value that would be returned when we do this so that's a bit of the basics now let's talk about how this is commonly used to represent a graph you might have a scenario like this in this case you can see this matrix is populated with zeros and ones let's say that zero represents a free space meaning you know this is a space that we're allowed to be located at any zero is somewhere we can go if we're creating a path throughout this matrix we can move along the zeros we can move there we can go here we can you know go there and there and maybe here and then down here right we can go through all these positions we can't access any of the ones because they're blocked right this is just how i've defined it and this can be a common way when it comes to coding interviews to define things so how exactly does this represent a graph well you might have already noticed typically when it comes to two-dimensional grids you're allowed to move left right up or down so you're allowed to move in four directions you might be able to move diagonally but it's typically less common so again how does this represent a graph well you could say that we have a bunch of nodes let's say that in this case the free spaces are nodes so just to draw a circle you know basically everywhere we have a node it would look something like this and of course the ones are going to be blocked out now here we have nodes but where exactly are the edges well remember i said we can move left right up and down
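To make the row/column indexing concrete, here is a tiny sketch in Python (the grid values are made up for illustration and aren't necessarily the exact grid on screen):

```python
# a small grid of free (0) and blocked (1) cells; the exact values are
# made up for illustration
grid = [
    [0, 0, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 1, 0, 0],
]

rows, cols = len(grid), len(grid[0])  # 4 rows, 4 columns
# index the row first, then the column: grid[r][c]
# grid[1] is the whole second row (a one-dimensional array),
# and grid[1][2] is the value at row r = 1, column c = 2
assert grid[1][2] == 0
```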
this is what the edges would look like in this case let me ask you are the edges directed or are they undirected well since we can move left right up and down typically they're going to be undirected from here we can move to the left from here we can move to the right so these edges are undirected but what about you know over here well we can move up i said but there's nothing above here so you know this is essentially you could say a null pointer but in reality this matrix does not have pointers at all these edges are implicit these nodes are also implicit these edges exist because of the rules that i defined i said we can move left right up and down maybe i could have defined you know us being able to move diagonally two spaces that's pretty uncommon i've never seen that happen before but we could define it that way so this is a really really common way of representing graphs when it comes to coding interviews and for now we're just going over how this is represented later on we'll actually talk about how common algorithms can be implemented using this there can be a lot of edge cases as you might have noticed so next let's talk about what an adjacency matrix is and these like i said are much less common than regular matrices and adjacency lists and we'll talk about why so we're going to use pretty much the same matrix we had before just for simplicity but that does not mean that these are nodes themselves when it comes to an adjacency matrix the dimensions represent the nodes so typically it's a square matrix let's say in this case the size is v times v where v represents the number of vertices so that's of course why it has to be square because you know this is going to represent the vertices and this is also going to represent the vertices we can't add an extra column here without also adding an extra row but if this represents the zeroth vertex and this also represents the zeroth vertex then what does the value here
actually mean well typically it's going to be a zero or a one a zero in this position for example means that there does not exist an edge going from zero to one a one in this position means that there is an edge going from one to zero a little bit more formally if we have let's say our adjacency matrix which is this aka this represented in code and we have some arbitrary index here i earlier mentioned that we're using row and column but that's usually for actual matrices when it comes to an adjacency matrix we know that the index represents a vertex itself so in our example let's say we're talking about the first vertex that's what v one is and with the second one we're talking about the zeroth vertex so this means an edge exists from the first vertex to the second so in this case this is a directed edge if the value is one that means that we have an edge if the value is zero that means an edge does not exist from v one to v two but what if we wanted to go the other way what if instead we want to know if there exists an edge from zero to one well we just swap the order of these two more formally we would look if there is an edge from v two to v one the order that you index it represents the direction that we're checking for an edge if v one is first then we're checking v one to v two if v two is first then we're checking v two to v one so just to draw out how this would look if we drew it as like nodes and edges let's just go through every single one well first let's actually draw each vertex we have a zero we have a one we have a two and we have a three remember the vertices are from you know the dimensions so let's just go through every single one so from one to zero we have an edge that means from one to zero we draw an edge going like that of course we don't have an edge going from zero to one how do i know that well let's look at our grid from zero to one we don't have a one here so there's no edge going here we
have an edge going from one to one what does that mean there's a self loop so there's an edge going in on itself so notice how an adjacency matrix can basically represent the same information for an entire graph right we can have self loops we can have you know from each node we can have edges going to every other node so what's the maximum number of edges we could have with an adjacency matrix well if this is v the number of edges is going to be equal to v squared so that's consistent with what we talked about earlier running through the last couple edges we have two going to three so that would look like this and we have three going to one which is over here so why is it rare to use an adjacency matrix well take a look at how much space we used here no matter how many edges we actually have to represent a graph we're using an entire matrix that means the space complexity here is going to be v squared where v is the number of nodes or vertices that we have in our graph we could represent the exact same information like look at this graph here it's not a lot of information we have four nodes and how many edges do we have we literally have four edges so we could reduce the space complexity to v plus e right the number of nodes plus the number of edges which in this case is also equal to the number of nodes so you know in terms of big o that would be big o of v we could represent it this way so why would we use a matrix which takes up a bunch of extra space and doesn't provide you know a ton of information anyway well that's what leads us to our next way of representing graphs which is adjacency lists that is essentially what we have drawn over here let's talk about it in more detail so now let's take a look at adjacency lists and these are typically the most common way of representing graphs especially when it comes to coding interviews i mean using a matrix can also be pretty common though.
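The four-node example above (an edge from 1 to 0, a self loop on 1, an edge from 2 to 3, and an edge from 3 to 1) could be written as an adjacency matrix like this rough Python sketch; notice it uses V * V cells no matter how few edges there are:

```python
# adjacency matrix for four vertices; adj[v1][v2] == 1 means there is a
# directed edge from v1 to v2 (edges taken from the example: 1->0, a
# self loop on 1, 2->3, and 3->1)
V = 4
adj = [[0] * V for _ in range(V)]
adj[1][0] = 1  # edge from vertex 1 to vertex 0
adj[1][1] = 1  # self loop on vertex 1
adj[2][3] = 1  # edge from vertex 2 to vertex 3
adj[3][1] = 1  # edge from vertex 3 to vertex 1

assert adj[1][0] == 1  # 1 -> 0 exists
assert adj[0][1] == 0  # but 0 -> 1 does not
# the matrix always uses V * V cells no matter how few edges exist
assert sum(len(row) for row in adj) == V * V
```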
think of these two as the approaches to get familiar with so it's very similar to linked lists and trees that we talked about so we'll have something like a graph node let's say or you know you could just call it a regular node or vertex or whatever you want to call it but the important thing is that there's really two things when it comes to nodes right we know a node could have some type of value whether it's an integer or a character or a string or maybe the value itself is some object like it represents a person right or a city for example right and you want to know how are these cities connected together or something like that or if it represents a person maybe you want to know how are these people connected like maybe it could be a social network maybe person one follows person zero but person zero doesn't follow anybody because there's no edges going out from person zero that's just one example now what exactly would a self-loop mean in that context well it's hard to say that's why maybe we don't allow self-loops in that type of graph who knows but graphs are obviously very important now when it came to linked lists we could have either had a next pointer or a previous pointer when it came to binary search trees we could have a left or a right pointer but with generic graphs we could have any number of pointers so how do we represent it well here you can see in python we use let's call it neighbors or you know you could say following if you're in the context of like a social network but neighbors is a pretty generic thing to use and instead of just having one or two or three or five or you know pre-defining the number of pointers we could have we use a list now in python this isn't clear but what type is this list aka this array going to be it's going to be an array of neighbors aka neighbor graph nodes so how would we represent this graph when it comes to graph nodes well graph node zero would have a value of zero and its neighbors would be empty because it
doesn't you know have any nodes going out of it so neighbors in this case means what edges is node zero pointing to there aren't any edges for node one we would have neighbors of you know this graph node so a pointer to this graph node would be in the neighbors of node one and the graph node itself would actually be its own neighbor would graph node three be in its list of neighbors no because in this case we're using neighbors to refer to which nodes is this node pointing at it's not pointing at graph node three graph node three is pointing at it so if we were looking at graph node three it would of course have a value of three that's not so important what would its neighbors be its only neighbor would be node one that's the only node that it's pointing at so it would have a single neighbor it would be a pointer to this graph node lastly two would have a value of two let's say its neighbors would be just three so in the list of its neighbors it would have a single pointer to graph node three this is obviously a bit more space efficient than the adjacency matrix that we looked at because we only contain pointers for edges that actually exist we're not declaring two pointers for every single graph node or three or five we're just declaring an array and it could be empty or it could have some pointers now what if we had undirected edges like maybe you know the way we've drawn this is wrong let's suppose that this actually doesn't exist for simplicity but let's say we had three edges that look like this right they don't have arrows on them so they're undirected well that would just mean that for the neighbors of zero it would have one in its neighbors for the neighbors of one it would have zero in its neighbors it would also have three in its neighbors for three it would have one in its neighbors and it would have two in its neighbors and for two it would have three in its neighbors when we have undirected edges we're basically assuming that there is a
pointer going both ways something like this so that's the basics of the different ways to represent graphs now let's look at some really common algorithms when it comes to matrices and then later we'll talk about adjacency lists so now let's cover an example problem that you might see for a matrix and surprise surprise the algorithm we're going to use is dfs i told you it's one of the most common algorithms and that's because it's very very frequently applied to graphs so in this example we have pretty much the same matrix we looked at earlier we want to count the unique paths starting from the top left and going to the bottom right in this case of course the path can only move along zeros not on any ones and also for a single path we can't visit the same cell more than once why is that important well take a look at this we can go along these zeros and then we know we go down here and then left here and then we could pretty much be in an infinite loop over here among these four we could basically you know keep going in a circle however many times we want to and then we could move down and then get to our destination the bottom right so in that case what would be the answer to this question well it would basically be infinity and that's kind of boring so we don't want that for any single path like suppose we go here and then we go down and then we go to the left now we can't move to this cell because we've already visited it and we can't move to this cell because we've already visited it so the only option we have here is moving to the left we can't do that it's blocked so we can only go down now here of course we can visit a couple cells that we haven't visited before we can't go to the right it's blocked off but at this point you probably get the idea so when we want to count the paths we want to go through every single possibility so an algorithm like dfs is going to be helpful for us this also might remind you of backtracking which we talked about earlier
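Here is a sketch of the kind of DFS/backtracking path counter the walkthrough that follows describes; the function and parameter names are my own, and I'm assuming the cell gets removed from the visit set on the way back out, which is what lets other paths reuse it:

```python
# count unique paths from the top left to the bottom right, moving only
# along zeros and never revisiting a cell on the same path
def count_paths(grid, r, c, visit):
    ROWS, COLS = len(grid), len(grid[0])
    # base cases that mean "no path this way": out of bounds, blocked,
    # or already visited on the current path
    if (min(r, c) < 0 or r == ROWS or c == COLS
            or grid[r][c] == 1 or (r, c) in visit):
        return 0
    # reached the bottom right: exactly one path found
    if r == ROWS - 1 and c == COLS - 1:
        return 1
    visit.add((r, c))
    count = 0
    count += count_paths(grid, r + 1, c, visit)  # down
    count += count_paths(grid, r - 1, c, visit)  # up
    count += count_paths(grid, r, c + 1, visit)  # right
    count += count_paths(grid, r, c - 1, visit)  # left
    visit.remove((r, c))  # backtrack so other paths may use this cell
    return count
```

You would kick it off with something like `count_paths(grid, 0, 0, set())`, sharing one set across all the recursive calls.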
that's because it pretty much is an example of backtracking there's a big overlap when it comes to dfs and backtracking they're both recursive and they're both very very similar in nature so this dfs is definitely going to be more complicated than when we looked at binary trees so i'm going to just show you the code immediately and we're going to see how in general the algorithm will run visually and then compare that to what the code would do and by the way yes again this is going to be a recursive algorithm with the recursive case and of course a couple base cases in this case we don't just have one base case we actually have two now the simplest way to think about this is not to immediately look at the code but to think about it visually it's all a matter of choices we start at the top left that's always where we're going to start so in terms of the parameters that we're given we're going to be given the grid of course that's our two-dimensional matrix here we're going to be given what's our starting position in this case it's going to be zero zero so we can have kind of the dimensions written out so we're starting at the top left and what are our choices well remember we can move in all four directions we can move up we can move left we can move down and we can move right and we know that's going to be kind of annoying because we can potentially move out of bounds that's the first thing that we have to worry about so that's what the first base case is going to be first of all here you can see i'm getting the dimensions of the grid this is how you would do it in python so the length of the grid which is going to be this is going to give us the number of rows that we have the length of one of these rows you can see i'm just taking the first row because we know every single row is going to have the same length because this is a rectangle i think it's safe to assume that most of the time and the length of one of those rows is going to be the number of
columns in this case we have a simple case because we have a square matrix so it's four by four but that might not necessarily be the case now in my case i'm you know calculating this every time we call the function that's pretty unnecessary to do we could pass in those parameters to the function but i prefer to do it this way because it gets kind of annoying when you have to pass in six parameters into a function but it doesn't change the time complexity so this isn't such a big deal the important thing is we have the dimensions of the matrix now one case is that our row remember this is the row goes out of bounds it's too small or our column goes out of bounds so what we could check is whether our row is less than zero or whether our column is less than zero or you could simplify that into a single case because if the minimum of these is less than zero that's all we care about we just care about if one of these is too small so we can just take the minimum of the row and the column and check is it less than zero that means either this is too small or this is too small it doesn't matter because we're out of bounds either way what would we do if that was the case well if we end up going out of bounds that clearly means that we did not find a path at least down this recursive you know chain this path we did not find a path to the destination so in that case we would return zero this dfs is supposed to count the number of paths you can see that there's a few more cases where we would return zero i'll get to those in a minute especially as we run through the example but to briefly cover them now another case of course for our row and column instead of being too small they could be too big our row could be equal to the number of rows why are we checking if it's equal to the number of rows instead of greater well remember the number of rows is going to be four not three we have four rows the last index is three but the number of rows is four so if row ever goes out of bounds if
it ever equals four then we also return zero or if our column goes out of bounds it reaches four then we also return zero right we did not find a path we went out of bounds why can't we take the max of the row and the column and check if it's greater well because remember the number of rows and columns might not necessarily be equal in this case since they're equal we could take the max of the row and the column but if we had a different number of columns we would not be able to do that we know we go out of bounds this way or this way if we're less than zero because zero is always going to be the starting point but the ending point could be different that's another case where we would return zero and another case this is the slightly more simple one is if we reach a blocked position like from here right starting at the top left we go down well we reached a blocked position we're not even supposed to be here so we basically stop searching we're not going to continue our recursive case because we're not allowed to even visit this position so we're going to stop immediately and we're also going to return zero basically we're saying by visiting this position there are no valid paths that reach the destination so again we return zero now the other case is if we reach a position that we've already visited before and what is this visit actually going to be well it's going to be a hash set data structure so that's what we're going to end up passing in it's not super clear in python because we don't have static typing but this visit is going to be a hash set and it's going to tell us if we visited a position before along a given path so that would be the example where we go here then we go down then we go left and then maybe we go back up to the same position at this point if we visited the same position twice we would stop we'd say nope if we visit this position twice there are no valid paths that reach the destination we're not allowed to do this so we return zero the other
base case is of course we you know end up somehow reaching the bottom right we reach the destination how do we know we reached the destination well it's defined as the last row and the last column so if r is equal to the last row which is row minus one and our column is equal to our columns minus one then we return one that means we found a single path this is that particular path for instance right this is one path that could be counted as part of the solution now let's actually run through this example and see how the code would actually execute so when we actually call this function our dfs we're going to pass in the grid of course we're going to pass in zero as the starting row and zero as the starting column and we're going to pass in some hash set and that hash set is going to be reused throughout this recursive function what that means is we're not creating a new hash set every time we make a recursive call we're passing in a reference to that exact same object every recursive call is going to have access to that exact same object so starting at the top left we're going to you know run through our base cases have we gone out of bounds no we haven't gone out of bounds have we already visited this position nope is this position blocked nope let's assume by the way there's always going to be at least one valid path in this matrix but if you're not sure of that in a real interview it's worth asking that as a clarifying question to your interviewer next let's check maybe we've already reached the destination no we haven't we can't return so at this point what we're going to do is add this coordinate to our visit hash set the reason i'm using a hash set in this case is because it's just easier in terms of code to just add a pair of values to check if we've done this before and we know adding and removing from a hash set is going to be a constant time operation but alternatively you could use a two-dimensional grid like visually what i'm going to do is basically 
just mark the positions that we've already visited with a blue circle. So if we wanted to know whether a position is visited, we could use a hash set, or we could use a two-dimensional grid of the same size as the input grid to represent that information. If you're allowed to modify the input grid, you could also mark a position as a one as you visit it, to indicate it's already been visited, because then we logically assume we can't visit that position. But it's not always safe to assume you can modify the input grid, so we're going to ignore that for now.

So now that we've added this to our visit hash set, it's time for the recursive case. We set our count equal to zero; basically, we're saying that from this position we want to count how many ways we can reach the destination, and we're allowed to move in four directions. We can move down, that's row plus one; up, that's row minus one; right, that's column plus one; and left, that's column minus one. Now, when we go up, we go out of bounds (our row is less than zero), so we end up returning zero. What about when our column goes to the left? We end up returning zero. What about when we go down? In that case we visit a blocked position, so we return zero again, because it's blocked. So the only valid position we can go to is to the right, over here, and when we do that, we're not out of bounds, we haven't already visited it, and it's not blocked. It's also not the destination, so we can't return one, but we do add it to our visit hash set. So far we've visited these two positions.

Now that we've gotten an idea of how the base cases work, I'm going to start running through them a little quicker. From this second position that we're at, we're going to have another count variable; we're counting how many ways we can reach the destination from this position, and when we find that result, we return it to the previous position we were at, which was the starting position, and then we have our answer; basically, these two will have the same count. Now, from here we're allowed to go in four directions. We can go up? Can't do that. We can go to the left? That's valid, but it's already been visited; we know that because it's blue here, but in code we would have added it to our visit hash set, so we can't do that. Can we go down? We can't do that either; it's blocked, so we can't visit that position. So we can only move to the right, and from there, neither of the base cases executes. Now, from here we're also going to have a count, which is initially zero, and when we calculate it, we'll return that count to the previous node, which will in turn return its count to the previous position. From here we can go in four directions: up is out of bounds, left is a position we've already visited, but we have two valid positions now. We can go to the right, or we can go down. Since this is DFS, depth-first search, and it's recursive, we're not going to go along these two paths simultaneously. We only do one at a time: we go as deep as we can in one of them, and then we go backwards, we backtrack (that's why this is a backtracking algorithm), and then we go back to where we were and go along the other path. In terms of code, which one of these executes first? Well, it looks like row plus one, which is this one, so let's go along that. This is added to the visit hash set, and from here neither of the base cases executes, so this is added to visit, and now we have four choices. We can go up?
can't do that's already been visited can't go left it's blocked we can go to the right here and we can go down here which one of them is going to execute first well as we saw before down is going to execute first but just to actually quickly run through the other case what would happen if we went here well then we don't have many choices we can't go to the left we can't go to the right we can't go down we can go up here right and then when we go up here we'd be stuck we can't go to the left we can't go up we can't go down we can't go below because it's already been visited so this path essentially what i'm saying is would not lead us to a solution so let's keep that in mind but going down here it's also not a base case we have some choices can't go up can't go to the right but we can go left and we can go down what's going to happen by the way if we went left well here we wouldn't have many choices we could only go to the left one more time and here we can only go down and here we have zero choices we haven't reached the destination and we can't move anywhere else we can't visit the same positions we've already visited so we'd be trapped and along this path we would end up returning the base case zero okay so the question remains from here can we even reach the result at all can't go to the right well we can go down so from here the only valid choice as you can see is moving to the right recursively and at this point we would finally execute the base case where we have reached the destination and then we would return one so from here this cell we would have a cost of zero we would execute four dfs's in all four directions three of them would return zero this one would return zero this one would return zero and this one would return zero but the one here the one where the column is incremented by one would return one and at this point we would end up returning one to the previous node but before we return one to the previous node you can see we're backtracking first 
we marked this position as visited and now we're going to mark it as unvisited you're going to see why we're doing that in just a second so from here now we're going to backtrack we're going to mark this as unvisited and go to this node from here we know that that there was only one path here that led us to the solution so from here the count would also be one and we would return one to the previous node and before we do that we'd mark this as unvisited so then we're going to go back to this node we know from here only one of the paths reached the destination so from here we would return one to this node now this node if you recall had two choices we could either go down which we just talked about right now or we could go to the right now from here we're going to go to the right we're going to execute the second valid case but now as you can see all of these are marked as unvisited why is that allowed i thought we could only visit the same position once well that's only true along a single path what i'm getting at here is we could from here have two paths we could you know do something like this that reaches the destination but we could also have a path like this that reaches the destination these are two different paths so they are allowed to visit the same position right both of the paths visit this position as you can see and this position and this position and even the result i mean of course they have to both reach the result we said only a single path can't visit the same position twice but multiple paths can reach the same position in fact they have to if they want to both reach the result so i hope that makes sense it can be a point of confusion but as you can see from here uh you know we don't have many choices we can go down and then we can go to the left and then we can go down and then from here how we're going to end up trying this path it's not going to reach the result and we're also going to try this path where we go down and then to the right and 
again we are going to reach the result but this is a slightly different path this is a slightly longer path but it's still valid so what we would realize then is backtracking all the way back is from here we can reach the result in one way from here we can also reach the result in one way so what that means is from here we can reach the result in two ways and what about from here well from here we're going to end up returning the count back to this so we're going to say that from here we can also reach the result in two ways and we're going to return that to the previous, which is here, and we're going to say we can reach the result from the starting position, we can reach the destination in two ways. And that's going to be the final value that we return. So when we call our DFS with these parameters and then print the result to is going to be the value that we print, you can try it in any of the code languages that we provide. So now finally, when it comes to analyzing the time complexity of a solution like this, it can get pretty complicated. So I like to think about it in a simple way, but it's not super precise. I mean, the way I'm going to think about it is just by proving to you the big O of N boundary, but it's not going to be super precise. We know from the starting position, let's suppose that the worst case is where there's all zeros. We don't have any blocked paths. So in that case, the length of a path could be the size of the matrix, right? Any given path could be the size of the matrix. So thinking about it in terms of a decision tree, we have from any position, we have four choices. And for each of those positions, we have four more choices. And by four choices, I mean we can go left, right, up or down. Now, of course, we know some of them are going to execute the base case where we either go out of bounds or maybe we visit the same position twice or we visit a blocked position. 
But like I said, this is an imprecise way; it still gives us an idea of the upper bound. Every node has four choices, and the height of this tree is the size of the matrix, because the longest path is what this tree represents: if you follow a path down this tree, it corresponds to a path in the matrix. So the height of the tree is the number of rows times the number of columns. Now, if you recall from our binary tree analysis, the total work is basically the number of branches, which in this case is four, raised to the power of the height of the tree, which is R times C. More commonly, people will use the dimensions N times M, so you could say the time complexity is four to the power of N times M. That's the big-O time complexity. This is not very efficient, but that's expected when it comes to brute-force DFS backtracking. Now, the memory complexity is essentially the recursive call stack: we start at this position, then make a recursive call here, then here, then here, and that continues for pretty much the entire size of the matrix. At most, there could be that many recursive calls outstanding at the same time, so we say the memory complexity is N times M. We could also say that's the memory complexity of our visit hash set, because that can also grow to the size of the grid. Or, instead of a hash set, you could create a grid itself to keep track of visited positions. So that's the idea here. I know it's pretty complicated; it's not the type of thing you'll completely understand the first time you see it, and it can take a while to wrap your head around a lot of the concepts we talked about here. But DFS, especially on a matrix, comes up quite often.
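The DFS backtracking walkthrough above can be sketched in Python. This is my own reconstruction rather than the exact on-screen code, so the function name `count_paths` and the ordering of the base-case checks are assumptions; the grid below matches the example traced above (0 = open, 1 = blocked), and the logic is the same: return zero for out-of-bounds, blocked, or already-visited positions, return one at the destination, and otherwise sum the four recursive calls while marking and unmarking the visit hash set.

```python
def count_paths(grid, r, c, visit):
    ROWS, COLS = len(grid), len(grid[0])
    # Base cases that return 0: out of bounds, blocked cell,
    # or a cell already visited along the current path.
    if (min(r, c) < 0 or r == ROWS or c == COLS
            or grid[r][c] == 1 or (r, c) in visit):
        return 0
    # Reached the bottom-right destination: that's one valid path.
    if r == ROWS - 1 and c == COLS - 1:
        return 1

    visit.add((r, c))           # mark visited along this path
    count = 0
    count += count_paths(grid, r + 1, c, visit)  # down
    count += count_paths(grid, r - 1, c, visit)  # up
    count += count_paths(grid, r, c + 1, visit)  # right
    count += count_paths(grid, r, c - 1, visit)  # left
    visit.remove((r, c))        # backtrack: unmark for other paths
    return count

# Reconstruction of the example grid from the walkthrough (an assumption).
grid = [[0, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 0]]
print(count_paths(grid, 0, 0, set()))  # prints 2
```

Note that the same `visit` set object is shared by every recursive call; the `remove` on the way back out is what lets two different paths cross the same cell.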
Check out the practice section for graphs if you want to find some practice problems that you can try out. So next, let's talk about the BFS algorithm run on a matrix. As you can see, I'm going to be jumping into the code relatively quickly, because we actually have even more code than when we were talking about DFS. But the good thing, at least in my opinion, is that this is simpler to understand visually than DFS. If you recall, BFS, at least when we ran it on a tree, worked like this: instead of starting at the root, going as deep as we can, and then going as deep as we can on the other side, we went layer by layer. We went through the first layer, then the second layer, then the third layer, et cetera. And that's the same idea here: we go through the first layer, then the second, then the third, the fourth, the fifth, et cetera. Now, what type of question would this algorithm apply to? By far the most common is shortest path. Suppose we want to find the length of the shortest path from the top left to the bottom right and return it. The most efficient way to do that is the BFS algorithm. We could also use DFS if we really wanted to, the algorithm we just talked about: we could check every single possible path that reaches the destination and then take the length of the shortest one, but that's a brute-force approach. BFS is more efficient. Its time complexity is actually just the size of the grid; let's say that's N times M, where N is the number of rows and M is the number of columns. That's what the time complexity is going to be, whereas DFS was much, much less efficient at four to the power of N times M.
So let's see how we can run this algorithm in terms of code and why it's so efficient. The setup is taking the dimensions of the grid; in this case, we're just passed the grid, and let's say we already know the starting point is the top left and we're going to the bottom right. We're going to have a visit hash set, just like with DFS, so there are some similarities. And we're going to have a queue, just like when we ran BFS on a tree. The queue is basically going to hold the current layer that we're at. Initially, the queue is just initialized with the first position, our starting point, and we also add it to the visit hash set; it's already visited. This is basically all the nodes we can reach with a length of zero, just our starting point. Then we loop while the queue is non-empty, and we take a snapshot of the length of the queue. Initially it's just one, a single element, and we pop all of those elements, just like with a tree. But note that we're not dealing with node objects here: when we add to the queue, we're adding the coordinates of a position, and when we pop, we're getting the coordinates of a position back. Now, we might have found the result. What if we're at the destination? In that case, we return the length of the path; initially, like I said, our length is zero. But in this case we know we haven't reached the result. So, one way to code this up: it's going to look complicated, and it's a lot of boilerplate, but once you get used to it, it's pretty easy. It's just a pain to type out, in my opinion, because we have a lot of conditionals. We can move in four directions, just like in the DFS approach, and there are a bunch of edge cases: we could go out of bounds, we could reach a position that's blocked, or we could reach a position we've already visited before.
Those are a few edge cases, and we handle them the exact same way we did with DFS: if the minimum of the row and the column is less than zero, we're out of bounds. If the row is equal to the number of rows, we're out of bounds that way. If the column is equal to the number of columns, we're out of bounds that way. We might have already visited the position before, or it could have the value one, which means it's blocked. What we do in any of those cases is continue, meaning we just move on to the next iteration of the loop; we skip this coordinate. But you can see it's not as simple as just taking a row and a column. I'm writing it this way basically to eliminate duplicate code: we could have this conditional written out four times, once for each direction, or we could put it in a helper function. Another way to do it, though, is to enumerate all four of the neighbors. When we say neighbors, these aren't literally the neighbors; they're really the four directions. This one represents the right direction, this one the left, this one below, and this one above. We take these directions and add them to the row and the column; that's what you can see we're doing for each of them. For the row we're adding dr, where dr just stands for difference in row: how much did the row change? Here it's zero, which means the column is the one changing. Basically, what I'm saying is that by taking these values, which represent the difference in row and the difference in column, and adding them to the row and the column, we have all four directions enumerated. So this looks more complicated than it actually is. You can write it out the long way if you'd like, or even put this kind of stuff in a helper function, but this is another common way that it's done.
And this sort of minimizes the amount of code that we have to write. I could also have declared extra variables here for the new row and the new column, so that we wouldn't have had to write r plus dr and c plus dc out multiple times; you can see I had to write that out a few times here, but I wanted to make it explicit in this case to show you exactly what we're doing. Feel free to modify this to make it look simpler or to match your coding style, however you'd like. But now that we've covered what this is doing, which is just a bunch of boilerplate to go through all four directions using a loop, what are we actually going to do with those four directions? Well, of course, we know the cases where we go out of bounds, reach the same position twice, or reach a blocked position; that's when we do nothing, we continue. Otherwise, it means we've reached a valid position. By the way, which one of these four is a valid position? Only the one to the right over here. So this is the only valid position. What are we going to do with it? We add it to our queue. We know that's the next layer; this is the only node we can reach with a path length of one. So we add it to the queue, and we also add it to visit; we don't want to visit it multiple times. After we've gone through all the neighbors, we've popped this position from our queue, and now this one is in our queue instead. So after that loop, we increment the length by one, to indicate that this layer we've now added to our queue, all these nodes, can be reached with a path length of one. So now let's continue, and at this point I'm going to focus mostly on the visuals. We know we can go in four directions. We can't go down or up, and we can't go left, because it's already been visited, but we can go to the right.
And that's the only position we can go to, so that's the only thing we end up adding to our queue and adding to visit, and of course this one has been popped from the queue. After that's done, our length is incremented to two. By the way, when we did pop this guy, we checked: is it the destination? It was not, so we did not return the length, which at the time we popped it was one. If it had been the destination, we would have returned one, which is what we would have wanted to do. So now we get to a slightly more interesting case. We only have one value in our queue again, but there are two valid directions we can go: right and below. Both of those are added to the queue and to the visit hash set. We wouldn't be able to add this one, of course, because it's already been added to the visit hash set. Then we increment our length by one, so our length is now three. I'm going to write the length below now, since we're running out of space up there. Basically, this is the layer of nodes we can reach with a path length of three, starting from the top left. Through that process, we know we popped this from our queue, so this is now our queue, and we're going to go to the next layer. This is where it gets a bit interesting, because the length of our queue is actually two now, so we're going to pop both of these from our queue. It doesn't really matter which one we pop first; let's just say we popped this guy first. So we pop it from the queue and get its neighbors: can't go up, it's already been visited; can't go left, it's blocked; but we can go to the right and we can go below. So what we do is add these to our queue and to the visit hash set.
Now you're going to see why we add to the visit hash set as soon as we add to the queue; that's very important, because next we're going to pop this guy from our queue. So we pop it and check: can we go up? Nope. Can we go to the left? Nope. Can't go to the right, and we can't even go below. That position was open, but we already added it to the visit hash set, so we know it's already been added to the queue, and we're not going to add it again. This is very important; it's why the algorithm runs in N times M complexity: we know for sure we're never going to add the same position to the queue twice, and we're never going to visit the same position twice. And we still know that these are all the cells we can reach with a path length of four, so we're going layer by layer; that's very important. So now we have two positions in our queue, and we execute this loop again for two iterations. We pop, let's say, this guy first. We can't go below, can't go left, can't go right, and can't go up. So we pop it, but we don't end up adding anything to our queue; it just executes the continue statement. At this point our queue only has this guy. So we pop it and add its neighbors: can't go up, can't go right, but we can go below and we can go to the left, so those are now added to our queue. What's the length to reach them? Well, previously it was four, so let's add one and get five. We can reach both of these with a length of five; that's the shortest path to reach each of them. Now, we don't actually know what the shortest path is; we're not keeping track of that. It would probably be this or this, but that's not important; we just care about the length in this case. At this point we're almost done. We have two positions in our queue. We pop this guy: can't go up, it's blocked; can't go to the right, it's visited; can't go below, it's blocked.
We can go to the left, so we add this to our queue and to our visit hash set. Then we pop the last one in this layer: can't go left, can't go below, can't go up; we can only go to the right. And at this point we've gotten to the destination. But notice where we look for the destination: it's after we pop from our queue that we check whether it's the destination. We don't check when we're adding to the queue. I mean, we could have done it that way, but we'd have to rewrite things a little, because we only increment the length after we've updated the entire queue. But let's just run through this very quickly. We've finished this layer, so our green layer here takes a length of six to reach. Now, finishing up, let's say we pop this guy first. We pop it, and we can't go up, can't go left, can't go right; we just add this guy to our queue. Then we pop this one, and when we pop it, we check: is it the result we're looking for? It is. And by the way, at this point our length would still be six, so we return that, and that's the result. I've color-coded it a bit to indicate the length of each of these. To actually run this, you can see BFS would just be passed the grid, the function itself would take care of the rest, and then we could print the result; in this case it would be six, and the path would look something like this. But that's the main idea. As you can see, with this approach we never visit the same position twice, and we never add the same position to the queue twice. Worst case, we'd visit the entire grid, so the time complexity is the dimensions of the grid, which is N times M. The space complexity is the same, because that's the max size our visit hash set or our queue could end up being. So that's both the memory complexity and the time complexity.
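The BFS just traced can be sketched in Python as follows. Again, this is a reconstruction under the same assumptions as before (0 = open, 1 = blocked; the name `shortest_path` and the grid are mine), not necessarily the exact on-screen code: pop a whole layer per iteration of the outer loop, check for the destination right after popping, and mark neighbors as visited the moment they are enqueued.

```python
from collections import deque

def shortest_path(grid):
    ROWS, COLS = len(grid), len(grid[0])
    visit = set()
    queue = deque()
    queue.append((0, 0))        # start at the top left
    visit.add((0, 0))

    length = 0
    while queue:
        # Snapshot the queue length so we process exactly one layer.
        for _ in range(len(queue)):
            r, c = queue.popleft()
            if r == ROWS - 1 and c == COLS - 1:
                return length   # destination reached when popped
            # dr/dc: difference in row / difference in column
            neighbors = [[0, 1], [0, -1], [1, 0], [-1, 0]]
            for dr, dc in neighbors:
                nr, nc = r + dr, c + dc
                if (min(nr, nc) < 0 or nr == ROWS or nc == COLS
                        or grid[nr][nc] == 1 or (nr, nc) in visit):
                    continue    # out of bounds, blocked, or already seen
                queue.append((nr, nc))
                visit.add((nr, nc))  # mark as soon as we enqueue
        length += 1             # the next layer is one step farther

# Reconstruction of the example grid from the walkthrough (an assumption).
grid = [[0, 0, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 1],
        [0, 1, 0, 0]]
print(shortest_path(grid))  # prints 6
```

Note this sketch assumes a path exists, as the transcript does; if the destination were unreachable, the loop would drain the queue and the function would fall through and return None.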
So, in my opinion, this is a bit easier to understand in terms of the visuals. Code-wise there's a lot of boilerplate, though, as you can see, so it can be a pain to write out, but once you get used to it, it's just a matter of writing out all the edge cases and getting them right. It can definitely take a while to get good at it. I'd recommend taking a look at some of the problems in the practice section under the graph topic, because BFS is definitely a common algorithm that comes up a lot.

Next, let's move on to adjacency lists, and the good thing, in my opinion, is that these are much simpler to run an algorithm on than a matrix. Now, originally we talked about the idea of having a graph node class, an object, for an adjacency list, where we're storing some kind of value; it could be a string, a character, an integer. It's also going to have a list of neighbors, and the neighbors are pointers; they could be directed or undirected, but essentially they're pointers pointing at other nodes. That's one way to do it, and if we were actually creating our own graph, we'd do something like that. But when it comes to coding interviews, it's actually much more common to just use a hash map to represent an adjacency list, because, as you can see, there are two things we care about. The first is the value, or the ID, of each node, as long as it's unique, and for coding interviews it's almost always unique; each node will have something that uniquely identifies it, whether it's an integer, a character, or a string. In this case, let's say we have a character, A, and that's going to be the key of our hash map. This is how I'm doing it in Python, but the important thing here is that the character is the key, and the value for that key is a list, and that list represents the neighbors. So this is just an example adjacency list where we have two nodes, A and B. Now, the lists here are empty, so that basically means there aren't any pointers. It's just a very simple example to show you that the same idea can be represented using a hash map.

Now let's actually do a very common thing when it comes to coding interviews: usually you're not just given the adjacency list itself, you have to build it yourself. You are given enough information to build it, though. The most common input is a list of edges; in this case, these are directed edges, and we want to build an adjacency list from them. What does directed mean again? Well, for this edge, we're basically saying there exists an edge going from A to B. It's directed, so it goes from A to B; this is what it would look like in terms of the picture. Now, how do we go about actually building the adjacency list? Well, first let's declare a variable, say an empty hash map. We go through every single edge, which is a pair of nodes; the values are the identifiers for those nodes, and we're calling them source and destination in this case, so source is the first of the pair and destination is the second. This is how you'd do it in Python, but other languages will also be provided below. One thing to note is that our hash map starts empty, so these keys might not even be in the hash map yet, and if they're not, we want to add them. So we first check: is source not in the adjacency list, is this key not in the hash map? If it's not, we add it to the hash map and give it a value of an empty array, so the neighbors are initially empty; we just added this node to the adjacency list. We do the same thing with the destination, even though we're not going to be adding anything to the destination's array of neighbors, because we want to at the very least make sure that every node we saw in the list of edges gets added to the adjacency list, even if it doesn't have any neighbors, because we
want to at least have an empty list to you know confirm it doesn't have any neighbors so here what we would say is for the adjacency list of the source for our first example that would be a we're going to append to its neighbors which initially is an empty list we're going to append b so visually we would say a points at b it has a neighbor b going to the second edge we would do the same thing b though in this case has already been added to the adjacency list we did that previously with this line but c is not so we add c to the adjacency list it doesn't have any neighbors though of course just yet at least but now we know b points at c so we have an edge going from b to c and then third edge b to e we're going to do the same thing put an edge over here c to e that means there's an edge going like this and e to d so there is an edge going like this so this is what the graph would look like we were given all the information we needed we were given the edges but from the edges we could imply what the nodes are and then we have a representation that looks like this this is the same information that would be stored in our adjacency list which you know we're using a hash map to do that and you can see why because it gives us all we really need now let's move on to actually running some common algorithms like dfs and bfs on adjacency lists and the good thing is these are a little bit easier than doing them on a matrix so let's run pretty much the same algorithm we did on the matrix using dfs depth first search we want to count all the paths and again this is going to be a backtracking algorithm but the code is a lot more simple mostly because we don't have a thousand edge cases to worry about we don't have to worry about going out of bounds that's what makes adjacency lists nice so assuming we're given this input that's our adjacency list and assuming it was given in the form of a hash map in python we're going to run dfs find account the paths let's say the node the 
starting node, is going to be A in our DFS, and the target node is going to be E. So we want to count the paths from A to E. The adjacency list is passed in, and we also have a hash set which is going to keep track of visited nodes along a single path, just like we did with the matrix. So that hash set is passed in; we're going to call this function and then print whatever it returns. Starting at A, we check: has this node already been visited? If it has, there are zero paths from here to the result. That's not the case, though. Well, is this node already the target? If it is, that means we found a path to the target, and we can return one. It's not E, so it's not the target. Then we declare a variable to count how many paths there are from A to E, we add the node to the visited hash set, and we go through its list of neighbors. The good thing here is that it's so easy to iterate through the list of neighbors, because we have it in our adjacency list; we don't have to manually go in four different directions, which is really annoying to write out. So we call DFS on each of the neighbor nodes. Now, maybe A doesn't even have any neighbors; that would be an empty list, an empty array, and then this for loop would never even execute. But in this case A does have one neighbor. So we were at A; now we're going to be at B. I'm using these arrows to indicate which node we've already visited and which node we're currently at. So we're at B right now. Recursively, we're at B: has this node already been visited? Nope. Is this node the target? Nope. So now we're counting how many paths there are from B to E; that will be zero initially. We'll add B to the visited hash set and go through the neighbors of B. It actually has two neighbors, C and E. Let's say we call DFS on C first. Notice how it's DFS, not BFS; that means we're going as deep as we can in one direction. So we get to C. It's not visited already and it's not the target, so we count paths from C to the target, we add C to the visited hash set, and we go through its list
of neighbors. It only has a single neighbor, so then we call DFS on E. Now we are at E. It's not been visited already, but it is the target node, so we return one. In this case, that means we found this path that leads to E. So we return one; this is the base case. We backtrack, and we end up back at C. From C we called DFS on E; it returned one, so we add one to our count, and then this loop is going to stop, because we only had one neighbor from C. So then we're going to backtrack: from visited we're going to remove C, so you can think of that as removing this pointer, and then we're going to return the count, which is one. That one is going to be returned back to B, and from B's perspective, we're going to take that result, one, and add it to its count, so its count is now one. But B actually has a second neighbor: it has the neighbor E. So we're going to call DFS on E from B; pointer here again. The same base case is going to execute. Now we found another path that looks like this, so we return one from E. We add that one to our count for B, which is now two, so we know there are two paths from B to E. And that's its last neighbor, so then we're going to remove B from the visited set. You can think of that as removing this pointer; that's the backtracking part. And then two is going to be returned back up to A. So two is over here, and A only had a single neighbor, so its loop is going to stop and it's going to end up returning two. So there were two paths from A to E. Now, we didn't even end up visiting D; that's because we didn't need to. Whenever we get to the target, we immediately stop. So you can see that this is much simpler than running it on a matrix, but the core ideas are definitely the same: we have some base cases, we want to enumerate every single possible path, and if we end up reaching a node we've already visited, then we want to stop. Now, in this case we never visited the same node twice along a single path, but for example,
suppose that from C there was actually a pointer back to B. In that case, let's say we do a DFS: we get to A, then we get to B, then we get to C, and from C we have two choices. We can go to E, or we can go back to B. But if we did that, if we went back to B, we'd see that we've already visited it once along this path, so we can't just visit it again. At that point we would end up returning zero from there. We'd say that there are zero paths going to the target where we start at B, come to C, and then go back to B; there are zero valid paths doing it like that. Now, it can get complicated to analyze the time complexity of this type of brute-force backtracking DFS approach, so I'll try to keep it simple. We know that we're going to go through every single possible path in the entire graph; we have to, to count the paths. So that could be the worst case. The length of a path is going to be no larger than the number of vertices that we have, because we can only visit each vertex once along a given path. So that is going to represent the height of our decision tree. Now, for each vertex, how many choices could we have? In the worst case, every vertex is connected to every other vertex, so that would mean V choices. But let's just use some value N, where N is the average number of edges that each node has; this is a slightly more common way of talking about it. The important thing is just understanding the ideas that I'm talking about here: the ideas of choices, and how long a path could be, and so on. So the height of this decision tree (when I say decision tree, I mean all the decisions that we could make when we're counting the paths on this graph) is going to be V, and the number of choices we could make is going to be N, where N is, let's say, the average number of edges that each vertex has. So in that case, the time complexity, analyzing this the same way
that we did for a binary tree, is going to be something like N to the power of V; so big O of N to the power of V. Now, it's pretty rare to actually have to analyze the time complexity this way, but the important thing to recognize here is that this DFS, this backtracking, is not going to be very efficient. Just like most backtracking algorithms, this is exponential. We know it's exponential because the power here is a variable. If we had N squared, that's not the end of the world, because the power is a constant value; even N cubed wouldn't be that bad. But here, as the size of our graph grows, this time complexity is going to grow extremely quickly. So this is very, very inefficient. BFS is much more efficient, as we will see. So let's take a look at the same example problem: let's try to find the shortest path from a node to a target. In this case the starting node is going to be A again, and the target is going to be E. Those are the parameters we're passing in, and we're passing in the adjacency list that we built for this representation. Now, the BFS is going to work similarly to the matrix version. We're going to initialize a few variables: the length is initially going to be zero, we're going to have a hash set and add the starting node to it as visited, and we're going to have a queue and add the starting node to the queue. So that's the first level in our BFS. While our queue is not empty (we have a single node in our queue right now), we're going to iterate through the length of the queue at that given moment. So we're going to take a snapshot of the length; it's one, so we're going to run this loop one time, and we're going to pop one node. So we pop that node; this is our current node. We're going to check: is it equal to the target? If it is, we can return the length, which is initially zero. So we're saying the length from starting here to arriving here is zero; that would make sense. But this is not the target, so we go through its list of
neighbors, the same way we did earlier, but we check: is this neighbor not already visited? If that's the case, we can add it to visited and we can add it to the queue; but if it is already visited, then we don't do either of those things. In this case it has one neighbor, B, and yes, we're going to add it to the queue and add it to visited. So then this loop will stop, and this loop will also stop, because we went through every node that was in the queue at that point, and then we can increment our length. So we know it took a length of one to arrive at this level. Next we check: are we still executing the while loop? Well, our queue is not empty, so we are. We're going to take a snapshot of the length of the queue; it's one, it's just the one node. So we're going to run this loop once, and we're going to pop this node. It's not the target, so we're not going to return. We're going to go through its list of neighbors. It has two neighbors, and neither of them has already been visited, so we're going to add both of them to the visited hash set and add them to the queue, and then we're going to increment our length by one. So it took a path length of two to get to this layer, or this level, of the graph. And now we're going to run the while loop again; we're going to start going through both of these nodes. We're going to take a snapshot of this queue; it's going to be of length two, because there are two nodes here, and we're going to pop them. Let's say we pop C first. We would pop it, and we would check: is it the target? Nope. We would go through its list of neighbors; it has a single neighbor, which is E. We would check: has it already been added to visited? Yes, it has, so we're not going to add it again. Then we're going to end up popping the next one, which is the only one remaining. We pop it: is it equal to the result? Yes, it is, so we return the length; it's two. That means that the shortest path was length two, and we know that because we went layer by layer. These are all the nodes we could reach with a path length of zero, these
are all the nodes we could reach with a path length of one, and these are all the nodes we could reach with a path length of two. This is the first time we encountered the result node, so the shortest path to that node must be two, and that's what we return. Now here, the time complexity is going to be the same as earlier: it's going to be the size of the graph. But earlier we talked about how, for a matrix, the size of the graph was n times m. And technically, what we didn't account for when we were doing that were the edges. n times m is the number of nodes in the matrix graph, but the number of edges for each node was four. So technically, the time complexity for running BFS on a matrix is four times n times m. But we know we don't care about constants, so this is the time complexity we get there. But in this case, let's say the number of nodes is actually V, for vertices. So that's the size of the graph. But how many edges could each node have? Well, technically V, right? Because remember, we learned earlier that E is going to be less than or equal to V squared, because each vertex could have V edges. So if we're talking about the edges for each node multiplied by the total number of nodes, it could be V times V, which is equal to V squared. But we don't write it this way, because we know that's not necessarily the case here. This is not a full graph where every single node has the maximum number of edges. So to be more accurate, we could say the size of this graph is actually V, the number of vertices, because we potentially will have to visit every single one of these in the worst case; with our BFS, we'll have to add each of the vertices to our visited hash set. But we might also have to travel along every single edge, and the number of edges may not necessarily be equal to V. It could be greater or it could be less.
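The layer-by-layer BFS just traced might be sketched like this in Python. The function and variable names are my own, and the adjacency list literal mirrors the example graph; the course's own code may differ:

```python
from collections import deque

def bfs_shortest_path(adj_list, start, target):
    # Length of the current BFS layer; the start node is at distance 0.
    length = 0
    visited = {start}
    queue = deque([start])
    while queue:
        # Snapshot the queue size so we only process the current layer.
        for _ in range(len(queue)):
            node = queue.popleft()
            if node == target:
                return length
            for neighbor in adj_list.get(node, []):
                if neighbor not in visited:
                    visited.add(neighbor)
                    queue.append(neighbor)
        length += 1
    return -1  # target is not reachable from start

adj_list = {"A": ["B"], "B": ["C", "E"], "C": ["E"], "E": ["D"], "D": []}
print(bfs_shortest_path(adj_list, "A", "E"))  # 2
```

Taking the snapshot with `range(len(queue))` before popping is what keeps each pass of the inner loop confined to one layer of the graph.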
So to be more accurate, we say that the time complexity for this BFS is equal to V plus E, where E is the number of edges in the graph. The space complexity, though, is a bit simpler. We know that in the worst case we could add every single node to the visited hash set and every single node to the queue, so the memory complexity is going to be big O of V, where V is the number of vertices, of course. Now, before we finish up with graphs, I want to mention that we definitely learned a lot here. And that's because graphs are a very, very big topic in general, but also for coding interviews; it's a topic that comes up very frequently. But one thing we didn't talk about is: what if these edges actually have some kind of weight attached to them? For example, what if the length of this edge is not just one? We've kind of been assuming each of these edges has the same length; we treat them the same. But what if, the same way cities are connected by roads of different lengths, the length of this one was two, this was three, this was also three, and maybe this one was ten? What would be the shortest path in that case? It would have been this one over here, not what we had. We said the opposite; we said this was the shortest path. So our algorithm here would get that wrong. There is a more complicated algorithm that can account for edges that have some weight or value attached to them, and if you want to learn more about it, I recommend checking out the advanced algorithms course. There are actually tons of graph algorithms that we didn't talk about, but they can get very, very complicated. There's a lot of academic research that has gone into graphs; these are really complicated structures. But I think understanding DFS and BFS gives you a really, really strong foundation. After all, the shortest path algorithm that could solve this problem is actually based on BFS anyway. It's just a modified version.
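To recap before moving on, here's a rough Python sketch that combines the two walkthroughs from this section: building the adjacency list from the directed edges, then counting paths with the backtracking DFS. The names are my own, not the course's exact code:

```python
def build_adj_list(edges):
    # Make sure every node appears as a key, even with no neighbors.
    adj_list = {}
    for src, dst in edges:
        if src not in adj_list:
            adj_list[src] = []
        if dst not in adj_list:
            adj_list[dst] = []
        adj_list[src].append(dst)
    return adj_list

def count_paths(node, target, adj_list, visited):
    if node in visited:       # already on the current path: dead end
        return 0
    if node == target:        # found one complete path
        return 1
    count = 0
    visited.add(node)         # mark along the current path
    for neighbor in adj_list[node]:
        count += count_paths(neighbor, target, adj_list, visited)
    visited.remove(node)      # backtrack
    return count

edges = [("A", "B"), ("B", "C"), ("B", "E"), ("C", "E"), ("E", "D")]
adj = build_adj_list(edges)
print(count_paths("A", "E", adj, set()))  # 2
```

The `visited.remove(node)` line is the backtracking step: removing the node from the set corresponds to erasing the pointer in the diagram once we're done exploring from it.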
If you'd like to practice some graph problems, I recommend checking out the practice section and going to the graphs topic. Okay, so now let's finally move on to dynamic programming. Are you nervous? Because you should be. Just kidding. Dynamic programming can be really scary for a lot of people, but I'm going to try my best to make it as simple as possible. The first problem we're actually going to be looking at is one we already talked about when we were discussing recursion, and that is the Fibonacci sequence. This is an instance of a one-dimensional dynamic programming problem; let me show you what I mean. But first, let's recall what the Fibonacci sequence is. Basically, the zeroth Fibonacci number is defined as being zero, and the first Fibonacci number is defined as being one. An arbitrary Fibonacci number, the nth in this case, is defined as the (n minus one)th Fibonacci number plus the (n minus two)th. For example, the second Fibonacci number is equal to the first Fibonacci number plus the zeroth Fibonacci number. Therefore, the second Fibonacci number would be one plus zero, which is just equal to one. And then we could continue calculating the third and the fourth Fibonacci numbers and keep going just like that. Now, if we wanted to calculate an arbitrary Fibonacci number, let's suppose the fifth, we would do it using a loop, kind of like I just described: we would calculate the second, then the third, then the fourth, until we got to the fifth. It would be pretty straightforward; we could do it in O of n time using a loop like that. What you might not realize is that this is actually an instance of a dynamic programming problem. But to really understand the fundamentals of dynamic programming, I'm going to spend a lot of time analyzing it and showing you how we can take the recursive solution that we talked about earlier and convert it into a dynamic programming solution.
Because that's what you'll normally be doing when you start getting to more difficult dynamic programming problems. So when we solved this problem recursively before, we started with the decision tree. We wanted to calculate the fifth Fibonacci number, and we know that to do that, we have to get the (five minus one)th Fibonacci number, a.k.a. the fourth Fibonacci number, and add it to the (n minus two)th Fibonacci number, a.k.a. five minus two, a.k.a. the third. And basically we keep drawing out this decision tree until we get to the base cases, which are these two here. So let's continue doing that. For four, we know we have to get the third and the second Fibonacci numbers. For three, we have to get the second and the first Fibonacci numbers. And then here, we're going to do the same thing; notice we're doing the same thing that we did over here. We're going to get the second and the first Fibonacci numbers. For two, we're finally going to get to a few base cases: the first and the zeroth Fibonacci numbers. And for two here, we're doing the exact same thing; this is some of the repeated work that you might be noticing. We're going to get the first Fibonacci number and then the zeroth Fibonacci number. And then one is already a base case, so we're not going to do anything; we know the first Fibonacci number is by definition equal to one. Now, the only one of these we have to expand upon is this one over here. So again, repeating the exact same thing we already did in a couple of places, we have to say this is going to be the first Fibonacci number plus the zeroth Fibonacci number. Now, the code for this approach would look something like this. We have a recursive function, and we know that if the Fibonacci number we're trying to calculate, let's say it's n, is less than or equal to one, we're going to return n itself. So basically, if n is equal to one, we return one; if n is equal to zero, we return zero.
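Putting those base cases together with the recursive definition from earlier, a minimal sketch of the brute-force function might look like this (the function name is my own):

```python
def brute_force(n):
    # Base cases: fib(0) = 0, fib(1) = 1.
    if n <= 1:
        return n
    # Branch in both directions, redoing lots of repeated work.
    return brute_force(n - 1) + brute_force(n - 2)

print(brute_force(5))  # 5
```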
If that's not the case, we make our brute-force recursive call in both directions: n minus one is going to be the left branch, and n minus two is going to be the right branch. And then, of course, we would get to the base cases, those would return up, and those would return up, and they would go all the way back up until we got to the root. The downside of this approach, though, is that the height of this decision tree is going to be roughly the Fibonacci number that we're trying to calculate; in this case it's five, so the height of the tree is going to be approximately equal to five. The number of branches is going to be two. So in terms of big O, we would say that the size of this tree is going to be roughly two to the power of n. I mean, we can see that some of the levels aren't completely filled, but this is a rough upper bound, where n is the Fibonacci number that we're trying to calculate. But we can see we're doing a lot of repeated work here. When we calculated the second Fibonacci number over here, we had to draw out this subtree, this recursive subtree. Then we wanted to calculate the second Fibonacci number again, and we had to draw out the entire subtree. And then we even had to do that a third time over here. But what's even worse is that when we wanted to calculate the third Fibonacci number, we had to draw out an even bigger subtree over here, and we didn't only do it once, we had to do it twice; we had to draw out the same subtree over here. So what I'm getting at is that there's a lot of repeated work going on, and there's a technique that we can use to eliminate this repeated work. It's sort of the basis of dynamic programming. This technique of optimization is called memoization. We can basically take a brute-force recursive solution.
If we see we're creating a decision tree to solve the same subproblem, in this case calculating the second Fibonacci number, we can cache it: basically, after we calculate it once, we can store it somewhere, so that next time, when we're trying to solve the exact same subproblem, we don't redo all of the work. In this case, we wouldn't even create these subtrees; we would automatically know what the second Fibonacci number is, because we calculated it once. So in terms of code, you can see it looks very, very similar to the recursive solution. That's why, when people are first learning how to do dynamic programming problems, what they do is solve the problem recursively and then try to add memoization, try to add caching, to it. And I still, to be honest, use this technique when I'm solving a difficult dynamic programming problem that I can't figure out. You can see that when I call the brute force I'm passing in ten, but when I call the memoization I'm passing in five; please ignore that, and consider this a five. We're calculating the fifth Fibonacci number the brute-force way, and now, with memoization, we're calculating it the memoized way. You can see we're passing in a parameter, which is the cache. Most commonly I like to use a hash map to represent the cache, but you can also use an array; I just think it's easier with a hash map. In this case, I'm actually going to draw our cache as an array, because that's essentially what it is: we're mapping an integer, in this case two, to a Fibonacci number, in this case the second Fibonacci number. So I'm going to draw it as an array, but we know we're actually using a hash map. Initially, we pass an empty hash map into the memoization function. So think of this as the empty hash map. We pass in five; we're trying to calculate the fifth Fibonacci number. Have we reached the base case?
Is this a zero or a one? Nope. Is five already in our cache? Well, our cache is empty. By the way, checking if a value is in the cache is an O of one operation; that's why we're using a hash map, but we could also use an array and it would be the same time complexity. In this case, five is not in our cache, so we call memoization (or you could call it whatever you want), and we're recursively going to check what the (n minus one)th Fibonacci number is. In terms of the decision tree, we started at five; now we're going to check what the fourth Fibonacci number is. We're going to check the same base cases: it's not a base case, and it's not already in the cache, since our cache is empty. So then we're going to call memoization on n minus one; we're going to call the recursive function on three, and three is also not going to trigger any base cases. Then we're going to call memoization on two. This is where things are going to get interesting. So now we're at two. Again, this isn't quite a base case, and it's not in the cache. So now we're going to call memoization on one over here, which is a base case. We know one is going to end up returning one, because the first Fibonacci number is one. Then, from two, we calculate the zeroth Fibonacci number. We know that's also a base case, and it's going to end up returning zero. So here we're going to have one plus zero; add those together, and it's one. So we found what the second Fibonacci number is: it's equal to one. So what are we going to do? We could immediately return it, but we're actually going to cache it before we return it, because what we're saying is: we just calculated the second Fibonacci number, and if we ever need to calculate it again, which we know we're going to here, we want to be able to get it immediately without having to make any recursive calls. So we cache it: in our cache, for the key two, we're going to say the Fibonacci number is one. Then we're going to get back up to three.
We calculated the second Fibonacci number; now we have to calculate the (three minus two)th, which is the first, and that's a base case. So here we're going to calculate the first Fibonacci number, and it's going to end up returning one. Then we're going to add these two together, one plus one: the second Fibonacci number is one, the first Fibonacci number is one, and adding those together we get two. Therefore, the third Fibonacci number is two, so we're going to throw that in our cache: for the key three, the Fibonacci number is two. Now we know that if we ever have to calculate the third Fibonacci number again, we won't have to repeat all this work. And now, again, things are going to get a little interesting. Now we're at four. We calculated the third Fibonacci number; it's time to calculate the second Fibonacci number. We could go through that whole recursive tree, but when we try to calculate the second Fibonacci number now, we're going to check: well, it's not less than or equal to one, but it is in our cache. The second Fibonacci number is in our cache. What is it? The second Fibonacci number is one, so that's what we're going to return. We're not going to do the recursive case; this acts like a base case, so we're going to end up returning one. So when we're trying to calculate the fourth Fibonacci number, we get two, which is the third Fibonacci number, plus one, so the fourth Fibonacci number is three. We can throw that in our cache and then return it up to the parent, which is five. So now we're trying to calculate the fifth Fibonacci number. We already calculated the fourth; now it's time to calculate the third Fibonacci number. So we're going to recursively call that, and we're going to see that it's not less than or equal to one. But again, we check our cache, and we found it: we found three in our cache. The third Fibonacci number is two. So basically, we don't even have to execute this entire tree. We only have to check three.
And then we immediately find the value, which is two. So we calculate the fifth Fibonacci number, which is three plus two, and that ends up being five. We throw that in our cache (the fifth Fibonacci number is five), and then we return it. So you can see that by caching, this is all the work that we eliminated. This is actually the size of our decision tree when we implement caching; this is what it looks like. You can see that this is essentially linear: to calculate the fifth Fibonacci number we calculate the fourth, to calculate the fourth we calculate the third, to calculate the third we calculate the second, and to do that we calculate the first. It's linear. I mean, technically, you can see that each of these nodes actually has a right child, sort of, so you could say the size is not just n but two times n, but we know that that's still linear; it reduces to big O of n. So that's the overall time complexity. Caching, which is also called memoization, is sometimes also referred to as top-down dynamic programming, because we're starting at the top of the tree and then going down. There's actually another form of dynamic programming, which is generally a bit more difficult, because you can't just take the recursive solution and then add caching to it. It's called the bottom-up approach to dynamic programming, and as the name implies, for this tree we're going to start at the bottom and then work our way upward. We're still not going to redo all of this work that we scribbled out, but we're going to start at the bottom of this tree and then work our way upward, and I'm going to show you that right now. This approach, where we don't use recursion at all, is sometimes called the true dynamic programming approach. Some people don't even consider memoization dynamic programming.
Some people only consider this to be dynamic programming, but it's also called the bottom-up dynamic programming approach. So we would just pass in whatever Fibonacci number we're trying to compute, let's say in this case n equals five. Here I wrote n equals ten, but to keep it simple, we're going to just calculate the fifth Fibonacci number. So we know we're trying to calculate the fifth Fibonacci number, and recursively we saw that to do that we have to get the fourth, to do that we have to get the third, to do that we have to get the second, to do that we have to get the first, and to do that we have to get the zeroth. But why start here, which was the top, and go top-down, when we can go bottom-up: start at the base case immediately and then work our way upwards? Because when we have this base case, and this other base case, when we have both of the base cases, we can calculate this guy. And then we can calculate this guy, and then this guy, until we get to the result that we actually want. So in terms of code, it would be pretty simple. Actually, I think this is the simplest code of the three solutions, because it's essentially a loop. Now, it can look a little bit complicated, and that's because I've really optimized this to the point that we don't even need an array, but I'm going to draw it this way just to get the main understanding across. First of all, if we have a Fibonacci number that's less than two, we're going to return whatever n is: the zeroth Fibonacci number, we know, is going to be zero, and the first Fibonacci number, we know, is going to be one. But assuming we have n equals five, we're going to create this array; I'm calling it dp, and we'll have it as our little array here. And now visually, forgetting the code for just a second, we would start here: we would want to get the second Fibonacci number. Now, how do we do it?
Just take the two previous Fibonacci numbers and add them together: zero plus one is equal to one. Now we want to get the third Fibonacci number: add these two guys together, and one plus one is two. To get the fourth Fibonacci number, add the two previous together: one plus two is three. To get the fifth Fibonacci number, add these two together, and we get two plus three equals five. We get the result. What was the time complexity? Big O of n; we just had to iterate n times, roughly speaking. What's the memory complexity? Also big O of n; we're maintaining the array that we have over here. But my question to you is: is this array even necessary? Every time we got to a value, when we were trying to calculate this guy over here, we only needed the two previous values. And then when we calculated that and moved here, we didn't even need those values anymore; we needed the two previous values here. We didn't need this guy anymore. So my question is: do we actually have to maintain the entire array, or can we get away with only saving the two previous values at a time? And that's exactly what we're going to do. So, forgetting this portion of the array here, we only need an array of size two, and initially the values are going to be zero and one, just like we've drawn over here. Then we're going to get to i equals two; we're trying to calculate the second Fibonacci number. We're going to save this guy in a temporary variable; I'm just going to write it here, we're going to save the one here. And then we're going to calculate, in this position, what the second Fibonacci number would be: it's going to be these two guys added together, and zero plus one is one. So what we would do is overwrite this; in this case it was previously one and stays one, but I'm just doing this to get the point across. And then, in the zero position, we would put whatever was originally over here in this spot; the good thing is we saved it here.
But in terms of code, we would save it in a temporary variable. So here we would now put one. This is telling us that we calculated the second Fibonacci number and put it over here, and then we put the first Fibonacci number over here, because we know we're going to need the first and the second Fibonacci numbers. Now we take our i, increment it by one, and try to calculate the third Fibonacci number. You can probably see at this point where I'm going, but I'll quickly run through it just to get the point across. We're going to overwrite the value here now by taking these two original values, one plus one, adding them together, and putting the result over here. So now we have a two here, and we put the original value, which was one, over here. It's already one, but we're doing it anyway. At this point, this would be the second Fibonacci number and this would be the third Fibonacci number. And now here we would take one plus two, add them together, replace this position with three, and take the previous Fibonacci number, which was two, and replace this guy with two. Then these two would be the third and fourth Fibonacci numbers. At this point we've sort of run out of space, but you probably get the idea: we would now take these two, two plus three, add them together, and we would get five. That's the fifth Fibonacci number; we would end up storing it in this spot. And then, once our loop stops, which it would now, we would end up returning the value that's stored over here, which is five. So that's the fifth Fibonacci number. And of course, since we're looping n times, the time complexity is going to be big O of n, but we saved space: we didn't have to allocate the entire array. The space in this case is just an array of size two, which is constant memory, so our memory is not scaling as we increase our input n. This is the most efficient way to solve this problem.
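For reference, here are rough Python sketches of the two approaches traced above: top-down memoization with a hash map as the cache, and the bottom-up version that only keeps the two previous values. Function names are my own, not the course's exact code:

```python
def memoization(n, cache):
    # Base cases: fib(0) = 0, fib(1) = 1.
    if n <= 1:
        return n
    if n in cache:  # O(1) lookup: this subproblem was already solved
        return cache[n]
    # Solve once, store it, and reuse it on every later call.
    cache[n] = memoization(n - 1, cache) + memoization(n - 2, cache)
    return cache[n]

def bottom_up(n):
    if n < 2:
        return n
    dp = [0, 1]  # only the two previous Fibonacci numbers
    for _ in range(2, n + 1):
        # Shift the window: the old "current" becomes "previous".
        dp[0], dp[1] = dp[1], dp[0] + dp[1]
    return dp[1]

print(memoization(5, {}))  # 5
print(bottom_up(5))        # 5
```

`bottom_up` uses O(1) extra memory, matching the optimized array-of-size-two idea, while `memoization` still uses O(n) space for the cache and the recursion stack.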
And I hope this gives you an idea of the thought process that goes behind dynamic programming. Dynamic programming is essentially a technique that takes a big problem and then simplifies it. What makes this a dynamic programming problem is that we originally have a big problem. We're trying to calculate the fifth Fibonacci number. But to do that, we have to solve a sub problem, which is calculating the fourth Fibonacci number. To do that, we have to solve a sub problem, which is the third, and another sub problem, until we get to the base cases. So that's what dynamic programming is basically all about: taking a big problem, breaking it down into sub problems, and then solving those sub problems. The Fibonacci numbers are a very good example because each one is mathematically defined in terms of its sub problems. Most dynamic programming problems can actually be represented with equations like this. So this is the one dimensional case. Why this is considered one dimensional is because we have a one dimensional array, which is our solution space, right? We have to get the zeroth, first, second, third, fourth, fifth Fibonacci number. It's one dimensional. So that must imply that there are two dimensional dynamic programming problems, which is correct. And that's what we're going to talk about next. So next let's move on to two dimensional dynamic programming and let's solve a pretty familiar problem at this point: counting paths. So we're going to be given a two dimensional grid, we're starting at the top left, we want to go to the bottom right, and we want to count the number of paths that can lead us there. Now, in this case, though, there aren't any blockers. Before, we had zeros and ones, and ones were blockers, but this time we don't have that. And we have another thing that's going to simplify things for us: we can only move down or move to the right. So that's going to make things pretty easy for us. So for example, one path would look like this.
Another path would look like that. Since we can only move down or to the right, every path is actually going to be the exact same length. Because as we move down, we're getting closer to the bottom right. As we move to the right, we're getting closer to the bottom right. And, you know, eventually the paths are going to converge and they're both going to be the same length. Every single path to the result is going to be the same length. Now, based on what we learned in the graph section, we can do this recursively with depth first search. And so that's the brute force way to do this, but it's not going to be very efficient. So suppose we're given the number of rows and the number of columns. In this case, notice we actually don't care about the grid itself. It's not a parameter. What we actually care about are the dimensions of the grid. In this case, it's a square grid. So it's four by four: four rows and four columns. And the input is going to be the starting position. We know it's going to be zero, zero to start with. So when I'm actually calling brute force, I'm passing in zero, zero as the starting coordinates, and the dimensions are four by four. So we have some base cases. First of all, what if we go out of bounds? How are we going to go out of bounds in this case? Our row is never going to be too small, because we're starting here and we can only move down. So if we go out of bounds, it's because we went too far down, not because we went too far up. So when we reach this case, it'll be because our row is equal to, you know, four, which is the number of rows we have. So if R is equal to four, then we went out of bounds. Same thing with the column. We can only move to the right. We can't move to the left. So that simplifies things for us. So if we go too far to the right, so if C equals the number of columns, we can return zero. That basically means we went out of bounds. There's no valid path that goes out of bounds and reaches the result.
The other base case, of course, would be if, you know, we did reach the goal. So that would be if R is equal to rows minus one and the column is equal to columns minus one. In that case, we would return one. That means, OK, this is a path. Otherwise, suppose we're actually starting from here. It's not a base case. So what we would do is call brute force two times, because we can either go down, that's the first case where we say row plus one, or we can move to the right, where we say column plus one. So brute force, this is going to basically be a decision tree with two branches, because for every position, no matter where we're at pretty much, we can either go down or to the right. We can't go in four directions. We can only go, you know, in two directions. So this decision tree has two branches. What's the height of the decision tree going to be? Well, like I said, every path is going to have the same length. So any path in our decision tree is going to have the same height, roughly. And the height is basically going to be the number of rows plus the number of columns, because that's practically what the length of every single path to the destination is going to be. So we can say that the time complexity is two to the power of N plus M, where, let's say, these are the dimensions of the grid. So I've written that up here as well. So that's the time complexity. Obviously it's not very efficient. But we can make it much more efficient. Basically, the time complexity can be reduced to the size of the grid, because we're going to notice some repeated work. First, I'm going to quickly walk through a little bit of the brute force. And then we're going to see how we can use caching to improve the time complexity. So first we start here, we go down and we go to the right. So now we want to know how many ways can we reach the destination from here and from here? Well, from here, we're again going to go down, or we're going to go to the right.
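A sketch of the brute force just described, assuming the parameter names `r`, `c`, `rows`, and `cols` (the actual course code may name things differently):

```python
def brute_force(r, c, rows, cols):
    # Out of bounds: since we only move down or right, only
    # too-large indices are possible.
    if r == rows or c == cols:
        return 0
    # Reached the bottom-right destination: one valid path found.
    if r == rows - 1 and c == cols - 1:
        return 1
    # Two branches: move down (r + 1) or move right (c + 1).
    return (brute_force(r + 1, c, rows, cols) +
            brute_force(r, c + 1, rows, cols))

print(brute_force(0, 0, 4, 4))  # 20
```

This is the two-branch decision tree with the exponential time complexity discussed above.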
From here, we're going to do the same thing: go down or to the right. And then, you know, from here, suppose we're going to go down and to the right. And then from here, we're going to go down, which is a base case. That's going to return zero. And we're going to go to the right over here. From here, we're going to go down. We can't do that. So we're going to go to the right as well. From here, same thing. But finally we get to the destination. This is going to return one for us. So when we're calculating how many ways we can reach the destination from here, we're going to have a value of one now. What about from here? Well, here it was zero because we went out of bounds, but from here it was one. So we're going to take those two results, add them together, and we get a one over here. Same thing for this guy. There was a zero here and a one here. So add them together. We get a one. Now finally, when we get back up over here, we have the value below, but now we're trying to calculate the value that's over here. Well, from here, we have two choices: go to the right or go down. We have the value down here, but what's the value over here? We don't know. We have to go down and we have to go to the right. Well, what about from over here? Again, we can go down and to the right. Well, right is out of bounds, so that's going to return zero. Down is reaching the destination, so that's going to return one. Add those together. We get a one over here. Now we can calculate what goes over here: one plus one. Add those together. We get a two over here. What about in this position? There's a one over here and there's a two over here. Add those together. We get a three in this position. Then we can calculate what goes over here: one plus three. Add those together. We get a four in this position. So now you're starting to get the idea. So as we try to calculate, starting from here, how many ways can we reach the destination?
We pretty much have to solve the sub problem for every other position, because to solve this problem over here, we have to solve these two problems. But to solve these two problems, then we have to solve these next three problems. And to solve those, we have to solve these, et cetera, et cetera. We have to keep going until we get to the base case, which is the result, and then work our way back to what we were originally trying to do. When you do this recursively and then add caching to it, which I'm going to go over right now, this is called the top down dynamic programming approach. But as you might have noticed, we can eventually also start from the bottom and then work our way up, which is going to be the true dynamic programming approach, which is called the bottom up approach. But for now, let's focus on taking this brute force and adding caching to it. So let's undo a little bit of this work and go back to some of the basics that we were talking about over here. For every single position, no matter what it is, we can get how many ways we can reach the result from here just by taking the value below it and the value to the right, adding those two sub problems together, and then assigning it over here. But notice something. If we want to solve this sub problem, we have to solve this sub problem and this sub problem. But if we want to solve this sub problem, we have to solve this sub problem and this sub problem. Well, we have to solve this sub problem twice. That's where the repeated work is coming in. And it's going to get worse and worse. From here, we have to solve this and this. From here, we have to solve this and this. From here, we have to solve this and this. This is being done twice, as you can see. So there's a lot of repeated work going on. I could draw out the entire thing, and I encourage you to do that if you want to have a better understanding of it.
But the idea is there are going to be a bunch of cells that have repeated work. And it's going to be exponential, because you can see that there are two different ways that we can land in this spot. That means there are going to be two different ways we can move to the right over here. There are two different ways that we can land in this spot. So there are going to be two arrows actually going down here, two arrows going to the right. So it's going to get more and more and more. And if we had a really big grid, it would be very, very inefficient. So caching is going to help us a lot here. The good thing about caching is that, in terms of code, there are very few changes you actually have to make here. So essentially, the first base case is going to stay the same, and the second base case is going to stay the same. But in between them, we're going to add the case where we check our cache, which in this case is a two dimensional array, which is going to be the exact same size as our input grid. Well, technically, we don't have a grid, but the cache is going to be of those dimensions, you know, of the input. And you could also use a hash map if you wanted to. Generally, I prefer to use a hash map, but in this case, we're using a two dimensional grid. And our memoization solution has an extra parameter for that two dimensional grid cache that our brute force did not have. And when I'm passing in a grid here, you know, this is kind of complicated. If you're not familiar with Python, this is essentially creating a two dimensional grid with all zeros that's four by four. Even if you are familiar with Python, you still might not understand this, because this is, you know, kind of a weird syntax. Essentially, we have an array of size one with a zero inside of it. We multiply it by four, which gives us an array of size four with four zeros inside of it. And then we create an outer array, taking that array and creating four copies of it. That's where this comes from.
And that gives us basically a four by four grid with all zeros inside of it. It's a lot easier to do this in most other languages, thankfully. But when we have that cache, if we don't go out of bounds, take this, for example: we went down and to the right. And then from there, we went down and to the right. And then from here, we went down and to the right. But in reality, one of these is going to execute first. Let's say this one executes first. And then we calculate how many ways we can reach the result from this position. And once we have that, we're going to add it to our cache. So think of this grid as being our cache. We're going to put the value here. So as we saw before, all the values in the bottom row are going to be one. That makes sense, because from here, we can't move down. We can only move to the right. So from here, there's one path that can lead us to the result. And from here, there's one path that can lead us to the result. And from here, there's one path that can lead us to the result. The next row we already filled out earlier; it looked like this. So now let's briefly go over filling out the rest of it. So from here, we want to calculate the value. We go down and to the right. We don't have the value here. Let's go down and to the right. We don't have the value here. Let's go down and to the right. There's nothing on the right, but below there's a one. So here we can put a one as the result. Now we can calculate the value that goes here: two plus one, that's going to be three. Now we can calculate the value that goes here: three plus three is six. And we can calculate the value that goes here: four plus six is 10. So now, when this guy wants to go down and to the right to calculate the value that goes here, we already have the value that goes here. We cached it by building this grid. We added it to our cache. We're not going to run DFS. We're not going to run, you know, our recursive call from this position again and do all that repeated work.
We don't have to do that anymore. So from here, when we go down, we're going to check: is that value out of bounds? Nope. Is that value in our cache? Yes. So we can return that value in our cache, which is six. But we still have to calculate the value that goes over here. By the way, the order of this is somewhat important. Now, these two could be swapped. You don't have to put them in this order, but it is important that we put this before this, because we have to check if we went out of bounds before we try to index this two dimensional array. If we index it with, you know, an R or a C that's out of bounds, then we're going to get an exception thrown, or an error, or something like that. So this is usually the first check that you want to make. You want to make sure we're not out of bounds. That's really important. But otherwise, if none of the base cases execute, we're basically running the same thing we did with our brute force. We're running the recursive call, you know, going in both directions, either R plus one or C plus one, passing in the same, you know, hard coded values, rows, columns, whatever, and passing in that cache, which is a reference to the object. We're not creating a new cache every time we call the recursive function. We're only creating a single cache initially and then reusing it throughout the recursive calls. And then we add the result of those two together. So in this case, we're going to recursively call memoization over here; we're going to go down and to the right. And from here, we're going to go down and to the right. This is going to be a zero. There's a one over here. Add those together. We get a one here. What value goes here? Three plus one. So we get a four over here. Now we can finally calculate what goes over here. It's six plus four. So there are 10 ways from here that we can get to the destination, and 10 ways from here that we can get to the destination. Add those two together.
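A sketch of the memoized version being described, with the bounds check deliberately placed before the cache lookup (the names and the grid-of-zeros cache are my assumptions about the course code):

```python
def memoization(r, c, rows, cols, cache):
    # Bounds check must come first, so we never index the cache
    # with an out-of-range r or c.
    if r == rows or c == cols:
        return 0
    # If we've already solved this sub problem, return the cached value.
    if cache[r][c] > 0:
        return cache[r][c]
    # Destination reached: exactly one path from here.
    if r == rows - 1 and c == cols - 1:
        return 1
    # Same two branches as the brute force, but the result is stored
    # in the single shared cache before returning.
    cache[r][c] = (memoization(r + 1, c, rows, cols, cache) +
                   memoization(r, c + 1, rows, cols, cache))
    return cache[r][c]

# Build the 4x4 cache of zeros with a list comprehension, which avoids
# the aliasing pitfall of [[0] * 4] * 4 (four references to one row).
print(memoization(0, 0, 4, 4, [[0] * 4 for _ in range(4)]))  # 20
```

Each cell is now computed at most once, so the time complexity drops to the size of the grid.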
That means there are 20 different ways from the starting point that we can get to the destination. And these are all unique paths. So this is how it would play out if we did it recursively with memoization. But as you kind of saw, why should we even start at the top, work our way down, recursively, you know, solve the sub problems, and then go back up? Why not just immediately start at the bottom, fill in these values, get all the way to the top, and ultimately have that result anyway? Well, the time complexity will still be the same, because we still have to pretty much fill out this entire grid. As you saw right now, the time complexity is the size of the grid. We're not doing any repeated work. But in this case, we also have to declare a bunch of space. We have to create a two dimensional grid, which is going to be the same size. But the benefit of dynamic programming is sometimes we don't have to declare all that extra space, as we saw with the Fibonacci sequence. We were able to save space by using the bottom up approach. The same is true with this problem. Let me show you what I mean. So this is what the code for the true dynamic programming, aka bottom up, approach would look like. It's usually pretty short for dynamic programming problems. The bottom up approach usually has a lot less code than the recursive approach, but it's usually difficult to come up with unless you have a good understanding of the recursive approaches. And there are many dynamic programming problems that follow this same pattern of having a two dimensional grid and, you know, solving sub problems that are below and to the right and then eventually arriving at the solution. So in this problem, what we're going to do is start bottom up. We're going to start at the base case, calculate these values, then go here, calculate these values, and keep going like that. Now, the order that we're solving these sub problems in is intentional.
And we kind of have to do it this way: starting at the last row, going to the left, then starting at the next row up, going to the left, because, well, this is the base case, and for us to know this value, we have to get the bottom value. The good thing here is that for the last row, there's nothing below. We can just kind of assume that there's a bunch of zeros here if we really want to. And that's sometimes a technique that's used in more complicated two dimensional dynamic programming problems. In this case, we'll look below, that'll be zero, and we'll look to the right. And what we expect from here is that there's a one there. Now, it doesn't totally make sense: from here, how many paths are there that can lead to this position? Is there one path, or are there zero paths? Technically, to me, it makes sense that there would be zero paths. But as the base case, we put a one here, because that's just what allows our math to work out. If we put a zero here, then the value we're going to get here is going to be zero, and then here, and then here, and then all of these are going to end up being zero. We know that's not correct. So we put a one here. And then, when we want to calculate this value, we go below and to the right, take this, add it, and then we get a one here. So we have to compute it like this. And then, when we want to get to the next row, we already have all of these populated. So for us to calculate this position, we look below and to the right; there's nothing there, and we already have the value populated here. So we'd get a one here. And then we can populate this value only after we've gotten these two: we can add them together and get a two here. So it's going mostly in the same order that our recursive algorithm and the memoization one did; it's just that we're immediately starting at the bottom and working our way up. We're not even making the recursive calls starting from the top. So in terms of code, it would basically be the exact same thing.
What we're going to do here is declare a previous row, which is going to be all zeros. What that basically does is implicitly create a row over here; that's just telling us that there are all zeros over here. And then we're iterating through this 2D grid, starting here, you know, starting in the last row and then going up and up and up. And in our nested loop, we're starting at the last column and then working our way to the left. Well, technically we're not starting at the last column. We're starting at the number of columns minus two. The number of columns in this case is four, so that's what we're passing in here: four minus two. So we're starting at the second to last column. Why am I doing that? Well, because we know that the base case is going to be one anyway. So we actually fill in that base case: we create a current row. So this is going to be our current row, and we're going to fill in a one at the last position. And we actually know that for every row, the last value is going to be one, because from any of these, going down, there's only one way to reach the destination. We can't go to the right for any of these anyway. But notice how we're actually not creating a 2D grid, because it's not necessary. Notice that to fill in this entire row, all we need is the previous row, aka the row below it. And that's enough. We don't need anything from the row above, or the row above that. We just need the row below it. And to get the values in this row, we only need the values below that row. So at any given point, we're only going to have two rows in memory: the previous row and the current row. So the space complexity is going to be two times the number of columns. Let's say M is the number of columns. So this is the space complexity. This can be reduced, of course, to big O of M, because we don't care about constants. So that's the space complexity of this approach. Clearly, for the time complexity, we still have to iterate through the entire grid.
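Putting together the pieces described so far (the previous row of zeros, the current row, and the base-case one in the last column), a sketch of the bottom up approach, with names of my own choosing:

```python
def dp(rows, cols):
    # The implicit row "below" the grid: all zeros.
    prev_row = [0] * cols
    # Iterate through the rows bottom to top.
    for r in range(rows - 1, -1, -1):
        cur_row = [0] * cols
        # Base case: the last column of every row is one.
        cur_row[cols - 1] = 1
        # Fill the rest of the row right to left.
        for c in range(cols - 2, -1, -1):
            # Paths from here = paths going right + paths going down.
            cur_row[c] = cur_row[c + 1] + prev_row[c]
        # The current row becomes the previous row for the row above.
        prev_row = cur_row
    return prev_row[0]

print(dp(4, 4))  # 20
```

Only two rows are ever in memory at once, which is where the big O of M space bound comes from.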
So the time complexity is still going to be N times M. And if you're a little bit confused by the syntax that we're using here, this is Python, of course; you could do this in any language. But this is basically starting at rows minus one, and this is starting at columns minus two, you know, this column. And this is basically saying we're going to keep going until we've gotten to the zeroth row, and we're going to stop when we get to the negative one row. So we're not going to execute when we get to the negative one row, and we're not going to execute when we get to the negative one column position. That's why we have negative one here as well. And the last negative one basically means we're decrementing each time. So we're going to take the column and decrement it each time. In most languages, I think it's a little bit more obvious what the for loops are doing, and we have the code for those below if you want to check them out. So at this point, you probably get the idea, and I bet you could fill in this grid yourself if you really wanted to. But I'll do it really quickly. So when we want to calculate the value here, which is the current row at this column, we're going to take the current row at column plus one. That's our way of looking to the right: we take the current row and get the column plus one. To get the value below, we take the previous row at the same column. So we take those, add them together, and then assign that value here. It's going to be one. We're going to do the same thing here and here; they're all going to be one. Now, before we start looking at this row, we're going to take our current row, which was this, and assign it to the previous row. So this is now going to be considered our previous row. This is no longer going to be in memory. And now here, we're going to create a new current row of the same dimensions. It's going to initially be all zeros, but we're not going to care about that. And the last value here is going to be a one.
And then we're going to start calculating these. This one is going to be one plus one, two. This one is going to be three. This one is going to be four. And then we're going to do the exact same thing over here. This is going to be out of memory. This is going to be the new previous row, and this is going to be the new current row. There's going to be a one here. This is going to get a three, this is going to get a six, this is going to get a 10. And again, this is going to be removed from memory. This is going to be the new previous row, and this is going to be the new current row. There's going to be a one here. This is going to be four. This is going to be 10. This is going to be 20. And then finally, we're going to delete this from memory as well. This is going to be our previous row. We're going to return the zero value in that previous row. That's exactly what we wanted. We're going to return the 20. So this is, believe it or not, a relatively simple two dimensional dynamic programming problem. They can get a lot harder. But if you can get really good with the concepts that we talked about here, you know, understanding the idea of the decisions that we're making, not doing any repeated work, and basically filling in the values of a grid, you can go very, very far with more difficult dynamic programming problems. If you want to practice some, check out the practice section and look at the problems in the two dimensional dynamic programming topic. So now, finally, the last topic that we're going to cover is bit manipulation. And the reason that we're covering it last is because I think it's not really a super important topic, especially for coding interviews, but it's definitely something that comes up. So I think it's very important to at the very least understand some of the basic things that you can do with bit manipulation. And even beyond coding interviews, I think it's good to have a fundamental understanding of the basics.
So there are a few logical operations that you can do with bits. Now, first of all, what is a bit? Well, we talked about it close to the beginning of the course. You know, bits are just zeros and ones. Computers basically use zeros and ones exclusively under the hood, but most of that ends up being abstracted for us. But let's start with a few fundamental operations. One is the AND operation. So with zeros and ones, we can do a logical AND, aka a bitwise AND. So for two given bits, suppose zero and zero. Now, we usually use this character, the ampersand character, to indicate a bitwise AND, and that's actually the case for most languages. So I'll summarize the main operations in terms of code. Now, this is Python, but these operations are pretty much the exact same in most languages. So the AND operator is going to use this character. So zero ANDed with zero is the value zero. Basically, what the AND means is that both bits have to be one for the result to be one. But if that's not the case, the result is going to be zero regardless. So if the first bit is zero and the second bit is one, then the result is still going to be zero. If the first bit is one and the second bit is zero, again, the result is going to be zero. But only in the last case, where both bits are one, is the result going to be one. So that's where the name comes from: both have to be one, and then the result will be one. OR is sort of the opposite. Only a single bit has to be one, and then the result will be one. But if neither is one, then the result will be zero. So this is the character for OR. Continuing, zero OR one will be one, because only one of these has to be one. And then if the first one is one and the second one is zero, the result will again be one, because only one needs to be one. And if both of them are one, the result will be one. The third case is called exclusive OR, and this actually does come up occasionally in coding interviews.
It's basically that the result will be one only if exactly one of the bits is one, not if both of them are. So it's similar to OR, except it's exclusive: only exclusively one of these can be one for the result to be one, not both of them. The character for that is the caret. So zero XOR zero is going to be zero. Exactly one of these should be one for the result to be one. That's not the case, so we don't get a one. But if the second one is one, then we get a one in the result. If the first one is one, then we also get a one in the result. But if both of them are one, we get a zero in the result. So you can kind of take a look at these three and compare them. They're mostly straightforward. But if you've never seen these before, these are called truth tables. Basically, these are some logical operations that we can do. This is not something you'll necessarily have to draw out in, like, a coding interview, but it's good to understand the logical basics of what's going on here. Now, of course, there's the negation operator, which is the tilde in most languages. It'll just take any bit and take the negation of it. So, you know, if we take the negation of one, it's going to be zero. If we take the negation of zero, it's going to be one. It's pretty straightforward. These are really the only two cases. The last thing is bit shifting. So suppose we have some binary integer. Usually in programming languages we're talking about 32 bit integers, but I'm not going to draw all that out because, you know, it would take up a bunch of space and it's not super important to get the idea across. Let's say we had an integer that, you know, was zero zero one. So basically, this is the integer one in binary representation. Of course, we could draw out all 32 digits, but we're not going to in this case. Now, we take this and bit shift it to the left by one. That's what we're doing down here.
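Before moving on to shifting, the truth tables above can be checked directly in Python. One caveat: Python's tilde negates a whole signed integer, so `~1` evaluates to `-2` rather than `0`; flipping a single bit is usually done with XOR instead.

```python
# AND: both bits must be one.
print(0 & 0, 0 & 1, 1 & 0, 1 & 1)  # 0 0 0 1

# OR: at least one bit must be one.
print(0 | 0, 0 | 1, 1 | 0, 1 | 1)  # 0 1 1 1

# XOR: exactly one bit must be one.
print(0 ^ 0, 0 ^ 1, 1 ^ 0, 1 ^ 1)  # 0 1 1 0

# Negating a single bit by XORing it with 1.
print(1 ^ 1, 0 ^ 1)  # 0 1
```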
We're taking n, which is one, and shifting it to the left by one, and then assigning that to n. So what this would do is just take all these bits and then shift them to the left by one. So by doing this operation, we would get a result of this. What if we did this operation again on this integer now? Well, then we would get a result like this. What if we did it one more time now? Well, assuming in this case we just have an integer with three bits and we have no bits, you know, to the left of it: if we bit shift this one more time to the left, where does the one go? Because previously, we were seeing every bit get shifted to the left. Well, in this case, we have a one here. It's going to get shifted to the left, and then it's essentially going to be dropped off. It's going to be deleted. It's not going to circle back here. It's just going to completely be deleted, and then we would have this as the result. This zero would be moved to the left, and the vacated spot on the right is always going to be replaced with a zero. That's what we kind of saw throughout this example. When we shifted this by one, it was replaced with a zero. When we shifted this again, replaced with a zero. Now, one thing that's very important: when we have a binary value, let's suppose we have one zero one one. This is the ones place, right? This tells us how many ones we have. That's how numbers are represented in binary. The next digit is going to tell us how many twos we have. So this is the twos place, and then the next digit is going to be the fours place, and the next digit is going to be the eights place, and it's going to keep going like that. Now, this is somewhat comparable to base 10. If we were thinking about integers in terms of base 10, which is what we do most of the time, we're all used to thinking of numbers where the max value a digit can be is nine.
And then if we get to the value 10, we of course add a new digit for that. Right? We count one, two, three. We go all the way up to nine, and then when we get to 10, we add a new digit for that. That's called base 10. This is called base two, because we start at zero, then we get one, and by the time we get to two, we add another digit for that. So this in binary is not the number 10. This is the number two. That's why this is the ones place, this is the twos place, this is the fours place, eights place, et cetera, et cetera, right? Each time we move to the left, we're getting a new power of two. So essentially, this is two to the power of zero. This is two to the power of one. This is two to the power of two. This is two to the power of three, which is eight. And it keeps going like that. That's how binary numbers work. Whereas with base 10, you have a number like this. This is the ones place, aka 10 to the power of zero, which evaluates to one. This is the tens place, right? This is not three. This tells us we have 30, because it goes in the next spot. So this is 30, right? We're pretty familiar with that. That's basic math, but we call this the tens place, aka 10 to the power of one. This is the hundreds place. This isn't four. This is four hundred, right? So this is 10 to the power of two. 10 to the power of two, by the way, is a hundred. So it keeps going like that. This is the thousands place, et cetera, et cetera. That's why we call this base 10. So I wanted to kind of review that with you in case you weren't already familiar. And by the way, what happens if we take this and shift it to the left by one? We'd get something like this: four, three, two, and then we add a zero in this spot, similar to how we were doing it with binary. Well, shifting to the left and adding a zero when it comes to base 10 is essentially taking this and multiplying it by 10. If you're familiar with math, you know multiplying by 10 essentially adds a zero.
Well, in terms of binary, when we're shifting to the left by one, we're actually multiplying by two. That's a pretty basic and important idea to remember: as you shift to the left, you're multiplying by two. This in binary is the integer one. This in binary is the integer two, because we have something in the twos place. This in binary is the integer four, and it keeps going like that. So that's bit shifting to the left. We can also bit shift to the right. So suppose we had a value like this; by the way, this represents the value four in binary. Suppose we shift it to the right by one. We would get something that looks like this. Suppose we shift to the right by one again. We would get something that looks like this. Notice how as we shift, we took this one, moved it to the next position, and then we replaced what was on the left with a zero. That's similar to what we did when we were shifting to the left. So now we're going to shift to the right one more time, and we're going to get zero. And similar to how shifting to the left by one multiplies by two, when we shift to the right by one, we're dividing by two. This is four, this is two, this is one, and this is zero. So we're dividing by two when we shift to the right. But if we have an odd number, we're going to round down. So with all that being said, let's jump into an example problem where we're given some integer and we have to count the number of one bits. So suppose we were given an integer like 23, which, by the way, in binary would look something like this. In most languages, it would be represented with a 32-bit integer, so we would actually have 32 bits. In this case, we have five significant bits, so there would be 27 zeros that come before them, which would make this a 32-bit integer. But let's just draw out the five bits for simplicity's sake.
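The right-shift behavior can be checked the same way in Python; note the rounding-down on odd numbers:

```python
n = 0b100        # 4
n = n >> 1       # 0b010, i.e. 2
n = n >> 1       # 0b001, i.e. 1
n = n >> 1       # 0b000, i.e. 0
print(n)         # 0

# With an odd number, the 1 in the ones place is shifted out and lost,
# so the result rounds down: 7 >> 1 is 3, not 3.5.
assert (7 >> 1) == 3
```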
And suppose, given this integer, we want to count the number of one bits. N in this case is going to be 23 in base 10, but we know under the hood that's represented by some binary value, which in this case looks like one zero one one one. Most of the time, you'll be using a 32-bit integer to store this. Python actually has unlimited bits for its integers. But let's suppose we had 32 bits; since this is five bits, we would have 27 more bits, which are all leading zeros, so I'm not going to draw all of those out. We're just going to focus on these. So we want to count how many one bits its binary representation has. In this case, it has four. But how can we count them algorithmically? Well, it's not too complicated. We just need some of the operations that we already talked about. So in our function, we're given the parameter N. We're going to declare a count, which is going to be the count of one bits, initially zero. We're going to keep counting while our value N is greater than zero. We're assuming we don't have to worry about negative numbers in this case. And we know that the value zero itself, even in binary, will just be 32 zeros, so there aren't going to be any one bits for that anyway. So now we're basically going to count one by one. We're going to check the ones place.
Is there a one bit here, or is it a zero? How do we know? Well, we can use the bitwise AND operator. By taking N, which is right now 23, and bitwise ANDing it with one, we get an operation that looks something like this. This is what one looks like in binary; of course, it would have a bunch more zeros, but we don't care about those. When we bitwise ANDed single digits, we had this truth table over here, and it helps us when we bitwise AND multiple bits: we're basically taking each pair of bits and taking the bitwise AND of them. So we know that for these two, it's going to be one. And for this, it's going to be zero; for this, it's going to be zero; for this, it's also going to be zero; and for this, it's also going to be zero, because both of the bits have to be one for us to get a one in the result. Now, let me tell you something. When we take any number and bitwise AND it with one, the result we get is always going to be either zero or one. The reason is that this is one in binary: it has only zeros here, so we know everything in the result is going to be zero, except possibly this digit. And it all depends on what this digit happens to be. If this digit is a one, we're going to get a one in the output. If this digit is a zero, we're going to get a zero in the output. We're using the value one because it's special: it has all zeros except for this bit. So this will tell us if we have a one in the ones place or not. In this case, we are going to get a one, so this is going to evaluate to true, and then we're going to increment our count by one. And then what we're going to do is take our N and shift it to the right, which we know is the same as dividing by two. So when we take this and shift it to the right, we get something that looks like this, right? We're basically chopping off this bit. We don't care about it anymore.
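The ones-place check from this step looks like this in Python:

```python
n = 23             # binary 10111
# AND-ing with 1 zeroes out every bit except the ones place, so the
# result is 1 if the lowest bit is set and 0 otherwise.
print(n & 1)       # 1 (the ones place of 10111 is 1)
print(22 & 1)      # 0 (22 is 10110 in binary)
```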
Then we're taking the rest of these bits, shifting them by one, and replacing the leftmost digit with a zero. Now we're just going to repeat the same thing until this value becomes zero. So we're going to take the value one and bitwise AND these together. In this case, we're going to get a one in the output, because this is a one, and then we're going to increment our count by one again. So far, our count is two: we've counted two one bits. And we're going to do the same thing: this is going to be shifted to the right by one, we're going to get something like this as the value, and we're going to check whether this is a one or not by taking the bitwise AND. It is a one, so now our count is going to be three. Then we're going to shift this to the right by one and get something that looks like this. Now we're going to bitwise AND this to check whether this is a one or not. We're going to get a zero, so this is not going to evaluate to true, and we're not going to increment the count. Our count is still three. And then, lastly, we're going to take this and shift it to the right by one, and we're finally going to get this. We're going to check whether this is a one bit or not. It is, so we're going to increment our count; our count is now going to be four. And then we're going to shift this to the right by one again, and at this point, it's going to be all zeros, so our loop condition isn't going to be true anymore. Our loop is going to stop, and we're going to return the count that we calculated, which is four. So we had four one bits in this particular integer, which is 23. We didn't go super in depth on this because I don't think it's worth going into much further; understanding the basics is probably enough. And there are some practice problems in the practice section if you want to practice more.
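Putting the whole walkthrough together, the counting algorithm might be written like this in Python (the name count_bits is just a placeholder):

```python
def count_bits(n: int) -> int:
    """Count the 1 bits in the binary representation of n (n >= 0)."""
    count = 0
    while n > 0:
        if n & 1:        # is the ones place a 1?
            count += 1
        n = n >> 1       # drop the ones place (divide by two, rounding down)
    return count

print(count_bits(23))    # 4, since 23 is 10111 in binary
```

Each iteration inspects the lowest bit with `n & 1` and then discards it with `n >> 1`, exactly as in the walkthrough above, so the loop runs once per bit in the number.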
Bit manipulation problems usually just require a bunch of tricks, which is why I think they don't come up super often in coding interviews: either you know the trick or you don't, so they don't really make for good interview questions, in my opinion. But the basics that we talked about definitely can come up even outside of coding interviews, so it's good that you have now learned them.