Hello everyone. In this video I will introduce you to a new concept: linked lists. Linked lists are a challenging but important concept in computer science, and they are going to be your very first custom, home-built data structure in this class. So you're going to learn a lot from this experiment and this assignment.
Just a quick reminder of some key terms that we talked about before. An abstract data type, or ADT, is a conceptual thing that is comprised of two things: the data that it stores, and the operations that you can perform on that data. The example that we gave of an abstract data type before was an unordered list. You can create them, you can append to them, you can access things by index. That's an abstract data type that exists in just about every language. You also learned about stacks, queues and deques. Those are also abstract data types. Data types, without the "abstract", are concrete implementations of ADTs in a particular language. So you wrote Python versions of stacks, queues and deques. You created those data types. Finally, there are data structures: the way that the computer or your program organizes the data internally, inside the data type. The only data structure that you have been using so far is the array-based list, and we talked a lot about how that array-based list works in previous videos. I'll remind you again in just a second.
So here's a summary of the ADTs and the data types that we have talked about so far in this class. Now we're going to fit in something brand new and totally different. We're going to once again implement the unordered list abstract data type: search the list, remove from the list, append to the list, insert into the list. The list you know and love is an unordered list.
But you are going to be creating a new data structure called a linked list. The array list was more or less hard-coded in Python: you didn't have a lot of control over how that thing worked, you just used it. This one you're going to write from scratch, and you are responsible for implementing all those list behaviors. It's going to be challenging, but you'll learn a lot from it. At least I hope so.
So just as a reminder, an unordered list abstract data type is a collection of items, the data, where each item holds a relative position with respect to the others. The items stay in the order in which you insert them and remove them. They're not kept in sorted order. There are other abstract data types where, as soon as you put a thing in, it's kept in sorted order. That's not this. This is just the generic list that you know and love.
So when we go to implement a linked list, and we'll get to that shortly, you're going to write Python code that creates this new list class, and then you will be able to call its length. You will be able to add items to it, which is like inserting at the front. You will be able to append items. You will be able to pop items: popping removes from the end or, if you pop with a position, removes the item at that particular index. You can search for items by value and you can remove items by value. These are all things that you can do with a Python list as well, but you're going to write the code this time.
Now why in the world would you go through the hassle of writing your own list data structure when you can just use Python's array-based list? Well, the array-based list, while very powerful, also has some downsides. Remember when we talked about queues in an earlier video: if you were to use a Python list, or any array-based list, as the data structure for a queue, it must be the case that either the enqueue operation or the dequeue operation will be big O of n. Right?
And this is because if you add things at the front of an array-based list, or remove things from the front, then everything that comes after needs to shift down a slot or shift up a slot to maintain that contiguous block of memory. That can actually get very expensive if the list grows and grows and grows. So if you need a queue, but you need high performance, like a network switch that is processing millions of packets per second, where they're coming in, getting stored in a queue, and then coming out of the queue, it can be a real problem. So let me demo the real-life pains of this.
So I've got some code here, and I have written a queue data type. I'm calling it queue array. It looks very similar to the queue that you were asked to implement in the last assignment. Internally, the queue holds its data in a list. I am enqueuing at the end of this internal array-based list, and I am removing from the front of the array-based list, at index zero. So which one of these operations is big O of one, and which one is big O of n? Well, to answer that question, you have to know how the array-based list works. As we went over in a previous video, appending to the end of an array-based list is big O of one. Whereas dequeuing from the front here, when I pop zero, that pops the item at index zero, removes it and returns it, and everything that comes after index zero needs to shuffle up a spot. So that's the big O of n operation.
Okay, so I've got a queue. This is just a queue based on an array-based list. Now I'm going to show you some timing code. This is timing code very similar to what I gave you before and posted inside of Canvas. I'm creating a new queue here, one of my own data types, and what I'm going to do is count how long it takes to enqueue one million random numbers.
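A minimal sketch of the array-backed queue described above, with a stopwatch around enqueue and dequeue in the spirit of the timing code, might look like this. The names QueueArray, enqueue, dequeue and time_queue are assumptions for illustration, not the instructor's actual code, and the sketch generates random numbers in memory, with a much smaller count than the demo, so it finishes quickly.

```python
import random
import time


class QueueArray:
    """Queue backed by Python's built-in (array-based) list."""

    def __init__(self):
        self._items = []

    def __len__(self):
        return len(self._items)

    def enqueue(self, item):
        # Appending to the end of an array-based list is O(1) (amortized).
        self._items.append(item)

    def dequeue(self):
        # Popping index 0 is O(n): every remaining item shifts up one slot.
        return self._items.pop(0)


def time_queue(queue, values):
    """Return (enqueue_seconds, dequeue_seconds) for the given values."""
    start = time.perf_counter()
    for value in values:
        queue.enqueue(value)
    enqueue_seconds = time.perf_counter() - start

    start = time.perf_counter()
    while len(queue) > 0:
        queue.dequeue()
    dequeue_seconds = time.perf_counter() - start
    return enqueue_seconds, dequeue_seconds


if __name__ == "__main__":
    # Hypothetical stand-in for the text file of random numbers in the demo.
    numbers = [random.random() for _ in range(20_000)]
    enq, deq = time_queue(QueueArray(), numbers)
    print(f"enqueue: {enq:.4f}s  dequeue: {deq:.4f}s")
```

Because time_queue only relies on len, enqueue and dequeue, the same function could time any other queue class that offers those three operations.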
So there's a million random numbers in this text file. What they are isn't important; what I'm demonstrating is how long it takes to enqueue a million things. Python doesn't really care what those things are, just how many there are, right? That's kind of the essence of big O of n: how many things? What is the size of the input? Well, there's a million items in here. I'm grabbing the start time of this operation, and then I open the file, and for each line in the file, I enqueue it into my queue. Enqueuing is going to be big O of one, so this should be pretty fast. And then I stop my stopwatch down here and I print out how long it takes.
Now my queue is full at this point, right? I've put everybody in there. And I'm going to dequeue all the items, all million of them. So I say while the length of the queue is greater than zero, which means there's still something in there, just dequeue it. And then I also have some code here that's going to print out how many items are left in the queue. This is going to be the expensive operation to time, because my dequeue operation here is popping from the front of the array list.
Okay, so let me run this. Enqueuing a million items only took half a second, including reading them from a file. So that's not too bad, right? That's pretty quick. But now you can see I'm dequeuing, and this is considerably slower. This is that big O of n operation. Every time I dequeue something, I've got to copy, you know, a couple hundred thousand items up a slot in memory. And that takes time, right? So if you are a network router, and you've got a million packets in your queue that you need to process just as quickly as possible, you may be able to process the piece of data at the front very quickly. But now you've got to shuffle everything up a slot.
Not fast enough, right? Just not fast enough. If your router used an array-based list like this, you would be mad, right? So, yeah, we're kind of crawling toward the end here. I don't know if you've noticed, but it actually speeds up really quickly at the end, because there's less and less left to shift. So it took me a little over a minute to dequeue a million items from my array-based list. Not good, especially when it only took me, what was it, like half a second to enqueue that million items? Not great performance. But we can do better.
Now, again, the problem right here is that we are using an array-based list. If we use a different data structure, we can improve dequeuing. So I'll just demonstrate that to you very quickly. You may have seen, up here at the top, that I am importing a class called linked list. This is code that I wrote and that you will write in your next assignment. It is the linked list data type that kind of wraps the linked list data structure. And I have some code down here that I'm just going to uncomment. What I do is make a different kind of queue. In my queue, I can still get the length, I can still enqueue, and I can still dequeue. And in fact, if you look at this code compared to the array-based one, it's exactly the same. Let me uncomment the code for the queue array so you can compare, and I'll share this with you, though it won't really help. Between the queue for the array and the queue that's based on a linked list, the only thing that's different here is this: I'm using my own custom data type, whereas before, I was using Python's built-in data type for a list.
So now I'm going to run that exact same experiment, except I'm going to use my queue with my linked list instead of Python's array-based list. Let's run it. Exact same experiment. Enqueuing a million items takes a little over two seconds.
So I've lost some performance there, but I have gained performance in dequeuing. Before, enqueuing was half a second for the array-based list. Now I'm almost five times slower. That's not great. But before, dequeuing took 67 seconds. Now it takes 2.3. So, what, roughly 30 times faster to dequeue. So which one of these do you need to use in a high-performance setting? We can actually optimize this more and get these numbers down. But clearly this one's going to work better for you, because enqueuing and dequeuing take about the same amount of time. And the reason for that, again, is that I'm using this thing called a linked list data structure. You're going to implement one, and we'll talk more about what that looks like in just a second.
So let me switch back over to my slides and get back to where we were. Let me get rid of myself so you can see everything. Why was the array list not so great for the queue? Well, remember, reading and writing a list element when you're using an array list data structure is simple thanks to math, right? We can take into account that this object header takes a fixed amount of space, we can take into account that the length variable takes a fixed amount of space, and then use some math to figure out where we append things, based on the length value, or where, say, index one is in this list. It's all very fast array-based arithmetic. But inserting and removing anywhere except the end is big O of n. And that's what we're doing in the queue: we're dequeuing from the front. When I insert Paris at index zero, what has to happen? Well, these guys have to move down a slot. And that's the expensive part. I've got to copy everybody down, and then I can insert.
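To make the "copy everybody down, then insert" step concrete, here is a small sketch that mimics a fixed block of slots with an ordinary Python list and counts the copies. The function name insert_with_shift and the three names already sitting in the slots are assumptions for illustration; a real array list does this work in low-level code, not in Python.

```python
def insert_with_shift(slots, length, index, value):
    """Insert value at index in a fixed-capacity block; return the number of copies made."""
    if length >= len(slots):
        raise IndexError("no free slot; a real array list would have to reallocate")
    if not 0 <= index <= length:
        raise IndexError("index out of range")
    copies = 0
    # Walk backward from the first free slot, copying each item down one slot.
    for i in range(length, index, -1):
        slots[i] = slots[i - 1]
        copies += 1
    slots[index] = value
    return copies


slots = ["Alice", "Bob", "Charles", None]  # three items in use, one free slot
copies = insert_with_shift(slots, 3, 0, "Paris")
print(slots, copies)  # ['Paris', 'Alice', 'Bob', 'Charles'] 3
```

Inserting at the front of a list with n items costs n copies, which is where the big O of n comes from; inserting at the end costs none.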
There's another problem with the array-based list: sometimes I run out of room. So say I have a list and it's taking up this part of memory, but maybe my Windows or Mac operating system has put something else here, right after it. And I need to append Charles to my list. Well, I can't write Charles in this slot. Something else is there. If I overwrote it, that's a memory issue. That's a security issue. Really bad juju is going to happen if you overwrite stuff that you're not supposed to. So what does Python do? Well, if it detects that there's not enough room, it's got to find a new place in memory that does have enough room, and it's got to copy everybody over. This is also an expensive thing: the copying process is big O of n. For this reason, array lists usually grow by a lot every time they reallocate, in many implementations doubling in size, so that they're not constantly copying to new locations.
All right. So array lists are great for some things, but they're not so great for other things. Let's talk about the linked list data structure. A linked list is an alternative way of implementing an unordered list. Inside a linked list, there is the concept of a node. You're going to have to know what a node is; it's a term you're going to hear a lot. The node itself will be a Python class. And inside the node, it really just holds data. That's its job. It holds the data item that you care about. So, you know, list sub zero gets Sally: Sally is the data item. It can be whatever, an integer, a string, any object, just like in a Python list. It's the data that goes in the slot that you care about. The node also stores a reference to the next node that comes after it. In other words, it's got a variable inside it that points to the node that comes after it.
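As a rough sketch, a node class along these lines could be as small as the following. The attribute names data and next follow the terms used in the lecture, but the exact class layout, the optional next_node parameter, and the second name Bob are assumptions for illustration.

```python
class Node:
    """One item in a linked list: the data plus a reference to the next node."""

    def __init__(self, data, next_node=None):
        self.data = data        # the value you care about (any object)
        self.next = next_node   # the node that comes after this one, or None


sally = Node("Sally")
bob = Node("Bob")       # hypothetical second item
sally.next = bob        # Sally's node now refers to Bob's whole node

print(sally.data, sally.next.data, bob.next)  # Sally Bob None
```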
And the way to think of this is as a chain. A linked list, like chain links, is comprised of these nodes. Each node has a value in it, the data item you care about, and then a reference to the next node, the link in the chain. That's the way the list is connected together.
Now, when we go to write this, and we'll switch to some code in a later video, there are two classes involved. Don't worry too much about the slide right now. One is the linked list itself. It's going to be the wrapper. You're going to tell the linked list, hey, I need to append data to you, I need to remove an item by value, I need to look up an item by its index. You will give that to the linked list class. But internally, inside the linked list data type, will be these nodes that form a chain together. All the linked list will know is the beginning of the chain, which we call the head, the end of the chain, which we call the tail, and how many nodes are in the chain. So just bear in mind, as we're talking about this, that you're going to be implementing two separate classes: one for the individual links in the chain, and one for the linked list, the whole chain itself and the operations you do with it.
So let's visualize this. You have a handout at your disposal, but linked lists are pretty easy to visualize, actually. Here's our visualization of one node in the chain, one node in the linked list. It's got data. That's the value you're storing at one index of the list, one particular place, one data item. And then the next. Next is going to be a variable, and it is going to refer to the next node in the chain. The linked list itself looks like this. You have a head of the list.
That's the first item in the list, the item at index zero. The head of the linked list is a term you have to know: it is a variable that refers to the first node in the list. In this case, that first node is a node containing the value Paris. The node containing Paris then points to, has a reference to, the node containing Alice. Now what is a reference? What is a pointer in Python? Very simply, it is going to be a variable, and the variable's value is going to be this node. That's it. We're not introducing anything new here. Remember, variables in Python are references to an address in memory. So this is just a variable, and it knows where Alice's node lives in memory. That's it.
The next node has the data value Alice, and its next points to the node containing Bob. The nexts in the nodes are always going to point to a whole node. They don't point to just the value Bob. They point to this whole thing. Again, envision a chain link: it is linked to its next node physically, the whole node. Bob is then linked to Charles, and Charles is what we call the tail. Charles is the end of the list. He is the item at index negative one. He is the last node, and he has no next. His node's next value is None. Nothing comes after Charles. So this is the linked list, and these references are the links.
So let's visualize these linked list operations and try to concretize what we're talking about. I'm going to switch to showing my document cam, my whiteboard here. Dig out my marker. So we're working with this worksheet here, and we're going to conceptually draw our linked list, just to make sure that we've got a fairly good idea of how the linked list is manipulated conceptually. Right?
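The Paris, Alice, Bob, Charles chain from the slide can be wired up by hand in a few lines. This is only a sketch with bare variables named head and tail, assuming a simple Node class with data and next attributes; the assignment's linked list class will keep track of these for you.

```python
class Node:
    """One link in the chain: a data item plus a reference to the next node."""

    def __init__(self, data):
        self.data = data
        self.next = None


# Build the four nodes from the slide.
paris, alice, bob, charles = Node("Paris"), Node("Alice"), Node("Bob"), Node("Charles")

# Each next refers to a whole node, not just to the value inside it.
paris.next = alice
alice.next = bob
bob.next = charles   # charles.next stays None: nothing comes after the tail

head = paris     # the node at index 0
tail = charles   # the node at index -1

# Follow the links from the head to see the order of the chain.
names = []
current = head
while current is not None:
    names.append(current.data)
    current = current.next
print(names)  # ['Paris', 'Alice', 'Bob', 'Charles']
```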
Now the code's going to be a different animal altogether, but it's going to be good for you to have a really clear idea of how the linked list changes when you do different operations. So let's explore that.
Step one on my linked list is I create a new one, a new empty list. When I create a new empty list, the linked list itself has three pieces of information in it. The first is the head of the list. That's going to be the first node in the list, the node at index zero. It has a tail, which is the last node in the list, the node at index negative one. And then it also has a length. I'll write the length over here. Now, when we initialize the empty list, there's nothing in it, so the length is zero. Hopefully that's obvious. But what are the values of the head and the tail? If you're thinking in code terms, well, there's nothing in this list yet. It's empty. How can you denote the absence of a value in the Python language? Well, there's a special value for that, and it is None. So both head and tail start out as None after we've done step one. We've got a new list. It's empty. We're basically saying, create some space in memory for this, and now we're ready to use it.
So let's move on to list.add("Harry"). What is adding to a list? Add is not a method that Python lists have, you know, the array-based list. What adding means for us in linked list terms is: add to the front of the list. Add at index zero. Or, using linked list terms, add at the head. So when I add Harry, I'm going to draw a node, and the data value in the node is Harry. It's a string. That's his data. In fact, let me write the word data up here. That's the data. Now a node has two pieces of information.
It's got the data and it's got what comes next. What comes next? Well, I have added to an empty linked list. My list now has one item in it. That's this. So there actually is no next, because nothing comes after Harry. He's the only thing in the list. So I'm going to draw a little x here. Anytime I draw an x, x equals None, if you're thinking in code terms. Harry has no next. None next. Horrible grammar, right? But that's it. Now, I've added an item, so the length of my list is now one. Let me use a different marker here.
The linked list itself is defined by two variable values: the head of the list and the tail. So where is the head of the list? It's a variable, and it is going to point to Harry. He's the only thing in the list. The list starts with Harry. And where is the tail of the list? Well, the tail is also Harry, because he's the only thing in the list. Remember, the head is the beginning of the list, the item at index zero. The tail is the item at the end of the list, the item at index negative one. So the tail is also Harry, because there's only one item.
All right. We've just completed step two in our worksheet. So now we're going to go on to step three, which is adding Hermione to the list. I'm going to move my steps over here. What does adding do? Adding adds to the front of the list. I'll write it down here: add is the same thing as insert at index zero, some value. So I want to add Hermione to this list. I want to insert her at index zero. Well, where does she go? Conceptually, she goes before Harry. Hermione's got a long name, so I'm going to abbreviate it. Okay.
H-E-R-M. Herm, that's unflattering. Okay, so I make a new node with Hermione in it. Conceptually though, what's going to happen? She's up here. And because I have added her to the front of the list, she is now the new head of the list. She's happy about that. Hermione is now the head of the list, because we added her at the beginning. The head of the list is super important. In a linked list, the head must always refer to the first item in the list, the item at index zero. So we have to update the head here.
Now, something else has to happen. There's no link between Hermione and Harry at this point. We need to create the link. So who comes after Hermione in the linked list? Well, it's Harry. Here's her data. Here is her next. Hermione's next needs to point to Harry. Very important. This is creating the link. You have to create the links and you have to maintain the links. We're going to get into the code of what that looks like, don't worry. But you have to conceptually maintain these links. If you don't put the link between Hermione, who is the new head, and Harry, who is already in the list, Harry kind of floats off. You chop off the link and you've actually got two linked lists. And that's not good. That's not going to work. It's not like a Python array-based list, where everything is packed together. Each node needs to know where the next one is. That's what makes this work. It's also what makes it fundamentally different from an array-based list. With an array-based list, everybody's packed right into this little contiguous block of memory. With a linked list, that's not necessarily going to be true. The nodes could be all over the place, and the thing keeping them together is going to be these references. Okay. All right.
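Worksheet steps one through three (create the empty list, add Harry, add Hermione) could be sketched like this. The class and attribute names are assumptions based on the terms used in the lecture, not the assignment's required interface.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        # Step 1: an empty list has no head, no tail, and length zero.
        self.head = None
        self.tail = None
        self.length = 0

    def add(self, data):
        """Insert data at the front of the list (index 0)."""
        node = Node(data)
        node.next = self.head   # the old head (or None) now comes after the new node
        self.head = node        # the new node becomes the head
        if self.tail is None:   # first item ever added: it is also the tail
            self.tail = node
        self.length += 1


names = LinkedList()
names.add("Harry")      # step 2: Harry is both head and tail
names.add("Hermione")   # step 3: Hermione becomes the head, linked to Harry
print(names.head.data, names.head.next.data, names.tail.data, names.length)
# Hermione Harry Harry 2
```

Setting the new node's next before moving the head is what keeps Harry from floating off: the old head is captured as the link first, and only then is the head variable updated.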
So we've added Hermione to the linked list, and we've updated the head to point to Hermione, because she's the first thing. Harry is still the tail, because he comes after Hermione. And our length is now two. I have to find myself a new marker here. Our length is now two. All right.
Step four: append Ron. So we're going to append Ron to this list. Well, where does an item go in a list when you append it? It goes at the end. So let's first create a node for Ron. He's down here. Ron is going to be at the end. So what has to happen? We're not changing the head, because Hermione is still the head of the list. Then Harry. Now Ron. But three things have to happen. Why don't we do the simple thing first? Let's update the length. We've now got three items in our list. Now what needs to happen? Ron's here floating off by himself. Harry needs to know that Ron now comes after him. So we've got to update Harry's next variable: we need to create the link down here to Ron. So Ron comes after Harry in the list. We also need to update the tail. The linked list always knows who its head is and who its tail is. And the tail is no longer Harry. It is now Ron. So we had to do three separate things there, right? We had to update the length. We had to point Harry's next to Ron. And then we had to update the tail to point to Ron too. This is all work you are going to have to do. Before, Python kind of took care of it for you, but now it's up to you to implement it.
There's also something else that's kind of interesting here. Let's say that you are interested in getting the value at index one out of this list. How do you get to index one in this list? Well, the head is index zero. The tail is index negative one.
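The three updates for append described above (link the old tail to the new node, move the tail, bump the length) might be sketched as follows. The class layout is an assumption for illustration, and for brevity this sketch builds the Hermione, Harry, Ron list with three appends rather than the two adds and one append from the worksheet.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def append(self, data):
        """Add data at the end of the list."""
        node = Node(data)
        if self.tail is None:
            # Empty list: the new node is the head as well as the tail.
            self.head = node
        else:
            # Link the old tail to the new node so it does not float off.
            self.tail.next = node
        self.tail = node    # the new node is now the last one
        self.length += 1


names = LinkedList()
for name in ("Hermione", "Harry", "Ron"):
    names.append(name)
print(names.head.data, names.head.next.data, names.tail.data, names.length)
# Hermione Harry Ron 3
```

Because the list keeps a tail reference, append never has to walk the chain: it touches the old tail, the tail variable, and the length, and nothing else.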
The only way to get to index one is to start at the head and follow the link. We have to start here and go here. This is called walking the list, or traversing the list. We'll talk more about it when we get into some code. But there's no way to access Harry directly, right? The list itself holds no pointer to Harry, only to the head and the tail. And there's no fancy math we can do here either. It won't work, because the linked list is not going to be stored in a contiguous block of memory. So the only way to get to Harry is to start at the head and go down the list, walking the nexts until you get there. And that's going to be the big challenge in implementing linked lists, okay?
All right, let's finish off here. Step five says: pop the item at index one. Well, how do we do that? We're going to start at the head, which is index zero, and we're going to go to the item at index one, which is Harry. And now how do we pop it? What does it mean to pop, conceptually? It's going to be a very complicated operation in code. I'm not going to lie to you, it's going to be tricky. We'll talk through it. But conceptually, just looking at this, what happens? To pop something means you remove it and return it. So if I pop Harry, the value for Harry comes out. I'll return it. And let me just go ahead and remove him. Let's get his node out of here. Oh, Harry's gone. Now you've got a problem. What's the problem? You've got Hermione at the head of the list and you've got Ron hanging out down here. The chain between Hermione and Ron has been broken. Gone. So what we have to do, conceptually, is create that link again, because Ron now comes after Hermione. We need to update Hermione's next to point to Ron. That's going to be tricky in code, but, you know, we'll get there. Conceptually though, this has to happen. I took Harry out.
I have to maintain the link to the next item. There are two items now, so we update our length. We don't have to update the head or the tail in this case, because I didn't remove either of them.
Now list.pop(0), step six. So what's going to happen here? I want to pop zero. Who's the item at index zero? That's Hermione. Well, it's relatively easy, conceptually, to get rid of her. I just get rid of her node. She is gone, as is her link to Ron. I'm popping zero, which removes and returns. But now who's the head of the list? Again, the head of a linked list always points to the first node in the list. Who is the first node in this list now that Hermione is gone? Well, it's Ron, so I've got to update my head to point to Ron, and the length goes down to one. All right. Makes sense.
You've got to get really, really comfortable with how these linked list operations manipulate the conceptual list: updating the head, updating the tail when necessary, updating the length, and updating the links between the items. Have that in your head, conceptually, because the code is going to be pretty tangly and messy for this. It's okay. We'll get you there. But if you have a good conceptual view of how these things change when you add and remove items, that'll really help you a lot when you go to write this code and debug it.
All right, final step. Step seven: list.remove("Harry"). Harry is not in the list, is he? So what do we do? Well, frankly, the list does not change. We're trying to remove Harry from the list in step seven, but he just isn't there. So we don't change the list. In code, what are we going to do? Well, what I invite you to do is go into a Python terminal, make yourself an array-based list, and try to remove something that's not in that list.
What happens? You get an exception, okay? We are going to have exceptions occurring, as well, if you try and do things like remove Harry from this list, okay? And we'll implement that in code, all right? So be comfortable with conceptually how the linked list gets edited and manipulated from these various operations. Next time, we're going to talk through some Python code constructs we're going to need, some hopefully straightforward stuff, and then after that, we'll get into talking about the dirty, nitty-gritty details of implementing this new data structure. See you next time.
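For reference, the whole worksheet, steps one through seven, can be replayed with a compact sketch like the one below. Everything here is an assumption for illustration rather than the assignment's solution: the helper names _unlink and to_list are made up, and raising ValueError and IndexError simply mirrors what Python's built-in list does for remove and pop.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.head = None
        self.tail = None
        self.length = 0

    def __len__(self):
        return self.length

    def add(self, data):
        node = Node(data)
        node.next = self.head
        self.head = node
        if self.tail is None:
            self.tail = node
        self.length += 1

    def append(self, data):
        node = Node(data)
        if self.tail is None:
            self.head = node
        else:
            self.tail.next = node
        self.tail = node
        self.length += 1

    def pop(self, index=-1):
        """Remove and return the item at index (the last item by default)."""
        if index < 0:
            index += self.length
        if not 0 <= index < self.length:
            raise IndexError("pop index out of range")
        # Walk the list, remembering the node just before the target.
        previous, current = None, self.head
        for _ in range(index):
            previous, current = current, current.next
        self._unlink(previous, current)
        return current.data

    def remove(self, data):
        """Remove the first node holding data; raise ValueError if absent."""
        previous, current = None, self.head
        while current is not None and current.data != data:
            previous, current = current, current.next
        if current is None:
            raise ValueError(f"{data!r} is not in the list")
        self._unlink(previous, current)

    def _unlink(self, previous, current):
        # Re-create the link around the removed node, then fix head/tail/length.
        if previous is None:
            self.head = current.next
        else:
            previous.next = current.next
        if current is self.tail:
            self.tail = previous
        self.length -= 1

    def to_list(self):
        values, current = [], self.head
        while current is not None:
            values.append(current.data)
            current = current.next
        return values


names = LinkedList()           # step 1
names.add("Harry")             # step 2
names.add("Hermione")          # step 3
names.append("Ron")            # step 4
popped = names.pop(1)          # step 5: returns "Harry"
first = names.pop(0)           # step 6: returns "Hermione"
try:
    names.remove("Harry")      # step 7: Harry is no longer in the list
except ValueError as error:
    print(error)               # 'Harry' is not in the list
print(popped, first, names.to_list(), len(names))  # Harry Hermione ['Ron'] 1
```

Keeping a previous reference while walking is one common way to re-create the link after a removal, since a node only knows what comes after it, never what came before.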