If you major in computer science, you'll almost certainly take a first-year course called something like Intro to Data Structures and Algorithms. It covers the basic collection types most commonly used in code and the algorithms associated with those collections, primarily searching and sorting. The course also tends to serve as a student's introduction to the formal analysis of algorithms, that is, the student's real introduction to proper computer science. In this unit we're just going to look at data structures, and we'll follow up with searching and sorting algorithms in another unit.
When we talk about data types, we can make a distinction between primitive types and composite types. Primitive types are atomic pieces of data like individual numbers, strings, or Booleans. A composite type, in contrast, is a bundling of these elements together in a contiguous fashion, meaning that they're all placed next to each other in memory. Composite types come in two forms. An array is a sequence of homogeneous elements, one after the other in memory. A record, more commonly just called an object, is a bundling of heterogeneous items. Say, a record representing a person may start with a string, which is the person's name, followed by a number, which is their age, followed possibly by other pieces like their address or nationality. Very importantly, though, if this is truly just a record, the pieces that make up that person are placed contiguously in memory, one after the other. For efficiency reasons you might have padding in between the elements, but aside from that all the pieces are contiguous.
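To get a feel for contiguous records and padding, here is a small sketch using Python's standard struct module. The record layout (a one-byte character followed by a four-byte integer) is a made-up example, not something from this unit, and the native size depends on your platform.

```python
import struct

# Hypothetical record: a 1-byte character followed by a 4-byte integer.
# "=" asks for standard sizes with no padding between fields;
# "@" asks for the platform's native alignment, which may add padding.
packed_size = struct.calcsize("=ci")
native_size = struct.calcsize("@ci")

print("packed:", packed_size)  # 5 bytes: 1 + 4, laid end to end
print("native:", native_size)  # platform dependent; often larger due to padding

# Pack one record and look at its raw, contiguous bytes.
record = struct.pack("=ci", b"A", 30)
print(len(record), record)
```

On many machines the native size comes out larger than the packed size, which is the padding mentioned above.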
Now, composite types may be considered data structures, but the term data structure is broader: a single data structure may also encompass a group of composite types that are connected by references. In one array or record, say, you'll have a reference, a pointer, to some other array or record. Effectively we're talking about data which is related but not necessarily grouped contiguously in memory.
Aside from arrays and records, perhaps the simplest data structure is what's called a node. A node is a record with two elements: a value and a reference, which may point to another node. So for example, here we have two nodes, with the reference of the one on the left pointing to the starting address of the other node in memory. The point of a node is to represent a value but potentially associate that value with some other value, as pointed to by the reference. As we'll see, nodes are the building blocks for larger data structures such as the linked list, or in some cases graphs and trees.
First, considering just a node, how would we represent this in code? Well, in Python we would create a class, and we'd call it Node, obviously. We'll have our Node class simply inherit from the object class, which, remember, is the generic type at the top of Python's type hierarchy. In our constructor method, recall that in Python the object itself is passed to the first parameter of a method, so by convention we call it self. We'll give our constructor two other parameters: item, and an optional parameter other, which by default has the value None. So when we create a Node object without providing an argument for other, it will have the value None. In the body of the constructor we simply give the Node object two attributes, item and other, and assign to them the values passed to the item and other parameters.
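A minimal sketch of such a Node class might look like the following. It includes the constructor just described plus the getter and setter methods discussed next; the exact method spellings (getItem, getOther, setItem, setOther) are assumptions based on the narration.

```python
class Node(object):
    """A value plus an optional reference to another node."""

    def __init__(self, item, other=None):
        self.item = item    # the value this node represents
        self.other = other  # reference to another Node, or None

    def getItem(self):
        return self.item

    def getOther(self):
        return self.other

    def setItem(self, item):
        self.item = item

    def setOther(self, other):
        self.other = other


# Two nodes, with the one on the left pointing to the one on the right.
right = Node("b")
left = Node("a", right)
print(left.getItem(), left.getOther().getItem())  # a b
print(right.getOther())                           # None
```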
The item attribute is the value of the node, the item represented by this node, whereas other is a reference pointing to some other node. By default, though, it's None, so by default we have a node which doesn't point to any other node. As for the other methods of our class, the things we want to do with our node are: retrieve the value with getItem, retrieve the node pointed to by this node with getOther, change the value of the node with setItem, and lastly change which other node is pointed to by this node with setOther. So, a getter and a setter for the value of the node, and a getter and a setter for the reference to another node.
As we've already established, an array is a sequence of elements which are all stored contiguously. Importantly, those elements are all of the same type, they're homogeneous, or failing that they're at least all the same size. The array itself is also of a fixed size: once it's created it doesn't shrink or grow. A list, in contrast, is a variable-sized sequence of elements. The number of elements can change through the lifetime of the list. Likewise, the elements of the list need not all be of the same type and size. They can be heterogeneous.
The two most common ways to implement a list are a linked list, which is a list composed of nodes, and an array list, where the list is made up of one or more arrays. In a linked list, each element is represented by a node, with the first node designated as the head of the list and the last node designated as the tail. The advantage of the linked structure is that it makes it easy to add and remove elements from the middle of the list.
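To make that claim concrete, here is a rough sketch of inserting a value into the second position of a three-node chain. Only two references change. The Node class is a pared-down stand-in, and insert_after and to_list are hypothetical helpers introduced just for this illustration.

```python
class Node(object):
    # Pared-down stand-in for the node described above.
    def __init__(self, item, other=None):
        self.item = item
        self.other = other


def insert_after(prev, item):
    """Insert a new node holding item right after prev (hypothetical helper)."""
    new_node = Node(item, prev.other)  # reference 1: new node -> old successor
    prev.other = new_node              # reference 2: predecessor -> new node
    return new_node


def to_list(head):
    """Walk the chain from head to tail, collecting the values."""
    items = []
    node = head
    while node is not None:
        items.append(node.item)
        node = node.other
    return items


head = Node("a", Node("b", Node("c")))
insert_after(head, "x")  # "x" takes the second position
print(to_list(head))     # ['a', 'x', 'b', 'c']
```

None of the existing nodes had to move in memory; the new node simply gets spliced into the chain.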
For example, given this list of three elements, if we want to insert an item such that it takes the second position and shifts everything after it one spot over, we can do so by simply creating that node and updating two references: the reference of the node that precedes the insertion point, and the reference of the new node itself. This works because, unlike in an array, the elements of this sequence need not be stored contiguously. Each node is allocated separately and given its own spot in memory, wherever that may be, and the logical sequence of the list is formed simply by the chain of references. The elements all belong to the logical order of the list, but their actual storage in memory might be in any order. As long as we keep track of where the head is located, we can get to any of the elements. We can follow the chain from one node to the next, all the way to the tail. The end of the list is simply denoted by the tail node, a node in which the reference is null. It's the one node that doesn't point to another node.
So now consider how we might implement a linked list as a Python class. In this case, we'll keep things simple and not include any operations for removing or inserting items in the middle of the list, but rather just include operations for retrieving the value of an item in the list, changing the value of an existing node, and appending an additional item to the list, tacking on a new node. So first off, we'll call this class LinkedList and have it inherit directly from the object type. In the constructor we do nothing but give our LinkedList object an attribute, head, which will be our reference to the head node. We'll keep things simple here and say that a new linked list always starts empty, with no nodes whatsoever. So in the constructor we simply assign None to self.head.
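Pulling the pieces together, the class could look something like the sketch below. It includes the constructor just described along with the append, getItem, and setItem methods that are walked through next. The identifier spellings are assumptions based on the narration, so treat this as one plausible reconstruction rather than the exact code on screen.

```python
class Node(object):
    def __init__(self, item, other=None):
        self.item = item
        self.other = other

    def getItem(self):
        return self.item

    def getOther(self):
        return self.other

    def setItem(self, item):
        self.item = item

    def setOther(self, other):
        self.other = other


class LinkedList(object):
    def __init__(self):
        self.head = None  # a new list starts empty

    def append(self, item):
        new_node = Node(item)
        if self.head:
            # Follow the chain until we reach the tail (its reference is None).
            node = self.head
            while node.getOther():
                node = node.getOther()
            node.setOther(new_node)
        else:
            self.head = new_node

    def getItem(self, idx):
        node = self.head
        for i in range(idx):  # i is unused; only the iteration count matters
            node = node.getOther()
        return node.getItem()

    def setItem(self, idx, item):
        node = self.head
        for i in range(idx):
            node = node.getOther()
        node.setItem(item)


numbers = LinkedList()
for value in (10, 20, 30):
    numbers.append(value)
numbers.setItem(1, 25)
print(numbers.getItem(0), numbers.getItem(1), numbers.getItem(2))  # 10 25 30
```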
For our append method, the argument is item, the value to store in the new node, and from it we create a new node. What we do with this new node depends on whether or not there already is a head. If there isn't, then the condition if self.head will test false, because self.head will be equal to None, and None is considered a false value in Python. So the else clause will execute, and we simply assign the new node to self.head. Now we have a list with one node, one element.
On the other hand, if self.head is not None, if there already is a head, all we need to do is find the tail node. We do so by following the chain of references from the head node to the last node in our list, which is the node whose reference is None. See here, we do this by assigning self.head to a variable node and then looping with the condition node.getOther(). As long as node.getOther() returns a node, which tests true, rather than None, which tests false, the loop keeps executing, assigning the next node to the variable node. Once the loop exits because we've hit the tail node, we invoke node.setOther with the new node. So now what was formerly the tail node is pointing to this new node, and the new node is the new tail.
As for our getItem method, it has one parameter, idx, short for index, which is the zero-based numeric index of the item we want. If the index is four, then we step from the head along four references, which lands us on the node at index four (the fifth node in the list), and then we retrieve the value from that node with node.getItem, and that's the value we return. Notice that in our code here we're using Python's for-in loop along with the built-in range function, which returns a sequence of numbers up to the argument specified. So if the index is four, range will return the sequence zero, one, two, and three, not including four. What to remember about range is that it's not inclusive.
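A quick way to convince yourself of how range behaves is to try it in isolation. The variable names below are just for illustration.

```python
idx = 4
numbers = list(range(idx))
print(numbers)  # [0, 1, 2, 3]

# The loop body runs once per element, even though i is never used.
hops = 0
for i in range(idx):
    hops += 1
print(hops)     # 4
```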
So we get a sequence of four numbers, but starting from zero, so not including four itself. In this case, the values of the sequence we're iterating over in this for-in loop don't really matter. All that matters is the number of items we iterate over, because the variable i is not actually getting used.
Also note that if we were to specify an index that exceeds the bounds of the list, say ten when our list only has five nodes in it, that will end up triggering an exception. In the fifth iteration of the loop we end up assigning None to node, and then in the sixth iteration we invoke node.getOther() when node is None, and that's an error. You can't call a method on a None object, obviously. So we get an exception, which really is the behavior you want. If you specify an index out of bounds, you should get an exception.
Lastly, notice that the setItem method is the same, except it has an additional item parameter, and in the last line we're not returning a value, we're just calling node.setItem and passing in the item. We use the same loop to find the node at a certain index; we just use setItem instead of getItem once we find the node.
Now, it may have occurred to you that getting and setting items in a linked list seems kind of inefficient, because it requires traversing the list up to the index we're trying to get or set. For this reason, a linked list isn't always the most desirable form of a list. If what you mostly do with a list is iterate through it sequentially, then a linked list can work out really well, because you're just going from one item to the next. If, however, you need so-called random access to the elements in your list, that is, you tend to jump around a lot between different places in the list, then a better solution is probably what's called an ArrayList. As the name implies, an ArrayList is a list stored in the form of an array.
If at any point our list exceeds the length of the current array, what we do is create a new, larger array and copy over all the existing values. Now of course, copying all the existing values into a new array gets expensive, especially as our list gets larger. So to avoid having to do this too often, a common strategy is to double the size of the array each time we resize it. For the most common usage patterns with lists, this doubling behavior tends to minimize the number of times the array ends up getting resized. In some scenarios you may find it useful for your ArrayList to also shrink when enough items are removed from the list. Doing this can help keep down the memory usage of your program, but as long as you're not terribly concerned with memory usage it's not strictly necessary. If the array of your ArrayList has a million slots but you're only using a few of them, sure, that's wasteful, but the list will work perfectly fine. The opposite, of course, cannot be said. At any moment in time the array has to be large enough to hold all of the current items in the list.
So here's our quick and dirty ArrayList class in Python. First, note that the constructor takes no arguments, because our ArrayLists always start off empty. The array, however, starts off with a size of 10. So note that the class distinguishes between the length of the list and the length of the array; they are not the same. The array will always be at least as large as the list, but it may of course be larger, and that's how things start out: the list has zero items but the array has 10 slots. Just a reminder here, looking at the second line: we're creating a list with one item, the value None, and then multiplying that list by the initial size, which is 10. Recall that in Python, when you multiply a list by a number, what you get is a new list with the items of the original list repeated that number of times.
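For reference, a class along these lines might look like the sketch below. It covers the constructor just described as well as the append, getItem, and setItem methods that the walkthrough turns to next. The attribute names (INIT_SIZE, array_length, list_length), the use of IndexError, and the check for negative indexes are all assumptions added for this illustration.

```python
class ArrayList(object):
    INIT_SIZE = 10  # assumed name for the initial array size

    def __init__(self):
        self.array = [None] * ArrayList.INIT_SIZE  # ten placeholder slots
        self.array_length = ArrayList.INIT_SIZE    # slots allocated
        self.list_length = 0                       # slots actually in use

    def append(self, item):
        if self.list_length == self.array_length:
            # Out of slots: double the array by adding as many None
            # slots as it already has.
            self.array += [None] * self.array_length
            self.array_length *= 2
        self.array[self.list_length] = item
        self.list_length += 1

    def getItem(self, idx):
        # The idx < 0 check is an addition; the narration only
        # mentions the upper bound.
        if idx < 0 or idx >= self.list_length:
            raise IndexError("index out of bounds")
        return self.array[idx]

    def setItem(self, idx, item):
        if idx < 0 or idx >= self.list_length:
            raise IndexError("index out of bounds")
        self.array[idx] = item


values = ArrayList()
for n in range(11):  # the 11th append forces one doubling
    values.append(n)
print(values.list_length, values.array_length)  # 11 20
print(values.getItem(10))                       # 10
```

Because each resize doubles the capacity, appending many items triggers only a handful of copies overall rather than one per append.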
In other words, we're taking our list and concatenating it with itself 10 times over, so we end up with a list of 10 items, all of them with the value None, and that's the list we assign to self.array, the array attribute of the ArrayList object. Note that it doesn't really matter what value is in the slots of the array which are not yet part of the list, those slot indexes which are past the list length. I just chose the value None to start with because, well, something has to be there.
Now, looking at the append method, it takes an item and tacks it on as an additional item at the end of the list, so we're expanding the size of the list by one. First off, if the current length of the list is equal to the size of the array, that means there are no more slots to use and we need to expand the array. So we double the size of the array by taking the existing array and concatenating to it a list of None values equal in length to the current array length. Again, just to be clear about the Python code: first the list with the single value None is multiplied by the current array length, giving us a list of None values that is array-length long. Then the plus-equals operator takes the current value of self.array, concatenates the list of None values onto it, and assigns the result to the array attribute of self. Recall that the plus-equals operator is just a convenience that spares us from having to write, in this case, self.array twice, on both the left and the right side of the equals sign: x plus-equals y adds x and y together and assigns the result to x. So now we've doubled the actual size of the array, and we need to update the array length attribute and double it as well, which we do by simply multiplying it by 2. Now we can be assured that the array is long enough to append this new item, and we append the item by simply assigning it to the index equal to the current list length. Having added the
item, we now increment the list length, because the list is now one larger.
As for the getItem and setItem methods, again their logic is very similar, but note that in both we first have to make sure that the specified index is in the range of our list. We don't want our list to erroneously get or set values in parts of the array which are beyond the end of the current list. So if the specified index is greater than or equal to the current list length, we throw an exception saying that the index is out of bounds. If the index is in bounds, then we simply return the value at the specified index within the array, or we set the value at that index in the array.
Lastly, I should note that for demonstration purposes we're ignoring the fact that the Python lists we're using for our array are themselves already lists. In fact, I think they're actually implemented as array lists. For the purpose of this demonstration, though, we're pretending that they're more like an array in C, in that they are fixed in size, though of course that's not the case with Python lists. So just be clear that you would never actually create this Python class, or any implementation of an array list, in Python, because Python already has a built-in list class.
What are called queues and stacks are like lists, but they are artificially constrained in that items can only be added at one particular end of the list and removed from only one particular end. In a queue, items are appended on one end of the list and removed from the other, whereas in a stack, items are appended to one end of the list and removed from that same end. So, as the name suggests, a queue is like a line of people, where people join the line at the end but only leave the line from the front. And a stack, like the name implies, is like a stack of plates, where you only place plates one at a time on top of the stack, and you only remove plates by taking them off the top, one by one. We've
already discussed stacks in the context of the call stack: each function call adds a new frame to the so-called top of the stack, and when the current function returns, that frame is removed. This pattern is also known as LIFO, last in, first out: the last thing added to the list is the first thing removed. Likewise, queues are also known as FIFOs, first in, first out: the first person to join a line is going to be the first person through the line.
Now, both queues and stacks can be implemented as just a regular list, but with the available operations pared down, such that there is no operation for, say, inserting an item in the middle of the list, removing items from the middle, or even reading the items in the middle. To be a proper stack or queue, the only operations are to add an item and remove an item, one at a time and only at the appropriate ends of the list.
Now you might be wondering: if there's such a thing as first in, first out and last in, first out, what about last in, last out and first in, last out? Are there such things? Well, logically, a last in, last out would be the same thing as a first in, first out: the last person to join a line is going to be the last person through. Likewise, a LIFO, a last in, first out, is logically the same thing as a first in, last out: the first plate added to a stack is going to be on the bottom of the stack, and so it's going to be the last plate removed. Though LILO and FILO are logically equivalent to FIFO and LIFO respectively, you almost never hear those terms used. The proper terms are FIFO and LIFO.
Let's look now at how we might implement a stack using nodes, very much like a linked list. So we're creating a Python class called NodeStack, and in its constructor we're simply assigning to an attribute called top the initial value None. top is the attribute that keeps track of the node at the top of the stack, and it starts as None because the stack is empty. By convention, when you add
something to a stack, that operation is called a push. So we have a push method which takes as its argument an item, and we create a node for that item which is going to be the new top of the stack. In the case where the stack is empty, self.top is None, so the new node points to None, which is actually the default value anyway, and then the new node is assigned to self.top, and now we have a stack with one item in it. If, however, there are already items in the stack, then self.top is not None, and we're creating a node which points to the old top, and then the new node is assigned to self.top: it becomes the new top. So understand that the structure of our stack is a chain of nodes where the top of the stack is the node at the start of the chain, and the chain of references goes from the top to the next item to the next item, all the way to the end, the bottom of the stack. So, to be clear, this is sort of a reversal of how we did our linked list. There we were appending items onto the end of the chain; here we're adding items onto the front of the chain.
Now, the operation to remove and return the top item from a stack is traditionally called pop. So we have our pop method, and notice it takes no arguments, because there's no question of what we're removing: we're always removing the top node. First we check whether there is a self.top node, and if there isn't, we raise an exception saying we can't pop because the stack is already empty. Otherwise we take the top node and retrieve its value with getItem so we can return it. But before returning, we need to remove the top node from the stack by reassigning self.top to the node which the old top pointed to, so we invoke self.top.getOther() to get that node.
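The push and pop logic described above could be sketched roughly as follows. The Node class is repeated in trimmed form so the example runs on its own, and the choice of IndexError and its message are assumptions, since the narration only says that an exception is raised.

```python
class Node(object):
    def __init__(self, item, other=None):
        self.item = item
        self.other = other

    def getItem(self):
        return self.item

    def getOther(self):
        return self.other


class NodeStack(object):
    def __init__(self):
        self.top = None  # an empty stack has no top node

    def push(self, item):
        # The new node points at the old top (None if the stack was empty)
        # and then becomes the new top.
        self.top = Node(item, self.top)

    def pop(self):
        if not self.top:
            raise IndexError("cannot pop: the stack is empty")
        item = self.top.getItem()
        self.top = self.top.getOther()  # unlink the old top
        return item


plates = NodeStack()
for name in ("bottom", "middle", "top"):
    plates.push(name)
print(plates.pop(), plates.pop(), plates.pop())  # top middle bottom
```

Both push and pop touch only the top node, so neither needs to walk the chain the way the linked list's append does.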