So, last time we started out with memory management: how to allocate just one integer cell and pass it around between functions. We saw that even though a function terminates and its scope closes, you can still access the cell that was allocated inside it after the function returns. Therefore, it is now the responsibility of other parts of your program to carefully free up that memory when you are done using it; otherwise no one will free it up for you, because closing a scope no longer automatically releases the memory to the system. This space comes from the heap; it does not come from the stack. So, this was the example: we declare a pointer to an integer called pi. pi itself is not an integer; when you dereference it using star pi, you get an integer. You can say int star pi equals new int; this allocates 4 bytes from the system. After that, you can write into the cell pointed to by pi by saying star pi equal to 5, or you can read that cell by saying star pi and retrieve 5 back again; you can do arithmetic with it just like an ordinary variable. You can print both the address and the contents of the address in an expression, as shown in the fourth statement. The final delete pi returns the 4 allocated bytes to the system, and you have to do it exactly once: neither 0 times nor more than 1 time. One important point about the pointer syntax: if you are declaring multiple pointer variables, you have to say type name star pa, star pb; the star does not bind to the type name, it binds to the variable, so you must give a star in front of every pointer variable you declare. Now, the next thing we will see is how to allocate not one cell holding an integer, but many cells, and this is essentially the same kind of behavior as declaring a native array, except that the native array storage will now come from the heap and not the stack.
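As a minimal sketch of the single-cell example just recapped (the function name `demo` is mine, not from the lecture):

```cpp
#include <cassert>

// Allocate one int on the heap, use it like an ordinary variable, free it once.
int demo() {
    int* pi = new int;   // 4 bytes come from the heap, not the stack
    *pi = 5;             // write into the cell pi points to
    int v = *pi + 1;     // read it back and do arithmetic with it
    delete pi;           // return the 4 bytes to the system (exactly once)
    return v;
}
```

The key discipline is that the delete must happen exactly once, somewhere in the program, once the cell is no longer needed.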
So, suppose the size of the array you want to allocate is 5. The variable syntax remains the same; it is still int star pi. This time, instead of saying new int, you say new int with square brackets and the number of elements you want: new int[n]. In this case, because n is equal to 5, the number of bytes allocated to you will be 5 times 4, so 20 bytes, and pi will be initialized to the base address of those 20 bytes. After that, you can access the array pretty much like a native array: you can write pi[ix] to access the ix-th integer in that array, which is now allocated on the heap. Internally, the expression pi[ix] is the same as star of (pi plus ix). Let me remind you once about that syntax. Suppose the 20 bytes allocated to you started at address 1000 and ran up to 1020. Now the variable pi is equal to 1000, but remember pi has type int star; it is not a byte star, for example. That means pi plus 1 is actually the address 1004, and pi plus 2 is actually 1008. As you do arithmetic on the pointer, the type of the pointer becomes important, and when you say star of (pi plus 2), this refers to the third cell, which holds an integer. As an expression, this is equivalent to pi[2]. This interconvertibility between pointer arithmetic and array indexing holds for native arrays. It is not so important for our purposes; I am explaining it in case you read someone else's code and get a little mystified. There is no good reason why you should actually write star of (pi plus ix). Finally, when you are done with that array you should delete it, but to indicate to the runtime system that you are actually deleting an array and not one variable, you should give those square brackets, just like you gave the brackets during allocation.
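The heap-array example can be sketched like this (the function `sum_demo` is my illustrative name); note that `pi[ix]` and `*(pi + ix)` are the same cell:

```cpp
#include <cassert>

// Allocate n ints on the heap, fill via indexing, read back via arithmetic.
int sum_demo(int n) {
    int* pi = new int[n];          // n * 4 bytes from the heap
    for (int ix = 0; ix < n; ++ix)
        pi[ix] = ix * 10;          // same cell as *(pi + ix)
    int s = 0;
    for (int ix = 0; ix < n; ++ix)
        s += *(pi + ix);           // pointer arithmetic scales by sizeof(int)
    delete[] pi;                   // brackets tell the runtime it is an array
    return s;
}
```

For n equal to 5, the cells hold 0, 10, 20, 30, 40, and the two access styles read the same memory.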
So, the runtime system remembers that you allocated 20 bytes. Unfortunately, there is no language device to ask the runtime system the size of the array based at pi. Once again, you have to remember n very carefully, and it is pretty important that n does not change by accident once you have allocated the array, because then you will lose track of how far you can go starting from pi. You have to remember the size of the array you allocated. So, that is how to allocate an array instead of one scalar variable. In today's lecture, we will extend this example to implement our own vector class. The basic strategy is that we start off with no backing native array, or a very small one, and when you push back enough elements that our native array overflows, we allocate a new native array, copy over the old contents to the proper positions, and add the new elements as specified. After that, we delete the older, smaller native array. When you delete an element, you can either be sloppy and hold on to the bigger array, or, if you find that you have a native array of size 1000 but currently only 5 elements in it, you might take pity and return most of that big array to the system, allocating a smaller array instead and copying the current elements into it. So, that is the basic strategy, and that is exactly how the vector class is implemented; we will just see how to do it on our own. Then, in the second part of today's lecture, we will look at linked lists. Vectors have this problem: if you push an element at the beginning or in the middle, or you delete an element in the middle, then you have to shift all the elements after it one position left or right, and that is expensive. Lists do not have that problem. In lists, you can insert and delete anywhere with constant cost.
The flip side is that lists do not let you index into an element in the middle. You always have to iterate from the beginning, or maybe the end, to get to any element. So it is a strictly sequential-access data structure, which you access starting at the beginning and walking up to some point; but once you are at an intermediate point, you can delete or insert elements there with constant cost. And finally, if we have time, we will look at binary trees, which are another linked data structure: there is a root node, and then there is a left subtree and a right subtree, and that is recursively defined. A tree is either empty, or it is a root node with a left subtree and a right subtree, and then we decide how to insert keys and so on. One important thing to note is the so-called null pointer. The address 0 is not legal, so it is often used to denote an invalid pointer value; I talked a little bit about this yesterday. You can have an int star pa which is actually 0, but the compiler will complain, because 0 is an integer and not a pointer. So you cast it to int star, making it a pointer. If you are doing this for classes, it is customary to declare a static member of the class called null. A static member is one that is shared across all object instances of that class. You declare null to be equal to the pointer 0, cast to Customer star. Now, here is the specification of our own vector class. The vector class will have a default constructor, which creates an empty vector. It will also have a copy constructor, which takes another vector and copies, not moves, all its elements into this vector. It has a destructor, which releases the storage associated with this vector. We support the following four other methods.
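The static-null convention just described might be sketched like this; `Customer` and the member name `null` follow the lecture's example, but the exact layout is an assumption of mine:

```cpp
#include <cassert>

// Hypothetical Customer class with a conventional static "null" member,
// shared across all instances, as described above.
struct Customer {
    static Customer* const null;   // one copy for the whole class
};

// The integer 0 must be cast to a pointer type to satisfy the compiler.
Customer* const Customer::null = (Customer*)0;
```

Modern C++ would simply use `nullptr`, but the cast-of-zero idiom is what the lecture describes.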
There is an insert method which, given a floating-point value v, will insert it at position p in the vector. Now, we are free to define the rules of the game because it is our class. If you give a p which is negative, that is typically a bug, so we should not tolerate it. What if p is larger than the current size of the vector, not the current capacity, but the current size? Suppose your vector has five elements and you ask to insert something at offset 10. In this case, we might choose to just throw an error and say we cannot do it: you can only insert inside the vector or at its end. So a push back is expressed by having p equal to the size. Similarly, you can remove an element from position p, and elements at larger indexes will be squeezed one position left. You can get the element at position p, and you can get the size of the vector, which is the number of elements actually in it at the moment. Observe that the last two methods are marked as const. You declare a method as const by putting the word const after the function signature. That says that the methods get and size do not modify the state of the vector in any way: they only read from it, they do not write into it. But of course, insert and remove will modify the state of the vector, and therefore those are not marked const. We will see that inside the constructor, we will start doing allocation from the heap, and that has to be returned to the system in the destructor. We will implement all of it; this is just a declaration, a specification which might go into a header file. Now, inside, the vector has private fields which manage its space and operations. We will need three of them. There will be a native array: a float star pa, which is the pointer to the float buffer that I allocate on the heap.
There will be an int cap, which records the number of floats that can fit in the current native array I have allocated. If the native array overflows, then I have to allocate a new native array, and cap will go up at that point. And siz, just shorthand for size, will record the number of floats currently stored in there. So, in general, there will be this invariant: siz is less than or equal to cap at all times. If we run out of space, then we allocate a larger array and copy over. And maybe, if we are generous, if siz is much less than cap, then we will allocate a smaller array, copy over, and release the larger array. So, let us look at the basic initializers and destructors. Here are the private fields of the vector: the float pointer pa, and cap and siz. What does the default constructor do? It initializes cap and siz equal to 0, and it initializes pa to the null pointer. It means that initially I have no buffer allocated. [A student asks why pa, cap and siz are declared in the private section.] The reason I am doing this is that I do not want the user of the vector to look at or manipulate pa, cap and siz. That is my territory. My only contract with the outside user is those few methods, the constructors and the destructor; nowhere in the signature of the public part of the class will you find any reference to pa, cap and siz. Only the implementation of those methods will use them. So, my convention is that when I create an empty vector, I set cap and siz equal to 0 and pa to the null pointer: there is no storage to start with. On the other hand, if I am copy-constructing this vector from another vector, then I do the following. I initialize cap equal to siz equal to the other vector's size, not its capacity; that is also a choice.
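The declaration being described might look like this sketch; the class name `Vec` is mine (to avoid clashing with std::vector), and only the default constructor and size() are filled in here:

```cpp
#include <cassert>

// Sketch of the vector's public contract plus its three private fields.
class Vec {
public:
    Vec() : pa(0), cap(0), siz(0) {}     // empty vector: no buffer yet
    // Vec(const Vec& other);            // copy constructor (deep copy)
    // ~Vec();                           // returns the heap buffer
    // void insert(int p, float v);      // modifies state: not const
    // void remove(int p);               // modifies state: not const
    // float get(int p) const;           // const: reads only
    int size() const { return siz; }     // const: reads only
private:
    float* pa;   // heap buffer (null when empty)
    int cap;     // floats that fit in the current buffer
    int siz;     // floats actually stored; siz <= cap at all times
};
```

Nothing in the public signatures mentions pa, cap or siz; the user sees only the methods.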
Then I allocate my buffer, which is a new float array of cap elements. And then, for ix equal to 0 through cap, which is the same as siz, I set pa[ix] equal to other dot get(ix). That copies the array over. Now, one important thing to note here is how the memory management is being done, so have a look at this picture. Remember, earlier, when inside main I declared a Point pt, the space for that point was coming from the stack. As you kept declaring variables, the stack segment grew from where it started when main began: if you declared pt1 and pt2, the space for pt1 would go in here and the space for pt2 in there, and as you saw, each had an x field and a y field, so each was taking 16 bytes if x and y were each a double. Now, suppose in main I say vector vec1. The question is: where does the storage come from? In addition to the stack segment, I also have a heap segment. The stack has a strict discipline of allocating upwards and freeing downwards at end of scope. But the heap is much more complicated: as you have seen, termination of a function is no guarantee of releasing the memory. So the heap is a very complicated noodle soup in which some parts of memory belong to some variables and other parts belong to other variables, and there is a fairly complex memory manager which takes care of marking which parts of memory you own, which parts you do not, which are free, and so on. That is done by the runtime; you do not need to worry about it. But what happens is that when, on the stack, you declare vector vec1, vec1 has its basic fields pa, cap and siz, and those will be on the stack. Because pa was obtained by allocating using new, pa will point to a segment of the heap where the actual elements will be. That is the convention.
When you have, inside the vector class, float star pa and other native fields like int cap, those fields, including the pointer itself, will be on the stack; what the pointer points to will be on the heap if you used new to allocate it. Anything you allocate using new is on the heap. What amount of memory does the object take up on the stack? In this case, it will be one pointer for pa, one integer for cap, one integer for siz. So cap and siz will each take 4 bytes, but what pa takes depends on your architecture. This laptop is a 32-bit architecture, which means that even pointers are integer-like numbers of exactly 4 bytes. But newer computers and laptops are all 64-bit machines, where an address takes 8 bytes. Accordingly, the size of this base record for the vector is decided. Now, let us go back to the code. One important thing to observe here is that this is called a deep copy. I could have just said cap equal to other dot cap, siz equal to other dot siz, and pa equal to other dot pa; that alternative implementation is a shallow copy. I did not do that; I actually undertook to allocate new space and copy the elements over one by one. Why is the alternative called a shallow copy? Because the actual buffer in the heap, which pa points to, is not being copied; only its base address is copied from the other object to myself. If I did the shallow copy, then after initialization the two vectors would be sharing one buffer with the same elements, whereas in the case of the deep copy, two different buffers exist in the heap. So again, just to make this totally clear as a picture: let us say here is the heap, and I had a vector vec1 whose cap was 20, siz was 10, and pa pointed to address, say, 10,000 in the heap, with capacity 20.
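The stack footprint of the base record can be checked directly with sizeof; the struct name `VecHeader` is mine, standing in for the three fields of the vector:

```cpp
#include <cassert>
#include <cstddef>

// The stack-resident part of a vector object: just its three fields.
// What pa points to lives on the heap. Pointer width is architecture-
// dependent: 4 bytes on a 32-bit machine, 8 bytes on a 64-bit one.
struct VecHeader {
    float* pa;
    int cap;
    int siz;
};
```

On a 64-bit machine this record is typically 16 bytes (8 for the pointer, 4 each for the two ints), possibly with padding.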
So that buffer was 80 bytes. If I now initialize vector vec2 equal to vec1, that invokes the copy constructor. If I wrote the copy constructor as a shallow one, then vec2 would end up having, again, cap equal to 20, siz equal to 10, and pa pointing to exactly the same place, again 10,000. Whereas if I implemented it using the deep copy constructor, then vec2 will have cap equal to 10 and siz equal to 10, according to my implementation in the code, and pa will point to some other part of memory, say 20,000, with only 40 bytes, but holding exactly the same elements as the first 40 bytes of the source vector. Is this clear, the difference between shallow and deep copy? There is a place for both of them. Sometimes you want shallow copies so that you can share the storage; sometimes you want deep copies. But in general, the standard libraries in C++ and Boost all promote deep copy. The safety of deep copying is that you have now separated the new array from the old one: they are not sharing any data structure, and changes you make to vec1 after this step will not affect vec2 anymore. vec2 has been copied off and separated. Whereas if you did a shallow copy, then any change you make in the old vector will instantly be visible through the new vector as well. That is a little trickier to code with, and it is not the semantics of what happens if you say int a equal to 5, int b equal to a, a equal to 6. In the case of primitive variables, it is really a copy semantics, not a linking semantics, unless you explicitly say int ampersand b equal to a; then you are creating a linkage. If you do not give a reference, the assumption is that you are doing a deep copy. Therefore, all the libraries we have seen so far have been implemented with deep copy constructors. [A student asks about ampersand.] Ampersand a takes the address of a and turns it into an int star. But a reference is different from a pointer.
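The independence property of deep copying can be demonstrated on a stripped-down holder of one heap int (the struct `Holder` and function name are mine, not the lecture's vector):

```cpp
#include <cassert>

// Minimal class owning one heap cell, with a deep copy constructor.
struct Holder {
    int* pa;
    Holder(int v) : pa(new int(v)) {}
    Holder(const Holder& other) : pa(new int(*other.pa)) {}  // deep copy:
    ~Holder() { delete pa; }                                 // new cell, copied value
};

bool deep_copy_is_independent() {
    Holder a(5);
    Holder b(a);        // deep copy: b gets its own heap cell
    *a.pa = 6;          // changing a afterwards...
    return *b.pa == 5;  // ...does not affect b
}
```

A shallow version would instead write `pa(other.pa)`, after which both objects share one cell, and the destructor would free that cell twice, one reason the deep style is safer.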
So perhaps the time has finally come to hash out the difference between references and pointers. References cannot be re-seated. This is an important point, so we might as well dive into it right now. Say int a equal to 5. Now I can do one of two things. I can say int star pb equal to ampersand a, and then to access the value, I always have to write star pb. Now suppose I also have int c equal to 8. After this point, I am free to write pb equal to ampersand c. What happens is that the pointer pb has swung away from a and now points to c. This is not possible if you instead write int ampersand r equal to a. This makes r an alias name for a. If you now say r equal to c, what happens is that a becomes 8 and c remains 8. Is this clear to everybody? This is very important: the difference between references and pointers. In the pointer case, only the pointer swings around; a is still holding 5, c is still holding 8; pb used to point here, and now it points there. It means that the pointer variable itself has state, which can be changed. A reference does not have state. A reference is merely a name alias for something. Once you create this alias for a, when you say r equal to c, you are actually directly poking into a. So a and r are now aliases, and c is holding 8; a used to hold 5, and as soon as you say r equal to c, that 5 is struck out and you get 8 in there. That is the vital difference between pointers and references. Can you say pb equal to c? You really cannot; that is a type violation, because pb is an int pointer and c is an int. You can force it with a cast, but you might be sorry. So that is the difference between references and pointers, and you understand deep copy and shallow copy. Now let us get to the other methods. size is very simple: you just return siz, nothing else to do.
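The two behaviors just contrasted can be put side by side in one function (the function name is my own):

```cpp
#include <cassert>

// Pointers can be re-seated; references are permanent aliases.
bool pointer_vs_reference() {
    int a = 5, c = 8;

    int* pb = &a;   // pb points at a
    pb = &c;        // legal: the pointer swings from a to c; a still holds 5

    int& r = a;     // r is now another name for a, forever
    r = c;          // NOT a re-seat: this writes 8 into a itself

    return a == 8 && c == 8 && *pb == 8;
}
```

After `r = c`, the variable a holds 8 because writing through r is writing into a; the pointer pb, by contrast, changed which cell it refers to without touching any cell's contents.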
If you want to get the floating-point number at index position p, then you should just return pa[p], but after doing some safety checks. What is the safety check? You want p to be greater than or equal to 0, and you want p to be less than siz. You do that with what is called an assert statement. To use assert, you have to include cassert. After that, you write assert followed by a bracketed Boolean expression. If that expression fails, you get a controlled crash of your program right there: the code exits with an error message saying that this assertion has been violated. It is a good defensive coding strategy to pepper your code with assertions about what must hold at a certain point in the code for correctness. How do you destroy a vector? It is possible that I created a vector and never really inserted anything, so there is no buffer, in which case cap would be 0. If cap is positive, then I delete pa and set cap and siz equal to 0. Now the array is gone, and we are back in the initial state. How do we insert? This is a little more detailed. I insert the floating-point value v at position p. The technique is basically this. If siz plus 1 is less than or equal to cap, it means that even after I insert the new element, I will not overflow, so I go ahead and do the easy thing. Otherwise, I have to allocate a new, larger buffer. In this particular lecture, I will just be sloppy and allocate a buffer which is just one element larger. Generally, the vector class will allocate a bit of extra space so that you do not keep reallocating every time. So here, suppose we allocate a new buffer, npa. I had this old pa, and I want to insert the floating value v at position p. That will stretch my vector out, but I have no space, so I allocate a new buffer npa which is one larger than my current size.
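The checked get and the destructor might be sketched as follows (the class name `Vec` and the helper `push_back`, added only so the sketch can be exercised, are my assumptions):

```cpp
#include <cassert>

// Sketch of get() with its assert-based safety checks, and the destructor.
class Vec {
public:
    Vec() : pa(0), cap(0), siz(0) {}
    ~Vec() {
        if (cap > 0) { delete[] pa; cap = siz = 0; }  // only if a buffer exists
    }
    float get(int p) const {
        assert(p >= 0);     // negative index is a bug: controlled crash
        assert(p < siz);    // must be within the current size, not capacity
        return pa[p];
    }
    void push_back(float v) {            // illustrative helper, grows by one
        float* npa = new float[siz + 1];
        for (int ix = 0; ix < siz; ++ix) npa[ix] = pa[ix];
        npa[siz] = v;
        if (cap > 0) delete[] pa;
        pa = npa;
        cap = ++siz;
    }
private:
    float* pa; int cap; int siz;
};
```

If get is called with an out-of-range p, the assert fires and the program exits with a message naming the violated condition.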
Now I create read and write cursors as usual. The purpose of all these cursors, if you do not want to read the code, is very easy: the first block is copied over; then the value v is poked in at the right position through the write cursor; then the remaining elements are copied over to the right of the inserted position. That is it. The only thing to observe later on is that if I actually had an old pa, then I delete it. I set cap and siz to siz plus one, which is the size of the newly allocated array, and finally I swing the pa pointer away from the deleted space: once you delete pa, pa becomes meaningless, so I set pa to npa. This is exactly how vector is implemented inside; no magic. Remove I have not implemented; you can try it out in the lab. [A student asks about npa.] npa has been copied over into pa, so we do not need to hang on to it. The npa variable is lost when you close the scope, because npa itself is on the stack: what npa points to is in the heap, but the npa variable is locally declared, so it is on the stack and is destroyed when the scope closes. But the storage allocated through npa is not destroyed; that is now pointed to by pa. So again, as a quick picture: here was the stack and here was my heap. Earlier, pa was pointing to a buffer like this, and I decided it was too small. So I allocated npa, which was a larger buffer, did my copying and manipulation, and finally swung pa over to that new area. The base record stays on the stack; a new block is allocated on the heap, and the pointer is swung around. Everyone comfortable with this? Other than this, there is nothing interesting to say about the vector class, so now we will get on to our own queue class. [In answer to a question:] you delete pa only if you actually allocated pa.
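The insert method just walked through might look like this sketch, growing by exactly one slot each time as the lecture chooses to do (class name `Vec` is mine):

```cpp
#include <cassert>

// Sketch of insert() with the copy-before / poke-in / copy-after steps.
class Vec {
public:
    Vec() : pa(0), cap(0), siz(0) {}
    ~Vec() { if (cap > 0) delete[] pa; }
    int size() const { return siz; }
    float get(int p) const { assert(p >= 0 && p < siz); return pa[p]; }
    void insert(int p, float v) {
        assert(p >= 0 && p <= siz);       // inside or at the end only
        float* npa = new float[siz + 1];  // one larger than current size
        for (int rx = 0; rx < p; ++rx)    // first block copied over
            npa[rx] = pa[rx];
        npa[p] = v;                       // v poked in at position p
        for (int rx = p; rx < siz; ++rx)  // rest shifted one to the right
            npa[rx + 1] = pa[rx];
        if (pa != 0) delete[] pa;         // free the old, smaller buffer
        pa = npa;                         // swing pa to the new buffer
        cap = ++siz;
    }
private:
    float* pa; int cap; int siz;
};
```

Push back is insert at p equal to size; insert in the middle shifts the tail right, which is exactly the cost linked lists avoid.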
So here is the queue class, and for simplicity, let us say we have a queue of integers. This is for the customer simulation: instead of using the system's list type, we will write our own queue, and for simplicity, instead of a whole customer record, we will just put an int in there; you can put anything you want. The public methods will consist of the following. We will have a test for emptiness, bool isEmpty. The test does not change the queue, so it is a const method. We can remove the first int at the head of the queue; this will modify the queue. You should not call removeFirst if the queue is empty; that is left to the outside user to ensure. Or we can push last: pushLast takes an int val and pushes it to the end of the queue. And finally, you can print the queue. Now, inside, the queue will be implemented by a different struct called QueueElement, and the queue itself will maintain two pointers: one to the first element in the list and one to the last. Why these two pointers? To remove the first element, it is good if I have a handle on the first; and with a handle on the last, we can quickly push another item onto the end. The invariant is that if first is equal to last is equal to null, then the queue is empty. Now, what does a QueueElement look like? Here is a first example of what you might call a recursive data structure. We have seen recursive functions; what is a recursive data structure? The way to talk about it is: what is a queue? A queue is either empty, or it is a queue element with a pointer to a queue. So this is what a QueueElement looks like: it is a struct with an int value field, which is the payload; in other applications it might be a customer record with lots of fields. And the QueueElement itself has a pointer called next to the next QueueElement.
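The recursive structure and the queue's two handles might be declared like this sketch (method bodies other than isEmpty are left as comments here; names follow the lecture's description):

```cpp
#include <cassert>

// Recursive data structure: each element holds a value and a pointer
// to the next element (null at the end).
struct QueueElement {
    int value;
    QueueElement* next;
    QueueElement(int v) : value(v), next(0) {}  // isolated node, next = null
};

class Queue {
public:
    Queue() : first(0), last(0) {}              // first == last == null: empty
    bool isEmpty() const { return first == 0; } // const: does not change the queue
    // int removeFirst();   // caller must ensure the queue is non-empty
    // void pushLast(int val);
    // void print();
private:
    QueueElement* first;   // handle on the head, for removeFirst
    QueueElement* last;    // handle on the tail, for pushLast
};
```

The payload here is an int; in the customer simulation it would be a whole customer record.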
And we define one comfort constructor, QueueElement with an integer v, where v is stashed into value and next is set to null. This is how we create an isolated node with a value. But in general, a filled-out, non-empty queue will look like this; let us look at the heap now. It is possible that the pointer first has value, say, 2000, and at 2000 starts a QueueElement struct with 2 fields: the value field is 5, and the next field may be equal to, say, 1000. What is at 1000? Another such QueueElement; maybe its value is 2, and its next may be 3000. At 3000, I might have another such QueueElement record, where the value may be 9 and next is equal to 0, which is null. Observe that this 1000 links next to that record, and this 3000 links next to this record. So, logically, the queue is a sequence of values: 5 followed by 2 followed by 9. Pictorially, I might draw 5 pointing to 2, pointing to 9, and then it is conventional to give the ground symbol for null, or maybe a blackened circle, saying that this is the end of the list. That is how we draw it pictorially. Observe that there is no correspondence between the linear order here and the addresses there. This is a noodle soup, as I already promised: the elements of the list can come out of the heap in any arbitrary order, and it is the addresses that chain them together. So now, how am I going to print this out? Suppose I am trying to write the print routine: in the class Queue I had void print. Can we talk about an implementation of print? I would say: QueueElement star rc, a read cursor, equal to first. So I start at first, and I loop while rc is not equal to null; I write this as a for loop, initializing rc equal to first with the loop condition rc not equal to null. Inside, what do I do? I print out, what? Remember, rc is of type pointer to QueueElement.
A QueueElement has 2 fields: a value and a next. So star rc is a QueueElement, and I can print its value: let us say I print (star rc) dot value and then an endl. The equivalent of that is rc arrow value; the arrow is syntax for (star rc) dot value. Then the important step: what do I have to do after that? I have to walk over to the next element. If I am at rc, then (star rc) dot next is the address of the next element, so I just set rc equal to that. That is equivalent to hopping from this record to the next record; in the other syntax, you write rc equal to rc arrow next. This is the basic traversal step. Coming back to the picture: suppose at some point rc is equal to 2000. rc arrow next, which is (star rc) dot next, is equal to 1000. So if I set rc equal to rc arrow next, that changes the value of rc from 2000 to 1000, which is hopping from the first element to the second element. That is how you walk through a linked data structure: you update pointer variables, typically from pointer fields in the previous record; that is how you walk from link to link, item to item. So now, how do I construct and destroy? I told you that an empty list means first equal to last equal to null; there is nothing in the list. How do I destroy? I may have this non-empty list hanging around with its elements allocated in the heap. I do not want to keep them hanging around, because that wastes heap memory; I need to give it back. How do I do that? Well, first will no longer be useful after this, so instead of a separate rc, I can just use first itself. While first is not equal to null: QueueElement star pqe equal to first. What is happening? Suppose this is the beginning of my list: first, with a next pointer to the next element, and so on. First, I create a copy pointer pqe which also points to the first element. Then I skip first forward, so first now diverts from the first element to the second element.
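The traversal step just described can be sketched as a loop; to keep the sketch testable I sum the values instead of printing them (the function name `traverse_sum` is mine):

```cpp
#include <cassert>

struct QueueElement {
    int value;
    QueueElement* next;
    QueueElement(int v) : value(v), next(0) {}
};

// Walks the chain exactly the way print() would, summing instead of printing.
int traverse_sum(QueueElement* first) {
    int s = 0;
    for (QueueElement* rc = first; rc != 0; rc = rc->next)  // hop link to link
        s += rc->value;   // rc->value is shorthand for (*rc).value
    return s;
}
```

The print routine is identical except that the body writes `rc->value` to the output stream instead of accumulating it.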
So first equal to first arrow next implements this bypass. Once it is bypassed, now I can delete pqe, so pqe, along with its next pointer, everything, goes away. This is one basic step in deallocating the first element of the list and returning it to the system, and we repeat it while first does not become null. How many are clear about how destruction is being done? Start at first; keep bypassing and freeing, bypassing and freeing. How do we push last? Remember, I already have a handle on the last element. First of all, I allocate a new isolated cell. [A student asks about the double colon.] That is a good point: you are seeing the double colon, Queue colon colon, here. The idea is that it works exactly like a namespace. If you say class Foo and you define a method, say int method, and implement it right there inside the class, that is all you need. If you close the class without implementing the method inside, but you want to show the implementation outside, then you have to write int Foo colon colon method and then give the implementation. That holds for constructors and destructors as well as other methods. Here I am assuming we are outside the scope of the class declaration, so I put Queue colon colon in front. So, pushLast: I am trying to push a value val onto the end of the queue. The first thing I do is allocate a new QueueElement and pass it val, so that I have an isolated cell, pointed to by pqv, which holds val and whose next points to null. That is what I start with. The next thing I need to do, which is not shown in the diagram, is this: if first is null, then the queue was empty, so first has to point to pqv. That first line is clear. The second statement is where the fun part is.
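The bypass-and-free destruction loop can be sketched as a standalone function; I return the count of freed nodes purely so the sketch can be checked (`destroy_all` is my name; in the real class this loop is the destructor body):

```cpp
#include <cassert>

struct QueueElement {
    int value;
    QueueElement* next;
    QueueElement(int v) : value(v), next(0) {}
};

// Copy the head pointer, hop first forward, then delete the bypassed node.
int destroy_all(QueueElement* first) {
    int freed = 0;
    while (first != 0) {
        QueueElement* pqe = first;   // alias to the current head
        first = first->next;         // bypass it before deleting
        delete pqe;                  // now safe to free the old head
        ++freed;
    }
    return freed;
}
```

The order matters: hopping first forward before the delete is what keeps first from pointing into freed space.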
If last is not equal to null, that is, the queue was not empty and there was already a last element in there, then last arrow next has to be made equal to pqv. Why? Because earlier, last arrow next was null by definition. So this pointer out of last, which went to null, has to be diverted to point to pqv, and that is implemented by just overwriting last arrow next. Finally, what do I have to do? last has to become equal to pqv: even if neither of the earlier statements applied, this is now the last element, so last has to swing from the green element to the yellow element. And pqv, as you know, is on the stack; it will go away when I close the scope. So at the end of the pushLast method, I have restored consistency: a new element with value val has been appended to the list, last has been updated, first and last are both correct, and the previous last now points to the new last. [A student asks whether next is initialized.] Yes, that is done, and that is a good point: QueueElement has a constructor which sets next to null. That is important; otherwise, if you put garbage there, your list is completely corrupted. You have to make sure that all stray pointers are initialized to a correct value. removeFirst should now be very easy; I am not even going to draw a diagram for it, because we already did the same step when destroying the queue. [Yes, you delete only once.] The answer is first arrow value; you copy that out. Then you copy the first pointer into pqv, then you swing first to the second element, then you delete pqv, and you return the answer. The whole deal with coding with pointers is that the relative ordering between these statements becomes very important. You should not mess up the ordering: in what order you copy, in what order you swing pointers; otherwise, things will be buggy. I had to retrieve the answer first. Why? Because first was copied into pqv and then deleted; therefore, I have to remember the answer in some other variable in order to return it.
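Putting the pointer moves together in the order just discussed, pushLast and removeFirst might look like this sketch; the one extra line resetting last when the queue empties is my addition, a detail the discussion glosses over:

```cpp
#include <cassert>

struct QueueElement {
    int value;
    QueueElement* next;
    QueueElement(int v) : value(v), next(0) {}  // next must start out null
};

class Queue {
public:
    Queue() : first(0), last(0) {}
    bool isEmpty() const { return first == 0; }
    void pushLast(int val) {
        QueueElement* pqv = new QueueElement(val); // isolated cell, next = null
        if (first == 0) first = pqv;      // queue was empty: also the new head
        if (last != 0) last->next = pqv;  // divert the old tail's null link
        last = pqv;                       // the new cell is now the tail
    }
    int removeFirst() {                   // caller must ensure non-empty
        int answer = first->value;        // remember the value BEFORE deleting
        QueueElement* pqv = first;        // alias the old head
        first = first->next;              // hop first to the second element
        if (first == 0) last = 0;         // queue became empty (my addition)
        delete pqv;                       // only now is it safe to free
        return answer;
    }
private:
    QueueElement* first;
    QueueElement* last;
};
```

Each operation does one new or one delete plus a few pointer swings, so both run in constant time regardless of queue length.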
Otherwise, after the delete, you couldn't access first->value. Similarly, I shouldn't delete pqv before hopping first over, because first itself would become meaningless. See, first and pqv become aliases in this statement; after this, if I delete pqv, even first becomes meaningless. It is very important to understand where each variable is pointing, and not to use deleted space. Any questions about this queue class? So observe that in both deletion and insertion, I was just doing one new or one delete and swinging around a bunch of pointers. So the time taken is a constant. I don't care how many elements there are after first in the queue. And last, of course, doesn't matter — pushing back something on last has always been a constant-time operation, unless you're reallocating a big vector, but that happens seldom. Any questions on this? So to summarize: for our own vector class we had a memory buffer, while in our own queue class we're not allocating a memory buffer for more than one element — each element has a pointer to the next element, and we do this pointer manipulation to insert and delete things. A new buffer? Yes, we're doing that, right? There's a picture of copying things over, for our vector class. The copy constructor? Yes, that also was shown in the C++ code. So if you look here, that is our copy constructor. You could do a shallow copy, which I'm not recommending. The style which is consistent with Boost and standard C++ is that you allocate a new buffer and copy all the elements from the other vector over. Question: when copying, don't you want to append those elements to your original elements — like vector one, vector two, trying to copy? But vector one is empty to start with. This is a copy constructor — this is where vector one is being constructed from nothing. So this is not like vector one dot append vector two.
Yeah, that's a perfectly fine thing to want to write. That would be a separate public method you could support — but no, this is just the initializer that creates a vector; for appending you would have a standard method like append(const Vector &other), and there you would have to do something different. So, in the last part of today's lecture, we will look at binary search trees. First of all, what is a binary tree? A binary tree is either empty — like before the universe was created — or it has a root node with two children called left and right. These children are subtrees, each of which is a binary tree itself. Just remember these four lines of definition, and let me give you some examples of binary trees. If I take this sheet of paper and say this is a binary tree — yes, it's a binary tree. It's an empty binary tree. You could have a binary tree with one node, which is the root; this means that both its left and right are null pointers. You could have unbalanced binary trees: here is a root node with a right child but no left child, and the right child in turn may have null pointers for left and right — that's a binary tree. Or you could have balanced binary trees, where each internal node is required to have either zero or two children. So here is a completely balanced tree with four leaves, and those leaves have null pointers for left and right. These are all binary trees. Now suppose a binary tree has a key associated with each node, say an integer key, and for simplicity assume that the same key never appears twice in the tree. It's called a search tree if all keys in the left subtree are strictly smaller than the root's key, and all keys in the right subtree are strictly larger than the root's key. So remember, every node has a key. If all keys in the left subtree are smaller than the root's key, and all keys in the right subtree are larger than the root's key, it's called a binary search tree. And this doesn't hold only at the root.
It has to hold at every internal node in the tree. Is the definition clear? So, yes — left and right will be tree node pointers. Should they be cross-linked also? Why cross-linked? Because, unlike an array where elements sit next to each other, these nodes will be scattered on the heap — we'll look at the implementation; let me get ahead of you. So how do I use a binary search tree from main? Let's set up a specification and use case even before we start implementing. I start out with a tree node pointer, TreeNode *root, which is null. That's an empty tree: the white piece of paper is represented by a null pointer. And then, in an infinite loop, I keep reading the next key to insert. I assume the user will be kind and ensure that all keys are distinct. Yes — NULL is not a keyword in C++, and then you have to cast it every time. So it's much better to declare null as a static field in every class, like TreeNode's null. In the new style, we won't need this first statement; instead we would say TreeNode::null. So this is how a binary search tree may be built. The user will enter keys one after the other. For simplicity, let's assume the user will not issue duplicate keys, but the keys may come in an arbitrary order. Every time I read a key from cin: if the tree is empty, the only thing I can do is root = new TreeNode(key) — just like I was creating a QElement with a value, here it's a tree node with a value, which is the key. Otherwise, if there is already a root — remember root is a pointer, so *root is a tree node — then (*root).insert, which is the equivalent of root->insert: I call an insert method at the root node with my new key, so that it will be properly inserted in the tree. So let's see an example of how things should be inserted.
Suppose in the beginning I have an empty tree: initially root is equal to null. Now the next step is insert 10 — that's the action I want. Because root is null, the only thing that happens is that root now points to a node with key 10, which has null left and right children. If the next thing I do is insert 5: take 5, compare with 10. This is like a railway shunting situation. Since 5 is less than 10, we go down the left branch. There is nothing there but null, so the only thing I can do is insert 5 here, with a couple of null pointers. Say the next action is insert 7. So 7 comes in at the root, finds 10, goes left, finds 5, goes right — and that's the end of the road, so 7 is inserted here. It always has to percolate down from the root. At every node, you compare your key with the key at the node. You're walking down from the root; at any node, if you find that you are larger than the key at that node, go down the right side. "But sir, at 10, it found that seven is greater than—" No, 7 is less than 10, so it came left. So now we have 10, 5, and 7. Next, insert 12. Do we not go to 5? No — I'm comparing with 10, and 12 is larger, so 12 will be inserted right here. For example, suppose the next thing is insert 3. Where should 3 go? You run right down: 10 to 5 to 3. If I now insert 11, where should it go? Larger than 10 but smaller than 12 — 11 goes here. So you see what's going on. This is an approximate way of extending something like quicksort: think of the root as the pivot. What's happening is that as I insert these things, the tree grows — potentially irregularly — while always satisfying the invariant that keys in the left subtree of any node are strictly smaller than the key at that node, and keys in the right subtree of any node are strictly larger than the key at that node.
Now, at any stage, if I want to print out the keys of the tree in sorted order, it's not difficult to see how: we should write a recursive program. At the root, I should first recursively print the left subtree, then print the key at the root, then recursively print the right subtree — unless the node is null, in which case there's nothing to do. That's the recursive routine for printing. So there's insert, and there's print. Let's look at the definition of the tree node. Remember, every tree node has a key, which is an int. It's not easy to change keys in a tree — we won't get into that — so let the key be a constant. But as we have seen, the left and right subtrees will change as things are inserted, so those are not constant. Just like a queue element pointed to the next queue element, a tree node will have two tree node pointers: the left pointer and the right pointer. We have already been drawing it on paper. The constructor takes an _key, sets key to _key, and initializes left = right = null to make sure those have definite values. And now we have to think about how to insert. I've already demonstrated pictorially how insert works; how do we put it into code? The insert method is called at this tree node — that's one more thing we need to digest a little more clearly. TreeNode is a class; each actual node in the tree is a different instance of this class. The method called insert is being invoked at a specific node. Remember, this is always available as the identity of the current node at which insert is being called. So when you implement insert with nkey as the argument — the new key — inside the body of the insert method you always have available the key, left, and right values of the current node. You can always say key, and that means the key of the node on which insert was called.
It's not like key is a global variable: every node has its own different key, different left, and different right. The method definition of insert is shared across all the nodes — logically they do the same thing, no matter where you're inserting — but the instance on which insert is fired is different, depending on what pointer you invoke insert on. So suppose I have just entered the insert method at a particular node, this, with nkey. I compare nkey to my key. If nkey is less than key, then I should go down the left subtree. If, in particular, my left subtree is null, then I'm done: left = new TreeNode(nkey). We've already seen that. Otherwise, if left already exists, all I do is recursively call insert on my left subtree: left->insert(nkey). Otherwise, we just repeat all of the above using right instead of left. Now, of course, typing out the code twice — once with left and once with right — is a little tedious. What you really mean is: find the correct child, which could be either left or right, and then do the same thing to the correct child. So if you have a good understanding of what star and ampersand mean, and of what a reference is versus a pointer, you can actually code this more niftily in half the length. How? You say TreeNode *&child. You're getting into a little bit of high-wire trapeze here. What does it mean? It means that child is a reference to a pointer to a tree node. And I set this reference child to either left or right, depending on how nkey compares to key. Note that I'm not copying the left or right pointer — I'm taking a reference to either the left or the right pointer. So, just like my example with integer references a few slides back, when I now say child = new TreeNode(nkey), I'm actually modifying left or right itself. child doesn't have any storage of its own.
child is just an alias for either the left or the right pointer at the current node. After that, the code is exactly the same as before, except I don't have to write it twice: if child is null, then I insert a new node there with nkey; otherwise I insert nkey into child. And that's it — this is the insert routine, and it will work fine. How many people are comfortable with how insert is working? First of all, how many people are comfortable with the declaration of TreeNode? Key, left, right — simple stuff. Insert, first cut: compare keys, go left, go right. Second cut: a small trick using references to pointers. Now, how do I print the keys? There are multiple traversal orders over a tree; the most common one is in-order, and since you want to print the keys in order, it's called in-order traversal — one word, inorder. Print is a very simple recursive routine. Print at the current node says: if left is not null, print the left subtree; then output the key at this node; then, if right is not null, print the right subtree. Printing the left subtree is (*left).print() — see, left is a tree node pointer, so *left is a tree node, and a tree node has a print method. Yes, that's right. To give another simple example from earlier days: if you declare vector<int> vec, you can always say vec.size() — you are calling a method inside that vector. If I now say something like vector<int> *pvec, then I should be able to write (*pvec).size(), which is equivalent to pvec->size(). For any pointer which points to a struct or class with a certain method: whatever you were calling with a dot earlier, you can still call with the dot after dereferencing the pointer, and a shorthand for that is the pointer, arrow, the method. So suppose at each tree node I am defining a print method. Print is not outside the class; print is a method which will be fired at a particular tree node.
So what do I have to do? If the left subtree is non-empty — not null — I print the left subtree; then I print my own key; then I print the right subtree. This is where a recursive data structure is entangled with a recursive function in one-to-one correspondence. You could write printing a list in the same way: to print a list, print the first element, then print the list that starts at first->next. So how does this work? The first recursive call prints the left subtree, whose keys are smaller than the root's; then the root's key is printed; then the second recursive call prints the right subtree. And from main — see, in this particular case print at the current node does not emit a newline — you say: if root is not null, then root->print(), and then you print a newline to clean things up a bit. So let us look at some sample code. This is just a direct write-up of whatever I was doing. Because I am using recursion, I have this indent stuff being included from the print directory. Insert, as usual, uses that reference trick: I find the correct child, either left or right, given the key I have to insert. What does print do? Again: if left is not null, print it; then print the key. Now, I could either print the key without an indent or print it with an indent given by the recursion level. At the root, the recursion level is zero; below that the recursion level grows, so the recursive calls print at level plus one. First, let me not give the indent. And I define the null somewhere in the code. What does main do? Main initializes root to null. Then, while I can still read another key, I insert the key, and after every step I print the current tree with a newline. So I start with an empty tree, insert keys one by one, and after each insertion I print the tree for you. Just to separate things visually, let me put two newlines. So let me compile this.
So it asks for the next key. Let's go back to our diagram and insert in that order. We first inserted 10 — so I insert 10. The root node is 10; that's the only thing it prints. The next key to insert was 5 in our example. I insert 5; it prints 5, 10. I next insert 7 — observe that the input is not in sorted order. The third insert was 7; it prints 5, 7, 10. Next was 12. So I can keep dynamically inserting and always get things out in sorted order. This search tree is very close to how map and multimap are implemented inside C++. Maps are sorted; you can keep inserting things and even deleting things. Deleting is a little non-trivial — you can think about it by yourself if you feel like it. How exactly you have to shift things around is the important part. Deleting a leaf is easy; but if you delete an internal node, then what happens? In general there's this path coming from the root down to the node you are trying to delete, and suppose the node to be deleted has non-empty left and right subtrees — what should you do? I don't want to get further into the coding here, so think about this offline; it's a little non-trivial. Here we just kept on inserting keys, and we saw that at every point you can do an in-order traversal of the keys. How do you find the very first key? In our event simulation experiment we had to keep a map of events with times and always take the earliest event. You can use this data structure to build that event queue: you insert an event with a timestamp as its key, and it will go into the right place in the tree. So how do you get the leftmost leaf in the tree? Go to the root; as long as left is not null, walk down left. That's your first key. Now I'll kill this and instead enable the print with the recursion level. Observe that here I am giving a newline at the end, so every key will be indented to its level in the tree and then followed by a newline. That will give us a lot of insight into how the tree looks.
So I'll insert in the same order. First I insert 10. 10 is the root; there is no indentation, so 10 is printed at the leftmost position. The next key to be inserted was 5. 5 is to the left of the root, and because left is printed first, it comes on an upper line — so the picture is a kind of mirror image, with 5 indented under 10. Now the next key to be inserted was 7: 10's left child is 5, whose right child is 7. If you want to show it the right way around, in this print you just have to swap left and right; then the printout will look like left and right — just turn it sideways. What's the next key? It was 12. So see: 10 is the root, its right child is 12, its left child is 5, whose right child is 7. And the in-order traversal is just like reading the printout line by line — that's always in sorted order: 5, 7, 10, 12. The next key was 3. 10's child was 5, whose children are now 3 and 7; 10's child 12 remained as before. Then I inserted 11, and so: 10 links to 5; 5 links to 3 and 7; 10 links to 12, which links left to 11. If I insert 20, that fills out the right subtree, and so on. So that's how our binary search tree works. You couldn't do this unless you had a linked data structure — maybe you could, but it would be a little more difficult. Any questions about how binary search trees work? So that's how maps are implemented — maps and multimaps. Unordered maps are implemented using hash tables; ordered maps are implemented using binary search trees. Deletion is slightly non-trivial. Now, what happens if I insert in the order 1, 2, 3, 4, 5, 6, 7, 8? Let me kill this and insert 1, insert 2, insert 3. This goes down the right spine. See, if the tree were balanced, then searching for a key would take roughly log n steps for n nodes in the tree, because there's a fan-out of 2 at every stage, and by the time you go down log n levels you cover n nodes. This is a bad tree, because I have to go down n levels to cover n elements.
See, searching in a binary search tree for a key is very easy, right? It's essentially the same as binary search. Given a key to search for, I compare it with the root: if it's equal to the root's key, I'm done; if it's less, I recursively search the left subtree; if it's larger, I recursively search the right subtree. That's identical to binary search. But in the case of an array, we always took the cut at the middle of the array, so we were guaranteed log n time for binary search. Here we are at the mercy of the key insertion order. There is an evil key insertion order which degenerates the tree into a linear chain, and then searching for anything takes linear time instead of logarithmic time. There are much smarter tree insertion mechanisms which always keep the tree balanced while satisfying the left-root-right property. That is highly non-trivial: to dynamically reconfigure the tree in nearly constant time per insertion, so that the left-root-right relationship always remains preserved, is not easy — as you know, even a single deletion doesn't look quite trivial. So reorganizing the tree to stay balanced is non-trivial. If you're interested, look up AVL trees and red-black trees. The computer science data structures literature is rich with dozens of trees, each with its advantages and disadvantages; if you're interested, you can explore that further. So what remains of the course? You'll have one last lecture tomorrow morning. Yes — you can also add a parent pointer. So far, walking down from a parent to a child has been easy, but sometimes it's handy to be able to walk up from a child to a parent as well, for example if you want to find a sibling. There's nothing to it: you just create another member, called up, and you fill it in as you insert the node into the tree. The last comment I would like to make is: what's a constant pointer versus a pointer to a constant?
Anyone who follows American politics will know the famous remark by Donald Rumsfeld, the US Secretary of Defense when America was attacking Iraq: absence of evidence is not evidence of absence — of weapons, for example. It's similar with constant pointers versus pointers to constants. Suppose you want the up pointer to be a constant, because the tree never changes — once a node is inserted, it's inserted. Then you have to declare it as TreeNode *const up. It is the pointer up whose value cannot be changed; the tree node that it points to can still change. Whereas if you say const TreeNode *up, then the members of the tree node it points to cannot be changed, though up itself can be redirected. These are not the same. And you also have const TreeNode *const up, which is ultimate stasis: neither the pointer nor its contents can change. Of course, pointers are a little heavy going, so what we'll do in the last lecture is spend the first 45 minutes or so doing more examples of pointers and pointer data structures, and then we'll end the course with a discussion of input and output from and to files. So far, when we open an fstream and read or write from or to it, we depend on transforming all our program variables into string form. Sometimes you don't want that; sometimes you want a more compact encoding on disk — maybe the same size as, or even smaller than, the size in RAM. For example, if you have a standard four-byte integer, the number of decimal digits in it can be as large as 10. Writing down 10 decimal digits costs you 10 bytes of ASCII, perhaps followed by a newline or a space, so you might take 11 bytes to write down an integer — whereas in memory an integer costs you only 4 bytes. Sometimes this blow-up is unacceptable.
If you have large amounts of data — like the physicists at CERN doing collider experiments, generating petabytes of data per week — then blowing up 4 bytes to 11 bytes is unacceptable: your budget more than doubles. So we'll see how to transcode your in-memory variables to disk at the same size. Later on you can explore how to do compression, to cram things even smaller than that, but we won't do that in class. In particular, we'll look at our own vector class which will not be in memory at all — it will be backed by a file in the file system. It is of course very slow, but you are no longer bound by memory size: you can now have a vector which is much bigger than your RAM, and you can read and write elements to it, push back, everything. So we'll implement a class — our own vector, which we'll call disk vector or file vector — implemented using a disk file.