Today's plan is to continue with funny kinds of arrays and array operations, you know, non-trivial ways of looking through multiple arrays. And then at the end of today, we will start getting into 2D matrices. And just like arrays, 2D matrices can also be dealt with natively by the C++ compiler, or you can use some library packages to do it. In this particular case, the standard C++ library doesn't have, as far as I know, options for multidimensional arrays and matrices. So we will use an external library called Boost. Boost is readily available on Linux systems. You can just install it using APT or Synaptic or whatever your package manager is. If you are on Windows or some other platform, you can readily download Boost from its website and install it on your computer. If you need help installing Boost on your system, the sysads or the juniors can help you out. So today we won't really do much with Boost. I'll just show you how to include the files and use them to declare arrays and do very basic operations on matrices. Most of today we'll use native multidimensional matrices. And more specifically, for at least this week and next, we'll never go beyond two dimensions. There's enough to do with two-dimensional matrices. So coming back to sparse arrays and how they are represented. Remember that the easiest way to represent sparse arrays is a couple of vectors. One vector records the dimension, or the index in a dense array where you would have stored the value if you could afford the space. And the other vector is the value array. So you're saying: in that dense array, which I'm not storing explicitly, index number 11 had value 2.1, index number 5 had value 4.3, and so on. The other indices, which do not appear in the upper row of this table, implicitly have value 0. And the whole point of storing it this way is that I don't have to store all those zeros.
If the fraction of zeros in the original dense vector is much larger than half, say, then it makes sense to store the array in a sparse format. The trade-off is, of course, that operations get a little more complicated, because you now have to address things through the position in the dense array. So for example, if the array is provided with the dims not in order, then to set it in order we have to do some kind of sort. But now the sort key, the value on which you're sorting, is the upper row, not the lower row. So we use the values in the upper row to permute the columns of this table, and the lower row always travels with the upper row. Unless the two cells in a column stick together, the vector becomes meaningless. So in the first step, if I'm doing selection sort by sweeping a frontier from left to right, then at fx equal to 0, where fx is pointing to the very first column, the leftmost column, I find that the smallest dim is 2, and it appears at position 3 (counting 0, 1, 2, 3). So what I do is swap the 0th column with the 3rd column. And that results in the following array: 2 and 3.2 have moved left; 11 and 2.1 have moved right. And 2 and 3.2 are now excluded from the remaining sorting, because the frontier has swept past 0. So now fx is equal to 1; I'm at the 1st column. The minimum dimension to the right of it is 5, in position 2 (counting 0, 1, 2). At this point, I have to swap the frontier with the minimum, and again the lower row has to travel with the upper row. So after the swap, it becomes 5 and 7, and 4.3 and 6.5. Now my frontier has advanced to 2. But the minimum position is already 2, which is 7 itself, so one more position is already sorted, and I sweep fx one more. Now fx is 3, and I have the last two cells to sort. The minimum position is 4, and the dimension at the minimum position is 8, shown in yellow. So now I have to swap the last two columns.
And that results in the finally sorted, or canonical, representation of a sparse vector, where the dims are in strictly increasing order. So to recap: we have covered the basic representation of a sparse array as a mapping from indices or dimensions to values. And if it is not provided in an already dim-sorted order, we have figured out how to get it into that order. Once you do that, many operations become easy. For example, taking the dot product of two vectors, or taking the sum or difference of two vectors — we now know how to do that. As we get into matrix algebra, we'll see that there are other operations which need to be ported to sparse representations. We'll see that in due course. So, any questions about representation of sparse arrays? Note, for example, that sparse arrays are a natural representation if you have to store polynomials. If you have a polynomial which looks like 2 times x to the power 37, plus 5 times x to the power 101, then all you need to store is each power and its coefficient. So that's a natural representation of a polynomial. This is the code which implements that selection sort on sparse arrays: it uses dim to calculate the minimum position, but then does two swaps at once, on both the upper and lower rows together. So the next artifact we will study is indirection, or index, arrays. The motivation for this is as follows. Suppose you have a data array, data[dn], which consumes a lot of bytes per element. For example, each item in the data array could be a string, and the string could be fairly long. Or every element in the data array could actually be a large record with your first name, last name, your parents' names, your city, the name of your employer, your address, all kinds of things. If you package all of this up into a record — which in C and C++ is called a struct, or structure, which we shall study after midterms — then the number of bytes or bits in every element could become quite large.
Now, in both the sorting methods we have seen, there is considerable copying of data involved. In the case of selection sort, we have to detect the minimum element and pull it up to the front and push something back. That involves copying the whole record. If the record is just an int, then it's just 32 bits. But if the record is a complicated record like this, then it might take a lot of bit movement. So can we avoid this? There are multiple reasons to avoid it; we shall see examples of that. The idea is to create an indirection, or permutation, array — pos; think of it as a permutation array — so that as you step px up, pos[px] will dance all around the place, but data[pos[px]] will be in increasing order. So here is the example. Suppose the data is these six strings, which I'll just write as one character each for simplicity: z, p, a, y, b, m. We want, therefore, pos to be 2, 4, 5, 1, 3, 0. Why is that? Because the smallest element, a, appears at index 2 (counting 0, 1, 2). The second smallest element, b, appears at index position 4; that's why after 2 comes 4. The third smallest is m, and so that's 5. In other words, the i-th cell of pos should contain the index of the i-th smallest element of data. That's the definition of the pos permutation. So in the table, you see that the first column shows i, the index which sweeps through pos. pos[i] is, as written in the second line, 2, 4, 5, 1, 3, 0. But now, if you look at what data[pos[i]] is, you'll find that it's a, b, m, p, y, z. So that's in increasing order. Now, one fact that many of you should know is that any permutation can be represented as a composition of loops, or cycles. We'll look at that a little later. So how do you find the indirection array? Last time we saw that. We start with a vector of strings called names, suitably filled with animal names in our example, and pos is the permutation. nn is the size of both of those things.
Initially, I set up pos to be the identity permutation, so pos[i] is equal to i. And then I do selection sort on this — but I only move entries around in the permutation, not in names, which is the data here. So: fx equal to 0, fx less than nn, plus plus fx. Find the smallest string among the positions pos[fx] to pos[nn minus 1]. Observe that in standard selection sort, we find the smallest element between fx and nn minus 1. Now, at all times, we will look at names through pos. We will never inspect an element of names by itself. And then we do the swap. So I push back a bunch of animal names into names, and then I create the identity permutation in pos. And here is the main sorting loop. The frontier goes from 0 to nn. minPos starts at minus 1. Now I create a minimum name which is the plus infinity of animal names — no animal name is larger than or equal to this. Then I sweep the minimum-searching iterator, mx, from fx up to nn. And where standard selection sort would compare the minimum name against names[mx], I instead compare against names[pos[mx]]. Every access to names goes through pos. If it is smaller, min name is assigned names[pos[mx]] — but minPos is assigned mx, not pos[mx]. That will tell me how to permute pos itself. After I find the minimum position, I swap only the pos entries at fx and minPos. You saw the demo already in the animated slide, so I'm not going to repeat that. And then finally I print pos. If you want, I can show what is happening step by step. So initially, at the very first step, let me print out pos before doing anything else. We start with the identity permutation. Now we search between the beginning and the end for the smallest animal, and that's alligator. So I need to bring 1 up to the front, so 1 and 0 are interchanged. Now the first position is done, and I only look at the data array through the remaining indices. So alligator is effectively left out; alligator is already done.
So I look at zebra, giraffe, wolf, lynx, and jackalope, and ask which is the smallest among them. The answer is giraffe. So now I swap these two: the entries 0 and 2 are interchanged. And now positions one and two are done, so I only look at the original data array through positions 0, 3, 4, and 5. And now the smallest is at 5, so I swap 5 and 0, and so on. So is everyone clear about how the pos array is being generated, step by step? At the end, we have 1, 2, 5, 4, 3, 0. If you access data in that order, you'll get a sorted list of animals. Any questions about this? So, how to compute an index array? An obvious question for you: we used selection sort because it looks simple, but naturally you don't want an n squared algorithm. We would like to use an n log n algorithm, like merge sort. So can you generalize merge sort to generate an index array? Try this out. All right. So how can you use indirection arrays? Well, some of the reasons are obvious: you want to access the data in sorted order, and you don't want to actually move the data around. But here is an interesting problem. Suppose each element in an array called cust[cn] — or data, whatever it is — is 1,000 bytes. As I was saying, this can easily happen, because I'm trying to package name, home address, employer, work address, date of birth, all these fields into a struct, which we shall see soon. Now pos[cn] will sort without physically moving the records within cust[cn]. But suppose we do want to restructure cust[cn] to be physically sorted by name. We might want to do that if we want to then store the customer array on disk. Generally speaking, most companies will have customer arrays which are too large to store in RAM — many companies will have that. So you would like to store the customer array on disk.
And now, if you want to do some calculation where you need the customers in increasing order of anything — suppose you want the customers in decreasing order of total sales to date, to find your most important customers — if you depend on the pos array to do that, then cust[pos[cx]] will be jumping all over the disk, and that will take a long time. Remember, I already told you that disks are best accessed in sequential order of bytes. So for this reason, you may want to reorganize the customer database to be physically sorted in whatever your sort order was. If you're sorting by customer name, that's one order; if you're sorting by total sales to date, you might want to store back the customer array in that order, so that you can access it fast. We can always do that by using another array, call it tempcust[cn]. So we can write code which looks like this. Remember, I had the cust array, but I also created the pos array. Now suppose I create the new tempcust array, of size cn as well. Then all I have to do is: for i equal to 0 to cn minus 1, tempcust[i] equals cust[pos[i]]. This loop will create, in tempcust, a sorted version of the array in cust. But the problem is that I'll be using twice the amount of storage. So what if this 2x space is not available? As I was saying, permutations can be decomposed into cycles. For example, our old table with animal names had these values for i, pos[i], and data[pos[i]]. So let's pick out cycles in it. I go to 0; 0 maps to 2 in pos. Now 2 maps to 5, 5 maps to 0, so we loop back to 0. And that's a cycle. So we can write out the cycle like this — 0, 2, 5 — and forget about those elements. Some arbitrary element that's not covered yet is 1. That maps to 4, 4 maps to 3, 3 maps to 1, and we have looped back. So that corresponds to another cycle, which is 1, 4, 3. So what we are saying is that the permutation given by pos can be decomposed into these two cycles: 0 goes to 2 goes to 5 goes back to 0.
And 1 goes to 4 goes to 3 goes back to 1. So now it should be reasonably clear how we can exploit cycles to permute — to sort — the data array in place, given pos. A swap is just a special case of this. If I'm trying to rotate a cycle one position forward, what do I have to do? I have to preserve one value, keep overwriting, and plug in the last thing from the preserved temporary value. The pseudocode here may not be exactly correct, so let me actually write this down on paper. Suppose I have my original array; let's say it looks like this. These are the indices — let's call the index px. And the strings here, just for shorthand, are z, a, g, w, l, j. So that's data[px] — or names[px], or cust[px], whatever you call it. And if you went through the selection sort, you would find that pos[px] should be 1, 2, 5, 4, 3, 0. This is a slightly different example, so the indices are different. Now let's start at 0. Say I want to do this in-place copy. I start a copy cursor, cx, with cx equal to 0. With cx equal to 0, data[cx] is z. So let me remember that I started at 0, and remember that the first value I looked at was z. Now let me do this assignment: data[cx] equals data[pos[cx]]. So I'm starting to rotate the cycle — I'm advancing everything to the next value in that directed cycle. What happens if I do that? Effectively, I'm doing data[0] equals data[pos[0]], and pos[0] is 1. So data[0] gets overwritten with an a. But luckily, I have already preserved the z, so I'm not scared. My data array now becomes a, a, g, w, l, j — observe that there are now two copies of a. Now I do the following to walk over one link in the cycle: cx equals pos[cx]. That is pos[0], which is 1. So cx has now become 1. In the next step, I just continue: data[1] equals data[pos[1]]. What is pos[1]? pos[1] is 2. So that's data[2], which is g.
So now my data array has become a, g, g, w, l, j. At this point, I step cx up to pos[cx], which is pos[1], so cx is now 2. The next step is again the data assignment: data[2] equals data[pos[2]]. What is pos[2]? pos[2] is 5. So that's j. What is my array now? a, g, j, w, l, j. I'm just looking at a cycle, taking the next element of the cycle, and copying it into the current one. As a result, there are always two copies of something, temporarily. How many people are comfortable with what's going on here? So I started at position 0, and I kept doing two things. One is data[cx] equals data[pos[cx]] — pull the next value in the cycle over to the current place — and the other is to step over to the next position. Initially cx was 0; data[0] used to be z; I overwrote it with data[1] because pos[0] was 1. So the first position became a, as well as the second one. Then cx changed to pos[cx], so cx became 1. Then I wrote data[1] — again, it's always data[cx] equals data[pos[cx]]; that's the basic template statement. And the other is cx equals pos[cx]. So there are two steps you execute in a loop. In the second step, data[1] gets assigned g, which is data[pos[1]]; pos[1] is 2. So now it becomes a, g, g, w, l, j. Then cx steps over to pos[1], which is 2. And then data[2] goes from g to data[pos[2]], which is data[5], j. So now there are two copies of j, just like there were two copies of g, and two copies of a, earlier. But I've preserved the original z; I'll fix it when the time comes. So what happens in the next step? The next thing that happens is cx equals pos[2], which is 5. Now I could try to do the following: data[5] equals data[pos[5]]. What is pos[5]? It's 0. So that would be data[0], which is a. And that would be wrong — I don't want that. I have looped back; the first cycle has closed.
And it's pretty easy to detect that: this 0 is the same as the starting position I remembered. As soon as that happens, you should not be using data[pos[...]] anymore. You should instead plug in whatever you remembered at the start. So we replace this wrong computation with whatever I was storing in temp. And that finishes one cycle. Once that happens, your new value of data will be a, g, j, w, l, z. The z is what came from the temporary variable. So this has exhausted one cycle. But this one cycle doesn't cover the whole permutation, because these two have remained unsorted. So let's keep track of what cx has been through. cx started at 0; then it became 1, then 2, then 5. So basically, I'm now done with 0, 1, 2, 5. The others still remain to be done. If you follow the same procedure starting from any one of them, you are now going to get a relatively trivial cycle with two things in it, and that's like a swap. So shifting things around in a cycle is a generalization of swapping two things: remember one, pull it out, have the rest play musical chairs, and then insert the temporary at the other end. So overall, what does this give us? It tells us that, no matter how big the records are — each array element could be 1,000 bytes long, I don't care — I can write an efficient sort routine which only creates a permutation array. And then, given the permutation array, I can fix the original data array to become sorted in at most linear time. The number of data copies you do in a cycle is just the length of the cycle, and the sum of all the cycle lengths is at most n. In fact, there could be trivial cycles with only singleton elements in them; those reduce your time. If something happened to be in the correct sorted position by accident, then it's already done. So in the worst case, you're going to do a linear amount of data copying.
Whereas if you trace through what merge sort would do, you would actually copy those 1,000 bytes per record how many times? Log of the number of elements times. Here, you do the log n amount of copying only on the indirection array pos, which is only 4 bytes per element. Then you prepare the permutation array once, and finally do a one-time application of the permutation through cycles. So you'll end up not moving all that data log n times; you'll only move it once. Is the benefit of this clear? You can reduce data copies in the case that each array element is really large. Yes? (Question: won't we still be copying those 1,000-byte records around?) Yes, but that's done in only one shot. Every item in data goes to its home in one shot; it just moves once. Of course, you have to move it at least once — the permutation pos expresses what the correct sort order is, and I'm moving every item of data in one shot to its final destination. Whereas if you looked at how merge sort was behaving, because it works in runs, each item could be copied log n times. So let's try to implement this and see how it works. At this point, the pos array is ready. I'll stop the step-by-step prints but still print the names, so the final pos array will be printed. And now let's try this one-shot permutation copying. We'll start by keeping track of what has already been copied. Remember, I covered the cycle 0, 1, 2, 5, and I can cross off those positions to remember that I'm already done with them. So I initialize a Boolean array called copied. One other way of initializing vectors in the standard C++ library is to give the constructor the number of elements and one value to fill into all those cells. That avoids some work in writing a for loop. So you say: I need nn cells, each initialized to false. I have not copied anything yet. And then, forever — what I mean is until copied becomes entirely true, that's what I really want —
Find the next not-copied element. And here are some more easy things to use from the algorithms library. You say: notCopiedPos — find some not-copied position, in particular the leftmost one. You do that by calling the find algorithm: between copied.begin() and copied.end(), you're looking for false, so something that's not copied. That returns something from which you have to subtract begin() to get the offset — it's just like subtracting 'a', or subtracting the character '0'. We'll see this more as we go into templates. Any collection object has a beginning and an end; the end is, abstractly, one position beyond the last element. But begin and end are not directly integer types, so you shouldn't think of them as integers. find will return you another abstract position in the collection, and you can subtract begin() to get an index. I could have done this with a loop: go through position by position, and if you find false, break the loop and use that value. This is just a one-line shorthand for the same thing. Eventually, you don't want these kinds of trivial loops to pepper your code. You want to pick up all these available routines from the algorithm package and use them, so that things become much easier to read, and there is less typing and fewer mistakes to make. So I find the first not-copied position. Now, if notCopiedPos is less than 0 or greater than or equal to nn, that means all positions have been copied, and I can break the infinite for loop here — this is an infinite loop: no initialization, no test condition. So then break. Otherwise, I remember the beginning of the cycle as ncpos. And I remember the names — I think this is wrong as written; I need to remember names at ncpos. And now, let me not write it as a while; let me first write it in a relatively uncivilized fashion. So I'll say, again, an infinite for loop. This is the cycle-shift loop.
Whereas the other one is the loop to find the next not-copied element — the entry into the next uncopied cycle. So the outer loop finds an entry point into an uncopied cycle, and then the inner loop does the cycle shifting. How do I do the cycle-shift loop? I've already preserved the first element of the cycle. Now, the first thing is that I should not be doing the cycle shift if pos[ncpos] is cycleBegin. So I say: if pos[ncpos] is equal to cycleBegin, then I'm done; I should break. Otherwise, what should I do? I should do data[ncpos] equals data[pos[ncpos]] — that was the basic cycle-shift step. And finally, I have to walk forward in the cycle one position, and that's ncpos equals pos[ncpos]. Is that OK? And just so that we don't lose track of this entirely, I'm going to print things at the end — not after every cycle shift, but after one whole cycle. But after I exit the loop, what will I have to do? I've exited when pos[ncpos] became cycleBegin. So now I have to say: data[ncpos] equals the value I remembered. I check that pos[ncpos] is equal to cycleBegin — then I've come back to the beginning of the cycle. I'm not sure this is completely bug-free, but let's run it and see. After all this, I'll print out the data array. Suppose I do that — let's see if it sends smoke off my machine or whatever. "data was not declared in this scope", line 57 — ah, it's names here, not data. Clearly that wasn't working. Just give me a second. So this is the original array, and this is the permutation array. So in the first step, I should print — sorry, that doesn't make sense. Let me print out what I'm trying to do; then it will be a little clearer. Let me not print names every time; I will write out what is being assigned here. So I'll say: ncpos gets the value from pos[ncpos]. If that's wrong, we'll know why it's going wrong. So: "replaced by".
So the contents of names[ncpos] will be replaced by the contents of names[pos[ncpos]]. I'm just printing out what each copy should be before any mistakes are made, so we can debug gradually. So: 0, I'll copy 1. 1, I'll copy 2. 2, I'll copy 5. Now "5, I'll copy 0" means we're kind of done, so let me print a message for "cycle done". Yes? (Student: you have to set all the others to true as well.) That's right — good observation. I also have to mark all the other positions in the cycle as done. So 0 gets 1, 1 gets 2, 2 gets 5, and the cycle is done, because 5 loops back to 0. And then 3 and 4, and that cycle is done. So now, if I finally get the courage to print out what's happening: after the first cycle, Alligator, Giraffe, Jackalope, and Zebra are in the correct positions, as you saw in the example, but Wolf and Lynx are still not correctly placed. And in the second cycle, I basically swap 3 and 4, and now everything is sorted. So there is a bit of a Josephus problem going on here, in that as I am eliminating cycles by marking entries true, I have to keep doing a linear search in the copied array to find out what remains not copied. That is an additional complexity. You could try to think about how to avoid it. That search over copied looks like one line of code, but it's not cheap: every time, I have to scan from the beginning to find out what work is left to do. Can I make that more efficient? I could try to squeeze out values, but that would be a little painful. So think about that. Any questions so far on basic indirection arrays? An indirection array records the index of the i-th smallest element in the original data array, and it's basically a permutation. Therefore, you can express it as a bunch of cycles. And even if you are not given an additional array to store all the original data — if you don't have 2x the amount of space — you can still hold just one temporary record's worth of data and permute the whole sequence. Any questions on this?
Indirection arrays are very useful because you can declare several of them on the same data array, depending on what your sort order should be. For example, suppose I have a database, conceptually, which has two arrays, each of the same size cn. The two arrays hold student names and student marks — maybe an example uncomfortably close to home, but bear with me. So each index ix accesses names[ix] and his or her marks[ix]. And as always, they have to be ganged: you have to access the two of them at the same time. But the names and marks are not in any particular order. Now suppose we want to do some fast lookups on either one of them. For example: given a student name, quickly binary search and find the student's marks. Or: given a range of marks, report all the students who got marks in that range. Clearly, if you wanted to solve both problems simultaneously, ordinarily you would have to store the data in two different sorted orders, and then you'd have to replicate the student names in two places. But now you can avoid that. You can avoid it by keeping only the original, pristine data unchanged — names and marks — but creating two indirection arrays on them: one for the student names, the other for the marks. To give a very simple example, just to make this clear: suppose I have names and I have marks. Here are the names, and the marks may be something arbitrary — say 14, 2, 11, 5, 9, 7. Suppose those are the respective marks of the students with those names. Now I can create two pos arrays: a names-pos and a marks-pos. Or rather, let me create npos on this side; it becomes cleaner. So npos: the first position has to point to the smallest name. Counting 0, 1, 2, 3, 4, 5, that's 2. The second smallest name is b, at position 4. The third smallest name is m, so that's 5. Then we have 1, then 3, then 0. So that is the npos array. Whereas mpos will have its own order.
So mpos: the smallest mark, 2, is here at position 1, so mpos starts with 1. Then we have 3, then 5, then 4, then 2, then 0. Any questions about this? Now, we know how to compute each of them. As a software engineering exercise, it seems like a horrible waste if, to build a pos array for any kind of data record, you have to write separate code each time. Ideally, you should be able to write one piece of code which, given any record type and the key on which you want it sorted, will find a pos array for it. We'll see how to do that as generic code once we get into templates, much later in the course — once we know about classes and templates, we can do that. But can we get back to the kinds of queries we'd like to answer on this? We said: find the marks obtained by a given student. How do I do that? Basically, you have to look at the names array through the npos view. And remember, this view is sorted: as the index increases, names[npos[...]] also increases. So now we can do binary search. Meanwhile, if I had two marks, mbegin and mend, and I wanted to find all students in that range, what would I do? I would use marks through mpos, and I'd do binary search there. So of course, I assume that is the case. What will happen is I'll find the two positions between which the marks lie, and then I'll access the records using those. Any questions about how to code it up, before we actually get started on it? So let's try this out. You haven't missed much: string, vector, algorithm, et cetera. Now I say vector of string, names. Then I do the usual. Suppose I have just five to keep it small. And then I have a vector of int, marks. And what you learn in the course, you should learn in life as well: if you're quickly trying to create an example, avoid duplicating the values you type in — 11, 9, and so on. So, five animals with five marks. Now, what's the indirection array?
Let me dispense with long names and use exactly the same names as in the example on paper, so we can relate back to it. Then m got 7. Now, we already know the indirection array, so I won't write code to find it again — that's painful. So I'll say vector of int, npos. There are various ways of declaring it; maybe I should play it cheap for the purpose of this lecture and say int npos[] = { 2, 4, 5, 1, 3, 0 }. Similarly, you declare mpos as { 1, 3, 5, 4, 2, 0 }. So there are the two columns of your database, with individual indices on them, called npos and mpos. Now, suppose I want to write a binary search on the marks. Suppose I want to find all students with marks between, say, 5 and 9 inclusive. Then I need to do binary search for 5. So let's say int mlow equals 5, mhigh equals, say, 9. Now, if I have to search for those positions, how do I go about it? Remember the old binary search code? This is what it looked like: low and high, and so on. Again, because we haven't learned how to write functions yet, we need to copy and paste the search code twice over. So this is the low bracket for the low end, this is the high bracket for the low end, and this is the answer for the low end. While the low bracket is less than the high bracket, the mid for the low end is computed as before. I don't print anymore. But now the important thing is that my query here is mlow, and when I probe the marks, I can't just write marks[midlow] — I have to write marks[mpos[midlow]], with another bracket. And similarly when mlow is greater than that same thing, or when mlow is equal to that same thing. I'll just show one part of it, because it's too tiring; after that you can continue, and we'll go on to matrices instead. So answer equals mid, and so on.
And then I'll print out something, say midlow and also anslow. So suppose I want to access the names and marks. Now I say names[mpos[anslow]]; mpos, not npos, because we are searching on marks now. And the marks, through mpos again. Let's see if this works at all; probably a million compile-time troubles. This one should be names, marks; the usual copy-and-paste trouble. And this should be names.size() - 1. So see, I looked for the low mark, which is 5, and 5 was obtained by y, so I see y and 5 here. I'll post this code; you should complete it to find all students who got marks in a particular range. Fine? Yes? [Student:] We are assuming that the mark we search for belongs to somebody. So what if no one actually got that mark? That's also an extension you can do as a practice lab. When you binary search and your query value falls between two adjacent stored values, then depending on what your code wants to do, you can snap to the left or snap to the right, but report that you didn't quite find the value and snapped. That's what you need from a binary search routine. So the solution is, as I showed in the code, two indirection arrays, npos and mpos, where names[npos[nx]] increases with nx, but marks[npos[nx]] generally will not; and conversely, marks[mpos[mx]] increases with mx, but names[mpos[mx]] need not. And then you can use these two index arrays to do all kinds of interesting access on the data, even though the original data has not moved an inch, or a bit. So that's how indirection arrays are organized, and that sort of wraps up the starter problems on 1D vectors and arrays. Now we'll lift our dimensionality to 2D and get into 2D matrices. Declaration and storage of 2D matrices is very, very similar to declaration of 1D native arrays. This is the native-array way to declare matrices in C++ with g++: you say, I have an integer matrix.
So int is the type of each cell, imat is the name, and then you give the number of rows and the number of columns: instead of one box bracket you have two box brackets, as simple as that. Or you can declare a double matrix called dmat, which again has rows and columns. Internally, there is no distinction between 3 by 5 and 15. The storage required for a matrix with three rows and five columns (rows go vertically, columns go horizontally) is indistinguishable from allocating a one-dimensional array with 15 elements in it. C and C++ compilers implement what is called syntactic sugar; it just makes life a little sweeter for you, hopefully, by using the information about how many columns there are. Since you have declared the number of columns, say five, the compiler now knows that if you have to access imat at rx and cx, then the memory cell you are looking for is at the base address of imat plus cols times rx plus cx, in units of int. Why that funny multiplication? Here's the picture. Suppose I declared int imat[3][5]; that's the number of rows and the number of columns. Internally, the C++ compiler logically thinks of the matrix as three rows and five columns. The columns are numbered 0, 1, 2, 3, 4; the rows are numbered 0, 1, 2. But in memory, what happens is that imat starts at a particular place; it's an array of int, so you need four bytes for each element. Cell (0,0) is stored in the first integer, then (0,1) in the next four bytes, (0,2) in the next four bytes, and so on, right up to (2,4), which takes another four bytes at the end. So the 15 cells go from integer 0 to integer 14. Let's see a basic declaration: three rows, five columns, int imat[rows][cols]. Same syntax, just a simple extension; keep adding dimensions. Now again, I'll do some address-voodoo magic to get the address of the very first (0,0) cell, and then turn that into an integer by force.
You shouldn't really be doing this; these are all bad, illegal things to do, but it gives us a peek into how the memory is organized. That address is the base. Then this is the basic double for loop: row rx from 0 to rows, column cx from 0 to cols, and I get the absolute address of the (rx, cx) element of the matrix. Then I print the row and the column, the address of the current cell as an integer, and also address minus base, to show what's happening. So I compile that. Now when I run it, well, the address itself is some arbitrary hex number which happened to have its highest-order bit set; that's why it printed as negative. If that upsets you, we can always print it as unsigned. Depending on how the address space is organized, your address can look negative, but it's actually not; the sign is meaningless here. So that's the address. Element (0,0) has relative address 0; its absolute address is whatever it is in this particular run, and if you run it again, it can change, as always. Element (0,1) has relative address 4, four bytes forward, because the first four bytes are needed for the (0,0) element. Element (0,2) is at relative byte offset 8, (0,3) at 12, (0,4) at 16. So overall the first five elements take 20 bytes, starting at offsets 0, 4, 8, 12, 16; bytes 16 to 19 hold the last of them. So bytes 0 through 19 are the first row, and then the second row, namely row number 1, starts at just the next position. Overall I take 15 times 4, or 60 bytes. The last element starts at byte 56 and occupies bytes 56, 57, 58, 59. So bytes 0 through 59 are allocated for this 15-element matrix: 15 times 4. Now if I change the type to double, nothing changes except the storage model. Instead of 60 bytes I now take 120 bytes; the last element takes 8 bytes, going from 112 to 119. The element indices are the same, the storage taken doubles, and the storage is addressed exactly the same way. The base address is this.
If you keep adding the second number to it, you get the absolute byte address in RAM of that array element. So this is how 2D arrays are organized. Now if you want to assign values to them, that's pretty easy: you just write something like imat[rx][cx] = 0.5, or whatever. Of course, nothing happens visibly; we're not printing anything, but this will compile. If you wanted to, you could always print the values back out; I'll just combine that here, silly old habits. So you read a value and you write a value with exactly the same notation as before, except that a new index is stuck into another pair of box brackets. So all the cells are 0.5. And as far as native 2D array declaration and initialization goes, that's pretty much it; there's nothing else you need to know. But once again, note that the knowledge that the 2D array had three rows and five columns is lost immediately after this declaration. The C++ compiler and runtime system are not going to remember it for you; you have to hang on to rows and cols for dear life. Earlier it was just one number to store; this time it's actually vital to store both numbers, because the way the compiler computes a cell's relative address is by remembering that each row has five things. If you go into the second row, it has to understand that the first row had five entries, because the arithmetic it does multiplies by the number of columns. Let me show it here. If I write the linear cell number into each cell, the first row is 0, 1, 2, 3, 4, and then 5, 6, 7, 8, 9, then 10, 11, 12, 13, 14. Then the cell number of (i, j) is equal to what? 5 times i plus j. So for example, cell (2,3) is 5 times 2 plus 3, which is 13: cell number 13. That's why the C++ compiler always has to be told that the array has five columns. Without that knowledge, it's just a one-dimensional byte array.
So this creates infinite misery if you want to pass 2D arrays into functions. It's almost impossible to write functions that accept native 2D arrays of arbitrary size, because the column count is hard-wired into the type, and it's outside the array itself. For all these reasons, although we could sort of live with native one-dimensional arrays, I just don't like dealing with native two-dimensional or multidimensional arrays at all, because that knowledge of five columns is going to haunt you when you write functions: you'll keep passing it in, or you'll keep pretending that the 2D array is actually a 1D array. So why pretend? What's happened is that there is this other library you can use, called Boost, which gives you all of this. You include the Boost headers using a more complicated-looking path: boost/numeric/ublas, which is basic linear algebra, and then matrix.hpp for matrices and io.hpp for matrix input/output. These two basically give you all the basic stuff you need for 2D matrices. Note that I have not opened a default namespace here; instead you can now start using the namespace boost::numeric::ublas, which follows exactly that include path. All these classes and types are registered inside that namespace. Anyway, you can forget those details. You can say: I want a matrix of doubles, called m, of size 3 by 3. And you now get two methods, size1() and size2(), with which you can get the number of rows and columns. So next time we'll go on to study Boost in some more detail before starting matrix algorithms.