Okay, so thanks for your course reviews; it's a humbling experience. I was never under the impression that I was a good teacher, but for the last six or seven years I've only been teaching graduate elective courses, very research-oriented, to classes of 30 to 35. So this is a challenge I wanted to take up, and I hope to do well by the end of the semester at least. Your message is loud and clear, which is that my message isn't, and there are many reasons for that. As far as the surveys go, acoustics and voice are a major problem. If that problem could be removed, a lot of people would get a lot more out of the course. When the room is quiet, I can get by better; when the room is noisy, it's difficult. So clearly the acoustics are not ideal, and I'd like to always remember to slow down. If I'm outrunning you, the best way to deal with it is to silently raise your hand from the back. Instead of doing something with the desk, just raise your hand, maybe both hands if you're desperate enough, and I'll slow down immediately and repeat what I was saying. That's the best way to deal with it. If you make a noise, the situation just gets worse: I lose focus, I strain my voice, and so on. So we'll continue to work on speech. Acoustics is a more difficult problem. In fact, today I noticed that the Convocation Hall acoustics were slightly better, because there it's a hard, metallic ring, so you can wait for your voice to die down before you continue. Here there's excessive padding, but the geometry isn't quite right. We can keep talking about that. What's the sense in having these back windows with curtains that can only be raised with motors? If the power fails, that's the first thing you want to do, and you want a crank, right? So forethought is rare. We also have to slow down and give more examples; those are kind of expected. I'm trying to increase the number of examples quite dramatically. You'll see, and we'll slow down as well.
But one point in the review where I would like to fight back with a protest is about material to read: you are as much in charge of that as I am. This is not a particularly specialized course; there are hundreds of excellent offerings of very similar courses around the world, and the whole internet is out there. So collect whatever material you want. The right way to avoid information overload is to post links on Moodle; the TAs or I will read the links you have posted, and we can shortlist good things to read. We can say these are really important and the others may not be so important. That way everyone picks up more of the material that matters. But all that being said, acquiring skills in programming is like acquiring a human language. You picked up your first language between the ages of about 1 and 3 or 4, and it's never a very orderly, linear process. It's not like you learn all the nouns in the language before you pick up your first verb. You pick up some nouns, you pick up some verbs, you make a lot of mistakes and say things that make people laugh, and then eventually you figure out how to speak, and most of us, including myself, never figure out how to speak well. Then there's the other comment I had, which is that you can view skill acquisition in a programming language or system in two different ways, and one of the views is not correct. Is it a novel or is it a jigsaw puzzle? If you're reading a novel, it's more or less a linear presentation: you start at the beginning of the plot and read through to the end, and if you are too impatient, you flip to the last page to see who the murderer was. That's the novel view of skill acquisition, but the correct view is the jigsaw puzzle.
The language and the programming system were not designed to be read from start to finish in a linear order; they're meant to make programming jobs easy for experts. So it's very nonlinear: there are multiple pieces that fit together from all directions, while time in the course, unfortunately, proceeds linearly. So we have to keep visiting and revisiting what we are doing in the light of partial new information. Someone cracked this joke, I won't name the student: figuring out CS11 itself is an iterative algorithm, like computing the value of pi, where you gradually get better and better values; it's not a one-shot thing. So that's my counter-feedback on the feedback, but keep the feedback coming, and I really promise to improve my delivery and also slow down.

Last time, of course, I just started this off and never talked about it in much detail. So let's start with the Josephus problem. It's a very nice problem with which I will illustrate many things: writing slightly tricky for loops, and two alternative encodings of the problem, one of which looks like simpler code and may even be more efficient, while the other has certain positive elements. The first approach is what we started last time. Num people people stand in a circle, and suppose they are numbered 0 through num people minus 1. These are their ID card numbers or PAN numbers; they are not going to change, and a person is known by that number. We start at some arbitrary person, say 0, and we keep skipping. You count a skip only if you are skipping over a person who is in; a person who is out is not counted. And since you keep going around the circle, you have to take the position mod num people to do the correct thing.
You repeat this until the circle becomes empty, and we could ask various questions. We could ask in what order of IDs people are evicted, or which person survives to the end; there are various versions of the game. Or, at any stage of the algorithm, if there is a very large number of people, we could ask whether this particular person is still in or that particular person is out. These are different questions. Our first implementation models the state of each person as a bit, or Boolean value: the person is in, or present, which is true, or the person is absent, which is false. We set this up as follows, and professional programmers often code like this: they start off the code after making a decision about the data structure. The data structure we will use here is an array of bits, or bools, and we will initialize them all to true. The programmer will type this comment in first, without the code, because the code is longer and more distracting. Then you start this loop: I initialize victim to 0, and I count the number of evicted people as num evicted. Yes? So, this is an array of Boolean values, and you can access present[0] through present[num people minus 1]. For every person ID you can look up a bit that says whether the person is still in the circle or has been sent out of it. Initially they are all present, all in the circle, and therefore you initialize them all to true. I also initialize two variables. One is victim, which will keep rotating around this loop, logically the circle. The other counts the number of people evicted so far. Strictly speaking we do not need the variable num evicted, because we can always run through the array and count how many entries are false; that is the number of people evicted. But doing that every time would cost time, so I maintain this additional variable.
Now, as long as the number of evicted people is less than the number of people I started with, I keep running the loop. The first step is to skip over skip present people. The second step is to evict the victim. Again, a good programmer would first type in these two one-line comments, to remind themselves in human language, at a high level, what to do at specific points in the code, and then expand them into the code that implements those actions. That keeps the programmer focused, and it also makes the code easier for someone to read later. Another important thing a good programmer would do, and which I did not do the first time I presented the problem, is to think carefully about how victim runs over the array. This is actually an array, but it is implementing a circle. Here is my circle: 0 is here, num people minus 1 is there, and logically they are connected in a loop. Victim points to some particular person. Initially it is at 0, but then it goes around, and when I skip, victim is set to a new position. What properties should I assume about these positions so that the code becomes as simple as possible? Initially victim points to 0, who is present, but in general, as I move victim around, it may well land on a person who has left. So the skip step inside the loop should not assume anything about where victim is: it could be pointing to a person who is present or absent. In either case the skip step has to skip over exactly skip present people, and the guarantee we have to give is that at the end of the skip substep, victim points to a present person. My first version did not satisfy this property, so it was uglier code. So the contract is: no matter where you start, count exactly skip present people and stop on a present person.
If you build up these small contracts inside your code, the statements will work reliably together; otherwise you get off-by-one errors or edge cases that are not quite handled correctly. Typically, if you do not think it through, you will try to patch the code with some duct tape or Sellotape, then something else will fall apart, and this goes on. If you pick clean contracts between blocks of statements, you will not need to patch your code as often. Now, this for loop is tricky. For starters, it is not like the for loops we have written so far, and there are some elements of subtlety. We start off with a counter, to skip, which counts how many skips remain, and initialize it to skip. But observe that there is no condition, and the stepper does not update to skip. There is no condition because I will break out of the loop; I will never exit it normally. In the stepper all I do is advance victim by one position around the circle. Inside the loop I check whether the person at the current victim position is present. If so, I am skipping over one present person, so I decrease to skip by one, and if to skip reaches zero, I break. A couple of points: to skip is initialized to skip, which is assumed to be strictly positive. The break is executed only when to skip drops from one to zero, and to skip is decremented only on a present person, so the break can only be taken when the victim is present. Therefore, once you exit this for loop, present[victim] is true. This is very simple reasoning, but if you practice it all the time while writing loops, you will get the boundary conditions right. Of course, if skip is large, we may pass over the same candidate victims many times, including people who are no longer present, and all of that is expensive. That is what we will address in the second formulation.
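The bit-vector formulation can be put together as a minimal C++ sketch. The function name josephusBits, the use of std::vector<bool> instead of a native array, and returning the eviction order are assumptions for illustration. The skip loop has the structure just described: no condition, a break on the way out, and a stepper that only advances victim around the circle.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: returns IDs 0..numPeople-1 in the order they are evicted.
std::vector<int> josephusBits(int numPeople, int skip) {
    assert(numPeople >= 1 && skip >= 1);  // to skip must start strictly positive
    // Data structure: one bool per person ID, all initially present.
    std::vector<bool> present(numPeople, true);
    std::vector<int> order;
    int victim = 0, numEvicted = 0;
    while (numEvicted < numPeople) {
        // Skip step: count exactly `skip` present people, starting at the
        // current victim position, which may be present or absent.
        // Contract on exit: present[victim] is true.
        for (int toSkip = skip; ; victim = (victim + 1) % numPeople) {
            if (present[victim]) {
                --toSkip;
                if (toSkip == 0) break;  // only reachable on a present person
            }
        }
        // Evict step: constant time.
        present[victim] = false;
        order.push_back(victim);
        ++numEvicted;
    }
    return order;
}
```

With this counting convention the person at the current position counts as the first of the skip people, so josephusBits(7, 2) evicts IDs 1, 3, 5, 0, 4, 2, 6 in that order.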
Note this use of for without a condition, using a break to get out, and also how the stepper does not update the main loop variable. The update of the main loop variable depends on data, so it belongs inside the loop body. If you have an unconditional change to the loop variable, like running through two arrays and computing a dot product, that is the kind of thing you want to put in the stepper. If the change to the loop variable is predicated on the state of other variables, like arrays, then it is much easier to understand if the updates to to skip are made inside the body of the loop, where you access those variables. So it is very simple code, but if you did not think carefully you would write something messier or harder to understand. In yesterday's class people had various questions about what would happen in this case or that case. The nice thing about writing it this way is that it handles all cases correctly; you do not need to worry about them. The invariants in your mind are basically that skip is positive, and that when to skip goes to zero I must have just walked onto a present victim. Once you do this, the eviction itself is trivial. I have already positioned victim on a present person, so I just set present[victim] to false and increment num evicted, so that the outer loop eventually terminates. So this is a nested loop: the outer loop runs while num evicted is less than num people; the skip loop advances victim and positions it on a present person; and then that present person is made absent. The design point here is that skipping is messy and expensive, but eviction takes constant time and is very fast. Victim now points to an absent person, but that is okay, because in the next iteration the skip step takes care of it, and the present people will be counted correctly. The natural concern is that as the algorithm progresses, in its later stages most of the people are absent.
I am spending a lot of time checking that a person is absent and not decrementing to skip. It seems like I am wasting a lot of time skipping over people who have already left. So an alternative is not to have a bit vector where every person is marked as present or absent, but instead to maintain a shrinking integer array of person IDs. We will call this array survivors, because those are the people who have survived so far. In the beginning, survivors[0] will be 0, survivors[1] will be 1, up to survivors[num people minus 1], which will be num people minus 1. Those are the original IDs of the people in the circle. As people are evicted, we squeeze out the evicted ID and reduce the effective array size by 1, so num people now decreases step by step. The allocated space stays the same: with native C++ arrays, once the space is allocated, it is yours until the array goes out of scope. But we use only a prefix of it, which shrinks toward 0. So here is the picture. This is like busy eviction: eviction now takes a lot of effort. Why? Suppose at some point I find the victim position, and the victim's ID now has to be thrown out. Unfortunately, unlike in the sorting example, I cannot just swap the victim's ID with the last one, because then I would be changing the order in which people are standing in the circle, and that is a violation of the specification. So instead I have to squeeze out the victim position by copying everything from victim plus 1 through num people minus 1 into the contiguous segment from victim through num people minus 2. Those two cream-colored segments have the same length. There is a shift by one: the victim gets squeezed out, and the cell at the right end, colored gray, becomes unused. Then I decrease num people by 1. An important question is in what order to do the copies. Because the victim's old value is no longer needed and we are shifting left, the copies should go from left to right.
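A minimal C++ sketch of the survivors formulation follows; the name josephusSurvivors and the use of std::vector are assumptions. Eviction squeezes the victim out by copying left to right, and skipping is a single modular step. One assumption to flag: the lecture states the skip as victim plus skip, mod num people, but after a squeeze the next person already sits at index victim. To reproduce the same eviction order as the bit-vector version, where the person at the current position counts as the first of the skip people, this sketch uses skip minus 1.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: same output convention as the bit-vector version.
std::vector<int> josephusSurvivors(int numPeople, int skip) {
    assert(numPeople >= 1 && skip >= 1);
    // survivors[0..numPeople-1] holds the IDs still in the circle, in ring order.
    std::vector<int> survivors(numPeople);
    for (int ix = 0; ix < numPeople; ++ix) survivors[ix] = ix;
    std::vector<int> order;
    int victim = 0;  // an index into survivors, not a person ID
    while (numPeople > 0) {
        // Skip step: everyone in the prefix is present, so one modular step suffices.
        // (skip - 1 is an assumed convention; see the lead-in.)
        victim = (victim + skip - 1) % numPeople;
        order.push_back(survivors[victim]);
        // Evict step: squeeze out survivors[victim], copying left to right so
        // each value is read before its cell is overwritten.
        for (int cx = victim; cx < numPeople - 1; ++cx) {
            survivors[cx] = survivors[cx + 1];
        }
        --numPeople;  // the gray cell at the right end is now unused
    }
    return order;
}
```

With this convention josephusSurvivors(7, 2) also yields 1, 3, 5, 0, 4, 2, 6. When the evicted person was last in the prefix, victim equals the new num people, and the modulo wraps it back to the front.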
There is an alternative situation: suppose in some application I have an array like this, and I need to open up a new position here, so I have to copy this segment one step to the right. In that case you should copy the rightmost value first. If you are shifting left, you should copy the leftmost value first. This loop looks easy enough, although it does cost time: for cx, the copy index, starting at victim, while cx is less than num people minus 1, plus plus cx, set survivors[cx] equal to survivors[cx + 1]. That copies the block over, and then num people is decreased by 1. So eviction has become somewhat complicated, but skipping is now trivial, because every person ID in survivors is present by construction. Skipping is as simple as victim equals (victim plus skip) mod num people: num people has shrunk, and everyone left in the array is present. So this takes constant time. Note that num people decreases by 1 after each eviction, and when num people reaches 0, you quit. So we have just seen two methods. One is the bit vector representation, where each person is marked as present or absent. The other is the integer array representation, where the IDs of present people are kept and the IDs of absent people are removed. The tradeoff is that in the first representation skipping was messy and eviction was trivial; in the second, skipping is trivial and eviction is messy, because it moves data. Which is better depends on the goal of the computation. We will do some back-of-the-envelope calculations now, but even then you cannot say in general. Suppose this is a lottery scheme held over all the people of India: we have a ring with 1.2 billion IDs in it, and someone asks the database whether a particular person has won the lottery yet. Suppose there are lots of queries like that.
For that kind of workload, the bit vector representation is nice, because every query can be answered in constant time. What is a bit vector? A Boolean is a bit, so an array of Booleans is a bit vector: array is vector, bit is Boolean. When I declared present of size num people, that was the bit vector, a vector of bits. Yes? Messy means that because of this squeezing out, you have to move data, so it takes time. Yes? What do I mean by evicting, and how do you remove an array element? Remember the way things are stored in the computer's RAM: these are integers in contiguous 4-byte blocks. You could say, well, I'll set the evicted entry to minus 1, but that is essentially equivalent to the bit vector solution. We don't want to keep people who are absent at all. The idea is to shrink the array so that only present people are kept, while preserving their order around the ring, and the only way to do that is to shift the rest of the people up by one position. So if you are doing repeated queries of the form "did this person win the lottery, did that person win the lottery", the bit vector representation is fine, because you just look up one Boolean and return it; querying is very, very fast. On the other hand, with the second representation, if you ask whether a person is in or out, people's positions in the array have shifted by now, so you have to do a scan, or at least a binary search (the surviving IDs stay in increasing order), which you'll see soon, to find out who is in and who is out. But for just running the algorithm, let's study how much time we will take. I'll use short names for easier writing: n is num people, p is the number of present people at any stage of the algorithm, and s is the skip. This is not an exact calculation, but it gives a fairly good idea of how these algorithms behave. In the first case, imagine that at some stage of the algorithm you have this circle. The circle has a total of n people in it, and p of them are present.
For simplicity of argument, let's suppose the p present people are equally spaced. That's not true, but it's a decent average-case approximation. Say there are p present people, like 3 here. Then the distance between two adjacent present people is roughly n over p on average. This is a crude calculation, good enough for our purposes. Now remember the messy skipping logic in the first case: passing over absent people didn't count as a skip at all. So to skip s present people, the total cost would be s times n over p. Between every two present people I'd be wasting about n over p effort passing over absent people, and I'd do that s times, so the total skipping cost is sn over p. Is everyone OK with that? Show of hands? Remember that to skip was decremented only on finding a present person, which happens roughly every n over p steps, and you'd be doing that s times, so it's s times n over p steps. In the beginning p was n; at the end p is 1. So the total time taken is the sum over p from n down to 1 of sn over p. The factor sn can be pushed outside, and 1 plus a half plus a third, and so on up to 1 over p, is about log p. So that is scheme 1. The second scheme is also not easy to analyze exactly, because you don't know where the victim will land. But what's the worst case for the victim position? The first position: the victim gets out, and the entire rest of the current array has to be shifted down. So in the second scheme, the worst case is that in the first step, eviction takes n minus 1 time, because n minus 1 elements have to be copied. In the second step, n minus 2 elements have to be copied, since in the worst case the victim can always land in the first position. And so on, until 1 element has to be copied. So the total cost of the second scheme is about n squared divided by 2.
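Written out, the two back-of-the-envelope estimates look like this, under the same evenly-spaced assumption for scheme 1 and the worst-case victim position for scheme 2. The labels T_1 and T_2 are introduced here only for convenience. Because p runs over every value from n down to 1, the harmonic sum goes up to 1/n, which is where a log n factor comes from.

$$
\begin{aligned}
T_1 &\approx \sum_{p=1}^{n} s\,\frac{n}{p} \;=\; s\,n\left(1 + \tfrac{1}{2} + \cdots + \tfrac{1}{n}\right) \;\approx\; s\,n \ln n, \\
T_2 &\le (n-1) + (n-2) + \cdots + 1 \;=\; \frac{n(n-1)}{2} \;\approx\; \frac{n^2}{2}.
\end{aligned}
$$

For a small constant s, T_1 grows like n log n and stays below n^2/2; for s = n/2, T_1 is about (n^2/2) ln n, which exceeds T_2.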
Sorry, that sum is not log p: p is summed out, so the harmonic sum runs up to 1 over n, and scheme 1 costs about sn log n. So you'll be comparing sn log n against n squared; that is the basic comparison. If your skip is very small, a constant like 2 or 3, then n log n is less than n squared, and you should prefer the first algorithm. Whereas if your skip is very large, for example n over 2, the first algorithm is much less efficient, because you're getting n squared log n, while on the right-hand side I only have n squared. So I should prefer the second algorithm. This also demonstrates that you cannot always say which algorithm is better or worse. It depends on what kind of input data you are expecting and what kind of output people expect. If your only job is to calculate the eviction order, the decision depends pretty much on the value of s. If your code also needs to serve as a database where people very frequently ask whether a person is in or out, that's a different consideration. In that case the first one may be better, even though it may take more time to run to the end of the sequence. These are considerations you need to keep in mind when choosing representations and data structures. So that's what I had to say about the Josephus problem. Any questions on this? Slightly tricky loops, but good practice; two different representations with opposite tradeoffs, easy skip and difficult eviction, and vice versa; and then a comparison of their complexities. The other example you already saw last time, so I'll go over it relatively fast. We know how to find the largest and smallest elements in an array. We can even remember the position of the smallest or largest element. We can use that position to swap the element with the one at the front, and by repeating this we get a sorted array. This is called selection sort: you select the smallest value and pull it to the front.
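Selection sort as recalled above can be sketched in a few lines of C++. The function name selectionSort and the use of std::vector<double> are assumptions. The inner loop remembers the position of the smallest remaining value, and std::swap pulls it to the front of the unsorted part.

```cpp
#include <utility>
#include <vector>

// Hypothetical helper: sorts a in increasing order, in place.
void selectionSort(std::vector<double>& a) {
    const int n = static_cast<int>(a.size());
    for (int front = 0; front < n - 1; ++front) {
        // Find the position of the smallest value in a[front..n-1].
        int minPos = front;
        for (int ix = front + 1; ix < n; ++ix) {
            if (a[ix] < a[minPos]) minPos = ix;
        }
        // Pull it to the front; a[0..front] is now sorted and final.
        std::swap(a[front], a[minPos]);
    }
}
```

The nested loops make this quadratic regardless of the input order, which is one motivation for the merge-based approach that comes later.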
We also saw prefix, or cumulative, sums, where b[i] equals a[0] plus a[1] through a[i]. If you do it the obvious way, you keep adding from scratch for every element of b, and that's wasteful. You should reuse previous computation, and that's pretty easy. b[0] is equal to a[0]. Then, instead of computing b[1] as a[0] plus a[1], we should compute it as b[0] plus a[1]; then we can compute b[2] as b[1] plus a[2], and so on. So you'd ordinarily write b[ix] equals b[ix minus 1] plus a[ix]. But there is a small hitch at the beginning, because ix minus 1 is minus 1 when ix equals 0, so we need to feed in a fake value of 0 from the left. You can do that, say, with this line of code: if ix is equal to 0, supply 0, otherwise supply b[ix minus 1], and add a[ix] to whatever was supplied. No, supply 0, not a[0]: the definition is that b[0] is equal to a[0], so you write this as b[0] equals 0 plus a[0], and b[1] equals b[0] plus a[1]. Let me draw the picture; it will be very clear. Here is a[0], and I want b[0] to equal a[0], which I write as 0 plus a[0], feeding 0 in from the left. Here is a[1], and I want b[1] to equal b[0] plus a[1]. To simplify life, you could instead say inside the loop: if ix is equal to 0, then b[ix] equals just a[ix]; otherwise b[ix] equals b[ix minus 1] plus a[ix]. You could do that as well. Yes? The question is whether, in the one-line version, the last a[ix] creates a problem because it applies only when ix is not equal to 0. No, no, no: look at the nesting. This bracket closes there, so a[ix] is added in both cases. Here is a good example of why you shouldn't pack too many conditional expressions into your code: if it's difficult to read in this large font, imagine what it will be like on your screen. That's one reason not to do it, and the other reason is this comment here. Current microprocessors implement what's called pipelined execution. Inside the for loop, there are a lot of things to be done.
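Here is a minimal sketch of cumulative sums that reuses b[ix - 1] and handles the first element before the loop, so the loop body has no ix == 0 test. The function name prefixSums and the std::vector interface are assumptions. How much the peeled first iteration actually speeds things up depends on the compiler and processor (branch predictors handle such a predictable test well), so treat the speed argument as a rule of thumb.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: returns b with b[ix] = a[0] + a[1] + ... + a[ix].
std::vector<double> prefixSums(const std::vector<double>& a) {
    std::vector<double> b(a.size());
    if (a.empty()) return b;
    b[0] = a[0];  // first value handled separately, outside the loop
    for (std::size_t ix = 1; ix < a.size(); ++ix) {
        b[ix] = b[ix - 1] + a[ix];  // reuse the previous partial sum
    }
    return b;
}
```

For a = 1, 2, 3, 4 this gives b = 1, 3, 6, 10, doing one addition per element instead of re-adding from scratch.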
In each iteration of such a loop, ix has to be multiplied by 4, and you have to offset it from a base address, access those 4 bytes as an integer, and send out the fetch request to RAM; then the RAM responds. Current processors are very efficient at pipelining those operations. The processor says: I'll issue the address for a, and while I'm waiting, I'll calculate the address for b, and so on, overlapping the times. Now this very well-tuned flow, like the airflow over the wing of a plane, becomes turbulent any time you do conditional execution like this, because the CPU has to wait for the value of ix to come back from RAM or a register to decide which branch to take. Generally speaking, pipelined execution is disrupted by conditional execution. So you should have neither this form nor that form. You are much better off handling the first value separately: just say b[0] equals a[0], then start the loop at ix equals 1. That's going to be faster than checking a value every time around the loop. We have already seen applications of cumulative sums. The next problem, which I don't think we can finish in this lecture, but I'll stop with it, is merging sorted arrays. It's pretty easy. Suppose I have two double arrays, a of size m and b of size n, so there are m elements in the first array and n elements in the second. Repeated values are allowed, and each array has been sorted in increasing order; we already know how to sort, so this is not foreign to us. The output array will be c of size m plus n, so the output size is the sum of the input sizes. The output array should be sorted, and it should contain all the elements of a and b, including all repeated occurrences. So my job is to merge two sorted arrays into one sorted output array. Simple job. For example, suppose a is 0, 2, 7, 8 and b is 2, 3, 5, 8, 9.
Each of those sequences is sorted. Then the output should be 0, 2, 2, 3, 5, 7, 8, 8, 9, including all the duplicates. So how do I merge? This should come naturally to most of you. We run two indexes, ax on a and bx on b. These are also called cursors in computer science: cursors run over arrays, picking up values or changing values. We choose the smaller of the two values under the cursors, append it to c, and advance that cursor. That's the basic logic. Initially ax is at position 0 and bx is at position 0. We won't write out the positions, because they are implicit in the diagram; we will just say ax is pointing to value 0 and bx is pointing to value 2. We compare those values: 0 is less than 2, so 0 wins and is put out as the first element of the output. As soon as I do that, I advance ax, the cursor from which 0 came. Now they are pointing to 2 and 2. Suppose I arbitrarily break ties in favor of the upper array. Then the upper 2 makes its way to the output, and ax moves to the next position, with value 7. Now 7 and 2 are compared, and this time the lower array wins, so the other 2 makes its way over and bx advances to the 3. Now I am comparing 3 and 7: 3 wins and goes to the output, and bx advances to 5. Then 5 and 7 are compared; again 5 makes its way out, and bx advances to 8. This time a wins, 7 goes out, ax advances, and so on. That is how sorted sequences can be merged to produce a sorted output. The code is pretty simple, except for the situation where one of the arrays runs out. In that case, by construction, all the values remaining in the other array are greater than or equal to everything you have output so far. So all you need to do is continue with the other array, picking out its values and appending them to what you have output already. There is nothing particularly special to do.
This could happen with either of the two arrays. So here is the code: double a[m], b[n], c[m + n], where a and b are suitably initialized. We start off with ax equal to 0, bx equal to 0, and cx equal to 0. ax and bx are what are called read cursors, and cx is the write cursor; we use it to write to c. While neither input array is exhausted, we compare the elements at the cursors. If a[ax] is less than b[bx], then c[cx] is set to a[ax]. This is one case where C's auto-increment operator is nice to read: you say c[cx++] = a[ax++], so both c's cursor and a's cursor advance. Otherwise, c[cx++] = b[bx++]. Unlike the walkthrough, this version breaks ties in favor of b: when the values are equal, the b value is pushed. Once this loop is over, all we know is that at least one of the input arrays has been used up. That is the exit condition: the loop runs while ax is less than m and bx is less than n, so as soon as either cursor reaches its end, I quit the loop. At this point I can just say: while ax is less than m, append the rest of a; and also, while bx is less than n, append the rest of b. I do not need to check carefully which one is still pending. At most one of those loops will execute at all: for any array that has been emptied out, the corresponding loop does not even execute. Yes? Can both arrays finish at the same time? Even with exactly interleaved values, where you take exactly one from each in turn, running the two cursors in tandem, each step of the main loop advances only one cursor. Yes, that is right: at the end they cannot both run out together, unless they were both empty to begin with. In either case, the last two while loops clean up whatever is left correctly. So that is how you merge two sorted arrays. I will not go into this in detail today, but this offers you another way to sort.
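The merge loop described above translates almost line for line into C++. This minimal sketch uses std::vector<double> in place of the native arrays a[m], b[n], c[m + n]; the function name mergeSorted is an assumption. Like the lecture's code, it breaks ties in favor of b.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: merges two arrays, each sorted in increasing order,
// into one sorted array that keeps every duplicate.
std::vector<double> mergeSorted(const std::vector<double>& a,
                                const std::vector<double>& b) {
    const std::size_t m = a.size(), n = b.size();
    std::vector<double> c(m + n);
    std::size_t ax = 0, bx = 0, cx = 0;  // read cursors ax, bx; write cursor cx
    while (ax < m && bx < n) {
        if (a[ax] < b[bx]) c[cx++] = a[ax++];
        else               c[cx++] = b[bx++];  // ties (a[ax] == b[bx]) go to b
    }
    // At most one of these loops runs: copy whatever is left over.
    while (ax < m) c[cx++] = a[ax++];
    while (bx < n) c[cx++] = b[bx++];
    return c;
}
```

On the example from the lecture, mergeSorted({0, 2, 7, 8}, {2, 3, 5, 8, 9}) produces 0, 2, 2, 3, 5, 7, 8, 8, 9.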
If I know how to merge sorted arrays, then suppose I am given a long array to sort, which is initially completely unordered. I can imagine that instead of being given one unsorted array of n elements, I have actually been given n different arrays, each with one element. An array with only one element is sorted by definition, so if you look at the input that way, those n arrays are already sorted. Now let us take two such arrays at a time and merge them, using the merging technique we have just seen, to make n over 2 arrays of two elements each, each of which will be sorted. Then we take the size-two arrays and merge them to form size-four arrays, and we keep doing this until we get one whole sorted array. Suppose the array length is a power of two, for simplicity. At first the sorted segments have length one, so they are trivially sorted. Then we merge them into sorted segments of length two: the run, or block, or segment, from position zero to one will be sorted, two to three will be sorted, and so on. But I won't be able to say anything about the relative order of elements in one run versus another. Then I merge them into runs of length four, and so on. Naturally this is called merge sort, and here is the picture; don't read the comments now. Initially I imagine the array, here of eight elements, as eight arrays with one element each, individually sorted. Now I take two of them at a time and merge them, creating four arrays of size two, each individually sorted. Today the colors are showing a little better: the pink array is sorted within itself, and the blue array is sorted within itself, but they are not sorted across each other. Then I take two of those at a time and merge them into two arrays of size four.
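The run-doubling scheme can be sketched as a bottom-up merge sort in C++. The function name mergeSortBottomUp and the use of a second buffer that swaps roles with the input after each pass are assumptions, not necessarily how the lecture's code is organized. Each pass merges neighbouring runs of width width into runs of width 2 * width; clamping the run ends with std::min lets it handle lengths that are not powers of two.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: sorts a in increasing order using bottom-up merge sort.
void mergeSortBottomUp(std::vector<double>& a) {
    const std::size_t n = a.size();
    std::vector<double> buf(n);
    // Runs of width 1 are trivially sorted; each pass doubles the run width.
    for (std::size_t width = 1; width < n; width *= 2) {
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const std::size_t mid = std::min(lo + width, n);
            const std::size_t hi = std::min(lo + 2 * width, n);
            // Merge sorted runs a[lo, mid) and a[mid, hi) into buf[lo, hi).
            std::size_t ax = lo, bx = mid, cx = lo;
            while (ax < mid && bx < hi) {
                if (a[bx] < a[ax]) buf[cx++] = a[bx++];
                else               buf[cx++] = a[ax++];  // ties keep the left run first
            }
            while (ax < mid) buf[cx++] = a[ax++];
            while (bx < hi)  buf[cx++] = a[bx++];
        }
        a.swap(buf);  // the merged runs become the input of the next pass
    }
}
```

For eight elements this performs exactly the three passes in the picture: widths 1, 2 and 4, producing runs of length 2, 4 and finally 8.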
And finally I merge these two into a single array which is sorted from start to finish. Now recall that at the beginning of the course I said that accessing any location of RAM takes roughly the same amount of time. But access to a hard disk is much slower, because the hard disk has a rotating platter. On the platter are concentric circular tracks, and on the tracks we record all the bits. The read-write head, just like in an old record player, has to move and position itself over a particular track. Then the disk has to turn until the desired bit is under the head, and only then can the read or write be done. So using a hard disk to read one byte or four bytes is very inefficient, because you have to rotate the disk and position the head just to read a few bytes. Hard disks are best used for what are called sequential reads and writes: you position the head once and then read or write through that entire circle. Then you move the head slightly to the next track, which takes very little time, and scan that whole circle again. That is called a sequential read or write pattern. Merge sort is extremely friendly to the hard disk, because once the runs get a little large, all you are doing is reading two long runs of bytes, suitably encoded as integers or doubles or whatever they are, and writing out one long run. Very soon in the algorithm, your accesses to disk will be in long blocks. And so merge sort is the method of choice for sorting when your data is too large to fit in RAM and lives pretty much on disk. As a particular motivation, Google's search index, for example, involves a lot of merge sorts, because when you read the documents that you have crawled, you look at them one by one, in document order. You scan a document and you find that the word computer appears in this document here, that document there.
But when you are searching Google, you are saying: find me all documents that contain computer. That is like a big transposition: your key changes from being the document ID to being the term ID. Every term, like computer, will have an ID. So this involves a big merge sort. I need to collect every document that had computer in it and put them in one contiguous chunk, so that during search you can access it very fast. That is done using a generalization of merge sort called MapReduce. How many of you have heard of MapReduce as an algorithm? Google's workhorse, the central piece of code that does the indexing for Google, is a MapReduce operation. It is a generalization of merge sort, and very similar to it. Now, the other thing you may realize: in the kind of problems we have been seeing in the labs, suppose I am given an array with duplicate entries in it and I want to squeeze out the duplicates. I want to deduplicate it so that each element appears only once. I could also do that with merge sort, except that when merging, if I find that the two head values are tied, I copy only one of them and skip the other. So merging goes hand in hand with deduplication: I can do it on the fly, at no extra cost. We will stop there, and next time we will look at merge sort in more detail, including how much time it takes.
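Both ideas can be sketched together in C++: sorting by repeatedly merging runs in pairs, and deduplicating during the merge. This is a minimal illustration under our own assumptions. The names mergeRuns and mergeSort and the dedup flag are hypothetical, and the values are ints. Each run is kept in its own std::vector. That closely follows the "n arrays of one element each" picture, but it uses far more memory than a practical implementation would. The number of runs may be odd at any round, so the input length need not be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Merge two sorted runs using read cursors ax and bx.
// Without dedup, ties go to b. With dedup, a tie writes a single copy
// and advances both cursors.
std::vector<int> mergeRuns(const std::vector<int>& a, const std::vector<int>& b,
                           bool dedup) {
    std::vector<int> c;
    c.reserve(a.size() + b.size());
    std::size_t ax = 0, bx = 0;
    while (ax < a.size() && bx < b.size()) {
        if (a[ax] < b[bx]) {
            c.push_back(a[ax++]);
        } else if (dedup && a[ax] == b[bx]) {
            c.push_back(a[ax++]);  // keep one copy of the tied value
            ++bx;                  // and skip the other copy
        } else {
            c.push_back(b[bx++]);
        }
    }
    while (ax < a.size()) c.push_back(a[ax++]);
    while (bx < b.size()) c.push_back(b[bx++]);
    return c;
}

// Bottom-up merge sort: start with one single-element run per input value,
// then merge neighbouring runs pairwise, round after round, until one run remains.
std::vector<int> mergeSort(const std::vector<int>& input, bool dedup = false) {
    if (input.empty()) return {};
    std::vector<std::vector<int>> runs;
    for (int x : input) runs.push_back(std::vector<int>{x});  // n sorted runs of length 1
    while (runs.size() > 1) {
        std::vector<std::vector<int>> next;
        for (std::size_t i = 0; i + 1 < runs.size(); i += 2)
            next.push_back(mergeRuns(runs[i], runs[i + 1], dedup));
        if (runs.size() % 2 == 1) next.push_back(runs.back());  // odd run carried over
        runs.swap(next);
    }
    return runs.front();
}
```

Sorting {5, 2, 4, 7, 1, 3, 2, 6} gives {1, 2, 2, 3, 4, 5, 6, 7}. With dedup set to true, the repeated 2 is dropped during the final merge, giving {1, 2, 3, 4, 5, 6, 7}. This works because every run stays duplicate-free. Single-element runs trivially are, and a tie-skipping merge of two duplicate-free runs produces another duplicate-free run.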