Today we will discuss another important concept in programming: representing a set of values. We have had occasions when we needed to represent multiple values of the same type, and so far we have resorted to using differently named variables. We will look at the array structure and see how C++ permits us to access the elements of an array; specifically, we will look at the notion of index expressions. Arrays constitute an extremely important data structure in programming. Particularly when you have to handle a very large volume of data of similar type, you cannot do without such a structure. Obviously, when you are handling a very large number of data values, there is a need to arrange these values in some order, what we call sorted order. So we shall look at sorting of arrays, and for the first time we will look at the execution time that our programs take.

There is also another data type which we have religiously avoided so far, and that is the data type called char, or character type, which permits us to handle strings of characters. We have been using strings in our programs mostly as string constants, mostly written as part of a cout statement where we want a specific string to appear on our screen when we give input. This week we shall also see how strings are handled. That part we will look at in the next lecture, and we will of course have some more programming examples.

The programs from this point onwards in the course, by the way, will become larger and more complex as we learn to handle more complex problems. So do not get scared by the length of a program. So far the programs that we have written are all 5-line, 10-line, 15-line programs.
The programs that we shall now be writing would typically have at least 25 to 30 lines, maybe 100 lines, and they would have a lot of nested control structures because we will be required to use complex logic in implementing them. That is the reason why we will increasingly be using the notion of functions, which permits us to modularize the program and separate out portions of repeatedly used code.

We revisit the problem of finding the maximum. All of you are familiar with this. First we tried to find the maximum of 3 numbers; we had 3 variables v1, v2, v3. Then we said 4 or 5, we could extend the logic, and this is a possible program which will find the maximum of 4 given numbers. We read each one of these 4 into a different variable, start with max equal to some number and keep comparing subsequent numbers with that max. We have already argued that this logic cannot be used if we want to find the maximum of 200 numbers or 500 numbers. So we asked, can we generalize it and use only a single variable v, if we do not need the individual values of v1, v2, v3, v4 later for our problem? If the problem is only to find the maximum, we could do that, and that is in fact how we introduced the notion of iteration and finding the maximum of n numbers.

But imagine that is not so. Imagine you require these values; what do you do then? So we now look at the need for handling multiple values. This is what we did when we just wanted to find the maximum: generalize the comparison operation and the assignment operation and use an iterative control to find the maximum. The important lesson from that exercise is that while we get the maximum, we lose the individual values. We had argued that if I want to find only the maximum, I don't need to retain the individual values. But what if I need them? What if I need to access these values later?
So here is a problem: find the individual digits of a given four-digit number and form a new number which has these digits reversed. This was one of the assignment problems last year. In fact, one of the problems given was that you have to identify three different digits of a number. So people used this kind of logic. They defined integer variables d1, d2, d3. If n was the number, they computed n modulo 10, and that would be the last digit. So they would assign it to d3, and so on.

One of the students, I observed, was trying to use an ingenious trick. He defined d1, d2, d3 as integer variables. Then he said, I have d1, d2, d3 and I have the iteration control. So if I count for i equal to 1 to 3, and if I use this i to access these individual variables, I can write something like di equal to something. This is actually quite an intelligent idea, except that it is wrong. In C, d1 as written like this, d2 as written like this, d3 as written like this, is each a single standalone name. It has no parts. It does not say that d is one name and 2 is an index. It is just a name. So by saying di equal to something, you get a compilation error, because di itself, as written like this, has to be a separately defined variable, and any change in the value of i will not change the name. He was wondering why the hell this was happening, because mathematically that is how you write di equal to something, and if i changes, di changes. Unfortunately, that doesn't work in a program.

His guess was right in that he wanted some similar facility. What if you have to handle up to d9? Later in today's lecture itself, we shall look at an example of implementing multi-precision arithmetic, high-precision arithmetic. Suppose I want to add two 50-digit numbers. The conventional int, long, etc., do not provide me with that facility at all.
The maximum number is 2 raised to the power 31, minus 1. So obviously, I would like to retain the individual digits of a very large number and then handle them independently by writing my own programs for addition, subtraction, multiplication. But how do I retain them? That is the problem. If I have to retain the individual digits, I have a problem of this kind which I would like to be solved.

For extracting individual digits, we can use four variables like d1, d2, d3, d4. This is in fact a program which could do that. I get a four-digit number. I check whether that number is between 0 and 9999. If that is not so, I say it is an invalid number. Please note that by putting backslash n in any string, I am actually forcing a new line, as endl does. So I don't have to write endl separately if I put backslash n. Anyway, I calculate d1 like this, d2 like this, d3 like this and d4 like this. These are the digits. So I print these digits and I calculate the new number, the reversed number. Notice that I have to actually name individual variables d1, d2, d3, d4 if I want to retain their identity, then calculate the individual values and then form the result. I could have done this problem without retaining the individual digits, by the way. There is another way of doing that by generalizing an iteration. But this is just to illustrate that if I need those values, then so far I have no recourse but to do this.

Precisely to address such issues, C++ provides a very powerful representation, for when we want to handle a very large number of values of a similar kind. The key phrase here is similar kind: all integers, all floating point values, or, as we shall see later, all char or all structs or whatever. It is difficult to use variable names for each of these values. Firstly, it is inconvenient. Even if I know I have 200 values, writing 200 variables is inconvenient. And secondly, it is impossible if I don't even know how many variables I need.
We will subsequently be required to manipulate matrices, two-dimensional arrays. All of you are familiar with matrices, matrix multiplication, matrix inversion. We shall be studying matrix inversion. You just can't handle an n by n matrix unless somehow you can individually represent each of the n squared elements of the square matrix and manipulate those values. The manipulated values also have to be retained. And I can't have 200 times 200 variable names; that is impossible. So we need a different mechanism to structure such data. The mechanism in C++, as a matter of fact in most programming languages, is called an array. It provides a single name for the entire collection. That is what is important: a single name. However, it permits individual elements of that collection to be accessed through a mechanism called an index expression. We shall see exactly how arrays are declared, how index expressions are evaluated and used, and so on.

But first let's look at the motivation, why we require arrays. I have shown a diagram here. I have different variables A1, A2, A3, A4, A5. Imagine that these variables have values 53, 79, 41, 94, 38. Suppose these are five variables out of some 100 variables which I require in my program, like d1, d2, d3, up to d100. If I require that many variables, then I might declare them as A1, A2, A3, A4, A5, etc., up to A100. Notice these names. These are all individual names. Assume that the compiler has allocated memory for them one after another. Remember, if they are integers, each memory location will be four bytes long. We already know that. However, these are all individual variables. How nice it would be if we could define all of them as part of an array, a single structure. That is why I have shown these values inside some kind of array-like structure. If I had this array, then I would have a single name for it, say A itself. Not A1, A2, A3 as individual names, but just A as the name of the array.
And then I would refer to these individual names A1, A2, A3 as if they were elements of that array. So imagine that I wanted a one-dimensional array of 100 integers, which at any one time would have some n values, and let's say n is equal to 5 as an illustration. In that case, we may define 100 variable names and use 5 of them. Or alternatively, I can use the notion of an array. What permits me to use the notion of an array? Can I actually have a single name and yet individually identify different elements of the array through some other mechanism?

So this is the motivation. Suppose these are 100 variables, A1 to A100. And let's imagine that A1, A2, A3 are all allocated consecutive memory locations by my compiler. Assume that A1 has been allocated a memory location whose address starts at 12,000. We will call this the base address. Now if A1's value 53 is to be stored here, it is an integer, so it will take 4 bytes. So the next consecutive location will have the address not 12,001 but 12,004. The subsequent location will have the address 12,008. This will be the third element, and so on.

What we notice is this. Quite independent of the fact that we have used 100 names in our program, if the compiler can somehow manage to allocate addresses which are consecutive, then the compiler should be able to do the following. It knows the base address. Suppose I suddenly say, go to the third element and get it for me. Independent of the name of the variable, the compiler should be able to add the appropriate number of bytes to the base address and come up with that element's address. So it is possible for my compiler to access individual consecutive memory locations of different integer variables. If that is possible, then why not have a mechanism by which, instead of individual names, I give a single name to the whole collection? That in fact is the concept of an array. So, to look at how the compiler handles A: an array is a collection of similar elements.
So let's note down certain important points to remember in the context of an array. While an array is a collection, at any one point only one element of the array will participate in any operation. So if I declare an array a of 100 elements, I can't say a is equal to b, where b is another array of 100 elements. Later on, when we study object-oriented concepts, we shall see that using some object libraries and vectors we can actually do that, but the basic programming facilities do not permit it. The array is a very basic concept.

A variable participates in three places: in input or output, in computations, that is, as an operand of an expression, and as a location on the left-hand side of an assignment, as an l-value as we discussed. At each of these three places, the name of the array alone must never appear. What must appear is a reference to a single element of the array. So we should understand that while an array is permitted to be defined, its bare name is not permitted to be used anywhere. It is permitted to be used only through an element of the array at an appropriate place.

How do we specify an element? An element is specified by an index expression, and this index expression is written inside square brackets immediately following the array name, as we shall see. Now that is the difference. For example, if instead of d1, d2, d3 I had declared an array, int d[100], this would be the declaration of an array. The 100 here is not an index expression. When a number appears in a declaration like this, it means I am declaring an array of 100 elements. Later in my program, if I ever say d[i] equal to something, this actually means the ith element of the array d. Notice the very fundamental difference between di and d[i]. The first is di, a single name, and if it is not declared, you will get an error. The second is an element of an array, which means the array named d must be declared somewhere earlier, and then I am permitted to use an index expression.
Very obviously, with such a mechanism you can see that the problem the student was trying to solve in that intelligent way can indeed be solved now, because this is an index expression. If any counter i, j, k, etc. appears in this expression, obviously the compiler should take note of that and do the appropriate computations. We shall see some examples of this to understand index expressions. The point is that an index expression is used to refer to an element, and its value is the actual index. So when we say a[i], it is not a generic ith element, which is what we understand mathematically. a[i] appearing anywhere in my program does not stand for anything unless the value of i is defined at that point. Whatever value i has at that point during execution becomes the value of the index. So if i is equal to 5, it is the element with index 5. If i is equal to 123, it is the element with index 123, and so on.

Here, therefore, is the same notion that we started with. We saw 100 different variables. We now want to declare an array of 100 elements. This is how it is declared. Look at these: a[0], a[1], a[2], a[3]. They are written differently now. Each is not the single name of a variable. In fact, the name of this entire conglomerate is a, and these are the 0th element, 1st element, 2nd element, 3rd element, etc., up to a[99].

But wait, we said we wanted 100 elements and we had sort of named them 1, 2, 3, 4, up to 100. Why are we reverting to 0? That is because in C/C++ the array index starts at 0. So what we would traditionally call the first element is actually the 0th element. The element with index 1 is in fact the second element, and so on. The hundredth element has index 99. When I say 99th, I mean index 99, which is the last element. This is a peculiarity that we have to remember. It might cause some confusion when we try to translate our mathematical formulations into C programs, but with some practice we can get over it.
So we note that when I write a[0], a[5], a[i], a[j - 5], etc., this is actually an expression which is permitted to be written. Depending upon its value, a particular element will be indicated. And array elements start with index 0.

Here are some additional important points to note. First of all, an array is a special data structure: a single name representing a set of values, and all values must be of the same type. It is not possible to declare an array in which a few elements are of integer type and a few elements are of floating point type. All elements of an array must be of exactly the same type.

Whenever we declare an array, we declare three things: the type of the values which will be held in that array, the name of the array and the size of the array. So when I say int a[100], the size of the array is 100. That means it will have a total of 100 elements. Of course, as we understand, the index will run from 0 up to 99. Each element will hold only an integer value. Here is another example, float daily_temperature_value[31]. I am just recording daily average temperature values, let's say. I will need an array of 31 elements. Obviously in the month of February I will have only 28 elements filled, or 29 if it is a leap year.

In general, then, I would declare an array with a size large enough to contain the number of values that I actually want to store. For example, suppose I want to store the marks of students in this course. I know there are 600 students, next year there will be 400 students, but I know that in total in CS 101 there can be a maximum of 1000 students. Even if the institute decides to combine both the groups together and conduct a single class, it will be about 950, 960 students. In this case, I would rather declare an array of 1000 elements, although I know that today I have only 560, 570 students.
The name of the array represents the entire set of values and thus cannot itself be used alone in any computation. This is a very important point. The name alone cannot appear anywhere. The name must appear along with an index. An element of an array can be used wherever a variable can be used, and a reference to an array element is made using an index. The index is an expression which must evaluate to an integer value. If I say the xth element of array a, where x is 5.348, it does not make sense. So whenever an index expression does not evaluate to an integer, the compiler forces a conversion to an integer value as per the rules of the game, and that integer value will be used as the index of the array.

Here is a program to find the maximum of n numbers, but using arrays and therefore retaining all the individual values. Look at how we are declaring an array, int a[100]. Instead of saying v1, v2, v3, v4, up to v100, we are saying int a[100]. There is a variable max which will hold the maximum value, n is the number of values for which I want to find the maximum, and i is my usual counting variable. I input the value of n. Now what I do is run i from 0 to n minus 1. Please note that i less than n is the terminating condition. Why? Because the index starts at 0 and ends at n minus 1 for n numbers. What I do here is read a value.

After the reading, let's first look at this loop and then wonder about it. This loop says i equal to 1, i less than n, i plus plus. If a[i] is greater than max, max is set to a[i]. This is the generalized loop instruction that we found earlier, except that there, every time we executed it we had a new input value. Are we getting that here? What was the logic that we had used in finding the maximum of n numbers even though we were not retaining the values? Will this program work? I claim it will work. Let's analyze the way this program is written. I read the value of n.
Then I am executing an iteration which moves from i equal to 0 to n minus 1, and in every iteration it reads a value. So it will read a[0], a[1], a[2], a[3], up to a[n - 1]. Agreed? The loop has a single instruction. So I read all n values. Now I say max is equal to a[0]. The first of the values which I have read, a[0], the 0th element, I am setting as the maximum. And now I have another iteration. In this iteration I am not reading anything. I go from i equal to 1 to n minus 1, because the first value, the 0th element, I have already assumed to be max. I will compare every successive element with max. If any element is greater than max, I reset the value of max to that element. So why will this not work? Sorry, one at a time please.

Let's execute this program for a few values, an array of three or four elements. Somebody had said very positively that it will not work. Can you give the logic why it will not work? Let me take n equal to 3 and let's take three values: 5, 17 and 12. Of 5, 17 and 12, 17 is the largest value. When I execute this program, what will happen? n is equal to 3, so I will read the value 3 into n. This loop will run for 0, 1 and 2. It will read a[0], a[1], a[2]. a[0] will be 5, a[1] will be 17 and a[2] will be 12. Now I execute this instruction which says assign a[0] to max. a[0] is 5, so max becomes 5. With this value of 5 I go here. i varies from 1 to 2 only, because n is 3. Consequently this iteration will be executed twice. The first time it executes, it will compare a[1] to max. Is a[1] greater than max? a[1] is 17. 17 is greater than 5, so max will be set to 17. i is incremented, and I go back again. This time i has become 2, so a[i] now refers to the element with index 2, which is 12. Is 12 greater than 17? No. So max will remain 17. I will come out, and when I come out I say the maximum value is max. So what is wrong in this?
What is wrong? Yes? You are pointing at the curly brackets. Well, there is no curly bracket for this for loop because there is exactly one instruction in that loop. I could, if I wanted, put a curly bracket here and a curly bracket there. And yes, as we agreed, that would be the proper way of writing things. However, syntactically and semantically this is correct. The entire for statement is executed only for this one statement. There is no other statement in that for loop.

That is correct. He is observing that the program will not work correctly if the value of n given is more than 100. The reason is that C/C++ has no mechanism for checking whether any index expression that you write is within the array bounds defined by you. It does not check. It is your responsibility. For example, if i is a negative value or i is 1000, what will the program do? You remember the mechanism that we saw, how the compiler works with consecutive addresses. If i is 1000, for example, what the compiler will do is still take the base address, add to it a displacement equivalent to 1000 elements and go to that memory location. Now that memory location does not even belong to you. It may be somebody else's. The program will pick up whatever trash there is in it, and you may end up with lots of errors. So it is your responsibility to ensure that this does not happen. Consequently you should check the array bounds appropriately, and in real programs, as we shall see, such checks will have to be implemented. In fact, a generic name given in data processing circles to such checks is validating input health: whether the input is healthy or not. If it is unhealthy we should say, your input is wrong, I can't handle it. We shall see some examples. But you understand this now? It will work correctly? Fine.

Now let us add one more squiggle. I have found the maximum value. But I also want to find out where that maximum value occurs.
Is the third element the largest? Is the 54th element the largest? I just want to find its position. How can I find it? Well, I know when max was last changed here. So whenever max was changed, if at that time I can somehow remember the value of i, then when I come out of the whole thing that value will represent the position.

So here is what I do: find the maximum of n numbers and its position. I do the same thing. I read in all the n numbers. I start with max equal to a[0] as earlier. But I have added another integer variable called pos to represent the position. Initially, when I am assigning the 0th element to max, obviously the position is 0. It may so happen that the 0th element is the largest in the whole array, in which case I had better retain that position. Now I run through this iteration. But in this iteration I do something interesting. In case a number is greater than max, instead of just assigning a[i] to max, I also assign i to pos. That means I am remembering the value of i at that point. Consequently, when I come out of this entire iteration, I will have the maximum in max and its position in pos. Is that clear? So without extraordinarily complicating my algorithm, in the same logic where I find the maximum, I can also find its position.

I do another extension now. I have found the maximum. I have found where it is in my array. Now let's say I desire that the maximum element should be at the top. After all, when we declare a merit list, the top performer is at the top, and so on. Suppose I wanted the top performer to be at the top; then what do I want to do? I want to exchange the element at position pos with the element at the 0th position. I can't lose the current occupant of the 0th position. Those marks are also important. I want to exchange. If I do this swap, what will my program look like? So here is the same logic. I calculate max and pos. So when I come out of this iteration, I know max and I know pos.
Now what do I want to do? I want to exchange it with the 0th element. So I use the standard swapping technique. I use a temporary variable; I declare int temp here. If I want to swap two elements, I can't just say assign this to this and this to this, because when I do the first assignment, the old value is lost. So I have to take one value into a temporary variable, then assign that value here and that value there. This is the standard technique. So I say temp is equal to a[0], a[0] is equal to a[pos] and a[pos] is equal to temp. This will exchange the elements. So I have the maximum at position pos and I exchange it with the first element. Saying first is actually not quite accurate. I should say exchange it with the 0th element, because the first element really has index 0, not 1.

So what does such an exchange suggest? Here are interesting possibilities. Suppose we have six elements in an array. I am just giving you an example. I have these six elements: 24, 52, 13, 95, 64, 50. Let's say these are marks out of 100. I want to find the maximum and exchange it with the 0th element. When I execute my logic, I will find the maximum as 95 and pos will be 3. Notice this is 0, this is 1, 2 and 3. So the maximum occurs at element number 3. When I exchange, what will the array look like now? 95 will have been exchanged with the 0th element. So 95 will come here and 24 will come there. Agreed? So after the exchange, this is how my array will look: 95, 52, 13, 24, 64, 50.

Now imagine that I extend this concept. I have found the maximum and moved that maximum value to the top position. Now I imagine as if I have an array consisting only of elements 1, 2, 3, 4, 5. And I apply exactly the same logic. Namely, I find the maximum amongst the remaining elements, find its position and exchange it with the top of the remaining elements, namely this element. So if I did that, if I repeated the process for the remaining 5 elements, I will now get this. Agreed?
These were the remaining 5 elements: 52, 13, 24, 64, 50. Remember, 24 came here because of the exchange. Among these 5, if I repeat my algorithm for finding the maximum, I will get 64 as the maximum and it will get exchanged with this 52, in which case I will have this. So one more exchange like that has permitted me to find the maximum of the remaining numbers and bring it to the top position amongst the remaining numbers. Now my array looks like this: 95, 64, 13, 24, 52, 50.

The question is, what does this suggest? I can order all the elements of my array. Instead of finding the maximum with respect to the entire array, I generalize my algorithm to find the maximum amongst the remaining elements at any time, and I vary k in an outer loop. Where I was starting with max equal to a[0] only, I now start with max equal to some a[k], where k itself varies over 0, 1, 2, 3, 4, 5, and inside that outer loop I still examine all the remaining elements from k plus 1 up to n minus 1. Then I will get all the elements sorted. So you see, it is the same logic that I have used, with an added outer iteration which enforces this logic on the remaining elements again and again. The outer iteration will also have to run n times because there are n elements.

Consequently my program for sorting becomes like this. You see, it's a very elegant solution. It's not a very large solution, but it can sort n values. I start with the same things: a[100], max, n, i. I have introduced an additional variable k because I want to say that at any one point in time I will do this maximum finding and exchanging with respect to the kth element. I have pos and temp as usual. I read the value of n. I read all the elements of the array. This is my outer iteration, for k equal to 0 to n minus 1. I want to successively consider a[k] as max. Observe that this entire program becomes the same program as earlier if k is 0.
If k is 0, we are finding the maximum of all elements. Initially k will be 0 here, and this entire execution will actually find the maximum of all n elements, with exactly the same logic that we have seen. It will start with a[k]. The loop is written differently. It does not say i equal to 1 to less than n; it says i equal to k plus 1 to less than n. So when k is 0, k plus 1 will in fact be 1. The first time I come here, I will be finding the maximum of all the numbers, finding its position and exchanging it with the top element. Remember k is 0, so I will actually be exchanging a[k] with the maximum at a[pos].

However, I do not come out here now. I go back. This is the outer iteration. The outer iteration will now increment k by 1. The 0th element is already sitting pretty with the maximum of all the numbers. So with k equal to 1, I will take whatever is in element 1 to be max and find the maximum amongst the remaining elements. When I do that, the second largest value will come out, and when I go back here again and k becomes 2, a[0] and a[1] will hold the largest and the second largest values respectively. As this iteration continues, at the end I will have all the numbers arranged in proper order. This is called an exchange sort or a bubble sort, because what I am doing is finding the maximum and exchanging it with the top position. So I am basically placing that number in its proper position after finding the maximum.

Let us look at how much work we are doing here. We are running an inner iteration and an outer iteration. The outer iteration runs how many times? n times. The inner iteration runs how many times? Well, the first time it runs n minus 1 times. The second time it runs n minus 2 times. The third time it runs n minus 3 times, etc. In general, the total number of operations that we will be doing here will be proportional to (n - 1) + (n - 2) + ... + 1.
This is roughly something like n squared by 2, some fraction of n squared. That many times I will be making this comparison, and possibly this assignment. And this particular exchange I will be doing only once per outer iteration, so I will be doing it n times. Why am I saying all this? Soon we shall be looking at the computational effort that is required. So far we never bothered about how long a machine takes to execute our program. All the programs that you compile and run execute within less than a second, so you immediately get results. But if I am handling a million values, will that still run in one second or will it take one hour? If it is going to take one hour, can I write my program more intelligently so that it takes only half an hour? These are important points to consider and we will look at them. But this, then, is the exchange sort algorithm.

Using this, we now write a program to find the top performers in an examination. I will be posting this program on Moodle on Friday morning. This program follows exactly the same logic as the maximum finding program, except that the nomenclature used is different. This time I have roll numbers. I have marks. I have max, n, i, k, pos, etc. I collect the number of students and, as was being observed by our friend, I do an input health check. If n is negative or n is greater than 999, I say invalid n and I execute the instruction return 1. Remember, return 1 here is like break: get out of the whole main program. Nothing further will be executed. However, if n is correct, I will collect the roll number and marks for each student, and I will request the data for one student on one line, separated by a space. So roll number, marks; roll number, marks; roll number, marks. I have an iteration for i equal to 0 to less than n. I will input roll number and marks. So you agree that this iteration will read all the roll numbers and marks?
In one execution of the loop body, it reads one roll number and one mark. So the typical way in which we will give the data is roll number first and marks second. Please note that your existing roll numbers cannot be fed into this program because they often contain a character D. They are not integers. That is why we shall be studying char type data handling in the next class, and then we will be able to handle those as well. For the time being, we assume that roll numbers are what the name says, numbers. If that is so, then we read all the data here. Please note that we want to arrange the marks in descending order. But will it be all right if the roll numbers which are read into the array are kept as they are and only the marks are shuffled? You will get the maximum marks at the top, but they will now be assigned to a roll number that never earned those marks. The roll number that happened to be entered first might like the situation, but that is not the correct solution of the problem. So what must we do? Whenever we swap marks, we must also swap the corresponding roll numbers, so that whenever marks are rearranged, roll numbers are correspondingly rearranged. That is not such a big extension. At the point where I am swapping the maximum with the top element, I will swap the roll number also with the top element. So this is the input, and this is the logic for finding the top element. This is the outer iteration, which varies k from 0 to less than n. This is the inner iteration. In the inner iteration, I start with max equal to marks[k] and find out if any marks[i] is greater than the current max. If so, I reset max and I remember the position. Having remembered the position, when I finish the inner iteration I exchange the marks at position pos with the marks at the top, kth, position. And when I do that, I likewise exchange the roll numbers. You agree that this will move the pair, roll number and marks, together.
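The sorting logic just described can be sketched as a small function. This is a minimal sketch, not the program shown in the lecture: the function name sortByMarksDescending and its parameter list are assumptions made for illustration, while max, pos, i and k follow the names mentioned above.

```cpp
#include <utility>

// Sorts marks[0..n-1] into descending order and keeps roll[i] paired
// with marks[i] by swapping both arrays at the same positions.
// (Hypothetical helper; the lecture's program does this inside main.)
void sortByMarksDescending(int roll[], int marks[], int n) {
    for (int k = 0; k < n; k++) {             // outer iteration: position to fill
        int max = marks[k];                   // assume the kth element is the maximum
        int pos = k;
        for (int i = k + 1; i < n; i++) {     // inner iteration: scan the rest
            if (marks[i] > max) {
                max = marks[i];               // reset max
                pos = i;                      // remember where it was found
            }
        }
        std::swap(marks[k], marks[pos]);      // bring the maximum to position k
        std::swap(roll[k], roll[pos]);        // move its roll number along with it
    }
}
```

After a call such as sortByMarksDescending(roll, marks, n), roll[0] and marks[0] belong to the top performer, roll[1] and marks[1] to the next, and so on.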
And at the end, I will just print the top 10 performers. So for i equal to 0, i less than 10, I will output the roll number and marks. You agree that this will give me the top 10 performers. In fact, if I printed all n students, I would have a sort of merit order of all the students' performance. Notice that this program now doesn't care what the value of n is. n could be 5, n could be 10. Of course, if it is 5, we have a problem. We are printing 10 values here, which is not correct. If n is less than 10, we really have a problem in this algorithm. We don't know what it will print. But the sorting logic will work correctly. This is where we worry about the efficiency, the execution time, of our programs. How long will this program take? As you will notice, the maximum time will be taken in feeding the input, because you will be typing roll number, marks, roll number, marks, roll number, marks, etc. And if there are 600 students, it will take quite some time. Compared to our input speed, the machine will be very fast. So even if the total execution time for the program is, say, 5 minutes, out of that 4 minutes and 59 seconds will be spent by us in giving input. Usually, that process is very clumsy, particularly when we are learning to write programs. What will happen? Suppose I ask you to test this program for this course, and I give you the marks for this class. So you will compile this program and run it. And suppose you have made a mistake like the one I had made: instead of k, you have written i there. You input all the 600 values, and then finally the program gives you something very funny. So what will you do? You will of course correct the program. And after correcting the program, you compile it and again say ./a.out, and the program will say give me input. Again 600 entries. Now that is very painful. The operating system provides an extremely simple and elegant mechanism to handle this. It's called redirection.
When I run my program a.out, or for that matter any program, the operating system actually opens what we call files. One is called stdin and the other is called stdout. We don't know what the files are. Assume that these are text files which contain some data. Ordinarily, when you start the program execution, stdin is connected to your keyboard. So whatever keys you press here, that input is picked up. That is why when you say cin, the value actually comes from your keyboard. Similarly, the operating system arranges things such that stdout is connected to your terminal. That is why whenever you say cout, the output comes here. However, the operating system also permits a very simple thing for you to do. Suppose you do not want to give input from the keyboard, but you have already prepared a text file, just like you prepare your program files using gedit. If you have prepared all the input data in a text file, then you can tell the operating system: please execute this program, but don't expect me to do the donkey work of putting in data. I have already done it once. These are 600 roll numbers and marks. Please read the data from that text file. To say that, you give the command ./a.out < rollmarks.txt. This is an arbitrary name. I have chosen this name because I am dealing with data related to roll numbers and marks, but this could be any name that you give. Consequently, it is possible for you to separately use gedit and write down the roll numbers and marks, one line after another, exactly as you would have given them as input. As long as it resembles the input that you would be giving to that program, you then save this file with, say, a .txt extension. Again, it is not important that you save it with .txt. Your program files you save as .cpp because that is what the compiler understands, and this is a standard convention. But when you give this command, your cin statements will not read data from the keyboard. They will all read it from this file. And this is the rollmarks.txt file. You get the point?
So that means even if you want to test your program repeatedly on very large data, you do not have to keep feeding the data every time you run the program. Once the program runs perfectly, you are done, but generally you will have to run it three or four times before that. So this is one good way of doing it. Is that clear? I will of course include it in the notes. So now we worry about the efficiency of our program. We have not wondered about it so far. So let us examine it. All our programs get executed in less than a second. So what if we handle a very large number of values, and our algorithms have many nested iterations? Remember the term nested iteration? We had seen that earlier. In the sorting algorithm itself, there is an outer iteration for k equal to 0 to n minus 1, and we have an inner iteration for i equal to k plus 1 to n minus 1. Efficiency is related to how much time the computer takes to do various computations. Unfortunately, it so happens that it takes a different amount of time to do an addition, a different amount of time to compare two numbers, a different amount of time to assign a value, a different amount of time to do a multiplication, a considerably larger amount of time to do a floating point multiplication, etc. But in general, if it has to do more operations, it will take more time. An individual operation could be done in as little as 20 nanoseconds or 40 nanoseconds, or in less than, say, 100 nanoseconds, or it could take 2 microseconds or 4 microseconds. We don't know that. Different computers will execute instructions at different speeds. But the fact remains that if there are more instructions, it will take longer. It is therefore important for us to be able to judge, in general, how long a program has taken. Our operating system generally provides that facility. The facility provided in Ubuntu, Linux, etc., or any Unix variant, is called the time command.
So if, instead of ./a.out, I say $ time ./a.out, then the program will produce all the output that your cout statements produce, and additionally it will produce some three lines like this. It says real 0 minutes 8.029 seconds, user 0 minutes 2.00 seconds, sys 0 minutes 0.004 seconds. It actually gives you three different times. The machine's processor has a clock, and every time the operating system asks it to execute a program, the clock starts ticking. When you execute your program, it is not as if only the instructions in your program are executing inside. For example, when you say input, it actually calls an operating system function. In general, while your program is executing, the operating system may also be allocating CPU time to execute somebody else's program. During that time your instructions are not executing, but your program is still ready to execute. For example, suppose your program is waiting for input. Now why should the machine spend its time waiting for you? It may go and execute somebody else's program. That is why the real time that is spent is different from the time which is spent in executing your instructions. So there are accordingly three components that the operating system gives you. The real time is the total clock time that is spent. So this is the clock time here, 8.029 seconds, for some program. The user time is the time that the computer has spent in executing the user's program, that is, your a.out; that is why I have shown it by this line here. This is the time which is of consequence for comparing our algorithms with one another. The system time is the system overhead, the time the operating system spends in controlling, allocating, reallocating. We need not worry about it. In general, then, when we execute a program with this command, we note that it will produce all the output that is given by cout. Additionally it will produce these three things. And when it produces these three things, we have to look only at this one, the user time.
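The time command measures a whole run from outside the program. As a loosely related alternative, and not something used in this lecture, a program can also ask how much processor time it has consumed so far by calling std::clock() from the standard header ctime. The sketch below assumes that approach; the helper names cpuSecondsUsed, sumUpTo and timeSumUpTo are made up for illustration, and the value reported is processor time charged to the program, which is comparable to, but not identical with, the user line printed by time.

```cpp
#include <ctime>

// Processor time charged to this program so far, in seconds.
// Time spent merely waiting for input is not counted by std::clock().
double cpuSecondsUsed() {
    return static_cast<double>(std::clock()) / CLOCKS_PER_SEC;
}

// Some work to be timed: adds up the numbers 1..n.
long long sumUpTo(long long n) {
    long long total = 0;
    for (long long i = 1; i <= n; i++) {
        total += i;
    }
    return total;
}

// Runs sumUpTo(n), stores its answer in result, and returns the
// processor seconds spent inside the call.
double timeSumUpTo(long long n, long long& result) {
    const double start = cpuSecondsUsed();
    result = sumUpTo(n);
    return cpuSecondsUsed() - start;
}
```

The same pattern, reading the clock just before and just after a piece of code and subtracting, could be placed around the two nested sorting loops to time only the sorting and not the input.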
Now how do we use this to measure the time efficiency of our algorithm? As I said, I can take the previous program, give it 600 data values, even in a file, and say okay, sort it. 600 students, roll numbers and marks given in a text file, not even typed in as input. Will that user time be a correct representation of the sorting time? Actually not, because the maximum time is being spent on reading the data. Whether you give it as typed input, in which case it will spend minutes, or you give it as a redirected file, in which case it will spend maybe a second, the time spent in the two nested loops of the sort will be minuscule for numbers like 600 or so. So how do we get an idea of how long this sorting algorithm takes? How costly is this sorting algorithm? Sorting and searching are by far the most frequently used operations in a large number of the problems that we will solve using computers. In fact Donald Knuth, one of the greatest programming experts in the world, who started writing the book called The Art of Computer Programming when he was in his twenties and wrote some four volumes, has a volume called Sorting and Searching. Those books are still considered bibles of programming. They contain very difficult problems, and even though solutions to several of them are given in the book, the problems are still considered difficult. Excellent book. So we come back to sorting and searching as being an important thing, and we want to find out the total number of additions, subtractions, multiplications, comparisons, etc. that are done. We can actually count them, do a mathematical analysis, and represent the number of such operations in terms of the size of the problem, say n. What is the size of the problem? It means I am handling, say, n marks or n elements. Whatever n may be: 500, 5000, 5 million, whatever. Then I can count the number of operations in terms of this n.
So I can say my program does 3n additions, or my program does n squared comparisons, or that 2 n squared plus 3n minus 43, or half of that, is the total number of comparisons. I can do a very detailed mathematical analysis. But in general, as a rule of thumb, I want to find out the efficiency of my algorithm. What is the efficiency of my program? This time command is very useful. It will give me the actual time taken. Our objective is to write algorithms which minimize the number of operations that we do. We shall be spending some more time on this later. But for now, how do we estimate execution time for large data? I have said that I have a program which finds the top 10 performers by sorting an array of 600 elements. Let me sort an array of 6000 elements. Let me sort an array of 60,000 elements. But you will have to write all that data, even if it goes into a file. Can you imagine typing 60,000 roll numbers and marks? Now we realize that our idea is to find out how long the machine takes to do this much work. The actual marks and actual roll numbers are not important here. When they are important, we will of course run our program on them. But if I want to test the execution time, can I not create artificial data? In general, marks are between 0 and 100, 0 and 500, whatever. But if the data is to be artificial, then as long as the values are distinct, I know that the maximum finding will have to do some work, so I am comfortable with it. Similarly, why should a roll number be only 4 or 5 digits? So suppose I decide to artificially generate data for 1 lakh students, and I simply put roll number as 1, marks as 1; roll number as 2, marks as 2; roll number as 3, marks as 3; up to roll number as 1 lakh, marks as 1 lakh. 1 lakh marks would be nice to have, but the point is that we are not really interested in getting the 10 top performers, although that is the logic we have written.
We are interested in finding out, if we indeed load the machine with data for 1 lakh students, how long it takes to run. We then run this program for different values of n, say n is 25,000, n is 50,000, n is 100,000. Why do we do that? We say, if n is something and we increase it to twice that much, how much longer does the program take to execute? If we increase it to twice that much again, how much longer does the program take to execute? So here is a program written for that. A very simple program. It defines the roll number and marks arrays with 1 million elements each. It asks you to give the number of students. It does a validity check. It won't permit you to give a number more than 1 million. It is not going to ask you for any data, by the way. Whatever n you have given, it will generate artificial data for n students. So what will it do? For i from 0 to less than n, it artificially sets roll[i] to i plus 1 and marks[i] to i plus 1. Agreed? No input is required. It will simply generate this much data sequentially. And now I give an output statement saying proceeding to sort. The sort is exactly what we did before. In fact, the rest of the program is exactly the same as we had earlier. There is no difference whatsoever. Here I print the top 5 performers, just to save some space. These top 5 performers do not matter. Actually, we know who the top 5 performers are. Can you tell me their roll numbers and marks? It depends on n. So suppose n is 25,000. The top performer will have roll number 25,000 and marks 25,000. The next will be roll number 24,999 with marks 24,999, and so on. So it is not the actual results that are important. But now I am curious to know how long the machine takes to run this program for different values of n. I have called this program sort_time_test. By the way, I will be putting all the programs on Moodle, so you will be able to look at them. I execute this program for 25,000 students.
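The artificial data generation just described, together with a way of seeing how much work the sort does on it, can be sketched as below. This is not the sort_time_test program itself, only an illustration under assumptions: the names SortWork, generateAscending and sortAndCount are hypothetical, and the two counters are an addition made here so that the amount of work can be inspected directly instead of being timed.

```cpp
#include <utility>

// Counters for the work done by one run of the sort (illustrative).
struct SortWork {
    long long comparisons;  // how many times marks[i] > max was tested
    long long maxUpdates;   // how many times max and pos were reset
};

// Artificial data: roll[i] = i + 1 and marks[i] = i + 1, so the
// largest marks sit at the bottom of the array.
void generateAscending(int roll[], int marks[], int n) {
    for (int i = 0; i < n; i++) {
        roll[i] = i + 1;
        marks[i] = i + 1;
    }
}

// The same descending sort as before, with counting added.
SortWork sortAndCount(int roll[], int marks[], int n) {
    SortWork work = {0, 0};
    for (int k = 0; k < n; k++) {
        int max = marks[k];
        int pos = k;
        for (int i = k + 1; i < n; i++) {
            work.comparisons++;
            if (marks[i] > max) {
                max = marks[i];
                pos = i;
                work.maxUpdates++;
            }
        }
        std::swap(marks[k], marks[pos]);
        std::swap(roll[k], roll[pos]);
    }
    return work;
}
```

The comparison count depends only on n: the inner loop runs n - 1, n - 2, ..., 0 times, which adds up to n(n - 1)/2 whatever the data. The number of max updates, in contrast, depends on the order in which the data happens to arrive.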
The top 5 performers, exactly as you anticipated, are like that. But after this it produces this output: real 8.029 seconds, user 2 seconds, sys 0.004 seconds. Remember, these were the figures I showed you earlier as a sample output. Next I execute this for 50,000 students and for 100,000 students, and I get these figures. Observe what has happened. We should look only at the user time. The other times are not important. This was taking 2 seconds. This is taking 7.9 seconds, almost 8 seconds. So while I have doubled the number of students from 25,000 to 50,000, the time taken by my algorithm has almost quadrupled, from 2 seconds to almost 8 seconds. Next, when I execute it for 1 lakh students, again doubling the students from here, instead of taking 16 seconds it is taking 31.5 seconds, almost 32 seconds. I observe therefore that the execution time is increasing as the square of the increase in size, not in proportion to the size. And can I justify that? Yes, of course. For every n I am doing almost n squared kind of work. So the time taken will be proportional to that effort. We are not going further into the details, but this is just a mechanism for you to cross-check what happens. Can I reduce the execution time? Well, I notice that, without my intending anything by it, the data that I have put in through this artificial data generation has the largest value at the bottom of the array. So look at the donkey work that my algorithm has to do. First it takes the 0th number, and it finds the next number to be more than that. So it takes that as the new maximum. Then it finds the next number to be more than that, because the largest number is at the bottom. So in every iteration, every time it makes a comparison, it has to update the maximum. It keeps pushing things up. Imagine if the original data was not that bad. In real life it will not be like that. In real life some maximum will be there, some second maximum will be here, etc. Suppose I did exactly the opposite. I put the maximum marks at the top.
Second maximum at the next. Effectively, I gave the algorithm an already sorted array. Then it does appear that it will have to do less work, because it does not have to keep moving things; it will find things in their proper place. So here is the variation. I create the artificial data by saying marks[0] is n, marks[1] is n minus 1, etc., in exactly the opposite order. The corresponding segment of my program will be written like this. For i equal to 0 to n minus 1, roll[i] is i but marks[i] is n minus i. So you agree that this will generate data in descending order? In fact, the original input data is already in sorted order. So if I were smart, I would read only the first 10 elements and declare that these are the top performers. The problem is that my algorithm does not know that the data is sorted. I have no control over it. So I am just trying to illustrate two extreme cases: one in which the data is completely unsorted, the worst unsorted order you can get, and another in which the data is completely sorted. But my program will execute the algorithm anyway. What will happen now to the execution time? Will it remain the same? Will it reduce? Will it reduce at least slightly? Actually, the fact is that it will reduce slightly. It still has to do all those n squared kind of comparisons. There is no escape from that. But the assignment that it was doing, max equal to something, every time it found a new element which was larger than max, will not have to be done any more, because the top element itself will be the largest. This is the execution time example. Observe that it now takes 1.612 seconds, 6.272 seconds and 25 seconds for 25,000, 50,000 and 1 lakh data values. I have shown here for comparison the values that you got earlier. So notice this. This is slightly less than this. That is slightly less than that. And that is slightly less than this. But the order of magnitude is still not increasing in proportion to n. n becomes twice.
The number becomes four times. So it is increasing in proportion to n squared. As we shall all see later, when we understand the notion of the complexity of an algorithm, we will call the complexity of this algorithm order n squared. But we will not bother with it right now. You have understood this? How this works? Okay. So I am not letting you go yet. This class will be slightly longer. I propose to use the extra time because I want to give an example of multi-precision arithmetic. It is an example with which I will just illustrate how we handle things. It will take about 10 minutes. Addition of multi-precision numbers. So we go back to our earlier assumption. Integer numbers can be at most 7 digits, 8 digits, 9 digits. Suppose I want to add two 50-digit numbers. How will you do that? It is a requirement. I have two 50-digit integers representing something. Or 2 or 3 or 4 or 5 or 10 of them. And I want to add them and get a 51-digit value. So let us say you are counting the national GDP of India in paise, and you want to find the cumulative national GDP for the last 10 years expressed in paise. I do not think an integer variable will be sufficient there. What will you do? Yes. So if we want to overcome the limitation on the representation of numbers and we want to handle larger precision, arrays give us a good idea: store the individual digits of a number in successive elements of an array. Arguably this is an underutilization of the capacity. I am using a 32-bit number to store only 1 digit. But that is what occurs to me naturally, because that is the easiest thing for me to write algorithms for. So suppose I did that; then I can represent numbers of arbitrarily high precision. I also have to represent the number of digits in the number. So let us say I take the 0th element of the array to represent the number of digits, and then I take successive elements num[1], num[2], etc. to hold the individual digits. So this is one representation. There could be multiple representations.
We shall ourselves see an alternative representation here. Now we can represent two such numbers. Let us say m and n. Assume that a number has a maximum of 99 digits. We can declare m[100] and n[100]. Agreed? There are 100 elements. But now each one is not storing a full number. It is storing only 1 digit. So if m is a 99-digit number, the maximum, m[0] will store the value 99 and the 99 digits will be stored in the remaining elements. We are usually talking about a number of digits which is much less than the maximum. This is the upper limit. If we do this, then we can write a program to add these two numbers and put the result in another array r using exactly the same notation. How would I declare that array? I would declare it as having 101 elements, because these two numbers individually may be at most 99 digits, but when I add them the result may be a 100-digit number. So I will require 101 elements. This is the reason why r is one more than that. Later on we will argue that this is not an appropriate thing to do. It should be our responsibility, the user's responsibility, to ensure that if I uniformly have a 100-element array to represent 99 digits, then even the results remain within those bounds. Look at the normal integer variables in a computer. If two numbers are 32 bits long, the computer does not permit me to have their sum be 33 bits long. It does not provide a longer data structure. It says your final value must also be within that limit. In another variation of the program that we write, we shall see how that is done. A question from the class: take an array like int m[100]. Does it not contain 100 values, each comprising 8 bytes? Suppose it is int: four bytes. Sorry, four bytes. So are those bytes assigned at the time we declare the array, or at the time we put values into it? At the time we declare the array. When we declare an array, the memory locations are assigned.
It means 400 bytes are assigned to m and 400 bytes are assigned to n the moment I say int m[100]. Even if it is empty or has some value? It doesn't matter. That's a good point. It is just like variables: when you say int max, int k, the locations are assigned whether you ever use that value in your program or not. Similarly, the amount of memory for all variables and arrays is fixed at the time of compiling, so when you compile the program and the compiled program is loaded, that many memory locations are set aside. So going further, we now want to store something like this. We have two numbers: a four-digit number, which is 9521, and a six-digit number, which is 996357. If we follow our logic of representing these digits, then after reading this data into my arrays, I should have the following. m[0] should be 4, because 4 is the number of digits in the first number, and the individual digits 9, 5, 2, 1 should be read into m[1], m[2], m[3], m[4]. In exactly the same fashion, the second number has six digits. So n[0] should be set to 6, and n[1], n[2], n[3], n[4], n[5], n[6] should be set appropriately. When I say set, I mean that my input statement should read the data like this. Observe that I will not be giving this as a single value 9521, because I am handling multi-precision, artificially represented arithmetic. I must give individual digits. So the input must be 9 space 5 space 2 space 1. Although any individual element is perfectly capable of storing this small number, that is not what we are intending to do. We notice further that if the numbers are written like this, aligned at the left, we don't add them like this. We add them by shifting them to the right, justifying them to the right. So generally, if this is the smaller number, we will put it to the right and we will put 0 0 here in front. That is how we add, right? And what is the addition logic? You add 7 plus 1, which becomes 8; 5 plus 2, 7; 5 plus 3, 8. 9 plus 6 is 15.
So whenever the number is greater than 9, we write only the last digit here and have a carry of one. The carry is added to the next position. Now in general, a carry could come at any time, because I don't know what the individual digits are. So my logic will apply at any position, and this is the beauty of it. Having used an array, I can vary the array index, 1, 2, 3, 4, like this. But at any one point, I will not only add the ith element of m to the ith element of n, I will also add the carry to it. And that carry could be 0 or it could be 1. So that is the logic. The logic is therefore very straightforward. So this is the program segment to add the numbers. The maximum number of digits in the two numbers is whichever is larger of m[0] and n[0]. Observe that I use the question mark (conditional) operator to find it. If m[0] is greater than n[0], then m[0] is assigned to max digits. Otherwise n[0] is assigned to max digits. I start with carry equal to 0. And I assume that all these digits are in the rightmost positions of the array. I create new arrays m dash and n dash for that, because my arrays m and n had the wrong representation. They had a representation where the digits were at the leftmost end. So I will create m dash and n dash, which push the digits to the rightmost end. And therefore the logic of addition is that I start from index 99 and go down to 99 minus max digits plus 1. So I start from the back. Let's go back to the previous slide. Imagine that this is the last element, this is the last but one, this is the last but two, etc. So I start here and come up through as many digits as I have. If I do that, this particular iteration will calculate r dash at i plus 1 to be this. Why i plus 1? Because I have artificially assumed r to be one element larger than m and n. If r was not 101 elements but also 100, I would have used the same i there. A variation of this program that you shall see on Moodle will do that.
Now, if the value which I have calculated for the digit is greater than 9, I take the remainder modulo 10 and set carry equal to 1. Otherwise I set carry to 0, and I keep on doing this iteration. So this will actually calculate the values correctly. This is the explanation. If m[0] is greater than n[0], max digits is m[0]. Else max digits is n[0]. I will leave you with this idea. The entire program which does the computation of multi-precision arithmetic, including some sample data, will be loaded onto Moodle by this Friday. So over the weekend you can see those sample programs, study them and be ready for the lab assignments next week. I think we will stop here. Any questions? Thank you.
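The multi-precision addition described in this lecture can be sketched as follows. This is a minimal sketch under stated assumptions, not the program that will be posted: the function name addMultiPrecision is made up, and instead of building separate right-justified copies (the m dash and n dash arrays mentioned above) it reads the digits of m and n directly from their right-hand ends. The representation is the one described earlier: element 0 holds the number of digits, and elements 1 onwards hold the digits, most significant first.

```cpp
// Adds two numbers stored digit by digit.
// x[0] = number of digits, x[1..x[0]] = digits, most significant first.
// m and n hold at most 99 digits each; r must have room for 101 elements.
// (Hypothetical helper written for illustration.)
void addMultiPrecision(const int m[], const int n[], int r[]) {
    // Larger of the two digit counts, using the conditional operator.
    const int maxDigits = (m[0] > n[0]) ? m[0] : n[0];

    int reversed[100];   // digits of the sum, units digit first
    int carry = 0;
    for (int p = 0; p < maxDigits; p++) {             // p = 0 is the units place
        const int dm = (p < m[0]) ? m[m[0] - p] : 0;  // missing digits count as 0
        const int dn = (p < n[0]) ? n[n[0] - p] : 0;
        int digit = dm + dn + carry;
        if (digit > 9) {
            digit = digit % 10;                       // keep only the last digit
            carry = 1;
        } else {
            carry = 0;
        }
        reversed[p] = digit;
    }

    int length = maxDigits;
    if (carry == 1) {
        reversed[length] = 1;                         // final carry adds one digit
        length++;
    }

    r[0] = length;                                    // same notation as m and n
    for (int i = 1; i <= length; i++) {
        r[i] = reversed[length - i];
    }
}
```

With m holding 4 followed by the digits 9 5 2 1, and n holding 6 followed by 9 9 6 3 5 7, the result r holds 7 followed by 1 0 0 5 8 7 8, that is, 9521 + 996357 = 1005878.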