Good morning. We will continue our discussion of the mid-sem paper. Before the fourth question, something I forgot to mention on the third question: I suggested that we could use a common mechanism for addition and subtraction, generalizing what Leela Swizala and my TA had suggested, of perpetually borrowing and only carrying when necessary. There is another method, akin to what the binary system uses in computers, called the one's complement form of representing negative numbers. Its decimal counterpart is nine's complement. So if you have a number, let us say 123, and the number is negative, to get the nine's complement you subtract each digit from nine and write down the result, which here gives 876. That is called nine's complement. Somebody from amongst you had discovered this mechanism, which was very nice. So in the final answers that we will put up on the web you will have all these alternative ways of doing high-precision arithmetic, namely addition and subtraction. Needless to add, multiplication will still be a larger issue. The final takeaway from that problem is this: we consider addition and subtraction of standard numbers to have a fixed cost, that is, you can add two numbers in constant time. But when the numbers have large precision, say n digits, even addition and subtraction will be at least an order n algorithm.

We move over to the next problem, where we had two arrays A and B with m and n elements respectively, and all elements are integers. Incidentally, one of the arrays, array A, has been sorted. If you sum up all the elements of each array, let us say the first array sums to sum1 and the second array sums to sum2, and sum1 and sum2 will normally be distinct.
Now there might exist a pair of elements, one in each array, such that if you swap them both arrays total to the same value. You want to find out whether such elements exist, and if they do you want to discover them and print them. That was the problem.

So here is the sample answer to that question. I start by declaring two arrays a and b. I have a host of variables declared here. m and n, for example, are going to be the actual sizes of the two arrays. i and j are the traditional indices that I use. I could actually get away with any names, but these are traditional, and if you have two arrays, invariably i will be used as the index of the first array and j as the index of the second array. That is just a convention. Similarly, if you have a two-dimensional array, i would be used to represent the row index and j the column index. sum1 and sum2 are the two sums that I will calculate. diff is the difference between these sums; we shall see why that difference is relevant. And x and y are the two elements that I am looking for.

I read the numbers into the two arrays. I have simplified the for loop: instead of putting an opening brace, then the statement, and then a closing brace, note that whenever there is a single statement in the scope of a for iteration, that statement can be written immediately following the closing parenthesis here. So this is a perfectly valid for iteration. This will execute m times, reading the values of a[0], a[1], a[2], etc., up to a[m-1]. The same thing applies to the elements of b.

I now proceed to find the sum of all elements of each array. This is again a simple thing. I start with sum1 equal to 0 and add up all elements of a: sum1 += a[i]. Similarly, I add all elements of b into sum2. Now I calculate the difference between the two, and I take the absolute value of that difference.
It so happens that if the difference is an odd number, then I will never be able to find two elements x and y as required. Is everybody convinced of this fact? No? Let us do some very quick arithmetic. Consider sum1 minus sum2. This is my difference. Since I am not sure which one is the larger number, I am taking the absolute value. Now look at it this way. When I take an element x somewhere in A and an element y somewhere in B and swap them, effectively it means that from sum1 I am subtracting x and adding y. In exactly the same way, from sum2 I am subtracting y and adding x. Do you agree that this is what would happen to the sums if I swap these numbers? Now, finally, I want both these totals to be equal after the swap, which means sum1 minus x plus y should be equal to sum2 minus y plus x. Forget the sign for now. If I transfer terms across, moving sum2 to this side and x and y to that side, I get sum1 minus sum2 equal to 2x minus 2y, which is 2 into (x minus y). Now x minus y can be any value, odd or even, but because there is a multiplication factor of 2, the difference must be even. If the difference is not even, then no pair x and y can exist which after swapping results in equal sums. This observation is not necessary for solving the problem, but if you use it, it can avoid a significant amount of unnecessary computation in case the arrays do not meet the requirement.

So all that I am doing here is finding the absolute difference and checking whether the difference modulo 2 is 0. Note that if the difference modulo 2 is equal to 0, it means that the difference is even. If it were odd, I would get 1 as the remainder. So this is the message that I give: "difference of sums is not even, desired elements do not exist", and return 1. End of the matter. Now I want to do the following: I want to locate the elements meeting my criteria.
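The parity argument can be written out step by step. This is only a restatement of the algebra spoken above, with S_1 and S_2 standing for sum1 and sum2, x taken from array A and y taken from array B:

```latex
\begin{align*}
S_1 - x + y &= S_2 - y + x \\
S_1 - S_2 &= 2x - 2y = 2(x - y) \\
x &= y + \frac{S_1 - S_2}{2}
\end{align*}
```

Since x - y is an integer, S_1 - S_2 has to be even; when it is odd, no pair (x, y) can exist.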
So if x is an element of A and y is an element of B, then we need to find x and y such that sum1 plus y is equal to sum2 plus x. Is that correct? No. Nobody is looking at the equations carefully. When I swap x and y, x has to be removed from array A and y has to be removed from array B. This was a deliberate mistake in the comment, to find out whether you are alert. You are awake all right, but you are not alert. Let me write the correct equation: sum1 minus x plus y should be equal to sum2 minus y plus x. Yes, that is right. Essentially, for some j we are looking at b[j], and within that, for some i, we must have a[i] equal to sum1 minus sum2 plus b[j]. Well, once the swap is written correctly, this expression will also have to be written correctly. I will ask what the correct equation should be when we look at the program.

The program itself is pretty simple, and that is the whole point. I start with some element b[j]. Notice that I am moving j from 0 to n minus 1, so it automatically traverses the entire array B. Then I calculate x as equal to sum1 minus sum2 plus b[j]. This is the desired value x from my equation. Then, for i equal to 0, i less than m, i++, I check if a[i] is equal to x. So I search in array A, and if I find it I print a[i] and b[j]. If I do not find it, then since array A is sorted, once I have missed it, it is unlikely that the next higher element will ever meet the criteria. So the point here is that I can terminate the inner loop because I am looking at a sorted array: if I do not find the element at a certain point, beyond that I am unlikely to find it. So in the else I abandon the search, and I close the loop.

Is this program correct? OK, write the correct program. Has everybody figured out that this program is wrong? OK, so let me ask somebody to say why it is wrong. That person down there.

Sir, the first time it goes into the loop, with i equal to 0, it checks if a[0] is equal to x.
If the condition is true, it prints it; otherwise it always goes into the else and breaks. So it will never check a[1], a[2], a[3], or anything except a[0]. It will only ever check a[0]: if that matches, fine; otherwise it will break out of the loop.

That is one mistake, but there is a major mistake in the equation that I am checking. The equation itself is wrong, because it should be sum1 minus sum2 plus 2y, the whole divided by 2. You are transferring the values to either side. Anyway, you will realize that this mistake is the direct consequence of the previous conceptual mistake, which is written here in the comment. Please note that this particular mistake was actually inserted by design, and the purpose is to demonstrate to you that even though I may make such mistakes, if I have written comments and then tried to implement those comments as logic, there is a chance that later, when I read the English comments, I will be able to find the mistake, or somebody else who reads those comments might find it. Therefore, whenever you have any semblance of complex logic implemented in your program, do not forget to restate it in plain English words and put those words in comments. This is called inline documentation of your algorithms. Ordinarily, for large software, you might be required to write not comments but commentary: a huge number of pages in English describing data structures, algorithms, everything. But even within a single program or function, it is advisable to write such comments, which will help you identify an error in the implementation of the logic. You will notice that whatever is written in the comment here is correctly implemented in so far as the equation is concerned. However, he has correctly pointed out an additional problem: the inner loop as written will not do anything worthwhile. This needs to be corrected.
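One possible corrected version is sketched below. It is written in Python rather than the C-style language of the lecture's program, and the function name `find_swap_pairs` is made up for illustration; it is not the official answer. It fixes the two problems pointed out above: it uses x = y + (sum1 - sum2)/2, which follows from sum1 - x + y = sum2 - y + x, and it stops the inner search only once the sorted array A has gone past x, not on the first mismatch.

```python
def find_swap_pairs(a, b):
    """Return every (x, y), x from a and y from b, whose swap equalises the sums.

    Assumption: a is sorted in ascending order, as stated in the question.
    """
    sum1 = sum(a)
    sum2 = sum(b)
    diff = sum1 - sum2

    # If the difference of the sums is odd, no pair can exist.
    if diff % 2 != 0:
        return []

    half = diff // 2
    pairs = []
    for y in b:                  # outer loop: every element of b
        x = y + half             # value that must be present in a
        for value in a:          # inner loop: linear scan of sorted a
            if value == x:
                pairs.append((x, y))
                break            # found a match for this y
            if value > x:
                break            # a is sorted, so x cannot appear later
    return pairs


print(find_swap_pairs([1, 2, 5, 8], [2, 6, 4]))
```

With A = 1, 2, 5, 8 (sum 16) and B = 2, 6, 4 (sum 12), swapping 8 and 6 leaves both arrays summing to 14.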
I would strongly suggest that each one of you, at least those who did not get the answer right in the exam, should actually try to write down in a notebook the correct answer to this, before I put up the final answers on the website on Monday.

Question 4b was relatively simple. It stated that if m is equal to n, what is the time complexity of the algorithm that you have used in the solution? In our program we have an outer iteration executing n times, each time considering one value from B. For each outer iteration we have an inner iteration, also executing n times, checking every value in A. Somebody may argue that the inner iteration may not execute n times, and you would be right. Sometimes you will find the element within the first few elements. Sometimes you may have to search right to the end. Since array B is not sorted, you might get a small element first, then a larger element, then a smaller one; you do not have any clue. On average, therefore, the inner iteration may execute n by 2 times. However, please note that n by 2, n by 4, n by 8 are all order n, because as n tends to infinity that factor does not really matter. Any constant factor multiplied by n is still order n. Consequently, if you consider both iterations together, the complexity of this algorithm is what is shown here. Is this correct? What should it be? Order n squared: the inner iteration is order n and the outer iteration is order n, and for each outer iteration the inner iteration runs completely, in order n time. So the complexity is order n squared. This could be excessively costly, and that is the reason why it was stated in the question that one of the arrays is sorted. If the sorted array is used in the inner loop, as we have tried to do in this example, then it is possible to search the inner array using binary search. If you use binary search for the inner array, then the inner iteration will execute only about log n to the base 2 times.
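As a rough sketch of the binary-search variant, here is a Python version (not the lecture's language) that uses the standard library's `bisect_left` for the inner search. The function name `find_swap_pairs_fast` is hypothetical, and the target value x = y + (sum1 - sum2)/2 is taken from the swap equation sum1 - x + y = sum2 - y + x. With m equal to n, the outer loop runs n times and each search takes about log n steps, so the total is order n log n.

```python
from bisect import bisect_left


def find_swap_pairs_fast(a, b):
    """Same task as the linear version, but the inner search is a binary search.

    Assumption: a is sorted in ascending order.
    """
    diff = sum(a) - sum(b)
    if diff % 2 != 0:            # odd difference: no pair can exist
        return []

    half = diff // 2
    pairs = []
    for y in b:
        x = y + half                      # value we need to find in a
        pos = bisect_left(a, x)           # leftmost position where x could sit
        if pos < len(a) and a[pos] == x:  # is x actually there?
            pairs.append((x, y))
    return pairs


print(find_swap_pairs_fast([1, 2, 5, 8], [2, 6, 4]))
```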
We have seen that in binary search. That is a substantial decrease in the number of computations required. You do not see it when you have 8 or 10 or 20 elements, but when you have 1 million elements, log of 1 million to the base 2, which is about 20, is substantially less than even 1 million by 2 or 1 million by 4. And each of these searches will happen n times, where the outer array could also be a million strong. So these things are important. I am very glad to observe that about 8 or 9 students from the class have actually tried to implement binary search for the array, recognizing that it will reduce the amount of time. I compliment them; however, there is no extra credit, because the question did not demand that you write an order n log n algorithm. But I have asked my TAs to find all those who have attempted binary search, so that at least I will have the pleasure of putting their names on the website when we declare all the marks.

So, is this problem clear? No; what should be clear is that you still have some homework to do. These slides will go on to the web, but you have to write the correct algorithm. All of those who did not get the answer right must, on your own, using these hints, correct this algorithm, so that you practise writing the logic correctly. Please do not forget to modify the comments as well, to reflect the correct logic that you will use. These are not small matters; they are very, very important steps in writing good programs: writing comments describing the algorithm, and then writing the algorithm.

We now move on to question 5. This was about images and histograms. As I had mentioned once in class, and as you would have figured out from the exam question, images are typically stored as a large number of pixels, or picture elements. Each picture element represents a tonal value, or luminance value, the intensity of light that is reflected from that point. In the case of black-and-white images, also called grayscale images, the values range from 0 to 255.
255 may look like an arbitrary number, but later on, when we consider computer hardware, we will notice that 255 happens to be the largest number that you can represent using 8 bits, because 2 to the power 8 is 256 and 0 to 255 is the full range. So 8 bits, one byte, is the smallest unit of the computer's memory which you can use to represent a whole number between 0 and 255. When you take an analog image and digitize it, you could digitize the intensity value of a picture point into any arbitrary number of levels, but 0 to 255 has become a standard. Another reason why this standard holds is that the human eye is unable to distinguish finer steps even if I sample the value at greater granularity. So black-and-white pictures are typically represented with each point holding a luminance value between 0 and 255. If you have color images, life becomes more complicated. It has been simplified by saying that each point in a color image consists of three different components, red, green and blue, RGB, and you can have values 0 to 255 for each of these components. So in total you have a 24-bit, or 3-byte, representation, as we call it, and so on. There are many things in digital image representation and processing.

Here, however, we are looking at the notion of a histogram, and I want to discuss the motivation, why histograms are important, by giving a rudimentary example of picture processing. Each element in an image array, let us say a 500 by 500 matrix, would contain a value as above, and the histogram simply indicates how many pixels have the same value. So suppose there is one pixel which has the value 138. There is another pixel somewhere later which has the value 138. If there are eight such pixels, then the histogram value at point 138 will be 8. Let us look at a simple example to illustrate this point, and we will also discuss how we can calculate the histogram. I have written here a 4 by 4 image.
In the first row, one pixel has the value 3, another 2, the third one 4, the fourth one 3; in the second row 1, 2, 18, 21; in the third row 3, 4, 2, 1; in the fourth row 2, 3, 0, 0. Please note that in general you cannot say ahead of time what the minimum and maximum values of any given picture will be. But we know that if we have done this kind of digitization, then the lowest possible value is 0 and the highest is 255. Therefore it is prudent to define an array h, the histogram array, which has 256 elements, with indices 0, 1, 2, 3, 4, up to 255. Many of you have not done that, because in the example I had said that I have a picture where the minimum pixel value is this and the maximum pixel value is this. While answering that question, somebody deciding on a smaller array may technically be considered OK. But recall that your program is likely to be used to calculate the histogram for any picture. Are you going to change your program every time a new picture comes? This is the first important notion that you must remember: generalize your program to work on as many different problems of the same kind as it can. And that is the reason why a histogram array should always be defined as having 256 elements, 0 to 255.

How will you calculate the histogram values? Let us see what is required. Consider the first pixel, 3. Please note that this image array is 4 by 4, and its indices have absolutely nothing to do with the indices of the histogram array. The histogram index represents the pixel value for which you are counting pixels. Consider this, for example. How many pixels with value 3 do you see here? 1, 2, 3, 4. That means at h[3] you should have the entry 4. Agreed? How many elements do you see with value 2? 1, 2, 3, 4. How many elements with value 4? 1, 2. How many elements with value 1? 1, 2. How many elements with value 18? Only 1, somewhere here. How many elements with value 21? Only 1, somewhere here.
All the others will be 0, except h[0]: you have 2 pixels whose value is 0, so this will be 2. Is this clear? That is the final result that you want. The issue is, how do you get it in the simplest fashion? The best way is to treat every element of array h as if it is a sum variable in which you are accumulating a count. Initially I set all these values to 0. Then I simply start scanning the image array. If I get a pixel value which is 3, I know that the element of h with index equal to 3 must be incremented by 1. So I go directly to the element at index 3 and add 1 to it. I look at 2; I go to the element at index 2 and add 1 to it. I look at 4; I do the same thing with 4. Now I again get 3; I simply go over to the element at index 3 and add 1 more. Please note that an extremely easy mechanism suggests itself because of the kind of manipulation that is required. I use the picture value as an index into the array and increment that array element, and I will automatically have the histogram when I have completed scanning the entire picture. Any other way of doing it could be very complicated. You have to figure this logic out: in the histogram, the index actually represents the picture intensity value, and therefore the picture intensity value should be used as the index of this array; and since you have to accumulate the total number of pixels at every intensity value, it is best to start with all values in this array at 0 and keep incrementing. That is the crux of the solution that is presented next.

I just wanted to show you why we need histograms. We will get back to the solution of this problem in another 3 or 4 minutes. This is a sample image. It is an 8 by 8 image. Each big square that you see here is one pixel. This is another pixel. This is another pixel. You may not be able to see the difference in the shades here, but when you see it on the screen using the slide, you will see that each pixel has a different gray shade.
None of these pixels is pure white, and none of these pixels is pure black. So there is no element which is 0 and no element which is 255. The elements are all clubbed together in a limited range, just as you had seen in the question. These are the pixel values of the sample image. This is an 8 by 8 matrix, so there are 64 picture elements, and these are the sample values. Notice that the minimum value is somewhere around 52. So the values 0, 1, 2, 3, 4 and so on are all missing. Similarly, there are no values 255, 254, 253. Consequently, you will see that this image appears slightly blurred, because the values are very close to each other. Ideally I would like to enhance the contrast of this image so that I can see the content more clearly. For that purpose there is an algorithm which uses the histogram.

So these are the pixel values, and these are the histogram values. At pixel value 52 there is one element; at 55, 3 elements; at 58, 2 elements; at 59, 3 elements; et cetera, for each value. I have written here only the values which are not 0, because at all other places the counts are 0. Then there is something called the cumulative distribution function. The histogram gives the number of points at each pixel value. The cumulative distribution function says, for any given value, how many elements of the picture are at that value or below it. So you accumulate all of them. Notice that at 52 there is one element. At 55 there were three elements, so in the cumulative distribution you have these three elements plus the one element prior to that, which means four elements. At 58 I have 6; at 59 I have 9. So the cumulative distribution simply keeps a record of how many pixels are at or below that value. Notice that the largest value is 154, and there the function correctly says 64, because 64 is the total number of elements in the picture. You cannot have anything more than that. What is the use of this cumulative distribution function?
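The running total described above can be sketched in a few lines of Python (not the lecture's language; the function name `cumulative_counts` is made up for illustration). Only the four histogram entries quoted in the lecture (52, 55, 58, 59) are filled in, so the final total in this sketch is 9 and not the full 64 of the slide.

```python
def cumulative_counts(hist):
    """For each pixel value v, count the pixels whose value is v or less."""
    cdf = []
    running = 0
    for count in hist:
        running += count      # add this value's count to everything before it
        cdf.append(running)
    return cdf


# Partial histogram: only the entries quoted in the lecture are set.
hist = [0] * 256
hist[52], hist[55], hist[58], hist[59] = 1, 3, 2, 3

cdf = cumulative_counts(hist)
print(cdf[52], cdf[55], cdf[58], cdf[59])
```

The four printed values are the cumulative counts 1, 4, 6 and 9 mentioned in the lecture.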
Firstly, I hope you can easily calculate the cumulative distribution function once you have calculated the histogram. It is merely adding up the previous numbers and putting the sum in the next place. But this cumulative distribution function helps us in what we call equalization of histograms. Histogram equalization has this formula. I will not go through this formula. I have included it here so that those of you who are interested can read the complete Wikipedia article. You do not have to search in many places, and it clearly explains what equalization is. For our example image, which has 64 pixels, M into N is 64, 8 into 8, and subtracting cdf min, which is 1 here, you get 63. This is the formula for equalization.

What happens after equalization? After equalization, the values which were confined to a limited range get stretched. You had nothing at 0, nothing at 1, nothing at 2. You had nothing at 200, nothing at 255. This entire range gets stretched, and if you map that back onto the pixel values you immediately get a better-contrast image. In fact, what is shown on the left-hand side is the mapping to the final pixel value. For example, the cdf of 78 is 46. How is the new value calculated? I take the cdf at 78 and apply the rounding operation which is described here. Again, I am not going through this equation, because that is not our concern here. Here the concern is to show how exactly histogram equalization helps. The point is that after doing this histogram equalization I will get pixel values like this. Notice that the first pixel value, which was the minimum, was 52. It has become 0 now. The largest pixel value, which was one hundred and something, has become 255. So I have stretched the entire histogram. Consequently, the image that I will see will be like this. Contrast it with the original image. This was the original image. This is the new image. Can you see the difference? All that has been achieved by the simple manipulation of histogram equalization. As a better example, I have this image.
This is a grayscale image of a landscape. Let us say you have captured this photograph early in the morning or late in the evening, when everything appears to have a similar brightness. You cannot make out individual features. However, if you take this large image, push it through the pixel value digitization, do the histogram calculation, do the cumulative distribution calculation, do the histogram equalization and recalculate the pixels, you will get this image. First, this is the histogram and the CDF. Notice the histogram for this image; the red-colored plot is the histogram. All the pixels are concentrated somewhere between 120 and 200. And the black line that you see is the cumulative distribution. When you equalize the histogram, the histogram spreads out and the cumulative distribution function becomes linear. That is all it does. But what it does to the image is really remarkable. This is the image that you get after this kind of simple equalization. Contrast this image with the original. Can you see the difference? Now, this is a very simple example, but you can do an endless variety of things. And by the way, most digital camera hardware today is capable of doing histogram equalization on the fly, because the hardware implements exactly this kind of logic. And of course, utilities like Photoshop can do much more than this. But that is a matter of image processing and some advanced subjects; those of you who are interested can look at the maths. The point here is that this is the motivation for being able to calculate the histogram, calculate the cumulative values, et cetera, and do this equalization.

In the mid-sem paper, the only thing asked was to calculate the histogram. These are the statements. I have an image array of 500 by 500. I have a histogram array of size 256. I am using a file called image.txt to read the image.
This way I can create files, give them the same name and redirect the input, so that the program automatically reads from that input. I will not go through the reading part. It is very straightforward. If the actual size is npx by npx, I am going to read npx by npx elements into image[i][j]. And of course this part handles the case where I am unable to open the file, et cetera; this is the standard paraphernalia.

Now I want to calculate the histogram. As we saw in the logic, I start by setting the histogram counts to 0. Then I calculate the histogram values. Notice that the crux of the computation is in this single statement: histogram[image[i][j]]++. The histogram values have all been set to 0 beforehand. image[i][j] is the pixel value, which is the index of the element that I have to increment in the histogram array. So a single statement can do that. It is an extremely efficient and simple way of calculating this. Using a data value as the index in this manner is, by the way, the idea behind what are called associative arrays. We shall talk about them later on. So this is the end of it. You can print the histogram, and that part of the problem is done.

You also have to find the maximum value in the histogram. This maximum calculation is routine; you have done it umpteen times. The only thing to remember is that the maximum count may not occur at only one pixel value. There may be several pixel values having the maximum count. So to print all of them, you have to calculate the maximum separately, and after calculating the maximum you have to go through the array once again, for i from 0 up to, but not including, 256. If any element of the histogram array happens to be equal to max, you print the index at which that happens. So all the pixel values at which the maximum occurs will be found by this.

Ignore this image. I have just used it for show, because some batches will be doing projects on fingerprint analysis, to help the national unique identity project that the nation has undertaken. We shall talk about that later. OK. Thank you so much.
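For reference, the histogram computation and the search for all pixel values with the maximum count can be sketched as follows. This is a Python sketch, not the lecture's program; the function names `histogram` and `most_frequent_values` are made up, and file reading is left out. The 4 by 4 image is the small example from the lecture, with its first row read as 3, 2, 4, 3, which is an assumption that matches the counts given there.

```python
def histogram(image, levels=256):
    """Count how many pixels have each value from 0 to levels - 1."""
    h = [0] * levels              # start with every count at 0
    for row in image:
        for pixel in row:
            h[pixel] += 1         # the pixel value itself is the index
    return h


def most_frequent_values(h):
    """Return every pixel value whose count equals the maximum count."""
    peak = max(h)                 # first pass: find the maximum count
    # second pass: collect all indices where that maximum occurs
    return [value for value, count in enumerate(h) if count == peak]


image = [
    [3, 2, 4, 3],
    [1, 2, 18, 21],
    [3, 4, 2, 1],
    [2, 3, 0, 0],
]

h = histogram(image)
modes = most_frequent_values(h)
print(h[0], h[1], h[2], h[3], h[4], h[18], h[21])
print(modes)
```

In this example the maximum count of 4 occurs at two pixel values, 2 and 3, which is why the second pass over the histogram is needed.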