everybody. I hope today all centers are connected and any glitches have been removed. I have combined all the three sessions of the day today. As you would have noticed from the modified schedule, we are replacing the keynote address on geographical information systems by a talk that I will give on image analysis. The idea here is to look at an unknown domain, look at the problems of that domain and try to see how programming can be applied to solve those problems. So, the day is devoted to workshop projects. To begin with, I would like to comment upon the software development activity and how to expose our students, who will study the subject of programming, to some professional teamwork. In the context of teamwork, we will be defining some workshop projects, and as an example, I will be discussing representation and analysis of digital images. Several slides on this are based on material available on Wikipedia, another example of a great open source content initiative. Incidentally, those of you who have not visited the Wikipedia site, I would seriously suggest that you do so, because the amount of good technical information that is available on Wikipedia is unparalleled. It would be very difficult to compile that information from different textbooks and research papers; in a very concise form, you get very useful information. Lastly, we will define the workshop projects and the afternoon lab exercises. This material will take about 2 hours, so we shall have the tea break after 2 hours and we will not have the customary 5 minute break in between. Instead of that, after an hour, I will go over to the different centers very quickly to have a 10 minute interaction. The last lecture of the morning session will be taken by Professor Kannan Moudgalya on Scilab, which is a package for scientific computations paralleling the popular commercial package MATLAB.
The purpose here is to indicate how such packages could be used for solving complex problems, numerical computation problems from different domains. And in the afternoon, you will have the lab, for which the lab exercise will be uploaded at around 12:30 noon today, both on Moodle and by mail. Now, most of the teachers who teach computer programming would be aware that the purpose of programming is to develop applications to solve real world problems. Sometimes, when we teach programming in an introductory course, we are so inundated by small details that when we discuss expressions, assignment, conditionals, iteration, functions, we tend to do these discussions in an isolated fashion. I personally believe that each of these discussions must be related to some actual problem solving, however small that problem may be. Additionally, it must be emphasized to our students that the entire purpose of writing programs is to solve some real world problems. Now, the real world problems require software development of a much larger and complex magnitude. For example, typical software, even what you would call a trivial application, would have 5000 lines of code. Very complex applications would have hundreds of thousands of lines of code, sometimes exceeding a million. It is not uncommon today to know of applications which require over 10 million lines of code. Very clearly, such an effort cannot be done by an individual. In fact, larger software cannot be developed even by small teams. The question now is, when we teach a first course in programming, how do we expose our students to the fact that in their real life they will either depend upon, or will themselves have to carry out, such team activity in order to develop meaningful software? Incidentally, when programming started happening in this world around the 1950s, the principles of applying engineering techniques and methodologies to software development were not known. It was considered an art in the early days.
Over the last several decades, the notion of software engineering has evolved considerably and it is an independent discipline in its own right; a foundational discipline, as a matter of fact, which provides the entire set of methodologies, approaches, design techniques, testing techniques, tools, etc. for proper and professional software development. Software engineering typically involves the study of how a problem is to be defined, how a problem is to be analyzed, how software architecture is to be designed, how detailed design at the level of modules is to be carried out, and how coding should be done. Even writing C programs, for example, requires a certain coding style and certain coding standards to be adhered to. Very rarely are we able to emphasize these during the first course in programming. However, it is absolutely mandatory that our students understand some of these professional practices. So there is of course no need to define software engineering per se and define all the principles of software engineering. However, while we teach the subject it is important to illustrate the basic principles of software engineering through the practices that we ask our students to follow. In this context I would like to illustrate the organization of our course CS 101 at IIT Bombay. This is the first course in programming, the first subject in programming. As I mentioned earlier, this is taught to all first year students, and about 830 students were there in my class last year. The schedule is very different from the normal schedule that you would have in your colleges. We have only two lecture hours and two lab hours per week. There is an extra lecture slot per week for tutorials or for occasional make-up lectures, but essentially we have about 30 lecture hours in the entire semester. The labs are held in the evenings and nights because of the non-availability of the infrastructure to cater to a large number of students during afternoons.
More interestingly, the evaluation pattern on the basis of which each student gets marks, and finally the grade, is permitted to be defined differently by every instructor in the institute setup. The senate merely defines certain norms. The evaluation pattern which I used, for example, had 20 percent weightage for the mid semester examination, 35 percent weightage for the end semester examination, quiz and lab assignments had 10 percent weightage, and the group project had 25 percent weightage. Additionally, there was another group activity. The group project, as you can see, had 25 percent weightage, and the other group activity, which consisted of setting quiz and exam questions with answers, had 10 percent weightage. So as much as 35 percent weightage in the entire course evaluation for each student was given to group activities. Most of us at IIT Bombay believe that such group projects are important, and it is not uncommon for many subjects taught by many instructors across different disciplines to include such group projects or team projects in the subjects that they teach. I would seriously request all teacher colleagues to consider introducing such a notion, because not only in programming but in general in any science and engineering endeavour, real life problems require team efforts to solve, and therefore introduction to working in teams is, we believe, very important. Now obviously your own evaluation pattern will be decided by the universities or similar structures, but within that, I would submit, you can always find some activity, with a little encouragement through some marks given, maybe as part of the internal evaluation, for such group activity. I will restate that it is not the number of marks which is relevant; the fact that some recognition is given to team effort in terms of marks is often enough to cause our students to seriously undertake team activities.
Before I explain the group project concept in CS 101, because that is something that we are going to do, let me also comment on the last point of 10 marks for setting quiz and exam questions with answers. What we do is that the teams that we form not only carry out project activity, but they also set questions for quizzes, which are multiple choice questions, and questions for exam papers, and they also submit sample answers for these questions. This activity, based on the quality of their inputs, gives them about 10 marks for the question setting. Let me describe the group project here. These are essentially carried out by a complete lab batch as a whole. So a lab batch in our case has 20 to 25 students. I believe that even in your own colleges, when you teach this subject, it will be possible to make large groups of 20 to 25 students. Each lab batch in our case had 5 teams of 4 members each. Of course, if the number of students was more, some teams had up to 5 students. Now this lab batch took up a project which was assigned to them. They did the analysis, they submitted a project report, and they did the complete software development and testing, by the way. They wrote programs in C/C++, as is part of the syllabus in CS 101 here. Your students in the first offering can do C programming exclusively, because all that is being done in C++ can always be done in C. Now there were 25 marks for this group activity or group project that they did. Out of these, 15 marks were awarded based on the quality of the project report that they submitted. This project report was assessed by our teaching assistants. The report contained the problem definition, their approach, the complete code listing and the results, if any, of the successful code that they developed. So based on this, marks out of 15 were allocated. In general the marks varied from somewhere around 9 marks, and some projects even got 15 out of 15. But that was only one part of the evaluation.
A very interesting part of the group activity evaluation was that marks out of 10 were awarded to each student based on a peer review process. Incidentally, because people worked in a group, each student of the lab batch was given the same marks out of 15 that the lab batch report got. So as a result, whether I did far more work than the other members of the team or far less, I would get the same 9 marks, 12 marks or 15 marks that everybody else got. I personally believe it is important to emphasize to students that when they participate in a group activity, independent of their own individual contribution, they get the benefit, rewards or punishment of whatever the group does, and that purpose was served here through this approach. Marks out of 10 were to be given for participation by each student. This participation was both qualitative and quantitative. Some people contributed great ideas. Some people cracked complex algorithms. Some people did the mundane work of report writing. Some people kept the organizational structure of the team intact, and so on. So what we did is, instead of us evaluating the students through a viva, which is what is often done, we asked the lab group itself to evaluate each student and assign marks out of 10. Each student was required to maintain an activity diary. Each lab batch, at the end of the project, had a meeting where team leaders indicated the marks to be given to individual students based on the activity that each individual student had contributed to the project. These marks were then debated within the batch and finalized. This is something very interesting and often not attempted in other places. I have been experimenting with this for the last 5 years in my postgraduate courses, but this was the first time I did it in the first year course, and let me tell you that the peer review process and the marks allocation were done with great responsibility by the students.
The average of the marks given to the students out of 10 was 7.8. What it meant is that some students got 5 marks, some students got 10 marks. I am tempted to compare this with the normal process of allocating marks by conducting a viva. When I conduct a viva examination of a student for 5 to 10 minutes, marks get allocated depending upon the particular question that gets asked and whether that student is able to answer that question on his or her feet at that moment or not. Whereas for team work, the student has worked for over a month and a half; typically the team projects last for one and a half months, that is, half the semester. That kind of work is extremely difficult to assess by a teacher, a teaching assistant or a staff member who has not seen the actual work done by the student. So several students can get away without doing much and still score marks. In the peer review process, however, they get exposed, because their own peers, who have seen them working or not working, have to allocate the marks. I must proudly tell you that there were several students who were given zero marks by their peers, and these zero marks were admitted to be correct by the concerned students. So this process, I would submit, some of you at least might want to examine to put in place in your own subjects. Of course, there was a caveat, and that is indicated in the last paragraph here. The caveat is that I have the right to conduct a viva of sample students. So I can choose one, two, three or four sample students from every batch and conduct a rigorous viva, of course examining the student's activity diary, taking input from the lab coordinator, etcetera. However, the caveat is as follows. If a student has been given, say, eight marks in the peer review process and I determine that the student's work is worth only three marks, then not only does that student get five marks less, but everybody in the entire lab batch gets five marks less.
Perhaps this caveat was a Damocles' sword hanging over the entire lab batch, which prevented people from arbitrarily assigning ten out of ten or nine out of ten to every student. Indeed, whatever be the reason, whether this caveat, the pressure of the peer review process, or a combination of both, what has resulted is, I believe, an extremely fair evaluation of individual contribution to the work. Incidentally, there is another advantage of introducing this peer review process. Later on in their professional life, our science and engineering students will be constantly evaluated by their peers only. They will not have conventional examinations where individual work can be submitted and evaluated. It is always the group effort, and it is your individual contribution as assessed by the group which becomes important, and therefore this could be a very useful methodology. Now, we will illustrate a sample complex problem from an unknown domain. The suggestion here is that this and other varied domains could be considered: the domain could be, let us say, design of a mechanical engineering component, or a control system design, or analysis of structures, and so on. A larger real world problem could be considered by faculty members to be taken as the base for a project. There could be multiple such domains from which multiple project problems could be defined and given to the participating students of this subject to solve. Of course, they will not be able to solve very large problems completely, and care must be taken to define a team project which it is possible for those students to do. So consequently, if a particular problem in a domain requires software of say 20,000 or 40,000 lines of code, obviously that cannot be done by first year students. So what we do is we define that particular project domain.
We ask students to study that domain in general and then identify a very small subset problem which they can solve by programming during the subject study. So in the remaining part of my talk, I will speak about what I call image analysis as a domain, and I will illustrate the image analysis portion that can be done by our students using an example of the histogram and histogram equalization. Subsequently, in the morning session, as I mentioned earlier, my colleague Professor Kannan Moudgalya will give a talk on Scilab. Incidentally, this discussion on images and histograms is almost entirely based on the material from Wikipedia and of course several other textbooks on image analysis. The point to be made here is that as a teacher, if I want to give a meaningful project to my subject students, then I will have to do about three times the work that the students will be required to do, because I will have to study that domain extensively. If I am not an expert in it, I will have to take help from some colleague of some other discipline to explain to me what the general ideas are, and convert essential small problems of that domain into programming exercises which can be done by a team. So, here is an example from the domain of digital images. Some of you might be familiar with image analysis, and therefore this material might appear rather elementary to you, but I assure you that the problem of image analysis contains very complex issues, and actually, if you want to solve this problem, there is a huge amount of software, including commercial packages and analytical tools, which is available. My ambition here is to enable all those participating teachers who have perhaps never seen a digital image or digital image analysis to understand the basics of image analysis. To begin with, we are familiar with images in the form of the analog photographs that we see.
However, when we want to represent images inside a digital computer, these images have to be broken down into a set of points. Each point is known as a pixel or picture element. A matrix of such points actually forms the entire image; the pixels are arranged in a matrix form. Each pixel or picture element is associated with a tonal value or intensity value of light, and that is how all the points combined represent an image. If you have color images, then each point has three distinct levels of intensity associated with red, green and blue, which are called the primary colors. If you have black and white images, then you have a gray scale defined, which typically has intensity values between 0 and 255. So, each picture element in a large picture will have an associated intensity value or tonal value which will range between 0 and 255 for black and white images. Typically, 0 is used to represent black and 255 is used to represent white. Between them are various levels of gray. We shall see some sample images in this session to understand what exactly we mean by these gray scales. Now, consider a digital image of a certain size. The size of a photograph, incidentally, is always measured in terms of so many inches by so many inches, or so many centimeters by so many centimeters, which is typically the height and width of that image. When we divide that image into points on the analog side, these are measured as points per inch or PPI, which is the resolution of the analog image. Each point is then converted into a pixel value by the digitization process, and what you have effectively then is an array of these pixel elements. As a sample, suppose I have an array called image of size 500 by 500; then each element of this matrix would contain a value as above. Of course, not all pictures will have 500 by 500 pixels.
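To make this representation concrete, here is a minimal sketch in C of a grayscale image stored as a matrix of byte-sized intensity values. The array name, the maximum dimension and the gradient filler are purely illustrative assumptions, not taken from the lecture slides; the filler just produces a left-to-right ramp from black to white so there is something to inspect.

```c
/* A grayscale image as a matrix of intensity values.
   Each pixel is one byte: 0 = black, 255 = white.
   MAXDIM is a hypothetical maximum; real images may be smaller. */
#define MAXDIM 500

unsigned char image[MAXDIM][MAXDIM];

/* Fill a height x width sub-array with a left-to-right gradient
   (width must be at least 2), just to have sample pixel data. */
void make_gradient(int height, int width) {
    for (int i = 0; i < height; i++)
        for (int j = 0; j < width; j++)
            image[i][j] = (unsigned char)(j * 255 / (width - 1));
}
```

The declared array is larger than any one picture, matching the remark below that we always define a larger array to accommodate different picture sizes.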
Let us say the height of one picture is 200 and the width is 100; then in actual practice the number of pixels in that image will be 200 into 100, that is, 20,000. But we always define a larger array to accommodate different sizes of pictures. Now, there is the notion of a histogram. A histogram of an image tells us how many pixels in that image have the same intensity value. Let me illustrate this by an example. Here is a sample image which is 8 pixels by 8 pixels. So, as you will see, each small square here is a picture element or pixel. This one is almost white and this one is almost black. In between you will see various gray scales. The pixel value at each of these points would vary between 0 and 255. Since we do not have an absolute black or an absolute white in this particular sample image, you will notice that the lowest pixel value will not be 0 but maybe 40, 50, 55, 60 or something. Similarly, the highest pixel value will not be 255, which would be the representation of pure white, but it could be 170, 180 or whatever. Notice therefore that the contrast in this image, our ability to distinguish different pixels, is limited, because the pixels are all within a very narrow range of pixel values. This is how many images will be, and the histogram is a notion which merely tells us how many pixels have the same value in that image. For example, you will notice that these pixels have the same value. Therefore, whatever that pixel value is, let us say 58 or something, there will be 2, 3, 4 pixels at that pixel value. Let me illustrate the notion of pixel values and the notion of what you call the cumulative function by showing this. The sample image which you have seen actually has these pixel values. Notice that the smallest pixel value appears to be 52 and the largest pixel value appears to be 154. It does not matter what the actual pixel values are, but this will roughly be what you will get in a black and white image.
Of course, the actual image will not be an 8 by 8 image. It will be maybe a 256 by 256 or even 500 by 500 image. The illustration that we are doing here is this: let me go back to the previous slide. If this is the actual image that I have, then after digitization I can read the grayscale pixel values from an input device and put them into an array like this. Once I have defined such an array, this becomes the picture array or picture element array for my purpose. Also notice that since each pixel value is limited to the range between 0 and 255, it can actually be represented by a number which can be contained in a single byte, because a single byte consisting of 8 bits can store values between 0 and 255 if it stores an unsigned integer. As a matter of fact, grayscale digital images in computers are represented by unsigned char values, a char being exactly one byte. We will of course discuss these things, the data types and other details, in later sessions. So given these pixel values, we now compute the histogram. The histogram, as I mentioned, finds out how many elements there are at a particular pixel value. Notice that at 52 there is only one element, at 55 there are three elements, at 58 there are two elements. There is no element at pixel value 53, no element at pixel value 54, and no element at pixel value 0. Consequently, the histogram, which will actually be a table giving the number of pixels at each of the 0 to 255 gray levels, will have 0 elements at some of these gray levels, and therefore those are not depicted in this illustration. To make my point once again, let us go back to the previous array which defined the pixel values. Notice that there is exactly one element with the value 52 here. Notice that the value 55 occurs in one pixel here, in another pixel here, and probably in one more pixel here.
So there are three pixels which have the intensity value 55 and one pixel which has the intensity value 52. This is what is reflected in the histogram. So there are three pixels with value 55, two with 58, one with 52, and so on. The largest pixel value observed is 154, with only one element. What is the importance of the histogram? It so happens that if I want to improve the contrast of an image, the contrast of an image is often determined by the nature of the histogram. If the histogram is limited to a small range of pixel values, as this one is for example, I will have limited contrast. If I can stretch the histogram to cover all possible values between 0 and 255, then the contrast improves. We shall see that through an illustration shortly. Here is the cumulative distribution function, which is defined as cdf(i) = summation of px(j) for j equal to 0 to i, where px(j) is the number of pixels with value j. Speaking in plain English, the CDF at i is simply the count of pixels having pixel value less than or equal to i. So consider this particular image that we have; the cumulative distribution function or CDF is calculated as follows. Up to the value of 52 there is only one pixel, obviously, because there is no pixel at a value less than 52 and there was one at 52. There were three pixels at 55, so the number of pixels having an intensity of 55 or less will be 1 plus 3, which is 4. Similarly, there were two pixels at 58, so the number of pixels at or below the value 58 is 6. In short, this is nothing but a cumulative count of the individual pixels that occur at different pixel values. What is the importance of this cumulative distribution function? It is used to normalize the histogram. Let us see some mathematical definitions here. This is the histogram equalization formula. Now, this formula may appear very complex, and certainly this is not the place to discuss its derivation, its implications, or how exactly the various terms come in there.
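Since the slide itself is not reproduced in the transcript, here is a small sketch in C of how this cumulative count can be computed as a running total over the histogram array; the function and variable names are my own assumptions.

```c
/* cdf[i] = number of pixels with intensity <= i,
   obtained as a running total of the histogram counts. */
void compute_cdf(const int hist[256], long cdf[256]) {
    long running = 0;
    for (int i = 0; i < 256; i++) {
        running += hist[i];   /* add pixels with intensity exactly i */
        cdf[i] = running;
    }
}
```

With the sample counts from the lecture (one pixel at 52, three at 55, two at 58), this gives cdf[52] = 1, cdf[55] = 4 and cdf[58] = 6, matching the values worked out above.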
Those of you who are interested can look at this slide in greater detail or look at the original article on Wikipedia itself to understand it. It is not very difficult, by the way; in fact, most of the first year students, based on the simple arithmetic and algebra that they understand, are able to figure this out. Essentially what equalization does is that it stretches the intensity values, which are limited in an original image, to something like 0 to 255. For example, consider that the CDF of pixel value 78 is 46. Consequently, 78 will become 182. I am sorry if 182 is not visible here; that is what stretching means in terms of histogram equalization. Leaving behind these complex mathematical expressions, let us simply go back to see what the results of such a histogram equalization are. If I apply the histogram equalization formula, which is by the way a simple computational exercise which any first year student can do, then the image matrix that I will get will be completely different. The original image matrix, if you will recall, had pixel values limited to a range of 52 to about 154. However, in the stretched or equalized histogram image that I now get, I have values as low as 0 and as high as 219. What it means is that the image values have been stretched in their intensity. Consequently, the picture that I see now will be far clearer because it will have what we call better contrast. If we do this analysis through the computational means that I have just illustrated, the picture that I will get will be like this. You can see that the contrast has been enhanced; compare this once again with the original picture. The original picture is this and the new picture is this. You will agree that the new picture is visibly better in contrast. This was an artificial image, so it perhaps does not illustrate the point well enough. So I have for you another image, which is actually a scene from some area.
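The formula being discussed is the standard histogram-equalization mapping as given in the Wikipedia article: each old intensity v is replaced by round((cdf(v) - cdf_min) / (npixels - cdf_min) × 255), where cdf_min is the smallest non-zero CDF value. A minimal sketch in C, with names of my own choosing:

```c
/* Remap one intensity using the standard histogram-equalization
   formula: the pixel's cumulative count is rescaled so that the
   darkest occurring intensity maps to 0 and the brightest to 255.
   Adding 0.5 before truncation rounds to nearest (values here are
   non-negative). */
int equalize_value(long cdf_v, long cdf_min, long npixels) {
    return (int)((double)(cdf_v - cdf_min)
                 / (double)(npixels - cdf_min) * 255.0 + 0.5);
}
```

For the 8 by 8 example above, npixels is 64 and cdf_min is 1 (the single pixel at value 52), so a pixel whose CDF is 46 maps to round(45 / 63 × 255) = 182, agreeing with the 78 → 182 example in the lecture.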
In this scene you will see a lot of trees and other things. When I take this grayscale image and apply exactly the same kind of computations, which as I mentioned are very simple numerical computations that can be done by first year students, what I will get is this image. To illustrate, I have here this image along with its histogram and cumulative distribution function. So look at the histogram. The histogram is limited to this range: the lowest pixel value is somewhere around 120, the highest pixel value is somewhere around 200, and most of the pixels are concentrated in this region. Obviously the cumulative distribution function will behave like this, because beyond that there are no pixels; all pixels are below this point, and so on. When we say we are stretching the histogram, what is actually happening is that if you take the equalized histogram and plot it, it will look like this, and the CDF, which was a highly non-linear function, becomes almost a linear function. In fact, histogram equalization is nothing but linearization of the cumulative distribution function. However, once again without going into any kind of complex maths or justification of how histogram equalization is derived, let us simply concentrate on the computations which are required to be done in order to obtain histogram equalization. We notice that the computations are given by a well defined formula. That formula can be understood by any student with a 12th standard maths background, and therefore in a programming course people would be able to take an image, given the pixel values, calculate its histogram and apply this technique to equalize the histogram. This is the picture, as I said, with enhanced contrast; look at the original picture for comparison. It is interesting to note that not only modern computers but even modern cameras come with this histogram equalization function built in.
So if you are looking at a digital photograph which you have just snapped using a digital camera, an advanced camera will permit you to see its histogram and also equalize the histogram. Certainly in most image processing or picture processing tools that you see, whether on Windows or on Unix, you will find that it is possible to edit an image. Ordinarily when you think of editing an image you think of cropping it, rotating it, attaching multiple images into one, etc., but an extremely important aspect of image editing is to improve its contrast, and that contrast improvement is done internally by all such packages, typically by using histogram equalization. Why am I telling you all of this? Suppose for example I choose a project based on this histogram equalization for my first year students. Simply telling them the formula, telling them to compute this formula using some arbitrary input values and produce the computed output, will perhaps not interest that student or group of students. But instead, if we spend say 15-20 minutes in describing how an image could originally look like this and how, if the contrast is improved, the image would look like this, that changes matters. If we take an actual photo editing software, which is available as I said in both Windows and Unix environments, we can simply show in a lab exercise, or ask the students to see for themselves, a sample digital image, enhance the contrast by doing histogram equalization, do something else, and then say: look, this is the real world problem, this is what we want to solve. You cannot solve the entire problem, but can you not do this portion as a team exercise for the project? Believe me, this will not only enthuse the students to do better work because they understand the larger domain from which the problem is being taken, but they will do that programming very carefully because for once they are actually solving a real problem. They are not solving an exam problem just for the sake of marks.
Here is a program to calculate the histogram. This program is written in C/C++; however, the syntax is well understood by anybody who has taught programming. What I have here is the definition of an image array and a histogram array, which is just a single dimensional array of 256 elements. Observe that in C and C++ arrays begin with 0 as an index and end with 255 as an index, a necessary evil that we will have to live with, but otherwise everything is perfectly fine. It is particularly convenient here because the actual pixel values are always between 0 and 255. I have defined here an image file called image.txt, and the first screen here merely shows the reading of the image in. So it is a text file in which I have arbitrarily created the pixel data, written as numerical values, and I am reading that image file, with a variable called npx giving me the number of elements in that array. So obviously my input file is arranged such that the first value is npx, which will be equal to the height and width, and then for i equal to 0 to npx and for j equal to 0 to npx I read that square image into this array. I am assuming that my image is square, and npx is not really the number of pixels in the entire image but the number of pixels in each row and each column. This particular piece of program also produces an output just to confirm that what I have read is correct. That part is simple; of course, this is the ending part of the file reading process, where if I am not able to open the file I give out an error message, and finally I close the file. Now this is the interesting part: calculating the histogram. First, since each element of the histogram array, 0 to 255, will contain the count of pixels having that particular pixel value, I must first set these counts to 0, and then I will look at each pixel in the image and, depending upon its value, I will add one to that particular element of the array.
So, to begin with, I set all 256 elements of the histogram, from index 0 up to (but not including) 256, to 0, and see how simply the histogram values are then calculated. I run through all the pixels, each element of each row and column, and what do I do? Depending upon the value of the i-th, j-th element: suppose that value is 55; that means histogram element 55 should get one added to it. If that value is 127, the 127th element should get one added to it. Consequently, what I am doing is simply using the pixel intensity at every pixel as an index into the histogram array. Observe that the intensity values of my pixels lie between 0 and 255, and the index of the histogram array in C also varies between 0 and 255. So I utilize this convenient coincidence: if a pixel value is 0, then the 0th element of the histogram gets one added to it. That is how I count all pixels which have value 0, all pixels which have value 1, all pixels which have value 2, and I can do all of that in a single scan of the entire image. The point I am trying to make here is that while the domain may be fairly complex, and while the mathematical derivation leading to the solution of a particular sub-problem might not be very straightforward to understand, the actual computations can be done very simply. Observe that many students might compute this histogram in a different way: they will first take the value and put it into a temporary variable, but ultimately they have to use that temporary variable as an index into the histogram array. This is merely the concluding part of that program, which prints the histogram at non-zero values, just as you saw the histogram printed earlier. Here I am just going through the entire histogram array, in which I have accumulated the counts, and if an element is not 0 I print the pixel value together with the histogram element itself.
Let us quickly go back over this entire program once. Notice that the program is not very large. Notice that much of the program actually deals with reading the image and verifying the output of whatever I have read. Notice that the histogram calculation, which is the crux of the solution of this problem, is a very simple part. Notice that I have a final verification, for my own benefit, of printing out some of the meaningful results so that I can cross-check and confirm that yes, these are indeed the right histogram values. What do we illustrate to our students by this program? We illustrate that while the domain may be complex, if the sub-problem is defined clearly, then it should be possible for a student with just a twelfth-standard maths background to understand the basic computational requirements of the problem and, knowing C programming well, to write a program to get it done. Now, obviously this particular program is not worthy of a team project. In fact, we had a histogram computation problem in our examination. In a written test, people are able to do this kind of problem in about half an hour, varying from 20 minutes for some smart students to about 45 minutes for others. I mention this because, as part of your teamwork, you will also have to do a whole lot of activities related to the setting of questions, as I will illustrate shortly. Coming back to the teaching of C programming: this is obviously not a program that you would like to discuss at the beginning of a course. It should be discussed some time after you have introduced the basic computations. It is not necessary to introduce the notion of files; you can actually use the redirection facility of Unix, and students can use any file to supply the values. But what is important is that arrays should have been discussed before you discuss this example.
Typically at IIT Bombay, in our first course in programming, we have up to the mid-semester examination to cover the fundamental aspects of programming. Just before or just after the mid-sem we announce the course projects. By this time students have understood the basic programming concepts, and therefore they are able to appreciate illustrations such as this program. In order to make a team project out of this, you would say: here is a sample program to calculate the histogram; now you have to apply histogram equalization. And for some enthusiastic students you can ask for something more. At IIT, all our team projects are essentially open-ended problems. We tell the students: in our project activity we can do only so much, and that is what we will do. It is also important to illustrate to our students that programming in real life is not like solving a half-hour or 20-minute problem in an exam. In fact, the problem may not be completely and satisfactorily solved at the end of the team project, and that does not matter. That they have been able to solve part of it successfully, that they have been able to develop a program or a set of programs to address the problem, is of greater importance. As a concluding slide of this part of the lecture, I am showing the image of a fingerprint. Why do I show the image of a fingerprint? The projects that we gave last semester to our 800 students actually revolved around the use of fingerprints and their analysis. So let me describe the course projects, the subject projects, which were given to lab batches. I have some slides which I will use to describe how the projects were organized. But essentially, the problem that was given was the problem that the nation is trying to solve: what we call the unique identity project that the Government of India has undertaken.
As most of you would be aware, under this project every Indian citizen is to get a unique ID, and that unique ID is to be tagged to the fingerprints of that person. This guarantees that there cannot be any impersonation and that the same number cannot be allocated to two different citizens, because fingerprints are unique. Of course, this entire scheme rests on the premise that no two fingerprints are the same. Theoretically this is correct. But how do you do it for 1 billion people? For example, suppose some Indian is being registered at, let us say, Anantpur, at some village taluk panchayat, and his fingerprints are being taken; his name and other details are being captured. How do you know that this person is a genuine resident of Anantpur? How do you know that somebody trying to fake his identity has not travelled all the way from Ludhiana or Delhi to that place, having already registered as another Indian somewhere, wanting to get a second identity? Well, then the fingerprints so collected from the resident of Anantpur have to be compared with all existing fingerprints, to ensure that they are unique and there is no duplicate. If there is a duplicate, you can announce that the person trying to register for a unique identity is faking the identity. If there is no duplicate, you allocate a unique identity number and store the fingerprints appropriately in a large database somewhere. It is important to understand the magnitude of this problem, by the way. This country has 100 crore people, 1 billion people. Typically, for each person we would take 10 fingerprints, so we are talking about 10 billion fingerprints. To collect these fingerprints, digitize them, and store them is itself a huge implementation challenge. What is the computational challenge? The fingerprints will not all arrive in a single day; this process will go on for several years. Let us consider the dynamic equilibrium of this process.
Somewhere halfway through, I would have the fingerprints of 500 million people. Every day I may typically be collecting the fingerprints of about a million people; that is how the task can be completed in a few years. So every day, a set of 10 fingerprints each for 1 million people arrives, and each one of these must be compared with the corresponding fingerprints of the 500 million people already enrolled. And I must come up with a result within 24 hours, because if I do not, tomorrow I will have another 1 million sets of fingerprints. Imagine the kind of computational power that is required, the database sizes that are required, the computational efficiency that is required, the various algorithms that are required. And nothing has even been said about the logistics of collecting these fingerprints and moving them, by email or whatever other mechanism. In short, this is a real-world problem requiring a combined approach of managerial skills, logistical skills, computational skills, and so on. Within that, we ask: what are the issues concerning the unique identification number to be allocated to each citizen? This is the larger domain. What I did was take this problem and convert it into a smaller one: allocating a unique identity number to each individual student on the campus. There are about 5000 students, so I actually procured fingerprint devices, four of them, and put them in the lab. I wrote programs which interface to the device and capture the fingerprint, and this fingerprint would be converted into a digitally storable image. Then the lab batches were given different projects. One batch, for example, did the work of registration, which means a student comes in, the fingerprint is captured, and the basic information about the student is captured, such as roll number, name, hostel number, branch, etc.
Then all the fingerprints collected during the day are submitted to another batch, which is creating a large database of all previously collected fingerprints, and every time a new fingerprint comes in, their job is to match that fingerprint against all existing fingerprints. Matching poses its own problems, and I had actually arranged a series of special lectures by specialists on fingerprint matching as well as fingerprint verification. Fingerprint verification is a different problem. Matching is the larger problem: as I mentioned, it is needed to ascertain that every person being given a number indeed has a unique set of fingerprints. The verification requirement is slightly different. Suppose a student walks into a laboratory and I want to ascertain whether the student entering the lab is properly registered as a genuine student. Then I can put a fingerprint device at the door and ask that student to place a thumb or index finger on it. Now I have a single fingerprint, and of course the student will also enter the unique identity number that has been given to him or her. In this particular case I only want to quickly ascertain whether the student is indeed who he or she claims to be. So I take that unique identity number, go to my database, retrieve the one image corresponding to that thumb or index finger, and compare it quickly with the single image captured at the door of the lab. If it matches, I say: yes, you are a genuine student, you are admitted; and I can additionally record the incoming time. I can similarly record the outgoing time. This is one example of attendance capture and verification for students attending labs. It is real enough, because it can be implemented in real life, and it is doable enough for a lab batch. Consequently, multiple lab batches participated in solving different parts of this particular problem.
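The structural difference between 1:1 verification and 1:N matching described above can be sketched as follows. Everything here is hypothetical: real systems compare extracted minutiae with specialized algorithms, not raw bytes, and the names db, distance, verify, and identify are my own, chosen only to show that verification is one comparison while identification is a scan of the whole database:

```c
#include <stdio.h>
#include <stdlib.h>

#define NUSERS 5000        /* campus-sized enrolment, as in the lecture */
#define TLEN   64          /* hypothetical fixed-length template */

unsigned char db[NUSERS][TLEN];   /* stored templates, indexed by ID */

/* Placeholder similarity measure: sum of absolute byte differences.
   A real matcher would compare minutiae, not raw template bytes. */
int distance(const unsigned char *a, const unsigned char *b)
{
    int d = 0;
    for (int k = 0; k < TLEN; k++)
        d += abs(a[k] - b[k]);
    return d;
}

/* 1:1 verification: the claimed ID selects one stored template,
   so a single comparison decides accept or reject. */
int verify(int id, const unsigned char *probe, int threshold)
{
    return distance(db[id], probe) <= threshold;
}

/* 1:N identification (matching): the probe must be compared against
   every enrolled template; returns the matching ID or -1. */
int identify(const unsigned char *probe, int threshold)
{
    for (int id = 0; id < NUSERS; id++)
        if (distance(db[id], probe) <= threshold)
            return id;
    return -1;
}
```

The lab-door scenario is verify(): the entered ID number turns an N-way search into one quick comparison. Registration against the whole campus is identify(), and at national scale that loop is exactly where the computational challenge lies.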
The sample project reports which students submitted will, by the way, all be released as open source, and I will try to see whether, during this workshop, you can have a glimpse of some of the project reports that students have written. I had, in fact, a large number of slides on the analysis of a fingerprint, but that would take us away from our main task. So I am concluding this portion of the talk here, and I will now proceed to define the workshop projects. The workshop projects are meant for all of you, so this is of direct interest and direct relevance. As you are aware, a large number of teachers have assembled for this workshop at multiple centres; the last count I had was about 983 teachers, which is a very satisfying number. Now, all of these teachers will be going back and teaching programming to their students, and, as I suggested, many of you would later like to include the idea of a team project for your students. But then, in order to ensure that our students carry out a good amount of teamwork, we ourselves as teachers must participate in similar teamwork. The purpose of the workshop project is to ensure that all of us get first-hand experience in doing good teamwork. This teamwork is being defined as a workshop project. There are two activities that will have to be carried out by each team. But first let me define the team formation. Each team should have four participants. It is possible that the number of participants at a centre is not exactly divisible by four; we did not put that restriction. Consequently, if you divide the total number by four there will be some remainder: zero, one, two, or three. Depending upon this remainder, you allocate that many additional members to some of the existing groups. That is why some teams may have five participants. Of course, there could be an odd case where a centre has exactly seven participants, for example.
It has not happened in our case, but hypothetically, if a centre has only seven participants, the centre coordinator can decide to have two teams of four and three persons each. Roughly, this is how it should be decided. There is an importance to this size, which I will mention later. What is required? The team members should be able to easily communicate with each other after the workshop. Very clearly, the team effort that I have in mind cannot all be concluded during the workshop tenure. Remember that the workshop is only ten days long. Also remember that all of you have been advised that after the workshop you will have to carry out certain activities for about two weeks when you go back to your respective colleges. Well, the activity for these two weeks is not going to be just an individual activity; it will be a continuation of the team activity which begins during the workshop. That is the reason why I have stated here that the team members should be able to easily communicate with each other after the workshop. This communication need not be physical and verbal; it can be through email. However, it would be better and easier if the participants forming a team either belong to the same institution, or live in the same city or in nearby cities, so that at least once, or maybe more than once, they can physically meet and discuss the effort. If that is not possible, and the team members indeed belong to different places, we have found that email and the internet are actually the best mechanisms for correspondence. Incidentally, the Moodle which has been set up for this workshop will continue to be active for the next three months, and all of you can connect with each other through the Moodle, for example to exchange large files, in addition to using your email.
What is equally important is that the members of each team are in communication with the corresponding centre coordinators, because the centre coordinators have a role to play in supervising this teamwork. Let me go ahead and define what this teamwork is. The workshop projects consist of two activities, both of which are to be carried out by each team. First, a team has to define a programming problem, and C programs for the solution of that problem are to be written. This should be very similar to what we as teachers will expect from our students when, later on, we assign them such team projects in our teaching. Additionally, quiz and examination questions, with answers, are to be designed on allocated topics; the topics shall be allocated to the different teams next week. This is to be done during and after the workshop. During the workshop, you are required to set up teams and carry out preliminary work in consultation with the centre coordinators. So you can think of different domains. You are not at your respective institutions right now, but you would know the strengths and weaknesses of your own institution in terms of the expertise available in different domains. Keep that in mind while you try to define a domain from which you want to pick a problem. Extensive discussion will be required to figure out what small sub-problem can be taken up as a team problem, to be solved as a course project or a workshop project. After the workshop, when you go back, complete both parts of the project and submit soft copies to the coordinators, and also upload them on the workshop Moodle. So at the end of this entire exercise, two weeks after the completion of the workshop, we would have submissions from all 893 participants, or rather from as many teams as we form. We will have submissions of that many group reports: one report on your project, including the source code that you have written in C.
And another report on the questions that you have set for the quizzes and exams, along with the sample answers. Successful completion of the workshop project is in fact a requirement for entitlement to the ISTE certification. So when you complete this, we will ask our centre coordinators to provide you with the certificates for the course. Naturally, this activity has to begin in all earnest, and later on, when I define the activity for this afternoon's lab, you will find that a significant portion of the lab has to be devoted to the formation of such teams and an initial discussion of the workshop project activity. You will notice that this slide ends with "and ...", which means there is something to follow. That something, I think, will be interesting to all of you. Before disclosing it, I would like to stop here.