So, to recap our discussion last time, we observed a few things. We said that in education, traditional communication is largely one way, in the form of lectures, with an element of discussion, which could be between the teacher and the students briefly in the class or between students in small groups. This, we said, is clearly not scalable. First of all, we questioned whether the lecture is an effective communication for education, and the increasing conclusion all over the world is that no, it is not. It just happens to be the traditional way, and nobody has applied their mind to see if alternatives are possible. Discussions are considered to be more important from the learning point of view, and open discussion, where people vociferously participate, is the visible form of discussion. But please understand that if you are listening to someone attentively, your mind is discussing within. It is an implicit discussion that you are having with yourself in the context of what is being spoken. Therefore, when we put a question mark on lectures, we do not outrightly reject lectures as useless communication. They are useful communication, because they also cause some internal discussion. We do not call it discussion; we call it application of mind. The mind is sensitive: it is analyzing what is being said, applying it internally to the context and trying to learn more. But we also said that, in the larger context, if it is possible to capture the learning behavior of the students, then it may be possible to design communication which is individualized. And as far as individual communication is concerned, from the conventional models we picked up tutoring and coaching, not large-scale coaching, but individual tuitions or individual instruction.
So, not a coach, but an instructor is what we concluded would be an ideal way to enhance learning. But it is not possible to scale this, because we will invariably have far fewer tutors, instructors or teachers and a much larger number of learners. These ratios do not permit any individualized attention being given by the tutors or coaches. So, that was our conclusion. We also said that machines can be used as tutors or instructors, and we concluded that they cannot be a replacement for human mentors or human teachers, but they can augment and substantially complement the way the human teachers or tutors handle communication. Therefore, we said, use machines to enhance. The extreme use of machines to enhance individual learning is where you try to replace the human element in the form of lecturer, tutor or coach. So, you have self-tutoring systems, online tutoring systems, online assessments, etc. Or, if you take MOOCs for example, that is another extreme example of an individual working with a machine. Of course, the backend knowledge of a particular course, and the driving of that course as a budgeted mechanism of learning something, applying your mind, solving problems, learning something more, etc., much like how a course is conducted, is implicit in MOOCs. The advantage is that a machine is available to you at your time of choice and at your place of choice, and that makes it completely different from conventional education. One more thing that we discussed is that the current mechanisms of personalized instruction, even those using machines, which have evolved over the last 30 years in the form of intelligent tutoring systems, invariably depend upon individualized communication being generated based on the performance of the learner in some assessment. So, the current state of individualized communication is essentially performance based. What do we mean by performance-based individualized communication?
We effectively say that an intelligent tutoring system, including MOOCs for example, can assess you on a regular basis, can give you a series of quizzes and can capture your performance. Now, based on your performance, the system can decide that you require additional inputs for learning. So, you must spend additional time reading some more material, reading some longer explanations and attempting some more problems in a particular area in which your score is deficient. So, it is essentially the performance scores which decide what individualized material you are given. Unfortunately, if you look at scores in any assessment, they get normalized to a number between 0 and 100. So, what happens is that I might decide on additional material set one for people who score around 30 percent marks. Another additional material set, set two, I may say is targeted at people who get, let us say, 31 percent to 50 percent, and so on. Now, if I have 50 students, it is quite likely that these 50 students fall into different groups, but still there will be groups. So, if there are 5 students who all score 30 percent marks, all 5 of them will get exactly the same additional material. This is useful, but is it sufficient? And this is where we discussed that performance-based or score-based individualization is not really an individualization; it is merely responding to the ability of a learner to score marks in an exam. It is not necessarily useful for figuring out what is the best material for an individual student. Contrast it with a private tuition. Even if a teacher is teaching 3 students, for example, since each one of the 3 has a different style of learning, over a period of time the tutor becomes knowledgeable about each style of learning and appropriately addresses each learner's requirement differently. So, for example, if some student is weak in problem solving, the teacher will give him a large number of problems.
If some student is not reading the material sufficiently, the teacher will make him read something. Similarly then, an individualized educational communication should attempt to factor in the learning style and learning behavior of each student. That learning style and learning behavior is not captured by scores. So, we now make a very major distinction and we say learning-behavior-based personalization. In the last session we discussed that there are 2 aspects of it. First of all, can I capture learning behavior to begin with? If I can't capture learning behavior independent of the scores, then I have a serious problem; I can't apply it. The question of what to do after you capture the learning behavior is the question of what should be the composition of the effective communication which is individual. Can these 2 problems be solved using machines? So, one problem is to capture learning habits and the second problem is to use them. How do machines help us in doing either? This was the question that we need to answer. So, we are now talking about communication that is happening intrinsically. Two assumptions are sacrosanct. Number one, the number of learners will always be very large as compared to the number of educators. This is the generic term that I use, which could be teachers, teaching assistants, faculty members, tutors, coaches, instructors, whatever. So, the numbers are not in favor of human instructors being able to coach individual learners properly. That is the reason why we are seeking additional help. How could machines help? We concluded the last lecture by listing the things that can be captured which can be indicative of the learning behavior of the students. So, does anybody care to recall those things which we said machines can capture, mostly when people are doing online courses? If they are doing any online course, then what features of their learning behavior can be captured through the online mechanisms? Last time we had listed a whole lot of things.
Anybody? The amount of activity which a student performs on. So, you have to name some activities. Activities including watching, reading, accessing the material, opening some links. Accessing, hopefully reading. Hopefully reading. Then we also said viewing video lectures. Watching video lectures, accessing some links or downloading some content. If there are quizzes, then answering or attempting. I will mention one more thing which you are missing out. Practice problems. Problems or quizzes. Yeah. So, these practice problems are almost like quizzes, but they are not graded. The only difference between practice problems and quizzes is that quizzes are graded: marks are awarded and are counted towards your performance. For practice problems, there is no count. You may get an answer right or wrong or whatever, but that is for your own interest. And you will have graded assessments. Anything else that you can capture? Anybody from the group which did not attend the last session but has participated in any MOOC on any platform anywhere in the world? Can you say what additional communication happens during the course that gets captured or can be captured? So, I will say discussion forums. There are two aspects of a discussion forum. One is viewing and the other is participation. So, we will write both. Participation can be in the form of raising a question. Participation could be in the form of commenting on an answer given by a teacher or a TA. Participation could also be in the form of giving an answer directly to a question that is raised elsewhere. Viewing. It is interesting to decipher viewing further. For example, suppose I just capture the start time and end time of each visit a learner makes to the discussion forum. I can only surmise at the end that during the two months, or eight weeks or six weeks, of the course this particular learner visited the discussion forum for, let's say, three hours and fifty minutes.
Now, it might tell me in a gross sense how much time the student is spending on the discussion forum, but is that the only information that you can decipher when you are capturing the visits to the discussion forum? You are all computer science students. You click on the discussion forum as a learner. What do you see in the discussion forum? You see the current questions, old questions, responses, etc. Typically, for a large course there will be tags. There will be tags related to topics; there will be tags related to date or time. Is it not possible to capture the topic which a learner visits in the discussion forum? Now imagine that in a programming course a learner visits the pointers topic again and again and again. Can you conclude something about that student's learning, namely that the student probably requires additional help to understand pointers properly, because the student is spending more time looking at pointers? You see, we are now talking about guessing, but guessing intelligently. There is no concrete information, because we are not asking the student. In fact, if you ask a student which are the topics in which you are weak, the student himself or herself is likely to comment on his or her weakness based on scores, which are not really related to learning, because that is our mindset. We assess ourselves based on the scores in competitive exams. We don't assess ourselves independently. So, you have to guess and understand intelligently, and therefore try to build what is known as a student model. There is a whole lot of extremely brilliant and useful research that has been done for hundreds of years on how learners learn. There are different models of learning. There are different ways of measuring how different students learn, etc. But all of these have been based on statistical analysis of a limited number of test cases.
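The idea of guessing intelligently from tagged forum visits can be made concrete with a small sketch. This is only an illustration, not a validated method: the `ForumVisit` record, the function name `topics_needing_attention` and both thresholds are assumptions introduced here, and a real platform would log visits in its own format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ForumVisit:
    """One logged visit to a tagged forum thread (hypothetical log format)."""
    learner_id: str
    topic_tag: str
    seconds_spent: int


def topics_needing_attention(visits, learner_id, min_visits=3, min_share=0.4):
    """Guess which topics a learner may need extra help with.

    A topic is flagged when the learner visited it at least `min_visits`
    times and it accounts for at least `min_share` of their forum time.
    Both thresholds are illustrative assumptions, not validated values.
    """
    own = [v for v in visits if v.learner_id == learner_id]
    total_time = sum(v.seconds_spent for v in own)
    if total_time == 0:
        return []
    visit_counts = Counter(v.topic_tag for v in own)
    time_per_topic = Counter()
    for v in own:
        time_per_topic[v.topic_tag] += v.seconds_spent
    flagged = [
        topic
        for topic, count in visit_counts.items()
        if count >= min_visits and time_per_topic[topic] / total_time >= min_share
    ]
    return sorted(flagged)


# Made-up visits for two learners in a programming course.
visits = [
    ForumVisit("s1", "pointers", 600),
    ForumVisit("s1", "pointers", 900),
    ForumVisit("s1", "pointers", 300),
    ForumVisit("s1", "loops", 120),
    ForumVisit("s2", "loops", 200),
]
print(topics_needing_attention(visits, "s1"))
```

With these made-up visits, learner `s1` is flagged for pointers only. The flag is a guess to be checked against other evidence, such as practice problem results, and not a conclusion about the student.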
For the first time in the history of humanity, in the last three years, systems have emerged which can capture learning behavior on a very large scale. Would that help in making the modeling of a student a more fruitful exercise? This is one question. When we talked about the capture of learning behavior and when we talked about using that learning behavior, I am talking about an intermediate step here: capture, model and then use. You agree? So, capturing individual interactions of different kinds is one thing. Converting the captured data into a model, which also means validating that model, is the second. Using that model to give personalized instruction is the final, third step. So, on the construction of a student model, let me now tell you some issues that you might have to deal with as computer scientists of the future. Let us assume that there are about 10 million learners learning a course. Can you imagine a course that can be simultaneously learned by 10 million people, which 10 million people will find useful? 10 million is one crore. Sorry? A language course, all right. Let me simplify the problem. One million. One million is 10 lakhs. Can you, in your own domain, think of a single course which one million people in, let's say, India would be doing simultaneously? I will give you a clue. The total number of students admitted annually to engineering colleges in this country, across different branches and colleges, is about 1.25 million. Now, you all have studied the first year of engineering somewhere or the other, or the first year of science or something. Were there not common courses that all of you were required to study? Can you name those common courses? Okay. Maths. Maths. Is a communication course obligatory for everyone? So, what do they teach in the basic communication course? How to write? They tried that with me in 1963 also, but it did not impact me at all. Did it impact you?
You see, if all these courses had really impacted us, we wouldn't be sitting here in this class in IIT. So there is some lacuna. Now let us just limit ourselves to the list. This clearly tells us that more than a million people every year, without fail, do these and some more courses. Most institutions, except the IITs, are governed by AICTE, and they follow a common sort of syllabus given as a model syllabus by AICTE. Although we say the IIT system is excluded, it is not truly excluded, because the model syllabus is based on what is taught in the IITs. So, invariably, across the country there is a common understanding: what needs to be learnt by the students and what needs to be taught in these courses is more or less identical. Now, isn't this an extremely good starting point to say that, look, if I can do something about finding out how these students learn and help them learn better, then I would have used machines effectively in educational communication? By the way, this is one of the reasons why, unlike our other two sister institutions, we directed our MOOCs efforts to basic courses and not to the higher-end electives. NPTEL, for example, is a great effort being done by IIT Madras. The NPTEL MOOCs are all higher-end electives, MTech electives. Professor Prabhakar's effort in IIT Kanpur is mostly for agricultural courses and skill courses. IIT Bombay right from the beginning concentrated on offering courses which are done at the entry level, in the first year or second year of engineering. Why? Because of the commonality of the content, which will be learnt by a very large number of students across the country. So we have today a course, for example, in programming. We have a course in basic physics which is just getting ready. We have a course in thermodynamics. We have a course in signals and systems. We will be preparing a course in engineering drawing, and I am trying to enthuse our maths colleague to offer a basic course in maths.
Now, second question: from amongst these, which are the courses traditionally considered to be the most difficult by a large number of students across the country? Simply go by the failure rate. So maths, we give it a star rating, and drawing, both of these. What is generally the passing rate for the first-year courses in a typical university? Do 95% of students pass? Do 50% of students pass? Do 70% of students pass? Maybe. We will take 70% as a sort of yardstick. What we don't realize is that out of 1 million students, 70% is 7 lakhs, but the number who fail is 3 lakh students. Now, are they idiots? They are not, because they got admission to the engineering college in the first place. They would have done something in their academics earlier. Surely they don't deserve to fail. They failed because our educational system was not able to communicate education effectively to them. Consider those who pass. Let's say 70% pass. Since we are going by the score to reflect understanding, how many people will score more than 85% marks in these courses? A very small percentage, 10% maybe. What is it that prevents others from scoring 85% marks in every subject? Again, they are not idiots. They are not dumb. But somehow they cannot learn in the style which is required, and they cannot answer questions in the exams in the style which is required. Would they not benefit from some additional help? The point I am trying to make is that no matter what your conventional scores indicate, every learner can benefit, to a different extent, from individualized learning. And that is the effort that we are making. Now, coming to closure on this: suppose I perform all the measurements that I mentioned. Within three years, believe me, a million people will be doing these basic courses in India using MOOCs, at least some of them.
When they do that, do I have only a replacement for conventional education, or a complement to conventional education using MOOCs, or do I have something more to offer? That is the research challenge for us. I have not spoken about the use of the model. Imagine you have made the model. Now, let us say the model indicates that a particular student requires more practice problems. How do you give more practice problems? Somebody has to create those additional practice problems. In which topic are the practice problems required? In the topic in which the student's performance is poor. We said we can access data related to when the student takes a quiz or when the student solves a practice problem. But we forgot to mention explicitly, and I am trying to mention it now, that I don't just get the data about how much time a student spent on a quiz or a practice problem. I also know how that student performed in that problem. I know how much time a student spent individually on each question of a quiz, not just on the overall thing. I know which questions he got right and which questions he got wrong. Would that help in refining my model of the student's understanding? Can I not build an understanding which is focused on individual topics, rather than just a gross judgment of a 30 percent score overall or a 40 percent score overall? And if I do that, then can I not say that this student requires more inputs in the topic of pointers and requires more reading material, while that student requires more practice problems on, let's say, the topic of stacks, something of that sort? How do you make such conclusions? How do you record such conclusions? How do you refine such conclusions after more evidence is gathered? That's a huge, challenging big data analytics problem. You agree? That is where we believe that a large amount of machine learning effort will be spent in the coming years and decades in the field of education. So, needless to add, I will repeat what I said earlier.
This is clearly the reason why personalized instruction has been included as one of the 14 grand challenges for the 21st century by the American National Academy of Engineering. And it is ranked at the same level as clean drinking water, inexpensive solar energy and brain mapping. These are some of the other areas which are considered grand engineering challenges for the 21st century. Personalized instruction is given that status. I hope you understand its importance and its implications. I wanted to give this as a conclusion because you might see a plethora of work in fields such as this, with educational technology departments or computer science departments all across the world working on these areas. In fact, I will not be surprised if some of you end up doing your MTech project or PhD research in that area, but that's a separate matter. Many of us would expect that. In IIT Bombay we are actually setting up a center of excellence in online and blended education. And one aspect of that center of excellence will be to combine educational technology research and computer science research to feed into this area. But practical systems are also required to be built. These are two different aspects. So both kinds of work will happen. When we talk about personalized communication emanating out of a general communication for education, the challenges will be very huge. I will ask the last question in this context. We said earlier that machines can help in a big way to enhance the learning experience by providing this, this, this. So we talked about measurement, we talked about modeling, and we tentatively said that after we model, this is the kind of additional personalized communication that can be given to the students and monitored. The question is, how do you prepare this additional communication? Do you keep it ready ahead of time?
That means, for every topic where you set, let us say, five practice problems and two quiz problems, you actually set 20 practice problems and 10 quiz problems, to be given only when required to those students. The work is, instead of five practice problems and two quiz problems, making 20 practice problems and 10 quiz problems. The work is roughly four to five times more. How many of you have tried to set a question paper as a TA or something, or is that task not given to you? One, two, three, four, five, six, seven, eight, nine. The point I am trying to make is that those who have actually attempted to do that will realize the effort and the time that is required to be spent. So, just per topic, roughly five times the effort is required to prepare generalized additional material in only one area, practice problems and quizzes. What about explanations? What about more solved problems? Have you ever attended any coaching class of any kind? In the coaching classes, I have never seen one, but I have heard from my students that there is a lot of emphasis on teachers solving problems. Is that correct? How many problems does the teacher solve per engagement: two, three, four, five? Why can't a student be given 20 solved problems to look at? Preparing solved problems in much larger numbers is therefore also a useful effort. You do not necessarily expose all of that, because then you will be overloading every student with information, but you give it as and when required. It's like you have a lot of drugs, but unless the patient has fever, you don't give them. So these are some of the mechanisms that will have to be thought out and that will have to be examined in detail. All right. We would like to conduct the last two sessions as general open sessions, but these are not meaningless sessions. They will all be centered around your inputs, your suggestions on A, how to build such a model, and B, how to use such a model.
This will include your computer science expertise in capturing data, measuring data, analyzing data, modeling, parameter building, everything that you can comment upon. Surely not all of you will have an inclination to develop expertise in this, but those of you who do, I would request you to speak on these topics. You may choose any particular aspect. I would like public discourse by individuals for three to five minutes on any one of these areas of your choice. If that does not occupy the larger portion of the last two sessions, we will have discussions where I will pose questions. Thank you.
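As one possible starting point for the open sessions, the model and use steps discussed earlier can be sketched from per-question data: which topic a question belongs to, whether it was answered correctly, and how long it took. Everything here is an assumption made for illustration, including the `Attempt` record, the function names and the thresholds. A real student model would need validation against actual learner data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Attempt:
    """One answered quiz or practice question (hypothetical record)."""
    topic: str
    correct: bool
    seconds_spent: float


def build_topic_model(attempts):
    """Summarize attempts per topic: accuracy and mean time per question."""
    grouped = defaultdict(list)
    for a in attempts:
        grouped[a.topic].append(a)
    model = {}
    for topic, items in grouped.items():
        n = len(items)
        model[topic] = {
            "accuracy": sum(a.correct for a in items) / n,
            "mean_seconds": sum(a.seconds_spent for a in items) / n,
        }
    return model


def recommend(model, low_accuracy=0.5, slow_seconds=120.0):
    """Map each topic summary to a suggested kind of extra material.

    The thresholds and the rules themselves are illustrative assumptions.
    """
    advice = {}
    for topic, stats in model.items():
        if stats["accuracy"] < low_accuracy and stats["mean_seconds"] < slow_seconds:
            # Wrong and quick: possibly guessing, so suggest reading first.
            advice[topic] = "more reading material"
        elif stats["accuracy"] < low_accuracy:
            # Wrong despite time spent: suggest additional practice.
            advice[topic] = "more practice problems"
        else:
            advice[topic] = "no extra material"
    return advice


# Made-up attempts for one learner in a programming course.
attempts = [
    Attempt("pointers", False, 40),
    Attempt("pointers", False, 50),
    Attempt("pointers", True, 60),
    Attempt("stacks", False, 200),
    Attempt("stacks", False, 180),
    Attempt("loops", True, 30),
]
advice = recommend(build_topic_model(attempts))
for topic in sorted(advice):
    print(topic, "->", advice[topic])
```

Refining the conclusions after more evidence is gathered then amounts to rebuilding the per-topic summary from the longer list of attempts. The hard research questions, such as which features to record and how to validate the rules, remain open in this sketch.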