Good evening. Welcome to class. It's a great pleasure to welcome and introduce Amey Karkare, who is a professor in the Department of Computer Science at IIT Kanpur. He also did his undergrad there. But, crucially, in between his undergrad and his current position, he was here at IIT Bombay, where he did his PhD. He's now in the department. And he's going to talk to us about what he's been doing on tutoring and programming, in particular a software called Prutor. Over to Amey.

Thanks. I would like to thank the department for hosting me here for this year, 2016. This gives me an opportunity to introduce Prutor in the CS 101 course here. So I will start with my talk. Before going into the talk, I'll just mention a few people who are involved. This system was mainly developed by Rajdeep Das, who was an MTech student and graduated last year. He was the sole developer at that time, and after he left, I have taken over. So we are the core team developing this thing. We also took help from Dr. Sumit Gulwani from MSR in the initial phases, but he's not involved in developing the tool itself. Then we used the help of several students to develop external tools as well as plugins which are used in the system. In particular, Umair Ahmed, a few MTech students and many interns. These interns are typically students who have just finished their first or second year at IIT Kanpur, but they have made very crucial contributions. I will come to all of this later.

So why do we want a system for helping with introductory programming? Those who attended last week's talk will also sort of agree with the major challenges. Teaching a first-level programming course is difficult. And why is that so? Because unlike physics, chemistry, or maths, where people come with a certain minimal background, in programming we get students who already have experience with competitive programming.
They have done some C or C++ programming in their 12th standard under CBSE. At the other end are students, mostly from faraway villages, who have probably not seen a computer yet. And this is true. I have seen a few such students, some 12 or 13 in my batch. Most of the time they also have language problems. They don't understand English. One student did not even understand Hindi; he knew only Bengali. So all these kinds of issues come up. What makes it more challenging is that when you are giving assignments, because the class size is large, you want some kind of uniformity in grading as well as feedback, which, as I will tell you, doesn't happen with human TAs.

So one major challenge is how to keep good programmers engaged in the course while making sure that beginners are also learning simple stuff in a simple way. How do we provide early feedback if people are making mistakes? How early can we tell them that this is the mistake and this is how you solve it, without too many extra resources? One resource that is used heavily in this course is the TA, the human teaching assistant. However, TAs have their own classes and their own exams, so we do not want to strain them. At the same time, the expertise of TAs in programming also varies. There are some TAs who are very good programmers and some others who are learning it themselves. And no one likes to work extra hours beyond the lab hours and grading hours.

There was another challenge in front of me. We have a lot of computers lying around within IIT Kanpur, and these computers have different flavors of operating systems like Windows and Linux. Within Linux, too, it is sometimes Fedora, sometimes Ubuntu, going back to some very arcane releases. So how can we use all these systems to teach a large number of students simultaneously? One of the solutions that comes to mind is using Ideone or some other browser-based IDE. The problem is that these IDEs have very limited capability.
Typically, they do not have any way of evaluating the solution. You submit a program, you run it with your own test cases, and that's all. There's no mechanism of evaluation, and there's no feedback beyond compiler errors. Then there are certain judge programs where you can give some test cases to evaluate on. But typical judge programs are editorless. You have to write the program separately, submit it to the evaluator, and you get back how many test cases passed. In such a scenario, grading partial submissions becomes difficult, because judges will always grade your program on the number of test cases that passed. These judges also don't give any feedback. In fact, most competitive programming judges don't even tell you which test cases your program failed on. So you have to guess what is going on behind the scenes.

There is a tool developed by Microsoft called Code Hunt. It's a different approach to teaching programming, where you have to guess the problem statement given certain test cases. These test cases are generated automatically using the model solution, which is hidden, and the solution that the student submits. So the test cases highlight the difference between the two versions of the program. In this case, learning is a side effect. They are saying it's like a treasure hunt: you are looking for some hidden program, and the excitement of guessing that program will help you learn different programming constructs. But at least my belief is that this is not suitable for first-time programmers who have never programmed in their life, because these students get frustrated very easily if they're not getting to the solution.

So these things also motivated us to develop our own system. Before going into the system, I will describe the setup which was used to decide upon the different features of the tool. I'm talking about the course ESC 101 at IIT Kanpur.
It's an introduction to programming course which is taken by all undergraduates in two batches, so approximately 425 students do this course every semester. The weekly load on a student is that he or she has to go through three lectures, a tutorial, and one lab. Every week they will undergo all these things. For an instructor, it is much more active. The instructor has to prepare for three lectures. Even though the instructor does not deliver the tutorial directly, he has to prepare the do's and don'ts for the tutorial. What kind of problems have to be discussed? What has been covered in the class? So he has to summarize all these things. And finally the instructor has to prepare for four labs, not one. Why is that so? Because of scheduling constraints: students have other courses and other labs, which means not everyone can be accommodated in the same lab on the same day. So the lab is divided into four sections, and each of these sections has its lab on one day: Monday, Tuesday, Wednesday, or Thursday. And I think I won't be wrong if I just replace ESC 101 at IITK by CS 101 at IITB. Things are very similar here as well.

Yes. Which programming languages have you specifically used it with?

Prutor has actually been tested with non-imperative programs as well, such as logic programs. Prutor can work with almost any compiler or interpreter which can be invoked on the command line, so it's not restricted. As I said, we are not developing the compiler. We are using existing technologies and just bringing them together. I will show some of these things later.

This talk will mainly be a demo of the system, so I will show you different features of the system. I'm not going to go into the details of the architecture, but if someone is interested, we can discuss it later. The system is designed to be used by several kinds of people. These are probably too small to read.
There is a student who wants to write programs, an instructor who sets problems, and a tutor who can create certain events like lab exams or graded labs. There are teaching assistants who actually grade the assignments submitted by students, and there are researchers who want to see this code and understand how a student approaches a problem; they can do a lot of data mining on top of this. And then there are other developers who can create feedback tools based on the insight they gain from the submissions. For this talk, I will divide the users of the system into two broad categories: one is the student and the other is the admin.

As usual, you have to log in to the system, and as a student, what you get first is a dashboard. I will go directly to the online system. This is a temporary event that I have created for demonstration purposes. This is the dashboard for a student, where he can look at the problems that are assigned to him. This is a timed event, so it will finish at probably 6 p.m. today. He can look at his own performance in the grade card. For me, most of this will be empty because I'm not submitting anything. And then he can look at a summary of his performance in the course.

For every problem, you get this editor arena. I think there is some problem with the time on my system. Yeah, I'm doing that. This arena is divided into two parts. On the left-hand side is the problem statement, plus you will get Prutor's feedback on your submissions. And then there is a legend for what the different icons mean. As usual, you just write your program. I have a program written here. It is a simple program which checks whether a given number is a palindrome or not. For this program, I can give some input, and I can compile it. So this is like a simple editor. Now, there are features like compile-time error messages.
If a student makes some mistake, for example forgets to declare a variable, hints are generated to help fix it. One of the things we had to deal with was that when a student is in the first or second lecture, he does not understand most of the compiler jargon. So the error messages generated by compilers do not make sense most of the time. For example, in this case, the actual error message that the compiler generates is "use of undeclared identifier 'y'". Now a student who is doing the first or second assignment does not understand what an identifier is or what "undeclared" means, and gets confused. So we have a feedback tool which rewrites common error messages and makes them more helpful. In this case, it tries to give more details, like: you are using variable y here before declaring it; declare it first. It gives an example and some details of how you are going to do it. This is the very first feedback tool that was integrated. It is one of the simplest feedback tools; it's just message rewriting.

Then there are other errors. For example, students typically tend to forget the ampersand in scanf. This is a very common mistake in C. In this case, it is not an error but a warning. The C compiler can accept this program, but there is a type mismatch, and the real message here is even more complicated: it starts talking about pointers. It says "format specifies type 'int *' but the argument has type 'int' [-Wformat]". To a first-time programmer, this error message is not going to make sense until the course covers pointers. So we have to rewrite these messages so that students who are early in programming can benefit from them. Instead, we produce a much simpler error message. And another thing: as I said, there are language issues. So we can even rewrite this message in native languages.
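The message rewriting described above can be pictured as a small table of patterns over the raw compiler output. The following is only a minimal sketch under stated assumptions: the rule table, the function name `rewrite_message`, and the wording of the friendly messages are hypothetical and are not taken from Prutor itself. Only the two raw diagnostics come from the talk.

```python
import re

# Hypothetical rewrite rules: (pattern over the raw compiler message, friendly template).
# The two raw diagnostics are the ones quoted in the talk; the friendly wording is assumed.
REWRITE_RULES = [
    (
        re.compile(r"use of undeclared identifier '(\w+)'"),
        "You are using variable '{0}' before declaring it. "
        "Declare it first, for example: int {0};",
    ),
    (
        re.compile(r"format specifies type 'int \*' but the argument has type 'int'"),
        "scanf needs the address of the variable. "
        "Did you forget '&' before the variable name?",
    ),
]


def rewrite_message(raw):
    """Return a beginner-friendly version of a compiler message, if a rule matches."""
    for pattern, template in REWRITE_RULES:
        match = pattern.search(raw)
        if match:
            return template.format(*match.groups())
    return raw  # unknown messages pass through unchanged


if __name__ == "__main__":
    print(rewrite_message("prog.c:5:9: error: use of undeclared identifier 'y'"))
```

Keeping the templates in a table like this is also one way the same rule could carry a Hindi or Bengali template alongside the English one, though how Prutor stores translations is not stated in the talk.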
This is a work in progress, where we can present the same message in Hindi, English, Bengali, or some other language if the instructor wants it.

Suppose the student fixes all these issues. There are still bugs in the program, but let's say the student executes it and is satisfied with his own submission. The student can then try the program on instructor-provided test cases using the evaluate button. Evaluate will take the program and run it on the instructor-provided test cases, and this is where the feedback tool comes into the picture. This is where semantic feedback comes in. The problem is that several test cases have failed for the student, and it's very difficult for a first-time programmer to figure out what mistakes he or she has made. In this case, Prutor can give certain hints: that you have to check the assignment to y at line number five, and check the return statement at line number 13. Here I have deliberately made an incorrect assignment; y should have been zero. And similarly, here I have made the mistake of replacing the comparison operator by an assignment operator. Again, a very common mistake made by students.

This particular tool was developed by Ivan at Vienna University. These messages depend a lot on having some correct submissions. We have to have some model submissions for the problem; the tool compares the student's submission with these model submissions and figures out which model submission comes closest to the student's. Then, depending on the semantic diff between the two programs, it produces messages. Many times it produces false positives as well, but typically, for common mistakes, the tool works very well. So we have to give several correct solutions, or in general, all possible ways of correctly writing this program, which is one of the limitations of the tool.
One other problem is that students try to write programs which are tailor-made to the tests: using if-then-else, they try to pass all the test cases. So we have a concept of hidden tests, which is very similar to competitive programming judges. For these tests, you are only told whether they passed, not the actual test case. So you cannot game the system.

In your experience, what percentage are false positives?

We have not done a systematic study. The tool is limited in many ways; pointers and so on, it cannot handle. For the first few labs, it does a very good job. However, it can take a long time to solve the problem. If the program is complicated, then the comparison takes a long time, and students get frustrated even with that. So we have to come up with some faster mechanism. Right now we don't have statistics; we have not done that.

After fixing all the issues, Prutor will give you a go-ahead that your tests have passed, so you can submit the program.

We started collecting statistics, in the sense that we were looking at the corrected program after feedback was generated, to see whether it incorporates the feedback or not. But that did not go well. Again, one of the issues is that we are limited by the number of people who can do this study. So we are slowly making progress in this area. I will share some statistics, but not all.

If the student can tick whether each of the error messages was effective, correct or not, that's all, then I think you will get some statistics.

That's a good idea. We were thinking of using some kind of star mechanism for the whole system, but I think it can be done for individual feedback.

Does it work across any compiler?

Yes. This feedback will typically work on any imperative program.
The framework itself will work for any compiler, but the feedback tools may differ from one compiler to another. In particular, this one will work as long as you have a control flow graph and a call graph of the program. This tool compares the control flow graphs of two programs; it does not look at the actual code.

No, as a professor you don't have to do anything. You just have to plug it in. Hummel has developed that tool. That is the nice part about Prutor: we can integrate these tools very easily. Let me come back to this error. I will just show you something. From a faculty or instructor's perspective, you just have to select which tools you want to use. These are all plugin tools which have been plugged into Prutor. They can be developed by a third party. For example, this ESC feedback is the one which gives all the semantic feedback. I can very easily disable it. In an exam scenario, for example, I may not want to give students any help. The moment I disable it, I will stop getting any feedback from it. So if you want to develop a tool, yes, you have to understand the internals. But if you want to use a tool, given that the tool is developed with certain rules in mind, it can very easily be integrated into Prutor. So in this case, the feedback is gone. Similarly, I can disable the others. Disabling applies to the whole system. Currently we don't have a concept of an individual course, so this whole system works only for one course. You will have to have different installations of the system for different courses. All these things are for the future. So now even compile-time errors will not be shown in the nice way. It just says your program did not compile; all those feedback messages have gone.

Yes. Typical compilers have standardized error codes. If we use error codes, then this will be portable across compilers.
Right now the compiler feedback tool works for GCC as well as Clang. But the ESC feedback tool, as I said, is compiler independent: as long as it gets the abstract syntax tree and the control flow graph, it is able to work. They have their own compiler front end to generate the syntax tree out of the code. This is done offline. The feedback tool developed by Ivan was an offline tool. In fact, it was developed for a different problem. To use this tool in Prutor, we had to create a web-based interface so that Prutor can send a query and get back the result. So any offline tool can be integrated, offline in the sense of any tool which was not developed with Prutor, if it can handle a query via a web request and return results in a particular JSON format. As long as this interface is consistent, Prutor can take any feedback.

So this is more or less the student view. I started with the student first. These are all the major things that he or she wants to do: submit a program and see whether it is correct or not. And finally, at the end, he can submit so that the program is sent for grading, and so on. Similarly, he has other facilities, like looking at earlier programs that he or she has submitted, grades received, and so on. But those are minor things; I won't discuss them.

I just have the backup slides in case the network does not work. The more interesting thing is the admin interface. Whenever an admin logs in, he will see a slightly different interface. Every admin can also act as a student. So I became an admin and then I changed my role to student. This is the place where you manage the class. You can create problems.
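The plugin contract described above (a web request goes out, a JSON result comes back) can be sketched without any networking. This is a minimal sketch only: the field names `problem_id`, `code`, `feedback`, `line` and `message` are assumptions made for illustration, since the talk says only that results come back "in a particular JSON format" and does not give the schema.

```python
import json


def build_query(problem_id, source_code):
    """Serialize the request a framework might send to an external feedback tool.

    The field names here are hypothetical, not Prutor's actual schema.
    """
    return json.dumps({"problem_id": problem_id, "code": source_code})


def parse_feedback(response_text):
    """Turn a tool's JSON reply into (line, message) pairs for display in the editor."""
    data = json.loads(response_text)
    return [(item["line"], item["message"]) for item in data.get("feedback", [])]


if __name__ == "__main__":
    # A reply shaped like the hints shown in the demo (lines 5 and 13).
    reply = json.dumps({
        "feedback": [
            {"line": 5, "message": "Check the assignment to y."},
            {"line": 13, "message": "Check the return statement."},
        ]
    })
    for line, message in parse_feedback(reply):
        print(f"line {line}: {message}")
```

The point of a fixed reply shape is the one made in the talk: a tool written for another purpose only needs a thin web wrapper that answers in this shape, and the framework does not need to know how the feedback was computed.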
You can create and manage accounts. You can schedule events, and then there are certain analytics that can be done on the system. For creating a problem, you get a particular interface where you give a name to the problem. You have to give the name as an identifier in a certain format. I won't go into the detail; I will come back to this when I discuss problem upgrade and downgrade. Then you have to give the problem statement. You can give the primary solution code, that is, the instructor's solution code. These solution specifications are alternate solutions, which are used by the semantic feedback tool to decide which code to use to give feedback. The semantic feedback tool will find the closest correct program to an incorrect program in order to give feedback. If it cannot find any close program, then it will just give up. So there is a notion of distance between two programs. You can also give partial programs to start with, so that the student does not have to type out many routine things like hash includes.

Yeah, you have a question. That is not my work. I can talk about it, but it was done by another group. We are using it as a third-party tool, just to show that Prutor can integrate other tools. I will give a reference to that as well, so that you can look it up.

The initial template is like this. This is the problem statement. Then this is the solution given by the instructor, or a template. You can use some external tool to generate test cases automatically, and these tests can be added to the system using a comma-separated (CSV) file, or you can add tests manually. If you have a sufficient number of TAs, you can ask them to add test cases. The thing is, if you have given the primary solution code, you just have to give the input, and the output is generated automatically by the system using that code.
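The idea that the instructor supplies only inputs, and the expected outputs come from running the primary solution, can be sketched as follows. This is a minimal sketch under stated assumptions: the reference solution is modelled as a Python function standing in for the compiled instructor program, and the names `build_test_cases`, `evaluate` and the `"(hidden)"` placeholder are hypothetical. The hidden flag mirrors the hidden tests described earlier, where only pass or fail is reported.

```python
def reference_is_palindrome(text_input):
    """Stand-in for the instructor's primary solution (assumed I/O format: YES/NO)."""
    digits = text_input.strip()
    return "YES\n" if digits == digits[::-1] else "NO\n"


def build_test_cases(reference, inputs, hidden=()):
    """Derive expected outputs by running the reference solution on each input."""
    hidden = set(hidden)
    return [
        {"input": text, "expected": reference(text), "visible": index not in hidden}
        for index, text in enumerate(inputs)
    ]


def evaluate(student, cases):
    """Run a student solution; hidden cases report only pass or fail, not the input."""
    report = []
    for case in cases:
        passed = student(case["input"]) == case["expected"]
        shown = case["input"] if case["visible"] else "(hidden)"
        report.append((shown, passed))
    return report


if __name__ == "__main__":
    cases = build_test_cases(reference_is_palindrome, ["121\n", "123\n", "7\n"], hidden=[2])
    always_yes = lambda text: "YES\n"  # a program that tries to game the tests
    print(evaluate(always_yes, cases))
```

Because the expected output is produced by the same program the instructor already trusts, details such as trailing spaces and newlines are whatever that program prints, rather than something typed by hand.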
This is unlike some other tools that I have used, where you have to provide the input as well as the output. It reduces the chances of making a mistake while writing out the expected output, especially with things like spaces. The test case just gets generated and added at the end. While adding the test case, you have the option of setting it to visible or invisible, and it will automatically be reflected in the student's view as a visible or invisible test.

Once students have submitted their programs, you want TAs to grade them, maybe on a certain day. So we have this task panel, which gives you the status of the different tasks which have been scheduled. In this case, several TAs have finished their tasks, others have not. This is a consolidated view of all the tasks that have been scheduled for a particular lab, and each TA gets his own view of the tasks. So this is the consolidated view, and this is the view which every TA will get. These are the grading tasks which are pending for the TA, and these are the tasks which he has done. He might go ahead and review them. This is one thing which has simplified life for lots of TAs, because before this system they were either downloading submissions from some email client, or downloading some tar file and untarring it, making sure the names do not clash, and all that. In this case, grading becomes very easy.

I will just take something which is already graded. You have everything here. This is the grading view. In the middle panel, you see the code submitted by the user. On the left panel are the marks given by the TA. The TA can give some feedback here and finally grade the submission. If the TA thinks there are some minor mistakes and wants to verify, he can open the submission in an editor, change it, and try the same test cases. Any change here will not be reflected in the student's code.
The student's code is a read-only view from the TA's perspective. All changes here are on a temporary copy, which gets destroyed the moment you close this window. But this gives us a lot of help while grading partially correct solutions. Say you think that the student has missed a character. You add it here, try all the test cases, and verify.

Does that go back to the student, the corrected solution? You said the TA can make some changes; if it makes the solution work, it would be nice for the student to see those changes.

Right, yes, that is also a very good idea. The only problem is that most TAs do not try to do that.

On paper you would just put a circle there and put a comment.

Right. So maybe some kind of advanced editor which will allow us to put comments in the code. The editor is also a third-party thing, called Ace, which is a very widely used browser-based editor. I will see whether Ace has such features or not. It allows us to put comments in the gutter, which is what we are doing while producing compile-time messages. I will check whether it can be done for other things as well.

Sorry? GitHub.

Another feedback tool that we developed is a kind of auto-grader. It looks at the student's submission and at some other statistics, like the number of compile-time failures and the number of evaluation attempts, and it also looks at the same features for other students, to predict what grade will be assigned by the TA.
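A grade predictor of the kind just described can be pictured as a weighted combination of per-submission features. The sketch below is illustrative only: the talk names compile-time failures and evaluation attempts as features, but the `tests_passed_fraction` feature, the base score, and every numeric weight here are assumptions invented for the example, not the fitted values from the actual tool.

```python
# Hypothetical weights; in the real tool they were derived from past graded submissions.
WEIGHTS = {
    "tests_passed_fraction": 6.0,   # assumed feature, not named in the talk
    "compile_failures": -0.2,       # each failed compile costs a little
    "evaluation_attempts": -0.1,    # each evaluate click costs a little
}
BASE_SCORE = 4.0  # assumed starting score before features are applied


def predict_grade(features, max_marks=10):
    """Predict a TA grade as a clamped weighted sum of submission features."""
    score = BASE_SCORE + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)
    return max(0.0, min(float(max_marks), score))


if __name__ == "__main__":
    clean = {"tests_passed_fraction": 1.0, "compile_failures": 0, "evaluation_attempts": 0}
    print(predict_grade(clean))
```

A model of this shape has an obvious blind spot: a submission with almost no activity collects no penalties, so its predicted grade can be higher than a TA would give.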
This was again a project done by an MTech student, who looked at all the submissions from the last two semesters and the grades given to them, and then came up with weights for individual parameters to predict the grade for a submission. Right now this predict grade button is available only to the instructor. In case he feels the TA has not done a good job, he can check whether the TA's grade is close to what the tool is giving. Here, because this is a perfect submission, it got a predicted score of 10. Typically, in lab assignments, students get full marks. So it helps more if we go to a non-lab submission, say an exam submission. In Kanpur, we have a slightly different model. There are lab exams, which are heavyweight events; labs carry very little weight. You can take some submission which does not have full marks and see how the tool performs. This one is something really random: the student has not even tried to solve the problem. In this case, the tool says that he should get some four marks, even though the TA has given one. I can guess why he gets four marks: the number of compile events in this submission must be very low. The student has not tried to do anything, so he does not get penalized for compile-time failures.

Another interesting thing, which does not have any value other than looking cool, is this history panel. We can look at how the program has evolved, how the student has written his program, and what his thinking process was. This is the original code, the template given by the instructor, and you can look at different versions of the program and how the student has tried to modify it. If you are in a hurry, you can just play it. It will show you how this code evolved over time, and it also marks important events: whether this code was auto-saved, compiled, submitted, and things like that.
So in a way you get the whole history, almost like a version history of a student's submission, and this has also helped us in catching cheating cases. Even though it was not the original intent, it has helped: when we catch someone copying, we can look at their history to figure out whether he has really copied or not. In fact, this can detect whether a large amount of code was pasted or not. So these things help.

Can we not disable it?

We have not done a lot of the engineering things which are required. Again, I have had a bad experience, in the sense that TCS has this software where they disable almost everything. If you click anywhere outside their editor window, you will be logged out of the system, and then you have to call an admin. We don't want to make things very restrictive. We do encourage people to paste code, as long as it is their own code and not directly copied from some friend. What happens is that some people will try writing their code in some other window and paste it. But yes, it can be disabled; I don't know if it will be useful or not. People will find a way of copying.

It is integrated, but not at the UI level. We have a script which can be integrated with MOSS. There are many offline scripts which are used with the system. One of the goals is to convert all of them and create a UI for them.

And finally, as an administrator, you can change a lot of settings of the system. For example, as I've shown, you can enable or disable feedback tools. There are a lot of compiler flags. The C language is really a very tricky language. There are many things which are undefined, which are processor defined, compiler defined, version defined, and so on. So in order to give a uniform flavor, you may have to pass several flags to the GCC compiler. This page allows you to specify those.
Similarly, you do not want students to end up writing programs with infinite loops, or programs which hog memory. So there are certain limits. And another thing, which came from experience, is that many students, when they write a bad program, just wait for a miracle to correct it. They will just keep on clicking execute, execute, execute, without even changing the program. To discourage this, we have explicit delays associated with each event. Whenever you compile, there will be a half-second lag, no matter how fast the compilation went. For execution, you have to wait half a second, and for evaluation, you have to wait two seconds before you can proceed further. I have seen in exams that a student will just keep on pressing evaluate, see that things have failed again, and press evaluate again. This actually hurts the server very badly if the program has an infinite loop.

Till now I was talking about the basic framework of Prutor, which helps students write their submissions and helps TAs grade them. During this course, we developed several plugins or external tools. I will discuss some of these tools in brief.

We developed a method of generating problems automatically, and we came up with this idea of upgrading or downgrading a problem. When a student is stuck on some problem, we could give him a simpler problem which has similar concepts, but at a simpler level. Similarly, if a student is able to solve a problem very fast, he could upgrade to a more difficult problem. With automated problem generation, it was imperative that we have automated test case generation, because the number of problems that can be generated is very large, and getting test cases from TAs was not scaling.
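The per-event delays mentioned above (half a second for compile and execute, two seconds for evaluate) could be enforced with a small wrapper like the one below. This is a sketch under assumptions: the talk does not say whether the lag is added on top of the run time or is a minimum total duration, and this version assumes a minimum total duration. The function name and the injectable `sleep` and `clock` parameters are hypothetical and exist only to make the behaviour easy to check.

```python
import time

# Delay values are the ones quoted in the talk; the event names are assumed.
MIN_DELAY_SECONDS = {"compile": 0.5, "execute": 0.5, "evaluate": 2.0}


def run_with_min_delay(event, action, sleep=time.sleep, clock=time.monotonic):
    """Run action(), then wait so the whole event takes at least its minimum delay."""
    start = clock()
    result = action()
    remaining = MIN_DELAY_SECONDS[event] - (clock() - start)
    if remaining > 0:
        sleep(remaining)  # a fast compile still costs the student the full delay
    return result


if __name__ == "__main__":
    print(run_with_min_delay("compile", lambda: "compiled"))
```

The delay does not stop a determined student from clicking repeatedly, but it caps how many expensive evaluations one student can trigger per minute, which is the server-load concern raised in the talk.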
I have already shown you examples of syntactic feedback and semantic feedback. Prutor also had some inbuilt analytics, where it can show you how a student is progressing with his code. The things in red are currently disabled at IIT Bombay, because we had to change to a totally different language. And finally there is a work in progress on debuggers. I will talk about all of this in brief.

So, problem generation. What happens when you have a large number of students sitting for an assignment or exam? It's very difficult to find space, which means they will always sit side by side. So we wanted to give different problems to different students, but at the same time make sure these problems have a similar difficulty level. This way, different students can be assigned different sets of problems. We want to generate the problem statement as well as the solution program, because it will not help if we just generate programs, and it will not help if we just generate statements. We also want to generate variations of the problem, that is, different difficulty levels for the same problem, and this goes along with the upgrade and downgrade feature.

The key idea was that we created templates for common idioms. For every lab, we knew the kinds of programs which are typically going to be used, so we created templates for such programs. We created certain rules to instantiate these templates, and we can also combine two different templates. If there is a template for a for loop, you can combine it with if: you can nest if inside for, for inside if, and so on. And finally, the difficulty level is computed based on the rules that have been used. I'll just show you one example.

These are the topics that were covered by this automated problem generation. Sequences: you can generate the Fibonacci sequence or a variation. The variations are also very simple. You can have 2 times x(n-1) plus 3 times x(n-2) plus some constant, say 5.
Okay, and now this 2, 3 and 5 can change for different students. Patterns: I will show you an example of patterns, which use nested loops. Problems related to arrays, matrices and strings: typical problems such as finding the maximum element, the minimum element, the second maximum, the second minimum, the maximum in a row, the maximum in a column, and so on. All these were templates, and we instantiated them with different kinds of rules. Simple recursion and simple pointers. For combination, you can have concatenation, so the output will be the solution of one problem followed by the solution of the second problem. The output can be some kind of Boolean operation. So basically if some condition holds then you print this, otherwise you print something else. Sorry, that is conditional execution. A Boolean operation is when you want to find an element in a matrix which is a row maximum and a column minimum. You have two properties, you combine them, and whichever element satisfies the combined property, you print that.
This is really bad. I sort of expected this. So this is one of the sets of problems generated by the tool. In this case I'll just show you the kind of pattern that a student has to print. The student has to write a program which will print a pattern like this. This is a problem at a certain difficulty level, D1. The program is also generated: this is the program which will generate the pattern. And this system can generate fill-in-the-blanks questions as well. All these hashes you see can be post-processed to generate the blanks, and the number here shows the weightage of that particular blank. So filling in m % height will get two marks, and figuring out that you have to put row number minus 1 will get 10 marks. All these weights are set using some kind of empirical measurement to figure out what value should be given to each blank.
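The sequence template with per-student coefficients can be sketched as below. This is an illustration, not the generator used in the course. It assumes the recurrence is f(n) = a*f(n-1) + b*f(n-2) + c with f(0) = 0 and f(1) = 1, and the names `sequence_terms`, `make_sequence_problem` and `C_SOLUTION` are hypothetical. It produces the problem statement, a C reference solution and the expected output, and seeding the random generator with the student id gives each student the same variant every time.

```python
import random
from string import Template

# Template for the C reference solution; $a, $b, $c and $n are filled in.
C_SOLUTION = Template("""\
#include <stdio.h>
int main(void) {
    long prev2 = 0, prev1 = 1;
    printf("%ld %ld", prev2, prev1);
    for (int i = 2; i < $n; i++) {
        long cur = $a * prev1 + $b * prev2 + $c;
        printf(" %ld", cur);
        prev2 = prev1;
        prev1 = cur;
    }
    printf("\\n");
    return 0;
}
""")


def sequence_terms(a, b, c, n_terms):
    """First n_terms of f(n) = a*f(n-1) + b*f(n-2) + c, f(0)=0, f(1)=1."""
    terms = [0, 1]
    while len(terms) < n_terms:
        terms.append(a * terms[-1] + b * terms[-2] + c)
    return terms[:n_terms]


def make_sequence_problem(student_id, n_terms=8):
    """Instantiate the template with coefficients derived from the student id."""
    rng = random.Random(student_id)  # same id -> same coefficients
    a, b, c = rng.randint(1, 5), rng.randint(1, 5), rng.randint(0, 9)
    statement = (
        f"Print the first {n_terms} terms of the sequence defined by "
        f"f(0) = 0, f(1) = 1, f(n) = {a}*f(n-1) + {b}*f(n-2) + {c}."
    )
    return {
        "coefficients": (a, b, c),
        "statement": statement,
        "solution_c": C_SOLUTION.substitute(a=a, b=b, c=c, n=n_terms),
        "expected": sequence_terms(a, b, c, n_terms),
    }
```

With a = 1, b = 1, c = 0 the template reduces to the ordinary Fibonacci sequence, which is the base problem the variations are derived from.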
Now suppose a student is unable to solve this problem. We want to create a slightly simpler problem. Earlier, if you remember, the triangle was not a right-angled triangle. Printing that triangle is slightly more difficult than printing a right-angled triangle, especially when the right angle is on the left. So this is a simpler version of the previous problem: you only have to worry about one condition in the loop, namely when to end. And if the student cannot solve even this, then he can just print a rectangle, which is the simplest of all. We had generated quite a few such problems, I think some 400 to 500. In fact, the more complicated ones are these, where you have two patterns side by side. These are even tougher problems. The simpler problems for these will be printing one pattern, then replacing the sequence by stars or a fixed character, and finally printing a rectangle. If someone cannot solve the rectangle problem, then he or she had better talk to a TA or the instructor. We cannot go below that. Again, for some reason this is not coming out properly, but it is just an illustration of the different variations of a given problem. You are printing a triangle with numbers. The easier problem is printing it with stars, and the easiest will be printing a rectangle of all stars. The tougher problem will be printing two triangles side by side. And again, the level of difficulty is decided by us. The tool cannot decide what is difficult and what is not. So we have given a difficulty index to each concept, and the tool combines these concepts to come up with a difficulty. This upgrade/downgrade, unfortunately, I cannot show in the web browser because the feature is disabled for now. The idea was that for every problem we would give this upgrade/downgrade button; if a student is unable to solve a problem, he can click the button to get a downgraded problem.
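One way to picture the difficulty index and the upgrade/downgrade button is the sketch below. It is an assumption-laden illustration: the concept names, the index values and the rule of simply adding up the indices are invented here, since the talk only says that instructors assign an index to each concept and the tool combines them.

```python
# Hypothetical difficulty index per concept (assigned by hand, as in the talk).
CONCEPT_DIFFICULTY = {
    "nested_loops": 1,
    "row_dependent_length": 2,
    "leading_spaces": 2,
    "numbers_in_cells": 1,
    "two_patterns_side_by_side": 3,
}

# Each pattern problem is described by the concepts it exercises.
PROBLEMS = {
    "star_rectangle": ["nested_loops"],
    "star_right_triangle": ["nested_loops", "row_dependent_length"],
    "star_centered_triangle": [
        "nested_loops", "row_dependent_length", "leading_spaces"],
    "number_centered_triangle": [
        "nested_loops", "row_dependent_length", "leading_spaces",
        "numbers_in_cells"],
    "two_number_triangles": [
        "nested_loops", "row_dependent_length", "leading_spaces",
        "numbers_in_cells", "two_patterns_side_by_side"],
}


def difficulty(problem):
    """Combine concept indices into one score (assumed rule: sum them)."""
    return sum(CONCEPT_DIFFICULTY[c] for c in PROBLEMS[problem])


def downgrade(problem):
    """Hardest problem that is strictly easier, or None at the bottom."""
    easier = [p for p in PROBLEMS if difficulty(p) < difficulty(problem)]
    return max(easier, key=difficulty) if easier else None


def upgrade(problem):
    """Easiest problem that is strictly harder, or None at the top."""
    harder = [p for p in PROBLEMS if difficulty(p) > difficulty(problem)]
    return min(harder, key=difficulty) if harder else None
```

Here `downgrade("star_rectangle")` returns `None`, which corresponds to the point where the student is sent to a TA or the instructor.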
So for example, the original problem is figuring out whether a number is a palindrome or not, or in particular it is about reversing a number. The simpler problem here will be reversing just the last two digits of the number, and even simpler will be just figuring out the last digit of the number. Using this, our idea was that we would be able to pinpoint the concept which a student is not comfortable with. Unfortunately this feature was also not used by students. Somehow students got the feeling that if they solved a simpler problem they would be penalized. So we decided to disable it.
As I said, our automated problem generator generates a large number of problems, but that is not useful if we cannot test the solutions, if we cannot figure out what the students' solutions are doing. So we have to have test cases for each problem, but manual test case generation does not scale. We use KLEE to generate test cases automatically. KLEE is a third-party tool developed at Stanford, and it is quite widely used in the software engineering community. One of the problems with KLEE is that you have to tune your program to get good test cases; otherwise it will generate the simplest possible test case. So any program which requires a string input will get all null characters as input.
To provide syntactic feedback, again, this is where we rewrite error messages. For each message here we can create our own feedback, and at the same time, just to get an idea of what kind of errors students are making, we could see for each assignment what the actual instances of the error were. These error messages are parameterized with respect to the actual variable that is being referred to. So you can put in placeholders, which get replaced by the actual variable name used by the student. And the system can produce certain analytics on student submissions. So for example, this is also a very important parameter when you are looking at plagiarism kind of things.
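The parameterized rewriting of compiler messages described above can be sketched as follows. The two patterns are modelled on the usual wording of GCC diagnostics and the feedback texts are invented for illustration; the actual rule set used in Prutor is not shown in the talk. The named groups play the role of the placeholders that get replaced by the student's own identifiers.

```python
import re

# (pattern over the compiler message, friendlier template with placeholders).
# Both straight and curly quotes are accepted, since compilers may emit either.
REWRITE_RULES = [
    (
        re.compile(r"[‘'](?P<var>\w+)[’'] undeclared"),
        "The variable '{var}' is used but never declared. "
        "Declare it before its first use, for example: int {var};",
    ),
    (
        re.compile(r"expected [‘'];[’'] before [‘'](?P<token>[^’']+)[’']"),
        "A semicolon is missing just before '{token}'.",
    ),
]


def rewrite(message):
    """Return beginner-friendly feedback for a compiler message,
    or the message unchanged if no rule matches."""
    for pattern, template in REWRITE_RULES:
        match = pattern.search(message)
        if match:
            # Fill the placeholders with the names the student actually used.
            return template.format(**match.groupdict())
    return message
```

Counting how often each rule fires per assignment would give the kind of "which errors are students making" overview mentioned in the talk.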
So this particular submission, as you can see, has a very nice uniform slope. Starting from time 0, this tells us how the size of the student's code progresses with time, because we are storing the code at regular intervals. We can see how the size progresses: for a certain time the code was not changing at all, and finally the code was changing again. This is a typical feature we found in all student submissions: they will have a slope, then the code will be stable for some time, and then at the end of the lab there is a large amount of activity. What this is: we have some 20% weightage for comments in the code. So students write all the code, and close to the deadline they start adding comments to it, which sort of confirms our hypothesis that students write comments at the end. No matter how much you tell them that they should comment the code while they are writing it, it always happens as a post-mortem. Then there are other things: it tells us how many auto-saves were made, how many manual saves, submissions, compilations, and so on. So, yes. And this is surprising: even though the editor actually adds indentation, it indents your code properly while you are writing it, students somehow manually override that indentation and produce really badly formatted code.
Another feature that was lacking in the system till now was support for a debugger. I want to tell students: if your program is not running, if it is not giving the correct solution, you should try to trace manually how your program is running and figure out the bug. But this is something which students do not understand. So in order to encourage students to trace their programs, we have developed a debugger to help them step through the program.
One of the challenges with a debugger is that you cannot depend on a debugger like gdb. We started with that, but a debugger session keeps the server occupied for as long as it lasts, because the session has to be maintained. If out of 400 even 20 students start debugging their code, the server will crash. So what we have done is this: we run the student's program on the server, collect some data, and send it back to the client, and the actual replay happens at the client. There is JavaScript which appears to run the program, but what it is really doing is showing you the data that was dumped by the server in a single execution. This cannot help in all possible debugging scenarios, but if a student is interested in seeing how his program is progressing, he can use it. So here is an example. Again, this is a very similar palindrome-finding program. If you can't read it, it's okay; the projection is not very sharp. This debugger shows you the execution of the program, and we have this concept of a breakpoint. There is no button to put a breakpoint; this particular command, _debug, will halt the debugger. The idea here is this blue line, a very light blue line, I don't know if you can see it or not. This is the line showing the current statement being executed. On the left-hand side you get the values of all the variables, so this is like a stack. And if you have put a breakpoint, you can single-step through the code. Because of the resolution I cannot show you everything, but it shows you how the program is progressing, and it also shows you the values of all the variables. And then there is the output. The nice thing is that since this debugger is dynamic, since we are doing things at runtime, it can detect certain runtime error conditions as well. If I submit this program with a value that is less than zero... I just wanted to show a very simple program.
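The record-once, replay-on-the-client idea can be illustrated with a small Python sketch. This is not how Prutor instruments C programs, which the talk does not detail; here Python's own `sys.settrace` hook stands in for the server-side instrumentation, and `record_trace` and `replay` are hypothetical names. The program runs exactly once, every executed line and a copy of the local variables are dumped, and stepping afterwards only reads that dump.

```python
import sys


def record_trace(func, *args):
    """Run func(*args) once and record, before each line executes,
    its line offset inside func and a snapshot of the local variables."""
    trace = []
    code = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            offset = frame.f_lineno - code.co_firstlineno
            trace.append((offset, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)
    return result, trace


def replay(trace):
    """'Client side': step through the dump without re-running anything."""
    for step, (offset, variables) in enumerate(trace):
        yield f"step {step}: line {offset} {variables}"


def reverse_number(n):
    rev = 0
    while n > 0:
        rev = rev * 10 + n % 10
        n //= 10
    return rev
```

Calling `record_trace(reverse_number, 123)` returns the result together with the dump, and `replay` can then be stepped forwards as slowly as the student likes with no further load on the server. As the talk notes, such a replay cannot change inputs or explore a different execution.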
So it can show you the errors that are happening in the program. It can detect out-of-bounds array accesses, it can detect null pointer accesses, it can detect division by zero. These are runtime errors that are very frequent in C programs, especially with new programmers. Since the simulator, the C debugger, is running the program, even if these values are generated at runtime it can detect them and give feedback to the student. Right now this debugger is also a standalone tool; it is not integrated into the system. And finally, once we had this database, we did not know exactly what to do with the data, so some students have come up with visualizers which can show you the data in different nice formats. So for example, if you are interested in looking at all the... these are just some fun tools. This one is not working; maybe I'll come back to this.
How much time do we have? Five minutes? Okay, so I will just give a very brief overview of the architecture. Prutor is built from existing tools. Except for the feedback tools, most of this is existing tools which we have taken and connected together. This is your host machine. We use a virtualization technology called Docker to run the different components in separate containers. There is a service discovery tool; we are using Consul. What Consul does is keep pinging the different services of Prutor to make sure they are alive; if they are not, it will spawn them. What are the main services? We have an engine, which is the compiler, where programs get compiled and run. We have a web application, the interface with the user. There are certain caching mechanisms to make things run faster. There are load balancers, so we can have multiple instances of the web application as well as the engine running.
These load balancers make sure that requests are distributed evenly across these systems. So again, virtualization is used so that we can scale this system easily, with all these web app and engine containers. I will just show you this diagram. Suppose you are running Prutor on one host and the number of students suddenly increases for some reason. For example, sometimes we have lab exams where there are double the number of students: in regular labs we have 100 students, in a lab exam there are 200. We can replicate the whole system easily because it is built on top of Docker. We just have to get a copy of the image and run it, and we have to connect these two hosts together through the proxy server, and it scales. So this is mainly for scaling. As for the database, the first time we had a distributed database, and that hit us very badly. In fact, to handle a lab exam we created more copies of the database, and every write was actually resulting in some 20 writes, and it brought down the whole system. This architecture diagram is slightly older; right now we have a single database container, so the database is right now a single point of failure for us. Yes?
Everything is stored in the database. Only when you are compiling and running, a copy is made on the engine container and it is compiled and run, but the results and the program, everything is stored in the database.
If it is on a single machine, okay. For example, using virtualization I could just copy all the images from IIT Kanpur to IIT Bombay and get the system running in two hours. So packaging is one thing, and similarly it helps with scaling. Secondly, the number of containers: we can keep increasing the web containers and engine containers based on the requirements. Yes, right.
As far as resource contention is concerned, that is an orthogonal concept. Within the engine cluster we have sandboxing, which means all student code runs within a protected environment. Prutor does not support file I/O or, similarly, some other system calls. One of the reasons, again... okay, I will put it in a slightly different way. Suppose the system is stable and working fine; then getting a dedicated host and running all the services there will work. But right now we keep changing things. Every day we have to change something in either the web container or the compiler container. To make sure this does not affect the whole system in one go, it is useful to have different Docker containers. So that is what the virtualization gives us, and the other thing was scalability, because we can create containers on demand. It is also modifiable: the advantage of having a plug-in based architecture is that you can disable or enable plug-ins, so that you can either go for a simple tutoring system or just have an IDE which can compile your program.
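The enable/disable idea behind the plug-in architecture can be sketched in a few lines. The class and plug-in names below are hypothetical and do not reflect Prutor's actual interfaces; the point is only that the host calls whichever plug-ins are currently switched on, so the same deployment can act as a bare compile-only IDE or as a tutoring system with feedback.

```python
class PluginHost:
    """Minimal registry: plug-ins are callables that take the source code."""

    def __init__(self):
        self._hooks = {}       # plug-in name -> callable
        self._enabled = set()  # names of plug-ins that are switched on

    def register(self, name, hook, enabled=True):
        self._hooks[name] = hook
        if enabled:
            self._enabled.add(name)

    def enable(self, name):
        if name not in self._hooks:
            raise KeyError(name)
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)

    def process(self, source):
        """Run every enabled plug-in on the submission, in registration order."""
        return {
            name: hook(source)
            for name, hook in self._hooks.items()
            if name in self._enabled
        }


# Example wiring with two made-up plug-ins.
host = PluginHost()
host.register("compile", lambda src: "compiled %d characters" % len(src))
host.register("feedback", lambda src: "hint: check your loop bounds")
```

Calling `host.disable("feedback")` leaves only the compile step active, which corresponds to the plain IDE configuration described above.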
So yes, this is a slightly old one. For the database we now have a single database; it is no longer a clustered database, but we maintain regular backups just to avoid any issue. I would like to conclude now; I have already taken 10 more minutes than I should have. This system started in July 2014. I still remember the first day when it was deployed. So it is less than 21 months old, and still a lot of experimentation is going on. We have developed it as a framework which allows plugging in different components, and we have experimented with different things. It has been used extensively with GCC and Clang, but other people have tried running Python and Prolog code and it worked. Unfortunately they did not use it for a class, for other reasons. For feedback, I have shown you semantic feedback and some compiler message rewriting, and we also use some ad hoc scripts to generate specific feedback for specific labs. For automated problem generation we have ad hoc programs which take templates as input and produce different kinds of programs of varying or similar difficulty, and we have used KLEE for automated test case generation. And finally, the system can do a lot of analytics on its data. We have not explored all the possibilities, and that is where future work comes into the picture. There is a huge amount of data waiting to be processed, and I am not a SQL expert, so I cannot use it very efficiently or effectively. There are many HCI issues. For example, I told you about the upgrade/downgrade feature, which looked so cool on paper and in our minds, but students rejected it. Someone suggested that the name upgrade/downgrade should be changed to something more positive. We have to think about those things; we have to do user surveys and get feedback from students, TAs and instructors. The feedback tools that we have developed again have limitations: their response time is not very good, there are false positives, and our
semantic feedback tool requires a large number of specifications to show the magic that I showed you. So we are trying to come up with solutions to that. By specifications I mean the correct solutions; we call them specifications. And finally, one of the things we want to explore in future is how we can use people out there to get more feedback, more test cases, peer review of student code, and so on. Maybe the students themselves can peer review the code of their colleagues. So thank you, and any questions? This is Prutor in action; these are photographs from some lab exams.
Yes? Yes, it is not live in that sense, but what you can do is, let me go to the... if a lab is going on, I can just click on a submission and I will see what each student is doing. I can keep looking at this and keep refreshing the page, and it will give me the current submission. Sorry, the photo will not go away. Okay, so this is one thing we did deliberately: all student submissions are anonymous. There is only one table which maps a student's name to his or her code. If we release this code, we always delete that table; we don't release it. So student data is totally anonymous. Yes, as long as he does not write his own name in the code.
A question. This system was clearly developed to solve a local problem at IIT Kanpur, to give a better experience to the students. At various places such solutions get developed in educational scenarios, and they do solve the local problem, but very few actually go out and get wider adoption. This looks like a very nice system. What do you think will actually drive different places to adopt it? I know that you have installed it at Kanpur, and at Bombay as well, but on a wider scale?
For a wider scale, as I said, what will help us is supporting multiple languages and a way to support local tools. Okay, so
recently there was a mail exchange where Professor Varsha said that nobody is going to stop developing something because someone else is developing a similar thing. To get wider adoption it is better to hook onto local systems. So for example, if I can get a locally developed system hooked onto this system, it will probably get adopted easily at IIT Bombay. Similarly, if some local team has developed their own feedback tool and that can be plugged into this, or if this system can be plugged into Piazza or Moodle, then it will get wider adoption. So we cannot have a tool which will not interact with other tools, and this is where the plugin architecture will probably help us. Okay, yes?
From the topics you have covered, a lot of work is happening individually in many places, for example on program generation. Is there any effort in the community to standardize the APIs? Because when it comes to assembling everything, each person has to reconfigure things to make them work. Compatibility is also a major issue: you use, let's say, Django and I use some PHP-based stuff, and when you write APIs it becomes a major mess.
I think improving education through software, or computer-aided education, is a very new field. Everyone is trying to come up with new and exciting solutions, and as of now I do not see any standardization effort. In fact many companies as well as academic institutes are all coming up with their own IDE-like systems.
One nice thing about some sites is that they give code fragments, which many systems do not. So here is scanf, here is how you use scanf, here is the so-called official man page for scanf, which is not complete, and then here is a simpler one. For the C and C++ API you can go to cplusplus.com or whatever it is, and Stack Overflow is newer, but somehow we can try to integrate that, because you know that you want
a substring, and you don't know whether it starts with zero, or whether it is length minus one or length less one or something like that, or how you search in reverse and forward. All these kinds of things come up. This, I think, is syntax help that you probably want to have.
I think, the way we are showing syntactic feedback to the student, in a similar way we can show some code snippet saying this is how scanf should be used. For example, in our uninitialized variable example we were showing how you should initialize the variable; that can be extended to give a code snippet as well. This is a one-Google-search problem, but I don't think I can do it unless I search for some time.
So our students, do they have an internet browser open when they are doing their labs?
In IIT Bombay there is a concept of ungraded lab and graded lab; during an ungraded lab they can. Similarly at IIT Kanpur: during a lab they can do it, but during a lab exam they are not allowed to look at the internet. During a regular lab they do it quite often. We actually encourage them; we ask them to open the slides of the class, bring some book, and look at Google. But during a lab exam they are not allowed to.
Just a small thing: the interface has changed since we came here, so this is how the new interface looks. You can add multiple languages. This was required at IIT Bombay because there are two languages being used to teach CS 101. Even though both are C++, there is a variation of C++ called Simplecpp, even though it is embedded inside C++. So now Prutor can support multiple languages. I call that Simplecpp "logo", and here you can add Python as well, as long as your language can fit into these few bullets: the mode, that is, is it compiled or interpreted; what will be the output format; what will be the extension of the source code; what is the binary; and so on. So
there are certain features of the language; if they fit into this framework, our system will be able to run it. Yes?
How did you identify the most common problems that needed to be translated to be more understandable? Was it just observation while teaching, or did you do a more systematic study?
For the semantic feedback tool, it is not about a particular problem, it is about the solution: given a problem, you write as many variations of the correct solution as you can. For the syntactic one, for rewriting, yes, we looked at the number of occurrences of failures. So it is manual, but now the effort is towards automating that.
If you had carte blanche in compiler design, how would you change a compiler, other than just having more comprehensive messages, to aid an effort like this? How do you make a compiler help people who are learning how to code?
The first thing is that compilers are tuned towards professional people. Most compilers want to produce the best possible code, which is not always possible, and in doing that they will transform your code in such a way that even the messages that are generated start becoming cryptic. Even with -O0, GCC will do certain optimizations. And secondly, the error messages are also tuned towards the professional programmer. As I said, printf and scanf are the primitive things which a student will learn in the first or second class, and the syntax of printf and scanf, especially scanf, is something which even seasoned programmers do not understand completely.
Right, but this looks like something else. You mean it is stuff of this nature that you would change?
Someone suggested it to us; last semester we moved from GCC to Clang because its error messages are much cleaner and much more useful. But even there, there are certain cases where we have to rewrite.
Yes, but I think it is time, so let's stop; it is seven minutes after 6:30. So I really enjoyed your
presentation. I am sure the audience also did, given that they want to keep going. So thank you very much.
Thank you. And you can come any day to OSL or NSL, around 8:30 to 9, to see Prutor in action. Is it 9? 8:30 to 9; after 8:30 they stay on for some more time. Yes.