I am head of R&D at Funtoot. Funtoot is a personalized, adaptive digital learning solution used by more than 100,000 students across India for grade 2 to 9 mathematics and science. Today I'm going to talk about how we at Funtoot use AI to disrupt education through adaptive, personalized learning.

Let me start with a situation. Can I have a quick show of hands: how many of you have been in a classroom wanting to ask a question, but were scared like hell and just wanted to duck behind a bench rather than ask it? Am I the only one? So, pretty much all of us. And why? Even my friends in the classroom were the same. The reason is perhaps that we felt our question might be stupid, or that we would waste the whole classroom's time, because there is just one teacher among 60 or 70 of us, especially in India, and maybe 30 to 40 in other countries. And the education system treats everyone as equal; there is no distinguishing between students. Can you imagine an elephant being asked to climb a tree?

So can we talk about a solution? Before we get to the solution, let's talk about the state of Indian education, especially in the government schools, the public schools. The National Achievement Survey is the largest survey conducted across the nation to assess the state of education in public schools, and it was done in 2017. It shows that as students grow older, as they progress from grade 2 to 8, their learning outcomes decrease significantly. And the 8th grade students could barely score 40% in science, maths, or social studies.
Also, about the infrastructure: the number of qualified teachers is far lower than what is actually needed. So what's the solution? Educational psychologist Benjamin Bloom did a study in 1984 comparing students under different tutoring scenarios. One of them is conventional tutoring: one teacher in a classroom of about 30 students, the teacher teaches for a fixed amount of time, there is an exam, and everyone proceeds ahead. The second is mastery learning: still one teacher per 30 students, but the teacher teaches, assesses, takes feedback, and continues tutoring until she figures out that all the students have learned the concept, and only then moves ahead. The third is again mastery learning, but with one teacher per student, that is, personalization combined with mastery learning.

And you can clearly see the results. The x-axis is the score and the y-axis is the frequency, the number of students. The average student in the one-on-one tutored group scored better than 98% of the students in the conventional classroom. Not just that: if you look at the bulge of the curve, the variation is much smaller. What does that mean? It means about 90% of the students in the one-on-one tutored group scored as high as the top 20% of students in the conventional scenario. What does this tell us? It tells us that when students do not perform, it is not only that they are limited by their abilities; the onus is on the teaching instruction, the teaching medium. All students are capable, all of them can reach mastery, provided a good enough teacher is available to them. So what is mastery learning, actually?
Mastery learning means getting rid of the fixed mindset that all students get the same amount of time and the same resources to learn a concept. In mastery learning, the only thing that is fixed is the learning requirement. Everybody has to reach the same learning level, which is the constant here, and they can take their own sweet time and their own resources: they might want to read or watch the video again, they might want to practice more. If you look at the figure on the right: in the usual learning setup, all three students spend the same time, but the amount of ability they gain from it is quite different. In the mastery learning scenario, each student spends the amount of time that suits them, and the learning they all get out of it is the maximum.

So how do you implement mastery learning? Mastery learning is nothing but a loop: the teacher gives instruction, asks the students to practice, and then assesses, to figure out whether she needs to instruct again, and this loop goes on until mastery is achieved and the students become capable. But is it possible to implement in a real-world scenario? Is it really feasible? Isn't it far too expensive for all of us to have a personal teacher attached to us, taking care of us in all aspects, understanding all our needs and catering to us? Clearly not feasible. Adaptive learning is what makes mastery learning scalable, owing to the advent of the internet and technology. So let's understand adaptive learning. Oh, sorry, one thing first: this is the paper published by Benjamin Bloom in 1984.
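The instruct, practice, assess loop just described can be sketched in a few lines of Python. This is a minimal illustration of the control flow only; the three callables are hypothetical stand-ins for whatever the real tutoring system does at each step:

```python
def mastery_loop(instruct, practice, assess, mastery_threshold=0.95):
    """Repeat instruct -> practice -> assess until mastery is reached.

    The three callables stand in for the real instruction, practice,
    and assessment steps; only the mastery-learning control flow is
    the point here.
    """
    rounds = 0
    while assess() < mastery_threshold:
        instruct()   # lecture / video / reading, as much as needed
        practice()   # worked problems, at the student's own pace
        rounds += 1
    return rounds    # number of teaching cycles this student needed
```

The key property is that `rounds` differs per student: time is the variable, the learning requirement is the constant.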
The paper's title is "The 2 Sigma Problem": the difference between the average student in the conventional group and in the one-on-one tutoring group is two sigma.

So what really is adaptive learning? Consider two students trying to learn something; say there are seven skills they want to learn. Each point in this graph represents some learning content: it might be a set of questions to practice, a video, or a PDF to study. How do they navigate this knowledge graph to achieve the best possible learning? An adaptive system needs to understand, at any point in time, how much the student knows, how the student learns, what her learning behavior is, and what her strengths and weaknesses are. Only if the system does this successfully can it design a personalized, adaptive path for the student.

So let's start figuring out how students learn, how we all learn. Learning is nothing but, as the saying goes, practice makes perfect. You keep practicing until you achieve the ability you are looking for. The initial phase might be a little tough, because you are seeing the skill and practicing it for the first time. In the middle phase you might enjoy it, because you are getting it. And finally, reaching the level of expertise might again take a while. But that was an idealized graph; in real life it might look like this, because there is also something called forgetting involved. It's not just neural networks that forget; humans forget too. So that is what you need to figure out: what the student's memory is like, and what cognitive processes the student undergoes while learning.
One of the key components of any intelligent tutoring system that does adaptive learning is the student model, which understands how students learn, what makes them learn, the pace of learning, and many other parameters. What does it do? It maintains the student's knowledge states and skills, strengths and weaknesses, keeps historical information about the student's past learning experiences, and records how the student prefers to learn in each medium.

Let's take some hypothetical data. These are eight questions a particular student practiced, in this order: green means the student got to the correct answer, red means she did not. What would you say by looking at just this data? Can we really understand how much the student knows? We define this problem as knowledge tracing: you have observed the first N learning opportunities, the problems the student attempted, where each attempt is marked as zero or one. Zero means the student did not get the correct answer, one means she did. Given that, can we predict what the student would do on the (N+1)-th question, the next question from the same skill? Mind you, all these N questions and the (N+1)-th question are from the same skill.

From this one can clearly see that the ones and zeros, the corrects and incorrects you observe, are just observations. There is something hidden, latent, which is what we call knowledge, and which you cannot see directly. A test, a question, a problem, an assessment is a mechanism to measure that hidden latent knowledge. Now assume each question is a unique opportunity to learn, as each question comes with a solution for when you are unable to solve it or want to understand it deeply.
When you attempt a question, certain cognitive processes get triggered as you work toward the correct answer, and when you get it wrong, there are feedback mechanisms: hints, multiple attempts, trying the exercise again and again. That makes you learn. So each question is an opportunity to learn, and when you jump from one question to the next, your knowledge state is evolving, which is what the white bubbles show. We need to track, or trace, those hidden knowledge states from question to question.

We'll start with a very classical model called Bayesian Knowledge Tracing (BKT), which is inspired by Bayes' theorem. The hidden knowledge state, also called the knowledge node here, can take two states. This is a very simple, very classic model of learning, where the knowledge state is binary: either the student has learned the skill, or she has not acquired it yet. The observation node, also called the question node, again has two states, one or zero: one if she gets the question correct, zero if she doesn't.

Let's look at the four basic parameters of this model. The first is P(L0), the initial probability, when you are just starting to learn the skill, even before seeing the first question: what is the probability that you already know the skill? This is essentially also a representation of the difficulty of the skill. If P(L0) is very high, most likely anybody seeing the first question of the skill would get it correct, which means the skill is much easier than one where P(L0) is much lower. The next parameter is P(T), the probability of learning.
P(T) governs the transition from the unlearned state to the learned state: how likely is it that one practice opportunity, one question, makes you transition from unlearned to learned? Take a general-knowledge question, like "What is the population of Bangalore?" or "How many candidates stood for election in Bangalore in the 2019 Lok Sabha election?" If you do not know, you just need one question to learn it, so the P(T) for that skill would be very high: once you know it, you will probably not make a mistake when asked again. Compared to that, a skill in mathematics or statistics would take time, and you would need more practice to reach mastery.

The other two parameters, P(G), the probability of guessing, and P(S), the probability of slipping, are the parameters that drive performance. Even if you have mastered a skill, there are chances you still make a careless mistake, a slip. And conversely, even if you do not know the skill, you can still guess and get it correct.

So this is nothing but a two-state hidden Markov model, with two hidden states and two possible emissions. As I was saying, the hidden states are unlearned and learned. P(L0) defines the initial probability when you start learning the skill, which also reflects its difficulty, and there is a transition from the unlearned state to the learned state with probability P(T). Now, if you look closely, there is no transition back: this model assumes that once you have reached the learned state, you cannot go back. There is no forgetting involved; it's a very simple model. And while you are in the unlearned state, you can still get a question correct with probability P(G), because you might have guessed. And vice versa.
If you are in the learned state, you can still get a question incorrect with probability P(S), because you might have slipped. Whenever the student gets a practice opportunity, a question for this skill, the learning probabilities are updated. I'm not going to show too many Bayesian equations here, but just to give a quick idea, let's say we want the probability of correctness from the probability of knowledge. When can you get the question correct? Either you have learned the skill and did not slip, did not make a careless mistake, which gives the first term, P(learned) times P(did not slip); or you have not learned it but still guessed it correctly.

Let's take an example. The four parameters are shown at the bottom: P(L0) is 0.5, which means a medium-difficulty skill; P(T), the probability of transition or learning, is 0.2; P(G), the probability of guessing, is 0.14; and P(S), the probability of slipping, is 0.09. Take the sequence: a zero, then a streak of nine ones. The first question was answered incorrectly, and then nine questions in a row were answered correctly. As you can see, by the fourth question the student already reached a 95% probability, so the student had mastered the skill. Why give the remaining six questions? Why does she need to practice more?

Now compare another set of parameters where P(G) is significantly higher, with the probability of guessing going up to 0.64. With the same sequence, a zero and then nine ones, getting the correct answer no longer means you really know the skill, because the probability of guessing is much higher. The system needs more evidence before the student can be declared a master of this skill. That is why the learning curve rises much more slowly: the student would need about eight practice opportunities to master the skill, because of the very high probability of guessing.
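The update just described can be written as a short sketch. This is a minimal illustration of classic BKT, not Funtoot's implementation; the parameter values follow the example above:

```python
def bkt_update(p_learned, correct, p_transit, p_guess, p_slip):
    """One Bayesian Knowledge Tracing step for a single skill.

    First condition on the observation (Bayes' rule), then apply the
    learning transition P(T). There is no forgetting in classic BKT.
    """
    if correct:
        # P(L | correct): learned-and-did-not-slip vs unlearned-but-guessed
        num = p_learned * (1 - p_slip)
        den = num + (1 - p_learned) * p_guess
    else:
        # P(L | incorrect): learned-but-slipped vs unlearned-and-did-not-guess
        num = p_learned * p_slip
        den = num + (1 - p_learned) * (1 - p_guess)
    posterior = num / den
    # transition: an unlearned student may learn from this opportunity
    return posterior + (1 - posterior) * p_transit

def p_correct(p_learned, p_guess, p_slip):
    """P(correct) = learned-and-no-slip + unlearned-but-guessed."""
    return p_learned * (1 - p_slip) + (1 - p_learned) * p_guess

# the sequence from the talk: one incorrect answer, then a streak of corrects
p = 0.5                                  # P(L0), a medium-difficulty skill
for obs in [0, 1, 1, 1]:
    p = bkt_update(p, obs, p_transit=0.2, p_guess=0.14, p_slip=0.09)
# after a few correct answers the mastery estimate crosses 0.95
```

With a much higher `p_guess` (say 0.64), the same sequence raises `p` far more slowly, exactly as in the second parameter set above.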
So how do you train these models? Say three students have done questions of three skills, addition, subtraction, multiplication, in the order shown in the upper table. BKT is a skill-specific model, so the data for each skill has to be separated out. All the addition attempts are taken in chronological order, and from them we train the four parameters of one BKT model. The same goes for subtraction and multiplication. A lot of information is lost this way, but we'll come to that later.

Now, how do these models generalize, and how are we supposed to evaluate them? As we said, in an adaptive learning system, whenever a new student comes to our system, we need to generalize, to be able to help her. And we can only verify that if we do cross-validation at the student level. If you have the sequences of 1,000 students, you should train on 700 students and validate or test on the remaining 300, rather than breaking up each sequence as in other time-series setups, where you would take the first half of a sequence for training and the rest for validation.

Now take a division question like this one. Do you think BKT can handle it? Probably not, because it involves two skills: not only an understanding of division, but also subtraction. And as we said, a BKT model covers only one skill, so each question must involve only a single, very specific skill. When a question requires more than one skill, as in this case, you need a more sophisticated model. One such model is Performance Factor Analysis (PFA). Let's look at its basic formulation: the probability is a function for student i and a particular item, or question.
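That student-level split can be made concrete with a small sketch: hold out whole students, not the tails of their sequences. This is a minimal illustration; the 70/30 ratio follows the example above:

```python
import random

def student_level_split(sequences_by_student, train_frac=0.7, seed=0):
    """Hold out whole students, not parts of their sequences.

    `sequences_by_student` maps student_id -> that student's full
    response sequence. A deployed tutor must generalize to *new*
    students, so every sequence of a held-out student goes entirely
    into the test set.
    """
    ids = sorted(sequences_by_student)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    train = {s: sequences_by_student[s] for s in ids[:cut]}
    test = {s: sequences_by_student[s] for s in ids[cut:]}
    return train, test

# e.g. 1000 students -> 700 for training, 300 for validation
```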
Say this practice opportunity involves J skills. For each skill there is a beta parameter, which estimates the difficulty of that skill. Then there are counts of successes: say for skill j you have answered three questions correctly so far, so you have three successes. There is a success learning rate, a coefficient on the number of prior successes, and likewise a failure learning rate, a coefficient on the number of incorrect attempts so far. These parameters together estimate the probability of getting the next question of any of the J skills correct. Note that, unlike BKT, there is no representation of the latent knowledge here; we only model the final outcome, the observation. This technique does not measure latent knowledge. The resulting quantity m is then translated to the 0-to-1 range to get a probability, via a logistic (sigmoid) function.

Let's take an example. Gamma, the success learning rate, is 0.2, which is double the failure learning rate of 0.1. What does that mean? Every failure gets a weight of 0.1, which means that even if you fail, your future probability of getting a question correct still increases, which is true in many scenarios, right? As we say, it is by failing that we learn. Each failure gets a coefficient of 0.1, but each success gets double that. So starting with a probability of 0.38, an incorrect attempt still increases it, to 0.40; another incorrect attempt increases it to 0.43; but a correct attempt increases it by almost double that amount, to 0.48. Now let's take a case where the failure learning rate is negative, below zero.
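The PFA computation above can be sketched as follows. The beta value of -0.49 is an assumed skill difficulty, chosen only so that the numbers reproduce the 0.38, 0.40, 0.43, 0.48 walk-through:

```python
import math

def pfa_p_correct(skills):
    """Performance Factor Analysis for one item.

    `skills` is a list of (beta, gamma, rho, n_success, n_failure)
    tuples, one per skill the item requires:
        m = sum_j (beta_j + gamma_j * s_j + rho_j * f_j)
        p = 1 / (1 + exp(-m))        # logistic link to [0, 1]
    """
    m = sum(beta + gamma * s + rho * f
            for beta, gamma, rho, s, f in skills)
    return 1.0 / (1.0 + math.exp(-m))

# single-skill example from the talk: gamma = 0.2 (success rate) is
# double rho = 0.1 (failure rate); beta = -0.49 is an assumed difficulty
history = [(0, 0), (1, 0), (2, 0), (2, 1)]   # (failures, successes) so far
probs = [round(pfa_p_correct([(-0.49, 0.2, 0.1, s, f)]), 2)
         for f, s in history]
# probs -> [0.38, 0.40, 0.43, 0.48]: even failures nudge the estimate up,
# and one success moves it almost twice as much as one failure
```

Setting rho below zero in the same sketch makes each failure *lower* the probability, which is the case discussed next.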
In the same example, with the failure learning rate now at -0.01, you can see the probabilities actually decrease. Does that mean there is negative learning? Not really. Values of rho below zero mean that a failure provides more evidence of a lack of knowledge than the improvement the learning opportunity causes. It's a subtle point: it looks like negative learning, but it is not.

So these were two techniques that start from a learning-theory assumption, that try to model learning the way the scientists who designed them thought it works. What if we do not want any assumption, any theory beforehand? The assumption then would be that learning is perhaps a very complex process that we cannot capture in a handful of parameters. So let's try a very powerful, flexible, general approach: something like the joker in a deck of cards, which can fit anywhere. In our case that is a deep learning model, a deep knowledge tracing (DKT) model. As we clearly saw, this is a time-series problem, so we can capture it with standard RNNs or LSTMs.

I'm sure most of you know what RNNs and LSTMs are, but just to give a quick overview: the hidden neurons in the hidden layers of an RNN or LSTM network represent the memory, the temporal structure of the data. They memorize whatever is important, and that can be used in subsequent time steps. At time step two, the input is not just the input coming from the data, but also the hidden state coming from the previous time step.

So what would a deep knowledge tracing model take as input and output? How would it look, and how would we train it? Let's say there are three skills involved, A, B and C, and the questions are tagged to them.
The first question, of skill A, the student gets incorrect. Each input is nothing but a question together with the response to it, and it can be encoded as a vector of 2N entries, where N is the number of skills, three here. So the input vector has size six: the first N entries are a one-hot encoding of which skill the question is tagged to, and the remaining N entries indicate whether that skill's question was answered correctly. And in the output you have N values, the probabilities of correctness for each skill; since we have three skills here, we have three probabilities, representing the probability of correctness for skills A, B and C.

How do we train these models? Like our BKT model? Not really. For Bayesian knowledge tracing we had to separate the data out per skill. For DKT, you don't need to separate it out, because a DKT model can handle different skills within the same sequence. Even though the questions come from three different skills, one model handles them all, because we can encode the skill in the input vector.

So how does DKT work, then? The hidden neurons in the hidden layer are effectively a compact representation of the skills, and the connections between the hidden neurons reflect the connections and relationships between the skills themselves. What the deep knowledge tracing model is really capturing are the similarities, dependencies, and prerequisite relations between skills, which is exactly what a knowledge graph would capture. Now let's look at these three techniques: Bayesian knowledge tracing, performance factor analysis, and deep knowledge tracing.
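The input encoding just described might look like this minimal sketch. It follows the scheme as stated in the talk (skill one-hot, then per-skill correctness); the original DKT paper's 2N one-hot encoding differs slightly in detail:

```python
def encode_interaction(skill_idx, correct, n_skills):
    """Encode one (question, response) pair as a 2N vector for DKT.

    The first N entries one-hot the skill the question is tagged to;
    the next N entries mark whether that skill's question was
    answered correctly.
    """
    x = [0] * (2 * n_skills)
    x[skill_idx] = 1                       # which skill was practiced
    if correct:
        x[n_skills + skill_idx] = 1        # ...and whether it was correct
    return x

# skill A (index 0) answered incorrectly, with N = 3 skills (A, B, C)
encode_interaction(0, correct=False, n_skills=3)   # -> [1, 0, 0, 0, 0, 0]
# skill B (index 1) answered correctly
encode_interaction(1, correct=True, n_skills=3)    # -> [0, 1, 0, 0, 1, 0]
```

A sequence of such vectors is what the RNN/LSTM consumes, one per time step, while the output layer emits N probabilities of correctness.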
We have assumed one thing: that for every practice opportunity or content piece, the skills or concepts needed to answer it correctly have already been tagged by human experts before we train on the data. What if we do not have that? What if we just have a big item bank of questions, with no good knowledge graph mapping questions to skills? If we encode the questions themselves directly in the deep knowledge tracing model, rather than the skills attached to them, we can actually recover the relationships between every pair of questions.

For example, the influence between a pair of questions i and j can be computed from the probability of getting question j correct, given that question i was answered correctly on the first attempt. And if I want the impact of the correctness of question i on j, I divide that by the influence of all questions on j. If I plot a directed graph out of this, I get a knowledge graph built entirely from data. Generally, building a good knowledge graph takes domain experts and cognitive scientists years of effort; here, with just a lot of data, we could get one.

The upper part shows synthetic data, generated programmatically assuming five hidden concepts. That concept mapping is hidden from the algorithm during training, yet the algorithm was able to recover the five hidden concepts individually: there is no overlap or cross-connection between the five skills.
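That influence-and-normalization step can be sketched with a toy matrix of conditional probabilities. The numbers here are made up; in practice `cond[i][j]`, the probability of getting j correct given i was answered correctly on the first attempt, would come from the trained question-level DKT model:

```python
def influence_graph(cond, threshold=0.3):
    """Build directed edges i -> j from pairwise conditional probabilities.

    cond[i][j] ~ P(answer j correct | answered i correct first attempt).
    The impact of i on j is cond[i][j] normalized by the influence of
    *all* questions on j; edges below `threshold` are dropped.
    """
    n = len(cond)
    edges = []
    for j in range(n):
        total = sum(cond[k][j] for k in range(n) if k != j)
        for i in range(n):
            if i == j or total == 0:
                continue
            impact = cond[i][j] / total
            if impact >= threshold:
                edges.append((i, j, round(impact, 2)))
    return edges
```

Plotting the surviving edges as a directed graph gives the data-driven knowledge graph described above.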
This one is a real dataset from Khan Academy, where you can see the algorithm is still able to capture the relationships between skills and even identify distinct clusters of related skills.

Now let's talk about how these models fare against each other. Are they comparable at all? We trained these models on data from around 36,000 students, involving around 12k problems, which constitutes more than 30 million data points, and we compared them using the metric area under the curve, or AUC. You can see that DKT comes out on top, but PFA, performance factor analysis, is only marginally behind it. That is a point to note. Of course, DKT is not performing spectacularly here: its AUC is 0.65, although with recent extensions to DKT, such as DKT+FSA, you can get somewhere close to 0.83 or 0.85. So the real question is: is deep learning really that deep here, the way we have clearly seen it be in domains like computer vision and image recognition?

What we need to do is go through the advantages and disadvantages of each technique. DKT is probably overkill, because we could not get a significant performance gain out of it; it was only marginally better. It also required a lot of compute time to train, and the number of parameters runs into the hundreds of thousands, which are really uninterpretable compared to the earlier, simpler models we saw, Bayesian knowledge tracing and PFA, whose parameters are quite explainable. Teachers, educators, and policy makers can actually look at those parameters, and classroom teachers can design their subsequent classes using them. But DKT has a real advantage too, which we just saw: once you have a lot of data, a knowledge graph can be built out of it, which is called discovery of structure.
The other advantage is that any learning system has a lot of other behavioral features: click streams, the time spent on each attempt of a question, human-labeled inputs, how the students perform in the classroom, their actual school exam results. You can encode all of them as vector inputs to a deep knowledge tracing model, because it has no restrictions: anything you can encode in a vector can be fed in.

The key takeaway is this: whenever you are trying to apply machine learning to a domain, and the domain here is knowledge tracing, you need to figure out whether that domain requires the depth that deep learning provides. Knowledge tracing is probably a shallow domain; it does not require the depth of deep learning.

So, using different student modeling techniques, we have figured out how students learn; we have some understanding of how students acquire knowledge through practice. We have a student model, knowledge estimates, probabilities of correctness for different skills. What next? How should we use this to make decisions and design a personalized, adaptive learning path? We can use a very simple technique. Let's take the same running example of deep knowledge tracing, where we have the probability of correctness for each skill. At the current state, we want to identify what the next problem should be, and the options are that it can come from one of these five skills: fractions, number system, decimals, ratio and proportion, or algebra. By the way, these are fourth-grade CBSE curriculum topics.
What we can do is ask the model: what if I gave a question from fractions? Then there are two branches. In the first branch the student answers correctly, and the model tells us that, at the current state, the probability of answering correctly is 0.63. The model also tells us what the probabilities of all five skills would be after the student answered one question from that skill correctly, which we capture as the average predicted accuracy, P-average. We simulate the other branch too: what if the student answered incorrectly? There, the average predicted accuracy at the next step would be 0.39.

How do we consolidate this information? We take an expectation: one probability times its value, plus the other probability times its value. Whichever skill gives the maximum expected average predicted accuracy for the next step, we choose that skill and generate the next question, the next practice opportunity, from it.

This is like looking one step into the future. When you play chess or other board games, you predict your opponent's moves and design your own move accordingly, because you also know the other player's capabilities. Here there is no adversarial player against us, just a student trying to learn. You can apply the same technique at a depth of one, going one step into the future, or at a depth of two. Just note that the deeper you go, the number of nodes in the tree explodes exponentially. So what is a tutor model?
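The one-step lookahead can be sketched as follows, assuming a hypothetical `model.predict(history)` that returns the per-skill probabilities of correctness, as a DKT output layer would:

```python
from statistics import mean

def pick_next_skill(model, history, skills):
    """One-step expectimax over candidate skills.

    For each candidate skill, simulate both branches (student answers
    correctly / incorrectly), ask the model for the average predicted
    accuracy across all skills in each branch, and take the expectation:
        E = p * avg(correct branch) + (1 - p) * avg(incorrect branch)
    """
    best_skill, best_value = None, -1.0
    probs = model.predict(history)       # P(correct) per skill, right now
    for s in skills:
        p = probs[s]
        avg_if_right = mean(model.predict(history + [(s, 1)]).values())
        avg_if_wrong = mean(model.predict(history + [(s, 0)]).values())
        value = p * avg_if_right + (1 - p) * avg_if_wrong
        if value > best_value:
            best_skill, best_value = s, value
    return best_skill
```

Going two steps deep means simulating both branches again from each simulated state, which is where the exponential blow-up mentioned above comes from.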
The algorithm we just described belongs to a tutor model. A tutor model takes as input the student's learning process, from the student model, and designs a strategy based on the student's learning goals, the teaching plans, whatever the teacher has been covering in the classroom, and the student's needs, strengths, and weaknesses. It designs the subsequent learning path; that is the job of a tutor model, and we just saw one example algorithm. It also diagnoses misconceptions, understands students' learning needs, and, if needed, incorporates remedial strategies.

We can also look at very simple strategies, apart from these reinforcement learning or Markov decision process strategies. Have you ever wondered why your school timetable had 40-minute periods of science, maths, and other subjects? What if it had been done differently: complete mathematics in the first two months, science in the next two months, and so on? Would that be a good strategy? That strategy is called blocking, where you take one skill, pursue it to the end, master it completely, and only then move on to the next skill. In the example here, you keep practicing multiplication until you master it. The other way is interleaving, where you keep mixing the practice opportunities, the problems, of different skills, in this case multiplication, division, and addition, and you keep practicing until you master all of them together.

So how would we use our student models to implement these strategies? If we are using the blocking strategy, we keep blocking a skill until we see it is mastered. Remember the learning curve we saw: after four opportunities the probability reached 0.95, so we no longer need to keep blocking that skill; we can move on to the next skill and block that instead.
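The two strategies can be sketched on top of the mastery estimates we already track, for example the BKT P(L) per skill. A minimal illustration; the 0.95 threshold follows the example above:

```python
def next_skill_blocking(mastery, skill_order, threshold=0.95):
    """Blocking: stay on the current skill until its mastery estimate
    crosses the threshold, then move to the next one."""
    for skill in skill_order:
        if mastery[skill] < threshold:
            return skill
    return None   # everything mastered

def next_skill_interleaving(mastery, skill_order, step, threshold=0.95):
    """Interleaving: cycle through the not-yet-mastered skills in turn."""
    pending = [s for s in skill_order if mastery[s] < threshold]
    if not pending:
        return None
    return pending[step % len(pending)]
```

The only difference is the order in which not-yet-mastered skills are served; the mastery check itself comes from the student model in both cases.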
And if we are employing interleaving, we take the skills together and keep generating mixed practice until we identify that the student has mastered all of them. A full discussion would take a while, but as a quick summary: there are conflicting skills and there are similar skills. For example, learning multiplication of integers, 2 × 5, versus multiplication of decimals, 0.2 × 0.5. If you block and learn integer multiplication first, those rules get reinforced very deeply in your brain; when you then move to multiplication of decimals, you start applying the same rules, but decimal multiplication works a little differently from integer multiplication, and you falter. In such scenarios, interleaving works much better. In some scenarios blocking also does well, or the choice does not matter much. But in general, research says interleaving is better, which is why, if you look at curriculum or timetable design, you see interleaving spread across it. This brings us to the complete picture of the intelligent tutoring system architecture. Let's start from the user. The user interacts with the system through a user interface, which is the interface model. The user gives her response to the system, and the system sends that response, along with the question, practice item, or video the student is currently on, to the student model.
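The interleaving counterpart of the blocking sketch looks like this: rotate through the still-unmastered skills, one practice opportunity each, until every skill crosses the threshold. The dynamics and threshold are again illustrative assumptions, not FunToot's actual model.

```python
# Minimal interleaving sketch under the same toy learning dynamics:
# mix practice across all unmastered skills instead of finishing one
# skill before starting the next.

MASTERY_THRESHOLD = 0.95

def practice_once(p_mastery):
    # Toy dynamics: each practice opportunity closes 40% of the gap to 1.0.
    return p_mastery + (1 - p_mastery) * 0.4

def interleaving_schedule(mastery):
    history = []
    while any(p < MASTERY_THRESHOLD for p in mastery.values()):
        for skill in list(mastery):
            if mastery[skill] < MASTERY_THRESHOLD:
                mastery[skill] = practice_once(mastery[skill])
                history.append(skill)
    return history

schedule = interleaving_schedule(
    {"multiplication": 0.3, "division": 0.3, "addition": 0.3})
print(schedule)
```

Unlike blocking, the resulting schedule alternates between skills from the very first questions, which is what gives conflicting skills like integer and decimal multiplication a chance to be contrasted against each other.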
The student model then estimates, based on the student's historical data, the current knowledge state and profile of the student, including her strengths and weaknesses. That data, along with the domain model data (the domain model is the mapping between skills and questions, and the relationships between skills), is processed by the tutor model, which decides how to chalk out the subsequent learning path and adapt according to the student model's inputs. That, again, translates to picking a specific exercise, a specific video, or a specific piece of learning content, which is then passed on to the interface model, which figures out how to render that content and interact with the student. So that was a little bit about the complete system, which is known as an intelligent tutoring system. We at FunToot have been building an intelligent tutoring system called FunToot, our primary product. We believe that every child is unique, and we follow one-on-one personalized tutoring strategies as well as adaptive learning strategies, where the core philosophy of the system is learning by answering problems, by practising again and again. This is what our interface model looks like: a question, a practice opportunity, is given to the student like this, and whenever the student makes a mistake, feedback is given. For example, the student here chose SSS, the side-side-side rule of congruence, but given the question context the SAS rule applies, because two sides and the included angle are equal in the adjacent triangle. So the system hints to the student that the side-side-side rule probably will not work here, and that she probably needs the rule where two sides and one angle are equal.
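The loop just described (interface model, student model, domain model, tutor model, and back to the interface) can be sketched as a skeleton. All class and method names here are illustrative assumptions, not FunToot's actual API, and the tutor policy is deliberately simplified to "serve the weakest skill".

```python
# Skeleton of one pass through the intelligent-tutoring-system loop:
# interface -> student model -> (with domain model) tutor model -> interface.

class StudentModel:
    def __init__(self):
        self.mastery = {"fractions": 0.4, "decimals": 0.6}
    def observe(self, skill, correct):
        # Update the knowledge estimate from the student's response.
        delta = 0.2 if correct else 0.05
        self.mastery[skill] = min(1.0, self.mastery[skill] + delta)

class DomainModel:
    # Mapping between skills and their practice items.
    items = {"fractions": ["frac-q1", "frac-q2"], "decimals": ["dec-q1"]}

class TutorModel:
    def next_item(self, student, domain):
        # Simplified policy for illustration: serve the weakest skill.
        skill = min(student.mastery, key=student.mastery.get)
        return skill, domain.items[skill][0]

class InterfaceModel:
    def render(self, item):
        return f"rendering {item}"

student, domain, tutor, ui = StudentModel(), DomainModel(), TutorModel(), InterfaceModel()
student.observe("fractions", correct=False)     # response flows in via the UI
skill, item = tutor.next_item(student, domain)  # tutor chalks out the next step
print(ui.render(item))
```

In a real system the tutor policy would be one of the strategies discussed earlier (the one-step lookahead, blocking, or interleaving) rather than this single-line heuristic.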
As for the journey so far: we have already helped more than 150,000 students, who have attempted more than 80 million questions and spent more than 5 million hours on the platform, and I can proudly say this would be the largest learning dataset for the Indian context. At the beginning we looked at the current state of education from the National Achievement Survey data, where we saw that the results of government-school students are very poor; even an eighth-grade student, as we saw, can barely score 40% in maths, science, and social studies. Here is the output of a study we have been running with governments and government schools. This one is about the concept of irrational numbers for grade nine. Each point in this graph represents a student. The horizontal axis is the number of questions the student was given for that skill, and the vertical axis is the accuracy, the mastery, the student developed. You can clearly see a significant negative correlation between the number of questions and the student's mastery: students who were already efficient were given fewer questions, which means we probably saved hundreds of thousands of student hours. The lowest number of questions a student received is around 10, and the highest is around 40. This approach also helps students engage more, because when you already know something and just keep doing it robotically, it disengages you, puts you off, and leads to boredom. And mind you, boredom and frustration are not just casually used words here; they are significant cognitive constructs that can put you off while you are learning. So this helps engage the students in a much better way. Another example is also about the concept of irrational numbers in grade nine.
This one has an even larger negative correlation between the number of questions and the accuracy the student achieved. Now, talking about marks, the learning outcomes or learning achievements: this is grade six, estimating and comparing numbers, where you compare numbers or order them in ascending or descending order, and so on. Taking the results of all the students together, they were elevated from 42 marks to 99 marks, which is around a 135 percent jump, significant for a public or government school. For the below-average students, who really need an intervention, the elevation is from 17 marks to around 98 marks, around a 500 percent jump, which is huge. Another example is congruence of triangles, the skill whose question we saw in the interface model. The results are similar: 46 marks to 88 marks, around 84 percent, and for the below-average students, who start out really weak, around a 460 percent jump. With this, I am concluding. I am happy to take questions. Hi, this is Akhilesh from Agar al-Adz. Great talk, by the way. One question that's probably not directly related. Your feedback mechanism for the models you trained here was, if I'm right, simply based on the questions they've answered, like a marks-based system. Do you have any other sort of feedback, one? And two, have you thought of taking into account cognitive approaches apart from statistical ones? For instance, the way you spoke about cognition, the way humans learn, and so on. Do you incorporate elements of that too? Yes. As I was saying in the earlier slides, behaviour and observations are a way to measure how much the student knows; the knowledge itself is latent. A question is just one mechanism to understand how much the student knows.
You can also understand the student's learning from her behaviour: how much time she spends on a question, how fast she gets to the right answer, how many attempts she takes, whether she looks at the hints provided, whether she looks at the recommended solutions, whether she goes through the solution step by step or just skips to the final step because somebody in the neighbourhood told her the right answer, and things like that. All of this is taken into account; in the tutor interface we saw, the right-hand side is a playlist. To answer the other question, about cognition and cognitive processes, we also use Bloom's taxonomy. Bloom's taxonomy is a way to classify educational objectives; it divides cognition into six levels. The first is remember, where you just need to recall something. The next, slightly more complex cognitive level is understand, which comes after you are able to remember something; for example, I may be able to remember the Pythagorean theorem without being able to understand it, so the next step is making the student understand it. The third is apply, where you actually apply whatever you have learned and understood. The fourth, fifth, and sixth are analyze, evaluate, and create; at the last level, the sixth cognitive level, you have learned and figured out so much about the domain that you can actually contribute and create some knowledge on top of it, a paper, some research, and what not. So yes, we do use cognitive mechanisms as well. Anybody else? I think somebody here.
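The six levels form a strict ordering, which is convenient to encode directly. A minimal sketch, where the `next_level` helper is hypothetical (one way a system might step a student up the taxonomy, not part of any standard library):

```python
# The six cognitive levels of Bloom's taxonomy as an ordered type, so
# content can be tagged with a level and compared or stepped upward.

from enum import IntEnum

class BloomLevel(IntEnum):
    REMEMBER = 1
    UNDERSTAND = 2
    APPLY = 3
    ANALYZE = 4
    EVALUATE = 5
    CREATE = 6

def next_level(level):
    # Hypothetical helper: move one cognitive level up, capped at CREATE.
    return BloomLevel(min(level + 1, BloomLevel.CREATE))
```

Tagging each question with a `BloomLevel` lets the tutor model check, for instance, that a student who can only remember the Pythagorean theorem is served understand-level items before apply-level ones.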
So I actually wanted to know, because you talked about the Indian scenario and this being the largest dataset for the Indian scenario: I see that it is mostly in English. Have you thought about going to local languages? In most government schools, the kids will not be able to comprehend English very well. That's one of the things we have done. By the way, the last few slides I talked about for the impact are not in English; they are in one of the local languages of an Indian state. Are you using AI or machine learning techniques there as well, to figure out how to get the right kind of information from what the child is saying? No, right now we are not really into the computer vision, speech, or translation parts, but we have started working on them. The last few slides I talked about are actually about Telugu content and Telugu students. The dataset we saw earlier, the numbers around 30 million data points, is mostly English, but we also have datasets in Hindi, in Telugu, and in a couple of other languages. You mentioned the probability of initial knowledge, the probability of slip, and so on; I was wondering, instead of giving a final score to a student, does it make sense to give them these, so that the teachers can work with the student in a better way? Yes. Given the scope of this talk we did not even get to what the student dashboard or the teacher dashboard looks like, so let me take a moment to explain how the complete system works. It is not just an offline or internet-based system where the student can log in from home. The way the integration works is that, just as you have a mathematics period or a science period in your school timetable, a FunToot period is assigned every week.
So every week, for two periods, the students come to the FunToot lab, a computer lab where the FunToot machines are. After the teacher delivers the lecture in the classroom, the students come to the lab and practise, and those reports are shared with the teachers. There is a teacher dashboard where we show all of this: how the students are doing, whether they are guessing or really getting it right, and how much. Sometimes it is not just about cognition but metacognition: even though I know something, I think that I don't know much, so I don't feel confident about it. So we help not only with improving scores but with improving confidence. Indian students generally face a lot of maths phobia; they are just scared to come and learn maths. So in terms of challenging their own beliefs, their own belief systems, showing that they too can learn, we have been helping there as well: maths phobia has been going down, and students are actually engaged and motivated to interact with the system. One last question, and after that we will take the rest of the questions offline. This is fascinating. Sir, are you doing machine learning? Oh yes, we are already doing that. No, GRE and GMAT are CAT, computerized adaptive testing; this is adaptive learning. Let me quickly jump to this slide. When you answer a sequence of questions in a test format, your learning is not expected to change from question to question; that is why you can be assessed in a span of three hours.
If your learning were changing between the first question you answer in your three-hour test and the last one, that technique would not work. In our system, each question also comes with feedback, so each question is an opportunity for the student to learn something and rectify her misconceptions. These are all adaptive learning methods, where we actually trace the change of knowledge from question to question, from opportunity to opportunity. In the GRE and GMAT, the learning state is assumed to be constant while the student answers all the questions, so a different set of techniques is used there. By the way, we also have an adaptive testing and general assessment product, for when teachers just want to give students an assessment to understand where they lack and what strengths they have; but this talk was about the practice, about the learning.