Welcome back to the learning analytics tools course. This week, we will discuss predictive analytics. We saw in the first week that there are four levels of analytics in learning analytics, and we discussed descriptive analytics very briefly, that is, how to represent the data you have in plots and figures. Over the last three weeks we saw what diagnostic analytics is: we discussed pattern mining, correlation and some clustering techniques. So, I hope you understood how to perform diagnostic analytics. The next step is predictive analytics, and that will be the last step of this course. We do not talk about prescriptive analytics in this course.

So, predictive analytics measures what will happen next. Diagnostic analytics is about what happened and why it happened; predictive analytics is about what will happen. From diagnostic analytics you can establish a relationship between the dependent and independent variables, and you can use that relationship to predict future events. That is called predictive analytics. So, predictive analytics subsumes diagnostic analytics and also descriptive analytics.

So, always remember that the first step is descriptive analytics: look at the plots, check for outliers, and see whether a relationship is visible in the plot, if that is possible. If you have more than, say, 10 variables, it is not easy to plot them; you can get a sense of each variable and of some relationships, but you will not get the complete picture. Then you can look at the correlation matrix to see whether there is any relationship, or whether you can fit an equation. Then you go for the predictive step: create a model which can predict what will happen next.

So, predictive analytics is about understanding what might happen next, and it is based on historical data. In order to predict what will happen next, we need historical data to train the system and also to test the system.
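To make the idea of training and testing on historical data concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not the method used in this course: the scores are made-up numbers, the mid-term score is assumed to be the independent variable and the final exam score the dependent variable, and a simple least-squares line is assumed as the model. The names `fit_line`, `predict` and `mean_absolute_error` are hypothetical helpers written for this example.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(model, x):
    """Apply the fitted line to a new value of the independent variable."""
    slope, intercept = model
    return slope * x + intercept


def mean_absolute_error(model, xs, ys):
    """Average absolute gap between predicted and actual values."""
    return mean(abs(predict(model, x) - y) for x, y in zip(xs, ys))


# Hypothetical historical data from previous years:
# (mid-term score, final exam score) for each student.
historical = [
    (35, 40), (42, 50), (50, 55), (58, 61), (63, 70),
    (70, 72), (78, 85), (85, 88), (90, 95), (55, 60),
]

# Use part of the historical data to train and the rest to test.
train, test = historical[:7], historical[7:]
model = fit_line([m for m, _ in train], [f for _, f in train])
test_error = mean_absolute_error(
    model, [m for m, _ in test], [f for _, f in test]
)

# This year's students have mid-term scores but no final score yet.
current_midterms = [48, 66, 81]
predicted_finals = [round(predict(model, m), 1) for m in current_midterms]

print("error on held-out historical data:", round(test_error, 2))
print("predicted final scores:", predicted_finals)
```

The held-out error gives a rough idea of how far the predictions may be off before you rely on them for this year's students. With real data you would first do the descriptive and diagnostic steps to check that a straight-line relationship is reasonable at all.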
Predictive analytics involves descriptive and diagnostic analytics, as we showed in the previous slide. That means we have to describe the data, identify the patterns or correlations in the data, and then extend whatever pattern or relation we identified to future events. That is the idea of predictive analytics: it analyses current and historical data to predict future events. For example, if you are teaching a class this year, you might use the data from this year's students and also from previous years' students to predict the final exam score, because you do not yet have the exam scores of this year's students; they are still studying. Also, we use machine learning and data mining tools to create the predictive models.

Outside the education domain, this has been used widely for other predictions: fraud detection for credit cards or loans, and spam filtering in email, where your mails get classified as spam or not spam. All of these are uses of predictive analytics. Here are a few example domains. One is finance, where it is used to predict credit card risk or loan risk. In the health sector, insurers predict whether an insured person is likely to get sick soon, so that they can ask for a higher premium. In the telecom sector, it is used to predict which user will drop out or which user might move to a higher package. It is used in many other domains as well; prediction is one of the main applications of machine learning.

So, we discussed that predictive analytics requires historical data to predict future events. Let us talk about the educational domain, the educational context. Do we need historical data of the same learner, or can we use data from other learners? Think about that.
Also, if you can use the data from other learners, what are the restrictions? We said that we need historical data so that we can predict future events. So, what is historical data? Is it historical data of the same user or of other users? Think about it. After you write down your answers, we will resume.

We may not be able to collect enough data from the same learner to predict future events. The learner may have newly joined the first year of college, or the learner may be interacting with your intelligent tutoring system for the first time. You might have created a system to teach one specific topic, so you will not have the same learner's earlier interactions with the system. But you can use similar learners from a previous study or from previous years' data.

So, what should the previous learners have in common? There should be similarity. What does similarity basically mean? It should be the same domain. If you are collecting students' data in a class, say mathematics for second-year students, you have to collect data from the last 2 or 3 years of students who took mathematics in the second year. The environment and interaction should be similar: given that this is a classroom environment, the historical data should also come from a classroom environment. Either the same teaching strategy has been used, or you are introducing a new teaching strategy and that is the one variable that changes. Everything else is the same: the classroom, the students' pre- and post-tests, their ability. You should make sure it is the same using some statistical measures. Also the interaction time: it should not be that in one year the students interacted with the system for only 30 minutes, and the next year you collect data and want to predict for students who interact for more than, say, one hour. So, make sure you have the same domain, similar types of interaction, and similar interaction time with the system. That kind of data can be used to predict future events.
Also, the data collection should be similar: you should not be collecting extra data this year compared with previous years. However, if your research question is valid and you want to collect new data to prove something extra, that is allowed. What I mean here is, if you want to use historical data, please make sure that you have a similar domain, a similar study setup and similar data collected.

For example, in intelligent tutoring systems, we might create a system to teach a concept, say Newton's law. I have some method to teach it and I also have a set of assessment questions to assess the students. So, I created a system to teach Newton's law to, say, class 8 students. Now we collect data from, say, 100 students in one school. The students interact with the system and you collect the data. You are not doing any prediction there, because it is a new system; it is a pilot study, the first study you are doing. You collected the data, and you are going to establish whether the Newton's law system you created produces a learning gain from pre-test to post-test, that is, whether students learned due to your system, your intervention, compared to a normal intervention like classroom education or material from the internet. Then you establish that the system is good.

Now you want to create predictive models to predict which user will struggle, so that you can provide more hints to that user, something like that. In order to do that, you use the data from these 100 students. You are trying to create a model to predict the interaction: whether the user will get the marks in the assessment questions, whether the user can understand the concept; all these things you are trying to predict. So, when we go for the next study, we have to make sure the users are at a similar pre-test level: run a statistical significance test to check whether the two groups of users have similar prior knowledge.
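As an illustration of such a check, here is a minimal sketch in Python that compares the pre-test scores of two groups. The lecture does not name a specific test, so this example assumes a two-sided permutation test on the difference in mean pre-test score; a t-test would be another common choice. The scores are made-up numbers, and `permutation_test` is a hypothetical helper written for this example.

```python
import random


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in mean scores.

    Returns an estimated p-value: the share of random relabelings of the
    pooled scores whose mean difference is at least as large as the one
    actually observed between the two groups.
    """
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical pre-test scores (out of 20) for the first study group
# and for the group in the next study.
pretest_first_study = [8, 11, 9, 12, 10, 7, 13, 9, 10, 11]
pretest_next_study = [9, 10, 8, 12, 11, 9, 12, 10, 8, 11]

p_value = permutation_test(pretest_first_study, pretest_next_study)
print("p-value for difference in pre-test means:", round(p_value, 3))
```

A small p-value (for example below 0.05) would suggest the two groups differ in prior knowledge, so the earlier data may not be a fair basis for prediction. Keep in mind that a large p-value only means no difference was detected; with small groups it does not prove that the groups are equivalent.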
If their prior knowledge is the same, then you check the interaction time. If the first group used the system for one hour, the new group should also use it for one hour, and you do not introduce any new element here. Once you have made sure of everything, you conduct a study to test whether your system works or not. By running multiple studies like that, you can say that your system can predict the students' interaction and that this can be useful for providing recommendations. It is not that after a first study with 30 students the system can already provide hints and recommendations; those would come from the hypothesis you have, not from the data. That is what I want to establish in this particular activity.

So, predictive learning analytics also varies based on the stakeholders. For learners, it is helpful for feedback and suggestions to avoid failing the learning goal. For a teacher, it can be a dashboard showing how all the learners' performance varies; it can also help the teacher see which learner is struggling, or whether a lot of students in the class are not able to understand a particular concept. The teacher can use that data to revise the teaching strategy. Content developers of e-learning systems can also use predictive analytics to provide recommendations, hints or adaptive content.

In this video, we discussed what predictive analytics is and the importance of historical data. Thank you.