 In this video, we will discuss differential sequence mining. This is extension of SPM. So, differential sequence mining is trying to identify the differentiating the behavior between two groups by using the patterns, like the patterns identified in an SPM algorithm. So, we can actually use the statistical test to identify whether there is difference between these two groups is statistically significant or not, that will help you to make up claims better. And if you want to know more about of this differential sequence mining is used, how it is started, at least read this paper, a 2013 paper from a Journal of Educational Data Mining. Let us see what is DSM. I will try to explain with a very simple example. So, there are group A group B, consider group A is in your class, students from one section is group A, other section is group B, you are teaching the same course, all the students are equivalent, consider all the assumptions carried out, and you did all the tests to make sure everybody is equivalent. Within the same class, you can split into two groups, and you are giving this kind of intervention. You are asking them to interact with MOOC or you are creating a simple learning environment where they have to watch video, read some GUI pages to answer some questions, they are free to do everything. And they are free to navigate in the particular environment, you ask students to solve a particular problem, then you captured all the log actions like the reading action, GUI actions, log using log data, then you want to identify the difference in behavior of the group A versus group B. It is very simple to identify the behaviors differences by if you have a pattern mining, let us see. The group A did read PDF, read to quiz 10 times, consider this is I frequency median. And imagine the S support is equal and say 0.8 for both, 0.8 or 0.6. N is not important here, consider the N is also equal and say 30, 30 or something like that and S is 0.8. Consider the N is 30 in both and the S support is 0.8 or above. So, your I frequency, it can be a median, let us say it is median, not a mean square. So, you have this median values, I frequency values, this reach quiz have occurred 10 times, here it occurred only once to this particular once average or median value 1. And quiz to watch video occurs 9 times, here are 2 times and quiz to read occurred 2 times here, but quiz to read occurred 9 times here and simulation to quiz occurred only twice, but simulation to quiz occurred 4, it is kind of equivalent. Can you identify the difference in behavior between these 2 groups? It is obvious that this group had more read to quiz. So, can you identify the other differences in these groups? So, that is activity, it is easy to identify, I am just already I gave the answers. You can talk about which patterns occurred more frequently in group A, which pattern occurred more frequently in group B, which pattern kind of occurred similar in both groups, that is what I want you to think. So, there are 4 patterns classified into 3 groups, a pattern which occurred more in group A, a pattern which occurred more in group B, a pattern which occurred in both groups. After you list down, assume the video to continue. So, it is simple, group A add a pattern read to quiz and the quiz to watch video, these 2, 10, 9 compared to this group, this is simple you can compute. And group B, its read to quiz occurred more compared to group A, read to quiz. And simulation to quiz occurred kind of both groups. This is a basics of differential sequence mining. It is very simple if you have only 2 groups, one of these patterns is easy to identify. But if say if I have to use the statistical significance test on this particular particular patterns differentiating this, I want to see is there any statistically significant difference between these patterns occurring more in group A compared to group B. If you want to do that, what you have to do, you remember the I frequency we computed, use this value to compare the statistical significance, let us say this. For the read to quiz, suppose there is a value read to quiz for a group A, it occurred 1, not 1, 10, 8, 11, 9, 12, 7, 6, 13, it occurred in this kind of sequence. And for group B, it occurred in something like this. There are 2 sequences, you can use these 2 sequences to create the statistical significance test like a t test or man-witney test. So, you have to be careful which test you want to use. So, it is not the normal, it will not be a normal rise to value. So, maybe you might use man-witney view. Think about that, that is not part of this course. But yeah, just go and read it, just for fun understand which test to use on which kind of data and the statistical significance. So, yeah, so if you have a statistical requirement used on that, then that may tell a better value. But we love a tool to help you to do that. So, that is the basics of differential sequence mining. So, let us see the same paper which we discussed, we saw that in a last class with more detail. The paper we saw in last class, same set of actions, I want to go back to the same table. This is what happened here. So, what happened? There are like 24, there are around 40-50 students who worked on individual, there are around 24 groups who worked on a collaborative group. You can read the paper to understand exact number. And there is a difference between the frequency. Suppose, collaborative group have this more compared to the individual group. So, they compute the man-witney view test to find and if a whether this occurrence is significant or not. So, this particular occurrence is significant, this particular pattern occurred more in collaborative group compared to individual group. Similarly, for these 3 patterns occurred more in collaborative group with a statistical significant difference. And these 3 patterns occurred more in individual group compared to the collaborative group. And these patterns occurred on both. With these table, these values, you can make a new inference on behavior difference between these 2 groups. For example, you can say that collaborative group students mostly had a supportive link and take a quiz and they remove that ineffective links. Compared to individual group students, they simply read, read, read, read, read, which never occurred in the collaborative group, which occurred in very less frequency. So, individuals of group who do not have someone to talk, they are simply reading, reading, reading and trying to understand the meaning on their own. Whereas, collaborative group students would be do not need to read much because they can collaboratively work together 2 students are working in a collaborative group 2 or more. So, they are talking and they understanding they are able to solve the sums easily. So, individual group might be always asking query and explanation, they may be asking agent A what is this, ask the agent to talk because they do not have anyone to talk. So, they will be asking agent a lot of questions that particular behavior is more dominant in individual group compared to collaborative group. So, these kind of behaviors will tell you how to create inference from this particular patterns. And there are some of the patterns which occurred for both that also tells what is the common behavior between these 2 groups. If you are conducting studies say in a class of 60 students you want them to work on online environment and you do a pre-test and post-test and based on the scores, if you want to classify them as high-scorer versus low-scorer, you can create as a 2 group, high-scoring students, low-scoring students, then you can run a pattern mining based on the reactions to see is there any behavior differences. In diagnostic analytics we try to understand why student got less score compared to the other students who got low high score. Here we are not trying to predict anything, instead there is one group who did really good. What is their behavior, what is the interaction behavior in the system compared to the students in the low group or what is the behavior the low group did very good but high group students is not able to do. These kind of analysis diagnostic analytics can be done using simply the pattern mining or differential sequence mining. Hope you understood what is SPM and DSM in the last 2 videos and we do not give a tool for DSM instead we give the frequencies, the values, high frequencies. So, use those values and compute your own DSM behavior and compute your own statistical sequence test. Since the data is not normal I will suggest man with me use best to use but there are a lot of tools available in online to compute man with new you. So, go ahead and check the tool this week and compute some patterns from using your own data, create your own data and test it out and read about which test significant test to use like man with new you why and apply it and understand what is SPM and DSM. Thank you.