In this video, let us continue with sequential pattern mining and work through some examples. If you remember the table from the last video, we computed the I-frequency mean and the S-support. Let us try to understand these in a bit more detail. What is I-frequency, and what is S-support? It is very important to understand these two values. Consider that you have 60 students and you have identified patterns. A pattern can be just a single action. For example, there might be an action called read, and a single action like that can also be a pattern. To keep the numbers small, suppose we have only 4 students, and read occurred 100 times, 20 times, 15 times, and 35 times for them. Read occurred for 4 out of 4 students, so its S-support value is 4/4 = 1. So a single action can also be a pattern, but this is just the action distribution: you can plot how often each action occurred using descriptive statistics. So, do not treat single actions as patterns for your analysis, even though technically they are patterns. Someone might argue that you should not drop single actions, and in fact, when you apply any pattern mining algorithm to sequences of actions, it will also give you the single-action patterns. But you can ignore them, because they only describe how many times each action occurred in each student's data, which you can compute with descriptive statistics if you know how to count the individual actions in each sequence. Now take a two-action pattern, say read to quiz. Suppose it occurred 5, 3, 0, and 7 times for the four students. It occurred for 3 of the 4 students, so its S-support is 3/4 = 0.75. This value indicates how many students have this particular pattern, which helps you decide whether the pattern is important or not. For example, suppose I identified a pattern quiz to read, and it occurred 20 times for one student and 0 times for the other three.
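A minimal Python sketch of the two counts described above, assuming each student's data is a list of action names and that an occurrence means the pattern's actions appear back to back (no gaps). The function names `count_occurrences` and `s_support` are hypothetical and are not part of the course tool; the per-student counts are the 4-student examples from the lecture.

```python
from typing import List, Sequence


def count_occurrences(sequence: Sequence[str], pattern: Sequence[str]) -> int:
    """Count back-to-back occurrences of `pattern` in one student's action sequence.

    Overlapping matches are each counted.
    """
    k = len(pattern)
    return sum(
        1
        for i in range(len(sequence) - k + 1)
        if list(sequence[i:i + k]) == list(pattern)
    )


def s_support(counts: List[int]) -> float:
    """Fraction of students for whom the pattern occurred at least once."""
    return sum(1 for c in counts if c > 0) / len(counts)


# Hypothetical sequence for one student: read -> quiz starts at positions 0 and 2.
student = ["read", "quiz", "read", "quiz", "video"]
print(count_occurrences(student, ["read", "quiz"]))  # 2

# Per-student counts (4 students) from the examples above.
print(s_support([100, 20, 15, 35]))  # read: 1.0
print(s_support([5, 3, 0, 7]))       # read -> quiz: 0.75
print(s_support([20, 0, 0, 0]))      # quiz -> read: 0.25
```

The per-student counts produced by `count_occurrences` are the raw material for both metrics: S-support only asks whether each count is non-zero, while I-frequency statistics look at the counts themselves.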
Occurring 20 times is very interesting, but only one student has it. Do you want to consider the pattern quiz to read in your analysis? That is decided by the S-support, which here is only 1/4 = 0.25. When we run a pattern mining algorithm on sequence data, it tries to give all possible combinations of patterns: single actions, each action followed by every other action, and so on, so a lot of patterns come out. Which patterns should we use for our inferences? That is decided by these two metrics. Why these two metrics? They help us pick the right patterns for making inferences: they tell us how many times a pattern occurred and how many students have it. They are very simple metrics. Now, if you only want to consider patterns that occurred for at least 80% of students, you pick an S-support threshold of 0.8, that is, you keep patterns with S-support greater than or equal to 0.8. For example, suppose you have 60 students and you only want the patterns that occurred for more than 80% of the students in your class. You do not care about the other patterns, because too many patterns come out of a pattern mining algorithm, so you have to pick the right S-support threshold. We usually keep it at 0.6 or 0.5, depending on the number of students. If the number of students is high and the number of pattern combinations is too large, we reduce the number of patterns by raising the threshold: instead of 0.5, we use 0.8. But if there are only 30 students and not too many action combinations, you can go down to 0.6 or 0.5. That is what S-support tells you. I-frequency, on the other hand, tells you whether the pattern is evenly distributed across students, and so whether you should consider this particular pattern for your analysis or inferences. The combination of I-frequency and S-support tells you which patterns to pick.
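A sketch of threshold filtering, assuming per-student counts are already available for each pattern. `filter_patterns` and its `min_length` parameter (used here to drop single-action patterns, as suggested earlier) are hypothetical names chosen for illustration; the counts reuse the lecture's 4-student examples.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, ...]


def s_support(counts: List[int]) -> float:
    """Fraction of students for whom the pattern occurred at least once."""
    return sum(1 for c in counts if c > 0) / len(counts)


def filter_patterns(
    pattern_counts: Dict[Pattern, List[int]],
    min_support: float,
    min_length: int = 1,
) -> Dict[Pattern, List[int]]:
    """Keep patterns with S-support >= min_support and at least min_length actions."""
    return {
        pattern: counts
        for pattern, counts in pattern_counts.items()
        if len(pattern) >= min_length and s_support(counts) >= min_support
    }


# Per-student counts for the 4-student example.
pattern_counts = {
    ("read",): [100, 20, 15, 35],
    ("read", "quiz"): [5, 3, 0, 7],
    ("quiz", "read"): [20, 0, 0, 0],
}

print(list(filter_patterns(pattern_counts, min_support=0.5)))
# [('read',), ('read', 'quiz')]
print(list(filter_patterns(pattern_counts, min_support=0.8)))
# [('read',)]
print(list(filter_patterns(pattern_counts, min_support=0.5, min_length=2)))
# [('read', 'quiz')]
```

Raising `min_support` from 0.5 to 0.8 shrinks the pattern list, which is the trade-off described above for large classes with many pattern combinations.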
Let us look at some activities to help you understand these two metrics. There are lots of tools to identify patterns. We will give you one tool, a Python script that one of our TAs created. We will share the tool and tell you how to use it to identify patterns if you have sequences of actions arranged in a certain format, the format we saw in the previous example. However, this tool cannot handle gaps between actions. For example, in the sequence read, quiz, video, some of you might be interested in read to video, a pattern with a gap of one action. I am interested in patterns that occur immediately, such as read followed by quiz, but I am also interested in patterns that occur with a gap of one action. That is, the count of how many times read to video occurred should include both the immediate occurrences and the occurrences with one more action in between. This helps avoid noise: a student might really want to watch a video, but click on something else, go to the quiz, and immediately go back to the video. If you want to account for this kind of gap, this tool will not help; you would have to build a new tool. But there are other tools available that we have used in our research. I will discuss a paper later, and if you are interested, you can look at that paper and use its tool. Pattern mining is not easy: it is computationally costly and takes a lot of computing power if you want to identify all the patterns. The algorithm takes the first action, say read, combines it with all the other actions, and so on, searching breadth-first or depth-first. I do not want to go into that here. Instead, the SPAM algorithm discussed on this particular site helps you do sequential pattern mining at a lower computational cost. If you want to know more about sequential pattern mining and how the algorithm works, you can check this particular page and read about it.
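Since the course tool counts only back-to-back occurrences, here is one possible way to also count occurrences with a gap, sketched for two-action patterns under our own assumptions: for each occurrence of the first action, count one match if the second action appears within the next `max_gap + 1` actions. `count_with_gap` is a hypothetical helper; it is not the course tool, not the tool from the paper, and not how SPAM works internally. With `max_gap=0` it reduces to ordinary contiguous counting.

```python
from typing import Sequence


def count_with_gap(
    sequence: Sequence[str], first: str, second: str, max_gap: int = 0
) -> int:
    """Count `first` followed by `second` with at most `max_gap` actions in between.

    Each occurrence of `first` contributes at most one match.
    """
    count = 0
    for i, action in enumerate(sequence):
        if action != first:
            continue
        # The next (max_gap + 1) actions after position i.
        window = sequence[i + 1 : i + 2 + max_gap]
        if second in window:
            count += 1
    return count


# Hypothetical sequence: the first read reaches video only via a quiz detour.
student = ["read", "quiz", "video", "read", "video"]
print(count_with_gap(student, "read", "video", max_gap=0))  # 1
print(count_with_gap(student, "read", "video", max_gap=1))  # 2
```

Allowing a gap of one action picks up the read, quiz, video detour as a read to video occurrence, which is the kind of noise-tolerant counting described above.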
But in this course, we will give you the tool to identify patterns from sequences of actions. We want you to understand what this tool gives you: what I-frequency means, and how to compute the I-frequency, the I-frequency mean, the I-frequency standard deviation, and the S-support. We want you to understand what these two metrics mean; that is why I was explaining what the input to this pattern mining tool is and which output metrics are displayed and can be used for analysis. If you want to know how the tool works internally, go and read this page. Now, a simple activity. Consider there are 5 students (n = 5), the I-frequency mean (that is, the average) is 5, the I-frequency standard deviation is 6.9, and the S-support is 0.8. Given these metrics, how many students have the pattern read to quiz? Pause the video, answer this question, and then resume the video. It is simple, because the S-support is 0.8, and S-support is the column that tells you how many students have the pattern. S-support is the number of students who have the pattern, call it x, divided by the total number of students n. We know n = 5, so x / 5 = 0.8, which gives x = 4. So 4 students have this particular pattern; that is all. Now let us see a slightly trickier activity. With the same metrics, what does a standard deviation of 6.9 mean? We know the I-frequency mean, but what does 6.9 tell us? Is it good or bad? Think about it, note down your answer, and then resume the video. It tells us the data is skewed: the mean is 5, and a standard deviation of 6.9 means the data is definitely skewed. What do the mean and standard deviation look like in a general plot?
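The arithmetic of the first activity, written out as a worked equation (here $x$ is the number of students who have the pattern and $n$ the total number of students):

```latex
\text{S-support} = \frac{x}{n}
\quad\Rightarrow\quad
0.8 = \frac{x}{5}
\quad\Rightarrow\quad
x = 0.8 \times 5 = 4
```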
You might know that in a general plot the mean is 5, but the distribution may not look like a nice symmetric curve; it may be much flatter or lopsided. I am just saying it is skewed. It is not a perfect bell curve, and a standard deviation larger than the mean is a sign of this kind of skew. That is why I said we should use the median instead of the mean, but let us see how it goes. It is possible that most students have the pattern only once. There are 5 students, and we found that only 4 of them have this particular pattern, so the mean is computed over those 4 students. An I-frequency mean of 5 over 4 students means the pattern occurred 20 times for the 4 students together, because 20 / 4 = 5; that is how the mean comes in. Now, since the standard deviation is really high, it is possible that 3 students had the pattern only once and one student had it 17 times. That is just one possible combination that gives this particular mean and standard deviation, and altogether it is still 20. So the mean alone looks good. When you look at the I-frequency mean, therefore, also check the standard deviation. The standard deviation tells you more, for example whether the data is skewed, in which case it is better to use the median. Here the median would be 1, because the values are 1, 1, 1, and 17. Now, if you want to rank the patterns, we want to pick the patterns that occurred more times. In the last video, we saw what S-support means, such as an S-support of 0.8 or 0.6. Now, the I-frequency median, or the I-frequency mean together with the standard deviation, tells you a better story. Suppose there are patterns read to quiz, quiz to read, and read to watch video, with I-frequency medians of 2, 7, and 3 respectively, and all of them have an S-support of at least 0.8, say 1, 0.8, and 0.8.
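A quick check of the numbers in this example using Python's standard `statistics` module. The counts `[1, 1, 1, 17]` are the hypothetical combination described above, not real data. The 6.9 in the activity matches the population standard deviation (`pstdev`); the sample standard deviation (`stdev`) of the same values is 8.0, so it is worth checking which one your tool reports.

```python
import statistics

# Hypothetical I-frequencies for the 4 students who have the pattern.
counts = [1, 1, 1, 17]

print(sum(counts))                          # 20   (total occurrences)
print(statistics.fmean(counts))             # 5.0  (I-frequency mean)
print(round(statistics.pstdev(counts), 2))  # 6.93 (population SD)
print(statistics.stdev(counts))             # 8.0  (sample SD)
print(statistics.median(counts))            # 1.0  (I-frequency median)
```

The mean of 5 is pulled up by the single student with 17 occurrences, while the median of 1 reflects what a typical student with the pattern actually did.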
We filtered out all the patterns below 0.8 S-support and only consider the patterns at or above 0.8. Now, if I want to order them, I would say quiz to read is the most interesting, because it occurred for 80 percent of students and it also occurred more times for each student: with a median of 7, a typical student who has it did it around 7 times. If you use the mean, remember it is an average, so you also have to look at the standard deviation; be careful when you use the mean. If you use the median, you can say this pattern occurred about 7 times for a typical student. So quiz to read is more important than read to quiz, which occurred for all students but only about twice each. Quiz to read may be a strategy your students are adopting: they might keep taking the quiz and then going back to read. That is something you have to consider. I hope you now understand what S-support and I-frequency mean. Let us look at one example of using pattern mining for analysis, a paper published at CSCW in 2018. In this paper, the authors used a system called Betty's Brain, which has a set of actions. There is reading, where the student reads the resources, and there are notes actions, where they add notes or look at their notes. Because students in this environment build a concept map, they also add concepts and add links between them. Students can also ask Betty or the other agent for help, take a quiz or look at the quizzes they have taken, and ask for explanations.
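A sketch of the filter-then-rank procedure described above, using the lecture's illustrative values. The ranking rule (sort by I-frequency median first, then by S-support) and the name `rank_patterns` are our assumptions for illustration; other orderings are reasonable depending on the analysis.

```python
from typing import List, Tuple

# (pattern name, S-support, I-frequency median)
Row = Tuple[str, float, int]


def rank_patterns(rows: List[Row], min_support: float = 0.8) -> List[Row]:
    """Drop patterns below min_support, then sort by median, then by S-support (descending)."""
    kept = [row for row in rows if row[1] >= min_support]
    return sorted(kept, key=lambda row: (row[2], row[1]), reverse=True)


rows = [
    ("read -> quiz", 1.0, 2),
    ("quiz -> read", 0.8, 7),
    ("read -> video", 0.8, 3),
]

for name, support, median in rank_patterns(rows):
    print(f"{name}: S-support={support}, I-frequency median={median}")
# quiz -> read: S-support=0.8, I-frequency median=7
# read -> video: S-support=0.8, I-frequency median=3
# read -> quiz: S-support=1.0, I-frequency median=2
```

Sorting on the median rather than the mean keeps one very active student from pushing a pattern to the top of the list.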
So, they identified a set of 8 or 9 actions in their learning environment. You do not need to understand all of them; they are that environment's own set of actions. Let us just say they identified about 8 unique actions. They ran the study with a collaborative condition and an individual condition, so the students were split into 2 groups: a collaborative group and an individual group. Then they ran pattern mining. For example, they identified the pattern take a quiz, then remove a particular effective link, then take the quiz again. For students in the collaborative group, this pattern had an I-frequency mean of 3.61 with a standard deviation of 3.35, but for students in the individual group its I-frequency mean was only 1.63, with its own standard deviation. In this way, they tabulated the important patterns from the data in this table. If you want to know how they used these patterns and the I-frequency and support values, read the paper a bit and you will understand it better. It is an example of how researchers use pattern mining metrics to make inferences. So, in this video we looked at I-frequency and S-support in detail. Please also check the paper if you get time; it is used in one of the questions in the assignment. So paper reading is not optional: we expect you to read the papers, understand why a particular metric is used, and think about whether its use is correct. This will help you get started reading research papers in learning analytics. This video talked about I-frequency and S-support; in the next video, we will talk about applications of SPM, that is, sequential pattern mining. Thank you.