Welcome back to the Learning Analytics Tools course. This week we continue with diagnostic analytics. We will look at sequential pattern mining, differential sequence mining and process mining. You might have noticed that in the first two or three weeks we talked about data collection: the different environments and how to collect data from them. That was to give you the motivation for what data to collect in which environment. From last week onward we are talking mostly about algorithms and the tools we can use to apply them. So whenever we give a demo of a tool, please explore it with the different algorithms we are learning here. That will help you understand how these tools are used for education data.
So, what is sequential pattern mining? Consider a sequence of actions. Say you have collected students' behavior in a MOOC, in a TELE such as MEttLE, or in some other environment such as Moodle. The sequence of actions is the students interacting with your system. A student logs into the system and performs a sequence of actions, such as reading, watching a video and answering questions. Each action is different. A student can read page one and then read another page. He is not continuing on the same page; the second read is another action. So a read can be followed by another read, but each read is different because the context varies. Let us treat them as two actions: read context one and read context two. We can talk about how to merge them later, but for now let us consider each action as different.
So you need to identify the unique set of actions. Last week we saw that in MEttLE we grouped the actions into six or seven sets, such as functional model, planning, qualitative model and quantitative model. In the same way, you need to come up with a unique set of actions.
In a MOOC it is simple, because only a few actions are possible: read, watch a video, interact with the forum, or answer questions and assignments. So let us create such a unique set of actions and arrange the actions sequentially, in the order they appeared, based on the timestamps you captured.
Here is an example action sequence. In a MOOC, a student watches a video and after that adds a post. That is, the student goes to the forum and adds a new post, or upvotes a post someone has already added, and then reads a PDF. We do not know what he is reading; it might be contextual information. Let us just say the action is read. Then he takes a quiz. We do not know which question he answered or what the response was. Let us just say he is in a quiz, looking at the page. Then read again, then watch video again (instead of "watch video" we can simply say "video"), then add post, quiz, read, upvote and so on. He is doing this kind of sequence of actions.
Please understand this sequence of actions clearly. A student enters the MOOC and watches the video. After that he adds a post and upvotes. Then he goes to some other page where he reads. The navigation is not given, but it can be understood indirectly from this kind of sequence: he is reading a new page in an online resource you provided, say a particular PDF. After that he goes to the quiz to answer some question, but he is not able to answer it, so he goes back and reads again. He still does not understand, so he might watch the video to understand. If it is still not clear, he adds a post saying that this particular question was not discussed in the class, or something like that. After posting, he goes to the quiz again and tries to understand, then read, then upvote (there might be something he was looking for in the forum), then read again.
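
As a minimal sketch of the step just described, arranging the actions by the timestamps you captured, the Python snippet below sorts one student's log rows and keeps only the action names. The log rows, the student id and the action labels are made up for illustration; a real MOOC or Moodle export will have its own column names and timestamp format.

```python
from datetime import datetime

# Hypothetical log rows: (student_id, ISO timestamp, action). Not real data.
log = [
    ("s1", "2024-01-10T09:05:00", "add_post"),
    ("s1", "2024-01-10T09:00:00", "video"),
    ("s2", "2024-01-10T09:01:00", "read"),
    ("s1", "2024-01-10T09:12:00", "read"),
    ("s1", "2024-01-10T09:08:00", "upvote"),
]

def action_sequence(rows, student_id):
    """Return one student's actions ordered by timestamp."""
    own = [
        (datetime.fromisoformat(ts), action)
        for sid, ts, action in rows
        if sid == student_id
    ]
    own.sort(key=lambda pair: pair[0])  # order by the captured timestamp
    return [action for _, action in own]

sequence = action_sequence(log, "s1")
print(sequence)
```

Only the order of the actions is kept here; the time spent on each action is dropped, which matches how the sequence is used in the rest of this video.
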
So, this is the sequence of actions one person performs in a MOOC. The sequence is ordered by the timestamps we captured, but how long the student watched the video, how much time he spent reading the PDF, or how many questions he answered in the quiz is not captured yet. Let us consider just the sequence of actions; this will capture the frequency. Last week we saw just the process model, that is, the transitions between actions. Now, in pattern mining, we also consider frequency.
So what patterns might we get from this? Watch video followed by add post: see, this is repeated twice, which is interesting. So there is a sequence of two actions that occurred twice. Upvote followed by read also appeared twice. Quiz followed by read, here and here, also appeared twice. There might be other sequences that appear twice, because I did not check every one, but read followed by upvote, for example, does not. A lot of combinations are possible, but let us say that only these sequences occurred twice and the others occurred only once. So from this one student's sequence of actions we get three patterns: watch video then add post, upvote then read, and quiz then read. Each occurred twice.
Let us do a small activity. You have the example sequence of actions from the last slide, and now there are three students' sequences of actions, recorded when they interacted with the MOOC. Try to find the patterns in these three sequences. Please pause the video and really try to find the patterns, then continue watching. We will discuss a few of them.
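
The counting of two-action patterns in the single-student example can be sketched in Python as below. The sequence is reconstructed from the spoken description (the slide itself is not part of this transcript) and the action labels are our own. Counting overlapping occurrences separately is an assumption of this sketch.

```python
from collections import Counter

# One student's action sequence, reconstructed from the spoken description.
sequence = [
    "video", "add_post", "upvote", "read", "quiz", "read",
    "video", "add_post", "quiz", "read", "upvote", "read",
]

def count_patterns(seq, length=2):
    """Count every run of `length` consecutive actions in the sequence."""
    windows = (tuple(seq[i:i + length]) for i in range(len(seq) - length + 1))
    return Counter(windows)

counts = count_patterns(sequence, length=2)

# Keep only the patterns that occur at least twice.
repeated = {pattern: n for pattern, n in counts.items() if n >= 2}
for pattern, n in repeated.items():
    print(" -> ".join(pattern), n)
```

With this sequence, the patterns that repeat are video then add post, upvote then read, and quiz then read, the same three named in the lecture. Changing `length` to 3 or 4 counts longer patterns in the same way.
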
So, let us look at the three students' sequences of actions, S1, S2 and S3. Our aim is to identify which action sequences occurred for which students. To do that, we create a table like this one, and we will fill it in now. Here V is video, that is, watching a video, and Q is quiz.
How many times did the pattern video followed by quiz occur? The three entries in a row are for student 1, student 2 and student 3, in that order; please understand the notation we are using. Video followed by quiz occurred once for student 1, only once. It did not occur for student 2. It occurred once for student 3. So one time for student 1, zero times for student 2 and one time for student 3. That is the meaning of 1, 0, 1. Out of three students, this pattern occurred for two students. How many students have this particular pattern? Two. That is the value 2 in the table.
Let us look at the second pattern, video followed by video. It did not occur for student 1, it did not occur for student 2 either, and it occurred for student 3. So the entries are 0 for S1, 0 for S2 and 1 for S3. It occurred only once, and out of three students only one student had this pattern.
A pattern need not be just two actions. It can be three actions, or four; it depends on how much complexity you want to use. If a sequence of four actions makes sense and supports an inference for your hypothesis, please use it. Here, let us consider one sequence of three actions: video, quiz and read.
Video, quiz and read occurs here for student 1, it is not there for student 2, and it occurs here for student 3. So 1, 0, 1. What is actually happening in this sequence? Whenever the student watches a video and takes a quiz, he immediately goes back to read. So this pattern tells us more than video followed by quiz alone, and this is the one we are going to consider. It occurred for S1 and S3, not for S2, so it occurred for two students; that is what is given in the table. Once here, zero times here and once here. That is the idea. Similarly, you can compute the counts for the other sequences of actions.
Let me show you some more patterns. Read followed by read occurred once for student 1, not for student 2, and once for student 3. Read followed by quiz occurred for everyone, 1, 1, 1, so all three students have it, whereas some other patterns here are found in only one student. Read followed by quiz might occur more often for some student; I am not sure every count is 1. So let us consider just these sets of actions. A few more patterns are possible.
A pattern can also be the same action repeating. A single action repeated multiple times can be a pattern. A student is reading, reading, reading: that is also a pattern. He spends a lot of time reading; it might be a strategy for him. Reading and taking a quiz in a loop is also a pattern: the student reads and immediately takes a quiz, then reads and takes a quiz again. Some students do not read anything. They just take quizzes and watch videos, quizzes and videos. That might also be a pattern. That is what we try to identify. I gave you examples of a few patterns that can be identified from the three students.
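
The table built above can be sketched in Python as follows. The three students' sequences shown on the slide are not part of this transcript, so the sequences below are made up; they were chosen only so that the counts match the numbers read out in the lecture. The letters V, Q and R (video, quiz, read) are shorthand for this sketch.

```python
# Hypothetical sequences, NOT the ones from the slide.
sequences = {
    "S1": ["V", "Q", "R", "R", "Q"],
    "S2": ["V", "R", "Q", "R"],
    "S3": ["V", "V", "Q", "R", "R", "Q", "R"],
}

def count_pattern(seq, pattern):
    """Number of times `pattern` appears as consecutive actions in `seq`."""
    n = len(pattern)
    return sum(
        1 for i in range(len(seq) - n + 1) if tuple(seq[i:i + n]) == tuple(pattern)
    )

patterns = [("V", "Q"), ("V", "V"), ("V", "Q", "R"), ("R", "Q"), ("Q", "R")]

# One row per pattern, one count per student, in the order S1, S2, S3.
table = {p: [count_pattern(seq, p) for seq in sequences.values()] for p in patterns}

for pattern, row in table.items():
    print(" -> ".join(pattern), row)
```

Each row is the per-student count in the order S1, S2, S3, so video then quiz comes out as 1, 0, 1 and quiz then read as 1, 1, 2.
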
I want you to create this kind of table to understand the patterns. This pattern did not occur for students 1 and 2, only for student 3, so it occurred for only one student. Here, for example, the quiz-to-read pattern occurred once for student 1, once for student 2 and twice for student 3, here and also here. But the last column says 3, because it occurred for all three students. It is not the sum of these numbers. You might think it is just a simple sum. No, it is how many students have the pattern. All three students had this particular pattern, though it might have occurred with different frequencies for different students. You have to understand these two different things: the counts of the pattern for student 1, student 2 and student 3, and this last column, which says how many students have the pattern: one, two or three. I added this column just to differentiate the two. I hope you understood this table. It is very important to understand it before going further.
If you understood the table, let us look at two metrics. Sequential pattern mining has only two metrics that we have to understand. They are very basic, and they are enough to understand sequential pattern mining, to talk about it, to explore and to do analysis with it. One is sequence support, which we call S-support. It captures the number of individual action sequences in a group in which the pattern occurs at least once. Bear with me for a minute. What it says is the number of individual action sequences, within a group, where the sequence of actions occurred at least once. That simply tells how many students have it. That is S-support, sequence support: for example, two students have it. It does not matter how many times it occurred, as long as it occurred at least once. So here it is 3, 2 and 1.
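
The difference between the number of students who have a pattern and the simple sum of the counts can be sketched in Python. The per-student counts below are the ones read out in the lecture for three of the patterns; the dictionary layout and the function name are our own.

```python
# Per-student counts in the order S1, S2, S3, as read out in the lecture.
counts_per_student = {
    "quiz -> read": [1, 1, 2],
    "video -> quiz": [1, 0, 1],
    "video -> video": [0, 0, 1],
}

def s_support(counts):
    """Number of students whose sequence contains the pattern at least once."""
    return sum(1 for c in counts if c >= 1)

for pattern, counts in counts_per_student.items():
    print(pattern, "S-support:", s_support(counts), "sum of counts:", sum(counts))
```

For quiz then read the sum of the counts is 4, but the S-support is 3, because a student who shows the pattern twice is still counted once.
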
This is S-support: this particular column. Now let us look at I-support, instance support. It is defined as the number of times the pattern occurs within the action sequence of an individual in a particular group. The group here is the three students. For each student, how many times does the pattern occur within that individual's action sequence? That is I-support. Like 1, 1 and 2. I-support captures how many times the pattern occurred for each individual in the group. For this group of three students, how many times did the pattern occur for each? 1, 1, 2. This is called I-support. These two are the key metrics of sequential pattern mining (SPM). Only these two metrics are needed to understand sequential pattern mining.
So let us compute I-support and S-support for this table. Let us start with I-support. From the I-support we can compute the I-frequency mean or the I-frequency standard deviation. The I-frequency mean is the average number of occurrences of the pattern per student. Let us take quiz to read. It occurred once for student 1, once for student 2 and twice for student 3. If you add the numbers you get 4; divide by the number of students, 3, and you get 1.33. So the mean for this pattern is 1.33. If you have a lot of students, say 10 or 60, you can compute the mean, the median and also the standard deviation. If the values are widely spread, for example when the standard deviation is large compared with the mean, it is better to use the median. I am just telling you that you can use the median instead of the mean. So the I-frequency mean is this value, and the I-frequency median is another measure you can compute. For S-support, all three students have this pattern. So 3 by 3 is 1, and the S-support, as a fraction of the group, is 1. Please understand this.
In this video we introduced sequential pattern mining. We will discuss some examples in the next video.
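
As a recap of the two metrics, here is a small Python sketch for the quiz-to-read pattern, using the counts 1, 1, 2 from the lecture. The variable names are our own. Dividing by the size of the whole group (3 students) is an assumption; here it equals the number of students who have the pattern, so both readings give the same value. `statistics.stdev` is the sample standard deviation, which is one possible choice.

```python
from statistics import mean, median, stdev

# I-support of quiz -> read for S1, S2, S3, as given in the lecture.
i_support = [1, 1, 2]

i_mean = mean(i_support)      # I-frequency mean: 4 / 3
i_median = median(i_support)  # I-frequency median
i_std = stdev(i_support)      # sample standard deviation of the counts

# S-support as a fraction of the group: students with the pattern / all students.
students_with_pattern = sum(1 for c in i_support if c >= 1)
s_support_fraction = students_with_pattern / len(i_support)

print(round(i_mean, 2), i_median, round(i_std, 2), s_support_fraction)
```

With only three students the median and the standard deviation say little; they become useful for larger groups such as the 10 or 60 students mentioned above.
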
Thank you.