This is the last video of this week, on process mining. Process mining is about deriving a process model from temporal data. It is commonly used in customer care centers, banking, and the healthcare sector. For example, when a call center agent contacts you, based on your responses and your profile, what is the next step in the process? If the responses branch out into three or four categories, what happens next? Should they send an email? Should they call you back? What is the process from applying for a credit card to closing the credit card? What is the process from the start of a loan to the end of the loan? Process mining is about discovering this start-to-end process, and it has been used heavily in the banking and customer care sectors. But it can also be applied to education data. Very few researchers actually use process models on education data, which makes it interesting: if you know what a process model is, you can apply it and understand students' interaction behavior as a complete process. There are different algorithms to develop a process model: the fuzzy miner, the alpha miner, or the heuristics miner. You can go ahead and read about these, but let us look at one of them in this video. If you want to know more about process models or the fuzzy miner, I recommend you read these two papers. They are not for your assignment or anything; they are extra reading if you are interested in understanding what a process model is, with examples of how it is created. In particular, one of the authors, van der Aalst, is actually the one who created the tool we are going to use in this course. So, process mining analyzes temporal sequence data to develop a process model that contains a set of nodes (events or actions) and edges (transitions between actions or nodes).
If you see nodes and edges, you might remember that in last week's video we computed a state transition diagram. We mentioned nodes and edges there, and exactly the same kind of pattern is used in a process model: the nodes are the actions, and the transitions between actions are the edges. The aim is to develop an abstract process model, and there are two key metrics: significance and correlation. Somehow it always happens that there are just two key things to remember in each algorithm, and for process mining with the fuzzy miner they are significance and correlation. So, what is a node? Suppose there is an action "read" and an action "quiz". Each is a node: node A is read, node B is quiz. The transition between these nodes, like read to quiz, is an edge. Maybe quiz then goes to "watch video"; that is another edge, and watch video is a third node. Now the two metrics. Significance is a measure, for both nodes and edges, of the relative importance of their occurrence compared to the most frequent occurrence. Simply put, nodes or edges that occur more frequently are considered more significant than the other nodes and edges. We will see a detailed example of node and edge values computed on a sequence, but first a small example of node significance. Suppose the sequence of actions is read, watch video, quiz, read, read, watch video, read, quiz, read. The most significant action is read, which occurred five times, so read is given significance 1. The significance of the other actions is computed relative to it: watch video occurred twice, so 2/5 = 0.4, and quiz occurred twice, so 2/5 = 0.4 as well.
So, the significance of watch video is 0.4 and the significance of quiz is 0.4, relative to the most significant action, read, whose significance is 1. That is how significance is computed. Significance is computed not only for nodes but also for edges: which edge occurs more often compared to the other edges. If read-to-quiz occurs more than the other edges, say quiz-to-watch-video or read-to-watch-video, then read-to-quiz is the most significant edge with value 1, and the other edge significances are computed relative to that value. And what is correlation? Correlation measures how closely two actions occur together: which two-action pattern is more frequent. If whenever read occurs, quiz follows it, then read and quiz are highly correlated compared to other pairs. This is essentially what we did in the state transition diagram; it is that kind of quantity. Let us see a simple example with temporal data: a sequence of actions A, A, B, A, B, A, A, B. Can you compute the state transition diagram for this particular sequence? We did this in last week's video, but can you repeat it here? After you create the state transition diagram, resume the video to continue. I am not going to show the state transition diagram, since we discussed it last week and you might have done it already. Instead, let us look at the process model for this sequence of data, using the fuzzy miner algorithm. There are other algorithms, like the alpha miner or the heuristics miner, each with a different behavior, but let us focus on the fuzzy miner in this course. The sequence of actions is A, A, B, A, B, A, A, B. A occurred five times, so it is the most significant action.
So, the significance of A is 1, because A occurred five times, and B occurred three times, which means the significance of node B is 3/5 = 0.6 in this sequence, relative to the most significant action A, whose significance is 1. Similarly, what about correlation, which is like the transition property of A to B that we saw in the state transition diagram? (The significance of edges can also be computed; try it yourself.) Let us look at the correlation for this example in detail. How many transitions are there from A to some action X, where X can be A or B? There are five transitions out of state A. Of these, A to A occurs twice, which means A to B occurs three times (3 + 2 = 5). So the correlation of A to B is 3 divided by 5, that is 0.6, and the self-correlation A to A is 2 divided by 5, that is 0.4. The correlation from B to A is different: there are only two transitions out of B, and both go to A, so B to A is 1 and there is no self-correlation for B. So, you can compute significance and correlation. But computing these two metrics alone is not enough for the fuzzy miner; you have to apply a set of rules on top of these metrics to create the process model. Suppose you have these two metrics, significance and correlation, and seven actions in your learning environment; the raw process model you compute will look like spaghetti, with a lot of edges going from every node to every other node. How do we abstract this? Building a process model means creating an abstract model from the temporal sequence data. For the fuzzy miner, we apply three rules.
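Before moving to the rules, the significance and correlation computed above can be sketched in a few lines of Python. This is my own illustration, assuming the simple frequency-based definitions used in this example, not the actual fuzzy miner implementation:

```python
from collections import Counter

def significance_and_correlation(sequence):
    """Node significance = count / count of the most frequent action.
    Correlation of X -> Y = (X -> Y transitions) / (all transitions out of X)."""
    counts = Counter(sequence)
    max_count = max(counts.values())
    significance = {a: c / max_count for a, c in counts.items()}

    transitions = Counter(zip(sequence, sequence[1:]))
    out_totals = Counter(sequence[:-1])  # how many transitions leave each action
    correlation = {(src, dst): n / out_totals[src]
                   for (src, dst), n in transitions.items()}
    return significance, correlation

sig, corr = significance_and_correlation(list("AABABAAB"))
print(sig)               # {'A': 1.0, 'B': 0.6}
print(corr[("A", "B")])  # 0.6
print(corr[("A", "A")])  # 0.4
print(corr[("B", "A")])  # 1.0
```

Running it on the sequence A, A, B, A, B, A, A, B reproduces the values worked out by hand above.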
These are the rules we are going to talk about for the fuzzy miner; for other algorithms the rules might change, but the basics are still significance and correlation. Rule one: highly significant nodes are preserved. The nodes that are more significant are kept; we will not remove those nodes. Suppose you have nodes A, B, C, and D with significances 0.8, 0.7, 0.1, and 0.3. The highly significant nodes, A and B, will be preserved; we may remove the other two. What happens to those nodes? Let us read the second rule. Rule two: less significant nodes that are highly correlated are aggregated into clusters. Consider the two less significant nodes, with significance 0.1 and 0.3. One of them is less significant but has a very high correlation with node A. Less significant means it occurs very few times; say it occurs only 3 times while A occurs 30 times. It occurs rarely, but whenever it occurs, it co-occurs with node A. If that is the case, then we can aggregate it with its neighbors into a cluster node, combining it into "cluster 1" or something like that. That is what this rule says. Rule three: less significant nodes with low correlation to the others are dropped. For example, suppose a node is less significant, occurring only, say, 9 times while A occurs 30 times, and it has only a very low correlation, say 0.1, with every other node; then you can drop that particular node.
Such a node is not correlated with any significant node, and it is not significant itself, so it can be removed to reduce the complexity of the process model. That is the idea of the fuzzy miner. I hope you understood these three steps; they are very important: highly significant nodes are preserved, less significant and less correlated nodes are dropped, and less significant but highly correlated nodes are combined into a cluster. Those are the basic steps of the fuzzy miner. Now, to change the abstraction level: it is not that the fuzzy miner simply comes up with its own model and you have to accept it. You might say, "No, for me significance is more important," or sometimes, "Correlation is more important." Then you can use a particular formula, controlled by two threshold values: the node cutoff and the edge cutoff. The node cutoff removes the nodes whose significance is lower than the threshold. Whether a node with significance 0.3 should be removed is your decision: you can say that all nodes with significance less than 0.4 should be removed, so you set the node cutoff to 0.4. In the last slide we called 0.3 "low" significance, but you define the threshold: if your threshold is 0.5, then every node with significance below 0.5 is removed, so you set the node cutoff to 0.5. The edge cutoff filters out the edges whose utility value is below the cutoff. What is the utility value? It is a weighted combination of the significance of the edge and the correlation of the edge (remember, edges also have a significance value). The weight is up to you. If you want all the weight on significance, set the weight to 1; if you want no weight on significance, set it to 0, and the utility becomes 1 times the correlation, giving all the weight to the correlation value.
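As a rough illustration of how the two cutoffs prune a model, here is a minimal Python sketch. This is my own simplification, not the actual fuzzy miner implementation (the real algorithm also handles clustering and conflict resolution), and the example values are made up:

```python
def prune_model(node_sig, edge_sig, edge_corr, node_cutoff, edge_cutoff, ur):
    """Keep nodes with significance >= node_cutoff, then keep edges whose
    utility = ur * significance + (1 - ur) * correlation >= edge_cutoff."""
    kept_nodes = {n for n, s in node_sig.items() if s >= node_cutoff}
    kept_edges = {}
    for (src, dst), sig in edge_sig.items():
        if src not in kept_nodes or dst not in kept_nodes:
            continue  # an edge cannot survive if either endpoint was removed
        utility = ur * sig + (1 - ur) * edge_corr[(src, dst)]
        if utility >= edge_cutoff:
            kept_edges[(src, dst)] = utility
    return kept_nodes, kept_edges

# Hypothetical metrics for a three-node model.
node_sig = {"A": 1.0, "B": 0.6, "C": 0.2}
edge_sig = {("A", "B"): 1.0, ("A", "A"): 0.5, ("A", "C"): 0.1}
edge_corr = {("A", "B"): 0.6, ("A", "A"): 0.4, ("A", "C"): 0.1}

# ur = 0 puts all the weight on correlation, as described above.
nodes, edges = prune_model(node_sig, edge_sig, edge_corr,
                           node_cutoff=0.5, edge_cutoff=0.5, ur=0.0)
print(nodes)  # {'A', 'B'}: node C falls below the node cutoff
print(edges)  # only ('A', 'B') survives the edge cutoff
```

Varying `node_cutoff`, `edge_cutoff`, and the utility ratio `ur` is exactly the kind of adjustment the process mining software exposes as sliders.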
By applying this particular formula, you can modify the process model. The fuzzy miner will try to give you an abstract model, but you know your research questions; you are the researcher, and it is not that one generic model is applicable to everyone. You might say, "No, I want to keep all the nodes. I do not want to remove any node, but I am willing to remove some edges that have a low correlation value." In that case, set the utility ratio so that the correlation dominates, and only the edges with high values stay; that reduces the complexity of the process model. Please look at this equation; it is very simple. You just have to understand how the abstraction happens using only two things, the node cutoff and the edge cutoff. In the process mining software, you can actually vary the node cutoff and the edge cutoff; I will show this in the demo, and you will understand it then. So, let us look at an example where we applied process mining in one of our research papers. In this paper we applied process mining on learner behavior from an OELE; the OELE is again Betty's Brain, which we saw for SPM and DSM, with its set of actions. The actions here are a bit different from the last paper. The actions are read short and read long: reading a resource (a science page) for less than 3 seconds is considered a read short, because in 3 seconds you might be looking for one or two words, not really reading. More than 3 seconds is a read long, but read long also has an upper threshold: if you keep a page open for more than, say, a minute, you are probably not actually reading anything; you might have just left the page open and gone somewhere else. So we set cutoffs to separate read short and read long.
Then we had link edit supported and link edit ineffective: an effective (supported) link edit means the link we added is correct; if it is incorrect, it is ineffective. That is what we try to capture. Quiz taken is taking a quiz, and there is also viewing the quiz and its explanation. The paper link is there; you can download and read it if you want more information. So, we computed these actions and created a process model. First, here are the simple frequencies for high scorers versus low scorers; as I mentioned in the last video, based on the pre- and post-test scores we grouped the students into high-scoring and low-scoring groups. We want to see the process model for these two groups and whether there is any difference between their processes. The distribution of each action for high versus low is given here, and it shows that high scorers have a lot more actions than low scorers, and by percentage some actions also differ. For example, read short is proportionally more frequent for the low group, while it is only around 14 percent for the high group. You can use this table for descriptive statistics, and you can plot it in a graph to inspect it. But let us look at the interesting part, the process model. In a process model we can add an artificial start and an artificial end; the start is here and the end is here. The thickness of each edge, from very thin to quite thick, and its color, dark or light gray, indicate significance and correlation. Let me spell that out: for each edge, if it is thick, that edge is highly significant; if it is dark, it is highly correlated. These two values are associated with every edge. So, let us understand one process model.
So, we did one simple thing: we did not want to lose any nodes, so we kept all of them, even those with significance as low as 0.1 or 0.2, and we adjusted only the edge cutoff (the utility ratio value) so that the model preserved 80% of the sequences in the data; you can read the paper for the details. A value of 80% means that if I reproduce sequences of actions from this process model, I should be able to reproduce 80% of the original sequence data. So, I did not want to abstract so much that I lose all the value. Also, when you do this, do it for a small number of students; we picked some random students in each group, because if you put all 30 students of the low group into one process model, it becomes spaghetti and you lose a lot of information. So, decide based on what you want to do and on your research question. Now, what happened? The students in the low group started with reading, and they did a lot of short reads; that is why there is a high correlation here. After reading they go to read long, and there is a loop here between read long and read short. Once they pass this loop, from read long they might go to edit a link; in Betty's Brain, after reading you can build the concept map by editing a link. After editing the concept map they can remove some edits or add others; they keep on making edits, using the three or four types of edit actions, and there are relations among these edit actions as well. After those edit actions, they go and take the quiz. So: they read, they create a concept map, then they take the quiz.
After taking the quiz they check the quiz answers, or take notes, checking whether things are right or wrong; or after reviewing the quiz they read the explanation, or go back to reading, or take notes and go to the quiz view, and so on. So, this is the set of processes of actions the low-performing group of students followed. For example: read, read, edit, immediately take the quiz, view the quiz, view the explanation, and if something is wrong, go back and read again, then come back. They might end from quiz taken or from another node; the edges to the end node are very low in significance and correlation and are just there to show which nodes were the ending nodes for this group. Now consider the high group. They start similarly, with read short and read long, and they also do link edits, but mostly supported (correct) link edits rather than incorrect ones. They take the quiz and immediately go and look at the explanations, and they do not take the quiz very often. After that they immediately go to read short, or they skip the quiz view entirely: they take the quiz, and for each explanation they go back to read long. This set of extra transitions is not available in the low group. So, there are a few significant behavioral differences that can be identified between the low group and the high group. The point of this paper is to show process mining and the abstract differences between the low group and the high group; it is not to say that the high group is doing well on these actions or that the low group is doing well on other actions. We can also compare the process model with the sequential pattern mining model. In a process model we consider the frequency, how many times each action happens, but not the time spent on each action. However, when you use the process mining tool, you have the option to include the time taken on each action, with start and end times.
If you do that, you can create a better model; the heuristics miner, for example, can use that information. I would recommend you go and explore the other types of models as well, not just the fuzzy miner but the other process mining algorithms, and see what the differences are; explore and understand them. This paper will be part of your assignments, and the assignment questions might be part of your end exam too. So, in this week we saw what process mining is, and the process mining tool is also demoed this week. This week there are demos of three different algorithms: process mining, SPM, and DSM. Let us continue talking about diagnostic analytics for one more week. Thank you.