In this video, we will continue with affective computing and see how to detect affective states from facial expressions. Again, the papers listed on the last slide are the ones these slides are based on; go and check those papers to understand the details further.

One baseline method for detecting emotion is human observation. Observers get trained on how to detect emotions, and there are established ways to do that. It is always better to have multiple observers rather than you alone detecting students' emotions. We talked about this when we discussed the kappa score: if two raters are judging whether a student is frustrated or not, they need to reach inter-rater reliability. So if you are doing human observation, you have to establish inter-rater reliability using a kappa score, Cohen's kappa. The other way is self-reporting, where the person reports their own emotions, for example by observing their own facial expressions (how to do that, we will see), or by answering a questionnaire such as "How are you feeling now?" with options like "I am feeling bored" or "confused" that they select themselves.

To detect facial expressions automatically, Paul Ekman's research suggests that there are action units in your face: inner brow raiser, chin raiser, lip raiser, nose wrinkler, and so on. Ekman catalogued all of these in the Facial Action Coding System (FACS). Let us see that in detail. If you check the Wikipedia link for the Facial Action Coding System, you will see there are 28 main AU codes, plus additional codes for head movements, eye movements, and visibility. What is a code? For example, inner brow raiser or outer brow raiser; each action unit is mapped to a specific facial muscle movement. To know more about these codes, go to "Facial Action Coding System: A Visual Guidebook", a blog by the company iMotions, which is a pioneer in collecting data from different sensors: eye trackers, facial expressions, GSR, EEG. They collect all the data, synchronize it, and provide a complete picture. The blog has GIFs explaining what an inner brow raiser and an outer brow raiser look like, so you can get trained on these codings.

Once you are trained on these codings, you can say that happiness or joy is AU6 plus AU12, that is cheek raiser plus lip corner puller: if you have both, it is a kind of happiness. Sadness is a different combination, and so on. So you combine action units, and the combination gives you the emotion. The first step is identifying the action units and getting yourself trained; one combination is sadness, another combination is happiness, and in this way you get happiness, sadness, surprise, fear, anger, disgust and contempt.
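As a minimal illustration of this AU-combination idea, here is a small sketch in Python. Only happiness = AU6 + AU12 comes from the lecture; the other combinations below are illustrative, EMFACS-style mappings I am assuming for the example, so verify them against the iMotions guidebook or the FACS manual before relying on them.

```python
# Mapping detected action units (AUs) to Ekman's basic emotions.
# Only "happiness" (AU6 + AU12) is stated in the lecture; the rest are assumed
# illustrative mappings and may not match the official FACS/EMFACS tables.

EMOTION_AU_MAP = {
    "happiness": {6, 12},               # cheek raiser + lip corner puller (from the lecture)
    "sadness":   {1, 4, 15},            # assumed
    "surprise":  {1, 2, 5, 26},         # assumed
    "fear":      {1, 2, 4, 5, 20, 26},  # assumed
    "anger":     {4, 5, 7, 23},         # assumed
    "disgust":   {9, 15},               # assumed
    "contempt":  {12, 14},              # assumed (unilateral in full FACS coding)
}

def detect_emotions(active_aus):
    """Return every basic emotion whose full AU combination is present."""
    active = set(active_aus)
    return [emotion for emotion, aus in EMOTION_AU_MAP.items() if aus <= active]

print(detect_emotions([6, 12]))         # ['happiness']
print(detect_emotions([1, 4, 15, 17]))  # ['sadness']
```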
The seventh one, contempt, was added more recently; the six basic emotions are the ones Paul Ekman proposed in his 1960s work. This is Paul Ekman's work, and using it you can detect the six basic human emotions. Let us go back to the slide.

We looked at some of the action units; check the blog again for more examples, such as AU1 and AU2 with figures and pictures. All the iMotions blogs I showed are really helpful for understanding this. So what is affect? It is detected from a combination of action units, like the six basic emotions. How do we detect these affective states automatically? Let us look at another blog by iMotions on facial expressions. It explains why a given expression occurs, which basic emotion is detected, and gives details on the roughly 24 commonly used action units. For example, a particular set of three action units indicates that the person is happy or surprised; combinations of action units indicate the person's emotions.

So how do we detect this automatically using, say, a web camera? The webcam captures video, typically around 25 frames per second, that is 25 pictures every second, each with a timestamp. The first step is using AI to detect the face, only the face, within the whole picture. Face detection is the easy part compared to detecting micro-expressions, because a lot of work has already been done on face detection. After detecting the face, the next part is detecting the action units: locating the eyebrows, cheeks, lips and nose, and classifying what is happening as AU1, AU2, and so on. To do that, these companies have databases of millions of faces in which every face has been labelled, and they train on them; the system compares the given image against that training and classifies it, smiling or not, lip corner pulled or not, and so on.

Once the AUs are identified, that part is done by a company called Affectiva; iMotions is actually a company that integrates engines from different vendors, and the particular model we are talking about is from Affectiva. After you have the action units, they are given to another classifier, and that classifier detects the emotions. That is the whole idea.

Let us go through it briefly once more. The video of your facial expressions is captured at roughly 25 frames per second; from each frame the face is detected; from the face, the action units are detected by comparing each facial region with the trained database; and from that information a classifier decides whether each action unit is occurring or not. These are trained classifiers, like simple machine learning classifiers.
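To make that frame-by-frame pipeline concrete, here is a minimal sketch. This is not Affectiva's actual implementation: the face detector here is OpenCV's stock Haar cascade, and `au_model` and `emotion_model` are assumed, pre-trained classifiers standing in for the commercial models.

```python
# Sketch of the pipeline: frame -> face detection -> AU classification -> emotion.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def process_frame(frame, au_model, emotion_model):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        face = cv2.resize(gray[y:y + h, x:x + w], (96, 96))
        au_probs = au_model.predict(face.reshape(1, -1))   # per-AU probabilities (assumed model)
        emotion = emotion_model.predict(au_probs)          # AUs -> basic emotion (assumed model)
        results.append(emotion)
    return results

# Typical use: read the webcam at ~25 fps and process each frame.
# cap = cv2.VideoCapture(0)
# ok, frame = cap.read()
# if ok:
#     print(process_frame(frame, au_model, emotion_model))
```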
These AU classifiers are mostly support vector machines or, more recently, neural networks. From the action units we then have to predict the affective states. The data will look something like this: student ID, action unit 1, action unit 2, AU3, and so on up to AU29 or AU30, plus a timestamp. Importantly, each action unit is not reported as a binary classification; instead the system gives a probability that the action unit is occurring, say 0.8 for one and 0.2 for another. So you end up with many values: AU1 might be happening, maybe AU7 or AU8 too, we do not know for certain. Then another trained classifier combines these action units and outputs the affective states, that is, emotions such as boredom, confusion, anger or fear. That is what actually happens in the backend while detecting a student's facial expressions.

iMotions is a commercial product and the cost in India is really high, around 5,000 to 6,000 euros. There is an open-source alternative from Carnegie Mellon University called OpenPose, a library freely available on GitHub. But OpenPose will not give you emotions directly or with guaranteed accuracy; it detects not just facial keypoints but also your body posture. Using that data, you have to do the labelling yourself through human observation or self-reporting, and then train your own classifier. We will see how to do that now.

Before going into detail about the papers on facial expressions, pause here: can you list the drawbacks of a facial-analysis emotion recognition system like the one we saw in the previous slide, such as iMotions, where students' facial expressions are taken from video, the face is detected, the action units are classified, and emotions are detected? Please list the drawbacks, then resume the video to continue.

The key drawback is the training database. You need a very large training database, which means millions of faces have to be labelled to classify action units correctly and provide accurate emotions. In real settings like noisy classrooms it will not work well: a student has to interact alone, or if two students are working together you have to mark one of them and use only that student's facial expressions to detect emotions. Facial rotation is another issue: head movement (yaw, roll, pitch) is not captured equally well by all of these engines, so if I just change the direction of my face while showing an emotion, the system may not be able to detect it. And these classifiers are trained on artificial, that is, simulated emotions: participants were asked to act out emotions, "show happiness", "show anger", and those posed expressions were coded to produce the labels. That may not be adequate for detecting real, naturally occurring emotions.
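Here is a minimal sketch of that second-stage classifier: rows of per-frame AU probabilities plus a label are used to train a model that predicts the affective state. The CSV file name and column names are hypothetical; the labels would come from human observation or self-report, as discussed below.

```python
# Train an affective-state classifier on per-frame AU probabilities.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

df = pd.read_csv("au_probabilities.csv")   # hypothetical: student_id, timestamp, AU01..AU30, label
X = df.filter(regex=r"^AU\d+$")            # AU probability columns, values in [0, 1]
y = df["label"]                            # e.g. boredom / confusion / frustration / engaged

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```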
The most important problem with iMotions-style facial expression software is that it detects only the basic emotions, because Paul Ekman studied basic emotions and listed which combinations of action units produce them. But when learners interact with a learning environment, anger and fear are not really what happens. Instead there are more learner-centric emotions, as I was saying: boredom, confusion, frustration, engagement, delight, those kinds of things. So to detect those emotions we cannot simply use iMotions-style software.

Let us look at a couple of methods for doing this. One is human observation. Here there are trained coders, and there is a protocol and tool called BROMP, the Baker Rodrigo Ocumpaugh Monitoring Protocol, created by Professor Ryan Baker, Professor Ma. Mercedes Rodrigo and Jaclyn Ocumpaugh. The BROMP tool is a simple mobile app; do not mistake it for an automatic emotion detection system. Suppose you are the human observer and you are observing, say, four students in the class: S1, S2, S3, S4. You observe them in a round-robin fashion, as in the schedule sketched below. First you observe student 1 for 20 seconds, say from 10:00:00 to 10:00:20, and note down the emotion, maybe bored. Then you move to S2, observe for 20 seconds, and note that the student is neutral, no particular emotion, actually focused and engaged. From second 40 to 60 you observe student 3, maybe confused; then student 4 from 10:01:00 to 10:01:20, noting that S4 is, say, confused or frustrated.

As a human observer with pen and paper you would be watching the student, keeping an eye on the time, and writing everything down. The BROMP app helps you do exactly this: the students' names are listed, you click a student's name, the time is recorded automatically from your device, and there are around six emotions you simply click. So your observation time is saved for you. That is the simple observation protocol, and the tool supports it. Importantly, the observers do not judge from facial expression alone; it is a holistic approach: the student's interaction with the system, their gestures, the screen, what they are working on, whether they are talking to a peer, all of this is observed and then the emotion is noted down.

Why 20 seconds? That is a very good question. They want to observe for at least 20 seconds before committing to an emotion. But if you talk to psychologists, say those working in an HSS department, they will tell you that an emotion will not last 20 seconds; there is no way you stay confused or surprised for 20 seconds, surprise lasts at most 2 or 3 seconds and that is it.
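Here is a small sketch of the round-robin, fixed-window observation schedule described above. The start time and student list are made up for illustration; the real BROMP app simply records your clicks with timestamps rather than pre-computing a schedule.

```python
# Generate a BROMP-style round-robin observation schedule (20 s per student).
from datetime import datetime, timedelta

def observation_schedule(students, start, window_seconds=20, rounds=2):
    """Yield (student, window_start, window_end) tuples in round-robin order."""
    t = start
    step = timedelta(seconds=window_seconds)
    for _ in range(rounds):
        for s in students:
            yield s, t, t + step
            t += step

for s, t0, t1 in observation_schedule(["S1", "S2", "S3", "S4"],
                                      datetime(2024, 1, 1, 10, 0, 0)):
    print(f"{t0:%H:%M:%S}-{t1:%H:%M:%S}  observe {s}")
```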
But if you want 2- or 3-second, fine-grained emotion coding, it is not possible for human observers to capture that live every time, unless you video-record the students and then code second by second. The problem is that coding one hour of video that way can take more than two or three days, and you end up with an enormous amount of fine-grained data. So 20 seconds is the practical compromise they arrived at; it may not be perfect and there will be nuances.

BROMP, then, is a protocol, with an app that helps you follow it. If you are happy with your own notebook and your own timing scheme, that is fine too: go S1, S2, and so on in round-robin order, keep a timer, every 20 seconds mark the emotion, bored, confused, neutral, and you are good; there is no strict need for the app or the tool.

So basically, in human observation the coders get trained in the following sense: a couple of coders code the students' emotions together and make sure the inter-rater reliability is high, say a Cohen's kappa of 0.8; then the observers are considered trained and can go and collect data from students independently. The observed affective states can be whatever emotions you care about, so now we can use learner-centric emotions such as boredom, frustration, engaged concentration and confusion. And not just emotions: you can have another column for behaviour, whether the student is off-task or on-task. This is one of the easiest things to detect: are they talking about the task they are doing, or are they talking about some other topic while still appearing engaged? You can observe that as well.

After you have this data, which means student 1 was observed at 10:00:20 and then again at, say, 10:01:40, you will be observing each student roughly every 80 seconds. Once you have data like that, you extract a lot of features from the log data. So you go back to feature creation and use a machine learning classifier. There is a timestamp, a student ID such as S1, and from the student's log data, that is the interaction with the system, you compute feature 1, feature 2, up to maybe 100 features; then there is a label, the emotion you observed, say bored. You do this for S1, S2, and so on, which means you need at least thousands of observations from real classrooms to build this table. Once you have this table, you use machine learning classifiers to predict the label from the features, and that is the whole idea. They build different ML models for different emotions: you can train a detector only for boredom, or only for confusion, and different ML algorithms may work better for different emotions, so you pick whichever works best for each. That is the whole idea of creating automatic emotion detection from log data, where the label information comes from human observation.
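Here is a minimal sketch of training one detector per learner-centric emotion from such a log-feature table, evaluated with Cohen's kappa. The CSV name and feature columns are hypothetical; the labels are assumed to come from BROMP-style human observation.

```python
# One binary detector per emotion, trained on log features, scored with Cohen's kappa.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import cross_val_predict

df = pd.read_csv("bromp_observations.csv")   # hypothetical: timestamp, student_id, f1..f100, label
X = df.drop(columns=["timestamp", "student_id", "label"])

for emotion in ["boredom", "confusion", "frustration", "engaged_concentration"]:
    y = (df["label"] == emotion).astype(int)     # one detector per emotion
    # (in practice one would cross-validate at the student level, not at random)
    pred = cross_val_predict(RandomForestClassifier(random_state=0), X, y, cv=5)
    print(emotion, "kappa:", round(cohen_kappa_score(y, pred), 3))
```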
So I said that in a technology-enhanced learning environment (TELE) you have to come up with a lot of features to detect emotions. Now pause this video and list five features you think are needed to detect emotion from such an environment. We know what interactions learners perform: they work with the problem map, they create concept maps, they go to the simulator. Think of the learning environment we discussed earlier in this course, or any technology-enhanced learning environment you have used, and write down five features you think are important for detecting emotions. Five is not a magic number; it is just to get you thinking about features to extract from the log file. Pause the video, write down your answer, then resume to continue.

There are a lot of features used in emotion detection: action frequency, number of clicks, which interactions happened, the student's on-screen and off-screen time (are they looking at the screen or not, if you have a webcam), how much time they spend on a particular task, how much time on particular transitions from this page to that page. If you have eye-gaze data you can talk about fixations; if they are watching a video, are they playing or pausing it, are they scrolling up and down. Whatever features we thought about in the previous class, list all of them and then expand them into more features: how many clicks in the last 3 minutes, how many in the last 5 minutes, how many in the last 10 minutes, that is, frequencies over the last few windows; similarly for time spent, how much time on this particular video in the last 1 minute, how much time reading a resource in the last 5 minutes. You expand the features like that; a small sketch of this windowing follows below.

This feature generation comes from domain expertise. Professor Ryan Baker has said that he gained this knowledge from creating detectors for multiple systems over a decade, and that is what lets him do this feature engineering. So it is not easy; you will not get all the features right the first time, and you will not get a good classifier on the first attempt. Start somewhere, collect more data, create more features, and read Ryan Baker's published papers on detecting emotions in different systems to get ideas for features.

After you create the features, the human observation provides the label and the features are the x values. Take a simple linear-regression-style approach: the prediction is w1·x1 + w2·x2 + w3·x3 and so on, where the weights w are what you estimate from the training data. The data is essentially a matrix, and once you have a matrix you can apply many ML methods to it. Check this paper, it is interesting; or if you want more detail specifically on detecting frustration from log data, check this one: I worked on it to detect student frustration from log data, and there too I used human observation extensively.

Now let us move on to another approach. Professor Sidney D'Mello tried to understand the dynamics of affective states.
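As referenced above, here is a minimal sketch of that windowed feature engineering: counting clicks in the last 3, 5 and 10 minutes for each log event. The log format (student_id, timestamp, event) is hypothetical.

```python
# Rolling-window click counts from an interaction log.
import pandas as pd

log = pd.DataFrame({
    "student_id": ["S1"] * 6,
    "timestamp": pd.to_datetime(["10:00:05", "10:00:40", "10:02:10",
                                 "10:04:30", "10:08:00", "10:09:45"]),
    "event": ["click"] * 6,
}).set_index("timestamp")

clicks = (log["event"] == "click").astype(int)
features = pd.DataFrame({
    f"clicks_last_{m}min": clicks.rolling(f"{m}min").sum() for m in (3, 5, 10)
})
print(features)
```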
Emotion research is not something that belongs to computer science alone; it is interdisciplinary, and affective states come from psychology. From the theoretical analysis of affective states, a model like this emerges: there is flow, and flow leads to confusion when there is an impasse, when something blocks you. If the confusion persists and your goals remain blocked, you may move into frustration, and if the frustration continues you move into boredom and may drop out. But if the confusion gets resolved, that is, whatever was stopping you gets sorted out, you go back to flow; flow is essentially engagement, and you continue engaging with the learning content. That is the dynamics of affective states from the theoretical side. Let us see whether it holds true in data we collect and predict from.

What Sidney D'Mello and his colleagues did is this. They used a system called AutoTutor, a dialogue-based tutoring system with a conversational agent: the agent speaks and responds to your answers, and you learn topics like hardware, the internet and operating systems. For example, the system asks: "When you turn on the computer, how is the operating system first activated and loaded into RAM?" The agent, using a text-to-speech converter, speaks the question aloud; the student then types the answer rather than speaking, because voice recognition was not very good at the time AutoTutor was created. The student types something like "When a computer is turned on, a file is automatically booted up", and the system asks for more information, "what happens next?", and if you get stuck it may provide a hint; all of this happens inside AutoTutor.

Now imagine a student working in an environment like AutoTutor. You record the student's facial expressions and posture, and you also record the interaction as a screen capture, so you observe both. After the session you ask the student to self-report. What I mean is: after the student has interacted with the system, you show them two screens, one is the screen capture and the other is the student's own face from the webcam. If a student interacted with the system for, say, 30 minutes and you want 20 instances of emotion, then roughly every 90 seconds you pause the video, show about 5 seconds of screen capture together with the facial expression, and ask the student to self-report: there is a menu from which they select their emotion, for example frustrated. So you show the student what they were doing, the contextual information, along with their facial expression, and ask them to report their own emotion at that particular time; that is self-report. You can also have a human observer to cross-check, or you can ask a peer to report the student's emotions, and then compare whether the two reports agree.
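As a small aside, here is one way to encode the theoretical affect-dynamics model described at the start of this passage (flow to confusion on an impasse, confusion to frustration if goals stay blocked, frustration to boredom, and confusion back to flow if the impasse is resolved). The event names are made up for illustration.

```python
# The theoretical flow -> confusion -> frustration -> boredom model as a transition table.
THEORETICAL_TRANSITIONS = {
    ("flow",        "impasse"):          "confusion",
    ("confusion",   "goal_blocked"):     "frustration",
    ("confusion",   "impasse_resolved"): "flow",
    ("frustration", "persists"):         "boredom",
}

def next_state(state, event):
    """Return the predicted next affective state, or stay put if no rule fires."""
    return THEORETICAL_TRANSITIONS.get((state, event), state)

print(next_state("flow", "impasse"))                # confusion
print(next_state("confusion", "impasse_resolved"))  # flow
```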
So you randomly pick about 20 time points and collect the emotions there; those emotions can be used as labels, and you can use the action-unit data from iMotions as features, since, as I was saying, iMotions gives you the action units. Using the action units from iMotions and the labels from self-reporting, you again build a machine learning classifier. The classifier works on intervals, say 20 seconds: the action units within a 20-second interval are used to detect the emotion for that interval. How do you combine the action units over 20 seconds? Take the dominant action units in the window, or read the paper for the exact approach.

From this analysis, the following table came out. The double star indicates a significant transition: there is a significant transition from engagement to confusion, from confusion to engagement, and from confusion to frustration; a one-sample t-test was used to check whether each transition is reliable, the significance is good, and the number of students in the sample is also given. Similarly, the transition from boredom to frustration is significant; the other transitions are not. They repeated the study with a new system and found essentially the same transitions again, except that transitions from frustration to flow and from frustration to boredom also occurred among those students. In that study, students self-reported at 20 pre-selected points plus 3 random points, with 30 participants.

With these two sets of empirical data, they then checked whether the theoretical model of affect dynamics can be reproduced. That is what is shown here: the dynamics of affective states obtained from self-reports and from predicting the students' emotions based on their actions. In this picture, flow, that is engagement, has a significant transition in both directions with confusion: students can go from engagement to confusion and come back from confusion to engagement, both significant. There might also be transitions from boredom to confusion or from boredom to engagement, but those are not significant; the diagram shows only the transitions found to be significant across all students, which does not mean the other transitions never occur.

So this is how facial expressions are used to detect learner-centric emotions: either by human observation plus log data and then prediction, or by asking students to self-report at 20 or 30 instances while they interact with the system and training a predictor from that. Those are the two methods we saw. And this picture of affect dynamics gives us an important piece of information: if a student is in the confusion state, you need to understand why that student went into confusion, what impasse was created. If you resolve it, the student goes back to engagement; otherwise they get stuck and slide into frustration. So we have to think about what kind of feedback can be given at that particular moment.
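For completeness, here is a minimal sketch of quantifying affect transitions from an observed sequence of states. It computes, for each ordered pair, a likelihood-style score of the form L = (P(next | prev) − P(next)) / (1 − P(next)); this is my recollection of the metric used in the affect-dynamics literature, so check D'Mello's papers for the exact definition and for the significance testing (the one-sample t-test across students mentioned above).

```python
# Likelihood-style scores for affect transitions in a state sequence.
from collections import Counter

def transition_likelihoods(sequence):
    base = Counter(sequence[1:])                        # counts of "next" states
    pairs = Counter(zip(sequence, sequence[1:]))        # counts of (prev, next)
    prev_totals = Counter(prev for prev, _ in pairs.elements())
    n = len(sequence) - 1                               # number of transitions
    scores = {}
    for (prev, nxt), c in pairs.items():
        p_next_given_prev = c / prev_totals[prev]
        p_next = base[nxt] / n
        if p_next < 1:
            scores[(prev, nxt)] = (p_next_given_prev - p_next) / (1 - p_next)
    return scores

seq = ["flow", "confusion", "flow", "confusion", "frustration", "boredom", "flow"]
for pair, score in transition_likelihoods(seq).items():
    print(pair, round(score, 2))
```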
So that is why this particular model is very important. We saw that human observation and self-reporting are used to detect emotions. What are the challenges in human observation and self-reporting? Please think about the challenges, write down your answer, and then resume to continue.

First, human observation and self-reporting are time-consuming. With human observation, as I said, it can take several hours to code even a few minutes of video. Self-reporting has its own problem: what students report about themselves may not be true, so you have to confirm whether the self-report is right or wrong. Also, if human observation happens only once every 20 seconds and self-reporting also happens at intervals, the accuracy of the resulting system is not great. In fact, the kappa scores of the detectors I talked about are quite poor, around 0.2 or 0.3, and we know what a kappa score at that level means. So the accuracy is really not good, and we cannot go and make decisions based on the reports coming from these indicators or detectors. That is the issue. And when you do human observation, the main thing is to establish inter-rater (inter-observer) reliability, that is Cohen's kappa, and make sure it is above 0.8; that is not easy, and training the human observers requires considerable time and effort.

So in this session we saw what affective computing is, what the basic and learner-centric emotions are, how to detect the basic emotions using webcam-based automatic systems such as iMotions or open-source tools like OpenPose, and how the learner-centric emotions can be detected from log data or other data, with human observation or self-reporting providing the label data. Thank you.