Okay, so hello everybody. My presentation will be about multimodal annotation for expressive communication. This is a joint effort of three different groups in our department: TALN, natural language processing; CMTech, cognitive media technologies; and GTI, human-computer interaction. This work within the María de Maeztu program is complementary to a Horizon 2020 project, and I'll briefly introduce that project as well. Very briefly, the content: the general purpose of this work is the compilation of training material for the communication of embodied conversational agents. "Multimodal" in this sense means facial expressions, gestures and, of course, voice. I'll therefore start with embodied conversational agents, basically pointing out the shortcomings of the state of the art in these agents and what we want to achieve in this project. Then we'll come to the multimodal annotation for expressive communication: the task, goals and challenges. Then the corpus we are working on: the recordings are more or less from the health care domain, but also basic care and other related issues. The fourth item is the multimodal annotation of nonverbal cues. For the time being we focus on nonverbal communication in our annotation task, and even more narrowly on affective communication; that is, we do not yet consider semantics, we just want to annotate emotions and affect. Then, as we heard in the first invited talk yesterday, data protection is a very important issue, and especially here, as you can imagine: we are recording people, and you will see the domains we are working in, so this is a very central topic. And then, finally, conclusions.
By the way, I'll be sharing the talk with Mónica, who is doing the actual work, so I'll do the general talking and she will get more concrete: I'll do parts one, two and three, and then she will continue. So, embodied conversational agents: they are all over the place now. There are agents that take us through the stock of a furniture store, agents that help us set up our TV, agents that give us advice on how to lose weight, and so on. The aspect of social companionship is very important. The problem with embodied conversational agents nowadays is that they are, let's say, monocultural and monolingual. It is very likely that a furniture store agent will act the same way no matter whether we are in Sweden, Spain, Morocco or Australia, and this is certainly unsatisfactory, because we interact in different ways, culturally speaking. They are also monolingual, whereas we want them to be multilingual, to interact in different languages. And the state of the art in dialogue strategies is usually a scripted, ping-pong strategy: you say something, I say something, and so on. There is no real barge-in, and the flexibility is rather restricted. If we look at the social and medical care aspect, the agent needs to be expressive, to be, let's say, compassionate, and this is also not yet the case. We try to get a handle on these problems in the KRISTINA project. As I said, this is a Horizon 2020 project, and the three groups of the department are involved. What we are trying to do is develop an agent with social competence and human interaction capabilities, for interaction with migrants, because they are very interesting
groups, not least to test how far we can get with our technologies. We focus on three different migrant groups. The first is the elderly; as we know, the elderly are in need of a social companion. They need to be entertained, to chat about the family or the weather and what is going on, to be reminded of daily routines, and so on. Keep in mind that we are talking about the multicultural aspect, so here we focus on Turkish migrants in Germany. The second migrant group is illiterate people who are in need of mediation in health care: health care questions as such, but also health care system questions, such as how to get an appointment with a medical doctor. Here we focus on North African migrants in Spain. The last migrant group is untrained caregivers from Poland who come to work in Germany. They stay for three months or so in Germany and then go back; they usually don't speak German and are not really qualified to do the job, so here we need to mediate and teach them the basic skills in basic care. The major goals of KRISTINA are then the development of reasoning-based, flexible dialogue strategies that take cultural and social differences into account, as well as the emotional state of the people. The agent should be multimodal, since the whole story is about multimodality: facial expressions, gestures and voice, as I said. And it should be able to learn: if it wants to communicate or teach something, to tell a story, it needs to learn from the web. Here is the KRISTINA pipeline: we have the users, who can communicate with the agent in terms of facial expressions, gestures and voice; we analyze this, then we fuse all the information, we process it, and then we
separate the different modalities again once we know what we want to say, and then we communicate back. Obviously, what we are focusing on in our María de Maeztu project are these parts: the analysis and generation of the different modalities, or more exactly, preparing the training material for these components. Now to the multimodal annotation for expressive communication. What is the task? Starting from material, that is, recordings of semantically and culturally diverse material, we want to annotate synchronous and asynchronous events of different modalities, and what we want to get is an improvement of the quality of multimodal annotations. So far, people working on separate modalities annotate as they think is right for their modality; we want to unify this and make the annotation more coherent, and we think that this will considerably increase the impact. This increase of impact goes together with the increase in quality: if we have a coherent multimodal annotation, we think the impact of the corpora will increase, because everybody will be using them, as they are better, more coherent, and so on. As a side objective, we want to develop integrative research lines within our department. We have already started; as I said, three groups are working together, and I think there are many more goals to be achieved by working together. Now the challenges: we have a technical challenge and a data management challenge. The technical challenge is the annotation itself. This means first the definition and creation of annotation guidelines, and as those of you who have annotated know, this is a very tedious and time-consuming task, to figure out what the best
annotation is. Then, obviously, once we have the annotation guidelines, we have to ensure that the annotators are trained to follow them and that we achieve a reasonable inter-annotator agreement; only then will we be able to use our annotated corpora. The data management challenge is accessibility and data protection: we need to think about licensing, proprietary rights, treatment of sensitive data, and so on. If you consider that we are recording elderly people, or recording caregivers, this is really a very delicate issue. Now the corpus we are working on: recordings in five different languages, where we assume that each language represents, more or less, a culture. We have German, Spanish, Polish, Turkish and Arabic, to cover all the use cases I mentioned before. Here you see the distribution of the recordings so far: in German we have most of them, then Spanish, then Polish, and in Turkish and Arabic we still have very little data. The recordings are in progress; the goal is to have 10 hours of recordings for each language, for each culture, which is quite a lot of material compared to what people usually have. Now I'll hand over to Mónica to explain the details of our work. [Mónica:] Thank you. Well, at the end of the day, what we want is computers that can recognize, when humans talk to them, the emotions that we convey with nonverbal behavior, and that can understand and make sense of them; and, on the other hand, computers that are able to reproduce this kind of human-like nonverbal action. So what do we need? All we need is love... okay, all we need is data. We need data, and we need annotated data. As humans, when we try to make sense of what other people are saying (and hopefully you are getting more than my language only, because you are not just hearing me), we use a holistic
kind of approach: we listen to what is said, but we also pick up meaningful visual clues from how things are said. From a computational point of view this is very difficult to represent, because when we try to deal with emotional aspects of communication, you may perceive that I am happy because I smile, while in fact I might be nervous; some other people may perceive that I am nervous, and yet others may be confused. So how can we turn a subjective issue into an objective one? Well, using all the power of mathematics and statistics. This is merely an example of what we do: we want to minimize perception bias. What do we use? First of all, we use a valence-arousal scheme, which is widely reported in the literature, but without discrete labels. We are not saying "this is happy, this is sad", because there is a lot of interference in those concepts as such; instead we use a fine-grained scale, on which we have run some tests, to represent whether this person is more or less in one emotional state or another. Then we are working on guidelines for the group of annotators. So far all of us are expert annotators, and we are experts in different fields: the annotators are the very people developing the modules for recognizing faces, recognizing gestures, recognizing voice, and for natural language processing, so we have a strong interest in good-quality labels. Those are the guidelines, and the annotation tool we use is ELAN. Then, once we annotate, we compute a score, a Cronbach's alpha score, that tells us how many annotators we need in order to reach a satisfactory agreement. That is, you can annotate a video subjectively, but in the end the whole group needs to achieve a certain score within this framework in order to accept that video. If that is
not the case, then we take action: we revise the video, we add more annotators, or whatever it takes. Once we have that, we compute a consensus using another method, the "collect" method, which in fact comes from medicine and which we are tuning to our needs. It is not precise yet, but we are working on it. We get our consensus, which is this line down here: the consensus agreement taking all the different annotators into account. And we get a confidence score for each segment, which is very important. So basically that is how we do things. But one of the main goals within the María de Maeztu initiative is being able to release the data we are using, the raw data, to the scientific community. We are using de-identified data; it is anonymized, and we do not know the names of the participants. But we are still using sensitive data: we are talking about faces and about speech, and those cues are called biometric cues, because they can be used to identify a person. That is considered sensitive under European law (and now, with everything going on in the US, there is a big debate about this, which I am happy to hear), and European law has very restrictive regulations on data protection. So how can we manage this? Well, within KRISTINA, participants who are being recorded sign an informed consent. This informed consent tells them what we are going to do in KRISTINA and how we are going to use the raw data; even this is something we are currently refining, since we need to inform them exactly what we do with the raw data. They sign it, fine, so we can use it within the consortium. Technical partners sign another agreement, a confidentiality agreement, because we deal with sensitive data, so we need to guarantee that when we download this data it is well secured: all the computers have passwords, and so on. This is very strict and very serious. But at the
same time, within the project, we do have an agreement on dissemination in the data management plan. Of course Europe is not paying just for research to be done in this tiny little niche without being shared; Europe is also paying for sharing what we get in this project with the scientific community. So let's see how we deal with this problem. I am happy to say that, quite recently, the department set up some support that is helping us deal with this issue: CIREP (you can find it on the web), an internal committee for the ethical review of projects. When I collected my first data set, this did not exist, and I wondered: how do I do this, how do I even write an informed consent? I was kind of puzzled by all of it, but now there are established procedures and templates that you can use, and so on. All the data is reviewed by this ethics committee, and in KRISTINA we also have an internal ethics committee, a medical ethics committee led by one of the medical doctors in the project. But still, when thinking about releasing the data set, we need to take care of all these issues: licenses and conditions for data access. That is the reason why I asked that question, Victoria; now you know. We are working on it, and it is not simple, as you hopefully understand after this workshop. In any case, many questions arise, one of which is: where do we put all this data, in which repository, so that it can be used? Many questions remain to be answered, so I am very happy that the María de Maeztu initiative is going to lead the way for us in solving these issues, and probably also set the path for other people coming in and facing the same issues, or for other groups dealing with similar data protection issues, to join efforts and say: well, now we have some established procedures that we can follow, and other
people can also benefit from them. Some conclusions so far. In KRISTINA we are dealing, as Leo said, with nonverbal emotional behavior. We want to enlarge this, and I would say not only to nonverbal emotional behavior but to nonverbal and verbal communication, because communication, as I said, is holistic. We want to enlarge the database we are collecting in KRISTINA, not only for evaluation, which was the first focus of the project, but also for training and generation purposes. And again, we want to develop integrative research lines, for example in computational linguistics (now you know where we come from, but we do think we have a lot to say in this field), in virtual character design and control, in computer vision, and in multimodal communication as a holistic event. There are many people to thank, and I just wanted to give you a flavor of the kind of material we annotate, so I asked the annotators' team for a collaboration video. [The video fails to play.] Sorry, it is not working; perhaps you can get it running later. By the way, thanks to all the people: those marked in red are the people from the different areas of the UPF department, and the others are the experts in the KRISTINA consortium. So just the last question: is data all we need? I hope I was able to send the take-home message that it is not all about data. (It's okay, they apparently don't want to appear in the video.) As I was saying, it is not all data; there are many more things we need in order to be able to reproduce and share data with the scientific community, and they are very important to take care of. And well, maybe we also need some love in between, to be able to care for this team of people who are devoting a lot of time and effort for the mere reason of getting a tiny hint of what human-computer
interaction is all about. So anyway, thank you for your attention.
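The annotation task described in the talk involves time-aligned events on separate modality tiers (as in ELAN), where synchronous events are those whose time intervals overlap. A minimal sketch of such a representation follows; the tier names, times and labels are invented examples, not taken from the corpus.

```python
from dataclasses import dataclass

@dataclass
class Event:
    tier: str      # modality tier, e.g. "face", "gesture", "voice"
    start: float   # seconds from the start of the recording
    end: float     # seconds from the start of the recording
    label: str     # e.g. a point on the valence/arousal scale

def synchronous(a: Event, b: Event) -> bool:
    """Events on different tiers are synchronous if their intervals overlap."""
    return a.start < b.end and b.start < a.end

smile = Event("face", 1.0, 2.5, "positive valence")
pitch = Event("voice", 2.0, 3.0, "high arousal")
shake = Event("gesture", 4.0, 4.8, "negative valence")
print(synchronous(smile, pitch))   # True: overlapping, hence synchronous
print(synchronous(smile, shake))   # False: asynchronous
```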
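The agreement check mentioned in the talk, a Cronbach's alpha over the annotators' scores, can be sketched as below, treating each annotator as an "item" and each video segment as an observation. The three-annotator setup, the 1-5 valence scale, and the 0.7 acceptability threshold are illustrative assumptions, not values from the talk.

```python
import statistics

def cronbach_alpha(ratings):
    """ratings: one list per annotator of valence (or arousal) scores,
    one score per video segment, all on the same fine-grained scale."""
    k = len(ratings)                                   # number of annotators
    n = len(ratings[0])                                # number of segments
    # variance of each annotator's scores across segments
    item_vars = [statistics.variance(r) for r in ratings]
    # variance of the per-segment score totals over all annotators
    totals = [sum(r[i] for r in ratings) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(item_vars) / statistics.variance(totals))

# three hypothetical annotators rating five segments on a 1-5 valence scale
scores = [[1, 2, 4, 4, 5],
          [2, 2, 4, 5, 5],
          [1, 3, 3, 4, 5]]
print(round(cronbach_alpha(scores), 2))  # 0.96; above ~0.7 is often deemed acceptable
```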