Thank you, good afternoon. Well, how many of you have seen fake news during the last year? I want to see some hands. Oh, lots of hands, almost everybody. Well, it's many of us. Only 42% of Europeans trust news in the media. And only 33% of Europeans trust news they find on the internet. And even fewer, 23%, trust news they receive via social media. In Spain, only 55% of us are confident in our ability to spot and detect fake news. And there's another study, from Gartner, that says that by 2022 most people in Western countries will consume more fake news than real news. Fake news usually gets spread in times of political elections. We recently had the example of the Bolsonaro campaign in Brazil. Brazil is one of the countries with the most WhatsApp users; they have 120 million. And the University of São Paulo did a study on that election where they saw that 97% of the news shared via WhatsApp in Bolsonaro supporters' groups was fake. 97%. At Newtral, we are more than 70 people working in a startup in the city center of Madrid. We are fact-checkers. We are journalists. And we are also developers and designers and product people and documentalists and video makers. And we are all obsessed with one thing: how we can better use technology and journalism to keep citizens better informed. The team that does our fact-checking started in 2013 with Ana Pastor's program on TV. They were the first journalists to bring fact-checking journalism to TV, and they did it in prime time. So we have six years of experience now. In January 2018, Ana Pastor created her own startup to be able to distribute this kind of content not only on TV but also on other platforms: the web, other social networks. For us, information is a big thing, because if you have good information, you can make better decisions.
But also, free, independent and pluralistic media are one of the fundamental elements of a democracy. And the impact that fake news has right now is very important; it has a real effect. It can even cause deaths. We have some examples. Last year there were lynchings in India: some people say 27, but at least 20 people died in lynchings that happened after fake information was spread via WhatsApp. India is the country with the most WhatsApp users; they have 230 million. Fake news also has an impact on science and on what we know about vaccination, for instance. 48% of Europeans think that vaccines can have severe side effects, 38% of them think vaccines can cause illnesses, and 31% are sure that they can weaken the immune system. Also, the European Commission and the World Health Organization blame the rise of the anti-vaccine movement on disinformation campaigns spread via social media. Fake news also has a very strong impact in disasters like hurricanes and similar situations. We know that online social media plays a vital role during real-time crisis events. There's a study of Twitter during Hurricane Sandy in 2012. They analyzed the fake images spread through Twitter during the disaster and found 10,350 unique tweets containing fake images. And 86% of the tweets spreading fakes were retweets; as you can see, there were very few original tweets. Fake news also has a very great impact on climate change information. Most YouTube videos on the topic contain content that opposes the scientific consensus on climate change. There's a study from Aachen University in Germany that analyzed 200 YouTube videos, all of them related to climate change, and most of them, 107, denied that it was caused by human action.
And we also saw in that study that conspiracy-theory content received the most engagement: more views, more comments and more reactions. And fake news, as you know, has an impact on political elections. There are many studies on the effect of social media campaigns on populist voting. We have just these numbers here: 126 million Americans were shown politically oriented fake news stories via Facebook, and the 20 most popular fake stories received more engagement than the real news. So in this kind of scenario, how do we work in a newsroom? It's a very difficult situation, because on one side we have this low public trust in media in general, and on the other there is an explosion of disinformation. A lot of disinformation and fake news arrives everywhere, through different channels: social media, even WhatsApp and private messages. And this is a big part of the problem, because what we call dark social channels are the channels that are not easy for us to see. They are private, they are encrypted; fake news circulates via WhatsApp, so we as journalists cannot see what is being said there. So we have this situation in a newsroom where fact-checkers are really overloaded with work, always busy. And there is also a human factor: when somebody receives lots of contradictory information between A and B, we usually tend to trust the one that is more emotional for us, the one we want to believe. So the first obvious way of solving the problem of having our fact-checkers overloaded is basically having more fact-checkers. Or we can also try to get more journalists, you know, the kind of journalists who are super fast at doing things, trustworthy, the best in their class, and committed to helping others.
Do these kinds of journalists exist in the world? Yes, they do, but the problem is that this journalist works at the Daily Planet and not at Newtral. So we decided to do a different kind of hiring, and we preferred to hire this kind of fact-checker. We do believe that the problem of fake news and automated fact-checking can be solved by artificial intelligence, but always with human intervention, because fact-checking is a very complex problem where human judgment is always needed. It's not so easy to decide whether something is true or fake, because there is a lot of political context that you need to know. So our main goal is to combine human intelligence with artificial intelligence, building a kind of human-in-the-loop system where bots help journalists and enhance their capacity through AI. When we speak about fake news, we normally think it's a technological issue, and it is, but it goes far beyond that. A recent study has shown that false news was 70% more likely to be retweeted than the truth. And this was not because of automated bots; it was because of humans retweeting. And why is this happening? Because we love gossip. We love sharing things that have an impact on others. We love sharing things that are aligned with our own beliefs, even when we know they are false. So we are facing a human nature problem. It's in our own psyche. So when you think about Russian bots, please remember that the most advanced Russian bot is normally us. Think when you are retweeting something. And the problem is that the truth spreads more slowly than the lie. So we have a communication problem. How can we make the truth spread in a better way? We don't know, but journalists are specialists in communication. That's why we believe that the proper way to solve this problem is to mix journalism with technology.
And the fake-news fighting ground is large. Good players and bad players both use technology. Spoiler alert: the bad players are winning the race up to now. For instance, one of the hottest topics now is deepfakes. Probably most of you have seen this amazing deepfake about the E-Team, not the A-Team, the E-Team, which is our politicians. We have amazing technology for building this kind of deepfake, and the technology to detect them is still being developed. Facebook has launched a $10 million contest to create novel techniques to detect deepfakes. They are hiring actors to create datasets of deepfakes, because we don't have them, and then they are going to release those datasets so we can create this kind of new technology. There are many other fighting grounds, shallow fakes, bot detection, fake news detection, but in this presentation we are going to focus on automated fact-checking. That is the kind of automated system that Newtral is building. For a long time, technology has been used to spread misinformation. Now we are trying to use technology as a shield, in such a way that we are able to massively scale fact-checking using computers. And even if we are able to create this technology, the challenge is huge. Donald Trump has made 13,435 false or misleading claims in almost 1,000 days. And it's not me saying this; it's the Washington Post. Do the maths, guys. Do the maths. That's 13 false claims per day. This guy is a machine. He has to start lying in the morning while he's having breakfast. I don't know how he's able to do that, but he's really good at it. And this is the new politics, and fact-checkers have to work in this new scenario. So how is the fact-checking process done in a newsroom? We start with the first step, which is monitoring. I mean listening to all the politicians. And I mean exactly that.
At the moment we are 14 fact-checkers who every morning, every day, start listening to all the public discourse of our politicians, the main candidates of the main parties in Spain. They share the work and organize themselves as a team, so everybody hears something. And we have two kinds of processes. Every day, we do active listening and take down all the declarations politicians have made. And we also do live fact-checking, which means that sometimes there's an important debate on television, or during an election campaign, and everybody is listening, trying to capture what the politicians say, trying to verify it very quickly and posting the results on the internet. This is teamwork; we cannot do it alone. They all work together and communicate with each other in the moment. And it's not only about listening: at the same time, they have to select the claims that have to be verified. Well, how can we automate the monitoring? In recent years there have been great advances in deep learning models that have made speech-to-text technology accurate enough to be mainstream. So we can replace active listening by making a machine listen to the politicians. And journalists are probably happy about that; it's a cumbersome task. So what are we doing? We connect our video server to live video streams, or we simply upload videos to it. We send the audio to a cloud speech-to-text service such as AWS Transcribe, Google, or Speechmatics, which is the one we are using right now. We wait for the transcripts and store them in our database. Then our journalists can read the transcript instead of having to listen to the audio. Reading is much quicker than listening, so we are able to save up to 30% of fact-checkers' time just by integrating speech recognition into our current workflow.
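To make the workflow concrete, here is a minimal sketch of that ingestion loop. This is not Newtral's actual code: the cloud speech-to-text call is injected as a plain function so the flow can be shown without network access, and every name here is illustrative.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class TranscriptSegment:
    speaker: str   # speaker label from diarization, e.g. "spk_0"
    text: str      # recognized text for this segment

def ingest_video(audio_uri: str,
                 transcribe: Callable[[str], List[TranscriptSegment]],
                 store: Callable[[str, List[TranscriptSegment]], None]) -> int:
    """Send audio to a speech-to-text backend and store the transcript.

    `transcribe` would wrap a cloud service (AWS Transcribe, Google,
    Speechmatics); it is passed in here so the workflow stays testable.
    """
    segments = transcribe(audio_uri)
    store(audio_uri, segments)
    return len(segments)

# Usage with a fake backend standing in for the cloud service:
db = {}
fake_stt = lambda uri: [TranscriptSegment("spk_0", "El paro bajó un 2% en 2018.")]
n = ingest_video("s3://videos/debate.mp4", fake_stt, db.setdefault)
```

In production the `transcribe` function would poll the cloud job until the transcript is ready; the important point is only that journalists read stored segments instead of listening.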
But the world is not always perfect, and there are a lot of challenges that we need to solve to make this a production tool, a productivity tool. The first challenge is that all this speech-to-text technology is good in English but not so good in Spanish, and specifically when we are talking about pronunciation, the output is really bad. Sorry, punctuation, not pronunciation. As fact-checkers, we need complete sentences, because we provide a rating for each sentence. If the transcription mechanism is not able to give us complete sentences, the system is not going to work. This is one of the biggest issues we have now. We are trying to find a solution using sequence-to-sequence models, but our first results in these experiments are not good enough. The second challenge is that speech-to-text technologies are accurate enough for us to understand the transcript, but they are not 100% perfect. And this is very dangerous for us, because if in one claim we transcribe just one figure wrongly, we have a problem. So journalists always need to review each claim that we ultimately want to fact-check. And finally, it's important to know not only what is being said but also who is saying it, so we need good speaker identification systems. And if you have ever watched a political debate, you will understand how difficult it is to figure out who is speaking when they are interrupting each other. Well, the second part of the process is claim detection. While listening, the fact-checkers have to select the statements that are to be verified. We are part of the International Fact-Checking Network. That means we have to follow a methodology, and it's a very strict methodology, because we get certified every year. And this means that each journalist has to manually select the verifiable claims. Not everything is verifiable. This is important, because only facts are verifiable, and not every piece of data is verifiable, only some kinds of data.
Also, opinions are not verifiable. Sometimes people ask us: somebody said something and I want you to verify it. And if that's an opinion, we cannot really do that. In this part of the process, we have to work manually, because every journalist has to be listening and selecting those pieces of information that have to be verified. Okay, this is where the magic really is: automated claim detection. Can we eliminate the human from this process? We can. Currently, we have a system that does this with good performance. I will try to explain how it works right now. We have a machine that reads the transcripts and, without human intervention, selects factual statements, sentences that have facts inside them, and automatically sends this information to our workflow tool. Basically, it's something like a Jira for journalists. If we are able to integrate this claim detection with good speech recognition technology, we will be able to save up to 80% of our fact-checkers' time. According to our estimates, it would speed up our operation at least 10 times. How are we able to create this claim detection algorithm? Basically, it's based on three main modules. First, a language model. This is a model trained by Google on Google News data; it's a neural net that allows us to capture the meaning of sentences, in such a way that we can provide the classifier with signals, because sentences that talk about unemployment or gross domestic product, for example, normally have a high probability of being factual statements. The second module is a module that works with the language structure.
Here, we work with NLP libraries, first to extract entities (persons, organizations, dates, currencies, etc.), because normally when this kind of entity appears in a sentence, there are facts in it; and second to extract the syntactic and grammatical structure, trying to figure out the most common constructions that form part of factual statements. These are more signals that we feed to the classifier. And finally, we also feed the system with ad hoc knowledge from experts. We input language patterns. For instance, we know that factual statements are declared with verbs in the past or present tense. We know that there are often comparisons in factual statements. We know that temporal adverbs should be there. We know that there is a lexicon of words that are ambiguous by nature, and other kinds of words that are more concrete and more likely to appear in factual statements. So we combine all of these. We feed all these signals to a binary classifier, and the classifier tells us whether this is a factual statement or not. For training the classifier, we have tried support vector machines, naive Bayes models and logistic regression. And of course, this is a supervised learning approach. This is what the fact-checker sees in the end: simply an editor with a transcript where all the sentences containing factual statements are highlighted by the machine. Imagine that you have to read 2,000 sentences, trying to figure out which of them contain facts. And now imagine that, by pressing a button, in one second you have this. The improvement is really big for them. But in the real world there are also a lot of challenges, issues that we need to solve. First of all, noisy transcripts; we talked about this before.
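As an illustration of how those three signal families can feed a binary classifier, here is a toy sketch. The word lists stand in for the embedding and entity modules, the weights are hand-set for the example rather than learned, and none of this is Newtral's real model; a real system would train the weights with an SVM, naive Bayes, or logistic regression on labeled data.

```python
import math
import re

# Toy stand-ins for the three signal families described above.
TOPIC_WORDS = {"unemployment", "gdp", "deficit", "paro", "pib"}   # topical signal
AMBIGUOUS_WORDS = {"many", "few", "huge", "some"}                  # ad hoc lexicon
COMPARISON = re.compile(r"\b(more|less|higher|lower)\s+than\b", re.I)
NUMBER = re.compile(r"\d")                                         # entity-like signal

def features(sentence: str) -> dict:
    tokens = re.findall(r"[a-záéíóúñ%]+|\d+%?", sentence.lower())
    return {
        "topic":      float(sum(t in TOPIC_WORDS for t in tokens)),
        "number":     1.0 if NUMBER.search(sentence) else 0.0,
        "comparison": 1.0 if COMPARISON.search(sentence) else 0.0,
        "ambiguous":  float(sum(t in AMBIGUOUS_WORDS for t in tokens)),
    }

# Hand-set weights for illustration only; a real classifier learns these.
WEIGHTS = {"topic": 1.5, "number": 2.0, "comparison": 1.0, "ambiguous": -1.0}
BIAS = -1.5

def is_factual(sentence: str) -> bool:
    z = BIAS + sum(WEIGHTS[k] * v for k, v in features(sentence).items())
    return 1 / (1 + math.exp(-z)) > 0.5   # sigmoid, threshold at 0.5

claims = [s for s in [
    "Unemployment fell 2% in 2018.",
    "We will build a better country for everyone.",
] if is_factual(s)]
```

The first sentence scores high (topic word plus a figure), the second scores low (pure rhetoric), which mirrors the behavior the classifier is trained to reproduce.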
We have good accuracy with perfect transcripts, but when the transcript is noisy, because the speech-to-text is not good enough, our current classifier is not robust enough and the accuracy goes down. Second challenge: multi-language. If we want to identify claims in English, we cannot use this approach, because our model is for Spanish; the language patterns are designed for the Spanish language, so we cannot simply extend and adapt it. We need to figure out how to build a multi-language model. And the most important one: we need more data, as in any machine learning approach. We would like to try neural nets, for instance, but we don't have enough data for that. What is our training dataset, and how was it built? Well, it's a little bit cumbersome. At least three different journalists have reviewed the congressional records and labeled every sentence in those records into four categories: factual statement with high confidence, factual statement with low confidence, undefined, and non-factual statement. Three different fact-checkers need to label each one of the sentences, and then we have a fourth fact-checker, because even among experts there are sometimes disagreements about whether something is a factual statement or not. It's not so easy. It's only more or less easy with the first category, factual statement with high confidence; with low confidence, we are on the borderline. So we have a fourth fact-checker who plays the role of a judge and makes decisions on disagreements. And think about it: what do you think is the average percentage of factual statements in congress transcripts? More than 50? For sure not; they are our politicians, okay? More than 30? We don't know. Less than five? Our data, or at least our average, is that between 10% and 50% of what they say is factual statements.
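That labeling protocol can be sketched as follows. One assumption on my part: I treat a 2-of-3 majority as sufficient and send everything else to the judge; the talk only says the fourth fact-checker decides disagreements.

```python
from collections import Counter

# The four label categories used by the annotators.
LABELS = ("factual_high", "factual_low", "undefined", "non_factual")

def consolidate(labels, judge):
    """Merge three annotators' labels for one sentence.

    `labels`: the three fact-checkers' labels.
    `judge`: callable invoked on the full label list when no label
    reaches a 2-of-3 majority (assumption: a majority suffices).
    """
    top, count = Counter(labels).most_common(1)[0]
    return top if count >= 2 else judge(labels)

agreed = consolidate(["factual_high", "factual_high", "undefined"],
                     judge=lambda ls: "undefined")
disputed = consolidate(["factual_high", "undefined", "non_factual"],
                       judge=lambda ls: "undefined")
```

Here `agreed` resolves by majority to "factual_high", while `disputed`, with three different labels, goes to the judge.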
But it depends on the topic. When they are talking about things like Catalonia, or they are in a censure motion against a politician, as they are trying to inflame emotions and passions, the language changes, and only 5% of the statements are factual. They are not using data to defend those kinds of positions. I was explaining that we can speed up the process, but we can also increase our coverage. Marilin didn't say anything about monitoring Twitter and other social networks, because we were not able to do it manually. But now, with our bot and our claim detection algorithm, we can. We are monitoring 25 political accounts. The bot reads their tweets and tries to figure out whether there are factual statements in those tweets. If there is something there, we send that information directly to our head of fact-checking, and for him it's the same as if any other journalist were sending him something to review. So it's like having a person working 24/7 on monitoring politicians. In the case of WhatsApp, we cannot do this, because WhatsApp does not provide us with any kind of API to work with. So we have a lot of people who send us statements to be verified, or photos, or whatever, and this is a manual process, and in the end our newsroom is overloaded by their requests. Well, step three is how we check the data. We follow the methodology I mentioned before, and we have some rules we have to follow. One of them is that we only use data from official sources for the verifications. Another rule is that we have to have three people from the team validating the same process: after one journalist does it, they give all the investigation to another one, who repeats the process to check that everything is correct and the steps have been followed. And we always ask the source. I mean, if a politician said something and we want to verify it, we always ask that politician or that party to hear what they say.
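That Twitter monitoring loop might look roughly like this. The fetching, claim detection and editor notification are passed in as plain functions (the real bot would use the Twitter API and the claim classifier), so every name here is illustrative.

```python
def monitor_accounts(fetch_tweets, detect_claim, notify_editor, accounts):
    """Poll political accounts, keep tweets containing factual
    statements, and forward them for review, exactly as if another
    journalist had submitted them."""
    flagged = []
    for account in accounts:
        for tweet in fetch_tweets(account):
            if detect_claim(tweet):
                notify_editor(account, tweet)
                flagged.append((account, tweet))
    return flagged

# Usage with stubs standing in for the Twitter API and the classifier:
inbox = []
flagged = monitor_accounts(
    fetch_tweets=lambda a: {"@pol_a": ["El paro bajó un 2%", "¡Buenos días!"]}.get(a, []),
    detect_claim=lambda t: "%" in t,           # stand-in for the real classifier
    notify_editor=lambda a, t: inbox.append(t),
    accounts=["@pol_a"],
)
```

Only the tweet with a figure reaches the head of fact-checking's inbox; the greeting is discarded.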
Some people don't know this, but we always do that as a rule. And in this case, this step has to be done by a human, because it's a human who can call the party; it's a human who has the phone number and has to call, talk to them and explain what we are doing: we are journalists, we want to verify this, and so on. And it also has to be a human because only a human can understand the context, the political situation of that same day, that same week even. The context can change, and words that mean something one week may not make sense the next week. So this is a step that has to be done by a human. It's very important. Okay, as Marilin said, this has to be done by a human, so I am talking now. Although the human is really important at this stage, we can automate some tasks to help them in two main situations. First one: what happens if the claim is already stored in our fact-check database? We could simply retrieve that claim and give it to the person who is asking for it. In an ideal world, this would work something like this: we could use this system with live transcription and a claim-matching algorithm to do real-time, live, automated fact-checking during a political debate. So if Trump is telling a lie, and that lie has already been stored in our fact-check database, because politicians repeat the same lies, we can open a pop-up and say: hey, he's lying. This is a mock-up from Duke University in the United States. We are talking, in the end, about research projects, but this is the ideal future we would like to work towards. But what happens? Doing this is not as easy as comparing one sentence with another, because language is complex, and you can say the same claim in many different ways. So as a first iteration, we developed what we call close search. Basically, it's implemented with an Elasticsearch engine.
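In production the fuzzy matching is done by Elasticsearch; purely to illustrate the idea (this is not Newtral's actual query, and the threshold and scoring are made up for the example), a word-overlap version of close search can be sketched like this:

```python
def close_search(query, database, threshold=0.4):
    """Toy 'close search': rank stored claims by word overlap
    (Jaccard similarity), a stand-in for a fuzzy Elasticsearch query."""
    q = set(query.lower().split())
    scored = []
    for claim in database:
        c = set(claim.lower().split())
        score = len(q & c) / len(q | c)   # shared words / total distinct words
        if score >= threshold:
            scored.append((score, claim))
    return [claim for score, claim in sorted(scored, reverse=True)]

db = ["el paro bajó un 2% en 2018", "el pib creció un 3%"]
hits = close_search("el paro bajó un 2%", db)
```

Word-for-word overlap is exactly why this works better for short sentences: long sentences share incidental words with everything, which is what motivated the second, embedding-based system.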
What we are doing is matching words, allowing some fuzziness in the search. But after that, we iterated and created a second version that we call near search. Here, we use word embeddings. Basically, we have trained a system to understand political speeches, so this system understands words that are used in the same context, and it is even able to match claims that are expressed in very different ways but, in the end, mean the same. This second system works better with long sentences; the first system works better with short sentences. So what we are doing is mixing both systems. But the real problem is not here. The problem is that the search needs to be contextual. The meaning of a sentence can be totally different if a different person said it, if the person said it at a different moment, or if the person said it in a different place. So we still have a long road ahead if we want to retrieve fact-checks automatically from a factual database. What we think we can do now is always have a journalist reviewing these results, who then decides whether each result is valid or not. The second task where we can help the journalists a lot is retrieving data. This is a basic information retrieval mechanism. In order to evaluate whether something is true or not, journalists need to go to official sources, get the data, compare the data, and then submit a judgment. What can we do? If we are able to understand the claim, we can extract the entities, build a query with those entities, and send that query to a knowledge graph. This knowledge graph has to be built beforehand by scraping official sources like the INE or Eurostat, and then, when we get the data that is needed, we provide a data chart for the journalists. This, in an ideal world.
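A minimal sketch of that ideal flow, with a small in-memory dictionary standing in for a knowledge graph built from sources like the INE or Eurostat. The entity extraction is reduced to an alias lookup, and every name and figure here is made up for the illustration.

```python
# Toy knowledge graph: (entity, indicator, period) -> value.
# A real one would be built beforehand by scraping official sources.
KNOWLEDGE_GRAPH = {
    ("spain", "unemployment_rate", "2018"): 14.45,   # illustrative figure
}

# Stand-in for real named-entity recognition: map surface tokens to graph keys.
ALIASES = {"spain": "spain", "unemployment": "unemployment_rate", "2018": "2018"}

def extract_entities(claim):
    tokens = claim.lower().replace(",", "").replace("%", "").split()
    return {ALIASES[t] for t in tokens if t in ALIASES}

def retrieve(claim):
    """Build a query from the extracted entities and return every
    knowledge-graph entry they fully cover, ready for a data chart."""
    entities = extract_entities(claim)
    return {key: value for key, value in KNOWLEDGE_GRAPH.items()
            if set(key) <= entities}

data = retrieve("The unemployment rate in Spain fell below 10% in 2018")
# The journalist then compares the retrieved value with the claimed figure.
```

The retrieved entry gives the journalist the official number to set against the claim; the judgment itself stays with the human.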
The problem is that normally we don't have structured data, and official data sources are not as good as they should be. Also, if you want to fact-check a local or regional politician, this is a nightmare, because you would have to integrate with local information systems and understand different regional standards. What do we do once the verification is made? We publish it. We are journalists. We publish it online on our web, and we have this arrangement with Google whereby Google shows our fact-checks more prominently in the results when there is a verification made by a fact-checker. Facebook also issues an alert when people have shared content that we have verified: they send an alert to the user saying, hey, this thing you shared has been verified by Newtral, and the rating is false, or true, or whatever it is. And Facebook also changes its algorithm for that kind of content and shows it to fewer users, because they know it's false or misleading. And of course, we also share all this information on our own social networks, so the verifications spread as well. Now I want to talk to you about just one idea, something we want to explore; we don't yet know whether it's going to work. How can we make the truth spread faster than the lie? We don't know, but we would like to explore whether voice-powered assistants, Alexa, Google Assistant, Siri, whatever you want to call them, could be useful in this fight against fake news. Imagine a system where I ask Alexa whether this thing I have heard about whatever is true, and Alexa asks me new questions when she needs more context in order to answer whether it is true or not. This is very similar to what we normally do when we are having lunch with our friends and talking about fake news: all of us discussing, finding data and asking for more arguments supporting one thing or another.
Maybe, if we are able to create this intelligence inside these smart speakers, we can have a system that everyone can use. Even my mom is able to speak with Alexa; sometimes she has some problems with the name, but she can talk to this device. And in the end we would be able to create a system where it's as if we could talk directly with the journalist who created the fact-check, because we can go deeper, asking Alexa for more details about why this is true or why this is false. Give me more data. Why do you think that? Imagine that scenario. We would like to figure out whether we can do this in the next few years. All of these things we are talking about are, from our perspective, a research project that is advancing. It's not only Newtral working on this; it's a global challenge, trying to make automated fact-checking a reality. Stages one and two are the most advanced; stages three and four we are still working on. So our request to all of you is this: if you are from a research group or a company that works with these technologies, and you think you can collaborate with us and provide something that would help us improve our systems, please send me an email. This is my address. And if you are a talented technical person and you would like to work against fake news, also send us an email. We have a small team of technical people; we are a minority in the newsroom, and we want to be more. Even if you just have an idea, don't hesitate to send it to us, because we would like to discuss it with you. Thank you very much for your kind attention, and we are open to any questions. Thank you. Thank you, Ruben and Marilin. Super interesting how the ability to fight back is evolving. Questions in the audience? I have to cover my eyes; as you can see, we can't see anyone. Any questions? Any questions? Well, I have a question.
What does it look like in five years' time? We are going through a period of fake news, and everyone thinks that what is happening right now is what will happen forever. But obviously, as you are showing, you can fight back. Will the tide shift, and in five or ten years will fake news be a thing of the past? I hope so. There are a lot of people working on it; developers and journalists are very concerned about the situation. I want to hope that it changes, and that we even manage to make the public more informed and more skeptical about this kind of situation. And who checks the fact-checkers? That's a typical question. Sorry: what is the methodology you follow that ensures you are doing your work as you should? And there is not only one fact-checker; there are many fact-checkers, and in the end you can have different opinions about the same fact. The real problem is that some political parties are trying to meddle with fact-checking. In the UK, for instance, a political party created an account called factcheckUK, and in the end it was just the party. I believe that fake news is going to be part of our lives forever, probably, because it can only be changed through education. So it's going to be a long process, and technology is going to evolve in one direction and in the other. Deepfakes, probably in two years we are going to see amazing videos of a lot of people, and I don't know whether the good guys or the bad guys are going to win. I hope the good guys, but only if we act with skepticism and try to be more critical about what we do on social networks can we fight this. Perfect. Well, thank you very much. A big round of applause, because they are fighting the good fight. Thank you, guys.