So hello everybody, welcome to this new Robustly Beneficial podcast. It's going to be a bit unusual today for two reasons. First, we're doing this via Zoom because, like many people, we are confined. Second, we're going to discuss not a paper but a series of videos by a YouTuber called Smarter Every Day. His name is Destin Sandlin, and it's a very big channel, one of the main science YouTube channels these days, with something like five million subscribers. He usually tackles engineering-oriented science problems, since he's a mechanical engineer, so it's very unusual for him to be tackling the problem we're going to discuss today: false information and fake accounts, and in particular how social media companies are trying to deal with them. It's very interesting because he actually gets to interview people from the different companies. He did interview people from YouTube, but he could not film it, so that part is not broadcast on his channel; but there are interviews with people from Facebook, people from Twitter, and also people outside of these companies who have an external viewpoint on what these companies are doing. I think these videos are extremely important, because I still believe that social media are underrated or neglected, especially by the AI safety community, and yet they have a huge impact, as discussed in these videos. There's the example of Cambridge Analytica in 2015-2016, but since then there has still been a lot going on. What it really shows is that dealing with this is a really big challenge for these companies. So yeah, I think the videos are really interesting; they are a bit long but very complete, so I highly encourage viewers to go and check them out.
Maybe now we can discuss the contents of the videos a little more. The first video is about checking the truth of information online, and it starts from the realization that a lot of the content we find on social media is very often a distortion of the truth. The first question to ask is: why would there be fake information online? For the case of social media, the answer he gives is that there can be two reasons. One is to get ad revenue by stimulating engagement: most often, fake news is something very attractive that makes us want to click and look. There's a paper, I think in Science, that showed that fake news content was something like 10 to 100 times more viral than its more factual counterparts. Yeah, exactly. The way I see it, if you limit yourself to the truth, it's much harder to create content that is viral and very engaging, whereas if you don't give yourself any limits and can go into the realm of invented news, it's much easier. The second reason is to manipulate opinion, and this is more recent, in my opinion. Maybe it's interesting to note that, as many of us who were a bit active on the internet noticed over the past two or three years, there was a surge of strange videos that are repetitive, that say the same thing but change the title, or that have an automated background. In the same period I was noticing the same thing in different languages, for instance in French, English, Arabic, and Moroccan, and in the different regions whose language I could understand there was a surge of this phenomenon. Most of the time it seemed it was just about generating clicks and ads, but it later turned out there was more behind it. There was the first motivation: people who just realized that you can game the algorithms
and generate a lot of redundant, automated text-to-speech videos, which are free to produce: you just create a text, and an algorithm reads it with different intonations and with different changes of words, so that the platform doesn't detect the videos as redundant. But very quickly it became clear that, besides the "let's game the algorithm and get a lot of ad revenue" motivation, there was malicious intent, which was apparent just from the topics, and the topics were not very innocent. For instance, the ones he noticed were on politics, on the Clinton versus Trump campaigns; I noticed the same for many politically relevant topics in different regions of the world. And people were not taking that very seriously; it went unnoticed for a while. Then Cambridge Analytica made people realize the extent of related misbehaviors on social media. But this phenomenon of massively produced automated videos, which are effectively the same but are produced in a way such that they are not detected as the same, works like this: you produce a hundred videos saying the same thing, you change the text, you change the photo, so the algorithm doesn't detect them as redundant. Most of them will make five views, or three views, or zero views, but one of them will surge and exceed some threshold, a hundred thousand, or fifty thousand, or twenty thousand, depending on the country. Once this threshold is reached with bots and non-legitimate clicks, it seems that humans start watching it, and this is where the problem starts. Those videos are then shared on WhatsApp and fuel the "infobesity" problem; infobesity is a blend of "information" and "obesity". And then you have the problem that people may be misinformed and engage with the worst informative content there is, and in a context like the current one, it's not something we want, and it's something we need to
prevent. So we need to talk about how to prevent this in a general context. Maybe if we had done this better three years ago, many countries that are today facing fake news propagation in the context of the unfortunate coronavirus disease would have been better equipped; and it seems that we are not equipped today. Yeah, one important thing you pointed out is that there's the problem of fake news or false information, but it's not the only problem, and I would even argue it's not the main problem. Especially regarding the coronavirus, you also want quality information, you want important information to be promoted. The problem is that if you have a lot of content generated by different people who just want to make views, and for whom quality and importance are secondary, then you can flood the internet with information that is not important, that is not critical, and people can neglect important problems. This has become a huge challenge in the case of the coronavirus. Maybe it's a bit different from what the videos were about, but in the context of the coronavirus, for instance, just two days ago, Tuesday March 17 (for those who want to follow, because timing is very critical here), the homepage of YouTube had zero videos on the coronavirus. Still. I checked this morning, we are March 19, and depending on where I log in from (I could log in from a fresh browser), I still get zero coronavirus videos: I get the NBA, something on cheese, some Indian movie, and nothing on the coronavirus, as of this morning, Thursday, March 19. So you can imagine that the fact that YouTube is not recommending anything about the coronavirus to many people makes those people believe it's not a big deal. And it is a big deal that they don't think it is a big deal. Although, in YouTube's defense, I think now there
is so much misinformation around it that it might even be safer not to recommend anything. Of course, the job is clearly not easy either, but arguably there has not been enough preparation for this kind of thing, and in general for promoting quality content. There are a few simple ideas that could have been very useful, like for instance recommending videos from the WHO, the World Health Organization, or from the CDC, the Center for Disease Control. The CDC has a couple of very good videos, two-minute videos, very clear, and they had something like 500,000 views as of yesterday. You could say, well, 500,000 views is quite a lot, it's more than I make, but given the scale of the pandemic it's essentially nothing compared to the scale of the US, if you think of the percentage of the US population that got exposed to this. There are something like two billion users on YouTube, and this video only has 500,000 views; that's about 0.025 percent, which is very, very small. And yet these videos arguably should have made tens of millions of views at the minimum, maybe even hundreds of millions of views. But of course, it's not an easy problem, and I think these videos really show just how hard a problem it is. It's easy to blame these companies, and that's what I do a lot, but I think the blame is on everybody. These problems are also important challenges for academia, and arguably academia has not been thinking about this anywhere near enough. There's a problem of respectability, of how serious it looks on a researcher's resume to work on the YouTube algorithm or on Facebook. There are many researchers working on social media, of course, and things have changed, but still, in the corridors, you still hear people saying this is not strong science.
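To make the back-of-the-envelope reach calculation above concrete (both the 500,000 views and the two billion users are rough figures quoted in the discussion, not verified statistics):

```python
# Rough reach of a CDC coronavirus video relative to YouTube's user base,
# using the figures quoted above (rough numbers, not verified statistics).
cdc_views = 500_000
youtube_users = 2_000_000_000  # "two billion users" as quoted

share_percent = cdc_views / youtube_users * 100
print(f"{share_percent:.3f}% of YouTube's user base")  # prints "0.025% of YouTube's user base"
```

So the video reached roughly one user in four thousand, which is the sense in which "quite a lot of views" is essentially nothing at platform scale.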
Studying social media still does not look as respectable as studying fundamental mathematical problems in machine learning. It's like in every field: the most valued area is the very theoretical part, as in physics with theoretical physics. Many universities have changed their policy and are now hiring a lot of researchers working on social media, but you still have this feeling among researchers, who may tell you that someone working on social media is not as serious as someone doing empirical work on image recognition or natural language processing in a fundamental way. Studying social media needs to be more respected and more valued in academia, and the videos from Smarter Every Day are actually a good example of why social media needs more people working on it. One thing that was quite shocking is the scale at which social media manipulation happens. You mentioned the case of YouTube, where fake videos are created and manipulated so that the recommender catches at least one of them and recommends it to hundreds of thousands of people; but in the case of Twitter and Facebook, they both report closing approximately one million accounts every single day. One million fake accounts created for the sake of manipulation, which is huge. Is that about one percent, or ten percent, of all accounts, or maybe even more accounts than humans create per day? I don't know, but I think Twitter has something like 200 million users, so one million per day would be about 0.5 percent of Twitter signing up from robots every day. And that is huge. From the video, it sounds like the platforms are doing quite a good amount of work to fight these manipulations, but from what
we see happening in practice, it still looks bad, because manipulation is happening. As they were saying, it's easy to blame the platforms afterwards, especially because they also sometimes get money out of these kinds of manipulations: if YouTube promotes conspiracy videos and puts ads in front of them, it gets money out of conspiracy videos. So it's very easy to blame the platforms for this. But on the other hand, as we have been discussing, it's a huge challenge to counter all this manipulation, given its scale and given that the manipulators are very smart. Whenever there is a rule to try to counter them, there is this process: manipulation happens one way, the platform updates to counter that kind of manipulation, then the manipulator updates to still be able to manipulate the social media under the new rules, and so there is a sort of arms race between the engineers of Twitter and the engineers of the manipulation companies. And just to stress how difficult the problem is: on Twitter there are lots of bots that are legitimate bots, things that automatically retweet content. For instance, the French science popularization community on YouTube has this bot, Café des sciences, that automatically retweets content produced by the members of the community; you have these excellent bots in mathematics that automatically tweet papers. Let me play the internet old-timer here: before YouTubers, bloggers had this too. It started with blog aggregators around 2008-2009, bots that aggregated blogs. Whenever a blog post was published, the aggregator would pick it up; you'd have the blog aggregator of France, the blog aggregator of Morocco, the blog aggregator of Switzerland, and then you'd have the blog aggregator of football blogs,
and after blog aggregators many other concepts followed: YouTube channel aggregators, news agencies like the Associated Press, Reuters, and so on. Whenever something is released on their website, there's an automated tweet; nowadays humans tweet it too, but many, many tweets are legitimately automated. If you just automatically rule out automated tweets, you rule out a significant portion of the useful Twitter, which is legitimate and not malicious. So that's part of the problem: you still need to define what a bad actor means. And the bad actors are getting more and more sophisticated; recently they have probably been using natural language processing algorithms to tweet things that sound a lot like what humans would tweet. It sounds hard to speak like a human, but speaking like a human on Twitter is not that hard, if you think of some Twitter accounts (I'm not naming any in particular): sometimes what's tweeted by these accounts is not very sophisticated. And on sophisticated malicious bots: take the political bots, which I was following for quite a while when I was working on this project called "not the kinch". You see them gaming the filters now. Twitter implemented a lot of filters to rule out illegitimate bots, and one of the filters looked at biographies, because those bots had unsophisticated or redundant biographies; so now you see them gaming the biography, they have proper bios. At one time, Twitter accounts with no profile pictures were flagged first, so the bots started having profile pictures, taken from banks of publicly available images; Twitter started detecting that, so they started having their own images, and we now know that you can generate images
that are hard to detect as not being real humans. So it's a cat-and-mouse game, and it's not easy to be on the cat's or the mouse's side, whichever is your preferred animal; it's not easy to be on the Twitter safety team. By the way, coming back to the videos: I think one of the most interesting ones was the one where Destin Sandlin interviewed the Twitter safety team. The thing with Twitter is that they are in general more transparent about how they deal with this, and the drawback is that they get criticized a bit more, relative to their size and number of users. But it's not easy to be on the safety team of any of the three platforms. You sometimes develop some empathy for the Twitter engineers who have to deal with all of this. It's definitely very hard. One difficult point is that it is very hard to say what's a legitimate account and what's a fake account; I don't know if the word "fake" is the right one, maybe just a bad account, because it is really about discussing what we want Twitter to be like. And this also depends on the impact that Twitter is having on the world: you can imagine that some version of Twitter that we all like is actually very bad, because it has side effects that we cannot easily spot, and maybe if we thought longer we would change our mind. So just the definition of what should be done, what should be moderated, which accounts should be removed on Twitter, is a huge ethical dilemma. We don't have the tools to define what legitimate accounts are; we don't have a proper definition, we don't know what to do about this. This is why ethics is very critical, and ideas like the ones we discussed last week about how we build AI are nowhere near ready at this point, I'd say. But as time goes on, we really need to be thinking about how to define what we mean, and to define it in an algorithmic sense,
not in a blurry, vague sense: what we mean by a legitimate account, and which accounts should be removed from Twitter. And it seems to me that it's a bit of a losing game, because as technology improves, as we've seen in previous years, it becomes much easier to imitate a human to perfection. I guess once algorithms reach the human level at entertaining humans, it won't be possible to remove them. There were interesting ideas in the interview with the Facebook team: instead of removing some types of accounts, because removal creates consequences that are also undesirable, simply recommend them less and less, with an exponentially decreasing rate. So the solution could be, as they were saying, that if we know that some types of content are beneficial and some are not, we simply recommend more of the beneficial content. And this is also something that depends on the individual: for some people we would want to recommend different types of content. So I believe that in the future it will be impossible to rule out fake accounts, and I see the only solution as detecting what is good content; whether the good content comes from a true account or a fake account doesn't matter much, you recommend the good content. Yeah, I feel we are already more or less there: the Twitter Turing test is essentially passed for bad Twitter accounts. Let me give you a thought experiment to see why we don't even need to go there, why we don't even need to reach the situation where we have bots imitating humans indistinguishably. Let's come back to these aggregators, the blog aggregators, the YouTube channel aggregators. For now, the consensus is that a bot that just automatically tweets news from some
legitimate website is a legit automated bot; this is not the kind of bot you rule out. But imagine there are camp A and camp B, two sides of a story, and I want to promote side A and drown side B in the noise of side A. Side B releases a press release and side A releases a press release, but then side A has a hundred or a thousand more bots retweeting or tweeting its press release, of course with some tweaks in the wording. Would these be considered legitimate bots tweeting a press release? Per the current consensus, yes: each is just a bot that tweets press releases from an official source. For example, Democrats and Republicans both have official organizations, the Democratic National Committee and, I don't know, the Republican Party, etc. What if I just create bots that legitimately relay their press releases? There's nothing malicious, nothing fake, no lie; I'm just flooding in favor of side A. Yeah, so here the problem I see is that if you just have a simple rule to decide what is accepted behavior and what is restricted behavior, then the simple rule can easily be gamed, just as you described: you would create a thousand accounts that follow the simple rule. But there's actually also a rule that says very similar accounts are forbidden, and that's the problem these teams face: they write an algorithm that blocks unauthorized participation on the platform, and then there are smart people who do their best to keep advancing their agenda, manipulating the way they want, while still following the rules, because following the rules of the platform is what gets you promoted there. Well, spoiler: the basic version of what I described is already solved by Twitter; they have a lot of anti-
redundancy policies; even you yourself can't tweet the same thing twice in a short span of time. But you can think of many variants of this that will not be detected by their system. So I guess we more or less agree; I think we gave several arguments for saying that direct moderation of accounts is very complicated. Overall, the whole idea of censorship is complicated, because it also relates to freedom of speech and things like that. An interesting example given in the interview with a Facebook employee concerned sexual content, and especially what people were wearing in pictures. There was some line that you wanted to draw, but this line may differ from person to person. What they also observed is that if you draw a line, the content that is closer to the line is just much more viral; people tend to like things that are very close to the line, and that is about human psychology, it has nothing to do with algorithms. So, given that we are more drawn to this kind of content, the trick Facebook uses to discourage people from searching for the line and playing with it is, instead of having a threshold, to simply recommend content less and less as it gets more and more sexual. Given this, there's no line anymore. I think this move, this transition, feels much better; it feels like something much more robust to gaming the rules, for instance. And it's also much more important, if you think about it: we talk a lot about moderation and censorship, I think because that's the more attention-grabbing topic, but the real thing that changes everything about a platform is recommendation, especially on
YouTube, where 70% of the views are recommendations by the algorithm. The recommendation algorithm is absolutely critical, and we can play a lot more with it, with smooth constraints rather than harsh constraints, and I think that's interesting. Just to clarify: when I say "filter out" and "filter in", I'm not talking about the classical binary concept of filtering, where filtering out would be removing and filtering in would be showing. Here it means de-recommending and recommending; it's not zero-one, it's pushing the probability of being seen close to zero or close to one. And by the way, talking about law and rights: to the best of my knowledge, it is still not very clear in current laws on freedom of speech how to deal with being de-recommended, or with the other side being over-recommended. In the context of elections you can have clear statements on this from the legal point of view: each party should have access to the same audience. If I take Europe, many countries have air-time metrics, and you have to prove that you have given every side the same amount of viewers, or the same time. But in the general context of freedom of speech, if Facebook starts de-recommending my content, it's not deleted, it's still there, but nobody watches it; it's not clear to me that, in many countries, you can define this as being banned. Banning is still a binary definition. And by the way, many people who are active on internet censorship issues still talk in terms of being banned or not banned, but that is becoming negligible compared to being de-recommended, or having another side being over-recommended. Yeah, so Tristan Harris, who has this podcast called Your Undivided Attention, he,
together with his co-host, whose name I don't remember, sorry, has this notion of "freedom of reach" as opposed to freedom of speech. But the word "freedom" I think is maybe a bit misleading: if everybody wants to say something about the coronavirus, should you give an equal amount of speech to everybody? I don't think so, and I don't think people think so either; I would bet there's a growing consensus that we should listen to the experts, the World Health Organization, the Center for Disease Control, etc. On YouTube you now have an automated thing, which is a good thing: whenever a video is detected to be related to the coronavirus, under it you have a banner saying, by the way, here is what the World Health Organization says. It's good, but I'm not sure many people notice it. I watch YouTube on my laptop, and I don't think I'm representative of the actual YouTube user; I don't think people who watch YouTube on their smartphones really notice the banner or click on it. You have to be on a laptop, where it's big enough to see, so I would challenge its efficiency. For now it's, I think, a good start for YouTube, but we all agreed, I think we've spent the episode saying, that the task is not easy: technically, it is not easy to detect what a good coronavirus video is, and the World Health Organization cannot come up with a definition of what is good, or what the criteria are for a good coronavirus video. Especially since, by the way, there are now concerns raised by medical doctors: recently there's been a debate about chloroquine, this molecule used against malaria, and it's not clear whether it's safe or effective against the coronavirus. There was a professor of medicine in southern France, not really promoting it, but talking about it while the evidence was still very preliminary, and, for example, in Morocco you now have pharmacies running out of medicine that is needed by other patients, just because it contains
chloroquine. Because there is this video on chloroquine in French, which many Moroccan people who went to school understand, people went to pharmacies and started buying medicine containing chloroquine. So all of this medicine, which real patients need for real diseases where we know the molecule is actually efficient, is running out, just because people heard that chloroquine might be good, because this professor of medicine said so. So even a seemingly easy criterion, like "whenever you detect that a scientific authority is promoting the video, promote it", would be dangerous: if you have a professor of medicine saying "I'm working on chloroquine and it shows it may be efficient against the coronavirus", then you have people going into pharmacies and buying chloroquine. It's not easy; you cannot solve it by just telling YouTube to promote a channel whenever the channel is official. The channel in question belongs to a medical institute in southern France, a public, university-style research institute, so even the rule "whenever you detect that an institute with scientific authority promoted the video, promote it" is dangerous. Maybe we can settle on a very small subset, the institutes that deal with the pandemic, namely the World Health Organization and the Center for Disease Control. But I see a problem in another area: first of all, you should not recommend the same video to everyone. This video from this famous doctor in France maybe should not be shown to the whole population; it doesn't do any good there, but maybe it should be shown to people studying research ideas, to researchers. And also, it's not only about the quality of the content
and how true what is said is. If you want something really true, you would ask experts, because they are the ones closest to the truth; but it's also about how the content is understood by the viewers and the effect it has on them. And here, I guess, if you only listen to experts, most people would not completely understand; they would maybe think they understand, but behave in a way that is not appropriate. Yes, and in the coronavirus situation, a lot of the expert content is not user-friendly, and you absolutely need it to be understood by many people; that's why I think the science communicators have a key role to play here. But it's very hard for YouTube to choose which video to recommend. I've actually been thinking a lot about this, and for a long time, for the English-speaking world, I was puzzled: I did not know what video should be promoted. The CDC videos were amazing, and I think they really should be promoted, but they come from the CDC, which most people have never heard of; whereas if you imagine that a YouTuber with five million subscribers, like Destin Sandlin for instance, makes some videos like this, it's going to be much more impactful for an audience that would not be as easily convinced by CDC videos, and that would be less willing to click on them. So these are very, very complicated questions. You also have things like the example you talked about, which is a very nice one because it's about side effects: this video had side effects in other countries, on people who actually needed the medicine. By the way, just to clarify, in case the professor is watching us: his videos contain nothing harmful if watched by researchers; they just don't contain the necessary caveats that are needed for a general audience.
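Coming back to the idea from the Facebook interview of recommending borderline content less and less rather than drawing a hard line: one way to picture it is a recommendation score that decays smoothly, for example exponentially, as content approaches the policy line. This is only a toy sketch under our own assumptions; the function name, the 0-to-1 "borderline" scale, and the decay constant are ours, not anything the platforms have published.

```python
import math

def demoted_score(base_score: float, borderline: float, k: float = 5.0) -> float:
    """Smoothly down-weight a recommendation score (toy model).

    borderline is in [0, 1]: 0 means clearly fine content,
    1 means content sitting right at the policy line.
    The score decays exponentially as content nears the line,
    so there is no sharp cliff an uploader could aim for.
    """
    return base_score * math.exp(-k * borderline)

# The closer content gets to the line, the less it is recommended.
for b in (0.0, 0.5, 0.9):
    print(f"borderline={b:.1f} -> score={demoted_score(1.0, b):.3f}")
```

With a hard threshold, content just below the line is maximally viral; with a smooth decay like this, moving closer to the line always costs score, so the "sweet spot" that people play with disappears.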
Yeah, I've heard this other thing about using masks against the coronavirus: in some countries you wanted to encourage people to use masks, because people already had these masks at home, but in other countries, especially European or American countries, there are not that many such masks, and you actually need them to be used by doctors. So if you tell people that masks are useful to prevent the spread of the epidemic, people are going to buy them for themselves, which is actually bad overall. So it's very, very hard to know what should be recommended, especially in a time of crisis like the coronavirus situation, and that's why I think we need to prepare for this. I think we could have contained the situation much better if the YouTube algorithm were aligned, if we had worked prior to this on how to better promote the right videos; but it's very hard, and we need to start now for the next crisis. Coming back to the general topic of our channel and our projects, robustly beneficial decision-making, not only for recommender systems but for decision-making in general: I think it's high time to also think about decision-making in the context of humans. Some things from the past tended to be seen as has-beens, bad and old-school, and maybe they need to be re-examined and re-evaluated in the context of a crisis; for example, the fact that not everything was open and shared with the public. That's something from the past, because there was no internet and no YouTube, so an expert could not broadcast his or her expertise to the general public. Maybe in a time of crisis you should think about the side effects of putting your expertise online: if you're talking to your researcher peers and trying to convince them they should experiment with chloroquine, it's robustly beneficial to broadcast it to your
researcher peers, but if you broadcast it to the general public, you'd have side effects: a medicine that is needed for another disease, one we are sure it treats, would run out of stock. Yes, so you also have to consider whether you should be open or not in a time of crisis. It's not only about YouTube, Twitter, and Facebook, but also about humans. Should you share a preliminary research finding in a time of crisis? I would bet no. I would share it with your peers, through mailing lists, academic mailing lists, researchers' mailing lists, whatever, but putting it out there in public, I'm not sure that's a robustly beneficial decision. Yeah, definitely. It's hard because it's the first time we face such a big crisis, but I think it's important to reflect on what should be shared publicly, what should be kept private, and to whom to communicate what. I think there have been a few bad decisions, and I think I've made quite a lot of bad decisions myself, but we need to look at this and improve, and also ask how to automate this: what is the procedure we should have been following? We need to think more algorithmically about what to do in situations like this. This is philosophy on a deadline, a very harsh deadline. Yeah, another thing maybe we can talk about before wrapping up is the fact that the human moderators of these companies are being asked not to come to work because of the confinement, and they cannot work from home because of security constraints at these companies, and thus we're probably going to have a lot more problems. Just to put this in context: if some of you have seen posts about the coronavirus deleted from Facebook, for instance, yesterday, or if your own posts have been deleted, it is very likely linked to this.
And this "very likely" is not only from me: Alex Stamos, who was the chief security officer of Facebook before he resigned in 2018, and who is now a security researcher at Stanford, says it's probably linked to the fact that YouTube has human content moderators, and since they are not allowed to work from home now, they are not working, and YouTube is deploying automated content moderators which, as you may have realized, are not ready to use. Because of that, you now have a lot of legitimate content removed, and maybe illegitimate content not removed, we don't know. So yeah, the situation now is buggy, it's messy, partly because those people are not working now. Yeah, so these content moderation and recommendation tools are becoming more important than ever, and they were not ready, because it's a very hard problem, but also probably because we did not work on it enough. So let's work on it. Should we wrap up? Yes. Thank you for watching us today, and next week we will discuss another paper, called "AI Safety via Debate", by Irving and co-authors.