The next presentation will be given by Claudio Agosti and it's called Analyze the Facebook Algorithm and Reclaim Data Sovereignty. Hi all, thank you. Analyzing the Facebook algorithm and reclaiming data sovereignty, that is the topic of the talk. Instead of taking pictures during the presentation, you only need to take this one picture, because the slides are a downloadable PDF, which is more comfortable. My name is Claudio; vecna is my handle on Twitter, but if you look for the #fbtrex hashtag you can find updates about the project and also the material for this presentation. The project is Facebook Tracking Exposed. It's not .com, .org, or .net, it's .exposed. We applied this method to Facebook, but it can be applied to every platform that personalizes your perception of reality. You get personalized content when you are logged in as a Facebook user, but the same happens on other platforms: it happens in the Google search engine, on Twitter, on YouTube. The methodology can be applied to other platforms too; at the moment we are focusing on Facebook. Now, I'm pretty sure that in this kind of audience many of you are not on Facebook, you do without because you can afford to, and that is good. But what we want to show is that the algorithm has an impact on how society perceives the political debate and what is happening. So even if you consider yourself free from Facebook, the society around you is not, so think about it. It was in 2014 that Facebook itself showed how the algorithm is a sort of tool of social control. In this research, published by Facebook itself, they took almost 690,000 users for an experiment. They divided them into groups: half of them were seeing their normal news feed, so the friends and pages they were following, except for the content with a negative sentiment. And the other half saw everything except the content with a positive sentiment.
They knew how those people were behaving before the experiment began, and they verified that the people were changing their behavior. It was not censorship, the content was not deleted, it just was not showing up. So in theory, if you really care about a topic, you can go to a specific friend's profile or a page and look at what they publish. But in these years, people and applications are competing for our attention, so being present or not present in the news feed can be a way to promote or demote content. There is an interesting analysis raised by Zeynep Tufekci from when Ferguson was starting to make the Black Lives Matter movement begin. She realized that in her own Facebook news feed, nothing was showing up about it. That started to make people understand that how things get prioritized or demoted by Facebook may have political implications, because people aware of a topic can feel it is important, and whoever is outside of the bubble cannot. Another, creepier story is from Karin Vajano. She had a group of friends who stayed in touch via Facebook, and one of them got hospitalized. He wrote: I'm in the hospital, I'm going to face surgery. They never saw this post appear. This person died after the surgery, and they found out a month later. They were expecting that the Facebook algorithm would have kept those friends connected, but it did not happen. Now, Zeynep's story and this one are anecdotal references. You can watch your own news feed, try to be critical, and judge whether it is okay or not. But what we need is data, because Facebook keeps insisting on using algorithms to tune how the platform is used. And in the aftermath, well, not the aftermath, one year later, of all the criticism of how Facebook was used during the 2016 U.S. campaign and during Brexit, Zuckerberg's decision was: I'm changing the goal of our product. I will help you find relevant content, to help you have meaningful relationships. Meaningful, sorry, interactions.
But how can they know what a meaningful relationship is for me? And that is the main point: believing that because you profiled a user, because you surveilled that user, you can start to give them what matters to them. And that is not okay. The political message of Tracking Exposed is that users should be empowered by having their own algorithm. That is what decides what is a priority for you; only you should have this algorithm under your control. And algorithm accountability and algorithm analysis is a growing phenomenon. This is a non-complete list of organizations, open source projects, and academic groups doing algorithm accountability. Some of them are directly funded by the current monopolists, by the status quo, by the GAFAM. How can you spot them? Well, I don't know yet, but I have a way to understand if someone is doing corporate-friendly politics or not. If the result of their reports, their publications, their analyses acknowledges that the power is with Facebook and Google, with the monopolists, and they ask for more accountability, more transparency, more third-party review, that still keeps the power in the center of the network. If instead you ask that the users, the people at the border of the network, be more empowered, that for me is political activism for the digital age. And that is what we are doing. And still we have articles like this one, published ten days ago in The Guardian, which in my opinion shows that the debate about algorithms is not yet informed enough. "Revealed: how populists use Facebook to win power." This triggered me for three reasons. The first: it assumes that the only way to win an election is to win on Facebook, which implicitly reinforces the idea that Facebook is the place where you have to be. But this is not true. The second is that engagement seems to imply victory somehow.
Because the two persons, Luigi Di Maio and Matteo Salvini, are the two deputy prime ministers, it may seem that this is an actual way to win an election. That reinforces companies like Cambridge Analytica, because behind the scenes they will be reinforced by this analysis, going to the next politician and saying: look, I can increase your engagement because I've profiled a lot of users, do you want to buy my services? So that is one reason why I don't like this analysis. But the third one is that engagement, in this case, is measured by summing together likes, shares, and comments. And engagement is the result of three different variables of different natures that demand different accountability. If you invest a lot in your political campaign to have a bunch of people writing content for you, or if you have a large network of people who spread your message, that creates more content, and implicitly you will have more engagement from this kind of multiplication. If you have Facebook in the middle deciding that it wants to prefer specific content over other content, that is an agenda too. And then you have what people want, in the sense that if they are actually interested in a specific political message, they will engage more and promote it more. But how can you judge anything if you just use engagement as an absolute number that is supposed to express all these variables of different natures? Also, if you are using the metrics the system gives you, you are playing on their field, so somehow you have already lost from the beginning. We can look at these as separate components. There is someone producing content for Facebook, so everybody who publishes political content, articles, or their own independent posts; there is the Facebook logic; and there are the people. Normally, the people producing content are kept accountable by the electoral authority, to understand how much investment is behind a certain party.
Facebook's logic is what we want to keep accountable, and it is the variable we have to isolate; what the people want is outside of our interest. We have to think of Facebook as a passive actor with an agenda. And this agenda is not a human agenda we can imagine; it's not preferring Republicans or Democrats, populists or neoliberals. It is probably a different agenda, one that wants to keep their business running wealthily. In this example, you see some pages represented with different colors that today have published different content: five texts, two videos. They go into the black box of Facebook. And at 6 PM, a user connects and gets a personalized timeline. Inside this timeline, some advertising is spread. In the moment Facebook decides the timeline, it is also assigning priority. We want to expose how tracking and profiting from user data has had a negative impact on society. To do that, we have to collect evidence, because the algorithm is just shaping your perception, hiding a post or making a post appear more than once. We have to record what Facebook is giving to you. To do that, we made a browser extension, that is the logo, that runs in your browser and collects what Facebook is selecting for you. And still, it's quite hard, because you just collect a bunch of data, but then what do you do with this data? Because it's black box testing, we started by reducing the variables we can control. We made some accounts, quite hypothetical, with zero friends. They were accessing for the same amount of time during the Italian election campaign, because it was a defined window of time and because you can follow pages related to political speech. We selected the same pages, 30 pages, for all the users. We just played with the likes. So one user was liking the center-left, another the far right, Antonietta the left. The colors are somehow associated with the political parties in Italy. Olivier, the pink one, was the undecided; that user was not liking anything, just following passively.
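The experiment design could be written down as a small configuration like this. This is a hypothetical sketch, not the actual fbtrex code: the page list is a placeholder, and the field names are assumptions. The point it illustrates is that every profile follows exactly the same sources and only the "likes" orientation varies.

```python
# Hypothetical encoding of the experiment design: every bot profile
# follows the same 30 pages; only the likes orientation differs.
PAGES = ["page-%02d" % i for i in range(30)]  # placeholder page list

PERSONAS = {
    "Santiago":   {"likes": "center-left", "follows": PAGES},
    "Michele":    {"likes": "far-right",   "follows": PAGES},
    "Antonietta": {"likes": "left",        "follows": PAGES},
    "Andrea":     {"likes": "right",       "follows": PAGES},
    "Britta":     {"likes": "five-star",   "follows": PAGES},
    "Olivier":    {"likes": None,          "follows": PAGES},  # undecided: no likes
}
```

Because the followed pages are held constant, any difference observed in the timelines can be attributed to the single variable that changed, the likes.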
They acted like bots controlled by us, but they were not doing any kind of engagement or any activity besides checking what Facebook was selecting for them. They were controlled through an auto-scroller. So, 13 times per day, from 7 in the morning to 7 PM, something was connecting once per hour, collecting some posts, and keeping this evidence. And this ran for three months. Then we could start to do our first comparison. Once we have controlled all the variables, we can, for example, check whether the content that composes your timeline is of the same kind, where a kind is a photo, a post, or a video. And you can see that with just a few likes of difference, Santiago's center-left was getting 60% posts and 40% photos. Michele, who was liking the far right, was getting 53% photos and 38% posts. That is already a way to get a grasp on how users who, in theory, are exposed to the same kind of pages get a different experience. Andrea, of the right, is the one who was getting more text. The quality of information may also depend on the kind of media you are exposed to. We checked whether this pattern was recurring during a single day as well. These are all the accesses made during the 9th of February, from 7 to 16. You can see that the pattern is based on fewer posts, so the differences are bigger, but somehow it's always the same. It looks like Facebook, behind the scenes, actually sets some components, some percentages, some doses of ingredients to compose your information diet. That is the metaphor we tend to use. Once we found this metric, this way to compare data and try to understand the algorithm, we applied it also to our own profiles, to understand whether we are more similar to Andrea, who gets a lot of text, or to Michele, who lives with mostly pictures. In this example, where we analyze ourselves, we only see anonymized usernames.
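The content-kind comparison described above can be sketched in a few lines. The `kind` field and the record shape are assumptions about the collected post format, not the real fbtrex schema:

```python
from collections import Counter

def diet(posts):
    """Percentage of each content kind (post, photo, video) in one timeline."""
    counts = Counter(p["kind"] for p in posts)
    total = sum(counts.values())
    return {kind: round(100 * n / total) for kind, n in counts.items()}
```

Running this per user and per day gives distributions like the 60/40 versus 53/38 split above, which can then be compared across profiles that follow the same pages.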
In this case, we have papaya-shawarma-ice-cream, who is getting a majority of posts, so text, then pictures, some videos. The other user, the one at the bottom of the page, pasta-spaghetti-chocolate, is getting videos and posts. That makes you see that this can be a way to perceive, in a concrete way, how the algorithm is treating people differently. Another metric is how frequently the same posts get repeated when you access Facebook again, after one hour in our case, or whenever you access it. Facebook may or may not show you all the posts again. In this graph, the column on the left, "observed", means the number of times a post was observed. So everything has been observed at least once. Then a smaller amount has been observed at least twice, which means that the post was present in two timelines. And so on. And again, you see that Andrea, on the right, is the one who was getting much more fresh content: the likelihood that a post survived a refresh was around 30%. Instead, Michele, on the far right, was getting all the selected posts repeated. And the other users had other behaviors, probably more balanced: something that is considered interesting for you gets repeated, and something else is just forgotten. This was taken between the 8th and the 14th of February, and this is some weeks later, because the algorithm keeps changing. So it's also interesting to take a picture of this phenomenon at specific moments in time and see how it changed over the time frame. But this monopoly of the algorithm is an issue also for those who invest in the platform and depend on the platform's logic to spread their content. For example, we took three of the main publishers in Italy: Il Giornale, Il Fatto Quotidiano, and La Repubblica. What we are seeing here is only the amount of content they published during the electoral campaign. So Il Giornale just worked much more than La Repubblica.
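The repetition metric could be computed like this. It is a sketch under the assumption that each hourly access is stored as a list of post identifiers:

```python
from collections import Counter

def survival_rate(timelines):
    """Fraction of distinct posts that appeared in more than one refresh."""
    seen = Counter()
    for timeline in timelines:
        for post_id in set(timeline):  # count each post once per refresh
            seen[post_id] += 1
    repeated = sum(1 for n in seen.values() if n >= 2)
    return repeated / len(seen)
```

A rate near 0 means mostly fresh content on every access (like Andrea); a rate near 1 means the same selection keeps being repeated (like Michele).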
Probably because they have more people, or because they have a different policy for sending their publications to Facebook, I don't know. Then we can assume that, if we were in a fair world, their effort would also be reflected in how their followers see their content, so the percentages, the ratios, would be the same. Or, if we believe in the filter bubble as a binary concept, you could assume that the user who was liking the green content, the one on the right, would only get content from Il Giornale and not from the others, and so on for the others. So there are two hypotheses: either the fairness of the market, or the filter bubble. And the reality is more complex. You see that La Repubblica, despite being the outlet that was publishing fewer posts than the other two newspapers, was present in all the timelines in a certain amount. Il Giornale, the one that was getting the likes from the green user, is in fact present in Andrea's column in the large majority. And Il Fatto Quotidiano still seems to be treated okay. In another phase, between the 19th and the 26th of February, this starts to be even more extreme. La Repubblica is present in every timeline. Whoever is liking the green one is getting less of it compared to another user, and whoever is liking Il Fatto Quotidiano, the yellow one, is actually getting it. For Britta, of the Five Star Movement, the concept of the filter bubble seems much more visible. It's interesting that when I talk about these results with Italians, they know the three media outlets and they try to elaborate some justification. One justification is: well, Il Giornale is spamming, it creates a lot of content, so it gets penalized. Maybe that's a possible consideration, but if you are doing business through Facebook, maybe you should know about it. The other hypothesis is that La Repubblica, because it's the most bipartisan outlet among them, gets treated better, since it is more in the center.
But that means that the algorithm is flattening society, reducing the diversity that we could have with other sources. Another hypothesis is that La Repubblica, having more likes than the others, is therefore treated better. But that would mean that in 2018 the three potentially biggest news media are Cristiano Ronaldo, Shakira, and Vin Diesel, because they are the three individuals with the most likes. And again, that's not okay. It means that the Facebook system reinforces the current status quo, the current leading voices. We don't want to just make reports and analyses; our goal is empowering the masses. It's curious that Copernicus, the inventor of the heliocentric theory, is represented in conversation with God, while Galileo Galilei, the inventor of the tool that enabled the masses to verify the heliocentric theory, is depicted in front of the Roman Inquisition, because the church was pressing on him. And how similar the power of the Roman church looks nowadays, with Google taking down our extension for trademark infringement. After some months of penance, we finally got our extension restored, but it's something we have to think about. If we want to challenge this status quo of the algorithm monopoly, and we have to use the same tools that are owned by the same companies, we need to find a smarter strategy, otherwise we will always be subject to this. We want to advocate for an open approach, to let other people understand how the algorithm has an impact on their lives. We only use the election because it's a simple story. For sure it is the one moment in which the global north feels exploited. Like when we were talking about privacy: before Snowden, in Europe, we were normally seen as paranoid; and about algorithmic influence and manipulation, again as paranoid, until you have some kind of worldwide reference such as Snowden, or it can be journalistic.
But we use elections just to make a simple story: you do the analysis during a specific time, you know which parties are in play, and that makes the project self-defined. Our real goal is to enable others to understand how the algorithm has an impact on your life or your group. And there is a simple method you can use to understand if this is for you. If you are following this talk and you understand what I'm saying, you know enough to do this test. If you know, in your life, in your context, some group that is in a conflict, it can be around a narrative, it can be a group of people, you can help them analyze how the algorithm is treating them, because quite likely it's not treating them well. They are in a conflict, they're probably a minority, and if the algorithm is reinforcing the status quo, we have to find these cases, bring them to the public, and tell them as stories of algorithm abuse. And then we, as a group, can help by providing the methodology or suggestions, because our goal is to provide technology that lets other people understand and assess algorithms. We don't want to write reports. Another very simple part of the method: when you install the extension, you can open the panel and, as part of a study group, write a random code name that the people of the same study group know. In this way, you are tagging your contributions with this code name, and whoever belongs to this group can download and analyze the data. You can do it with a group of friends, in your class, in your family, whatever. The third point: it does not matter if you're running a test with fake users, fake profiles, like bots with an auto-scroller, or with your individual user. In all cases, if you tag them so you can compare them together, that is a way to start to understand how differently you are perceiving the stories. Of course, if you saw South Park, episode four of season 21, they explain a great idea of how to challenge Facebook. But we are not so cool.
We are just saying: don't delete Facebook, give your profile to science. Your profile is a unique way to observe the network. Unique in the sense that you have selected unique sources, and in the graph there is probably no other person like you. Precisely because Facebook has profiled you, and you don't know how you are profiled, your profile is an observation point for understanding how Facebook is behaving. But of course, if we're managing data from volunteers, we have to make responsible use of technology, otherwise we are no different from Facebook. And now you will get the full list of our ethical commitments, to explain what we do and what we do not want to do. The first point is that Facebook Tracking Exposed, fbtrex, is not a social media intelligence tool. We only observe what appears in the news feed, not what appears on individual pages. So if you are on facebook.com/, that is observed; if you are going to /something-else, it is not. We respect people's choices, in the sense that we observe only the public posts, the ones shared with the world. This is also compliant with the terms of service, in the sense that people who post content publicly know that someone else can take it, and we are part of this someone else. Additionally, the GDPR demands that we do something more. If we're treating data, if we're collecting data published by an original author, we want to let this original author exert their GDPR rights. To do that, this person will have to log in through Facebook, check whether there is data about them on our server, and delete or change it. This is not yet implemented, but it's something we have to face before the next campaign. And we consider the timeline, that is, the sequence of public posts collected, as personally identifiable information. Every time a post gets collected, a banner appears on top of the post, saying the post has been recorded, with a link to access your data; or saying the post has not been recorded.
That is the only client-side check, in the sense that the Facebook HTML changes quite often, therefore it was impossible to do the analysis in the browser extension. We only verify the privacy level in the browser extension, then we take the entire HTML, send it to the server, and there the analysis pipeline starts. Which data are we collecting? Only the public posts. If your friends share something with a friends-only audience, it is not seen, and the same goes for custom audiences, the ones with the cog. There is a difference between our tests and adopters like you, because in our tests we are using fake profiles, so there are no human rights involved, and they are following public sources like media outlets and politicians, so there is no issue in releasing that data set. That's why the data set of the Italian election, or also of the... are you sure? 10 minutes? Only 30 minutes. Okay, well, I'll go fast. An adopter should have exemplary control. We know which privacy capabilities we want to give to the user, but it's quite difficult to implement a meaningful UX; that is one of the limits we are facing. Luckily, we got our first important funding in December of this year, and this is one of our goals. The data observed by your profile is only yours; only you have access to it, not us. What we analyze is aggregated data. We want to let users customize their own data retention policy. The delete operations, at the moment, are made by us on request. And at the moment you are identified on the server because the browser extension generates a public/private key pair. So we do TOFU, trust on first use, of the key, and we start to collect the data associated with a public key. This should change in order to have a better-defined security policy, and we have to define these things in January. We are open to any kind of discussion; there is a GitHub repository. You can, as an adopter, share a portion of your informative experience.
This is an example of what we want to give. Michele, the far right, Britta, the Five Star Movement, Antonietta, the left: they saw an event that happened during the election campaign. It was in Macerata, when a racist started to shoot migrants. We were checking how this event was seen by them. The Venn diagram is the kind of visualization that, in the future, can also let you and your friend, your partner, your colleague see how differently you perceive the debate around a topic. To do that, we pass through a semantic analysis system that looks only for keywords referring to Wikipedia entries. That is a way to avoid full-text search and to keep only the meaningful elements, the ones that have a Wikipedia page, extracted from the articles and posts. Analysis of the aggregated data runs in our database. But again, we want to empower others, so how can we find a compromise? The idea is that nobody will have access to the database besides us, and we will enforce some kind of protection that can ensure accountability. But if someone develops an interesting research question that can run on the database, we can make the database usable for research in the public interest. This is only acceptable if you are not exposing individual behavior. That is not something that can be verified formally, so we will have to review case by case. And here is an example. Wolfie Christl asked how many posts, on average, are sponsored on Facebook, because he was seeing that one in every four posts was paid for by someone. So we made an analysis of the percentage of sponsored posts in every timeline collected, month per month, considering only the timelines with more than five impressions. In January, 15% and 10% were the highest, over 20,000 timelines collected. Then in February the rounding changed, now it is rounded by three. You see that 12% is the most likely percentage of paid content, 12% again in March. Then you see that in April it starts to increase.
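The actual analysis is a MongoDB query in the fbtrex backend; an equivalent sketch in Python looks like this, where the `sponsored` boolean per post is an assumption about the stored record format:

```python
def sponsored_share(timelines, min_impressions=5):
    """Per-timeline percentage of sponsored posts, skipping short timelines.

    `timelines` is a list of timelines; each timeline is a list of post
    records carrying a `sponsored` flag. Timelines with too few
    impressions are excluded, as in the analysis described above."""
    shares = []
    for posts in timelines:
        if len(posts) <= min_impressions:
            continue
        paid = sum(1 for p in posts if p["sponsored"])
        shares.append(round(100 * paid / len(posts)))
    return shares
```

Plotting a histogram of the returned percentages, month by month, gives the distribution the talk describes, with 10-15% as the most likely share of paid content.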
Here it is, represented as a percentage. The code to do that is this script. It's pretty simple: it's just a query on MongoDB that reduces the identifiable data to compute an output. But still, Facebook said: we're committed to doing our best for transparency in political advertising. Yes, they've written a lot about it, explaining that they really care. But behind the scenes, you see that they are apparently doing their best to prevent third parties like us from understanding what's going on. This Twitter thread explains how they are generating a random number of span and div elements with random classes, and splitting the word "sponsored" into multiple sections, with fake letters in the middle. And there is a dedicated anti-scraping team. This is not something recent; it's something we have observed for a year, but now it's getting worse. Now the fake "sponsored", or "patrocinado" as it is in Portuguese, because it's localized, is present in every post. To be honest, this is what a bad day looks like for me. I wake up in the morning, I look at the statistics, and I see that one of the parsers is not working. This means the HTML has changed and I have to update my parser. So I make coffee and update the parser, and because we keep the HTML collected, we can recover the lost data. But I feel this is a corner case for free software, because everything we made is under the GPL, the backend is under the AGPL. For the parser pipeline, though, I don't feel really comfortable, every time I spend new days fixing it, making it public again, because somehow I feel that Facebook is like Jupiter: the most massive, ancient, biggest planet in the solar system, a gas giant, and we are a fart in space. With this kind of power difference, I want to suggest to you, playing open source against Facebook's paid staff is a bit tricky. In theory, if we had a robust community, big enough to counteract every change Facebook makes, the GPL would work great.
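To make the obfuscation concrete, here is a toy model of it and of one countermeasure. The decoy letters sit in elements that a real stylesheet would hide; the `HIDDEN` class set below is a stand-in assumption for resolving that CSS, and the class name `z9x` is invented for illustration:

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect only the text a user would actually see, skipping elements
    whose class marks them as hidden (decoy letters in the toy model)."""
    HIDDEN = {"z9x"}  # assumption: classes that resolve to display:none

    def __init__(self):
        super().__init__()
        self.hidden_depth = 0
        self.out = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.hidden_depth or any(c in self.HIDDEN for c in classes):
            self.hidden_depth += 1  # inside a hidden subtree

    def handle_endtag(self, tag):
        if self.hidden_depth:
            self.hidden_depth -= 1

    def handle_data(self, data):
        if not self.hidden_depth:
            self.out.append(data)

def visible(html):
    parser = VisibleText()
    parser.feed(html)
    return "".join(parser.out)
```

With the word split across spans and fake letters in the middle, recovering the label means rendering what is visible rather than matching the raw markup, which is exactly why every Facebook HTML change can break a parser.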
That is what it is meant for. But in this phase, I feel a bit of concern. And there is also another kind of ethical concern, open to debate and confrontation, because algorithms may take the shape of another form of oppression. If you look at the debate we are currently having, we see many groups, sponsored organizations, researchers, politicians calling for artificial intelligence able to fight disinformation. As if a tool could decide what is true and what is false. I mean, it's conceptually wrong, but this can be countered only if we have enough people who can explain why it is wrong, and hopefully this community is the one that can be a bit of resistance against the lobbyists. Still, we at Tracking Exposed advocate for algorithm diversity: the idea that you should own your own algorithm, because that defines your priorities. In the other case, you have the platform's algorithm, a platform deciding what matters for everybody else. If we have algorithm diversity, we will have fully empowered, connected citizens. Like the dream: the distributed network, with you also owning the way your data is indexed, fully independent. In the other case, you have imposed, obscure, unaccountable values: something that someone else decided matters for you, for your society, for your community. But nothing is so simple, because we have already seen that phenomena such as polarization, conspiracy theories, anti-vaxxers can be created by this intermediated communication. On the other hand, if there is some entity in the middle that is responsible for what it is doing, some kind of content verification may happen. So we are pushing for algorithm diversity because it's the opposite of the status quo, but in theory, in the long term, I guess we'll have to find some middle ground. Only an informed society can think critically about what is the better choice for it. That is our message.
What we develop, for this phase, are simple, stateless tools for algorithmically literate people. By stateless I mean idempotent: something that does not keep a state, that is not business machine learning, something with variables that you can control, see, and play with. There is an opportunity to do an interesting experiment that can be run in many countries: the European elections in May 2019. If you look here, there is a small action plan that explains how we want to deal with the European elections; it will not be based only on bots, but mostly on having people actually engaged in the different countries. We have to consider that, sadly, the trend of these years is that politicians want to be on social media because it makes them feel disintermediated. They believe themselves to be disintermediated, while they are intermediated by an algorithm. But they believe they can reach their own audience. This implicitly makes the social media platform more powerful, because it can decide what is going to be seen by whom. And it also gives us more reasons to do this analysis: we want to see how this political content is treated by Facebook. And a special announcement that we are going to release on this stage is a tool that exposes, via RSS feeds, all the content that matches a specific semantic keyword. So imagine that the data gets collected, separated from your timeline, just the public posts, then there is the semantic analysis of the posts. You have a set of semantic keywords, and then you can follow the data set through RSS. For example, if you subscribe to facebook.tracking.exposed/feeds/facebook.xml, you will get all the posts that match "Facebook" as a semantic keyword. Of course, we have to give the reader much more control, because otherwise that is just a way to be spammed by strange content: for example, language selection or other criteria.
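The feed side can be sketched with the standard library. This is an illustrative sketch, not the fbtrex implementation: the post field names (`keywords`, `title`, `permalink`) are assumptions about the output of the semantic analysis.

```python
import xml.etree.ElementTree as ET

def feed_for_keyword(keyword, posts):
    """Build a minimal RSS 2.0 feed of the posts tagged with one semantic keyword."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"fbtrex: {keyword}"
    ET.SubElement(channel, "link").text = (
        f"https://facebook.tracking.exposed/feeds/{keyword}.xml"
    )
    for post in posts:
        if keyword not in post["keywords"]:
            continue  # feed carries only posts matching the semantic keyword
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"]
        ET.SubElement(item, "link").text = post["permalink"]
    return ET.tostring(rss, encoding="unicode")
```

Any RSS reader can then subscribe to the generated XML, which is what makes the keyword-based following independent of Facebook's own interface.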
But precisely because we have a free software tool, we are open to discussing what those criteria should be. And there are the European elections: all the conditions are there to make this approach of owning your own algorithm more viable and more experimented with. The Nabaztags are in the picture because they can read RSS. So if you have a Nabaztag, you can have it start reading random Facebook things. It's an ambitious plan, because we want to keep a database in the collective interest and make it privacy-preserving; because we want to reach out to smaller groups and explain how algorithms are harmful for them; we want to cover other platforms, YouTube Tracking Exposed is in a shitty alpha, but something is moving; and we want to communicate how algorithms are treating you, showing how our technology works behind the scenes over your data. So we really need a lot of people, and we are very open to welcoming newcomers; we want to set up a system, a method, for onboarding new volunteers. There are some small grants, but considering the public interest of algorithmic accountability, if it's possible to do fundraising, we are open to it, as long as it does not compromise our ethical commitments. Most of the technology is written in Node.js, in JavaScript, and the analysis can also easily be made in Python. So if you are a data scientist interested in this data set, beyond what we already published on GitHub, reach out to us so we can understand which kind of interface you may want. So, if I understand the CCC correctly, normally you applaud at the end, but please don't direct it just at me but also at all the proud contributors. Thank you, Claudio. I guess we have lots of questions for this excellent talk. If I don't see any urgent questions here... over there, at microphone number three, please. Yes, hi. Thanks for the informative talk. I have a question not on the narrow subject of the Facebook algorithm but on another aspect of it.
It occurred to me and some friends of mine on multiple occasions that Facebook displayed ads which seemed to have no context in, for example, my browsing history or my friends', but did match conversations we had near our smartphones. For example, once a friend of mine talked about the idea of traveling to a small, specific village in Austria, and then suddenly an ad popped up for a hotel over there, or something like that. Do you think it is a misconception of mine, or paranoia? Have you heard of issues like that? And if it is the latter, is it just an epiphenomenon even though we never actually searched for things like that, or do you have any reason to believe that Facebook might listen to conversations and tailor the ads to them?

I have an answer. Mark Zuckerberg testified in front of Congress that they do not listen, so I guess he was not lying willingly in that context. But an article on Motherboard explained how other apps are listening to the microphone and the user, and the result of that profiling can also be used by Facebook. That is the situation as I understood it, but this is about mobile security: which apps you have on your mobile phone, what they are listening to, and what they have access to. Facebook is apparently just using this data as third-party profiling, which they integrate into their own dataset.

We're heading over to microphone 2, please.

Hello. The user that got shown the far-right content also got shown more videos. Is it possible that this user simply liked more videos before? In other words, is it a simple recognition of the types of Facebook posts, or a deeper recognition of communication structure and such?

To be honest, we didn't keep track of that; the likes were mostly scattered across the specific sources attributed to each orientation, and there was no methodology on the likes. But that is an interesting question, because another test can indeed be done with this kind of attention in mind.
Because maybe the source is already very video-heavy, so automatically it likes more videos. Is it a theory? Well, it's a theory.

OK, let's stick to microphone 2.

Hi. Facebook doesn't just get paid for straight-up ads; you can also pay it to boost a post on a page and reach a larger percentage of your followers. And it actually discloses some of its algorithm and lets you choose a little bit whom you want to expose your posts to. So I wonder if you did any data collection from the point of view of the pages, because there seems to be a lot of data there.

No, not yet. The point is that we observe only what appears on the news feed, and that is the limitation. Looking more at the advertising logic may make sense, but it is a development we haven't considered yet.

Let's get some internet input; my signal angel is waving hands. Go ahead.

Yes, thank you. Happy from the IRC wants to know: how should Facebook handle the situation, in your opinion?

Which situation?

What could Facebook do better?

OK, interesting point, because our goal is not to build a replacement for Facebook, but to change the culture. In theory, imagine that Facebook would give you only chronological, machine-readable content, and then in your client you have your own algorithm that decides what appears to you and what doesn't. That would be the best-case scenario, because it means you are using Facebook as a truly neutral platform that is only intermediating data without applying any kind of filtering. Then maybe you realize that your own algorithm is shitty, because you need to understand, analyze, and put in perspective trends that are happening in your region or your context, and that may justify using other sources to evaluate the news feed you are getting and to select what matters for you and what doesn't.
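The client-side scenario described above can be sketched in a few lines: the platform delivers only a chronological, machine-readable list of posts, and a user-owned scoring function decides the ordering. The post structure and the scoring rules here are invented for illustration; they are not anything Facebook or fbtrex actually ships.

```python
# Sketch of the "best case scenario" from the answer above: the platform
# serves chronological, machine-readable posts, and an algorithm the
# user controls decides what surfaces first. Fields and scoring rules
# are hypothetical.
from datetime import datetime

posts = [
    {"time": datetime(2018, 12, 29, 9, 0), "source": "friend", "type": "photo"},
    {"time": datetime(2018, 12, 29, 10, 0), "source": "page", "type": "video"},
    {"time": datetime(2018, 12, 29, 11, 0), "source": "friend", "type": "text"},
]

def my_algorithm(post):
    """A user-owned scoring rule: prefer friends over pages, non-video over video."""
    score = 0
    if post["source"] == "friend":
        score += 2
    if post["type"] != "video":
        score += 1
    return score

# The user, not the platform, decides the ordering of the news feed;
# swapping my_algorithm for another function changes the whole feed.
feed = sorted(posts, key=my_algorithm, reverse=True)
print([p["type"] for p in feed])
```

The point of the design is that the filtering logic is inspectable and replaceable by its owner, which is exactly what the secret news feed algorithm is not.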
That means distributing the responsibility for what can influence the perception of the stories among different entities, and it means that users will always have their individual agenda respected: they can change their own algorithms and maybe customize them, share them, and so on.

Microphone 4, please.

Thank you for the talk again. Does Facebook have any ability to realize that the data of its users is being collected by your service?

The web extension is running, and the web application served by facebook.com could run some tests; if they are doing active tests to spot the presence of the extension, that is something we should be capable of spotting ourselves. We do not have that ability right now, so if any of you are experts in web extension development and can help us figure this out, please do. Otherwise, on the dataset side, there is no way to tell whether a user copied the data from their timeline.

Another question from the internet.

Yes. I guess Facebook's algorithm is changing constantly. How is your research project dealing with that? Aren't your findings out of date very quickly?
I don't care, because the point is not to reverse-engineer the algorithm. We are not aiming for that, because it is impossible to guess the complexity of the variables behind it. What matters is showing that the algorithm exists and has an agenda, and that even if you believe it is doing something in your interest, it cannot be, because only you can know your interest. Facebook can keep changing it, even tomorrow, in directions that are not in your interest.

Microphone 2, please.

As you mentioned at the beginning of your talk, you also created some fake accounts for your research. Did you run into any issues with those accounts getting blocked by Facebook, or needing to verify, or something?

We gained a lot of experience in how to create accounts. When you have a virgin user with no profile behind it, following the same pages, with zero friends, Facebook may spot the anomaly and may eventually also treat you differently. That's why, even if that was OK at the beginning of the year, after all the publications and talks we made about it, it is no longer something I would feel comfortable suggesting. We can experiment with something else, also with friends who are interacting, as long as it is documented in the methodology. As for the issue of accounts being taken down: mostly we were using one SIM card to create each dedicated user, that's all.

You divided the types of posts into videos, photos, and posts. Did you analyze them visually too? Because we did some research and found out that it depends on which account creates those posts. For example, the German AfD posted lots of photos, but within a post container, so it looked like they did way more postings and way fewer photos; when you really looked at the timeline, you realized it was more photos than all the other accounts, but it didn't show up because it wasn't in the URL.

You are right. We are improving this ability to extract metadata; for example, when you share a photo you can also put a lot of text on top, and that
should be counted. So now we have started using a much finer analysis, but in that analysis we were only using the URL to understand the kind of content.

Over there, microphone 3.

Hi, thank you for the very interesting and important research you are doing. I was wondering if, and if so what, role you see for social scientists in this type of research, especially in the research design and in the output interpretation phase. Thank you.

Social scientists... besides the fact that I am not one and never fully understand what they can do... for example, the help we got during our test was someone selecting the Facebook pages so that they span the political spectrum, or someone who knows which keywords express a certain issue. For example, a woman in Catalonia was investigating harmful speech and machismo, and she was using a lot of profiles of women to observe how this kind of content was surfacing; she knew the keyword list. This is an example of metadata that a social scientist can provide as input, with the research then developed around it. Another researcher wants to see, by following random pages, how hate speech from politicians surfaces in the narrative; this means they have to select some politicians to use as a reference, analyze their publications, and check whether what appears on a random page depends on them or not. Mostly we have to provide tools that give social scientists dashboards and analytics that let them understand what is going on in the timelines of users who are actively participating in the project. Imagine you have a research group of 20 or 30 people: you will have your own special URL that tells them, if you click opt-in, you are accepting to join the research made by this researcher, which will investigate A, B, and C, will last for the next two months, and you can get your findings here.

Microphone number four, please.

I'm a Facebook user myself, and I often see these paid advertisements from entities that I do not have a clue about. Is
your dataset including these paid advertisements? Because you focused on certain entities that you know of yourself, but is there anything that popped up?

To be honest, we collect everything that is public, so if the advertising appears as a public post, shared with the icon of the world, this evidence is collected. The fact that you don't know who is advertising on your timeline is a problem: you are a target for groups you don't know. We cannot explain to you why, because the reason they are targeting you is something outside of our domain, but the fact that you are targeted is something that can be collected. At this moment we are collecting these posts, but we have an issue in finding the sponsor of a post. The reason I only showed statistics until May is that after November the sponsors of posts started to be even more deceptive, and we still have to fix the parser that finds the sponsor; that is something we have to deal with in the coming days. If you want to use this use case to experiment with us, welcome.

Number one, please.

As far as I understood, you are parsing the content and adding specific tags to all posts which have been collected by your add-on. Since it is still Facebook's page, I think it is pretty easy for Facebook to detect the tags you added and to see which users are taking part in this collection.

Only if the web application of Facebook actively checks that we have changed the DOM, and that is indeed possible. It can be verified by us as well: it would mean that Facebook is shipping a web application intended to observe tampering, and that is something we can test. Also, for transparency, we put up a message to remind users that they are using the extension. If this becomes an issue, we can just display a pop-up at the beginning that says "remember, you have fbtrex installed," and then nothing will change in the DOM of the page.
It's an opportunity, it's a possibility. At the moment our bottlenecks do not permit us to elaborate more, but if you are an expert in web extension development, you are more than welcome.

I see three more questions. We have two minutes, so keep it very brief. We start with two.

Very brief. When Zeynep Tufekci started writing about YouTube radicalization, the reaction from YouTube seemed to be reasonably panicked: they were themselves struggling to understand why the algorithm was doing such things and struggling to fix it. To what extent do you think there is a similar atmosphere at Facebook, that they have almost lost control and are themselves struggling to understand what the news feed is and what it does? Thank you.

They already published, some years ago, an article saying the algorithm is so complex that they don't know how it works, which, considering it is a neural network, is formally true: nobody can know the internal state of the neural network. The fact that there are political impacts and implications is maybe too distant from the average Facebook developer. You are tuning your algorithm to provide content that fits each user better, you observe that for the majority of your demographics videos do better than selfies, and this tuning ends up having a political impact you don't know about. So I can understand that this may seem like something for which they cannot be responsible, but they are: the moment you keep your algorithm secret, decide to change its logic, and your solution is to build yet another algorithm that promises to do better, it is unacceptable that it influences the public discourse for reasons that are invisible to us. So probably they are panicking, but the solution will not be to become more secretive and just keep improving the algorithms, because it is nonsense to try to fix all the political conflicts around the world that way.
At best you can address the alt-right in the US, because most of your developers are in Silicon Valley, but the diversity of the world is not something that can be covered in this way.

I'm sorry, we have to take one final question; we are out of time, so maybe you can go to the speaker after the Q&A. So, last question.

Thanks. You mentioned ethics in your talk, so do you think your approach, or the knowledge you generate, can be abused for bad goals?

Well, we want to avoid that. The only PII is accessible to the legitimate owner of the data, and the results of the analysis should not relate to an individual but only to observed phenomena. I'm not working alone: we are a group, and now we also have the University of Amsterdam collaborating with us, which will bring additional review, and other third parties doing privacy and impact assessments. So we are confident this is not going to happen. Of course we should stay as open as possible to feedback and privacy review from other organizations. And of course it depends on the bad usage: it depends which kind of user you are imagining, but we try to foresee and avoid them.

Thanks a lot, give him a big applause.