The next talk will be given by Claudio Agosti. He came from security, then moved to privacy, and then he saw what is happening in social media with all this personalization that makes everything very pleasant for you. A lot of people are a little wary of what is going on there, but so far it is not really transparent. What he does is devise tools and algorithms to explain what is going on with your privacy and with personalization. Claudio Agosti.

Thank you. Well, besides Claudio Agosti there is an entire team of volunteers. This project began one year ago; we gathered volunteers and reached a beta stage. I was supposed to release the 1.0, but nobody is actually paid to develop this program, so we do it in our free time. The title is The Quest for Algorithmic Diversity, as we will see later. I am not a native English speaker; I will try to speak slowly to be understood. The talk is meant to last 35 minutes, so we will have time for questions and answers. The website, facebook.tracking.exposed, is also the name of the project. .exposed is a real top-level domain, so feel free to just press enter; there is no .com or .org, just enter.

We address two big problems. I could begin by asking how many people here are on Facebook, but it doesn't really matter. What we are going to criticize is algorithm hegemony: the influence of the social network in framing our perception of reality. Every person perceives through their own bubble, their own sources, their own activity, their own preferences, and this implicitly gives the user a personalized experience. The social network's main goal is to keep you on the platform, active and engaged, and that translates into showing you only what you really want.
Considering that many people are shifting their information model from official newspapers, or from more direct social interaction, to social media, a chunk of their perception of reality is implicitly influenced and decided by an algorithm. The second point is that the amount of data collected by Facebook could really help us understand society, if society could check on itself and see what is happening. I am not talking about looking at individuals, but at phenomena: it could be a way to understand trends, how people in general change their interests, their beliefs, their actions, their will. But this is not possible, because Facebook has privatized the information and you can only get access through certain limited APIs. This project wants to offer a way to understand the problem, not yet a solution. There are some experimental solutions around, but this tries to provide a tool to understand what we are talking about and what the problem is.

The concept is a web extension that runs in your browser. At the moment it works only for Chromium, but the port to Firefox is in progress; if someone wants to help, there is an issue on GitHub. The extension looks at your Facebook activity because it wants to scrape the posts you are receiving. Only the public posts, the ones shared with the world: it does not scrape things shared with friends or with a restricted audience. Every access to the news feed is a timeline; we currently have 150,000 entries. A timeline has many posts, and those are the impressions. An impression can be private or public; when it is public, because the post itself is public, we keep the entire HTML snippet. In this way you are collecting, for yourself or for the network, your actual timeline: what you saw, which you can recall later.
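As a rough sketch of what the extension collects (the field names here are my own illustration, not the project's actual schema), each entry of a timeline could be modeled like this, keeping the HTML only for public impressions:

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class Impression:
    """One post seen in a user's news feed (hypothetical schema)."""
    timeline_id: str            # one timeline per news-feed access
    position: int               # where the post appeared in the feed
    seen_at: datetime
    visibility: str             # "public" or "private"
    html: Optional[str] = None  # full HTML snippet, kept only for public posts

def record(post_html: str, is_public: bool, timeline_id: str, position: int) -> Impression:
    # Private impressions are logged without content, matching the rule
    # that only world-visible posts are scraped.
    return Impression(
        timeline_id=timeline_id,
        position=position,
        seen_at=datetime.now(timezone.utc),
        visibility="public" if is_public else "private",
        html=post_html if is_public else None,
    )
```

With a record like this, "recalling your timeline later" is just querying your own impressions by `timeline_id` and `position`.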
This allows you to begin to understand what you are receiving. The user's needs are often stated as: I want to be better informed, I want to know what Facebook hides, I want to escape my bubble. But those are abstract needs, not practical ones. Start instead by understanding your information diet, as I sometimes call it: what you are actually receiving. This is an example over 20 days; this user was receiving posts from a certain number of sources, distinguished here by color. The source is a metadata field, extracted from the HTML. There are other metadata you can extract and associate: you can classify by sentiment, by keyword, by language, and build more complex visualizations that give you insight into how you are informed. Because at the end of the day the algorithm plays a big role, but you have an even bigger one in making your information diet diverse or complete.

But then we also see the power of the algorithm, and how some kind of repression or influence can be exerted. A post can show up at the top of the news feed, constantly, since the feed is not in chronological order. This is what Facebook did when, in 2013 or 2014, I don't remember, it ran the emotion contagion experiment: around 600,000 people were used as subjects. Half of them kept seeing sad, angry posts at the top of their news feed, and the others saw only happy posts, just to check how humans react to that kind of stimulation. That is, on one side, an example of algorithm abuse. Is it okay that users end up in a lab for this kind of experiment? Is it okay that my perception of reality changes because something keeps appearing on top, and my attention gets polarized by that content? The moment you keep a copy of what you are seeing, you can apply analysis and start to find out whether this is happening to you or not.
Whether it is something very frequent or something rare. Also, a post may show up only once and then, if you do not interact with it, never appear again. We know this exists, we know this is happening, because that is how the algorithm works. But understanding how much it happens allows us to make a more precise criticism of Facebook. Because at this moment Facebook, or the Partnership on AI, are starting to say: maybe we should be transparent, maybe we should provide accountability. But there is no terminology, no political demand, no perception of how the algorithm can actually impact your life. Having insight that begins from your own experience is the first step to working out what is okay and what is not.

This is an example on a timeline. The number on the left of each post is its chronological rank, so number one is the most recently published; the post shown second was actually the fifth in time, and so on. The yellow one is highlighted: you can see that it stayed in the same position in the second timeline, ten minutes later, and only then went down. That can be an expected phenomenon; a generic Facebook lawyer would say it is justified, and we also know that we need algorithms. They are necessary to sort out all the information we are accessing. But the algorithm defines your priorities. A priority is something you exert yourself when you decide what to eat, where to go, what matters in your life in the short or long term. And this priority decision is one we are not taking; Facebook is taking it for us.

Investigating the algorithm is not simple. Other experiments have been made; some of them just show you who is advertising on your timeline or who is writing into your news feed.
But what we receive is composed of five high-level variables. First, how the user uses Facebook: if I keep accessing the platform, even in an automated way or all day long, my chance of seeing a given piece of information is higher than for someone who accesses rarely. Second, which sources you follow, because you decide your own bubble. Third, what those sources share, because your bubble may be polarized by external events or by the sources' own choices. Fourth, the Facebook algorithm itself, which is what we are trying to understand. And fifth, the user's profile: all of your backstory, based on your Messenger conversations, maybe a linked Instagram account, your likes, your shared data. The data you shared is something we do not have; we only look at what you are receiving.

Based on these assumptions, it is clear that we cannot simply monitor what you receive and claim that, from our small number of users, we understand how Facebook works. So we started to run some tests. If those are the variables, we can try to reduce their variation in order to test the algorithm and exclude the unknown part. We created four fresh users with zero friends, all following the same 12 national media, all accessing Facebook at the same time, once per hour: log in, three minutes of automatic scrolling, wait 57 minutes, refresh. In this way, all four follow the same 12 sources, and you can see whether something newly published appears for each user in the same way. But we also introduced some differentiation: every bot user was characterized by a few likes. This test was made before the French election, so of the four users, one liked Le Pen, another Macron.
Another mostly liked a German rapper, and the last one liked DR. Very different kinds of subjects, so that very different profiles might arise. What we saw is that, even though they were accessing frequently and following the same news media, results differed. Here you see the post ID, so the posts that got published; this is the user ID; and this is the hour at which they accessed. What is interesting is that, although we tried to make their access uniform, some news did not show up. This user accessed five times, the post already existed at 15:00, yet for that user it never appeared. You can see the same in this other line. These are posts that, despite the users repeatedly accessing Facebook and the posts being available, never showed up. Why? The algorithm decided. And that is, I don't know, not exactly okay.

We can do different analyses. I keep saying "we", but as we will see, the project is quite open and in need of collaboration; when I say "we", it means you, together with us, can do the analysis. For example, analyzing comments. Comments frame the news you are seeing: when a post has many comments you can browse back and forth, but you are shown a preview, and everyone gets the same preview. How much can that preview influence your judgment of a political event?

To do this we have a structure, and this is also why we scrape the entire HTML. We receive the HTML, which can be represented this way, and then a batch of processes extracts, for each metadata field we care about, only that specific metadata. This is because Facebook changes the HTML structure quite often, so you need to catch up quickly when they update it. For example, when the rainbow reaction was added, you could update your parser to extract the rainbow reaction too. This project actually began when, for a while, Facebook introduced the flowers, the flowers to say thank you.
In that moment the HTML structure changed a little. Here you see the metadata extracted: in blue the parsers that ran over that HTML, in black the metadata they attributed. This design was chosen precisely because the HTML structure changes so often. By the way, feel free to take a picture of this slide; the slides will be online at facebook.tracking.exposed, but if you want a selfie with this one, it is unique, it will not happen every time.

This is a mistake that happened during the alpha. We were collecting promoted posts, in red, and feed posts. Then Facebook changed the structure, and we started to see promoted posts as part of the feed while the feed posts were being ignored. So we corrupted the data we collected during the alpha. That taught us that parsing has to happen server side, and that each parser has to be kept as small and self-contained as possible. They are currently implemented in Node.js with Cheerio, but there are also experiments in Python, because a parser only talks to an API: ask for the HTML snippets that do not yet have a given field, extract the field with your own code, and then update the record with the new metadata. The metadata can be something extracted from Facebook, like the name of the source or the number of likes, or something not present at all: the language, the sentiment, the entities, the complexity of the words used, the presence of a specific keyword. Or, if you have a list of sources and want to attribute a trustworthiness score to each, you can compare the post's source against that list, and the metadata you append to the database is the trustworthiness. Most of the time a parser is just based on a CSS selector, because that is most of what the parsers do.
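To illustrate the idea of a small, self-contained parser built around something like a CSS class selector: the real parsers use Cheerio in Node.js, but the same extraction step can be sketched in stdlib-only Python. The class name `post-source` below is invented for illustration, not Facebook's actual markup.

```python
from html.parser import HTMLParser

class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying a given class.
    A stand-in for the one CSS selector a typical parser is built on."""
    def __init__(self, cls):
        super().__init__()
        self.cls = cls
        self.depth = 0     # >0 while inside the matched element
        self.text = []
    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside the match
        elif self.cls in dict(attrs).get("class", "").split():
            self.depth = 1   # found the element we are looking for
    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
    def handle_data(self, data):
        if self.depth:
            self.text.append(data)

def extract_source(snippet: str) -> str:
    """Return the source name from one stored HTML snippet, or ''."""
    p = ClassTextExtractor("post-source")  # hypothetical class name
    p.feed(snippet)
    return "".join(p.text).strip()
```

A server-side batch would then loop: fetch snippets missing the `source` field, run `extract_source` on each, and write the result back as new metadata, so a Facebook markup change only requires updating this one small function.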
Algorithms are a social policy. That is the theory: if algorithms can influence how society works, they have to be open to public scrutiny, because it is in the public interest. Until we understand the algorithm as a lever, the political demands will be missing too; the kind of requests we can make to Facebook, as users or as citizens of a federation or a country, are limited because we do not understand it. This problem has been perceived much more in the last year, with Zuckerberg announcing that they want to build a global community. Yes, good, a global community, but with one algorithm, as if all citizens were a single monolithic citizen of Facebook. We have quite different needs, and those needs follow from our priorities and our values. Those values and priorities would have to be embedded in the algorithm; we cannot expect Facebook, with one single algorithm, to satisfy everybody.

The effect also hits those who use Facebook to communicate, because you feel you have to be present on Facebook, otherwise you cannot communicate; but Facebook implicitly forces people to communicate in the way it wants. This is the journalist Kurt Gessler of the Chicago Tribune: he describes how some of their posts stopped surfacing because Facebook decided video should be promoted more. They were mostly publishing text, and that caused a drop in their reach and their income. What the media often complain about is that users no longer reach news outlets through their homepages, only through social networks. Recent news is that Facebook wants to change the news feed in favor of websites that load faster. So at the end of the day, the algorithm is a way to push people, users, and communities to communicate the way Facebook wants. And that is algorithm hegemony.
And, I don't know, I never signed up to be a citizen of Facebookland; I signed up for a social network. Imagine instead that in the future, social media were just storage of data, and you could choose your own algorithm client side. Users could customize their own algorithm, exchange it with others, remix it, even resell it if it encodes particular values, or swap it when their needs change. If I live in Italy and wake up every day checking my news feed to stay normally informed, I can have one algorithm; but the moment I become a tourist, or I am migrating, or I am looking for something new, or I break my leg, my needs change. If the needs change, the algorithm has to reflect my new needs, and I want to be in control of it. That closes the part about algorithm hegemony.

Now, consider Facebook as a sort of mirror of reality. Clearly, if we as a society could access this data set, we could gain insight, understand something more about ourselves, judge what is okay and what is not, and take more informed decisions. But in a big graph there is no single vantage point from which to observe the network; every user is an observation point. So if you all start to use the extension, what we get is a vision of the world belonging only to the hacker scene, because most of you belong to it. The observer is never neutral, and every observation is mediated by the algorithm. So at the end of the day we are not really reading society, only a subset of it, partial and biased. But it can still be enough to do some analysis, to understand a bit more of what Facebook can do, and eventually to ask collectively for more access to this data, because it is in our interest. That, somehow, is the goal.
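The client-side algorithm idea can be made concrete with a tiny sketch. Given the posts the extension already collected (a hypothetical shape with `source` and `published_at` fields), a user-chosen re-ranker could, for instance, round-robin across sources so no single source dominates the top of the feed, instead of accepting the platform's ordering:

```python
from collections import defaultdict, deque

def diversity_rerank(posts):
    """Re-rank collected posts round-robin across sources.

    `posts` is a list of dicts with at least 'source' and 'published_at'
    keys (an assumed shape, not the project's actual API output).
    """
    by_source = defaultdict(deque)
    # Newest first within each source.
    for p in sorted(posts, key=lambda p: p["published_at"], reverse=True):
        by_source[p["source"]].append(p)
    queues = deque(by_source.values())
    ranked = []
    while queues:
        q = queues.popleft()
        ranked.append(q.popleft())
        if q:
            queues.append(q)  # source goes to the back of the line
    return ranked

posts = [
    {"source": "A", "published_at": 3},
    {"source": "A", "published_at": 2},
    {"source": "B", "published_at": 1},
]
```

Swapping this function for a chronological sort, or a weekly digest builder, is exactly the kind of user-controlled choice the talk argues for.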
This project is not intended only to empower users, but also to raise criticism and show Facebook that there are other ways they are not exploring that could perhaps be pursued. The data we are collecting can also be used by people other than yourself. At this moment we have some guidelines. Protect individual data as much as possible, while serving the users' needs: if we provide visualizations or analyses, they have to serve the interest of a community that feels a problem. It is not for me to decide, say, that the fake news are the ones from Breitbart and mark every news feed item from Breitbart in red; that is a subjective judgment, and it is not something we want to impose. We want to enable people to implement their own values in the analysis. A data transparency policy: explain, as well as possible, what happens when a piece of data arrives in our system and how it is treated. We know this data is processed by parsers, can be mixed, can be reduced. It can be a nice experiment to write the best possible data transparency policy, because it can be an example for other companies that really ought to have one. A data transparency policy means explaining every place where data is stored, and why; if we can do that, it can show how things can be done in a way more respectful of human rights, and give agency back to the user. And finally, let users experiment with their own algorithm, to feel what it is like when you can customize it.

This is the pipeline that is being completed. Impressions and HTML: the impressions are just entries recording which posts you saw, and the HTML is kept only for the public ones. The parsers extend the metadata, and the metadata can then be accessed in three different ways. If you are the owner of the data, you can access your own data, and eventually you can share it.
A nice feature could be sharing your own personal point of view with someone you trust: a parent, a partner, a colleague, a professor. It can be a way to realize that everybody perceives a different reality, and to check on each other. Second, anonymized aggregated APIs. Anonymizing data derived from a graph is hard, but some limited, reviewed APIs can be built that avoid correlation as much as possible; that can be a way to analyze phenomena. Third, researchers who accept an ethical agreement: for example, you will never use this data to build a profile of a user; you will not use this data outside the research scope; and there is some transparency logic publishing who is accessing the data. These are the ways we think the metadata chain can be queried. As has been pointed out many times, truly anonymizing data is very hard. That is why, instead of releasing anonymized data, we prefer to have the MapReduce code or the API implemented on the server, so it can be reviewed for privacy and we can be sure no correlating element leaks. It has already happened that a researcher, Silvia Puglisi, used the database for verification: she published a paper testing her algorithm on a de-anonymized graph, and it was then validated on this dataset. So researchers who do not belong to a technical field can also access this metadata chain, but they have to comply with the ethical agreement. We will define the procedure, but that is what we are open to.

Next steps. What time is it? No problem, there is time. Next steps: finding users, writers, analysts and developers, I am afraid in this order, even though developers are very important. We are mostly developers; what we need now are mostly people who analyze the data and extract insight, who start to make sense of this huge amount of metadata, because it can be a lot, more than you might expect.
Users, because they contribute different points of view; and writers, because they are necessary to engage users and to articulate why this is a social problem that has to be addressed. Developers, welcome, in the sense that most of you can probably help on the technical side. We have a GitHub repository and we are very open to adopting new technology for the analysis or for managing the metadata. We still have to complete the 1.0, so it is the right time to join the project. Explore the diversity of needs: we know the Facebook graph can contain many insights, but we do not have the habit of using it that way. Normally the Facebook graph is used for market research, or to target users and reach them when they are most vulnerable or most inclined to buy; but there may be socially useful uses, and we have to find them. Whenever a visualization, an analysis, or an insight implemented by the community in an experiment becomes stable, it can be merged into the main software so every user can see it.

Some practical ideas, if you are interested in what can be done next. Help researchers figure out a way to use this. If you know researchers doing media analysis who need access to Facebook: accessing Facebook through scraping is hard; accessing it from their own profile is biased, because they go through their own bubble and the pages they already know; and accessing it through the API is limited by the API itself. This can enable a new kind of social science research. Run some experiments. For example, we ran the experiment before the French election with four users created from scratch, but you can run an experiment among your group of friends; maybe you and your friends follow some of the same sources.
You monitor yourself and your friends over the same period and compare how much of the same content each of you sees; you can also run an automated user that keeps accessing those sources, to be sure of capturing 100% of their publications. Write a new parser. For example, we do not yet have a parser extracting the number of likes or the number of comments. It is pretty simple, it is just a number, but I tried to keep the simple tasks available for newcomers, so it is easier to get started and maybe grow a community of developers. Use the API to query the data, or ask for a new one, because at this moment we are a small team and we lack this kind of testing. And imagine how to visualize the data: with so many metadata fields it is quite hard to make meaningful visualizations, so if you have experience with that, it helps. Tomorrow at the Italian embassy, at the corner, yes, where the grappa is, from 10 to 11 a.m., we will offer coffee and dive into the dataset, to show what data we have and how it can be used, and to get feedback if you are interested. So that is all. I have left some time for questions and answers, because I would like to hear what you are interested in. This is the website, and here are the two links to GitHub. That's it.

Thank you, Claudio. There are two microphones, so if you have questions, please come forward and share them. First question?

First off, thank you for doing this, because I think these things are important to open our eyes to this kind of stuff. I am wondering how big your plans are. Are you looking to integrate this across multiple social media? Will you integrate Twitter, Google News and so on?

Maybe. I am open to that; analyzing the impact of the other social media is important too.
But Facebook is quite meaningful right now, given the importance it has, and this is the first practical way to test the community: how much interest there is, how well we can support the users, and what the problems are in managing this data. Starting to support other platforms now would just disperse the effort, but I hope it will happen in the future. The nice aspect of tracking.exposed is that it is a domain name that carries a message, so you can put any subject you want in front of it. Next question?

You did this experiment with bot users that had slightly different likes. Did you run an experiment with, say, 12 users having exactly the same likes, to see whether they get exactly the same results or whether Facebook adds some element of randomness?

One theory is that Facebook adds an element of randomness. The other is that the moment you register a fake user, at two different times, and add the likes, you are changing the graph somehow, which means the next user acts in the graph under slightly different conditions. It could be randomness, or an effect so thin that I do not think we can really measure it. But we have not run a test with completely identical users. Our tests were equal enough to build this table; yet, for example, the size of the screen during automatic scrolling turned out to have an influence. There are many such elements, discovered only during testing, that can introduce differentiation; the number of data points Facebook has is just unimaginable. But if you want to run this test, all the tools are there for you.

Thank you. Next question, please.

Hello, thank you for the presentation. Do you obtain data about the advertising on the Facebook profile?

Advertising? Yes, we are also collecting the advertising data.
From your own personal page you can download, in CSV format, all of your feed, or all of the promoted, sorry, sponsored posts that have appeared. If you look in the repository, under parsers, those are the parsers extracting the metadata: "promoted" handles the sponsored posts, from which we get info, links and titles, and the others handle the feed. They are a separate set of parsers because the structure is a bit different. Okay, next question here in front.

People accuse Facebook of having implemented filters in the algorithm so that users get led into bubbles. Have you found that these bubbles are now completely invisible to each other? Are they isolated? Do they fragment society into smaller groups that no longer meet each other? Thank you.

This is exactly one example of the socio-political problem I am looking at. In the last five years a lot of research has been done; this is one paper that may interest you. Other examples have been done as personal analyses: this one on Ferguson, this one about the conservative point of view. But this is research made across many users. The bubble is both a psychological and a technological phenomenon. As humans, we stay with the people we know, the ones we are comfortable with, who share our ideas, our language, our values. That is physiological. Social media becomes a problem when this feedback, which we humans tend to pursue anyway, is incentivized by the algorithm. If social media becomes an experience that opens you a little more than the physical world does, that is an achievement for openness; the moment the algorithm starts to prioritize only your existing friends, it becomes a loss. But this means social media still has potential. The filter bubble is not just technical, and understanding these factors is important to understand how much impact social media has on it.
It certainly happens, yes, but it is also complicated. If you compare a person who spent their life without social media and then got connected, for sure their bubble grows. But if you take a person with a certain lifestyle, maybe traveling between cities, going to university, having many circles of friends, their exposure to different news may actually shrink. So making a definitive statement about the filter bubble is complicated. Eli Pariser started talking about it six years ago, and now new research says: yes, it exists, but. So please, you are here in the talk with me, you have listened; do you have questions? Not really? Ah, okay.

One question: can I use this piece of software to run my own algorithm for the news, like making, not a daily, maybe a weekly newspaper out of my Facebook?

Not the extension; the extension only collects. But with the API, yes: you can retrieve what you saw and have your own algorithm sort out an alternative version. It can be a way to experiment with a different user experience of your Facebook data.

Thank you, I will love it.

Next question from the other side. As a follow-up to the previous question: how disparate could the profiles get, how different the feeds? What is the maximum percentage of difference you found between profiles, the most extreme differences?

To be honest, I do not want to do the analyses myself; I want to enable others to do them. One thing happening now, for political reasons, after the Brexit and Donald Trump events that surprised everybody, is that the media feel they have to understand better how social media works, so many groups are trying to run experiments, analyze, and publish research output. And that requires a crossing of skills.
The skills of the team we are building are technical, because we have to deal with the technicalities and enable other researchers to do their own analyses. The analyses I made were just for the presentation, to show that there is potential; I have not actually started doing these analyses, because I am not a social scientist. I could guess at the differences between profiles, with numbers, but then...

I would still be interested in your guess, just your general feeling. Were you surprised at how different it could get, or was it not so bad?

I have not really looked at it. Also, with the users I use, I tend to follow different sources. Instead of that answer, it would work better to check whether you are following the same sources among friends; in the experiment we made, they were forcibly the same. So I really do not have an answer. But it is not so difficult to run this test. Tomorrow, 10 to 11.

I'll be there. Thank you.

Next question. I may have missed this at the beginning, but did you share this information with Facebook? And specifically, in light of your recommendations, I wanted to know how open they are, if you have shared them.

No. I consider our group still on this side of its potential; no, the potential is bigger, I hope. But no, I have not gotten in touch with them.

Yeah, thank you.

Alright, it does not look like there are more questions at the moment. Claudio Agosti with The Quest for Algorithmic Diversity. Thank you.