And so I have the pleasure of announcing the talk "Trackography: You never read alone". Our speakers are Maria Xynou, who is a researcher at the Tactical Technology Collective, and Claudio Agosti, who is also with the Tactical Technology Collective. And apparently we're going to have a premiere tonight: a piece of software that illustrates where all the data collected about us is going. The stage is yours.
Hi, thank you very much for being here today. Claudio and I work with Tactical Tech, as was just mentioned. Tactical Tech is an NGO based in Berlin, but we work internationally. And we are super excited today because we're launching one of our new projects, which is called Trackography, and it's all about online tracking.
So when we think about surveillance, very often we think about governments, because a lot of it is carried out by them. However, in most cases, I think that companies and corporations make a lot of that possible. The tracking of our daily browsing activities, like reading the news online, is what enables governments to access the data that they want. So for example, when we think about PRISM and a whole bunch of other systems run by the NSA and other intelligence agencies, we can see that a lot of the monitoring is possible because companies like Google and Facebook collect data through the tracking they do on a daily basis. And this is something we would really like to draw your attention to through this project.
Basically, what we understand is that this whole world of data collection is creating a new kind of power around the world, which we can call data geopolitics. Think of one country that has a large gas pipeline, where that gas is what allows you to be warm in the winter. You can see the same kind of geopolitical relationship here. But the question is: who depends more on whom?
These kinds of questions are what has been nagging at us, and Trackography tries to be an answer. It's a project intended basically for advocates, and for analysts and researchers, so for people who analyse these things. We are focusing on media websites, meaning websites that provide news, because if someone is able to study which information a country is accessing in the media, and what is read and searched for the most, that is in fact a way to perceive and to study the nation itself. You also saw from the Facebook experiment that your mood can be influenced. That is why the media are the subject of this test, although the test can be applied to every website.
So basically we developed a script that emulates the behavior of a user connecting to a website. This script performs the connection using PhantomJS, which is a browser without a display window. Every time you connect to a website, a lot of third-party content is loaded: a video, a banner used for advertising, some static picture, or some hidden tracker, like a JavaScript inclusion that just sets a cookie and reads your behavior. This is what our script automatically detects. Then we perform a traceroute for every included third party, to figure out the network path and which servers are involved. Then we perform a GeoIP lookup along the path, to figure out where the infrastructure involved is located.
At the moment we have collected data for about 30 countries. The media lists change over time. We have over 3,000 sites analysed, from 34 different countries.
So let's take a look at the geography of the whole thing. This map is essentially the result of running the script in these countries, based on the media websites that we collected. If your country is not blue, you are not being discriminated against.
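The detection step described above (load a page, then list the third parties it pulls in) can be sketched as follows. This is only an illustration in Python, not the project's PhantomJS script: the function names are hypothetical, the list of requested URLs is assumed to have been recorded already by a headless browser, and treating the last two labels of a hostname as the site's own domain is a simplification that a real tool would replace with the Public Suffix List.

```python
from urllib.parse import urlsplit


def base_domain(host):
    """Naive site identity: the last two labels of the hostname.

    Assumption for illustration only; it is wrong for suffixes such as
    co.uk, where a Public Suffix List lookup would be needed.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def third_party_hosts(page_url, requested_urls):
    """Return the sorted hostnames that do not belong to the visited site."""
    first_party = base_domain(urlsplit(page_url).hostname)
    hosts = set()
    for url in requested_urls:
        host = urlsplit(url).hostname  # None for relative URLs
        if host and base_domain(host) != first_party:
            hosts.add(host.lower())
    return sorted(hosts)


if __name__ == "__main__":
    requests = [
        "http://static.example.org/style.css",     # same site
        "https://www.google-analytics.com/ga.js",  # third party
        "http://ads.tracker.example/pixel.gif",    # third party
        "/images/logo.png",                        # relative, same site
    ]
    for host in third_party_hosts("http://www.example.org/news", requests):
        print(host)
```

Each hostname returned here would then be the input for the traceroute and GeoIP steps.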
Unfortunately, we simply don't have a partner in the corresponding countries. As you can see, we have data for the blue countries. For example, if we click on Germany, in the side panel you can select websites, including the ones which cover the news internationally and which are accessed globally. From that global list we can pick the Wall Street Journal. If we select the Wall Street Journal from Germany, we can see that Germany is green, as the country from which we are accessing it, and that the country hosting the Wall Street Journal's server is blue. And then we see the countries whose network infrastructure is used to access the website. And of course, we try to show the countries that host the trackers.
The way that these companies track us is that they have code embedded on third-party websites. Up here we have the intended connection, to the media website itself. Then there are the unintended connections, which in this case number 67, that come along when we connect to the Wall Street Journal. So we are also connected to servers of third-party providers that we did not ask for when we clicked on the Wall Street Journal.
So when you use Trackography, you select the media that you read, either from the global list or, if you scroll down, from a national or even a regional list. And then you can see where the third-party providers can track you every time you visit this website.
Now, the reason why we decided to show you this one is that it is one of the very interesting slides: it shows the only media website that we found without trackers. And that's WikiLeaks.
So WikiLeaks is the only media site we can show you where there are no trackers. If you look at WikiLeaks based on our results, you really only see your own traffic to the site and no trackers. And we find that very interesting.
Besides studying who does the tracking, you have to look at how we are connected. Every time you connect to a server, someone in the middle may be actively analyzing your traffic, collecting your behavior or the content you are exchanging. If the connection is encrypted, the third party in the middle has no power to modify or collect this traffic. But commonly this kind of third-party connection is not encrypted. There was a revelation about Angry Birds. This was one of the clearest examples, because Angry Birds was a game whose advertising server was running in the U.S., and therefore the information had to travel to reach the server in the U.S. And so they thought it was a good idea to monitor that traffic.
Another reason why the network topology matters is that we never really know who is accessing these cables when our connections travel through them. So this is an example from the Snowden leaks. It illustrates a program through which the NSA collaborates with a whole bunch of third-party countries, which provide direct access to the fiber optic cables that carry a lot of connections. These third-party countries include Ethiopia, Saudi Arabia, and [unclear]. So if I'm accessing a website and my connection has to go through, I don't know, Azerbaijan, for example, to reach the server, it might be a big deal, because agencies that you wouldn't want to have access to your data do have access to it, since they control a particular piece of network infrastructure.
We know that someone in this position can change these communications on the fly, altering the content you receive and what ends up on your computer.
But this can also be used for other exploits. For example, if your browser is vulnerable to some exploit, the person in the middle can simply replace the content delivered to your computer. FoxAcid was a similar attack from the same kind of privileged position.
So one of the reasons why we developed Trackography is that we wanted to figure out the politics of data. What does it mean today when, just by reading the news, your data literally travels to various servers all over the world? Here we are accessing three media websites in Italy. What we can see is that Italy hosts the servers of these media websites. Of course, the big countries, as you can see here, are the ones which are hosting the servers of the different companies. But what's important to note here is that Italy does host the servers of its own media websites. If we look at a country in the global south, like Nigeria, for example, what we can see is that Nigeria does not host the servers of its national media websites. Instead, they are hosted in the USA, and infrastructure required to access them is based in South Africa. And the blue arcs essentially show where the data has to travel to reach the media websites. How can Nigeria, and countries like it, protect their citizens' data in practice when they don't even own their own infrastructure? How can they make sure that their citizens' data is actually protected when they don't own the infrastructure or control what happens to the data? I'm not implying that we should have some type of autonomous system like China or [unclear].
Another aspect that deserves study is when someone invests in foreign infrastructure. For example, there has been investment in Brazil, in this case from Italy, to create some network infrastructure. In certain cases, a connection travelling from your Brazilian carrier to the US passes through some Italian infrastructure.
This does not mean that the connection actually goes under the ocean, arrives in Rome, and then comes back across the ocean. On the ground in Brazil there is some Italian-owned infrastructure, and the connection uses that Italian infrastructure in Brazil.
Now, another example. Given the political tension in Ukraine with Russia since the Ukrainian revolution in February, we decided to run the script in Ukraine. As you can see on the map, there are many connections to Russia. This is because, when accessing the main news websites in Ukraine, we can see that one of the main companies tracking users is Yandex, and Yandex is sort of the Google of Russia. Now, this raises a whole bunch of questions, but what I think is quite interesting is this: regardless of whether Ukrainians are pro-Russian or not, do they really want all their data to go to Yandex? What do I mean? Well, what if Yandex works hand in hand with Russian agencies, giving them access to the type of news they read and all their other browsing activities?
When you click on a country that is colored, you can see why it is colored: which third parties are running there, or which network path passes through it. Wait, I'll close this here.
Okay, but our goal when we started to collect the data was not just to generate an image and things like that. We have created an API so that in the future this data can be collected and processed automatically. This is an example of the percentage exposure found in every test. By exposure I mean that it is enough for one connection, among the many that are performed when you access a media website, to pass through a country for us to assign a presence to that country. Every time the traffic flows through a country, its presence is raised here. We have to...
For example, in Russia, every internet access provider has a contract with an international carrier, which offers different routing possibilities. So we can see here that in Russia 100% of the traffic flows through Russia, 5% flows through the U.S.A., and so on. So we can see directly which countries are the most present in our routing: Germany, Austria, etc. So different countries show up, such as Italy, Austria, and the Philippines.
We have also tested this in Italy. There is another provider and another autonomous system involved. In this case, European countries are present without any specific pattern. France and the U.S.A., as always, stand out.
When we had collected all the data and all the trackers, we wanted to declare the most tracking media website ever. But this is impossible, because we discovered that the amount of data changes over time. In this heat map we see a certain number of media websites. I'm a beginner with D3.js, so if someone can help us to develop a better visualization, that would be welcome. You can see how many trackers are present. This heat map was intended to show where the trackers served to users come from. So when we access media websites, our connections go all over the world. And of course there are a lot of companies from third countries that provide trackers.
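The exposure percentages discussed above can be read as: for each tested media website, a country counts as present if at least one of the connections made while loading that site passes through it, and the percentage is the share of tested sites on which it is present. The sketch below follows that reading. It is an interpretation of the talk, not the project's code, and the data layout (one list of country-code paths per tested site) and the function name are assumptions made for illustration.

```python
from collections import Counter


def country_presence(tests):
    """Percentage of tests in which each country appears on any path.

    `tests` is a list with one entry per tested media website; each entry
    is a list of connection paths, and each path is a list of country codes
    (hypothetical layout, for illustration).
    """
    counts = Counter()
    for paths in tests:
        # One hit per test is enough: a country is counted once per site,
        # however many connections pass through it.
        seen = {country for path in paths for country in path}
        counts.update(seen)
    total = len(tests)
    if total == 0:
        return {}
    return {country: 100.0 * n / total for country, n in counts.items()}


if __name__ == "__main__":
    tests = [
        [["RU"], ["RU", "DE"]],  # site 1: two connections
        [["RU", "US"]],          # site 2
        [["RU"]],                # site 3
        [["RU"]],                # site 4
    ]
    for country, share in sorted(country_presence(tests).items()):
        print(country, share)
```

Counting once per test rather than once per connection is what makes a single transit through a country enough to register its presence.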
And here we have some numbers on which trackers are present in Germany and which countries they come from. In most cases, in 88% of the cases, Google is the dominant company. And Google is the dominant company in almost all of the countries that we have analyzed. In 30 countries Google is the leading company, with two exceptions: one was South Africa, and the other was Russia, where Yandex does most of the tracking, as mentioned earlier. Should Google have this type of monopoly? A lot of media organizations use Google Analytics, and that's why it shows up so often here. And it is of course a political question whether Google should always see what you are reading, and whether we should simply agree to this Google hegemony.
In the last four months we didn't have time to analyze and go through every single tracking company, so we decided, for starters, to look at the so-called globally prevailing tracking companies. By that we mean the companies which track you the most, based on the media websites that we analyzed. So basically companies like Google, which has the highest percentage, as illustrated earlier in the presentation, and whose business model is advertising or profiling.
A lot of people have said to us, especially over the last months when we've been working on this: "I don't care if Google or Facebook or any company is tracking me. It's just advertising. In the end they improve my web experience, they provide me a service, so it's a fair deal." That's a hard thing to answer. We think it's more political than that.
So last night Jake and Laura Poitras gave a fantastic talk, and I'm guessing you attended, right? Yeah? Cool. Okay, so following their talk, how many of you accessed Der Spiegel's international pages to see the revelations about attacks on crypto and targeted killings in Afghanistan? And who used
Tor? Okay. And who accessed this while using [unclear]? Okay.
This is what we see when we access the specific page with the revelations, rather than just Der Spiegel in general. And as you can see, some of these companies include Google, Twitter, Facebook, and [unclear]. These companies have kind of been compromised by the NSA. PRISM has been connected to all of them. We also know that the NSA has tapped the data centers of Google through the MUSCULAR program, and so on. So what we can see, basically, is that last night, when we all accessed Der Spiegel to read this very important information, these companies were tracking us, and these companies have also worked hand in hand with those who probably don't want us to get access to that information. So maybe that's not such a great idea. Especially if we're ever going to argue again that, you know, they just do advertising.
So how do these trackers even handle our data anyway? What do they do with it? It's kind of hard to answer, because the real answer is that we don't know. And that is actually a problem. They track our IP address, our search history, our browsing history, our browser, our mouse movements. All of these companies collect a whole wide range of data. That's why we're trying to look at privacy policies, not because the companies necessarily do what they say, but because, unfortunately, that's kind of our best shot if we want to know what they do with our data. And also, by looking at the privacy policies we can compare them with what the companies actually do. So, for example, if they say in their privacy policies that they do not use certain data, cookies, or tracking technologies, and we figure out that they do, then that's where we can actually raise the discussion.
So what we have done is look at the privacy policies of some of the globally prevailing tracking companies. And what we're trying to do is collect the fields of data you can see on the slide, like what types of data they collect and so on. We have put this data in our repository on GitHub, which means you can all access it and you can all contribute to it. Please do contribute to it. There are a lot of companies and we cannot do it all by ourselves. And it would also be fantastic if we could get some lawyers to help us with it and do more of this kind of research as well.
So who among you reads the terms of service? Nobody reads the privacy policy, least of all that of a company behind a hidden third-party tracker. That is why we want to convert the privacy policy and the terms of service into a machine-readable version of the data.
So, looking at the privacy policies, and I repeat: on the one hand it's great that they have privacy policies; on the other hand, I don't know how useful they are, because it's not clear whether the companies actually do what they claim they do. But like I mentioned, it's kind of the basic information we can get right now. Well, what's interesting to see is that most of the companies are based in the US, which kind of shows again the US hegemony when it comes to infrastructure and collecting data. But more importantly, we can see that only three of them support Do Not Track, one of which is Twitter. And the other point is that only 11 out of the 25 that we looked at disclose how long they retain data for. But still, even for the ones that do disclose how long they retain data for, that leaves you with a question.
Because, well, they might say, for example, that they retain data for a few days, but that retention period might be renewed. We do not know how long they really retain data for, whom they sell data to, and so on. This is endless. [unclear] That's such a big question. So the point is that we do not know what happens to our data at the end of the day. We think that just by looking at privacy policies, we might get some type of insight.
Okay. And here is the API that we designed. It exposes all the data that is collected: the traceroute results and the reverse DNS, all the media visited and all the third parties that were injected, the GeoIP and the autonomous system information. There are JSON lists that permit the conversion between the domain name of a third party and the name of the company. And with this you have access to the complete database. The whole API is designed so that you can extract the complete raw data and add additional features at the same time, because we want to collect all the data globally, and we also want to be able to see whether something in the network has changed over time. You can look at the documentation under the link here. At the moment 26 companies are integrated into the system, but of course we are trying to add even more, so there is a job here for people who want to help.
So how can we prevent tracking? There is no simple way to prevent it. You can start with the EFF, for example. The EFF has Privacy Badger, which is pretty good. There are other ways to prevent third-party trackers. NoScript is a way to block third-party scripts. If you use Chrome, the best thing is to switch to Firefox. Disconnect is another option, and it can also visualize the trackers. So there are some tools to block these trackers.
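The JSON lists mentioned above, which convert a third party's domain name into a company name, need a lookup that also matches subdomains; a list-based blocker needs the same kind of lookup to decide whether a host belongs to a known tracker. A minimal sketch, assuming a flat JSON object from domain to company: the actual file format used by the project is not described in the talk, and the function name and sample entries below are purely illustrative.

```python
import json

# Hypothetical sample in an assumed format: {"domain": "Company", ...}
COMPANIES_JSON = """
{
  "google-analytics.com": "Google",
  "doubleclick.net": "Google",
  "yandex.ru": "Yandex"
}
"""


def company_for(host, domain_to_company):
    """Return the company owning `host`, or None if no entry matches.

    Tries the full hostname first, then each parent domain, so that
    "www.google-analytics.com" matches the "google-analytics.com" entry.
    The bare top-level label is never tried.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in domain_to_company:
            return domain_to_company[candidate]
    return None


if __name__ == "__main__":
    mapping = json.loads(COMPANIES_JSON)
    for host in ["www.google-analytics.com", "mc.yandex.ru", "wikileaks.org"]:
        print(host, "->", company_for(host, mapping))
```

Grouping third-party hosts by company in this way is what makes it possible to count how often one company appears across many media websites.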
Of course it is important to emphasize that it is always good to hide your tracks, for example by hiding your IP address or by other means. This is explained a little better on our website. You can see the website link here. There are also other approaches to improving your protection against this tracking, like for example an ad blocker, or separating your identities across different browsers.
We are coming to the end. The best way to contribute to Trackography is to work on the media lists and to help run the software worldwide. If a website is important to you, you can add it to the test. There is a Python script and a PhantomJS script. The source code is also on GitHub, along with the raw data. And if you want to help us with more than just the media websites, it would be wonderful if you could brainstorm other ways with us that would help the project progress.
So I don't know if you want to take questions. I'm sure there are many of them. I will be drinking tea and waiting for your questions.
Thank you so much for this very, very interesting talk.
Also, you can access Trackography at trackography.org. Feel free to access it and play with the map. Thank you.