And so, I have the pleasure of announcing the talk about Trackography: You Never Read Alone. As speakers, we have Maria Xynou, who's a researcher at Technical... I'm so sorry, Tactical Tech Collective. And there's Claudio Agosti, who's also with the Tactical Tech Collective. And apparently we're going to have a premiere tonight, because they're going to show us Trackography, a piece of software which illustrates where all the data that websites collect about us is going. The Saal is yours.
Hi, thank you very much for being here today. Claudio and I work with Tactical Tech, as just mentioned. Tactical Tech is an NGO based in Berlin, but we work internationally. And we are super excited today because we're launching one of our new projects, which is called Trackography, and it's all about online tracking. So, when we think about surveillance, very often we think about governments, and rightfully so, because a lot of it is carried out by them. However, in most cases, I think that companies, corporations, make a lot of that surveillance possible. A lot of our daily browsing activities, like reading the news online, are what enable governments to have access to this data so they can monitor us to begin with. So, for example, when we think about PRISM and a whole bunch of other systems used by the NSA and other intelligence agencies, we can see that a lot of their monitoring is possible because a whole bunch of companies like Google and Facebook collect that data through the tracking that they do on a daily basis. And this is something we would really like to draw your attention to through this project.
Basically, what we understand is that this whole world of data collection is creating a new sort of power around the world. And why did we create Trackography? Because basically we want to cover what data geopolitics is.
At the moment, if you know that one country has a large gas pipeline, and this gas lets you stay warm in the winter, you understand some geopolitical relationships. But with data, who is the owner of whom? Which countries depend more on others? These are the kinds of questions that were nagging at us, and Trackography tries to be an answer. It's a project intended basically for advocates, lawyers, analysts, researchers, so people who analyze this phenomenon. We are focused on media websites. By media website, we mean a website providing news, because if someone is able to understand and study how a population, a region, a country accesses the media, and what is most interesting, what is most accepted, what is most searched for, that is in fact a way to perceive and study the nation itself, the target itself. We also saw from the Facebook experiment that, based on what you are reading, your mood is influenced to be more positive or negative. And that is why the media are the subject of this test, which in theory can be applied to every website. So basically, we developed a piece of software, a script, that emulates the behavior of a user connecting to a website. This script performs HTTP connections using PhantomJS. PhantomJS is a browser that runs without a display window. Every time you connect to a media website, you get a lot of third-party content: the video commonly comes from a third party, or there is a banner that displays advertising, or some static picture, or some hidden tracker, like a JavaScript inclusion that just sets some cookie and reads your behavior. Those third parties are what our script automatically detects. Then we perform a traceroute, one traceroute for every included third party. This lets us figure out the network path needed to reach the included server.
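The detection step described above can be illustrated with a minimal sketch. It is not the project's actual script: it assumes the headless browser has already logged every URL requested while loading the page, and it treats any host outside the media site's own domain as a third party. The function names and example domains are hypothetical, and the "last two labels" rule for the base domain is a simplification that would misjudge suffixes such as co.uk.

```python
from urllib.parse import urlparse


def base_domain(url):
    """Naive base domain: the last two labels of the hostname (an assumption)."""
    host = urlparse(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def third_party_hosts(page_url, requested_urls):
    """Return the hosts contacted that fall outside the page's own domain."""
    own = base_domain(page_url)
    hosts = set()
    for url in requested_urls:
        host = urlparse(url).hostname
        if host and base_domain(url) != own:
            hosts.add(host)
    return sorted(hosts)


# Hypothetical request log for one page load.
requests_seen = [
    "http://www.example-news.com/index.html",
    "http://static.example-news.com/logo.png",
    "http://ads.tracker-one.net/banner.js",
    "http://cdn.video-host.org/player.js",
    "http://ads.tracker-one.net/pixel.gif",
]

print(third_party_hosts("http://www.example-news.com/", requests_seen))
```

Each host in the resulting list would then be the target of one traceroute, as described in the talk.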
And then we perform GeoIP, so a resolution from every hop in the path to a geographical location, in order to figure out which infrastructures are involved. At the moment, we have collected data from 30 countries in the analysis. We have a media directory in our GitHub repository with the media lists, because the media list changes over time and requires local knowledge to be tuned. So everyone can contribute by adding their own country. And we have more than 3,000 media websites actually analyzed, and some special media that are analyzed from every country.
So let's have a look at Trackography. As Claudio just mentioned, we collected data for 30 countries. We did this essentially by running the script in these countries, based on lists of media websites that we collected for each one of them. If your country is not blue on the map, don't feel discriminated against; it's just that we don't happen to have partners in that country, or we just don't happen to have someone who could run the script there. So as you can see, these blue countries are the ones that we have data for. So if we click on Germany, for example, since we're in Germany, on the side panel you can see that we have lists of media websites. By global media, essentially, we mean media websites which cover the news internationally and which are accessed globally. If we click on the Wall Street Journal, for example, what we can see now is what happens when we access the Wall Street Journal in Germany. As you can see on the map, the green country is Germany, the country we're accessing the Wall Street Journal from. The blue country in this case is the United States, which hosts the server of the Wall Street Journal. The purple countries are the ones which host the network infrastructure required to access the server. And the red countries, which in this case are the UK and the Netherlands, are the countries which host the servers of the companies which can track us when we access the Wall Street Journal.
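The colour scheme just described can be read as a small classification rule. The sketch below is only one reading of it, not the project's code: the function name is hypothetical, the country codes for the network path are made up for illustration, and a country that plays several roles simply keeps all of them, since the talk does not say which colour wins in that case.

```python
def country_roles(origin, media_server, hop_countries, tracker_countries):
    """Map each country code to the set of roles it plays for one media website."""
    roles = {}

    def add(country, role):
        roles.setdefault(country, set()).add(role)

    add(origin, "origin (green)")              # where the reader connects from
    add(media_server, "media server (blue)")   # hosts the media website itself
    for country in hop_countries:
        add(country, "network path (purple)")  # infrastructure on the route
    for country in tracker_countries:
        add(country, "tracker server (red)")   # hosts third-party servers
    return roles


# Wall Street Journal accessed from Germany, as in the talk;
# the hop countries are hypothetical.
roles = country_roles(
    origin="DE",
    media_server="US",
    hop_countries=["DE", "FR", "US"],
    tracker_countries=["GB", "NL"],
)
for country in sorted(roles):
    print(country, sorted(roles[country]))
```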
What's noteworthy in this case, essentially, is that these companies can track us because every single website, or at least most media websites, include embedded images and code which belong to third-party companies. And here we can see that we have a lot of unintended connections above. Yes. So the one intended connection is basically to the media website. The unintended connections, which in this case are 67, are the connections to the servers of third-party companies. That means that while we were planning to connect to the Wall Street Journal, in reality we are also connecting to the servers of third-party companies, which can track us and which can collect data about us in the process. If you select more media, you just sum up the visualization with the others. Right, so when you get access to this map, feel free to select the media websites that you access, the media that you read, either from the global list or, if you scroll down below, from the national list for your country or the regional list, which covers the regional news. And by clicking on every single one of these media websites, essentially, you can see which third-party companies can track you every time you access them, and not only which companies track you, but also where your data travels to every time you access these websites. Change. Now, the reason why we decided to show you this is because this is one of the very few examples, if not the only one, of a media website which you can access from Germany without anyone tracking you. Or at least, according to our results, there are no third-party trackers included. And this is WikiLeaks. So, again, our results show that when you access WikiLeaks, sure, your data does travel to Norway, because based on one of our results, the server of WikiLeaks is based there.
But as you can see, there are zero unintended connections, which means no third-party trackers, which I think is quite interesting. Change.
Besides studying which third parties are present in your navigation, it's also important to understand the network topology. Because every time you connect to another server, your connection passes through an infrastructure. And this infrastructure, if it's owned by someone who is actively analyzing your content, can be used to build a collection of your behavior or of the content you are exchanging. If the connection is encrypted, like in HTTPS, the third party in the middle has no power to modify, or to dump and collect, this traffic. But commonly, this kind of third-party injection is not encrypted. And we saw, with the revelation about Angry Birds, that this was one of the clearest examples, because Angry Birds was just a game with some advertising embedded, but the advertising server was running in the US. Therefore, the infrastructure needed to reach the server was touching the US. And the NSA had a good opportunity to monitor that traffic. That is why the network topology matters.
Another reason why the network topology matters is because you never really know who has access to these cables. You never really know who has access to the network infrastructure when you're accessing websites and when your connections travel through it. So this is an example from the Snowden leaks. This document illustrates the RAMPART-A program, through which the NSA collaborates with a whole bunch of third-party countries which provide it with direct access to the fiber-optic cables that make up the backbone of the internet. And these third-party countries, which include Ethiopia, Saudi Arabia, Tunisia, and so forth, are also hosting US equipment.
So this is just one example to illustrate that while you might say, okay, right, so I'm accessing this website, my data goes through, I don't know, Azerbaijan, for example, to reach the server, well, it's not a big deal. Well, actually it might be a big deal, because maybe agencies which you wouldn't want to have access to your data do have access to your data, because they're tapping into that particular network infrastructure.
And besides the tapping and passive collection, we have already seen that there are a lot of attacks that exploit this kind of communication. FinFly is one kind of implant that can change, on the fly, the download you are performing and put the FinFisher Trojan into your download. But this can also be applied to exploits. For example, if you're downloading Macromedia Flash content and your browser is vulnerable to some Flash exploit, the person in the middle can just substitute the content you are receiving in order to exploit your computer. FoxAcid was a similar attack, performed from the same kind of privileged vantage point.
So one of the reasons why we developed Trackography is because we wanted to think about the geopolitics of data. What does it mean today when, through the internet, your data literally travels to various servers all over the world? Now, if we look at a small comparison between what happens in the so-called global west and the so-called global south, one example here is where we are accessing three national media websites in Italy. And what we can see here is that Italy owns the servers of these media websites. Of course, the red countries, as you can see here, are the ones which are hosting the servers of the tracking companies. But what's important to note here is that there are no blue arcs, in the sense that they own the servers of their own media websites.
On the other hand, however, if we look at a country in the global south, like Nigeria, for example, what we can see here is that Nigeria does not host the servers of its national media websites. Instead, they're hosted in the USA, and the network infrastructure required to access them is based in South Africa. And the blue arcs essentially show where the data has to travel to every time they want to do something as simple as read their national news online. Now, I think one of the reasons why this is particularly interesting is because how can Nigeria, and all the Nigerias of the world, protect their citizens' data in practice when they don't even own their own infrastructure? How can they make sure that their citizens' data is actually protected in practice when they do not own the infrastructure and cannot control what happens to it? I'm not implying that we should have some type of, you know, autonomous system like China or so forth, but I do think this is something which we should think about carefully and maybe raise debates about.
Another aspect that deserves study is when you see someone investing in foreign infrastructure. For example, the former telephony monopolist in Italy, Telecom Italia, has made a lot of investments in Brazil. In this case it has gone to South America to create some network infrastructure. This means that in certain cases of the tests performed, the connection traveling from your Brazilian carrier to reach the US went through some Italian infrastructure. This does not mean that the connection actually goes under the ocean, goes to Rome, and then comes back to Washington. It's just that some Italian infrastructure is present on the ground in Brazil, and the IP addresses are associated with an Italian company. Therefore, the system recognizes the ownership and the entity owning the infrastructure.
Now, again, another example.
Given the political tension in Ukraine with regards to Russia throughout 2014 and the Ukrainian revolution in February, we decided to run the script in Ukraine. As you can see on the map, a lot of connections go to Russia. This is because, essentially, by accessing two of the main media websites in Ukraine, Pravda and Vesti, we can see that one of the main companies which tracks users is Yandex. And Yandex is sort of like the Google equivalent in Russia. Now, this raises a whole bunch of questions, but essentially what I think is quite interesting is that, regardless of whether Ukrainians are pro-Russian or not, it really comes down to this: do they really want to have a lot of their data ending up in Russia? Do they really want Russia to have access to it? And do they really want a company like Yandex, which likely works hand in hand with Russian intelligence agencies, to have access to the type of news they read and all their other browsing activities and, more or less, a lot of the things they do online?
When you click on a country that is coloured, you can see why it is coloured and which third parties are running there or which connections are passing through. Yeah. Wait, I'll close this here. Okay, but the goal here, when we started to collect the data, was not just to generate images and interesting things. We have created an API that permits every researcher to collect this data and analyze it. This is an example of the percentage exposure faced by every country tested. By percentage exposure, I mean that it is enough for one connection, among the many performed when you access a media website, to pass through a country for us to assign a presence to that country, because if that country is running some program that analyzes the traffic of foreigners in order to analyze their behavior, this needs to be taken into account.
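The definition of percentage exposure given above can be written down as a short calculation. This is a sketch under one reading of that definition, namely that the percentage is taken over the media websites tested and that a country counts once per website as soon as any connection touches it. The function name and the input data are hypothetical, not taken from the project's API.

```python
from collections import Counter


def percentage_exposure(countries_per_site):
    """Share of tested media websites whose connections touch each country.

    countries_per_site maps a media website to the country codes seen on
    any connection (hops and servers) made while loading it.
    """
    counts = Counter()
    for countries in countries_per_site.values():
        counts.update(set(countries))  # one connection is enough: count once per site
    total = len(countries_per_site)
    return {country: 100.0 * n / total for country, n in counts.items()}


# Hypothetical results for one test run from a Russian carrier.
observed = {
    "media-a.example": ["RU", "US", "GB"],
    "media-b.example": ["RU", "US"],
    "media-c.example": ["RU", "RU", "DE"],
    "media-d.example": ["RU", "US", "GB"],
}

exposure = percentage_exposure(observed)
for country, share in sorted(exposure.items(), key=lambda item: -item[1]):
    print(f"{country}: {share:.0f}%")
```

Running the same calculation separately for each autonomous system would show how exposure differs between providers in the same country.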
In this map, we see, for example, in Russia, one test run on a certain carrier that belongs to the autonomous system with this number, because every internet access provider has different contracts with other international carriers, and different international carriers bring different routes. And so we can see that in Russia, 100% of the connections pass through Russia, 85% touch the USA, 77% touch Great Britain, et cetera. And in this way, we can take a look at which countries are most present in our navigation, because we are seeing Nigeria, Italy, the Philippines, Germany, Austria, et cetera. For example, we have two tests in Italy: different autonomous systems mean different providers involved, and also a different exposure for the user. In this case, Europe (that is, the IPs associated with Europe without a more specific location) is present, then France, the USA as always, et cetera. And once we started to collect all the trackers present on the media websites, we thought we could declare the worst media website ever. But this is impossible, because we discovered that the number of trackers injected by a media website changes over time, and maybe even within the same day. At the moment, we are just analyzing the home page. But we see in this heat map, on the X axis, a certain number of media websites and, on the Y axis, the test runs. With D3.js, I'm a goat, so if someone can help us develop a better visualization, they are welcome. But when you mouse over a square, you can see how many trackers are present and when the test was performed. This heat map was intended to see whether some media websites, depending on the origin of the user, provide different trackers. And the answer is yes.
So when we access media websites, we're not only vulnerable because our data travels to various servers all over the world, which we cannot control, but also because a whole bunch of third-party companies, which we haven't given our consent to and which we don't know, track us in the process.
Here we have collected some figures which illustrate which are the primary companies that track us in each case. So we can see, for example, that in Germany, based on the media websites that we ran the script on, in most cases, in 88% of the cases, Google is the dominant company which tracks us. And if we scroll down, we can also see that Google is actually the dominant company in almost all of the countries that we ran the test in. Actually, out of the 30 countries that we have analyzed, in 28 of them Google is the main company which tracks users, in almost 90% of all cases, with two exceptions. One is South Africa, where Effective Measure is in first place. And the second is Russia, where Yandex does the most tracking, which is sort of like the Google equivalent in Russia, as mentioned earlier. But still, there's a very small difference with Google. So what we can see is that Google is the main company. And then again, this raises questions as to whether Google should have this type of monopoly and so forth. But then again, it's not a simple question, of course, because a lot of media organizations use Google Analytics, and that's one of the reasons why Google ends up tracking you. And then the question is what alternatives there are to Google Analytics, and whether we should work with media organizations to help them find alternatives, so that all data doesn't end up with Google in the end and we can bring some type of end to this Google hegemony. When we ran the script, we identified hundreds if not thousands of companies. Of course, in these last four months, we didn't have time to analyze and go through every single one of them. So we decided, for starters, to look at the so-called globally prevailing tracking companies. By that, we mean the companies which track you the most, based on the media websites that we analyzed.
So basically the companies like Google, which present the highest percentages, as illustrated in the previous visualization. By looking at their websites, we can see that their main business model is based on advertising and profiling. Profiling and web analytics kind of feed into advertising. And what a lot of people have said to us, especially over the last months when we've been working on this, is: I don't care if Google or Facebook or any company is tracking me for the sake of advertising. I mean, at the end of the day, they improve my web experience. They provide me with services and so forth. What's the big deal? That's an argument we've been getting a lot over the last months. It's supposedly kind of hard to tackle. But we think it's a bit more political than that, and here's why. So last night, Jake and Laura Poitras gave a fantastic talk about reconstructing narratives. I'm guessing most of you attended, right? Yeah? Cool. Okay, so following their talk, how many of you accessed Der Spiegel slash International to read the latest revelations about attacks on crypto, SSL, and targeted killings in Afghanistan? But can those who used Tor lower their hands? Because... Or actually, okay, so who accessed this while using Tor? Without Tor. Oh, okay, sorry. Who accessed this with Tor? Okay, so who accessed the latest revelations on Der Spiegel without using Tor? Oh, wow. Okay, so great. Well, not great, but actually what I'm going to show is for most of you. So after the talk, we thought, hmm, what's happening now that everyone's accessing Der Spiegel? So we decided to run the script last night to find out. So as you can see here, we've run the script on just spiegel.de, but there was also this Der Spiegel article, where we've run the script and collected results for the specific page of the revelations.
As you can see there, with regards to tracking companies, more companies tracked us last night when we accessed the specific page with the revelations than when we just accessed Der Spiegel in general. And as we can also see, some of these companies include Google, Twitter, and Facebook, and FYI, these companies have kind of been compromised by the NSA through PRISM. PRISM has been collecting data in bulk from these companies, as we know. We also know that the NSA has hacked into the data centers of Google through the MUSCULAR program, and so on and so forth. So what we can see, basically, is that last night, when we all accessed Der Spiegel to gain access to this very important information, these companies were tracking us, and these companies also work hand in hand with those who probably don't want us to get access to that information. So maybe that's just something we should think about, especially if we're ever going to argue again that, you know, they just do advertising. How do these trackers even handle our data anyway? What do they even do with our data? It's kind of hard to answer, because the real answer is we don't know, and that itself is the actual problem. When we say that they track us and they track our data, essentially what we mean is that they track our IP address, our browsing history, our search history, and the scrolling movements of our mouse when we access the webpage. So last night, for example, when we were reading Jake's and Laura's and Aaron's fantastic article, they could literally track the scrolling movements of our mouse, whether we copy-pasted something, and so forth.
But in addition to that, these companies collect a whole wide range of data from a whole bunch of other sources, and that's why we thought it might be interesting to look at their privacy policies. Not because they necessarily do what they say they do in their privacy policies, but because, unfortunately, that's kind of our best shot if we want to know what they do with our data, and also because, by looking at their privacy policies, we can compare what they say to what they actually do. So for example, if they say in their privacy policies that they do not use, I don't know, whatever cookies or whatever tracking technologies, and then we figure out that they do, then that's where we can actually, you know, raise the discussion with them. So what we have done is that we looked at the privacy policies of some of the globally prevailing tracking companies in order to collect the following fields of data, as included in the slide, like what types of data they collect and so forth. And this data, we have put it in a CSV in our repository on GitHub, which means you can all access it. You can all contribute to it. Please do contribute to it. There are a lot of companies and we cannot do all of it ourselves. And also, it would be fantastic if we could get some lawyers to help us do it and do more accurate research on that.
What we want to address with that is that nobody reads the terms of service, nobody reads the privacy policy, because they are in fact long and complex, and maybe there is also a language barrier. Maybe you are touching a company through a hidden third-party tracker that you are not aware exists; Lightbeam, the add-on for Firefox, shows these things. In this way, we can convert the privacy policies and the terms of service into a machine-readable format, and then apps or other visualizations can use that data to provide an easier visualization for the user, who can then be more aware.
So by looking at the privacy policies, and I repeat, I think personally that the...
I mean, on the one hand it's great that they have privacy policies. On the other hand, I'm not sure how useful they are, because it's not clear if they actually, you know, do what they claim they do. But like I mentioned, it's kind of the best information we can get right now. From their privacy policies, what was interesting to see is that most of them are based in the US. Well, that's not based on the privacy policies, but generally we still have most of them based in the US, which again is interesting. It kind of shows again the US hegemony when it comes to everything, even when it comes to infrastructure and collecting data and so forth. But more importantly, we can see that only three of them support Do Not Track, one of which is Twitter. Thanks, Twitter. And the other thing is that only 11 out of the 25 that we looked at disclose how long they retain data for. But still, even for the ones that do disclose how long they retain data for, that again is kind of negotiable, because while they might say, for example, that they retain data for 730 days, that data retention period might potentially be renewed. We do not know who they share data with, who they sell data to, how long they retain data for, and so forth. And it's this endless chain of third-party actors who eventually gain access to our data, and it's such a big mess. So the point is that we do not know what happens at the end of the day, and that itself is a problem. And we think that just by looking at privacy policies, we might get some type of insight.
Okay. And this is, basically, the API that we designed. We have a database that collects all the traceroute data, all the DNS resolutions and reverse DNS, all the media websites visited, and all the third parties that were injected. Then there are the privacy policies converted into a machine-readable format, and the MaxMind resolution for GeoIP and autonomous systems, this kind of resolution.
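To show what a machine-readable privacy policy makes possible, here is a minimal sketch that summarizes a small CSV. The column names, company names, and values are all hypothetical; the actual CSV in the project's repository may be laid out differently.

```python
import csv
import io

# Hypothetical sample; an empty retention_days means the policy does not say.
SAMPLE = """company,country,supports_dnt,retention_days
ExampleAds,US,no,730
ExampleMetrics,US,yes,
ExampleSocial,US,no,
"""


def summarize(csv_text):
    """Count companies that support Do Not Track and that disclose retention."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return {
        "companies": len(rows),
        "support_dnt": sum(1 for r in rows if r["supports_dnt"] == "yes"),
        "disclose_retention": sum(1 for r in rows if r["retention_days"].strip()),
    }


print(summarize(SAMPLE))
```

The same kind of summary over the real data is what yields figures like the ones quoted in the talk, such as how many companies disclose a retention period.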
From disconnect.me, there are JSON lists that permit the conversion between the domain name of a third party and the name of the company. And all this data is integrated in our database. In the future, it will be extended with new data. The goal of this API is to permit developers and researchers to extract this data and use it for analysis, et cetera. The whole goal of Trackography is to create a global observatory: global, because we want to cover every country, and an observatory, because over time we want to monitor the changes in the tracking business and in the exposure of the user on the network. You can find the link to the REST API documentation, and our privacy policies converted into CSV. At the moment, only 26 companies have been converted, because it's quite a human-intensive operation, but with a distributed crowd this can be easier.
So how can we block and circumvent online tracking? There's no easy solution, but for starters, we can all start with these. For example, the EFF has Privacy Badger, in beta, which is pretty awesome. That is one example of how we can block some third-party trackers. In this table, we include other tools which we can use, like NoScript to block third-party scripts, and what we can use with Chrome, though maybe it's best to switch to Firefox, but anyway. Or, for example, Disconnect, if you want to visualize third-party trackers and also block them. So this is just an example of some of the tools which can be used to block some online tracking. Of course, it is important to emphasize that using Tor on top of all of this is always great, to hide your IP address and so forth, in addition to what can be used to block the third-party trackers.
This slide is also better explained on the myshadow.org website. You can find the link here, at the end. The various approaches to defense are explained better on the My Shadow website.
We just point out that you can defend yourself by using multiple browsers, in order to prevent cross-website correlation. Anyway, we are going to finish. The best way to contribute to Trackography is to contribute to the media lists, because it is important to have a complete media list of the media accessed in every nation, and eventually also some politically sensitive media to test. Run the test when the media list is ready. There is a Python script that runs PhantomJS and JavaScript and then sends the results to our server. It is open source on GitHub, and you can be part of it.
In addition to helping us review media lists around the world, if your country is missing from the map and you would like to see what tracking is going on there, you can run the test there. It would be wonderful if you could meet us and brainstorm other ways in which we can deal with this issue and improve the project and so forth. It is still a work in progress, so any ideas are welcome.
I am not sure if we have time for questions. I don't think so. I am so sorry. I am sure there are many of them. I have some myself. Thank you so much for this very, very interesting talk.
Sorry. Also, you can access the project at trackography.org. Feel free to access it and play with the map. Thank you.