All right, so thank you for being here to begin with. We're going to talk today about devices that track people. And these are really almost every device that you use, from your mobile phone to your personal computer to wearable devices, and the list goes on. You can send me an email if you have questions, or you can use the question and answer slot to ask me a question after the talk. And you can follow me on Twitter if you want. So my name is Silvia. I am a software engineer and PhD candidate in Barcelona in telecommunication engineering. I research mostly in privacy and web science. And I wanted to talk about the real dark web and what I mean by that. So what is this all about? I'm going to talk about marketing to begin with, privacy, user tracking, online footprint, identity, and control. Why marketing? We're going to find out in the next few slides. So if you think about the companies that started in the 90s with the new economy and so on, they based their business to begin with, or they sustained their business for a while, only on advertising. And so if you think of Facebook, if you think of Google, if you think of Twitter, and Instagram, and so on, they let you know in the very fine print of the user agreement that you signed that they sell your data in one way or another. So the scope of this is kind of massive. And the actual objective of advertising is for you to buy products. And for you to buy products, they need to suggest products that you might be interested in. And so they want to know as much as they can about you so that they can recommend products that you're more likely to buy because you're already interested in those things. And the thing is that they collect a lot of data about people. And this is personal data, and sometimes it's very personal and very sensitive information. So the way this happens is that they basically track things that you do. So these are your online activities.
But since online activity is so linked with offline activity, sometimes it's also your offline activities. So when you subscribe to a meetup that you like, they know the topic of that meetup. It can be technology, it can be skateboarding, it can be theater, literature, whatever it is, they know about it, for example. So it's not just something that stays in the online world, but it goes into the offline world. So your real life. And this information is crawled, it's analyzed, it's indexed, and it's always available to the companies that collect it. So what about online privacy then? Do we have online privacy? Do I have to speak closer? Yes. OK. So if we are always tracked and our actions are logged, does it mean that we're always controlled and some entities always know what we do? And have we lost the right to be anonymous when we want to? So the thing is that when I talk about privacy with friends or family or people that are not concerned about privacy, they always tell me, there is nothing that I am afraid that people can know about me. It's like, I'm not interesting enough for them to look at me, or I have nothing to hide. And this is not true, because privacy is a fundamental human right, and it's documented in the UN charter. And it's kind of controversial, because up to now the right to privacy has been the right to informational self-determination. This means that if I give you my data and I give you consent to use my data, that is fine. And you can use it. But with online data, the thing is that I can give you access to my location, for example with a running application, but do I know when that location information is used and what for? So if I don't know what you use this information for, it doesn't mean I gave you my consent to use it. So there is also another thing that is sometimes said, that privacy is a right to be forgotten.
Because people don't care about privacy and people want to share their lives and they want to put the photos of the food they eat on Instagram. So the information that is shared online is actually a lot. And in 2001, it was estimated that 75% of the information that is on the internet is user created. And that's personal information. So there was a kind of comic strip a while ago. I think it was in the 90s in the New Yorker. And it said, on the internet, nobody knows you're a dog. But actually, this is not true at the moment, because they know everything about you. So sometimes there is this idea that is presented about the dark web. There is the web that is seen by search engines, and there is this part that is scary and is not known. And this is usually associated with Tor. But actually, the dark web is any page that cannot be crawled because it's protected by a password, for example. And it is kind of strange that they describe the dark web like that, because Facebook is also protected by a password and not all profiles can be crawled. But it's not part of the dark web for some reason. So the dark web is the web that companies cannot reach or control, or they cannot track data on it, basically, because they don't have access to it in order to crawl it. But if we think about this picture in a different way, we can see that the web is the services that everyone uses, like cloud services, email, shopping, music, maps, whatever. And then there is the data that is crawled and shared by these services. So this can be called the dark web of marketing, and we don't know about it. And I think this is the dark web, because it's something we don't know about. We carry around devices, and we have no idea what they're doing. And the thing is that we are perfectly fine with it, because we are OK with using these devices and we don't question the way they access our information. So this takes us to the idea of metadata and what it is.
So there was a lot of talk about it last year, and metadata is basically structured information. And it can be collected about online content, offline content, telephone calls, whatever you think about. You can have metadata about it because, especially if you're a programmer, it's nothing different from having an object and having something that describes this object. Like a car can be described by the parts of the motor or the parts of the body of the car and whatever. So it has always been used by websites in different formats, and not only by websites but by any application, basically. Like, XML is a very old format. It's not new, and it was not invented by the NSA or anyone, and so is JSON. And to give an example of how this is embedded into websites, I thought of Google conversion tracking. So basically it's a set of tools that Google allows websites to install, so that if you have a campaign on AdWords and you want to know how many people you have converted into potential customers, you don't use your own telephone number on the ads and on the page but a Google Voice number, so that you actually know that those calls were routed through the ad, basically. And when I think about this, I always think of the Yellow Pages. Like maybe 15 years ago or more, companies used to put ads in the Yellow Pages. And they might use a different number so that they knew that the calls to that number were coming from the Yellow Pages ads. And that was fine, because the information between the people that called the company and the company itself was contained between these two entities. But in this case, if you're calling about something that you think is kind of personal, like some doctor or something related to your health or to your personal beliefs, that information is also shared with Google, which is a third party, with Facebook, or with someone else. So yeah, that was the explanation, basically, of how it works. Then there is another kind of metadata.
And for example, I was looking at my phone's network logs. And every couple of minutes, there are some things that are sent to a remote server through an HTTP call. And these include network location, for example, or a push of my contacts and so on. And I don't mean only the services that I have directly authorized, like, for example, Gmail pushing my contacts into my Gmail contacts. There are also pushes that, for example, Samsung does, or other applications. And the thing about this is that because this data actually goes encrypted now, you have no idea of what is actually being sent. Because in order to look at this data, you have to go through a very complex procedure of doing some kind of man-in-the-middle attack on the data to see what your device is sending on your behalf. And then there are also other kinds of devices that send data. And these are wearable devices. They track your weight, the amount of time you've been walking every day, your sleep, for example. And I actually have lots of friends that are technical. And they like using these devices. But they never think that there is a company that knows how many hours per day you sleep. And if you've been drinking, your sleeping patterns are kind of different, or whatever. And I have started to read articles about how this information can be used, for example, by health insurance companies to say, OK, you have cancer or you have had a heart attack, but we are not going to pay for your care because you haven't walked enough in the last five years, things like this. And then there is stuff for productivity. Headsets that boost productivity. And I'm starting to find articles about it. And then there is Glass, of course. And I mean, Glass is basically a cam streaming constantly on the internet, because you don't know if it is sending video or not. I mean, in theory, you know it is not. But how do you really know it is not?
And they also have information about your blood pressure, for example, or whether you're looking at a shop window or not. And they actually want to do that. They really want to do that. I was at a tech talk a while ago at Google. And they were explaining how this can be important to push a new kind of advertising, in which, if a shop has changed its windows, they kind of know if people liked it, whether they stopped, and the way they reacted to it, by looking at the way the heart beats, basically. And I mean, I wouldn't like that. And then there are other wearables that are on the market, I mean. And it's just the beginning. And then there is website advertising again. And this is just a call to DoubleClick. And it has some keywords embedded. For example, if you look at it, it's like neurology or psychology, anxiety. I mean, I'm looking at the website because maybe I have a health condition or something, or someone in my family does, or not. And this is recorded. And it's also recorded through the keywords that the website embeds. And this has always been this way. And so because this happens with HTTP connections, every connection that you make in the browser has some selectors associated with it. The picture is kind of small. But basically, let's see if I can zoom in on it. No, no more than this. Well, it says, for example, that this connection goes from gating forum to Google Analytics. And they tell a couple of things about you. So for example, here there is the screen resolution. And the screen resolution, for example, used in combination with other things, can be used to uniquely identify your browser on the internet. Because if you match that with the browser plugins you're running and so on, it can be used to create a unique identifier for your browser, even if you're not logged into any service. But we'll see this in a bit. So this idea of connected data and metadata takes us to something that is being discussed mostly in the API community.
And it's about hyperdata and hypermedia. And so when the web started, basically, it was about this concept of the hypertext. And the hypertext was linked to other hypertexts. Yes, exactly. And there was a relationship that the link kind of expressed between two pages that were connected. And this was used, for example, by search engines and so on to kind of build a model of web pages. And this concept is actually evolving from the web of web pages and hypertext to the web of data and interconnected data. And this data is basically anything. And it can be used also for personal data. And this is what happens, actually, in RESTful architecture. And RESTful architecture is basically the architecture of the web. And it's actually a way to represent web pages in a way that you can abstract over them. And they're not web pages anymore, but they are resources. And there is a URI that identifies each resource. And this representation abstracts completely over the protocols that are used or the services that are used. So it's actually only the resource and the representation of the resource. There can be any way to represent data. And the way to identify this resource is the URL or URI. And whatever is behind this is kind of connected through the server with a REST connector, so that the client doesn't know about it. And this is actually what happens when we interact with data, basically, on the web at this moment. We have an interface that is uniform. And we request the resource through the URI. And we get the representation of the resource. It can be a page. It can be JSON. It can be XML. And we use that. And we have the resource that is linked to other resources. And we surf through them. We are starting to see this with hypermedia APIs and so on. And so why this model for privacy? And why is this interesting? Because up to now, the information about users that the web tracked was just a record in a database, for example. It was just a log message.
OK, user one has done this on this page. But now everything is starting to be structured. And the way the information can be mined is a lot more efficient. And it's a lot easier, basically, to track users following the same principles with which you would analyze web pages. This is something that, for example, search engines have been doing for the last 10 years or more, 15 years, probably. So if we look at the graph again, we see that every action has a selector. It can be a keyword. It can be the resource where the action started. Or it can be anything about it, really. And everything makes that action more unique, because it can be searched more efficiently. It can be analyzed more efficiently. And it takes us to this idea that everything that we do is connected in a hypergraph. And it's just a graph. It's just a graph where every node is connected with a set of edges that are the links to other entities. And this is what happens with web pages. And this is what happens with personal data at this point. So you have your identity. And you have the web pages that you visited and the actions that have been taken on those pages. And you have your posts on Twitter, your posts on Facebook, your friends, your connections. And everything is connected to you, basically. And it's very easy to analyze. And this is an example of how it was analyzed. It was some document that was leaked in the Washington Post a few days ago, I think, a week. And they used the Google PREF ID that is set in a cookie to actually identify targets online. And so that's how easy it is. And I mean, anyone can analyze data every time you visit a website. And at the moment, it's happening mostly in JavaScript. But if any of you work with web applications or mobile applications, you might use a set of tools, like third party tools, like Google Analytics or New Relic, for example, Mixpanel, and so on. They're used to track events. And this happens in the back end of an application.
So in this case, you cannot know what the application is actually tracking, because it happens in the back end once you're logged in. So you're not even able to look at it as you do with JavaScript or HTTP calls, because it's the server that makes those calls and sends the data that they want to track. So I was suggesting this link, because it's a study from the EFF done in 2011, I think, quite a long time ago, before this started to be mainstream, about how you can use some statistical measurement of how unique your browser actually is. So to answer that question, how unique is a footprint? We can look at a set of different things. You can start profiling your activity and see what kind of profile you're sending through the network, what kinds of keywords you're sending. For example, you can calculate how many bits of information you introduce every time you add something unique to, for example, your browser profile. Or you can start analyzing how many unique features you are sharing across the network. There are studies, for example, that by looking at your Twitter profile and your tweet frequency across the week or across the hours can know if you're tweeting more at work or tweeting more at home or tweeting more on the weekend and so on. Applications that know that their customers are more likely to use the application on the weekends increase, for example, advertising during the weekends, or they promote special offers. And so these are, for example, applications that are selling, I don't know, video streaming services like movies, or magazines, books, and so on. They know that because you're busy, you're probably likely to do these things during the night or during the weekend. So they kind of change their strategy following these patterns. So when you profile over a set of categories, what you do is basically count every time the category is expressed and put that in a histogram. I mean, it's nothing statistically very advanced.
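As a minimal sketch of the two measurements just described, the counting of categories into a profile and the bits of information that a single browser feature contributes, the following Python can be used. The function names, the category labels, and the 1-in-1024 share are all made up for illustration; none of them come from the talk or from the EFF study.

```python
import math
from collections import Counter


def category_profile(visits):
    """Count how often each category appears and divide by the total."""
    counts = Counter(visits)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


def surprisal_bits(share):
    """Bits of identifying information in a feature value.

    `share` is the fraction of browsers that show the same value
    (a hypothetical input here; a real study would measure it).
    """
    return -math.log2(share)


# Hypothetical browsing log, one category label per visited page.
visits = ["computers", "computers", "health", "computers", "news"]
profile = category_profile(visits)

# A feature value shared by 1 browser in 1024 carries 10 bits.
bits = surprisal_bits(1 / 1024)
```

The rarer a value is among all browsers, the more bits it adds, which is why a handful of unusual settings can be enough to single out one browser.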
You just count and you divide by the total and you have the histogram. And you can do it with, like, I did this for myself and I was looking at stuff, and you have the spike in computers, but somebody else can have a spike in health, for example. And that might be used for any particular reason, I don't know. So this was a study that was done by the EFF a while ago, and I mentioned it before. And it used things like the user agent of the browser, the headers that you send every time you visit a web page, the plugins, the time zone, and so on. And it calculated how unique each feature actually is within the set. I mean, the cloud is just a collection of data centers belonging to some companies. And when, I mean, 15? Oh, OK. So when we think about this, we think, OK, we are giving our data to somebody else and the data is safe, basically. But, I mean, who owns the cloud? I mean, even if we use different services, sometimes the infrastructure behind the cloud is owned by somebody else. So these are the biggest players in cloud computing infrastructure. And it's only five, even though, for example, there are a lot of services that use Amazon. But they're still on Amazon. Yeah, and this is basically by revenue for 2013. And still, there is a very small number of players that actually have cloud infrastructure and are not just reselling services. And if we go beyond cloud providers, this is the website that shows the cable map for submarine cables. And all these cables, they're not public. They are owned by companies. And over the way the data that travels on these cables is treated, we actually have no control, because we don't know. So I started thinking, what about mobile communication providers? Because sometimes mobile infrastructure, especially in Europe, is deployed with public funds, with a lot of help from governments, but then it's used by companies that can actually share that infrastructure.
But so if they share their infrastructure and they actually haven't deployed it, who owns that infrastructure in the end? Or who has made it? And I mean, built it. And the number of companies, again, that build telecommunication infrastructure on the market is not that high. It's actually three. And it's Huawei, ZTE, and Ericsson, exactly. And so it's, again, three companies, mostly. So there was this quote that I read a while ago about Snowden and this idea that the internet was free. And then it wasn't free anymore. And the thing is that the infrastructure that we have been using has always been the same. I mean, the internet was built centralized, with a series of layers of control that are there by design. I mean, if we look at the protocols, if we look at the way it was built, it was built with the idea that there was someone routing the information or controlling the information, providing the infrastructure. It was top-down, in a way. And that is understandable, thinking about how electronic systems were built a while ago. But if we want that freedom back, I think we've started to do research in open data and we've started to do research in open source software. But we haven't done research in open infrastructure yet. And I think that's where we should go in order to have more access to and more control over the infrastructure that we actually use. And another thing that I think we should do is collaborate with researchers outside of the telecommunication industry. Because there are a lot of people that want to do stuff for privacy from the political point of view, from the social point of view, from the point of view of the law. And they sometimes don't know where to start, and they don't have all the information. And the last thing that I suggest is to be mindful about your footprint. Like, if an application wants to know your sleeping patterns, I mean, you should question that. And that's it. Thank you, Silvia.
I think a lot of people appreciated your talk. Are there questions? We have some time left for questions. Actually, there are two interesting points that you brought up that I'd like to explore a little further. You mentioned, of course, people tracking fitness and sleep. Are you aware that in the US right now, there are three court cases that are using the Fitbit as part of their subpoenaed evidence? One of them is an injury case. They want to demonstrate that the person is as injured as they say they are by using the fact that they can no longer exercise. But the other two are divorce cases, where someone is trying to claim, using the Fitbit tracker, that a person was not where they said they would be. I actually read something about it. I read it was used in court cases. I didn't know the actual cases. And the thing that I read mostly was about health insurance companies. Yes. The second thing was regarding the mobile phone. It was recently revealed that both Verizon and AT&T in the US were adding an additional tracking signature to everyone who used the browser on their mobile phones, so that they could then get a complete record of every site that you've gone to once you leave their connection. And AT&T has claimed that they are no longer going to use it, but Verizon has made no claim at all. They're going to continue to collect that information. Well, the thing is that it's very easy to add a parameter to a URL. I mean, once the call goes through their service, they can do whatever they want with it. And they have the information anyhow. And it's very difficult for us to find out that they're actually doing it. OK. Thank you. The work you're doing is great. Thank you. So, here's another question. Yeah, thanks. My question is, what could we do to reduce our web footprint now? I mean, should we use the Tor Browser Bundle all the time, I mean, in a virtual machine? The question is what we can do about the footprint and if we can mitigate that with Tor.
Yeah, I mean, the Tor Browser Bundle is, like, unrecognizable. I actually ran Panopticlick with Tor. And OK, so with my usual setup, the bits of information that I was sending were about 22. And with Tor, it was getting down to 12 using the Tor bundle. Still, if you keep browsing with the same bundle, although your location might change and so on, there is information that you send, for example, about the cookies that you store, or that you decide not to store if you need to. And your profile, basically, the things that you've been surfing. I mean, after a while, your profile becomes part of your footprint. So that's the thing. Thanks. Thank you. Go. Now, your question, please. OK, so sometimes it's not practical to use the Tor Browser Bundle, because it's just not fast enough with the rerouting. Do you have any other recommendations one could use to stay a bit low profile? The thing is that once you log in or you accept cookies, there isn't much you can do. I mean, you can stop any JavaScript calls, for example, to third parties. And you can do that with a number of browser extensions. But if this is something that is happening in the back end of an application, you have no way to know what information has been collected. And with mobile phones, it's even worse, because even if you don't use the browser and you use applications, even when you do not log into anything, into an account, they use the device ID or they generate an ID for that device in order to use it to track you. So maybe a follow-up on that one. Like, how would a web browser have to be designed to actually prevent these kinds of things from happening? Or maybe, is it possible to design a web browser in such a way that it's much harder to track you? So the question is if it's possible to design a browser that is harder to track. There are privacy extensions that can help to block calls and can help you say, OK, every time I visit, for example, the New York Times, I make these calls to these servers.
And these servers are located in this location and are sending this data. And I can say, block this. And you're probably blocking some ads and so on. But again, it's not as if everything happens in the client. As a client, when you use a client, you can protect yourself. But as soon as that goes into the server, unless you always kind of send forged information, like you always change the location through a proxy and you always send bogus keywords, so that if I'm searching for something, for example, basketball, I'm also sending a set of keywords through other websites like business, economics, and religion, whatever. Unless you do this all the time, there is always a way that they can track you. Because the infrastructure is designed to collect data. And the development of big data is about data points that users have generated. And these can help them analyze stuff, I don't know, from whether the application is used more from London or New York, or, I don't know, whether one magazine is read more than another one, or one article in the New York Times is more interesting than another one, you name it. You can do it, basically. Thank you. OK, then there's one question from the internet and another in the room. The first one from the internet. OK, would it be any help to randomize all possible tracking variables, maybe by an HTTP proxy that modifies the relevant content? So the question is if you can randomize tracking variables with an HTTP proxy to avoid this. You can certainly use it, but you should also be aware that your online experience changes. And again, you cannot use that all the time when you use, for example, your mobile or things like that. So anti-tracking technologies exist, and they're starting to be developed. But in my opinion, they're only in the client. And we also need an effort from the social point of view, from the political point of view, to say there is only so much you can do with personal data. Thank you. Thanks.
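The bogus-keyword idea from the answer above, where a real search such as basketball is hidden among unrelated topics, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `with_decoys` function and the `DECOY_TOPICS` list are hypothetical, and the sketch only builds the list of queries; it does not send anything over the network or modify any HTTP traffic.

```python
import random

# Hypothetical pool of unrelated topics used as noise.
DECOY_TOPICS = ["business", "economics", "religion", "gardening", "opera"]


def with_decoys(real_query, n_decoys=3, rng=None):
    """Return the real query mixed with n_decoys unrelated ones, in random order."""
    rng = rng or random.Random()
    # Never pick the real query itself as a decoy.
    candidates = [topic for topic in DECOY_TOPICS if topic != real_query]
    queries = rng.sample(candidates, n_decoys) + [real_query]
    rng.shuffle(queries)
    return queries


# A seeded generator makes the example repeatable.
queries = with_decoys("basketball", n_decoys=3, rng=random.Random(0))
```

As the answer points out, this only helps if it is done all the time, and it changes nothing about what a server records once you are logged in.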
OK, then that's for the moment the last question, over there. Thank you. I have a question concerning the identity hypergraph, and whether you have any information on whether there's any debate on the strategic value of that, since in the US, at least, there is a process called the National Strategy for Secure Transactions in Cyberspace, where they basically want to create a marketplace for identity services. And I wonder if the vendors that you have knowledge about participate in that discussion, and what place the hypergraph will have in that? So the question is about the hypergraph and security services. Well, the political value or the strategic value of that, if you're aware of any discussions or if you have seen anything that points to it. OK, and thanks. So the idea of the hypergraph is part of my PhD research and thesis, and I use it to analyze data, not just to query a database, but to be able to say, OK, if this person was interested in psychology, maybe they're also interested in anxiety, or they also did something with that. And the political idea behind this is that if you know what you've been tracked for, or if you know what information you're sending on the internet, you'll be able to say, OK, I want to do something about this. So the idea is that if you posted, for example, something on Twitter, Facebook, or whatever that says something about you that you don't want, like you accidentally post something like Xanax or something like this, you are able to remove it and to know that you actually did it. And for example, if you know that because of a number of connections that you have on Facebook or on Twitter, you're actually more likely to appear to be of one religion as opposed to another, or you're more likely to appear gay or straight or whatever, you might not want that to be shown. And you might want to recognize what is actually being revealed about you by your activity.
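A minimal sketch of this kind of analysis, assuming a plain undirected graph rather than a full hypergraph, might look like the following Python. The `IdentityGraph` class, its method names, and the node labels are hypothetical and are not taken from the thesis; the sketch only shows how linked items can be followed outward from an identity and how removing one node cuts off what was reachable through it.

```python
from collections import defaultdict


class IdentityGraph:
    """Undirected graph linking an identity to posts, pages, and topics."""

    def __init__(self):
        self.edges = defaultdict(set)

    def link(self, a, b):
        """Connect two nodes in both directions."""
        self.edges[a].add(b)
        self.edges[b].add(a)

    def revealed(self, node, depth=2):
        """Return every node reachable from `node` within `depth` hops."""
        seen = {node}
        frontier = {node}
        for _ in range(depth):
            reached = {n for f in frontier for n in self.edges.get(f, ())}
            frontier = reached - seen
            seen |= frontier
        return seen - {node}

    def remove(self, node):
        """Delete a node together with all of its links."""
        for neighbor in self.edges.pop(node, set()):
            self.edges[neighbor].discard(node)


graph = IdentityGraph()
graph.link("me", "tweet:1")
graph.link("tweet:1", "psychology")
graph.link("psychology", "anxiety")

before = graph.revealed("me", depth=3)
graph.remove("tweet:1")
after = graph.revealed("me", depth=3)
```

In this toy example, one post is the only path from the identity to the two topics, so removing it leaves nothing reachable.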
So the idea of the hypergraph that I had was that by analyzing the information that I share, I'm able to know, OK, I need to delete this. And if I delete this node, I also delete all this information about me. Politically, I don't know if this discussion is happening. I know that, I mean, the thing is the way the NSA and the leaked documents talked about metadata. There was something about a node and selectors. And that reminded me of a graph structure and a link. Actually, when they say selector, I thought of a link, and that was it. I don't know if I answered the question. Thanks. OK, I don't see any more questions at the moment. If anybody has another question, you can just shout now or wave. Doesn't seem to be the case. So thank you again for this talk.