So it's a full room for a first talk. That's nice. Hello, and welcome to the first talk of the decentralized dev room. Just a quick reminder: please, when you leave the room, keep it clean. There's a trash bag over there, so put your empty bottles and glasses in it. So we're going to start with our first talk. Please welcome Santiago Saavedra and Konark Modi for "Watching them watching us: web extensions exposing privacy leaks".

Well, first of all, thank you all for coming here, and thank you for having us. This talk is a bit of an experiment. It came about because both the Local Sheriff project and Dracula answered the call for participation of this dev room, and we were asked to merge our talks into a single, decentralized one. In this dev room that made sense, because the two projects come at more or less the same problem from two different approaches, and I hope the result is good for all of you. So what is this problem about? Surveillance. It's prevalent: we all know there are lots of trackers on the websites we visit every day. For example, according to data from WhoTracks.Me, Google is embedded in more than 80% of the web pages we visit in Europe. There are a lot of actors and a lot of companies involved in this issue. But why does this happen? How does it work? Well, on one side, application developers and website owners want to outsource some of the non-core parts of the application or service they're building to other companies, right?
For example, to build analytics, A/B testing, recommendations or advertising. Some of these are legitimate use cases for businesses, and they're based on the capitalist model we're all familiar with: website owners pay these third parties to provide them with services. But others, instead of doing that, use business models built on surveillance capitalism. They make money out of you instead of out of the website owner, and the owners may even receive part of the revenue. How does this work? Well, every time you visit a web page or interact with a digital object that's connected to the internet, you generate a trail of data. That trail of data is aggregated and brokered by these third parties in a data market, where other actors grab it, aggregate it and build profiles of you. From those profiles they derive behavioral analyses of people, and those analyses feed back into the system. At some point the profiles are so good that you can predict what will happen to a crowd of people, and when that happens you may be able to predict real-life outcomes, like the results of an election. So we think this is a very interesting problem to tackle, and that we have to tackle it with a holistic approach. Here is one of the challenges we're going to explain. Imagine you're planning your next trip.
For example, you want to go to a free software meeting. You may have come by train or by plane. Either way, you get a booking ID back when you pay for your ticket, and with that booking ID you can get into the reservation to add your passport details, your full name or something like that. That page may have some trackers embedded, pixels that let third parties know that you visited the booking page at a given time, and so on. The thing is that in many cases these pages include the authentication token, or other details, that end up reaching these trackers. The trackers can then also see your personal information, including your full name, your passport information, your home address and so on. This is exactly the issue Local Sheriff focuses on. So I will now give the stage to Konark to talk more about it.

Before getting into what Local Sheriff actually tries to solve and how, let's try to understand the example Santiago shared. Basically, URLs can carry private information as query parameters, or they can carry authentication tokens that lead you to a page with private details. We call these leaks "telltale URLs", because these URLs are also being shared with a lot of third parties. One example: if you've ever donated to Mozilla, after you finish your payment you get redirected to a thank-you page. All the information you entered during the payment gets added to the URL itself: your email address, the currency, your country and the amount you transferred. This page loads a third party, in this case fonts.googleapis.com, and the URL is not cleaned, so the same information is sent to fonts.googleapis.com. And that's not the only third party present on this page. There are about seven third-party domains
present on this page. Most of them are legit, some maybe not, and we really don't know what happens once the URL leaves the user's machine. What we do know is that this information is being sent to various third parties. Another example is a very popular site called trainline.eu, which helps you book train tickets. After you finish the booking, you get a URL with an authentication token. The URL itself is clean, but anyone with access to this URL can potentially access the user's booking: delete it, cancel it, change some data, and do everything a legitimate user could do with their booking. The same concern applies here, because this site also has a lot of third parties, in this case Facebook. First, the leak happens through the Referer. Second, the Facebook and Google Analytics scripts on this page read the URL itself, don't clean it, and send it back home. And it's not just Facebook and Google: there are about 17 third-party domains belonging to five different companies on this page. So your booking details, which were supposed to be yours alone, now sit with 17 different third-party domains you didn't even know about. Here's another example: FlixBus does the same thing. They put an auth token in the URL, which is leaked to fonts.googleapis.com. Then there's the case of Lufthansa. If you're traveling internationally, you are required to enter your passport information during check-in. Lufthansa was not cleaning the URL, which was shared with a lot of third parties, giving them access to the user's passport information, date of birth, etc.
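The Trainline case involves two leak channels: the browser attaching the full page URL as the Referer, and a script copying the URL into its own request. The minimal Python sketch below checks which third-party hosts received a token from the page URL, and through which channel. It is not Local Sheriff's code, which is a web extension built on the browser's webRequest API. The `Request` record, host names and token are hypothetical, and treating only the exact page host as first party is a simplifying assumption.

```python
from dataclasses import dataclass, field
from urllib.parse import unquote_plus, urlsplit


@dataclass
class Request:
    """A simplified, hypothetical record of one outgoing request seen by the browser."""
    url: str
    headers: dict = field(default_factory=dict)
    body: str = ""


def _contains(text: str, secret: str) -> bool:
    # Check both the raw text and its URL-decoded form (e.g. "%40" -> "@").
    return secret in text or secret in unquote_plus(text)


def leak_channels(page_url: str, secret: str, requests: list) -> dict:
    """Map each third-party host that received `secret` to the channels it arrived through."""
    page_host = urlsplit(page_url).hostname
    leaks = {}
    for req in requests:
        host = urlsplit(req.url).hostname
        if host == page_host:
            continue  # simplification: only the exact page host counts as first party
        channels = set()
        if _contains(req.url, secret):
            channels.add("url")      # e.g. a script copied the page URL into its beacon
        if _contains(req.headers.get("Referer", ""), secret):
            channels.add("referer")  # the browser attached the full page URL as the Referer
        if _contains(req.body, secret):
            channels.add("payload")  # the value travelled in the request body
        if channels:
            leaks.setdefault(host, set()).update(channels)
    return leaks
```

With a hypothetical page `https://booking.example/confirm?token=abc123`, a font request whose Referer is that page would be reported as `{"fonts.example": {"referer"}}`. An analytics beacon that embeds the percent-encoded page URL would be reported under `"url"`.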
In the Lufthansa case, someone else could even upload a fake passport, and then when I'm traveling I might get caught up in some bureaucratic mess at the airport. Again, about four third-party domains present on this website get the exact URL that leads them to my booking and passport details. Spotify is another example, if you used the Spotify desktop app last year until March. If you logged in to the Spotify desktop app and clicked "My account", it would open a URL in your default browser. You didn't need to log in again in the browser, because the URL carried an OAuth token that led you to your profile details. What was happening in this case was that they had some trackers on this page and were not cleaning the URL. So all those trackers were getting this OAuth token, which potentially gave them partial access to your account. The details we're talking about here are your date of birth, whether you're a paid customer, and, if you are a paid customer,
the last four digits of your credit card number and which company your credit card belongs to, right? There were about 25 of these trackers on that page getting the same details. Similarly, many of you have ordered food online, and a lot of these services behave the same way. I'm going to talk about foodora.de here. The prerequisite for home delivery is that you enter your address. Some websites geocode the address into latitude and longitude values and then put that information in the URL itself. Because the URLs are not cleaned, these values are shared with a lot of third parties. In this case, my home address is shared with 18 third-party domains when I order food online. I could go on and on about these examples, but that's not what I'm going to talk about today. In a nutshell, a lot of these URLs are shared every time you transact on the internet. It is not just about you being tracked across multiple domains. It is also about websites giving their customers' data to third parties when it isn't needed in the first place, and giving those third parties potential access to critical data. More worrying is that user consent is never there. We tested one website that was leaking such telltale URLs to some companies. Its privacy policy was quite outdated, and far more hostnames were actually receiving data than were listed on the website. To make matters worse, the websites themselves don't realize they're sharing these URLs with third parties. For example, here is a case from emirates.com. Their privacy policy states that you should never share your booking reference and last name with anyone else, because with those two values anyone can access your booking. At the same time, emirates.com was leaking those same two values to 12 different third parties. Now, here's the missing piece.
This looks scary. It is a serious problem and it is happening everywhere, but somehow nobody is talking about this topic. The horror stories I have from reporting these issues are hilarious; we can talk about them offline. The point is that we believe an important aspect is missing, and that aspect is tooling. Right now, if you want to inspect your own traffic and see whether these leaks are happening with your URLs, it's very difficult. You need to know a network monitoring tool like mitmproxy, or the browser dev tools. You need to understand how first parties and third parties work and what data is being leaked. And you have to do a lot of manual steps to figure out what is happening in your browser, right? This is the problem we want to solve, so we set four key requirements for Local Sheriff. First, it should be easy to install, so you don't have to be a technical user to install it. That's why we built a browser extension: it can reach the masses. Second, it should be able to monitor all the network traffic between a web page and all the third parties on that page. Third, all this data must stay on the user's machine. At no point is Local Sheriff, or any other software, allowed to send anything back home, because the data itself is critical. Fourth, users should be able to tell whether a request is first party, for example the page I visited, or a third-party call loaded on that first-party page. And once all this data is collected locally on your machine, you also get a search interface in the extension itself. It helps you find out what is happening, and which of your private data is being leaked to which companies.
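The "everything stays on your machine" and "search interface" requirements can be illustrated with a local SQLite log that is queried by substring. This is a hypothetical sketch, not how Local Sheriff actually stores its data. The table layout, the function names and the example rows are assumptions. The point is that recording and searching happen entirely in a local database (here an in-memory one), with no network calls.

```python
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a local SQLite log. Nothing in this sketch makes a network call."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS requests ("
        " website TEXT, third_party TEXT, channel TEXT, data TEXT)"
    )
    return db


def record(db: sqlite3.Connection, website: str, third_party: str, channel: str, data: str) -> None:
    """Store one observed third-party request: which site, which host, how, and what was sent."""
    db.execute("INSERT INTO requests VALUES (?, ?, ?, ?)", (website, third_party, channel, data))


def search(db: sqlite3.Connection, needle: str) -> list:
    """Return distinct (website, third_party, channel) rows whose data contains `needle`.

    instr() is a plain substring test, so characters such as '%' or '_' in an
    email address are not treated as LIKE wildcards.
    """
    cur = db.execute(
        "SELECT DISTINCT website, third_party, channel FROM requests"
        " WHERE instr(data, ?) > 0"
        " ORDER BY website, third_party, channel",
        (needle,),
    )
    return cur.fetchall()
```

Searching for an email address then answers "which websites sent it to which third parties, and how". That is the kind of question the talk describes typing into the search bar.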
For example, once you install Local Sheriff, its icon appears in your toolbar and it starts monitoring web requests using the webRequest API available to browser extensions. When Local Sheriff sees a request, it classifies whether it is a first-party request. In this case it classifies edition.cnn.com as the first party, since that's what the user visited. All these other calls are third-party calls happening on that particular web page. We also want to map each domain to the company it is known to belong to. For example, if you load cnn.com and there is a call to maxymiser.net, we want to tell the user that maxymiser.net actually belongs to Oracle. Looking up the parent company gives a detailed perspective on who is actually collecting your data on the web. For this we use an open-source database from WhoTracks.Me, which contains this mapping, and we ship it locally, on the client side, with the extension itself. Finally, this is the Local Sheriff interface. At the top you see "Have any of the data points entered in forms been shared?" We assume that details entered in forms are critical in the first place. So it automatically flags any value you entered in a form that has been seen being sent to a third party. It lists each value you entered and labels it "yes" or "no", and if you click "yes" it gives you a detailed log as a second step.
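The first-party/third-party split, the company lookup and the "yes/no" form flag described above could be sketched as follows. Several assumptions apply. The two-label `site_of` heuristic stands in for a proper Public Suffix List lookup. The small `COMPANY_BY_DOMAIN` dictionary stands in for the WhoTracks.Me data; only its maxymiser.net entry comes from the talk. All function names and example hosts are hypothetical.

```python
from urllib.parse import unquote, urlsplit

# Hypothetical stand-in for the WhoTracks.Me domain-to-company data the talk says
# Local Sheriff ships with the extension. Only maxymiser.net -> Oracle comes from
# the talk; the other entry is made up.
COMPANY_BY_DOMAIN = {
    "maxymiser.net": "Oracle",
    "tracker.example": "Example Ads Inc.",
}


def site_of(host: str) -> str:
    """Crude 'registrable domain': keep the last two labels.

    A real tool should use the Public Suffix List; this breaks on hosts like www.bbc.co.uk.
    """
    labels = host.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def classify(page_url: str, request_url: str) -> dict:
    """Label a request as first or third party and attach the owning company, if known."""
    page_site = site_of(urlsplit(page_url).hostname or "")
    host = urlsplit(request_url).hostname or ""
    site = site_of(host)
    return {
        "host": host,
        "party": "first" if site == page_site else "third",
        "company": COMPANY_BY_DOMAIN.get(site, "unknown"),
    }


def form_values_shared(form_values: list, third_party_urls: list) -> dict:
    """For each value typed into a form, answer 'yes' if it appears in any third-party URL."""
    decoded = [unquote(u) for u in third_party_urls]
    return {v: "yes" if any(v in u for u in decoded) else "no" for v in form_values}
```

On edition.cnn.com, a request to www.cnn.com is classified as first party. A request to a maxymiser.net host is classified as third party and attributed to Oracle.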
Let's say that after installing Local Sheriff I went through the whole process of donating to Mozilla again. Now I go to the search interface and enter my email address, to see whether it has been shared with any third parties. As you start typing in the search bar, it shows that, yes, this email address has been shared with seven third-party domains, owned by two different companies, on one website in this case, and this is the website. You can also look at the detailed log showing how it actually got leaked: through the Referer header, or in the payload sent to analytics companies, and things like that. You can also do more advanced analysis. Let's say I'm not looking for private information being leaked; instead, I want to see how much information Facebook has on me in my browser. So I start a fresh browser profile, visit a few websites, and then look for this pattern, which Facebook uses to track users and send data to its analytics services. When I search for facebook.com/tr, it lists four websites that have a Facebook tracker. Facebook also sets a cookie, in this case starting with fr= followed by some value, which is set on my machine and sent back home along with all this information. Then I take this cookie value and check on how many websites Facebook is tracking me with it, and the list explodes to about 11 websites. Now look at the trainline.eu example I shared before. There, anyone who has the URL potentially has access to my personal details, my full name and my email address. That page has a Facebook tracker, which carries the same cookie and sends this URL back home. Then there's the food-ordering example, which sends my latitude and longitude back home to Facebook with the same cookie. If you put those values on a map, you can see the exact house where the order was placed, and that is also being
sent back. Now, if you look at this, this is what the analytics back end of any tracker would see: these four URLs along with that cookie. That is enough to say that the same user is going through all these different domains. And thanks to the last two URLs, it's very easy for this tracker to de-anonymize me on the internet. Now they can say: this Trainline booking belongs to such-and-such a person, and that person probably lives at this address. Local Sheriff can be used not only by users, but also by organizations and developers to test their own apps before the leaks hit production systems. They can include Local Sheriff in their testing frameworks, start testing for privacy leaks, and audit their applications themselves. It's available on all platforms because it's a web extension. You can install it from the Firefox or Chrome store and test your favorite websites. And when you book your flights or trains back home from FOSDEM, make sure to check how your data is getting leaked. Lastly, Local Sheriff helps people who are already aware of the online tracking world and want to take the next step: seeing what data is being leaked. But there is still a huge gap in understanding what is happening in the online tracking landscape. This is where the project from Sophia and Santiago, Dracula, comes in. So I'd like to hand the mic back to Santiago to explain the Dracula project.

The other challenge is how to make all this approachable to everybody out there: outside this dev room, outside FOSDEM, and outside the technical context you need in order to understand these issues. That's where our project comes in. We started out in a citizens' laboratory in Madrid, which had an open call for participation geared more towards journalists and artists, about
migrations. We came up with the idea of migrations of data through the digital world, and we were chosen to pitch it to a group of collaborators who ended up joining the project. We explained to them how technology is tracking you through all these means. Afterwards, they wanted to be able to explain it to their friends: how could they appeal to the broader public? So we wanted to empower them to do just that. That way, extensions such as Local Sheriff, uBlock, or anything else that really helps protect you could spread more broadly. When you're on the technical side, it's sometimes difficult to get down to the simple point of how to use things without all the technology involved. But once you get there, and everyone is at the same table and able to understand, it's much easier to get the idea across. We started out with Mozilla Lightbeam as an example of the idea we wanted to convey. It was one of the tools we used to try to make them understand the problem. But they felt this kind of visualization gave too much information at once for the basic idea we were trying to get across. So we worked from that, and we built our own web extension, a fork of Lightbeam with a different visualization. The idea is a metaphor contrasting the real world, the world you know, with an underworld where everything is happening but which is not as easy to see. We first tried trees that you plant for the websites you visit, with the roots underneath showing where the data is being shared. But someone told us, and it's true, that trees don't share roots underground. So we changed the
realm to the realm of fungi, because mycelium really is how different fungi in the upper world share their roots in the underworld. The deeper you go, the darker the background gets, and the more rooted the trackers are across the websites you visit. So this is the plugin working. It shows some general information about the visualization. On the left there are information pills with ideas about tracking in general, as well as your most prevalent trackers. As I was saying, the further down you go, the more rooted the trackers are across all the websites you visit. The visualization only keeps track of the last five websites you visited, so that the page isn't crowded with too much information at once. It is also unique to every user at a given point in time. If you install extensions or utilities that block connections to the outside world, you'll see the visualization change. The connections won't be as deeply rooted, because you no longer connect to all these trackers. Sorry, I have a problem advancing this slide. Sorry about that. So at some point you'll have a clearer picture of what's happening, and you won't find the connections I was describing. But now, how can we appeal to a wider audience?
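One way to model the "last five websites" window and how deeply a tracker is "rooted" is to keep the five most recently visited sites and score each tracker by how many of them it appeared on. This is an assumption about the metric, not Dracula's actual code. `MyceliumDepth` and the example hosts are hypothetical.

```python
from collections import Counter, OrderedDict


class MyceliumDepth:
    """Hypothetical model: a tracker's depth is the number of recent sites it was seen on."""

    def __init__(self, window: int = 5):
        self.window = window
        self.sites = OrderedDict()  # site -> frozenset of tracker domains, oldest first

    def visit(self, site: str, trackers) -> None:
        self.sites.pop(site, None)  # a revisit moves the site to the most-recent end
        self.sites[site] = frozenset(trackers)
        while len(self.sites) > self.window:
            self.sites.popitem(last=False)  # forget the oldest site

    def depth(self) -> dict:
        counts = Counter()
        for trackers in self.sites.values():
            counts.update(trackers)
        return dict(counts)
```

Once a tracker is blocked, it stops appearing in new visits. Its score then drops as older sites leave the window, which fits the talk's point that the visualization becomes shallower once blockers are installed.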
On one side, I think we all must build technology that communicates these issues clearly to everyone else. That way you don't need the technical background you would need if we just tried to convey as much information as possible. Yesterday at dinner we were talking about how we could collaborate and combine our work to reach a wider audience. Local Sheriff has the idea of automatically notifying the user when form data is being leaked. It could then grow into a kind of privacy analysis engine, so that other tools can use Local Sheriff's data for all these things. The idea is that Dracula becomes a kind of UI for Local Sheriff, with the technology on one side and the visualization on the other, and we want to share that for GDPR compliance. This is our first time at FOSDEM, so it would be really nice if you gave us feedback. Thank you very much. Thank you, Sophia, for all the graphics.

So, any questions? Yes. [inaudible] Okay, so, questions.

Two questions, if you don't mind. First: as I understand it, the data doesn't leave your computer or handheld, so you can't actually connect what was happening on your laptop with what was happening on your mobile device. Is it on your roadmap to safely collect all your activity from different devices in one place and analyze it together? The other question is about ad blockers. To some extent, ad blockers can block some trackers. How does that work with Local Sheriff? Would it show you that some requests were blocked and that you were partially protected?
Thank you. Yeah, sorry about that. To answer your first question: right now it's just a web extension, so it's not yet possible to analyze mobile apps. But that's on the roadmap: maybe building an equivalent of Local Sheriff for mobile apps, and then connecting the two into a complete profile. On the second question, about ad blockers: is there still a possibility of leaks if you're using an ad blocker? I would say yes. Sensitive data can be sent out in many different ways, and ad blockers won't block everything. They are basically based on hostname blocking and similar rules, which is not a perfect match for blocking these leaks, so you will likely still leak some information. At some point, website owners and third-party tool developers need to own this and fix these mistakes themselves. But we can talk about it in more detail offline.

Hi, guys. Thanks for being here. I have a very simple question. You're saying that one of your use cases is to teach users, step by step, what privacy is and how to tell whether their privacy is being violated. Do you think it's easy to present that to my mother or my grandmother, who simply uses Facebook to check on family news? And, for example, why should I use Dracula instead of the EFF's Privacy Badger?
Thank you. Personally, I tried to explain this problem to my mother using Dracula, and she understood, so I think the average user will understand it with this. And I don't remember the other question. The idea was that we also tested this with the other groups in the open call; they were our beta testers, so to speak. Of course, this can be improved. We hope to get more collaborators and more people involved, and to have more projects like this with non-technical people. That way they can understand it better and be empowered to build on it and make better tooling. And what was the other question? Okay, so what's the benefit for your mother? Once she knows what is happening behind the scenes, she can install other extensions or tools to stay aware of it and block it if she wants to. In any case, she can make a conscious decision about whether to share that data or not. Thank you very much. We have no time for more questions. Thank you. Thank you.