And now welcome to the last talk of the day. It's very close to night, and it's time to tell spooky campfire stories, except this one is 100% real, spooky as it is. It is a talk on shadow profiling and on how Facebook tracks you even if you don't have a Facebook account. So let's give a warm round of applause to Christopher and Frederike and invite them on stage.

Thank you. Hi everyone. It's just amazing to be at Congress. This is my first Congress, so it's been an absolute dream. I've really enjoyed it, and it wouldn't be possible without the angels and the translators and the heralds, so can you give them a round of applause? Thank you. Just a little show of hands: how many people in here are Android developers? There's like five of them. Cool. And how many of you have used the Facebook SDK? Any of you? A couple. Cool. So I'm going to hand over to Frederike, who is going to take us through the first slides.

Earlier this year we filed complaints against seven companies: data brokers, ad tech companies and companies that do credit scoring. The reason for our complaints, and the reason we focused on companies that are really not household names to most people, is that it is currently impossible for most people to understand how they're being tracked and profiled and in whose hands all of this data ends up. As part of this research I asked one of these tracking companies, a company called Quantcast that many of you probably haven't heard of but that has certainly heard of you, for all of my data. What you see here is a heavily edited and deliberately blurred version of my browsing history: the websites I've opened, timestamps, device information. This is work related, so it's quite sensitive sometimes.
It also includes inferred data: a predicted value for my gender, whether I have children, my income. And with all of this came partner data from all kinds of companies, different data brokers, who placed me into infuriating categories like "heavy alcohol spender at home" or "affinity for baby products and nappies". What's so fascinating about this data, apart from the fact that it's a lot of data, that some of it is highly personal and that some of it is very wrong, is that all of it comes from one cookie, in one browser, that was placed on one of my devices.

This is why today, just now, we have published research about how a very different tracking company tracks people on apps that are built for Android, and we've focused especially on how Facebook tracks people who do not have a Facebook account. Earlier this year, research by the University of Oxford showed that 42% of free apps in the Google Play Store in the US and the UK could share data with Facebook. What's interesting is that this makes Facebook the second largest tracker after Google's parent company, Alphabet. With the research we just published we wanted to build on this and show what exactly this data sharing looks like, particularly for people who don't have a Facebook account. The reason we focused on Facebook, and not Google or any of the other tracking companies, is that the very fact that apps like a period tracker or an LED flashlight share data with Facebook in the first place will come as a surprise to many people, especially to those who have made a conscious decision not to be on Facebook.

Here's what we did. We took Oxford University's research from the computer science department. They gave us a copy of their list: a list of all the apps among the top 5,000 apps that use the Facebook SDK.
We chose a number of the largest ones, and we also chose a few apps that handle sensitive data, to do with religion or health or similar things, and a few utility apps, because utility apps may also have information about other apps. So these are the 34 that we chose, and you can see it's a real mix of big-name applications, like Spotify and Indeed, and much smaller applications made by more indie developers, games and that sort of thing. But all of these apps have over 10 million installs. We chose apps with decent install bases because this isn't really about the apps themselves or their developers. We're not here to criticise developers for the way they make their apps. This is all about the SDK and the way it transmits data with or without user consent.

So I can take you quickly through our analysis methodology. I had a Nexus 5 running Android 8.1, connected through a virtual machine running mitmproxy, which is a man-in-the-middle proxying tool. I did this transparently, so the app didn't know it was being intercepted, and we had a look at the data going between the app and Facebook. Our first finding was that over 61% of the apps we tested automatically transferred data before the user had done anything else; they had literally just opened the app. That's 21 out of 34.
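
The first finding can be tallied from a capture log in a few lines of Python. This is only a minimal sketch under stated assumptions, not the speakers' tooling: the `CapturedRequest` record, its field names and the sample app names are hypothetical. Only the host `graph.facebook.com` and the 21-out-of-34 figure come from the talk.

```python
from dataclasses import dataclass

# Host named in the talk as the destination of the first request.
FACEBOOK_GRAPH_HOST = "graph.facebook.com"


@dataclass(frozen=True)
class CapturedRequest:
    """One intercepted request. This record layout is hypothetical."""
    app: str                  # name of the app under test
    host: str                 # destination host seen by the proxy
    before_interaction: bool  # True if sent before the tester touched the app


def apps_contacting_facebook_at_launch(requests):
    """Return the apps that reached the Graph host before any interaction."""
    return {
        r.app
        for r in requests
        if r.before_interaction and r.host.lower() == FACEBOOK_GRAPH_HOST
    }


def share(part, total):
    """Percentage of `part` in `total`, rounded to one decimal place."""
    return round(100.0 * part / total, 1)


if __name__ == "__main__":
    # Made-up sample log; app names are placeholders, not tested apps.
    sample = [
        CapturedRequest("app_a", "graph.facebook.com", True),
        CapturedRequest("app_b", "graph.facebook.com", False),
        CapturedRequest("app_c", "example.org", True),
    ]
    print(sorted(apps_contacting_facebook_at_launch(sample)))
    # Figure quoted in the talk: 21 of 34 tested apps.
    print(share(21, 34))
```

The filter deliberately separates "contacted Facebook at all" from "contacted Facebook before any interaction", since only the latter rules out the user having been asked for consent first.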
So for example, here's Kayak. As you'll see, pretty much immediately after the app icon is clicked, the first request goes straight to graph.facebook.com, and the app sends a whole load of data to many other companies besides. I could let this go on for a while, but ultimately you end up on their home screen, which is kind of amusing because at the bottom of it it says: don't worry, we'll never share anything without your permission.

So let's take a look at some of this in slightly more detail. This is VK, the Russian social network, and it's pretty typical of how the SDK is implemented, I'm guessing in its default state. When the SDK is first loaded it sends a message saying "SDK initialized". When the app is in the foreground it sends "activated app", and when the app goes to the background or you close it, it sends "deactivated app". So you can actually start to profile how long the user is using an app, how often they're opening it, that kind of stuff. And if you look at some of this data, if you're able to see it clearly, there's a unique ID, which we would argue is personal data because it's uniquely attached to your browsing and your app usage. There's some other extraneous data in here that's also quite telling: what version of Android you're running, what device you're running on, what your keyboard localization is, your time zone. You're getting quite a picture of someone just from this one app, and it doesn't take much to go beyond that.

Our second finding is that some apps also routinely send Facebook data that is incredibly detailed and sometimes sensitive. This isn't a particularly new finding; other people have reported it too. So a good shout-out to Privacy, who also reported on the Baby Plus app, and to mobilsicher, who earlier this week, I think, reported on the Pregnancy Plus app, which sends granular data. Here are some examples we found in our data. The top example is Kayak. Kayak sends your entire search to Facebook every time you do a search in their app, which is lovely. And it's interesting what they send, because it's not just the content of your search. There's also other stuff like your user score, whatever that is, and obfuscated session IDs and all sorts of other things. The one at the bottom is the King James Bible app, and it's quite typical of the way a lot of app developers implement the SDK, which lets them track your usage through the app. They've since made their data slightly less granular, but when I originally tested this app it told Facebook which verse and passage of the Bible you'd read, which is just what Facebook needs to know.

The last bit is the actual advertising data that Facebook uses. This is a request to their ad network, and some slightly interesting things come out of it: the device is on charge, the battery percentage is full, how much free space there is. And this isn't even the most comprehensive example; I've seen other data in there, such as accelerometer positions. Again, it's all linked with the app ID. The other thing here, at the top, is "COPPA false". COPPA is the American Children's Online Privacy Protection Act, and it has somehow decided that I'm not a child, even though it never asked me.

And it's crucial to remember that this happens whether you're a Facebook user or not, whether you're logged in or not. So profiles are being made regardless of whether you have a Facebook account or you don't have a Facebook
account. You only have to use some of these apps and a profile is potentially being built.

So why does this matter? Why does any of this matter? Our analysis obviously only focused on the data that apps transmit, and we can't possibly say definitively how this data is being used. But here's what's really interesting. Our first finding is that the majority of apps share data the second they're opened, and the data being transmitted indicates what kinds of apps you use and when you use them, combined with a unique ad ID. Knowing what kinds of apps somebody uses, and when, can give quite a detailed picture of someone's life. These are four apps that we actually tested. The first is a Muslim prayer app, the second is a period tracker, Indeed is a job search app, and My Talking Tom is a children's app. What kind of person could that be? It looks like a person who is likely Muslim, likely female, likely looking for a job and who likely has a child.

In our analysis, the data that apps automatically transmit all comes together with the Google ad ID. That's a user-resettable unique ID that, as we also saw in our previous work on data brokers and ad tech companies, is primarily used for connecting different profiles together. So even though we don't know what is happening to this data, it would be pretty straightforward to link it up.

There's also a second reason why this data is uniquely interesting: knowing what kinds of apps hundreds of millions of people use, and when they use them, gives quite a unique insight into the Android app market. Combined event data from different apps, such as "app installed", "SDK initialized" and "deactivate app", offers a very detailed insight into the usage behavior of hundreds of millions of people. And again, we deliberately focused on apps that have a lot of installs, at least 10 million, and some have even 500 million installs. So combined we're looking at over two billion
installs, and the kind of data that these apps share really matters, because it's a lot of data.

The question, though, is why so many apps share data with Facebook the second the app opens. There are obviously many reasons, some of them even good reasons, why apps use third-party tracking and SDKs such as the Facebook SDK. What the Facebook SDK for Android does is allow app developers to integrate their apps with Facebook's platform, and it contains a number of main components such as analytics, ads and login. That is why some apps choose to use the SDK.

Now here's where it gets a bit complicated. Facebook places sole responsibility on apps to ensure that the data they collect and ultimately transmit to Facebook has been obtained legally. We reached out to Facebook, and they confirmed to us again that they place a legal and contractual obligation on the developer, whom they see as the data controller, to get consent from users where it is required before sharing data with Facebook via the SDK. However, the kind of data sharing that we've observed in the majority of apps is the default implementation of the SDK. That's nothing we discovered; it's clearly stated by Facebook in the quick start guide for Android. When you use the SDK, its default implementation automatically transmits data.

Since May (and what happened in May? GDPR took effect), lots of developers have been filing bug reports about the SDK. On the right you see one of them, but there are many; you can search Facebook's developer platform for them. In this one, from July 24th, somebody complains that they wanted to integrate the login, but the moment the app opens it transmits people's ad ID, and they're not allowed to do this. The reason developers complained is that when apps share data the second they're opened, you are unable to ask people for their explicit and unambiguous permission, which is the bar for
consent that GDPR requires. So Facebook released a feature that delays what they call automatic event logging, but that feature was only released in June. In our conversation with Facebook prior to this publication, Facebook confirmed to us in writing that before the introduction of this delay option, developers were able to disable the transmission of data, but that this doesn't include something they call a signal that the SDK has been initialized. This is exactly what we've observed in our research. And just as a reminder: the signal that the SDK has been initialized is data that gives a strong indication of what kind of app somebody uses and when they're using it, combined with their Google ad ID.

The big question is of course: is this even legal? We have a very long section in our report that explains this in a lot of detail, because it is very complicated, but what this analysis shows is that we think the question of responsibility is a lot more complicated than saying this is entirely the responsibility of apps. Since we've done this analysis in the UK, and the UK is still a member of the European Union and has implemented the GDPR, that's one legal framework that applies. We also looked at ePrivacy and at competition law, and the underlying theme is who has what kind of responsibility.

So what should Android developers do? Well, up your privacy game. Again, we looked at large apps that have significant resources. At the very least, apps should comply with relevant privacy laws, and I'm saying this so clearly because we've seen apps that have two-paragraph privacy policies, and that is unambiguously non-compliant. But you also have a responsibility not to transmit data that doesn't need to be transmitted: data minimization, and giving people an actual choice. It was quite interesting: we reached out to all of the 21 apps that
automatically transmit data, as well as those that share much more detailed data, and it was fascinating to see the very different responses we got from companies. Some, we had the impression, didn't fully understand the SDK and what it does. Some had a very different interpretation of what they should do legally. Some didn't realize that this was happening and promised to update their apps. We need to give some credit to Skyscanner: this is the only app that got back to us within three days and said, thanks very much, we've immediately updated our app and this is no longer happening. We haven't been able to test it. There's also the Weather Channel, which updated its app immediately after we tested it in December. But basically the responses were quite varied, and the general impression is that apps with a gigantic user base need to do a better job.

At the same time, however, I think our research also gives some food for thought on rethinking third-party tracking in apps. Even if our legal analysis says it's a little more complicated, Facebook says the responsibility is with apps. So the moment you integrate a third-party tracker, this comes with risks. The question is: do you really need to integrate the SDK, and if you integrate it, can you do it selectively? You shouldn't assume that the default implementation is compliant, and whenever you implement it, be very fair and transparent with users about exactly what you're doing and how you're collecting data.

So what should Facebook and Google do? Privacy by design and by default is not just a principle of data protection; it's also something that's quite relevant here. Facebook got back to us explaining the many different ways in which developers are able to delay this kind of logging or change the way the SDK works, but to the best of our knowledge there's no good reason why the
default shares data automatically. If you're selling a product that allows people to send data to you, and you place the liability on the people who do this data sharing, why shouldn't this be privacy friendly by design and by default?

There's also something interesting that happened in both the responses we got from Facebook and Google's initial reaction to the research published by the Oxford researchers earlier this year, and that was: but other companies also track people. Facebook spent a lot of time in parliamentary hearings this year, and we had the pleasure of watching all of them. Facebook followed up in writing on the US Senate hearing, and in one of these written responses, where the topic was also shadow profiling, the tracking of non-users across the internet, the company said that's a standard feature of the internet. And Google also pointed out that Amazon does this, Twitter does this, et cetera. So in a way we need to reclaim the internet. There's no natural law that says every website and every app you use has to send such detailed data to hundreds of different companies every time you use them.

So what can you do? What can we do? The response from Google said there are two easy ways to fix this problem: you can opt out of ad personalization, and you can reset your ad ID. So we gave that a shot to see how it works out. This is what happens when you opt out of ad personalization. The top is with ad personalization on, and this is opted out. It's a big improvement there: the real improvement is that the flag has been changed from true to false, but the amount of data has increased and it's still being sent. So you can't really opt out, then.

So, your best bet. An audience like this has probably got one advantage: if you've got an Android phone, it's probably rooted, so you could always just block graph.facebook.com. I
don't know what the implications of that might be; there might be other stuff that just stops working. But most users don't have rooted phones, so the only other real alternative is to try not to install apps that use the Facebook SDK. Good luck working out which ones those are, because it's not indicated on any app I've ever seen. So the only other option you have is to minimize the data you're sending. You could segregate your apps: have a different profile on your Android device for every app, so that the advertising IDs are separate and each ID only applies to that one app. And obviously you can always reset your ID and opt out. It doesn't really solve much, but it does at least keep your profile fresh, and the time frame also makes quite a big difference to how you are profiled. But why should you have to do all of this?

I'm going to release all of my environment so anyone can replicate what we've done. I'm going to do that tomorrow, because I need to get home and sleep. My environment is going to go on PI's website, and all of our documentation is also available on PI's website. So if anyone wants to look through what all of the apps we looked at do, I definitely suggest you have a look at Kayak and at the LED flashlight. It's quite interesting, especially the consent flows. But yeah. Thank you.

Thank you, Christopher, thank you for the amazing talk. We are taking questions. Just to remind you, there are five microphones, two in the front and three in the back, so queue up. We're going to take a question right now from microphone two.

What did you send to Facebook to get your data? Did you send the advertising ID? Because it would be kind of nice to ask Facebook, then complain to the apps, then ask Facebook again, and repeat that every week.

So I have actually done a DSR to
Facebook based on my advertising ID, and they responded outside of the time limit. It wasn't until we told them that we were going to release a report that they actually replied to my original DSR, and they claimed they have no data. We're going to follow up on this in the not too distant future. And they're releasing a new tool, this Clear History tool.

This experience just shows that it's excruciatingly difficult to actually exercise your data rights if you don't have an account. There's a process, there's a form you can fill out, but it should be way easier to get access to the data that companies hold on you if you don't have an account.

Thank you. We are going to take a question from the internet real quick.

The internet wants to know: did some of the apps prevent interception of data by pinning the Facebook SDK certificates?

None of them did, of the ones I tested at least. They were quite happy to send data to a man in the middle.

Thank you. Microphone one.

Hi, thanks for the talk. Are there any open source projects that allow access to these kinds of SDKs with privacy-sensitive defaults?

Not that I'm aware of, sorry.

Would this even be permitted by the SDK terms of service?

Sorry, say again?

Would this kind of application or library be permitted by Facebook's or Google's terms of API or SDK usage?

We've read them, but I couldn't answer that off the top of my head, sorry.

Thank you. Microphone one again.

Thank you for the talk, it was very interesting. I think, off the top of my head, that most of the apps you researched were free to download. We usually tell the layperson: buy your app, so you won't be the product. Do you have any indication of whether this is actually true?

This is an excellent question. When you confront companies about third-party tracking in general, the response that you usually get is: but
apps need to monetize, or publishers need to monetize, and that's why we need that much tracking. However, the truth is that this has been the argument for a very long time, and at the same time the tracking has become exponentially more invasive, and the argument is still that all of this is needed in order to show relevant ads. We say there are ways to use analytics software, there are even ways to show ads, that are fair and transparent to people. But there's such a gap between what's considered industry practice, what's happening on a massive scale, and what would be considered transparent and fair. So there's massive room for improvement, and that's why I'm not buying the argument, from apps or from companies like Facebook and Google, that people need to sell ads.

But do you know if... is the microphone on? Oh yeah. But do you know if paid-for apps also participate in this tracking?

I didn't do any analysis on any paid-for apps.

Thank you. Microphone five.

But since it's the default of the SDK, it would be really interesting if somebody did that test.

Five.

Thank you for your research. You focused very much on Facebook. Do you know whether the other companies, such as the big five, use SDKs similar to this one?

I didn't really do much digging on the other ones, but just by looking at the logs as they were coming in, you see an awful lot of other tools that developers regularly use. There's a tool I think is called Amplitude, and there's Crashlytics from Google, I believe. I believe they must all come in an SDK, because of the way they send consistent data, but I haven't tested them for what data they're sending.

Thank you. Mic four.

So my question is twofold. The first part is: as you've said and as we've seen, they quite clearly break the law by not answering the questions and so on. Out of
experience we know that Facebook's legal team is quite strong, so would you say there is a chance of suing them? And the second part is: what would you say if we just automated the process of resetting the ad ID?

Could you say the last part again?

Would it be useful if we automated the process of resetting the personal ad ID, so that we could just reset the tracking every time?

Yes. So to answer your first question: we're considering what we're going to do next, so I can't say more on that. Read the legal section in our analysis. We didn't want to summarize it here because we wanted to do it justice, and it's a bit complicated. But I think yes, it raises many questions, and there have been previous cases about the tracking of non-users by social plugins and by pixels, so the question is what the parallel is here.

To answer your second question, about resetting your ad ID: it's not very obvious. To reset your ad ID on an Android device you go into Settings, Google, Ads, and reset your Google advertising ID, and that resets your Facebook ad ID as well, because it's the same ID. One recommendation in the report is also that the privacy settings of both Google and Facebook are still counterintuitive. And there's no good reason why the ad ID only resets when you reset it manually, because we know that many people will not understand or know what the ad ID is used for and what implications resetting it has.

Thank you. And speaking of the report, I would just like to point out again that it's only being published now, so you are the first people to hear about it, and I think it deserves a round of applause. The next question is from microphone two.

Hi. Is there a way to delay loading the Facebook SDK, so that Facebook wouldn't even have the ability to execute code and send the analytics data until you really needed to
perform the function that you initially integrated it for?

There are a couple of answers to this question. It's a good question. Facebook say that any version of the SDK from 4.34 onwards has a delay function in it. Developers, even as late as this December, are still saying it's questionable whether that functionality works. But the actual Graph API is just an API that you need a key for, and we think that some of the larger apps we tested, such as WeChat and Dropbox, might actually implement their own calls to the API rather than going through the SDK, which is why they don't automatically make calls to Graph.

Thank you. Microphone two for a follow-up question.

So my point was actually not about implementing the code yourself, but about using the Facebook code and only loading it as soon as you actually need it.

Is there a question?

Is that possible, was the question.

I don't know, sorry.

Okay, thanks.

Thank you. Microphone five.

Hi, thank you. Seeing the amount of data that's transmitted, does changing the ad ID even make any sense?

Yes, in the sense that it sends a signal. I mean, it doesn't change anything, and we say that very clearly in the report. You have a new profile; it's not your old one, it's a new one, and we know how easy it is, at least technically, to relink data. But I still think it sends a signal that you do want to minimize the granularity of the targeting that's happening, and at the same time you do get a fresh profile. We had a little bit of a curiosity in our analysis: the ID did not reset when the phone was factory reset. Google got back to us and said they tested it on several different Android devices and had different findings. We couldn't verify Google's tests. It might be that this is unique to the environment in which we tested the apps.

Thank you. Microphone three.

Hello. Do you have
any information on how it is on Apple devices?

That's a great question. That might be my project for next year.

Some of the bug reports that developers filed were also about the Apple ad ID, but we haven't done the research. I'd be really interested to read it.

Thank you. Microphone three again.

You showed that when you opt out of ad personalization, the amount of transferred data actually increases. Do you know what kind of additional data that is?

Let me go back on the slides. That's for this very specific app, and this is from Skyscanner. This is their custom app data for origin selection. When you went through their process, at least when this was tested, and selected what city you wanted to fly from, it would send data. I don't know why it sends more when personalization is opted out.

So that's just contractual. Google got back to us to say that when you've opted out of ad personalization, you're not allowed to use this data for ad personalization. First of all, that's a restriction on use; it's not a restriction on collection. But also, it's only a restriction on use for advertising. We still think you could be using it for different purposes, for example surveys or whatever.

Thank you. Mic five, please.

Hi, so I have a question. When we do this kind of research and find that these applications are really violating privacy, would it make sense to extend the alternative application stores, like F-Droid, with an additional indication saying: look, this application has a very low privacy rating? Then if somebody wants to download it, they can decide. There could be five stars for the user review and one star for the privacy review, to make it super obvious, if you're downloading a privacy
intrusive app, not to download it, or just to be aware of what's happening.

That's an excellent point, and that would be a great feature to implement. One of our recommendations for Google was that they implement something similar for users on their Play Store. They already show certain other characteristics of an app, like whether it has in-app purchases. Why don't they show that it's got a tracker attached to it, or that it's using a Facebook integration, or any of these other things?

To add two things. One is that the Play Store has terms of service, and they say you have to comply with existing laws. So what we observe here is also a discrepancy between the terms and how apps behave in practice. The other thing I want to stress is that for this research we only looked at Facebook. Facebook is a tracker that's present in many apps, but there are thousands of different tracking companies, and to assess how invasive an app is it's really interesting to see how many different trackers it has. The research that we referred to at the beginning, by Reuben Binns and others at the University of Oxford, really looked at the number of trackers that different apps have, and it's sometimes quite a lot.

Thank you. There is a person at mic four who's been waiting since forever.

Hi. I was particularly interested in your communication with the app developers you contacted. Is there an option to make these conversations public? I feel like it would be really interesting, and it would give me somewhat of an angle to attack this problem from the customer's point of view.

So we contacted the apps on December 19, and we appreciate that this is an unusually busy time of the year. We have added all responses. We obviously can't publish the conversations, but we published the
responses that the apps shared with us, and you can find them for each app: you can see the response at the end. Many apps didn't get back to us, but if they do get back to us, we would obviously consider adding their statements at a later point. This is really brilliant, thank you very much. We have time for a couple more questions, but... And also, to be fair, some said they need more time to evaluate this internally, so I wouldn't interpret a lack of response as anything negative just yet. Thanks. I would like to say thank you to everyone who's asking questions today, because you are making talks even more amazing than they are. So, a round of applause to everyone who has asked speakers at least one question. Please do that more often. Microphone one. Hi, hello. I have another question related to communication, sort of. I'm a bit surprised by the amount of data that Facebook is taking. Have you ever tried to modify the JSON a little bit and, I don't know, upload like one gigabyte of them and see if Facebook drinks it all? I think that would give our general counsel a heart attack. It's interesting, though; it actually raises a very interesting side point, which is that some of the apps, when I tested them, were communicating with so many trackers that a few of them actually crashed my environment, because it was using up way too much memory trying to log all of the trackers they were trying to connect to. And when I changed ad personalization to off, opting out of ad personalization, well, from my experience there's a documented bug in mitmproxy, it's on their GitHub, where the Android device starts initiating HTTP/2 connections without any preamble, and mitmproxy can't handle it; it just keeps dropping all these connections and eventually floods your log with rubbish. Thank you. I feel like next year we might have a talk from
someone who actually decides to try it out. Microphone two, please. Yeah, along the same lines: since you already fetched these requests, and they are bound to the ad ID, would it be possible for, say, Freifunk to collect the last 1,000 ad IDs and just randomly mingle them into different profiles, so as to poison them all? I mean, I'm not suggesting you do that, but... It raises a lot of GDPR questions; you might not want to do that. I guess: don't try this at home. Yeah. Microphone three, please. We have an overwhelming number of questions, but we only have time for one more. Then thank you very much that I can be the last one, and thank you very much for your talk. One question: do you have any experience of how successful Facebook is at linking your ad ID with your account? Or what happens when you delete your account? I'm pretty sure Facebook still keeps some record of you; can they use any measures to find you again, even if you don't have your account anymore? So Facebook, in their privacy policy, in the cookie policy, and in the business terms for Facebook (I don't know the exact wording off the top of my head), explains how they use data, and there are sections that explain how the company uses data about people who don't have an account. Also, in our conversation with the company, they got back to us to say this is how we use it. We find that it's still not super clear for this specific behavior that we're observing. That's a very vague answer, but we don't know what they're doing, and because we don't know, we have to trust the statements that are made in the policies and that the company has made elsewhere. But what's interesting in the responses to Congress, where shadow profiling was a big issue, is that various purposes are given and the data is stored for a varying length of time, and so that's
why it's sometimes a bit tricky to figure out exactly what's happening, and we'd appreciate it if that were a bit more transparent. Yeah. Thank you. Frederike Kaltheuner and Christopher Weatherhead from Privacy International, with an amazing talk. Thank you.