Welcome everyone to our first talk for tonight. It's called "Won't Somebody Think of the Children? Examining COPPA Compliance at Scale." The speakers have asked to introduce themselves. Welcome.

Hi everybody, how are you doing this morning? We are really excited to kick off this amazing village. As you can see, this is interdisciplinary research with people from a variety of universities; we are from ICSI and UC Berkeley. Let me start with a direct question, because I am Israeli, as you can hear from my accent: who here brought a burner to DEF CON, not their personal phone? Great. Who here is not using Android because of security issues? Okay, we have a number of people. Well, the issue is that our phones hold very sensitive information, and apps are collecting that information through permissions. From the Play Store and the Android permission system, we only know what apps may request and may collect. We don't actually know what is happening in reality: what information is being collected, and more importantly, who is getting that information. So my colleague here, Primal, will kick us off and explain what we are doing in this privacy audit, which is a collaboration between a bunch of hackers and security researchers from ICSI, and me. I'm a doctoral law candidate at Berkeley Law and a CLTC grantee.

Thanks, Amit. I guess everyone can hear me. I'm Primal, a postdoc researcher at UC Berkeley and ICSI. As Amit said, setting the stage for why we need this: the simple problem is that once you grant a permission to an Android application, that's it. You have no idea how the application is going to exercise that permission, or who else is getting that information. To answer that question, we developed a dynamic analysis platform where we can see exactly how these applications use those resources and who else is getting them. It's no longer a secret that whenever an application accesses your data, it is likely to share that data with third parties, so we want to understand that ecosystem. And the core use case of this platform is to see whether these mobile applications comply with regulations: COPPA, GDPR, the California Consumer Privacy Act, et cetera.

The instrumentation framework has two major components. First, we have a custom-built Android (based on Android 6) where we can monitor whenever an application accesses any sensitive resource, along with all the surrounding contextual information for that permission request. We also have a tool called Lumen, which is essentially a VPN-based monitor for Android that lets us track all the network traffic originating from a given application, including its content. So whenever an application accesses a given resource, we can also see who else is getting that resource. Using that platform, we built an automated pipeline: we scraped the Google Play Store to download the most popular apps in each category, and then the Android application exerciser, Monkey, runs each application for 10 minutes while the platform captures all the resource accesses and the network sharing. Right now we can execute 1,000 applications per day, and we have already analyzed close to 80,000 so far. Technically speaking, we can observe every single sensitive resource in Android.
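As a minimal sketch of what one iteration of that exerciser step might look like, assuming adb is on the PATH and the app is already installed on the instrumented device (the package name, throttle, and event count here are hypothetical, not the team's actual configuration):

```python
import subprocess

def exercise_app(package: str, minutes: int = 10) -> None:
    """Drive an installed app with Android's UI/Application Exerciser Monkey."""
    throttle_ms = 500                            # pause between injected events
    events = minutes * 60 * 1000 // throttle_ms  # enough random taps to fill the run
    subprocess.run(
        ["adb", "shell", "monkey",
         "-p", package,                  # confine the random input to this package
         "--throttle", str(throttle_ms),
         str(events)],
        check=True,
    )

# Hypothetical package name, for illustration only.
exercise_app("com.example.kidsgame", minutes=10)
```

Because Monkey only injects random touch events, any data that leaves the device during such a run left without a human ever reading a consent dialog, a point that matters later in the talk.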
But for the purpose of this talk, we are just going to focus on a subset of those, which we categorize into two groups. The first is personal information: your contact details, your location, or even your Wi-Fi router information, which can also be traced back to you. The second is the various persistent identifiers that Android provides to third-party applications, like the IMEI, the MAC address, the Android ID, or even the Google Services Framework ID. So that's a very high-level overview of the instrumentation.
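As a toy illustration of how captured traffic can be matched against those two categories, here is a sketch that checks one flow for known device identifiers, raw or hashed (apps often transmit hashed IDs, as we'll see later); the identifier values and the flow format are invented for the example:

```python
import hashlib

# Hypothetical identifiers recorded from the test device before a run.
DEVICE_IDS = {
    "imei": "356938035643809",
    "android_id": "9774d56d682e549c",
    "mac": "02:00:00:00:00:00",
}

def needles(value: str) -> set[str]:
    """The raw value plus common hash encodings used before transmission."""
    raw = value.encode()
    return {value.lower()} | {
        h(raw).hexdigest() for h in (hashlib.md5, hashlib.sha1, hashlib.sha256)
    }

def scan_flow(payload: str) -> list[str]:
    """Report which known identifiers appear, raw or hashed, in one flow."""
    body = payload.lower()
    return [name for name, value in DEVICE_IDS.items()
            if any(n in body for n in needles(value))]

print(scan_flow("aid=9774d56d682e549c&lat=37.87&lon=-122.27"))  # ['android_id']
```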
Let's hear the legal side of it. So what are we doing with all of this information? We started looking at compliance. Our infrastructure is also applicable to GDPR and to the coming privacy regulation in California; if you haven't heard about it, it's coming in 2020. But we basically started with COPPA, the Children's Online Privacy Protection Act. The reason we chose COPPA is that COPPA has very stringent definitions of what you are allowed to collect from children under 13, what kind of disclosures you need to give to the parents, and what kind of privacy protections you need. And COPPA defines personal information very broadly. This, of course, includes contact information like email and fine geolocation, but it also includes persistent identifiers, defined broadly: basically anything that could be used to build a profile on children. So because our infrastructure looks at how SDKs and third parties obtain various kinds of persistent identifiers, we chose COPPA as our first case study, but we are actually expanding this process.

So what about COPPA? COPPA is designed to regulate the way online service providers, including mobile apps, collect, use, and disclose personal information of children under 13. Now, COPPA specifically prohibits behavioral advertising and the collection of persistent identifiers without a very, very stringent type of verifiable parental consent. And when I say a stringent regime, I mean you actually need to verify that the parent consented to the collection of the various types of information: you need to get credit card details from the parent, or their address, or call and talk with them on the phone. We will see in a minute why this issue of consent matters. Now, COPPA also requires that you actually have reasonable security measures in place. As we will show, that's not exactly the reality in practice; a lot of apps don't have even the most basic reasonable security measures. COPPA also has teeth. It's still not clear whether there is a private right of action, that is, whether an individual can go to court and sue a company for violating COPPA; that is still emerging right now with a bunch of new class actions. But the FTC can go after companies for violating COPPA, and so can state attorneys general. And each violation, and you will see the number of potential violations we are talking about here today, can cost up to $40,000, yes, per violation. Maybe you remember that from the episode of Silicon Valley where Dinesh freaked out. So we do see the FTC reaching settlements under COPPA, but remember, the FTC has limited resources, unfortunately. So it's interesting to compare the settlements that were actually brought by the FTC under COPPA with what is actually happening in reality. One very important issue, especially for our paper and our talk, is who can actually be targeted by enforcement under COPPA.

We have three main categories. First, we have the commercial websites and online services, including mobile apps, that are directed to children. If you have an app that is teaching one-two-three or ABC, and according to the totality of the circumstances it is directed to children under 13, because you have cartoons or simple language, et cetera, then COPPA applies to you whether or not you think you're directed to children. This is a flexible test that the FTC applies. Second, we have operators of general-audience services. Think about something like Yelp. Yelp is not directed only to children, but if that online service provider has actual knowledge that a child is interacting with the service, maybe because, theoretically and unrelated to any settlement that was reached with Yelp, they collected dates of birth and knew the child was under 13, then COPPA would apply with respect to general-audience services as well. And most importantly for our talk, COPPA also applies to third parties: the ad networks and the SDKs, basically everybody who wants to make money and monetize our information. It applies to those services if they have actual knowledge that their service is being used in a child-directed service. So think about an SDK or an ad network embedded in a child-directed app. That is important because the FTC actually brought an action against InMobi, which is exactly this kind of ad network, a fourth party.

So what are we focusing on? First, almost 6,000 of the most popular apps in Designed for Families on the Google Play Store. The key issue here is that every app listed in Designed for Families has represented to Google that the app is child directed; in fact, if you're not child directed, you have no place in Designed for Families. They also represented to Google that they comply with COPPA. Remember that when we get to the results. So, almost 6,000 of the most popular apps in Designed for Families: what did we see? I'm not sure you're going to be surprised: 57% are in potential violation, more than the majority. Out of these, 4.8% are actually collecting personal information: fine geolocation, email addresses, and phone numbers, yes. Then we have 39% collecting non-resettable persistent identifiers, and 19% using potentially non-compliant SDKs. What I mean by that is that those apps embed a service whose own terms of use say it should not be part of a child-directed service. The ad network said in its terms of use, you cannot use our service in child-directed services, and yet in 19% of our corpus, the app developers implemented those SDKs anyway. And then we have failure to take basic security measures, basic TLS. I'm just going to run through it, because we talked about it. The main idea here is: how do we know that no consent was obtained from the parent? Because we have our Monkey. Our Monkey just clicks; it's not a real human being. If the Monkey can bypass the consent flow, a child could do that too. That's how we know that no verifiable parental consent was obtained. And we talked about the different violations that we found. This is just a showcase of the kind of app developer where this happened: TinyLab here, a really big non-compliant one, and they collect actual contact information.
Another interesting issue is that Google had this idea of a resettable advertising ID, so that you can reset it. But what we see is that developers are actually undermining that by collecting the resettable advertising ID together with persistent identifiers that are not resettable, thereby undermining the whole idea of a resettable ID. 39% of our corpus, representing billions of downloads, are doing that. And let me skip right ahead and get to a bit of the crypto. So, Primal.

Yeah, before getting into the crypto, I'm just going to go over this. We mentioned that not only applications but also third-party services are subject to COPPA. So different third-party services have taken different measures to make sure that they stay compliant. Here is an example: if you are a subscriber of Unity, there's a tick box when you register where you declare that the data you're sending was likely collected from a kids-targeted application. And in the actual network flow, there is also a flag you can set to say: I am sending data from a kids application. But what we found is that 84% of the time, applications are not setting it correctly. The reason they have this flag is that they want to treat data coming from kids applications differently, because there are certain restrictions; but if you don't set the flag correctly, they will likely not treat the data the proper way, and so you are in potential violation of COPPA. Then there are other services that don't have any flags; they just outright say, do not use this service. For example, Crashlytics, a very popular third-party service among mobile applications, says that if you are the developer of a kids-targeted application, do not use this service. But unfortunately, and not so surprisingly, 19% of the applications share very sensitive data with non-COPPA-compliant services. And if you calculate the number of affected users for each SDK, it's hundreds of millions of users, so it's not a negligible portion.

We also mentioned that an important clause in COPPA is that you have to take proper security measures. But what we found is that 40% of all the network flows that we see didn't even bother using TLS. Most of the time that's just a standard library call, you don't have to do much, and they still didn't bother.
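As a rough illustration of those two checks, here is a sketch that flags flows missing TLS and flows that fail to mark themselves as child-directed; the flow structure and the "coppa" parameter name are hypothetical stand-ins, since each ad SDK defines its own flag:

```python
# Each captured flow is assumed to be a dict shaped like the example below;
# both the shape and the "coppa" flag name are invented for illustration.
def audit_flow(flow: dict) -> list[str]:
    findings = []
    if flow["scheme"] != "https":                         # the 40%-without-TLS finding
        findings.append("sensitive data sent without TLS")
    if flow["params"].get("coppa") not in ("1", "true"):  # the 84%-unset-flag finding
        findings.append("child-directed flag not set")
    return findings

flow = {
    "scheme": "http",
    "host": "ads.example.com",
    "params": {"aaid": "38400000-8cf0-11bd-b23e-10b96e40000d"},
}
print(audit_flow(flow))  # both findings fire for this flow
```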
So after doing the full analysis, we found a few cases where we see the application accessing certain resources, like location, email, or phone numbers, but we don't see them being shared over the network. Your first intuition might be that, for once, they're actually trying to be privacy conscious. Well, after years of analyzing applications, we knew that was not going to be the case; something fishy was going on. So we filtered out all those applications and started manually analyzing them, and we came up with quite a few very interesting cases. I picked three, in no particular order.

There is this application where, manually going through the network flows, there is a variable for location, but the value doesn't look like fine-location data at all. We tried Base64 decoding; nothing came out. Then we decompiled the code, and we saw that they take the original location data and XOR it twice with two strings. So we thought maybe they have a one-time pad, or they're deriving this value on the fly. What we found instead is that they have two hard-coded strings used as the encryption keys. And if that's not enough, the SDK doing this is StartApp; if you search for it, it's actually a quite popular third-party ad service. XOR it back, and you get the raw location.

Some others were actually encrypting the data; they were using AES. But unfortunately, they had the key and the IV both hard-coded in the code itself. So it was one extra step, but we just had to make sure we were still scanning it.

The last case is a network flow that we detected from an application. If you can see, this is the IMEI number and this is the Android ID that Amit was talking about, with a chunk of data in between. And as you can see, values are repeating: there are actually three unique values, each just repeating itself. What we found is that they are sending the raw IMEI number, and in the same network flow they are also sending the MD5 of the IMEI, the SHA-1 of the IMEI, and the SHA-256, each appearing twice with the same value. They do the same thing for the Android ID as well, but the difference is that for the Android ID the values are not repeating: you have the MD5 and then a different value, the SHA-1 and then a different value. What they are doing is taking each hash of the original value, and then taking it again over the value converted to uppercase. If you take the uppercase of a number, it's still the same, which is why you see the repeating values for the IMEI. But for the Android ID, which contains letters, the uppercase form is different, so you get different values. Furthermore, we've been looking at this for months, and we still don't know why they're doing it. If you have a better idea, come talk to us after the talk. There are many more interesting cases in our official blog post.
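A hedged Python reconstruction of the three cases; the XOR strings, the AES key and IV, and the identifier values are invented stand-ins (the AES part assumes the pycryptodome package), but the XOR round trip and the hash-of-uppercase pattern behave exactly as just described:

```python
import hashlib
from Crypto.Cipher import AES  # pip install pycryptodome

def xor_with(data: bytes, key: bytes) -> bytes:
    """Repeating-key XOR; applying the same keys again undoes it."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Case 1: double XOR with hard-coded strings (stand-ins for the SDK's real keys).
K1, K2 = b"encryptionkey1", b"encryptionkey2"
captured = xor_with(xor_with(b"37.8716,-122.2727", K1), K2)  # what the flow carries
assert xor_with(xor_with(captured, K1), K2) == b"37.8716,-122.2727"  # raw GPS back

# Case 2: AES with both key and IV hard-coded in the app, so captured
# ciphertext decrypts trivially once you lift the constants from the code.
KEY, IV = b"0123456789abcdef", b"fedcba9876543210"  # hypothetical constants
def aes_decrypt(blob: bytes) -> bytes:
    return AES.new(KEY, AES.MODE_CBC, IV).decrypt(blob)

# Case 3: each digest is computed over the raw value and over its uppercase form.
def digest_pairs(value: str):
    for algo in (hashlib.md5, hashlib.sha1, hashlib.sha256):
        yield algo(value.encode()).hexdigest(), algo(value.upper().encode()).hexdigest()

imei = "356938035643809"         # digits only: upper() is a no-op, so pairs repeat
android_id = "9774d56d682e549c"  # hex letters do change, so pairs differ
assert all(a == b for a, b in digest_pairs(imei))
assert all(a != b for a, b in digest_pairs(android_id))
```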
Well, the bottom line is that 57% of the time, apps are potentially violating at least one of these clauses.

Great. So, by the way, COPPA has a safe harbor regime that allows private organizations to certify a service as COPPA compliant. This is supposed to make enforcement easier. What we saw is that the practices in apps certified as COPPA compliant under the safe harbor regimes are no different from the others. So this is a bigger-picture finding with respect to the applicability and the effectiveness of those safe harbor regimes. And by the way, we made the news. It was fun, and we are happy to chat about it more.

We do have some more findings with respect to something called mixed audience. One of the problems is that apps like this are categorized in the Google Play Store as mixed audience, not primarily directed to children. So some developers, after we were in the news, claimed: oh, we are not really directed to children; we see teens and adults using our services. And as you can see, this one is clearly not child directed at all. And what about other examples? Overall, we found that more than half of our corpus, 51% of the most popular apps in our research, are labeled family friendly but not primarily directed to children, and this includes this adult-oriented app, and this one, and what about this clearly-not-directed-to-children app? So there is also abuse of the Designed for Families mixed-audience regime.

And I want to close with a bunch of recommendations, what we are suggesting based on the findings from our research. We have a full paper; it was published at PETS, it's online, and it's accessible to everybody. We have a blog post. And all of the information is on our site, AppCensus: you can look up whichever app you want and see exactly what it's collecting, and we encourage you to do that. So what we're recommending: in this scenario, be sure, don't be the Dinesh. Developers, use compliant SDKs. COPPA is a big deal; you don't want the FTC knocking at your door. So read the terms of the SDK, and use the flags, okay? SDK providers: come on, enforce your terms of use; that is not that hard. Platforms, Google, Amazon, Apple: you have a lot to do here, you have a role to play here; enforce your own terms of use. And I'm happy to say that we are actually talking with Google and trying to help as much as possible. And users, of course: privacy awareness. Parents, go on AppCensus and look at what is being done. And finally, for everybody here: we need more privacy auditors, and we need your help. Join the privacy auditing efforts. This is just one cool piece of research; we would love to see more and more effort put into at-scale dynamic analysis tools. Please help us, and help everybody else. Here are some links; you can follow up with us. But just a heads up: if you do end up doing this kind of work, you might get letters like we got from ad networks. Don't get too excited; that's part of the deal. When you get a legal threat letter, you know you've made it as an academic. Okay, let's open up for questions. Thank you.