The makers of the Firefox browser, and I'm very pleased to have this recording for you to talk about lean data practices. This is something that Mozilla practices; we love to talk about it and we love to teach about it. So I hope that you will enjoy this presentation and learn a few things. Let's go ahead and get started.
First, I will introduce myself, and again, welcome. My name is Alicia Gray, senior manager for trust and security at the Mozilla Corporation. I run our privacy operations management program, and so I live deep in our lean data practices as we advise and counsel our product teams and various operational teams on how to handle privacy and security at our company.
We're going to talk a little bit first about internet essentials: what is data, and where can you find it? That will help you with your lean data practices projects. Then we'll talk about an introduction to lean data practices: what is it, how might you think about it, where can you use it? We'll talk about lean data practices or privacy by design as you're thinking about how you want to approach your products and services. And then finally we'll talk more specifically about putting lean data practices into practice.
First, before we talk about lean data practices, it's really important to understand the general big picture. What is internet governance? You probably know that the internet is a global system of interconnected computer networks, including all the devices on those networks. So whether it's your phone, your computer, a laptop, a thermostat, or a car, anything connected to the internet is part of that interconnected computer network. There are two multi-stakeholder organizations that support the development and maintenance of the internet: the Internet Engineering Task Force, and the Internet Corporation for Assigned Names and Numbers, or ICANN.
For software applications and services to work, data obviously has to travel across the internet, and it does that through physical wires that are controlled by our internet service providers. This gives our local ISPs the ability to collect, intercept, reroute, or stop the flow of information across their wires, and every country has a different approach to ISP regulation. So it's important to understand that the ISPs are involved here, because you're going to see how the internet protocols work and, in particular, what that means for personal data and your collection of it.
Next, you have choices with data. Data can stay local on somebody's device. When we talk about a client, that is a local device or computer that can access a remote server through a network. Your phone, your laptop, your computer: these are all examples of things that we call clients. Data cannot be transferred anywhere if you haven't connected these client devices to the internet. So if you have no wifi connection and no data for your phone, it is not connected to the internet and everything is local on that device. Software applications can also have local clients. For example, Microsoft Word, Adobe PDF, and web browsers are all software that's capable of processing and storing data on a client device. The data does not have to leave the device. It can stay on that device unless you, a company or an organization, a third party, or a software application cause that data to be sent someplace else. Local data on a device can be accessed by anyone who has physical access to the device, including its password, or remote access to the device through the internet.
But as we're aware, most data does travel across the internet. A server is a computer that's connected to the internet and performs some kind of task. A database is software that runs on a server and stores data so that it can be made accessible for use. Some people host their own servers, but most servers are remote.
Data centers are the physical buildings for servers, databases, and other software applications that are physically connected by wire to the internet. Any stored data is physically located in a database on a server in a data center. That's like a cloud service. And finally, content delivery networks, or CDNs, are distributed servers that can quickly relay information to end users by geographic proximity. CDN operators host their servers in data centers run by local ISPs. That's how you get web pages delivered to you.
This matters because it's important to understand that internet travel is both public and private. People have really different expectations of what is private when it comes to online activity. This is more often than not shaped by their cultural contexts and their own computer literacy. ISPs, because they have control of the physical wires that this data travels across and have to direct traffic to the correct places, generally have the capability to know what domains and IP addresses their subscribers are visiting and using. Third parties can also observe or interfere with online activity. Various technologies can be used to keep data secure and private. This includes things like hashing, encryption, multi-factor authentication, passwords, the HTTPS protocol, virtual private networks, and so on.
So now we understand a little bit about how the internet functions, things like data centers, local storage, and so on, and we're ready to talk about lean data practices. What are lean data practices? These are practices that Mozilla uses to advance privacy, security, and transparency in our very own products. Lean data practices is a framework that anyone who wants to use, collect, store, or process personal data can use to build in privacy, security, and communications in ways that help build trust and reduce risk.
Mozilla facilitates workshops and roundtables on lean data practices, and we offer a toolkit, templates, case studies, and other resources at www.leandatapractices.com. Anybody can use that toolkit and its GitHub repo, so please do check it out. I'll show this again at the end of the presentation.
So why Mozilla? We have experience in this space. We have password managers. We used to run an app marketplace. We have mobile apps. We use cloud services. We have developer resources. We worked in virtual reality and speech recognition. We use video conferencing. We had a mobile OS. So we have put a lot of time and effort into this space, and we think that we're very well positioned to help others understand ways that they can respect user privacy while still offering their products and services.
What have we done with LDP so far? We've reached over 100 organizations, including in India, the United States, Europe, and Kenya, across a variety of industry verticals, including e-commerce, finance, and health. We've done this through live and remote trainings and through roundtable discussions. We had the pleasure of being there last year, and I'll show you a couple of pictures a little bit later on. This is our sweet spot: we like to work particularly with up-and-coming business-to-consumer tech companies. It's really beneficial for companies that are big enough that this kind of privacy protection development matters, but small enough to not quite have it nailed down yet, so this kind of training and these types of toolkits are really beneficial for them.
Here's a picture from Hasgeek last year, when Mika Stra, our Associate General Counsel, came and spoke. And I believe this slide is from when they were in Kenya. You can see a lot of people very interested in Mozilla's lean data practices training.
Okay, so let's talk a little bit more in depth now. Let's move on and talk about this: do you do lean data practices, or do you do privacy by design?
Privacy by design is about staying lean and being smart about how you collect data so you can build trust with your users and ultimately help grow your business. Privacy by design was developed by Ann Cavoukian out of Canada in the 1990s, and includes seven principles by which organizations should be thinking about how they can embed privacy into their tools and services. It doesn't have to be "or"; it is "and". Mozilla's lean data practices have three pillars, which by design and by nature are embedded in privacy by design. So this is a way to get you started, particularly for small organizations that might not know quite where to start and for whom seven principles seems very difficult. There are three actions that you can take today in order to start building your program.
So what does it mean to practice lean data? We have three lean data practices. First, we're going to talk about how to engage your audience. Second, we'll talk about staying lean and what that means. And third, we're going to talk about building in security.
So, the first lean data practice, principle one: engage your audiences. Our biggest tip here is to ask, what is it that people would find surprising about your practices? What is it that they might not know? What is it that they might find out someday and say, "Well, I didn't know they had that," or "I didn't know they do this"? That's what you should be more clear about with your users.
Let's talk about that a little bit more. This is a picture of an application that our associate general counsel had. It's an app, and obviously you can just install it. Nice install button. When you install the app, it asks you for a first name, a last name, a phone number, an email, and then a password. So it doesn't seem too uncommon. But here we go. Users might be asking: why do you need my first name? Why do you need a last name? And in particular, what are you going to do with my phone number?
Well, this is what happened with the phone number. The phone number was provided, and we started to get all kinds of marketing texts with 15% off this and 15% off that. This was a surprise. This was not expected from what this person thought they were getting, which was to sign up and order medicine. You'll also notice that none of these fields appear to be optional to the user. They all appear to be mandatory, although there's no asterisk or other indicator that these are required fields. So this information is being used in ways that surprised the user, and that's what you want to avoid. That's what creates distrust in tech companies. So you should really think hard about this: if I'm filling this out, what would my user expect? And what do they actually need to provide us?
Okay. This is part of the disclosure that the prior app had for people to understand what it is they were doing. In the privacy notice, it said: "We may share the personal data you provide to us with government agencies, relevant multinational bodies and other bodies for the purposes of influencing policies are developing higher within the relevant professional organizations." The users probably will not understand this. And it doesn't say in what circumstances somebody might share data with the government, who the relevant multinational bodies are, or who these other bodies are. It's really important to be clear with your users about what the intended data sharing could be for the information that they've provided you. If they provide you with a bunch of information, does that mean you're going to turn over a bunch of information to some kind of government agency? This statement is not very clear.
This one says: "When you have a policy with an insurance company or an account with a micro financing house, we may share the data you provide to us with such insurance company or micro financing house." So if I'm signing up to have some medicine delivered,
I am not clear as to why you would need to share that information with an insurance company or micro financing house. And the fact that this says "may" doesn't give me the option to decline that. So this leaves a lot of power in the company's hands and very little power in the user's hands. That is something that companies should be striving to avoid; try to put more choice and control into the user's hands about what they want to have done with their data, including data sharing.
What are some of the things you can do to practice engaging with your users? Some tips. First, when you have people signing up for services, use icons to indicate what your active data collection is: what does the user need to provide to you? Can you create permission panels, so people get toggles to share this and not share that? Can you use prompts or notifications? Maybe you put up a just-in-time notice: this is the information that we would like to collect, this is the reason for it, can you please provide it? Can you provide an onboarding tour that shows people the benefit of providing the information that you're asking them for? You can use interstitials, you can use overlays, and of course you can put data disclosures in email footers as well.
A few other tips. Use unchecked boxes to make opt-ins clear. Don't opt people in by default; it doesn't give them much control. Try to make your optional input fields clear. You can say "optional". You can divide them into sections, so it's really clear for people what they don't need to provide to you. Make the controls accessible. Don't bury them in settings pages. It's really hard for people to actively control their data if the controls are difficult to find. Always provide controls to disable or delete where you can, so the user can maintain that control, and explain the data value in understandable terms.
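As a minimal sketch of the tips above (unchecked opt-ins, clearly marked optional fields, a plain-language reason for each field), here is one way a signup form could be modeled in Python. The class names `SignupField` and `ConsentOption`, the helper functions, and the field names are all hypothetical illustrations, not part of Mozilla's toolkit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SignupField:
    """One input on a signup form (hypothetical model)."""
    name: str
    purpose: str            # plain-language reason shown next to the field
    required: bool = False  # optional unless there is a clear need

    def label(self) -> str:
        # Optional fields say so explicitly, so users know what they can skip.
        suffix = "" if self.required else " (optional)"
        return f"{self.name}{suffix}: {self.purpose}"


@dataclass
class ConsentOption:
    """One opt-in checkbox (hypothetical model)."""
    key: str
    description: str
    granted: bool = False   # unchecked by default: the user must opt in


def validate(fields, submitted):
    """Keep only declared, non-empty fields; fail if a required one is missing."""
    missing = [f.name for f in fields if f.required and not submitted.get(f.name)]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    declared = {f.name for f in fields}
    return {k: v for k, v in submitted.items() if k in declared and v}


def may_send_marketing(consents):
    """Marketing texts are allowed only if the user actively opted in."""
    return any(c.key == "marketing_sms" and c.granted for c in consents)
```

With this shape, a phone number collected for delivery updates does not imply permission to send coupons: that requires the separate `marketing_sms` opt-in, which starts unchecked.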
You can use that example we just saw, but make it clear: we're going to use your data to provide you with the service, to send you something, to connect you with somebody else in a community that you might be in together. And then finally, make the information easy to understand and avoid legal jargon. Those sentences we just read were really long and had virtually no meaning. Highlight what people care about. That's how you engage your users. They're really interested in knowing what you're going to do with their data, so make it clear. Explain the things that people would be surprised by. In the example we saw, if you're collecting a phone number because you want to send me texts for advertisements and coupons, then you should be telling me that someplace where I can see it easily. And then finally, make it look nice. People should be able to find information without having to struggle, so try to use headings, numbers, or layering to organize the text of the information that you're trying to share.
Okay, principle two: let's talk about security. As we saw in the beginning when we talked about access to data, there are a lot of things that we can do on the security side, now that we've built people's trust, to secure the data that they have trusted us with. So how can we practice that good security? There are always administrative, technical, and organizational steps that you can take. Some of these are role-based access controls. Within your own systems, people need to work. But if I work in the HR department, I do not need access to a subscription payment center. I should have access to what I need to do my job for human resources purposes. The person that works on subscription payments needs the access that they need to do their job, but we do not need access to each other's systems in order to do our individual jobs.
So people should have the access they need for the job that they do, with the least amount of privilege necessary for them to conduct their work. It's a really important setup. It allows people to do their roles, but it also makes sure that they don't have more than is necessary.
Another thing that you can do is to look carefully at your system integrations and potential network weaknesses. Systems that are connected to each other have integrations: it might be with an API, it might be through an SDK, it could be system-to-system communication. Those are all points where something could potentially happen. So you have to work with your security teams to make sure that those integrations are as secure as possible. And if there's the opportunity for third-party vendors to enter into your systems, maybe with a login of their own, you need to make sure that their protocols give them access only to the particular network that they need to work in. Take the Target breach in the United States, where a third-party HVAC vendor, so a heating and air conditioning vendor, had access into a particular system at Target. The third-party vendor was hacked, and the attackers were able to go through the heating and air conditioning company's systems to get into the Target systems, and then were able to steal credit cards from the point-of-sale machines. So it's very important to check that your system integrations are tight and that you don't have weaknesses in your networks in terms of people moving through them at various points.
Something else you can do: data sharing is also a place where you can practice good security. As I mentioned in the Target example, a weakness at a third party with access to your systems is a weakness for you. But we also have potential security issues with just the data that you might send them for everyday processing.
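The role-based, least-privilege idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the role names, the permission strings, and the `is_allowed` helper are hypothetical, and a real system would normally rely on its identity provider or platform access controls rather than a hand-rolled table.

```python
# Hypothetical role-to-permission table: each role lists only what the job needs.
ROLE_PERMISSIONS = {
    "hr": frozenset({"hr_records:read", "hr_records:write"}),
    "billing": frozenset({"subscription_payments:read",
                          "subscription_payments:refund"}),
    # A third-party vendor gets access only to the one system it services.
    "hvac_vendor": frozenset({"building_controls:read"}),
}


def is_allowed(role: str, permission: str) -> bool:
    """Deny by default: unknown roles and unlisted permissions get no access."""
    return permission in ROLE_PERMISSIONS.get(role, frozenset())


if __name__ == "__main__":
    print(is_allowed("hr", "hr_records:read"))                    # HR can do HR work
    print(is_allowed("hr", "subscription_payments:read"))         # but not payments
    print(is_allowed("hvac_vendor", "subscription_payments:read"))
```

The design choice worth noting is the default: anything not explicitly granted is refused, so adding a new system never silently widens an existing role or a vendor login.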
So you always want to make sure that your vendors have good security protocols in place and that they have a breach notification policy and process to inform you should they experience an incident. And watch out for things on shared drives, such as on cloud services, where sharing settings might be more permissive than you wish them to be. Always be on the lookout for things like that. Watch out for keys as well. If you use GitHub to commit code, or things of that nature, it is not uncommon to see people accidentally copy and paste API keys or other types of keys in those places. And you don't want those things out there where other people can get them to commit code. So just keep an eye out for that. That's also a really important security practice to watch out for.
Don't forget the paper. A lot of us still have paper files. Some countries require paper files for certain types of data, in particular in the HR world, maybe in the health world. So we all probably have files sitting around. Keep the file cabinets locked. Only the people that need keys have keys. If you have server rooms with localized data in them, make sure the servers are sitting in a cage and only the right people have keys to gain access to those server rooms. So don't forget that the physical is just as important as the ephemeral.
And a couple more security measures you can take. Again, you have physical measures such as ID cards for getting into a particular space. Shred those documents, and make sure that file cabinets or closets are locked. Access controls and passwords are really important, and multi-factor authentication, for both employees and potentially even for users of your products or services, is always a good way to prevent account takeovers. And finally, technical measures such as penetration testing, encryption, intrusion protection, and vulnerability reporting are also great measures that you can take to secure your networks.
And finally, principle three: be lean.
So what does that mean? Be lean just means don't collect more than you need. We call it data minimization; we like "be lean" because it's even shorter than "minimization". This is an example of a screen for a service that somebody went to sign up for, and there's a lot of data to be collected here on the screen, including credit card information. You can see an email, a phone number, a credit card, a mailing address, and you can also see that this checkbox is automatically checked; we talked a little earlier about not having checkboxes automatically opted in. This person signed up for this account quite a few years ago and, like most of us, probably forgot that they had it, and then all of a sudden they get a notification of a breach. So, wow: compromised accounts, 26 million accounts, including email addresses, names, phone numbers, and physical addresses. Lean data practices apply to this entire page. Depending upon what this particular service might have been doing (this was Ticketfly, so I believe it's concert tickets), the question is how much of this information was really necessary to collect, and did they need to store it? If you don't have it, it can't be breached. So, as you think about what you're going to collect, really take a hard look at what is the minimum necessary in order to provide the service, and try not to collect more than you absolutely need.
So how can we practice lean data? Always try to think about it from the user's point of view. You always want to be able to answer the following questions. Why? Why are you collecting this? For every piece of data, you should have a very crisp and clear answer. We're collecting your mailing address because we need to ship you something. We're collecting a credit card because you have to pay for something. We collect your email address because we need to send you a confirmation. So always make sure that you're able to answer the why. Then the what: what are you collecting from me?
I shouldn't be surprised to find out that you collected a profile of me from my social media pages. I should have the right to know whether you're doing that or not, and it should be really clear to me, particularly if you're going to correlate that data and somehow provide me with targeted advertising. The how: how are you getting this information? Am I providing it directly to you, or are you getting it from a third party? The who: who is going to have access to this data? Who in the company can see my data? Are you going to share it with a third party, and why? Make sure that you're explaining that and answering those questions. And then the when: how long will you keep this data, and when will you get rid of it? Do you have a retention time for it, or do you plan on keeping it forever? All data has some kind of life cycle attached to it, and it probably isn't good in perpetuity.
This goes back to the question of surprises that we talked about earlier in this presentation. If any of the answers that you're giving yourself as you ask these questions cause you to raise your eyebrows and surprise you, more than likely they're going to surprise your users.
Security: for the data that you do have, how are you securing it? Sensitive data such as credit card information or health data requires a different level of security than perhaps just an email address. So the more sensitive it is, or the more of it that you have, the more your security needs are going to change and potentially increase as well.
So what I'm trying to ask is, are you collecting data for future but undefined reasons? A lot of companies like to do this because they feel that data might be useful in the future. But in reality, data does have a shelf life and it isn't good forever. And you may find that the data that you've collected is biased in some way, and isn't actually useful for whatever thing you might come up with in nine months or a year or three years.
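One way to make the why, what, how, who, and when questions concrete is to keep a small inventory record per data item and flag anything left unanswered. The sketch below is only an illustration: the `DataItem` class, its field names, and the two helper functions are hypothetical, and the retention arithmetic assumes a simple "delete N days after collection" rule.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional, Tuple


@dataclass(frozen=True)
class DataItem:
    """One entry in a hypothetical data inventory."""
    what: str                      # the piece of data collected
    why: str                       # crisp reason, e.g. "to ship the order"
    how: str                       # "provided by user" or "from a third party"
    who: Tuple[str, ...]           # teams or vendors with access
    retention_days: Optional[int]  # None means no retention period defined


def open_questions(item: DataItem) -> list:
    """Return the questions that still have no answer for this item."""
    gaps = []
    if not item.why.strip():
        gaps.append("why")
    if not item.how.strip():
        gaps.append("how")
    if not item.who:
        gaps.append("who")
    if item.retention_days is None:
        gaps.append("when")
    return gaps


def is_due_for_deletion(item: DataItem, collected_on: date, today: date) -> bool:
    """True once the item's retention period has fully elapsed."""
    if item.retention_days is None:
        return False  # nothing to enforce yet; open_questions() flags this gap
    return today > collected_on + timedelta(days=item.retention_days)
```

An item collected "for future but undefined reasons" shows up here as an empty `why` and a missing retention period, which is exactly the kind of answer that should raise eyebrows before it surprises a user.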
So you should really think hard about the data that you're collecting, and its usefulness past the period of time that you need it for. If you are collecting data for some future undefined reason, you need to be really clear with people about what that is and how long you might maintain that data, so that they get a choice as to whether or not they want to participate in that kind of work.
Something I think is really helpful is something that we did at Mozilla last year, because staying lean doesn't mean that you don't have any data at all, and you eventually do have to clean it up. And this is very hard. We get busy in any given day, and the year passes by, and all of a sudden we're three years down the road, and all the things that we intended to do just sometimes don't happen. We're human, and it happens to everybody. So we introduced an annual spring cleaning week. The goal of the week was to do some quick, meaningful things to make an impact on the data that a unit or group of people may no longer need.
What we did, over the course of a week, was separate this work out into some focus areas so it didn't feel so overwhelming for the teams that we were trying to initiate this project with. The first day of the week, which was a Monday, we started with paper and physical files, and tried to remind everybody that if they had stuff sitting in a desk drawer or a cabinet that they no longer needed, and there was no business reason to keep it, then they should go ahead and shred it. The next day we focused on laptop files: things in a download folder, documents that you've created over time that are no longer useful, drafts, things in a shared drive. Go ahead and clean those up and get rid of whatever is no longer useful. The next day of the week, we worked on email. We get a lot of email and it's hard to clean it up, but go through and clean up.
Again, that means things that are no longer necessary: communications in Slack channels, and any Google Groups that need management that your company might use. We tend to forget about those Google Groups that sit out there. On Thursday, we worked with them on "not sure" items, so maybe they had an item they weren't really sure whether they should keep or not. They could talk to their manager about it, or they could get privacy input on it. So it was kind of an ask-a-question day. We also had people set up password managers for their SSL accounts and install their CrashPlan application, which for us is a backup system for things stored locally on devices. And then on Friday we had people fill out a survey as to whether they thought this week was useful, with their feedback on it, so we could improve it going forward.
There were a lot of surprises for our participants as well. It was a really great way to spend the week, and it really reminded people of how difficult it is to clean things up when you don't stay on top of it. So we hope to encourage this as an ongoing project at Mozilla, and it's a great idea for other companies to give it a try as well.
I want to thank you very much for listening to this today. I hope you found it useful. Again, here's the link that I referred to in the beginning that you can visit. You can get the lean data practices toolkit, and you'll find worksheets and other really great pieces of information in there to help you as you start this lean data practices journey. Thanks again so much. Take care of yourselves. We'll talk to you soon.