Hi, my name is Namit. I have a question, and maybe Mika might be in a better position to answer it. So we talked about data, and GDPR is one big regulation that affects how an enterprise stores and uses data. But I suspect, eventually, governments are going to go beyond just data and also look at what data enables. Case in point is a white paper that the UK government released earlier this year called the Online Harms White Paper. It's an outrageous document because it goes into superlatively punitive suggestions for what organizations could do. But we also talk a lot about the ethics of machine learning and AI. So my question is: we know that, eventually, governments are going to go after regulations. What is the role that companies, and, for example, the Mozilla Foundation, can play to inform them better of how to make good regulations? And regulation is one thing. Industry groups also come together under ISO to set their own standards in a lot of ways, and that's particularly where companies can influence how ethics or good governance is incorporated into businesses. So do you think industry has a role to play there? And how do you think Mozilla might like to get involved?
Absolutely. So I'll start off, and then anyone who wants to can join. So I think the starting point is why. Yes, governments are increasingly looking into regulation all over the world. They're taking different approaches, and some of the approaches are alarming, like the paper you mentioned. But if we go into the cause, it's because everyone is fed up. At the end of the day, people are like, there's something wrong, and what can we do? And I think that's where it's incumbent on companies. If companies don't want terrible regulation, we are the ones who are empowered to start making change.
If companies, let's say, ideally, if all companies followed lean data practices, and they just started building their products focusing on smart data collection, focusing on actually being transparent with people, allowing people to understand and giving people choices, and building in security, so that breaches are still going to happen but not at the scale that you see, that would have changed that. If that had happened, in my ideal world, then you wouldn't see all of this sort of reaction that's coming, and this push that we have to solve this problem immediately. So that's where I do think that industry absolutely has a role to play. Companies don't need to wait for regulation. Companies can start making changes today. If you're in charge of your own company, you have every reason to build trust with your customers, and for all the reasons everyone described, there's actually a lot of value to your own business. So I'll see if anyone else wants to add to that.
It's more than just being altruistic. I believe there is a business angle here that will be beneficial as a competitive differentiator, because you'll be ready if and when regulation comes. You'll be more prepared because you know exactly why you chose that data, and you'll be able to be much more transparent as well. And I think, in addition to industry, consumers actually have a very powerful voice here too. People feel frustrated. Like I mentioned, I was too lazy to call my dentist and tell them, please delete my record. But you actually experience multiple moments like that throughout the day in all the products that you use. And so we also have the ability to start getting on social media, talking, complaining. And the more companies hear that, I think that also adds to that cycle where they understand that there is value in making these changes.
There's a question in the back.
I'm coming from the health industry. My name is Akshay.
So the questions that we need to ask are not very well defined in industries like health, where we are hoping that if we collect a lot of data, we can then do some analysis on it and figure out something useful for the user. So maybe this question is for Rebecca. In industries like those, or even for tech companies like Google, apart from saving on electricity, and now that they are GDPR compliant and at least able to be transparent and protect privacy in European countries, what is wrong with them collecting a lot of data? Why should they stay lean when we do not yet know the questions?
That's a really, really good question. I think that you kind of have to take it on faith until your faith in these companies is broken. And that is kind of scary, because for something like health in particular, the outcomes can be pretty deleterious for society. So it's good until it's not. And this is sort of true for a lot of these types of applications. So I was thinking about how I would answer your question as well, where I think that a lot of companies really can only promise transparency, because privacy as a concept in most people's minds is something that is pretty inconsistent. A lot of people are fine with a practice until something happens, and then they're not. And so I think this is also true for these sorts of health applications, where people are fine when they hear all of the ways in which the data is being used really well, but then suddenly it's not, and then all of a sudden it's the worst thing ever. And that is one of those practices where, because it can lead to actual bad outcomes for people's health and that could have bad social impact, it seems like, again, it's not just about not collecting a lot of data. If you saw how we talked about it at Firefox, we actually collect a lot of data. We're just really transparent about what we're collecting and why we're doing it.
So I would say for those health outcomes, if you were collecting a lot of data, as long as you're transparent about what you're collecting and why you're doing it, you're still following these lean principles. Maybe you want to also add: we've talked about how you can certainly start collecting everything because you're not sure how it's going to be used in the future, but that's actually not necessarily going to be useful for you in the future. I think about how consumer preferences change over time. A common example for a lot of data scientists is fashion, which is a big place where you see a lot of data science. Stitch Fix is a company that has made lots and lots of money doing this, but fashion changes with every season. So if you're using old data to fit these models, you have to continuously refresh it with every new seasonal change. And I think also with health applications in particular, a lot of the data science applications don't have a lot of underlying theory about how these things work. So you end up with models that seem to be very successful at predicting a known health outcome, but then new health outcomes come up where we don't really know how the theory relates. So again, I think that when you're thinking about how these sorts of practices work, really it's just transparency in the end, and hoping that through the transparency, people can independently look at the way you're thinking about data and decide whether they trust your intent. And that is ultimately why people believe in Google, because that company is really doing a lot of publishing. They're publishing a lot of what they're doing, so that actually speaks to intent. If they just produced a product where suddenly there was a magical health application and you couldn't really figure out how it worked, again, it works until it doesn't. And sometimes that leads to bad outcomes for society.
It's also an area where the importance of data governance can really still be applied. So for example, let's say you don't know entirely where the data's value will be. One aspect of applying a data process is to say, look, we are going to revisit this data collection one year from today, or six months, or whatever it is. Most companies don't do that, but that is actually where it comes out over time that the feature was deprecated, or that after all the six months we spent ideating on this particular feature, we moved on to something else. And so it's important to then come back and say, okay, do we still need this? Another way to apply data governance to the case of "we're not sure, so let's collect a lot" is to do it on a smaller scale. So perhaps you don't need to collect it from everyone, but launch small experiments that are time-limited. We do this a lot in the browser, Firefox and Firefox Lite. We'll say at the outset, we're going to do this for six months and then we're going to come back and revisit whether this is actually important. And we're not going to launch to 100% of our users, we're going to start with 1% or whatever. And then we're going to tweak it as we go along.
It also goes back to, if I can add to that, having a product-based approach to how you try to solve any problem. So trying to understand really what it is you're trying to solve, even in a healthcare feature or healthcare product, what problem you're trying to solve, and work from there. And then trying to solve the problem with the hypothetical kind of data that you need or the models that you're looking for, and work from there. I still think it goes back to trying to solve a problem instead of just collecting everything and then hoping something magically comes out. I don't think it works like that.
I saw a couple of hands. One is in the second row over here. Someone wants to bring a mic.
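Editorial sketch: the time-limited, small-sample collection described above (start with 1%, revisit after six months) could look roughly like the following. This is a minimal illustration under stated assumptions, not Mozilla's implementation; the class name `CollectionExperiment`, its fields, and the hash-based bucketing are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class CollectionExperiment:
    """Hypothetical record of a time-limited, sampled data collection."""
    name: str
    purpose: str            # why the data is wanted, written down up front
    sample_percent: float   # share of users enrolled, from 0 to 100
    start: date
    duration_days: int

    @property
    def review_on(self) -> date:
        # Collection stops here until someone revisits the decision.
        return self.start + timedelta(days=self.duration_days)

    def is_enrolled(self, user_id: str) -> bool:
        # Deterministic bucketing: the same user always lands in the
        # same bucket for a given experiment name.
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        bucket = int(digest, 16) % 10_000   # 0..9999
        return bucket < self.sample_percent * 100

    def may_collect(self, user_id: str, today: date) -> bool:
        # Collect only inside the agreed window and only from the sample.
        in_window = self.start <= today < self.review_on
        return in_window and self.is_enrolled(user_id)


experiment = CollectionExperiment(
    name="location-feature-trial",
    purpose="Check whether a location-based feature is worth building",
    sample_percent=1.0,
    start=date(2019, 1, 1),
    duration_days=180,
)
print(experiment.review_on)
```

The point of the design is that the end date and the sample size are fixed before any data arrives, so continuing to collect requires an explicit decision rather than inertia.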
And then there is a hand in the back after that.
Thanks for the talk, guys. What I take away is that there seems to be a trade-off between having better data and having more data. I don't know if that's true, it might not exist. My question is: across Silicon Valley big tech, you've got Google, Amazon, and Facebook. Where do you think they lie on having better data, or are they all just focused on collecting more data and don't care about the quality? What do you think is the ranking? Any thoughts?
I think that a lot of these companies are hiring for people who do privacy-minded work now. You can see the job descriptions and you can see the way that they're trying to navigate this problem. So I think they're thinking about it. They might not be thinking about it the exact same way that we are, but they are absolutely hiring for people who think this way. So it suggests to me that they care and that they're starting to think about this in a way that really affects the way that they do business. A lot of jobs, just so we're clear.
There is a question in the back.
My name is Raghu and I'm from a startup. We actually have users' location in the north because we need to show the user what the mobile network function is there. We're in close to six countries right now, and the paradox, and shamelessly what happens, is that you have this legal jargon for writing everything, and I myself, as a co-founder, don't understand all the linguistic aspects of that. So apologies to the actual user. What is your idea on showing something very simple? Practically, what happens is that when you install an app, an Android app or an iOS app, you are asked those six or seven questions or permissions, and by default you grant them. If you don't grant them, you are penalized because the app doesn't work well for you. And then you have this huge legal thing to cover yourself with.
So I think our dilemma is: how do you show it in a simple way and yet still be legally covered?
Yeah, I'll take that. So first of all, there's no black and white description of the way a privacy policy must be done. And actually, we've done research into our own products. On the main Firefox landing page, where you can go and download Firefox, we did a test. There's a hyperlink to the privacy policy, and then you download the product. Only 1% of users clicked on the hyperlink. So that's great for those 1% of users. We're happy they found the link. But then what about the remaining 99%? They might care about privacy too, and that means we are not doing our job effectively if we're not able to give them the information at the time that they want it. Because let's be real, you're excited about downloading a new product or a new app. I don't know if you're reading every line of that 15-page terms and conditions and privacy policy, and it's too much information in the beginning. It actually makes more sense when you're in the experience of the product: companies that, instead of taking all the permissions at the outset, ask you in the moment, hey, by the way, would you like to share your location data? Then we can provide this customized feature. And then you can make a choice in that moment and say yes or no. That, we think, is the better way, and we love it when we see companies taking that approach. It's called in-context notices or just-in-time notices, as opposed to companies that take the approach of, oh, well, we put it all in our privacy policy. So as an example, in Firefox, after we did that test, what we did was this: when the browser opens, there's a homepage, and the second tab now is the privacy policy, and we know that the percentage of people who engage with it has jumped.
So we think we're doing something right: for the people who care, it's easier. We also did something where we use expanders. People don't read A to Z, especially in that new onboarding experience, so we just have bullets, we tried to make it very short, and we use expanders so that people can find what they're interested in easily. So I think there's a lot of product experience and product design that could be applied to data collection.
I think we have a question from online.
Yes, so Shikant asks: how does one push for lean data practices in a policy slash regulatory environment which enforces data maximization and calls it empowerment?
Let's read the first part again, let's take it in pieces.
How does one push for lean data practices in a policy slash regulatory environment which enforces data maximization and calls it empowerment?
Sure, absolutely. I think the answer in many ways lies in something that both Stan and Rebecca already mentioned as a part of their talk, which is that a lot of the principles behind lean data practices are fundamentally principles that are up to companies to follow. And in fact, not just companies, but governments, civil society, and any agency or entity that deals with data. And regardless of the environment and what the incentives in the environment align for, I think we've seen sufficient evidence in India that consumer trust and privacy are valid competitive advantages by which people make choices between services. And the number of people who make this choice is increasing almost every day, and the awareness around these issues is also increasing.
So I guess what entities would then have a choice between is A, going along with the environment and doing what is fashionable at a given moment, versus B, building long-term consumer trust via long-term practices that showcase that the entity believes privacy is a fundamental right. And I think we are seeing evidence that people will start caring a lot more about B than they do about A in the long run. And I think yes, it's up to the entities to choose what they'd like to do.
So WhatsApp is piloting payments in India. You can use WhatsApp to send money to someone in India. It's a trial, it's limited to 1 million users right now. But a lot of us have it and have used it. And when you use WhatsApp, you're told that your messages are end-to-end encrypted, which means that only you and the recipient have access to the message. But when you send a payment through WhatsApp, that's not true anymore. And that's partly architectural. Obviously, the sender and the receiver are aware of the fact that a payment has been made. But UPI uses the email scheme of having a username at a domain. And if you look at the number of parties that get access to the knowledge that a transaction has been made, there's the sender and the receiver, then the UPI ID providers on both sides, the sender's ID provider and the receiver's ID provider, and then the bank that provided that ID, apart from the app. So the app on both sides, where the transaction happens, knows it. The bank that provides the ID on the sender's side and on the receiver's side knows it. That may not be the same bank at which the account is held, because UPI is an interoperable protocol. So now you've got sender and receiver, app on both sides, ID providers on both sides, and bank on both sides, and NPCI, which is the intermediary body through which all of this happens. So you've suddenly gone from a private conversation between two people to what? Seven parties now?
And nowhere in the terms and conditions are you told what the data retention policies of all these various entities are. And so that's one. That's what Shikant means by data maximization: a lot of these parties are encouraged to collect this data with no clear terms, and that's the law, that's a requirement. It's part of the architecture. It's enforced through various mechanisms, whether it's law or coercion or oligopoly.
I just think that's why we completely agree with you, right? And we think that's the reason why India needs a strong data protection law as soon as possible. In order to even begin understanding such processes, to set standards for what kind of disclosures companies need to give, and for users to be able to hold these seven to eight parties to account, saying, what data do you have about me? What can you do with that data or not? There needs to be a legal framework that empowers users to do that in the first place. And that's something where I think even at the highest levels of the country there is some agreement that it needs to happen, but we just think that it needs to happen as quickly as possible, because it's only when that framework exists that, both in instances like WhatsApp and with other sorts of collection of data by the government as well as private entities, users can truly hold them to account.
And I'll just add, I still think, yes, absolutely, we need law, and in the law there will also be provisions for data minimization and for passing on responsibility so that, for example, WhatsApp can't just say, oh, actually, those were my processors, WhatsApp isn't responsible. They are responsible for managing those data processors in the background. But on the product side, this is something that can be solved, and this is where we think more focus should be. That's for the company to invest in on the product side, to explain to people.
It doesn't have to be boring. You don't have to have several paragraphs explaining it. It can simply be something that says, how does this work? People aren't dumb. The people who care will click on the link, and that is an easier way for them to know, rather than having to go hunt through every single company and their policies.
There's a question in the middle.
So I don't have a question, but I have a comment to add from a consumer's point of view. A lot of times consumers tend to believe that new products are priceless, I mean, that they just have to use them. So I think as consumers, we have to learn to let go. For example, I have a cautionary tale. Somebody I knew came up with a new app for investment advice. They would take a lot of your data and advise you on what products you should buy from the market. I went through their privacy policy, and I knew the co-founders, so I asked questions. And within the third or the fourth message, the person says, you know, dude, even I haven't read my privacy policy so carefully. And that's a bad place to be. As a consumer, then, you should be okay to let go of such products. Not every product is inevitable.
Rebecca, this question is for you. So you talked about the three points, right? Better data leads to less time wasted, improved productivity and better models. And my takeaway from that was that it's primarily around usage of data. I'm still wondering: let's say there are 100 attributes out there, and I identify the five core ones that I need and I've picked them out, and I use them for training my models, for my data scientists to use. They can still make an argument for collecting the remaining 95 attributes and parking them for use in the future, as you were talking about. What's the argument against that? That's still lean data usage, not lean data collection.
So I still think that as long as you're clear with your intent about what you're trying to do, that's fine. We collect data, we collect petabytes of data, and we have collected a lot of data in the browser product, but we have an intent for it and we want to use it to answer those types of questions. And I think that when you're seeing a lot of these new regulations coming out, and I'm not the expert on this so I hope you two can speak to this, this sort of secondary use, where data is collected under one intent and then that data set is used for something that users are not aware of, is starting to become something that people point fingers at. So again, I think the most important takeaway from that is: if you think you're going to use this data for more purposes like that, just disclose it. Just be transparent about it. If you intend to use it to train a model, you already just described how you have a thought about what you want to do with that data. So just write that down. You don't have to hide that you're thinking about questions that you have already identified as being down the line in the future, but you should acknowledge that that's an intent that you want to use this data for. It might just not be immediate. But again, to me, that's the whole point of these fairness, accountability and transparency practices. It's really about making sure that the context under which you're collecting data actually does affect the way you are collecting it. And it's really about making sure that you're capturing as much information as you can about why you wanted to collect that data at the time of collection, because times change. So what you just said, I guess it still feels like it's in the spirit of what we're trying to communicate.
Can you just hold on? There's a mic coming your way.
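Editorial sketch: the advice above, to write down the intent at the time of collection and to disclose later uses such as model training, can be pictured as a small purpose register. This is a minimal illustration only; `PurposeRegistry`, `CollectedField`, and the example field and use names are hypothetical and are not taken from the Lean Data Practices toolkit.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class CollectedField:
    """One piece of data, with the reason recorded when collection began."""
    name: str
    reason: str
    declared_uses: FrozenSet[str]
    declared_on: date


class PurposeRegistry:
    """Hypothetical register of what is collected and what it may be used for."""

    def __init__(self) -> None:
        self._fields: Dict[str, CollectedField] = {}

    def declare(self, name: str, reason: str,
                uses: Iterable[str], declared_on: date) -> None:
        # Refuse to register a field nobody can give a reason for.
        if not reason.strip():
            raise ValueError(f"no reason given for collecting {name!r}")
        self._fields[name] = CollectedField(
            name, reason, frozenset(uses), declared_on
        )

    def is_declared_use(self, name: str, use: str) -> bool:
        record = self._fields.get(name)
        return record is not None and use in record.declared_uses

    def undeclared(self, names: Iterable[str], use: str) -> List[str]:
        # Fields that would be a secondary use if consumed for `use`.
        return [n for n in names if not self.is_declared_use(n, use)]


registry = PurposeRegistry()
registry.declare(
    name="crash_count",
    reason="Measure browser stability",
    uses={"stability_dashboard", "model_training"},
    declared_on=date(2019, 1, 1),
)
print(registry.undeclared(["crash_count", "location"], "model_training"))
```

A future use is allowed here as long as it was declared; what the check flags is a use nobody wrote down, which is the secondary-use case described above.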
So if you expose the reason why you are collecting data, wouldn't it kill the competitive edge that you have? That's one way of looking at it. I mean, if I say, hey, this is the reason why I'm collecting the data. The second thing would be the exploratory approach, which is what I think Akshay talked about, where I have no problem yet. I just want to see if there's really anything there. So I just keep collecting data randomly and then search for a problem, which is what I think seems to be the approach. I don't know.
There are two outcomes there. One is that you're exposing yourself to a lot of risk in this world that we now live in. And the other problem is that you might actually just start wasting all of your time trying to find something that you can solve with this particular set of data. And on top of all of that, I still think that what we've been trying to communicate is that the competitive advantage is actually the user experience. The practices that we're advocating for have these sorts of downstream benefits for the data science as well. But they also lead to a world where you can build a better user experience for your users. Because I think at the end of the day, people don't want to not use technology. People don't want to not have these smart applications, these smart products. They just don't want to feel like they're being taken advantage of. And really, it's about building that trust with your users. So again, in the application that you just described, I think if you're open and honest with your intent, saying yes, this is actually a research application, we don't have a product idea, we're thinking about building a research foundation that could then deliver a product idea, that is something that I think a person would be more likely to believe in. And actually Apple does this.
If you look at their products, you can opt into a research arm of the HealthKit that they offer. And they're using that data for research purposes, and they're not promising product delivery out of that. But they will use it for informing the research that then might lead into a product. But again, they're open with their intent and they're transparent about what they're doing.
I think it goes back to that Warren Buffett quote. You know, your reputation is easily lost in five minutes. And so I think any company that is secretly collecting data because they think that there's a competitive advantage, and that they cannot reveal it and be honest with their users, is kidding themselves. And you see this in voice and speech. Major companies have gotten publicly called out: you were listening to my recordings and my family in my home. That's going to catch up one day.
So we have time for a couple more questions and then we'll wrap up. There's one in the back here, one in the middle and then one in the third row.
Yeah, hi. So my question probably has a different nuance, but since there is so much concern about privacy, is there a framework or a rating agency? We've been talking about regulators. You have the Better Business Bureau stamp given for small businesses, but who is there to govern large businesses, right? We hear about breaking up some large corporations, that's a different discussion, but in privacy terms, is there some rating? If I have a star, it means that your data is safe. I'm saying that at the GDPR level, users don't know. GDPR is a big thing to read about, right? But in a very simplistic way, how do I say, yeah, this is trusted?
So there are a lot of civil society organizations that publish industry reports, and they'll focus on different sectors. Maybe you'll know some examples. They look at companies.
So for example, the EFF, the Electronic Frontier Foundation, publishes a report, I think it's called Who Has Your Back, and they will look at companies' transparency reports and kind of give you an analysis. And then data protection authorities, in countries where there is data protection regulation, are the ones who are in the best position to really enforce and make sure...
Yeah, and we also have Dvij, who's now a Mozilla fellow with us, but who, I think at the Centre for Internet and Society here about a year and a half to two years ago, published a report called Ranking Digital Rights that attempts to do exactly that: go through privacy policies, look at practices and try to rate companies across a wide range of providers on their practices when it comes to digital rights broadly, of which privacy was a key one.
Sounds like you have a new business idea.
No, I was just about to say.
There was a question in the front.
Hello, yeah, I'm a user of Android, and Android as a platform gives applications a framework for asking users for the kinds of permissions the applications require. So why can't we use a similar idea in Firefox for plugins? Suppose in Firefox you have some sort of validation and framework for plugins to follow, to tell users, these are the kinds of data that I'm going to collect. Then users who are installing those plugins will have some sort of feeling of comfort that, okay, I'm okay to install this particular plugin. So it's just an idea.
So we actually have that. By plugins, he's referring to extensions in Firefox. With Firefox, you can customize it in any way you want. You can go and download any number of extensions that are made by third parties. And so two things: we have a lot of policies and guidelines for developers, and privacy and security are actually really core to a lot of that. And then we have permissions.
So the browser always drops down permissions in a certain format so that you know and can trust that this is your client, Firefox, talking to you and not someone impersonating it. Location is an example. If you go to a website and it says, do you want to share your location with this website? That's the browser acting as your user agent, asking you to make a choice. And so we have a series of permissions, just like on Android or iOS, where there are permissions that people have kind of become accustomed to, that we show when you install an extension. The permission prompt drops down and walks you through it, so that you are aware of what the extension has access to. So we do that.
So I think we have time for one more question and then we'll check in online. I think there's one in the back.
Yeah, so from a software developer's point of view, we work in major corporations, but the problem is we don't understand the legal ramifications or the legal side of things because it's total legalese. The terminology itself is such that we just leave it to the legal department to do something. For example, take just applying for a patent. We work only on the innovation, the idea, the concept. We don't really look into how the patent is applied for, or who the authority is and what the process is, because we are not the domain experts. We just leave it to the legal department. They will say yes or no, a binary answer that we are comfortable with, and we take it. So how do we as developers pitch lean data practices to a major corporation's legal team, provided they're open to understanding and open to suggestions? I mean, they will take it as a suggestion and they will evaluate it, and only then will they make the decisions, but how do I pitch it to the legal department of my corporation?
Do you want to take this? And then I'll add.
So I've gone to a couple of privacy engineering conferences this year, and one of the things that has been pretty striking to me is how a lot of companies are starting to really think about privacy engineering in their own context. And what has also struck me is how many software developers really want to participate in something that makes them feel like they're meaningfully contributing to this kind of effort. And I think that especially in Silicon Valley you're seeing a lot of software workers frankly walk away from their jobs. They don't want to work there anymore, right? And it's striking to me because it seems like you as the software engineer actually have a lot of power, because you are the person that has to build the products that the companies want to be able to ship. And so you have power in a way that I don't think people have had in a while. You're seeing all of these things in Silicon Valley and in Seattle where tech workers are starting to walk out. They don't want to work on these types of products that are used in ways that they don't understand. So I think that it might not be a single person making a single pitch to a single executive, but it might be that as a class of people, as workers, you might just be able to work together and sort of say, we're not going to do our jobs until you give us some way to participate meaningfully in something that makes us feel like we're making the world a better place.
I also think at the end of the day a lawyer has to come ask an engineer, how does this work? Explain to me how this data works. And so it's another place where the developer or the engineer can really explain in detail and give the context, because that other person is sort of the intermediary who writes it down. Our practice is that we always send it back to the developers or the engineers. Whoever is closest to it must be able to read it and say yes, this makes sense.
Because then they're the ones who can raise their hand and say, actually, we've made a change, it doesn't work like this anymore. Or, no, you actually got it wrong. And we have a lot of back and forth. So I do think the developers and the engineers who are writing this code have a lot of say, yeah.
So I think we've run out of time for any more questions in the room. Just want to check online. Any final thoughts from the stage?
If I could just add one last point to that. We're building a new browser called Firefox Lite in my group, and for every single product feature, when we're planning that feature, we try to understand what performance signals we're looking for to see whether it is a good feature or not, and what data would be needed to validate or enable that feature. And everything has to be documented for the product management team, the engineering group, and also a data science team that is looking at collecting data. They're three separate groups, but they have to agree and understand what it is we're trying to do. If any of those groups doesn't understand, they can raise the question, but it's all documented. Having that open discussion, I think, is the beginning of making sure that everyone on the team understands the purpose of that feature. And it's easy to ask, because you can always go back to: what problem are we trying to solve? I know you may hear that a lot, and I say that a lot too, but if we can all understand what business problem or technical problem we're trying to solve, start there, build the signals and the metrics into the product, and then map that all the way out, it makes the job easier for everybody, right? And so if you can establish that kind of working environment, that's the way to do it.
And I know we've talked a lot about how the user experience of this kind of transparency gives you a competitive edge, but I also just want to reiterate something, as somebody who hires data scientists and who talks to a lot of people who want to get into data science. What I hear from a lot of the people I talk to now is kind of shocking to me. Data science, in a lot of ways, has become associated with a lot of not-great things. And I have talked to a lot of people who really want to work in a place where they feel good about what they're doing. So you also have a competitive advantage in the labor market for data scientists, because the really good ones don't want to be part of something that they feel is bad. If you can give this kind of story to the people who want to work in your companies, you will have a bigger pool of the labor market to hire from. And again, it's very expensive to hire data scientists, so you want as much of an edge as possible. I think a lot of people really want to work in a place where they're proud of the products that they use.
So thank you all for coming tonight after your day. It's Wednesday evening, maybe you battled traffic, I don't know if it's raining outside, so we really appreciate your time. Thank you to everyone who joined online. Thank you Haskeek for hosting us. This lecture is recorded, and it's going to be available at airmozilla.org if you want to see it later. The slides will also be posted there. And the website we showed is leandatapractices.com. On there is a public toolkit, and there are several resources in Google Docs. If you want to implement it in your own company, you can fork the documents and make them your own. So thank you, good night.
Thank you. Thank you.