My name is Pesu. I am a public policy consultant currently working with AWS's public policy team. The presentation I'll be giving today, however, reflects only my own opinion, and I would like to apologise in advance for the number of times I will use the word data. I'll keep this short: a basic overview of the two versions of the report. The first version came out in July 2020 and the second in December 2020. First things first, the definition of non-personal data. The definition of NPD under the two versions of the report stays the same: any data which is not personal data as defined under the Personal Data Protection Bill of 2019, or data which is without any personally identifiable information. In version one we further had NPD divided into three categories with different purposes and processes for sharing: public NPD, community NPD and private NPD. Version two doesn't really have these explicit categories and treatments, but in a sense there is some carry-forward of the underlying process. Coming to NPD sharing purposes: under what purposes can non-personal data be shared? Sovereign purpose was present under version one, under which data could be requested for the purposes of national security, legal requirements, etc. The revised framework does not include such sharing within its ambit, as regulations addressing the sharing of data for sovereign purposes already exist. Hence the non-personal data authority, the authority to be set up under the revised framework, would not adjudicate the validity of data requests made for sovereign purposes. Coming to core public interest purpose: in version one, non-personal data could have been requested for community benefits or public goods, research and innovation, policy making, better delivery of public services, etc. In version two, however, the committee has proposed public good purpose as the only ground for mandatory sharing of NPD.
This purpose can include data useful for policy development, better delivery of public services, and supporting societal objectives such as health, science, urban planning, etc. Coming to economic purpose: under version one, data could have been requested in order to encourage competition and provide what the committee called a level playing field, or to encourage innovation and start-up activity, or for a fair monetary consideration as part of a well regulated data market. Version two, however, accepts that since such data sharing already happens within voluntary data sharing agreements between companies, the framework refrains from making any recommendations on this. This was a major development between the two versions. Coming to the ecosystem entities under the two versions of the report: both version one and version two have data businesses as an entity that derives economic value from the collection, storing, processing or managing of data. Under version two, data businesses would have to register after crossing certain thresholds. These threshold parameters could be things like gross revenue, number of consumers, and so on. The committee does recommend that these thresholds be harmonized with the thresholds suggested in the Personal Data Protection Bill for significant data fiduciaries. Also, data businesses would be a horizontal classification and not an independent industry sector. Then we come to data trustees. Under version one, the data principal or community would exercise its rights through the data trustee, and this is the same in version two as well. In version one, however, the trustee was a representative body through which a community could exercise its data rights, and for a lot of community data this would be the corresponding government entity.
Under version two, data trustees have been explicitly defined as an organization, either a government organization or a nonprofit private organization, which could be a Section 8 company, a society or a trust, responsible for the creation, maintenance and sharing of high value data sets. What are high value data sets? I will cover that further in the presentation. Data trustees would play a key role in the entire data sharing mechanism under version two; they would act as the middleman in the data exchange process. Trustees can request certain non-personal data from data custodians and data businesses to create high value data sets, while data requesters can seek access to said high value data sets through the data trustees. Coming to data custodians: a data custodian is an entity that would undertake the collection, storage and processing of data, and they have to keep in mind what the committee referred to as a duty of care to the concerned community to which the NPD pertains. The new report makes it very clear that both government and private bodies could act as data custodians and would retain the data custodian's duty of care for a given community. Now, data trust is a concept that was present in version one, defined as an institutional structure for sharing a given data set as per specified rules and protocols. In version two, however, this concept has been expunged. Data processors were not explicitly recognised in version one, whereas version two does call out data processors as companies that process data on behalf of a data custodian. These would be cloud service providers, SaaS companies, ITES companies and others. Data processors are not obligated to share metadata belonging to their customers, which are data custodians. Again, I would like to point out that this is a major development from version one of the NPD report. Now, the crux of the framework: the NPD sharing process.
So in version one, Indian citizens and organisations would have access to the metadata about the data collected by data businesses. A request would be made to the relevant data business for the underlying data. If the business complied, the transaction would be completed there. If not, the request would be escalated to the non-personal data authority for evaluation. The NPDA would then direct the data business to share the data if the request was valid: for free where the requester only needs access to raw data sets, or at a FRAND-based (fair, reasonable and non-discriminatory) remuneration in the case of data sets which are considered to have some sort of value add. In version two, the data sharing process is through high value data sets. I will come to the explanation of high value data sets, but basically a high value data set would be a directory of metadata that is collected by different data businesses, and there shall be open access to this metadata directory. Data requesters can avail access to these high value data sets from the data trustees, and according to the new report there should be non-discriminatory access to such data. The report also discusses the granularity of the non-personal data which will be contained in the HVDs. Private entities can make data requests to other private entities, but such sharing is voluntary and is not covered under the non-personal data governance framework. Now, high value data sets: basically, data sharing is operationalized in the updated version of the framework through high value data sets. They did not exist as a concept in the earlier version. An HVD is a data set that is a public good and benefits the community at large. It is a selection of metadata for public good, it has predetermined data fields, and data trustees are responsible for it. Data trustees can and will request data from data custodians and businesses to create HVDs for public good purposes.
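As a sketch, the "directory of metadata" idea behind an HVD could be modelled roughly like this. All class names, field names and sample values here are my own illustrations, not terminology or data from the report:

```python
from dataclasses import dataclass, field

@dataclass
class MetadataField:
    """One predetermined field of a high value data set (illustrative)."""
    name: str
    description: str
    granularity: str  # e.g. "route-level", "aggregated"

@dataclass
class HighValueDataset:
    """An HVD: a metadata directory curated by a data trustee (illustrative)."""
    title: str
    trustee: str   # government body or nonprofit acting as data trustee
    purpose: str   # must be a "public good" purpose under version two
    fields: list = field(default_factory=list)

    def metadata_directory(self):
        # Open access applies to the metadata directory itself; requesters
        # then approach the trustee for the underlying data.
        return [(f.name, f.granularity) for f in self.fields]

hvd = HighValueDataset(
    title="Urban mobility patterns",
    trustee="City transport authority",
    purpose="better delivery of public services",
    fields=[MetadataField("trip_counts", "Aggregate trips per route", "route-level")],
)
print(hvd.metadata_directory())  # [('trip_counts', 'route-level')]
```

The point of the sketch is the separation of concerns the report describes: the trustee owns the directory, the metadata is openly listable, and the underlying data is a separate request.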
And as I mentioned before, the committee has suggested the granularity of NPD that is to be collected and the different treatments for the same. Now, rights over non-personal data. In version one, non-personal data was considered non-rivalrous in nature and a community good. Raw and factual data sets collected by businesses were considered community data and were liable to be shared at no remuneration. At points or levels where the processing value add is non-trivial with respect to the value of the collective contribution of the original community, data sharing could still be mandated on a FRAND (fair, reasonable and non-discriminatory) basis of remuneration. Subsequently, version one mentioned that with increasing value add, it may instead be required that the concerned data set is brought into a well regulated data market and the price is allowed to be determined by market forces. In version two, there is explicit recognition of the copyright of data businesses over their databases, since significant skill and effort is expended in the creation and maintenance of such databases. There is no sharing of inferred or derived data, including trade secrets, algorithms and analytics, under version two. The report also provides a constitutional basis for a community right over non-personal data through Article 39 of the Constitution, the Directive Principles. According to the committee, extraction of predetermined data fields from a data set would not violate the database's copyright. Similarly, trade secret protection will only be allowed if the act of compiling or processing non-personal data leads to an inherently non-public compilation of data. The committee does note that trade secret protection is unlikely to confer a proprietary right over data that would prevent the eminent domain of the government over this data. There is a lot to uncover here and I'm sure as today's session progresses, we shall get into it.
Now, the interface of the proposed non-personal data governance framework with the Personal Data Protection Bill. Version one of the report recommended that the non-personal data authority work within the framework of the Personal Data Protection Bill and in consultation with the Data Protection Authority. However, the revised report calls for a single national-level regulation, a non-personal data legislation, to establish rights over non-personal data collected and created in India, and the recommendations of this report are supposed to become the basis of this new legislation. To this end, the committee of experts also recommends the deletion of any references to non-personal data in the Personal Data Protection Bill, including Section 91 of the Bill, which empowers the government to direct data processors and data fiduciaries to share non-personal data. As for the interface between the two frameworks, a lot of clarifications are given in the report. Non-personal data would be under the purview of the new NPD law whenever it comes into place. Re-identified non-personal data, which is non-personal data that through an accident or through processing becomes re-identified or de-anonymized and hence becomes personal data again, would be under the ambit of the Data Protection Authority as set up by the Personal Data Protection Bill. Next, datasets that have inextricably linked personal and non-personal data will be governed by the Personal Data Protection Bill. Now, the privacy aspects under both the versions. In version one, there was a recommendation for the NPDA to work within the framework of the expected privacy legislation and in consultation with the DPA; for anonymized personal data, the individuals to whom the data pertained had to be considered as the data principals for the non-personal data.
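As a brief aside, the re-identification risk just mentioned is easy to demonstrate: a record whose combination of quasi-identifiers is unique can be re-identified even with names removed. A minimal k-anonymity check, with entirely made-up data (this function is my own illustration, not anything from the report):

```python
from collections import Counter

def k_anonymity(records, quasi_identifiers):
    """Smallest group size over all quasi-identifier combinations.
    k == 1 means at least one record is unique, hence re-identifiable."""
    combos = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(combos.values())

rows = [
    {"pincode": "560001", "age_band": "30-39", "gender": "F"},
    {"pincode": "560001", "age_band": "30-39", "gender": "F"},
    {"pincode": "560002", "age_band": "60-69", "gender": "M"},  # unique combo
]
print(k_anonymity(rows, ["pincode", "age_band", "gender"]))  # 1
```

A k of 1 here flags that the third record stands alone in its pincode/age/gender group, which is exactly the situation in which "anonymized" data can slip back into being personal data.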
However, in version two, there is a recommendation to provide data principals an option to opt out of data anonymization. Data custodians are required to provide a notice to the user at the time of collecting their personal data and also offer them an option to opt out of data anonymization. Effectively, users can choose which framework they would like to be under the purview of: the Personal Data Protection Bill or the non-personal data regulatory framework. So I shall conclude now with the key changes between the two versions of the report. One is the explicit removal of business-to-business non-personal data sharing. The committee has recognized that such sharing already happens and that there is no need for the framework to bring in recommendations to adjudicate it. However, with the setting up of HVDs, there are some concerns regarding the loss of competitive advantage, or whether a data business would have to comply at the risk of hurting its own proprietary information. Next is the creation of high value data sets. This is a concept that did not exist in version one of the report, and it's been fleshed out really well in the updated report. Third is the opt-out option from anonymization, as I just referred to on the previous slide. Fourth is the recognition of data processors. The committee recognizes that data processors do not really have independent access to the data of their customers, which are other data businesses, and hence they are not required to share the metadata of their customers, that is, data custodians, with the NPDA or put it in a high value data set. And lastly, but I think most significantly, is the recommendation to create a legislation, a law that would govern non-personal data and everything to do with non-personal data in India, since there is no such comparable regulation or law anywhere in the world right now. So that's where I end.

So, my name is Sochna.
I'm the founder of the Mindful AI Lab. We are a technically focused AI ethics audit and AI ethics advisory consultancy based out of Bangalore, and we help organizations craft their responsible AI strategy at all stages of their data maturity journey. So that's a little bit about us. While I have a whole set of concerns, as Rishu said, I'm speaking both on behalf of my organization and on behalf of myself as a civil society stakeholder, and I'll try and keep those two points of view separate. First off, I want to set the context quickly around the idea of consent and why this is something that concerns me with respect to the second version of NPD. As Rishu just told us, there is going to be an option to opt out of data anonymization, and there are a couple of concerns that I want to bring to the forefront here around this idea of consent being the sacred guardian of privacy. I think most of the panelists here would probably agree that that's not the case: consent is broken in so many different ways that it is a very strong assumption that the average citizen is going to be able to give meaningful consent to the way their data is shared, de-anonymized, consumed, or has inferences made from it. If the intention of the government is to offer genuine privacy protection, then this idea of anonymization, as well as consent, really needs to be examined very closely. So I want to take a minute here to talk about the ways that consent is problematic. In the first place, consent is networked: if a bunch of people who are very similar to me in a lot of different respects choose to share their data and I don't, then based on those similarities a lot of inferences can be made about me that I would rather not have made.
So the fact that I withheld my consent is no longer as meaningful, given that other people similar to me have given consent for their data to be used. Then there is the idea that meaningful consent itself is impossible. I'm a data scientist; when I look at a column of data, I think I have a very good idea of the ways in which that data can add economic value, and even then it's not possible for me, despite my technical expertise, to foresee all of the different ways that this data can be used. So when I give my consent to this data being used, it should ideally always be time-limited and purpose-limited. What I mean by that is that I cannot foresee all of the ways my data might be used, so when I give consent, it's for a specific purpose. And with the creation of high value data sets, once a high value data set has been created and there are other parties consuming or making inferences from that data, the set of purposes for which I gave consent is exceeded, and the power of my consent essentially gets eroded. So those are a couple of ideas that I wanted to surface as a civil society stakeholder in the context of this whole opting out of anonymization idea. In terms of protection, maybe there need to be stricter warnings, in the spirit of "tobacco is injurious to your health": in the same spirit, "do not let your data be de-anonymized". That's just a thought I'm putting out there for all of us to consider, but definitely the idea of consent as a sufficient protection for data and privacy is actually quite weak. The next thing I want to talk about, and here I speak also on behalf of the Mindful AI Lab, is the idea of auditability: the ability for the practices and processes around data to be audited, and for there to be audit trails.
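The "networked consent" problem described above can be illustrated with a toy nearest-neighbour sketch: an undisclosed attribute of a person who withheld consent is inferred from similar people who did consent. The data, labels and function names are entirely hypothetical:

```python
def infer_attribute(consented, target, k=3):
    """Predict an undisclosed attribute of `target` by majority vote
    over the k most similar people who did share their data."""
    def dist(a, b):
        # squared Euclidean distance between feature vectors
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = sorted(consented, key=lambda p: dist(p["features"], target))[:k]
    votes = [p["label"] for p in nearest]
    return max(set(votes), key=votes.count)

# People similar to the non-consenting user shared both data and labels...
shared = [
    {"features": (1.0, 2.0), "label": "high_income"},
    {"features": (1.1, 2.1), "label": "high_income"},
    {"features": (9.0, 9.0), "label": "low_income"},
]
# ...so their consent effectively reveals the withholder's likely label.
print(infer_attribute(shared, (1.05, 2.05)))  # high_income
```

This is the whole point of the objection: the withholder never shared anything, yet their likely attribute falls out of other people's consent.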
I want to call out that the second version of NPD mentions the word audit in just two instances, and one of them is in the context of global regulatory frameworks. The idea of auditability is incredibly important for us to have any kind of functioning accountability and oversight mechanisms, yet the report's idea of audits is in the context of organizations doing internal or self audits, if you will, and this is a very problematic idea. If we take a step back and look at the ways that, in the last three or four years particularly, we have seen instances of machine learning bias being uncovered, these have almost always come from independent third party investigations: journalists, researchers in academia. For those of you who are not familiar with the idea of bias in machine learning algorithms, I'm going to take a quick moment to give you some examples. There are instances where machine learning algorithms have proven to be biased because of people's race, because of the color of their skin, or because of their gender. Probably the most famous instance is IBM's facial recognition software, which was very accurate for white men, slightly less accurate for white women, substantially less accurate for black people in general, and even less accurate for black women, and so on and so forth. So there are many, many ways that potential biases can creep into machine learning algorithms, both because the underlying data used for training them could be biased and unrepresentative of the true communities that the data is drawn from, but also because the algorithm itself might learn biased representations despite the training data being fairly balanced.
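The kind of disparity uncovered in those facial recognition audits comes down to a simple measurement: model accuracy broken down by demographic group. A minimal sketch of that audit metric, with made-up data and my own function name:

```python
def per_group_accuracy(predictions, labels, groups):
    """Accuracy of a model broken down by demographic group --
    the basic measurement behind third party bias audits."""
    totals, correct = {}, {}
    for pred, label, group in zip(predictions, labels, groups):
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (pred == label)
    return {g: correct[g] / totals[g] for g in totals}

preds = [1, 1, 0, 1, 0, 0]
truth = [1, 1, 0, 0, 1, 1]
group = ["A", "A", "A", "B", "B", "B"]
print(per_group_accuracy(preds, truth, group))
# {'A': 1.0, 'B': 0.0} -- a large gap between groups is the audit red flag
```

An internal-only audit regime means nobody outside the organization ever runs even this simple breakdown on the deployed model, which is the gap being flagged here.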
What this means in practice is that there needs to be a process for organizations both to do internal audits and uncover biases like this, and for trusted third parties to be able to come in and audit them. The second version of NPD does not place sufficient emphasis on the practices, the policies, or the accountability mechanisms that would ensure such audits happen and that organizations follow the best practices required of them in order to protect privacy. So this whole aspect of auditability, and the lack of emphasis on it, is something that's deeply troubling from our point of view. The second thing I want to talk about is the idea of accountability and oversight mechanisms that need to be baked in. Both the first and the second versions of NPD do talk about protecting community rights in the data, but the protection of rights does not happen without an accountability mechanism, without an oversight mechanism, and there isn't clarity around how these mechanisms are going to be institutionalized or operationalized. This is a big gap, and it again troubles us deeply as an organization. There's also the related idea of co-designing accountability mechanisms rather than externally imposing them: in our view, the best oversight or accountability mechanisms are those that are co-designed by all stakeholders in the process, and not just by a narrow set of stakeholders. Again, NPD version two does not do justice to any of these ideas. The third and last thing that I wanted to talk about is this idea of operationalizing community rights.
I do have some interesting examples from across the world where people are trying to do this. For example, cities like Amsterdam and Helsinki have a public AI register, which is a way for civil society stakeholders to participate in an ongoing process of reviewing the different AI applications and machine learning models deployed by the government in various contexts. These kinds of public AI registers serve as the beginning of a springboard for an oversight or accountability mechanism. So it would be good to see ideas like this, public AI registers or equivalent ideas, around how we can ensure that civil society stakeholders can participate in a meaningful way. It's one thing to have feedback from technical experts or a consultation process with them, but once the law goes into effect, high value data sets become available for consumption, and people start actually building applications on top of this, there is going to have to be an ongoing audit or review or oversight process. So I think that's something that, as a community and as industry stakeholders, we should all think about very carefully. Those are pretty much the ideas that I wanted to share, and I would absolutely love to hear thoughts from the audience or the panelists about any of these ideas. Questions are most welcome.

Arvind here. I have a question for Sochna, or rather a point to make. I like what you said about the need for audits, and I really believe that should be strongly governed in the regulatory framework. I'm also thinking maybe there should be a wider ambit for the audit, including ethical AI. There are a lot of ways data can be used, and we are running a kind of social experiment now into which many of us have not opted.
And we are realizing that a lot of the results are probably not what is actually good for the public. Some changes that happen are good privately, but then you see that over a period of time this is actually setting us back. So I believe there should also be something to figure out whether there is a net positive for society from a certain set of data collection. What's your view in general about audits, and about the ambit of what all gets audited?

Absolutely, Arvind, thank you so much for flagging that. I would love to call out the distinction between a few different kinds of audits that would be relevant in this context. First, there are public interest algorithm audits of the kind that I talked about earlier, which uncover potential sources of bias in machine learning algorithms from a public interest point of view. Then the other kind of audit is from a compliance perspective; you could think of it as more of an adversarial audit where, much like in the financial sector, for example, you have an authorized set of agencies or organizations that evaluate and audit you for compliance. And then there is also a middle ground of trusted third party audits for organizations that want to be proactive about making their AI and machine learning practices more responsible; that's where they come in, and it's both a technical and an ethical audit. This brings up another thing that's very close to my heart, which is that there isn't a dichotomy between ethical practices and technical best practices.
There is a point where failure to follow technical best practices can be considered negligent, bordering on criminal, depending on the gravity of the situation. And a lot of what we talk about as ethical best practices for AI are really about adopting technical best practices and solutions that have already been proposed in the literature or adopted by the rest of the industry. So it's a question of figuring out how you pull your organization closer to those best practice benchmarks. And so I think audits can help in all three of these different ways, to your point, Arvind.