The first question I really wanted to invite those of you who would like to jump in to share your thoughts on is to go back to basics and think about what, in the context of each of your companies, is really non-personal data, because we know there are two things: there are anonymized data sets, and then there are data sets that never had anything to do with personal information in the first place. So I'm happy for anyone to jump in, or I'm happy to call on one of you, but the first question for discussion is: what proportion of your data sets do you think fit this description, and what kinds of data are you thinking of as non-personal data? So happy for you to jump in, otherwise I'll invite one of you.

Our case is very peculiar.

Can I just quickly ask you all to introduce yourselves as you speak, because we wanted to save time and not do introductions upfront.

Yeah, sure, sure. I'm Kailash, I head technology at Zerodha. We're a stockbroking firm. So I was saying that our case is quite peculiar because we are heavily regulated, individually and separately, by multiple entities. And NPD is, at least for us, a bit vague, because all our data is sensitive financial data tied to people and identities. All the data that we collect, every single transaction that we generate, is personal and connected to people. Apart from that, there could be certain inferred data generated from this transactional data, investment patterns or financial habits, etc., which we don't do right now. So in our case specifically, we don't really have any NPD. Every bit of data that we have is highly sensitive and personal.

Chaitanya, I don't know if you want to jump in.

Yeah, I can jump in now. Hi everyone, I'm Chaitanya, Chief Innovation Officer at Ozonetel Systems. We are a cloud telephony company. So I think we're in the same boat as Kailash, being in the telephony space.
We are also very heavily regulated. We have TRAI, DoT, and so on; everybody wants to control call data, and I think there are enough rules out there already protecting call data and call detail records, for example. If you want to find out information about a particular call, you have to go and get a warrant, and then we have to give out the data. We also have rules saying that we have to store the data for almost a year, and depending on the vertical, healthcare call centers have certain requirements and financial call centers have other requirements. So in that way we are in a similar boat to Kailash, in the sense that every call is personal and every piece of call data that we store is personal.

But if you want to look at the non-personal data, the kind of data we can provide is call volumes, peak call hours, call drops that happen, and what kind of call drops happen. So there are multiple things which can be done, and we have started doing some of that for our own internal purposes. For example, to track that at six o'clock, Airtel-to-Reliance calls drop in Bangalore; or let's say you've been trying to call from Vodafone to Airtel and you get a call drop. We can predict, to a certain level based on previous history, what kind of drops can happen. That, I would say, comes under non-personal data. But most of the data we store comes under personal.

I'm going to quickly ask whether Kranti, Tanmay and Srujana want to jump in. We're just on our first point of trying to identify what actually falls under this regulation; practically speaking, it seems more applicable to certain companies than others. But yeah, Srujana, Tanmay or Kranti, did any of you want to weigh in?

Hey folks, this is Tanmay. I'm the founder and CEO at Hasura. Hasura is a cloud infrastructure company.
So that means we're kind of similar to the way you might think about AWS or Akamai. We operate mainly in the US, though we do have customers in India, so this intended regulation is interesting to us. I think it's very confusing for us, because we do of course have a certain amount of our own data, but there's a lot of data we hold that actually belongs to our customers. When it comes to cloud infrastructure, it's really hard to understand what is PII data and what is non-PII data. For example, the configuration of your infrastructure: is that NPD, is that personal data? It's very, very confusing. It's sort of anonymized, but it's still somebody's configuration. Imagine GitHub, say, holding somebody's source code; all of the data you have is other people's source code. Now, what kind of data is that? So in that sense it's, and I'm putting it mildly, a little bit confusing. So yeah, that's me.

This is Kranti. I'm AVP Data Science at Swiggy. We collect a lot of data, from transactions to location, and try to derive insights from it or make the customer experience better. If even a single identifier exists, even if the data is anonymized, I am not sure how it can be non-personal; at least, I'm not able to fathom that. It's very easy to reconstruct. Even forget identifiers: if location data exists, I think it's easily reconstructable. So I'm at a loss. Except for very high-level aggregate data, when does it make sense? But then it begs the question: let's say I'm actually sharing data on where people like to eat or where people like to order from. Isn't that violating the privacy of the restaurant entity, who is a different party altogether? And this doesn't even cover three-party platforms, right?
It only talks about customers and customer privacy, but then there is enterprise privacy, and we are essentially answerable to them too. So I don't know how that comes into play. And to top all of that, it's a competitive world out there. Let's say we decide to make even something as simple as volumes public, and you're not a traded company yet; the implications can be pretty drastic for investment or anything else. So yeah, frankly, I don't understand how this is going to go forward.

Thanks, Kranti. And I'm going to quickly ask Srujana to jump in as well, if you're back, Srujana; given your vantage point, both at Google and with Wadhwani, did you have anything to share? I think Srujana may be having some audio trouble and is trying to sort it out right now.

Maybe you can carry on and then come back to Srujana when she's back on.

We'll go ahead and do that. So, first of all, speaking as the moderator of the debate and a non-techie, this is a great kind of confirmation for someone who's been reading the report. When it came out, the big question was really: what's left? What does this actually cover? The good thing is that the report is fairly clear when it talks about mixed data sets, because it says that where PII and non-personally identifiable information are intricately connected, it falls under the Personal Data Protection Bill, which makes sense. And then, to the point about inferred data that Kailash was talking about, we know that inferred data is also clearly personally identifiable information, so that's covered off. What it does leave is the kind of volume data we were talking about, the kind that Ozonetel might be producing: basically business intelligence. And again, there the report, at the risk of educating the audience, talks about intellectual property, and business intelligence is intellectual property.
So again, we have that figured out, regulation-wise. So really, the question is: if IP law deals with it, personal data law deals with it, and then you have competition regulations to deal with people abusing and hoarding data, what is really the objective here? That, I think, is the first open question; I've spoken to some committee members, and we need to wait for the second version of the report.

The second thing it opens up is this: even if the majority of your data is PII, it seems like when that data becomes anonymized, you become subject to this regulation. That's odd, because the whole point of personal data laws is that you anonymize the data, you improve customer privacy, and then you can use that data for innovation. And this is where I wanted to ask all of you about anonymization, because clearly, when you're holding this kind of personal data, there are particular types of anonymization you're implementing. If it's not too difficult to talk about specifics: is it database-level anonymization, or are you doing some kind of credential management? Maybe first we can talk about how you technically handle anonymization, and then come back to how, depending on the kind of anonymization, this framework would hit you. Again, I'm opening it up for discussion, but happy to call on anyone in case we have coordination problems.

Yeah, I can go first on this. On the type of anonymization: we generally don't anonymize by default. What we do is take the telephone lines from the telecom operators and provide the software to the customers, so the customers own the data; it's protected at the database level itself, and we can't access the customers' data.
But some of the customers ask us to anonymize on their behalf while storing it in the database itself; they don't want, let's say, some of the numbers that are called to be visible. We handle that using masking: we just mask the numbers. Other examples: credit card numbers have to be masked as per PCI DSS, and certain rules like that already take care of a lot of these issues. So those are the things we manage, but our anonymization is mostly at a customer level: if the customer asks us, we anonymize. We generally do it with masking. It's not a very high-tech thing; there are well-known masking procedures and algorithms, and we mask in such a way that the original can be recovered if needed.

The other thing we do is anonymize through aggregation. If it's aggregate data, then pinpointing at an individual level is hard. For example, we might report that 20% call drops happened. That's an aggregate; we don't store the information that person X called person Y and that call dropped. We aggregate and store only the aggregate information. So aggregation and masking are the two ways, and we generally do it at the customer level.

I can go next. From our point of view, again as a cloud infrastructure vendor, we don't anonymize data; there's no use case for it. We do encrypt data, of course, when it's stored, but anonymizing or masking is not really something we have a use case for.
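The masking and aggregation techniques described in this exchange might look something like the minimal sketch below. The masking format, function names, and record shape are my assumptions for illustration, not any panelist's actual implementation:

```python
# Sketch of two anonymization techniques mentioned on the panel:
# field masking (reversible if you keep a lookup) and aggregation.

def mask_number(number: str, visible: int = 4) -> str:
    """Mask all but the last `visible` digits of a phone or card number."""
    return "X" * (len(number) - visible) + number[-visible:]

def call_drop_rate(call_records: list) -> float:
    """Aggregate statistic: percentage of dropped calls.

    Only the aggregate is kept; no caller-level detail survives.
    """
    if not call_records:
        return 0.0
    dropped = sum(1 for r in call_records if r["dropped"])
    return 100.0 * dropped / len(call_records)

records = [
    {"caller": "9876543210", "dropped": True},
    {"caller": "9123456780", "dropped": False},
]
print(mask_number("9876543210"))  # XXXXXX3210
print(call_drop_rate(records))    # 50.0
```

The point of the aggregation path is that the stored output ("20% of calls dropped") cannot be traced back to a caller, whereas masking keeps per-record rows but hides the identifying field.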
When we use data from our customers for our internal analytics, product design or infrastructure purposes, for example to decide how to handle people's loads in a particular way, we make the data available internally to the company in an anonymized way, so that the data is protected from a customer-protection point of view. I would consider that critical business intelligence, which is of course something we can't share. Technically, where the anonymization happens is in some kind of an ETL process: data goes from a source of truth to an anonymized sink, and the anonymization is done en route. So that's how it happens for us, technically.

I can go next. I think the question itself, and the way anonymized data and anonymization within the context of companies is referenced in the paper, is very ambiguous. It makes it sound like anonymizing data, processing it and mining it is a standard practice. I don't think so. We don't anonymize any data, because it doesn't fit into the framework of our regulated existence. Let's say there's a database with sensitive data, and the risk management team needs some aggregate statistics on which financial instruments seem slightly more risky, or whether there's some sort of anomalous activity. It could just be a simple report, maybe 10 numbers, generated out of a database with a billion rows. There's no concept of anonymization there; you don't copy data into an anonymous data set and then run queries.
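The two patterns described here, anonymizing en route in an ETL job versus deriving aggregates directly from the source database so row-level data never leaves the silo, can be sketched roughly as follows. The table, columns and values are hypothetical:

```python
import sqlite3

# Hypothetical source-of-truth table holding sensitive rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (user_id TEXT, instrument TEXT, amount REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", [
    ("u1", "NIFTY", 1000.0),
    ("u2", "NIFTY", 3000.0),
    ("u3", "GOLD", 500.0),
])

# Query-in-place style: the aggregate report (a handful of numbers)
# is computed inside the regulated silo; no row-level export happens.
report = conn.execute(
    "SELECT instrument, COUNT(*), AVG(amount) FROM trades "
    "GROUP BY instrument ORDER BY instrument"
).fetchall()
print(report)  # [('GOLD', 1, 500.0), ('NIFTY', 2, 2000.0)]

# ETL style: rows are copied to an anonymized sink for internal
# analytics, with the identifier stripped en route.
sink = [(instrument, amount)
        for _, instrument, amount in conn.execute("SELECT * FROM trades")]
print(sink)  # [('NIFTY', 1000.0), ('NIFTY', 3000.0), ('GOLD', 500.0)]
```

The design difference matters for the NPD debate: in the first pattern no anonymized data set ever exists, while in the second one an anonymized copy does exist and could, in principle, be subject to a sharing mandate.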
I think it's more relevant in the case of companies where data is mined and passed around to external parties, external silos, not really within companies. Or maybe there are companies so large that there are divisions walled off from each other. But for any reasonably medium-sized to large company, it's very ambiguous, and it's not a binary "do you anonymize or not". So, like I said, we don't, because the statistics we need can be derived in real time from the source database, and no data gets passed around outside the silos that are regulated.

Just to add to that, I think the main confusion here is that whatever the use case, whatever the technical implementation of anonymization is, the data that is anonymized and used is business data, which is IP. That is not data that can be shared with other people, because then what are you doing as a company? So that, I think, is another layer of confusion on top of this.

Yeah, and just to tease that out a little and make sense of how this report is dealing with it: very honestly, even the Personal Data Protection Bill, which in any case will apply to all of your companies, treats anonymized data as the boundary for when you have certain obligations. So I have two questions, and happy to hear thoughts from whoever would like to jump in. The first one really is: is this a distinction between data that is in flow and data that has actually been downloaded onto, or is sitting on, your systems? Is that the distinction, when you say you're able to compile this information directly whenever you need to?
That's more of a yes-or-no, just to clarify for the audience. But I think the deeper question then becomes: is it that all of those processes are not, to use a terrible phrase, hard-coding anonymization by default right now because there is no regulatory requirement currently in India? Because I know in other countries, for instance, if you touch any PII at all, and it gets downloaded through your data flows into your systems, there is an automatic requirement for anonymization. So is that just a factor of regulation? That's the first question.

Sorry, that was about the technicality of anonymization, wasn't it?

I was just asking whether it's because it's data in transit, and that's why you don't think about it.

Not necessarily. It's not because certain things are regulated a certain way; it's just how things have evolved organically. We've built our systems in such a way that no personal data has to be downloaded onto anyone's systems, even within the company. All these aggregates, reporting and so on happen on a dashboard, and nobody really gets access to the underlying sensitive data. That's how we've engineered it, and I'm pretty sure that's how most companies do it; you don't really have to download an Excel sheet and do analysis.

The second bit is connected to this. It's not because there's a lack of regulation. In fact, for us there is a 54-point circular from our markets regulator which indirectly touches upon these things, so we are already compliant. It's just because of how we've structured it: there is no need to pass sensitive data out of the system via API or file download. So yeah, we didn't have to build it.
Yeah, I'd also like to add to that first point. As a data infrastructure company, this concept of data at rest and data in transit is something where the lines are blurring, which is interesting. When you say data in transit, do you literally mean bytes on the wire? Data in transit is becoming a concept that's increasingly hard to define: data in a queue is maybe data in transit, but it's also data at rest, and it's also data that can be queried while it's in the queue, while it's in transit. So the notion of data in transit versus data at rest is not necessarily useful when we think about an organization, because the lines are getting blurred.

That's one thing. The other thing, just to echo the previous point, is that the use cases for anonymization are not very large; they're very few, beyond perhaps internal sharing and things like that. And for a non-technical audience, if you haven't thought about it: data protection and anonymization are somewhat independent things. Data protection is when you want to protect the data itself: the way it's accessed, the way it's encrypted, who gets access to it and so on. Anonymization is a separate thing that happens when you make parts of that data available to other stakeholders. So, to basically frame it broadly as "anonymization will protect data".
That's not a useful statement, because nobody is actually anonymizing data in that sense. If you think about just one particular business function and no other functions outside it, or a small startup, or a very vertical function, then there's no anonymization happening in the first place, but there's a lot of data protection happening. So anonymization for security, anonymization for protecting customers or their data, and anonymization for adhering to regulatory requirements: there's a little bit of conflation that could happen there, and I just wanted to point that out.

Yeah, and I'll just give Kranti or Chaitanya an opportunity to jump in in case you wanted to, but I did want to take that point forward. Did you want to share anything?

So, the personal data, especially for the customers as well as the delivery executives and so on, we do anonymize. Think of it this way: any data which is accessible by a system or a person and is not customer-facing is essentially anonymized on the key attributes traditionally thought of as private, which include name, phone number and so on. So nobody can actually visually see that data. That does exist, and I think it does bring in some degree of comfort, some basic privacy which is required, because these are powerful data elements and nobody should have a view of them. So those are at play already. To answer the other question, whether it's at a database level or a record level: I believe it is at a database level, but I haven't worked on that personally. The other question you asked, I couldn't quite grasp, so I'll refrain from answering it.

I am also with Kailash here.
Most of the data we don't download; there's no need for us to download it into Excel and deal with it. It's just dashboards that we look at. But there is one small difference. Some companies, take Swiggy, deal with customers directly: if I interact with Swiggy, that's a relationship between me and Swiggy. But for us, no customer deals with me directly. If Swiggy is my customer, then I deal only with Swiggy. The calls that happen, that data belongs to Swiggy, and it's up to Swiggy to define their anonymization techniques. All I need to do is make sure that if Swiggy is my customer, BigBasket is my customer, XYZ is my customer, then my engineers wouldn't know which data belongs to whom. That's pretty much it.

Yeah, and you know, it's always interesting for me to have these conversations, because it shows the disconnected mental models, and I think that's why it's important to have them. I do think the level of mistrust we're seeing currently is probably because of that disconnect: people don't know whether that's genuinely the way it is, or whether they're just being told that's the way it is because it's convenient. To bring it back to the two points I wanted to respond to and take forward: one is this idea that two of you are B2B businesses in some sense. That's a very different kind of conception, and under the personal data bill you would think of yourselves as data processors, not data fiduciaries. You're acting as a third party on behalf of someone else, and obviously that means your liability and risk management measures change; we all know that.
But nowhere are we conflating anonymization and data protection: clearly there's a range of cryptographic techniques, anonymization is one within that arsenal, and then you have data security; we're not getting into that wider frame. I think the part where data protection and non-personal data intersect is this idea of anonymization, purely because the mental model there is: when you capture information, at that point you can perform some techniques to anonymize it; you could pseudonymize it or mask it, things like that. The really cutting-edge position, the people pushing the privacy envelope as citizen advocates, would say that the minute the data hits your system, even at the level of credentials, it's all anonymized. And potentially it's possible that's going to cost you a lot more money, but we'll get to that later.

To take it forward from that point: the danger now is that if you're dealing with data that is not at all anonymized, just personal data moving around, then very clearly you don't have to care about the NPD framework. And I think that's where this gets complicated, because now you kind of have the option of going under one framework or the other, and as a company you would take a call: would I rather anonymize and bear these costs, or not anonymize? That's the next part of the discussion I wanted us to get to. But before we get there, I think we might have Srujana back. I don't know if you can hear us at all.

Thanks so much, and apologies to everyone; I should have checked, I didn't want to disrupt the flow. But, you know, really glad to be here, and I just got in a few minutes back.
Yeah, and I'd love to hear your perspective, because we did want to grill you both as Google and as Wadhwani; you're being very civil at this point in time. Very quickly, the two things we were talking about, and it's a great time to have you in the conversation: the first part of the conversation, to summarize, is that a lot of the information people are working with is PII, personally identifiable information. So then the question is what is left over for an NPD framework, and the two things that stood out for us are really business intelligence, analytics-type data, aggregate intelligence, which people really felt is intellectual property. I don't know if you have a different view based on what you see. Do you see kinds of data where you think, okay, this is not IP, this is not personal data, this is something else?

Right, so I'd definitely like to chime in; thanks so much, Malavika, for teeing this up for me. So, just by design, a lot of data of interest, like you called out, pertains to objects that are relevant to society, and given that, they are bound to some real humans who are connected to the data. What you could call the non-personal aspect really depends on the specificity of those associations. Even though you mentioned that aggregates are non-personal, take something like aggregate case counts: they have been derived from some real health conditions and real people, and the narrower the scope, the more identifying it becomes, and the externalities are very localized. And the same thing goes for something you'd think of as very non-personal, like the water table in an area.
It might seem like pure NPD, but it's also associated with some real humans, and the smaller the focus, the more personal it would be. Even if, at the point of capture, the data set was not connected to humans, there is in fact a real-world link, and there are those kinds of impacts. What I'm trying to say is that this whole notion of NPD, like you called out, lies on a spectrum. At one end of the specificity spectrum is data pertaining to universal physical phenomena: astronomy, chemical reactions and so on. But, as the other panelists have pointed out, most of what we deal with in businesses and the corporate world is data that's closely tied to human activity. So even aggregates are dangerous, dangerous in the sense that they do have impact, so we have to deal with them carefully. Given that, I do feel that taking NPD apart as a separate thing is probably not the ideal thing to do.

That's one of the aspects. I was also following the conversation you had with Kranti and others on the anonymization bit. Here, I'm interpreting anonymization as any kind of privacy-safe transformation: this could be aggregation, some kind of sub-selection, perturbation of data, and so on. I probably can't speak about the specifics of the organizations I work for, but as others have called out, folks tend to comply to the minimal extent required for legal compliance, which is often just removal of the PII fields. Some big MNCs do take it more seriously, because of the repercussions to their own businesses; they don't want anyone stealing their users' data, so they are a lot more careful about how they safeguard it.
And again, here I just wanted to point out that there are multiple ways to quantify privacy: these notions of k-anonymity, l-diversity and, like you called out, t-closeness. All of these do permit attacks, and even the more robust information-theoretic approaches, such as differential privacy, degrade in the presence of secondary data sets, correlated data sets outside, which can impact your notion of privacy. I've worked on a lot of these projects, which involve linking records, and pretty much, if you actually have access to a data set, even without the PII-type fields, there's a huge risk of de-anonymization or re-identification.

The other thing I wanted to call out is that it's not just the extreme case, where you identify a person and the specific information pertaining to that person, that we need to worry about. There is a negative impact and a cost associated even with just narrowing the scope. Say someone were to learn that women in a particular age group in a particular colony have, say, an 80% cancer prevalence. No one is being personally identified there, but there is a risk associated with it. I don't want to go on too long, but I just feel we need some notion of a rating, or a risk card, for data, and people need to be more informed about those things.

Thank you so much. I'm really glad we got you back, because I think you've done two or three fantastic things which I couldn't do myself, not being a technologist on the panel.
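The differential privacy mentioned here has a standard textbook construction, the Laplace mechanism, which makes the privacy/utility trade-off concrete. The sketch below is that generic construction under my own assumptions, not any specific production system:

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Draw Laplace(0, scale) noise via inverse-CDF sampling."""
    u = random.random() - 0.5  # uniform on [-0.5, 0.5)
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def dp_count(true_count: int, epsilon: float) -> float:
    """Epsilon-differentially-private count query.

    A count query has sensitivity 1 (one person changes the count by
    at most 1), so the Laplace noise scale is 1/epsilon. Smaller
    epsilon means more privacy but a noisier, less useful answer.
    """
    return true_count + laplace_noise(1.0 / epsilon)

random.seed(0)
true_count = 80  # e.g. case count in a small cohort
print(dp_count(true_count, epsilon=1.0))   # close to 80
print(dp_count(true_count, epsilon=0.01))  # could be wildly off
```

Note that, as the panelist says, the epsilon guarantee only bounds what an attacker learns from this release in isolation; correlated side datasets can still erode the effective protection.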
So the first thing I will flag, for those of us who might have missed it, is that we actually see a level of disagreement on the panel about whether anonymization is relevant or not. If I look at that as a spectrum, I'm hearing voices that say it's not relevant at all because the data never touches their systems. But then there are people who are seriously saying that even if it doesn't touch your system, even if the tree falls in the forest and no one sees it, it's still fallen in the forest, to use a maybe inappropriate analogy. So I guess the question is really: where on the spectrum of anonymization or pseudonymization are the basic techniques people are actually following when they set up their systems, even if they're providing a service to a third party and not themselves downloading or looking at that data? Are there some kind of basic guardrails? I'm going to park that question there, because I think it falls very much within the realm of a Personal Data Protection Bill type of question; maybe we can all have a separate fight about that later.

But I actually wanted to pick up on something Srujana touched on, which is that even aggregates, even metadata, have this kind of privacy-denuding implication. For those of you who don't know, there's a famous Princeton Review case in the US that everyone talked about a few years ago, where they were targeting areas where Asian families lived to sell Princeton Review material. It did not profile a single person, but the logic was essentially: this is an Asian neighbourhood, we'll just price it $20 higher, because we know how much they care about getting into Princeton, or whatever. And that's a classic community-level harm.
What that opens up for this discussion, and I'd love to open it up to all the panelists, is this. One thing the report does say is that it's going to ask companies to share metadata. So leave anonymization aside: where there is metadata, it's going to allow one to set up something like a metadata directory. I'm curious what you think about that. And the second part is that it would allow certain companies to access other companies' metadata. We've touched on the privacy risks; I'm very curious to hear, and this is the first question I'll ask all of you: what do you think are the risks that come from sharing metadata? Are there any risks, and would you do it, given the metadata you're currently sitting on? Maybe I'll start with Srujana, because she's not been with us for the first half, and then open it up to anyone else who wants to jump in.

Thanks, Malavika, I'll keep this short. I would just say at the outset that any information is power; it's of value. And the line between metadata and data is very fuzzy. As a technologist, as an engineer, the same setup can be coded into the fields or coded in key-value pairs; there are a lot of choices we have to make even when coming up with a data model. So metadata is also information. Sharing metadata will reveal a lot of an organization's proprietary assets: not the actual data itself, but it can reveal a lot about the organization's processes and tools, and it can also be used to streamline attacks on the organization. For instance, if you know that a particular hospital is capturing certain biometrics, there's a lot you could do with that.

Also, I'd just add at this point that metadata is probably IP, right? I would consider all our metadata to be IP.
I'm not sure I can imagine what metadata would not be IP; maybe the use cases are far too narrow. Actually, that's an open question. Just to quickly answer the legal question, in case I can add value: it's on the fence, not all metadata is considered IP. But back to anyone else who wants to share. I would agree with that. First of all, who would decide which metadata is IP and which is not? Would a lawyer come and sit in our office, look at the metadata, and make a call that, hey, this metadata is not IP, man, you just need to open it up? How do you do that? And I could just claim, this is my metadata, I'm only storing these three columns — how would that be verified? Yeah, we'll be in courts all day long, basically. I will lawyer up and battle for every single piece of my metadata being IP. Absolutely. Every company would lawyer up. I agree with you. There's no end to it, and no benefit from it, in that case. For sure. If we look at any of our processes, even just the things we capture to make our service better — that's metadata in my view. And why would I open that up? How is it going to help the ultimate goal of doing good for society, even if that goal could be proved? What's the next step? I'm completely flabbergasted here, completely. I plus-one the flabbergastedness. I agree metadata can reveal a lot of sensitive things, and I think what is even more troubling is that this is meant to be machine readable. I think that's what the paper says: a machine-readable directory that all businesses publish.
Imagine a malicious actor who just writes a bot that goes and crawls all of these so-called metadata directories and collates valuable business information. They'll know that this company stores biometric info, that company stores something else. It adds a huge attack surface to every single company plugged into the NPD framework, on top of everything we just talked about. And not to mention the attack vector from marketing people: once your metadata is open, the marketing and sales guys start bombarding you with email — "I have this data, man, just buy it from us." BuiltWith, for example, tracks information about what your website is built on and then sells that information, saying this particular business is using XYZ software, and that mailing list is used to bombard people with "buy this software" pitches. So that's one way metadata is already being exploited. And then, first of all, maintaining metadata is not an easy task. There are companies coming up with tooling for metadata management itself. Especially if your data is growing at a very fast pace and evolving, maintenance itself is a headache. And if the data changes, your metadata changes, and now you're in the world of metadata versioning. You're making it machine readable and it changes every day — how does anybody derive value from it? So just speaking from a practical aspect: it's a time sink, it's a security risk, and it's not maintainable. There may be parts of the data that could be useful for the larger economy and larger social good, but providing metadata for all the data you have is simply not maintainable.
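The crawling worry above is easy to make concrete. This is a minimal sketch, assuming a hypothetical machine-readable directory format; the company names, field names, and sensitivity mapping are all invented for illustration — the point is only that a bare field list already implies what sensitive data a business holds.

```python
# Hypothetical machine-readable metadata directory entries, as a crawler
# might collect them. Even "just three columns" leaks business intelligence.
directories = [
    {"company": "ExampleHospital", "fields": ["patient_id", "retina_scan", "visit_ts"]},
    {"company": "ExampleTelco", "fields": ["call_id", "duration", "cell_tower"]},
]

# Invented mapping from field names to the sensitive category they imply.
SENSITIVE_HINTS = {"retina_scan": "biometrics", "cell_tower": "location"}

def profile(entries):
    """Map each company to the sensitive data categories its schema implies."""
    out = {}
    for entry in entries:
        hits = sorted({SENSITIVE_HINTS[f] for f in entry["fields"] if f in SENSITIVE_HINTS})
        if hits:
            out[entry["company"]] = hits
    return out
```

Running `profile(directories)` flags the hospital as holding biometrics and the telco as holding location data, without ever touching the underlying records — which is exactly the attack-surface and sales-targeting concern the panelists raise.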
Well, it's fascinating to hear you talk about this, because that was exactly my understanding. We don't think all metadata is IP, but it is the company's intelligence. Currently that's not settled; in the US, people expect this to develop through litigation, one company against another. And I think that's why this framework is so significant. The people writing this report are based in Bangalore, for those of you who don't know, and the report is trying to do the work of that lawyer. There's a section — Section 7, I think. Can you hear me okay? Sorry, my internet connection. Thank you. Essentially, what this report is trying to do is to say: one, businesses must set up a metadata directory, which will be administered by the NPD Authority. Then it says that when a company decides it wants a particular data set, it can make a request to the other company to reveal the metadata of its underlying data. And there's a section where they say "metadata and underlying data", which is a little unclear, and that opens up all sorts of issues. But I guess the reasoning behind it is that this will assist competition and innovation, so that an Indian company that may not have access to a vast trove of metadata, or other kinds of non-personal data, can now innovate, and so on. That brings us very swiftly to the final thing I want to discuss, and we have about ten minutes. The background premise behind all of this is the idea of digital industrialization. I think one of the committee members is joining the next panel as well, and somebody I spoke with at the beginning of the week said, you know, this is all about digital industrialization, because Indian companies have suffered from not having access to particular types of non-personal data.
So this is India's pitch to open up data sets for small Indian companies that may not have that access. Keeping aside all the data security and privacy risks — I can't believe I'm saying this, but keeping those aside — if there were a perfectly safe, private, secure way to share this NPD, I would love to hear your thoughts: is there some reason you would want access to something like this? Just curious for your thoughts, again open to all the panelists. Sorry, can I jump in? I don't know what that phrase really means, because if you look around, the last decade in India has witnessed a huge digital revolution. Hasn't it already happened? And isn't it happening at a massive scale every single day? I don't know what more could happen that would radically change what is happening right now. And the other thing is, there seems to be a lot of hand-waving; there seems to be no way to quantify what exactly the benefits would be, when you can't even define the core terms. For example, community data, public good — what exactly is "public good"? It all seems very vague. And really, if the intention is to encourage innovation — well, this is not "encourage", it's force and mandate. But if it is to encourage innovation, wouldn't it be better — I'm just postulating, I'm not really qualified to have an opinion here — wouldn't it be better to incentivize companies, and whoever builds up data sets, to release them?
There's this whole open data movement, connected to the free and open source software movement, that has been around for decades. Some of the most valuable data sets in the world are open data — OpenStreetMap, or Wikipedia's knowledge graph — and nobody had to mandate for these things to come out. So if we want this at an industrial scale in India, can there be a better way to incentivize companies — how exactly, I don't really know — and encourage them to willingly contribute data, rather than, in very questionable ways, force them to comply? When you force companies to comply, people will always do the minimum required. I don't know if such forced compliance will produce data valuable enough to really create an innovation revolution. I don't think so. To add to that, something I feel very passionately about is this idea of building businesses in India and streamlining that. I feel that is the real problem statement, and there are many ways to solve it. For example, if the argument is that no small company has the amount of data that companies like Facebook or Google have, because that data is no longer in India, I feel that's a slightly different problem. It's like the reason India has historically protected the automobile sector: we have such heavy import duties on cars made outside because we wanted to stimulate our own automobile industry, so we said you have to manufacture cars in India. That was one way to solve the problem. The solution was not to tell every company that, by the way, every single thing you know about making cars is now public.
There are many ways to solve the problem of ease of business and creating a level playing field, but hitting at the roots of how businesses operate and changing that sets a huge legal precedent. Philosophically, it almost descends into a capitalism-versus-communism argument: maybe we don't believe in this model of innovation, so we should just get rid of capitalism and have a better way of doing things. That's what it seems to descend into for me, if I think about it. So this alleged benefit does not hold any water for me, given the amount of regulatory hurdles there already are for a small company doing business in India. Just to take a small example: if you are a company based in India and you want to take recurring payments from people outside India, the hurdles in doing that hurt small businesses far more today than the absence of a data-based level playing field. So it does not compute for me. I agree with both Kailash and Tanmay; I think these two would be very good beer buddies with me, man, we should just sit down. I think it's very similar: it doesn't make any sense. First of all, that term you used — I had to Google it, I didn't even know what it meant, and even after Googling I couldn't figure it out, so I'm still stuck on that. But see, for us, for example, data is not a big problem from that perspective; if you really want data, you can get it, whether you are a small company or a big one. And first of all, I am not even convinced that big data actually solves the problems of AI or whatever — even Google-level data has not solved a lot of problems yet.
We still haven't proved that more data will actually solve these problems. It seems more like terms such as "economic good" are being used as marketing material, and then we all sign up for it. They could just as well say which companies they actually want data from, and then work with those companies, rather than framing it as a regulation that all of us have to obey. I would say there is some other underlying thing here: they want some data, so why don't they just come out and say, hey, I want this data, this is what I'm looking for, and then we can all agree and figure out what to do with it. Yeah, and if you don't want the Googles and Facebooks of the world to have the data, then just stop them from operating in India; do what China did. China was very clear: they said, we are not going to let the social fabric of our country's IT be built outside China. And now they have a tremendous amount of innovation happening in China around WeChat and their own social services. That is also a way of solving the problem without hitting at the roots of everything you understand about how companies work. I'll give just one example. We are actually doing a project with IIIT Hyderabad to collect speech data. The Government of India itself has sanctioned this — MeitY and others have backed it — to collect a good amount of speech data as a corpus so that Indian companies can build on it. The idea is that all the data collected will be open source, very similar to Mozilla Common Voice, and then you put it out there.
But when I put it out there, I know for sure that the companies that are going to benefit from it are Google, Microsoft and Amazon, because they have the GPUs and the compute power to actually make sense of the data. The actual startups in India will find it very hard to use that same data to build something. So it's confusing what works and what doesn't, and with this whole idealism of everybody having access to data — what will we do with the data? That's something that needs to be defined. I have actually really enjoyed all these points, from Kailash and everyone, but I also have a slightly different point of view. One is that I've lately been working on some of these health projects with the government, and in general, even with these multilingual data sets, data on health conditions and so on, there's huge value to be derived, if it were available. And as you called out, of course the Googles and Microsofts of the world will extract more juice, but it also gives an opportunity to smaller players and to academicians. Say, in the COVID setting, the data was just not out there. Of course, it's not only private parties making silos of data; it is the government itself, and different government organizations, that did not put it all together. But not having access to data prevented a lot of things we could have gotten right. In fact, I was involved, and the plumbing is so bad with a lot of these government agencies that it's a critical problem to solve. There are indeed data sets that can contribute to societal well-being, but because of how our incentive system works,
public-good data sets are just not prioritized in a lot of organizations. Having said all that, I agree with what Kailash said: the right way to set this up would be a proper set of incentives. It again goes back to the question of what we are really trying to do here. The whole point of the PDP and the NPD framework is to extract value from data for society while safeguarding people against privacy violations and protecting the proprietary rights of companies; that is the underlying goal. Given that, we should first know where we stand. I'm honestly worried that without standardization of the technical definitions, without standardization of the technical process itself — and not just anonymization; the ideal thing would be to specify the tools to use and the very specific level of, not exactly anonymization, which I think is a very crude term, but the specific level of privacy safety you need to reach — in the absence of all that, we are definitely going to have lots of arbitrary data transfers that are not necessarily in the interest of society. This will be a case of some kind of crony capitalism. I just feel the right way to ground all of these regulations would be in terms of the actual risks and benefits. Mere identification is not harmful, but if you identify me with a health condition, there is a risk; if you identify me with a particular financial condition, there is a different level of risk. So we should quantify that risk, and also think of this whole data question as an information good with multiple producers: the subjects, the folks contributing to the
collection, and the folks coming up with the algorithms that derive extra juice. Ideally, we should think of this information good within something like a royalties framework, where, as value is derived from the data, all the producers are compensated fairly, and wherever there is a risk, there is a reversal mechanism. In the absence of all of this, I think the framework is premature. Yeah, and I also think we can make a lot of these problems more tractable by looking at specific data sets. If you want to solve a health-crisis problem, then there should be a foundation or a campaign to get that particular data set, for which it will be much easier to define and standardize these terms, and to define, for this particular data set, what is PII and what is not — as opposed to a generic definition across all data sets on the planet. It is much easier to scope it to a particular type of data for a particular use case, and then create a foundation around it composed of multiple stakeholders: producers, consumers, researchers, and so on. That is a much more tractable way of really solving this problem, if you want to solve it. Yeah. So I'm going to pull the threads together, because I see that we are on time, and the host has been very good about not cutting us off the live stream. I heard a lot of things there, and I should clarify — I can say this because I'm independent for the next few months, so I don't have any organizational constraints — that what is interesting about this framework, especially from the legal perspective, is how far it departs from precedent. In many ways it breaks with all of the theory around data regulation that we've had since the 1970s, since large-scale computer systems were first developed.
So I'm not quite sure where this mental model is coming from. I don't think anybody would disagree with objectives like marshalling data in a time of crisis. When Fukushima happened, I think it was the electric companies that had location data close by and could pick up radiation data — a strange example that came to mind when I was speaking earlier. So you can have specific instances where there's a very clearly articulated objective. I think the problem here, and it's great to hear this from the panel, is that the objective is really not clear to the multitude of us engaging with this data, whether from the regulatory side or, clearly, from the implementation side. And if you're not clear on your objectives, then everything else just gets more and more complicated. To bring it all together, what I'm hearing is: first, if anyone on this panel ever decides to have a beer, I would love to be invited, because I heard some hoots while on mute. I definitely think the best phrases I heard were "data-based level playing field" and this whole idea of what I'm going to call digital communism. For those of you who want to do a bit of reading, Graham Greenleaf is this excellent professor I love to quote everywhere. He wrote a fantastic article about the PDP Bill — he hasn't commented on the NPD report; he sits in Australia — in which he calls it "GDPR-lite with Chinese characteristics", which I thought was excellent. I do think the world is watching India in a strange way, wondering what we're up to. And I'd love to invite technologists like you — tomorrow or the day after is the deadline to respond to the committee — even if you could just write an email to the committee, I think it would be amazing for them to have some of your thoughts down on paper.