To give a brief background of the talk: at FIFTEL we have created a new track to focus on digital infrastructures, essentially giant architectures that are being built, whether in the public space or the private space, that are data trusts and information systems, and this whole business vertical of information has ballooned over the last five years. So we are trying to bring in computing principles on how these systems need to be built to ensure that individuals have control over their data, and this series essentially focuses on issues like consent, privacy, differential privacy, and encryption: new analytical systems which are privacy-safe, where people are not invading someone's private space but are still using their information to analyze systems. As part of the series we have Anand Venkannarayanan, who is going to talk about consent and essentially show us how cryptography is important for building information infrastructures. He is going to look at how the digital locker system could have ideally been built by comparing it with Amazon S3, and he is also going to touch a bit on differential privacy. Anand is an industry veteran in the space of cyber security; he has been involved in a lot of reviews of existing digital infrastructures for the past three years, and he is going to share some of his learnings with us today. Anand, over to you. Okay, thank you everyone. I just want to give an overview of what this talk is about. I will try to finish as much of it as possible in 40 minutes, so this is essentially the breakdown: in the first five minutes we will finish a very top-level introduction to digital infrastructure, then we will go over how the use of crypto has evolved in digital infrastructures, and the last bit is specifically on how crypto and consent are pretty much related to each other.
That is basically how the flow of the talk is going to be. So the first part is this: most of us are aware of what infrastructure means on the physical side, so I will not go over it, but the fact is that it is like a base layer on which everything is built. In the real world you can think of hospitals, sports facilities, transport; people call them infrastructure, but they are in the physical realm. And the same thing applies on the financial side. For instance, people may talk to you about how there is a central bank, there are commercial banks, there is the ATM space, there is NPCI, there is a whole bunch of routing protocols, and so on. These are also examples of infrastructure, but on the financial side. And you can apply the same concept and paradigm to the digital space and ask: what exactly is digital infrastructure? Digital infrastructure as a concept is maybe five or six years old, but in practice it has been around for a while. Think of when Amazon came up with the AWS stuff; there were a lot of things that came in initially. They said: we are offering infrastructure as a service. Why buy servers, why buy hardware, why buy switches? We are just going to offer you infrastructure as it is, and that they used to call infrastructure as a service. Then what companies started doing was offering software services on top of that, and they said, oh, this is software as a service. And nowadays there is this whole concept of platform as a service, which is the entire thing: take it and do whatever you want. So you can see the iteration levels are exceptionally fast in this area, but having said that, it is still digital infrastructure.
In the sense that these are things that are so basic that they just power a lot of these systems, and you just have to understand that they are in almost the same realm as physical infrastructure. So what exactly is digital infrastructure? You can think about it in a very broad way by saying that digital infrastructures are not any different from, and are fundamentally the same as, information processing infrastructures. That is what they are. They just process lots and lots of information, and they provide all the frameworks, tools, and platforms to crunch and emit data. And if you look at any infrastructure as some kind of universe, you ask: what are the atomic operations that you would typically find in this universe? In the information infrastructure, you actually have only three macro operations, and they are all very related and cyclic to each other: store, create, analyze. You just do one and then the other, and it spins faster and faster and creates more of itself. These are basically what we call the atomic operations in the data world. Now I am going to trace the evolution of how the atomic operations of this infrastructure work, and also try to place the evolution of crypto into these infrastructures. So going back to the store-create-analyze cycle, you just have to go back to March 2006 and trace what S3 did. It was a newer kind of data storage, and it basically said: look, you can store a lot of data in an infrastructure that we have built for you. You do not have to create this infrastructure yourself; you can just rent it and use it for yourself. That is basically what they said in March 2006: here is a new kind of data storage.
And slightly later, in August 2006, they introduced what we call the processing layer, and they said: well, we have this stuff called EC2. You can do whatever computing you want. You can rent it out. You can process the information that you had stored on S3. That is basically what the initial pitch of Amazon EC2 was about. And in August 2008, about two years from when they started the entire notion of on-rent digital infrastructure, they said: you can bring your own data. The primary motivation for that is that S3 was a new kind of storage; the existing storage was all based on hard disks, and NFS used to be pretty popular then. It is somewhat less popular now, but even so. So what they basically said is: look, if all you have is information on disks, you can just bring that information as block storage and we will host it here. And so that is EBS. So if you look at how this entire thing maps back to the atomic operations of the universe: EBS and S3 belong to the store part, and creating more information and analyzing it belongs to the EC2 compute part. That is basically what they had done until 2008. And if you look back again at what they did, this is all public information; it was still a pretty small operation, the infrastructure was not a fantastic upgrade, and it was still evolving. Then in 2011, five years after they launched S3, they brought in something very interesting on the encryption side. Historically, if you look at the evolution of public digital infrastructure, it was basically: well, there is data and you can store it, but do not ask for encryption. The initial version of S3 was all about what we call client-side encryption, which is: if you want to store something and you are worried about security, you just encrypt it yourself and push it to the bucket. That is basically how it used to work. Now, obviously that is not very scalable or convenient.
And there had been a lot of feedback saying: well, we want the encryption done on your side, not on my side. So what they actually did in 2011, five years after they started S3, was to say: server-side encryption is here, and we are going to allow you to do what you want to do. The interesting point is that when you do a put operation on an S3 bucket, you can just specify AES-256; that is basically what it is all about. Okay, then came EBS encryption. This is saying: well, you have a hard disk and you have a lot of data on that hard disk, and here is an encryption layer for EBS as well. And again, look at what they did in 2014: they introduced full-fledged key management. There is a primary reason for introducing key management in general: no matter what you do with keys, whether they are public or private, symmetric or asymmetric, key management is always a bit of a pain. So people basically said: well, you are offering encryption, but where is the key management? It used to be offloaded to the users at one point in time. So Amazon came back and said: well, we are offering you key management as a service. And once they offered key management as a service, they integrated it with only three services, as you can see, which were the most popular then: EBS, S3, and Redshift. These three services together with key management are what really changed the game, by saying: we are extremely serious players. That is the way I read it, okay? And then came the interesting part: they said, well, not only can we manage your encryption key life cycle, you can also bring your own keys into the key management service. So this was all done from 2006, when they started, to 2016: about a ten-year evolution, if you look at it.
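To make the put operation concrete, here is a sketch of what a server-side-encrypted S3 put looks like with the AWS CLI; the bucket name, object key, and KMS key alias are placeholders of my own, not anything from the talk.

```shell
# Server-side encryption with an S3-managed key (SSE-S3):
aws s3api put-object \
    --bucket my-example-bucket \
    --key report.pdf \
    --body report.pdf \
    --server-side-encryption AES256

# Or with a key held in KMS (SSE-KMS), which can be key material
# you imported yourself, i.e. bring-your-own-key:
aws s3api put-object \
    --bucket my-example-bucket \
    --key report.pdf \
    --body report.pdf \
    --server-side-encryption aws:kms \
    --ssekms-key-id alias/my-imported-key
```

The caller never touches ciphertext; the storage layer encrypts on write and decrypts on an authorized read, which is exactly the "encryption on your side, not mine" request.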
They have practically reinvented encryption keys and key management for millions of companies and corporations, right? That is the evolution of how a public digital infrastructure, even though we call it public it is still privately owned, morphed itself from an organization that just sold books into doing what it does today. Okay, that is the evolution of the crypto part. So what it really meant is that it fundamentally changed the game to look something like this. In the earlier model, when we started off, we basically said all they do is analyze, create, and store. But by making all these changes, even though none of them existed about 10 years ago when they started off, over a slow period of time they brought in the middle layer, which is crypto, and embedded it into almost every service you can think of, right? So if you are looking at the evolution of how the digital infrastructure that everyone uses works, this is how it looks today: there are a lot of services that create data, a lot of services that analyze data, and a lot of services that store data, with crypto in the middle. This is the cycle which powered Amazon into what it does today, right? Okay, the interesting part in this whole thing is why they did what they did, and it is important to put some historical context around why crypto is important here. The basic idea that we have always followed in the physical world is about control. For instance, you can draw a small hedge around your house. You can put up a wall; a 13-foot wall is pretty popular in India. And you can basically say: well, I have drawn a big wall around my place and I have secured it and no one can come in; there are security guards and so on. But in reality, this is actually confusing storage with ownership and with control.
So in the dataverse, these three things are very different. What we mean is that it really does not matter where you store your data; it really matters who has control over it. When we say control, it is about: can you see what is in it? Can you change what is in it? That is really what we mean by control. So the fundamental control operations with respect to data are read and write. If you have control over those, then it really does not matter whether you are storing the data on your own hard disk or in your neighbor's house. And that is really the breakup of how the industry works now. In the past, most data was stored inside a single organization, within a fixed perimeter; those perimeters are now gone. And when those perimeters are gone, the only way in which you can still say that you are in control of what you store is crypto. That comes because of the control part. And that is why it was important to walk through how the S3 and Amazon folks evolved into what they are doing, because they could not have done what they have done if they had not recognized this fundamental fact and added crypto as the layer in the middle which hooks into all the services. Right. So what it really means is that when everything, meaning your data, was in one place, always in your pocket, you had control and you had ownership. It was stored on machines that you control, and you could do whatever you wanted with it. What that fundamentally means is that you had a lot of trust in whatever you had built. But the moment you disaggregated it and spread it across various different places, what you actually got is distrust. So what this means is that we have a very nice template.
It is called an emergent property, in the sense that the property emerges out of complicated interactions among these three vectors. Okay, that is really what we mean by emergent property. And how does this emergent property impact business and everyone else? Because it presents a big problem: if you are good at processing data, let us say you are an AI/ML company, you are doing something, and it is not your job to run storage. It is just not your job to run compute servers. Your job is to run a whole bunch of other things, and that is what you are good at. So this is basically saying: look, if you are good at one thing, you just focus on that one thing, but leverage other players that are good at other things. And the hard problem with the whole thing, also known as a data processing system, is: how do you ensure that the other guy is not taking your data and doing his own thing with it? That is one of the worries people always had about AWS. The only way you can do that is with encryption, and that is where the distrust part comes in. So the way in which they solved this entire problem is by saying: look, I know you do not trust us. You are still free to store it the way you want if you trust us, but let us say you do not. So we offer all these wonderful mediation services that turn distrust into a somewhat limited amount of trust. And that is the job of crypto. So if you follow how the talk has gone so far, it is fundamentally about a lot of players with their own problems and their own opinions trying to maximize the value of an infrastructure that one player has built, even though you do not necessarily trust that player. This is basically the interplay of all these forces.
And the only way in which you can build that is by building a technology layer at some level and saying: yes, this is how it is; we are mediating all these parties' different concerns by creating this layer called crypto. That is the evolutionary path most of the public platforms you see today, the gigantic ones used by many, have taken, particularly when there is a free market, a market force which says it is not mandated. You are not mandated by a government holding a gun to your head to use it; it is just the free market playing out, with various parties having their own opinions and their own incentives. That is basically the trend we have seen so far. And so far we have talked about the enterprise space, a space that is typically dominated by businesses. In this space, yes, sure, crypto key management used to be a hard problem, but we have just shown how it got resolved over time on the enterprise side. Having said that, in the past, one of the reasons why crypto was not used a lot is that we always had a problem with the key-management piece. How do you manage keys? What is the life cycle? What gets rotated? What gets recycled? And so on. This is one of the reasons why it was not used a lot before, and those who did use it were the people who could afford it. But that is really not the case anymore in 2020. You can see how AWS built a key management solution called KMS. You can see how Google Cloud built their own key management systems. And you can see how HashiCorp built their Vault solution for managing and recycling keys; Vault does nothing but store and manage secrets, and it is all about life-cycle management of your keys. So that is one part, the enterprise side. But if you look at it, something interesting has also happened on the retail side. You have these other players.
You have a lot of these password managers, LastPass, 1Password, Bitwarden, who have in reality been trying to do secrets management, which I would call a kind of key management; a password is just a kind of key for retail folks, people like you and me. And if you look at what they have actually done: in the past, if you had ever seen the UX of a product that does key management on the retail side, it was very demanding. Take GPG for instance: you had to open a shell, write the right commands, create a key, copy it over, embed it in some folder, and then do a whole bunch of things. None of this do you see in the modern password managers. In fact, what they have actually done is simplify the UX to a large degree and say that all you have to remember is just one password, the master password, and that is about it. If you look at what they actually do in the background, they use PBKDF2 or a whole bunch of algorithms around it, and they generate an AES-256 key. That key is encrypted using this password, and that key is then used for storing other passwords, and so on. And it is also interesting what Bitwarden did, in the sense that they said: well, this is all well-known crypto and it is open source; you can go look at it. And it is fully API compatible. So in general, you can see an explosion of these solutions not just on the enterprise side, but also on the consumer side. And the typical cybersecurity advice that everyone gives to their dad and their dog nowadays is: well, just store it in a password manager. Whether it should be paid or unpaid is a debatable thing, but that is basically where the trend has gone so far. Right. Okay. And here is the other interesting example, from a world where key management used to be a very arcane exercise. We are also seeing end-to-end (E2E) encryption, where we know for a fact that WhatsApp uses E2E encryption.
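Before moving on to messaging: the key derivation the password managers do in the background can be sketched in a few lines with the Python standard library. This is only an illustration of the PBKDF2-to-AES-256 idea; the password, salt handling, and iteration count are my own placeholders, not any specific vendor's parameters.

```python
import hashlib
import os

# A master password is all the user remembers.
master_password = b"correct horse battery staple"

# The salt is stored alongside the vault; it need not be secret.
salt = os.urandom(16)

# PBKDF2-HMAC-SHA256 stretches the password into a 256-bit key.
# The high iteration count deliberately slows down brute force.
vault_key = hashlib.pbkdf2_hmac("sha256", master_password, salt,
                                600_000, dklen=32)

print(len(vault_key))  # 32 bytes, i.e. a 256-bit key usable with AES-256
```

The derived key never needs to be shown to the user; it encrypts the vault of other passwords, which is why "remember one password" is all the UX has to ask for.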
We know for a fact that the protocol is pretty complicated. And we also know that most of it is so seamless that people do not even realize, on average, how many keys they manage when they talk to their family members on WhatsApp. This is the Signal protocol's Double Ratchet algorithm that they use. And you can see those red lines, I mean, the red boxes; I will not go much deeper into explaining every one of them, but you can just see how many keys there are. The ones ending in "key" are keys: there is a root key (RK), there is a chain key (CK), there are message keys, and all of them are rotated, ratcheted back and forth. All of this is done in such a transparent manner that people do not even know they are doing key management with WhatsApp. So we have seen a historical trend suddenly reversing very fast, in a span of like three or four years. Okay. So that is basically the thing you have to look at and say: wow, am I exchanging 100 keys a day? The answer is yes. And the other interesting thing about this crypto and the storage and all the information stacks is that most of them are open source. For instance, take S3: the S3 protocol is well known, and it has become a de facto standard. So there are quite a few players who provide their own version of S3, in the sense that you can just take the AWS CLI and the AWS SDKs and change the endpoint from s3.something.amazonaws.com to xyz.com, and it works seamlessly. And I know for a fact that some of them also offer encryption, double encryption. One of the players who does this is MinIO. You can just go and look at MinIO; they offer quite a bit on top of S3, it is natively compatible, and it has a whole bunch of use cases which are remarkably different compared to what S3 can offer.
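Coming back to the Double Ratchet for a moment: the symmetric half of the ratchet, the part that turns one chain key into a stream of one-time message keys, can be sketched like this. This is a sketch of the KDF-chain idea only, not the full protocol (no Diffie-Hellman ratchet, no header keys), and the 0x01/0x02 labels and seed are illustrative choices of mine.

```python
import hashlib
import hmac

def ratchet_step(chain_key: bytes):
    """One symmetric-ratchet step: derive a one-time message key
    and advance the chain key. The old chain key is discarded, so
    compromising today's state does not reveal yesterday's keys."""
    message_key = hmac.new(chain_key, b"\x01", hashlib.sha256).digest()
    next_chain_key = hmac.new(chain_key, b"\x02", hashlib.sha256).digest()
    return message_key, next_chain_key

# A starting chain key (in Signal this comes out of the handshake).
ck = hashlib.sha256(b"shared secret from the handshake").digest()
for n in range(3):  # three messages, three fresh keys
    mk, ck = ratchet_step(ck)
    print(f"message {n}: key {mk.hex()[:16]}...")
```

Every message consumes one step of this loop, which is why a chatty user really does go through dozens of keys a day without ever seeing one.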
Then there is also this stuff like Ceph and Gluster and others of that kind. All of them offer encryption, and they offer it natively. So the question you really have to ask is: if they are open source, and they offer crypto, and they offer all of this for free, how the hell do they make money? The answer, of course, is that they are pretty good at what they are selling, which is efficiency, optionality, and trust. So this is basically the lay of the land; just look at what happened in the last two or three years, and that is basically where it is. Yeah. All right. We are now moving to the third part of the talk. I mean, hopefully I am still on time. Yeah, sure, I am almost on time. The third part of the talk is about the digital public locker as it stands in India, and how it compares with some of the stuff we have been talking about. Yeah. The public locker as a concept is not a very strange thing. We have been using them from time immemorial, ever since the railways introduced this thing called cloakrooms. What you do in a cloakroom is you go to a place, you store your stuff, your extra luggage, and you come back later and pick it up. Usually these public cloakrooms in India offer their own lock and keys, but you are also free to bring your own lock and key, which is what my parents used to do when I was young, like 25 years ago; they used to carry their own lock and keys. So what it offers is this: you get storage, safe storage where you probably do not have to worry about your things, and you also bring your own lock and key. This is the public locker as a construct in India, and it has been there for a very, very long time. And then you also have bank lockers as an example. And if you look at any bank locker, you will always notice that there are actually two keys.
One key is with the banker, the other key is with you. And of course, you are not allowed to bring your own lock, because that is just too complicated to operate. So the base idea here is a very simple thing. Why not have just one key and give it to the banker? Or why not have just one key and leave it with the owner? Again, it is not very complex to see why. If the banker holds the only key to your locker, you are entirely at the mercy of the banker. If you hold the only key to the locker, you may well lose it somewhere. So the base idea is that you need two people to open the lock, unless and until you get a locksmith to just pry it apart. That is the essence of the bank locker: there is a trust factor involved, and they solved the trust factor by doing a key partition, which is a cryptography term, saying there is a key and it is divided between two people, and only when both of them apply their parts at the same time does it work. So the problem is kind of universal. Even in plain, publicly available things like the bank locker, there is a trust problem, and you solve the trust problem either by bringing your own lock and keys or by partitioning keys, where you split the same key among certain people and then use it. Now, the most interesting thing that you see in India, in most of the public digital infrastructure, with the digital locker as an example, is that it lacks this idea of bring-your-own-keys. And why is that? Here is the problem that people typically raise: what if you lose your encryption key? Then what happens? In the case of Bitwarden and that kind of thing, it is a risk that you take: if you lose the encryption key, well, I am sorry, it is done. There is nothing much you can do about it.
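The two-key locker idea the talk describes has a direct cryptographic analogue: 2-of-2 secret sharing. Here is a minimal sketch using XOR splitting; the names "bank" and "customer" are just the locker metaphor carried into code.

```python
import os

def split_key(secret: bytes):
    """2-of-2 XOR secret sharing: the bank-locker idea in code.
    Either share alone is indistinguishable from random noise;
    only both together recover the secret."""
    bank_share = os.urandom(len(secret))  # a fresh random pad
    customer_share = bytes(a ^ b for a, b in zip(secret, bank_share))
    return bank_share, customer_share

def combine(share1: bytes, share2: bytes) -> bytes:
    """XOR the shares back together to recover the secret."""
    return bytes(a ^ b for a, b in zip(share1, share2))

locker_key = os.urandom(32)
bank, customer = split_key(locker_key)
assert combine(bank, customer) == locker_key  # both present: locker opens
```

Note the same failure mode as the physical locker: lose either share and the secret is gone, which is exactly the key-loss worry raised next.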
So people typically tend to have some kind of backups. Even in AWS, if you lose some of the CMKs, the customer master keys, you are done. What can you do about it? So that is the crux of the problem: look, if you lose the encryption key, you are done. So are you just going to store your data in someone else's place without an encryption key? What that really means is that you actually lose control of it, literally speaking. And then we can also argue: look, you sometimes want to store with encryption, but you also want to share information. So how do you actually do both? The overall ground reality in India then becomes: let us not do crypto; let us just create digital artifacts which prove that you consented to share. But what use is consent when the information you shared is already with the third party? You just created a legal fiction by saying: look, I am giving it to you for this purpose and you are not supposed to use it for any other purpose. A classical example is when you take a Xerox copy of your ID card and write "for the purpose of this only" on it. Who even checks? Someone can always do something else with it. But then the question is: if this is a problem that exists, how do you actually solve it in an alternate reality? Is it even a solvable problem? How do you go about doing it? Because on one hand, we want encryption on storage using bring-your-own-keys. On the other hand, we also want to share information. So how do you bridge these two different problems? Well, we have quite a few schemes, and some of those schemes are interesting because on the cryptographic side they have interesting mathematical properties. For instance, the typical way people think about encryption is that it is all or nothing.
Either you encrypt or you do not encrypt. But that is not how things work in reality. We also have a different class of algorithms called property-preserving encryption (PPE) algorithms, which fundamentally preserve some property among the plaintexts. For instance, let us say you have a plaintext A, a number, which is 25. And then you have another plaintext B, which is another number, 50. The relationship between these two numbers is that 25 is of course less than 50. Then you encrypt the number 25 using such an algorithm, and you encrypt the number 50 using the same algorithm, and the same property still holds for the encrypted 25 and 50. That is what property-preserving means. If you apply standard AES-256 encryption to both plaintexts, you cannot make sense of the results; there is no guarantee of any ordering property being preserved. But on the contrary, if you use a PPE, the property is preserved. So what this really means is that you have a class of algorithms that preserve a class of properties. Some preserve less-than, some preserve equality, some preserve greater-than, some preserve a bit of both, and so on. So what it really offers you is a trade-off: some information leakage, giving up some privacy, versus the convenience of sharing without an all-or-nothing deal. But having said that, remember, you are still taking some privacy losses, because it is amenable to repeated queries. It is still better than what I would call the share-everything model with no encryption. That is basically the case for PPEs. And the place where PPEs get very interesting is when you have a big document which you can split up into N fields.
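The 25-versus-50 example can be sketched with a toy order-preserving scheme: a secret, strictly increasing random mapping. This is purely an illustration of the preserved property; real order-preserving encryption schemes are far more sophisticated, and the seed and domain here are my own placeholders.

```python
import random

def make_toy_ope(seed: int, domain: int = 1001):
    """Toy order-preserving 'encryption' over 0..domain-1.
    The seed plays the role of the secret key; without it, the
    ciphertext values look arbitrary, but their ORDER matches
    the order of the plaintexts."""
    rng = random.Random(seed)
    ciphertexts = []
    total = 0
    for _ in range(domain):
        total += rng.randint(1, 100)  # positive gaps => strictly monotone
        ciphertexts.append(total)
    return lambda x: ciphertexts[x]

enc = make_toy_ope(seed=42)
# 25 < 50 survives encryption, so a server holding only ciphertexts
# can still answer range queries and sort records:
assert (enc(25) < enc(50)) == (25 < 50)
```

This is also where the leakage comes from: anyone who can compare ciphertexts learns the ordering of your data, which is precisely the repeated-query weakness mentioned above.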
Let us say you have some kind of ID document which has some secret, it could be your PAN number or your tax ID or whatever, a secret which you do not want to share. And then you have name, date of birth, and address, three other fields which you are somewhat okay to share. What you can actually do with PPE is apply differential encryption to the different fields of this document. You can say: well, I want to share some information about myself to get something done, but I will apply full encryption to the secret ID, because that is something I do not want people to know about. The name, I do not care about, so no encryption on it. And well, the date of birth I do not want to share, but I am okay to share a number which says what my age is. So you can apply a transformation to the date of birth and then apply a PPE algorithm to the age part. And for the address, you can choose anything. What this really means is that you do not have to treat encryption as an all-or-nothing thing. That is fantastic. You have a lot of little knobs for whatever you want to share on a particular document; you can pick and choose based on what you think is appropriate for you. So you come back and ask: what does PPE really offer, then? It offers what we call informational self-determination, in the sense that you, the sharer, have a lot of control over what is being shared and how it is being shared, and you also determine the PPE and the encryption algorithm. And as you can see, it is also composable. When we say composable, it means you can keep applying transformations to some of the fields and then apply crypto on top of that. So that is one thing. This is basically the broad spectrum of what PPE offers, right?
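The per-field policy just described can be sketched as follows. Everything here is illustrative: the field names and the sample record are invented, a hash stands in for full encryption of the secret ID, and a trivial increasing map stands in for a real PPE on the age.

```python
import hashlib
from datetime import date

def toy_ope(x: int) -> int:
    # Stand-in order-preserving transform; any secret, strictly
    # increasing mapping would do for this sketch.
    return 7 * x + 3

def prepare_for_sharing(record: dict, today: date) -> dict:
    """Per-field sharing policy: full protection on the secret ID
    (a hash here, standing in for encryption), name in the clear,
    and date of birth reduced to an age and then passed through an
    order-preserving transform so it stays comparable."""
    age = today.year - record["dob"].year  # ignores birthdays, for brevity
    return {
        "id_number": hashlib.sha256(record["id_number"].encode()).hexdigest(),
        "name": record["name"],   # shared as-is
        "age": toy_ope(age),      # older/younger comparisons work; value hidden
    }

record = {"id_number": "ABCDE1234F", "name": "Asha", "dob": date(1990, 5, 1)}
shared = prepare_for_sharing(record, today=date(2020, 6, 1))
print(shared["age"])  # 7 * 30 + 3 = 213
```

The receiver can still check "is this person over 18?" by comparing against `toy_ope(18)`, without ever learning the date of birth, which is the informational self-determination point.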
And is everything fantastic with PPE, then? No, it is not, right? People have been trying to use the PPE model on health records, which of course is something that is happening in India with the national health stack and all the other stuff. The biggest problem with health records has always been: look, what is it that you are sharing? Epidemiologists and all the other folks want a lot of information from people's health records in order to do effective public health management. So what happens if you apply a PPE to those records? They actually ran an experiment on this, with various record types. What you see here is the auxiliary attribute: imagine a health record with a column called primary payer. What they tried to do is extract these attributes, which were encrypted using PPEs, using an extraction attack, which is all about repeated queries; there is a whole class of such attacks you can run on PPEs. The accuracy column shows that the higher the accuracy, the more they were able to extract the information back to the plaintext, even though it is PPE. And it requires an attack model; the attack model fundamentally assumes you get to make queries and you have a lot of time, and so on. And you can see "patient died" is a yes or no, so you get lots of good hits with booleans, and it gets harder and harder with numbers and so on. That is basically the extraction attack story on PPEs. This also tells you about the other interesting part, which is the other attributes and the time it took them to figure out the actual values by combining things. But having said all this, you just have to look at it and ask: well, is this good enough for you?
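Why do booleans fall so easily? A toy version of the attack makes it obvious. Assume a deterministically encrypted yes/no column and an attacker who knows the rough base rates (say, roughly 10% mortality); the numbers and ciphertext labels below are made up for illustration, not from the experiment the talk cites.

```python
from collections import Counter

# What the attacker observes: the same plaintext always maps to the
# same ciphertext, so only two distinct values ever appear.
observed = ["c1"] * 90 + ["c2"] * 10

# Frequency analysis: match observed frequencies to known base rates.
counts = Counter(observed)
common, rare = [c for c, _ in counts.most_common()]
guess = {common: "no", rare: "yes"}  # the majority value must be "no"

print(guess)  # {'c1': 'no', 'c2': 'yes'}: every row decrypted, no key needed
```

With only two possible plaintexts the attack is essentially free; with numeric fields the attacker needs more queries and auxiliary data, which matches the accuracy-versus-type pattern the slide shows.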
The answer is, well, it is probably better than what you have today. That is how you have to look at it. The next part we will be covering is differential privacy. What differential privacy really means is that it allows individuals to retain their privacy even though people can derive statistics from a statistical database they contributed to. So think about it this way: there is a statistical database of PII, personally identifiable information, which has name, age, sex, marital status, and so on. Someone derives aggregate statistics from the database and releases them in the public domain, for whatever reason. The key question then is: can you take that aggregate information and reconstruct it back to the actual persons? That is what we are interested in. It is called differential because it merely says: if you had never been in the database, you would have suffered no privacy harm; and if you are in it, you should suffer almost no additional harm, as if nothing about you had been released. So what do we have to aspire for? You aspire to release a statistical dataset that is useful for public data and public health, but that does not release any private information. That is the end goal: with differential privacy, an individual gets almost the same privacy as if their data were not in the database at all. This is another very interesting area of research that is coming up, and we have lots of real-world use cases where it has been successfully applied. One of them is Windows telemetry.
If you look at what Windows telemetry is, devices keep sending data back to Microsoft to figure out what has happened on a device, whether you are under attack, and so on. The question then is: once Microsoft gets that data back, can a lot of personal information be recovered by looking at it? That is a problem they are trying to solve with differential privacy. The next example is LinkedIn advertiser queries: you want a service that lets advertisers target audiences, but you don't want the advertiser to know the specific names and details of people. How do you do that? The last one is interesting, which is the US Census commuting patterns. There are more examples than these, but this is just to say that these algorithms exist and they work. Here is an example of what we call the reconstruction problem in the US Census. What they actually did was collect data from people about their age, sex, race and relationship, then create the statistical tables released into the public domain, shown on the left side. If you want to reconstruct the individual records on the right side, you just need to solve a set of 164 equations; solving those equations takes 0.2 seconds, and the cost does not grow unmanageably. This is the problem differential privacy is supposed to solve: you should not be able to reconstruct the right-hand side from the left-hand side. And the way they do it is by injecting random noise into the data, and thereby preserving your privacy.
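The reconstruction idea scales down to a toy example. Given a few published aggregates about a tiny hypothetical 3-person block (the numbers below are invented, not Census figures), brute-force search already narrows the possible age multisets sharply, and each additional published statistic (medians, cross-tabs by sex or race) shrinks the candidate set further, often down to a single solution.

```python
from itertools import product

# Published aggregates for a hypothetical 3-person block:
# population, total age, number of people over 40, oldest person's age.
N, TOTAL_AGE, NUM_OVER_40, MAX_AGE = 3, 100, 1, 44

candidates = [
    combo for combo in product(range(0, 101), repeat=N)
    if combo[0] <= combo[1] <= combo[2]            # ages as a sorted multiset
    and sum(combo) == TOTAL_AGE
    and sum(a > 40 for a in combo) == NUM_OVER_40
    and max(combo) == MAX_AGE
]
# Four aggregates already cut a million possibilities down to a handful;
# a few more statistics typically collapse the set to one reconstruction.
```

This is why the Census example above is solvable as a small system of equations: every released table is another constraint on the same unknowns.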
What it really means is that you basically have a slider that says: if you want complete privacy, the aggregate is useless; if you accept slightly less privacy, then this is the amount of noise injection we will do to make it slightly more useful, and so on. That slider is the best way to think about what differential privacy does. So let's bring together what we have done so far in this whole construct. The theoretical framework you ought to have in mind when we talk about public digital infrastructures is, first, that they are extremely useful things. Who wouldn't want, in a modern country, infrastructure like roads, railways, transport and hospitals? The same construct works for digital infrastructures: who wouldn't want one? But the point, of course, is that you ought to build them very carefully; if you don't, you suffer all of these problems. One way to think about building them carefully is how the other public digital infrastructures evolved: you keep crypto as a core layer, you build storage management around that, you build privacy services around that, and then utility layers, whether that is public health data or whatever you want. The basic idea is that if you are releasing a lot of aggregated data, you ought to think about differential privacy; if you are handling a lot of personal data, you ought to think about bring-your-own-key management. And you ought to rethink how people interact with these systems, because by rethinking how people interact, you create trust in the system, which in turn increases the accuracy of the data you are collecting.
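The epsilon "slider" mentioned above can be made concrete with the standard Laplace mechanism; the function and parameter names here are generic, not from any particular library. Epsilon is the slider: small epsilon means strong privacy and a noisy answer, large epsilon means weak privacy and an accurate answer.

```python
import random

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is distributed as Laplace(0, scale).
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(true_count: int, epsilon: float, sensitivity: float = 1.0) -> float:
    # One person joining or leaving changes a count by at most 1 (the
    # sensitivity), so Laplace noise with scale sensitivity/epsilon
    # gives an epsilon-differentially-private release.
    return true_count + laplace_noise(sensitivity / epsilon)

random.seed(0)
noisy_strict = dp_count(1000, epsilon=0.01)   # very noisy, strong privacy
noisy_loose = dp_count(1000, epsilon=10.0)    # close to 1000, weak privacy
```

Moving epsilon is exactly moving the slider: the published statistic stays useful in aggregate while any single contributor's presence is masked by the noise.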
The way I think about this is what I call the "privacy pie", in the sense that everyone has to share data and everyone needs to contribute, to some extent, to how decisions are made in a country. But it also has to be done in a form that creates trust, not as a legal fiction. And the way you ought to create that trust is by rethinking crypto and all the modern privacy techniques that are emerging, like differential privacy schemes. We're almost there. If you have ever read that nice book Privacy 3.0 by Rahul Matthan, I would urge you to read it; it's quite a nice book. Most of the argument of the book is about how consent is broken and how you rebuild it, and so on. But if you actually notice, the solution to "consent is broken" is there on the cover of the book itself, which is what I call "bring your own encryption, please". Bringing your own encryption algorithms into the picture, into how you interact with these systems, goes a very long way toward rethinking and reimagining public digital infrastructure. Right, I'm done, almost exactly on time. Thank you, Anand. I think that was an interesting and varied set of perspectives. Before we open up the questions: if people have questions, they can type them in the Q&A, or if you're watching on YouTube or elsewhere, please ask in the comment section or in the chat window, wherever you are. But I have a question before the audience asks theirs. Clearly, everything you have said is essentially a mathematical issue, right? You boiled a social issue, privacy, and an infrastructure issue, which is an engineering thing, down to a mathematical problem at some level.
So for people to determine what model is to be used, these things need to be calculated or modeled. And when you talk about, say, the census, the US Census Bureau is looking at it, and in the same way, I guess, India's Registrar General is supposed to be doing some of this. You have a proposition that these things need to happen in a certain way that enables trust. But at the same time, people have to share data so that policy decisions can be made. There can't be a scenario where people simply say, oh, we won't share any data; that could be a serious loss in scenarios like the corona pandemic itself, where you are required to share your information at some level, especially health information, if withholding it harms someone else. Now, how do you even take this forward? Who should be the authority looking at this at a larger level? Because the treatment of technology, or how engineering is done, is different across different ministries and departments. Do you think all of this can be simplified by building a separate digital infrastructure itself? And when I ask that: is there a need for it? Who determines the need for it? Well, I mean, who determines the need for it, I don't know how to answer that, but I can answer the other two questions, starting with: is there a need for it? Yes, sure. Just look at what happened when people were asked to share information with the contact-tracing app; I personally tracked a number of issues with it, and we wrote an article about it. The big problem clearly is that people are not comfortable sharing. And it could be because they have been asked to share more than is needed, or because the thing is not demonstrably useful, whatever. So there is definitely a need for it, and the need is about public trust.
And there has to be a lot of informed decision-making on the public side, to be able to say: look, I'm comfortable sharing my Bluetooth information, but not my location. Or: I'm comfortable sharing my Bluetooth information only if it falls within these parameters. So turn the question around and ask: why would we not share? The trust problem keeps coming back, which is not any different from what Amazon faced when they started off. You have to come back and say: look, I'm giving you all the tools that enable your agency as a citizen of this country, someone involved in the country's functioning in some way, and you choose. These are the options you have; you basically empower people, and then maybe it will work. That is what I can think of. As to the other question, who is going to do it, I don't know. Someone has to do it. It would be far better if the government followed a public consultation approach for this rather than just doing what they're doing. That's all I can think of. Okay, do we have any questions? I guess some of you might have questions. I don't see any here, but anyone? Sorry, we are collecting some questions first. There is one hand raised over here, Srikanth. Okay, Srikanth, you can talk now; you can unmute yourself. Hi. So I just had a quick comment to make on the locker analogy and the digital locker in particular. Firstly, with the digital locker, I would say it's about data sharing from a set of trusted entities to another set of trusted entities, where your information is aggregated by this intermediary to share trusted information about you. And that's not information that you determined yourself; it's basically your bank statements or whatever, even your Aadhaar card, but the data in it comes from the authority.
So what this infrastructure is for is basically this: instead of trusting, say, 1 billion people, a FinTech company can now trust a set of, say, 100 banks and another set of 30 government agencies, and say: okay, I need to ask you for your salary statements or whatever to give you a loan. But I don't trust you when you come and say, this is my salary slip, because you can just take a piece of paper and claim it's a salary slip. So I would want the bank statement itself, and I would want it delivered digitally via the digital locker, because even a paper bank statement can be forged. I would want the bank itself to cryptographically sign the data. Here is where trust comes in, in the context of the digital locker: the data issuer, in this case the bank, signs it with its keys, which the FinTech can verify cryptographically, and the data flows through. And you as an individual have the ability to say: give this FinTech access to get this data from this bank. The digital locker is this kind of intermediary public storage where you can pull your data into one place, from which you can share it. So in one way it uses cryptography, but it has got nothing to do with informational self-determination, which is why I say the digital locker is all about access self-determination: you can determine who gets access to your data. And of course, that comes with the trade-off that the government which operates the locker gets the data in unencrypted form. That's the trade-off you make to do a data exchange between entities that are ready to trust a few other entities but not to trust people. That's what I had on the locker and trust. Do you want to respond to that? Yeah, I mean, I understand why they built that locker thing. It's never about storage, it's about sharing. And even on that particular aspect, think about it, right?
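The issuer-signs, verifier-checks flow described above can be sketched as follows. Real issuers would use public-key digital signatures; the Python standard library has no RSA, so HMAC stands in here, which means the issuer and verifier share a key purely for illustration. All names and values are hypothetical.

```python
import hashlib
import hmac
import json

ISSUER_KEY = b"bank-signing-key"  # hypothetical; real issuers sign with a private key

def issue(document: dict) -> dict:
    # The bank serializes the document canonically and signs it.
    payload = json.dumps(document, sort_keys=True).encode()
    sig = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return {"document": document, "issuer": "Bank", "signature": sig}

def verify(signed: dict) -> bool:
    # The FinTech recomputes the signature over the received document.
    payload = json.dumps(signed["document"], sort_keys=True).encode()
    expected = hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed["signature"])

statement = issue({"account": "XXXX1234", "avg_monthly_balance": 52000})
assert verify(statement)                       # bank-issued data is accepted
statement["document"]["avg_monthly_balance"] = 999999
assert not verify(statement)                   # self-asserted edits are rejected
```

The trust shift is visible in the sketch: the verifier never trusts the individual's claim, only the issuer's signature over the data.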
Maybe you just want to share some information about your bank records. But why share the entire transaction history of where you went and did your grocery shopping? If all the bank really needs to share is your average monthly flow, that would be good enough, right? And then you come back and say, what do I want to share? You can further apply a transformation; the locker itself need not store the statement in plaintext. Even though it is digitally signed, you can still apply your own encryption key on top of the signed document, extract the information and say: look, the statement has X fields, it is a 12-month statement, maybe I don't want to share all of it; these are the fields I am sharing, and that is good enough. So the point here is that it's understandable what they did with the digital locker, and what he is saying is right: it is fundamentally about sharing access control, never about informational self-determination. But what I'm trying to tell you is that even on something like a PAN card, and I've seen some people give PAN as the example, even on that you can apply these techniques. It's not that hard. It's just that people have not applied a lot of imagination to it. They just said: earlier you had a physical card and a physical statement, now you have an e-card and an e-statement, and I'm putting on it the digital signature of the issuing party and giving it to you. That's basically how I see it. But the moment you transform that into a health record, I'm now deeply worried, because a health record is not just a bank statement.
A health record is about a whole bunch of other things. If I have to do any sharing of digital health data, it has to be under my control, not whatever the authority on the other side wants, with one side simply accepting the other side's demands and an intermediary who says yes. That is not going to work the moment you transform this architecture from bank statements to health records, not at all. And just to add a comment: with the digital locker we are actually at the 2006 stage of S3; we still don't have encryption of any sort. Okay. Yeah, well, that's kind of sad, no? Because when people come and say they're designing 30-year architectures and things like that, it's interesting that you have to look back at where other people made mistakes. I mean, S3 didn't make a mistake; they did what they had to do to get their first X number of customers. But I'm just saying they got critical adoption only when they added crypto to it, and I think it's time we did the same. That's basically what I'm saying: it's okay to start where you started, a lot of businesses start that way, but what you navigate in the middle is what's important. There's one question from Pramod. Pramod, do you want to ask the question yourself? I can allow you to talk; you can unmute your mic and go ahead and ask. No problem. Okay, so Pramod asks: how do you think the government should participate in creating a public framework around this? How should the government participate... Yeah, well, the government should participate, there's no question about it, because this is like asking what they normally do in creating public infrastructure like a road or an airport. There is a fixed democratic process they usually follow around it, right? Is it necessary? Is it important? What's the standard?
What is the basic thing that everyone should follow? Is there something comparable already going on? Then put out the standard and let things evolve, because I'm pretty sure the moment they put out a standard which is modern and evolving, people will sign on to it. I think that makes sense. But the complexity around this much more granular structure that you're proposing could become pretty hard to manage. How do you think technology could solve that problem? So here is what it is. You can think of different ways of doing it. The most instructive case for me is the simple use case of my health record: just look at the fields and tell me what they are. This is not any different from what S3 offers in terms of a bucket. You just have to come back and say: well, it's a document, it has fields, and these are the four or five standard fields. You can define a standard format for it, and then say: these are the N encryption algorithms that are standardized, and these are the things you can select and share. What I'm saying is that, in general, these are not complicated problems once you start going down the path. You just have to keep iterating over a period of time to figure out: okay, this works, this works, this doesn't work. This is the same model most companies follow for solution iteration: you go back, write a prototype, see how it works. Does it make sense? Then what's the problem you have? And you keep going until you hit what Signal did with their protocol, for instance, where you barely see it; it just works. And then you come back and ask: okay, what are the most reasonable defaults? And you code up the defaults. This is what we do all the time, isn't it?
When you say: we've done all this usability analysis, this is where it works best, this is what most people do. And if you're a tweaker, you can go further, but if you're a normal person, these are your sensible defaults. Sure. Yeah, I just thought that at the individual level it's hard to manage, or even to understand how to use it for my own good, even if the frameworks are available. But like you said, it's a question of iterating to get to the right points. Yeah. I think there was one hand from Konad Kotiab. Konad, do you want to talk? We have a minute or two. Yeah, I hope you can hear me. Thank you for your session. I just want to take a step back and try to understand: usually what happens is it's either the right way to do things or the convenient way. When you start adding all these concerns of encrypting data at rest or sharing data in a differentially private manner, it adds some sort of burden on organizations, and that's where they decide not to go forward with it. Maybe the technology is not there, or the frameworks are not there. For example, if you're not hosted on Amazon, then doing all this on your own infrastructure is itself pretty complicated. So how do you see developers becoming more careful about these things, and organizations taking it as an important or necessary thing to do, rather than just a legal obligation they have to get through? Okay, so this is the free-market question, isn't it? In the sense that unless there is demand, you wouldn't want to do all this, because it is a burden. Typically the GDPR approach has been: look, we are going to force you to follow these rules by enacting a very broad privacy law.
And in my opinion it didn't work very well, because if you look at the recent studies on dark patterns, companies just converted the entire GDPR into a consent box, the mandatory "accept cookies" kind of thing, right? The organizational problem will only be solved when there is widespread awareness about why this is good for business. That is the only way I see it. And organizations will not be interested in solving these problems until consumers demand it. So that is the demand-supply problem; I don't see these problems going away very easily. That's my take on it. Thank you. Okay, I think that's about it; there are no more hands. Does anyone else want to ask questions? Okay, I have one last question, I guess, and then we can end. There is this whole concept of data trusts that is emerging, where a lot of organizations are looking at the idea of data fiduciaries: your data is stored in a fiduciary capacity, say in a digital locker, and then anonymized data will be shared for monetization. There is a lot of talk around this. So do you think we can push some of these discussions into that, in terms of the non-personal data framework that is being debated in the Government of India right now, or even as part of the privacy bill, the data protection bill? I mean, that is definitely one possibility, because infrastructure, once made, is not easily torn down; the best time is always when they are starting off something new. The more you push these discussions while something is not yet finalized, the more chance of success you have. Essentially, if one were to build a data trust around this, one could technically do it as a pilot. Yeah. And any data exchange system on top of it should be using differential privacy or crypto in some way to enable trust. Correct. So one major question in all of this: how do you verify it's working?
The advantage with crypto has always been that you get an option to verify, right? Even with S3, when you bring your own keys, you can verify the data hasn't been tampered with. How do you do that with, say, an existing system, especially when the control structures are not with individuals but mostly with, say, the government? You can't verify anything. Existing systems are unverifiable by default; they've been built like that. You basically have to take what they tell you at face value, and that's the problem we have: they are fundamentally unverifiable systems, and unverifiable systems are not trustworthy. So if you have to build a trustworthy system, you put out the algorithm, take it through the standards process, and you publish the implementations and the source code. How else would you build public infrastructure? So a better way to push this is actually to recommend it as a standard to the Ministry of Electronics and Information Technology and have these debates there, within the standards body. Okay. I think we don't have any more questions. Does anybody have one last question before we end? We'll end it here, considering we're almost done. Okay, I think we can end the session. Thank you, Anand. Thank you. Thank you so much for explaining all these individual terms and how things need to be built.