for Global Enterprise, Global Scholars, Expert Connect Series: Identity in the Digital World. My name is Ira Sager. I am Vice President of Global Learning Initiatives for the Center for Global Enterprise. For those of you new to the Center for Global Enterprise and our Global Scholars program, we are a nonprofit research institute focused on the study of global management best practices, the modern corporation, economic integration, and their impact on society. Our Global Scholars program is a worldwide learning community for business-interested students, academic faculty, and professionals. Through Global Scholars, we offer online courses and digital internships, as well as this Expert Connect webinar series. Participation in all our programs and membership is free. You can find out much more about our activities on our website, www.thecge.net. This is the first in what will become, beginning in September, a monthly forum on digital identity. But before we start this program, a few housekeeping notes. Today's program will be recorded and posted on the CGE YouTube channel. We will leave approximately 15 minutes at the end of this session for audience questions. If you have a question for our presenters, you will see a Q&A feature at the bottom of your screen. Please submit your questions using the Q&A function; we will try to get to all of them, time permitting. Now to today's program. Identity is central to our everyday lives, but for much of our history, identity has been designed for an analog world based on physical documents and face-to-face interactions, like this one right now. The transition to a digital economy requires a different, much more radical approach to managing identity. To understand the magnitude of this issue, consider that while most identities are associated with individuals, identities can also be assigned to legal entities like corporations, partnerships, and trusts.
They can be assigned to physical entities like cars, buildings, smartphones, and internet-connected devices (think about the Internet of Things), as well as to digital entities like patents and software programs. So you can see the complexity of this issue: in a digital world, identity touches nearly every aspect of our personal, business, and social lives, and our existing methods for managing identity, particularly digital identity, have simply not kept pace with technology. Dr. Irving Wladawsky-Berger, a CGE fellow and former vice president of Technical Strategy and Innovation at IBM, will lead our forum, and in monthly programs we will explore the opportunities and challenges surrounding this complex issue. Joining Dr. Wladawsky-Berger for our first forum, an introduction to digital identity, is Thomas Hardjono, technical director of the MIT Internet Trust Consortium. Thomas leads technical projects and initiatives around identity, security, privacy, and emerging technologies such as the Internet of Things, smart contracts, and blockchain systems. He is also co-editor of Trust::Data: A New Framework for Identity and Data Sharing. And with that, I'll turn it over to Irving and Thomas. Thank you.

Thank you, Ira. It's always nice to join another CGE webinar. When Ira, Chris Caine, and I were discussing what would be a suitable topic for the next Expert Connect series, I didn't hesitate to suggest identity. Frankly, the reason I suggested it is that it's probably the topic that is both most important and, to a large extent, most obscure. There are a number of advanced topics, clearly artificial intelligence, machine learning, and blockchain, but for those there is a relative consensus in the research community on what they mean. Identity is a little bit different. There isn't a lot of consensus; different organizations view it in their own way, and there aren't agreed-upon standards.
Perhaps the reason is that topics like AI and blockchain are relatively new technology topics, whereas identity, as Ira said, has been around for hundreds if not thousands of years. When we're born, we get a birth certificate. Later on, we get passports, driver's licenses, and Social Security numbers; some of us get Medicare cards; we get email IDs, and on and on. Are those identities? If they are, many of them are very weak, as we have seen with all the security incidents, the identity theft, and the theft of data that lets people take over our identities. So it's a very important subject. To kick off the identity series, I reached out to my MIT colleague Thomas Hardjono. I'm also a fellow in the Connection Science initiative at MIT, of which Thomas is the Chief Technology Officer, and I asked Thomas, one of the leading experts in identity that I know, to give us a primer on what digital identity is all about. With that, let me pass it on to Thomas. Thomas?

Hi, thank you, Irving, and thank you, Ira, for this invitation and opportunity. I want to talk about a number of items related to digital identities in particular. We have some slides to show, and I invite the audience to post questions in the chat at any moment during the presentation. Irving or Ira, maybe you can keep the chat window up on your side, because I can't see the chat box right now, just in case there are any questions. But I'll continue on now, if everybody's okay. So the topic today is identifiers, digital identities, and this thing called personas, plus a few other topics dangling off those three things. The topics are grouped into four areas.
First, I'll talk about what people mean when they say identifiers and namespaces, and what they mean when they talk about attributes, assertions, or claims. I'll talk about how the web uses web single sign-on, which is what we all use today: when you log into a particular website, say Amazon, you're using this thing called web single sign-on. Then I'll talk about some projects at MIT, particularly one on defining the notion of identity correctly in terms of data and contextual data. Finally, I'll talk about data privacy, because, as Irving mentioned earlier, when data about us is all over the internet and that data is stolen, hackers can create fake digital identities using our data. That is how we end up in this whole identity-theft mess, where people are able to open bank accounts on our behalf and so on. So data privacy is the last topic. But before we go there, I'd like to go through the first topic: what do we mean when we say identity versus identifier? A digital identifier is really just a sequence of bits, often alphanumeric characters, that we use to differentiate a person, an object, an organization, or a company from another. Typically, an identifier is meaningful only in a given context, and the technical term for that context is namespace. You know you're in the company of geeks when they start talking about namespaces; what they really mean is context. An example on the slide is your credit card number, a 16-digit number that has meaning in the card payments industry.
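To make the identifier/namespace distinction concrete, here is a minimal Python sketch, assuming a toy in-memory registry. The class names `QualifiedId` and `NamespaceRegistry`, the namespace labels, and the sample card number are hypothetical illustrations, not part of any real system. The point is only that an identifier is always looked up together with its namespace, so the same string resolves to nothing outside the context that issued it.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class QualifiedId:
    """An identifier paired with the namespace (context) that gives it meaning."""
    namespace: str
    value: str


class NamespaceRegistry:
    """Toy registry: each namespace keeps its own table of identifier -> subject."""

    def __init__(self) -> None:
        self._tables: dict[str, dict[str, str]] = {}

    def register(self, qid: QualifiedId, subject: str) -> None:
        table = self._tables.setdefault(qid.namespace, {})
        if qid.value in table:
            # Uniqueness is only guaranteed within one namespace.
            raise ValueError(f"{qid.value!r} already taken in {qid.namespace!r}")
        table[qid.value] = subject

    def resolve(self, qid: QualifiedId) -> str | None:
        # Lookup is scoped: an identifier only resolves inside its own namespace.
        return self._tables.get(qid.namespace, {}).get(qid.value)


registry = NamespaceRegistry()
card = QualifiedId("card-payments", "4000123412341234")  # hypothetical card number
registry.register(card, "Thomas")

print(registry.resolve(card))                                  # Thomas
print(registry.resolve(QualifiedId("web-login", card.value)))  # None
```

The second lookup fails for the reason given in the talk: the digits are the same, but the namespace is different.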
If you use this credit card at a shop that has a point-of-sale (POS) terminal, the terminal expects this sequence of numbers, plus other information such as the expiration date and the CVC number on the back, which is supposed to be secret but which you end up giving out all the time anyway. Another example is your Social Security number. It has a fixed format, and it's meaningful in the context of government transactions, when you deal with government services. It has also been used in the financial industry, for example as a means for a financial institution to verify that you are a US citizen or US resident. Another example is your phone number, which is really an identifier for you and your device. In fact, your mobile phone has another identifier, the IMEI, a hardware number tied to the handset itself, while the SIM card carries your subscription. So when you buy a new mobile phone, you can just move the SIM card into the new phone and keep your phone number: the identifier for you as a person, the phone number, stays the same, but the IMEI changes. The phone number is meaningful in the context of the telephone network. That's why, when you call people overseas, you have to dial the country code, +1 for the US or +44 for the UK: it puts the rest of the digits in the context of that domain. If you dial +44, the network knows that the ensuing digits are meaningful in the context of the United Kingdom's telephone system. Similarly, when you log in to a website today, you put in your email address, which is meaningful in the context of that website or email service. So that is what we mean by identifier and namespace. Another way of looking at a namespace is as a scope.
That is, it limits the scope in which an identifier is used, so that my credit card number is useful and usable only in the scope of the card payments industry. If I take my credit card number and try to use it to log into my internet account, it will not work, because the namespace, the context, is different. Now, what's interesting and very relevant to our discussion about identity is the idea of data. In many instances, the identifier that is assigned to us, or that we choose, is used internally in the back end that handles our transactions to link to a certain set of data. For example, your credit card number is used to key your list of credit card transactions, so that when you get your monthly statement from Mastercard or Visa via your bank, you see those expenditures as data. So an identifier has meaning in the transaction context, when you go to a point-of-sale terminal, which people sometimes call the front channel, since you're dealing face to face. But in the back end, the same identifier also links to a whole data set; it's often used to index into the data, which is a pretty natural thing to do. Why is that significant? Because the set of data tied to an identifier is often used to compute a thing called an assertion, and to compute those assertions, claims, or values, you need algorithms. A good example is our current credit score system. When you want to apply for a mortgage, you go to the bank, and the bank will ask you for a set of identifiers, typically including your Social Security number. The bank might even look at your credit card number, if the card is issued by that bank.
That way they can say, yes, this person is truly our customer. Then the mortgage provider will request your consent to check your credit score. Typically, that means they connect to one of the three credit bureaus today: Equifax, Experian, and, I think, TransUnion; I can never remember the third one. The bureaus have algorithms that compute a credit score based on the digital data they have about you. So they'll say, okay, based on the data we have today about Thomas, his score is 650 out of 800. They might then put it in some fancy format and digitally sign it, and when something has been signed, we call it an assertion. So when Equifax produces a credit report, it is making a claim, as a trusted authority, that Thomas's credit score is 650 out of 800. What's important here is that two aspects come into the picture compared to the previous slide: algorithms play an important role, and there is this thing called attributes. An attribute is typically the term used for the fixed portion of the information. Your date of birth, your Social Security number, your place of birth: those are fixed, pretty much for life, and those fixed data points are called attributes. The term assertion, by contrast, means that an algorithm is used to compute over a data set. The data set will change: it may grow over time, some of it may be deleted, or the bureau may use only the last six months of your data. The algorithms might change too; a credit score company might update its algorithms to include other data points and data sets. That's why we distinguish assertions, or claims, from fixed attributes.
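The attribute/assertion distinction can be sketched in a few lines of Python. This is a toy under loud assumptions: the scoring rule `toy_score` is invented and is not any credit bureau's real algorithm, the sample values are placeholders, and an HMAC with a hypothetical shared secret stands in for the public-key digital signature a real issuer would use. Attributes are stored as-is; the assertion is recomputed from a data set indexed by the subject's identifier, tagged with the algorithm version, and signed.

```python
from __future__ import annotations

import hashlib
import hmac
import json

# Fixed attributes: stored as-is, essentially unchanging for life (placeholders).
ATTRIBUTES = {"ssn": "000-00-0000", "date_of_birth": "1970-01-01"}

# Data set indexed by the subject's identifier; True = a payment made on time.
PAYMENT_HISTORY = {"000-00-0000": [True, True, False, True]}

# Hypothetical secret; a real issuer would sign with a private key instead.
ISSUER_KEY = b"hypothetical-issuer-secret"


def toy_score(history: list[bool]) -> int:
    """Invented rule (NOT a real credit model): 300 + 500 * share of on-time payments."""
    if not history:
        return 300
    return 300 + round(500 * sum(history) / len(history))


def _sign(claim: dict) -> str:
    payload = json.dumps(claim, sort_keys=True).encode()
    return hmac.new(ISSUER_KEY, payload, hashlib.sha256).hexdigest()


def make_assertion(subject_id: str) -> dict:
    """Compute a claim over the current data set and sign it."""
    history = PAYMENT_HISTORY.get(subject_id, [])
    claim = {"subject": subject_id, "score": toy_score(history),
             "max": 800, "algorithm": "toy-v1"}
    return {**claim, "sig": _sign(claim)}


def verify(assertion: dict) -> bool:
    claim = {k: v for k, v in assertion.items() if k != "sig"}
    return hmac.compare_digest(_sign(claim), assertion["sig"])


a = make_assertion(ATTRIBUTES["ssn"])
print(a["score"], verify(a))  # 675 True
a["score"] = 800              # tampering with the claim breaks the signature
print(verify(a))              # False
```

Appending a new payment record to `PAYMENT_HISTORY`, or bumping the algorithm version, produces a different assertion while the attributes stay put, which is exactly why the talk keeps the two terms apart.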
Now, in the literature, in the news, and on many websites, we find the term identity provider being used. Strictly speaking, an identity provider is a service provider on the internet that assigns you an identifier, and in this case your identifier is typically an email address. When I go to, say, Gmail and ask for a free email address, they'll ask me to pick my email address, and invariably it has to end in @gmail.com, and there's a reason for that: gmail.com is the namespace in which that identifier is meaningful. If my desired email name, say thomas, has been taken by somebody else, they might offer me another one, like thomas2018, as an option. These identifiers, email addresses, are linked to an account, an email account in this case. So again you see the same pattern: an identifier is used to link into an account. An account typically holds collections of data; in the case of an email provider, the data is your email, but on Pinterest or some of these other services where you collect photos, your data is perhaps photos. And today, as we all know, this process can be cumbersome. On the left of the slide you see the form: every time you sign up for a new service or discover a new retail store online, you are typically asked to re-enter the same things again, name, address, city, phone number, and it becomes cumbersome. It also duplicates many fixed attributes that can change but typically do not change often. My home address, for example, might change once every couple of years, or once every five years, but it certainly doesn't change every day.
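The allocation behavior described here, where the provider hands out a name that is unique only inside its own domain and proposes an alternative when the preferred one is taken, can be sketched as follows. The `EmailProvider` class, the `example.com` domain, and the year-style numeric suffix are illustrative assumptions, not how Gmail or any real provider actually works.

```python
from __future__ import annotations


class EmailProvider:
    """Toy identifier provider: local names are unique only within its own domain."""

    def __init__(self, domain: str) -> None:
        self.domain = domain
        self.accounts: dict[str, list[str]] = {}  # local part -> mailbox contents

    def suggest(self, desired: str, suffix: int) -> str:
        # Try the desired name, then desired+suffix, desired+(suffix+1), ...
        candidate, n = desired, suffix
        while candidate in self.accounts:
            candidate = f"{desired}{n}"
            n += 1
        return candidate

    def sign_up(self, desired: str, suffix: int = 2018) -> str:
        local = self.suggest(desired.lower(), suffix)
        self.accounts[local] = []  # the new identifier now keys an (empty) account
        return f"{local}@{self.domain}"


provider = EmailProvider("example.com")
print(provider.sign_up("thomas"))  # thomas@example.com
print(provider.sign_up("thomas"))  # thomas2018@example.com
print(provider.sign_up("Thomas"))  # thomas2019@example.com
```

Note that the provider learns nothing about the person here beyond the chosen name, which is the speaker's point about "identifier providers".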
So that's the first issue: massive duplication. Sometimes it's also erroneous; I might forget to put in my apartment number, or put in the wrong street number, and so on. So when people say identity providers, the correct terminology is really identifier providers, because those entities don't have data about you. They don't know anything other than your identifier. As such, they're not really creators of identity; they're just creators of identifiers. So why did we end up with a thing called an identity provider? Back in the late 90s and early 2000s, when the web was just booting up and there was a proliferation of online retail stores, the goal was to spare consumers from repeating this process of creating new accounts, new passwords, and so on. The idea was that a special type of service provider, called an identity provider or IdP, would maintain that information for you, and the IdP would enter into a business agreement with all these retail stores. What happens, and this is happening today as shown in the diagram, is that when you go to a retail store, it asks you to log in using one of these other services: in the diagram, it asks, will you sign in with Facebook, or with Google? If you don't want to do either, you can log in using the service maintained by the retail store itself. This process is called mediated authentication, because the end user presents credentials to the identity provider, and the identity provider, rather than the store itself, checks them.
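Mediated authentication can be sketched in Python under several hypothetical assumptions: one IdP and one retail store (the relying party) share an HMAC key under their business agreement, and the dot-separated token format is made up for illustration. Real deployments use standardized protocols such as SAML or OpenID Connect, each with its own formats and rules. What the sketch shows is the division of labor: the password only ever reaches the IdP, and the store only checks that a token came from the IdP, is addressed to this store, and has not expired.

```python
from __future__ import annotations

import hashlib
import hmac
import json
import os
import time

# Hypothetical key agreed between the IdP and the store (the relying party).
SHARED_KEY = b"hypothetical-idp-store-agreement-key"


def _hash(password: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)


def _sign(body: str) -> str:
    return hmac.new(SHARED_KEY, body.encode(), hashlib.sha256).hexdigest()


class IdentityProvider:
    """Holds the user's credentials and vouches for the user to relying parties."""

    def __init__(self) -> None:
        self._users: dict[str, tuple[bytes, bytes]] = {}  # user -> (salt, hash)

    def enroll(self, user: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[user] = (salt, _hash(password, salt))

    def authenticate(self, user: str, password: str, audience: str) -> str | None:
        record = self._users.get(user)
        if record is None or not hmac.compare_digest(record[1], _hash(password, record[0])):
            return None
        body = json.dumps({"sub": user, "aud": audience, "exp": time.time() + 300})
        return body + "." + _sign(body)


class RelyingParty:
    """The store: never sees the password, only the IdP's signed token."""

    def __init__(self, name: str) -> None:
        self.name = name

    def accept(self, token: str | None) -> str | None:
        if token is None:
            return None
        body, _, sig = token.rpartition(".")  # hex signature contains no dots
        if not hmac.compare_digest(_sign(body), sig):
            return None  # not issued under our agreement, or tampered with
        claims = json.loads(body)
        if claims["aud"] != self.name or claims["exp"] < time.time():
            return None  # meant for another store, or expired
        return claims["sub"]


idp, shop = IdentityProvider(), RelyingParty("shop.example")
idp.enroll("alice@idp.example", "correct horse")
token = idp.authenticate("alice@idp.example", "correct horse", "shop.example")
print(shop.accept(token))                           # alice@idp.example
print(RelyingParty("other.example").accept(token))  # None (wrong audience)
```

The audience check is what keeps a token issued for one store from being replayed at another.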
Now again, there are side effects, and this is another set of discussions: when you sign on with an IdP, the terms of service are often not favorable to the consumer, depending on who the provider is. For example, an email provider might say, we'll give you a free email service, but we have the right to keep copies of all your emails. They might claim the right to mine the data, to scan through it, and they might even claim the right to sell your email address to a third party such as a marketing company. These are some of the negative sides. The positive side, of course, is that it's very convenient: you just log in once. But the user might be affected by the sharing of data across different retail and marketing organizations.

Thomas, can I ask you a question?

Yes, yes.

First of all, let me remind those listening that if they have a question, they don't have to wait until the end; they can submit it via the Q&A button at the bottom of the screen, and I will look at it and ask it at the right time. Thomas, as you were talking about identity service providers being really more identifier service providers, the thought that came to my mind is that something like TSA, which assigns you a number that helps you get through airports much faster, is much closer to a real identity service provider. They give you a number, you keep it for a long time, and it's based on a lot of data they gather about you, a kind of security clearance, as opposed to just going to Gmail and asking for an email ID. Is that a correct view?

Yes, that would be correct, in my opinion. Also because, Irving, the value of the transaction is higher with TSA, because you're trying to get through the airport, right?
That event is very important; you need to get through the airport quickly to catch your plane. Getting a free email address, by contrast, is a lower-value transaction. If you lose that email account, or somebody hacks it, you're just given a new one. Whereas when TSA gives you this identifier and the card, they go through a lot of background checking. The implication, of course, is that the event, the transaction in this case, is of very high value.

But Thomas, let me anticipate something that I know you'll be talking about. An email address may seem like no big deal until it gets stolen from you. Then all of a sudden people start using this supposedly uninteresting identifier to steal your identity, steal money, and things like that. So we are in an interesting situation where an identifier has little value until somebody hacks in, steals your identity, and bad things start to happen. I suspect that's a lot of our dilemma: we have all these identifiers that, in a more digital world, have far more value than originally anticipated. Is that a correct view?

Yes, and that's simply because of the growth of the internet. Part of stealing your identity is getting hold of one or more of your free email accounts, and people are hit by phishing attacks aimed at their email accounts all the time. That's the first step a hacker takes in order to go on and create a fake identity based on other data about you.

By the way, somebody asked a question I should have answered: what does TSA stand for? TSA in the US runs a program you can apply to that, if you're approved, gets you through airport lines much faster, and when you land there are kiosks that get you through much faster than most people. I don't know exactly what TSA stands for; probably Transport-something.
Irving, it's the Transportation Security Administration.

Okay, Transportation, thank you. But it's a real thing, and to get it I had to go to the airport and be interviewed. I think they take your fingerprints and so on, so they do a lot more work; not just anybody can get it. I'm sorry I didn't specify. Go ahead, Thomas; sorry I wasn't sure.

Okay, so this next slide very simply captures web single sign-on. To recap what I said before: the user, shown in green, is trying to get to the RP. RP stands for Relying Party, a term borrowed from the card payments industry. Back in the 70s and 80s, when the credit card industry began to grow, it came up with what is called the Four Corners Model: your bank, called the issuing bank, gives you a card; you bring the card to the store; and when you swipe the card, the store transmits your card number to the network (there's another term for that). So the retail store is dependent, reliant, on the checking service, the people who check your credit card and say yes or no, this is a valid card. That name has stuck: Relying Party, a party that relies. In this case, the RP relies on the identity provider, the IdP, to perform the authentication. But again, we're just borrowing terminology. It's important to note that in the late 90s and early 2000s, the browser was the primary tool for people to shop on the internet (it still is); we didn't have smartphones or mobile apps. So the term web single sign-on is very meaningful in the context of the browser.
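The "single" in web single sign-on comes from the IdP remembering that this browser has already authenticated. A minimal Python sketch, assuming a hypothetical session cookie held by the browser for the IdP's domain: the first relying party triggers a password prompt, and later relying parties are served from the existing IdP session. All names are made up, and the plaintext password table is for illustration only; real browser SSO adds redirects, signed assertions, and session expiry.

```python
from __future__ import annotations

import secrets
from typing import Callable


class SSOIdentityProvider:
    def __init__(self, passwords: dict[str, str]) -> None:
        self._passwords = passwords          # toy store; never keep plaintext in practice
        self._sessions: dict[str, str] = {}  # IdP session cookie -> user
        self.prompts = 0                     # how many times the user typed a password

    def login(self, cookie: str | None,
              ask_password: Callable[[], tuple[str, str]]) -> tuple[str, str]:
        """Return (cookie, user), prompting only if there is no valid IdP session."""
        if cookie in self._sessions:
            return cookie, self._sessions[cookie]
        self.prompts += 1
        user, password = ask_password()
        if self._passwords.get(user) != password:
            raise PermissionError("bad credentials")
        cookie = secrets.token_hex(16)
        self._sessions[cookie] = user
        return cookie, user


class Browser:
    def __init__(self) -> None:
        self.idp_cookie: str | None = None  # cookie the browser keeps for the IdP's domain

    def visit(self, rp: str, idp: SSOIdentityProvider) -> str:
        # The relying party sends the browser to the IdP, which sees its cookie.
        self.idp_cookie, user = idp.login(self.idp_cookie, lambda: ("alice", "s3cret"))
        return f"{rp}: signed in as {user}"


idp = SSOIdentityProvider({"alice": "s3cret"})
browser = Browser()
print(browser.visit("books.example", idp))  # books.example: signed in as alice
print(browser.visit("music.example", idp))  # music.example: signed in as alice
print(idp.prompts)                          # 1
```

A second browser with no IdP cookie would be prompted again, which is why single sign-on is tied to the browser in this model.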
Today, with smartphones, we have apps. It's a different model and the transaction flow looks similar, but under the hood, where we don't see it, apps use a thing called an OAuth token to achieve the same single sign-on. So what are the challenges today? This is what Irving was alluding to earlier in this presentation. There are lots of challenges; I've picked out just a few here. First, social platforms are collecting massive amounts of personal behavior data. Those of us who have Amazon accounts and, like myself, buy a lot of stuff (yesterday or the day before was Amazon Prime Day, for those who like discounted goods) know that part of engaging with these platforms is that they collect behavior data about what items you're looking at. Social platforms, especially Facebook and Twitter, keep track of what we click on each page, and so on. That behavior data is immensely valuable, and in fact we would argue here at MIT that it is part of our individual data: it's a set of data that can distinguish us from the next person on the same platform. So that's the first issue. The second issue is traditional institutions: financial institutions that have been around for 50 years and so on, which have some data about us. They are constrained, in the sense that specific regulations in the telecom industry and the financial industry prevent them from making free use of our data. So these institutions are finding that they're being left out in this internet generation, because they have regulations basically preventing them from monetizing consumer data.
The third issue, which is true across sectors that are online today, is that the citizen, you and I, are typically outside the loop of the flow of personal data. When the Equifax breach occurred, I called Equifax to freeze my report and did the same for my wife. And my wife asked, who is Equifax? We didn't know who they were, and how did they have data about me? That's an indicator that today the individual is outside the loop of the entire data ecosystem. So we think we need a new paradigm for understanding this future personal data ecosystem, particularly because personal data is really a new asset class; I'll talk about the World Economic Forum report in the next slide or so. We need a better paradigm for looking at personal data, one that gets everybody within the same ecosystem and provides transparency for the flow of data. Okay, moving on: core identity and persona. This is part of the work we have been doing here at MIT to address some of the issues with regard to personal data. I have been involved in identity and identifiers since the 90s, believe it or not, with the thing called X.509 digital certificates; I was working for probably the largest CA back in the 90s. For those who know what a certificate is, it's simply a public key together with my name or my email address, digitally signed by what is called a certificate authority. What it does is simply make an assertion that Thomas, with this email address, is the legal owner of this public key. That's a separate discussion, and that's really not identity either; it's an identifier. So we came up with a broader definition of identity, and we named it core identity.
We named it core identity because it has to do with you inherently. It's all the digital data out there that is held about you, whether by Equifax or by Google, and that is uniquely about you; regardless of where it is or who holds it, all of that makes up who you are. And the amount of that data is growing, because as you keep engaging with the internet, this set of data spread all over the internet keeps growing. So we thought we needed a new name, and, for lack of creativity, we came up with core identity, because it's core. In the diagram, we're trying to show that your personal attributes, together with your work data, your health data, all the data your gadgets produce, whether from cell phones or other physical devices, and your GPS data, all of that combined is what we call core identity. Some parts of that data really should be kept private, in the sense that they shouldn't be available to just anybody who asks for them, for example because of the issue of identity theft. The other term we introduced is, again, very simple: persona. When I go to work, in an office at a company, there are some data points the company needs about me, and some data points the company generates about me as part of my employment. For example, every month you get a pay stub, proof of pay. That's data too, and it's also growing. Now, the set of data points and attributes used in the context of work is different from the set used in, for example, the social context.
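The core identity / persona split can be sketched as a single store of attributes with per-context views. This is an illustrative sketch only: the attribute names, sample values, persona definitions, and the `persona_view` helper are hypothetical and are not the MIT design. Each persona releases only the attributes its context needs, so a social club never receives work data.

```python
from __future__ import annotations

# Core identity: everything known about one person, across all contexts (placeholders).
CORE_IDENTITY = {
    "name": "Alice Example",
    "home_address": "1 Main St",
    "mobile": "+1 555 0100",
    "office_number": "E14-201",
    "employee_id": "A-4411",
    "pay_stubs": ["2018-05", "2018-06"],
    "golf_gear_brand": "AnyBrand",
}

# Each persona lists the attributes that may be released in that context.
PERSONAS = {
    "work": {"name", "office_number", "employee_id", "pay_stubs"},
    "golf_club": {"name", "home_address", "mobile", "golf_gear_brand"},
}


def persona_view(core: dict, persona: str) -> dict:
    """Return only the attributes that the given persona (context) allows."""
    allowed = PERSONAS[persona]
    return {k: v for k, v in core.items() if k in allowed}


club = persona_view(CORE_IDENTITY, "golf_club")
print(sorted(club))             # ['golf_gear_brand', 'home_address', 'mobile', 'name']
print("office_number" in club)  # False
```

Keeping one underlying store and filtering per context, rather than copying data into each context, also avoids the duplication problem raised earlier with sign-up forms.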
Let's say I'm a member of a golf club (actually, I don't play golf), or some social club like a golf club. They'll give me an identifier, a membership number, and they might want to know my home address and my mobile phone number; they might even ask what brand of gear I use. But the golf club may not be interested, and perhaps should not be interested, in work data, like my office number, my office phone number, or the building I work in. So there's a clear need to separate contexts, and the easy label we use to name those contexts is persona. A human person, Joe or Alice, will have multiple personas, one relevant to each context in which they live. Say you are active in your town: you go to town hall meetings and you vote in local elections. Then you have your town persona. So it's about context and relationships, and one could even argue that your friends on Facebook, your social connections, are contextual information, so we could call that your social connections persona. And if you are active on both Facebook and WeChat, you could say you have two personas, one on WeChat and one on Facebook. So persona is just an easy label for this notion of context. So what are the issues? We're moving on to data privacy. Earlier I talked about personal data as the new asset class. The report there on the left is worth reading; it's a nice read. It was published by the big data group at the World Economic Forum, and the chair of that group happens to be my boss here at MIT, Professor Alex "Sandy" Pentland.
He is one of the top seven social analytics experts in the world. None of the items in the personal data report should be surprising to us, but it argues for the need for a better way to look at data, manage data, and address privacy issues. Just Google "WEF personal data new asset class" and you'll find the report; there's a PDF there. This issue of personal data and privacy, how to treat data, and what the personal data ecosystem is, is now being addressed not just by the United States but also by folks in Europe and in Asia. That's because of a general recognition that personal data is an asset, not just for us, the subjects whose data is everywhere, but for basically the entire economy, because many transactions in the economy today rely on the accuracy of this data. If you apply for a mortgage, the mortgage provider will make a risk-based decision based on the data available to it. So personal data is definitely an important aspect of the future of the internet.

Thomas, let me interrupt once more, if I may.

Yep, go ahead.

First of all, let me do a time check. We have 20 minutes left in total, so we should try to wrap up the presentation in about 10 minutes. And Chris Caine had a question. Chris, do you want to ask it?

Well, I would like to ask a question, Irving, but if Thomas is going to wrap up momentarily, I can wait.

Well, it would probably be 10 minutes, so maybe you want to ask it now.

Okay, all right. Thomas, I'm interested in the distinction between identifiers and identity. If I heard you correctly, there's a gap between an identifier and an identity. Sometimes they align, but usually they don't. I have two questions for you.
One, are there technologies that align or lock identifiers and identity together so that there is no gap? And second, how commercially disruptive would it be to eliminate this gap between identifiers and identity? Sure. On the first question: yes, there is a gap, in the sense that there is no coherent way to view, manage, or provide accountability for the entities and organizations who have data about me. So if I make a transfer or apply for a mortgage, I don't know how many other organizations besides the mortgage provider actually get access to copies of my data. And in this case it's not only a matter of privacy; it's also a matter of accuracy, right? One of the problems colleagues in the financial industry made me aware of is that onboarding consumers can be very expensive for new businesses, right? So they typically, quote, buy data from third-party aggregators, but then there's the issue of the quality of the data the aggregators have, and apparently it ranges from data they scrape from the internet to data they, quote, buy from other third parties. So there's the question of the provenance of the data and the accuracy of the data. And these two combined actually harm the service providers themselves, because if you are a marketing company and you say, give me a thousand names of people who might buy a Toyota in the town of Cambridge, Massachusetts, you might get a thousand identifiers, but you have no real indication of the quality of that data until you try using it, and you might get a low return on the investment you made in that data. The other aspect is the need for a special kind of arrangement for entities who have data about me. We like to talk about a fiduciary relationship. So if I have a bank account with, say, Bank of America, they have a legal fiduciary obligation to keep my monetary information private, right?
So they're not going to be telling people my bank balance, and so on. Right now we don't have the equivalent of a fiduciary obligation on the part of those who have data about me. Correct me if I'm wrong, but say I'm a customer of a mobile phone carrier, a big one, AT&T, Verizon, any one of these in the United States. As far as I know, they do not have any fiduciary relationship or obligation to me with regard to my telco data, GPS data, and so on. And so we kind of need a new model. At MIT we've been looking at the credit union. Credit unions are different from banks, right? So is there a way to create a credit data union, an organization that keeps data accounts for people and acts more like a credit union than a bank? So you are, I don't know what I would call myself, a shareholder in that, quote, credit data union model, and that's one of the things that could be used, Chris, to fill in the gap. With that model, I could keep copies of my data with this data union organization, and I could go in and personally attest that the data is correct: yes, the credit card transactions I just uploaded are correct; that's what I got from MasterCard, that's what I got from Visa, right? So suddenly you have better accuracy and better provenance of the data, because the consumer is involved in providing the data. The other side is that we need a way to remunerate the consumer. Right now, after I get my mortgage, my data is floating out there, and I'm not getting any value directly out of that data. But what if I could provide copies of my data through something like this credit data union, where I could say, yes, you can run these algorithms, and every time an algorithm is run that includes my data, I get a fraction of a penny?
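The remuneration idea is simple arithmetic, and a short sketch makes the scale concrete. It assumes the per-run rate used in this talk, a hundredth of a penny (0.0001 dollars), and a flat payment per algorithm run that includes the member's data; the function names are hypothetical, and no real credit data union payout scheme is implied. `Decimal` is used so that sub-cent amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

# Assumed rate, taken from the talk's example: a hundredth of a penny per run.
RATE_PER_RUN = Decimal("0.01") / 100  # dollars per run, i.e. 0.0001


def payout(runs: int, rate: Decimal = RATE_PER_RUN) -> Decimal:
    """Dollars owed to one member for `runs` algorithm executions that used their data."""
    return rate * runs


def runs_needed(target: Decimal, rate: Decimal = RATE_PER_RUN) -> int:
    """Smallest number of runs whose payout reaches `target` dollars (ceiling division)."""
    whole, remainder = divmod(target, rate)
    return int(whole) + (1 if remainder else 0)
```

At that rate, the hundred dollars in a month that Thomas mentions corresponds to one million runs over data that includes the member's records.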
Maybe I get a hundredth of a penny, right, which is next to nothing. But over time I might find that, hey, this month I made a hundred dollars as a consumer, because my data was used in computations of algorithms over a whole range of data, including mine, right? So that's how it goes. Thomas, could you show me? Go ahead. No, no, no, we just need to move on because I'm getting quite a few questions. There are two in particular that I think you're about to get to. First, what strategy should a platform business use to guard against hackers stealing your data and identity? And second, if an identity is stolen, what can be done to stop it from being used? I know that's what you're going to cover in the next section. And just keep an eye on the time; we need to stop at 11, okay? Right. So one way we believe we can reduce the opportunity for data theft is simply to disallow, or prevent as far as possible, copying of data that the user has not consented to. My mortgage data really should not be sold around the web, and it should not even be accessible to third-party aggregators, right? This black market for data, just in the financial industry, really needs to stop, because those players don't have a fiduciary relationship or obligation to me. If one of them, holding data about me that they copied or bought from somebody else, gets hacked, I'm the citizen, I'm the entity who suffers, right? So we're looking at this thing called open algorithms, and the idea is that instead of copying data for processing, we really need to move the algorithm to the data.
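The open-algorithms idea, sending a vetted question to the legitimate data holder and getting back only an aggregate ("safe") answer instead of the raw records, can be sketched as follows. This is a minimal illustration under stated assumptions, not the MIT OPAL implementation: the `DataHolder` class, its method names, and the rule of suppressing counts below a minimum group size of 5 are all hypothetical choices made for the example.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


class DataHolder:
    """Hypothetical legitimate data holder: runs vetted queries in place, never ships raw records."""

    def __init__(self, records: List[Record], min_group_size: int = 5) -> None:
        self._records = records                # stays behind the holder's firewall
        self._min_group_size = min_group_size  # assumed suppression threshold
        self._vetted: Dict[str, Callable[[Record], bool]] = {}

    def approve(self, name: str, predicate: Callable[[Record], bool]) -> None:
        """Register a query that has been reviewed (e.g. for bias) before it may run."""
        self._vetted[name] = predicate

    def ask(self, name: str) -> Optional[int]:
        """Run a vetted query locally and return only an aggregate count.

        Counts below the threshold are suppressed (None), since a very small
        group could single out individuals.
        """
        if name not in self._vetted:
            raise PermissionError(f"query {name!r} has not been vetted")
        predicate = self._vetted[name]
        count = sum(1 for record in self._records if predicate(record))
        return count if count >= self._min_group_size else None


# Illustrative data: the raw records never leave the holder.
holder = DataHolder(
    [{"city": "Cambridge", "washer": "BrandA"}] * 6
    + [{"city": "Boston", "washer": "BrandA"}] * 2
)
holder.approve("cambridge_brand_a",
               lambda r: r["city"] == "Cambridge" and r["washer"] == "BrandA")
holder.approve("boston_brand_a",
               lambda r: r["city"] == "Boston" and r["washer"] == "BrandA")
```

A marketing company asking "how many people in Cambridge own this brand of washing machine" gets back a number, never names or addresses; a query that was never approved is refused outright, and a group too small to hide individuals comes back as no answer.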
What that means is that if a company wants to know something about me, instead of buying data about me, it should find the legitimate entities who hold data about me with my approval, send the question to them, have it computed there, and get an answer back. So you always return safe answers. Safe answers means that unless the individual has consented, the system should return aggregate answers only. So a marketing company might ask legitimate data holders, how many people who live in Cambridge, Massachusetts have washing machines of this brand, right? That's a pretty neutral question. It doesn't identify or re-identify any particular entity. Thirdly, we really need to start looking at the algorithms themselves. If you look at GDPR Article 7 and Article 22, they talk about algorithmic decision-making, in the sense that the subject needs to be informed about which algorithms are being computed and what each algorithm actually does. And so we talk about vetting: an expert really needs to look at the algorithm to know that it's free from bias and so on. To answer your other question, Irving: how do companies like Experian, Equifax, and so on, or my bank, who legitimately hold data about me and have a relationship with me, protect all the consumer data in their backends? One way is encryption. The world really needs to move to a situation where all data is encrypted, always encrypted, both during storage and during computation. There's a whole field of cryptography, homomorphic encryption and multi-party computation (MPC), that allows you to do some forms of computation on encrypted data without the keys, right? That's the interesting thing. So if the data is always encrypted and distributed across multiple nodes, it makes the attacker's job more difficult.
They not only have to attack multiple nodes, multiple backends; they now have to try to decrypt the data without the key. So that's another project we're looking at at MIT to help solve this data hacking problem and the negative impact it has on consumers. And I think this is my last slide. This is just a pictorial representation of OPAL. Instead of moving the data, what if you could just send the algorithm to the data repository? Each of these companies, telcos, banks, would do the computation behind their firewall. And in this way, it would actually help them preserve their own data infrastructure. I believe that's the end of my talk, just a bunch of references there, but maybe you want to open up the floor for questions, Irving? Yeah, let me summarize a few key points you made that I think are very important. I don't know if they are unique to the approach of the MIT Internet Trust Consortium, but first of all, identity is all related to data. You cannot separate identity from data, and identity is based on all the data that the institution, or whoever, has about you. Is that a correct view? That's correct. It's not just institutions, it's everyone, including yourself, right? If you have two data sets about two different people, you pretty much have two different identities. And so the core identity of a person doesn't really change: if the data set stays the same, the identifiers can be changed. Right, but then the real issue is that when an institution has data about us, we don't know where else they'll send the data. We don't know who they'll sell or transfer the data to. So there is data about us all over the place, including in institutions we never did business with.
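One building block behind multi-party computation is additive secret sharing, which shows concretely why spreading data across nodes forces an attacker to compromise all of them. The sketch below illustrates that technique only; it does not implement homomorphic encryption, and it is not the MIT project's code. The modulus choice (the prime 2**61 - 1) and all function names are assumptions for the example, and the inputs and their total must stay below the modulus for the sum to come out right.

```python
import secrets
from typing import List

PRIME = 2**61 - 1  # assumed field modulus; all arithmetic is done mod PRIME


def share(value: int, n_nodes: int) -> List[int]:
    """Split `value` into n additive shares; any n-1 of them look uniformly random."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_nodes - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def reconstruct(shares: List[int]) -> int:
    """Only the sum of all shares reveals the value."""
    return sum(shares) % PRIME


def secure_sum(values: List[int], n_nodes: int = 3) -> int:
    """Simulate data owners sending share i to node i; each node adds locally.

    No single node ever sees an input value, only random-looking shares;
    combining the per-node totals reveals the aggregate and nothing more.
    """
    node_totals = [0] * n_nodes
    for value in values:  # each data owner splits its own value
        for i, s in enumerate(share(value, n_nodes)):
            node_totals[i] = (node_totals[i] + s) % PRIME
    return reconstruct(node_totals)
```

Stealing one node's storage yields only random numbers; recovering any individual value requires every node's share of it.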
They just got it from somebody else, and that contributes a lot to the hacking problems we see: data gets stolen from somebody we never dealt with, and all of a sudden our identities are compromised. Is that correct? Yes, that's correct. And in fact one of the articles in the GDPR captures the notion that if an organization has data about a citizen, it really needs to let the citizen know, because it has obligations. If they bought the data from somebody else and they get hacked and the data is lost, they're not penalized for it, but the individual citizens, individual persons, have to bear the negative consequences of that data getting loose and being stolen. And to make matters worse, and I don't mean to make everybody here lose sleep over what we're talking about, a business can handle the data any way it wants to; some encrypt it, most do not. So this would be as if you could build a skyscraper in Manhattan using whatever quality of concrete and steel beams you want, and if the building falls down, you say, gee, I'm very sorry. So it sounds like, even though data these days is maybe as critical as the physical structure of a skyscraper because of all the damage it can do, institutions don't have an obligation to protect it with strong encryption during storage, communications, et cetera, right? Is that correct? That's correct. And this is another aspect of the GDPR: the penalties for losing data. And I think the tide has sort of shifted, though. What's notable is that many organizations we speak to today are in fact keen to use encryption technology and are waiting for these computation-on-encrypted-data solutions to be made available as products. So it's almost like they're now looking for a way to use encryption just to protect their backend data.
And they can use older technologies. If you have multiple drives in your racks, you could use self-encrypting disk drives, so that if a disgruntled employee decides to steal a disk drive, the drive is encrypted, right? That's a low-bar solution that many of these companies could begin looking at even today. Thomas, one of the appeals of blockchain services to me is that strong encryption is built into them. You cannot do things with blockchain services unless strong encryption is present; otherwise you cannot call it blockchain. So to the extent that blockchain services become increasingly standardized, companies will just automatically encrypt everything by using the blockchain service. Is that a correct view? Well, the blockchain itself is not holding all this data, other than its own transaction data, the ledger data, and whether or not the ledger data is encrypted is a different discussion. The data I generate when I buy stuff at a retail store is really off-chain; it's away from the blockchain. You could use the blockchain to index into the endpoints, so you could create a special blockchain that is essentially a directory of pointers. If I were a marketing company, I could search and say, hey, I'm looking for entities out there who have data about consumers of washing machines, and so on. The blockchain could provide them with the URI or URL of the data repository, but the data repository itself needs to be in the back end, encrypted, and not on the blockchain. Yeah, I understand. Now, by the way, one of the comments here is, "The whole conversation makes me feel scared about the whole digital transformation world we live in." Yes, we agree, which is why we chose this topic as one we need to look at, because, as Thomas said, there are answers to this question.
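Thomas's suggestion of a blockchain used only as a directory of pointers, with the data itself staying off-chain in encrypted back ends, can be sketched as a toy hash-chained ledger. This is a deliberately simplified illustration, assuming a single writer and no consensus, signatures, or networking; the `PointerLedger` class, the category names, and the `example` URLs are hypothetical. What it does show is that entries carry only a category and a repository URL, and that editing any past entry breaks the hash links.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Block:
    index: int
    prev_hash: str
    entry: Dict[str, str]  # a pointer only: data category + repository URL, never the data
    block_hash: str


def compute_hash(index: int, prev_hash: str, entry: Dict[str, str]) -> str:
    """SHA-256 over a canonical JSON encoding of the block contents."""
    payload = json.dumps({"index": index, "prev": prev_hash, "entry": entry}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class PointerLedger:
    """Toy append-only, hash-chained directory of data-repository pointers (no consensus)."""

    def __init__(self) -> None:
        genesis = {"category": "genesis", "url": ""}
        zero = "0" * 64
        self.chain: List[Block] = [Block(0, zero, genesis, compute_hash(0, zero, genesis))]

    def register(self, category: str, url: str) -> Block:
        """Append a pointer saying 'data of this category is held at this URL'."""
        prev = self.chain[-1]
        entry = {"category": category, "url": url}
        index = prev.index + 1
        block = Block(index, prev.block_hash, entry, compute_hash(index, prev.block_hash, entry))
        self.chain.append(block)
        return block

    def find(self, category: str) -> List[str]:
        """Return repository URLs for a category; the caller then queries those holders."""
        return [b.entry["url"] for b in self.chain[1:] if b.entry["category"] == category]

    def verify(self) -> bool:
        """Check every link and every non-genesis block hash; an edited entry breaks the chain."""
        for prev, cur in zip(self.chain, self.chain[1:]):
            if cur.prev_hash != prev.block_hash:
                return False
            if cur.block_hash != compute_hash(cur.index, cur.prev_hash, cur.entry):
                return False
        return True
```

A lookup returns only where to send a question (for example, to an open-algorithms endpoint at a bank), which matches the point that the repository, not the chain, holds and protects the data.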
But there is considerable research going on, and regulations like the GDPR in the European Union are beginning to appear, and on and on, but these kinds of changes take time. Even when we knew that cigarettes caused cancer, it was a few decades before there was finally action about it, because there are always institutions that don't like the changes. So this will take time, but there is considerable attention to this topic. Now, we are almost at the end and I cannot get to all the questions, but Thomas, give us some parting words that can at least help people sleep better at night. Sure. If possible, I would limit the number of accounts you open, right? So when you want to buy something, maybe you should buy from the same minimal number of stores, whether it's Amazon, Macy's.com, whatever. Try not to use retail stores you've never interacted with before. Number two, maybe make use of services like PayPal, because PayPal is essentially almost like your identity provider, in the sense that you have to log on to PayPal to make payments, right? And the transaction information, your data, other than your shipping data, doesn't get transmitted to the retail store. So there's a lot of low-hanging fruit in day-to-day activities that you could improve to limit the attack surface on your personal data, so that fewer and fewer entities on the internet have copies of attributes about you, even simple attributes like your phone number, your home address, and your zip code, right? Okay, now, Ira, do you want to... It's 11 o'clock, so... So it's time to end the session now, and I want to thank Thomas for a wonderful and enlightening discussion. And Irving, thank you for moderating. We'll post this recording on the CGE YouTube channel, as I mentioned, and for more information, including information on subsequent Expert Connects on this topic, you can go to our website, thecge.net.
And thank you, everyone, for participating. Thank you. Okay, thank you, folks. Bye-bye. Bye-bye, thank you.