Oh, great. Looks like the live stream is working. Okay. And Sean, I'm going to make you co-host. When Tim gets here, I'll make him co-host. Oh, looks like Tim's here. Hi, Tim. Awesome. Good morning. Good morning. Thanks for joining. Whenever you're ready, Sean, you got it. Awesome. Thanks so much, Sean. We can go ahead and get started. Welcome, everyone, to the Identity Special Interest Group for July 27th, 2023. Thanks for joining us today. My name is Char Howland and I'm a co-moderator of this group with Vipin Bharathan and Tim Spring. And today we've got a great agenda. We'll go over working group status updates and then hear a presentation from Dmitri Zagidulin. I'm so sorry if I'm mispronouncing that. No, that's exactly it. Yeah, thank you. Okay. On generative AI and SSI threats and opportunities. So really looking forward to that. Thanks again, Dmitri, for joining us. Yeah, my pleasure. Absolutely. Let's see, since this is a Linux Foundation call, we're following the Linux Foundation antitrust policy as well as the Hyperledger code of conduct, which are both written and linked here, if you'd like to learn more. This call, of course, is being recorded and we'll post it on this meeting page later today. I'll send out this wiki link. All right, introductions. Would anybody on the call like to introduce themselves? Looks like we just got an introduction from Alfonso in the chat. Thanks so much for joining, Alfonso. Great to have you here. Feel free to come off mute and talk a bit about your work, and anybody who wants to jump in and introduce themselves. All right. Alfonso Govela from the Hyperledger Latin America Regional Chapter. Glad to be here. Thank you for the invite. Thank you so much, Alfonso. Glad to have you here. Anybody else who'd like to introduce themselves? Dmitri, feel free to introduce yourself now or when you get to your presentation. Either works just fine. Sure, I'll jump in.
So, Dmitri Zagidulin. I'm a lead architect at MIT's Digital Credentials Consortium and a general software engineer in the decentralized identity space. So I do a lot of work at W3C and IETF and DIF, and now, hopefully, the Linux Foundation's Open Wallet Foundation, on Verifiable Credentials, Decentralized Identifiers, all that related tech. Absolutely. That's great. We're glad to have you here. All right. Would anybody else like to introduce themselves? Great. Well, thanks, everyone, for joining, and I'll drop this in the chat again for anybody who just joined, if you'd like to put your name under the attendees list. That would be wonderful. We've got a few announcements worth mentioning here. Our next call in two weeks will be joined by Nick Steele to give a presentation on credential migration for wallets and credential providers. So definitely join us for that one as well. Also coming up next month is a Hyperledger In-depth webinar with Infosys. So here's a bit of info on that and a link to register as well. Sean has a few announcements as well here, so I'll turn it over to Sean for those. Hi, I'll keep it quick. Thanks, Char. So the Identity SIG is not only about Hyperledger projects. It's across the Linux Foundation. We also bring in folks from outside the Linux Foundation to help cross-pollinate great ideas. Unfortunately, these three bullets are about the Linux Foundation and Hyperledger. So our Q3 editorial campaign is Hyperledger Identity. If you have content that you want to submit, like a blog post or announcements, please let us know. There's a link there for that. Animo, who are fantastic contributors and members, are also maintainers on Aries Framework JavaScript. Those maintainers recently did a fantastic workshop on AFJ's new 0.4.0 release. They also recently announced making Aries Framework JavaScript a global framework. This is in response to the ARF and some of the action happening in the EU.
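Since Verifiable Credentials and Decentralized Identifiers come up throughout this call, here is a minimal sketch of what a W3C Verifiable Credential looks like, written as a plain Python dict. The issuer and subject DIDs and the degree value are hypothetical placeholders; only the field names follow the W3C VC Data Model.

```python
# A minimal W3C Verifiable Credential, sketched as a plain Python dict.
# The DIDs and credential values below are hypothetical placeholders.
credential = {
    "@context": ["https://www.w3.org/2018/credentials/v1"],
    "type": ["VerifiableCredential"],
    "issuer": "did:example:university-issuer",   # hypothetical issuer DID
    "issuanceDate": "2023-07-27T00:00:00Z",
    "credentialSubject": {
        "id": "did:example:student-holder",      # hypothetical holder DID
        "degree": "Bachelor of Science",
    },
    # A real credential also carries a cryptographic "proof" block,
    # omitted here for brevity.
}

def is_well_formed(vc: dict) -> bool:
    """Check the bare-minimum required fields from the VC Data Model."""
    required = ("@context", "type", "issuer", "credentialSubject")
    return all(field in vc for field in required)

print(is_well_formed(credential))  # True
```

In a real deployment the `proof` block (or an external signature envelope) is what makes the credential verifiable; libraries in the Aries and DCC ecosystems handle that signing and verification.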
It's a call for sponsors and partners to make it easy to build EU-suitable applications with AFJ. You can find out more at that link. And Lissi, who are currently using Indy, Aries, and AnonCreds, announced a new messaging feature for their agent and wallet. And there is a link to a LinkedIn post there as well. Thanks, Char. Awesome. Thank you so much, Sean. All right. Any other announcements that anybody wants to jump in with? Or Vipin or Tim, are there any things that you wanted to say before we jump into our working group updates? All right. Let's go right ahead then. So we'll start with the Indy Contributors Working Group. So in this meeting, a recap of the Indy Summit. That was a three-hour event that happened several weeks ago to get everyone involved in Indy together to talk about how they're using Indy, the future of Indy, all sorts of great conversations. So recapping that on the call, as well as talking about deprecating slash archiving the Indy SDK. I know there's been a little bit of confusion. The keyword is SDK there: deprecating the Indy SDK specifically, not Indy as a whole. So this is talking about the replacement components for the Indy SDK: Aries Askar for storage, indy-shared-rs for the AnonCreds implementation, and Indy VDR for the ledger client interface. So good to talk about all of that stuff. The Aries Working Group meetings. Anybody able to attend these recent Aries Working Group calls? I wasn't able to make these ones, but they've been talking about Aries marketing. They've got this Google Slides deck here. It's cool how it displays right in the page. So they're going over the results from a survey they sent out recently related to Aries marketing, and they've got some suggestions here. And they're also talking about the Aries Endorser Service and RFC pull requests as well. All right, anybody able to attend the Aries Bifold User Group?
Let's see, I wasn't able to find too much on their goings-on, but I think they've been talking about Bifold updates and issues and PRs as well. Let's see, in the Aries Cloud Agent Python User Group meeting, talking about the ACA-Py 0.9.0 release. Some of the important aspects of that are that it's changing the dependency related to the archived Hyperledger Ursa subproject. So the replacement for that is the AnonCreds CL signatures implementation. And so that's an important update. Another important part of that update is the upgrade from Python 3.6 to 3.9. So good discussion of that. Also talking about deprecating the Indy SDK, related to what I was just mentioning in the Indy Contributors Working Group, auto flags, and marshmallow warnings. And then as well, there's a new Aries Cloud Agent Python maintainers meeting. And so this is a half-hour call on alternating weeks from the Aries Cloud Agent Python User Group meeting. And this one is specifically for merging in PRs, discussing changes that need to be made, and getting those PRs merged in. So that's a separate space to do that. All right, Aries Framework JavaScript. Anybody able to attend these recent meetings? Looks like they've also been talking about deprecating the Indy SDK and the migration to Askar, AnonCreds, and Indy VDR, and potentially moving to a bi-weekly call as well. So the plan is a bi-weekly call for the summer. Okay, just during the summer? Yeah. Okay, great. Any other updates or insights you have into that recent work in that group that you'd want to share? No worries if not... Sorry, nothing else to add. All good, all good, thank you. And then let's see. Hyperledger AnonCreds. Anybody attend this recent meeting? So it looks like they've been reviewing and merging in PRs and talking about AnonCreds in W3C format, a proposal for the API. All right, any other Hyperledger working group updates? All right, moving on to the Trust over IP Foundation.
Let's see, it looks like the last All Members meeting was in March, so we've reported on that in the past. Let's see, Daniel, I saw you come off mute. Am I missing a meeting there? No, it's just getting ready for the comms committee. Okay, great, yeah, feel free to jump right in. Yeah, so I submitted the comments on NIST 800-63-4 to NIST on behalf of Trust over IP. And I believe we put it out as a summary blog, but basically, as anybody that read it can probably easily tell, it has nothing to do with decentralized digital identity. It's all enterprise or basically federated identity, where every aspect of your identity claim has to be validated with the credential service provider in their model. So anyhow, that was the essence of it. So the suggestions were that, because they mentioned mobile driver's licenses and verifiable credentials in the document, we challenged them to be friendly to decentralized digital identity and adopt it. And they basically said that they mentioned those two, verifiable credentials and mobile driver's licenses, as an input into their identity lifecycle. In other words, if you make an identity claim using a verifiable credential or an mDL, that would be an input into their federated identity model. Anyhow, that was it. Interesting, thanks for that update. Did I miss the meeting notes on that? I think I was only able to find the June 6th meeting. Was that more recent than that? No, that sounds about right. Yeah, it's been kind of light this summer. Great, great. Just wanted to check on that. Thank you so much for the update. Let's see, the Governance Stack Working Group. Anybody able to attend this July 13th meeting? It looks like they're changing the plenary meeting cadence to monthly, and they're asking for feedback on this trust registry governance document presentation. The Technology Stack Working Group has a bunch of task forces. Is anybody involved in or attending any of these task force meetings who would like to report?
Yeah, they're all basically interrelated, but the focus is on the Trust Spanning Protocol Task Force and the Trust Registry Task Force. So the trust registry is kind of what it sounds like: what should be in a trust registry? And the spanning layer protocol is: what is the payload, what are the protocols used to get stuff in and out of a trust registry? So that's what we're looking at. And even the previous one, the governance stack: if we use the EU trust list as an example, you trust the trust list today because, in the case of the EU, the EU provided it. And it's sort of a binary thing, like the passport public key directory. ICAO publishes this list of country signing certs, and if your passport has a valid signing cert on that list, you're done. It's a binary thing. It's either on the list or it's not. And so what the governance folks are looking at is: what should it look like? How do you trust it, and what content should be in it? And then the more technical folks are looking at, in depth, what technically are the protocols to get stuff in and out of the list, and what should the list contain so that it can be trusted, and more than just you're on it or you're off it. So if you're a trusted issuer, you're a trusted issuer of what credentials and at what assurance level, and all that kind of other metadata that we think should be there. Uh-huh, cool. Thank you for those updates. Yeah, and it looks like for the Trust Spanning Protocol, they're hoping to present a working draft and implementation at the next IIW, and for the trust registry, there are some different proposals that have been created, and they'll be consolidating those proposals. So, great. Thank you so much, Daniel, for that information. Yeah, sure. Let's see. The Ecosystem Foundry Working Group. Anybody able to attend this one? Yes, that was a little bit ago. They were talking about ecosystem components, concepts, and terminology.
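The contrast described above, a binary trust list versus a trust registry carrying richer metadata, can be sketched in a few lines. All identifiers and data below (the cert IDs, the issuer DID, the credential types, the assurance levels) are hypothetical placeholders, not from any actual registry or the task force's draft protocol.

```python
# Binary trust list, like the ICAO country-signing-cert list:
# the only question is membership.
TRUST_LIST = {"cert:country-A", "cert:country-B"}  # hypothetical cert IDs

def binary_check(cert_id: str) -> bool:
    """You're on the list or you're not."""
    return cert_id in TRUST_LIST

# Richer trust registry: per-issuer metadata, the kind of content
# the ToIP task forces are discussing.
TRUST_REGISTRY = {
    "did:example:issuer-1": {                     # hypothetical issuer DID
        "credential_types": {"DriversLicense", "ProofOfAge"},
        "assurance_level": "high",
    },
}

def registry_check(issuer: str, cred_type: str, min_level: str) -> bool:
    """Trusted issuer *of what*, and at *what assurance level*?"""
    levels = {"low": 0, "medium": 1, "high": 2}
    entry = TRUST_REGISTRY.get(issuer)
    if entry is None:
        return False
    return (cred_type in entry["credential_types"]
            and levels[entry["assurance_level"]] >= levels[min_level])

print(binary_check("cert:country-A"))
print(registry_check("did:example:issuer-1", "DriversLicense", "medium"))
```

The design point is that the second check answers a scoped question (this issuer, this credential type, this assurance floor) rather than the all-or-nothing answer of the first.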
They met recently, talking about the Terminology Engine v2 progress, work done on the ACDC and Web of Trust glossaries, as well as a few other updates. So, any other Trust over IP-related status updates or working group progress? All right, so we'll move on to DIF. Let's see, did anybody attend any recent DIDComm Spec Working Group meetings? Looks like they met on July 10th, but I wasn't able to track down any meeting notes on that, unfortunately. But, oh, I guess they must have, despite this announcement from last month, had the DIDComm Users Group on July 10th, and so the Spec Working Group would have been the first Monday of the month. So in the DIDComm Users Group, they had demos: a demo from James Ebert about a DIDComm invitation service, and then a demo as well on analysis and improvements of DIDComm messaging from Jesus Diaz Vico. Slides there. Let's see, the DIF Interoperability Group. I don't think they've met this month, so we've reported on them in the past. Anybody attend the IoT Special Interest Group? All right, I think those are all the DIF groups that we keep updates on. Moving on to the W3C standards. As far as I can tell, the DID Working Group hasn't met recently, but the Credentials Community Group looks like they had a recent meeting with the CCG Verifiable Credentials for Education Task Force. Dmitri, were you involved in this one? Yeah, yeah, absolutely. So, let's see. The topic on that task force call was the upcoming Jobs for the Future Foundation, so JFF, PlugFest number three. So for those of you who are not familiar, Jobs for the Future is a US and internationally based foundation that is focused on job opportunities and skills-based learning, and on using verifiable credentials and decentralized identity to help out those causes.
As part of that, it's been doing a series of interoperability PlugFests: getting together issuers, wallets, and verifiers and demonstrating interoperability for educational and human resources credentials, like job hiring sorts of credentials. So that was a technical discussion about the upcoming PlugFest number three, which is actually going to be happening the day before IIW, on, I believe, October 9th. Great, that's awesome. Thanks for that update. All right, we've gotten to the end of the working groups that we keep tabs on. Does anybody have any other working group status updates or announcements they'd like to give before we move into Dmitri's presentation? Sure, I'll just give a quick one from ISO, because it's identity related. So last week was SC17 Working Group 10, which is basically mobile driver's licenses. And I think a lot of folks are tracking that, because the current published version, 18013-5, covers in-person sharing of identity information, and the upcoming 18013-7, which we were working on, would be remote sharing of identity data, sensibly using things like OpenID for Verifiable Presentations. Well, the specifics of those are being worked on this week in Fukuoka, Japan, by Working Group 4, working on ISO 23220, I think it's part 4, which is working out the technical details on using such protocols and APIs and so on. So anyhow, it's still in the works. Both of those are core to things like the European Digital Identity Wallet and, well, mDLs in general. But yeah, that was it. Great, yeah, thanks for that update. Wonderful. I think we can go ahead and turn it over to Dmitri for your presentation. I'll pass you the screen share. Thank you. All right, well, let's jump into it. So here are the slides that I'm going to be using. Feel free to follow along. I'm going to share my screen here. All right, hopefully everybody can see.
And okay, yeah, I don't have screen space to follow the chat, so please feel free to let me know by voice if somebody has questions. Okay, so I've given this talk a number of times, and each time I try to tailor it specifically to the audience. Some audiences know a lot about SSI and less so about machine learning, and some audiences know all about AI but are unfamiliar with SSI. And so I try to put the emphasis in different places. This group, of course, is definitely familiar with SSI. So I want to give a brief run-through of some of the recent developments in machine learning and some of their implications, and then I want to have a conversation about: what role do we here in this group have to play? How can we help in this specific situation? So, okay, yeah, like I mentioned, those are the three main sections that we're going to talk about. Going to go over what's AI, less so what's SSI, because you know all that. Going to talk about the dangers and opportunities and the interaction between the two. Okay, if nothing else, here's what I want you to take away. Artificial intelligence, powered by recent developments in large language models, is pretty remarkable. It's here, and it's moving extremely fast. Enough so that if you follow weekly news of the AI industry, it is head-turning and mind-blowing each week. So, moving extremely fast. If you haven't been following it, you may be surprised what it can do. I was certainly in that bucket up until recently. I was very surprised what had happened while I wasn't paying attention. But large corporations such as Google, Microsoft, and obviously Facebook slash Meta, and other world leaders are very much paying attention, as are governments. So that's why I'm here talking to you all. All right, so briefly: AI is a broad field. It can be divided into two main branches.
Machine learning, which is largely what we're going to talk about today, and symbolic processing, or good old-fashioned AI, as it's sometimes called. And of course there's much more to the field of artificial intelligence that we're not going to cover in this talk today. But as you can imagine, it's a fairly large field. So, to go over some of the acronym soup that we're dealing with right now: what do we mean by AI? We're going to be using that word a lot. Unfortunately, that term has turned into a buzzword and marketing soup lately. So in our conversation we're going to try to be as specific as possible and say large language model or machine learning rather than AI. But understand that when we say AI, we're kind of hand-waving and meaning all of these things. In reading articles and interviews about this stuff, you'll sometimes come across terms like AGI, which is artificial general intelligence: theoretical human-level intelligence. So you'll see predictions: okay, when are we going to have AGI? And of course, predictions are difficult in an exponentially moving field, but you'll see different experts give different estimates. Limited intelligence, or narrow intelligence, is essentially what we have right now. And it's named that way partly to reduce the culture shock and fear and loathing that people might have when faced with it. And of course, once we have general intelligence, the concerns and the apocalyptic predictions that you hear usually relate to artificial superintelligence. So once we pass the human level and the AI can improve itself, we may or may not have a runaway effect. We're not there yet, so we'll cross that bridge when we come to it. ML is, of course, machine learning, which for the last 15 or 20 years has been the major direction in AI, the source of billions of dollars of investment from very large industry players, while symbolic AI is kind of languishing.
The two branches of the field have kind of alternated throughout history, and at the moment it's, of course, the rise of ML. And specifically, a huge part of the reason that machine learning and artificial intelligence have been in the news so much lately is the fairly recent development of large language models, which are neural networks of a particular type trained on large data sets. We're going to talk about what that means in a bit and how that applies to us in the SSI world. And a particular type of large language model is the generative pre-trained transformer, which came out of the GPT papers right around 2017-2018 that really turbocharged and accelerated both the research and the actual practical deployments in this field, and has really brought it to the front pages of the news. So we're not going to go into the history too much, just to say that the term itself, artificial intelligence, is fairly old. It's from the 1950s. We've had several iterations of AI hype cycles, booms and busts, several AI winters, so to speak, as the symbolic AI promises and hype were sort of splashed with cold water for various reasons. That's the rise of the Lisp machines, the rise of Prolog and expert systems. But technology in general, and AI in particular, has been moving on, and milestones that were predicted to be many years down the line are gradually being passed, like machine learning defeating world chess champions. In the 2010s, and slightly earlier, the thing that majorly changed the landscape is, of course, the big data revolution, and it goes hand in hand with neural networks. Hard drive space is getting cheaper, bandwidth is getting cheaper, processors are more powerful, the databases of the big cloud providers are getting giant, the web is gigantic, and all of that presents a lot of training data for massive neural networks.
So in 2017 came the seminal, sort of keystone, Transformers paper by Vaswani et al., followed by an improvement on it using this GPT technology, this generative pre-trained transformer. And after that, GPT, the project using this tech, has essentially taken off and has gotten, again, billions of dollars of investment from Microsoft, from Google, from the various OpenAI partners and others. And as we'll see in future slides, the training datasets are getting larger, the capabilities are getting more remarkable, and the versions are flying by. Although now we've hit a point where I think all of the players involved are both horrified at the super fast rate of progress, what have we done, but also are under pressure from everybody else, both state actors and other giant corporations. So in some sense, we can't stand still, right? There's an arms race going on there. The thing to keep in mind is that this has been a slow exponential curve. So the capabilities of these systems have been slowly increasing while we're sort of not noticing, right? We've had handwriting recognition since the early 2000s, since the Apple Newton and the early PDAs. Of course, now we're used to speech recognition. We've got Apple's Siri and Amazon's Alexa and Google Home and all of these things, right? So nobody's surprised any longer when our phones and our laptops and our IoT devices recognize speech. Although it has been a long time coming: the industry has clawed for every percentage point of speech recognition performance, and there are insane amounts of deep science behind how effortlessly we can talk to Siri right now. And then, as data sets improved, again, hard drives got cheaper, processors got faster, we came to fairly remarkable achievements in image recognition.
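The core mechanism of that Transformers paper is scaled dot-product attention: a query vector scores itself against key vectors, and the scores become softmax weights over value vectors. Here's a toy sketch in plain Python; real models use large tensors, many attention heads, and learned projections, and the tiny 2-dimensional vectors below are made up purely for illustration.

```python
import math

# Toy scaled dot-product attention, the building block of the
# Transformer (Vaswani et al., 2017). One query attends over three
# key/value pairs; all vectors here are hypothetical examples.

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    d_k = len(query)
    # Similarity of the query to each key, scaled by sqrt(d_k).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    # Output is the attention-weighted mix of the value vectors.
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(dim)]

out = attention(query=[1.0, 0.0],
                keys=[[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
                values=[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
print(out)  # a convex mix of the value vectors
```

Stacking many layers of this mechanism, and training the whole thing on web-scale text to predict the next token, is essentially what turns this small formula into the large language models discussed in the rest of the talk.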
So that's when you start seeing, like, yes, we've all probably come across humorous articles where it's a giraffe and an AI recognizes it as, I don't know, a car or something humorous, right? We've seen all of the missteps, but underneath there has been a slow and steady march towards human-level or better image recognition. And so I think culturally we're kind of getting used to image recognition. We've all used the CAPTCHA services, which incidentally serve as training sets for these image recognition AIs. But very, very recently, we've started getting into what we consider very core human capabilities, which is reading comprehension and then language understanding, which is being able to do things with the reading comprehension, where the AIs have reached average or even fairly high-level human performance and then have started to surpass it. So where are we at this point? What can large language models do? So we've heard the term generative AI, which is this notion of large language models combined with a user interface, either chat or voice or some sort of graphical user interface. And they're able to create new content with it, whether it's text, meaning you can ask ChatGPT-4, hey, can you write a cover letter for my resume based on this document over here, or write me a business plan for this new startup that I'm doing? So they can generate text at a fairly high level. They can generate images; of course, that's Midjourney and DALL-E and other such services that we've seen. And recently we've been getting a slate of video generation, the text-to-video kind of thing, where you can say, hey, create for me this 10-second advertisement of this particular product and tailor it to such and such audience, and it can actually do that. Combine that with the fact that large language models have gotten sophisticated enough that they can clone people's voices after as little as three seconds of sampling your particular voice.
So there have been fairly remarkable demonstrations, and they're now at the level of mobile apps in the iOS and Google Play stores, where you give it a sample of your voice and from then onwards it just straight up speaks with your voice, and it's extremely close and difficult to tell the difference. And again, as a lot of the AI researchers point out, the important and somewhat overwhelming thought to keep in mind when reading about each capability of large language models is: today is the least sophisticated they'll ever be. Today is the worst performance they'll ever have. Fortunately or unfortunately, it keeps improving day to day and week to week. So if the language models can clone a person's voice with three seconds of sampling, the sample size will only decrease and the difference will become even more indistinguishable in the coming days and weeks. Combine that with the proliferation of surveillance and recording, and you can see how we're getting into uncomfortable territory. We can see why we're all here in this SSI field: yeah, maybe we should do something about that. Maybe we should, both on the legal, legislative, and societal end of things, but also on the technological end, start providing tools and affordances to address some of these things. So when we say generate text, we mean it writes pretty decent essays, pretty decent articles, and, well, shockingly to me personally, can now straight up pass not just SAT questions, but law school exams. And again, even as recently as ChatGPT-3 or 3.5, we pointed to the SATs or law school exams as: okay, yeah, I mean, AIs are getting good and stuff, but they're certainly not good enough to pass law school entrance exams. But ChatGPT-4 came out, and all of a sudden they started scoring in the 96th percentile on law school entrance exam performance. So that's getting fairly remarkable, right?
So we've probably all heard a lot about how AI is going to replace programmers. It's not quite there, but it is providing fairly remarkable assistance and even writing full application software using, again, text-to-code interfaces. Hey, ChatGPT, can you write me a data analysis script that looks at this data set over here and optimizes for such and such performance metrics? Or, more upsettingly, hey, large language model, can you write me a zero-day exploit for this model of Wi-Fi router? Those have been demonstrated in the security circles. Putting all of these things together, the ability to generate text and to clone a person's voice, it is now available at the mobile app level: it can make phone calls, it can act as a personal assistant to schedule and organize events. Putting these generative skills together, we've now seen the first instances of it making scientific discoveries, either retracing and duplicating discoveries that humans have made, or actually making new ones. And specifically, a lot of funding has been applied towards using generative machine learning to work with protein structures, which immediately, my mind goes to: okay, that means medicine, and unfortunately it also means biowarfare agents, which is part of the reason for the concern that some of the governments and some of the providers of these technologies now have. And again, each one of these capabilities and threats has already been demonstrated by researchers and publicity stunts and all these things. Obviously, and we're fairly used to this, a lot of what the large language models and neural networks do goes into the automated operation of machinery. So all the self-driving car progress, or lack of progress, that we've had over the years has largely relied on this type of technology. Okay, so what does that mean? What are the implications for us?
We can divide them into roughly four broad categories. So obviously, there are some economic implications. The important thing to note about large language models and AI, which, incidentally, it shares with the recent COVID pandemic, is that it's, if anything, an accelerant of existing threats and trends. So when you find yourself thinking about, okay, what are the opportunities and what are the threats that this new technology provides, just think: it is a powerful accelerant of existing trends. So take all the existing technological trends, turn them up to 11, and think through the implications. So, but what are we really dealing with? Why are people out picketing industries right now? The first thing that we've run into is, of course, an uncompensated transfer of value from content creators to large corporations. To put it plainly, these large language models have been trained on the entire internet, both what we think of as public, quote unquote, and in the gray area. So basically the entirety of Reddit has been used as a training dataset; the entirety of Flickr and large repositories of photographs, and Facebook and all that stuff, have been fed to these machine learning algorithms. Which means all of the photographers, all of the artists, all of the designers out there have, without consent and without compensation, helped train the thing that is largely becoming their replacement. So we're dealing with the potential for massive job replacement and obsolescence. And we don't have the compensating safety nets, mechanisms like universal basic income, or really anything to help deal with it. At the moment, the field hit first and hit the hardest has been lower-end graphic design. It is now trivially fast and trivially cheap, basically free, to say, hey, can you generate a good logo for my company? This is the company description, this is the audience that I want to target, et cetera, et cetera.
And it gives you a fairly good logo. The thousands and millions of artists working on commission for either business graphics or hobby graphics are currently being very quickly replaced with machine-generated work. And I have talked to artists personally, I have a lot of artist friends, who are running into this day-to-day already. And again, it's getting better, right? So one main economic threat is: hey, the people whose work trained this thing weren't compensated, and it's going to replace a lot of jobs, not just artists but, surprisingly enough, a lot of white-collar workers, a lot of office management and decisions and data processing, a lot of work that we thought would be fairly safe from AI for a while, being hit hardest first for various reasons. But the other main economic threat is that at the moment, in this early stage of history, the data sets and the tech and the resulting large language models are, of course, being centralized in just a small handful of central players; you can really count them on one hand, maybe two: all the usual GAFA, et cetera. Although, actually, Amazon is oddly falling behind in this sort of arms race. But there are definitely threats of centralization and monopoly. So what do we mean by legal threats? Well, the problem is these models have been trained on literally the entire internet, with all of the good and bad that comes with that. So there are a lot of biases, a lot of prejudices, not even talking about differences in jurisdictions, that are essentially encased in very opaque data structures that we can't really examine. So there's a great paper, I should search for it afterwards, called "Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction." And what it really means is human-large language model interactions.
And it talks about how, as large language models take on more and more responsibility, meaning they're driving the cars, they're making business decisions, they're making medical decisions or, heaven forfend, legal justice decisions (we've already had early experiments with that, not to good effect), as these models make the decisions, who is held accountable? And how do you explain the decisions they made? Because at the moment, the results are very opaque. You can't see the reasoning, and you can't see the input training data a decision was based on. Then of course there's the privacy of the training data, and privacy on the other end, on the outputs, and the explainability and attributability of those outputs. We're familiar with this in SSI and in technology in general: the tech is only as harmful or as beneficial as the legal framework around it. So that is going to be one of the key battlefields here. Security threats. There's the very stereotypical question: can it build nukes, can it build chemical agents and biological agents much more easily? Unfortunately, yes. The ability to build tailored diseases is being lowered down to the garage level, down to the hobbyist level, which is somewhat concerning. On the other hand, the good news is the counterbalance: the ability to cure and counter these agents is also increasing in the same way. So that bit I personally, and a lot of people, are less worried about. It's the less obvious security threats that are going to come to the forefront of the discourse. And that is attacks against infrastructure. Meaning, you can literally say, okay, here's the dataset of power distribution centers in the state of Massachusetts; which one do I have to take out for maximum damage or maximum power outage? Things like that.
Even more subtly, and we've seen this with Cambridge Analytica, we've seen this in the previous election, there's the notion of attacks against social infrastructure: which polling stations do I have to close in order to get this particular political result? Or, more importantly, generative AI makes it extremely cheap and instantaneous to mimic humans online: to make posts, to create social media accounts, to create web pages, to vote on a system that isn't properly locked down. It makes it trivial to create fake humans and to scale that horizontally. So any social structure that ability can affect is definitely under threat. A very down-to-earth, practical implication is that it's already messing with our search engines. More and more, when you search for anything on Google or the other major search engines, the top results are increasingly filled with junk posts, with machine-generated data. And what's worse, there's a vicious cycle there: because the LLMs are trained on web data, as the web fills up with junk posts, what is that going to do to the quality of subsequent models? All right, so we've talked about some of the threats and some of the challenges. What are the opportunities? What can it do for us? It vastly increases each person's skills and capabilities, and on a different level it increases the skills and capabilities of groups and of larger organizations, of corporations, of nation-states. Interestingly enough, AI and some of the SSI technologies allow us to collaborate on private and sensitive data in a privacy-preserving way. It has already begun impacting medicine and lifespan research and will only do so increasingly going forward. There are a number of projects putting large language models to work on climate change solutions. And if we do manage to solve the attribution problems, artists' lives are going to become easier. Okay, I see we're getting closer to time.
So I'm going to run through the rest fairly quickly. We all know what SSI is. Here's the takeaway I want you to think about: all of these absolutely remarkable capabilities come from massive datasets. How massive? Well, GPT-1 was fed just four and a half gigabytes of text, a bunch of e-books somebody had. GPT-2 was trained on the publicly available WebText dataset, so a whole bunch of scraped web documents as well as some of the deeper web, like Reddit posts. And as we go further, more data; as you can see, the size of the training datasets is increasing exponentially. So it's now basically all of the e-books that have been published, all of the web, all of Wikipedia, and anything else the training companies can get their hands on, by hook or by crook. And that is a lot. We've gotten to the point where, as of GPT-4, they stopped disclosing what they're training on. That's how large and how hungry the datasets have become. Okay, now the important bit, what we're really here to discuss: what's our role in all of this? As I mentioned before, all of the AI accomplishments run on data, and a lot of the SSI and identity techniques we're working on can help secure that data, both in the privacy-preserving sense and in group collaborations, in inter-corporate and intergovernmental collaborations. So that's thing one. The other thing, part of the social threats I mentioned: AI makes it absolutely trivial and instantaneous to impersonate humans, for news, for discourse, and for all sorts of other purposes, which means our identity and provenance technology becomes that much more crucial. Everything needs to be signed. And by everything we mean: we're working out techniques to sign and attribute every piece of the input training data. We're able to sign and provide provenance for the algorithms that are running; there's a lot of development in the field of verifiable computation.
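The idea of signing and attributing every piece of training data can be sketched in a few lines. This is a hypothetical illustration, not any specific implementation: it attaches a content hash plus an integrity proof to each training item. For simplicity it uses an HMAC from the Python standard library; a real provenance system would use public-key signatures (for example Ed25519 with W3C Data Integrity proofs) so that anyone can verify without holding the signer's secret.

```python
import hashlib
import hmac
import json

def make_provenance_record(content: bytes, creator_id: str, key: bytes) -> dict:
    """Attach a content hash and an integrity proof to one piece of training data.

    Hypothetical sketch: the record binds a creator identifier to a hash of
    the content, so later consumers can check who contributed what.
    """
    digest = hashlib.sha256(content).hexdigest()
    payload = json.dumps({"creator": creator_id, "sha256": digest}, sort_keys=True)
    proof = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return {"creator": creator_id, "sha256": digest, "proof": proof}

def verify_provenance(content: bytes, record: dict, key: bytes) -> bool:
    """Check that the content matches the record and the proof is genuine."""
    digest = hashlib.sha256(content).hexdigest()
    if digest != record["sha256"]:
        return False  # content was altered after signing
    payload = json.dumps({"creator": record["creator"], "sha256": digest}, sort_keys=True)
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, record["proof"])
```

Any tampering with either the content or the record makes verification fail, which is the property that lets a model trainer (or an auditor) attribute each input back to a consenting creator.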
We can give accreditation and identity to the resulting models, to some of the questions asked of the models, and of course to the dream we're moving towards: AI-powered agents. We of course need this AI to be open and answerable to the public. Okay, so, calls to action. If you're not familiar with the field, read a couple of articles. Obviously keep your hype filter on, because there is a lot of hype out there. Definitely watch Microsoft 365 Copilot, because this stuff is literally moving into Google Docs and Microsoft 365 Office, technology you're going to be using every single day. And some of the capabilities Microsoft and Google are putting into it are absolutely remarkable. Definitely pay attention to the legislative conversations, because some of these capabilities are alarming, and you're going to see a lot of laws being passed very quickly, some well-informed, some less so. See if you can help with that. And my biggest advice, my biggest encouragement to all of you, is just play with it. Literally log on to ChatGPT or Midjourney or DALL-E and start poking around. If you're an engineer, enable Copilot in Visual Studio or WebStorm or IntelliJ, whatever you use. Start getting an intuitive feel for exactly how much capability it adds to an individual person. And if you're an engineer, one of the remarkable recent developments is that some very powerful language models have been liberated from centralization and are now downloadable and runnable on your desktop and on your phone. So start experimenting with those. And of course, let's keep working on wallets and key recovery, which is going to be key, so to speak, to all of this. We need to keep up our work on data provenance, because it's going to become crucial. And of course, encrypted data stores, either personal or corporate, which will provide the training datasets for all of this. Okay, we have almost no time left for questions. Maybe some quick ones.
So, yeah, let's continue this conversation. I think both the challenges and what we can do are fairly clear. Yeah, absolutely. Thank you so much for your presentation, Dimitri. I see we're just at the end of time. If anybody has any quick questions, Dimitri, are you able to stay for a minute or two? Yeah, absolutely. Yep, not a problem. Does anybody have a quick question or two? Dimitri. Yes. Vipin here. How are you? Good, good. Well, it's good that you're continuing your association with the identity working group. Now, about this whole GPT stuff, you know, there are a lot of problems still, especially with hallucinations. And you also mentioned something about generated data, which means they're eating their own dog food. Yes. I don't know whether they'll turn into dogs or into wolves. But this is a big problem. Recently they've come up with some kind of an association to control this. And also, Sam Altman, in doing the Worldcoin thing, has proposed using iris scanning as a way to distinguish humans. Yeah. And my question is, will those techniques themselves be co-opted by AI? I don't know whether they can generate good enough iris scans to pose as humans. Unfortunately, you spent very little time on the application part, especially the SSI part. You said something about certifying everything. I had written a pretty extensive article in Forbes about all of this, including how to do the risk management. But I think all of these are going to be overtaken by the capabilities and the kinds of things they do. But anyway, coming back to the SSI part, it would be interesting to devote some time to how it can curb some of these bad effects, or even mitigate them. But what is your view on this whole biometrics question? Yeah, great questions. And I would love, by the way, to read your Forbes article.
That sounds really interesting. So what we're seeing with biometrics is an arms race. The generative AI gets more sophisticated at cloning voices, cloning irises, fingerprints, all of that. On the other hand, the detection techniques are also getting more sophisticated. So it is literally an arms race; depending on the week or the month, one side has a little bit of an advantage over the other. The comforting thing so far is that each type of biometric by itself may be fairly easy to clone, but multiple types together are still very difficult. So higher-assurance-level credentials, higher-assurance identities, tend to rely on multiple biometrics. You mean like two-factor authentication? Yeah, exactly: two-factor biometrics and more, right? All the spy movies you see with a voice password and an iris scan and a fingerprint, that's the kind of stuff we're getting into: multiple channels, and it's an arms race. But how will that sit alongside SSI? Because many people in the SSI community are extremely resistant to the whole notion of biometrics itself. Great question. I think we in the SSI field need to be careful to be against naive implementations of biometrics. But biometrics do have an important role to play, as long as we pair them with privacy preservation, selective disclosure, and just general good design. So you mentioned Worldcoin. Why is Worldcoin such a problem? Because all of the biometrics they collect sit in one giant database. There's no selective disclosure. There's no privacy. And recently, the Worldcoin database was leaked, so millions of iris scans are just out there in the dark. Right? That is how not to do it. Well, I mean, we had a presentation many years ago in the identity working group where the data is not stored centrally; rather, a conversion of the data into some kind of a fuzzy matching happens at the edge.
And then that is compared with, instead of the stored biometric, some kind of, I don't want to use the word fingerprint, but some way of identifying it, and then those identifiers are matched using fuzzy matching. Meaning, nowhere is the biometric stored. It's always converted into some intermediate form, and the matching is done using some reduction algorithm, like a hash or a fuzzy hash or something. Yeah, and those kinds of techniques are definitely important. I think what you're getting at is that it's an important tool in our SSI toolbox. We shouldn't dismiss it out of hand, because it's got an important role to play in the fight against generated fakes, basically. Yeah. I mean, distinguishing between human beings was a concern even before generative fakes; that's the concept of Sybil resistance, which predates generative AI becoming a thing, because a Sybil attack is obviously multiple presentations from the same human being purporting to be different people. Yeah. Okay. We had such a presentation stored in our identity working group repository early on as well. All great things. And I'm just going to stop there and wait for the next person to ask a question, so that I don't take up all the time. Thank you. Thank you. Excellent questions, though. Anyone else? And I know we're over time. Great. Well, thank you for taking the time to give the presentation and even staying a few minutes after to answer questions. I'll send out the slides again. So much to think about. Thank you so much, Dimitri. Hey, you're quite welcome. Thank you for having me. Let's continue the conversation, and let's keep working on this stuff. Thanks all. Absolutely. Thank you so much. Thanks everybody. Thank you. Thank you.
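The edge-matching scheme discussed in that last exchange, where only a derived template is kept and compared with a noise-tolerant distance rather than the raw biometric, can be sketched as follows. This is a toy illustration in the spirit of iris-code matching (Hamming distance over a fixed-length binary template); the template length and the acceptance threshold here are arbitrary assumptions, not any particular product's parameters.

```python
def hamming_distance(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length binary templates."""
    if len(a) != len(b):
        raise ValueError("templates must be the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))

def templates_match(enrolled: bytes, candidate: bytes, max_bit_diff: int = 8) -> bool:
    """Fuzzy match: accept if the templates differ in at most max_bit_diff bits.

    The raw biometric never leaves the device and is never stored; only the
    derived template is compared, and the threshold absorbs the sensor noise
    between two captures of the same eye or finger.
    """
    return hamming_distance(enrolled, candidate) <= max_bit_diff
```

A fresh capture of the same person produces a template only a few bits off from the enrolled one and is accepted, while an impostor's template differs in many bits and is rejected, which gives Sybil resistance without a central biometric database.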