Great. Well, thank you, everyone, for joining. I know people are still joining, but given that we only have 30 minutes today, I'm going to go ahead and kick things off. This is our second session in the Berkman Klein Center series on accountable technical oversight of generative AI. I'm Sue Hendrickson, the executive director of the Berkman Klein Center. Last week we started this series with BKC responsible AI fellow Dr. Rumman Chowdhury and Reva Schwartz from NIST on the landscape of generative AI harms. Today we're going to focus on a new tension: balancing security and transparency in the quest for accountability and open research regarding generative AI. Legislative efforts such as the DSA are providing new regulatory routes for increased transparency and access, yet how best to accomplish that in the rapidly evolving landscape of generative AI is not yet resolved. Today we're looking to tease out what constitutes meaningful transparency, and to wrestle both with the security risks that may come from sharing training data or code and with the societal and other risks that may come from not sharing.

For this discussion and fireside chat I'm thrilled to have Bruce Schneier, internationally renowned security technologist, also known as a "security guru." Bruce is also, I'm pleased to say, a long-time affiliate and fellow at the Berkman Klein Center, a lecturer at the Harvard Kennedy School, and a best-selling author of 14 books and literally hundreds of articles, essays, and academic papers. His digital newsletter Crypto-Gram and blog Schneier on Security are read by over 250,000 people. He's a thought leader whom I and many others consult to understand the nuances and challenges of the digital security landscape. Welcome, Bruce. Thanks for joining. Time is short, so we're going to dive right into questions. For our virtual audience: please add your questions into the chat box. We're hoping to make this an interactive discussion, and we'll try to read your questions into the conversation as we go.

Okay, Bruce, starting us off: I'm looking forward to thinking about the lessons we can learn from the security landscape as we tackle generative AI. How should we consider the trade-offs, if there even are trade-offs, between transparency and security? How should companies think about this in releasing their models and underlying training information?

It's interesting. I tend not to like the trade-off metaphor, because transparency isn't something you trade off against security; transparency is one of the ways to get security. So these are all going to be security-versus-security discussions, one kind of security versus another. In the software space, we have been dealing with transparency issues really since the beginning. Transparency isn't a goal; it is a method of achieving a goal. You might want to know that an algorithm is fair, you might want to know that software is secure, you might want to know something about the systems you're using. One of the ways to get that is via transparency. The open source movement was built on this notion that if the software is public, many eyes can look at it, vulnerabilities and bugs are found, and your software is more reliable and more secure. I'll argue about whether that's true or not, but that is the conceit.
Yeah. There are other times when transparency might be a competitive problem. Google is going to say, you know, we don't want to release how our search algorithms work. Even though that would be transparency, people would then be able to game them: if you know how it works, you'll be able to optimize your page to rise in the rankings, and that would be an insecurity. So there are ways that transparency affects security, both good and bad. And the question is always: what are we trying to optimize for, whose interests are we serving, and what are the ways to get that? Think again about open source: transparency alone doesn't give you security. Saying "Linux is more secure because it's transparent" doesn't make sense; Linux is secure because people are looking at it. There's a lot of open source software that is public but obscure, and no one looks at it, so you don't get the benefit of the analysis. If the value is the analysis, then how do you get it?

Right, and that's actually a great topic, because many of us saw the leaked memo by a senior engineer at Google around "no moat," and this trend we're seeing right now in generative AI toward smaller, cheaper versions of best-in-class models is clear with the proliferation of open source ones: from Hugging Face's alternative to ChatGPT, to Alpaca from Stanford, Dolly, StableLM, Vicuna, and others. I like the way you framed that, and I agree with you that transparency isn't the goal; it's a method. In the context of the open source models that are being released and widely tested, are they posing security risks? How should we tackle those differently as we look at this proliferation of open source models?

There are security risks in giving everyone that powerful technology. Think about what's happening with ChatGPT: there will be ways people use prompt injection to get it to do things the company didn't want, and the company puts in controls so it can't. As these models get into the hands of everybody, a very international community, you're not going to be able to put in those controls. If you have an art model, you can tell it, "do not create fake IDs," and as long as a big company is running it, you'll be able to enforce that. Once the technology starts being diffused, you lose the ability to do that. So we're going to have models that will be racist and hateful, just because those controls won't be there. This isn't about transparency; this is really about how the technology diffuses. I think there's enormous value in this democratization. I think we're going to see a lot more innovation, a lot more new ideas, a lot more ways these things will work, and the fact that they're not going to be controlled by these massive for-profit monopolies is a good thing. But it also has issues. This isn't a transparency problem.

For transparency, think about what we might want. An AI is doing a thing, it is making a decision, and we want to know why. When I say I want this model to be transparent, what I mean is I want to know why it makes the decisions it makes, and there are a bunch of ways I can learn that. Maybe I can learn it by seeing the insides, but maybe I can't; maybe that's not going to tell me anything. Maybe I can learn it by being able to audit the model. Am I allowed to query the model a million times and learn what its contours are? Say it's making, I'm going to make this up, hiring decisions. I want to query it with counterfactuals: submit the same resume with different parameters tweaked, maybe racial or ethnic markers, the kinds of things we might be concerned about, and see what the differences are.
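As a concrete illustration of the counterfactual audit Bruce describes, here is a minimal sketch in Python. Everything in it is hypothetical: `screen_resume` is a toy stand-in for whatever black-box model is under audit, and in a real audit you would query the deployed system itself rather than a local function.

```python
# A minimal sketch of a counterfactual audit of a black-box hiring model.
# The model, field names, and applicant data below are all invented.

def screen_resume(resume: dict) -> bool:
    """Toy stand-in for the black-box system under audit.

    In a real audit this would call the deployed model (e.g., via its API).
    This stand-in improperly keys off the applicant's name, so the audit
    below will catch it.
    """
    return resume["years_experience"] >= 3 and not resume["name"].startswith("X")

base = {"name": "Jordan Smith", "years_experience": 5, "degree": "BS"}

# Counterfactuals: identical resumes, varying only an attribute that should
# be irrelevant to the decision (names that may proxy for race/ethnicity).
names = ["Jordan Smith", "Xiomara Alvarez", "Jamal Washington", "Emily Baker"]

results = {}
for name in names:
    variant = dict(base, name=name)  # same resume, one field changed
    results[name] = screen_resume(variant)

# Any disagreement across variants is a red flag worth investigating.
if len(set(results.values())) > 1:
    print("Decision changed with a protected-attribute proxy:", results)
```

The point of the sketch is that this kind of query access, repeated at scale and with systematically varied inputs, can reveal biased behavior without ever opening up the model's parameters.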
These are forms of transparency that aren't releasing the details of the thing, but they might give us the information we want. So again: what is our goal? Releasing the full details of an AI model might be beneficial, and might not. A lot of these things are non-reducible in their complexity, so seeing all the parameters doesn't tell us anything; the ability to actually interact with the model does.

That seems to be one of the real challenges here: figuring out what would actually provide the kind of meaningful transparency that people are looking for in this context. As you said, will releasing it actually accomplish the goals of providing transparency, and are there ways to achieve those goals through that release? What techniques would actually let you do that with these models?

People like Cathy O'Neil have talked a lot about the ability to do algorithmic audits, and I think that's really important; Susan Benesch talks about the same thing. These systems, whether they're AI or not, are making enormously important decisions, and we have no ability to audit them, to understand what they're doing. In a sense, I don't care why: if an algorithm is racist, I don't care why, I just want it not to be racist. Think about human systems. If a system of humans is racist, in a sense I don't care why either; I can't open people's brains and look at the parameters to understand how they made their decisions. We as a society want them to make more fair decisions, whatever that means. I worry a lot that transparency is not going to give us what we want, that we're chasing the wrong tactic for the goal. I do like open source, and I'm glad there are open source models. But I don't think we're going to get what we want by forcing companies like Google or OpenAI to open up their models so we can evaluate them, because that's not how we're going to evaluate them. We're going to need access to them. And maybe we're going to need some access to their training data, for a lot of reasons besides security: to know whether they're violating copyright or anybody else's rights, or unduly taking the labor of millions of people and not compensating it. So we have a lot going on here.

Great. Do you see any of those as particularly security-related risks, or is it the broader collection of risks that we worry about when we're seeking this? You've teed up perfectly our next session, which will be on audits and accountability mechanisms in this context, because I agree with you fully that that is one of the things we need to figure out.

The security risks look like this, and this is the anti-open-source argument: if we give people the source code, the bad guys will be able to comb through it, find vulnerabilities, and exploit them. That is what Microsoft will say: we're not going to release our source code, are you crazy? The bad guys will figure out exploits.

Now, that is a thing we know from decades of security isn't a real worry. In fact, there are more good guys than bad guys; the good guys find things and we fix them. So generally it's a good thing for security to make your code public, not a bad thing. But there are instances, like Google's search algorithm, where you don't want people gaming it. Say you have an algorithm, I'm going to make this up, for admission into college, and we make it public. Now every high schooler knows the exact GPA and the exact type of extracurricular activities, the exact ways to game the algorithm, and we don't want that.
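To make the gaming worry concrete, here is a toy sketch. The scoring rule, weights, threshold, and applicant profile are all invented for illustration; the point is only that once the rule is public, finding the cheapest path over the bar is simple arithmetic.

```python
# Toy illustration: a published admissions scoring rule can be gamed exactly.
# All weights, the threshold, and the applicant are hypothetical.
WEIGHTS = {"gpa": 10.0, "test_score": 0.05, "extracurriculars": 2.0}
THRESHOLD = 110.0

def admission_score(applicant: dict) -> float:
    """Score an applicant under the (now public) linear rule."""
    return sum(WEIGHTS[k] * applicant[k] for k in WEIGHTS)

applicant = {"gpa": 3.4, "test_score": 1200, "extracurriculars": 2}

# With the rule public, the applicant can compute the minimal change needed:
deficit = THRESHOLD - admission_score(applicant)
if deficit > 0:
    clubs_needed = deficit / WEIGHTS["extracurriculars"]
    print(f"Short by {deficit:.1f} points; join {clubs_needed:.1f} more clubs.")
```

A model that was trained to find genuine signals of merit stops measuring them once everyone optimizes directly against its published formula, which is exactly Bruce's Google-search worry.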
And we've heard increased calls from some of the large language model providers to hold back from the kind of openness that's been provided, on these security grounds, which is one of the reasons we wanted to tease this out and think about whether that's in real tension with the open source community.

That's always been the tension. That's Microsoft versus Linux. It's the exact same tension. We can do both things; we don't have to choose one or the other. Different companies will make different decisions, and I think that's okay. One isn't necessarily more secure than the other; the risks for one aren't necessarily the risks for the other. Microsoft keeps their source code proprietary, but we know pretty well that countries like China have gone and stolen pieces of it. So, right, are we getting the value of that closing down? The benefit of open source is that we in the community get to look at it and play with it. Now, as you said earlier, there are open source models out there, several of them, and the innovation has been phenomenal. We're learning a lot: that you don't need months of computation, that these smaller, nimbler models can be updated every day and run on a high-end laptop. I think that's going to be the future, and it sounds pretty good.

But yes, there are security worries. There are security worries in authentication: voice authentication no longer works, video authentication no longer works. You have to send Facebook a scan of your driver's license; who thought that was a good idea? Well, now it's going to be a terrible idea. All of these security measures that kind of just barely worked now won't. I'm sure ChatGPT can guess the answers to your secret questions.

Yeah. But these have always been how we handle identity and authentication, and it's all out there now. What do we do? Is there a solution to pull back any of this kind of security in a world where we're not going to be able to have this kind of authentication?

We have to rely on other things. Think about your phone: when you call me, I recognize you because of your voice and our shared history, because we know what we're going to talk about, because we know each other. Caller ID also exists, but it's not secure and doesn't work very well. If I can't trust your voice, maybe I have to rely more on caller ID, which means maybe caller ID needs to be more robust. Things like the phone company needing to do more to prevent SIM swapping, because now the other mechanisms are failing. So I think there's going to be a rejiggering of security. I'm just trying to think this through.
Yeah. We've been using these informal methods because they worked most of the time, and they worked pretty well. Now we might have to make some different decisions, given the ease with which you can fake data at a distance: voice, or video, or photographs of documents. All of those things no longer work. They kind of never worked, but they worked okay; now they're really not going to work. There are going to be some bumps. I think SIM swapping is a good example of this. This is a problem where a hacker calls your phone company and convinces them to switch your phone number onto their phone, and it's actually surprisingly easy to do. The reason it's easy is that the phone companies are not optimizing for security; they're optimizing for "I lost my phone, and I bought a new one, and I need to move my account." They're optimizing for ease of use and customer service.

Yep. And that makes the attack possible. Right, you're making me realize that we're going to need quite a bit of security innovation around these issues in order to tackle some of the authentication challenges.

And even more, business innovation, which is even harder. Businesses like to be easy; they don't like security. I don't think we need new tech; we're going to need phone companies to say, hey, we need to make caller ID actually good and not accept crappy caller ID. We might have to do authentication from the phone to the tower, which we never did because we never wanted to and didn't have to. So it's going to be the processes. In this space I'm a big fan of regulation; I think the market actually won't solve this very well. A lot of our assumptions are changing, and that's what's going on. And what do we do to get there? I think transparency is going to be part of it, but there are a lot of other parts of making security work in this new era, when things that we thought were hard turn out to be easy.

Yeah. Well, it's interesting that you say you're a fan of regulation; I want to turn to one of the questions that we received, because it's a good link to that. A question from Adam Holland: if we imagine some sort of mandatory transparency in the mode of the DSA's soon-to-come database, it's not clear what that would actually look like practically. One difference seems to be that in the case of the DSA we have vetted parties making requests regarding content; in the case of generative AI, we have what? There's a huge difference between looking under the hood and seeing what material is or is not in a library's collection, so to speak. How do we think about whether that kind of mandatory transparency is going to be able to accomplish the goals that we're looking for in the generative AI context?

I think that's a really good way of framing it, and in a lot of cases it doesn't. If I demand transparency, do I want to see the model? Do I want to see the training set, the data that goes into it? If it iterates, do I have to see each version? I don't know, and I don't think it gets us what we want. What I want is to know why the model did the thing it did, and really, how we can stop it doing that thing next time, whether it's making a mistake or acting in some biased manner that is either illegal or unethical, that we just don't want.
Yeah, or it makes a decision that affects me, that's important, and I want to know why. Why did it deny me this benefit, this loan? Why did it make this discrimination, put me in one pile versus another?

Right, so that kind of knowledge.

Yeah. The transparency I want is: submit my application ten times with ten different parameters changed, and tell me when I get different results. That's what I mean by an explanation: why it did the thing.

Exactly, why the decisions are being made. One thing that's been teed up is the idea of benchmarks for acceptable levels of security risk and what that might look like in this space. Can you speak to whether there's actually a kind of benchmarking process that could go on here? And then, I guess, we get to the question of how we would think about what an acceptable security risk is, and who would decide that.

That's really hard to do without knowing how it's going to be used. The AI doesn't exist in a vacuum; it's doing a thing. It's in a chatbox engaging and generating human text, or it's in some decision-making system making a decision, or it's affecting the world in a direct physical manner. I don't think you can have security benchmarks without knowing what it's doing. What would it look like? You give me an AI system that does... what? Without knowing what it does, I don't see even how to begin to have a security benchmark. It's going to be some level of risk, but what's the risk? Take a driverless car: what's the risk that it will accelerate instead of brake? That we can talk about. As soon as we instantiate it in that type of decision, we can develop security benchmarks for what that means, which now is really safety.

Yeah, exactly, safety.

But the difference to me is that safety is against the environment. Safety is road conditions, and day versus night, and traffic, and people walking around. Security is an adversarial environment. Security is the stickers you put on stop signs to make the AI think the sign says "55 miles an hour speed limit." It's a very different way of looking at the AI, an adversarial way: does it survive a malicious, intelligent, adaptive adversary, versus does it survive all sorts of random weather conditions. It's a really, really important distinction.

I want to add in an interesting question we got from Jonathan Horitz that I'd like to tee up: is there anything qualitatively different between LLMs and previous statistical models, in that we had a good, well-understood (at least by people with a mathematical background) theory that explained what the old models did and why, whereas for LLMs we don't have such a theory; no human really understands why they work as well as they do. Is this maybe why you talk more about getting oracle access to a model than just seeing the source code?

I think that's part of it, and this is where they're approaching humans: we have no way to understand why they do what they do. And we know from psychology that human explanations are basically justifications, not explanations; they're generated after the fact, after people make a decision.
So we're getting to models that are so opaque, so irreproducible, that "why did it make that decision? Here are the two billion parameters, that's why" is not a useful answer. Explanations are very much a human shortcut. They represent the way we humans make decisions, not how these LLMs make decisions. So I don't think we can usefully open up the insides and see what they've done, just as we can't usefully open up human brains. So now we're stuck, to use a bad word: we have to use what we can see. What is the output? We look at, say, the incoming class of a certain university, and we see a racial bias, and we say: look, I don't care how that happened, you can't have that. Did you use people, did you use an AI, did you use a mix? We don't actually care. And I think in that case we already have the tools to go after that kind of bias; there are places where that is already illegal, and it probably should be more.

Right. And thanks also to Karsha Surender for another question: if not through mandatory transparency, which it sounds like we're moving away from, is the responsibility to reduce these risks then left to the developers of these models? What incentive is there to do so if it slows down improving the capabilities of these models?

Well, this is regulation. That's where we are, because the incentive of the companies is to make a lot of money, not to be fair, not to be just, not to be equitable, not to do any of those things. If we expect, you know, pajamas not to catch on fire, or chimney-sweep companies not to hire five-year-olds to climb chimneys, whatever, all these things, we have to pass laws. So yes, it's going to be incumbent on the companies to make sure these models are fair, and it's incumbent on government to force the companies to make sure these models are fair, and then it's incumbent on government to check that the companies have actually done what they said they'd do. There is no lever in the market for this, because the market is not set up to do this. And I think this is larger than AI. I always think of the market as a game playing out on a field that government defines: we define the viable playing field in which competition happens. We have lots of laws that do this, whether they are safety laws or child labor laws or fair practice laws or truth-in-advertising laws. All of these laws determine how companies can operate, and there have to be more of them. We have no other way of doing this.

So, you know, I think one of the challenges we're wrestling with is figuring out both how we can do that on a global basis, given the proliferation of regulation right now in different spaces, how to harmonize it, and then how we can actually make that kind of regulation effective given the kinds of challenges we were talking about before.

There's an interesting dynamic at play here. Take not AI but computer security: California has an Internet of Things security law. It's not that great, but one of the things it mandates is no default passwords. So if you sell an IoT anything in California right now, you can't have a default password. I guarantee you that no company has two thermostats, one for California and one for the rest of the country.
Right: they fix the default password and sell it everywhere. A good regulation in a big enough jurisdiction moves the planet. So look at Europe, look at their new AI law. A lot of the stuff they're going to require is going to benefit the whole world, even though the United States is really dysfunctional and is not going to pass anything here. The EU, who I think of as the regulatory superpower on the planet, will do things that benefit us. So I'm okay with the regulations not being harmonized. I'm okay with different regulations in different jurisdictions.

We're almost at time, so I just want to ask one quick question for you, thanks to David Harris: do you think it was responsible of Meta to release LLaMA to researchers with such a seemingly low barrier to entry that it then leaked?

Interesting question. I'm glad they did, including the fact that it leaked. We've learned an enormous amount from, basically, the community. The companies never understand open source; they're always blindsided by it, and that has been true for decades. I think it's good. I think the democratization is really valuable. I think getting it in the hands of people who don't have a profit motive is phenomenal. I don't think it was irresponsible, so I guess I do think it was responsible. I'm glad they did it. I think we in the world are benefiting enormously from it. And now it's done; you can't put it back in the box. The innovation has been incredible, and other open source models are coming. Even if they had decided not to release it, it would just have been a few months' delay before some of these other models came online; you listed a few of them in the beginning. A lot of them are based on LLaMA, and some of them aren't. Of the ones that are, there was Alpaca, and I guess Guanaco is the next one.

Well, Bruce, thank you very much for joining. We are at time, so I need to cut this off, though I could continue to talk about this with you for hours. It comes back for me to figuring out how we deal with some of these communication and security challenges in a world where we can't put the genie back in the bottle. For everybody else who's listening, please join us next week for our third workshop, on enabling meaningful technical oversight of generative AI, which Bruce teed up a little with audits and accountability mechanisms. We'll have speaking Dr. Rumman Chowdhury, tech journalist Julia Angwin, and CrowdTangle co-founder Brandon Silverman. We're looking forward to it. Thank you again for joining, and Bruce, thank you for your words today. Thanks, all.