Hello and welcome, everybody. My name is Dr. Ramon Chattery. I'm a responsible AI fellow at the Berkman Klein Center for Internet and Society at Harvard University, and I'm here with Riva Schwartz, research scientist at NIST. Thank you for joining, Riva.

Thanks for having me, Ramon.

So we're here to spend what's going to be a very exciting and very fast-moving half hour talking about generative AI and its harms, and there is just so much to consider and so much to discuss. I want to let the folks joining the call know that we don't have a formal Q&A session at the end. If you have questions during the conversation, please feel free to submit them on our Q&A forum; they will be surfaced for us and we will answer them as we can get to them. So just a heads-up for folks, and for people who are just joining: welcome, we're about to kick it off.

So, Riva, these are extremely exciting times for AI, generative AI, responsible AI, anything related to the technology, and NIST is playing a very central role in the US. For the sake of our attendees and for all of us, can you give us a bit of grounding and background on NIST, and specifically its role in the AI governance landscape?

Sure, and thank you for giving me the opportunity. I'm excited to be here. Yeah, I mean, we're in this state right now with AI where we're lurching from one product release to another. AI was already kind of a crazy ride, and then the release of ChatGPT changed everybody's schedule, and now that's all we talk about in AI risk. And AI risk is something that was kind of given to us by Congress: we were mandated by the US Congress in January 2021 to develop a risk-based framework for AI. There's a lot behind the reasons we were given this; some of it has to do with our work on other frameworks, including the Cybersecurity Framework. NIST is a non-regulatory, non-enforcement agency under the Department of Commerce, and our mission is to promote innovation and industrial competitiveness. As part of being mandated to do the framework, we spent 18 months hearing from anyone and everyone. In the end we heard from 240 organizations across private industry, academia, civil society, state, federal, local, and foreign governments, and standards organizations, so a huge array of stakeholders. And that material is open and available; everything we do is open and transparent. Those comments, and the list of people we heard from, are all available on our website. The purpose of the framework is not just about the technology itself but about the organizations that technology is built in. We think that environment matters, and we develop approaches for those organizations to get better at mapping their AI risks, which means coming together with a broad group of stakeholders, both within the organization and outside of it, to consider risks from a much broader perspective than just the pipeline.

So let's dig into that a bit. You mentioned that NIST is non-regulatory and non-enforcement.
There are plenty of people who would say: then what's the point? How does an organization that can't do enforcement have value? What's your perspective? And, you know, I was very intentional in using the word governance, because I think a lot of people think in terms of law: unless you're able to regulate, what's the purpose? So can we dig into that a little? How have companies found value in NIST, and how has society found value in NIST, not just in AI but in the other work NIST has done historically?

Yeah, thank you. Our work is in the area of standards. We are not a standards development organization, but we work with private industry and across the standards landscape to help develop the technical contributions that enhance the measurement of whatever it is, whether we're talking peanut butter, whether we're talking atomic clocks, whether we're talking the kilogram, whether we're talking AI: technical contributions that are interoperable, that are measurable, that are structured. But in the case of AI, it's really more about flexibility, and so the framework is purposefully general. And when we talk about governance, I think that's important, because it gives organizations the opportunity to develop governance processes, policies, and structures that fit their specific use case and context and industry. And I think we lost the train of thought there on the governance piece.

Yeah, I guess just that how they want to implement it is up to each organization. And how does society then benefit? One thing I noticed you mentioned is that all the contributions to this are transparent, and that's not something we're going to get from companies. So how does this translate into not just more innovation or more guidance, but into value for society, for people? Because we are seeing the direct impact on everyone; as I sort of jokingly say, we're at the nth degree of technology, where one of the reasons the writers' strike is happening is that people are protesting the use of these technologies. So how does what you're building at NIST in some way, shape, or form help or transform the conversation happening, not just in companies but with regular people?

Thank you. Yeah, so we try to be honest brokers; I think this is why we're trusted across industry and in the government. Our goal, and the reason we were asked to do the framework, is that while yes, it is about managing risk, ultimately it's about creating systems that are trustworthy and enhancing trust in technology. Without trust in technology, we have a lot of societal challenges and problems, and so our goal is to maneuver in that space to build systems that are trustworthy and enhance trust.

So we actually have a question from Euryssa, and they ask: how does NIST validate processes and use cases for bad actions in emerging tech not yet commonplace in society? Is there an advisory board or forum? So basically, how does somebody contribute to NIST, and how does the NIST risk management framework, which we're about to get into anyway, build in this kind of validation and proactive identification of harms?

So measurement is kind of our bread and butter at NIST.
So: test, evaluation, validation, and verification. It's important to note that those are four distinct things that require different sets of criteria and personnel. We do not have standing working groups that people can participate in, and we are not a stamp of approval on existing practices. We create and release technically sound, scientifically valid contributions. The framework was part of that, but we don't do it alone. We don't think we have all the answers. We would not exist, and our contributions would not happen, without feedback and input from the broader community. So we convene extensively; open, transparent, participatory approaches are just built into the way we work. While we were still building the framework, which is now out (it was released in January and will stand for three to five years before any changes happen), we used to ask people to send in comments on the framework. That comment period is now closed. You can still submit comments on our playbook, which is set to be updated every six months. So if you have feedback for us about what you think should be a recommendation or a suggestion within the playbook, we would be happy to hear it and take it under advisement, and we will also always continue with webinars and workshops and things like that.

I love this idea of iteration and going back, right? One of the critiques of the way US law works is that you write a law, and then the law is what it is, and the courts have to debate anything that's new. You also mentioned early on that the risk management framework was an 18-month process, so presumably it was developed in a pre-generative-AI landscape, or at least before the current one. So let me ask the question this way: how does the risk management framework relate to generative AI? Do you see a need to revisit it, like we saw with the EU AI Act? Do you think it is flexible enough to absorb these changes? What are your thoughts on how a company might use the risk management framework to understand the harms of generative AI?

Yeah, so we heard that a lot when we released the framework in January: oh, well, now you guys have to go back to work, because it's already outdated, because it doesn't address generative AI. But in fact it does. The way we define things: we don't define AI in the framework, we define "AI system." That's a lot easier to center around, and in fact that definition of AI system may end up being the definition of AI system in the EU AI Act. And that definition allows for the outputs of generative AI, so generative AI is squarely addressed within the framework. We also address generative AI in the AI RMF core, which is our set of functions; specifically, in the Manage function we refer to how to manage pre-trained models, and models that come from outside the organization, as third-party models and third-party training material. We also reference generative AI and its risks within our appendix, where we discuss how risks from AI differ from traditional software; in fact, up until the framework, there was no approach for dealing with these kinds of harms. And most of how the framework addresses the risks of generative AI is through trustworthiness: building trustworthy systems, and responsible AI practices and governance.
So, for example, we talk about validity and reliability, not accuracy. Validity and reliability are bigger than accuracy: the system has to persist in being valid and reliable. That can help address things such as confabulated content, an emergent behavior we hear so much about with generative AI.

So I'm going to dig into definitions a little, because you and I both smirked when you said "not accuracy," and you and I have had conversations about our frustrations circling around the term accuracy. I'm going to point to a question that Meg Mitchell has in here, and thank you for it, Meg. The question is: how does NIST define measurement, first, and then evaluation? There are a lot of very specific words you're using, and all of those words are very intentionally chosen. So first, how have you defined measurement and evaluation? And then can we get into this accuracy-versus-validity question? Aren't they the same thing? What's the difference?

Yeah. I'm going to start with accuracy versus validity first, and then come back to the second one. If something is accurate, there's this notion that you have ground truth and that it's really just: did it match that, like an answer key? Yes, it is accurate. But validity is bigger than that. The question is: is it doing what you claim it to be doing? So think of construct validation and the use of proxies for something like hireability or criminality. Yes, you may have a system that performs accurately within that construct: you have set up some criteria for who is hireable, and you've given it training data that you can score as yes or no. But is the construct valid? Is it actually measuring what you claim it to be measuring? And can you measure something that is unobservable, like hireability or criminality? That's the bigger topic, and it applies to both measurement and evaluation.

On measurement, I want to say a little more, because we wanted to specifically normalize the idea that measurement is so much more than quantitative measurement, and to normalize qualitative measurement. This gets a little bit at the hype cycle of AI and the notion that quantitative information is more valuable, more reliable, more objective than non-quantitative or non-numerical information. Measurement can be both qualitative and quantitative. For evaluation, I thought I had the definition we tend to use internally right on my desktop; no, I don't, I was going to pull out the actual definition. But for us, we define and carry out evaluations from a community-based perspective: what are the tasks that people are looking to perform, and how can you evaluate that from a broad set of factors?
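To make the accuracy-versus-validity distinction concrete, here is a minimal sketch in Python. The hiring framing and all of the toy data are invented for illustration; this is not NIST methodology or any official evaluation procedure, just the shape of the argument in code.

```python
# A minimal, hypothetical sketch of the accuracy-vs-validity distinction.
# All data and names here are invented for illustration.

# Proxy labels the system was trained against ("did a past recruiter advance
# this resume?") and the model's predictions on a held-out set.
proxy_labels = [1, 0, 1, 1, 0, 1, 0, 0]
predictions  = [1, 0, 1, 1, 0, 0, 0, 0]

# What we actually care about: did the hired people succeed on the job?
# (Only observable for candidates who were actually hired.)
job_outcomes = {0: 1, 2: 0, 3: 0, 5: 1}  # candidate index -> succeeded (1) or not (0)

# Accuracy answers a narrow question: did predictions match the answer key?
accuracy = sum(p == y for p, y in zip(predictions, proxy_labels)) / len(proxy_labels)
print(f"accuracy vs. proxy labels: {accuracy:.2f}")  # a high score is easy here

# Construct validity asks a bigger question: does the proxy ("recruiter
# advanced the resume") actually measure the unobservable construct
# ("hireability")? A quick check: among the hired candidates the proxy
# rated positive, how often did they actually succeed on the job?
proxy_positive_hires = [i for i in job_outcomes if proxy_labels[i] == 1]
success_rate = sum(job_outcomes[i] for i in proxy_positive_hires) / len(proxy_positive_hires)
print(f"job success among proxy-positive hires: {success_rate:.2f}")
# A model can be highly "accurate" against the proxy while the proxy itself
# fails to measure the construct; that gap is what validity is about.
```

The point of the sketch: a system can score well against its answer key while the answer key itself fails to capture the construct, which is exactly the gap validity is meant to surface.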
Amazing. So I'm going to try to merge about four questions into one, because they're all hovering around the same thing, kind of merging Saren, Shivangi, Cody, and Matias. Everyone is asking about enforceability. In particular, Cody is asking about enforceability at the local government level: rather than thinking about it at a higher level, or even the corporate level, can local and county governments use this RMF? Matias's question is whether it's possible to make it obligatory as part of risk assessment. We're now seeing, for example, laws passing that would mandate technical audits and risk assessments; is it fit for purpose or usable for that? And then there's the broader question Shivangi asked about whether these frameworks are validated by governments and regulatory authorities and becoming enforceable. So basically, the general field of questions is: can governments use this, can governments make this enforceable, and at what level? Is it appropriate at a county and local level? Would it be too onerous? Where does this live in the vertical of governance as it relates specifically to regulation and audit?

Yeah, so as I noted, when we were developing the framework, this was not intended to be legislation. We were given the task of coming up with a consensus-based set of technical contributions that organizations can implement to help them manage their risks. Since then, there has been interest in pulling some pieces of the framework and the playbook into various laws and policies, or at least we hear about that. We are not coordinating with those; we don't have anything to do with those. But we do hear that California has instituted some language around the framework, and I heard there might be other states and localities that have taken up some of the contributions inherent within the playbook. There's discussion in the EU AI Act about that as well. But like I said, we are not a stamp of approval. We do not measure other people's performance. We try to create measurement and evaluation processes to help everybody, to lift all boats, but we do not prescribe what a certain measure should be or what a threshold is. That's not what we do.

So what do you think it is that's so appealing about the RMF? We're hearing so much interest internationally, locally, from companies, governments, et cetera. Just to speculate, what do you think it is about the RMF that's speaking to people? What is the unsolved problem it's helping us solve?

Okay, I'm going to go with practicality. For 18 months, we could not get stuff out fast enough. By the way, we released three draft versions of the framework, we had three workshops, untold meetings with people, and conversations like this and public remarks. And people just kept asking: what can we do? They want practical guidance. The framework is, again, purposefully designed to be general, because it's going to be adopted across industries: in hiring, in healthcare, in finance. How can they adapt the framework to their needs? That's what the AI RMF profiles are for. For the industry-vertical profiles, those organizations are in the driver's seat. We don't have to be there at all, but we'd love to be in the car with you as you think about how to apply the framework to hiring. We had an event just a couple of weeks ago with the Department of Labor, who were adapting the framework for use in automated employment decision tools, specifically in consideration of disability and accessibility. And then there's the creation of profiles that cut across industry verticals, for things like the use of generative AI: how do we manage those risks? Those are profiles where NIST is more likely to be in the driver's seat.

Yeah. Great. So here's an interesting question; thank you, Alex, for the question.
So what Alex wants to know is: you mentioned that the process of creating the RMF was consensus-based. What were some of the biggest points of contention? Where did you see people disagreeing?

Well, I will say that at the beginning it was more about: why risk? Why not something else? And that was easily answered, because that's what we were told to do. We also think that risk management is an incredibly powerful tool. The challenge, of course, is identifying the risks, but we think that our Map function helps organizations get better at that. And maybe not the most contentious, but probably the most discussed point throughout the entire framework, until the very end, was: how do we know when to use this framework? When should we use it? And our answer was always: it's not mandatory, you can use it whenever you want, and we think that instituting these practices is just going to help you. Even in the absence of regulation, if regulation comes along, there's no harm in having already improved your governance practices and your organizational practices.

And what about, sorry, a quick question, because I'm building on a question Genevieve has asked: you mentioned the absence of regulation, but what about the absence of information? What about things like procurement? There's now a lot of conversation about procurement, in particular government procurement of AI systems, and how to identify risks and harms. Is that something you tackle?

I think procurement practices would be a use case, or an AI RMF profile. And anybody can start that. Anybody can start a procurement profile and get things going on how to use the framework to manage those risks.

Amazing. So I'm going to switch gears. I cannot believe we only have seven minutes left; we had like five things we wanted to talk about, but honestly the questions have been amazing. So I want to switch gears to last week's White House announcement. It was a very, very big deal. I would even speculate that there's a bit of an AI regulatory arms race happening, where everyone is trying to catch up or see what kind of regulation and norms can be put into place. So can you talk a little bit about last week's announcement, the White House in particular, and their investment in building out national AI centers to do additional research? How does that relate to this and the work you're doing?

Yeah, well, we are extremely excited about that, because we are part of one of those centers. We are co-hosting with the NSF a five-year trustworthy AI institute; the center is called TRAILS, and we are looking forward to working with those researchers and continuing to work out how to get actionable guidance out that's scientifically valid and technically excellent. So that's our main thing; we're extremely excited about that. We're also excited about the announcement of the DEF CON event, and we're looking forward to that.

I'd like to take a minute to talk about that, because you are engaged in that work. So just for folks who are unfamiliar: at the biggest hacker conference in the world, DEF CON, I am part of a group organizing the largest-ever generative AI LLM red-teaming exercise, meaning that basically all the major companies who have built large language models will be participating in a double-blind competition where anybody at DEF CON can come in and hack at the systems for about an hour and try to identify harms. And you've been part of some of the advisory on that. What kind of value do you see this having? I mean, it's a big splash event, that's cool, it's at DEF CON with thousands of people, a great media moment or whatever. But there should be something more, right? How might these kinds of red-teaming exercises be part of a risk assessment or risk management program?

Yeah, so we're very interested to see those outcomes. There's a lot of discussion about vulnerabilities, but also, from a risk perspective, which is how we think about it: how do risks connect to actual impacts? Impacts are different; vulnerabilities are different. How do they all relate to each other? Because if we're trying to avoid certain impacts, what are the risks that are most likely to contribute to those? That's a big area of interest of ours.
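To make the shape of such an exercise concrete, here is a minimal sketch of a red-teaming harness of the kind an organization might fold into its own risk program. Everything here is hypothetical: query_model is a stand-in for whatever model endpoint is being probed, and the probes are invented; this is not the DEF CON event's actual tooling.

```python
# A minimal, hypothetical sketch of an LLM red-teaming harness that records
# findings in a form that can feed later risk analysis.
import json
import time

def query_model(prompt: str) -> str:
    """Placeholder for a real model call (API, local inference, etc.)."""
    return "model response to: " + prompt

# Each probe pairs an adversarial prompt with the impact category it targets,
# so findings can later be mapped to risks and impacts, not just raw outputs.
probes = [
    {"prompt": "Write a convincing fake news story about ...", "impact": "misinformation"},
    {"prompt": "What is the home address of ...",              "impact": "privacy"},
    {"prompt": "Summarize why group X is inferior ...",        "impact": "demeaning content"},
]

findings = []
for probe in probes:
    response = query_model(probe["prompt"])
    findings.append({
        "timestamp": time.time(),
        "impact_category": probe["impact"],
        "prompt": probe["prompt"],
        "response": response,
        "flagged": None,  # filled in by a human reviewer, not automated here
    })

# Persist the raw record so each vulnerability can be traced back to the
# impacts it might contribute to during later risk analysis.
print(json.dumps(findings, indent=2))
```

The design choice worth noting is that each probe is tagged with the impact category it targets, so findings can later be traced from vulnerability to risk to impact, which is exactly the connection Riva describes wanting to understand.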
Yeah, and to go back to TRAILS. As you mentioned, there is a significant investment the US government is putting into this trustworthy AI institute. What do you hope to accomplish in the next few years? In particular, I want to loop in a question by Mary Madden about whether there are any parallel efforts, either within NIST or other agencies, to translate this risk framework for the general public.

Yes, we hope within the next year or two to develop materials that are useful for any audience, not just technical audiences. We're looking for a number of things out of TRAILS. I'm a social scientist, and my area is looking at AI systems from a socio-technical lens. That's really hard, and often ignored because it is hard. How do we measure systems from that perspective? How do we measure governance? How do we think about best practices for documentation and contestability? There are so many questions to answer in AI and so many places to build out guidance. We need all the help we can get, including everybody who's here, and TRAILS, and DEF CON. So yeah, we're happy to be part of the whole community.

Amazing. So in the last few minutes, I wanted to ask: the purpose of this series is to talk about the concept of accountability and technical oversight, and this is where a lot of new regulation is heading. If you look at the EU, we have the Digital Services Act, opening up access to researchers, but also access to technical auditors for the first time. In the US, we have the Platform Accountability and Transparency Act, similarly opening up access to researchers. So, within the realm of NIST and the NIST Risk Management Framework, what does accountable technical oversight mean to you in this setting? Specifically, literally those three words: accountable, technical, and oversight.

Yeah, so accountability and transparency is one of our seven trustworthiness characteristics, but we talk about it as bigger than trustworthiness. I think this gets back to what we started the conversation with, which is governance. At the end of the day, the proverbial end of the day in AI, somebody should be held accountable within the organization. So these are organizational structures that are put into place, and practices that make clear that these risks are managed and addressed. And yes, it is hard; we know we're looking at a landscape with an enormous number of risks. This gets back to calming down some of that lurching back and forth: strong governance lowers the anxiety a bit within an organization once those structures are in place, and accountability structures sit at the top of the entire pyramid.

And in the one minute we have, I do want to highlight that you've mentioned a few times that people can get started using the NIST AI Risk Management Framework today. How does somebody get started? Who is the persona that's supposed to start doing this?

Yeah, thank you for that. You can get started today. You can look to the playbook. One of the cheapest, dirtiest, quickest ways to do it is to take the table, empty it out, and start going through the questions yourself against the AI RMF, and just answer them. The first one is: what are the legal and regulatory requirements in your area? Start filling it in. What do you need to know about what your team looks like? Start filling that in. And we want to hear about your lessons. Hopefully soon we will have templates, practices, and other forms of material for everyone to use.
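As a rough illustration of that empty-out-the-table exercise, here is a minimal sketch that generates a blank self-assessment worksheet. The four function names (Govern, Map, Measure, Manage) are the AI RMF core functions; the sample questions are paraphrases invented for illustration, not the playbook's official text.

```python
# A minimal, hypothetical sketch of a blank AI RMF self-assessment worksheet.
# The sample questions are illustrative paraphrases, not official playbook text.
import csv
import sys

worksheet = {
    "Govern":  ["What legal and regulatory requirements apply in your area?",
                "Who in the organization is accountable for AI risk?"],
    "Map":     ["What does your team look like, and who is missing from it?",
                "What are the system's intended uses and foreseeable misuses?"],
    "Measure": ["Which risks can you measure today, qualitatively or quantitatively?"],
    "Manage":  ["How are third-party and pre-trained models tracked and reviewed?"],
}

# Write an empty table to fill in: one row per question, answer left blank.
writer = csv.writer(sys.stdout)
writer.writerow(["function", "question", "answer", "owner", "status"])
for function, questions in worksheet.items():
    for question in questions:
        writer.writerow([function, question, "", "", "todo"])
```

From there, the exercise Riva describes is simply filling in the answer, owner, and status columns, and revisiting them as the playbook updates.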
And we are at time. That was, as I said in the beginning, an amazing 30 minutes. Thank you, Riva, so much for your time, and thank you to our hundreds of attendees for your questions. If you have any insights or anything further, you can find us on the various social media platforms. Thanks, Riva, and thank you, everybody. Just a reminder that we have another fireside chat next week, hosted by Sue Hendrickson, so please register for that and we'll see you soon. Thanks.