My name's Kersti Lingstad, and we have two keynote speakers this afternoon who both promise to be very exciting. We've got Renee Cummings and Natasha McCarthy, who are going to be talking about data, AI, ethics, policy and much, much more, I suspect. So, first of all, we're going to start with Professor Renee Cummings, and I'm absolutely delighted that she's able to join us this afternoon. She has been described as one of the world's top 100 women in AI ethics, and her focus is on criminology, artificial intelligence and data. She describes herself as a tech ethicist and is in fact the first data activist in residence at the University of Virginia School of Data Science, where she was named Professor of Practice. Her work very much looks at the ethical implications of data on society, and she's been really exploring the long-term impacts of AI and generative AI, helping us with our understanding of the ethical risks of AI and how to build an ethically resilient, rights-based technology that is responsible, sustainable and justice-oriented. Something that I think we can all agree is at the heart of what many libraries do. So I'm absolutely delighted to be able to welcome you, Renee, and over to you.

Thank you so much. Thank you so much, Kersti, for that introduction, and let me say it's an absolute honor and a pleasure to be with you. Today I'm going to look at the contribution that research libraries can make to the future of responsible AI, and I'm very committed to responsible, trustworthy and mature AI. What I've realized over the last maybe eight years speaking about responsible AI is that the more interactive and integrated and interconnected we get, as we adopt and design and develop and deploy artificial intelligence and new and emerging technologies, the more we seem to have to go back to basics, because what we're seeing is this need to really think about what we're doing in ways that are still more traditional as we enhance our technological landscape.

This image that you're seeing was part of a research study done by ProPublica, a journalism outfit in the United States, and what ProPublica looked at was machine bias in the criminal justice system. Long before this image became very popular, and almost instructive as to the direction that responsible AI needed to take within the context of bias and discrimination, I was really thinking about these impacts, because I was in the criminal justice system using these risk assessment tools and realizing that these risk assessment tools were creating these zombie predictions in the US about how black and brown defendants were engaging with the criminal justice system, and realizing that the challenges that we were seeing when it came to data-driven decision making or algorithmic decision making were rooted in historical data. So what this image shows us is the risk scores, and these risk scores would be used to determine the kind of sentence an individual would get. And if you look at the two highest risk scores, the high risk score of eight for the young woman, and I think the risk score of six for the young man, and then look at the offenses that they have, juvenile misdemeanors and a petty theft, and then look at the other individuals, you're seeing armed robbery, you're seeing grand theft, you're seeing domestic violence, aggravated assault, grand theft and drug trafficking.
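To make the mechanism she is describing concrete: a risk model trained on historical records learns the patterns of enforcement, not of behaviour. Here is a minimal sketch in Python, with entirely hypothetical numbers, of how labels drawn from uneven policing produce unequal "risk" for two groups that behave identically; it illustrates the general point, not the specific tool ProPublica studied.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Two groups with the SAME underlying reoffending rate (hypothetical 20%)...
group = rng.integers(0, 2, n)
reoffends = rng.random(n) < 0.20

# ...but group 1 is policed twice as heavily, so its reoffending is
# twice as likely to be recorded and become a training label.
recorded = reoffends & (rng.random(n) < np.where(group == 1, 0.8, 0.4))

# A "risk score" learned from the recorded labels reproduces the
# policing disparity, not the identical underlying behaviour.
for g in (0, 1):
    print(f"group {g}: true rate {reoffends[group == g].mean():.2f}, "
          f"recorded 'risk' {recorded[group == g].mean():.2f}")
```

Both groups reoffend at the same rate, yet the recorded labels, and any model fitted to them, score one group at roughly twice the risk of the other.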
You've got to ask yourself how effective and how efficient are the algorithms that we are using in the criminal justice system when it comes to allocating our risks and of course allocating sentences, and if these algorithms are the algorithms that we're deploying for efficiency and for effectiveness and to bring service delivery excellence, then we have really got to ask ourselves about the ways in which we are using technology.

This is another case that happened in the US and has become sort of the poster case for the American Civil Liberties Union. The man in this image, with his wife and his children, was arrested in front of his family in 2019 in Detroit. The police came to his house. His wife called him and said, the police are here, they are planning to arrest you, and he is asking if this is a prank, if this is some sort of scam. What's happening here? He said he had no prior contact, no relationship with the police, and could not imagine why the police would be waiting on him. The police of course said they had images from facial recognition technology of him shoplifting in a high-end store in Detroit. He was arrested and taken to the police precinct, and after about 15-plus hours, one of the officers decided to look at his driver's license and compare it to the images that they had from the facial recognition technology, and they realized he was not the person. Of course his family was traumatized by that point. He had lost his job. There were so many other traumatic complications that went with this wrongful arrest. So again, we're seeing machine bias and discrimination in this technology that we're using and deploying.

Here's another case that I just want to use to sort of open things up. This is Porcha Woodruff, and last year she was arrested by the police in Detroit. She was accused of carjacking at a gas station. When the police came to her home, she was eight months pregnant, and she said, it's impossible, I've been pregnant of course for eight months, I've not been outside pretty much doing anything, much less carjacking. The police of course did not believe her story. They came again with their photographs, an image that they thought was Porcha Woodruff, and while she was being arrested she started to experience contractions, and it was one traumatic situation after another.

Now here are some cases that you may be familiar with, and I think during the heights of COVID, many of us in the world of responsible and trustworthy AI were looking at these cases as well. The first one was the visa application process and this alleged racist algorithm, and of course the ways in which those algorithms were being deployed and how visa application became a civil rights issue. Another challenging case coming out of the UK, also in 2020, was a grading algorithm that created a lot of chaos for A-level students. Of course, predictive policing and predictive analytics, a space that I've spent significant time in, is also creating a lot of challenges when it comes to thinking about biased training data, when it comes to thinking about the ways in which we're deploying algorithms, not only in the criminal justice system but just across society in general, and thinking about those relationships between machine and data and bias and discrimination, and the kinds of due diligence that we need to bring into that space.
And I'm just using these headlines to show you that beyond the headlines, what we are seeing with algorithms, what we're seeing with AI, is that although we know there are extraordinary rewards, although we know there's great promise to this technology, although we know this technology has the potential to do extraordinary things, we're realizing that this technology also needs to be put in check. So when I think about AI and I think about its relationship to research, as I said in the beginning, the more advanced and the more connected and the more innovative we get, it seems that we need to go back to basics when it comes to the kinds of ethical concerns that we're seeing. So of course discrimination, marginalization, profiling with data, in particular when we think about policing and law enforcement and national security and defense; victimization, where we continue to see vulnerable, high-needs, at-risk, minoritized populations continuously being re-victimized, targeted, terrorized and traumatized; and of course, all of this is sort of linked to institutional betrayal, because we see, particularly in the US and of course in the UK, in many of the agencies and the institutions and the organizations that have been charged with the responsibility to protect and to serve and to provide service delivery excellence, from public benefits and criminal justice to health care to housing to education as well, a kind of betrayal that's now attached to data and algorithms. And it all comes down to those historical data sets: it's new stories being created because of new and emerging technologies, but perpetuating the old stereotypes.

And this comes down to what I call the atrocity of the black box. So we know if we're using AI, and if you want a running definition for AI, I will say the only definition I work with is that AI is still very much undefined, but we know it's a computer doing human-like things. We know it's feeding that computer the requisite level of data, and that computer really doing the kind of hindsight and foresight and oversight and insight when it comes to the predictions that we are seeing and the ways in which we're engaging with that technology. So when we think about the black box, I want you to think about vastness and opacity. And the black box is actually a space where you would find, at times, many of the individuals who are designing and deploying this technology, many of the computer scientists, the data scientists, still not understanding the inner workings of the black box. And we know where there's opacity there are challenges, of course, to accountability and transparency and explainability and accuracy and auditability, so many challenges. And all of this comes to the fore particularly in criminal justice, because one of the basic tenets of the criminal justice system is the ability to confront your accuser. And if that accuser is a black box, what is your recourse, what is your redress, how do you contest that? So when we're thinking about AI, and we're thinking now about generative AI and algorithms that have the ability to make decisions in real time, we've really got to drill into that black box. And if you really want to think about the kinds of risks and the implications,
I will always send you to the EU AI Act, which does a really exceptional job in categorizing risks, in looking at the kind of regulation that needs to support a rights-based approach, and in the ways in which we really need to think about resilience.

So this is something that I always say as a critical data scientist: I've come to realize that data has a cultural meaning independent of its computational meaning. And when I'm thinking about that, I'm looking at it through the lens of data trauma and the kinds of trauma that we are now seeing when it comes to the use of data and the ways in which we are deploying data-driven decision making or algorithmic decision making or automated decision making, and the ways in which, in this datafied existence that we now have to live, our data is being used to make some of the most critical decisions, but data is also creating some new, critical and challenging ethical questions as well. So I always say that different communities have historically experienced data differently, and data sets do not have the luxury of historical amnesia. Data sets don't forget. They carry with them a memory, and in particular a memory of intergenerational trauma. So as a critical data scientist and criminologist, much of my work looks at the recognition and exploration of the diversity of harms and cognitive injustices we are experiencing in the realm of AI and generative AI.

So we know there's no AI without data. We know AI has a data problem. We know generative AI also has a data problem. We know large language models have a data problem as well. And we also know they are large language models, they are not large knowledge models; it's a language-based application as opposed to a knowledge-based application. We know that data equity is a real challenge, and we know that data justice is the kind of reality we're trying to achieve. So when I say go back to basics, particularly from a research perspective: the more advanced we get with AI and using AI solutions and AI tools, the more basic we've got to get when it comes to interrogating those data sets, because we know that data is never free of human judgment. We know that all data is not created equally. We know there is an appearance of neutrality, and I always speak about the memory of intergenerational trauma trapped in our data sets. So as a critical data scientist, what I try to do is to really challenge the very traditional understanding that we may have of data, and really try to stretch the imagination of data science by uncovering some of those false beliefs and, of course, critiquing and scrutinizing data through many lenses, including racial justice and social justice, data justice, algorithmic justice, environmental justice, design justice. So this is where I think research libraries have a critical role to play when it comes to the design, development, deployment, adoption and implementation of responsible AI, because AI has a trust issue: public confidence and public trust in the technology continue to be undermined by bias and discrimination and the many kinds of crises that we're seeing.
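One very basic form of the interrogation she is calling for is simply comparing who appears in a data set against the population it claims to describe, before any model is trained. A minimal sketch, with made-up figures:

```python
import pandas as pd

# Toy research data set: who is actually represented in it?
df = pd.DataFrame({
    "age_band": ["18-30", "18-30", "31-50", "31-50", "51+"],
    "region":   ["urban", "urban", "urban", "rural", "urban"],
})

# Hypothetical known population shares to audit against.
population = pd.Series({"urban": 0.55, "rural": 0.45})
sample = df["region"].value_counts(normalize=True)

# Large gaps flag where further collection (or synthetic augmentation)
# is needed before the data is used to train anything.
print((sample - population).rename("representation gap"))
```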
So if we are to build legitimacy, and if we are to build integrity, it is important for us to give the individuals who are designing and developing and deploying the technology, and those of us who are using the technology, the kind of due diligence that is required and of course a duty of care. That is very important when we think about AI and we think about the deployment of anything AI-inspired, AI-related or generated by AI, and of course due process. I come from the criminal justice system, so due process and fairness and equity are critical. And the basic approach from an ethical perspective is really doing no harm. And if we want to do no harm with this technology, we have got to understand that we need to always be mindful of the context in which this technology is being deployed. So I think from a research perspective, from an integrity perspective, what we really need at the moment are the skills to build ethical resilience and ethical vigilance.

So this is what I call the ethically aligned data pipeline for responsible AI. And we see that from everything from collection to the afterlife of data, there are ethical challenges. There are ethical challenges because there are many cases now, when we think about data and deception, where we need to ensure that there are various disclaimers and of course disclosure. So if we're going to bring that requisite level of due diligence for responsible AI, then we need to think about human oversight. We need to think about security and accuracy, traceability, detailed documentation. These are the skills that AI needs at the moment. And when we're thinking about these skills, and when we're thinking about data science and developing and designing and deploying AI, we've got to think about an interdisciplinary imagination. We've got to think about the ways in which we need to not only investigate and interrogate and turn information into intelligence, but also the ways in which we need to think critically about knowledge. So I put together a little list, and these are the ethical challenges: fairness, accountability, transparency, explainability, accuracy, auditability. All of these are challenges. All of these are major challenges with AI that require of us, at the moment, a particular kind of intelligence to deal with them.

So from a research library's perspective, I will say research libraries have a critical role to play when it comes to educating about responsible AI, building the requisite level of due diligence in our researchers, in our students, in academia in general, and really enhancing those detection, mitigation, monitoring and managing skills which are so critical to ethical and responsible AI. And this is where I think the most attention needs to be placed: the more advanced we get in the technology, it seems the less we are advancing when it comes to creative thinking and problem solving for critical success with AI tools. Prompt engineering is another space I think we've got to think about. And we know with prompt engineering, particularly when it comes to generative AI, the quality and the structure of the prompt really impact and influence the effectiveness of the output that you're going to get from your models, the answers that you're going to get. So we need to really fine-tune and bring a requisite level of sophistication to the ways in which we are providing those prompts. The relevance and the accuracy of the model's output really depend on how the prompt is constructed.
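As a simple illustration of that point, compare an unstructured prompt with one that specifies role, task, format and audience; the helper below is purely illustrative, and the final call is a placeholder for whichever model API is actually in use:

```python
def build_prompt(role: str, task: str, output_format: str, audience: str) -> str:
    """Assemble a structured prompt; each field constrains the model's output."""
    return (
        f"You are {role}.\n"
        f"Task: {task}\n"
        f"Format: {output_format}\n"
        f"Audience: {audience}"
    )

vague = "Tell me about bias in AI."

structured = build_prompt(
    role="a research librarian advising graduate students",
    task=("summarise three documented cases of algorithmic bias in "
          "criminal justice, naming the system involved and the harm caused"),
    output_format="a numbered list, two sentences per case",
    audience="readers with no technical background",
)

print(structured)
# response = model.generate(structured)  # hypothetical call; substitute any LLM API
```

The vague prompt leaves the model to guess at scope, depth and format; the structured one pins all three down, which is why it tends to produce more relevant and more checkable output.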
And we know that in this space of AI and generative AI, accuracy depends on quality. So when we're thinking about that, I think data literacy, AI literacy and media literacy are the three areas at the moment that are so critical to the deployment of responsible, mature, trustworthy AI.

So one thing that I just want to touch on before I finish up is the relationship between data and democracy and decision making and equity and disinformation. In particular, in the US we are moving into an election, and much of the conversation is around the intersection of AI and politics, in particular the deployment of deepfakes and how deepfakes can undermine democracy, and of course trust, and this question of deception. And one of the most critical things we're realizing at this juncture is that there is a deficit when it comes to the requisite level of literacy required to deal with the kinds of challenges that we're seeing. And of course the UK AI assurance framework is something that I really like, because it speaks about evidence-based trust and trustworthiness: not just trust and trustworthiness, but an evidence-based process linked to processes and metrics and the ways in which we communicate, evaluate and really measure the ways in which we're building trust in the AI space. So again: safety, security, robustness, transparency, explainability, fairness, accountability, good governance when we're thinking about policies, when we're thinking about compliance, when we're thinking about regulations. And this is the part that I like most: contestability and redress, which are so critical to the ways in which we are thinking about the technology.

So as I finish up: AI ethics is where all the conversation is at the moment. Responsible AI is also a critical space at the moment. I believe that research libraries have a critical role to play, beyond the fact that we know that generative AI is making libraries adapt when it comes to the user experience, when it comes to the interface, when it comes to information management and just the administrative tasks across libraries. Those are the great things, those are the rewards. But we've got to think about those risks, and we've got to think about our responsibilities. We've got to think about those rights, particularly human rights, of course civil rights, creative rights, copyrights, all very critical. And what we want to do is always bring that requisite level of diversity and inclusion and equity into the space. And I think this is a pivotal moment for research libraries to sort of reimagine themselves, and in that reimagining, as we become more innovative and more interactive and more interconnected, we've got to go back to those basics. How do we investigate? How do we interrogate? How do we turn information into intelligence? And how do we truly understand what AI needs at the moment? And I think what AI needs is really that interdisciplinary imagination, because it cannot stand alone. And it really requires a stretching, not only of the interdisciplinary imagination, but again, sort of a reintroduction of some very basic methodologies, the ones that are going to build the kind of legitimacy and the kind of integrity that is required to deploy responsible AI, if we think about the long-term impacts and we think about the ways we want to ensure all of humanity is able to benefit, or should benefit, from this brilliant technology. So with that, I thank you.

Thank you so much for that.
That was a fantastic start, and I think you touched on so many different areas; my mind is swirling and I suspect everyone else's is as well. So please do put your questions in the Q&A or in the chat and we'll pick those up at the end. And now I'm absolutely delighted to be able to welcome Natasha McCarthy, who is Associate Director at the National Engineering Policy Centre at the Royal Academy of Engineering, where they're very much connecting policymakers with critical engineering expertise and really providing a route into advice from the whole profession. Prior to this, she was also Head of Policy at the Royal Society, very much leading on the Society's work on data and digital technology, covering issues including the governance of AI and data use; very much the theme of this particular session. So I'm absolutely delighted that you've been able to join us, Natasha, and I'll hand over to you now.

Wonderful. Thank you, Kersti. Thank you so much for that nice introduction. Thank you for having me. And I just have to say thank you, Renee, so much for an incredibly powerful presentation. There are so many interesting themes to explore and really, really pertinent points about the ways that AI has some quite significant problems and the ways we need to address that. So I'm hoping you can see my screen; I'm sure the chair will tell me if not. I'll talk about where I come from and then I'll talk about what I want to cover in this talk. So as mentioned, I've been able to work in policy across lots of different disciplinary areas, across engineering and science and the humanities. And I think that's really helped me think about how we take a really interdisciplinary, very cross-sectoral approach to some of the challenges that we have in AI. And oops, so let me get onto the next slide. There we go.

So to start, I'd like to think a bit about why AI is such a hot topic at the moment, how that really broad interdisciplinary research community can help shape a better AI, and how that really broad interdisciplinary community is going to be shaped by AI itself. One of the things that we have seen in the last couple of years is an absolute explosion in the power and the accessibility of generative AI. While AI policy discussions have been longstanding, and there has for many years been a concern about the value of AI and enabling that value, but also understanding the harms of AI, we no doubt felt, you know, as a society, as policymakers, as researchers, that something changed when really powerful technologies such as ChatGPT and other generative AI technologies, such as image generation technologies, were on our laptops. Anybody could use them. They were really accessible. So for example, you've got a nice little image there. This is what Midjourney, the image generation app, created with the prompt "traditional UK research library filled with studious readers". You instantly can create that content. You can instantly create a poem. There are certain shortcomings that you can see, because I can't see any studious readers there, but otherwise, you know, it looks a little bit like what you'd imagine a research library to look like. And I think seeing this as something that we can all interact with has shown how there's a real kind of concern about how this kind of generative AI can shape how we think about reality, shape how we think about authorship, shape how we think about language and meaning, and shape how we think about intelligence.
So there's a real kind of palpable feeling of the power of these technologies. And as a result, the UK last year in November held the first AI Safety Summit, which was an international summit to talk about some of the really big safety challenges around AI, because this kind of interest that was created by generative AI and this kind of technology really started to spark some concerns about significant safety issues and potentially really catastrophic safety risks. So there was a really important set of conversations happening at that point, and a very useful international collaboration around those conversations. But I want to pick up one thing that came out of that which will frame the rest of the things I want to talk about. The AI Safety Summit itself was closed; it was very broad in terms of international reach, but it was a closed event. And so a number of organizations, such as the Royal Academy of Engineering where I work, were part of the AI Fringe. This was a week-long series of events, open to the public, to have a broader conversation about AI and its safety. At the very end of that we heard from some people who'd been at the summit. So we heard from Professor Dame Angela McLean, the Government Chief Scientific Adviser, who reflected on some of the things that she had learned at the summit. And she said two really interesting things. She said these models are more grown than engineered: they grow organically, they harvest big data, big found data that is out there, rather than being sort of carefully designed and engineered. And she said we need to engineer safety in from the outset. And she also talked about the fact that the summit had highlighted the need for a scientific and research-led approach to developing AI. So for example, many AI systems are very difficult to reproduce, they're very difficult to verify. Some of those core scientific research standards, of reproducibility, of being able to replicate results, are not seen in AI. So we need to be building AI in that way. And I'd like to pick up on how those sorts of ways of thinking about AI can enable us to create better AI, and how the research community and research libraries can play a role in that.

So first I'll say a little bit about some areas where AI is really creating opportunities for research. It's really nice to be able to create an image of a library, to be able to write a poem, or to be able to ask a question of ChatGPT, but there are some really fundamental opportunities being unlocked by AI, and I think it's really exciting to see how it can help us really accelerate and broaden the scope of research. So just to pick a few. The first one there is the AlphaFold Protein Structure Database. This was developed by DeepMind and EMBL's European Bioinformatics Institute, and it is a way of using machine learning to predict the structures of proteins, the way that they fold. And this is really important because it takes a long time to work this out experimentally; it is quite intensive. To be able to do this very rapidly, firstly, frees up costly and energy-intensive research infrastructure for the very specific areas that can't be dealt with by AI, and it also unlocks the opportunity to do more drug discovery and address illnesses that affect the human body by understanding the structure of proteins. DeepMind is also working on fusion.
So this is that perennially long-awaited area of fusion energy, where at the moment we can create fusion reactions, but only for very short times. So AI is helping to predict the conditions that enable those reactions to go on for a little bit longer. And then we've got there "building an AI scientist". This is a vision from Schmidt Futures, the philanthropic organisation set up by Eric Schmidt and his wife, and the vision there is thinking about how we can use AI systems to do science really rapidly, scaling up what any individual scientist can do and unlocking potential discoveries at a great pace. So these are really exciting potential areas for AI to shape research, and I think it's really important to see that these are really valuable uses of AI. We see many uses of AI, some lower risk, some higher risk, and I think it's always very interesting to think about, when we're investing in AI, where we want to put most of our capacity and effort, and thinking about these areas which could be potentially very valuable is a really interesting part of that.

And this is a little bit of a plug for some work we did at the Royal Academy of Engineering to think about this concept of engineering responsible AI, going back to Dame Angela's point about how we engineer in safety from the beginning. Three particular experts that we talked to spoke about their hopes and fears for AI safety, and they talked a bit about the nature of this technology. And I think one thing that's really important to understand about what we call frontier AI, which people also refer to as foundation models or general-purpose AI, is that this is the kind of AI that can really make a lot of use of large amounts of data, including unstructured data. So they pointed out this huge opportunity to make potentially very good use, as we've just seen, of a huge amount of data that's out there. There's a really great opportunity, there's a really great hope, but there are quite a few fears as well. So there are fears around the questions of how we use this safely and how we ensure what comes out of this data is correct. Because the important thing to know is that things like generative AI look at statistical correlations. So large language models, for example, look at how a language works statistically, what's likely in terms of one word following another; they don't necessarily tap into meaning. So that creates some problems, which I'll come on to now.

Before that, though, I think it is worth noting that what I just talked about in terms of research challenges really focused on experimental research. But the fact that we can tap into that vast amount of unstructured data means that we can get into more text-based research. So I think that's a really interesting thought in terms of how we can, if you like, unlock the library: use AI to access and use lots of data of different forms. And I've seen some really interesting work done by the Alan Turing Institute on this, where their public policy team, for example, uses generative AI to look at free-text responses to government consultations, being able to get detail from that that perhaps human readers can't. So again, really exciting opportunities, but no doubt some concerns.

So in terms of the challenges and limitations: Renee really powerfully set out some of those real problems with AI and the data that it's drawn from. For example, the Alan Turing Institute is looking at bias in facial recognition technologies.
Renee showed just how impactful and harmful it is to people when biased AI is used in that way, and there are people looking at why this happens and how it can be addressed. This is not just about harvesting image data. We also know that research itself has not always been inclusive, and many research data sets involving people don't necessarily reflect all groups in society. So we know that a lot of the data and the research that's out there can have significant gaps and significant biases in it. And because, as said, data is the fuel for AI, if there are bias and gaps in that data, there will be bias and gaps in the AI itself. So as a sort of research method, we need to think about how we can do AI in a way that more properly collects appropriate data for the research question, rather than purely relying on big or found data that is already kind of infected by lots of historical biases in society.

We also know that AI has certain aspects which mean that it's very difficult sometimes to be transparent and explainable. Renee talked about the black box. Because of the way that machine learning systems work and train themselves, we don't always know the process by which a machine learning system gets from the data it takes in to the outcomes it creates. And of course, in the context of science, that's really problematic in terms of understanding, for example, causality; in terms of understanding why an output has been achieved; and in terms of understanding why decisions are made. So without that explainability, the sort of robustness and rigor that we'd expect in research and science is difficult to achieve. Because of that, we do have some challenges with reproducibility. That can be because the nature of the algorithms is that they are black boxes; it can also be because we don't have access to the training data that the AI uses, or the code that it uses. It's nice to see there in that reference the point about work being done to try and create that reproducibility, to try and create those standards for being able to reproduce and verify and replicate AI results. So that's very pleasing to see.

And at the bottom there, I've got an image. Lots of the issues that we see with things like generative AI are not new. They're not new in that they are endemic to other areas of AI, but they're not new in society either. That's a lovely image that we used when I was at the Royal Society, in some work on the online information environment, which is an image from the 1920s of what looks like fairies at the bottom of the garden, which lots of people were duped by. But now, of course, we can do that at scale and at pace, and I think there's a risk of both fakes and falsehoods coming out of AI. So we know that AI can be used to change images and to generate new images, and there's a real challenge there in understanding what is actually representative of reality and what is generated. And the problem with that is that the more artificially generated and fake content there is, say, on the internet, the more large language models and other generative AI will feed from that, replicating some of those falsehoods. We also know, as mentioned just before, that because the nature of these technologies is that they predict likely patterns of language, plausibility rather than truth is the kind of guiding principle for how things like ChatGPT create output. It can be incredibly powerful, it can be accurate, but it can also, as they often say, hallucinate.
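That "plausibility rather than truth" point can be shown with a toy next-word model, vastly simpler than a real large language model but built on the same principle of predicting likely continuations from text statistics:

```python
from collections import defaultdict
import random

corpus = ("the library holds rare manuscripts . "
          "the library holds open data . "
          "the archive holds rare data .").split()

# Count which word follows which: patterns of language, not facts.
follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)

random.seed(1)
word, sentence = "the", ["the"]
while word != "." and len(sentence) < 8:
    word = random.choice(follows[word])
    sentence.append(word)

# Output such as "the archive holds open data" is statistically plausible
# given the corpus, but that sentence was never stated and may be false.
print(" ".join(sentence))
```

Scaled up by many orders of magnitude, the same mechanism is what lets a model produce fluent statements with no guarantee that they are true.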
It can say things that sound likely but are simply not true. And that's the nature of the technology itself. So we have to think about what a trustworthy information environment looks like in a context where it is possible to create these artificial outputs, which can be wrong and can be fake.

So how are we addressing all of this? At the AI Safety Summit, one of the things that the UK government announced was an AI Safety Institute. The foundation models that power these technologies are, as mentioned, often grown rather than engineered. So how do we know that they do what they're supposed to do? The AI Safety Institute will have a great role in testing these models. Another thing that has come out of the UK is the AI white paper. Central to the AI white paper's approach is balancing innovation with regulation and policy, and it takes a real principles-based approach: getting into those questions of safety and security and robustness in models, thinking about transparency and explainability, ensuring the fairness of algorithms and the decisions that come out of them, and ensuring that we can have accountability and good governance and contestability and redress for decisions. So the UK has been shaping governance and shaping regulation with those principles in mind, which are really important. And as Renee mentioned, we've seen some groundbreaking legislation coming through the European Parliament just this month. The AI Act was voted on by the Parliament and approved, and that really does some powerful work to address the most high-risk uses of AI, the kinds of examples that Renee was talking about, the harmful uses in surveillance. And it highlights the need to be open about when content is created by generative AI, so we can have that sense of provenance and trust. It also thinks about how you take that broad risk-based approach. But interestingly, it doesn't cover AI for science and research. That's an area where there's a gap. So how does the science and research community come into that gap, think about how it can create good governance for AI, for itself or for wider society, and think about how we can bring that really good research value, that real sense of scientific process and engineering and safety from the start, into AI?

So here are some thoughts. I think there are two main aspects to this: thinking scientifically, and from a research point of view, about what goes into AI, and thinking about what comes out of it; and alongside that, thinking more broadly about how we design the experimentation and research that shapes AI. So for example, in terms of access to data, much of the really good quality structured data you might want to train AI on might be sensitive, it might be private, it might be commercially valuable. Work I was involved in at the Royal Society, for example, involved thinking about how we create safe spaces to enable access to data in a privacy-enhancing way, which can allow analysis across different parties even when they can't share data. So enabling that kind of trusted and trustworthy data environment, I think, is a really important way in which the research sector can shape good AI. But that data is often biased, even when it is from good sources like research, because of the ways that research subjects are selected and the historical biases in doing so. And I've got an image there from the Simulacrum, which is an example of a synthetic data set.
So these are data sets that replicate features of real-world data, but they're not about real people. They have a certain privacy-preserving element to them, but they can also be designed in such a way that really addresses those data gaps and those kinds of oversights in the data that can create this really harmful bias. I know less about this, but I was intrigued by this idea of "guinea pig bots": not just artificial data, but even artificial research subjects. Again, when we know there are certain groups that are hard to engage in research, or have not been traditionally engaged in research, can we do better? Can we create systems that can address those gaps? And then on the right there, you've got an image from Content Credentials, which is a new product from Adobe. And this is the other end of the spectrum. When we've got this output from AI, how do we know it's accurate? How do we know it's true? How do we know where it comes from? So this is one way, for example, of creating provenance for data: showing where images come from and showing that they come from a certain source, so that you can trust them. And I think this is really interesting in terms of markers of trust in an online information environment where we don't necessarily know what's true or false. How do we create those markers of trust? I think tools like that are interesting to explore. But in addition to all these specific tools, I think there's a really important point about how AI should be designed and engineered and researched in a really intentional way. Again, I really like that point about engineering in safety from the start, engineering systems and doing them scientifically rather than just growing them. And that involves things like: what is a research question? What is a legitimate use of data? Where can AI best be applied? What research challenges and social challenges and policy challenges are best addressed by AI? And I think that helps us really think mindfully and intentionally about good use of AI.

So just to finish: where do we see the role particularly of the research system and of libraries? Again, I think there's a role here both in terms of what goes into AI and what comes out of it. A couple of particular initiatives around enabling good quality access to data are, firstly, Smart Data Research UK, which is a UKRI research data programme to enable researchers to get access to a large amount of data, and Health Data Research UK, creating what are called trusted research environments: highly secure data environments for using well-curated data. And I think this is all about really curating and accounting for the quality of data, and the way that data is really representative, to make AI work robust and rigorous. But I think there's also a role in terms of what comes out the other end. When we did our work at the Royal Society on the online information environment, we talked about trusted and trustworthy institutions that can help us navigate that online environment. And libraries, we know, have a key role in helping to see where the good quality information is. How do we tell good quality information from poor quality information? How do we understand the provenance? How do we know when something really is what it seems to be? That is absolutely critical for all of us as citizens, but also for understanding the value and the quality of the research that we produce.
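The core idea behind provenance markers like that can be sketched in a few lines. Real schemes such as Adobe's Content Credentials (built on the C2PA standard) add cryptographic signatures and an edit history; a bare content hash, shown below with placeholder values, is the simplest version of binding content to a checkable record:

```python
import hashlib
import json

def credential(content: bytes, source: str, tool: str) -> dict:
    """Minimal provenance record: binds content to its stated origin."""
    return {
        "sha256": hashlib.sha256(content).hexdigest(),
        "source": source,   # placeholder metadata, not the real C2PA schema
        "tool": tool,
    }

record = credential(b"<image bytes>", source="example-archive.org", tool="camera")
print(json.dumps(record, indent=2))

# A verifier recomputes the hash: any alteration of the content breaks the match.
assert record["sha256"] == hashlib.sha256(b"<image bytes>").hexdigest()
```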
So I think with that approach of the research system, ensuring that the fuel for AI is really well curated and well managed, and ensuring that what comes out of AI is rigorous, robust and has good provenance, I think we can create a much more trustworthy system. And I think we have a real opportunity then to access lots of data across disciplines, across sectors, all of which is essential for addressing some of the biggest research and policy questions that we have at the moment. So I do feel optimistic, but I also feel there's a lot to learn from the research and science community to ensure that we do this in a way that really does create safe and responsible AI. Thank you.

Thanks so much for that, Natasha. I can invite Renee back as well, and we'll now come to the Q&A part of the session. I think you've both given us a lot of food for thought there. And I'd just like to sort of kick off and ask both of you: you've both touched on data and trusted data a lot, and making sure that we've got high quality data to work with. Perhaps, Natasha, if I can go to you first: how do we, within the library and the research library, enable access to that data and information so that it can be used for AI development? And then, Renee, if you could perhaps pick that up as well. But Natasha, first of all.

Yeah, I think that's a really interesting question. And again, going back to the work we did when I was at the Royal Society on the online information environment, one of the points made there was that, of course, libraries hold huge amounts of electronic and digital data: lots of publications from across the web, from across eBooks and so on. And at the moment, there are some limitations on access to that. So I think there's a real opportunity in enabling access to all of that data. I think it's really important to think about that in terms of the nature of it. This is a very, very broad deposit of lots of data of different qualities. And I think that enabling access to that information, enabling access to electronic publications, can open up research and can also help us understand the sources that we've been working with already and help us understand what is already out there. So it's not just data for research; it's also, I think, data about research that we can access through that.

Thanks so much. Renee?

Thank you. Fantastic question. I think I just want to sort of add a plus one to Natasha's response, which is the paywalls. The challenge is, when we're building many of these technologies, we're scraping from the Internet. And when we scrape from the Internet, we're getting the good things and the not-so-good things. We're getting the things that we can use and the very toxic things as well. So those hallucinations, or the kind of trauma attached to generative AI, are all challenges with the data. The other challenge, you know, is what we speak about as the data cartels, who really protect the data that we want. And we would find that the kind of data that we want, most of the time we've got to pay for. And the challenge now with society is that we need data just in our everyday life to make informed decisions. Data in itself is about intelligent decision-making and decision-making accuracy, and of course decision-making equity. So the challenge that we continue to see is that the data that we need is the data that's behind the paywalls, and many times the people who need access to the data to make the decision do not have those kinds of financial resources.
So we've got to democratize the data, and we've got to create more access to the more knowledge-based data, as opposed to just the sort of natural-language types of flows that we're seeing in particular on the internet, where most of the developers go to get the data that they're using, and which continue to create these kinds of challenges that we now have to deal with when it comes to, of course, the bias, the discrimination and the kind of generative and algorithmic trauma that many communities are now victim to.

I was just wondering, Renee, you spoke very eloquently about that sort of trauma, and I suppose working in research libraries and also with archives, we're very aware of those biases that are within our sort of collections and data sets. I suppose I'm really curious: if we're using some of this to help develop AI, how can we make sure that we navigate and curate that content so as not to create further trauma?

I think we do that by always bringing a trauma-informed perspective to the ways in which we're using data. When I spoke about data having a memory, beyond the computational, beyond the statistical, data carries with it every decision made: the good decisions, the not-so-good decisions. So if we want to think about harm reduction within the context of AI and generative AI, we've always got to think about that duty of care, our responsibility to the many stakeholders and of course to society, and we've also got to think about bringing the requisite due diligence and vigilance to the work that we're doing. So there are two things I always speak about, which are a justice-oriented approach to AI and a trauma-informed approach to AI, and if we bring those together, then what we will get is a technology that does the minimum amount of harm, because at the moment what we're seeing is a technology that is doing extraordinary harm to already impacted communities.

I think what strikes me there in what you're talking about is very much that we need to curate and almost bring that human element in. We keep talking about this data as if it's something that is just a neutral set, and actually, in order to get a set that we can really work with, we need to curate and put quite a lot of human effort and thought and thinking into it. And I just wonder whether you've seen any really good examples, good examples in research libraries, of actually putting that care and attention into it.

Well, it really comes back to that human-centered approach when we're thinking about AI, and bringing a human-centered design to the ways in which we are doing it. I cannot highlight a particular library that's doing that, but I think at the moment what we need most from our research libraries is for research libraries to take that lead, because as we try to integrate this technology across our classrooms, we're seeing that more and more our students need those basics: media literacy, data literacy, AI literacy. So we're building something that we're promoting as extraordinary, and what we need most at the moment would be those basic methodologies when it comes to investigating the technology, when it comes to interrogating the technology, when it comes to the understanding that the outputs from large language models are not knowledge-based; they are language-based predictive text.
So more and more, and I made reference to it in particular in the US now that we're into election mode, we're seeing most of the conversations around the election attached to deception, and the fact that democracy is now at risk and democracy is being undermined, because many of the voter population may not have the requisite skills to understand what is a deepfake and what is not a deepfake, or to understand that the content that they're being fed may not be the best content for them to make informed decisions.

And I think, again, one of the questions we've just had come through is: how do we really sort of tackle that AI literacy teaching in order to really improve and tackle some of those things that you were mentioning? I wonder, Renee, what are your thoughts on that, and Natasha, again, in your sort of sphere, what do we need to look at there?

So I'm thinking that the conversations we need to have across library and classroom would be those conversations: how do we build that due diligence which is so required to be able to understand the kinds of risks, what are our rights and our responsibilities, and how do we build resilient and sustainable technology? Most people may not at the moment be rushing to the library for content, because everyone wants to generate that personalized, precision type of content, that interface with the GPT technologies at their fingertips. But what we're realizing now is that libraries have got to sort of step into the space and say, listen, this is what we have to offer at the moment, which would be the due diligence, which would be those critical thinking and problem solving skills that so many of our students don't have, and so many members of society don't have at the moment. So I think libraries also have a greater role to play in the community, in upskilling communities and stakeholders and the various publics in real time as well.

Thanks. Natasha?

Yeah, thank you. I think this is a really hard question, and it's very easy to say things like, you know, we need greater AI literacy to help people understand the information that's coming out of AI systems, the information that's out there. It's quite hard to think about how you practically do that and practically have the reach you need, so I think it's a really interesting question that needs thinking about. Some of this is really just fundamental thinking about how we understand the quality of outputs of research and how we do things like referencing, all of the things that are simply fundamental to research, and some of that has to come through the education system. But I think there's also real value in just showing how library systems and others do that kind of cataloguing, that curatorship, that understanding of the provenance and the source of information, just so that it's a bit more in people's sphere. But to me, one of the things I think about a lot in terms of AI is that it's very easy to take a technology, such as some of the image generating apps or ChatGPT, and play with it and see it as something that is out there; you can do things with it, some things are helpful, some things are dangerous. But it's a human technology.
It's something that we make, and we can make it in different ways. And what I'd really like to see is many more opportunities, especially for younger people, to be making the technology themselves and creating it themselves, in a way that helps open up that black box: oh, this comes from this sort of process, and I can manipulate it and I can do it. Therefore you don't just have this expectation that this is created by the system and you're not really sure what it is; you can start to see that it is a technology that is designed in a certain way, that can do certain things and can't do others. So I think literacy about information and the source of information is important, but also, I suppose, that kind of ability to do some of it yourself, and to create some coding tools and some AI tools, so that you can see that this is a technology with its limitations, with its complexities, and isn't a magical source of information that's either good or bad.

That's quite interesting, because we've got another question through around: do you think that we'll ever get to the point of trusting the data, and actually is that ever a place we'd want to be? So, to both of you; perhaps, Renee, if you'd like to start on that one.

When it comes to trusting the data, I don't think we're going to be doing that anytime soon. We know that data sets are very compromised. We know that data sets really can corroborate any kind of storyline that we want; I always say, if you interrogate the data well enough, it's pretty much going to confess to anything you want it to confess to. So I think what we need would be those critical skills. As a critical data scientist, and someone who continues to challenge the data power structure, the privilege around data, and of course the prejudice and the politics, data is very, very political, and we've got to bring a really sophisticated contextual understanding to the ways in which we're using data. I'm very committed to data; I'm very committed to data when it comes to intelligent public and private decision-making. But I don't think we should ever become that comfortable, to trust data. There are great things that we can do with data, but we've got to keep thinking about bringing those critical skills, and always investigating, because we know data is not objective, we know it is not neutral, we know there are many challenges. And we definitely know, and I said it in my presentation, that historically, and we continue to see it, different communities experience data differently, and different communities have had different relationships with data: some communities very traumatic relationships, and of course some communities not that traumatic.

Natasha?
Yeah, thank you. I think a couple of points. There's a sort of sense in which AI just hoovers up data and comes out with results, but there's a huge amount of data wrangling and data engineering that has to go on. To get data into a usable format, to make it really effective fuel for AI, takes work, and I think that, for example, one of the biggest growth jobs in AI is data engineering. So it's not that the data can just be picked up; for many AI systems it needs to be processed. And I think it's really important that in that processing of data we don't just think about cleaning the data and making it interoperable, which is essential, but also about looking at the gaps and looking at the holes in the data. And this is where, again, I think it's quite interesting to look at things like synthetic data. Some of that data wrangling might need to be about creating better data sets, so that they are more reflective of characteristics across the whole of society, and so that we're working with data that we have perhaps made a bit more trustworthy. But I completely agree with Renee: it's not so much about getting to the point where we trust data, it's getting to the point where we know what questions to ask of it and how to interrogate it. Again, that's part of the data engineering job, but data users, and users of outputs from research, also need to ask those critical questions. If you get to the point where you just think, oh, this looks like it's from a good source, it should be fine, I think we're probably not doing good research work.

I think, again, that's where the libraries very much sit at the heart: it is about constantly asking the question and refining your question, and also checking the provenance of things. But I think what also strikes me, and again it'd be interesting to hear what you think, is just the sheer cost that's involved in creating this trusted data in order to have AI that meets the sort of human requirements that we're looking for. And I'm just wondering, Renee perhaps first: can we afford to create this data that we really need, and how do we best go about it when we're trying to bring in different groups across society, getting them engaged and getting them to understand the value of engaging and contributing to the data? Do you have any thoughts around that?
Certainly. I think when we look at the social cost, when we look at the environmental cost, when we look at the ways in which data is really creating some challenges: as I said, as a data scientist and a critical data scientist, I'm committed to good data science, and to the ways in which we can really build the processes and the systems and the solutions that we need, and the ways in which we need to just reimagine data, because there's so much we can do with it. But the challenge is this: the kinds of risks and the kinds of crises that can be created with data sets that go rogue, or with data sets that we don't bring the requisite kind of due diligence to, come at a cost that we cannot really sustain as a world. Data is being used to refine and redefine and reimagine just about every business model, as is AI, and of course generative AI. And one of the things I always say to my data science students is that as data scientists you are now architects of the future; you are now designing a future, and you have got to really think about your social responsibility. So this is why at the moment we're seeing those connections between data science and social justice and racial justice, environmental justice, even design justice: the ways in which we need to bring a human-centered approach to the kinds of user interfaces, the ways in which we're designing for accessibility, just representation, visibility. These are all so critical. So the thing is that we don't have the luxury, or we don't have the comfort, to sit back, because AI is so powerful, it's so pervasive, and of course gen AI even more so. So we've got to do it right, we've got to do what's right, and we've got to do what's right in real time. And it really requires us to do the cost-benefit analysis and look at the impacts, the long-term future impacts, and know that in this present moment there's a responsibility that we each have that just really requires a duty of care, which is so, so important at the moment.

Natasha?
Yeah, thank you. One of the things that strikes me a lot is the incredible energy intensity of holding data, storing data, and analyzing data; some of the best ways to analyze data, using things like privacy-enhancing technologies, create even more energy use. I think there's a statistic that every ChatGPT query uses half a litre of water. This is all completely invisible to the user, and I think it creates an opportunity to think more mindfully about the data we hold, the data we use, and how we use it. There is an environmental cost, a computing cost, a financial cost, and potentially a real social cost of using data in certain ways. There has been a lot of work using big data, learning from data opportunistically, in a way that is quite organic, but I think there's a need to move to a much more intentional, well-curated, and well-designed process, so that you're using the data you need to answer the questions that are important, rather than haystacking data and thinking, oh, we might get some insights from this data if we hold it for another 20 years. That is costly in many ways, and it is not necessarily good-quality data. So I think being, I guess, more frugal with data and the way we use it can help create a better system.

That's really interesting, because it echoes a lot of the conversations we have around digital preservation and some of the other areas that bear on this. I almost feel that when we use ChatGPT it ought to come with little symbols saying you've used so much water and so much energy, just to make us rethink, as with so many other things, how much we use it without realising that switching it on and asking that question has an impact.
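The "little symbols" idea can be sketched in a few lines of Python. The half-litre-of-water figure below is the statistic quoted in the discussion; the energy-per-query figure is a placeholder assumption added purely for illustration, not a measured value.

```python
# A back-of-the-envelope footprint indicator, assuming the per-query
# figures below (water figure as quoted in the talk; energy figure is
# an illustrative assumption only).
WATER_L_PER_QUERY = 0.5      # litres, figure quoted in the discussion
ENERGY_WH_PER_QUERY = 3.0    # watt-hours, placeholder assumption

def footprint(n_queries: int) -> str:
    water = n_queries * WATER_L_PER_QUERY
    energy_kwh = n_queries * ENERGY_WH_PER_QUERY / 1000
    return (f"{n_queries} queries: roughly {water:.1f} L of water "
            f"and {energy_kwh:.2f} kWh of energy")

print(footprint(200))  # 200 queries: roughly 100.0 L of water and 0.60 kWh of energy
```

However rough the constants, surfacing a running total like this is one way to make an otherwise invisible cost visible to the user, which is the point being made here.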
I've got another question here from the audience which I'd like to put to both of you, building on the point about content being fed into the tools: what strategies do you envisage for helping AI tools discern pseudo-scientific literature from mainstream academic literature? Both of you have touched on this, so perhaps, Natasha, would you like to start?

Yes, thank you. I think primarily it's not just about assisting the AI tools, it's about assisting the developers and the users of AI. A lot of this is really about those human skills of knowing where the good-quality data and good-quality evidence are, and using them; we have good knowledge of where to find good research. The work I was involved in at the Royal Society looked particularly at misinformation in a scientific context: questions around misinformation about climate, about vaccines, and about technologies such as 5G. One thing that was really useful about that work was understanding the kinds of forces that shape misinformation. Sometimes it is malicious; sometimes it is people coming together because of a joint interest or concern about something; and sometimes it is a legitimate reaction to things carried out in the past that make people distrustful. So the point is that understanding the information people get, where they take it from, and why they use it is something that AI developers need: it's not just something for AI tools. The other thing is that I'm really interested in ideas around markers of trust in the information system. We're very used to some markers of trust in the cyber-security area: we know, for example, that a web page is secure if it has a padlock, and that certain payment systems are secure if they ask certain questions. I mentioned the content credentials tool from Adobe; Adobe in particular have a strong view on that one. I think we need to move towards a situation where there is that kind of visible marker that people can see and say, this is right, much as you'd see a standards mark on something to show that it's safe. It will be really interesting to see how that applies to information and data, so people can say: this is verified, it comes from a good-quality source, and it is of a certain standard and a certain quality.

Thanks. Renee?

Sure, and I will just add to that, because that was a brilliant answer there, Natasha. I would add what we're seeing now in particular when it comes to data poisoning, which is what many creators are using to fight back against generative AI, poisoning the data sets themselves. Within that context, more and more we are going to have to consider intellectual property, creative rights, and copyright around data, and respect the fact that many of the data sets we're using, many of the data sets we require to do this kind of scientific work, demand what I call the ontology, genealogy, and provenance of data. That is so critical, because data is now like our DNA, unfortunately, and we're seeing new relationships with data that also need to be curated. We're also seeing the interaction between machines and human subjects when it comes to research, and a reimagining of the Belmont principles is required as well. There's just so much happening in the data space that requires us to really and truly reimagine our relationship with data, and to understand that many of the developers, designers, and technologists doing this kind of work may not have that disciplined way of thinking about scientific research, because the focus is on bringing a product to market, on the bottom line in real time, as opposed to really solid scientific research behind the technology.
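The simplest form of the verifiable-provenance pattern behind these "markers of trust" can be sketched as follows. This is not Adobe's Content Credentials (which embed signed manifests in the content itself); it is only a minimal checksum check, with a hypothetical file name and digest, showing the general idea of confirming that what you hold matches what the source says it released.

```python
# A minimal provenance check (assumptions: file name and published
# checksum are hypothetical): verify a downloaded dataset against the
# SHA-256 digest its provider publishes alongside it.
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in chunks so large datasets need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path: Path, published_checksum: str) -> bool:
    """True if the local copy matches what the provider says was released."""
    return sha256_of(path) == published_checksum.lower()

# Usage (hypothetical file and digest):
# ok = verify(Path("survey_2024.csv"), "9f2a...")
```

A check like this catches silent tampering or corruption after release; richer schemes such as signed manifests add who issued the data and when, which is closer to the standards-mark idea discussed above.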
Thanks for that. I think we've gotten to the end of most of our questions now, so one final one from me: if there's one thing that research libraries could really do to make a difference in this area, what would you encourage us to focus on? Renee first of all, and then Natasha.

I will say, in real time: invite yourself to the AI party and really take up some space. Of course, it calls for a rebranding of the research library methodology, and an understanding that at this moment something more is required. Really have the courage to go out there and say, we have a very significant role to play when it comes to designing, developing, and deploying responsible, trustworthy AI; and as advanced as we get with the technology, we cannot forget the basics, because the basics are what we need to build on.

Thank you. And Natasha?

Yeah, I think a key thing about responsible AI for me is using it for really important issues: addressing the big questions we need to tackle in society, looking at how we address climate change, looking at how we address inequalities. That needs access to really diverse data, a wide range of data from different sources, and the library has all of that; it is the specialist institution, and it spans all disciplines. I think it goes back to the point we made earlier, and that Renee made really powerfully, about democratising access to that data, getting it open, and enabling these powerful tools to look across different disciplines, different sectors, and different areas of life, and to use that data for the most important questions that we need AI to help us answer. So that would be my answer.

Well, thank you both very much. I think you're also reminding us that a lot of what librarians do at their core, around provenance, trust, and managing data and metadata, is really at the heart of what's happening: we do have all these skills. But, Renee, you put it very powerfully: we need to re-articulate that, so that people stop thinking of the image you brought up, Natasha, of the research library as those ranks and ranks of books and that smell of old leather, and see instead that we have a role and are very much active in the modern day and age; that a lot of what we do is managing digital data; that we do have many of these key skills; and that we shouldn't be afraid of talking about ourselves as part of the data science process, with a real role to play within it. So thank you both for reminding and highlighting that for all of us, and I think that's a call for all of us to take forward.