Good afternoon, everyone. Thank you for joining us this afternoon here at the first session at CNI Fall '19. My name is Jason Griffey. I'm gonna be leading us off. I'm also here with Keith Webster. We are going to be kind of splitting this roughly in half: I'm going to be starting us off with a sort of introduction and some big-picture sorts of thoughts, and then Keith is going to be following up with some more specific work that he is involved with at Carnegie Mellon and elsewhere. So thank you for, you know, joining us.

First, by way of introduction, for those of you who may not know me: I am a librarian, so most of my perspective on technology comes as part of that deep-seated identity piece of myself. I'm trained as a librarian, but my background is also as an academic and a researcher. I've been a fellow and an affiliate at the Berkman Klein Center for Internet and Society at Harvard in the past, and my current role is as Director of Strategic Initiatives with NISO. And so I'm standing in front of you today largely because I was the editor and partial author of Artificial Intelligence and Machine Learning in Libraries with the American Library Association last year.

So what am I going to talk about for the next 25 minutes or so? I'm gonna move pretty quick. I'm gonna give a very brief overview of what I mean when I say the word AI, right? When I say artificial intelligence, what is it that I'm talking about? Because it can mean a lot of things. I'm gonna give some examples of AI in the world. I'm going to give some very specific recommendations for things to look at where AI impinges upon the world of libraries and information, the information professions, publishing, and elsewhere. And then I'm gonna go over some of my concerns and conclusions about where we should be thinking: what should our five-year, ten-year kind of outlook be? What should we be worried about?
So we're gonna start with just the state of artificial intelligence. When I say artificial intelligence, I'm using it very broadly to encompass things like machine learning and deep learning and neural nets and sort of the entirety of the spectrum of what we technically mean when we say the words AI. And I'm also talking about a very narrow type of AI. For those of you that may be imagining Terminators and Skynet, that is not what I'm talking about, right? I'm talking about weak AI rather than the corresponding strong AI. Strong AI is usually understood as, you know, sort of a general intelligence, something that is capable of reasoning like a human. That's not what I'm talking about at all. I don't think we're there. I'm not sure we will ever be there.

What I'm talking about is what's understood as weak or applied AI, where we have a machine that is trained to do one thing very, very well, usually better than humans are capable of doing that particular thing. But if you ask it to do something else, it falls down and falls over, right? If you asked IBM's chess-playing AI to play Go, it would not be able to do that, because those are two very separate sorts of endeavors. And so when I talk about AI, I'm using the weak AI sense: something that is trained on a specific corpus with specific data to do a particular thing.

One of the things that is different about AI, machine learning, and technology in general, compared with some of the other areas that we talk about in information science on a daily basis, is that AI and machine learning are heavily driven by advances in technology, and technology doesn't change in a linear fashion. Technology changes in an exponential fashion, right?
So as it gets faster and better, it gets faster and better. And the way that I like to frame this when I do talks is that if you think about the moment we are in right now, this is as bad as AI will ever be, right? It will only ever get better from here. And it will only ever get cheaper, and it will only ever get easier to do. It will never get harder. It will never get worse. That's what exponential means. All right, and it only ever gets, you know, faster and faster. That curve is not, again, straight. It's a hockey stick. And so the changes that we have seen over the last five to seven years in machine learning, how it has gotten so much better so quickly, well, you know, buckle up, because it only gets faster.

So what are some of the active, modern, current uses of AI that I think we should be aware of, thinking about, you know, kind of taking advantage of every day? There's a number of them, and pretty much everyone in here probably has an AI in their pocket, right? How many people in here have a smartphone in their pocket right now, or in your hand if you're tweeting? You know, that's fine. Right, almost everyone has a smartphone. If you have a modern smartphone, it has a dedicated AI engine inside of it that is working every time you put it in your pocket to organize things like your photos.

Right, so this is an example of my photo library where I searched for the word "beach." It gave me these pictures not because there is someone that has applied metadata to this set of photos that have, you know, in one of the fields the word "beach." It is because the AI has been trained on a corpus of millions and millions of photos to look for beach-like things. And when it looked at these pictures, they matched certain criteria for beach-like, and it returned these to me. Right, so this is not the result of any sort of manual classification.
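As a rough illustration of that kind of search without hand-applied metadata, here is a minimal sketch. It assumes each photo has already been reduced to a small feature vector; the feature names, the numbers, the file names, and the threshold are all invented for illustration and are not taken from any phone's actual system. It learns a "beach" prototype by averaging a few known beach examples, then returns the photos whose vectors are most similar to it.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def centroid(vectors):
    """Average a list of vectors into a single prototype vector."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def search(photos, examples, threshold=0.9):
    """Return names of photos that look like the examples, best match first."""
    prototype = centroid(examples)
    scored = [(cosine(vector, prototype), name) for name, vector in photos.items()]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]

# Hypothetical features per photo: [sand, water, sky, indoor]
BEACH_EXAMPLES = [[0.9, 0.8, 0.7, 0.0], [0.8, 0.9, 0.6, 0.1]]
PHOTOS = {
    "IMG_001": [0.85, 0.9, 0.8, 0.0],   # shoreline
    "IMG_002": [0.0, 0.1, 0.0, 0.95],   # living room
    "IMG_003": [0.7, 0.6, 0.9, 0.05],   # dunes under a big sky
}

print(search(PHOTOS, BEACH_EXAMPLES))
```

Nothing in `PHOTOS` carries the word "beach"; the match comes only from similarity to the training examples, which is the point being made. A real system would learn its features from millions of images rather than use four hand-picked numbers.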
This is all happening automatically. It's all happening automatically every day, all the time, in your pockets. So this is the sort of AI that we are all living with every day that we may not think about as machine learning and AI.

That's a fairly trivial example of AI, but it becomes very, very important. Even something as trivial as image analysis becomes incredibly important when you think about things that are being done with AI in medical diagnoses. This is a quote from Geoffrey Hinton at the University of Toronto: "If you work as a radiologist, you're like Wile E. Coyote in the cartoon. You're already over the edge of the cliff and you just haven't looked down. There's no ground underneath. It's completely obvious that in five years deep learning is going to do better than radiologists. It might be ten years." So he said this in a New Yorker article in 2017, two years ago, almost three years ago. And it should come as no surprise that we are already seeing results where AI trained to recognize cancerous cells, melanomas, breast cancer, et cetera, anything that relies on imagery, is doing so at much higher rates than human experts.

So how many of you are at universities? How many of your universities have radiology programs? Awesome. You should probably worry about that. This is, again, this was 20 March of 2019. There's a lot of replicable studies showing that AIs are better at image-based diagnoses. As a matter of fact, they are in general better at diagnoses of all types once they are trained. This is a story, "AI equals human experts in medical diagnoses," and this is general symptomology, not just epidemiology, not necessarily cancers. There's a lot of this sort of thing that is going on in the world now.
This is a happy, positive example. There are lots of really terrible, awful examples, and those may be the ones that people often think of when they think of AI. AI is only as good as the data that is used to train it. All right, you can ask any question you want, but unless the data that you have has the answers wrapped up inside it, you aren't going to get what you want. And if the data is biased, then you are going to get exactly the opposite sorts of answers from the ones you want. And there are lots of examples in our modern world where AI is doing exactly this sort of thing.

How many people in here knew that there were AI-based systems that judges in the US use in order to determine sentencing guidelines for people? A number of you, right. What trained this AI? What data did they use? Historical sentencing data, which is absolutely fair and unbiased in every way, right? No. Right, the historical sentencing data in the US is awful. It's racist. It is a terrible set of things that you would use to actually determine, you know, the length of time someone should be incarcerated. And yet we are willingly using historical data in order to determine modern sentencing rates for people that are in that situation in the US.

We have examples of big companies making similar sorts of data mistakes. Two of my favorites: Google, very early in their photos experiment (Google Photos is a lot better now than it used to be), was classifying African-American individuals as gorillas. Why? Because they used bad data and did not train it appropriately, and because they did not look at the results of their own work. It was gross oversight for this to happen.

And my own personal favorite example: Microsoft determined that they were going to do an experiment in AI a few years back. This was about four years ago, five years ago.
They created an AI that they wanted to learn how to act like a human, to behave like a human, talk like a human online. And they thought, well, we need people to feed it data to teach it how to be human, right? So they took this bot and they put it on Twitter, because that's, again, a perfectly rational and reasonable place to teach something to be a human. And so this bot, Tay, took about three hours to become a white supremacist neo-Nazi. Just a horrible, awful, evil thing. Because it learned from the data that people gave it, and when people said horrible things to it, it incorporated that as part of its personality and it parroted those things back. Data matters.

AI can also be used as a tool specifically for bad things, right? We are all in this room certainly aware of the misinformation and disinformation campaigns that have been regularly going on on our internet for a few years now. AI can be used to enhance and incorporate evidence that is entirely false and yet seems to support a point of view.

Let's do a test. So AIs at this point can create humans kind of from whole cloth. They can create facial structures of humans, entirely false and fake ones, but they look real. So, show of hands: one of these is an AI-generated face and one of these is a human. So how many people think the male is AI-generated? How many people think the female is AI-generated? More. Okay, you're all wrong. You know, they're both AI-generated. Neither of them is real. These are entirely fake people. As are these. They're entirely fake people, right?

When I started talking about AI several years ago, this sort of work, the actual generation of a fake person, took a lot of horsepower. You had to have specialized rigs.
You had to have access to high-powered servers. You had to have high-bandwidth data. You had to really have an infrastructure in place in order to do this sort of work. Now it is a website. Literally a website: if you go to thispersondoesnotexist.com, you can create entirely fictional people for use in your advertising. And it has become fake people as a service at this point. You can, through the website Generated Photos, download a hundred thousand fake people for use in your advertising, promotional material, you know, websites, whatever you're doing, right?

The ability to create fake people, the ability to create fake voices: Google and others have AIs that can disassemble and reassemble phonemes at this point, so that you can, through just a few minutes of audio, make an individual say anything you want them to. So you could take my voice and make me say horrible, awful things. Do the same to a politician, right? All through the use of some very, very simple AI tools. And you can combine the two, visual and audio, in order to get something called the deepfake, where you have entirely fictional videos being produced with recognizable human faces on them. You know, you could make Mark Zuckerberg do anything you'd like in a video. So, yay, what a brave new world.

What does this mean, though, for our world? Yeah, it's only gonna get better. That's right. No fear. Only gonna get better. So what does this mean for our world, for libraries, for research, for scholarly publishing, for kind of all of this that we live inside?
Well, there's a lot going on right now. And I'm gonna point out one specific tool, and then I'm gonna rush through a whole bunch of other tools and try to paint a picture for you. The one tool I wanted to point out in specific is, as far as I am aware, the only one built by a librarian at a library for public consumption. This was built by a librarian named Andromeda Yelton when she was still at MIT. HAMLET stands for How About Machine Learning Enhancing Theses. This is an entirely machine-learning-driven discovery engine for electronic dissertations and theses at MIT. It uses no subject headings, uses no human-generated metadata, anything like that. This is entirely using machine-learning-driven semantic analysis of those digital goods, and it is miraculous. It is a completely different beast than the sorts of discovery systems that we have had in the past.

Quick example of how it is different. The first time I had a chance to test it, the first thing I did, as I do with any system, was try to break it, because that's what you do with systems: you try to break them. So one of the things you can do with this is actually upload text of your own. So you could copy and paste a paragraph that you thought was particularly interesting, and it would find semantically similar works, right? So I fed it the entire book Peter Pan by J. M. Barrie. Like, I figured MIT theses are probably gonna choke on this. They don't have anything, you know. What are they gonna do, give me rocketry for flying? I don't know. Like, what is the match gonna be?
So I gave it Peter Pan and just said, show me what's similar, and it gave me all of the creative writing theses from MIT. Of which there aren't many, because it's MIT, and the creative writing department is, you know, not awesome, but, I don't know, not a large corpus there. But the machine learning system was smart enough to realize that Peter Pan is a work of fiction, and that the other things in this corpus that most relate to it are other works of creativity rather than rocketry, right, or other engineering topics. Fascinating. So you should take a look.

The rest of the tools that I want to whiz through all have to do with a sort of disassembling of the research process. Libraries, publishers, et cetera, learning management systems: we all have in our head this research process that individuals undertake as a part of their learning, right? You find a topic, you search for that topic, you locate sources, you evaluate them, you make notes, you write the paper, you cite it, you, you know, proofread it, you edit it. There's this process of scholarly creation that happens. And a while back I sat down and thought, how many of these are done by AI right now? All of them. Every single one, right?

So you have AIs like Cara AI, Wisdom AI, Diffio AI, Iris AI. All of these are search engines that you give a topic to, and it does a search, an analysis, a retrieval, and a set of citations for you, all automatically. No sort of interface for you. AI takes care of all of it. All you do is give it your rough idea of the topic you want.

All right, so now you've got your sources. You want to take notes and actually write the thing. Okay. Well, there's an AI called Resoomer that will take a look at all of your papers and automatically summarize them for you: give you just the high points, tell you what the important pieces are, give you the thesis statement, you know, kind of break it all apart for you. Scholarcy does the same sort of thing: it creates
flashcards from a document to give you just the key facts about what it was that you fed it. You've got AI Writer, which gives you, again, unique content: it takes topics or germs of stories that you give it and generates entirely AI-written papers. You've got EssayBot, which does the same sort of thing, and which has my favorite tagline of any of these: "Finish your essay today. No plagiarism." That's my favorite. Like, you have an AI writing for you, but no plagiarism. That's the important part. Yeah. Anyway, EssayBot: again, you feed it stuff, it collates the information, rips information apart, rewrites it for you, and gives you entirely original content based upon what you gave it. And then you have, again, Articoolo, which is my favorite example of a tech name that has never been part of a Romance language, that does the same thing: you give it research and it gives you a paper out the other side.

Proofreading and editing: probably somebody in here is using Grammarly, because it's an incredibly popular service to do grammar checking and proofreading and everything like that. It is all driven on the back end by a machine learning system that does English grammar and sentence structures. Writing Assistant, "most powerful writing improvement software in the world," is also machine-learning-driven, doing corrections on your grammar.

So more or less every aspect, every single part of a traditional research project, you could outsource entirely to an AI now. That would be horrible right now, right? Like, nobody expects that to come out the other end and be a masterpiece of scholarship. But as I said, right now is the worst this will ever be. And while right now it is garbage to do that process beginning to end, in five years it's gonna be better, and in ten years,
it might be good enough to fool us. And that's something that we need to think about.

The thing that I am anticipating, and that I expect to come very, very shortly, is a sort of end-to-end solution that someone is going to patch together, right? I did say earlier that I was interested in the weak AI, right, just the individual little ones trained to do one thing, and that's true. But if you chain those together, if you put a series of weak AIs all together that can take, right, something from beginning to end and pass it from one to the other, it is more or less the same as a stronger AI system. And so these personal AI assistants are a thing that I think are coming very, very soon.

AI is dependent on data. But if you capture a student very early in a learning management system or something like that, even as early as high school, then you're going to have data on them, on their interests, on their research projects, as they move through college, into graduate school, and finally, you know, into research professorships or research institutions. You are going to have a decade of data. And I say you; we might not, someone will, right? Libraries might not be able to do that, but somebody is. And that decade's worth of research information is going to allow that individual to have a personal AI assistant that is far more in tune with what they do and what they want than any sort of reference librarian interaction might be. They're going to have historical data that just is an incredible amount of stuff.

It worries me that this sort of project is definitely coming and yet we haven't, libraries haven't, thought about how to get around it. And it may not be one of our traditional vendors. It may not be, you know, one of the vendors here in the room. Amazon is certainly making a play for this sort of thing.
These are the big boys that are playing in this space. Amazon has an Alexa skill for educational technology applications, including Coursera, Canvas, Blackboard, and all of the learning management systems that you may have on your campus, right? And if this is integrated through something like Alexa, then Amazon is scarfing all that up, and they're gonna have longitudinal data that we could only dream of.

So, conclusions. Um, I don't really have conclusions. I have concerns, right? I have things that I worry about.

I worry about privacy implications. I worry about privacy implications because AI necessitates data, and if we are going to play in the pool that is AI, we have to have water. And that worries me, because any data is dangerous data.

I worry about the historical record of automation moving funding from labor to capital, right? Traditionally, when things are automated, you lose individuals and you gain things; you gain capital, right? As AI increases and we offload responsibilities to it, I worry about the chance that we're losing personnel and we're gaining this outsourced thing.

I worry about us repeating the mistakes of history, because in the same way that judicial systems rely on historical data, any sort of search, retrieval, or discovery engines that we build will have some degree of historical data in them. And if we only move in with the data we have and aren't very careful about it, we may simply be codifying old procedures and old biases.

I worry that AIs are in many ways black boxes: that we have data on the front end and answers on the back end, and some of the stuff in the middle is real fiddly and not understandable by us. When we start offloading that sort of decision-making out of a library and out to a vendor, or out to a corporation like Amazon, I worry that that black-box nature of it allows for fiddling with information flows in ways that don't make me comfortable at all.
I worry about the externalization leading to ethical decisions being made in those spaces and not in our spaces, not in libraries, but in situations where the corporation is making what we would consider an ethical decision. That bothers me.

I worry that the focus on these AI agents only increases the filter-bubble aspect of our informational resources. We've seen this filter bubble especially in our political discourse over the last several years. If you only ever get the things you're interested in, you only ever get the things you know, and that bubble only increases and strengthens. And so I worry that AI-driven discovery and personal assistants only drive that filter-bubble effect.

I worry about incentives that are placed before vendors and corporations in a world where discovery and consumption are data-driven. We have a historical precedent in the world of news and journalism and how that has been disincentivized in so many ways over the last ten years. I worry that other pieces of the information ecosystem may follow.

And in conclusion, Amara. Roy Amara was an American researcher and scientist. He said, "We tend to overestimate the effect of a technology in the short run and underestimate it in the long run." It is entirely possible that I'm overestimating right now. But AI, and the effects of AI and these AI systems, is going to be something that isn't going to go away. We are going to feel it for a long time.

And then, if you will allow me one tiny, tiny bit of promotion: if you would like to continue this sort of discussion, NISO is actually having its own conference for the first time in February, NISO Plus; niso.plus if you want to get some more information there. And then this is all of my information. I think we'll save questions to the end. I'm gonna let Keith come on up and take over. Thank you, everybody.

Hey, good afternoon, everyone. Thanks, Jason.
I'm not going to make as many general points about AI as Jason did. We'll let his presentation stand for itself. I'm going to talk more about some of the specific issues we're wrestling with at Carnegie Mellon. I was calculating this morning: this is my 43rd conference presentation of the year. I'm glad the holiday season is upon us. I am not going to talk about the library of the future or about our deal with Elsevier. Adam would keep me right on that one. Rather, what I'm going to do is talk about AI, as I said, in the context of my institution, for reasons that I'll explain. And really what I want to convey is a sense that AI exists across the academic landscape. It's no longer rooted entirely in computer science. We find it in the fine arts and the humanities and elsewhere, and from a library perspective we need to support research and education and present our core library services with all of that in mind.

So how I'm going to get there is: talk a little bit about AI; frame that in the context of the work we've been doing the last couple of years around open science; say some words about AI as it has evolved at Carnegie Mellon, the education and research world that we find ourselves supporting, and some of the library activity we're seeing; and hopefully get through that in time to allow for a few questions.

When I dig around looking for images for presentations, I find I get two sorts for AI. One is the more algorithmic, mapping type of stuff. Or I can talk to my colleagues in computer science, who will send me screeds of algorithms to play with, and sure, these are good fun. I'm trained as a computer scientist. I appreciate that sort of stuff. But on the other hand, I can get the cyborg type of things. Jason made the point about the judicial system leveraging AI algorithms, and about radiologists and other health care practitioners. So we get the sense of "cyborg becomes professional." When you try and do this for librarians,
you get things like this. Or this, slightly more ripped, maybe. But there's almost the sense that there's some sort of, you know, great brain thing going on, and the more you dig around, the more you realize that it's difficult to get AI images that don't reflect deep intelligence. It's all about the brain and technology, and as I say, it's impossible to get anything that isn't really reflecting the sense of supreme intelligence. I had to get one Brexit joke in. Come on, Thursday is going to be a big day, but that's maybe the best outcome.

As I say, I'm not going to talk about AI as it applies to libraries in the abstract, or indeed about AI and higher education. Jason has brought us a great book. Joseph Aoun's book on AI and higher education really is a fantastic read. But I do want just to make some framing remarks about how I view AI, and I really like the way that Oracle has presented this evolution from artificial intelligence through to machine learning and on to deep learning as the more contemporary approach. I'd argue that machine learning in our space, driven by massive-scale computation of correlations and supported by high-powered computers, is most relevant to us. It's about extracting minute but highly relevant correlations from massive data sets, not about computers trying to behave as humans do.

I don't have time to articulate an argument on this in any depth, but I hope we can agree that, properly done, however this plays out, it is good to have a different intelligence to ours working alongside us. We argue this in the diversity space. We argue this in the AI space as well: diversifying the views that come together to help us form insights and make decisions. In the next decade,
I think we'll see a growing focus on how AI can help liberate us, just as the machine did in the second industrial revolution. William Ross Ashby coined the term "amplifying intelligence," sometimes referred to as cognitive augmentation, in his An Introduction to Cybernetics, published in 1956. That was one of the earliest works about AI as a discipline.

Recently, and I said I wasn't going to talk about Elsevier, but here I go, there's no getting away from it, Elsevier produced a very helpful overview of AI as a discipline and showed the huge growth in AI research over the past 60 years. But what it points to is the growth over the past five years of scholarly output in this field, fueled in part by a huge investment in AI research in China, but also reflected in growth in Europe and in this country. And it's nice to really understand that AI is not a discipline in itself but really is a collation of a variety of different fields. If we look at this growth, we see that the volume of papers over the last 20 years has ticked up, and as I said, the past five years really have illustrated that. But what we've seen is the emergence of machine learning, which was kind of in the middle of the pack 20 years ago and has become the leading field from the research perspective, alongside neural networks, and that perhaps is indicative of where the academy is focusing its interests at this moment in time. I'll come back to that shortly.

But at this point I do want to make a few remarks about open science, and this is where I begin to talk about some of the stuff we're doing at Carnegie Mellon. It's perhaps helpful to frame this in the context of the OCLC University Futures report. We were ranked one of the ten most research-intensive in their typology of institutions, and I mention that just to illustrate the volume of research activity on campus, not just amongst the faculty but indeed amongst our undergraduate students, many of whom are publishing in top-tier journals during their bachelor's degrees.
We've invested heavily in open science infrastructure in recent years. To me that really is about delivering scientific excellence at scale. Machines will play a key role in delivering a new generation of services for authors, reviewers, and editors, built upon the principles of open science, and we're already seeing a lot of interplay between the products of research made open and computational science. In essence, machine learning, the dominant component of AI research today, needs data, and the more open that data are, the more activity the machine learning researchers can engage in.

I realized we really had crossed the Rubicon with open science in our libraries at CMU when a couple of my colleagues created a LibGuide. That's the test: when they've got a LibGuide, it's a real thing. And if you want to understand more about how we're doing this, please do visit that site.

Our institutional repository, KiltHub, has started to become an integral part of our support for AI activity. To demonstrate replicability and consistency, the algorithms, explanations, data, and training sets should be archived by the institution, and that is something that our repository architecture has been explicitly configured to support. AI can only be as good as the ecosystem services provided by the data management that helps our researchers retrieve and interrogate data. An example of this from the neuroscience space at CMU was the BOLD5000 fMRI data set, which is being used by our AI researchers but had been generated by our neuroscience researchers as part of their own work. And you can read that news story online.

Much of our work has taken its inspiration from OCLC's model of the evolving scholarly record, and I don't have time to explain that in depth, but our service model on the left and the recognition of the open science tools from our colleagues in Germany on the right really illustrate this rethink about what a library is, and the recognition that gone are the days when researchers would come
through our doors to use our collections. Our collections have moved to the cloud, and we have had to build our services and the tools we deliver to researchers around their workflows: what Lorcan Dempsey describes as the inside-out library. I'm just going to let the photo op take place.

So we've begun to dissolve many of our traditional services and think about how we can take our offerings into the lab, into the researcher's office, into the classroom. Cliff mentioned the executive roundtable this morning when he spoke a few moments ago, and many of the remarks that you'll see documented in his report really reflect the culture that we are experiencing. I spoke with my colleague Huajin Wang at this conference last year, and our detailed views on open science can be seen in that presentation.

We've had a couple of open science symposia run by the University Libraries at CMU over the past couple of years. We just released the videos of this year's symposium on YouTube on Friday. If you're interested in finding out more about our work there and how it's playing into the AI and machine learning space, do please have a look at that. Gary Price told me earlier.
He'd put this on infoDOCKET, so that's another way to find the link to that. So, AI at Carnegie Mellon. Just to touch back on the Elsevier report for a moment: whilst some of the Chinese powerhouses really are dominating the world in terms of scholarly output, in the United States, among the five institutions at the bottom, you can see that I've predictably called out Carnegie Mellon as the most productive in terms of scholarly articles. Not many people think about that; they think about places east and west, but Carnegie Mellon is the most productive in terms of scholarly articles, and our engagement in that space can be tracked back to the 1950s. In 1955 three professors at Carnegie Mellon, Allen Newell, Herb Simon, whose picture is there, and Cliff Shaw, wrote the first program deliberately engineered to mimic the problem-solving skills of human beings, and they are credited with having developed the first artificial intelligence program even before the term AI had been coined. Early computer users at Carnegie Mellon gravitated to questions of human and computer logic, what Herb Simon, who won the Nobel Prize for his work, termed intelligences artificial and natural, and which he investigated through observations of students interacting with logic puzzles. Out of this grew the university's reputation, leading to the formation of our Machine Learning Department around about 20 years ago and more recently a strong investment in deep learning. I'm just mimicking the Oracle model here. If you're interested in our deep learning work, there's a great series of lectures on YouTube from those working in that space. As we think through the situation of having great depth in AI and related research, some of my colleagues have developed this model, the AI Stack, which is trying to show how we are deploying artificial intelligence research across campus and how strategic investment decisions are being made. The idea here is that we are working in an environment where there is so much expertise that we don't
need to be skilled in every area. The intent is to focus on one area and call upon others for help, and we have individual departments focused on most of the themes called out in the stack, such as our Machine Learning Department, our Social and Decision Sciences department, Human-Computer Interaction and so on. And that AI Stack appears around about the frontal lobe region, I think, or the temporal lobe region, in this model of AI research at CMU, which broadens to complement the AI Stack with expertise in fields such as robotics and design. So, a lot of AI activity at CMU. The interesting point is that it's scattered in disciplines you wouldn't necessarily think about. Our Center for Human Rights Science, for example, is working on AI and human rights. Our Department of Philosophy is world-leading in the ethics of artificial intelligence. I didn't have time to create a slide, but our creative writing program is top-notch, and I'm intrigued as to whether they're using AI. I must pass that on. We have specialized interdisciplinary institutes such as the Block Center for Technology and Society, looking at analytics and ethics. Our business school is very focused on things like the business of health care and transforming that with machine learning, blockchain-type activities and so on. The College of Engineering, perhaps more expectedly, is working heavily in AI and how big data can transform engineering activities, and our College of Fine Arts is heavily engaged in design aspects of AI, but also how AI can lead to creating illustrations instead of essays and dissertations.
You can create artwork and maybe sell it, not on YouTube, on eBay, using AI to do a lot of the hard work. So research activity is spread across the institution, as is educational programming. This is just the first of multiple pages from our course listings, where I did a simple search for AI or machine learning: hundreds of courses at the university in these fields. A typical course is this one from the College of Fine Arts on creativity and AI, from the University Libraries' IDeATe program, our integration of technology and life, that human and machine autonomy aspect. And there are dedicated degrees, a master's in artificial intelligence and innovation, and this year we launched our first bachelor's degree in artificial intelligence. So you get a sense of the institutional environment. What about the role of the library in that space? We clearly can't ignore AI; otherwise, we would miss a large part of the institution's activity. So, for example, in the Dietrich College of Humanities and Social Sciences, their general education offering is in a series of grand challenges, things like climate change and, in this case, artificial intelligence and humanity. As a core part of teaching the Gen Ed programs, that is an opportunity for us to interact with early-stage students and help them understand some of the challenges that I'll turn to in a moment. Next semester a couple of University Libraries faculty will teach a new course listed by the Department of Statistics and Data Science called Discovering the Data Universe, where they are talking about data collection, data management, formatting, visualization, storytelling, ethics and so on. Much of this is designed as a course partly for non-specialist students to understand some of the basics, but also to prepare them for subsequent study if they wish to dip their toes into machine learning or statistics. We're seeing a lot of humanities students, for example, wanting to explore how they can use these technologies in their majors. They often haven't come from a rich
computational background in high school, and we're helping them feel comfortable and confident in working with these activities. Like many libraries, we are offering Data Carpentry and Software Carpentry type workshops. These are consistently sold out. We have a number of our faculty trained as instructors, and they could be doing this full time. We're in discussions with our statistics colleagues about making the Carpentries a prerequisite for people who wish to declare stats and data science as their major. In our special collections we are also beginning to dabble around some of these things, and we have a couple of Enigma machines, which have attracted a lot of attention. I use this to point to the needs we hear from researchers who are humanists who are moving into machine learning and artificial intelligence. Like the students, they often have come from humanistic backgrounds rather than computational ones, and they view our faculty as being trusted and accessible. They're safe people to come and tell that you're terrified of numbers or you just don't know how to understand the algorithms I showed at the beginning. But more broadly we are seen as reputable and reliable intermediaries between the humanists and the computer science specialists that they need to work with. We are the ones who connect as interpreters and explain to the computer scientists, "this papery thing is a book," and to the humanist, "this squiggly thing is an algorithm." What this points to, as a side note, is the importance for us of staying engaged with researchers, because we are trying to build these relationships and foster these collaborations.
We need to know who's playing in different spaces, and given that many researchers, particularly the ones who don't know what a papery thing is, haven't come through our doors in years, it is prompting us to reach out and build connections that traditionally have not been our natural space. One way in which we're doing that is a new service called Data Collaborations, or Data CoLab, where we are bringing together researchers who have generated data and want advice on how to share it and look after it and so on, and those who need data to do their own research. It's almost like a Dataholics Anonymous, where we bring them together and they can trade data and algorithms in a variety of ways. So this is something we started this semester. It has been a big hit, and I'm sure that will continue. I've mentioned already some of our training programs. We received Mellon funding to offer a series of digital humanities literacy workshops and have very much focused there on issues around metadata standards and analyzing large data sets. One example of a project there is the Six Degrees of Francis Bacon project, which had NEH funding. The idea here was to use machine learning techniques, graph inference and web development to reconstruct the social networks of early modern Britain from about 1500 to 1700, long before anybody had heard of Brexit, and to take the history of scholarship as produced initially in the Dictionary of National Biography and then build the networks out.
So that was an interesting project with a fair bit of library support, but what it then triggered was a recognition that these approaches allowed us to do things like bring a voice to the historically marginalized. What you have here on the right, or on my left looking at the back of the screen, on the top is the network of everyone, predominantly men, and you just can't see there the depth of individual identities and their networks. But when our colleagues looked at the presence of women in these networks, they were almost invisible, and it led to an interesting study in its own right about the role and presence of women in London in those days, which in turn makes us recognize the importance of calling attention to things like biases and ethical concerns. These are the sorts of things we are trying to encourage students to reflect upon. Separately, the university has established a variety of research programs and engagement activities on ethics and AI. Our ethical principles play nicely into this. This is the British equivalent of the American Library Association. I just like their color scheme, but they call out their ethical principles quite strongly, and they are very much in line with what's seen in this country. Earlier this year the ARL issued a report on the ethics of artificial intelligence.
I don't have time to review that report, but it calls out many of the key issues. These are also evident in academic administration. We know that AI is being used, or people are talking about using it, in things like making decisions about college admissions, identifying students at risk, and personalized learning. Some of these are good things, some of them are confronting, and we, I think, have a professional responsibility in our institutions to be the voice of conscience. There are, I think, powerful upsides. Our Language Technologies Institute is working on making privacy policies more accessible by extracting from the 40-page click-through thing and presenting easy-to-digest summaries of what you are signing up for before you hit "Yes, I agree." We've seen research on campus looking at flu tracking. We've seen some fun stuff around poker, building upon chess and Go, that has beaten the world's pros; things that are challenges but have the power to improve our lives, such as autonomous vehicles; and, frankly, some things that are more confronting around AI and military applications. In terms of library-specific activity, I promised Thomas that I would congratulate him publicly on releasing his report about an hour before we all came into this room. Please read it. I had the pleasure of seeing some early drafts. It really is a great agenda for research in this space. If you look at Twitter you'll find it; it's been tweeted everywhere. But among things that are interesting in the library space, earlier this year Springer published its first machine-generated book, and I think this is a really confronting issue. It's nice in some ways to see how an algorithm can ingest thousands of articles and spit out a 400-page literature review, but does it become a bit like a Ponzi scheme, where an algorithm reads a bunch of literature reviews written by algorithms, and so on?
You know, is this the mortgage-backed securities of 2020-something? Jason gave you a great host of examples. I'll just call out a few examples in our space. The Chan Zuckerberg Initiative announced their funding of Meta recently, another good application. What about a world in which we could test and validate hypotheses against the scholarly literature? You know, I've got a research question, I'm going to express it in natural language and have the answer delivered to me from across ScienceDirect and Wiley Online Library and PLOS and so on. Other interesting examples: using AI to analyze patents, both to streamline the patent application process in a world where we are all encouraging innovation and entrepreneurship, but also to leverage the vast quantities of technical information locked up in fairly dense literature. Research we were involved with was computer analysis of the Teenie Harris Archive from the Carnegie Museums of Pittsburgh, understanding some of the threats around facial recognition but also some of the opportunities for creating metadata for vast corpora of old photographs, automatic shortening of titles and other related things. And there is some work in the fine arts space. My colleague Matt Lincoln has been looking at how he can read art auction catalogs from the 18th, 19th and 20th centuries and figure out how the behavior of the fine art industry has shifted over time. What is it that people are valuing at different points in history?
Another interesting project that people are working on is looking at how we can identify the publishers of works that were published anonymously 500 or so years ago. By treating the fonts as bits of data and looking for similarities in broken or damaged fonts, you can find that books that were published anonymously for political reasons must have been printed in the same place as books published by identifiable publishers or authors, and you can begin to infer new insights in ways that were impossible before now. We've also been working on, coming back to some of the more core traditional library things, looking at our special collections. We have one of the first printings of Frankenstein. We leveraged AI and machine learning to understand some of the opportunities around that. In Cliff's remarks at the beginning of this event he talked about AI and its potential for data discovery and reuse. We hosted a three-day conference on that earlier this year, thanks to the National Science Foundation. Lots of interesting insights came from that. The papers are available in F1000, and my colleague Huajin Wang and I wrote an editorial across some formally published papers, which was released last month by the ACM. I don't have time to summarize all of the key themes, but they very much focused on the power of reuse, thinking about incentives and standards. One of my takeaway remarks from that conference was: you may not have 70 million dollars to do research, but you can access and build upon the data from somebody else's 70 million dollars' worth of research. And that really is at the heart of many of the opportunities that we see coming out of data sharing and the opportunities of machine learning to exploit that. I won't belabor this. I just want to get to a couple of points.
The National Institutes of Health currently is calling for feedback on its data management and sharing proposals. These are due in by the 10th of January. Please do take time to comment on those, because clearly we are at a tipping point. If we are to leverage responsibly the data generated by experimental research, and it is much more challenging for the NIH than the NSF, then they need to understand what sort of expectations can be shared with the research community and their institutions. We've seen a huge amount of data sharing in repositories and repository services like Dryad and Zenodo in recent times, but we need to recognize that shared is not the same as reusable. Again, Cliff made the point about FAIR data, and I think there are some really searching questions that we haven't begun to explore in depth yet, and these will very much shape our agenda over the next while. I'm going to wrap up there and offer the opportunity for questions to Jason or myself. If we need it, there is a microphone here that we can throw into the audience. So over to you. Thank you.