Okay, it's time. Let's go ahead and get started. Welcome, everybody. It's great to see you all here, to the extent that I can see you with all these bright lights, but I'm delighted you're all here. Welcome to the December 2023 CNI member meeting; I hope that your trips here were easy. I'd like to extend a special welcome to our international participants; we have, I believe, a good number of them with us. I also want to welcome our ARL leadership fellows and our CLIR fellows who are with us today. We have a lot of new attendees, at least based on the first-time attendees session that we had earlier this morning. Some of them have first-time attendee tags and some of them don't, but please make them feel welcome either way. I'm very glad that they're all here with us. For those of you I haven't had a chance to meet yet, I'm Cliff Lynch, the director of CNI. I have a few housekeeping things to cover, and then I'm going to talk broadly about some of the developments of the past year, a few of the things that CNI is doing to engage them, and some things that I think should be on your radar screen.

First off, I would like to welcome some new and rejoining members. Our new members: Whitman College, Western Michigan University, the Inter-American Development Bank, and Concordia University. Our rejoining members: California Polytechnic State University and the New York Public Library. Welcome, all.

A couple of logistical notes. First, let me remind you that we will be taking video of all the sessions, so if there are two concurrent sessions and you can't decide, know that whichever one you didn't go to, in a few weeks you'll be able to watch the video. I would invite you to share those videos very broadly within your institution and beyond, to people who might be interested in them.

I also want to note, for our member representatives here, that we have been using the member representative list quite a bit to invite participation from member institutions in various initiatives, notably things like the joint ARL/CNI task force on future AI scenarios. I mention this because it's important that our member reps keep an eye on that list with a view to sharing these calls for participation, as appropriate, within their institutions. In many cases the member rep may not be the right person inside the institution to participate, but we are relying on our member reps as gateways into our member institutions, and, particularly in light of the increased opportunities for participation coming through that list, I wanted to remind you of that role.

A couple of notes about the meeting. We will conclude today with a set of lightning round reports. Those reports serve a lot of purposes. One is to give you a heads-up on important developments that, for one reason or another, don't make sense as a full session right now. Another is for you to identify people you may want to talk to; for example, you may want to find them at the reception that immediately follows the lightning rounds. In many cases the lightning round speakers are also hosting discussion tables at the breakfast tomorrow morning. This is something we have been experimenting with for the last meeting or two, and it seems to be quite a successful way to create opportunities for structured conversations around topics of interest. So please avail yourself of those opportunities tomorrow morning as well, if you're interested.
Finally, I just want to note that it is December, and travel, which is always interesting, can be even more interesting in December. So we may have some schedule changes. As of right now I'm not aware of much in that area, but we will post any changes both on Sched, for those who are using that, and on the message board opposite the registration desk. So please keep an eye on those as we go along. And I think those are all of the logistical things I wanted to cover.

So let me move on to the main topic at hand, which is my survey of developments. I feel like this has been a year with an awful lot of developments, and an awful lot of confusing developments. But I want, among other things, to persuade you that there's more going on in the world than the day's announcement about the latest generative AI thing. We will talk about AI in a little while, and machine learning, and even generative AI, but I don't want to start there. I want to start with infrastructure, and we'll go from infrastructure to AI, to a few very brief comments about scholarly communication, and then a few brief speculations about some longer-term social and societal things that I think are in play. I'm determined to keep these remarks short enough that we have at least 15 minutes for conversation at the end. Maybe I'll do even better, although past history is not encouraging, but I will stop in time to leave those 15 minutes.

So let me begin with infrastructure, and I'm using infrastructure in the broad sense of computational and networking infrastructure and some of the things that are attached to it. The first thing I want to remind you of, because it's been surprisingly low profile outside of high-performance computing circles, is that we are now pretty firmly into the exascale computing age. We've been working our way up to this for a long time, making investments in higher and higher performance machines, and we've actually gotten there. There is a new generation of absolutely gigantic machines coming online: Frontier at Oak Ridge, Aurora at Argonne, and a series of machines at the Texas Advanced Computing Center, just to name a few. There are more of them, but those are a few of the most visible public ones. These are now moving from acceptance tests and break-in to actually getting used day to day in research and science, and I think over the next year or two we will start seeing some really interesting developments from those machines.

The other thing I want to note in the infrastructure area is that I think broader thinking about what constitutes infrastructure is coming through the funding agencies and through groups like the Office of Science and Technology Policy. For example, we saw this year a very interesting plan for a National AI Research Resource. There's a nice blueprint document, which I commend to you if you've not looked at it. It's not perfect, but it really is quite an unusual thing. Now, I don't know whether this is going to get built; as with so many of the exciting things we've seen coming out of the various policy development and planning groups, it's awaiting funding. If you look at so much of the Science and Chips Act, for example, or I guess the CHIPS and Science Act technically, you will see that it is all still contingent on funding. But I think this is an important thing to watch.

Another piece of infrastructure: we are starting, I think, to get much more serious about the implications of attaching experimental equipment to the network.
In December 2021, when CNI came back together in person for the first time as the pandemic began to abate, actually in this room, in this hotel, you heard about the Carnegie Mellon cloud lab initiative. This was an effort to build a highly automated lab for materials science, molecular biology, chemistry, and related fields, which could be used remotely by faculty at Carnegie Mellon. It was based on technology from Emerald Cloud Lab, a company founded by a group of CMU grads. If you don't recall that presentation, I would urge you to go back and watch the video; I'm not going to reiterate it. But what I will note is that in December 2021 they were describing plans. I was in Pittsburgh a couple of months ago. They've built this. It's online, 14,000 square feet, and it's starting to get used. Over the next year or two we will start to see what genuine impact it has. I think this is a very important strategic development to watch, and we will get an update here on it at the appropriate time.

But even beyond what happens at CMU, I think that institutions like the National Science Foundation have started to really get the concept here and to start thinking about instrumentation as a shared and remotely accessible resource. That is a very powerful force for democratizing science, for resource sharing, for changing patterns of investment, and for facilitating collaboration across institutions. The National Science Foundation this fall has been funding a series of workshops: there was one in Pittsburgh in October, one in Atlanta in November, and another is scheduled for North Carolina in January. Reports are not out from these workshops yet, but it's very clear that these ideas are starting to mature and gain mind share. You'll have an opportunity later in this meeting to hear from another group, the Ecosystem for Research Networking (ERN), which is doing work on remote access to and sharing of instrumentation. As opposed to cloud labs like Carnegie Mellon's, which really deal with a bespoke facility, ERN is looking at how we can take equipment that's already in place and begin to share it. I think these developments are really quite significant. And there's a whole add-on to this that deals with smaller-scale laboratory automation. There's a lot of investment in this, and it currently goes under the unfortunate name, one we're trying to get people to abandon, of "self-driving labs." I can't think of a worse way to communicate what's going on, particularly to the broader public. But I think this is a trend that is really worth keeping your eye on, because it has the same kind of resource-sharing, enabling, and democratizing implications that so much of advanced networking and computing has been exploring for the last 30 years, just now moving into physical systems as well.

Two other things I want to flag in the infrastructure area that I don't have time to discuss in any detail. One is cybersecurity. All I'll say about this is that if you look at the statistics, they're really scary. The number of ransomware attacks, for example, and the amount of disruption that's going on through various kinds of attacks, is really extremely troublesome. We're seeing everything hit, from hospitals to critical infrastructure all the way through cultural infrastructure. I would invite you to look at what happened to the British Library recently.
They underwent a significant security attack, and months later they are still trying to get some of their services reestablished for their user community. I don't have a solution there, but I think this is an area that, if it isn't very much on your radar screen, needs to be.

The final thing I'll mention in the infrastructure sphere is the notion of digital twins. This is something you may or may not have run across. The idea is that you take a simulation model and you couple it with a physical system, and you use the simulation and the physical system to validate and predict back and forth (a minimal illustrative sketch of this loop is appended at the end of this transcript). It's a very powerful idea that's being used for everything from jet engines all the way through manufacturing facilities. To give you some sense of what's going on here: NIST, as part of its role in advanced semiconductor manufacturing under the CHIPS Act, is holding a workshop this Thursday and Friday on the role of digital twins in advanced semiconductor manufacturing. And the National Academies has been conducting a whole series of symposia on digital twins in different areas throughout the year; I've shared some of those on CNI-Announce. They are coming out with a summary report on what they've learned from this series, and there will be a report release event on Friday. I'll share a pointer to that report on CNI-Announce.

This reminds me of something I intended to say at the beginning, before we leave infrastructure. I am putting together what you could think of as a series of footnotes or pointers for this talk, because I'm referring to a lot of different things as we go. That should be up and linked to the web page for this talk in a few days; I'll put out an announcement when it's ready.

Let me turn from there to the wonderful world of AI and machine learning. The place I want to start is just how much uncertainty and confusion there is right now. We find ourselves in a world where we've got issues around technology, economics, policy, regulation, and content all mixed up together, now fueled by this series of chatbots based on large language foundation models, which have captured the public imagination. They're scary in part because almost anybody can interact with them and has an opinion about them, while having no idea what they're doing. It's very reminiscent of the early days of things like Google, when people thought Google could read their minds; it was amazing to many people how this worked. I think this sort of eruption into public consciousness has been an enormously confusing factor. You've got a lot of people whipped up about the need to regulate things. You've got a lot of people making all kinds of dramatic pronouncements. You've got a lot of demoware out there, and you've even got some fake demoware out there, from what I understand; some of these demos really are very, how shall we put it, orchestrated. And there's an amazing lack of transparency about what's going on.

One of the things that we've done at CNI quite recently is partner with our colleagues at the Association of Research Libraries to put together a task force and a process to develop a set of scenarios to help us chart the possible futures, the ways in which some of these developments in AI may play out. It's our hope, and we're on a timeline to do it, that at the spring meeting in San Diego we will have a first draft of those scenarios and a conversation about how those have shaken out.
It's our hope, further, to have a fairly final draft by the ARL meeting later in the spring. At that point we'll be ready to use these as a tool for various kinds of institutional and community discussions going forward. Some of you have very kindly found time during this meeting, or prior to the meeting start this morning, to participate in some focus groups; some of you will be participating in future focus groups; some of you have given time, or will be giving time, for interviews with the team to help put this together. And a number of you have very generously volunteered to participate in the task force, an even more significant commitment of time. I'd just like to thank all of the people who are helping us move this effort along. I am hoping that the result will be something that gives us all a little more insight into the possible futures.

Now, I want to spend about ten minutes on some of the key issues related to AI and machine learning that I'm paying close attention to. I don't promise these are the right issues, but they're at least the ones that are preoccupying me, and I put them out as something that might be helpful for tracking the situation.

The first issue is whether a lot of this is going to turn into a sort of centralized oligopoly or whether we're going to see highly distributed adoption and development of these technologies, particularly the foundation models. There is a group of players, large corporations mostly, who are interested in and motivated to promote centralization, to say that this is something only very big, very rich corporations can do, that it costs hundreds of millions of dollars to train a state-of-the-art model, and that the only way everybody else is going to get it is through a sort of software-as-a-service approach from a handful of these players. Now, there's a lot of counterweight to this. You may have seen, for example, the recent AI Alliance announcement. And certainly there are a huge number of researchers working on models that are smarter, that are more frugal in their computational requirements, that, rather than just throwing the biggest, most expensive hardware and the biggest mass of data at things, try to be more parsimonious and more precise. I don't know how this is going to come out, but I do know that how this balances out is going to be very important and is going to do a lot to shape the future.

I would also note in this connection the interaction with the various calls for policy and regulation here: the things coming out of Europe, the things floating around in Congress at the moment as proposed bills. It's much easier to regulate a pretty centralized industry. And there's a sort of regulatory capture phenomenon: if you've got big players, they can actually use regulation to make sure the market is kept for the big players, who've got enough resources to deal with the regulatory apparatus, and basically you squeeze the small players out. That is a possible scenario here. I don't come down on one side or the other of this as a prediction at this point, but I do think it's important to understand that there are definitely vested interests promoting this vision of massive scale, massive cost, massive centralization. I'd also just say in passing that the more distributed this turns out to be, and the less demanding computationally, the harder it's going to be to regulate.
We can take lessons here from history, such as the cryptography wars of the 1980s, about how hard it is to regulate things that are accessible to regular people with regular-grade computing capabilities.

Next issue: I think one of the things everybody's struggling with right now is how important generative AI is going to be, as opposed to the much broader portfolio of AI and machine learning technologies. The answer to that is obviously very complex, and it depends on who you're asking. One of the things that's striking about generative AI is that it talks directly to a lot of people. A lot of people have to write things: students, office workers. It also speaks directly to threats to the livelihood of a significant number of people who make pictures, music, videos, or words for a living. So it's something that speaks very directly to some big chunks of the public. On the other hand, I would invite you to have a look at the National Academies workshop on AI and scientific discovery, which was held in October. This was a day-and-a-half workshop, and they've now got the videos available online. At least my takeaway is that generative AI is probably going to be the least of it in terms of genuinely driving scientific discovery. It may be important in the communications processes or the funding processes, but it's going to be much less important in supporting the process of discovery itself.

The next area I want to highlight is training data, and I think many of you are already aware of some of the controversies there: the uncertain legal status of the use of copyrighted training data and the pending court cases over this. But it's not just copyright. There are all kinds of liability questions. There are privacy questions. There are questions about transparency and attribution, about what training data is actually being used for what, and this is something that a lot of the players have been remarkably opaque about, perhaps with good reason. There are further questions here. For example, there's some very interesting work suggesting that if training data is polluted by significant amounts of synthetic material produced by generative AI, the value of the training data, and of the language models being trained on it, deteriorates very rapidly. There are questions about whether we have pretty much run out of content to feed these things; not clear. There are questions about how stable these models are, especially if and as they incorporate the results of ongoing training and interaction. For example, there have been some studies suggesting that some of OpenAI's chatbots, as they have operated over time, have actually become much worse at doing arithmetic. Very curious. I mean, they were never good at doing arithmetic, but, you know, it's deteriorated. There are some things like that which are very hard to explain.

The last thing, and this is related also to training data, which I think is going to become a very important concern for our community, is the interaction between the choice of training data and what the models can do. There are people now who are training models exclusively on scholarly content, in some cases pretty broadly, in other cases pretty narrowly. There's interesting work going on in places like the Allen Institute on that. Not a lot of results are out yet; these are still new, and people aren't talking about this very much. You may have seen the announcements recently.
That exascale machine at Argonne, Aurora, is now training something called AuroraGPT, which is being trained on scientific literature, code, and data. They are talking about one trillion parameters, so it's an enormous model, and it is not clear how long it's going to take to train; they started in November, as I recall. But we need to be watching these kinds of developments over the next year or two, as I think we will better understand how training sets interact with the capabilities of these systems.

Then the final thing, and again it's related to training sets, but it's related to a lot of other things as well. We've seen quite a bit of rhetoric about openness in AI, about the desirability of open AI systems and open models. I think it's important to note that we don't really understand what this means. We spent 20, 30 years developing this notion of open source and open source communities, the governance of open source communities, the licenses under which open source communities can operate. That experience doesn't translate neatly into the world of AI, where you've got this interwoven set of data, code, training, and human interactions, which is very confusing and calls for, I think, a different way of thinking about what open is, or perhaps a very nuanced set of multiple definitions of openness. I'll just note as a sub-piece that I think the question of how we meaningfully preserve these systems is very closely related to our understanding of what constitutes open in this area. I'll leave that there.

Let me move on to talk a little bit about scholarly communication. In, let's see, I guess it was late September and October, we held a series of executive roundtable convenings on developments in research data management, and hopefully we'll get the report out before the end of the year. It is really striking the extent to which demand for RDM support is ramping up at many of our campuses. Some of this is being driven by shifts in policy from the funding agencies, notably the new NIH data sharing policy, which has been operational since early this year. It's clear that more and more researchers are realizing that they need to do this, that they don't know how to do it, and that they need guidance. We are clearly struggling with how to provide this support at scale. We are struggling with the question of to what extent our support organizations should teach about tools versus actually getting involved in the process. Teaching is much more scalable than getting involved in the process, but it looks like what a lot of researchers want, at least, is genuine involvement in the process. The repository landscape is also evolving in very complicated ways, and we certainly heard quite a bit about that in the executive roundtables, notably a real reconsideration of the role of institutional repositories in hosting data, at what scale and for what kinds of data, as opposed to going to external generalist or specialist repositories. So there's a lot going on there that I think is important.

Connected to that, I think we are seeing steady progress in persuading funders, and open scholarship advocates generally, that code needs to be a first-class part of research outputs, along with papers and data. That seems, as I say, to be making steady progress.
It is ironic, of course, that the rapid adoption of machine learning and AI tools is now making this very confusing, because it's not so obvious how to share or publish those kinds of tools in some cases.

I'll just mention two other developments in scholarly communication. I think it's clear we continue to make progress on opening up access to the scholarly record. At the same time, I can't shake the sense that the consensus about how to finance that open access is really fragmenting, that there is a loss of consensus here. We've heard about publish-and-read agreements and transformative agreements. Now we're seeing a lot of pushback against APCs. We are seeing attempts to, you know, add APCs on top of APCs, sort of. We are seeing some groups call for entirely funder-provided infrastructure for publication, which has implications about centralizing things that are very problematic, at least to my mind. So I do think there's a new debate emerging, very much without consensus, about exactly what the economic and business models are going to be to support this going forward.

The last thing I want to say in the scholarly communications area, and this is something that has really caught my eye recently, is that we are seeing a new sort of database show up: the prediction database. One of the most famous ones came out of Google DeepMind maybe a year and a half, two years ago, where they predicted the protein folds for a vast number of proteins. They just put this out as a reference database, and people in molecular biology, in drug discovery, and in other areas have been using it ever since. It's a wonderful, wonderful resource. One of the complications, of course, is that these are predictions. They're not always perfect; they're not always right. And we don't really have a good mechanism for reflecting observed science back into this predictive science. Now we've just seen another round of this in materials science, again out of the same DeepMind group at Google. They have predicted the stability of a vast number of compounds that may have interesting properties. Again, these are predictions; nobody has synthesized most of these compounds yet. In fact, we don't even know whether a significant number of them can be synthesized. So this sets up a whole new set of scientific and scholarly communications practices as we couple up predictions with various kinds of experimentation. And increasingly we may see some of that experimentation, some of the efforts at synthesis, for example, becoming automated: there's a very nice project connected to that materials science prediction effort, documented recently in Nature, doing exactly that with robotic synthesis. So I think that's an area to keep an eye on going forward. This is another good example of where our computational abilities are producing artifacts that don't necessarily fit comfortably into the workflows of scholarly communication and scholarship, and we're going to need to figure this out.

So let me turn, as is appropriate towards the end of a set of comments like this, to some speculation about longer-term things we're going to have to sort out. I'll give you one that's obvious and one that's less obvious.
The obvious one, which we've all been talking about for a while, is the recalibration of how we assess truth in an era of generative AI and so-called deepfakes and all of that sort of thing. That world is here now; unfortunately, the solutions aren't. There are some efforts at technological solutions, everything from various kinds of forensics that attempt to detect anomalies in computer-generated images or videos, and there are anomalies you can pick up, but this is essentially an endless arms race: once an anomaly becomes well known, the generators will quickly fix it. So I'm not really optimistic that that's going to be a long-term solution. There are also efforts to essentially formalize provenance through digital signatures; for example, there are cameras being produced now that essentially sign the images they produce. I think those are an interesting niche, but there are real limits to how far you can take that. It relies on your having confidence that the camera is actually being employed by somebody you trust, as opposed to somebody producing a fake image that purports to come from a given camera. I think we really need to get a lot more serious about, I don't know whether you call it the information literacy aspects of this or the digital literacy aspects of this, but thinking about teaching people about corroboration, about skepticism, about context, about provenance. All of that is going to come into play here. We can complain about this all we want, but this is a reality that is here now, and society is clearly going to have to deal with it.

But let me move on to the slightly less obvious one, perhaps the more unthinkable one. Right now the Copyright Office is taking the position that only humans can create, and that it only counts if the humans are using technological tools to help in that creation in a very supervised kind of way. Where exactly you draw the line there is still very much up for debate, but that's the position they're taking. I think that over the next decade or so, and this is going to get pretty disruptive, I fear, we're going to have a recalibration of how we think about creators and their roles, about the rights of creators, about what it means to create. We're going to see, for example, an enormous upsurge in the ability to produce things in the style of so-and-so, whether it's music or prose or images; that's fairly easy to do now. And style has never really been protected all that well. It's sort of protected; it's not really copyright; it's kind of an adjacent thing. I think this is going to be very much in play as we try to sort out the implications of things like generative AI, less the issue of training data and more the issue of what they produce: the legalities of what they produce, the ownership and the responsibility around what they produce. I think this is going to have broader implications than we might think.

There is one group of people who create for a living, if you will, and those people, by the way, are by and large really upset right now about the use of their works as training data for various kinds of generative AI systems. If anything, many of us may underestimate just how deeply upset those people are. This is their livelihood. And it's not just their livelihood; it's their lives in some cases. They feel this is deeply, deeply wrong.
Now, scholarship is another matter. We're going to talk tomorrow in the closing plenary about how notions of open scholarship relate to the willingness on the part of authors of that scholarship to have their material used in various computational ways, including the training of AI systems. Maybe we're going to wind up with a distinction between art, in the broad sense, on the one hand, and scholarship on the other. Maybe, when we think about how we treat creators in a world of generative AI, that's going to be an important distinction.

But I want to leave you thinking about one more piece of this. If you believe that we're on the verge of, if not already at, the point of making systems that can behave in the style of a given person, given enough training data, then it's not just famous artists or would-be famous artists that can get undermined by these. It's historical figures. It's cultural figures. It's family history. It's all kinds of recently deceased people who are well documented in the media record of the 20th and 21st centuries. What does it mean when we can reanimate recent historical figures that can talk to you in a very convincing way about how they claim to have made the decisions they made, about why they did what they did? What does it mean for family history when we can package up our grandparents so that future generations can enjoy their stories and ruminations? I think this whole question of things that reanimate, if you will, that produce conversation in the style of historical figures, either ones that are personally important or ones that are socially, politically, or historically important, is going to be a major development that we don't understand the ground rules for at all. It is going to be part of our sorting out of this whole question of creators and their roles: works in the style of, works in the tradition of, works that reanimate. So I would leave that one as something that is going to be a long-term argument. It's not going to be just a legal argument; it's going to be an argument on a societal level, out of which a consensus will presumably be developed. I think it's probably going to take at least a decade to sort this out, but I think it's an important one for us to be mindful of, as stewards of so much of that record and also as people with a great interest in the integrity of both the record and, presumably, the reanimations. These are developments that may change in a significant way how we think about and relate to our past, particularly our relatively recent past, as opposed to what happened in medieval times. So I leave you with those speculations about broader issues that are related to some of the developments in AI, but actually go much deeper than that, and that I think we ignore at our peril.

And I can't believe it: I've gotten through all of the things I wanted to talk about, and we actually have just a tiny bit more than 15 minutes for questions and discussion. So let me thank you for your attention. I hope I've given you a few interesting perspectives, maybe new to you, maybe some different ways to look at some of the developments taking place and how to interpret them. And now please let me hear questions and comments from you. Thank you, and the floor is open.

Hi. Danielle Cooper, Ithaka S+R.
Not a question, but I'm very excited to hear about the forthcoming footnote version of this, and I thought I'd kick things off with a recommendation or suggestion. In the section on cybersecurity issues you mention the British Library, a really great example, but I really look forward, hopefully, to a throughline to some of the other prominent examples, because they cut across the types of institutions relevant to this community that are being affected. To give you two other examples that I have been following very closely: the first is the Toronto Public Library. That system is still not up after many, many weeks. They did not negotiate, and now all of the staff data has been compromised. They are manually writing down call numbers to record when people are taking out books. They're running out of books, and you can't even print. The other example, which many more people in this audience are probably familiar with, of course, is the University of Michigan. That wasn't just a library; that was a whole institution, and crucially it was just a week or so before classes began, and just a few days after the president had so prominently announced their generative AI solution. So thank you for indulging me, because there's no question here, and I look forward to it. Thank you, Cliff.

Actually, there are a lot of questions there. There are questions about, you know, how robust our systems are. There are questions about what the priorities should be in trying to bring them back. You've got two kinds of things mixed up here. You've got information about staff, personnel records and things of that nature, which raise the same kinds of considerations as at any other company. But then you've got these public-facing services, which are often quite different in character from other kinds of corporate offerings, and the public-facing stuff is really very complicated to figure out what to do about. It's also sometimes hard to understand why it's taking so long to bring some of that back; the communications around a lot of this have not been great. Another particularly chilling example here that I've been following lately is the 23andMe breach, where all of a sudden you've got a lot of omic and genealogical data floating around that has, you know, gotten loose, and I don't think anybody really understands the implications of that fully at this point. But I agree, there's a lot to be examined and understood here.

Andrew White, Rensselaer Polytechnic Institute. You mentioned quite a bit about exascale, but I wondered if you had any observations or speculation on emerging quantum computing.

Oh, I'm so glad you asked that. Where are you? Are you over there? I can't really, yes, I can't see too well.

That's why I sat down. The glare is a little better at this level.

Quantum is something I actually cut out of this, because the original version of this talk, when I tried it, ran two hours and I only got through the first two-thirds of the material, so I decided that wasn't going to work well. I've been following the quantum stuff pretty carefully, and there are a lot of different aspects to it that I think are significant. Obviously, the phase-in of so-called post-quantum or quantum-resistant cryptography, and all of the cryptographic agility issues around that, are very important. But I think we're also seeing some impressive progress on computation at the quantum level.
I know I've been following, for example, the IBM roadmap that they've laid out, and I know your institution, if memory serves, has got one of the IBM quantum systems going in.

That's correct.

Yeah. And I'm going to be really, really excited to hear how that goes over the next year or two. One of the things that I think is really wonderful about what some of the players, IBM again being a good example, have done is that they've really tried to open up the software environments so that regular people, not just a very tiny cadre of graduate students, can go and actually try to understand what quantum computation involves and what programming one of these devices might actually be like, even if what they're doing is talking to a simulator in many cases. So I think there's some very exciting stuff going on there. At the same time, I think there's also a certain amount of hype in this area. What's really going to be important is understanding the fruitful area where, in the next, say, decade, we can actually get genuine leverage on more problems using, in particular, noisy quantum computation, until these machines get to the stage where they can do error correction across huge numbers of qubits.

Thank you. I would really welcome it if you would keep us posted on what's going on at your institution as you go down this path.

Be happy to. Thanks.

Hey, Clifford. Jen Stringer from Getty. I'm going to start off with a comment, and then I do have a question, both for you and the room. I was just at an amazing conversation at the National Gallery of Art a couple of weeks ago around AI in art. So your comments about art versus scholarly communication and scholarly publication were, I think, really insightful. And I would also say that for many cultural heritage institutions, museums, and libraries, especially special collections, that have predicated the open access movement on Creative Commons Zero licensing, we had a robust discussion about whether, given large language models and given what is happening in the generative art space, we need to rethink Creative Commons, what it allows for and what it doesn't allow for. Are we perhaps unintentionally feeding this question of what is art? I know these things aren't in copyright and people are still being artistic, but this material could perhaps be used in ways that we may not, ethically, be interested in seeing happen.

Great question, and one that actually has parallels beyond the art world. There are people, and as I say, we'll touch on some of this in the closing plenary panel tomorrow, who are not entirely comfortable with the notion that the sort of Creative Commons licenses we've been using historically, and have been proponents of, open things up for wholesale training of large language models, regardless of the use to which those language models are being put. I think there are some particularly, I don't know what to say, vivid questions in the arts about when "in the style of" belongs to all of us and when it stops belonging to that creator. We've never really quite dealt with that as a society. I think you'd probably get general consensus that producing works in the style of Leonardo, as long as you're not trying to pass them off as genuine Leonardos, is probably not that outrageous. The closer you get to the present, the less comfortable people perhaps get. I have no idea where to draw the lines, but I have to believe that there are going to be some very uncomfortable discussions around this in the museum world.
And I think, just to put a final comment on it, that if they decide they don't want this material used to facilitate the production of new things "in the style of," they may have a lot of trouble walking back their current decisions. I mean, if they've already released this material CC0 or CC BY or something like that, or just placed it in the public domain, can they really walk it back? It's not obvious to me how that would work.

One more comment around that: CC0, which is what we've done, allows people to make money off of it. I don't necessarily know that I want IBM making money off of our scholarly collections. We did that for scholarly purposes, you know, that was actually to open up scholarship, maybe not necessarily to make, you know, name your organization, a shit ton of money. And I think that there's sort of that issue too. So I'll leave it there. I think it's a fascinating conversation.

And that same question is floating around a lot of scholarly work that, you know, has been opened up with the purpose of advancing scholarship. I think it's a really tough call. More questions, comments?

I should probably know the answer to this, but I'm curious whether our educational institutions are implementing enterprise licenses for AI and using that to ingest our own documents to enable discovery. Actually, I'm kind of excited about that. I mean, there are certainly some privacy risks and data leakage risks, but the corporate world is doing this in spades, and I've been looking for information about what academia is doing, but maybe I'm not looking in the right places. So I wonder what you know about it.

So I'll just say a couple of things about this. I think that on the corporate side there's a lot of concern about how private the things you feed into these large language models are, even if you're doing it as part of a prompt process or a fine-tuning process, and this has not been helped by the lack of transparency on the part of some of the platform providers. I think a number of the more paranoid corporate entities are going to end up running things in-house if they possibly can, just to get control over the privacy aspects. Now, in the higher ed area, there actually are some remarkable things going on. For example, Harvard has made a significant investment in local, you know, fenced-off, Harvard-community AI systems, large language models, things of that nature. And I believe UC San Diego, working with the San Diego Supercomputer Center, has already done this, just to give you one other example, and I'm sure there are more. I actually would have liked to have done a session looking at those kinds of local implementations for this meeting, but for various reasons I wasn't able to get it together, and I'm hopeful that maybe we can have one in San Diego in the spring, because I think understanding the pragmatics of that, why people are doing it and how well it's working, is really important for the strategic planning of many of the institutions here. This also ties very much into that question I was framing about whether these things will be centralized software-as-a-service, or whether these models will actually get to the point where, you know, reasonably sized organizations can run their own instantiations of them.

Certainly the policy side matters. In Kansas, where I am, the state government has passed an AI policy that is beyond ambiguous, and we're all wondering what in the world it means for the use of AI to help us in our work.
Yeah, and I guess I should also mention that a lot of the recent edicts forbidding the use of things like ChatGPT have been driven by that security issue. So, for example, a number of the funding agencies have been very heavy-handed about saying that if you are a reviewer on a grant, thou shalt not put that grant proposal into some kind of large language model to get it summarized or to, you know, help you write the report or the evaluation. And that's primarily, as far as I can tell, because they are terrified about the security issues and the disclosure issues around it. Other questions, comments?

You can't see me through the podium.

Oh, good. So you have to take it.

Hello. I'm the blurry shadow in the light.

Okay, gotcha.

Cliff, I'm curious if you could speculate a bit about how academics and cultural heritage organizations will consider generative AI in terms of its climate impact.

So I guess there are a couple of ways to think about this. The first is trying to realistically quantify the climate impact, and I'm not sure that experience to date is necessarily a wonderful guide to the future there. There has been a certain amount of what, in my view, is a kind of heavy-handed "well, you know, we can afford to get 30,000 GPUs and run them for a month to train this thing, so by all means let's do it; we get a statistical and a computational advantage that way." I think you are going to see an era here where at least some attention is paid to computing smarter and more parsimoniously, rather than just throwing as much hardware as possible at some of this. And until we navigate some of that, we should be a little careful about jumping to judgment on the cost of training these things.

Now, there's also the cost of inferencing once you've got one of these things trained. For example, if you look at the experiments some of the search engines have been doing, where they're coupling search results to one of these language models, that basically makes their processing of each search quite a bit more computationally expensive. Maybe for one search it's not a killer, but they process an awful lot of searches a day, to the point where they actually notice the computational cost in a significant way (a rough sketch of that kind of arithmetic is appended below). I think that right now there's not a good sense of the cost-benefit of applying these tools in various scenarios, and one of the things it's going to be important to get a handle on is what those cost-benefits look like. I've seen extremely minimal research on that so far; it's much more "wow, we can do it at all" rather than a measured "it gives us a significant improvement in these dimensions." So that's at least a little bit of a perspective on that.

Thank you.

And I think we're at time. Thank you for these questions and comments. I'm sure there are a lot more to be shared, and I hope that in our conversations in the coming days and weeks we'll have an opportunity to do that. So welcome, and I hope you have a wonderful conference.
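A first footnote, to the digital twins discussion earlier in the talk: the validate-and-predict coupling described there can be made concrete with a minimal sketch. Everything in it, the toy thermal model, the parameter values, and the `read_sensor` stand-in for the physical system, is an invented illustration of the loop, not any particular platform's API or any real twin.

```python
import math
import random

# Minimal illustrative digital-twin loop: a simulation model runs alongside
# a physical system; sensor readings are assimilated to correct the model,
# and the corrected model is used to predict ahead. Toy example only; the
# model, parameters, and sensor stand-in are all invented for illustration.

class ThermalTwin:
    """Toy first-order thermal model: temperature relaxes toward ambient."""

    def __init__(self, temp, cooling_rate, ambient):
        self.temp = temp                  # model's current temperature estimate
        self.cooling_rate = cooling_rate  # calibrated model parameter
        self.ambient = ambient

    def predict(self, dt=1.0):
        """Advance the simulation one step: the 'predict' half of the loop."""
        self.temp += -self.cooling_rate * (self.temp - self.ambient) * dt
        return self.temp

    def assimilate(self, measured, gain=0.5):
        """Nudge model state toward a measurement: the 'validate' half.
        A real twin would also re-estimate parameters from the residuals."""
        self.temp += gain * (measured - self.temp)

def read_sensor(t):
    """Stand-in for the physical system; in practice, real instrumentation."""
    return 20.0 + 60.0 * math.exp(-0.09 * t) + random.gauss(0.0, 0.3)

twin = ThermalTwin(temp=80.0, cooling_rate=0.08, ambient=20.0)
for minute in range(10):
    predicted = twin.predict()             # model predicts forward
    observed = read_sensor(float(minute))  # physical system is observed
    twin.assimilate(observed)              # model is corrected by reality
    print(f"t={minute:2d}  predicted={predicted:6.2f}  observed={observed:6.2f}")
```

The essential design point is the two-way coupling: the model predicts ahead, the physical system's measurements pull the model back toward reality, and the corrected model is then trusted for the next prediction.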
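A second footnote, to the closing exchange about the cost of inference: this is the rough shape of the back-of-the-envelope arithmetic involved in comparing conventional search with LLM-augmented search. Every figure below is an assumption chosen purely for illustration; none is a measurement from any real search engine.

```python
# Back-of-the-envelope comparison of per-query cost for conventional search
# versus search augmented with a large language model. All figures are
# illustrative assumptions, not measurements from any real system.

QUERIES_PER_DAY = 5e9       # assumed daily query volume
COST_PER_SEARCH = 0.0002    # assumed cost of one conventional search, USD
COST_PER_LLM_CALL = 0.002   # assumed added cost of one LLM summarization, USD

baseline = QUERIES_PER_DAY * COST_PER_SEARCH
augmented = QUERIES_PER_DAY * (COST_PER_SEARCH + COST_PER_LLM_CALL)

print(f"baseline:   ${baseline:,.0f} per day")
print(f"augmented:  ${augmented:,.0f} per day")
print(f"multiplier: {augmented / baseline:.0f}x")
```

Under these made-up numbers the augmented pipeline costs roughly an order of magnitude more per day, which is the sense in which a per-query cost that is individually negligible becomes very noticeable at search-engine volumes.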