Good morning, everyone. Thank you for coming to this session. We'll introduce ourselves very briefly; I will give some introductory remarks, and then we'll turn it over to the panel to give you more background on what we believe to be a very interesting, somewhat unique collaboration between a not-for-profit and a new corporation. I'm Chuck Henry, president of the Council on Library and Information Resources.
I'm Carol Mandel, Dean Emerita of New York University Libraries, and I'm doing various things. I'm on the CLIR board, which is one reason I'm here, and I'm also a CLIR Distinguished Presidential Fellow doing some work I'll be talking about.
Hi, I'm Stephen Rhind-Tutt, president of Coherent Digital.
I'm Christian Dupont, Associate University Librarian for Scholar Resources and Burns Librarian at Boston College, and I engage with Coherent Digital as a strategic advisor.
And I'm Wayne Graham, Chief Information Officer for CLIR.
Thank you all. We will have some time at the end for questions, and I hope you have some interest in this topic too. Briefly: around 2014 or 2015, Cliff Lynch, in his roadmap talk for the Coalition meeting that year, made a nice reference to a then-new committee called the Committee on Coherence at Scale. The committee was convened by my organization, CLIR, with generous funding from the Andrew W. Mellon Foundation; Don Waters was at the Foundation at the time and was intimately involved. The committee was composed of college and university presidents, provosts, publishers, university librarians, archivists: any number of professions were represented. It was brought together to look more closely at the academic information landscape, and we got together because we all shared a very similar unease about the information environment in which we were working.
Part of that was the publishing models. We were looking at the flow of information, the flow of knowledge from research to publications to libraries to archives, and we found a lot of firewalls. We found a lot of lost data. We found information that was very difficult to find in the first place. It was disconcerting. There was also, I think then and now, little guarantee that something published is going to be around for a while; publishers, particularly in the sciences, don't make those guarantees. So we were looking out on a landscape where information, which is really the lifeblood of what we were all about, was ephemeral, costly, redundant, and sometimes impossible to find. Complicating that was the culture of higher education itself, and one of the key aspects we focused on in those deliberations was the problem of competition. Universities, colleges, and libraries were competing constantly, and I guess still do: for faculty, for staff, for numbers of books and journals, for grants, for students. It is an incessant pursuit of what I call the arithmetic of prestige. We were all part of it; my organization was part of it, and everyone at the table was part of it too. Once you get into that competition, it's very hard to extricate yourself, and once you're in, the incentives to collaborate, the incentives to become, heaven forbid, interdependent, are vanishingly small. We used to talk, hypothetically, about someone applying for the presidency of a university who sat down and said: I think what we ought to do is cut half of our athletic programs. We need to become interdependent with the two other universities in this city. We need to share collections and share staff, we can pair this, and we can focus and concentrate on fewer areas of expertise.
Would she get the job? I doubt it. That was the environment we were up against. So we discussed for several years how we might form a collaboration, how we might address this really multifaceted problem of disjointed, expensive, and redundant information, and a culture that was not about to change very much. The committee eventually dissolved, basically through attrition: about half the members went on to different positions, and we admittedly had a very difficult time with continuity. So the committee disbanded after about four or five years of discussion, and we have been speaking intermittently since then about this problem, which persists. We're here this morning because another member of the Committee on Coherence at Scale, then a board member of CLIR, was Stephen Rhind-Tutt, who is now president of Coherent Digital. We're going to talk about how those ideas and values (looking for ways to streamline, to make academic information coherent and cost-effective, and to build communities rather than divide so many of us) are being carried forward. So I will turn to Stephen.
Thank you, Chuck. I was inspired by Chuck. This committee really struck me as something that could solve a really major problem; the trouble was moving from strategy into action. In 2018, I left the company I founded, Alexander Street Press, and I was at an absolute loss. I love my work, and I had nothing to do. My wife said to me, you have to find something to do. And I said to myself, I wonder if there's something I could do with this committee's idea and actually build some coherence at scale. So I started talking to a number of people you know, people like Brian Schottlaender and Jim Neal. What I heard back, more and more, was that in one sense we're not doing that good a job collectively.
How is it that Facebook can capture an item on a phone and disseminate it to three billion people for free in a matter of seconds, while the same journey for an academic artifact is measured in months and thousands of dollars, sometimes tens of thousands of dollars? Surely there's something we could learn from Facebook and other social networks. The more I audited what's out there, the more I realized that there are literally hundreds, maybe even thousands, of tools that could enable us to do this: OCR engines, content management systems, HTR (handwritten text recognition), entity extraction tools. A lot of what we do manually could be automated. But the fundamental challenge is that these tools are not coherent. For example, you could use an AI engine to create metadata, but many dissemination systems require nicely formed MARC records to give a decent user experience, which means you can't just use the AI engine. So I began to wonder whether we could create a coherent system. That's why we called the company Coherent Digital: our idea was to take tools that already exist and link them together so that they begin to generate some coherence. So in 2018, I formed Coherent Digital with a mission to tame wild content. The idea was that we're doing a pretty good job with books and journals; it's the other materials that are, as it were, wild. When my colleague here, Christian, and I had a meeting in Newton, Mass., Christian said to me, I think there's a lot we could do. The more I looked, the more I realized that in certain areas there was a great deal of phenomenal content, and the one we landed on to begin with was policy reports. CNI, as a think tank, puts out great reports. These reports typically are not in libraries in the same sense that books and journals are.
The more I looked, the more I saw that there are literally thousands of organizations putting out excellent-quality reports. What I noticed was that even material published on the principles of open publishing tends to disappear after five or six years. So we asked ourselves what tools we would need to bring coherence to this content, and we decided we would need a capture tool, an enrichment tool, a dissemination tool, and an engagement tool. In other words, we wanted to cover the entire process, from capturing content to engaging with faculty and scholars. For the most part, we built, licensed, and co-opted existing software. I don't want to give you the idea that we've created something incredibly unique here; really, all we've done is lend coherence to what's already there. We built a thumbnail generator, a publication date extraction tool, and a title extraction tool, and we took an OCR engine. With these we were able to launch a database called Policy Commons, which indexed and aggregated about three million items. The key distinction is that we were able to index that amount of material for pennies. We were also able to get 50,000 end users registered over the space of a year, and our usage data show that the content is very actively being used. As part of that initiative, we noticed that there was a great deal of content from Africa: of the think tanks we were indexing, over 850 came from Africa. In the Global South, it is far harder to find money for APCs and far harder to go through the traditional book publishing process, and yet the content is still urgent and necessary. The more we looked at that, the more I wondered, and I began to talk with Chuck about it, whether there was an opportunity to aggregate cultural artifacts and primary sources from Africa in the same way.
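To make the four-stage chain described above (capture, enrichment, dissemination, engagement) concrete, here is a minimal Python sketch. Everything in it is a hypothetical illustration rather than Coherent Digital's software: the `Item` record, the stage functions, the placeholder URL and sample title, and the deliberately crude enrichment heuristics (first non-blank line as title, first 19xx/20xx number as publication year) are assumptions made for the example. OCR is assumed to have happened upstream of `capture`.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical heuristic: treat the first 19xx/20xx number as the publication year.
YEAR_RE = re.compile(r"\b(19\d{2}|20\d{2})\b")


@dataclass
class Item:
    source_url: str
    text: str                       # output of capture, e.g. OCR'd text
    title: Optional[str] = None     # filled in by enrichment
    pub_year: Optional[int] = None  # filled in by enrichment
    views: int = 0                  # updated by engagement


def capture(source_url: str, text: str) -> Item:
    """Capture: wrap already-extracted text in a record."""
    return Item(source_url=source_url, text=text)


def enrich(item: Item) -> Item:
    """Enrichment: naive title and publication-year extraction."""
    lines = [ln.strip() for ln in item.text.splitlines() if ln.strip()]
    item.title = lines[0] if lines else None
    match = YEAR_RE.search(item.text)
    item.pub_year = int(match.group(1)) if match else None
    return item


def disseminate(item: Item, index: Dict[str, List[Item]]) -> None:
    """Dissemination: make the record findable under each word of its title."""
    for word in (item.title or "").lower().split():
        index.setdefault(word, []).append(item)


def engage(item: Item) -> None:
    """Engagement: count a view, standing in for real usage tracking."""
    item.views += 1


if __name__ == "__main__":
    index: Dict[str, List[Item]] = {}
    doc = enrich(capture("https://example.org/report.pdf",
                         "Water Policy in the Sahel\nPublished March 2017\n"))
    disseminate(doc, index)
    engage(index["sahel"][0])
    print(doc.title, doc.pub_year, doc.views)  # Water Policy in the Sahel 2017 1
```

Each stage hands the same record to the next, so any one of them (a better date extractor, say) could be swapped out without touching the others, which is one way to read the "linking existing tools together" idea.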
So I did another audit, and we found that there were over a million items pertaining to African culture and heritage spread across hundreds of institutions around the world. Ironically, many of the user interfaces to those collections were not mobile-friendly and had no language translation capability. So even though a lot of African cultural heritage is available, it's actually inaccessible to the originators of that content. The more we looked, the more we thought this would be a really interesting opportunity. So a month ago, we launched Africa Commons. It aggregates and indexes collections across 600 institutions, about 100 of them from Africa. We also made it possible for individuals to upload content directly into the database. Historically, there's been a distinction between content management systems and dissemination systems. Linking the two together and using automated tools to do a level of indexing will enable people to submit content that is accessible to the community a minute later. Now, I know this violates a lot of shibboleths. Of course, peer review is an issue. Of course, there's a need to make sure the content is reasonably high quality. But my colleague Toby Green, who built the publishing system for the OECD, has been allowing uploads like this for many, many years, and the number of violations is relatively small. They happen from time to time, and things need to be flagged. A far larger problem is persuading people to upload; most people don't want to. So the nice thing here is that we will enable African institutions to upload content themselves. In the past year, I've conducted over 40 interviews with African librarians, and the issues they are facing are crashingly simple. Materials are being damaged by water. People steal pamphlets if they're valuable. These materials could be captured relatively easily if we were willing to violate some principles.
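The "upload now, flag later" model described above, in which deposits are indexed automatically and become searchable almost immediately while the occasional problem is flagged after the fact, can be sketched as a small Python class. The `Repository` and `Deposit` names, the keyword-set "indexing", and the hide-on-flag behavior are hypothetical illustrations, not Africa Commons' implementation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Deposit:
    deposit_id: int
    title: str
    text: str
    keywords: set = field(default_factory=set)
    visible: bool = False
    flags: list = field(default_factory=list)


class Repository:
    """Hypothetical self-deposit store: publish on upload, review only when flagged."""

    def __init__(self) -> None:
        self._items: dict[int, Deposit] = {}
        self._next_id = 1

    def upload(self, title: str, text: str) -> Deposit:
        dep = Deposit(self._next_id, title, text)
        # Automated "indexing": lowercase words with surrounding punctuation stripped.
        words = (title + " " + text).split()
        dep.keywords = {w.strip(".,;:!?()").lower() for w in words} - {""}
        dep.visible = True  # no pre-publication review queue
        self._items[dep.deposit_id] = dep
        self._next_id += 1
        return dep

    def flag(self, deposit_id: int, reason: str) -> None:
        dep = self._items[deposit_id]
        dep.flags.append(reason)
        dep.visible = False  # hidden, not deleted, until a curator reviews it

    def restore(self, deposit_id: int) -> None:
        self._items[deposit_id].visible = True

    def search(self, term: str) -> list[Deposit]:
        term = term.lower()
        return [d for d in self._items.values() if d.visible and term in d.keywords]


if __name__ == "__main__":
    repo = Repository()
    dep = repo.upload("Pamphlet collection", "Word document describing the pamphlets.")
    print(len(repo.search("pamphlets")))  # 1
    repo.flag(dep.deposit_id, "possible duplicate")
    print(len(repo.search("pamphlets")))  # 0
```

Hiding a flagged item rather than deleting it is one possible design choice: it keeps a bad upload out of search results while leaving it recoverable if review clears it.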
Another example is finding aids. I asked, why haven't you got your finding aids up? Well, EAD is really expensive to create, and we don't have any resources to disseminate EAD-compliant data. Well, do you have Word documents that describe this content? Oh yes, we've got Word documents. Well, why don't we just upload the Word documents and make them accessible? There are certain principles here that I've already touched on, and Christian is going to explain them in a little more detail. These principles are in some ways quite radical, and I'm not suggesting they supplant existing processes. I'm suggesting they be brought in alongside existing processes, because the challenge we're facing here is orders of magnitude more difficult than what we've done with books and journals. We simply cannot let these artifacts disappear because we insist on perfectly formed MARC records or really high-quality, Zeutschel-scanner-level copies. It's far more important that we preserve the artifacts before they disappear and make them accessible to the communities that created them than it is for us to observe traditional norms. So I'm going to finish by talking a little about some of the principles we've come up with. One is fault tolerance. When Google gives you a result set with an error in it, I don't hear faculty saying, oh, that Google, it's useless. Very often they say, it gave me 20 good results, and the fact that five of those results were poor isn't an issue for them. Again, I'm not suggesting we litter faults across our content, but AIs typically only get you to 97 or 98%, even when they're good. In my opinion, the cost of cleaning up that remaining 2 or 3% is way higher than the cost of letting the content go out and correcting it subsequently. Another principle is what Christian and I call event-based cataloging. In my book, there's no such thing as a finished catalog record.
As AIs improve, cataloging must improve. As we get ever better systems for entity extraction, the idea that we would simply freeze a catalog record seems to me to be wrong; we should allow the catalog record to evolve over time. I haven't touched on preservation yet, and Carol is going to talk a little about this, but in the African case we have seen link rot and disappearance of sites like you wouldn't believe. The half-life of a think tank in Africa is under four years. Typically it's like a flower: it blossoms, it produces magnificent reports, and a couple of years later those reports have disappeared. Capturing those pieces of content is something we're doing in Policy Commons. We have to do it respectfully, of course, but like the Internet Archive, we keep a copy. We only show that copy if the link breaks; otherwise, we route the traffic back to the original site. In a short talk like this, I don't have time to go through all the principles, but I'd be very happy to talk with any of you individually. If you'd like to check out these databases, there's a freemium business model: anybody here can get access simply by registering at policycommons.net or africacommons.net. And as a side note, Africa Commons is freely available to the continent of Africa and to HBCUs. We think openly available content is a critical ingredient in driving engagement. So with that I'm going to stop, and perhaps we'll have questions later. Thank you.
So I became interested. I knew about the work of the Committee on Coherence at Scale and of Coherent Digital through my CLIR involvement, but I came at it from the angle of the work I've been doing and my own intellectual and research struggles with content in the wild and where it will be. All this knowledge, all this born-digital material, lost to the future of the world, lost to future knowledge. How can libraries cope with this?
Some of you might recall that at CNI in 2019, Clifford and I (Clifford Lynch and I; is there another Clifford? I don't know, there may be some) raised this wicked problem in a joint presentation titled "Memory Institutions and Deep Digital Disruption: Beyond the Technical Challenges of Born-Digital Preservation." Among our observations, we noted that collecting and stewardship are essential prerequisites of preservation. So who is doing that collecting and stewardship, even though we know technically how to do preservation? I know there are issues at scale, but I think Coherent Digital can address those too. What we were looking at was the extent to which our institutions, all of us really, lack the infrastructure (and we mean all aspects of infrastructure: not just technical infrastructure, but organizational, financial, human-resources, and conceptual infrastructure) to deal with this kind of unpublished, wild (wild's a good term) content at any kind of scale. And this problem, of course, hasn't been solved. Two years later, Clifford and I again (he's doing so many things, but digital stewardship is an important part) published companion articles in Against the Grain, and by that time I had come around to realizing that the library role was still very important in this. I was kind of hoping that society would somehow solve this problem, but society thinks it has solved it: it has cultural heritage and memory institutions, and it's put this problem back on us. So here we are. Cliff's article delved deeply, in a very interesting way, into the nature of the web, whereas I was focusing on the web as a treasure chest of intellectual content that really needed to be excavated and used.
My article was titled "Collecting from the Web: Collection Development Policy in the Born-Digital Universe," and the point of that very short piece was to illustrate the kinds of content that would be completely relevant and within libraries' collecting policies if they were published some other way than born-digital, in the wild. But even as I made that plea to librarians, I realized they're listening and not disagreeing, but the reality is that most institutions lack the feasibility and capacity to figure out how to address this. So I started trying to analyze it, to look around and think, so now what? Even if people understand this, what are they going to do? And the more I noodled around with Coherent Digital, I had a literal eureka moment. I'm not making this up, and no one's paying me to say this; there is no money to pay me to say this. I realized that Coherent Digital, because of the tools it has, the way it's designed, and Stephen's continual work and open-mindedness, is not just saying please buy my package of digital content, even though they have good packages, and do go ahead and buy them. They've got tools that let you do your own curation, your own capture, your own additions. If we can figure out how to put this all together the right way, and that's why the partnership with CLIR is so important, we actually have a way forward on capturing, preserving, and curating born-digital content that's important. So I'm going to stop there and move along; that's me controlling my excitement.
And I'm going to be brief. As we're running up on time, I want to be cognizant of everyone's time here. One of the things that made me very excited when we started having these conversations is a project I've been working on for about seven years now, the Digital Library of the Middle East.
One of the biggest problems we've been running into with content from the MENA region is that people are excited and they have collections, but they have absolutely no infrastructure whatsoever to actually do anything with them. They don't have the personnel, the equipment, or anything else to really stand up these collections. So you have to start looking at creative ways to get this material online. That's one of the things I'm most excited about: being able to partner with a for-profit company that has figured out some of these things, so that I can go to these people now, say I have a solution to explore, and put this in the hands of the people who actually own and love these collections, often in the middle of deserts, often fighting climate change, fires, and all of the things you have at other places, just exacerbated by their remoteness. And when I think about scale, we often talk about scaling up, but I also think about scaling down: how do we take our approaches and suit them to a context that fits local instances? So that's what I'm excited about, and I'm going to be quiet now and turn it over to Christian.
Well, in the interest of time, we're also here for Alice, and I'll leave you to hear about diversifying digital publishing, which fits very well with our panel theme. I'm eager to hear what you have to say, and I know many of you are here for that. So I'm going to leave you, maybe, with just a metaphor to think about. As I said when I introduced myself, I'm a practitioner: I'm mainly (I'm looking at my colleague Jennifer) an archives administrator and special collections curator. For most of my career doing that, you hear about themes of taming wild content, right? There are more things than journals and books; we've had presentations on it. I see Kimberly here; ephemera at Princeton and Yale is the same thing. So I'm tired of being a zookeeper.
That's what I do. I'm a zookeeper, right? I acquire things and hand them off to my catalogers and archives processors, who are fantastic, and some months later, as we work our way down the queue, we have a processed collection. Then we prioritize, we consider copyright and other things, and eventually we digitize it, okay? That's when it becomes available to the world; you don't have to come to my zoo, you can visit the zoo's website, okay? We need to flip that. The tools are here now, right? I'm looking at things on my phone: I type in a keyword, and it finds it through handwriting analysis of pictures I've taken of poems, of annotated typescripts. Why are we not doing that in libraries? Why is digitization not the very first thing that we do? Then we extract as much as we can, quickly and automatically, and apply, yes, our best talents in cataloging. Carol, you know: "Cataloging must change." You remember Carol Mandel's article years ago, okay? They weren't born when that was written. All right, they weren't, okay. But still, we can put those resources to it, as part of the chain, together with our users, who know a lot more about our content than we do. So that's the idea here. We just need to flip that around, use the tools that are out there, and get on this platform. That's what we need to build: a platform, not our local repositories, not our local zoo. So I'll leave you with that. I want to be the naturalist, okay? Not the zookeeper. Thank you.