All right, I think it's time, so let's get started. I'm Cliff Lynch, the director of CNI, and I want to welcome you to this project briefing session in our spring 2020 virtual meeting. Shortly after we made the decision to take this meeting virtual, we issued an extraordinary call for additional session contributions that spoke to aspects of the current crisis. We felt that moving virtual gave us an opportunity to do that and that, given the nature of the current crisis, it would be very helpful to all of our members and participants to have that opportunity. This is the first session of those supplementary project briefings. And I was really delighted, when we started getting those proposals in, to see that one of the very first was from Toby Green. I've known Toby a bit since his time at the OECD, and I value him as a very incisive thinker, so I think we will all benefit from his thoughts on this.

We will take questions at the end of the presentation. Those will be moderated by Diane Goldenberg-Hart from CNI. There's a Q&A tool at the bottom of your screen, and we'll use that to queue up questions. Feel free to put questions in there at any point, although we'll address them all at the end of the session. And with that, thank you for being here, and I will turn it over to Toby. Welcome.

Okay, well, thank you, and thank you very much for having me to give the first of these presentations about this dreadful situation that we're all in. I'm sitting somewhere north of Paris in a small hamlet, as we're all locked down in France, but I've still been working with my colleagues on a new project. I've called this presentation "Event 201: Why weren't we paying attention?" But it's really a story about wild content. Let's see if I can move forward. Here we go.

Now, a little bit about me. If you don't know who I am, Cliff mentioned that I worked for the OECD.
I certainly did, for the last two decades, but I left the OECD in August last year to help co-found Coherent Digital. Previously, when I was in the UK, I worked for Elsevier Science, and I've had positions such as the chair of ALPSP.

Now, you might wonder, why Coherent Digital? If I may, I'd like to start with the story of why Legionnaires' disease is so called. The disease is named after the outbreak where it was first identified, and that was at a 1976 American Legion convention in Philadelphia. Of the 2,000 Legionnaires present, 182 contracted the disease and 29 died. Unlike COVID-19, Legionnaires' disease wasn't the result of a novel virus. It's now known that Legionella, a bacterium, causes the disease, but it also causes another disease, in fact the same disease, called Pontiac fever, which was so named after an atypical pneumonia outbreak among people who worked at and visited Pontiac's health department in 1968. After the Philadelphia outbreak, the news about this new disease made lots of doctors realize that the atypical pneumonia they'd seen in care homes was probably Legionnaires' disease or Pontiac fever, take your pick.

What's the point of this little history lesson? Knowledge is often there in a sea of fog, if only we could see it. Back in 1985, I worked with a hill-walking, whiskey-connoisseur publisher by the name of Mike Buckingham. Mike heard the story of doctors possibly being able to recognize Legionnaires' disease earlier and wondered how many other diseases could be identified sooner if there was a way to see into the fog that is doctors' case notes. Mike thought that if doctors could add these case notes to a database that could be searched, we might be able to see into the fog: an early attempt at crowdsourcing and data mining. Sadly, for a variety of reasons, mainly because it was 1985, we failed with that project. But I had learned a valuable lesson about fog.
Having learned about digital publishing with Pergamon Press and then Elsevier, I left for France (my wife is French) and joined the OECD in 1998. What I found there was a parallel universe: a family of organizations, the OECD, the World Bank, the WHO, the United Nations, and the like, all of which published independently of the mainstream scholarly publishers. In print, this didn't matter. The print supply chain eventually got the books to university and other libraries, even if, when they got there, they were often shelved separately from the mainstream content, which always used to irritate me. However, when it came to digital, being in a parallel universe mattered. As we've learned, in a digital world, scale and size are vital in the battle for attention and audience share. Self-publishing on your website is a recipe for self-isolation, or for leaving your content to take its chances in the fog of wild content that forms the bulk of the internet. It's not for nothing that librarians call it gray literature, because you can't find anything gray in a fog.

What to do? The choice was either to license the OECD's content exclusively to big players or aggregators who had the audiences that we sought, or to compete from our own corner. We chose the latter. It wasn't a difficult choice. Our plan was to cheat the system. Our problem was this: users were flocking to the big all-you-can-eat journal platforms and ignoring eBooks, largely because at the time there were no big all-you-can-eat eBook platforms, and the OECD mainly published books. On the basis that if you can't beat them, join them, we shoehorned our books, working papers and datasets into a journal platform, which we called OECD iLibrary, and then we injected the metadata into the mainstream discovery and aggregation systems to drive discovery. If you capture metadata right, we learned, you can shape-shift it into pretty much any discovery channel. It worked. Dissemination of the OECD's knowledge increased 40-fold.
We won our fair share of that most vital thing, reader time. Except that I kept meeting publishers from smaller IGOs and NGOs who lamented that their publications went unread, unnoticed. Their publications were the droplets that formed an ever-larger sea of fog, and they didn't have the means to build their own iLibrary. I helped where I could by sharing the iLibrary platform with some IGOs, but the OECD isn't set up to be a publisher or aggregator, and for various reasons, including legal ones, I couldn't extend the invitation to NGOs or think tanks. Yet I know that IGOs and NGOs produce some really valuable content: knowledge that's unique and can make a difference, knowledge that can change policies and improve people's lives, knowledge that's at risk because it's not archived anywhere, knowledge that, if gathered into a database, could be mined. As you can see, I still haven't forgotten Mike's lesson about fog.

We decided to call our company Coherent Digital because we know the tools that could help us meet this challenge are there, but they don't always work together. This is true not just of policy content. There's a lot of other wild content in the fog: in blog posts, on websites, on shelves, in archives, on old CD-ROMs and so on. So we're collaborating with librarians, technologists, publishers and faculty to create a system that tames large bodies of content efficiently and speedily, to make it cohesive, understandable, harmonious, coherent.

And having started the story with one novel disease, I'm now going to tell you the story of another. Event 201: why weren't we paying attention? Last October, 15 business, government and health leaders met for a tabletop exercise in New York City called Event 201. They simulated an outbreak of a novel coronavirus that led to a severe pandemic, and you probably missed it. Well, so did most everybody else, unless of course you listened to BBC Radio 5 Live at 2:30 in the morning or read the Nigerian Guardian.
It really didn't get very far. Afterwards, the organizers, which include the Johns Hopkins Center for Health Security in partnership with the World Economic Forum and the Bill and Melinda Gates Foundation, released the proceedings of the event. They released recommendations and six videos.

Six days later, coincidentally, this appeared: "Ranked: 195 countries' pandemic preparedness". And it wasn't clickbait. It was in fact the launch of the 2019, and the first, Global Health Security Index, and it was put out, again, with the help of the Johns Hopkins unit, and I forget the other organizers. They sought to illuminate pandemic preparedness at both national and international levels. And on their website, you can find visualizations, reports and data that tell you about the pandemic preparedness of 195 countries.

And here are six COVID-19 resources. You are probably familiar with one, the Johns Hopkins chart up in the top left-hand corner, but you may not be aware of the other five. The ones I particularly like are Our World in Data's charts and commentary, which I think are incredibly illuminating, and, and I'm not being biased here, the OECD's country policy tracker, which will help you learn about how different countries' policies are evolving over time. Don't worry, you don't have to make notes. They're available on our website. In fact, we have three more that we found.

Now, none of these resources are published formally. They have no identifiers. They're hard to cite. They're certainly difficult to find. They are at risk of link rot. Who's going to maintain these websites over time? They are wild content. And how many librarians and libraries capture this type of wild content in their catalogs? How many can connect this content with materials from the Spanish flu epidemic, from which, of course, there are lessons to be drawn for today? And how many can include this content in their local discovery services?
Now, Berkeley's Jim Church, who I know quite well, has, of course, always been keen on this content. And he's an expert on content from IGOs and NGOs. And this is what he said about IGO and NGO content back in 2009: NGO and IGO information is poorly documented, primarily digital, difficult to acquire, and in peril of digital demise. He also noted that most NGOs lack the staff and financial resources to fund the publishing operation. I concur. That situation certainly hasn't got any better. Most of the NGOs and IGOs I'm still in touch with are struggling to maintain their staff and the financial resources to fund publishing, because it's never a priority in those organizations.

It's almost as if wild content doesn't want to be found. I don't know how many of you might have been to Africa and have actually seen elephants in the bush. They're remarkably hard to find.

Now, Clay Shirky, who was a web guru back in the noughties, is famous for saying that publishing is now a button. And I know he's been very influential with a lot of people, and a lot of communications people, who work in IGOs and NGOs. He said there's a button that says publish, and when you press it, it's done. Well, let's check the source of that, shall we? Let's go to that blog post, the one he made back in 2012. Well, yeah, right, Clay? It's done, isn't it? That blog post has gone down. That content is now lost. As Kent Anderson put it, I used to write for CompuServe, but where is the CompuServe publishing button today?

There are more IGOs and NGOs and think tanks than you may realize. According to the Union of International Associations, there are over 40,000 IGOs and NGOs, and they grow at about 1,200 a year. They're major knowledge creators that are probably publishing something of the order of a quarter of a million reports a year, but no one actually knows. And on top of that, there are blog posts and data sets and videos and podcasts.
And just to put that into perspective, there are twice as many IGOs and NGOs as there are universities worldwide. There's a huge amount of knowledge out there.

Could we have moved faster on COVID, had we stood on the shoulders of earlier giants? This time around, the record in New York is not great. 100 years ago, the death rate in New York was, in fact, lower than in equivalent cities in the United States. And there are lessons to be found in the old documents that were put out then, but they're really hard to find.

Here's one document that I did find after a bit of effort, and it is, in fact, recorded in WorldCat. It's the World Health Index. At least that's what the title says in WorldCat. And it's available from Ann Arbor. There's a link. And if you click on the link to the content, you get this, something called the Influenza Encyclopedia. And yet all you've got is the fact that it's the World Health Index. What it doesn't tell you is that this table shows weekly deaths from influenza by city in New York State between November 1918 and January 1919. None of that information is in the metadata. So if you're Googling for weekly deaths from influenza, you're never going to find that table sitting in that archive.

And it's not just about COVID. When I created Coherent Digital, our focus at the time was on the Sustainable Development Goals. We were looking at tax. We were looking at Brexit. We were looking at trade, at increasing trade tensions. We were looking at global warming, and we were looking at migration. All of these are policy areas that are really critical going forward, and all are areas where IGOs and NGOs are producing a vast amount of knowledge.

Clearly, there's valuable stuff in this content, whether modern or ancient. And what we need is a system that collects this stuff together and makes it visible in catalogs. We've got to capture this wild content. We've got to tame it so it becomes findable, useful and safe.
We need a simple 21st century system that uses the cloud and the crowd, because IGOs and NGOs are not about to invest anything in publishing. And it has to be a very simple system, a system that simply captures, circulates and catalogs the content.

Now, you might have noticed that I've got some frisbees sitting behind me, and that's because I'm also a frisbee player. So I'm going to use disc golf to introduce the Coherent Commons platform.

First, we need to capture the content. So how? Well, we'll harvest, but we'll also enable manual upload onto our platform, and then we'll use AI tools to allocate an ID number, create a base record and store the item in the cloud. What happens next? Well, the item can be shared, embedded and cited, and the usage can be tracked by the content owner.

It then goes into circulation. How? Well, via discovery services and search engines, via repositories, via our Commons platforms, wherever the content happens to sit out there. And what happens next? Well, with what they know, users can fill in the blanks. They can add folksonomies. They can add stories. They can add links. And so the work gets catalogued, and this catalogue is co-created by the crowd.

And what happens next? The records are available for ingestion into library catalogue systems, and the item is saved, with permission, of course. And then it goes out to be circulated again. The new users can add what they know, enriching the catalogue entry further for new users, and rinse and repeat. A simple 21st century system using the cloud and the crowd to tame wild content.

So, for example, our system would allow me to capture the content that's in this table, in this bubble that I made, because I can read the table, extract that information myself and add it to the catalogue record. Think of it as being like a 21st century library catalogue card, except that you never run out of space. And therefore, that table will become more findable and more useful for the user.
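[Editor's illustration] The capture, circulate and catalogue loop described above can be pictured with a minimal Python sketch. It is not the Coherent Commons implementation: every name here (`CatalogRecord`, `capture`, `enrich`, `search`) is hypothetical, a hash of the URL merely stands in for whatever ID scheme the platform actually uses, and the example.org address is a placeholder.

```python
import hashlib
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class CatalogRecord:
    """Hypothetical base record; the field names are illustrative only."""
    item_id: str
    source_url: str
    title: str
    tags: Set[str] = field(default_factory=set)     # crowd-added folksonomy terms
    notes: List[str] = field(default_factory=list)  # crowd-added descriptions and stories


def capture(source_url: str, title: str) -> CatalogRecord:
    """Capture step: allocate an ID and create the base record."""
    # Assumption: a truncated SHA-256 of the URL stands in for the real ID scheme.
    item_id = hashlib.sha256(source_url.encode("utf-8")).hexdigest()[:12]
    return CatalogRecord(item_id=item_id, source_url=source_url, title=title)


def enrich(record: CatalogRecord, tags=(), note=None) -> CatalogRecord:
    """Catalogue step: a user fills in the blanks with what they know."""
    record.tags.update(tag.lower() for tag in tags)
    if note:
        record.notes.append(note)
    return record


def search(records, query: str):
    """Circulation step: match the query against title, tags and notes."""
    needle = query.lower()
    return [
        r for r in records
        if needle in " ".join([r.title, *sorted(r.tags), *r.notes]).lower()
    ]


# The table from the talk: the title alone says nothing about its contents.
table = capture("https://example.org/archive/item", "World Health Index")
before = search([table], "weekly deaths from influenza")

# A reader who has looked at the table adds what the metadata was missing.
enrich(
    table,
    tags=["influenza", "1918"],
    note="Weekly deaths from influenza by city, New York State, "
         "November 1918 to January 1919",
)
after = search([table], "weekly deaths from influenza")
```

In this sketch `before` comes back empty and `after` contains the record, which is the point of the catalogue card that never runs out of space: the item becomes findable only once someone who has read it describes what is in it.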
Now, imagine all of that working at scale. This is the major project that I've been working on for the last nine months. Imagine one and a half million items from IGOs and another million items from NGOs and think tanks, all brought together into a single platform. Thousands of content items saved from defunct NGOs and think tanks, and there are a lot of them out there that have ceased to exist. Plus, we're looking to license exclusive content from partners and from archives, bringing all of that content together into a single platform.

And then we're going to add community tools: tools that will enable you to extract the data from tables, alerting systems, but also tools to allow members to meet other members and to upload their own content. So it's not just an aggregation platform, it's a community platform, and institutions can also upload their own content from projects and research groups and get usage and impact reports back, so they can see how well their content is doing.

I'm going to go back to Jim Church again. He says that the level of student interest in IGO and NGO content around a wide variety of courses is intense. Well, we dug into that, and we found that in syllabi 25% of the links don't work. The links are broken because they point to IGO and NGO websites that are not properly maintained. So we're going to keep a saved copy of everything we harvest, with the permission of the copyright owner, so that should the link break, we will have a copy to deliver to the user. And if the link doesn't break, we will simply route the user back to the original website, where they can find the content.

Our goal is to make content findable, useful and impactful, so hopefully we can help speed up the policy process. It shocked us to discover that it took more than 100 years to get from the science warning about the dangers of asbestos to policy action. Maybe we can compress that by making the policy content easier to find and more useful. So what next?
Well, right now we're capturing 2.5 million records from 50 IGOs and NGOs, and we're adding persistent identifiers to all of them to create the basic catalog record, and building the user experience, including, obviously, all of the content relating to COVID. We're going to do a beta release in June, towards the end of June. So I do invite you to join in, and together let's lift the fog on IGO and NGO content. Thank you. So I'll be happy to take any questions now.

Well, thank you, Toby. That was quite a fascinating presentation, with lots to think about: unleashing the wild content, as you say, and making a massive amount of information available, accessible and discoverable. Quite an undertaking and really fascinating. So thank you so much for sharing that with us. At this point I'd like to open the floor for questions and invite our attendees to type your questions into the Q&A box, which you should see at the bottom of your Zoom screen there. The chat box is also open. While we're waiting for folks to think about the presentation they just heard and formulate their questions, I just want to take an opportunity to remind everyone that this webinar is part of the ongoing CNI spring virtual membership meeting. We're so delighted that you could take time out of your day to attend this webinar, and we want to let you know that there's plenty more to come. The meeting runs through the end of May, so we have several more weeks of really fascinating presentations to come, and I've just pasted into the chat box a direct link to the complete schedule for the remainder of the conference. We still have one more webinar to go this afternoon: a presentation from Rob Cartolano, Michele Kimpton and James English on SimplyE and the academic e-book experience. So please check out what is yet to come in our remaining conference weeks. And while we're waiting for some questions to come in, I have a question, if I may, Toby, about your project.
I'm wondering about funding and sustainability. You're looking at resources from entities that are known for, you know, working on very thin margins. How is this project being funded, if you can speak to that?

Well, the idea is that we're going to, as I said, harness all this content and bring it all into a single discovery service. Of course, an awful lot of this content is openly available on the original website. Users will come into our platform, and it'll be free for anyone to do this, to discover, and then we will route them back to the original website, from where they can then do the download. Now, if the link is broken, we will be able to serve up a copy that we've kept. That service will be available for our subscribers, our members. Equally, we're looking at licensing in some content, some backfile content, in particular content that has yet to be digitized or content that is sitting in the archives of IGOs and NGOs that could do with being made digitally available. That content we will obviously include for our members for a period of time, so that we can recoup the cost of capturing it, and after that period of time the content can go open. So the funding will largely come from, in a way, commercial services that we're offering to the market, but at the same time there's a large amount of free service wrapped into here. Those of you who've known me from my OECD days will have heard me talking about freemium a lot, so in a way we're creating a type of freemium service here. And we have other ideas for monetization to help fund the work, in terms of providing usage data and intelligence about the impact of content back to the content owners, so that should produce another revenue stream for us. So that's where we are.

Really interesting, okay. Thank you, thanks. I just want to remind everyone also, or share with you if this is your first time attending one of CNI's webinars, that we do have the capability to turn
on the microphone of attendees. If you would like to ask your question directly, engage directly or make a statement, please feel free to go ahead and raise your virtual hand, and I can make sure your microphone is turned on. And again, we invite you to type your questions into the Q&A box. I was also curious, Toby, about identifying the organizations whose materials you're already ingesting into your system. How did that process happen?

Well, obviously, with my network within the IGOs, it was relatively easy for me to go around all of the major IGOs, and some that maybe you haven't heard of, to get their support for what we're doing, and so that part was relatively straightforward. With the NGOs, now, I knew a lot of the NGOs, but together with Jenna Makowski, who's the editor I've got working on this project, we've been researching the NGOs and picking off, to begin with, obviously some of the larger ones. But we're also looking at other areas. I mean, for example, the European Parliamentary Research Service: they produce an enormous amount of really valuable content that no one really knows exists. So we've been working with people like that to get them to cooperate with us in terms of supplying the content, so that we can produce a service that is really quite unique in terms of its scale and its breadth.

Yes indeed, so much fascinating information just left untapped there as well. Well, I want to thank you again, Toby, for coming to CNI to present your really interesting work, and I see that we actually do have a question right now. Let me share that with you: what idea do you have in terms of what contents to include for your project?
Okay, well, we're looking at content that, to use a bit of OECD jargon, would be substantive. By that I mean we want content that's got some meat in it. We don't want communication content, so we don't want what I call, you know, brochureware, the sort of bits and bobs that are put out that describe what they're doing or try to convince you how wonderful they are. So we're looking at content that actually has some substance in it, but we're looking very broadly. We're looking at blog posts, we're looking at tweets, we're looking at podcasts, we're looking at videos, we're looking at reports, we're looking at webpages, we're looking at datasets. The variety of content that's being put out is huge, and so we're being completely agnostic in terms of the file format. What we're really after is making sure that we can capture what is useful and make that available, rather than selecting on the basis of each content item. Plainly we can't do that, not if we're harvesting two and a half million items. What we're doing is selecting at the level of the institution, so we're looking at particular IGOs and particular NGOs to see whether they fit what we're trying to do, within the policy space that we've identified. There are some IGOs that don't fit. The World Meteorological Organization, for example: they publish science. There's no policy content there at all. It's really, really hard science about weather systems, so we won't be including that type of content. We're looking at content that really addresses policy issues, and at the organizations that are in that space.

Interesting. That was a great question, thank you for asking that. We still have some time for more questions, if anybody would like to type in a question for Toby, or if you would like to make a comment or engage directly, feel free to raise your hand. And again, this is part of CNI's Spring 2020 virtual conference, ongoing until the end of May. We're really grateful to Toby for coming to CNI to chat with us a bit
about his project.

Can I pose a question to the audience? How many of you think that you would like to join us as we launch this platform and get involved in some testing? We've already been talking to an awful lot of librarians and a lot of faculty about what we're doing, but when we actually come to roll out the platform, we really want to get people involved in testing to make sure that it is delivering what's needed. So how many of you, do you think? In a way, you can put your hands up to show interest in getting involved in testing.

We have several hands raised there. I don't know if you can see, Toby, in the attendee box we've got four or five hands going up there. We also have a comment, and if you want to make a comment or ask a question, please feel free to type that in the chat box. Let's see, we have five hands raised, and here's a couple of comments. Let's see, one comment: wonderful and important work, and thanks for sharing; interested in the platform. How will CD treat embargoed or not-open-to-the-public content? Will there be a way to know of the existence of some of this content even if it's not readily available to the public?

Well, at the moment, obviously, we're getting feeds from some organizations, and we're harvesting off public websites for the bulk of the content. Now, for example, with the OECD, we're going to get a feed from their system, and I know, because I built it, that their system has the details about embargoed content in the feed. So for them, yes, we will be able to share the forthcoming titles and the embargo dates, but it's really going to depend on the feed that we get as to whether we're going to be able to do that. Now, I have to say, from my knowledge of most international organizations' content management systems, they don't really have that ability to feed embargo dates in advance, so I'd be surprised if we're really able to do that at scale. I think we're going to be able to do that in a few cases. Now,
plainly, imagine if the content is sitting behind someone's paywall, which could exist. We will know of that content inside Policy Commons, so people will be able to discover it inside Policy Commons, and when we hand that user back to the original website, that's when whatever access control system they have will kick in. So therefore, yes, you'll be able to discover the content, but your access rights will be dependent on whether you have subscription rights or password rights or whatever wall that organization is using.

Okay. And that question was from Boaz, who says, "Got it, thank you very much." We have a couple of other comments here: "I'm not quite sure what my availability will be with this COVID-19 quarantine stuff, but I would potentially be interested." And we have an email address there, which I'll share with you later. "Thank you for your work," that commenter says. "I'm at the University of California, where I bet folks may already be involved. I will share about this with my colleagues in special collections and government documents." And yes, so we have some interest, definitely, in this project. And I'm sorry, can you see me, Toby? Did we lose you? Well, I'm afraid we may have lost Toby, and I am sorry for that, but I will extend my thanks to Toby for coming to CNI and presenting about this marvelous content and his wonderful project. Thanks to all of our attendees. We hope to see you back at other webinars throughout the meeting. Take care, be well, and bye bye.