Alright folks, it's 5:46, so I think we'll go ahead and get started. Thanks for coming to this last of the parallel sessions for the afternoon. My name is Kevin Hawkins. I'm from the University of North Texas Libraries, and I'm presenting on a project that involves not only me but also a number of colleagues whom I want to acknowledge. The research project I'll be reporting on is funded by the Andrew W. Mellon Foundation. To help explain the issues we're trying to address, I want to share with you an example chosen by my colleague Charles Watkinson, associate university librarian for publishing at the University of Michigan Library and director of the University of Michigan Press. The press publishes about 15% of its books open access, mostly through funding from Knowledge Unlatched. The book shown on the slide was published by the U of M Press in 2016, and unlatched that same year, by an author aiming for promotion to full professor at Stanford. The author was interested in open access, especially for reaching readers abroad, and given the book's interdisciplinary subject matter, open access also seemed especially important. But the author wanted to understand the use of the book and be able to make a case about its impact beyond winning a book award and quoting sales figures, the sorts of things you might cite conventionally. And the U of M Press wanted to reassure the author that he had chosen the right publisher and made a good decision in going with open access. The press has been struggling to find a way to tell such stories to its authors. It participated in the KU Open Analytics pilot, which aims to provide insights into the usage of OA books included in Knowledge Unlatched.
The reports provided by the pilot, such as the example shown on the slide, can be shared with authors to give insight into, for example, which countries their book finds readers in. The report on the right, which I realize is probably a little hard to see in the back, also provides insights for the publisher, such as which platforms the users are coming through. But there are lots of other things we'd like to know as well. Is the book really reaching interdisciplinary audiences? What are the institutional affiliations of readers? Are students reading this book, or faculty members, or non-academic readers? Who exactly are these readers? Knowledge Unlatched staff say that they have been frustrated trying to form relationships with all the platforms their content is available on in order to get data for this pilot. And with openly licensed content, that content may end up being available in other locations as well, sometimes even without the knowledge of the author or the publisher: think of the Internet Archive or Unglue.it, or maybe the author's institutional repository, there of course likely with the author's knowledge. The KU staff have also been disappointed in the quality of the data they get from the platforms. This book, Alienation Effects, was intentionally assigned four different ISBNs by the U of M Press for the different product formats, but the various platforms have used different ones of those four ISBNs in their reports. And the platforms produce different kinds of COUNTER reports (we'll talk more about COUNTER reports in a bit), sometimes for the book as a whole and sometimes at the chapter level. These are some of the frustrations that led us to our research project.
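To make that normalization headache concrete, here is a minimal sketch of collapsing per-platform reports that cite different ISBNs of the same work into per-work totals, while keeping book-level and chapter-level counts apart because they indicate different kinds of usage. The ISBNs, platform names, and counts are all invented for illustration; this is not any platform's real reporting format.

```python
# Map every product-format ISBN the press assigned to one canonical work ID.
# All ISBNs below are invented placeholders.
ISBN_TO_WORK = {
    "978-0-000-00001-1": "alienation-effects",  # hardcover
    "978-0-000-00002-8": "alienation-effects",  # paperback
    "978-0-000-00003-5": "alienation-effects",  # EPUB
    "978-0-000-00004-2": "alienation-effects",  # open-access PDF
}

def normalize(reports):
    """Collapse platform reports keyed by mixed ISBNs into per-work totals,
    keeping book-level and chapter-level downloads separate."""
    totals = {}
    for r in reports:
        work = ISBN_TO_WORK.get(r["isbn"], r["isbn"])  # fall back to raw ISBN
        bucket = totals.setdefault(work, {"book_downloads": 0,
                                          "chapter_downloads": 0})
        key = ("chapter_downloads" if r["granularity"] == "chapter"
               else "book_downloads")
        bucket[key] += r["count"]
    return totals

reports = [
    {"platform": "A", "isbn": "978-0-000-00001-1",
     "granularity": "book", "count": 40},
    {"platform": "B", "isbn": "978-0-000-00003-5",
     "granularity": "chapter", "count": 250},
]
print(normalize(reports))
```

Even this toy version shows why a shared crosswalk of identifiers has to exist before any cross-platform comparison is meaningful.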
The objectives of this project are to create a structured conversation around usage tracking for open access ebooks, to understand implementation challenges, to define opportunities for collaboration, and to define a framework for moving forward. Ultimately, we're looking to make a compelling case for investment in open access book publishing by authors, publishers, funders, and libraries; that's the agenda behind all of it. This work has been going on for nearly a year, and we're now wrapping up the one-year research project. We received funding last spring, and in the summer and fall the team at KU Research prepared a discussion document that provided a review of the landscape, a sort of environmental scan, and outlined a concept we call a data trust (more on that in a bit). Later in the fall and into the winter, we held a community consultation period around that discussion document, culminating in an invitation-only summit in New York in early December. And this spring, coming soon, the Book Industry Study Group will publish a white paper that includes recommendations for next steps. A little more about that invitation-only summit: we explicitly tried to bring together a number of communities, especially across Europe and North America and across the for-profit and not-for-profit sectors of stakeholders. Many of the organizations represented at that meeting are in the diagram on the slide. With that overview of the motivations, objectives, and timeline for our work, I want to go into a little more detail about how usage data for open access ebooks works, and doesn't work, today. You may be familiar with usage data for online journals and databases. For these types of resources, generally speaking, views and downloads of content are reported according to one of the formats specified by the COUNTER Code of Practice.
It's a standard way to compare usage between different products. The Code of Practice was expanded in recent versions to address the specific needs of books and of open access content, which certainly helps with comparing usage of ebooks across platforms. Besides COUNTER reports, though, there are other types of usage data. Notably, some web platforms use tools like Google Analytics to provide a richer look at how website users engage with content. These tools usually provide more information than the simple tallies of searches and views you find in a COUNTER report; instead they assist with what's called path analysis, that is, how the user navigated through a particular website or web resource. But information about the usage of academic ebooks, especially open access books, is much more difficult to gather, analyze, and communicate than comparable information about electronic journals. E-journals are usually delivered through a publisher's website, even if that website runs on a third-party platform like Silverchair on the back end. Ebooks, by contrast, are usually delivered through intermediary channels. The various intermediaries for ebook distribution, a few of which are listed in italics on the slide (EBSCO, ProQuest, JSTOR, Project MUSE), compete for market share and tend to view the usage data they collect as proprietary. And what they do share back with publishers or with libraries that subscribe to them can be inconsistent and in formats that can't easily be compared, as in the example I gave. Library aggregators, for instance, may report chapter downloads or whole-book downloads, which indicate very different kinds of usage of the content. But there are other sources of usage data besides the kind we get from ebook distributors.
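The difference between a COUNTER-style tally and path analysis can be sketched in a few lines. The session data below is entirely made up for illustration; the point is only that the same raw page views support two different questions: how many times was each page viewed, versus which page-to-page transitions readers actually take.

```python
from collections import Counter

# Invented example sessions: each is one visitor's ordered page views.
sessions = [
    ["/search", "/book/ch1", "/book/ch2"],
    ["/toc", "/book/ch1", "/book/ch1.pdf"],
    ["/search", "/book/ch1", "/book/ch1.pdf"],
]

# COUNTER-style view: a simple tally of views per resource.
views = Counter(page for s in sessions for page in s)

# Path analysis: count the page-to-page transitions within sessions.
transitions = Counter((a, b) for s in sessions for a, b in zip(s, s[1:]))

print(views["/book/ch1"])                     # 3 views of chapter 1
print(transitions[("/search", "/book/ch1")])  # 2 visitors arrived via search
```

A tally tells you chapter 1 was popular; the transition counts start to tell you how readers found it, which is closer to the "richer look" that analytics tools provide.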
So, authors or publishers may end up making more than one version of a work available separate from those main distribution channels. The author's personal website might have an early version of the work, essentially a preprint. There are self-service platforms that authors may use, like Figshare, academia.edu, and ResearchGate; we tend to think of them in connection with journal articles, but nothing stops anyone from putting an open access book there. There are copies in institutional repositories and other digital library systems. The publisher may choose to distribute the work through channels specific to open access content, like the OAPEN Library. And as I mentioned before, if the work is openly licensed, third parties like Unglue.it or the Internet Archive may end up distributing their own copies: they either go out proactively gathering content, or people exercise their rights under the open license to distribute it. So we feel that any attempt to represent the usage of an open access ebook needs to take into account these various channels, plus what we might call storytelling indicators such as altmetrics and Crossref Event Data. As Lucy Montgomery and Cameron Neylon, both of KU Research, and their colleagues on the research team, Alkim Ozaygen and Tama Leaver, argued in a recent article in the journal Learned Publishing, we want to know not just the quantity of usage of a book but about the audiences who are engaging with it, and if possible how and why they use the content. And it's not just publishers and libraries that need a full picture of open access ebook usage: authors and funders want to know more about how their books are used, but they're usually reliant on publishers sharing the appropriate data with them.
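The idea that a full picture must combine per-channel usage with storytelling indicators can be sketched very simply. Every channel name and number below is invented; in practice each figure would come from a different report in a different format, which is exactly the problem the project is addressing.

```python
# Hypothetical per-channel download counts for one open access book.
channel_usage = {
    "publisher_platform": 120,
    "oapen_library": 310,
    "institutional_repository": 45,
    "internet_archive": 80,
}

# "Storytelling" indicators sit alongside the raw counts (invented values).
storytelling = {
    "crossref_events": 12,      # e.g. mentions captured by Crossref Event Data
    "altmetric_mentions": 7,    # e.g. news and social media references
}

summary = {
    "total_downloads": sum(channel_usage.values()),
    "channels_reporting": len(channel_usage),
    **storytelling,
}
print(summary["total_downloads"])  # 555 across all four channels
```

Leaving out any one channel silently understates the total, which is why the talk argues every distribution location has to be accounted for.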
And while journal publishing is concentrated among a few large publishers with a fairly stable revenue stream, monograph publishing is highly distributed among publishers that are usually running on a shoestring budget. So few monograph publishers have staff with the time and expertise to examine usage data closely. To put it another way, advocates of open access often say that an open access book will be downloaded, used, and cited more often than a comparable restricted-access title. But all stakeholders in scholarly communication want proof of this. Publishers need to demonstrate the impact of their open access publications to receive support for their open access programs, especially in the case of institutional publishers. Funders are looking for usage data to demonstrate return on their investments, and authors are eager to show evidence of additional reach and influence for their work. Our study is aimed at figuring out how to establish a mechanism to do this. Our discussion document argues that the problems here are less technical and more social: a collective action problem. We need to establish a trusted framework for coordinated action among all the relevant stakeholders that will allow usage data to be shared appropriately among them, but in a way that guards against misuse. We call this a data trust: a cooperative organization to address the needs of scholarly publishers, libraries, and other stakeholders in the collection, validation, aggregation, normalization, and dissemination of usage data. The data trust has a few key features. It would be a repository of data about open access books and their use, with a framework for gathering that data that includes well-documented pathways and workflows. And there would be an organization behind the repository that has legal responsibility for the data.
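An organization with legal responsibility for shared usage data would need explicit rules about which members may see which fields. Here is a purely speculative sketch of such rules; the roles, field names, and visibility choices are invented for illustration, and a real data trust would define them through its governance process, not in code.

```python
# Invented role-to-field visibility rules for a hypothetical data trust.
VISIBLE_FIELDS = {
    "publisher": {"title", "country", "platform", "downloads"},
    "funder":    {"title", "country", "downloads"},
    "platform":  {"title", "downloads"},  # no competitor-level detail
    "public":    {"title"},               # only aggregate-safe fields
}

def redact(record, role):
    """Return only the fields a member with the given role may see."""
    allowed = VISIBLE_FIELDS.get(role, set())
    return {k: v for k, v in record.items() if k in allowed}

record = {"title": "Alienation Effects", "country": "DE",
          "platform": "JSTOR", "downloads": 310}
print(redact(record, "funder"))  # platform name withheld from funders
```

Encoding the rules as data rather than scattered conditionals makes the "who can see which parts" question auditable, which matters for the marketplace and privacy concerns discussed later in the talk.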
We want it to be a community- or member-governed entity, empowering multiple stakeholders and providing responsible oversight. It would also perform a standards-making role, ensuring fair and equitable access, and could essentially act as a laboratory for questions of data ethics. We hope, or at least envision, that the data trust will help us look into questions like these: How can the information be collected from and contributed by its members, normalized, stored, curated, and preserved? What rules should govern sharing and usage of the data? What are the terms of use, and are they the same for all members or would they vary? Who can see which parts of the data? How can the aggregated data be accessed and used, and who would be allowed to build services on top of it? Now I want to share a sneak peek at the findings in our forthcoming white paper, briefly stated; there's much more detail in the draft paper, so forgive me for just reading the main points. First, we see that a good deal of data is already available to those who want to study the impact of open access monographs, though it's sometimes held in closed environments. So quite a bit of data is being gathered at this moment. There are certain types of data that are of interest to stakeholders but that, as far as we know, have never been compiled; and yet the volume of existing datasets, whether held in open or closed environments, dwarfs, as far as we can tell, the data that is not yet being collected at all. So again, much is already being gathered in some format; it's just not necessarily available at this point. The data of greatest interest seems to vary by audience, and these are the main audiences here: authors, publishers, funders, vendors, platforms, libraries, and readers.
And across these audiences, relatively little of the available data is being used widely or consistently. It's clear that there are marketplace and ethical concerns about the use of certain data points: marketplace concerns like competitive intelligence and what the players feel they should be sharing with others, and ethical concerns around reader privacy and the ethics of collecting and compiling such data and metrics about the use of scholarship. The COUNTER Code of Practice that we've discussed does not provide some of the qualitative information about open access ebook usage that stakeholders want, but the governance group behind it is willing and eager to adapt the standard to be more useful for open access ebooks. That's a good sign; it's not as if they're saying this isn't their problem. The use cases for open access monograph discovery, access, consumption, and engagement have not been widely or fully developed. I gave you an anecdote at the beginning, but these use cases haven't really been fully developed and mapped against the needs of specific audiences; they're just some ideas we have. We've found that significant work is being done outside of North America, and we feel that coordination among these regional groups has been inadequate up to this point. There's still significant debate among those we received feedback from about how to go about building a data trust, specifically whether its governance and operation should be centralized, federated, or distributed. The idea I described was fairly centralized, but a number of key stakeholders feel that's not the right approach. And finally, to create a data trust, we feel we need to get agreement among stakeholders in at least three areas.
Those are the standards for data exchange, where and how the data will be stored and managed, and how analytics will be built on top of that data: three broad categories. And now a sneak peek at our recommendations for next steps. First, we need to define the governance and architecture for the data trust and articulate priorities for future work. Second, we need to create a pilot service that implements the governance and architecture defined under the first recommendation. Third, we need to implement and extend the relevant open-source technologies that already exist, many of them coming out of Europe, across the base of stakeholders, including here in the U.S., where a number of us are based. Fourth, we need to develop personas and use cases that demonstrate more clearly who benefits from open access monograph usage information and how a data trust can better serve their needs. Fifth, we need to build engagement across markets. I've been talking about North America and Europe, but there's a whole world out there with a lot more going on, and we really need to ensure that this conversation is global, not just transatlantic. And sixth, we need to better document the supply chain for open access monographs. Our conversations revealed that there's still not a clear, shared understanding of that area among the stakeholders in the community, so we need to do some fundamental work that will help all of this go more smoothly in the future. I've been alluding to work in Europe, and I want to specifically mention the HIRMEOS project. They're a major player in this area, and we see them as a significant collaborator in the future. With that, I'll give you the URL here; it's also on the session page on the website.
This is where the white paper will be available. We've been getting some final feedback from some key people here, but expect to issue it in the coming weeks. So, I welcome your questions and comments.