Hello, I'm Christina Drummond, and I'm the Program Officer for the Open Access eBook Usage Data Trust effort. Today, I'll be providing a project update about our two-year pilot, which is generously supported by the Andrew W. Mellon Foundation. This project is a global collaborative effort to explore the infrastructure and data governance mechanisms that would be required to facilitate usage data aggregation, benchmarking, and economies of scale, specifically for open access e-books. To date, we have engaged dozens of thought leaders representing the leading organizations and projects working on open access monograph analytics. To start, I want to recognize that I'm here today representing a remarkable team of advisors and principal investigators, hailing from a wide array of organizations, including university presses, commercial and not-for-profit publishers, and U.S.-based projects such as Project MUSE and TOME. From a round-table SCI conversation in 2015, our effort has grown into our current project, engaging 23 advisors representing public and private e-book publishing, distribution, and analytics efforts across five continents. This is in addition to over 100 individuals who are informing our work through our community consultation process, which I'll describe later in this session. Now that you have a sense of who is involved in developing the pilot usage data trust, let's walk through four topics over the next 20 minutes. I'll start by contextualizing our work within the broader research enterprise and then focus in on the nuances of open access e-books. With that background, I'll present the data trust model we're piloting to govern community-based public-private data stewardship. Then we'll walk through our project outputs to date and what's queued up for the coming year.
To start us off, I'd like to invite you all to take a step back and think about the research enterprise broadly, from scholarly idea inception through fundraising, grant administration, and scholarly communication. Each of these areas involves its own activity life cycles, such as the research data management life cycle, the scholarly publication life cycle, or even the grant administration life cycle. Today, I want to draw your attention to the nexus point where usage and engagement data intersects with each of these domains, as each looks to incorporate information about impact into its planning and reporting. To put it simply, usage data in my talk refers to more than just counts and downloads. It refers to data about impact, and that data surfaces through the amassing of online access, usage, and link engagement data. Across the research enterprise, there is a need to establish and demonstrate impact: from pursuing support and reporting scholarship's impact on a funding agency's mission or program goals, to understanding the audience and reach of one's work. Linked data is what makes it possible to provide that context and surface engagement beyond counts. Yet doing this effectively requires amassing usage data from across platforms and repositories, and sometimes across scholarly outputs such as research data sets, journal articles, or, as is our focus, eBooks. Our Open Access eBook Usage Data Trust project, or OAeBU for short, is looking to facilitate the exchange and use of such linked usage data. In our project's planning phase from 2018 to 2019, we gathered stakeholders to establish common needs for OA eBook usage data across publisher types, platforms, and services. These discussions surfaced the need for improved data exchange to enable benchmarking, return-on-investment tracking for open access investments, and marketing and outreach efforts for services.
Since multiple hosting, aggregation, distribution, and reference data sources come together to generate the information about a single work's impact, we need to think through how to combine these individual elements in order to scaffold a broader analysis. Today, the open access eBook usage data that one needs to create usage reports may be publicly available via open APIs or metric services, or it may be siloed and provisioned within individual repositories, publishers, or platforms. Each press, publisher, library, metric service, and platform generates data, and that data contributes to the story of how a book has impact. Today, each of these is independently going through the same motions of trying to combine data and place it alongside their own institutional data to tell a broader story. The further challenge is that an institution can only access data that is openly shared or for which it has negotiated access privileges, making benchmarking incredibly challenging within a global landscape that includes private publishers, platforms, and service offerings. It's within this context that our project asks two questions. Can OA monograph publishers and libraries be better equipped to understand the impacts of their open access book collections through the comparison of data in a global, cross-platform context? And can we, as scholarly communication stakeholders, do this while managing the privacy and ethical questions that emerge when aggregating usage information at such a large scale? Compared to journals, open access books flow through a wider number of dissemination and aggregation platforms. The resulting usage data itself is diverse and may potentially be seen as commercially valuable or sensitive. Yet there remains an interest in figuring out how to get to comparable benchmark data that's really trusted across the ecosystem.
Whether different stakeholders see their usage data as a private or public good is a complicated question. For the sake of argument, let's presume that relevant data may be sensitive or proprietary, and that expecting presses, publishers, and commercial platforms to openly share all their raw usage data may be unrealistic, especially if you're looking to link that usage data with other data related to sales, for example. So if we have sensitive data sets that we require for benchmarking, the question becomes: can we create a trusted environment for moderated data sharing governed by agreed-upon community rules? And can we do this as the world around us shifts into a data economy in the age of AI? Amazon's decommissioning of the Goodreads API is a stark reminder that ongoing data access is fluid and not guaranteed. From a metadata perspective, open monograph information has often been wedged into standards and schemas developed for journals or for processes built around paid access. Right now, it is hard to track through systems the second-order impacts that result from a work becoming open or being made OA. To support one of the aims of our project, Michael Clarke and Laura Ricci produced the open access book supply chain mapping report referenced on your screen. In that report, Michael and Laura visualized the OA monograph publishing data supply chains, and here you see the data flows between the stakeholders that are involved in usage data capture and reporting. Solid black lines represent the generation of structured usage reports, theoretically governed by COUNTER; in practice, that doesn't happen entirely. And I want to bring your attention to the areas in yellow, where you see how publishers, libraries, and library management systems must work to compile data from multiple upstream data providers. This is usage data that may or may not be open. And this aggregation process, I'll note, requires significant data science expertise and deep OA book metadata knowledge.
This visual, also from that same report, illustrates how different data transfer protocols flow through these processes, further complicating matters of data brokerage and stewardship. ONIX is reflected in green, KBART in red, MARC in blue, and OAI-PMH in orange. The black lines here represent areas with no common standard in use. It's within this complex environment that we're looking to steward, aggregate, and benchmark usage data at the global level. This brings me to the data trust model as a concept that we might look to for community-based data stewardship. Our team looked at a data trust as a data governance mechanism and organizational structure. Data trusts emerged in civil society as a way for public-private partnerships to govern data access and aggregation and to facilitate data reporting across controlled datasets. As a unique type of data commons, they allow you to work across open and closed datasets by brokering community norms, norms that set the rules of the road for data aggregation and use among competitors. Applying a legal trust concept to data governance, these governance structures empower data trustees to manage access, security, and privacy controls, while also creating shared frameworks and data-oriented outputs that benefit all participating parties. To paraphrase our co-PIs Cameron Neylon and Lucy Montgomery, the hope is that by working together, data trust participants can gain access to comparative analytics and economies of scale, while also addressing resource challenges within organizations that may have less technical capacity. In other words, a data trust is a way to have multi-stakeholder governance of both open and privately shared data. With that in mind, I'll now provide a status update for the OAeBU Data Trust pilot. While the project started as an SCI conversation in 2015, the effort is now in its second phase of funding from the Andrew W. Mellon Foundation.
Our current grant is implementing recommendations that resulted from the stakeholder planning workshops in 2018. The findings from those workshops were published through the Book Industry Study Group in the report shown on the left. Working together with individuals such as yourselves through our community groups, we are now looking to better understand and document the use cases and challenges that surround eBook usage analytics, to then inform the development of open-source infrastructure, data governance policies, and sustainability models. Mapping the supply chain for OA eBooks, as discussed previously, provided our project with a system-wide view of the stakeholders. We are now also looking to document the use cases for how these stakeholders use monograph usage data, to then consider how best a data trust can support the ecosystem. To foster engagement and feedback from the different types of stakeholders at the table, we launched seven open Google Group stakeholder communities to foster global conversations around specific usage data trust requirements, so we could not only understand those use cases, but also gather stakeholder-specific perspectives relating to data policy, governance, and sustainability. For example, in partnership with TOME, we held author workshops to understand authors' uses of OA eBook usage data, as we fully expect that authors are best positioned to describe what they'd like to see and how, and they're also best positioned to flag data uses that might concern them. This is one example of our stakeholder discussion groups, and while we hypothesize that the use cases for university presses and library publishers will be very similar, again, we're turning to our community members to help make that determination. I'll note that all of these groups are open, and people can opt in from around the world at any time. You're more than welcome to join using the link on your screen.
To gather such user input, we led a five-step virtual design-thinking-based ideation process with those stakeholder communities, allowing them to describe how they rely on usage data, to share their frustrations about doing so, and to capture their ideas on potential solutions. This ideation methodology has been documented in a recent volume of Collaborative Librarianship, as cited on the slide. While we don't have the time to dig into our findings in detail, I'll note that everything is open for community review and comment. What you see here is an example of the summary findings contributed by our university press representatives, which suggest that the value of the usage data trust really lies in data brokerage and standards development. Sticking with the university press example, you see here that the high-level operational functions in presses that engage with book usage data span everything from editorial strategy to administrative reporting, fundraising and grant development, and open access knowledge transfer impact metrics and reporting. If you go to this document, clicking on those bubbles will take you to an eight-page detailed document that lists functional use cases by role. These use cases are made available on our project's website for public comment once they are fully documented in partnership with the stakeholder community. By the end of March, we should have those posted for university presses, publishing platforms and services, commercial publishers, and scholars. Stakeholders across types may be interested in seeing where the functional interests overlap. Those marketing and selling publications or publishing services look to leverage usage information for marketing and sales strategy, and supporting faculty use of information about how these books are engaged with and used across the world is of real interest across stakeholders.
But it's especially of interest to those that provide those analytics and dashboards as a value-added service. I'll also note that an interesting finding to date has been an expectation of data control by faculty members. In a conversation with multiple authors, there was an expectation that current practices, which empower individual faculty to decide what is viewable within a given promotion and tenure (P&T) portfolio, for example, should be maintained. The decision about what belongs in the portfolio is one the faculty member gets to make, and the expectation was that that should continue. In other words, scholars would continue to have a say over what usage data was and was not reviewed for hiring and P&T. This signals a growing issue where the privacy interests of scholars may come into conflict with the interests of funding agencies and university administrations. It necessitates broader conversations around the ethics of usage data use, and careful consideration of the role of usage data in terms of potential harms to scholarly freedom and the disparate impacts of usage data use on particular communities of interest. Quickly, I'll note that for OA publishing platforms and services, economies-of-scale opportunities abound. Use cases surfaced the resourcing challenges faced by both public and private platforms and services that support scholarly publishing and rely on these communities' financial support. To put it simply, they are currently working in parallel to manage data feeds and to process and present information that's coming from upstream. The question for the data trust is: in what ways might core shared infrastructure reduce that burden? Is there a rationale for community-governed usage data infrastructure, similar to how Internet2 and the global network of national research and education networks support data telecommunications? Or does siloed usage data flow management and aggregation foster innovation, as different parties recreate those processes in-house in parallel?
I suggest that there is a distinction between data dashboard visualizations and analytic services, where innovation and competition are rampant, and the data exchange and brokerage function, which I would suggest requires neutrality and a commitment to abiding by community norms. So far in this session, we've discussed the documentation of the OA eBook usage data supply chain, personas, and use cases. With this information in hand, the trust is currently focusing the second half of the project on working with community partners to develop and test the feasibility of pilot open-source infrastructure, and to develop the budgetary, staffing, and data governance models that we would need to transition into an operational data trust. From a technical perspective, we are piloting open-source software for the data trust infrastructure. We're working with the Curtin Open Knowledge Initiative to study the technical feasibility of using the open Academic Observatory platform, Kibana, and Elasticsearch for an operational data trust. Dashboard pilot partner organizations are working with our technical team to pilot the ingestion, processing, and presentation of commercially sensitive data unique to a single institution, while also exploring how a data trust can leverage the data aggregation, privacy, and security controls that will be required to manage the information securely and to derive aggregated benchmarks across the data contributed into the trust. We are fortunate to be working with a diverse group of global university presses, as well as Springer Nature's open access books program, to model the data ingest workflows and the data visualizations through this project. In addition, we are working with OAPEN, the Open Access Publishing in European Networks initiative, to explore how a data trust may support their metrics service. In this process, we have already discovered how critical data transfer and use agreements are.
Given the proprietary and sensitive data that's being stewarded across national policy boundaries, this leads me to the role of data governance mechanisms for the data trust. Our project policy working group will be working to develop a detailed data governance model that specifies the roles and responsibilities of the data providers, the trust as central data steward, and downstream data users. These norms are crucial to developing streamlined processes for the legal review of standard contractual agreements covering usage data transfer on the front end and data use on the back end, especially when working at an international scale across public and private parties. In the coming months, our project will be publishing final versions of the supply chain report and the new use cases. And by the third quarter of this year, we'll be learning from our dashboard pilot proof of concept as we move toward modeling and vetting both the data and organizational sustainability mechanisms. All in all, I cannot thank our community enough, as it is because of them that we've made such remarkable progress in the first year of our project during the pandemic. It speaks volumes to the level of community interest and commitment around this concept of finding ways to exchange data to achieve a greater public good that would benefit us all. So with that, thank you so much for listening. If you are interested in joining our community stakeholders to inform the project, you can find our opt-in form on our website. That is the end of this project update. Should you have questions, please do not hesitate to reach out to me at christina@educopia.org. Thank you for listening. Take care.