My name is Sarah Lippincott. I'm the head of community engagement at Dryad, and my talk today explores Dryad's current and future efforts to promote data reuse, including enabling machine usability, interoperability, and integration with open science tools, as well as researcher education and automated data quality assurance at submission. This talk builds on the work of Dryad's interim product manager, Maria Praetzellis, our former product manager, Daniella Lowenberg, and consultant Dr. Karthik Ram, a senior data scientist with the Berkeley Institute for Data Science, who collectively spent months rethinking how users access and compute with data published in Dryad.

Dryad is an open, international data publishing platform and community committed to the open availability and routine reuse of all research data. Since our founding in 2008, we have curated and published nearly 50,000 datasets, representing the work of some 200,000 researchers at roughly 70,000 institutions around the world. Our datasets are connected with research articles in over 1,200 leading academic journals. Dryad publishes research data across domains and is a powerful conduit for data that doesn't have a home with a specialist repository. We publish only research data, and we work in collaboration with Zenodo to facilitate open publication of associated software and supplemental information, one of the many connections we maintain with other open source tools and organizations. Dryad's strengths are the curation and standards-based open publication of research data, along with our collaborative approach and community governance. Our curation process is best in class for generalist data publication, our open source publishing platform reflects current best practice for sharing data, connecting it to related research outputs, and measuring impact, and our strategy of collaborating with other values-aligned services rather than competing with them sets us apart.
The open sharing of data underpinning research is essential to achieving the benefits of open science. The value of openly shared data lies in its reuse. Reusable open data provides evidence to support research articles and enables experts to interrogate, validate, reproduce, and build upon new findings, which is nothing short of critical in many domains. Open data also allows researchers, including those from less-resourced institutions and regions, to work with data they might not be able to generate themselves. Recent policies from the National Institutes of Health and other funding bodies, as well as the OSTP memo on ensuring free, immediate, and equitable access to federally funded research, provide renewed motivation for researchers to deposit their data in trusted repositories. However, moving beyond policy compliance to publishing data that is suitable for reuse and reproducibility requires new approaches. The NIH asserts that data shared in compliance with its policy should be of sufficient quality to validate and replicate research findings; merely archiving data would fail to realize the intent of these policies.

The FAIR data principles (findability, accessibility, interoperability, and reusability) are core to Dryad's mission and are deeply embedded in our technology and services. We have long excelled at the first three elements. Reusability is in some ways the most challenging and complex of the principles to implement. Enabling reuse requires a holistic approach to data curation and publication: it entails optimizing metadata quality and completeness, facilitating human and machine access to metadata and files, and applying appropriate licensing. In the following slides, I will describe a number of current and upcoming features in Dryad that specifically empower data reuse, touching on longstanding features as well as new and upcoming improvements to our user interface and API based on the work of Daniella Lowenberg and Karthik Ram.
A cornerstone of reusability is providing sufficient contextual information to help ensure that data can be correctly interpreted by future users; a data file without context is almost impossible to reuse. One reliable and widely accepted way of transmitting this information is through a README file that accompanies the data package. Historically, README files submitted to Dryad have been inconsistent in quality and comprehensiveness. Since late 2022, we have provided a README template in Markdown format and made it a requirement for new submissions. A template facilitates consistency across data publications and supports our longer-term goal of embedding the README text within the data publication page, rather than offering it only as a downloadable file.

Persistent identifiers "play a role in the reusability of data by enabling rich metadata and provenance to be associated with a digital object." Dryad metadata incorporates a range of persistent identifiers, including ORCID iDs for researchers, Research Organization Registry (ROR) IDs for affiliations, Crossref Funder Registry IDs for granting organizations, and DOIs from DataCite. This metadata, made available through our API, makes it possible to provide robust context and build programmatic links between a dataset, its creators, funders, and related outputs. Related outputs may contain essential context for interpreting datasets; these include software and code, research articles, preprints, other datasets, and data management plans. Dryad encourages authors to add persistent identifiers for this type of related work, directly linking a data publication with its network of upstream and downstream research artifacts. We provide a seamless conduit for depositing code and supplemental information through our partnership with the open-access repository platform Zenodo. A data publication at Dryad is a living artifact.
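As an illustration of the programmatic links these identifiers enable, here is a hedged sketch of pulling persistent identifiers out of a Dryad-style metadata record. The base URL and the field names (`authors`, `orcid`, `affiliationROR`, `funders`) are assumptions for illustration, not Dryad's documented schema; check the current API documentation before relying on them.

```python
from urllib.parse import quote

# Assumed base URL for Dryad's open API; verify against the current API docs.
DRYAD_API = "https://datadryad.org/api/v2"

def dataset_url(doi: str) -> str:
    """Build the (assumed) endpoint for one dataset, e.g. 'doi:10.5061/dryad.xxxx'."""
    return f"{DRYAD_API}/datasets/{quote(doi, safe='')}"

def extract_pids(metadata: dict) -> dict:
    """Collect persistent identifiers from a Dryad-style metadata record.
    Field names ('authors', 'orcid', 'affiliationROR', 'funders') are
    illustrative assumptions, not a documented schema."""
    pids = {"orcids": [], "rors": [], "funders": []}
    for author in metadata.get("authors", []):
        if author.get("orcid"):
            pids["orcids"].append(author["orcid"])
        if author.get("affiliationROR"):
            pids["rors"].append(author["affiliationROR"])
    for funder in metadata.get("funders", []):
        if funder.get("identifier"):
            pids["funders"].append(funder["identifier"])
    return pids

# A live fetch would look like:
#   import json, urllib.request
#   with urllib.request.urlopen(dataset_url("doi:10.5061/dryad.example")) as resp:
#       print(extract_pids(json.load(resp)))
```

With identifiers collected this way, a downstream tool can resolve each ORCID, ROR ID, or funder DOI to link a dataset back to its people, institutions, and grants.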
After publication, we programmatically harvest data citations to continue building connections and adding context over time.

Ensuring data quality is key to promoting reuse. Downloading data only to find it contains simple errors that render it difficult or impossible to use can generate mistrust and frustration. Our team of expert data curators runs thorough manual checks for data quality, but ensuring quality at scale requires some level of automation. Since 2021, Dryad has partnered with the Frictionless Data project to run validation on all tabular data during the submission process, giving researchers immediate feedback about their data so they can make edits in the moment and learn about data best practices. Future work will expand these automated pre-curation checks to further improve the quality and reusability of all data publications upon submission.

As file sizes grow, downloading data from Dryad can be a time-consuming, tedious, and resource-intensive process for the average user. This is particularly frustrating when researchers can't tell from the metadata whether a dataset will be relevant to their research. Data previews, which provide a glimpse into a file's contents, eliminate this overhead and lower the barrier to reuse. In 2022, we added previews for tabular data files, which represent about a third of the Dryad corpus. With the click of a button, users can now retrieve a quick preview of the data, including column headers, and copy it to their clipboard.

Connecting directly with open science tools that researchers already use in their workflows helps to maintain the integrity of a data package. Dryad's integration with the electronic lab notebook RSpace allows users to export data and metadata from RSpace to Dryad and associate it with a data management plan created in the DMPTool, making it simple for researchers to package all relevant information and provide vital context with their data submission.
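To illustrate the submission-time tabular validation described above, here is a toy, pure-Python sketch of the kind of structural checks such a validator runs; the real Frictionless Framework performs far richer checks than this.

```python
import csv
import io

def validate_tabular(text: str) -> list:
    """Toy Frictionless-style checks on CSV text: blank headers, duplicate
    headers, and rows whose length differs from the header row.
    A sketch only; the real Frictionless Framework runs far richer checks."""
    errors = []
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["empty file"]
    header = rows[0]
    if any(not h.strip() for h in header):
        errors.append("blank header label")
    if len(set(header)) != len(header):
        errors.append("duplicate header label")
    for i, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            errors.append(f"row {i}: expected {len(header)} cells, got {len(row)}")
    return errors

# A row with a missing cell is flagged immediately, before curation begins:
issues = validate_tabular("species,count\nlynx,12\nhare\n")
```

With the Frictionless Framework itself, the equivalent entry point is roughly `frictionless.validate("data.csv")`, which returns a report flagging structural and schema errors so the submitter can fix them in the moment.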
While many users explore and access data through Dryad's user interface, accessing data programmatically supports a range of use cases, including retrieving large volumes of data in a format suitable for analysis and automating regular data harvests based on specific criteria. All of Dryad's data and metadata are available through our open API, and a package developed in collaboration with rOpenSci makes it simple to create reproducible pipelines for data querying.

Dryad requires the use of the Creative Commons Zero (CC0) public domain dedication. While some research communities remain reluctant to waive all restrictions, especially attribution, Dryad strongly believes that CC0 is the most effective way to remove barriers to reuse. Creative Commons itself asserts that putting a database or dataset in the public domain under CC0 is a way to remove any legal doubt about whether researchers can reuse the data in their own projects, and it clarifies that although CC0 doesn't legally require users of the data to cite the source, it does not affect the ethical norms for attribution in scientific and research communities. Recent news about a lawsuit against GitHub, which mined openly licensed code published by its users, is a pertinent example of how less permissive licensing can create barriers to reuse and risks for re-users.

What new features can you look forward to over the coming months? We have a number of improvements on the roadmap that will further promote reuse. Dryad authors often use Excel workbooks to combine multiple datasets. This creates barriers for potential users because they can't easily evaluate the data included in a submission: we can't apply data previews to this type of file, and usage notes may not provide sufficient clarity to navigate a workbook. To remove this barrier, we plan to automatically convert Excel workbooks to individual CSV files upon submission.
The individual files can then be run through our tabular data validator, curated, and published. Through this enhancement, a full list of data files will be exposed via the API and the user interface, allowing human and machine users to access individual files programmatically.

At least 13% of data packages in our corpus comprise opaque compressed files containing collections of tabular or FASTA files (a text format commonly used in bioinformatics and biochemistry). When viewing a data publication in Dryad, a researcher has no visibility into the number or type of files contained within a zipped file. The issue, again, is the overhead of downloading files and the user's ability to efficiently determine whether a dataset is relevant and usable. The solution is to provide this visibility through the user interface and API, allowing users to quickly see what they're dealing with before making a download request.

Finally, in collaboration with Metadata Game Changers and the Center for Expanded Data Annotation and Retrieval (CEDAR) at Stanford University, Dryad is piloting approaches to increase dataset quality while streamlining submission. For example, we are piloting the display of discipline-specific metadata fields at the point of submission, depending on the fields of research identified during the submission process, and using these metadata fields to flag datasets that might be a better fit at a disciplinary repository, as well as piloting integrations with those repositories. The approach relies on the ability of CEDAR technology to acquire and encode standardized metadata for different scientific communities, using established reporting guidelines for different classes of experiments and standard terms for metadata. Much of this work will also depend on developing machine learning algorithms, trained on the Dryad corpus, to build automated workflows for routing data to appropriate repositories.
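As a sketch of the zipped-file visibility described above (assuming nothing about Dryad's actual implementation), the kind of file manifest that would be surfaced in the UI and API can be extracted with Python's standard `zipfile` module:

```python
import io
import zipfile

def archive_manifest(data: bytes) -> list:
    """Return (name, uncompressed size) for each member of a zip archive --
    the kind of file listing that lets a user assess a dataset before
    downloading it. Illustrative only, not Dryad's implementation."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [(info.filename, info.file_size) for info in zf.infolist()]

# Build a small archive in memory to demonstrate:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("samples.csv", "id,value\n1,0.5\n")
    zf.writestr("sequences.fasta", ">seq1\nACGT\n")
manifest = archive_manifest(buf.getvalue())
```

Exposing a manifest like this (file names, types, and sizes) answers the key pre-download question, namely whether the archive's contents are relevant, without transferring any data files.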
If you'd like to find out more and keep up to date on features related to reuse and other aspects of data publishing, Dryad maintains a public product roadmap on GitHub, where you can explore our latest feature releases and suggest your own. We post regularly about new features on our blog at blog.datadryad.org. I also invite you to contact us with any questions about data reuse, this presentation, or Dryad in general. You can reach our help desk for technical support at help@datadryad.org, direct general inquiries to hello@datadryad.org, or contact me directly at sarah@datadryad.org. Thank you.