We're going to go ahead and kick off this session. First, I want to say welcome to everyone who's here and to the speakers. Thank you so much for joining this session on open science plumbing: the infrastructure enabling and catalyzing policy implementation. I'm Nici Pfeiffer, Chief Product Officer at the Center for Open Science, and I'll be the host and moderator for this panel along with our four terrific speakers, whom I'll introduce momentarily. Before we really get into this, I want to frame the session. The title is a little provocative, talking about the plumbing, but what we're going to get into is how open science progresses as part of a culture change initiative: the shift toward open research practices is fundamentally a social change, and it's best met with a systems approach. This pyramid illustrates that systems approach, with interdependent components: infrastructure that makes it possible, training and community building that make it easy and normative, and incentives that make it rewarding and, hopefully, aligned with research practice and policy requirements. Each of these interdependent components is necessary, and none is sufficient on its own to shift the culture. In this panel, we're going to talk about some of the challenges our speakers have identified with changing research culture, and share more about infrastructure solutions and policies that are catalyzing change and having an impact on open research. We'll hear from each of the speakers, and then at the end we'll open up about 15 minutes of dialogue, with some prompted questions as well as questions pulled from the chat and Q&A. So feel free to drop in questions at any point, and we'll take those at the end. Let me go ahead and introduce the speakers. First we'll have Amanda French, the technical community manager at ROR, the Research Organization Registry. Then Jen Gibson, the executive director at Dryad, an open data publishing platform. Then Tim Vines, the founder and CEO of DataSeer, a platform that helps researchers comply with stakeholder open research policies. And Ishwar Chandramouliswaran, program director and technical lead at NIH's Office of Data Science Strategy. One last note, which I'm dropping in the chat right now: the speakers' slides are available at the link I'm sharing, and all of the presentations from the panels and sessions throughout the two-day event are also available on the OSF. Okay, with that, again, thank you to these speakers and thank you all for joining. I'm going to pass it off to Amanda to get us started.

Great, thank you so much, Nici. Next slide, please. Hello, everyone. No, just the first slide, the title slide, while I introduce myself. Back, there we go, right there. I'm Dr. Amanda French, and I am the technical community manager for ROR, the Research Organization Registry, an initiative jointly operated by Crossref, DataCite, and the California Digital Library. Next slide. ROR is a global, community-led registry of open, persistent identifiers for research organizations that is becoming a global standard.
ROR makes information about research organizations clean, consistent, normalized, and easy to exchange among software systems, so that journal articles, datasets, and other research outputs can be reliably associated with organizations such as universities, laboratories, companies, and funders. Next slide. If you haven't heard of ROR before today, maybe you've heard of ORCID identifiers, and maybe you've heard of digital object identifiers, or DOIs. An ORCID iD identifies a researcher; a DOI identifies a research output such as a journal article; and a ROR ID identifies an organization associated with the research, such as the organization where the researcher is employed (the researcher's affiliation) or a funder that supported the research. Next slide. Here's a sample ROR record for the California Institute of Technology, and this record should make it clear that ROR IDs, like ORCID iDs and DOIs, aren't just identifiers. The true value of these identifiers is that they are associated with rich information, or metadata, about the entities they identify. On this record you can see the unique ROR ID at the top left. But you can also see the name of the organization, what kind of organization it is, other names it goes by, including acronyms and common alternative names such as Caltech, the organization's website, its location, and its mapping to other identifiers. ROR also supports parent-child relationships and sibling relationships for organizations within the registry, and you can see that here; this can include labs, hospitals, and research centers. The ROR registry currently includes over 105,000 research organizations from around the world (actually, I think it's over 108,000 now), with new records continually being added and existing records continually being updated. This image is taken from the ROR browser search; if you like, you can open ror.org/search in another tab and look for an organization. But all the information shown here is also available in the ROR API and in the downloadable ROR dataset, both of which are entirely open, public domain, and free to use. Next slide. On this slide, with some apologies to NASA, you can see an example of one of the problems ROR helps solve. If you search NASA's scientific and technical information repository for work associated with Caltech, you'll get different results depending on how you spell and even capitalize Caltech, because historically, like many, many other scholarly systems, the STI repository has captured and stored organization names as free text. And of course, just imagine the different results you'll get if you look for work of the California Institute of Technology spelled out instead of Caltech, or CIT, which is a common acronym for Caltech, or for work produced by Caltech's Jet Propulsion Laboratory, which may or may not appear in a search for Caltech work. It's vitally important to connect all of these works together, because they are all associated with one organization. Next slide. Okay, here we go. The terrific news is that NASA STI services is now working to address all these name variants and incorporate ROR IDs into the NASA STI repository. We worked with them quite a bit last year, during the Year of Open Science, to make sure that NASA-related organizations are properly represented in ROR so that they can continue to clean up the data in their repositories, and that work is ongoing. Next slide.
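To make this plumbing concrete, here is a minimal sketch, not from the talk, of how a system like the NASA STI repository could use ROR's public REST API to normalize free-text organization names. It assumes the v1 endpoint and response shapes at api.ror.org as of this writing; check the ROR documentation before relying on them.

```python
# Minimal sketch: normalizing free-text affiliations with the public ROR API.
# Assumes the v1 response shapes at api.ror.org; verify against current docs.
import requests

ROR_API = "https://api.ror.org/organizations"

def normalize_affiliation(text: str):
    """Map a messy free-text organization string to a (ROR ID, canonical name) pair."""
    # The `affiliation` parameter matches messy affiliation strings
    # against the registry and returns scored candidates.
    resp = requests.get(ROR_API, params={"affiliation": text}, timeout=10)
    resp.raise_for_status()
    items = resp.json().get("items", [])
    if not items:
        return None  # no match in the registry
    org = items[0]["organization"]  # candidates come back ranked by match score
    return org["id"], org["name"]

# All of these variants should resolve to the same record, which is exactly
# the name-variant problem in the NASA STI example above.
for variant in ["Caltech", "CALTECH", "California Institute of Technology"]:
    print(f"{variant!r} -> {normalize_affiliation(variant)}")
```

Because ROR covers funders as well as affiliations, the same lookup can clean up both kinds of organization metadata with one integration.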
So here's a great and elegant example of a ROR integration in the data repository Dryad; we'll hear from Jennifer Gibson in just a moment. You can't see ROR anywhere here. We're talking about infrastructure, we're talking about plumbing: this is behind the scenes. ROR is powering the organization lookups in both the institutional affiliation field and the granting organization field. So typing Caltech brings up a suggestion for the California Institute of Technology all spelled out, and when the user makes that selection, it's associated with Caltech's ROR record and stored in Dryad, where it can be shared with other systems. Similarly, you only have to type the acronym for the California Space Grant Consortium to bring up the ROR record behind the scenes. Once that is stored in the system, again, it can be and is shared with other systems. And hopefully, STI's repository will soon look like this. Next slide. Many key infrastructural systems are using ROR, and more and more are adopting it every day. We're a relatively new identifier, especially compared to DOIs and ORCID iDs, but we're seeing a tremendous increase in adoption, especially over the last couple of years. One of the most important infrastructure services that uses ROR is ORCID. Individual researchers really don't need to know about ROR or use ROR, but if individual researchers create ORCID profiles and keep their affiliations up to date, that helps to make sure that the whole scientific and technical information ecosystem has clean and consistent organization data. So you can see here an ORCID record where somebody has said, yes, I am employed by Caltech, and you can see that they've selected that from a similar suggestion box. And if you click the "show more detail" link on the right, you will see the ROR ID and other information coming from that ROR record; you can actually see it in the ORCID UI. Next slide. Finally, I just wanted to include some testimonials that show how and why ROR, as a free and open dataset with global coverage and a powerful API, is important for creating reliable connections between researchers, research outputs, and research organizations. One quotation I didn't include here, but could have, is from the August 2022 OSTP memo on ensuring free, immediate, and equitable access to federally funded research, which we heard a great deal about in the wonderful plenary. That memo explicitly calls for good metadata, including, quote, "all author and co-author names, affiliations, and sources of funding, referencing digital persistent identifiers as appropriate," end quote. OSTP very clearly recognizes that open science depends on the kind of invisible infrastructure that ROR helps build. This is a panel on open science plumbing, and I hope you've enjoyed this peek at the pipes. Next slide. Last slide. Thank you; feel free to contact me after the session if you have any questions. And that's all from me.

Great, thanks, Amanda. I'll take it from here. Hi, everyone. I am Jennifer Gibson. I'm the executive director of Dryad, a platform I'll describe for you in just a moment. With my brief remarks, I want to offer a few key points. The first is to go back to the framing of the meeting and the goals for the Year of Open Science. There is strong alignment, obviously, between what we're talking about and those goals: the infrastructure for open data sharing is a pillar of any strategic plan for open science.
More than that, making well-described data open can contribute to the integrity of peer review as well as further research. It helps contribute to improvements in research assessment and to opportunities to recognize more contributors to the global scientific enterprise, including those who have been underrepresented. So credit to Nici for pulling this theme into the conversation. Another point that I'd like to leave you with, even before I get into the slides, is that the infrastructure for sharing data is at once mature, with some specialist repositories online for several decades, and rapidly changing, in ways that I'm going to talk about in just a moment. And finally, I'd like to try to make clear that we're ready to support emerging policies for publicly funded data and to help funders expand their view of the impacts of their investments. Moving on to the next slides: the data infrastructure is expansive. With my remarks, I'm going to home in on work by Dryad and other generalist repositories, which are home to data from a wide variety of disciplines and are to be distinguished from specialist data repositories that support specific domains and are the first port of call for domain-specific data where such a repository exists. On the next slide, I just want to tell you a little bit about Dryad. We are indeed an open data publishing platform. I don't like to talk about repositories so much, because given the maturity of practice and systems and policy that we have available, it really behooves us to talk about bringing the data to life. Dryad is also a multi-stakeholder community of institutions and funders and societies and publishers, all committed to a common vision for the open availability and routine reuse of all research data, and we've been working since 2007. Just a couple of other points here: as a generalist repository, we support work across research domains. At Dryad, we only publish data; we partner with others to support other work, and our process is supported by a team of human curators. On the next slide: Dryad is honored and excited to be a participant in the Generalist Repository Ecosystem Initiative (GREI), funded by the US National Institutes of Health Office of Data Science Strategy and Ishwar. I understand you're going to hear quite a bit about GREI today and tomorrow, so I'll be brief in saying that it's about adopting common standards and approaches to enabling and supporting open data sharing and reuse. But in principle, I want to highlight that GREI is very exciting: with this exercise we're modeling an unprecedented social and technical collaboration across different types of entities, whose potential is simply and purely to improve the integrity of the global open research infrastructure and advance open science. Just take a moment and consider how powerful it is to have Elsevier and Digital Science take up the types of community standards pioneered by Dryad and the Center for Open Science. That's just one example, one facet of the potential here. So I'm going to show you a little bit of what that looks like in practice. On the next slide, we see a form from within the Dryad system. Through the collaboration at GREI, each of the repositories has committed to the DataCite metadata schema, so we're all using the same metadata schema.
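As a rough illustration of that shared schema commitment, and of the institute-level funder capture Jen describes next, here is roughly what a fundingReferences entry in the DataCite Metadata Schema 4.x looks like. The field names follow the published schema; the institute, identifier, and award number below are placeholders, not a real record.

```python
# Sketch of a DataCite Metadata Schema 4.x fundingReferences entry.
# Field names follow the published schema; all values below are placeholders.
import json

funding_reference = {
    "funderName": "National Institute of General Medical Sciences",  # institute-level, not just "NIH"
    "funderIdentifier": "https://ror.org/<institute-ror-id>",  # placeholder; look up the actual ROR ID
    "funderIdentifierType": "ROR",
    "awardNumber": "R01GM000000",  # placeholder award number
}

print(json.dumps({"fundingReferences": [funding_reference]}, indent=2))

# Once deposited, the same metadata is queryable through the DataCite REST API,
# e.g. (query syntax may vary; see DataCite's API documentation):
#   GET https://api.datacite.org/dois?query=fundingReferences.funderName:"National Institutes of Health"
```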
And within that schema, we've committed to capturing not only funding information, made available now through ROR, but a secondary level of funding information, such that the National Institutes of Health, the funder of this particular program, is not just learning that the dataset was funded by the NIH, but that it was funded by a particular institute. This establishes a powerful precedent for the other agencies. Next slide, please. That information can then be made available on the public-facing, human-facing instance of the data publication. So again, this is looking at the Dryad data publication page where the funding information is displayed. And because that information is captured in the metadata for the data, it is communicated up through DataCite and is searchable and discoverable through DataCite Commons and any other system to which the metadata is sent, again propagating this connection between funding and available data. Next slide, please. It then becomes possible to use our search interfaces to isolate data by funder and by institute. And on the next slide, we take a look at the back end of the system. There's an administrative dashboard in the Dryad system; and again, I'd point to DataCite as a global access point for this type of metadata from a lot of different sources. In the dashboard, you can not only browse the data but export it and analyze it any way you like. At Dryad, we're offering the title of the dataset, the authors, the DOI, the funder, the funder ID, the award number, and then information about the lifecycle of the data and where it stands. Next slide, please. So I'll stop there and hope that I've made a connection between the grand ambitions and potential that we have here and a nice, steady, robust, organized data infrastructure to support open science. Thanks very much. I'll look forward to the discussion, and I'm now going to hand it over to Tim Vines.

Thank you. Yes, Dryad's amazing. So, at DataSeer we're tackling a different part of the open data infrastructure; let me get into some more information about that. Forward, please. Okay, so here's an organizing principle for the talk, which is cake. Next slide. What I mean by a piece of research being a slice of cake is that the scientific article is really just the icing, and underneath that there are many layers of different components, datasets, code objects, protocols, metadata, lab materials, all of which come together to underpin the article itself. Next slide. And here's a cartoon of the research process. Funders pay for the research, scientists do the research, scientists send the article to a journal. After peer review, the journal publishes the article, and in the end-of-grant report, scientists tell the funder about the article they published. And of course, in steps three to five, the layers of the cake have gone, and all that remains is the icing. Next slide. This is a problem from a resourcing point of view, because per piece of research, funders are probably spending about $200,000 to $500,000 on salaries, overheads, lab equipment, materials, you name it; it's an expensive procedure, doing research. And if most of the outputs of that research never become public, then we are not making the best use of those public funds. Next slide. Those missing outputs are also detrimental to science.
We can't verify the results of the article without access to the datasets and the code used to analyze those datasets. We also can't repeat the research from scratch if we don't have the list of lab materials used and the precise protocols that were used to create the results. And we can't find the key outputs without metadata saying what links to what and what relates to what. Okay, next slide. And so the solution really is to ensure that all the research outputs become public; that is, the cake itself is published, rather than just the icing. Next slide. Okay, so why now? Why do we really care about this now? Probably everyone in this session is aware that the Nelson memo from 2022 had a profound effect on the policy landscape in North America. In particular, it says that scientific data underlying peer-reviewed publications should be made publicly accessible by default at the time of publication. So that means the work of ensuring that the data are available has to be done before publication, and that points to publishers. We also need to monitor data sharing across articles: the agencies are tasked with reporting back to OSTP on how we did. Next slide. And lest any of your institutions think that you're not involved, you are. When it comes to NIH funding in particular, the data management and sharing plan, as of January of last year, is now part of the grant. That means the institution is effectively promising to ensure that the authors share the datasets they lay out in the data management and sharing plan upon publication of the article. And of course that is a huge compliance issue for the institution, because these are NIH contracts, federal contracts, and there may be consequences for the institution if the datasets are not shared when the articles are published. Okay, so let's move on. Next slide. So what does DataSeer do? Where do we come into this? We offer a set of services. For example, here are open science metrics for a corpus of articles. We help the stakeholder understand where they are and where they're going, with questions like: did the policy we brought in last year have any effect? Or: we're thinking about bringing in open science policies; where are we now? What do our authors generally do with their data? Do they put it in supplemental material? Do they tend to put it in an online repository? Or do they do nothing at all? Next slide. One project you may have heard about is our Open Science Indicators, which we co-create with PLOS. I recommend you listen to Iain Hrynaszkiewicz tomorrow morning; he'll also be talking about the Open Science Indicators. And it leads to all sorts of insights. Next slide, please. Here's an example we created for a different client. They wanted to know what proportion of their authors share data online. So we picked up a bunch of research articles, in the left-hand doughnut. Most of these generated or reused data. About a quarter of them didn't share data. Another quarter or so put the data in the supplemental material. And over half actually shared their data in an online repository, which is amazing; this is actually the best result we've ever seen for this. And then we look at code: did the authors use command-line software? If they didn't use command-line software, then they have no code to share. If they did have command-line software, about half of them didn't share the code.
Very few of them put it in the supplemental material, and about 30% put their code objects in an online repository, typically GitHub or Zenodo. Okay, so this is the kind of insight that comes from our open science metrics. Next slide. The second, highly related service builds on what I was just showing you: column L is the proportion of articles that share data online or somewhere, and that's 100% within the red box here. Next slide. We can also reframe these metrics to ask, instead of "what proportion of the authors did behavior X," the question "for this article, what proportion of the open science behaviors that the authors could have shown did they actually show?" So these authors here in the first row did a great job: they shared their data, they put it in an online repository, they included the DOI URL, they put it on Zenodo, and they also shared their code, which is a bit further over. So they did really well. Next slide. These authors down here did a much worse job. They claimed to have shared their data in the supplemental material; if you look at their data accessibility statement, they say, oh, it's all in the supplemental material. There's actually nothing in the supplemental material. So they have done a much worse job of their open science. Next slide. And so what we can do is separate the open science sheep from the open science goats with a sort of screening service, where you send us the article and we say, okay, these authors have more work to do around open science. This would allow, for example, a publisher to triage articles as they arrive: these ones need a bit more intervention to make sure they're doing a good enough job of open science, whereas these ones can just carry on into peer review. That's the purpose of our open science snapshot. Next slide. Then we get into our third service, which is much more meaty. This is really where we are aiming for high levels of compliance with an existing policy. In this case, what we do is produce a catalog of the datasets and other outputs associated with an article. So here there are five datasets that the authors have generated, rows one through five, that they have not shared anywhere. And so we're telling these authors: look, you've got these five datasets, an RNA-seq alignment and variant data, that you have not shared; you need to put these in a repository. And in the lower four lines, they've also reused existing datasets, but they've only cited the accompanying article; they have not provided a PID for the dataset itself. So we're saying, okay, you need to ensure that we can actually go right to this dataset. Next slide. This is very effective. We've been doing this work with Aligning Science Across Parkinson's; we have a bunch of other clients for this, but in particular, Aligning Science Across Parkinson's asks the authors to send us the first version of the article, we produce the report, and then when we see the second version of the article, the compliance rate is way higher. This came out in PLOS Computational Biology earlier this year. So it really, really works to tell authors what they need to do. And more broadly, it jumps across the implementation gap between the general policy formulated by a stakeholder, "you must share your data," and the actions required for an individual manuscript.
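A toy sketch of the per-article scoring Tim described a moment ago: of the open science behaviors that could apply to a given article, what fraction did the authors actually show? The checklist and records here are hypothetical stand-ins for DataSeer's actual pipeline, which detects these signals from the manuscript text.

```python
# Toy per-article openness score: the fraction of *applicable* open science
# behaviors the authors actually showed. Hypothetical stand-in for DataSeer's
# real pipeline.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Article:
    generated_data: bool    # did the study generate new data?
    shared_data_repo: bool  # data deposited in an online repository?
    used_code: bool         # command-line analysis, so code exists to share?
    shared_code_repo: bool  # code deposited, e.g. on GitHub or Zenodo?

def openness_score(a: Article) -> Optional[float]:
    applicable, shown = 0, 0
    if a.generated_data:
        applicable += 1
        shown += a.shared_data_repo
    if a.used_code:
        applicable += 1
        shown += a.shared_code_repo
    if applicable == 0:
        return None  # nothing to share, nothing to score
    return shown / applicable

# The "sheep" share everything they could; the "goats" share nothing.
print(openness_score(Article(True, True, True, True)))    # 1.0
print(openness_score(Article(True, False, True, False)))  # 0.0
```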
And that takes the problem from being a PI-level problem, where they have to sit and work out how the various open science policies that affect them are actionable for this particular article, all the way to being a postdoc- or postgrad-level problem, where they are given a list of actions and told: please go ahead and share these datasets on the following repositories. And that makes it way easier to promote compliance, because the authors then know what they need to do, and they also know that the stakeholder knows what they need to do. So it's much harder for them to say, oh, I didn't know what to do, because it's very clear when they've done what they needed to do. Okay, that's it for me. Thank you. I'm going to hand it over to Ishwar.

Thank you, Tim. And thanks again for the opportunity to share about infrastructure efforts to support open science at the NIH. The next slide gives you a little peek, literally, as Amanda said, and showcases that open science infrastructure cuts across several NIH initiatives that go toward the implementation of the NIH Strategic Plan for Data Science. They're essentially intended to better enable implementation of the policy related to public access for federally funded digital objects that Tim just alluded to in terms of the Nelson memo, through effective adoption of best practices in data management and sharing throughout the research lifecycle. I obviously can't go through all of these, but I'll touch on a couple to highlight them. The next slide showcases one of our flagship projects, called STRIDES, which stands for Science and Technology Research Infrastructure for Discovery, Experimentation, and Sustainability. It is essentially a partnership with commercial cloud services to allow NIH researchers to access modern storage and computing resources. And in addition to providing modern storage and compute infrastructure, one of the goals over time is really to enrich the Registry of Open Data, as demonstrated, if you just hit next to the next slide, in the STRIDES partnership with AWS, Amazon Web Services, and its growing Registry of Open Data collection. The next slide showcases the opportunity for us to break down silos across NIH-supported systems through the NIH Cloud Platform Interoperability effort and the Researcher Auth Service, both programs that help streamline access to NIH-funded data assets, so that users can log on to a system, more easily search within and across systems, and analyze across platforms. The next slide shows you a glimpse of the Generalist Repository Ecosystem Initiative, which Jen beautifully summarized as an opportunity for both social and technical collaboration among leaders in the generalist repository space, with the hope that NIH-funded research data will be more discoverable, and hopefully more reusable as well, given that researchers would share appropriate data at the right time and in the right place. The next slide builds on that, with the hope that some of the tenets established by the generalist repositories can be extended to domain-specific repositories; this is a program supporting the data management and sharing policy through targeted funding of repository and knowledge base infrastructure.
These are targeted program initiatives that bring unique value propositions to both new and established data resources, all going toward efforts to modernize the landscape of NIH-funded data resources. The next slide recognizes the important role of software and research software tools in advancing biomedical open science; we have efforts that bring in research software engineering best practices and encourage open development of these resources, so that the software developed in one project can be reused in other projects across the board. The next slide talks about NIH's role in participating in the NAIRR pilot, the National Artificial Intelligence Research Resource, a concept that brings together various agencies and groups to establish a national infrastructure to connect both resources and researchers. In fact, everyone can engage in this: there's a survey, open through March 31st, to provide researcher and educator use cases that will offer insights to shape the trajectory of this project. This is a demonstration of our partnership with other agencies as well. And the next slide, which I think is the last slide, talks about our role in the Big Data Interagency Working Group, stood up under the Networking and Information Technology Research and Development (NITRD) program. One of our goals is to update the strategic plan for the Big Data Interagency Working Group to address some of the gaps in the plumbing needs for infrastructure, so stay tuned for more updates on the strategic plan. And the last slide simply gives you access to more information on all of these projects, which is available on our website; you can also reach out to me to learn more. Passing it back to you, Nici.

Wonderful. Thank you so much to all of the speakers for those great presentations; those were some really interesting things that you brought up. At this point, we'd like to open up some dialogue across the speakers. I'm going to stop sharing the slides; I have one last slide just to say that we're getting into question and answer time. There are a couple of questions in the Q&A already, so we'll definitely address those live. I have a couple of prompts that we're going to get started with, and we'll feed in the Q&A after we get through a couple of these prepped questions. So feel free to put your questions in the Q&A. The first question, and we'll go around the panel: even though you touched on these things throughout your presentations, I'd love to hear more about specific challenges or opportunities that you're seeing in what could be done, or is happening, to advance open research practices. Maybe I'll start with Jen, since she's right next to me, and we can pass it around to hear your high-level thoughts.

Sure, thanks, Nici. First, I'm optimistic about the progress that we've already made and the progress that we will make. I think we have a powerful precedent in open access, broadly speaking, to build on and learn from as we expand into other topics in open science. We've learned some things that work and some things that don't, and we've seen change over a couple of decades. So I see optimism there. From the Dryad perspective, a singular challenge for us is resourcing, frankly.
We're a small nonprofit organization in a very busy, competitive space, and if the community and funders want to see nonprofits and open source initiatives thrive, then we need that investment. I'm reminded of the community that Ishwar mentioned a moment ago, the specialist repositories funded by NIH that are newer compared to some of the folks that have been in the wild for longer. And I'm really wondering how we can all collaborate so that we're not all out there alone trying to get money from Ishwar, for example. What economies can we create in collaborating and supporting one another to thrive in a very busy world?

Yeah, that's a really good point. And I don't know if you, or maybe you and Ishwar together, want to talk a little more about the co-opetition concept that is brought up through the Generalist Repository Ecosystem Initiative, because it gets at your point about the places where there are economies and opportunities to collaborate and really catapult some of that investment.

I can add something to that, Nici. In terms of the challenge there: even though the session is on plumbing and infrastructure, and we're looking under the hood at what's available and how people can leverage it, I think the biggest challenge with GREI is really the collaboration aspect. It's really the teaming aspect, the mindset change: how do you think beyond your own needs in terms of the larger, greater good? So the biggest challenge is helping people overcome that barrier and burden. And the flip side of that is really the opportunity: making them realize what's in it for them while they're collaborating and contributing. So those are the two sides of the coin there. Just alluding to the co-opetition concept: it's really a novel concept that I think the generalist repositories are successfully working through, and now they're into their third year of really looking into what each of them brings. In fact, in the domain-specific world, we're trying to coin these as asks and offers. What does somebody have to ask of others, in terms of challenges they have, and what do they have to offer, in terms of challenges they have solved? How do you put these on the table, and how can you leverage each other's asks and offers? Hopefully they're complementary. And if there are common asks, then how do we as a community address and work on these together? That's what builds, and will hopefully mature, this concept of co-opetition. Jen, I don't know if you want to add something more to that.

You know I do, but I'll be super brief. I've just really gotten into this concept. I studied it in business school, so I knew it was out there, but the social experience, building the trust and the dialogue with the other systems, has been really fundamental to opening doors and getting things done. And I want to highlight that I really respect and appreciate NIH's pioneering this in our space. I'm coming to see other examples in adjacent spaces that I think are also exemplary and that we can learn from. The Eclipse Foundation is one: there's a working group there where the major software companies of the world have collaborated on common code for cars.
So if we can all work from a common code base for basic operations, how might we better spend our time building value and differentiating? I'll stop there. I'm pretty excited about the idea.

Yeah, go ahead, Tim.

Yeah, so there are really two things that I'm very optimistic about. One is that I think a lot of the slowness of movement toward open science has come from a lack of solid metrics, because if we can't get a sense of the scale of the problem, then we can't really take any action, or it's hard to convince people to take action. So, gathering metrics; it's not a coincidence that this is what we do, because we think it's really important in order to motivate action. The other problem that we really see is a fundamental misalignment between values and capabilities in open science. What I mean by that is that publishers have this incredible workflow called peer review, which is designed to improve manuscripts until they're ready for publication. You can hold up a manuscript in peer review until the authors have done what you asked them to do. So this is the ideal venue for ensuring that open science happens: you can say, okay, you have to share this dataset there, that dataset there, and once you've done that, then you can go on to be published. You have the moment of attention from authors; they are strongly motivated to comply. But publishers don't see any value in open science. There's no economic value in open science for them, because if you publish an article that is fully reproducible and shares all its data, it gathers the same APC or the same subscription revenue as one that shares nothing. So there's no money to be made right now for publishers from open science, and that limits their willingness to really get into this, because who's going to pay for it? And then on the other side, there are funding agencies and institutions who are putting money toward research, like I said, $200,000 to $500,000 per piece of research. The extra investment required to ensure that all of the outputs associated with a piece of research are shared and the whole thing is reproducible is a tiny fraction of the money that's been spent already. However, they never really get to see or hear about articles until long after they're published, so they don't have any ability to have a dialogue with the authors while the article is in peer review. And I think this is the fundamental piece of plumbing that needs to be connected: during peer review, the publisher says, okay, clearly this is an NIH-funded article; you are therefore affected by this policy; we'll help you share your datasets; and then we will tell the NIH we did this. Maybe there's a fee for that, or maybe it's tacked onto the APC, but the ability for journals to tell authors, look, you're also affected by policies from these other stakeholders, and we're going to help you comply with them and then demonstrate your compliance back to your stakeholders: that's a huge service that needs to happen, and that's going to join all these incentives together. Finished.

That was great. Thanks, Tim. Amanda, do you want to chime in?

Oh, so much. So, you know, ROR is obviously a bit different from some of these other initiatives. Certainly the greatest challenge we're facing is also one we're optimistic about, in that we're seeing increasing ROR adoption all the time.
ROR does tend to work a little bit like telephones: the more people have them, the more valuable they are. And so we've seen great adoption, but we're still waiting for more, especially from large publishers. From our perspective, we really want all of this great tracking and discovery to be possible, but we need a bit more adoption of ROR to make that happen. And such a fascinating discussion about the co-opetition model, because I have found that really tremendous as well. With some of the larger scholarly publishers, there are, I think, free market forces working to drive ROR adoption, in that interoperability with certain systems is a competitive advantage. And systems that need a free identifier need to build on free infrastructure: free as in no cost as well as free as in open. What we see is that you can't really use proprietary metadata to create a genuinely interoperable system. You can't do it. You've got copper pipes in one part of the house and PVC in another, and they don't fit together; they all need to connect to be really useful. But the co-opetition model has sped up ROR adoption specifically among those systems. Some had done it already, some are in the process of doing it, some did it in the course of the ROR task group, and I think they all would have gotten to it eventually, but they all agreed that this was something of value to them, a kind of basic step they could take. So I did love that. And then I just think it will be interesting to see, really honestly, especially in the next 12 to 18 months, what happens in the commercial sector with ROR adoption specifically, because I do think there's a lot of movement there. Anyway.

That was great, thank you. Yeah, and the policies coming out over the next months and year or two will also help, I think, encourage some good-practice adoption. We've got three Q&A questions, so I thought we could take those for the last few minutes that we have. One is clearly for Tim; it's a question about DataSeer. DataSeer looks at the layers of data under the icing; is it looking past just data, at things like protocols, and how does that relate to clinical trial data?

Yes, yes. So we are just testing and are close to releasing a preregistration metric: did this article preregister itself, either with the Open Science Framework or with something like a clinical trial registry? So that's a new metric coming along. We've also got a metric on the use of protocols, which is a much more general thing, maybe life sciences oriented: use of protocols on protocols.io, or pointing to a protocols manuscript. So we've got these two interrelated metrics that we are close to launching as part of the PLOS Open Science Indicators, and these will be available to other clients too.

That's great. There are two more, so we'll just run through these and see who might want to chime in. The first one asks: if there are challenges with reproducibility, why aren't there more metrics for this? I'm generalizing the question, but I feel like that's something you started to talk about, Tim.
And I think this is an interesting one; I can say that through the GREI activities with Jen and Ishwar, this is something that is coming up. So, definitely more metric tracking, but I want to hear if you all have specific thoughts on this.

I have many, and I'm going to try to keep them very short. Reproducibility is an article-level phenomenon: can we get to the results in the article? And that means we have to ensure that all of the relevant datasets are together. However, the über-metric of "is this FAIR" is, to my mind, best gotten at by asking: did anyone reuse this? "Is this reusable" is kind of an abstract question; did people reuse it? And of course you're blending in usefulness: is this dataset useful to the community? If it's useful but not reusable, it won't get reused. If it's useful and reusable, we will be able to monitor reuse. However, right now, as Jen is painfully aware, it's quite hard to track reuse, because authors typically cite the article when they've reused data, and not the dataset. So we at DataSeer are actually developing a set of metrics where we go and track down these quasi data citations, where they say something like, "we downloaded the data set from [Author] et al. 2009." Clearly they're using the data, but all they're doing is citing the article itself, and we need to capture this kind of reuse. From our experience, about 50 to 60% of reuse takes this form. So if we can capture that, then we can get better at monitoring reuse, and therefore at capturing reproducibility and FAIRness.

Thank you very much. Please, go ahead.

The only thing I would add is that from NIH's perspective, we have an emphasis on metrics, we have an emphasis on the reusability of data, and we have efforts toward encouraging people to reuse data, demonstrate that reuse, and share some of the challenges they have found. We want to get to where data resources are tracking metrics, and where metrics go beyond just counts to actually demonstrate the scientific impact of the data itself.

Thank you. Jen, what were you going to add?

I would like to add that Nici should contribute to this as well. Tell me if you want to or not, but I'll set you up and say that the Center for Open Science has been a pioneer in reproducibility studies. I'm not as familiar with the psychology work, but that was a foundation for work done in cancer biology in partnership with eLife, where I was before, and it was about testing the reproducibility of work reported in those studies. So Nici, you're best positioned to talk about what work you may have done at the data level, though I realize moderating and speaking at the same time is very, very challenging.

It is. Thank you very much for that setup. I wish there were more time to go into depth on that. The studies that my colleagues at COS do really evaluate some of these things, and it's not only at the metric level; it's going deep into understanding how reproducible certain cohorts of studies are, and why, and what those challenges are. A lot of it is similar to what Tim has shared: it has a lot to do with sharing all the layers underneath the icing of the cake, and that is something we're just not good at yet. I think we're trying to get there, and as we support early career researchers in taking on those practices, those things will improve.
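To make Tim's quasi data citation point concrete, here is a toy sketch of the kind of pattern matching involved. Real systems like DataSeer's use far more sophisticated text mining; the phrasing and the author name below are invented for illustration.

```python
# Toy detector for "quasi data citations": prose that credits the article
# rather than the dataset. Illustrative only; the example sentence is invented.
import re

PATTERN = re.compile(
    r"(?:downloaded|obtained|retrieved)\s+(?:the\s+)?data\s?sets?\s+from\s+"
    r"([A-Za-z][\w-]*(?:\s+et\s+al\.?)?,?\s+\(?\d{4}\)?)",
    re.IGNORECASE,
)

text = "We downloaded the data set from Smith et al. 2009 and reanalyzed it."
for match in PATTERN.finditer(text):
    # A hit suggests data reuse with no dataset PID, only an article citation.
    print("possible data reuse, cited as:", match.group(1))
```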
There's lots of nuance to that relative to quality: once you start doing the action, how good is what you're sharing, and how robust are those things? So that's a very good call-out. We're past the time for the end of the session. We have one last question that we'll go ahead and answer, but I just want to thank everyone who has joined us for the past 50 minutes or so. We really appreciate your attendance and your engagement with the speakers and the questions, and we won't keep you, because you only have a few minutes until the next session. For those who stay, the last question is a really interesting one about a study that came out looking at the DOIs for research papers, and probably some exploration of data as well: it's about the preservation of this material, and whether we're at risk in how we're going to scale and support preservation as more open access articles and data sharing occur. This really does get at the plumbing, I think. So I'd love to hear from the speakers if you have thoughts on this.

Sure, I'll speak to this. This was done by my colleague Martin Eve at Crossref; it's wonderful work. I'm not implying that the question is wrong in any way, but I do want to make it clear, for people who are not familiar with the work, that it does not mean that works have disappeared. (Sorry about the thumbs down, I didn't mean to do that.) And it doesn't mean that DOIs are not resolving. What it does mean, which is a matter of concern, is that especially among smaller publishers, the work is not properly backed up. Now, again, not to detract from Martin's work at all, but "properly backed up" is quite a high bar, to be honest: it means deposited in a recognized archive with scholarly archival standards. I could go into more about this, but I think one of the things you asked is, how could this happen, and, implicitly, what are we going to do about it? How it can happen is honestly a matter of resources. When a DOI resolves and the link works, a smaller publisher in particular may be relying, for instance, on just their web host, which probably has a regular database backup, so the material can be retrieved. We transfer DOIs all the time; perhaps that small publisher gets acquired by another publisher, and the material changes domains, changes hosts, but still has a working link. So that kind of thing tends to be good enough for organizations that really aren't digging into properly backing things up with things like Portico or CLOCKSS or LOCKSS or even the Internet Archive. They may think, oh, the Internet Archive crawls my site, that's enough; it's really not a properly high bar for preservation. So, all that to be said, the headlines I think are a little more alarmist than even the actual work warrants. Now, the other thing is that Martin is doing some very exciting work at Crossref around this, and we've been having some internal discussions at Crossref about encouraging members to think more about proper preservation. It is part of the terms of Crossref membership that people take a look at how their material is backed up, but that has been just a sort of honor pledge, in a way. So this analysis is partly so we can ask: are Crossref members honoring what they agreed to in their membership terms, to make all this properly backed up?
So this study is actually a really great sign, in that it is the harbinger of great advances in preservation, I think, especially among Crossref members.

That was great; thank you, Amanda, for some inside info on what went down there. Any last words? I'm happy to hear from anyone else on preservation or on topics generally, and then we'll wrap it up. Go ahead, Jen.

I would just add briefly that I'm reminded of the desirable characteristics of data repositories for federally funded research. I might not need to remind this audience, but there's some hard work that's been done, in the development of the policies we're facing now, to describe standards for repositories that include preservation, redundancy, use of DOIs and persistent identifiers, and other metadata. So within that context, I think there's an important foundation for us to stand on, and outside that context, an important precedent to take a look at and try to leverage.

Very good point, thank you. All right, I think we will go ahead and wrap it up. I just want to thank everyone for joining us, and thank the wonderful speakers for their time and effort in pulling together this amazing set of updates and share-outs about the work you're continuing to do. It's exciting to see some of the plumbing coming together to really catalyze and enable open research and the impact that we all want to see. So thank you all; have a great rest of your day.

Well organized, Nici, thank you. Thank you. Thanks everyone. Bye-bye.