So we're still waiting for some of our speakers to join, but in the meantime, I would just like to welcome all of you. I would first like to thank the Center for Open Science team for helping us with all the logistics of putting on this program. They deserve a lot of credit. And I would like to introduce the session. Well, first of all, myself. I'm Christopher Erdmann from SciLifeLab, where we provide research data infrastructure for the life sciences in Sweden. I'm also part of GO FAIR and the San Diego Supercomputer Center.

I am here to introduce a session on ongoing activities to advance open science at the U.S. federal agencies following the launch of the Year of Open Science. We've heard a lot about this, and this is our chance. When we were putting together this program, we were really excited to hear from some of the agencies, so NSF and NIH and DOE, and thankfully they responded to the call. And so we have Martin Halbert from the NSF, and I'll let Martin introduce himself later on, Susan Gregurick from the NIH, and Brian Hitson from DOE.

Just a little background: we're going to be talking about the activities involved in the Year of Open Science. That can range from strengthening open science policies to promoting incentives for open research practices. We'll also cover the public access plans, which have been mentioned in earlier talks as well. And just some logistics here. We'll hear from each speaker for about 15 minutes. We have the Q&A. We have the chat. I will be relaying their slides so you can follow along. Later we'll have Q&A, so it'll be our opportunity to ask them any questions that you may have. So without further ado, I will introduce my colleague Martin Halbert from the NSF. Please introduce yourself, Martin.

Thank you, Chris. Can you hear me okay?

I can hear you. Yeah.

Great. So my name is Martin Halbert.
I'm the science advisor for public access at the US National Science Foundation, and I'm really delighted to be here today to talk to you in this concluding plenary of what has been, I think, an amazing hybrid conference. I am going to go ahead and start my slides then. And Chris, can you give me a thumbs up that those are coming through?

Okay. Yeah, it's coming through.

Awesome. So this has been a great event, like I said, and I sat through all the sessions yesterday, although there were so many breakouts that obviously we can't attend all of them, but I learned a tremendous amount. I think this is a marvelous opportunity for us to reflect on the accomplishment of the federal Year of Open Science. And especially in this session, we want to talk a little bit about ongoing activities in this space and this transition, or pivot point, from the Year of Open Science in '23 towards this future state that we're calling a Future of Open Science, and what we mean by that.

So in this short session, what I'm going to do is give you just a little bit of context about the genesis of the Year of Open Science and our aims in it, a little bit of context about NSF and its participation in open science prior to the Year of Open Science in '23, and then a little bit about our current activities going forward and things that I'm looking forward to in coming months and years.

So just a word about the '23 Year of Open Science and what led to it. There were a lot of open science developments that agencies were undertaking in '22, the year before it. Certainly you're going to hear about some of these in this session: the NIH data management and sharing policy, which was then imminent, going into effect in the coming year, '23. You heard a bit yesterday from Chelle Gentemann about the NASA TOPS initiative that was brewing up in '22.
And at NSF we were busy announcing and competing this thing called the FAIROS RCN program, which I will talk about momentarily. Those projects took effect in '23, and we will talk a bit about them. All of these activities were really jump-started, supercharged or something, when the Nelson Memorandum came out August 22nd of '22. And as we all sort of realized, wow, '23 is going to be a really dynamic year with all of these activities going on, several of us in different agencies started thinking, well, what can we do to synergize? How can we take advantage of this moment and synergize all these activities in the best way possible?

And then as we talked about it, the idea that came to us was to identify 2023 as a Year of Open Science, especially for federal agencies that were seeking to catalyze efforts in this space, as we were all planning on posting our new public access plans called for by the Nelson Memorandum, along with all these other synergistic activities. We just thought that it was an opportune moment to make a sort of stake-in-the-ground statement about the importance of open science to our respective agencies. And that's what we ended up doing. So the Year of Open Science, I think, was a celebration and a statement about the priority and importance of open science policies and processes by the federal agencies that participated in it, which are the primary funders of research in the United States.

Well, okay, so we did the year. What comes next? We thought quite a bit about this in the course of '23 and realized that we wanted to make it clear that this wasn't the end of the story, but rather the beginning, and that we're now transitioning into a Future of Open Science, a period that we see as continuing our movement and momentum towards open science goals.
And the point I want to make here is that in some ways, the Year of Open Science was the easy part. It was the part about the sort of mom-and-apple-pie statements of support by agencies and research communities for open science principles and goals. The challenges that we're now facing, and that we're going to face going forward, are the more detailed ones of actually realizing those goals and implementing all these changes, these cultural practices that are so hard to inculcate in scientific communities and scientific practices. So my main point is that open science is going to require us to think about and deeply internalize this notion that it demands ongoing and continued change efforts. Nobody likes change, but it does lead to better things when you improve things. So that is kind of the core point about going forward and our ongoing activities.

But first, before I get to all those things, I'm going to give you a little bit of context. All of us, of course, have to have a gratuitous slide about our agencies, and in this upcoming one on NSF, let me not only give you the usual stats, but also a few points that people may not be aware of about NSF. While everybody presumably knows NSF, we're an independent federal agency, perhaps unusual among federal agencies in that we support all branches of science and engineering. We're not focused on large clusters of areas like health or energy, like my colleagues are going to talk about, but rather we address all areas of science. So we're spread pretty broadly. And you have some stats there about the number of awards that we make in a typical year: 12,000 awards. Typically we reach over 300,000 researchers with this funding and fund 2,000 educational institutions in a typical year, accounting for about 25% of the federal support for higher education research efforts.
Now, something that a lot of people don't know about NSF is in our original founding document by Vannevar Bush at the conclusion of World War II. Vannevar Bush was, of course, the head of research efforts in the wartime period, and he was asked to produce this report about where to go from there with the incredible results that science had produced in the Second World War. This led to the famous 1945 report, Science, the Endless Frontier, which incepted not only NSF, ultimately through a congressional act in 1950, but also, I would argue, our fundamental conception of the modern publicly funded research landscape.

Something that a lot of people don't know, when you go through the original report, and I would direct you to the 75th anniversary edition of it that you can find online, publicly accessible, is that the very last appendix talks about what we would now term public access. And I think it's remarkable that they articulated it so well in 1945, right? In a period when they're coming out of wartime, when scientific secrets were essential to security, winning the war, and so forth. But already then they could understand that public dissemination of publicly funded research was a basic public good and something that we have to move towards. So it is kind of in our DNA at NSF and something that we think a lot about.

I won't go through the background of the 2013 Holdren memorandum; most people are familiar with it. I think it's notable because it was the first federal commitment to the basic proposition that publicly funded research outputs should be publicly accessible. At NSF it resulted in our first public access policy in 2015, NSF document 15-52. It led to a wonderful collaboration with our partners at DOE, the Office of Scientific and Technical Information. And you're going to hear from Brian in just a moment.
I want to thank Brian up front for a wonderful collaboration of almost 10 years now that we've had with DOE in developing our joint infrastructure, the NSF Public Access Repository that DOE develops and runs for us. So I want to thank them. They've been fantastic collaborators. To say the obvious, if you're not familiar with how it works: in the course of annual reporting, NSF PIs deposit copies of either manuscripts or the version of record to satisfy the public access requirement directly in research.gov. It's integrated with par.nsf.gov, and it has been a strong backbone of our public access efforts ever since.

Now this thing has grown. Let me give you just some stats about it. This is the same growth information represented as both bar charts and numbers. Without delving into all these numbers a lot, just a couple of observations. There is a lag: while the bar chart is portrayed as publications associated with a given year, what you can already see, and what is exemplified by that 2023 column, is not that we have fewer publications by our researchers in '23, but rather that there is a lag time of up to a year in the reporting cycle before that material gets into the repository. So you sort of see these bars go up over the one-to-two-year period after the particular publication year.

Now the other thing I'll point out, and I can't see all the colors all that well in this chart, is that heretofore we've only required deposit of publications. That little dusting of red that you see there is data sets. We implemented in 2021 the capability, optionally, for researchers to deposit or make publicly accessible data sets in public access repositories and to register them through PAR, so slightly differently from the way that we do papers and publications. What we should anticipate as we go into '25, '26, and '27, after the Nelson Memorandum requirements go into effect, is a lot more red.
In fact, I was at a presentation yesterday by people that have done some analysis of it. What you would see if we had been requiring data sets in this period would be very large blocks of red, deposits each year of data sets up to about 115,000. So right now we've got just around a third of a million items in NSF PAR, and it is continuing to grow, probably now north of 60,000 items per year, and we anticipate that will grow significantly as we move into data sets and other format types.

I want to call out the importance of the NSTC Subcommittee on Open Science, or SOS. It has been a fantastic collaborative community for us in the federal space to work on these topics together. It is comprised of a large number of individuals, over 160 individual representatives of dozens of different federal agencies, has a lot of different working groups, and has really dynamically synergized our activities. NSF is a co-chair of the overall Subcommittee on Open Science and is actively involved in all its working groups, and it has really framed our thinking in collaborative ways through products like this thing I've called out on the right, a document that took us a while to come to consensus on but I think is important: Desirable Characteristics of Data Repositories for Federally Funded Research.

So this brings me up to the Nelson Memorandum. I again won't go through this, because everybody's familiar with it at this point: the major departure, the new updates to the public access mandates of the federal government in terms of zero embargo of both publications and data sets. Some things I will call out are that it does have a multi-phase series of rollouts in both '25 and '27, and we are right now working on several key implementation decisions on these things within the agency. I will note, if you read our Public Access Plan 2.0, NSF 23-104 is the document number and you can find it really easily on the internet, that equity concerns are a primary driver in our policy and
implementation thinking, in ways that I may mention here, but let me keep moving.

We've done a lot of community engagement on these topics of public access. We were at pains, because of the nature of NSF, to really hear from the communities that we serve about what their perspectives are as we go forward with this. We've held half a dozen well-attended major public engagement webinars and many, many smaller sessions. Also, you may have seen that, because of these equity concerns, we did a targeted RFI to invite public, anonymous or attributed, feedback on our PA 2.0 plan, with some particular focus on equity issues.

I said I would say a little bit about our FAIROS RCN program. This was our featured initiative as part of the Year of Open Science. It was the inaugural competition and a first-of-its-kind major NSF program focused on catalytic improvements directly on open science and the FAIR data guiding principles, hence the name FAIROS RCN, for research coordination networks. Another thing that was unusual about the program is that it represents a true cross-agency commitment to open science. All the NSF disciplinary directorates funded awards and made commitments in this space to disciplinary progress in open science, in the particular ways that make sense and are of consequence in particular disciplines. We funded ultimately multi-institutional national projects comprised of 28 separate individual institutional awards, a total of 12.5 million. We're now engaged in fostering the community of the FAIROS RCN projects to encourage synergistic collaboration and national leadership efforts by these projects in advancing open science. We learned a lot in doing this competition. We're analyzing the program with an eye on the future. While we can't talk about new competitions before they are released, please stay tuned for new announcements. We learned a lot and we thought a lot about how to best catalyze continued innovations and adoption of open science practices
in this space. I will rush through these last slides, Chris; sorry, I'm running over a little bit. Just a mention of some additional synergistic NSF programs: we fund a lot in these spaces. I do want to call out the various agencies that are collaborating in the Year of Open Science and our transition to a Future of Open Science.

Let me conclude with two slides. For the first, I'm indebted to the PI and participants of a project we funded, the INFORMATE project, which identified, I think, a really useful term, the global research infrastructure, that they have been using to frame their research activities. Let me point that out, and they have a lot to say about how the different elements of PIDs and metadata comprise this global research infrastructure of repositories and services and so forth. A point I want to make is about the aims in section four of the Nelson Memorandum, which have to do with the 2027 phase of implementation. What I realized the other day is that they are ultimately contributing to the creation of a global knowledge graph of science. You will hear from Susan about something called the Proto-OKN interagency collaboration, and you may very well have heard about the NAIRR, the National AI Research Resource pilot projects. For all of these, the devil is in the details of how good the quality control on our metadata and our efforts to identify research outputs through PIDs are. NSF and DOE are now actively exploring possibilities in this space.

So let me conclude with this slide about what I think are three characteristics of the Future of Open Science. If people ask us what the Future of Open Science means, well, I think it means active collaboration between funding agencies specifically to advance open science policies and practices; practical and detailed endeavors to advance open science in particular disciplinary contexts; and then finally, a focus on the changes of culture of scientific research to realize the benefits of open
science and thereby improve our collective community ability to undertake research. Thank you. Sorry I ran over a couple minutes, Chris.

No, perfectly fine. There are two questions actually waiting for you in the Q&A, some clarifying questions, so if you can go there.

Do we want to do questions as we go along, or do we want to do all the presentations and then go to Q&A?

Yeah, we'll do all the presentations and then we'll get to questions, but it looks like a question you can answer.

Will do.

Next up we have Susan Gregurick from the NIH. Hey, Susan. So I will be presenting her slides. Let me just share my screen and then hand it over to you, Susan. So hopefully this will work here. I have your slides queued up, I think.

No problem. If it doesn't, I can jump in. Did you test it yourself?

Oh no, I didn't test it, but I'm hoping this is very similar to the others. Here we go, they're coming up.

Lovely, they look great.

Yeah, so I'll mute myself and you can start.

So thank you, and thanks to Chris and to Brian and to Martin. I'm super happy that I'm able to join today, and it's so nice to see everyone as well. What a great opportunity to talk about open science. This has just been a fabulous year, there's been so much progress across the federal agencies, and I loved hearing from you, Martin. So let me pick up the ball and tell you a little bit about what's happening at NIH.

If we can go to the next slide, this is our canonical mission statement slide. It won't surprise you to hear that our focus is on health; it's in our title. We are the largest public funder of biomedical and behavioral research. We are a driving force behind the decades of advancements that improve health, revolutionize science, and serve our society. Our work isn't finished when we create and deliver new scientific discoveries; really, our work is finished when all of our citizens are living long and healthy lives. Again, health is in our title; it's our bread and butter. Our
research encompasses the laboratory, the clinic, and the community. We have made immense progress in accelerating scientific methods, such as new data analytics, and data applied to everyone, including communities and patients. But really, making that data available, and making the information about the data available in a way that constitutes open science, is the next big leap, and it fuels all of the next scientific discoveries and artificial intelligence and other technologies.

So the next slide just tells you a little bit about our NIH strategic plan and our global objectives. You can see that there are three big areas: in research, from foundational science to treatments and innovations and cures; research capacity, workforce, and infrastructure development; and stewardship and partnership. Then there are a number of cross-cutting themes, and I'm over there on the very far right, in data science. Just an FYI that not only do we have the NIH strategic plan, we also have an NIH data science strategic plan. The RFI has just closed and we are now finalizing that plan. You should expect to see that certainly by this summer, and I'm really excited to showcase that new plan, but can't do that today.

So let me go to the next slide and tell you about what I can talk about today, which is what we have been doing over the past five years to really support this idea and concept of open science, and in particular data science. I'll talk about the Generalist Repository Ecosystem Initiative and support for repositories and knowledge bases across the NIH. I'll cover very quickly FAIR software and software sharing guidelines, and then talk about the work we're doing in stewardship and sustainability of data assets, including our data management center of excellence and some of the work we've done in partnerships with, for example, FASEB and DataCite. Those are just a few of the things I can talk about, but there's tons of work going on across NIH and across our 27 institutes, centers, and
offices, for which my office is a coordinating glue. We have a small budget, and with that budget we hope to make big changes.

So the next slide is, you know, why do we care so much about data? What's the value to the research community, and what's the value to science and to society in general? Well, it won't surprise you, you know this already, that it validates our research results. Making high-value data sets accessible fuels the next generation of discovery. We want to fuel the future of research directions and increase opportunities both for citation of data and for collaboration around data, and you'll see a lot of activities in terms of reusing existing data, improving data, and making it AI- and FAIR-ready, by imputation for example. We also want to do another thing with our ability to share data: we want to foster transparency and accountability to the taxpayers, and demonstrate that we have strong stewardship and sustainability plans for our data and data assets, that we are effectively and efficiently investing taxpayer dollars in data that really reap benefits for them. And of course we want to reap benefits for researchers and for research participants who are contributing their data to our clinical and observational outcomes. We want to support appropriate protections of research participants' data. So we have a lot of activities that we're doing to create this culture of data sharing.

The next slide is really the biggest change that we've made, which I'm sure you're all aware of: our data management and sharing policy, and the request that investigators submit a data management and sharing plan with their application for funding. This can be your R01, your P30, your contract, or your other transaction authority application, an OTA. So our data management and sharing policy is fairly broad across all of our research activities, intramural and extramural. It doesn't encompass infrastructure or training, of course. We've made a lot of progress; we're still learning about how
to effectively enhance data sharing through these data management and sharing plans, and you'll hear just a few of the activities that we're doing. Everything that I say after this slide is supporting this policy. This is our number one goal. If you see the new strategic plan for data science, it is the top goal, and it is a very prominent goal of my office, and not only my office but all of the institutes, centers, and offices at NIH. I just want to call out, and I'll try to point it out as we go along, the strong partnership we have with the National Library of Medicine as well as the Office of Extramural Research.

So the next slide tells you a little bit about just some of the activities we're doing to support open science and FAIR data at NIH. The first activity is kind of hidden there. It says "implement consistent capabilities," and then there's a NOT link. Really what that is is a supplement to the NIH data management and sharing policy, and it gives researchers some guidelines and best practices for selecting a repository to share their research results. We encourage investigators to select a data repository that exemplifies and aligns with the OSTP desirable characteristics for data repositories, including a repository that may be supported by NIH, by any federal agency, by the private sector, or even an institutional repository. Aligning with these desirable characteristics really promotes that concept of open science and FAIR data, and so I hope that you will look at that notice if you want to. We will also enhance our ability to search and discover NIH-funded data. There are a number of really cool activities happening, both at the National Library of Medicine, which will be, if you hear Dr.
Bertinelli's discussions, it will be the home of how you find and search and use data, as well as partnerships with NSF on the Open Knowledge Network. We want to enhance and conduct outreach and training for FAIR data practices, including helping reproducibility of research using data, and we want to foster communities to sustain research software. So you'll see on the far right a few activities that I'm going to highlight today that really exemplify the concept of open science and FAIR data at NIH.

Let me just go to the next slide, and here's how we're partnering with the National Library of Medicine. I just want to thank my colleague Maryam Zaringhalam from NLM, who prepared the slide. There are a number of really great opportunities to look at data and open science at NLM, for example the CDE repository. CDEs are a way to structure the question-and-answer format in clinical and observational research. There's a wonderful opportunity to weigh in on the concept of common data elements and the idea of aligning some of these for consistency with the USCDI+ standards. You'll see that is an RFI, for which I can put in the link, because I don't have it handy here on the slide. There are also other activities that we work on across NIH to support the NIH Comparative Genomics Resource, as well as work to modernize ClinicalTrials.gov.

What I want to highlight is in the next slide. It is led by NLM, but it's an NIH-wide activity: the trans-NIH Biomedical Informatics Coordinating Committee. Some of the outputs of the coordinating committee include governance for CDEs, but importantly they include resources that researchers can use to discover data sharing repositories and knowledge bases, and so there's a link there. If you get nothing else from my talk, maybe some of these links will be helpful, because if you're looking for how you might want to share your data, there are options for you, and this is a go-to place, as well as the resource called sharing.nih.gov from NIH, which will point you
to resources to share your data.

The next slide is what we're doing to provide sort of an ecosystem of data science activities in the repository world, and this is the Generalist Repository Ecosystem Initiative. We call it the GREI. Generalist repositories are an awesome place to share data, especially if there isn't a community-led data repository; for example, in my community it was the Protein Data Bank. If you don't have that community-established and community-led repository, this is an awesome way for you to comply with the concept of open science and data sharing. There are a number of options for you listed here at the top, but what we want to do is establish a common set of cohesive, consistent capabilities and services that span all of these generalist repositories, as well as social infrastructure, to really start the discussion of how you work together in a repository world. We also want to train researchers and help them adopt FAIR data practices and principles to better share and reuse their data.

I encourage you to look at this output of the GREI. We have done activities on common metadata fields. One of them that is important for the agency here is just being able to link your data to your grant. That would be an awesome thing for you and for us, to really show the impact of the data management and sharing policy. Of course we want to normalize and compare metrics across the repositories. We want to think about ways in which we can connect researchers to research outputs via PIDs; ORCID, for instance, is just one example. And finally, we want to increase the community of data sharing through teaching and learning materials. So this is just one of the activities that we are supporting.

The next one, on the next slide, is very specific funding opportunities for communities who want support for developing a new biomedical data repository and knowledge base. I just came back from a wonderful meeting at UCSF, and it turns out that there isn't really a good repository for bone and
cartilage and connective tissue data, so we just found a good opportunity here. And there's a funding opportunity for those who are very well established biomedical data repositories. I mentioned the PDB, but there are others, such as UniProt, which is funded by this initiative.

What are some of the objectives of our funding opportunities? Well, of course we want to support data repositories as repositories, acknowledging their importance, that they're core and vital assets to our biomedical research ecosystem. We want to encourage adoption of FAIR data practices, aligning to the OSTP desirable characteristics. We want to support the different stages of the biomedical data repository lifecycle, so we're managing these in a lifecycle approach. And what are some of the requirements for a researcher who might want to apply to these funding announcements? You really have to develop a scientific impact. The impact is dependent on the stage, but if you're an established repository, global impact is important. We really want to see you look at resource management practices, aligning with the resource lifecycle concept. We want you to engage your communities and your journals, and we want you to really think about what it means to have trustworthy governance, to have long-term preservation, and to think about policies of data access and privacy and ethics. So those are some funding announcements for you to think about.

The next slide is just a shout-out to our partnership with FASEB. FASEB has a wonderful program called DataWorks!, with an exclamation point. The prize is to recognize and reward leaders in data sharing and reuse and to create opportunities for the broader research community to really learn. We've funded this for two years now, and so we are up to a million dollars, but we're going to keep going. In fact, the next prize will reopen in May of 2024; that's just two months away. I don't have an exact date, but stay tuned. The next slide will just tell you a little bit about what we have
done. You can see the number of teams, the number of people, and the number of countries who all participated in the challenge. And if you just hit the button, there will be a link... just do next slide. Okay, yep, there's the next. Sorry, this one is animated; I didn't realize there were so many animations. I just want to highlight our last 2023 grand prize winner, which is the COVID-19 cancer catalyzing collaboration. It is a collaboration across 126 different cancer institutes in North America. It's the largest registry of its kind; it has almost 20,000 cases. What was really cool about this is that the primary submitter and team lead is, I think, a graduate student. So when I say this is for a broad community of participants, it really is a broad community, and I just want to acknowledge the awesome work that the team has done in the cancer consortium.

The next slide is how we are supporting our data repositories and knowledge bases to really embrace that concept of reusable, citable, FAIR data, and that is through our partnership with DataCite. DataCite gives us the ability to mint digital object identifiers as one kind of persistent identifier for digital data objects. Any institute, center, and office at NIH can be, and most are, consortium members under an umbrella consortium; we are the lead organization for this. Through that membership, their repositories can mint DOIs. We partner with OSTI, the Department of Energy's Office of Scientific and Technical Information. They serve as the connective glue between our institutes and our databases and DataCite, so they provide those support services so that we can actually accomplish the goals of making data more findable, accessible, and reusable.
The next slide, and I'm coming towards the end here, is just one that impacts most of you. This is for our NIH community and our internal community: our Data Management and Sharing Center of Excellence. It provides opportunities to create guidelines and best practices. It hosts a number of engaging events for the research community and training on data management and sharing practices and plans, and it also helps NIH internal staff to have tools to help assess those plans. Now, they're just tools and guidelines; the program managers do make the assessments themselves. Quickly, I'll just tell you that we have (next slide, because I know we're coming up on time) our best practices for sharing research software. It's an opportunity to hear about our goals for researchers who are developing software: to provide transparency and rigor and reproducible software, and to track the investments we have in NIH software by providing a consistent metadata set for software that you can share through GitHub or others. We certainly want you to share your code when possible, and we want others to be able to reuse your code. Stay tuned: there'll be two funding announcements that provide support for research software engineers as well as sustainable software that basically advances biomedical and behavioral research. And the last slide is my shout-out to my colleagues across the federal agencies, led by NSF: the Proto-OKN, the open knowledge network. It's an interconnected network of knowledge graphs that supports a broad range of application domains. Of course our goal is biomedical and behavioral research. We support the theme called the fabric, which is basically knitting all these knowledge graphs together into an interconnected fabric or knowledge graph that helps link our databases and our knowledge bases. So it's just been a fantastic opportunity to work with NSF, and my shout-out to Chaitan and his team. I can't wait to see what comes of the OKN. The last slide is my way of thanking you and hoping
that you'll stay in touch with us and engage with us in any different platform linked in twitter email we have a newsletter that goes out once a week you can hear more about what's happening in the institute centers and offices they're funding opportunities we have seminars so with that I'm going to turn it back over to you Chris and thank you again for the invitation yeah thank you and I think you have a question waiting for you in the q&a I thought there was another one but maybe Martin answered it for you but I think you have at least one question there that you can answer in the interim but we'll get to questions later I'll move on to Brian Hitson at the DOE if you wanted to introduce yourself Brian I I can share your slides if you okay I will find your slides and I will sorry I have to put these down there we go wait those are Susan's they might have been mine I just wanted to slide show mode when you find it there you go I'm still seeing hers actually interesting I'm seeing DOEs if you'll just put it in slideshow mode I don't know if you're seeing still seeing hers yeah I'm still seeing hers for some reason so let me I see yours Brian if others are seeing mine then it's a strange quirk I still see Susan's so I don't know Martin do you want to I know you had maybe you had the slides up or Brian if you want to try it who had them up before was that was that you Brian no I didn't have them up before so when you should be able to share do you see that green share screen button do you see that I just want to make sure that I have the file itself but if Chris's slides slides are visible to everyone he may not be able to see them but I can tell him I found the quirk so so yeah I found it out so I think I can share now with everyone so here we go sorry about that if you'll get a slideshow mode we can see them in a good presentation style so yeah ready are you I'm seeing it in a non slideshow mode can you get it into a slideshow mode no that's interesting another 
quirk. Yeah, I am. Martin, Martin, are you seeing them in the slideshow or a regular view? I was seeing it in a regular view, like you, Brian. Brian, I've got your slides, I could put... Yeah, I know. Martin, if you could. I don't know, there's some quirks happening here. So yeah, give me just one second, we'll deal with it. Thank you all for your accommodation. Okay, sorry, I just had to download them, Brian. No, this is... here we go. Are you seeing it now, Brian? Yeah, if you'll just put that in slideshow mode I'll be good. I think it's doing the same thing to me. Okay, hang on. There you go, there you go, it looks good for me. Okay, go for it. I'll put my camera... let's see here, if they can switch to my camera view that would be good. Martin looks better. I'm seeing you, Brian. Are you seeing me? Is everyone else seeing me? Yeah, we can see you, Brian. Okay, because I was seeing Martin, so I'd rather look at Martin than myself, so that'll work for me. Alright, hope everyone was patient with our technical glitches there. Very happy to be here and present the Department of Energy's progress in open science. First and foremost, I really do want to express appreciation and my pleasure at being with my colleagues from NIH and NSF, Susan and Martin, two fabulous agencies making such great progress, as you heard from them in their presentations, and I also appreciated their mentions of their touchpoints to our partnerships with them in a number of areas, so thank you for that. I also want to thank the Center for Open Science and NASA for pulling off this wonderful conference. Martin mentioned this earlier, but I've learned so much myself and really learned a lot from it, so thank you to all the organizers of that. Obviously, at a government-wide level, we couldn't have done so much of what we've done in public access and open science without the White House and the Office of Science and Technology Policy. Maryam Zaringhalam there has done such a great job coordinating interagency activities in that regard, and going even further back from Maryam, Dr.
Chris Marcum was such a great leader in helping to get the Nelson Memo finally issued, so I really appreciate the White House's leadership on this. And I would be really remiss if I didn't also acknowledge the leadership in the Department of Energy, all the way from the secretary to the undersecretaries to the Office of Science. Without their support we couldn't have done what we did. So much of the progress that DOE has made emanates from my organization, the Office of Scientific and Technical Information, OSTI, and so I really want to give a shout-out to my colleagues at OSTI who made so much of this progress possible. Thank you. Next slide. So I always like to start out with this kind of 30,000-foot view of DOE and our mission. DOE is a very complex agency with several key prongs to its mission. One of those prongs is how we invest in energy sciences of different kinds, and about $15 billion out of our total R&D budget is for the research and development purpose. And of course that money is allocated to us through the appropriations process to our multiple program offices within DOE, and then that funding is allocated out to our 17 national laboratories and to many hundreds of grantee institutions and universities. All of them are making major scientific breakthroughs and technology advancements and so forth, and their knowledge is recorded in the various forms of scientific and technical information that you see here: publications of different kinds, software, data, patents and so forth. About 50,000 such STI products annually are generated from that $15 billion R&D investment. Excuse me. Next slide please. And so I mentioned earlier why it's such a pleasure to be here with Susan and Martin and the trio of our three agencies. This pie chart shows that, out of the entire government, we three are generating close to 75% of the total journal article output from federal funding, and of course many, many other agencies are also contributing major shares to that. So DOE's in that third position
of the triumvirate here, but definitely contributing to that 75%. And I should acknowledge that articles are just one piece of open science and the research outputs; there are many other forms, software, data sets and so forth, but for the journal article this kind of shows you how the breakout of that occurs. Next slide please. So we've heard a lot about the Nelson Memo. Of course, that's on everyone's minds from 2022, but all the public access mandates really do emanate from that original public access mandate from OSTP in 2013, what was known as the Holdren Memo, laying out expectations for agencies to develop a public access plan about how they would make their scholarly publications more accessible and also increase access to their digital research data. And so, like with the Nelson Memo, the first thing we had to do was develop a public access plan which addressed those key components of the memo. In the realm of publications, we built a model around the author's submission of accepted manuscripts to us within 12 months of publication, and this is really kind of the green OA model, where they are able to submit the accepted manuscript to us. We do that through what's called the government purpose license, where we retain a license to the copyrighted version of the accepted manuscript and we have the authority to collect and distribute that for government purposes. We also allow for the gold open access route, where authors will pay a fee to get published. We don't prohibit that; that is an allowable cost. But our emphasis has been on that green OA model. Our model really builds in that we want to have very collegial and positive relations with the publishers. We have a voluntary participation of publishers component to our model. This largely operates through the CHORUS consortium, so where a publisher willingly provides access to a publisher's manuscript or the version of record, we will integrate that into the metadata of our overall collection and provide links
out to that in addition to what we obtain from our authors. And so we do all of this through the discovery tool DOE PAGES, which we established after the Holdren Memo. DOE PAGES stands for the Public Access Gateway for Energy and Science, and it's been a key way by which we provide access to these scholarly publications. So that kind of accounts for the publications model piece of it. On the data management side of it, and that was another big part of the Holdren Memo, most agencies like DOE have established data management plan requirements, which ask any funding proposal to describe how they're going to make data more accessible. So we've had that requirement in our original public access model for many years now. As we move into the Nelson Memo, that will shift into more of a data management and sharing plan, so some changes there. Next slide please. So the mission and the responsibility for providing public access to DOE's R&D results didn't just start with the Holdren Memo. The mission in DOE to provide access to our R&D results goes back many decades. Martin mentioned the Manhattan Project and some of the things that came out of that, but after the war, of course, the Atomic Energy Commission was set up, and there's always a piece of enabling legislation that lays out all the mission and responsibilities when a new agency is formed. And so there was very clear language there where the agency needed to provide access to its unclassified R&D results, where we're trying to turn a lot of that Manhattan Project research to peaceful purposes. That requirement has been reiterated over the years with each successive agency. So AEC turned into ERDA, the Energy Research and Development Administration, and then it turned into the Department of Energy; this was in 1977. And so each enabling legislation talks about the responsibility to provide public access to it, and other pieces of legislation have reiterated that, and the quote that I have here from the Energy Policy Act of 2005 talks about that. So
we've had this mission going all the way back to 1947, and we have this umbrella search tool called OSTI.gov which captures that from all the way back to the Manhattan Project, to '47, to the present, and so it has over 3 million Department of Energy research records spanning that period of time. What happened with the Holdren Memo (and NIH had a little bit of a leap on other agencies, in that they were actually getting into the journal article space prior to other agencies) is that the Holdren Memo really opened up authority for agencies to provide full text access to the journal articles that result from their federal funding. Previously we had only been providing metadata about that, but not the full text itself. And so again, we introduced DOE PAGES as our agency repository and discovery tool for finding that, and we've now achieved upwards of 200,000 articles in DOE PAGES. The next slide will show the bar chart of the actual year-over-year growth in the quantity of accepted manuscripts and journal articles in DOE PAGES. I mentioned on that pie chart earlier that we generate around 25,000 articles per year. We're not totally comprehensive; we don't get all the submissions from our researchers, and we're continually working to improve that. But the bar chart here shows that year over year we're roughly in that 25,000 range of additions each year to what is in DOE PAGES. Next slide please. So yesterday especially, in some of the sessions I've heard, we talked about what are some of the impacts, how can we measure the benefits of public access, and there are lots of different ways of doing that. So here I've just picked out one example which I think is kind of interesting.
This is an article from a journal, iScience, in 2023. It's a little bit busy on the visuals here, so I'll kind of explain it. On the upper visual you see sort of a dark blue line and a more faded gray line, and then that red vertical line in 2014, and that's when we really put the requirement on our national labs and our grantees to start submitting their accepted manuscripts to us so we can provide them through DOE PAGES. So you see that after 2014 our national labs really ramped up their efforts in submitting their accepted manuscripts to us. Not every lab is at the same level, but labs have reached a level of about 90% comprehensiveness in terms of the number of articles that they submit to us out of the total that they produce. So a very big level of compliance and success by our labs; we really appreciate their efforts. And then the lower visual shows how certain communities have benefited from those increased accessible articles from those 17 national laboratories.
Specifically, this is in the realm of when people file patent applications. Some key communities, like inventors and small firms (and these may be companies and others that file patents), don't have subscription access at their institutions; they're just private individuals and so forth. So these are really communities that have benefited greatly from, for example, those national labs' increased publications. And so two communities, inventors and small firms, are now citing those lab articles at 42% and 49% higher rates than they were before that 2014 vertical line. Again, these communities would not have had access to these publications had it not been for the public access mandate. So real evidence, I think, of how, especially in the area of technology and commercial innovation and so forth, public access is benefiting these communities from an economic standpoint. You'll note at the top of that lower visual that scientists have a 0% increase in the number of citations of these newly available articles. I don't totally know the answer to that, but my theory is that scientists are typically working at universities and laboratories and so forth that already had subscriptions to journals, and so they were more or less benefiting from having that kind of access already, and so these newly freely available articles didn't cause them to cite those articles at any higher rate after 2014 than they were before 2014. But again, the key point here is we're benefiting communities that haven't had access to it before.
Next slide please. So fast forward, and a lot has been said about the Nelson Memo, so I hope not to be too redundant, but maybe a little bit. Just like the Holdren Memo, it required agencies to develop a public access plan, but it had some key changes. It said a lot of positive things about the Holdren Memo, keep up all the great work that you're doing there, but it changed a number of key things. One, of course, is the elimination of the 12-month embargo, to go to zero embargo, to provide immediate access to those publications. And so our new public access plan has been approved and published. We got that out in June of 2023. We had submitted it over to OSTP in April, excuse me, in February of 2023, and we got approval from them in April, so really a big thanks to Maryam for helping us with that. Our model, in terms of what we're describing, our public access plan, talks about how we're going to achieve this immediate access to the publications and how we're going to maximize reuse and use rights in our publications to enable machine readability for them. On the data side of things, there's a piece in the memo that talks about providing immediate access to any data that's displayed in or underlying a publication, so we will be implementing a data management and sharing plan. So there's a revision from the previous DMP to now DMSP, where we expect our funding proposals to show us how they're going to achieve immediate access to those kinds of data, and also, for data that doesn't underlie a publication, we want them to address how they're going to make those kinds of data sets more accessible. And then the Nelson Memo, really for the first time, unlike the Holdren Memo, was really big on persistent identifiers, where agencies are going to integrate persistent identifiers as kind of that connective tissue, and Susan mentioned that in her presentation, so I'll talk more about that in a minute too. And the Nelson Memo wanted agencies to provide a forum, a channel by which the public can engage with agencies on their
plans, and so we have this very simple email, comments at osti.gov, where they can provide that to us. Next slide please. So this slide kind of backs off some of the formal requirements from OSTP to get more into the conceptual meaning of open science and what we're doing to really achieve open science in DOE. I've taken the liberty of lifting this visual from Wikipedia that kind of shows, in a visual sense, a definition of what open science means and gives you all the different components to it. I took the liberty of annotating this visual to some extent by putting in the yellow boxes there where DOE has a very heavy footprint in providing more access to open source software, to data, to publications, and then above that the tools that we're using to make those kinds of research outputs more accessible: DOE CODE for software, DOE Data Explorer for data, PAGES for publications, and then all of that is under the umbrella search tool that I mentioned, OSTI.gov. And then I also took the liberty of drawing this dotted line around open science called persistent identifiers. Several people have mentioned persistent identifiers, or PIDs, and it takes many different shapes, but another illustration I'll show here in a minute is really a good indication of the connective tissue that PIDs play across all this. Next slide please. Before I do that, I want to just kind of give a shout-out to the highlights that DOE contributed to the Year of Open Science. Of course Maryam mentioned yesterday the winners of the open science challenge, which are great; those are some really outstanding activities by other parties in promoting open science. Another part of the Year of Open Science was for agencies themselves to submit highlights that were mentioned at a certain frequency at open.science.gov, and so DOE contributed to that as our agency contribution. One of these is public reusable research data resources, and I want to give a big shout-out to my SC, the Office of Science,
colleague Dr. Michael Cook, who really led this effort, and it really is SC's way of giving a badge of honor to data repositories, knowledge bases, analysis platforms and some others that are just kind of like exemplars or centers of excellence for data stewardship, management and FAIR practice. So we really have kind of given the badge of honor to seven of these data resources and data activities. Some of these emanate from our DOE user facilities, some are from universities, but seven so far, and I'm sure others will be added to this. And then the content from those data resources is mostly discoverable in DOE Data Explorer. Next slide please. A second highlight that we mentioned, and I mentioned it in that open science illustration, is DOE CODE. DOE has probably, if not the largest, among the largest collections of scientific software, or software that comes from R&D efforts, that is made accessible to the public, because software is kind of the third leg of the stool, with publications and data, to really enable open science and to go beyond the publication, to drill down into data and software to really promote reproducibility. So we established DOE CODE as one place where you can come and find any kind of software. We have very active PID services, and I'll mention that in a second, where we assign PIDs, in this case digital object identifiers, to software to enable that linkage and discovery of it. Our big emphasis is on open source software, so out of 5,500 total projects, 87% of those are open source. When you go to DOE CODE you'll find sort of this metadata page that includes, among other metadata elements, links to where that software can be accessed. So in the case of the 87% where it's open source and the licenses are very liberal, you're able to go and use that software directly. If it's closed source software, then the link will give you advice on how you're able to access that software. Next slide please.
And there are only a couple of other slides to kind of close out my presentation, but just to emphasize the importance of PIDs, persistent identifiers, to promoting open science: that's kind of that connective tissue. And so DOE, through my organization OSTI, has taken a very aggressive stance in trying to advocate for PIDs and trying to get PIDs integrated into our metadata to really achieve this meaning of open science, where you're able to link from publications to data to software and so forth. And so we have services that we provide to our national laboratories, and to other agencies in fact, where we're assigning PIDs for data, for software, for text documents. We know publishers assign PIDs for articles, but PIDs typically haven't been assigned to things like technical reports or gray literature, so we do that through this service. The memo talks about requiring PIDs for awards, so we had a pilot project where we worked to do that, and I think that's going to go a long way toward our overall solution. We run the U.S. Government ORCID Consortium, where we're helping laboratories and others integrate PIDs into their workflows, and so PIDs for people are really becoming more prevalent. And also PIDs for organizations: we work with ROR and some of the others that assign PIDs for organizations, and we maintain our own authority, so that helps build accuracy into our metadata. There may be organizations with different names, but if they have their own PID, that reduces that ambiguity, so PIDs for organizations are a big focus of ours too. Next slide please.
So this is a very busy slide, and I'm sure one that many of you have experienced in your own research, but it just kind of shows how instrumental PIDs are in the research landscape in enabling open science. This is, for example, just a metadata page from OSTI.gov, or it could be from DOE PAGES, but it shows all the different places where we have PIDs, and you're able to then move out to that related research object or people or organizations. So here you have the PID for the publication, which the publisher has most likely assigned. This could also be where we're showing the accepted manuscript. For the authors, if they have ORCID iDs, you're able to link out and see the full complement of publications that they've produced over time. But most importantly here, I want to highlight that if there are data underlying this publication, or software that went into the research, OSTI has more often than not assigned the DOI to that. We integrate that into our metadata, and you're able to bounce right out to GitHub or other places where the software may be, or Figshare or the places where the data set may be, or an institutional data repository. So really this is open science in action. This is really where you're able to seamlessly move, and it really, really supports the reproducibility and transparency of the research. So this is... next slide please. This just kind of shows how we are making great progress, and we're really excited with the development of our new public access plan to contribute to OSTP's expectations. Thank you. Thanks, Brian, and thanks, Martin. Martin's been answering questions too as he's presenting the slides, so yeah, thank you again, Martin, and sorry for the snafu earlier. I think there's one question I'll look at, because you haven't had a chance to look at the questions. So Brian, I'll start off with one for you, but it can also be a question for all of you, which is: to what extent is the DOE, or you know, also NSF and NIH, funding AI research, and how much of it is classified
in the labs versus done at research institutions? Great question. I mentioned, when I showed that $15 billion in R&D investments, that it's sort of one prong of the multi-pronged mission of DOE, and so DOE has other missions, national security being a big one, and production activities that support the national security piece of it. And then there's research that goes into national security related efforts in terms of nonproliferation, and just things that aren't appropriate to be published for the general public. And so, you know, there are very, very clear processes within DOE. There are these risk matrices where you kind of put things in, you know, red, yellow and green as to what is appropriate to share. And so yes, we have very good processes for ensuring that that kind of content doesn't get out into this public access open science; that was kind of mentioned in one of yesterday's sessions. And the labs have a process whereby they run it through reviews to see if there are those kinds of risks, but also IP concerns and other kinds of concerns, privacy concerns. So we really try to make sure that everything that gets out there is fitting and going to benefit the broader open science community. Yeah, I don't know if Martin, you want to respond to that AI question, or Susan, if you have anything as well, but we can move on. I'm sorry, I was typing an answer to some of the things in there. Well, I'm sorry, could you repeat the question?
It's... let me see, but I can come back to it. I think it's down here, it was just around... yeah, here we go. To what extent at DOE, but maybe your agency as well: to what extent is DOE funding AI research, and how much of it is classified in labs versus done at research institutions? And I just thought... Huge topic. Yeah, NSF is making major investments. It's one of our most central investment areas, especially focused through the NAIRR, the National AI Research Resource pilot program that was announced in January. Do some Google searches; you'll find lots of references to that. While there are certainly research security concerns in some parts of AI research these days, the vast majority of it is not restricted. It's only when some of those research efforts become national security secrets or something like that. But 99.9% of NSF research is open. It is only a very small category of research areas that are constrained, where we will not share the information, notably things like quantum cryptography research, or biological research that can lead to dual-use biological weapons, that kind of thing. But the vast majority of this stuff is openly accessible. I can just chime in as well. We are participants in the National AI Research Resource by NSF. We are co-leads on what's called NAIRR Secure. NAIRR Secure is a secure way to access data and resources; it's a co-partnership with DOE in this case. And where we come into play is that we hold a number of patient and participant data enclaves, such as the N3C, the National COVID Cohort Collaborative. That is a secure enclave; you do need to have permissioning and controlled access. And so for anything in AI that has to deal with patient data particularly, we are working on controlled and secure platforms and access: not necessarily classified, but importantly controlled in high-security ways. Absolutely, personally identifiable information is another instance, a prime instance, of where data cannot and should not be shared. Thank you. So there is another question about our government
repos of publications working to include their publications in internet searches. Google Scholar is mentioned here, possibly as an access option alongside the journals, and I know that actually in Google Scholar, if you go into metrics, you can see all the different government agencies that are indexed. But Brian wants to go first, so Brian, you first. Well, not necessarily anything unique from NIH and NSF, but we're really good friends with Anurag Acharya at Google Scholar and work with him quite closely, and not just Google Scholar, but Google and Bing and others. We go out of our way to do as much as we can to enable indexing of our content, and so that really helps with the discovery of that. We learn a lot from them. We've had different models within DOE, centralized and distributed; you know, we have the laboratories and the universities. We have allowed for deposits in institutional repositories, where OSTI goes and links to those and in some cases harvests those for dark archive purposes. We've learned from the Google Scholars of the world that in some cases that's not satisfactory, because there may be firewall blocks or other kinds of reasons where they're not able to go index those fully. And so in our new plan we're going to try to work through some of those vagaries, and it wouldn't surprise me if we're going to expect to be harvesting those from institutional repositories so that they are available for the Google Scholars of the world to index more properly. It also sets the stage for us to perform analytics on them more easily. So that's the direction we're heading. Brian answered the question perfectly. The one thing I would add is we have a very collegial relationship with Anurag. He has been very helpful in having a lot of discussions on these topics and on exposure of our resources through Google. In fact, we hosted an interagency talk by Anurag just in the last couple of months to enable agencies to hear from him comprehensively. We really appreciate all the work
that he does. Just a quick shout-out to my colleagues at NLM: PubMed Central. It's a way to look at the articles and the full-text articles that are archived, not just from our funded investigators, but generally from many, many domains of biomedical, behavioral and clinical research. And just a note about a really cool application that NLM and NCBI developed called PubTator. It's a way to search for information and entities in over 6 million full-text articles in PMC. So this is an AI-based algorithm with which you can actually find information in indexed materials in PubMed. And what was that called again? It's really cool, it's called PubTator. I'll put a link to it. Yeah, PubTator. I have not heard that one from you before, Susan. You've got to check this out, it's pretty cool. There's something new every day. I guess the other question I had, just briefly, is that I follow your newsletter, Susan. All of you mentioned all these wonderful things you're working on, and we've mentioned Science.gov and other places where you can go and check, but I'm wondering if you have other resources that you want to provide here for where we can track what you're doing that are current, because all of us are interested in what you're doing. Yeah, sharing.nih.gov, if you're interested in open science and our policies. This is the best way to communicate with us, especially on what's required for grantees. It's a great way to look at our policies, our materials, our resources and all of our publication access. It's in addition to some of the wonderful resources at NLM and NCBI. I put our public access web page in the chat. It is pretty crowded, admittedly, with notifications and links to the various engagements, but that's the main place where, when we have something major, we post it. We also do, obviously, listserv postings to groups of researchers. The NSF Office of Legislative and Public Affairs maintains a number of listservs to different scientific communities, and one overall one that is the director's
listserv that goes to everybody. Anything major, we put notifications on. Yeah, and I think this goes to your question about monitoring and reporting and so forth. DOE kind of has a tale of two cities in terms of the ecosystems that we're doing research in. One is our 17 national laboratories, which is a relatively small number and a more centralized environment to be working with. And then there are the much more distributed grantees and so forth. And so we have good metric systems for sort of capturing how this lab is doing versus that lab and so forth. And so we deal with them one on one. You know, we're not trying to put up a wall of shame. They're all doing well, but I don't want to single any one of them out. So we kind of deal with them directly. And then we have separate processes where we're happy to sort of report in aggregate how DOE is doing, but at an institutional level, we probably will not be posting that. Thank you all. As usual, Martin's answering questions. There's one that we can answer. I know we can go a little bit over. If people want to stay on, we can try to answer some questions. But at least two of them are for you, Brian, if you can go into them. And maybe Martin and Susan can answer this general question, which I said we can answer live, which is: how are agencies working with other stakeholders to support compliance and minimize burdens on researchers? You want to start? Sure, absolutely. Of course, you can see that we're working with FASEB and DataWorks! They have a number of different activities, including data salons, which work with their communities on how to develop data management and sharing plans and practices and FAIR data. We've been doing quite a bit of seminars and webinars just generally, but we also partner with the Data Curation Network to provide training. And a lot of that training does happen with institutional librarians, because let me tell you, that might be your go-to place if you want to learn how to make your data open and
FAIR. So we've been partnering with societies and organizations that can broadly reach much larger communities. And of course, each institute, center, and office at NIH has its own unique community and is engaging those communities. Some institutes are providing guidelines and sometimes even templates to help produce a data management plan or to help researchers understand where and how to share data. So it is definitely a multi-pronged approach, from our office engaging broadly with communities and nodal points, as well as institutes who might be developing their own very unique dissemination materials for open and FAIR data and information. Just briefly, we do, as I mentioned, a lot of different public engagement efforts. In particular, we have an active and fairly regular webinar series with officials from offices of research and sponsored programs at institutions around the country, where we announce new procedures, mechanisms, compliance expectations, that kind of thing, to basically the respective individuals at university campuses that are responsible for monitoring that. So we could always do more, certainly. But as much time as we can devote to it, we also do public engagements that are open, come one, come all, and those are all recorded and up on our website that I put in the chat. And Chris, thank you for pointing out the question about the Federal Purpose License, or Government Purpose License, and explaining that a bit. So lawyers, as you can imagine, have been heavily involved in these kinds of discussions. And so while we enjoy a license to the accepted manuscript, and that gives us the authority to collect it and distribute it and so forth, that authority itself does not extend to someone in the public then taking that article and redistributing it or doing certain kinds of derivative works on it and so forth. Most of that is governed under fair use. So we know that the 2022 memo wanted agencies to continue to maximize
rights and reuse while still respecting copyright and so forth. So we're still looking at that, seeing how certain agreements with authors and so forth could be modified to maximize that. On the commercialization piece of it, I don't think there's any prohibition against somebody finding something on DOE PAGES and then running with that in terms of the different commercialization avenues that that could take. But we do recognize that in agencies' current practices, and it's not just a DOE thing but a government-wide thing, the limitations on reuse of certain things are a nut that still hasn't been cracked and something that OSTP and other agencies are working on talking through. For those that are interested in this somewhat arcane topic, I just got notification yesterday that there's a webinar that the University of California system and the Authors Alliance will be hosting on this topic. So you might contact them for information on that event. I believe that's in April. I saw that too. Thanks, Martin, for bringing it up. There's one more question, but I think that was answered, for DOE, Brian. It's more of a technical question. But I think we're closing out, and I wanted to thank all of you, really, for a great, informative session. Really appreciate it. I think, again, we were excited to see you say yes and agree to be here and present. From the team that put on this program, I just wanted to thank the Center for Open Science again. They're amazing. And particularly Katie Corker. She's our ambassador of Kwan. So I'm probably making her blush, but she has really just done an amazing job. So thank you all, and look forward to seeing you out there in open science. Thank you all for having me. Thanks, Chris.