Good afternoon. Welcome to the Chemical Sciences Roundtable webinar on open access and FAIR data. My name is Kay Wims and I'm the program assistant with the Board on Chemical Sciences and Technology at the National Academies of Sciences. CSR provides a neutral forum to advance the understanding of key issues related to chemistry and chemical engineering, as well as to promote thoughtful discussion between government, industry, academia, and the non-profit sector. CSR standing committee members Dr. Rob Maleczka, Dr. Jake Yeston, and Dr. Karen Wooley will be our moderators for today. Dr. Maleczka is a professor of chemistry at Michigan State University, Dr. Yeston is an editor at Science magazine, and Dr. Wooley is a distinguished professor of chemistry at Texas A&M University. This is the third webinar of 2023 on emerging chemistry topics. We launched this webinar series in early 2020, and all the webinars are available on the CSR website. Please see the website link in the chat. This webinar will provide an overview of open access and its impact on different groups, discuss the impact of the FAIR data principles, and highlight the opportunities and challenges of data management. The chemistry and chemical engineering community will be central to today's discussion. We will have a total of three presentations, and at the end of each we'll address one or two clarifying questions. All other questions will be addressed in our discussion time after the third presentation concludes. Please note that the chat feature has been disabled for Zoom audience participants. Your questions can be submitted via the Q&A button on Zoom, which is located in the bottom control panel. Our moderators will ask questions on behalf of the audience during the discussion portion of this webinar. 
Finally, I would like to invite you to join us for our upcoming workshop on October 9th through 10th on the future implications of open access and FAIR data practices for chemistry and chemical engineering publications. The workshop will be a hybrid event at the Keck Building in Washington, DC, and will expand in depth on the topics presented here today. You can register to attend the free event in person or online: use the QR code, the link in the chat, or visit the CSR's upcoming events page. With that, I would like to pass it over to Dr. Rob Maleczka to introduce our first speaker. Rob. Thank you, Kay. So our first speaker is Dr. Tashni Dubroy. Dr. Dubroy serves as the executive vice president and chief operating officer at Howard University in Washington, DC. She is simultaneously leading an overhaul that includes the improvement of Howard University's core network as part of a partnership with IBM, and is installing a new university-wide enterprise resource planning system. In its first year, the Dubroy administration effectively reversed six consecutive years of enrollment declines and yielded a 15% increase in new and returning students in 2015. She was named 2017 CEO of the Year in the Triangle region and to the 40 Under 40 excellence in leadership list by the Triangle Business Journal. Dr. Dubroy began her career as a research scientist at BASF, the world's largest chemical company. She quickly ascended to the position of global technology analyst, and after two years was appointed to serve as chemical procurement manager, where she managed a strategic sourcing budget of $35 million. Dr. Dubroy earned her PhD in physical organic chemistry from North Carolina State University in 2007 and holds an MBA from Rutgers University. 
Prior to her executive appointment, the Shaw University alumna served as special assistant to the president, chair of Shaw University's department of natural sciences and mathematics, and as an associate professor of chemistry. And with that, I will hand the Zoom over to Dr. Dubroy. Thank you so much, Rob. Good afternoon, everybody. I look forward to being able to share a few principles with you and a few best practices that Howard University is using as we transform into an R1 institution and as we respond to open access and the FAIR data principles that are quite upon us right now. I'm going to share my slides here if you give me one moment. Sorry, this slide show. So in order to give you a bit of context, I figured I would put up some information about Howard University and what it is that we've been up to since our founding in 1867. Currently, we have a student enrollment of about 13,000 students. We're tracking as an R2 university, but we are slated to become R1. We have coordinated via our office of research to develop an interdisciplinary effort to ensure that the university meets all the parameters to become an R1 institution. It turns out we used to have an R1 designation, but it was revoked at a time during which the criteria changed and we fell outside of the R1 band. So we're looking forward to being able to demonstrate that we are an institution that has high research activity once more. Our 14 schools and colleges include the Colleges of Medicine, Pharmacy, Dentistry, and Engineering, and we have an academic medical center and a hospital that is affiliated with the university as well. We share a consolidated balance sheet. And I'm telling you all of these stats just so that you can get an idea of the size of the institution. It turns out that our endowment is hovering at about $1 billion now, and that endowment is the largest among HBCUs, ahead of the institutions that follow closest behind it. 
I think the closest is now hovering around $500 million, and the second closest would be at around $350 million. There, here we go. Let's see why this isn't advancing. There we go. So here I'm showing what the FAIR guiding principles are. I think we are all aware of them by now: having data that is findable, accessible, interoperable, and reusable. We have about 100 HBCUs that are recognized by the federal government. I'm only using a few of their logos here, but they're largely concentrated in the South and the Southeast of the nation, and we have an HBCU in the US Virgin Islands as well. As you can imagine, with 100 HBCUs, they vary in size. They vary in the amount of budget that they're able to allocate to research. Some of our institutions are not research oriented and are largely teaching institutions, but all of them struggle for funding and all of them need open access services, especially when they don't have the resources to stand up any activities on their own or subscribe to major journals. As I mentioned, we are within the band of institutions, especially in the District, that are considered to have high or very high research activity. We rank among the 131 universities across the US, alongside five other universities in the District, able to maintain that Carnegie classification. Our strategy, Howard Forward, calls for us to regain that R1 designation by 2024, and we're tracking very well to be able to do so. I want to talk a little bit about our HBCUs and the disparate funding that has been issued to them over the years, and why it is that we have to be exceptionally creative as we think about partnerships, especially with the federal government or with our peers. It turns out that the money that has been allocated to supporting science and engineering at HBCUs has declined significantly over the years. You can look at this graph and it tells a story on its own. 
The funding was certainly meager even back in 2001, and now the decline is becoming much more detrimental to the survival of our science and engineering programs. So we have spoken as a collective with the federal government. We have spoken with grant and funding agencies to demonstrate how the poor investment over the years has impacted our students. And so I think the feds are paying attention, and we have a lot of work to do, but I look forward to being able to come back and demonstrate that the type of investment that is needed for HBCUs has been allocated to them. I'm speaking about HBCUs because I am on an HBCU campus, but I think this story could be retold if we spoke about HSIs, and if we spoke about small institutions, especially those in the small private category, that may not be as research intensive but would still benefit from having research funding. In terms of the Department of Education and what has been done in recent years, the federal government allocated about $198 million of American Rescue Plan higher education funds to support community colleges, rural institutions, and MSIs. Now, as you can imagine, it's never enough. There aren't enough funds to stretch across all of the institutions that fall under these categories, but any amount helps. Howard quickly realized, though, that there had to be a value proposition that our research has demonstrated that would allow us to attract larger grants, especially those that meant something substantive to advancing research across the nation. And so, as a collective, we were allocated about $2.7 billion in total COVID support over the last three years. Recall that there are about 100 HBCUs, so the math speaks for itself in terms of the allocation of funding across our institutions. 
Understanding that they vary in size, the funding that is ushered to them would be commensurate with the size of the student population. Howard University, North Carolina A&T University, and Morgan State all posted record research expenditures in 2022, and we're getting closer to qualifying for that R1 status. Our enrollment has increased over time, but the enrollment of HBCUs nationwide only increased by about 8,000 during the 2022 year. A lot of people are under the misconception that we've had a really significant rise in the number of students matriculating at HBCUs simply because we had a COVID event. That's not been the case. If you were to look at the HBCU population and at individual schools, one would quickly notice that the HBCUs that are smaller, in rural areas, and without a significant value proposition for our students, the "why" for attending, did not experience an increase in enrollment. And so we still have to think about how we can be supportive of them; Howard is leading the charge in being able to do so, and there are a few PWIs that are doing it well too. We were awarded $122 million in annual research funding, and that allowed us to create significant opportunities for our students; this was between the years of 2017 and 2021. This is new funding coming into the university. It really means a lot for the university, as you can imagine, because in some cases we've been able to attract researchers who have left their institutions and come to the university and brought their talent with them. And I think because of the kind of research output that we've been able to demonstrate over this short period of time, we were also able to attract a university-affiliated research center via a partnership with the Department of Defense. It awards us about $90 million over a five-year period. 
But it leads me to speak about infrastructure, the same type of infrastructural challenges that we're talking about as we think about open access and the FAIR data principles. It's the same concepts that result in us having to bolster our infrastructure in order to be able to accept major grants of this kind. And we are still working to ensure that we can attract even more federal partnerships, simply because we have the research capacity to do so. I think there are a few things that we've got to consider when we're thinking about how we approach open access: the variety of campuses that we have, and the variety of resources that we all have. And the real question is, can every university afford open access? Can every university afford to be in compliance with the FAIR data guidelines that are coming through the pipeline? I think you would get a mixed bag of results if you were to ask various campus administrators that question. It turns out that the White House has stated that it expects public access to all federally funded research papers by 2025. And that's no easy feat. There are some scientists on campuses, I can guarantee you, who aren't aware of this mandate that's coming down from the federal government. So we have a duty as administrators to ensure that we're conducting the type of communication through our organizations so that they understand how their participation, or lack thereof, will impact our universities and our ability to attract federal funding and other types of funding if we're not in compliance. I looked around for information on journal prices and how they've escalated, especially compared to the budgets that we've allocated to bring various journals to campuses and make them available. 
And you can tell from the graph that I've shown here that the average health sciences journal price has changed significantly since 2013, escalating in cost by about 85% over time, while there has been a very sharp decline in collections budgets over that same period, meaning the budget is just not on par with the rate of escalation of journal costs. So I think that's a story that we can often tell at our universities. Of course, sure, one can argue that there has been access to journals that are free, right? And there has been more access available to us. But I don't think we can ignore the fact that the budget allocations for common journals that are to be shared across campus haven't been the same as in prior years. And even prior to this, we were still underfunded. The librarians tell this story much better than I can. There are tools that are being used to save universities millions of dollars in journal subscriptions. One is called Unsub. SUNY has a story where they were facing an annual $9 million bill for its subscription to about 2,200 Elsevier titles. By using Unsub, they were able to tell which journals had higher utilization rates. They shaved the spending down to about $2 million a year and subscribed to about 248 journals, compared to the massive package they were buying prior to this. And sure, that's one way to do it, but I just wanted to make it available to you in case it's something that our campuses have to do. And putting it in context, the total revenue for HBCUs in the 2020 to 2021 fiscal year was $12.4 billion, with $1.8 billion coming from student tuition and fees. As you can imagine, with 100 HBCUs, this does not bode well for the sector. We are severely under-resourced. And when we're talking about how we compare to one other university, forgive me if anybody from UConn is in the audience, but their revenues were at $2.6 billion. 
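The Unsub-style decision in the SUNY story, keeping only the titles whose usage justifies their cost, comes down to a cost-per-use calculation. A minimal sketch follows; the titles, prices, download counts, and threshold below are invented for illustration and are not the actual SUNY or Unsub figures.

```python
# Sketch of an Unsub-style subscription review: rank journals by cost per use
# and keep only those at or under a chosen cost-per-use threshold.
# All titles, prices, and download counts here are made up for illustration.

def review_subscriptions(journals, max_cost_per_use):
    """Return titles worth keeping, ranked cheapest-per-download first."""
    kept = []
    for title, annual_price, downloads in journals:
        cost_per_use = annual_price / max(downloads, 1)  # avoid division by zero
        if cost_per_use <= max_cost_per_use:
            kept.append((cost_per_use, title))
    return [title for _, title in sorted(kept)]

journals = [
    ("Journal A", 12_000, 4_800),  # $2.50 per download -> keep
    ("Journal B", 30_000, 1_500),  # $20 per download   -> drop
    ("Journal C", 8_000, 200),     # $40 per download   -> drop
    ("Journal D", 5_000, 2_500),   # $2 per download    -> keep
]

print(review_subscriptions(journals, max_cost_per_use=5.0))
```

In practice, Unsub also folds in open access availability and interlibrary loan options, but the core logic is this kind of usage-weighted triage.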
And so we've got a lot of work to do. If we are to be in compliance with some of the mandates that are coming down, we have to find unique ways to partner with other universities in order to help each other meet some of the demands and the mandates that are coming through the pipeline. So how do we ensure that our institutions of all types have the foundations for OA and FAIR access? Well, Princeton is doing it well. Princeton has a partnership with HBCUs through which they have assisted us in receiving support for our research activities. Binghamton University, as part of the SUNY system, has done it well too: they were able to assist us through a research alliance where they are pooling resources to assist HBCUs who don't have the resources when standing alone. And it leads me into thinking about how we comply. Yes, we've been able to have strong partnerships as a few HBCUs, but that doesn't mean that the entire sector rises, because only a select few have been able to command these partnerships. So there's still room for us to do more together. If we look at our pharmacy school, for example, it faces very high demand for what type of information should be made public, and it's always under a very unreasonable deadline. And so there's a lot to work through there. We expect that our campus is going to have to be ready for the types of mandates that will be coming through the pipeline, and I think the speed at which these mandates are coming at us is going to increase. So we've got a plan, and we certainly have a mitigation plan as well. We are trying to partner with PWIs across the nation and with HBCUs to see how we can comply with open access and create a culture of open access. Here at the university, we were awarded a grant, and the title of it is All of Us. 
Through it we're working on data analytics training and software training, not only to get our campus researchers to collaborate with each other, but also to reach across other universities to see how we can develop strategies for success, especially as it relates to open access. There has to be interdisciplinary collaboration, and that doesn't start at the point of open access solely. It starts, in our example, with our Office of the Provost, where we are ensuring that the sciences, whether natural or physical, and the humanities are intertwined with each other. We have several projects that have been successfully developed under our innovation division, where we are combining degrees so that students get exposed not only to the hard sciences but also to the humanities, and they can leave the university with credentials that support that. And I think those types of foundations, where we get accustomed to collaborations across the university, then bode well for us as we have conversations about open access. We're certainly allocating resources where needed, and we are continuing to share best practices. We've got to ensure that the mission for open access is institutionalized. So our library compliance division and the Office of the Provost, and that includes the Office of Research, are developing an organizational change management plan with the intention of ensuring that everyone understands all the principles and knows how they can participate in them. As I mentioned, training is paramount. It's important for us to train our professors on what open access means: what it means for their research outputs, what it means for data integrity, what it means for their success, especially as it relates to the timing of when data is released. 
One of the things that I've learned over the years when participating in roundtables: I've heard HBCUs speak about the research efforts they have on campus, sharing them via open access, and then having a more well-resourced university amplify their research before they even get an opportunity to come to a conclusion. And so I think there are some fears that are warranted, but there are unwarranted fears as well. We have to walk through every one of them and ensure that the university is going at a pace of change that people can get on board with and understand. We talk about listening and learning from the organization as administrators, because we want to ensure that we're getting feedback from our organization on how things are going. I won't read through everything, but it's important for us to find university champions, reward those who contribute to open access, and ensure that the university is always an advocate for policy changes where needed. And with that, I'll end it there. I did a speed talk because I had to get through so many slides, but if you have any questions, Rob, I'll send it over to you. Thank you very much, Dr. Dubroy. So there are no actual clarifying questions. We do have a great question that we will take up in the discussion, but I think your talk demanded no clarifying questions. So that was fantastic. Thank you very much. I will now turn the Zoom over to Jake, who will be introducing our next speaker. Okay, thanks so much. Our second speaker today is Leah McEwen, who is the chemistry librarian at Cornell University, where she has supported information discovery and scholarly communication in the chemical sciences since 1999. She is an active contributor to national and international chemical information initiatives, organizing dozens of thematic programs on research documentation and dissemination. 
She has served as both secretary and program chair for the American Chemical Society Division of Chemical Information and on the ACS joint board-council committees on publishing ethics and chemical safety. She is a founding chair of the Research Data Alliance Chemistry Research Data Interest Group and a board member of the International Chemical Identifier, or InChI, Trust. She currently serves as chair of the IUPAC Committee on Publications and Cheminformatics Data Standards, to facilitate design and implementation of digital standards. She is the lead on the chemistry case study for the CODATA-RDA WorldFAIR initiative to advance FAIR data practices, and is an international advisor for the NFDI4Chem research data infrastructure consortium in Germany. She holds master's degrees in nutritional biochemistry from Cornell and in library and information science from Emporia State University, and was the first Paul Oplett Eugene Garfield Fellow at the Science History Institute. I know I personally have benefited a great deal from Leah's advice over the years on FAIR data best practices, and I'm very much looking forward to her talk today. Thanks very much, Jake. It's an honor to be here; this is a really amazing set of speakers, so I'm really looking forward to the discussion afterwards. I'm trying to get the slide share going. All right, is that showing for everybody? Yes. Okay. Thanks so much for the invitation and the opportunity to participate in the Chemical Sciences Roundtable. This is a very impactful and timely topic. I think Tashni did an amazing job of clarifying how and why that's so important for our community, and of enabling all of us in this community to realize as much as we can. I'm just going to touch a bit more on FAIR in chemistry in particular and look at some examples. 
I'm coming at this from a pragmatic level. As chemistry librarian at Cornell University, a great part of my work is to work directly, one on one, with individual researchers and research groups on challenges that they're navigating with publishing and working with their research data, so that's a lot of what my perspective is driven by. I just wanted to put these up here. I wasn't going to go into any detail on these, but just as a starting point: chemistry really is a data-driven science and has been throughout the history of this research area. And I think the opportunity here really is that we can realize not only many more opportunities and much greater value from the data that we collect in chemistry, but also, hopefully, some supporting mechanisms and maybe some cost management for our scholarly exchange across our community and with the greater global community at large as regards chemistry data and information. But that's a long story and takes a lot of players. I'm hoping my internet hangs in there; I've had a little trouble here today, so apologies in advance. I think we've been through the FAIR data principles already, and we will continue to revisit them. It is worth saying on this slide that the emphasis is on handling data automatically, across the cloud, as much as possible. The thrust is to be as open as possible, and to provide this opportunity for merging data across different use cases, and there's any number of amazing applications emerging in this space. We can think of any number of use cases for chemistry data, along with other types of data, to address challenges; I was just on another panel yesterday about decarbonization, pulling in materials science and transportation data and many other types. 
I think also it's really important to recognize that FAIR operates at a very technical level, and the implications here, for this panel and for the workshop, will be in part around the infrastructure that's needed to support that. That's a real challenge for us, and that's where some cost and coordination come in. Another point worth emphasizing is that a lot of value can be realized from FAIR in a gradual, stepwise manner, including just starting with your local research data management process. This figure here on the lower right is from a group working in the industrial sector. They're realizing that even if they're not fully prepared to open up all of their processes and their data, they can realize a lot of benefit from FAIR, including some technical progress, by implementing it locally inside their company to actually maximize the value of their data for their company. So there's a lot of opportunity here. I'm just delighted about how much exploration there's been in this area, but we have a long way to go. Now down into the weeds about how data flows in chemistry. This represents a potentially typical workflow that many of us might have engaged in when managing our data over time: collecting some observations, working with instruments, utilizing some data to analyze our results, and generating more processed data of the XY numeric kind. We might pull this all together in a supplemental information document or right into the manuscript, cutting and pasting figures and values that have been generated along the way, compiling all this and submitting it to the publisher. At that point, there's been some work to start engaging with repositories over the last few years, as there have been some mandates suggesting that more data sharing is beneficial. 
This has really been a very low-level and broad implementation for the most part across many communities. Some journals, though very few, might even incorporate a data analyst or have active peer review of the data. This varies quite a lot by discipline, and in chemistry it's probably not very prominent, but there are a few examples. So the question really becomes: can we iterate on this if we want to scale it? And that was part of what I was asked to think about today: what about the other 80%? We have a wonderful effort among many research groups to pioneer ways forward on this, but a lot of challenge remains for the greater community. In 2019, NSF funded a workshop that I organized with a colleague to think about this problem of how we can iterate on our current processes, facilitate more data collection as we go throughout the research cycle, and package this up for repositories, getting it out into the community in a way that's more FAIR, more usable, more accessible. In fact, in the intervening years an increasing number of communities have been working on this. Workflows are developing; they're utilizing any number of tools, such as electronic notebooks, to facilitate that. More domain repositories are being formed, especially out of grant-funded projects supporting this kind of work, as well as analysis platforms online, so it's really great to see these things emerge. There are a lot more data support services on campus, and campuses are looking at data storage challenges, but it's all happening at a very grassroots level. It's all very dispersed. There isn't a lot of long-term, large-scale infrastructure investment in this, and it's all still very siloed and specialized. This is the challenge ahead of us. 
One thing I'll reflect on with this slide before I move on is that this is really a heavy burden on researchers and institutions, who are still carrying this part of the process: getting the data together, managing that data in such a way that it can be made available, and then, once it's out there, the ongoing need for curation and continued support of access to that data. That's a lot of burden on the institutions that support these resources, and what's not even reflected here are a lot of people like myself, librarians, running around trying to help people put this all together. So there's a lot of burden in this current mode. I'm just going to move quickly here through some of these, thinking a little bit more about what FAIR means and how we can implement it more directly: providing the discovery and use process in a way that machines can consume, so that we can semi-automate as many steps in the research process as possible and focus on the creative aspects. For this to work, it takes two major considerations: one is to expose as much of the information as possible, and the other is to utilize the basic functions for data exchange online that are already emerging, exchange protocols in the cloud. That is what's going to enable the data to be findable, accessible, and reusable. So let's stop and think about find and access, the first two areas of FAIR. What are the common search metadata that we use as scientists to look for information? Authors and keywords, chemical structure information, maybe properties; we're looking for bioactivities, for example. And in the mechanism of the cloud environment, one very common way this is happening is through identifiers. I think all of us are pretty much now familiar with DOIs for research publications or any type of narrative publication. 
I think many of us are familiar with those, and then there's any number of identifiers emerging for other types of discernible components. And there's metadata associated with these identifiers, and it's the information contained in that metadata that really allows us to create links between all these parts and how they're related to your data. What you can see on the right is a linked graph; you can create these graphs based on the metadata, and this is all exposed in an online environment where programs navigating the cloud can link up to this metadata directly, without even a human intervention. For an example of how this is implemented in one framework, this is the Scholix framework. There's metadata specified that defines the relationship between articles that might be in journals and data sets that might be in repositories, and this metadata is exposed in the identifiers associated with those two documents. And so anyone wanting to navigate that connection, to go from the publication to the data set, or anyone else wanting to navigate to the data set, can utilize that published metadata, you might say. This all happens automatically through this framework, and there are a number of repositories and journals that are utilizing it. Here's an example of a crystal data set sitting in the CCDC, along with the article that is associated with it in a different journal. Here's another example. The PubChem repository is supported by NCBI. They have a template where you can upload information about chemical structures and other metadata about your data. They'll incorporate that into their data system, utilize that information to match and validate it against their structure model in PubChem, and then, as you can see here, your data is co-listed alongside all the other data that they've been receiving from different sources. 
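The Scholix-style linking described above can be pictured as a small metadata graph: each identifier carries a metadata record with typed relations, and software follows those relations in either direction without human intervention. A toy sketch follows; the DOIs below are invented placeholders, though the relation types (`IsSupplementedBy` / `IsSupplementTo`) are drawn from the DataCite relation vocabulary commonly used for article-dataset links.

```python
# Toy model of identifier metadata with typed relations, in the spirit of
# Scholix article <-> dataset links. The DOIs here are invented placeholders.

metadata = {
    "10.1000/article.123": {
        "type": "journal-article",
        "relations": [("IsSupplementedBy", "10.5000/dataset.456")],
    },
    "10.5000/dataset.456": {
        "type": "dataset",
        "relations": [("IsSupplementTo", "10.1000/article.123")],
    },
}

def linked_identifiers(doi, relation):
    """Follow one relation type from a DOI to the identifiers it points at."""
    record = metadata.get(doi, {"relations": []})
    return [target for rel, target in record["relations"] if rel == relation]

# From the article, a program can discover the associated dataset...
print(linked_identifiers("10.1000/article.123", "IsSupplementedBy"))
# ...and from the dataset, navigate back to the article.
print(linked_identifiers("10.5000/dataset.456", "IsSupplementTo"))
```

The point of the framework is that this graph is published alongside the identifiers themselves, so any consumer can traverse it without a bespoke agreement with each journal or repository.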
So you can see here an example from MassBank, where some research data was deposited and then indexed into PubChem. Another example was done by my colleague at the University of Alabama, where structure information was provided for their dissertations and they were able to load this into PubChem. This improves discovery of their dissertations, and they found that between 40 and 50% of the structures associated with those dissertations were actually not found in PubChem previously. These may very well be novel compounds. PubChem has worked out this nice template that they use for this process, which facilitates that automated flow, but it's a separate workflow for each user. Imagine if we could put this into something like the Scholix framework, where it could happen more regularly, up front and automatically, rather than having every single connection be made one on one. I'm just going to move quickly through interoperable and reusable; those are harder areas to tackle. Getting into semantic representation, there's a long way to go with this; chemistry is very complex. We really need to do a lot more standardization, for example around even basic things like units of measure, and there are some initiatives working on that. You'd think that chemical representation would be solved by now, but the reality is that there's a rich range of chemicals and materials that we work with in our science, and a lot of different ways of looking at them depending on what kind of work you're doing, etc. Reusable: I think this area is a little better understood, but this is where a lot of work needs to be done to actually create usable resources. So, for example, a lot more work on file formats and licensing. I think the licensing conversation has a long way to go in chemistry. It's pretty nascent. 
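At its core, the Alabama dissertation exercise amounts to matching structure identifiers against a repository's holdings. A minimal sketch, assuming the structures have already been reduced to standard InChIKeys (the hashed form of the InChI identifier mentioned in the speaker introduction); the keys below are made-up placeholders, not real InChIKeys.

```python
# Check which locally reported structures are already in a repository,
# matching on InChIKey strings. The keys here are placeholders, not real ones.

repository_keys = {"KEY-AAA", "KEY-BBB", "KEY-CCC"}  # stands in for PubChem holdings

dissertation_keys = ["KEY-AAA", "KEY-XXX", "KEY-YYY", "KEY-CCC"]

def split_known_novel(local_keys, repo_keys):
    """Partition local structures into already-indexed vs. not-yet-indexed."""
    known = [k for k in local_keys if k in repo_keys]
    novel = [k for k in local_keys if k not in repo_keys]
    return known, novel

known, novel = split_known_novel(dissertation_keys, repository_keys)
print(f"{len(novel)}/{len(dissertation_keys)} structures not found; possibly novel")
```

In practice PubChem does this matching server-side against its full structure model when a depositor uploads the template, but the comparison is the same in spirit.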
But understanding what the opportunities are with licensing data, so that the terms of use are up front and it's clear what can be done, and realizing that even when data aren't shared, metadata that can at least facilitate a research process can still be made available. I think another thing that's really going to be important in chemistry is validation. So if you're providing file formats that are meant to be consistent, still being able to check and make sure that they are implemented consistently is going to be really important. And I do want to call out here, based on Tashneen's comments, how important it is to professionalize navigating this landscape for everyone who's involved: a lot of researchers, of course, will be interacting with these mandates, but also people like myself and others on this call who work at institutions, in publishing environments, and at the repositories handling data. There's a lot of skill development that needs to happen for us to be able to do this in a more consistent, scalable manner. Just a real quick couple of examples here. This is crystallography, a kind of high-level view of it. Basically, a lot of us are familiar with this story: you need to put your data in a repository in a standard format, and this covers several different repositories in crystallography. There's also a tool that checks for that. Looking a little bit at the behind-the-scenes workflow in the crystal structure database, for example, there are several different routes by which data can come into the repository, including directly from the diffractometer, but there's also a relationship set up with publishers to review that data and to make sure that the data set is associated with the article.
Again, this has been highly successful in crystallography, but how do we scale that? How can we utilize more consistent mechanisms for other data types, so that we don't have to do this one relationship at a time? That's where a lot of the work needs to happen, but we also need to make more mechanisms available that can be adopted as we go. I also just wanted to mention this one: NFDI4Chem, which Jake mentioned in the introduction. In this infrastructure project in Germany, the DFG, the primary funder, has put a lot of resources over a five- to ten-year period into different consortia associated with different disciplines, to build out the infrastructure needed to really support the research process, especially on the institutional side. For NFDI4Chem, the heart of the program is an electronic lab notebook called Chemotion that's closely integrated with the repositories, and then that data is linked out from there into other repositories and made more broadly accessible via some of the other services listed up here. A couple of great things about this project: one, it's a really great use case for all of us to be looking to, learning from, and understanding both the opportunities and the challenges, scientifically and technically but also institutionally and in terms of infrastructure. Also, the outcomes from this project are open source, so it gives us a lot of material to work from in any more systematic approach we start using to implement infrastructure in other areas as well. So these are sort of my closing points: we really need to address the state of the reporting workflow; we need to get this a lot more systematized so that the data get out there.
We need a process for research data that's just as familiar, routine, and in progress as it is for articles, and I think Tashneen used the phrase "institutionalize"; this is really important at the research community level as well. We need to address this challenge of where the data are. There are not enough repositories in the chemical sciences, and those that are emerging are mostly associated with short-term, time-limited grants; they are not really scalable, and they're not necessarily fully interoperable. This is where the infrastructure boost needs to happen, and it's going to be a real challenge. I would love to see us coordinate how we think about approaching this problem not only nationally, particularly among the funding agency community, but internationally as well. And then really the last thing that I want to emphasize here is that need for standards. This is the part of the work that I'm probably most actively involved in: providing all of these technical motifs in particular, and working with those in the community who are stewarding these repositories, developing skills and workshops, and putting policies into place, to incorporate these as much as possible and streamline the workflows for everyone. All right, I'll stop there. Sorry I went right up to time. All right, thanks so much. That was a terrific talk, and once again it was very clear, so we got some good questions to discuss in the panel section. Now I'll leave it to Kay for the explanation of the poll questions. Yeah, I just put some questions in a poll; feel free to fill them out. We'll give you about a minute, or maybe 30 seconds, so go ahead and make your selection. And I'll pass it off to Karen if she wants to begin introducing her next speaker. Thanks. Thank you so much.
It's my pleasure to introduce our third and final speaker in advance of the open discussion, Dr. Debra Audus. Debbie is the project leader of the polymer analytics project in the Material Science and Engineering Division at the National Institute of Standards and Technology, or NIST. She joined NIST in 2013 as a National Research Council postdoctoral fellow and transitioned to staff two years later. Her research focuses on machine learning, polymer databases, and using both theory and simulation to understand polymer physics. She is also a founding member of two polymer data resources: the Polymer Property Predictor and Database, and the Community Resource for Innovation in Polymer Technology. She is a member of the American Physical Society and the American Chemical Society. She received her PhD at the University of California, Santa Barbara, and her bachelor's degree at Cornell University, both degrees in chemical engineering. Her talk is on FAIR data for polymers, so the floor is yours, Debbie; take it away. Thank you so much. All right, let me get my slides. All right, can you see that? Yes, thank you. Excellent. Okay. So first of all, thank you so much for the opportunity to talk today. I'm mainly going to focus on the FAIR data aspect, but I'll also touch on open science and open access here and there. And I'm going to discuss a lot of this from the perspective of someone in the weeds doing the research and really trying to make this happen. Both of our previous speakers have already gone over exactly what FAIR stands for, but I felt absolutely obligated to put it up here because it's so important. And I encourage you, if you haven't already, to check out this paper and go through the checklist, and we're going to find out that it's not necessarily so easy. We've kind of heard about that before, but I'll continue to go into it in detail. So I thought I'd start with a question: why are we thinking about this now?
I mean, of course, there are mandates coming down, but why are there mandates? Why now? The need for data is not new; we've always needed data. I'm a chemical engineer by training, and Perry's Chemical Engineers' Handbook goes back to 1934. There's lots of useful data in it for building your next chemical plant. Even things that we maybe don't think of as AI, but that in some sense kind of are, have been around for a long time too. We have group contribution methods for calculating whether a polymer is soluble in a given solvent, which take in, you know, you try to figure out: I have this chemical group and this chemical group, what does that give me? And in some sense that is early machine learning, which of course required data to make it happen. So it's not just needing data, and it's not AI under a different name. I think part of it has to do with the fact that everything has been moving more digital. We have the Materials Genome Initiative, right, which started in 2011, but it had absolutely no explicit mention of machine learning and AI, and we kind of forget that because we hear so much about it nowadays. It talked about experimental tools, computational tools, and the integration with digital data; that's not to say machine learning didn't have a role there, it just wasn't explicitly called out. So all of these ideas that have been ruminating around have kind of been there before. But I think part of it is, you know, AI is accelerating, and AI needs data. Over five years ago, I wrote this opinion piece looking at polymer informatics, which is the idea of applying data science, including machine learning, to polymers. And we could see the future; we could see what was possible; we saw what was being done more broadly in the world and in other areas of materials, and that it could really be a way for us to accelerate the discovery of new materials and new physics.
This is mostly borne out in papers published, too: if we search for polymer papers in this broad space, there has been a significant increase in those published. Additionally, going back to these open science concepts, there's been a growing push toward open access too, and we can see that the number of papers in that category is steadily increasing. So part of it is all of these things coming together. And, of course, we're dealing with digital data resources. Here is a list of some of the polymer online data resources that exist, and there are new ones popping up here and there. Not all of these are FAIR, but some of them are moving in that direction, because, I mean, it's challenging, like we've heard. So, as researchers, we kind of ask the question: well, why can't we just make our data FAIR? It is such a simple question. But of course, there are lots of challenges to overcome. I'm going to talk about some of the challenges we're specifically facing in the polymers community, technical challenges in particular. One of the issues is that we often have small, disparate data sets; a lot of our data resources are put together, curated, by looking in the literature and pulling things together. And the problem is not necessarily that we have small, disparate data sets, but if they're small and disparate and they can't talk to each other, if they're not interoperable, or if we can't determine that they exist, if they're not findable, then that is a problem. For example, on the previous slide I listed these polymer resources, and I compiled that not from a single journal article but from three different journal articles that have overlapping information, yet each still missed some of the resources. Another problem gets right back down to the chemistry: polymers are stochastic molecules; everything we're dealing with is a distribution.
We have molecular mass distributions, we have composition distributions, we potentially have branching distributions, and that makes it very difficult to even represent what we have. And sometimes we don't necessarily even know; that's what we have to do characterization for, to try to figure out what we have there. We can't just look in and say, oh, I have one of these and one of these. Another issue is that we have highly non-standard data. We often present our data in different formats and use different descriptors for the exact same thing. Going back to describing molecules, polymers in particular: you could use the IUPAC notation, or a source notation, or a trade name, and these are all potentially the same type of polymer, and yet we call them something different. Furthermore, we have issues with process-dependent data. As a kind of interesting example of this, think of the glass transition temperature for polymers, where they go from being more melt-like to being arrested in a given state. When that happens depends on how fast you cool down your sample, so you can end up with huge variations based on what cooling rate you choose. And if you don't preserve that metadata, then it's difficult to reproduce. This is why, if you look in the literature at the many machine learning models for the glass transition temperature, even going back to the 90s with neural networks, you'll see that they have error bars of 30 kelvin. It's because that metadata was ignored, which makes it impossible to do design. So now I'm going to talk about some of the things we've actively been working on to try to address these technical challenges. And like I said in my title, this is working towards FAIR data; I wouldn't say we're fully there yet, but we're definitely trying.
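The glass-transition example above comes down to whether the cooling rate travels with the reported value. A minimal sketch of such a record, with hypothetical field names rather than any repository's actual schema:

```python
# Illustrative record for a glass-transition measurement: the value alone
# is ambiguous unless the cooling rate (and its units) travel with it.
# All field names here are hypothetical, chosen only for illustration.

tg_measurement = {
    "property": "glass_transition_temperature",
    "value": 373.0,
    "units": "K",
    "method": "DSC",
    # The metadata that is often dropped, producing ~30 K spreads
    # between reported values for nominally the same polymer:
    "cooling_rate": {"value": 10.0, "units": "K/min"},
    "sample": "polystyrene (illustrative)",
}

def is_reusable(record):
    """A record only supports modeling if the rate metadata survives."""
    return "cooling_rate" in record and "units" in record["cooling_rate"]

print(is_reusable(tg_measurement))  # True
```

Dropping the `cooling_rate` entry leaves a number that cannot be meaningfully compared across studies, which is exactly the failure mode described in the talk.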
So, to deal with the small, disparate data, we've created a user-focused data resource: instead of trying to pull a bunch of information from the literature, we're going back to the source, to the users. Specifically, we developed the Community Resource for Innovation in Polymer Technology, which is located at this website here. Our basic goal is to enable science for the researchers by helping take care of the data management side of things. This effort is led by Brad Olsen at MIT, and also has contributors from NIST, myself included, the University of Chicago, Citrine Informatics, and Dow. To deal with the stochastic molecular structures, we're looking at new representations. My collaborator Brad Olsen and his group helped develop BigSMILES, which is an extension of SMILES for polymers, taking into account the fact that we have these stochastic structures. All the chemistry is encoded, and you can encode your bonding, whether units join head-to-tail, tail-to-head, or either, and you can describe all kinds of different ways of putting together your monomers to create complicated polymers. Now, one thing I should note here is that what BigSMILES does is define a possible ensemble that a polymer sample can belong to. But if you have a specific sample, you have a realization of part of that ensemble. So for example, take a homopolymer, which is pretty simple: you don't have every chain length; you have a particular molecular mass distribution to go along with it. The BigSMILES would represent all chain lengths, and then additional metadata allows you to describe that distribution. And you can check out the paper here. Then, to handle the highly non-standard data and the process-dependent data, we built up a data model, and in doing so we wanted to make it very flexible.
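To make the notation concrete: BigSMILES wraps stochastic repeat units in curly braces and marks connection points with bonding descriptors such as `[$]` (either orientation) or `[<]`/`[>]` (directional, e.g. head-to-tail). The strings below are illustrative rather than canonical forms, and the checker is a toy, not a real BigSMILES parser:

```python
import re

# Toy inspection of BigSMILES-style strings: stochastic objects are
# wrapped in curly braces, and repeat units carry bonding descriptors.
# Example strings are illustrative, not canonical BigSMILES.

def bonding_descriptors(bigsmiles):
    """Return the bonding descriptors found in a BigSMILES-style string."""
    if bigsmiles.count("{") != bigsmiles.count("}"):
        raise ValueError("unbalanced stochastic-object braces")
    return re.findall(r"\[[<>$]\]", bigsmiles)

polyethylene = "{[$]CC[$]}"            # either-orientation linking
head_to_tail = "{[<]CC(c1ccccc1)[>]}"  # directional linking (styrene-like unit)

print(bonding_descriptors(polyethylene))   # ['[$]', '[$]']
print(bonding_descriptors(head_to_tail))   # ['[<]', '[>]']
```

Note that such a string names an ensemble of chains; a real sample's molecular mass distribution still has to be carried as separate metadata, as the talk emphasizes.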
Our users are doing research, and that means things change. So we put this together thinking of it like a flow chart or graph, where you have nodes that correspond to different things. You have materials: for example, various materials come into some process such as a polymerization. Once you have the resulting material, you can do characterization on it and figure out additional properties, or you can combine it with something else in another process to give you a new material. You can then do characterization on that, and even include how you did the analysis of the raw data, because often we have to do some sort of analysis to get a derived quantity. You can find out more about that in our paper in ACS Central Science this year. And if we zoom into one of those nodes, this particular one is for polylactic acid, and there's also some lactic acid in the system. It has a unique identifier through this URL, which is cut off, but there are parts here that hold the unique identifier and additional information like "created by". A very important thing I've learned is the importance of having a notes section, because no matter how hard you try to figure out everything you might need, you will inevitably forget something. If you have a notes section, it gives you some way to at least capture that extra information. Then we have a section on identifiers, where we can include things like the BigSMILES, and our properties, which we can link to our data along with our methods. And then we also link to our processes and provide additional keywords to help search. But it's not just technical challenges that create a problem; we also have non-technical challenges, right? As a researcher, we have the issue of: well, what do I do? How do I make data FAIR? Where do I put it? All those things require a lot of effort, and they also require time.
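The flow-chart-style data model described here, where materials feed into processes that produce new materials, can be sketched with linked nodes. The class and field names below are hypothetical illustrations of the idea, not the actual published schema:

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a graph-style data model: materials enter
# processes, processes produce new materials, and every node carries a
# free-text notes field for anything the schema didn't anticipate.

@dataclass
class Node:
    name: str
    notes: str = ""  # catch-all for information with no dedicated field

@dataclass
class Material(Node):
    identifiers: dict = field(default_factory=dict)  # e.g. a BigSMILES string
    properties: dict = field(default_factory=dict)

@dataclass
class Process(Node):
    ingredients: list = field(default_factory=list)  # input Material nodes
    products: list = field(default_factory=list)     # output Material nodes

lactic_acid = Material("lactic acid")
pla = Material("polylactic acid", notes="some residual lactic acid in system")
polymerization = Process("polymerization",
                         ingredients=[lactic_acid], products=[pla])

# Traversing the graph: from a process to the materials it produced.
print([m.name for m in polymerization.products])  # ['polylactic acid']
```

Because nodes only point at each other, new kinds of processes or characterizations can be added without reshaping existing records, which is the flexibility the talk is after.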
So how do we find the time in the day to clean up the data and make sure it's ready? And then, of course, there are issues around the community, and this goes back to some of the things Leah was saying about standards: agreeing on how we're going to describe things, so that if we all agree on a handful of them, we can talk in an interoperable fashion. The way we've been going about this is by trying to listen to our users, and we've basically created profiles for the different types of users we have, who have different needs. You have some people who don't really want to worry about the data; they just want to do their chemistry, right? Then you have other people who are in love with code and want to run machine learning models on everything they can get their hands on, but aren't necessarily as in touch with the chemistry. And then you have people everywhere in between. Another huge thing in machine learning recently, especially in the polymer space, is autonomous efforts. The components to build robots have gotten a lot cheaper, and that has also been pushing some of these things forward. So instead of thinking about things from the mandate side, we've been thinking about what we can do to make it easier for people to do their science. One of the ways we've been doing that is by starting with the data pre-publication, allowing people to put their data in right away as they're generating it. The benefit of doing this is that you don't have to go back and cross-compare and waste a lot of time trying to get your data just so, relabeling everything when it's time for publication. It's already there, it's already in the right format, it's ready to go. The other thing we're trying to do is support collaboration.
This is a cloud-based service, so the idea is that you can share your data pre-publication with just the people you want to: people in your particular research group, or collaborators at different institutions. And going back to knowing what our users need, we're trying to offer different modes of interaction. Some people are going to love Excel; it seems a little foolish to me, but nonetheless we still have our Excel users, and we need to support them. We also have ways of interacting with the system through an API, specifically a Python SDK, and then we have the website as well. When you enter information via one mode, say through the API, you can then go to the website and actually see that it's there. So different people can interact with it differently as needed. This also goes back to some of the points around automation, because then you can write scripts to handle some of this for you, which reduces error. We're also trying to build advanced search capabilities so that you can do substructure searches; this is very much a work in progress. That way you can search for all of the block copolymers, or block copolymers with specific chemistries, or with particular molecular mass ranges, or ones that have particular properties, say, that they form a micelle or something like that. That then allows one to do better comparisons, and comparisons are really important, because we want to use our data to form benchmarks for other data. It allows for the development of new theories and for moving science forward on the whole. And then, of course, we also include things like automatic validation.
That way we check to make sure that if people enter a temperature, for example, it's sensible: you don't have something at negative kelvin, and you don't have something with units that aren't temperature units. That helps reduce the errors in our system. So everything I've talked about up to this point is mostly focused on data, but of course I wanted to say something about code as well, as someone who does quite a bit of programming. We should be sharing our code too, and this is really important because it helps reproducibility and the adoption of ideas. Unlike a new instrument, which you can't just magically carbon-copy, you can do this with code, which really allows a way to push forward science. But a lot of challenges remain. I'm not going to go into all the details, but, you know: doing proper documentation, determining what the purpose of your code is. Are you trying to make sure that people can reproduce your work? Are you trying to build something bigger, like a package that people can use for their own research? I think checklists could help here. So I wanted to end by thinking about this as an individual scientist: how do you benefit? By making your data FAIR and trying to make your science more open, which includes making your publications accessible, you allow other people to adopt your ideas more readily. This directly leads to an increase in citations, and I encourage you to check out this reference here that talks about these sorts of things. And going back to the first talk, this also leads to advancement in the democratization of science. I got into science because I wanted to push it forward, you know, push forward the boundaries.
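The automatic validation step described above can be sketched in a few lines; the list of accepted units here is an assumption for illustration, not the resource's actual rules:

```python
# Sketch of an automatic validation step: reject physically impossible
# temperatures and non-temperature units before data enter the system.
# The accepted-units table below is an assumption for illustration.

ABSOLUTE_ZERO_K = 0.0
OFFSETS_TO_K = {"K": 0.0, "degC": 273.15}  # supported temperature units

def validate_temperature(value, units):
    """Return the temperature in kelvin, or raise if the entry is invalid."""
    if units not in OFFSETS_TO_K:
        raise ValueError(f"{units!r} is not an accepted temperature unit")
    kelvin = value + OFFSETS_TO_K[units]
    if kelvin < ABSOLUTE_ZERO_K:
        raise ValueError(f"{kelvin} K is below absolute zero")
    return kelvin

print(validate_temperature(0.0, "degC"))  # 273.15
```

Normalizing everything to one unit at ingestion time is also what makes later cross-record comparisons and benchmark searches straightforward.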
And so that means we want not just some people to be able to do it, but everybody to have access, so that people at any institution can take the code and the data that I've derived and build off of it to create something new and interesting. So as an individual researcher, what can you do? I encourage you to think about open science from the start and to use automation when possible, to make your life easier and also to reduce errors. Consider using data resources that are specific to your particular field, and use preprint repositories such as arXiv or ChemRxiv. And for extra credit, I wanted to mention that you could also get involved in the Materials Research Data Alliance, which you can find out more about at marda-alliance.org; it's really trying to push forward these FAIR data ideas, thinking about the different challenges we have and creating working groups around them. And thank you. Thank you so much, Debbie; that was excellent. We're going to be moving into the discussion phase, but I'm not sure whether you, Rob, wanted to give the outcome from the poll. So, I'm trying to find the outcome. I do know that for the first question, the most common answer was the cost to researchers. Sorry, I don't have it in front of me. I don't know if Kay can show the results or if I should just read them off. There you go. Thank you. So hopefully everyone has digested those answers to the three questions. You know, Karen, when you look at those answers, and remind everybody that there's still time to ask questions in the Q&A, a lot of the concern is the cost, but also whether the infrastructure is available, and that comes with a cost. So just to start off this discussion, I'm wondering if you or Jake can comment on what you think some of the challenges are, and how we will actually afford open access and FAIR data going forward.
Do you want to go first? So what I'd like to do is just point that toward our speakers, because the cost is a big part of it, and as the first speaker mentioned, cost is differentially distributed to different institutions and different persons. So I'd like to give the opportunity to our speakers to address these poll results. Jake, any comments? I think the increased cost upon the researchers is problematic, and institutions will need to supply some support. But no, let's hear from the speakers and then I'll weigh in after that. Since I'm a moderator here, I'm going to actually add a little bit of my own on this, and that is, you know, I do think with open access, and FAIR data for that matter, there is a potential other cost here. The opportunity to share data and make it available to all of us does provide an opportunity to democratize science. One of the things that I wonder about, though, and Tashneen spoke about this, is how we make sure that we're still keeping the field open for as many people as possible to contribute to those data, and to contribute to those articles that will wind up being open. That's one of the things I think we have to tackle going forward; we have a challenge in front of us. So, with my two cents in, let's open up this discussion to our speakers. So this is Tashneen speaking; can you hear me well, Rob? Yes. Fantastic. So one of the things we're thinking about on our campus, for sure, is that we have to be intentional about allocating dollars to meet mandates of this kind. So I think the first thing we have to do collectively is prioritize it, because we know that the challenges are coming, and depending on which disciplines are involved, the price tag may be even higher.
You know, as a chemist, and I'm sure a lot of our colleagues on here have a chemistry background, or even if you're in another field, I often think about all the data that comes across our desks, all of the data we gather, and how on earth we're going to have the people power to input our findings into some of the systems we have, or some of the systems we procure. So it's really not just about journals; I think it's really that we have to find these infrastructural dollars as well, to create the systems that are needed. But we're going to need the people power to do that, so all of those things are part of the collective that we have to think about and allocate for as universities. I think our librarians are doing it well as they continue to try to find resources outside of the university; they try to find grants that are available to give us that avenue. And so it's difficult, but it's something that has to be done. Deborah or Leah? Sorry, I would have jumped right in, but I was actually just trying to type something into the chat that I was going to share, and I will put that in in a minute: the OECD collaborated with CODATA on a study of sustainable business models for repositories a little while ago that might be worth looking at again. This is a hard challenge, right? I think we do need to retune some of the infrastructure investment in a number of places; the poll indicated research institutions and agencies, and that's probably where there's the most potential to retune how we support what we already support. So I would love to see an opportunity somehow to study this problem, this challenge: break it down across different sectors, across different sizes of institutions and sizes of data problems. You know, think about those lenses and those facets that we need to break out.
We just need a little bit more of a systematic view on it. I appreciate that institutions are thinking about investment. Thank you, Tashneen, for tackling that, and please share your plan, because I think that's another thing: a lot of things do happen through the community, so the more we can do outreach on all of this, on how people are approaching it and what's worked for them, the better. And alongside the infrastructure challenge, and I won't get into this too much right now because I want to keep to our time, enabling research scientists can also happen through their own networks. So getting out where they are and bringing workshops to them, bringing things to them: what can you do now, what works for you, what's already available. Maybe a little bit of investment in people who are able to do those kinds of things, train-the-trainer workshops; I know the Data Carpentry workshops, for example, really exploded. More of those kinds of things might also help from the grassroots level up. Yeah, I think I agree with these things. You know, as someone kind of in the weeds on this, it is a challenge, right? CRIPT is currently funded by the NSF, but at some point in time it will not be, and we need to make sure that we can continue to make these resources available so that they have longevity. And I would say, especially at some institutions, you have to worry about whether you have the funding for publication, right? So you can try to make things more accessible, like I said, by using preprint servers as a way of getting your research up, but that's actually only one step, right, if we talk about democratizing science as a whole.
You also have to be thinking about, you know, there are a gazillion papers published every day; how do we keep on top of it, how do we make sure that we actually see the great research that's being done everywhere too? Yeah, I mean, I think it's a really hard problem. But the fact that we're talking about this is a good sign, and the fact that there is this realization of these sorts of issues means we have a chance to move in the right direction, though there's no simple solution. So maybe just to pick up on this idea of the longevity of the data: part of that is the longevity of the utility of the data as things progress. Can any of you speak about the need for metadata versus data? Is it essential that both be open? Does that make both FAIR? What are your thoughts on archiving and making available and open, in the long term, metadata as well as data? So, I'm going to twist this into a slightly different answer. In an ideal world, and I'll just start there, I think it would be good to have both the data and the metadata. But the question this slides into is: how much data, and what data? Are we talking about the raw data? The processed data? Are we providing how we got from the raw data to the derived data? Which metadata do we need? Which metadata don't we need? Which metadata do we not even know about, that we can't even describe? And it's really, really hard, and there are no hard and fast rules for what one should do. I've even seen talks where people say, well, we generate so much data, we're going to have to throw some of it out. Thinking about these things involves a lot of judgment calls, because it's research, right? You can't have a one-size-fits-all solution; it's just not going to work.
I mean, like I said, if possible it would be good to have both of them open, but then it gets into this bigger question of what is the most useful. When I try to share my own data, I try to think about this: who would be the consumers of my data? What would they find the most useful? How would they be able to reproduce my work? I try to think about it from that perspective and then share the appropriate data, because it does take time to put it all together.

So as usual, there's probably no one-size-fits-all answer to these kinds of questions, and it may very well depend on the collective value of the data over time and as aggregated. That's one way to think about it, right? Useful data has been aggregated over time in certain areas, and maintaining those collections helps make sure that all the information in there carries forward as technology evolves and as documentation evolves. So one way to think about it is by the resource the data is captured in. That's not the whole problem, but I just wanted to call that out.

Yeah, it's one of those challenges, right? As a librarian, I'm inclined to say document everything, save it all. But obviously there are a lot of challenges with that blanket approach. I did just want to plug, though, that documentation is critical. That is what the metadata is. This matters especially if you're saving raw data, where that might make a lot of sense: if you see the methods in a field evolving, if you know the technology is going to improve and the resolution of the raw data is likely to improve, that's maybe an area where you want to be saving that data for a little while as things shift. But you need to document it really well. Raw data just sitting there is not useful for people in the future.
They won't have any idea about the parameters around that data, and it's not preservable. Documenting things in a text-based way means they can be read one way or another going forward. And I guess the last thing I would say is that there are some things you should definitely not do, and that's hard-coding data or metadata into formats that are not parsable in the future. We've done a lot of that in the past: binary database systems were developed and are no longer easily readable except on systems mirroring technologies from decades ago. I don't want to get into that. That's not a good answer, but there it is.

I want to combine a few questions that we got early on, especially from Henry Rzepa, who pointed out that FAIR and open are not the same, right? Data can be FAIR but still not strictly open, or open but not strictly FAIR. And then there was another question about sunsetting a data set: people have worked hard for the duration of the project, but what's going to happen afterwards? What I think both of those boil down to is how you are allocating resources. Where are those resources coming from, and how are you prioritizing what to put them toward? That also gets back to our third poll question, which asked whose responsibility maintaining this is, and there was a pretty even split between the funders and the universities. And then I'll also loop in Leah's point from her talk, which I thought was a very good one: data management is a professional skill. Especially if you're standing up a repository, it's not something you should be doing on the side; it's something that people who have specific knowledge in that area should be charged with, compensated for, and recognized for.
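[Editor's note: a minimal sketch of the text-based, parsable documentation described above, writing metadata as a JSON "sidecar" file next to a raw data file rather than embedding it in an opaque binary format. The file name and every field name here are illustrative examples, not any formal metadata standard.]

```python
import json
from pathlib import Path

# Illustrative sidecar metadata for a hypothetical raw data file.
# Field names are examples only, not a formal schema.
metadata = {
    "dataset": "kinetics_run_042.csv",
    "instrument": "UV-Vis spectrometer (example)",
    "acquired": "2023-06-01",
    "units": {"time": "s", "absorbance": "AU"},
    "processing": "raw; no baseline correction applied",
    "contact": "lab-data-steward@example.edu",
}

# Write the metadata next to the data as plain, parsable text (JSON),
# so any future JSON parser can still read it.
sidecar = Path("kinetics_run_042.metadata.json")
sidecar.write_text(json.dumps(metadata, indent=2))

# Years later, recovering the documentation is trivial.
recovered = json.loads(sidecar.read_text())
print(recovered["units"]["absorbance"])  # -> AU
```

The design point is simply that plain-text formats like JSON (or YAML, or even a README) remain readable by generic tools long after any one piece of software is gone.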
And so I think the basic question is, again, how do we prioritize these resources? Should the universities and the funders be working in concert? Should they be talking more to each other? Leah, you talked about an initiative in Germany; is that something that could potentially be a broader international model? How do we think about the best way to make all of this financially sustainable?

All great questions, Jake. I do think there could be potential to prioritize a few things to move forward. One thing that keeps coming to my mind is facilitating the discovery of data, which is becoming more and more shared. As it gets used more, and people realize its value, and we start learning what the limitations are, whether in terms of it being open or FAIR or sustained or accessible, the more we learn, the more fuel we have to make the case that we need to invest more. So I do think that if we could get these different stakeholders at the table to negotiate how we do data citation in all of our discovery tools, that would really maximize the utility of our DOIs and other identifiers. I mean, Crossref is great, DataCite is great, but I will say that Crossref utilization by publishers is kind of all over the map, and, you know, y'all have had a few years to practice and get this right. Could we elevate the level of practice there a little bit and make the metadata cleaner, so as to improve its usability? I think there's a lot of progress that's ready to go now, but it does take the community, the STM community, the publisher community, and others as well, saying: hey, this is a priority for us to get right. So that's a partial answer.
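[Editor's note: to make the point about clean, machine-readable data-citation metadata concrete, here is a small sketch that formats a human-readable data citation from a structured record. The record only loosely imitates a DataCite-style entry; the field names, dataset, and DOI are all made up for illustration and do not follow the actual Crossref or DataCite schemas.]

```python
# Illustrative only: a DataCite-like metadata record with invented
# field names and a hypothetical DOI.
record = {
    "creators": ["Doe, J.", "Roe, R."],
    "title": "Polymer viscosity measurements (example dataset)",
    "publisher": "Example University Repository",
    "publicationYear": 2023,
    "doi": "10.1234/example.5678",
}

def format_citation(rec: dict) -> str:
    """Build a 'Creators (Year). Title. Publisher. DOI-URL' string."""
    authors = "; ".join(rec["creators"])
    return (f"{authors} ({rec['publicationYear']}). {rec['title']}. "
            f"{rec['publisher']}. https://doi.org/{rec['doi']}")

print(format_citation(record))
```

The underlying idea is the one raised above: when publishers deposit complete, consistent structured metadata, discovery tools can generate and resolve data citations like this automatically instead of scraping them from free text.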
I wonder if you have thoughts on what the university is going to pay for versus what the funder is going to pay for, and whether universities and funders are talking to each other as constructively as they could be.

I think there's room for advancement in the conversations between universities and funders. Some funders are more mature in their thinking about the impact that open access has on a university's budget and infrastructure, and there are others that have the expectation that we will cover everything. So I think that as we continue to have conversations about this, it becomes a much more palatable conversation for the science community and for our funders. Every time we've experienced sea changes in our fields, there's always some amount of, I guess, opaqueness, for lack of a better word; sometimes the clarity isn't there in terms of what our next steps are or where we'll get the funding from, but over time we work to make things much more clear, including who is responsible for which aspect of it. I have seen some funders, or some entities, even vendors, that have partnered with us on our path to becoming an R1 institution; they have an understanding of what it will take for us to get from one point to the next. We had a partnership with IBM; they helped to strengthen our data analytics program, and the same with SAS, based in North Carolina. These are all things that entities can do: vendors who may not necessarily be traditional grantors of funds to the university for something of this kind, but when asked, they've stepped up to the table to offer in-kind contributions through their talent, or to offer us a service that we would otherwise have had to pay for. Every single year our budget is refined, and I don't think it gets larger and larger; we're always trying to do more with less money.
And so, you know, we're just not there yet in terms of how we allocate enough resources to feed the outcomes that we need. I think we've got a long way to go, but what matters is that we've started.

Yeah. And Deborah, I wonder, just briefly in the same vein, if you could comment on how you got Dow involved in your project and how much that industrial support has helped.

So Brad kind of put the whole thing together. He had been talking with some people from Dow, and they are also thinking about these digitization-type efforts. Of course, when we deal with companies there are all kinds of proprietary issues, so right now we're focused on academic data, but of course an academic data source can be useful for companies as well. And then I helped bring Citrine into the picture, because I was aware of some of the efforts they were doing on their data models that were similar to the sorts of things we wanted to do. And so that's how it all came together.

So, we are at the five o'clock Eastern mark, and there are so many more questions. Fortunately, though, we have an opportunity to answer, and even ask, more questions in October. As Kay said earlier, the Chemical Sciences Roundtable will be following up this webinar with a day-and-a-half workshop, where many of these questions, on proprietary data and on industrial-academic collaborations, will be explored in depth. How is it, as Nicky Paul asked, that we can repeat something from a journal article from 1880 but not from one from 2008? Will FAIR data and open access improve that situation, or will they make it more challenging? Everybody who's interested in this topic, I encourage you: October 9 and 10, put that in your calendar. The workshop is free of charge. You don't have to be in DC, but if you can make it, I look forward to having a chance to meet you.
And so with that, I'd like to thank all of our speakers, as well as my co-hosts, and especially Kay and Linda and those from the National Academies of Sciences, Engineering, and Medicine who really made this possible. Kay, I don't know if there's anything else for you to say; if not, I'd be happy to conclude by saying thanks, everybody, especially those who attended, and thanks to everybody who participated in the Q&A. I hope to see you in DC or online soon. Thanks, everybody. Thanks, team.