interesting too. So you think about Sci-Hub and you think about piracy, because that's what it is. Why should we even talk about it? Because before open access was out there, the idea was already there: this concept of guerrilla open access. We have to admit that it is part of the reality of open access. When you think about people in different places trying to get access to scientific articles, many of them know the articles are indeed available for free, through unauthorized means. It's not a secret. If you are explicit about what you are interested in and Google something like "how to access pirated articles," you will get an immediate answer. Several answers, in fact, and as time passes, the first results on Google change to reflect whatever site is active at the moment. If you follow this over the years, you realize there have been many different alternatives and websites that have facilitated this. Of course, we know these platforms infringe copyright law and access restrictions, but the interesting thing is that they are used by a very large number of people. The question is: why does this happen? Maybe there is something we can learn from it. The title of this article is really telling. It asks who is downloading pirated papers, and it answers: everyone. This is an article in Science published in 2016. The title is dramatic, of course, but it is backed by actual usage numbers from Sci-Hub and other websites. And as I said, the story is very interesting: in a study from 2022, a sample of academic researchers all over the world was surveyed, and more than 50% of academics admitted using these websites, so they are well aware of them. There is only a small percentage of people who don't know these sites exist; the number of users is very large.
Here, I'm showing you a screenshot of the Sci-Hub website. You can see that yesterday, between 5 and 6 p.m., 126,000 people used the website. That is definitely a lot, if you think about it. Sci-Hub is probably something you are familiar with because of all the legal issues; you have seen them in the news. Sci-Hub hosts material with no regard for copyright at all, and a couple of its legal battles became very well known. In 2015, the website's domain was suspended, but the reality is that it came back almost immediately. In 2017, the creator of Sci-Hub, Alexandra Elbakyan, was ordered to pay Elsevier $15 million in damages. The problem with these and all the legal issues is that the judgments are actually pretty challenging to enforce: if you look at the story, she didn't even show up, there was no legal defense, and the rulings are very difficult to enforce. At the same time, beyond all the legal issues, the surprising thing is that Sci-Hub has many supporters out there. Just last year, in 2023, the Electronic Frontier Foundation gave Alexandra Elbakyan an award. You don't have to try too hard to find more information. If you glance at the Wikipedia page, which I accessed yesterday, you can see that a large number of researchers have thanked Sci-Hub in the acknowledgments sections of their articles. People have also coined different names for the creator, like "modern-day Robin Hood," "Robin Hood of Science," and "Science's Pirate Queen." There are even biological species named after her, and some people believe she will eventually get a Nobel Prize for her work, which is very interesting alongside the award I just mentioned. This kind of acknowledgment to her in articles is something I have also seen in thesis defenses and elsewhere.
It is something that is definitely out there. So what did Sci-Hub do? In plain words, roughly 88 million articles in PDF format were taken by Sci-Hub. Here I have a graph that comes from the website itself: Sci-Hub's database of PDFs kept growing all the way to 2021, up to about 88 million PDFs. Of course, these files are out there, posted by Sci-Hub, but at the same time they are re-posted by many other people; there are lots of different copies of the files. Again, as I said, this is something you can find right away by Googling. There is a website, an archive, with a collection of 99 million articles in PDF form that other people have gathered. Sci-Hub is the famous one, the popular one, but there have been many others over the years that have closed or are constantly changing, and the copies of these articles remain out there, hosted by many volunteers. That is very interesting, and it led me to a question: is it difficult to store 100 million PDF files? Today, with a budget of $1,000 to $1,500, you could buy equipment sufficient to store all of these PDF files, organized and searchable. You could have your own copy of Sci-Hub. Hypothetically speaking; don't do it, that is definitely copyrighted material. But it is striking how things have evolved over the years: this was not the budget you would have needed just two years ago. Hard drives keep getting cheaper and cheaper, and network-attached storage units keep getting cheaper too. Building something like this is relatively easy, which is the surprise here. So why do people use Sci-Hub? There are lots of stories about people needing access because their institutions don't have access to the articles. But you may be surprised that many of the main users here in the U.S.
actually come from students at universities in cities right here, including East Lansing, Michigan, where Michigan State University is. As a former graduate student at Michigan State, I can tell you that we had access to everything: articles, textbooks, many different libraries and services, and the university makes sure that everyone has access to all the resources. As a student there, I can honestly say I was spoiled, because I never had to think about not having access to anything. So why is this happening? If students do have access to articles, why would they use Sci-Hub anyway? The reason is simple: convenience. Why is it convenient? Here is just one example. This is the article I was just talking about, and here is the URL, the address. Sci-Hub works in a way that you can change the address by putting the Sci-Hub URL in front of it. Anything can go after that slash: the DOI, the article's address, many different things. It all points straight to the PDF, and you then download a copy, which is normally what users are looking for. That is what you can read in any online forum where people discuss why they use Sci-Hub. This actually reminded me of something from 2011; I don't know if you are familiar with it. It's the hashtag #ICanHazPDF. This was a really popular hashtag for requesting access to an article: you go to any social media website, post the request with this hashtag, and there will be someone out there who sends you a copy of the PDF, replying to your tweet or something like that. It is just unbelievably fast. People request, and there is someone out there willing to help and send the article. Normally, people also delete the posts afterward to erase the trace, so the request can't be linked back to them. This has actually existed.
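Going back for a moment to the storage question: the claim that a hobbyist budget covers ~100 million PDFs can be sanity-checked with a back-of-envelope calculation. The average file size and the price per terabyte below are assumptions for illustration, not figures from the talk.

```python
# Back-of-envelope check: raw storage needed for ~100 million article PDFs.
# avg_size_mb and usd_per_tb are assumed round numbers, not real measurements.

def storage_needed_tb(n_files: int, avg_mb: float) -> float:
    """Total storage in terabytes (decimal units: 1 TB = 1,000,000 MB)."""
    return n_files * avg_mb / 1_000_000

n_pdfs = 100_000_000      # ~100 million PDFs (the 88M collection, rounded up)
avg_size_mb = 1.0         # assumed average size of one article PDF

total_tb = storage_needed_tb(n_pdfs, avg_size_mb)
usd_per_tb = 15           # assumed price of consumer hard-drive storage

print(f"~{total_tb:.0f} TB, roughly ${total_tb * usd_per_tb:,.0f} in drives")
# -> ~100 TB, roughly $1,500 in drives
```

Under these assumptions the total lands right in the $1,000 to $1,500 range the speaker quotes, which is why a handful of cheap drives in a network-attached storage unit is enough.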
This hashtag was first used in 2011. For those of you who are not familiar, it is related to the "I Can Has Cheezburger?" meme from 2007, and somehow this became a hilarious way of requesting access to a PDF. You can find lots of different articles and blog posts about #ICanHazPDF; it is all about how users look for the PDF file of an article. The interesting thing is that the motivations of the community are utilitarian: for some reason, they feel that sharing articles is actually helpful to everyone else. I had never seen this manifesto before, the Guerrilla Open Access Manifesto from 2008. This is just one of its few paragraphs, but it talks about how students, librarians, and scientists have the privilege of access to all this knowledge, and how sharing it is a duty: you have a duty to share it with the world. And for some reason, this rings true. Even people who are not familiar with the manifesto and are not trying to advocate for anything like it do this naturally; you can see it from the large number of users out there. There is also the Scholar subreddit, which is the same thing: people request an article, a chapter of a book, a thesis, anything, others participate in the discussion, and people do get access. This is, again, one of the many different websites that are out there. Sci-Hub Mutual Aid is relatively new; as you can see, 609,000 requests for PDFs have already happened there. This, again, is something that exists and is one of the options. So what did we learn by looking at all this? One of the reasons Sci-Hub became so popular is that it is easy to use. It is so easy to find a PDF: you know the URL, you know the UI. It is easy to use. That is why it became popular.
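The convenience argument above boils down to a URL pattern: every article already has a standard DOI resolver address, and the mirror's entire interface is "prepend our domain to it." A minimal sketch of that pattern follows; the mirror domain and the DOI are placeholders invented for illustration, not real services or articles.

```python
# Sketch of the URL pattern described above. "mirror.example" and the DOI
# are made-up placeholders; this only illustrates the string manipulation.

def official_url(doi: str) -> str:
    """The standard resolver URL that every DOI already has."""
    return "https://doi.org/" + doi

def mirror_url(doi_or_url: str, mirror_domain: str = "mirror.example") -> str:
    """Anything -- a DOI or a publisher URL -- can go after the slash;
    that one rule is the whole user interface."""
    return f"https://{mirror_domain}/{doi_or_url}"

doi = "10.1234/example.5678"  # hypothetical DOI for illustration
print(official_url(doi))   # https://doi.org/10.1234/example.5678
print(mirror_url(doi))     # https://mirror.example/10.1234/example.5678
```

No search box, no login, no navigation: one mechanical transformation of an address the user already has, which is exactly the convenience the speaker is pointing at.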
Then, as I said, the scientific articles are already available to the public. The damage is done; we cannot deny that. Even though it would be interesting to ask Sci-Hub to erase these files, the reality is that large collections of PDFs existed even before Sci-Hub's creation, and the files have been around ever since. So, open access is a reality, and there are lots of enthusiasts, as you can see from the participation. The other takeaway is that it is becoming increasingly easy to store and organize large amounts of data. There have been many initiatives in the pirate community where these files are organized and metadata is collected. I think there is something there that we can learn from this pirate community. With that, I would say thank you very much. Thank you, Louis. Next I would like to welcome Danny Schultz. Danny Schultz is the Director of Discovery Process Chemistry at Merck. Since joining Merck in 2014, Danny has been a member of the Process Chemistry and Enabling Technologies group, and she became Director of the Discovery Process Chemistry group in 2021, where she leads a group of process chemists in support of Merck's small-molecule and peptide portfolio. Beyond her role at Merck, Danny has gained worldwide recognition for her scientific achievements and leadership, delivering over 20 invited lectureships. Her commitment to collaboration is evident in partnerships with several esteemed academic groups, including the MacMillan, Schindler, and Sarah groups. These collaborations have resulted in notable publications, and her expertise in academic-industrial collaboration is exemplified by her recent Nature Chemistry perspective on the topic. Recognizing the significance of industrial publications in advancing scientific knowledge, Danny actively advocates for their importance and has participated in several open access panels to share insights on how industry might approach open access.
So this is a really great opportunity for us to hear from the industrial perspective on this area. Thanks so much, Danny. Okay, thanks for the introduction. I sent a fresh copy of my slides this morning; so much is happening. So I guess I'll just get started while the slides are coming up. Thank you so much for this opportunity to talk to you today. I feel honored to be one of the few industrial folks invited to this workshop. I've learned a lot over these past few days; I've taken really good notes, and I'm going to take them back to our company and hopefully see how we can work further to advance open access and fair data practices. With that said, I hope that through my presentation today I convey to you why industry publishes and why it matters to hear the industrial perspective, some of the barriers industry faces when it comes to publishing, and how we're starting to think about and work through open access and fair data, both within our company and externally as well. Maybe my slide will come up soon. There it is. Really skinny. Oh, beautiful. Thank you so much. Looks good. Let's see if I can advance. Okay, it works. I think we're good. Thanks. Awesome. Sorry about that, folks. So here's the title of my presentation, and I did want to give a bit of a disclaimer: this is a perspective, my perspective. It does not necessarily represent the perspective of Merck or the pharmaceutical industry as a whole, but I had a lot of fun crafting this content, and I hope it will lead to a pretty productive panel as well. A common misconception we hear is: "I thought industry doesn't like to publish; you like to patent." And while I cannot speak on behalf of the other chemical sectors highlighted here, I can say that overall, pharma does like to publish papers.
And our publishing philosophy at Merck is directly connected not only to our purpose but also to our values. As a company, we really believe in saving and improving lives around the world, and in order to rise to this really significant challenge, and opportunity, we invest in and hold ourselves to a really high standard with respect to the innovation and scientific excellence we pursue. Drug discovery is incredibly hard. It's incredibly complex, and it takes a really, really long time. We generate lots of data along the way, lots of really interesting results. So naturally, it is our obligation and responsibility to do the best we can to share that data more broadly. To help folks understand what the drug discovery process looks like and what we might publish along the way, I want to go through this very simplified drug discovery schematic, broken into three stages: drug discovery, then clinical trials, and then, hopefully, launch. Within the drug discovery stage, you're dealing with tens of thousands of compounds. You're doing high-throughput screening; you're looking at small molecules, peptides, monoclonal antibodies; you're trying to see whether they will engage your target. You go through design-make-test cycles, sometimes for several years, until you eventually arrive at a candidate you want to take into the clinic. It is at this stage that process chemists, formulation scientists, and analytical folks come on board, and we start to manufacture the drug in order to meet the clinical trial needs. And then, if all goes well, we have a drug and we file. But this takes a really long time, 10 to 15 years, and as a result we can only share so much at certain times.
And so that's one potential hurdle to hearing from the industrial community. With that said, I did want to emphasize why publishing throughout this cycle makes sense. Within the drug discovery space, the publications you might see are really innovative drug design and discovery, which includes synthesis. In addition, there are lots of shared problems between the pharmaceutical industry and academia about what you might face during drug discovery. One we're actually encountering right now: canonical amino acids are formally represented by a three-letter code, and those codes are very, very standard. But what if you have a non-canonical amino acid? That is then represented by a HELM code, and different companies like to come up with their own codes rather than harmonizing, which is creating a lot of stir. That's just one way we're trying to work toward a more harmonized vocabulary so that data sharing can become more fluid. And last but not least, in vivo and in vitro studies are paramount within the preclinical space. We like to test everything before we take it into people, but these models are really complex, and they involve living animals. As a result, sharing them in any way possible is definitely a good thing. Within the process chemistry space, as we start to synthesize and manufacture these drugs, we really pride ourselves on developing green and sustainable manufacturing processes, and sharing these more broadly with the community really helps in hopefully curbing any environmental impact these processes can have. And last but not least: safety. We like our employees to go home at the end of the day, and we want others to as well.
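The harmonization problem mentioned above for amino acid codes can be shown in miniature. Canonical residues have one standard three-letter code, so any two datasets agree on them; for non-canonical residues there is no shared standard, so each party's in-house code means nothing to anyone else. The vendor vocabularies below are invented for illustration (this is a toy sketch, not real HELM notation).

```python
# Canonical amino acids: one standard three-letter code, so everyone agrees.
CANONICAL = {"Ala": "A", "Gly": "G", "Lys": "K", "Trp": "W"}  # excerpt of the 20

# Non-canonical residues: no shared standard. These two "company" vocabularies
# name the SAME residue differently and are made up for illustration.
company_a = {"meA": "N-methyl-alanine"}
company_b = {"NMeAla": "N-methyl-alanine"}

def translate(code: str, vocab: dict) -> str:
    """Look a residue code up in one party's vocabulary."""
    return vocab.get(code, "UNKNOWN")

print(translate("Ala", CANONICAL))   # "A": the standard code round-trips
print(translate("meA", company_b))   # "UNKNOWN": company A's code is
                                     # meaningless in company B's vocabulary
```

This is exactly why shared data only stays machine-readable when the vocabulary itself is harmonized, not just the file formats.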
And so for all of our routes, we try to publish as much rigorous safety data as possible, not only to share with other companies but also with the academic community, which lags a little bit behind with respect to safety culture. Over the past few years, there have been editorials and perspectives highlighting why hearing the industrial voice matters. So hopefully you now have a flavor of the types of work we like to publish. At Merck, there are four main reasons we like to publish. The first is that it really helps advance and influence the field of science. We're working with some of the best and brightest in the world, and we have really cutting-edge technologies, so any way we can share this more broadly will hopefully benefit the scientific community. Second, publishing gives us freedom to operate: if we develop a manufacturing process and publish it, we can then use that process around the world. Third, it enables the personal development of our employees: through publishing, they are able to grow their resumes and go out and give talks, and this leads into a positive feedback cycle where we are able to recruit the top talent who then come to work for our company. And our fourth reason is the stimulus for scientific collaborations. As Olaf pointed out yesterday, by sharing what we're doing research on, we start to highlight and elevate problems we think are interesting, which are really fertile ground for the academic-industrial collaborations I'm going to get into a little more. But there are some deterrents to publishing, and I think an important contrast with academia is that publishing is not our top priority. Our top priority is to save and improve human lives through the advancement of novel pharmaceuticals. It is not to publish papers.
And this, I think, is something that's worth mentioning. Because publishing is not our top priority, it is usually something folks do in their quote-unquote free time. It takes a lot of time; people leave the company, and you end up trying to find their old data. So certain projects never see the light of day because people have moved on. And it's also just really complex. Speaking of complexity, the IP landscape is incredibly complex too. Companies have different postures on whether to publish or to patent; some are more conservative than others. I'll get into this in a bit more detail on my next slide. The final two statistics I pulled from this J. Med. Chem. paper, which was put together by some folks at BMS and GSK: roughly 20% of the science in patents eventually ends up in a journal, and only about 5% of the science we actually do within industry is ever published. I'm not actually sure where they got this data from; they probably have really great librarians who did this for them, but the numbers seem about right from what I can gather as well. Diving a little deeper into this J. Med. Chem. paper: it was a really interesting deep dive in which they looked at 23 pharmaceutical companies, looked at where the medicinal chemists were publishing, and pulled seven top journals. Six of them are shown here; I went with six for aesthetic purposes. They tracked the publication count over the past 20 years, and a few things should hopefully jump out. One, not all companies publish at the same rate; there are some large pharmaceutical companies that pursue patents over publications. And the publication count overall is pretty low compared with academic research labs.
But I think the most startling thing in this paper, at least to me, is that when they looked at the total number of medicinal chemistry articles published over the past 20 years, what they saw was a decline of around 25%. Medicinal chemists are not publishing as much as they used to. The authors go into a few reasons why that might be, but the main takeaway is that publishing is declining overall within the medicinal chemistry community. This paper got me really intrigued, so I went to our Merck library to ask how we're doing: how was our early discovery organization doing with respect to publishing? And they did a really great job pulling the numbers. Within our early discovery departments, we have discovery biology, pharmacology, and translational medicine, and what you can see is that there is actually a downward trend in our publications. I would say the huge spike in 2020 is probably from the pandemic, when people had time and were publishing more, so the more representative numbers are probably those from 2021 to 2023. That was interesting, and then I thought: well, what about development, the kind of department I'm in, the process chemistry space? They pulled those stats for us too. Within formulation sciences, analytical, and process research over the past few years, we see a more consistent track record of publications. The overall publication count is lower than what our discovery colleagues have, but we do seem to publish at a steadier clip. Now, many of you are probably looking at the yellow versus blue bars. I was also curious to know: are we publishing in open access journals? And I was really delighted to see that within the discovery space, around 40% of our papers are actually open access, and within the process chemistry space, around 21% on average over these past four years.
And we do this because we feel that hearing the industrial perspective matters, and by making these articles open access, we can have a broader reach. One really nice example I wanted to highlight is a paper published in OPRD in 2017 about the Hofmeister series. It's an old principle, but we applied it to new concepts as a general strategy for salting out molecules you might be interested in. A few things to point out: the citation count is now over 300, it's cited in more than 25 countries, and the article views are very high. At the time it was published, this was among the top 1% of papers published that year, which just shows that when industry does make its papers more accessible, you can see the kind of broad impact that can result. So, when we think about an open access future and how it could impact industrial publications: as Sarah put really beautifully yesterday, we're pioneers, and being able to discover a drug and bring it to market is a huge innovative achievement. Anything we can share along the way is going to be really important. When you think about open access, having broad access to our data for healthcare providers, researchers, and the public is going to be really important. We really hope this accelerates drug discovery and human health, and also prevents reinventing the wheel, people doing things that have already been done before. But one unknown we're just not quite sure of is what the article processing fees are going to be. There was a really nice C&EN article published in 2021, I think around the time Plan S took effect, and as you can see here, the article processing charges vary significantly depending on the journal.
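The budget concern behind the varying APCs can be made concrete with a toy calculation. The annual budget and the APC values below are hypothetical round numbers chosen for illustration; none of them are Merck's actual figures.

```python
# Toy illustration of why APC levels matter against a fixed publishing budget.
# budget_usd and the APC values are hypothetical round numbers.

budget_usd = 100_000  # assumed annual open-access publishing budget

for apc in (1_500, 3_000, 5_000, 11_000):  # illustrative spread of APCs
    n_articles = budget_usd // apc         # whole articles the budget covers
    print(f"APC ${apc:>6,}: ~{n_articles} articles per year")
# APC $ 1,500: ~66 articles per year
# APC $ 3,000: ~33 articles per year
# APC $ 5,000: ~20 articles per year
# APC $11,000: ~9 articles per year
```

Under these assumptions, the same budget covers 66 articles at the low end but only 9 at the high end, which is exactly the pressure that could push a company toward patents, fewer articles, or a cheaper journal that is not the best fit.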
We at Merck have a budget just like everyone else. It is not bottomless; we cannot pay these fees all the time. So we do have questions: if the APC charges are significantly high, will this push some companies to patent rather than publish? Will they produce fewer articles as a means to save money? And will they target a journal with a lower APC even though it may not be the best fit? And last but not least, there is some curiosity about what the data requirements will be. All of this to say, there have been some really great discussions about institutional open access agreements, and I do wonder what those might look like in an industrial environment as well. So, what I've shared with you so far is primarily research we're doing within our walls at Merck. But what about how we engage with the academic community? I would say that numerous pharmaceutical companies are involved in active academic partnerships; at Merck, we have more than 50 that span the world. We do this for three main reasons. First, to build new capabilities, not only at Merck but also at universities; for example, we have established high-throughput experimentation centers across the US and Canada. Second, professional development: it's a really great way for Merck scientists to co-lead a project, and it's great for the PI and the graduate students to see a little behind the curtain about the type of research we do. And third, it's fun. There is a lot of scientific curiosity, things we are very interested in pursuing but simply do not have the time for, and engaging with the academic community is a way to accomplish that. There have been a few articles written about why industry should engage with the academic community more, and if you look at the pipeline over the past few years, it's becoming more and more complex.
You're seeing antibody-drug conjugates, macrocyclic peptides, natural products. These are really beastly synthetic organic chemistry challenges, so it only makes sense that we should leverage the entire chemistry community to solve these problems. A few articles have been written about different models for doing that. One was written in Nature Chemistry in 2018 by a panel from different pharmaceutical companies. And then L.-C. Campeau and I co-authored a Nature Chemistry piece in 2020 called "Harder, Better, Faster." We love Daft Punk, but we also thought it really drove home the point that if we're being asked to develop synthetically harder targets, better and faster, we really need to roll up our sleeves and engage. And I think the crux of our article was really about why we are not seeing more of these academic-industrial partnerships, which was a theme that came up yesterday. While it is incredibly important to engage with the public and do science communication, I also think it's really important for us to engage with each other across different sectors and lean into some vulnerability and some potential discomfort. What do I mean by that? Industrial chemists tend to rely on battle-tested reactions; they may not want to innovate because they want to do what works and, ultimately, deliver the drug. And then you have a professor up in an ivory tower, a little bit out of touch, saying, "why don't you use my catalyst?" And then you have someone, who might be me, saying, "because your catalyst takes 15 steps to make, and we only want to use about 15 steps or less to make the drug we are interested in." We're just not speaking the same language. So I would say that industry needs to do a better job of sharing the problems we are interested in.
But I think academia needs to lean in and ask for feedback about why their inventions are not being used. Over the past few years we've seen more of this leaning into the discomfort, and really awesome research being done. Here are just a few notable publications from the community with respect to academic-industrial partnerships. Those agreements were made between institutions and the pharmaceutical companies, but companies can also engage through these GOALI grants, which were brought up previously. As of January this year, there are more than 300 GOALI grants underway, spanning a variety of different STEM fields. And I know there are also the NSF centers, which are really great opportunities to engage with the industrial community. My last slide, and my last thoughts, are on fair data practices; I don't have any more content beyond this, and I actually don't have much to share here. I think Bob put it really well yesterday: we are developing our own fair data practices internally to try to ensure that the data across different departments and different groups is definitely accessible. It's definitely a work in progress; stay tuned, and I'm more than happy to answer any questions on that in the panel. So, thanks. Great, Danny, thanks for the whirlwind tour. Okay, moving right along, I want to welcome the next speaker to our panel. We're going to move from research to support. Our next panelist is Ye Li. Ye Li is the librarian for chemistry, chemical engineering, and materials science and engineering at MIT. She also serves as a trustee on the board of the Cambridge Crystallographic Data Centre, and in 2023 she served as chair of the ACS Division of Chemical Information. She has also been active in the FORCE11 Scholarly Communication Institute as a member of the program committee and executive committee and chair of the archive committee.
Through research, teaching, and partnerships with researchers and students, she has established her expertise in chemical information literacy and in data management and sharing. She has also been a librarian at the Colorado School of Mines and the University of Michigan; I always get those mixed up, so I apologize to everyone. She has really been very active in exploring the role of librarians in supporting scientists with data management in particular; she has been an active instructor in Data Carpentry workshops, for example. So I'd like to welcome Ye to talk about some of her experiences. Thank you very much, Leah, and thanks everyone for the invitation. It's my pleasure to talk today about how we support the research community with their open science and fair data practices. My apologies that I wasn't able to join you yesterday; I was traveling to Scotland for the International Digital Curation Conference. But I heard that the conversation was really good, so pardon me if I'm repeating some of the things you discussed yesterday. As I'm pretty sure we talked about yesterday, the federal funding agencies' requirements on open science have been in place and evolving over the past two decades. And during this whole time, we have really come a long way in recognizing that open and equitable access applies not only to publications but also... sorry, I forgot to advance the slides, pardon me. I was just highlighting that in the past decade our open science policies have been evolving, and we really have come a long way in recognizing that open and equitable access applies not only to publications but also to scientific data, as well as all the other types of research outputs that enable people to make progress with research.
The policies are also starting to recognize the legal, privacy, ethical, technical, intellectual property, and security issues that may limit access to data and have implications for researchers' practices. It's also recognized that we have a broad range of stakeholders involved, and it's so great to have all of us represented in this workshop, because it really takes a village to do this right. We need to join forces to help translate the policies for the research community so that they can build their research practices on them, to translate them into different types of funding, and to improve the infrastructure behind everything. Finally, we want to provide guidance and support for good practices. All of this will help incentivize researchers in all sectors and reduce the barriers for them to make this happen on the ground. I'll briefly mention a few examples from my part of the world to illustrate this effort. The National Academies actually have another roundtable working on incentives and motivations for researchers in different sectors; it looks mostly at the academic sector, but they do talk about incentives for different sectors to contribute to open science, so I don't want to repeat what they have worked on. I just want to highlight what may matter most for chemistry. At MIT, the first thing that happened, early on, was making sure a faculty-driven open access policy was in place, so that our researchers and students are empowered and supported at the policy level within the institution, and can go out and say that this is required as part of their institutional responsibility.
In the meantime, that allows the library to provide support aligned with the institutional policies. From the library perspective, we are strategizing our investment, originally in information resources, now also in open access publishing funds, in open infrastructure, and in other ways of helping our researchers get their feet wet in this new landscape. I particularly want to highlight some of the challenges chemists face in taking advantage of the open access support happening at this level. From the publishing perspective, the funds provided by the library or the institution may not be sufficient to cover the full cost of journal publication in chemistry, because publishing in chemistry is particularly expensive compared to other fields, as earlier studies have shown, and that was true even before the transition to more open ways of publishing. Second, many of the agreements in place between libraries and publishers may not cover all the major publishers in the chemistry domain. And, as I believe we touched on a little yesterday, the article processing charge (APC) based model may not be equitable or sustainable in many cases, which poses another challenge. Then, when it comes to open infrastructure investment from the library, at this point at least, investment in discipline-specific infrastructure has not been prioritized by libraries or by academia.
Speaking of sustainability and equity for these read-and-publish or transformative agreements: to move the needle a little further, from the MIT perspective we proposed a framework to help us get on the same page about where we are going with these kinds of agreements. I won't go through all the details, but what I want to highlight as particularly important for chemistry is that when we talk about equitable access, we mean not only access for reading and human consumption but also machine accessibility, computational access to and use of the data and the text. We hope that under this new perspective on publishing agreements, those can be considered an essential part, because it is the terms of the license agreements that enable text and data mining and the subsequent machine learning and AI studies. As for the challenges for chemists: we have made a lot of effort to make these agreements happen, but the terms are very nuanced from one publisher to another, and it is a complicated landscape to navigate. The retrieval mechanisms also vary, because the infrastructure differs from publisher to publisher, and we are all at different stages of making them better. One more trend I have observed recently is that some publishers have become a bit more conservative after the hype around large language models drew attention to how powerful text and data can be in these AI-driven systems. That is a concerning trend, and I look forward to more conversations with our publisher partners to find a solution that works for all of us together.
The last part about incentives I want to highlight is this kind of additional external incentive, like the prize for open data MIT recently established. We have run it for two years now, and despite chemistry being a hesitant area when it comes to sharing data, we still had plenty of nominations and winners for this open data prize, and I'm very proud of our community for that. Next I want to dig into the incentive part a little. All these extrinsic incentives, including funds, promotion, and recognition, are useful, but they may only go so far. They may not be sufficient to justify the additional time and effort for chemists to go further in curating their data so that it is more findable, accessible, interoperable, and reusable. I learned this from the researchers themselves. In a study we did with a materials science group, we heard comments like: we spend the time to take care of our data for reproducibility reasons within our group, but it is difficult to justify the extra time needed to further curate the data so that others can use it to publish a pretty good model that only benefits their own academic careers. That is a very real challenge. They went on to say it could be a very different story if the data they shared could be immediately turned into automated design in their lab, through a machine learning or AI model integrated into their workflow. If the research output directly benefited them, advanced their science, and accelerated their discovery with a direct impact on their lab, then they would probably be willing to spend more effort curating their data and sharing it in a more FAIR way.
That is consistent with a more recent National Academies report on automated research workflows: we are at the pivot point of making it happen, of closing the loop between big data and small data. The FAIR data part is critical to closing that loop, because only if people start to dynamically share their data in FAIR ways can these models dynamically use that data to drive this kind of automated cycle. For chemistry, fortunately, it is already happening. The figure on the right shows a recent collaborative publication from MIT. This kind of robotic system, driven by a machine learning and AI based model, is already running in labs: a dye molecule can be designed on the fly, the experiments can be designed and carried out through AI-based algorithms, the properties of the new molecules are measured on the fly, and the data feeds back into the system to start a new round of testing. So it's really happening. But the foundation for that is that we have our own unique language. Yes, sorry, I'll speed up. Because we have that unique language in chemistry, the chemical structures, we have established the standards. And to make those work, we need more quality experimental data, either generated by text and data mining efforts from our treasures, or shared directly by our researchers.
This slide lists some of the standards that have made this happen, including InChI, the IUPAC efforts from the very beginning of chemistry becoming a domain, terminology resources like the Gold Book, and format standards like JCAMP-DX for spectra. All these standards, along with our unique vocabulary of chemical structures, build the foundation for this kind of automated system. So for me as a science librarian and a data librarian, my mission is to provide that guidance and support to researchers in the field of practice. We do that by adding chemistry flavor to the trainings for data management and sharing, and more importantly, we want to provide it to specific research groups with the flavor of their own research practice. To make that happen, we really need embedded data stewards and data champions inside those groups. That is not happening much in the US yet, but I have observed that in Europe a lot of this embedded data stewardship is happening right now, and it has made a huge difference in how individual research groups can save time and build FAIR data practices. I'll mostly skip this one, but I just want to say that we also leverage graduate student effort. Most importantly, there is a minimalist approach to making FAIR data happen before the standards can be integrated into our infrastructure: we can at least share the documentation about the data, the code for the data, and the identifiers for the data, with a simple README file, so that when the infrastructure is ready, this information is already shared and ready to be leveraged. My last part is about the community-of-practice efforts we are trying to build here, the communal type of effort that really makes things happen. I'll skip this one.
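As a rough sketch of that minimalist approach, a simple README can carry the documentation, code pointers, and identifiers alongside the data. The field names and contents below are purely illustrative, not a prescribed template:

```text
Dataset title:  Photoluminescence spectra of candidate dye molecules
Creators:       <names and ORCIDs>
Identifier:     <DOI or accession number, once minted>
Description:    What was measured, on which instrument, and why.
Files:          spectra/*.dx    (JCAMP-DX spectra)
                structures.sdf  (chemical structures)
                analysis.py     (code used to process the spectra)
Abbreviations:  PL = photoluminescence; <others used in the data>
Related work:   <publication DOI, grant number>
License:        <e.g. CC BY 4.0>
```

Even this small file makes the dataset interpretable by others now, and machine-harvestable later once discipline-specific infrastructure catches up.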
That includes further efforts to help our researchers learn more through different workshops, but also to help our data curators understand more of the chemistry-specific knowledge about how to curate chemistry data properly. And this is what I feel most proud of recently: Leah and I made the connections so that researchers can work with communities of practice like IUPAC to convert our old treasures in the reference works into structured data. All this structured data comes with the curation process and the code people used, and that truly makes a difference for reusability, interoperability, and further collaborative improvement across the community. You can find more examples in this cookbook, which Leah has also led the effort on. Last but not least, I'll actually skip this one. On industry partnership and collaboration, I really appreciate what Danny highlighted, and from the academia side we try hard to build those partnerships too. For example, this consortium, which I believe some of our industry partners also participate in, allows our researchers to build investment partnerships with industry, give the partners some return on their investment in the form of insights and collaborative opportunities down the road, and at the same time have an open means to share the contributions from the academia side while still leaving the competitive edge to the industry partners. I'll end my talk with this quick slide. I saw this at the airport yesterday when I came in; it's an advertisement for a particular data management system, and it says: focus on the science, not the data management. But I want to argue that data management and sharing is really essential for advancing the science, so we should spend time and effort on it too.
I'm glad I can contribute to supporting that kind of effort. Thank you very much; I'm very out of time.

Thank you very much for that, another whirlwind tour of all the things that are going on. This is a very exciting time. One more speaker. Thank you, Shannon, for filling in our last spot. We'll have another perspective on the additional library services being made available. I think our goal here is to really emphasize how many activities are happening and how many different partners and stakeholders can facilitate this collectively for the research community. To help support that perspective, I'm inviting Shannon Farrell to the podium. Shannon is the research data services lead and the director of the digital repository at the University of Minnesota. I believe she's also a lead on the Data Curation Network. Before becoming a librarian, she spent over a decade working on large-scale, data-intensive research projects in the fields of ecology, animal behavior, molecular biology, systematics, entomology, and sustainable agriculture. Wow, this is great; I love all the things you can do with data-driven research. She leads the University of Minnesota's research data services team, a group focusing on campus-wide education and consultation around data management and data sharing. So thank you so much, Shannon, for bringing your experience.

Yeah, so I'm going to start off by saying I am not a chemist, but I do work with a lot of chemistry researchers in my role at the university. My talk today is really going to focus on the University of Minnesota perspective, hopefully to serve as a guideline for what other institutions could be doing. As Leah just said, the research data services group that I'm in charge of at the University of Minnesota does campus-wide education and consultation around data management and data sharing.
We also run our institutional data repository. Okay, so Research Data Services, although housed mostly within the University Libraries, is responsible for coordinating campus-wide education around data management, data curation, and data sharing. We do have many campus partners, however, including LATIS, the Liberal Arts Technology and Innovation Services. We also work closely with Sponsored Projects Administration, the Institutional Review Board, Technology Commercialization, the Office of Information Technology, the Data Science Institute, and several others. The role of RDS is to develop and implement data services and education for faculty, students, and staff. We are actively involved in national and international conversations around data management and data sharing, which includes responding to requests for information from the federal agencies, as we have been doing for several years, to share the University of Minnesota's perspective and the needs of our research community. Okay, so what do I mean by data services? As I mentioned, our work is centered on data management, data curation, and data sharing, including helping researchers figure out how best to preserve their datasets for the long term. What does that look like in practice? It can be one-on-one consultations with researchers, reviewing researchers' data management and sharing plans, a variety of group instruction activities, and various outreach events, such as during Love Data Week, which was just held around February 14th, or Research Ethics Week. Now I'm going to spend a little time talking about what our data management instruction looks like. We do sessions for large groups on campus, including departments, centers, or specific research interest groups.
These have covered a wide range of topics, but most recently have focused on how to write a good data management and sharing plan, and on the data sharing expectations of various funders. We also conduct sessions for smaller groups, such as an individual research team or lab, where we focus on topics such as how to manage data together as a group, various storage and backup scenarios, or how to manage data during a research project. We also teach directly within courses, usually upper-level undergraduate or graduate courses, most often on general data management concepts or on particular aspects of data management and sharing, such as human subjects data, as one example. On an annual basis we also provide data management boot camps for early-stage graduate students, meaning they're in year one or two of their degrees. These boot camps focus on data management basics: file and folder organization, risk management, storage and backup, and how to provide good documentation. Beyond this, we try to teach different topics based on what we've identified as current needs. Last year, for example, we covered citation managers, workflows and tools for backing up data and versioning, information around federal data sharing mandates, and data publishing. We've been providing these boot camps since 2013, when that original OSTP memo came out, and we've only seen the need grow. One thing to point out here: these sessions used to be in person, but we switched to virtual-only in August 2020. Red on this graph is the number who registered and yellow is attendance; there's always some attrition. In 2021, we also offered a boot camp specifically for graduate students in the College of Science and Engineering, which includes the discipline of chemistry.
This was based on an identified need for these students and high enrollment of students from these departments in our general data management boot camp. You'll notice a theme similar to the general boot camp in what was covered, with some additional focus on lab notebooks and physical samples. It was well attended, with 42 students from all stages of their graduate school careers in that college. Now I'm going to switch gears a little and talk about our data repository. This is what DRUM, the Data Repository for the University of Minnesota, looks like. We're free and open access, and any University of Minnesota affiliate can deposit their data here as long as it fits within our guidelines, which are mostly around size and human participant data. We are not equipped to handle big data at the moment, and we do not offer any restricted access, so we have policies around private and sensitive data. We also require curation of the data, which means a subject-specific expert will look at the data to make sure the files open and the data is adequately documented so that someone else can reuse it. That being said, much of our work involves referring researchers to other repositories. If you look at guidance from, say, the National Institutes of Health, they often say that researchers should look for discipline-specific repositories when trying to identify a place to house their data, and this is guidance we also give our researchers. DRUM in many cases is a last resort: if a disciplinary repository exists for your kind of research, that may be the best place for your work to be deposited. The University of Minnesota is also the financial home for the Data Curation Network, and Mikaela Narlock, the director, is employed by the University of Minnesota. The Data Curation Network is, as the name implies, a network of institutions, 19 in total, that comprises over 50 data curators.
The Data Curation Network is important to know about because it contributes to the national conversation around data curation and data repositories. Essentially, it connects data specialists to knowledge that allows them to support researchers. An example of this is the database of primers they have created. One example that applies to chemistry researchers, which we just talked about in our last talk, would be the mass spectrometry primer, and you'll notice a familiar face as one of the authors. Sorry, I've got to move on here. All of our primers are public on the web on GitHub. They all have a similar structure, part of which is highlighted here, and I know it's hard to see, so I'm sorry about that: each primer describes common file formats, known repositories, recommended open formats, how to convert files, and so on. For curators like myself, these primers help us understand the common types of data found in different disciplines. As I mentioned, the DCN comprises 19 member institutions, and we all serve as resources to one another. What this looks like in practice: if, say, the University of Minnesota receives a dataset for DRUM that we don't have the expertise or time to curate, we can send it to the DCN, who will work to find a curator from a member institution to curate it for us. They will look at the dataset, open the files, and make recommendations for ways to make the dataset more FAIR, that is, findable, accessible, interoperable, and reusable. Our local curator will then pass those recommendations on to the UMN researcher. It really is a wonderful asset for us to have access to the DCN. Another output and contribution from the Data Curation Network is the CURATED model of data curation. CURATED stands for check, understand, request, augment, transform, evaluate, and document. This is the model we follow when curating datasets at UMN. We check the files to make sure they open.
We read all of the included documentation and try to understand the data. Then we request missing information or ask for changes to the dataset. We augment the metadata by adding DOIs. We transform file formats for reuse using tools, such as converting Excel files to CSV. We evaluate the dataset for FAIRness by looking for licenses and so on, and then we document all the actions we took by creating a curator log. Now I want to talk about how we have worked with chemistry researchers in DRUM, starting with our work with the UMN Materials Research Science and Engineering Center, which we call MRSEC. MRSEC requires that data from the center associated with publications under a certain grant be uploaded into and archived in DRUM. We worked with them to create an easy-to-follow workflow and to establish a collection so that all MRSEC datasets are housed together; both the collection and the workflow are linked here. So what does that workflow look like from the researcher's perspective? We ask them to follow a few steps. First, we confirm that they can publish the data, that it isn't owned by someone else or trademarked, etc. Then they're asked to locate all the files associated with the publication and make sure they open. We ask them to make sure the files are described, to create JPEGs of ChemDraw files, and to organize the files into a logical directory structure. We then ask them to download a copy of our README template and fill it out. The README asks for a description of all the acronyms or abbreviations that may appear in the data, a description of the data collection methodology, a description of any relationships between files, and the software that would be required to open or access the data. The directory structure that was generated is also pasted into this README.
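Two of the mechanical steps described above, pasting the dataset's directory structure into the README and keeping a log of curation actions, are easy to automate. Here is a minimal sketch using only the Python standard library; the function names and file layout are illustrative assumptions, not DRUM's actual tooling:

```python
from datetime import date
from pathlib import Path


def directory_tree(dataset_dir: Path) -> str:
    """Render the dataset's directory structure as indented text,
    suitable for pasting into a README."""
    lines = [dataset_dir.name + "/"]
    for path in sorted(dataset_dir.rglob("*")):
        depth = len(path.relative_to(dataset_dir).parts)
        name = path.name + ("/" if path.is_dir() else "")
        lines.append("    " * depth + name)
    return "\n".join(lines)


def log_action(log_file: Path, action: str) -> None:
    """Append one curation action, stamped with today's date,
    to a plain-text curator log."""
    with log_file.open("a") as f:
        f.write(f"{date.today().isoformat()}\t{action}\n")
```

For example, a curator might call `log_action(Path("curator_log.txt"), "Converted plots.xlsx to plots.csv")` after each transform step, so the final log documents everything that was done to the dataset.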
After these preliminary steps are complete, they fill out the DRUM upload form and upload their data. They then proceed through the curation process and work with their assigned curator to finalize the dataset. Once curation is complete, they receive a DOI to add to their manuscript. The workflow on the DRUM side starts when the author deposits the dataset: our DRUM coordinator looks it over, accepts it, and assigns it to a data curator. The curator runs through the C-U-R steps of the CURATED workflow and then emails the author with questions and recommendations. The curator then waits for a response from the author; sometimes that goes quickly, other times it can take weeks or months. Once the author responds, the curator can finish the A-T-E-D steps of the CURATED workflow: they make any necessary changes, upload those changes, edit the README, and mint the DOI. The curator then sends the author a final email noting the changes that were made and telling them curation is complete. We have a similar process in place at the University of Minnesota for the Center for Sustainable Polymers. They have this nifty data sharing timeline that they use to tell researchers when and how to use DRUM and how it fits into the publication and grant reporting process; you can see on the timeline where you have to submit to DRUM. This is what a finalized dataset looks like. You can see the file types at the bottom, and the DOI is on the slide if you want to look it up. I wanted to use these examples to illustrate that even though there was initial resistance from chemistry researchers, and there was, we have been able to generate buy-in with some high-profile chemistry research centers on campus, and these centers now serve as advocates.
The way we were able to do this was to make the process as easy as possible for them: they have detailed instructions and a logical workflow they can follow, and with the Center for Sustainable Polymers there was an assigned administrative or research support person who could assist with the actual uploading. Overall, we're trying to create a culture of sharing and to train the trainers, so that sharing is just another expectation. I wanted to close my talk by discussing a little of what we've learned and what challenges we still have. Starting with the positive: we're all figuring this out together, and we are working through it. We've discovered that it is possible to share and disseminate your work and data openly, and it definitely gets easier once there is a workflow in place; it becomes habitual. The workflow means less work both for us as curators and for the researchers, and fewer back-and-forth communications or unexpected questions. These workflows have set expectations for everyone within the centers, and there's something that can be pointed to for new hires and graduate students in each of these centers. We view this activity as growing the next generation of scholars, who will not even question or resist sharing their data, because it is just part of the normal process; it's commonplace. On the other side, there are still challenges. In the last year we have received numerous questions from researchers about storing and depositing big data, which in our case we define as hundreds of gigabytes or terabytes in size. This is not something our data repository is currently equipped to deal with, although we are working on it and thinking about it. On our end, we have concerns about long-term storage costs, about whether we would eventually have to deaccession or remove datasets, and about what that access would look like for end users.
Right now we are promising researchers 10 years of preservation. Can users on the other side feasibly download such large files if they are at less-resourced institutions? That's something we're thinking about as well. We've also fielded questions about providing restricted access; again, that is not something we are equipped to do, as our repository is completely open. There have been similar questions about data that is going to be patented or has other intellectual property concerns attached. That being said, more guidance is coming out around that, specifically about what you need to write into your data sharing plan and what kind of timelines should be followed. Finally, as a small but mighty staff of data people at UMN, we are concerned about the expected growth of need for our services on campus. As it stands, we're only working with a small number of people and departments within the chemistry discipline, for example, and we anticipate seeing more datasets deposited in DRUM and more requests for consultations about which repositories researchers should use and how to go about curating their datasets. And that is it for me. I just want to acknowledge all the wonderful people I work with at the University of Minnesota who contribute to this work and who looked through these slides and helped me. So thank you.

Okay, welcome back, everybody. This is our final panel discussion, and we welcome all of the speakers from this last panel to the floor to answer your questions and engage in conversation. In the room we have two microphones, and we already have some folks lined up, and we've already had some great questions coming in through chat. I'm going to kick us off: if you can speak really briefly, fast-forward 10 years. What does this space look like from your perspective, given the directions of your projects and your professional work? Sorry, let's just go this way. Okay, that works.
If I think about it: 10 years ago, getting all the information you want on a cell phone was probably not something we were thinking about. And if we think about 10 years from now, there will be technology we are not even aware of today. One of the most fascinating things I have seen recently is being able to connect to ChatGPT with your thoughts; that's one I found absolutely amazing. So I can only imagine what that will be in 10 years, accessing knowledge and a lot of information just by, let's say, googling something with your thoughts. I think the future is going to be really interesting, but also terrifying.

Yeah, I was going to go along the ChatGPT lines as well. It's something I didn't have the opportunity to mention during my talk, but we use electronic notebooks, and we do a lot of high-throughput experimentation where we're generating reams and reams of data. We try to connect that data into our notebooks. How are we ultimately going to be able to search that, today, tomorrow, 10 years from now? I hope in 10 years we're able to use ChatGPT or some sort of AI to just put in a scheme and have it pull up all of the reactions that have previously been done, so you can extract the data and reuse it. That's what I'm hoping the future holds.

My thought is also along that line, but I want to emphasize my hope for the interoperability part. When we break the silos, things can really happen in the way we're hoping. Going back to what Luis mentioned earlier, the ease of use of Sci-Hub is really rooted in the fact that it broke all the silos, although not in a legal way, and the underlying interoperability really made things happen.
If in 10 years we could have our systems fully integrated, including electronic lab notebook systems, or what people now sometimes call research data management systems, if all those things have underlying standards in place, and if all of our researchers have good practices for sharing their data in a FAIR way more broadly, then eventually that kind of system can become reality. In addition to the large language models, we do need that quality experimental chemistry data to power those systems. So I hope, in 10 years. I'm really going to bring it back to a university perspective, just thinking about our campus more specifically. As Elaine said yesterday, there are never enough people; we really need to ramp up our staff. One of our goals this year is to reach out more across campus to other people that are doing data-centered work. We know that there are lots of silos across campus, we know that there's a lot of duplication of work occurring across campus, and so we're really trying to find all of those partners within the research process. That's why we work with the IRB and sponsored projects administration, just to get the pulse of what's going on so that we can meet those needs. But just thinking about our data repository, we want to grow it in ways that actually meet the needs. As I said in my talk, there are a lot of times where we have to turn people away because our technology doesn't do what we need it to do, and I don't want that to be a barrier anymore. And finally, just being here and learning about all of the international efforts that have been happening, I really would like to be more involved in that. I think we can learn a lot from each other and start establishing really good, we always say good practices, not best practices, at our institution. So yeah, that's my answer. Awesome. Thank you for the practical viewpoint there. Now let's move to the floor. Marty. Great.
Thanks so much. Marty Burke from University of Illinois. Thanks so much to all of you. It was a really fascinating session. My head is kind of spinning thinking about if you play this all out, right? Imagine everything that was talked about this morning synergizing to create new things being possible. I think it's really interesting to go there. So, kind of consistent with Leah's question: think about it, there's eight billion of us on the planet, and maybe the one thing we can all agree on is that everybody wants to be healthy. And we think about the power of medicines to help make the world a healthier place. Maybe I'll start this question with Danny, but it'd be interesting to hear everyone's thoughts. Can you imagine a way that everything we've been talking about comes together to create a world where discovering new medicines is everyone's business? We could find tomorrow's medicines together through a kind of democratized drug discovery initiative. And how might we actually make something like that happen? Yeah, it's a really great question. And it's something that we've been thinking about and trying to engage in whenever possible. There are a lot of diseases out there. Every company has to prioritize their own pipelines, their own portfolios. And so obviously we can't do it all. And so as a means to broaden the impact and reach, if we're able to engage with the academic community or with other practitioners around the world, that would really be enabling. With that said, we have been part of a few different partnerships. With the Bill and Melinda Gates Foundation, we've leveraged that collaboration to bring forward new malaria drugs. That's been really successful. I think another area that we are interested in is the antibacterial space. A lot of companies have walked away from antibiotics. However, resistance continues to grow and our toolbox continues to get smaller and smaller.
And so I know that there are different acts going on in Congress. I think the PASTEUR Act has been kicking around for probably 10-plus years as a means for the government to help fund and kick off some of these drug discovery efforts within the antibacterial space. So I think it's totally possible to have it done. For the group that I work in, we do a lot of peptide research. Peptides are made out of amino acids, which is a beautiful way to collaborate with the academic community in parallel, where you have them working on amino acid chemistry. They have no idea what the composition of matter is, but we're able to leverage that community to help us design the amino acids that we're interested in and bring those tools into our company in order to advance the peptide therapeutics. Thanks, Danny. May I add one more aspect to that? When it comes to health science data, I guess one thing that is important is people's privacy and the ethical use of the health science data. That's another part where I feel the academic sector and industry can really collaborate to figure out how we use that health science data, along with the chemical data, in a more ethical way, so that everyone is protected but we also make progress together. So as people may or may not know, the NIH is requiring data management and sharing for all of their grants now. When that came about, in January 2023, there was a lot of, and I'm going to use the word, panic on our campus about what that meant for the health sciences researchers. And what we've found now that we're a few months out is that it actually is possible: they are sharing their human subjects research. And I think that is just going to be another thing that we have to work through. It's de-identified, of course, we are maintaining privacy and confidentiality, but it is happening.
And so I think that will just pave the road for this sort of thing alongside industry. I think the idea of a crowdsourced drug discovery process is fascinating, right? It's something that you would not have thought about many years ago, but, as we were saying in the discussions yesterday, with this idea of a micro-release of information, if you have, let's say, a particular reaction to test, and you can have many different enthusiasts participating in different places, testing the reaction with many different conditions, and everything is done at the same time, all this data can lead to the discovery of a new transformation like that. I don't think there are any examples of crowdsourced research results yet, but it's probably something that will happen very soon. Awesome. Thank you. Great. Thanks for that. I have a follow-up question on chat related to this. You mentioned, Shannon, the public access policies, and we have this potential of research directions and collaborations with industry and government labs. Any further thoughts on how those policies are going to impact those collaborations going forward, especially if the mandate has to apply to all authors, for example? Looking at you, Danny. Go ahead. I'm not sure, to be honest. I'm not part of those circles. However, I'm sure that those discussions are definitely taking place. I think when we've engaged in collaborations with the academic community for science, for chemistry, I would say that it is frequently the academic institution who is the more conservative party when these agreements are being authored. And there's a lot of concern about IP and licenses, where industry just wants freedom to operate with whatever inventions come out of these agreements. So the agreement process can take a really, really long time because of that. And it's usually because of the academic institutions.
And so I would say that if we are going to engage in more open access drug discovery, all hands on deck in building these teams, I think it's going to require industry and academia to come together on what terms are reasonable in order to pursue that, to collaborate, and to make the data as accessible as can be. We don't want to hinder the availability of our data. We currently have processes in place for folks in the healthcare industry to reach out to our company and access our clinical trial data if it is not otherwise accessible. We have a whole workflow for that. And so I would imagine that if this were to proceed forward, we would just build more workflows in order to get that done. I would think that the mandates are made to create a change, and what people will actually end up doing is a mystery. In the next few years, we will see the result; for now, we don't know. Yeah. I just want to add that, from my perspective, this is not a binary choice. There are ways to really make tiered sharing happen. I didn't have time to go through the details of one of the examples I mentioned for industry-academic collaboration. So it's definitely respected that everyone needs to have the protection of their intellectual property, while in the meantime also sharing as much as possible. Our researchers are on that route already. For example, if it's a machine learning study, they find ways to share the part of the model that is trained on data that is publicly accessible. But the part of the model trained on in-house data from industry, or on data that cannot be broadly shared, can be much more protected and become available only to those sponsors that support the research with their private funds. So there is a balance between how much can be shared and how much should be protected.
I'm optimistic that as long as everyone keeps an open mind, we can still make progress collaboratively together. And there is a time limit on what cannot be shared. Later on, as time goes on, just like with patents, these things can at some point be shared, as long as they were handled in the proper way at the beginning, with an embargo in place. Yeah. I totally agree with what you said. When we have been working with researchers through their data management and sharing plans, there have been various ways of going about protecting data that can't be shared right away, such as putting it in a restricted database where they can allow a specific point of access to individuals. So it's not just, oh, it's open automatically and it's open to everyone; there are other repositories that exist for that purpose. And in terms of technology commercialization or IP, we have seen quite a few researchers just write in that after a certain time period they will release the data, but it still gives them the ability to patent or trademark or do what they need to do. So, as we always say, there's no one answer. There are lots of answers. It's a very gray area and you have to think outside of the box all the time, because there's no prescriptive advice, but there are solutions. Thanks for that range of perspectives. One more follow-up on this before we get back to the floor: licensing. This was raised on Zoom and I know it's probably a question a lot of people have. Any thoughts about CC BY or other kinds of licensing as part of the accessibility and managing what to share? Jump right in, anyone. From the library side, CC BY is the license that we usually advise people to use. It means anyone can use your data, and it asks for attribution. Of course, we can't enforce it. It's not binding in that way, but that is the purpose of it. A lot of people are very concerned about, you know, I spent years collecting this data.
It's mine. I should be able to trademark or copyright it. But this is one way for us to get around that, since copyright law does not really agree. That's all I have to say. Yeah. Licensing. I agree with Shannon that the enforcement part is really difficult, especially when people use aggregated data to build computational models. For now, it's almost impossible to recognize contributions from different parties in the traditional way we look at citations. Even if there is a CC BY license, it's really hard to count those contributions. But some of the researchers on our campus are seeing that problem too, and they are doing some research to see if there are technical ways to help with that. In the meantime, as a practice, I guess the recognition of people's credit for contributing data should really become more of a philosophical discussion among ourselves. Yeah. Another part of the conversation is the idea of scooping. It is a real concern. Looking at data and looking at the publication are quite different in terms of what people consider as scooping. So the licensing part may or may not help with that aspect. It really becomes an ethics discussion very quickly. All right. Let's go back to the floor. So this question is going to run sort of the other direction from these concerns about IP and licensing. And this may be a naive sentiment or question, but might there be an illustrative example provided by the tech industry and their embrace of open source? Although it wasn't always the case, eventually actors in that space really adopted it and it became standard practice for the tech industry, where they actively work in the open source sphere and reap great efficiency gains from that, and then focus on the actually distinguishing aspects of their product when they're trying to market a thing.
Arguably, there might even be a greater opportunity here in the case of chemistry, in that you've still got the tech industry out there making all of those open source products. There's the opportunity to focus on the non-differentiating aspects of processes and products from these companies and to develop those in a sort of open source way, and to have those benefits reaped by everyone. And I guess my question is, why hasn't this been more actively adopted? For example, openly sourcing processes, standards and workflows, and what might be some of the windfalls that could be reaped if this were engaged in actively? I'm going to say that open source software is an amazing thing. Back in the day, you would have companies writing a particular program or piece of software, and then open source was there on the side. And I think the big difference is that with code, there is a pretty specific thing that is being shared, which is the code. So it's something that can be put out there, and there is a particular language and a way in which it's done. But with science, it's a little bit more challenging, because sharing data cannot be done quite the way it was done with software. Also, in terms of software, just one example would be Microsoft. Microsoft is behind lots of different open source projects. At the same time, they have their regular programs, but they're also contributing to the community. But in our case, the case of science, it's not the same thing. There is no transition in which closed and open approaches coexist. In our case, the change is much more dramatic. I don't think the same thing could apply. Thanks, Luis. And I just want to add my probably very shallow perspective on that. So open source really worked for software.
One of the reasons is that people who build upon open source software could get their return on investment through providing service-based maintenance or customization or other ways of providing service. But when it comes to data and also publications, we are yet to find a way to define those value-added services that would allow a sustainable way for people to keep contributing back to that dissemination system. So that is the challenge, in my thought. I think we are making progress in many senses, again, if all sectors work together, to try to define what's a value-added service that can allow this system to really self-sustain. And that's the key to getting there. But thanks, it's a very good question. To build upon that, I think open source is interesting, and we do our best to share what we can with the community. But I think developing a piece of software is arguably simpler than developing a drug. And so even if we're able to share one aspect of the process that we use to manufacture a drug, in the larger picture of the drug development process, that's just a very small window into it. And so I would say, sure, we can do that, but it requires multiple departments, hundreds if not thousands of people, to ultimately bring something to the end, and a lot of different scientific expertise and clinical expertise, medical doctors. So I mean, there are things that we can share, and we hope that people can leverage our technologies when they can. But within the bigger picture, it's a lot more complicated. Yeah, Alejandro Strachan from Purdue University. So the discussions yesterday and the presentations show that there are still open questions and concerns about the sustainability of data repositories, and even open access publishing. And when we look for solutions, we tend to look at the federal government, and the examples yesterday were all supported by the federal government.
And so the first part of my question is, what's the role of industry, which can actually benefit from having these open resources and has orders of magnitude more resources than the federal government and the funding agencies? And maybe to Danny specifically, I wonder, so you showed us the fraction of papers that were open from your company. I know you're the only representative of industry; I don't want to put you on the spot, but it looks to me that companies necessarily, as you mentioned, publish less than academic labs and national labs in terms of their output, for obvious reasons. So it seems to me that it would be entirely to their advantage to make and support open access and pay publication charges. So I wonder why they're not taking more of a leadership position in open science, open access science. Thank you. I can start. So, like you, I don't want to speak on behalf of all of industry. A point that I tried to make is that publishing is not our top priority. It doesn't pay our bills. And so while we do it, because it is part of our values, it's our obligation to inform the community about what we are pursuing and to share that information. I would say that we have budgets just like every other institution, and within that budget, only a fraction is for publication support. And so I think the fees incurred through open access, while some departments are more than willing to pay them, every department is balancing their own budget, and they are weighing various factors. Our department is more open to open access, no pun intended. But for other ones, and I don't know if I mentioned this or not, someone in a different department last week wanted to publish their paper open access. And they were told no, because the fees were too high; that money could go towards sending someone to a conference, or it could be used for other potential research applications as well.
And so I would say that the value of industry making more of their publications open access is tremendous. And that's why I tried to emphasize that industry already is reluctant to publish as it is. And so the more barriers that go up to make it more expensive, more difficult, more restricted, I think the fewer industrial voices you're going to see, which I think is a real shame. So I just encourage publishers to have industry at the table when these discussions are being had about APCs and these open access agreements, and maybe industry could engage in these agreements to try to find a way to make sure our research is more accessible globally. Yeah, thanks, Danny. And I wanted also to add that how direct the benefit is to the industry sector can be a very determining factor. And I see that on the data front it's rather easier to see that direct benefit. Our industry partners actually see that, and they are taking the leadership role in some of these areas. For example, the Pistoia Alliance is one of the consortia where pharmaceutical companies get together to leverage open data, contribute back to open data, and partner with the government sector, like PubChem and so on, to make it better. So there are areas where industry is taking leadership on the open part. We just need better ways to show that direct benefit and direct impact. So I hesitate to say this, but from a librarian perspective, I would say a lot of our values shape what kinds of repositories we steer people toward, and we usually veer away from the commercial ones. And so I think that would just be something that we would really have to consider. The reason being is that, you know, if a commercial entity decides that they no longer want to do it, what does that mean for the data that's being stored there?
Do they have timelines in place and preservation guidelines that say they will be keeping it, indefinitely or for 10 years? Or is it more that if they decide not to do it anymore, everything there is at risk? That's something that we think about all of the time. That's it. I just want to thank you, Danny, for sharing the fact that publishing open access was simply more expensive than just publishing the normal way. I think that's an important thing that we should consider. Oh yeah, it's a real concern. And I would say that across industry as a whole, budgets are indeed different. Every company has their own R&D budget; some value it more than others. You know, I think one type of industry that we haven't heard from, which is doing really awesome research, is the biotech industry, where they're kind of at the cutting edge of science. And I think they definitely probably do not have significant time or money to spend towards open access efforts. So that's another voice that isn't being heard, because we're all just trying to do the best we can to get our jobs done. So just a little bit of a follow-up question, related to something that came in on Zoom around this topic: what are other ways that industry and other sectors can contribute to supporting open access and sharing, such as helping develop platforms that others can use, and contributing in other ways? Thoughts on that? So there are several industrial consortia that exist where companies are coming together in this pre-competitive space to figure out what we can share, the best practices, how to formally share this information with the broader community. And so I think the IQ Consortium, as it's colloquially called, is a real potential avenue to explore that.
Great. Thanks, Danny. And the Pistoia Alliance was also mentioned, and I think there are others, like Allotrope, so that's helpful to appreciate that industry connection through those groups as well. Want to go back to the floor? You've been waiting a long time. Thank you. Ralph House from the University of North Carolina at Chapel Hill. This is predominantly, I think, for Yi and Shannon and any other librarians in this room. I'm curious what kind of interactions you all have with your Vice Chancellor for Research offices. At Chapel Hill, they're sort of leading the charge in this space. So I'm just curious, in your institutions, what kind of interactions there are between the VCR office and the library. So I mentioned sponsored projects administration. That's actually part of the OVPR, the Office of the Vice President for Research. They actually just changed their name to the Research and Innovation Office, so RIO. But same thing. We meet with the director of sponsored projects administration on a bi-weekly basis. So we are very closely aligned with them. We know what's going in and out of there. Her boss is the head of OVPR. So we partner on figuring out who needs to know about what, coming to talk to the Council of Associate Research Deans, and so on and so forth. And that is a partnership that has arisen just within the last year. But it's been very valuable, and I think on both sides there are a lot of things that we wouldn't have been able to get out to campus without having that line of communication. Yeah. For our campus, I guess what really is fundamental is that our library director and leadership have worked very closely with the faculty governance body from the very beginning to make sure that all the policies and directions and all the important, critical decisions are coming from faculty members.
Even though the library is the one responsible for making it happen, all these strategic directions and decisions come from faculty-driven task forces and committees. So that's the first layer. And then, I'm really stepping out of my comfort zone because I'm not involved in the leadership part of the conversation, but our director's close relationship with the university leadership, making sure that we are reflecting what the research community wants to do, is definitely the key foundation. And as Shannon mentioned, we also work with other campus units at different levels. At MIT, we work with the general counsel, with the office for sponsored programs, and with the newly founded office of research data and computing. We are trying at different levels to break the silos. All of our larger institutions are very decentralized, and we are trying, again, to break the silos at different levels, but having that foundation from the beginning, with faculty-driven decisions, is kind of the key. Okay, thank you for that. Olaf? Actually, before my question, I wanted to follow up on a question that was asked earlier about open science and the analogy to open source. There was actually such an initiative, I want to say about 14 years ago, around a compound called JQ1, where there was an initiative for open cancer research. If you're interested, there is a TED Talk by James Bradner on open cancer research and JQ1. And actually, I think Harvard Business School might even have had a study out there. What they basically did there: this was the first bromodomain inhibitor that was published, and they went to some CRO and had like a pound of it made, and everybody could write them and would get a sample of it, free to use, free to operate, do whatever you please.
And the impact of that in the literature was tremendous, because there was a back-to-back paper by GSK, also a bromodomain inhibitor. If you look five years on at the citations between the GSK compound and JQ1, it's like a one to ten ratio. Okay, anyway, that was not what I wanted to ask you about. We talked a lot about sharing of data, and I think it was Shannon who said that there are a lot of gray spaces and you have to figure out each case. How about sharing the agreements themselves? In other words, the models that enable the data sharing. I have in my time done quite a few of them, and they all start with the clause that under no circumstances can you make the agreement public. It would be so much easier, particularly for a center like the one I'm heading, to say, well, here is the agreement with company X; can we use that as a template for company Y? And the answer is apparently no. So what are initiatives that could facilitate these kinds of agreements by saying, here are a few examples of how this could be done? And yes, things will probably have to be changed and massaged and timelines adjusted, but it seems to me it's a huge waste of time that everybody starts at square one. So what are initiatives? What are things we can do to allow the sharing? Yeah, that transparency part is really critical, and I was trying to highlight one of the things MIT is proposing, that framework for publisher agreement negotiation, where the key piece is the transparency about the agreements themselves. There are many historical reasons the non-disclosure terms are in place, and it will take a while for us to adopt any new practices, but we are putting those efforts in.
I think one of the key factors in making that truly happen is that we at least need to be open about the strongest driving forces for decision making from the different parties. If we are not transparent, or if we are not open to talking about that, then there is no way that we could make the agreements open, because that would reveal the motivations that we don't want to talk about. So yeah, I don't know the solution, but I think there are multiple forces at this point trying to make that happen. Well, I have two thoughts. Another office on campus that we've been trying to work with this year is the unfunded research agreements office, SPA's counterpart, where they are working with businesses and the work is not necessarily funded through a grant. I think that would be a great place to go to talk about the reasons why they couldn't share these kinds of agreements. On the other hand, we have been working with researchers, so part of one of the services that we offer is reviewing all of these data management and sharing plans, and part of that is looking to see what contracts or agreements are in place, and we ask those researchers to give us permission to share those on our website. The problem with that is that we're not going to cover every discipline, we're not going to cover every research area, but we do have some examples which help illustrate to a researcher what they need to do in order to share their data under certain circumstances. That is one option, and I know that NIH has a great resource of example data management and sharing plans as well. Yeah, and just a quick addition to that.
Again, I was at the data curation conference earlier, and one of the things people talk about is what's called a machine-actionable data management plan. One thing that would enable is a higher-level summary or a redacted version of the data management plan, including the agreements around the sharing part, so you share not the whole plan itself but at least a redacted version, and a machine-actionable DMP has the potential to make that happen automatically. I can't speak on behalf of our legal department and why we may decide to share agreements or not, but within our company we definitely have agreement templates that we leverage, so we aren't reinventing the wheel; however, every agreement is so unique that naturally you have to do some of that anyway. I would say this is a really good place where consultants probably come in handy, to advise on the structure of these agreements, and just having conversations with folks in industry and also academia about what they've done. I mean, we are open to giving suggestions and ideas, but as far as actually sharing what the actual agreement was, I'm not sure if I can comment on that. Okay, great, that's a really interesting area, and the reference to the DMPs could bring forward some of this. Thank you for that. Elaine.
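The automation mentioned above can be made concrete: a machine-actionable DMP is structured data (the RDA community publishes a JSON-based common standard for it), so a shareable redacted version can be derived mechanically rather than rewritten by hand. A minimal Python sketch of the idea, where the field names and the `redact_dmp` helper are illustrative assumptions, not the actual RDA maDMP schema:

```python
# Sketch: deriving a shareable, redacted summary from a machine-actionable
# DMP represented as a plain dict. Field names are illustrative only.

# Allow-list of fields considered safe to publish (an assumption for this
# example; a real policy would be set by the institution).
SHAREABLE_FIELDS = {"title", "repository", "license", "embargo_until"}

def redact_dmp(dmp: dict) -> dict:
    """Keep only fields flagged as safe to share; drop everything else
    (e.g. confidential agreement terms or sponsor details)."""
    return {k: v for k, v in dmp.items() if k in SHAREABLE_FIELDS}

full_plan = {
    "title": "High-throughput reaction screening data",
    "repository": "institutional",
    "license": "CC-BY-4.0",
    "embargo_until": "2026-01-01",
    "sponsor_terms": "CONFIDENTIAL: ...",  # must not be shared
}

print(redact_dmp(full_plan))
```

The point of the sketch is only that once the plan is machine-readable, redaction becomes a mechanical filter over an allow-list, which is what lets the sharing step happen automatically.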
A comment and a question, and this is for Danielle. I think that industry may be underestimating its impact on publishing. When I'm presented with a new license for journals that is much more expensive than before, when it's to make it open it's going up 20 to 25 percent, the excuse is typically industry: the publishers say they're losing the industry revenue and therefore they have to tax the libraries, right? I hear this over and over and over, and so I do think that there is some conversation that needs to happen, because somebody's going to pay, right? And I think it's unfair for higher ed libraries to basically have to recoup the costs lost from industry in order to support open access. So that was my first point. And then my second point is for Luis, about the pirating and all this. This isn't going to end; this seems to be the cost of business, right? And I stopped talking about it, because it's the ease of use that is really part of it. And I'm just wondering, my biggest worry is graduate students, you know, that they're going to get in trouble and all this, and we really discourage it quite a bit; the library does a lot to discourage it. But I'm sure you've given this presentation in other venues: what are you hearing? What are the administrators, what are people talking about? Because this is part of the calculus. Again, that's why I think I'm being taxed so much, because of trying to make up for the leakage that's happening, and it sounds from your presentation like there's no end in sight; this is just going to keep going. It's a really interesting perspective, one that I didn't necessarily think about, and I can see industry having a role in trying to influence other industries to step up and lean in to open access and be part of these agreements. I would ask to see the data when
they say that industry is you know the reason why your your fees are going up as I showed in that publication totals for like say medchem that was you know not all industry publishes at the same clip and even if you take some of the more aggressive companies such as Merck and Pfizer and GSK like 1,000 articles over 20 years that's small it's really small and so I do wonder you know is it really industry needs to step up or is it other academic communities as well need to but I take that as an action item to to to go back to our company and to hopefully you know encourage us to lean into some of these open access discussions and potentially on these license agreements as well I mean I'll just add that I think it's my assumption as you're paying a lot more for the subscription as well so that's I want to acknowledge that that you're paying more than the libraries probably are yeah and so again I'm not putting the honest on you and industry but I'm also saying this is what I'm hearing from a lot of publishers that this is a problem that has to get addressed yeah and so I just want to bring that up and similar to your talk which resonated with me when I talked to our librarians and encourage them to pick up other subscriptions of journals they say we can't we don't have the money to and so there's journals that we publish in rsc journals that we don't have access to so we have to leverage document delivery in order to get the articles that we publish which is kind of ridiculous right so I would say that you know we're not we don't have deep pockets when it comes to publishing and so I think if we think about reasonable fees and practices that apply for academia and also industry you will have more buy-in ultimately that's really helpful to hear that do you do have some follow-up on that Sarah maybe I feel like I should say something like this is a complex problem right and this is why this transition is going to be messy for some time if you think about the way that 
ACS Pubs revenue is split, in a subscription world, between academic libraries, government libraries, and corporate libraries; those are pretty much the main sources of revenue. In an open access world, a lot of that revenue shifts, and we can argue whether the APC is the right model. I don't know; it's the dominant model out there right now, and business models last as long as business models last. But in that kind of APC-based world, the institutions producing a lot of the research are the ones where the financial burden falls. I hear the concerns here about the large publishing institutions: you're getting hammered by the amount of research coming out of your institutions, and that's really great. There are lots of ways we are working on this. When I talked yesterday about these undergraduate institutions where we say, as long as you keep subscribing, we'll let you make all of your papers open access, I think that's a really interesting kind of model, and it works in small quantities, but I think we as an industry have to figure out how to make this work well across the board, and I really welcome more dialogue about what we do to make it fair.

And may I ask a follow-up question to Danielle? As we know, the RSC is aiming at making everything OA by 2026, so when that happens, how likely do you think it is that industry would still pay into that process in a different way, if it's not APC-based? If there were a model to do that, how likely do you think industry would be willing to contribute?

I don't know, sorry. I just don't know. And again, I'm probably not the best person to speak on behalf of our Merck library and industry as a whole, but I think having us at the table when these discussions are taking place matters a lot, so you're at least hearing our perspective, our concerns, what we care about, to ensure that the industrial voice is heard.

Yeah, thank you for being open to that discussion.

Absolutely, and we are. I know that the RSC has pinged me recently about maybe having another panel like this in order to hear more industrial voices besides mine, which is great, because there's lots to be heard there.

So I'm an industrial guy, and I frequently was in a position to oversee agreements. It's pretty easy; the motivation is: is it creating long-term shareholder value? If the answer is yes, you get to do it; if the answer is no, you don't. It doesn't matter whether it's advocacy or anything else; it's really pretty simple, not a big mystery, and if you can draw that line, then industry will fall in and support you. We haven't really looked at history here, but I come from Dow Chemical, and if you look at Dow in its heyday, we were all over the books; books used to be the repositories. Stull's books on thermodynamics are still out there, the thermodynamics of organic compounds, published by the company back when it had enough excess to allow it. But watch what's going on at Google now: they've stopped the buffets, they've stopped a lot of the do-good-for-the-world things, because you keep coming back to: is it delivering shareholder value?

I appreciate that perspective, and I appreciate this dialogue; this is exactly what we were trying to facilitate. One last question, and then I want to allow Louise to get back to the policy question.

Well, to answer the policy question: yeah, we definitely don't want students to get in trouble, and it is illegal to do this, to simply take copyrighted material like that and share it or make money
out of that is definitely illegal. Think about the music industry many years ago, for example, when people were using Napster and there were all these suits: people were sued for copying a song or a couple of songs and faced millions of dollars in fees, or prison, and all that. But then the crime becomes so massive, with so many people doing it, that it's difficult to enforce, and I think that's exactly what is happening nowadays. Maybe it will be something similar to what happened in the music industry, where new services appeared that didn't exist before. Maybe that will happen, but we don't really know.

Yeah, and I just want to add to this part. From my experience talking to students about Sci-Hub and about the piracy part, one thing that really keeps me hopeful is that once I articulate that their actions have consequences for the long-term health of the publishing system, they start to say, oh, that makes sense, and next time maybe they will do things differently. So at least I encourage them: even if for convenience reasons you use the Sci-Hub source, still put in your interlibrary borrowing request, so that it can be counted and something can go back to the publishers who made this happen, which helps keep the whole ecosystem healthy. I think many students do get that, and I guess our way of education really could help junior faculty members and also students think twice.

That's a really nice point; thank you for reinforcing that. Jake, do you want the last word? We're about to wrap up the panel.

Oh my goodness, the last word. Well, I'm glad I'm here, because I promised you that I would bring this up, so I'm going to ask a standards question. We haven't really talked a great deal about standards in this workshop, even though they loom very large in the open data space. Danny, I think you very briefly brought up the HELM system for designating unnatural amino acids in macromolecules, and I want to think about that as a springboard. Are we having enough integrated standards discussions? Obviously IUPAC is the gold standard, so to speak, but should we be engaging more among industry, academia, and the various cheminformatics groups, maybe not even to improve standards but to build awareness of standards, so that when people are archiving their data, the data are interoperable?

I'll go first while Danny is thinking. Totally agreed, Jake, and I was trying to talk a little more about the standards part too, but I guess the key there is really adoption, and adoption not only by the practitioners, the researchers, but really by the infrastructure, and that conversation really happens in a communal way. For the researchers' part, what they hope is that all these standards can really be behind the scenes, so they don't have to worry about them. But as you said, there's the awareness part: if they are not aware that those standards matter, then they can't really bring them up as a requirement, as something they need, with the infrastructure builders. So that's the education part that I think we can do more about.

I think it's a really great question, and it's something that puzzles me: I don't know why it's so hard to come up with a standard, harmonized vocabulary across different disciplines. I did mention the amino acid codes; I deal with that on a daily basis, and it's mind-blowing to me why we have different companies coming up with their own vocabularies. I'm not sure why that is, so I would love to have more discussions. We wrote an article a year ago on the potential opportunities and pitfalls of non-canonical amino acids, there have been papers on HELM, and we've been trying to engage with the industrial community on how we can come
together on this, and it's been crickets, so I'm not really sure what's behind that. So yes, I think we need to do more there. I think some companies have an appetite, and the more you start to publish and see these things, probably the more adoption occurs, because it just becomes more commonplace.

Thanks for this. I'm actually going to switch my hat really briefly here, because HELM, thank you for mentioning it, was recently brought to IUPAC from the Pistoia Alliance. It was originally developed by members of the Pistoia Alliance to facilitate exchange in a pre-competitive way, and the Pistoia Alliance has been very successful at pulling together industry groups to initiate projects of shared interest, but they have definitely made the decision that they don't do long-term sustainability, so IUPAC was their go-to place. We're currently putting together a sustainability plan for continuing development, community support, and community engagement with the development of HELM. So we are trying, and we realize that community coordination is vital, both to how that standard came together and to carrying it on as a useful tool. It was just a great segue; thank you for that. Now I'll put my other hat back on and finally give the last word.

You had discussions over the break about following the money. I've learned a couple of things in my life, and one is: follow the money. Sci-Hub was certainly a curiosity to me, because I couldn't figure out where the money was coming from, but it turns out they publish their finances: they run the whole site for something like $13,000 a year. It's, as I think you stated, the largest repository of pirated data anywhere, way bigger than the PDB, which I think we learned has about a $10 million a year budget. So it is curious to me how these places are going to get funded and how they're going to remain funded, because you talked about not liking commercial entities because they could disappear, and I don't understand why you wouldn't worry about a non-commercial entity disappearing too, if the funding runs out. But with Sci-Hub it was kind of interesting that somehow they're getting by on a real shoestring budget, all with relatively small contributions, if you believe them. I don't know whether you've looked into their finances.

I did not look into the finances of Sci-Hub, but it is definitely low cost; that is the idea. One of the important aspects, I think, is how all the data is being copied constantly, and many people have different copies of the whole collection of PDFs, because out there are lots of data hoarders, people who simply have the hobby of storing and preserving data for the future. Having a server at home is something that appeals to them, and collecting as much information as possible, and sharing it, is actually a hobby. That community is very supportive of initiatives that have to do with sharing information. Back when Sci-Hub had all the legal issues, and that's the interesting part about it, the first reaction to a lawsuit was: let's save the information. The pirate community just goes into rescuing all the data, so more copies are made and the code is protected. There are so many copies out there, and that, to me, is what is fascinating about it: at no cost, because the data is preserved by volunteers, the system is preserved. Even if a copy is erased, somebody will have a copy of it.

Sorry, I just also have to add: Sci-Hub is of course dogged by rumors that it's supported by Russian intelligence, because they're using the credentials to break into universities and steal data, so that would...
I just want to point out the obvious, too: their low cost is because all the rest of the cost is covered by others. That's the interesting thing about it: they're ordered to close the site, but the site is still there, and it is very difficult to enforce.

Okay, I do want to leave a few minutes for wrap-up; we don't want to keep people past noon, and people probably want to get on with their weeks. But just on that very last little conversation, I'll reinforce what you just said: Sci-Hub is possible because the metadata that goes along with those articles has been developed, curated, and normalized in the community. Our citation metadata is produced by publishers and by the scholarship community when they submit, so there is a lot behind it, and that's what we mean when we talk about interoperability and documentation. The convenience is a value-add that's being built on metadata that's already available.

Yeah, and to me, when I think about this future, there's the part about generating the data from robust research and documenting it in such a way that both exploratory areas and a lot of functional, practical workforce resources can come out of it. Again, Sci-Hub is a great example of what can be done when the metadata are made available and are consistent. Yes, it is illegal, and all the data was stolen, and that is a terrible thing, but I find it very interesting how Sci-Hub provided a glimpse of what a world of open access would look like: I simply have access to all the information I want; I change the URL, and there it is, exactly what I was looking for. That is absolutely crazy: interoperability really at its best.

All right, I think we want to close out the conference. I just want to really thank all the panelists on this panel, and all the speakers over the last couple of days; it has been fabulous to hear everybody's actual experience working in and around data and scholarship. Yes, chemistry is about the long tail, but it is a long-term science, and I love this field because accumulated knowledge has value. We can keep at this for another few hundred years and move toward the best practices possible to keep the knowledge flowing and advancing. So thank you so much to all of these panelists, and to all of you for your questions and the discussion, and I'll just