I think we're good. Okay, we're at three o'clock now, so we'll go ahead and get started. Hi, everyone. Welcome, and good afternoon. Thank you for joining us for our virtual conversation on data sharing in the life sciences. My name is Melanie Gainey. I'm the director of the Open Science and Data Collaborations program at Carnegie Mellon University Libraries, as well as a liaison librarian supporting the life science departments at CMU. The open science program has existed since 2018, and we support publicly available, transparent, and reproducible research across disciplines with tools, guidance, training opportunities, and events. CMU Libraries also provides financial support for open access publishing.

As some of you might be aware, the White House declared 2023 the Year of Open Science, and in celebration of that, CMU Libraries is hosting a number of events and activities to stimulate discussion and awareness of open science this fall. We're excited to host our fourth Open Science Symposium virtually on November 3, as well as an in-person mixer on November 8 on campus, so I encourage you to check those out as well.

This roundtable kicks off our open science programming for the fall and was inspired by conversations we've had with researchers about the changing policies around data sharing. Last year, the White House Office of Science and Technology Policy announced that all federal funding agencies will have to update their policies within the next couple of years to require public access to publications and underlying research data with no embargo periods. Life science researchers are some of the first to be directly impacted by changing policies in this area: in January, the NIH updated its policy so that NIH-funded researchers have to submit and comply with a detailed data sharing plan for their research data. While many researchers in the life sciences do have some experience with data sharing, whether from journal policies, from the NSF, or simply because it aligns with their values, we expect that the NIH policy will greatly improve public access to research. And it's important to note that many would argue there are not enough incentives for data sharing right now, so the policy might be very impactful from that angle as well. We also acknowledge that fields within the life sciences may be differently impacted by these policies, because these research communities have different amounts of existing infrastructure, standards, and research practices around data sharing, especially as their research and methods become very specialized.

With that in mind, we invited three CMU faculty who work in different fields of the life sciences, which I'll loosely describe as psychology, neuroscience, and genetics and genomics, and we're excited to hear perspectives from their research communities today. I'll start with some introductions. Janine Dutcher is a senior research scientist in the psychology department and the lab director for the Health and Human Performance Lab at Carnegie Mellon. Her research uses multiple methods, including neuroimaging and psychoneuroimmunology, to examine how positive interventions and experiences may lead to reductions in threat and stress responding. Joel McManus is an associate professor in the biological sciences and computational biology departments at Carnegie Mellon.
The McManus lab studies the genetic and molecular mechanisms of gene regulation with high-throughput sequencing and computational methods. Eric Yttri is the Eberly Family Associate Professor in the biological sciences department and the Neuroscience Institute at Carnegie Mellon. The Yttri lab studies the functional interactions of the motor circuit that lead to skilled behavior, using a combination of behavioral, physiological, and computational research methods. I'll also note that Eric was a collaborator with Ana, Huajin, and myself when we organized the first Open Science Symposium back in 2018. So thank you, Janine, Joel, and Eric, for joining us to share your perspectives on this important topic of data sharing.

I'm also excited to introduce our moderators, Ana Van Gulick and Huajin Wang, who are my former colleagues at CMU Libraries and the co-founders of the Open Science and Data Collaborations program. Ana is currently at Figshare, where she manages repository projects for US federal agencies and for public and private research funders; she also leads Figshare's data curation service. Huajin Wang is the director of programs at the Center for Open Science, where she oversees the development and implementation of training, education, and consulting services to help research communities engage in open and reproducible research practices. Thank you both for your help with this event and your participation in it; it's been really fun to work with you again. And finally, I'd like to thank my colleagues at CMU Libraries, Tom Hughes and Katie Berman, who have helped with the logistics of this event and are helping with Zoom right now, so I really appreciate their help.

Before we dive into the conversation, I want to note that we are recording, so if you would like to turn off your camera, now would be a good time to do so; we will share the recording after the event. We welcome questions from the audience for our panelists, so if you think of any during the event, put them in the chat, and we will be sure to leave time for some at the end. And with that, I'm going to turn it over to Ana.

Great, thank you. Thanks, everyone, for joining us today, and thank you to CMU Libraries for the invitation; it's great to be back. I drew the short straw a little bit: I'm going to cover the challenges first, and then we'll get to the opportunities that increased data sharing might create across these different disciplines. I want to start by addressing the NIH policy, since it's a little bit the elephant in the room. I don't want us to spend the hour going over the details of policy compliance, and I don't speak for the NIH. The new NIH Data Management and Sharing Policy is going to be a large driver in the biomedical disciplines; it will greatly increase data sharing, and it will hopefully make data sharing and data reuse an important practice in these scientific fields. To make sure we're all on the same page about what the NIH policy states: this is a policy that went into effect for new awards beginning January 25 of this year. It says that all NIH-funded research that generates scientific data must submit a data management and sharing plan, which will be evaluated on an ongoing basis, and this applies to extramural grants and contracts as well as to intramural research at NIH.
It says that data should be shared regardless of whether it supports a publication, as soon as possible and at the latest by the time of publication. While not all data that results from NIH-funded projects needs to be shared, broad data sharing is encouraged, including data that supports replications and null results. The goal of these data management and sharing plans is to maximize appropriate data sharing and to leverage existing data standards and existing repository resources as appropriate. There are of course NIH institute-specific policies as well that give more information; if you get funding from NIMH, for example, they and their program officers have specific ideas about what they might look for. But that's the lay of the land: it doesn't say all data must be shared, it says data management and sharing must be planned and you should maximize data sharing.

With that context out of the way, I want to pose a first question to our speakers, which is simply: what was your reaction to this new NIH policy? What is the reaction of your colleagues and your research communities, and how will this impact your work? Where should we start? Eric, do you want to kick us off?

Sure.

I think you're the one who has submitted a data management and sharing plan to NIH under this rule, so I thought maybe we'd start with you and your experience.

Yes, I've already undergone that step. I think it's useful, and other organizations like NSF have had something like it for a while. The downside, to me, is that there's very little explanation of expectations so far. I think that's because the NIH itself doesn't know and is still feeling things out. The difficulty as a hopeful grantee is that you don't know what the bar is; as you and I sort of said, it's quite vague as to what needs to be shared, what might be shared, and how to do all of it. There are scientists on the other side who might benefit from this change, which I do think is important, and change rarely happens on its own. Then there are all sorts of questions: how do you pay for storage, how long do you need to keep it, and what issues do you have to overcome in terms of discoverability and access if it's in a file type no one has heard of? If you can't find it, these FAIR principles can really make or break what the NIH, I think rightfully, is trying to do, but we're going to have a lot of growing pains along the way.

Thanks. Janine, your thoughts, and your colleagues' thoughts, on the new policy?

I think psychology has had, in some ways, a crisis of faith on this. This has been a big topic for a number of years, and there's been a lot of discussion about what open science practices should look like for the behavioral sciences. There are some tricky privacy issues that come up in particular when you're dealing with human subjects data, in terms of identifiable information and protected health information, and I think those issues are going to persist to some extent without clear guidelines. But I do think that generally the field has been moving in this direction and trying to figure out how to do this well.
To some extent, echoing what Eric was saying, the lack of clear expectations can be challenging. One thing that is true is that setting this up as a policy at the funding level, so that you have to have a clear plan before you ever start, is good, because one of the things that's really hard is trying to shift practices in the middle of data collection. There are issues with IRBs saying, hey, you didn't tell participants when they signed their consent form that you were going to share their data, even de-identified, so you can't share those data yet; or sensitive issues about certain mental health conditions and how people want to share de-identified data there. We have to protect our participants while also doing good, reproducible science and making sure we can share broadly. So I think putting a policy in place is a good step, and hopefully it will force the conversation about how to do this well.

As for the reactions that come up: a lot of the more senior faculty in the behavioral sciences are often really wary of this kind of thing, because it can feel a little like you're giving up your property. So we have to figure out how to change the culture in such a way that it feels collaborative, and that people aren't out to get you or trying to prove you wrong. Some of those cultural issues have not been fully addressed. Those are some of the immediate reactions that come to mind in terms of how the discipline of psychology has approached this. One of the challenges is also how you share multimodal data: if we do neuroimaging, and we also do blood draws where we're assaying various inflammatory biomarkers, and we're also doing anything genetic (Joel will have a much better answer to this), there are some real challenges with sharing large multimodal datasets and doing that well. So I think this is a great step for forcing that conversation and making sure that researchers have the tools they need and are exploring them, because a lot of those tools exist and have not often been pursued. Hopefully this is the direction these conversations go, addressing both the logistical barriers and the cultural barriers.

Yeah, great. A lot more we're going to dive into there. Joel, a first reaction from you?

Yeah. In my field of genetics and genomics, a large segment of the data has been shared for the last 10 or 15 years, because the problem arose earlier there. Very large datasets of DNA sequences were being generated, and it was acknowledged at the time that we didn't really even have the tools to properly analyze the data and understand them, so a framework was put in place, a long time ago, to store data and share it publicly through NCBI, which the US government funds and supports. But there are other types of data that have not really been accessible to people, and the NCBI databases are very difficult to navigate, because there are now hundreds of thousands of experiments recorded there.
So there are questions of how long it is really necessary to do that, how useful it is, and how to access the data. And there are many other kinds of data that are not currently shared; I think the new policy should help encourage people to share them. Sometimes things are reported as ratios and the raw data are never reported. Overall I think this is a very positive thing. There aren't many guidelines yet as to how to do it, but over time that should get resolved. And it's interesting to hear from other fields, because the data types are extremely broad and varied, and there will be very different reactions and cultures around these things. But at least in genetics and genomics there's been a culture, through the explosion of the last 15 or 20 years, of trying to share the very rawest forms of the data, so that people can actually go back and use them for whatever purpose they find, including proving you wrong, which as scientists is one of the main things we should actually encourage instead of being scared of. That's one of the challenges in scientific research: for our own egos and self-worth we want to believe we're always right, but the whole point of science is that we keep proving older things wrong, and that's often how we make the most progress. So I think that's an important part of maintaining and sharing data. It's also kind of disconcerting on a personal level, worrying about what people might find. Those are all things that need to be navigated, and I think it's a very positive step that the funding agencies are starting to require this.

Thanks. Hopefully this will really shift the culture, a little faster than it was already shifting in some fields; genomics certainly makes a good one to look to. My next question dives into something each of you touched on: in your disciplines, how prepared is your field to handle widespread data sharing, and what gaps exist? Different disciplines are more or less mature in this area, in terms of standards, training for their trainees in data management and sharing practices, common formats and documentation to make data reusable, and discipline-specific repository resources. Joel, you touched on the sophistication that already exists in genomics, so maybe we'll circle back to you to keep that thread going. What gaps still exist, and where does more maturity need to come?

Yeah, so in genomics, there are now these public data sites where you can deposit data, but the problem is really in analyzing the data: you have to download and copy it, and it's very slow to do these things. So there are efforts being made to link the data to computational resources in the cloud, so people can use and reanalyze data more easily without a lot of network transfer. That's one area that could really use a lot of development: distributed analysis platforms that are available and easily accessible to researchers. Another issue with this amount of data has just been cataloging it and being able to find what you want to work on.
Having the correct metadata for experiments is extremely important, so people can know, well, maybe your experiment doesn't match what was published because there's some environmental difference or some difference in how the experiments were performed. To know that, you really have to have very detailed metadata. The other challenge going forward is that there are new data types that no one has really figured out what to do with. Traditionally these have just been sequence data from a single sample; now sequence data are being converged with imaging data in something called spatial transcriptomics, sort of a frontier of new techniques and approaches for examining the expression of genes at different locations in biological tissues. That creates an order of magnitude larger file sizes, and there really aren't well-established or standardized file formats for it. These are all things that complicate the issue of sharing data. Eric alluded to file formats too: as technology improves, the datasets change and the types of data change, which complicates creating uniform ways to store, process, and access data. Those are the bigger challenges going forward.

Yeah, love the shout-out to high-quality metadata. Janine or Eric, what do you want to pick up with your discipline next?

I'll jump on the metadata thing, which immediately came to mind. As someone pointed out, it is a cultural thing; a lot of times it's not well recorded. But it can also be a huge cost to properly record the metadata: you have to train whoever's recording it to record it appropriately, and they have to take the time to do it. And currently, not to get back to the elephant in the room, there are no allowances for either the training of students or the amount of time to be invested in it. That's a big thing. In neuroscience and neurophysiology we have had a standard called Neurodata Without Borders; everyone knows about it, very few people participate. That may change in the future. So I think metadata, and just the resources to store data, are the big ones. Our lab generates three terabytes of data a day. That's too big for Figshare; Figshare's fantastic, but that's a hurdle that maybe the NIH will solve, like they have with proteomics and genomics, but we'll see.

In psychology, a lot of people run surveys or behavioral experiments, so the actual data points may be very few, and sharing those kinds of data doesn't seem particularly challenging. There are all kinds of repositories; there's no consistency in what people use or how they format things, but that seems like something relatively straightforward that could happen. A lot of journals have for a number of years required data sharing in some capacity in order to publish, and if you've ever downloaded those supplemental materials or those pieces of data, it's horrible, because there's genuinely no consistency whatsoever, and sometimes I don't even know what I'm looking at. Some of that is probably a metadata problem, but some of it is also that so many labs have very specific, idiosyncratic ways of labeling and processing their data.
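To make the panel's metadata point concrete, here is a minimal sketch of the kind of machine-readable sample sheet Joel describes, in Python. The field names and values are invented for illustration; real repositories such as SRA or GEO each define their own required schemas.

```python
import csv

# Hypothetical sample-level metadata; field names and values are
# illustrative, not any repository's required schema.
samples = [
    {"sample_id": "S01", "organism": "S. cerevisiae", "tissue": "whole cell",
     "treatment": "heat shock 42C", "replicate": 1,
     "protocol": "RNA-seq, poly-A selected", "room_temp_c": 22.5},
    {"sample_id": "S02", "organism": "S. cerevisiae", "tissue": "whole cell",
     "treatment": "control 30C", "replicate": 1,
     "protocol": "RNA-seq, poly-A selected", "room_temp_c": 22.5},
]

# Write a plain TSV next to the raw files so the environmental and
# experimental details travel with the data they describe.
with open("samples.tsv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(samples[0]), delimiter="\t")
    writer.writeheader()
    writer.writerows(samples)
```

The point is less the particular format than the habit: recording conditions like room temperature or protocol at collection time is what later lets a reader explain why two nominally identical experiments disagree.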
One of the problems at the level of psychology is also how raw you go. Do you want item-specific numbers, exactly what participants typed? What if it was on paper, which hopefully it isn't anymore, though there are some datasets like that? Or if there's something a participant actually has to physically do, how do you translate that? Is it just reaction time? You could share just the reaction time, but is that useful to people? So many of the things that happen in psychology experiments depend on the methods; the methods are really the essential piece that produces the data, so how you share a dataset really needs to be structured within the methods used to collect those data. That's one big challenge. Similarly with neuroimaging data, fMRI for example: the task-specific design matrices, and how you share all of that, really influence how usable the shared scans are. And most of the data I work with is multimodal: we'll have behavioral outcomes, neuroimaging outcomes, neuroinflammatory markers, whatever is going on, and sharing those in some way that is useful, when the formatting for each type of data might be a little different, is a challenge. Every lab does it so differently that there's no real consistent pattern, and for me, literally in my data, that is one of the challenges. We have one dataset where we used an app on a smartphone to collect a lot of passive sensing data. It's huge: a whole semester, hundreds of participants, and sometimes those measurements were happening every 15 seconds or whatever it is, with location data every 10 minutes. I'm sure it is not three terabytes a day, so I don't know what is going on over in the Yttri lab, but there's definitely a lot of data, and figuring out how to share it in a way that's useful to people is one thing. What has happened is that a lot of people have been forced to share data, so it's: fine, you get whatever format I have it in, and I'm not going to do any work to share it in a way that's reusable for anyone. That's one of those cultural problems again. People feel like they don't want to get scooped, or they don't want to share too much of their data before they get the chance to publish on it, so why would they format it in a way that makes it easy for anyone? So some of these cultural things influence what barriers there are, but I keep coming back to the lack of consistency, and to how raw, and in what format, data are reusable. Reusability, which is a nice term you brought up, is key, and I don't think there's a good answer in the behavioral sciences yet.

Do you feel like there are standards and people are intentionally not adopting them because of effort, or that they're not applicable to enough use cases? I'm thinking of BIDS and OpenNeuro and things like that, but they're not that widely adopted.
There are so many lab-specific things. If you're working for a PI who doesn't know about this, or doesn't care that deeply about it, they're not going to teach you; you might learn from a postdoc, you might hear about it. I just stepped into a role as co-director of the BRIDGE neuroimaging center, and we are supporting BIDS and other neuroimaging sharing practices, but even if that is available to you, how do you do it? There's a learning curve, and if it's not a big piece of your training as a graduate student, you miss it, and then it becomes super laborious in the long run to figure out how to adapt your protocols to share in those ways. So some of it is disingenuous, but I'd say a lot of it is just a learning curve. I tried to go back to old data to make it more BIDS-friendly, and it took over a year, with scripts written by people who are experts in writing these scripts, to get it to a position where it could more or less serve the BIDS format. So there are standards and practices, but getting trained in them is still a little bit the Wild West, and it is so PI-specific. I think hopefully something like this helps, because it operates at the funding level; journals, if you sent them a file, would just go: okay, sure. Planning this in advance, telling NIH this is how we're going to do it, helps structure the change that would support this. Again, there are financial issues that come up, and maybe NIH could earmark some money in these grants for supporting this, but that training piece is the key problem I see here.

And I should define our terms: BIDS is the Brain Imaging Data Structure, a standard for neuroimaging, originally for MRI data, although it extends to many modalities now, EEG and MEG and such (a minimal sketch of the layout follows below).

Great. I'm going to ask one more challenges question, which we were just getting to with Janine: what support would you need, and where might it come from, to encourage broader adoption of best practices for data management and sharing? What support might come from a department, college, or institutional library level, to bring it back to CMU Libraries? And what support might come from your funder; in the case of NIH, would that be financial support, or resources, or best practices dictated from their side? Whoever wants to jump in first, the floor is open.

In a lot of ways we've already identified the needs, if not the sources. Some form of training, which is maybe best coming from an entity like the library. And to get jargon out of the way, I brought up FAIR principles before: that's findable, accessible, interoperable, and reusable, which are ideas we've already been using. One other need is storage; I mentioned Figshare, and Pittsburgh Supercomputing has some storage, and there are other options. But unfortunately the first one, findable, almost has to come from a big entity that's nationwide or even international. Getting the data there in the first place, though, is probably best done at a local level.
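Since BIDS keeps coming up, here is a minimal sketch of what a BIDS-style layout looks like on disk, in Python. The study, subject, and task names are invented for illustration; the full specification is at bids.neuroimaging.io.

```python
import json
from pathlib import Path

# Skeleton of a BIDS-style dataset (illustrative, not the full spec).
root = Path("my_study")
(root / "sub-01" / "anat").mkdir(parents=True, exist_ok=True)
(root / "sub-01" / "func").mkdir(parents=True, exist_ok=True)

# Every BIDS dataset carries a top-level description file.
(root / "dataset_description.json").write_text(
    json.dumps({"Name": "Example study", "BIDSVersion": "1.8.0"}, indent=2)
)

# Imaging files then follow a strict sub-<label>/<modality>/ naming scheme:
#   my_study/sub-01/anat/sub-01_T1w.nii.gz
#   my_study/sub-01/func/sub-01_task-rest_bold.nii.gz
# with JSON sidecars holding acquisition metadata alongside each image,
# which is what makes a stranger's scans interpretable without emailing the lab.
```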
There's this broader ecosystem of resources and infrastructure underpinning the whole thing, which is what my day-to-day is about: a lot of repositories working together with the people who build dataset search. But there's the local infrastructure as well, especially for storage, which is costly. Joel, were you going to jump in?

Yeah, I was going to agree with those needs for storage. It can take quite a while to submit data to be shared, even with the framework we now have within NCBI and the several databases they have; the Sequence Read Archive is called SRA. For every genomics paper where we produce new data, it can take us an afternoon or a day of our time to submit those data, and it's something you have to pay a lot of attention to during the process. It's not like dragging a file into Google Drive to share it; it's a long and kind of tedious thing. If you take it seriously and fill out the metadata properly, it can take a lot of your time, especially if you have dozens of samples. So having support for that sort of thing could be really great; it's something that eats up your afternoon, or your day, or your student's couple of days, depending on how familiar they are with it.

And more support in accessing and reusing data would be really great. One thing that comes to mind is that I'm not sure how well we train our undergraduates and graduate students to find these things. That's part of the challenge with so much information being accessible and available: if it's not used, it's not really doing anything, it's just being stored on a compute server someplace. So teaching people how to access the data and what to do with it is important. It goes into faculty courses, too. A lot of graduate courses for PhD students are about reading papers and discussing them, and I doubt many people at all will say: let's discuss the supplemental tables. What did you think? Could you understand what each column meant from the way it was described? I think we as faculty are not really supporting our students in learning how to responsibly package data to be shared, and how to access data that has been shared. There hasn't been a great culture of that. There are a lot of things we will probably need to do better in a cultural sense in science, beyond just posting data online and saying okay, we're done. The other side of this is that there are probably huge numbers of repeat experiments that are not necessary, but are done just because we don't know how to find the answer that's already been published and out there. So all of those things would help support scientists in sharing data and accessing it in the future.

The training piece seems like a key one, and getting people on board with some sort of structure that serves them. One of the things I've noticed, even when I am eager to figure out how to share data well, is that I'm faced with a multitude of options, and it's hard to assess what the best strategy is. And again, there's a cultural issue; I hate to bring it up, but I am the psychologist in the room, so okay.
There's a cultural problem: if you are communicating with a PI who is less enthusiastic about open science practices, or who's enthusiastic but has done none of it, they're not really going to be as helpful in helping you parse what is necessary and what is helpful for future users. Sometimes we learn best by seeing what other people do, so, to Joel's comment about structuring a course around it: even just downloading a dataset and figuring out what its creators did well and didn't do well, trying to learn from what exists out there and improve on those standards, could be an important learning opportunity, and also a way to make sure we're getting data to people in forms they can use, which I still think is a big problem, at least with psychology data.

And the how-long-do-we-store-it question that Eric brought up earlier is still bouncing around my brain. Medical records you store for seven years or whatever it is, and then the doctor's office, back in the day, could shred them. That's not what we want to do with any of our scientific data. But there is the file drawer problem: if a study produced null results and wasn't that interesting, no one cares, it just goes out into the ether, and then something comes up later where it actually would have been really helpful to know about. So there's also an incentive problem: you're not going to publish those data, maybe they can't get published, and then there's nothing that tells people those data exist, and you end up not knowing what's out there or how to use it even if it were available. It feels like there are a lot of challenges, but the training one addresses a lot of them: if we had some consistency and could train people in consistent practices, it would be easier to find the important datasets.

Yeah, I think that's great, and it rolls right into the opportunities and the incentives, so I'll hand it over to Huajin.

Thanks, and thanks for the really rich discussion. I'm hearing a lot about the technical barriers for the different types of data, the size of the data, the cost of data sharing, and student training, and, Janine, you brought up the other facet: research culture, and culture change as an important piece. So that's a natural segue to the potential opportunities and changes that might be happening. I want to dig a little deeper on the culture change side. These policy changes are one step in the right direction; practically, in the near future, are we heading to a place where sharing data and research outputs is viewed as standard practice instead of a burden or extra work? And thinking about how we evaluate research: will it be viewed as an important professional accomplishment, as opposed to just a box to check? Janine, do you want to take the first stab?

I think the currency of academia is publications and grants.
Data sharing is not necessarily rewarded. So having policies at journals and funding agencies that support data sharing will hopefully make it more standard, a practice that becomes part of how you do science. But the initial perception is going to be that you're being forced to do it, and that's a bit of a cultural barrier. Some journals I know of in my field have been giving papers badges when they share data in appropriate ways. It feels a little bit like a sticker chart, but maybe that's helpful for some people, and maybe it's enough of an incentive to say: hey, we have a bunch of data sharing stickers on our sticker chart, a gold star for the year. But I do think there's a level within institutions that can support data sharing and data science as well. How do your tenure review and your annual review treat open science practices, and are those incentivized and rewarded? Those are the kinds of things that would make it much easier for this to become standard practice. That said, I'd put a small asterisk on grants and publications being the currency, because that can cause a lot of weird behaviors in terms of splitting and splicing data, and in terms of the quality of publications, so there may be a broader cultural shift that accompanies this about how we incentivize good science, not just productive, plentiful science.

Joel or Eric, would you like to chime in?

Maybe I could add a little. I agree that the main problem is that there isn't a lot of incentive; it's sort of a disincentive, because we have a limited amount of time, and right now we don't get paid in any way, we don't get any immediate gratitude or benefit, for sharing things properly. One of the currencies of science that we all have, for better or worse, is a number called the h-index, which tracks how many times our publications are cited by other people. Maybe we need a d-index as well: how many times our datasets are downloaded and used, or referenced in journals as being used for other people's work. Keeping score of that, in addition to how many times our work is cited, would be good, because oftentimes papers are cited just in passing, saying oh, this person showed X, Y, Z, and other times someone cites the paper because they actually used the data. That's something we might be able to automate as a community: go through publications, pull out when papers are actually used for their data, record that, and give people credit for it. I also think training people to use public data would increase how much they value sharing their own, so if you're commonly downloading public datasets and using them for your own research, then suddenly you feel more compelled to go back and repay that, or pay it forward to other people: I benefited from this, so I want to make sure I continue it. So training in, and increased use of, public data would also incentivize people to deposit their data in usable ways.
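As a back-of-the-envelope illustration of the d-index Joel proposes, it could be scored exactly like the h-index, substituting datasets for papers and reuse citations for paper citations. A minimal sketch in Python, with made-up reuse counts:

```python
def d_index(reuse_counts):
    """h-index-style score over datasets: the largest d such that a
    researcher has d datasets, each reused or cited at least d times."""
    counts = sorted(reuse_counts, reverse=True)
    d = 0
    for rank, count in enumerate(counts, start=1):
        if count >= rank:
            d = rank
        else:
            break
    return d

# Made-up reuse counts for one lab's five deposited datasets.
print(d_index([12, 9, 4, 3, 1]))  # -> 3: three datasets each reused 3+ times
```

The hard part, as Joel notes, is not the arithmetic but reliably harvesting data-reuse citations from the literature in the first place.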
In this discussion I've been thinking about some of the infrastructure that already exists. I have two major grants that are specifically designed to put me together with someone who is more computationally minded. So for every biological experiment I do, in theory I get another paper or two, in which I've given the data to someone else and they've found things that I either can't do or don't have the bandwidth to do. So there are actual mechanisms for that, and it sidesteps many of the problems we've been discussing; in some ways that very purposeful, planned-out mechanism for sharing may be more useful. In the remaining cases, it may be valuable to distinguish between the probably useful and the potentially useful. If it's a very small niche study, this is much ado about nothing; there's always the who-knows, it might be big someday or save someone an experiment. But maybe instead of everyone must save all their data, make it all accessible, and pray that people get the metadata right, you handle it differently: if you're going to be generating large data volumes that many people in the community might enjoy using, you have that as a supplement or an extra on your grant, rather than assuming that all data are equally usable. That might also help with findability, by not flooding the marketplace. Random thoughts, but yes, there is some infrastructure already there; it's just very small and personal rather than institutional.

Yeah, the intentionality of that fits with the grant cycle: you're designing a study with the purpose of sharing the data in a way that's reusable. I can think of large-scale neuroimaging studies that are designed like this from the beginning, to generate very large datasets meant to be reused, and not only by neuroscientists but also by computer scientists. That's data sharing influencing the science.

The intention helps focus the attention and gets everything in line, and in those cases there is often money allocated, and it's accepted that resources will go toward that goal in addition to the purely scientific enterprise.

That's great. A lot of this culture discussion lands on the incentive piece: incentives in faculty promotion, in grants, in recognition, in paying the community back. So how do we get from here to openness and data sharing being viewed as a professional accomplishment? How should we evaluate it, give credit for it, and measure its impact, so that it becomes part of research excellence rather than just something we ought to do for the greater good?

I think there should be some kind of scoring system. That's just one proposal for keeping track of this, so that people actually know how often some PI's work has been used, whether their lab has generated a huge amount of data that is used over and over again. If that's the case, it would be nice to be able to quantify it.
That might drive an incentive: people can brag about it a little, see on their Google Scholar page not just their h-index but their d-index, whatever you want to call it, watch it grow, and feel inspired by seeing other people who have it as well.

Janine or Eric, do you have something to add?

I'm less optimistic: it could become another measure that all of us will claim we don't want to use, like impact factor. Journals have tried, and they don't have the infrastructure to do it well; to various extents NIH may have it, but maybe this brute force will lead to small but meaningful changes.

There's a lot of discussion about publications and grants being the primary incentives, and there's a lot of hidden work. Even just reviewing papers: it seems unbelievable to anyone outside academia that you don't get paid for it and just have to do it, and there are recommendations that you're supposed to review three papers for every paper you submit; there are numbers like that, there's no incentive, and honestly, if you do it, sometimes you can't write your own papers or grants, because you're spending that time. This starts to feel like part of that hidden work, the extra labor that gets tasked to someone junior, or someone female; somehow it all rolls downhill, and all kinds of inequities come up. So it's important to think about who leads this. Maybe departments need to incentivize it by encouraging it, setting things up, and funding those incentives to make it less difficult. Maybe there need to be department personnel who can actually support you. I think the libraries have been trying to do this, and I don't think everyone even knows what capabilities the libraries have; honestly, I'm learning about it now, which is probably how I got invited to this, because I've been talking to different librarians about the problems we have. So those kinds of supports may be available but not well known, and they would be a really helpful way to make this feel less like hidden work, by making it easier. Sometimes it's not about rewarding it per se, but about making it less challenging logistically.

That's a great point: make it easy and rewarding at the same time. For the sake of time, we have about five minutes left and I want to save some time for audience questions, so I'll pass it back to Melanie.

Okay, thank you. I think this question of who leads this work actually segues into a question from Cheryl Telmer, who's also in the biological sciences department at CMU. She says: yes, having standards provided by a guiding body would help us do better science, because all metadata would be recorded and proper controls performed, but who should do this? It would be hard for this to remain a measure of productivity, so it should fall to the journals, or at least be done in collaboration with the funding agencies.
We are shifting from paper recording of protocols to electronic, and therefore researchers need to detail how all of the data was collected, processed, and used. So this is the question of who leads in creating the standards: are the funding agencies or the journals involved? Cheryl, also feel free to unmute yourself if you'd like to speak to that point.

Not so much. I don't think we know; it's kind of up in the air. I do feel Carnegie Mellon can have an impact because of the strength of the computer science department, and people look to Carnegie Mellon as an example for a lot of this, so faculty need to be supported somehow. How, I guess I'm not sure. I believe that if someone would say, you need this and this and this on your data, people would comply; but you're not just going to make stuff up because of a nebulous plan about data sharing, whereas you would all comply if you were given a mandate. So I feel like NIH is expecting people to just come up with what they're going to do, and I really think they should give a little more guidance, or else a whole pile of money to get people to do it. I just read something about ARPA-H putting some money into a data infrastructure grant, so maybe that's a faster way to do it than through NIH; we'll see. And I will put in a little plug for the Cloud Lab, because all of the metadata is collected there, every item, right down to the temperature of the room when the experiment was performed. It's only good for certain kinds of experiments, but that idea of having every piece of the environment and the protocol captured is interesting.

One really key thing that comes up: NIH funds a wide range of disciplines, and discipline-specific issues arise if you want some sort of standardized protocol. Trying to imagine a clinical mental health study paralleling, frankly, almost any other kind of NIH-funded study is challenging, and that is probably why nothing has happened thus far: there are idiosyncrasies that are going to be hard to contend with. I do think there's a way to do pieces of it, to set some guiding principles and themes and then let implementation match discipline-specific needs. I don't imagine that will be easy, and it will probably require some time, but it is a challenge that has made this hard, and it has made it really hard for interdisciplinary researchers like myself: one journal might have policies that another journal doesn't, and I publish in both regularly, so which one am I going to use as the overarching guidelines for my lab? I don't know.

Great, thank you. That point you made about how the NIH spans many different disciplines was really the original idea behind this panel: to hear from different parts of the NIH-funded research ecosystem. I know I learned a lot about what this looks like in genomics and psychology; I'm more familiar with neuroscience because of my own background. I just want to thank all of you for the many interesting insights shared today.
And I will say, in the last minute, as Janine alluded to, CMU Libraries does a lot of work to support open science. It's always a bit challenging to get that word out to everyone on campus, so I do encourage people to check out our resources and get in touch with us; we're creating new types of services and support around this type of work, which is evolving pretty rapidly now. Again, a huge thank you to our panelists, our moderators, and my colleagues behind the scenes who are helping out. A lot of the themes you touched on today we will be talking about at the Open Science Symposium on November 3 as well; we have a whole panel dedicated to this issue of promotion and tenure. So if you're in the audience and you want to hear more about any of this, I encourage you to sign up for our virtual symposium. With that, thank you so much for attending, and we will send out the recording soon. Thank you.

Yeah, thank you.