Yeah, I warmly invite you to be here and listen to our speakers. So I would like to invite Dan Hamilton to the stage, and we need his slides on the screen. Dan will be talking about the frequency of data and code sharing in medicine: final results of an individual participant data meta-analysis of meta-research studies. Welcome Dan, the floor is yours.

OK, hopefully everyone can hear me OK. I apologize in advance: I've lost my voice since coming over from Australia, so you'll have to bear with me a little bit. My name is Daniel Hamilton, I'm a PhD candidate at the University of Melbourne, Australia, and it's my great pleasure to be here today to present the final findings, and particularly the challenges, of performing a systematic review and IPD meta-analysis of meta-research studies that looked at data and code sharing in the context of medicine.

Just before I jump into the review, I thought I'd be a bit cheeky and prime the audience by asking a couple of questions to see what you think. Let's see if this works. What percentage of articles, that is, meta-research studies that empirically studied data sharing in medicine, do you think shared their own data? And what percentage of these studies do you think were computationally reproducible when you got your hands on their data? These are things I'll come back to.

Just to briefly recap what the review was about, and I'll do this briefly because we did preprint the results: we did a systematic review where we collected all the meta-research studies that looked at data and code sharing in medicine, and we were primarily interested in two outcome measures. The first outcome measure was declared data sharing, which we defined as the proportion of medical articles that declared that their data or code was publicly and immediately available, or available on request. The second outcome measure we termed actual availability, which we defined as the proportion of articles that were verified by the meta-researchers as having indeed shared their data and code, or as having shared upon request. We published a protocol, and we've also posted the findings as a preprint on MetaArXiv.

I'll go over the results quickly because I think they're less interesting. The key things we learned were that, historically, success rates in obtaining data and code from medical researchers have been less than 40% and 25% respectively, and that this is very context dependent: the type of data, the type of study. Unfortunately, public data and code sharing in medicine is still uncommon. However, we did note that declarations of public data and code sharing are increasing over time, from roughly 4% in 2014, we estimate, to 9% in 2020. What we also learned, or what we think, is that when people say their data is publicly available, that doesn't always seem to be the case: there appears to be a gap between the number of people who say their data is available and what you actually find when you go to have a look.

So that's the review. Today, given the audience, I thought it might be more interesting and more informative to talk about the challenges of performing such a review, so I'd like to go through five challenges that we experienced in performing this systematic review of meta-research studies.
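To make the two outcome measures described above concrete before moving on to the challenges, here is a minimal sketch of how they might be computed from an article-level dataset. This is purely illustrative: the table, the column names (declared_available, verified_available) and the values are hypothetical, not the review's actual data or code.

```python
# Illustrative only: computing "declared data sharing" and "actual availability"
# from a hypothetical article-level table.
import pandas as pd

articles = pd.DataFrame({
    "article_id": ["a1", "a2", "a3", "a4", "a5"],
    # Did the article declare its data/code to be publicly available or available on request?
    "declared_available": [True, False, True, False, False],
    # Could the meta-researchers verify that the data/code was actually obtainable?
    "verified_available": [True, False, False, False, False],
})

declared_rate = articles["declared_available"].mean()  # outcome 1: declared data sharing
actual_rate = articles["verified_available"].mean()    # outcome 2: actual availability

print(f"Declared data sharing: {declared_rate:.0%}")
print(f"Actual availability:   {actual_rate:.0%}")
```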
The first challenge was conducting the literature search; the second was obtaining data for the review; the third was performing our data integrity checks; the fourth was processing data for the review; and the last was dealing with redundant assessments across meta-research studies.

In terms of the first challenge, we worked closely with an information specialist at Cochrane to design the search strategy. What we noticed straight away is that the studies we knew would be eligible were often referring to themselves in very different ways. For example, some referred to themselves as meta-reviews, others as meta-epidemiological studies, cross-sectional studies, and so on, which complicated the design of the search. At the time, in July 2021, there didn't appear to be any obvious or reliable controlled vocabulary for us to use, such as MeSH terms in MEDLINE or Emtree terms in Embase. So it wasn't a surprise to us that we found about half as many articles through our gray-literature searches as we did through our main database searches. The things we took away from this were questions: is there a standard term or phrase we could use in this space, and would meta-science as a field actually benefit from a controlled vocabulary? If there are any information specialists in the room who have thoughts on this, or who are aware of a suitable controlled vocabulary, I'd really be interested to hear about it.

In terms of obtaining data: after we did all of our article screening, we found that 114 studies were eligible for the review. We then looked into these 114 studies, which, again, are studies that looked at data and/or code sharing in medicine, to see how many of them shared their own data. We found that 70, or 61%, shared all of their data publicly, that is, all the data we needed for the review. Twenty, or 18%, shared some data but not all that we needed, and when we asked for the rest, we were able to get it from nine of the 20. The remaining 24, or 21%, didn't share any data at all publicly; 11 of them shared with us when we asked, four were not applicable, three stated up front that they couldn't share, and for one we couldn't source contact details. Overall, of the 40 requests that we made, 20 were successful. We also found that when we asked for data, the median time to receive what we wanted was seven days; however, it ranged up to almost two-thirds of a year, which is not ideal.

In terms of the third challenge, we performed a series of data integrity checks, and the one I'll talk about today is simple computational reproducibility. We looked at the reported percentage of articles that each study said had shared data or code, and then we tried to reproduce that number from the dataset we had our hands on. We had a hundred datasets to check. For two of them there was a mismatch between the report and the data, due to very simple typographical errors. Two had unclear data filtering steps, which meant we couldn't reproduce the findings. We discovered one coding error, which was due to a trailing space issue. And we weren't able to reproduce one finding because, it turned out, we had received an incorrect version of the data. That works out to a 94% reproducibility rate, which was great.
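As a rough illustration of the kind of computational reproducibility check just described, the sketch below recomputes a reported data-sharing percentage from a study's shared dataset and compares it with the value stated in the paper. The file name, column name, coding scheme and tolerance are all hypothetical; the whitespace normalization step simply shows how something as small as a trailing space can break a count.

```python
# Illustrative only: compare a reported percentage with one recomputed from shared data.
import pandas as pd

reported_percentage = 12.0  # hypothetical value stated in the article

df = pd.read_csv("included_articles.csv")  # hypothetical shared dataset

# Normalize free-text coding before counting; a stray trailing space
# ("yes " vs "yes") is exactly the kind of issue that broke one check.
shared = df["data_shared"].astype(str).str.strip().str.lower()
recomputed_percentage = 100 * (shared == "yes").mean()

if abs(recomputed_percentage - reported_percentage) < 0.05:
    print("Reproduced the reported value.")
else:
    print(f"Mismatch: reported {reported_percentage}%, "
          f"recomputed {recomputed_percentage:.1f}%")
```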
In terms of the fourth challenge, by far the biggest challenge we had was that, in some cases, the only information we had about the articles the meta-researchers studied was the title. This is a snippet of one of the datasets we included, and obviously, without any further information, like a DOI or journal and pagination details, it's quite difficult to figure out which paper a title alone refers to. So we were a bit stuck in some circumstances, though it was salvageable in others, which I won't go into today.

The other issue we experienced is that data was often over-processed. By this I mean that important outcome measures we were interested in were often dichotomized into yes/no. For example, the number of studies that said their data was publicly available and the number that said it was available on request were collapsed into a single "yes", which wasn't what we wanted. To a much lesser extent, we also dealt with data being divided across several spreadsheets, in one instance up to 20; data being corrupted or stored in proprietary formats; and some instances of data being color-coded, which was a lot of fun to figure out how to deal with.

So overall, some lessons we learned: if published research is your research subject, ideally you want to include a permanent unique identifier, or at least a unique identifier, so a DOI, or an ID from one of the databases you used to search for the articles, like a PubMed ID or Scopus ID. Also, consider your processing level when you're storing your data, consolidate your data, and store it in a non-proprietary format.

The last challenge I'll talk about today, which is probably the most interesting to me and the biggest surprise, is that we noticed that, of the 114 studies eligible for the review, many of them sampled the same articles, and some of them overlapped by 100%; that is, some studies looked at the exact same sample of articles. Just as an example, there was a preprint by Sumner et al. that looked at COVID studies, and 100% of their sample was subsequently looked at by Strcic; both of their samples were then looked at by McGuinness and Sheppard; all of their samples were looked at by Collins; and then the text miners came in and looked at everybody's samples. That raised a really important question for us: if five meta-research studies are looking at the same article, whose assessment do we use, and who do we trust? That's something we had to think about. It also raised the thought that this is yet another important reason why meta-researchers should ideally share their data, and it's an important factor to consider when performing reviews of meta-research studies as well. A small sketch of what such an overlap check could look like, given persistent identifiers, follows below.

So just to round it all up with some concluding thoughts. Having performed a systematic review and IPD meta-analysis in this space, it was disappointing to see that many of these studies did not share their data. I didn't have time to cover it in this talk, but many of the studies that did share their data publicly were also not FAIR compliant, or at least not compliant with basic FAIR principles. It would be good to see us lead in this space and set a good example for other fields. However, in contrast, it was really encouraging to see that only a minority of studies were irreproducible.
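As mentioned above, here is a minimal sketch of how overlap between study samples might be checked when each study's dataset carries a persistent identifier such as a DOI or PubMed ID. The study labels and DOIs are made up; without such identifiers (for example, with titles only), this kind of check becomes much harder.

```python
# Illustrative only: pairwise overlap between the article samples of
# meta-research studies, keyed by DOI.
samples = {
    "study_A": {"10.1000/x1", "10.1000/x2", "10.1000/x3"},
    "study_B": {"10.1000/x2", "10.1000/x3"},
    "study_C": {"10.1000/x3", "10.1000/x9"},
}

for name_a in samples:
    for name_b in samples:
        if name_a < name_b:  # visit each pair once
            overlap = samples[name_a] & samples[name_b]
            pct_of_b = 100 * len(overlap) / len(samples[name_b])
            print(f"{name_a} vs {name_b}: {len(overlap)} shared articles "
                  f"({pct_of_b:.0f}% of {name_b})")
```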
And I can say that the interactions I had with the meta-researchers on this front were fantastic, barring a few edge cases. People were really good to work with, really humble; it was really great. The last two things I'd leave you with: first, more generally, when planning, reporting and archiving your research products, consider the secondary users of those products as well. And second, for anyone in the room who is considering doing something similar, that is, a review of meta-research studies, think about potential overlap. I was reading a paper the other day about trial non-publication; they included all these studies that looked at non-publication rates in ClinicalTrials.gov, but they didn't actually look at the potential overlap between those studies, even though the studies covered similar samples over similar time frames in ClinicalTrials.gov. So it could be something important to think about moving forward.

I'll leave it there. I think I've come in right on time, which is great. Here are my contact details, and I've posted a link to the preprint and the protocol if you're interested. Thank you very much for your attention.

We have four minutes left for Q&A, so please use the microphones on both sides of the hall to ask your questions.

Hi, Mitchell Tillman from Stevens Institute of Technology. Great talk, by the way. What do you think are the biggest obstacles to data and code sharing? The very low rates were quite surprising; what do you think is the reason for that?

Is that specifically in a meta-research context or a medical context?

I'm in the life sciences, so I guess in the life sciences context.

Okay. In terms of the meta-research context, I honestly believe people maybe just didn't think about it, and I'm guilty of this too. A lot of the time, when you contacted people to ask, the response we got was "oh, sorry, I didn't even think about it." At the next level down, at the medical level, there are obviously heaps of barriers, and they have been super well studied. In the context of medicine, the common ones that get thrown out there are navigating confidentiality when you're working with human participants, lack of incentives to share, and considerable fears around, basically, loss of academic productivity, that it's going to affect your publications and your grants and things like that. But in terms of the meta-research, honestly I think people just didn't think about it, or maybe they didn't think it would be useful to others. So it was a bit of a strange irony, but I don't think data was intentionally withheld. Where we couldn't source data from the people who didn't share, most of the time it was simply because I didn't get a response, not because people were intentionally trying to withhold it, I think, for the most part.

Thank you.

Hi, I'm John King from the National Institute on Aging, NIH; I have a really long title, I'll admit. Really interesting stuff. One thing I wanted to point out is that I'm hoping, well, NIH I guess is hoping, that the recent introduction of data management and sharing plans in the grants themselves will at least address some of what you point out, which I think you nailed.
People didn't even realize, of course, despite the fact that they're in the field, that somebody else might look at that dataset. And what we found preliminarily, looking back at whether people's data management plans would have been appropriate given our new rules, is that almost none of them were. Some of them included some very simple oversights, like: you're generating some vast amount of data; do you know how much data that is and where you're going to put it? And this is at the outset; they know they're going to collect a vast amount of data, and they didn't even put that down. So it's sort of surprising that unless you're forced to check the box, in some cases you don't even remember that there's a box there. Anyway, if people have an interest in NIH data management and sharing or any of that kind of stuff, I've got nothing better to do where I am. But thanks so much.

It was interesting, I was reading the NIH policy, and obviously a lot of funders now require data management plans, but fewer funders will check compliance with them. I do notice, I'm pretty sure, that in the NIH update they say they are going to, and there is a risk that if you don't comply with the data management plan you may have your funding pulled, or it may affect future grants.

It's the same with Horizon Europe; they did that too. We'll need to have a data management and sharing plan that we approve at the outset, and then it's going to be a checklist item on each annual renewal. So yeah, it's going to be a bloodbath the first two years, and then we hope it will resolve into something better.

Thank you very much for the excellent questions. I encourage you to continue during the lunch break, but we need to move on with our next speaker, which is Kevin Astelis.