I'm happy to introduce our next presenter, Limor Peer, presenting lessons from a computational reproducibility study.

Thank you, hi everyone, good afternoon. I'm Limor Peer, from the Institution for Social and Policy Studies (ISPS) at Yale University, and I'm presenting today also on behalf of my co-authors, Simone de Paulis and Nika Tony, who are PhD students at Yale. Before sharing some lessons from our study, I'm going to give you a little bit of background and tell you about the reproducibility checks that we do at ISPS. I will also make these slides available after the conference.

I put this slide up for context and definitions, to orient us to where we are and what I'm actually talking about. I'm talking about computational reproducibility: do you get the same output with the same data and the same analysis? Earlier today we've heard this referred to as outcome reproducibility or non-deliberate irreproducibility. So I'm talking about computation, and I'm focusing on code. This chart is also a nice reminder of the path we've been on with open science: questions of access first, and questions of reproducibility as the next frontier.

Really quickly: ISPS was founded in 1968 as an interdisciplinary center at Yale, meant to help the social sciences do policy and other applied research. Since 2011, ISPS has committed to publishing data and code, and to ensuring, before doing so, that the data and code are usable and understandable and that the results are reproducible. So I'm talking about an internal process for reviewing the research compendium, which includes the data and the code. Most of the research we handle is experimental, including field experiments and survey research, and including the article that was mentioned yesterday in the talk on internal validity, the Gerber-Green-Larimer study on get-out-the-vote. So we have the data and the code, if you're interested in that.

Why do we do this? We do it because oftentimes, when you use other people's data and code, they're not fully usable or understandable, and this causes friction for the researchers who want to use the material. As I said, I'm focusing on code, and to explain a little further, these are scripts used primarily to analyze data, mostly in R or Stata in our case. What we see is that most of the time the code has some bugs and errors. These are unintended, they cause friction, and in our opinion they do not reflect on the quality of the research itself. Research friction is a function of the skills of the code producers and also of the code users, the consumers. When the code that is released is not clean, and we're not necessarily talking about professional coding standards, just clean code, it makes it difficult for other researchers to use. The main point of this slide is really just to highlight that researchers are not coders, and maybe we shouldn't expect them to do everything exactly the way developers would.

So what do we do about that? We do code review, and when we talk about code review, we're talking about maximizing research utility. At ISPS we have a team that supports the researchers in optimizing the data and code that we share for maximum usability. Again, this is an internal, technical review of the code.
We do this to help realize the full potential of open research, to boost the resilience of open scholarship, and to contribute to new research.

Okay, so how do we do this? As I said, this is a technical review. Our goal is to make researchers' research outputs usable, understandable, and FAIR compliant. Generally we do a bunch of data curation that I won't talk about today; focusing on the code, what we do is either fix the problem, suggest a fix, or suggest an option for the researchers to fix the problem themselves. To do this, we use custom-made software that we call YARD, and it guides the process. It allows some iteration within the system, so we can talk to the researcher and fix some of these problems. At the end of the review process, we generate a detailed document that we call an author report. It includes those other sections I mentioned, information about the files, the documentation, and the data, but today I'm focusing on code.

Focusing on code, here's an example of the feedback an author might get: what worked, what didn't work, what exactly we found, and what we mean. Again, the ISPS team works with the authors to remedy the errors before we publish and archive with the ISPS Data Archive. So what is made public in our archive has already been tended to. As I said, this is an iterative process that results in better materials. I'll mention a couple of other things that are important. The ISPS team intercepts these materials at various points in the research lifecycle. Ideally, we want to see them as soon as possible, even before submission to a journal. Another great time is at the revise-and-resubmit stage. But in many cases we find, especially recently, that these materials have already been deposited or published with journal repositories, and we just work on a copy.

So, really quickly, about our study: we took 26 author reports that were produced over a period of two years. Again, each report reflects the first pass at running the code. We conducted a structured content analysis using a coding schema developed by the Odum Institute, and we looked at the comments that were made about the code. We then did some descriptive statistics. What we find is that about 31% of these author reports include comments about code execution errors. What do we mean by these errors? Typically they are small, as was also mentioned in a previous presentation: things like typos, file paths, and packages that are not updated. We also found, in about 38% of our author reports, output discrepancies that we flagged for our researchers. Sometimes they're minor, oftentimes very minor, like rounding errors, or manuscript errors like transposed table columns. Sometimes there are aesthetic differences that have to do with the colors of graphs and things like that. Again, we work with our researchers to fix these issues and to produce a better replication package or research compendium. (Two small, purely illustrative sketches of what these fixes look like follow below.)

So, in summary, we expect to see some noise in the code. We think that code review helps reduce this kind of friction for future users, and we think that enhances computational reproducibility.
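To give a concrete flavor of the execution errors just described, here is a minimal sketch in R. It is not taken from any ISPS package; the file names and paths are hypothetical, and it simply shows the most common kind of fix, replacing an absolute path with a project-relative one and recording the computing environment:

    # Minimal sketch of a typical execution-error fix (file names and paths are hypothetical).

    # Before: an absolute path that only resolves on the original author's machine
    # df <- read.csv("C:/Users/author/Dropbox/project/data/survey.csv")

    # After: a path relative to the project root, so the script runs on any machine
    library(here)
    df <- read.csv(here("data", "survey.csv"))

    # Recording the environment (R version, package versions) helps future users
    # reproduce results even after packages have been updated
    sessionInfo()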
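Similarly, many of the output discrepancies we flag are just rounding at the level of reported precision. A hypothetical check, again a sketch rather than our actual workflow, and with made-up numbers, might look like this:

    # Sketch of checking a re-run result against a published value (numbers are made up).
    set.seed(42)                 # fixing the seed makes simulation-based results repeat

    published  <- 0.137          # coefficient as reported in the manuscript
    reproduced <- 0.1371         # value obtained when re-running the script

    # Treat differences smaller than a rounding tolerance as a match
    all.equal(published, reproduced, tolerance = 1e-3)   # returns TRUE here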
For my last slide (okay, I'm good on time), I just want to offer a few big-picture observations or thoughts, both from this particular study and from about ten years of experience doing this sort of work.

The first thing I'll say is that I think these are the expected growing pains of open research. We're moving from access to usability. I'm not saying that we've solved the access problem; we still know that in many cases we don't get data or code from authors. But when we do, we oftentimes have a usability problem. So I think this is a work in progress, and growing pains can be expected. I don't think it's necessarily the researchers' fault, but I think we need to have them think about it in a new way, now that they expect to be sharing their materials.

The second point is about reducing friction, again, making these materials more usable and more understandable. We need to imagine how these materials will be used, anticipate the problems that future users might have, and try to remediate those problems ahead of time if we can.

The third point is about specialized skills. Researchers are domain experts; they're experts in their domains, but they don't know everything. They're not professional coders, as I mentioned, and they also don't know data curation or data management the way experts do. So we should think about working in teams and with other research support functions to help make the data and the code more independently understandable and more FAIR, and to do that alongside our researchers.

And finally, shifting the burden. We have a lot of these research compendia, a lot of these packages of data and code, that are already out there. Any researcher who tries to use them will encounter these problems, and there's a cost to that: they spend time cleaning the materials and figuring out what's going on. I'm thinking that maybe we want to shift that burden from individual researchers to more professional support functions, whether community or institutional, and we've heard today about several of those, to enable researchers to make more seamless use of research outputs. So that's what I have for today. Thank you. Looking forward to questions.

We have time for about two questions.

File path management is the bane of all of our existences. But one question I had: sometimes the ideal is that we want the replicator to just point and click and get the results. But sometimes, when you have these little errors, it forces the replicator to get elbow deep into the code and really understand what it's doing, and you learn something from that. So I don't know, sometimes I worry that we can automate things so much that we're not really understanding what the replication is doing. So anyway, yeah.

Yes, so the question is, I guess, whether there is value in encountering these sorts of errors and dealing with them. I'll just say that I know our materials are used in teaching a lot, and I guess two things. One is that these materials are never going to be entirely error-free; as time goes on, we will encounter more problems. We also have a paper on active maintenance where we try to say that you have to come back to these things every couple of years to see if they still work. So I think there are always going to be these issues.
But more generally, I think the main value, and I do think this is better than just an automated code run, is to let researchers actually open that code file, look into it, and see what's going on, and hopefully not encounter errors that frustrate them and stop the code from running. But yeah.