All right, next I'm happy to introduce Lars Vilhuber, talking about TRACE: Transparency Certified, trusting computational research without repeating it.

Thank you for moving that chair. Thank you, and the previous speaker has made my job a lot easier, since she's already covered some of the premises of what I'll be talking about. So we're talking here again about computational reproducibility. We're going to narrow it down to some extent, but we're also going to open up the box and ask under what circumstances we should actually expect that to work, not just file paths, but some other things as well. This is joint work with a bunch of super competent folks at Illinois and at the Odum Institute. All the errors are mine, even though I'm not responsible for actually computing some of these things.

We've already mentioned the research compendium; I'm going to focus on the piece around that first step: how trustworthy, how reliable do we actually think it might be? The research compendium is the article, the statistical theory, the computer code, the data, the computational environment. All of these pieces play together to allow us to make all sorts of inferences, and every piece has a role. Some of those we try to assess during peer review; some are questions for the literature itself and for future users to think about.

Narrowing it down to the role that, for instance, I play: I'm a data editor for the American Economic Association. We have eight journals, and we go through this process ahead of publication. One of our concerns is to get the simple things out of the way first. Even before thinking about file paths or anything like that: do I actually have everything, and what does "everything" actually mean? So completeness is one thing. Then, what are the conditions to actually make it work? Do I need to worry about which version of Stata or Julia, which GPUs, or some other requirement needed to actually run this? In the physical sciences that might be a specific piece of equipment; in the social sciences we're typically talking about computational requirements. And once I've combined all of those ingredients, do the code, the data, and the computational environment actually produce what's in the paper?

In a very pedestrian way, that's what we do, and it's what other journals do in part as well: figure out whether we've got everything, and before publication, run it all again. That's inefficient, it's expensive, and it's not always feasible. So what if we can't do that? What if it takes a long time to produce the result? What if it takes an extremely long time just to accumulate the resources to do this kind of thing? What if the data is transient, or very large, or something else? That happens quite a lot, and it's not just about accessibility of data. If I actually have to run on a bunch of GPUs in a high-performance computing center, I might have difficulty getting access to that in a reasonable amount of time before the paper comes to publication.

So, starting from the idea that this is not an impossible test, because others have obviously done it, and in particular the researchers claim that they've done it, we took a step back and asked: what do we actually need in order to believe that the original execution happened as claimed? How can I trust that?
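A minimal sketch, not the project's actual tooling, of what a machine-readable description of the computational environment could look like: the kind of "which version of what, on which hardware" record discussed above. The field names and the output file name are illustrative assumptions.

```python
# Illustrative only: capture a basic description of the execution environment.
# Field names and "environment.json" are assumptions, not the TRACE format.
import json
import platform
import sys
from datetime import datetime, timezone

def describe_environment() -> dict:
    """Collect a basic, machine-readable description of the execution environment."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "python_version": sys.version,
        # In practice one would also record Stata/Julia/R versions, GPU models,
        # container image digests, etc., depending on what the analysis needs.
    }

if __name__ == "__main__":
    with open("environment.json", "w") as f:
        json.dump(describe_environment(), f, indent=2)
```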
Believing that the original execution is trustworthy implies some premises about what we're going to be checking for. Does the system have internet access, or could it pull down stuff? Can the authors interact with the system while it's running, or can they not? Those are pieces of the trustworthiness of the original execution. The rest is relatively simple: now that I have all the ingredients, the description of the environment it's running on, and the output that was generated from it, can I package it all up, sign it off, and say, here's a trustworthy package, you don't need to run it again, what you see is actually what was produced?

Now, under what conditions can I do that? I already mentioned internet access and other things. User interaction is a no-no in this scenario, because then I've got something that is not documented. So our goal was to conceptualize the whole thing: how to package it and how to integrate it with existing systems. A very important piece of that is that it needs to be implementable at scale, in many different systems. At least looking at the journals we have, we've identified a few areas where we think this can work. Think about the Federal Statistical Research Data Centers in the United States: they have an internal, well-described system in which the code gets executed. Nobody can access the data in an easy way; it takes months to get in there. Do they have a system where we can do this? Maybe not there yet, but there are German, Norwegian, and Canadian systems where we know how to do that. High-performance computing centers at universities are another candidate. They have a job queue; can we implement this as part of that job queue, where you say, okay, I'm done with all my preparatory analysis, now I'm going to send it off to that queue?

To demonstrate how all of this works, we have a proof of concept. Obviously, in the proof of concept we're not talking about a terabyte of data and 250 GPUs; we've got a very simple example. If you have time to take a picture, that QR code will take you to the actual sample. We did this using a GitHub workflow as a trigger. We send it off to our TRACE-compatible system, which builds the environment and executes the code, so now we have a description of the entire system that ran. It then identifies content: what am I shipping into the system, and what is coming out of it? It's not just a matter of zipping up whatever comes out at the end; I also need to know what was there before. And because we're talking about big data or confidential data, whatever was there at the time the code ran that I can't put into the final package also needs to be documented.

So now we can put together a list of all of these pieces: what is the system, what is the input, what is the output, what is not in the output, what has been subtracted? We zip it all up into something with a whole vocabulary that we're putting behind it, and we sign it. For now we've got a PGP key, but who knows what key infrastructure we might have in place later. Then it can be posted to an institutional repository, to be picked up or referenced by journals as at least one piece of that entire workflow. Does it solve all the problems? No, it doesn't. If I'm going to reformat the table, the table produced by this might not be the one that's actually in the paper.
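A minimal sketch of the packaging step just described, under assumed conventions rather than the actual TRACE vocabulary: hash the inputs present before the run, hash the outputs afterwards, record what was withheld as confidential, write a manifest, and sign it with a local PGP key. All paths, field names, and the key setup are hypothetical.

```python
# Illustrative packaging step: inventory inputs/outputs, note withheld files,
# write a manifest, and produce a detached PGP signature. Not the TRACE schema.
import hashlib
import json
import subprocess
from pathlib import Path

def sha256(path: Path) -> str:
    """Return the SHA-256 digest of a file."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(inputs_dir: Path, outputs_dir: Path, excluded: list[str]) -> dict:
    """List inputs and outputs with their hashes, and record what was withheld."""
    return {
        "inputs": {str(p): sha256(p) for p in sorted(inputs_dir.rglob("*")) if p.is_file()},
        "outputs": {str(p): sha256(p) for p in sorted(outputs_dir.rglob("*")) if p.is_file()},
        "excluded_from_package": excluded,  # e.g. confidential or very large data
    }

if __name__ == "__main__":
    manifest = build_manifest(Path("inputs"), Path("outputs"),
                              excluded=["inputs/confidential_microdata.dta"])
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
    # Detached, armored signature with a local PGP key; a different key
    # infrastructure could be swapped in later, as noted in the talk.
    subprocess.run(["gpg", "--detach-sign", "--armor", "manifest.json"], check=True)
```

A journal or repository could then verify the signature and compare output hashes against the files it receives, which is the "you don't need to run it again" part of the argument.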
But we can address that: you just reference this trusted package, and then you can put the fully open code with the fully open outputs alongside it, because the reformatted table must have come from somewhere. So if you want to give it a try, go follow those links. We're also going to be doing a survey of stakeholders in this kind of analysis, to figure out what questions we should be asking. I've mentioned a few that we care about a lot: does the user interact, is there internet access, what kind of software was there, what data is not there? But maybe we've missed some questions. We're focusing initially on the social sciences, thinking broadly about political science, economics, and so on. My specific domain is econ, so the initial questions will be biased towards that, but we'd like to make it as broad as possible.

So the idea is that we can convey the trustworthiness of the original computational task. We don't have to redo it, but we still get the benefits of the kind of review that data editors and others do. It should be simple to implement. The primary audience is journals, which will be consumers of this to enhance the reliability of the artifacts they publish. The secondary users are other researchers, who will be able to say: this actually ran. And knowing that it actually ran and is complete, maybe it's worth my while to invest in getting it to work again as well. So that's our point. Here are all of our names; contact any of us. That QR code will take you to our current project site, which will obviously be evolving over the next couple of months. Thank you.

Do we have time for a few questions? Sure. Thank you very much.