Oh, okay, I guess we can get started. So thank you all for coming and for joining this panel. I'll be serving as moderator, I'm Cassio, and we have four panelists here. In the order they appear on my screen: we have Dr. Etienne Roesch from the University of Reading, who serves as an editor for the ReScience journals, ReScience C and ReScience X. I think he will tell us in more detail what these are, but they are journals that focus on papers about reproducibility, on replicated work, right? So I'll let him handle the details later. Keeping on with the introductions, we also have Dr. Anna Krystalli, who is at the University of Sheffield, if I remember correctly, and she organizes reproducibility hackathons, ReproHacks; I think we will hear a little bit about those. We also have Dr. Sergey Frolov from the University of Pittsburgh. He is an experimental physicist, so a quite different field here. As I hear, he also does some reproducibility work, reanalyzing available data from previous experiments done by other researchers, and he has had some interesting experiences in the past two years. I don't know if he'll talk precisely about that today, but we are very excited to hear about it. And Dr. Stephen Eglen is also here with us. He is one of the co-founders of CODECHECK, which is a platform for peer reviewing code, I guess that's a simple way of putting it, and I guess he'll also tell us a little bit about this platform. I hope we can then discuss different approaches to engaging people in actually reproducing research and papers, which is definitely not trivial and requires some engagement. So I hope we can have a very fruitful discussion today. I think we should start, and let us go in the order of presentation I just gave. If you're ready, then please, Dr. Etienne Roesch, it's over to you. You can call me Etienne. So I represent the family of journals called ReScience. We currently have two venues. ReScience C is the oldest; it's an academic journal for computational work, where authors reproduce other people's code and other people's results, and we peer review this work and publish it on that platform. It's very much about code and analysis, basically. And recently we launched ReScience X for experimental work, where people can publish reproduced experimental work, physically. Hey, that was short. Thank you. So let's go ahead with Dr. Anna Krystalli, please. I think she has some slides for us. Yeah, OK. Go ahead. I was given seven to ten minutes, so I decided to actually use some slides and maybe give a bit more of my personal background as well, to give some detail of where my perspective is coming from. So I'm going to go ahead and share my slides now. It feels a little bit overkill now, but we've got an hour and a half, so I can take a little bit of it. OK, so everyone can see my slides? Yes. Excellent. So, yes, hello, everyone, from Sheffield, and thanks for inviting me to participate, Cassio. So I am Anna Krystalli. I'm a research software engineer here at the University of Sheffield, where our teams help researchers do more with their code and data.
You can find me in various locations on the internet, but I'm probably most active on GitHub. And seeing as we're going to have an hour for in-depth discussion of our topic, I just wanted to take this opportunity to tell you a little bit more about my background and the projects I'm involved in, which will hopefully give an idea of where my perspective is coming from, and where my areas of expertise lie and where they don't. So my background is actually in marine ecology. And, yes, boats are really fun. They're hard work, but they are fun. But in all honesty, in my PhD I didn't actually get much time to collect data on boats. Instead, I was a complete data parasite, working with long-term plant and survey data and satellite data, and I spent most of my time working in R, trying to get hold of, process, combine and analyze complex data. I got to make lots of maps, and I really fell in love with R and data science and programming in general. And it became very clear to me that, after my PhD, this was the most exciting thing I'd learned and this is what I wanted to focus on. But even before my PhD, there were a couple of experiences that were quite formative for what I would want to do afterwards. The first was being a quality assurance auditor for a contract research organization. Their research was governed by good laboratory practice regulations, so they took quality assurance really seriously, and our unit would go into active research and inspect it while it was ongoing, audit final reports and raw data, and feed back our findings to management. The experience taught me that human error is pervasive even when you do very standardized research; that inspecting people's work is quite a delicate business; that finger pointing and shaming people doesn't really work; and that it's far better to focus on system-level solutions and make it harder to make mistakes and easier not to. Then for my next job, I worked for an extreme sports equipment distributor. This taught me, the hard way, the importance of data management, because having to explain to one of your dealers that an item that showed one in stock, which you promised them for next-day delivery, was in fact out of stock is really awkward. The direct and immediate consequences of that database error were a really strong incentive for me to physically check edge cases and think about how things can go wrong. Something I find in science, by contrast, is that being a little further away from immediate consequences, with no one directly checking what we're doing, has the opposite effect on the incentive to check often. Now, lots of these experiences and interests came together when I was finishing my PhD, and I finished at a time when research had for some time been becoming more computational, the reproducibility crisis was in full swing, and there were increasing calls for opening up science. At the same time, there was growing appreciation that the skills required to tackle some of these issues were consistently being lost from academia to industry. This is really what the research software engineering position came about to address. I was lucky to join the team in Sheffield soon after it was established, and our team supports researchers through software engineering and consultancy.
It advocates for better practices and it promotes capacity building through training, all of which completely fits the type of involvement I wanted to have with academia. Another relevant project: I'm also an editor for rOpenSci. rOpenSci helps develop R packages for the sciences via community-driven learning, review and maintenance of contributed software in R. The review process is actually what really underpins and brings together the community. What I really enjoy about this particular review process is that it's generally very productive and rarely combative. It is carried out in the open, and the community is generally very welcoming. But I think the tone of the review process is more a result of the incentives involved: reviewers represent potential users, and rarely competitors, of the software that they're reviewing, which means that the incentives for authors and reviewers often align, in that both want software that's as well-functioning and user-friendly as possible. And now probably the most relevant project to today's topic, which I'm likely going to draw on most in today's discussion, is the ReproHack project. Briefly, ReproHacks are one-day hackathons where participants attempt to reproduce a paper from its associated code and data and feed back their experiences to the authors. They provide a sandbox environment for practicing reproducibility, both as a creator and as a user of such materials. And I guess this is a good place to point out that when I say reproducible, I mean getting the same result using the same code and data. So a typical ReproHack event involves inviting authors to submit papers for review in the lead-up to the event, which hopefully generates an interesting paper list. On the day, participants choose a paper they want to work on and then spend the rest of the day attempting to reproduce it. We regroup throughout the day to share experiences, and we've also had relevant talks in some of the remote events we've run, but the most important aspect is that by the end of the day, participants provide feedback to the authors. So participants get practical experience in reproducibility with real materials, which they can implement in their own work, while authors get valuable feedback and validation from others engaging with materials they have often put a lot of effort into creating. So far we've had good feedback from participants and authors alike. We feel we've got a successful format now and are getting increasingly more requests for advice and support from others wanting to run such events. We've shared the infrastructure we've been using and made it as reproducible as possible, but admittedly it's not as straightforward as we'd like. So we've spent a lot of time recently developing a hub for our activities. This will include a central paper list, a place for organizers to administer events and for participants to view papers and submit their reviews, and we're hoping it will really simplify the logistics of running an event and open the activity up to more people. If you're interested in ReproHacking, at the minute you can join us on Slack or follow us on Twitter. Consider hosting your own event, submit one of your papers or join one of our events; there will be one next month to celebrate the launch of the hub. And at this particular moment, you can also help by testing out the hub.
We're still putting the finishing touches on the dev hub before we move to the live one, which will eventually be at reprohack.org. So feedback is welcome. Coming to the end, I just wanted to close by summarizing some of the key takeaways from these experiences, and from working in this space for a few years now. I feel we've definitely made progress in making the case for reproducibility and transparency, and more code and data is definitely being published. But I do feel there's still some way to go to ensure the materials are fit for purpose. I think that's partially because we don't have a clear definition of the expectations for such materials, and that will be necessary if we really want to teach, review and ultimately be able to reuse them. At the minute, we also don't formally engage with such materials or practice producing or using them. Once we do define expectations, we can start reaping the benefits of convention by producing tooling, automation, templates, et cetera. And finally, I feel ReproHacks can help with all of this, because not only do they provide a low-pressure environment to build capacity, they also allow us to evaluate different approaches to reproducibility in terms of fitness for purpose. So thanks again to Cassio for inviting me. And I just wanted to extend the thanks to the ReproHack core team as well, and to the N8 Centre of Excellence in Computationally Intensive Research, the Software Sustainability Institute and the RSE team at the University of Sheffield for their sponsorship and support of the project. Sorry, Etienne, I took some of your time. And that's me. Thank you, Anna. I already have some questions, but I will just move on to the next panelist so we can then start the discussion. If the audience has any questions, you can submit them to the Q&A box at any time and we will pick them up later; we have plenty of time for some nice discussion later on. So moving on to Dr. Sergey Frolov. You are on mute. So I also whipped up some slides, because that seemed to be working great for Anna and I had some almost ready to go. So yeah, my name is Sergey. I am an experimental physicist. Now, that can mean a lot of different things. You may think particle physics, like a big collider in Switzerland or in Chicago; I'm not that kind of physicist. You may think telescopes; I'm not that kind of physicist. I have an experimental lab which is human-sized. Here you can see a picture of it, with machines which are cryostats, and I run a research group which is about 10 people. The subject of our research is quantum physics, also known as condensed matter physics or solid state physics. In very short, what we do is this: you all know the chips that are in your cell phone and your laptop. We take chips like that, but made of different materials, and we subject them to extreme conditions. In these machines you see on the screen, we can submerge them in ultra-low temperatures, millikelvin, so almost absolute zero, or we can apply extreme magnetic fields, a thousand times the Earth's magnetic field, the natural field we all live in. And under those extreme conditions, new physics comes out of these chips, these electronic devices: quantum physics. So we do basic research on this kind of physics. We're funded mostly by government, but there is strong interest in this for technology, and that technology is the quantum computer.
So you take a chip, an electronic chip similar to the one in your computer, you put it in the quantum realm, and maybe you can use it to store and control quantum information, and that can come with huge computational boosts for certain problems. So this is down the road, and I'm not building such a machine, I'm doing basic science, but there is this interest factor. And compared to Etienne, Anna, and also Stephen, I am actually like a little ant in the trenches. I'm just responsible for my own research group; I don't run a community effort, at least not at this moment. So I'm a case study, right? I'm a case study. I have done some reproducibility research recently, and it's related to this idea from the 1930s that there could be this bizarre, fascinating particle, the Majorana particle, that is its own antiparticle, and that it can be found in chips like the ones I'm studying. It all started with our paper, when I was a postdoc in Holland, where we reported sightings of possible signatures of this effect almost 10 years ago. You can see a transistor-like device, now in color, from a cover of Science from that time. So there is high interest in this. You can get published in Science and Nature, and if you just look at how the field took off: halfway across the screen is where the transition happened, from a low level where it was a sort of obscure interest within different fields of physics, to where it took off, and that's roughly where we entered the field. So it's a huge area, with thousands of papers being published. It is highly promoted in all kinds of news outlets. And, like I mentioned, it carries this promise of stable, powerful quantum computation, which is why Microsoft invested hugely in it. I don't get money from Microsoft, but other people do. So there's been a lot of pressure. It's a high-visibility area in which a lot of high-impact papers were published. We're doing similar research in my lab here in Pittsburgh, and at some point we got spooked by some of the results, because we were seeing something similar but were not arriving at the same conclusions. And there is no strict reproducibility in our field; we can talk about that later. But we came close to just not understanding how people arrived at their conclusions. So we asked for their data, and wow, we got blown away. There were big problems in other people's data. Personally it was also difficult, because it came from the group where I did my postdoc. So we got this paper retracted from Nature after a lot of complaining, and here you can see the timeline on the screen. This is a very painful and long process, and we are now engaged in a couple more like it, also with top journals like Nature and Science. Every time it is a huge personal effort that takes a lot of time and emotion, and also thinking about how to explain the issues. So I don't really recommend it, but we find ourselves in this trench. Here's an example of what we found. The bottom figure here is from that retracted Nature paper, and it makes a powerful case for the existence of these Majorana particles that I mentioned. But when we got the actual data from the authors, you can see that a segment between the magenta lines is actually missing. They cut it out and glued the scan together, and they also cut off pieces on the left and right. So it's interesting, because they used real data that they took with their instruments, right?
And then they just did some manipulations with it to hide imperfections and deviations from the theory. So they didn't make up data, they manipulated data. But another thing that they did, and I show it here with cartoons, is that they selected data in a non-representative way. And that's a big issue in our field: a paper is not the result of processing a set of data, which you could do with code so that you can rerun the code and see if it gives the same result. It is really just a little book with pictures that tells a story, and the pictures are individual data sets. You could have taken 10,000 data sets, and then you show five in the figures of the paper. So they're really just illustrations, and there is a huge trust that you put in the authors, right? That they selected the pictures for the paper in a representative way. So here in the middle, you see all these objects: they're semiconductor nanowires, and they grow in all kinds of directions, they have all kinds of spooky, weird shapes, but then you just focus on the perfect one in the middle and make that your figure one, right? And you can do that in real time, too: you start seeing that the data doesn't look good, and you look away from that regime. That is harder to detect. As far as community work goes, I think I'm going to do more and more as time goes by, but so far I have written these two things. One got published in Nature, one in Physics, which is an American Physical Society magazine, and I hope they gave people a lot of food for thought. In particular, in my case, I had been observing the reproducibility crisis from afar and I thought it doesn't concern us, the physicists, you know, the kings and queens of science, where we work with hard facts; data is data; you cannot fool a physicist. In fact, the reproducibility crisis is ubiquitous, it hits every field, and the more self-assured fields are probably hit the hardest. So in our case, what helps a lot is if people share data. And, you know, if you write a paper and you put the words 'additional data available upon request' in it, that's good, but don't just write it, share your data. There is now this platform Zenodo, created by the physicists at CERN, the particle ones with the big collider, which works very reliably, and they give you 50 gigabytes per record. So just use that, share all your data, just dump it there, and that's a good starting point for reproducibility in our field. So that's all I got. Thank you very much. I'm already full of questions, but I will hold on until Stephen has made his introduction. Full disclosure: I actually did my master's and PhD on the theoretical side of Majorana fermions, so I will have to hold myself back from going too much into the details. And actually, I once participated in a ReproHack, and I did some reproducibility work on Majorana code too. So it is actually not as far from one another as one might think; we are kind of close here. So next, please, Dr. Stephen Eglen. Do you have your slides? Should I share your slides for you? I should be able to, let me just see if I can. Okay, great. Sorry. Okay, can you see them now? Yes, we can see them now. It took some time to load, but we can see them. Thank you very much, and thank you for organizing this session this afternoon. I've just put the link to these slides in the chat, and I apologize in advance if I have to run off and drop my camera or whatever; I've got childcare and puppy issues this afternoon, unexpectedly.
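[Editor's note: as a concrete illustration of the "just dump it" Zenodo workflow Sergey recommends above, here is a minimal sketch in Python against Zenodo's public REST API, as documented at developers.zenodo.org. The access token, folder name and metadata are placeholders, error handling is kept to a minimum, and this is a starting point rather than a definitive tool.]

```python
# Minimal sketch: push a folder of raw data files into a single Zenodo record.
# Assumes a personal access token in the ZENODO_TOKEN environment variable.
import os
import pathlib
import requests

API = "https://zenodo.org/api"
TOKEN = os.environ["ZENODO_TOKEN"]

# 1. Create an empty draft deposition.
r = requests.post(f"{API}/deposit/depositions",
                  params={"access_token": TOKEN}, json={})
r.raise_for_status()
dep = r.json()
bucket = dep["links"]["bucket"]  # the "files API" upload endpoint

# 2. Upload every file in the data folder as-is, no manicuring required.
#    (Uses bare filenames, so nested duplicates would collide in this sketch.)
for path in pathlib.Path("experiment_dump").rglob("*"):
    if path.is_file():
        with open(path, "rb") as fp:
            requests.put(f"{bucket}/{path.name}", data=fp,
                         params={"access_token": TOKEN}).raise_for_status()

# 3. Attach just enough metadata to make the record findable, then publish.
meta = {"metadata": {
    "title": "Raw measurement dump for <paper reference>",  # placeholder
    "upload_type": "dataset",
    "description": "Complete, uncurated data dump; see included notes.",
    "creators": [{"name": "Doe, Jane"}],                    # placeholder
}}
requests.put(f"{API}/deposit/depositions/{dep['id']}",
             params={"access_token": TOKEN}, json=meta).raise_for_status()
requests.post(f"{API}/deposit/depositions/{dep['id']}/actions/publish",
              params={"access_token": TOKEN}).raise_for_status()
```

[Once published, the record gets a DOI, which is what "data available at ..." can point to instead of "available upon request".]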
This is joint work with my colleague Daniel Nüst from the University of Münster: the CODECHECK system, an open science initiative to facilitate the sharing of computer programs and results presented in scientific publications. I have a few declarations and acknowledgments. Really the idea here, following on quite nicely from Sergey, is that most of what we're doing in the scientific world right now is exporting our work to the world as this shiny, very carefully curated PDF, these papers that we're writing. And as Sergey's example showed quite nicely, quite often people are free to cherry-pick their results and show what works best. What we're encouraging is to share what's on the left side of the dotted line. This is inside the lab: all of this mess that we have, which somehow we boil down into the paper. This is a classic inverse problem: if you just have the paper, there's no way you can invert it and recover all of this stuff, right? That's an inverse problem we can't solve. But if I share all of this with you, there's a small chance, still only a small chance, because you may have a different interpretation, that you may be able to get to the same paper. So our CODECHECK philosophy is about trying to get to sharing more of these things: the data sets, the programs, the models, the results and all the statistics, sharing all of that in its entirety. The start for me with CODECHECK was actually when I saw this article by Nick Barnes in 2010, a provocative opinion piece in Nature: publish your code, it is good enough. I just thought this was very clear and precise. Partly, I think, Nick had a background in climate science, and there was a challenge at the time within the UK climate science community about code not being shared and spurious results. So there is a growing movement, and I think it is getting better, to share code and to work at that level rather than at the level of the paper. I've put down some contemporary approaches that are complementary to what I'm going to tell you about next: for example, Code Ocean, which Nature has been trialling, and a French system, cascad, for certifying the reproducibility of work on confidential data. The CODECHECK philosophy is very straightforward: what we're trying to do is set a very low bar for entry to sharing code. A system like Code Ocean is very nice; it's got these computational capsules that you can go onto the website and rerun and everything. But they set the bar, I think, very high, by trying to make the code reproducible forever, for everyone. You've got the two dimensions there: who is it reproducible for, and for how long? What we've done is to go to completely the other end of those two dimensions and just ask: was the code reproducible once, for one other person? That is pretty much the lowest bar we could imagine, we think, and we wanted to do that deliberately, just to get things off the ground. So what we do as a codechecker is check that the code runs and that it generates the expected output files. We don't even explicitly ask: are they exactly the same outputs? Because that becomes very difficult, and it's a subjective definition of whether the results are the same.
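[Editor's note: CODECHECK records the outputs a paper is expected to produce in a codecheck.yml manifest. The snippet below is a simplified sketch of the check Stephen describes, whether re-running the code generates the declared files, and not the project's actual tooling; the run command and file layout are placeholders.]

```python
# Simplified sketch of the core CODECHECK question: after re-running the
# authors' code, do the output files declared in codecheck.yml exist?
import pathlib
import subprocess
import sys

import yaml  # pip install pyyaml

workdir = pathlib.Path(sys.argv[1] if len(sys.argv) > 1 else ".")
config = yaml.safe_load((workdir / "codecheck.yml").read_text())

# Re-run the analysis exactly as the authors instruct (placeholder command).
subprocess.run(["python", "run_analysis.py"], cwd=workdir, check=True)

# Existence only: comparing the contents against the paper is deliberately
# left to a human codechecker with domain expertise.
missing = [entry["file"] for entry in config["manifest"]
           if not (workdir / entry["file"]).exists()]
if missing:
    sys.exit(f"Check failed; declared outputs not generated: {missing}")
print(f"All {len(config['manifest'])} declared outputs were generated.")
```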
So we just assume that if those outputs can be generated, somebody else who is an expert can look at them and compare and evaluate them. And finally, we explicitly do not check the validity of the code, whether it is correct. We simply ask: does this run generate the things that the authors report? I'm probably going to skip over this slide in the interest of time, unless we want to come back to it later. So there are different communities here. We've got the author, who provides the code and the data and provides instructions on how the code can be run. There's the codechecker, who is kind of like a scientific peer reviewer, although, as I say, they're not explicitly reviewing the code; they're just checking that the code runs. If it works, they can write a certificate, and we host everything. Just like Sergey was saying, we host all of our outputs, these certificates and the artifacts generated; we host them on Zenodo, archived away. And then the publisher, in a journal setting, will oversee this process and actually help deposit the artifacts and persistently publish the certificate. We think this is a great system, and we think there are lots of people who can benefit from it. First of all, the author gets an early check that their code works; it's always nice to know that somebody else can run your code and get the same results. The codechecker gets an insight into the latest research methods. The publisher gets a citable certificate, with the code and data bundle to share, which should increase the reputation of published articles. So no longer will that sentence 'data available upon request' be needed, because as a result of a certificate being written, the data are already there; they have to be available for the codechecker. Peer reviewers doing the scientific peer review can check and look at the certificate, so they don't have to worry about checking the code. And finally, the reader can see the certificate and immediately get working with the code, because they know it's guaranteed to run... sorry, it's not guaranteed to run; it's guaranteed that somebody else could run it. So that means they can get started with it. On our website, we've got a register of these; this is an old slide, I think we've now got something like 25 or 26 codechecks. I just wanted to point out probably the most controversial, or interesting, one: we actually codechecked the Imperial College model of coronavirus transmission that helped convince the UK government in March 2020 to do the lockdown. This at the time was a very controversial decision, because the code and data were not available. Nobody could understand this big model. The group themselves admitted it was a piece of code that had evolved over, I think, something like almost 20 years. Microsoft and GitHub actually helped tidy up the code a bit, and then I worked with the Imperial group and we did a codecheck, and we could actually show that the results were reproducible, despite what people were saying. The strapline there, 'it ain't pretty, but it works', is something I saw on Twitter, which I thought was quite nice. This is not the be-all and end-all of reproducibility checking; we've got several limitations. We don't really know yet how we can give valuable credit to codecheckers, though we've got some ideas. It's very easy for authors to cheat this system.
They could hide their outputs in the code bundles that they provide, but who really cares? It's all open, so somebody else will eventually find it if they're interested enough. The authors' code and data must be freely available; that's pretty much our working philosophy, and of course that is probably going to hit some problems at some point with, for example, confidential data sets. We have a deliberately low threshold for gaining certificates, which over time I think we may be able to bring up. We obviously don't have endless resources, so we can't just rerun everybody's high-performance jobs; for example, with the COVID model I was lucky to get some local support on a high-performance computer to run it, because it was seen as critical, but that isn't routinely available. And we cannot yet support all possible workflows and languages; we're dealing with the common ones that the codecheckers know, like R and Python and C. So, next steps: we're trying to embed this into journal workflows, so that it can be part of the peer review process. We'd like to train a community of codecheckers, and this is somewhere there's good overlap and synergy with what Anna has been doing with ReproHack. And we are looking for funding to get a permanent CODECHECK editor to oversee this process. If you'd like to see more information on CODECHECK, please visit the website, codecheck.org.uk. Thank you very much. Thank you, Stephen. So, okay, there is one short question in the chat, perhaps two, but let us first go straight to the discussion. I would like to pick up from the last slide of Anna's, which, I guess, summarizes in a sense a limitation that seems to pervade everyone's field, in its first sentence: that there's a challenge of going from theory to practice. Could you develop that a little bit more? And if the other panelists have something to say from their fields, from their perspectives too, I would like to hear more about this. Yeah, sure. That's sort of what we're finding through the ReproHack experiences. We do see a lot more people motivated and willing to open up their code, and opening it up. We have seen more training of people to write better code, maybe discussions of version control and where to deposit stuff. But often, when you come out the other side and try to work with some of it, the problem is that we don't have a formal way of engaging with it. Until CODECHECK, actually, and I'm really glad Stephen made it, because it does feel like that's what's missing: we have a lot of people wanting to produce, and producing, materials, but because no one on the other end is really engaging with them, certainly not in a formal way, we don't know whether the materials are fit for purpose. And the only way we'll figure out whether the materials are fit for purpose is by literally sitting down with them and someone else trying to run them. We've developed these events to try to do this, but ultimately we see this as something that should be happening within a research lab: you're going to submit a paper, we'll get some of your peers to try and reproduce it.
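[Editor's note: the lab-internal check Anna describes is easiest when a submission has a single, obvious entry point a colleague can run. A hedged sketch of that idea follows; the analysis package and file names are entirely hypothetical.]

```python
# reproduce.py -- hypothetical single entry point for a paper's analysis:
# one command a labmate can run before submission to regenerate every
# figure from the raw data and compare against the manuscript.
import pathlib

from analysis import load_raw, fit_model, plot_figure  # hypothetical package

RAW = pathlib.Path("data/raw")
FIGS = pathlib.Path("figures")

def main() -> None:
    FIGS.mkdir(exist_ok=True)
    datasets = {p.stem: load_raw(p) for p in sorted(RAW.glob("*.csv"))}
    results = {name: fit_model(data) for name, data in datasets.items()}
    for i, (name, result) in enumerate(sorted(results.items()), start=1):
        # One figure per dataset; the checker diffs these against the paper.
        plot_figure(result, FIGS / f"figure_{i:02d}_{name}.png")
    print(f"Wrote {len(results)} figures to {FIGS}/")

if __name__ == "__main__":
    main()
```

[If a peer can run this on a clean machine and get the paper's figures back, the materials have passed exactly the kind of "fit for purpose" test discussed here.]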
This should start becoming embedded within our culture. But that leads to the second point as well: I understand why people find it difficult, because we've talked a lot about 'open your code', and yes, your code is good enough, you should open it, and that's a good starting point. But eventually, if it is going to be of any use, there are certain levels of standards and quality that it needs to meet for someone to be able to reuse it, and we've not got to the point where we're actually defining what the expectations are. When you say you published reproducible code with your paper, what does that actually entail? And does it differ depending on the language? I would say it does. So now we're getting into the nitty-gritty, I would say. We've made good progress, but now we're really getting down to the details. Do others have something to add? I'd love to add something. Yeah, I mean, I agree. It's best to have well-organized code and clearly organized, curated and annotated data, but at the moment, when there's still no culture of sharing, that is viewed as a barrier. Even people who might be willing to share decide on their own, without anyone telling them: oh, but my code is not pretty, or my data is a mess, so I'm not going to share it. That's why I emphasize: just dump it, just share everything. I've been on a similar panel where they asked me, well, what should we share? We have hundreds of data sets. Share all of it. We just shared, from that 2012 paper, 4,000 data sets, a gigabyte of data; that's all we got. Maybe it's not so much for some fields, but it's a lot for others, and that's just everything. And it's a mess, but at least we signal that we are open. And then, if somebody actually wants to look at it and has questions, we will put in additional effort to answer them. And eventually, once it gets rolling, new experiments and new efforts will be organized from the start and designed to be clear and reproducible and well-documented, which I guess they should all have been already; in practice they're often not, but that doesn't mean they shouldn't be shared. Can I? Yes, Etienne is already unmuted, so I think he wants to say something. Yeah, so I think that code generally is a step ahead of the curve in many ways, because there were already a lot of incentives and practices and a culture of sharing code, open source and these sorts of things. When it comes to other work, experimental work, take psychology, for example, let's just pick one of my favorites. I'm a software engineer by training, but then I became a psychologist and neuroscientist; I'm oscillating between these two fields. And I find that the entry barrier to being able to produce or reproduce work is actually much higher for experimental work. So, Sergey, for example: how long did it take you to run your first experiments? And then how long did it take you in total to reproduce somebody else's work? And the extent of the reproduction was mostly about re-running the analysis or checking the results, not actually re-running the experiments, right? Yeah, I don't know, should I answer, or should we give Anna back the floor? Yeah? So, if I wanted to actually reproduce an experiment exactly, it might not be possible, because they work with different samples and on different equipment.
But if I wanted to come close and just focus on that, it's not really possible either, because it would take years and hundreds of thousands of dollars and many man-hours. So yes, you're right: it is a re-analysis. It's a reproduction of conclusions based on the same data that they took. Strictly speaking, we did do more than that: we published, in Nature Physics, a lesser journal, similar data with different conclusions, which we already had going at the time. And there are a couple of other examples like that, where another big group was making a big effort in the same direction, saw the paper in question, and had data of their own they could just publish and say: no, it's not like that. But by and large, once the data is available, you will be able to check things like: do the figures correspond to the data, are they representative, and have all the cross-checks been done in the original experiment? Some of it, like real-time data selection, when the data appear on your screen and you say, oh, that doesn't look good, let's move on, that is difficult to verify. You need at least lab journals, and there are some privacy issues there; you may not be able to get access to lab journals. Anna, you have something to say, I think. Yeah, I just wanted to make sure I made it clear that I'm not at all criticizing code that is published as-is. That's why we call our events a sandbox environment: any feedback that authors get is strictly coming from a good place. We're not criticizing, and to me we can't really criticize, because we haven't set any standards; we haven't said a reproducible paper consists of this, this and this. I'm not criticizing anyone, and it is great that things are being published. But I think the act of trying to work with someone else's code can really make you understand what it takes, and help you improve your own work, and ultimately start setting some standards, maybe starting with internal ones. Once you start having some conventions around what a reproducible output is, a paper, or a compendium, as Stephen mentioned, which is what I like to call them, then it actually becomes much easier, it's less investment, because you have some guidelines, you have a way of thinking about these things, you don't have to figure it all out on your own from the start. That's the point I wanted to make, and that's what I feel the next step should be in terms of how we help the community: helping with more specific guidelines. Stephen? Yes, just to feed on what Anna was saying. Some communities are pretty good at sharing. I think they're rare, but I think they're good, and I look to genomics as a particularly good case study. Just down the road from me, about 20 miles away, is the Sanger Centre, which in the late 90s was in a rush to get the human genome before Craig Venter did, and it was all made public within 24 hours, right? The data went straight off the sequencing machines onto FTP servers, and that community has a very good reputation for sharing the data and then sharing the code. Within the R world, there's the Bioconductor system, which is fabulous. So there they do have standards, and I think the community expects that when you publish, you work within that framework.
In other areas, there's no chance of doing that. And if you're an early adopter, like Sergey said, and you just say, here's all my data: I was really worried, Sergey, when you said that if people have questions you would devote some resource to trying to answer them. I can't remember exactly how you put it, but I think that's great; it could be a huge time sink, though, right? The Barnes article was just: here's the stuff, caveat emptor, if you want it, it's there. Because if you're an early adopter providing all this data and being a good citizen, that may come at the cost of other people just expecting you to answer endless bug reports and so forth. So it's really hard to get the culture right, and I think it just takes time and patience. For example, I'm a neuroscientist as well, and I see neuroimaging doing pretty well: they have good standards, with particular software packages that they tend to use, and there are databases supporting it. In my area, neurophysiology, there's not, right? Everybody is still in their own little enclave doing their own little thing. So things do take a long time, and I think what Anna is suggesting, this notion of just trying to support and help, is critical, rather than forcing standards and commitments onto people too early, because people will just burn out. And a codecheck is the same: if a piece of code doesn't work, we don't just write a certificate saying this was a failure. We tend to then work with the author. It's not anonymous; in the review, each person knows the other, the author and the codechecker, and that way they can have a productive discussion to get the code to a state where it does work. That's the supportive, collaborative process, and that's really where I think we can make inroads. Can I? I think you've actually touched on something that is really the crux of everything: the human factor is really important. Both Anna and you, Stephen, mentioned that you enforce a positive culture, that you're not criticizing anybody, and so on, and I think this is really, really important. In my experience of wanting to train people to do better statistics, for example, or to think about their designs and their experiments, the main barrier I find is that people are a little bit scared, because we are now researchers. We got a PhD, with blood, sweat and tears, it was really hard, and we're meant to be experts; we're not meant to be students anymore. And so when you turn around and say, well, I've spent six or seven years learning statistics, I know this, really I should know that... there is a bit of emotional turmoil about needing to be better. In my experience, this is the main barrier to trying to create a culture change. Yeah, it's definitely a factor, the embarrassment; students might feel that they will be disadvantaged if they share their sloppy notes, or notes they perceive as sloppy. Maybe they're actually fine, or on par with what everybody else does. So that's definitely a factor. What Stephen mentioned is not actually a factor, though.
For instance, I've been sharing data for the last year, and I have not had trouble with people asking me about it, because for the moment the interest in data being shared is actually greater than the interest in checking the data that has been shared. People don't have the habit of doing that yet. Though actually, some of the reviewers are starting to ask now, which is interesting. They say, oh, you published 9,000 data sets, but you haven't done the analysis. So I have to go back, and if that starts happening more and more, it will only push me, since I've already shared, to organize the data better, so I don't have to answer each individual question, right? So that would be a positive. But at the moment it's not really a problem; people don't get inundated with these requests. That just doesn't exist; nobody checks. In the future, I would be happy if that starts happening. I spend a lot more effort requesting data from people who refuse to share it than answering requests for clarification about the data I have shared. Anna, go ahead. Yeah, that was exactly the point I wanted to make, actually. This is in some ways the point of ReproHack: people worry about the sloppiness or whatever, but the truth is, no one is out there really scrutinizing your data. It's really hard to get people to engage. Often people put a lot of effort into getting stuff out and no one engages on the other side, so they don't know if it works, and they might even get discouraged: if people put a lot of effort in and no one engages with their efforts, they might well give up. They might just see this as another box-ticking exercise and just go, oh yeah, here's some code, here's some whatever, I've done it. And I think we should discourage that, because I think there is value in having these materials available and building on them. This is another thing that some participants get out of the ReproHack experience: they see the value for the community of just engaging with the materials. Lots of participants have chosen a paper that's in their domain and seen something, even if it's just code for a plot, and they're like, oh, I'm going to use that, I'm going to use that plotting technique or that package or whatever. So yes, we need to try to engage with the efforts people are making to publish these materials in a more systematic way, I think. Let me pick up this thread, as there is one question from Eric in the chat that connects with what Dr. Frolov said in his presentation, when you said: just dump it, just dump the data. He asked whether that includes the various metadata that help researchers connect the data to your papers and all related work. And then I would also ask all of you: what are the barriers to this data sharing and code sharing that you have seen, or that you feel somehow? Should I start, since I said 'dump it'? I can tell you what we do, and then we can generalize. So of course, unreferenced data is useless; it's just some files with numbers, right? I mean, it may be useful for some research integrity investigation, to check whether you manipulated your files, but nothing else.
So in every experiment that I'm aware of, there are layers of analysis. There is the original source data, and then there's processed data, but there are also lab journals recording what you were doing at the time. Then there's post-analysis, like a PowerPoint file or a OneNote or a Word file where you plot and comment and think about the data, and there are Python notebooks where you interact with the data and fit it. And so we dump it, all of it, right? We don't make any files on purpose for Zenodo; we just share all we've got. If there was a group meeting presentation where a student shared preliminary analysis, if it's complete enough, we'll put it on Zenodo, and it has all the data file names in it, so you can go into the data folder, get the file and plot it yourself, and there'll be a reference to the plotting software that we used, or, in the newer experiments, the actual Python code that plotted it, or something like that. So yes, there needs to be some kind of guide to your repository, but my point is that you don't necessarily need to manicure it, or put special effort into curating your data and making it more accessible. If you want to, you can; if you have the time, please do it, it would be fantastic and I'd appreciate it. But if you don't, you probably already have a bunch of material from the time you were doing the work that you can at least use as a starting point, if not simply share. If you have a zero embarrassment barrier, then just share whatever you wrote at the time. I agree. That's what I tell people as well, to just share everything. But they are scared, most people. And I've had students not wanting to share their presentations, for example, for personal reasons. What kind of reasons? That's a good question; I'm not too sure. It was mostly about not wanting to make fools of themselves, or something like that, really just simple reasons, nothing major, just insecurity, I think. Stephen, you had raised your hand? Thank you. So yes, again, following on from the last question: the technical issues are much easier to solve than the human issues. It's amazing, you know; thanks to CERN we have Zenodo, which just offers these huge data repositories for us to throw large data sets into. So you can just dump everything, by and large, in many fields, not all, of course. The technical side of things has been solved; again, it's the human issues. You were asking what reasons there are that people don't share. Another practical one is simply time: to annotate everything takes a big investment. I still urge people to do it, for two reasons. One is that you normally don't just do an experiment and then forget about it; you'll normally come back to that experiment a year or two later. So the number one beneficiary of the work you spend annotating it will be your own group, right? To me, it's worth putting in a little bit of effort. And then if you want to do it properly, you can always go the route of trying to get a data paper out of it. I like data papers a lot. They're kind of no-nonsense, right? It's just a description: here's the data. You don't need to provide a particular set of results.
You just say: here are the conditions under which I collected the data, these are the formats of the files, have at it, right? And there's no complicating need to try to make a good story out of it. So I do think data papers are a good way of giving people credit, for which they can then invest the time to write these things up properly. Because it is a huge endeavor, right? Experiments now are quite complex, and putting all this metadata down is a thankless task, unless you're super organized and do it all as you're going along, which invariably doesn't tend to happen. It's a hard, thankless task: other groups will complain if it's not done, and if it is done, they almost take it for granted. So we do need to think about credit and reward mechanisms for this. I should add a disclaimer: I was, until recently, on the editorial board of Scientific Data, which is one of these data journals. But I am no more, so I feel I'm not advertising them, and I can add this dimension to the discussion. So yes, there are a lot of social issues, but still some technical steps that could help greatly. There could be government incentives, or even mandates, to share data. But if we strip it all down: when you were doing research, you obtained a set of original source data, whether it was numerical or observational, a survey or an experiment in a lab; there is that body of data. Suppose there's zero annotation and zero metadata: that body of data still exists, and you could be required, or asked, or incentivized, to share it as a first layer. Then, as a next layer, you could annotate it with either existing or new notes, for instance if somebody asks you to, or if you would like to engage with people on reproducibility, follow-ups, or expanded reuse of your data. So at the very least you could share the zero level, the original source data. So I go back on my earlier answer: I said we dump everything, but you could dump just the original data. If there is, for instance, a research integrity investigation in the future, they could look back at that and compare it to your papers, and this should be zero effort, because you should have a folder on your computer with all that data; you just share it. Maybe it's not so useful to others, but then it's already out there. I think this connects well to a question that is in the chat; it has had a few words of discussion already, but maybe there's more to say. I'll just read it: 'To play the devil's advocate for a moment: do you think there could be an intimidation aspect, or a chilling effect, for would-be reproducers seeing a big dumped project versus a curated package with a README or package list? Over time, would reproducers start to default to: oh, they dumped so much stuff, it should be okay, it should be fine?' Can I start? This sounds a little bit like a friend of mine, who had a postdoc maybe last year, and she said basically her whole postdoc was to take this hard drive, figure out what everything on it was, and then continue the research. So here I really do agree with Stephen that pushing the message 'this is going to be good for you and your research lab' is probably the most effective way to go about it. I do think there is an element of training, even on really basic tips, that is lacking. I think there are expectations around publishing data, but not really around code.
Maybe we're training people for better programming, but not necessarily in how to publish it better. And I see funders actually doing a better job of trying to address this. For example, NERC here in the UK has asked for all data created with NERC money to be published, but then they realized that to demand this, they needed to invest in some training for the PhD students they were funding. So I've actually been involved in this training for five or six years now. And it does help if you give PhD students, right at the start, a few basic tips and tricks and a few expectations, so they understand that it's not just about publishing papers: there's more to their research, there's more value in these digital resources they're creating. And you'd be surprised; we do go into metadata, and even structured metadata and ontologies and what have you, which can be scary, but it's amazing how far just some tips on good file naming, good file structure, getting some metadata into your file names, a simple good README, and ideally just documenting what the columns in your data are, can take you with data. As for code, I work a lot in R, and what I like about how the R community has approached this is they've said: okay, you've got a paper and an analysis that's in R; well, the best way to share code in R is the R package, so you might as well build your analysis around an R package. We've also got these documents in which you can have code and data and output together; you might as well use those for your paper. And if you do this, what opens up is a strict convention that already exists, which you don't have to figure out on your own, for managing all your materials. There's software and tools to help you check your code and test your code and publish your code. And finally, what I like is that there's a really cool package called rrtools that builds on top of all that and can help you set up a project that's more relevant for a research compendium, so for an academic paper. It's got a README template that you can fill in, which guides you towards what a publication would need rather than a package. It has facilities for sharing your data within it, following the same convention. So yes, when I talk about standards and conventions, that's sort of what I mean: it's more, can we jump on conventions that already exist because of the tools we're using, and redeploy them for our purposes? Yeah, I think eventually it would be great to converge on a fixed set of data formats. It's probably not possible to enforce that at this stage, because every little lab, like my 10 people, uses one format, and next door they use another format. But if we start sharing, we will start resolving these issues. I also agree strongly that offering a course on data management, maybe an online course, and then just sending all your students there, to come back with a certificate, would go a long way. And if they all take the same course, they might start doing the same things with their data. So I think that should be encouraged, and such courses should exist in different fields. I don't know if there are courses like this elsewhere; you said you were doing those courses, do you have any info?
Not really; I mean, parts of the course I've actually taken from elsewhere, especially the part about file naming and file organization: there's a really good Carpentries lesson on general good file naming and organizing your project that's fairly generic, and we do use some of that. But this course was commissioned especially by NERC, or actually one of their DTPs, their doctoral training partnerships, to target specifically first-year PhD students. I'm sure there are libraries that do this sort of training, but I think the data management training sometimes focuses a lot on getting the data management plan completed. I don't know if people are familiar with this: you're often asked to do a data management plan at the beginning of a research project, but that tends to feel a little bit administrative as well. It gets you thinking, but it feels like another tick box. What we wanted to do with this course is make it really practical, with practical exercises, creating metadata, and giving really practical tips, like how to encode null values, just things that they're going to see in their day-to-day life, from a researcher's perspective. But no, I think the Carpentries is a good place to start. I'll try and find a link to that course. Thank you. Eric, from COS, the Center for Open Science, has just said in the chat that COS has data management modules launching later this year. I think that's worth the advertisement; it really fits the discussion here. And I would bring up one more question, on a topic that didn't come up in the discussion yet. I wonder whether you think that authors, perhaps one or two among the list of authors of a paper, or all of them, fear sharing out of some secrecy, some wish not to disclose research secrets. Is there a lot of this? Do you observe this kind of thing too? And, on top of that, since I already suppose the answer is yes from your facial expressions: how would you address this? Stephen, go ahead. So yes, there are certainly lots of domains where sharing your code or your data is seen as giving away your competitive advantage. I would say that on the whole, in my immediate field, if you're in a community and you're known to each other, and you ask for some of these data, you tend to get them. Twenty years ago, that used to come with a bit of a catch: well, I'm sharing my data with you, so if you do anything with it, there's an expectation of being on the paper, just for having shared the data, right? That, thankfully, has subsided a lot. But why would groups want to give away their valuable resource? I've had people say to me, and I don't have a good answer for this: I've spent 20 years building up my reputation by having this particular code base. I collaborate with people who want me to run my software on their data; we write joint papers. This is my unique selling point within the scientific community. So by sharing it, I'm giving away my unique selling point within science. And it's very hard; I've struggled to convince them, it's only happened a couple of times, and in a certain sense you can't, because it's a very entrenched view. The problem, of course, as always, is again the human element: how do you give people credit for doing this, right?
I don't think you can now give credit just for sharing data, because it's simply up there, and I think that seems right. But let's say, for example, I wanted to analyse some of Sergey's data, and one email turned into three months of discussion about what he'd done and what it meant and so forth; the threshold for collaboration, for work, may have been passed, so that if I were to write a paper it would be appropriate for Sergey to be on it. So again it comes down to this issue: how can we align people's selfish interests with what's good for science, namely sharing? I don't have a good answer, except to say that I do think it changes over time. I think neuroscience is coming along; it's not anywhere near some other fields, but I'm optimistic that we should be sharing more, and I think more and more people are coming around to it. The thing I would point to in that regard is preprints. A lot of people in physics have been using preprints, since arXiv is now 30 years old and it's going to be around. bioRxiv, by contrast, is only a few years old, and its usage has gone through the roof in the last year or two. That's people starting to share more, and earlier, because they're trying to get rid of this notion that we can only share once the paper is out in Nature or wherever. So preprints give me optimism that we're at least going in the right direction.

I have some experience with this. In my PhD I designed a very nice piece of software, a 3D animation and visualisation tool, and my advisor asked me not to open-source it, not to give the code away; he wanted to patent the software and keep it. Now, if anybody wants access, they have to go to Geneva and ask for a license, for the small price of 500 Swiss francs. I just feel really sad about this. At the time I should probably have tried to convince him that he would get more citations by actually sharing the code than by keeping it to himself. There is actually evidence that the more you share, the more people speak about your work.

We are definitely sharing more. Leibniz famously wrote things up, put them in his desk, and got scooped for it. The culture then was like a card game: oh, you published this? Well, I've had it in my drawer for the last 15 years, so how about that? We are farther along than that, that's for sure. Physicists did a good job with arXiv and sharing preprints, but as far as the reproducibility crisis goes, reckoning with it and looking for solutions, I would say the medical and social sciences are way ahead of physics; we're only now understanding that some of our fields are really in trouble. As for Cassio's question, which was... wait a minute, what did you ask? Both secrecy and hesitancy. Right. Do the people I deal with keep secrets? I was smiling because I'm not looking at a representative cohort right now: most of the people I've been asking for data probably had very good reasons not to show it, because their papers are now starting to get retracted. And that's one red flag: you ask for data and they share nothing, or very little. That's actually a red flag. Another component that hasn't been mentioned is simply not being educated about your responsibilities, and likewise your benefits, when it comes to sharing.
In some cases you actually have to share, by journal policy, national policy, or funding agency policy. In Europe they do a much better job with that; in the United States we have, you know, 500 different policies, one for each agency and each university. But in the Netherlands, for instance, there is a national code of conduct for research integrity, and it says that on any request you have to share. Okay, you don't go to jail if you don't share; it's not the law of the land, but it is a code of conduct. So some of it is just education: you published in Science, and you did sign somewhere that you would make your data available. Then there's a debate about what constitutes all the data you have to make available. If you have a conflict of interest, like a patent you've filed, you have to disclose it; and if you haven't disclosed one, that means you don't have one, which means you have to share. So there's a lot of structure already in place, just not enforced, and education about it can take you a long way. And we've already discussed that there are benefits to sharing, like more citations. In my case, I'm not afraid of being scooped through sharing, because to repeat my experiments you need a million-dollar lab and the unique collaborations I've established with the groups that provide me materials, so I'm not really worried about that. But I can see how, if you put a lot of effort into code, you could get scooped. I guess one way around that is not to do it on public money, but to create a company and develop the code that way. The three of you are code experts and I am a lab rat, so you should tell me what to do about code; I don't write valuable code.

I guess that's a good question for Stephen. I'm not quite sure how to answer. Sorry about that. I think the easiest way to share code, if you've only got a minute, is to just zip it up, put it in a separate folder, and stick it in with your data. I guess the question, sorry, is how to get credit for code, right? Oh, how to get credit for code, that's a different matter, but I am exploring that, because in a very deep sense code is data. Just as there are data journals, where are the journals where you can publish your code? It's not quite there yet, but there are journals, for example one I've worked with, GigaScience, that publish workflows, which is more than just one or two files; it's the whole description. There, what you're publishing and getting credit for is saying: if you've got this type of data, this is the sequence of programs I run to generate these kinds of outputs. Now, that's not quite at the level of very bespoke code: if, for example, you're the only person in the world to have a particular machine in your lab, then there probably isn't a package unless you write it. I can't think of a direct way to get credit for that within the community, except of course to give it a URL and ask colleagues, when they write papers using your code, to cite something you can then point to, to get credit (see the sketch below).
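On the "cite something you can point to" suggestion, here is a minimal sketch of what that can look like for R code: an inst/CITATION file inside a package, so that citation("birdsurvey2024") tells users how to cite the archived code. The package name, author and DOI are all hypothetical placeholders; bibentry() and person() come from base R's utils package.

```r
# Sketch of an inst/CITATION file for an R package, making the
# archived code citable. Every specific below (name, author, DOI,
# version) is a hypothetical placeholder.

bibentry(
  bibtype = "Misc",
  title   = "birdsurvey2024: analysis code for the bird survey paper",
  author  = person("Jane", "Smith"),
  year    = "2024",
  doi     = "10.5281/zenodo.0000000",
  note    = "R package version 1.0.0, archived on Zenodo"
)
```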
And again, for slightly more polished packages, there's the Journal of Open Source Software, which lives entirely on GitHub, where if you write pieces of code with documentation you can publish papers on them. But maybe Etienne has experience through the ReScience project of other...

Yeah, we follow the JOSS model and do the same thing. I think our authors keep their repository private until the moment they think they can share it, and then everything is public.

I'll note that we have a three-minute warning for the session.

Can I just make one quick comment? There's absolutely no reason you can't publish your code on Zenodo, or, at the University of Sheffield, we have a similar repository called ORDA; you can publish data there, you can publish code. There's absolutely no reason you can't get a DOI for your code, which would then need to be cited if someone used it. If it's the code associated with a paper, you might want to publish it as a compendium, altogether with one DOI for paper, code and data. But if it's a package as well, you can publish it on Zenodo, and then potentially, if it's an R package, it could go through the rOpenSci review, which actually gives you a lot of credit in the R research community.

Okay, so we have only two minutes left. I'll just read what just came in the chat, as is: "Conversely, I would argue it's not the scientists who crave credit, but the academic system requiring credit as currency for career promotion; perhaps reform in that domain would solve problems too." And Etienne has answered that any reform of the incentive system would be good. I agree; I see Anna nodding, and Sergey is also nodding, so I think we all agree with that. Since we have only two minutes left, I'll close the questions here, but if the panelists would like to say some closing words, you can have about 30 seconds each. If you don't want to say anything, that's fine too. Does anyone want to leave any final remarks?

Share your data and code and we'll see what comes next. And if other people share their code and data, go play with it and see what comes out of it. Yeah, it'll be fine, don't worry.

Yes, taking the first step is often quite nerve-wracking, but disasters don't tend to happen; people tend to be quite supportive. And if people want to find out more about getting started, I'd be happy to take questions offline to help them get started.

Okay, thank you. Well, I guess the big takeaway from this session is: share everything, and we'll sort things out later. And don't forget to be nice, which I guess was the human factor, in a sense. And be patient. With this, I think we can close the panel. Thank you all for participating, and for the different perspectives we could have here today. It reminded me of the other sessions I've seen throughout Metascience this week; I think it relates closely to the session right before ours, and there was also a session about reproducibility culture in the social sciences. In the end, I think we can all collaborate and come away with good insights and improvements for all of us. So thank you again, everyone. I guess we can close the panel now. Thank you all. Bye everyone.