So, welcome everybody to the vEGU21 Great Debate on research software. On behalf of the conveners, I welcome you and our debaters to the first time that the discussion on how we can improve research software in the geosciences takes the biggest stage that EGU has to offer. We thank the EGU team and our partners for their support. We welcome the Earth and Space Science Informatics Division, our co-sponsor the American Geophysical Union, and the ESIP Data Help Desk. The virtual Data Help Desk at EGU lets you engage with data and software experts on your questions about research data and research software. Check out the tutorials at bit.ly slash data help EGU 2021. You can also ask questions via the web form or on Twitter using the hashtag #datahelpdesk. In the next 90 minutes, we will talk about everything related to the creation, use, and sharing of research software across all geoscience research disciplines. We will start by hearing short introductory statements from the debaters before continuing to answer and debate your questions. Now, I want to point out that we hope this debate will be a starter for many discussions, and there are multiple options to stay engaged after this event. You can find them all on the Great Debate's website, which you can find in the chat or at bit.ly slash vEGU21 minus debate. Now let me briefly introduce the team behind the debate and the debaters. My name is Daniel Nüst and I'm with the University of Münster. The co-conveners are Niels Drost from the Netherlands eScience Center, David Topping from the University of Manchester, and Lesley Wyborn from the Australian National University. Our moderator today is Daniel Katz. Dan is an expert on research software himself and we are very happy to have him guide us through the discussion towards the really hard questions. He is Chief Scientist at the National Center for Supercomputing Applications in the US and a research associate professor at the University of Illinois Urbana-Champaign.
He's also a founding editor of the Journal of Open Source Software and an active member of the research software community, not least known for his work on software citation and software sustainability. Before we turn to our debaters, some brief announcements. Please use the Zoom Q&A feature to ask your questions, and help us find the most interesting questions by using the voting buttons. Please note that participants joining us via the live stream on the vEGU21 website, using Vimeo, won't be able to ask questions directly, but we will monitor the chat there and try to transfer questions to Zoom. Please understand that there is a delay in the stream and you won't be able to vote on questions or participate in polls, so please join the Zoom webinar if you can. If you're on Twitter, use the event hashtags #vEGU21 and #gdb3. And now, please meet our debaters. Carina Haupt is head of the Software Engineering Group in the Intelligent and Distributed Systems department of the Institute for Software Technology of the German Aerospace Center, DLR. Carina and her team aim to support domain scientists by creating high-quality software for their day-to-day problems. Carina is an active member of the German research software engineers community and conducts her own research on scientific software, open source, and knowledge and data management. Kim Serradell is the manager of the Computational Earth Sciences group in the Earth Sciences department of the Barcelona Supercomputing Center. Kim's group guides scientists around the technical challenges of using high-performance computing. He is responsible for the operation of model runs such as MONARCH and the CALIOPE air quality system, and contributes to the HPC community at an international strategic level, as a teacher, as well as in making HPC software more efficient. Patrick Sanan is a computational mathematician and scientific software developer in the Geophysical Fluid Dynamics group at the Institute of Geophysics at ETH Zurich.
Patrick pursues cross-disciplinary research, combining applied mathematics, computational geometry, parallel algorithms, and physical simulation. His interests lead to diverse activities, for example in electroacoustic and computer music, software for large parallel geodynamic simulations, or scientific application testing. Rolf Hut is an assistant professor at the Water Resources Engineering department of Delft University of Technology. He is an advocate for open science and a principal investigator on the eWaterCycle project, where the team is building a platform that makes computational experiments easier for hydrologists and offers easy access to models made by other hydrologists, as well as to commonly used input data sources, through open standards and containers. His interests also include geoscientific science communication, for example using games. Last but not least, some about Susanne. She is a newly appointed professor in tectonics and geodynamics at RWTH Aachen University, where she teaches and researches deformation processes from outcrop to plate scale. Susanne's tools are numerical finite element and analogue sandbox techniques. She worked for the Geological Survey of Norway. Susanne also was programme committee chair for the EGU 2018, 2019, and 2020 General Assemblies and served as president of the Tectonics and Structural Geology division from 2013 to 2017. Please do check out the detailed speaker profiles on our website and take a look at their excellent work. I apologize to the speakers for not including much of their great outputs, because they were too numerous or too hard to pronounce. Carina, Kim, Patrick, Rolf, and Susanne, thank you very much for joining us today. Then let's get started with the debate. Okay, great. So we're going to start with some brief opening statements from each of the panelists, about two minutes each, and we'll start with Carina. Hi, thanks for having me.
As an open-source enthusiast, I want to state briefly how research software, in my opinion, comes together quite naturally with open source, and how important the community is. Today, the research questions we try to answer get more and more complex, and we are able to tackle them because we have existing knowledge which we can build on, and we have hardware and computer science and everything else which helps us answer a question. But there's one potential which we don't use as much as we could, and that's especially existing software. While a lot of researchers already work with open-source software, especially general tools which support them, which help them write scripts to analyze their data and so on, there's a lot of research software out there which is not publicly available. A lot of people think it's only them having this problem, or that others couldn't use the software they are working on, and that it's not worth sharing it. But in my opinion, this is much more often not the case. I work at the German Aerospace Center, and we have a lot of domains, from aerospace and space transportation to traffic, and we face the same types of problems in a lot of different places, so we are creating communities to bring these people closer together. And they figured out that they have faced the same issues and have implemented the same things again and again. I think we have to change this by opening up the source code, by opening up the research code and making it available, so that not only can others use it, but they can also support and maintain it together with you. That way we can create small communities, not around domains but around the same kinds of issues people have. This has worked really well for the HPC domain: there we see this very neatly, that a lot of domains are stepping into it and coming together, and I think there's a lot more potential we can step into. Thank you. Great, thank you. Next will be Kim. Thank you, Dan.
So first of all, thank you for inviting me and for the opportunity to participate. My introduction would be to start by asking how research software can become a first-class output across all the digital sciences, and how credit can be given to its authors and contributors. For the first question, I would say it is important to analyze and import methodologies, tools, and best practices from, let's say, traditional software development into research science. Historically, research software has been driven by domain scientists, whose ultimate goal, totally understandably, is the science and the results they are working on, and who perhaps take a little bit less care of the framework and the software practices they are applying. So I think it's time, and we are on the road but there is still a lot to do, to add research software engineers into the workflow to deploy these kinds of solutions. By solutions I'm thinking obviously of version control systems, of practices like test-driven development, like unit testing or integration testing, or continuous integration and continuous deployment, for example to always validate the output of the software that is produced, or even its performance. And when we want to deploy these kinds of practices, I think it's really important to have clear direction from management, or from a head of group who decides to put these working practices in place, and obviously lots of training and documentation. And on the question of credit, I think it's really important to add research software engineers as co-authors of scientific papers.
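As an aside on the testing and CI practices Kim mentions, the core idea of "always validating the output of the software that is produced" can be sketched in a few lines. This is a minimal, purely illustrative example, not from any real model: `run_model` is a hypothetical stand-in, and the reference value and tolerance are invented for the sketch. In a real CI setup, a test like this would re-run a small case on every commit and compare against a stored reference result.

```python
# Hypothetical sketch of a regression test of the kind a CI pipeline
# would run on every commit: re-execute a small case and check the
# output against a reference value from a trusted earlier run.

def run_model(forcing):
    """Stand-in for a small model run: a toy mass balance with an
    invented runoff coefficient of 0.8 (illustrative only)."""
    return sum(forcing) * 0.8

def test_reference_case():
    # Reference output recorded from a trusted earlier run.
    reference = 9.6
    result = run_model([3.0, 4.0, 5.0])
    # Compare with a small numerical tolerance, not exact equality,
    # so harmless floating-point differences don't fail the build.
    assert abs(result - reference) < 1e-9

test_reference_case()
```

The same pattern scales up: instead of a toy sum, the test would run a short simulation and compare summary statistics of its output against archived values.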
So this section that has been appearing in papers in recent years, explaining who has done what, is really important, and having the names of the research software engineers in these papers is really important for their careers. We tend to think it's important only for these papers, but if a research software engineer wants to move from one institution to another, it's a way of showing what he or she has done in their previous career. So thank you. Okay, great. Thank you very much. Next we'll have Patrick. Yeah, thank you all for coming and thanks for having me. My background is applied mathematics, and I'm a postdoc working on geodynamics at the moment. I work with parallel software, including the PETSc library. What I'd like to use my introductory time to do is to bring up questions about how we are going to teach researchers more about software, when we have the opportunity to do so. This is really important in the sense that if we believe that computation is the third pillar of science, and I think that has some validity beyond just marketing HPC, then we have to take this opportunity to teach people, at a lower level, the tools that they're going to be using. Even a few weeks of training in things like how Unix actually works, how lower-level languages like C work, how computers work, I think gives people some intuition, and, to follow up on what Kim said, understanding how things like version control work on a fundamental level could be a valuable opportunity. I'm wondering if the community feels that they would have benefited from learning these things up front.
To follow up on something that Carina said, I think that the education of researchers who are going to be using software does have to include some coverage of the ethics involved here. The scientific aims and the aims of open-source software are quite aligned: things like reproducibility, openness, and assumptions of good faith in the community are very well aligned between these two communities, and I think we need to double down on emphasizing those things in training our researchers. I'd also like to hear discussion, maybe on a more practical level, about what tools we're going to be using to do teaching. There's been a lot of movement towards things like Python, towards things like Julia, perhaps away from things like MATLAB. These are things I'm happy about, but I can see there being a debate about whether that's been a useful thing on the front lines in a practical sense; I'm interested to hear about that. And finally, during this discussion I'm interested to hear what people think about how we can be more intelligent in how we use our very limited time and energy. I think that we need to reuse things. I think we should really be focusing on how we can be smart about realizing that scientists are a very particular class of software user. These are people who, for instance, would actually love to read a very good set of documentation. These are people for whom the typical case is to need to modify the code they're using in some way, the typical case is to need to dissect how it works. And this perhaps creates a tension with a lot of the tools that exist for software engineering at large, which are aimed towards a much broader and perhaps different class of users. So I look forward to the discussion. Great, thank you. The fourth panelist is Rolf. Thank you. I'd like to use one of these two minutes to say that any effort aimed at teaching scientists to write better code is probably wasted.
I'd also like to say that the best research software engineer, and also the best research software, is invisible to scientists, because scientists should focus on their science and the questions they want to answer. Research software is something that facilitates that. And I like to compare developing research software, by research software engineers and scientists alike, to road construction. The metaphor breaks down at some point, but it serves me nicely, I think, for this panel discussion. A scientist just wants to go from A to B. If there's no road, they'll find their own path, and then they'll report on the path that they found. And once they're at point B, they want the entire community to come there and then move on to a point C. To get the community to that point, you can either show them the little path, but it's not an efficient path — I'm also in hardware, in hardware design, and this stuff is full of duct tape; it's not the kind of thing you want to advertise for people to use. What you want is a research software engineer who then builds a highway from point A to B, so that the entire community can go there. But don't let the scientists build that highway, because if there's anything they're not good at, it's that; they're good at pathfinding. Of course, the scientist needs to inform the research software engineer, have a good discussion on where the highway actually needs to go, but the scientist also needs to shut up when the research software engineer says this is the best way to build a highway. And I do think that we need to be able to understand each other's language, but we need to be very careful that we don't demand of each other to become each other, because only if we're not the same can we supplement each other. All right, so it's interesting to see that we have some different views about whether scientists should have expertise in software or shouldn't. We'll come back to that.
So we'll end with the opening statement from Susanne, and then we'll move on to the next piece. Thanks a lot. I'm actually really loving the debate that's already taking place. In a way, I would not always see such a distinction between scientists and research software engineers, because many of the scientists are also developing the software; it doesn't make them total specialists, but research software comes in many flavors and sizes, and we all ensure that our software passes technical tests and benchmarks. But we don't really have a demand on the user-friendliness of a software, let alone a set of criteria by which to judge the ease of use of a software, or, related to this, where we could find the software or know the conditions of access to it. My view is that, similar to research data, digital tools such as software should be FAIR: findable, accessible, interoperable, and reusable. A statement like this assumes that software is open, and my view is that academic software should be open. But as I say that, my own software is not open. It's not user-friendly, and in our experiments with sharing we found that we did not have the time for the expected user support. The ideal is clear, but there are pragmatic reasons why this doesn't happen. So who's responsible for assuring that the software can be found and used, that it can communicate with other software, and that users know how to use it? Researchers themselves often simply do not have the time. A solution could be community software, and this seems to work. But at the same time, I think there should always be room for new software, for skill development, but also to give a chance to new approaches that might otherwise not appear when the community is going in one direction. So software is a tool that needs time, not only for maintenance, but for adaptation to user needs and demands, and for training. So I'm really looking forward to the debate.
Many thanks. Thank you. So at this point, we will try a poll of the audience to see what everybody out there thinks about research software, and then we'll go on and have some more discussion between the panelists and look at some questions. Right, so we have quite a mix of people: professors, postdocs, students, RSEs, research staff. We have a mix of people developing software, with most people not developing so much, but lots of people sharing their software — almost half all the time, and three quarters at least some of the time, which is great. Embarrassment about low quality is the main reason people don't share their software, which I think no one should be embarrassed by, because everybody's software isn't very good. There's not much of a mandate for sharing software. Documentation is seen as the main problem, but most people are using other people's software, which I think makes sense because it's almost impossible not to these days. And it's good to see lots of citation and a fair amount of other kinds of recognition, and almost nobody is not recognizing the software, which is great. So, okay, let's get back to the panel. The intent was that we would spend a bit of time initially looking at problems that we have, then have another poll, and then talk about some solutions to those problems. So, just looking at some questions that we've had from the audience, let's start with the first one that came in, which is: software engineers are much better at writing software than scientists, but most research institutes don't provide software engineers to support scientists, and so the scientists have to do the job of writing software themselves. How can we address that? Does anybody want to say anything about that? I want to make two points on that, I guess. I think we should make a distinction here between the software that you use to come to a scientific conclusion
and the software that you want others to use to build upon, because these are two very different things. Software that you used to come to a scientific conclusion, that gives graphs or outputs that give insight into geoscientific processes, etc., is not necessarily meant to be built upon by others. If you recognize that you want others to build on your software — for example, if you make a hydrological model, in my field — then there should be an onus on the developer to make it such that others can build upon it. But if it's just an analysis, I don't think you have to. If you're in a field where everybody is building on each other's software, and not just on each other's knowledge, and you don't have a software engineer close at hand, then you should make sure that that changes, and that is very difficult depending on the position in academia you're in. It's not common to be able to include budget for software engineers, even in a science proposal that's very heavy on computational science. Okay, I think it's very important, as Rolf said, to try and distinguish between the types of software that are required. The assumption that software engineers are good at writing software, or better at writing it, is true in a certain sense, but there's also a problem that I feel I've seen several times: the type of software that is enjoyable or inspiring for software engineers, and even RSEs, to write is not necessarily the software that the users need or want. So I think that, from the perspective of someone who's a developer or RSE looking at the users, the scientists — these are people, like graduate students, who have probably only been programming for a few years, maybe.
RSEs need to understand that writing things in a way that can be modified, or at least understood, is incredibly valuable to these people, whereas solving things in a clever way, writing things that are very efficient, is what software engineers and RSEs often think their mandate is. For a specific application that warrants that, it's true, but I think it's not recognized as much as it should be that in many applications, something that is understandable by the user, even if it is less efficient, can actually be what they need and want — at least in the initial iteration of a project, which, to be honest, is often the only iteration of the project. Many projects don't survive the five years, let's say, that really warrant a high-performance implementation. So I think it's good that we're trying to define the different classes of software that RSEs or engineers are delivering, because if they don't get that right at the beginning, the project can be doomed, and their time can be wasted — everyone's time can be wasted. Okay, Susanne. I think these are very valid points, and they partly point to communication between two groups that are perhaps being kept separate, but I would not like them to be put separate, actually, because I can see the big overlap. I mean, there's definitely a difference in rigor of software and in flavor of software. But I would not like to make a distinction between a full-blown software that is developed with research software engineers and that we use to test things, versus, as Rolf was saying, a tool that we just use for analysis, because I think in both cases they should be treated the same in the sense that they should be open. That will allow us to verify and to reproduce, and for others to do so too; it's the exact requirement that we should always be able to check each other, to reproduce what someone has done.
And then I would treat any tool, whether it's a software or some other analysis or method, in the same way: we need to make sure that we are always reproducible. Okay, so, Carina. Yeah, thanks. I have to agree. It's always about the context you develop your software for, but the context can change over time, so I agree with Susanne that you can't separate them completely, because often software starts at one point as something like a proof of concept, from perhaps a PhD student or something, and then it evolves over time and it becomes bigger and bigger, and people realize, okay, this is something a lot of other people might want to use, and it needs to be written in a better way. So I think it's hard to distinguish at the very beginning; it's something you have to reiterate time and time again: what is the context and what is the target of my software, what do I want with it, is it just to show something? And in all these cases it needs to be open, in my opinion; it needs to be available to others, but obviously different aspects of its quality matter. The quality has to increase over time if it is getting heavily used; if it's not getting used, you don't have to invest the time. This is something I also want to say about the poll, really quickly, because it showed the typical result that people are afraid to share their code because of bad quality. I think if you write down what the purpose of your code is, then you don't need to be so afraid, because if you say this is just a proof of concept, this is just an analysis I did quickly for this paper, then nobody expects too much. But if you say, okay, now several people are using it in my institute, then the expectation goes up, and you might also get an idea of, okay, what do I need to do next?
There are also guidelines, which I developed together with my colleagues at DLR and other people, which give you, for different classes of research software, suggestions on what you could improve regarding the software engineering of your project, and things like that. To come back a little bit to the question: often you don't have an RSE to whom you can say, here, take this, do this for me; you don't have that structure in a lot of research institutes. It's about financing, it's about organization, about having those people. But often there's somebody who's at least interested in it, so you have to find the right persons whom you can then contact at the right time and say, okay, I'm now at a certain point in my project and I perhaps need your help. So it's important to identify the right people who can help you with this in your institute, and that's why it's important to have a community there, so you know who can support you, and so that it's not always only one single person who has to help everybody. Thank you. Kim, is there something you want to add? Yes, let me illustrate with an example why I consider it really, I would say even mandatory, to have this combination of research software engineers and scientists. I'm in the high-performance computing world, so we are operating machines — not only, in our case, MareNostrum, but many other machines — and, as you may know, these are really complex machines. So using these machines, with their combination of hardware and software, is really complex. And if we don't team together the scientist — for example, the one that developed the model — and a research software engineer that really has knowledge, for example, of HPC,
it never works. And I would go even further: at the Barcelona Supercomputing Center we even have a computer science department, and we realized that if you try to match a modeler that knows, for example, the meteorological model, with a computer science engineer that really knows only about the hardware, there is a very big gap between them. To close this gap you even need, for example, research software people from my team, who have specialized, in our case, in computational earth sciences, and who will cover this gap and make this interaction very easy. Because if you try to overcome it otherwise, it will be really difficult, and then you have problems: the model does not run, you don't get enough performance, and so on. And my last statement is something we put a lot of effort into: as was mentioned, including research software engineers in proposals, because we really understand that there is always a need for working on models, not only on development issues, but also, for example, on analyzing the performance, seeing if you can get much more out of a model on a machine. Because in the end, if you run it better, you will be able to run many more experiments and produce, in the end, what I think is much better science. So that's why I consider that teamwork between both groups is needed. Okay, so then back to Patrick, and then we'll go to Rolf to finish this topic and move to the next one. So, a quick follow-up that addresses a few of the points that were just raised.
I think that we have such a limited amount of time, as scientists and engineers working on things funded for science, that we have to be very efficient with our time. One way that's quite simple, and that addresses a lot of these problems, is being very good at producing things like minimal working examples, like READMEs, like mini-apps, which can really be the same thing. This means that if you're embarrassed about your code, or worried it's not user-friendly, get very good at presenting a concrete set of instructions on how to get the code to go from nothing to producing an image, or a small understandable output, or a demonstration to a domain scientist of how it works. It's important on many of the levels we've talked about: it frees you from a lot of the burden of having to document everything, of having to worry about the internals of your code being overly criticized, because it points people directly to the one thing that you guarantee should work. It answers the questions: what does this code do? How do I run it? If I'm trying to optimize it for an HPC context, what is the first thing that I optimize? I'd like to hear suggestions from the community, and from all of you as well, on ways like this — ways that almost sound trivial — in which you can really focus your effort into saving time and energy, and that have worked. I've found that providing things like a simple README and quick starts to users is very little work for me, and it's useful to me when I go back to projects. And I think a good thing to emphasize, as we're training people in using software, is how to produce minimal examples and quick starts. Okay, Rolf, back to you.
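Patrick's "from nothing to a small, understandable output" idea can be made concrete with a sketch. Everything here is hypothetical — the script name, the toy diffusion solver, and the expected behavior are invented for illustration; the point is the shape of the thing: one short, self-contained file that a new user can run and immediately check.

```python
# Hypothetical quick-start mini-app of the kind Patrick describes.
# A README would say: "run `python quickstart.py`; you should see an
# initial spike that has spread out and decreased after a few steps."

def diffuse_1d(u, alpha=0.1, steps=10):
    """Explicit 1-D diffusion on a list of values (toy example).
    Endpoints are held fixed; alpha=0.1 keeps the scheme stable."""
    u = list(u)
    for _ in range(steps):
        u = [u[i] + alpha * (u[i - 1] - 2 * u[i] + u[i + 1])
             if 0 < i < len(u) - 1 else u[i]
             for i in range(len(u))]
    return u

if __name__ == "__main__":
    initial = [0.0, 0.0, 1.0, 0.0, 0.0]
    final = diffuse_1d(initial)
    # The spike should have spread symmetrically and its peak decreased.
    print("initial:", initial)
    print("final:  ", [round(v, 4) for v in final])
```

A quick start like this doubles as the first thing to profile in an HPC context and as the reviewer's entry point: it is the one path through the code that is guaranteed to work.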
One of the things that Susanne said is that she'd like to see the communities not be separated, and I agree, up to a point. I actually think it's not domain scientists over here and research software engineers over there with a big nothing in between; it depends on your personal preferences, as well as the field you're in, where on that spectrum you fall. But if you're more towards the domain-scientist part of that spectrum, I think you're incentivized to show that it works the first time, and, as Patrick addressed, that first time will probably not be so nice. I like to make a parallel to the hardware development section of the earth sciences — sensor design, etc. — where I'm also active, and which really embraces the "this is really rough" attitude. We're happy that something just works the first time, usually 80% of the time, and we present it in things like the MacGyver session with a ton of duct tape on it. The problem is, as soon as someone else wants to use this, because it is very useful, they don't want that rough version. And what we're lacking, I think, is the incentive to improve upon it, because I can write a paper about the first version, but I can't write a second paper that says I just made it slightly better. We need to have something in place to enable that, because only then can people build on each other's work. So let's go into a different question, maybe the one that came in from Callum, which is: should code used to produce a scientific paper be tested or replicated as part of peer review, and if so, by whom? Susanne? So, I started answering in the Q&A. I think that's a really interesting point. As I wrote in the Q&A, I came across a case where a publisher not only asked for the software to be made available, but actually wanted the reviewer to check the models, which I think is great.
At the same time, it poses a very practical problem, because some of my models run for months. Not everybody might be willing to postpone the review process, to extend it for months, and some reviewers might just not have the facilities, or the time, to actually run the models. So maybe we should think of other ways in which we, as a community, could check what we do. We could start a new study by actually reproducing what somebody has published. We also do benchmarks, where we all pull together and run the same complicated nonlinear science problem that we don't know the answer to. But yes, I think it's an interesting point; how to do it practically is another thing. Thank you. Patrick? A quick response to that, from reviewing things for the HPC community: you often have the same problem, that you can't reproduce something because you don't have access to the particular supercomputer they used, but there are partial solutions to this. One of them — there's a great paper by Torsten Hoefler where he proposes this idea of interpretability of results: it's not that you can necessarily reproduce them, but that you can understand enough to reproduce the actual idea that's being presented. And another thing that I really appreciated, from another paper I reviewed, is that it provided a fairly good set of scripts and output logs from their HPC runs. I don't have access to that computer, but it allowed me to apply most of what I would do as a reviewer, which is: read the paper, think of things that seem suspicious, and then verify whether those suspicions are justified or not by looking in the actual logs, to see, for example, how they launched their jobs. I could get a lot of the way towards reviewing this even without access to the computer.
There's still some aspect of trust that's never going to go away in reviewing scientific work, but I think we need to at least create some standards of ethics around providing enough data that people can answer some of their questions. I think that will get us most of the way.

Yeah, and related to that, there's been some work from David Sorgill about confirmation depth, which I think is fairly similar in concept.

I like that Patrick mentioned the level of trust. I mean, we trusted Newton when he wrote the numbers of his measurements in a notepad, so it's always turtles all the way down; I think we'll have to accept that. But given what Susanne said, that some of her models run for months: what you usually see in most research is that you can compartmentalize your research phases, with your model runs producing some output, and then some analysis on that output. If you can provide that output data to a reviewer, so that they can make different graphs based on it, they can get a feeling that you're not cherry-picking, that you're not just making the one graph that looks pretty good from that output. That would maybe imply giving your reviewers actual access to whatever machine is hosting the output data. That would be a technical hurdle; for a paper we're writing now, in the letter to the editor we're actually providing logins to our machine, so that the reviewers can work on that machine and do a few analyses to see that we're not fooling around.

Yeah, to follow up on this point. I'm in the same situation as Susanne, in that in climate, for example, we cannot reproduce a simulation like that. I would say that, even being optimistic, I would be happy.
If, for example, we are able to reproduce a result even inside the same institution: so if after some years I take even just the outputs, and I'm able to do the analysis on those outputs and get the same results. And this is sometimes really, really difficult, because you no longer have access to the machine where you produced the results, or you cannot reproduce the software stack, the tools, that led to those results. I think we need to put a lot of effort into understanding our infrastructure, and into putting in place tools that allow us to have a clear picture of how the system was when we did the analysis, so that we are able to reproduce it. And for this we have alternatives like Spack, a tool that automates the deployment of software, so that at least you can say: I did this analysis, and I know how my system was at that time. It will help you, although it will not always guarantee that you will get the same results.

Great. So, there were a couple of comments; it seems like people are using the Q&A to put in some comments as well as ask questions, which is probably okay. Looking at journals, and the role of journals in this process, and in particular the fact that there are journals where software can be reviewed, which is different from reviewing how the software is used in a particular science case: I'm curious whether any panelists want to talk about what they think the role of journals that just look at software in general, in the absence of a particular science case, is in terms of reproducibility. And Rolf's comment: why do we even have journals?
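Kim's point about knowing "how the system was at that time" can be illustrated with a minimal sketch. Spack does far more than this (it builds and manages whole dependency stacks); the snippet below only captures the underlying idea of recording environment provenance alongside the results, using the standard library. The `extra` fields and the model name are invented examples.

```python
# A minimal, stdlib-only sketch of one idea behind tools like Spack:
# record what the software environment looked like when an analysis ran,
# so the run can later be re-created, or at least understood.
import json
import platform
import sys
from datetime import datetime, timezone

def capture_environment(extra=None):
    """Snapshot basic provenance for a model run or analysis."""
    snapshot = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "os": platform.platform(),
        "machine": platform.machine(),
    }
    if extra:  # e.g. compiler, MPI library, model version -- whatever matters
        snapshot.update(extra)
    return snapshot

# Write the snapshot next to the outputs so it travels with the results.
env = capture_environment(extra={"model": "my-model v1.2"})  # illustrative name
print(json.dumps(env, indent=2))
```

In practice this JSON would be stored with the model output, so that "after some years" there is at least a record of the stack that produced it.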
So, basically, if we want to acknowledge people's contributions in software, the only reason we're writing an article in a 200-year-old format is because we want to get academic credit for it. So just put it on something that has a DOI, make sure it's cited nicely, make sure it's documented nicely. But the journal... sorry, I know you started the Journal of Open Source Software, and I feel that in the current system we need it, because otherwise people who are contributing software, which is immensely important to the scientific endeavor, are not recognized. But I think we should phase it out and find another way to recognize people for this, without having them jump through the hoops of writing these articles when they could have done it in a nice repo that has the same info.

So, just to respond to that very quickly, and then we'll go to Karina. I'm one of the co-founders of JOSS, not the founder, I just want to point that out; Arfon Smith is really the main founder. But also, the papers are on the order of one to two pages, so they're not too difficult to write; the review is mostly of the software. Our goal for the journal is to have the journal go away. We don't actually want it to stick around; we want people to cite software directly. It's really just a temporary activity, because the overall scholarly system doesn't work very well for citing software directly right now, so it's a placeholder for the moment. Sorry, so, Karina?

I just wanted to jump to your rescue and say exactly what you said: I think currently it's really something we need, with the current system, and people are trying to change that system; it's an effort a lot of people are putting in. There's a lot of work in the field of software citation.
I think we also need to try to find solutions there, because it's not as easy as you might expect. You can cite software in a way, but it's about dependencies: who gets credit, which part of the fame goes to whom, and so on. So it's not as easy as it might seem at first glance. And second, I think we also need some way to give meaning to a piece of software, to have the possibility to say something about its quality. Currently in JOSS, I think, and correct me if I'm wrong, but I also know it from other conferences, the software is at least looked at: can somebody run it, can somebody understand what it's about, can they give a good argument why this is important for science. So it's not just any piece of software; it has some influence. I mean, we eventually want measures of a kind that say this is important for everybody somehow, but perhaps we go step by step. And so it's important that we have some criteria to judge a piece of software, because if not, you have criteria which can be undermined really quickly. Judging the quality of software is something a lot of people have been dealing with for a long time, and things like counting who contributed how many lines of code are ideas already known to be bad, because you can really game them: okay, I made a million-line commit, so I got a lot of fame and did so much, when in reality that is not the case. So there are a lot of things we have to target, and I think it's a step-by-step process, and things like JOSS are a great step in the right direction. And this is really an adaptation to how research works now, not necessarily to how people are used to working.
You could say it's gradual: not judging just by papers, but moving people to something similar first and then going on to completely new ways; it's something that is in process, I think. People are doing a lot of research in these fields, and I know there are groups working on exactly all of these topics, so these are not my ideas; a lot of people, including some here in this round, are working on them.

Thank you. So, Patrick, for the last answer on this immediate topic, then we'll move on.

Yeah, I won't reiterate the same point, but I'll just make a slight defense: we don't always need a paper for a piece of software to be citable, and I'm glad that's possible, but sometimes I do like reading one. I'm an academic, and sometimes it's really illuminating to hear an expert summarize their work and present it, and a paper is one way to do that, so I think that should still be a venue that exists. Not everything needs to be published in TOMS, for example, but I would like researchers to still have that opportunity, because I have certainly learned a lot from some papers on software, if perhaps not most.

Okay, great. So at this point we'll run a second poll. We've been talking a bit about problems; this poll will be a little bit about agreement on these problems and a little bit about solutions, and then we'll try to focus the remaining part of the session more on solutions. Okay, good. I think this will be interesting, because I was realizing as I was answering these questions that there were a lot of them that I wanted to answer one way, but I couldn't quite answer the way I wanted because the wording was a little bit too strict. So I'm curious to see what other people thought as well. So, we see that there's pretty strong support for mandatory initial training in software.
There's pretty strong support for all software being open, even nuclear reactor software. Kind of mixed views on journals requiring sharing of the software used in research, which is interesting, because if the software is open per the previous question but it doesn't have to be shared, I'm not quite sure how those get balanced. But at least everybody, or almost everybody, seems to think it should be cited, which is one where I actually said no, because I think there's probably software that doesn't need to be cited, like LaTeX, for example; so it's interesting to see how people interpreted this question. Lots of people think funders should require publishing software. There's a spread on who's responsible for making sure the software is of sufficient quality, and lots of support for EGU offering a second abstract type for different kinds of objects. So let's just start with these, and if we can keep these questions up as well, maybe I'll ask the panelists whether anybody wants to respond to any part of this. We'll start with Kim.

Thank you. I will jump on the first question, on the initial training. So, yeah, I did not really understand the "mandatory", whether it's mandatory in each institution; but I think for research software engineers, when they work, obviously it's mandatory to keep training to continue developing. I would focus more on whether it's mandatory for scientists who arrive and need to learn how to use the tools that are produced in the same place, or reused from other scientists, from other teams. And for me, a community that is quite important in these things is the PhDs.
So, how do the PhD students handle the software at the beginning, in their first year, when everything is new for them, they have the pressure of starting the research, and then they have to learn a whole bunch of new software that they probably don't know? This is really, really difficult, and we need to put a lot of effort in; it has to come from the management, from the supervisors, to really devote some time to this training, because I think it will have an impact on their careers.

Thank you. Patrick?

A brief follow-up on Dr. Katz's comment. I also had the same problem answering those questions, in that for most of them I would say: 99% of the time, make everything open. So I think there's no replacement for a community that understands the ethical reasons for doing these things, that by and large understands and believes that open source, software sharing, and reproducibility are actually what drives science. Then these things kind of become obvious, and those norms can be created. Absolute rules like "make every piece of software open source" are impractical in many cases; people won't accept that, and I think anyone in the audience can think of a practical reason why they wouldn't be able to do that in every case. But we can, and still should, enforce ethical norms in our communities that say that reproducibility and openness, some sense of giving visibility into your results, is important.

Okay. I don't like forcing all PhDs to follow one particular course, because of the enormous diversity that we have within the geosciences in the topics that people work on. We have PhDs that do most of their work in the water lab; they do very sophisticated measurements, they do their analysis mainly with analytical equations, and they could do the entire data analysis in Excel, or maybe an open source version of Excel.
They don't necessarily have to follow a software design course. I am a very strong proponent of having a multitude of courses that incoming PhDs, together with their supervisors, can select from, suited to whatever research they are going to do; but I don't like the blanket "everybody needs to know this" statements.

All right. Karina, you're muted.

Okay, it had to happen to someone once, so this time it was me. I kind of agree and I disagree. On the one hand, I agree that there's not this one course everybody has to take that will help everybody; that's definitely not the case. But we started offering some really basic courses which we recommend to a lot of people, and not just the trainings we give at our research center, but Helmholtz-wide, across the German research centers, you could say. We have a great project where we share our trainings, meaning we always have some introduction to how to use Git, and how to do a little bit of data analysis using, for example, Jupyter notebooks or Python. And we have different nuances of a similar basic thing, so everybody learns some good basics, because nearly everywhere Git or GitLab is used in some way. They get some basic knowledge, but they can see which course fits them best, and it's not just offered by us but by a lot of different centers, so they can pick one, and they have it close to the beginning, when they start with us; and it doesn't matter whether it's PhDs or other people joining us, whenever they start to have anything to do with development.
Obviously, if somebody is just building stuff, they might not have to attend a training; it's never forced. But in my opinion, when somebody starts working with me, you should always think about what their tasks will be and which trainings would be helpful for them, and then I send them along really early, and I show them what's there. So for me it became kind of a requirement to at least think about whether they should go to this or that training, and some are really the basics, like Git or GitLab use. And there we not only show, okay, this is how you do a big software project, but also what you do when you have a small script, or even if you want to organize yourself and some documents, or write papers in it; so we show the whole range, so that people can get used to tools which they can then use in software development as well. We have a training on sustainable software development which focuses on software; it's not software design as such, but a bit about clean code, how you can have a nice structure, what should be in a repository, what belongs together. A little bit about open source licenses, so that when they use dependencies from others they know where they have to take care, whether they have to be careful because it's GPL code, and so on. So it gives them a nice overview, in two half-days, so that they now feel familiar with where the tricky parts are, and especially who to contact and who they can ask within DLR or within Helmholtz. And I think this is the way to go. You don't need to make them experts, definitely not, because not everybody becomes a research software engineer, or a scientific developer, or however you want to call them. But they have some basic knowledge, and they know where the tricky parts are and where they have to be careful. This can prevent a lot of things, because we do a lot of consultation.
I do a lot of open source software and license consultations, and also on architecture, continuous integration, software engineering, and so on. People come to me with stuff, and often it's like: okay, if you had done this small thing a little bit better in the beginning, or if you had known about this or that basic concept, you might have taken another path, and you wouldn't now be sitting there with kind of a mess of software which nobody wants to use but everybody has to, wondering whether it's important enough to rewrite, or how to go on with it. You'll always have legacy code which you need to deal with, but still. So that's why on these questions I went very strongly with "yes, everybody and everywhere", because I say: at least think about going to these trainings. It's important that we go in the right direction there.

So, I think Susanne wants to talk about a different one of the questions, I believe. Please go ahead.

Yeah, so that was number six: who's responsible for making sure science software used in a publication is of sufficient quality. I find it an intriguing question; we addressed it partly already, right? We talked about the review process, and your Journal of Open Source Software that should be phased out because we should find other ways of acknowledging and citing software. But in my field, in geodynamics, publishing software by itself is really hard. A journal will require that you publish your software in the sense that you put it in a paper where you actually use it for an application, and they will review the application, not the software. So if we ask who's going to make sure that that software is actually of reasonable quality: most people in the poll said first of all the researcher, and you need to document this, or you need to show that you've passed the standard tests and benchmarks in your field, for example.
But the interesting one for me is the journal, because, being an editor of a journal myself, I would say: you know, the poor editors that would have to handle this. What they would of course do is ask a reviewer to do it, which brings us back to the earlier discussion that we had on how reviewers do this, and whether you can ask them to. It's not always possible, as Rolf said earlier, to compartmentalize a piece of software and just ask them to look at a specific part. The data that your software puts out at the end you can always give, but it's been through so many steps already; I think that step is probably not the best one to look at, because then you're looking more at the visualization, and it's all the steps in between, especially when you have nonlinear feedbacks, that make it not intuitive to understand what happens. So I would say it's first of all our responsibility as researchers: whether I use my own software or somebody else's, I need to document that it is behaving well and is of sufficient quality.

Yeah, I think that, if you disregard for the moment the very high end of the HPC part of the community, the model runs that take forever and ever, I would argue that we can demand something from the editorial staff. Not the editors themselves, because they're unpaid labor as well; I mean, what are we paying journals for anyway? So why not demand that the editorial staff at least does a first check: if I run this code, does it generate the same graph the author is claiming? And to some degree that process could be automated. For some research, especially if I look at my own field, hydrology, I think you could provide a single notebook with some containers around it, where we could say: if you run this on your hardware, it should generate this graph as well; and it starts at that point with what we consider input files.
I think that is something that publishers could be asked to do before it goes to reviewers, who can then vouch for the scientific quality of the work being done.

So it's interesting; I think I didn't read this as necessarily scientific quality, I was reading it more as software quality, so it's interesting to see.

I understand, but I would say that, well, of course the quality of the work in itself rests with the one producing it, both the scientific and the software quality. But then checking the software quality, at least a first-level check, could be done at the publishers. Given the number of review requests you get where, with just one glance, you would say "this should have been stopped by the editorial staff before it reached reviewers"... that's why I put it with them.

Yeah, it brings up a question about whether there's a difference between something being reproducible or reusable and something having high quality. I think there probably is, but I'm not completely sure. Patrick, you had something you wanted to add?

Yeah, on that point I would say that, when it comes down to allocating your limited time: if it's a paper that's supposed to make a research contribution, it would be nice if it were high-quality software, but it has to be reproducible. But to follow up on something Rolf said: in terms of basic tools that we need to educate people about, I personally wish I knew more about containerization, because that is a solution that is becoming more popular in the software world at large for making things reproducible, in terms of testing and deployment and so on. I think this also ties in with why using open source tools is good for science, in the sense that I've found that those tend to be more scriptable.

I think that's a good point.
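The automated first check Rolf describes, does re-running the authors' pipeline regenerate the claimed output, can be sketched as a checksum comparison. The file name and the idea of an author-supplied manifest below are assumptions for illustration, not anything described in the debate.

```python
# A sketch of an automated editorial first check: after re-running the
# authors' pipeline, compare the regenerated output file against a checksum
# the authors reported. File names and manifest format are assumptions.
import hashlib
from pathlib import Path

def sha256_of(path):
    """Checksum a result file in chunks (works for large model output too)."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def check_regenerated(path, claimed_sha256):
    """True if the file produced on the editor's hardware matches the claim."""
    return sha256_of(path) == claimed_sha256

# Toy demonstration with a stand-in for a regenerated data file.
out = Path("figure_2_data.csv")
out.write_text("time,discharge\n0,1.2\n1,1.4\n")
claimed = sha256_of(out)  # in practice this would come from the authors
print(check_regenerated(out, claimed))   # True: outputs match
print(check_regenerated(out, "0" * 64))  # False: mismatch, flag for the editor
```

A caveat that echoes the interpretability point made earlier: bitwise identity is often too strict for floating-point or parallel model output, so a real check would likely compare derived quantities within tolerances rather than raw checksums.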
I think that's a strong incentive not to stay locked into any kind of closed ecosystem. For instance, something like ParaView, which I think a lot of us in geodynamics use, is becoming increasingly scriptable, and as an open source project I'm really glad to see that. Maybe we're not there yet, but I think in a few years it should be totally reasonable to say: use this Linux image, download this version of ParaView, run this Python script. It might be a horrible-looking script, but it'll generate the picture, and if and when I need to go in there, even if that requires being something of a ParaView expert, it becomes possible to apply the scientific process. There might still be some ugliness, because we don't have time to do everything nicely, but the fundamental thing is preserved, if we use these tools and educate people on the basics of how to run scripts, use containers, those types of things.

So, since we've been talking about openness a fair bit, I'd like to talk about platforms briefly and see what people think about that. I'm curious whether the panelists have particular platforms that they think are good for sharing software, for using software, for reproducibility. Part of the reason I ask is that there have been ties between some journals and some closed commercial packages as platforms, and there also are more open platforms; I'm curious whether anybody thinks that whether the platform is open or closed actually matters, and what platforms people think are the best ones to use today, again for sharing, for reproducibility, and for reuse. So, Kim, if you want.
I will start with sharing. For example, when we started, I think six years back, at the Earth Sciences department, with the arrival of a new director, we had a debate about sharing: whether to use GitHub, which at that time was already pushing ahead, or to go directly for our own instance of GitLab. We were thinking about both, and finally we decided to take the on-premises GitLab, because you can host the code in your own place and you have more control, and then you can share. But it's true that over the years we noticed that not being on a platform like GitHub sometimes makes it much more difficult to share and collaborate on the code. This is something we were not thinking about at the beginning, because you have to grant access and create users. So now we have the code in our place, and I think it's good; but on the other hand, considering that lots of institutions are now moving to GitHub, I think we are losing the easiness of sharing and interacting with other users, who can fork your research software, and so on.

And Rolf, did you want to respond?

I want to make sure that we don't present this as a false dilemma, because I think you can do both. You can have your code on GitHub, because it's easy to use and you like to develop there with your team and make use of the fancy bells and whistles that are available, and also use it to share with people; and if you're scared that at some point GitHub might pull the plug, etc., you can have a mirror on your own GitLab, or have releases on Zenodo, to make sure that the archiving function is maintained. I completely agree with Kim that the usability of a platform should be very important in the decision of where you work, but you can have double instances and synced copies of whatever you want, in my view.
I guess I'm also kind of curious about things that are more for code that's directly runnable, like code in a container, or things like Code Ocean or Binder, or these kinds of other platforms as well. But Karina, did you want to add something first?

I just wanted to add that we are exactly at this point: we have our internal GitLab, but we still have quite some code on GitHub. Using mirrors has some disadvantages, because you still have one place where you develop, where you have your issues and so on. You can move stuff around, but our main problem is that people internally now know our GitLab, so they know how to handle it and use it, and GitHub is different. So currently we are thinking about moving that stuff to GitLab, so that they just have to deal with one tool; but obviously it's not the most widely used framework. So while it doesn't have to be one or the other, doing it both ways is also kind of a struggle, and you need a strategy; we're currently working on that. I just wanted to mention this from the practical side, where I'm currently standing.

Okay, Kim?

Yeah, there's a technology that has been mentioned in the last minutes: containerization. From the HPC side, this is something that, at the beginning, was a new idea, a new promise, when it reached the HPC centers.
It seemed that it would change the game and make things easier; but I'm sorry, after some years working with it, you realize that perhaps it has good points, but it's not the solution that will solve everything. When you arrive at an HPC center, depending on the policy of its admins, there are solutions that are not allowed, like Docker; they want you to use other solutions. So it can be used for some special use cases, but it will not solve everything. And I think this is part of what we were saying: having research software engineers will help, and will allow us to identify which use cases are suitable to be solved using this kind of technology, like containers.

So, because we're actually getting close to the end of the session time, I'd like to start moving towards the end. What we were planning to do is to have closing statements from each of the panelists, again maybe a minute or two. This can be general, but I'm also interested in what you have learned from the session, what you're maybe thinking about slightly differently, or what you think are interesting challenges that you weren't thinking about before. And we'll go ahead and do this in the same order, I think, as the previous part, and start with Karina.

Wait, so I don't have time to think about that? Yeah... I really liked the results of the polls, seeing that so many are for this open and community-oriented way. Especially the arguments from Rolf, who took a somewhat different position on some things. But he is correct, especially in that we need a structured role for these big software projects, which you perhaps can't run really well just with scientists, because they should focus on their own science.
And they will stick with the other parts, where scientists are definitely necessary; and for some projects they have to stay really closely in the loop, because I don't want to have to deal with the domain knowledge and get all of it into my head. They have it, so they should do that part, and I deal with the rest. I think this was really interesting. And I would have loved to hear more about the containerization and such, because that's also still an open challenge; so if there's something really practical, hit me up with it. I'm looking forward to it.

Okay, thank you.

So, for me, the main takeaway of this debate is that we took the time, and the organizers thought about this topic of research software engineers, to have this debate in a forum like EGU. I'm a research software engineer, and when I started, for me EGU was the place where scientists, geophysical scientists, were presenting their work; realizing that right now we take time to discuss these kinds of things, I think we are on a good path. It's in places like this that we will discuss these topics; people that perhaps are not so used to them, and don't have these ideas, will take something away and say: okay, I can apply at least this piece of these ideas, I can take it back and try to see, could I set up a GitLab, or should I ask for these services in my institution. And I think bit by bit we will realize that if we apply these practices, we will go a step further, and we will deploy them in many more centers, with the final goal of developing much better software for research scientists. So thank you.

Great, thank you. Patrick?

Thank you so much.
This has been really, really interesting. I think one thing that's emerged for me over and over is that there's really no way around emphasizing the community aspect of these things. We tend to think of these as being purely technical issues, but so many of the problems are really about communication. So I think we need to make sure that we have a continuing dedication to a common set of values in what we're trying to do, and some basic technical literacy that allows us to work together. This isn't about making researchers do the jobs of software engineers, but I think that having some common language about how computer software works, especially lower-level things, how to use terminals, how to write scripts, for example, is essential for us to be able to talk to each other and foment these community relationships that are going to solve a lot of our problems. And since we didn't talk about it very much: in terms of how to get these things funded, I think a good knock-on effect of requiring more openness from publications, from review processes, and particularly from grants, is that RSEs are going to have to be written into grants to provide the reporting, to provide the things that are required for these publications. So I think we should very much support those sorts of rules, because it means that if those things are first-class deliverables, they are going to be produced by first-class staff, basically.

Thank you. Okay, thank you. Rolf?

Yeah. First, thank you of course for inviting me to be here as a panelist; I really enjoyed it. Thank you for organizing this; it was a really smoothly run session. I have three final points that I want to address. The first is rather technical. I'm in hydrology, a field where every research group has its own model, and it's virtually impossible to run someone else's model because of legacy code, different system dependencies, etc.
We're building a system based on containerization that does allow you to work with each other's models and work through common interfaces. You can talk with me, or any of the people on the eWaterCycle team, during EGU; we'd love to talk about what we've developed. There's an upcoming technical paper describing the system, but it's upcoming and I'm not putting a date on it right now, because some of my team members are watching and saying no, no, no, don't mention dates, rightly so. One thing that I want to mention is something I heard a little bit today: people saying researchers should do this and research software engineers should do that. I think we should be very careful when making statements like that; we should always ask ourselves: if they should, why aren't they doing it yet? What is stopping them from actually doing it? Because a lot of these things are obvious; PhD students should learn how to write better code. Why aren't they doing it yet? Because nobody is teaching them, or they're too busy, or they're just focusing on writing papers and they don't have the right incentives. So we should also address that part of academic structure and culture if we want to make the change. Finally, talking about that culture: there were some questions in the chat about what early career scientists can do, and culture change is hard. I think the senior scientists listening in on this should take the responsibility to push for that culture change at the same time. There's going to be an opinion paper out, with Caitlyn Hall as the lead author; it will be on EarthArXiv as a preprint very soon and provides a practical how-to guide on being an open hydrologist. It will also have very relevant material on how to share software, so I'd like to advertise that here. That's all. Thanks everyone.
So, I started off by saying at the beginning that there's this fine balance between what we really wish for and what we actually manage, and coming out of the debate I think that question is still there. What was interesting for me was what was brought up concerning the role of journals and their responsibility, which is true in a way, because the journals are of course finally responsible for what they put out. But there's only so much you can actually put on them, and it's indeed not on the editors, because editors are also volunteers; and if you put it on the staff of the publisher, the question is what that means for the cost, because somewhere along the line somebody has to pay. The other thing is that there's clearly a need for communication between those who develop software, those who use it, the journals, and the people who train the different generations that are using the different software, because legacy software can sometimes be very interesting to access and to read. One thing I was also thinking: we didn't talk much about the funders, but I think grants should actually be written so that there is time budgeted for publishing software, for documenting it, and for making it available. The same way we now write in costs for open access publishing, we should write in the costs of making any software developed during the grant available, and it should not be assumed that this just happens by itself. So I love the attention for research software; thanks a lot to the organizers and thanks a lot to my colleagues on the panel. Great, thank you. I guess I just wanted to throw in one comment based on what folks have said, which is for the early career folks who are listening and who have different ideas.
It's important to remember those ideas when you become more senior, and not to feel beaten down by the system that you've managed to succeed in, but to remember that there are ways we could change it to make it easier. I want to thank the panelists and the conveners: the panelists for bringing up great topics and making this a great discussion, and the conveners for making this very easy. I want to apologize slightly to the people who asked questions that we haven't gotten to. We could have used probably twice as much time, because we actually had some of our own questions as well, and there are a number of things I was looking forward to hearing debated, but those will have to wait for next time. And with that, I'll turn it over to the conveners to wrap up. Thank you, Dan, and thanks everybody for such a lively debate; it was very interesting. For everybody who wants to follow up on the poll results: I've copied them over to the slide deck, which is shared via the link at the bottom of the slide. I also found the polls very interesting; I think we were only partly preaching to the choir, so it was great to see that the vast majority of people do recognize software, for example, and that quite a few people already always share their software. And I found it interesting to see that on some points there is quite high agreement on things that we might change in the future. On that note, I really enjoyed the debate and the closing statements, and I've tried to capture a few of the thoughts that resonated with me during the debate. I think I only want to highlight the last ideas that were shared: that culture change is hard, and that we need to work together to drive it. It's hard because all aspects of science are touched upon.
And that's why I'm very, very grateful to the speakers for participating today, because I hope this event was one of the building blocks for us to discuss what we need to change and how we can improve research software in the geosciences, and improve the situation for all the people developing and using that research software. With that said, thank you very much for your participation. Don't forget to stop by the Data Help Desk on Twitter. And also feel free to use the link at the bottom of this page, bit.ly slash vEGU minus software minus discuss. It will lead you to a discussion forum on GitHub, where you are invited to follow up with each other or with the panelists, share your own ideas, and ask the questions that we were not able to discuss today. Thank you. I wish you all a great remaining vEGU this year, as Dan said, and hope to see you all again next year in Vienna.