Hello and good morning, everyone. I'm Leslie Hsu, and I am one of the co-chairs of the CSDMS Terrestrial Working Group. I'm here today hosting this presentation on FAIR and research software. I'd like to briefly introduce our two presenters before I let them take it away. We'll be hearing from Dr. Anna-Lena Lamprecht. She's an assistant professor in the Software Technology group, Department of Information and Computing Sciences, at Utrecht University, Netherlands. Anna-Lena conducts research at the interface of research software engineering and applied formal methods. She's currently focusing on FAIR software and automated composition of scientific workflows, and also teaches courses on programming, software engineering and formal methods in her department's study programs and beyond. She's a Westerdijk fellow, faculty ambassador of the Open Science Community Utrecht, and co-founder and steering committee member of the award-winning Women in Information and Computing Sciences Network. We also have Dr. Salvador Capella. He is group leader of the Spanish National Bioinformatics Coordination Node at the Barcelona Supercomputing Center in Spain. Salvador's current focus is on the development of long-term infrastructures to facilitate the scientific benchmarking and technical monitoring of bioinformatics tools, web services and workflows in the context of ELIXIR, which is a pan-European distributed organization across 22 countries and more than 250 institutions. In the context of ELIXIR, Salvador co-leads the ELIXIR Tools Platform, which aims for better software development across the life sciences. So a very warm welcome to these two presenters. I'm going to disappear here and enjoy your presentation.
Thanks for the introduction, Leslie, and again also for the invitation. So I'll start this off, Salvador will jump in in a while, and then we'll change back and forth a bit. So yes, FAIR and research software is indeed quite a hot topic, I would say.
It's also currently a very fast-moving field, so every time we give this presentation we adapt it a bit to account for the latest developments. The idea for this presentation is to briefly recap the FAIR data principles, because that is where all the discussion started and what most people know. Just for the context of the presentation, we will do a small interactive recap with you so that you have it clear in your mind again this morning what this is about. Then we will discuss data and software: what are actually the differences, and why do we say we need different principles for software? Then we'll discuss the different steps in the journey towards FAIR principles for research software that we have already taken, and the outcomes so far. We will also mention a few pointers to the international community that has formed around the topic and what you can do to get involved if you're interested, because there are many different ways to take part in this journey. So maybe you're familiar with this Mentimeter tool. I would invite you to just grab your phone or another browser window, go to menti.com, type in the code that you see on the slide there, and answer our first question, which is simply: the FAIR data principles, how familiar are you with those? Have you heard about them at least? Then that will maybe be a "somewhat". If this is really the first time you hear about FAIR, that will be a "no". And if you have bothered or worried a bit about FAIR already, then please choose "yes". While you're answering, I'll just talk about FAIR a bit. It's this acronym for findable, accessible, interoperable and reusable. The idea is that all research data, and in fact all research objects, should be these four things: findable on the internet, and accessible without real limitations, or at least it should be clear how to access them.
They should be interoperable, so that they can be integrated with each other, and then, in the spirit of science, of course also reusable, so that people can make use of the research outputs that other people have already produced and do not have to produce them themselves. Most people here say "yes" or "somewhat"; only one person says "no, never heard of this before". Unfortunately we can't do a full introduction to that topic here, but I think the most important thoughts for FAIR software will become clear also without that. So the next four questions are a kind of quiz, checking how familiar you really are with the FAIR data principles. The way it works is that I have a statement, and then you need to say which of the four principles, so F, A, I or R, it belongs to. Let's see how that goes. I think you need to enter a player name first before you can answer this question, so we have some players here. And so we have question one of four. You should be able to see it now: metadata are assigned a globally unique and persistent identifier. So is this findable, accessible, interoperable or reusable? A globally unique and persistent identifier for a paper can be a DOI, for example, so this digital object identifier, or another persistent URL like those given by large databases. That is what is meant by a persistent identifier. So now, which principle is this? Time's up. Findable, and that's correct, so most of you got this right, that's very nice. That's also a very clear one, and, maybe as a spoiler at this point, it applies to both software and data, so it's a very important principle indeed. Okay, next question. Let's see, this is a bit more difficult. Question two of four is: metadata meet domain-relevant community standards. So which principle talks about community standards and domain-relevant standards? It's actually a more difficult one. 10 seconds left to answer. Right, time's up.
And this is not interoperability. I mean, it always works; we always trap people with this one, and it's a bit of a mean one. If you go to the original FAIR paper, this principle is listed under reusability, and the point here is really the domain-relevant community standard. So this is very much about what allows people to reuse a research object, and the idea is that this has a lot to do with the domain-specific information that is provided. Arguably it has to do with interoperability as well, which is also why many people choose it. But the emphasis in the principles is on reusable. Interoperability has much more to do with formats and technicalities than it has with domains and communities. But we'll dive a bit deeper into that later. The next quiz question, three out of four, is another sentence: metadata are retrievable by their identifier using a standardized communications protocol. Findable, accessible, interoperable or reusable? Five seconds left. As voted, this is accessible. Yes. Many of you are right now. This standardized communications protocol can, for example, be HTTP, so that means you could just download that stuff through the browser. A standardized and open communications protocol, that's very important as well. Okay, the last of this series of quiz questions is now: metadata include qualified references to other metadata. Which one is this? Again 30 seconds to answer. 10 seconds. This is one of the difficult principles, that's what I can already say, and one that sparks a lot of discussion for data as well as for software. Interoperable, indeed. Maybe you just figured it out because we didn't have this one yet, or maybe it really sounded like interoperability, but this is indeed correct. The idea is references between objects, and that has a lot to do with interoperability, of course. Well, thanks for your answers on this one.
And let me see if we can get to the leaderboard; it was a bit slow computing, but we do have a leaderboard for the quiz. So "agent X nine", no idea who that was, was the fastest and the best in this, but you have all been doing quite well, so this is quite a good result. I have a few more Menti interactions with you, not as a quiz but more as a discussion, so keep your mobile ready or your browser window open; you will need it again. So, very quickly: this is what FAIR is about, and you seem to have quite a good idea about that. The principles have been tremendously popular in recent years, but they were actually only proposed about five years ago, in 2016, by Mark Wilkinson and others. This original paper is mentioned here on top of the slide, and what you see there is that the original paper talks about the FAIR guiding principles for scientific data management and stewardship. This prominent title, and the way the paper was framed, led to the FAIR principles being perceived and communicated mainly for data in the beginning, although from the start, and that's also in the original paper, they were meant for all kinds of research output, at least digital research output. And this is a quote from a European Commission paper that also names, for example, data, software and other research resources that should be FAIR. The starting point of our work was then not five years but about three and a half years ago, to say: okay, how do we now resolve that mismatch? The FAIR principles are meant for all research objects, but these really popular, famous principles talk about data so much. So let's have a look at what that really means: how do these FAIR principles really relate to software? As I mentioned in the beginning, it's an ongoing discussion; there is not yet a definitive set of FAIR principles for software.
And the paper that that Salva and I and a bunch of other people published. Yeah, officially spring last year that I think it definitely was a milestone in this discussion and summarize some of the important thoughts why we need separate fair principles for for software. And indeed yeah proud to say it was most viewed article in that journal in the first half of last year. And what we will do in the following is also next to get into some of the thoughts and ideas in that paper to understand a bit more what the discussion is about. So here's the next Mentimeter question for you open question. And just typing your answers to this. What is actually data. When we talk about the difference of data and software we first should get the terms clear. I'm curious to see what comes in here. Answer in recorded information. And that's not I mean answer what what comes to your mind because there's no really a right or wrong because there's certainly different definitions for data and they can all be, and they can all be correct observations like measurements observations again. Information in its rawest form that's a phrasing that I like. That's a phenomenal. Let's see what else we get information in communicable form numerical or other forms. And it talks about the types of the things of it. Yeah, but we do see a lot of like information that is recorded measurements facts that we can do something with a recorded information again. So just agree with with any of these I mean they're all good composition but let's see like what the other things are that we talk about so this is data. And then, then what is software. Interesting because this is often people find this more difficult to to answer thinking I'm waiting for the first replies. Yes. Computer code written to accomplish a purpose. For sure, I can agree with that code applied to data. Yeah code often I mean software often applies to data. Computer code composed in a human readable programming language. 
We could probably discuss about human readability but yeah computer code in the programming language makes sense as a definition for software code packages and performing some operations. In the computer process sometimes data collection methodology that's a quite abstract view but certainly not not wrong. Software tools based on a computer perform specific tasks, perhaps evolving the user output of data but not necessarily. Yeah but the most interesting programs do something with data input and output. Actually scientific software set of equations. In the computer process computer study and receive digital information perform tasks. Yeah so what would you all capture here in a sense really that software does something it's executable dynamic, and that it might do that on data input, not necessarily I mean you could run simulations for example without really having input like starting a simulation from in itself, but most computer program have some kind of input and produce some kind of output. There's a difference between like data being a bit. Yeah, a different kind of digital object. Right. So, next question. Based on this so what do data and software have in common listed a couple of things what they are but if you look at the data and software maybe also on your computer where are they kind of the same. Software is data. Yeah that's an interesting quote that often comes and I will comment on that a bit later. Both are part of a scientific process. Yeah and this time at age that's that's for sure they're used together to solve problems. Yeah, I mean data alone would maybe not be so useful. They're stored on a computer. Yeah. So software in some sense that was, yeah, always the case punch cards could also be data but I mean data can also be recorded on paper of course in principle nowadays that's not so much the case anymore. Both are forms of information. Yeah. Digital form. They must have standards that's also a good one. 
Support research output and a very valuable. Yes, although I sometimes have the feeling that people value data more than software. So, that's a bit of a cultural change that needs to happen. Data input and output that is related to software software is computational engine to use data. Yeah, they have interpreted language. Both. Yeah, you need to learn to read both of them. Nice. So there's interesting answers. And maybe you could expect that after this question of what they have in common. The next question is, what makes them actually different. Yeah, we can have the quote again of software is data and as I said I will comment on that in a, in a moment. But what else is different. Yeah, software without data aren't so useful for science. That's true but also data without software is usually not of too much use. Yeah, data are not executable. That's an important one in my opinion. Software needs data to work, certainly for scientific software. They accumulate the software may be rerun many times and data observation aren't repeated exactly. In most of cases that's true. Yeah, so data are often more indeed more more static once you have recorded data sets. They are they are there although you might of course run measurements again. Yeah, and data doesn't change and software can be versioned. Yeah, I mean there's also the concept of versions for for data but it's much less exposed that's true. There's understandable using metadata software can understand using metadata itself. Well that's maybe debatable. There's also metadata for software, but I agree that it's again more developed for for data, especially in the also fair data work that has been done. Yeah, they require different knowledge to create implement document. Yes, so this procedural and computational thinking that is required for programming is certainly different kind of skill sets than what you need for collecting data. Data is independent of a function execution. 
Yeah, that comes with the software, different expertise. Data represents and software instructs. Yeah, so very nice differences and observations. Really good. So now the comment that I promised on this statement of "software is data" or "software is not data". That is a statement that you often hear: people say software is just a special kind of data. In some sense, that's true, especially when you look at what people may have learned in their basic computer science classes at some point. That is, as depicted here on the top of the slide, a very technical view, which says that in a computer system, at the end of the day, everything that we store is represented as sequences of ones and zeros. That is what we call data, and in the computer that can be information, but also instructions and software and whatever else. From that perspective, we would say software is a special kind of data. That is not wrong, but we found that in the context of discussing FAIR and software, this perspective is just not very helpful, because it's a level of granularity at which FAIR is actually not operating. FAIR is much more on a domain-specific, researcher level. So FAIR talks about the digital object as being the superclass, the parent, and as children of that we have data and software, next to each other. That means that they share, as you have also identified, some properties. For example, they can both be assigned a DOI for identification. But software is executable, and that comes, for example, with more dependencies, which data doesn't have, so it needs a different treatment there. So this is more the picture that we take for the discussion of the FAIR principles for research software.
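The parent-and-children picture described here can be sketched in a few lines of Python. This is only an illustration of the idea, not something shown in the talk: the class names, the fields, and the placeholder identifiers (`10.0000/...`) are all assumptions made up for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DigitalObject:
    """Hypothetical parent class: what data and software share."""
    identifier: str                      # e.g. a DOI (placeholder values below)
    metadata: dict = field(default_factory=dict)


@dataclass
class Dataset(DigitalObject):
    """Data: recorded information, not executable."""
    executable: bool = False


@dataclass
class Software(DigitalObject):
    """Software: executable, so it also carries a version and dependencies."""
    version: str = ""
    dependencies: list = field(default_factory=list)
    executable: bool = True


# Placeholder identifiers, not real DOIs.
measurements = Dataset("10.0000/example.data")
model = Software("10.0000/example.code", version="1.2.0",
                 dependencies=["numpy"])
```

Both objects can be identified in the same way, while only the software object has a version and dependencies, which is where the separate treatment comes in.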
Right, so maybe some other points about research software. What is this term, actually? There's again a working group working on a clear definition of what is research software and what is not research software, because there are some edge cases for which it is not so clear. But for the sake of this talk, let's just say that any software that is used to generate, process and analyze results that you produce in a research context and that you intend to appear in a publication is certainly research software. Without sticking to this one definition so much, you can certainly say that research software comes in many forms, for many purposes, across many distribution channels, and much research software has traditionally been created as free or open source software; that's this FOSS term. An interesting thing in the FAIR software discussion is the relationship between FAIR and FOSS, because you can see there's a clear overlap of objectives: we want to make things available to other people and make science more open, transparent and connected. But FOSS is very much about open source code and open licenses to do that, whereas, when you look at FAIR, open data is really not a requirement. In the FAIR data discussion that is for good reasons, because data is often privacy sensitive, for example when you look at health records or other personal data. So that makes sense, but these concerns are not valid in the same way for software, because software is instructions and methods that you should share with others as good scientific practice, and this is separate from the data. So why not share the software? That could be a demand for openness. As I said, it's an ongoing discussion, but I would like to hear your opinion: should FAIR for software require software to have an open license?
That would then differ from the FAIR data principles. Or should it be aligned with the FAIR data principles and say: no, it doesn't have to be open, but it just needs to be clear what the access conditions are? So just feel free to fire off whatever comes to your mind here. The first answer is a yes, for the principle of accessibility. I must say I don't have a clear opinion on that yet. I can really find arguments for both positions. The second answer is no, because sometimes you're constrained to use some specific software that is not open. Yeah, and it would be a shame if you cannot make your own software open just because you depend on that. "Open license by default, exceptions allowable", that's a compromise suggestion. "In some cases it will be a benefit." "Yes, as a default." I also certainly think that FAIR for software should encourage open licensing, but maybe it's difficult to enforce. "Both are an integral part of open science: open data, open source." What FAIR does enforce, though, is that you clearly state the conditions for access. Something like "data will be provided on reasonable request", where you then just decide if you like that person or not, is not in the spirit of FAIR. It needs to be very clear what the conditions are. So I'm seeing here strong recommendations for open software. That's good, but maybe not enforcing it. "There are multiple ways to execute a program", that's also a good point. You could also provide a program as a service without open-sourcing it, but then people might make the argument: okay, but if you don't offer that service anymore, what can you do with it? Okay, I'll leave it at that, because it can be a very lengthy discussion, but this was just to give you an impression of the things that we are talking about.
Another interesting thing that we usually talk about is the relationship between FAIR and software quality, because this FAIR software discussion was of course picked up in the international research software engineering (RSE) community, and people immediately worried about good software. FAIR is one principle, but does FAIR also help make software better? Can it meet those expectations? There was a lot of back and forth: this is FAIR, but this has to do with software quality, or these kinds of software quality do not have anything to do with FAIR. We kind of managed to resolve this conflict by distinguishing between the form and the function of the software. Form means the way in which it is provided: how the code looks, the thing that you can download, in a way the artifact that you get. It also has to do with code quality and maintainability, things that you can basically assess by looking at the source code. These things, we found, can be covered by the FAIR principles, because you can define standards and evaluate them; it has to do with how things are provided. And then there are things like the functionality of the software, meaning things you can observe when the software is executed. You can test if it's functionally correct, if it's secure, its computational efficiency, these kinds of things. That is not covered by FAIR, and this is also not the intention of FAIR. You see that when you consider the FAIR principles for data again. That's also not the case there: FAIR is really not about the content, it's about the form in which something is provided. So in an extreme case you could have scientifically completely rubbish data that is provided in a completely FAIR way. And it could be the same for software: the software does complete nonsense, a completely faulty analysis of something, super inefficient, but you can provide it in a completely FAIR way.
So this is a bit the difference. This is another discussion question, and this question is now about FAIR data and software. FAIR does not talk about the quality of the content, but should it or should it not? Is it good that it doesn't, or should it rather also make demands on certain levels of scientific quality of the things that are provided? So I'm here to hear your opinions. "There is no Q in FAIR", that's a good point. Another no: "I'm not a computer scientist, but the code I write works for my analysis, even if it isn't pretty." "No, that's a different issue." Another no, "because different scientific fields have different standards; quality may be described as a part of metadata." "I would say quality is included implicitly because of reviews"; yeah, that could be a point. "But knowing your code is used by others may improve its quality." That's true, it might be a psychological effect that people, before they publish, pay a bit more attention to making it good. So there's a bit of a tendency towards "no, it should not be a principle, but maybe things like that can be recorded in the metadata". I personally think it would be nice to also ensure more quality for both data and software, but the FAIR principles are not the right mechanism for doing that. Thanks. So we're not able to present you the definitive FAIR principles today, but this hopefully gave you a bit of an impression of what this discussion is about. Meanwhile, after we published this paper, an international FAIR for Research Software (FAIR4RS) working group has formed, jointly convened as an RDA working group, a FORCE11 working group, and a Research Software Alliance task force.
You find it under the URL that is given on the slide, and that's also one way to get involved: you can still join the working group and its discussions. The very recent outcome of that working group is actually the first draft of a set of community-agreed FAIR principles for research software, which was presented at the recent RDA plenary. I just copied it into a slide here. As I said, it's a draft; it's not the definitive thing. I also don't want to go through all these points here, but what you can maybe see is that for many of the principles the shape stays kind of the same, so we are able to transfer the FAIR data principles to software, in some cases by just adapting them slightly; often it's really just replacing "data" by "software". In some other cases we need to make some small extensions to better capture the properties of software. For example, for findability with persistent identifiers, for software it's important that these identifiers also support versions. That's also possible for data but not so much required. Then we have other principles that are rephrased a bit more and that certainly also require more discussion. But this is the first draft, and it will now be released for community consultation. So, again, if you're interested in discussing more what the FAIR principles should really be in the future, this is a good moment to join the working group. An ongoing discussion that has to do with all the principles is, of course, metrics, because if we ask people to make software FAIR, at some point we also need to somehow assess how FAIR things are. At this point I will hand over to Salva, because he has been one of the first people to work on metrics, even before the FAIR4RS working group started to work on metrics as well. I'm just skipping through your slides, so just let me know when I should go to the next ones.
Yeah, perfect. Thank you, Anna-Lena. I really wanted to participate in the whole discussion, but then we could have a two- or three-hour conversation rather than the very limited time that we have for this presentation. So we have all these conceptual discussions about FAIR for software and so on, but then we want to know how FAIR the software is that we're using, I would say, every day. So what we decided in OpenEBench was to look at the individual performance, the individual metrics for software, but also to have a sort of observatory to detect or identify patterns or trends in how the community is developing software, and where we should put the focus, because maybe there are aspects that are not being taken care of or that people are not aware of. So we have OpenEBench, the technical monitoring, where we look both at individual tools and at the whole population of tools. When we say the whole population of tools, we're talking about roughly 21,000 to 22,000 tools that we're computing on every day. If we go to the next slide, please. So we have seen the theoretical part, the collective effort, the quality assessment framework, which is an ongoing activity and will probably keep going for a long time, because it's really nice but also difficult to incorporate all the sensitivities and the little details into the principles. And then we have the practical part, the technical part. Okay, so we have the principles. Once we have the principles we can derive metrics, and then the question is how we measure those metrics. We have many metrics that can be associated with FAIR for software, and we decided to start with a few of them and then present them to the community, so the community can say "okay, we like it" or "we don't like it". So we are open to criticism.
Or to how, or what else, we can measure. At the infrastructure level we're following an ETL strategy, so extraction, transformation and loading of the metrics, and then we also have a platform to release those results. We go to the next slide, please. This is the structure. Basically, what we do is go to different sources, which can be repositories like Bitbucket or GitHub, registries like bio.tools, or platforms like Galaxy and others, or Bioconda, and then we analyze the data and metadata. There we do a process of integration and harmonization, and once we have consolidated that, we start measuring the different metrics that have been derived from the community. Once we have those metrics, we process and visualize them at the individual and at the general level. We move on to the next slide, please. So basically here you have a graphical description of what we do. We start by looking at Bioconda, bio.tools, GitHub and Galaxy. For those who don't know it, bio.tools is probably the most domain-specific resource: bio.tools is a registry, a software registry where you can register your software. It is useful in two ways. It favors that your software can be found by people, and if I want to find software doing a specific thing, a specific function, it is a place to look, especially when you have that many tools. So we query those resources, we bring in the metadata and the data, we integrate them, and then we start complementing those resources, looking for instance at the tools' home pages and at the journals as well, so we try to enrich the data that we have and get a better view of the software that is in there. And of course, then we always go on to the analysis and visualization, but we don't want to stop there. We have to do the hard work: how to present those results and how we can identify those trends that I was mentioning.
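The extract, transform and consolidate steps described here can be illustrated with a very small Python sketch. It is not the OpenEBench implementation: the source names, record fields and function names are hypothetical, and the "sources" are hard-coded lists standing in for real registry or repository queries.

```python
from collections import defaultdict

# Hypothetical, hand-written records standing in for what each source
# (a registry, a code repository, a package channel) might return.
SOURCES = {
    "registry": [{"name": "ToolA", "license": "MIT"}],
    "repository": [{"name": "toola ", "license": "MIT",
                    "repo": "https://example.org/toola"}],
    "package_channel": [{"name": "ToolB"}],
}


def extract(sources):
    """Yield (source_name, raw_record) pairs from every source."""
    for source_name, records in sources.items():
        for record in records:
            yield source_name, record


def transform(source_name, record):
    """Normalize the tool name so records from different sources line up."""
    key = record["name"].strip().lower()
    fields = {k: v for k, v in record.items() if k != "name"}
    return key, source_name, fields


def load(entries):
    """Consolidate all records per tool, keeping track of their origin."""
    consolidated = defaultdict(dict)
    for key, source_name, fields in entries:
        consolidated[key][source_name] = fields
    return dict(consolidated)


def run_pipeline(sources):
    return load(transform(name, rec) for name, rec in extract(sources))


tools = run_pipeline(SOURCES)
```

Keeping the origin of each record in the consolidated entry is a deliberate choice in this sketch: it is what later makes it possible to compare what different sources say about the same tool.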
So if we go to the next slide, please. I'll just talk about two very interesting results. The first one is about licensing. You might think, from the earlier discussion about open licenses or not, that all software has a license. We have detected that that is not the case: 60% of the software has no license whatsoever. If we played by the book, by the rules, software without a license shouldn't be used, because you don't know what you are allowed to do with the software. Probably, because we're researchers, we are going to use the software anyway, but a company, for instance, will not incorporate that software into its processes, because they don't know what might happen if they use software without a license. And there are people who assume that no license means an open license, and that is not true. Open licenses have a number of implications, and no license means that you cannot make any assumption. So we have 60% of software without licenses, and then we have 38% of the software that has ambiguous licenses. Why are we saying ambiguous licenses? Because it might happen that you have different versions, major or minor, or you might have different deployments, or you can even have the software registered in different places: you have the repository for the software and you have the homepage for the software, and sometimes it happens that there is not the same license in the two places, and we were detecting those. For us, generally speaking, there are not many ambiguities in terms of licensing. When we analyze what the licenses are, we can see that most of them are open source, but not all of them: 71% of the software has an open source license. On the next slide you will see that the most popular one is the GPL. So we have about 5000 instances using GPL, then MIT, Artistic, and so on and so forth.
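The three situations distinguished here (no license, the same license everywhere, different licenses in different places) can be expressed as a short check. This is a minimal sketch under assumptions: the function name, the place names and the simple string comparison are made up for illustration and are not taken from the monitoring system described in the talk.

```python
def license_status(declared):
    """Classify the license information found for one tool.

    `declared` maps a place where the tool is described (for example
    'repository' or 'homepage', both hypothetical keys) to the license
    string found there, or None if no license was found.
    """
    found = {lic.strip().lower()
             for lic in declared.values()
             if lic and lic.strip()}
    if not found:
        return "none"        # nothing can be assumed about reuse
    if len(found) > 1:
        return "ambiguous"   # the places disagree with each other
    return "consistent"
```

For example, `license_status({"repository": "GPL-3.0", "homepage": "MIT"})` reports the tool as ambiguous, while a tool with no license string anywhere is reported as having none.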
We decided to group them by families, because we have, for instance, GPL one, two, three, and then you have different versions and small variants and so on and so forth. So we decided to bring them all together to analyze what the general trend in the community is for this 71% of the software with open licenses. And it is important to identify this, because then we see what the tendency is, where the community is moving. And those aspects are relevant for reusability and, I would say, for interoperability as well: if I want to use the software, for instance in a workflow, I want to be able to use it and to understand how I am allowed to do it. Before we move on to the next slide, another aspect that is important for us is version control. I don't know if it has happened to you, but it used to be that you got a binary, and you could not modify it, you could not have a look at the functionality, and so on. The community has been moving on from those early days, and nowadays the common practice is to have a repository with the code, but still not all software is in a version control repository. When I say version control repository, I mean GitHub, GitLab, Bitbucket and similar. So then we see that 30% of the software is using version control. Of course, it will be interesting, and it's something that we're doing at the moment, to see the historical evolution of that, because here we're putting all the software together. My bet is that new software tends to be in a version control repository, while older software might be just the binary. That can also be seen because, when we were analyzing the different repositories, we saw that the predominant one is GitHub. The second one is SourceForge, which used to be quite popular a few years ago, like 10 or 15 years ago, something like that. So that may be a reason why the community, especially for older software, still has to move to this kind of repository.
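Two of the steps mentioned here, grouping license versions into families and recognizing whether a link points to a version control host, can be illustrated with a short Python sketch. The prefix table, the host table and the example inputs are assumptions for illustration only, not the mapping used in the analysis presented in the talk.

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed prefix table; more specific prefixes come first ("lgpl" before "gpl").
FAMILY_PREFIXES = [
    ("lgpl", "LGPL"),
    ("agpl", "AGPL"),
    ("gpl", "GPL"),
    ("mit", "MIT"),
    ("artistic", "Artistic"),
    ("apache", "Apache"),
    ("bsd", "BSD"),
]

# Assumed host table for the services named in the talk.
VCS_HOSTS = {
    "github.com": "GitHub",
    "gitlab.com": "GitLab",
    "bitbucket.org": "Bitbucket",
}


def license_family(name):
    """Map a license string such as 'GPL-3.0' or 'GPLv2' to its family."""
    key = name.strip().lower()
    for prefix, family in FAMILY_PREFIXES:
        if key.startswith(prefix):
            return family
    return "other"  # spelled-out names like 'GNU GPL v3' are not handled here


def vcs_host(url):
    """Return the hosting service for a repository URL, or None if unknown."""
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return VCS_HOSTS.get(host)


licenses = ["GPL-2.0", "GPL-3.0", "GPLv3", "LGPL-2.1", "MIT", "Artistic-2.0"]
families = Counter(license_family(lic) for lic in licenses)
print(families.most_common())
print(vcs_host("https://github.com/example/tool"))
```

Counting by family rather than by exact license string is what makes a trend such as "GPL is the most popular" visible when versions and small variants would otherwise split the counts.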
So it is relevant for the reusability component: we want to know where the software is, where the documentation is, and so on and so forth. It is also relevant for accessibility, because I might find the software using a registry or using Google, for instance, but then I want to access that software. And if I don't have a way to access that software using standard protocols, then we have a problem. So that is the second aspect I wanted to introduce here. And if we go to the next slide, I think this is my last one, if I'm not mistaken. I just wanted to say that there are a number of recommendations for software that Anna-Lena will explain to all of us.
Thanks, Salva. It's really cool to see the results of your work there. They're emerging, and, yeah, there's more to come about that. One thing I wanted to mention at this point is the five recommendations for FAIR software published by DANS and the Netherlands eScience Center, which is really a radical, let's say practical, approach to FAIR. As you saw before, the discussion around the real principles is still going on, and it can get quite academic, because we want to capture it completely and really do it well. The eScience Center is a center in the Netherlands supporting researchers in doing software, and they wanted to quickly release something that helps people to make their software better. If your interest is not so much in defining these principles and understanding them at a broader academic or policy level, and you just need to know "what should I do?", then this can be a very good approach. What they do on this website, fair-software.eu, is to give five fairly simple recommendations, where they say: you can do this, and then your software will be FAIRer automatically, right? That's very nice because it also aligns with what Salva has just presented.
So: do put your code into a publicly accessible repository with version control. That's for both, as Salva was saying, accessibility and reusability. And also add a license. In the broad interpretation it doesn't even have to be open source, but do add a license so that people know under which conditions they can use the software. And then the other three. Register your code in a community registry. Salva was briefly talking about bio.tools, for example, which is a big registry, like a yellow pages phone book, in the life science community, where people will go when they search for certain functionality, and then they will find the right tools for that. Enable citation of the software. There are different ways to do that, but let people know how they can cite you. Is there a paper associated with it that should be cited, or can the software be cited directly? There are different ways to do that, but make it possible. And the fifth, which relates a bit to software quality, is using a software quality checklist. It will take you through having formats, having documentation, using the right standards, having the right metadata, these elements of course. So have a look at that website if you're interested in really practical tips for making software FAIR. On all these topics there's lots more information on the website. And the nice thing is, if your organization is convinced that this is a good thing to do, you can also endorse it, and then you will appear on the website among the organizations endorsing FAIR software. So on that, a very last Mentimeter question to you about these FAIR recommendations for research software. We'd be interested: are you doing these things maybe already anyway? Repository, license, community registries, citation, software quality checklist: is this something you have done before, or are doing now, or is it something like "I've never done this, but maybe I should consider it"? This applies of course to everybody here who has software, who writes software.
All right, so the answers are in. Let's see if we see a similar pattern as often. It seems a bit so. Many people do the first two, and this is what we see very often, right? Many people use, presumably, GitHub; that's the most popular thing. People do provide licenses, in many cases also open source licenses, which makes sense on GitHub. But then usually the other three are less common, and you're doing quite well here, especially on enabling citation of the software. It's usually lower, so you're really doing well on this one. That's very good to see. But I think the community registries need to catch up a bit, and that's also very domain-specific: where there are already really popular and well-established community registries, people are of course more inclined to put their software there. Thanks for those answers.
We are almost closing. What I also wanted to give you is a kind of FAIR-for-software reading list. It's fairly incomplete, but these are maybe some of the milestone papers or resources that have come out in the last years. We'll share the slides with the organizers so that they can share them with you, so you don't have to type it from the screen now; you can have them later. I think we should just acknowledge all the numerous people who contributed to all these discussions, and this includes you now, because from all of these presentations we take something back to the ongoing discussions, and maybe you're even interested in joining us in the work. With that, thanks for tuning in with us this morning, in a seminar that has been at an earlier hour than usual, as I have learned.
And maybe as a last remark: this picture is what you get when you use Microsoft Office online to create slides and you type in "fair". Then it starts to suggest pictures automatically, and it had this picture of a fair, like a trade fair event, and I thought this was a very nice last illustration for this interesting journey towards FAIR principles for research software. So thanks again for tuning in, and I think we can take questions; we have a bit of time for that.
Thank you so much for that presentation, and you even got in some semantics there at the end with "fair", so that's great, thank you. I saw some things that were shared in the chat. Thank you very much. And now we have time for questions. You can either type those in the chat, or you can unmute yourself. I will start with one question that I had, while people are thinking. On one of your slides, Anna-Lena, with the FAIR principles for software, it just caught my eye that there was one statement that the software should include something about associated provenance. I was just wondering if you could say a few sentences about that, to explain to scientists who are writing code what it would entail to include associated provenance information.
Yeah, I'll start, and Salva can add to that. What we mean by provenance, also for data, is kind of recording the history of how something has come into being. And for software that means: what were the development steps, also a bit of the change history, which parts from other projects may have contributed, all of this, so that how the software has come into being is also transparent. But it's often not completely clear in which form to record that, so that's also part of ongoing work. There can really be provenance standards also for software, and there has been more work done on provenance of workflows and data.
But I think for software it's not so clear yet what the standards are.
You're still muted.
Yes, I was muted. I agree that a lot of work hasn't been put into provenance for software, but to me a very basic form of provenance is the repository that you have, because you can trace all the transformations that the software has undergone over time, to understand what the changes are.
Great, that is a very clear answer. I think people can say: okay, if I'm using version control, that will go towards provenance. I see a question in the chat: what is the difference between general software management and FAIR software? Do you want to start, Salva?
Yes, I can. I would say that is a super interesting question. I haven't, you know, reflected on that. I would argue that FAIR software is a component of general software management, because general software management covers many aspects, and FAIR can build towards that, but it is not only about the general aspects.
Yeah, I would say FAIR is something fairly science-specific, whereas general software management processes exist also in industry and all other sorts of contexts. And FAIR, as you were saying, is something very specific that comes in now. It should ideally, of course, be integrated into scientific software management and development processes, but development in general is a bit broader. Ideally, I mean, if we bring up our scientists with the right programming education, we can kind of in the future ingrain this thinking that FAIR software is not something that you think about at the end, like "I have the software now, how do I make it FAIR?", but that you develop it in a FAIR way from the beginning. And in part that already happens, right, because version control is such an important part of that.
So anyway, with FAIR, it's quite a new term that came up five years ago, but many of the concepts are much older. We already had them, but coining this one really sexy term for them has turned out to be very catchy.
Yes. I think there's a raised hand from Albert. Would you like to ask your question?
Yeah, thank you, Leslie. And thank you for the presentation, it was great. I missed the first 10 to 15 minutes or so, so apologies if I ask something that you already explained. To me, software sounds like, you know, a package that you buy, Microsoft Word or whatever, whereas numerical models are more like the toys that researchers make. So why have you chosen to constantly mention software rather than, you know, numerical models? Is there a reason for that?
Yeah, I mean, the difference that we discussed at the beginning was really data versus software, where they are different and why we have separate principles, so that is indeed something we discussed in the beginning. But the point you mention about models is actually an interesting one, because also with models, you have all these trained machine learning models, for example, and we had this discussion recently: are they actually data or software? Where do they belong? That's not so clear. So I don't have a real answer to that, but you really hit a point: this is the kind of thing that is in between.
So, to complement that: we made the distinction of research software because, while we were discussing this, we explicitly left out, let's say, Microsoft Excel or, you know, those tools that help you in your day-to-day activities but do not enable research, that are not conducive to a publication or, eventually, to seeing a patient, for instance in the health domain, and so on and so forth. That is important. And then, for the discussion about models, I would claim that a model is a digital object.
So, you know, if you use an ontology, you say: well, it's a digital object. So let's apply one or another from both the data and the software side.
Maybe FAIR models is another discussion that should be led by modeling experts.
Thank you. Thanks. Interesting question. Thank you. Let's get to this other question. Okay, there are two more questions, and I think we might be able to get to them if we are succinct. One question in the chat is: should version control systems be a FAIR requirement for research software?
So I will start with that, and I will make the statement that FAIR, to me, is aspirational. It's not black and white; it's like you build towards it, and then you probably don't have FAIR software but rather more FAIR software. I would for sure require having version control, but I would not make it explicit that it has to be, you know. And if you remember, one of the analysis slides had a specific mention of versions for software, because the version of the software has many implications. Another one is when you want to reproduce your analysis or when you are doing benchmarking: you want to know which version you used for the analysis or which version you benchmarked when you are comparing things, because differences between versions can have a great impact on the results you are trying to reproduce or compare.
Thank you, and I'm going to try to rush us to this last question we have in the chat, which is: could you please talk a bit more about "software includes qualified references to other objects" in the interoperable principle? Can you give some examples of what a qualified reference is?
I mean, it sounds very complicated, and indeed, if you look at the full semantic stack and the semantic web side, it can be very complicated, but one simple example of a qualified reference can already be a dependency on a library.
Qualified means it's in a structured and defined way, and in software, I mean, if you have Python scripts, for example, you have the imports at the beginning; that's already a qualified reference to other software. So that would be a very simple example, and that's already one thing that can be done. But the idea is that in the metadata there can be even more, really linked data and semantic web content, and then it gets more complicated. So this is all meant by this term, but a very simple one is really just the import that you do.
And that's going to be a whole other webinar. Well, that is excellent. We are at time, and I want to thank everyone for being here. Thank you, presenters, thank you, CSDMS members, and thanks for the great discussion. This has been recorded and it will be posted on the CSDMS website. I think all that's left is to say: see you at the next event, and have a great rest of your day, wherever you are in your day today.
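To make the simple example from the last answer concrete: the imports of a Python script can be read out programmatically, which is what makes them usable as structured references to other software. The sketch below uses Python's standard `ast` module; the helper name `imported_modules` and the example script are made up for illustration, and this is only one possible reading of "qualified reference", not a definition from the FAIR principles.

```python
import ast


def imported_modules(source):
    """Return the sorted top-level module names imported by Python source code."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports ("from . import x"): they point inside
            # the same package, not to other software.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return sorted(modules)


example = """
import numpy as np
from scipy.optimize import minimize
from . import helpers
"""

print(imported_modules(example))  # prints ['numpy', 'scipy']
```

The source is only parsed, never executed, so the listed libraries do not need to be installed for this to run.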