Good morning everybody. My name is Matthias Liffers. I'm from the Australian Research Data Commons, and I would like to start this webinar by acknowledging the traditional owners of the lands on which we are. I'm in Perth, so I would like to acknowledge the Whadjuk people of the Noongar Nation, and I'd like to pay my respects to their elders past and present.

Today I'm very pleased to introduce an international guest beaming in from Chicago, I believe, Illinois: Dan Katz, who is Assistant Director for Scientific Software and Applications at the National Center for Supercomputing Applications, also known as NCSA. I'm pretty happy about this. The NCSA is world famous; I mean, amongst other things, it invented the first web browser. But Dan has something else to talk about today, and that is research software sustainability. Now, as Dan speaks, you can enter questions in the question module in GoToWebinar, and we will address those questions at the end in a Q&A session. So over to you, Dan.

Okay, thanks very much. Yeah, thank you, Matthias, and thanks to ARDC for the opportunity to talk to everyone there. I am indeed going to talk about research software sustainability, but before I do that, I want to just start off by talking about research software itself and give some reasons that we actually should care about research software. Some of these slides are going to be a little bit US-centric.
I apologize, but those are the examples that I know the best, and I think that this is hopefully translatable.

We did a study where we looked at NSF projects, and we found that 20% of them over 11 years topically discussed software in their abstracts. This is the National Science Foundation, the main research funder in the US, and so that's 10 billion US dollars that was spent on software projects. The Department of Energy has a large project called the Exascale Computing Project. It's currently going on, trying to build all the technologies needed to get high-performance computing to exascale, and two of its three main areas are research software. That's about four billion dollars that is being spent there. So this is a pretty large amount of money that's going into research software.

We can also look at publications and software-intensive publications. Software-intensive projects actually are a majority of publications today, and the most cited papers are papers that are either discussing methods or software. I should mention that I'm providing a bunch of footnotes and references on the bottom of these slides. I'll be providing the slides themselves after this webinar, and so hopefully there will be a way that you can download them and then click on them,
I hope.

The third thing that we can look at is just to actually ask the researchers what they do and how they use research software. We've done two surveys, one in the US and one in the UK: the US one of postdocs, the UK one of researchers at top universities. Roughly 90% of the researchers in both cases said that they use research software, about 65% said they wouldn't be able to do their research without it, and about 50% develop software as part of their research.

We can also look at the fact that there are a bunch of organizations that are studying research software and are trying to contribute to the research software ecosystem. In the UK, there's the Software Sustainability Institute, which is in its third period of funding. They've been going for over 10 years now, and they're funded by all the UK research councils, which really cover all the different disciplinary and topical areas of research funding. In the US, there's the Better Scientific Software activity. It's a clearinghouse to gather, discuss, and disseminate experiences, techniques, tools, and other resources to improve developer productivity and software sustainability. That's funded by the Department of Energy. Here in the US, there's also an activity that I'm a co-PI on called URSSI, the US Research Software Sustainability Institute. This is a conceptualization project, so we're trying to plan an activity that would be something like the SSI in the UK, and we will be putting in a proposal to do some of this, the actual institute work, later this year.
We also have some interest from private foundations that are supportive of the things that we're trying to do. And then finally, there's the Research Software Alliance, which is really intended to coordinate all these groups that I've mentioned, as well as others that are also interested in this space. So we're building up a community of organizations that are all interested in this, all supporting this, and trying to figure out really how to promote research software and the awareness of research software.

Another thing we can look at is open source and software growth. In 2001, there were 208,000 SourceForge users. In 2017, this was up to 20 million GitHub users, and last year this was up to 37 million GitHub users. So there's an increasing number of people that are developing software, getting involved in it, and then producing and maintaining it.

In terms of software itself, in 1998 there were 180,000 downloads of Netscape (this was a web browser of that time, for people that don't remember it) over a two-week period. In 2017, there were 21 million downloads of a JavaScript library in a similar two-week period. So we've gone from a relatively small number of downloads of a user application to the point now where there are billions of downloads of something that most people don't know anything about, but it's kind of hiding behind the scenes and making things work.

The last thing to mention is that there was a 2018 survey of scientist developers that found that 82% of them were spending more time or much more time developing software than they had been 10 years ago. So we have an increase of software users, we have an increase of software usage, and we have an increase of people developing software.

So software shows up in the research cycle in a few different places. This is a kind of a stylized research cycle.
That's not really how things work, but it's good enough for this. We have the idea that somebody creates a hypothesis. They acquire some resources to investigate the hypothesis; that could be some funding, some software, some data. They then actually do some research, and in the process of that they may build some more software or build some more data. And then they publish the results, which traditionally would have been a paper or a book, but today might actually be the software itself as the result or the data sets as the result. And this is the knowledge here. This knowledge then enables this process to start again, right? Based on this knowledge, you can create a new hypothesis and then start the cycle again. But it's not exactly true, because in order to actually start the cycle again you have to gain recognition as well, and that recognition is the thing that helps you acquire the resources to do the next piece or to find the collaborators to do the next piece.

So software shows up here in three places, as you can see. In one area it's research software, the software that is done as part of the research. And in the other case, it's what I would call infrastructure software. It's the software that is coming into the research or going out of the research. It's being shared with others.

So, to repeat this with a few more words: we have some research software that's really intended just for research. It's funded by different funding agencies. It's sometimes explicit, but it's often implicit: the researcher doesn't say that their purpose of getting a grant is to develop software, but they're going to develop software to do the work that they are going to do. The software is intended for immediate use by the developer. It may be archived for future use and reproducibility, and it's probably dependent on infrastructure software, which is the other kind of software. This is software that's intended as infrastructure from the beginning.
It's intended to be shared. It's almost always explicitly a subject of a grant proposal, and when it's funded, the funding is to produce the software and to share the software. It's usually intended for use by a community, sometimes a small community, often a larger community. And it's sometimes easier to find appreciation and reward for building the software because of the sharing. The software that's intended for research can be turned into infrastructure software, but this requires making a conscious choice by the researcher, and it has consequences that are both positive and negative. I'll get into this a little bit more as we go on.

But just to go back to these two different kinds of things for the minute: if we think about infrastructure software projects, they're often built by people that maybe consider themselves toolmakers. They're people that want to make something that's useful to others, and they like this process of making and giving and having other people appreciate what they've made. Once they've made the software, they have some options. They might accept contributions from other people, and if so, they might broaden the focus of their software. They might want to bring in other related packages and build something bigger. And they might want to broaden the governance, so they might want to give up control of what they've done in order to give other people a stake, to share this governance, and to build a better collaboration.

An example of this is a project that I work on called Parsl, which is a library that permits interactive parallel programming in Python. As you can see on the right, it has these purple decorators that you can put in front of a Python application or a bash application (an application that calls some kind of external function or external executable). You put these decorators in front of these functions.
That then lets the Parsl library understand that these functions can be run asynchronously and that they return futures immediately, proxies for the results that may not be available yet. Parsl also can look at data dependencies, and if it sees that there are a bunch of different applications that don't share any dependencies, it can run them in parallel on an HPC system or a cloud system or something else. And so the user doesn't have to worry about specifying the order of things or these dependencies; it's picked up by the library. They don't have to worry about how you submit this job into a particular queuing system on a particular resource; that's done through a configuration option. So this allows people to write parallel programs in Python that are independent of where they run. The user just defines the functions that they want to run.

This is a project that was built based on a previous project called Swift, which is a workflow system and language that we had worked on for about ten or more years before. Parsl was initially funded by NSF, about three million dollars over three years, and we're currently stretching this into a fourth year. We have two and a half core developer FTEs, a PI, a bunch of co-PIs (I'm one of those), some chemistry and education application developers, and some undergraduate and graduate students that all work on this. We started this as an open-source project from the beginning, intended to be an open community project where people would contribute, including building a library of reusable workflows that would be shared from one user to another.

Some of the interesting things that happened along the way in this project (and again, this is kind of typical of infrastructure projects in general) are: when we had our first outside user, the first person that wasn't part of our team that wanted to use this; our first outside user who didn't contact us first, who we just found out was using our tool when they told us they liked it or they told
us that they had a problem or something else; the first outside contributor, the first person outside the project that contributed code; and the first outside contributor who didn't contact us first, where we just got a pull request from somebody that improved something without us expecting it.

We've had some success with purely external contributions to code like that, but we've had more success with collaborating projects. The LSST Dark Energy Science Collaboration (this is an astronomy project) and Professor Doug Thain's group at Notre Dame are examples of groups that have come to us and have been interested in both using and contributing to Parsl. In some cases they provide funding; in other cases they just provide collaboration.

We have a new follow-on project called funcX that is NSF funded, and we're going to use the four years of funding that NSF is giving us to develop funcX. But because funcX is going to use Parsl, we'll also use some of this funding to support Parsl, because we couldn't do funcX without it. So this follow-on project is part of the sustainability plan that we have.

If we look at the research software projects on the other side, these are often built by somebody that has some problem they want to solve, and they need to write some software in order to solve that problem. And once they've done that, they have options.
They can keep the software private, or they can share the software. If they share it, then they get into the same situation as the infrastructure software: they can accept contributions, and if they do, they might broaden the focus, bringing together other related packages, or broaden the governance and collaborate with other developers.

An example of this kind of research software is software that I worked on in my graduate studies, called finite-difference time-domain electromagnetics. My PhD was in building a new method for this, sorry, actually a boundary condition for this method, and adding ways to treat curved materials in a Cartesian cell-based code (which maybe means something to some) and how to model infinite space on a finite grid. Sorry, the details of this aren't very important. The main thing that's interesting here is that this was a code that my advisor and lab had, and I added features to this code. It was written originally in the 1970s or so; when I came in, it was the late 80s and early 90s. It was written in Fortran 77 originally, with no subroutines. It was just thousands of lines of code in one big main function. When I was working on it, I was involved in modifying it, doing some better software engineering, porting it to some other languages, supporting vector computing, and supporting parallel computing. But we never shared this code, and it was viewed as the lab's intellectual property. The algorithm itself was shared, and many other people redeveloped the code, and there are now a lot of different versions of this code, including open source and commercial versions. So this was an example of research software that was not shared but was published upon.

So if we think about this, then there are different stages that a project can go through, with these potentially as examples. One path is that you can turn a project into open community research software, where you start off with code and a user, and gradually you build up a team, and then you find other
teams and community users, and maybe you get to a self-governing developer community, and maybe you get to some kind of a foundation that supports everything at the end. You certainly don't have to go through all of these; you can just stop at some point. In some sense, the FDTD code that I talked about really ended up at stage one effectively, or somewhere between stage one and stage two, because that was the choice that my supervisor, the owner of the code, basically made.

Another path that you can go through is a commercialization path, which leads to a commercial product that raises money, and that money leads to support, and that support then keeps maintaining the code over time. This is also a very valid path.

When you're going through one of these processes, though, you have to change stages along the way, and at each point or stage you have to decide consciously that you want to go forward to the next stage. In order to do this, you have to think about the methods that you're going to need, the goals that you have, the consequences of making this choice, and what resources are available to help you when you get to the next stage. And also, what kind of work is going to be needed?
So when you're doing initial development of a product, a lot of the effort really is going into building the product, understanding it, and looking at different options, and very little is actually going into delivery. But once you get to something that's more mature, most of the effort actually goes into delivery and working with users and things like that, and very little effort is then in further work on the product itself: adding new features, looking at different options, or doing a complete redesign. Sometimes these things happen, but the majority of effort for a mature product is in the delivery.

And so, going through these different phases, you have to think about it. Are the right skills available to move? Do you have somebody whose expertise is in the delivery side, not in the software development side? What are the incentives for you to do this? Why would you want to do this, and how would you measure success? And finally, will your institution support this? Does this actually make sense in the place that you are?

So the reason that some of these things are important is something that Konrad Hinsen has designed,
sorry, defined as software collapse, and this is the idea that software stops working eventually if it's not actively maintained. The reason for this is that computational science software and data science software are built in a stack structure. At the bottom of the stack there is non-scientific infrastructure that's developed often by professional software developers: things like operating systems and compilers and support code for I/O and user interfaces and things like that. Above that is scientific infrastructure that's often developed by professional developers again, but with more research developers. These are things like libraries and utilities that are used for research across a lot of different disciplines. On top of this is discipline-specific software that may be developed by professional developers or may be developed by researchers who consider themselves researchers first. These are things like tools and libraries that implement the disciplinary models and methods. And then on top of that is the project-specific software that's developed by the researchers themselves: the software to do a computation using the building blocks that come from the lower levels, things like scripts or workflows or computational notebooks or special-purpose libraries and utilities. The problem is that software at any level builds on and depends on the software in all the levels below it, and any change that happens below the level that you're working in can cause your software to stop working.

So having said that, I like to think of software sustainability as the capacity of the software to endure, and this then can be further defined as: will the software continue to be available in the future, on new platforms, meeting new needs?

Software development and maintenance requires human effort, and human effort is kind of like dollars, but not exactly. We can have projects that are entirely human-effort based. These are community open source
projects where people come together without exchanging money because they want to work on something together; they want to contribute to it. We have projects that are all salary based, where all the people that are working on the project are paid to work on that project, and they're paid in some sense through the project. That's often the case for commercial software or grant-funded projects. It's very hard to combine these two. The effort is not exactly equal to dollars, and people are not purely rational. So if somebody sees a person working on a project who's getting paid for all their time, and they're being asked to volunteer, they may or may not actually want to volunteer. They may feel like they should also be paid.

So let me give you a couple of examples of some problems that we've had in open source software. One is in OpenSSL. In 1998, a UK group built a set of internet encryption tools called OpenSSL, and in 2011 they inadvertently introduced a bug called Heartbleed into their code. By 2014, two-thirds of websites around the world relied on OpenSSL, and OpenSSL had one full-time developer who was barely supported by a foundation called the OpenSSL Software Foundation. It was actually a private for-profit company. In 2014 this bug was discovered, and it was then fixed fairly quickly after that. And the OpenSSL Software Foundation requested donations to avoid things like this in the future. They said to the world: two-thirds of the websites in the world rely on this; can't you give us some money to help support this thing that you depend on? And they got nine thousand dollars, and they were very unhappy with this. And then they led a further campaign and had a lot of publicity, saying: this is a horrible situation. We really need more money. This is important.
You should give us more money. And that campaign led to support for four developers for three years, up to 2017. Today, OpenSSL and the OpenSSL Software Foundation have turned into an active project with 18 committers and financial and in-kind support from a bunch of different companies. So this is an example of a project that almost died but had enough, let's say, community pressure that led people to contribute, and it turned into a reasonably healthy project today.

A way of looking at this is in terms of a bus factor. The bus factor can be defined as (sorry, this is a little bit crude), for a software project, the number of people that work on that project that would have to be hit by a bus before the project would stop working. So if that number is one, then that project is dependent on one person. If that number is ten, then that project is dependent on ten people and is probably much healthier than the project that's dependent on one person.

There's an organization called NumFOCUS, which is an umbrella nonprofit that supports scientific software, including all these packages on the right, a number of which you've probably heard of and a number of which you've probably used as well. NumFOCUS has had a sustainability summit of its projects annually since 2017, and in that initial summit they did a bus factor survey where they asked each of these projects what their bus factor was: how many people did the project depend on?
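As a rough illustration of the bus factor idea, here is a minimal sketch that estimates it from commit history. This is only one crude heuristic and an assumption on my part: it counts the smallest number of top contributors who together account for more than half of all commits. It is not how the NumFOCUS survey worked (that survey simply asked the projects), and the function name `estimate_bus_factor`, the `threshold` parameter, and the author names are hypothetical.

```python
from collections import Counter


def estimate_bus_factor(commit_authors, threshold=0.5):
    """Crude bus factor estimate (a heuristic, not a standard definition).

    commit_authors: iterable with one author name per commit.
    Returns the smallest number of top contributors whose commits
    together make up more than `threshold` of all commits.
    """
    counts = Counter(commit_authors)
    total = sum(counts.values())
    if total == 0:
        return 0  # no commits, nothing to depend on

    covered = 0
    # Walk contributors from most to least active.
    for rank, (_author, n_commits) in enumerate(counts.most_common(), start=1):
        covered += n_commits
        if covered / total > threshold:
            return rank
    return len(counts)


# Hypothetical history: one person wrote 8 of the 10 commits.
history = ["alice"] * 8 + ["bob", "carol"]
print(estimate_bus_factor(history))
```

With an evenly shared history, such as one commit each from four authors, the same function returns a larger number, which matches the intuition in the talk that a project depending on more people is healthier. Commit counts are only a proxy; they say nothing about who reviews pull requests or holds release permissions.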
And they got a bimodal answer back: about half the projects depended on only one or two people, and the other half depended on four to six people. One of these projects, which I would guarantee a number of people on this call have used but which I don't want to name, is in the one-to-two category, and it's used by probably millions of people daily. At the time that I was writing this, it had support for two developers, each one day a week, from the companies they worked for. They had a huge backlog of issues and pull requests, and they didn't have time to go through all of them, which led to the contributors being frustrated because they couldn't get their contributions actually into the code. They were able to support students, but the students weren't actually able to be the core maintainers. And so this was a problem at that point.

In the wider open-source community, a study looked at popular projects on GitHub and found that two-thirds of these popular projects effectively had a bus factor of one or two. So this really is a problem: people do not contribute in a maintainer way to a lot of these projects.

So just to summarize where we are at this point: software that's developed and used for the purpose of research, to generate, process, and analyze results within the scholarly process, is research software.
It's increasingly essential within the research process, but it collapses if it's not maintained. Software bugs will inevitably be found, new features are needed, and new platforms arise that the software needs to be ported to. The software development and maintenance needed to do these things is human intensive. A lot of the software that's used is developed specifically for research and by researchers, and these researchers know their disciplines, but they don't often know software best practices. The researchers are often not rewarded for their software development and maintenance activities in academia. And then finally, the developer community does not match the diversity of the overall society or the user communities. So we have a number of problems that come up.

The things that we can do about those problems, I think, are best described by Max Planck, who wrote in German, but translating into English: a new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die and a new generation grows up that is familiar with it. The shorthand version of this is often described as: science advances one funeral at a time. And the version of this that I would like people to think about more often is that the culture of science advances one funeral at a time.

So there are problems here. The question is what we can do about these problems. One thing is we can just wait, right? We have senior people that are not as familiar with research software. We have younger people who are familiar with research software.
We wait for the senior people to, let's say charitably, retire, and then the younger people move up and they change the system. That's what Planck said. The other thing is we could try to act now, and if we wanted to act now, I wanted to talk about some of the things that we could try to do.

I've previously given a talk about 12 scientific software challenges. This includes incentives, citation and credit models, and metrics as one; career paths as another; training and education as a third; and then a bunch of other things as well. I don't really have time to go into all of these here. I do want to mention that these are all tied together, and I just want to talk about the first two of these briefly.

So first: incentives, citation and credit models, and metrics. If you think about how we work in software and want to get credit for that, one question that comes up is: what should we actually be measuring in order to give credit? If we think about a person who's the developer of an open-source physics simulation, there are a bunch of different metrics that we could consider, ranging from how many downloads of the software there are, to how many contributors there are to the software, to how many times the software is used, to how many papers cite it, to how many papers that cite it are themselves cited. The things at the top are the easiest to measure, and the things at the bottom are the hardest, but in some ways I would say the things at the bottom are the things that are the most valuable to measure. Primarily, the things related to impact, I think, are the things that we really care about in terms of giving credit.

If we think about somebody that's developing an open-source math library instead of an open-source physics simulation, the metrics are probably similar, but the citations are less likely. You may not be able to measure downloads for a variety of reasons, and the people that use the software maybe don't even know that that's the software they're using, because it's hidden
down a layer from the thing that they're calling directly.

So if we want to try to measure software impact, we have a few different choices. One that I like is citations, and because the citation system that we have was created for papers and books, we have to figure out a way to jam software into the citation system. Another thing we can do is to look at altmetrics, which are not citations but other structured means of discussion about software, things like tweets and blogs, for example. There's a project, Impactstory, that automates the measurement of some of these things; it measures the number of reads and citations and tweets of software and papers and other things. There's a project that used to exist, but isn't completely alive at this point and was more of a demonstration, called Depsy. It was really kind of like Impactstory specifically for software. So it measured software impact by looking at downloads and software reuse, from looking inside GitHub and looking for forks, for example, stars and things like that, and then also citations and tweets. And then there's also Libraries.io, which counts software dependencies, which again can be a useful metric to know how your software is being used.

I'll just look at citations briefly, because I think that's one that's kind of interesting. Today we have a problem that software and other resources currently appear in publications in very inconsistent ways. Howison and Bullard did a study in 2015, and they found that a random sample of 90 articles in the biology literature led to seven different types of software mentions, and none of these at that time were citations to the software as published software. There were three different kinds of citations, a couple of different in-text mentions, and one mention that's a little bit strange, which I don't want to go into for the minute, where there was a discussion about software but not even a name of the software was given in the work. When people have done studies on data citation and
facility citation, they found similar results.

So one way of addressing this is by writing a paper about your software. I'm one of the co-founders of something called the Journal of Open Source Software, or JOSS, that's trying to address that. JOSS is a developer-friendly journal for research software packages. We say that if you've already licensed your code and have good documentation, we expect it should take less than an hour to prepare and submit your paper. Everything about JOSS is open: the submitted and/or published paper, the code itself, the reviews, and the process are completely open. There's no blinding; everything is very collaborative in terms of the review process, and the code for the journal itself is also open. JOSS papers are archived, they have DOIs, and they are increasingly indexed. This takes time for a new journal, but we're working on it and getting there now.

The first paper was submitted in May of 2016. After about a year, we had accepted a few over a hundred papers. After about four years, we had about 900 accepted papers. Before COVID-19 struck, our current rate was about 400 papers a year. We went on a brief hiatus, and we've just within the last week started accepting papers again. In the first four days of accepting submissions,
we got 60 submissions, which is kind of a good sign in terms of demand, but kind of a bad sign for those of us that need to actually then review these and make this work. On that note, we started off with one editor-in-chief and 11 topic editors. We have moved at this point to one editor-in-chief, five associate editors-in-chief (of which I'm one), 31 topic editors, and 13 emeritus editors, and we're currently trying to bring on another 10 to 15 topic editors to deal with the growth that we've had. So this again is one example: write a paper, try to make this process as easy as possible, and then ask people to cite your paper when they use your software.

The other thing is to cite the software itself. I was one of the co-founders of a software citation group that started under an organization called FORCE11 in 2015. We merged with a different workshop group a few months later and ended up with about 55 members from a variety of different areas. We reviewed existing community practices about software and about citation, and developed a bunch of use cases for what we thought was important for software citation. And we drafted and then published a paper on software citation principles, by starting with the data citation principles that had come out about a year or two earlier and then updating those based on the software use cases, related work, a bunch of working discussions, feedback, and a workshop.
And so this paper was published in 2016, and it includes six principles: importance, credit and attribution, unique identification, persistence, accessibility, and specificity. And these are principles that, again, we believe apply to citing software itself, not citing a paper about software. Our group ended in 2017, and we then started a second group, a follow-up group, called the Software Citation Implementation Working Group, to take the principles and turn them into practice. Again, I was one of the co-chairs of this, and we brought on a couple of new co-chairs. We have been working with institutions and publishers and others since then, about three years now. There's a lot of good work that we've done, and we're, I think, coordinating a lot of ongoing activities. And so just to give some examples: in the DataCite schema version 4.1, the changes from 4.0 to 4.1 were brought in to address software citation. There are activities called CodeMeta and CITATION.cff, the Citation File Format, that are being aligned with schema.org, and these are all ideas about how people record the metadata about their software in a machine-readable way, the authors for example, so that somebody that cites the software can find out who the authors are and create a good citation for that. There is open-source archiving and identification work going on in a project called Software Heritage. There is, I would say, good work and initial acceptance of software citation in some communities, in particular astronomy and earth science and mathematics, and progress in other communities as well, but I would say these are some of the exemplars at this point. We published a software citation checklist for authors and a software citation checklist for developers. Authors means paper authors in this case, and developers are software developers. And we also have written most of a software citation checklist for reviewers, for people that are reviewing papers, for example. There's a repositories task force that's developing a set of good and best practices for registries and repositories in terms of how they can accept software and index it and provide identifiers for it. And earlier this year we started a journals task force to develop simplified guidance that we can give to editors and journals and publishers, that they can then share with their authors and bring into their processes. So again, I think there's a lot of good work going on here.
The second thing that I want to mention, then, the last of the two things, is career paths. So I would say the career path for software developers in universities is unclear at this point. The picture that's here on the right is a Google bus. I have a colleague who was at the University of California at San Diego, and he had a bunch of developers that worked for him in his lab. And these developers would be considered postdocs for a while, and then after some point they would decide they didn't want to be postdocs for the rest of their life, and they wanted some job that had more security and would help provide stability for themselves and their families. And they would often go to work for Google in Irvine, probably about 50 miles away. And Google saw enough of these people, enough of these situations, that they provided a bus so these people wouldn't have to leave their homes, but could go from their houses in San Diego to Google in Irvine and work along the way. So one question that comes up is: should we give up on software developers working in universities, maybe in favor of work in national labs and government research, or government-funded research, or in industry? And the reason that somebody might say yes to this is that there are more financial rewards in those organizations. They often would have cohorts, others with similar problems and solutions that they could work with. And they would have better career paths, better promotion opportunities.
I think the answer is really no. And the way that we can address this is by trying to make published software, or software papers, or software citations, something that can be used to make software a valued output similar to publications, and make software show up in a way that counts in universities. Another option is using university centers like NCSA, where I work, or the San Diego Supercomputer Center, or TACC in Texas, to give programmers a home and provide them a critical mass of people in a particular place. That allows that place to develop career paths and to work with the human resources organization to put those into place. And then a third option is that the Moore and Sloan Foundations created a data science program, and this created new structures across three universities: the University of Washington, the University of California, Berkeley, and New York University. And these new structures, which were different in these three different universities, are potentially models for other universities to support professional staff that work on data science or work on software development. And so it'll be interesting to see if these catch on. They are starting to, I think, in some US universities.
And then the last thing is research software engineers, or RSEs. And this is a term that you may or may not be familiar with, so let me talk about it a little bit. So research software engineers are not independent researchers.
They don't have a personal research agenda. They are people who are facilitative and supportive and collaborative, by nature or by training. They're part of the academic community, but they have professional IT skills. And they have a deep engagement with research groups. They have the ability to understand and study the research projects and to be part of the research group's activities overall. They can read and understand the papers that the group produces or the group works on. And they provide sustainability and some long-term support for software activities. They provide institutional memory: as the postdocs and the students move on out of the lab, the RSEs potentially provide the memory that lets the software continue and lets expertise in that software continue. And they provide continuity and stability and maintenance of the software activities as well. But in most institutions, they don't have a formal home in academia or a career path.
And so in 2012, at the SSI Collaborations Workshop, a group that was there recognized that this was a problem and that there were all these people, and they came up with a name, RSE, research software engineers, to identify them. And in 2012 University College London formed an RSE group, and this was, as far as I think anyone knows, the first RSE group in the world. More groups formed shortly after this in Manchester and Sheffield and Southampton and Cambridge. Again, you'll see these are all in the UK, because this is really where this started. In 2016 one of the UK research councils awarded RSE fellowships to people to help them move into RSE positions and to encourage universities to support these positions. And a bunch of RSE conferences started in 2016. These have grown over time. The one that happened last year was the first one where it needed to be called the UK RSE conference, as opposed to just the RSE conference, because other activities around the world were also starting up and getting some prominence. Last year the UK group formed the Society for Research Software Engineering. This is a professional organization that's independent, has a membership fee, has voting rights for members, has a steering board and officers and things like that, and the intent is that this really turns into a professional society for RSEs over time as it grows.
Other countries are at various stages of development in terms of their RSE activities. In particular I want to mention the US activity, which has the graph of members on the bottom right, where we've gotten to about 400 members as of last month, and we're kind of growing, I would say not exactly steadily month to month, but fairly steadily overall. There is an Australia and New Zealand RSE community, and there's a link to that on the slide, and I'll show you a couple of slides as well. In Germany and the Netherlands there were initial conferences in 2019. And we were planning an initial workshop in the US in April that we have postponed. Currently we're thinking about October, but we may end up postponing it again until next year.
So part of the question that comes up with people that are either RSEs or faculty or in any other situation is that we have to think about how they're promoted and evaluated. And so the guidelines that are given for promotion and evaluation, particularly in universities, are really important, because they say what's valued and they shape the activities that people undertake. And these promotion guidelines are written by the senior people most often, and so a question that comes up is how can they be changed. And so I just want to say to everybody that's here, whether you're senior or not, you can still influence these processes when you participate in these kinds of evaluations. If you're on a hiring committee, if you're writing a letter of recommendation for somebody, you can try to take into account their software activities. And we as a community can provide templates and guidelines for recognizing software contributions, and then we can encourage organizations to adopt them. And there's a bunch of different groups that are working in this space trying to do this.
Promotion and evaluation are not fixed. They can be changed. In 1994 the National Academies in the US created a report called Academic Careers for Experimental Computer Scientists and Engineers. This report said experimental artifacts are important in computer science and should be part of the evaluation of computer scientists. And this was intended to be a reference point for change. And this has been quoted in a bunch of different tenure recommendation letters, and I think this has made a difference in the computer science discipline. I don't know that there's really the equivalent of this in other disciplines, and so this is one of the problems. The National Science Foundation in 2013 changed its biosketch requirement, its requirement on what people provide when they submit grants, to ask for products rather than publications, so that people could talk about the software they've produced, or the data sets, or other things like that. And this was intended to be a signal to universities that universities should also consider these other products, not just publications.
So, some solutions that we can think about. One is to try to convince governments and funders of the importance of software and the importance of sustained funding of some of it, including the maintenance of that software, via organizations like the Research Software Alliance, ReSA. We can encourage the use of software citation by developers. You can join the FORCE11 Software Citation Implementation Working Group if you're interested in this. We can try to build better career paths for developers via RSE activities in the UK and the US or in Australia and New Zealand, and we can try to impact department and lab policies as well to support these paths. And then we can also develop and use software best practices, through something that I'm trying to push called Project Carpentry, where we develop a set of lessons to teach projects how to build projects that are effective, or through incubator activities that projects could go to. And finally, if you're really interested, you can join groups that are working on these things: SSI in the UK, URSSI in the US, NumFOCUS, another group like NumFOCUS called Code for Science and Society, at a global level again ReSA, or other things.
So I hope this leaves you with some idea that there are things that we can do, and that the situation is getting better over time. I'll just leave up the slide thanking a bunch of different people that have been significant in helping me think about this or funding parts of my work. And we should probably go at this point to any questions that there might be. So thanks, everybody, for listening, and I'm very happy to talk about this either now or if anyone wants to email me or tweet to me or anything else later. Thank you.
Great. Thank you very much, Dan. Very insightful, and we certainly have some comments about how valuable all this material is. And so in answer to the question that was asked: as I said, the slides will be made available, and this session is also being recorded, and I'll be distributing that recording once we've uploaded it to YouTube. Okay, onto the questions.
So the first question that came in is asking whether you have any information on how many software projects survive the first two years after initial funding has dried up.
So I don't have an answer to that. I can say that James Howison has been doing a study to look at how projects that have been grant funded have been able to move to becoming community projects that are sustainable through community activities. And the partial results that I've seen from him had something like 85 projects that he had looked at that were funded by a particular NSF program, and somewhere between one and two of them had moved into being successful community projects. That doesn't mean that the others dried up, because, with Parsl as an example, there are often follow-on projects that continue to fund and support that technology. So the fact that the grant has ended doesn't mean that all the funding has ended. So that's the reason I don't really have a very good answer to that.
Okay, thank you. Next question. And so this is partially commentary about the bus factor and how the bus factor is a bit ambiguous. So for example, a bus factor of 10 could mean that (a) any one of the 10 is substitutable for any other, and thus all 10 need to get hit by a bus, or 10 buses, or (b) there are 10 separate points of failure. You're using it more in sense (a), in that the 10 contributors are interchangeable, but this particular person suspects that (b) is more realistic. What are your thoughts?
Yeah, so I think it's probably somewhere in between. So in Parsl, just to give an example, our initial PI, who was the person that led the work in developing the software and getting the funding for the software, left to form a startup company, and we brought in a new PI. And so the fact that there were still four other co-PIs let us bring somebody new into the project and get them up to speed, even though this new person actually hadn't worked on the project, because none of the other four of us actually wanted to be the main PI. So I think that if you have enough people, the loss of any one person in a good project should be something that you can recover from by bringing somebody else in new and training them based on the knowledge of the other people. But I also agree with the person who asked the question that there certainly is some aspect of that part (b) as well.
Okay, great. Thank you. Okay, next question: if you work deeper in the stack, so in the infrastructure, would you expect it's more difficult to get citations? This tends to happen with data producers.
Okay, so the answer is yes, without any question. The head of NCSA, Bill Gropp, is a developer of MPICH, which is one of the main MPI packages, and he and I were just having this discussion, actually in the last couple of days, about how things like an MPI library get credit. And there really is not a good answer in terms of thinking about citations. The answers that exist are trying to put in some kind of tracking of the software, or doing some kind of surveys to understand how the software is being used by infrastructure providers or maybe by users, but the users maybe don't even know that that's the software that's underneath. So there's a lot of things that we can do. None of them are perfect. A lot of them have privacy issues that are hard to work around. So the further down the stack you go, and the less the direct user can tell you that they're using your software, the harder this is.
Yeah, and that actually leads quite nicely into the next question, which was around measuring usage. So the question is: do you think there is any kind of appetite for some kind of centralization of usage monitoring? For example, calling a web service each time the software is invoked by a given user or IP.
Yeah, so it's funny: in order to try to cut this down to something that was going to be a shorter talk than the normal hour talk that I give on this, I took out a slide that has exactly that suggestion. So one trick that people do sometimes is, if they have software that they're developing, they put into the makefile a curl command that just does something to a website that they own, and then they can count how many times that website is hit, which tells them how many times the software is being built. And you can do the same thing with running the software, though again the problem with these things is privacy, and whether users agree to do them. And so in Parsl we initially started with opt-out user tracking, where we did collect data from all runs, and we made this very public in our documentation that we were doing this. And we got some very negative feedback, and we switched to opt-in user tracking. And now we probably get about a third of the amount of information we used to get, but it's probably healthier. So I think the problem is that doing this centrally is something developers want but users don't want, and it's very hard to convince users that this is something that they need to do in order for the developers to be able to make a case to get funding to continue their work.
Okay, great. One last question, which is probably all we have time for, regarding software citation. Can you comment on a good way to capture the fact that software may be actively worked on for many years after the initial software publication that is cited? So some organizations are looking for publications in the last five years when evaluating for promotion, and so do not capture the ongoing development and maintenance work.
Yeah, so I think this is a really good question, and it's certainly a challenge. There's a few different answers to this depending on how the software is being cited. So one is that in terms of JOSS, we're very happy to have repeated publications of the same work when major changes have been made. And so this could be that every major release gets a new publication. If you're talking about software that you're citing directly, and you have software that's on GitHub and you've connected it to Zenodo, so Zenodo is archiving tagged releases, you can be asking people to cite the release that they're actually using, or in terms of Software Heritage, they could cite the commit hash of the software that they're using. And so you then would get citations to more recent things depending on what people are actually using. And there's advantages to doing these things, because if you think of software that was written some long time ago, the initial developers are probably not the current developers, and so people that come on to the project later don't actually get any credit, which can be a problem. The issue on the other side is that the more you split the citations over different objects, the harder it is to count them. And so that's something that I think we need to work on as an indexing community. That's not something that we should make the software developers have to deal with individually.
Okay, great. No more questions have come in, so I think we can leave it there. So I'd like to thank you once again, Dan, for staying up late to deliver this webinar to us, and delivering at a time that is friendly for all parts of Australia as well as New Zealand. And so as I said, we'll be distributing a link to the recording and the slides once the recording is available. Please have a nice day.
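Editorial illustration of the counting problem raised in the last answer: when citations are split across a paper, several archived releases, and individual commits, someone has to roll them back up to the project. The following is only a minimal sketch of that roll-up, not how any real index works. Every identifier, count, and the `project_of` mapping is hypothetical and made up for the example; a real indexer would have to derive that mapping from metadata it trusts.

```python
from collections import defaultdict

# Hypothetical citation counts per citable object. These identifiers are
# invented for illustration; they are not real DOIs or Software Heritage IDs.
citations_per_object = {
    "paper:example-tool-2016": 40,      # the original software paper
    "release:example-tool-v1.0": 5,     # an archived tagged release
    "release:example-tool-v2.0": 12,    # a later archived release
    "commit:example-tool-abc123": 3,    # a citation to a specific commit
    "release:other-tool-v0.9": 7,
}

# Hypothetical mapping from each citable object to the project it belongs to.
project_of = {
    "paper:example-tool-2016": "example-tool",
    "release:example-tool-v1.0": "example-tool",
    "release:example-tool-v2.0": "example-tool",
    "commit:example-tool-abc123": "example-tool",
    "release:other-tool-v0.9": "other-tool",
}


def total_citations_by_project(counts, project_of):
    """Sum citation counts over every object that maps to the same project."""
    totals = defaultdict(int)
    for obj, n in counts.items():
        # Objects with no known project are grouped under "unknown"
        # rather than silently dropped.
        totals[project_of.get(obj, "unknown")] += n
    return dict(totals)


totals = total_citations_by_project(citations_per_object, project_of)
print(totals)  # {'example-tool': 60, 'other-tool': 7}
```

In this made-up data, no single object for example-tool has more than 40 citations, while the project as a whole has 60, which is the gap the speaker suggests the indexing community, rather than individual developers, should close.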