Good afternoon everyone and welcome to the BioExcel webinar series. My name is Rossen Apostolov and I will be today's host. This webinar series is brought to you by BioExcel, which is the leading centre of excellence for computational biomolecular research in Europe. In this series, we feature notable scientists in the field, developers of popular software applications and various tools. We also invite guests from different initiatives or organizations whose work is very relevant to our field, which is in fact the case with one of our guests today, Steve Crouch, from the Software Sustainability Institute. It's my pleasure to welcome Steve Crouch, who leads the research software group at the Software Sustainability Institute, SSI, where he has been working for many years. He assists researchers and their communities by consulting on software development in general and on best practices. He manages the Institute's open call and develops best practice guides. He's also heavily involved in teaching at the Software Carpentry workshops, which you are probably very familiar with. He has very extensive experience in this area, and it's my great pleasure to have him present today about how to make software sustainable. So welcome, Steve. I will now give you presenter mode. Okay. Right. Okay. We don't see your slides yet. Let's see if it does it now. Yes, that's good. Great. Well, thank you very much for the nice introduction there, Rossen. And hello, everyone. It's nice to meet you all virtually, so to speak. So what are we going to talk about today? Well, software sustainability, and why and how to do it. For a long time, software has contributed to, indeed been a crucial element in, many research discoveries. There's been an increasing recognition that research data needs to be made available and sustained for the longer term, and more recently, software has been gaining attention in this area. But why should we sustain software for the longer term? How important is it and what can we do to sustain it? So here's an outline of what I'll cover today. We'll start by taking a look at the importance of research software. We'll then look at two things we can do to directly help its sustainability. Firstly, we'll look at what we can do to plan for sustainable software in its development; we'll take a brief look at software management plans and other types of plans. We'll also look at some key principles and practices we can employ to make our software sustainable. We'll then look at the evaluation of software as both a skill and a tool to improve research software, talking a bit about the Software Sustainability Institute, which is based in the UK, and how our services employ evaluation to help researchers. And finally, we'll look towards the future and how evaluation can be conducted at scale with the aid of automation. So I'll start by talking about the state of research software. You may well have an opinion on this. I think you'll agree that research software is important, but how important? In 2014, the SSI set out to get the numbers via a survey. The results came back from a random sample of 417 researchers from 15 Russell Group universities in the UK, across a range of disciplines, seniorities and genders. And here are the headline findings. Whilst unsurprisingly almost all researchers, about 92%, use software, a majority, 56%, actually develop their own.
For 70% of researchers, the overwhelming majority, their research would be fundamentally impossible to perform without research software, which is a key finding. Another finding of interest is that almost half of the respondents who expected to write software in their research wouldn't actually put it in a funding bid. It's possible, indeed even in the European space, that you may have experienced this same problem. It can be surprisingly hard to actually get people to pay for the infrastructure and software development that's needed for research. If it's a problem you've experienced, it would be very interesting to hear the strategies that the BioExcel community have used to deal with it. Of course, these results are from people who chose to answer the survey. Is there a sample bias here, where those who use software are more likely to answer a survey on it? And on that note, any findings really need replication. So the Institute set out to check whether the self-reported survey results matched up with the situation on the ground. 38 different universities in the UK mandate that papers written by their staff are published in an institutional repository. And if you search the abstracts of these papers in those repositories for software-related terms, things like HPC, computational and software, you'll find almost two thirds of them actually mention software. So broadly speaking, that's a good match for the numbers in our survey, suggesting they're fairly solid. And you'll notice that over time these numbers are increasing more or less exponentially. We can also look into the funding situation for research. We can search the abstracts of successful proposals to the research councils for software-related terms. Interestingly, this result is slightly lower, with around a third of bids mentioning software. But there is a positive trend upwards, and an increasingly large amount of funding is going to explicitly software-related research. Of course, the trend in the plot may simply be down to improved acknowledgement by researchers who have always been dependent on software. But either way, this puts a lower bound of almost one billion pounds on funding spent on software-related research. And so the software you write really is important. Traditionally in academia, software has been seen as a necessary but throwaway artifact in research, at least historically, but this of course is changing. The most important message really is to protect the investment, which is the software. Software inherently contains value. It contains value in producing results, and it contains lessons learned that you wouldn't necessarily want to relearn if you re-implemented the thing again. And the effort taken to develop it is important as well. But it can be difficult to gauge to what extent it might be used or useful in the future, and also by whom. It could be that the whole software could be reused, or perhaps just a part of it. In time, a part such as a key algorithm may actually turn out to be more valuable than the whole software itself. And importantly, software contains value for reproducing results in publications, which you should always strive for. In terms of follow-on projects, keeping software and its lessons learned can prove invaluable. It stops you reinventing the wheel, and it also means that you actually have solutions to hand for problems you've already solved. But importantly, you may also find you've created a solution to a problem that causes others big headaches, and they can reuse it.
And since software is an important asset in research, we need to make sure that we develop it well. This helps us and others to reuse it in the future. And as responsible researchers and software writers, we need to ensure that what we develop gives us the correct results. But if we don't do things well, what could go wrong? There's an example involving a well-regarded researcher, Geoffrey Chang. Very highly regarded, he won a number of prizes for his work, and he published five highly cited papers on multidrug resistance. But unfortunately, others found they couldn't reproduce his results, and this led to the retraction of those five papers. The faulty results were caused by a bug in an internal software utility that flipped two columns of data. Now, this is obviously the kind of situation you ideally want to avoid. But it's interesting that he's quoted as saying, "I didn't question it then. Obviously now I check it all the time." A little bit of paranoia about the correctness of our software and its operation can go a long way. We don't want to be too paranoid, but just enough so that we're able to think: is what we're doing the right thing? Is it generating the right results? And importantly, if software cannot be trusted to generate the correct results, that's going to affect its popularity and sustainability as a product going forward. If a product isn't being used, or its user base is declining, that's going to affect its sustainability. Also notice that it was an internal software utility that created this issue, and that software utility was not made available. If we're able to share software, making it accessible gives collaborators and others the opportunity to give us feedback, or perhaps even fixes. And of course, whilst that's an inconvenient setback, it's better than a retraction, and it could even save us time in the long run. So what about planning for sustainability? What can we do beforehand? The problem with having open source as a goal and releasing your code is that in itself this doesn't actually say anything about how the software will be managed, developed and sustained, only how it will be distributed. And there's an absolutely brilliant report on this, the Mozilla Open Source Archetypes report, which is well worth a read. I thoroughly recommend you go and have a look. It's beyond the scope of this talk, unfortunately, but it presents a set of common open source models for providing software. And although it was developed initially to be relevant to Mozilla, the archetypes are also applicable to many other open source projects. So take a look if you have a moment. So when we write proposals, we often think about the goals of the project and work out a plan for how the research will meet those goals. But since software is such an important asset, shouldn't we also be planning for its reuse in the longer term? What we'll do is take a look at what we can do to plan for sustainability. We'll take a brief look at data, but we'll focus a little more on software. So the first of the types of plans we can make use of are data management plans. Now, these have been around in some shape or form since about 1966, when they originated, generally in aeronautics.
Now, data management plans are usually quite short, at about two pages, and they answer questions such as: the nature of the research project, the purposes of data collection, the nature of the data itself in terms of its volume, format and ownership, the procedures and standards of data collection, references to institutional and other relevant data management policies, things like that. These are currently more prevalent than other types of plans, have been around for longer, and have been strongly encouraged or required in the UK for grant proposals to UK research councils. Secondly, more recently we have software management plans. These have similar aims to data management plans, but with a recognition of the differences between data and software. And so their guidance and the questions they ask are slightly different, but essentially they look at what software will be created, how, and the plans for making it available in the longer term. We'll have a look at these in a moment. And then lastly, we have output management plans. These take a combined view of all assets created by a project. So this takes into account concerns addressed by data management plans and software management plans, as well as concerns for other assets, such as materials created by, perhaps, a biological project: antibodies, cell lines and reagents. But let's have a look at SMPs and OMPs in a little more detail. The idea of managing software outcomes is not really new. The US Department of Defense had the Defense Systems Software Management Plan way back in 1976. This covered software acquisition, development and maintenance. So this sort of thing has been around for a while. And similarly, NASA had their Advanced Composition Explorer Software Management Plan in 1994, which included aspects like contracts, quality assurance of the software and the procedures being created, and also the responsibilities and scope of the project. So, ways to frame what's actually being created and how it will be maintained and made available. An SMP really is a statement of intent for how you'll manage your research software and make it available. And note the use of the word intent here. Plans, when first written, are likely to evolve. So the idea that an SMP is a living document that evolves with the project and how it manages its software outputs is one we'll come back to. An SMP should be revisited, for example, every three to six months or annually, with each revision creating a new version. So you're iterating the management plan as things change within your project. The project lead or project manager is ultimately responsible for taking ownership of this and making sure that what the plan says should be done is actually being done. However, in a research project, as I'm sure many of you can relate to, very often other things take priority: collaborations, papers, proposals, meetings, et cetera. And as an output, historically software hasn't been seen as an equal to things like publications and, even more recently, things like data. So why should we write one? Well, an SMP helps you to understand how you can support those who wish to use, or do use, your research software. It also helps you to work out how your software relates to other artifacts in your research ecosystem, the links between these things and how to manage them. And also how you will ensure that your software remains available beyond the lifetime of your current project.
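To make that concrete, here is a minimal sketch of what an SMP skeleton might look like if you captured it as a structured, versionable document. The section headings and questions below are illustrative assumptions based on the themes of this talk, not the SSI's official template.

```python
# A minimal, hypothetical software management plan (SMP) skeleton.
# The sections and prompt questions are illustrative only.

SMP_SECTIONS = {
    "Purpose": "What does the software do, and what research does it support?",
    "Outputs": "What software will be produced (scripts, library, service)?",
    "Availability": "How and where will it be made available (repository, archive, licence)?",
    "Support": "Who will maintain it, and how will users get help?",
    "Sustainability": "How will it remain usable beyond the project's lifetime?",
}

def render_smp(answers: dict) -> str:
    """Render an SMP document from {section: answer} as plain text."""
    lines = ["Software Management Plan", "=" * 24, ""]
    for section, question in SMP_SECTIONS.items():
        lines.append(f"{section}: {question}")
        # Unanswered sections stay visible as TODOs to revisit each revision.
        lines.append(f"  {answers.get(section, 'TODO: to be decided and revisited')}")
        lines.append("")
    return "\n".join(lines)

print(render_smp({"Purpose": "Trajectory analysis tooling for MD simulations."}))
```

Keeping the plan as a data structure like this makes it easy to version alongside the code and to re-render it as the project, and the plan, evolve.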
In essence, it's really forcing you to think about these things before you start. Typically, we've found in research projects, when we're working with collaborators at the Institute, that these things don't seem to be given the right level of attention a lot of the time, at least not early enough. Typically what we find is that people tend to think about these things at the end of the project. But you get the real value when you start thinking about them early. The other reason is that they're also gaining traction with funders, at least in the UK. For example, within the Engineering and Physical Sciences Research Council, they were required for the High-End Computing (HEC) consortia call in 2017, and also for another call, the Computational Science and Engineering Software for the Future II call. So at the least, it shows you've considered how software assets will be managed and made reusable beyond the project. We're also seeing that institutions are providing guidance on software management plans and how to fill them in; for example, in the UK, Bristol, UCL and York currently do this. So it's best to write them at the start of a project, so you can consider all these aspects before you start and how you'll implement them within the project. But it's also valuable in providing a way to draw together and summarize the aspects of research software management that have already been decided. It can give a state of play of how you're currently managing things. When someone asks, you always have an answer. But it also gives you a foundation to improve. So yes, this can reveal additional aspects, gaps or options that weren't considered, or weren't even applicable when the project began, but have become more relevant as the project's gone on. So whilst an SMP may sound complicated, it's really just about committing to a plan that helps you sustain the software and, importantly, thinking about that plan. It forces you to consider aspects that everyone should think about before writing software. So here's the set of core questions: what the software does, how it relates to other software, how you'll make it available, how you'll support it and how it contributes to the research you're doing. If you go to our website, we actually have a set of guides that help you develop a software management plan, and also a very in-depth checklist of things, which goes into far more detail than what we're covering here. But essentially, they're just a series of decisions that need to be made and recorded. Critically, it's really about reuse. Within the SSI, on many occasions we've observed research software being addressed very late in a project's lifecycle, as we've said. Now, this isn't ideal, since many decisions that could influence how the software could be sustained and managed have already been made. It's often much harder to do things later on, but there's still a possibility of change there, and doing this at any stage, really, is better than not at all. Now, those are software management plans, but another type of plan, developed more recently, is of particular relevance to bioinformatics, I think. An SSI fellow, Laurent Gatto, is an open science advocate and group leader at the de Duve Institute in Belgium, and he presented a talk at the Research Data Management Forum in 2017.
He discussed the differences and commonalities between DMPs and SMPs and the proliferation of plans in general: data management plans, software management plans, all kinds of plans, materials plans. And there was an interesting discussion there with Kevin Ashley from the Digital Curation Centre on the need for an output management plan, which would simply cover all outputs from a project. As I said, this could be things like data, software, research materials. And so there are also these output management plans, which are gaining traction and are required in some circumstances by the Wellcome Trust for grant applications. Essentially, an OMP is required when significant research outputs are expected. These could be things like a database resource, data itself, a software tool or materials, and include any of these where there may be uses for research beyond their original purpose. And then, notably, it also means that other things that could be generated, such as significant IP, can also be captured and brought under management. Essentially, if there's a clear potential for reuse, again, this is forcing you to think about your outputs, how they'll be provided, and their usefulness to others. So, we've looked at how we can plan for sustainability with various types of plans. That's very helpful at the level of project organization. But what should we do when we're actually writing the software itself to make it more sustainable? What we'll do is look at some of the principles and practices that we can employ to do that. Fundamentally, one of the main things the SSI has learned is not to expect too much, particularly all at once. There's a lot of straightforward, good enough practice out there. There's a couple of papers I've linked to here, the Greg Wilson co-authored papers, which give some great suggestions for development practices and approaches. But the thing we've realized is that really every community is different: different levels of expertise, different needs, and also different priorities. The key to the SSI's approach, and the approach we advocate, is to try and change the underlying culture of software development, to make best practice something people in the field just do naturally, because it makes their lives easier, rather than treating it as a list of seemingly arbitrary demands imposed from the outside. A key realization is that it's important to convey the motivations for a practice as well as how you actually do the practice. If you can sell it as an idea, then it becomes part of the culture. And once you have a few in place, you can build on that step by step, introducing new practices as you go. But the idea, at least initially, is just to pick the ones that will help you the most. Another thing that's gaining traction is FAIR approaches to data and software. The FAIR Guiding Principles for scientific data management and stewardship provide recommendations on how to make research data findable, accessible, interoperable, and reusable, or FAIR for short. The idea of FAIR as a set of principles for managing data has been around since 2016, but in more recent years there have been attempts to apply FAIR principles to software. So what is FAIR? Here's an interpretation of the FAIR principles, generalized from how they are applied to data. First of all, the thing we're talking about, which could be data or software, should be findable.
It should be easy to locate, using a unique and persistent identifier, and there should also exist descriptive metadata to enable its discovery. It should also be accessible: the thing should be easily retrieved by machines and humans using standard protocols, with the metadata archived long term. So anything describing what a piece of software is should be available long term, even if the software itself becomes unavailable. Interoperable: things like software and data should be available in formats that can be exchanged, interpreted, and combined, which includes the metadata of the thing we're describing. And of course, it should be reusable: there should be metadata that enables reuse in future research, with a clear licence and the provenance of all the assets and where they come from, and those assets and metadata should follow community standards. So how are these principles applied to data? For many of them, we need to apply them to the metadata as well as the data itself. But here's a quick summary. In terms of findable data, it needs a unique and persistent identifier, such as a DOI. This is one of the most important things we can do to start with; it allows the other FAIR principles to be applied. And its metadata should describe how the data set was generated; who collected, edited, and published it; and the data's quality and other characteristics. For accessibility, it needs to be attainable using a standard, open, free, and appropriately secure protocol. This could be HTTP or HTTPS, or a manual one, such as requesting access to the data by email. Its metadata needs to be available long after the data itself is no longer available, to ensure that use of the data can still be traced, for example by author, institution, or publication. For interoperability, the metadata needs to be described in a suitable machine-readable and usable language, examples of which could be RDF, OWL, or JSON-LD, or whatever is appropriate, using a common ontology, and it should reference other metadata where appropriate. And importantly, the ontologies themselves should also have FAIR principles applied to them, too. For reusability, the metadata should be as rich and as detailed as possible, covering things like scope, data creation, the type of data and the version of the data, for example, and the data should be released under a clear and accessible usage licence, ideally an open one like Creative Commons. Now, this is how it applies to data, but it also turns out these principles are applicable and desirable within the context of software. So how would we do that? We can look at how each of these different categories can be applied to software. Let's look at findable and accessible. In terms of making software findable, it can easily be assigned persistent and unique identifiers for projects and versions. For example, we could deposit source code or eventual assets in a repository like Zenodo to get a DOI. There are also many metadata formats for describing software. One of these is CodeMeta, a community standard and crosswalk, which provides an explicit map between the metadata fields used by a broad range of software repositories, registries, and archives. There are also APIs that already exist for searching and retrieving software from code repositories, digital repositories, and packaging and library archives, and some of the metadata is actually archived long term in some of them as well. Software is increasingly interoperable and reusable.
We see an increasing use of open community standard formats for integration with other software. There's also greater use of platform and plug-in architectures, and there exist common libraries and package managers. In terms of community-developed guidance, well, that's out there too, and standards exist to improve reuse there as well. And there are also documentation, licensing and other formats we can use for software development. So much of this already exists for software. What we can do now is take a look at an interpretation of FAIR applied to software. What does it actually look like? We'll have a look at that in terms of a checklist we can use as a guide to providing software in a FAIR way. I've also included links to the corresponding data principles here, as a reference for those who are interested. To start with, we write a software management plan. This kind of stands on its own, but in general it's useful for reusability, and it can also help to inform and provide information for a lot of the other things in the checklist, particularly the metadata. We should choose a license and make it clear; if we don't apply a license, by default no one else has the rights to use it. We should use version control and follow good commit practice. So, using a system like Git, hosted somewhere like Bitbucket or perhaps an institutional repository, and committing our changes to that repository frequently, in small, related blocks. This is one aspect that can help with reproducibility. We should also publish, and update, metadata about the software, allowing others to find and reuse it. And we should use open community standard file formats. So where possible, we should explore the established data file formats being used in our domains, our communities, our projects, or even our local lab, and use these. For example, there's the Text Encoding Initiative for literary texts in the arts and humanities, and Darwin Core and the Ecological Metadata Language in the life sciences. We should also reuse packages and libraries rather than re-implement them where we can help it. This obviously promotes the reusability and interoperability of our software. And, very importantly, we should provide documentation. This really is a critical one that can't be stated enough; it's something that greatly aids reuse and discovery. We should provide documentation for others to install, use and modify the software, and also comment our code to aid its understandability, so others can work with the code and extend and maintain it themselves. We should also create releases, assign them identifiers, distribute them via a repository, and try to deposit them, with metadata, into an archival repository where we can. These allow our software to be findable and accessible. And we should also publish a citation for our software, which others can use in their publications, again making it easier to find and reuse. So what do people find difficult? The problem really here is the metadata. It's much easier to write software than to describe it for machines and humans. It's hard. It requires a different mindset, really, than writing software, and often more types of people need to be involved to do it. So we should make sure that we allow enough time and effort in our projects for this to be achieved. Now here's where an SMP can also be useful: the detail you include in it when describing the software can help inform this descriptive metadata you need to write.
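As a concrete illustration of that descriptive metadata, here's a minimal sketch of generating a codemeta.json file. The @context URL and the field names are standard CodeMeta/schema.org terms; the project name, repository URL and author are invented placeholders.

```python
# Write a minimal codemeta.json for a hypothetical project.
import json

metadata = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "trajtools",                                   # hypothetical project
    "description": "Utilities for analysing MD trajectories.",
    "version": "1.2.0",
    "license": "https://spdx.org/licenses/BSD-3-Clause",
    "codeRepository": "https://github.com/example/trajtools",  # placeholder URL
    "author": [
        {"@type": "Person", "givenName": "Ada", "familyName": "Lovelace"}
    ],
}

with open("codemeta.json", "w") as f:
    json.dump(metadata, f, indent=2)
```

A file like this, kept at the root of the repository and updated with each release, is the sort of thing registries and archives can harvest to make the software findable.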
So this can provide us with a base level of practices that we can pick and choose from to make our software FAIR. But again, as I've said, you can't do everything. The idea here is that you should prioritize the ones that will be most useful for your project, implement them over time, and then revisit and add more as you continue. So we'll talk a bit now about the evaluation of software. When writing software, it's tempting to reach for what you already know: a library or a piece of software you're already familiar with, or even the first thing you find that seems to fit the bill. I've been a victim of this many times, and certainly with the pressure coming from the research side of things, like publishing papers and getting funding, it's easy to give the software we develop not enough time, really, to do things as we would want to. But it's important to keep a critical mind when making software decisions. And critical evaluation of software, either of the software we write ourselves or the software we reach for and use, is a useful skill to learn. We'll start by taking a look at the Software Sustainability Institute and how it aims to help researchers, and then we'll look at how the SSI offers software evaluation as part of the consultancy services it delivers. So the SSI really has one core goal: better software, better research. It started in 2010 delivering software development consultancy to researchers, but it's evolved over time. There's been increasing recognition that it's researchers, teams and communities, and not just software, that reach boundaries in their development that need to be overcome. These boundaries can prevent improvement, growth and adoption of research software, and the SSI helps groups to reach the next stage in their development. There are five main activities it pursues towards the better software, better research goal. One of the first things it does is consultancy work, offering researchers expertise and support, which not only helps them but also helps us keep in touch with the challenges they're facing on the ground. Another thing we provide is training, offering courses to try and improve the software engineering skills of researchers. As Rossen mentioned earlier, these are things such as Software Carpentry, Data Carpentry, and so on. Its community work involves reaching out to communities like yours to gather intelligence on their needs, or working to help other communities get together. We also perform policy work, lobbying stakeholders to try and make changes to the software environment in academia. And finally, our outreach role involves making people aware of our work. The SSI is a small organization, with only about eight staff, compared with some 210,000 academics in the UK. It's much more efficient, really, for us to encourage communities to take up sustainable software practices themselves than to push them from the outside. So let's take a look at what the software team does in terms of evaluating software. Our software evaluation page, which details the two approaches we take to software evaluation, is actually the second most popular page on our website, with over 55,000 hits. The first approach we use is a tutorial-based evaluation. This is an experience-based assessment, a very in-depth evaluation. The aspects we evaluate are refined by the collaborating project: we get them to work with us to determine what it is they want us to look at.
They may have particular concerns regarding a specific aspect, or some aspects we typically look at may not apply, depending on the maturity of the project. We take a number of roles: a developer, a user, or a local project member looking to use the software for the first time. For users, we typically look at user documentation, installing the prerequisites and the software, and using it for basic tasks, as well as how it's supported, issue tracking and its activity, for example, and the expectation of technical support. For developers, we typically look at the ease of setting up a development environment, technical documentation, example code and tests, and the codebase itself: things such as readability, modularity, how the software's files are structured, and things like licensing. We also look at options for contributing to the project, for example contribution policies, and options for getting technical help as a developer. And we look at the software development process itself and determine how that can be improved, and what practices they can employ. The outcome here is a pretty comprehensive technical report of experiences, with a set of observations and recommendations for improvement. Now, the second approach is slightly different. This is a higher-level, criteria-based evaluation, which was derived and modified from an ISO standard. This makes a more high-level assessment and evaluates whether the software meets explicit criteria. There are about 180 criteria in total, split over topics such as usability, sustainability, and maintainability, and it's very useful as a checklist to guide development. It asks questions such as: is the source code repository available online for others to access? Now, those are the two ways we do software evaluation, but they have a very important secondary effect: they have an important role to play with software or output management plans. Software management plans are an intention of what you plan to do; software evaluation is really about what you're actually doing. And as well as refining the project's software and related assets, software evaluation can thus help with delivering your SMP and adjusting it where necessary. So you can see where you are, and then have that information feed back into, or even initiate, your software management plan. This can be done through mechanisms such as assessing code quality, usability, and the overall sustainability of your code base. In essence, it provides quality assurance, in a sense, for the process of enacting the management plan. So it becomes a feedback loop. The idea here is that you continually evolve your management plan, assess where you are, and are able to maintain this virtuous cycle of software management plan improvement and better project assets as you go forward. So, in a little more detail, here's the list of criteria topics. It's focused on three aspects: usability, and sustainability and maintainability. Those last two are grouped together since they're actually quite closely tied conceptually. For usability, we're looking at user tasks like understanding the software, learning new tasks, documentation, and actually using the software to accomplish basic tasks. For sustainability and maintainability, we're concerned with aspects like ownership in terms of identity, copyright and licensing, project governance and how the project deals with its user and developer community, but also technical aspects like how testable, portable and interoperable the software is.
And, importantly for future growth and change, there's also analysability, the capability to understand the implementation, which you require before the next one, changeability, the ease of modification of the software, and also evolvability, which is evidence that the software is actively being developed and will continue to be developed and supported. All of these are critical for sustainability. So what we'll do is look at one of these in a little more detail. Let's take a look at documentation. We can see a list of some of the criteria for assessing the software's documentation here. Essentially what we have is a checklist of items that you would expect to see, such as different sections of the documentation for different types of users. There's also further information, examples of how to do things, and troubleshooting if there are issues. So this is also useful for projects doing this themselves: it can provide a set of things they aren't currently doing for them to consider. Of course, some of these may not apply to them, but it gives them a broad net of things to think about doing, at a high level. And if they're doing this early enough, it's something they can use to change their project going forward in a meaningful way. And importantly, we tend to use this set of criteria as a reference checklist when doing tutorial-based evaluations. So software assessment typically forms the core of what we offer. What we'll do now is look at how we provide consultancy as a service, and how assessment actually forms part of these services. Evaluation enables us to provide recommendations for improvement, but it also allows us to familiarize ourselves with the software before providing any in-depth guidance or development. The first mechanism we have for this is the open call for projects, which started in 2012. The idea is that researchers submit a very short proposal, really, about the problems they're facing and how the Institute can help. And it's free effort: we work with the project on a plan of work to assess and provide guidance on the challenges. We tend to do a couple of these a year. We review applications based on our ability to address the problems and the potential impact our work could have on their user and developer community. And out of this, they get an in-depth, tutorial-based assessment and report, which they can use to understand the experience of approaching the software and trying to use it and develop it for the first time. The other thing we also tend to do sometimes is include some development, to illustrate how to employ a particular tool or practice. A common one here, which we've done many times, is introducing a test suite for the software. We might show them an example of a test suite framework with some initial tests, which they can then build on. This activity tends to be quite variable in terms of effort, from a few days to a couple of months, depending on the project and how we can help. Another thing we do is provide consultancy funded directly from projects themselves. This typically includes some form of assessment guidance, but often with a much stronger emphasis on actual development. That could be something like helping them to refactor their code to be more maintainable; it's very dependent on the project. We've also done a number of free, lightweight engagements with projects as well.
These are usually only a few days of effort, and are usually centred around assessment and showing the project the next steps in how to evolve what they're doing and improve the software. More recently, at the Research Software Engineering Conference in September, which was held in the UK, we launched the Research Software Health Check, which is another free software service. Again, a lightweight proposal of work is made; it takes them a few minutes to fill in a form. We assess submissions, in this case, every couple of months. And this takes into account a lot of the lessons we've learned from working with over about 70 projects. This service is more lightweight, at around one to two days of effort, and here we're leveraging our experience to identify key issues more rapidly and help put projects on the right track. We've noticed there are common patterns that tend to occur in projects, common stumbling blocks, common obstacles, and we've become a lot better at identifying these much more quickly. So often we can look at something for just a couple of days and give some meaningful recommendations on the next steps. And you'll notice software evaluation here is a very fundamental part of what we do. Now, an ongoing issue is how to do this at scale. Services like the Research Software Health Check can help us to evaluate more scalably. They're more efficient to do, so potentially we can do more of them. But it's still limited by a relatively expensive manual process. And the question then becomes: how can we extend the scalability of delivering our expertise even further? We want to help researchers get better at improving their software. We want to help them identify issues, and guide them to the practices they need to address them. But the problem, of course, is that manual evaluation, whilst potentially a very comprehensive activity, is quite expensive in terms of effort. It does not scale particularly well. Now, as I've said, we've worked with a lot of projects to date, and there are patterns of problems, and of guidance, that have emerged. Typically, when we've identified a problem, the guidance tends to be more or less the same. So if they haven't used version control before, the guidance is: you should use version control, and here are some guides and the reasons why you should do that. Looking at our previous open call projects, we've observed the areas of help that researchers require the most. The four most common issues we've found, for example, are with software documentation, the development process, getting software to confidently calculate the right answer, and maintaining software. Secondly, within each of these, the actual guidance we give in reports is largely the same. So there's an opportunity here: if we can quickly identify the challenges being faced by a project, we can lead them to the guidance they need to address those challenges. And so here we have an opportunity and potential for automation. In response to that, we developed something called the online software evaluation service, which automates the criteria-based assessment. It's been in operation since 2012 and has been used over 370 times. It's based on the criteria from the criteria-based evaluation, and the idea is that the user answers questions about their software and gets a report emailed to them based on their responses.
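To make the mechanics of that kind of automated, criteria-based assessment concrete, here's a toy sketch of an answer-to-guidance mapping. The criteria names, guidance text and link are invented for illustration; the real SSI service is far more extensive.

```python
# A toy sketch of mapping criteria answers to canned guidance.
# Criteria, advice text and the link are illustrative placeholders.

GUIDANCE = {
    "uses_version_control": (
        "Put the code under version control.",
        "https://software.ac.uk/resources/guides",  # SSI guides landing page
    ),
    "has_user_documentation": (
        "Add user documentation covering installation and basic tasks.",
        "https://software.ac.uk/resources/guides",
    ),
    "has_automated_tests": (
        "Introduce a small automated test suite and grow it over time.",
        "https://software.ac.uk/resources/guides",
    ),
}

def build_report(answers: dict) -> str:
    """Given {criterion: bool} answers, list guidance for each unmet criterion."""
    lines = []
    for criterion, met in answers.items():
        if not met and criterion in GUIDANCE:
            advice, link = GUIDANCE[criterion]
            lines.append(f"- {advice} (see {link})")
    return "\n".join(lines) or "No recommendations: all assessed criteria met."

print(build_report({"uses_version_control": True,
                    "has_user_documentation": False,
                    "has_automated_tests": False}))
```

The appeal of this design is exactly the scalability described here: once the mapping is written, generating a report costs essentially nothing per submission.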
The guidance has been taken from the reports we've done previously with other projects, and from other sources we've typically supplied. The guidance we supply as part of the online software evaluation service includes what the recommendation is, along with the benefits of doing it, plus links to helpful guidance and other material. It's been very useful, actually, as a means to get an idea of the state of a project's software before they collaborate with us. So, as it turns out, we've been using this as a prerequisite for submitting to the open call, which means that as proposals for work come in, we can understand the things they're actually asking for in context, knowing the state of their software, because they've filled in this questionnaire. However, it does have a few disadvantages. It's a large set of separate questions, and the problem is that, as currently implemented, you have to answer them all. It's very comprehensive; you get a very large report. But that's not ideal if we're just looking for what to do next. It can give a whole suite of things that you could do, but it's not really giving any guidance on what you should actually do, in terms of prioritization. However, it does have some big advantages, particularly in terms of scalability. It required some initial development effort, but not very much, and after that, very little effort was required to maintain it. So we can just leave the service to run, it's always available, and we update its questions and topics periodically. Ultimately, we can just leave it and it works on its own. So that's our online software evaluation service, but that leads us to an evolution of it: the online guidance service. The idea behind that is a lightweight, web-based service that we're going to have hosted on the SSI website. Instead of a monolithic set of 70 questions, we're going to split these into separate topics and revise and expand them. The key thing here is that we're going to make the topics and the questions easily customizable, so we can maintain them easily going forward, and there's the opportunity here to ask others to contribute topics into the system from outside. Importantly, it's not just going to assess the state of the software, but also assess individual skills. This gives us the potential to give guidance on training courses and other resources to improve them. The thinking behind that is that if we're able to assess what an individual is currently able to do in terms of their skill set, then we can lead them to that training. And the other important aspect of this is that, through this site, users are going to be guided to the topics that are most useful. So instead of being presented with a whole spread of different topics and things they could potentially look into and get assessed and get guidance on, we're going to assess where they currently are and what it is they most need help with, and guide them to those topics. We've been working with UK RSE group leaders and SSI fellows to help inform the service and refine the topics and questions to date. But we're also excited to be working with BioExcel, who are exploring automated assessment to provide a quality mark for BioExcel software. So we're aiming here to explore how we can develop a generic structure and criteria for general assessment.
And we're aiming to launch the online guidance service itself early next year, which leads us to the summary. So here are some take-homes for you, really, in essence. Software is a critical asset for research, and we should be managing and protecting it as such. SMPs and OMPs can help you plan for sustainability at the outset, and we should do these, but we should also ensure that we revise them regularly. We can pick the best practices that help us most to deliver sustainable software and grow them over time. We don't have to do everything at once; we can pick the ones that are going to help us the most to start with. Also, evaluation is a useful skill and activity for improving your software and the plans you have for sustaining it. And lastly, automation provides an always-available, efficient mechanism for delivering lightweight, self-guided assessment to guide improvement at scale. There's a couple of things I wanted to mention in closing. If you're looking for an evaluation of your software, you may want to consider the Institute's Research Software Health Check. Our next round of reviewing applications is early December. You'll find a link there which you can use to go to our site, where you can submit a proposal via an online form. And you've got until Friday the 29th of November, next week, to get something in. It's a very short process, actually, for submitting a proposal, so I'd hope it wouldn't take more than about five or ten minutes. The other thing I wanted to mention is the Institute's Collaborations Workshop, which is taking place at Queen's University Belfast, from the 31st of March to the 1st of April next year. This brings together researchers, developers, managers, funders, publishers and trainers to explore best practices and the future of research software. The themes of the workshop next year are open research, data privacy and software sustainability. Our Collaborations Workshops follow an unconference philosophy, so the idea is that attendees get to choose which proposed discussions happen. We also have a hack day on the last day, where attendees can propose ideas for doing something and try to gain support for those ideas; if they have enough, they can take them forward on the hack day. And successful hack projects need not all be code: they can include any combination of software, specifications, guidance and other materials, anything really that people need help with. That enables people from all backgrounds to participate. And I hope to see you there. Okay, thank you very much for listening. I hope that was useful. On to questions, I guess. Thank you, Steve. This was a great presentation. So I would encourage our attendees: you can use the questions tab in the control panel to write your questions. Please let us know whether you've used some of the tools or have looked at the criteria that Steve was talking about, and to what extent software sustainability is important in your project or your organization. It would be interesting to hear. Yes, Steve, it's really great, all the work that SSI has done so far, and indeed also in BioExcel we've identified the long-term availability and sustainability of software as very important, and we're looking forward to developing further your work on the criteria and the quality marks. Yes, that's particularly exciting. I'm really looking forward to that. I think there's great potential there.
We can reuse those kinds of resources in many different ways: for what we're talking about in terms of automation, but also for manual evaluation. The other thing we've recently been thinking about is evaluation as a trainable skill. So, perhaps having a training course or something around evaluation. Improving our ability to pick and choose the right software for our projects, software that's itself sustainable, helps the sustainability of our projects, but it also builds self-criticality about the things we're producing. So we become self-critical of the work we're doing, and that helps lead us to improve the software in a continuing fashion, you know, adopting new practices and so on. There could be potential there. Yes, yes. And as more and more publishing houses and journals start to require that when you submit research you should provide access to source code or metadata, et cetera, as this becomes the norm, your work will become even more important for the development of software projects. Yeah, so in the UK we're seeing increased traction with things like software management plans and the need to be able to produce software that can be demonstrably reusable after a project ends as well, yes. So we like to think we're sort of ahead of the game sometimes, but it'd be nice to see if that turns out to be the case. Yeah, and from interactions with various user communities, we've heard several times that they worry about, for example, commercial software that the company suddenly no longer has the resources to maintain, and then the users don't have access to it, and they're critically dependent on this piece of software. So sustainability of critical codes is a major issue in the long term. Yes, indeed, I totally agree. We have a question from Anthony. Let's see. Anthony, can you say something? Oh, hi, I thought I was just going to type it out. Yes, my question really refers to the actual best practices of the design of software, to make future changes to that software far more sustainable, especially with respect to the research realm, where you may have a higher turnover of staff and students, and when, say, for example, a PhD student leaves, perhaps a lot of that software gets shelved or becomes a bit archaic and can't be updated. I come from a commercial background, and we spent a significant amount of time on the actual design of the software, in comparison to actually putting fingers to the keyboard. And I was wondering whether any of these kinds of best practices from industry were beginning to filter down into the research realm. That's an incredibly good question. So, when you were in industry, did you do things like technical handovers, by any chance? Oh, yes, a tremendous amount of that. And a lot of it came out through a massive back end of Unified Modelling Language, down through metadata and documentation, and version control through software like DOORS. So everything was there, and we had a lot with respect to process management, and we were unable to actually progress to the next development phase until certain goals had been reached. Yes, so one thing we actually advocate very strongly here, at least with technical handover, is actually doing one. So at least getting the software written to the point where, essentially, when you're writing the software, you're always assuming that someone is going to want to reuse it.
That might be, perhaps, to reproduce what's already been done. So a number of things we tend to advocate there are things like code reviews. You're probably familiar with the concept of code inspections. Oh, yeah, it's quite terrifying when you first start doing one. Very much. Yes, yes. But it's one of those things that is really, really good to do. And there's a lot of research, actually, and it's been known for quite a long time now, that the first hour of code review is the most useful, and that can be with just one other person. You can do it as a group, and that's going to be very valuable indeed, but a quick fix is just getting someone else, who may or may not be familiar with the software, to sit by you while you're going through the code, and explain what the code's doing. And that could be just for a critical part of the software; it doesn't have to be for the whole thing. Typically when you're looking at a piece of software, it's not all of it; it tends to be a very small part of it that's actually the most critical. And you can walk through how that's operating with someone else. Just by explaining it, that can be a really useful way to help improve software, its design and how it's commented, making it explainable and understandable. Just an hour with someone else can be very useful. The other thing we advocate very strongly for is technical handover as a process. The idea is that you're keeping the software in a state where it can be handed over very quickly to someone else. One way of doing that is documentation, of course. It's one of those things that developers don't tend to like to do; documentation is not very exciting. But it is one of those things that really helps others to understand; it's one of the most important things. Every time I talk to RSE group leaders in the UK, they're often saying documentation is one of the most important things. Documentation can include things like technical documentation on how it's held together, the design of the software, as you say, a very important aspect of that, the architecture, how it all fits together. And someone else looking at that can then get a head start on understanding how to maintain and develop the software going forward. So yes, technical handover can be very important, and having that as a process. The other thing that we've started to do here in the RSE software group at Southampton is something called knowledge transfer, and that's been a very useful thing for us to do. There's a concept called the bus factor, which is how many people would have to be run over by a bus, or leave on a bus for another job, in order for your software to basically be in a state where it couldn't be sustained. And in certain cases you identify that the bus factor is actually very low, which is very bad. It means that no one else really would have enough information to be able to take the software forward; or at least it's too low, really, like two or three or something. So what we've done is we've instituted a procedure where we actually have people talking about the software, not in a code review setting, but in a more informal setting, talking about how the software's put together.
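As an aside, a rough way to put a number on that bus factor idea is to look at how concentrated a repository's commit history is. Here's a minimal sketch, assuming a local Git repository; commit counts are only a crude proxy for knowledge, so treat the result as a conversation starter rather than a measurement.

```python
# A rough, illustrative "bus factor" estimate: the smallest number of
# authors who together account for at least half of a repository's commits.
import subprocess
from collections import Counter

def bus_factor(repo_path: str = ".", threshold: float = 0.5) -> int:
    # List the author name of every commit, one per line.
    log = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%an"],
        capture_output=True, text=True, check=True,
    )
    counts = Counter(log.stdout.splitlines())
    total = sum(counts.values())
    covered, factor = 0, 0
    for _, n in counts.most_common():   # most prolific authors first
        covered += n
        factor += 1
        if covered >= threshold * total:
            break
    return factor

print(bus_factor("."))
```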
Those discussions then lead to questions about how it's put together, and we capture that discussion, which then feeds through to actual improvements in the software's documentation, and sometimes in the software itself. And that's something that groups can do as well: you just get people talking about it. Yes. It's effectively a sort of informal code review, but it also covers other things, like the usability of the software, the architecture, and how it's all put together in a broad sense. And the key question there is: what does someone else need to know to be able to start developing on the software? That's what the person in charge of the software can try to answer. And that feeds back into all this sort of improvement. Yeah. Sorry, that's a very long answer to your question. That's all right. Thank you. Okay. My pleasure. Any other questions? We don't have others in the control panel. Well, we're also getting close to the hour. Well, Steve, you've already put a few links in the presentation slides so we can get in touch with you and the SSI. Oh, here we have a question from Michelle. Let's see if we have audio. Michelle, can you? Yeah, hello. I had a very simple question. I was wondering whether those slides were going to be made available eventually? Yeah, I think they will. Yeah. We will share the slides. Yes. And also a recording of this session. Okay, that's great. Thank you very much. Goodbye. Thank you. Okay. So that will be all for today. Thanks again, Steve, for the presentation. We look forward to collaborating with the SSI on making software sustainable and spreading best practices among the communities. Me too. Very much looking forward to it. Thanks a lot for the opportunity, Rossen, and thanks for the invite. Thank you. Bye, everyone.