For the past 10 years, I've worked, collaborated, and done research with both the private and public sector on how they can use, consume, and collaborate around open source. One question I commonly come across on both sides is: can we use this? Is it safe enough? Can we trust that it's maintained? Is it sustainable? So for the past year, and for one year ahead, we have a research project where we're working primarily with Swedish industry, including Scania, to develop a health assessment framework enabling you and your organizations to evaluate the open source components that you use, or that you're considering in your intake process, as a way to be more secure and address the risks of taking in open source that has a low level of health or sustainability. This can inform a sourcing decision: should you choose something else, or develop it yourself? Or, if it's something you depend on, maybe you actually need to go in, engage, and contribute back to the project you're depending on. So this is a way of identifying these projects and analyzing them on a more qualitative level: is this a project you should engage in, and how should you consider it in your intake process? What I'm presenting today is a first draft of this framework, where we've identified 107 characteristics across 15 different themes. Open source health is a very multifaceted topic: it has so many streams, so many perspectives, so many nuances, so it's very difficult to get a complete picture. This was our first step. I'm going to give you some context, describing what I consider open source health to be.
You're all welcome to engage. I'll give you an overview of the different perspectives we found, the different nuances of health, and also an example of applying it in practice, as I did when I supported a large international organization in Sweden, using an already existing set of metrics, the CHAOSS framework, which I will come back to. Okay, so first off: what is open source health? There are many, many definitions here. I consider it in line with sustainability definitions: a product's capability to stay viable and maintained over time without interruption or weakening, so that there is high-quality maintenance throughout. You can also look at it from a more ecosystem perspective and consider its productivity, robustness, and openness. Productivity meaning how actively it is developed; robustness, or rather stability, meaning how concentrated the development is to one or a few individuals or companies; and openness, how open it is for you and your organization, or you as individuals, to come in, contribute, and influence the direction in which it's heading. And I don't think this is anything new to people in the room, but open source is everywhere. It's part of our digital infrastructure, and just like physical infrastructure, roads and bridges, it needs maintenance to stay viable and secure. Hence the infrastructure analogy: we need to care for its maintenance. And I think this strip from XKCD, which is widely shared on Twitter, captures it well: all of modern digital infrastructure really relies on these small open source components, often maintained by one or a few individuals, quite often still in their spare time, even though some of these maintainers have started to develop business models around being employed to do the work. This one refers to the curl project, maintained by Daniel Stenberg in Sweden.
But a lot of people I talk to in both the public and private sector, at least in Sweden, still think this is not an issue, that it's cared for by others. Well, sure: given enough eyeballs, all bugs are shallow. But that really requires that enough eyeballs actually reach the source code, that enough people actually engage and help with the maintenance, at least to the level that it's sustained. So we have free riding. I don't know if I want to call it an issue, because it's also good: open source is there to use, and the more people who use it, the more popular it gets, and eventually the more eyes on it there will be. So I wouldn't say that free riding is an issue per se; it has good and bad aspects. So here comes the question: where does the free-rider problem apply? Here we can talk about the tragedy of the commons, connected to the free-rider problem, as exemplified by Hardin in 1968 with an open pasture where each rational farmer keeps adding more animals, maximizing their own benefit, which eventually leads to overgrazing and lost opportunity for other farmers to benefit from the pasture. This may be considered what Ostrom, the famous political economist, calls a common-pool resource: basically a resource system. Imagine the open pasture as the resource system, with the grass being the resource units. It's non-exclusive, in that it's difficult, or at least costly, to exclude others from it, and it's subtractable: the more you utilize it, if it's not maintained or allowed to replenish itself, you will eventually deplete and destroy it. I've seen a lot of analogies between common-pool resources and open source software. I'd like to offer a third framing here: it's actually the brain time, the maintenance labor and effort, that is the common-pool resource. That is the resource system.
The resource units are the time that maintainers have available, or that the community puts into the maintenance. And maintainers are humans, not robots. They can easily shift their interest, but they can also burn out; we've seen a lot of research on that. Or their family or working conditions change, or their employer refactors the code base, adopts a new product strategy, or changes its business model, and all of a sudden the maintenance is gone. So what happened? What are you going to do now? So how can you and your organizations, in your use case, in your intake process, find these cracks and bumps before they appear? How can we identify these open source projects and put in the effort to help with the maintenance and raise their health? And how can we enable you to make a more informed decision about what software you are using, help your engineering departments and organizations make better choices, and support a better risk management process? We say: by considering the health of open source in your intake process. That allows you to consider this, at least on a rough level, in the intake process of an organization. But I've also seen it at the acquisition level, especially in the public sector, where in the procurement process there is a need to be very thorough in order to compare open alternatives against each other, but also against the proprietary options. And this is quite common, at least in the Swedish context: the public sector falls short in choosing open options because IT security or cyber security comes in and says it's not secure, we can't trust open source, we have to go with what they're using in the municipality next door. So, again, what we did: we found 146 studies and 107 characteristics, 107 things that could characterize the health of an open source product. That's quite a lot.
And I don't think we can consider everything when we're thinking of taking an npm library into our development. These characteristics are divided among 15 themes, so a slightly higher level, but we still need some kind of prioritization of what to look for, and this is very contextual to your organization. In the application I did with the CHAOSS metrics, which I will explain in a moment, we interviewed and worked with the company to find out what their risks are and really boil it down. So I'm not saying you have to consider all 107 characteristics; considering the health of an open source product is basically painting a picture and then being able to interpret it from your perspective: what do you consider a risk, and what risks are you willing to take? The slides are available on the schedule platform. There is supplemental material with all of the data: all 107 characteristics, referenced down to the different papers in the literature, and we also listed the metrics connected to the different characteristics. I'm not explaining that here, and I didn't bring it up in the paper, but it is there for you to dive into if you want. If there is any paper you need access to, just contact me and I can help you, because not everything is open access. So, the framework structure. I talked about this assessment framework trying to organize all of these aspects. One dimension is the level of abstraction: at the upper level, the network or software ecosystem that the focal open source product is part of, for example the npm package ecosystem, the OpenStack ecosystem, or your dependency structure on GitHub; and then characteristics relating more to the focal project itself.
You also have the socio-technical dimension: characteristics relating to the human and community side, the people side of development, versus the more technical and product-related characteristics, like the software development process, the quality of the source code, and what kinds of documentation are available. And then there are the process- and governance-related characteristics, in the orchestration theme. Don't be afraid, I'm not going to describe all 107 characteristics, but just to give you a feel for it: one of the themes is communication. How productive is an open source project in planning and discussing its evolution, across different mediums? Here we saw response time: how quickly do the maintainers and the community respond to questions, be it on a pull request, an issue, or in chat? The quality of these responses, the social activity, and also the visibility on Twitter and other platforms. One quite recent study highlighted how being active on Twitter can actually have a positive, though slight, effect on growing your contributor base. Culture is a bit softer, not easily measured quantitatively; it requires more of a qualitative approach to characterize. How able is the community to facilitate an open and inclusive collaboration? Are there conflicts? How severe are they? How are they managed? Are discussions blown out of proportion, or is there some kind of sanity in them? Sentiment and toxicity is quite a large topic; a lot of different algorithms are now being developed to characterize the sentiment and toxicity of these discussions from different perspectives. And openness to input: do new contributions and newcomers get recognition, or are they shut down?
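The response-time characteristic mentioned above is one of the easier ones to compute once you have issue timestamps. A minimal sketch in Python, assuming issue data has already been exported (the field names `created_at` and `first_response_at` are illustrative, e.g. mapped from whatever issue-tracker API you use):

```python
from datetime import datetime
from statistics import median

def median_first_response_hours(issues):
    """Median hours from issue creation to the first reply.

    `issues` is a list of dicts with ISO-8601 'created_at' and
    'first_response_at' timestamps (field names are illustrative);
    issues without a response yet are skipped.
    """
    deltas = []
    for issue in issues:
        if not issue.get("first_response_at"):
            continue
        created = datetime.fromisoformat(issue["created_at"])
        responded = datetime.fromisoformat(issue["first_response_at"])
        deltas.append((responded - created).total_seconds() / 3600)
    return median(deltas) if deltas else None

issues = [
    {"created_at": "2024-01-01T10:00:00", "first_response_at": "2024-01-01T12:00:00"},
    {"created_at": "2024-01-02T09:00:00", "first_response_at": "2024-01-03T09:00:00"},
    {"created_at": "2024-01-05T08:00:00", "first_response_at": None},  # unanswered
]
print(median_first_response_hours(issues))  # 2h and 24h -> 13.0
```

The median is deliberately used instead of the mean so that one ancient unanswered-then-answered issue doesn't dominate the picture.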
Diversity is another soft topic: how able is the community to accommodate and attract a diverse set of actors? But there is also a technical perspective here, because we can talk about the diversity of applications that implement or use the open source project, the demographic diversity of the people inside the community and those using it, and the organizational diversity: are many organizations using it, are there a lot of vendors, or mostly pure consumers? And how stable are they, if we talk about the financial situation: how stable is the community? Does the maintainer have a business model? Is the maintainer employed in some way to work on it, or to provide support services? Popularity is a bit easier to measure quantitatively in some ways; competing projects were one aspect here: what alternatives exist, and how popular are they? Stability is a somewhat larger theme: how capable is the project of preserving a critical population? A lot here is about retention: what's the turnover of contributors in the project, how long do they stay, what's the level of drive-by contributors? And also, what stage of the life cycle is the project in? Is it entering dormancy, or is it growing? These are characteristics I would really lift up when considering how you interpret the different health aspects, because I think you need to look quite differently at a project in its growth phase compared to a project in its stable or dormancy phase. And also technical activity, for example distinguishing between the activity of the actual maintainers and the overall community.
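The contributor-concentration side of stability can likewise be scripted. A minimal sketch, assuming commit author identities have already been extracted (e.g. via `git log --format='%ae'`); the function name and threshold-free output are my own illustration, not a metric defined by the framework:

```python
from collections import Counter

def top_author_share(commit_authors):
    """Fraction of commits made by the single most active author.

    A high share signals development concentrated in one person,
    one of the stability and retention signals discussed above.
    """
    if not commit_authors:
        return 0.0
    counts = Counter(commit_authors)
    return max(counts.values()) / len(commit_authors)

# In practice the author list could come from:  git log --format='%ae'
authors = ["ana", "ana", "ana", "ben", "ana", "cho", "ana", "ben"]
print(round(top_author_share(authors), 2))  # 5 of 8 commits -> 0.62
```

The same counting approach extends to organizational diversity by mapping author emails to employer domains before counting.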
From a more technical perspective, there is the development process: is there a contribution process, how well detailed is it, and is it well applied, or are there still a lot of questions about how to make a contribution? Are there good sample, newbie, or starter issues? Is there an onboarding process, and how good is it? Quality assurance, and so on. Documentation has more qualitative perspectives: how complete is it, how complex is it, and how up to date is it? But also, what kinds of documentation are there: is there a requirements doc, a testing strategy, a roadmap, user documentation, and what's the level of the more technical developer documentation? General characteristics are the different technical features that affect the popularity and health of the project, as we found in the literature. It can be the application domain, or where in the stack it sits: if it's middleware, maybe it's not that visible to people, they don't even know they're using it, while in the front end people are more aware. And what kind of technology is it: JavaScript or C can attract different amounts of people, depending on the community. There are licensing aspects, more from the company perspective. Scaffolding is about the infrastructure: how accessible and open is it, how quickly can you set up the build environment, how easy is it? Security, both in terms of the general security practices applied and how long it takes from when a vulnerability or CVE is discovered until it's fixed and released. And then the more technical quality perspective, which is more about non-functional requirements like maintainability, but also about the source code itself: a lot of papers highlight the importance of the source code not being too complicated, i.e. its quality. Finally, the more governance- and orchestration-related characteristics: how mature and open the orchestration of the project is, to enable an open and inclusive collaboration, both at the project community level and at the larger level, if it's part of a foundation or some kind of
larger ecosystem. So, there are some limitations here, of course. We did not do a systematic review, but we did find 146 papers and we did reach saturation in the types of characteristics we found; we could have continued, but we chose to stop when things were repeating themselves. We saw limited coverage of characteristics on the network level, but that has to do with our search strategy and where we started; we could have continued in that direction, but it would have blown the study out of proportion. And did we define health correctly? Again, there are multiple definitions, but we had a process where we tried to be consistent in how we applied the definitions. Just to give you some highlights of initiatives that are quite mature and that I'd like to point out: the CHAOSS project, most prominently, which I think started in 2017 or 2018 at the Linux Foundation member summit. I wasn't in that room, unfortunately, but a lot of developers, practitioners, researchers, and community people have developed metrics in working groups relating, for example, to risk, value, evolution, inclusivity, diversity, ecosystem, and so on. How we differ is that we considered only the research literature we could identify. And just as with the CHAOSS project, what we have currently developed is a smorgasbord of different metrics, where you have to pick and choose the ones most suitable for your context. Our aim in the coming cycles of research, where we're going to collaborate closely with Scania and a select set of companies, is to make this more applicable: to provide cases saying, in the automotive context and for these types of projects, these are the metrics to apply. We also have a close collaboration with a software analytics company in Sweden, where we're trying to see which parts of these characteristics can be automated. Another important initiative is the Open Source Security
Foundation, which some of you may know well; they have a lot of best-practice checklists and so on. SustainOSS is a really inclusive, good community I'd like to highlight if you're interested in these sustainability and health aspects; there are quite good discussions there. So, how can you apply this in practice? This was a lot of theory, but can you actually apply it? To get some input into this: I worked part-time as an OSPO strategist at one company, and what we did there can basically be considered a pre-trial to this research project. We tried to develop a process based on the CHAOSS metrics. The process was owned by the enterprise architects, and our main objective was to lower the risk level of the open source considered in the intake process. The goals: have a decentralized, self-managed process; enable but not overburden the developers; keep it simple; and enable follow-up and actionable insights, so we can do more strategic work on what products we're focusing on and using. We developed a questionnaire based on the CHAOSS metrics, on our main concerns and risks, and on the types of open source products that were used, which we found out through group discussions and interviews. And this is the approach I would recommend: you as representatives of your organization, or your open source program offices, whatever champions or functions you have, really go out and talk to your organization. Get to know it: what types of products are you using, where are you using them, who is using them, and so on. We observed developers applying this questionnaire, and we went down from two hours, which was really extreme, but that person was very thorough, to 10-15 minutes, which is about the maximum a developer has time to spend evaluating. I mean, they make quick decisions; we make quick decisions in this sense. We can't
spend too much time here, but we still need to find some level where we can raise awareness and be a bit more thorough than we are today. The outcome was that it was considered to raise awareness and decrease the overall risk in the intake process. So I would say the main goal here is to increase awareness and provide developers and engineers with a framework they can use, together with their experience, to interpret and consider the open source products they are using, or to take the help of peers who may have more interest or experience in using open source. Because a lot of developers have their experience and the things they look for; some people look for other things; new people maybe don't know what to look for. This is a way to help them zero in, or find a common language for what to look for. Things to consider: again, interview and map out your concerns; consider the types of projects used; keep it lightweight: a checklist, simple yes/no answers, simple categories, but still trying to capture these different aspects. It must be easy to find the process and the data to answer the questions, so really provide directions: type this git command to get this number to answer this question, or spend only five minutes in the issue tracker to get a feel for how quickly people respond. Really try to frame it. Automate where possible: if you can script things, build an application, and especially integrate it into your existing pipeline, that would be good, because today we use very large amounts of open source. To do this in a more mature fashion, I would really recommend some kind of flagging function, where you take a set of the characteristics or metrics you identify and automate them, so you get red or yellow flags on projects, and for those flagged projects you can then go in and have a look at a more qualitative, closer
level, where you can use the checklist that you are otherwise using in the intake process. And again, there's no one model or number that describes the level of health or sustainability of an open source project. The 107 characteristics and 15 themes are different pencils that allow you to create your own picture of how you interpret the health, and then you need guidance in coming up with a conclusion out of the painting, just like when you're walking around a museum trying to work out what the artist meant. The different characteristics can help guide you in creating this picture. I won't go into this in detail, but we also did a pre-trial at the Swedish Employment Service, where I helped them in an acquisition process evaluating two different open source based e-archival solutions, again highlighting the importance of considering open source health. Sometimes you need to be a bit rough around the edges and do it quickly to address the risks at a suitable level, but in the procurement process, from the public perspective, you really need to be thorough and able to motivate your choices. So even here, the health perspective has a potential use case to fill. In the future we will iterate on this, again with companies, trying to make it more precise and detailed for certain use cases and certain kinds of open source, trying to create reference points that you and your organizations can tie into to get a better feel for what to look for. We're also going to go deeper on the metrics level, doing observations of what more experienced engineers look for when we explain these characteristics, and see how that compares to what scientists have found. And, as I said, we have a close collaboration with software analytics companies, so we're going to see how much of this we can automate. [Audience question: does this enable comparison between open and closed alternatives in acquisition?] Yes. On the closed side you can't see these details; you don't know what you're
walking into. [Audience comment:] But you can be walking into a situation that's far worse than you might imagine. I mean, we've had an example of that in the past that just blew up, and everybody had to find some way to go somewhere else. [Speaker:] So this doesn't give you a direct comparison; it gives you a feeling for the open source side of things. But again, it builds up: what level of trust can you put in the community, what capabilities do you have yourself to use it, or are there vendors, more common in the public context, that you can contract, who can provide the same service-level agreements and quality standards that you would buy from a proprietary vendor? The plus with the open alternatives is that you actually can go in and see the quality, how they comment on things, the dependency tree, how active the development is, and so on, which you can't with the closed ones. You can demand that your agency's developers be able to review the code, but not everyone is fond of doing that. So it's really about enabling a comparison. Okay, but how can you trust the open source? Because the open alternatives, as I said, are quite often dismissed purely on the preconception that they are unsafe. [Audience:] So it sounds like this is about building faith in them, while faith in the closed-source alternative has always just been assumed. This is a way to prove one side, while the other side just stays where it is. [Speaker:] Yes, but on the closed side you have a contractual arrangement, where you can sue or put someone on the stand if something goes wrong. With the open alternatives there is no one by default that you can blame or put on the stand; then you have to pay someone. So this is an important part of this evaluation process. Thanks. Any other questions? [Audience question about automation.] I think a lot of it can be automated, but there are a lot of soft issues or topics, like I mentioned in the
beginning, like diversity and culture. You can of course define your dependent variables here, or independent variables, but then it's a matter of how much you trust these data points. There is research into all of these different streams, for example toxicity and sentiment, which is a fast-growing topic in research; there are developments of different models for identifying toxicity and different kinds of toxicity. But it's a matter of how much you can trust the models, and I don't think they are that mature at the moment, from what I've seen, though I could be wrong. So some things you need to look at more qualitatively. When you do a more thorough investigation, it leaves room for this type of analysis, but in your intake process I don't think an employer would allow their front-end developers to go in and determine the level of culture in the community of every package, even if it may be just one individual. So I think a lot of it can be automated, but a lot of things are quite qualitative. That's why I really highlight the aspect of trying to integrate into your pipeline and introduce some kind of flagging function showing where you should go in and look, some kind of pre-screening process. Does anyone here consider health or sustainability in your intake process? How do you apply that? Do you use it consistently across your full intake process, and what's the outcome if there's a bad score? Are you satisfied with it, or do you see room for improvement? You, I'm not putting you on the stand, but you in the purple, looking down at your computer: did you consider health in your organization as well? How do you do that? What do you consider? Have you automated this, or is it a qualitative check? How long does it take to go through the full list? [Audience:] Too long. [Speaker:] But again, it's a matter of how thorough you want to be: when can you reach a level where you think the risk is acceptable, where you're increasing the awareness for the engineers to make this decision,
how far along you are, and how much you trust your engineers and what they want to focus on. Any other input or reflections? [Audience question about sharing the questionnaire.] Yeah, absolutely. I condensed it based on the CHAOSS metrics, so I can share the original; I can update the PowerPoint on the schedule platform with it as an appendix. [Audience question about tooling.] Yeah, we're collaborating with one company that is developing tool support for this; they had another name previously, and now they've been acquired by Micro Focus, I think. The goal is that they are going to release their data model openly. [Audience question about the public sector.] Yes, so again, the project is focused on the private sector, but I like working with the public sector because of the impact and the societal aspects. The only example I know of is when I applied it in the context of the Swedish Employment Service, but that actually led those e-archival professionals to start talking about open source health at national and international conferences with people working on e-archival solutions. So for every case where we apply this, it spreads and people get to know about it, because "open source health" and "sustainability" are new words to people. People in the Swedish public sector hardly know what open source is, and the main challenge is that they think it's insecure, or that's what people are telling them. So this becomes a tool for them to work with and a way to counter that argument. Hopefully, yes, we can answer their call. Thanks.
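To make the flagging function discussed in the talk concrete, here is a minimal sketch that turns a few automated metrics into a traffic-light flag. The metric names and thresholds are illustrative assumptions on my part, not values defined by the framework; in practice you would derive them from your own interviews and risk mapping:

```python
def flag_project(metrics):
    """Turn a few automated health metrics into a traffic-light flag.

    Metric names and thresholds are illustrative; a red or yellow flag
    means the project should get a closer, qualitative review using
    the intake checklist.
    """
    warnings = 0
    if metrics.get("median_response_days", 0) > 14:   # slow responses
        warnings += 1
    if metrics.get("top_author_share", 0) > 0.8:      # concentrated development
        warnings += 1
    if metrics.get("days_since_last_commit", 0) > 180:  # possibly dormant
        warnings += 1
    return "red" if warnings >= 2 else "yellow" if warnings == 1 else "green"

print(flag_project({"median_response_days": 3,
                    "top_author_share": 0.9,
                    "days_since_last_commit": 400}))  # two warnings -> red
```

Wired into a dependency-scanning step of a CI pipeline, only the red and yellow projects would be forwarded to developers for the qualitative review, keeping the per-package effort within the 10-15 minute budget mentioned above.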